Bug#1065410: libhsa-runtime64-1: assertion in gfx10addrlib.cpp on gfx1035

2024-03-17 Thread Cordell Bloor

On Mon, 04 Mar 2024 04:35:50 + Cordell Bloor  wrote:
> Many tests began failing for the gfx1035 ISA on the Debian ROCm CI upon
> the update to libhsa-runtime64-1 (5.7.1-1). The failure is an assertion:
>
> ./src/image/addrlib/src/gfx10/gfx10addrlib.cpp:1083: virtual
> rocr::Addr::ChipFamily rocr::Addr::V2::Gfx10Lib::HwlConvertChipFamily(unsigned
> int, unsigned int): Assertion `false' failed.
>
> The rocblas test logs suggest that this was introduced with the update
> to rocr-runtime 5.7.1-1 [1], as the tests passed before [2]. On Debian
> Testing, it even passed with libhsakmt1 (5.7.0-1) [3].
>
> The assertion is complaining that it's not a Rembrandt ASIC [4].
> However, the test system is a Minisforum UM773 Lite with an AMD Ryzen
> 7735 HS (w/ AMD Radeon 680M integrated graphics).

This seems to be due to a check on the chipRevision that was added
some time between 5.2.3 and 5.7.1. For the APUs, the check ensures that
the revision is in the range 0x1 to 0xFF [5]. However, the chipRevision
of my Rembrandt APU is 0x00 within this function.


rocminfo reports

  Chip ID: 5761(0x1681)
  ASIC Revision:   2(0x2)

so I imagine that the chip revision should probably be 2 and the value 
of 0 is really just because it was never initialized.
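
To make the failure mode concrete, here is a minimal sketch of a revision
range check like the one described above. It is not the addrlib code: the
constant and function names are hypothetical, and treating the upper bound
as inclusive is an assumption, since only the range 0x1 to 0xFF is stated
here.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the bounds described above (0x1 to 0xFF).
// The real header has its own definitions; these names are illustrative.
constexpr std::uint32_t kRembrandtRevMin = 0x01;
constexpr std::uint32_t kRembrandtRevMax = 0xFF;

// Returns true when the revision falls inside the assumed Rembrandt range.
// Assumption: both bounds are inclusive.
constexpr bool looksLikeRembrandt(std::uint32_t chipRevision) {
    return chipRevision >= kRembrandtRevMin &&
           chipRevision <= kRembrandtRevMax;
}

// A revision of 0x00 (the value seen inside the function) is rejected,
// while 0x02 (the ASIC Revision that rocminfo reports) is accepted.
static_assert(!looksLikeRembrandt(0x00), "revision 0 falls outside the range");
static_assert(looksLikeRembrandt(0x02), "revision 2 falls inside the range");
```

Under these assumptions, any check of this shape rejects a revision that
reaches it as zero, which would be consistent with the assertion firing
even though the hardware is Rembrandt.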


I've reproduced the problem using AMD's prebuilt binaries from ROCm 
6.0.2, so this is an issue in the upstream project as well.


Sincerely,
Cory Bloor

> [1]: https://ci.rocm.debian.net/data/autopkgtest/unstable/amd64+gfx1035/r/rocblas/7826/log.gz
> [2]: https://ci.rocm.debian.net/data/autopkgtest/unstable/amd64+gfx1035/r/rocblas/4334/log.gz
> [3]: https://ci.rocm.debian.net/data/autopkgtest/testing/amd64+gfx1035/r/rocblas/8115/log.gz
> [4]: https://salsa.debian.org/rocm-team/rocr-runtime/-/blob/debian/5.7.1-1/src/image/addrlib/src/gfx10/gfx10addrlib.cpp?ref_type=tags#L1083

[5]: https://salsa.debian.org/rocm-team/rocr-runtime/-/blob/debian/5.7.1-1/src/image/addrlib/src/amdgpu_asic_addr.h#L123




Bug#1065410: libhsa-runtime64-1: assertion in gfx10addrlib.cpp on gfx1035

2024-03-03 Thread Cordell Bloor
Package: libhsa-runtime64-1
Version: 5.7.1-1
Severity: normal
X-Debbugs-Cc: c...@slerp.xyz

Dear Maintainer,

Many tests began failing for the gfx1035 ISA on the Debian ROCm CI upon
the update to libhsa-runtime64-1 (5.7.1-1). The failure is an assertion:

./src/image/addrlib/src/gfx10/gfx10addrlib.cpp:1083: virtual 
rocr::Addr::ChipFamily rocr::Addr::V2::Gfx10Lib::HwlConvertChipFamily(unsigned 
int, unsigned int): Assertion `false' failed.

The rocblas test logs suggest that this was introduced with the update
to rocr-runtime 5.7.1-1 [1], as the tests passed before [2]. On Debian
Testing, it even passed with libhsakmt1 (5.7.0-1) [3].

The assertion is complaining that it's not a Rembrandt ASIC [4].
However, the test system is a Minisforum UM773 Lite with an AMD Ryzen
7735 HS (w/ AMD Radeon 680M integrated graphics). That's Rembrandt.

Sincerely,
Cory Bloor

[1]: https://ci.rocm.debian.net/data/autopkgtest/unstable/amd64+gfx1035/r/rocblas/7826/log.gz
[2]: https://ci.rocm.debian.net/data/autopkgtest/unstable/amd64+gfx1035/r/rocblas/4334/log.gz
[3]: https://ci.rocm.debian.net/data/autopkgtest/testing/amd64+gfx1035/r/rocblas/8115/log.gz
[4]: https://salsa.debian.org/rocm-team/rocr-runtime/-/blob/debian/5.7.1-1/src/image/addrlib/src/gfx10/gfx10addrlib.cpp?ref_type=tags#L1083

-- System Information:
Debian Release: trixie/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 6.6.15-amd64 (SMP w/32 CPU threads; PREEMPT)
Locale: LANG=C, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: unable to detect

Versions of packages libhsa-runtime64-1 depends on:
ii  libc6   2.38-6
ii  libdrm-amdgpu1  2.4.120-2
ii  libdrm2 2.4.120-2
ii  libelf1 0.190-1+b1
ii  libgcc-s1   14-20240221-2.1
ii  libhsakmt1  5.7.0-1
ii  libstdc++6  14-20240221-2.1

libhsa-runtime64-1 recommends no packages.

libhsa-runtime64-1 suggests no packages.

-- no debconf information