Bug#1065410: libhsa-runtime64-1: assertion in gfx10addrlib.cpp on gfx1035
On Mon, 04 Mar 2024 04:35:50 + Cordell Bloor wrote:
> Many tests began failing for the gfx1035 ISA on the Debian ROCm CI upon
> the update to libhsa-runtime64-1 (5.7.1-1). The failure is an assertion:
>
> ./src/image/addrlib/src/gfx10/gfx10addrlib.cpp:1083: virtual rocr::Addr::ChipFamily rocr::Addr::V2::Gfx10Lib::HwlConvertChipFamily(unsigned int, unsigned int): Assertion `false' failed.
>
> The rocblas test logs suggest that this was introduced with the update
> to rocr-runtime 5.7.1-1 [1], as the tests passed before [2]. On Debian
> Testing, it even passed with libhsakmt1 (5.7.0-1) [3].
>
> The assertion is complaining that it's not a Rembrandt ASIC [4].
> However, the test system is a Minisforum UM773 Lite with an AMD Ryzen
> 7735 HS (/w AMD Radeon 680M integrated graphics).

This seems to be due to a check on the chipRevision that was added some
time between 5.2.3 and 5.7.1. For the APUs, the check ensures that the
revision is in the range 0x1 to 0xFF [5]. However, the chipRevision of
my Rembrandt APU is 0x00 within this function.

rocminfo reports

    Chip ID:        5761(0x1681)
    ASIC Revision:  2(0x2)

so I imagine that the chip revision should be 2, and that the value of 0
appears only because the field was never initialized.

I've reproduced the problem using AMD's prebuilt binaries from ROCm
6.0.2, so this is an issue in the upstream project as well.

Sincerely,
Cory Bloor

> [1]: https://ci.rocm.debian.net/data/autopkgtest/unstable/amd64+gfx1035/r/rocblas/7826/log.gz
> [2]: https://ci.rocm.debian.net/data/autopkgtest/unstable/amd64+gfx1035/r/rocblas/4334/log.gz
> [3]: https://ci.rocm.debian.net/data/autopkgtest/testing/amd64+gfx1035/r/rocblas/8115/log.gz
> [4]: https://salsa.debian.org/rocm-team/rocr-runtime/-/blob/debian/5.7.1-1/src/image/addrlib/src/gfx10/gfx10addrlib.cpp?ref_type=tags#L1083
[5]: https://salsa.debian.org/rocm-team/rocr-runtime/-/blob/debian/5.7.1-1/src/image/addrlib/src/amdgpu_asic_addr.h#L123
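To illustrate why a chipRevision of 0x00 trips the assertion while the 0x2 reported by rocminfo would not, here is a minimal sketch of a revision range check. The names (kRembrandtRevMin, kRembrandtRevMax, RevisionInRange, LooksLikeRembrandt) are hypothetical, and treating both bounds as inclusive is an assumption; this is not copied from amdgpu_asic_addr.h.

```cpp
#include <cstdint>

// Hypothetical bounds mirroring the "0x1 to 0xFF" range described above.
// Whether the real header treats the upper bound as inclusive is an assumption.
constexpr std::uint32_t kRembrandtRevMin = 0x01;
constexpr std::uint32_t kRembrandtRevMax = 0xFF;

// True when rev lies inside [min, max].
constexpr bool RevisionInRange(std::uint32_t rev,
                               std::uint32_t min,
                               std::uint32_t max) {
    return rev >= min && rev <= max;
}

// Hypothetical stand-in for the Rembrandt revision test.
constexpr bool LooksLikeRembrandt(std::uint32_t chipRevision) {
    return RevisionInRange(chipRevision, kRembrandtRevMin, kRembrandtRevMax);
}

// A revision of 0x00 (the value seen inside the function) is rejected,
// while 0x02 (the ASIC Revision reported by rocminfo) is accepted.
static_assert(!LooksLikeRembrandt(0x00), "revision 0 falls outside the range");
static_assert(LooksLikeRembrandt(0x02), "revision 2 falls inside the range");
```

Under these assumptions, any code path that leaves the revision at zero would fail the check regardless of the actual hardware, which is consistent with the symptom above.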
Bug#1065410: libhsa-runtime64-1: assertion in gfx10addrlib.cpp on gfx1035
Package: libhsa-runtime64-1
Version: 5.7.1-1
Severity: normal
X-Debbugs-Cc: c...@slerp.xyz

Dear Maintainer,

Many tests began failing for the gfx1035 ISA on the Debian ROCm CI upon
the update to libhsa-runtime64-1 (5.7.1-1). The failure is an assertion:

./src/image/addrlib/src/gfx10/gfx10addrlib.cpp:1083: virtual rocr::Addr::ChipFamily rocr::Addr::V2::Gfx10Lib::HwlConvertChipFamily(unsigned int, unsigned int): Assertion `false' failed.

The rocblas test logs suggest that this was introduced with the update
to rocr-runtime 5.7.1-1 [1], as the tests passed before [2]. On Debian
Testing, it even passed with libhsakmt1 (5.7.0-1) [3].

The assertion is complaining that it's not a Rembrandt ASIC [4].
However, the test system is a Minisforum UM773 Lite with an AMD Ryzen
7735 HS (/w AMD Radeon 680M integrated graphics). That's Rembrandt.

Sincerely,
Cory Bloor

[1]: https://ci.rocm.debian.net/data/autopkgtest/unstable/amd64+gfx1035/r/rocblas/7826/log.gz
[2]: https://ci.rocm.debian.net/data/autopkgtest/unstable/amd64+gfx1035/r/rocblas/4334/log.gz
[3]: https://ci.rocm.debian.net/data/autopkgtest/testing/amd64+gfx1035/r/rocblas/8115/log.gz
[4]: https://salsa.debian.org/rocm-team/rocr-runtime/-/blob/debian/5.7.1-1/src/image/addrlib/src/gfx10/gfx10addrlib.cpp?ref_type=tags#L1083

-- System Information:
Debian Release: trixie/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 6.6.15-amd64 (SMP w/32 CPU threads; PREEMPT)
Locale: LANG=C, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: unable to detect

Versions of packages libhsa-runtime64-1 depends on:
ii  libc6           2.38-6
ii  libdrm-amdgpu1  2.4.120-2
ii  libdrm2         2.4.120-2
ii  libelf1         0.190-1+b1
ii  libgcc-s1       14-20240221-2.1
ii  libhsakmt1      5.7.0-1
ii  libstdc++6      14-20240221-2.1

libhsa-runtime64-1 recommends no packages.

libhsa-runtime64-1 suggests no packages.

-- no debconf information