Bug#1065701: rocm_agent_enumerator: crash on systems without AMD GPU

2024-03-15 Thread Petter Reinholdtsen


Control: notfound -1 5.7.1-1

[Cordell Bloor]
> reportbug noted that I had 5.7.1-1 installed, but my report was based on 
> the rocm-hipamd autopkgtests run on the DebCI. I didn't notice that the 
> DebCI was using a different version of rocminfo than me, as I'd assumed 
> it used packages from Unstable.

OK.  Updated issue metadata.

-- 
Happy hacking
Petter Reinholdtsen



Bug#1065701: rocm_agent_enumerator: crash on systems without AMD GPU

2024-03-15 Thread Cordell Bloor

Hi Petter,

On 2024-03-15 02:39, Petter Reinholdtsen wrote:
> [Cordell Bloor]
>> Thanks Petter. After inspecting the code and reviewing both your report
>> and the buildd logs, my conclusion is that this issue was fixed by
>> upstream and included in 5.7.1-1.
>
> But your original report claimed it was present in version 5.7.1-1?
> Which version was used to trigger the error?

reportbug noted that I had 5.7.1-1 installed, but my report was based on
the rocm-hipamd autopkgtests run on the DebCI. I didn't notice that the
DebCI was using a different version of rocminfo than me, as I'd assumed
it used packages from Unstable. In retrospect, I suppose it makes sense
that Testing is used as the base for the autopkgtests, since they're
used to gate migration to Testing.


Sincerely,
Cory Bloor


Bug#1065701: rocm_agent_enumerator: crash on systems without AMD GPU

2024-03-15 Thread Petter Reinholdtsen
[Cordell Bloor]
> Thanks Petter. After inspecting the code and reviewing both your report 
> and the buildd logs, my conclusion is that this issue was fixed by 
> upstream and included in 5.7.1-1.

But your original report claimed it was present in version 5.7.1-1?
Which version was used to trigger the error?

-- 
Happy hacking
Petter Reinholdtsen



Bug#1065701: rocm_agent_enumerator: crash on systems without AMD GPU

2024-03-15 Thread Cordell Bloor
On Sun, 10 Mar 2024 06:56:56 +0100 Petter Reinholdtsen wrote:

>
> In my sid chroot, on a laptop with no AMD GPU, I get this:
>
> root@minerva:/# rocminfo
> ROCk module is NOT loaded, possibly no GPU devices
> root@minerva:/# rocm_agent_enumerator
> gfx000
> root@minerva:/#

Thanks Petter. After inspecting the code and reviewing both your report 
and the buildd logs, my conclusion is that this issue was fixed by 
upstream and included in 5.7.1-1.


Sincerely,
Cory Bloor



Bug#1065701: rocm_agent_enumerator: crash on systems without AMD GPU

2024-03-09 Thread Petter Reinholdtsen


In my sid chroot, on a laptop with no AMD GPU, I get this:

root@minerva:/# rocminfo 
ROCk module is NOT loaded, possibly no GPU devices
root@minerva:/# rocm_agent_enumerator
gfx000
root@minerva:/#

-- 
Happy hacking
Petter Reinholdtsen



Bug#1065701: rocm_agent_enumerator: crash on systems without AMD GPU

2024-03-09 Thread Christian Kastner
Control: found -1 5.2.3-3

Hi Cory,

On 2024-03-09 07:20, Cordell Bloor wrote:
> On some systems, the rocm_agent_enumerator command may crash with an error:
> 
> Traceback (most recent call last):
>   File "/usr/bin/rocm_agent_enumerator", line 260, in <module>
>     main()
>   File "/usr/bin/rocm_agent_enumerator", line 244, in main
>     target_list = readFromKFD()
>                   ^^^^^^^^^^^^^
>   File "/usr/bin/rocm_agent_enumerator", line 193, in readFromKFD
>     for node in sorted(os.listdir(topology_dir)):
>                        ^^^^^^^^^^^^^^^^^^^^^^^^
> FileNotFoundError: [Errno 2] No such file or directory:
> '/sys/class/kfd/kfd/topology/nodes/'

I've been seeing this one for a long time in package builds, but it
didn't occur to me that this is a user-visible issue, too.

Seen here [1], for example.

Best,
Christian

[1] 
https://buildd.debian.org/status/fetch.php?pkg=rocblas&arch=amd64&ver=5.3.3%2Bdfsg-2&stamp=1685955323&raw=0



Bug#1065701: rocm_agent_enumerator: crash on systems without AMD GPU

2024-03-08 Thread Cordell Bloor
Package: rocminfo
Version: 5.7.1-1
Severity: normal
X-Debbugs-Cc: c...@slerp.xyz

Dear Maintainer,

On some systems, the rocm_agent_enumerator command may crash with an error:

Traceback (most recent call last):
  File "/usr/bin/rocm_agent_enumerator", line 260, in <module>
    main()
  File "/usr/bin/rocm_agent_enumerator", line 244, in main
    target_list = readFromKFD()
                  ^^^^^^^^^^^^^
  File "/usr/bin/rocm_agent_enumerator", line 193, in readFromKFD
    for node in sorted(os.listdir(topology_dir)):
                       ^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory:
'/sys/class/kfd/kfd/topology/nodes/'

It's not clear to me exactly why this error is emitted. Perhaps it's
because the system does not have an AMD GPU at all. In that case, the
expected output would be "gfx000\n". The purpose of
rocm_agent_enumerator is to list all AMD GPUs on a system. If there are
no AMD GPUs, then it should be an empty list.
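
To illustrate the expected behaviour, here is a minimal sketch, assuming the
crash comes from calling os.listdir() on a KFD topology directory that does
not exist. The helper names list_kfd_nodes and format_targets are
hypothetical and are not taken from rocm_agent_enumerator; this is not the
upstream fix. Only the sysfs path and the "gfx000" output come from this
report.

```python
import os

# Path taken from the traceback in this report.
TOPOLOGY_DIR = "/sys/class/kfd/kfd/topology/nodes/"


def list_kfd_nodes(topology_dir=TOPOLOGY_DIR):
    """Return the sorted node entries, or an empty list if KFD is absent.

    Hypothetical helper: a missing directory is treated as "no AMD GPUs"
    instead of being allowed to raise FileNotFoundError.
    """
    try:
        return sorted(os.listdir(topology_dir))
    except (FileNotFoundError, NotADirectoryError):
        return []


def format_targets(gpu_targets):
    """Hypothetical formatter: gfx000 first, then any detected GPU targets."""
    return "".join(f"{target}\n" for target in ["gfx000", *gpu_targets])
```

With no KFD directory present, list_kfd_nodes() returns an empty list, and
format_targets([]) yields only the "gfx000" line, with nothing written to
stderr.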

This behaviour can be seen in the rocm-hipamd autopkgtests [1]. While
hipcc should probably not be calling rocm_agent_enumerator when the
offload architecture has been manually specified, the
rocm_agent_enumerator shouldn't be emitting any output on stderr.
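
One way to check the "nothing on stderr" expectation is sketched below. The
helper name stderr_of is hypothetical, and the sketch assumes only that the
command can be run as a subprocess; it is not part of the autopkgtests.

```python
import subprocess


def stderr_of(command):
    """Run command (a list of arguments) and return its stderr as text."""
    result = subprocess.run(
        command,
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode the captured bytes to str
        check=False,          # a non-zero exit status is not an error here
    )
    return result.stderr
```

On a system with the fixed package, stderr_of(["rocm_agent_enumerator"])
would be expected to return an empty string.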

Sincerely,
Cory Bloor

[1]: 
https://ci.debian.net/data/autopkgtest/testing/amd64/r/rocm-hipamd/43752739/log.gz

-- System Information:
Debian Release: trixie/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 6.6.15-amd64 (SMP w/32 CPU threads; PREEMPT)
Locale: LANG=C, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: unable to detect

Versions of packages rocminfo depends on:
ii  kmod                31+20240202-2
ii  libc6               2.37-15.1
ii  libgcc-s1           14-20240303-1
ii  libhsa-runtime64-1  5.7.1-1
ii  libstdc++6          14-20240303-1
ii  pciutils            1:3.11.1-1
ii  python3             3.11.8-1

rocminfo recommends no packages.

rocminfo suggests no packages.

-- no debconf information