Dear all,

I have now been able to test my hypothesis that this is (or was) a bug in Linux that has since been fixed. Here's what I tried:

0. Under the old system, the test failed with the previously mentioned error message. To make sure it wasn't caused by some weird system state, I rebooted the machine and tested again. This did not change anything.
1. I upgraded linux-image-amd64 from 4.9+80+deb9u5 to 4.9+80+deb9u6. This installed linux-image-4.9.0-8-amd64 version 4.9.130-2 in addition to the previously installed and used linux-image-4.9.0-7-amd64 version 4.9.110-3+deb9u2. After rebooting into the new kernel, the test still failed.
2. Since that didn't help, I tried the exact same kernel image that works on the other, very similar machine, i.e. I downgraded linux-image-4.9.0-8-amd64 to 4.9.110-3+deb9u4. After rebooting, the test still failed.

I then upgraded all installed packages to their latest versions. Unsurprisingly, given the differences in package versions, that didn't change anything.

So I went back to the original idea that it's a GDB bug. I searched around a bit for other known issues with GDB and/or this CPU model, and I found a thread about cuda-gdb failing on certain versions with this Intel 6140 CPU [0] and the related GDB bug report [1]. They indicate that some bugs related to this CPU's (extended) instruction set have been fixed in a later GDB version, which led me to try out newer versions. Lo and behold, it works correctly with GDB 8.0 (both compiled from source and the 8.0-1 Debian package).

I bisected the 2288 commits between GDB 7.12 and 8.0 and identified upstream commit 51547df6 [2] as the one fixing this issue.

Cheers,
JAA

[0] https://devtalk.nvidia.com/default/topic/1038201/cuda-gdb/cuda-gdbserver-cannot-be-connected-unknown-register-ymm0h-requested/
[1] https://sourceware.org/bugzilla/show_bug.cgi?id=22137
[2] https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=51547df62c155231530ca502c485659f3d2b66cb
