[Bug libstdc++/87071] libstdc++ crashes during GPU driver initialization with suspected attempt to execute unsupported instruction by Athlon64 X2 TK-57
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87071

--- Comment #9 from Sergey Kondakov ---
(In reply to Alexander Monakov from comment #8)
> You should have mentioned you were using a custom-compiled Mesa, not the
> distribution package (both here and in the original report to Mesa project).
>
> For some reason the disasm in the provided log is unusable (shows assembly
> of the outermost frame), but downloading your package shows that failing
> instruction is
>
>    928ea: c5 fa 6f 05 0e 09 c3 00    vmovdqu 0xc3090e(%rip),%xmm0    # cc3200
>
> i.e. an AVX instruction, not supported on the CPU. Given that you were using
> Clang to compile the package, this is not a GCC issue.

You actually managed to get some info from a separate package? Amazing.

I should have, but at this point half of my system is customized in some way, by me or by others via OBS's community repositories, and it's a rolling-release distro. And my attention was completely drawn away from Mesa.

But here's the interesting part: a guy from openSUSE just figured out that the offending code was launched by the in-Mesa "SWR", Intel's AVX-based software renderer, which, for some reason, tried to do something even though it should not load unless explicitly requested or if direct rendering has failed. And it doesn't load if Mesa is built with gcc & linked with ld, even with it enabled!

One thing doesn't build with gcc, the other fails with clang… there is no peace with Mesa.

Anyway, thanks for your advice, I was getting desperate with that weird issue.
[Bug libstdc++/87071] libstdc++ crashes during GPU driver initialization with suspected attempt to execute unsupported instruction by Athlon64 X2 TK-57
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87071

Alexander Monakov changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|---                         |INVALID

--- Comment #8 from Alexander Monakov ---
You should have mentioned you were using a custom-compiled Mesa, not the distribution package (both here and in the original report to Mesa project).

For some reason the disasm in the provided log is unusable (shows assembly of the outermost frame), but downloading your package shows that failing instruction is

   928ea: c5 fa 6f 05 0e 09 c3 00    vmovdqu 0xc3090e(%rip),%xmm0    # cc3200

i.e. an AVX instruction, not supported on the CPU. Given that you were using Clang to compile the package, this is not a GCC issue.
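
For readers wondering how the eight bytes in comment #8 identify an AVX instruction, here is a minimal Python sketch that decodes them by hand. It assumes 64-bit mode, where a leading 0xC5 byte is always a 2-byte VEX prefix, and it handles only this one shape (2-byte VEX, RIP-relative operand). The function name and dictionary keys are invented for illustration and do not come from any disassembler.

```python
# Sketch only: hand-decodes the instruction bytes quoted in comment #8.
# Assumes 64-bit mode; names below are illustrative, not a library API.

# VEX "pp" field -> legacy prefix it stands in for
VEX_PP = {0b00: None, 0b01: 0x66, 0b10: 0xF3, 0b11: 0xF2}


def decode_vex2_rip_relative(address, raw):
    """Decode a 2-byte-VEX instruction whose memory operand is RIP-relative."""
    if raw[0] != 0xC5:
        raise ValueError("not a 2-byte VEX prefix")
    p = raw[1]       # bit 7: inverted R, bits 6-3: inverted vvvv, bit 2: L, bits 1-0: pp
    opcode = raw[2]  # opcode in the implied 0F map
    modrm = raw[3]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if (mod, rm) != (0b00, 0b101):
        raise ValueError("operand is not RIP-relative")
    disp = int.from_bytes(raw[4:8], "little", signed=True)
    length = 8  # VEX prefix (2) + opcode (1) + ModRM (1) + disp32 (4)
    return {
        "vector_bits": 256 if p & 0b100 else 128,
        "implied_prefix": VEX_PP[p & 0b11],
        "opcode": opcode,
        "register": reg | (0 if p & 0x80 else 8),
        # RIP-relative addresses are counted from the end of the instruction
        "target": address + length + disp,
    }


insn = decode_vex2_rip_relative(0x928EA, bytes.fromhex("c5fa6f050e09c300"))
print(hex(insn["target"]))
```

The decoded fields are VEX.128, implied prefix F3 and opcode 0F 6F, which is the encoding of `vmovdqu` into an XMM register. The computed target, 0xcc3200, matches the `# cc3200` annotation in the disassembly. A CPU without AVX raises an invalid-opcode fault on this encoding, which the kernel delivers to the process as SIGILL.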
[Bug libstdc++/87071] libstdc++ crashes during GPU driver initialization with suspected attempt to execute unsupported instruction by Athlon64 X2 TK-57
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87071

--- Comment #7 from Sergey Kondakov ---
Created attachment 44583
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44583&action=edit
Xorg.pid-1381.gdb.log with disas

(In reply to Alexander Monakov from comment #6)
> In your gdb script, please add
>
>   x/i $pc
>   disas
>
> after the backtrace command (probably 'bt full') and regenerate the log.
> Without that the log doesn't actually show the "illegal" instruction.
>

Now, THAT's some real advice, thanks! I've managed to get that log but…

> Also please show output of rpm -qf /usr/lib64/dri/r300_dri.so

This, of course, points to the Mesa-dri package BUT that gave me the idea not to rule out Mesa yet. And, indeed, it was the culprit all along, despite what its dev has said. Or, more accurately, it was clang/LLVM… probably, I haven't properly checked yet.

I've forked the distro package of Mesa (or, more precisely, its auto-build OBS scripts) to build it with LTO and decrease its monstrous size. But it doesn't compile with gcc that way because of a long-standing bug and broken autotools scripts, and OBS can't handle the requirements of full LTO anyway. So I used clang & gold for building Mesa (and only it) with ThinLTO and it worked (and works, on newer PCs) fine.

I spent days figuring that out and hours fighting the package manager to selectively install the "old", default Mesa package-set without affecting anything else. And it worked, the crash is gone. Maybe it was a combination of factors, but I shouldn't have believed Mesa is irrelevant, as I was told, just because it's far from being the first in the chain of jumps of the backtrace. But then again, I don't have much clue about how it works.

So, this issue can be closed UNLESS you think that the backtrace shouldn't have led to libstdc++ anyway and/or clang & Mesa couldn't have failed like that on their own. Normally, such suicidal code should not come out of a compiler with almost-default non-aggressive settings, wherever it may be.

(In reply to Uroš Bizjak from comment #4)
> (In reply to Sergey Kondakov from comment #3)
>
> > If your code is correct then whose isn't ?
> Instructions are generated by the compiler. So, it is the compiler's fault,
> it probably emits a SSE instruction that your processor doesn't understand.
>
> That said, at least we need a runtime testcase that fails on your target. If
> this is not possible, then please at least decompile the library and show
> the instruction that fails. We also need preprocessed source and exact
> instructions, how to build the source, so the illegal insn will be generated.
>
> Also, please read https://gcc.gnu.org/bugs/

Oh, I've read that, all right. Full verbose (very, very verbose) build logs, self-tests, compilation scripts and built gcc/libstdc++ packages are in the OBS links in the original post, more precisely:

https://build.opensuse.org/package/live_build_log/devel:gcc/gcc8/openSUSE_Factory/x86_64
https://build.opensuse.org/package/binaries/devel:gcc/gcc8/openSUSE_Factory
  (requires OBS registration to show download links)
https://download.opensuse.org/repositories/devel:/gcc/openSUSE_Factory/
  (does not require registration but the page with the massive package listing halts the browser and requires a lot of RAM to view)

Except for https://build.opensuse.org/package/show/home:X0F:HSF/Mesa where my Mesa-dri:r300_dri.so is built.

But asking to decompile one of the core system libraries or make an example faulty program on the spot is like asking to decompile the kernel or write a driver (actually, probably even worse): anyone capable of doing it in a day does not require anyone else's software-related assistance.

(In reply to Jonathan Wakely from comment #5)
> Right. As you can see in GDB, the libstdc++ code says:
>
> 350 return static_cast<char_type*>(__builtin_memcpy(__s1, __s2,
> __n));
>
> Do you see any processor instructions there? Anything that isn't valid on
> your CPU? No, because it's just C++ code.

I see a bunch of over-complicated gibberish starting with libstdc++, which a widely known, reputable and respected Mesa developer used as an argument to look into libstdc++. Your reclusive, insular existence and passive-aggressive "deal with issues of our _irreplaceable_ software, requiring high-level low-abstraction-layer engineer-grade knowledge and experience, yourself" attitude in relation to one of the most obscure subject matters, on the other hand, begets only distrust and frustration. As a result, I spent almost no time investigating my suspicion, which was correct in the first place, spent a lot of effort investigating his claim, and couldn't do anything with yours because of how unproductive it was.

> (In reply to Uroš Bizjak from comment #4)
> > Also, please read https://gcc.gnu.org/bugs/
>
> I already said that before the bug was even filed.

No wonder googling "gcc bugzilla registration" brings up an upvoted years-old post that advises not to bother.
[Bug libstdc++/87071] libstdc++ crashes during GPU driver initialization with suspected attempt to execute unsupported instruction by Athlon64 X2 TK-57
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87071

Alexander Monakov changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #6 from Alexander Monakov ---
In your gdb script, please add

  x/i $pc
  disas

after the backtrace command (probably 'bt full') and regenerate the log. Without that the log doesn't actually show the "illegal" instruction.

Also please show output of

  rpm -qf /usr/lib64/dri/r300_dri.so
[Bug libstdc++/87071] libstdc++ crashes during GPU driver initialization with suspected attempt to execute unsupported instruction by Athlon64 X2 TK-57
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87071

Jonathan Wakely changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2018-08-23
     Ever confirmed|0                           |1

--- Comment #5 from Jonathan Wakely ---
(In reply to Uroš Bizjak from comment #4)
> (In reply to Sergey Kondakov from comment #3)
>
> > If your code is correct then whose isn't ?
> Instructions are generated by the compiler. So, it is the compiler's fault,
> it probably emits a SSE instruction that your processor doesn't understand.

Right. As you can see in GDB, the libstdc++ code says:

350         return static_cast<char_type*>(__builtin_memcpy(__s1, __s2, __n));

Do you see any processor instructions there? Anything that isn't valid on your CPU? No, because it's just C++ code.

> Also, please read https://gcc.gnu.org/bugs/

I already said that before the bug was even filed.
[Bug libstdc++/87071] libstdc++ crashes during GPU driver initialization with suspected attempt to execute unsupported instruction by Athlon64 X2 TK-57
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87071

--- Comment #4 from Uroš Bizjak ---
(In reply to Sergey Kondakov from comment #3)

> If your code is correct then whose isn't ?

Instructions are generated by the compiler. So, it is the compiler's fault, it probably emits a SSE instruction that your processor doesn't understand.

That said, at least we need a runtime testcase that fails on your target. If this is not possible, then please at least decompile the library and show the instruction that fails. We also need preprocessed source and exact instructions, how to build the source, so the illegal insn will be generated.

Also, please read https://gcc.gnu.org/bugs/
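
One quick way to test the "instruction your processor doesn't understand" theory on the affected machine is to compare the instruction-set extensions a binary might use against what the CPU advertises. The Python sketch below assumes an x86 Linux system, where /proc/cpuinfo carries a `flags` line. The helper names and the sample text are hypothetical; the sample is not the real flag list of the TK-57.

```python
# Sketch only: report which instruction-set extensions the CPU does not advertise.
# Assumes the x86 Linux /proc/cpuinfo layout; helper names are illustrative.


def cpu_flags(cpuinfo_text):
    """Return the feature flags listed on the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(required, cpuinfo_text):
    """Return the required flags that are absent, in sorted order."""
    return sorted(set(required) - cpu_flags(cpuinfo_text))


# Hypothetical, shortened sample; on a real machine read /proc/cpuinfo instead:
#   with open("/proc/cpuinfo") as f: text = f.read()
SAMPLE = "processor\t: 0\nflags\t\t: fpu mmx sse sse2 pni\n"

print(missing_features(["sse2", "avx"], SAMPLE))
```

If `avx` shows up as missing, any VEX-encoded instruction in a loaded library will fault with SIGILL no matter which source line the debugger attributes it to.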
[Bug libstdc++/87071] libstdc++ crashes during GPU driver initialization with suspected attempt to execute unsupported instruction by Athlon64 X2 TK-57
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87071

--- Comment #3 from Sergey Kondakov ---
(In reply to Jonathan Wakely from comment #2)
> (EE) Illegal instruction at address 0x72f2c8ea
>
> I don't see how this can possibly be a libstdc++ problem, since libstdc++
> doesn't produce any CPU instructions, illegal or not.
>
> As I already said to you, there's nothing we can do with this info.

And Mesa devs said that it is. So everyone points fingers at each other in a circle and what am I supposed to do ?

Program received signal SIGILL, Illegal instruction.
0x72f2c8ea in _GLOBAL__sub_I_lower_x86.cpp () at /usr/bin/../lib64/gcc/x86_64-suse-linux/8/../../../../include/c++/8/bits/char_traits.h:350
350         return static_cast<char_type*>(__builtin_memcpy(__s1, __s2, __n));
#0  0x72f2c8ea in _GLOBAL__sub_I_lower_x86.cpp () at /usr/bin/../lib64/gcc/x86_64-suse-linux/8/../../../../include/c++/8/bits/char_traits.h:350
        InitializeLowerX86PassFlag = {_M_once = 0}
        SwrJit::intrinsicMap2[abi:cxx11] = {std::map with 0 elements, std::map with 0 elements, std::map with 0 elements}
        std::__ioinit = {static _S_refcount = 13, static _S_synced_with_stdio = true}
        SwrJit::DOUBLE =
        std::piecewise_construct =
        (anonymous namespace)::ForceMCJITLinking =
        SwrJit::intrinsicMap[abi:cxx11] = std::map with 0 elements
        SwrJit::LowerX86::ID = 0 '\000'

Is include/c++/8/bits/char_traits.h not part of libstdc++ ? If your code is correct then whose isn't ?
[Bug libstdc++/87071] libstdc++ crashes during GPU driver initialization with suspected attempt to execute unsupported instruction by Athlon64 X2 TK-57
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87071

--- Comment #2 from Jonathan Wakely ---
(EE) Illegal instruction at address 0x72f2c8ea

I don't see how this can possibly be a libstdc++ problem, since libstdc++ doesn't produce any CPU instructions, illegal or not.

As I already said to you, there's nothing we can do with this info.
[Bug libstdc++/87071] libstdc++ crashes during GPU driver initialization with suspected attempt to execute unsupported instruction by Athlon64 X2 TK-57
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87071

--- Comment #1 from Sergey Kondakov ---
Created attachment 44577
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44577&action=edit
Asus_F3Ke.dmesg

Verbose dmesg from affected machine.