[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #24 from Julian Sikorski ---
I was able to reproduce the problem with the following make call, without the need to use the RPM tooling:

make -j16 VERBOSE=1 NOWERROR=1 SYMBOLS=1 SYMLEVEL=1 OPTIMIZE=2 \
  OPT_FLAGS="-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Werror=implicit-function-declaration -Werror=implicit-int -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer" \
  LDOPTS="-Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1"

--
You are receiving this mail because:
You are on the CC list for the bug.
[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #23 from Julian Sikorski ---
I was able to complete the git bisect in the meantime; it also points to 15b4f66b0a9a3be6caf1898d22a13c39e662006f as the first bad commit.

Interestingly enough, I was not able to reproduce the issue with a simple make from mame's git snapshot, which indicates that the erroneous behaviour is triggered by the Fedora RPM packaging options and/or the compiler and/or linker flags.
[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

Szabolcs Nagy changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2023-10-05
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #22 from Szabolcs Nagy ---
i can confirm the issue, not yet sure what's going on.
[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #21 from Julian Sikorski ---
(In reply to Szabolcs Nagy from comment #20)
> seems they made the build use lld, so now i have to undo that.
> will look at it tomorrow

Sorry about that, I should have mentioned it here. You do not need to do git revert. You can do

$ fedpkg switch-branch f39
$ fedpkg srpm
$ mock -r fedora-rawhide-aarch64 mame-0.259-1.fc39.src.rpm
[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #20 from Szabolcs Nagy ---
(In reply to Szabolcs Nagy from comment #18)
> i tried the specified steps and the bug is not reproducible.

(In reply to Nick Clifton from comment #19)
> (In reply to Szabolcs Nagy from comment #18)
> > i tried the specified steps and the bug is not reproducible.
>
> Oh dear - that implies that the problem might be specific to the Fedora
> binutils.
>
> Thanks for having a look at the problem. I guess it is up to the Fedora
> folks now.

seems they made the build use lld, so now i have to undo that.
will look at it tomorrow
[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #19 from Nick Clifton ---
(In reply to Szabolcs Nagy from comment #18)
> i tried the specified steps and the bug is not reproducible.

Oh dear - that implies that the problem might be specific to the Fedora binutils.

Thanks for having a look at the problem. I guess it is up to the Fedora folks now.
[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #18 from Szabolcs Nagy ---
i tried the specified steps and the bug is not reproducible.
[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #17 from Julian Sikorski ---
(In reply to Nick Clifton from comment #16)
> Created attachment 15152 [details]
> Proposed patch
>
> (In reply to Julian Sikorski from comment #13)
> > Thanks! The patch does not revert cleanly unfortunately and the changes are
> > complicated enough that I do not feel comfortable running git mergetool.
> > Would someone please be so kind and provide a patch I can apply against
> > 2.41?
>
> Please try this patch.

Thanks! With this patch applied the linked mame binary no longer gets stuck.
[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #16 from Nick Clifton ---
Created attachment 15152
  --> https://sourceware.org/bugzilla/attachment.cgi?id=15152&action=edit
Proposed patch

(In reply to Julian Sikorski from comment #13)
> Thanks! The patch does not revert cleanly unfortunately and the changes are
> complicated enough that I do not feel comfortable running git mergetool.
> Would someone please be so kind and provide a patch I can apply against 2.41?

Please try this patch.
[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

Sam James changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nsz at gcc dot gnu.org

--- Comment #15 from Sam James ---
./configure --disable-werror
[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #14 from Julian Sikorski ---
(In reply to Julian Sikorski from comment #10)
> Done: https://bugzilla.redhat.com/show_bug.cgi?id=2241902
>
> I managed to set up an aarch64 rawhide instance on Oracle Cloud but I cannot
> connect to it yet :( If I manage to get it working, I can see if I can set
> up a bisect mentioned in comment #2.

Is there a straightforward way of disabling -Werror for non-releases? I got my cloud instance running but I cannot build a mid-release snapshot due to this.
[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #13 from Julian Sikorski ---
Thanks! The patch does not revert cleanly unfortunately and the changes are complicated enough that I do not feel comfortable running git mergetool. Would someone please be so kind and provide a patch I can apply against 2.41?
[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

Carlos O'Donell changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |carlos at redhat dot com

--- Comment #12 from Carlos O'Donell ---
I suggest starting by reverting the BTI stub change:

commit 15b4f66b0a9a3be6caf1898d22a13c39e662006f
Author: Szabolcs Nagy
Date:   Wed Jan 18 12:56:46 2023 +

    bfd: aarch64: Fix stubs that may break BTI

    PR30076

    Insert two stubs in a BTI enabled binary when fixing long calls: The
    first is near the call site and uses an indirect jump like before,
    but it targets the second stub that is near the call target site and
    uses a direct jump.

    This is needed when a single stub breaks BTI compatibility.

    The stub layout is kept fixed between sizing and building the stubs,
    so the location of the second stub is known at build time, this may
    introduce padding between stubs when those are relaxed. Stub layout
    with BTI disabled is unchanged.

... and see if that fixes the issue.
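To make the commit message above easier to follow, here is a toy model of the call paths it describes. This is only a sketch and not ld's actual code: the function names are made up for illustration, and whether a single stub would break BTI compatibility is passed in as a flag because the commit message does not say how the linker decides that. The one hard fact used is that an AArch64 B/BL instruction reaches +/-128 MiB.

```python
# Toy model of the stub layout described in the commit message.
# All names here are hypothetical; this does not mirror bfd's data structures.

B_RANGE = 128 * 1024 * 1024  # AArch64 B/BL reach: +/-128 MiB from the branch


def in_direct_range(src, dst):
    """True if a B/BL instruction at address src can reach dst directly."""
    offset = dst - src
    return -B_RANGE <= offset < B_RANGE


def plan_call(call_site, target, single_stub_breaks_bti):
    """Return the hops a call takes in this toy model, in order."""
    if in_direct_range(call_site, target):
        return ["direct branch to target"]
    if not single_stub_breaks_bti:
        # Behaviour "like before": one stub with an indirect jump.
        return ["stub near call site: indirect jump to target"]
    # New behaviour: the indirect jump lands on a second stub, which is
    # close enough to the target to finish with a direct jump.
    return [
        "stub near call site: indirect jump to second stub",
        "stub near target: direct jump to target",
    ]
```

For example, plan_call(0, 0x10000000, True) yields two hops, because the target is 256 MiB away and the flag says a single stub is not acceptable.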
[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #11 from Julian Sikorski ---
With a non-mock, fedpkg compile build on Fedora rawhide aarch64 running on OCI the backtrace is slightly different:

#0 0xb5bd4fb0 in ___ZN3emu6detail16device_registrar15register_deviceERNS0_21device_type_impl_baseE_bti_veneer ()
#1 0xaec52368 in device_type_impl_base () at ../../../../../src/emu/device.h:240
#2 device_type_impl () at ../../../../../src/emu/device.h:283
#3 __static_initialization_and_destruction_0 () at ../../../../../src/mame/acorn/z88_impexp.cpp:34
#4 _GLOBAL__sub_I_Z88_IMPEXP () at ../../../../../src/mame/acorn/z88_impexp.cpp:278
#5 0xf5870b2c in call_init (env=, argv=0xf258, argc=2) at ../csu/libc-start.c:145
#6 __libc_start_main_impl (main=0xaeedadc0 , argc=2, argv=0xf258, init=, fini=, rtld_fini=, stack_end=) at ../csu/libc-start.c:347
#7 0xaef01570 in _start
[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #10 from Julian Sikorski ---
Done: https://bugzilla.redhat.com/show_bug.cgi?id=2241902

I managed to set up an aarch64 rawhide instance on Oracle Cloud but I cannot connect to it yet :( If I manage to get it working, I can see if I can set up a bisect mentioned in comment #2.
[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #9 from Nick Clifton ---
(In reply to Julian Sikorski from comment #8)
> How would I bring in help from glibc folks? Should I just reassign the bug
> to glibc?

Yes/No. Since you are using Fedora and there is a possibility that this problem is specific to that distribution, I think that the best thing to do would be to file a new ticket with the Fedora bug tracking system, assigned to glibc for now. They can always change it to being a binutils ticket once they can show what the linker is doing wrong.
[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #8 from Julian Sikorski ---
(In reply to Nick Clifton from comment #7)
> (In reply to Julian Sikorski from comment #5)
> > (In reply to Nick Clifton from comment #4)
> > > You may find it useful to compare a broken-linked-with-ld.bfd binary
> > > with a working-linked-with-lld binary. In particular the contents
> > > of whatever init sections they have, and the ordering of function
> > > pointers therein.
> >
> > I am downloading the broken binary from the test system now. How can I do
> > the above?
>
> Well first you can compare the disassembly of the .init section to make sure
> that it is the same in both binaries:
>
>   objdump -D -j .init mame
>
> Next I was going to suggest that you check the contents of the .init_array
> section but it appears to be all zeros, which is a bit strange.
>
> You could be paranoid and check that the hardware property notes are the
> same on both binaries:
>
>   readelf -n -W mame | grep -e .note.gnu.property -A 4
>
> But I doubt if that show any discrepancies.
>
> But I suspect that the only real way you are going to get some traction on
> this problem is if you bring in the glibc folks. Maybe file a bug report
> telling them that mame is hanging during initialization and that you need
> their help finding out where things have gone wrong ? Let them know about
> the new version of binutils of course, but do ask them if they can track
> down exactly what the linker has done wrong in order to cause the init code
> to hang.

How would I bring in help from glibc folks? Should I just reassign the bug to glibc?
[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #7 from Nick Clifton ---
(In reply to Julian Sikorski from comment #5)
> (In reply to Nick Clifton from comment #4)
> > You may find it useful to compare a broken-linked-with-ld.bfd binary
> > with a working-linked-with-lld binary. In particular the contents
> > of whatever init sections they have, and the ordering of function
> > pointers therein.
>
> I am downloading the broken binary from the test system now. How can I do
> the above?

Well first you can compare the disassembly of the .init section to make sure that it is the same in both binaries:

  objdump -D -j .init mame

Next I was going to suggest that you check the contents of the .init_array section but it appears to be all zeros, which is a bit strange.

You could be paranoid and check that the hardware property notes are the same on both binaries:

  readelf -n -W mame | grep -e .note.gnu.property -A 4

But I doubt that will show any discrepancies.

But I suspect that the only real way you are going to get some traction on this problem is if you bring in the glibc folks. Maybe file a bug report telling them that mame is hanging during initialization and that you need their help finding out where things have gone wrong? Let them know about the new version of binutils of course, but do ask them if they can track down exactly what the linker has done wrong in order to cause the init code to hang.
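The comparison suggested above can be automated along these lines. This is a rough sketch, not a tested workflow from the bug: it runs the same objdump and readelf invocations quoted in the comment on two binaries and prints a unified diff of each. The helper names and the file names mame-lld and mame-bfd are placeholders. Note that addresses in the disassembly may differ between linkers even when the code is equivalent, so a non-empty diff is a starting point rather than proof of a problem.

```python
import difflib
import subprocess

# Commands taken from the comment above; the binary path is appended.
CHECKS = {
    ".init disassembly": ["objdump", "-D", "-j", ".init"],
    "ELF notes": ["readelf", "-n", "-W"],
}


def tool_output(cmd):
    """Run cmd and return its stdout as a list of lines."""
    done = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return done.stdout.splitlines()


def drop_file_header(lines):
    """Drop objdump's 'file format' line, which names the input file."""
    return [line for line in lines if "file format" not in line]


def diff_lines(good, bad, good_name="good", bad_name="bad"):
    """Unified diff of two line lists; an empty list means no difference."""
    return list(difflib.unified_diff(good, bad, good_name, bad_name, lineterm=""))


def compare(good_path, bad_path):
    """Return {check name: diff lines} for the two binaries."""
    report = {}
    for label, cmd in CHECKS.items():
        good = drop_file_header(tool_output(cmd + [good_path]))
        bad = drop_file_header(tool_output(cmd + [bad_path]))
        report[label] = diff_lines(good, bad, good_path, bad_path)
    return report


# Hypothetical usage, with placeholder file names:
#   for label, diff in compare("mame-lld", "mame-bfd").items():
#       print(label, "identical" if not diff else "\n".join(diff))
```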
[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #6 from Julian Sikorski ---
(In reply to Sam James from comment #2)
> Could you try give some instructions to reproduce manually from source,
> without using Fedora and Fedora specific tooling?
>
> Bisecting binutils using 'git bisect run' + timeout would be helpful too if
> you can.

This might work, however I cannot test it unfortunately:

1. git clone https://github.com/mamedev/mame.git
2. Install deps according to the distro's mechanism of choice
3. cd mame
4. make -O -j2 V=1 VERBOSE=1 NOWERROR=1 OPTIMIZE=2 PYTHON_EXECUTABLE=python3 QT_HOME=/usr/lib64/qt6 VERBOSE=1 USE_SYSTEM_LIB_ASIO=1 USE_SYSTEM_LIB_EXPAT=1 USE_SYSTEM_LIB_FLAC=1 USE_SYSTEM_LIB_GLM=1 USE_SYSTEM_LIB_JPEG=1 USE_SYSTEM_LIB_PORTAUDIO=1 USE_SYSTEM_LIB_PORTMIDI=1 USE_SYSTEM_LIB_PUGIXML=1 USE_SYSTEM_LIB_RAPIDJSON=1 USE_SYSTEM_LIB_SQLITE3=1 USE_SYSTEM_LIB_UTF8PROC=1 USE_SYSTEM_LIB_ZLIB=1 'SDL_INI_PATH=/etc/mame;' TOOLS=1 'OPT_FLAGS=-O2 -fexceptions -g1 -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Werror=implicit-function-declaration -Werror=implicit-int -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer' 'LDOPTS=-Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1 -specs=/usr/lib/rpm/redhat/redhat-package-notes'
5. ./mame -validate
[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #5 from Julian Sikorski ---
(In reply to Nick Clifton from comment #4)
> You may find it useful to compare a broken-linked-with-ld.bfd binary
> with a working-linked-with-lld binary. In particular the contents
> of whatever init sections they have, and the ordering of function
> pointers therein.

I am downloading the broken binary from the test system now. How can I do the above?
[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

Nick Clifton changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nickc at redhat dot com

--- Comment #4 from Nick Clifton ---
(In reply to Julian Sikorski from comment #1)
> Validation gets stuck on the following function:
> #0 0xb5bddb08 in ___ZN4bgfx12VertexLayoutC1Ev_bti_veneer ()

Hmm, this suggests that maybe there is a problem with the BTI hardware enablement. You may need to bring in help from the glibc folks to find out what is going wrong.

> #1 0xf5870b2c in call_init (env=,

And this suggests that the issue might also relate to the ordering of function calls during the init sequence. Maybe that BTI call above is being invoked before a get-ready-for-BTI call is made?

You may find it useful to compare a broken-linked-with-ld.bfd binary with a working-linked-with-lld binary. In particular the contents of whatever init sections they have, and the ordering of function pointers therein.
[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #3 from Julian Sikorski ---
I will try, however my problem is that the issue only appears to happen with a full (as opposed to single-driver) build. It takes close to 3 hours on the only aarch64 machine I have access to so far, which makes experimentation somewhat challenging.
[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

Sam James changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sam at gentoo dot org
           See Also|                            |https://github.com/mamedev/mame/issues/11587

--- Comment #2 from Sam James ---
Could you try to give some instructions to reproduce manually from source, without using Fedora and Fedora specific tooling?

Bisecting binutils using 'git bisect run' + timeout would be helpful too if you can.
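As an illustration of the 'git bisect run' + timeout idea above, here is a minimal sketch of a predicate script that treats a hang as a bad revision. It relies on the documented exit-code convention of git bisect run (0 means good, 125 means skip, other codes from 1 to 127 mean bad). The script name, the timeout value and the command line are assumptions, and a real run would first have to rebuild ld at each revision and relink mame with it, which is not shown here.

```python
#!/usr/bin/env python3
"""Sketch of a `git bisect run` predicate: a hang counts as a bad revision."""
import subprocess
import sys

GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by `git bisect run`


def classify(test_cmd, timeout_s):
    """Run test_cmd and map the outcome to a bisect exit code."""
    try:
        done = subprocess.run(
            test_cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing lingers.
        return BAD
    except OSError:
        # The binary could not be started, e.g. because the build failed.
        return SKIP
    return GOOD if done.returncode == 0 else BAD


def main(argv):
    """argv[1] is the timeout in seconds, the rest is the command to run."""
    return classify(argv[2:], float(argv[1]))


# Hypothetical invocation (script name and timeout are placeholders):
#   git bisect run python3 bisect_hang.py 600 /path/to/mame -validate
if __name__ == "__main__" and len(sys.argv) > 2:
    sys.exit(main(sys.argv))
```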
[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #1 from Julian Sikorski ---
Validation gets stuck on the following function:

#0 0xb5bddb08 in ___ZN4bgfx12VertexLayoutC1Ev_bti_veneer ()
#1 0xf5870b2c in call_init (env=, argv=0xf388, argc=1) at ../csu/libc-start.c:145
#2 __libc_start_main_impl (main=0xaeedadc0 , argc=1, argv=0xf388, init=, fini=, rtld_fini=, stack_end=) at ../csu/libc-start.c:347
#3 0xaef01570 in _start ()