[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-06 Thread belegdol at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #24 from Julian Sikorski  ---
I was able to reproduce the problem with the following make call, without the
need to use the RPM tooling:

make -j16 VERBOSE=1 NOWERROR=1 SYMBOLS=1 SYMLEVEL=1 OPTIMIZE=2 OPT_FLAGS="-O2
-fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang
-Werror=format-security -Werror=implicit-function-declaration
-Werror=implicit-int -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3
-specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong
-specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -mbranch-protection=standard
-fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer
-mno-omit-leaf-frame-pointer" LDOPTS="-Wl,-z,relro -Wl,--as-needed  -Wl,-z,now
-specs=/usr/lib/rpm/redhat/redhat-hardened-ld
-specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -Wl,--build-id=sha1"

-- 
You are receiving this mail because:
You are on the CC list for the bug.


[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-06 Thread belegdol at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #23 from Julian Sikorski  ---
I was able to complete the git bisect in the meantime; it also points to
15b4f66b0a9a3be6caf1898d22a13c39e662006f being the first bad commit.
Interestingly enough, I was not able to reproduce the issue with a simple make
from mame's git snapshot, which indicates that the erroneous behaviour is being
triggered by Fedora RPM packaging options and/or compiler and/or linker flags.



[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-05 Thread nsz at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

Szabolcs Nagy  changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2023-10-05
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #22 from Szabolcs Nagy  ---
i can confirm the issue, not yet sure what's going on.



[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-04 Thread belegdol at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #21 from Julian Sikorski  ---
(In reply to Szabolcs Nagy from comment #20)
> seems they made the build use lld, so now i have to undo that.
> will look at it tomorrow

Sorry about that, I should have mentioned it here. You do not need to do git
revert. You can do

$ fedpkg switch-branch f39
$ fedpkg srpm
$ mock -r fedora-rawhide-aarch64 mame-0.259-1.fc39.src.rpm



[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-04 Thread nsz at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #20 from Szabolcs Nagy  ---
(In reply to Szabolcs Nagy from comment #18)
> i tried the specified steps and the bug is not reproducible.

(In reply to Nick Clifton from comment #19)
> (In reply to Szabolcs Nagy from comment #18)
> > i tried the specified steps and the bug is not reproducible.
> 
> Oh dear - that implies that the problem might be specific to the Fedora
> binutils.
> 
> Thanks for having a look at the problem.  I guess it is up to the Fedora
> folks now.

seems they made the build use lld, so now i have to undo that.
will look at it tomorrow



[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-04 Thread nickc at redhat dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #19 from Nick Clifton  ---
(In reply to Szabolcs Nagy from comment #18)
> i tried the specified steps and the bug is not reproducible.

Oh dear - that implies that the problem might be specific to the Fedora binutils.

Thanks for having a look at the problem.  I guess it is up to the Fedora folks
now.



[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-04 Thread nsz at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #18 from Szabolcs Nagy  ---
i tried the specified steps and the bug is not reproducible.



[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-03 Thread belegdol at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #17 from Julian Sikorski  ---
(In reply to Nick Clifton from comment #16)
> Created attachment 15152 [details]
> Proposed patch
> 
> (In reply to Julian Sikorski from comment #13)
> > Thanks! The patch does not revert cleanly unfortunately and the changes are
> > complicated enough that I do not feel comfortable running git mergetool.
> > Would someone please be so kind and provide a patch I can apply against 
> > 2.41?
> 
> Please try this patch.

Thanks! With this patch applied the linked mame binary no longer gets stuck.



[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-03 Thread nickc at redhat dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #16 from Nick Clifton  ---
Created attachment 15152
  --> https://sourceware.org/bugzilla/attachment.cgi?id=15152&action=edit
Proposed patch

(In reply to Julian Sikorski from comment #13)
> Thanks! The patch does not revert cleanly unfortunately and the changes are
> complicated enough that I do not feel comfortable running git mergetool.
> Would someone please be so kind and provide a patch I can apply against 2.41?

Please try this patch.



[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-03 Thread sam at gentoo dot org
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

Sam James  changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nsz at gcc dot gnu.org

--- Comment #15 from Sam James  ---
./configure --disable-werror



[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-03 Thread belegdol at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #14 from Julian Sikorski  ---
(In reply to Julian Sikorski from comment #10)
> Done: https://bugzilla.redhat.com/show_bug.cgi?id=2241902
> 
> I managed to set up an aarch64 rawhide instance on Oracle Cloud but I cannot
> connect to it yet :( If I manage to get it working, I can see if I can set
> up a bisect mentioned in comment #2.

Is there a straightforward way of disabling -Werror for non-releases? I got my
cloud instance running but I cannot build a mid-release snapshot due to this.



[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-03 Thread belegdol at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #13 from Julian Sikorski  ---
Thanks! The patch does not revert cleanly unfortunately and the changes are
complicated enough that I do not feel comfortable running git mergetool. Would
someone please be so kind and provide a patch I can apply against 2.41?



[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-03 Thread carlos at redhat dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

Carlos O'Donell  changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |carlos at redhat dot com

--- Comment #12 from Carlos O'Donell  ---
I suggest starting by reverting the BTI stub change:

commit 15b4f66b0a9a3be6caf1898d22a13c39e662006f
Author: Szabolcs Nagy 
Date:   Wed Jan 18 12:56:46 2023 +

bfd: aarch64: Fix stubs that may break BTI PR30076

Insert two stubs in a BTI enabled binary when fixing long calls: The
first is near the call site and uses an indirect jump like before,
but it targets the second stub that is near the call target site and
uses a direct jump.

This is needed when a single stub breaks BTI compatibility.

The stub layout is kept fixed between sizing and building the stubs,
so the location of the second stub is known at build time, this may
introduce padding between stubs when those are relaxed.  Stub layout
with BTI disabled is unchanged.

... and see if that fixes the issue.



[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-03 Thread belegdol at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #11 from Julian Sikorski  ---
With a non-mock, fedpkg compile build on Fedora rawhide aarch64 running on OCI
the backtrace is slightly different:

#0  0xb5bd4fb0 in ___ZN3emu6detail16device_registrar15register_deviceERNS0_21device_type_impl_baseE_bti_veneer ()
#1  0xaec52368 in device_type_impl_base () at ../../../../../src/emu/device.h:240
#2  device_type_impl () at ../../../../../src/emu/device.h:283
#3  __static_initialization_and_destruction_0 () at ../../../../../src/mame/acorn/z88_impexp.cpp:34
#4  _GLOBAL__sub_I_Z88_IMPEXP () at ../../../../../src/mame/acorn/z88_impexp.cpp:278
#5  0xf5870b2c in call_init (env=, argv=0xf258, argc=2) at ../csu/libc-start.c:145
#6  __libc_start_main_impl (main=0xaeedadc0 , argc=2, argv=0xf258, init=, fini=, rtld_fini=, stack_end=) at ../csu/libc-start.c:347
#7  0xaef01570 in _start



[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-03 Thread belegdol at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #10 from Julian Sikorski  ---
Done: https://bugzilla.redhat.com/show_bug.cgi?id=2241902

I managed to set up an aarch64 rawhide instance on Oracle Cloud but I cannot
connect to it yet :( If I manage to get it working, I can see if I can set up a
bisect mentioned in comment #2.



[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-03 Thread nickc at redhat dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #9 from Nick Clifton  ---
(In reply to Julian Sikorski from comment #8)

> How would I bring in help from glibc folks? Should I just reassign the bug
> to glibc?

Yes/No.  Since you are using Fedora and there is a possibility that this
problem is specific to that distribution, I think that the best thing to do
would be to file a new ticket with the Fedora bug tracking system, assigned to
glibc for now.  They can always change it to being a binutils ticket once they
can show what the linker is doing wrong.



[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-03 Thread belegdol at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #8 from Julian Sikorski  ---
(In reply to Nick Clifton from comment #7)
> (In reply to Julian Sikorski from comment #5)
> > (In reply to Nick Clifton from comment #4)
> > > You may find it useful to compare a broken-linked-with-ld.bfd binary
> > > with a working-linked-with-lld binary.  In particular the contents 
> > > of whatever init sections they have, and the ordering of function
> > > pointers therein.
> > 
> > I am downloading the broken binary from the test system now. How can I do
> > the above?
> 
> Well first you can compare the disassembly of the .init section to make sure
> that it is the same in both binaries:
> 
>   objdump -D -j .init mame
> 
> Next I was going to suggest that you check the contents of the .init_array
> section but it appears to be all zeros, which is a bit strange.
> 
> You could be paranoid and check that the hardware property notes are the
> same on both binaries:
> 
>   readelf -n -W mame | grep -e .note.gnu.property -A 4
> 
> But I doubt that it will show any discrepancies.
> 
> But I suspect that the only real way you are going to get some traction on
> this problem is if you bring in the glibc folks.  Maybe file a bug report
> telling them that mame is hanging during initialization and that you need
> their help finding out where things have gone wrong ?  Let them know about
> the new version of binutils of course, but do ask them if they can track
> down exactly what the linker has done wrong in order to cause the init code
> to hang.

How would I bring in help from glibc folks? Should I just reassign the bug to
glibc?



[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-03 Thread nickc at redhat dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #7 from Nick Clifton  ---
(In reply to Julian Sikorski from comment #5)
> (In reply to Nick Clifton from comment #4)
> > You may find it useful to compare a broken-linked-with-ld.bfd binary
> > with a working-linked-with-lld binary.  In particular the contents 
> > of whatever init sections they have, and the ordering of function
> > pointers therein.
> 
> I am downloading the broken binary from the test system now. How can I do
> the above?

Well first you can compare the disassembly of the .init section to make sure
that it is the same in both binaries:

  objdump -D -j .init mame

Next I was going to suggest that you check the contents of the .init_array
section but it appears to be all zeros, which is a bit strange.

You could be paranoid and check that the hardware property notes are the same
on both binaries:

  readelf -n -W mame | grep -e .note.gnu.property -A 4

But I doubt that it will show any discrepancies.

But I suspect that the only real way you are going to get some traction on this
problem is if you bring in the glibc folks.  Maybe file a bug report telling
them that mame is hanging during initialization and that you need their help
finding out where things have gone wrong ?  Let them know about the new version
of binutils of course, but do ask them if they can track down exactly what the
linker has done wrong in order to cause the init code to hang.
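
The comparison suggested above can be scripted so that the same command is
run on both binaries and the results are diffed. What follows is only a
sketch: the function name compare_tool_output is made up for illustration,
and it assumes that both the ld.bfd-linked and the lld-linked binary are
available locally.

```shell
#!/bin/sh
# compare_tool_output GOOD BAD CMD [ARGS...]
# Hypothetical helper: run CMD ARGS on each binary and print a unified diff
# of the two outputs. Returns 0 if the outputs are identical, 1 if they
# differ, and 2 if the temporary files could not be created.
compare_tool_output() {
    good=$1
    bad=$2
    shift 2
    tmp_good=$(mktemp) && tmp_bad=$(mktemp) || return 2
    # Keep going even if the tool itself reports an error on one binary;
    # its error text then simply shows up in the diff.
    "$@" "$good" >"$tmp_good" 2>&1 || true
    "$@" "$bad" >"$tmp_bad" 2>&1 || true
    status=0
    diff -u "$tmp_good" "$tmp_bad" || status=$?
    rm -f "$tmp_good" "$tmp_bad"
    return "$status"
}
```

With the two binaries saved as, say, mame.lld and mame.bfd (hypothetical
names), the two checks become:

  compare_tool_output mame.lld mame.bfd objdump -D -j .init
  compare_tool_output mame.lld mame.bfd readelf -n -W

Note that objdump prints the file name in its header, so that line will
always show up in the diff, and addresses may legitimately differ between
the two links.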



[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-02 Thread belegdol at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #6 from Julian Sikorski  ---
(In reply to Sam James from comment #2)
> Could you try to give some instructions to reproduce manually from source,
> without using Fedora and Fedora specific tooling?
> 
> Bisecting binutils using 'git bisect run' + timeout would be helpful too if
> you can.

This might work, however I cannot test it unfortunately:
1. git clone https://github.com/mamedev/mame.git
2. Install deps according to the distro's mechanism of choice
3. cd mame
4. make -O -j2 V=1 VERBOSE=1 NOWERROR=1 OPTIMIZE=2 PYTHON_EXECUTABLE=python3
QT_HOME=/usr/lib64/qt6 VERBOSE=1 USE_SYSTEM_LIB_ASIO=1 USE_SYSTEM_LIB_EXPAT=1
USE_SYSTEM_LIB_FLAC=1 USE_SYSTEM_LIB_GLM=1 USE_SYSTEM_LIB_JPEG=1
USE_SYSTEM_LIB_PORTAUDIO=1 USE_SYSTEM_LIB_PORTMIDI=1 USE_SYSTEM_LIB_PUGIXML=1
USE_SYSTEM_LIB_RAPIDJSON=1 USE_SYSTEM_LIB_SQLITE3=1 USE_SYSTEM_LIB_UTF8PROC=1
USE_SYSTEM_LIB_ZLIB=1 'SDL_INI_PATH=/etc/mame;' TOOLS=1 'OPT_FLAGS=-O2
-fexceptions -g1 -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang
-Werror=format-security -Werror=implicit-function-declaration
-Werror=implicit-int -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3
-specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong
-specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -mbranch-protection=standard
-fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer
-mno-omit-leaf-frame-pointer' 'LDOPTS=-Wl,-z,relro -Wl,--as-needed  -Wl,-z,now
-specs=/usr/lib/rpm/redhat/redhat-hardened-ld
-specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -Wl,--build-id=sha1
-specs=/usr/lib/rpm/redhat/redhat-package-notes'
5. ./mame -validate



[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-02 Thread belegdol at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #5 from Julian Sikorski  ---
(In reply to Nick Clifton from comment #4)
> You may find it useful to compare a broken-linked-with-ld.bfd binary
> with a working-linked-with-lld binary.  In particular the contents 
> of whatever init sections they have, and the ordering of function
> pointers therein.

I am downloading the broken binary from the test system now. How can I do the
above?



[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-02 Thread nickc at redhat dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

Nick Clifton  changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nickc at redhat dot com

--- Comment #4 from Nick Clifton  ---
(In reply to Julian Sikorski from comment #1)

> Validation gets stuck on the following function:
> #0  0xb5bddb08 in ___ZN4bgfx12VertexLayoutC1Ev_bti_veneer ()

Hmm, this suggests that maybe there is a problem with the BTI hardware
enablement.  You may need to bring in help from the glibc folks to find
out what is going wrong.

> #1  0xf5870b2c in call_init (env=,

And this suggests that the issue might also relate to the ordering of
function calls during the init sequence.  Maybe that BTI call above
is being invoked before a get-ready-for-BTI call is made ?

You may find it useful to compare a broken-linked-with-ld.bfd binary
with a working-linked-with-lld binary.  In particular the contents 
of whatever init sections they have, and the ordering of function
pointers therein.



[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-02 Thread belegdol at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #3 from Julian Sikorski  ---
I will try, however my problem is that the issue only appears to happen with a
full (as opposed to single-driver) build. It takes close to 3 hours on the only
aarch64 machine I have access to so far, which makes experimentation somewhat
challenging.



[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-02 Thread sam at gentoo dot org
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

Sam James  changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sam at gentoo dot org
           See Also|                            |https://github.com/mamedev/mame/issues/11587

--- Comment #2 from Sam James  ---
Could you try to give some instructions to reproduce manually from source, without
using Fedora and Fedora specific tooling?

Bisecting binutils using 'git bisect run' + timeout would be helpful too if you
can.
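
For the 'git bisect run' + timeout part, the step script could classify each
run roughly as follows. This is a sketch under assumptions: the function name
classify_run is made up for illustration, and it relies on timeout from GNU
coreutils, which exits with status 124 when the time limit is hit.

```shell
#!/bin/sh
# classify_run LIMIT CMD [ARGS...]
# Hypothetical helper for a 'git bisect run' step script. It maps one run of
# the test command onto the exit codes that git bisect run understands:
#   0   = good (the command finished within LIMIT seconds)
#   1   = bad  (the command hung and was killed by timeout)
#   125 = skip (the command failed for some other reason, cannot tell)
classify_run() {
    limit=$1
    shift
    status=0
    timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
    case $status in
        0)   return 0 ;;
        124) return 1 ;;
        *)   return 125 ;;
    esac
}
```

A step script would rebuild ld at the current commit, relink mame with it,
exit 125 if either of those fails, and end with something like

  classify_run 600 ./mame -validate

where the 600 second limit is an arbitrary assumption. The script is then
passed to 'git bisect run' after marking one known good and one known bad
binutils revision.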



[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-01 Thread belegdol at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #1 from Julian Sikorski  ---
Validation gets stuck on the following function:

#0  0xb5bddb08 in ___ZN4bgfx12VertexLayoutC1Ev_bti_veneer ()
#1  0xf5870b2c in call_init (env=, argv=0xf388, argc=1) at ../csu/libc-start.c:145
#2  __libc_start_main_impl (main=0xaeedadc0 , argc=1, argv=0xf388, init=, fini=, rtld_fini=, stack_end=) at ../csu/libc-start.c:347
#3  0xaef01570 in _start ()
