On Fri, 3 Mar 2023 at 17:47, Alexandre Oliva <ol...@adacore.com> wrote:

> On Mar  3, 2023, Jonathan Wakely <jwak...@redhat.com> wrote:
>
> > On Fri, 3 Mar 2023 at 09:33, Jonathan Wakely <jwak...@redhat.com> wrote:
> >> Jakub previously suggested doing this for PR 61841, which was a similar
> >> problem with pthread_create:
> >>
> >> __asm ("" : : "r" (&pthread_create)); would not be optimized away.
> >>
> >>
> >> That would avoid the multiple copies.
>
> Not really.  There would be multiple copies of the code that loads
> pthread_create's address.  And we don't really need the address, a
> single never-executed call would do.  I've explored these possibilities
> a bit, and here's what I've come up with: a private static member
> function that we output in units that instantiate the thread template
> ctor, to pass its address to _M_start_thread.  Since it's never actually
> called, we don't really need the hacks in some of the alternatives I
> left in place, mainly for your enjoyment.
>
> They all work equally well and are just as efficient per instantiation
> at runtime, with slightly different space and loading overheads.  The
> last one, which is the one enabled, is my favorite: it needs only PLT
> relocations, which we'd likely get anyway, and no full-address
> resolution.  The calls are enough to get a relocation with a strong
> reference that pulls the symbol in when linking, yet the call sequences
> are as short as possible because of the type cast.
>

And those expressions aren't ever optimized away as unused?


>
> As a bonus, I put in (in the last minute, after my test runs) something
> to keep even LTO happy: the asm statements to prevent depend from being
> optimized out in _M_start_thread.  In non-LTO, its impact should be
> virtually zero.
>
> How does this look?  (minus the #if 0/#elif 0/.../#else)
>

Looks good, thanks for going the extra mile to check all the alternatives,
and for futureproofing it for LTO.

OK for trunk.
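
For anyone reading this in the archive without the patch to hand, here is
a rough sketch of the shape of the approach described above. It is not the
patch itself: demo_thread, demo_create, deps_never_run, start_thread and
run_demo are all made-up names, demo_create stands in for pthread_create,
and the callable is run inline rather than on a new thread. It assumes GCC
or Clang, since it uses GNU extended asm.

```cpp
#include <cassert>

// Hypothetical stand-in for pthread_create, so the sketch has no
// external dependencies.  The real code would reference the real symbol.
extern "C" int demo_create(void*, const void*, void* (*)(void*), void*)
{ return 0; }

class demo_thread
{
public:
  // Template ctor: it is instantiated in the user's translation unit,
  // so the reference to deps_never_run (and through it to demo_create)
  // is emitted in user code.
  template<typename Callable>
    explicit demo_thread(Callable f)
    { start_thread(f, &demo_thread::deps_never_run); }

  bool ran() const { return ran_; }

private:
  // Never executed.  The cast to void(*)() keeps the call sequence short
  // (no argument setup) while still producing a reference to the symbol.
  static void deps_never_run()
  { reinterpret_cast<void (*)()>(&demo_create)(); }

  template<typename Callable>
    void start_thread(Callable f, void (*depend)())
    {
      // Empty asm taking depend as an input operand: the compiler has to
      // treat the pointer as used, so it cannot drop it as dead.
      __asm__ volatile ("" : : "r" (depend));
      f();          // assumption: run inline instead of spawning a thread
      ran_ = true;
    }

  bool ran_ = false;
};

// Small driver: constructs one demo_thread and reports how many times
// the callable ran.
int run_demo()
{
  int calls = 0;
  demo_thread t([&calls] { ++calls; });
  assert(t.ran());
  return calls;
}
```

Calling run_demo() constructs one demo_thread, which instantiates the
template ctor and so takes the address of deps_never_run in that
translation unit. The function body is never reached at runtime; its only
job is to carry the reference.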
