Re: aarch64 TLS optimizations?

2019-05-20 Thread Tom Horsley
On Mon, 20 May 2019 17:07:59 +0000
Szabolcs Nagy wrote:

> and the R_*_TLSIE_* relocs are for .LANCHOR3 + 0,
> so there will be one GOT entry for the 3 objects
> and you should see

That may indeed explain what is going on. I'll
have to take a closer look at the specific
ubuntu libraries I have installed and see if I
detect something similar. Thanks.


Re: aarch64 TLS optimizations?

2019-05-20 Thread Szabolcs Nagy
On 20/05/2019 16:59, Tom Horsley wrote:
> On Mon, 20 May 2019 15:43:53 +0000
> Szabolcs Nagy wrote:
> 
>> you can verify that 0x152000 + 3608 == 0x152e18 is
>> indeed a GOT entry (falls into .got) and there is a
>>
>> 00152e18 R_AARCH64_TLS_TPREL64  *ABS*+0x0010
> 
> There are a couple of other TLS variables in malloc, and I
> suspect this is one of them. Where it is actually looking
> at tcache_shutting_down (verified with debug info and disassembly),
> it is simply using the tpidr_el0 value still lying around
> in the register from the 1st TLS reference and loading
> tcache_shutting_down from an offset which appears for all the
> world to simply be hard coded, no GOT reference involved.
> 
> I suppose at some point I'll be forced to understand how to build
> glibc from the ubuntu source package so I can see exactly
> what options and ifdefs are used and check the relocations in
> the malloc.o file from before it is incorporated with libc.so

in my build of malloc.os in glibc in the symtab i see

84: 0000000000000000     0 TLS     LOCAL  DEFAULT   10 .LANCHOR3
85: 0000000000000000     8 TLS     LOCAL  DEFAULT   10 thread_arena
86: 0000000000000008     8 TLS     LOCAL  DEFAULT   10 tcache
87: 0000000000000010     1 TLS     LOCAL  DEFAULT   10 tcache_shutting_down

and the R_*_TLSIE_* relocs are for .LANCHOR3 + 0,
so there will be one GOT entry for the 3 objects
and you should see

tp + got_value + (0 or 8 or 16)

address computation to access the 3 objects.

e.g. in __malloc_arena_thread_freeres i see

4e04:   d53bd056    mrs   x22, tpidr_el0
4e08:   90000015    adrp  x21, 0 <_dl_tunable_set_mmap_threshold>
        4e08: R_AARCH64_TLSIE_ADR_GOTTPREL_PAGE21   .LANCHOR3
4e0c:   f94002b5    ldr   x21, [x21]
        4e0c: R_AARCH64_TLSIE_LD64_GOTTPREL_LO12_NC .LANCHOR3
4e10:   a90153f3    stp   x19, x20, [sp, #16]
4e14:   8b1502c0    add   x0, x22, x21    // x0 = tp + got_value
4e18:   f9400414    ldr   x20, [x0, #8]   // read from tcache
4e1c:   f9001bf7    str   x23, [sp, #48]
4e20:   b4000234    cbz   x20, 4e64 <__malloc_arena_thread_freeres+0x6c>
4e24:   52800021    mov   w1, #0x1        // #1
4e28:   91010293    add   x19, x20, #0x40
4e2c:   91090297    add   x23, x20, #0x240
4e30:   f900041f    str   xzr, [x0, #8]   // write to tcache
4e34:   39004001    strb  w1, [x0, #16]   // write to tcache_shutting_down
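
The access pattern in the listing above can be modeled in a few lines. This is only a sketch of the arithmetic, not glibc code: the offsets are the ones from the symtab shown earlier, while the `tp` and `got_value` numbers are made up for illustration.

```python
# Offsets of the three TLS objects relative to .LANCHOR3,
# taken from the malloc.os symbol table quoted above.
ANCHOR_OFFSETS = {
    "thread_arena": 0x0,
    "tcache": 0x8,
    "tcache_shutting_down": 0x10,
}


def tls_address(tp, got_value, name):
    """Return tp + got_value + offset, as the add/ldr/strb sequence computes it."""
    base = tp + got_value               # add x0, x22, x21
    return base + ANCHOR_OFFSETS[name]  # ldr/str/strb [x0, #offset]


# Hypothetical values: a thread pointer and the tp-relative offset
# that the dynamic linker stored in the single shared GOT entry.
tp = 0x7F80000000
got_value = 0x10

for name in ANCHOR_OFFSETS:
    print(f"{name:22s} {tls_address(tp, got_value, name):#x}")
```

Only one GOT load is needed for all three objects; the per-object part of the address is an immediate in the load/store instruction.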

i doubt ubuntu changed this, but if the offset is
a fixed const in the binary that means they moved
that variable into the glibc internal pthread struct
(which is at a fixed offset from tp).



Re: aarch64 TLS optimizations?

2019-05-20 Thread Tom Horsley
On Mon, 20 May 2019 15:43:53 +0000
Szabolcs Nagy wrote:

> you can verify that 0x152000 + 3608 == 0x152e18 is
> indeed a GOT entry (falls into .got) and there is a
> 
> 00152e18 R_AARCH64_TLS_TPREL64  *ABS*+0x0010

There are a couple of other TLS variables in malloc, and I
suspect this is one of them. Where it is actually looking
at tcache_shutting_down (verified with debug info and disassembly),
it is simply using the tpidr_el0 value still lying around
in the register from the 1st TLS reference and loading
tcache_shutting_down from an offset which appears for all the
world to simply be hard coded, no GOT reference involved.

I suppose at some point I'll be forced to understand how to build
glibc from the ubuntu source package so I can see exactly
what options and ifdefs are used and check the relocations in
the malloc.o file from before it is incorporated with libc.so


Re: aarch64 TLS optimizations?

2019-05-20 Thread Szabolcs Nagy
On 17/05/2019 14:51, Tom Horsley wrote:
> I'm trying (for reasons too complex to go into) to
> locate the TLS offset of the tcache_shutting_down
> variable from malloc in the ubuntu provided
> glibc on aarch64 ubuntu 18.04.
> 
> Various "normal" TLS variables appear to operate
> much like x86_64 with a GOT table entry where the
> TLS offset of the variable gets stashed.

this is more of a glibc question than a gcc one
(i.e. libc-help list would be better).

tls in glibc uses the initial-exec tls access model,
(tls object is at a fixed offset from tp across threads),
that requires a GOT entry for the offset which is set
up via a R_*_TPREL dynamic reloc at startup time.

(note: if a symbol is internal to the module its TPREL
reloc is not tied to a symbol, it only has an addend
for the offset within the module)
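
A simplified model of that startup step, as a sketch only: the real work is done by the dynamic linker in C, and the function names, the `module_tls_offset` value and the `tp` value here are hypothetical. It assumes the stored value is the module's TLS block offset from tp plus the symbol value (zero for a symbol-less reloc) plus the addend.

```python
def apply_tprel64(got, entry, module_tls_offset, addend, sym_value=0):
    """Store the tp-relative offset of a TLS object into its GOT entry."""
    got[entry] = module_tls_offset + sym_value + addend


def initial_exec_access(tp, got, entry):
    """Address the code forms after loading the GOT entry: tp + got_value."""
    return tp + got[entry]


got = {}
# A reloc of the "*ABS*+addend" kind: no symbol, only an addend that
# locates the object inside the module's own TLS block.
apply_tprel64(got, entry=0x152E18, module_tls_offset=0x10, addend=0x10)

# Every thread reuses the same GOT value; only tp differs per thread.
print(hex(initial_exec_access(0x1000, got, 0x152E18)))
```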

> But in the ubuntu glibc there is no GOT entry for
> that variable, and disassembly of the code shows
> that it seems to "just know" the offset to use.

i see adrp+ldr sequences that access GOT entries.

e.g. in the objdump of libc.so.6:

000771d0 <__libc_malloc@@GLIBC_2.17>:
...
   77400:   f00006c0    adrp  x0, 152000
   77404:   f9470c00    ldr   x0, [x0, #3608]
   77408:   d53bd041    mrs   x1, tpidr_el0

you can verify that 0x152000 + 3608 == 0x152e18 is
indeed a GOT entry (falls into .got) and there is a

00152e18 R_AARCH64_TLS_TPREL64  *ABS*+0x0010

dynamic relocation for that entry as expected.
(but i don't know which symbol this entry is for,
only that the symbol must be a local tls sym)
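
The GOT address check can also be done from the instruction words themselves. The sketch below assumes the adrp word is 0xf00006c0 (the encoding that yields page 0x152000 from pc 0x77400) and decodes only the two instruction forms used here; the helper names are made up for this example.

```python
def decode_adrp(insn, pc):
    """Return the 4 KiB page address an AArch64 ADRP instruction produces."""
    assert insn & 0x9F000000 == 0x90000000, "not an ADRP encoding"
    immlo = (insn >> 29) & 0x3
    immhi = (insn >> 5) & 0x7FFFF
    imm = (immhi << 2) | immlo
    if imm & (1 << 20):  # sign-extend the 21-bit immediate
        imm -= 1 << 21
    return (pc & ~0xFFF) + (imm << 12)


def decode_ldr64_uimm(insn):
    """Return the byte offset of a 64-bit LDR with unsigned immediate."""
    assert insn & 0xFFC00000 == 0xF9400000, "not LDR Xt, [Xn, #imm]"
    return ((insn >> 10) & 0xFFF) * 8  # imm12 is scaled by the access size


page = decode_adrp(0xF00006C0, 0x77400)
got_entry = page + decode_ldr64_uimm(0xF9470C00)
print(hex(page), hex(got_entry))
```

The resulting address is the one to look up in the dynamic relocation list (for example in `objdump -R` output) to find the matching TPREL64 entry.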

> Is there some kind of magic TLS optimization that
> can happen for certain variables on aarch64? I'm trying
> to understand how it could know the offset like
> it appears to do in the code.

there is no magic.


Re: aarch64 TLS optimizations?

2019-05-17 Thread Andrew Haley
On 5/17/19 2:51 PM, Tom Horsley wrote:
> I'm trying (for reasons too complex to go into) to
> locate the TLS offset of the tcache_shutting_down
> variable from malloc in the ubuntu provided
> glibc on aarch64 ubuntu 18.04.
> 
> Various "normal" TLS variables appear to operate
> much like x86_64 with a GOT table entry where the
> TLS offset of the variable gets stashed.
> 
> But in the ubuntu glibc there is no GOT entry for
> that variable, and disassembly of the code shows
> that it seems to "just know" the offset to use.
> 
> Is there some kind of magic TLS optimization that
> can happen for certain variables on aarch64? I'm trying
> to understand how it could know the offset like
> it appears to do in the code.

https://www.fsfla.org/~lxoliva/writeups/TLS/paper-lk2006.pdf



-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. 
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671


aarch64 TLS optimizations?

2019-05-17 Thread Tom Horsley
I'm trying (for reasons too complex to go into) to
locate the TLS offset of the tcache_shutting_down
variable from malloc in the ubuntu provided
glibc on aarch64 ubuntu 18.04.

Various "normal" TLS variables appear to operate
much like x86_64 with a GOT table entry where the
TLS offset of the variable gets stashed.

But in the ubuntu glibc there is no GOT entry for
that variable, and disassembly of the code shows
that it seems to "just know" the offset to use.

Is there some kind of magic TLS optimization that
can happen for certain variables on aarch64? I'm trying
to understand how it could know the offset like
it appears to do in the code.