https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102772

--- Comment #29 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Ok, that looks like a linker bug:
        movaps  %xmm7, thr.1@ntpoff(%ebx)
...
        movaps  %xmm7, thr.1@ntpoff+16(%ebx)
...
        movl    %eax, thr.1@ntpoff+32(%ebx)
in assembly correctly turned into:
  2a:   0f 29 bb 00 00 00 00    movaps %xmm7,0x0(%ebx)
                        2d: R_386_TLS_LE        thr.1
...
  36:   0f 29 bb 10 00 00 00    movaps %xmm7,0x10(%ebx)
                        39: R_386_TLS_LE        thr.1
...
  40:   89 83 20 00 00 00       mov    %eax,0x20(%ebx)
                        42: R_386_TLS_LE        thr.1
  [ 5] .tbss             NOBITS          00000000 000570 000024 00 WAT  0   0
16
But linker turns that into:
 80517ba:       0f 29 bb d8 ff ff ff    movaps %xmm7,-0x28(%ebx)
...
 80517c6:       0f 29 bb e8 ff ff ff    movaps %xmm7,-0x18(%ebx)
...
 80517d0:       89 83 f8 ff ff ff       mov    %eax,-0x8(%ebx)
Even when dynamic linker properly ensures the %gs:0 base is 16-byte aligned
because PT_TLS segment is:
  TLS            0x00b6f0 0x0806b6f0 0x00000000 0x00000 0x00024 RW  0x10
the R_386_TLS_LE immediates are off, they don't take into account the needed
alignment.

We should probably use a better small testcase:
struct S { char buf[0x24]; };
__thread struct S s __attribute__((aligned (16)));
__attribute__((noipa)) struct S *foo (void) { return &s; }
int
main ()
{
  #pragma omp parallel
  __builtin_printf ("%p\n", foo ());
  return 0;
}
because with the aligned (16) attribute on struct S we've increased its size to
0x30 that way.

https://akkadia.org/drepper/tls.pdf
says that
tsoffset_1 = round(tlssize_1, align_1)
tlsoffset_m+1 = round(tlsoffset_m + tlssize_m+1, align_m+1)
and as tlssize_1 (of the binary) is 36 and align_1 is 16, tlsoffset_1 is 48
and so R_386_TLS_LE thr.1 should resolve to -48 + 0 (thr.1 is at offset 0 of
the segment).
But it seems the linker instead uses -40, i.e. just the size rounded up to 8
byte alignment boundary rather than 16.

We could work around it on GCC side by padding up any TLS vars up to their
alignment, but that would be horribly expensive, potentially wasting a lot of
the precious TLS memory.

Reply via email to