Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
On Mon, May 12, 2014 at 7:38 PM, Wei Mi w...@google.com wrote: Here is a patch for the test. It contains two changes: 1. For emutls, there will be an explicit call generated at expand pass, and no stack adjustment is needed. So add /* { dg-require-effective-target tls_native } */ in the test. 2. Replace cfi_def_cfa_offset with insn sequence check. Is it ok? No, the test FAILs for 32-bit i386-pc-solaris2.11 with Sun as/ld: FAIL: gcc.target/i386/pr58066.c scan-assembler sub[^\r\n]*8[^\r\n]*sp.*call[^\r\n]*__tls_get_addr.*sub[^\r\n]*8[^\r\n]*sp.*call[^\r\n]*__tls_get_addr The TLS code sequence is different here: subl$8, %esp lealccc1@tlsgd(,%ebx,1), %eax callccc1@tlsgdplt I fear this insn scanning is going to be extremely fragile. Rainer Thanks for trying the testcase. rtl scanning will be slightly better than assembly scanning. So how about this one? This is OK, with a small effective-target addition, as shown below. Thanks, Uros. Index: testsuite/gcc.target/i386/pr58066.c === --- testsuite/gcc.target/i386/pr58066.c (revision 210222) +++ testsuite/gcc.target/i386/pr58066.c (working copy) @@ -1,5 +1,6 @@ /* { dg-do compile } */ -/* { dg-options -fPIC -O2 } */ +/* { dg-require-effective-target tls_native } */ Please also add /* { dg-require-effective-target fpic } */ +/* { dg-options -fPIC -fomit-frame-pointer -O2 -fdump-rtl-final } */ /* Check whether the stack frame starting addresses of tls expanded calls in foo and goo are 16bytes aligned. */ @@ -15,4 +16,6 @@ void* goo() return ccc2; } -/* { dg-final { scan-assembler-times .cfi_def_cfa_offset 16 2 } } */ +/* { dg-final { scan-rtl-dump Function foo.*set\[^\r\n\]*sp\\)\[\r\n\]\[^\r\n\]*plus\[^\r\n\]*sp\\)\[\r\n\]\[^\r\n\]*const_int -8.*UNSPEC_TLS.*Function goo final } } */ +/* { dg-final { scan-rtl-dump Function goo.*set\[^\r\n\]*sp\\)\[\r\n\]\[^\r\n\]*plus\[^\r\n\]*sp\\)\[\r\n\]\[^\r\n\]*const_int -8.*UNSPEC_TLS final } } */ +/* { dg-final { cleanup-rtl-dump final } } */
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
Wei Mi w...@google.com writes: Can I checkin this testcase fix? I think this is for Uros to approve. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
Can I checkin this testcase fix? Thanks, Wei. On Tue, May 13, 2014 at 1:39 AM, Rainer Orth r...@cebitec.uni-bielefeld.de wrote: Wei Mi w...@google.com writes: Thanks for trying the testcase. rtl scanning will be slightly better than assembly scanning. So how about this one? This one works fine for me. Thanks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
Wei Mi w...@google.com writes: Thanks for trying the testcase. rtl scanning will be slightly better than assembly scanning. So how about this one? This one works fine for me. Thanks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
Hi Wei, please teach your mailer not to break/mangle long lines. Thanks. Here is a patch for the test. It contains two changes: 1. For emutls, there will be an explicit call generated at expand pass, and no stack adjustment is needed. So add /* { dg-require-effective-target tls_native } */ in the test. 2. Replace cfi_def_cfa_offset with insn sequence check. Is it ok? No, the test FAILs for 32-bit i386-pc-solaris2.11 with Sun as/ld: FAIL: gcc.target/i386/pr58066.c scan-assembler sub[^\r\n]*8[^\r\n]*sp.*call[^\r\n]*__tls_get_addr.*sub[^\r\n]*8[^\r\n]*sp.*call[^\r\n]*__tls_get_addr The TLS code sequence is different here: subl$8, %esp lealccc1@tlsgd(,%ebx,1), %eax callccc1@tlsgdplt I fear this insn scanning is going to be extremely fragile. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
Here is a patch for the test. It contains two changes: 1. For emutls, there will be an explicit call generated at expand pass, and no stack adjustment is needed. So add /* { dg-require-effective-target tls_native } */ in the test. 2. Replace cfi_def_cfa_offset with insn sequence check. Is it ok? No, the test FAILs for 32-bit i386-pc-solaris2.11 with Sun as/ld: FAIL: gcc.target/i386/pr58066.c scan-assembler sub[^\r\n]*8[^\r\n]*sp.*call[^\r\n]*__tls_get_addr.*sub[^\r\n]*8[^\r\n]*sp.*call[^\r\n]*__tls_get_addr The TLS code sequence is different here: subl$8, %esp lealccc1@tlsgd(,%ebx,1), %eax callccc1@tlsgdplt I fear this insn scanning is going to be extremely fragile. Rainer Thanks for trying the testcase. rtl scanning will be slightly better than assembly scanning. So how about this one? Thanks, Wei. Index: testsuite/gcc.target/i386/pr58066.c === --- testsuite/gcc.target/i386/pr58066.c (revision 210222) +++ testsuite/gcc.target/i386/pr58066.c (working copy) @@ -1,5 +1,6 @@ /* { dg-do compile } */ -/* { dg-options -fPIC -O2 } */ +/* { dg-require-effective-target tls_native } */ +/* { dg-options -fPIC -fomit-frame-pointer -O2 -fdump-rtl-final } */ /* Check whether the stack frame starting addresses of tls expanded calls in foo and goo are 16bytes aligned. */ @@ -15,4 +16,6 @@ void* goo() return ccc2; } -/* { dg-final { scan-assembler-times .cfi_def_cfa_offset 16 2 } } */ +/* { dg-final { scan-rtl-dump Function foo.*set\[^\r\n\]*sp\\)\[\r\n\]\[^\r\n\]*plus\[^\r\n\]*sp\\)\[\r\n\]\[^\r\n\]*const_int -8.*UNSPEC_TLS.*Function goo final } } */ +/* { dg-final { scan-rtl-dump Function goo.*set\[^\r\n\]*sp\\)\[\r\n\]\[^\r\n\]*plus\[^\r\n\]*sp\\)\[\r\n\]\[^\r\n\]*const_int -8.*UNSPEC_TLS final } } */ +/* { dg-final { cleanup-rtl-dump final } } */
Re: Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
This is the updated patch of pr58066-3.patch. ... On x86_64-apple-darwin13 I get FAIL: gcc.target/i386/pr58066.c scan-assembler-times .cfi_def_cfa_offset 16 2 Dominique
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
domi...@lps.ens.fr (Dominique Dhumieres) writes: This is the updated patch of pr58066-3.patch. ... On x86_64-apple-darwin13 I get FAIL: gcc.target/i386/pr58066.c scan-assembler-times .cfi_def_cfa_offset 16 2 Same on i386-pc-solaris2.* with Sun as (which doesn't support cfi directives). Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
Here is a patch for the test. It contains two changes: 1. For emutls, there will be an explicit call generated at expand pass, and no stack adjustment is needed. So add /* { dg-require-effective-target tls_native } */ in the test. 2. Replace cfi_def_cfa_offset with insn sequence check. Is it ok? Thanks, Wei. Index: testsuite/gcc.target/i386/pr58066.c === --- testsuite/gcc.target/i386/pr58066.c (revision 210301) +++ testsuite/gcc.target/i386/pr58066.c (working copy) @@ -1,5 +1,6 @@ /* { dg-do compile } */ -/* { dg-options -fPIC -O2 } */ +/* { dg-require-effective-target tls_native } */ +/* { dg-options -fPIC -fomit-frame-pointer -O2 } */ /* Check whether the stack frame starting addresses of tls expanded calls in foo and goo are 16bytes aligned. */ @@ -15,4 +16,4 @@ void* goo() return ccc2; } -/* { dg-final { scan-assembler-times .cfi_def_cfa_offset 16 2 } } */ +/* { dg-final { scan-assembler sub\[^\r\n\]*8\[^\r\n\]*sp.*call\[^\r\n\]*__tls_get_addr.*sub\[^\r\n\]*8\[^\r\n\]*sp.*call\[^\r\n\]*__tls_get_addr } } */ On Sat, May 10, 2014 at 6:47 AM, Rainer Orth r...@cebitec.uni-bielefeld.de wrote: domi...@lps.ens.fr (Dominique Dhumieres) writes: This is the updated patch of pr58066-3.patch. ... On x86_64-apple-darwin13 I get FAIL: gcc.target/i386/pr58066.c scan-assembler-times .cfi_def_cfa_offset 16 2 Same on i386-pc-solaris2.* with Sun as (which doesn't support cfi directives). Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
On Thu, May 8, 2014 at 12:59 AM, Wei Mi w...@google.com wrote: The calls added in the templates of tls_local_dynamic_base_32 and tls_global_dynamic_32 in pr58066-3.patch are used to prevent sched2 from moving sp setting across implicit tls calls, but those calls make the combine of UNSPEC_TLS_LD_BASE and UNSPEC_DTPOFF difficult, so that the optimization in tls_local_dynamic_32_once to convert local_dynamic to global_dynamic mode for single tls reference cannot take effect. In the updated patch, I remove those calls from insn templates and add reg:SI SP_REG explicitly in the templates of UNSPEC_TLS_GD and UNSPEC_TLS_LD_BASE. It solves the sched2 and combine problems above, and now the optimization in tls_local_dynamic_32_once works. bootstrapped ok on x86_64-linux-gnu. regression is going on. Is it OK if regression passes? Please update ChangeLog with all changes, see below: ChangeLog: gcc/ 2014-05-07 Wei Mi w...@google.com * config/i386/i386.c (ix86_compute_frame_layout): preferred_stack_boundary updated for tls expanded call. (...): Update preferred_stack_boundary for call, expanded from tls descriptor. * config/i386/i386.md: Set ix86_tls_descriptor_calls_expanded_in_cfun. * config/i386/i386.md (*tls_global_dynamic_32_gnu): Depend on SP register. (*tls_local_dynamic_base_32_gnu): Ditto. ... (tls_global_dynamic_32): Set ix86_tls_descriptor_calls_expanded_in_cfun. Update RTX to depend on SP register. (tls_local_dynamic_base_32): Ditto. ... The patch is OK for mainline with updated and complete ChangeLog entry. Thanks, Uros.
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
This is the updated patch of pr58066-3.patch. The calls added in the templates of tls_local_dynamic_base_32 and tls_global_dynamic_32 in pr58066-3.patch are used to prevent sched2 from moving sp setting across implicit tls calls, but those calls make the combine of UNSPEC_TLS_LD_BASE and UNSPEC_DTPOFF difficult, so that the optimization in tls_local_dynamic_32_once to convert local_dynamic to global_dynamic mode for single tls reference cannot take effect. In the updated patch, I remove those calls from insn templates and add reg:SI SP_REG explicitly in the templates of UNSPEC_TLS_GD and UNSPEC_TLS_LD_BASE. It solves the sched2 and combine problems above, and now the optimization in tls_local_dynamic_32_once works. bootstrapped ok on x86_64-linux-gnu. regression is going on. Is it OK if regression passes? Thanks. Wei. ChangeLog: gcc/ 2014-05-07 Wei Mi w...@google.com * config/i386/i386.c (ix86_compute_frame_layout): preferred_stack_boundary updated for tls expanded call. * config/i386/i386.md: Set ix86_tls_descriptor_calls_expanded_in_cfun. gcc/testsuite/ 2014-05-07 Wei Mi w...@google.com * gcc.target/i386/pr58066.c: New test. Index: testsuite/gcc.target/i386/pr58066.c === --- testsuite/gcc.target/i386/pr58066.c (revision 0) +++ testsuite/gcc.target/i386/pr58066.c (revision 0) @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options -fPIC -O2 } */ + +/* Check whether the stack frame starting addresses of tls expanded calls + in foo and goo are 16bytes aligned. */ +static __thread char ccc1; +void* foo() +{ + return ccc1; +} + +__thread char ccc2; +void* goo() +{ + return ccc2; +} + +/* { dg-final { scan-assembler-times .cfi_def_cfa_offset 16 2 } } */ Index: config/i386/i386.c === --- config/i386/i386.c (revision 209979) +++ config/i386/i386.c (working copy) @@ -9485,20 +9485,30 @@ ix86_compute_frame_layout (struct ix86_f frame-nregs = ix86_nsaved_regs (); frame-nsseregs = ix86_nsaved_sseregs (); - stack_alignment_needed = crtl-stack_alignment_needed / BITS_PER_UNIT; - preferred_alignment = crtl-preferred_stack_boundary / BITS_PER_UNIT; - /* 64-bit MS ABI seem to require stack alignment to be always 16 except for function prologues and leaf. */ - if ((TARGET_64BIT_MS_ABI preferred_alignment 16) + if ((TARGET_64BIT_MS_ABI crtl-preferred_stack_boundary 128) (!crtl-is_leaf || cfun-calls_alloca != 0 || ix86_current_function_calls_tls_descriptor)) { - preferred_alignment = 16; - stack_alignment_needed = 16; crtl-preferred_stack_boundary = 128; crtl-stack_alignment_needed = 128; } + /* preferred_stack_boundary is never updated for call + expanded from tls descriptor. Update it here. We don't update it in + expand stage because according to the comments before + ix86_current_function_calls_tls_descriptor, tls calls may be optimized + away. */ + else if (ix86_current_function_calls_tls_descriptor + crtl-preferred_stack_boundary PREFERRED_STACK_BOUNDARY) +{ + crtl-preferred_stack_boundary = PREFERRED_STACK_BOUNDARY; + if (crtl-stack_alignment_needed PREFERRED_STACK_BOUNDARY) + crtl-stack_alignment_needed = PREFERRED_STACK_BOUNDARY; +} + + stack_alignment_needed = crtl-stack_alignment_needed / BITS_PER_UNIT; + preferred_alignment = crtl-preferred_stack_boundary / BITS_PER_UNIT; gcc_assert (!size || stack_alignment_needed); gcc_assert (preferred_alignment = STACK_BOUNDARY / BITS_PER_UNIT); Index: config/i386/i386.md === --- config/i386/i386.md (revision 209979) +++ config/i386/i386.md (working copy) @@ -12530,7 +12530,8 @@ (unspec:SI [(match_operand:SI 1 register_operand b) (match_operand 2 tls_symbolic_operand) - (match_operand 3 constant_call_address_operand z)] + (match_operand 3 constant_call_address_operand z) + (reg:SI SP_REG)] UNSPEC_TLS_GD)) (clobber (match_scratch:SI 4 =d)) (clobber (match_scratch:SI 5 =c)) @@ -12555,11 +12556,14 @@ [(set (match_operand:SI 0 register_operand) (unspec:SI [(match_operand:SI 2 register_operand) (match_operand 1 tls_symbolic_operand) - (match_operand 3 constant_call_address_operand)] + (match_operand 3 constant_call_address_operand) + (reg:SI SP_REG)] UNSPEC_TLS_GD)) (clobber (match_scratch:SI 4)) (clobber (match_scratch:SI 5)) - (clobber (reg:CC FLAGS_REG))])]) + (clobber (reg:CC FLAGS_REG))])] + + ix86_tls_descriptor_calls_expanded_in_cfun = true;) (define_insn *tls_global_dynamic_64_mode [(set (match_operand:P 0 register_operand =a) @@ -12614,13 +12618,15 @@ (const_int 0)))
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
On Thu, May 1, 2014 at 6:42 AM, Wei Mi w...@google.com wrote: Ping. Is pr58066-3.patch or pr58066-4.patch ok for trunk? None of these patches have correct ChangeLog entries. Please follow the rules, outlined in http://gcc.gnu.org/contribute.html (Submitting Patches section), otherwise your patches will be simply ignored. I attached the patch which combined your two patches and the fix in legitimize_tls_address. I tried pr58066.c and c.i in ia32/x32/x86_64, the code looked fine. Do you think it is ok? Thanks, Wei. Either pr58066-3.patch or pr58066-4.patch looks good to me. pr58066-4 patch is definitely not OK. I wonder, how it works at all, since you can't split the insn to the same pattern. The generic code detects this condition and forces ICE (IIRC: this is the reason for UNSPEC_DIV_ALREADY_SPLIT tag in divmodmode4_1). From pr58066-3 patch: -;; Local dynamic of a single variable is a lose. Show combine how -;; to convert that back to global dynamic. - -(define_insn_and_split *tls_local_dynamic_32_once - [(set (match_operand:SI 0 register_operand =a) -(plus:SI - (unspec:SI [(match_operand:SI 1 register_operand b) - (match_operand 2 constant_call_address_operand z)] -UNSPEC_TLS_LD_BASE) - (const:SI (unspec:SI -[(match_operand 3 tls_symbolic_operand)] -UNSPEC_DTPOFF - (clobber (match_scratch:SI 4 =d)) - (clobber (match_scratch:SI 5 =c)) - (clobber (reg:CC FLAGS_REG))] - - # - - [(parallel - [(set (match_dup 0) - (unspec:SI [(match_dup 1) (match_dup 3) (match_dup 2)] - UNSPEC_TLS_GD)) - (clobber (match_dup 4)) - (clobber (match_dup 5)) - (clobber (reg:CC FLAGS_REG))])]) Why did you remove this splitter? Please do not write: +{ + ix86_tls_descriptor_calls_expanded_in_cfun = true; +}) but use a short form: + ix86_tls_descriptor_calls_expanded_in_cfun = true;) Please also add a testcase (from one of the previous mails): --- testsuite/gcc.dg/pr58066.c (revision 0) +++ testsuite/gcc.dg/pr58066.c (revision 0) Put this test to gcc.target/i386 directory ... @@ -0,0 +1,18 @@ +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } { ! ia32 } } } } */ ... to avoid target selector. +/* { dg-options -fPIC -O2 } */ + +/* Check whether the stack frame starting addresses of tls expanded calls + in foo and goo are 16bytes aligned. */ +static __thread char ccc1; +void* foo() +{ + return ccc1; +} + +__thread char ccc2; +void* goo() +{ + return ccc2; +} + +/* { dg-final { scan-assembler-times .cfi_def_cfa_offset 16 2 } } */ Please repost the complete patch with a proper ChangeLog. Uros.
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
On Wed, Apr 30, 2014 at 11:44 PM, Uros Bizjak ubiz...@gmail.com wrote: On Thu, May 1, 2014 at 6:42 AM, Wei Mi w...@google.com wrote: Ping. Is pr58066-3.patch or pr58066-4.patch ok for trunk? None of these patches have correct ChangeLog entries. Please follow the rules, outlined in http://gcc.gnu.org/contribute.html (Submitting Patches section), otherwise your patches will be simply ignored. Will add Changelog in the final patch. pr58066-4 patch is definitely not OK. I wonder, how it works at all, since you can't split the insn to the same pattern. The generic code detects this condition and forces ICE (IIRC: this is the reason for UNSPEC_DIV_ALREADY_SPLIT tag in divmodmode4_1). If the generic code detects same pattern, it will not ICE but return the original insn (See the code in try_split under the comment /* Avoid infinite loop if any insn of the result matches the original pattern. */). ix86_tls_descriptor_calls_expanded_in_cfun will be set to true even if original insn is returned by try_split. That is how pr58066-4 works. From pr58066-3 patch: -;; Local dynamic of a single variable is a lose. Show combine how -;; to convert that back to global dynamic. - -(define_insn_and_split *tls_local_dynamic_32_once - [(set (match_operand:SI 0 register_operand =a) -(plus:SI - (unspec:SI [(match_operand:SI 1 register_operand b) - (match_operand 2 constant_call_address_operand z)] -UNSPEC_TLS_LD_BASE) - (const:SI (unspec:SI -[(match_operand 3 tls_symbolic_operand)] -UNSPEC_DTPOFF - (clobber (match_scratch:SI 4 =d)) - (clobber (match_scratch:SI 5 =c)) - (clobber (reg:CC FLAGS_REG))] - - # - - [(parallel - [(set (match_dup 0) - (unspec:SI [(match_dup 1) (match_dup 3) (match_dup 2)] - UNSPEC_TLS_GD)) - (clobber (match_dup 4)) - (clobber (match_dup 5)) - (clobber (reg:CC FLAGS_REG))])]) Why did you remove this splitter? After we add add call into the pattern of tls_local_dynamic_base_32, it is difficult for the pattern tls_local_dynamic_32_once to be matched. I don't have a good way to rewrite tls_local_dynamic_32_once. Do you have any idea? (define_expand tls_local_dynamic_base_32 [(parallel [(set (match_operand:SI 0 register_operand) (call:SI (mem:QI (match_operand 2 constant_call_address_operand)) (const_int 0))) (unspec:SI [(match_operand:SI 1 register_operand)] UNSPEC_TLS_LD_BASE) (clobber (match_scratch:SI 3)) (clobber (match_scratch:SI 4)) (clobber (reg:CC FLAGS_REG))])] Please do not write: +{ + ix86_tls_descriptor_calls_expanded_in_cfun = true; +}) but use a short form: + ix86_tls_descriptor_calls_expanded_in_cfun = true;) Please also add a testcase (from one of the previous mails): --- testsuite/gcc.dg/pr58066.c (revision 0) +++ testsuite/gcc.dg/pr58066.c (revision 0) Put this test to gcc.target/i386 directory ... @@ -0,0 +1,18 @@ +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } { ! ia32 } } } } */ ... to avoid target selector. +/* { dg-options -fPIC -O2 } */ + +/* Check whether the stack frame starting addresses of tls expanded calls + in foo and goo are 16bytes aligned. */ +static __thread char ccc1; +void* foo() +{ + return ccc1; +} + +__thread char ccc2; +void* goo() +{ + return ccc2; +} + +/* { dg-final { scan-assembler-times .cfi_def_cfa_offset 16 2 } } */ Please repost the complete patch with a proper ChangeLog. Uros. will do that. Thanks, Wei.
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
Ping. Is pr58066-3.patch or pr58066-4.patch ok for trunk? Thanks, Wei. I attached the patch which combined your two patches and the fix in legitimize_tls_address. I tried pr58066.c and c.i in ia32/x32/x86_64, the code looked fine. Do you think it is ok? Thanks, Wei. Either pr58066-3.patch or pr58066-4.patch looks good to me. Thanks. -- H.J.
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
On Wed, Mar 12, 2014 at 10:52 PM, Wei Mi w...@google.com wrote: I saw the problem last patch had on ia32. Without explicit call in rtl template, scheduler may schedule the sp adjusting insn across tls descriptor and break the alignment assumption. I am testing the updated patch on x86_64. Can we combine the last two patches, both adding call explicitly in rtl template for tls_local_dynamic_base_32/tls_global_dynamic_32, and set ix86_tls_descriptor_calls_expanded_in_cfun to true only after reload complete? My ia32 change generates much worse code: [hjl@gnu-6 gcc]$ cat /tmp/c.i static __thread char ccc, bbb; int __cxa_get_globals() { return ccc - bbb; } [hjl@gnu-6 gcc]$ ./xgcc -B./ -S -O2 -fPIC /tmp/c.i [hjl@gnu-6 gcc]$ cat c.s .file c.i .section .text.unlikely,ax,@progbits .LCOLDB0: .text .LHOTB0: .p2align 4,,15 .globl __cxa_get_globals .type __cxa_get_globals, @function __cxa_get_globals: .LFB0: .cfi_startproc subq $8, %rsp .cfi_def_cfa_offset 16 leaq ccc@tlsld(%rip), %rdi call __tls_get_addr@PLT addq $8, %rsp .cfi_def_cfa_offset 8 leaq ccc@dtpoff(%rax), %rcx leaq bbb@dtpoff(%rax), %rdx movq %rcx, %rax subq %rdx, %rax ret .cfi_endproc .LFE0: .size __cxa_get_globals, .-__cxa_get_globals .section .text.unlikely .LCOLDE0: .text .LHOTE0: .section .tbss,awT,@nobits .type bbb, @object .size bbb, 1 bbb: .zero 1 .type ccc, @object .size ccc, 1 ccc: .zero 1 .ident GCC: (GNU) 4.9.0 20140312 (experimental) .section .note.GNU-stack,,@progbits [hjl@gnu-6 gcc]$ cat /tmp/c.i static __thread char ccc, bbb; int __cxa_get_globals() { return ccc - bbb; } [hjl@gnu-6 gcc]$ ./xgcc -B./ -S -O2 -fPIC /tmp/c.i -m32 [hjl@gnu-6 gcc]$ cat c.s .file c.i .section .text.unlikely,ax,@progbits .LCOLDB0: .text .LHOTB0: .p2align 4,,15 .globl __cxa_get_globals .type __cxa_get_globals, @function __cxa_get_globals: .LFB0: .cfi_startproc pushl %esi .cfi_def_cfa_offset 8 .cfi_offset 6, -8 pushl %ebx .cfi_def_cfa_offset 12 .cfi_offset 3, -12 call __x86.get_pc_thunk.bx addl $_GLOBAL_OFFSET_TABLE_, %ebx subl $4, %esp .cfi_def_cfa_offset 16 leal ccc@tlsldm(%ebx), %eax call ___tls_get_addr@PLT leal ccc@dtpoff(%eax), %esi leal ccc@tlsldm(%ebx), %eax call ___tls_get_addr@PLT addl $4, %esp .cfi_def_cfa_offset 12 leal bbb@dtpoff(%eax), %eax popl %ebx .cfi_restore 3 .cfi_def_cfa_offset 8 subl %eax, %esi movl %esi, %eax popl %esi .cfi_restore 6 .cfi_def_cfa_offset 4 ret .cfi_endproc Maybe we should keep the original patterns and split them to add CALL. -- H.J.
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
pr58066-2.patch worked for pr58066.c on ia32/x32/x86_64, but it failed on bootstrap. /usr/local/google/home/wmi/workarea/gcc-r208410-2/build/./gcc/xgcc -B/usr/local/google/home/wmi/workarea/gcc-r208410-2/build/./gcc/ -B/usr/local/google/home/wmi/workarea/gcc-r208410-2/build/install/x86_64-unknown-linux-gnu/bin/ -B/usr/local/google/home/wmi/workarea/gcc-r208410-2/build/install/x86_64-unknown-linux-gnu/lib/ -isystem /usr/local/google/home/wmi/workarea/gcc-r208410-2/build/install/x86_64-unknown-linux-gnu/include -isystem /usr/local/google/home/wmi/workarea/gcc-r208410-2/build/install/x86_64-unknown-linux-gnu/sys-include -g -O2 -m64 -O2 -g -O2 -DIN_GCC-W -Wall -Wwrite-strings -Wcast-qual -Wno-format -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -isystem ./include -fpic -mlong-double-80 -g -DIN_LIBGCC2 -fbuilding-libgcc -fno-stack-protector -fpic -mlong-double-80 -I. -I. -I../../.././gcc -I../../../../src/libgcc -I../../../../src/libgcc/. -I../../../../src/libgcc/../gcc -I../../../../src/libgcc/../include -I../../../../src/libgcc/config/libbid -DENABLE_DECIMAL_BID_FORMAT -DHAVE_CC_TLS -DUSE_TLS -o bid_decimal_globals.o -MT bid_decimal_globals.o -MD -MP -MF bid_decimal_globals.dep -c ../../../../src/libgcc/config/libbid/bid_decimal_globals.c (call_insn 5 2 6 2 (parallel [ (set (reg/f:SI 85) (call:SI (mem:QI (symbol_ref:SI (___tls_get_addr)) [0 S1 A8]) (const_int 0 [0]))) (unspec:SI [ (reg:SI 3 bx) (symbol_ref:SI (__bid_IDEC_glbflags) [flags 0x10] var_decl 0x761c1da8 __bid_IDEC_glbflags) ] UNSPEC_TLS_GD) (clobber (reg:SI 91)) (clobber (reg:SI 92)) (clobber (reg:CC 17 flags)) ]) ../../../../src/libgcc/config/libbid/bid_decimal_globals.c:51 772 {*tls_global_dynamic_32_gnu} (expr_list:REG_UNUSED (reg:SI 92) (expr_list:REG_UNUSED (reg:SI 91) (nil))) (nil)) ../../../../src/libgcc/config/libbid/bid_decimal_globals.c:52:1: internal compiler error: in curr_insn_transform, at lra-constraints.c:3262 0xad8453 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*) ../../src/gcc/rtl-error.c:109 0x9d1221 curr_insn_transform ../../src/gcc/lra-constraints.c:3262 0x9d40e4 lra_constraints(bool) ../../src/gcc/lra-constraints.c:4157 0x9c0ad8 lra(_IO_FILE*) ../../src/gcc/lra.c:2340 0x96e310 do_reload ../../src/gcc/ira.c:5457 0x96e622 rest_of_handle_reload ../../src/gcc/ira.c:5598 0x96e66c execute ../../src/gcc/ira.c:5627 The problem is the return value of the call may be assigned to a different hardreg than AX_REG. But LRA cannot do reload for output operand of call. The fix is to change the above pattern to the following pattern in legitimize_tls_address() in config/i386/i386.c. (call_insn/u 5 4 6 (parallel [ (set (reg:SI 0 ax) (call:SI (mem:QI (symbol_ref:SI (___tls_get_addr)) [0 S1 A8]) (const_int 0 [0]))) (unspec:SI [ (reg:SI 3 bx) (symbol_ref:SI (__bid_IDEC_glbflags) [flags 0x10] var_decl 0x75ef3da8 __bid_IDEC_glbflags) ] UNSPEC_TLS_GD) (clobber (scratch:SI)) (clobber (scratch:SI)) (clobber (reg:CC 17 flags)) ]) ../../../../src/libgcc/config/libbid/bid_decimal_globals.c:51 -1 (expr_list:REG_EH_REGION (const_int -2147483648 [0x8000]) (nil)) (nil)) (insn 6 5 7 (set (reg/f:SI 85) (reg:SI 0 ax)) ../../../../src/libgcc/config/libbid/bid_decimal_globals.c:51 -1 (expr_list:REG_EQUAL (symbol_ref:SI (__bid_IDEC_glbflags) [flags 0x10] var_decl 0x75ef3da8 __bid_IDEC_glbflags) After the problem is fixed, bootstrap and regression test on x86-64 are ok. Thanks, Wei. Index: config/i386/i386.md === --- config/i386/i386.md (revision 208410) +++ config/i386/i386.md (working copy) @@ -12859,13 +12859,14 @@ (define_insn *tls_global_dynamic_32_gnu [(set (match_operand:SI 0 register_operand =a) - (unspec:SI - [(match_operand:SI 1 register_operand b) - (match_operand 2 tls_symbolic_operand) - (match_operand 3 constant_call_address_operand z)] - UNSPEC_TLS_GD)) - (clobber (match_scratch:SI 4 =d)) - (clobber (match_scratch:SI 5 =c)) + (call:SI + (mem:QI (match_operand 3 constant_call_address_operand z)) + (match_operand 4))) + (unspec:SI [(match_operand:SI 1 register_operand b) + (match_operand 2 tls_symbolic_operand)] + UNSPEC_TLS_GD) + (clobber (match_scratch:SI 5 =d)) + (clobber (match_scratch:SI 6 =c)) (clobber (reg:CC FLAGS_REG))] !TARGET_64BIT TARGET_GNU_TLS { @@ -12885,13 +12886,19 @@ (define_expand tls_global_dynamic_32 [(parallel [(set (match_operand:SI 0 register_operand) - (unspec:SI [(match_operand:SI 2 register_operand) - (match_operand 1
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
My ia32 change generates much worse code: [hjl@gnu-6 gcc]$ cat /tmp/c.i static __thread char ccc, bbb; int __cxa_get_globals() { return ccc - bbb; } [hjl@gnu-6 gcc]$ ./xgcc -B./ -S -O2 -fPIC /tmp/c.i [hjl@gnu-6 gcc]$ cat c.s .file c.i .section .text.unlikely,ax,@progbits .LCOLDB0: .text .LHOTB0: .p2align 4,,15 .globl __cxa_get_globals .type __cxa_get_globals, @function __cxa_get_globals: .LFB0: .cfi_startproc subq $8, %rsp .cfi_def_cfa_offset 16 leaq ccc@tlsld(%rip), %rdi call __tls_get_addr@PLT addq $8, %rsp .cfi_def_cfa_offset 8 leaq ccc@dtpoff(%rax), %rcx leaq bbb@dtpoff(%rax), %rdx movq %rcx, %rax subq %rdx, %rax ret .cfi_endproc .LFE0: .size __cxa_get_globals, .-__cxa_get_globals .section .text.unlikely .LCOLDE0: .text .LHOTE0: .section .tbss,awT,@nobits .type bbb, @object .size bbb, 1 bbb: .zero 1 .type ccc, @object .size ccc, 1 ccc: .zero 1 .ident GCC: (GNU) 4.9.0 20140312 (experimental) .section .note.GNU-stack,,@progbits [hjl@gnu-6 gcc]$ cat /tmp/c.i static __thread char ccc, bbb; int __cxa_get_globals() { return ccc - bbb; } [hjl@gnu-6 gcc]$ ./xgcc -B./ -S -O2 -fPIC /tmp/c.i -m32 [hjl@gnu-6 gcc]$ cat c.s .file c.i .section .text.unlikely,ax,@progbits .LCOLDB0: .text .LHOTB0: .p2align 4,,15 .globl __cxa_get_globals .type __cxa_get_globals, @function __cxa_get_globals: .LFB0: .cfi_startproc pushl %esi .cfi_def_cfa_offset 8 .cfi_offset 6, -8 pushl %ebx .cfi_def_cfa_offset 12 .cfi_offset 3, -12 call __x86.get_pc_thunk.bx addl $_GLOBAL_OFFSET_TABLE_, %ebx subl $4, %esp .cfi_def_cfa_offset 16 leal ccc@tlsldm(%ebx), %eax call ___tls_get_addr@PLT leal ccc@dtpoff(%eax), %esi leal ccc@tlsldm(%ebx), %eax call ___tls_get_addr@PLT addl $4, %esp .cfi_def_cfa_offset 12 leal bbb@dtpoff(%eax), %eax popl %ebx .cfi_restore 3 .cfi_def_cfa_offset 8 subl %eax, %esi movl %esi, %eax popl %esi .cfi_restore 6 .cfi_def_cfa_offset 4 ret .cfi_endproc Maybe we should keep the original patterns and split them to add CALL. -- H.J. I tried pr58066-3.patch on the above testcase, the code it generated seems ok. I think after we change the 32bits pattern in i386.md to be similar as 64bits pattern, we should change 32bit expand to be similar as 64bit expand in legitimize_tls_address too? Thanks, Wei. ~/workarea/gcc-r208410-2/build/install/bin/gcc -m32 -S -fPIC 1.c .file 1.c .section.tbss,awT,@nobits .type ccc, @object .size ccc, 1 ccc: .zero 1 .type bbb, @object .size bbb, 1 bbb: .zero 1 .text .globl __cxa_get_globals .type __cxa_get_globals, @function __cxa_get_globals: .LFB0: .cfi_startproc pushl %ebp .cfi_def_cfa_offset 8 .cfi_offset 5, -8 movl%esp, %ebp .cfi_def_cfa_register 5 pushl %esi pushl %ebx .cfi_offset 6, -12 .cfi_offset 3, -16 call__x86.get_pc_thunk.bx addl$_GLOBAL_OFFSET_TABLE_, %ebx lealccc@tlsgd(,%ebx,1), %eax call___tls_get_addr@PLT movl%eax, %esi lealbbb@tlsgd(,%ebx,1), %eax call___tls_get_addr@PLT subl%eax, %esi movl%esi, %eax popl%ebx .cfi_restore 3 popl%esi .cfi_restore 6 popl%ebp .cfi_restore 5 .cfi_def_cfa 4, 4 ret .cfi_endproc .LFE0: .size __cxa_get_globals, .-__cxa_get_globals .section .text.__x86.get_pc_thunk.bx,axG,@progbits,__x86.get_pc_thunk.bx,comdat .globl __x86.get_pc_thunk.bx .hidden __x86.get_pc_thunk.bx .type __x86.get_pc_thunk.bx, @function __x86.get_pc_thunk.bx: .LFB1: .cfi_startproc movl(%esp), %ebx ret .cfi_endproc .LFE1: .ident GCC: (GNU) 4.9.0 20140307 (experimental) .section.note.GNU-stack,,@progbits
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
Can we combine the last two patches, both adding call explicitly in rtl template for tls_local_dynamic_base_32/tls_global_dynamic_32, and set ix86_tls_descriptor_calls_expanded_in_cfun to true only after reload complete? Hi H.J. I attached the patch which combined your two patches and the fix in legitimize_tls_address. I tried pr58066.c and c.i in ia32/x32/x86_64, the code looked fine. Do you think it is ok? Thanks, Wei. Index: config/i386/i386.c === --- config/i386/i386.c (revision 208410) +++ config/i386/i386.c (working copy) @@ -9082,7 +9082,7 @@ ix86_frame_pointer_required (void) we've not got a leaf function. */ if (TARGET_OMIT_LEAF_FRAME_POINTER (!crtl-is_leaf - || ix86_current_function_calls_tls_descriptor)) + || ix86_tls_descriptor_calls_expanded_in_cfun)) return true; if (crtl-profile !flag_fentry) @@ -9331,7 +9331,7 @@ ix86_select_alt_pic_regnum (void) { if (crtl-is_leaf !crtl-profile - !ix86_current_function_calls_tls_descriptor) + !ix86_tls_descriptor_calls_expanded_in_cfun) { int i, drap; /* Can't use the same register for both PIC and DRAP. */ @@ -9490,20 +9490,28 @@ ix86_compute_frame_layout (struct ix86_f frame-nregs = ix86_nsaved_regs (); frame-nsseregs = ix86_nsaved_sseregs (); - stack_alignment_needed = crtl-stack_alignment_needed / BITS_PER_UNIT; - preferred_alignment = crtl-preferred_stack_boundary / BITS_PER_UNIT; - /* 64-bit MS ABI seem to require stack alignment to be always 16 except for function prologues and leaf. */ - if ((TARGET_64BIT_MS_ABI preferred_alignment 16) + if ((TARGET_64BIT_MS_ABI crtl-preferred_stack_boundary 128) (!crtl-is_leaf || cfun-calls_alloca != 0 - || ix86_current_function_calls_tls_descriptor)) + || ix86_tls_descriptor_calls_expanded_in_cfun)) { - preferred_alignment = 16; - stack_alignment_needed = 16; crtl-preferred_stack_boundary = 128; crtl-stack_alignment_needed = 128; } + /* preferred_stack_boundary is never updated for call expanded from + tls descriptor. Update it here. We don't update it in expand stage + because tls calls may be optimized away. */ + else if (ix86_tls_descriptor_calls_expanded_in_cfun + crtl-preferred_stack_boundary PREFERRED_STACK_BOUNDARY) +{ + crtl-preferred_stack_boundary = PREFERRED_STACK_BOUNDARY; + if (crtl-stack_alignment_needed PREFERRED_STACK_BOUNDARY) + crtl-stack_alignment_needed = PREFERRED_STACK_BOUNDARY; +} + + stack_alignment_needed = crtl-stack_alignment_needed / BITS_PER_UNIT; + preferred_alignment = crtl-preferred_stack_boundary / BITS_PER_UNIT; gcc_assert (!size || stack_alignment_needed); gcc_assert (preferred_alignment = STACK_BOUNDARY / BITS_PER_UNIT); @@ -9608,7 +9616,7 @@ ix86_compute_frame_layout (struct ix86_f || size != 0 || !crtl-is_leaf || cfun-calls_alloca - || ix86_current_function_calls_tls_descriptor) + || ix86_tls_descriptor_calls_expanded_in_cfun) offset = (offset + stack_alignment_needed - 1) -stack_alignment_needed; /* Frame pointer points here. */ @@ -9623,7 +9631,7 @@ ix86_compute_frame_layout (struct ix86_f of stack frame are unused. */ if (ACCUMULATE_OUTGOING_ARGS (!crtl-is_leaf || cfun-calls_alloca - || ix86_current_function_calls_tls_descriptor)) + || ix86_tls_descriptor_calls_expanded_in_cfun)) { offset += crtl-outgoing_args_size; frame-outgoing_arguments_size = crtl-outgoing_args_size; @@ -9634,7 +9642,7 @@ ix86_compute_frame_layout (struct ix86_f /* Align stack boundary. Only needed if we're calling another function or using alloca. */ if (!crtl-is_leaf || cfun-calls_alloca - || ix86_current_function_calls_tls_descriptor) + || ix86_tls_descriptor_calls_expanded_in_cfun) offset = (offset + preferred_alignment - 1) -preferred_alignment; /* We've reached end of stack frame. */ @@ -9650,7 +9658,7 @@ ix86_compute_frame_layout (struct ix86_f if (ix86_using_red_zone () crtl-sp_is_unchanging crtl-is_leaf - !ix86_current_function_calls_tls_descriptor) + !ix86_tls_descriptor_calls_expanded_in_cfun) { frame-red_zone_size = to_allocate; if (frame-save_regs_using_mov) @@ -10623,7 +10631,7 @@ ix86_finalize_stack_realign_flags (void) crtl-is_leaf flag_omit_frame_pointer crtl-sp_is_unchanging - !ix86_current_function_calls_tls_descriptor + !ix86_tls_descriptor_calls_expanded_in_cfun !crtl-accesses_prior_frames !cfun-calls_alloca !crtl-calls_eh_return @@ -13437,26 +13445,25 @@ legitimize_tls_address (rtx x, enum tls_ else { rtx caddr = ix86_tls_get_addr (); + rtx ax = gen_rtx_REG (Pmode, AX_REG); + rtx insns; + start_sequence (); if (TARGET_64BIT) - { -
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
On Thu, Mar 13, 2014 at 10:55 AM, Wei Mi w...@google.com wrote: Can we combine the last two patches, both adding call explicitly in rtl template for tls_local_dynamic_base_32/tls_global_dynamic_32, and set ix86_tls_descriptor_calls_expanded_in_cfun to true only after reload complete? Hi H.J. I attached the patch which combined your two patches and the fix in legitimize_tls_address. I tried pr58066.c and c.i in ia32/x32/x86_64, the code looked fine. Do you think it is ok? Thanks, Wei. Either pr58066-3.patch or pr58066-4.patch looks good to me. Thanks. -- H.J.
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
On Fri, Mar 7, 2014 at 2:33 PM, Wei Mi w...@google.com wrote: Yes, x32 has the same problem. It should be tested. Fixed. Thanks, Wei. On Fri, Mar 7, 2014 at 2:06 PM, H.J. Lu hjl.to...@gmail.com wrote: On Fri, Mar 7, 2014 at 1:26 PM, Wei Mi w...@google.com wrote: Hi, This patch is to fix the problem described here: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066 I follow Ian's suggestion and set ix86_tls_descriptor_calls_expanded_in_cfun in tls_global_dynamic_64_mode and tls_local_dynamic_base_64_mode. Although 32bit doesn't have the problem, ix86_tls_descriptor_calls_expanded_in_cfun is also set for tls_global_dynamic_32 and tls_local_dynamic_base_32 to make ix86_tls_descriptor_calls_expanded_in_cfun setting consistent across 32bits and 64bits. If ix86_current_function_calls_tls_descriptor is set, we know that there is tls expanded call in current function. Update crtl-preferred_stack_boundary and crtl-stack_alignment_needed to be no less than PREFERED_STACK_ALIGNMENT at the start of ix86_compute_frame_layout. We don't do the update in legitimize_tls_address in cfgexpand stage, which is too early because according to the comments before ix86_current_function_calls_tls_descriptor, tls call may be optimized away. ix86_compute_frame_layout is the latest place to do the update. bootstrap on x86_64-linux-gnu is ok. regression test is going on. Ok for trunk if tests pass? Thanks, Wei. gcc/ChangeLog: 2014-03-07 Wei Mi w...@google.com * config/i386/i386.c (ix86_compute_frame_layout): Update preferred_stack_boundary when there is tls expanded call. * config/i386/i386.md: Set ix86_tls_descriptor_calls_expanded_in_cfun. gcc/testsuite/ChangeLog: 2014-03-07 Wei Mi w...@google.com * g++.dg/pr58066.C: New test. Index: gcc/config/i386/i386.c === --- gcc/config/i386/i386.c (revision 208410) +++ gcc/config/i386/i386.c (working copy) @@ -9504,6 +9504,19 @@ ix86_compute_frame_layout (struct ix86_f crtl-preferred_stack_boundary = 128; crtl-stack_alignment_needed = 128; } + /* For 64-bit target, preferred_stack_boundary is never updated for call + expanded from tls descriptor. Update it here. We don't update it in + expand stage because according to the comments before + ix86_current_function_calls_tls_descriptor, tls calls may be optimized + away. */ + else if (TARGET_64BIT + ix86_current_function_calls_tls_descriptor + crtl-preferred_stack_boundary PREFERRED_STACK_BOUNDARY) +{ + crtl-preferred_stack_boundary = PREFERRED_STACK_BOUNDARY; + if (crtl-stack_alignment_needed PREFERRED_STACK_BOUNDARY) + crtl-stack_alignment_needed = PREFERRED_STACK_BOUNDARY; +} There are several problems with this: 1. It doesn't work with C. 2. IA32 has the same issue and isn't fixed. 3. There is no testcase for global dynamic model. -- H.J.
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
There are several problems with this: 1. It doesn't work with C. Ok, I will change the testcase using C. 2. IA32 has the same issue and isn't fixed. I thought IA32 didn't have the same issue because abi only requires 32 bit alignment for stack starting address. oh, I found the old patch http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00298.html which changed the default alignment to 128bit. Ok, will remove the TARGET_64BIT constraint. 3. There is no testcase for global dynamic model. -- H.J. Will add the testcase. Thanks, Wei.
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
On Wed, Mar 12, 2014 at 2:03 PM, Wei Mi w...@google.com wrote: There are several problems with this: 1. It doesn't work with C. Ok, I will change the testcase using C. 2. IA32 has the same issue and isn't fixed. I thought IA32 didn't have the same issue because abi only requires 32 bit alignment for stack starting address. oh, I found the old patch http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00298.html which changed the default alignment to 128bit. Ok, will remove the TARGET_64BIT constraint. 3. There is no testcase for global dynamic model. -- H.J. Will add the testcase. I posted a different patch in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066 -- H.J.
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
Hi H.J., Could you show me why you postpone the setting ix86_tls_descriptor_calls_expanded_in_cfun until reload_complete and use ix86_tls_descriptor_calls_expanded_in_cfun instead of ix86_current_function_calls_tls_descriptor? Isn't ix86_current_function_calls_tls_descriptor useful to consider the case that tls call is optimized away? Thanks, Wei. On Wed, Mar 12, 2014 at 2:07 PM, H.J. Lu hjl.to...@gmail.com wrote: On Wed, Mar 12, 2014 at 2:03 PM, Wei Mi w...@google.com wrote: There are several problems with this: 1. It doesn't work with C. Ok, I will change the testcase using C. 2. IA32 has the same issue and isn't fixed. I thought IA32 didn't have the same issue because abi only requires 32 bit alignment for stack starting address. oh, I found the old patch http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00298.html which changed the default alignment to 128bit. Ok, will remove the TARGET_64BIT constraint. 3. There is no testcase for global dynamic model. -- H.J. Will add the testcase. I posted a different patch in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066 -- H.J.
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
On Wed, Mar 12, 2014 at 2:36 PM, Wei Mi w...@google.com wrote: Hi H.J., Could you show me why you postpone the setting ix86_tls_descriptor_calls_expanded_in_cfun until reload_complete and use ix86_tls_descriptor_calls_expanded_in_cfun instead of ix86_current_function_calls_tls_descriptor? Isn't ix86_current_function_calls_tls_descriptor useful to consider the case that tls call is optimized away? When a tls call is optimized away, it won't survive reload. If it does survive reload, it isn't optimized away. Also checking df_regs_ever_live_p (SP_REG) isn't reliable when called from ix86_compute_frame_layout. -- H.J.
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
Oh, I see. Thanks! Wei. On Wed, Mar 12, 2014 at 2:42 PM, H.J. Lu hjl.to...@gmail.com wrote: On Wed, Mar 12, 2014 at 2:36 PM, Wei Mi w...@google.com wrote: Hi H.J., Could you show me why you postpone the setting ix86_tls_descriptor_calls_expanded_in_cfun until reload_complete and use ix86_tls_descriptor_calls_expanded_in_cfun instead of ix86_current_function_calls_tls_descriptor? Isn't ix86_current_function_calls_tls_descriptor useful to consider the case that tls call is optimized away? When a tls call is optimized away, it won't survive reload. If it does survive reload, it isn't optimized away. Also checking df_regs_ever_live_p (SP_REG) isn't reliable when called from ix86_compute_frame_layout. -- H.J.
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
This is the updated testcase. Thanks, Wei. === --- testsuite/gcc.dg/pr58066.c (revision 0) +++ testsuite/gcc.dg/pr58066.c (revision 0) @@ -0,0 +1,18 @@ +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } { ! ia32 } } } } */ +/* { dg-options -fPIC -O2 } */ + +/* Check whether the stack frame starting addresses of tls expanded calls + in foo and goo are 16bytes aligned. */ +static __thread char ccc1; +void* foo() +{ + return ccc1; +} + +__thread char ccc2; +void* goo() +{ + return ccc2; +} + +/* { dg-final { scan-assembler-times .cfi_def_cfa_offset 16 2 } } */ On Wed, Mar 12, 2014 at 2:51 PM, Wei Mi w...@google.com wrote: Oh, I see. Thanks! Wei. On Wed, Mar 12, 2014 at 2:42 PM, H.J. Lu hjl.to...@gmail.com wrote: On Wed, Mar 12, 2014 at 2:36 PM, Wei Mi w...@google.com wrote: Hi H.J., Could you show me why you postpone the setting ix86_tls_descriptor_calls_expanded_in_cfun until reload_complete and use ix86_tls_descriptor_calls_expanded_in_cfun instead of ix86_current_function_calls_tls_descriptor? Isn't ix86_current_function_calls_tls_descriptor useful to consider the case that tls call is optimized away? When a tls call is optimized away, it won't survive reload. If it does survive reload, it isn't optimized away. Also checking df_regs_ever_live_p (SP_REG) isn't reliable when called from ix86_compute_frame_layout. -- H.J.
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
On Wed, Mar 12, 2014 at 2:58 PM, Wei Mi w...@google.com wrote: This is the updated testcase. Does my patch fix the original problem? Thanks, Wei. === --- testsuite/gcc.dg/pr58066.c (revision 0) +++ testsuite/gcc.dg/pr58066.c (revision 0) @@ -0,0 +1,18 @@ +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } { ! ia32 } } } } */ Since it is a C testcase and we should test it under ia32, it should be moved to gcc.target/i386 and remove target. +/* { dg-options -fPIC -O2 } */ + +/* Check whether the stack frame starting addresses of tls expanded calls + in foo and goo are 16bytes aligned. */ +static __thread char ccc1; +void* foo() +{ + return ccc1; +} + +__thread char ccc2; +void* goo() +{ + return ccc2; +} + +/* { dg-final { scan-assembler-times .cfi_def_cfa_offset 16 2 } } */ On Wed, Mar 12, 2014 at 2:51 PM, Wei Mi w...@google.com wrote: Oh, I see. Thanks! Wei. On Wed, Mar 12, 2014 at 2:42 PM, H.J. Lu hjl.to...@gmail.com wrote: On Wed, Mar 12, 2014 at 2:36 PM, Wei Mi w...@google.com wrote: Hi H.J., Could you show me why you postpone the setting ix86_tls_descriptor_calls_expanded_in_cfun until reload_complete and use ix86_tls_descriptor_calls_expanded_in_cfun instead of ix86_current_function_calls_tls_descriptor? Isn't ix86_current_function_calls_tls_descriptor useful to consider the case that tls call is optimized away? When a tls call is optimized away, it won't survive reload. If it does survive reload, it isn't optimized away. Also checking df_regs_ever_live_p (SP_REG) isn't reliable when called from ix86_compute_frame_layout. -- H.J. -- H.J.
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
On Wed, Mar 12, 2014 at 3:07 PM, H.J. Lu hjl.to...@gmail.com wrote: On Wed, Mar 12, 2014 at 2:58 PM, Wei Mi w...@google.com wrote: This is the updated testcase. Does my patch fix the original problem? Yes, it works. I am doing bootstrap and regression test for your patch. Thanks! Thanks, Wei. === --- testsuite/gcc.dg/pr58066.c (revision 0) +++ testsuite/gcc.dg/pr58066.c (revision 0) @@ -0,0 +1,18 @@ +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } { ! ia32 } } } } */ Since it is a C testcase and we should test it under ia32, it should be moved to gcc.target/i386 and remove target. Fixed. Thanks, Wei. Index: testsuite/gcc.target/i386/pr58066.c === --- testsuite/gcc.target/i386/pr58066.c (revision 0) +++ testsuite/gcc.target/i386/pr58066.c (revision 0) @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options -fPIC -O2 } */ + +/* Check whether the stack frame starting addresses of tls expanded calls + in foo and goo are 16bytes aligned. */ +static __thread char ccc1; +void* foo() +{ + return ccc1; +} + +__thread char ccc2; +void* goo() +{ + return ccc2; +} + +/* { dg-final { scan-assembler-times .cfi_def_cfa_offset 16 2 } } */
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
Does my patch fix the original problem? Yes, it works. I am doing bootstrap and regression test for your patch. Thanks! The patch passes bootstrap and regression test on x86_64-linux-gnu. Thanks, Wei.
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
On Wed, Mar 12, 2014 at 5:28 PM, Wei Mi w...@google.com wrote: Does my patch fix the original problem? Yes, it works. I am doing bootstrap and regression test for your patch. Thanks! The patch passes bootstrap and regression test on x86_64-linux-gnu. My patch fails to handle ia32. Here is the updated one. -- H.J. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 9e33d53..da603b6 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -9490,20 +9490,30 @@ ix86_compute_frame_layout (struct ix86_frame *frame) frame-nregs = ix86_nsaved_regs (); frame-nsseregs = ix86_nsaved_sseregs (); - stack_alignment_needed = crtl-stack_alignment_needed / BITS_PER_UNIT; - preferred_alignment = crtl-preferred_stack_boundary / BITS_PER_UNIT; - /* 64-bit MS ABI seem to require stack alignment to be always 16 except for function prologues and leaf. */ - if ((TARGET_64BIT_MS_ABI preferred_alignment 16) + if ((TARGET_64BIT_MS_ABI crtl-preferred_stack_boundary 128) (!crtl-is_leaf || cfun-calls_alloca != 0 || ix86_current_function_calls_tls_descriptor)) { - preferred_alignment = 16; - stack_alignment_needed = 16; crtl-preferred_stack_boundary = 128; crtl-stack_alignment_needed = 128; } + /* preferred_stack_boundary is never updated for call + expanded from tls descriptor. Update it here. We don't update it in + expand stage because according to the comments before + ix86_current_function_calls_tls_descriptor, tls calls may be optimized + away. */ + else if (ix86_current_function_calls_tls_descriptor + crtl-preferred_stack_boundary PREFERRED_STACK_BOUNDARY) +{ + crtl-preferred_stack_boundary = PREFERRED_STACK_BOUNDARY; + if (crtl-stack_alignment_needed PREFERRED_STACK_BOUNDARY) + crtl-stack_alignment_needed = PREFERRED_STACK_BOUNDARY; +} + + stack_alignment_needed = crtl-stack_alignment_needed / BITS_PER_UNIT; + preferred_alignment = crtl-preferred_stack_boundary / BITS_PER_UNIT; gcc_assert (!size || stack_alignment_needed); gcc_assert (preferred_alignment = STACK_BOUNDARY / BITS_PER_UNIT); diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index ea1d85f..bc8fb1f 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -12859,13 +12859,14 @@ (define_insn *tls_global_dynamic_32_gnu [(set (match_operand:SI 0 register_operand =a) - (unspec:SI - [(match_operand:SI 1 register_operand b) - (match_operand 2 tls_symbolic_operand) - (match_operand 3 constant_call_address_operand z)] - UNSPEC_TLS_GD)) - (clobber (match_scratch:SI 4 =d)) - (clobber (match_scratch:SI 5 =c)) + (call:SI + (mem:QI (match_operand 3 constant_call_address_operand z)) + (match_operand 4))) + (unspec:SI [(match_operand:SI 1 register_operand b) + (match_operand 2 tls_symbolic_operand)] + UNSPEC_TLS_GD) + (clobber (match_scratch:SI 5 =d)) + (clobber (match_scratch:SI 6 =c)) (clobber (reg:CC FLAGS_REG))] !TARGET_64BIT TARGET_GNU_TLS { @@ -12885,13 +12886,19 @@ (define_expand tls_global_dynamic_32 [(parallel [(set (match_operand:SI 0 register_operand) - (unspec:SI [(match_operand:SI 2 register_operand) - (match_operand 1 tls_symbolic_operand) - (match_operand 3 constant_call_address_operand)] - UNSPEC_TLS_GD)) + (call:SI + (mem:QI (match_operand 3 constant_call_address_operand)) + (const_int 0))) + (unspec:SI [(match_operand:SI 2 register_operand) + (match_operand 1 tls_symbolic_operand)] + UNSPEC_TLS_GD) (clobber (match_scratch:SI 4)) (clobber (match_scratch:SI 5)) - (clobber (reg:CC FLAGS_REG))])]) + (clobber (reg:CC FLAGS_REG))])] + +{ + ix86_tls_descriptor_calls_expanded_in_cfun = true; +}) (define_insn *tls_global_dynamic_64_mode [(set (match_operand:P 0 register_operand =a) @@ -12946,16 +12953,20 @@ (const_int 0))) (unspec:P [(match_operand 1 tls_symbolic_operand)] UNSPEC_TLS_GD)])] - TARGET_64BIT) + TARGET_64BIT +{ + ix86_tls_descriptor_calls_expanded_in_cfun = true; +}) (define_insn *tls_local_dynamic_base_32_gnu [(set (match_operand:SI 0 register_operand =a) - (unspec:SI - [(match_operand:SI 1 register_operand b) - (match_operand 2 constant_call_address_operand z)] - UNSPEC_TLS_LD_BASE)) - (clobber (match_scratch:SI 3 =d)) - (clobber (match_scratch:SI 4 =c)) + (call:SI + (mem:QI (match_operand 2 constant_call_address_operand z)) + (match_operand 3))) + (unspec:SI [(match_operand:SI 1 register_operand b)] + UNSPEC_TLS_LD_BASE) + (clobber (match_scratch:SI 4 =d)) + (clobber (match_scratch:SI 5 =c)) (clobber (reg:CC FLAGS_REG))] !TARGET_64BIT TARGET_GNU_TLS { @@ -12976,13 +12987,18 @@ (define_expand tls_local_dynamic_base_32 [(parallel [(set (match_operand:SI 0 register_operand) - (unspec:SI - [(match_operand:SI 1 register_operand) - (match_operand 2
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
I saw the problem last patch had on ia32. Without explicit call in rtl template, scheduler may schedule the sp adjusting insn across tls descriptor and break the alignment assumption. I am testing the updated patch on x86_64. Can we combine the last two patches, both adding call explicitly in rtl template for tls_local_dynamic_base_32/tls_global_dynamic_32, and set ix86_tls_descriptor_calls_expanded_in_cfun to true only after reload complete? Regards, Wei. On Wed, Mar 12, 2014 at 5:33 PM, H.J. Lu hjl.to...@gmail.com wrote: On Wed, Mar 12, 2014 at 5:28 PM, Wei Mi w...@google.com wrote: Does my patch fix the original problem? Yes, it works. I am doing bootstrap and regression test for your patch. Thanks! The patch passes bootstrap and regression test on x86_64-linux-gnu. My patch fails to handle ia32. Here is the updated one. -- H.J.
[PATCH, PR58066] preferred_stack_boundary update for tls expanded call
Hi, This patch is to fix the problem described here: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066 I follow Ian's suggestion and set ix86_tls_descriptor_calls_expanded_in_cfun in tls_global_dynamic_64_mode and tls_local_dynamic_base_64_mode. Although 32bit doesn't have the problem, ix86_tls_descriptor_calls_expanded_in_cfun is also set for tls_global_dynamic_32 and tls_local_dynamic_base_32 to make ix86_tls_descriptor_calls_expanded_in_cfun setting consistent across 32bits and 64bits. If ix86_current_function_calls_tls_descriptor is set, we know that there is tls expanded call in current function. Update crtl-preferred_stack_boundary and crtl-stack_alignment_needed to be no less than PREFERED_STACK_ALIGNMENT at the start of ix86_compute_frame_layout. We don't do the update in legitimize_tls_address in cfgexpand stage, which is too early because according to the comments before ix86_current_function_calls_tls_descriptor, tls call may be optimized away. ix86_compute_frame_layout is the latest place to do the update. bootstrap on x86_64-linux-gnu is ok. regression test is going on. Ok for trunk if tests pass? Thanks, Wei. gcc/ChangeLog: 2014-03-07 Wei Mi w...@google.com * config/i386/i386.c (ix86_compute_frame_layout): Update preferred_stack_boundary when there is tls expanded call. * config/i386/i386.md: Set ix86_tls_descriptor_calls_expanded_in_cfun. gcc/testsuite/ChangeLog: 2014-03-07 Wei Mi w...@google.com * g++.dg/pr58066.C: New test. Index: gcc/config/i386/i386.c === --- gcc/config/i386/i386.c (revision 208410) +++ gcc/config/i386/i386.c (working copy) @@ -9504,6 +9504,19 @@ ix86_compute_frame_layout (struct ix86_f crtl-preferred_stack_boundary = 128; crtl-stack_alignment_needed = 128; } + /* For 64-bit target, preferred_stack_boundary is never updated for call + expanded from tls descriptor. Update it here. We don't update it in + expand stage because according to the comments before + ix86_current_function_calls_tls_descriptor, tls calls may be optimized + away. */ + else if (TARGET_64BIT + ix86_current_function_calls_tls_descriptor + crtl-preferred_stack_boundary PREFERRED_STACK_BOUNDARY) +{ + crtl-preferred_stack_boundary = PREFERRED_STACK_BOUNDARY; + if (crtl-stack_alignment_needed PREFERRED_STACK_BOUNDARY) + crtl-stack_alignment_needed = PREFERRED_STACK_BOUNDARY; +} gcc_assert (!size || stack_alignment_needed); gcc_assert (preferred_alignment = STACK_BOUNDARY / BITS_PER_UNIT); Index: gcc/config/i386/i386.md === --- gcc/config/i386/i386.md (revision 208410) +++ gcc/config/i386/i386.md (working copy) @@ -12891,7 +12891,11 @@ UNSPEC_TLS_GD)) (clobber (match_scratch:SI 4)) (clobber (match_scratch:SI 5)) - (clobber (reg:CC FLAGS_REG))])]) + (clobber (reg:CC FLAGS_REG))])] + +{ + ix86_tls_descriptor_calls_expanded_in_cfun = true; +}) (define_insn *tls_global_dynamic_64_mode [(set (match_operand:P 0 register_operand =a) @@ -12946,7 +12950,10 @@ (const_int 0))) (unspec:P [(match_operand 1 tls_symbolic_operand)] UNSPEC_TLS_GD)])] - TARGET_64BIT) + TARGET_64BIT +{ + ix86_tls_descriptor_calls_expanded_in_cfun = true; +}) (define_insn *tls_local_dynamic_base_32_gnu [(set (match_operand:SI 0 register_operand =a) @@ -12982,7 +12989,11 @@ UNSPEC_TLS_LD_BASE)) (clobber (match_scratch:SI 3)) (clobber (match_scratch:SI 4)) - (clobber (reg:CC FLAGS_REG))])]) + (clobber (reg:CC FLAGS_REG))])] + +{ + ix86_tls_descriptor_calls_expanded_in_cfun = true; +}) (define_insn *tls_local_dynamic_base_64_mode [(set (match_operand:P 0 register_operand =a) @@ -13029,7 +13040,10 @@ (mem:QI (match_operand 1)) (const_int 0))) (unspec:P [(const_int 0)] UNSPEC_TLS_LD_BASE)])] - TARGET_64BIT) + TARGET_64BIT +{ + ix86_tls_descriptor_calls_expanded_in_cfun = true; +}) ;; Local dynamic of a single variable is a lose. Show combine how ;; to convert that back to global dynamic. Index: gcc/testsuite/g++.dg/pr58066.C === --- gcc/testsuite/g++.dg/pr58066.C (revision 0) +++ gcc/testsuite/g++.dg/pr58066.C (revision 0) @@ -0,0 +1,12 @@ +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } lp64 } } } */ +/* { dg-options -fPIC -O2 -m64 } */ + +/* Check whether the stack frame starting address of tls expanded call + in __cxa_get_globals() is 16bytes aligned. */ +static __thread char ccc; +extern C void* __cxa_get_globals() throw() +{ + return ccc; +} + +/* { dg-final { scan-assembler .cfi_def_cfa_offset 16 } } */
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
Regression test is ok. Thanks, Wei. On Fri, Mar 7, 2014 at 1:26 PM, Wei Mi w...@google.com wrote: Hi, This patch is to fix the problem described here: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066 I follow Ian's suggestion and set ix86_tls_descriptor_calls_expanded_in_cfun in tls_global_dynamic_64_mode and tls_local_dynamic_base_64_mode. Although 32bit doesn't have the problem, ix86_tls_descriptor_calls_expanded_in_cfun is also set for tls_global_dynamic_32 and tls_local_dynamic_base_32 to make ix86_tls_descriptor_calls_expanded_in_cfun setting consistent across 32bits and 64bits. If ix86_current_function_calls_tls_descriptor is set, we know that there is tls expanded call in current function. Update crtl-preferred_stack_boundary and crtl-stack_alignment_needed to be no less than PREFERED_STACK_ALIGNMENT at the start of ix86_compute_frame_layout. We don't do the update in legitimize_tls_address in cfgexpand stage, which is too early because according to the comments before ix86_current_function_calls_tls_descriptor, tls call may be optimized away. ix86_compute_frame_layout is the latest place to do the update. bootstrap on x86_64-linux-gnu is ok. regression test is going on. Ok for trunk if tests pass? Thanks, Wei. gcc/ChangeLog: 2014-03-07 Wei Mi w...@google.com * config/i386/i386.c (ix86_compute_frame_layout): Update preferred_stack_boundary when there is tls expanded call. * config/i386/i386.md: Set ix86_tls_descriptor_calls_expanded_in_cfun. gcc/testsuite/ChangeLog: 2014-03-07 Wei Mi w...@google.com * g++.dg/pr58066.C: New test. Index: gcc/config/i386/i386.c === --- gcc/config/i386/i386.c (revision 208410) +++ gcc/config/i386/i386.c (working copy) @@ -9504,6 +9504,19 @@ ix86_compute_frame_layout (struct ix86_f crtl-preferred_stack_boundary = 128; crtl-stack_alignment_needed = 128; } + /* For 64-bit target, preferred_stack_boundary is never updated for call + expanded from tls descriptor. Update it here. We don't update it in + expand stage because according to the comments before + ix86_current_function_calls_tls_descriptor, tls calls may be optimized + away. */ + else if (TARGET_64BIT + ix86_current_function_calls_tls_descriptor + crtl-preferred_stack_boundary PREFERRED_STACK_BOUNDARY) +{ + crtl-preferred_stack_boundary = PREFERRED_STACK_BOUNDARY; + if (crtl-stack_alignment_needed PREFERRED_STACK_BOUNDARY) + crtl-stack_alignment_needed = PREFERRED_STACK_BOUNDARY; +} gcc_assert (!size || stack_alignment_needed); gcc_assert (preferred_alignment = STACK_BOUNDARY / BITS_PER_UNIT); Index: gcc/config/i386/i386.md === --- gcc/config/i386/i386.md (revision 208410) +++ gcc/config/i386/i386.md (working copy) @@ -12891,7 +12891,11 @@ UNSPEC_TLS_GD)) (clobber (match_scratch:SI 4)) (clobber (match_scratch:SI 5)) - (clobber (reg:CC FLAGS_REG))])]) + (clobber (reg:CC FLAGS_REG))])] + +{ + ix86_tls_descriptor_calls_expanded_in_cfun = true; +}) (define_insn *tls_global_dynamic_64_mode [(set (match_operand:P 0 register_operand =a) @@ -12946,7 +12950,10 @@ (const_int 0))) (unspec:P [(match_operand 1 tls_symbolic_operand)] UNSPEC_TLS_GD)])] - TARGET_64BIT) + TARGET_64BIT +{ + ix86_tls_descriptor_calls_expanded_in_cfun = true; +}) (define_insn *tls_local_dynamic_base_32_gnu [(set (match_operand:SI 0 register_operand =a) @@ -12982,7 +12989,11 @@ UNSPEC_TLS_LD_BASE)) (clobber (match_scratch:SI 3)) (clobber (match_scratch:SI 4)) - (clobber (reg:CC FLAGS_REG))])]) + (clobber (reg:CC FLAGS_REG))])] + +{ + ix86_tls_descriptor_calls_expanded_in_cfun = true; +}) (define_insn *tls_local_dynamic_base_64_mode [(set (match_operand:P 0 register_operand =a) @@ -13029,7 +13040,10 @@ (mem:QI (match_operand 1)) (const_int 0))) (unspec:P [(const_int 0)] UNSPEC_TLS_LD_BASE)])] - TARGET_64BIT) + TARGET_64BIT +{ + ix86_tls_descriptor_calls_expanded_in_cfun = true; +}) ;; Local dynamic of a single variable is a lose. Show combine how ;; to convert that back to global dynamic. Index: gcc/testsuite/g++.dg/pr58066.C === --- gcc/testsuite/g++.dg/pr58066.C (revision 0) +++ gcc/testsuite/g++.dg/pr58066.C (revision 0) @@ -0,0 +1,12 @@ +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } lp64 } } } */ +/* { dg-options -fPIC -O2 -m64 } */ + +/* Check whether the stack frame starting address of tls expanded call + in __cxa_get_globals() is 16bytes aligned. */ +static __thread char ccc; +extern C
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
On Fri, Mar 7, 2014 at 1:26 PM, Wei Mi w...@google.com wrote: Hi, This patch is to fix the problem described here: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066 I follow Ian's suggestion and set ix86_tls_descriptor_calls_expanded_in_cfun in tls_global_dynamic_64_mode and tls_local_dynamic_base_64_mode. Although 32bit doesn't have the problem, ix86_tls_descriptor_calls_expanded_in_cfun is also set for tls_global_dynamic_32 and tls_local_dynamic_base_32 to make ix86_tls_descriptor_calls_expanded_in_cfun setting consistent across 32bits and 64bits. If ix86_current_function_calls_tls_descriptor is set, we know that there is tls expanded call in current function. Update crtl-preferred_stack_boundary and crtl-stack_alignment_needed to be no less than PREFERED_STACK_ALIGNMENT at the start of ix86_compute_frame_layout. We don't do the update in legitimize_tls_address in cfgexpand stage, which is too early because according to the comments before ix86_current_function_calls_tls_descriptor, tls call may be optimized away. ix86_compute_frame_layout is the latest place to do the update. bootstrap on x86_64-linux-gnu is ok. regression test is going on. Ok for trunk if tests pass? Thanks, Wei. gcc/ChangeLog: 2014-03-07 Wei Mi w...@google.com * config/i386/i386.c (ix86_compute_frame_layout): Update preferred_stack_boundary when there is tls expanded call. * config/i386/i386.md: Set ix86_tls_descriptor_calls_expanded_in_cfun. gcc/testsuite/ChangeLog: 2014-03-07 Wei Mi w...@google.com * g++.dg/pr58066.C: New test. Index: gcc/config/i386/i386.c === --- gcc/config/i386/i386.c (revision 208410) +++ gcc/config/i386/i386.c (working copy) @@ -9504,6 +9504,19 @@ ix86_compute_frame_layout (struct ix86_f crtl-preferred_stack_boundary = 128; crtl-stack_alignment_needed = 128; } + /* For 64-bit target, preferred_stack_boundary is never updated for call + expanded from tls descriptor. Update it here. We don't update it in + expand stage because according to the comments before + ix86_current_function_calls_tls_descriptor, tls calls may be optimized + away. */ + else if (TARGET_64BIT + ix86_current_function_calls_tls_descriptor + crtl-preferred_stack_boundary PREFERRED_STACK_BOUNDARY) +{ + crtl-preferred_stack_boundary = PREFERRED_STACK_BOUNDARY; + if (crtl-stack_alignment_needed PREFERRED_STACK_BOUNDARY) + crtl-stack_alignment_needed = PREFERRED_STACK_BOUNDARY; +} gcc_assert (!size || stack_alignment_needed); gcc_assert (preferred_alignment = STACK_BOUNDARY / BITS_PER_UNIT); Index: gcc/config/i386/i386.md === --- gcc/config/i386/i386.md (revision 208410) +++ gcc/config/i386/i386.md (working copy) @@ -12891,7 +12891,11 @@ UNSPEC_TLS_GD)) (clobber (match_scratch:SI 4)) (clobber (match_scratch:SI 5)) - (clobber (reg:CC FLAGS_REG))])]) + (clobber (reg:CC FLAGS_REG))])] + +{ + ix86_tls_descriptor_calls_expanded_in_cfun = true; +}) (define_insn *tls_global_dynamic_64_mode [(set (match_operand:P 0 register_operand =a) @@ -12946,7 +12950,10 @@ (const_int 0))) (unspec:P [(match_operand 1 tls_symbolic_operand)] UNSPEC_TLS_GD)])] - TARGET_64BIT) + TARGET_64BIT +{ + ix86_tls_descriptor_calls_expanded_in_cfun = true; +}) (define_insn *tls_local_dynamic_base_32_gnu [(set (match_operand:SI 0 register_operand =a) @@ -12982,7 +12989,11 @@ UNSPEC_TLS_LD_BASE)) (clobber (match_scratch:SI 3)) (clobber (match_scratch:SI 4)) - (clobber (reg:CC FLAGS_REG))])]) + (clobber (reg:CC FLAGS_REG))])] + +{ + ix86_tls_descriptor_calls_expanded_in_cfun = true; +}) (define_insn *tls_local_dynamic_base_64_mode [(set (match_operand:P 0 register_operand =a) @@ -13029,7 +13040,10 @@ (mem:QI (match_operand 1)) (const_int 0))) (unspec:P [(const_int 0)] UNSPEC_TLS_LD_BASE)])] - TARGET_64BIT) + TARGET_64BIT +{ + ix86_tls_descriptor_calls_expanded_in_cfun = true; +}) ;; Local dynamic of a single variable is a lose. Show combine how ;; to convert that back to global dynamic. Index: gcc/testsuite/g++.dg/pr58066.C === --- gcc/testsuite/g++.dg/pr58066.C (revision 0) +++ gcc/testsuite/g++.dg/pr58066.C (revision 0) @@ -0,0 +1,12 @@ +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } lp64 } } } */ ^ It should be not ia32. since we also want to test it for x32. +/* { dg-options -fPIC -O2 -m64 } */ No need to add -m64. + +/* Check whether the stack frame starting address
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
Yes, x32 has the same problem. It should be tested. Fixed. Thanks, Wei. On Fri, Mar 7, 2014 at 2:06 PM, H.J. Lu hjl.to...@gmail.com wrote: On Fri, Mar 7, 2014 at 1:26 PM, Wei Mi w...@google.com wrote: Hi, This patch is to fix the problem described here: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066 I follow Ian's suggestion and set ix86_tls_descriptor_calls_expanded_in_cfun in tls_global_dynamic_64_mode and tls_local_dynamic_base_64_mode. Although 32bit doesn't have the problem, ix86_tls_descriptor_calls_expanded_in_cfun is also set for tls_global_dynamic_32 and tls_local_dynamic_base_32 to make ix86_tls_descriptor_calls_expanded_in_cfun setting consistent across 32bits and 64bits. If ix86_current_function_calls_tls_descriptor is set, we know that there is tls expanded call in current function. Update crtl-preferred_stack_boundary and crtl-stack_alignment_needed to be no less than PREFERED_STACK_ALIGNMENT at the start of ix86_compute_frame_layout. We don't do the update in legitimize_tls_address in cfgexpand stage, which is too early because according to the comments before ix86_current_function_calls_tls_descriptor, tls call may be optimized away. ix86_compute_frame_layout is the latest place to do the update. bootstrap on x86_64-linux-gnu is ok. regression test is going on. Ok for trunk if tests pass? Thanks, Wei. gcc/ChangeLog: 2014-03-07 Wei Mi w...@google.com * config/i386/i386.c (ix86_compute_frame_layout): Update preferred_stack_boundary when there is tls expanded call. * config/i386/i386.md: Set ix86_tls_descriptor_calls_expanded_in_cfun. gcc/testsuite/ChangeLog: 2014-03-07 Wei Mi w...@google.com * g++.dg/pr58066.C: New test. Index: gcc/config/i386/i386.c === --- gcc/config/i386/i386.c (revision 208410) +++ gcc/config/i386/i386.c (working copy) @@ -9504,6 +9504,19 @@ ix86_compute_frame_layout (struct ix86_f crtl-preferred_stack_boundary = 128; crtl-stack_alignment_needed = 128; } + /* For 64-bit target, preferred_stack_boundary is never updated for call + expanded from tls descriptor. Update it here. We don't update it in + expand stage because according to the comments before + ix86_current_function_calls_tls_descriptor, tls calls may be optimized + away. */ + else if (TARGET_64BIT + ix86_current_function_calls_tls_descriptor + crtl-preferred_stack_boundary PREFERRED_STACK_BOUNDARY) +{ + crtl-preferred_stack_boundary = PREFERRED_STACK_BOUNDARY; + if (crtl-stack_alignment_needed PREFERRED_STACK_BOUNDARY) + crtl-stack_alignment_needed = PREFERRED_STACK_BOUNDARY; +} gcc_assert (!size || stack_alignment_needed); gcc_assert (preferred_alignment = STACK_BOUNDARY / BITS_PER_UNIT); Index: gcc/config/i386/i386.md === --- gcc/config/i386/i386.md (revision 208410) +++ gcc/config/i386/i386.md (working copy) @@ -12891,7 +12891,11 @@ UNSPEC_TLS_GD)) (clobber (match_scratch:SI 4)) (clobber (match_scratch:SI 5)) - (clobber (reg:CC FLAGS_REG))])]) + (clobber (reg:CC FLAGS_REG))])] + +{ + ix86_tls_descriptor_calls_expanded_in_cfun = true; +}) (define_insn *tls_global_dynamic_64_mode [(set (match_operand:P 0 register_operand =a) @@ -12946,7 +12950,10 @@ (const_int 0))) (unspec:P [(match_operand 1 tls_symbolic_operand)] UNSPEC_TLS_GD)])] - TARGET_64BIT) + TARGET_64BIT +{ + ix86_tls_descriptor_calls_expanded_in_cfun = true; +}) (define_insn *tls_local_dynamic_base_32_gnu [(set (match_operand:SI 0 register_operand =a) @@ -12982,7 +12989,11 @@ UNSPEC_TLS_LD_BASE)) (clobber (match_scratch:SI 3)) (clobber (match_scratch:SI 4)) - (clobber (reg:CC FLAGS_REG))])]) + (clobber (reg:CC FLAGS_REG))])] + +{ + ix86_tls_descriptor_calls_expanded_in_cfun = true; +}) (define_insn *tls_local_dynamic_base_64_mode [(set (match_operand:P 0 register_operand =a) @@ -13029,7 +13040,10 @@ (mem:QI (match_operand 1)) (const_int 0))) (unspec:P [(const_int 0)] UNSPEC_TLS_LD_BASE)])] - TARGET_64BIT) + TARGET_64BIT +{ + ix86_tls_descriptor_calls_expanded_in_cfun = true; +}) ;; Local dynamic of a single variable is a lose. Show combine how ;; to convert that back to global dynamic. Index: gcc/testsuite/g++.dg/pr58066.C === --- gcc/testsuite/g++.dg/pr58066.C (revision 0) +++ gcc/testsuite/g++.dg/pr58066.C (revision 0) @@ -0,0 +1,12 @@ +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } lp64 } } } */ ^ It should be not ia32. since we also want to test it for x32. +/* { dg-options