[COMMITTED] Fix debug/60438 -- i686 stack vs fp operations

2014-03-13 Thread Richard Henderson
The original ICE is caused by the dwarf2cfi pass not noticing a stack
adjustment in the insn stream.  The reason for the miss is that the push/pop
was added by a post-reload splitter, which did nothing to mark the insn for
special treatment.

In the PR, Jakub and I threw around several ideas.

My first attempt, to annotate the stack adjustment so that dwarf2cfi could see
it, showed a lack of supporting infrastructure within the csa and dwarf2cfi
passes.  In order to make that path work in the short term, csa had to be 
crippled.

My second attempt removes ix86_force_to_memory and all uses thereof.  Thus
there are no longer any troublesome post-reload stack manipulations and
dwarf2cfi doesn't get confused.

In the case of int->float conversions, we can figure out at rtl-expand time
that we might need a bit o stack memory, and we can allocate it via
assign_386_stack_local.  This produces minimal churn in this area, though we
still get to remove some patterns that didn't have the scratch memory.

In the other cases the patterns are created by combine, and there we have no
chance to use assign_386_stack_local.  Here, I simply remove the register
alternatives, leaving the register allocator no choice but to force the value
into memory.  If I recall correctly, the old reload would barf on this (thus
the byzantine structure of the existing patterns).  But Vlad assured me that
LRA will handle this just fine.

Considering that we mostly will never choose the combined
int-convert-and-operate patterns (-Os or ancient cpu tunings only), I think
this is the best of the available options.

For stage1, it would be interesting to investigate whether we can eliminate the
assign_386_stack_local fiddly bits from int->float conversions, and simply rely
on LRA there as well.  It would certainly reduce some code duplication.
Indeed, with clever use of "enabled" perhaps we can reduce to a single pattern.

Tested on x86_64 and i686-linux.


r~
PR debug/60438
* config/i386/i386.c (ix86_split_fp_branch): Remove pushed argument.
(ix86_force_to_memory, ix86_free_from_memory): Remove.
* config/i386/i386-protos.h: Likewise.
* config/i386/i386.md (floathi2): Use assign_386_stack_local
in the expander instead of a splitter.
(float2): Use assign_386_stack_local if there is
any possibility of requiring a memory.
(*floatsi2_vector_mixed): Remove, and the splitters.
(*floatsi2_vector_sse): Remove, and the splitters.
(fp branch splitters): Update for ix86_split_fp_branch.
(*jcc__i387): Remove r/f alternative.
(*jcc__r_i387): Likewise.
(splitter for jcc__i387 r/f): Remove.
(*fop__2_i387): Remove f/r alternative.
(*fop__3_i387): Likewise.
(*fop_xf_2_i387, *fop_xf_3_i387): Likewise.
(splitters for the fop_* register patterns): Remove.
(fscalexf4_i387): Rename from *fscalexf4_i387.
(ldexpxf3): Use gen_floatsixf2 and gen_fscalexf4_i387.



diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 3493904..6e32978 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -154,13 +154,11 @@ extern enum machine_mode ix86_fp_compare_mode (enum 
rtx_code);
 extern rtx ix86_libcall_value (enum machine_mode);
 extern bool ix86_function_arg_regno_p (int);
 extern void ix86_asm_output_function_label (FILE *, const char *, tree);
-extern rtx ix86_force_to_memory (enum machine_mode, rtx);
-extern void ix86_free_from_memory (enum machine_mode);
 extern void ix86_call_abi_override (const_tree);
 extern int ix86_reg_parm_stack_space (const_tree);
 
 extern void ix86_split_fp_branch (enum rtx_code code, rtx, rtx,
- rtx, rtx, rtx, rtx);
+ rtx, rtx, rtx);
 extern bool ix86_hard_regno_mode_ok (int, enum machine_mode);
 extern bool ix86_modes_tieable_p (enum machine_mode, enum machine_mode);
 extern bool ix86_secondary_memory_needed (enum reg_class, enum reg_class,
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 9e33d53..64b8e0a 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -19993,7 +19993,7 @@ ix86_expand_branch (enum rtx_code code, rtx op0, rtx 
op1, rtx label)
 /* Split branch based on floating point condition.  */
 void
 ix86_split_fp_branch (enum rtx_code code, rtx op1, rtx op2,
- rtx target1, rtx target2, rtx tmp, rtx pushed)
+ rtx target1, rtx target2, rtx tmp)
 {
   rtx condition;
   rtx i;
@@ -20009,10 +20009,6 @@ ix86_split_fp_branch (enum rtx_code code, rtx op1, rtx 
op2,
   condition = ix86_expand_fp_compare (code, op1, op2,
  tmp);
 
-  /* Remove pushed operand from stack.  */
-  if (pushed)
-ix86_free_from_memory (GET_MODE (pushed));
-
   i = emit_jump_insn (gen_rtx_SET
  (VOIDmode, pc_rtx,
   gen_rtx_IF_THEN_ELSE (VOIDmode,
@@ 

Re: [COMMITTED] Fix debug/60438 -- i686 stack vs fp operations

2014-04-26 Thread Tom de Vries

On 13-03-14 21:49, Richard Henderson wrote:

  (define_expand "ldexpxf3"
-  [(set (match_dup 3)
-   (float:XF (match_operand:SI 2 "register_operand")))
-   (parallel [(set (match_operand:XF 0 " register_operand")
-  (unspec:XF [(match_operand:XF 1 "register_operand")
-  (match_dup 3)]
- UNSPEC_FSCALE_FRACT))
- (set (match_dup 4)
-  (unspec:XF [(match_dup 1) (match_dup 3)]
- UNSPEC_FSCALE_EXP))])]
+  [(match_operand:XF 0 "register_operand")
+   (match_operand:XF 1 "register_operand")
+   (match_operand:SI 2 "register_operand")]
"TARGET_USE_FANCY_MATH_387
 && flag_unsafe_math_optimizations"
  {
@@ -14808,6 +14633,11 @@

operands[3] = gen_reg_rtx (XFmode);
operands[4] = gen_reg_rtx (XFmode);
+
+  emit_insn (gen_floatsixf2 (operands[3], operands[2]));
+  emit_insn (gen_fscalexf4_i387 (operands[0], operands[4],
+ operands[1], operands[3]));
+  DONE;
  })


Richard,

For a non-bootstrap x86_64 build, gcc.dg/builtins-34.c fails for me with a 
sigsegv.

I've traced it back to this code in insn-emit.c:
...
rtx
gen_ldexpxf3 (rtx operand0,
rtx operand1,
rtx operand2)
{
  rtx _val = 0;
  start_sequence ();
  {
rtx operands[3];
operands[0] = operand0;
operands[1] = operand1;
operands[2] = operand2;

{
  if (optimize_insn_for_size_p ())
FAIL;

  operands[3] = gen_reg_rtx (XFmode);
  operands[4] = gen_reg_rtx (XFmode);
...

operands is declared with size 3, and operands[3,4] accesses are out of bounds.

I've done a minimal build with attached patch, and reran the test-case, which 
passes now.


OK if bootstrap succeeds?

Thanks,
- Tom
2014-04-26  Tom de Vries  

	* config/i386/i386.md (define_expand "ldexpxf3"): Fix out-of-bounds
	array accesses.
---
 gcc/config/i386/i386.md | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 25e2e93..9f103cf 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -14427,15 +14427,16 @@
   "TARGET_USE_FANCY_MATH_387
&& flag_unsafe_math_optimizations"
 {
+  rtx tmp1, tmp2;
   if (optimize_insn_for_size_p ())
 FAIL;
 
-  operands[3] = gen_reg_rtx (XFmode);
-  operands[4] = gen_reg_rtx (XFmode);
+  tmp1 = gen_reg_rtx (XFmode);
+  tmp2 = gen_reg_rtx (XFmode);
 
-  emit_insn (gen_floatsixf2 (operands[3], operands[2]));
-  emit_insn (gen_fscalexf4_i387 (operands[0], operands[4],
- operands[1], operands[3]));
+  emit_insn (gen_floatsixf2 (tmp1, operands[2]));
+  emit_insn (gen_fscalexf4_i387 (operands[0], tmp2,
+ operands[1], tmp1));
   DONE;
 })
 
-- 
1.8.3.2



Re: [COMMITTED] Fix debug/60438 -- i686 stack vs fp operations

2014-04-26 Thread Uros Bizjak
On Sat, Apr 26, 2014 at 11:27 AM, Tom de Vries  wrote:
> On 13-03-14 21:49, Richard Henderson wrote:
>>
>>   (define_expand "ldexpxf3"
>> -  [(set (match_dup 3)
>> -   (float:XF (match_operand:SI 2 "register_operand")))
>> -   (parallel [(set (match_operand:XF 0 " register_operand")
>> -  (unspec:XF [(match_operand:XF 1 "register_operand")
>> -  (match_dup 3)]
>> - UNSPEC_FSCALE_FRACT))
>> - (set (match_dup 4)
>> -  (unspec:XF [(match_dup 1) (match_dup 3)]
>> - UNSPEC_FSCALE_EXP))])]
>> +  [(match_operand:XF 0 "register_operand")
>> +   (match_operand:XF 1 "register_operand")
>> +   (match_operand:SI 2 "register_operand")]
>> "TARGET_USE_FANCY_MATH_387
>>  && flag_unsafe_math_optimizations"
>>   {
>> @@ -14808,6 +14633,11 @@
>>
>> operands[3] = gen_reg_rtx (XFmode);
>> operands[4] = gen_reg_rtx (XFmode);
>> +
>> +  emit_insn (gen_floatsixf2 (operands[3], operands[2]));
>> +  emit_insn (gen_fscalexf4_i387 (operands[0], operands[4],
>> + operands[1], operands[3]));
>> +  DONE;
>>   })
>
>
> Richard,
>
> For a non-bootstrap x86_64 build, gcc.dg/builtins-34.c fails for me with a
> sigsegv.
>
> I've traced it back to this code in insn-emit.c:
> ...
> rtx
> gen_ldexpxf3 (rtx operand0,
> rtx operand1,
> rtx operand2)
> {
>   rtx _val = 0;
>   start_sequence ();
>   {
> rtx operands[3];
> operands[0] = operand0;
> operands[1] = operand1;
> operands[2] = operand2;
>
> {
>   if (optimize_insn_for_size_p ())
> FAIL;
>
>   operands[3] = gen_reg_rtx (XFmode);
>   operands[4] = gen_reg_rtx (XFmode);
> ...
>
> operands is declared with size 3, and operands[3,4] accesses are out of
> bounds.
>
> I've done a minimal build with attached patch, and reran the test-case,
> which passes now.
>
> OK if bootstrap succeeds?
>
> 2014-04-26  Tom de Vries  
>
> * config/i386/i386.md (define_expand "ldexpxf3"): Fix out-of-bounds
> array accesses.

OK for mainline and 4.9 branch.

Thanks,
Uros.


Re: [COMMITTED] Fix debug/60438 -- i686 stack vs fp operations

2014-04-26 Thread Tom de Vries

OK if bootstrap succeeds?


With testing of the bootstrap build of the patch, I ran into the following 
regression compared to a reference bootstrap build without the patch:

...
FAIL: g++.dg/tsan/cond_race.C  -O2  output pattern test, is ==3087==WARNING: 
Program is run with unlimited stack size, which wouldn't work with Threa\

dSanitizer.
==3087==Re-execing with stack size limited to 33554432 bytes.
==
WARNING: ThreadSanitizer: heap-use-after-free (pid=3087)
  Read of size 8 at 0x7d18efc8 by thread T1:
#0 pthread_cond_signal 
/home/vries/gcc_versions/data/test-fix-expand-ldexp/with/src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.i\

nc:2266 (libtsan.so.0+0x00039b21)
#1 thr(void*) 
/home/vries/gcc_versions/data/test-fix-expand-ldexp/with/src/gcc/testsuite/g++.dg/tsan/cond_race.C:20 
(cond_race.exe+0x1033\

)
  Previous write of size 8 at 0x7d18efc8 by main thread:
#0 operator delete(void*) 
/home/vries/gcc_versions/data/test-fix-expand-ldexp/with/src/libsanitizer/tsan/tsan_interceptors.cc:592 
(libtsan.so.0+0\

x000494b0)
#1 main 
/home/vries/gcc_versions/data/test-fix-expand-ldexp/with/src/gcc/testsuite/g++.dg/tsan/cond_race.C:34 
(cond_race.exe+0x0ea0)

  Location is heap block of size 96 at 0x7d18efa0 allocated by main thread:
#0 operator new(unsigned long) 
/home/vries/gcc_versions/data/test-fix-expand-ldexp/with/src/libsanitizer/tsan/tsan_interceptors.cc:560 
(libtsan.s\

o.0+0x000496f2)
#1 main 
/home/vries/gcc_versions/data/test-fix-expand-ldexp/with/src/gcc/testsuite/g++.dg/tsan/cond_race.C:25 
(cond_race.exe+0x0e12)

  Thread T1 (tid=3101, running) created by main thread at:
#0 pthread_create 
/home/vries/gcc_versions/data/test-fix-expand-ldexp/with/src/libsanitizer/tsan/tsan_interceptors.cc:877 
(libtsan.so.0+0x000\

47c23)
#1 main 
/home/vries/gcc_versions/data/test-fix-expand-ldexp/with/src/gcc/testsuite/g++.dg/tsan/cond_race.C:29 
(cond_race.exe+0x0e5a)
SUMMARY: ThreadSanitizer: heap-use-after-free 
/home/vries/gcc_versions/data/test-fix-expand-ldexp/with/src/gcc/testsuite/g++.dg/tsan/cond_race.C:20 
t\

hr(void*)
==
ThreadSanitizer: reported 1 warnings
, should match ThreadSanitizer: data race.*pthread_cond_signal.*
...

I've found the same failure here: 
http://gcc.gnu.org/ml/gcc-testresults/2014-01/msg00127.html, so I'm assuming 
it's a spurious failure.


I've committed to trunk and 4.9.

Thanks,
- Tom