Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg

2016-08-04 Thread James Greenhalgh
On Wed, Aug 03, 2016 at 04:08:30PM -0600, Jeff Law wrote:
> On 08/03/2016 11:41 AM, Bernd Edlinger wrote:
> >On 08/03/16 17:38, Jeff Law wrote:
> >>cse.c changes look good, but I'd really like to see a testcase for each
> >>issue in the dejagnu framework.  Extra points if you tried to build a
> >>unit test using David M's framework, but that isn't required.
> >>
> >>The testcase from 70903 ought to be trivial to add to the dejagnu suite.
> >>  71779 might be more difficult, but if you could take a stab, it'd be
> >>appreciated.
> >>
> >
> >
> >Yes, sure.  I had assumed that the pr70903 test case is using some
> >target-specific vector types, but now I see that it even works as-is in
> >the gcc.c-torture/execute directory.
> >
> >So I've added the test case to the cse patch.  And quickly verified that
> >it works on x86_64-linux-gnu.
> >
> >
> >The pr71779 test case will be pretty difficult to reduce, because it
> >depends on combine to do the incorrect transformation and lra to spill
> >the subreg, and on the stack content at runtime to be non-zero.
> >
> >But technically it *is* already in the isl-test suite, so if isl is
> >in-tree, it is always executed by make check or make check-isl.
> >
> >It is just that gmp/mpfr/mpc and isl test results are not included by
> >contrib/test_summary, but that should be fixable.  What do you think?
> >
> >Actually that should not be too difficult, as there are test-suite.log
> >files that we could just added to the test_summary output as-is, for
> >instance:
> >
> >cat isl/test-suite.log
> >
> >==
> >isl 0.16.1: ./test-suite.log
> >==
> >
> ># TOTAL: 5
> ># PASS:  5
> ># SKIP:  0
> ># XFAIL: 0
> ># FAIL:  0
> ># XPASS: 0
> ># ERROR: 0
> >
> >.. contents:: :depth: 2
> >
> >
> >Are the patches OK now?
> Yes.  Thanks for taking care of this...
> 

Hi Bernd,

Thanks for fixing this, but it looks like you accidentally double-added
the pr70903.c testcase.

  Failures:
gcc.c-torture/execute/pr70903.c

  Bisected to:

  Author: edlinger 
  Date:   Thu Aug 4 13:20:57 2016 +

2016-08-04  Bernd Edlinger  

PR rtl-optimization/70903
* cse.c (cse_insn): If DEST is a paradoxical SUBREG, don't record 
DEST.

testsuite:
2016-08-04  Bernd Edlinger  

PR rtl-optimization/70903
* gcc.c-torture/execute/pr70903.c: New test.

  .../gcc/testsuite/gcc.c-torture/execute/pr70903.c:25:1: error: redefinition 
of 'foo'
  .../gcc/testsuite/gcc.c-torture/execute/pr70903.c:6:1: note: previous 
definition of 'foo' was here
  .../gcc/testsuite/gcc.c-torture/execute/pr70903.c:31:5: error: redefinition 
of 'main'
  .../gcc/testsuite/gcc.c-torture/execute/pr70903.c:12:5: note: previous 
definition of 'main' was here

I've fixed that up as so in revision 239142, I hope you agree the change
is obvious.

Thanks,
James

---
2016-08-04  James Greenhalgh  

* gcc.c-torture/execute/pr70903.c: Fix duplicate body.


diff --git a/gcc/testsuite/gcc.c-torture/execute/pr70903.c 
b/gcc/testsuite/gcc.c-torture/execute/pr70903.c
index 6ffd0aa..175ad1a 100644
--- a/gcc/testsuite/gcc.c-torture/execute/pr70903.c
+++ b/gcc/testsuite/gcc.c-torture/execute/pr70903.c
@@ -17,22 +17,4 @@ int main ()
 __builtin_abort();
   return 0;
 }
-typedef unsigned char V8 __attribute__ ((vector_size (32)));
-typedef unsigned int V32 __attribute__ ((vector_size (32)));
-typedef unsigned long long V64 __attribute__ ((vector_size (32)));
-
-static V32 __attribute__ ((noinline, noclone))
-foo (V64 x)
-{
-  V64 y = (V64)(V8){((V8)(V64){65535, x[0]})[1]};
-  return (V32){y[0], 255};
-}
 
-int main ()
-{
-  V32 x = foo ((V64){});
-//  __builtin_printf ("%08x %08x %08x %08x %08x %08x %08x %08x\n", x[0], x[1], 
x[2], x[3], x[4], x[5], x[6], x[7]);
-  if (x[1] != 255)
-__builtin_abort();
-  return 0;
-}




Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg

2016-08-03 Thread Jeff Law

On 08/03/2016 11:41 AM, Bernd Edlinger wrote:

On 08/03/16 17:38, Jeff Law wrote:

cse.c changes look good, but I'd really like to see a testcase for each
issue in the dejagnu framework.  Extra points if you tried to build a
unit test using David M's framework, but that isn't required.

The testcase from 70903 ought to be trivial to add to the dejagnu suite.
  71779 might be more difficult, but if you could take a stab, it'd be
appreciated.




Yes, sure.  I had assumed that the pr70903 test case is using some
target-specific vector types, but now I see that it even works as-is in
the gcc.c-torture/execute directory.

So I've added the test case to the cse patch.  And quickly verified that
it works on x86_64-linux-gnu.


The pr71779 test case will be pretty difficult to reduce, because it
depends on combine to do the incorrect transformation and lra to spill
the subreg, and on the stack content at runtime to be non-zero.

But technically it *is* already in the isl-test suite, so if isl is
in-tree, it is always executed by make check or make check-isl.

It is just that gmp/mpfr/mpc and isl test results are not included by
contrib/test_summary, but that should be fixable.  What do you think?

Actually that should not be too difficult, as there are test-suite.log
files that we could just added to the test_summary output as-is, for
instance:

cat isl/test-suite.log

==
isl 0.16.1: ./test-suite.log
==

# TOTAL: 5
# PASS:  5
# SKIP:  0
# XFAIL: 0
# FAIL:  0
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2


Are the patches OK now?

Yes.  Thanks for taking care of this...

Jeff



Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg

2016-08-03 Thread Bernd Edlinger
On 08/03/16 17:38, Jeff Law wrote:
> cse.c changes look good, but I'd really like to see a testcase for each
> issue in the dejagnu framework.  Extra points if you tried to build a
> unit test using David M's framework, but that isn't required.
>
> The testcase from 70903 ought to be trivial to add to the dejagnu suite.
>   71779 might be more difficult, but if you could take a stab, it'd be
> appreciated.
>


Yes, sure.  I had assumed that the pr70903 test case is using some
target-specific vector types, but now I see that it even works as-is in
the gcc.c-torture/execute directory.

So I've added the test case to the cse patch.  And quickly verified that
it works on x86_64-linux-gnu.


The pr71779 test case will be pretty difficult to reduce, because it
depends on combine to do the incorrect transformation and lra to spill
the subreg, and on the stack content at runtime to be non-zero.

But technically it *is* already in the isl-test suite, so if isl is
in-tree, it is always executed by make check or make check-isl.

It is just that gmp/mpfr/mpc and isl test results are not included by
contrib/test_summary, but that should be fixable.  What do you think?

Actually that should not be too difficult, as there are test-suite.log
files that we could just added to the test_summary output as-is, for
instance:

cat isl/test-suite.log

==
isl 0.16.1: ./test-suite.log
==

# TOTAL: 5
# PASS:  5
# SKIP:  0
# XFAIL: 0
# FAIL:  0
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2


Are the patches OK now?


Thanks
Bernd.
2016-08-01  Bernd Edlinger  

	PR rtl-optimization/70903
	* cse.c (cse_insn): If DEST is a paradoxical SUBREG, don't record DEST.

testsuite:
2016-08-01  Bernd Edlinger  

	PR rtl-optimization/70903
	* gcc.c-torture/execute/pr70903.c: New test.

Index: gcc/cse.c
===
--- gcc/cse.c	(revision 238915)
+++ gcc/cse.c	(working copy)
@@ -5898,15 +5898,7 @@ cse_insn (rtx_insn *insn)
 	|| GET_MODE (dest) == BLKmode
 	/* If we didn't put a REG_EQUAL value or a source into the hash
 	   table, there is no point is recording DEST.  */
-	|| sets[i].src_elt == 0
-	/* If DEST is a paradoxical SUBREG and SRC is a ZERO_EXTEND
-	   or SIGN_EXTEND, don't record DEST since it can cause
-	   some tracking to be wrong.
-
-	   ??? Think about this more later.  */
-	|| (paradoxical_subreg_p (dest)
-		&& (GET_CODE (sets[i].src) == SIGN_EXTEND
-		|| GET_CODE (sets[i].src) == ZERO_EXTEND)))
+	|| sets[i].src_elt == 0)
 	  continue;
 
 	/* STRICT_LOW_PART isn't part of the value BEING set,
@@ -5925,6 +5917,11 @@ cse_insn (rtx_insn *insn)
 	  sets[i].dest_hash = HASH (dest, GET_MODE (dest));
 	}
 
+	/* If DEST is a paradoxical SUBREG, don't record DEST since the bits
+	   outside the mode of GET_MODE (SUBREG_REG (dest)) are undefined.  */
+	if (paradoxical_subreg_p (dest))
+	  continue;
+
 	elt = insert (dest, sets[i].src_elt,
 		  sets[i].dest_hash, GET_MODE (dest));
 
Index: gcc/testsuite/gcc.c-torture/execute/pr70903.c
===
--- gcc/testsuite/gcc.c-torture/execute/pr70903.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/pr70903.c	(working copy)
@@ -0,0 +1,19 @@
+typedef unsigned char V8 __attribute__ ((vector_size (32)));
+typedef unsigned int V32 __attribute__ ((vector_size (32)));
+typedef unsigned long long V64 __attribute__ ((vector_size (32)));
+
+static V32 __attribute__ ((noinline, noclone))
+foo (V64 x)
+{
+  V64 y = (V64)(V8){((V8)(V64){65535, x[0]})[1]};
+  return (V32){y[0], 255};
+}
+
+int main ()
+{
+  V32 x = foo ((V64){});
+//  __builtin_printf ("%08x %08x %08x %08x %08x %08x %08x %08x\n", x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7]);
+  if (x[1] != 255)
+__builtin_abort();
+  return 0;
+}
2016-08-01  Bernd Edlinger  

	PR rtl-optimization/71779
	* emit-rtl.c (set_reg_attrs_from_value): Only propagate REG_POINTER,
	if the value was sign-extended according to POINTERS_EXTEND_UNSIGNED
	or if it was truncated.

Index: gcc/emit-rtl.c
===
--- gcc/emit-rtl.c	(revision 238915)
+++ gcc/emit-rtl.c	(working copy)
@@ -1156,7 +1156,11 @@ set_reg_attrs_from_value (rtx reg, rtx x)
 {
 #if defined(POINTERS_EXTEND_UNSIGNED)
   if (((GET_CODE (x) == SIGN_EXTEND && POINTERS_EXTEND_UNSIGNED)
-	   || (GET_CODE (x) != SIGN_EXTEND && ! POINTERS_EXTEND_UNSIGNED))
+	   || (GET_CODE (x) == ZERO_EXTEND && ! POINTERS_EXTEND_UNSIGNED)
+	   || (paradoxical_subreg_p (x)
+	   && ! (SUBREG_PROMOTED_VAR_P (x)
+		 && SUBREG_CHECK_PROMOTED_SIGN (x,
+		POINTERS_EXTEND_UNSIGNED
 	  && !targetm.have_ptr_extend ())
 	can_be_reg_pointer = false;
 #endif


Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg

2016-08-03 Thread Jeff Law

On 08/01/2016 12:52 PM, Bernd Edlinger wrote:

Hi Jeff,

On 08/01/16 19:54, Jeff Law wrote:

> Looks like you've probably nailed it.  It'll be interesting see if
> there's any fallout (though our RTL optimizer testing is pretty weak, so
> even if there were, I doubt we'd catch it).
>

If there is, it will probably a performance regression...

Anyway I'd say these two patches do just disable actually wrong
transformations.  So here are both patches as separate diffs
with your suggestion for the comment in cse_insn.

I believe that on x86_64 both patches do not change a single bit.

However I think there are more paradoxical subregs generated all over,
but the aarch64 insv code pattern did trigger more hidden bugs than
any other port.  It is certainly unfortunate that the major source
of paradoxical subreg is in a target-dependent code path :(

Please apologize that I am not able to reduce/finalize the aarch64 test
case at this time, as I usually only work with arm and intel targets,
but I made an exception here, because a bug like that may affect all
targets sooner or later.


Boot-strap and reg-testing on x86_64-linux-gnu.
Plus aarch64 bootstrap and isl-testing by Andreas.


Is it OK for trunk?
cse.c changes look good, but I'd really like to see a testcase for each 
issue in the dejagnu framework.  Extra points if you tried to build a 
unit test using David M's framework, but that isn't required.


The testcase from 70903 ought to be trivial to add to the dejagnu suite. 
 71779 might be more difficult, but if you could take a stab, it'd be 
appreciated.


jeff





Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg

2016-08-03 Thread Jeff Law

On 08/02/2016 02:16 PM, Bernd Schmidt wrote:

On 08/02/2016 06:46 PM, Segher Boessenkool wrote:

On Tue, Aug 02, 2016 at 09:21:34AM -0600, Jeff Law wrote:

However I think there are more paradoxical subregs generated all over,
but the aarch64 insv code pattern did trigger more hidden bugs than
any other port.  It is certainly unfortunate that the major source
of paradoxical subreg is in a target-dependent code path :(


It is certainly unfortunate that paradoxical subregs exist at all!  :-)

Yea.  It probably seemed like a good idea 25-30 years ago, but I always
cringe when I see them being used.  Yea it gives the compiler some more
freedom, but more often than not I think we'd be better off with real
extensions.


And then perhaps have some bits marked as "do not care", perhaps using
a register note...  This would help other cases as well.


I'm thinking maybe an any_extend code to go along with sign_extend and
zero_extend. If input and output registers are the same it would be
treated like a no-op move. That might be close enough to get us the
benefits of a paradoxical subreg.

I was kind of thinking along the same lines.  Not high priority though.

Jeff


Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg

2016-08-03 Thread Bernd Edlinger
Hi,

Is it OK for the trunk?

I guess so, but need an explicit OK.


Thanks
Bernd.

On 08/01/16 20:52, Bernd Edlinger wrote:
> Hi Jeff,
>
> On 08/01/16 19:54, Jeff Law wrote:
>> Looks like you've probably nailed it.  It'll be interesting see if
>> there's any fallout (though our RTL optimizer testing is pretty weak, so
>> even if there were, I doubt we'd catch it).
>>
>
> If there is, it will probably a performance regression...
>
> Anyway I'd say these two patches do just disable actually wrong
> transformations.  So here are both patches as separate diffs
> with your suggestion for the comment in cse_insn.
>
> I believe that on x86_64 both patches do not change a single bit.
>
> However I think there are more paradoxical subregs generated all over,
> but the aarch64 insv code pattern did trigger more hidden bugs than
> any other port.  It is certainly unfortunate that the major source
> of paradoxical subreg is in a target-dependent code path :(
>
> Please apologize that I am not able to reduce/finalize the aarch64 test
> case at this time, as I usually only work with arm and intel targets,
> but I made an exception here, because a bug like that may affect all
> targets sooner or later.
>
>
> Boot-strap and reg-testing on x86_64-linux-gnu.
> Plus aarch64 bootstrap and isl-testing by Andreas.
>
>
> Is it OK for trunk?
>
>
>
> Thanks
> Bernd.


Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg

2016-08-02 Thread Bernd Schmidt

On 08/02/2016 06:46 PM, Segher Boessenkool wrote:

On Tue, Aug 02, 2016 at 09:21:34AM -0600, Jeff Law wrote:

However I think there are more paradoxical subregs generated all over,
but the aarch64 insv code pattern did trigger more hidden bugs than
any other port.  It is certainly unfortunate that the major source
of paradoxical subreg is in a target-dependent code path :(


It is certainly unfortunate that paradoxical subregs exist at all!  :-)

Yea.  It probably seemed like a good idea 25-30 years ago, but I always
cringe when I see them being used.  Yea it gives the compiler some more
freedom, but more often than not I think we'd be better off with real
extensions.


And then perhaps have some bits marked as "do not care", perhaps using
a register note...  This would help other cases as well.


I'm thinking maybe an any_extend code to go along with sign_extend and 
zero_extend. If input and output registers are the same it would be 
treated like a no-op move. That might be close enough to get us the 
benefits of a paradoxical subreg.



Bernd



Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg

2016-08-02 Thread Jeff Law

On 08/02/2016 11:46 AM, Richard Biener wrote:

But we love to exploit undefined behavior elsewhere, too.  Now the
init-regs pass comes to my mind again (papering over issues
elsewhere)..
True.  I just haven't seen that the don't care bits created by 
paradoxical subregs has actually been all that useful in practice.


In fact, I've seen subregs get in the way often, but if the code had a 
normal extension it would have been optimized better.


Jeff


Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg

2016-08-02 Thread Richard Biener
On August 2, 2016 5:21:34 PM GMT+02:00, Jeff Law  wrote:
>On 08/01/2016 05:31 PM, Segher Boessenkool wrote:
>> Hi,
>>
>> On Mon, Aug 01, 2016 at 06:52:54PM +, Bernd Edlinger wrote:
>>> On 08/01/16 19:54, Jeff Law wrote:
 Looks like you've probably nailed it.  It'll be interesting see if
 there's any fallout (though our RTL optimizer testing is pretty
>weak, so
 even if there were, I doubt we'd catch it).
>>>
>>> If there is, it will probably a performance regression...
>>
>> I tested building Linux with and without the patch, on many archs.
>> The few that show differences are:
>>
>>alpha   6148872   6148776
>> ia64  16946958  16946670
>> s390  12345770  12345850
>> tile  12016086  12016070
>>
>> (left before, right after; arm and aarch64 did not build, kernel
>problems).
>>
>> So all except s390 generate smaller code even.
>They're all deep enough in the noise that I wouldn't care either way
>:-)
>
>>
>>> However I think there are more paradoxical subregs generated all
>over,
>>> but the aarch64 insv code pattern did trigger more hidden bugs than
>>> any other port.  It is certainly unfortunate that the major source
>>> of paradoxical subreg is in a target-dependent code path :(
>>
>> It is certainly unfortunate that paradoxical subregs exist at all! 
>:-)
>Yea.  It probably seemed like a good idea 25-30 years ago, but I always
>
>cringe when I see them being used.  Yea it gives the compiler some more
>
>freedom, but more often than not I think we'd be better off with real 
>extensions.

But we love to exploit undefined behavior elsewhere, too.  Now the init-regs 
pass comes to my mind again (papering over issues elsewhere)..

Richard.


>jeff




Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg

2016-08-02 Thread Segher Boessenkool
On Tue, Aug 02, 2016 at 09:21:34AM -0600, Jeff Law wrote:
> >>However I think there are more paradoxical subregs generated all over,
> >>but the aarch64 insv code pattern did trigger more hidden bugs than
> >>any other port.  It is certainly unfortunate that the major source
> >>of paradoxical subreg is in a target-dependent code path :(
> >
> >It is certainly unfortunate that paradoxical subregs exist at all!  :-)
> Yea.  It probably seemed like a good idea 25-30 years ago, but I always 
> cringe when I see them being used.  Yea it gives the compiler some more 
> freedom, but more often than not I think we'd be better off with real 
> extensions.

And then perhaps have some bits marked as "do not care", perhaps using
a register note...  This would help other cases as well.


Segher


Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg

2016-08-02 Thread Jeff Law

On 08/01/2016 05:31 PM, Segher Boessenkool wrote:

Hi,

On Mon, Aug 01, 2016 at 06:52:54PM +, Bernd Edlinger wrote:

On 08/01/16 19:54, Jeff Law wrote:

Looks like you've probably nailed it.  It'll be interesting see if
there's any fallout (though our RTL optimizer testing is pretty weak, so
even if there were, I doubt we'd catch it).


If there is, it will probably a performance regression...


I tested building Linux with and without the patch, on many archs.
The few that show differences are:

   alpha   6148872   6148776
ia64  16946958  16946670
s390  12345770  12345850
tile  12016086  12016070

(left before, right after; arm and aarch64 did not build, kernel problems).

So all except s390 generate smaller code even.

They're all deep enough in the noise that I wouldn't care either way :-)




However I think there are more paradoxical subregs generated all over,
but the aarch64 insv code pattern did trigger more hidden bugs than
any other port.  It is certainly unfortunate that the major source
of paradoxical subreg is in a target-dependent code path :(


It is certainly unfortunate that paradoxical subregs exist at all!  :-)
Yea.  It probably seemed like a good idea 25-30 years ago, but I always 
cringe when I see them being used.  Yea it gives the compiler some more 
freedom, but more often than not I think we'd be better off with real 
extensions.


jeff



Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg

2016-08-01 Thread Segher Boessenkool
Hi,

On Mon, Aug 01, 2016 at 06:52:54PM +, Bernd Edlinger wrote:
> On 08/01/16 19:54, Jeff Law wrote:
> > Looks like you've probably nailed it.  It'll be interesting see if
> > there's any fallout (though our RTL optimizer testing is pretty weak, so
> > even if there were, I doubt we'd catch it).
> 
> If there is, it will probably a performance regression...

I tested building Linux with and without the patch, on many archs.
The few that show differences are:

   alpha   6148872   6148776
ia64  16946958  16946670
s390  12345770  12345850
tile  12016086  12016070

(left before, right after; arm and aarch64 did not build, kernel problems).

So all except s390 generate smaller code even.

> However I think there are more paradoxical subregs generated all over,
> but the aarch64 insv code pattern did trigger more hidden bugs than
> any other port.  It is certainly unfortunate that the major source
> of paradoxical subreg is in a target-dependent code path :(

It is certainly unfortunate that paradoxical subregs exist at all!  :-)


Segher


Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg

2016-08-01 Thread Bernd Edlinger
Hi Jeff,

On 08/01/16 19:54, Jeff Law wrote:
> Looks like you've probably nailed it.  It'll be interesting see if
> there's any fallout (though our RTL optimizer testing is pretty weak, so
> even if there were, I doubt we'd catch it).
>

If there is, it will probably a performance regression...

Anyway I'd say these two patches do just disable actually wrong
transformations.  So here are both patches as separate diffs
with your suggestion for the comment in cse_insn.

I believe that on x86_64 both patches do not change a single bit.

However I think there are more paradoxical subregs generated all over,
but the aarch64 insv code pattern did trigger more hidden bugs than
any other port.  It is certainly unfortunate that the major source
of paradoxical subreg is in a target-dependent code path :(

Please apologize that I am not able to reduce/finalize the aarch64 test
case at this time, as I usually only work with arm and intel targets, 
but I made an exception here, because a bug like that may affect all
targets sooner or later.


Boot-strap and reg-testing on x86_64-linux-gnu.
Plus aarch64 bootstrap and isl-testing by Andreas.


Is it OK for trunk?



Thanks
Bernd.
2016-08-01  Bernd Edlinger  

	PR rtl-optimization/70903
	* cse.c (cse_insn): If DEST is a paradoxical SUBREG, don't record DEST.

Index: gcc/cse.c
===
--- gcc/cse.c	(revision 238915)
+++ gcc/cse.c	(working copy)
@@ -5898,15 +5898,7 @@ cse_insn (rtx_insn *insn)
 	|| GET_MODE (dest) == BLKmode
 	/* If we didn't put a REG_EQUAL value or a source into the hash
 	   table, there is no point is recording DEST.  */
-	|| sets[i].src_elt == 0
-	/* If DEST is a paradoxical SUBREG and SRC is a ZERO_EXTEND
-	   or SIGN_EXTEND, don't record DEST since it can cause
-	   some tracking to be wrong.
-
-	   ??? Think about this more later.  */
-	|| (paradoxical_subreg_p (dest)
-		&& (GET_CODE (sets[i].src) == SIGN_EXTEND
-		|| GET_CODE (sets[i].src) == ZERO_EXTEND)))
+	|| sets[i].src_elt == 0)
 	  continue;
 
 	/* STRICT_LOW_PART isn't part of the value BEING set,
@@ -5925,6 +5917,11 @@ cse_insn (rtx_insn *insn)
 	  sets[i].dest_hash = HASH (dest, GET_MODE (dest));
 	}
 
+	/* If DEST is a paradoxical SUBREG, don't record DEST since the bits
+	   outside the mode of GET_MODE (SUBREG_REG (dest)) are undefined.  */
+	if (paradoxical_subreg_p (dest))
+	  continue;
+
 	elt = insert (dest, sets[i].src_elt,
 		  sets[i].dest_hash, GET_MODE (dest));
 
2016-08-01  Bernd Edlinger  

	PR rtl-optimization/71779
	* emit-rtl.c (set_reg_attrs_from_value): Only propagate REG_POINTER,
	if the value was sign-extended according to POINTERS_EXTEND_UNSIGNED
	or if it was truncated.

Index: gcc/emit-rtl.c
===
--- gcc/emit-rtl.c	(revision 238915)
+++ gcc/emit-rtl.c	(working copy)
@@ -1156,7 +1156,11 @@ set_reg_attrs_from_value (rtx reg, rtx x)
 {
 #if defined(POINTERS_EXTEND_UNSIGNED)
   if (((GET_CODE (x) == SIGN_EXTEND && POINTERS_EXTEND_UNSIGNED)
-	   || (GET_CODE (x) != SIGN_EXTEND && ! POINTERS_EXTEND_UNSIGNED))
+	   || (GET_CODE (x) == ZERO_EXTEND && ! POINTERS_EXTEND_UNSIGNED)
+	   || (paradoxical_subreg_p (x)
+	   && ! (SUBREG_PROMOTED_VAR_P (x)
+		 && SUBREG_CHECK_PROMOTED_SIGN (x,
+		POINTERS_EXTEND_UNSIGNED
 	  && !targetm.have_ptr_extend ())
 	can_be_reg_pointer = false;
 #endif


Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg

2016-08-01 Thread Jeff Law

On 07/31/2016 04:44 AM, Bernd Edlinger wrote:


like this?

Index: emit-rtl.c
===
--- emit-rtl.c  (revision 238891)
+++ emit-rtl.c  (working copy)
@@ -1156,7 +1156,11 @@
  {
  #if defined(POINTERS_EXTEND_UNSIGNED)
if (((GET_CODE (x) == SIGN_EXTEND && POINTERS_EXTEND_UNSIGNED)
-  || (GET_CODE (x) != SIGN_EXTEND && ! POINTERS_EXTEND_UNSIGNED))
+  || (GET_CODE (x) == ZERO_EXTEND && ! POINTERS_EXTEND_UNSIGNED)
+  || (paradoxical_subreg_p (x)
+  && ! (SUBREG_PROMOTED_VAR_P (x)
+&& SUBREG_CHECK_PROMOTED_SIGN (x,
+   POINTERS_EXTEND_UNSIGNED
  && !targetm.have_ptr_extend ())
can_be_reg_pointer = false;
  #endif

In the test case of pr71779 the subreg is no promoted var, so this
has no influence at this time.  Also I have not POINTERS_EXTEND_SIGNED
target, but for the symmetry it ought to check explicitly for
ZERO_EXTEND as well, and allow the pointer value to pass thru a
TRUNCATE.
I believe MIPS is likely the only target that extends signed, a few need 
special extension code (ia64/s390).  But this looks reasonable to me.  I 
don't think it's worth the complication of dealing with truncation.







I debugged the cse again, to see how it works and why it mis-compiles
this example.

I found out that the trouble starts one instruction earlier:


[ Fixing the missing bits from the insn using your later message... ]

(insn 19 40 20 2 (set (subreg:DI (reg:QI 93) 0)
 (const_int 255 [0xff])) pr.c:8 50 {*movdi_aarch64}
  (expr_list:REG_DEAD (reg:DI 110 [ D.3037 ])
 (nil)))




cse_main sees the constant value and maps:
(reg:QI 93)  =>  (const_int 255 [0xff])
OK.  This looks OK to me.  We know unambiguously that (reg 93) has the 
value 0xff -- that's because (reg 93) is a QImode register.   There are 
no outside QImode.






plus (I mean that is wrong):
(subreg:DI (reg:QI 93) 0)  =>  (const_int 255 [0xff])

When the next insn is scanned

(insn 20 19 21 2 (set (reg:DI 94)
 (subreg:DI (reg:QI 93) 0)) pr.c:8 50 {*movdi_aarch64}
  (expr_list:REG_DEAD (reg:QI 93)
 (nil)))

I see fold_rtx (subreg:DI (reg:QI 93) 0))
return (const_int 255 [0xff]) which is wrong.
I think this is the key point where things have gone wrong.  While we 
know the QImode bits are 0xff the bits outside QImode are undefined.  So 
we can't legitimately return 0xff when folding that rtx.






now cse maps:
(reg:DI 94)  =>  (const_int 255 [0xff])

And now we propagate the mistake from the previous step






Now I think I found a better place for a patch, where the first bogus
mapping is recorded:

Index: cse.c
===
--- cse.c   (revision 238891)
+++ cse.c   (working copy)
@@ -5898,15 +5898,7 @@ cse_insn (rtx_insn *insn)
|| GET_MODE (dest) == BLKmode
/* If we didn't put a REG_EQUAL value or a source into the hash
   table, there is no point is recording DEST.  */
-   || sets[i].src_elt == 0
-   /* If DEST is a paradoxical SUBREG and SRC is a ZERO_EXTEND
-  or SIGN_EXTEND, don't record DEST since it can cause
-  some tracking to be wrong.
-
-  ??? Think about this more later.  */
-   || (paradoxical_subreg_p (dest)
-   && (GET_CODE (sets[i].src) == SIGN_EXTEND
-   || GET_CODE (sets[i].src) == ZERO_EXTEND)))
+   || sets[i].src_elt == 0)
  continue;

/* STRICT_LOW_PART isn't part of the value BEING set,
@@ -5925,6 +5917,11 @@ cse_insn (rtx_insn *insn)
  sets[i].dest_hash = HASH (dest, GET_MODE (dest));
}

+   /* If DEST is a paradoxical SUBREG, don't record DEST since it can
+  cause some tracking to be wrong.  */
+   if (paradoxical_subreg_p (dest))
+ continue;
+
elt = insert (dest, sets[i].src_elt,
  sets[i].dest_hash, GET_MODE (dest));
Instead of saying "cause some tracking to be wrong", it might be better 
to say "the bits outside the mode of GET_MODE (SUBREG_REG (dest)) are 
undefined".







So apparently there was already an attempt of a fix for a similar bug,
and svn blame points to:

svn log -v -r8354

r8354 | kenner | 1994-10-28 23:55:05 +0100 (Fri, 28 Oct 1994) | 3 lines
Changed paths:
M /trunk/gcc/cse.c

(cse_insn): Don't record a DEST a paradoxical SUBREG and SRC is a
SIGN_EXTEND or ZERO_EXTEND.



This way we can still map the underlying QI register to 255 but
not the SUBREG if it is a paradoxical subreg.

In the test case this patch still works (output code does not change).

What do you think?
Looks like you've probably nailed it.  It'll be interesting see if 

Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg

2016-08-01 Thread Jeff Law

On 07/30/2016 02:17 AM, Bernd Edlinger wrote:


In your first mail you showed reg 481 as _not_ being REG_POINTER:

(insn 1047 1046 1048 (set (reg:DI 481)
 (subreg:DI (reg/f:SI 479) 0)) y.c:12702 -1
  (nil))

(note the lack of /f).  So which is it?  REG_POINTER here is not correct
as far as I can see.



Oh yes, that's an interesting point, in expand I still see this:


(insn 1047 1046 1048 (set (reg:DI 481)
 (subreg:DI (reg/f:SI 479) 0)) isl_input.c:2496 -1
  (nil))

But in the last dump before combine I have this:

(insn 1047 1044 1048 101 (set (reg/f:DI 481)
 (subreg:DI (reg/f:SI 545) 0)) isl_input.c:2496 50 {*movdi_aarch64}
  (nil))


However I was not really surpised by that, because the reg 545 does
in deed hold a pointer value: _obj_map_vtable
So just an FYI.  It should always be safe to fail to mark something with 
REG_POINTER -- though it is possible that something has violated that 
design decision.


So one interesting test would be to hack up things so that REG_POINTER 
never gets set on anything and see what that does to your testcase.





(insn 22 17 23 51 (set (reg/f:SI 544)
 (high:SI (symbol_ref:SI ("isl_obj_map_vtable") [flags 0xc0]
))) isl_input.c:2415 49
{*movsi_aarch64}
  (nil))
(insn 23 22 24 51 (set (reg/f:SI 545)
 (lo_sum:SI (reg/f:SI 544)
 (symbol_ref:SI ("isl_obj_map_vtable") [flags 0xc0]
))) isl_input.c:2415 917
{add_losym_si}
  (expr_list:REG_DEAD (reg/f:SI 544)
 (expr_list:REG_EQUAL (symbol_ref:SI ("isl_obj_map_vtable")
[flags 0xc0] )
 (nil

The "reg/f:DI 481" first appeared in cse1.


I'll try to see what's happening there next



Ok, I the incorrect REG_POINTER is done here:

cse_main -> reg_scan -> reg_scan_mark_refs -> set_reg_attrs_from_value
(reg 544) is technically not a pointer, though I think we have allowed 
REG_POINTER to be set on (HIGH (SYMBOL_REF)).  I would expect (reg 545) 
to be marked as a pointer.


The hesitation I have is because Pmode != SImode on this target, so 
technically the value has to be zero extended out to Pmode to ensure its 
validity.  One could argue that only a properly extended object should 
have REG_POINTER set.








and here I see a bug, because if POINTERS_EXTEND_UNSIGNED
can_be_reg_pointer is only set to false if x is SIGN_EXTEND but not
if x is a SUBREG as in this case.

Seems like a bug to me.





So I think that should be fixed this way:

Index: emit-rtl.c
===
--- emit-rtl.c  (revision 238891)
+++ emit-rtl.c  (working copy)
@@ -1155,7 +1155,7 @@ set_reg_attrs_from_value (rtx reg, rtx x)
 || (GET_CODE (x) == SUBREG && subreg_lowpart_p (x)))
  {
  #if defined(POINTERS_EXTEND_UNSIGNED)
-  if (((GET_CODE (x) == SIGN_EXTEND && POINTERS_EXTEND_UNSIGNED)
+  if (((GET_CODE (x) != ZERO_EXTEND && POINTERS_EXTEND_UNSIGNED)
   || (GET_CODE (x) != SIGN_EXTEND && ! POINTERS_EXTEND_UNSIGNED))
  && !targetm.have_ptr_extend ())
can_be_reg_pointer = false;


What do you think does this look like the right fix?
As Segher pointed out, I think you also want to look at 
SUBREG_PROMOTED_VAR_P and SUBREG_PROMOTED_UNSIGNED_P as well.  I don't 
think they're applicable for this target, but it still seems like the 
right thing to do.




With this patch the code the reg/f:DI 481 does no longer appear,
and also the invalid combine does no longer happen.

However the test case from pr70903 does not get fixed by this.

But when I look at the dumps, I see again the first incorrect
transformation in cse2 (again cse!):

(insn 20 19 21 2 (set (reg:DI 94)
 (subreg:DI (reg:QI 93) 0)) pr.c:8 50 {*movdi_aarch64}
  (expr_list:REG_EQUAL (const_int 255 [0xff])
 (expr_list:REG_DEAD (reg:QI 93)
 (nil

but that is simply wrong, because later optimization passes
expect reg 94 to be 0x00ff but the upper bits are unspecified,
so that REG_EQUAL should better not exist.
Now this one could be related to PROMOTE_MODE and friends.  You might 
want to review Jim W's comments in pr65932 which describe some problems 
with the way the port uses PROMOTE_MODE.





When I looked at cse.c I saw a comment in #if 0, which exactly
describes the problem that we have with paradoxical subreg here:

Index: cse.c
===
--- cse.c   (revision 238891)
+++ cse.c   (working copy)
@@ -4716,10 +4716,6 @@ cse_insn (rtx_insn *insn)
}
}

-#if 0
-  /* It is no longer clear why we used to do this, but it doesn't
-appear to still be needed.  So let's try without it since this
-code hurts cse'ing widened ops.  */
/* If source is a paradoxical subreg (such as QI treated as an SI),
 treat it as volatile.  It may do the work of an SI in one context
 where the extra bits are not being used, but cannot replace an SI
@@ -4726,7 +4722,6 @@ 

Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg

2016-07-31 Thread Bernd Edlinger
On 07/31/16 12:43, Bernd Edlinger wrote:
> I found out that the trouble starts one instruction earlier:
>
> (insn 19 40 20 2 (set (subreg:DI (reg:QI 93) 0)
>  ) pr.c:8 50 {*movdi_aarch64}
>   (expr_list:REG_DEAD (reg:DI 110 [ D.3037 ])
>  (nil)))
>

oops, my e-mail missed one line

that's what the insn actually looks like:

(insn 19 40 20 2 (set (subreg:DI (reg:QI 93) 0)
 (const_int 255 [0xff])) pr.c:8 50 {*movdi_aarch64}
  (expr_list:REG_DEAD (reg:DI 110 [ D.3037 ])
 (nil)))

So we can say reg:QI 93 is 255, but have no idea
what an expression like (subreg:DI (reg:QI 93) 0) will
evaluate to.


Bernd.


Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg

2016-07-31 Thread Bernd Edlinger
On 07/30/16 13:39, Segher Boessenkool wrote:
> On Sat, Jul 30, 2016 at 08:17:59AM +, Bernd Edlinger wrote:
>> Ok, I the incorrect REG_POINTER is done here:
>>
>> cse_main -> reg_scan -> reg_scan_mark_refs -> set_reg_attrs_from_value
>>
>> and here I see a bug, because if POINTERS_EXTEND_UNSIGNED
>> can_be_reg_pointer is only set to false if x is SIGN_EXTEND but not
>> if x is a SUBREG as in this case.
>>
>> So I think that should be fixed this way:
>>
>> Index: emit-rtl.c
>> ===
>> --- emit-rtl.c   (revision 238891)
>> +++ emit-rtl.c   (working copy)
>> @@ -1155,7 +1155,7 @@ set_reg_attrs_from_value (rtx reg, rtx x)
>>   || (GET_CODE (x) == SUBREG && subreg_lowpart_p (x)))
>>{
>>#if defined(POINTERS_EXTEND_UNSIGNED)
>> -  if (((GET_CODE (x) == SIGN_EXTEND && POINTERS_EXTEND_UNSIGNED)
>> +  if (((GET_CODE (x) != ZERO_EXTEND && POINTERS_EXTEND_UNSIGNED)
>> || (GET_CODE (x) != SIGN_EXTEND && ! POINTERS_EXTEND_UNSIGNED))
>>&& !targetm.have_ptr_extend ())
>>  can_be_reg_pointer = false;
>>
>>
>> What do you think does this look like the right fix?
>
> There also is (from rtl.texi), for paradoxical subregs:
>
> """
> @item @code{subreg} of @code{reg}s
> The upper bits are defined when @code{SUBREG_PROMOTED_VAR_P} is true.
> @code{SUBREG_PROMOTED_UNSIGNED_P} describes what the upper bits hold.
> Such subregs usually represent local variables, register variables
> and parameter pseudo variables that have been promoted to a wider mode.
> """
>
> so you might want to test for these as well.
>

like this?

Index: emit-rtl.c
===
--- emit-rtl.c  (revision 238891)
+++ emit-rtl.c  (working copy)
@@ -1156,7 +1156,11 @@
  {
  #if defined(POINTERS_EXTEND_UNSIGNED)
if (((GET_CODE (x) == SIGN_EXTEND && POINTERS_EXTEND_UNSIGNED)
-  || (GET_CODE (x) != SIGN_EXTEND && ! POINTERS_EXTEND_UNSIGNED))
+  || (GET_CODE (x) == ZERO_EXTEND && ! POINTERS_EXTEND_UNSIGNED)
+  || (paradoxical_subreg_p (x)
+  && ! (SUBREG_PROMOTED_VAR_P (x)
+&& SUBREG_CHECK_PROMOTED_SIGN (x,
+   POINTERS_EXTEND_UNSIGNED
  && !targetm.have_ptr_extend ())
can_be_reg_pointer = false;
  #endif

In the test case of pr71779 the subreg is no promoted var, so this
has no influence at this time.  Also I have not POINTERS_EXTEND_SIGNED
target, but for the symmetry it ought to check explicitly for
ZERO_EXTEND as well, and allow the pointer value to pass thru a
TRUNCATE.

>> With this patch the code the reg/f:DI 481 does no longer appear,
>> and also the invalid combine does no longer happen.
>
> Good :-)
>
>> However the test case from pr70903 does not get fixed by this.
>>
>> But when I look at the dumps, I see again the first incorrect
>> transformation in cse2 (again cse!):
>>
>> (insn 20 19 21 2 (set (reg:DI 94)
>>   (subreg:DI (reg:QI 93) 0)) pr.c:8 50 {*movdi_aarch64}
>>(expr_list:REG_EQUAL (const_int 255 [0xff])
>>   (expr_list:REG_DEAD (reg:QI 93)
>>   (nil
>>
>> but that is simply wrong, because later optimization passes
>> expect reg 94 to be 0x00ff but the upper bits are unspecified,
>> so that REG_EQUAL should better not exist.
>
> Agreed.  So where does that come from?
>
>> When I looked at cse.c I saw a comment in #if 0, which exactly
>> describes the problem that we have with paradoxical subreg here:
>
> 
>
>> I am pretty sure, that this has to be reverted, and that it can
>> generate less efficient code.
>>
>> What do you think?
>
> I think this pessimises the generated code too much; there must be a
> better solution.
>

I debugged the cse again, to see how it works and why it mis-compiles
this example.

I found out that the trouble starts one instruction earlier:

(insn 19 40 20 2 (set (subreg:DI (reg:QI 93) 0)
 ) pr.c:8 50 {*movdi_aarch64}
  (expr_list:REG_DEAD (reg:DI 110 [ D.3037 ])
 (nil)))

cse_main sees the constant value and maps:
(reg:QI 93)  =>  (const_int 255 [0xff])

plus (I mean that is wrong):
(subreg:DI (reg:QI 93) 0)  =>  (const_int 255 [0xff])

When the next insn is scanned

(insn 20 19 21 2 (set (reg:DI 94)
 (subreg:DI (reg:QI 93) 0)) pr.c:8 50 {*movdi_aarch64}
  (expr_list:REG_DEAD (reg:QI 93)
 (nil)))

I see fold_rtx (subreg:DI (reg:QI 93) 0))
return (const_int 255 [0xff]) which is wrong.

now cse maps:
(reg:DI 94)  =>  (const_int 255 [0xff])

which is also not guaranteed to be correct, but the REG_EQUAL at
insn 20 is probably only a symptom, suppressing only that does not
fix the test case, because later we have:

(insn 25 24 26 2 (set (reg:DI 97)
 (const_int 255 [0xff])) pr.c:9 50 {*movdi_aarch64}
  (nil))
(insn 26 25 48 2 (set (zero_extract:DI (reg:DI 102 [ D.3038 ])
 (const_int 32 [0x20])
 

Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg

2016-07-30 Thread Segher Boessenkool
On Sat, Jul 30, 2016 at 08:17:59AM +, Bernd Edlinger wrote:
> Ok, I the incorrect REG_POINTER is done here:
> 
> cse_main -> reg_scan -> reg_scan_mark_refs -> set_reg_attrs_from_value
> 
> and here I see a bug, because if POINTERS_EXTEND_UNSIGNED
> can_be_reg_pointer is only set to false if x is SIGN_EXTEND but not
> if x is a SUBREG as in this case.
> 
> So I think that should be fixed this way:
> 
> Index: emit-rtl.c
> ===
> --- emit-rtl.c(revision 238891)
> +++ emit-rtl.c(working copy)
> @@ -1155,7 +1155,7 @@ set_reg_attrs_from_value (rtx reg, rtx x)
>|| (GET_CODE (x) == SUBREG && subreg_lowpart_p (x)))
>   {
>   #if defined(POINTERS_EXTEND_UNSIGNED)
> -  if (((GET_CODE (x) == SIGN_EXTEND && POINTERS_EXTEND_UNSIGNED)
> +  if (((GET_CODE (x) != ZERO_EXTEND && POINTERS_EXTEND_UNSIGNED)
>  || (GET_CODE (x) != SIGN_EXTEND && ! POINTERS_EXTEND_UNSIGNED))
> && !targetm.have_ptr_extend ())
>   can_be_reg_pointer = false;
> 
> 
> What do you think does this look like the right fix?

There also is (from rtl.texi), for paradoxical subregs:

"""
@item @code{subreg} of @code{reg}s
The upper bits are defined when @code{SUBREG_PROMOTED_VAR_P} is true.
@code{SUBREG_PROMOTED_UNSIGNED_P} describes what the upper bits hold.
Such subregs usually represent local variables, register variables
and parameter pseudo variables that have been promoted to a wider mode.
"""

so you might want to test for these as well.

> With this patch the code the reg/f:DI 481 does no longer appear,
> and also the invalid combine does no longer happen.

Good :-)

> However the test case from pr70903 does not get fixed by this.
> 
> But when I look at the dumps, I see again the first incorrect
> transformation in cse2 (again cse!):
> 
> (insn 20 19 21 2 (set (reg:DI 94)
>  (subreg:DI (reg:QI 93) 0)) pr.c:8 50 {*movdi_aarch64}
>   (expr_list:REG_EQUAL (const_int 255 [0xff])
>  (expr_list:REG_DEAD (reg:QI 93)
>  (nil
> 
> but that is simply wrong, because later optimization passes
> expect reg 94 to be 0x00ff but the upper bits are unspecified,
> so that REG_EQUAL should better not exist.

Agreed.  So where does that come from?

> When I looked at cse.c I saw a comment in #if 0, which exactly
> describes the problem that we have with paradoxical subreg here:



> I am pretty sure, that this has to be reverted, and that it can
> generate less efficient code.
> 
> What do you think?

I think this pessimises the generated code too much; there must be a
better solution.


Segher


Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg

2016-07-30 Thread Bernd Edlinger
On 07/29/16 15:03, Bernd Edlinger wrote:
> On 07/29/16 09:02, Segher Boessenkool wrote:
>> On Thu, Jul 28, 2016 at 08:59:44PM +, Bernd Edlinger wrote:
>>> (insn 1047 1044 1048 101 (set (reg/f:DI 481)
>>>   (subreg:DI (reg/f:SI 545) 0)) isl_input.c:2496 50
>>> {*movdi_aarch64}
>>>(nil))
>>
>> In your first mail you showed reg 481 as _not_ being REG_POINTER:
>>
>> (insn 1047 1046 1048 (set (reg:DI 481)
>>  (subreg:DI (reg/f:SI 479) 0)) y.c:12702 -1
>>   (nil))
>>
>> (note the lack of /f).  So which is it?  REG_POINTER here is not correct
>> as far as I can see.
>>
>
> Oh yes, that's an interesting point, in expand I still see this:
>
>
> (insn 1047 1046 1048 (set (reg:DI 481)
>  (subreg:DI (reg/f:SI 479) 0)) isl_input.c:2496 -1
>   (nil))
>
> But in the last dump before combine I have this:
>
> (insn 1047 1044 1048 101 (set (reg/f:DI 481)
>  (subreg:DI (reg/f:SI 545) 0)) isl_input.c:2496 50 {*movdi_aarch64}
>   (nil))
>
>
> However I was not really surpised by that, because the reg 545 does
> in deed hold a pointer value: _obj_map_vtable
>
> (insn 22 17 23 51 (set (reg/f:SI 544)
>  (high:SI (symbol_ref:SI ("isl_obj_map_vtable") [flags 0xc0]
> ))) isl_input.c:2415 49
> {*movsi_aarch64}
>   (nil))
> (insn 23 22 24 51 (set (reg/f:SI 545)
>  (lo_sum:SI (reg/f:SI 544)
>  (symbol_ref:SI ("isl_obj_map_vtable") [flags 0xc0]
> ))) isl_input.c:2415 917
> {add_losym_si}
>   (expr_list:REG_DEAD (reg/f:SI 544)
>  (expr_list:REG_EQUAL (symbol_ref:SI ("isl_obj_map_vtable")
> [flags 0xc0] )
>  (nil
>
> The "reg/f:DI 481" first appeared in cse1.
>
>
> I'll try to see what's happening there next


Ok, I the incorrect REG_POINTER is done here:

cse_main -> reg_scan -> reg_scan_mark_refs -> set_reg_attrs_from_value

and here I see a bug, because if POINTERS_EXTEND_UNSIGNED
can_be_reg_pointer is only set to false if x is SIGN_EXTEND but not
if x is a SUBREG as in this case.

So I think that should be fixed this way:

Index: emit-rtl.c
===
--- emit-rtl.c  (revision 238891)
+++ emit-rtl.c  (working copy)
@@ -1155,7 +1155,7 @@ set_reg_attrs_from_value (rtx reg, rtx x)
 || (GET_CODE (x) == SUBREG && subreg_lowpart_p (x)))
  {
  #if defined(POINTERS_EXTEND_UNSIGNED)
-  if (((GET_CODE (x) == SIGN_EXTEND && POINTERS_EXTEND_UNSIGNED)
+  if (((GET_CODE (x) != ZERO_EXTEND && POINTERS_EXTEND_UNSIGNED)
   || (GET_CODE (x) != SIGN_EXTEND && ! POINTERS_EXTEND_UNSIGNED))
  && !targetm.have_ptr_extend ())
can_be_reg_pointer = false;


What do you think does this look like the right fix?

With this patch the code the reg/f:DI 481 does no longer appear,
and also the invalid combine does no longer happen.

However the test case from pr70903 does not get fixed by this.

But when I look at the dumps, I see again the first incorrect
transformation in cse2 (again cse!):

(insn 20 19 21 2 (set (reg:DI 94)
 (subreg:DI (reg:QI 93) 0)) pr.c:8 50 {*movdi_aarch64}
  (expr_list:REG_EQUAL (const_int 255 [0xff])
 (expr_list:REG_DEAD (reg:QI 93)
 (nil

but that is simply wrong, because later optimization passes
expect reg 94 to be 0x00ff but the upper bits are unspecified,
so that REG_EQUAL should better not exist.

When I looked at cse.c I saw a comment in #if 0, which exactly
describes the problem that we have with paradoxical subreg here:

Index: cse.c
===
--- cse.c   (revision 238891)
+++ cse.c   (working copy)
@@ -4716,10 +4716,6 @@ cse_insn (rtx_insn *insn)
}
}

-#if 0
-  /* It is no longer clear why we used to do this, but it doesn't
-appear to still be needed.  So let's try without it since this
-code hurts cse'ing widened ops.  */
/* If source is a paradoxical subreg (such as QI treated as an SI),
 treat it as volatile.  It may do the work of an SI in one context
 where the extra bits are not being used, but cannot replace an SI
@@ -4726,7 +4722,6 @@ cse_insn (rtx_insn *insn)
 in general.  */
if (paradoxical_subreg_p (src))
sets[i].src_volatile = 1;
-#endif

/* Locate all possible equivalent forms for SRC.  Try to replace
   SRC in the insn with each cheaper equivalent.


removing the #if 0 seems to fix the sample from pr70903, but generates
3 more instructions than with my initial patch:

@@ -6,12 +6,15 @@
  foo:
adrpx0, .LC0
add x0, x0, :lo12:.LC0
+   mov x5, -1
mov x1, 0
ldp x4, x3, [x0, 8]
ldr x2, [x0, 24]
-   mov x0, 255
-   bfi x1, x0, 0, 32
+   mov x0, 0
+   bfi x0, x5, 0, 8
stp x3, x2, [x8, 16]
+   bfi x1, x0, 0, 32
+   mov x0, 255
bfi x1, x0, 32, 32
stp x1, 

Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg

2016-07-29 Thread Bernd Edlinger
On 07/29/16 09:02, Segher Boessenkool wrote:
> On Thu, Jul 28, 2016 at 08:59:44PM +, Bernd Edlinger wrote:
>> (insn 1047 1044 1048 101 (set (reg/f:DI 481)
>>   (subreg:DI (reg/f:SI 545) 0)) isl_input.c:2496 50 {*movdi_aarch64}
>>(nil))
>
> In your first mail you showed reg 481 as _not_ being REG_POINTER:
>
> (insn 1047 1046 1048 (set (reg:DI 481)
>  (subreg:DI (reg/f:SI 479) 0)) y.c:12702 -1
>   (nil))
>
> (note the lack of /f).  So which is it?  REG_POINTER here is not correct
> as far as I can see.
>

Oh yes, that's an interesting point, in expand I still see this:


(insn 1047 1046 1048 (set (reg:DI 481)
 (subreg:DI (reg/f:SI 479) 0)) isl_input.c:2496 -1
  (nil))

But in the last dump before combine I have this:

(insn 1047 1044 1048 101 (set (reg/f:DI 481)
 (subreg:DI (reg/f:SI 545) 0)) isl_input.c:2496 50 {*movdi_aarch64}
  (nil))


However I was not really surpised by that, because the reg 545 does
in deed hold a pointer value: _obj_map_vtable

(insn 22 17 23 51 (set (reg/f:SI 544)
 (high:SI (symbol_ref:SI ("isl_obj_map_vtable") [flags 0xc0] 
))) isl_input.c:2415 49 
{*movsi_aarch64}
  (nil))
(insn 23 22 24 51 (set (reg/f:SI 545)
 (lo_sum:SI (reg/f:SI 544)
 (symbol_ref:SI ("isl_obj_map_vtable") [flags 0xc0] 
))) isl_input.c:2415 917 
{add_losym_si}
  (expr_list:REG_DEAD (reg/f:SI 544)
 (expr_list:REG_EQUAL (symbol_ref:SI ("isl_obj_map_vtable") 
[flags 0xc0] )
 (nil

The "reg/f:DI 481" first appeared in cse1.


I'll try to see what's happening there next


Thanks
Bernd.


Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg

2016-07-29 Thread Segher Boessenkool
On Thu, Jul 28, 2016 at 08:59:44PM +, Bernd Edlinger wrote:
> (insn 1047 1044 1048 101 (set (reg/f:DI 481)
>  (subreg:DI (reg/f:SI 545) 0)) isl_input.c:2496 50 {*movdi_aarch64}
>   (nil))

In your first mail you showed reg 481 as _not_ being REG_POINTER:

(insn 1047 1046 1048 (set (reg:DI 481)
(subreg:DI (reg/f:SI 479) 0)) y.c:12702 -1
 (nil))

(note the lack of /f).  So which is it?  REG_POINTER here is not correct
as far as I can see.


Segher


Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg

2016-07-28 Thread Bernd Edlinger
Hi,

On 07/27/16 23:31, Jeff Law wrote:
> So you're stumbling into another really interesting area.
>

Absolutely, I am just too curious what's going on here ;-)


> I can hazard a guess that the reason we create the paradoxical SUBREG
> rather than a zero or sign extension is because various optimizers know
> that the upper bits in a paradoxical are don't care bits and may make
> transformations with that knowledge.
>
> Once you turn that into a zero/sign extend, then the rest of the
> optimizers must preserve the zero/sign extension property -- they have
> no way of knowing the bits were don't cares.
>
> So it's not necessarily clear that your change results in generally
> better code or not.
>
> More importantly, it really feels like you're papering over a real bug
> elsewhere.  AFAICT the use of a paradoxical subreg here is safe.  So the
> real bug ought to be downstream of this point.  Several folks have
> pointed at reload, which would certainly be a point ripe for a
> paradoxical subreg problem.


I have looked again at the test case of PR 71779 and discovered
something that I wanted to discuss here.

I have tried to figure out what made combine change this:


(insn 1048 1047 1049 101 (set (zero_extract:DI (reg/v:DI 191 [ 
obj1D.17943 ])
 (const_int 32 [0x20])
 (const_int 0 [0]))
 (reg/f:DI 481)) isl_input.c:2496 691 {*insv_regdi}
  (nil))
(insn 1049 1048 1050 101 (set (zero_extract:DI (reg/v:DI 191 [ 
obj1D.17943 ])
 (const_int 32 [0x20])
 (const_int 32 [0x20]))
 (reg/v/f:DI 145 [ SR.327D.17957 ])) isl_input.c:2496 691 
{*insv_regdi}
  (expr_list:REG_DEAD (reg/v/f:DI 145 [ SR.327D.17957 ])
 (nil)))
(insn 1050 1049 1051 101 (set (reg:DI 1 x1)
 (reg/v:DI 191 [ obj1D.17943 ])) isl_input.c:2496 50 
{*movdi_aarch64}
  (expr_list:REG_DEAD (reg/v:DI 191 [ obj1D.17943 ])
 (nil)))


into that (which is wrong because reg 481 has undefined bits
in insn 1050):


(insn 1048 1047 1049 101 (set (zero_extract:DI (reg/v:DI 191 [ 
obj1D.17943 ])
 (const_int 32 [0x20])
 (const_int 0 [0]))
 (reg/f:DI 481)) isl_input.c:2496 691 {*insv_regdi}
  (nil))
(insn 1049 1048 1050 101 (set (zero_extract:DI (reg/v:DI 191 [ 
obj1D.17943 ])
 (const_int 32 [0x20])
 (const_int 32 [0x20]))
 (reg/v/f:DI 145 [ SR.327D.17957 ])) isl_input.c:2496 691 
{*insv_regdi}
  (expr_list:REG_DEAD (reg/v/f:DI 145 [ SR.327D.17957 ])
 (nil)))
(insn 1050 1049 1051 101 (set (reg:DI 1 x1)
 (ior:DI (ashift:DI (reg/v/f:DI 145 [ SR.327D.17957 ])
 (const_int 32 [0x20]))
 (reg/f:DI 481))) isl_input.c:2496 50 {*movdi_aarch64}
  (nil))


The interesting fact is that combine did that while it
was only considering 1048, 1049 -> 1050,
and not the instruction with the paradoxical subreg:

(insn 1047 1044 1048 101 (set (reg/f:DI 481)
 (subreg:DI (reg/f:SI 545) 0)) isl_input.c:2496 50 {*movdi_aarch64}
  (nil))

I found by debugging that expand_field_assignment
was trying to build:

(and:DI (reg/f:DI 481)
 (const_int -4294967296 [0x]))

with this statement:

   masked = simplify_gen_binary (ASHIFT, compute_mode,
 simplify_gen_binary (
   AND, compute_mode,
   gen_lowpart (compute_mode, 
SET_SRC (x)),
   mask),
 pos);

but the result is just (reg/f:DI 481) !

I debugged and found the first wrong result in
rtlanal.c (nonzero_bits1) where the following if-statement
tells us that the upper 32 bits of reg 481 must be zero:

   switch (code)
 {
 case REG:
#if defined(POINTERS_EXTEND_UNSIGNED)
   /* If pointers extend unsigned and this is a pointer in Pmode, 
say that
  all the bits above ptr_mode are known to be zero.  */
   /* As we do not know which address space the pointer is referring to,
  we can do this only if the target does not support different 
pointer
  or address modes depending on the address space.  */
   if (target_default_pointer_address_modes_p ()
   && POINTERS_EXTEND_UNSIGNED && GET_MODE (x) == Pmode
   && REG_POINTER (x)
   && !targetm.have_ptr_extend ())
 nonzero &= GET_MODE_MASK (ptr_mode);
#endif


Pmode = DImode, ptr_mode=SImode, and REG_POINTER(x) is true.

Yes, the value is actually a pointer!

Which means, that it is not safe to extend a pointer from SI to DI
with anything but a zero-extend.

And if I remove this statement the wrong code is still not fixed.
So there must be more places where the similar assumptions are
made.

However, PR 70903 is not about pointers, and uses a mode where
Pmode=ptr_mode=DImode, so there must be a different as hard to track
down reason why this did not work out right.


So the question is, if the 

Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg

2016-07-27 Thread Bernd Edlinger
Am 27.07.2016 um 23:31 schrieb Jeff Law:
> On 07/26/2016 11:32 AM, Bernd Edlinger wrote:
>> Hi!
>>
>> As described in PR 71779 and PR 70903 we have a wrong code issue on
>> aarch64
>> which is triggered by a paradoxical subreg that can be created in
>> store_bit_field_using_insv when the target has an EP_insv bitfield
>> instruction.
>>
>> The attached patch changes this insn...
>>
>> (insn 1047 1046 1048 (set (reg:DI 481)
>> (subreg:DI (reg/f:SI 479) 0)) y.c:12702 -1
>>  (nil))
>>
>> ... into a more simple insn:
>>
>> (insn 1047 1046 1048 (set (reg:DI 479)
>> (zero_extend:DI (reg:SI 480))) isl_input.c:2496 -1
>>  (nil))
>>
>> which manages to fix two bugs at the same time.
>>
>> The patch does not include a test case, as I was looking at the PR 71779
>> just by curiosity, and I have only a theoretical interest in aarch64.
>> However,
>> it should be easy to extract one from PR 70903 at least, but I can't
>> do that by
>> myself.
>>
>> Therefore I would like to leave the test case(s) to Cristina Tamar
>> from ARM,
>> and/or Andreas Schwab, who have also helped out with bootstrapping and
>> reg-testing on aarch64 and aarch64-ilp32.
>>
>> Bootstrap and reg-test on x86_64-linux-gnu with no regressions.
>> Successfully bootstrapped on aarch64_ilp32 and the ISL testsuite passes.
>> Also successfully bootstrapped on an aarch64 juno board and no
>> regressions.
>>
>>
>> Is it OK for trunk?
>>
>>
>> Thanks
>> Bernd.
>>
>>
>> patch-pr71779.diff
>>
>>
>> 2016-07-25  Bernd Edlinger  
>>
>> PR rtl-optimization/71779
>> PR rtl-optimization/70903
>> * expmed.c (store_bit_field_using_insv): Don't generate a paradoxical
>> subreg.
> So you're stumbling into another really interesting area.
>
> I can hazard a guess that the reason we create the paradoxical SUBREG
> rather than a zero or sign extension is because various optimizers know
> that the upper bits in a paradoxical are don't care bits and may make
> transformations with that knowledge.
>
> Once you turn that into a zero/sign extend, then the rest of the
> optimizers must preserve the zero/sign extension property -- they have
> no way of knowing the bits were don't cares.
>
> So it's not necessarily clear that your change results in generally
> better code or not.
>

Yes, I understand, on x86 the stage 2/3 did not change but x86 has
probably not a very sophisticated insv code pattern.

> More importantly, it really feels like you're papering over a real bug
> elsewhere.  AFAICT the use of a paradoxical subreg here is safe.  So the
> real bug ought to be downstream of this point.  Several folks have
> pointed at reload, which would certainly be a point ripe for a
> paradoxical subreg problem.
>

I debugged lra already, because I was initially sure it did something
wrong with the subreg, but it turned out that it follows exactly what
you say here, it spills the subreg twice, once as a 32 bit value,
into a 64 bit slot with stack image in the upper word and then again
as a zero extended 64 bit value.  Later the garbage extended 64 bit
value is used when it should not.

So lra seems not to be the reason, but it is apparently spilling the
paradoxical subreg twice.

So at least lra does not generate better code from paradoxical subreg.

> Sooo. I don't think we can go with this patch as-is today.  We need to
> find the root problem which I believe is later in the pipeline.
>
> This patch might make sense from an optimization standpoint -- I'm
> generally of the opinion that while paradoxical subregs give the
> optimziers more freedom, the optimizers rarely are able to use it and
> they generally know much more about the semantics of and how to optimize
> around zero/sign extensions.  But we really should fix the bug first,
> then come back to improving the code generation.
>

Yes.  And paradoxical subreg have non-intuitive semantics.


Thanks
Bernd.

> Now if someone wanted to look for an interesting optimization project --
> ree.c knows how to do redundant extension eliminations.  A paradoxical
> is a form of extension that isn't currently understood by REE. Similarly
> REE doesn't know how to handle zero-extension as an "and" or sign
> extension as a pair of shifts.  Both of which can occur in practice due
> to costs or ISA limitations.  I wouldn't be surprise if adding those
> capabilities to REE would significantly improve its ability to eliminate
> unnecessary extensions.
>
> Jeff
>


Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg

2016-07-27 Thread Jeff Law

On 07/26/2016 11:32 AM, Bernd Edlinger wrote:

Hi!

As described in PR 71779 and PR 70903 we have a wrong code issue on aarch64
which is triggered by a paradoxical subreg that can be created in
store_bit_field_using_insv when the target has an EP_insv bitfield instruction.

The attached patch changes this insn...

(insn 1047 1046 1048 (set (reg:DI 481)
(subreg:DI (reg/f:SI 479) 0)) y.c:12702 -1
 (nil))

... into a more simple insn:

(insn 1047 1046 1048 (set (reg:DI 479)
(zero_extend:DI (reg:SI 480))) isl_input.c:2496 -1
 (nil))

which manages to fix two bugs at the same time.

The patch does not include a test case, as I was looking at the PR 71779
just by curiosity, and I have only a theoretical interest in aarch64. However,
it should be easy to extract one from PR 70903 at least, but I can't do that by
myself.

Therefore I would like to leave the test case(s) to Cristina Tamar from ARM,
and/or Andreas Schwab, who have also helped out with bootstrapping and
reg-testing on aarch64 and aarch64-ilp32.

Bootstrap and reg-test on x86_64-linux-gnu with no regressions.
Successfully bootstrapped on aarch64_ilp32 and the ISL testsuite passes.
Also successfully bootstrapped on an aarch64 juno board and no regressions.


Is it OK for trunk?


Thanks
Bernd.


patch-pr71779.diff


2016-07-25  Bernd Edlinger  

PR rtl-optimization/71779
PR rtl-optimization/70903
* expmed.c (store_bit_field_using_insv): Don't generate a paradoxical
subreg.

So you're stumbling into another really interesting area.

I can hazard a guess that the reason we create the paradoxical SUBREG 
rather than a zero or sign extension is because various optimizers know 
that the upper bits in a paradoxical are don't care bits and may make 
transformations with that knowledge.


Once you turn that into a zero/sign extend, then the rest of the 
optimizers must preserve the zero/sign extension property -- they have 
no way of knowing the bits were don't cares.


So it's not necessarily clear that your change results in generally 
better code or not.


More importantly, it really feels like you're papering over a real bug 
elsewhere.  AFAICT the use of a paradoxical subreg here is safe.  So the 
real bug ought to be downstream of this point.  Several folks have 
pointed at reload, which would certainly be a point ripe for a 
paradoxical subreg problem.


Sooo. I don't think we can go with this patch as-is today.  We need to 
find the root problem which I believe is later in the pipeline.


This patch might make sense from an optimization standpoint -- I'm 
generally of the opinion that while paradoxical subregs give the 
optimziers more freedom, the optimizers rarely are able to use it and 
they generally know much more about the semantics of and how to optimize 
around zero/sign extensions.  But we really should fix the bug first, 
then come back to improving the code generation.


Now if someone wanted to look for an interesting optimization project -- 
ree.c knows how to do redundant extension eliminations.  A paradoxical 
is a form of extension that isn't currently understood by REE. 
Similarly REE doesn't know how to handle zero-extension as an "and" or 
sign extension as a pair of shifts.  Both of which can occur in practice 
due to costs or ISA limitations.  I wouldn't be surprise if adding those 
capabilities to REE would significantly improve its ability to eliminate 
unnecessary extensions.


Jeff




[PATCH] Fix wrong code on aarch64 due to paradoxical subreg

2016-07-26 Thread Bernd Edlinger
Hi!

As described in PR 71779 and PR 70903 we have a wrong code issue on aarch64
which is triggered by a paradoxical subreg that can be created in
store_bit_field_using_insv when the target has an EP_insv bitfield instruction.

The attached patch changes this insn...

(insn 1047 1046 1048 (set (reg:DI 481)
(subreg:DI (reg/f:SI 479) 0)) y.c:12702 -1
 (nil))

... into a more simple insn:

(insn 1047 1046 1048 (set (reg:DI 479)
(zero_extend:DI (reg:SI 480))) isl_input.c:2496 -1
 (nil))

which manages to fix two bugs at the same time.

The patch does not include a test case, as I was looking at the PR 71779
just by curiosity, and I have only a theoretical interest in aarch64. However,
it should be easy to extract one from PR 70903 at least, but I can't do that by
myself.

Therefore I would like to leave the test case(s) to Cristina Tamar from ARM,
and/or Andreas Schwab, who have also helped out with bootstrapping and
reg-testing on aarch64 and aarch64-ilp32.

Bootstrap and reg-test on x86_64-linux-gnu with no regressions.
Successfully bootstrapped on aarch64_ilp32 and the ISL testsuite passes.
Also successfully bootstrapped on an aarch64 juno board and no regressions.


Is it OK for trunk?


Thanks
Bernd.2016-07-25  Bernd Edlinger  

	PR rtl-optimization/71779
	PR rtl-optimization/70903
	* expmed.c (store_bit_field_using_insv): Don't generate a paradoxical
	subreg.

Index: gcc/calls.c
Index: expmed.c
===
--- expmed.c	(revision 238694)
+++ expmed.c	(working copy)
@@ -664,14 +664,7 @@ store_bit_field_using_insv (const extraction_insn
 	 if we must narrow it, be sure we do it correctly.  */
 
 	  if (GET_MODE_SIZE (GET_MODE (value)) < GET_MODE_SIZE (op_mode))
-	{
-	  tmp = simplify_subreg (op_mode, value1, GET_MODE (value), 0);
-	  if (! tmp)
-		tmp = simplify_gen_subreg (op_mode,
-	   force_reg (GET_MODE (value),
-		  value1),
-	   GET_MODE (value), 0);
-	}
+	tmp = convert_to_mode (op_mode, value1, 1);
 	  else
 	{
 	  tmp = gen_lowpart_if_possible (op_mode, value1);