Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
On Wed, Aug 03, 2016 at 04:08:30PM -0600, Jeff Law wrote: > On 08/03/2016 11:41 AM, Bernd Edlinger wrote: > >On 08/03/16 17:38, Jeff Law wrote: > >>cse.c changes look good, but I'd really like to see a testcase for each > >>issue in the dejagnu framework. Extra points if you tried to build a > >>unit test using David M's framework, but that isn't required. > >> > >>The testcase from 70903 ought to be trivial to add to the dejagnu suite. > >> 71779 might be more difficult, but if you could take a stab, it'd be > >>appreciated. > >> > > > > > >Yes, sure. I had assumed that the pr70903 test case is using some > >target-specific vector types, but now I see that it even works as-is in > >the gcc.c-torture/execute directory. > > > >So I've added the test case to the cse patch. And quickly verified that > >it works on x86_64-linux-gnu. > > > > > >The pr71779 test case will be pretty difficult to reduce, because it > >depends on combine to do the incorrect transformation and lra to spill > >the subreg, and on the stack content at runtime to be non-zero. > > > >But technically it *is* already in the isl-test suite, so if isl is > >in-tree, it is always executed by make check or make check-isl. > > > >It is just that gmp/mpfr/mpc and isl test results are not included by > >contrib/test_summary, but that should be fixable. What do you think? > > > >Actually that should not be too difficult, as there are test-suite.log > >files that we could just added to the test_summary output as-is, for > >instance: > > > >cat isl/test-suite.log > > > >== > >isl 0.16.1: ./test-suite.log > >== > > > ># TOTAL: 5 > ># PASS: 5 > ># SKIP: 0 > ># XFAIL: 0 > ># FAIL: 0 > ># XPASS: 0 > ># ERROR: 0 > > > >.. contents:: :depth: 2 > > > > > >Are the patches OK now? > Yes. Thanks for taking care of this... > Hi Bernd, Thanks for fixing this, but it looks like you accidentally double-added the pr70903.c testcase. Failures: gcc.c-torture/execute/pr70903.c Bisected to: Author: edlingerDate: Thu Aug 4 13:20:57 2016 + 2016-08-04 Bernd Edlinger PR rtl-optimization/70903 * cse.c (cse_insn): If DEST is a paradoxical SUBREG, don't record DEST. testsuite: 2016-08-04 Bernd Edlinger PR rtl-optimization/70903 * gcc.c-torture/execute/pr70903.c: New test. .../gcc/testsuite/gcc.c-torture/execute/pr70903.c:25:1: error: redefinition of 'foo' .../gcc/testsuite/gcc.c-torture/execute/pr70903.c:6:1: note: previous definition of 'foo' was here .../gcc/testsuite/gcc.c-torture/execute/pr70903.c:31:5: error: redefinition of 'main' .../gcc/testsuite/gcc.c-torture/execute/pr70903.c:12:5: note: previous definition of 'main' was here I've fixed that up as so in revision 239142, I hope you agree the change is obvious. Thanks, James --- 2016-08-04 James Greenhalgh * gcc.c-torture/execute/pr70903.c: Fix duplicate body. diff --git a/gcc/testsuite/gcc.c-torture/execute/pr70903.c b/gcc/testsuite/gcc.c-torture/execute/pr70903.c index 6ffd0aa..175ad1a 100644 --- a/gcc/testsuite/gcc.c-torture/execute/pr70903.c +++ b/gcc/testsuite/gcc.c-torture/execute/pr70903.c @@ -17,22 +17,4 @@ int main () __builtin_abort(); return 0; } -typedef unsigned char V8 __attribute__ ((vector_size (32))); -typedef unsigned int V32 __attribute__ ((vector_size (32))); -typedef unsigned long long V64 __attribute__ ((vector_size (32))); - -static V32 __attribute__ ((noinline, noclone)) -foo (V64 x) -{ - V64 y = (V64)(V8){((V8)(V64){65535, x[0]})[1]}; - return (V32){y[0], 255}; -} -int main () -{ - V32 x = foo ((V64){}); -// __builtin_printf ("%08x %08x %08x %08x %08x %08x %08x %08x\n", x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7]); - if (x[1] != 255) -__builtin_abort(); - return 0; -}
Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
On 08/03/2016 11:41 AM, Bernd Edlinger wrote: On 08/03/16 17:38, Jeff Law wrote: cse.c changes look good, but I'd really like to see a testcase for each issue in the dejagnu framework. Extra points if you tried to build a unit test using David M's framework, but that isn't required. The testcase from 70903 ought to be trivial to add to the dejagnu suite. 71779 might be more difficult, but if you could take a stab, it'd be appreciated. Yes, sure. I had assumed that the pr70903 test case is using some target-specific vector types, but now I see that it even works as-is in the gcc.c-torture/execute directory. So I've added the test case to the cse patch. And quickly verified that it works on x86_64-linux-gnu. The pr71779 test case will be pretty difficult to reduce, because it depends on combine to do the incorrect transformation and lra to spill the subreg, and on the stack content at runtime to be non-zero. But technically it *is* already in the isl-test suite, so if isl is in-tree, it is always executed by make check or make check-isl. It is just that gmp/mpfr/mpc and isl test results are not included by contrib/test_summary, but that should be fixable. What do you think? Actually that should not be too difficult, as there are test-suite.log files that we could just added to the test_summary output as-is, for instance: cat isl/test-suite.log == isl 0.16.1: ./test-suite.log == # TOTAL: 5 # PASS: 5 # SKIP: 0 # XFAIL: 0 # FAIL: 0 # XPASS: 0 # ERROR: 0 .. contents:: :depth: 2 Are the patches OK now? Yes. Thanks for taking care of this... Jeff
Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
On 08/03/16 17:38, Jeff Law wrote: > cse.c changes look good, but I'd really like to see a testcase for each > issue in the dejagnu framework. Extra points if you tried to build a > unit test using David M's framework, but that isn't required. > > The testcase from 70903 ought to be trivial to add to the dejagnu suite. > 71779 might be more difficult, but if you could take a stab, it'd be > appreciated. > Yes, sure. I had assumed that the pr70903 test case is using some target-specific vector types, but now I see that it even works as-is in the gcc.c-torture/execute directory. So I've added the test case to the cse patch. And quickly verified that it works on x86_64-linux-gnu. The pr71779 test case will be pretty difficult to reduce, because it depends on combine to do the incorrect transformation and lra to spill the subreg, and on the stack content at runtime to be non-zero. But technically it *is* already in the isl-test suite, so if isl is in-tree, it is always executed by make check or make check-isl. It is just that gmp/mpfr/mpc and isl test results are not included by contrib/test_summary, but that should be fixable. What do you think? Actually that should not be too difficult, as there are test-suite.log files that we could just added to the test_summary output as-is, for instance: cat isl/test-suite.log == isl 0.16.1: ./test-suite.log == # TOTAL: 5 # PASS: 5 # SKIP: 0 # XFAIL: 0 # FAIL: 0 # XPASS: 0 # ERROR: 0 .. contents:: :depth: 2 Are the patches OK now? Thanks Bernd. 2016-08-01 Bernd EdlingerPR rtl-optimization/70903 * cse.c (cse_insn): If DEST is a paradoxical SUBREG, don't record DEST. testsuite: 2016-08-01 Bernd Edlinger PR rtl-optimization/70903 * gcc.c-torture/execute/pr70903.c: New test. Index: gcc/cse.c === --- gcc/cse.c (revision 238915) +++ gcc/cse.c (working copy) @@ -5898,15 +5898,7 @@ cse_insn (rtx_insn *insn) || GET_MODE (dest) == BLKmode /* If we didn't put a REG_EQUAL value or a source into the hash table, there is no point is recording DEST. */ - || sets[i].src_elt == 0 - /* If DEST is a paradoxical SUBREG and SRC is a ZERO_EXTEND - or SIGN_EXTEND, don't record DEST since it can cause - some tracking to be wrong. - - ??? Think about this more later. */ - || (paradoxical_subreg_p (dest) - && (GET_CODE (sets[i].src) == SIGN_EXTEND - || GET_CODE (sets[i].src) == ZERO_EXTEND))) + || sets[i].src_elt == 0) continue; /* STRICT_LOW_PART isn't part of the value BEING set, @@ -5925,6 +5917,11 @@ cse_insn (rtx_insn *insn) sets[i].dest_hash = HASH (dest, GET_MODE (dest)); } + /* If DEST is a paradoxical SUBREG, don't record DEST since the bits + outside the mode of GET_MODE (SUBREG_REG (dest)) are undefined. */ + if (paradoxical_subreg_p (dest)) + continue; + elt = insert (dest, sets[i].src_elt, sets[i].dest_hash, GET_MODE (dest)); Index: gcc/testsuite/gcc.c-torture/execute/pr70903.c === --- gcc/testsuite/gcc.c-torture/execute/pr70903.c (revision 0) +++ gcc/testsuite/gcc.c-torture/execute/pr70903.c (working copy) @@ -0,0 +1,19 @@ +typedef unsigned char V8 __attribute__ ((vector_size (32))); +typedef unsigned int V32 __attribute__ ((vector_size (32))); +typedef unsigned long long V64 __attribute__ ((vector_size (32))); + +static V32 __attribute__ ((noinline, noclone)) +foo (V64 x) +{ + V64 y = (V64)(V8){((V8)(V64){65535, x[0]})[1]}; + return (V32){y[0], 255}; +} + +int main () +{ + V32 x = foo ((V64){}); +// __builtin_printf ("%08x %08x %08x %08x %08x %08x %08x %08x\n", x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7]); + if (x[1] != 255) +__builtin_abort(); + return 0; +} 2016-08-01 Bernd Edlinger PR rtl-optimization/71779 * emit-rtl.c (set_reg_attrs_from_value): Only propagate REG_POINTER, if the value was sign-extended according to POINTERS_EXTEND_UNSIGNED or if it was truncated. Index: gcc/emit-rtl.c === --- gcc/emit-rtl.c (revision 238915) +++ gcc/emit-rtl.c (working copy) @@ -1156,7 +1156,11 @@ set_reg_attrs_from_value (rtx reg, rtx x) { #if defined(POINTERS_EXTEND_UNSIGNED) if (((GET_CODE (x) == SIGN_EXTEND && POINTERS_EXTEND_UNSIGNED) - || (GET_CODE (x) != SIGN_EXTEND && ! POINTERS_EXTEND_UNSIGNED)) + || (GET_CODE (x) == ZERO_EXTEND && ! POINTERS_EXTEND_UNSIGNED) + || (paradoxical_subreg_p (x) + && ! (SUBREG_PROMOTED_VAR_P (x) + && SUBREG_CHECK_PROMOTED_SIGN (x, + POINTERS_EXTEND_UNSIGNED && !targetm.have_ptr_extend ()) can_be_reg_pointer = false; #endif
Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
On 08/01/2016 12:52 PM, Bernd Edlinger wrote: Hi Jeff, On 08/01/16 19:54, Jeff Law wrote: > Looks like you've probably nailed it. It'll be interesting see if > there's any fallout (though our RTL optimizer testing is pretty weak, so > even if there were, I doubt we'd catch it). > If there is, it will probably a performance regression... Anyway I'd say these two patches do just disable actually wrong transformations. So here are both patches as separate diffs with your suggestion for the comment in cse_insn. I believe that on x86_64 both patches do not change a single bit. However I think there are more paradoxical subregs generated all over, but the aarch64 insv code pattern did trigger more hidden bugs than any other port. It is certainly unfortunate that the major source of paradoxical subreg is in a target-dependent code path :( Please apologize that I am not able to reduce/finalize the aarch64 test case at this time, as I usually only work with arm and intel targets, but I made an exception here, because a bug like that may affect all targets sooner or later. Boot-strap and reg-testing on x86_64-linux-gnu. Plus aarch64 bootstrap and isl-testing by Andreas. Is it OK for trunk? cse.c changes look good, but I'd really like to see a testcase for each issue in the dejagnu framework. Extra points if you tried to build a unit test using David M's framework, but that isn't required. The testcase from 70903 ought to be trivial to add to the dejagnu suite. 71779 might be more difficult, but if you could take a stab, it'd be appreciated. jeff
Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
On 08/02/2016 02:16 PM, Bernd Schmidt wrote: On 08/02/2016 06:46 PM, Segher Boessenkool wrote: On Tue, Aug 02, 2016 at 09:21:34AM -0600, Jeff Law wrote: However I think there are more paradoxical subregs generated all over, but the aarch64 insv code pattern did trigger more hidden bugs than any other port. It is certainly unfortunate that the major source of paradoxical subreg is in a target-dependent code path :( It is certainly unfortunate that paradoxical subregs exist at all! :-) Yea. It probably seemed like a good idea 25-30 years ago, but I always cringe when I see them being used. Yea it gives the compiler some more freedom, but more often than not I think we'd be better off with real extensions. And then perhaps have some bits marked as "do not care", perhaps using a register note... This would help other cases as well. I'm thinking maybe an any_extend code to go along with sign_extend and zero_extend. If input and output registers are the same it would be treated like a no-op move. That might be close enough to get us the benefits of a paradoxical subreg. I was kind of thinking along the same lines. Not high priority though. Jeff
Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
Hi, Is it OK for the trunk? I guess so, but need an explicit OK. Thanks Bernd. On 08/01/16 20:52, Bernd Edlinger wrote: > Hi Jeff, > > On 08/01/16 19:54, Jeff Law wrote: >> Looks like you've probably nailed it. It'll be interesting see if >> there's any fallout (though our RTL optimizer testing is pretty weak, so >> even if there were, I doubt we'd catch it). >> > > If there is, it will probably a performance regression... > > Anyway I'd say these two patches do just disable actually wrong > transformations. So here are both patches as separate diffs > with your suggestion for the comment in cse_insn. > > I believe that on x86_64 both patches do not change a single bit. > > However I think there are more paradoxical subregs generated all over, > but the aarch64 insv code pattern did trigger more hidden bugs than > any other port. It is certainly unfortunate that the major source > of paradoxical subreg is in a target-dependent code path :( > > Please apologize that I am not able to reduce/finalize the aarch64 test > case at this time, as I usually only work with arm and intel targets, > but I made an exception here, because a bug like that may affect all > targets sooner or later. > > > Boot-strap and reg-testing on x86_64-linux-gnu. > Plus aarch64 bootstrap and isl-testing by Andreas. > > > Is it OK for trunk? > > > > Thanks > Bernd.
Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
On 08/02/2016 06:46 PM, Segher Boessenkool wrote: On Tue, Aug 02, 2016 at 09:21:34AM -0600, Jeff Law wrote: However I think there are more paradoxical subregs generated all over, but the aarch64 insv code pattern did trigger more hidden bugs than any other port. It is certainly unfortunate that the major source of paradoxical subreg is in a target-dependent code path :( It is certainly unfortunate that paradoxical subregs exist at all! :-) Yea. It probably seemed like a good idea 25-30 years ago, but I always cringe when I see them being used. Yea it gives the compiler some more freedom, but more often than not I think we'd be better off with real extensions. And then perhaps have some bits marked as "do not care", perhaps using a register note... This would help other cases as well. I'm thinking maybe an any_extend code to go along with sign_extend and zero_extend. If input and output registers are the same it would be treated like a no-op move. That might be close enough to get us the benefits of a paradoxical subreg. Bernd
Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
On 08/02/2016 11:46 AM, Richard Biener wrote: But we love to exploit undefined behavior elsewhere, too. Now the init-regs pass comes to my mind again (papering over issues elsewhere).. True. I just haven't seen that the don't care bits created by paradoxical subregs has actually been all that useful in practice. In fact, I've seen subregs get in the way often, but if the code had a normal extension it would have been optimized better. Jeff
Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
On August 2, 2016 5:21:34 PM GMT+02:00, Jeff Lawwrote: >On 08/01/2016 05:31 PM, Segher Boessenkool wrote: >> Hi, >> >> On Mon, Aug 01, 2016 at 06:52:54PM +, Bernd Edlinger wrote: >>> On 08/01/16 19:54, Jeff Law wrote: Looks like you've probably nailed it. It'll be interesting see if there's any fallout (though our RTL optimizer testing is pretty >weak, so even if there were, I doubt we'd catch it). >>> >>> If there is, it will probably a performance regression... >> >> I tested building Linux with and without the patch, on many archs. >> The few that show differences are: >> >>alpha 6148872 6148776 >> ia64 16946958 16946670 >> s390 12345770 12345850 >> tile 12016086 12016070 >> >> (left before, right after; arm and aarch64 did not build, kernel >problems). >> >> So all except s390 generate smaller code even. >They're all deep enough in the noise that I wouldn't care either way >:-) > >> >>> However I think there are more paradoxical subregs generated all >over, >>> but the aarch64 insv code pattern did trigger more hidden bugs than >>> any other port. It is certainly unfortunate that the major source >>> of paradoxical subreg is in a target-dependent code path :( >> >> It is certainly unfortunate that paradoxical subregs exist at all! >:-) >Yea. It probably seemed like a good idea 25-30 years ago, but I always > >cringe when I see them being used. Yea it gives the compiler some more > >freedom, but more often than not I think we'd be better off with real >extensions. But we love to exploit undefined behavior elsewhere, too. Now the init-regs pass comes to my mind again (papering over issues elsewhere).. Richard. >jeff
Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
On Tue, Aug 02, 2016 at 09:21:34AM -0600, Jeff Law wrote: > >>However I think there are more paradoxical subregs generated all over, > >>but the aarch64 insv code pattern did trigger more hidden bugs than > >>any other port. It is certainly unfortunate that the major source > >>of paradoxical subreg is in a target-dependent code path :( > > > >It is certainly unfortunate that paradoxical subregs exist at all! :-) > Yea. It probably seemed like a good idea 25-30 years ago, but I always > cringe when I see them being used. Yea it gives the compiler some more > freedom, but more often than not I think we'd be better off with real > extensions. And then perhaps have some bits marked as "do not care", perhaps using a register note... This would help other cases as well. Segher
Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
On 08/01/2016 05:31 PM, Segher Boessenkool wrote: Hi, On Mon, Aug 01, 2016 at 06:52:54PM +, Bernd Edlinger wrote: On 08/01/16 19:54, Jeff Law wrote: Looks like you've probably nailed it. It'll be interesting see if there's any fallout (though our RTL optimizer testing is pretty weak, so even if there were, I doubt we'd catch it). If there is, it will probably a performance regression... I tested building Linux with and without the patch, on many archs. The few that show differences are: alpha 6148872 6148776 ia64 16946958 16946670 s390 12345770 12345850 tile 12016086 12016070 (left before, right after; arm and aarch64 did not build, kernel problems). So all except s390 generate smaller code even. They're all deep enough in the noise that I wouldn't care either way :-) However I think there are more paradoxical subregs generated all over, but the aarch64 insv code pattern did trigger more hidden bugs than any other port. It is certainly unfortunate that the major source of paradoxical subreg is in a target-dependent code path :( It is certainly unfortunate that paradoxical subregs exist at all! :-) Yea. It probably seemed like a good idea 25-30 years ago, but I always cringe when I see them being used. Yea it gives the compiler some more freedom, but more often than not I think we'd be better off with real extensions. jeff
Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
Hi, On Mon, Aug 01, 2016 at 06:52:54PM +, Bernd Edlinger wrote: > On 08/01/16 19:54, Jeff Law wrote: > > Looks like you've probably nailed it. It'll be interesting see if > > there's any fallout (though our RTL optimizer testing is pretty weak, so > > even if there were, I doubt we'd catch it). > > If there is, it will probably a performance regression... I tested building Linux with and without the patch, on many archs. The few that show differences are: alpha 6148872 6148776 ia64 16946958 16946670 s390 12345770 12345850 tile 12016086 12016070 (left before, right after; arm and aarch64 did not build, kernel problems). So all except s390 generate smaller code even. > However I think there are more paradoxical subregs generated all over, > but the aarch64 insv code pattern did trigger more hidden bugs than > any other port. It is certainly unfortunate that the major source > of paradoxical subreg is in a target-dependent code path :( It is certainly unfortunate that paradoxical subregs exist at all! :-) Segher
Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
Hi Jeff, On 08/01/16 19:54, Jeff Law wrote: > Looks like you've probably nailed it. It'll be interesting see if > there's any fallout (though our RTL optimizer testing is pretty weak, so > even if there were, I doubt we'd catch it). > If there is, it will probably a performance regression... Anyway I'd say these two patches do just disable actually wrong transformations. So here are both patches as separate diffs with your suggestion for the comment in cse_insn. I believe that on x86_64 both patches do not change a single bit. However I think there are more paradoxical subregs generated all over, but the aarch64 insv code pattern did trigger more hidden bugs than any other port. It is certainly unfortunate that the major source of paradoxical subreg is in a target-dependent code path :( Please apologize that I am not able to reduce/finalize the aarch64 test case at this time, as I usually only work with arm and intel targets, but I made an exception here, because a bug like that may affect all targets sooner or later. Boot-strap and reg-testing on x86_64-linux-gnu. Plus aarch64 bootstrap and isl-testing by Andreas. Is it OK for trunk? Thanks Bernd. 2016-08-01 Bernd EdlingerPR rtl-optimization/70903 * cse.c (cse_insn): If DEST is a paradoxical SUBREG, don't record DEST. Index: gcc/cse.c === --- gcc/cse.c (revision 238915) +++ gcc/cse.c (working copy) @@ -5898,15 +5898,7 @@ cse_insn (rtx_insn *insn) || GET_MODE (dest) == BLKmode /* If we didn't put a REG_EQUAL value or a source into the hash table, there is no point is recording DEST. */ - || sets[i].src_elt == 0 - /* If DEST is a paradoxical SUBREG and SRC is a ZERO_EXTEND - or SIGN_EXTEND, don't record DEST since it can cause - some tracking to be wrong. - - ??? Think about this more later. */ - || (paradoxical_subreg_p (dest) - && (GET_CODE (sets[i].src) == SIGN_EXTEND - || GET_CODE (sets[i].src) == ZERO_EXTEND))) + || sets[i].src_elt == 0) continue; /* STRICT_LOW_PART isn't part of the value BEING set, @@ -5925,6 +5917,11 @@ cse_insn (rtx_insn *insn) sets[i].dest_hash = HASH (dest, GET_MODE (dest)); } + /* If DEST is a paradoxical SUBREG, don't record DEST since the bits + outside the mode of GET_MODE (SUBREG_REG (dest)) are undefined. */ + if (paradoxical_subreg_p (dest)) + continue; + elt = insert (dest, sets[i].src_elt, sets[i].dest_hash, GET_MODE (dest)); 2016-08-01 Bernd Edlinger PR rtl-optimization/71779 * emit-rtl.c (set_reg_attrs_from_value): Only propagate REG_POINTER, if the value was sign-extended according to POINTERS_EXTEND_UNSIGNED or if it was truncated. Index: gcc/emit-rtl.c === --- gcc/emit-rtl.c (revision 238915) +++ gcc/emit-rtl.c (working copy) @@ -1156,7 +1156,11 @@ set_reg_attrs_from_value (rtx reg, rtx x) { #if defined(POINTERS_EXTEND_UNSIGNED) if (((GET_CODE (x) == SIGN_EXTEND && POINTERS_EXTEND_UNSIGNED) - || (GET_CODE (x) != SIGN_EXTEND && ! POINTERS_EXTEND_UNSIGNED)) + || (GET_CODE (x) == ZERO_EXTEND && ! POINTERS_EXTEND_UNSIGNED) + || (paradoxical_subreg_p (x) + && ! (SUBREG_PROMOTED_VAR_P (x) + && SUBREG_CHECK_PROMOTED_SIGN (x, + POINTERS_EXTEND_UNSIGNED && !targetm.have_ptr_extend ()) can_be_reg_pointer = false; #endif
Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
On 07/31/2016 04:44 AM, Bernd Edlinger wrote: like this? Index: emit-rtl.c === --- emit-rtl.c (revision 238891) +++ emit-rtl.c (working copy) @@ -1156,7 +1156,11 @@ { #if defined(POINTERS_EXTEND_UNSIGNED) if (((GET_CODE (x) == SIGN_EXTEND && POINTERS_EXTEND_UNSIGNED) - || (GET_CODE (x) != SIGN_EXTEND && ! POINTERS_EXTEND_UNSIGNED)) + || (GET_CODE (x) == ZERO_EXTEND && ! POINTERS_EXTEND_UNSIGNED) + || (paradoxical_subreg_p (x) + && ! (SUBREG_PROMOTED_VAR_P (x) +&& SUBREG_CHECK_PROMOTED_SIGN (x, + POINTERS_EXTEND_UNSIGNED && !targetm.have_ptr_extend ()) can_be_reg_pointer = false; #endif In the test case of pr71779 the subreg is no promoted var, so this has no influence at this time. Also I have not POINTERS_EXTEND_SIGNED target, but for the symmetry it ought to check explicitly for ZERO_EXTEND as well, and allow the pointer value to pass thru a TRUNCATE. I believe MIPS is likely the only target that extends signed, a few need special extension code (ia64/s390). But this looks reasonable to me. I don't think it's worth the complication of dealing with truncation. I debugged the cse again, to see how it works and why it mis-compiles this example. I found out that the trouble starts one instruction earlier: [ Fixing the missing bits from the insn using your later message... ] (insn 19 40 20 2 (set (subreg:DI (reg:QI 93) 0) (const_int 255 [0xff])) pr.c:8 50 {*movdi_aarch64} (expr_list:REG_DEAD (reg:DI 110 [ D.3037 ]) (nil))) cse_main sees the constant value and maps: (reg:QI 93) => (const_int 255 [0xff]) OK. This looks OK to me. We know unambiguously that (reg 93) has the value 0xff -- that's because (reg 93) is a QImode register. There are no outside QImode. plus (I mean that is wrong): (subreg:DI (reg:QI 93) 0) => (const_int 255 [0xff]) When the next insn is scanned (insn 20 19 21 2 (set (reg:DI 94) (subreg:DI (reg:QI 93) 0)) pr.c:8 50 {*movdi_aarch64} (expr_list:REG_DEAD (reg:QI 93) (nil))) I see fold_rtx (subreg:DI (reg:QI 93) 0)) return (const_int 255 [0xff]) which is wrong. I think this is the key point where things have gone wrong. While we know the QImode bits are 0xff the bits outside QImode are undefined. So we can't legitimately return 0xff when folding that rtx. now cse maps: (reg:DI 94) => (const_int 255 [0xff]) And now we propagate the mistake from the previous step Now I think I found a better place for a patch, where the first bogus mapping is recorded: Index: cse.c === --- cse.c (revision 238891) +++ cse.c (working copy) @@ -5898,15 +5898,7 @@ cse_insn (rtx_insn *insn) || GET_MODE (dest) == BLKmode /* If we didn't put a REG_EQUAL value or a source into the hash table, there is no point is recording DEST. */ - || sets[i].src_elt == 0 - /* If DEST is a paradoxical SUBREG and SRC is a ZERO_EXTEND - or SIGN_EXTEND, don't record DEST since it can cause - some tracking to be wrong. - - ??? Think about this more later. */ - || (paradoxical_subreg_p (dest) - && (GET_CODE (sets[i].src) == SIGN_EXTEND - || GET_CODE (sets[i].src) == ZERO_EXTEND))) + || sets[i].src_elt == 0) continue; /* STRICT_LOW_PART isn't part of the value BEING set, @@ -5925,6 +5917,11 @@ cse_insn (rtx_insn *insn) sets[i].dest_hash = HASH (dest, GET_MODE (dest)); } + /* If DEST is a paradoxical SUBREG, don't record DEST since it can + cause some tracking to be wrong. */ + if (paradoxical_subreg_p (dest)) + continue; + elt = insert (dest, sets[i].src_elt, sets[i].dest_hash, GET_MODE (dest)); Instead of saying "cause some tracking to be wrong", it might be better to say "the bits outside the mode of GET_MODE (SUBREG_REG (dest)) are undefined". So apparently there was already an attempt of a fix for a similar bug, and svn blame points to: svn log -v -r8354 r8354 | kenner | 1994-10-28 23:55:05 +0100 (Fri, 28 Oct 1994) | 3 lines Changed paths: M /trunk/gcc/cse.c (cse_insn): Don't record a DEST a paradoxical SUBREG and SRC is a SIGN_EXTEND or ZERO_EXTEND. This way we can still map the underlying QI register to 255 but not the SUBREG if it is a paradoxical subreg. In the test case this patch still works (output code does not change). What do you think? Looks like you've probably nailed it. It'll be interesting see if
Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
On 07/30/2016 02:17 AM, Bernd Edlinger wrote: In your first mail you showed reg 481 as _not_ being REG_POINTER: (insn 1047 1046 1048 (set (reg:DI 481) (subreg:DI (reg/f:SI 479) 0)) y.c:12702 -1 (nil)) (note the lack of /f). So which is it? REG_POINTER here is not correct as far as I can see. Oh yes, that's an interesting point, in expand I still see this: (insn 1047 1046 1048 (set (reg:DI 481) (subreg:DI (reg/f:SI 479) 0)) isl_input.c:2496 -1 (nil)) But in the last dump before combine I have this: (insn 1047 1044 1048 101 (set (reg/f:DI 481) (subreg:DI (reg/f:SI 545) 0)) isl_input.c:2496 50 {*movdi_aarch64} (nil)) However I was not really surpised by that, because the reg 545 does in deed hold a pointer value: _obj_map_vtable So just an FYI. It should always be safe to fail to mark something with REG_POINTER -- though it is possible that something has violated that design decision. So one interesting test would be to hack up things so that REG_POINTER never gets set on anything and see what that does to your testcase. (insn 22 17 23 51 (set (reg/f:SI 544) (high:SI (symbol_ref:SI ("isl_obj_map_vtable") [flags 0xc0] ))) isl_input.c:2415 49 {*movsi_aarch64} (nil)) (insn 23 22 24 51 (set (reg/f:SI 545) (lo_sum:SI (reg/f:SI 544) (symbol_ref:SI ("isl_obj_map_vtable") [flags 0xc0] ))) isl_input.c:2415 917 {add_losym_si} (expr_list:REG_DEAD (reg/f:SI 544) (expr_list:REG_EQUAL (symbol_ref:SI ("isl_obj_map_vtable") [flags 0xc0] ) (nil The "reg/f:DI 481" first appeared in cse1. I'll try to see what's happening there next Ok, I the incorrect REG_POINTER is done here: cse_main -> reg_scan -> reg_scan_mark_refs -> set_reg_attrs_from_value (reg 544) is technically not a pointer, though I think we have allowed REG_POINTER to be set on (HIGH (SYMBOL_REF)). I would expect (reg 545) to be marked as a pointer. The hesitation I have is because Pmode != SImode on this target, so technically the value has to be zero extended out to Pmode to ensure its validity. One could argue that only a properly extended object should have REG_POINTER set. and here I see a bug, because if POINTERS_EXTEND_UNSIGNED can_be_reg_pointer is only set to false if x is SIGN_EXTEND but not if x is a SUBREG as in this case. Seems like a bug to me. So I think that should be fixed this way: Index: emit-rtl.c === --- emit-rtl.c (revision 238891) +++ emit-rtl.c (working copy) @@ -1155,7 +1155,7 @@ set_reg_attrs_from_value (rtx reg, rtx x) || (GET_CODE (x) == SUBREG && subreg_lowpart_p (x))) { #if defined(POINTERS_EXTEND_UNSIGNED) - if (((GET_CODE (x) == SIGN_EXTEND && POINTERS_EXTEND_UNSIGNED) + if (((GET_CODE (x) != ZERO_EXTEND && POINTERS_EXTEND_UNSIGNED) || (GET_CODE (x) != SIGN_EXTEND && ! POINTERS_EXTEND_UNSIGNED)) && !targetm.have_ptr_extend ()) can_be_reg_pointer = false; What do you think does this look like the right fix? As Segher pointed out, I think you also want to look at SUBREG_PROMOTED_VAR_P and SUBREG_PROMOTED_UNSIGNED_P as well. I don't think they're applicable for this target, but it still seems like the right thing to do. With this patch the code the reg/f:DI 481 does no longer appear, and also the invalid combine does no longer happen. However the test case from pr70903 does not get fixed by this. But when I look at the dumps, I see again the first incorrect transformation in cse2 (again cse!): (insn 20 19 21 2 (set (reg:DI 94) (subreg:DI (reg:QI 93) 0)) pr.c:8 50 {*movdi_aarch64} (expr_list:REG_EQUAL (const_int 255 [0xff]) (expr_list:REG_DEAD (reg:QI 93) (nil but that is simply wrong, because later optimization passes expect reg 94 to be 0x00ff but the upper bits are unspecified, so that REG_EQUAL should better not exist. Now this one could be related to PROMOTE_MODE and friends. You might want to review Jim W's comments in pr65932 which describe some problems with the way the port uses PROMOTE_MODE. When I looked at cse.c I saw a comment in #if 0, which exactly describes the problem that we have with paradoxical subreg here: Index: cse.c === --- cse.c (revision 238891) +++ cse.c (working copy) @@ -4716,10 +4716,6 @@ cse_insn (rtx_insn *insn) } } -#if 0 - /* It is no longer clear why we used to do this, but it doesn't -appear to still be needed. So let's try without it since this -code hurts cse'ing widened ops. */ /* If source is a paradoxical subreg (such as QI treated as an SI), treat it as volatile. It may do the work of an SI in one context where the extra bits are not being used, but cannot replace an SI @@ -4726,7 +4722,6 @@
Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
On 07/31/16 12:43, Bernd Edlinger wrote: > I found out that the trouble starts one instruction earlier: > > (insn 19 40 20 2 (set (subreg:DI (reg:QI 93) 0) > ) pr.c:8 50 {*movdi_aarch64} > (expr_list:REG_DEAD (reg:DI 110 [ D.3037 ]) > (nil))) > oops, my e-mail missed one line that's what the insn actually looks like: (insn 19 40 20 2 (set (subreg:DI (reg:QI 93) 0) (const_int 255 [0xff])) pr.c:8 50 {*movdi_aarch64} (expr_list:REG_DEAD (reg:DI 110 [ D.3037 ]) (nil))) So we can say reg:QI 93 is 255, but have no idea what an expression like (subreg:DI (reg:QI 93) 0) will evaluate to. Bernd.
Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
On 07/30/16 13:39, Segher Boessenkool wrote: > On Sat, Jul 30, 2016 at 08:17:59AM +, Bernd Edlinger wrote: >> Ok, I the incorrect REG_POINTER is done here: >> >> cse_main -> reg_scan -> reg_scan_mark_refs -> set_reg_attrs_from_value >> >> and here I see a bug, because if POINTERS_EXTEND_UNSIGNED >> can_be_reg_pointer is only set to false if x is SIGN_EXTEND but not >> if x is a SUBREG as in this case. >> >> So I think that should be fixed this way: >> >> Index: emit-rtl.c >> === >> --- emit-rtl.c (revision 238891) >> +++ emit-rtl.c (working copy) >> @@ -1155,7 +1155,7 @@ set_reg_attrs_from_value (rtx reg, rtx x) >> || (GET_CODE (x) == SUBREG && subreg_lowpart_p (x))) >>{ >>#if defined(POINTERS_EXTEND_UNSIGNED) >> - if (((GET_CODE (x) == SIGN_EXTEND && POINTERS_EXTEND_UNSIGNED) >> + if (((GET_CODE (x) != ZERO_EXTEND && POINTERS_EXTEND_UNSIGNED) >> || (GET_CODE (x) != SIGN_EXTEND && ! POINTERS_EXTEND_UNSIGNED)) >>&& !targetm.have_ptr_extend ()) >> can_be_reg_pointer = false; >> >> >> What do you think does this look like the right fix? > > There also is (from rtl.texi), for paradoxical subregs: > > """ > @item @code{subreg} of @code{reg}s > The upper bits are defined when @code{SUBREG_PROMOTED_VAR_P} is true. > @code{SUBREG_PROMOTED_UNSIGNED_P} describes what the upper bits hold. > Such subregs usually represent local variables, register variables > and parameter pseudo variables that have been promoted to a wider mode. > """ > > so you might want to test for these as well. > like this? Index: emit-rtl.c === --- emit-rtl.c (revision 238891) +++ emit-rtl.c (working copy) @@ -1156,7 +1156,11 @@ { #if defined(POINTERS_EXTEND_UNSIGNED) if (((GET_CODE (x) == SIGN_EXTEND && POINTERS_EXTEND_UNSIGNED) - || (GET_CODE (x) != SIGN_EXTEND && ! POINTERS_EXTEND_UNSIGNED)) + || (GET_CODE (x) == ZERO_EXTEND && ! POINTERS_EXTEND_UNSIGNED) + || (paradoxical_subreg_p (x) + && ! (SUBREG_PROMOTED_VAR_P (x) +&& SUBREG_CHECK_PROMOTED_SIGN (x, + POINTERS_EXTEND_UNSIGNED && !targetm.have_ptr_extend ()) can_be_reg_pointer = false; #endif In the test case of pr71779 the subreg is no promoted var, so this has no influence at this time. Also I have not POINTERS_EXTEND_SIGNED target, but for the symmetry it ought to check explicitly for ZERO_EXTEND as well, and allow the pointer value to pass thru a TRUNCATE. >> With this patch the code the reg/f:DI 481 does no longer appear, >> and also the invalid combine does no longer happen. > > Good :-) > >> However the test case from pr70903 does not get fixed by this. >> >> But when I look at the dumps, I see again the first incorrect >> transformation in cse2 (again cse!): >> >> (insn 20 19 21 2 (set (reg:DI 94) >> (subreg:DI (reg:QI 93) 0)) pr.c:8 50 {*movdi_aarch64} >>(expr_list:REG_EQUAL (const_int 255 [0xff]) >> (expr_list:REG_DEAD (reg:QI 93) >> (nil >> >> but that is simply wrong, because later optimization passes >> expect reg 94 to be 0x00ff but the upper bits are unspecified, >> so that REG_EQUAL should better not exist. > > Agreed. So where does that come from? > >> When I looked at cse.c I saw a comment in #if 0, which exactly >> describes the problem that we have with paradoxical subreg here: > > > >> I am pretty sure, that this has to be reverted, and that it can >> generate less efficient code. >> >> What do you think? > > I think this pessimises the generated code too much; there must be a > better solution. > I debugged the cse again, to see how it works and why it mis-compiles this example. I found out that the trouble starts one instruction earlier: (insn 19 40 20 2 (set (subreg:DI (reg:QI 93) 0) ) pr.c:8 50 {*movdi_aarch64} (expr_list:REG_DEAD (reg:DI 110 [ D.3037 ]) (nil))) cse_main sees the constant value and maps: (reg:QI 93) => (const_int 255 [0xff]) plus (I mean that is wrong): (subreg:DI (reg:QI 93) 0) => (const_int 255 [0xff]) When the next insn is scanned (insn 20 19 21 2 (set (reg:DI 94) (subreg:DI (reg:QI 93) 0)) pr.c:8 50 {*movdi_aarch64} (expr_list:REG_DEAD (reg:QI 93) (nil))) I see fold_rtx (subreg:DI (reg:QI 93) 0)) return (const_int 255 [0xff]) which is wrong. now cse maps: (reg:DI 94) => (const_int 255 [0xff]) which is also not guaranteed to be correct, but the REG_EQUAL at insn 20 is probably only a symptom, suppressing only that does not fix the test case, because later we have: (insn 25 24 26 2 (set (reg:DI 97) (const_int 255 [0xff])) pr.c:9 50 {*movdi_aarch64} (nil)) (insn 26 25 48 2 (set (zero_extract:DI (reg:DI 102 [ D.3038 ]) (const_int 32 [0x20])
Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
On Sat, Jul 30, 2016 at 08:17:59AM +, Bernd Edlinger wrote: > Ok, I the incorrect REG_POINTER is done here: > > cse_main -> reg_scan -> reg_scan_mark_refs -> set_reg_attrs_from_value > > and here I see a bug, because if POINTERS_EXTEND_UNSIGNED > can_be_reg_pointer is only set to false if x is SIGN_EXTEND but not > if x is a SUBREG as in this case. > > So I think that should be fixed this way: > > Index: emit-rtl.c > === > --- emit-rtl.c(revision 238891) > +++ emit-rtl.c(working copy) > @@ -1155,7 +1155,7 @@ set_reg_attrs_from_value (rtx reg, rtx x) >|| (GET_CODE (x) == SUBREG && subreg_lowpart_p (x))) > { > #if defined(POINTERS_EXTEND_UNSIGNED) > - if (((GET_CODE (x) == SIGN_EXTEND && POINTERS_EXTEND_UNSIGNED) > + if (((GET_CODE (x) != ZERO_EXTEND && POINTERS_EXTEND_UNSIGNED) > || (GET_CODE (x) != SIGN_EXTEND && ! POINTERS_EXTEND_UNSIGNED)) > && !targetm.have_ptr_extend ()) > can_be_reg_pointer = false; > > > What do you think does this look like the right fix? There also is (from rtl.texi), for paradoxical subregs: """ @item @code{subreg} of @code{reg}s The upper bits are defined when @code{SUBREG_PROMOTED_VAR_P} is true. @code{SUBREG_PROMOTED_UNSIGNED_P} describes what the upper bits hold. Such subregs usually represent local variables, register variables and parameter pseudo variables that have been promoted to a wider mode. """ so you might want to test for these as well. > With this patch the code the reg/f:DI 481 does no longer appear, > and also the invalid combine does no longer happen. Good :-) > However the test case from pr70903 does not get fixed by this. > > But when I look at the dumps, I see again the first incorrect > transformation in cse2 (again cse!): > > (insn 20 19 21 2 (set (reg:DI 94) > (subreg:DI (reg:QI 93) 0)) pr.c:8 50 {*movdi_aarch64} > (expr_list:REG_EQUAL (const_int 255 [0xff]) > (expr_list:REG_DEAD (reg:QI 93) > (nil > > but that is simply wrong, because later optimization passes > expect reg 94 to be 0x00ff but the upper bits are unspecified, > so that REG_EQUAL should better not exist. Agreed. So where does that come from? > When I looked at cse.c I saw a comment in #if 0, which exactly > describes the problem that we have with paradoxical subreg here: > I am pretty sure, that this has to be reverted, and that it can > generate less efficient code. > > What do you think? I think this pessimises the generated code too much; there must be a better solution. Segher
Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
On 07/29/16 15:03, Bernd Edlinger wrote: > On 07/29/16 09:02, Segher Boessenkool wrote: >> On Thu, Jul 28, 2016 at 08:59:44PM +, Bernd Edlinger wrote: >>> (insn 1047 1044 1048 101 (set (reg/f:DI 481) >>> (subreg:DI (reg/f:SI 545) 0)) isl_input.c:2496 50 >>> {*movdi_aarch64} >>>(nil)) >> >> In your first mail you showed reg 481 as _not_ being REG_POINTER: >> >> (insn 1047 1046 1048 (set (reg:DI 481) >> (subreg:DI (reg/f:SI 479) 0)) y.c:12702 -1 >> (nil)) >> >> (note the lack of /f). So which is it? REG_POINTER here is not correct >> as far as I can see. >> > > Oh yes, that's an interesting point, in expand I still see this: > > > (insn 1047 1046 1048 (set (reg:DI 481) > (subreg:DI (reg/f:SI 479) 0)) isl_input.c:2496 -1 > (nil)) > > But in the last dump before combine I have this: > > (insn 1047 1044 1048 101 (set (reg/f:DI 481) > (subreg:DI (reg/f:SI 545) 0)) isl_input.c:2496 50 {*movdi_aarch64} > (nil)) > > > However I was not really surpised by that, because the reg 545 does > in deed hold a pointer value: _obj_map_vtable > > (insn 22 17 23 51 (set (reg/f:SI 544) > (high:SI (symbol_ref:SI ("isl_obj_map_vtable") [flags 0xc0] > ))) isl_input.c:2415 49 > {*movsi_aarch64} > (nil)) > (insn 23 22 24 51 (set (reg/f:SI 545) > (lo_sum:SI (reg/f:SI 544) > (symbol_ref:SI ("isl_obj_map_vtable") [flags 0xc0] > ))) isl_input.c:2415 917 > {add_losym_si} > (expr_list:REG_DEAD (reg/f:SI 544) > (expr_list:REG_EQUAL (symbol_ref:SI ("isl_obj_map_vtable") > [flags 0xc0] ) > (nil > > The "reg/f:DI 481" first appeared in cse1. > > > I'll try to see what's happening there next Ok, I the incorrect REG_POINTER is done here: cse_main -> reg_scan -> reg_scan_mark_refs -> set_reg_attrs_from_value and here I see a bug, because if POINTERS_EXTEND_UNSIGNED can_be_reg_pointer is only set to false if x is SIGN_EXTEND but not if x is a SUBREG as in this case. So I think that should be fixed this way: Index: emit-rtl.c === --- emit-rtl.c (revision 238891) +++ emit-rtl.c (working copy) @@ -1155,7 +1155,7 @@ set_reg_attrs_from_value (rtx reg, rtx x) || (GET_CODE (x) == SUBREG && subreg_lowpart_p (x))) { #if defined(POINTERS_EXTEND_UNSIGNED) - if (((GET_CODE (x) == SIGN_EXTEND && POINTERS_EXTEND_UNSIGNED) + if (((GET_CODE (x) != ZERO_EXTEND && POINTERS_EXTEND_UNSIGNED) || (GET_CODE (x) != SIGN_EXTEND && ! POINTERS_EXTEND_UNSIGNED)) && !targetm.have_ptr_extend ()) can_be_reg_pointer = false; What do you think does this look like the right fix? With this patch the code the reg/f:DI 481 does no longer appear, and also the invalid combine does no longer happen. However the test case from pr70903 does not get fixed by this. But when I look at the dumps, I see again the first incorrect transformation in cse2 (again cse!): (insn 20 19 21 2 (set (reg:DI 94) (subreg:DI (reg:QI 93) 0)) pr.c:8 50 {*movdi_aarch64} (expr_list:REG_EQUAL (const_int 255 [0xff]) (expr_list:REG_DEAD (reg:QI 93) (nil but that is simply wrong, because later optimization passes expect reg 94 to be 0x00ff but the upper bits are unspecified, so that REG_EQUAL should better not exist. When I looked at cse.c I saw a comment in #if 0, which exactly describes the problem that we have with paradoxical subreg here: Index: cse.c === --- cse.c (revision 238891) +++ cse.c (working copy) @@ -4716,10 +4716,6 @@ cse_insn (rtx_insn *insn) } } -#if 0 - /* It is no longer clear why we used to do this, but it doesn't -appear to still be needed. So let's try without it since this -code hurts cse'ing widened ops. */ /* If source is a paradoxical subreg (such as QI treated as an SI), treat it as volatile. It may do the work of an SI in one context where the extra bits are not being used, but cannot replace an SI @@ -4726,7 +4722,6 @@ cse_insn (rtx_insn *insn) in general. */ if (paradoxical_subreg_p (src)) sets[i].src_volatile = 1; -#endif /* Locate all possible equivalent forms for SRC. Try to replace SRC in the insn with each cheaper equivalent. removing the #if 0 seems to fix the sample from pr70903, but generates 3 more instructions than with my initial patch: @@ -6,12 +6,15 @@ foo: adrpx0, .LC0 add x0, x0, :lo12:.LC0 + mov x5, -1 mov x1, 0 ldp x4, x3, [x0, 8] ldr x2, [x0, 24] - mov x0, 255 - bfi x1, x0, 0, 32 + mov x0, 0 + bfi x0, x5, 0, 8 stp x3, x2, [x8, 16] + bfi x1, x0, 0, 32 + mov x0, 255 bfi x1, x0, 32, 32 stp x1,
Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
On 07/29/16 09:02, Segher Boessenkool wrote: > On Thu, Jul 28, 2016 at 08:59:44PM +, Bernd Edlinger wrote: >> (insn 1047 1044 1048 101 (set (reg/f:DI 481) >> (subreg:DI (reg/f:SI 545) 0)) isl_input.c:2496 50 {*movdi_aarch64} >>(nil)) > > In your first mail you showed reg 481 as _not_ being REG_POINTER: > > (insn 1047 1046 1048 (set (reg:DI 481) > (subreg:DI (reg/f:SI 479) 0)) y.c:12702 -1 > (nil)) > > (note the lack of /f). So which is it? REG_POINTER here is not correct > as far as I can see. > Oh yes, that's an interesting point, in expand I still see this: (insn 1047 1046 1048 (set (reg:DI 481) (subreg:DI (reg/f:SI 479) 0)) isl_input.c:2496 -1 (nil)) But in the last dump before combine I have this: (insn 1047 1044 1048 101 (set (reg/f:DI 481) (subreg:DI (reg/f:SI 545) 0)) isl_input.c:2496 50 {*movdi_aarch64} (nil)) However I was not really surpised by that, because the reg 545 does in deed hold a pointer value: _obj_map_vtable (insn 22 17 23 51 (set (reg/f:SI 544) (high:SI (symbol_ref:SI ("isl_obj_map_vtable") [flags 0xc0] ))) isl_input.c:2415 49 {*movsi_aarch64} (nil)) (insn 23 22 24 51 (set (reg/f:SI 545) (lo_sum:SI (reg/f:SI 544) (symbol_ref:SI ("isl_obj_map_vtable") [flags 0xc0] ))) isl_input.c:2415 917 {add_losym_si} (expr_list:REG_DEAD (reg/f:SI 544) (expr_list:REG_EQUAL (symbol_ref:SI ("isl_obj_map_vtable") [flags 0xc0] ) (nil The "reg/f:DI 481" first appeared in cse1. I'll try to see what's happening there next Thanks Bernd.
Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
On Thu, Jul 28, 2016 at 08:59:44PM +, Bernd Edlinger wrote: > (insn 1047 1044 1048 101 (set (reg/f:DI 481) > (subreg:DI (reg/f:SI 545) 0)) isl_input.c:2496 50 {*movdi_aarch64} > (nil)) In your first mail you showed reg 481 as _not_ being REG_POINTER: (insn 1047 1046 1048 (set (reg:DI 481) (subreg:DI (reg/f:SI 479) 0)) y.c:12702 -1 (nil)) (note the lack of /f). So which is it? REG_POINTER here is not correct as far as I can see. Segher
Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
Hi, On 07/27/16 23:31, Jeff Law wrote: > So you're stumbling into another really interesting area. > Absolutely, I am just too curious what's going on here ;-) > I can hazard a guess that the reason we create the paradoxical SUBREG > rather than a zero or sign extension is because various optimizers know > that the upper bits in a paradoxical are don't care bits and may make > transformations with that knowledge. > > Once you turn that into a zero/sign extend, then the rest of the > optimizers must preserve the zero/sign extension property -- they have > no way of knowing the bits were don't cares. > > So it's not necessarily clear that your change results in generally > better code or not. > > More importantly, it really feels like you're papering over a real bug > elsewhere. AFAICT the use of a paradoxical subreg here is safe. So the > real bug ought to be downstream of this point. Several folks have > pointed at reload, which would certainly be a point ripe for a > paradoxical subreg problem. I have looked again at the test case of PR 71779 and discovered something that I wanted to discuss here. I have tried to figure out what made combine change this: (insn 1048 1047 1049 101 (set (zero_extract:DI (reg/v:DI 191 [ obj1D.17943 ]) (const_int 32 [0x20]) (const_int 0 [0])) (reg/f:DI 481)) isl_input.c:2496 691 {*insv_regdi} (nil)) (insn 1049 1048 1050 101 (set (zero_extract:DI (reg/v:DI 191 [ obj1D.17943 ]) (const_int 32 [0x20]) (const_int 32 [0x20])) (reg/v/f:DI 145 [ SR.327D.17957 ])) isl_input.c:2496 691 {*insv_regdi} (expr_list:REG_DEAD (reg/v/f:DI 145 [ SR.327D.17957 ]) (nil))) (insn 1050 1049 1051 101 (set (reg:DI 1 x1) (reg/v:DI 191 [ obj1D.17943 ])) isl_input.c:2496 50 {*movdi_aarch64} (expr_list:REG_DEAD (reg/v:DI 191 [ obj1D.17943 ]) (nil))) into that (which is wrong because reg 481 has undefined bits in insn 1050): (insn 1048 1047 1049 101 (set (zero_extract:DI (reg/v:DI 191 [ obj1D.17943 ]) (const_int 32 [0x20]) (const_int 0 [0])) (reg/f:DI 481)) isl_input.c:2496 691 {*insv_regdi} (nil)) (insn 1049 1048 1050 101 (set (zero_extract:DI (reg/v:DI 191 [ obj1D.17943 ]) (const_int 32 [0x20]) (const_int 32 [0x20])) (reg/v/f:DI 145 [ SR.327D.17957 ])) isl_input.c:2496 691 {*insv_regdi} (expr_list:REG_DEAD (reg/v/f:DI 145 [ SR.327D.17957 ]) (nil))) (insn 1050 1049 1051 101 (set (reg:DI 1 x1) (ior:DI (ashift:DI (reg/v/f:DI 145 [ SR.327D.17957 ]) (const_int 32 [0x20])) (reg/f:DI 481))) isl_input.c:2496 50 {*movdi_aarch64} (nil)) The interesting fact is that combine did that while it was only considering 1048, 1049 -> 1050, and not the instruction with the paradoxical subreg: (insn 1047 1044 1048 101 (set (reg/f:DI 481) (subreg:DI (reg/f:SI 545) 0)) isl_input.c:2496 50 {*movdi_aarch64} (nil)) I found by debugging that expand_field_assignment was trying to build: (and:DI (reg/f:DI 481) (const_int -4294967296 [0x])) with this statement: masked = simplify_gen_binary (ASHIFT, compute_mode, simplify_gen_binary ( AND, compute_mode, gen_lowpart (compute_mode, SET_SRC (x)), mask), pos); but the result is just (reg/f:DI 481) ! I debugged and found the first wrong result in rtlanal.c (nonzero_bits1) where the following if-statement tells us that the upper 32 bits of reg 481 must be zero: switch (code) { case REG: #if defined(POINTERS_EXTEND_UNSIGNED) /* If pointers extend unsigned and this is a pointer in Pmode, say that all the bits above ptr_mode are known to be zero. */ /* As we do not know which address space the pointer is referring to, we can do this only if the target does not support different pointer or address modes depending on the address space. */ if (target_default_pointer_address_modes_p () && POINTERS_EXTEND_UNSIGNED && GET_MODE (x) == Pmode && REG_POINTER (x) && !targetm.have_ptr_extend ()) nonzero &= GET_MODE_MASK (ptr_mode); #endif Pmode = DImode, ptr_mode=SImode, and REG_POINTER(x) is true. Yes, the value is actually a pointer! Which means, that it is not safe to extend a pointer from SI to DI with anything but a zero-extend. And if I remove this statement the wrong code is still not fixed. So there must be more places where the similar assumptions are made. However, PR 70903 is not about pointers, and uses a mode where Pmode=ptr_mode=DImode, so there must be a different as hard to track down reason why this did not work out right. So the question is, if the
Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
Am 27.07.2016 um 23:31 schrieb Jeff Law: > On 07/26/2016 11:32 AM, Bernd Edlinger wrote: >> Hi! >> >> As described in PR 71779 and PR 70903 we have a wrong code issue on >> aarch64 >> which is triggered by a paradoxical subreg that can be created in >> store_bit_field_using_insv when the target has an EP_insv bitfield >> instruction. >> >> The attached patch changes this insn... >> >> (insn 1047 1046 1048 (set (reg:DI 481) >> (subreg:DI (reg/f:SI 479) 0)) y.c:12702 -1 >> (nil)) >> >> ... into a more simple insn: >> >> (insn 1047 1046 1048 (set (reg:DI 479) >> (zero_extend:DI (reg:SI 480))) isl_input.c:2496 -1 >> (nil)) >> >> which manages to fix two bugs at the same time. >> >> The patch does not include a test case, as I was looking at the PR 71779 >> just by curiosity, and I have only a theoretical interest in aarch64. >> However, >> it should be easy to extract one from PR 70903 at least, but I can't >> do that by >> myself. >> >> Therefore I would like to leave the test case(s) to Cristina Tamar >> from ARM, >> and/or Andreas Schwab, who have also helped out with bootstrapping and >> reg-testing on aarch64 and aarch64-ilp32. >> >> Bootstrap and reg-test on x86_64-linux-gnu with no regressions. >> Successfully bootstrapped on aarch64_ilp32 and the ISL testsuite passes. >> Also successfully bootstrapped on an aarch64 juno board and no >> regressions. >> >> >> Is it OK for trunk? >> >> >> Thanks >> Bernd. >> >> >> patch-pr71779.diff >> >> >> 2016-07-25 Bernd Edlinger>> >> PR rtl-optimization/71779 >> PR rtl-optimization/70903 >> * expmed.c (store_bit_field_using_insv): Don't generate a paradoxical >> subreg. > So you're stumbling into another really interesting area. > > I can hazard a guess that the reason we create the paradoxical SUBREG > rather than a zero or sign extension is because various optimizers know > that the upper bits in a paradoxical are don't care bits and may make > transformations with that knowledge. > > Once you turn that into a zero/sign extend, then the rest of the > optimizers must preserve the zero/sign extension property -- they have > no way of knowing the bits were don't cares. > > So it's not necessarily clear that your change results in generally > better code or not. > Yes, I understand, on x86 the stage 2/3 did not change but x86 has probably not a very sophisticated insv code pattern. > More importantly, it really feels like you're papering over a real bug > elsewhere. AFAICT the use of a paradoxical subreg here is safe. So the > real bug ought to be downstream of this point. Several folks have > pointed at reload, which would certainly be a point ripe for a > paradoxical subreg problem. > I debugged lra already, because I was initially sure it did something wrong with the subreg, but it turned out that it follows exactly what you say here, it spills the subreg twice, once as a 32 bit value, into a 64 bit slot with stack image in the upper word and then again as a zero extended 64 bit value. Later the garbage extended 64 bit value is used when it should not. So lra seems not to be the reason, but it is apparently spilling the paradoxical subreg twice. So at least lra does not generate better code from paradoxical subreg. > Sooo. I don't think we can go with this patch as-is today. We need to > find the root problem which I believe is later in the pipeline. > > This patch might make sense from an optimization standpoint -- I'm > generally of the opinion that while paradoxical subregs give the > optimziers more freedom, the optimizers rarely are able to use it and > they generally know much more about the semantics of and how to optimize > around zero/sign extensions. But we really should fix the bug first, > then come back to improving the code generation. > Yes. And paradoxical subreg have non-intuitive semantics. Thanks Bernd. > Now if someone wanted to look for an interesting optimization project -- > ree.c knows how to do redundant extension eliminations. A paradoxical > is a form of extension that isn't currently understood by REE. Similarly > REE doesn't know how to handle zero-extension as an "and" or sign > extension as a pair of shifts. Both of which can occur in practice due > to costs or ISA limitations. I wouldn't be surprise if adding those > capabilities to REE would significantly improve its ability to eliminate > unnecessary extensions. > > Jeff >
Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
On 07/26/2016 11:32 AM, Bernd Edlinger wrote: Hi! As described in PR 71779 and PR 70903 we have a wrong code issue on aarch64 which is triggered by a paradoxical subreg that can be created in store_bit_field_using_insv when the target has an EP_insv bitfield instruction. The attached patch changes this insn... (insn 1047 1046 1048 (set (reg:DI 481) (subreg:DI (reg/f:SI 479) 0)) y.c:12702 -1 (nil)) ... into a more simple insn: (insn 1047 1046 1048 (set (reg:DI 479) (zero_extend:DI (reg:SI 480))) isl_input.c:2496 -1 (nil)) which manages to fix two bugs at the same time. The patch does not include a test case, as I was looking at the PR 71779 just by curiosity, and I have only a theoretical interest in aarch64. However, it should be easy to extract one from PR 70903 at least, but I can't do that by myself. Therefore I would like to leave the test case(s) to Cristina Tamar from ARM, and/or Andreas Schwab, who have also helped out with bootstrapping and reg-testing on aarch64 and aarch64-ilp32. Bootstrap and reg-test on x86_64-linux-gnu with no regressions. Successfully bootstrapped on aarch64_ilp32 and the ISL testsuite passes. Also successfully bootstrapped on an aarch64 juno board and no regressions. Is it OK for trunk? Thanks Bernd. patch-pr71779.diff 2016-07-25 Bernd EdlingerPR rtl-optimization/71779 PR rtl-optimization/70903 * expmed.c (store_bit_field_using_insv): Don't generate a paradoxical subreg. So you're stumbling into another really interesting area. I can hazard a guess that the reason we create the paradoxical SUBREG rather than a zero or sign extension is because various optimizers know that the upper bits in a paradoxical are don't care bits and may make transformations with that knowledge. Once you turn that into a zero/sign extend, then the rest of the optimizers must preserve the zero/sign extension property -- they have no way of knowing the bits were don't cares. So it's not necessarily clear that your change results in generally better code or not. More importantly, it really feels like you're papering over a real bug elsewhere. AFAICT the use of a paradoxical subreg here is safe. So the real bug ought to be downstream of this point. Several folks have pointed at reload, which would certainly be a point ripe for a paradoxical subreg problem. Sooo. I don't think we can go with this patch as-is today. We need to find the root problem which I believe is later in the pipeline. This patch might make sense from an optimization standpoint -- I'm generally of the opinion that while paradoxical subregs give the optimziers more freedom, the optimizers rarely are able to use it and they generally know much more about the semantics of and how to optimize around zero/sign extensions. But we really should fix the bug first, then come back to improving the code generation. Now if someone wanted to look for an interesting optimization project -- ree.c knows how to do redundant extension eliminations. A paradoxical is a form of extension that isn't currently understood by REE. Similarly REE doesn't know how to handle zero-extension as an "and" or sign extension as a pair of shifts. Both of which can occur in practice due to costs or ISA limitations. I wouldn't be surprise if adding those capabilities to REE would significantly improve its ability to eliminate unnecessary extensions. Jeff
[PATCH] Fix wrong code on aarch64 due to paradoxical subreg
Hi! As described in PR 71779 and PR 70903 we have a wrong code issue on aarch64 which is triggered by a paradoxical subreg that can be created in store_bit_field_using_insv when the target has an EP_insv bitfield instruction. The attached patch changes this insn... (insn 1047 1046 1048 (set (reg:DI 481) (subreg:DI (reg/f:SI 479) 0)) y.c:12702 -1 (nil)) ... into a more simple insn: (insn 1047 1046 1048 (set (reg:DI 479) (zero_extend:DI (reg:SI 480))) isl_input.c:2496 -1 (nil)) which manages to fix two bugs at the same time. The patch does not include a test case, as I was looking at the PR 71779 just by curiosity, and I have only a theoretical interest in aarch64. However, it should be easy to extract one from PR 70903 at least, but I can't do that by myself. Therefore I would like to leave the test case(s) to Cristina Tamar from ARM, and/or Andreas Schwab, who have also helped out with bootstrapping and reg-testing on aarch64 and aarch64-ilp32. Bootstrap and reg-test on x86_64-linux-gnu with no regressions. Successfully bootstrapped on aarch64_ilp32 and the ISL testsuite passes. Also successfully bootstrapped on an aarch64 juno board and no regressions. Is it OK for trunk? Thanks Bernd.2016-07-25 Bernd EdlingerPR rtl-optimization/71779 PR rtl-optimization/70903 * expmed.c (store_bit_field_using_insv): Don't generate a paradoxical subreg. Index: gcc/calls.c Index: expmed.c === --- expmed.c (revision 238694) +++ expmed.c (working copy) @@ -664,14 +664,7 @@ store_bit_field_using_insv (const extraction_insn if we must narrow it, be sure we do it correctly. */ if (GET_MODE_SIZE (GET_MODE (value)) < GET_MODE_SIZE (op_mode)) - { - tmp = simplify_subreg (op_mode, value1, GET_MODE (value), 0); - if (! tmp) - tmp = simplify_gen_subreg (op_mode, - force_reg (GET_MODE (value), - value1), - GET_MODE (value), 0); - } + tmp = convert_to_mode (op_mode, value1, 1); else { tmp = gen_lowpart_if_possible (op_mode, value1);