Re: [PATCH] combine: Don't optimize SIGN_EXTEND of MEM on WORD_REGISTER_OPERATIONS targets [PR113010]

2024-02-07 Thread Greg McGary



On 2/4/24 9:58 PM, Jeff Law wrote:


On 2/2/24 15:48, Greg McGary wrote:


input: (sign_extend:DI (mem/c:SI (symbol_ref:DI ("minus_1") [flags 
0x86] ) [1 minus_1+0 S4 A32]))


result: (subreg:DI (mem/c:SI (symbol_ref:DI ("minus_1") [flags 0x86] 
) [1 minus_1+0 S4 A32]) 0)


Later, the high part of the PSoM statically evaluates to 0, the code 
to load and test is elided, and the incorrect alternative is emitted 
unconditionally.
So I think we need to know where that high part gets statically turned 
into a zero.


I'm not happy with the sign_extend->subreg transformation as we 
generally want to avoid (subreg (mem)) for various reasons.  So we'll 
probably want to do something like your patch as well.   But let's 
chase down the static evaluation of the high part to zero first -- 
that's clearly wrong given the defined semantics of (subreg (mem)) in 
the presence of LOAD_EXTEND_OP.



This prevents the buggy evaluation. What think ye?

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 812553c091e..736206242e1 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -12550,9 +12550,15 @@ simplify_comparison (enum rtx_code code, rtx 
*pop0, rtx *pop1)

    }

  /* If the inner mode is narrower and we are extracting the 
low part,

-    we can treat the SUBREG as if it were a ZERO_EXTEND. */
+    we can treat the SUBREG as if it were a ZERO_EXTEND ...  */
  if (paradoxical_subreg_p (op0))
-   ;
+   {
+ /* ... except we can't treat as ZERO_EXTEND when a machine
+    automatically sign-extends loads. */
+ if (MEM_P (SUBREG_REG (op0)) && WORD_REGISTER_OPERATIONS
+ && load_extend_op (inner_mode) == SIGN_EXTEND)
+   break;
+   }
  else if (subreg_lowpart_p (op0)
   && GET_MODE_CLASS (mode) == MODE_INT
   && is_int_mode (GET_MODE (SUBREG_REG (op0)), 
_mode)




Re: [PATCH] combine: Don't optimize SIGN_EXTEND of MEM on WORD_REGISTER_OPERATIONS targets [PR113010]

2024-02-04 Thread Jeff Law




On 2/2/24 15:48, Greg McGary wrote:

On 2/1/24 10:24 PM, Jeff Law wrote:


On 2/1/24 18:24, Greg McGary wrote:

However, for a machine where (WORD_REGISTER_OPERATIONS && 
load_extend_op (inner_mode) == SIGN_EXTEND), the high part of a PSoM 
is  only known at runtime as 0s or 1s. That's the downstream bug. The 
fix for such machines is either (A) forbid static evaluation of the 
high part of a PSoM, or (B) forbid transforming (SIGN_EXTEND (MEM 
...) ) into a PSoM. My patch does B. Perhaps you prefer A? The 
trouble with A is that in the zero-extend case, it is valid to 
statically evaluate as 0. It is only the sign-extend case that isn't 
known until runtime. By the time we have a PSoM, its upstream 
ancestor as sign- or zero-extend is already lost.


Does that give you the understanding you desire, or are there deeper 
mysteries to probe?
It's a good start and I can see what you're trying to do -- and it may 
in fact be correct -- the quick discussion with Palmer Tuesday and 
your follow-up have helped a lot).


But just to be sure, what's the incoming rtl at function entry? just 
"debug_rtx (x)" should be sufficient.


input: (sign_extend:DI (mem/c:SI (symbol_ref:DI ("minus_1") [flags 0x86] 
) [1 minus_1+0 S4 A32]))


result: (subreg:DI (mem/c:SI (symbol_ref:DI ("minus_1") [flags 0x86] 
) [1 minus_1+0 S4 A32]) 0)


Later, the high part of the PSoM statically evaluates to 0, the code to 
load and test is elided, and the incorrect alternative is emitted 
unconditionally.
So I think we need to know where that high part gets statically turned 
into a zero.


I'm not happy with the sign_extend->subreg transformation as we 
generally want to avoid (subreg (mem)) for various reasons.  So we'll 
probably want to do something like your patch as well.   But let's chase 
down the static evaluation of the high part to zero first -- that's 
clearly wrong given the defined semantics of (subreg (mem)) in the 
presence of LOAD_EXTEND_OP.


jeff


Re: [PATCH] combine: Don't optimize SIGN_EXTEND of MEM on WORD_REGISTER_OPERATIONS targets [PR113010]

2024-02-02 Thread Greg McGary

On 2/1/24 10:24 PM, Jeff Law wrote:


On 2/1/24 18:24, Greg McGary wrote:

However, for a machine where (WORD_REGISTER_OPERATIONS && 
load_extend_op (inner_mode) == SIGN_EXTEND), the high part of a PSoM 
is  only known at runtime as 0s or 1s. That's the downstream bug. The 
fix for such machines is either (A) forbid static evaluation of the 
high part of a PSoM, or (B) forbid transforming (SIGN_EXTEND (MEM 
...) ) into a PSoM. My patch does B. Perhaps you prefer A? The 
trouble with A is that in the zero-extend case, it is valid to 
statically evaluate as 0. It is only the sign-extend case that isn't 
known until runtime. By the time we have a PSoM, its upstream 
ancestor as sign- or zero-extend is already lost.


Does that give you the understanding you desire, or are there deeper 
mysteries to probe?
It's a good start and I can see what you're trying to do -- and it may 
in fact be correct -- the quick discussion with Palmer Tuesday and 
your follow-up have helped a lot).


But just to be sure, what's the incoming rtl at function entry? just 
"debug_rtx (x)" should be sufficient.


input: (sign_extend:DI (mem/c:SI (symbol_ref:DI ("minus_1") [flags 0x86] 
) [1 minus_1+0 S4 A32]))


result: (subreg:DI (mem/c:SI (symbol_ref:DI ("minus_1") [flags 0x86] 
) [1 minus_1+0 S4 A32]) 0)


Later, the high part of the PSoM statically evaluates to 0, the code to 
load and test is elided, and the incorrect alternative is emitted 
unconditionally.


G



Re: [PATCH] combine: Don't optimize SIGN_EXTEND of MEM on WORD_REGISTER_OPERATIONS targets [PR113010]

2024-02-01 Thread Jeff Law




On 2/1/24 18:24, Greg McGary wrote:

On 1/18/24 9:24 AM, Jeff Law wrote:


On 1/17/24 20:53, Greg McGary wrote:


While the code comment is true, perhaps it obscures the primary intent,
which is recognition that the pattern (SIGN_EXTEND (mem ...) ) is 
destined

to expand into a single memory-load instruction and no simplification is
possible, so why waste time with further analysis or transformation? 
There
are plenty of other conditions that also short circuit to "do 
nothing" and
this seems just as straightforward as those others. Efforts to catch 
this

further downstream add gratuitous complexity.
Because the real bug is likely still lurking, waiting for something 
else to trigger it.


An early exit is fine when we're just trying to avoid unnecessary 
work, but there's something else going on here we need to understand 
first.



expand_compound_operation() wants to evaluate the sign- and 
zero-extension of MEM. It begins by calling gen_lowpart(), which returns 
the same result in both cases: a paradoxical subreg of a MEM (PSoM). 
What is the value of the high part of a PSoM? Can that high part be 
evaluated at compile time?
Potentially, yes.   If LOAD_EXTEND_OP returns ZERO_EXTEND, then we know 
at compile time those bits are zero.


I could come up with ways to optimize the SIGN_EXTEND case as well, but 
that would require some state tracking and it doesn't immediately look 
like expand_compound_operation or its children use any of the state 
tracking that's available in combine.  So for the sake of this problem, 
let's consider the SIGN_EXTEND case as not computable at compile-time in 
expand_compound_operation.




However, for a machine where (WORD_REGISTER_OPERATIONS && load_extend_op 
(inner_mode) == SIGN_EXTEND), the high part of a PSoM is  only known at 
runtime as 0s or 1s. That's the downstream bug. The fix for such 
machines is either (A) forbid static evaluation of the high part of a 
PSoM, or (B) forbid transforming (SIGN_EXTEND (MEM ...) ) into a PSoM. 
My patch does B. Perhaps you prefer A? The trouble with A is that in the 
zero-extend case, it is valid to statically evaluate as 0. It is only 
the sign-extend case that isn't known until runtime. By the time we have 
a PSoM, its upstream ancestor as sign- or zero-extend is already lost.


Does that give you the understanding you desire, or are there deeper 
mysteries to probe?
It's a good start and I can see what you're trying to do -- and it may 
in fact be correct -- the quick discussion with Palmer Tuesday and your 
follow-up have helped a lot).


But just to be sure, what's the incoming rtl at function entry?  just 
"debug_rtx (x)" should be sufficient.


jeff


Re: [PATCH] combine: Don't optimize SIGN_EXTEND of MEM on WORD_REGISTER_OPERATIONS targets [PR113010]

2024-02-01 Thread Greg McGary

On 1/18/24 9:24 AM, Jeff Law wrote:


On 1/17/24 20:53, Greg McGary wrote:


While the code comment is true, perhaps it obscures the primary intent,
which is recognition that the pattern (SIGN_EXTEND (mem ...) ) is 
destined

to expand into a single memory-load instruction and no simplification is
possible, so why waste time with further analysis or transformation? 
There
are plenty of other conditions that also short circuit to "do 
nothing" and
this seems just as straightforward as those others. Efforts to catch 
this

further downstream add gratuitous complexity.
Because the real bug is likely still lurking, waiting for something 
else to trigger it.


An early exit is fine when we're just trying to avoid unnecessary 
work, but there's something else going on here we need to understand 
first.



expand_compound_operation() wants to evaluate the sign- and 
zero-extension of MEM. It begins by calling gen_lowpart(), which returns 
the same result in both cases: a paradoxical subreg of a MEM (PSoM). 
What is the value of the high part of a PSoM? Can that high part be 
evaluated at compile time?


According to existing code, the high part of a PSoM is statically 0. 
However, for a machine where (WORD_REGISTER_OPERATIONS && load_extend_op 
(inner_mode) == SIGN_EXTEND), the high part of a PSoM is  only known at 
runtime as 0s or 1s. That's the downstream bug. The fix for such 
machines is either (A) forbid static evaluation of the high part of a 
PSoM, or (B) forbid transforming (SIGN_EXTEND (MEM ...) ) into a PSoM. 
My patch does B. Perhaps you prefer A? The trouble with A is that in the 
zero-extend case, it is valid to statically evaluate as 0. It is only 
the sign-extend case that isn't known until runtime. By the time we have 
a PSoM, its upstream ancestor as sign- or zero-extend is already lost.


Does that give you the understanding you desire, or are there deeper 
mysteries to probe?


G



Re: [PATCH] combine: Don't optimize SIGN_EXTEND of MEM on WORD_REGISTER_OPERATIONS targets [PR113010]

2024-01-18 Thread Jeff Law




On 1/17/24 20:53, Greg McGary wrote:
On Tue, Jan 16, 2024 at 11:44 PM Richard Biener 
mailto:richard.guent...@gmail.com>> wrote:


 > On Tue, Jan 16, 2024 at 11:20 PM Greg McGary > wrote:


 > >

 > > The sign bit of a sign-extending load cannot be known until runtime,

 > > so don't attempt to simplify it in the combiner.

 >
 > It feels like this papers over an issue downstream?

While the code comment is true, perhaps it obscures the primary intent,
which is recognition that the pattern (SIGN_EXTEND (mem ...) ) is destined
to expand into a single memory-load instruction and no simplification is
possible, so why waste time with further analysis or transformation? There
are plenty of other conditions that also short circuit to "do nothing" and
this seems just as straightforward as those others. Efforts to catch this
further downstream add gratuitous complexity.
Because the real bug is likely still lurking, waiting for something else 
to trigger it.


An early exit is fine when we're just trying to avoid unnecessary work, 
but there's something else going on here we need to understand first.


jeff


Re: [PATCH] combine: Don't optimize SIGN_EXTEND of MEM on WORD_REGISTER_OPERATIONS targets [PR113010]

2024-01-17 Thread Greg McGary
On Tue, Jan 16, 2024 at 11:44 PM Richard Biener 
wrote:

> > On Tue, Jan 16, 2024 at 11:20 PM Greg McGary  wrote:

> > >

> > > The sign bit of a sign-extending load cannot be known until runtime,

> > > so don't attempt to simplify it in the combiner.

> >
> It feels like this papers over an issue downstream?

While the code comment is true, perhaps it obscures the primary intent,
which is recognition that the pattern (SIGN_EXTEND (mem ...) ) is destined
to expand into a single memory-load instruction and no simplification is
possible, so why waste time with further analysis or transformation? There
are plenty of other conditions that also short circuit to "do nothing" and
this seems just as straightforward as those others. Efforts to catch this
further downstream add gratuitous complexity.

G


Re: [PATCH] combine: Don't optimize SIGN_EXTEND of MEM on WORD_REGISTER_OPERATIONS targets [PR113010]

2024-01-16 Thread YunQiang Su
Greg McGary  于2024年1月17日周三 06:20写道:
>
> The sign bit of a sign-extending load cannot be known until runtime,
> so don't attempt to simplify it in the combiner.
>
> 2024-01-11  Greg McGary  
>
> PR rtl-optimization/113010
> * combine.cc (expand_compound_operation): Don't simplify
> SIGN_EXTEND of a MEM on WORD_REGISTER_OPERATIONS targets
>
> * gcc.c-torture/execute/pr113010.c: New test.
> ---
>  gcc/combine.cc | 5 +
>  gcc/testsuite/gcc.c-torture/execute/pr113010.c | 9 +
>  2 files changed, 14 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr113010.c
>
> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index 812553c091e..ba587184dfc 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -7208,6 +7208,11 @@ expand_compound_operation (rtx x)
>if (len == 0)
> return x;
>
> +  /* Sign-extending loads can never be simplified at compile time.  */
> +  if (WORD_REGISTER_OPERATIONS && MEM_P (XEXP (x, 0))
> + && load_extend_op (inner_mode) == SIGN_EXTEND)
> +   return x;
> +
>break;
>
>  case ZERO_EXTRACT:
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr113010.c 
> b/gcc/testsuite/gcc.c-torture/execute/pr113010.c
> new file mode 100644
> index 000..a95c613c1df
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr113010.c
> @@ -0,0 +1,9 @@
> +int minus_1 = -1;
> +
> +int
> +main ()
> +{
> +  if ((0, 0xul) >= minus_1)

There is a warning option:

-Wsign-compare
Warn when a comparison between signed and unsigned values could
produce an incorrect result when the signed value is converted to unsigned.

> +__builtin_abort ();
> +  return 0;
> +}
> --
> 2.34.1
>


-- 
YunQiang Su


Re: [PATCH] combine: Don't optimize SIGN_EXTEND of MEM on WORD_REGISTER_OPERATIONS targets [PR113010]

2024-01-16 Thread Richard Biener
On Tue, Jan 16, 2024 at 11:20 PM Greg McGary  wrote:
>
> The sign bit of a sign-extending load cannot be known until runtime,
> so don't attempt to simplify it in the combiner.

It feels like this papers over an issue downstream?

> 2024-01-11  Greg McGary  
>
> PR rtl-optimization/113010
> * combine.cc (expand_compound_operation): Don't simplify
> SIGN_EXTEND of a MEM on WORD_REGISTER_OPERATIONS targets
>
> * gcc.c-torture/execute/pr113010.c: New test.
> ---
>  gcc/combine.cc | 5 +
>  gcc/testsuite/gcc.c-torture/execute/pr113010.c | 9 +
>  2 files changed, 14 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr113010.c
>
> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index 812553c091e..ba587184dfc 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -7208,6 +7208,11 @@ expand_compound_operation (rtx x)
>if (len == 0)
> return x;
>
> +  /* Sign-extending loads can never be simplified at compile time.  */
> +  if (WORD_REGISTER_OPERATIONS && MEM_P (XEXP (x, 0))
> + && load_extend_op (inner_mode) == SIGN_EXTEND)
> +   return x;
> +
>break;
>
>  case ZERO_EXTRACT:
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr113010.c 
> b/gcc/testsuite/gcc.c-torture/execute/pr113010.c
> new file mode 100644
> index 000..a95c613c1df
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr113010.c
> @@ -0,0 +1,9 @@
> +int minus_1 = -1;
> +
> +int
> +main ()
> +{
> +  if ((0, 0xul) >= minus_1)
> +__builtin_abort ();
> +  return 0;
> +}
> --
> 2.34.1
>


[PATCH] combine: Don't optimize SIGN_EXTEND of MEM on WORD_REGISTER_OPERATIONS targets [PR113010]

2024-01-16 Thread Greg McGary
The sign bit of a sign-extending load cannot be known until runtime,
so don't attempt to simplify it in the combiner.

2024-01-11  Greg McGary  

PR rtl-optimization/113010
* combine.cc (expand_compound_operation): Don't simplify
SIGN_EXTEND of a MEM on WORD_REGISTER_OPERATIONS targets

* gcc.c-torture/execute/pr113010.c: New test.
---
 gcc/combine.cc | 5 +
 gcc/testsuite/gcc.c-torture/execute/pr113010.c | 9 +
 2 files changed, 14 insertions(+)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr113010.c

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 812553c091e..ba587184dfc 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -7208,6 +7208,11 @@ expand_compound_operation (rtx x)
   if (len == 0)
return x;
 
+  /* Sign-extending loads can never be simplified at compile time.  */
+  if (WORD_REGISTER_OPERATIONS && MEM_P (XEXP (x, 0))
+ && load_extend_op (inner_mode) == SIGN_EXTEND)
+   return x;
+
   break;
 
 case ZERO_EXTRACT:
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr113010.c 
b/gcc/testsuite/gcc.c-torture/execute/pr113010.c
new file mode 100644
index 000..a95c613c1df
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr113010.c
@@ -0,0 +1,9 @@
+int minus_1 = -1;
+
+int
+main ()
+{
+  if ((0, 0xul) >= minus_1)
+__builtin_abort ();
+  return 0;
+}
-- 
2.34.1