[Bug middle-end/115551] [missed optimization] "c1 << (a + c2)" not optimized into "(c1 << c2) << a"

2024-06-20 Thread burnus at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115551

--- Comment #6 from Tobias Burnus  ---
Crossref: New Bug 11 is for the range analysis to deduce from 'x << a' that
'a' must be nonnegative.

2024-06-20 Thread jakub at gcc dot gnu.org via Gcc-bugs

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #5 from Jakub Jelinek  ---
We should also verify that it doesn't stand in the way of shift sanitization,
because
unsigned int a;
...
1 << (5 + a);
is well defined only for a in [0, 26] while once we optimize it to
(1 << 5) << a;
that would be well defined for a in [0, 31].  I think the shift sanitization is
done early, so just something to be verified in a testcase next to the patch.
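To make the two ranges concrete, here is a minimal sketch that models only the shift-count rule (count must be nonnegative and smaller than the width). It assumes a 32-bit int; WIDTH, count_ok, max_a_original and max_a_rewritten are hypothetical names for illustration, not anything from GCC or the sanitizer.

```c
#include <assert.h>
#include <stdbool.h>

enum { WIDTH = 32 };  /* assumption: int is 32 bits wide */

/* A shift count is valid only if it is smaller than the operand width. */
static bool count_ok(unsigned long long count)
{
    return count < WIDTH;
}

/* Largest a in [0, WIDTH) for which "1 << (c2 + a)" has a valid count,
   or -1 if there is none. */
static int max_a_original(unsigned c2)
{
    int best = -1;
    for (unsigned a = 0; a < WIDTH; ++a)
        if (count_ok((unsigned long long)c2 + a))
            best = (int)a;
    return best;
}

/* Largest a in [0, WIDTH) for which both counts of "(1 << c2) << a"
   are valid on their own, or -1 if there is none. */
static int max_a_rewritten(unsigned c2)
{
    int best = -1;
    for (unsigned a = 0; a < WIDTH; ++a)
        if (count_ok(c2) && count_ok(a))
            best = (int)a;
    return best;
}
```

For c2 = 5 the first helper stops at 26 and the second at 31, so a in [27, 31] is exactly the range a shift-sanitizer testcase would need to cover.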

2024-06-20 Thread rguenth at gcc dot gnu.org via Gcc-bugs

--- Comment #4 from Richard Biener  ---
The same applies to right shifts (both logical and arithmetic).

Note that 1 << (a + 5) might be cheaper than (1<<5) << a due to constraints
on immediates but for GIMPLE the latter is definitely more canonical.
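A quick sanity check of the right-shift analogue, as a sketch only: right_shifts_agree is a hypothetical helper, the types are assumed to be 32 bits wide, and right-shifting a negative signed value is implementation-defined in C (GCC documents it as an arithmetic shift).

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if x >> (a + c2) equals (x >> c2) >> a for every a that keeps
   the combined count below 32, for both a logical (unsigned) and an
   arithmetic (signed) right shift; returns 0 otherwise. */
static int right_shifts_agree(uint32_t u, int32_t s, unsigned c2)
{
    for (unsigned a = 0; a + c2 < 32; ++a) {
        if ((u >> (a + c2)) != ((u >> c2) >> a))
            return 0;  /* logical shift mismatch */
        if ((s >> (a + c2)) != ((s >> c2) >> a))
            return 0;  /* arithmetic shift mismatch */
    }
    return 1;
}
```

This only confirms that the two forms agree where both are defined; it says nothing about which form is cheaper on a given target.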

2024-06-20 Thread burnus at gcc dot gnu.org via Gcc-bugs

--- Comment #3 from Tobias Burnus  ---
As we want to have a >= 0, I tried to convey it differently for the example in
comment 0:

(a) __attribute__((assume(ch >= 0)));
(b) 'unsigned ch' (instead of 'int ch')

but it didn't help.  Thus, it looks as if it could be implemented for:
  c1 << (a + c2)
if a >= 0 and c2 >= 0, which is then converted to
  (c1 << c2) << a

Whether Ranger handles abs(c2) > c3 >= 0 → a >= 0 for 'c1 << (a*c2 + c3)', I
don't know, but the variant above with unsigned and 'assume(a >= 0)' should be
implementable and seems to make sense.
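For the unsigned variant, the equivalence of the two forms can be checked by brute force. This is a sketch under assumptions: shl_sum, shl_split and forms_agree are hypothetical names, the operand is a 32-bit unsigned value, and int is assumed to be at most 32 bits wide so that no promotion to a signed type takes place.

```c
#include <assert.h>
#include <stdint.h>

/* Original form: c1 << (a + c2).  The combined count must stay below 32. */
static uint32_t shl_sum(uint32_t c1, unsigned a, unsigned c2)
{
    assert(a + c2 < 32);
    return c1 << (a + c2);
}

/* Rewritten form: (c1 << c2) << a.  Each count must stay below 32. */
static uint32_t shl_split(uint32_t c1, unsigned a, unsigned c2)
{
    assert(a < 32 && c2 < 32);
    return (c1 << c2) << a;
}

/* Returns 1 if both forms agree for every unsigned a that is valid
   for the original form, 0 otherwise. */
static int forms_agree(uint32_t c1, unsigned c2)
{
    for (unsigned a = 0; a + c2 < 32; ++a)
        if (shl_sum(c1, a, c2) != shl_split(c1, a, c2))
            return 0;
    return 1;
}
```

With unsigned 'a' there is no negative count to worry about, which is why this variant looks like the easiest one to implement first.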

2024-06-20 Thread burnus at gcc dot gnu.org via Gcc-bugs

--- Comment #2 from Tobias Burnus  ---
> Thus we need some range info to do this optimization.

Good point.

It seems as if for c1 << (c2 * a + c3),  C requires a >= -c3/c2 (read as float
division; c2 ≠ 0)

And the suggested optimization requires c2*a >= 0 and c3 >= 0 to fulfill C's
requirement of nonnegative shift counts.

Thus, this is fulfilled for any value of 'a' if c3 >= 0 and abs(c2) > c3.
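The reasoning behind that condition: c2*a is a multiple of c2, so if c2*a + c3 >= 0 with 0 <= c3 < abs(c2), then c2*a > -abs(c2) and hence c2*a >= 0. A brute-force sketch of that claim over a small range of 'a' (precondition and split_is_safe are hypothetical helper names, and only nonnegativity of the counts is modelled, not the upper bound):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* The condition from the comment above: c3 >= 0 and abs(c2) > c3. */
static bool precondition(int c2, int c3)
{
    return c3 >= 0 && abs(c2) > c3;
}

/* Returns true if, for every a in [a_min, a_max], a nonnegative combined
   count c2*a + c3 implies a nonnegative partial count c2*a.
   Intended for small ranges only (a_max must be below INT_MAX). */
static bool split_is_safe(int c2, int c3, int a_min, int a_max)
{
    assert(a_min <= a_max);
    for (int a = a_min; a <= a_max; ++a) {
        long long scaled = (long long)c2 * a;
        if (scaled + c3 >= 0 && scaled < 0)
            return false;  /* original is fine, split count is negative */
    }
    return true;
}
```

For example, c2 = 2, c3 = 1 passes, while c2 = 2, c3 = 3 fails at a = -1, where the original count is 1 but c2*a is -2.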


The optimization can also be done for any value of 'a', if the hardware
supports c1 << (negative value)  (as right shift, filling with zeros) and
popcount(c1) == popcount(c1 << c3).


The first condition is fulfilled in this example.

I don't know about the second, but observed that Clang/LLVM optimizes the diff
mask1-mask2 to 0 on ARM but not x86_64 (not checked why nor whether ARM handles
negative shifts in a well-defined way or not).

2024-06-19 Thread xry111 at gcc dot gnu.org via Gcc-bugs

Xi Ruoyao  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |xry111 at gcc dot gnu.org

--- Comment #1 from Xi Ruoyao  ---
"c1 << (-5 + 5)" is fine but "(c1 << -5) << 5" invokes undefined behavior. 
Thus we need some range info to do this optimization.
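To spell out the counterexample without actually executing an undefined shift, here is a minimal sketch that only checks the counts. UINT_WIDTH_BITS, shift_count_ok, original_defined and rewritten_defined are hypothetical names, and the width computation assumes unsigned int has no padding bits.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Width of unsigned int in bits (assumes no padding bits). */
#define UINT_WIDTH_BITS ((int)(sizeof(unsigned int) * CHAR_BIT))

/* In C a shift count is valid only if 0 <= count < width of the
   promoted left operand. */
static bool shift_count_ok(int count)
{
    return count >= 0 && count < UINT_WIDTH_BITS;
}

/* Original form c1 << (a + c2): only the sum has to be a valid count.
   Small values are assumed, so a + c2 itself does not overflow. */
static bool original_defined(int a, int c2)
{
    return shift_count_ok(a + c2);
}

/* Rewritten form (c1 << c2) << a: each count has to be valid on its own. */
static bool rewritten_defined(int a, int c2)
{
    return shift_count_ok(c2) && shift_count_ok(a);
}
```

With a = -5 and c2 = 5 the original form is defined (the count is 0) while the rewritten form is not, which is the case range info has to exclude.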