[Bug rtl-optimization/110206] [14 Regression] wrong code with -Os -march=cascadelake since r14-1246

2023-07-14 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110206

Uroš Bizjak  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
   Target Milestone|14.0                        |12.4
             Status|ASSIGNED                    |RESOLVED

--- Comment #20 from Uroš Bizjak  ---
Fixed for gcc-12.4+.

[Bug rtl-optimization/110206] [14 Regression] wrong code with -Os -march=cascadelake since r14-1246

2023-07-14 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110206

--- Comment #19 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:eeb8e9a36d7aa9bc4ac8b0d7abe1e84e9afc4250

commit r12-9774-geeb8e9a36d7aa9bc4ac8b0d7abe1e84e9afc4250
Author: Uros Bizjak 
Date:   Fri Jul 14 11:46:22 2023 +0200

cprop: Do not set REG_EQUAL note when simplifying paradoxical subreg
[PR110206]

cprop1 pass does not consider paradoxical subreg and for (insn 22) claims
that it equals 8 elements of HImode by setting REG_EQUAL note:

(insn 21 19 22 4 (set (reg:V4QI 98)
(mem/u/c:V4QI (symbol_ref/u:DI ("*.LC1") [flags 0x2]) [0  S4 A32]))
"pr110206.c":12:42 1530 {*movv4qi_internal}
 (expr_list:REG_EQUAL (const_vector:V4QI [
(const_int -52 [0xffcc]) repeated x4
])
(nil)))
(insn 22 21 23 4 (set (reg:V8HI 100)
(zero_extend:V8HI (vec_select:V8QI (subreg:V16QI (reg:V4QI 98) 0)
(parallel [
(const_int 0 [0])
(const_int 1 [0x1])
(const_int 2 [0x2])
(const_int 3 [0x3])
(const_int 4 [0x4])
(const_int 5 [0x5])
(const_int 6 [0x6])
(const_int 7 [0x7])
])))) "pr110206.c":12:42 7471 {sse4_1_zero_extendv8qiv8hi2}
 (expr_list:REG_EQUAL (const_vector:V8HI [
(const_int 204 [0xcc]) repeated x8
])
(expr_list:REG_DEAD (reg:V4QI 98)
(nil))))

We rely on the "undefined" vals to have a specific value (from the earlier
REG_EQUAL note) but actual code generation doesn't ensure this (it doesn't
need to).  That said, the issue isn't the constant folding per-se but that
we do not actually constant fold but register an equality that doesn't
hold.

PR target/110206

gcc/ChangeLog:

* fwprop.cc (contains_paradoxical_subreg_p): Move to ...
* rtlanal.cc (contains_paradoxical_subreg_p): ... here.
* rtlanal.h (contains_paradoxical_subreg_p): Add prototype.
* cprop.cc (try_replace_reg): Do not set REG_EQUAL note
when the original source contains a paradoxical subreg.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr110206.c: New test.

(cherry picked from commit 1815e313a8fb519a77c94a908eb6dafc4ce51ffe)

[Bug rtl-optimization/110206] [14 Regression] wrong code with -Os -march=cascadelake since r14-1246

2023-07-14 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110206

--- Comment #18 from CVS Commits  ---
The releases/gcc-13 branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:bef95ba085b0ae9bf3eb79a8eed685236d773116

commit r13-7565-gbef95ba085b0ae9bf3eb79a8eed685236d773116
Author: Uros Bizjak 
Date:   Fri Jul 14 11:46:22 2023 +0200

cprop: Do not set REG_EQUAL note when simplifying paradoxical subreg
[PR110206]

cprop1 pass does not consider paradoxical subreg and for (insn 22) claims
that it equals 8 elements of HImode by setting REG_EQUAL note:

(insn 21 19 22 4 (set (reg:V4QI 98)
(mem/u/c:V4QI (symbol_ref/u:DI ("*.LC1") [flags 0x2]) [0  S4 A32]))
"pr110206.c":12:42 1530 {*movv4qi_internal}
 (expr_list:REG_EQUAL (const_vector:V4QI [
(const_int -52 [0xffcc]) repeated x4
])
(nil)))
(insn 22 21 23 4 (set (reg:V8HI 100)
(zero_extend:V8HI (vec_select:V8QI (subreg:V16QI (reg:V4QI 98) 0)
(parallel [
(const_int 0 [0])
(const_int 1 [0x1])
(const_int 2 [0x2])
(const_int 3 [0x3])
(const_int 4 [0x4])
(const_int 5 [0x5])
(const_int 6 [0x6])
(const_int 7 [0x7])
])))) "pr110206.c":12:42 7471 {sse4_1_zero_extendv8qiv8hi2}
 (expr_list:REG_EQUAL (const_vector:V8HI [
(const_int 204 [0xcc]) repeated x8
])
(expr_list:REG_DEAD (reg:V4QI 98)
(nil))))

We rely on the "undefined" vals to have a specific value (from the earlier
REG_EQUAL note) but actual code generation doesn't ensure this (it doesn't
need to).  That said, the issue isn't the constant folding per-se but that
we do not actually constant fold but register an equality that doesn't
hold.

PR target/110206

gcc/ChangeLog:

* fwprop.cc (contains_paradoxical_subreg_p): Move to ...
* rtlanal.cc (contains_paradoxical_subreg_p): ... here.
* rtlanal.h (contains_paradoxical_subreg_p): Add prototype.
* cprop.cc (try_replace_reg): Do not set REG_EQUAL note
when the original source contains a paradoxical subreg.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr110206.c: New test.

(cherry picked from commit 1815e313a8fb519a77c94a908eb6dafc4ce51ffe)

[Bug rtl-optimization/110206] [14 Regression] wrong code with -Os -march=cascadelake since r14-1246

2023-07-14 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110206

--- Comment #17 from CVS Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:1815e313a8fb519a77c94a908eb6dafc4ce51ffe

commit r14-2525-g1815e313a8fb519a77c94a908eb6dafc4ce51ffe
Author: Uros Bizjak 
Date:   Fri Jul 14 11:46:22 2023 +0200

cprop: Do not set REG_EQUAL note when simplifying paradoxical subreg
[PR110206]

cprop1 pass does not consider paradoxical subreg and for (insn 22) claims
that it equals 8 elements of HImode by setting REG_EQUAL note:

(insn 21 19 22 4 (set (reg:V4QI 98)
(mem/u/c:V4QI (symbol_ref/u:DI ("*.LC1") [flags 0x2]) [0  S4 A32]))
"pr110206.c":12:42 1530 {*movv4qi_internal}
 (expr_list:REG_EQUAL (const_vector:V4QI [
(const_int -52 [0xffcc]) repeated x4
])
(nil)))
(insn 22 21 23 4 (set (reg:V8HI 100)
(zero_extend:V8HI (vec_select:V8QI (subreg:V16QI (reg:V4QI 98) 0)
(parallel [
(const_int 0 [0])
(const_int 1 [0x1])
(const_int 2 [0x2])
(const_int 3 [0x3])
(const_int 4 [0x4])
(const_int 5 [0x5])
(const_int 6 [0x6])
(const_int 7 [0x7])
])))) "pr110206.c":12:42 7471 {sse4_1_zero_extendv8qiv8hi2}
 (expr_list:REG_EQUAL (const_vector:V8HI [
(const_int 204 [0xcc]) repeated x8
])
(expr_list:REG_DEAD (reg:V4QI 98)
(nil))))

We rely on the "undefined" vals to have a specific value (from the earlier
REG_EQUAL note) but actual code generation doesn't ensure this (it doesn't
need to).  That said, the issue isn't the constant folding per-se but that
we do not actually constant fold but register an equality that doesn't
hold.

PR target/110206

gcc/ChangeLog:

* fwprop.cc (contains_paradoxical_subreg_p): Move to ...
* rtlanal.cc (contains_paradoxical_subreg_p): ... here.
* rtlanal.h (contains_paradoxical_subreg_p): Add prototype.
* cprop.cc (try_replace_reg): Do not set REG_EQUAL note
when the original source contains a paradoxical subreg.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr110206.c: New test.

[Bug rtl-optimization/110206] [14 Regression] wrong code with -Os -march=cascadelake since r14-1246

2023-07-14 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110206

--- Comment #16 from Uroš Bizjak  ---
v2 patch at [1].

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624491.html

[Bug rtl-optimization/110206] [14 Regression] wrong code with -Os -march=cascadelake since r14-1246

2023-07-13 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110206

--- Comment #15 from Uroš Bizjak  ---
Created attachment 55537
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55537&action=edit
Proposed patch.

v2 patch in testing.

This version prevents emission of an invalid REG_EQUAL note in
cprop.cc/try_replace_reg when the original, non-simplified RTX contains a
paradoxical SUBREG. The patch is in effect a one-liner:

@@ -795,7 +796,8 @@ try_replace_reg (rtx from, rtx to, rtx_insn *insn)
   /* If we've failed perform the replacement, have a single SET to
 a REG destination and don't yet have a note, add a REG_EQUAL note
 to not lose information.  */
-  if (!success && note == 0 && set != 0 && REG_P (SET_DEST (set)))
+  if (!success && note == 0 && set != 0 && REG_P (SET_DEST (set))
+ && !contains_paradoxical_subreg_p (SET_SRC (set)))
note = set_unique_reg_note (insn, REG_EQUAL, copy_rtx (src));
 }

but we have to move contains_paradoxical_subreg_p to rtlanal.cc.
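
To make the guard concrete, here is a minimal sketch in plain C of what the added condition checks. It is a toy model, not GCC code: `struct toy_rtx` and every `toy_*` function are hypothetical stand-ins for rtx, paradoxical_subreg_p and contains_paradoxical_subreg_p, and each expression is assumed to have at most one operand.

```c
#include <assert.h>
#include <stddef.h>

/* Toy expression tree; a hypothetical stand-in for GCC's rtx.  */
enum toy_code { TOY_REG, TOY_SUBREG, TOY_VEC_SELECT, TOY_ZERO_EXTEND };

struct toy_rtx
{
  enum toy_code code;
  unsigned size;             /* Size of the expression's mode, in bytes.  */
  const struct toy_rtx *op;  /* Single operand, or NULL for a leaf.  */
};

/* A subreg is paradoxical when its own mode is wider than the mode of
   the expression it wraps; the extra bytes have no defined value.  */
static int
toy_paradoxical_subreg_p (const struct toy_rtx *x)
{
  return x->code == TOY_SUBREG && x->op != NULL && x->size > x->op->size;
}

/* Walk the operand chain and report whether any node is a paradoxical
   subreg.  */
static int
toy_contains_paradoxical_subreg_p (const struct toy_rtx *x)
{
  for (; x != NULL; x = x->op)
    if (toy_paradoxical_subreg_p (x))
      return 1;
  return 0;
}

/* Model of the patched condition: a REG_EQUAL note may be added only
   if the replacement failed, no note exists yet, the destination is a
   register, and the original source has no paradoxical subreg.  */
static int
toy_may_add_reg_equal (int success, int has_note, int dest_is_reg,
                       const struct toy_rtx *src)
{
  assert (src != NULL);
  return !success && !has_note && dest_is_reg
         && !toy_contains_paradoxical_subreg_p (src);
}
```

With the source of (insn 22) modelled as a 16-byte zero_extend of an 8-byte vec_select of a 16-byte subreg of a 4-byte register, the sketch refuses the note; with an 8-byte subreg of a 16-byte register it allows it.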

[Bug rtl-optimization/110206] [14 Regression] wrong code with -Os -march=cascadelake since r14-1246

2023-07-10 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110206

--- Comment #14 from Uroš Bizjak  ---
(In reply to Uroš Bizjak from comment #10)
> (In reply to Uroš Bizjak from comment #9)
> > and simplify_replace_rtx simplifies the above to:
> > 
> > (gdb) p debug_rtx (src)
> > (const_vector:V8HI [
> > (const_int 204 [0xcc]) repeated x8
> > ])
> 
> Patched compiler simplifies to:
> 
> (gdb) p debug_rtx (src)
> (const_vector:V8HI [
> (const_int 204 [0xcc]) repeated x4
> (const_int 0 [0]) repeated x4
> ])

The patched compiler puts the above in a REG_EQUAL note. While the value is "more
correct", I don't think the compiler has the right to set a REG_EQUAL note when
the top 4 bytes are actually undefined (as a result of an operation with an
undefined input, which is the case with a paradoxical subreg).

[Bug rtl-optimization/110206] [14 Regression] wrong code with -Os -march=cascadelake since r14-1246

2023-07-10 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110206

--- Comment #13 from Uroš Bizjak  ---
(In reply to Richard Biener from comment #12)
> I can see cprop1 adds the REG_EQUAL note:
> 
> (insn 22 21 23 4 (set (reg:V8HI 100)
> (zero_extend:V8HI (vec_select:V8QI (subreg:V16QI (reg:V4QI 98) 0)
> (parallel [
> (const_int 0 [0])
> (const_int 1 [0x1])
> (const_int 2 [0x2])
> (const_int 3 [0x3])
> (const_int 4 [0x4])
> (const_int 5 [0x5])
>  (const_int 6 [0x6])
>  (const_int 7 [0x7])
>  ])))) "t.c":12:42 7557 {sse4_1_zero_extendv8qiv8hi2}
> - (expr_list:REG_DEAD (reg:V4QI 98)
> -(nil)))
> + (expr_list:REG_EQUAL (const_vector:V8HI [
> +(const_int 204 [0xcc]) repeated x8
> +])
> +(expr_list:REG_DEAD (reg:V4QI 98)
> +(nil))))
> 
> but I don't see yet what the actual wrong transform based on this REG_EQUAL
> note is?

We constant fold V4QImode const_vector to a V8HImode const_vector with 8
defined elements. We started with undefined top four bytes, but now we
magically define them.

> 
> It looks like we CSE the above with
> 
> -   46: r122:V8QI=[`*.LC3']
> -  REG_EQUAL const_vector
> -   48: r125:V8HI=zero_extend(vec_select(r122:V8QI#0,parallel))
> -  REG_EQUAL const_vector
> -  REG_DEAD r122:V8QI
> -   49: r126:V8HI=r124:V8HI*r125:V8HI
> -  REG_DEAD r125:V8HI
> +   49: r126:V8HI=r124:V8HI*r100:V8HI
> 
> but otherwise do nothing.  So the issue is that we rely on the "undefined"
> vals to have a specific value (from the earlier REG_EQUAL note) but actual
> code generation doesn't ensure this (it doesn't need to).  That said,
> the issue isn't the constant folding per-se but that we do not actually
> constant fold but register an equality that doesn't hold.

The above CSE is a consequence of the REG_EQUAL note that the compiler set on
the insn. The compiler claims that the value of (insn 22) equals an array of 8
constants { 204, ..., 204 }, but in reality (cf. Comment #3) the value in
register %xmm4 before the VPMULLW insn is { 0, 0, 0, 0, 204, 204, 204, 204 }.
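
The mismatch can be reproduced with a small, self-contained C model. This is a sketch only, not GCC or intrinsics code: `xmm_t`, `load_v4qi`, `zero_extend_v8qi_v8hi` and `all_lanes_equal` are hypothetical names, and the `fill` parameter stands for whatever happens to be in the bytes that the V4QImode load does not define.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model of a 16-byte vector register.  */
typedef struct { uint8_t b[16]; } xmm_t;

/* Model a 4-byte (V4QImode) load.  Only bytes 0..3 are defined by the
   RTL; FILL is an assumption standing in for the contents of bytes
   4..15, which a 32-bit move into a vector register leaves as zero.  */
static xmm_t
load_v4qi (const uint8_t src[4], uint8_t fill)
{
  xmm_t r;
  memset (r.b, fill, sizeof r.b);
  memcpy (r.b, src, 4);
  return r;
}

/* Model zero_extend:V8HI of the low 8 bytes of the register.  */
static void
zero_extend_v8qi_v8hi (const xmm_t *in, uint16_t out[8])
{
  assert (in != NULL && out != NULL);
  for (int i = 0; i < 8; i++)
    out[i] = in->b[i];
}

/* Return nonzero if the claim "const_vector:V8HI [VAL repeated x8]"
   holds for V.  */
static int
all_lanes_equal (const uint16_t v[8], uint16_t val)
{
  for (int i = 0; i < 8; i++)
    if (v[i] != val)
      return 0;
  return 1;
}
```

Loading four 0xcc bytes with a zero fill and zero-extending gives four lanes of 204 and four lanes of 0, so the "204 repeated x8" claim fails. It would hold only if the undefined bytes happened to contain 0xcc as well, which nothing guarantees.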

[Bug rtl-optimization/110206] [14 Regression] wrong code with -Os -march=cascadelake since r14-1246

2023-07-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110206

--- Comment #12 from Richard Biener  ---
I can see cprop1 adds the REG_EQUAL note:

(insn 22 21 23 4 (set (reg:V8HI 100)
(zero_extend:V8HI (vec_select:V8QI (subreg:V16QI (reg:V4QI 98) 0)
(parallel [
(const_int 0 [0])
(const_int 1 [0x1])
(const_int 2 [0x2])
(const_int 3 [0x3])
(const_int 4 [0x4])
(const_int 5 [0x5])
 (const_int 6 [0x6])
 (const_int 7 [0x7])
 ])))) "t.c":12:42 7557 {sse4_1_zero_extendv8qiv8hi2}
- (expr_list:REG_DEAD (reg:V4QI 98)
-(nil)))
+ (expr_list:REG_EQUAL (const_vector:V8HI [
+(const_int 204 [0xcc]) repeated x8
+])
+(expr_list:REG_DEAD (reg:V4QI 98)
+(nil))))

but I don't see yet what the actual wrong transform based on this REG_EQUAL
note is?

It looks like we CSE the above with

-   46: r122:V8QI=[`*.LC3']
-  REG_EQUAL const_vector
-   48: r125:V8HI=zero_extend(vec_select(r122:V8QI#0,parallel))
-  REG_EQUAL const_vector
-  REG_DEAD r122:V8QI
-   49: r126:V8HI=r124:V8HI*r125:V8HI
-  REG_DEAD r125:V8HI
+   49: r126:V8HI=r124:V8HI*r100:V8HI

but otherwise do nothing.  So the issue is that we rely on the "undefined"
vals to have a specific value (from the earlier REG_EQUAL note) but actual
code generation doesn't ensure this (it doesn't need to).  That said,
the issue isn't the constant folding per-se but that we do not actually
constant fold but register an equality that doesn't hold.

[Bug rtl-optimization/110206] [14 Regression] wrong code with -Os -march=cascadelake since r14-1246

2023-07-09 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110206

Uroš Bizjak  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|needs-bisection             |
           Assignee|unassigned at gcc dot gnu.org  |ubizjak at gmail dot com
             Status|NEW                         |ASSIGNED

--- Comment #11 from Uroš Bizjak  ---
Patch at [1].

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623933.html

[Bug rtl-optimization/110206] [14 Regression] wrong code with -Os -march=cascadelake since r14-1246

2023-07-08 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110206

--- Comment #10 from Uroš Bizjak  ---
(In reply to Uroš Bizjak from comment #9)
> and simplify_replace_rtx simplifies the above to:
> 
> (gdb) p debug_rtx (src)
> (const_vector:V8HI [
> (const_int 204 [0xcc]) repeated x8
> ])

Patched compiler simplifies to:

(gdb) p debug_rtx (src)
(const_vector:V8HI [
(const_int 204 [0xcc]) repeated x4
(const_int 0 [0]) repeated x4
])

[Bug rtl-optimization/110206] [14 Regression] wrong code with -Os -march=cascadelake since r14-1246

2023-07-08 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110206

--- Comment #9 from Uroš Bizjak  ---
Some more digging through the code:

In cprop.cc/try_replace_reg, we try to simplify the source of the set given our
substitution:

Breakpoint 1, try_replace_reg (from=0x7fffe9f0b7f8, to=0x7fffe9f099e0,
insn=0x7fffea01b6c0) at ../../git/gcc/gcc/cprop.cc:789
789   src = simplify_replace_rtx (SET_SRC (set), from, to);

(gdb) list
784       if (!success && set && reg_mentioned_p (from, SET_SRC (set)))
785         {
786           /* If above failed and this is a single set, try to simplify the source
787              of the set given our substitution.  We could perhaps try this for
788              multiple SETs, but it probably won't buy us anything.  */
789           src = simplify_replace_rtx (SET_SRC (set), from, to);

(gdb) p debug_rtx (set)
(set (reg:V8HI 100)
(zero_extend:V8HI (vec_select:V8QI (subreg:V16QI (reg:V4QI 98) 0)
(parallel [
(const_int 0 [0])
(const_int 1 [0x1])
(const_int 2 [0x2])
(const_int 3 [0x3])
(const_int 4 [0x4])
(const_int 5 [0x5])
(const_int 6 [0x6])
(const_int 7 [0x7])
]))))

(gdb) p debug_rtx (from)
(reg:V4QI 98)

(gdb) p debug_rtx (to)
(const_vector:V4QI [
(const_int -52 [0xffcc]) repeated x4
])

and simplify_replace_rtx simplifies the above to:

(gdb) p debug_rtx (src)
(const_vector:V8HI [
(const_int 204 [0xcc]) repeated x8
])

which is obviously wrong: we have a V4QImode input register holding a V4QImode
constant, so only four of the eight selected bytes are defined.

Tracing through simplify-rtx.cc brings us to a recursive
simplify_replace_fn_rtx, which gets us to:

Breakpoint 1, simplify_replace_fn_rtx (x=0x7fffe9f0b888,
old_rtx=0x7fffe9f0b7f8, fn=0x0, data=0x7fffe9f099e0) at
../../git/gcc/gcc/simplify-rtx.cc:474
474   op0 = simplify_gen_subreg (GET_MODE (x), op0,

(gdb) list
469       if (code == SUBREG)
470         {
471           op0 = simplify_replace_fn_rtx (SUBREG_REG (x), old_rtx, fn, data);
472           if (op0 == SUBREG_REG (x))
473             return x;
474           op0 = simplify_gen_subreg (GET_MODE (x), op0,
475                                      GET_MODE (SUBREG_REG (x)),
476                                      SUBREG_BYTE (x));
477           return op0 ? op0 : x;
478         }

(gdb) p debug_rtx (op0)
(const_vector:V4QI [
(const_int -52 [0xffcc]) repeated x4
])
(gdb) p debug_rtx (x)
(subreg:V16QI (reg:V4QI 98) 0)

and simplify_gen_subreg with the above arguments returns:

(gdb) p debug_rtx (op0)
(const_vector:V16QI [
(const_int -52 [0xffcc]) repeated x16
])

No way! It is not possible to get a V16QImode vector from a V4QImode vector, even
when all elements are duplicates.

Tracing even deeper to simplify_context::simplify_subreg, we found the
following:

Breakpoint 1, simplify_context::simplify_subreg (this=0x7fffd528,
outermode=E_V16QImode, op=0x7fffe9f099e0, innermode=E_V4QImode, byte=...)
at ../../git/gcc/gcc/simplify-rtx.cc:7561
7561        return gen_vec_duplicate (outermode, elt);

(gdb) list
7556      rtx elt;
7557
7558      if (VECTOR_MODE_P (outermode)
7559          && GET_MODE_INNER (outermode) == GET_MODE_INNER (innermode)
7560          && vec_duplicate_p (op, &elt))
7561        return gen_vec_duplicate (outermode, elt);
7562
7563      if (outermode == GET_MODE_INNER (innermode)
7564          && vec_duplicate_p (op, &elt))
7565        return elt;

(gdb) p outermode
$1 = E_V16QImode
(gdb) p debug_rtx (elt)
(const_int -52 [0xffcc])

(gdb) fin
Run till exit from #0  simplify_context::simplify_subreg (this=0x7fffd528,
outermode=E_V16QImode, op=0x7fffe9f099e0, innermode=E_V4QImode, byte=...)
at ../../git/gcc/gcc/simplify-rtx.cc:7561
0x00eb24d3 in simplify_subreg (byte=..., innermode=E_V4QImode,
op=<optimized out>, outermode=<optimized out>) at ../../git/gcc/gcc/rtl.h:3513
3513  return simplify_context ().simplify_subreg (outermode, op, innermode,
byte);
Value returned is $4 = (rtx_def *) 0x7fffe9f09c10

(gdb) p debug_rtx ($4)
(const_vector:V16QI [
(const_int -52 [0xffcc]) repeated x16
])

Nope. This transformation is valid only for non-paradoxical subregs.

Patch is then obvious:

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index d7315d82aa3..87ca25086dc 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -7557,6 +7557,7 @@ simplify_context::simplify_subreg (machine_mode
outermode, rtx op,

   if (VECTOR_MODE_P (outermode)
  && GET_MODE_INNER (outermode) == GET_MODE_INNER (innermode)
+ && !paradoxical_subreg_p (outermode, innermode)
  && vec_duplicate_p (op, &elt))
return gen_vec_duplicate (outermode, elt);
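
The effect of the extra condition in this diff can be sketched with a toy constant folder in C. This is an illustration under simplifying assumptions, not the simplify-rtx.cc code: `struct toy_vec`, `toy_vec_duplicate_p` and `toy_fold_dup_subreg` are hypothetical names, all elements are single bytes, and byte offsets are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Toy constant vector of up to 16 byte-sized elements.  */
struct toy_vec
{
  unsigned n;
  uint8_t elt[16];
};

/* Return nonzero if all N elements of V are equal, storing the common
   value in *ELT.  */
static int
toy_vec_duplicate_p (const struct toy_vec *v, uint8_t *elt)
{
  for (unsigned i = 1; i < v->n; i++)
    if (v->elt[i] != v->elt[0])
      return 0;
  *elt = v->elt[0];
  return 1;
}

/* Try to fold a subreg with OUTER_N elements of the constant vector OP
   into a duplicated constant.  Return 1 and fill *OUT on success, or 0
   to decline the fold.  */
static int
toy_fold_dup_subreg (unsigned outer_n, const struct toy_vec *op,
                     struct toy_vec *out)
{
  uint8_t elt;

  assert (op->n >= 1 && op->n <= 16 && outer_n >= 1 && outer_n <= 16);

  /* Paradoxical case: the result is wider than the operand, so the
     extra elements are undefined and must not be invented.  */
  if (outer_n > op->n)
    return 0;

  if (!toy_vec_duplicate_p (op, &elt))
    return 0;

  out->n = outer_n;
  for (unsigned i = 0; i < outer_n; i++)
    out->elt[i] = elt;
  return 1;
}
```

In this model, asking for 16 elements from a 4-element duplicate of 0xcc is declined, while narrowing the same constant to 2 elements still folds to a duplicate.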

[Bug rtl-optimization/110206] [14 Regression] wrong code with -Os -march=cascadelake since r14-1246

2023-07-08 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110206

--- Comment #8 from Uroš Bizjak  ---
The testcase needs __attribute__((noinline)) to suppress unwanted constant
propagation with recent gcc.

void
__attribute__((noinline))
foo (U u, u16 c, V *r)
...

[Bug rtl-optimization/110206] [14 Regression] wrong code with -Os -march=cascadelake since r14-1246

2023-06-12 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110206

--- Comment #7 from Uroš Bizjak  ---
(In reply to Uroš Bizjak from comment #4)
> cprop1 pass does not consider paradoxical subreg and for (insn 22) claims
> that it equals 8 elements of QImode:

8 elements of HImode.

[Bug rtl-optimization/110206] [14 Regression] wrong code with -Os -march=cascadelake since r14-1246

2023-06-12 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110206

--- Comment #6 from Uroš Bizjak  ---
(In reply to Uroš Bizjak from comment #3)

> However, VPMULLW needs all 8 QImode elements, but %xmm4 only has 4 loaded;

To be consistent, VPSRLVW and VPMULLW use HImode elements.

[Bug rtl-optimization/110206] [14 Regression] wrong code with -Os -march=cascadelake since r14-1246

2023-06-12 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110206

Uroš Bizjak  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |rtl-optimization

--- Comment #5 from Uroš Bizjak  ---
Recategorized as generic RTL optimization problem.