[Bug tree-optimization/110557] [13/14 Regression] Wrong code for x86_64-linux-gnu with -O3 -mavx2: vectorized loop mishandles signed bit-fields

2023-07-11 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110557

--- Comment #10 from Xi Ruoyao <xry111 at gcc dot gnu.org> ---
(In reply to Xi Ruoyao from comment #9)
> Fixed for 14 and 13.2.

There is a test suite issue in the committed patch, fixed at r14-2427 and
r13-7555.

[Bug tree-optimization/110557] [13/14 Regression] Wrong code for x86_64-linux-gnu with -O3 -mavx2: vectorized loop mishandles signed bit-fields

2023-07-10 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110557

Xi Ruoyao <xry111 at gcc dot gnu.org> changed:

   What|Removed |Added

   Target Milestone|13.3|13.2
 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from Xi Ruoyao <xry111 at gcc dot gnu.org> ---
Fixed for 14 and 13.2.

[Bug tree-optimization/110557] [13/14 Regression] Wrong code for x86_64-linux-gnu with -O3 -mavx2: vectorized loop mishandles signed bit-fields

2023-07-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110557

--- Comment #8 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-13 branch has been updated by Xi Ruoyao:

https://gcc.gnu.org/g:1e6a948cd22f2f142cdc828296f78c7af9e283c8

commit r13-7553-g1e6a948cd22f2f142cdc828296f78c7af9e283c8
Author: Xi Ruoyao 
Date:   Thu Jul 6 23:08:57 2023 +0800

vect: Fix vectorized BIT_FIELD_REF for signed bit-fields [PR110557]

If a bit-field is signed and narrower than the output type, we must
ensure the extracted result is sign-extended.  But this was not handled
correctly.

For example:

int x : 8;
long y : 55;
bool z : 1;

The vectorized extraction of y was:

vect__ifc__49.29_110 =
  MEM  [(struct Item *)vectp_a.27_108];
vect_patt_38.30_112 =
  vect__ifc__49.29_110 & { 9223372036854775552, 9223372036854775552 };
vect_patt_39.31_113 = vect_patt_38.30_112 >> 8;
vect_patt_40.32_114 =
  VIEW_CONVERT_EXPR(vect_patt_39.31_113);

This is obviously incorrect.  This patch implements it as:

vect__ifc__25.16_62 =
  MEM  [(struct Item *)vectp_a.14_60];
vect_patt_31.17_63 =
  VIEW_CONVERT_EXPR(vect__ifc__25.16_62);
vect_patt_32.18_64 = vect_patt_31.17_63 << 1;
vect_patt_33.19_65 = vect_patt_32.18_64 >> 9;
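
In scalar terms, the fixed sequence is the usual shift-pair sign
extension.  A minimal scalar model (a sketch assuming the y : 55 layout
above; extract_y is a hypothetical helper, not code from the patch):

```c++
// Sketch: with prec = 64, bitpos = 8 and bitsize = 55, the shifts are
// << (64 - 8 - 55) = 1 and >> (64 - 55) = 9, matching the dump above.
long extract_y (unsigned long container)
{
  long v = (long) (container << 1); // move y's sign bit into bit 63
  return v >> 9;                    // arithmetic shift replicates the sign
}
```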

gcc/ChangeLog:

PR tree-optimization/110557
* tree-vect-patterns.cc (vect_recog_bitfield_ref_pattern):
Ensure the output sign-extended if necessary.

gcc/testsuite/ChangeLog:

PR tree-optimization/110557
* g++.dg/vect/pr110557.cc: New test.

(cherry picked from commit 63ae6bc60c0f67fb2791991bf4b6e7e0a907d420)

[Bug tree-optimization/110557] [13/14 Regression] Wrong code for x86_64-linux-gnu with -O3 -mavx2: vectorized loop mishandles signed bit-fields

2023-07-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110557

--- Comment #7 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Xi Ruoyao:

https://gcc.gnu.org/g:63ae6bc60c0f67fb2791991bf4b6e7e0a907d420

commit r14-2407-g63ae6bc60c0f67fb2791991bf4b6e7e0a907d420
Author: Xi Ruoyao 
Date:   Thu Jul 6 23:08:57 2023 +0800

vect: Fix vectorized BIT_FIELD_REF for signed bit-fields [PR110557]

If a bit-field is signed and narrower than the output type, we must
ensure the extracted result is sign-extended.  But this was not handled
correctly.

For example:

int x : 8;
long y : 55;
bool z : 1;

The vectorized extraction of y was:

vect__ifc__49.29_110 =
  MEM  [(struct Item *)vectp_a.27_108];
vect_patt_38.30_112 =
  vect__ifc__49.29_110 & { 9223372036854775552, 9223372036854775552 };
vect_patt_39.31_113 = vect_patt_38.30_112 >> 8;
vect_patt_40.32_114 =
  VIEW_CONVERT_EXPR(vect_patt_39.31_113);

This is obviously incorrect.  This patch implements it as:

vect__ifc__25.16_62 =
  MEM  [(struct Item *)vectp_a.14_60];
vect_patt_31.17_63 =
  VIEW_CONVERT_EXPR(vect__ifc__25.16_62);
vect_patt_32.18_64 = vect_patt_31.17_63 << 1;
vect_patt_33.19_65 = vect_patt_32.18_64 >> 9;

gcc/ChangeLog:

PR tree-optimization/110557
* tree-vect-patterns.cc (vect_recog_bitfield_ref_pattern):
Ensure the output sign-extended if necessary.

gcc/testsuite/ChangeLog:

PR tree-optimization/110557
* g++.dg/vect/pr110557.cc: New test.

[Bug tree-optimization/110557] [13/14 Regression] Wrong code for x86_64-linux-gnu with -O3 -mavx2: vectorized loop mishandles signed bit-fields

2023-07-06 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110557

--- Comment #6 from Xi Ruoyao <xry111 at gcc dot gnu.org> ---
(In reply to avieira from comment #5)
> Hi Xi,
> 
> Feel free to test your patch and submit it to the list for review. I had a
> look over and it looks correct to me.

https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623782.html

The changes from the version posted here:

1. Added a test case (I made it use a sandwiched bit-field because a very
first, unposted version of the patch failed on sandwiched cases).
2. Slightly adjusted the comment.

There is another issue: if mask_width + shift_n == prec, we should omit the
AND_EXPR even for an unsigned bit-field.  For example,

movq    $-256, %rax
vmovq   %rax, %xmm1
vpunpcklqdq %xmm1, %xmm1, %xmm1
vpand   (%rcx,%rdi,8), %xmm1, %xmm1
vpsrlq  $8, %xmm1, %xmm1

can be just

vmovdqu (%rcx,%rdi,8), %xmm1
vpsrlq  $8, %xmm1, %xmm1

But it's a different issue, so we can fix it in a separate patch.
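
A scalar sketch of why the AND is redundant there (assuming an LP64
target with prec = 64, shift_n = 8 and mask_width = 56, inferred from
the -256 mask and the 8-bit shift above):

```c++
// Sketch: shift_n + mask_width == prec, so the mask only clears bits
// that the right shift discards anyway.
unsigned long with_and (unsigned long x)    { return (x & ~0xffUL) >> 8; }
unsigned long without_and (unsigned long x) { return x >> 8; }
// with_and (x) == without_and (x) for every x, so the vpand is dead.
```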

[Bug tree-optimization/110557] [13/14 Regression] Wrong code for x86_64-linux-gnu with -O3 -mavx2: vectorized loop mishandles signed bit-fields

2023-07-06 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110557

Xi Ruoyao <xry111 at gcc dot gnu.org> changed:

   What|Removed |Added

URL||https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623782.html
   Keywords||patch
 Status|NEW |ASSIGNED

[Bug tree-optimization/110557] [13/14 Regression] Wrong code for x86_64-linux-gnu with -O3 -mavx2: vectorized loop mishandles signed bit-fields

2023-07-06 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110557

--- Comment #5 from avieira at gcc dot gnu.org ---
Hi Xi,

Feel free to test your patch and submit it to the list for review. I had a look
over and it looks correct to me.

I feel like it also addresses the cases where the bit-field is 'sandwiched',
like:
int x : 7;
ptrdiff_t y : 56;
long long z : 1;

since you left-shift first.  It also addresses the case where you need both
sign extension and widening, because you still convert the container to the
signed type.

But it might be nice to add tests to cover those two cases, just in case
someone changes this.
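
A sketch of such a test (hypothetical, not the committed
g++.dg/vect/pr110557.cc): a sandwiched signed field whose extraction
must shift out z and then sign-extend:

```c++
#include <cstddef>

struct S
{
  int x : 7;
  std::ptrdiff_t y : 56; // sandwiched between x and z
  long long z : 1;
};

long f (S *a, int n)
{
  long s = 0;
  for (int i = 0; i < n; i++)
    s += a[i].y; // signed extraction: << 1 then arithmetic >> 8
  return s;
}
```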

In the future, if you do plan to work on something, it would be nice to let
people know on the bugzilla ticket (preferably by assigning it to yourself) so
that multiple people don't end up working on the same thing.  I had started to
write a patch, but wasn't as far along as you, and I like your approach :)

[Bug tree-optimization/110557] [13/14 Regression] Wrong code for x86_64-linux-gnu with -O3 -mavx2: vectorized loop mishandles signed bit-fields

2023-07-06 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110557

--- Comment #4 from Xi Ruoyao <xry111 at gcc dot gnu.org> ---
Untested patch:

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index de20e9d59cb..01df568ee61 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -2566,7 +2566,7 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
Widening with mask first, shift later:
container = (type_out) container;
masked = container & (((1 << bitsize) - 1) << bitpos);
-   result = patt2 >> masked;
+   result = masked >> bitpos;

Widening with shift first, mask last:
container = (type_out) container;
@@ -2578,6 +2578,15 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
result = masked >> bitpos;
result = (type_out) result;

+   If the bitfield is signed and its width is less than the width
+   of type_out, we need to perform a sign-extension:
+   container = (type) container;
+   masked = container << (prec - bitsize - bitpos);
+   result = (type_out) (masked >> (prec - bitsize));
+
+   Here type is the signed variant of the wider of type_out and the type
+   of container.
+
The shifting is always optional depending on whether bitpos != 0.

 */
@@ -2636,14 +2645,22 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
   if (BYTES_BIG_ENDIAN)
 shift_n = prec - shift_n - mask_width;

+  bool sign_ext = !TYPE_UNSIGNED (TREE_TYPE (bf_ref)) &&
+ TYPE_PRECISION (ret_type) > mask_width;
+  bool widening = ((TYPE_PRECISION (TREE_TYPE (container)) <
+   TYPE_PRECISION (ret_type))
+  && !useless_type_conversion_p (TREE_TYPE (container),
+ ret_type));
+
   /* We move the conversion earlier if the loaded type is smaller than the
  return type to enable the use of widening loads.  */
-  if (TYPE_PRECISION (TREE_TYPE (container)) < TYPE_PRECISION (ret_type)
-  && !useless_type_conversion_p (TREE_TYPE (container), ret_type))
+  if (sign_ext || widening)
 {
-  pattern_stmt
-   = gimple_build_assign (vect_recog_temp_ssa_var (ret_type),
-  NOP_EXPR, container);
+  tree type = widening ? ret_type : container_type;
+  if (sign_ext)
+   type = gimple_signed_type (type);
+  pattern_stmt = gimple_build_assign (vect_recog_temp_ssa_var (type),
+ NOP_EXPR, container);
   container = gimple_get_lhs (pattern_stmt);
   container_type = TREE_TYPE (container);
   prec = tree_to_uhwi (TYPE_SIZE (container_type));
@@ -2671,7 +2688,7 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
 shift_first = true;

   tree result;
-  if (shift_first)
+  if (shift_first && !sign_ext)
 {
   tree shifted = container;
   if (shift_n)
@@ -2694,14 +2711,27 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
 }
   else
 {
-  tree mask = wide_int_to_tree (container_type,
-   wi::shifted_mask (shift_n, mask_width,
- false, prec));
-  pattern_stmt
-   = gimple_build_assign (vect_recog_temp_ssa_var (container_type),
-  BIT_AND_EXPR, container, mask);
-  tree masked = gimple_assign_lhs (pattern_stmt);
+  tree temp = vect_recog_temp_ssa_var (container_type);
+  if (!sign_ext)
+   {
+ tree mask = wide_int_to_tree (container_type,
+   wi::shifted_mask (shift_n,
+ mask_width,
+ false, prec));
+ pattern_stmt = gimple_build_assign (temp, BIT_AND_EXPR,
+ container, mask);
+   }
+  else
+   {
+ HOST_WIDE_INT shl = prec - shift_n - mask_width;
+ shift_n += shl;
+ pattern_stmt = gimple_build_assign (temp, LSHIFT_EXPR,
+ container,
+ build_int_cst (sizetype,
+shl));
+   }

+  tree masked = gimple_assign_lhs (pattern_stmt);
   append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
   pattern_stmt
= gimple_build_assign (vect_recog_temp_ssa_var (container_type),
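
For reference, the mask that the unchanged unsigned path builds with
wi::shifted_mask (shift_n, mask_width, false, prec) can be cross-checked
against the constant in the incorrect vectorized dump quoted above (a
standalone sketch, not part of the patch):

```c++
#include <cstdint>
#include <cstdio>

int main ()
{
  // mask_width = 55 set bits starting at bit shift_n = 8, prec = 64,
  // as in the y : 55 example.
  std::uint64_t mask = ((std::uint64_t{1} << 55) - 1) << 8;
  std::printf ("%llu\n", (unsigned long long) mask);
  // Prints 9223372036854775552, the constant in the bad dump.
  return 0;
}
```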

[Bug tree-optimization/110557] [13/14 Regression] Wrong code for x86_64-linux-gnu with -O3 -mavx2: vectorized loop mishandles signed bit-fields

2023-07-06 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110557

--- Comment #3 from Xi Ruoyao <xry111 at gcc dot gnu.org> ---
It looks like if !TYPE_UNSIGNED (TREE_TYPE (bf_ref)), we need to generate
something like:

masked = ((the signed variant of the wider type in {type_out, type_container})
container) << (prec - bitpos - bitsize);
result = (type_out) (masked >> (prec - bitsize));
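
With the numbers from comment #1 (a 56-bit signed field at bit 8 of a
64-bit container), the left shift degenerates to zero and only the
arithmetic right shift remains (an illustrative sketch, not generated
code):

```c++
// Sketch: prec = 64, bitpos = 8, bitsize = 56, so the shifts are
// << (64 - 8 - 56) = 0 and >> (64 - 56) = 8.
long extract (unsigned long container)
{
  long masked = (long) container; // field already occupies the top bits
  return masked >> 8;             // arithmetic shift sign-extends
}
```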

[Bug tree-optimization/110557] [13/14 Regression] Wrong code for x86_64-linux-gnu with -O3 -mavx2: vectorized loop mishandles signed bit-fields

2023-07-06 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110557

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 CC||avieira at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |avieira at gcc dot gnu.org

--- Comment #2 from avieira at gcc dot gnu.org ---
I'll have a look.

[Bug tree-optimization/110557] [13/14 Regression] Wrong code for x86_64-linux-gnu with -O3 -mavx2: vectorized loop mishandles signed bit-fields

2023-07-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110557

Richard Biener <rguenth at gcc dot gnu.org> changed:

   What|Removed |Added

   Priority|P3  |P2

[Bug tree-optimization/110557] [13/14 Regression] Wrong code for x86_64-linux-gnu with -O3 -mavx2: vectorized loop mishandles signed bit-fields

2023-07-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110557

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Keywords|needs-bisection |
   Last reconfirmed||2023-07-05
 Ever confirmed|0   |1
  Component|target  |tree-optimization

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
ifcvt correctly produces the sign extension:

   _ifc__18;


  _3 = a_14(D) + _2;
  _ifc__22 = _3->D.3439;
  _ifc__18 = BIT_FIELD_REF <_ifc__22, 56, 8>;
  _5 = (long long int) _ifc__18;
  _6 = _5 * -4;
  _7 = (long unsigned int) _6;
  _15 = MAX_EXPR <_7, size_19>;

But then the vectorizer misses that:
```
  _ifc__18 = BIT_FIELD_REF <_ifc__22, 56, 8>;
  vect_patt_13.20_76 = vect__ifc__22.19_74 & { 18446744073709551360,
18446744073709551360, 18446744073709551360, 18446744073709551360 };
  vect_patt_10.21_77 = vect_patt_13.20_76 >> 8;
  vect_patt_9.22_78 = VIEW_CONVERT_EXPR(vect_patt_10.21_77);
  _5 = (long long int) _ifc__18;
  vect__6.23_80 = vect_patt_9.22_78 * { -4, -4, -4, -4 };
  _6 = _5 * -4;
```

vect_patt_9.22_78 should have been sign extended ...
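
For orientation, a reproducer shape consistent with these dumps (a
hypothetical sketch: Item, f and the -4/MAX combination are inferred
from the GIMPLE above, and the committed test is
g++.dg/vect/pr110557.cc):

```c++
struct Item
{
  int x : 8;
  long y : 56; // BIT_FIELD_REF <_ifc__22, 56, 8>
};

unsigned long f (Item *a, int n, unsigned long size)
{
  for (int i = 0; i < n; i++)
    {
      // _5 = (long long) y; _6 = _5 * -4; _7 = (unsigned long) _6;
      unsigned long v = (unsigned long) (a[i].y * -4L);
      size = v > size ? v : size; // _15 = MAX_EXPR <_7, size_19>;
    }
  return size;
}
```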



With the C front-end, we get:
```
   _24;
   _23;

  _24 = _25->y;
  _23 = _24 * -4;
  _22 = (long unsigned int) _23;
```

The vectorizer does not understand that this could be promoted to 64-bit and
then truncated, so it does not vectorize it.


Introduced by r13-3219-g25413fdb2ac24933214123e24ba16502.