[Bug tree-optimization/110557] [13/14 Regression] Wrong code for x86_64-linux-gnu with -O3 -mavx2: vectorized loop mishandles signed bit-fields
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110557

--- Comment #10 from Xi Ruoyao ---
(In reply to Xi Ruoyao from comment #9)
> Fixed for 14 and 13.2.

There is a test suite issue in the committed patch, fixed at r14-2427 and
r13-7555.
Xi Ruoyao changed:

           What            |Removed  |Added
--------------------------------------------
   Target Milestone        |13.3     |13.2
             Status        |ASSIGNED |RESOLVED
         Resolution        |---      |FIXED

--- Comment #9 from Xi Ruoyao ---
Fixed for 14 and 13.2.
--- Comment #8 from CVS Commits ---
The releases/gcc-13 branch has been updated by Xi Ruoyao:

https://gcc.gnu.org/g:1e6a948cd22f2f142cdc828296f78c7af9e283c8

commit r13-7553-g1e6a948cd22f2f142cdc828296f78c7af9e283c8
Author: Xi Ruoyao
Date:   Thu Jul 6 23:08:57 2023 +0800

    vect: Fix vectorized BIT_FIELD_REF for signed bit-fields [PR110557]

    If a bit-field is signed and it's wider than the output type, we must
    ensure the extracted result is sign-extended.  But this was not handled
    correctly.  For example:

        int x : 8;
        long y : 55;
        bool z : 1;

    The vectorized extraction of y was:

        vect__ifc__49.29_110 = MEM [(struct Item *)vectp_a.27_108];
        vect_patt_38.30_112 = vect__ifc__49.29_110
                              & { 9223372036854775552, 9223372036854775552 };
        vect_patt_39.31_113 = vect_patt_38.30_112 >> 8;
        vect_patt_40.32_114 = VIEW_CONVERT_EXPR(vect_patt_39.31_113);

    This is obviously incorrect.  This patch implements it as:

        vect__ifc__25.16_62 = MEM [(struct Item *)vectp_a.14_60];
        vect_patt_31.17_63 = VIEW_CONVERT_EXPR(vect__ifc__25.16_62);
        vect_patt_32.18_64 = vect_patt_31.17_63 << 1;
        vect_patt_33.19_65 = vect_patt_32.18_64 >> 9;

    gcc/ChangeLog:

        PR tree-optimization/110557
        * tree-vect-patterns.cc (vect_recog_bitfield_ref_pattern): Ensure
        the output is sign-extended if necessary.

    gcc/testsuite/ChangeLog:

        PR tree-optimization/110557
        * g++.dg/vect/pr110557.cc: New test.

    (cherry picked from commit 63ae6bc60c0f67fb2791991bf4b6e7e0a907d420)
--- Comment #7 from CVS Commits ---
The master branch has been updated by Xi Ruoyao:

https://gcc.gnu.org/g:63ae6bc60c0f67fb2791991bf4b6e7e0a907d420

commit r14-2407-g63ae6bc60c0f67fb2791991bf4b6e7e0a907d420
Author: Xi Ruoyao
Date:   Thu Jul 6 23:08:57 2023 +0800

    vect: Fix vectorized BIT_FIELD_REF for signed bit-fields [PR110557]

    If a bit-field is signed and it's wider than the output type, we must
    ensure the extracted result is sign-extended.  But this was not handled
    correctly.  For example:

        int x : 8;
        long y : 55;
        bool z : 1;

    The vectorized extraction of y was:

        vect__ifc__49.29_110 = MEM [(struct Item *)vectp_a.27_108];
        vect_patt_38.30_112 = vect__ifc__49.29_110
                              & { 9223372036854775552, 9223372036854775552 };
        vect_patt_39.31_113 = vect_patt_38.30_112 >> 8;
        vect_patt_40.32_114 = VIEW_CONVERT_EXPR(vect_patt_39.31_113);

    This is obviously incorrect.  This patch implements it as:

        vect__ifc__25.16_62 = MEM [(struct Item *)vectp_a.14_60];
        vect_patt_31.17_63 = VIEW_CONVERT_EXPR(vect__ifc__25.16_62);
        vect_patt_32.18_64 = vect_patt_31.17_63 << 1;
        vect_patt_33.19_65 = vect_patt_32.18_64 >> 9;

    gcc/ChangeLog:

        PR tree-optimization/110557
        * tree-vect-patterns.cc (vect_recog_bitfield_ref_pattern): Ensure
        the output is sign-extended if necessary.

    gcc/testsuite/ChangeLog:

        PR tree-optimization/110557
        * g++.dg/vect/pr110557.cc: New test.
--- Comment #6 from Xi Ruoyao ---
(In reply to avieira from comment #5)
> Hi Xi,
>
> Feel free to test your patch and submit it to the list for review. I had a
> look over and it looks correct to me.

https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623782.html

The changes from the version posted here:

1. Add a test case (I already made it sandwiched because a very first, not
   posted version of the patch failed with sandwiched cases).
2. Slightly adjusted the comment.

There is another issue: if mask_width + shift_n == prec, we should omit the
AND_EXPR even for an unsigned bit-field.  For example

        movq        $-256, %rax
        vmovq       %rax, %xmm1
        vpunpcklqdq %xmm1, %xmm1, %xmm1
        vpand       (%rcx,%rdi,8), %xmm1, %xmm1
        vpsrlq      $8, %xmm1, %xmm1

can be just

        vmovdqu     (%rcx,%rdi,8), %xmm1
        vpsrlq      $8, %xmm1, %xmm1

But it's a different issue, so we can fix it in a different patch.
Xi Ruoyao changed:

           What    |Removed |Added
-----------------------------------
                URL|        |https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623782.html
           Keywords|        |patch
             Status|NEW     |ASSIGNED
--- Comment #5 from avieira at gcc dot gnu.org ---
Hi Xi,

Feel free to test your patch and submit it to the list for review.  I had a
look over and it looks correct to me.

I feel like it also addresses the cases where the bitfield is 'sandwiched',
like:

  int x : 7;
  ptrdiff_t y : 56;
  long long z : 1;

since you left-shift it.  It also addresses the case where you have both a
sign-extension and a widening, because you still transform the type into a
signed one.  But it might be nice to add tests covering those two, just in
case someone changes this.

In the future, if you do plan to work on something, it would be nice to let
people know on the bugzilla ticket (preferably by assigning it to yourself)
so that multiple people don't end up working on the same thing.  I had
started to write a patch, but wasn't as far along as you, and I like your
approach :)
--- Comment #4 from Xi Ruoyao ---
Untested patch:

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index de20e9d59cb..01df568ee61 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -2566,7 +2566,7 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
    Widening with mask first, shift later:
    container = (type_out) container;
    masked = container & (((1 << bitsize) - 1) << bitpos);
-   result = patt2 >> masked;
+   result = masked >> bitpos;
 
    Widening with shift first, mask last:
    container = (type_out) container;
@@ -2578,6 +2578,15 @@
    result = masked >> bitpos;
    result = (type_out) result;
 
+   If the bitfield is signed and its width is greater than the width
+   of type_out, we need to perform a sign-extension:
+   container = (type) container;
+   masked = container << (prec - bitsize - bitpos);
+   result = (type_out) (masked >> (prec - bitsize));
+
+   Here type is the signed variant of the wider of type_out and the type
+   of container.
+
    The shifting is always optional depending on whether bitpos != 0.
 
 */
@@ -2636,14 +2645,22 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
   if (BYTES_BIG_ENDIAN)
     shift_n = prec - shift_n - mask_width;
 
+  bool sign_ext = !TYPE_UNSIGNED (TREE_TYPE (bf_ref)) &&
+                  TYPE_PRECISION (ret_type) > mask_width;
+  bool widening = ((TYPE_PRECISION (TREE_TYPE (container)) <
+                    TYPE_PRECISION (ret_type))
+                   && !useless_type_conversion_p (TREE_TYPE (container),
+                                                  ret_type));
+
   /* We move the conversion earlier if the loaded type is smaller than the
      return type to enable the use of widening loads.  */
-  if (TYPE_PRECISION (TREE_TYPE (container)) < TYPE_PRECISION (ret_type)
-      && !useless_type_conversion_p (TREE_TYPE (container), ret_type))
+  if (sign_ext || widening)
     {
-      pattern_stmt
-       = gimple_build_assign (vect_recog_temp_ssa_var (ret_type),
-                              NOP_EXPR, container);
+      tree type = widening ? ret_type : container_type;
+      if (sign_ext)
+       type = gimple_signed_type (type);
+      pattern_stmt = gimple_build_assign (vect_recog_temp_ssa_var (type),
+                                          NOP_EXPR, container);
       container = gimple_get_lhs (pattern_stmt);
       container_type = TREE_TYPE (container);
       prec = tree_to_uhwi (TYPE_SIZE (container_type));
@@ -2671,7 +2688,7 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
     shift_first = true;
 
   tree result;
-  if (shift_first)
+  if (shift_first && !sign_ext)
     {
       tree shifted = container;
       if (shift_n)
@@ -2694,14 +2711,27 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
     }
   else
     {
-      tree mask = wide_int_to_tree (container_type,
-                                    wi::shifted_mask (shift_n, mask_width,
-                                                      false, prec));
-      pattern_stmt
-       = gimple_build_assign (vect_recog_temp_ssa_var (container_type),
-                              BIT_AND_EXPR, container, mask);
-      tree masked = gimple_assign_lhs (pattern_stmt);
+      tree temp = vect_recog_temp_ssa_var (container_type);
+      if (!sign_ext)
+       {
+         tree mask = wide_int_to_tree (container_type,
+                                       wi::shifted_mask (shift_n,
+                                                         mask_width,
+                                                         false, prec));
+         pattern_stmt = gimple_build_assign (temp, BIT_AND_EXPR,
+                                             container, mask);
+       }
+      else
+       {
+         HOST_WIDE_INT shl = prec - shift_n - mask_width;
+         shift_n += shl;
+         pattern_stmt = gimple_build_assign (temp, LSHIFT_EXPR,
+                                             container,
+                                             build_int_cst (sizetype,
+                                                            shl));
+       }
+
+      tree masked = gimple_assign_lhs (pattern_stmt);
       append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
       pattern_stmt
        = gimple_build_assign (vect_recog_temp_ssa_var (container_type),
--- Comment #3 from Xi Ruoyao ---
It looks like if !TYPE_UNSIGNED (TREE_TYPE (bf_ref)), we need to generate
something like:

  masked = (the signed variant of the wider type in {type_out, type_container})
           container << (bitpos + bitsize);
  result = (type_out) (masked >> bitpos);
avieira at gcc dot gnu.org changed:

           What    |Removed                      |Added
--------------------------------------------------------
                 CC|                             |avieira at gcc dot gnu.org
           Assignee|unassigned at gcc dot gnu.org|avieira at gcc dot gnu.org

--- Comment #2 from avieira at gcc dot gnu.org ---
I'll have a look.
Richard Biener changed:

           What    |Removed |Added
----------------------------------
           Priority|P3      |P2
Andrew Pinski changed:

           What        |Removed         |Added
------------------------------------------------
             Status    |UNCONFIRMED     |NEW
           Keywords    |needs-bisection |
   Last reconfirmed    |                |2023-07-05
     Ever confirmed    |0               |1
          Component    |target          |tree-optimization

--- Comment #1 from Andrew Pinski ---
ifcvt correctly produces the sign extend:

  _ifc__18;
  _3 = a_14(D) + _2;
  _ifc__22 = _3->D.3439;
  _ifc__18 = BIT_FIELD_REF <_ifc__22, 56, 8>;
  _5 = (long long int) _ifc__18;
  _6 = _5 * -4;
  _7 = (long unsigned int) _6;
  _15 = MAX_EXPR <_7, size_19>;

But then the vectorizer misses that:
```
  _ifc__18 = BIT_FIELD_REF <_ifc__22, 56, 8>;
  vect_patt_13.20_76 = vect__ifc__22.19_74 & { 18446744073709551360,
18446744073709551360, 18446744073709551360, 18446744073709551360 };
  vect_patt_10.21_77 = vect_patt_13.20_76 >> 8;
  vect_patt_9.22_78 = VIEW_CONVERT_EXPR(vect_patt_10.21_77);
  _5 = (long long int) _ifc__18;
  vect__6.23_80 = vect_patt_9.22_78 * { -4, -4, -4, -4 };
  _6 = _5 * -4;
```
vect_patt_9.22_78 should have been sign-extended ...

With the C front-end, we get:
```
  _24;
  _23;
  _24 = _25->y;
  _23 = _24 * -4;
  _22 = (long unsigned int) _23;
```
which the vectorizer does not understand could be promoted to 64bit and then
truncated, so it does not vectorize it.

r13-3219-g25413fdb2ac24933214123e24ba16502