[Bug target/115355] [12/13/14/15 Regression] vectorization exposes wrong code on P9 LE starting from r12-4496
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

Kewen Lin changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #25 from Kewen Lin ---
Should be fixed on trunk and affected release branches now.
[Bug target/115355] [12/13/14/15 Regression] vectorization exposes wrong code on P9 LE starting from r12-4496
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

--- Comment #23 from GCC Commits ---
The releases/gcc-12 branch has been updated by Kewen Lin:

https://gcc.gnu.org/g:13f0528c782c3732052973a5d340769af8182c8f

commit r12-10594-g13f0528c782c3732052973a5d340769af8182c8f
Author: Kewen Lin
Date:   Wed Jun 26 02:16:17 2024 -0500

    rs6000: Fix wrong RTL patterns for vector merge high/low char on LE

    Commit r12-4496 changes some define_expands and define_insns for vector merge high/low char, which are altivec_vmrg[hl]b. These defines are mainly for built-in function vec_merge{h,l} and some internal gen function needs.

    These functions should consider endianness, taking vec_mergeh as example, as PVIPR defines, vec_mergeh "Merges the first halves (in element order) of two vectors", it does note it's in element order. So it's mapped into vmrghb on BE while vmrglb on LE respectively. Although the mapped insns are different, as the discussion in PR106069, the RTL pattern should be still the same, it is conformed before commit r12-4496, but gets changed into different patterns on BE and LE starting from commit r12-4496. Similar to 32-bit element case in commit log of r15-1504, this 8-bit element pattern on LE doesn't actually match what the underlying insn is intended to represent, once some optimization like combine does some changes basing on it, it would cause the unexpected consequence. The newly constructed test case pr106069-1.c is a typical example for this issue.

    So this patch is to fix the wrong RTL pattern, ensure the associated RTL patterns become the same as before which can have the same semantic as their mapped insns. With the proposed patch, the expanders like altivec_vmrghb expands into altivec_vmrghb_direct_be or altivec_vmrglb_direct_le depending on endianness, "direct" can easily show which insn would be generated, _be and _le are mainly for the different RTL patterns as endianness.

    Co-authored-by: Xionghu Luo

            PR target/106069
            PR target/115355

    gcc/ChangeLog:

            * config/rs6000/altivec.md (altivec_vmrghb_direct): Rename to ...
            (altivec_vmrghb_direct_be): ... this. Add condition BYTES_BIG_ENDIAN.
            (altivec_vmrghb_direct_le): New define_insn.
            (altivec_vmrglb_direct): Rename to ...
            (altivec_vmrglb_direct_be): ... this. Add condition BYTES_BIG_ENDIAN.
            (altivec_vmrglb_direct_le): New define_insn.
            (altivec_vmrghb): Adjust by calling gen_altivec_vmrghb_direct_be for BE and gen_altivec_vmrglb_direct_le for LE.
            (altivec_vmrglb): Adjust by calling gen_altivec_vmrglb_direct_be for BE and gen_altivec_vmrghb_direct_le for LE.
            * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Replace CODE_FOR_altivec_vmrghb_direct by CODE_FOR_altivec_vmrghb_direct_be for BE and CODE_FOR_altivec_vmrghb_direct_le for LE. And replace CODE_FOR_altivec_vmrglb_direct by CODE_FOR_altivec_vmrglb_direct_be for BE and CODE_FOR_altivec_vmrglb_direct_le for LE.

    gcc/testsuite/ChangeLog:

            * gcc.target/powerpc/pr106069-1.c: New test.

    (cherry picked from commit 62520e4e9f7e2fe8a16ee57a4bd35da2e921ae22)
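
The commit log above says that vec_mergeh works in element order and is therefore mapped to vmrghb on BE but vmrglb on LE. A rough Python model may help to see why the two instructions can give the same element-order result. It is only a sketch under assumptions that the log does not state: the helper names are made up, a register is modelled as a list in big-endian register order, LE element order is taken to be the reverse of that, and the LE case is assumed to swap the two source operands.

```python
def hw_merge_high(vra, vrb):
    """Interleave the first halves of two registers (big-endian register order)."""
    half = len(vra) // 2
    out = []
    for i in range(half):
        out += [vra[i], vrb[i]]
    return out


def hw_merge_low(vra, vrb):
    """Interleave the second halves of two registers (big-endian register order)."""
    half = len(vra) // 2
    out = []
    for i in range(half, len(vra)):
        out += [vra[i], vrb[i]]
    return out


def vec_mergeh(a, b, big_endian):
    """Hypothetical model of vec_mergeh; a and b are lists in element order."""
    if big_endian:
        # On BE, element order and register order coincide.
        return hw_merge_high(a, b)
    # Assumption: on LE, element 0 sits at the other end of the register,
    # so model the registers as reversed lists, use the low merge with the
    # operands swapped, and reverse the result back into element order.
    reg_a, reg_b = a[::-1], b[::-1]
    return hw_merge_low(reg_b, reg_a)[::-1]


# Sixteen byte-sized elements per vector, as for the char variant.
a = [f"a{i}" for i in range(16)]
b = [f"b{i}" for i in range(16)]

# "Merges the first halves (in element order) of two vectors".
expected = [x for pair in zip(a[:8], b[:8]) for x in pair]

be_result = vec_mergeh(a, b, big_endian=True)
le_result = vec_mergeh(a, b, big_endian=False)
print(be_result == expected, le_result == expected)
```

In this model both endiannesses produce the same element-order result from different instructions, which is consistent with the log's point that one RTL pattern should describe vec_mergeh on both BE and LE.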
[Bug target/115355] [12/13/14/15 Regression] vectorization exposes wrong code on P9 LE starting from r12-4496
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

--- Comment #24 from GCC Commits ---
The releases/gcc-12 branch has been updated by Kewen Lin:

https://gcc.gnu.org/g:ca6eea0eb33de8b2e23e0bef3466575bb14ab63f

commit r12-10595-gca6eea0eb33de8b2e23e0bef3466575bb14ab63f
Author: Kewen Lin
Date:   Wed Jun 26 02:16:17 2024 -0500

    rs6000: Fix wrong RTL patterns for vector merge high/low short on LE

    Commit r12-4496 changes some define_expands and define_insns for vector merge high/low short, which are altivec_vmrg[hl]h. These defines are mainly for built-in function vec_merge{h,l} and some internal gen function needs.

    These functions should consider endianness, taking vec_mergeh as example, as PVIPR defines, vec_mergeh "Merges the first halves (in element order) of two vectors", it does note it's in element order. So it's mapped into vmrghh on BE while vmrglh on LE respectively. Although the mapped insns are different, as the discussion in PR106069, the RTL pattern should be still the same, it is conformed before commit r12-4496, but gets changed into different patterns on BE and LE starting from commit r12-4496. Similar to 32-bit element case in commit log of r15-1504, this 16-bit element pattern on LE doesn't actually match what the underlying insn is intended to represent, once some optimization like combine does some changes basing on it, it would cause the unexpected consequence. The newly constructed test case pr106069-2.c is a typical example for this issue on element type short.

    So this patch is to fix the wrong RTL pattern, ensure the associated RTL patterns become the same as before which can have the same semantic as their mapped insns. With the proposed patch, the expanders like altivec_vmrghh expands into altivec_vmrghh_direct_be or altivec_vmrglh_direct_le depending on endianness, "direct" can easily show which insn would be generated, _be and _le are mainly for the different RTL patterns as endianness.

    Co-authored-by: Xionghu Luo

            PR target/106069
            PR target/115355

    gcc/ChangeLog:

            * config/rs6000/altivec.md (altivec_vmrghh_direct): Rename to ...
            (altivec_vmrghh_direct_be): ... this. Add condition BYTES_BIG_ENDIAN.
            (altivec_vmrghh_direct_le): New define_insn.
            (altivec_vmrglh_direct): Rename to ...
            (altivec_vmrglh_direct_be): ... this. Add condition BYTES_BIG_ENDIAN.
            (altivec_vmrglh_direct_le): New define_insn.
            (altivec_vmrghh): Adjust by calling gen_altivec_vmrghh_direct_be for BE and gen_altivec_vmrglh_direct_le for LE.
            (altivec_vmrglh): Adjust by calling gen_altivec_vmrglh_direct_be for BE and gen_altivec_vmrghh_direct_le for LE.
            (vec_widen_umult_hi_v16qi): Adjust the call to gen_altivec_vmrghh_direct by gen_altivec_vmrghh for BE and by gen_altivec_vmrglh for LE.
            (vec_widen_smult_hi_v16qi): Likewise.
            (vec_widen_umult_lo_v16qi): Adjust the call to gen_altivec_vmrglh_direct by gen_altivec_vmrglh for BE and by gen_altivec_vmrghh for LE.
            (vec_widen_smult_lo_v16qi): Likewise.
            * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Replace CODE_FOR_altivec_vmrghh_direct by CODE_FOR_altivec_vmrghh_direct_be for BE and CODE_FOR_altivec_vmrghh_direct_le for LE. And replace CODE_FOR_altivec_vmrglh_direct by CODE_FOR_altivec_vmrglh_direct_be for BE and CODE_FOR_altivec_vmrglh_direct_le for LE.

    gcc/testsuite/ChangeLog:

            * gcc.target/powerpc/pr106069-2.c: New test.

    (cherry picked from commit 812c70bf4981958488331d4ea5af8709b5321da1)
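
The ChangeLog above touches the vec_widen_[us]mult_{hi,lo}_v16qi expanders because they use the halfword merges. The following Python sketch illustrates one way a widening multiply can be assembled from merges. It is an assumption-laden illustration rather than the code in altivec.md: it supposes the products of even-numbered and odd-numbered elements are computed separately and then interleaved, all function names are hypothetical, and it deliberately does not say which half GCC calls "hi" or "lo".

```python
def merge_first_halves(a, b):
    """Interleave the first halves of a and b, in element order."""
    half = len(a) // 2
    return [x for pair in zip(a[:half], b[:half]) for x in pair]


def merge_second_halves(a, b):
    """Interleave the second halves of a and b, in element order."""
    half = len(a) // 2
    return [x for pair in zip(a[half:], b[half:]) for x in pair]


def widening_multiply(a, b):
    """Return the widened products of the first and second input halves."""
    # Assumption: even and odd products are formed separately, each giving
    # eight 16-bit results from sixteen 8-bit inputs.
    even = [a[i] * b[i] for i in range(0, len(a), 2)]
    odd = [a[i] * b[i] for i in range(1, len(a), 2)]
    # Interleaving even and odd products restores the original element order.
    return merge_first_halves(even, odd), merge_second_halves(even, odd)


a = list(range(1, 17))       # sixteen byte-sized inputs
b = list(range(101, 117))
first, second = widening_multiply(a, b)

products = [x * y for x, y in zip(a, b)]
print(first == products[:8], second == products[8:])
```

If the merge step used a pattern that selected the wrong half, the products would come out in the wrong positions, which is the kind of breakage a mismatched RTL pattern can cause once an optimization pass relies on it.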
[Bug target/115355] [12/13/14/15 Regression] vectorization exposes wrong code on P9 LE starting from r12-4496
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

--- Comment #22 from GCC Commits ---
The releases/gcc-13 branch has been updated by Kewen Lin:

https://gcc.gnu.org/g:bab38d9271ce3f26cb64b8cb712351eb3fedd559

commit r13-8886-gbab38d9271ce3f26cb64b8cb712351eb3fedd559
Author: Kewen Lin
Date:   Wed Jun 26 02:16:17 2024 -0500

    rs6000: Fix wrong RTL patterns for vector merge high/low short on LE

    Commit r12-4496 changes some define_expands and define_insns for vector merge high/low short, which are altivec_vmrg[hl]h. These defines are mainly for built-in function vec_merge{h,l} and some internal gen function needs.

    These functions should consider endianness, taking vec_mergeh as example, as PVIPR defines, vec_mergeh "Merges the first halves (in element order) of two vectors", it does note it's in element order. So it's mapped into vmrghh on BE while vmrglh on LE respectively. Although the mapped insns are different, as the discussion in PR106069, the RTL pattern should be still the same, it is conformed before commit r12-4496, but gets changed into different patterns on BE and LE starting from commit r12-4496. Similar to 32-bit element case in commit log of r15-1504, this 16-bit element pattern on LE doesn't actually match what the underlying insn is intended to represent, once some optimization like combine does some changes basing on it, it would cause the unexpected consequence. The newly constructed test case pr106069-2.c is a typical example for this issue on element type short.

    So this patch is to fix the wrong RTL pattern, ensure the associated RTL patterns become the same as before which can have the same semantic as their mapped insns. With the proposed patch, the expanders like altivec_vmrghh expands into altivec_vmrghh_direct_be or altivec_vmrglh_direct_le depending on endianness, "direct" can easily show which insn would be generated, _be and _le are mainly for the different RTL patterns as endianness.

    Co-authored-by: Xionghu Luo

            PR target/106069
            PR target/115355

    gcc/ChangeLog:

            * config/rs6000/altivec.md (altivec_vmrghh_direct): Rename to ...
            (altivec_vmrghh_direct_be): ... this. Add condition BYTES_BIG_ENDIAN.
            (altivec_vmrghh_direct_le): New define_insn.
            (altivec_vmrglh_direct): Rename to ...
            (altivec_vmrglh_direct_be): ... this. Add condition BYTES_BIG_ENDIAN.
            (altivec_vmrglh_direct_le): New define_insn.
            (altivec_vmrghh): Adjust by calling gen_altivec_vmrghh_direct_be for BE and gen_altivec_vmrglh_direct_le for LE.
            (altivec_vmrglh): Adjust by calling gen_altivec_vmrglh_direct_be for BE and gen_altivec_vmrghh_direct_le for LE.
            (vec_widen_umult_hi_v16qi): Adjust the call to gen_altivec_vmrghh_direct by gen_altivec_vmrghh for BE and by gen_altivec_vmrglh for LE.
            (vec_widen_smult_hi_v16qi): Likewise.
            (vec_widen_umult_lo_v16qi): Adjust the call to gen_altivec_vmrglh_direct by gen_altivec_vmrglh for BE and by gen_altivec_vmrghh for LE.
            (vec_widen_smult_lo_v16qi): Likewise.
            * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Replace CODE_FOR_altivec_vmrghh_direct by CODE_FOR_altivec_vmrghh_direct_be for BE and CODE_FOR_altivec_vmrghh_direct_le for LE. And replace CODE_FOR_altivec_vmrglh_direct by CODE_FOR_altivec_vmrglh_direct_be for BE and CODE_FOR_altivec_vmrglh_direct_le for LE.

    gcc/testsuite/ChangeLog:

            * gcc.target/powerpc/pr106069-2.c: New test.

    (cherry picked from commit 812c70bf4981958488331d4ea5af8709b5321da1)
[Bug target/115355] [12/13/14/15 Regression] vectorization exposes wrong code on P9 LE starting from r12-4496
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

--- Comment #21 from GCC Commits ---
The releases/gcc-13 branch has been updated by Kewen Lin:

https://gcc.gnu.org/g:ffdd377fc07cdc7b62669d354e23f30940eaaffe

commit r13-8885-gffdd377fc07cdc7b62669d354e23f30940eaaffe
Author: Kewen Lin
Date:   Wed Jun 26 02:16:17 2024 -0500

    rs6000: Fix wrong RTL patterns for vector merge high/low char on LE

    Commit r12-4496 changes some define_expands and define_insns for vector merge high/low char, which are altivec_vmrg[hl]b. These defines are mainly for built-in function vec_merge{h,l} and some internal gen function needs.

    These functions should consider endianness, taking vec_mergeh as example, as PVIPR defines, vec_mergeh "Merges the first halves (in element order) of two vectors", it does note it's in element order. So it's mapped into vmrghb on BE while vmrglb on LE respectively. Although the mapped insns are different, as the discussion in PR106069, the RTL pattern should be still the same, it is conformed before commit r12-4496, but gets changed into different patterns on BE and LE starting from commit r12-4496. Similar to 32-bit element case in commit log of r15-1504, this 8-bit element pattern on LE doesn't actually match what the underlying insn is intended to represent, once some optimization like combine does some changes basing on it, it would cause the unexpected consequence. The newly constructed test case pr106069-1.c is a typical example for this issue.

    So this patch is to fix the wrong RTL pattern, ensure the associated RTL patterns become the same as before which can have the same semantic as their mapped insns. With the proposed patch, the expanders like altivec_vmrghb expands into altivec_vmrghb_direct_be or altivec_vmrglb_direct_le depending on endianness, "direct" can easily show which insn would be generated, _be and _le are mainly for the different RTL patterns as endianness.

    Co-authored-by: Xionghu Luo

            PR target/106069
            PR target/115355

    gcc/ChangeLog:

            * config/rs6000/altivec.md (altivec_vmrghb_direct): Rename to ...
            (altivec_vmrghb_direct_be): ... this. Add condition BYTES_BIG_ENDIAN.
            (altivec_vmrghb_direct_le): New define_insn.
            (altivec_vmrglb_direct): Rename to ...
            (altivec_vmrglb_direct_be): ... this. Add condition BYTES_BIG_ENDIAN.
            (altivec_vmrglb_direct_le): New define_insn.
            (altivec_vmrghb): Adjust by calling gen_altivec_vmrghb_direct_be for BE and gen_altivec_vmrglb_direct_le for LE.
            (altivec_vmrglb): Adjust by calling gen_altivec_vmrglb_direct_be for BE and gen_altivec_vmrghb_direct_le for LE.
            * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Replace CODE_FOR_altivec_vmrghb_direct by CODE_FOR_altivec_vmrghb_direct_be for BE and CODE_FOR_altivec_vmrghb_direct_le for LE. And replace CODE_FOR_altivec_vmrglb_direct by CODE_FOR_altivec_vmrglb_direct_be for BE and CODE_FOR_altivec_vmrglb_direct_le for LE.

    gcc/testsuite/ChangeLog:

            * gcc.target/powerpc/pr106069-1.c: New test.

    (cherry picked from commit 62520e4e9f7e2fe8a16ee57a4bd35da2e921ae22)
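
The ChangeLog entries for altivec_vmrghb and altivec_vmrglb describe a simple dispatch on endianness. As a reading aid, that mapping can be written out as a lookup table. The pattern names below are taken from the ChangeLog; the table, the two helper functions and the way the mnemonic is extracted from the name are hypothetical and only mirror the log's remark that "direct" shows which insn would be generated.

```python
# (expander, endianness) -> direct pattern, as listed in the ChangeLog.
DIRECT_PATTERN = {
    ("altivec_vmrghb", "BE"): "altivec_vmrghb_direct_be",
    ("altivec_vmrghb", "LE"): "altivec_vmrglb_direct_le",
    ("altivec_vmrglb", "BE"): "altivec_vmrglb_direct_be",
    ("altivec_vmrglb", "LE"): "altivec_vmrghb_direct_le",
}


def direct_pattern(expander, big_endian):
    """Pick the direct pattern an expander would use (illustrative helper)."""
    return DIRECT_PATTERN[(expander, "BE" if big_endian else "LE")]


def emitted_insn(pattern_name):
    """Read the mnemonic out of a name shaped like altivec_<insn>_direct_<be|le>."""
    return pattern_name.split("_")[1]


for expander in ("altivec_vmrghb", "altivec_vmrglb"):
    for big_endian in (True, False):
        pattern = direct_pattern(expander, big_endian)
        print(expander, "BE" if big_endian else "LE", pattern, emitted_insn(pattern))
```

Read this way, altivec_vmrghb ends up as vmrghb on BE and vmrglb on LE, matching the mapping the commit log gives for vec_mergeh.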
[Bug target/115355] [12/13/14/15 Regression] vectorization exposes wrong code on P9 LE starting from r12-4496
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

--- Comment #20 from GCC Commits ---
The releases/gcc-14 branch has been updated by Kewen Lin:

https://gcc.gnu.org/g:052f78d010d224c7289f1cf6eec784ac4eeed351

commit r14-10372-g052f78d010d224c7289f1cf6eec784ac4eeed351
Author: Kewen Lin
Date:   Wed Jun 26 02:16:17 2024 -0500

    rs6000: Fix wrong RTL patterns for vector merge high/low short on LE

    Commit r12-4496 changes some define_expands and define_insns for vector merge high/low short, which are altivec_vmrg[hl]h. These defines are mainly for built-in function vec_merge{h,l} and some internal gen function needs.

    These functions should consider endianness, taking vec_mergeh as example, as PVIPR defines, vec_mergeh "Merges the first halves (in element order) of two vectors", it does note it's in element order. So it's mapped into vmrghh on BE while vmrglh on LE respectively. Although the mapped insns are different, as the discussion in PR106069, the RTL pattern should be still the same, it is conformed before commit r12-4496, but gets changed into different patterns on BE and LE starting from commit r12-4496. Similar to 32-bit element case in commit log of r15-1504, this 16-bit element pattern on LE doesn't actually match what the underlying insn is intended to represent, once some optimization like combine does some changes basing on it, it would cause the unexpected consequence. The newly constructed test case pr106069-2.c is a typical example for this issue on element type short.

    So this patch is to fix the wrong RTL pattern, ensure the associated RTL patterns become the same as before which can have the same semantic as their mapped insns. With the proposed patch, the expanders like altivec_vmrghh expands into altivec_vmrghh_direct_be or altivec_vmrglh_direct_le depending on endianness, "direct" can easily show which insn would be generated, _be and _le are mainly for the different RTL patterns as endianness.

    Co-authored-by: Xionghu Luo

            PR target/106069
            PR target/115355

    gcc/ChangeLog:

            * config/rs6000/altivec.md (altivec_vmrghh_direct): Rename to ...
            (altivec_vmrghh_direct_be): ... this. Add condition BYTES_BIG_ENDIAN.
            (altivec_vmrghh_direct_le): New define_insn.
            (altivec_vmrglh_direct): Rename to ...
            (altivec_vmrglh_direct_be): ... this. Add condition BYTES_BIG_ENDIAN.
            (altivec_vmrglh_direct_le): New define_insn.
            (altivec_vmrghh): Adjust by calling gen_altivec_vmrghh_direct_be for BE and gen_altivec_vmrglh_direct_le for LE.
            (altivec_vmrglh): Adjust by calling gen_altivec_vmrglh_direct_be for BE and gen_altivec_vmrghh_direct_le for LE.
            (vec_widen_umult_hi_v16qi): Adjust the call to gen_altivec_vmrghh_direct by gen_altivec_vmrghh for BE and by gen_altivec_vmrglh for LE.
            (vec_widen_smult_hi_v16qi): Likewise.
            (vec_widen_umult_lo_v16qi): Adjust the call to gen_altivec_vmrglh_direct by gen_altivec_vmrglh for BE and by gen_altivec_vmrghh for LE.
            (vec_widen_smult_lo_v16qi): Likewise.
            * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Replace CODE_FOR_altivec_vmrghh_direct by CODE_FOR_altivec_vmrghh_direct_be for BE and CODE_FOR_altivec_vmrghh_direct_le for LE. And replace CODE_FOR_altivec_vmrglh_direct by CODE_FOR_altivec_vmrglh_direct_be for BE and CODE_FOR_altivec_vmrglh_direct_le for LE.

    gcc/testsuite/ChangeLog:

            * gcc.target/powerpc/pr106069-2.c: New test.

    (cherry picked from commit 812c70bf4981958488331d4ea5af8709b5321da1)
[Bug target/115355] [12/13/14/15 Regression] vectorization exposes wrong code on P9 LE starting from r12-4496
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

--- Comment #19 from GCC Commits ---
The releases/gcc-14 branch has been updated by Kewen Lin:

https://gcc.gnu.org/g:0e495e8e3fde11e430a77db6b477319ed0ae0b7c

commit r14-10371-g0e495e8e3fde11e430a77db6b477319ed0ae0b7c
Author: Kewen Lin
Date:   Wed Jun 26 02:16:17 2024 -0500

    rs6000: Fix wrong RTL patterns for vector merge high/low char on LE

    Commit r12-4496 changes some define_expands and define_insns for vector merge high/low char, which are altivec_vmrg[hl]b. These defines are mainly for built-in function vec_merge{h,l} and some internal gen function needs.

    These functions should consider endianness, taking vec_mergeh as example, as PVIPR defines, vec_mergeh "Merges the first halves (in element order) of two vectors", it does note it's in element order. So it's mapped into vmrghb on BE while vmrglb on LE respectively. Although the mapped insns are different, as the discussion in PR106069, the RTL pattern should be still the same, it is conformed before commit r12-4496, but gets changed into different patterns on BE and LE starting from commit r12-4496. Similar to 32-bit element case in commit log of r15-1504, this 8-bit element pattern on LE doesn't actually match what the underlying insn is intended to represent, once some optimization like combine does some changes basing on it, it would cause the unexpected consequence. The newly constructed test case pr106069-1.c is a typical example for this issue.

    So this patch is to fix the wrong RTL pattern, ensure the associated RTL patterns become the same as before which can have the same semantic as their mapped insns. With the proposed patch, the expanders like altivec_vmrghb expands into altivec_vmrghb_direct_be or altivec_vmrglb_direct_le depending on endianness, "direct" can easily show which insn would be generated, _be and _le are mainly for the different RTL patterns as endianness.

    Co-authored-by: Xionghu Luo

            PR target/106069
            PR target/115355

    gcc/ChangeLog:

            * config/rs6000/altivec.md (altivec_vmrghb_direct): Rename to ...
            (altivec_vmrghb_direct_be): ... this. Add condition BYTES_BIG_ENDIAN.
            (altivec_vmrghb_direct_le): New define_insn.
            (altivec_vmrglb_direct): Rename to ...
            (altivec_vmrglb_direct_be): ... this. Add condition BYTES_BIG_ENDIAN.
            (altivec_vmrglb_direct_le): New define_insn.
            (altivec_vmrghb): Adjust by calling gen_altivec_vmrghb_direct_be for BE and gen_altivec_vmrglb_direct_le for LE.
            (altivec_vmrglb): Adjust by calling gen_altivec_vmrglb_direct_be for BE and gen_altivec_vmrghb_direct_le for LE.
            * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Replace CODE_FOR_altivec_vmrghb_direct by CODE_FOR_altivec_vmrghb_direct_be for BE and CODE_FOR_altivec_vmrghb_direct_le for LE. And replace CODE_FOR_altivec_vmrglb_direct by CODE_FOR_altivec_vmrglb_direct_be for BE and CODE_FOR_altivec_vmrglb_direct_le for LE.

    gcc/testsuite/ChangeLog:

            * gcc.target/powerpc/pr106069-1.c: New test.

    (cherry picked from commit 62520e4e9f7e2fe8a16ee57a4bd35da2e921ae22)
[Bug target/115355] [12/13/14/15 Regression] vectorization exposes wrong code on P9 LE starting from r12-4496
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

--- Comment #18 from GCC Commits ---
The releases/gcc-14 branch has been updated by Kewen Lin:

https://gcc.gnu.org/g:ef8b60dd48faeaf2b4e28c35401fa10d2a3e53fb

commit r14-10355-gef8b60dd48faeaf2b4e28c35401fa10d2a3e53fb
Author: Kewen Lin
Date:   Thu Jun 20 20:23:56 2024 -0500

    rs6000: Fix wrong RTL patterns for vector merge high/low word on LE

    Commit r12-4496 changes some define_expands and define_insns for vector merge high/low word, which are altivec_vmrg[hl]w, vsx_xxmrg[hl]w_. These defines are mainly for built-in function vec_merge{h,l}, __builtin_vsx_xxmrghw, __builtin_vsx_xxmrghw_4si and some internal gen function needs.

    These functions should consider endianness, taking vec_mergeh as example, as PVIPR defines, vec_mergeh "Merges the first halves (in element order) of two vectors", it does note it's in element order. So it's mapped into vmrghw on BE while vmrglw on LE respectively. Although the mapped insns are different, as the discussion in PR106069, the RTL pattern should be still the same, it is conformed before commit r12-4496, define_expand altivec_vmrghw got expanded into:

      (vec_select:VSX_W
         (vec_concat:
            (match_operand:VSX_W 1 "register_operand" "wa,v")
            (match_operand:VSX_W 2 "register_operand" "wa,v"))
         (parallel [(const_int 0) (const_int 4)
                    (const_int 1) (const_int 5)])))]

    on both BE and LE then. But commit r12-4496 changed it to expand into:

      (vec_select:VSX_W
         (vec_concat:
            (match_operand:VSX_W 1 "register_operand" "wa,v")
            (match_operand:VSX_W 2 "register_operand" "wa,v"))
         (parallel [(const_int 0) (const_int 4)
                    (const_int 1) (const_int 5)])))]

    on BE, and

      (vec_select:VSX_W
         (vec_concat:
            (match_operand:VSX_W 1 "register_operand" "wa,v")
            (match_operand:VSX_W 2 "register_operand" "wa,v"))
         (parallel [(const_int 2) (const_int 6)
                    (const_int 3) (const_int 7)])))]

    on LE, although the mapped insn are still vmrghw on BE and vmrglw on LE, the associated RTL pattern is completely wrong and inconsistent with the mapped insn. If optimization passes leave this pattern alone, even if its pattern doesn't represent its mapped insn, it's still fine, that's why simple testing on bif doesn't expose this issue. But once some optimization pass such as combine does some changes basing on this wrong pattern, because the pattern doesn't match the semantics that the expanded insn is intended to represent, it would cause the unexpected result.

    So this patch is to fix the wrong RTL pattern, ensure the associated RTL patterns become the same as before which can have the same semantic as their mapped insns. With the proposed patch, the expanders like altivec_vmrghw expands into altivec_vmrghw_direct_v4si_be or altivec_vmrglw_direct_v4si_le depending on endianness, "direct" can easily show which insn would be generated, _be and _le are mainly for the different RTL patterns as endianness.

    Co-authored-by: Xionghu Luo

            PR target/106069
            PR target/115355

    gcc/ChangeLog:

            * config/rs6000/altivec.md (altivec_vmrghw_direct_): Rename to ...
            (altivec_vmrghw_direct__be): ... this. Add the condition BYTES_BIG_ENDIAN.
            (altivec_vmrghw_direct__le): New define_insn.
            (altivec_vmrglw_direct_): Rename to ...
            (altivec_vmrglw_direct__be): ... this. Add the condition BYTES_BIG_ENDIAN.
            (altivec_vmrglw_direct__le): New define_insn.
            (altivec_vmrghw): Adjust by calling gen_altivec_vmrghw_direct_v4si_be for BE and gen_altivec_vmrglw_direct_v4si_le for LE.
            (altivec_vmrglw): Adjust by calling gen_altivec_vmrglw_direct_v4si_be for BE and gen_altivec_vmrghw_direct_v4si_le for LE.
            (vec_widen_umult_hi_v8hi): Adjust the call to gen_altivec_vmrghw_direct_v4si by gen_altivec_vmrghw for BE and by gen_altivec_vmrglw for LE.
            (vec_widen_smult_hi_v8hi): Likewise.
            (vec_widen_umult_lo_v8hi): Adjust the call to gen_altivec_vmrglw_direct_v4si by gen_altivec_vmrglw for BE and by gen_altivec_vmrghw for LE.
            (vec_widen_smult_lo_v8hi): Likewise.
            * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Replace CODE_FOR_altivec_vmrghw_direct_v4si by CODE_FOR_altivec_vmrghw_direct_v4si_be for BE and CODE_FOR_altivec_vmrghw_direct_v4si_le for LE. And replace CODE_FOR_altivec_vmrglw_direct_v4si by CODE_FOR_altivec_vmrglw_direct_v4si_be for BE and CODE_FOR_altivec_vmrglw_direct_v4si_le for LE.
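
The commit log above contrasts two selector lists, 0 4 1 5 and 2 6 3 7. A small Python model of vec_select applied to vec_concat shows what each list picks out. This is a sketch, not GCC code: the function name is made up, and the only semantics assumed are that vec_concat places the elements of operand 1 before those of operand 2 and that vec_select picks elements by index.

```python
def vec_select_concat(op1, op2, indices):
    """Model (vec_select (vec_concat op1 op2) (parallel [indices]))."""
    concat = list(op1) + list(op2)   # elements 0..3 from op1, 4..7 from op2
    return [concat[i] for i in indices]


a = ["a0", "a1", "a2", "a3"]
b = ["b0", "b1", "b2", "b3"]

# Selector used on both BE and LE before r12-4496.
first_halves = vec_select_concat(a, b, [0, 4, 1, 5])
# Selector that r12-4496 introduced for LE.
second_halves = vec_select_concat(a, b, [2, 6, 3, 7])

print(first_halves)    # ['a0', 'b0', 'a1', 'b1']
print(second_halves)   # ['a2', 'b2', 'a3', 'b3']
```

The two selectors describe different results, so a pass that reasons from the second pattern works from a description that differs from the "first halves" merge vec_mergeh is defined to perform.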
[Bug target/115355] [12/13/14/15 Regression] vectorization exposes wrong code on P9 LE starting from r12-4496
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

--- Comment #17 from GCC Commits ---
The releases/gcc-13 branch has been updated by Kewen Lin:

https://gcc.gnu.org/g:361bfcec901ca882130e338aebaa2ebc6ea2dc3b

commit r13-8876-g361bfcec901ca882130e338aebaa2ebc6ea2dc3b
Author: Kewen Lin
Date:   Thu Jun 20 20:23:56 2024 -0500

    rs6000: Fix wrong RTL patterns for vector merge high/low word on LE

    Commit r12-4496 changes some define_expands and define_insns for vector merge high/low word, which are altivec_vmrg[hl]w, vsx_xxmrg[hl]w_. These defines are mainly for built-in function vec_merge{h,l}, __builtin_vsx_xxmrghw, __builtin_vsx_xxmrghw_4si and some internal gen function needs.

    These functions should consider endianness, taking vec_mergeh as example, as PVIPR defines, vec_mergeh "Merges the first halves (in element order) of two vectors", it does note it's in element order. So it's mapped into vmrghw on BE while vmrglw on LE respectively. Although the mapped insns are different, as the discussion in PR106069, the RTL pattern should be still the same, it is conformed before commit r12-4496, define_expand altivec_vmrghw got expanded into:

      (vec_select:VSX_W
         (vec_concat:
            (match_operand:VSX_W 1 "register_operand" "wa,v")
            (match_operand:VSX_W 2 "register_operand" "wa,v"))
         (parallel [(const_int 0) (const_int 4)
                    (const_int 1) (const_int 5)])))]

    on both BE and LE then. But commit r12-4496 changed it to expand into:

      (vec_select:VSX_W
         (vec_concat:
            (match_operand:VSX_W 1 "register_operand" "wa,v")
            (match_operand:VSX_W 2 "register_operand" "wa,v"))
         (parallel [(const_int 0) (const_int 4)
                    (const_int 1) (const_int 5)])))]

    on BE, and

      (vec_select:VSX_W
         (vec_concat:
            (match_operand:VSX_W 1 "register_operand" "wa,v")
            (match_operand:VSX_W 2 "register_operand" "wa,v"))
         (parallel [(const_int 2) (const_int 6)
                    (const_int 3) (const_int 7)])))]

    on LE, although the mapped insn are still vmrghw on BE and vmrglw on LE, the associated RTL pattern is completely wrong and inconsistent with the mapped insn. If optimization passes leave this pattern alone, even if its pattern doesn't represent its mapped insn, it's still fine, that's why simple testing on bif doesn't expose this issue. But once some optimization pass such as combine does some changes basing on this wrong pattern, because the pattern doesn't match the semantics that the expanded insn is intended to represent, it would cause the unexpected result.

    So this patch is to fix the wrong RTL pattern, ensure the associated RTL patterns become the same as before which can have the same semantic as their mapped insns. With the proposed patch, the expanders like altivec_vmrghw expands into altivec_vmrghw_direct_v4si_be or altivec_vmrglw_direct_v4si_le depending on endianness, "direct" can easily show which insn would be generated, _be and _le are mainly for the different RTL patterns as endianness.

    Co-authored-by: Xionghu Luo

            PR target/106069
            PR target/115355

    gcc/ChangeLog:

            * config/rs6000/altivec.md (altivec_vmrghw_direct_): Rename to ...
            (altivec_vmrghw_direct__be): ... this. Add the condition BYTES_BIG_ENDIAN.
            (altivec_vmrghw_direct__le): New define_insn.
            (altivec_vmrglw_direct_): Rename to ...
            (altivec_vmrglw_direct__be): ... this. Add the condition BYTES_BIG_ENDIAN.
            (altivec_vmrglw_direct__le): New define_insn.
            (altivec_vmrghw): Adjust by calling gen_altivec_vmrghw_direct_v4si_be for BE and gen_altivec_vmrglw_direct_v4si_le for LE.
            (altivec_vmrglw): Adjust by calling gen_altivec_vmrglw_direct_v4si_be for BE and gen_altivec_vmrghw_direct_v4si_le for LE.
            (vec_widen_umult_hi_v8hi): Adjust the call to gen_altivec_vmrghw_direct_v4si by gen_altivec_vmrghw for BE and by gen_altivec_vmrglw for LE.
            (vec_widen_smult_hi_v8hi): Likewise.
            (vec_widen_umult_lo_v8hi): Adjust the call to gen_altivec_vmrglw_direct_v4si by gen_altivec_vmrglw for BE and by gen_altivec_vmrghw for LE.
            (vec_widen_smult_lo_v8hi): Likewise.
            * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Replace CODE_FOR_altivec_vmrghw_direct_v4si by CODE_FOR_altivec_vmrghw_direct_v4si_be for BE and CODE_FOR_altivec_vmrghw_direct_v4si_le for LE. And replace CODE_FOR_altivec_vmrglw_direct_v4si by CODE_FOR_altivec_vmrglw_direct_v4si_be for BE and CODE_FOR_altivec_vmrglw_direct_v4si_le for LE.
[Bug target/115355] [12/13/14/15 Regression] vectorization exposes wrong code on P9 LE starting from r12-4496
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

--- Comment #16 from GCC Commits ---
The releases/gcc-12 branch has been updated by Kewen Lin:

https://gcc.gnu.org/g:96ef3367067219c8e3eb88c0474a1090cc7749b4

commit r12-10587-g96ef3367067219c8e3eb88c0474a1090cc7749b4
Author: Kewen Lin
Date: Thu Jun 20 20:23:56 2024 -0500

rs6000: Fix wrong RTL patterns for vector merge high/low word on LE

Commit r12-4496 changed some define_expands and define_insns for vector merge high/low word, namely altivec_vmrg[hl]w and vsx_xxmrg[hl]w_<mode>. These are mainly used by the built-in functions vec_merge{h,l}, __builtin_vsx_xxmrghw and __builtin_vsx_xxmrghw_4si, and by some internal gen functions. These functions must take endianness into account. Taking vec_mergeh as an example, PVIPR defines it as "Merges the first halves (in element order) of two vectors", and it explicitly says element order. So it is mapped to vmrghw on BE and to vmrglw on LE. Although the mapped insns differ, per the discussion in PR106069 the RTL pattern should still be the same on both. That held before commit r12-4496, when define_expand altivec_vmrghw expanded into:

  (vec_select:VSX_W
     (vec_concat:
        (match_operand:VSX_W 1 "register_operand" "wa,v")
        (match_operand:VSX_W 2 "register_operand" "wa,v"))
     (parallel [(const_int 0) (const_int 4)
                (const_int 1) (const_int 5)])))]

on both BE and LE. But commit r12-4496 changed it to expand into:

  (vec_select:VSX_W
     (vec_concat:
        (match_operand:VSX_W 1 "register_operand" "wa,v")
        (match_operand:VSX_W 2 "register_operand" "wa,v"))
     (parallel [(const_int 0) (const_int 4)
                (const_int 1) (const_int 5)])))]

on BE, and

  (vec_select:VSX_W
     (vec_concat:
        (match_operand:VSX_W 1 "register_operand" "wa,v")
        (match_operand:VSX_W 2 "register_operand" "wa,v"))
     (parallel [(const_int 2) (const_int 6)
                (const_int 3) (const_int 7)])))]

on LE. Although the mapped insns are still vmrghw on BE and vmrglw on LE, the associated RTL pattern is completely wrong and inconsistent with the mapped insn.

If optimization passes leave this pattern alone, it is still fine even though the pattern doesn't represent its mapped insn, which is why simple testing on the bif doesn't expose this issue. But once an optimization pass such as combine makes changes based on this wrong pattern, the result is unexpected, because the pattern doesn't match the semantics that the expanded insn is intended to represent.

So this patch fixes the wrong RTL pattern and ensures the associated RTL patterns become the same as before, so that they have the same semantics as their mapped insns. With the proposed patch, expanders like altivec_vmrghw expand into altivec_vmrghw_direct_v4si_be or altivec_vmrglw_direct_v4si_le depending on endianness. "direct" makes it easy to see which insn will be generated; _be and _le mainly distinguish the different RTL patterns per endianness.

Co-authored-by: Xionghu Luo

PR target/106069
PR target/115355

gcc/ChangeLog:

  * config/rs6000/altivec.md (altivec_vmrghw_direct_<mode>): Rename to ...
  (altivec_vmrghw_direct_<mode>_be): ... this. Add the condition BYTES_BIG_ENDIAN.
  (altivec_vmrghw_direct_<mode>_le): New define_insn.
  (altivec_vmrglw_direct_<mode>): Rename to ...
  (altivec_vmrglw_direct_<mode>_be): ... this. Add the condition BYTES_BIG_ENDIAN.
  (altivec_vmrglw_direct_<mode>_le): New define_insn.
  (altivec_vmrghw): Adjust by calling gen_altivec_vmrghw_direct_v4si_be for BE and gen_altivec_vmrglw_direct_v4si_le for LE.
  (altivec_vmrglw): Adjust by calling gen_altivec_vmrglw_direct_v4si_be for BE and gen_altivec_vmrghw_direct_v4si_le for LE.
  (vec_widen_umult_hi_v8hi): Adjust the call to gen_altivec_vmrghw_direct_v4si by gen_altivec_vmrghw for BE and by gen_altivec_vmrglw for LE.
  (vec_widen_smult_hi_v8hi): Likewise.
  (vec_widen_umult_lo_v8hi): Adjust the call to gen_altivec_vmrglw_direct_v4si by gen_altivec_vmrglw for BE and by gen_altivec_vmrghw for LE.
  (vec_widen_smult_lo_v8hi): Likewise.
  * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Replace CODE_FOR_altivec_vmrghw_direct_v4si by CODE_FOR_altivec_vmrghw_direct_v4si_be for BE and CODE_FOR_altivec_vmrghw_direct_v4si_le for LE. And replace CODE_FOR_altivec_vmrglw_direct_v4si by CODE_FOR_altivec_vmrglw_direct_v4si_be for BE and CODE_FOR_altivec_vmrglw_direct_v4si_le for LE.
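To make the selector vectors in the word-merge commit log concrete, here is a minimal Python model of the two RTL operations involved. It is only an illustration, not GCC code: `vec_concat`, `vec_select` and `merge_indices` are hypothetical helper names, and the model assumes that RTL element indices follow element order regardless of endianness, as the commit log implies.

```python
def vec_concat(a, b):
    """Model of RTL vec_concat: elements of a followed by elements of b."""
    return list(a) + list(b)


def vec_select(v, indices):
    """Model of RTL vec_select with a (parallel [...]) selector."""
    return [v[i] for i in indices]


def merge_indices(n, first_half):
    """Selector that interleaves one half of two n-element vectors.

    first_half=True  -> 0, n, 1, n+1, ...   (first halves)
    first_half=False -> n/2, n/2+n, ...     (second halves)
    """
    start = 0 if first_half else n // 2
    indices = []
    for i in range(start, start + n // 2):
        indices += [i, i + n]
    return indices


a = ["a0", "a1", "a2", "a3"]
b = ["b0", "b1", "b2", "b3"]

# Selector quoted in the commit log for both BE and LE before r12-4496.
first_halves = vec_select(vec_concat(a, b), merge_indices(4, True))
# Selector quoted in the commit log for LE after r12-4496.
second_halves = vec_select(vec_concat(a, b), merge_indices(4, False))

print(merge_indices(4, True), first_halves)
print(merge_indices(4, False), second_halves)
```

In this model the selector 0, 4, 1, 5 interleaves the first halves, which matches the PVIPR wording for vec_mergeh, while 2, 6, 3, 7 interleaves the second halves. That is one way to see why a pass that reasons from the LE pattern introduced by r12-4496 can reach a wrong conclusion.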
[Bug target/115355] [12/13/14/15 Regression] vectorization exposes wrong code on P9 LE starting from r12-4496
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

--- Comment #14 from GCC Commits ---
The master branch has been updated by Kewen Lin:

https://gcc.gnu.org/g:62520e4e9f7e2fe8a16ee57a4bd35da2e921ae22

commit r15-1644-g62520e4e9f7e2fe8a16ee57a4bd35da2e921ae22
Author: Kewen Lin
Date: Wed Jun 26 02:16:17 2024 -0500

rs6000: Fix wrong RTL patterns for vector merge high/low char on LE

Commit r12-4496 changed some define_expands and define_insns for vector merge high/low char, namely altivec_vmrg[hl]b. These are mainly used by the built-in function vec_merge{h,l} and by some internal gen functions. These functions must take endianness into account. Taking vec_mergeh as an example, PVIPR defines it as "Merges the first halves (in element order) of two vectors", and it explicitly says element order. So it is mapped to vmrghb on BE and to vmrglb on LE. Although the mapped insns differ, per the discussion in PR106069 the RTL pattern should still be the same on both. That held before commit r12-4496, but starting from commit r12-4496 the patterns differ between BE and LE. Similar to the 32-bit element case in the commit log of r15-1504, this 8-bit element pattern on LE doesn't actually match what the underlying insn is intended to represent, so once an optimization like combine makes changes based on it, the consequences are unexpected. The newly constructed test case pr106069-1.c is a typical example of this issue.

So this patch fixes the wrong RTL pattern and ensures the associated RTL patterns become the same as before, so that they have the same semantics as their mapped insns. With the proposed patch, expanders like altivec_vmrghb expand into altivec_vmrghb_direct_be or altivec_vmrglb_direct_le depending on endianness. "direct" makes it easy to see which insn will be generated; _be and _le mainly distinguish the different RTL patterns per endianness.

Co-authored-by: Xionghu Luo

PR target/106069
PR target/115355

gcc/ChangeLog:

  * config/rs6000/altivec.md (altivec_vmrghb_direct): Rename to ...
  (altivec_vmrghb_direct_be): ... this. Add condition BYTES_BIG_ENDIAN.
  (altivec_vmrghb_direct_le): New define_insn.
  (altivec_vmrglb_direct): Rename to ...
  (altivec_vmrglb_direct_be): ... this. Add condition BYTES_BIG_ENDIAN.
  (altivec_vmrglb_direct_le): New define_insn.
  (altivec_vmrghb): Adjust by calling gen_altivec_vmrghb_direct_be for BE and gen_altivec_vmrglb_direct_le for LE.
  (altivec_vmrglb): Adjust by calling gen_altivec_vmrglb_direct_be for BE and gen_altivec_vmrghb_direct_le for LE.
  * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Replace CODE_FOR_altivec_vmrghb_direct by CODE_FOR_altivec_vmrghb_direct_be for BE and CODE_FOR_altivec_vmrghb_direct_le for LE. And replace CODE_FOR_altivec_vmrglb_direct by CODE_FOR_altivec_vmrglb_direct_be for BE and CODE_FOR_altivec_vmrglb_direct_le for LE.

gcc/testsuite/ChangeLog:

  * gcc.target/powerpc/pr106069-1.c: New test.
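The char-merge commit log says that vec_mergeh maps to the "high" merge instruction on BE and to the "low" one on LE, while the element-order result stays the same. The Python sketch below models why that can work. It rests on two assumptions that are not stated in the commit log: on LE, element i of a vector sits at big-endian register position n-1-i, and the LE variant issues the low-merge instruction with its two inputs swapped. All function names are hypothetical.

```python
def hw_merge_high(va, vb):
    """Hardware merge-high, in big-endian register numbering."""
    half = len(va) // 2
    out = []
    for i in range(half):
        out += [va[i], vb[i]]
    return out


def hw_merge_low(va, vb):
    """Hardware merge-low, in big-endian register numbering."""
    out = []
    for i in range(len(va) // 2, len(va)):
        out += [va[i], vb[i]]
    return out


def to_register(elems, little_endian):
    """Element order -> big-endian register numbering (assumed reversal on LE)."""
    return list(reversed(elems)) if little_endian else list(elems)


def from_register(reg, little_endian):
    """Big-endian register numbering -> element order (reversal is its own inverse)."""
    return to_register(reg, little_endian)


def vec_mergeh(a, b, little_endian):
    """Merge the first halves of a and b in element order."""
    ra = to_register(a, little_endian)
    rb = to_register(b, little_endian)
    if little_endian:
        # Assumption: LE uses the low-merge insn with swapped inputs.
        reg = hw_merge_low(rb, ra)
    else:
        reg = hw_merge_high(ra, rb)
    return from_register(reg, little_endian)


a = [f"a{i}" for i in range(16)]
b = [f"b{i}" for i in range(16)]
expected = [x for pair in zip(a[:8], b[:8]) for x in pair]

print(vec_mergeh(a, b, little_endian=False) == expected)
print(vec_mergeh(a, b, little_endian=True) == expected)
```

Under these assumptions both endiannesses produce the same element-order result from different instructions, which is consistent with the commit log's point that the RTL pattern should be identical on BE and LE even though the mapped insns differ.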
[Bug target/115355] [12/13/14/15 Regression] vectorization exposes wrong code on P9 LE starting from r12-4496
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

--- Comment #15 from GCC Commits ---
The master branch has been updated by Kewen Lin:

https://gcc.gnu.org/g:812c70bf4981958488331d4ea5af8709b5321da1

commit r15-1645-g812c70bf4981958488331d4ea5af8709b5321da1
Author: Kewen Lin
Date: Wed Jun 26 02:16:17 2024 -0500

rs6000: Fix wrong RTL patterns for vector merge high/low short on LE

Commit r12-4496 changed some define_expands and define_insns for vector merge high/low short, namely altivec_vmrg[hl]h. These are mainly used by the built-in function vec_merge{h,l} and by some internal gen functions. These functions must take endianness into account. Taking vec_mergeh as an example, PVIPR defines it as "Merges the first halves (in element order) of two vectors", and it explicitly says element order. So it is mapped to vmrghh on BE and to vmrglh on LE. Although the mapped insns differ, per the discussion in PR106069 the RTL pattern should still be the same on both. That held before commit r12-4496, but starting from commit r12-4496 the patterns differ between BE and LE. Similar to the 32-bit element case in the commit log of r15-1504, this 16-bit element pattern on LE doesn't actually match what the underlying insn is intended to represent, so once an optimization like combine makes changes based on it, the consequences are unexpected. The newly constructed test case pr106069-2.c is a typical example of this issue for element type short.

So this patch fixes the wrong RTL pattern and ensures the associated RTL patterns become the same as before, so that they have the same semantics as their mapped insns. With the proposed patch, expanders like altivec_vmrghh expand into altivec_vmrghh_direct_be or altivec_vmrglh_direct_le depending on endianness. "direct" makes it easy to see which insn will be generated; _be and _le mainly distinguish the different RTL patterns per endianness.

Co-authored-by: Xionghu Luo

PR target/106069
PR target/115355

gcc/ChangeLog:

  * config/rs6000/altivec.md (altivec_vmrghh_direct): Rename to ...
  (altivec_vmrghh_direct_be): ... this. Add condition BYTES_BIG_ENDIAN.
  (altivec_vmrghh_direct_le): New define_insn.
  (altivec_vmrglh_direct): Rename to ...
  (altivec_vmrglh_direct_be): ... this. Add condition BYTES_BIG_ENDIAN.
  (altivec_vmrglh_direct_le): New define_insn.
  (altivec_vmrghh): Adjust by calling gen_altivec_vmrghh_direct_be for BE and gen_altivec_vmrglh_direct_le for LE.
  (altivec_vmrglh): Adjust by calling gen_altivec_vmrglh_direct_be for BE and gen_altivec_vmrghh_direct_le for LE.
  (vec_widen_umult_hi_v16qi): Adjust the call to gen_altivec_vmrghh_direct by gen_altivec_vmrghh for BE and by gen_altivec_vmrglh for LE.
  (vec_widen_smult_hi_v16qi): Likewise.
  (vec_widen_umult_lo_v16qi): Adjust the call to gen_altivec_vmrglh_direct by gen_altivec_vmrglh for BE and by gen_altivec_vmrghh for LE.
  (vec_widen_smult_lo_v16qi): Likewise.
  * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Replace CODE_FOR_altivec_vmrghh_direct by CODE_FOR_altivec_vmrghh_direct_be for BE and CODE_FOR_altivec_vmrghh_direct_le for LE. And replace CODE_FOR_altivec_vmrglh_direct by CODE_FOR_altivec_vmrglh_direct_be for BE and CODE_FOR_altivec_vmrglh_direct_le for LE.

gcc/testsuite/ChangeLog:

  * gcc.target/powerpc/pr106069-2.c: New test.
[Bug target/115355] [12/13/14/15 Regression] vectorization exposes wrong code on P9 LE starting from r12-4496
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

--- Comment #13 from GCC Commits ---
The master branch has been updated by Kewen Lin:

https://gcc.gnu.org/g:52c112800d9f44457c4832309a48c00945811313

commit r15-1504-g52c112800d9f44457c4832309a48c00945811313
Author: Kewen Lin
Date: Thu Jun 20 20:23:56 2024 -0500

rs6000: Fix wrong RTL patterns for vector merge high/low word on LE

Commit r12-4496 changed some define_expands and define_insns for vector merge high/low word, namely altivec_vmrg[hl]w and vsx_xxmrg[hl]w_<mode>. These are mainly used by the built-in functions vec_merge{h,l}, __builtin_vsx_xxmrghw and __builtin_vsx_xxmrghw_4si, and by some internal gen functions. These functions must take endianness into account. Taking vec_mergeh as an example, PVIPR defines it as "Merges the first halves (in element order) of two vectors", and it explicitly says element order. So it is mapped to vmrghw on BE and to vmrglw on LE. Although the mapped insns differ, per the discussion in PR106069 the RTL pattern should still be the same on both. That held before commit r12-4496, when define_expand altivec_vmrghw expanded into:

  (vec_select:VSX_W
     (vec_concat:
        (match_operand:VSX_W 1 "register_operand" "wa,v")
        (match_operand:VSX_W 2 "register_operand" "wa,v"))
     (parallel [(const_int 0) (const_int 4)
                (const_int 1) (const_int 5)])))]

on both BE and LE. But commit r12-4496 changed it to expand into:

  (vec_select:VSX_W
     (vec_concat:
        (match_operand:VSX_W 1 "register_operand" "wa,v")
        (match_operand:VSX_W 2 "register_operand" "wa,v"))
     (parallel [(const_int 0) (const_int 4)
                (const_int 1) (const_int 5)])))]

on BE, and

  (vec_select:VSX_W
     (vec_concat:
        (match_operand:VSX_W 1 "register_operand" "wa,v")
        (match_operand:VSX_W 2 "register_operand" "wa,v"))
     (parallel [(const_int 2) (const_int 6)
                (const_int 3) (const_int 7)])))]

on LE. Although the mapped insns are still vmrghw on BE and vmrglw on LE, the associated RTL pattern is completely wrong and inconsistent with the mapped insn.

If optimization passes leave this pattern alone, it is still fine even though the pattern doesn't represent its mapped insn, which is why simple testing on the bif doesn't expose this issue. But once an optimization pass such as combine makes changes based on this wrong pattern, the result is unexpected, because the pattern doesn't match the semantics that the expanded insn is intended to represent.

So this patch fixes the wrong RTL pattern and ensures the associated RTL patterns become the same as before, so that they have the same semantics as their mapped insns. With the proposed patch, expanders like altivec_vmrghw expand into altivec_vmrghw_direct_v4si_be or altivec_vmrglw_direct_v4si_le depending on endianness. "direct" makes it easy to see which insn will be generated; _be and _le mainly distinguish the different RTL patterns per endianness.

Co-authored-by: Xionghu Luo

PR target/106069
PR target/115355

gcc/ChangeLog:

  * config/rs6000/altivec.md (altivec_vmrghw_direct_<mode>): Rename to ...
  (altivec_vmrghw_direct_<mode>_be): ... this. Add the condition BYTES_BIG_ENDIAN.
  (altivec_vmrghw_direct_<mode>_le): New define_insn.
  (altivec_vmrglw_direct_<mode>): Rename to ...
  (altivec_vmrglw_direct_<mode>_be): ... this. Add the condition BYTES_BIG_ENDIAN.
  (altivec_vmrglw_direct_<mode>_le): New define_insn.
  (altivec_vmrghw): Adjust by calling gen_altivec_vmrghw_direct_v4si_be for BE and gen_altivec_vmrglw_direct_v4si_le for LE.
  (altivec_vmrglw): Adjust by calling gen_altivec_vmrglw_direct_v4si_be for BE and gen_altivec_vmrghw_direct_v4si_le for LE.
  (vec_widen_umult_hi_v8hi): Adjust the call to gen_altivec_vmrghw_direct_v4si by gen_altivec_vmrghw for BE and by gen_altivec_vmrglw for LE.
  (vec_widen_smult_hi_v8hi): Likewise.
  (vec_widen_umult_lo_v8hi): Adjust the call to gen_altivec_vmrglw_direct_v4si by gen_altivec_vmrglw for BE and by gen_altivec_vmrghw for LE.
  (vec_widen_smult_lo_v8hi): Likewise.
  * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Replace CODE_FOR_altivec_vmrghw_direct_v4si by CODE_FOR_altivec_vmrghw_direct_v4si_be for BE and CODE_FOR_altivec_vmrghw_direct_v4si_le for LE. And replace CODE_FOR_altivec_vmrglw_direct_v4si by CODE_FOR_altivec_vmrglw_direct_v4si_be for BE and CODE_FOR_altivec_vmrglw_direct_v4si_le for LE.
  *
[Bug target/115355] [12/13/14/15 Regression] vectorization exposes wrong code on P9 LE starting from r12-4496
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|12.4                        |12.5

--- Comment #12 from Richard Biener ---
GCC 12.4 is being released, retargeting bugs to GCC 12.5.
[Bug target/115355] [12/13/14/15 Regression] vectorization exposes wrong code on P9 LE starting from r12-4496
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

--- Comment #11 from Kewen Lin ---
(In reply to Jens Seifert from comment #10)
> Does this affect both loop vectorization and SLP vectorization?
>
> -fno-tree-loop-vectorize prevents loop vectorization and works around
> this issue. Does the same problem also affect SLP vectorization, which
> does not take place in this sample?
>
> In other words, do I need
> -fno-tree-loop-vectorize
> or
> -fno-tree-vectorize
> to work around this bug?

Since this is an issue in the vector merge insn patterns in target code and vectorization just exposes it, it's hard to work around this bug completely just by disabling both loop and SLP vectorization. As the related bug PR106069 shows, it's still possible to hit this issue even without vectorization, when using some vec merge built-ins. But I'd expect that disabling both loop and SLP vectorization (-fno-tree-vectorize) greatly reduces the possibility of encountering it.
[Bug target/115355] [12/13/14/15 Regression] vectorization exposes wrong code on P9 LE starting from r12-4496
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

--- Comment #10 from Jens Seifert ---
Does this affect both loop vectorization and SLP vectorization?

-fno-tree-loop-vectorize prevents loop vectorization and works around this issue. Does the same problem also affect SLP vectorization, which does not take place in this sample?

In other words, do I need
-fno-tree-loop-vectorize
or
-fno-tree-vectorize
to work around this bug?
[Bug target/115355] [12/13/14/15 Regression] vectorization exposes wrong code on P9 LE starting from r12-4496
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

--- Comment #9 from Kewen Lin ---
(In reply to Peter Bergner from comment #7)
> The test fails when setToIdentityBAD's index var is unsigned int. It passes
> when using unsigned long long, unsigned long, unsigned short and unsigned
> char. When using unsigned long long/unsigned long, we do not vectorize the
> loop.

unsigned {long ,}long fails to vectorize due to cost modeling:

  missed: cost model: the vector iteration cost = 2 divided by the scalar iteration cost = 1 is greater or equal to the vectorization factor = 2.
  missed: not vectorized: vectorization not profitable.

It can be forced with -fno-vect-cost-model.

> We vectorize the loop when using unsigned int/short/char. The
> vectorized code is a little strange, in that the smaller the integer type we
> use for the index var, the more code we generate.
>
> The vectorized code for unsigned char is truly huge! ...although it does
> seem to work correctly. I'm attaching the "unsigned char i" code gen for
> setToIdentityBAD for people to examine. Even though it gives "correct"
> results, it can't really be the code we want to generate, correct???

It's due to aggressive unrolling: there is one early check that the loop bound is between 16 and 255, and then cunroll completely unrolls the loop for each multiple of 16 (15 loops in total). A more compact version of the code can be generated with -fdisable-tree-cunroll.
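The two numbers in comment #9 can be checked with a few lines of Python. This is only a sketch of the single comparison quoted in the dump message, not GCC's actual cost model, and `vectorization_profitable` is a hypothetical name. The count of unrolled loops assumes that "each 16 multiples" means the multiples of 16 within the checked bound range of 16 to 255.

```python
def vectorization_profitable(vector_iter_cost, scalar_iter_cost, vf):
    """Comparison quoted in the dump: reject when vector/scalar cost >= VF."""
    return vector_iter_cost / scalar_iter_cost < vf


# Values from the dump message for the unsigned long (long) index variable.
profitable = vectorization_profitable(vector_iter_cost=2, scalar_iter_cost=1, vf=2)
print("profitable:", profitable)

# Multiples of 16 within the checked loop-bound range [16, 255].
unrolled_bounds = list(range(16, 256, 16))
print("unrolled copies:", len(unrolled_bounds))
```

With a cost ratio of 2 and a vectorization factor of 2 the comparison fails, which matches the "not profitable" message, and the range 16 to 255 contains 15 multiples of 16, matching the 15 loops mentioned in the reply.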
[Bug target/115355] [12/13/14/15 Regression] vectorization exposes wrong code on P9 LE starting from r12-4496
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
[Bug target/115355] [12/13/14/15 Regression] vectorization exposes wrong code on P9 LE starting from r12-4496
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

--- Comment #8 from Kewen Lin ---
(In reply to Peter Bergner from comment #5)
> FYI, fails for me with gcc 12 and later and works with gcc 11. It also
> fails with -O3 -mcpu=power10.

Thanks for the information. Bisection shows r12-4496 is the culprit commit. I just tested and confirmed that Xionghu's latest patch for PR106069 also fixes this one.

- latest rev. for his fix: https://inbox.sourceware.org/gcc-patches/20230210025952.1887696-1-xionghu...@tencent.com/, which was resent from https://inbox.sourceware.org/gcc-patches/37b57a54-f98e-96a3-edff-866c8aae4...@gmail.com/
- original thread and some discussions: https://inbox.sourceware.org/gcc-patches/20220808034247.2618809-1-xionghu...@tencent.com/

The latest rev. looked to me as (https://inbox.sourceware.org/gcc-patches/e8e69f0c-7f36-e671-6c3b-74401e4d8...@linux.ibm.com/), still looking forward to Segher's review and approval on this.