[Bug target/113859] popcount HI can be vectorized for non-SVE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113859

Pengxuan Zheng changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #6 from Pengxuan Zheng ---
Fixed.
[Bug target/113859] popcount HI can be vectorized for non-SVE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113859

--- Comment #5 from GCC Commits ---
The master branch has been updated by Pengxuan Zheng:

https://gcc.gnu.org/g:895bbc08d38c2aca3cbbab273a247021fea73930

commit r15-1801-g895bbc08d38c2aca3cbbab273a247021fea73930
Author: Pengxuan Zheng
Date:   Wed Jun 12 18:23:13 2024 -0700

    aarch64: Add vector popcount besides QImode [PR113859]

    This patch improves GCC's vectorization of __builtin_popcount for the
    aarch64 target by adding popcount patterns for vector modes besides
    QImode, i.e., HImode, SImode and DImode.

    With this patch, we now generate the following for V8HI:
      cnt     v1.16b, v0.16b
      uaddlp  v2.8h, v1.16b

    For V4HI, we generate:
      cnt     v1.8b, v0.8b
      uaddlp  v2.4h, v1.8b

    For V4SI, we generate:
      cnt     v1.16b, v0.16b
      uaddlp  v2.8h, v1.16b
      uaddlp  v3.4s, v2.8h

    For V4SI with TARGET_DOTPROD, we generate the following instead:
      movi    v0.4s, #0
      movi    v1.16b, #1
      cnt     v3.16b, v2.16b
      udot    v0.4s, v3.16b, v1.16b

    For V2SI, we generate:
      cnt     v1.8b, v0.8b
      uaddlp  v2.4h, v1.8b
      uaddlp  v3.2s, v2.4h

    For V2SI with TARGET_DOTPROD, we generate the following instead:
      movi    v0.8b, #0
      movi    v1.8b, #1
      cnt     v3.8b, v2.8b
      udot    v0.2s, v3.8b, v1.8b

    For V2DI, we generate:
      cnt     v1.16b, v0.16b
      uaddlp  v2.8h, v1.16b
      uaddlp  v3.4s, v2.8h
      uaddlp  v4.2d, v3.4s

    For V2DI with TARGET_DOTPROD, we generate the following instead:
      movi    v0.4s, #0
      movi    v1.16b, #1
      cnt     v3.16b, v2.16b
      udot    v0.4s, v3.16b, v1.16b
      uaddlp  v0.2d, v0.4s

            PR target/113859

    gcc/ChangeLog:

            * config/aarch64/aarch64-simd.md (aarch64_addlp): Rename to...
            (@aarch64_addlp): ... This.
            (popcount2): New define_expand.

    gcc/testsuite/ChangeLog:

            * gcc.target/aarch64/popcnt-udot.c: New test.
            * gcc.target/aarch64/popcnt-vec.c: New test.

    Signed-off-by: Pengxuan Zheng
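For readers who want to try reproducing the sequences listed in the commit message, a minimal sketch of loops covering the HImode, SImode and DImode cases might look like the following. The function names are hypothetical and are not taken from the new tests. Whether a given compiler actually emits cnt/uaddlp or udot for them depends on its version and on options such as the optimization level and the -march setting (for example, one that enables the dotprod extension).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper names, for illustration only. */

/* 16-bit elements: a candidate for the V8HI/V4HI popcount patterns. */
void popcount_u16(const uint16_t *__restrict src,
                  uint16_t *__restrict dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint16_t)__builtin_popcount(src[i]);
}

/* 32-bit elements: a candidate for the V4SI/V2SI popcount patterns. */
void popcount_u32(const uint32_t *__restrict src,
                  uint32_t *__restrict dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint32_t)__builtin_popcount(src[i]);
}

/* 64-bit elements: a candidate for the V2DI popcount pattern. */
void popcount_u64(const uint64_t *__restrict src,
                  uint64_t *__restrict dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint64_t)__builtin_popcountll(src[i]);
}
```

Comparing the assembly produced with and without a dotprod-enabled -march value is one way to see which of the two sequences was chosen for the 32-bit and 64-bit loops.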
[Bug target/113859] popcount HI can be vectorized for non-SVE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113859

--- Comment #4 from Andrew Pinski ---
(In reply to Andrew Pinski from comment #3)
> Patch was posted:
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650311.html

Latest patch:
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653405.html
[Bug target/113859] popcount HI can be vectorized for non-SVE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113859

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |patch
                URL|                            |https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650311.html

--- Comment #3 from Andrew Pinski ---
Patch was posted:
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650311.html
[Bug target/113859] popcount HI can be vectorized for non-SVE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113859

Andrew Pinski changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                   |ASSIGNED
     Ever confirmed|0                             |1
           Assignee|unassigned at gcc dot gnu.org |pinskia at gcc dot gnu.org
   Last reconfirmed|                              |2024-03-05

--- Comment #2 from Andrew Pinski ---
Mine.
[Bug target/113859] popcount HI can be vectorized for non-SVE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113859

--- Comment #1 from Andrew Pinski ---
SI (and DI) can be optimized too.
LLVM produces for int:
```
        ldr     d0, [x0]
        cnt     v0.8b, v0.8b
        uaddlp  v0.4h, v0.8b
        uaddlp  v0.2s, v0.4h
        str     d0, [x1]
        ret
```

And for long:
```
        ldr     q0, [x0]
        cnt     v0.16b, v0.16b
        uaddlp  v0.8h, v0.16b
        uaddlp  v0.4s, v0.8h
        uaddlp  v0.2d, v0.4s
        str     q0, [x1]
        ret
```

That is for the SLP version:
```
void f(unsigned long * __restrict b, unsigned long * __restrict d)
{
  d[0] = __builtin_popcountll(b[0]);
  d[1] = __builtin_popcountll(b[1]);
}
```
s/long/int/ in the first case.

Note that using SVE is better than the above if it is available, but that is
part of PR 113860.
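Spelled out, the int variant referred to by "s/long/int/" would presumably look like the sketch below. The name f_int is hypothetical, and the use of __builtin_popcount (rather than __builtin_popcountll) is an assumption about what the substitution was meant to produce.

```c
/* Hypothetical int counterpart of f() from the comment above.
   Assumes the 32-bit builtin is intended for unsigned int operands. */
void f_int(unsigned int * __restrict b, unsigned int * __restrict d)
{
  d[0] = __builtin_popcount(b[0]);
  d[1] = __builtin_popcount(b[1]);
}
```

The two adjacent stores are what make this an SLP candidate: both lanes fit in one 64-bit vector register, which matches the d0 load and store in the LLVM output for int.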