https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118151
Bug ID: 118151
Summary: Relax the SVE PTEST matching conditions for any/none (ne/eq)
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: aarch64-sve, missed-optimization
Severity: enhancement
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: rsandifo at gcc dot gnu.org
CC: tnfchris at gcc dot gnu.org
Target Milestone: ---
Target: aarch64*-*-*
All our current PTEST combiner patterns are for the general CC_NZC case, where
the eventual condition could be first/not-first/last/not-last/any/none. For
this general case, it is usually only possible to fold a PTEST into a previous
(potentially) flag-setting instruction if both instructions have the same
governing predicate.
However, for the simple any/none (ne/eq) case, it's enough for the PTEST's
governing predicate (gp) to be a superset of the other instruction's gp. In
particular, we can always fold if the PTEST is predicated on a PTRUE for the
same element width as the other instruction, or narrower.
The failure to handle this case is causing us to miss many folds, both in ACLE
code and in early-break tests.
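To illustrate why only Z survives a change of governing predicate, here is a small C model of the PTEST flags. It assumes a 256-bit vector length (32 predicate bits, one per byte lane) and models only byte-granular predicates; the names model_pred, model_flags, model_find_bit and model_ptest are hypothetical and are not GCC or ACLE interfaces. N and C depend on which elements the governing predicate makes first and last, whereas Z only asks whether any active element is true, so Z is unchanged when the PTEST gp is a superset of the predicate that zeroed the result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: a 256-bit vector length gives 32 predicate bits,
   where bit i corresponds to byte lane i.  */
typedef uint32_t model_pred;

/* SVE-style condition flags set by PTEST GP, P:
   N: the first active element of P is true,
   Z: no active element of P is true,
   C: the last active element of P is NOT true.  */
struct model_flags {
  bool n, z, c;
};

/* Index of the lowest (FROM_TOP false) or highest (FROM_TOP true)
   set bit of M, or -1 if M is zero.  */
static int model_find_bit(model_pred m, bool from_top)
{
  for (int k = 0; k < 32; k++) {
    int i = from_top ? 31 - k : k;
    if ((m >> i) & 1u)
      return i;
  }
  return -1;
}

static struct model_flags model_ptest(model_pred gp, model_pred p)
{
  /* With no active elements, PTEST sets N=0, Z=1, C=1.  */
  struct model_flags f = { false, true, true };
  if (gp == 0)
    return f;
  int first = model_find_bit(gp, false);
  int last = model_find_bit(gp, true);
  assert(first >= 0 && last >= first);
  f.n = ((p >> first) & 1u) != 0;
  f.z = (gp & p) == 0;
  f.c = ((p >> last) & 1u) == 0;
  return f;
}
```

For example, for a zeroing result p = 0x2 produced under pg = 0x6, model_ptest(0x6, 0x2) has N set, but model_ptest(0xffffffff, 0x2) has N clear, because byte lane 0 becomes the first active element. Z is clear in both cases.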
I think it could be handled by using CC_Z for ne/eq and relaxing
aarch64_sve_same_pred_for_ptest_p for that case. It might even be a relatively
simple change.
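A minimal sketch of the relaxed condition follows. It is expressed over a hypothetical description of the two governing predicates rather than over RTL. The names model_cc_mode, model_gp, model_same_gp and model_can_fold_ptest are illustrative only and do not reflect the real signature or logic of aarch64_sve_same_pred_for_ptest_p.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the CC modes used by the PTEST patterns.  */
enum model_cc_mode { MODEL_CC_NZC, MODEL_CC_Z };

/* Hypothetical description of a governing predicate: either a PTRUE
   "all" for ESIZE-byte elements, or some other predicate known only by
   an identifier.  */
struct model_gp {
  bool is_ptrue;
  unsigned esize; /* 1, 2, 4 or 8 when is_ptrue */
  int id;         /* used when !is_ptrue */
};

static bool model_same_gp(struct model_gp a, struct model_gp b)
{
  if (a.is_ptrue != b.is_ptrue)
    return false;
  return a.is_ptrue ? a.esize == b.esize : a.id == b.id;
}

/* Can a PTEST governed by PTEST_GP reuse the flags of an earlier
   flag-setting instruction governed by INSN_GP, whose elements are
   INSN_ESIZE bytes wide?  */
static bool model_can_fold_ptest(enum model_cc_mode mode,
                                 struct model_gp ptest_gp,
                                 struct model_gp insn_gp,
                                 unsigned insn_esize)
{
  assert(insn_esize == 1 || insn_esize == 2 || insn_esize == 4
         || insn_esize == 8);
  /* General first/last/any/none case: the predicates must match.  */
  if (model_same_gp(ptest_gp, insn_gp))
    return true;
  /* any/none only need Z, so a PTRUE that covers every element of the
     earlier instruction (same element size or narrower) is enough.  */
  return mode == MODEL_CC_Z && ptest_gp.is_ptrue
         && ptest_gp.esize <= insn_esize;
}
```

Because element sizes are powers of two, esize <= insn_esize means the PTRUE sets the lowest bit of every element of the earlier instruction, which is the superset condition described above.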
For example:
#include <arm_sve.h>
int
foo (svbool_t pg, svint32_t x, svint32_t y)
{
return svptest_any(svptrue_b8(), svcmpeq(pg, x, y));
}
currently generates:
ptrue p3.b, all
cmpeq p0.s, p0/z, z0.s, z1.s
ptest p3, p0.b
cset w0, any
ret
where the ptest and ptrue are redundant. The same is true with svptrue_b8
replaced by svptrue_b16 or svptrue_b32, but not with svptrue_b64, which only
covers every other 32-bit element. (LLVM optimises the svptrue_b32 case, but
not the others.)
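The element-width condition in this example can be checked with a small scalar model. It assumes a 256-bit vector length (eight 32-bit elements, one predicate bit per byte lane) and models only CMPEQ on .s elements. The names model_ptrue, model_cmpeq_s, model_ptest_any and model_cmpeq_any are hypothetical helpers, not ACLE functions. CMPEQ .s only ever sets bit 4*i of its result, so a PTRUE for 8-, 16- or 32-bit elements covers every bit that can be set, whereas a PTRUE for 64-bit elements misses the odd-numbered 32-bit elements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: 256-bit vector length, one predicate bit per
   byte lane.  */
typedef uint32_t model_pred;

/* PTRUE pN.<T>, all: set the lowest bit of every ESIZE-byte element.  */
static model_pred model_ptrue(unsigned esize)
{
  assert(esize == 1 || esize == 2 || esize == 4 || esize == 8);
  model_pred m = 0;
  for (unsigned i = 0; i < 32; i += esize)
    m |= (model_pred)1 << i;
  return m;
}

/* CMPEQ pD.s, pG/z, zN.s, zM.s on eight 32-bit elements: element i is
   active if bit 4*i of PG is set; inactive elements are zeroed.  */
static model_pred model_cmpeq_s(model_pred pg, const int32_t x[8],
                                const int32_t y[8])
{
  model_pred r = 0;
  for (unsigned i = 0; i < 8; i++)
    if (((pg >> (4 * i)) & 1u) && x[i] == y[i])
      r |= (model_pred)1 << (4 * i);
  return r;
}

/* The "any" condition (Z clear) after PTEST GP, P.  */
static bool model_ptest_any(model_pred gp, model_pred p)
{
  return (gp & p) != 0;
}

/* The "any" condition from the flags set by CMPEQ itself, which tests
   its result under its own governing predicate at .s granularity.  */
static bool model_cmpeq_any(model_pred pg, model_pred r)
{
  return (pg & model_ptrue(4) & r) != 0;
}
```

With x and y equal only in element 1 and pg = model_ptrue(4), the result is 0x10. The flags from CMPEQ report "any", as does a PTEST under model_ptrue(1), model_ptrue(2) or model_ptrue(4), but a PTEST under model_ptrue(8) reports "none".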
We should try to make it so that two tests of the same result, such as
svptest_last and svptest_any, both still use the same PTEST, even if they
initially use different CC modes.