https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92884
Bug ID: 92884
Summary: [SVE] Add support for chained extract-last reductions
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rsandifo at gcc dot gnu.org
Target Milestone: ---
Extract-last (i.e. CLASTB) reductions can't yet handle chained
conditions, such as those seen in gcc.dg/vect/vect-cond-reduc-5.c.
We just fall back to the normal COND_REDUCTION handling instead.
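For illustration, a chained conditional reduction of this kind could come from a loop like the one below. This is a hypothetical sketch, not the contents of the testsuite file named above; the function name, the arrays and the two conditions are invented for the example.

```c
/* Hypothetical example: the reduction variable RES passes through two
   chained conditional selects in every iteration.  */
int
chained_last (const int *a, const int *b, int n, int init)
{
  int res = init;
  for (int i = 0; i < n; ++i)
    {
      /* First select: keep RES unless a[i] is negative.  */
      res = a[i] >= 0 ? res : a[i];
      /* Second select, chained on the result of the first.  */
      res = b[i] >= 0 ? res : b[i];
    }
  return res;
}
```

Each statement in the loop body corresponds to one COND_EXPR in the chain, and the final value is whatever the last taken update stored.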
If we have:
  res_0 = PHI <res_n(latch), init(entry)>;
  res_1 = COND_EXPR <cond_1, res_0, val_1>;
  res_2 = COND_EXPR <cond_2, res_1, val_2>;
  ...
  res_n = COND_EXPR <cond_n, res_{n-1}, val_n>;
one alternative would be (pseudo-code):
  res_0 = PHI <res_n(latch), init(entry)>;
  vec.res_1 = vec.val_1;
  vec.res_2 = VEC_COND_EXPR <vec.cond_2, vec.res_1, vec.val_2>;
  ...
  vec.res_n = VEC_COND_EXPR <vec.cond_n, vec.res_{n-1}, vec.val_n>;
  vec.cond_any = IOR_EXPR <vec.cond_1, ..., vec.cond_n>;
  res_n = .EXTRACT_LAST (res_0, vec.cond_any, vec.res_n);
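The scheme can be checked against the scalar chain with a small lane-by-lane model in plain C, here for n == 2. This is only a sketch under stated assumptions: VF, extract_last and vector_model are hypothetical names, not GCC internals. The mask passed to the extract-last step is assumed to mark lanes in which at least one select picks its val_i operand, which with the COND_EXPR operand order above means a lane where some cond_i is false.

```c
#include <stdbool.h>
#include <stddef.h>

enum { VF = 4 };  /* Hypothetical vectorization factor for the model.  */

/* Scalar reference for n == 2: COND_EXPR <cond, a, b> is cond ? a : b.  */
int
scalar_chain (const bool *c1, const bool *c2, const int *v1, const int *v2,
              size_t n, int init)
{
  int res = init;
  for (size_t i = 0; i < n; ++i)
    {
      res = c1[i] ? res : v1[i];
      res = c2[i] ? res : v2[i];
    }
  return res;
}

/* Model of the extract-last step: the element of VEC in the last lane
   whose MASK entry is set, or FALLBACK if no lane is active.  */
int
extract_last (int fallback, const bool *mask, const int *vec, size_t lanes)
{
  int res = fallback;
  for (size_t l = 0; l < lanes; ++l)
    if (mask[l])
      res = vec[l];
  return res;
}

/* Lane-by-lane model of the vector pseudo-code, VF elements at a time.  */
int
vector_model (const bool *c1, const bool *c2, const int *v1, const int *v2,
              size_t n, int init)
{
  int res = init;
  for (size_t i = 0; i < n; i += VF)
    {
      /* Lanes past the end of the data are simply left out.  */
      size_t lanes = n - i < (size_t) VF ? n - i : (size_t) VF;
      int vres[VF];
      bool active[VF];
      for (size_t l = 0; l < lanes; ++l)
        {
          /* vec.res_1 = vec.val_1: no select against res_0.  */
          int r1 = v1[i + l];
          /* vec.res_2 = VEC_COND_EXPR <vec.cond_2, vec.res_1, vec.val_2>.  */
          vres[l] = c2[i + l] ? r1 : v2[i + l];
          /* Assumed polarity: active when some select takes val_i.  */
          active[l] = !c1[i + l] || !c2[i + l];
        }
      /* res_n = .EXTRACT_LAST (res_0, vec.cond_any, vec.res_n).  */
      res = extract_last (res, active, vres, lanes);
    }
  return res;
}
```

In a lane where every select keeps its previous value, vres holds an unused val_1, but that lane is inactive in the mask, so the value never reaches the result. This is why the first select can be replaced by a plain copy in the model.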
Perhaps it would make sense to move the IFN_EXTRACT_LAST generation
from vectorizable_condition to vect_create_epilog_for_reduction.
The only thing vectorizable_condition would then need to do differently
from the COND_REDUCTION case is handle the special case of:
  vec.res_1 = vec.val_1;
instead of using a VEC_COND_EXPR between vec.val_1 and vec.res_0.
(res_0 isn't vectorised for EXTRACT_LAST_REDUCTION.)