https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110660
Bug ID: 110660
Summary: conditional length reduction optimization
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: juzhe.zhong at rivai dot ai
Target Milestone: ---

Consider the following test:

#include <stdint.h>

int __attribute__((noipa))
add_loop (int32_t * __restrict x, int32_t n, int res, int * __restrict cond)
{
  for (int i = 0; i < n; ++i)
    if (cond[i])
      res += x[i];
  return res;
}

GCC can currently vectorize this reduction for RVV:

<bb 4> [local count: 630715945]:
  ...
  _59 = .SELECT_VL (ivtmp_57, POLY_INT_CST [4, 4]);
  ivtmp_40 = _59 * 4;
  vect__4.10_43 = .LEN_MASK_LOAD (vectp_cond.8_41, 32B, _59, 0, { -1, ... });
  mask__18.11_45 = vect__4.10_43 != { 0, ... };
  vect__7.14_49 = .LEN_MASK_LOAD (vectp_x.12_47, 32B, _59, 0, mask__18.11_45);
  vect__ifc__33.15_51 = .VCOND_MASK (mask__18.11_45, vect__7.14_49, { 0, ... });
  vect__34.16_52 = .COND_LEN_ADD ({ -1, ... }, vect_res_19.7_38, vect__ifc__33.15_51, vect_res_19.7_38, _59, 0);
  ...

<bb 5> [local count: 105119324]:
  _54 = .REDUC_PLUS (vect__34.16_52);
  _55 = res_11(D) + _54;

Actually, we can optimize "VCOND_MASK + COND_LEN_ADD" into a single
"COND_LEN_ADD" by replacing the arguments of "COND_LEN_ADD".

Consider the following pattern:

  dummy_mask = { -1, ... }
  dummy_else_value = { 0, ... } ;; This is a dummy for PLUS, since a + 0 = a
  op1_2 = .VCOND_MASK (control_mask, op1_1, dummy_else_value);
  result = .COND_LEN_ADD (dummy_mask, op0, op1_2, op2, loop_len, bias)

Since it uses dummy_mask and dummy_else_value, we can simplify this
operation into:

  result = .COND_LEN_ADD (control_mask, op0, op1_1, op2, loop_len, bias)

This optimization could be done either in the middle-end ("match.pd") or in
the backend ("combine pass"). Which approach is better?

Thanks.
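
To make the proposed rewrite concrete, here is a minimal scalar sketch that compares the two forms lane by lane. It is not GCC code. `LANES`, `vcond_mask`, `cond_len_add` and `count_mismatches` are hypothetical names for this illustration, and the model assumes that lanes at or beyond `len + bias`, as well as lanes with a clear mask bit, take the else value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LANES 4  /* hypothetical fixed vector length for this model */

/* Scalar model of .VCOND_MASK (mask, a, b): per lane, mask ? a : b.  */
static void
vcond_mask (const bool *mask, const int32_t *a, const int32_t *b,
            int32_t *out)
{
  for (int i = 0; i < LANES; ++i)
    out[i] = mask[i] ? a[i] : b[i];
}

/* Scalar model of .COND_LEN_ADD (mask, a, b, els, len, bias): lanes below
   len + bias with a set mask bit compute a + b; all other lanes take els.
   Inputs are assumed small enough that a + b does not overflow.  */
static void
cond_len_add (const bool *mask, const int32_t *a, const int32_t *b,
              const int32_t *els, int len, int bias, int32_t *out)
{
  for (int i = 0; i < LANES; ++i)
    out[i] = (i < len + bias && mask[i]) ? a[i] + b[i] : els[i];
}

/* Count the lanes on which the two-statement form and the single-statement
   form disagree, over every control mask and every length 0..LANES.  */
int
count_mismatches (const int32_t *op0, const int32_t *op1,
                  const int32_t *els, int bias)
{
  bool dummy_mask[LANES];
  const int32_t dummy_else_value[LANES] = { 0 };
  int mismatches = 0;

  for (int i = 0; i < LANES; ++i)
    dummy_mask[i] = true;

  for (unsigned bits = 0; bits < (1u << LANES); ++bits)
    for (int len = 0; len <= LANES; ++len)
      {
        bool control_mask[LANES];
        int32_t op1_2[LANES], before[LANES], after[LANES];

        for (int i = 0; i < LANES; ++i)
          control_mask[i] = (bits >> i) & 1u;

        /* Before: VCOND_MASK feeding a COND_LEN_ADD under an all-true mask.  */
        vcond_mask (control_mask, op1, dummy_else_value, op1_2);
        cond_len_add (dummy_mask, op0, op1_2, els, len, bias, before);

        /* After: one COND_LEN_ADD that uses control_mask directly.  */
        cond_len_add (control_mask, op0, op1, els, len, bias, after);

        for (int i = 0; i < LANES; ++i)
          mismatches += before[i] != after[i];
      }
  return mismatches;
}
```

Under this model the two forms agree on every lane when the else operand (op2) is the same vector as op0, which is the case in the dump above, where both are vect_res_19.7_38. With an else vector that differs from op0, lanes that are within the length but masked off produce op0 in the first form and the else value in the second, so a match.pd or combine rule would presumably need to check that op2 equals op0.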