On Tue, 27 Feb 2024, haochen.jiang wrote:

> On Linux/x86_64,
>
> af66ad89e8169f44db723813662917cf4cbb78fc is the first bad commit
> commit af66ad89e8169f44db723813662917cf4cbb78fc
> Author: Richard Biener <rguent...@suse.de>
> Date:   Fri Feb 23 16:06:05 2024 +0100
>
>     middle-end/114070 - folding breaking VEC_COND expansion
>
> caused
>
> FAIL: gcc.dg/tree-ssa/andnot-2.c scan-tree-dump-not forwprop3 "_expr"

This shows that the x86 backend is missing vcond_mask_qiqi and friends
(for AVX512 mask modes).  Either that, or expand_vec_cond_expr_p and
all the machinery behind it (ISEL pass, lowering) should handle pure
integer mode VEC_COND_EXPR via bit operations.  I think quite a few
targets now implement patterns for these variants, whatever their
boolean vector modes are.

One complication with the change, which was

 (simplify
  (op @3 (vec_cond:s @0 @1 @2))
-  (vec_cond @0 (op! @3 @1) (op! @3 @2))))
+  (if (TREE_CODE_CLASS (op) != tcc_comparison
+       || types_match (type, TREE_TYPE (@1))
+       || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK))
+   (vec_cond @0 (op! @3 @1) (op! @3 @2)))))

is that expand_vec_cond_expr_p can also handle comparison-defined
masks, but whether or not we have one isn't visible here, so we can
only check whether vcond_mask expansion would work.

We have optimize_vectors_before_lowering_p, but even there we shouldn't
turn supported ops into unsupported ones and, as said, what's supported
or not cannot be finally decided (if it's only vcond and not vcond_mask
that is supported).  Also, optimize_vectors_before_lowering_p is set
only for a short time between vectorization and vector lowering, and we
definitely do not want to turn supported vectorizer-emitted stmts into
ones that we need to lower.

For GCC 15 we should look at moving vector lowering before
vectorization (before loop optimization, I'd say) to close this
particular hole (and also to reliably ICE when the vectorizer creates
unsupported IL).  We also definitely want to retire the vcond expanders
(no target I know of supports single-instruction compare-and-select).

So short term we either live with this regression (the testcase
verifies we perform constant folding to { 0, 0 }), implement the four
missing patterns (qi, hi, si and di missing value mode vcond_mask
patterns), or look at implementing generic code for this.

Given precedent I'd tend towards adding the x86 patterns.

Hongtao, can you handle that?

Thanks,
Richard.
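
For reference, the "bit operations" alternative mentioned above can be
sketched in plain C.  This is only an illustration, assuming a QImode
mask where each bit is one boolean lane; the helper names
vec_cond_mask8 and vec_cond_mask8_ref are hypothetical and are not GCC
functions, and the real change would live in the expander, ISEL or
lowering code rather than look like this.

```c
#include <stdint.h>

/* Hypothetical helper, not GCC code: models VEC_COND_EXPR <mask, a, b>
   where all three operands are 8-lane boolean vectors held in one
   8-bit integer (one bit per lane, as with a QImode mask).  */
static uint8_t
vec_cond_mask8 (uint8_t mask, uint8_t a, uint8_t b)
{
  /* Lanes where MASK is set come from A, all other lanes from B.  */
  return (uint8_t) ((mask & a) | (~mask & b));
}

/* Reference version that selects lane by lane, to check the bitwise
   form against.  */
static uint8_t
vec_cond_mask8_ref (uint8_t mask, uint8_t a, uint8_t b)
{
  uint8_t r = 0;
  for (int lane = 0; lane < 8; lane++)
    {
      uint8_t bit = (uint8_t) (1u << lane);
      r |= (mask & bit) ? (a & bit) : (b & bit);
    }
  return r;
}
```

The same and/andnot/or sequence would apply unchanged to the hi, si and
di mask modes, which is why generic code is an option at all; the
target patterns would presumably expand to much the same operations on
mask registers.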