On Tue, 27 Feb 2024, haochen.jiang wrote:

> On Linux/x86_64,
> 
> af66ad89e8169f44db723813662917cf4cbb78fc is the first bad commit
> commit af66ad89e8169f44db723813662917cf4cbb78fc
> Author: Richard Biener <rguent...@suse.de>
> Date:   Fri Feb 23 16:06:05 2024 +0100
> 
>     middle-end/114070 - folding breaking VEC_COND expansion
> 
> caused
> 
> FAIL: gcc.dg/tree-ssa/andnot-2.c scan-tree-dump-not forwprop3 "_expr"

This shows that the x86 backend is missing vcond_mask_qiqi and friends
(for AVX512 mask modes).  Either that or both expand_vec_cond_expr_p
and all the machinery behind it (ISEL pass, lowering) should handle
pure integer mode VEC_COND_EXPR via bit operations.  I think quite a
few targets now implement patterns for these variants, whatever their
boolean vector modes are.
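
To illustrate the second option: when the mask and both values live in
an integer mode, a VEC_COND_EXPR amounts to a bitwise select.  A minimal
sketch in plain C, not GCC internals; mask_select_qi is a made-up name
and uint8_t merely stands in for a QImode mask with one bit per element:

```c
#include <stdint.h>

/* Hypothetical helper, not GCC code: take each bit from A where MASK
   has a 1 and from B where MASK has a 0.  With an AVX512-style mask
   held in an integer mode, each bit is one boolean vector element,
   so this is the element-wise select MASK ? A : B.  */
static inline uint8_t
mask_select_qi (uint8_t mask, uint8_t a, uint8_t b)
{
  return (uint8_t) ((a & mask) | (b & ~mask));
}
```

Generic lowering along these lines would presumably emit the
equivalent BIT_AND_EXPR/BIT_NOT_EXPR/BIT_IOR_EXPR sequence on the
integer mode instead of requiring a vcond_mask pattern.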

One complication with the change, which was

  (simplify
   (op @3 (vec_cond:s @0 @1 @2))
-  (vec_cond @0 (op! @3 @1) (op! @3 @2))))
+  (if (TREE_CODE_CLASS (op) != tcc_comparison
+       || types_match (type, TREE_TYPE (@1))
+       || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK))
+   (vec_cond @0 (op! @3 @1) (op! @3 @2)))))

is that expand_vec_cond_expr_p can also handle masks defined by a
comparison, but whether or not we have one isn't visible here, so we
can only check whether vcond_mask expansion would work.

We have optimize_vectors_before_lowering_p, but even there we shouldn't
turn supported ops into unsupported ones and, as said, what's supported
or not cannot be finally decided here (if it's only vcond and not
vcond_mask that is supported).  Also, optimize_vectors_before_lowering_p
is set only for a short time between vectorization and vector lowering,
and we definitely do not want to turn supported vectorizer-emitted stmts
into ones that we need to lower.  For GCC 15 we should look at moving
vector lowering before vectorization (before loop optimization, I'd
say) to close this particular hole (and also reliably ICE when the
vectorizer creates unsupported IL).  We also definitely want to
retire the vcond expanders (no target I know of supports
single-instruction compare-and-select).

So short term we either live with this regression (the testcase
verifies we perform constant folding to { 0, 0 }), implement
the four missing vcond_mask patterns (for qi, hi, si and di value
modes), or implement generic code for this.

Given precedent I'd tend towards adding the x86 patterns.

Hongtao, can you handle that?

Thanks,
Richard.
