Robin Dapp <[email protected]> writes:
> This adds zero else operands to masked loads and their intrinsics.
> I needed to adjust more than I initially thought because we rely on
> combine for several instructions, and a change in a "base" pattern
> needs to propagate to all of them.
Looks less invasive than I'd feared though -- nice!
> For lack of a better idea I used a function call property to specify
> whether a builtin needs an else operand or not. Somebody with better
> knowledge of the aarch64 target can surely improve that.
Yeah, those flags are really for source-level/gimple-level attributes.
Would it work to pass a new parameter to use_contiguous_load instead?
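That is, something like the following (completely untested, and the
parameter name and choice of mode are just to illustrate the idea):

  /* Sketch only: let the caller say whether the target pattern takes
     an else operand, rather than keying it off a call property.  */
  rtx
  function_expander::use_contiguous_load_insn (insn_code icode, bool has_else)
  {
    /* ...existing operand set-up...  */
    if (has_else)
      /* Push an all-zeros else operand for patterns that expect one.  */
      add_input_operand (icode, CONST0_RTX (memory_vector_mode ()));
    return generate_insn (icode);
  }

with callers such as the one quoted below passing true whenever the
underlying pattern now takes an else operand.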
> [...]
> @@ -1505,10 +1506,16 @@ public:
>    {
>      insn_code icode;
>      if (e.vectors_per_tuple () == 1)
> -      icode = convert_optab_handler (maskload_optab,
> -                                     e.vector_mode (0), e.gp_mode (0));
> +      {
> +        icode = convert_optab_handler (maskload_optab,
> +                                       e.vector_mode (0), e.gp_mode (0));
> +        e.args.quick_push (CONST0_RTX (e.vector_mode (0)));
> +      }
>      else
> -      icode = code_for_aarch64 (UNSPEC_LD1_COUNT, e.tuple_mode (0));
> +      {
> +        icode = code_for_aarch64 (UNSPEC_LD1_COUNT, e.tuple_mode (0));
> +        e.args.quick_push (CONST0_RTX (e.tuple_mode (0)));
> +      }
>      return e.use_contiguous_load_insn (icode);
>    }
> };
For the record, I don't think we strictly need the zeros on LD1_COUNT
and LD1NT_COUNT. But I agree it's probably better to add them anyway,
for consistency.
(So please keep this part of the patch. Just saying the above to show
that I'd thought about it.)
> @@ -1335,6 +1340,27 @@ (define_insn "vec_mask_load_lanes<mode><vsingle>"
>
>  ;; Predicated load and extend, with 8 elements per 128-bit block.
>  (define_insn_and_rewrite "@aarch64_load<SVE_PRED_LOAD:pred_load>_<ANY_EXTEND:optab><SVE_HSDI:mode><SVE_PARTIAL_I:mode>"
> +  [(set (match_operand:SVE_HSDI 0 "register_operand" "=w")
> +        (unspec:SVE_HSDI
> +          [(match_operand:<SVE_HSDI:VPRED> 3 "general_operand" "UplDnm")
> +           (ANY_EXTEND:SVE_HSDI
> +             (unspec:SVE_PARTIAL_I
> +               [(match_operand:<SVE_PARTIAL_I:VPRED> 2 "register_operand" "Upl")
> +                (match_operand:SVE_PARTIAL_I 1 "memory_operand" "m")
> +                (match_operand:SVE_PARTIAL_I 4 "aarch64_maskload_else_operand")]
> +               SVE_PRED_LOAD))]
> +          UNSPEC_PRED_X))]
> +  "TARGET_SVE && (~<SVE_HSDI:narrower_mask> & <SVE_PARTIAL_I:self_mask>) == 0"
> +  "ld1<ANY_EXTEND:s><SVE_PARTIAL_I:Vesize>\t%0.<SVE_HSDI:Vctype>, %2/z, %1"
> +  "&& !CONSTANT_P (operands[3])"
> +  {
> +    operands[3] = CONSTM1_RTX (<SVE_HSDI:VPRED>mode);
> +  }
> +)
> +
> +;; Same as above without the maskload_else_operand to still allow combine to
> +;; match a sign-extended pred_mov pattern.
> +(define_insn_and_rewrite "*aarch64_load<SVE_PRED_LOAD:pred_load>_<ANY_EXTEND:optab>_mov<SVE_HSDI:mode><SVE_PARTIAL_I:mode>"
>    [(set (match_operand:SVE_HSDI 0 "register_operand" "=w")
>          (unspec:SVE_HSDI
>            [(match_operand:<SVE_HSDI:VPRED> 3 "general_operand" "UplDnm")
Splitting the patterns is the right thing to do, but it also makes
SVE_PRED_LOAD redundant. The pattern above with the else operand
should use UNSPEC_LD1_SVE in place of SVE_PRED_LOAD. The version
without should use UNSPEC_PRED_X (and I think can be an unnamed pattern,
starting with "*").
This would make SVE_PRED_LOAD and pred_load redundant, so they can
be removed. The caller in svld1_extend_impl would no longer pass
UNSPEC_LD1_SVE.
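I.e. the new pattern with the else operand would end up something like
this (untested; the exact name is illustrative):

  (define_insn_and_rewrite "@aarch64_load_<ANY_EXTEND:optab><SVE_HSDI:mode><SVE_PARTIAL_I:mode>"
    [(set (match_operand:SVE_HSDI 0 "register_operand" "=w")
          (unspec:SVE_HSDI
            [(match_operand:<SVE_HSDI:VPRED> 3 "general_operand" "UplDnm")
             (ANY_EXTEND:SVE_HSDI
               (unspec:SVE_PARTIAL_I
                 [(match_operand:<SVE_PARTIAL_I:VPRED> 2 "register_operand" "Upl")
                  (match_operand:SVE_PARTIAL_I 1 "memory_operand" "m")
                  (match_operand:SVE_PARTIAL_I 4 "aarch64_maskload_else_operand")]
                 UNSPEC_LD1_SVE))]
            UNSPEC_PRED_X))]
    ...)

and similarly for the "*..._mov" version, but with UNSPEC_PRED_X as the
inner unspec and without the else operand.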
Sorry about the churn. Matching the load and move patterns in one go
seemed like a nice bit of factoring at the time, but this patch makes
it look like a factoring too far.
Otherwise it looks good. Thanks for doing this.
Richard