Wilco Dijkstra <wilco.dijks...@arm.com> writes:
> Hi Richard,
>
> (that's quick!)
>
>> +  if (size > max_copy_size || size > max_mops_size)
>> +    return aarch64_expand_cpymem_mops (operands, is_memmove);
>>
>> Could you explain this a bit more?  If I've followed the logic correctly,
>> max_copy_size will always be 0 for movmem, so this "if" condition will
>> always be true for movmem (given that the caller can be relied on to
>> optimise away zero-length copies).  So doesn't this function reduce to:
>
> In this patch it is zero yes, but there is no real reason for that. The goal 
> is to
> share as much code as possible. I have a patch that inlines memmove like
> memcpy.

But I think this part of the patch belongs in that future series.
The current patch should just concentrate on fixing the bug.

It's difficult to evaluate the change at the moment, without the follow-on
change that it's preparing for.  I don't think it stands as an indepedent
improvement in its own right.

>> when is_memmove is true?  If so, I think it would be clearer to do that
>> directly, rather than go through aarch64_expand_cpymem.  max_copy_size
>> is really an optimisation threshold, whereas the above seems to be
>> leaning on it for correctness.
>
> In principle we could for the time being add a assert (!is_memmove) if that
> makes it clearer memmove isn't yet handled.

I think for this patch movmemdi should just call aarch64_expand_cpymem_mops
directly.  Let's leave the aarch64_expand_cpymem changes to other patches.

>> ...I think we might as well keep this pattern conditional on TARGET_MOPS.
>
> But then we have inconsistencies in the conditions of the expanders, which
> is what led to all these bugs in the first place (I lost count, there are 4 
> or 5
> different bugs I fixed). Ensuring everything is 100% identical between
> memcpy and memmove makes the code much easier to follow.

I think that too should be part of your follow-on changes to do inline
movmem expansions without TARGET_MOPS.  While all supported movmemdis
require TARGET_MOPS, I think the expander should too.

>> I think we can then also split:
>>
>>   /* All three registers are changed by the instruction, so each one
>>      must be a fresh pseudo.  */
>>   rtx dst_addr = copy_to_mode_reg (Pmode, XEXP (operands[0], 0));
>>   rtx src_addr = copy_to_mode_reg (Pmode, XEXP (operands[1], 0));
>>   rtx dst_mem = replace_equiv_address (operands[0], dst_addr);
>>   rtx src_mem = replace_equiv_address (operands[1], src_addr);
>>   rtx sz_reg = copy_to_mode_reg (DImode, operands[2]);
>>
>> out of aarch64_expand_cpymem_mops into a new function (say
>> aarch64_prepare_mops_operands) and call it from the movmemdi
>> expander.  There should then be no need for the extra staging
>> expander (aarch64_movmemdi).
>
> So you're saying we could remove aarch64_cpymemdi/movmemdi if
> aarch64_expand_cpymem_mops did massage the operands in the
> right way so that we can immediately match the underlying instruction?

Yeah.  But I'd forgotten about the pesky fourth (alignment) operand
to movmemdi and cpymemdi, which we don't need for the mops patterns.
So I take that part back.  I agree it's clearer to have a separate
aarch64_movmemdi expander.

> Hmm, does that actually work, as in we don't lose the extra alias info that
> gets lost in the current memmove expander? (another bug/inconsistency)
>
> And the MOPS code would be separated from aarch64_expand_cpymem
> so we'd do all the MOPS size tests inside aarch64_expand_cpymem_mops
> and the expander tries using MOPS first and if it fails try inline expansion?
>
> So something like:
>
> (define_expand "movmemdi"
> ....
>   if (aarch64_try_mops_expansion (operands, is_memmove))
>     DONE;
>   if (aarch64_try_inline_copy_expansion (operands, is_memmove))
>     DONE;
>   FAIL;
> )
>
>> IMO the STRICT_ALIGNMENT stuff should be a separate patch,
>> with its own testcases.
>
> We will need backports to fix all these bugs, so the question is whether it
> is worth doing a lot of cleanups now?

But I think what I'm asking for is significantly simpler than the
original patch.  That should make it more backportable rather than less.

Thanks,
Richard

Reply via email to