Hi Richard,

> * config/aarch64/aarch64.md (cpymemdi): Remove pattern condition.
>
> Shouldn't this be a separate patch? It's not immediately obvious that this
> is a necessary part of this change.

You mean this?

@@ -1627,7 +1627,7 @@ (define_expand "cpymemdi"
    (match_operand:BLK 1 "memory_operand")
    (match_operand:DI 2 "general_operand")
    (match_operand:DI 3 "immediate_operand")]
-  "!STRICT_ALIGNMENT || TARGET_MOPS"
+  ""

Yes, that's necessary since that is the bug.

> +  unsigned align = INTVAL (operands[3]);
>
> This should read the value with UINTVAL. Given the useful range of the
> alignment, it should be OK that we're not using unsigned HWI.

I'll fix that.

> +  if (!CONST_INT_P (operands[2]) || (STRICT_ALIGNMENT && align < 16))
>      return aarch64_expand_cpymem_mops (operands);
>
> So what about align=4 and copying, for example, 8 or 12 bytes; wouldn't we
> want a sequence of LDR/STR in that case? Doesn't this fall back to MOPS too
> eagerly?

The goal was to fix the issue in a way that is both obvious and can be easily
backported. Further improvements can be made to handle other alignments, but
it is slightly tricky (e.g. align == 4 won't emit LDP/STP directly using the
current code and thus would need additional work to generalize the LDP path).

>> +  unsigned max_mops_size = aarch64_mops_memcpy_size_threshold;
>
> I find this name slightly confusing. Surely it's min_mops_size (since above
> that we want to use MOPS rather than inlined loads/stores). But why not just
> use aarch64_mops_memcpy_size_threshold directly in the one place it's used?

The reason is that in a follow-on patch I check
aarch64_mops_memcpy_size_threshold too, so for now this acts as a shortcut for
the ridiculously long name.

> Are there any additional tests for this?

There are existing tests that check the expansion, and they fail if you
completely block expansions with STRICT_ALIGNMENT.

Cheers,
Wilco
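
P.S. To make the align=4 case above concrete, here is a minimal stand-alone sketch of the quoted check. It is a model only, not code from the patch: the function name and its boolean parameters are hypothetical stand-ins for CONST_INT_P (operands[2]), STRICT_ALIGNMENT and the alignment value read from operands[3].

```c
#include <stdbool.h>

/* Hypothetical model of the condition quoted in the review; not GCC code.
   size_is_constant  stands in for CONST_INT_P (operands[2]),
   strict_alignment  stands in for STRICT_ALIGNMENT,
   align             stands in for the value read from operands[3].
   Returns true when the expander would hand the copy over to
   aarch64_expand_cpymem_mops instead of the inline load/store path.  */
static bool
falls_back_to_mops_path (bool size_is_constant, bool strict_alignment,
                         unsigned align)
{
  return !size_is_constant || (strict_alignment && align < 16);
}
```

Under this model, a constant-size copy with STRICT_ALIGNMENT and align == 4 takes the aarch64_expand_cpymem_mops path regardless of whether it is 8 or 12 bytes, which is the case raised in the review. With align >= 16, or without STRICT_ALIGNMENT, a constant-size copy stays on the inline path.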