Hi,
On Wed, 15 May 2019, Aaron Sawdey wrote:
> Yes this would be a nice thing to get to, a single move/copy underlying
> builtin, to which we communicate what the compiler's analysis tells us
> about whether the operands overlap and by how much.
>
> Next question would be how do we move from the existing movmem pattern
> (which Michael Matz tells us should be renamed cpymem anyway) to this
> new thing. Are you proposing that we still have both movmem and cpymem
> optab entries underneath to call the patterns but introduce this new
> memmove_with_hints() to be used by things called by
> expand_builtin_memmove() and expand_builtin_memcpy()?
I'd say so. There are multiple levels at play:
a) exposure to the user: probably a new __builtin_memmove, or a new combined
builtin with a hint param to differentiate (but we can't get rid of
__builtin_memcpy/mempcpy/strcpy, which all can go through the same
route in the middleend)
b) getting it through the gimple pipeline, probably just a new builtin
code, trivial
c) expanding the new builtin, with the help of next items
d) RTL block moves: they are defined as non-overlapping and I don't think
we should change this (essentially they're the reflection of struct
copies in C)
e) how any of the above (builtins and RTL block moves) are implemented:
currently non-overlapping only, using movmem pattern when possible;
ultimately all sitting in the emit_block_move_hints() routine.
So, I'd add a new method to emit_block_move_hints indicating possible
overlap, disabling the use of move_by_pieces. Then in
emit_block_move_via_movmem (also getting an indication of overlap), do the
equivalent of:
  finished = 0;
  if (overlap_possible) {
    if (optab[movmem])
      finished = emit(movmem);
  } else {
    if (optab[cpymem])
      finished = emit(cpymem);
    if (!finished && optab[movmem]) // can use movmem also for overlap
      finished = emit(movmem);
  }
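To make the selection order a bit more concrete, here is a small standalone C model of the same logic. It is only a sketch and not GCC code: struct target_patterns, choose_block_move, expand_ok and expand_fail are names made up for illustration, and the function pointers merely stand in for "the target provides this pattern and the expander may succeed or FAIL".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a target expander: returns true if it
   managed to emit code, false if it gave up (like FAIL in a pattern).  */
typedef bool (*expander_fn) (void);

/* Hypothetical model of the two optab entries; NULL means the target
   does not define the pattern at all.  */
struct target_patterns
{
  expander_fn cpymem;  /* valid only for non-overlapping operands */
  expander_fn movmem;  /* valid for overlapping operands as well */
};

enum chosen { CHOSE_NONE, CHOSE_CPYMEM, CHOSE_MOVMEM };

/* Toy expanders used to exercise the selection.  */
static bool expand_ok (void) { return true; }
static bool expand_fail (void) { return false; }

/* Same order as the pseudocode: cpymem is tried only when overlap is
   ruled out; movmem is acceptable in both cases.  */
static enum chosen
choose_block_move (const struct target_patterns *t, bool overlap_possible)
{
  assert (t != NULL);
  if (!overlap_possible && t->cpymem && t->cpymem ())
    return CHOSE_CPYMEM;
  if (t->movmem && t->movmem ())
    return CHOSE_MOVMEM;
  return CHOSE_NONE;  /* caller has to fall back, e.g. to a libcall */
}
```

With overlap possible, a target that only provides cpymem ends up with CHOSE_NONE here, which is the case where the expander would have to fall back to something else.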
The overlap_possible method would only ever be used from the builtin
expansion, and never from the RTL block move expand. Additionally a
target may optionally only define the movmem pattern if it's just as good
as the cpymem pattern (e.g. because it only handles fixed small sizes and
uses a load-all then store-all sequence).
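To illustrate why a load-all then store-all sequence is automatically fine for overlapping operands, here is a rough C analogue. Again just a sketch under assumptions: SMALL_MOVE_MAX and both function names are invented for this mail, and the temporary buffer plays the role of the registers a real expansion would use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical upper bound for the "fixed small sizes" case.  */
enum { SMALL_MOVE_MAX = 16 };

/* Load everything first, then store everything.  No store can clobber
   a byte that still has to be loaded, so overlap is harmless.  */
static void
small_move_load_all_store_all (unsigned char *dst, const unsigned char *src,
                               size_t n)
{
  unsigned char tmp[SMALL_MOVE_MAX];  /* stands in for registers */
  assert (n <= SMALL_MOVE_MAX);
  memcpy (tmp, src, n);  /* all loads; tmp never overlaps src */
  memcpy (dst, tmp, n);  /* all stores; tmp never overlaps dst */
}

/* For contrast: interleaved load/store going forward.  This is only
   correct when dst does not overlap the not-yet-read part of src.  */
static void
small_copy_interleaved (unsigned char *dst, const unsigned char *src,
                        size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i];
}
```

Moving 6 bytes from the start of "abcdefgh" to offset 2 gives "ababcdef" with the first function, while the interleaved loop re-reads bytes it has already overwritten and produces "abababab".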
Ciao,
Michael.