On 6/16/26 2:59 PM, Michael S. Tsirkin wrote:
On Mon, Jun 15, 2026 at 09:48:00PM -0700, Richard Henderson wrote:
On 6/15/26 21:23, Michael S. Tsirkin wrote:
B. Also on x86, I do not see why we should not use memcpy for large
accesses if we can. Better perf.

We have an example where memcpy writes to the same location 3 times.
This is not appropriate for any host.


r~

Ah, checked libc and sure enough, it does it. E.g. it uses 2 overlapping SSE
stores to do a 17 byte write. Not sure how we get 3 but whatevs.
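
For illustration, the overlapping-store pattern described above can be sketched in portable C. This is only a model, not libc's actual code: the helper name copy_16_to_32 and the writes[] counter are hypothetical, and plain 16-byte block moves stand in for SSE loads and stores. It assumes a length between 16 and 32 bytes, which is the range where two 16-byte chunks cover the whole buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { CHUNK = 16 };  /* width of one SSE-sized store */

/*
 * Hypothetical helper: copy len bytes (16 <= len <= 32) with two
 * 16-byte block moves, the second overlapping the first.
 * If writes is non-NULL it must be zero-initialized with at least
 * len entries; writes[i] then counts the stores that touched dst[i].
 */
static void copy_16_to_32(unsigned char *dst, const unsigned char *src,
                          size_t len, unsigned *writes)
{
    unsigned char head[CHUNK], tail[CHUNK];

    assert(len >= CHUNK && len <= 2 * CHUNK);

    /* Load both chunks before storing anything. */
    memcpy(head, src, CHUNK);
    memcpy(tail, src + len - CHUNK, CHUNK);

    /* Store bytes [0, 16), then bytes [len - 16, len). */
    memcpy(dst, head, CHUNK);
    memcpy(dst + len - CHUNK, tail, CHUNK);

    /* Record which destination bytes each of the two stores covered. */
    for (size_t i = 0; writes && i < CHUNK; i++) {
        writes[i]++;
        writes[len - CHUNK + i]++;
    }
}
```

With len = 17, the two stores cover bytes 0..15 and 1..16, so bytes 1 through 15 are each written twice while bytes 0 and 16 are written once.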


But just to clarify, I am talking about DMA accesses, that are not
initiated by the VCPU.  I am not so sure we care about multiple stores
in this instance? Do we? We do care about speed, for sure.


In the current implementation, qemu_ram_copy/move are implemented
differently on x86 than on other architectures. Do we need to unify
qemu_ram_copy/move across all architectures so that none of them uses
memcpy() or memmove()?

Maybe it's time for me to post v3 for a new round of discussion.

Thanks,
Gavin


