On 6/16/26 5:40 AM, Michael S. Tsirkin wrote:
On Tue, Jun 16, 2026 at 04:09:27AM +1000, Gavin Shan wrote:
On 6/16/26 3:03 AM, Richard Henderson wrote:
On 6/15/26 09:33, Gavin Shan wrote:
Either assert n != 0 to start, or use while not do/while.
Because the body of the loop won't handle n == 0 correctly.
I will change this to "while (n > 0)" since we're not expecting
"n < 0" either.
size_t is unsigned.
oh, yes.
Could you confirm which stores need qatomic_set()? There are two sets
of stores as below. I guess you're asking have qatomic_set() for (a)?
Could you explain a bit why qatomic_set() is needed?
// (a)
*(uint64_t *)dest = *(uint64_t *)src;
Of course (a).
Using qatomic_set prevents the compiler from merging the stores, kinda like
volatile. Not especially likely with the way you've written this, but if you
had exchanged the switch and loop,
switch (lsb) {
case 1:
for (; n ; n--, dst++, src++) {
*(uint8_t *)dst = *(uint8_t *)src;
}
then the compiler is quite likely to "optimize" the loop back to memcpy.
It's also self-documenting what we're intending with the store.
ok, thanks for your explanation.
I guess it's also worth asking if this copy is also used for copying *from*
device memory. The commit that added memmove (4a73aee8814) suggests that dma
from the device uses these paths. In which case you'll either want a separate
function for that, or both source and destination must be aligned.
I think it's a valid case. Lets handle this by limiting the 'step' based on
src/dest/n, something like below. atomic_read() instead of ldxxx_he_p() is
used for loads. Please take a look if there are anything we can improve
further.
// Not compiled and tested yet
static void qemu_ram_copy_unaligned(void *dest, const void *src, size_t n)
{
uintptr_t test, step;
while (n != 0) {
test = (uintptr_t)src | n; /* alignment enforced by source */
step = test & -test;
test = (uintptr_t)dest | n; /* alignment enforced by destination
*/
step = MIN(step, test & -test);
switch (step) {
case 1:
qatomic_set((uint8_t *)dest, qatomic_read((uint8_t *)src));
src += 1;
dest += 1;
n -= 1;
break;
case 2:
qatomic_set((uint16_t *)dest, qatomic_read((uint16_t *)src));
src += 2;
dest += 2;
n -= 2;
break;
case 4:
qatomic_set((uint32_t *)dest, qatomic_read((uint32_t *)src));
src += 4;
dest += 4;
n -= 4;
break;
default:
qatomic_set((uint64_t *)dest, qatomic_read((uint64_t *)src);
src += 8;
dest += 8;
n -= 8;
}
If the length is > 8, I feel it's important to just use memcpy/memmove
normally.
Because at that point it's a large data transfer not MMIO, and performance
is important.
Yes, making sense to me. it shouldn't be a MMIO transaction if the length
is 16 bytes or more, we can use memcpy/memmove() for this specific case,
to improve the performance.
}
}
Another unrelated question: why 'int' value is returned from ldub_p()
and ld{uw, l}_he_p() in bswap.h? They would return 'uint{8, 16, 32}_t'
values?
A bad choice 20 years ago, and the need to audit all uses in order to change it
now.
Newer functions use uintN_t as you suggest.
Ok :-)
r~
Thanks,
Gavin