On Tue, Jun 16, 2026 at 02:22:18PM +1000, Gavin Shan wrote:
> On 6/16/26 7:31 AM, Gavin Shan wrote:
> > On 6/16/26 5:42 AM, Michael S. Tsirkin wrote:
> > > On Tue, Jun 16, 2026 at 05:24:15AM +1000, Gavin Shan wrote:
> > > > Hi Peter and Michael,
> > > > 
> > > > On 6/16/26 1:12 AM, Peter Maydell wrote:
> > > > > On Mon, 15 Jun 2026 at 15:56, Michael S. Tsirkin <[email protected]> 
> > > > > wrote:
> > > > > > 
> > > > > > On Mon, Jun 15, 2026 at 11:57:09AM +0100, Peter Maydell wrote:
> > > > > > > On Mon, 15 Jun 2026 at 11:03, Gavin Shan <[email protected]> wrote:
> > > > > > > > 
> > > > > > > > All ram device regions were turned to be indirectly accessible 
> > > > > > > > by commit
> > > > > > > > 4a2e242bbb ("memory: Don't use memcpy for ram_device regions"). 
> > > > > > > > This leads
> > > > > > > > to guest hang on attempt to build 'cuda-samples' as reported by 
> > > > > > > > Julia. The
> > > > > > > > guest is started by the following command lines, with GH100 GPU 
> > > > > > > > card passed
> > > > > > > > from the host.
> > > > > > > 
> > > > > > > > diff --git a/include/system/memory.h b/include/system/memory.h
> > > > > > > > index 1417132f6d..5878727d09 100644
> > > > > > > > --- a/include/system/memory.h
> > > > > > > > +++ b/include/system/memory.h
> > > > > > > > @@ -2897,6 +2897,8 @@ void 
> > > > > > > > address_space_register_map_client(AddressSpace *as, QEMUBH *bh);
> > > > > > > >    void address_space_unregister_map_client(AddressSpace *as, 
> > > > > > > > QEMUBH *bh);
> > > > > > > > 
> > > > > > > >    /* Internal functions, part of the implementation of 
> > > > > > > > address_space_read.  */
> > > > > > > > +void qemu_ram_copy(void *dest, const void *src, size_t n);
> > > > > > > > +void qemu_ram_move(void *dest, const void *src, size_t n);
> > > > > > > 
> > > > > > > New function prototypes in include headers need documentation 
> > > > > > > comments.
> > > > > > > 
> > > > > > > In particular for these, it's really important that we clearly say
> > > > > > > what semantics we are attempting to provide with them, so that
> > > > > > > (a) when we're reviewing or later updating the implementation we
> > > > > > > know what we are trying to provide
> > > > > > > (b) when we're looking at the callsites we know what the function
> > > > > > > is guaranteeing to us and what it is not, and thus whether it's
> > > > > > > OK to use it or we need something els.
> > > > > > > 
> > > > > > > > +#if defined(__i386__) || defined(__x86_64__)
> > > > > > > > +#define HOST_UNALIGNED_MMIO_OK 1
> > > > > > > > +#else
> > > > > > > > +#define HOST_UNALIGNED_MMIO_OK 0
> > > > > > > > +#endif
> > > > > > > 
> > > > > > > We need to do something better than this. We can't
> > > > > > > just say "oh, we trust that on x86 this works": it is
> > > > > > > neither actually true that the compiler guarantees it
> > > > > > 
> > > > > > sorry guarantee what?
> > > > > 
> > > > > Well, that's part of my point about "we need a doc comment":
> > > > > what exactly are we trying to guarantee ? But whatever it is,
> > > > > "hardcoded ifdef that privileges x86" is definitely the wrong
> > > > > answer.
> > > > > 
> > > > 
> > > > Could you please check the following comments (documentation context) 
> > > > for the
> > > > added functions look good to you? Please let me know if there are 
> > > > anything
> > > > can be improved.
> > > > 
> > > > +
> > > > +/**
> > > > + * qemu_ram_copy: copy data to a RAMBlock
> > > > + *
> > > > + * @dest: destination where the data is copied to. It's the pointer 
> > > > returned
> > > > + *        by ramblock_ptr() and its variants.
> > > > + * @src: source where the data is copied from. It can be a regular 
> > > > memory block
> > > > + *       or a pointer returned by ramblock_ptr() and its variants.
> > > > + * @n: length of data to be copied
> > > > + *
> > > > + * The destination is always a pointer to data area of a RAMBlock 
> > > > which can
> > > > + * be for a regular memory block or a MMIO region. The source can be a 
> > > > pointer
> > > > + * to a regular memory block or a MMIO region. Atomicity isn't 
> > > > guranteed by
> > > > + * memcpy() to copy data between those MMIO regions as we do in this 
> > > > function.
> > > > + * Besides, unaligned accesses are well handled on all architectures 
> > > > except
> > > > + * i386 and x86_64 where the unaligned accesses are supported by the 
> > > > CPUs.
> > > > + *
> > > > + * The ensured atomicity and alignment consideration, which aren't 
> > > > needed
> > > > + * by data copy between two regular memory blocks. So performance 
> > > > penalty
> > > > + * is introduced by this function in such circumstance.
> > > > + */
> > > > +void qemu_ram_copy(void *dest, const void *src, size_t n);
> > > > +
> > > > +/**
> > > > + * qemu_ram_move: move data to a RAMBlock
> > > > + *
> > > > + * @dest: destination where the data is moved to. It's the pointer 
> > > > returned
> > > > + *        by ramblock_ptr() and its variants.
> > > > + * @src: source where the data is moved from. It can be a regular 
> > > > memory block
> > > > + *       or a pointer returned by ramblock_ptr() and its variants.
> > > > + * @n: length of data to be moved
> > > > + *
> > > > + * The destination is always a pointer to data area of a RAMBlock 
> > > > which can
> > > > + * be for a regular memory block or a MMIO region. The source can be a 
> > > > pointer
> > > > + * to a regular memory block or a MMIO region. Atomicity isn't 
> > > > guranteed by
> > > > + * memmove() to move data from or to those MMIO regions as we do in 
> > > > this
> > > > + * function. Besides, unaligned accesses are well handled on all 
> > > > architectures
> > > > + * except i386 and x86_64 where the unaligned accesses are supported 
> > > > by the
> > > > + * CPUs.
> > > > + *
> > > > + * The ensured atomicity and alignment consideration, which aren't 
> > > > needed
> > > > + * by data movement between two regular memory blocks. So performance 
> > > > penalty
> > > > + * is introduced by this function in such circumstance.
> > > > + */
> > > > +void qemu_ram_move(void *dest, const void *src, size_t n);
> > > > +
> > > 
> > > 
> > > I apologize, I don't understand what these comments are saying, at all.
> > > 
> > 
> > Sorry to hear that, Michael. There are two things done in the newly added
> > function qemu_ram_{copy, move}(), comparing to the original memcpy() and
> > memmove():
> > 
> > - Data copy or movement on MMIO region(s) using __bultin_{memcopy, 
> > memmove}()
> >    on x86, or qatomic_set() on other architecutures, to avoid the unexpected
> >    optimization done by the compiler on memcpy/memmove. Unsafe instructions
> >    can be produced by the compiler's optimization. This fixes the issue 
> > resolved
> >    by commit 4a2e242bbb ("memory: Don't use memcpy for ram_device regions")
> > 
> > - The unaligned access is handled by forcing the access size aligned to 
> > (src | dest | n)
> >    on non-x86 architectures, comparing to the original memcpy() and 
> > memmove().
> > 
> > Please let me know if we're on same page. If we do, I can revise the 
> > comments
> > accordingly.
> > 
> 
> I've modified the comments to something as below. Please let me know if it's
> better. The point is to emphasize qemu_ram_{copy, memmove}() are friendly to
> IO regions :-)
> 
> -----> include/system/memory.h
> 
> +/**
> + * qemu_ram_copy: copy data to a guest memory block
> + *

what is a memory block? ramblock?

don't we need something like this for reads?


> + * @dest: destination where the data is copied to. A pointer to a guest 
> memory

@dst?

> + *        block returned from ramblock_ptr() or its variants
> + * @src: source where the data is copied from. A pointer to host memory block
> + *       or guest memory block returned from ramblock_ptr() or its variants.
> + * @n: length of data to be copied
> + *
> + * When the source pointer, destinatoin pointer or both dereference guest
> + * memory blocks, which can be IO blocks (regions).

IO block?

let's not invent terminology pls.

> Under this cirtumstance,
> + * data copy between those blocks is equivalent to IO accesses. It's unsafe
> + * to use memcpy(), which could be translated to IO unfriendly instructions
> + * by the compiler due to code level optimization, resulting in the 
> unexpected
> + * behavior. This function is ensured to be friendly to both IO and memory
> + * accesses. No optimized instructions should be generated by the compiler,
> + * and no unexpected behavior should be observed for IO accesses.

Let's be specific, and brief, and tell user what this does:
Maybe:

        Copy @n bytes from @src to @dst.
        Assumes @src and @dst do not overlap.
        Handles special cases such as uncacheable ramblocks correctly.
        Use this for accessing ramblock in response to DMA/VCPU IO
        and in preference to memcpy.



> + */
> +void qemu_ram_copy(void *dest, const void *src, size_t n);
> +
> +/**
> + * qemu_ram_move: move data to a guest memory block



> + *
> + * @dest: destination where the data is moved to. A pointer to a guest memory
> + *        block returned from ramblock_ptr() or its variants
> + * @src: source where the data is moved from. A pointer to host memory block
> + *       or guest memory block returned from ramblock_ptr() or its variants.
> + * @n: length of data to be copied
> + *
> + * Similar to qemu_ram_copy(), this function is ensured to be friendly to 
> both
> + * IO and memory accesses. No optimized instructions should be generated by
> + * the compiler, and no unexpected behavior should be observed for IO 
> accesses.

unclear how it is different from copy.

> + */
> +void qemu_ram_move(void *dest, const void *src, size_t n);



> 
> > > 
> > > 
> > > > > -- PMM
> > > > > 
> 
> Thanks,
> Gavin


Reply via email to