Hi Peter and Michael,
On 6/16/26 1:12 AM, Peter Maydell wrote:
On Mon, 15 Jun 2026 at 15:56, Michael S. Tsirkin <[email protected]> wrote:
On Mon, Jun 15, 2026 at 11:57:09AM +0100, Peter Maydell wrote:
On Mon, 15 Jun 2026 at 11:03, Gavin Shan <[email protected]> wrote:
All ram device regions were turned to be indirectly accessible by commit
4a2e242bbb ("memory: Don't use memcpy for ram_device regions"). This leads
to guest hang on attempt to build 'cuda-samples' as reported by Julia. The
guest is started by the following command lines, with GH100 GPU card passed
from the host.
diff --git a/include/system/memory.h b/include/system/memory.h
index 1417132f6d..5878727d09 100644
--- a/include/system/memory.h
+++ b/include/system/memory.h
@@ -2897,6 +2897,8 @@ void address_space_register_map_client(AddressSpace *as,
QEMUBH *bh);
void address_space_unregister_map_client(AddressSpace *as, QEMUBH *bh);
/* Internal functions, part of the implementation of address_space_read. */
+void qemu_ram_copy(void *dest, const void *src, size_t n);
+void qemu_ram_move(void *dest, const void *src, size_t n);
New function prototypes in include headers need documentation comments.
In particular for these, it's really important that we clearly say
what semantics we are attempting to provide with them, so that
(a) when we're reviewing or later updating the implementation we
know what we are trying to provide
(b) when we're looking at the callsites we know what the function
is guaranteeing to us and what it is not, and thus whether it's
OK to use it or we need something els.
+#if defined(__i386__) || defined(__x86_64__)
+#define HOST_UNALIGNED_MMIO_OK 1
+#else
+#define HOST_UNALIGNED_MMIO_OK 0
+#endif
We need to do something better than this. We can't
just say "oh, we trust that on x86 this works": it is
neither actually true that the compiler guarantees it
sorry guarantee what?
Well, that's part of my point about "we need a doc comment":
what exactly are we trying to guarantee ? But whatever it is,
"hardcoded ifdef that privileges x86" is definitely the wrong
answer.
Could you please check the following comments (documentation context) for the
added functions look good to you? Please let me know if there are anything
can be improved.
+
+/**
+ * qemu_ram_copy: copy data to a RAMBlock
+ *
+ * @dest: destination where the data is copied to. It's the pointer returned
+ * by ramblock_ptr() and its variants.
+ * @src: source where the data is copied from. It can be a regular memory block
+ * or a pointer returned by ramblock_ptr() and its variants.
+ * @n: length of data to be copied
+ *
+ * The destination is always a pointer to data area of a RAMBlock which can
+ * be for a regular memory block or a MMIO region. The source can be a pointer
+ * to a regular memory block or a MMIO region. Atomicity isn't guranteed by
+ * memcpy() to copy data between those MMIO regions as we do in this function.
+ * Besides, unaligned accesses are well handled on all architectures except
+ * i386 and x86_64 where the unaligned accesses are supported by the CPUs.
+ *
+ * The ensured atomicity and alignment consideration, which aren't needed
+ * by data copy between two regular memory blocks. So performance penalty
+ * is introduced by this function in such circumstance.
+ */
+void qemu_ram_copy(void *dest, const void *src, size_t n);
+
+/**
+ * qemu_ram_move: move data to a RAMBlock
+ *
+ * @dest: destination where the data is moved to. It's the pointer returned
+ * by ramblock_ptr() and its variants.
+ * @src: source where the data is moved from. It can be a regular memory block
+ * or a pointer returned by ramblock_ptr() and its variants.
+ * @n: length of data to be moved
+ *
+ * The destination is always a pointer to data area of a RAMBlock which can
+ * be for a regular memory block or a MMIO region. The source can be a pointer
+ * to a regular memory block or a MMIO region. Atomicity isn't guranteed by
+ * memmove() to move data from or to those MMIO regions as we do in this
+ * function. Besides, unaligned accesses are well handled on all architectures
+ * except i386 and x86_64 where the unaligned accesses are supported by the
+ * CPUs.
+ *
+ * The ensured atomicity and alignment consideration, which aren't needed
+ * by data movement between two regular memory blocks. So performance penalty
+ * is introduced by this function in such circumstance.
+ */
+void qemu_ram_move(void *dest, const void *src, size_t n);
+
-- PMM
Thanks,
Gavin