Hi Peter, Marco, and QEMU community,

I'm Junjie Cao -- I graduated from the National University of Singapore
last year and am currently working at Intel. I'd like to express my
interest in the GSoC 2026 Fast Snapshot Load project.

Much of my background is in network device virtualization -- specifically
the QEMU/DPDK/Virtio-net/VHost stack (emulation and tuning). This has
given me hands-on experience with QEMU's guest memory management and
device state serialization, and makes the problem this project tackles
feel very concrete to me -- Marco's fuzzing use case of loading
snapshots at high frequency is a great example.

== Preparation so far ==

- Built QEMU from latest master; will be sending a small fix patch soon to
get familiar with the QEMU community workflow.

- Read Peter's original proposal and the follow-up discussion with Marco;
read mapped-ram.rst -- the fixed-offset format allows pread() of any page
by RAMBlock+offset directly from the snapshot file, which is the
prerequisite for demand paging to work.

- On the source level, focused on two paths: the uffd infrastructure in
migration/postcopy-ram.c (postcopy_ram_fault_thread,
postcopy_ram_incoming_setup, ram_block_enable_notify) to understand how
postcopy registers and handles page faults; and ram_load() in
migration/ram.c to understand RAM section deserialization during loadvm.
The intersection of these two paths is where the core changes for this
project would go.

== Implementation path as I understand it ==

The core idea: split loadvm into "device state loading" and "RAM
loading", bridged by uffd. Based on file: migration + mapped-ram:

- Modify the ram_load() path: parse_ramblock_mapped_ram() reads the
MappedRamHeader and bitmap but skips read_ramblock_mapped_ram(), building
an offset table for on-demand pread() instead. Other device states load
normally.

- Register userfaultfd (MISSING mode) on all migratable RAMBlocks.
postcopy-ram.c has similar infrastructure; the difference is we don't
need discard logic or the source-side page request protocol.

- Start vCPUs while two threads populate RAM in parallel: a background
loader doing sequential pread() + UFFDIO_COPY for bulk prefetch, and a
fault handler resolving vCPU-triggered page faults on demand. A
per-RAMBlock atomic bitmap with test_and_set semantics coordinates the
two to avoid double-copies. Once all pages are populated, unregister
uffd and loadvm is complete.

Peter mentioned that the MVP could start with no multifd and anonymous
memory only -- I think that would be a great approach. Since mapped-ram
and postcopy are currently mutually exclusive in migrate_caps_check(), 
a new capability will likely be needed for this feature.

== A thought and a question ==

Some devices' load_state handlers access guest RAM during restore.
Postcopy handles this by registering uffd before device state loading
(inside the POSTCOPY_LISTEN handler, before device sections in the
CMD_PACKAGED blob). The same ordering applies here: register uffd, start
the fault handler, then load device states -- any RAM touch is
intercepted and resolved via pread().

This relies on the kernel queuing faults between UFFDIO_REGISTER and the
first poll(). I noticed postcopy_ram_incoming_setup() has a similar
window (fault thread created before blocks registered) -- is this the
same guarantee we'd rely on here?

== Next steps ==

Besides attempting a QEMU patch submission and working on my proposal,
is there anything else you'd suggest I prepare?

Any feedback or guidance would be very much appreciated!

Best regards,
Junjie Cao
[email protected]
