Hi Peter, Marco, and the QEMU community,

I'm Junjie Cao. I graduated from the National University of Singapore last year and currently work at Intel. I'd like to express my interest in the GSoC 2026 Fast Snapshot Load project.
Much of my background is in network device virtualization, specifically the QEMU/DPDK/Virtio-net/Vhost stack (emulation and tuning). That work gave me hands-on experience with QEMU's guest memory management and device state serialization, and it makes the problem this project targets feel very concrete -- Marco's fuzzing use case of loading snapshots frequently is a great example.

== Preparation so far ==

- Built QEMU from latest master; I will be sending a small fix patch soon to get familiar with the QEMU community workflow.
- Read Peter's original proposal and the follow-up discussion with Marco, as well as mapped-ram.rst: the fixed-offset format allows pread() of any page by RAMBlock + offset directly from the snapshot file, which is the prerequisite for demand paging to work.
- At the source level, focused on two paths: the uffd infrastructure in migration/postcopy-ram.c (postcopy_ram_fault_thread, postcopy_ram_incoming_setup, ram_block_enable_notify), to understand how postcopy registers and handles page faults; and ram_load() in migration/ram.c, to understand RAM section deserialization during loadvm. The intersection of these two paths is where the core changes for this project would go.

== Implementation path as I understand it ==

The core idea: split loadvm into "device state loading" and "RAM loading", bridged by uffd. Based on file: migration + mapped-ram:

- Modify the ram_load() path: parse_ramblock_mapped_ram() reads the MappedRamHeader and bitmap but skips read_ramblock_mapped_ram(), building an offset table for on-demand pread() instead. Other device state loads normally.
- Register userfaultfd (MISSING mode) on all migratable RAMBlocks. postcopy-ram.c has similar infrastructure; the difference is that we need neither the discard logic nor the source-side page request protocol.
- Start vCPUs with two threads populating RAM in parallel: a background loader doing sequential pread() + UFFDIO_COPY for bulk prefetch, and a fault handler resolving vCPU-triggered page faults on demand. A per-RAMBlock atomic bitmap with test_and_set semantics coordinates the two to avoid double copies. Once all pages are loaded, unregister uffd; loadvm is complete.

Peter mentioned that the MVP could start with no multifd and anonymous memory only; I think that is a good approach. Since mapped-ram and postcopy are currently mutually exclusive in migrate_caps_check(), a new capability will likely be needed for this feature.

== A thought and a question ==

Some devices' load_state handlers access guest RAM during restore. Postcopy handles this by registering uffd before device state loading (inside the POSTCOPY_LISTEN handler, before the device sections in the CMD_PACKAGED blob). The same ordering applies here: register uffd, start the fault handler, then load device states; any RAM access is intercepted and resolved via pread(). This relies on the kernel queuing faults between UFFDIO_REGISTER and the first poll(). I noticed that postcopy_ram_incoming_setup() has a similar window (the fault thread is created before the blocks are registered) -- is this the same guarantee we would rely on here?

== Next steps ==

Besides attempting a QEMU patch submission and working on my proposal, is there anything else you would suggest I prepare? Any feedback or guidance would be very much appreciated!

Best regards,
Junjie Cao
[email protected]
