Hi everybody,

Here are the notes from the last Hypervisor Live Update call that happened
on Monday, April 21. Thanks to everybody who was involved!
These notes are intended to bring people up to speed who could not attend
the call as well as keep the conversation going in between meetings.

----->o-----
KHO v6 is now staged in Andrew's mm-new tree. We discussed what it will
take for this series to be pushed to Linus, specifically around
Reviewed-by tags. There is not a ton of x86 specific code, but it would
likely be useful to have Reviewed-bys from some x86 maintainers.

Dave Hansen, would you be the right person to take a look at this from an
x86 perspective?

We definitely wanted to touch base with Jason Gunthorpe on this topic
since he was on vacation at the time.

Jason, do you have any feedback on KHO v6 that would be blocking for its
eventual merge upstream?

----->o-----
Pratyush looked at the LUO patches and noticed that it was largely doing
what he would have done with fdbox. He expressed a concern about the
global states, which were not part of the fdbox thinking. Pasha clarified
that the finish step would be when the final cleanup would be done; we can
still control what gets unfrozen and when.

Pratyush asked if the prepare phase worked the same way for LUO. Pasha
said that the list of participating subsystems is added before the prepare
step, which does not freeze anything.

Pratyush noted this is very similar to what he wanted to do with fdbox, so
he suggested working on top of LUO. Pasha said that LUO v2 was happening
right now and that porting over the fdbox support had already begun.

----->o-----
We discussed whether LUO largely replaces the need for guestmemfs as
discussed in previous instances. It was noted that guestmemfs largely was
aimed at preserving IOMMU and now we have aligned on iommufds, which was
also supported by Jason previously.

Since we didn't have James Gowans in this call this time, we wanted to
follow up on the upstream mailing list for this.

James, do you agree on the convergence with LUO or are there still use
cases where guestmemfs would be useful that aren't currently planned?
----->o-----
We touched base very briefly about swiotlb in low memory, an issue that
Pratyush ran into several weeks ago. Mike Rapoport noted that this was
now supported with KHO v6 upstream, which uses lowmem scratch support, so
this should no longer be an issue.

----->o-----
We discussed the u64 in the KHO FDT from the last sync, which KHO v6
implements. Each component has its own independent FDT, whose physical
address is recorded in the global FDT. We revisited the discussion from
before when Alex had previously implemented it differently. The fixed
maximum size was one of the biggest blockers. Now, with KHO v6, everybody
gets their own subtree and can allocate from anywhere they want.

I noted that one of the concerns James had previously flagged was about
dumping the state of the FDT for debugging purposes. This is noted as
being solved in KHO v6.

No additional concerns were flagged about the single u64 that was
implemented.

----->o-----
Mike noted that memblock is easy and useful for ftrace without a ton of
additional complexity.

----->o-----
We discussed the future of KSTATE[1]. Andrey noted that his vision
allowed for building it on top of KHO as a protocol for describing state.
This hasn't been done yet, but can be worked on.

I asked if the KHO v6 pending in Andrew's tree would cause any issues for
this extension. Andrey noted that this is a format for serializing and
deserializing data; the data itself could be an FDT blob.

Pasha asked who would be benefiting from this serialization. Andrey noted
this would be pretty much everything, including device drivers. At least
with LUO v2, every component gets the u64 to store anything they want and
this memory gets preserved by KHO. Pasha wondered if KSTATE could then
plug into LUO v2, since it doesn't have any format dependency. Andrey
said this could be done. It has flows similar to qemu's.

It was noted that KSTATE was going to be complementary, not overlapping,
with KHO.
Pasha suggested building KSTATE on top of LUO v2 and suggested Chris Li
review the current KSTATE proposal.

----->o-----
We discussed testing for LUO; Pasha noted that the plan is to add
selftests and then asked a general question about existing kexec tests,
including ones that use the qemu emulator. We need to create a mechanism
to do kexec testing with qemu that can be done directly with selftests.

Andrey suggested using a nested VM and testing live update between two
different kernel versions.

I asked about how we could enumerate upstream the kernel versions that
support live update, including their device drivers. I focused on how to
describe the set of drivers that have been tested so that others can
consume that information -- is this done through a code change that
indicates that they can upgrade from a specific version?

David Matlack suggested focusing on automation and frameworks that
downstream consumers of the kernel can use in their own environments.
Pasha suggested a zero-day testing infrastructure for reporting
regressions and something like syzbot to track regressions.

David Matlack also noted that he is working on VFIO selftests.

----->o-----
Ashish Kalra asked about SEV-SNP support for live update. He noted "for
SNP there is a VMSA page which is marked in-use/busy when the guest is
running. Then the VMSA page for the currently running vCPU cannot be
dumped by makedumpfile during vmcore generation as walking the guest
memory and touching it will cause unrecoverable #NPF faults as VMSA is
marked busy. So, this looks like a potential use case for preserving
guest memory across kexec (kstate patches), so that the VMSA page be
marked to be preserved and reserved."
----->o-----
Next meeting will be on Monday, May 5 at 8am PDT (UTC-7), everybody is
welcome: https://meet.google.com/rjn-dmzu-hgq

Topics for the next meeting:

 - address any pending concerns from Jason as well as x86 maintainers
   that start looking at the KHO series
   + we specifically want to wait for Jason to discuss the decision on
     everything being preserved by fds vs recreating the state on the
     other side of kexec
 - determine next set of milestones for KHO v6 beyond what is already
   staged in Andrew's tree
 - determine next set of milestones for LUO v2 in its development and
   limitations on current support
 - confirm that LUO v2 subsumes guestmemfs or identify required support
   that is not yet present
 - update on KSTATE progress on top of LUO and what will be needed for
   PCIe devices
 - upstream patches for 1GB dev dax support
 - update on physical pool allocator that can be used to provide pages
   for hugetlb, guest_memfd, and memfds
 - SEV-SNP support for preserving guest memory and what foundational
   components AMD can depend on, building on top of KHO v6 or KSTATE
 - later: reducing blackout window during live update
 - later: testing methodology to allow downstream consumers to qualify
   that live update works from one version to another

Please let me know if you'd like to propose additional topics for
discussion, thank you!

[1] https://github.com/aryabinin/linux/commits/kstate-v2.1/