Hi everybody,

Here are the notes from the last Hypervisor Live Update call that happened on Monday, January 12. Thanks to everybody who was involved!

These notes are intended to bring people up to speed who could not attend the call as well as keep the conversation going in between meetings.

----->o-----
This was a relatively short meeting due to the holidays and all of the fantastic discussion at LPC in Tokyo.

David Matlack discussed a follow-up from the VFIO talk at LPC. There was feedback that disabling interrupts on devices during live update may cause ordering issues, so David is following up to understand it better and then decide the right approach to address it.

Pasha learned about other work on orphaned VMs, which would be the next step for LUO: being able to do a seamless VM update as a stepping stone. It may take some companies longer to switch to an upstream implementation for live update given their current software stack.

It was noted that after LPC, Pratyush will continue to work on versioning support and memfd preservation.

Mike Rapoport noted that the videos from LPC have been posted online if people missed any of the sessions.

----->o-----
Pasha noted that v4 of stateless KHO was sent out last week and received some upstream feedback. That series is getting better; Mike did not raise any substantial concerns at this time. Another KHO series is also upstream that moves some of the ABI to a separate header file.

For LUO, Pasha is working on luo-agent[1]. There was an upstream patch series sent out for QEMU to support fd preservation using LUO, but without sessions. luo-agent is anticipated to provide the sessions to VMs, so Pasha is speeding up its development. He also updated the design doc[2] for the work and is soliciting feedback from anybody who is interested.

There is also work underway to support LUO with CHV.

----->o-----
David Matlack gave an update on the status of VFIO. His goal is to get v2 out in the next couple of weeks; it shouldn't look much different than before. The next thing on top of that would be iommufd and PCI device preservation. The PCI preservation doesn't yet have a solid timeline for the next posting as engineers are ramping up.

Samiullah noted that he sent out an RFC for IOMMU persistence that was presented at LPC. His plan is to send out a non-RFC version next. He also sent out a series for hitless domain replacement for the Intel IOMMU driver; there was some feedback, so a v2 is in progress. Jason suggested that Intel would need to do this: Sami was going to start a discussion with Intel developers on that support.

----->o-----
Jason noted there was discussion from LPC about a storage box use case that could leverage KHO support for preserving information about the SCSI devices so you don't need to go rediscover everything. This may have overlap with the PCI preservation series if people are interested.

----->o-----
Next meeting will be on Monday, January 26 at 8am PST (UTC-8), everybody is welcome: https://meet.google.com/rjn-dmzu-hgq

Topics for the next meeting:

 - status of stateless KHO patches from Jason Miu
 - possible ordering issues when disabling interrupts during live update
 - luo-agent development and additional LUO changes targeted for upstream
 - next version of VFIO patch series to go upstream
 - non-RFC version of the IOMMU persistence patch series
 - PCI preservation timelines
 - hitless replacement for iommu domains, collaboration with Intel
 - HugeTLB + 1GB page preservation support
 - versioning support for various components for luod to negotiate
 - later: update on status of guest_memfd support for 1GB HugeTLB pages
 - later: testing methodology to allow downstream consumers to qualify that live update works from one version to another
 - later: reducing blackout window during live update, including deferred struct page initialization

Please let me know if you'd like to propose additional topics for discussion, thank you!

[1] https://github.com/googleprodkernel/luo-agent
[2] https://tinyurl.com/luoddesign
