On Wed, 1 Apr 2020 02:41:54 -0400 Yan Zhao <yan.y.z...@intel.com> wrote:
> On Wed, Apr 01, 2020 at 02:34:24AM +0800, Alex Williamson wrote:
> > On Wed, 25 Mar 2020 02:38:58 +0530
> > Kirti Wankhede <kwankh...@nvidia.com> wrote:
> >
> > > Hi,
> > >
> > > This Patch set adds migration support for VFIO devices in QEMU.
> >
> > Hi Kirti,
> >
> > Do you have any migration data you can share to show that this solution
> > is viable and useful? I was chatting with Dave Gilbert and there still
> > seems to be a concern that we actually have a real-world practical
> > solution. We know this is inefficient with QEMU today, vendor pinned
> > memory will get copied multiple times if we're lucky. If we're not
> > lucky we may be copying all of guest RAM repeatedly. There are known
> > inefficiencies with vIOMMU, etc. QEMU could learn new heuristics to
> > account for some of this and we could potentially report different
> > bitmaps in different phases through vfio, but let's make sure that
> > there are useful cases enabled by this first implementation.
> >
> > With a reasonably sized VM, running a reasonable graphics demo or
> > workload, can we achieve reasonably live migration? What kind of
> > downtime do we achieve and what is the working set size of the pinned
> > memory? Intel folks, if you've been able to port to this or similar
> > code base, please report your results as well, open source consumers
> > are arguably even more important. Thanks,
> >
> hi Alex
> we're in the process of porting to this code, and now it's able to
> migrate successfully without dirty pages.
>
> when there're dirty pages, we met several issues.
> one of them is reported here
> (https://lists.gnu.org/archive/html/qemu-devel/2020-04/msg00004.html).
> dirty pages for some regions are not able to be collected correctly,
> especially for memory range from 3G to 4G.
>
> even without this bug, qemu still got stuck in middle before
> reaching stop-and-copy phase and cannot be killed by admin.
> still in debugging of this problem. Thanks, Yan.

So it seems we have various bugs, known limitations, and we haven't
actually proven that this implementation provides a useful feature, at
least for the open source consumer. This doesn't give me much
confidence to consider the kernel portion ready for v5.7 given how late
we are already :-\ Thanks,

Alex