On Tue, Mar 19, 2013 at 11:21 AM, George Dunlap <george.dun...@eu.citrix.com> wrote:
> On 03/19/2013 11:14 AM, Paolo Bonzini wrote:
>>
>> On 19/03/2013 11:51, George Dunlap wrote:
>>>
>>> On 03/19/2013 10:43 AM, Paolo Bonzini wrote:
>>>>>>
>>>>>> Even for successful migration, it would also be bad for downtime (QEMU
>>>>>> isn't exactly lightning-fast to start). And even if failure weren't
>>>>>> catastrophic, it would be a pity to transfer a few gigs of memory and
>>>>>> then find out that QEMU isn't present in the destination. :)
>>>>>
>>>>> Well, if qemu isn't present at the destination, that's definitely user
>>>>> error. :-) In any case, I know that the migrate can resume if it
>>>>> fails, so I suspect that the qemu is just paused on the sending side
>>>>> until the migration is known to complete. As long as the last write
>>>>> was flushed to the NFS server before the receiver opens the file, we
>>>>> should be safe.
>>>>
>>>> Note that the close really must happen before the next open. Otherwise
>>>> the file metadata might not be up-to-date on the destination, too.
>>>
>>> By "file metadata" I assume you mean "metadata about the virtual disk
>>> within the file", not "metadata about the file within the filesystem",
>>> right? That's good to know, I'll keep that in mind.
>>
>> Actually especially the former (I'm calling them respectively "image
>> metadata" and "file metadata"). File metadata could also be a problem,
>> but I think it might just work except in cases like on-line resizing
>> during migration.
>>
>>> Even if it's true that at the moment qemu doesn't write the file
>>> metadata until it closes the file, that just means we'd have to add a
>>> hook to the callback to save qemu state, to sync the metadata at that
>>> point, right?
>>
>> Unfortunately no. The problem is in the loading side's kernel, on which
>> you do not have any control. If the loading side doesn't use O_DIRECT,
>> any attempt to invalidate the metadata in userspace or on the source is
>> futile, because there is no way to invalidate the page cache's copy of
>> that metadata.
>
> Yes, I meant "the only further thing we would have to do". The entire
> discussion relies on the assumption that the receiving side doesn't open the
> file until after the sending side has issued the qemu state save. So as long
> as both the virtual blocks and the image metadata have been synced to the
> NFS server at that point, we should be all right. If at the moment the
> image metadata is *not* synced at that point, it seems like we should be
> able to make it so relatively easily.
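
To make the ordering assumption in the quoted text concrete, here is a
minimal Python sketch. It is not qemu code: the function names and the
file name are made up for illustration, and it runs against a local
temporary directory rather than an NFS mount. It only shows the sequence
the discussion relies on: the sender writes, syncs, and closes before
the receiver opens the file for the first time. On NFS this matters
because of close-to-open consistency; as Paolo points out, a file that
the receiver already has open (without O_DIRECT) can still be served
from a stale page cache, and this sketch does nothing about that case.

```python
import os
import tempfile


def sender_flush_and_close(path: str, data: bytes) -> None:
    """Hypothetical sender side: write, force to stable storage, close."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Push the data (and file size) out before anyone else opens it.
        os.fsync(fd)
    finally:
        # The close must complete before the receiver's open.
        os.close(fd)


def receiver_open_and_read(path: str) -> bytes:
    """Hypothetical receiver side: first open happens after the close."""
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, 65536)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)


def demo() -> bytes:
    """Run sender then receiver, strictly in that order."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "qemu-save-state.bin")  # made-up name
        sender_flush_and_close(path, b"device state")
        return receiver_open_and_read(path)
```

Calling demo() returns whatever the receiver read back. The point of the
sketch is only the ordering of fsync, close, and open, not the I/O
itself.
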
I've just had a chat with Stefano, and it turns out I was a bit confused
-- this change has nothing to do with qemu running as a device model,
but only with qemu running as a PV back-end for PV guests. So the
question of when in the save/restore process the qemu is started is
moot.

For posterity's sake, however, here is what I have found about qemu as a
device model and Xen migration:

* qemu on the receiving side gets the name of the qemu save file from
  the command-line arguments; so it cannot be started until after qemu
  on the sending side has been paused, and the file sent across the
  wire.

* qemu on the sending side does not exit until it is determined that the
  receiver is ready to begin. So it is still running when qemu on the
  receiving side starts.

* I didn't determine whether the commands to stop and save state cause a
  disk image metadata sync.

 -George