On 19/03/2013 11:06, George Dunlap wrote:
> On Mon, Mar 18, 2013 at 6:00 PM, Paolo Bonzini <pbonz...@redhat.com> wrote:
>> On 18/03/2013 18:38, George Dunlap wrote:
>>>> This might be a difference between Xen and KVM. On Xen, migration is
>>>> made to a server in a paused state, and it's only unpaused when
>>>> the migration to B is complete. There's a sort of extra handshake at
>>>> the end.
>>>
>>> I think what you mean is that all the memory is handled by Xen and the
>>> toolstack, not by qemu. The qemu state is sent as the very last thing,
>>> after all of the memory, and therefore (you are arguing) that qemu is
>>> not started, and the files cannot be opened, until after the migration
>>> is nearly complete, and certainly until after the file is closed on the
>>> sending side.
>>
>> That would be quite dangerous. Files aren't closed until after QEMU
>> exits; at this point whatever problem you have launching QEMU on the
>> destination would be unrecoverable.
>
> But if I understand your concern correctly, you were concerned about
> the following scenario:
> R1. Receiver qemu opens file
> R2. Something causes receiver kernel to cache parts of file (maybe
> optimistic read-ahead)

For some image formats, metadata is cached inside QEMU on startup.
There is a callback to invalidate QEMU's cache at the end of migration,
but that does not extend to the page cache.

> S1. Sender qemu writes to file
> S2. Sender qemu does final flush
> S3. Sender qemu closes file
> R3. Receiver reads stale blocks from cache
>
> Even supposing that Xen doesn't actually shut down qemu until it is
> started on the remote side, as long as the file isn't opened by qemu
> until after S2, we should be safe, right? It would look like this:
>
> S1. Sender qemu writes to file
> S2. Sender qemu does final flush
> R1. Receiver qemu opens file
> R2. Receiver kernel caches file
> S3. Sender qemu closes file
>
> This is all assuming that:
> 1. The barrier operations / write flush are effective at getting the
> data back on to the NFS server
> 2. The receiver qemu doesn't open the file until after the last flush
> by the sender.
>
> Number 1 has been tested by Alex I believe, and is mentioned in the
> changeset log; so if #2 is true, then we should be safe. I'll try to
> verify that today.

Thanks.

>> Even for successful migration, it would also be bad for downtime (QEMU
>> isn't exactly lightning-fast to start). And even if failure weren't
>> catastrophic, it would be a pity to transfer a few gigs of memory and
>> then find out that QEMU isn't present in the destination. :)
>
> Well, if qemu isn't present at the destination, that's definitely user
> error. :-) In any case, I know that the migration can resume if it
> fails, so I suspect that the qemu is just paused on the sending side
> until the migration is known to complete. As long as the last write
> was flushed to the NFS server before the receiver opens the file, we
> should be safe.

Note that the close really must happen before the next open. Otherwise,
the file metadata might also be stale on the destination, not just the
data blocks.

Paolo
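To make the ordering argument concrete, here is a minimal sketch that checks an event trace against the two conditions discussed above: the receiver's open (R1) must come after the sender's final flush (S2), and, per the close-to-open point, also after the sender's close (S3). The function name check_trace and the string labels are hypothetical, chosen only to mirror the S1/S2/S3 and R1/R2/R3 steps in this thread; this is not code from Xen, QEMU, or the NFS client.

```python
"""Hypothetical checker for the migration event orderings in this thread.

Labels follow the thread: S1 write, S2 final flush, S3 close (sender);
R1 open, R2 kernel caches file, R3 read (receiver). The rules encoded
here are an assumption based on the discussion (NFS close-to-open
consistency), not an implementation taken from Xen or QEMU.
"""
from __future__ import annotations

from typing import Sequence


def check_trace(trace: Sequence[str]) -> list[str]:
    """Return the ordering violations found in trace (empty list = safe)."""
    pos = {event: i for i, event in enumerate(trace)}
    problems: list[str] = []
    if "R1" in pos:
        # George's condition: no open on the receiver before the last flush.
        if "S2" in pos and pos["R1"] < pos["S2"]:
            problems.append("receiver opened before sender's final flush")
        # Paolo's stricter condition: the close must precede the next open,
        # otherwise cached data or metadata on the receiver may be stale.
        if "S3" not in pos or pos["R1"] < pos["S3"]:
            problems.append("receiver opened before sender closed the file")
    return problems


if __name__ == "__main__":
    scenarios = {
        "open before flush": ["R1", "R2", "S1", "S2", "S3", "R3"],
        "open after flush, before close": ["S1", "S2", "R1", "R2", "S3"],
        "open after close": ["S1", "S2", "S3", "R1", "R2", "R3"],
    }
    for name, trace in scenarios.items():
        print(f"{name}: {check_trace(trace) or 'safe'}")
```

Under these assumed rules, the second ordering from the thread satisfies the flush condition but still fails the close-to-open condition, which is exactly the gap the final reply points out; only the third ordering passes both checks.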