On Tue, 2012-10-23 at 16:02 +0100, Jamie Lokier wrote: > Nicholas Thomas wrote: > > On Tue, 2012-10-23 at 12:33 +0200, Kevin Wolf wrote: > > > Am 22.10.2012 13:09, schrieb n...@bytemark.co.uk: > > > > > > > > This is unlikely to come up now, but is a necessary prerequisite for > > > > reconnection > > > > behaviour. > > > > > > > > Signed-off-by: Nick Thomas <n...@bytemark.co.uk> > > > > --- > > > > block/nbd.c | 13 +++++++++++-- > > > > 1 files changed, 11 insertions(+), 2 deletions(-) > > > > > > What's the real requirement here? Silently ignoring a flush and > > > returning success for it feels wrong. Why is it correct? > > > > > > Kevin > > > > I just needed to avoid socket operations while s->sock == -1, and > > extending the existing case of "can't do the command, so pretend I did > > it" to "can't do the command right now, so pretend..." seemed like an > > easy way out. > > Hi Nicholas, > > Ignoring a flush is another way of saying "corrupt my data" in some > circumstances. We have options in QEMU already to say whether flushes > are ignored on normal discs, but if someone's chosen the "I really > care about my database/filesystem" option, and verified that their NBD > setup really performs them (in normal circumstances), silently > dropping flushes from time to time isn't nice. > > I would much rather the guest is forced to wait until reconnection and > then get a successful flush, if the problem is just that the server > was done briefly. Or, if that is too hard, that the flush is
OK, that's certainly doable. Revised patches hopefully showing up this week or (at a push) next. > Since the I/O _order_ before, and sometimes after, flush, is important > for data integrity, this needs to be maintained when I/Os are queued in > the disconnected state -- including those which were inflight at the > time disconnect was detected and then retried on reconnect. Hmm, discussing this on IRC I was told that it wasn't necessary to preserve order - although I forget the fine detail. Depending on the implementation of qemu's coroutine mutexes, operations may not actually be performed in order right now - it's not too easy to work out what's happening. I've also just noticed that flush & discard don't take the send_mutex before writing to the socket. That can't be intentional, surely? Paolo? I've got a proof-of-concept that makes flush, discard, readv_1 and writev_1 call a common nbd_co_handle_request function. this retries I/Os when a connection goes away and comes back, and lets me make this patch unnecessary. It doesn't preserve I/O ordering, but it won't be too hard to add that if it is actually necessary. > Ignoring a discard is not too bad. However, if discard is retried, > then I/O order is important in relation to those as well. > > > In the Bytemark case, the NBD server always opens the file O_SYNC, so > > nbd_co_flush could check in_flight == 0 and return 0/1 based on that; > > but I'd be surprised if that's true for all NBD servers. Should we be > > returning 1 here for both "not supported" and "can't do it right now", > > instead? > > When the server is opening the file O_SYNC, wouldn't it make sense to > tell QEMU -- and the guest -- that there's no need to send flushes at > all, as it's equivalent to a disk with no write-cache (or disabled)? Probably; our NBD server still implements the old-style handshakes, as far as I recall, and doesn't actually implement the flush or discard commands as a result.