discard commands if connected to the NBD server

Nicholas Thomas Wed, 24 Oct 2012 05:17:00 -0700

On Tue, 2012-10-23 at 16:02 +0100, Jamie Lokier wrote:
> Nicholas Thomas wrote:
> > On Tue, 2012-10-23 at 12:33 +0200, Kevin Wolf wrote:
> > > Am 22.10.2012 13:09, schrieb n...@bytemark.co.uk:
> > > > 
> > > > This is unlikely to come up now, but is a necessary prerequisite for 
> > > > reconnection
> > > > behaviour.
> > > > 
> > > > Signed-off-by: Nick Thomas <n...@bytemark.co.uk>
> > > > ---
> > > >  block/nbd.c |   13 +++++++++++--
> > > >  1 files changed, 11 insertions(+), 2 deletions(-)
> > > 
> > > What's the real requirement here? Silently ignoring a flush and
> > > returning success for it feels wrong. Why is it correct?
> > > 
> > > Kevin
> > 
> > I just needed to avoid socket operations while s->sock == -1, and
> > extending the existing case of "can't do the command, so pretend I did
> > it" to "can't do the command right now, so pretend..." seemed like an
> > easy way out. 
> 
> Hi Nicholas,
> 
> Ignoring a flush is another way of saying "corrupt my data" in some
> circumstances.  We have options in QEMU already to say whether flushes
> are ignored on normal discs, but if someone's chosen the "I really
> care about my database/filesystem" option, and verified that their NBD
> setup really performs them (in normal circumstances), silently
> dropping flushes from time to time isn't nice.
> 
> I would much rather the guest is forced to wait until reconnection and
> then get a successful flush, if the problem is just that the server
> was done briefly.  Or, if that is too hard, that the flush is


OK, that's certainly doable. Revised patches hopefully showing up this
week or (at a push) next. 

> Since the I/O _order_ before, and sometimes after, flush, is important
> for data integrity, this needs to be maintained when I/Os are queued in
> the disconnected state -- including those which were inflight at the
> time disconnect was detected and then retried on reconnect.

Hmm, discussing this on IRC I was told that it wasn't necessary to
preserve order - although I forget the fine detail. Depending on the
implementation of qemu's coroutine mutexes, operations may not actually
be performed in order right now - it's not too easy to work out what's
happening. 

I've also just noticed that flush & discard don't take the send_mutex
before writing to the socket. That can't be intentional, surely? Paolo?

I've got a proof-of-concept that makes flush, discard, readv_1 and
writev_1 call a common nbd_co_handle_request function. this retries I/Os
when a connection goes away and comes back, and lets me make this patch
unnecessary. It doesn't preserve I/O ordering, but it won't be too hard
to add that if it is actually necessary.

> Ignoring a discard is not too bad.  However, if discard is retried,
> then I/O order is important in relation to those as well.
> 
> > In the Bytemark case, the NBD server always opens the file O_SYNC, so
> > nbd_co_flush could check in_flight == 0 and return 0/1 based on that;
> > but I'd be surprised if that's true for all NBD servers. Should we be
> > returning 1 here for both "not supported" and "can't do it right now",
> > instead?
> 
> When the server is opening the file O_SYNC, wouldn't it make sense to
> tell QEMU -- and the guest -- that there's no need to send flushes at
> all, as it's equivalent to a disk with no write-cache (or disabled)?

Probably; our NBD server still implements the old-style handshakes, as
far as I recall, and doesn't actually implement the flush or discard
commands as a result.

Re: [Qemu-devel] [PATCH 1/3] nbd: Only try to send flush/discard commands if connected to the NBD server

Reply via email to