Hi Nick,

[ Please use "reply to all" so that the CC list is kept intact;
re-adding Stefan ]
On 15.02.2011 22:26, Nicholas Thomas wrote:
> Hi Kevin, Stefan.
>
> On Tue, 2011-02-15 at 12:09 +0100, Kevin Wolf wrote:
>> On 14.02.2011 21:32, Stefan Hajnoczi wrote:
> [...]
>>> block/nbd.c needs to be made asynchronous in order for this change to
>>> work.
>>
>> And even then it's not free of problems: for example, qemu_aio_flush()
>> will hang. We're having all kinds of fun with NFS servers that go away
>> and let requests hang indefinitely.
>>
>> So maybe what we should add is a timeout option which defaults to 0
>> (fail immediately, like today)
>
> Noted, so long as we can have -1 as "forever".

As long as it's optional, that's fine with me.

> I'm currently spending
> time reworking block/nbd.c to be asynchronous, following the model in
> block/sheepdog.c
>
> There does seem to be a lot of scope for code duplication (setting up
> the TCP connection, taking it down, the mechanics of actually reading /
> writing bytes using the aio interface, etc.) between the two, and
> presumably for rbd as well.
>
> Reading http://www.mail-archive.com/qemu-devel@nongnu.org/msg36479.html
> suggests it should be possible to have a "tcp" (+ "unix") protocol /
> transport, which nbd+sheepdog could stack on top of (curl+rbd seem to
> depend on their own libraries for managing the TCP part of the
> connection).
>
> They would implement talking the actual protocol, while the tcp/unix
> transports would have the duplicatable bits.
>
> I've not investigated it in code yet - it's possible I'm just letting my
> appetite for abstraction get away with me. Thoughts?

I'm not sure how much duplication there actually is, but if you can take
a closer look and think it's worthwhile, we should probably consider it.

>> Unconditionally stopping the VM from a block driver sounds wrong to me.
>> If you want to have this behaviour, the block driver should return an
>> error and you should use werror=stop.
>
> Unconditional? - if the socket manages to re-establish, the process
> continues on its way (I guess we'd see the same behaviour if a send/recv
> happened to take an unconscionably long time with the current code).
>
> Making just the I/O hang until the network comes back, keeping guest
> execution and the qemu monitor working, is obviously better than that
> (although not /strictly/ necessary for our particular use case), so I
> hope to be able to offer an AIO NBD patch for review "soon".

Maybe I wasn't very clear. I was talking about Stefan's suggestion to
completely stop the VM, like we can already do for I/O errors (see the
werror and rerror options for -drive). I don't think that's what you're
looking for; you just need the timeout=-1 thing.

>>> IPv6 would be nice and if you can consolidate that in qemu_socket.h,
>>> then that's a win for non-nbd socket users too.
>>
>> Agreed.
>
> We'd get it for free with a unified TCP transport, as described above
> (sheepdog already uses getaddrinfo and friends) - but if that's not
> feasible, I'll be happy to supply a patch just for this. Much easier
> than aio! :)

Sure, I think it would be a good thing to have. And even if you
implemented this unified TCP transport (I'm not sure yet what it would
look like), I think the basic support could still be in qemu_socket.h,
so that users outside the block layer can benefit from it, too.

A few rough sketches of the ideas discussed in this mail follow below.

Kevin
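
First, the reconnect timeout. This is a minimal sketch, assuming the
hypothetical timeout option discussed above (0 = fail immediately, like
today; -1 = retry forever; N = give up after N seconds).
nbd_do_connect() is a made-up stand-in for the driver's existing connect
logic, and a real implementation would retry asynchronously instead of
calling sleep():

    #include <errno.h>
    #include <time.h>
    #include <unistd.h>

    int nbd_do_connect(void *opaque);   /* assumed: returns fd or -errno */

    static int nbd_connect_with_timeout(void *opaque, int timeout)
    {
        time_t deadline = (timeout > 0) ? time(NULL) + timeout : 0;
        int fd;

        for (;;) {
            fd = nbd_do_connect(opaque);
            if (fd >= 0) {
                return fd;              /* connected */
            }
            if (timeout == 0) {
                return fd;              /* fail immediately, like today */
            }
            if (timeout > 0 && time(NULL) >= deadline) {
                return -ETIMEDOUT;      /* gave up after <timeout> seconds */
            }
            sleep(1);                   /* back off before retrying */
        }
    }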
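
For the asynchronous rework itself, the pattern block/sheepdog.c follows
boils down to a non-blocking socket plus callbacks that resume partially
transferred requests instead of blocking. The sketch below shows just the
sending half of such a state machine; all struct and function names are
invented, and plain POSIX send() stands in for whatever QEMU's fd handler
registration ends up driving:

    #include <errno.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <sys/socket.h>

    typedef struct NBDRequest NBDRequest;
    struct NBDRequest {
        uint8_t buf[4096];
        size_t len;                  /* total bytes to send */
        size_t done;                 /* bytes sent so far */
        void (*cb)(NBDRequest *req, int ret);
        NBDRequest *next;            /* queued requests, sheepdog-style */
    };

    /* Make progress on the request at the head of the queue; called from
     * the fd's "writable" callback.  Returns 0 when fully sent, -EAGAIN
     * when we must wait for the socket to become writable again. */
    static int nbd_send_some(int fd, NBDRequest *req)
    {
        while (req->done < req->len) {
            ssize_t n = send(fd, req->buf + req->done,
                             req->len - req->done, 0);
            if (n < 0) {
                int err = -errno;
                if (errno == EAGAIN || errno == EWOULDBLOCK) {
                    return -EAGAIN;  /* resume when the fd is writable */
                }
                req->cb(req, err);   /* hard error: complete with failure */
                return err;
            }
            req->done += n;
        }
        req->cb(req, 0);             /* request fully sent, fire completion */
        return 0;
    }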
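
On the tcp/unix transport idea: one shape it could take is a small ops
table that owns connection setup/teardown and byte-level I/O, with the
protocol drivers (nbd, sheepdog, ...) programming against it. Everything
here is hypothetical; nothing like this exists in the tree:

    #include <sys/uio.h>

    typedef struct SocketTransport SocketTransport;

    /* The transport owns the "duplicatable bits": connecting, closing
     * and moving bytes.  Protocol drivers never touch the fd directly. */
    typedef struct SocketTransportOps {
        int     (*connect)(SocketTransport *t, const char *addr);
        void    (*close)(SocketTransport *t);
        ssize_t (*readv)(SocketTransport *t, struct iovec *iov, int iovcnt);
        ssize_t (*writev)(SocketTransport *t, struct iovec *iov, int iovcnt);
    } SocketTransportOps;

    struct SocketTransport {
        const SocketTransportOps *ops;
        int fd;
    };

    /* Two implementations would cover the cases in this thread: */
    extern const SocketTransportOps tcp_transport_ops;  /* getaddrinfo */
    extern const SocketTransportOps unix_transport_ops; /* AF_UNIX */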
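
Finally, the qemu_socket.h consolidation: a getaddrinfo()-based connect
helper gets IPv6 essentially for free by walking all returned addresses.
The helper name is made up and error handling is cut to the bone, but
the code itself is plain POSIX:

    #include <netdb.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static int tcp_socket_connect(const char *host, const char *port)
    {
        struct addrinfo hints, *res, *ai;
        int fd = -1;

        memset(&hints, 0, sizeof(hints));
        hints.ai_family = AF_UNSPEC;      /* both AF_INET and AF_INET6 */
        hints.ai_socktype = SOCK_STREAM;

        if (getaddrinfo(host, port, &hints, &res) != 0) {
            return -1;
        }
        for (ai = res; ai; ai = ai->ai_next) {
            fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
            if (fd < 0) {
                continue;
            }
            if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) {
                break;                    /* connected */
            }
            close(fd);
            fd = -1;
        }
        freeaddrinfo(res);
        return fd;
    }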