Hi Nick,

[ Please use "reply to all" so that the CC list is kept intact;
re-adding Stefan ]
On 15.02.2011 22:26, Nicholas Thomas wrote:
> Hi Kevin, Stefan.
>
> On Tue, 2011-02-15 at 12:09 +0100, Kevin Wolf wrote:
>> On 14.02.2011 21:32, Stefan Hajnoczi wrote:
> [...]
>>> block/nbd.c needs to be made asynchronous in order for this change to
>>> work.
>>
>> And even then it's not free of problems: for example, qemu_aio_flush()
>> will hang. We're having all kinds of fun with NFS servers that go away
>> and let requests hang indefinitely.
>>
>> So maybe what we should add is a timeout option which defaults to 0
>> (fail immediately, like today)
>
> Noted, so long as we can have -1 as "forever".

As long as it's optional, that's fine with me.

> I'm currently spending
> time reworking block/nbd.c to be asynchronous, following the model in
> block/sheepdog.c
>
> There does seem to be a lot of scope for code duplication (setting up
> the TCP connection, taking it down, the mechanics of actually reading /
> writing bytes using the aio interface, etc.) between the two, and
> presumably for rbd as well.
>
> Reading http://www.mail-archive.com/qemu-devel@nongnu.org/msg36479.html
> suggests it should be possible to have a "tcp" (+ "unix") protocol /
> transport, which nbd+sheepdog could stack on top of (curl+rbd seem to
> depend on their own libraries for managing the TCP part of the
> connection).
>
> They would implement talking the actual protocol, while the tcp/unix
> transports would have the duplicatable bits.
>
> I've not investigated it in code yet - it's possible I'm just letting my
> appetite for abstraction get away with me. Thoughts?

I'm not sure how much duplication there actually is, but if you can take
a closer look and think it's worthwhile, we should probably consider it.

>> Unconditionally stopping the VM from a block driver sounds wrong to me.
>> If you want to have this behaviour, the block driver should return an
>> error and you should use werror=stop.
>
> Unconditional? - if the socket manages to re-establish, the process
> continues on its way (I guess we'd see the same behaviour if a send/recv
> happened to take an unconscionably long time with the current code).
>
> Making just the I/O hang until the network comes back, keeping guest
> execution and the qemu monitor working, is obviously better than that
> (although not /strictly/ necessary for our particular use case), so I
> hope to be able to offer an AIO NBD patch for review "soon".

Maybe I wasn't very clear. I was talking about Stefan's suggestion to
completely stop the VM, like we can already do for I/O errors (see the
werror and rerror options for -drive). I don't think that's what you're
looking for; you just need the timeout=-1 thing.

>>> IPv6 would be nice and if you can consolidate that in qemu_socket.h,
>>> then that's a win for non-nbd socket users too.
>>
>> Agreed.
>
> We'd get it for free with a unified TCP transport, as described above
> (sheepdog already uses getaddrinfo and friends) - but if that's not
> feasible, I'll be happy to supply a patch just for this. Much easier
> than aio! :)

Sure, I think it would be a good thing to have. And even if you
implemented this unified TCP transport (I'm not sure yet what it would
look like), I think the basic support could still be in qemu_socket.h,
so that users outside the block layer can benefit from it, too.

A few rough sketches of the ideas discussed in this mail follow below.

Kevin
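
First, the reconnect timeout. This is a minimal sketch, assuming the
hypothetical timeout option discussed above (0 = fail immediately, like
today; -1 = retry forever; N = give up after N seconds).
nbd_do_connect() is a made-up stand-in for the driver's existing connect
logic, and a real implementation would retry asynchronously instead of
calling sleep():

    #include <errno.h>
    #include <time.h>
    #include <unistd.h>

    int nbd_do_connect(void *opaque);   /* assumed: returns fd or -errno */

    static int nbd_connect_with_timeout(void *opaque, int timeout)
    {
        time_t deadline = (timeout > 0) ? time(NULL) + timeout : 0;
        int fd;

        for (;;) {
            fd = nbd_do_connect(opaque);
            if (fd >= 0) {
                return fd;              /* connected */
            }
            if (timeout == 0) {
                return fd;              /* fail immediately, like today */
            }
            if (timeout > 0 && time(NULL) >= deadline) {
                return -ETIMEDOUT;      /* gave up after <timeout> seconds */
            }
            sleep(1);                   /* back off before retrying */
        }
    }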
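
For the asynchronous rework itself, the pattern block/sheepdog.c follows
boils down to a non-blocking socket plus callbacks that resume partially
transferred requests instead of blocking. The sketch below shows just the
sending half of such a state machine; all struct and function names are
invented, and plain POSIX send() stands in for whatever QEMU's fd handler
registration ends up driving:

    #include <errno.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <sys/socket.h>

    typedef struct NBDRequest NBDRequest;
    struct NBDRequest {
        uint8_t buf[4096];
        size_t len;                  /* total bytes to send */
        size_t done;                 /* bytes sent so far */
        void (*cb)(NBDRequest *req, int ret);
        NBDRequest *next;            /* queued requests, sheepdog-style */
    };

    /* Make progress on the request at the head of the queue; called from
     * the fd's "writable" callback.  Returns 0 when fully sent, -EAGAIN
     * when we must wait for the socket to become writable again. */
    static int nbd_send_some(int fd, NBDRequest *req)
    {
        while (req->done < req->len) {
            ssize_t n = send(fd, req->buf + req->done,
                             req->len - req->done, 0);
            if (n < 0) {
                int err = -errno;
                if (errno == EAGAIN || errno == EWOULDBLOCK) {
                    return -EAGAIN;  /* resume when the fd is writable */
                }
                req->cb(req, err);   /* hard error: complete with failure */
                return err;
            }
            req->done += n;
        }
        req->cb(req, 0);             /* request fully sent, fire completion */
        return 0;
    }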
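
On the tcp/unix transport idea: one shape it could take is a small ops
table that owns connection setup/teardown and byte-level I/O, with the
protocol drivers (nbd, sheepdog, ...) programming against it. Everything
here is hypothetical; nothing like this exists in the tree:

    #include <sys/uio.h>

    typedef struct SocketTransport SocketTransport;

    /* The transport owns the "duplicatable bits": connecting, closing
     * and moving bytes.  Protocol drivers never touch the fd directly. */
    typedef struct SocketTransportOps {
        int     (*connect)(SocketTransport *t, const char *addr);
        void    (*close)(SocketTransport *t);
        ssize_t (*readv)(SocketTransport *t, struct iovec *iov, int iovcnt);
        ssize_t (*writev)(SocketTransport *t, struct iovec *iov, int iovcnt);
    } SocketTransportOps;

    struct SocketTransport {
        const SocketTransportOps *ops;
        int fd;
    };

    /* Two implementations would cover the cases in this thread: */
    extern const SocketTransportOps tcp_transport_ops;  /* getaddrinfo */
    extern const SocketTransportOps unix_transport_ops; /* AF_UNIX */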
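
Finally, the qemu_socket.h consolidation: a getaddrinfo()-based connect
helper gets IPv6 essentially for free by walking all returned addresses.
The helper name is made up and error handling is cut to the bone, but
the code itself is plain POSIX:

    #include <netdb.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static int tcp_socket_connect(const char *host, const char *port)
    {
        struct addrinfo hints, *res, *ai;
        int fd = -1;

        memset(&hints, 0, sizeof(hints));
        hints.ai_family = AF_UNSPEC;      /* both AF_INET and AF_INET6 */
        hints.ai_socktype = SOCK_STREAM;

        if (getaddrinfo(host, port, &hints, &res) != 0) {
            return -1;
        }
        for (ai = res; ai; ai = ai->ai_next) {
            fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
            if (fd < 0) {
                continue;
            }
            if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) {
                break;                    /* connected */
            }
            close(fd);
            fd = -1;
        }
        freeaddrinfo(res);
        return fd;
    }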