Re: [Qemu-devel] [RFC 0/5] nbd: Adapt for dataplane
On 03.06.2014 16:38, Stefan Hajnoczi wrote:
> On Sat, May 31, 2014 at 08:43:07PM +0200, Max Reitz wrote:
>> For the NBD server to work with dataplane, it needs to access the
>> exported BDS correctly. It makes the most sense to run both in the same
>> AioContext, so this series implements methods for tracking a BDS's
>> AioContext and makes NBD use them to keep the clients connected to that
>> BDS in the same AioContext.
>>
>> The reason this is an RFC and not a PATCH is my inexperience with AIO,
>> coroutines and the like. Also, I'm not sure what to do about the
>> coroutines. The NBD server has up to two coroutines per client: one for
>> receiving and one for sending. Theoretically, both would have to be
>> transferred to the new AioContext when it changes; however, as far as I
>> can see, coroutines are not really bound to an AioContext, they simply
>> run in whatever AioContext enters them. All coroutines are entered from
>> nbd_read() and nbd_restart_write(), both of which are AIO routines
>> registered via aio_set_fd_handler2(). As bs_aio_detach() unregisters
>> all of these routines, the coroutines can no longer be entered until
>> bs_aio_attach() is called again; when those routines then fire, they
>> will enter the coroutines in the new AioContext. Therefore, I think an
>> explicit transfer is unnecessary.
>
> This reasoning sounds correct.
>
>> However, if bs_aio_detach() is called from a different thread than the
>> one the old AioContext is running in, we may still have coroutines
>> running for which we should wait before returning from bs_aio_detach().
>
> The bdrv_attach/detach_aio_context() APIs have rules regarding where
> these functions are called from:
>
>   /**
>    * bdrv_set_aio_context:
>    *
>    * Changes the #AioContext used for fd handlers, timers, and BHs by
>    * this BlockDriverState and all its children.
>    *
>    * This function must be called from the old #AioContext or with a
>    * lock held so the old #AioContext is not executing.

Oh, that makes things easier. *g*

>    */
>   void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context);
>
> and:
>
>   /* Remove fd handlers, timers, and other event loop callbacks so the
>    * event loop is no longer in use. Called with no in-flight requests
>    * and in depth-first traversal order with parents before child nodes.
>    */
>   void (*bdrv_detach_aio_context)(BlockDriverState *bs);
>
>   /* Add fd handlers, timers, and other event loop callbacks so I/O
>    * requests can be processed again. Called with no in-flight requests
>    * and in depth-first traversal order with child nodes before parent
>    * nodes.
>    */
>   void (*bdrv_attach_aio_context)(BlockDriverState *bs,
>                                   AioContext *new_context);
>
> These rules ensure that it's safe to perform these operations. You don't
> have to support arbitrary callers in NBD either.
>
>> But because of my inexperience with coroutines, I'm not sure. I have
>> had these patches nearly unchanged here for about a week now and have
>> been looking for ways to test them; so far I could only check that the
>> old use cases still work, not what the patches are actually meant for:
>> BDSs changing their AioContext. So, because I'm not sure what else to
>> do and because I don't know how to test multiple AIO threads (how do I
>> move a BDS into another iothread?) I'm just sending this out as an RFC.
>
> Use a Linux guest with virtio-blk:
>
>   qemu -drive if=none,file=test.img,id=drive0 \
>        -object iothread,id=iothread0 \
>        -device virtio-blk-pci,drive=drive0,x-iothread=iothread0 \
>        ...
>
> Once the guest has booted, the virtio-blk device will be in dataplane
> mode. That means drive0's BlockDriverState->aio_context will be the
> IOThread AioContext and not the global qemu_aio_context.

Ah, thank you.

> Now you can exercise the run-time NBD server over QMP and check that
> things still work. For example, try running a few instances of
>
>   dd if=/dev/vdb of=/dev/null iflag=direct
>
> inside the guest to stress guest I/O.
>
> Typically, what happens when code is not dataplane-aware is that a
> deadlock or crash occurs due to race conditions between the QEMU main
> loop and the IOThread for this virtio-blk device.
>
> For an overview of dataplane programming concepts, see:
> https://lists.gnu.org/archive/html/qemu-devel/2014-05/msg01436.html

Yes, I took this email as a reference; however, it only said how to create
a new iothread, not how to use it. :-)

Max
Re: [Qemu-devel] [RFC 0/5] nbd: Adapt for dataplane
On Sat, May 31, 2014 at 08:43:07PM +0200, Max Reitz wrote:
> For the NBD server to work with dataplane, it needs to access the
> exported BDS correctly. It makes the most sense to run both in the same
> AioContext, so this series implements methods for tracking a BDS's
> AioContext and makes NBD use them to keep the clients connected to that
> BDS in the same AioContext.
>
> The reason this is an RFC and not a PATCH is my inexperience with AIO,
> coroutines and the like. Also, I'm not sure what to do about the
> coroutines. The NBD server has up to two coroutines per client: one for
> receiving and one for sending. Theoretically, both would have to be
> transferred to the new AioContext when it changes; however, as far as I
> can see, coroutines are not really bound to an AioContext, they simply
> run in whatever AioContext enters them. All coroutines are entered from
> nbd_read() and nbd_restart_write(), both of which are AIO routines
> registered via aio_set_fd_handler2(). As bs_aio_detach() unregisters all
> of these routines, the coroutines can no longer be entered until
> bs_aio_attach() is called again; when those routines then fire, they
> will enter the coroutines in the new AioContext. Therefore, I think an
> explicit transfer is unnecessary.
>
> However, if bs_aio_detach() is called from a different thread than the
> one the old AioContext is running in, we may still have coroutines
> running for which we should wait before returning from bs_aio_detach().
> But because of my inexperience with coroutines, I'm not sure.
>
> I have had these patches nearly unchanged here for about a week now and
> have been looking for ways to test them; so far I could only check that
> the old use cases still work, not what the patches are actually meant
> for: BDSs changing their AioContext. So, because I'm not sure what else
> to do and because I don't know how to test multiple AIO threads (how do
> I move a BDS into another iothread?) I'm just sending this out as an
> RFC.
>
> Max Reitz (5):
>   nbd: Correct name comparison for export_set_name()
>   aio: Add io_read_poll() callback
>   nbd: Use aio_set_fd_handler2()
>   block: Add AIO followers
>   nbd: Follow the BDS' AIO context
>
>  aio-posix.c               | 26 -
>  block.c                   | 55 +++
>  include/block/aio.h       | 12 ++
>  include/block/block_int.h | 40
>  include/qemu/main-loop.h  |  1 -
>  nbd.c                     | 59 ++-
>  6 files changed, 175 insertions(+), 18 deletions(-)

Thanks for the RFC. It looks like the concept will work correctly. I left
comments; a lot of them are just ideas and you don't have to implement
them if you don't want to.

Stefan
Re: [Qemu-devel] [RFC 0/5] nbd: Adapt for dataplane
On Sat, May 31, 2014 at 10:24:47PM +0200, Max Reitz wrote:
> On 31.05.2014 20:43, Max Reitz wrote:
>> [snip]
>> However, if bs_aio_detach() is called from a different thread than the
>> one the old AioContext is running in, we may still have coroutines
>> running for which we should wait before returning from bs_aio_detach().
>
> After re-reading Stefan's RFC and the AIO locking code, would it suffice
> to call aio_context_acquire() and aio_context_release() on the old
> AioContext at the end of bs_aio_detach() to ensure all coroutines are
> settled?
>
> If not, I guess I either have to wait until the coroutine variables are
> set to NULL (which indicates they have completely finished execution;
> however, this is not actually necessary and might lead to an infinite
> loop if the block driver keeps yielding due to some condition related to
> the AioContext switch), or I really have to switch the existing
> coroutines to the new AioContext while they are running. Doing this from
> the outside will probably be even messier than it would already be from
> the inside, so I sure hope to be able to avoid this…

I think you don't need to worry about this at all. bdrv_set_aio_context()
is always invoked with the QEMU global mutex held.

Regarding switching a coroutine from one AioContext to another (possibly
in another thread): I've implemented it, but it's a little involved
(complex and slow). Let's avoid it if possible. I also lost the patch
during the great "make distclean twice deletes .git/" fiasco of 2014, so
I would need to reimplement it. :)

Stefan
Re: [Qemu-devel] [RFC 0/5] nbd: Adapt for dataplane
On Sat, May 31, 2014 at 08:43:07PM +0200, Max Reitz wrote:
> For the NBD server to work with dataplane, it needs to access the
> exported BDS correctly. It makes the most sense to run both in the same
> AioContext, so this series implements methods for tracking a BDS's
> AioContext and makes NBD use them to keep the clients connected to that
> BDS in the same AioContext.
>
> The reason this is an RFC and not a PATCH is my inexperience with AIO,
> coroutines and the like. Also, I'm not sure what to do about the
> coroutines. The NBD server has up to two coroutines per client: one for
> receiving and one for sending. Theoretically, both would have to be
> transferred to the new AioContext when it changes; however, as far as I
> can see, coroutines are not really bound to an AioContext, they simply
> run in whatever AioContext enters them. All coroutines are entered from
> nbd_read() and nbd_restart_write(), both of which are AIO routines
> registered via aio_set_fd_handler2(). As bs_aio_detach() unregisters all
> of these routines, the coroutines can no longer be entered until
> bs_aio_attach() is called again; when those routines then fire, they
> will enter the coroutines in the new AioContext. Therefore, I think an
> explicit transfer is unnecessary.

This reasoning sounds correct.

> However, if bs_aio_detach() is called from a different thread than the
> one the old AioContext is running in, we may still have coroutines
> running for which we should wait before returning from bs_aio_detach().

The bdrv_attach/detach_aio_context() APIs have rules regarding where these
functions are called from:

  /**
   * bdrv_set_aio_context:
   *
   * Changes the #AioContext used for fd handlers, timers, and BHs by
   * this BlockDriverState and all its children.
   *
   * This function must be called from the old #AioContext or with a
   * lock held so the old #AioContext is not executing.
   */
  void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context);

and:

  /* Remove fd handlers, timers, and other event loop callbacks so the
   * event loop is no longer in use. Called with no in-flight requests
   * and in depth-first traversal order with parents before child nodes.
   */
  void (*bdrv_detach_aio_context)(BlockDriverState *bs);

  /* Add fd handlers, timers, and other event loop callbacks so I/O
   * requests can be processed again. Called with no in-flight requests
   * and in depth-first traversal order with child nodes before parent
   * nodes.
   */
  void (*bdrv_attach_aio_context)(BlockDriverState *bs,
                                  AioContext *new_context);

These rules ensure that it's safe to perform these operations. You don't
have to support arbitrary callers in NBD either.

> But because of my inexperience with coroutines, I'm not sure. I have
> had these patches nearly unchanged here for about a week now and have
> been looking for ways to test them; so far I could only check that the
> old use cases still work, not what the patches are actually meant for:
> BDSs changing their AioContext. So, because I'm not sure what else to
> do and because I don't know how to test multiple AIO threads (how do I
> move a BDS into another iothread?) I'm just sending this out as an RFC.

Use a Linux guest with virtio-blk:

  qemu -drive if=none,file=test.img,id=drive0 \
       -object iothread,id=iothread0 \
       -device virtio-blk-pci,drive=drive0,x-iothread=iothread0 \
       ...

Once the guest has booted, the virtio-blk device will be in dataplane
mode. That means drive0's BlockDriverState->aio_context will be the
IOThread AioContext and not the global qemu_aio_context.

Now you can exercise the run-time NBD server over QMP and check that
things still work. For example, try running a few instances of

  dd if=/dev/vdb of=/dev/null iflag=direct

inside the guest to stress guest I/O.

Typically, what happens when code is not dataplane-aware is that a
deadlock or crash occurs due to race conditions between the QEMU main loop
and the IOThread for this virtio-blk device.

For an overview of dataplane programming concepts, see:
https://lists.gnu.org/archive/html/qemu-devel/2014-05/msg01436.html

Stefan
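Concretely, exercising the run-time NBD server over QMP might look like the
following sketch. nbd-server-start and nbd-server-add are the actual QMP
commands for the built-in NBD server; the QMP socket path, host, and port
values are made-up examples:

```shell
# Assumes QEMU was started with the command line above plus an extra QMP
# socket, e.g.:  -qmp unix:/tmp/qmp.sock,server,nowait   (path is an example)
qmp_cmds='{ "execute": "qmp_capabilities" }
{ "execute": "nbd-server-start",
  "arguments": { "addr": { "type": "inet",
                           "data": { "host": "127.0.0.1", "port": "10809" } } } }
{ "execute": "nbd-server-add", "arguments": { "device": "drive0" } }'

# Show the commands a QMP client would send; to actually send them, pipe
# them into the socket, e.g.:
#   printf '%s\n' "$qmp_cmds" | socat - UNIX:/tmp/qmp.sock
printf '%s\n' "$qmp_cmds"
```

With the server started while drive0 lives in the IOThread AioContext, an
external NBD client reading the export while dd runs in the guest is what
stresses the cross-context paths this series touches.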
[Qemu-devel] [RFC 0/5] nbd: Adapt for dataplane
For the NBD server to work with dataplane, it needs to access the exported
BDS correctly. It makes the most sense to run both in the same AioContext,
so this series implements methods for tracking a BDS's AioContext and makes
NBD use them to keep the clients connected to that BDS in the same
AioContext.

The reason this is an RFC and not a PATCH is my inexperience with AIO,
coroutines and the like. Also, I'm not sure what to do about the
coroutines. The NBD server has up to two coroutines per client: one for
receiving and one for sending. Theoretically, both would have to be
transferred to the new AioContext when it changes; however, as far as I can
see, coroutines are not really bound to an AioContext, they simply run in
whatever AioContext enters them. All coroutines are entered from nbd_read()
and nbd_restart_write(), both of which are AIO routines registered via
aio_set_fd_handler2(). As bs_aio_detach() unregisters all of these
routines, the coroutines can no longer be entered until bs_aio_attach() is
called again; when those routines then fire, they will enter the coroutines
in the new AioContext. Therefore, I think an explicit transfer is
unnecessary.

However, if bs_aio_detach() is called from a different thread than the one
the old AioContext is running in, we may still have coroutines running for
which we should wait before returning from bs_aio_detach(). But because of
my inexperience with coroutines, I'm not sure.

I have had these patches nearly unchanged here for about a week now and
have been looking for ways to test them; so far I could only check that the
old use cases still work, not what the patches are actually meant for: BDSs
changing their AioContext. So, because I'm not sure what else to do and
because I don't know how to test multiple AIO threads (how do I move a BDS
into another iothread?) I'm just sending this out as an RFC.
Max Reitz (5):
  nbd: Correct name comparison for export_set_name()
  aio: Add io_read_poll() callback
  nbd: Use aio_set_fd_handler2()
  block: Add AIO followers
  nbd: Follow the BDS' AIO context

 aio-posix.c               | 26 -
 block.c                   | 55 +++
 include/block/aio.h       | 12 ++
 include/block/block_int.h | 40
 include/qemu/main-loop.h  |  1 -
 nbd.c                     | 59 ++-
 6 files changed, 175 insertions(+), 18 deletions(-)

-- 
1.9.3
Re: [Qemu-devel] [RFC 0/5] nbd: Adapt for dataplane
On 31.05.2014 20:43, Max Reitz wrote:
> [snip]
> However, if bs_aio_detach() is called from a different thread than the
> one the old AioContext is running in, we may still have coroutines
> running for which we should wait before returning from bs_aio_detach().

After re-reading Stefan's RFC and the AIO locking code, would it suffice to
call aio_context_acquire() and aio_context_release() on the old AioContext
at the end of bs_aio_detach() to ensure all coroutines are settled?

If not, I guess I either have to wait until the coroutine variables are set
to NULL (which indicates they have completely finished execution; however,
this is not actually necessary and might lead to an infinite loop if the
block driver keeps yielding due to some condition related to the AioContext
switch), or I really have to switch the existing coroutines to the new
AioContext while they are running. Doing this from the outside will
probably be even messier than it would already be from the inside, so I
sure hope to be able to avoid this…

Max