Re: [Qemu-devel] [RFC 0/5] nbd: Adapt for dataplane

2014-06-05 Thread Max Reitz

On 03.06.2014 16:38, Stefan Hajnoczi wrote:

On Sat, May 31, 2014 at 08:43:07PM +0200, Max Reitz wrote:

For the NBD server to work with dataplane, it needs to access the
exported BDS correctly. It makes the most sense to run both in the same
AioContext; therefore, this series implements methods for tracking a
BDS's AioContext and makes the NBD server use them to keep the clients
connected to that BDS in the same AioContext.

The reason this is an RFC and not a PATCH is my inexperience with AIO,
coroutines and the like. Also, I'm not sure what to do about the
coroutines. The NBD server has up to two coroutines per client: One for
receiving and one for sending. Theoretically, both have to be
transferred to the new AioContext if it is changed; however, as far as
I see it, coroutines are not really bound to an AioContext, they are
simply run in the AioContext entering them. Therefore, I think a
transfer is unnecessary. All coroutines are entered from nbd_read() and
nbd_restart_write(), both of which are AIO routines registered via
aio_set_fd_handler2().

As bs_aio_detach() unregisters all of these routines, the coroutines can
no longer be entered until bs_aio_attach() is called again. Then, when
the routines are called, they will enter the coroutines in the new
AioContext. Therefore, I think an explicit transfer is unnecessary.
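The "coroutines run in whichever AioContext enters them" argument can be
sketched with a toy model (plain C, not QEMU code; ToyAioContext,
toy_attach(), toy_nbd_read() and friends are invented stand-ins for
AioContext, bs_aio_attach() and nbd_read()):

```c
#include <string.h>

/* Toy model, not QEMU code: a "context" is just a name plus at most one
 * registered read handler, and the "coroutine" records which context
 * last entered it. */

typedef struct ToyAioContext {
    const char *name;
    void (*io_read)(struct ToyAioContext *ctx);
} ToyAioContext;

static const char *coroutine_last_ctx = "none";

/* The coroutine is not bound to a context; it runs in whichever
 * context enters it. */
static void coroutine_step(ToyAioContext *ctx)
{
    coroutine_last_ctx = ctx->name;
}

/* Read handler, standing in for nbd_read() registered via
 * aio_set_fd_handler2(); it "enters the coroutine". */
static void toy_nbd_read(ToyAioContext *ctx)
{
    coroutine_step(ctx);
}

static void toy_attach(ToyAioContext *ctx) { ctx->io_read = toy_nbd_read; }
static void toy_detach(ToyAioContext *ctx) { ctx->io_read = NULL; }

/* One iteration of a toy event loop: run the handler if registered. */
static void toy_dispatch(ToyAioContext *ctx)
{
    if (ctx->io_read) {
        ctx->io_read(ctx);
    }
}

/* Returns the name of the context the coroutine last ran in after a
 * detach-from-A / attach-to-B switch. */
static const char *toy_switch_demo(void)
{
    ToyAioContext a = { "ctx_a", NULL };
    ToyAioContext b = { "ctx_b", NULL };

    toy_attach(&a);
    toy_dispatch(&a);      /* coroutine entered from ctx_a */

    toy_detach(&a);
    toy_dispatch(&a);      /* no handler registered: nothing runs */

    toy_attach(&b);
    toy_dispatch(&b);      /* same coroutine, now entered from ctx_b */

    return coroutine_last_ctx;
}
```

After the switch, toy_switch_demo() reports "ctx_b" without the
coroutine ever being moved explicitly, which is the intuition behind
skipping the transfer.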

This reasoning sounds correct.


However, if bs_aio_detach() is called from a different thread than the
old AioContext is running in, we may still have coroutines running for
which we should wait before returning from bs_aio_detach().

The bdrv_attach/detach_aio_context() APIs have rules regarding where
these functions are called from:

   /**
    * bdrv_set_aio_context:
    *
    * Changes the #AioContext used for fd handlers, timers, and BHs by this
    * BlockDriverState and all its children.
    *
    * This function must be called from the old #AioContext or with a lock
    * held so the old #AioContext is not executing.


Oh, that makes things easier. *g*


*/
   void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context);

and:

   /* Remove fd handlers, timers, and other event loop callbacks so the event
    * loop is no longer in use.  Called with no in-flight requests and in
    * depth-first traversal order with parents before child nodes.
    */
   void (*bdrv_detach_aio_context)(BlockDriverState *bs);

   /* Add fd handlers, timers, and other event loop callbacks so I/O requests
    * can be processed again.  Called with no in-flight requests and in
    * depth-first traversal order with child nodes before parent nodes.
    */
   void (*bdrv_attach_aio_context)(BlockDriverState *bs,
                                   AioContext *new_context);

These rules ensure that it's safe to perform these operations.  You
don't have to support arbitrary callers in NBD either.
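The "with a lock held so the old #AioContext is not executing" rule can
be illustrated with a toy model (plain C with pthreads, not QEMU code;
all names are made up). A "loop" thread only runs its handler check
while holding a mutex, so a caller holding that mutex knows the loop is
quiescent and can update the whole registration atomically, which is
the role aio_context_acquire()/aio_context_release() play in QEMU:

```c
#include <pthread.h>
#include <stdbool.h>

/* Toy model of the calling rule, not QEMU code. */

static pthread_mutex_t ctx_lock = PTHREAD_MUTEX_INITIALIZER;
static bool stop;

/* A two-field "registration" that must always be observed together. */
static int handler_ctx;   /* which context owns the handler */
static int handler_gen;   /* must always equal handler_ctx  */
static int torn_reads;    /* inconsistent observations      */

/* The "old AioContext": it only executes while holding ctx_lock. */
static void *old_context_thread(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&ctx_lock);
        if (stop) {
            pthread_mutex_unlock(&ctx_lock);
            break;
        }
        if (handler_ctx != handler_gen) {
            torn_reads++;   /* would mean we ran mid-switch */
        }
        pthread_mutex_unlock(&ctx_lock);
    }
    return NULL;
}

/* The switch: both fields are updated under the lock, so the loop can
 * never observe a half-done switch. Returns the torn-read count. */
static int toy_set_aio_context(void)
{
    pthread_t t;
    pthread_create(&t, NULL, old_context_thread, NULL);

    pthread_mutex_lock(&ctx_lock);   /* "acquire the old context" */
    handler_ctx = 1;
    handler_gen = 1;                 /* old context is quiescent here */
    pthread_mutex_unlock(&ctx_lock);

    pthread_mutex_lock(&ctx_lock);
    stop = true;
    pthread_mutex_unlock(&ctx_lock);
    pthread_join(t, NULL);
    return torn_reads;
}
```

toy_set_aio_context() always returns 0: because the update happens with
the lock held, the "old context" never observes a half-updated
registration, which is exactly what the bdrv_set_aio_context() contract
guarantees.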


But because of my inexperience with coroutines, I'm not sure. I have had
these patches nearly unchanged for about a week now and have been looking
for ways of testing them; so far I could only test that the old use cases
still work, not whether the patches do what they are intended to do:
handle BDSs changing their AioContext.

So, because I'm not sure what else to do and because I don't know how to
test multiple AIO threads (how do I move a BDS into another iothread?)
I'm just sending this out as an RFC.

Use a Linux guest with virtio-blk:

   qemu -drive if=none,file=test.img,id=drive0 \
        -object iothread,id=iothread0 \
        -device virtio-blk-pci,drive=drive0,x-iothread=iothread0 \
        ...

Once the guest has booted, the virtio-blk device will be in dataplane
mode.  That means drive0's BlockDriverState->aio_context will be the
IOThread AioContext and not the global qemu_aio_context.


Ah, thank you.


Now you can exercise the run-time NBD server over QMP and check that
things still work.  For example, try running a few instances of dd
if=/dev/vdb of=/dev/null iflag=direct inside the guest to stress guest
I/O.
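For the QMP side, a minimal command sequence might look like the
following sketch (the host/port values are placeholders, drive0 matches
the -drive id above, and the exact argument names should be
double-checked against the QMP schema of the QEMU version in use):

```json
{ "execute": "qmp_capabilities" }
{ "execute": "nbd-server-start",
  "arguments": { "addr": { "type": "inet",
                           "data": { "host": "127.0.0.1", "port": "10809" } } } }
{ "execute": "nbd-server-add",
  "arguments": { "device": "drive0" } }
```

An external NBD client can then connect to port 10809 while the dd load
runs inside the guest, so the NBD server and the dataplane IOThread are
exercised at the same time.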

Typically what happens if code is not dataplane-aware is that a deadlock
or crash occurs due to race conditions between the QEMU main loop and
the IOThread for this virtio-blk device.

For an overview of dataplane programming concepts, see:
https://lists.gnu.org/archive/html/qemu-devel/2014-05/msg01436.html


Yes, I took this email as a reference; however, it only explained how to 
create a new iothread, not how to use it. :-)


Max


Stefan





Re: [Qemu-devel] [RFC 0/5] nbd: Adapt for dataplane

2014-06-04 Thread Stefan Hajnoczi
On Sat, May 31, 2014 at 08:43:07PM +0200, Max Reitz wrote:
 For the NBD server to work with dataplane, it needs to access the
 exported BDS correctly. It makes the most sense to run both in the same
 AioContext; therefore, this series implements methods for tracking a
 BDS's AioContext and makes the NBD server use them to keep the clients
 connected to that BDS in the same AioContext.
 
 The reason this is an RFC and not a PATCH is my inexperience with AIO,
 coroutines and the like. Also, I'm not sure what to do about the
 coroutines. The NBD server has up to two coroutines per client: One for
 receiving and one for sending. Theoretically, both have to be
 transferred to the new AioContext if it is changed; however, as far as
 I see it, coroutines are not really bound to an AioContext, they are
 simply run in the AioContext entering them. Therefore, I think a
 transfer is unnecessary. All coroutines are entered from nbd_read() and
 nbd_restart_write(), both of which are AIO routines registered via
 aio_set_fd_handler2().
 
 As bs_aio_detach() unregisters all of these routines, the coroutines can
 no longer be entered until bs_aio_attach() is called again. Then, when
 the routines are called, they will enter the coroutines in the new
 AioContext. Therefore, I think an explicit transfer is unnecessary.
 
 However, if bs_aio_detach() is called from a different thread than the
 old AioContext is running in, we may still have coroutines running for
 which we should wait before returning from bs_aio_detach().
 
 But because of my inexperience with coroutines, I'm not sure. I have had
 these patches nearly unchanged for about a week now and have been looking
 for ways of testing them; so far I could only test that the old use cases
 still work, not whether the patches do what they are intended to do:
 handle BDSs changing their AioContext.
 
 So, because I'm not sure what else to do and because I don't know how to
 test multiple AIO threads (how do I move a BDS into another iothread?)
 I'm just sending this out as an RFC.
 
 
 Max Reitz (5):
   nbd: Correct name comparison for export_set_name()
   aio: Add io_read_poll() callback
   nbd: Use aio_set_fd_handler2()
   block: Add AIO followers
   nbd: Follow the BDS' AIO context
 
  aio-posix.c               | 26 -
  block.c                   | 55 +++
  include/block/aio.h       | 12 ++
  include/block/block_int.h | 40
  include/qemu/main-loop.h  |  1 -
  nbd.c                     | 59 ++-
  6 files changed, 175 insertions(+), 18 deletions(-)

Thanks for the RFC.  It looks like the concept will work correctly.  I
left comments, a lot of them are just ideas and you don't have to
implement them if you don't want to.

Stefan



Re: [Qemu-devel] [RFC 0/5] nbd: Adapt for dataplane

2014-06-04 Thread Stefan Hajnoczi
On Sat, May 31, 2014 at 10:24:47PM +0200, Max Reitz wrote:
 On 31.05.2014 20:43, Max Reitz wrote:
 [snip]
 
 However, if bs_aio_detach() is called from a different thread than the
 old AioContext is running in, we may still have coroutines running for
 which we should wait before returning from bs_aio_detach().
 
 After re-reading Stefan's RFC and the AIO locking code, would it suffice to
 call aio_context_acquire() and aio_context_release() on the old AioContext
 at the end of bs_aio_detach() to ensure all coroutines are settled? If not,
 I guess I have to wait until the coroutine variables are set to NULL (which
 indicates they have completely finished execution; however, this is not
 actually necessary and might lead to an infinite loop if the block driver
 keeps yielding due to some condition related to the AioContext switch), or I
 really have to switch the existing coroutines to the new AioContext while
 they are running. Doing this from the outside will probably be even messier
 than it would already be from the inside, so I sure hope to be able to avoid
 this…

I think you don't need to worry about this at all.
bdrv_set_aio_context() is always invoked with the QEMU global mutex
held.

Regarding switching a coroutine from one AioContext to another (possibly
in another thread), I've implemented it but it's a little involved
(complex and slow).  Let's avoid it, if possible.  I also lost the patch
during the great "make distclean twice deletes .git/" fiasco of 2014, so
I would need to reimplement it :).

Stefan



Re: [Qemu-devel] [RFC 0/5] nbd: Adapt for dataplane

2014-06-03 Thread Stefan Hajnoczi
On Sat, May 31, 2014 at 08:43:07PM +0200, Max Reitz wrote:
 For the NBD server to work with dataplane, it needs to access the
 exported BDS correctly. It makes the most sense to run both in the same
 AioContext; therefore, this series implements methods for tracking a
 BDS's AioContext and makes the NBD server use them to keep the clients
 connected to that BDS in the same AioContext.
 
 The reason this is an RFC and not a PATCH is my inexperience with AIO,
 coroutines and the like. Also, I'm not sure what to do about the
 coroutines. The NBD server has up to two coroutines per client: One for
 receiving and one for sending. Theoretically, both have to be
 transferred to the new AioContext if it is changed; however, as far as
 I see it, coroutines are not really bound to an AioContext, they are
 simply run in the AioContext entering them. Therefore, I think a
 transfer is unnecessary. All coroutines are entered from nbd_read() and
 nbd_restart_write(), both of which are AIO routines registered via
 aio_set_fd_handler2().
 
 As bs_aio_detach() unregisters all of these routines, the coroutines can
 no longer be entered until bs_aio_attach() is called again. Then, when
 the routines are called, they will enter the coroutines in the new
 AioContext. Therefore, I think an explicit transfer is unnecessary.

This reasoning sounds correct.

 However, if bs_aio_detach() is called from a different thread than the
 old AioContext is running in, we may still have coroutines running for
 which we should wait before returning from bs_aio_detach().

The bdrv_attach/detach_aio_context() APIs have rules regarding where
these functions are called from:

  /**
   * bdrv_set_aio_context:
   *
   * Changes the #AioContext used for fd handlers, timers, and BHs by this
   * BlockDriverState and all its children.
   *
   * This function must be called from the old #AioContext or with a lock
   * held so the old #AioContext is not executing.
   */
  void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context);

and:

  /* Remove fd handlers, timers, and other event loop callbacks so the event
   * loop is no longer in use.  Called with no in-flight requests and in
   * depth-first traversal order with parents before child nodes.
   */
  void (*bdrv_detach_aio_context)(BlockDriverState *bs);

  /* Add fd handlers, timers, and other event loop callbacks so I/O requests
   * can be processed again.  Called with no in-flight requests and in
   * depth-first traversal order with child nodes before parent nodes.
   */
  void (*bdrv_attach_aio_context)(BlockDriverState *bs,
                                  AioContext *new_context);

These rules ensure that it's safe to perform these operations.  You
don't have to support arbitrary callers in NBD either.

 But because of my inexperience with coroutines, I'm not sure. I have had
 these patches nearly unchanged for about a week now and have been looking
 for ways of testing them; so far I could only test that the old use cases
 still work, not whether the patches do what they are intended to do:
 handle BDSs changing their AioContext.
 
 So, because I'm not sure what else to do and because I don't know how to
 test multiple AIO threads (how do I move a BDS into another iothread?)
 I'm just sending this out as an RFC.

Use a Linux guest with virtio-blk:

  qemu -drive if=none,file=test.img,id=drive0 \
       -object iothread,id=iothread0 \
       -device virtio-blk-pci,drive=drive0,x-iothread=iothread0 \
       ...

Once the guest has booted, the virtio-blk device will be in dataplane
mode.  That means drive0's BlockDriverState->aio_context will be the
IOThread AioContext and not the global qemu_aio_context.

Now you can exercise the run-time NBD server over QMP and check that
things still work.  For example, try running a few instances of dd
if=/dev/vdb of=/dev/null iflag=direct inside the guest to stress guest
I/O.

Typically what happens if code is not dataplane-aware is that a deadlock
or crash occurs due to race conditions between the QEMU main loop and
the IOThread for this virtio-blk device.

For an overview of dataplane programming concepts, see:
https://lists.gnu.org/archive/html/qemu-devel/2014-05/msg01436.html

Stefan



[Qemu-devel] [RFC 0/5] nbd: Adapt for dataplane

2014-05-31 Thread Max Reitz
For the NBD server to work with dataplane, it needs to access the
exported BDS correctly. It makes the most sense to run both in the same
AioContext; therefore, this series implements methods for tracking a
BDS's AioContext and makes the NBD server use them to keep the clients
connected to that BDS in the same AioContext.

The reason this is an RFC and not a PATCH is my inexperience with AIO,
coroutines and the like. Also, I'm not sure what to do about the
coroutines. The NBD server has up to two coroutines per client: One for
receiving and one for sending. Theoretically, both have to be
transferred to the new AioContext if it is changed; however, as far as
I see it, coroutines are not really bound to an AioContext, they are
simply run in the AioContext entering them. Therefore, I think a
transfer is unnecessary. All coroutines are entered from nbd_read() and
nbd_restart_write(), both of which are AIO routines registered via
aio_set_fd_handler2().

As bs_aio_detach() unregisters all of these routines, the coroutines can
no longer be entered until bs_aio_attach() is called again. Then, when
the routines are called, they will enter the coroutines in the new
AioContext. Therefore, I think an explicit transfer is unnecessary.

However, if bs_aio_detach() is called from a different thread than the
old AioContext is running in, we may still have coroutines running for
which we should wait before returning from bs_aio_detach().

But because of my inexperience with coroutines, I'm not sure. I have had
these patches nearly unchanged for about a week now and have been looking
for ways of testing them; so far I could only test that the old use cases
still work, not whether the patches do what they are intended to do:
handle BDSs changing their AioContext.

So, because I'm not sure what else to do and because I don't know how to
test multiple AIO threads (how do I move a BDS into another iothread?)
I'm just sending this out as an RFC.


Max Reitz (5):
  nbd: Correct name comparison for export_set_name()
  aio: Add io_read_poll() callback
  nbd: Use aio_set_fd_handler2()
  block: Add AIO followers
  nbd: Follow the BDS' AIO context

 aio-posix.c               | 26 -
 block.c                   | 55 +++
 include/block/aio.h       | 12 ++
 include/block/block_int.h | 40
 include/qemu/main-loop.h  |  1 -
 nbd.c                     | 59 ++-
 6 files changed, 175 insertions(+), 18 deletions(-)

-- 
1.9.3




Re: [Qemu-devel] [RFC 0/5] nbd: Adapt for dataplane

2014-05-31 Thread Max Reitz

On 31.05.2014 20:43, Max Reitz wrote:

[snip]

However, if bs_aio_detach() is called from a different thread than the
old AioContext is running in, we may still have coroutines running for
which we should wait before returning from bs_aio_detach().


After re-reading Stefan's RFC and the AIO locking code, would it suffice 
to call aio_context_acquire() and aio_context_release() on the old 
AioContext at the end of bs_aio_detach() to ensure all coroutines are 
settled? If not, I guess I have to wait until the coroutine variables 
are set to NULL (which indicates they have completely finished 
execution; however, this is not actually necessary and might lead to an 
infinite loop if the block driver keeps yielding due to some condition 
related to the AioContext switch), or I really have to switch the 
existing coroutines to the new AioContext while they are running. Doing 
this from the outside will probably be even messier than it would 
already be from the inside, so I sure hope to be able to avoid this…


Max