[Qemu-devel] [PATCH 00/12] nbd improvements

2011-09-08 Thread Paolo Bonzini
I find nbd quite useful to test migration, but it is limited:
it can only do synchronous operation, it is not safe because it
does not support flush, and it has no discard either.  qemu-nbd
is also limited to 1MB requests, and the nbd block driver does
not take this into account.

Luckily, flush/FUA support is being worked out by upstream,
and discard can also be added with the same framework (patches
1 to 6).

Asynchronous support is also very similar to what sheepdog is
already doing (patches 7 to 12).

Paolo Bonzini (12):
  nbd: support feature negotiation
  nbd: sync API definitions with upstream
  nbd: support NBD_SET_FLAGS ioctl
  nbd: add support for NBD_CMD_FLUSH
  nbd: add support for NBD_CMD_FLAG_FUA
  nbd: support NBD_CMD_TRIM in the server
  sheepdog: add coroutine_fn markers
  add socket_set_block
  sheepdog: move coroutine send/recv function to generic code
  block: add bdrv_co_flush support
  nbd: switch to asynchronous operation
  nbd: split requests

 block.c  |   53 ++---
 block/nbd.c  |  225 
 block/sheepdog.c |  235 +++---
 block_int.h  |1 +
 cutils.c |  108 +
 nbd.c|   80 +--
 nbd.h|   20 -
 oslib-posix.c|7 ++
 oslib-win32.c|6 ++
 qemu-common.h|3 +
 qemu-coroutine.c |   71 
 qemu-coroutine.h |   26 ++
 qemu-nbd.c   |   13 ++--
 qemu_socket.h|1 +
 14 files changed, 580 insertions(+), 269 deletions(-)

-- 
1.7.6




Re: [Qemu-devel] [PATCH 00/12] nbd improvements

2011-09-09 Thread Kevin Wolf
Am 08.09.2011 17:24, schrieb Paolo Bonzini:
> I find nbd quite useful to test migration, but it is limited:
> it can only do synchronous operation, it is not safe because it
> does not support flush, and it has no discard either.  qemu-nbd
> is also limited to 1MB requests, and the nbd block driver does
> not take this into account.
> 
> Luckily, flush/FUA support is being worked out by upstream,
> and discard can also be added with the same framework (patches
> 1 to 6).
> 
> Asynchronous support is also very similar to what sheepdog is
> already doing (patches 7 to 12).
> 
> Paolo Bonzini (12):
>   nbd: support feature negotiation
>   nbd: sync API definitions with upstream
>   nbd: support NBD_SET_FLAGS ioctl
>   nbd: add support for NBD_CMD_FLUSH
>   nbd: add support for NBD_CMD_FLAG_FUA
>   nbd: support NBD_CMD_TRIM in the server
>   sheepdog: add coroutine_fn markers
>   add socket_set_block
>   sheepdog: move coroutine send/recv function to generic code
>   block: add bdrv_co_flush support
>   nbd: switch to asynchronous operation
>   nbd: split requests
> 
>  block.c  |   53 ++---
>  block/nbd.c  |  225 
>  block/sheepdog.c |  235 
> +++---
>  block_int.h  |1 +
>  cutils.c |  108 +
>  nbd.c|   80 +--
>  nbd.h|   20 -
>  oslib-posix.c|7 ++
>  oslib-win32.c|6 ++
>  qemu-common.h|3 +
>  qemu-coroutine.c |   71 
>  qemu-coroutine.h |   26 ++
>  qemu-nbd.c   |   13 ++--
>  qemu_socket.h|1 +
>  14 files changed, 580 insertions(+), 269 deletions(-)

There is anonther patch enabling AIO for NBD on the list [1], by
Nicholas Thomas (CCed), that lacked review so far. Can you guys please
review each others approach and then converge on a solution? I guess
Paolo's patches 1-7 can be applied in any case, probably causing minor
conflicts, but for the rest we need to decide which one to pick.

Kevin

[1] http://www.mail-archive.com/qemu-devel@nongnu.org/msg74711.html



Re: [Qemu-devel] [PATCH 00/12] nbd improvements

2011-09-09 Thread Paolo Bonzini

On 09/09/2011 11:00 AM, Kevin Wolf wrote:

There is anonther patch enabling AIO for NBD on the list [1], by
Nicholas Thomas (CCed), that lacked review so far. Can you guys please
review each others approach and then converge on a solution? I guess
Paolo's patches 1-7 can be applied in any case, probably causing minor
conflicts, but for the rest we need to decide which one to pick.


Stefan also pointed me to Nicholas's patches yesterday.  I would go with 
mine, if only because his work predates coroutines (at least the older 
versions) and are much more complex.


On the other hand, Nicholas's work includes timeout and reconnect.  I'm 
not sure how complicated it is to include it in my series, but probably 
not much.  With coroutines, preserving the list of outstanding I/O 
requests is done implicitly by the CoMutex, so you basically have to 
check errno for ECONNRESET and similar errors, reconnect, and retry 
issuing the current request only.


Timeout can be done with a QEMUTimer that shuts down the socket 
(shutdown(2) I mean).  This triggers an EPIPE when writing, or a 
zero-sized read when reading.  The timeout can be set every time the 
coroutine is (re)entered, and reset before exiting nbd_co_readv/writev.


What do you think?

Paolo



Re: [Qemu-devel] [PATCH 00/12] nbd improvements

2011-09-09 Thread Kevin Wolf
Am 09.09.2011 12:29, schrieb Paolo Bonzini:
> On 09/09/2011 11:00 AM, Kevin Wolf wrote:
>> There is anonther patch enabling AIO for NBD on the list [1], by
>> Nicholas Thomas (CCed), that lacked review so far. Can you guys please
>> review each others approach and then converge on a solution? I guess
>> Paolo's patches 1-7 can be applied in any case, probably causing minor
>> conflicts, but for the rest we need to decide which one to pick.
> 
> Stefan also pointed me to Nicholas's patches yesterday.  I would go with 
> mine, if only because his work predates coroutines (at least the older 
> versions) and are much more complex.
> 
> On the other hand, Nicholas's work includes timeout and reconnect.  I'm 
> not sure how complicated it is to include it in my series, but probably 
> not much.  With coroutines, preserving the list of outstanding I/O 
> requests is done implicitly by the CoMutex, so you basically have to 
> check errno for ECONNRESET and similar errors, reconnect, and retry 
> issuing the current request only.
> 
> Timeout can be done with a QEMUTimer that shuts down the socket 
> (shutdown(2) I mean).  This triggers an EPIPE when writing, or a 
> zero-sized read when reading.  The timeout can be set every time the 
> coroutine is (re)entered, and reset before exiting nbd_co_readv/writev.
> 
> What do you think?

I haven't really had a look at your patches yet (I hope to do so later
today), but from the patch descriptions I would think that indeed your
patches could be the better base for a converged series.

What I would like you and Nick to do is to see what is missing from your
series compared to his one (you describe a couple of things, but let's
see if Nick knows anything else) and to agree on the next steps. I think
that a possible way is to merge your series and do the other
improvements on top, but as I said I haven't really looked into the
details yet.

Kevin



Re: [Qemu-devel] [PATCH 00/12] nbd improvements

2011-09-09 Thread Nicholas Thomas
On Fri, 2011-09-09 at 12:29 +0200, Paolo Bonzini wrote:
> On 09/09/2011 11:00 AM, Kevin Wolf wrote:
> > There is anonther patch enabling AIO for NBD on the list [1], by
> > Nicholas Thomas (CCed), that lacked review so far. Can you guys please
> > review each others approach and then converge on a solution? I guess
> > Paolo's patches 1-7 can be applied in any case, probably causing minor
> > conflicts, but for the rest we need to decide which one to pick.
> 
> Stefan also pointed me to Nicholas's patches yesterday.  I would go with 
> mine, if only because his work predates coroutines (at least the older 
> versions) and are much more complex.

I'd concur here. I wrote the AIO portion of my patches merely to get the
code into a state where I could add timeout and reconnect logic - I'm
pretty sure the code I wrote is *correct* (we're using it in production,
after all), but I'm by no means invested in it. Your version includes
TRIM and flush support as well, which is nice.

> On the other hand, Nicholas's work includes timeout and reconnect.  I'm 
> not sure how complicated it is to include it in my series, but probably 
> not much.  With coroutines, preserving the list of outstanding I/O 
> requests is done implicitly by the CoMutex, so you basically have to 
> check errno for ECONNRESET and similar errors, reconnect, and retry 
> issuing the current request only.

Definitely a simpler approach than my version. Although, does your
version preserve write ordering? I was quite careful to ensure that
write requests would always be sent in the order they were presented;
although I don't know if qemu proper depends on that behaviour or not.

> Timeout can be done with a QEMUTimer that shuts down the socket 
> (shutdown(2) I mean).  This triggers an EPIPE when writing, or a 
> zero-sized read when reading.  The timeout can be set every time the 
> coroutine is (re)entered, and reset before exiting nbd_co_readv/writev.

for reconnect, I did consider using QEMUTimer, but when I tried it I ran
into linking problems with qemu-io. As far as I can tell, resolving that
is a significant code reorganisation - QEMUTimer pulls in lots of code
that isn't linked in at the moment (VM clocks, etc). I'm not sure it's
worth tackling that to avoid using timer_create(2). 

The timeout code I have at the moment is something of a bodge and
replacing it with an actual timer (of either kind) would be a massive
improvement, obviously.

/Nick




Re: [Qemu-devel] [PATCH 00/12] nbd improvements

2011-09-09 Thread Paolo Bonzini

On 09/09/2011 12:50 PM, Nicholas Thomas wrote:

>  On the other hand, Nicholas's work includes timeout and reconnect.  I'm
>  not sure how complicated it is to include it in my series, but probably
>  not much.  With coroutines, preserving the list of outstanding I/O
>  requests is done implicitly by the CoMutex, so you basically have to
>  check errno for ECONNRESET and similar errors, reconnect, and retry
>  issuing the current request only.

Definitely a simpler approach than my version. Although, does your
version preserve write ordering? I was quite careful to ensure that
write requests would always be sent in the order they were presented;
although I don't know if qemu proper depends on that behaviour or not.


Yes, the current operation does not drop the CoMutex until the error 
goes away.  Ordering is preserved because in turn no request can start 
until the first leaves the coroutine (with the CoMutex locked).


Anyway I think the guests do not depend on it, there is no ordering 
guarantee in the AIO thread pool.  If any of the more advanced file 
formats do, that would be a bug.



>  Timeout can be done with a QEMUTimer that shuts down the socket
>  (shutdown(2) I mean).  This triggers an EPIPE when writing, or a
>  zero-sized read when reading.  The timeout can be set every time the
>  coroutine is (re)entered, and reset before exiting nbd_co_readv/writev.

for reconnect, I did consider using QEMUTimer, but when I tried it I ran
into linking problems with qemu-io.   As far as I can tell, resolving that
is a significant code reorganisation - QEMUTimer pulls in lots of code
that isn't linked in at the moment (VM clocks, etc). I'm not sure it's
worth tackling that to avoid using timer_create(2).


I agree.  I think it's worth it because it would help Anthony's work 
with unit testing too; but it's also a significant amount of work.  But 
in this sense, keeping each feature in a separate patch helps a lot.  If 
something is done in a hacky way, it's much easier to redo it later if 
"git show" gives a good overview of how it was done.


I saw earlier versions of your patch had problems with the upstream nbd 
server.  It works for me, but you'd be welcome trying out my series.


Paolo



Re: [Qemu-devel] [PATCH 00/12] nbd improvements

2011-09-09 Thread Kevin Wolf
Am 09.09.2011 12:50, schrieb Nicholas Thomas:
> On Fri, 2011-09-09 at 12:29 +0200, Paolo Bonzini wrote:
>> On 09/09/2011 11:00 AM, Kevin Wolf wrote:
>>> There is anonther patch enabling AIO for NBD on the list [1], by
>>> Nicholas Thomas (CCed), that lacked review so far. Can you guys please
>>> review each others approach and then converge on a solution? I guess
>>> Paolo's patches 1-7 can be applied in any case, probably causing minor
>>> conflicts, but for the rest we need to decide which one to pick.
>>
>> Stefan also pointed me to Nicholas's patches yesterday.  I would go with 
>> mine, if only because his work predates coroutines (at least the older 
>> versions) and are much more complex.
> 
> I'd concur here. I wrote the AIO portion of my patches merely to get the
> code into a state where I could add timeout and reconnect logic - I'm
> pretty sure the code I wrote is *correct* (we're using it in production,
> after all), but I'm by no means invested in it. Your version includes
> TRIM and flush support as well, which is nice.

Good to see agreement here. Do you think that Paolo's patches need to be
changed or can we do everything else on top?

>> On the other hand, Nicholas's work includes timeout and reconnect.  I'm 
>> not sure how complicated it is to include it in my series, but probably 
>> not much.  With coroutines, preserving the list of outstanding I/O 
>> requests is done implicitly by the CoMutex, so you basically have to 
>> check errno for ECONNRESET and similar errors, reconnect, and retry 
>> issuing the current request only.
> 
> Definitely a simpler approach than my version. Although, does your
> version preserve write ordering? I was quite careful to ensure that
> write requests would always be sent in the order they were presented;
> although I don't know if qemu proper depends on that behaviour or not.

This is stricter semantics than is required. Write order is only
guaranteed if request B is issued after request A has completed. If A
has been submitted, but hasn't completed yet, and you submit B, then the
order is undefined.

>> Timeout can be done with a QEMUTimer that shuts down the socket 
>> (shutdown(2) I mean).  This triggers an EPIPE when writing, or a 
>> zero-sized read when reading.  The timeout can be set every time the 
>> coroutine is (re)entered, and reset before exiting nbd_co_readv/writev.
> 
> for reconnect, I did consider using QEMUTimer, but when I tried it I ran
> into linking problems with qemu-io. As far as I can tell, resolving that
> is a significant code reorganisation - QEMUTimer pulls in lots of code
> that isn't linked in at the moment (VM clocks, etc). I'm not sure it's
> worth tackling that to avoid using timer_create(2). 
> 
> The timeout code I have at the moment is something of a bodge and
> replacing it with an actual timer (of either kind) would be a massive
> improvement, obviously.

We do have some timer stubs in qemu-tool.c since May. Of course, they
are just stubs and never really trigger the timer, but I assume that the
timeouts are really meant for qemu proper anyway, rather than qemu-img
or qemu-io.

Kevin



Re: [Qemu-devel] [PATCH 00/12] nbd improvements

2011-09-09 Thread Nicholas Thomas
On 09/09/11 12:04, Kevin Wolf wrote:

> Good to see agreement here. Do you think that Paolo's patches need to be
> changed or can we do everything else on top?

A few things have come up on a third read, actually. I'll respond in due
course to the appropriate patch.

> We do have some timer stubs in qemu-tool.c since May. Of course, they
> are just stubs and never really trigger the timer, but I assume that the
> timeouts are really meant for qemu proper anyway, rather than qemu-img
> or qemu-io.

I wasn't aware of these - however, using them would make testing
timeout/reconnect quite a bit harder. And surely there are use cases of
qemu-img that would benefit from automatic reconnection?

I can work around the former, and we don't actually use qemu-img for
anything ourselves, so if you think this is the right way to go about
it, I'm happy to rework the timeout/reconnect using QEMUTimer, on top of
Paolo's patches.

/Nick



Re: [Qemu-devel] [PATCH 00/12] nbd improvements

2011-09-14 Thread Kevin Wolf
Am 08.09.2011 17:24, schrieb Paolo Bonzini:
> I find nbd quite useful to test migration, but it is limited:
> it can only do synchronous operation, it is not safe because it
> does not support flush, and it has no discard either.  qemu-nbd
> is also limited to 1MB requests, and the nbd block driver does
> not take this into account.
> 
> Luckily, flush/FUA support is being worked out by upstream,
> and discard can also be added with the same framework (patches
> 1 to 6).
> 
> Asynchronous support is also very similar to what sheepdog is
> already doing (patches 7 to 12).
> 
> Paolo Bonzini (12):
>   nbd: support feature negotiation
>   nbd: sync API definitions with upstream
>   nbd: support NBD_SET_FLAGS ioctl
>   nbd: add support for NBD_CMD_FLUSH
>   nbd: add support for NBD_CMD_FLAG_FUA
>   nbd: support NBD_CMD_TRIM in the server
>   sheepdog: add coroutine_fn markers
>   add socket_set_block
>   sheepdog: move coroutine send/recv function to generic code
>   block: add bdrv_co_flush support
>   nbd: switch to asynchronous operation
>   nbd: split requests

Okay, completed the review for this series now. I think if you consider
the comments posted so far for v2 we should be good.

Kevin