On Fri, Jul 26, 2013 at 03:10:42PM +0900, MORITA Kazutaka wrote:
> Currently, if a sheepdog server exits, all the connecting VMs need to
> be restarted.  This series implements a feature to reconnect the
> server, and enables us to do online sheepdog upgrade and avoid
> restarting VMs when sheepdog servers crash unexpectedly.
>
> v4:
>  - Added comment to explain why we need a failed queue.
>  - Fixed a return value of sd_acb_cancelable().
>
> v3:
>  - Check return values of qemu_co_recv/send more strictly.
>  - Move inflight requests to the failed list after reconnection
>    completes.  This is necessary to resend I/Os while connection is
>    lost.
>  - Check simultaneous create in resend_aioreq().
>
> v2:
>  - Dropped nonblocking connect patches.
>
> MORITA Kazutaka (10):
>   ignore SIGPIPE in qemu-img and qemu-io
>   iov: handle EOF in iov_send_recv
>   sheepdog: check return values of qemu_co_recv/send correctly
>   sheepdog: handle vdi objects in resend_aio_req
>   sheepdog: reload inode outside of resend_aioreq
>   coroutine: add co_aio_sleep_ns() to allow sleep in block drivers
>   sheepdog: try to reconnect to sheepdog after network error
>   sheepdog: make add_aio_request and send_aioreq void functions
>   sheepdog: cancel aio requests if possible
>   sheepdog: check simultaneous create in resend_aioreq
>
>  block/sheepdog.c          | 320 +++++++++++++++++++++++++++++-----------------
>  include/block/coroutine.h |   8 ++
>  qemu-coroutine-sleep.c    |  47 +++++++
>  qemu-img.c                |   4 +
>  qemu-io.c                 |   4 +
>  util/iov.c                |   6 +
>  6 files changed, 269 insertions(+), 120 deletions(-)

I have done a brief review.  The biggest change that I suggest is using
the new AioContext timer support that Alex Bligh and Ping Fan are
working on (see qemu-devel for the latest patches).  It provides a way
to use a timer during qemu_aio_wait() without spinning.

CCed Nick Thomas, who worked on NBD reconnect.  Maybe your series will
motivate him to push his patches again, or he might have some review
suggestions for you.