[Qemu-devel] [PATCH] MAINTAINERS: update sheepdog maintainer

2014-09-03 Thread MORITA Kazutaka
Hitoshi takes over sheepdog maintenance from me.

Signed-off-by: MORITA Kazutaka morita.kazut...@gmail.com
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 59940f9..28a697d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -968,7 +968,7 @@ S: Supported
 F: block/rbd.c
 
 Sheepdog
-M: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
+M: Hitoshi Mitake mitake.hito...@lab.ntt.co.jp
 M: Liu Yuan namei.u...@gmail.com
 L: sheep...@lists.wpkg.org
 S: Supported
-- 
1.9.1




Re: [Qemu-devel] [sheepdog] [PATCH v4 2/2] sheepdog: support user-defined redundancy option

2013-11-01 Thread MORITA Kazutaka
At Thu, 31 Oct 2013 13:49:28 +0800,
Liu Yuan wrote:
 
 +/*
 + * Sheepdog supports two kinds of redundancy, full replication and erasure
 + * coding.
 + *
 + * # create a fully replicated vdi with x copies
 + * -o redundancy=x (1 <= x <= SD_MAX_COPIES)
 + *
 + * # create an erasure coded vdi with x data strips and y parity strips
 + * -o redundancy=x:y (x must be one of {2,4,8,16} and 1 <= y < SD_EC_MAX_STRIP)
 + */
 +static int parse_redundancy(BDRVSheepdogState *s, const char *opt)
 +{
 +    struct SheepdogInode *inode = &s->inode;
 +    const char *n1, *n2;
 +    uint8_t copy, parity;
 +    char p[10];
 +
 +    strncpy(p, opt, sizeof(p));

strncpy() is not safe here.  Please use pstrcpy() instead.

 +    n1 = strtok(p, ":");
 +    n2 = strtok(NULL, ":");
 +
 +    if ((n1 && !is_numeric(n1)) || (n2 && !is_numeric(n2))) {
 +        return -EINVAL;
 +    }

This cannot detect an error when 'opt' is empty.  Actually, the
following command causes a segfault.

 $ qemu-img create -o redundancy= sheepdog:test 4G
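
For reference, a minimal sketch addressing both points above (illustrative
only, not necessarily the shape of the final patch; pstrcpy() is QEMU's
bounded, always-NUL-terminating copy helper, and is_numeric() is the helper
from the quoted patch):

    static int parse_redundancy(BDRVSheepdogState *s, const char *opt)
    {
        const char *n1, *n2;
        char p[10];

        pstrcpy(p, sizeof(p), opt);   /* unlike strncpy(), always terminates */
        n1 = strtok(p, ":");
        n2 = strtok(NULL, ":");

        /* strtok() returns NULL for an empty string, so checking n1 first
         * avoids the segfault on "-o redundancy=" */
        if (!n1 || !is_numeric(n1) || (n2 && !is_numeric(n2))) {
            return -EINVAL;
        }
        /* ... parse the copy/parity counts as in the quoted patch ... */
        return 0;
    }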

Thanks,

Kazutaka



Re: [Qemu-devel] [sheepdog] [PATCH v5 RESENT 0/2] sheepdog: add user-defined redundancy option

2013-11-01 Thread MORITA Kazutaka
At Fri,  1 Nov 2013 23:10:11 +0800,
Liu Yuan wrote:
 
 v5:
  - use pstrcpy instead of strncpy
  - fix a segfault for a 'null' option string
 
 v4:
  - fix do_sd_create that forgot to pass nr_copies
  - fix parse_redundancy dealing with replicated vdi
 
 v3:
  - rework is_numeric
 
 v2:
  - fix a typo in comment and commit log
 
 This patch set adds one sheepdog-specific option for qemu-img to control
 redundancy.
 
  This patch set is on top of Kevin's block tree.
 
 Liu Yuan (2):
   sheepdog: refactor do_sd_create()
   sheepdog: support user-defined redundancy option
 
  block/sheepdog.c          |  127 ++++++++++++++++++++++++++++++++++++------
  include/block/block_int.h |    1 +
  2 files changed, 105 insertions(+), 23 deletions(-)

Reviewed-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp



Re: [Qemu-devel] [sheepdog] [PATCH v2 2/2] sheepdog: support user-defined redundancy option

2013-10-30 Thread MORITA Kazutaka
At Tue, 29 Oct 2013 16:25:52 +0800,
Liu Yuan wrote:
 
 Sheepdog supports two kinds of redundancy, full replication and erasure coding.
 
 # create a fully replicated vdi with x copies
  -o redundancy=x (1 <= x <= SD_MAX_COPIES)
 
 # create an erasure coded vdi with x data strips and y parity strips
  -o redundancy=x:y (x must be one of {2,4,8,16} and 1 <= y < SD_EC_MAX_STRIP)
 
 E.g., to convert a vdi into sheepdog vdi 'test' with the 8:3 erasure coding scheme
 
 $ qemu-img convert -o redundancy=8:3 linux-0.2.img sheepdog:test
 
 Cc: Kevin Wolf kw...@redhat.com
 Cc: Stefan Hajnoczi stefa...@redhat.com
 Signed-off-by: Liu Yuan namei.u...@gmail.com
 ---
  block/sheepdog.c          |   78 ++++++++++++++++++++++++++++++++++++++++++-
  include/block/block_int.h |    1 +
  2 files changed, 78 insertions(+), 1 deletion(-)
 
 diff --git a/block/sheepdog.c b/block/sheepdog.c
 index e66d2f8..bd7cfd6 100644
 --- a/block/sheepdog.c
 +++ b/block/sheepdog.c
 @@ -91,6 +91,14 @@
  #define SD_NR_VDIS   (1U << 24)
  #define SD_DATA_OBJ_SIZE (UINT64_C(1) << 22)
  #define SD_MAX_VDI_SIZE (SD_DATA_OBJ_SIZE * MAX_DATA_OBJS)
 +/*
 + * For erasure coding, we use at most SD_EC_MAX_STRIP for data strips and
 + * (SD_EC_MAX_STRIP - 1) for parity strips.
 + *
 + * SD_MAX_COPIES is the sum of the number of data strips and parity strips.
 + */
 +#define SD_EC_MAX_STRIP 16
 +#define SD_MAX_COPIES (SD_EC_MAX_STRIP * 2 - 1)
  
  #define SD_INODE_SIZE (sizeof(SheepdogInode))
  #define CURRENT_VDI_ID 0
 @@ -1446,6 +1454,65 @@ out:
      return ret;
  }
  
 +static int64_t is_numeric(const char *s)
 +{
 +    char *end;
 +    return strtosz_suffix(s, &end, STRTOSZ_DEFSUFFIX_B);
 +}

I think the type of the return value should be bool.
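
For example, a bool-returning variant might look like this (a sketch on my
side: strtosz_suffix() reports a parse failure with a negative value, and the
extra *end check, an additional suggestion not in the patch, rejects trailing
garbage):

    static bool is_numeric(const char *s)
    {
        char *end;

        return strtosz_suffix(s, &end, STRTOSZ_DEFSUFFIX_B) >= 0 && *end == '\0';
    }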

Thanks,

Kazutaka



Re: [Qemu-devel] [sheepdog] [PATCH v2 0/2] sheepdog: make use of copy_policy

2013-10-25 Thread MORITA Kazutaka
At Wed, 23 Oct 2013 16:51:50 +0800,
Liu Yuan wrote:
 
 v2:
  - merge the reserved bits
 
 This patch set makes use of copy_policy in struct SheepdogInode in order to
 support the recently introduced erasure coded volumes in sheepdog.
 
 Thanks
 Yuan
 
 Liu Yuan (2):
   sheepdog: explicitly set copies as type uint8_t
   sheepdog: pass copy_policy in the request
 
  block/sheepdog.c |   30 +++---
  1 file changed, 19 insertions(+), 11 deletions(-)

Acked-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp



[Qemu-devel] [PATCH v5 6/8] sheepdog: make add_aio_request and send_aioreq void functions

2013-10-24 Thread MORITA Kazutaka
These functions no longer return errors.  We can make them void
functions and simplify the code.

Reviewed-by: Liu Yuan namei.u...@gmail.com
Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c |   66 ++
 1 file changed, 17 insertions(+), 49 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 3e98291..5846ac4 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -606,10 +606,10 @@ static int do_req(int sockfd, SheepdogReq *hdr, void *data,
     return srco.ret;
 }
 
-static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
+static void coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
                                         struct iovec *iov, int niov, bool create,
                                         enum AIOCBState aiocb_type);
-static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req);
+static void coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req);
 static int reload_inode(BDRVSheepdogState *s, uint32_t snapid, const char *tag);
 static int get_sheep_fd(BDRVSheepdogState *s);
 static void co_write_request(void *opaque);
@@ -635,22 +635,14 @@ static void coroutine_fn send_pending_req(BDRVSheepdogState *s, uint64_t oid)
 {
     AIOReq *aio_req;
     SheepdogAIOCB *acb;
-    int ret;
 
     while ((aio_req = find_pending_req(s, oid)) != NULL) {
         acb = aio_req->aiocb;
         /* move aio_req from pending list to inflight one */
         QLIST_REMOVE(aio_req, aio_siblings);
         QLIST_INSERT_HEAD(&s->inflight_aio_head, aio_req, aio_siblings);
-        ret = add_aio_request(s, aio_req, acb->qiov->iov,
-                              acb->qiov->niov, false, acb->aiocb_type);
-        if (ret < 0) {
-            error_report("add_aio_request is failed");
-            free_aio_req(s, aio_req);
-            if (!acb->nr_pending) {
-                sd_finish_aiocb(acb);
-            }
-        }
+        add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov, false,
+                        acb->aiocb_type);
     }
 }
 
@@ -813,11 +805,8 @@ static void coroutine_fn aio_read_response(void *opaque)
         } else {
             aio_req->oid = vid_to_vdi_oid(s->inode.vdi_id);
         }
-        ret = resend_aioreq(s, aio_req);
-        if (ret == SD_RES_SUCCESS) {
-            goto out;
-        }
-        /* fall through */
+        resend_aioreq(s, aio_req);
+        goto out;
     default:
         acb->ret = -EIO;
         error_report("%s", sd_strerror(rsp.result));
@@ -1066,7 +1055,7 @@ out:
     return ret;
 }
 
-static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
+static void coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
                                         struct iovec *iov, int niov, bool create,
                                         enum AIOCBState aiocb_type)
 {
@@ -1144,8 +1133,6 @@ out:
     qemu_aio_set_fd_handler(s->fd, co_read_response, NULL, s);
     s->co_send = NULL;
     qemu_co_mutex_unlock(&s->lock);
-
-    return 0;
 }
 
 static int read_write_object(int fd, char *buf, uint64_t oid, int copies,
@@ -1248,7 +1235,7 @@ out:
     return ret;
 }
 
-static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
+static void coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
 {
     SheepdogAIOCB *acb = aio_req->aiocb;
     bool create = false;
@@ -1273,7 +1260,7 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
             DPRINTF("simultaneous CoW to %" PRIx64 "\n", aio_req->oid);
             QLIST_REMOVE(aio_req, aio_siblings);
             QLIST_INSERT_HEAD(&s->pending_aio_head, aio_req, aio_siblings);
-            return SD_RES_SUCCESS;
+            return;
         }
     }
 
@@ -1283,13 +1270,13 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
     }
 out:
     if (is_data_obj(aio_req->oid)) {
-        return add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov,
-                               create, acb->aiocb_type);
+        add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov, create,
+                        acb->aiocb_type);
     } else {
         struct iovec iov;
         iov.iov_base = &s->inode;
         iov.iov_len = sizeof(s->inode);
-        return add_aio_request(s, aio_req, &iov, 1, false, AIOCB_WRITE_UDATA);
+        add_aio_request(s, aio_req, &iov, 1, false, AIOCB_WRITE_UDATA);
     }
 }
 
@@ -1689,7 +1676,6 @@ static int sd_truncate(BlockDriverState *bs, int64_t offset)
  */
 static void coroutine_fn sd_write_done(SheepdogAIOCB *acb)
 {
-    int ret;
     BDRVSheepdogState *s = acb->common.bs->opaque;
     struct iovec iov;
     AIOReq *aio_req;
@@ -1711,18 +1697,13 @@ static void coroutine_fn sd_write_done(SheepdogAIOCB *acb)
     aio_req = alloc_aio_req(s, acb, vid_to_vdi_oid(s->inode.vdi_id),
                             data_len, offset, 0, 0, offset
[Qemu-devel] [PATCH v5 4/8] coroutine: add co_aio_sleep_ns() to allow sleep in block drivers

2013-10-24 Thread MORITA Kazutaka
This helper function behaves similarly to co_sleep_ns(), but the
sleeping coroutine will be resumed when using qemu_aio_wait().

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 include/block/coroutine.h |9 +
 qemu-coroutine-sleep.c|   14 ++
 2 files changed, 23 insertions(+)

diff --git a/include/block/coroutine.h b/include/block/coroutine.h
index 4232569..4d5c0cf 100644
--- a/include/block/coroutine.h
+++ b/include/block/coroutine.h
@@ -216,6 +216,15 @@ void qemu_co_rwlock_unlock(CoRwlock *lock);
 void coroutine_fn co_sleep_ns(QEMUClockType type, int64_t ns);
 
 /**
+ * Yield the coroutine for a given duration
+ *
+ * Behaves similarly to co_sleep_ns(), but the sleeping coroutine will be
+ * resumed when using qemu_aio_wait().
+ */
+void coroutine_fn co_aio_sleep_ns(AioContext *ctx, QEMUClockType type,
+                                  int64_t ns);
+
+/**
  * Yield until a file descriptor becomes readable
  *
  * Note that this function clobbers the handlers for the file descriptor.
diff --git a/qemu-coroutine-sleep.c b/qemu-coroutine-sleep.c
index f6db978..ad78fba 100644
--- a/qemu-coroutine-sleep.c
+++ b/qemu-coroutine-sleep.c
@@ -13,6 +13,7 @@
 
 #include "block/coroutine.h"
 #include "qemu/timer.h"
+#include "block/aio.h"
 
 typedef struct CoSleepCB {
     QEMUTimer *ts;
@@ -37,3 +38,16 @@ void coroutine_fn co_sleep_ns(QEMUClockType type, int64_t ns)
     timer_del(sleep_cb.ts);
     timer_free(sleep_cb.ts);
 }
+
+void coroutine_fn co_aio_sleep_ns(AioContext *ctx, QEMUClockType type,
+                                  int64_t ns)
+{
+    CoSleepCB sleep_cb = {
+        .co = qemu_coroutine_self(),
+    };
+    sleep_cb.ts = aio_timer_new(ctx, type, SCALE_NS, co_sleep_cb, &sleep_cb);
+    timer_mod(sleep_cb.ts, qemu_clock_get_ns(type) + ns);
+    qemu_coroutine_yield();
+    timer_del(sleep_cb.ts);
+    timer_free(sleep_cb.ts);
+}
-- 
1.7.10.4




[Qemu-devel] [PATCH v5 5/8] sheepdog: try to reconnect to sheepdog after network error

2013-10-24 Thread MORITA Kazutaka
This introduces a failed request queue and links all the inflight
requests to the list after a network error happens.  After QEMU
reconnects to the sheepdog server successfully, the sheepdog block
driver will retry all the requests in the failed queue.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c |   80 --
 1 file changed, 66 insertions(+), 14 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 5569e54..3e98291 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -299,6 +299,8 @@ struct SheepdogAIOCB {
 };
 
 typedef struct BDRVSheepdogState {
+    BlockDriverState *bs;
+
     SheepdogInode inode;
 
     uint32_t min_dirty_data_idx;
@@ -318,8 +320,11 @@ typedef struct BDRVSheepdogState {
     Coroutine *co_recv;
 
     uint32_t aioreq_seq_num;
+
+    /* Every aio request must be linked to either of these queues. */
     QLIST_HEAD(inflight_aio_head, AIOReq) inflight_aio_head;
     QLIST_HEAD(pending_aio_head, AIOReq) pending_aio_head;
+    QLIST_HEAD(failed_aio_head, AIOReq) failed_aio_head;
 } BDRVSheepdogState;
 
 static const char * sd_strerror(int err)
@@ -606,6 +611,8 @@ static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
                                         enum AIOCBState aiocb_type);
 static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req);
 static int reload_inode(BDRVSheepdogState *s, uint32_t snapid, const char *tag);
+static int get_sheep_fd(BDRVSheepdogState *s);
+static void co_write_request(void *opaque);
 
 static AIOReq *find_pending_req(BDRVSheepdogState *s, uint64_t oid)
 {
@@ -647,6 +654,51 @@ static void coroutine_fn send_pending_req(BDRVSheepdogState *s, uint64_t oid)
     }
 }
 
+static coroutine_fn void reconnect_to_sdog(void *opaque)
+{
+    BDRVSheepdogState *s = opaque;
+    AIOReq *aio_req, *next;
+
+    qemu_aio_set_fd_handler(s->fd, NULL, NULL, NULL);
+    close(s->fd);
+    s->fd = -1;
+
+    /* Wait for outstanding write requests to be completed. */
+    while (s->co_send != NULL) {
+        co_write_request(opaque);
+    }
+
+    /* Try to reconnect the sheepdog server every one second. */
+    while (s->fd < 0) {
+        s->fd = get_sheep_fd(s);
+        if (s->fd < 0) {
+            DPRINTF("Wait for connection to be established\n");
+            co_aio_sleep_ns(bdrv_get_aio_context(s->bs), QEMU_CLOCK_REALTIME,
+                            1000000000ULL);
+        }
+    };
+
+    /*
+     * Now we have to resend all the request in the inflight queue.  However,
+     * resend_aioreq() can yield and newly created requests can be added to the
+     * inflight queue before the coroutine is resumed.  To avoid mixing them, we
+     * have to move all the inflight requests to the failed queue before
+     * resend_aioreq() is called.
+     */
+    QLIST_FOREACH_SAFE(aio_req, &s->inflight_aio_head, aio_siblings, next) {
+        QLIST_REMOVE(aio_req, aio_siblings);
+        QLIST_INSERT_HEAD(&s->failed_aio_head, aio_req, aio_siblings);
+    }
+
+    /* Resend all the failed aio requests. */
+    while (!QLIST_EMPTY(&s->failed_aio_head)) {
+        aio_req = QLIST_FIRST(&s->failed_aio_head);
+        QLIST_REMOVE(aio_req, aio_siblings);
+        QLIST_INSERT_HEAD(&s->inflight_aio_head, aio_req, aio_siblings);
+        resend_aioreq(s, aio_req);
+    }
+}
+
 /*
  * Receive responses of the I/O requests.
  *
@@ -663,15 +715,11 @@ static void coroutine_fn aio_read_response(void *opaque)
     SheepdogAIOCB *acb;
     uint64_t idx;
 
-    if (QLIST_EMPTY(&s->inflight_aio_head)) {
-        goto out;
-    }
-
     /* read a header */
     ret = qemu_co_recv(fd, &rsp, sizeof(rsp));
     if (ret != sizeof(rsp)) {
         error_report("failed to get the header, %s", strerror(errno));
-        goto out;
+        goto err;
     }
 
     /* find the right aio_req from the inflight aio list */
@@ -682,7 +730,7 @@ static void coroutine_fn aio_read_response(void *opaque)
     }
     if (!aio_req) {
         error_report("cannot find aio_req %x", rsp.id);
-        goto out;
+        goto err;
     }
 
     acb = aio_req->aiocb;
@@ -722,7 +770,7 @@ static void coroutine_fn aio_read_response(void *opaque)
                             aio_req->iov_offset, rsp.data_length);
         if (ret != rsp.data_length) {
             error_report("failed to get the data, %s", strerror(errno));
-            goto out;
+            goto err;
         }
         break;
     case AIOCB_FLUSH_CACHE:
@@ -756,10 +804,9 @@ static void coroutine_fn aio_read_response(void *opaque)
         if (s->inode.vdi_id == oid_to_vid(aio_req->oid)) {
             ret = reload_inode(s, 0, "");
             if (ret < 0) {
-                goto out;
+                goto err;
             }
         }
-
         if (is_data_obj(aio_req->oid)) {
             aio_req->oid = vid_to_data_oid(s->inode.vdi_id,
                                            data_oid_to_idx(aio_req->oid));
@@ -787,6 +834,10 @@ static void

[Qemu-devel] [PATCH v5 1/8] sheepdog: check return values of qemu_co_recv/send correctly

2013-10-24 Thread MORITA Kazutaka
If qemu_co_recv/send doesn't return the specified length, it means
that an error happened.

Reviewed-by: Liu Yuan namei.u...@gmail.com
Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c |   16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 5f81c93..cb681de 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -489,13 +489,13 @@ static coroutine_fn int send_co_req(int sockfd, SheepdogReq *hdr, void *data,
     int ret;
 
     ret = qemu_co_send(sockfd, hdr, sizeof(*hdr));
-    if (ret < sizeof(*hdr)) {
+    if (ret != sizeof(*hdr)) {
         error_report("failed to send a req, %s", strerror(errno));
         return ret;
     }
 
     ret = qemu_co_send(sockfd, data, *wlen);
-    if (ret < *wlen) {
+    if (ret != *wlen) {
         error_report("failed to send a req, %s", strerror(errno));
     }
 
@@ -541,7 +541,7 @@ static coroutine_fn void do_co_req(void *opaque)
     qemu_aio_set_fd_handler(sockfd, restart_co_req, NULL, co);
 
     ret = qemu_co_recv(sockfd, hdr, sizeof(*hdr));
-    if (ret < sizeof(*hdr)) {
+    if (ret != sizeof(*hdr)) {
         error_report("failed to get a rsp, %s", strerror(errno));
         ret = -errno;
         goto out;
@@ -553,7 +553,7 @@ static coroutine_fn void do_co_req(void *opaque)
 
     if (*rlen) {
         ret = qemu_co_recv(sockfd, data, *rlen);
-        if (ret < *rlen) {
+        if (ret != *rlen) {
             error_report("failed to get the data, %s", strerror(errno));
             ret = -errno;
             goto out;
@@ -664,7 +664,7 @@ static void coroutine_fn aio_read_response(void *opaque)
 
     /* read a header */
     ret = qemu_co_recv(fd, &rsp, sizeof(rsp));
-    if (ret < 0) {
+    if (ret != sizeof(rsp)) {
         error_report("failed to get the header, %s", strerror(errno));
         goto out;
     }
@@ -715,7 +715,7 @@ static void coroutine_fn aio_read_response(void *opaque)
     case AIOCB_READ_UDATA:
         ret = qemu_co_recvv(fd, acb->qiov->iov, acb->qiov->niov,
                             aio_req->iov_offset, rsp.data_length);
-        if (ret < 0) {
+        if (ret != rsp.data_length) {
             error_report("failed to get the data, %s", strerror(errno));
             goto out;
         }
@@ -1059,7 +1059,7 @@ static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
 
     /* send a header */
     ret = qemu_co_send(s->fd, &hdr, sizeof(hdr));
-    if (ret < 0) {
+    if (ret != sizeof(hdr)) {
         qemu_co_mutex_unlock(&s->lock);
         error_report("failed to send a req, %s", strerror(errno));
         return -errno;
@@ -1067,7 +1067,7 @@ static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
 
     if (wlen) {
         ret = qemu_co_sendv(s->fd, iov, niov, aio_req->iov_offset, wlen);
-        if (ret < 0) {
+        if (ret != wlen) {
             qemu_co_mutex_unlock(&s->lock);
             error_report("failed to send a data, %s", strerror(errno));
             return -errno;
-- 
1.7.10.4




[Qemu-devel] [PATCH v5 3/8] sheepdog: reload inode outside of resend_aioreq

2013-10-24 Thread MORITA Kazutaka
This prepares for using resend_aioreq() after reconnecting to the
sheepdog server.

Reviewed-by: Liu Yuan namei.u...@gmail.com
Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c |   33 +++--
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 59cad97..5569e54 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -222,6 +222,11 @@ static inline uint64_t data_oid_to_idx(uint64_t oid)
     return oid & (MAX_DATA_OBJS - 1);
 }
 
+static inline uint32_t oid_to_vid(uint64_t oid)
+{
+    return (oid & ~VDI_BIT) >> VDI_SPACE_SHIFT;
+}
+
 static inline uint64_t vid_to_vdi_oid(uint32_t vid)
 {
     return VDI_BIT | ((uint64_t)vid << VDI_SPACE_SHIFT);
@@ -600,7 +605,7 @@ static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
                                         struct iovec *iov, int niov, bool create,
                                         enum AIOCBState aiocb_type);
 static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req);
-
+static int reload_inode(BDRVSheepdogState *s, uint32_t snapid, const char *tag);
 
 static AIOReq *find_pending_req(BDRVSheepdogState *s, uint64_t oid)
 {
@@ -748,6 +753,19 @@ static void coroutine_fn aio_read_response(void *opaque)
     case SD_RES_SUCCESS:
         break;
     case SD_RES_READONLY:
+        if (s->inode.vdi_id == oid_to_vid(aio_req->oid)) {
+            ret = reload_inode(s, 0, "");
+            if (ret < 0) {
+                goto out;
+            }
+        }
+
+        if (is_data_obj(aio_req->oid)) {
+            aio_req->oid = vid_to_data_oid(s->inode.vdi_id,
+                                           data_oid_to_idx(aio_req->oid));
+        } else {
+            aio_req->oid = vid_to_vdi_oid(s->inode.vdi_id);
+        }
         ret = resend_aioreq(s, aio_req);
         if (ret == SD_RES_SUCCESS) {
             goto out;
@@ -1185,19 +1203,6 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
 {
     SheepdogAIOCB *acb = aio_req->aiocb;
     bool create = false;
-    int ret;
-
-    ret = reload_inode(s, 0, "");
-    if (ret < 0) {
-        return ret;
-    }
-
-    if (is_data_obj(aio_req->oid)) {
-        aio_req->oid = vid_to_data_oid(s->inode.vdi_id,
-                                       data_oid_to_idx(aio_req->oid));
-    } else {
-        aio_req->oid = vid_to_vdi_oid(s->inode.vdi_id);
-    }
 
     /* check whether this request becomes a CoW one */
     if (acb->aiocb_type == AIOCB_WRITE_UDATA && is_data_obj(aio_req->oid)) {
-- 
1.7.10.4




[Qemu-devel] [PATCH v5 0/8] sheepdog: reconnect server after connection failure

2013-10-24 Thread MORITA Kazutaka
Currently, if a sheepdog server exits, all the connecting VMs need to
be restarted.  This series implements a feature to reconnect the
server, and enables us to do online sheepdog upgrade and avoid
restarting VMs when sheepdog servers crash unexpectedly.

v5:
 - Use AioContext timer for co_aio_sleep_ns().

v4:
 - Added comment to explain why we need a failed queue.
 - Fixed a return value of sd_acb_cancelable().

v3:
 - Check return values of qemu_co_recv/send more strictly.
 - Move inflight requests to the failed list after reconnection
   completes.  This is necessary to resend I/Os issued while the
   connection was lost.
 - Check simultaneous create in resend_aioreq().

v2:
 - Dropped nonblocking connect patches.

MORITA Kazutaka (8):
  sheepdog: check return values of qemu_co_recv/send correctly
  sheepdog: handle vdi objects in resend_aio_req
  sheepdog: reload inode outside of resend_aioreq
  coroutine: add co_aio_sleep_ns() to allow sleep in block drivers
  sheepdog: try to reconnect to sheepdog after network error
  sheepdog: make add_aio_request and send_aioreq void functions
  sheepdog: cancel aio requests if possible
  sheepdog: check simultaneous create in resend_aioreq

 block/sheepdog.c  |  322 -
 include/block/coroutine.h |9 ++
 qemu-coroutine-sleep.c|   14 ++
 3 files changed, 226 insertions(+), 119 deletions(-)

-- 
1.7.10.4




[Qemu-devel] [PATCH v5 7/8] sheepdog: cancel aio requests if possible

2013-10-24 Thread MORITA Kazutaka
This patch tries to cancel aio requests in the pending and failed
queues.  When the sheepdog driver cannot cancel the requests, it waits
for them to be completed.

Reviewed-by: Liu Yuan namei.u...@gmail.com
Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c |   70 +-
 1 file changed, 59 insertions(+), 11 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 5846ac4..cb3a22d 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -294,7 +294,8 @@ struct SheepdogAIOCB {
     Coroutine *coroutine;
     void (*aio_done_func)(SheepdogAIOCB *);
 
-    bool canceled;
+    bool cancelable;
+    bool *finished;
     int nr_pending;
 };
 
@@ -413,6 +414,7 @@ static inline void free_aio_req(BDRVSheepdogState *s, AIOReq *aio_req)
 {
     SheepdogAIOCB *acb = aio_req->aiocb;
 
+    acb->cancelable = false;
     QLIST_REMOVE(aio_req, aio_siblings);
     g_free(aio_req);
 
@@ -421,23 +423,68 @@ static inline void free_aio_req(BDRVSheepdogState *s, AIOReq *aio_req)
 
 static void coroutine_fn sd_finish_aiocb(SheepdogAIOCB *acb)
 {
-    if (!acb->canceled) {
-        qemu_coroutine_enter(acb->coroutine, NULL);
+    qemu_coroutine_enter(acb->coroutine, NULL);
+    if (acb->finished) {
+        *acb->finished = true;
     }
     qemu_aio_release(acb);
 }
 
+/*
+ * Check whether the specified acb can be canceled
+ *
+ * We can cancel aio when any request belonging to the acb is:
+ *  - Not processed by the sheepdog server.
+ *  - Not linked to the inflight queue.
+ */
+static bool sd_acb_cancelable(const SheepdogAIOCB *acb)
+{
+    BDRVSheepdogState *s = acb->common.bs->opaque;
+    AIOReq *aioreq;
+
+    if (!acb->cancelable) {
+        return false;
+    }
+
+    QLIST_FOREACH(aioreq, &s->inflight_aio_head, aio_siblings) {
+        if (aioreq->aiocb == acb) {
+            return false;
+        }
+    }
+
+    return true;
+}
+
 static void sd_aio_cancel(BlockDriverAIOCB *blockacb)
 {
     SheepdogAIOCB *acb = (SheepdogAIOCB *)blockacb;
+    BDRVSheepdogState *s = acb->common.bs->opaque;
+    AIOReq *aioreq, *next;
+    bool finished = false;
+
+    acb->finished = &finished;
+    while (!finished) {
+        if (sd_acb_cancelable(acb)) {
+            /* Remove outstanding requests from pending and failed queues.  */
+            QLIST_FOREACH_SAFE(aioreq, &s->pending_aio_head, aio_siblings,
+                               next) {
+                if (aioreq->aiocb == acb) {
+                    free_aio_req(s, aioreq);
+                }
+            }
+            QLIST_FOREACH_SAFE(aioreq, &s->failed_aio_head, aio_siblings,
+                               next) {
+                if (aioreq->aiocb == acb) {
+                    free_aio_req(s, aioreq);
+                }
+            }
 
-    /*
-     * Sheepdog cannot cancel the requests which are already sent to
-     * the servers, so we just complete the request with -EIO here.
-     */
-    acb->ret = -EIO;
-    qemu_coroutine_enter(acb->coroutine, NULL);
-    acb->canceled = true;
+            assert(acb->nr_pending == 0);
+            sd_finish_aiocb(acb);
+            return;
+        }
+        qemu_aio_wait();
+    }
 }
 
 static const AIOCBInfo sd_aiocb_info = {
@@ -458,7 +505,8 @@ static SheepdogAIOCB *sd_aio_setup(BlockDriverState *bs, QEMUIOVector *qiov,
     acb->nb_sectors = nb_sectors;
 
     acb->aio_done_func = NULL;
-    acb->canceled = false;
+    acb->cancelable = true;
+    acb->finished = NULL;
     acb->coroutine = qemu_coroutine_self();
     acb->ret = 0;
     acb->nr_pending = 0;
-- 
1.7.10.4




[Qemu-devel] [PATCH v5 8/8] sheepdog: check simultaneous create in resend_aioreq

2013-10-24 Thread MORITA Kazutaka
After reconnection happens, all the inflight requests are moved to the
failed request list.  As a result, sd_co_rw_vector() can send another
create request before resend_aioreq() resends a create request from
the failed list.

This patch adds a helper function check_simultaneous_create() and
checks simultaneous create requests more strictly in resend_aioreq().

Reviewed-by: Liu Yuan namei.u...@gmail.com
Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c |   64 +++---
 1 file changed, 32 insertions(+), 32 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index cb3a22d..c9ee273 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -1283,6 +1283,29 @@ out:
 return ret;
 }
 
+/* Return true if the specified request is linked to the pending list. */
+static bool check_simultaneous_create(BDRVSheepdogState *s, AIOReq *aio_req)
+{
+    AIOReq *areq;
+    QLIST_FOREACH(areq, &s->inflight_aio_head, aio_siblings) {
+        if (areq != aio_req && areq->oid == aio_req->oid) {
+            /*
+             * Sheepdog cannot handle simultaneous create requests to the same
+             * object, so we cannot send the request until the previous request
+             * finishes.
+             */
+            DPRINTF("simultaneous create to %" PRIx64 "\n", aio_req->oid);
+            aio_req->flags = 0;
+            aio_req->base_oid = 0;
+            QLIST_REMOVE(aio_req, aio_siblings);
+            QLIST_INSERT_HEAD(&s->pending_aio_head, aio_req, aio_siblings);
+            return true;
+        }
+    }
+
+    return false;
+}
+
 static void coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
 {
     SheepdogAIOCB *acb = aio_req->aiocb;
@@ -1291,29 +1314,19 @@ static void coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
     /* check whether this request becomes a CoW one */
     if (acb->aiocb_type == AIOCB_WRITE_UDATA && is_data_obj(aio_req->oid)) {
         int idx = data_oid_to_idx(aio_req->oid);
-        AIOReq *areq;
 
-        if (s->inode.data_vdi_id[idx] == 0) {
-            create = true;
-            goto out;
-        }
         if (is_data_obj_writable(&s->inode, idx)) {
             goto out;
         }
 
-        /* link to the pending list if there is another CoW request to
-         * the same object */
-        QLIST_FOREACH(areq, &s->inflight_aio_head, aio_siblings) {
-            if (areq != aio_req && areq->oid == aio_req->oid) {
-                DPRINTF("simultaneous CoW to %" PRIx64 "\n", aio_req->oid);
-                QLIST_REMOVE(aio_req, aio_siblings);
-                QLIST_INSERT_HEAD(&s->pending_aio_head, aio_req, aio_siblings);
-                return;
-            }
+        if (check_simultaneous_create(s, aio_req)) {
+            return;
         }
 
-        aio_req->base_oid = vid_to_data_oid(s->inode.data_vdi_id[idx], idx);
-        aio_req->flags |= SD_FLAG_CMD_COW;
+        if (s->inode.data_vdi_id[idx]) {
+            aio_req->base_oid = vid_to_data_oid(s->inode.data_vdi_id[idx], idx);
+            aio_req->flags |= SD_FLAG_CMD_COW;
+        }
         create = true;
     }
 out:
@@ -1937,27 +1950,14 @@ static int coroutine_fn sd_co_rw_vector(void *p)
         }
 
         aio_req = alloc_aio_req(s, acb, oid, len, offset, flags, old_oid, done);
+        QLIST_INSERT_HEAD(&s->inflight_aio_head, aio_req, aio_siblings);
 
         if (create) {
-            AIOReq *areq;
-            QLIST_FOREACH(areq, &s->inflight_aio_head, aio_siblings) {
-                if (areq->oid == oid) {
-                    /*
-                     * Sheepdog cannot handle simultaneous create
-                     * requests to the same object.  So we cannot send
-                     * the request until the previous request
-                     * finishes.
-                     */
-                    aio_req->flags = 0;
-                    aio_req->base_oid = 0;
-                    QLIST_INSERT_HEAD(&s->pending_aio_head, aio_req,
-                                      aio_siblings);
-                    goto done;
-                }
+            if (check_simultaneous_create(s, aio_req)) {
+                goto done;
             }
         }
 
-        QLIST_INSERT_HEAD(&s->inflight_aio_head, aio_req, aio_siblings);
         add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov, create,
                         acb->aiocb_type);
 done:
-- 
1.7.10.4




[Qemu-devel] [PATCH v5 2/8] sheepdog: handle vdi objects in resend_aio_req

2013-10-24 Thread MORITA Kazutaka
The current resend_aio_req() doesn't work when the request is against
vdi objects.  This fixes the problem.

Reviewed-by: Liu Yuan namei.u...@gmail.com
Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c |   21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index cb681de..59cad97 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -1192,11 +1192,15 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
         return ret;
     }
 
-    aio_req->oid = vid_to_data_oid(s->inode.vdi_id,
-                                   data_oid_to_idx(aio_req->oid));
+    if (is_data_obj(aio_req->oid)) {
+        aio_req->oid = vid_to_data_oid(s->inode.vdi_id,
+                                       data_oid_to_idx(aio_req->oid));
+    } else {
+        aio_req->oid = vid_to_vdi_oid(s->inode.vdi_id);
+    }
 
     /* check whether this request becomes a CoW one */
-    if (acb->aiocb_type == AIOCB_WRITE_UDATA) {
+    if (acb->aiocb_type == AIOCB_WRITE_UDATA && is_data_obj(aio_req->oid)) {
         int idx = data_oid_to_idx(aio_req->oid);
         AIOReq *areq;
 
@@ -1224,8 +1228,15 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
         create = true;
     }
 out:
-    return add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov,
-                           create, acb->aiocb_type);
+    if (is_data_obj(aio_req->oid)) {
+        return add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov,
+                               create, acb->aiocb_type);
+    } else {
+        struct iovec iov;
+        iov.iov_base = &s->inode;
+        iov.iov_len = sizeof(s->inode);
+        return add_aio_request(s, aio_req, &iov, 1, false, AIOCB_WRITE_UDATA);
+    }
 }
 
 /* TODO Convert to fine grained options */
-- 
1.7.10.4




Re: [Qemu-devel] [PATCH] MAINTAINERS: add block driver sub-maintainers

2013-10-23 Thread MORITA Kazutaka
At Mon, 21 Oct 2013 14:26:15 +0100,
Stefan Hajnoczi wrote:
 
 There are a number of contributors who maintain block drivers (image
 formats and protocols).  They should be listed in the MAINTAINERS file
 so that get_maintainer.pl lists them.
 
 Note that commits are still merged through Kevin or Stefan's block tree
 but the block driver sub-maintainers are usually the ones to review
 patches.
 
 Acked-by: Kevin Wolf kw...@redhat.com
 Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
 ---
  MAINTAINERS | 38 ++
  1 file changed, 38 insertions(+)
 
 diff --git a/MAINTAINERS b/MAINTAINERS
 index 77edacf..da18a23 100644
 --- a/MAINTAINERS
 +++ b/MAINTAINERS
 @@ -857,3 +857,41 @@ Stable 0.10
  L: qemu-sta...@nongnu.org
  T: git git://git.qemu-project.org/qemu-stable-0.10.git
  S: Orphan
 +
 +Block drivers
 +-
 +VMDK
 +M: Fam Zheng f...@redhat.com
 +S: Supported
 +F: block/vmdk.c
 +
 +RBD
 +M: Josh Durgin josh.dur...@dreamhost.com
 +S: Supported
 +F: block/rbd.c
 +
 +Sheepdog
 +M: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
 +S: Supported
 +F: block/sheepdog.c

Acked-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp



Re: [Qemu-devel] [PATCH] MAINTAINERS: add block driver sub-maintainers

2013-10-23 Thread MORITA Kazutaka
At Wed, 23 Oct 2013 15:19:47 +0900,
MORITA Kazutaka wrote:
 
  +
  +Sheepdog
  +M: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
  +S: Supported
  +F: block/sheepdog.c
 
 Acked-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp

Is it okay to add Liu Yuan tailai...@taobao.com as a sheepdog
maintainer too?  He is a co-maintainer of the sheepdog project with me
and is very familiar with sheepdog internals.

Thanks,

Kazutaka



Re: [Qemu-devel] [sheepdog] [PATCH 1/2] sheepdog: explicitly set copies as type uint8_t

2013-10-23 Thread MORITA Kazutaka
At Wed, 16 Oct 2013 15:38:37 +0800,
Liu Yuan wrote:
 
 'copies' has actually been uint8_t since day one, but request headers and some
 helper functions parameterize it as uint32_t for unknown reasons, effectively
 reserving 24 bits for possible future use. This patch explicitly sets the
 correct type for 'copies' and reserves the remaining bytes.
 
 This is a preparation patch that allows passing copy_policy in the request header.
 
 Cc: Kevin Wolf kw...@redhat.com
 Cc: Stefan Hajnoczi stefa...@redhat.com
 Signed-off-by: Liu Yuan namei.u...@gmail.com
 ---
  block/sheepdog.c |   15 +--
  1 file changed, 9 insertions(+), 6 deletions(-)
 
 diff --git a/block/sheepdog.c b/block/sheepdog.c
 index 5f81c93..ca4f98b 100644
 --- a/block/sheepdog.c
 +++ b/block/sheepdog.c
 @@ -125,7 +125,8 @@ typedef struct SheepdogObjReq {
      uint32_t data_length;
      uint64_t oid;
      uint64_t cow_oid;
 -    uint32_t copies;
 +    uint8_t copies;
 +    uint8_t reserved[3];
      uint32_t rsvd;
      uint64_t offset;
  } SheepdogObjReq;

Having both 'reserved' and 'rsvd' looks confusing.  I'd suggest
merging them into 'uint8_t reserved[7]'.
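
A sketch of the merged layout (the field set is copied from the quoted hunk;
the preceding protocol header fields are elided here, so this is illustrative
rather than the complete on-wire structure):

    typedef struct SheepdogObjReq {
        /* ... protocol header fields as in block/sheepdog.c ... */
        uint32_t data_length;
        uint64_t oid;
        uint64_t cow_oid;
        uint8_t  copies;
        uint8_t  reserved[7];   /* was: uint8_t reserved[3] + uint32_t rsvd */
        uint64_t offset;
    } SheepdogObjReq;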

Thanks,

Kazutaka



Re: [Qemu-devel] [sheepdog] [PATCH] sheepdog: implement .bdrv_get_allocated_file_size

2013-08-07 Thread MORITA Kazutaka
At Wed,  7 Aug 2013 16:59:53 +0800,
Liu Yuan wrote:
 
 With this patch, qemu-img info sheepdog:image will show disk size for sheepdog
 images.
 
 Cc: Kevin Wolf kw...@redhat.com
 Cc: Stefan Hajnoczi stefa...@redhat.com
 Cc: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
 Signed-off-by: Liu Yuan namei.u...@gmail.com
 ---
  block/sheepdog.c |   19 +++
  1 file changed, 19 insertions(+)

Looks good to me.

Reviewed-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp



Re: [Qemu-devel] [sheepdog] [PATCH] sheepdog: add missing .bdrv_has_zero_init

2013-08-06 Thread MORITA Kazutaka
At Tue,  6 Aug 2013 14:44:37 +0800,
Liu Yuan wrote:
 
 Cc: Kevin Wolf kw...@redhat.com
 Cc: Stefan Hajnoczi stefa...@redhat.com
 Signed-off-by: Liu Yuan namei.u...@gmail.com
 ---
  block/sheepdog.c |2 ++
  1 file changed, 2 insertions(+)

Reviewed-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp



Re: [Qemu-devel] [sheepdog] [PATCH v4 03/10] sheepdog: check return values of qemu_co_recv/send correctly

2013-08-02 Thread MORITA Kazutaka
At Tue, 30 Jul 2013 15:48:02 +0200,
Stefan Hajnoczi wrote:
 
 On Fri, Jul 26, 2013 at 03:10:45PM +0900, MORITA Kazutaka wrote:
  If qemu_co_recv/send doesn't return the specified length, it means
  that an error happened.
  
  Tested-and-reviewed-by: Liu Yuan namei.u...@gmail.com
  Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
  ---
   block/sheepdog.c | 16 
   1 file changed, 8 insertions(+), 8 deletions(-)
  
  diff --git a/block/sheepdog.c b/block/sheepdog.c
  index 6a41ad9..c6e9b89 100644
  --- a/block/sheepdog.c
  +++ b/block/sheepdog.c
  @@ -489,13 +489,13 @@ static coroutine_fn int send_co_req(int sockfd, SheepdogReq *hdr, void *data,
       int ret;
  
       ret = qemu_co_send(sockfd, hdr, sizeof(*hdr));
  -    if (ret < sizeof(*hdr)) {
  +    if (ret != sizeof(*hdr)) {
           error_report("failed to send a req, %s", strerror(errno));
 
 Does this rely on qemu_co_send_recv() getting ret=-1 errno=EPIPE from
 iov_send_recv()?  I want to check that I understand what happens when
 the socket is closed by the other side.

Yes, when the socket is closed by the peer, qemu_co_send_recv()
returns a short write (if some bytes are already sent) or -1 (if no
data is sent).  The current sheepdog driver doesn't work correctly for
the latter case because it compares -1 and an unsigned value.

This doesn't happen for the current qemu-io and qemu-img because they
terminate with SIGPIPE when the connection is closed by the peer.
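
A tiny standalone illustration of that pitfall (my own example, not from the
series): when the expected length is unsigned, a -1 error return is converted
to a huge unsigned value, so the old 'ret < len' style check silently misses
the error while 'ret != len' catches it.

    #include <stdio.h>

    int main(void)
    {
        long ret = -1;            /* error return from a recv-like call */
        unsigned long len = 512;  /* expected length, unsigned like size_t */

        /* ret is converted to unsigned long (ULONG_MAX), so this prints 0:
         * the error goes undetected */
        printf("ret < len  -> %d\n", ret < len);

        /* the stricter check used by this series prints 1 */
        printf("ret != len -> %d\n", ret != len);
        return 0;
    }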

Thanks,

Kazutaka



Re: [Qemu-devel] [sheepdog] [PATCH v4 06/10] coroutine: add co_aio_sleep_ns() to allow sleep in block drivers

2013-08-02 Thread MORITA Kazutaka
At Tue, 30 Jul 2013 15:58:58 +0200,
Stefan Hajnoczi wrote:
 
 On Fri, Jul 26, 2013 at 03:10:48PM +0900, MORITA Kazutaka wrote:
  This helper function behaves similarly to co_sleep_ns(), but the
  sleeping coroutine will be resumed when using qemu_aio_wait().
  
  Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
  ---
   include/block/coroutine.h |  8 ++++
   qemu-coroutine-sleep.c    | 47 +++++++++++++++++++++++++++++++++++++++++++
   2 files changed, 55 insertions(+)
  
  diff --git a/include/block/coroutine.h b/include/block/coroutine.h
  index 377805a..23ea6e9 100644
  --- a/include/block/coroutine.h
  +++ b/include/block/coroutine.h
  @@ -210,6 +210,14 @@ void qemu_co_rwlock_unlock(CoRwlock *lock);
   void coroutine_fn co_sleep_ns(QEMUClock *clock, int64_t ns);
   
   /**
  + * Yield the coroutine for a given duration
  + *
  + * Behaves similarly to co_sleep_ns(), but the sleeping coroutine will be
  + * resumed when using qemu_aio_wait().
  + */
  +void coroutine_fn co_aio_sleep_ns(int64_t ns);
  +
  +/**
* Yield until a file descriptor becomes readable
*
* Note that this function clobbers the handlers for the file descriptor.
  diff --git a/qemu-coroutine-sleep.c b/qemu-coroutine-sleep.c
  index 169ce5c..3955347 100644
  --- a/qemu-coroutine-sleep.c
  +++ b/qemu-coroutine-sleep.c
  @@ -13,6 +13,7 @@
   
   #include "block/coroutine.h"
   #include "qemu/timer.h"
  +#include "qemu/thread.h"
   
   typedef struct CoSleepCB {
       QEMUTimer *ts;
  @@ -37,3 +38,49 @@ void coroutine_fn co_sleep_ns(QEMUClock *clock, int64_t ns)
       qemu_del_timer(sleep_cb.ts);
       qemu_free_timer(sleep_cb.ts);
   }
  +
  +typedef struct CoAioSleepCB {
  +    QEMUBH *bh;
  +    int64_t ns;
  +    Coroutine *co;
  +} CoAioSleepCB;
  +
  +static void co_aio_sleep_cb(void *opaque)
  +{
  +    CoAioSleepCB *aio_sleep_cb = opaque;
  +
  +    qemu_coroutine_enter(aio_sleep_cb->co, NULL);
  +}
  +
  +static void *sleep_thread(void *opaque)
  +{
  +    CoAioSleepCB *aio_sleep_cb = opaque;
  +    struct timespec req = {
  +        .tv_sec = aio_sleep_cb->ns / 1000000000,
  +        .tv_nsec = aio_sleep_cb->ns % 1000000000,
  +    };
  +    struct timespec rem;
  +
  +    while (nanosleep(&req, &rem) < 0 && errno == EINTR) {
 
 This breaks the Windows build and makes qemu_aio_wait() spin instead of
 blocking (wastes CPU).
 
 I think Alex Bligh and Ping Fan's QEMUTimer in AioContext work is needed
 here.  I have CCed them.  Their patches would allow you to use
 co_sleep_ns() in qemu_aio_wait().

Okay, I'll update this patch based on the AioContext timer.  I'm also
happy to help Alex and Pingfan to finish the implementation.

Thanks,

Kazutaka



Re: [Qemu-devel] [PATCH 1/4] block/sheepdog: Rename 'dprintf' to 'DPRINTF'

2013-07-29 Thread MORITA Kazutaka
At Mon, 29 Jul 2013 14:44:16 +0200,
Kevin Wolf wrote:
 
 Am 29.07.2013 um 14:16 hat Peter Maydell geschrieben:
  'dprintf' is the name of a POSIX standard function so we should not be
  stealing it for our debug macro. Rename to 'DPRINTF' (in line with
  a number of other source files).
  
  Signed-off-by: Peter Maydell peter.mayd...@linaro.org
 
 Acked-by: Kevin Wolf kw...@redhat.com
 
 (CCed Kazutaka in case he has any objections, unexpectedly)

No problem.

Reviewed-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp

Thanks,

Kazutaka



[Qemu-devel] [PATCH v4 01/10] ignore SIGPIPE in qemu-img and qemu-io

2013-07-26 Thread MORITA Kazutaka
This prevents the tools from being stopped when they write data to a
closed connection on the other side.

Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 qemu-img.c | 4 
 qemu-io.c  | 4 
 2 files changed, 8 insertions(+)

diff --git a/qemu-img.c b/qemu-img.c
index c55ca5c..919d464 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2319,6 +2319,10 @@ int main(int argc, char **argv)
     const img_cmd_t *cmd;
     const char *cmdname;
 
+#ifdef CONFIG_POSIX
+    signal(SIGPIPE, SIG_IGN);
+#endif
+
     error_set_progname(argv[0]);
 
     qemu_init_main_loop();
diff --git a/qemu-io.c b/qemu-io.c
index cb9def5..d54dc86 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -335,6 +335,10 @@ int main(int argc, char **argv)
     int opt_index = 0;
     int flags = BDRV_O_UNMAP;
 
+#ifdef CONFIG_POSIX
+    signal(SIGPIPE, SIG_IGN);
+#endif
+
     progname = basename(argv[0]);
 
     while ((c = getopt_long(argc, argv, sopt, lopt, &opt_index)) != -1) {
-- 
1.8.1.3.566.gaa39828




[Qemu-devel] [PATCH v4 06/10] coroutine: add co_aio_sleep_ns() to allow sleep in block drivers

2013-07-26 Thread MORITA Kazutaka
This helper function behaves similarly to co_sleep_ns(), but the
sleeping coroutine will be resumed when using qemu_aio_wait().

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 include/block/coroutine.h |  8 ++++
 qemu-coroutine-sleep.c    | 47 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/include/block/coroutine.h b/include/block/coroutine.h
index 377805a..23ea6e9 100644
--- a/include/block/coroutine.h
+++ b/include/block/coroutine.h
@@ -210,6 +210,14 @@ void qemu_co_rwlock_unlock(CoRwlock *lock);
 void coroutine_fn co_sleep_ns(QEMUClock *clock, int64_t ns);
 
 /**
+ * Yield the coroutine for a given duration
+ *
+ * Behaves similarly to co_sleep_ns(), but the sleeping coroutine will be
+ * resumed when using qemu_aio_wait().
+ */
+void coroutine_fn co_aio_sleep_ns(int64_t ns);
+
+/**
  * Yield until a file descriptor becomes readable
  *
  * Note that this function clobbers the handlers for the file descriptor.
diff --git a/qemu-coroutine-sleep.c b/qemu-coroutine-sleep.c
index 169ce5c..3955347 100644
--- a/qemu-coroutine-sleep.c
+++ b/qemu-coroutine-sleep.c
@@ -13,6 +13,7 @@
 
 #include "block/coroutine.h"
 #include "qemu/timer.h"
+#include "qemu/thread.h"
 
 typedef struct CoSleepCB {
 QEMUTimer *ts;
@@ -37,3 +38,49 @@ void coroutine_fn co_sleep_ns(QEMUClock *clock, int64_t ns)
     qemu_del_timer(sleep_cb.ts);
     qemu_free_timer(sleep_cb.ts);
 }
+
+typedef struct CoAioSleepCB {
+    QEMUBH *bh;
+    int64_t ns;
+    Coroutine *co;
+} CoAioSleepCB;
+
+static void co_aio_sleep_cb(void *opaque)
+{
+    CoAioSleepCB *aio_sleep_cb = opaque;
+
+    qemu_coroutine_enter(aio_sleep_cb->co, NULL);
+}
+
+static void *sleep_thread(void *opaque)
+{
+    CoAioSleepCB *aio_sleep_cb = opaque;
+    struct timespec req = {
+        .tv_sec = aio_sleep_cb->ns / 1000000000,
+        .tv_nsec = aio_sleep_cb->ns % 1000000000,
+    };
+    struct timespec rem;
+
+    while (nanosleep(&req, &rem) < 0 && errno == EINTR) {
+        req = rem;
+    }
+
+    qemu_bh_schedule(aio_sleep_cb->bh);
+
+    return NULL;
+}
+
+void coroutine_fn co_aio_sleep_ns(int64_t ns)
+{
+    CoAioSleepCB aio_sleep_cb = {
+        .ns = ns,
+        .co = qemu_coroutine_self(),
+    };
+    QemuThread thread;
+
+    aio_sleep_cb.bh = qemu_bh_new(co_aio_sleep_cb, &aio_sleep_cb);
+    qemu_thread_create(&thread, sleep_thread, &aio_sleep_cb,
+                       QEMU_THREAD_DETACHED);
+    qemu_coroutine_yield();
+    qemu_bh_delete(aio_sleep_cb.bh);
+}
-- 
1.8.1.3.566.gaa39828




[Qemu-devel] [PATCH v4 02/10] iov: handle EOF in iov_send_recv

2013-07-26 Thread MORITA Kazutaka
Without this patch, iov_send_recv() never returns when do_send_recv()
returns zero.

Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 util/iov.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/util/iov.c b/util/iov.c
index cc6e837..f705586 100644
--- a/util/iov.c
+++ b/util/iov.c
@@ -202,6 +202,12 @@ ssize_t iov_send_recv(int sockfd, struct iovec *iov, unsigned iov_cnt,
         return -1;
     }
 
+    if (ret == 0 && !do_send) {
+        /* recv returns 0 when the peer has performed an orderly
+         * shutdown. */
+        break;
+    }
+
     /* Prepare for the next iteration */
     offset += ret;
     total += ret;
-- 
1.8.1.3.566.gaa39828




[Qemu-devel] [PATCH v4 05/10] sheepdog: reload inode outside of resend_aioreq

2013-07-26 Thread MORITA Kazutaka
This prepares for using resend_aioreq() after reconnecting to the
sheepdog server.

Tested-and-reviewed-by: Liu Yuan namei.u...@gmail.com
Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 33 +++--
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index fae17ac..7b22816 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -222,6 +222,11 @@ static inline uint64_t data_oid_to_idx(uint64_t oid)
     return oid & (MAX_DATA_OBJS - 1);
 }
 
+static inline uint32_t oid_to_vid(uint64_t oid)
+{
+    return (oid & ~VDI_BIT) >> VDI_SPACE_SHIFT;
+}
+
 static inline uint64_t vid_to_vdi_oid(uint32_t vid)
 {
     return VDI_BIT | ((uint64_t)vid << VDI_SPACE_SHIFT);
@@ -607,7 +612,7 @@ static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
                                         struct iovec *iov, int niov, bool create,
                                         enum AIOCBState aiocb_type);
 static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req);
-
+static int reload_inode(BDRVSheepdogState *s, uint32_t snapid, const char *tag);
 
 static AIOReq *find_pending_req(BDRVSheepdogState *s, uint64_t oid)
 {
@@ -755,6 +760,19 @@ static void coroutine_fn aio_read_response(void *opaque)
     case SD_RES_SUCCESS:
         break;
     case SD_RES_READONLY:
+        if (s->inode.vdi_id == oid_to_vid(aio_req->oid)) {
+            ret = reload_inode(s, 0, "");
+            if (ret < 0) {
+                goto out;
+            }
+        }
+
+        if (is_data_obj(aio_req->oid)) {
+            aio_req->oid = vid_to_data_oid(s->inode.vdi_id,
+                                           data_oid_to_idx(aio_req->oid));
+        } else {
+            aio_req->oid = vid_to_vdi_oid(s->inode.vdi_id);
+        }
         ret = resend_aioreq(s, aio_req);
         if (ret == SD_RES_SUCCESS) {
             goto out;
@@ -1202,19 +1220,6 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
 {
     SheepdogAIOCB *acb = aio_req->aiocb;
     bool create = false;
-    int ret;
-
-    ret = reload_inode(s, 0, "");
-    if (ret < 0) {
-        return ret;
-    }
-
-    if (is_data_obj(aio_req->oid)) {
-        aio_req->oid = vid_to_data_oid(s->inode.vdi_id,
-                                       data_oid_to_idx(aio_req->oid));
-    } else {
-        aio_req->oid = vid_to_vdi_oid(s->inode.vdi_id);
-    }
 
     /* check whether this request becomes a CoW one */
     if (acb->aiocb_type == AIOCB_WRITE_UDATA && is_data_obj(aio_req->oid)) {
-- 
1.8.1.3.566.gaa39828




[Qemu-devel] [PATCH v4 10/10] sheepdog: check simultaneous create in resend_aioreq

2013-07-26 Thread MORITA Kazutaka
After reconnection happens, all the inflight requests are moved to the
failed request list.  As a result, sd_co_rw_vector() can send another
create request before resend_aioreq() resends a create request from
the failed list.

This patch adds a helper function check_simultaneous_create() and
checks simultaneous create requests more strictly in resend_aioreq().

Tested-and-reviewed-by: Liu Yuan namei.u...@gmail.com
Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 64 
 1 file changed, 32 insertions(+), 32 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index c8739ae..800ebf4 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -1298,6 +1298,29 @@ out:
 return ret;
 }
 
+/* Return true if the specified request is linked to the pending list. */
+static bool check_simultaneous_create(BDRVSheepdogState *s, AIOReq *aio_req)
+{
+    AIOReq *areq;
+    QLIST_FOREACH(areq, &s->inflight_aio_head, aio_siblings) {
+        if (areq != aio_req && areq->oid == aio_req->oid) {
+            /*
+             * Sheepdog cannot handle simultaneous create requests to the same
+             * object, so we cannot send the request until the previous request
+             * finishes.
+             */
+            dprintf("simultaneous create to %" PRIx64 "\n", aio_req->oid);
+            aio_req->flags = 0;
+            aio_req->base_oid = 0;
+            QLIST_REMOVE(aio_req, aio_siblings);
+            QLIST_INSERT_HEAD(&s->pending_aio_head, aio_req, aio_siblings);
+            return true;
+        }
+    }
+
+    return false;
+}
+
 static void coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
 {
     SheepdogAIOCB *acb = aio_req->aiocb;
@@ -1306,29 +1329,19 @@ static void coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
     /* check whether this request becomes a CoW one */
     if (acb->aiocb_type == AIOCB_WRITE_UDATA && is_data_obj(aio_req->oid)) {
         int idx = data_oid_to_idx(aio_req->oid);
-        AIOReq *areq;
 
-        if (s->inode.data_vdi_id[idx] == 0) {
-            create = true;
-            goto out;
-        }
         if (is_data_obj_writable(&s->inode, idx)) {
             goto out;
         }
 
-        /* link to the pending list if there is another CoW request to
-         * the same object */
-        QLIST_FOREACH(areq, &s->inflight_aio_head, aio_siblings) {
-            if (areq != aio_req && areq->oid == aio_req->oid) {
-                dprintf("simultaneous CoW to %" PRIx64 "\n", aio_req->oid);
-                QLIST_REMOVE(aio_req, aio_siblings);
-                QLIST_INSERT_HEAD(&s->pending_aio_head, aio_req, aio_siblings);
-                return;
-            }
+        if (check_simultaneous_create(s, aio_req)) {
+            return;
         }
 
-        aio_req->base_oid = vid_to_data_oid(s->inode.data_vdi_id[idx], idx);
-        aio_req->flags |= SD_FLAG_CMD_COW;
+        if (s->inode.data_vdi_id[idx]) {
+            aio_req->base_oid = vid_to_data_oid(s->inode.data_vdi_id[idx], idx);
+            aio_req->flags |= SD_FLAG_CMD_COW;
+        }
         create = true;
     }
 out:
@@ -1942,27 +1955,14 @@ static int coroutine_fn sd_co_rw_vector(void *p)
         }
 
         aio_req = alloc_aio_req(s, acb, oid, len, offset, flags, old_oid, done);
+        QLIST_INSERT_HEAD(&s->inflight_aio_head, aio_req, aio_siblings);
 
         if (create) {
-            AIOReq *areq;
-            QLIST_FOREACH(areq, &s->inflight_aio_head, aio_siblings) {
-                if (areq->oid == oid) {
-                    /*
-                     * Sheepdog cannot handle simultaneous create
-                     * requests to the same object.  So we cannot send
-                     * the request until the previous request
-                     * finishes.
-                     */
-                    aio_req->flags = 0;
-                    aio_req->base_oid = 0;
-                    QLIST_INSERT_HEAD(&s->pending_aio_head, aio_req,
-                                      aio_siblings);
-                    goto done;
-                }
+            if (check_simultaneous_create(s, aio_req)) {
+                goto done;
             }
         }
 
-        QLIST_INSERT_HEAD(&s->inflight_aio_head, aio_req, aio_siblings);
         add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov, create,
                         acb->aiocb_type);
 done:
-- 
1.8.1.3.566.gaa39828




[Qemu-devel] [PATCH v4 09/10] sheepdog: cancel aio requests if possible

2013-07-26 Thread MORITA Kazutaka
This patch tries to cancel aio requests in the pending and failed
queues.  When the sheepdog driver cannot cancel the requests, it waits
for them to be completed.

Tested-and-reviewed-by: Liu Yuan namei.u...@gmail.com
Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 70 +++-
 1 file changed, 59 insertions(+), 11 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 17c7941..c8739ae 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -294,7 +294,8 @@ struct SheepdogAIOCB {
     Coroutine *coroutine;
     void (*aio_done_func)(SheepdogAIOCB *);
 
-    bool canceled;
+    bool cancelable;
+    bool *finished;
     int nr_pending;
 };
 
@@ -411,6 +412,7 @@ static inline void free_aio_req(BDRVSheepdogState *s, AIOReq *aio_req)
 {
     SheepdogAIOCB *acb = aio_req->aiocb;
 
+    acb->cancelable = false;
     QLIST_REMOVE(aio_req, aio_siblings);
     g_free(aio_req);
 
@@ -419,23 +421,68 @@ static inline void free_aio_req(BDRVSheepdogState *s, AIOReq *aio_req)
 
 static void coroutine_fn sd_finish_aiocb(SheepdogAIOCB *acb)
 {
-    if (!acb->canceled) {
-        qemu_coroutine_enter(acb->coroutine, NULL);
+    qemu_coroutine_enter(acb->coroutine, NULL);
+    if (acb->finished) {
+        *acb->finished = true;
     }
     qemu_aio_release(acb);
 }
 
+/*
+ * Check whether the specified acb can be canceled
+ *
+ * We can cancel aio when any request belonging to the acb is:
+ *  - Not processed by the sheepdog server.
+ *  - Not linked to the inflight queue.
+ */
+static bool sd_acb_cancelable(const SheepdogAIOCB *acb)
+{
+    BDRVSheepdogState *s = acb->common.bs->opaque;
+    AIOReq *aioreq;
+
+    if (!acb->cancelable) {
+        return false;
+    }
+
+    QLIST_FOREACH(aioreq, &s->inflight_aio_head, aio_siblings) {
+        if (aioreq->aiocb == acb) {
+            return false;
+        }
+    }
+
+    return true;
+}
+
 static void sd_aio_cancel(BlockDriverAIOCB *blockacb)
 {
     SheepdogAIOCB *acb = (SheepdogAIOCB *)blockacb;
+    BDRVSheepdogState *s = acb->common.bs->opaque;
+    AIOReq *aioreq, *next;
+    bool finished = false;
+
+    acb->finished = &finished;
+    while (!finished) {
+        if (sd_acb_cancelable(acb)) {
+            /* Remove outstanding requests from pending and failed queues.  */
+            QLIST_FOREACH_SAFE(aioreq, &s->pending_aio_head, aio_siblings,
+                               next) {
+                if (aioreq->aiocb == acb) {
+                    free_aio_req(s, aioreq);
+                }
+            }
+            QLIST_FOREACH_SAFE(aioreq, &s->failed_aio_head, aio_siblings,
+                               next) {
+                if (aioreq->aiocb == acb) {
+                    free_aio_req(s, aioreq);
+                }
+            }
 
-    /*
-     * Sheepdog cannot cancel the requests which are already sent to
-     * the servers, so we just complete the request with -EIO here.
-     */
-    acb->ret = -EIO;
-    qemu_coroutine_enter(acb->coroutine, NULL);
-    acb->canceled = true;
+            assert(acb->nr_pending == 0);
+            sd_finish_aiocb(acb);
+            return;
+        }
+        qemu_aio_wait();
+    }
 }
 
 static const AIOCBInfo sd_aiocb_info = {
@@ -456,7 +503,8 @@ static SheepdogAIOCB *sd_aio_setup(BlockDriverState *bs, QEMUIOVector *qiov,
     acb->nb_sectors = nb_sectors;
 
     acb->aio_done_func = NULL;
-    acb->canceled = false;
+    acb->cancelable = true;
+    acb->finished = NULL;
     acb->coroutine = qemu_coroutine_self();
     acb->ret = 0;
     acb->nr_pending = 0;
-- 
1.8.1.3.566.gaa39828




[Qemu-devel] [PATCH v4 08/10] sheepdog: make add_aio_request and send_aioreq void functions

2013-07-26 Thread MORITA Kazutaka
These functions no longer return errors.  We can make them void
functions and simplify the code.

Tested-and-reviewed-by: Liu Yuan namei.u...@gmail.com
Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 66 +++-
 1 file changed, 17 insertions(+), 49 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 3860611..17c7941 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -611,10 +611,10 @@ static int do_req(int sockfd, SheepdogReq *hdr, void *data,
     return srco.ret;
 }
 
-static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
+static void coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
                                         struct iovec *iov, int niov, bool create,
                                         enum AIOCBState aiocb_type);
-static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req);
+static void coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req);
 static int reload_inode(BDRVSheepdogState *s, uint32_t snapid, const char *tag);
 static int get_sheep_fd(BDRVSheepdogState *s);
 static void co_write_request(void *opaque);
@@ -640,22 +640,14 @@ static void coroutine_fn send_pending_req(BDRVSheepdogState *s, uint64_t oid)
 {
     AIOReq *aio_req;
     SheepdogAIOCB *acb;
-    int ret;
 
     while ((aio_req = find_pending_req(s, oid)) != NULL) {
         acb = aio_req->aiocb;
         /* move aio_req from pending list to inflight one */
         QLIST_REMOVE(aio_req, aio_siblings);
         QLIST_INSERT_HEAD(&s->inflight_aio_head, aio_req, aio_siblings);
-        ret = add_aio_request(s, aio_req, acb->qiov->iov,
-                              acb->qiov->niov, false, acb->aiocb_type);
-        if (ret < 0) {
-            error_report("add_aio_request is failed");
-            free_aio_req(s, aio_req);
-            if (!acb->nr_pending) {
-                sd_finish_aiocb(acb);
-            }
-        }
+        add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov, false,
+                        acb->aiocb_type);
     }
 }
 
@@ -817,11 +809,8 @@ static void coroutine_fn aio_read_response(void *opaque)
         } else {
             aio_req->oid = vid_to_vdi_oid(s->inode.vdi_id);
         }
-        ret = resend_aioreq(s, aio_req);
-        if (ret == SD_RES_SUCCESS) {
-            goto out;
-        }
-        /* fall through */
+        resend_aioreq(s, aio_req);
+        goto out;
     default:
         acb->ret = -EIO;
         error_report("%s", sd_strerror(rsp.result));
@@ -1079,7 +1068,7 @@ out:
     return ret;
 }
 
-static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
+static void coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
                                         struct iovec *iov, int niov, bool create,
                                         enum AIOCBState aiocb_type)
 {
@@ -1159,8 +1148,6 @@ out:
                             aio_flush_request, s);
     s->co_send = NULL;
     qemu_co_mutex_unlock(&s->lock);
-
-    return 0;
 }
 
 static int read_write_object(int fd, char *buf, uint64_t oid, int copies,
@@ -1263,7 +1250,7 @@ out:
     return ret;
 }
 
-static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
+static void coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
 {
     SheepdogAIOCB *acb = aio_req->aiocb;
     bool create = false;
@@ -1288,7 +1275,7 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
             dprintf("simultaneous CoW to %" PRIx64 "\n", aio_req->oid);
             QLIST_REMOVE(aio_req, aio_siblings);
             QLIST_INSERT_HEAD(&s->pending_aio_head, aio_req, aio_siblings);
-            return SD_RES_SUCCESS;
+            return;
         }
     }
 
@@ -1298,13 +1285,13 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
     }
 out:
     if (is_data_obj(aio_req->oid)) {
-        return add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov,
-                               create, acb->aiocb_type);
+        add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov, create,
+                        acb->aiocb_type);
     } else {
         struct iovec iov;
         iov.iov_base = &s->inode;
         iov.iov_len = sizeof(s->inode);
-        return add_aio_request(s, aio_req, &iov, 1, false, AIOCB_WRITE_UDATA);
+        add_aio_request(s, aio_req, &iov, 1, false, AIOCB_WRITE_UDATA);
     }
 }
 
@@ -1694,7 +1681,6 @@ static int sd_truncate(BlockDriverState *bs, int64_t 
offset)
  */
 static void coroutine_fn sd_write_done(SheepdogAIOCB *acb)
 {
-int ret;
 BDRVSheepdogState *s = acb->common.bs->opaque;
 struct iovec iov;
 AIOReq *aio_req;
@@ -1716,18 +1702,13 @@ static void coroutine_fn sd_write_done(SheepdogAIOCB 
*acb)
 aio_req = alloc_aio_req(s, acb, vid_to_vdi_oid(s->inode.vdi_id),
 data_len, offset, 0, 0, offset);

[Qemu-devel] [PATCH v4 03/10] sheepdog: check return values of qemu_co_recv/send correctly

2013-07-26 Thread MORITA Kazutaka
If qemu_co_recv/send doesn't return the specified length, it means
that an error happened.

Tested-and-reviewed-by: Liu Yuan namei.u...@gmail.com
Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 6a41ad9..c6e9b89 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -489,13 +489,13 @@ static coroutine_fn int send_co_req(int sockfd, 
SheepdogReq *hdr, void *data,
 int ret;
 
 ret = qemu_co_send(sockfd, hdr, sizeof(*hdr));
-if (ret < sizeof(*hdr)) {
+if (ret != sizeof(*hdr)) {
 error_report("failed to send a req, %s", strerror(errno));
 return ret;
 }
 
 ret = qemu_co_send(sockfd, data, *wlen);
-if (ret < *wlen) {
+if (ret != *wlen) {
 error_report("failed to send a req, %s", strerror(errno));
 }
 
@@ -548,7 +548,7 @@ static coroutine_fn void do_co_req(void *opaque)
 qemu_aio_set_fd_handler(sockfd, restart_co_req, NULL, have_co_req, co);
 
 ret = qemu_co_recv(sockfd, hdr, sizeof(*hdr));
-if (ret < sizeof(*hdr)) {
+if (ret != sizeof(*hdr)) {
 error_report("failed to get a rsp, %s", strerror(errno));
 ret = -errno;
 goto out;
@@ -560,7 +560,7 @@ static coroutine_fn void do_co_req(void *opaque)
 
 if (*rlen) {
 ret = qemu_co_recv(sockfd, data, *rlen);
-if (ret < *rlen) {
+if (ret != *rlen) {
 error_report("failed to get the data, %s", strerror(errno));
 ret = -errno;
 goto out;
@@ -671,7 +671,7 @@ static void coroutine_fn aio_read_response(void *opaque)
 
 /* read a header */
 ret = qemu_co_recv(fd, &rsp, sizeof(rsp));
-if (ret < 0) {
+if (ret != sizeof(rsp)) {
 error_report("failed to get the header, %s", strerror(errno));
 goto out;
 }
@@ -722,7 +722,7 @@ static void coroutine_fn aio_read_response(void *opaque)
 case AIOCB_READ_UDATA:
 ret = qemu_co_recvv(fd, acb->qiov->iov, acb->qiov->niov,
 aio_req->iov_offset, rsp.data_length);
-if (ret < 0) {
+if (ret != rsp.data_length) {
 error_report("failed to get the data, %s", strerror(errno));
 goto out;
 }
@@ -1075,7 +1075,7 @@ static int coroutine_fn add_aio_request(BDRVSheepdogState 
*s, AIOReq *aio_req,
 
 /* send a header */
 ret = qemu_co_send(s->fd, &hdr, sizeof(hdr));
-if (ret < 0) {
+if (ret != sizeof(hdr)) {
 qemu_co_mutex_unlock(&s->lock);
 error_report("failed to send a req, %s", strerror(errno));
 return -errno;
@@ -1083,7 +1083,7 @@ static int coroutine_fn add_aio_request(BDRVSheepdogState 
*s, AIOReq *aio_req,
 
 if (wlen) {
 ret = qemu_co_sendv(s->fd, iov, niov, aio_req->iov_offset, wlen);
-if (ret < 0) {
+if (ret != wlen) {
 qemu_co_mutex_unlock(&s->lock);
 error_report("failed to send a data, %s", strerror(errno));
 return -errno;
-- 
1.8.1.3.566.gaa39828
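
A minimal illustration of what the old "ret < 0" test misses (the variable
names mirror the patch; the byte values are hypothetical):

    ssize_t ret = qemu_co_recv(fd, &rsp, sizeof(rsp));  /* sizeof(rsp) == 48, say */
    /* old check: only catches -1; a short read of 10 bytes, or 0 at
     * orderly peer shutdown, slips through as success */
    if (ret < 0) {
        return -errno;
    }
    /* new check: anything other than the full header is an error */
    if (ret != sizeof(rsp)) {
        return -EIO;
    }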




[Qemu-devel] [PATCH v4 04/10] sheepdog: handle vdi objects in resend_aio_req

2013-07-26 Thread MORITA Kazutaka
The current resend_aio_req() doesn't work when the request is against
vdi objects.  This fixes the problem.

Tested-and-reviewed-by: Liu Yuan namei.u...@gmail.com
Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index c6e9b89..fae17ac 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -1209,11 +1209,15 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState 
*s, AIOReq *aio_req)
 return ret;
 }
 
-aio_req->oid = vid_to_data_oid(s->inode.vdi_id,
-   data_oid_to_idx(aio_req->oid));
+if (is_data_obj(aio_req->oid)) {
+aio_req->oid = vid_to_data_oid(s->inode.vdi_id,
+   data_oid_to_idx(aio_req->oid));
+} else {
+aio_req->oid = vid_to_vdi_oid(s->inode.vdi_id);
+}
 
 /* check whether this request becomes a CoW one */
-if (acb->aiocb_type == AIOCB_WRITE_UDATA) {
+if (acb->aiocb_type == AIOCB_WRITE_UDATA && is_data_obj(aio_req->oid)) {
 int idx = data_oid_to_idx(aio_req->oid);
 AIOReq *areq;
 
@@ -1241,8 +1245,15 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState 
*s, AIOReq *aio_req)
 create = true;
 }
 out:
-return add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov,
-   create, acb->aiocb_type);
+if (is_data_obj(aio_req->oid)) {
+return add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov,
+   create, acb->aiocb_type);
+} else {
+struct iovec iov;
+iov.iov_base = &s->inode;
+iov.iov_len = sizeof(s->inode);
+return add_aio_request(s, aio_req, &iov, 1, false, AIOCB_WRITE_UDATA);
+}
 }
 
 /* TODO Convert to fine grained options */
-- 
1.8.1.3.566.gaa39828
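
Condensed, the resend dispatch added above is (taken from the hunk; a vdi
object resends the cached inode instead of guest data):

    if (is_data_obj(aio_req->oid)) {
        return add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov,
                               create, acb->aiocb_type);
    } else {
        struct iovec iov = {
            .iov_base = &s->inode,
            .iov_len  = sizeof(s->inode),
        };
        return add_aio_request(s, aio_req, &iov, 1, false, AIOCB_WRITE_UDATA);
    }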




[Qemu-devel] [PATCH v4 00/10] sheepdog: reconnect server after connection failure

2013-07-26 Thread MORITA Kazutaka
Currently, if a sheepdog server exits, all the connecting VMs need to
be restarted.  This series implements a feature to reconnect the
server, and enables us to do online sheepdog upgrade and avoid
restarting VMs when sheepdog servers crash unexpectedly.

v4:
 - Added comment to explain why we need a failed queue.
 - Fixed a return value of sd_acb_cancelable().

v3:
 - Check return values of qemu_co_recv/send more strictly.
 - Move inflight requests to the failed list after reconnection
   completes.  This is necessary to resend I/Os while connection is
   lost.
 - Check simultaneous create in resend_aioreq().

v2:
 - Dropped nonblocking connect patches.

MORITA Kazutaka (10):
  ignore SIGPIPE in qemu-img and qemu-io
  iov: handle EOF in iov_send_recv
  sheepdog: check return values of qemu_co_recv/send correctly
  sheepdog: handle vdi objects in resend_aio_req
  sheepdog: reload inode outside of resend_aioreq
  coroutine: add co_aio_sleep_ns() to allow sleep in block drivers
  sheepdog: try to reconnect to sheepdog after network error
  sheepdog: make add_aio_request and send_aioreq void functions
  sheepdog: cancel aio requests if possible
  sheepdog: check simultaneous create in resend_aioreq

 block/sheepdog.c  | 320 +-
 include/block/coroutine.h |   8 ++
 qemu-coroutine-sleep.c|  47 +++
 qemu-img.c|   4 +
 qemu-io.c |   4 +
 util/iov.c|   6 +
 6 files changed, 269 insertions(+), 120 deletions(-)

-- 
1.8.1.3.566.gaa39828




[Qemu-devel] [PATCH v4 07/10] sheepdog: try to reconnect to sheepdog after network error

2013-07-26 Thread MORITA Kazutaka
This introduces a failed request queue and links all the inflight
requests to the list after network error happens.  After QEMU
reconnects to the sheepdog server successfully, the sheepdog block
driver will retry all the requests in the failed queue.

Tested-and-reviewed-by: Liu Yuan namei.u...@gmail.com
Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 78 +---
 1 file changed, 63 insertions(+), 15 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 7b22816..3860611 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -318,8 +318,11 @@ typedef struct BDRVSheepdogState {
 Coroutine *co_recv;
 
 uint32_t aioreq_seq_num;
+
+/* Every aio request must be linked to either of these queues. */
 QLIST_HEAD(inflight_aio_head, AIOReq) inflight_aio_head;
 QLIST_HEAD(pending_aio_head, AIOReq) pending_aio_head;
+QLIST_HEAD(failed_aio_head, AIOReq) failed_aio_head;
 } BDRVSheepdogState;
 
 static const char * sd_strerror(int err)
@@ -613,6 +616,8 @@ static int coroutine_fn add_aio_request(BDRVSheepdogState 
*s, AIOReq *aio_req,
enum AIOCBState aiocb_type);
 static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req);
 static int reload_inode(BDRVSheepdogState *s, uint32_t snapid, const char 
*tag);
+static int get_sheep_fd(BDRVSheepdogState *s);
+static void co_write_request(void *opaque);
 
 static AIOReq *find_pending_req(BDRVSheepdogState *s, uint64_t oid)
 {
@@ -654,6 +659,50 @@ static void coroutine_fn 
send_pending_req(BDRVSheepdogState *s, uint64_t oid)
 }
 }
 
+static coroutine_fn void reconnect_to_sdog(void *opaque)
+{
+BDRVSheepdogState *s = opaque;
+AIOReq *aio_req, *next;
+
+qemu_aio_set_fd_handler(s->fd, NULL, NULL, NULL, NULL);
+close(s->fd);
+s->fd = -1;
+
+/* Wait for outstanding write requests to be completed. */
+while (s->co_send != NULL) {
+co_write_request(opaque);
+}
+
+/* Try to reconnect the sheepdog server every one second. */
+while (s->fd < 0) {
+s->fd = get_sheep_fd(s);
+if (s->fd < 0) {
+dprintf("Wait for connection to be established\n");
+co_aio_sleep_ns(1000000000ULL);
+}
+};
+
+/*
+ * Now we have to resend all the requests in the inflight queue.  However,
+ * resend_aioreq() can yield and newly created requests can be added to the
+ * inflight queue before the coroutine is resumed.  To avoid mixing them, we
+ * have to move all the inflight requests to the failed queue before
+ * resend_aioreq() is called.
+ */
+QLIST_FOREACH_SAFE(aio_req, &s->inflight_aio_head, aio_siblings, next) {
+QLIST_REMOVE(aio_req, aio_siblings);
+QLIST_INSERT_HEAD(&s->failed_aio_head, aio_req, aio_siblings);
+}
+
+/* Resend all the failed aio requests. */
+while (!QLIST_EMPTY(&s->failed_aio_head)) {
+aio_req = QLIST_FIRST(&s->failed_aio_head);
+QLIST_REMOVE(aio_req, aio_siblings);
+QLIST_INSERT_HEAD(&s->inflight_aio_head, aio_req, aio_siblings);
+resend_aioreq(s, aio_req);
+}
+}
+
 /*
  * Receive responses of the I/O requests.
  *
@@ -670,15 +719,11 @@ static void coroutine_fn aio_read_response(void *opaque)
 SheepdogAIOCB *acb;
 uint64_t idx;
 
-if (QLIST_EMPTY(&s->inflight_aio_head)) {
-goto out;
-}
-
 /* read a header */
 ret = qemu_co_recv(fd, &rsp, sizeof(rsp));
 if (ret != sizeof(rsp)) {
 error_report("failed to get the header, %s", strerror(errno));
-goto out;
+goto err;
 }
 
 /* find the right aio_req from the inflight aio list */
@@ -689,7 +734,7 @@ static void coroutine_fn aio_read_response(void *opaque)
 }
 if (!aio_req) {
 error_report("cannot find aio_req %x", rsp.id);
-goto out;
+goto err;
 }
 
 acb = aio_req->aiocb;
@@ -729,7 +774,7 @@ static void coroutine_fn aio_read_response(void *opaque)
 aio_req->iov_offset, rsp.data_length);
 if (ret != rsp.data_length) {
 error_report("failed to get the data, %s", strerror(errno));
-goto out;
+goto err;
 }
 break;
 case AIOCB_FLUSH_CACHE:
@@ -763,10 +808,9 @@ static void coroutine_fn aio_read_response(void *opaque)
 if (s->inode.vdi_id == oid_to_vid(aio_req->oid)) {
 ret = reload_inode(s, 0, "");
 if (ret  0) {
-goto out;
+goto err;
 }
 }
-
 if (is_data_obj(aio_req->oid)) {
 aio_req->oid = vid_to_data_oid(s->inode.vdi_id,
data_oid_to_idx(aio_req->oid));
@@ -794,6 +838,10 @@ static void coroutine_fn aio_read_response(void *opaque)
 }
 out:
 s->co_recv = NULL;
 return;
 err:
 s->co_recv = NULL;
+reconnect_to_sdog(opaque);
 }
 
static void co_read_response(void *opaque)

[Qemu-devel] [PATCH v3 02/10] iov: handle EOF in iov_send_recv

2013-07-25 Thread MORITA Kazutaka
Without this patch, iov_send_recv() never returns when do_send_recv()
returns zero.

Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 util/iov.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/util/iov.c b/util/iov.c
index cc6e837..f705586 100644
--- a/util/iov.c
+++ b/util/iov.c
@@ -202,6 +202,12 @@ ssize_t iov_send_recv(int sockfd, struct iovec *iov, 
unsigned iov_cnt,
 return -1;
 }
 
+if (ret == 0 && !do_send) {
+/* recv returns 0 when the peer has performed an orderly
+ * shutdown. */
+break;
+}
+
 /* Prepare for the next iteration */
 offset += ret;
 total += ret;
-- 
1.8.1.3.566.gaa39828
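
A condensed model of the receive loop being fixed (simplified from
iov_send_recv(); the helper and variable names follow util/iov.c):

    for (;;) {
        ssize_t ret = do_send_recv(sockfd, iov, iov_cnt, do_send);
        if (ret < 0) {
            return -1;                  /* real error */
        }
        if (ret == 0 && !do_send) {
            break;                      /* orderly shutdown: short count */
        }
        /* without the break above, ret == 0 advances neither offset nor
         * total, so the loop would never terminate at EOF */
        offset += ret;
        total  += ret;
        if (total == bytes) {
            break;
        }
    }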




[Qemu-devel] [PATCH v3 10/10] sheepdog: check simultaneous create in resend_aioreq

2013-07-25 Thread MORITA Kazutaka
After reconnection happens, all the inflight requests are moved to the
failed request list.  As a result, sd_co_rw_vector() can send another
create request before resend_aioreq() resends a create request from
the failed list.

This patch adds a helper function check_simultaneous_create() and
checks simultaneous create requests more strictly in resend_aioreq().

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 64 
 1 file changed, 32 insertions(+), 32 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 7bf882a..46821df 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -1292,6 +1292,29 @@ out:
 return ret;
 }
 
+/* Return true if the specified request is linked to the pending list. */
+static bool check_simultaneous_create(BDRVSheepdogState *s, AIOReq *aio_req)
+{
+AIOReq *areq;
+QLIST_FOREACH(areq, &s->inflight_aio_head, aio_siblings) {
+if (areq != aio_req && areq->oid == aio_req->oid) {
+/*
+ * Sheepdog cannot handle simultaneous create requests to the same
+ * object, so we cannot send the request until the previous request
+ * finishes.
+ */
+dprintf("simultaneous create to %" PRIx64 "\n", aio_req->oid);
+aio_req->flags = 0;
+aio_req->base_oid = 0;
+QLIST_REMOVE(aio_req, aio_siblings);
+QLIST_INSERT_HEAD(&s->pending_aio_head, aio_req, aio_siblings);
+return true;
+}
+}
+
+return false;
+}
+
 static void coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
 {
 SheepdogAIOCB *acb = aio_req-aiocb;
@@ -1300,29 +1323,19 @@ static void coroutine_fn 
resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
 /* check whether this request becomes a CoW one */
 if (acb->aiocb_type == AIOCB_WRITE_UDATA && is_data_obj(aio_req->oid)) {
 int idx = data_oid_to_idx(aio_req->oid);
-AIOReq *areq;
 
-if (s->inode.data_vdi_id[idx] == 0) {
-create = true;
-goto out;
-}
 if (is_data_obj_writable(&s->inode, idx)) {
 goto out;
 }
 
-/* link to the pending list if there is another CoW request to
- * the same object */
-QLIST_FOREACH(areq, &s->inflight_aio_head, aio_siblings) {
-if (areq != aio_req && areq->oid == aio_req->oid) {
-dprintf("simultaneous CoW to %" PRIx64 "\n", aio_req->oid);
-QLIST_REMOVE(aio_req, aio_siblings);
-QLIST_INSERT_HEAD(&s->pending_aio_head, aio_req, aio_siblings);
-return;
-}
+if (check_simultaneous_create(s, aio_req)) {
+return;
 }
 
-aio_req->base_oid = vid_to_data_oid(s->inode.data_vdi_id[idx], idx);
-aio_req->flags |= SD_FLAG_CMD_COW;
+if (s->inode.data_vdi_id[idx]) {
+aio_req->base_oid = vid_to_data_oid(s->inode.data_vdi_id[idx], 
idx);
+aio_req->flags |= SD_FLAG_CMD_COW;
+}
 create = true;
 }
 out:
@@ -1936,27 +1949,14 @@ static int coroutine_fn sd_co_rw_vector(void *p)
 }
 
 aio_req = alloc_aio_req(s, acb, oid, len, offset, flags, old_oid, 
done);
+QLIST_INSERT_HEAD(&s->inflight_aio_head, aio_req, aio_siblings);
 
 if (create) {
-AIOReq *areq;
-QLIST_FOREACH(areq, &s->inflight_aio_head, aio_siblings) {
-if (areq->oid == oid) {
-/*
- * Sheepdog cannot handle simultaneous create
- * requests to the same object.  So we cannot send
- * the request until the previous request
- * finishes.
- */
-aio_req->flags = 0;
-aio_req->base_oid = 0;
-QLIST_INSERT_HEAD(&s->pending_aio_head, aio_req,
-  aio_siblings);
-goto done;
-}
+if (check_simultaneous_create(s, aio_req)) {
+goto done;
 }
 }
 
-QLIST_INSERT_HEAD(&s->inflight_aio_head, aio_req, aio_siblings);
 add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov, create,
 acb->aiocb_type);
 done:
-- 
1.8.1.3.566.gaa39828
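
The counterpart to parking a request is already in place: when a request
completes, send_pending_req() moves any requests parked for that object
back to the inflight queue and reissues them, so a parked create is only
sent after the conflicting one finishes (condensed from the driver):

    while ((aio_req = find_pending_req(s, oid)) != NULL) {
        acb = aio_req->aiocb;
        /* move aio_req from the pending list to the inflight one */
        QLIST_REMOVE(aio_req, aio_siblings);
        QLIST_INSERT_HEAD(&s->inflight_aio_head, aio_req, aio_siblings);
        add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov, false,
                        acb->aiocb_type);
    }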




[Qemu-devel] [PATCH v3 01/10] ignore SIGPIPE in qemu-img and qemu-io

2013-07-25 Thread MORITA Kazutaka
This prevents the tools from being stopped when they write data to a
closed connection in the other side.

Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 qemu-img.c | 4 
 qemu-io.c  | 4 
 2 files changed, 8 insertions(+)

diff --git a/qemu-img.c b/qemu-img.c
index c55ca5c..919d464 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2319,6 +2319,10 @@ int main(int argc, char **argv)
 const img_cmd_t *cmd;
 const char *cmdname;
 
+#ifdef CONFIG_POSIX
+signal(SIGPIPE, SIG_IGN);
+#endif
+
 error_set_progname(argv[0]);
 
 qemu_init_main_loop();
diff --git a/qemu-io.c b/qemu-io.c
index cb9def5..d54dc86 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -335,6 +335,10 @@ int main(int argc, char **argv)
 int opt_index = 0;
 int flags = BDRV_O_UNMAP;
 
+#ifdef CONFIG_POSIX
+signal(SIGPIPE, SIG_IGN);
+#endif
+
 progname = basename(argv[0]);
 
 while ((c = getopt_long(argc, argv, sopt, lopt, opt_index)) != -1) {
-- 
1.8.1.3.566.gaa39828
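
For context, a minimal sketch of the behavior change (standard POSIX
semantics, not code from the patch): with SIGPIPE ignored, writing to a
peer-closed socket fails with EPIPE instead of terminating the process.

    #include <errno.h>
    #include <signal.h>
    #include <unistd.h>

    static ssize_t write_checked(int fd, const void *buf, size_t len)
    {
        /* in qemu-img/qemu-io this is done once in main(); inline here
         * only for brevity */
        signal(SIGPIPE, SIG_IGN);

        ssize_t n = write(fd, buf, len);
        if (n < 0 && errno == EPIPE) {
            /* peer closed: now an ordinary I/O error, not a fatal signal */
        }
        return n;
    }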




[Qemu-devel] [PATCH v3 06/10] coroutine: add co_aio_sleep_ns() to allow sleep in block drivers

2013-07-25 Thread MORITA Kazutaka
This helper function behaves similarly to co_sleep_ns(), but the
sleeping coroutine will be resumed when using qemu_aio_wait().

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 include/block/coroutine.h |  8 
 qemu-coroutine-sleep.c| 47 +++
 2 files changed, 55 insertions(+)

diff --git a/include/block/coroutine.h b/include/block/coroutine.h
index 377805a..23ea6e9 100644
--- a/include/block/coroutine.h
+++ b/include/block/coroutine.h
@@ -210,6 +210,14 @@ void qemu_co_rwlock_unlock(CoRwlock *lock);
 void coroutine_fn co_sleep_ns(QEMUClock *clock, int64_t ns);
 
 /**
+ * Yield the coroutine for a given duration
+ *
+ * Behaves similarly to co_sleep_ns(), but the sleeping coroutine will be
+ * resumed when using qemu_aio_wait().
+ */
+void coroutine_fn co_aio_sleep_ns(int64_t ns);
+
+/**
  * Yield until a file descriptor becomes readable
  *
  * Note that this function clobbers the handlers for the file descriptor.
diff --git a/qemu-coroutine-sleep.c b/qemu-coroutine-sleep.c
index 169ce5c..3955347 100644
--- a/qemu-coroutine-sleep.c
+++ b/qemu-coroutine-sleep.c
@@ -13,6 +13,7 @@
 
 #include "block/coroutine.h"
 #include "qemu/timer.h"
+#include "qemu/thread.h"
 
 typedef struct CoSleepCB {
 QEMUTimer *ts;
@@ -37,3 +38,49 @@ void coroutine_fn co_sleep_ns(QEMUClock *clock, int64_t ns)
 qemu_del_timer(sleep_cb.ts);
 qemu_free_timer(sleep_cb.ts);
 }
+
+typedef struct CoAioSleepCB {
+QEMUBH *bh;
+int64_t ns;
+Coroutine *co;
+} CoAioSleepCB;
+
+static void co_aio_sleep_cb(void *opaque)
+{
+CoAioSleepCB *aio_sleep_cb = opaque;
+
+qemu_coroutine_enter(aio_sleep_cb->co, NULL);
+}
+
+static void *sleep_thread(void *opaque)
+{
+CoAioSleepCB *aio_sleep_cb = opaque;
+struct timespec req = {
+.tv_sec = aio_sleep_cb->ns / 1000000000,
+.tv_nsec = aio_sleep_cb->ns % 1000000000,
+};
+struct timespec rem;
+
+while (nanosleep(&req, &rem) < 0 && errno == EINTR) {
+req = rem;
+}
+
+qemu_bh_schedule(aio_sleep_cb->bh);
+
+return NULL;
+}
+
+void coroutine_fn co_aio_sleep_ns(int64_t ns)
+{
+CoAioSleepCB aio_sleep_cb = {
+.ns = ns,
+.co = qemu_coroutine_self(),
+};
+QemuThread thread;
+
+aio_sleep_cb.bh = qemu_bh_new(co_aio_sleep_cb, &aio_sleep_cb);
+qemu_thread_create(&thread, sleep_thread, &aio_sleep_cb,
+   QEMU_THREAD_DETACHED);
+qemu_coroutine_yield();
+qemu_bh_delete(aio_sleep_cb.bh);
+}
-- 
1.8.1.3.566.gaa39828
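
Typical use, as in the reconnect patch later in this series: sleep one
second between retries without stalling callers of qemu_aio_wait().

    while (s->fd < 0) {
        s->fd = get_sheep_fd(s);
        if (s->fd < 0) {
            co_aio_sleep_ns(1000000000ULL);   /* 1s, in nanoseconds */
        }
    }

The sleep runs in a detached helper thread and wakes the coroutine via a
bottom half, so the coroutine is re-entered from the main loop rather than
from the sleeper thread.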




[Qemu-devel] [PATCH v3 08/10] sheepdog: make add_aio_request and send_aioreq void functions

2013-07-25 Thread MORITA Kazutaka
These functions no longer return errors.  We can make them void
functions and simplify the codes.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 66 +++-
 1 file changed, 17 insertions(+), 49 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 43a6feb..9f3fa89 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -611,10 +611,10 @@ static int do_req(int sockfd, SheepdogReq *hdr, void 
*data,
 return srco.ret;
 }
 
-static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
+static void coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
struct iovec *iov, int niov, bool create,
enum AIOCBState aiocb_type);
-static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req);
+static void coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req);
 static int reload_inode(BDRVSheepdogState *s, uint32_t snapid, const char 
*tag);
 static int get_sheep_fd(BDRVSheepdogState *s);
 static void co_write_request(void *opaque);
@@ -640,22 +640,14 @@ static void coroutine_fn 
send_pending_req(BDRVSheepdogState *s, uint64_t oid)
 {
 AIOReq *aio_req;
 SheepdogAIOCB *acb;
-int ret;
 
 while ((aio_req = find_pending_req(s, oid)) != NULL) {
 acb = aio_req->aiocb;
 /* move aio_req from pending list to inflight one */
 QLIST_REMOVE(aio_req, aio_siblings);
 QLIST_INSERT_HEAD(&s->inflight_aio_head, aio_req, aio_siblings);
-ret = add_aio_request(s, aio_req, acb->qiov->iov,
-  acb->qiov->niov, false, acb->aiocb_type);
-if (ret < 0) {
-error_report("add_aio_request is failed");
-free_aio_req(s, aio_req);
-if (!acb->nr_pending) {
-sd_finish_aiocb(acb);
-}
-}
+add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov, false,
+acb->aiocb_type);
 }
 }
 
@@ -811,11 +803,8 @@ static void coroutine_fn aio_read_response(void *opaque)
 } else {
 aio_req->oid = vid_to_vdi_oid(s->inode.vdi_id);
 }
-ret = resend_aioreq(s, aio_req);
-if (ret == SD_RES_SUCCESS) {
-goto out;
-}
-/* fall through */
+resend_aioreq(s, aio_req);
+goto out;
 default:
 acb->ret = -EIO;
 error_report("%s", sd_strerror(rsp.result));
@@ -1073,7 +1062,7 @@ out:
 return ret;
 }
 
-static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
+static void coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
struct iovec *iov, int niov, bool create,
enum AIOCBState aiocb_type)
 {
@@ -1153,8 +1142,6 @@ out:
 aio_flush_request, s);
 s->co_send = NULL;
 qemu_co_mutex_unlock(&s->lock);
-
-return 0;
 }
 
 static int read_write_object(int fd, char *buf, uint64_t oid, int copies,
@@ -1257,7 +1244,7 @@ out:
 return ret;
 }
 
-static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
+static void coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
 {
 SheepdogAIOCB *acb = aio_req->aiocb;
 bool create = false;
@@ -1282,7 +1269,7 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState 
*s, AIOReq *aio_req)
 dprintf("simultaneous CoW to %" PRIx64 "\n", aio_req->oid);
 QLIST_REMOVE(aio_req, aio_siblings);
 QLIST_INSERT_HEAD(&s->pending_aio_head, aio_req, aio_siblings);
-return SD_RES_SUCCESS;
+return;
 }
 }
 
@@ -1292,13 +1279,13 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState 
*s, AIOReq *aio_req)
 }
 out:
 if (is_data_obj(aio_req->oid)) {
-return add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov,
-   create, acb->aiocb_type);
+add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov, create,
+acb->aiocb_type);
 } else {
 struct iovec iov;
 iov.iov_base = &s->inode;
 iov.iov_len = sizeof(s->inode);
-return add_aio_request(s, aio_req, &iov, 1, false, AIOCB_WRITE_UDATA);
+add_aio_request(s, aio_req, &iov, 1, false, AIOCB_WRITE_UDATA);
 }
 }
 
@@ -1688,7 +1675,6 @@ static int sd_truncate(BlockDriverState *bs, int64_t 
offset)
  */
 static void coroutine_fn sd_write_done(SheepdogAIOCB *acb)
 {
-int ret;
 BDRVSheepdogState *s = acb->common.bs->opaque;
 struct iovec iov;
 AIOReq *aio_req;
@@ -1710,18 +1696,13 @@ static void coroutine_fn sd_write_done(SheepdogAIOCB 
*acb)
 aio_req = alloc_aio_req(s, acb, vid_to_vdi_oid(s->inode.vdi_id),
 data_len, offset, 0, 0, offset);
 QLIST_INSERT_HEAD(&s->inflight_aio_head, aio_req, aio_siblings);

[Qemu-devel] [PATCH v3 04/10] sheepdog: handle vdi objects in resend_aio_req

2013-07-25 Thread MORITA Kazutaka
The current resend_aio_req() doesn't work when the request is against
vdi objects.  This fixes the problem.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index c6e9b89..fae17ac 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -1209,11 +1209,15 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState 
*s, AIOReq *aio_req)
 return ret;
 }
 
-aio_req->oid = vid_to_data_oid(s->inode.vdi_id,
-   data_oid_to_idx(aio_req->oid));
+if (is_data_obj(aio_req->oid)) {
+aio_req->oid = vid_to_data_oid(s->inode.vdi_id,
+   data_oid_to_idx(aio_req->oid));
+} else {
+aio_req->oid = vid_to_vdi_oid(s->inode.vdi_id);
+}
 
 /* check whether this request becomes a CoW one */
-if (acb->aiocb_type == AIOCB_WRITE_UDATA) {
+if (acb->aiocb_type == AIOCB_WRITE_UDATA && is_data_obj(aio_req->oid)) {
 int idx = data_oid_to_idx(aio_req-oid);
 AIOReq *areq;
 
@@ -1241,8 +1245,15 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState 
*s, AIOReq *aio_req)
 create = true;
 }
 out:
-return add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov,
-   create, acb->aiocb_type);
+if (is_data_obj(aio_req->oid)) {
+return add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov,
+   create, acb->aiocb_type);
+} else {
+struct iovec iov;
+iov.iov_base = &s->inode;
+iov.iov_len = sizeof(s->inode);
+return add_aio_request(s, aio_req, &iov, 1, false, AIOCB_WRITE_UDATA);
+}
 }
 
 /* TODO Convert to fine grained options */
-- 
1.8.1.3.566.gaa39828




[Qemu-devel] [PATCH v3 00/10] sheepdog: reconnect server after connection failure

2013-07-25 Thread MORITA Kazutaka
Currently, if a sheepdog server exits, all the connecting VMs need to
be restarted.  This series implements a feature to reconnect the
server, and enables us to do online sheepdog upgrade and avoid
restarting VMs when sheepdog servers crash unexpectedly.

v3:
 - Check return values of qemu_co_recv/send more strictly.
 - Move inflight requests to the failed list after reconnection
   completes.  This is necessary to resend I/Os while connection is
   lost.
 - Check simultaneous create in resend_aioreq().

v2:
 - Dropped nonblocking connect patches.

MORITA Kazutaka (10):
  ignore SIGPIPE in qemu-img and qemu-io
  iov: handle EOF in iov_send_recv
  sheepdog: check return values of qemu_co_recv/send correctly
  sheepdog: handle vdi objects in resend_aio_req
  sheepdog: reload inode outside of resend_aioreq
  coroutine: add co_aio_sleep_ns() to allow sleep in block drivers
  sheepdog: try to reconnect to sheepdog after network error
  sheepdog: make add_aio_request and send_aioreq void functions
  sheepdog: cancel aio requests if possible
  sheepdog: check simultaneous create in resend_aioreq

 block/sheepdog.c  | 314 --
 include/block/coroutine.h |   8 ++
 qemu-coroutine-sleep.c|  47 +++
 qemu-img.c|   4 +
 qemu-io.c |   4 +
 util/iov.c|   6 +
 6 files changed, 263 insertions(+), 120 deletions(-)

-- 
1.8.1.3.566.gaa39828




[Qemu-devel] [PATCH v3 03/10] sheepdog: check return values of qemu_co_recv/send correctly

2013-07-25 Thread MORITA Kazutaka
If qemu_co_recv/send doesn't return the specified length, it means
that an error happened.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 6a41ad9..c6e9b89 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -489,13 +489,13 @@ static coroutine_fn int send_co_req(int sockfd, 
SheepdogReq *hdr, void *data,
 int ret;
 
 ret = qemu_co_send(sockfd, hdr, sizeof(*hdr));
-if (ret < sizeof(*hdr)) {
+if (ret != sizeof(*hdr)) {
 error_report("failed to send a req, %s", strerror(errno));
 return ret;
 }
 
 ret = qemu_co_send(sockfd, data, *wlen);
-if (ret < *wlen) {
+if (ret != *wlen) {
 error_report("failed to send a req, %s", strerror(errno));
 }
 
@@ -548,7 +548,7 @@ static coroutine_fn void do_co_req(void *opaque)
 qemu_aio_set_fd_handler(sockfd, restart_co_req, NULL, have_co_req, co);
 
 ret = qemu_co_recv(sockfd, hdr, sizeof(*hdr));
-if (ret < sizeof(*hdr)) {
+if (ret != sizeof(*hdr)) {
 error_report("failed to get a rsp, %s", strerror(errno));
 ret = -errno;
 goto out;
@@ -560,7 +560,7 @@ static coroutine_fn void do_co_req(void *opaque)
 
 if (*rlen) {
 ret = qemu_co_recv(sockfd, data, *rlen);
-if (ret < *rlen) {
+if (ret != *rlen) {
 error_report("failed to get the data, %s", strerror(errno));
 ret = -errno;
 goto out;
@@ -671,7 +671,7 @@ static void coroutine_fn aio_read_response(void *opaque)
 
 /* read a header */
 ret = qemu_co_recv(fd, &rsp, sizeof(rsp));
-if (ret < 0) {
+if (ret != sizeof(rsp)) {
 error_report("failed to get the header, %s", strerror(errno));
 goto out;
 }
@@ -722,7 +722,7 @@ static void coroutine_fn aio_read_response(void *opaque)
 case AIOCB_READ_UDATA:
 ret = qemu_co_recvv(fd, acb->qiov->iov, acb->qiov->niov,
 aio_req->iov_offset, rsp.data_length);
-if (ret < 0) {
+if (ret != rsp.data_length) {
 error_report("failed to get the data, %s", strerror(errno));
 goto out;
 }
@@ -1075,7 +1075,7 @@ static int coroutine_fn add_aio_request(BDRVSheepdogState 
*s, AIOReq *aio_req,
 
 /* send a header */
 ret = qemu_co_send(s->fd, &hdr, sizeof(hdr));
-if (ret < 0) {
+if (ret != sizeof(hdr)) {
 qemu_co_mutex_unlock(&s->lock);
 error_report("failed to send a req, %s", strerror(errno));
 return -errno;
@@ -1083,7 +1083,7 @@ static int coroutine_fn add_aio_request(BDRVSheepdogState 
*s, AIOReq *aio_req,
 
 if (wlen) {
 ret = qemu_co_sendv(s->fd, iov, niov, aio_req->iov_offset, wlen);
-if (ret < 0) {
+if (ret != wlen) {
 qemu_co_mutex_unlock(&s->lock);
 error_report("failed to send a data, %s", strerror(errno));
 return -errno;
-- 
1.8.1.3.566.gaa39828




[Qemu-devel] [PATCH v3 05/10] sheepdog: reload inode outside of resend_aioreq

2013-07-25 Thread MORITA Kazutaka
This prepares for using resend_aioreq() after reconnecting to the
sheepdog server.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 33 +++--
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index fae17ac..7b22816 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -222,6 +222,11 @@ static inline uint64_t data_oid_to_idx(uint64_t oid)
 return oid  (MAX_DATA_OBJS - 1);
 }
 
+static inline uint32_t oid_to_vid(uint64_t oid)
+{
+return (oid & ~VDI_BIT) >> VDI_SPACE_SHIFT;
+}
+
 static inline uint64_t vid_to_vdi_oid(uint32_t vid)
 {
 return VDI_BIT | ((uint64_t)vid << VDI_SPACE_SHIFT);
@@ -607,7 +612,7 @@ static int coroutine_fn add_aio_request(BDRVSheepdogState 
*s, AIOReq *aio_req,
struct iovec *iov, int niov, bool create,
enum AIOCBState aiocb_type);
 static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req);
-
+static int reload_inode(BDRVSheepdogState *s, uint32_t snapid, const char 
*tag);
 
 static AIOReq *find_pending_req(BDRVSheepdogState *s, uint64_t oid)
 {
@@ -755,6 +760,19 @@ static void coroutine_fn aio_read_response(void *opaque)
 case SD_RES_SUCCESS:
 break;
 case SD_RES_READONLY:
+if (s->inode.vdi_id == oid_to_vid(aio_req->oid)) {
+ret = reload_inode(s, 0, "");
+if (ret < 0) {
+goto out;
+}
+}
+
+if (is_data_obj(aio_req->oid)) {
+aio_req->oid = vid_to_data_oid(s->inode.vdi_id,
+   data_oid_to_idx(aio_req->oid));
+} else {
+aio_req->oid = vid_to_vdi_oid(s->inode.vdi_id);
+}
 ret = resend_aioreq(s, aio_req);
 if (ret == SD_RES_SUCCESS) {
 goto out;
@@ -1202,19 +1220,6 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState 
*s, AIOReq *aio_req)
 {
 SheepdogAIOCB *acb = aio_req-aiocb;
 bool create = false;
-int ret;
-
-ret = reload_inode(s, 0, "");
-if (ret < 0) {
-return ret;
-}
-
-if (is_data_obj(aio_req->oid)) {
-aio_req->oid = vid_to_data_oid(s->inode.vdi_id,
-   data_oid_to_idx(aio_req->oid));
-} else {
-aio_req->oid = vid_to_vdi_oid(s->inode.vdi_id);
-}
 
 /* check whether this request becomes a CoW one */
 if (acb->aiocb_type == AIOCB_WRITE_UDATA && is_data_obj(aio_req->oid)) {
-- 
1.8.1.3.566.gaa39828




[Qemu-devel] [PATCH v3 07/10] sheepdog: try to reconnect to sheepdog after network error

2013-07-25 Thread MORITA Kazutaka
This introduces a failed request queue and links all the inflight
requests to the list after network error happens.  After QEMU
reconnects to the sheepdog server successfully, the sheepdog block
driver will retry all the requests in the failed queue.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 72 
 1 file changed, 57 insertions(+), 15 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 7b22816..43a6feb 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -318,8 +318,11 @@ typedef struct BDRVSheepdogState {
 Coroutine *co_recv;
 
 uint32_t aioreq_seq_num;
+
+/* Every aio request must be linked to either of these queues. */
 QLIST_HEAD(inflight_aio_head, AIOReq) inflight_aio_head;
 QLIST_HEAD(pending_aio_head, AIOReq) pending_aio_head;
+QLIST_HEAD(failed_aio_head, AIOReq) failed_aio_head;
 } BDRVSheepdogState;
 
 static const char * sd_strerror(int err)
@@ -613,6 +616,8 @@ static int coroutine_fn add_aio_request(BDRVSheepdogState 
*s, AIOReq *aio_req,
enum AIOCBState aiocb_type);
 static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req);
 static int reload_inode(BDRVSheepdogState *s, uint32_t snapid, const char 
*tag);
+static int get_sheep_fd(BDRVSheepdogState *s);
+static void co_write_request(void *opaque);
 
 static AIOReq *find_pending_req(BDRVSheepdogState *s, uint64_t oid)
 {
@@ -654,6 +659,44 @@ static void coroutine_fn 
send_pending_req(BDRVSheepdogState *s, uint64_t oid)
 }
 }
 
+static coroutine_fn void reconnect_to_sdog(void *opaque)
+{
+BDRVSheepdogState *s = opaque;
+AIOReq *aio_req, *next;
+
+qemu_aio_set_fd_handler(s->fd, NULL, NULL, NULL, NULL);
+close(s->fd);
+s->fd = -1;
+
+/* Wait for outstanding write requests to be completed. */
+while (s->co_send != NULL) {
+co_write_request(opaque);
+}
+
+/* Try to reconnect the sheepdog server every one second. */
+while (s->fd < 0) {
+s->fd = get_sheep_fd(s);
+if (s->fd < 0) {
+dprintf("Wait for connection to be established\n");
+co_aio_sleep_ns(1000000000ULL);
+}
+};
+
+/* Move all the inflight requests to the failed queue. */
+QLIST_FOREACH_SAFE(aio_req, &s->inflight_aio_head, aio_siblings, next) {
+QLIST_REMOVE(aio_req, aio_siblings);
+QLIST_INSERT_HEAD(&s->failed_aio_head, aio_req, aio_siblings);
+}
+
+/* Resend all the failed aio requests. */
+while (!QLIST_EMPTY(&s->failed_aio_head)) {
+aio_req = QLIST_FIRST(&s->failed_aio_head);
+QLIST_REMOVE(aio_req, aio_siblings);
+QLIST_INSERT_HEAD(&s->inflight_aio_head, aio_req, aio_siblings);
+resend_aioreq(s, aio_req);
+}
+}
+
 /*
  * Receive responses of the I/O requests.
  *
@@ -670,15 +713,11 @@ static void coroutine_fn aio_read_response(void *opaque)
 SheepdogAIOCB *acb;
 uint64_t idx;
 
-if (QLIST_EMPTY(&s->inflight_aio_head)) {
-goto out;
-}
-
 /* read a header */
 ret = qemu_co_recv(fd, &rsp, sizeof(rsp));
 if (ret != sizeof(rsp)) {
 error_report("failed to get the header, %s", strerror(errno));
-goto out;
+goto err;
 }
 
 /* find the right aio_req from the inflight aio list */
@@ -689,7 +728,7 @@ static void coroutine_fn aio_read_response(void *opaque)
 }
 if (!aio_req) {
 error_report("cannot find aio_req %x", rsp.id);
-goto out;
+goto err;
 }
 
 acb = aio_req->aiocb;
@@ -729,7 +768,7 @@ static void coroutine_fn aio_read_response(void *opaque)
 aio_req->iov_offset, rsp.data_length);
 if (ret != rsp.data_length) {
 error_report("failed to get the data, %s", strerror(errno));
-goto out;
+goto err;
 }
 break;
 case AIOCB_FLUSH_CACHE:
@@ -763,10 +802,9 @@ static void coroutine_fn aio_read_response(void *opaque)
 if (s->inode.vdi_id == oid_to_vid(aio_req->oid)) {
 ret = reload_inode(s, 0, "");
 if (ret  0) {
-goto out;
+goto err;
 }
 }
-
 if (is_data_obj(aio_req->oid)) {
 aio_req->oid = vid_to_data_oid(s->inode.vdi_id,
data_oid_to_idx(aio_req->oid));
@@ -794,6 +832,10 @@ static void coroutine_fn aio_read_response(void *opaque)
 }
 out:
 s->co_recv = NULL;
+return;
+err:
+s->co_recv = NULL;
+reconnect_to_sdog(opaque);
 }
 
 static void co_read_response(void *opaque)
@@ -819,7 +861,8 @@ static int aio_flush_request(void *opaque)
 BDRVSheepdogState *s = opaque;
 
 return !QLIST_EMPTY(&s->inflight_aio_head) ||
-!QLIST_EMPTY(&s->pending_aio_head);
+!QLIST_EMPTY(&s->pending_aio_head) ||
+!QLIST_EMPTY(&s->failed_aio_head);
 }
 
 /*
@@ -1094,23 +1137,21 @@ static int coroutine_fn 
add_aio_request

[Qemu-devel] [PATCH v3 09/10] sheepdog: cancel aio requests if possible

2013-07-25 Thread MORITA Kazutaka
This patch tries to cancel aio requests in pending queue and failed
queue.  When the sheepdog driver cannot cancel the requests, it waits
for them to be completed.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 70 +++-
 1 file changed, 59 insertions(+), 11 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 9f3fa89..7bf882a 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -294,7 +294,8 @@ struct SheepdogAIOCB {
 Coroutine *coroutine;
 void (*aio_done_func)(SheepdogAIOCB *);
 
-bool canceled;
+bool cancelable;
+bool *finished;
 int nr_pending;
 };
 
@@ -411,6 +412,7 @@ static inline void free_aio_req(BDRVSheepdogState *s, 
AIOReq *aio_req)
 {
 SheepdogAIOCB *acb = aio_req->aiocb;
 
+acb->cancelable = false;
 QLIST_REMOVE(aio_req, aio_siblings);
 g_free(aio_req);
 
@@ -419,23 +421,68 @@ static inline void free_aio_req(BDRVSheepdogState *s, 
AIOReq *aio_req)
 
 static void coroutine_fn sd_finish_aiocb(SheepdogAIOCB *acb)
 {
-if (!acb->canceled) {
-qemu_coroutine_enter(acb->coroutine, NULL);
+qemu_coroutine_enter(acb->coroutine, NULL);
+if (acb->finished) {
+*acb->finished = true;
 }
 qemu_aio_release(acb);
 }
 
+/*
+ * Check whether the specified acb can be canceled
+ *
+ * We can cancel aio when any request belonging to the acb is:
+ *  - Not processed by the sheepdog server.
+ *  - Not linked to the inflight queue.
+ */
+static bool sd_acb_cancelable(const SheepdogAIOCB *acb)
+{
+BDRVSheepdogState *s = acb->common.bs->opaque;
+AIOReq *aioreq;
+
+if (!acb->cancelable) {
+return false;
+}
+
+QLIST_FOREACH(aioreq, &s->inflight_aio_head, aio_siblings) {
+if (aioreq->aiocb == acb) {
+return false;
+}
+}
+
+return false;
+}
+
 static void sd_aio_cancel(BlockDriverAIOCB *blockacb)
 {
 SheepdogAIOCB *acb = (SheepdogAIOCB *)blockacb;
+BDRVSheepdogState *s = acb->common.bs->opaque;
+AIOReq *aioreq, *next;
+bool finished = false;
+
+acb->finished = &finished;
+while (!finished) {
+if (sd_acb_cancelable(acb)) {
+/* Remove outstanding requests from pending and failed queues.  */
+QLIST_FOREACH_SAFE(aioreq, &s->pending_aio_head, aio_siblings,
+   next) {
+if (aioreq->aiocb == acb) {
+free_aio_req(s, aioreq);
+}
+}
+QLIST_FOREACH_SAFE(aioreq, &s->failed_aio_head, aio_siblings,
+   next) {
+if (aioreq->aiocb == acb) {
+free_aio_req(s, aioreq);
+}
+}
 
-/*
- * Sheepdog cannot cancel the requests which are already sent to
- * the servers, so we just complete the request with -EIO here.
- */
-acb->ret = -EIO;
-qemu_coroutine_enter(acb->coroutine, NULL);
-acb->canceled = true;
+assert(acb->nr_pending == 0);
+sd_finish_aiocb(acb);
+return;
+}
+qemu_aio_wait();
+}
 }
 
 static const AIOCBInfo sd_aiocb_info = {
@@ -456,7 +503,8 @@ static SheepdogAIOCB *sd_aio_setup(BlockDriverState *bs, 
QEMUIOVector *qiov,
 acb-nb_sectors = nb_sectors;
 
 acb->aio_done_func = NULL;
-acb->canceled = false;
+acb->cancelable = true;
+acb->finished = NULL;
 acb->coroutine = qemu_coroutine_self();
 acb-ret = 0;
 acb-nr_pending = 0;
-- 
1.8.1.3.566.gaa39828




Re: [Qemu-devel] [sheepdog] [PATCH v3 03/10] sheepdog: check return values of qemu_co_recv/send correctly

2013-07-25 Thread MORITA Kazutaka
At Thu, 25 Jul 2013 16:46:36 +0800,
Liu Yuan wrote:
 
 On Thu, Jul 25, 2013 at 05:31:58PM +0900, MORITA Kazutaka wrote:
  If qemu_co_recv/send doesn't return the specified length, it means
  that an error happened.
  
  Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
  ---
   block/sheepdog.c | 16 
   1 file changed, 8 insertions(+), 8 deletions(-)
  
  diff --git a/block/sheepdog.c b/block/sheepdog.c
  index 6a41ad9..c6e9b89 100644
  --- a/block/sheepdog.c
  +++ b/block/sheepdog.c
  @@ -489,13 +489,13 @@ static coroutine_fn int send_co_req(int sockfd, 
  SheepdogReq *hdr, void *data,
   int ret;
   
   ret = qemu_co_send(sockfd, hdr, sizeof(*hdr));
  -if (ret < sizeof(*hdr)) {
  +if (ret != sizeof(*hdr)) {
   error_report(failed to send a req, %s, strerror(errno));
   return ret;
   }
   
   ret = qemu_co_send(sockfd, data, *wlen);
  -if (ret < *wlen) {
  +if (ret != *wlen) {
   error_report(failed to send a req, %s, strerror(errno));
   }
   
  @@ -548,7 +548,7 @@ static coroutine_fn void do_co_req(void *opaque)
   qemu_aio_set_fd_handler(sockfd, restart_co_req, NULL, have_co_req, co);
   
   ret = qemu_co_recv(sockfd, hdr, sizeof(*hdr));
  -if (ret < sizeof(*hdr)) {
  +if (ret != sizeof(*hdr)) {
   error_report(failed to get a rsp, %s, strerror(errno));
   ret = -errno;
   goto out;
  @@ -560,7 +560,7 @@ static coroutine_fn void do_co_req(void *opaque)
   
   if (*rlen) {
   ret = qemu_co_recv(sockfd, data, *rlen);
  -if (ret < *rlen) {
  +if (ret != *rlen) {
   error_report(failed to get the data, %s, strerror(errno));
   ret = -errno;
   goto out;
  @@ -671,7 +671,7 @@ static void coroutine_fn aio_read_response(void *opaque)
   
   /* read a header */
   ret = qemu_co_recv(fd, rsp, sizeof(rsp));
  -if (ret < 0) {
  +if (ret != sizeof(rsp)) {
   error_report(failed to get the header, %s, strerror(errno));
   goto out;
   }
  @@ -722,7 +722,7 @@ static void coroutine_fn aio_read_response(void *opaque)
   case AIOCB_READ_UDATA:
   ret = qemu_co_recvv(fd, acb-qiov-iov, acb-qiov-niov,
   aio_req-iov_offset, rsp.data_length);
  -if (ret < 0) {
  +if (ret != rsp.data_length) {
   error_report(failed to get the data, %s, strerror(errno));
   goto out;
   }
  @@ -1075,7 +1075,7 @@ static int coroutine_fn 
  add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
   
   /* send a header */
   ret = qemu_co_send(s-fd, hdr, sizeof(hdr));
  -if (ret < 0) {
  +if (ret != sizeof(hdr)) {
   qemu_co_mutex_unlock(s-lock);
   error_report(failed to send a req, %s, strerror(errno));
   return -errno;
  @@ -1083,7 +1083,7 @@ static int coroutine_fn 
  add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
   
   if (wlen) {
   ret = qemu_co_sendv(s-fd, iov, niov, aio_req-iov_offset, wlen);
  -if (ret < 0) {
  +if (ret != wlen) {
   qemu_co_mutex_unlock(s-lock);
   error_report(failed to send a data, %s, strerror(errno));
   return -errno;
 
 These checks are wrong because signed int will be converted to unsigned int. 
 E.g.,
 
 ret = -1;
 (ret < sizeof(hdr)) will always be false, since -1 is converted to unsigned 
 int.

Yes, that's the reason I replaced (ret < sizeof(hdr)) with (ret != sizeof(hdr)).

ret = -1;
(ret != sizeof(hdr)) will be true.
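
A standalone illustration (hypothetical program, not from the patch):

    #include <stdio.h>
    #include <sys/types.h>

    int main(void)
    {
        ssize_t ret = -1;                 /* error return from send/recv */

        if (ret < sizeof(ret)) {          /* -1 converts to SIZE_MAX */
            printf("old check fires\n");  /* never printed */
        }
        if (ret != sizeof(ret)) {         /* catches the error */
            printf("new check fires\n");
        }
        return 0;
    }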

Thanks,

Kazutaka



Re: [Qemu-devel] [sheepdog] [PATCH v3 07/10] sheepdog: try to reconnect to sheepdog after network error

2013-07-25 Thread MORITA Kazutaka
At Thu, 25 Jul 2013 17:13:46 +0800,
Liu Yuan wrote:
 
  +
  +/* Try to reconnect the sheepdog server every one second. */
  +while (s->fd < 0) {
  +s->fd = get_sheep_fd(s);
  +if (s->fd < 0) {
  +dprintf("Wait for connection to be established\n");
  +co_aio_sleep_ns(1000000000ULL);
  +}
  +};
  +
  +/* Move all the inflight requests to the failed queue. */
  +QLIST_FOREACH_SAFE(aio_req, &s->inflight_aio_head, aio_siblings, next) 
  {
  +QLIST_REMOVE(aio_req, aio_siblings);
  +QLIST_INSERT_HEAD(&s->failed_aio_head, aio_req, aio_siblings);
  +}
  +
  +/* Resend all the failed aio requests. */
  +while (!QLIST_EMPTY(&s->failed_aio_head)) {
  +aio_req = QLIST_FIRST(&s->failed_aio_head);
  +QLIST_REMOVE(aio_req, aio_siblings);
  +QLIST_INSERT_HEAD(&s->inflight_aio_head, aio_req, aio_siblings);
  +resend_aioreq(s, aio_req);
  +}
  +}
  +
 
 Is failed queue necessary? Here you just move requests from inflight queue to
 failed queue, then iterate the failed queue to send them all.
 
 Isn't it simpler we just resend the requests in the inflight queue like
 
 +QLIST_FOREACH(aio_req, &s->inflight_aio_head, aio_siblings, next) {
  +resend_aioreq(s, aio_req);
  +}

resend_aioreq() can yield and a new aio request can be added to the
inflight queue during this loop.  To avoid mixing new requests and
failed ones, I think the failed queue is necessary.

Thanks,

Kazutaka



Re: [Qemu-devel] [sheepdog] [PATCH v3 09/10] sheepdog: cancel aio requests if possible

2013-07-25 Thread MORITA Kazutaka
At Thu, 25 Jul 2013 17:04:53 +0800,
Liu Yuan wrote:
 
   
  +/*
  + * Check whether the specified acb can be canceled
  + *
  + * We can cancel aio when any request belonging to the acb is:
  + *  - Not processed by the sheepdog server.
  + *  - Not linked to the inflight queue.
  + */
  +static bool sd_acb_cancelable(const SheepdogAIOCB *acb)
  +{
  +BDRVSheepdogState *s = acb->common.bs->opaque;
  +AIOReq *aioreq;
  +
  +if (!acb->cancelable) {
  +return false;
  +}
  +
  +QLIST_FOREACH(aioreq, &s->inflight_aio_head, aio_siblings) {
  +if (aioreq->aiocb == acb) {
  +return false;
  +}
  +}
  +
  +return false;
 
 return true; ?

Oops, thanks for the catch!
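
In v4 the tail of sd_acb_cancelable() becomes:

    QLIST_FOREACH(aioreq, &s->inflight_aio_head, aio_siblings) {
        if (aioreq->aiocb == acb) {
            return false;   /* a request is still in flight */
        }
    }
    return true;            /* v3 mistakenly returned false here */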

Thanks,

Kazutaka



Re: [Qemu-devel] [PATCH v6 13/18] block/sheepdog: drop have_co_req() and aio_flush_request()

2013-07-25 Thread MORITA Kazutaka
At Thu, 25 Jul 2013 17:18:20 +0200,
Stefan Hajnoczi wrote:
 
 .io_flush() is no longer called so drop have_co_req() and
 aio_flush_request().
 
 Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
 ---
  block/sheepdog.c | 25 +
  1 file changed, 5 insertions(+), 20 deletions(-)

Reviewed-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp



Re: [Qemu-devel] [sheepdog] [PATCH v3 07/10] sheepdog: try to reconnect to sheepdog after network error

2013-07-25 Thread MORITA Kazutaka
At Thu, 25 Jul 2013 21:20:43 +0800,
Liu Yuan wrote:
 
 On Thu, Jul 25, 2013 at 09:53:14PM +0900, MORITA Kazutaka wrote:
  At Thu, 25 Jul 2013 17:13:46 +0800,
  Liu Yuan wrote:
   
+
+/* Try to reconnect the sheepdog server every one second. */
  +while (s->fd < 0) {
  +s->fd = get_sheep_fd(s);
  +if (s->fd < 0) {
  +dprintf("Wait for connection to be established\n");
  +co_aio_sleep_ns(1000000000ULL);
  +}
  +};
  +
  +/* Move all the inflight requests to the failed queue. */
  +QLIST_FOREACH_SAFE(aio_req, &s->inflight_aio_head, aio_siblings, 
  next) {
  +QLIST_REMOVE(aio_req, aio_siblings);
  +QLIST_INSERT_HEAD(&s->failed_aio_head, aio_req, aio_siblings);
  +}
  +
  +/* Resend all the failed aio requests. */
  +while (!QLIST_EMPTY(&s->failed_aio_head)) {
  +aio_req = QLIST_FIRST(&s->failed_aio_head);
  +QLIST_REMOVE(aio_req, aio_siblings);
  +QLIST_INSERT_HEAD(&s->inflight_aio_head, aio_req, 
  aio_siblings);
  +resend_aioreq(s, aio_req);
  +}
+}
+
   
   Is failed queue necessary? Here you just move requests from inflight 
   queue to
   failed queue, then iterate the failed queue to send them all.
   
   Isn't it simpler we just resend the requests in the inflight queue like
   
+QLIST_FOREACH(aio_req, &s->inflight_aio_head, aio_siblings, next) {
+resend_aioreq(s, aio_req);
+}
  
  resend_aioreq() can yield and a new aio request can be added to the
  inflight queue during this loop.  To avoid mixing new requests and
  failed ones, I think the failed queue is necessary.
  
 
 Okay, makes sense. This should be included in the source file. You can add my

Okay, thanks for your review and comments.

Kazutaka

 
 Tested-and-reviewed-by: Liu Yuan namei.u...@gmail.com
 
 to sheepdog patches
 
 Thanks,
 Yuan



Re: [Qemu-devel] [sheepdog] [PATCH 00/11] sheepdog: reconnect server after connection failure

2013-07-24 Thread MORITA Kazutaka
At Tue, 23 Jul 2013 13:08:04 +0200,
Luca Lazzeroni wrote:
 
 Is this series of patches applicable to sheepdog-stable-0.6 and qemu 1.5.0? 
 I've seen they use async i/o...

This series is against upstream qemu.  I've not tried it with qemu
1.5.x, but probably it can be applied without a big change.

Thanks,

Kazutaka



Re: [Qemu-devel] [sheepdog] [PATCH 03/11] qemu-sockets: make wait_for_connect be invoked in qemu_aio_wait

2013-07-24 Thread MORITA Kazutaka
At Tue, 23 Jul 2013 13:36:08 +0200,
Paolo Bonzini wrote:
 
 Il 23/07/2013 10:30, MORITA Kazutaka ha scritto:
  This allows us to use inet_nonblocking_connect() and
  unix_nonblocking_connect() in block drivers.
  
  qemu-ga needs to link block-obj to resolve dependencies of
  qemu_aio_set_fd_handler().
  
  Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
 
 I'm not sure this is safe.  You could have e.g. migration start during
 qemu_aio_wait().

I thought that it was safe.  Qemu creates another thread for migration
and it can be started at any time, either way.  However, so as not to
hurt the existing code, it might be better to create another
nonblocking connect for qemu_aio_wait().

I think of dropping this patch from this series and will leave it for
another day.  Usually, sheepdog users prepare a local sheepdog daemon
to be connected to, and connect() is unlikely to sleep for a long
time.  Using a blocking connect wouldn't be a big problem.

Thanks,

Kazutaka



[Qemu-devel] [PATCH v2 0/9] sheepdog: reconnect server after connection failure

2013-07-24 Thread MORITA Kazutaka
Currently, if a sheepdog server exits, all the connecting VMs need to
be restarted.  This series implements a feature to reconnect the
server, and enables us to do online sheepdog upgrade and avoid
restarting VMs when sheepdog servers crash unexpectedly.

v2:
 - Dropped nonblocking connect patches

MORITA Kazutaka (9):
  ignore SIGPIPE in qemu-img and qemu-io
  iov: handle EOF in iov_send_recv
  sheepdog: check return values of qemu_co_recv/send correctly
  sheepdog: handle vdi objects in resend_aio_req
  sheepdog: reload inode outside of resend_aioreq
  coroutine: add co_aio_sleep_ns() to allow sleep in block drivers
  sheepdog: try to reconnect to sheepdog after network error
  sheepdog: make add_aio_request and send_aioreq void functions
  sheepdog: cancel aio requests if possible

 block/sheepdog.c  | 244 ++
 include/block/coroutine.h |   8 ++
 qemu-coroutine-sleep.c|  47 +
 qemu-img.c|   4 +
 qemu-io.c |   4 +
 util/iov.c|   6 ++
 6 files changed, 228 insertions(+), 85 deletions(-)

-- 
1.8.1.3.566.gaa39828




[Qemu-devel] [PATCH v2 6/9] coroutine: add co_aio_sleep_ns() to allow sleep in block drivers

2013-07-24 Thread MORITA Kazutaka
This helper function behaves similarly to co_sleep_ns(), but the
sleeping coroutine will be resumed when using qemu_aio_wait().

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 include/block/coroutine.h |  8 
 qemu-coroutine-sleep.c| 47 +++
 2 files changed, 55 insertions(+)

diff --git a/include/block/coroutine.h b/include/block/coroutine.h
index 377805a..23ea6e9 100644
--- a/include/block/coroutine.h
+++ b/include/block/coroutine.h
@@ -210,6 +210,14 @@ void qemu_co_rwlock_unlock(CoRwlock *lock);
 void coroutine_fn co_sleep_ns(QEMUClock *clock, int64_t ns);
 
 /**
+ * Yield the coroutine for a given duration
+ *
+ * Behaves similarly to co_sleep_ns(), but the sleeping coroutine will be
+ * resumed when using qemu_aio_wait().
+ */
+void coroutine_fn co_aio_sleep_ns(int64_t ns);
+
+/**
  * Yield until a file descriptor becomes readable
  *
  * Note that this function clobbers the handlers for the file descriptor.
diff --git a/qemu-coroutine-sleep.c b/qemu-coroutine-sleep.c
index 169ce5c..3955347 100644
--- a/qemu-coroutine-sleep.c
+++ b/qemu-coroutine-sleep.c
@@ -13,6 +13,7 @@
 
 #include "block/coroutine.h"
 #include "qemu/timer.h"
+#include "qemu/thread.h"
 
 typedef struct CoSleepCB {
 QEMUTimer *ts;
@@ -37,3 +38,49 @@ void coroutine_fn co_sleep_ns(QEMUClock *clock, int64_t ns)
 qemu_del_timer(sleep_cb.ts);
 qemu_free_timer(sleep_cb.ts);
 }
+
+typedef struct CoAioSleepCB {
+QEMUBH *bh;
+int64_t ns;
+Coroutine *co;
+} CoAioSleepCB;
+
+static void co_aio_sleep_cb(void *opaque)
+{
+CoAioSleepCB *aio_sleep_cb = opaque;
+
+qemu_coroutine_enter(aio_sleep_cb->co, NULL);
+}
+
+static void *sleep_thread(void *opaque)
+{
+CoAioSleepCB *aio_sleep_cb = opaque;
+struct timespec req = {
+.tv_sec = aio_sleep_cb->ns / 1000000000,
+.tv_nsec = aio_sleep_cb->ns % 1000000000,
+};
+struct timespec rem;
+
+while (nanosleep(&req, &rem) < 0 && errno == EINTR) {
+req = rem;
+}
+
+qemu_bh_schedule(aio_sleep_cb->bh);
+
+return NULL;
+}
+
+void coroutine_fn co_aio_sleep_ns(int64_t ns)
+{
+CoAioSleepCB aio_sleep_cb = {
+.ns = ns,
+.co = qemu_coroutine_self(),
+};
+QemuThread thread;
+
+aio_sleep_cb.bh = qemu_bh_new(co_aio_sleep_cb, &aio_sleep_cb);
+qemu_thread_create(&thread, sleep_thread, &aio_sleep_cb,
+   QEMU_THREAD_DETACHED);
+qemu_coroutine_yield();
+qemu_bh_delete(aio_sleep_cb.bh);
+}
-- 
1.8.1.3.566.gaa39828




[Qemu-devel] [PATCH v2 4/9] sheepdog: handle vdi objects in resend_aio_req

2013-07-24 Thread MORITA Kazutaka
The current resend_aio_req() doesn't work when the request is against
vdi objects.  This fixes the problem.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index bca5730..f25c7df 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -1209,11 +1209,15 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState 
*s, AIOReq *aio_req)
 return ret;
 }
 
-aio_req->oid = vid_to_data_oid(s->inode.vdi_id,
-   data_oid_to_idx(aio_req->oid));
+if (is_data_obj(aio_req->oid)) {
+aio_req->oid = vid_to_data_oid(s->inode.vdi_id,
+   data_oid_to_idx(aio_req->oid));
+} else {
+aio_req->oid = vid_to_vdi_oid(s->inode.vdi_id);
+}
 
 /* check whether this request becomes a CoW one */
-if (acb->aiocb_type == AIOCB_WRITE_UDATA) {
+if (acb->aiocb_type == AIOCB_WRITE_UDATA && is_data_obj(aio_req->oid)) {
 int idx = data_oid_to_idx(aio_req-oid);
 AIOReq *areq;
 
@@ -1241,8 +1245,15 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState 
*s, AIOReq *aio_req)
 create = true;
 }
 out:
-return add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov,
-   create, acb->aiocb_type);
+if (is_data_obj(aio_req->oid)) {
+return add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov,
+   create, acb->aiocb_type);
+} else {
+struct iovec iov;
+iov.iov_base = &s->inode;
+iov.iov_len = sizeof(s->inode);
+return add_aio_request(s, aio_req, &iov, 1, false, AIOCB_WRITE_UDATA);
+}
 }
 
 /* TODO Convert to fine grained options */
-- 
1.8.1.3.566.gaa39828




[Qemu-devel] [PATCH v2 3/9] sheepdog: check return values of qemu_co_recv/send correctly

2013-07-24 Thread MORITA Kazutaka
qemu_co_recv/send return shorter length on error.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 6a41ad9..bca5730 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -671,7 +671,7 @@ static void coroutine_fn aio_read_response(void *opaque)
 
 /* read a header */
 ret = qemu_co_recv(fd, &rsp, sizeof(rsp));
-if (ret < 0) {
+if (ret < sizeof(rsp)) {
 error_report("failed to get the header, %s", strerror(errno));
 goto out;
 }
@@ -722,7 +722,7 @@ static void coroutine_fn aio_read_response(void *opaque)
 case AIOCB_READ_UDATA:
 ret = qemu_co_recvv(fd, acb->qiov->iov, acb->qiov->niov,
 aio_req->iov_offset, rsp.data_length);
-if (ret < 0) {
+if (ret < rsp.data_length) {
 error_report("failed to get the data, %s", strerror(errno));
 goto out;
 }
@@ -1075,7 +1075,7 @@ static int coroutine_fn add_aio_request(BDRVSheepdogState 
*s, AIOReq *aio_req,
 
 /* send a header */
 ret = qemu_co_send(s->fd, &hdr, sizeof(hdr));
-if (ret < 0) {
+if (ret < sizeof(hdr)) {
 qemu_co_mutex_unlock(&s->lock);
 error_report("failed to send a req, %s", strerror(errno));
 return -errno;
@@ -1083,7 +1083,7 @@ static int coroutine_fn add_aio_request(BDRVSheepdogState 
*s, AIOReq *aio_req,
 
 if (wlen) {
 ret = qemu_co_sendv(s->fd, iov, niov, aio_req->iov_offset, wlen);
-if (ret < 0) {
+if (ret < wlen) {
 qemu_co_mutex_unlock(&s->lock);
 error_report("failed to send a data, %s", strerror(errno));
 return -errno;
-- 
1.8.1.3.566.gaa39828




[Qemu-devel] [PATCH v2 1/9] ignore SIGPIPE in qemu-img and qemu-io

2013-07-24 Thread MORITA Kazutaka
This prevents the tools from being stopped when they write data to a
closed connection in the other side.

Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 qemu-img.c | 4 
 qemu-io.c  | 4 
 2 files changed, 8 insertions(+)

diff --git a/qemu-img.c b/qemu-img.c
index c55ca5c..919d464 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2319,6 +2319,10 @@ int main(int argc, char **argv)
 const img_cmd_t *cmd;
 const char *cmdname;
 
+#ifdef CONFIG_POSIX
+signal(SIGPIPE, SIG_IGN);
+#endif
+
 error_set_progname(argv[0]);
 
 qemu_init_main_loop();
diff --git a/qemu-io.c b/qemu-io.c
index cb9def5..d54dc86 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -335,6 +335,10 @@ int main(int argc, char **argv)
 int opt_index = 0;
 int flags = BDRV_O_UNMAP;
 
+#ifdef CONFIG_POSIX
+signal(SIGPIPE, SIG_IGN);
+#endif
+
 progname = basename(argv[0]);
 
 while ((c = getopt_long(argc, argv, sopt, lopt, opt_index)) != -1) {
-- 
1.8.1.3.566.gaa39828




[Qemu-devel] [PATCH v2 9/9] sheepdog: cancel aio requests if possible

2013-07-24 Thread MORITA Kazutaka
This patch tries to cancel aio requests in pending queue and failed
queue.  When the sheepdog driver cannot cancel the requests, it waits
for them to be completed.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 70 +++-
 1 file changed, 59 insertions(+), 11 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 42a30f1..58e03c8 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -294,7 +294,8 @@ struct SheepdogAIOCB {
 Coroutine *coroutine;
 void (*aio_done_func)(SheepdogAIOCB *);
 
-bool canceled;
+bool cancelable;
+bool *finished;
 int nr_pending;
 };
 
@@ -411,6 +412,7 @@ static inline void free_aio_req(BDRVSheepdogState *s, AIOReq *aio_req)
 {
 SheepdogAIOCB *acb = aio_req->aiocb;
 
+acb->cancelable = false;
 QLIST_REMOVE(aio_req, aio_siblings);
 g_free(aio_req);
 
@@ -419,23 +421,68 @@ static inline void free_aio_req(BDRVSheepdogState *s, AIOReq *aio_req)
 
 static void coroutine_fn sd_finish_aiocb(SheepdogAIOCB *acb)
 {
-if (!acb->canceled) {
-qemu_coroutine_enter(acb->coroutine, NULL);
+qemu_coroutine_enter(acb->coroutine, NULL);
+if (acb->finished) {
+*acb->finished = true;
 }
 qemu_aio_release(acb);
 }
 
+/*
+ * Check whether the specified acb can be canceled
+ *
+ * We can cancel aio when any request belonging to the acb is:
+ *  - Not processed by the sheepdog server.
+ *  - Not linked to the inflight queue.
+ */
+static bool sd_acb_cancelable(const SheepdogAIOCB *acb)
+{
+BDRVSheepdogState *s = acb->common.bs->opaque;
+AIOReq *aioreq;
+
+if (!acb->cancelable) {
+return false;
+}
+
+QLIST_FOREACH(aioreq, &s->inflight_aio_head, aio_siblings) {
+if (aioreq->aiocb == acb) {
+return false;
+}
+}
+
+return true;
+}
+
 static void sd_aio_cancel(BlockDriverAIOCB *blockacb)
 {
 SheepdogAIOCB *acb = (SheepdogAIOCB *)blockacb;
+BDRVSheepdogState *s = acb->common.bs->opaque;
+AIOReq *aioreq, *next;
+bool finished = false;
+
+acb->finished = &finished;
+while (!finished) {
+if (sd_acb_cancelable(acb)) {
+/* Remove outstanding requests from pending and failed queues.  */
+QLIST_FOREACH_SAFE(aioreq, &s->pending_aio_head, aio_siblings,
+   next) {
+if (aioreq->aiocb == acb) {
+free_aio_req(s, aioreq);
+}
+}
+QLIST_FOREACH_SAFE(aioreq, &s->failed_aio_head, aio_siblings,
+   next) {
+if (aioreq->aiocb == acb) {
+free_aio_req(s, aioreq);
+}
+}
 
-/*
- * Sheepdog cannot cancel the requests which are already sent to
- * the servers, so we just complete the request with -EIO here.
- */
-acb->ret = -EIO;
-qemu_coroutine_enter(acb->coroutine, NULL);
-acb->canceled = true;
+assert(acb->nr_pending == 0);
+sd_finish_aiocb(acb);
+return;
+}
+qemu_aio_wait();
+}
 }
 
 static const AIOCBInfo sd_aiocb_info = {
@@ -456,7 +503,8 @@ static SheepdogAIOCB *sd_aio_setup(BlockDriverState *bs, QEMUIOVector *qiov,
 acb->nb_sectors = nb_sectors;
 
 acb->aio_done_func = NULL;
-acb->canceled = false;
+acb->cancelable = true;
+acb->finished = NULL;
 acb->coroutine = qemu_coroutine_self();
 acb->ret = 0;
 acb->nr_pending = 0;
-- 
1.8.1.3.566.gaa39828




[Qemu-devel] [PATCH v2 7/9] sheepdog: try to reconnect to sheepdog after network error

2013-07-24 Thread MORITA Kazutaka
This introduces a failed request queue and links all the inflight
requests to the list after network error happens.  After QEMU
reconnects to the sheepdog server successfully, the sheepdog block
driver will retry all the requests in the failed queue.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 72 
 1 file changed, 57 insertions(+), 15 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index cde887b..303354e 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -318,8 +318,11 @@ typedef struct BDRVSheepdogState {
 Coroutine *co_recv;
 
 uint32_t aioreq_seq_num;
+
+/* Every aio request must be linked to either of these queues. */
 QLIST_HEAD(inflight_aio_head, AIOReq) inflight_aio_head;
 QLIST_HEAD(pending_aio_head, AIOReq) pending_aio_head;
+QLIST_HEAD(failed_aio_head, AIOReq) failed_aio_head;
 } BDRVSheepdogState;
 
 static const char * sd_strerror(int err)
@@ -613,6 +616,8 @@ static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
enum AIOCBState aiocb_type);
 static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req);
 static int reload_inode(BDRVSheepdogState *s, uint32_t snapid, const char *tag);
+static int get_sheep_fd(BDRVSheepdogState *s);
+static void co_write_request(void *opaque);
 
 static AIOReq *find_pending_req(BDRVSheepdogState *s, uint64_t oid)
 {
@@ -654,6 +659,44 @@ static void coroutine_fn send_pending_req(BDRVSheepdogState *s, uint64_t oid)
 }
 }
 
+static coroutine_fn void reconnect_to_sdog(void *opaque)
+{
+BDRVSheepdogState *s = opaque;
+AIOReq *aio_req, *next;
+
+qemu_aio_set_fd_handler(s->fd, NULL, NULL, NULL, NULL);
+close(s->fd);
+s->fd = -1;
+
+/* Wait for outstanding write requests to be completed. */
+while (s->co_send != NULL) {
+co_write_request(opaque);
+}
+
+/* Move all the inflight requests to the failed queue. */
+QLIST_FOREACH_SAFE(aio_req, &s->inflight_aio_head, aio_siblings, next) {
+QLIST_REMOVE(aio_req, aio_siblings);
+QLIST_INSERT_HEAD(&s->failed_aio_head, aio_req, aio_siblings);
+}
+
+/* Try to reconnect the sheepdog server every one second. */
+while (s->fd < 0) {
+s->fd = get_sheep_fd(s);
+if (s->fd < 0) {
+dprintf("Wait for connection to be established\n");
+co_aio_sleep_ns(1000000000ULL);
+}
+};
+
+/* Resend all the failed aio requests. */
+while (!QLIST_EMPTY(&s->failed_aio_head)) {
+aio_req = QLIST_FIRST(&s->failed_aio_head);
+QLIST_REMOVE(aio_req, aio_siblings);
+QLIST_INSERT_HEAD(&s->inflight_aio_head, aio_req, aio_siblings);
+resend_aioreq(s, aio_req);
+}
+}
+
 /*
  * Receive responses of the I/O requests.
  *
@@ -670,15 +713,11 @@ static void coroutine_fn aio_read_response(void *opaque)
 SheepdogAIOCB *acb;
 uint64_t idx;
 
-if (QLIST_EMPTY(&s->inflight_aio_head)) {
-goto out;
-}
-
 /* read a header */
 ret = qemu_co_recv(fd, &rsp, sizeof(rsp));
 if (ret < sizeof(rsp)) {
 error_report("failed to get the header, %s", strerror(errno));
-goto out;
+goto err;
 }
 
 /* find the right aio_req from the inflight aio list */
@@ -689,7 +728,7 @@ static void coroutine_fn aio_read_response(void *opaque)
 }
 if (!aio_req) {
 error_report("cannot find aio_req %x", rsp.id);
-goto out;
+goto err;
 }
 
 acb = aio_req->aiocb;
@@ -729,7 +768,7 @@ static void coroutine_fn aio_read_response(void *opaque)
 aio_req->iov_offset, rsp.data_length);
 if (ret < rsp.data_length) {
 error_report("failed to get the data, %s", strerror(errno));
-goto out;
+goto err;
 }
 break;
 case AIOCB_FLUSH_CACHE:
@@ -763,10 +802,9 @@ static void coroutine_fn aio_read_response(void *opaque)
 if (s->inode.vdi_id == oid_to_vid(aio_req->oid)) {
 ret = reload_inode(s, 0, "");
 if (ret < 0) {
-goto out;
+goto err;
 }
 }
-
 if (is_data_obj(aio_req->oid)) {
 aio_req->oid = vid_to_data_oid(s->inode.vdi_id,
data_oid_to_idx(aio_req->oid));
@@ -794,6 +832,10 @@ static void coroutine_fn aio_read_response(void *opaque)
 }
 out:
 s->co_recv = NULL;
+return;
+err:
+s->co_recv = NULL;
+reconnect_to_sdog(opaque);
 }
 
 static void co_read_response(void *opaque)
@@ -819,7 +861,8 @@ static int aio_flush_request(void *opaque)
 BDRVSheepdogState *s = opaque;
 
 return !QLIST_EMPTY(&s->inflight_aio_head) ||
-!QLIST_EMPTY(&s->pending_aio_head);
+!QLIST_EMPTY(&s->pending_aio_head) ||
+!QLIST_EMPTY(&s->failed_aio_head);
 }
 
 /*
@@ -1094,23 +1137,21 @@ static int coroutine_fn add_aio_request

[Qemu-devel] [PATCH v2 2/9] iov: handle EOF in iov_send_recv

2013-07-24 Thread MORITA Kazutaka
Without this patch, iov_send_recv() never returns when do_send_recv()
returns zero.

Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 util/iov.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/util/iov.c b/util/iov.c
index cc6e837..f705586 100644
--- a/util/iov.c
+++ b/util/iov.c
@@ -202,6 +202,12 @@ ssize_t iov_send_recv(int sockfd, struct iovec *iov, unsigned iov_cnt,
 return -1;
 }
 
+if (ret == 0 && !do_send) {
+/* recv returns 0 when the peer has performed an orderly
+ * shutdown. */
+break;
+}
+
 /* Prepare for the next iteration */
 offset += ret;
 total += ret;
-- 
1.8.1.3.566.gaa39828




[Qemu-devel] [PATCH v2 8/9] sheepdog: make add_aio_request and send_aioreq void functions

2013-07-24 Thread MORITA Kazutaka
These functions no longer return errors.  We can make them void
functions and simplify the codes.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 66 +++-
 1 file changed, 17 insertions(+), 49 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 303354e..42a30f1 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -611,10 +611,10 @@ static int do_req(int sockfd, SheepdogReq *hdr, void *data,
 return srco.ret;
 }
 
-static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
+static void coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
struct iovec *iov, int niov, bool create,
enum AIOCBState aiocb_type);
-static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req);
+static void coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req);
 static int reload_inode(BDRVSheepdogState *s, uint32_t snapid, const char *tag);
 static int get_sheep_fd(BDRVSheepdogState *s);
 static void co_write_request(void *opaque);
@@ -640,22 +640,14 @@ static void coroutine_fn send_pending_req(BDRVSheepdogState *s, uint64_t oid)
 {
 AIOReq *aio_req;
 SheepdogAIOCB *acb;
-int ret;
 
 while ((aio_req = find_pending_req(s, oid)) != NULL) {
 acb = aio_req->aiocb;
 /* move aio_req from pending list to inflight one */
 QLIST_REMOVE(aio_req, aio_siblings);
 QLIST_INSERT_HEAD(&s->inflight_aio_head, aio_req, aio_siblings);
-ret = add_aio_request(s, aio_req, acb->qiov->iov,
-  acb->qiov->niov, false, acb->aiocb_type);
-if (ret < 0) {
-error_report("add_aio_request is failed");
-free_aio_req(s, aio_req);
-if (!acb->nr_pending) {
-sd_finish_aiocb(acb);
-}
-}
+add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov, false,
+acb->aiocb_type);
 }
 }
 
@@ -811,11 +803,8 @@ static void coroutine_fn aio_read_response(void *opaque)
 } else {
 aio_req->oid = vid_to_vdi_oid(s->inode.vdi_id);
 }
-ret = resend_aioreq(s, aio_req);
-if (ret == SD_RES_SUCCESS) {
-goto out;
-}
-/* fall through */
+resend_aioreq(s, aio_req);
+goto out;
 default:
 acb->ret = -EIO;
 error_report("%s", sd_strerror(rsp.result));
@@ -1073,7 +1062,7 @@ out:
 return ret;
 }
 
-static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
+static void coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
struct iovec *iov, int niov, bool create,
enum AIOCBState aiocb_type)
 {
@@ -1153,8 +1142,6 @@ out:
 aio_flush_request, s);
 s->co_send = NULL;
 qemu_co_mutex_unlock(&s->lock);
-
-return 0;
 }
 
 static int read_write_object(int fd, char *buf, uint64_t oid, int copies,
@@ -1257,7 +1244,7 @@ out:
 return ret;
 }
 
-static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
+static void coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
 {
 SheepdogAIOCB *acb = aio_req->aiocb;
 bool create = false;
@@ -1282,7 +1269,7 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
 dprintf("simultaneous CoW to %" PRIx64 "\n", aio_req->oid);
 QLIST_REMOVE(aio_req, aio_siblings);
 QLIST_INSERT_HEAD(&s->pending_aio_head, aio_req, aio_siblings);
-return SD_RES_SUCCESS;
+return;
 }
 }
 
@@ -1292,13 +1279,13 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
 }
 out:
 if (is_data_obj(aio_req->oid)) {
-return add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov,
-   create, acb->aiocb_type);
+add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov, create,
+acb->aiocb_type);
 } else {
 struct iovec iov;
 iov.iov_base = &s->inode;
 iov.iov_len = sizeof(s->inode);
-return add_aio_request(s, aio_req, &iov, 1, false, AIOCB_WRITE_UDATA);
+add_aio_request(s, aio_req, &iov, 1, false, AIOCB_WRITE_UDATA);
 }
 }
 
@@ -1688,7 +1675,6 @@ static int sd_truncate(BlockDriverState *bs, int64_t offset)
  */
 static void coroutine_fn sd_write_done(SheepdogAIOCB *acb)
 {
-int ret;
 BDRVSheepdogState *s = acb->common.bs->opaque;
 struct iovec iov;
 AIOReq *aio_req;
@@ -1710,18 +1696,13 @@ static void coroutine_fn sd_write_done(SheepdogAIOCB *acb)
 aio_req = alloc_aio_req(s, acb, vid_to_vdi_oid(s->inode.vdi_id),
 data_len, offset, 0, 0, offset);
 QLIST_INSERT_HEAD(&s->inflight_aio_head, aio_req, aio_siblings

[Qemu-devel] [PATCH v2 5/9] sheepdog: reload inode outside of resend_aioreq

2013-07-24 Thread MORITA Kazutaka
This prepares for using resend_aioreq() after reconnecting to the
sheepdog server.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 33 +++--
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index f25c7df..cde887b 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -222,6 +222,11 @@ static inline uint64_t data_oid_to_idx(uint64_t oid)
 return oid & (MAX_DATA_OBJS - 1);
 }
 
+static inline uint32_t oid_to_vid(uint64_t oid)
+{
+return (oid & ~VDI_BIT) >> VDI_SPACE_SHIFT;
+}
+
 static inline uint64_t vid_to_vdi_oid(uint32_t vid)
 {
 return VDI_BIT | ((uint64_t)vid << VDI_SPACE_SHIFT);
@@ -607,7 +612,7 @@ static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
struct iovec *iov, int niov, bool create,
enum AIOCBState aiocb_type);
 static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req);
-
+static int reload_inode(BDRVSheepdogState *s, uint32_t snapid, const char *tag);
 
 static AIOReq *find_pending_req(BDRVSheepdogState *s, uint64_t oid)
 {
@@ -755,6 +760,19 @@ static void coroutine_fn aio_read_response(void *opaque)
 case SD_RES_SUCCESS:
 break;
 case SD_RES_READONLY:
+if (s->inode.vdi_id == oid_to_vid(aio_req->oid)) {
+ret = reload_inode(s, 0, "");
+if (ret < 0) {
+goto out;
+}
+}
+
+if (is_data_obj(aio_req->oid)) {
+aio_req->oid = vid_to_data_oid(s->inode.vdi_id,
+   data_oid_to_idx(aio_req->oid));
+} else {
+aio_req->oid = vid_to_vdi_oid(s->inode.vdi_id);
+}
 ret = resend_aioreq(s, aio_req);
 if (ret == SD_RES_SUCCESS) {
 goto out;
@@ -1202,19 +1220,6 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
 {
 SheepdogAIOCB *acb = aio_req->aiocb;
 bool create = false;
-int ret;
-
-ret = reload_inode(s, 0, "");
-if (ret < 0) {
-return ret;
-}
-
-if (is_data_obj(aio_req->oid)) {
-aio_req->oid = vid_to_data_oid(s->inode.vdi_id,
-   data_oid_to_idx(aio_req->oid));
-} else {
-aio_req->oid = vid_to_vdi_oid(s->inode.vdi_id);
-}
 
 /* check whether this request becomes a CoW one */
 if (acb->aiocb_type == AIOCB_WRITE_UDATA && is_data_obj(aio_req->oid)) {
-- 
1.8.1.3.566.gaa39828




Re: [Qemu-devel] [sheepdog] [PATCH v2 0/9] sheepdog: reconnect server after connection failure

2013-07-24 Thread MORITA Kazutaka
At Wed, 24 Jul 2013 16:28:30 +0800,
Liu Yuan wrote:
 
 On Wed, Jul 24, 2013 at 04:56:24PM +0900, MORITA Kazutaka wrote:
  Currently, if a sheepdog server exits, all the connecting VMs need to
  be restarted.  This series implements a feature to reconnect the
  server, and enables us to do online sheepdog upgrade and avoid
  restarting VMs when sheepdog servers crash unexpectedly.
  
 
 It doesn't work in my test. I tried to start linux-0.2.img stored in a sheepdog
 cluster and then
 
 1. did some buffered writes
 2. restart sheep that this QEMU VM connected to.
 3. $ sync
 
 I got following error:
 
 $ ../qemu/x86_64-softmmu/qemu-system-x86_64 --enable-kvm -m 1024 -hda sheepdog:test
 qemu-system-x86_64: failed to get the header, Resource temporarily unavailable
 qemu-system-x86_64: Failed to connect to socket: Connection refused
 qemu-system-x86_64: Failed to connect to socket: Connection refused
 qemu-system-x86_64: Failed to connect to socket: Connection refused
 qemu-system-x86_64: Failed to connect to socket: Connection refused
 qemu-system-x86_64: Failed to connect to socket: Connection refused
 ...repeat...
 
 QEMU version is master tip

Your sheep daemon looks like unreachable from qemu.  I tried the same
procedure, but couldn't reproduce it.

Is the problem reproducible?  Can you make sure that you can connect
to the sheep daemon from collie while the error message shows up?

Thanks,

Kazutaka



Re: [Qemu-devel] [sheepdog] [PATCH v2 0/9] sheepdog: reconnect server after connection failure

2013-07-24 Thread MORITA Kazutaka
At Thu, 25 Jul 2013 13:25:33 +0800,
Liu Yuan wrote:
 
 Hello Kazutaka,
 
   I have two patches fixing the problems I found in my testing, and they are
 complementary patches. Please consider sending them on top of your patch set.

Thanks a lot for your comments and patches, but I've already prepared
patches, which would be probably better fixes.  I'll send the v3
series soon.  It'd be appreciated if you would give a review for it.

Thanks,

Kazutaka



[Qemu-devel] [PATCH 09/11] sheepdog: try to reconnect to sheepdog after network error

2013-07-23 Thread MORITA Kazutaka
This introduces a failed request queue and links all the inflight
requests to the list after network error happens.  After QEMU
reconnects to the sheepdog server successfully, the sheepdog block
driver will retry all the requests in the failed queue.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 72 
 1 file changed, 57 insertions(+), 15 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 1173605..cd72927 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -318,8 +318,11 @@ typedef struct BDRVSheepdogState {
 Coroutine *co_recv;
 
 uint32_t aioreq_seq_num;
+
+/* Every aio request must be linked to either of these queues. */
 QLIST_HEAD(inflight_aio_head, AIOReq) inflight_aio_head;
 QLIST_HEAD(pending_aio_head, AIOReq) pending_aio_head;
+QLIST_HEAD(failed_aio_head, AIOReq) failed_aio_head;
 } BDRVSheepdogState;
 
 static const char * sd_strerror(int err)
@@ -669,6 +672,8 @@ static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
enum AIOCBState aiocb_type);
 static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req);
 static int reload_inode(BDRVSheepdogState *s, uint32_t snapid, const char *tag);
+static int get_sheep_fd(BDRVSheepdogState *s);
+static void co_write_request(void *opaque);
 
 static AIOReq *find_pending_req(BDRVSheepdogState *s, uint64_t oid)
 {
@@ -710,6 +715,44 @@ static void coroutine_fn send_pending_req(BDRVSheepdogState *s, uint64_t oid)
 }
 }
 
+static coroutine_fn void reconnect_to_sdog(void *opaque)
+{
+BDRVSheepdogState *s = opaque;
+AIOReq *aio_req, *next;
+
+qemu_aio_set_fd_handler(s->fd, NULL, NULL, NULL, NULL);
+close(s->fd);
+s->fd = -1;
+
+/* Wait for outstanding write requests to be completed. */
+while (s->co_send != NULL) {
+co_write_request(opaque);
+}
+
+/* Move all the inflight requests to the failed queue. */
+QLIST_FOREACH_SAFE(aio_req, &s->inflight_aio_head, aio_siblings, next) {
+QLIST_REMOVE(aio_req, aio_siblings);
+QLIST_INSERT_HEAD(&s->failed_aio_head, aio_req, aio_siblings);
+}
+
+/* Try to reconnect the sheepdog server every one second. */
+while (s->fd < 0) {
+s->fd = get_sheep_fd(s);
+if (s->fd < 0) {
+dprintf("Wait for connection to be established\n");
+co_aio_sleep_ns(1000000000ULL);
+}
+};
+
+/* Resend all the failed aio requests. */
+while (!QLIST_EMPTY(&s->failed_aio_head)) {
+aio_req = QLIST_FIRST(&s->failed_aio_head);
+QLIST_REMOVE(aio_req, aio_siblings);
+QLIST_INSERT_HEAD(&s->inflight_aio_head, aio_req, aio_siblings);
+resend_aioreq(s, aio_req);
+}
+}
+
 /*
  * Receive responses of the I/O requests.
  *
@@ -726,15 +769,11 @@ static void coroutine_fn aio_read_response(void *opaque)
 SheepdogAIOCB *acb;
 uint64_t idx;
 
-if (QLIST_EMPTY(&s->inflight_aio_head)) {
-goto out;
-}
-
 /* read a header */
 ret = qemu_co_recv(fd, &rsp, sizeof(rsp));
 if (ret < sizeof(rsp)) {
 error_report("failed to get the header, %s", strerror(errno));
-goto out;
+goto err;
 }
 
 /* find the right aio_req from the inflight aio list */
@@ -745,7 +784,7 @@ static void coroutine_fn aio_read_response(void *opaque)
 }
 if (!aio_req) {
 error_report("cannot find aio_req %x", rsp.id);
-goto out;
+goto err;
 }
 
 acb = aio_req->aiocb;
@@ -785,7 +824,7 @@ static void coroutine_fn aio_read_response(void *opaque)
 aio_req->iov_offset, rsp.data_length);
 if (ret < rsp.data_length) {
 error_report("failed to get the data, %s", strerror(errno));
-goto out;
+goto err;
 }
 break;
 case AIOCB_FLUSH_CACHE:
@@ -819,10 +858,9 @@ static void coroutine_fn aio_read_response(void *opaque)
 if (s->inode.vdi_id == oid_to_vid(aio_req->oid)) {
 ret = reload_inode(s, 0, "");
 if (ret < 0) {
-goto out;
+goto err;
 }
 }
-
 if (is_data_obj(aio_req->oid)) {
 aio_req->oid = vid_to_data_oid(s->inode.vdi_id,
data_oid_to_idx(aio_req->oid));
@@ -850,6 +888,10 @@ static void coroutine_fn aio_read_response(void *opaque)
 }
 out:
 s->co_recv = NULL;
+return;
+err:
+s->co_recv = NULL;
+reconnect_to_sdog(opaque);
 }
 
 static void co_read_response(void *opaque)
@@ -875,7 +917,8 @@ static int aio_flush_request(void *opaque)
 BDRVSheepdogState *s = opaque;
 
 return !QLIST_EMPTY(&s->inflight_aio_head) ||
-!QLIST_EMPTY(&s->pending_aio_head);
+!QLIST_EMPTY(&s->pending_aio_head) ||
+!QLIST_EMPTY(&s->failed_aio_head);
 }
 
 /*
@@ -1150,23 +1193,21 @@ static int coroutine_fn add_aio_request

[Qemu-devel] [PATCH 03/11] qemu-sockets: make wait_for_connect be invoked in qemu_aio_wait

2013-07-23 Thread MORITA Kazutaka
This allows us to use inet_nonblocking_connect() and
unix_nonblocking_connect() in block drivers.

qemu-ga needs to link block-obj to resolve dependencies of
qemu_aio_set_fd_handler().

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 Makefile|  4 ++--
 util/qemu-sockets.c | 15 ++-
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/Makefile b/Makefile
index c06bfab..5fe2e0f 100644
--- a/Makefile
+++ b/Makefile
@@ -197,7 +197,7 @@ fsdev/virtfs-proxy-helper$(EXESUF): LIBS += -lcap
 qemu-img-cmds.h: $(SRC_PATH)/qemu-img-cmds.hx
	$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h < $< > $@,"  GEN   $@")
 
-qemu-ga$(EXESUF): LIBS = $(LIBS_QGA)
+qemu-ga$(EXESUF): LIBS = $(LIBS_QGA) $(LIBS_TOOLS)
 qemu-ga$(EXESUF): QEMU_CFLAGS += -I qga/qapi-generated
 
 gen-out-type = $(subst .,-,$(suffix $@))
@@ -227,7 +227,7 @@ $(SRC_PATH)/qapi-schema.json $(SRC_PATH)/scripts/qapi-commands.py $(qapi-py)
 QGALIB_GEN=$(addprefix qga/qapi-generated/, qga-qapi-types.h qga-qapi-visit.h qga-qmp-commands.h)
 $(qga-obj-y) qemu-ga.o: $(QGALIB_GEN)
 
-qemu-ga$(EXESUF): $(qga-obj-y) libqemuutil.a libqemustub.a
+qemu-ga$(EXESUF): $(qga-obj-y) $(block-obj-y) libqemuutil.a libqemustub.a
$(call LINK, $^)
 
 clean:
diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index 095716e..8b21fd1 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -218,6 +218,11 @@ typedef struct ConnectState {
 static int inet_connect_addr(struct addrinfo *addr, bool *in_progress,
  ConnectState *connect_state, Error **errp);
 
+static int return_true(void *opaque)
+{
+return 1;
+}
+
 static void wait_for_connect(void *opaque)
 {
 ConnectState *s = opaque;
@@ -225,7 +230,7 @@ static void wait_for_connect(void *opaque)
 socklen_t valsize = sizeof(val);
 bool in_progress;
 
-qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
+qemu_aio_set_fd_handler(s->fd, NULL, NULL, NULL, NULL);
 
 do {
 rc = qemu_getsockopt(s->fd, SOL_SOCKET, SO_ERROR, &val, &valsize);
@@ -288,8 +293,8 @@ static int inet_connect_addr(struct addrinfo *addr, bool *in_progress,
 
 if (connect_state != NULL && QEMU_SOCKET_RC_INPROGRESS(rc)) {
 connect_state->fd = sock;
-qemu_set_fd_handler2(sock, NULL, NULL, wait_for_connect,
- connect_state);
+qemu_aio_set_fd_handler(sock, NULL, wait_for_connect, return_true,
+connect_state);
 *in_progress = true;
 } else if (rc < 0) {
 error_set_errno(errp, errno, QERR_SOCKET_CONNECT_FAILED);
@@ -749,8 +754,8 @@ int unix_connect_opts(QemuOpts *opts, Error **errp,
 
 if (connect_state != NULL && QEMU_SOCKET_RC_INPROGRESS(rc)) {
 connect_state->fd = sock;
-qemu_set_fd_handler2(sock, NULL, NULL, wait_for_connect,
- connect_state);
+qemu_aio_set_fd_handler(sock, NULL, wait_for_connect, return_true,
+connect_state);
 return sock;
 } else if (rc >= 0) {
 /* non blocking socket immediate success, call callback */
-- 
1.8.1.3.566.gaa39828




[Qemu-devel] [PATCH 06/11] sheepdog: handle vdi objects in resend_aio_req

2013-07-23 Thread MORITA Kazutaka
The current resend_aio_req() doesn't work when the request is against
vdi objects.  This fixes the problem.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 567f52e..018eab2 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -1265,11 +1265,15 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
 return ret;
 }
 
-aio_req->oid = vid_to_data_oid(s->inode.vdi_id,
-   data_oid_to_idx(aio_req->oid));
+if (is_data_obj(aio_req->oid)) {
+aio_req->oid = vid_to_data_oid(s->inode.vdi_id,
+   data_oid_to_idx(aio_req->oid));
+} else {
+aio_req->oid = vid_to_vdi_oid(s->inode.vdi_id);
+}
 
 /* check whether this request becomes a CoW one */
-if (acb->aiocb_type == AIOCB_WRITE_UDATA) {
+if (acb->aiocb_type == AIOCB_WRITE_UDATA && is_data_obj(aio_req->oid)) {
 int idx = data_oid_to_idx(aio_req->oid);
 AIOReq *areq;
 
@@ -1297,8 +1301,15 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
 create = true;
 }
 out:
-return add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov,
-   create, acb->aiocb_type);
+if (is_data_obj(aio_req->oid)) {
+return add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov,
+   create, acb->aiocb_type);
+} else {
+struct iovec iov;
+iov.iov_base = &s->inode;
+iov.iov_len = sizeof(s->inode);
+return add_aio_request(s, aio_req, &iov, 1, false, AIOCB_WRITE_UDATA);
+}
 }
 
 /* TODO Convert to fine grained options */
-- 
1.8.1.3.566.gaa39828




[Qemu-devel] [PATCH 11/11] sheepdog: cancel aio requests if possible

2013-07-23 Thread MORITA Kazutaka
This patch tries to cancel aio requests in pending queue and failed
queue.  When the sheepdog driver cannot cancel the requests, it waits
for them to be completed.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 70 +++-
 1 file changed, 59 insertions(+), 11 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 8a6c432..43be479 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -294,7 +294,8 @@ struct SheepdogAIOCB {
 Coroutine *coroutine;
 void (*aio_done_func)(SheepdogAIOCB *);
 
-bool canceled;
+bool cancelable;
+bool *finished;
 int nr_pending;
 };
 
@@ -411,6 +412,7 @@ static inline void free_aio_req(BDRVSheepdogState *s, AIOReq *aio_req)
 {
 SheepdogAIOCB *acb = aio_req->aiocb;
 
+acb->cancelable = false;
 QLIST_REMOVE(aio_req, aio_siblings);
 g_free(aio_req);
 
@@ -419,23 +421,68 @@ static inline void free_aio_req(BDRVSheepdogState *s, AIOReq *aio_req)
 
 static void coroutine_fn sd_finish_aiocb(SheepdogAIOCB *acb)
 {
-if (!acb->canceled) {
-qemu_coroutine_enter(acb->coroutine, NULL);
+qemu_coroutine_enter(acb->coroutine, NULL);
+if (acb->finished) {
+*acb->finished = true;
 }
 qemu_aio_release(acb);
 }
 
+/*
+ * Check whether the specified acb can be canceled
+ *
+ * We can cancel aio when any request belonging to the acb is:
+ *  - Not processed by the sheepdog server.
+ *  - Not linked to the inflight queue.
+ */
+static bool sd_acb_cancelable(const SheepdogAIOCB *acb)
+{
+BDRVSheepdogState *s = acb->common.bs->opaque;
+AIOReq *aioreq;
+
+if (!acb->cancelable) {
+return false;
+}
+
+QLIST_FOREACH(aioreq, &s->inflight_aio_head, aio_siblings) {
+if (aioreq->aiocb == acb) {
+return false;
+}
+}
+
+return true;
+}
+
 static void sd_aio_cancel(BlockDriverAIOCB *blockacb)
 {
 SheepdogAIOCB *acb = (SheepdogAIOCB *)blockacb;
+BDRVSheepdogState *s = acb->common.bs->opaque;
+AIOReq *aioreq, *next;
+bool finished = false;
+
+acb->finished = &finished;
+while (!finished) {
+if (sd_acb_cancelable(acb)) {
+/* Remove outstanding requests from pending and failed queues.  */
+QLIST_FOREACH_SAFE(aioreq, &s->pending_aio_head, aio_siblings,
+   next) {
+if (aioreq->aiocb == acb) {
+free_aio_req(s, aioreq);
+}
+}
+QLIST_FOREACH_SAFE(aioreq, &s->failed_aio_head, aio_siblings,
+   next) {
+if (aioreq->aiocb == acb) {
+free_aio_req(s, aioreq);
+}
+}
 
-/*
- * Sheepdog cannot cancel the requests which are already sent to
- * the servers, so we just complete the request with -EIO here.
- */
-acb->ret = -EIO;
-qemu_coroutine_enter(acb->coroutine, NULL);
-acb->canceled = true;
+assert(acb->nr_pending == 0);
+sd_finish_aiocb(acb);
+return;
+}
+qemu_aio_wait();
+}
 }
 
 static const AIOCBInfo sd_aiocb_info = {
@@ -456,7 +503,8 @@ static SheepdogAIOCB *sd_aio_setup(BlockDriverState *bs, QEMUIOVector *qiov,
 acb->nb_sectors = nb_sectors;
 
 acb->aio_done_func = NULL;
-acb->canceled = false;
+acb->cancelable = true;
+acb->finished = NULL;
 acb->coroutine = qemu_coroutine_self();
 acb->ret = 0;
 acb->nr_pending = 0;
-- 
1.8.1.3.566.gaa39828




[Qemu-devel] [PATCH 08/11] coroutine: add co_aio_sleep_ns() to allow sleep in block drivers

2013-07-23 Thread MORITA Kazutaka
This helper function behaves similarly to co_sleep_ns(), but the
sleeping coroutine will be resumed when using qemu_aio_wait().

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 include/block/coroutine.h |  8 
 qemu-coroutine-sleep.c| 47 +++
 2 files changed, 55 insertions(+)

diff --git a/include/block/coroutine.h b/include/block/coroutine.h
index 377805a..23ea6e9 100644
--- a/include/block/coroutine.h
+++ b/include/block/coroutine.h
@@ -210,6 +210,14 @@ void qemu_co_rwlock_unlock(CoRwlock *lock);
 void coroutine_fn co_sleep_ns(QEMUClock *clock, int64_t ns);
 
 /**
+ * Yield the coroutine for a given duration
+ *
+ * Behaves similarly to co_sleep_ns(), but the sleeping coroutine will be
+ * resumed when using qemu_aio_wait().
+ */
+void coroutine_fn co_aio_sleep_ns(int64_t ns);
+
+/**
  * Yield until a file descriptor becomes readable
  *
  * Note that this function clobbers the handlers for the file descriptor.
diff --git a/qemu-coroutine-sleep.c b/qemu-coroutine-sleep.c
index 169ce5c..3955347 100644
--- a/qemu-coroutine-sleep.c
+++ b/qemu-coroutine-sleep.c
@@ -13,6 +13,7 @@
 
 #include "block/coroutine.h"
 #include "qemu/timer.h"
+#include "qemu/thread.h"
 
 typedef struct CoSleepCB {
 QEMUTimer *ts;
@@ -37,3 +38,49 @@ void coroutine_fn co_sleep_ns(QEMUClock *clock, int64_t ns)
 qemu_del_timer(sleep_cb.ts);
 qemu_free_timer(sleep_cb.ts);
 }
+
+typedef struct CoAioSleepCB {
+QEMUBH *bh;
+int64_t ns;
+Coroutine *co;
+} CoAioSleepCB;
+
+static void co_aio_sleep_cb(void *opaque)
+{
+CoAioSleepCB *aio_sleep_cb = opaque;
+
+qemu_coroutine_enter(aio_sleep_cb->co, NULL);
+}
+
+static void *sleep_thread(void *opaque)
+{
+CoAioSleepCB *aio_sleep_cb = opaque;
+struct timespec req = {
+.tv_sec = aio_sleep_cb->ns / 1000000000,
+.tv_nsec = aio_sleep_cb->ns % 1000000000,
+};
+struct timespec rem;
+
+while (nanosleep(&req, &rem) < 0 && errno == EINTR) {
+req = rem;
+}
+
+qemu_bh_schedule(aio_sleep_cb->bh);
+
+return NULL;
+}
+
+void coroutine_fn co_aio_sleep_ns(int64_t ns)
+{
+CoAioSleepCB aio_sleep_cb = {
+.ns = ns,
+.co = qemu_coroutine_self(),
+};
+QemuThread thread;
+
+aio_sleep_cb.bh = qemu_bh_new(co_aio_sleep_cb, &aio_sleep_cb);
+qemu_thread_create(&thread, sleep_thread, &aio_sleep_cb,
+   QEMU_THREAD_DETACHED);
+qemu_coroutine_yield();
+qemu_bh_delete(aio_sleep_cb.bh);
+}
-- 
1.8.1.3.566.gaa39828
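
A usage sketch for context (not part of the series; wait_for_server() and
try_reconnect() are hypothetical placeholders): a coroutine can combine this
helper with a retry loop, which is how the reconnect patch elsewhere in this
series uses it.

/* Hypothetical sketch: poll for a condition from a coroutine_fn without
 * blocking the main loop; the wakeup is delivered through a bottom half,
 * so qemu_aio_wait() keeps dispatching other handlers in the meantime. */
static coroutine_fn void wait_for_server(BDRVSheepdogState *s)
{
    while (try_reconnect(s) < 0) {
        co_aio_sleep_ns(1000000000ULL);   /* sleep one second */
    }
}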




[Qemu-devel] [PATCH 07/11] sheepdog: reload inode outside of resend_aioreq

2013-07-23 Thread MORITA Kazutaka
This prepares for using resend_aioreq() after reconnecting to the
sheepdog server.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 33 +++--
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 018eab2..1173605 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -222,6 +222,11 @@ static inline uint64_t data_oid_to_idx(uint64_t oid)
 return oid & (MAX_DATA_OBJS - 1);
 }
 
+static inline uint32_t oid_to_vid(uint64_t oid)
+{
+return (oid & ~VDI_BIT) >> VDI_SPACE_SHIFT;
+}
+
 static inline uint64_t vid_to_vdi_oid(uint32_t vid)
 {
 return VDI_BIT | ((uint64_t)vid << VDI_SPACE_SHIFT);
@@ -663,7 +668,7 @@ static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
struct iovec *iov, int niov, bool create,
enum AIOCBState aiocb_type);
 static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req);
-
+static int reload_inode(BDRVSheepdogState *s, uint32_t snapid, const char *tag);
 
 static AIOReq *find_pending_req(BDRVSheepdogState *s, uint64_t oid)
 {
@@ -811,6 +816,19 @@ static void coroutine_fn aio_read_response(void *opaque)
 case SD_RES_SUCCESS:
 break;
 case SD_RES_READONLY:
+if (s->inode.vdi_id == oid_to_vid(aio_req->oid)) {
+ret = reload_inode(s, 0, "");
+if (ret < 0) {
+goto out;
+}
+}
+
+if (is_data_obj(aio_req->oid)) {
+aio_req->oid = vid_to_data_oid(s->inode.vdi_id,
+   data_oid_to_idx(aio_req->oid));
+} else {
+aio_req->oid = vid_to_vdi_oid(s->inode.vdi_id);
+}
 ret = resend_aioreq(s, aio_req);
 if (ret == SD_RES_SUCCESS) {
 goto out;
@@ -1258,19 +1276,6 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
 {
 SheepdogAIOCB *acb = aio_req->aiocb;
 bool create = false;
-int ret;
-
-ret = reload_inode(s, 0, "");
-if (ret < 0) {
-return ret;
-}
-
-if (is_data_obj(aio_req->oid)) {
-aio_req->oid = vid_to_data_oid(s->inode.vdi_id,
-   data_oid_to_idx(aio_req->oid));
-} else {
-aio_req->oid = vid_to_vdi_oid(s->inode.vdi_id);
-}
 
 /* check whether this request becomes a CoW one */
 if (acb->aiocb_type == AIOCB_WRITE_UDATA && is_data_obj(aio_req->oid)) {
-- 
1.8.1.3.566.gaa39828




[Qemu-devel] [PATCH 04/11] sheepdog: make connect nonblocking

2013-07-23 Thread MORITA Kazutaka
This uses nonblocking connect functions to connect to the sheepdog
server.  The connect operation is done in a coroutine function and it
will be yielded until the created socked is ready for IO.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 70 ++--
 1 file changed, 63 insertions(+), 7 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 6a41ad9..6f5ede4 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -455,18 +455,51 @@ static SheepdogAIOCB *sd_aio_setup(BlockDriverState *bs, QEMUIOVector *qiov,
 return acb;
 }
 
-static int connect_to_sdog(BDRVSheepdogState *s)
-{
+typedef struct SheepdogConnectCo {
+BDRVSheepdogState *bs;
+Coroutine *co;
 int fd;
+bool finished;
+} SheepdogConnectCo;
+
+static void sd_connect_completed(int fd, void *opaque)
+{
+SheepdogConnectCo *scco = opaque;
+
+if (fd < 0) {
+int val, rc;
+socklen_t valsize = sizeof(val);
+
+do {
+rc = qemu_getsockopt(scco->fd, SOL_SOCKET, SO_ERROR, &val,
+ &valsize);
+} while (rc == -1 && socket_error() == EINTR);
+
+scco->fd = rc < 0 ? -errno : -val;
+}
+
+scco->finished = true;
+
+if (scco->co != NULL) {
+qemu_coroutine_enter(scco->co, NULL);
+}
+}
+
+static coroutine_fn void co_connect_to_sdog(void *opaque)
+{
+SheepdogConnectCo *scco = opaque;
+BDRVSheepdogState *s = scco->bs;
 Error *err = NULL;
 
 if (s->is_unix) {
-fd = unix_connect(s->host_spec, &err);
+scco->fd = unix_nonblocking_connect(s->host_spec, sd_connect_completed,
+opaque, &err);
 } else {
-fd = inet_connect(s->host_spec, &err);
+scco->fd = inet_nonblocking_connect(s->host_spec, sd_connect_completed,
+opaque, &err);
 
 if (err == NULL) {
-int ret = socket_set_nodelay(fd);
+int ret = socket_set_nodelay(scco->fd);
 if (ret < 0) {
 error_report("%s", strerror(errno));
 }
@@ -476,11 +509,34 @@ static int connect_to_sdog(BDRVSheepdogState *s)
 if (err != NULL) {
 qerror_report_err(err);
 error_free(err);
+}
+
+if (!scco->finished) {
+/* wait for connect to finish */
+scco->co = qemu_coroutine_self();
+qemu_coroutine_yield();
+}
+}
+
+static int connect_to_sdog(BDRVSheepdogState *s)
+{
+Coroutine *co;
+SheepdogConnectCo scco = {
+.bs = s,
+.finished = false,
+};
+
+if (qemu_in_coroutine()) {
+co_connect_to_sdog(&scco);
 } else {
-qemu_set_nonblock(fd);
+co = qemu_coroutine_create(co_connect_to_sdog);
+qemu_coroutine_enter(co, &scco);
+while (!scco.finished) {
+qemu_aio_wait();
+}
 }
 
-return fd;
+return scco.fd;
 }
 
 static coroutine_fn int send_co_req(int sockfd, SheepdogReq *hdr, void *data,
-- 
1.8.1.3.566.gaa39828




[Qemu-devel] [PATCH 10/11] sheepdog: make add_aio_request and send_aioreq void functions

2013-07-23 Thread MORITA Kazutaka
These functions no longer return errors.  We can make them void
functions and simplify the codes.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 66 +++-
 1 file changed, 17 insertions(+), 49 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index cd72927..8a6c432 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -667,10 +667,10 @@ static int do_req(int sockfd, SheepdogReq *hdr, void *data,
 return srco.ret;
 }
 
-static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
+static void coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
struct iovec *iov, int niov, bool create,
enum AIOCBState aiocb_type);
-static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req);
+static void coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req);
 static int reload_inode(BDRVSheepdogState *s, uint32_t snapid, const char *tag);
 static int get_sheep_fd(BDRVSheepdogState *s);
 static void co_write_request(void *opaque);
@@ -696,22 +696,14 @@ static void coroutine_fn send_pending_req(BDRVSheepdogState *s, uint64_t oid)
 {
 AIOReq *aio_req;
 SheepdogAIOCB *acb;
-int ret;
 
 while ((aio_req = find_pending_req(s, oid)) != NULL) {
 acb = aio_req->aiocb;
 /* move aio_req from pending list to inflight one */
 QLIST_REMOVE(aio_req, aio_siblings);
 QLIST_INSERT_HEAD(&s->inflight_aio_head, aio_req, aio_siblings);
-ret = add_aio_request(s, aio_req, acb->qiov->iov,
-  acb->qiov->niov, false, acb->aiocb_type);
-if (ret < 0) {
-error_report("add_aio_request is failed");
-free_aio_req(s, aio_req);
-if (!acb->nr_pending) {
-sd_finish_aiocb(acb);
-}
-}
+add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov, false,
+acb->aiocb_type);
 }
 }
 
@@ -867,11 +859,8 @@ static void coroutine_fn aio_read_response(void *opaque)
 } else {
 aio_req->oid = vid_to_vdi_oid(s->inode.vdi_id);
 }
-ret = resend_aioreq(s, aio_req);
-if (ret == SD_RES_SUCCESS) {
-goto out;
-}
-/* fall through */
+resend_aioreq(s, aio_req);
+goto out;
 default:
 acb->ret = -EIO;
 error_report("%s", sd_strerror(rsp.result));
@@ -1129,7 +1118,7 @@ out:
 return ret;
 }
 
-static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
+static void coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
struct iovec *iov, int niov, bool create,
enum AIOCBState aiocb_type)
 {
@@ -1209,8 +1198,6 @@ out:
 aio_flush_request, s);
 s->co_send = NULL;
 qemu_co_mutex_unlock(&s->lock);
-
-return 0;
 }
 
 static int read_write_object(int fd, char *buf, uint64_t oid, int copies,
@@ -1313,7 +1300,7 @@ out:
 return ret;
 }
 
-static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
+static void coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
 {
 SheepdogAIOCB *acb = aio_req->aiocb;
 bool create = false;
@@ -1338,7 +1325,7 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
 dprintf("simultaneous CoW to %" PRIx64 "\n", aio_req->oid);
 QLIST_REMOVE(aio_req, aio_siblings);
 QLIST_INSERT_HEAD(&s->pending_aio_head, aio_req, aio_siblings);
-return SD_RES_SUCCESS;
+return;
 }
 }
 
@@ -1348,13 +1335,13 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
 }
 out:
 if (is_data_obj(aio_req->oid)) {
-return add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov,
-   create, acb->aiocb_type);
+add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov, create,
+acb->aiocb_type);
 } else {
 struct iovec iov;
 iov.iov_base = &s->inode;
 iov.iov_len = sizeof(s->inode);
-return add_aio_request(s, aio_req, &iov, 1, false, AIOCB_WRITE_UDATA);
+add_aio_request(s, aio_req, &iov, 1, false, AIOCB_WRITE_UDATA);
 }
 }
 
@@ -1744,7 +1731,6 @@ static int sd_truncate(BlockDriverState *bs, int64_t offset)
  */
 static void coroutine_fn sd_write_done(SheepdogAIOCB *acb)
 {
-int ret;
 BDRVSheepdogState *s = acb->common.bs->opaque;
 struct iovec iov;
 AIOReq *aio_req;
@@ -1766,18 +1752,13 @@ static void coroutine_fn sd_write_done(SheepdogAIOCB *acb)
 aio_req = alloc_aio_req(s, acb, vid_to_vdi_oid(s->inode.vdi_id),
 data_len, offset, 0, 0, offset);
 QLIST_INSERT_HEAD(&s->inflight_aio_head, aio_req, aio_siblings

[Qemu-devel] [PATCH 02/11] iov: handle EOF in iov_send_recv

2013-07-23 Thread MORITA Kazutaka
Without this patch, iov_send_recv() never returns when do_send_recv()
returns zero.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 util/iov.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/util/iov.c b/util/iov.c
index cc6e837..f705586 100644
--- a/util/iov.c
+++ b/util/iov.c
@@ -202,6 +202,12 @@ ssize_t iov_send_recv(int sockfd, struct iovec *iov, unsigned iov_cnt,
 return -1;
 }
 
+if (ret == 0 && !do_send) {
+/* recv returns 0 when the peer has performed an orderly
+ * shutdown. */
+break;
+}
+
 /* Prepare for the next iteration */
 offset += ret;
 total += ret;
-- 
1.8.1.3.566.gaa39828




[Qemu-devel] [PATCH 05/11] sheepdog: check return values of qemu_co_recv/send correctly

2013-07-23 Thread MORITA Kazutaka
qemu_co_recv/send return shorter length on error.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 6f5ede4..567f52e 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -727,7 +727,7 @@ static void coroutine_fn aio_read_response(void *opaque)
 
 /* read a header */
 ret = qemu_co_recv(fd, &rsp, sizeof(rsp));
-if (ret < 0) {
+if (ret < sizeof(rsp)) {
 error_report("failed to get the header, %s", strerror(errno));
 goto out;
 }
@@ -778,7 +778,7 @@ static void coroutine_fn aio_read_response(void *opaque)
 case AIOCB_READ_UDATA:
 ret = qemu_co_recvv(fd, acb->qiov->iov, acb->qiov->niov,
 aio_req->iov_offset, rsp.data_length);
-if (ret < 0) {
+if (ret < rsp.data_length) {
 error_report("failed to get the data, %s", strerror(errno));
 goto out;
 }
@@ -1131,7 +1131,7 @@ static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
 
 /* send a header */
 ret = qemu_co_send(s->fd, &hdr, sizeof(hdr));
-if (ret < 0) {
+if (ret < sizeof(hdr)) {
 qemu_co_mutex_unlock(&s->lock);
 error_report("failed to send a req, %s", strerror(errno));
 return -errno;
@@ -1139,7 +1139,7 @@ static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
 
 if (wlen) {
 ret = qemu_co_sendv(s->fd, iov, niov, aio_req->iov_offset, wlen);
-if (ret < 0) {
+if (ret < wlen) {
 qemu_co_mutex_unlock(&s->lock);
 error_report("failed to send a data, %s", strerror(errno));
 return -errno;
-- 
1.8.1.3.566.gaa39828




[Qemu-devel] [PATCH 01/11] ignore SIGPIPE in qemu-img and qemu-io

2013-07-23 Thread MORITA Kazutaka
This prevents the tools from being stopped when they write data to a
closed connection in the other side.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 qemu-img.c | 4 
 qemu-io.c  | 4 
 2 files changed, 8 insertions(+)

diff --git a/qemu-img.c b/qemu-img.c
index c55ca5c..919d464 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2319,6 +2319,10 @@ int main(int argc, char **argv)
 const img_cmd_t *cmd;
 const char *cmdname;
 
+#ifdef CONFIG_POSIX
+signal(SIGPIPE, SIG_IGN);
+#endif
+
 error_set_progname(argv[0]);
 
 qemu_init_main_loop();
diff --git a/qemu-io.c b/qemu-io.c
index cb9def5..d54dc86 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -335,6 +335,10 @@ int main(int argc, char **argv)
 int opt_index = 0;
 int flags = BDRV_O_UNMAP;
 
+#ifdef CONFIG_POSIX
+signal(SIGPIPE, SIG_IGN);
+#endif
+
 progname = basename(argv[0]);
 
 while ((c = getopt_long(argc, argv, sopt, lopt, &opt_index)) != -1) {
-- 
1.8.1.3.566.gaa39828




[Qemu-devel] [PATCH 00/11] sheepdog: reconnect server after connection failure

2013-07-23 Thread MORITA Kazutaka
Currently, if a sheepdog server exits, all the connecting VMs need to
be restarted.  This series implements a feature to reconnect the
server, and enables us to do online sheepdog upgrade and avoid
restarting VMs when sheepdog servers crash unexpectedly.

MORITA Kazutaka (11):
  ignore SIGPIPE in qemu-img and qemu-io
  iov: handle EOF in iov_send_recv
  qemu-sockets: make wait_for_connect be invoked in qemu_aio_wait
  sheepdog: make connect nonblocking
  sheepdog: check return values of qemu_co_recv/send correctly
  sheepdog: handle vdi objects in resend_aio_req
  sheepdog: reload inode outside of resend_aioreq
  coroutine: add co_aio_sleep_ns() to allow sleep in block drivers
  sheepdog: try to reconnect to sheepdog after network error
  sheepdog: make add_aio_request and send_aioreq void functions
  sheepdog: cancel aio requests if possible

 Makefile  |   4 +-
 block/sheepdog.c  | 314 --
 include/block/coroutine.h |   8 ++
 qemu-coroutine-sleep.c|  47 +++
 qemu-img.c|   4 +
 qemu-io.c |   4 +
 util/iov.c|   6 +
 util/qemu-sockets.c   |  15 ++-
 8 files changed, 303 insertions(+), 99 deletions(-)

-- 
1.8.1.3.566.gaa39828




Re: [Qemu-devel] block: Review of .has_zero_init use

2013-06-25 Thread MORITA Kazutaka
At Tue, 25 Jun 2013 13:39:11 +0200,
Kevin Wolf wrote:
 
 Hi all,
 
 while discussing some iscsi patches with Peter, we came to have a look
 at which block drivers implement has_zero_init() to return 0, and which
 don't (returning 1 is the default).
 
 The meaning of this value is that if has_zero_init != 0, after
 bdrv_create() one can assume that the whole image would read back as all
 zero. For example, this is true for the traditional image files, but not
 for host_device, where the block device isn't really created during
 bdrv_create() but only checked for size.
 
 The full list of protocol level block drivers is:
 
 * blkdebug  - doesn't have bdrv_create
 * blkverify - doesn't have bdrv_create
 * curl  - doesn't have bdrv_create
 * gluster   - currently has_zero_init = 1 (is this correct?)
 * iscsi - has_zero_init = 0
 * nbd   - doesn't have bdrv_create
 * file  - has_zero_init = 1
 * host_*- has_zero_init = 0
 * rbd   - currently has_zero_init = 1 (is this correct?)
 * sheepdog  - currently has_zero_init = 1 (is this correct?)
 * ssh   - currently has_zero_init = 1 (is this correct?)
 * vvfat - doesn't have bdrv_create
 
 Can you please review for the gluster, rbd, sheepdog and ssh driver
 whether it's safe to assume that the image reads back as zeros after
 bdrv_create?

It's safe for Sheepdog.  Sheepdog uses ftruncate or fallocate to
create data blocks and it is guaranteed that the space will be
initialized to zero.
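
As a quick illustration of that POSIX guarantee (a minimal standalone
sketch for regular files only, not sheepdog code; the file path is
arbitrary):

/* Bytes in the hole created by ftruncate() on a regular file must
 * read back as zero. */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    char buf[4096];
    int fd = open("/tmp/zero-test", O_RDWR | O_CREAT | O_TRUNC, 0600);

    assert(fd >= 0);
    assert(ftruncate(fd, sizeof(buf)) == 0);  /* extend with a hole */
    assert(read(fd, buf, sizeof(buf)) == sizeof(buf));
    for (int i = 0; i < (int)sizeof(buf); i++) {
        assert(buf[i] == 0);                  /* reads back as zero */
    }
    close(fd);
    return 0;
}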

Thanks,

Kazutaka



Re: [Qemu-devel] block: Review of .has_zero_init use

2013-06-25 Thread MORITA Kazutaka
At Tue, 25 Jun 2013 15:20:18 +0200,
Kevin Wolf wrote:
 
 Am 25.06.2013 um 15:11 hat MORITA Kazutaka geschrieben:
  At Tue, 25 Jun 2013 13:39:11 +0200,
  Kevin Wolf wrote:
   
   Hi all,
   
   while discussing some iscsi patches with Peter, we came to have a look
   at which block drivers implement has_zero_init() to return 0, and which
   don't (returning 1 is the default).
   
   The meaning of this value is that if has_zero_init != 0, after
   bdrv_create() one can assume that the whole image would read back as all
   zero. For example, this is true for the traditional image files, but not
   for host_device, where the block device isn't really created during
   bdrv_create() but only checked for size.
   
   The full list of protocol level block drivers is:
   
   * blkdebug  - doesn't have bdrv_create
   * blkverify - doesn't have bdrv_create
   * curl  - doesn't have bdrv_create
   * gluster   - currently has_zero_init = 1 (is this correct?)
   * iscsi - has_zero_init = 0
   * nbd   - doesn't have bdrv_create
   * file  - has_zero_init = 1
   * host_*- has_zero_init = 0
   * rbd   - currently has_zero_init = 1 (is this correct?)
   * sheepdog  - currently has_zero_init = 1 (is this correct?)
   * ssh   - currently has_zero_init = 1 (is this correct?)
   * vvfat - doesn't have bdrv_create
   
   Can you please review for the gluster, rbd, sheepdog and ssh driver
   whether it's safe to assume that the image reads back as zeros after
   bdrv_create?
  
  It's safe for Sheepdog.  Sheepdog uses ftruncate or fallocate to
  create data blocks and it is guaranteed that the space will be
  initialized to zero.
 
 Note that ftruncate/fallocate don't zero-initialise block devices, only
 regular files. Not sure if you can use block devices to back Sheepdog
 images?

We cannot do that.  Sheepdog heavily relies on filesystem features and
we will not support block devices for the backend of Sheepdog.

Thanks,

Kazutaka



Re: [Qemu-devel] [sheepdog] [PATCH v2] sheepdog: fix loadvm operation

2013-04-25 Thread MORITA Kazutaka
At Thu, 25 Apr 2013 12:12:58 +0800,
Liu Yuan wrote:
 
 On 04/25/2013 12:47 AM, Liu Yuan wrote:
  - don't break old behavior if we boot up on the snapshot by using s->reverted
 to indicate if we delete working VDI successfully
 
 If we implement 'boot from snapshot' == loadvm snapshot, we don't need
 s->reverted. What do you think, Kazutaka? With this idea, qemu -hda
 sheepdog:test:id will be virtually the same as qemu -hda sheepdog:test
 -loadvm id.
 
 I don't understand the use case for snapshotting the current working VDI and
 then restoring to the specified snapshot.

Are you suggesting that booting from snapshot should discard the
current state like savevm?  If yes, it looks okay to me.

Thanks,

Kazutaka



Re: [Qemu-devel] [sheepdog] [PATCH v3] sheepdog: fix loadvm operation

2013-04-25 Thread MORITA Kazutaka
At Thu, 25 Apr 2013 16:42:34 +0800,
Liu Yuan wrote:
 
 +/* Delete current working VDI on the snapshot chain */
 +static bool sd_delete(BDRVSheepdogState *s)
 +{
 +unsigned int wlen = SD_MAX_VDI_LEN, rlen = 0;
 +SheepdogVdiReq hdr = {
 +.opcode = SD_OP_DEL_VDI,
 .vdi_id = s->inode.vdi_id,
 .data_length = wlen,
 .flags = SD_FLAG_CMD_WRITE,
 };
 SheepdogVdiRsp *rsp = (SheepdogVdiRsp *)&hdr;
 +int fd, ret;
 +
 +fd = connect_to_sdog(s);
 if (fd < 0) {
 return false;
 }
 
 ret = do_req(fd, (SheepdogReq *)&hdr, s->name, &wlen, &rlen);
 closesocket(fd);
 if (ret || (rsp->result != SD_RES_SUCCESS &&
 rsp->result != SD_RES_NO_VDI)) {
 error_report("%s, %s", sd_strerror(rsp->result), s->name);
 +return false;
 +}
 +
 +return true;
 +}

Isn't it better to show an error message when the result code is
SD_RES_NO_VDI?

Thanks,

Kazutaka



Re: [Qemu-devel] [sheepdog] [PATCH v3] sheepdog: fix loadvm operation

2013-04-25 Thread MORITA Kazutaka
At Thu, 25 Apr 2013 17:44:41 +0800,
Liu Yuan wrote:
 
 On 04/25/2013 05:40 PM, MORITA Kazutaka wrote:
  Isn't it better to show an error message when the result code is
  SD_RES_NO_VDI?
 
 This information isn't useful even for debugging, what it for?

The block driver tries to delete the vdi, but the sheepdog servers
return "No such vdi" - I thought that something went wrong in this
case.

What's the scenario where the sheepdog servers return SD_RES_NO_VDI?
Can we ignore it safely?

Thanks,

Kazutaka



[Qemu-devel] [PATCH 2/3] sheepdog: add SD_RES_READONLY result code

2013-04-25 Thread MORITA Kazutaka
Sheepdog returns SD_RES_READONLY when qemu sends write requests to the
snapshot vdi.  This adds the result code and makes sd_strerror() print
its error reason.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 4326664..f4e7204 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -68,6 +68,7 @@
 #define SD_RES_WAIT_FOR_JOIN 0x17 /* Waiting for other nodes joining */
 #define SD_RES_JOIN_FAILED   0x18 /* Target node had failed to join sheepdog */
 #define SD_RES_HALT  0x19 /* Sheepdog is stopped serving IO request */
+#define SD_RES_READONLY  0x1A /* Object is read-only */
 
 /*
  * Object ID rules
@@ -349,6 +350,7 @@ static const char * sd_strerror(int err)
 {SD_RES_WAIT_FOR_JOIN, "Sheepdog is waiting for other nodes joining"},
 {SD_RES_JOIN_FAILED, "Target node had failed to join sheepdog"},
 {SD_RES_HALT, "Sheepdog is stopped serving IO request"},
+{SD_RES_READONLY, "Object is read-only"},
 };
 
 for (i = 0; i < ARRAY_SIZE(errors); ++i) {
-- 
1.7.2.5




[Qemu-devel] [PATCH 1/3] sheepdog: cleanup find_vdi_name

2013-04-25 Thread MORITA Kazutaka
This makes 'filename' and 'tag' constant variables, and renames
'for_snapshot' to 'lock' to clear how it works.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c |   15 ---
 1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 9f30a87..4326664 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -941,8 +941,9 @@ static int parse_vdiname(BDRVSheepdogState *s, const char *filename,
 return ret;
 }
 
-static int find_vdi_name(BDRVSheepdogState *s, char *filename, uint32_t snapid,
- char *tag, uint32_t *vid, int for_snapshot)
+static int find_vdi_name(BDRVSheepdogState *s, const char *filename,
+ uint32_t snapid, const char *tag, uint32_t *vid,
+ bool lock)
 {
 int ret, fd;
 SheepdogVdiReq hdr;
@@ -963,10 +964,10 @@ static int find_vdi_name(BDRVSheepdogState *s, char *filename, uint32_t snapid,
 strncpy(buf + SD_MAX_VDI_LEN, tag, SD_MAX_VDI_TAG_LEN);
 
 memset(&hdr, 0, sizeof(hdr));
-if (for_snapshot) {
-hdr.opcode = SD_OP_GET_VDI_INFO;
-} else {
+if (lock) {
 hdr.opcode = SD_OP_LOCK_VDI;
+} else {
+hdr.opcode = SD_OP_GET_VDI_INFO;
 }
 wlen = SD_MAX_VDI_LEN + SD_MAX_VDI_TAG_LEN;
 hdr.proto_ver = SD_PROTO_VER;
@@ -1205,7 +1206,7 @@ static int sd_open(BlockDriverState *bs, QDict *options, int flags)
 goto out;
 }
 
-ret = find_vdi_name(s, vdi, snapid, tag, &vid, 0);
+ret = find_vdi_name(s, vdi, snapid, tag, &vid, true);
 if (ret) {
 goto out;
 }
@@ -1921,7 +1922,7 @@ static int sd_snapshot_goto(BlockDriverState *bs, const char *snapshot_id)
 pstrcpy(tag, sizeof(tag), s->name);
 }
 
-ret = find_vdi_name(s, vdi, snapid, tag, &vid, 1);
+ret = find_vdi_name(s, vdi, snapid, tag, &vid, false);
 if (ret) {
 error_report(Failed to find_vdi_name);
 goto out;
-- 
1.7.2.5




[Qemu-devel] [PATCH 3/3] sheepdog: resend write requests when SD_RES_READONLY is received

2013-04-25 Thread MORITA Kazutaka
When a snapshot is taken from outside of qemu (e.g. qemu-img
snapshot), write requests to the current vdi return SD_RES_READONLY.
In this case, the sheepdog block driver needs to update the current
inode to the latest one and resend the write requests.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c |   98 +-
 1 files changed, 97 insertions(+), 1 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index f4e7204..3338cd1 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -605,6 +605,7 @@ static int do_req(int sockfd, SheepdogReq *hdr, void *data,
 static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
struct iovec *iov, int niov, bool create,
enum AIOCBState aiocb_type);
+static int resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req);
 
 
 static AIOReq *find_pending_req(BDRVSheepdogState *s, uint64_t oid)
@@ -749,9 +750,19 @@ static void coroutine_fn aio_read_response(void *opaque)
 }
 }
 
-if (rsp.result != SD_RES_SUCCESS) {
+switch (rsp.result) {
+case SD_RES_SUCCESS:
+break;
+case SD_RES_READONLY:
+ret = resend_aioreq(s, aio_req);
+if (ret == SD_RES_SUCCESS) {
+goto out;
+}
+/* fall through */
+default:
 acb->ret = -EIO;
 error_report("%s", sd_strerror(rsp.result));
+break;
 }
 
 free_aio_req(s, aio_req);
@@ -1150,6 +1161,91 @@ static int write_object(int fd, char *buf, uint64_t oid, int copies,
  create, cache_flags);
 }
 
+/* update inode with the latest state */
+static int coroutine_fn reload_vdi_object(BDRVSheepdogState *s)
+{
+SheepdogInode *inode;
+int ret = 0, fd;
+uint32_t vid;
+
+inode = (SheepdogInode *)g_malloc(sizeof(s->inode));
+
+fd = connect_to_sdog(s);
+if (fd < 0) {
+ret = -EIO;
+goto out;
+}
+
+ret = find_vdi_name(s, s->name, 0, "", &vid, false);
+if (ret) {
+goto out;
+}
+
+ret = read_object(fd, (char *)inode, vid_to_vdi_oid(vid),
+  s->inode.nr_copies, sizeof(*inode), 0, s->cache_flags);
+if (ret < 0) {
+goto out;
+}
+
+if (inode->vdi_id != s->inode.vdi_id) {
+memcpy(&s->inode, inode, sizeof(s->inode));
+}
+
+out:
+free(inode);
+closesocket(fd);
+
+return ret;
+}
+
+static int resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
+{
+SheepdogAIOCB *acb = aio_req->aiocb;
+bool create = false;
+int ret;
+
+ret = reload_vdi_object(s);
+if (ret < 0) {
+return ret;
+}
+
+aio_req->oid = vid_to_data_oid(s->inode.vdi_id,
+   data_oid_to_idx(aio_req->oid));
+
+/* check whether this request becomes a CoW one */
+if (acb->aiocb_type == AIOCB_WRITE_UDATA) {
+int idx = data_oid_to_idx(aio_req->oid);
+AIOReq *areq;
+
+if (s->inode.data_vdi_id[idx] == 0) {
+create = true;
+goto out;
+}
+if (is_data_obj_writable(&s->inode, idx)) {
+goto out;
+}
+
+/* link to the pending list if there is another CoW request to
+ * the same object */
+QLIST_FOREACH(areq, &s->inflight_aio_head, aio_siblings) {
+if (areq != aio_req && areq->oid == aio_req->oid) {
+dprintf("simultaneous CoW to %" PRIx64 "\n", aio_req->oid);
+QLIST_REMOVE(aio_req, aio_siblings);
+QLIST_INSERT_HEAD(&s->pending_aio_head, aio_req, aio_siblings);
+return SD_RES_SUCCESS;
+}
+}
+
+aio_req->base_oid = vid_to_data_oid(s->inode.data_vdi_id[idx], idx);
+aio_req->flags |= SD_FLAG_CMD_COW;
+create = true;
+}
+out:
+return add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov,
+   create, acb->aiocb_type);
+}
+
+
 /* TODO Convert to fine grained options */
 static QemuOptsList runtime_opts = {
 .name = sheepdog,
-- 
1.7.2.5




[Qemu-devel] [PATCH 0/3] sheepdog: support online snapshot from qemu-img

2013-04-25 Thread MORITA Kazutaka
Currently, we can take sheepdog snapshots of running VMs only from the
qemu monitor.  This series allows taking online snapshots from
qemu-img.

The first two patches prepare for the third patch.

MORITA Kazutaka (3):
  sheepdog: cleanup find_vdi_name
  sheepdog: add SD_RES_READONLY result code
  sheepdog: resend write requests when SD_RES_READONLY is received

 block/sheepdog.c |  115 ++
 1 files changed, 107 insertions(+), 8 deletions(-)

-- 
1.7.2.5





Re: [Qemu-devel] [sheepdog] [PATCH v4] sheepdog: fix loadvm operation

2013-04-25 Thread MORITA Kazutaka
At Thu, 25 Apr 2013 20:49:39 +0800,
Liu Yuan wrote:
 
 From: Liu Yuan tailai...@taobao.com
 
 Currently the 'loadvm' operation works as follows:
 1. switch to the snapshot
 2. mark current working VDI as a snapshot
 3. rely on sd_create_branch to create a new working VDI based on the snapshot
 
 This does not work the same way as other formats such as QCOW2. For example,
 
 qemu > savevm # get a live snapshot snap1
 qemu > savevm # snap2
 qemu > loadvm 1 # This will stealthily create snap3 of the working VDI
 
 This results in the following snapshot chain:
 
 base <-- snap1 <-- snap2 <-- snap3
                                ^
                                |
                           working VDI
 
 snap3 was unnecessarily created and might annoy users.
 
 This patch discards the unnecessary 'snap3' creation and implements the
 rollback (loadvm) operation to the specified snapshot by:
 1. switch to the snapshot
 2. delete working VDI
 3. rely on sd_create_branch to create a new working VDI based on the snapshot
 
 The snapshot chain for the above example will be:
 
 base <-- snap1 <-- snap2
                      ^
                      |
                 working VDI
 
 Cc: qemu-devel@nongnu.org
 Cc: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
 Cc: Kevin Wolf kw...@redhat.com
 Cc: Stefan Hajnoczi stefa...@redhat.com
 Signed-off-by: Liu Yuan tailai...@taobao.com
 ---
 v4:
  - print an error message when NO_VDI found
 
 v3:
  - let boot from snapshot behave like 'loadvm'
 
 v2:
  - use do_req() because sd_delete isn't in coroutine
 - don't break old behavior if we boot up on the snapshot by using s->reverted
to indicate if we delete working VDI successfully
  - fix a subtle case that sd_create_branch() isn't called yet while another
'loadvm' is executed
 
  block/sheepdog.c |   54 
 +-
  1 file changed, 53 insertions(+), 1 deletion(-)
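
For illustration, the rollback described above boils down to roughly the
following (a sketch only; sd_delete() and s->reverted are taken from the
changelog above, so treat the details as assumptions rather than the
patch's exact code):

    static int sd_loadvm_rollback(BDRVSheepdogState *s, uint32_t snapid)
    {
        int ret;

        /* 1. switch to the snapshot */
        ret = reload_inode(s, snapid, "");
        if (ret < 0) {
            return ret;
        }

        /* 2. delete the working VDI, remembering that we did so, so
         * that booting from a snapshot keeps working when no write
         * follows before the next loadvm */
        ret = sd_delete(s);
        if (ret == 0) {
            s->reverted = true;
        }

        /* 3. sd_create_branch() creates a new working VDI based on
         * the snapshot when the first write request arrives */
        return 0;
    }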

Reviewed-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp



[Qemu-devel] [PATCH v2 4/4] sheepdog: resend write requests when SD_RES_READONLY is received

2013-04-25 Thread MORITA Kazutaka
When a snapshot is taken from outside of qemu (e.g. qemu-img
snapshot), write requests to the current vdi return SD_RES_READONLY.
In this case, the sheepdog block driver needs to update the current
inode to the latest one and resend the write requests.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c |   60 +-
 1 files changed, 59 insertions(+), 1 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 0eaf4c3..77e21fd 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -605,6 +605,7 @@ static int do_req(int sockfd, SheepdogReq *hdr, void *data,
 static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
struct iovec *iov, int niov, bool create,
enum AIOCBState aiocb_type);
+static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req);
 
 
 static AIOReq *find_pending_req(BDRVSheepdogState *s, uint64_t oid)
@@ -749,9 +750,19 @@ static void coroutine_fn aio_read_response(void *opaque)
 }
 }
 
-if (rsp.result != SD_RES_SUCCESS) {
+switch (rsp.result) {
+case SD_RES_SUCCESS:
+break;
+case SD_RES_READONLY:
+ret = resend_aioreq(s, aio_req);
+if (ret == SD_RES_SUCCESS) {
+goto out;
+}
+/* fall through */
+default:
 acb->ret = -EIO;
 error_report("%s", sd_strerror(rsp.result));
+break;
 }
 
 free_aio_req(s, aio_req);
@@ -1186,6 +1197,53 @@ out:
 return ret;
 }
 
+static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
+{
+SheepdogAIOCB *acb = aio_req->aiocb;
+bool create = false;
+int ret;
+
+ret = reload_inode(s, 0, "");
+if (ret < 0) {
+return ret;
+}
+
+aio_req->oid = vid_to_data_oid(s->inode.vdi_id,
+   data_oid_to_idx(aio_req->oid));
+
+/* check whether this request becomes a CoW one */
+if (acb->aiocb_type == AIOCB_WRITE_UDATA) {
+int idx = data_oid_to_idx(aio_req->oid);
+AIOReq *areq;
+
+if (s->inode.data_vdi_id[idx] == 0) {
+create = true;
+goto out;
+}
+if (is_data_obj_writable(&s->inode, idx)) {
+goto out;
+}
+
+/* link to the pending list if there is another CoW request to
+ * the same object */
+QLIST_FOREACH(areq, &s->inflight_aio_head, aio_siblings) {
+if (areq != aio_req && areq->oid == aio_req->oid) {
+dprintf("simultaneous CoW to %" PRIx64 "\n", aio_req->oid);
+QLIST_REMOVE(aio_req, aio_siblings);
+QLIST_INSERT_HEAD(&s->pending_aio_head, aio_req, aio_siblings);
+return SD_RES_SUCCESS;
+}
+}
+
+aio_req->base_oid = vid_to_data_oid(s->inode.data_vdi_id[idx], idx);
+aio_req->flags |= SD_FLAG_CMD_COW;
+create = true;
+}
+out:
+return add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov,
+   create, acb->aiocb_type);
+}
+
 /* TODO Convert to fine grained options */
 static QemuOptsList runtime_opts = {
 .name = sheepdog,
-- 
1.7.2.5




[Qemu-devel] [PATCH v2 1/4] sheepdog: cleanup find_vdi_name

2013-04-25 Thread MORITA Kazutaka
This makes 'filename' and 'tag' constant variables, and renames
'for_snapshot' to 'lock' to make clear how it works.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c |   15 ---
 1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 9f30a87..4326664 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -941,8 +941,9 @@ static int parse_vdiname(BDRVSheepdogState *s, const char 
*filename,
 return ret;
 }
 
-static int find_vdi_name(BDRVSheepdogState *s, char *filename, uint32_t snapid,
- char *tag, uint32_t *vid, int for_snapshot)
+static int find_vdi_name(BDRVSheepdogState *s, const char *filename,
+ uint32_t snapid, const char *tag, uint32_t *vid,
+ bool lock)
 {
 int ret, fd;
 SheepdogVdiReq hdr;
@@ -963,10 +964,10 @@ static int find_vdi_name(BDRVSheepdogState *s, char 
*filename, uint32_t snapid,
 strncpy(buf + SD_MAX_VDI_LEN, tag, SD_MAX_VDI_TAG_LEN);
 
 memset(&hdr, 0, sizeof(hdr));
-if (for_snapshot) {
-hdr.opcode = SD_OP_GET_VDI_INFO;
-} else {
+if (lock) {
 hdr.opcode = SD_OP_LOCK_VDI;
+} else {
+hdr.opcode = SD_OP_GET_VDI_INFO;
 }
 wlen = SD_MAX_VDI_LEN + SD_MAX_VDI_TAG_LEN;
 hdr.proto_ver = SD_PROTO_VER;
@@ -1205,7 +1206,7 @@ static int sd_open(BlockDriverState *bs, QDict *options, 
int flags)
 goto out;
 }
 
-ret = find_vdi_name(s, vdi, snapid, tag, &vid, 0);
+ret = find_vdi_name(s, vdi, snapid, tag, &vid, true);
 if (ret) {
 goto out;
 }
@@ -1921,7 +1922,7 @@ static int sd_snapshot_goto(BlockDriverState *bs, const 
char *snapshot_id)
 pstrcpy(tag, sizeof(tag), s-name);
 }
 
-ret = find_vdi_name(s, vdi, snapid, tag, &vid, 1);
+ret = find_vdi_name(s, vdi, snapid, tag, &vid, false);
 if (ret) {
 error_report("Failed to find_vdi_name");
 goto out;
-- 
1.7.2.5




[Qemu-devel] [PATCH v2 3/4] sheepdog: add helper function to reload inode

2013-04-25 Thread MORITA Kazutaka
This adds a helper function to update the current inode state with the
specified vdi object.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c |   67 +++--
 1 files changed, 39 insertions(+), 28 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index f4e7204..0eaf4c3 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -1150,6 +1150,42 @@ static int write_object(int fd, char *buf, uint64_t oid, 
int copies,
  create, cache_flags);
 }
 
+/* update inode with the latest state */
+static int reload_inode(BDRVSheepdogState *s, uint32_t snapid, const char *tag)
+{
+SheepdogInode *inode;
+int ret = 0, fd;
+uint32_t vid = 0;
+
+fd = connect_to_sdog(s);
+if (fd < 0) {
+return -EIO;
+}
+
+inode = g_malloc(sizeof(s->inode));
+
+ret = find_vdi_name(s, s->name, snapid, tag, &vid, false);
+if (ret) {
+goto out;
+}
+
+ret = read_object(fd, (char *)inode, vid_to_vdi_oid(vid),
+  s->inode.nr_copies, sizeof(*inode), 0, s->cache_flags);
+if (ret < 0) {
+goto out;
+}
+
+if (inode->vdi_id != s->inode.vdi_id) {
+memcpy(&s->inode, inode, sizeof(s->inode));
+}
+
+out:
+g_free(inode);
+closesocket(fd);
+
+return ret;
+}
+
 /* TODO Convert to fine grained options */
 static QemuOptsList runtime_opts = {
 .name = sheepdog,
@@ -1905,18 +1941,14 @@ static int sd_snapshot_goto(BlockDriverState *bs, const 
char *snapshot_id)
 {
 BDRVSheepdogState *s = bs->opaque;
 BDRVSheepdogState *old_s;
-char vdi[SD_MAX_VDI_LEN], tag[SD_MAX_VDI_TAG_LEN];
-char *buf = NULL;
-uint32_t vid;
+char tag[SD_MAX_VDI_TAG_LEN];
 uint32_t snapid = 0;
-int ret = 0, fd;
+int ret = 0;
 
 old_s = g_malloc(sizeof(BDRVSheepdogState));
 
 memcpy(old_s, s, sizeof(BDRVSheepdogState));
 
-pstrcpy(vdi, sizeof(vdi), s->name);
-
 snapid = strtoul(snapshot_id, NULL, 10);
 if (snapid) {
 tag[0] = 0;
@@ -1924,30 +1956,11 @@ static int sd_snapshot_goto(BlockDriverState *bs, const 
char *snapshot_id)
 pstrcpy(tag, sizeof(tag), s->name);
 }
 
-ret = find_vdi_name(s, vdi, snapid, tag, &vid, false);
+ret = reload_inode(s, snapid, tag);
 if (ret) {
-error_report("Failed to find_vdi_name");
 goto out;
 }
 
-fd = connect_to_sdog(s);
-if (fd < 0) {
-ret = fd;
-goto out;
-}
-
-buf = g_malloc(SD_INODE_SIZE);
-ret = read_object(fd, buf, vid_to_vdi_oid(vid), s->inode.nr_copies,
-  SD_INODE_SIZE, 0, s->cache_flags);
-
-closesocket(fd);
-
-if (ret) {
-goto out;
-}
-
-memcpy(&s->inode, buf, sizeof(s->inode));
-
 if (!s->inode.vm_state_size) {
 error_report("Invalid snapshot");
 ret = -ENOENT;
@@ -1956,14 +1969,12 @@ static int sd_snapshot_goto(BlockDriverState *bs, const 
char *snapshot_id)
 
 s->is_snapshot = true;
 
-g_free(buf);
 g_free(old_s);
 
 return 0;
 out:
 /* recover bdrv_sd_state */
 memcpy(s, old_s, sizeof(BDRVSheepdogState));
-g_free(buf);
 g_free(old_s);
 
 error_report("failed to open. recover old bdrv_sd_state.");
-- 
1.7.2.5




[Qemu-devel] [PATCH v2 2/4] sheepdog: add SD_RES_READONLY result code

2013-04-25 Thread MORITA Kazutaka
Sheepdog returns SD_RES_READONLY when qemu sends write requests to the
snapshot vdi.  This adds the result code and makes sd_strerror() print
its error reason.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 4326664..f4e7204 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -68,6 +68,7 @@
 #define SD_RES_WAIT_FOR_JOIN0x17 /* Waiting for other nodes joining */
 #define SD_RES_JOIN_FAILED   0x18 /* Target node had failed to join sheepdog */
 #define SD_RES_HALT  0x19 /* Sheepdog is stopped serving IO request */
+#define SD_RES_READONLY  0x1A /* Object is read-only */
 
 /*
  * Object ID rules
@@ -349,6 +350,7 @@ static const char * sd_strerror(int err)
 {SD_RES_WAIT_FOR_JOIN, "Sheepdog is waiting for other nodes joining"},
 {SD_RES_JOIN_FAILED, "Target node had failed to join sheepdog"},
 {SD_RES_HALT, "Sheepdog is stopped serving IO request"},
+{SD_RES_READONLY, "Object is read-only"},
 };
 
 for (i = 0; i < ARRAY_SIZE(errors); ++i) {
-- 
1.7.2.5




[Qemu-devel] [PATCH v2 0/4] sheepdog: support online snapshot from qemu-img

2013-04-25 Thread MORITA Kazutaka
v2:
 - rename reload_vdi_obj to reload_inode and use it from
   sd_snapshot_goto (Yuan)
 - use g_free instead of free (Stefan)
 - fix typo in the comment (Stefan)
 - add coroutine_fn marker to resend_aioreq

Currently, we can take sheepdog snapshots of running VMs only from the
qemu monitor.  This series allows taking online snapshots from
qemu-img.

The first two patches prepare for the third patch.

MORITA Kazutaka (4):
  sheepdog: cleanup find_vdi_name
  sheepdog: add SD_RES_READONLY result code
  sheepdog: add helper function to reload inode
  sheepdog: resend write requests when SD_RES_READONLY is received

 block/sheepdog.c |  142 -
 1 files changed, 107 insertions(+), 35 deletions(-)

-- 
1.7.2.5




Re: [Qemu-devel] [sheepdog] [PATCH 3/3] sheepdog: resend write requests when SD_RES_READONLY is received

2013-04-25 Thread MORITA Kazutaka
At Thu, 25 Apr 2013 21:08:01 +0800,
Liu Yuan wrote:
 
 On 04/25/2013 06:37 PM, MORITA Kazutaka wrote:
  +/* update inode with the latest state */
  +static int coroutine_fn reload_vdi_object(BDRVSheepdogState *s)
 
 I'd suggest function name as
 'reload_inode(BDRVSheepdogState *s, tag, snapid)', then sd_create_branch
 and sd_snapshot_goto can make use of this function. With this change, it
 would conflict with my patch series applied to Stefan's block tree +
 the fix-loadvm patch. So it would be better to develop this patch set
 against Stefan's block tree + my fix-loadvm patch.

sd_create_branch() doesn't use find_vdi_name() to get vid, so
reload_inode() is not suitable for it.  After all, I used
reload_inode() only for sd_snapshot_goto() and resend_aioreq() in v2,
so our patches didn't conflict.

Thanks,

Kazutaka



Re: [Qemu-devel] [sheepdog] [PATCH] sheepdog: implement .bdrv_co_is_allocated

2013-04-22 Thread MORITA Kazutaka
At Thu, 18 Apr 2013 19:48:52 +0800,
Liu Yuan wrote:
 
 +static coroutine_fn int
 +sd_co_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
 +   int *pnum)
 +{
  +BDRVSheepdogState *s = bs->opaque;
  +SheepdogInode *inode = &s->inode;
 +unsigned long start = sector_num * SECTOR_SIZE / SD_DATA_OBJ_SIZE, idx,

It looks better to use BDRV_SECTOR_SIZE.  I'd suggest preparing
another patch to replace all the SECTOR_SIZE with BDRV_SECTOR_SIZE.

  +  end = start + (nb_sectors * SECTOR_SIZE) / SD_DATA_OBJ_SIZE;

Using 'start' to calculate 'end' is wrong because 'start' may be
rounded down.

 +
  +for (idx = start; idx <= end; idx++) {
  +if (inode->data_vdi_id[idx] == 0) {
 +break;
 +}
 +}
 +if (idx == start) {
 +*pnum = SD_DATA_OBJ_SIZE / SECTOR_SIZE;

Isn't it better to set the longest length of the unallocated sectors?

 +return 0;
 +}
 +
 +*pnum = (idx - start) * SD_DATA_OBJ_SIZE / SECTOR_SIZE;
 +return 1;
 +}
 +
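
For what it's worth, with both comments addressed the function might look
roughly like this (an untested sketch; the DIV_ROUND_UP rounding and the
longest-run loop are my suggestions, not code from the patch):

    static coroutine_fn int
    sd_co_is_allocated(BlockDriverState *bs, int64_t sector_num,
                       int nb_sectors, int *pnum)
    {
        BDRVSheepdogState *s = bs->opaque;
        SheepdogInode *inode = &s->inode;
        unsigned long start = sector_num * BDRV_SECTOR_SIZE / SD_DATA_OBJ_SIZE;
        /* round the end up so a partially covered object is included */
        unsigned long end = DIV_ROUND_UP((sector_num + nb_sectors) *
                                         BDRV_SECTOR_SIZE, SD_DATA_OBJ_SIZE);
        unsigned long idx;
        int ret = 1;

        for (idx = start; idx < end; idx++) {
            if (inode->data_vdi_id[idx] == 0) {
                break;
            }
        }
        if (idx == start) {
            /* get the longest run of unallocated objects instead of
             * reporting a single object */
            ret = 0;
            for (idx = start + 1; idx < end; idx++) {
                if (inode->data_vdi_id[idx] != 0) {
                    break;
                }
            }
        }

        *pnum = (idx - start) * SD_DATA_OBJ_SIZE / BDRV_SECTOR_SIZE;
        if (*pnum > nb_sectors) {
            *pnum = nb_sectors;
        }
        return ret;
    }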

Thanks,

Kazutaka



Re: [Qemu-devel] [sheepdog] [PATCH for QEMU v3] sheepdog: add discard/trim support for sheepdog

2013-04-15 Thread MORITA Kazutaka
At Sun, 14 Apr 2013 13:16:44 +0800,
Liu Yuan wrote:
 
 From: Liu Yuan tailai...@taobao.com
 
 The 'TRIM' command from the VM, which releases underlying data storage for
 better thin-provisioning, is already supported by Sheepdog.
 
 This patch adds the TRIM support on the QEMU side.
 
 For older Sheepdog versions that don't support it, we return EIO to the
 upper layer.

I think we can safely return 0 without doing anything when the server
doesn't support SD_OP_DISCARD.  Actually, if the block driver doesn't
support the discard operation, bdrv_co_discard() in block.c returns 0.
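
Concretely, the response path could catch the old sheep's reply and turn
it into a no-op success, along these lines (a sketch only; treating
SD_RES_INVALID_PARMS as the "unknown opcode" reply and the
s->discard_supported flag are assumptions, not code from the patch):

    case AIOCB_DISCARD_OBJ:
        switch (rsp.result) {
        case SD_RES_INVALID_PARMS:
            /* assumed reply from a sheep that predates the discard
             * opcode: pretend the discard succeeded, matching what
             * bdrv_co_discard() reports for drivers without discard */
            rsp.result = SD_RES_SUCCESS;
            s->discard_supported = false;  /* hypothetical flag */
            break;
        default:
            break;
        }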


 diff --git a/block/sheepdog.c b/block/sheepdog.c
 index 987018e..e852d4e 100644
 --- a/block/sheepdog.c
 +++ b/block/sheepdog.c
 @@ -34,6 +34,7 @@
  #define SD_OP_GET_VDI_INFO   0x14
  #define SD_OP_READ_VDIS  0x15
  #define SD_OP_FLUSH_VDI  0x16
  +#define SD_OP_DISCARD    0x17

This is an opcode for objects, so I prefer SD_OP_DISCARD_OBJ.

  
 +static int sd_co_discard(BlockDriverState *bs, int64_t sector_num,
 + int nb_sectors)

Should add coroutine_fn.

Thanks,

Kazutaka



Re: [Qemu-devel] [sheepdog] [PATCH v4] sheepdog: add discard/trim support for sheepdog

2013-04-15 Thread MORITA Kazutaka
At Mon, 15 Apr 2013 23:52:40 +0800,
Liu Yuan wrote:
 
 diff --git a/block/sheepdog.c b/block/sheepdog.c
 index 987018e..362244a 100644
 --- a/block/sheepdog.c
 +++ b/block/sheepdog.c
 @@ -34,6 +34,7 @@
  #define SD_OP_GET_VDI_INFO   0x14
  #define SD_OP_READ_VDIS  0x15
  #define SD_OP_FLUSH_VDI  0x16
  +#define SD_OP_DISCARD_OBJ    0x17

I think SD_OP_DISCARD_OBJ should be 0x05 since we are using a value
less than 0x10 for SD_OP_*_OBJ.
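
That is, keep it grouped with the other object opcodes, which all sit
below 0x10, something like this (the neighbouring values are quoted from
block/sheepdog.c as I remember them, so double-check them):

    #define SD_OP_CREATE_AND_WRITE_OBJ  0x01
    #define SD_OP_READ_OBJ       0x02
    #define SD_OP_WRITE_OBJ      0x03
    #define SD_OP_DISCARD_OBJ    0x05

    #define SD_OP_NEW_VDI        0x11   /* VDI opcodes live at 0x10+ */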

The other parts look good to me.

Thanks,

Kazutaka



Re: [Qemu-devel] [sheepdog] [PATCH v5] sheepdog: add discard/trim support for sheepdog

2013-04-15 Thread MORITA Kazutaka
At Tue, 16 Apr 2013 00:15:04 +0800,
Liu Yuan wrote:
 
 From: Liu Yuan tailai...@taobao.com
 
 The 'TRIM' command from the VM, which releases underlying data storage for
 better thin-provisioning, is already supported by Sheepdog.
 
 This patch adds the TRIM support on the QEMU side.
 
 For older Sheepdog versions that don't support it, we return 0 (success)
 to the upper layer.
 
 Cc: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
 Cc: Kevin Wolf kw...@redhat.com
 Cc: Stefan Hajnoczi stefa...@redhat.com
 Cc: Paolo Bonzini pbonz...@redhat.com
 Signed-off-by: Liu Yuan tailai...@taobao.com
 ---
 v5:
  - adjust macro numbering
 v4:
  - adjust discard macro
  - return success when operation is not supported by sheep
  - add coroutine_fn marker
 v3:
  - fix a silly accidental deletion of 'default' in switch clause.
 v2:
  - skip the object when it is not allocated
 
  block/sheepdog.c |   56 
 +-
  1 file changed, 55 insertions(+), 1 deletion(-)

Reviewed-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp



Re: [Qemu-devel] [PATCH] sheepdog: show error message for halt status

2013-03-18 Thread MORITA Kazutaka
At Mon, 18 Mar 2013 14:27:55 +0800,
Liu Yuan wrote:
 
 From: Liu Yuan tailai...@taobao.com
 
 Sheepdog (in neither quorum nor unsafe mode) will refuse to serve IO
 requests when the number of alive nodes is less than the number of copies
 specified by users. This returns 0x19 to the QEMU client, which currently
 doesn't recognize it.
 
 This patch adds an error description for when the QEMU client receives it,
 rather than plainly printing 'Invalid error code'.
 
 Cc: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
 Cc: Kevin Wolf kw...@redhat.com
 Cc: Stefan Hajnoczi stefa...@redhat.com
 Signed-off-by: Liu Yuan tailai...@taobao.com
 ---
  block/sheepdog.c |2 ++
  1 file changed, 2 insertions(+)

Reviewed-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp



[Qemu-devel] [PATCH v2 0/2] sheepdog: don't sleep in coroutine context

2013-03-12 Thread MORITA Kazutaka
This patch set prevents the sheepdog driver from sleeping in coroutine
context for a long time.

The first patch makes the driver use a non-blocking socket and the
second one fixes a bug where yielded coroutines aren't entered.

Changes from v1:
 - add a patch to use non-blocking fd
 - add explanation why it is safe to set io_flush to NULL


MORITA Kazutaka (2):
  sheepdog: use non-blocking fd in coroutine context
  sheepdog: set io_flush handler in do_co_req

 block/sheepdog.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

-- 
1.8.1.3.566.gaa39828




[Qemu-devel] [PATCH v2 1/2] sheepdog: use non-blocking fd in coroutine context

2013-03-12 Thread MORITA Kazutaka
Using a blocking socket in the coroutine context reduces the chance of
switching to other work.  This patch makes the sheepdog driver use a
non-blocking fd always.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index c711c28..27abef2 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -468,6 +468,8 @@ static int connect_to_sdog(BDRVSheepdogState *s)
 if (err != NULL) {
 qerror_report_err(err);
 error_free(err);
+} else {
+socket_set_nonblock(fd);
 }
 
 return fd;
@@ -523,7 +525,6 @@ static coroutine_fn void do_co_req(void *opaque)
 co = qemu_coroutine_self();
 qemu_aio_set_fd_handler(sockfd, NULL, restart_co_req, NULL, co);
 
-socket_set_block(sockfd);
 ret = send_co_req(sockfd, hdr, data, wlen);
 if (ret < 0) {
 goto out;
@@ -553,7 +554,6 @@ static coroutine_fn void do_co_req(void *opaque)
 ret = 0;
 out:
 qemu_aio_set_fd_handler(sockfd, NULL, NULL, NULL, NULL);
-socket_set_nonblock(sockfd);
 
 srco->ret = ret;
 srco->finished = true;
@@ -776,8 +776,6 @@ static int get_sheep_fd(BDRVSheepdogState *s)
 return fd;
 }
 
-socket_set_nonblock(fd);
-
 qemu_aio_set_fd_handler(fd, co_read_response, NULL, aio_flush_request, s);
 return fd;
 }
-- 
1.8.1.3.566.gaa39828




[Qemu-devel] [PATCH v2 2/2] sheepdog: set io_flush handler in do_co_req

2013-03-12 Thread MORITA Kazutaka
If an io_flush handler is not set, qemu_aio_wait doesn't invoke
callbacks.
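
For context, qemu_aio_wait() decides whether there is anything to wait
for by asking each registered fd's io_flush handler; a simplified sketch
of that check (illustrative only, not the actual aio.c code):

    bool busy = false;
    AioHandler *node;

    QLIST_FOREACH(node, &aio_handlers, node) {
        /* an fd registered without an io_flush handler never counts
         * as busy, so it alone cannot keep qemu_aio_wait() polling */
        if (node->io_flush && node->io_flush(node->opaque)) {
            busy = true;
        }
    }
    if (!busy) {
        return false;   /* nothing pending: no poll, no callbacks */
    }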

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 27abef2..4245328 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -501,6 +501,13 @@ static void restart_co_req(void *opaque)
 qemu_coroutine_enter(co, NULL);
 }
 
+static int have_co_req(void *opaque)
+{
+/* this handler is set only when there is a pending request, so
+ * always returns 1. */
+return 1;
+}
+
 typedef struct SheepdogReqCo {
 int sockfd;
 SheepdogReq *hdr;
@@ -523,14 +530,14 @@ static coroutine_fn void do_co_req(void *opaque)
 unsigned int *rlen = srco->rlen;
 
 co = qemu_coroutine_self();
-qemu_aio_set_fd_handler(sockfd, NULL, restart_co_req, NULL, co);
+qemu_aio_set_fd_handler(sockfd, NULL, restart_co_req, have_co_req, co);
 
 ret = send_co_req(sockfd, hdr, data, wlen);
 if (ret < 0) {
 goto out;
 }
 
-qemu_aio_set_fd_handler(sockfd, restart_co_req, NULL, NULL, co);
+qemu_aio_set_fd_handler(sockfd, restart_co_req, NULL, have_co_req, co);
 
 ret = qemu_co_recv(sockfd, hdr, sizeof(*hdr));
 if (ret < sizeof(*hdr)) {
@@ -553,6 +560,8 @@ static coroutine_fn void do_co_req(void *opaque)
 }
 ret = 0;
 out:
+/* there is at most one request for this sockfd, so it is safe to
+ * set each handler to NULL. */
 qemu_aio_set_fd_handler(sockfd, NULL, NULL, NULL, NULL);
 
 srco->ret = ret;
-- 
1.8.1.3.566.gaa39828




[Qemu-devel] [PATCH] sheepdog: set io_flush handler in do_co_req

2013-03-11 Thread MORITA Kazutaka
If an io_flush handler is not set, qemu_aio_wait doesn't invoke
callbacks.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index e4ec32d..cb0eeed 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -501,6 +501,13 @@ static void restart_co_req(void *opaque)
 qemu_coroutine_enter(co, NULL);
 }
 
+static int have_co_req(void *opaque)
+{
+/* this handler is set only when there is a pending request, so
+ * always returns 1. */
+return 1;
+}
+
 typedef struct SheepdogReqCo {
 int sockfd;
 SheepdogReq *hdr;
@@ -523,14 +530,14 @@ static coroutine_fn void do_co_req(void *opaque)
 unsigned int *rlen = srco->rlen;
 
 co = qemu_coroutine_self();
-qemu_aio_set_fd_handler(sockfd, NULL, restart_co_req, NULL, co);
+qemu_aio_set_fd_handler(sockfd, NULL, restart_co_req, have_co_req, co);
 
 ret = send_co_req(sockfd, hdr, data, wlen);
 if (ret < 0) {
 goto out;
 }
 
-qemu_aio_set_fd_handler(sockfd, restart_co_req, NULL, NULL, co);
+qemu_aio_set_fd_handler(sockfd, restart_co_req, NULL, have_co_req, co);
 
 ret = qemu_co_recv(sockfd, hdr, sizeof(*hdr));
 if (ret < sizeof(*hdr)) {
-- 
1.8.1.3.566.gaa39828



