Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2014-01-03 Thread Peter Lieven

On 20.12.2013 17:27, Stefan Hajnoczi wrote:

On Fri, Dec 20, 2013 at 4:57 PM, Peter Lieven p...@kamp.de wrote:

On 20.12.2013 16:54, Stefan Hajnoczi wrote:

On Fri, Dec 20, 2013 at 4:49 PM, Peter Lieven p...@kamp.de wrote:

On 20.12.2013 16:30, Stefan Hajnoczi wrote:

On Fri, Dec 20, 2013 at 3:43 PM, Peter Lieven p...@kamp.de wrote:

On 20.12.2013 15:38, Stefan Hajnoczi wrote:

On Fri, Dec 20, 2013 at 3:07 PM, Peter Lieven p...@kamp.de wrote:

On 20.12.2013 14:57, Stefan Hajnoczi wrote:

On Fri, Dec 20, 2013 at 1:53 PM, Peter Lieven p...@kamp.de wrote:

On 20.12.2013 13:19, Stefan Hajnoczi wrote:

On Fri, Dec 20, 2013 at 10:48:41AM +0100, Peter Lieven wrote:

On 17.12.2013 17:47, Stefan Hajnoczi wrote:

On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote:

+    /* set to -ENOTSUP since bdrv_allocated_file_size is only used
+     * in qemu-img open. So we can use the cached value for allocate
+     * filesize obtained from fstat at open time */
+    client->allocated_file_size = -ENOTSUP;

Can you implement this fully?  By stubbing it out like this we won't be
able to call get_allocated_file_size() at runtime in the future without
updating the nfs block driver code.  It's just an fstat call, shouldn't
be too hard to implement properly :).

It seems I have to leave it as is currently. bdrv_get_allocated_file_size
is not in a coroutine context. I get coroutine yields to no one.

Create a coroutine and pump the event loop until it has reached
completion:

co = qemu_coroutine_create(my_coroutine_fn, ...);
qemu_coroutine_enter(co, foo);
while (!complete) {
  qemu_aio_wait();
}

See block.c for similar examples.
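
Spelled out, the pattern looks like this (a minimal sketch: MyData,
my_coroutine_fn and the work inside it are placeholders, while
qemu_coroutine_create(), qemu_coroutine_enter() and qemu_aio_wait() are
the real QEMU API of this era):

typedef struct MyData {
    BlockDriverState *bs;
    int64_t ret;
    bool complete;
} MyData;

static void coroutine_fn my_coroutine_fn(void *opaque)
{
    MyData *data = opaque;

    /* ... do the real work here, yielding while waiting for I/O ... */
    data->ret = 0;
    data->complete = true;    /* lets the pumping loop below exit */
}

static int64_t run_coroutine_sync(BlockDriverState *bs)
{
    MyData data = { .bs = bs };
    Coroutine *co = qemu_coroutine_create(my_coroutine_fn);

    qemu_coroutine_enter(co, &data);
    while (!data.complete) {
        qemu_aio_wait();      /* pump the event loop until it finishes */
    }
    return data.ret;
}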

Wouldn't it make sense to make this modification to
bdrv_get_allocated_file_size in block.c rather than in block/nfs.c and
in the future potentially other drivers?

If yes, I would ask you to take v3 of the NFS protocol patch and I
promise to send a follow-up early next year to make this modification
to block.c and change block/nfs.c and other implementations to be a
coroutine_fn.

.bdrv_get_allocated_file_size() implementations in other block drivers
are synchronous.  Making the block driver interface use coroutines
would be wrong unless all the block drivers were updated to use
coroutines too.

I can do that. I think it's not too complicated because all those
implementations do not rely on callbacks. It should be possible
to just rename the existing implementations to, let's say,
.bdrv_co_get_allocated_file_size and call them inside a coroutine.

No, that would be wrong because coroutine functions should not block.
The point of coroutines is that if they cannot proceed they must yield
so the event loop regains control.  If you simply rename the function
to _co_ then they will block the event loop and not be true coroutine
functions.
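
To illustrate the difference: a true coroutine_fn hands the slow
operation to an async API and yields, and the completion callback
re-enters the coroutine from the event loop. A minimal sketch (MyTask,
my_cb and do_something_async are placeholder names, not real API):

typedef struct MyTask {
    Coroutine *co;
    int ret;
    bool complete;
} MyTask;

/* completion callback, invoked from the event loop */
static void my_cb(int ret, void *opaque)
{
    MyTask *task = opaque;

    task->ret = ret;
    task->complete = true;
    qemu_coroutine_enter(task->co, NULL);  /* wake the waiting coroutine */
}

static int coroutine_fn my_co_op(void)
{
    MyTask task = {
        .co = qemu_coroutine_self(),
    };

    do_something_async(my_cb, &task);      /* placeholder async call */
    while (!task.complete) {
        qemu_coroutine_yield();            /* let the event loop run */
    }
    return task.ret;
}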


Can you just call nfs_fstat() (the sync libnfs interface)?

I can only do that if it's guaranteed that no other requests are in
flight, otherwise it will mess up.

How will it mess up?

The sync calls into libnfs are just wrappers around the async calls.
The problem is that this wrapper will handle all the callbacks for the
in-flight requests and they will never return.

So back to my original suggestion to use a qemu_aio_wait() loop in
block/nfs.c?

Sorry, I cannot follow you. But maybe this here is a short solution. The
question is, what will happen when there are pending requests which invoke
callbacks? Will we end up returning from qemu_aio_wait()? In the qemu-img
info case this here works:

static int64_t nfs_get_allocated_file_size(BlockDriverState *bs)
{
    NFSClient *client = bs->opaque;
    NFSRPC task = {0};
    struct stat st;

    task.st = &st;
    if (nfs_fstat_async(client->context, client->fh, nfs_co_generic_cb,
                        &task) != 0) {
        return -ENOMEM;
    }

    while (!task.complete) {
        nfs_set_events(client);
        qemu_aio_wait();
    }

    return (task.status < 0 ? task.status : st.st_blocks * st.st_blksize);
}

Yes, that's exactly what I had in mind.

Yes, other callbacks will get called and requests will complete in
this event loop.  That can be a problem in some scenarios but should
be okay with bdrv_get_allocated_file_size().

ok I will send v4 and then prepare for the holidays ;-)

Happy holidays!  I already sent out the block pull request earlier
today but your patch will be included in the first January pull
request.

If you pick this up please take v5.

Peter



Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-20 Thread Peter Lieven

On 17.12.2013 17:47, Stefan Hajnoczi wrote:

On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote:

This patch adds native support for accessing images on NFS shares without
the requirement to actually mount the entire NFS share on the host.

NFS images can simply be specified by a URL of the form:
nfs://host/export/filename

For example:
qemu-img create -f qcow2 nfs://10.0.0.1/qemu-images/test.qcow2

You need libnfs from Ronnie Sahlberg available at:
git://github.com/sahlberg/libnfs.git
for this to work.

During configure, libnfs is automatically probed for and support is
enabled on the fly. You can forbid or enforce libnfs support
with --disable-libnfs or --enable-libnfs respectively.

Due to NFS restrictions you might need to execute your binaries
as root, allow them to open privileged ports (< 1024) or specify the
insecure option on the NFS server.
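
For the last point, the insecure export option is what allows clients
connecting from source ports >= 1024. A hypothetical /etc/exports entry
matching the example above, for illustration only:

/qemu-images 10.0.0.0/24(rw,insecure)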

Signed-off-by: Peter Lieven p...@kamp.de
---
v1->v2:
  - fixed block/Makefile.objs [Ronnie]
  - do not always register a read handler [Ronnie]
  - add support for reading beyond EOF [Fam] (sketched below)
  - fixed struct and parameter naming [Fam]
  - fixed overlong lines and whitespace errors [Fam]
  - return the return status from libnfs wherever possible [Fam]
  - added comment why we set allocated_file_size to -ENOTSUP after write [Fam]
  - avoid segfault when parsing filename [Fam]
  - remove unused close_bh from NFSClient [Fam]
  - avoid dividing and multiplying total_size by BDRV_SECTOR_SIZE in
    nfs_file_create [Fam]

  MAINTAINERS |5 +
  block/Makefile.objs |1 +
  block/nfs.c |  419 +++
  configure   |   38 +
  4 files changed, 463 insertions(+)
  create mode 100644 block/nfs.c
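
A side note on the "add support for reading beyond EOF" item above,
sketched here: an NFS read past the end of the file returns fewer bytes
than requested, so the usual technique is to zero-fill the rest of the
buffer. A minimal sketch in terms of the read-path counterpart of the
nfs_co_writev() quoted below; qemu_iovec_memset() is QEMU's stock iovec
helper, the variable names are assumptions:

    /* task.status holds the byte count returned by libnfs; zero-pad
     * short reads past EOF */
    if (task.status >= 0 && task.status < nb_sectors * BDRV_SECTOR_SIZE) {
        qemu_iovec_memset(iov, task.status, 0,
                          nb_sectors * BDRV_SECTOR_SIZE - task.status);
    }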

Which NFS protocol versions are supported by current libnfs?


+#include <poll.h>

Why is this header included?


+typedef struct nfsclient {

Please either drop the struct tag or use NFSClient.


+static void
+nfs_co_generic_cb(int status, struct nfs_context *nfs, void *data,
+                  void *private_data)
+{
+    NFSTask *Task = private_data;

lowercase task local variable name please.


+static int coroutine_fn nfs_co_writev(BlockDriverState *bs,
+                                      int64_t sector_num, int nb_sectors,
+                                      QEMUIOVector *iov)
+{
+    NFSClient *client = bs->opaque;
+    NFSTask task;
+    char *buf = NULL;
+
+    nfs_co_init_task(client, &task);
+
+    buf = g_malloc(nb_sectors * BDRV_SECTOR_SIZE);
+    qemu_iovec_to_buf(iov, 0, buf, nb_sectors * BDRV_SECTOR_SIZE);
+
+    if (nfs_pwrite_async(client->context, client->fh,
+                         sector_num * BDRV_SECTOR_SIZE,
+                         nb_sectors * BDRV_SECTOR_SIZE,
+                         buf, nfs_co_generic_cb, &task) != 0) {
+        g_free(buf);
+        return -EIO;

Can we get a more detailed errno here?  (e.g. ENOSPC)


+    }
+
+    while (!task.complete) {
+        nfs_set_events(client);
+        qemu_coroutine_yield();
+    }
+
+    g_free(buf);
+
+    if (task.status != nb_sectors * BDRV_SECTOR_SIZE) {
+        return task.status < 0 ? task.status : -EIO;
+    }
+
+    bs->total_sectors = MAX(bs->total_sectors, sector_num + nb_sectors);

Why is this necessary?  block.c will update bs->total_sectors if the
file is growable.


+    /* set to -ENOTSUP since bdrv_allocated_file_size is only used
+     * in qemu-img open. So we can use the cached value for allocate
+     * filesize obtained from fstat at open time */
+    client->allocated_file_size = -ENOTSUP;

Can you implement this fully?  By stubbing it out like this we won't be
able to call get_allocated_file_size() at runtime in the future without
updating the nfs block driver code.  It's just an fstat call, shouldn't
be too hard to implement properly :).


It seems I have to leave it as is currently. bdrv_get_allocated_file_size
is not in a coroutine context. I get coroutine yields to no one.

Peter





Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-20 Thread Stefan Hajnoczi
On Fri, Dec 20, 2013 at 10:48:41AM +0100, Peter Lieven wrote:
 On 17.12.2013 17:47, Stefan Hajnoczi wrote:
 On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote:
 This patch adds native support for accessing images on NFS shares without
 the requirement to actually mount the entire NFS share on the host.
 
 NFS images can simply be specified by a URL of the form:
 nfs://host/export/filename
 
 For example:
 qemu-img create -f qcow2 nfs://10.0.0.1/qemu-images/test.qcow2
 
 You need libnfs from Ronnie Sahlberg available at:
 git://github.com/sahlberg/libnfs.git
 for this to work.
 
 During configure, libnfs is automatically probed for and support is
 enabled on the fly. You can forbid or enforce libnfs support
 with --disable-libnfs or --enable-libnfs respectively.
 
 Due to NFS restrictions you might need to execute your binaries
 as root, allow them to open privileged ports (< 1024) or specify the
 insecure option on the NFS server.
 
 Signed-off-by: Peter Lieven p...@kamp.de
 ---
 v1->v2:
   - fixed block/Makefile.objs [Ronnie]
   - do not always register a read handler [Ronnie]
   - add support for reading beyond EOF [Fam]
   - fixed struct and parameter naming [Fam]
   - fixed overlong lines and whitespace errors [Fam]
   - return the return status from libnfs wherever possible [Fam]
   - added comment why we set allocated_file_size to -ENOTSUP after write
  [Fam]
   - avoid segfault when parsing filename [Fam]
   - remove unused close_bh from NFSClient [Fam]
   - avoid dividing and multiplying total_size by BDRV_SECTOR_SIZE in
  nfs_file_create [Fam]
 
   MAINTAINERS |5 +
   block/Makefile.objs |1 +
   block/nfs.c |  419 +++
   configure   |   38 +
   4 files changed, 463 insertions(+)
   create mode 100644 block/nfs.c
 Which NFS protocol versions are supported by current libnfs?
 
 +#include <poll.h>
 Why is this header included?
 
 +typedef struct nfsclient {
 Please either drop the struct tag or use NFSClient.
 
 +static void
 +nfs_co_generic_cb(int status, struct nfs_context *nfs, void *data,
 +                  void *private_data)
 +{
 +    NFSTask *Task = private_data;
 lowercase task local variable name please.
 
 +static int coroutine_fn nfs_co_writev(BlockDriverState *bs,
 +                                      int64_t sector_num, int nb_sectors,
 +                                      QEMUIOVector *iov)
 +{
 +    NFSClient *client = bs->opaque;
 +    NFSTask task;
 +    char *buf = NULL;
 +
 +    nfs_co_init_task(client, &task);
 +
 +    buf = g_malloc(nb_sectors * BDRV_SECTOR_SIZE);
 +    qemu_iovec_to_buf(iov, 0, buf, nb_sectors * BDRV_SECTOR_SIZE);
 +
 +    if (nfs_pwrite_async(client->context, client->fh,
 +                         sector_num * BDRV_SECTOR_SIZE,
 +                         nb_sectors * BDRV_SECTOR_SIZE,
 +                         buf, nfs_co_generic_cb, &task) != 0) {
 +        g_free(buf);
 +        return -EIO;
 Can we get a more detailed errno here?  (e.g. ENOSPC)
 
 +    }
 +
 +    while (!task.complete) {
 +        nfs_set_events(client);
 +        qemu_coroutine_yield();
 +    }
 +
 +    g_free(buf);
 +
 +    if (task.status != nb_sectors * BDRV_SECTOR_SIZE) {
 +        return task.status < 0 ? task.status : -EIO;
 +    }
 +
 +    bs->total_sectors = MAX(bs->total_sectors, sector_num + nb_sectors);
 Why is this necessary?  block.c will update bs->total_sectors if the
 file is growable.
 
 +    /* set to -ENOTSUP since bdrv_allocated_file_size is only used
 +     * in qemu-img open. So we can use the cached value for allocate
 +     * filesize obtained from fstat at open time */
 +    client->allocated_file_size = -ENOTSUP;
 Can you implement this fully?  By stubbing it out like this we won't be
 able to call get_allocated_file_size() at runtime in the future without
 updating the nfs block driver code.  It's just an fstat call, shouldn't
 be too hard to implement properly :).
 
 It seems I have to leave it as is currently. bdrv_get_allocated_file_size
 is not in a coroutine context. I get coroutine yields to no one.

Create a coroutine and pump the event loop until it has reached
completion:

co = qemu_coroutine_create(my_coroutine_fn, ...);
qemu_coroutine_enter(co, foo);
while (!complete) {
qemu_aio_wait();
}

See block.c for similar examples.

Stefan



Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-20 Thread Peter Lieven

On 20.12.2013 13:19, Stefan Hajnoczi wrote:

On Fri, Dec 20, 2013 at 10:48:41AM +0100, Peter Lieven wrote:

On 17.12.2013 17:47, Stefan Hajnoczi wrote:

On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote:

This patch adds native support for accessing images on NFS shares without
the requirement to actually mount the entire NFS share on the host.

NFS images can simply be specified by a URL of the form:
nfs://host/export/filename

For example:
qemu-img create -f qcow2 nfs://10.0.0.1/qemu-images/test.qcow2

You need libnfs from Ronnie Sahlberg available at:
git://github.com/sahlberg/libnfs.git
for this to work.

During configure, libnfs is automatically probed for and support is
enabled on the fly. You can forbid or enforce libnfs support
with --disable-libnfs or --enable-libnfs respectively.

Due to NFS restrictions you might need to execute your binaries
as root, allow them to open privileged ports (< 1024) or specify the
insecure option on the NFS server.

Signed-off-by: Peter Lieven p...@kamp.de
---
v1->v2:
  - fixed block/Makefile.objs [Ronnie]
  - do not always register a read handler [Ronnie]
  - add support for reading beyond EOF [Fam]
  - fixed struct and parameter naming [Fam]
  - fixed overlong lines and whitespace errors [Fam]
  - return the return status from libnfs wherever possible [Fam]
  - added comment why we set allocated_file_size to -ENOTSUP after write [Fam]
  - avoid segfault when parsing filename [Fam]
  - remove unused close_bh from NFSClient [Fam]
  - avoid dividing and multiplying total_size by BDRV_SECTOR_SIZE in
nfs_file_create [Fam]

  MAINTAINERS |5 +
  block/Makefile.objs |1 +
  block/nfs.c |  419 +++
  configure   |   38 +
  4 files changed, 463 insertions(+)
  create mode 100644 block/nfs.c

Which NFS protocol versions are supported by current libnfs?


+#include <poll.h>

Why is this header included?


+typedef struct nfsclient {

Please either drop the struct tag or use NFSClient.


+static void
+nfs_co_generic_cb(int status, struct nfs_context *nfs, void *data,
+                  void *private_data)
+{
+    NFSTask *Task = private_data;

lowercase task local variable name please.


+static int coroutine_fn nfs_co_writev(BlockDriverState *bs,
+                                      int64_t sector_num, int nb_sectors,
+                                      QEMUIOVector *iov)
+{
+    NFSClient *client = bs->opaque;
+    NFSTask task;
+    char *buf = NULL;
+
+    nfs_co_init_task(client, &task);
+
+    buf = g_malloc(nb_sectors * BDRV_SECTOR_SIZE);
+    qemu_iovec_to_buf(iov, 0, buf, nb_sectors * BDRV_SECTOR_SIZE);
+
+    if (nfs_pwrite_async(client->context, client->fh,
+                         sector_num * BDRV_SECTOR_SIZE,
+                         nb_sectors * BDRV_SECTOR_SIZE,
+                         buf, nfs_co_generic_cb, &task) != 0) {
+        g_free(buf);
+        return -EIO;

Can we get a more detailed errno here?  (e.g. ENOSPC)


+    }
+
+    while (!task.complete) {
+        nfs_set_events(client);
+        qemu_coroutine_yield();
+    }
+
+    g_free(buf);
+
+    if (task.status != nb_sectors * BDRV_SECTOR_SIZE) {
+        return task.status < 0 ? task.status : -EIO;
+    }
+
+    bs->total_sectors = MAX(bs->total_sectors, sector_num + nb_sectors);

Why is this necessary?  block.c will update bs->total_sectors if the
file is growable.


+    /* set to -ENOTSUP since bdrv_allocated_file_size is only used
+     * in qemu-img open. So we can use the cached value for allocate
+     * filesize obtained from fstat at open time */
+    client->allocated_file_size = -ENOTSUP;

Can you implement this fully?  By stubbing it out like this we won't be
able to call get_allocated_file_size() at runtime in the future without
updating the nfs block driver code.  It's just an fstat call, shouldn't
be too hard to implement properly :).

It seems I have to leave it as is currently. bdrv_get_allocated_file_size
is not in a coroutine context. I get coroutine yields to no one.

Create a coroutine and pump the event loop until it has reached
completion:

co = qemu_coroutine_create(my_coroutine_fn, ...);
qemu_coroutine_enter(co, foo);
while (!complete) {
 qemu_aio_wait();
}

See block.c for similar examples.

Wouldn't it make sense to make this modification to
bdrv_get_allocated_file_size in block.c rather than in block/nfs.c and
in the future potentially other drivers?

If yes, I would ask you to take v3 of the NFS protocol patch and I
promise to send a follow-up early next year to make this modification
to block.c and change block/nfs.c and other implementations to be a
coroutine_fn.

Thanks
Peter



Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-20 Thread Stefan Hajnoczi
On Fri, Dec 20, 2013 at 1:53 PM, Peter Lieven p...@kamp.de wrote:
 On 20.12.2013 13:19, Stefan Hajnoczi wrote:

 On Fri, Dec 20, 2013 at 10:48:41AM +0100, Peter Lieven wrote:

 On 17.12.2013 17:47, Stefan Hajnoczi wrote:

 On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote:
 +    /* set to -ENOTSUP since bdrv_allocated_file_size is only used
 +     * in qemu-img open. So we can use the cached value for allocate
 +     * filesize obtained from fstat at open time */
 +    client->allocated_file_size = -ENOTSUP;

 Can you implement this fully?  By stubbing it out like this we won't be
 able to call get_allocated_file_size() at runtime in the future without
 updating the nfs block driver code.  It's just an fstat call, shouldn't
 be too hard to implement properly :).

 It seems I have to leave it as is currently. bdrv_get_allocated_file_size
 is not in a coroutine context. I get coroutine yields to no one.

 Create a coroutine and pump the event loop until it has reached
 completion:

 co = qemu_coroutine_create(my_coroutine_fn, ...);
 qemu_coroutine_enter(co, foo);
 while (!complete) {
  qemu_aio_wait();
 }

 See block.c for similar examples.

 Wouldn't it make sense to make this modification to
 bdrv_get_allocated_file_size in block.c rather than in block/nfs.c and
 in the future potentially other drivers?

 If yes, I would ask you to take v3 of the NFS protocol patch and I
 promise to send a follow-up early next year to make this modification
 to block.c and change block/nfs.c and other implementations to be a
 coroutine_fn.

.bdrv_get_allocated_file_size() implementations in other block drivers
are synchronous.  Making the block driver interface use coroutines
would be wrong unless all the block drivers were updated to use
coroutines too.

Can you just call nfs_fstat() (the sync libnfs interface)?

Stefan



Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-20 Thread Peter Lieven

On 20.12.2013 14:57, Stefan Hajnoczi wrote:

On Fri, Dec 20, 2013 at 1:53 PM, Peter Lieven p...@kamp.de wrote:

On 20.12.2013 13:19, Stefan Hajnoczi wrote:

On Fri, Dec 20, 2013 at 10:48:41AM +0100, Peter Lieven wrote:

On 17.12.2013 17:47, Stefan Hajnoczi wrote:

On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote:

+    /* set to -ENOTSUP since bdrv_allocated_file_size is only used
+     * in qemu-img open. So we can use the cached value for allocate
+     * filesize obtained from fstat at open time */
+    client->allocated_file_size = -ENOTSUP;

Can you implement this fully?  By stubbing it out like this we won't be
able to call get_allocated_file_size() at runtime in the future without
updating the nfs block driver code.  It's just an fstat call, shouldn't
be too hard to implement properly :).

It seems I have to leave it as is currently. bdrv_get_allocated_file_size
is not in a coroutine context. I get coroutine yields to no one.

Create a coroutine and pump the event loop until it has reached
completion:

co = qemu_coroutine_create(my_coroutine_fn, ...);
qemu_coroutine_enter(co, foo);
while (!complete) {
  qemu_aio_wait();
}

See block.c for similar examples.

Wouldn't it make sense to make this modification to
bdrv_get_allocated_file_size in block.c rather than in block/nfs.c and
in the future potentially other drivers?

If yes, I would ask you to take v3 of the NFS protocol patch and I
promise to send a follow-up early next year to make this modification
to block.c and change block/nfs.c and other implementations to be a
coroutine_fn.

.bdrv_get_allocated_file_size() implementations in other block drivers
are synchronous.  Making the block driver interface use coroutines
would be wrong unless all the block drivers were updated to use
coroutines too.

I can do that. I think it's not too complicated because all those
implementations do not rely on callbacks. It should be possible
to just rename the existing implementations to, let's say,
.bdrv_co_get_allocated_file_size and call them inside a coroutine.


Can you just call nfs_fstat() (the sync libnfs interface)?

I can only do that if it's guaranteed that no other requests are in flight,
otherwise it will mess up.

Peter




Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-20 Thread Stefan Hajnoczi
On Fri, Dec 20, 2013 at 3:07 PM, Peter Lieven p...@kamp.de wrote:
 On 20.12.2013 14:57, Stefan Hajnoczi wrote:

 On Fri, Dec 20, 2013 at 1:53 PM, Peter Lieven p...@kamp.de wrote:

 On 20.12.2013 13:19, Stefan Hajnoczi wrote:

 On Fri, Dec 20, 2013 at 10:48:41AM +0100, Peter Lieven wrote:

 On 17.12.2013 17:47, Stefan Hajnoczi wrote:

 On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote:

 +    /* set to -ENOTSUP since bdrv_allocated_file_size is only used
 +     * in qemu-img open. So we can use the cached value for allocate
 +     * filesize obtained from fstat at open time */
 +    client->allocated_file_size = -ENOTSUP;

 Can you implement this fully?  By stubbing it out like this we won't
 be
 able to call get_allocated_file_size() at runtime in the future
 without
 updating the nfs block driver code.  It's just an fstat call,
 shouldn't
 be too hard to implement properly :).

 It seems I have to leave it as is currently.
 bdrv_get_allocated_file_size
 is not in a coroutine context. I get coroutine yields to no one.

 Create a coroutine and pump the event loop until it has reached
 completion:

 co = qemu_coroutine_create(my_coroutine_fn, ...);
 qemu_coroutine_enter(co, foo);
 while (!complete) {
   qemu_aio_wait();
 }

 See block.c for similar examples.

 Wouldn't it make sense to make this modification to
 bdrv_get_allocated_file_size in block.c rather than in block/nfs.c and
 in the future potentially other drivers?

 If yes, I would ask you to take v3 of the NFS protocol patch and I
 promise to send a follow-up early next year to make this modification
 to block.c and change block/nfs.c and other implementations to be a
 coroutine_fn.

 .bdrv_get_allocated_file_size() implementations in other block drivers
 are synchronous.  Making the block driver interface use coroutines
 would be wrong unless all the block drivers were updated to use
 coroutines too.

 I can do that. I think it's not too complicated because all those
 implementations do not rely on callbacks. It should be possible
 to just rename the existing implementations to, let's say,
 .bdrv_co_get_allocated_file_size and call them inside a coroutine.

No, that would be wrong because coroutine functions should not block.
The point of coroutines is that if they cannot proceed they must yield
so the event loop regains control.  If you simply rename the function
to _co_ then they will block the event loop and not be true coroutine
functions.


 Can you just call nfs_fstat() (the sync libnfs interface)?

 I can only do that if it's guaranteed that no other requests are in flight,
 otherwise it will mess up.

How will it mess up?

Stefan



Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-20 Thread Peter Lieven

On 20.12.2013 15:38, Stefan Hajnoczi wrote:

On Fri, Dec 20, 2013 at 3:07 PM, Peter Lieven p...@kamp.de wrote:

On 20.12.2013 14:57, Stefan Hajnoczi wrote:

On Fri, Dec 20, 2013 at 1:53 PM, Peter Lieven p...@kamp.de wrote:

On 20.12.2013 13:19, Stefan Hajnoczi wrote:

On Fri, Dec 20, 2013 at 10:48:41AM +0100, Peter Lieven wrote:

On 17.12.2013 17:47, Stefan Hajnoczi wrote:

On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote:

+    /* set to -ENOTSUP since bdrv_allocated_file_size is only used
+     * in qemu-img open. So we can use the cached value for allocate
+     * filesize obtained from fstat at open time */
+    client->allocated_file_size = -ENOTSUP;

Can you implement this fully?  By stubbing it out like this we won't
be
able to call get_allocated_file_size() at runtime in the future
without
updating the nfs block driver code.  It's just an fstat call,
shouldn't
be too hard to implement properly :).

It seems I have to leave it as is currently.
bdrv_get_allocated_file_size
is not in a coroutine context. I get coroutine yields to no one.

Create a coroutine and pump the event loop until it has reached
completion:

co = qemu_coroutine_create(my_coroutine_fn, ...);
qemu_coroutine_enter(co, foo);
while (!complete) {
   qemu_aio_wait();
}

See block.c for similar examples.

Wouldn't it make sense to make this modification to
bdrv_get_allocated_file_size in block.c rather than in block/nfs.c and
in the future potentially other drivers?

If yes, I would ask you to take v3 of the NFS protocol patch and I
promise to send a follow-up early next year to make this modification
to block.c and change block/nfs.c and other implementations to be a
coroutine_fn.

.bdrv_get_allocated_file_size() implementations in other block drivers
are synchronous.  Making the block driver interface use coroutines
would be wrong unless all the block drivers were updated to use
coroutines too.

I can do that. I think it's not too complicated because all those
implementations do not rely on callbacks. It should be possible
to just rename the existing implementations to, let's say,
.bdrv_co_get_allocated_file_size and call them inside a coroutine.

No, that would be wrong because coroutine functions should not block.
The point of coroutines is that if they cannot proceed they must yield
so the event loop regains control.  If you simply rename the function
to _co_ then they will block the event loop and not be true coroutine
functions.


Can you just call nfs_fstat() (the sync libnfs interface)?

I can only do that if it's guaranteed that no other requests are in flight,
otherwise it will mess up.

How will it mess up?

The sync calls into libnfs are just wrappers around the async calls.
The problem is that this wrapper will handle all the callbacks for the
in-flight requests and they will never return.

Peter




Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-20 Thread ronnie sahlberg
The sync calls use a trivial event loop built into libnfs using poll().

Mixing the _async() and _sync() interfaces in libnfs means you may
risk running nested event loops. Pain and tears lie behind that door.
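
To make that concrete, a libnfs-style sync wrapper is shaped roughly
like the sketch below. The struct and function shown here are
illustrative, but nfs_get_fd(), nfs_which_events() and nfs_service()
are real libnfs calls; the point is that the wrapper's private poll()
loop dispatches every completed callback, including QEMU's in-flight
ones, behind the back of QEMU's own event loop:

struct sync_cb_data {
    int is_finished;
    int status;
};

static void wait_for_reply(struct nfs_context *nfs,
                           struct sync_cb_data *cb)
{
    struct pollfd pfd;

    while (!cb->is_finished) {
        pfd.fd = nfs_get_fd(nfs);
        pfd.events = nfs_which_events(nfs);
        poll(&pfd, 1, -1);            /* nested, private event loop */
        /* services ALL ready callbacks, not just this request's */
        nfs_service(nfs, pfd.revents);
    }
}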

On Fri, Dec 20, 2013 at 6:43 AM, Peter Lieven p...@kamp.de wrote:
 On 20.12.2013 15:38, Stefan Hajnoczi wrote:

 On Fri, Dec 20, 2013 at 3:07 PM, Peter Lieven p...@kamp.de wrote:

 On 20.12.2013 14:57, Stefan Hajnoczi wrote:

 On Fri, Dec 20, 2013 at 1:53 PM, Peter Lieven p...@kamp.de wrote:

 On 20.12.2013 13:19, Stefan Hajnoczi wrote:

 On Fri, Dec 20, 2013 at 10:48:41AM +0100, Peter Lieven wrote:

 On 17.12.2013 17:47, Stefan Hajnoczi wrote:

 On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote:

 +    /* set to -ENOTSUP since bdrv_allocated_file_size is only used
 +     * in qemu-img open. So we can use the cached value for allocate
 +     * filesize obtained from fstat at open time */
 +    client->allocated_file_size = -ENOTSUP;

 Can you implement this fully?  By stubbing it out like this we won't
 be
 able to call get_allocated_file_size() at runtime in the future
 without
 updating the nfs block driver code.  It's just an fstat call,
 shouldn't
 be too hard to implement properly :).

 It seems I have to leave it as is currently.
 bdrv_get_allocated_file_size
 is not in a coroutine context. I get coroutine yields to no one.

 Create a coroutine and pump the event loop until it has reached
 completion:

 co = qemu_coroutine_create(my_coroutine_fn, ...);
 qemu_coroutine_enter(co, foo);
 while (!complete) {
qemu_aio_wait();
 }

 See block.c for similar examples.

 Wouldn't it make sense to make this modification to
 bdrv_get_allocated_file_size in block.c rather than in block/nfs.c and
 in the future potentially other drivers?

 If yes, I would ask you to take v3 of the NFS protocol patch and I
 promise to send a follow-up early next year to make this modification
 to block.c and change block/nfs.c and other implementations to be a
 coroutine_fn.

 .bdrv_get_allocated_file_size() implementations in other block drivers
 are synchronous.  Making the block driver interface use coroutines
 would be wrong unless all the block drivers were updated to use
 coroutines too.

 I can do that. I think it's not too complicated because all those
 implementations do not rely on callbacks. It should be possible
 to just rename the existing implementations to, let's say,
 .bdrv_co_get_allocated_file_size and call them inside a coroutine.

 No, that would be wrong because coroutine functions should not block.
 The point of coroutines is that if they cannot proceed they must yield
 so the event loop regains control.  If you simply rename the function
 to _co_ then they will block the event loop and not be true coroutine
 functions.

 Can you just call nfs_fstat() (the sync libnfs interface)?

 I can only do that if it's guaranteed that no other requests are in flight,
 otherwise it will mess up.

 How will it mess up?

 The sync calls into libnfs are just wrappers around the async calls.
 The problem is that this wrapper will handle all the callbacks for the
 in-flight requests and they will never return.

 Peter





Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-20 Thread Stefan Hajnoczi
On Fri, Dec 20, 2013 at 3:43 PM, Peter Lieven p...@kamp.de wrote:
 On 20.12.2013 15:38, Stefan Hajnoczi wrote:

 On Fri, Dec 20, 2013 at 3:07 PM, Peter Lieven p...@kamp.de wrote:

 On 20.12.2013 14:57, Stefan Hajnoczi wrote:

 On Fri, Dec 20, 2013 at 1:53 PM, Peter Lieven p...@kamp.de wrote:

 On 20.12.2013 13:19, Stefan Hajnoczi wrote:

 On Fri, Dec 20, 2013 at 10:48:41AM +0100, Peter Lieven wrote:

 On 17.12.2013 17:47, Stefan Hajnoczi wrote:

 On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote:

 +    /* set to -ENOTSUP since bdrv_allocated_file_size is only used
 +     * in qemu-img open. So we can use the cached value for allocate
 +     * filesize obtained from fstat at open time */
 +    client->allocated_file_size = -ENOTSUP;

 Can you implement this fully?  By stubbing it out like this we won't
 be
 able to call get_allocated_file_size() at runtime in the future
 without
 updating the nfs block driver code.  It's just an fstat call,
 shouldn't
 be too hard to implement properly :).

 It seems I have to leave it as is currently.
 bdrv_get_allocated_file_size
 is not in a coroutine context. I get coroutine yields to no one.

 Create a coroutine and pump the event loop until it has reached
 completion:

 co = qemu_coroutine_create(my_coroutine_fn, ...);
 qemu_coroutine_enter(co, foo);
 while (!complete) {
qemu_aio_wait();
 }

 See block.c for similar examples.

 Wouldn't it make sense to make this modification to
 bdrv_get_allocated_file_size in block.c rather than in block/nfs.c and
 in the future potentially other drivers?

 If yes, I would ask you to take v3 of the NFS protocol patch and I
 promise to send a follow-up early next year to make this modification
 to block.c and change block/nfs.c and other implementations to be a
 coroutine_fn.

 .bdrv_get_allocated_file_size() implementations in other block drivers
 are synchronous.  Making the block driver interface use coroutines
 would be wrong unless all the block drivers were updated to use
 coroutines too.

 I can do that. I think it's not too complicated because all those
 implementations do not rely on callbacks. It should be possible
 to just rename the existing implementations to, let's say,
 .bdrv_co_get_allocated_file_size and call them inside a coroutine.

 No, that would be wrong because coroutine functions should not block.
 The point of coroutines is that if they cannot proceed they must yield
 so the event loop regains control.  If you simply rename the function
 to _co_ then they will block the event loop and not be true coroutine
 functions.

 Can you just call nfs_fstat() (the sync libnfs interface)?

 I can only do that if it's guaranteed that no other requests are in flight,
 otherwise it will mess up.

 How will it mess up?

 The sync calls into libnfs are just wrappers around the async calls.
 The problem is that this wrapper will handle all the callbacks for the
 in-flight requests and they will never return.

So back to my original suggestion to use a qemu_aio_wait() loop in block/nfs.c?

Stefan



Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-20 Thread Peter Lieven

On 20.12.2013 16:30, Stefan Hajnoczi wrote:

On Fri, Dec 20, 2013 at 3:43 PM, Peter Lieven p...@kamp.de wrote:

On 20.12.2013 15:38, Stefan Hajnoczi wrote:

On Fri, Dec 20, 2013 at 3:07 PM, Peter Lieven p...@kamp.de wrote:

On 20.12.2013 14:57, Stefan Hajnoczi wrote:

On Fri, Dec 20, 2013 at 1:53 PM, Peter Lieven p...@kamp.de wrote:

On 20.12.2013 13:19, Stefan Hajnoczi wrote:

On Fri, Dec 20, 2013 at 10:48:41AM +0100, Peter Lieven wrote:

On 17.12.2013 17:47, Stefan Hajnoczi wrote:

On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote:

+    /* set to -ENOTSUP since bdrv_allocated_file_size is only used
+     * in qemu-img open. So we can use the cached value for allocate
+     * filesize obtained from fstat at open time */
+    client->allocated_file_size = -ENOTSUP;

Can you implement this fully?  By stubbing it out like this we won't
be
able to call get_allocated_file_size() at runtime in the future
without
updating the nfs block driver code.  It's just an fstat call,
shouldn't
be too hard to implement properly :).

It seems I have to leave it as is currently.
bdrv_get_allocated_file_size
is not in a coroutine context. I get coroutine yields to no one.

Create a coroutine and pump the event loop until it has reached
completion:

co = qemu_coroutine_create(my_coroutine_fn, ...);
qemu_coroutine_enter(co, foo);
while (!complete) {
qemu_aio_wait();
}

See block.c for similar examples.

Wouldn't it make sense to make this modification to
bdrv_get_allocated_file_size in block.c rather than in block/nfs.c and
in the future potentially other drivers?

If yes, I would ask you to take v3 of the NFS protocol patch and I
promise to send a follow-up early next year to make this modification
to block.c and change block/nfs.c and other implementations to be a
coroutine_fn.

.bdrv_get_allocated_file_size() implementations in other block drivers
are synchronous.  Making the block driver interface use coroutines
would be wrong unless all the block drivers were updated to use
coroutines too.

I can do that. I think it's not too complicated because all those
implementations do not rely on callbacks. It should be possible
to just rename the existing implementations to, let's say,
.bdrv_co_get_allocated_file_size and call them inside a coroutine.

No, that would be wrong because coroutine functions should not block.
The point of coroutines is that if they cannot proceed they must yield
so the event loop regains control.  If you simply rename the function
to _co_ then they will block the event loop and not be true coroutine
functions.


Can you just call nfs_fstat() (the sync libnfs interface)?

I can only do that if it's guaranteed that no other requests are in flight,
otherwise it will mess up.

How will it mess up?

The sync calls into libnfs are just wrappers around the async calls.
The problem is that this wrapper will handle all the callbacks for the
in-flight requests and they will never return.

So back to my original suggestion to use a qemu_aio_wait() loop in block/nfs.c?

Sorry, I cannot follow you. But maybe this here is a short solution. The
question is, what will happen when there are pending requests which invoke
callbacks? Will we end up returning from qemu_aio_wait()? In the qemu-img
info case this here works:

static int64_t nfs_get_allocated_file_size(BlockDriverState *bs)
{
    NFSClient *client = bs->opaque;
    NFSRPC task = {0};
    struct stat st;

    task.st = &st;
    if (nfs_fstat_async(client->context, client->fh, nfs_co_generic_cb,
                        &task) != 0) {
        return -ENOMEM;
    }

    while (!task.complete) {
        nfs_set_events(client);
        qemu_aio_wait();
    }

    return (task.status < 0 ? task.status : st.st_blocks * st.st_blksize);
}

In theory we could also leave .bdrv_get_allocated_file_size out completely.
It's just a nice-to-have so that qemu-img info does not show "unavailable"
for disk size.

Peter



Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-20 Thread Stefan Hajnoczi
On Fri, Dec 20, 2013 at 4:49 PM, Peter Lieven p...@kamp.de wrote:
 On 20.12.2013 16:30, Stefan Hajnoczi wrote:

 On Fri, Dec 20, 2013 at 3:43 PM, Peter Lieven p...@kamp.de wrote:

 On 20.12.2013 15:38, Stefan Hajnoczi wrote:

 On Fri, Dec 20, 2013 at 3:07 PM, Peter Lieven p...@kamp.de wrote:

 On 20.12.2013 14:57, Stefan Hajnoczi wrote:

 On Fri, Dec 20, 2013 at 1:53 PM, Peter Lieven p...@kamp.de wrote:

 On 20.12.2013 13:19, Stefan Hajnoczi wrote:

 On Fri, Dec 20, 2013 at 10:48:41AM +0100, Peter Lieven wrote:

 On 17.12.2013 17:47, Stefan Hajnoczi wrote:

 On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote:

 +    /* set to -ENOTSUP since bdrv_allocated_file_size is only used
 +     * in qemu-img open. So we can use the cached value for allocate
 +     * filesize obtained from fstat at open time */
 +    client->allocated_file_size = -ENOTSUP;

 Can you implement this fully?  By stubbing it out like this we
 won't
 be
 able to call get_allocated_file_size() at runtime in the future
 without
 updating the nfs block driver code.  It's just an fstat call,
 shouldn't
 be too hard to implement properly :).

 It seems I have to leave it as is currently.
 bdrv_get_allocated_file_size
 is not in a coroutine context. I get coroutine yields to no one.

 Create a coroutine and pump the event loop until it has reached
 completion:

 co = qemu_coroutine_create(my_coroutine_fn, ...);
 qemu_coroutine_enter(co, foo);
 while (!complete) {
 qemu_aio_wait();
 }

 See block.c for similar examples.

 Wouldn't it make sense to make this modification to
 bdrv_get_allocated_file_size in block.c rather than in block/nfs.c and
 in the future potentially other drivers?

 If yes, I would ask you to take v3 of the NFS protocol patch and I
 promise to send a follow-up early next year to make this modification
 to block.c and change block/nfs.c and other implementations to be a
 coroutine_fn.

 .bdrv_get_allocated_file_size() implementations in other block drivers
 are synchronous.  Making the block driver interface use coroutines
 would be wrong unless all the block drivers were updated to use
 coroutines too.

 I can do that. I think it's not too complicated because all those
 implementations do not rely on callbacks. It should be possible
 to just rename the existing implementations to, let's say,
 .bdrv_co_get_allocated_file_size and call them inside a coroutine.

 No, that would be wrong because coroutine functions should not block.
 The point of coroutines is that if they cannot proceed they must yield
 so the event loop regains control.  If you simply rename the function
 to _co_ then they will block the event loop and not be true coroutine
 functions.

 Can you just call nfs_fstat() (the sync libnfs interface)?

 I can only do that if it's guaranteed that no other requests are in
 flight, otherwise it will mess up.

 How will it mess up?

 The sync calls into libnfs are just wrappers around the async calls.
 The problem is that this wrapper will handle all the callbacks for the
 in-flight requests and they will never return.

 So back to my original suggestion to use a qemu_aio_wait() loop in
 block/nfs.c?

 Sorry, I cannot follow you. But maybe this here is a short solution. The
 question is, what will happen when there are pending requests which invoke
 callbacks? Will we end up returning from qemu_aio_wait()? In the qemu-img
 info case this here works:

 static int64_t nfs_get_allocated_file_size(BlockDriverState *bs)
 {
     NFSClient *client = bs->opaque;
     NFSRPC task = {0};
     struct stat st;

     task.st = &st;
     if (nfs_fstat_async(client->context, client->fh, nfs_co_generic_cb,
                         &task) != 0) {
         return -ENOMEM;
     }

     while (!task.complete) {
         nfs_set_events(client);
         qemu_aio_wait();
     }

     return (task.status < 0 ? task.status : st.st_blocks * st.st_blksize);
 }

Yes, that's exactly what I had in mind.

Yes, other callbacks will get called and requests will complete in
this event loop.  That can be a problem in some scenarios but should
be okay with bdrv_get_allocated_file_size().

Stefan



Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-20 Thread Peter Lieven

On 20.12.2013 16:54, Stefan Hajnoczi wrote:

On Fri, Dec 20, 2013 at 4:49 PM, Peter Lieven p...@kamp.de wrote:

On 20.12.2013 16:30, Stefan Hajnoczi wrote:

On Fri, Dec 20, 2013 at 3:43 PM, Peter Lieven p...@kamp.de wrote:

On 20.12.2013 15:38, Stefan Hajnoczi wrote:

On Fri, Dec 20, 2013 at 3:07 PM, Peter Lieven p...@kamp.de wrote:

On 20.12.2013 14:57, Stefan Hajnoczi wrote:

On Fri, Dec 20, 2013 at 1:53 PM, Peter Lieven p...@kamp.de wrote:

On 20.12.2013 13:19, Stefan Hajnoczi wrote:

On Fri, Dec 20, 2013 at 10:48:41AM +0100, Peter Lieven wrote:

On 17.12.2013 17:47, Stefan Hajnoczi wrote:

On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote:

+    /* set to -ENOTSUP since bdrv_allocated_file_size is only used
+     * in qemu-img open. So we can use the cached value for allocate
+     * filesize obtained from fstat at open time */
+    client->allocated_file_size = -ENOTSUP;

Can you implement this fully?  By stubbing it out like this we
won't
be
able to call get_allocated_file_size() at runtime in the future
without
updating the nfs block driver code.  It's just an fstat call,
shouldn't
be too hard to implement properly :).

It seems I have to leave it as is currently.
bdrv_get_allocated_file_size
is not in a coroutine context. I get coroutine yields to no one.

Create a coroutine and pump the event loop until it has reached
completion:

co = qemu_coroutine_create(my_coroutine_fn, ...);
qemu_coroutine_enter(co, foo);
while (!complete) {
 qemu_aio_wait();
}

See block.c for similar examples.

Wouldn't it make sense to make this modification to
bdrv_get_allocated_file_size in block.c rather than in block/nfs.c and
in the future potentially other drivers?

If yes, I would ask you to take v3 of the NFS protocol patch and I
promise to send a follow-up early next year to make this modification
to block.c and change block/nfs.c and other implementations to be a
coroutine_fn.

.bdrv_get_allocated_file_size() implementations in other block drivers
are synchronous.  Making the block driver interface use coroutines
would be wrong unless all the block drivers were updated to use
coroutines too.

I can do that. I think it's not too complicated because all those
implementations do not rely on callbacks. It should be possible
to just rename the existing implementations to, let's say,
.bdrv_co_get_allocated_file_size and call them inside a coroutine.

No, that would be wrong because coroutine functions should not block.
The point of coroutines is that if they cannot proceed they must yield
so the event loop regains control.  If you simply rename the function
to _co_ then they will block the event loop and not be true coroutine
functions.


Can you just call nfs_fstat() (the sync libnfs interface)?

I can only do that if it's guaranteed that no other requests are in
flight, otherwise it will mess up.

How will it mess up?

The sync calls into libnfs are just wrappers around the async calls.
The problem is that this wrapper will handle all the callbacks for the
in-flight requests and they will never return.

So back to my original suggestion to use a qemu_aio_wait() loop in
block/nfs.c?

Sorry, I cannot follow you. But maybe this here is a short solution. The
question is, what will happen when there are pending requests which invoke
callbacks? Will we end up returning from qemu_aio_wait()? In the qemu-img
info case this here works:

static int64_t nfs_get_allocated_file_size(BlockDriverState *bs)
{
    NFSClient *client = bs->opaque;
    NFSRPC task = {0};
    struct stat st;

    task.st = &st;
    if (nfs_fstat_async(client->context, client->fh, nfs_co_generic_cb,
                        &task) != 0) {
        return -ENOMEM;
    }

    while (!task.complete) {
        nfs_set_events(client);
        qemu_aio_wait();
    }

    return (task.status < 0 ? task.status : st.st_blocks * st.st_blksize);
}

Yes, that's exactly what I had in mind.

Yes, other callbacks will get called and requests will complete in
this event loop.  That can be a problem in some scenarios but should
be okay with bdrv_get_allocated_file_size().

ok I will send v4 and then prepare for the holidays ;-)

Peter




Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-20 Thread Stefan Hajnoczi
On Fri, Dec 20, 2013 at 4:57 PM, Peter Lieven p...@kamp.de wrote:
 On 20.12.2013 16:54, Stefan Hajnoczi wrote:

 On Fri, Dec 20, 2013 at 4:49 PM, Peter Lieven p...@kamp.de wrote:

 On 20.12.2013 16:30, Stefan Hajnoczi wrote:

 On Fri, Dec 20, 2013 at 3:43 PM, Peter Lieven p...@kamp.de wrote:

 On 20.12.2013 15:38, Stefan Hajnoczi wrote:

 On Fri, Dec 20, 2013 at 3:07 PM, Peter Lieven p...@kamp.de wrote:

 On 20.12.2013 14:57, Stefan Hajnoczi wrote:

 On Fri, Dec 20, 2013 at 1:53 PM, Peter Lieven p...@kamp.de wrote:

 On 20.12.2013 13:19, Stefan Hajnoczi wrote:

 On Fri, Dec 20, 2013 at 10:48:41AM +0100, Peter Lieven wrote:

 On 17.12.2013 17:47, Stefan Hajnoczi wrote:

 On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote:

 +    /* set to -ENOTSUP since bdrv_allocated_file_size is only used
 +     * in qemu-img open. So we can use the cached value for allocate
 +     * filesize obtained from fstat at open time */
 +    client->allocated_file_size = -ENOTSUP;

 Can you implement this fully?  By stubbing it out like this we
 won't
 be
 able to call get_allocated_file_size() at runtime in the future
 without
 updating the nfs block driver code.  It's just an fstat call,
 shouldn't
 be too hard to implement properly :).

 It seems I have to leave it as is currently.
 bdrv_get_allocated_file_size
 is not in a coroutine context. I get coroutine yields to no one.

 Create a coroutine and pump the event loop until it has reached
 completion:

 co = qemu_coroutine_create(my_coroutine_fn, ...);
 qemu_coroutine_enter(co, foo);
 while (!complete) {
  qemu_aio_wait();
 }

 See block.c for similar examples.

 Wouldn't it make sense to make this modification to
 bdrv_get_allocated_file_size in block.c rather than in block/nfs.c and
 in the future potentially other drivers?

 If yes, I would ask you to take v3 of the NFS protocol patch and I
 promise to send a follow-up early next year to make this modification
 to block.c and change block/nfs.c and other implementations to be a
 coroutine_fn.

 .bdrv_get_allocated_file_size() implementations in other block
 drivers
 are synchronous.  Making the block driver interface use coroutines
 would be wrong unless all the block drivers were updated to use
 coroutines too.

 I can do that. I think it's not too complicated because all those
 implementations do not rely on callbacks. It should be possible
 to just rename the existing implementations to, let's say,
 .bdrv_co_get_allocated_file_size and call them inside a coroutine.

 No, that would be wrong because coroutine functions should not block.
 The point of coroutines is that if they cannot proceed they must yield
 so the event loop regains control.  If you simply rename the function
 to _co_ then they will block the event loop and not be true coroutine
 functions.

 Can you just call nfs_fstat() (the sync libnfs interface)?

 I can only do that if it's guaranteed that no other requests are in
 flight, otherwise it will mess up.

 How will it mess up?

 The sync calls into libnfs are just wrappers around the async calls.
 The problem is that this wrapper will handle all the callbacks for the
 in-flight requests and they will never return.

 So back to my original suggestion to use a qemu_aio_wait() loop in
 block/nfs.c?

 Sorry, I cannot follow you. But maybe this here is a short solution. The
 question is, what will happen when there are pending requests which invoke
 callbacks? Will we end up returning from qemu_aio_wait()? In the qemu-img
 info case this here works:

 static int64_t nfs_get_allocated_file_size(BlockDriverState *bs)
 {
     NFSClient *client = bs->opaque;
     NFSRPC task = {0};
     struct stat st;

     task.st = &st;
     if (nfs_fstat_async(client->context, client->fh, nfs_co_generic_cb,
                         &task) != 0) {
         return -ENOMEM;
     }

     while (!task.complete) {
         nfs_set_events(client);
         qemu_aio_wait();
     }

     return (task.status < 0 ? task.status : st.st_blocks * st.st_blksize);
 }

 Yes, that's exactly what I had in mind.

 Yes, other callbacks will get called and requests will complete in
 this event loop.  That can be a problem in some scenarios but should
 be okay with bdrv_get_allocated_file_size().

 ok I will send v4 and then prepare for the holidays ;-)

Happy holidays!  I already sent out the block pull request earlier
today but your patch will be included in the first January pull
request.



Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-19 Thread Paolo Bonzini
On 18/12/2013 18:21, Peter Lieven wrote:
 Not completely I think, but please correct me if I am wrong.
 
 If cache=writeback is set we issue just a write. In the libnfs or libiscsi
 case that guarantees that the request has been successfully executed
 on the target / server. This is enough to be consistent in case of
 migration because consecutive reads will be answered correctly.
 
 But the target / server is still free to just keep this
 data in memory so we should only set this if we know that the target / server
 e.g. has a battery backup or we know that the filesystem uses e.g. barriers
 and issues a flush at important points.

Yes.  However, the same holds for cache=none (both on
libnfs/libiscsi/NBD/others, where it is a nop, and on regular I/O, where
it uses O_DIRECT *but not O_SYNC or O_DSYNC*).

 If cache=none is set we issue a flush after every single write.

That's what you get with cache=writethrough or cache=directsync.  Here
the flush is added automatically by the block layer, so it works with
every driver.
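
A sketch of what that means in code (illustrative, not the actual
block.c implementation; bdrv_co_flush() and the enable_write_cache
flag are real, the surrounding function is simplified):

static int coroutine_fn writethrough_writev_sketch(BlockDriverState *bs,
                                                   int64_t sector_num,
                                                   int nb_sectors,
                                                   QEMUIOVector *qiov)
{
    int ret = bs->drv->bdrv_co_writev(bs, sector_num, nb_sectors, qiov);
    if (ret < 0) {
        return ret;
    }
    /* cache=writethrough / cache=directsync: the block layer adds a
     * flush after every write, whatever the protocol driver is */
    if (!bs->enable_write_cache) {
        ret = bdrv_co_flush(bs);
    }
    return ret;
}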

Paolo



Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-18 Thread Daniel P. Berrange
On Wed, Dec 18, 2013 at 12:03:24AM +0100, Peter Lieven wrote:
 
 
  Am 17.12.2013 um 18:32 schrieb Daniel P. Berrange berra...@redhat.com:
  
  On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote:
  This patch adds native support for accessing images on NFS shares without
  the requirement to actually mount the entire NFS share on the host.
  
  NFS images can simply be specified by a URL of the form:
  nfs://host/export/filename
  
  For example:
  qemu-img create -f qcow2 nfs://10.0.0.1/qemu-images/test.qcow2
  
  Does it support other config tunables, eg specifying which
  NFS version to use 2/3/4 ? If so will they be available as
  URI parameters in the obvious manner ?
 
 Currently only v3 is supported by libnfs. What other tunables would you like
 to see?

I didn't have any particular list in mind beyond just protocol for now. 


Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-18 Thread Orit Wasserman

On 12/18/2013 01:03 AM, Peter Lieven wrote:




Am 17.12.2013 um 18:32 schrieb Daniel P. Berrange berra...@redhat.com:


On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote:
This patch adds native support for accessing images on NFS shares without
the requirement to actually mount the entire NFS share on the host.

NFS images can simply be specified by a URL of the form:
nfs://host/export/filename

For example:
qemu-img create -f qcow2 nfs://10.0.0.1/qemu-images/test.qcow2


Does it support other config tunables, eg specifying which
NFS version to use 2/3/4 ? If so will they be available as
URI parameters in the obvious manner ?


Currently only v3 is supported by libnfs. What other tunables would you like to
see?



For live migration we need the sync option (async ignores O_SYNC and
O_DIRECT, sadly). Will it be supported? Or will it be the default?

Orit



Daniel
--
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|







Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-18 Thread Daniel P. Berrange
On Wed, Dec 18, 2013 at 12:00:03PM +0200, Orit Wasserman wrote:
 On 12/18/2013 01:03 AM, Peter Lieven wrote:
 
 
 Am 17.12.2013 um 18:32 schrieb Daniel P. Berrange berra...@redhat.com:
 
 On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote:
 This patch adds native support for accessing images on NFS shares without
 the requirement to actually mount the entire NFS share on the host.
 
 NFS images can simply be specified by a URL of the form:
 nfs://host/export/filename
 
 For example:
 qemu-img create -f qcow2 nfs://10.0.0.1/qemu-images/test.qcow2
 
 Does it support other config tunables, eg specifying which
 NFS version to use 2/3/4 ? If so will they be available as
 URI parameters in the obvious manner ?
 
 Currently only v3 is supported by libnfs. What other tunables would you like
 to see?
 
 
 For live migration we need the sync option (async ignores O_SYNC and
 O_DIRECT, sadly). Will it be supported? Or will it be the default?

Since this is bypassing the client kernel FS I/O layer, questions around
support of things like O_SYNC/O_DIRECT are not applicable.

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-18 Thread Orit Wasserman

On 12/18/2013 12:18 PM, Daniel P. Berrange wrote:

On Wed, Dec 18, 2013 at 12:00:03PM +0200, Orit Wasserman wrote:

On 12/18/2013 01:03 AM, Peter Lieven wrote:




Am 17.12.2013 um 18:32 schrieb Daniel P. Berrange berra...@redhat.com:


On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote:
This patch adds native support for accessing images on NFS shares without
the requirement to actually mount the entire NFS share on the host.

NFS images can simply be specified by a URL of the form:
nfs://host/export/filename

For example:
qemu-img create -f qcow2 nfs://10.0.0.1/qemu-images/test.qcow2


Does it support other config tunables, eg specifying which
NFS version to use 2/3/4 ? If so will they be available as
URI parameters in the obvious manner ?


Currently only v3 is supported by libnfs. What other tunables would you like to
see?



For live migration we need the sync option (async ignores O_SYNC and
O_DIRECT, sadly). Will it be supported? Or will it be the default?


Since this is bypassing the client kernel FS I/O layer, questions around
support of things like O_SYNC/O_DIRECT are not applicable.



so no live migration support?
 

Daniel






Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-18 Thread Paolo Bonzini
On 18/12/2013 11:24, Orit Wasserman wrote:

 For live migration we need the sync option (async ignores O_SYNC and
 O_DIRECT, sadly). Will it be supported? Or will it be the default?

 Since this is bypassing the client kernel FS I/O layer, questions around
 support of things like O_SYNC/O_DIRECT are not applicable.

 
 so no live migration support?

No, live migration just works.

O_SYNC is not used anymore in QEMU, we just issue a flush after every write.

And all protocol drivers except files/block devices _always_ bypass the
kernel page cache (libiscsi, NBD, ceph, and now libnfs) so they are
always behaving as if they had O_DIRECT.  For these protocols,
cache=writeback and cache=none are entirely the same.

Paolo



Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-18 Thread Peter Lieven


 Am 18.12.2013 um 11:00 schrieb Orit Wasserman owass...@redhat.com:
 
 On 12/18/2013 01:03 AM, Peter Lieven wrote:
 
 
 On 17.12.2013 at 18:32, Daniel P. Berrange berra...@redhat.com wrote:
 
 On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote:
 This patch adds native support for accessing images on NFS shares without
 the requirement to actually mount the entire NFS share on the host.
 
 NFS Images can simply be specified by an url of the form:
 nfs://host/export/filename
 
 For example:
 qemu-img create -f qcow2 nfs://10.0.0.1/qemu-images/test.qcow2
 
 Does it support other config tunables, eg specifying which
 NFS version to use 2/3/4 ? If so will they be available as
 URI parameters in the obvious manner ?
 
 currently only v3 is supported by libnfs. what other tunables would you like 
 to see?
 
 For live migration we need the sync option (async ignores O_SYNC and O_DIRECT 
 sadly),
 will it be supported? or will it be the default?

The async you see in the libnfs calls refers to the async API. bdrv_flush will
not return before the NFS server completes the request.
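
For reference, the flush callback presumably follows the same task/yield
pattern as nfs_co_writev() in the patch; a minimal sketch, assuming the
NFSTask helpers exactly as posted (nfs_fsync_async() is the libnfs call that
emits the COMMIT3):

static int coroutine_fn nfs_co_flush(BlockDriverState *bs)
{
    NFSClient *client = bs->opaque;
    NFSTask task;

    nfs_co_init_task(client, &task);

    /* nfs_fsync_async() sends an NFS3 COMMIT for the file */
    if (nfs_fsync_async(client->context, client->fh,
                        nfs_co_generic_cb, &task) != 0) {
        return -ENOMEM;
    }

    while (!task.complete) {
        nfs_set_events(client);
        qemu_coroutine_yield();
    }

    return task.status;
}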

Peter

 
 Orit
 
 
 Daniel
 --
 |: http://berrange.com  -o-    http://www.flickr.com/photos/dberrange/ :|
 |: http://libvirt.org   -o-             http://virt-manager.org :|
 |: http://autobuild.org   -o-   http://search.cpan.org/~danberr/ :|
 |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|
 



Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-18 Thread Orit Wasserman

On 12/18/2013 01:11 PM, Peter Lieven wrote:




On 18.12.2013 at 11:00, Orit Wasserman owass...@redhat.com wrote:


On 12/18/2013 01:03 AM, Peter Lieven wrote:



On 17.12.2013 at 18:32, Daniel P. Berrange berra...@redhat.com wrote:

On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote:
This patch adds native support for accessing images on NFS shares without
the requirement to actually mount the entire NFS share on the host.

NFS Images can simply be specified by an url of the form:
nfs://host/export/filename

For example:
qemu-img create -f qcow2 nfs://10.0.0.1/qemu-images/test.qcow2


Does it support other config tunables, eg specifying which
NFS version to use 2/3/4 ? If so will they be available as
URI parameters in the obvious manner ?


currently only v3 is supported by libnfs. what other tunables would you like to 
see?


For live migration we need the sync option (async ignores O_SYNC and O_DIRECT 
sadly),
will it be supported? or will it be the default?


The async you see in the libnfs calls refers to the async API. bdrv_flush will
not return before the NFS server completes the request.



That's great!

Thanks,
Orit


Peter



Orit



Daniel
--
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|







Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-18 Thread ronnie sahlberg
On Wed, Dec 18, 2013 at 2:00 AM, Orit Wasserman owass...@redhat.com wrote:
 On 12/18/2013 01:03 AM, Peter Lieven wrote:



 On 17.12.2013 at 18:32, Daniel P. Berrange berra...@redhat.com wrote:

 On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote:
 This patch adds native support for accessing images on NFS shares
 without
 the requirement to actually mount the entire NFS share on the host.

 NFS Images can simply be specified by an url of the form:
 nfs://host/export/filename

 For example:
 qemu-img create -f qcow2 nfs://10.0.0.1/qemu-images/test.qcow2


 Does it support other config tunables, eg specifying which
 NFS version to use 2/3/4 ? If so will they be available as
 URI parameters in the obvious manner ?


 currently only v3 is supported by libnfs. what other tunables would you
 like to see?


 For live migration we need the sync option (async ignores O_SYNC and
 O_DIRECT sadly),
 will it be supported? or will it be the default?


If you use the high-level API that provides POSIX-like functions, such
as nfs_open(), then libnfs does support it.
nfs_open()/nfs_open_async() take a mode parameter, and libnfs checks
for the O_SYNC flag in mode.

By default libnfs will translate any nfs_write*() or nfs_pwrite*() to
NFS/WRITE3+UNSTABLE, which allows the server to just write to
cache/memory.

If you specify O_SYNC in the mode argument to nfs_open()/nfs_open_async(),
then libnfs will flag this handle as sync, and any calls to
nfs_write()/nfs_pwrite() will translate to NFS/WRITE3+FILE_SYNC.

Calls to nfs_fsync() are translated to NFS/COMMIT3.
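
To make that mapping concrete, here is a minimal sketch against the
synchronous libnfs API (signatures per the libnfs headers of this era;
server/export/path are made-up examples, error handling trimmed):

#include <fcntl.h>
#include <nfsc/libnfs.h>

/* Open with O_SYNC so every nfs_pwrite() goes out as
 * NFS/WRITE3+FILE_SYNC instead of the default UNSTABLE write. */
int write_stable(const char *server, const char *export, const char *path,
                 char *buf, uint64_t len)
{
    struct nfs_context *nfs = nfs_init_context();
    struct nfsfh *fh;
    int ret = -1;

    if (nfs != NULL &&
        nfs_mount(nfs, server, export) == 0 &&
        nfs_open(nfs, path, O_WRONLY | O_SYNC, &fh) == 0) {
        /* FILE_SYNC write: data is on stable storage when this returns */
        ret = nfs_pwrite(nfs, fh, 0, len, buf);
        nfs_close(nfs, fh);
    }
    if (nfs != NULL) {
        nfs_destroy_context(nfs);
    }
    return ret;
}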



Of course, as for normal file I/O this is useful but not great, since
you can only control sync vs. async per open filehandle.
Libnfs does also allow you to send raw RPC commands to the server, and
using this API you can control the sync behaviour for individual
writes.
This means you could do something like the following (a rough sketch of
the mapping comes after the list):
* emulate SCSI to the guest.
* if guest sends SCSI/WRITE* without any FUA bits set, then for that
I/O you send a NFS3/WRITE+UNSTABLE
* if guest sends SCSI/WRITE* with FUA bits set, then for that I/O you
send NFS3/WRITE+FILE_SYNC
and then the guest kernel can control for each individual write
whether it is sync or not.
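
A rough illustration of that per-write mapping (the stable_how values are
from RFC 1813; send_nfs3_write() is a made-up stand-in for the raw-RPC write,
not a real libnfs function):

#include <stdbool.h>
#include <stdint.h>

/* stable_how values, as defined by RFC 1813 */
enum stable_how {
    UNSTABLE  = 0,   /* server may cache; needs a later COMMIT3 */
    DATA_SYNC = 1,   /* data stable, metadata may lag */
    FILE_SYNC = 2,   /* data and metadata on stable storage */
};

/* Hypothetical helper standing in for the raw-RPC write call. */
static int send_nfs3_write(uint64_t offset, uint64_t len,
                           const void *buf, enum stable_how stable);

/* Map the guest's SCSI FUA bit onto the NFS write stability. */
static int write_guest_data(uint64_t offset, uint64_t len,
                            const void *buf, bool fua)
{
    return send_nfs3_write(offset, len, buf, fua ? FILE_SYNC : UNSTABLE);
}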

But that is probably something that can wait until later and doesn't
need to be part of the initial patch?
If Peter wants to do this in the future I can create a small writeup
on how to mix in the two different APIs using the same context.



Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-18 Thread Peter Lieven

On 18.12.2013 at 15:42, ronnie sahlberg ronniesahlb...@gmail.com wrote:

 On Wed, Dec 18, 2013 at 2:00 AM, Orit Wasserman owass...@redhat.com wrote:
 On 12/18/2013 01:03 AM, Peter Lieven wrote:
 
 
 
 On 17.12.2013 at 18:32, Daniel P. Berrange berra...@redhat.com wrote:
 
 On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote:
 This patch adds native support for accessing images on NFS shares
 without
 the requirement to actually mount the entire NFS share on the host.
 
 NFS Images can simply be specified by an url of the form:
 nfs://host/export/filename
 
 For example:
 qemu-img create -f qcow2 nfs://10.0.0.1/qemu-images/test.qcow2
 
 
 Does it support other config tunables, eg specifying which
 NFS version to use 2/3/4 ? If so will they be available as
 URI parameters in the obvious manner ?
 
 
 currently only v3 is supported by libnfs. what other tunables would you
 like to see?
 
 
 For live migration we need the sync option (async ignores O_SYNC and
 O_DIRECT sadly),
 will it be supported? or will it be the default?
 
 
 If you use the high-level API that provides POSIX-like functions, such
 as nfs_open(), then libnfs does support it.
 nfs_open()/nfs_open_async() take a mode parameter, and libnfs checks
 for the O_SYNC flag in mode.
 
 By default libnfs will translate any nfs_write*() or nfs_pwrite*() to
 NFS/WRITE3+UNSTABLE, which allows the server to just write to
 cache/memory.
 
 If you specify O_SYNC in the mode argument to nfs_open()/nfs_open_async(),
 then libnfs will flag this handle as sync, and any calls to
 nfs_write()/nfs_pwrite() will translate to NFS/WRITE3+FILE_SYNC.
 
 Calls to nfs_fsync() are translated to NFS/COMMIT3.

If this NFS/COMMIT3 issues a sync on the server, that would be all we
actually need, and in that case migration should be safe. Even if we open
a file with cache=none, qemu would issue such a commit after every write.

This also allows for writeback caching, where the filesystem flushes go
right through to the server.


 
 
 
 Of course, as for normal file I/O this is useful but not great since
 you can only control the sync vs async per open filehandle.
 Libnfs does also allow you to send raw rpc commands to the server and
 using this API you can control the sync behaviour for individual
 writes.
 This means you could do something like
 * emulate SCSI to the guest.
 * if guest sends SCSI/WRITE* without any FUA bits set, then for that
 I/O you send a NFS3/WRITE+UNSTABLE
 * if guest sends SCSI/WRITE* with FUA bits set, then for that I/O you
 send NFS3/WRITE+FILE_SYNC
 and then the guest kernel can control for each individual write
 whether it is sync or not.
 
 But that is probably something that can wait until later and doesn't
 need to be part of the initial patch?
 If Peter wants to do this in the future I can create a small writeup
 on how to mix in the two different APIs using the same context.

We can do that, but I would like to focus on the basic functionality
first.

Peter




Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-18 Thread Peter Lieven

On 18.12.2013 at 11:38, Paolo Bonzini pbonz...@redhat.com wrote:

 On 18/12/2013 11:24, Orit Wasserman wrote:
 
 For live migration we need the sync option (async ignores O_SYNC and
 O_DIRECT sadly),
 will it be supported? or will it be the default?
 
 Since this is bypassing the client kernel FS I/O layer, questions around
 support of things like O_SYNC/O_DIRECT are not applicable.
 
 
 so no live migration support?
 
 No, live migration just works.
 
 O_SYNC is not used anymore in QEMU, we just issue a flush after every write.
 
 And all protocol drivers except files/block devices _always_ bypass the
 kernel page cache (libiscsi, NBD, ceph, and now libnfs) so they are
 always behaving as if they had O_DIRECT.  For these protocols,
 cache=writeback and cache=none are entirely the same.

Not completely I think, but please correct me if I am wrong.

If cache=writeback is set, we issue just a write. In the libnfs or libiscsi
case that guarantees that the request has been successfully executed
on the target / server. This is enough to be consistent in case of
migration, because consecutive reads will be answered correctly.

But the target / server is still free to just keep this
data in memory, so we should only set this if we know that the target / server
e.g. has a battery backup, or we know that the filesystem uses e.g. barriers
and issues a flush at important points.

If cache=none is set, we issue a flush after every single write. In the
libiscsi case we issue a SYNCHRONIZE CACHE, which should also guarantee that
data is flushed to disk. I hope that NFS/COMMIT3 does the same in the
libnfs case?!
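
For illustration, with the URL syntax from this patch the two modes would be
chosen like this (assuming the nfs:// driver is usable through -drive like
the other protocol drivers):

  qemu-system-x86_64 -drive file=nfs://10.0.0.1/qemu-images/test.qcow2,cache=writeback,if=virtio
  qemu-system-x86_64 -drive file=nfs://10.0.0.1/qemu-images/test.qcow2,cache=none,if=virtio

Since the driver never touches the host page cache, the difference between
the two is only how eagerly qemu issues flushes, as described above.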

BR,
Peter




Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-18 Thread ronnie sahlberg
On Wed, Dec 18, 2013 at 8:59 AM, Peter Lieven p...@kamp.de wrote:

 On 18.12.2013 at 15:42, ronnie sahlberg ronniesahlb...@gmail.com wrote:

 On Wed, Dec 18, 2013 at 2:00 AM, Orit Wasserman owass...@redhat.com wrote:
 On 12/18/2013 01:03 AM, Peter Lieven wrote:



 On 17.12.2013 at 18:32, Daniel P. Berrange berra...@redhat.com wrote:

 On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote:
 This patch adds native support for accessing images on NFS shares
 without
 the requirement to actually mount the entire NFS share on the host.

 NFS Images can simply be specified by an url of the form:
 nfs://host/export/filename

 For example:
 qemu-img create -f qcow2 nfs://10.0.0.1/qemu-images/test.qcow2


 Does it support other config tunables, eg specifying which
 NFS version to use 2/3/4 ? If so will they be available as
 URI parameters in the obvious manner ?


 currently only v3 is supported by libnfs. what other tunables would you
 like to see?


 For live migration we need the sync option (async ignores O_SYNC and
 O_DIRECT sadly),
 will it be supported? or will it be the default?


 If you use the high-level API that provides POSIX-like functions, such
 as nfs_open(), then libnfs does support it.
 nfs_open()/nfs_open_async() take a mode parameter, and libnfs checks
 for the O_SYNC flag in mode.

 By default libnfs will translate any nfs_write*() or nfs_pwrite*() to
 NFS/WRITE3+UNSTABLE, which allows the server to just write to
 cache/memory.

 If you specify O_SYNC in the mode argument to nfs_open()/nfs_open_async(),
 then libnfs will flag this handle as sync, and any calls to
 nfs_write()/nfs_pwrite() will translate to NFS/WRITE3+FILE_SYNC.

 Calls to nfs_fsync() are translated to NFS/COMMIT3.

 If this NFS/COMMIT3 issues a sync on the server, that would be all we
 actually need.

You have that guarantee in NFS/COMMIT3.
NFS/COMMIT3 will not return until the server has flushed the specified
range to disk.

However, while the NFS protocol allows you to specify a range for the
COMMIT3 call so that you can do things like
WRITE3 Offset:foo Length:bar
COMMIT3 Offset:foo Length:bar
many/most nfs servers will ignore the offset/length arguments to the
COMMIT3 call and always unconditionally make an fsync() for the whole
file.

This can make the COMMIT3 call very expensive for large files.
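
For reference, the on-the-wire COMMIT3 arguments (RFC 1813, procedure 21) do
carry exactly the range that such servers ignore:

/* COMMIT3 arguments, per RFC 1813 */
struct COMMIT3args {
    nfs_fh3 file;    /* file handle */
    offset3 offset;  /* start of the range to commit */
    count3  count;   /* byte count; 0 means "to the end of file" */
};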


NFSv3 also supports FILE_SYNC write mode, which libnfs triggers if you
specify O_SYNC to nfs_open*()
In this mode every single NFS/WRITE3 is sent with the FILE_SYNC mode
which means that the server will guarantee to write the data to stable
storage before responding back to the client.
In this mode there is no real need to do anything at all or even call
COMMIT3  since there is never any writeback data on the server that
needs to be destaged.


Since many servers treat COMMIT3 as "unconditionally walk all blocks
for the whole file and make sure they are destaged", it is not clear how

WRITE3-normal Offset:foo Length:bar
COMMIT3 Offset:foo Length:bar

will compare to

WRITE3+O_SYNC Offset:foo Length:bar

I would not be surprised if the second mode had higher
(potentially significantly higher) performance than the former.



 And in this case migration should be safe. Even if we open
 a file with cache = none qemu would issue such a commit after every write.

 This also allow for writeback caching where the filesystem flushes would
 go through right to the server.





 Of course, as for normal file I/O this is useful but not great since
 you can only control the sync vs async per open filehandle.
 Libnfs does also allow you to send raw rpc commands to the server and
 using this API you can control the sync behaviour for individual
 writes.
 This means you could do something like
 * emulate SCSI to the guest.
 * if guest sends SCSI/WRITE* without any FUA bits set, then for that
 I/O you send a NFS3/WRITE+UNSTABLE
 * if guest sends SCSI/WRITE* with FUA bits set, then for that I/O you
 send NFS3/WRITE+FILE_SYNC
 and then the guest kernel can control for each individual write
 whether it is sync or not.

 But that is probably something that can wait until later and doesn't
 need to be part of the initial patch?
 If Peter wants to do this in the future I can create a small writeup
 on how to mix in the two different APIs using the same context.

 We can do that, but I would like to focus on the basic functionality
 first.

ACK


 Peter




Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-18 Thread Peter Lieven

On 18.12.2013 at 18:33, ronnie sahlberg ronniesahlb...@gmail.com wrote:

 On Wed, Dec 18, 2013 at 8:59 AM, Peter Lieven p...@kamp.de wrote:
 
 On 18.12.2013 at 15:42, ronnie sahlberg ronniesahlb...@gmail.com wrote:
 
 On Wed, Dec 18, 2013 at 2:00 AM, Orit Wasserman owass...@redhat.com wrote:
 On 12/18/2013 01:03 AM, Peter Lieven wrote:
 
 
 
 On 17.12.2013 at 18:32, Daniel P. Berrange berra...@redhat.com wrote:
 
 On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote:
 This patch adds native support for accessing images on NFS shares
 without
 the requirement to actually mount the entire NFS share on the host.
 
 NFS Images can simply be specified by an url of the form:
 nfs://host/export/filename
 
 For example:
 qemu-img create -f qcow2 nfs://10.0.0.1/qemu-images/test.qcow2
 
 
 Does it support other config tunables, eg specifying which
 NFS version to use 2/3/4 ? If so will they be available as
 URI parameters in the obvious manner ?
 
 
 currently only v3 is supported by libnfs. what other tunables would you
 like to see?
 
 
 For live migration we need the sync option (async ignores O_SYNC and
 O_DIRECT sadly),
 will it be supported? or will it be the default?
 
 
 If you use the high-level API that provides POSIX-like functions, such
 as nfs_open(), then libnfs does support it.
 nfs_open()/nfs_open_async() take a mode parameter, and libnfs checks
 for the O_SYNC flag in mode.
 
 By default libnfs will translate any nfs_write*() or nfs_pwrite*() to
 NFS/WRITE3+UNSTABLE, which allows the server to just write to
 cache/memory.
 
 If you specify O_SYNC in the mode argument to nfs_open()/nfs_open_async(),
 then libnfs will flag this handle as sync, and any calls to
 nfs_write()/nfs_pwrite() will translate to NFS/WRITE3+FILE_SYNC.
 
 Calls to nfs_fsync() are translated to NFS/COMMIT3.
 
 If this NFS/COMMIT3 issues a sync on the server, that would be all we
 actually need.
 
 You have that guarantee in NFS/COMMIT3
 NFS/COMMIT3 will not return until the server has flushed the specified
 range to disk.
 
 However, while the NFS protocol allows you to specify a range for the
 COMMIT3 call so that you can do things like
 WRITE3 Offset:foo Length:bar
 COMMIT3 Offset:foo Length:bar
 many/most nfs servers will ignore the offset/length arguments to the
 COMMIT3 call and always unconditionally make an fsync() for the whole
 file.
 
 This can make the COMMIT3 call very expensive for large files.
 
 
 NFSv3 also supports FILE_SYNC write mode, which libnfs triggers if you
 specify O_SYNC to nfs_open*()
 In this mode every single NFS/WRITE3 is sent with the FILE_SYNC mode
 which means that the server will guarantee to write the data to stable
 storage before responding back to the client.
 In this mode there is no real need to do anything at all or even call
 COMMIT3  since there is never any writeback data on the server that
 needs to be destaged.
 
 
 Since many servers treat COMMIT3 as "unconditionally walk all blocks
 for the whole file and make sure they are destaged", it is not clear how
 
 WRITE3-normal Offset:foo Length:bar
 COMMIT3 Offset:foo Length:bar
 
 will compare to
 
 WRITE3+O_SYNC Offset:foo Length:bar
 
 I would not be surprised if the second mode would have higher
 (potentially significantly) performance than the former.

The qemu block layer currently is designed to send a bdrv_flush after every
single write if the write cache is not enabled. This means that the unwritten
data is just the data of the single write operation. However, changing this
to issue a sync write call would require changing the whole API. The major
problem is that the write cache setting can be changed while the device is
open; otherwise we could just ignore all calls to bdrv_flush if the device
was opened without the write cache enabled.
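
Roughly, the tail of the common write path does the following (a paraphrased
sketch of block.c behaviour, not a verbatim quote; field names as in qemu of
this era):

/* With the write cache disabled, every completed write is
 * followed by a full flush. */
static int coroutine_fn writev_then_maybe_flush(BlockDriverState *bs,
    int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
{
    int ret = bs->drv->bdrv_co_writev(bs, sector_num, nb_sectors, qiov);

    if (ret == 0 && !bs->enable_write_cache) {
        ret = bdrv_co_flush(bs);
    }
    return ret;
}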

In the very popular case of using virtio as the driver, the device is always
opened with the write cache disabled, and the write cache is only enabled
after the host has negotiated with the guest that the guest is able to send
flushes.

We can keep in mind for a later version of the driver that we manually craft
a write call with O_SYNC if the write cache is disabled and ignore bdrv_flush,
and that we use async write + commit via bdrv_flush in the case of an enabled
write cache.

Peter




Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-18 Thread ronnie sahlberg
On Wed, Dec 18, 2013 at 9:42 AM, Peter Lieven p...@kamp.de wrote:

 On 18.12.2013 at 18:33, ronnie sahlberg ronniesahlb...@gmail.com wrote:

 On Wed, Dec 18, 2013 at 8:59 AM, Peter Lieven p...@kamp.de wrote:

 On 18.12.2013 at 15:42, ronnie sahlberg ronniesahlb...@gmail.com wrote:

 On Wed, Dec 18, 2013 at 2:00 AM, Orit Wasserman owass...@redhat.com 
 wrote:
 On 12/18/2013 01:03 AM, Peter Lieven wrote:



 On 17.12.2013 at 18:32, Daniel P. Berrange berra...@redhat.com wrote:

 On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote:
 This patch adds native support for accessing images on NFS shares
 without
 the requirement to actually mount the entire NFS share on the host.

 NFS Images can simply be specified by an url of the form:
 nfs://host/export/filename

 For example:
 qemu-img create -f qcow2 nfs://10.0.0.1/qemu-images/test.qcow2


 Does it support other config tunables, eg specifying which
 NFS version to use 2/3/4 ? If so will they be available as
 URI parameters in the obvious manner ?


 currently only v3 is supported by libnfs. what other tunables would you
 like to see?


 For live migration we need the sync option (async ignores O_SYNC and
 O_DIRECT sadly),
 will it be supported? or will it be the default?


 If you use the high-level API that provides POSIX-like functions, such
 as nfs_open(), then libnfs does support it.
 nfs_open()/nfs_open_async() take a mode parameter, and libnfs checks
 for the O_SYNC flag in mode.

 By default libnfs will translate any nfs_write*() or nfs_pwrite*() to
 NFS/WRITE3+UNSTABLE, which allows the server to just write to
 cache/memory.

 If you specify O_SYNC in the mode argument to nfs_open()/nfs_open_async(),
 then libnfs will flag this handle as sync, and any calls to
 nfs_write()/nfs_pwrite() will translate to NFS/WRITE3+FILE_SYNC.

 Calls to nfs_fsync() are translated to NFS/COMMIT3.

 If this NFS/COMMIT3 issues a sync on the server, that would be all we
 actually need.

 You have that guarantee in NFS/COMMIT3
 NFS/COMMIT3 will not return until the server has flushed the specified
 range to disk.

 However, while the NFS protocol allows you to specify a range for the
 COMMIT3 call so that you can do things like
 WRITE3 Offset:foo Length:bar
 COMMIT3 Offset:foo Length:bar
 many/most nfs servers will ignore the offset/length arguments to the
 COMMIT3 call and always unconditionally make an fsync() for the whole
 file.

 This can make the COMMIT3 call very expensive for large files.


 NFSv3 also supports FILE_SYNC write mode, which libnfs triggers if you
 specify O_SYNC to nfs_open*()
 In this mode every single NFS/WRITE3 is sent with the FILE_SYNC mode
 which means that the server will guarantee to write the data to stable
 storage before responding back to the client.
 In this mode there is no real need to do anything at all or even call
 COMMIT3  since there is never any writeback data on the server that
 needs to be destaged.


 Since many servers treat COMMIT3 as "unconditionally walk all blocks
 for the whole file and make sure they are destaged", it is not clear how

 WRITE3-normal Offset:foo Length:bar
 COMMIT3 Offset:foo Length:bar

 will compare to

 WRITE3+O_SYNC Offset:foo Length:bar

 I would not be surprised if the second mode would have higher
 (potentially significantly) performance than the former.

 The qemu block layer currently is designed to send a bdrv_flush after every 
 single
 write if the write cache is not enabled. This means that the unwritten data 
 is just
 the data of the single write operation.

I understand that; there is only a single WRITE3 worth of data to
actually destage each time.

But what I meant is that for a lot of servers, for large files, the
server might need to spend a non-trivial amount of time
crunching file metadata and checking every single page of the file in
order to discover "I only need to destage pages x, y, z".

On many NFS servers this "figure out which blocks to flush" step can take a
lot of time and affect performance greatly.



 However, changing this to issue a sync
 write call would require to change the whole API. The major problem is that
 the write cache setting can be changed while the device is open otherwise
 we could just ignore all calls to bdrv flush if the device was opened without
 enabled write cache.

 In the very popular case of using Virtio as Driver it is the case that the 
 device
 is always opened with disabled write cache and the write cache is only
 enabled after the host has negotiated with the guest that the guest is
 able to send flushed.

 We can keep in mind for a later version of the driver that we manually craft
 a write call with O_SYNC if the write cache is disabled and ignore bdrv_flush.
 And we use async write + commit via bdrv_flush in the case of an enabled
 write cache.

 Peter




Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-18 Thread Peter Lieven

On 18.12.2013 at 18:50, ronnie sahlberg ronniesahlb...@gmail.com wrote:

 On Wed, Dec 18, 2013 at 9:42 AM, Peter Lieven p...@kamp.de wrote:
 
 On 18.12.2013 at 18:33, ronnie sahlberg ronniesahlb...@gmail.com wrote:
 
 On Wed, Dec 18, 2013 at 8:59 AM, Peter Lieven p...@kamp.de wrote:
 
 On 18.12.2013 at 15:42, ronnie sahlberg ronniesahlb...@gmail.com wrote:
 
 On Wed, Dec 18, 2013 at 2:00 AM, Orit Wasserman owass...@redhat.com 
 wrote:
 On 12/18/2013 01:03 AM, Peter Lieven wrote:
 
 
 
 On 17.12.2013 at 18:32, Daniel P. Berrange berra...@redhat.com wrote:
 
 On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote:
 This patch adds native support for accessing images on NFS shares
 without
 the requirement to actually mount the entire NFS share on the host.
 
 NFS Images can simply be specified by an url of the form:
 nfs://host/export/filename
 
 For example:
 qemu-img create -f qcow2 nfs://10.0.0.1/qemu-images/test.qcow2
 
 
 Does it support other config tunables, eg specifying which
 NFS version to use 2/3/4 ? If so will they be available as
 URI parameters in the obvious manner ?
 
 
 currently only v3 is supported by libnfs. what other tunables would you
 like to see?
 
 
 For live migration we need the sync option (async ignores O_SYNC and
 O_DIRECT sadly),
 will it be supported? or will it be the default?
 
 
 If you use the high-level API that provides POSIX-like functions, such
 as nfs_open(), then libnfs does support it.
 nfs_open()/nfs_open_async() take a mode parameter, and libnfs checks
 for the O_SYNC flag in mode.
 
 By default libnfs will translate any nfs_write*() or nfs_pwrite*() to
 NFS/WRITE3+UNSTABLE, which allows the server to just write to
 cache/memory.
 
 If you specify O_SYNC in the mode argument to nfs_open()/nfs_open_async(),
 then libnfs will flag this handle as sync, and any calls to
 nfs_write()/nfs_pwrite() will translate to NFS/WRITE3+FILE_SYNC.
 
 Calls to nfs_fsync() are translated to NFS/COMMIT3.
 
 If this NFS/COMMIT3 issues a sync on the server, that would be all we
 actually need.
 
 You have that guarantee in NFS/COMMIT3
 NFS/COMMIT3 will not return until the server has flushed the specified
 range to disk.
 
 However, while the NFS protocol allows you to specify a range for the
 COMMIT3 call so that you can do things like
 WRITE3 Offset:foo Length:bar
 COMMIT3 Offset:foo Length:bar
 many/most nfs servers will ignore the offset/length arguments to the
 COMMIT3 call and always unconditionally make an fsync() for the whole
 file.
 
 This can make the COMMIT3 call very expensive for large files.
 
 
 NFSv3 also supports FILE_SYNC write mode, which libnfs triggers if you
 specify O_SYNC to nfs_open*()
 In this mode every single NFS/WRITE3 is sent with the FILE_SYNC mode
 which means that the server will guarantee to write the data to stable
 storage before responding back to the client.
 In this mode there is no real need to do anything at all or even call
 COMMIT3  since there is never any writeback data on the server that
 needs to be destaged.
 
 
 Since many servers treat COMMIT3 as "unconditionally walk all blocks
 for the whole file and make sure they are destaged", it is not clear how
 
 WRITE3-normal Offset:foo Length:bar
 COMMIT3 Offset:foo Length:bar
 
 will compare to
 
 WRITE3+O_SYNC Offset:foo Length:bar
 
 I would not be surprised if the second mode would have higher
 (potentially significantly) performance than the former.
 
 The qemu block layer currently is designed to send a bdrv_flush after every 
 single
 write if the write cache is not enabled. This means that the unwritten data 
 is just
 the data of the single write operation.
 
 I understand that, there is only a single WRITE3 worth of data to
 actually destage each time.
 
 But what I meant is that for a lot of servers, for large files,   the
 server might need to spend non-trivial amount of time
 crunching file metadata and check every single page for the file in
 order to discover the I only need to destage pages x,y,z
 
 On many nfs servers this figure out which blocks to flush can take a
 lot of time and affect performance greatly.

But this is only the case for a disabled write cache. I would not expect
great write performance at all if the write cache is disabled.

As an improvement for the future, we could improve the write operation with a
disabled write cache by sending a write + FILE_SYNC call for every write and
ignoring the bdrv_flush, if that turns out to be faster.
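
A sketch of what that could look like inside nfs_co_writev()
(nfs_pwrite_sync_async() is a hypothetical FILE_SYNC variant that does not
exist in libnfs today; everything else as in the posted patch):

/* Hypothetical: derive the NFS write stability from the qemu
 * write cache setting instead of flushing after every write. */
if (!bs->enable_write_cache) {
    /* write cache off: make the WRITE3 itself FILE_SYNC; the
     * driver's flush callback could then become a no-op */
    ret = nfs_pwrite_sync_async(client->context, client->fh,
                                sector_num * BDRV_SECTOR_SIZE,
                                nb_sectors * BDRV_SECTOR_SIZE,
                                buf, nfs_co_generic_cb, &task);
} else {
    /* write cache on: UNSTABLE write, COMMIT3 comes via bdrv_flush */
    ret = nfs_pwrite_async(client->context, client->fh,
                           sector_num * BDRV_SECTOR_SIZE,
                           nb_sectors * BDRV_SECTOR_SIZE,
                           buf, nfs_co_generic_cb, &task);
}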

Peter




Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-17 Thread Stefan Hajnoczi
On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote:
 This patch adds native support for accessing images on NFS shares without
 the requirement to actually mount the entire NFS share on the host.
 
 NFS Images can simply be specified by an url of the form:
 nfs://host/export/filename
 
 For example:
 qemu-img create -f qcow2 nfs://10.0.0.1/qemu-images/test.qcow2
 
 You need libnfs from Ronnie Sahlberg available at:
git://github.com/sahlberg/libnfs.git
 for this to work.
 
 During configure it is automatically probed for libnfs and support
 is enabled on-the-fly. You can forbid or enforce libnfs support
 with --disable-libnfs or --enable-libnfs respectively.
 
 Due to NFS restrictions you might need to execute your binaries
 as root, allow them to open privileged ports (< 1024) or specify
 the insecure option on the NFS server.
 
 Signed-off-by: Peter Lieven p...@kamp.de
 ---
 v1->v2:
  - fixed block/Makefile.objs [Ronnie]
  - do not always register a read handler [Ronnie]
  - add support for reading beyond EOF [Fam]
  - fixed struct and parameter naming [Fam]
  - fixed overlong lines and whitespace errors [Fam]
  - return the return status from libnfs wherever possible [Fam]
  - added comment why we set allocated_file_size to -ENOTSUP after write [Fam]
  - avoid segfault when parsing filename [Fam]
  - remove unused close_bh from NFSClient [Fam]
  - avoid dividing and multiplying total_size by BDRV_SECTOR_SIZE in nfs_file_create [Fam]
 
  MAINTAINERS |5 +
  block/Makefile.objs |1 +
  block/nfs.c |  419 
 +++
  configure   |   38 +
  4 files changed, 463 insertions(+)
  create mode 100644 block/nfs.c

Which NFS protocol versions are supported by current libnfs?

 +#include <poll.h>

Why is this header included?

 +typedef struct nfsclient {

Please either drop the struct tag or use NFSClient.

 +static void
 +nfs_co_generic_cb(int status, struct nfs_context *nfs, void *data,
 +  void *private_data)
 +{
 +NFSTask *Task = private_data;

lowercase task local variable name please.

 +static int coroutine_fn nfs_co_writev(BlockDriverState *bs,
 +int64_t sector_num, int nb_sectors,
 +QEMUIOVector *iov)
 +{
 +NFSClient *client = bs->opaque;
 +NFSTask task;
 +char *buf = NULL;
 +
 +nfs_co_init_task(client, &task);
 +
 +buf = g_malloc(nb_sectors * BDRV_SECTOR_SIZE);
 +qemu_iovec_to_buf(iov, 0, buf, nb_sectors * BDRV_SECTOR_SIZE);
 +
 +if (nfs_pwrite_async(client->context, client->fh,
 + sector_num * BDRV_SECTOR_SIZE,
 + nb_sectors * BDRV_SECTOR_SIZE,
 + buf, nfs_co_generic_cb, &task) != 0) {
 +g_free(buf);
 +return -EIO;

Can we get a more detailed errno here?  (e.g. ENOSPC)

 +}
 +
 +while (!task.complete) {
 +nfs_set_events(client);
 +qemu_coroutine_yield();
 +}
 +
 +g_free(buf);
 +
 +if (task.status != nb_sectors * BDRV_SECTOR_SIZE) {
 +return task.status < 0 ? task.status : -EIO;
 +}
 +
 +bs->total_sectors = MAX(bs->total_sectors, sector_num + nb_sectors);

Why is this necessary?  block.c will update bs-total_sectors if the
file is growable.

 +/* set to -ENOTSUP since bdrv_allocated_file_size is only used
 + * in qemu-img open. So we can use the cached value for allocated
 + * filesize obtained from fstat at open time */
 +client->allocated_file_size = -ENOTSUP;

Can you implement this fully?  By stubbing it out like this we won't be
able to call get_allocated_file_size() at runtime in the future without
updating the nfs block driver code.  It's just an fstat call, shouldn't
be too hard to implement properly :).
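
One way to do that without coroutine entanglement is to drive the async fstat
with the aio loop directly; a sketch, assuming NFSTask gains a struct stat
pointer for the result (st_blocks * 512 being the allocated size on POSIX):

static void nfs_stat_cb(int status, struct nfs_context *nfs,
                        void *data, void *private_data)
{
    NFSTask *task = private_data;
    task->status = status;
    task->complete = 1;
    if (status == 0 && task->st != NULL) {
        /* for fstat, data points at the struct stat result */
        memcpy(task->st, data, sizeof(struct stat));
    }
}

static int64_t nfs_get_allocated_file_size(BlockDriverState *bs)
{
    NFSClient *client = bs->opaque;
    struct stat st;
    NFSTask task = { .complete = 0, .st = &st };

    if (nfs_fstat_async(client->context, client->fh,
                        nfs_stat_cb, &task) != 0) {
        return -ENOMEM;
    }
    /* pump the event loop until the RPC completes */
    while (!task.complete) {
        nfs_set_events(client);
        qemu_aio_wait();
    }
    return task.status < 0 ? task.status : st.st_blocks * 512;
}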

 +if (client->context == NULL) {
 +error_setg(errp, "Failed to init NFS context");
 +ret = -EINVAL;
 +goto fail;
 +}
 +
 +if (strlen(filename) <= 6) {
 +error_setg(errp, "Invalid server in URL");
 +ret = -EINVAL;
 +goto fail;
 +}
 +
 +server = g_strdup(filename + 6);
 +if (server[0] == '/' || server[0] == '\0') {
 +error_setg(errp, "Invalid server in URL");
 +ret = -EINVAL;
 +goto fail;
 +}
 +strp = strchr(server, '/');
 +if (strp == NULL) {
 +error_setg(errp, "Invalid URL specified.\n");
 +ret = -EINVAL;
 +goto fail;
 +}
 +path = g_strdup(strp);
 +*strp = 0;
 +strp = strrchr(path, '/');
 +if (strp == NULL) {
 +error_setg(errp, "Invalid URL specified.\n");
 +ret = -EINVAL;
 +goto fail;
 +}
 +file = g_strdup(strp);
 +*strp = 0;

Can you use util/uri.c to avoid the string manipulation?  It can extract
the different parts for validation: scheme, server, port, and path.
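
A sketch of what that could look like with util/uri.c (URI field names as in
qemu's include/qemu/uri.h; error paths trimmed):

#include "qemu/uri.h"

/* Let util/uri.c split the URL instead of the open-coded
 * strchr()/strrchr() surgery above. */
static int nfs_parse_filename(const char *filename, Error **errp)
{
    URI *uri = uri_parse(filename);

    if (uri == NULL || uri->scheme == NULL ||
        strcmp(uri->scheme, "nfs") != 0 ||
        uri->server == NULL || uri->path == NULL) {
        error_setg(errp, "Invalid NFS URL");
        if (uri != NULL) {
            uri_free(uri);
        }
        return -EINVAL;
    }
    /* uri->server, uri->port and uri->path are now separate; the
     * export vs. file split would still be done on uri->path. */
    uri_free(uri);
    return 0;
}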

 +static int nfs_file_create(const char *filename, QEMUOptionParameter 
 *options,
 + 

Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-17 Thread ronnie sahlberg
NFSTask

Task is a very scsi-ish term. Maybe RPC is better ?

NFSrpc ?



On Tue, Dec 17, 2013 at 1:15 AM, Peter Lieven p...@kamp.de wrote:
 This patch adds native support for accessing images on NFS shares without
 the requirement to actually mount the entire NFS share on the host.

 NFS Images can simply be specified by an url of the form:
 nfs://host/export/filename

 For example:
 qemu-img create -f qcow2 nfs://10.0.0.1/qemu-images/test.qcow2

 You need libnfs from Ronnie Sahlberg available at:
git://github.com/sahlberg/libnfs.git
 for this to work.

 During configure it is automatically probed for libnfs and support
 is enabled on-the-fly. You can forbid or enforce libnfs support
 with --disable-libnfs or --enable-libnfs respectively.

 Due to NFS restrictions you might need to execute your binaries
 as root, allow them to open privileged ports (< 1024) or specify
 the insecure option on the NFS server.

 Signed-off-by: Peter Lieven p...@kamp.de
 ---
 v1->v2:
  - fixed block/Makefile.objs [Ronnie]
  - do not always register a read handler [Ronnie]
  - add support for reading beyond EOF [Fam]
  - fixed struct and parameter naming [Fam]
  - fixed overlong lines and whitespace errors [Fam]
  - return the return status from libnfs wherever possible [Fam]
  - added comment why we set allocated_file_size to -ENOTSUP after write [Fam]
  - avoid segfault when parsing filename [Fam]
  - remove unused close_bh from NFSClient [Fam]
  - avoid dividing and multiplying total_size by BDRV_SECTOR_SIZE in nfs_file_create [Fam]

  MAINTAINERS |5 +
  block/Makefile.objs |1 +
  block/nfs.c |  419 
 +++
  configure   |   38 +
  4 files changed, 463 insertions(+)
  create mode 100644 block/nfs.c

 diff --git a/MAINTAINERS b/MAINTAINERS
 index c19133f..f53d184 100644
 --- a/MAINTAINERS
 +++ b/MAINTAINERS
 @@ -899,6 +899,11 @@ M: Peter Lieven p...@kamp.de
  S: Supported
  F: block/iscsi.c

 +NFS
 +M: Peter Lieven p...@kamp.de
 +S: Maintained
 +F: block/nfs.c
 +
  SSH
  M: Richard W.M. Jones rjo...@redhat.com
  S: Supported
 diff --git a/block/Makefile.objs b/block/Makefile.objs
 index f43ecbc..aa8eaf9 100644
 --- a/block/Makefile.objs
 +++ b/block/Makefile.objs
 @@ -12,6 +12,7 @@ block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
  ifeq ($(CONFIG_POSIX),y)
  block-obj-y += nbd.o sheepdog.o
  block-obj-$(CONFIG_LIBISCSI) += iscsi.o
 +block-obj-$(CONFIG_LIBNFS) += nfs.o
  block-obj-$(CONFIG_CURL) += curl.o
  block-obj-$(CONFIG_RBD) += rbd.o
  block-obj-$(CONFIG_GLUSTERFS) += gluster.o
 diff --git a/block/nfs.c b/block/nfs.c
 new file mode 100644
 index 000..006b8cc
 --- /dev/null
 +++ b/block/nfs.c
 @@ -0,0 +1,419 @@
 +/*
 + * QEMU Block driver for native access to files on NFS shares
 + *
 + * Copyright (c) 2013 Peter Lieven p...@kamp.de
 + *
 + * Permission is hereby granted, free of charge, to any person obtaining a 
 copy
 + * of this software and associated documentation files (the "Software"), to deal
 + * in the Software without restriction, including without limitation the 
 rights
 + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 + * copies of the Software, and to permit persons to whom the Software is
 + * furnished to do so, subject to the following conditions:
 + *
 + * The above copyright notice and this permission notice shall be included in
 + * all copies or substantial portions of the Software.
 + *
 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
 FROM,
 + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 + * THE SOFTWARE.
 + */
 +
 +#include "config-host.h"
 +
 +#include <poll.h>
 +#include "qemu-common.h"
 +#include "qemu/config-file.h"
 +#include "qemu/error-report.h"
 +#include "block/block_int.h"
 +#include "trace.h"
 +#include "qemu/iov.h"
 +#include "sysemu/sysemu.h"
 +
 +#include <nfsc/libnfs-zdr.h>
 +#include <nfsc/libnfs.h>
 +#include <nfsc/libnfs-raw.h>
 +#include <nfsc/libnfs-raw-mount.h>
 +
 +typedef struct nfsclient {
 +struct nfs_context *context;
 +struct nfsfh *fh;
 +int events;
 +bool has_zero_init;
 +int64_t allocated_file_size;
 +} NFSClient;
 +
 +typedef struct NFSTask {
 +int status;
 +int complete;
 +QEMUIOVector *iov;
 +Coroutine *co;
 +QEMUBH *bh;
 +} NFSTask;
 +
 +static void nfs_process_read(void *arg);
 +static void nfs_process_write(void *arg);
 +
 +static void nfs_set_events(NFSClient *client)
 +{
 +int ev = nfs_which_events(client->context);
 +if (ev != client->events) {
 +qemu_aio_set_fd_handler(nfs_get_fd(client->context),
 +  (ev & POLLIN) ? nfs_process_read : 

Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-17 Thread Peter Lieven

On 17.12.2013 17:47, Stefan Hajnoczi wrote:

On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote:

This patch adds native support for accessing images on NFS shares without
the requirement to actually mount the entire NFS share on the host.

NFS Images can simply be specified by an url of the form:
nfs://host/export/filename

For example:
qemu-img create -f qcow2 nfs://10.0.0.1/qemu-images/test.qcow2

You need libnfs from Ronnie Sahlberg available at:
git://github.com/sahlberg/libnfs.git
for this to work.

During configure it is automatically probed for libnfs and support
is enabled on-the-fly. You can forbid or enforce libnfs support
with --disable-libnfs or --enable-libnfs respectively.

Due to NFS restrictions you might need to execute your binaries
as root, allow them to open privileged ports (< 1024) or specify
the insecure option on the NFS server.

Signed-off-by: Peter Lieven p...@kamp.de
---
v1->v2:
  - fixed block/Makefile.objs [Ronnie]
  - do not always register a read handler [Ronnie]
  - add support for reading beyond EOF [Fam]
  - fixed struct and parameter naming [Fam]
  - fixed overlong lines and whitespace errors [Fam]
  - return the return status from libnfs wherever possible [Fam]
  - added comment why we set allocated_file_size to -ENOTSUP after write [Fam]
  - avoid segfault when parsing filename [Fam]
  - remove unused close_bh from NFSClient [Fam]
  - avoid dividing and multiplying total_size by BDRV_SECTOR_SIZE in nfs_file_create [Fam]

  MAINTAINERS |5 +
  block/Makefile.objs |1 +
  block/nfs.c |  419 +++
  configure   |   38 +
  4 files changed, 463 insertions(+)
  create mode 100644 block/nfs.c

Which NFS protocol versions are supported by current libnfs?

Will check that out. Ronnie?



+#include <poll.h>

Why is this header included?

leftover.



+typedef struct nfsclient {

Please either drop the struct tag or use NFSClient.

ok



+static void
+nfs_co_generic_cb(int status, struct nfs_context *nfs, void *data,
+  void *private_data)
+{
+NFSTask *Task = private_data;

lowercase task local variable name please.

ok



+static int coroutine_fn nfs_co_writev(BlockDriverState *bs,
+int64_t sector_num, int nb_sectors,
+QEMUIOVector *iov)
+{
+NFSClient *client = bs->opaque;
+NFSTask task;
+char *buf = NULL;
+
+nfs_co_init_task(client, &task);
+
+buf = g_malloc(nb_sectors * BDRV_SECTOR_SIZE);
+qemu_iovec_to_buf(iov, 0, buf, nb_sectors * BDRV_SECTOR_SIZE);
+
+if (nfs_pwrite_async(client->context, client->fh,
+ sector_num * BDRV_SECTOR_SIZE,
+ nb_sectors * BDRV_SECTOR_SIZE,
+ buf, nfs_co_generic_cb, &task) != 0) {
+g_free(buf);
+return -EIO;

Can we get a more detailed errno here?  (e.g. ENOSPC)

libnfs only returns 0 or -1 if the setup of the call
fails. The status code from the RPC is more detailed
and available in task.status.



+}
+
+while (!task.complete) {
+nfs_set_events(client);
+qemu_coroutine_yield();
+}
+
+g_free(buf);
+
+if (task.status != nb_sectors * BDRV_SECTOR_SIZE) {
+return task.status < 0 ? task.status : -EIO;
+}
+
+bs->total_sectors = MAX(bs->total_sectors, sector_num + nb_sectors);

Why is this necessary?  block.c will update bs-total_sectors if the
file is growable.

Ok, didn't know ;-)



+/* set to -ENOTSUP since bdrv_allocated_file_size is only used
+ * in qemu-img open. So we can use the cached value for allocated
+ * filesize obtained from fstat at open time */
+client->allocated_file_size = -ENOTSUP;

Can you implement this fully?  By stubbing it out like this we won't be
able to call get_allocated_file_size() at runtime in the future without
updating the nfs block driver code.  It's just an fstat call, shouldn't
be too hard to implement properly :).

Will do. I will also add bdrv_truncate as it's needed for VMDK create.



+if (client->context == NULL) {
+error_setg(errp, "Failed to init NFS context");
+ret = -EINVAL;
+goto fail;
+}
+
+if (strlen(filename) <= 6) {
+error_setg(errp, "Invalid server in URL");
+ret = -EINVAL;
+goto fail;
+}
+
+server = g_strdup(filename + 6);
+if (server[0] == '/' || server[0] == '\0') {
+error_setg(errp, "Invalid server in URL");
+ret = -EINVAL;
+goto fail;
+}
+strp = strchr(server, '/');
+if (strp == NULL) {
+error_setg(errp, "Invalid URL specified.\n");
+ret = -EINVAL;
+goto fail;
+}
+path = g_strdup(strp);
+*strp = 0;
+strp = strrchr(path, '/');
+if (strp == NULL) {
+error_setg(errp, "Invalid URL specified.\n");
+ret = -EINVAL;
+goto fail;
+}
+file = g_strdup(strp);
+*strp = 0;

Can you use 

Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-17 Thread ronnie sahlberg
On Tue, Dec 17, 2013 at 9:03 AM, Peter Lieven p...@kamp.de wrote:
 On 17.12.2013 17:47, Stefan Hajnoczi wrote:
...
 Which NFS protocol versions are supported by current libnfs?

 Will check that out. Ronnie?


It uses NFS v3 only.



Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-17 Thread ronnie sahlberg
On Tue, Dec 17, 2013 at 9:03 AM, Peter Lieven p...@kamp.de wrote:
 On 17.12.2013 17:47, Stefan Hajnoczi wrote:

 On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote:

...
 +if (nfs_pwrite_async(client->context, client->fh,
 + sector_num * BDRV_SECTOR_SIZE,
 + nb_sectors * BDRV_SECTOR_SIZE,
 + buf, nfs_co_generic_cb, &task) != 0) {
 +g_free(buf);
 +return -EIO;

 Can we get a more detailed errno here?  (e.g. ENOSPC)

 libnfs only returns 0 or -1 if the setup of the call
 fails. the status code from the RPC is more detailed
 and available in task.status.


The *_async() functions only allocate memory and marshal the
arguments into the buffer.
So barring marshalling bugs, they will only fail due to OOM.

So -ENOMEM is perhaps a better error for when *_async() returns an error.
That is really the only condition where these functions will fail.


If *_async() returns success, you are guaranteed that
nfs_co_generic_cb() will be invoked,
and there you can inspect the status argument for a more detailed reason why.
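
So the hunk discussed above would presumably become (unchanged except for the
errno):

if (nfs_pwrite_async(client->context, client->fh,
                     sector_num * BDRV_SECTOR_SIZE,
                     nb_sectors * BDRV_SECTOR_SIZE,
                     buf, nfs_co_generic_cb, &task) != 0) {
    g_free(buf);
    return -ENOMEM;  /* *_async() only fails on OOM / marshalling */
}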



Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-17 Thread Daniel P. Berrange
On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote:
 This patch adds native support for accessing images on NFS shares without
 the requirement to actually mount the entire NFS share on the host.
 
 NFS Images can simply be specified by an url of the form:
 nfs://host/export/filename
 
 For example:
 qemu-img create -f qcow2 nfs://10.0.0.1/qemu-images/test.qcow2

Does it support other config tunables, eg specifying which
NFS version to use 2/3/4 ? If so will they be available as
URI parameters in the obvious manner ?

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-17 Thread Peter Lieven


 On 17.12.2013 at 18:13, ronnie sahlberg ronniesahlb...@gmail.com wrote:
 
 On Tue, Dec 17, 2013 at 9:03 AM, Peter Lieven p...@kamp.de wrote:
 On 17.12.2013 17:47, Stefan Hajnoczi wrote:
 ...
 Which NFS protocol versions are supported by current libnfs?
 
 Will check that out. Ronnie?
 
 It uses NFS v3 only.

should we use nfs3:// for the urls then?



Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-17 Thread Eric Blake
On 12/17/2013 03:36 PM, Peter Lieven wrote:
 
 
 On 17.12.2013 at 18:13, ronnie sahlberg ronniesahlb...@gmail.com wrote:

 On Tue, Dec 17, 2013 at 9:03 AM, Peter Lieven p...@kamp.de wrote:
 On 17.12.2013 17:47, Stefan Hajnoczi wrote:
 ...
 Which NFS protocol versions are supported by current libnfs?

 Will check that out. Ronnie?

 It uses NFS v3 only.
 
 should we use nfs3:// for the urls then?

Or maybe nfs://10.0.0.1/qemu-images/test.qcow2?protocol=3 (where
?protocol= is optional and defaults to 3, but could be expanded to
include 4 in the future, and that way the single nfs:// URI covers both
protocols).

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-17 Thread ronnie sahlberg
On Tue, Dec 17, 2013 at 2:36 PM, Peter Lieven p...@kamp.de wrote:


 On 17.12.2013 at 18:13, ronnie sahlberg ronniesahlb...@gmail.com wrote:

 On Tue, Dec 17, 2013 at 9:03 AM, Peter Lieven p...@kamp.de wrote:
 On 17.12.2013 17:47, Stefan Hajnoczi wrote:
 ...
 Which NFS protocol versions are supported by current libnfs?

 Will check that out. Ronnie?

 It uses NFS v3 only.

 should we use nfs3:// for the urls then?

No, I think we should leave it as nfs://... so that we are compatible
with RFC 2224

Once/if/when I add support for v2 and v4 we can force a protocol
version using ?version=2

Then
nfs://server/foo/bar   would use whatever version the server offers,
but
nfs://server/foo/bar?version=2 would mean: use version 2 only
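
If/when that happens, the driver side could pull the parameter out of the URL
with qemu's query-string helper; a sketch (query_params_parse() per qemu's
include/qemu/uri.h, with 3 assumed as the default since that is all libnfs
speaks today):

#include <stdlib.h>
#include <string.h>
#include "qemu/uri.h"

/* Extract ?version=N from an already-parsed URI, defaulting to 3. */
static int nfs_url_version(URI *uri)
{
    int version = 3;

    if (uri->query != NULL) {
        QueryParams *qp = query_params_parse(uri->query);
        int i;

        for (i = 0; i < qp->n; i++) {
            if (strcmp(qp->p[i].name, "version") == 0 && qp->p[i].value) {
                version = atoi(qp->p[i].value);
            }
        }
        query_params_free(qp);
    }
    return version;
}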



Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-17 Thread Peter Lieven


 On 17.12.2013 at 23:51, ronnie sahlberg ronniesahlb...@gmail.com wrote:
 
 On Tue, Dec 17, 2013 at 2:36 PM, Peter Lieven p...@kamp.de wrote:
 
 
 On 17.12.2013 at 18:13, ronnie sahlberg ronniesahlb...@gmail.com wrote:
 
 On Tue, Dec 17, 2013 at 9:03 AM, Peter Lieven p...@kamp.de wrote:
 On 17.12.2013 17:47, Stefan Hajnoczi wrote:
 ...
 Which NFS protocol versions are supported by current libnfs?
 
 Will check that out. Ronnie?
 
 It uses NFS v3 only.
 
 should we use nfs3:// for the urls then?
 
 No, I think we should leave it as nfs://... so that we are compatible
 with RFC 2224
 
 Once/if/when I add support for v2 and v4 we can force a protocol
 version using ?version=2
 
 Then
 nfs://server/foo/bar   would be use whatever versions the server offers
 but
 nfs://server/foo/bar?version=2 would become use version 2 only

Then I would leave it as is and add a comment to the commit message that only
v3 is supported at the moment.


Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-17 Thread Peter Lieven


 On 17.12.2013 at 17:53, ronnie sahlberg ronniesahlb...@gmail.com wrote:
 
 NFSTask
 
 Task is a very scsi-ish term. Maybe RPC is better ?
 
 NFSrpc ?

will change it in v3

 
 
 
 On Tue, Dec 17, 2013 at 1:15 AM, Peter Lieven p...@kamp.de wrote:
 This patch adds native support for accessing images on NFS shares without
 the requirement to actually mount the entire NFS share on the host.
 
 NFS Images can simply be specified by an url of the form:
 nfs://host/export/filename
 
 For example:
 qemu-img create -f qcow2 nfs://10.0.0.1/qemu-images/test.qcow2
 
 You need libnfs from Ronnie Sahlberg available at:
   git://github.com/sahlberg/libnfs.git
 for this to work.
 
 During configure it is automatically probed for libnfs and support
 is enabled on-the-fly. You can forbid or enforce libnfs support
 with --disable-libnfs or --enable-libnfs respectively.
 
 Due to NFS restrictions you might need to execute your binaries
 as root, allow them to open privileged ports (< 1024) or specify
 the insecure option on the NFS server.
 
 Signed-off-by: Peter Lieven p...@kamp.de
 ---
 v1->v2:
 - fixed block/Makefile.objs [Ronnie]
 - do not always register a read handler [Ronnie]
 - add support for reading beyond EOF [Fam]
 - fixed struct and parameter naming [Fam]
 - fixed overlong lines and whitespace errors [Fam]
 - return the return status from libnfs wherever possible [Fam]
 - added comment why we set allocated_file_size to -ENOTSUP after write [Fam]
 - avoid segfault when parsing filename [Fam]
 - remove unused close_bh from NFSClient [Fam]
 - avoid dividing and multiplying total_size by BDRV_SECTOR_SIZE in nfs_file_create [Fam]
 
 MAINTAINERS |5 +
 block/Makefile.objs |1 +
 block/nfs.c |  419 
 +++
 configure   |   38 +
 4 files changed, 463 insertions(+)
 create mode 100644 block/nfs.c
 
 diff --git a/MAINTAINERS b/MAINTAINERS
 index c19133f..f53d184 100644
 --- a/MAINTAINERS
 +++ b/MAINTAINERS
 @@ -899,6 +899,11 @@ M: Peter Lieven p...@kamp.de
 S: Supported
 F: block/iscsi.c
 
 +NFS
 +M: Peter Lieven p...@kamp.de
 +S: Maintained
 +F: block/nfs.c
 +
 SSH
 M: Richard W.M. Jones rjo...@redhat.com
 S: Supported
 diff --git a/block/Makefile.objs b/block/Makefile.objs
 index f43ecbc..aa8eaf9 100644
 --- a/block/Makefile.objs
 +++ b/block/Makefile.objs
 @@ -12,6 +12,7 @@ block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
 ifeq ($(CONFIG_POSIX),y)
 block-obj-y += nbd.o sheepdog.o
 block-obj-$(CONFIG_LIBISCSI) += iscsi.o
 +block-obj-$(CONFIG_LIBNFS) += nfs.o
 block-obj-$(CONFIG_CURL) += curl.o
 block-obj-$(CONFIG_RBD) += rbd.o
 block-obj-$(CONFIG_GLUSTERFS) += gluster.o
 diff --git a/block/nfs.c b/block/nfs.c
 new file mode 100644
 index 000..006b8cc
 --- /dev/null
 +++ b/block/nfs.c
 @@ -0,0 +1,419 @@
 +/*
 + * QEMU Block driver for native access to files on NFS shares
 + *
 + * Copyright (c) 2013 Peter Lieven p...@kamp.de
 + *
 + * Permission is hereby granted, free of charge, to any person obtaining a 
 copy
 + * of this software and associated documentation files (the "Software"), to deal
 + * in the Software without restriction, including without limitation the 
 rights
 + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 + * copies of the Software, and to permit persons to whom the Software is
 + * furnished to do so, subject to the following conditions:
 + *
 + * The above copyright notice and this permission notice shall be included 
 in
 + * all copies or substantial portions of the Software.
 + *
 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR 
 OTHER
 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
 FROM,
 + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 + * THE SOFTWARE.
 + */
 +
 +#include "config-host.h"
 +
 +#include <poll.h>
 +#include "qemu-common.h"
 +#include "qemu/config-file.h"
 +#include "qemu/error-report.h"
 +#include "block/block_int.h"
 +#include "trace.h"
 +#include "qemu/iov.h"
 +#include "sysemu/sysemu.h"
 +
 +#include <nfsc/libnfs-zdr.h>
 +#include <nfsc/libnfs.h>
 +#include <nfsc/libnfs-raw.h>
 +#include <nfsc/libnfs-raw-mount.h>
 +
 +typedef struct nfsclient {
 +struct nfs_context *context;
 +struct nfsfh *fh;
 +int events;
 +bool has_zero_init;
 +int64_t allocated_file_size;
 +} NFSClient;
 +
 +typedef struct NFSTask {
 +int status;
 +int complete;
 +QEMUIOVector *iov;
 +Coroutine *co;
 +QEMUBH *bh;
 +} NFSTask;
 +
 +static void nfs_process_read(void *arg);
 +static void nfs_process_write(void *arg);
 +
 +static void nfs_set_events(NFSClient *client)
 +{
 +int ev = nfs_which_events(client->context);
 +if (ev != client->events) {
 +

Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-17 Thread Peter Lieven


 On 17.12.2013 at 18:28, ronnie sahlberg ronniesahlb...@gmail.com wrote:
 
 On Tue, Dec 17, 2013 at 9:03 AM, Peter Lieven p...@kamp.de wrote:
 On 17.12.2013 17:47, Stefan Hajnoczi wrote:
 
 On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote:
 
 ...
 +if (nfs_pwrite_async(client->context, client->fh,
 + sector_num * BDRV_SECTOR_SIZE,
 + nb_sectors * BDRV_SECTOR_SIZE,
 + buf, nfs_co_generic_cb, &task) != 0) {
 +g_free(buf);
 +return -EIO;
 
 Can we get a more detailed errno here?  (e.g. ENOSPC)
 
 libnfs only returns 0 or -1 if the setup of the call
 fails. the status code from the RPC is more detailed
 and available in task.status.
 
 The *_async() functions only allocates memory and marshalls the
 arguments to the buffer.
 So barring marshalling bugs, it will only fail due to OOM.
 
 So -ENOMEM is perhaps a better error for when *_async() returns an error.
 That is really the only condition where these functions will fail.

I guess the same applies to libiscsi?!
I will change it in v3 and make a patch for the iscsi driver.

 
 
 If *_async() returns success you are guaranteed that
 nfs_co_generic_cb() will be invoked
 and there you can inspect the status argument for more detailed reason why.



Re: [Qemu-devel] [PATCHv2] block: add native support for NFS

2013-12-17 Thread Peter Lieven


 On 17.12.2013 at 18:32, Daniel P. Berrange berra...@redhat.com wrote:
 
 On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote:
 This patch adds native support for accessing images on NFS shares without
 the requirement to actually mount the entire NFS share on the host.
 
 NFS Images can simply be specified by an url of the form:
 nfs://host/export/filename
 
 For example:
 qemu-img create -f qcow2 nfs://10.0.0.1/qemu-images/test.qcow2
 
 Does it support other config tunables, eg specifying which
 NFS version to use 2/3/4 ? If so will they be available as
 URI parameters in the obvious manner ?

currently only v3 is supported by libnfs. what other tunables would you like to 
see?

 
 Daniel
 -- 
 |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
 |: http://libvirt.org  -o- http://virt-manager.org :|
 |: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
 |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|