Re: [Qemu-block] [RFC for-2.7 0/1] block/qapi: Add query-block-node-tree

2016-03-31 Thread Stefan Hajnoczi
On Thu, Mar 24, 2016 at 08:07:17PM +0100, Max Reitz wrote:
> As I responded to:
> - http://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg04464.html
> - http://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg05680.html
> 
> I think a general solution for querying the block node tree would be
> nice (I don't think we actually have one). The single patch in this
> series implements such a command.
> 
> However, this is an RFC because I'm not sure whether we really want this
> and thus I didn't want to write the necessary tests for something we may
> be going to discard anyway.
> 
> There are two reasons why I fear we may not want this:
> 
> The first is that the node graph is more or less something internal to
> qemu. Its actual structure may (and most probably will) change over
> time. We do want to be able to let the user or management application
> manage the graph in fine detail, but these modifications are something
> that can be emulated by legacy handling code later if we decide they are
> no longer in line with the internal graph that we'd like to have.
> 
> However, if we emit the full graph with a command such as introduced
> here, we can hardly change its outside appearance just to please legacy
> applications. The output will change if the internal representation
> changes.
> 
> I don't personally think this is too bad as long as we clearly state
> this in the command's description: That qemu is free to implicitly
> create intermediate nodes the user did not explicitly specify, and that
> the set of nodes thus created may change over time.
> 
> No, I didn't specify this in this version, but it's an RFC, after all.
> :-)

I'm also concerned about exposing the graph since QEMU may implement
features with internal block filters (I/O throttling, backup block job
to get rid of write notifiers, etc).  These nodes come and go depending
on high-level commands issued by the guest *and* are dependent on the
QEMU version.

What is the purpose of this command - I didn't see that in your cover
letter?  Does it still serve its purpose if we warn the user that the
graph structure can contain little surprises :)?


signature.asc
Description: PGP signature


Re: [Qemu-block] [PULL for-2.6 0/2] Block patches

2016-03-31 Thread Peter Maydell
On 30 March 2016 at 21:52, Jeff Cody  wrote:
> The following changes since commit 9370a3bbc478f623dd21d783560629ea2064625b:
>
>   Update version for v2.6.0-rc0 release (2016-03-30 19:25:40 +0100)
>
> are available in the git repository at:
>
>   g...@github.com:codyprime/qemu-kvm-jtc.git tags/block-pull-request
>
> for you to fetch changes up to 0d94b74655adadbeee135aee9bc105a41a524436:
>
>   block/nfs: add missing #include "qemu/cutils.h" (2016-03-30 16:50:39 -0400)
>
> 
> Block patch for 2.6, to fix build failure for libnfs
> 

Applied, thanks.

-- PMM



Re: [Qemu-block] [PATCH v12 2/3] quorum: implement bdrv_add_child() and bdrv_del_child()

2016-03-31 Thread Alberto Garcia
On Wed 30 Mar 2016 05:07:15 PM CEST, Max Reitz wrote:
>> I also have another (not directly related) question: why not simply
>> use the node name when removing children? I understood that the idea
>> was that it's possible to have the same node attached twice to the
>> same Quorum, but can you actually do that? And what's the use case?
>
> What I like about using the child role name is that it automatically
> prevents you from specifying a node that is not a child of the given
> parent.

Right, but checking if a node is not a child and returning an error is
very simple. And it doesn't require the user to keep track of the node
name *and* the child role name.

Unless I'm forgetting something this would be the first time we expose
the child role name in the API, that's why I'm wondering if it's
something worth doing.

> Which makes me notice that it might be a good idea to require the user
> to specify the child's role when adding a new child. In this version
> of this series (where only quorum is supported), the children are just
> inserted in numerical order (first free slot is taken first), but
> maybe the user wants to insert them in a different order.

For the Quorum case it totally makes sense to let the user choose the
position of the new child.

But for creating a Quorum array in the first place we don't require
that, the order is the one that the user provides, and the user does not
need to know about the child role names at that point.

> And the "filling up quorum's children" topic then makes me notice that
> (x-)blockdev-change should probably be transaction-able (so you can
> restructure the whole BDS graph in a single transaction), but that's
> something we can care about later on.

I agree.

Berto



Re: [Qemu-block] [PATCH v12 2/3] quorum: implement bdrv_add_child() and bdrv_del_child()

2016-03-31 Thread Dr. David Alan Gilbert
* Alberto Garcia (be...@igalia.com) wrote:
> On Wed 30 Mar 2016 05:07:15 PM CEST, Max Reitz wrote:
> >> I also have another (not directly related) question: why not simply
> >> use the node name when removing children? I understood that the idea
> >> was that it's possible to have the same node attached twice to the
> >> same Quorum, but can you actually do that? And what's the use case?
> >
> > What I like about using the child role name is that it automatically
> > prevents you from specifying a node that is not a child of the given
> > parent.
> 
> Right, but checking if a node is not a child and returning an error is
> very simple. And it doesn't require the user to keep track of the node
> name *and* the child role name.
> 
> Unless I'm forgetting something this would be the first time we expose
> the child role name in the API, that's why I'm wondering if it's
> something worth doing.
> 
> > Which makes me notice that it might be a good idea to require the user
> > to specify the child's role when adding a new child. In this version
> > of this series (where only quorum is supported), the children are just
> > inserted in numerical order (first free slot is taken first), but
> > maybe the user wants to insert them in a different order.
> 
> For the Quorum case it totally makes sense to let the user choose the
> position of the new child.
> 
> But for creating a Quorum array in the first place we don't require
> that, the order is the one that the user provides, and the user does not
> need to know about the child role names at that point.

Actually in my setup I only add the local block device using the normal command
line, and use drive_add even at startup.  It solves a lot of the ordering
problems with getting COLO booted.

Dave

> 
> > And the "filling up quorum's children" topic then makes me notice that
> > (x-)blockdev-change should probably be transaction-able (so you can
> > restructure the whole BDS graph in a single transaction), but that's
> > something we can care about later on.
> 
> I agree.
> 
> Berto
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



[Qemu-block] [PATCH 1/3] nbd: Treat flags vs. command type as separate fields

2016-03-31 Thread Eric Blake
Current upstream NBD documents that requests have a 16-bit flags,
followed by a 16-bit type integer; although older versions mentioned
only a 32-bit field with masking to find flags.  Since the protocol
is in network order (big-endian over the wire), the ABI is unchanged;
but dealing with the flags as a separate field rather than masking
will make it easier to add support for upcoming NBD extensions that
improve the efficiency of sparse file handling (since those
extensions will increase the number of both flags and commands).

Improve some comments in nbd.h based on the current upstream
NBD protocol (https://github.com/yoe/nbd/blob/master/doc/proto.md),
and touch some nearby code to keep checkpatch.pl happy.

Signed-off-by: Eric Blake 
---
 include/block/nbd.h | 17 +++--
 block/nbd-client.c  | 11 ---
 nbd/client.c| 17 ++---
 nbd/server.c| 28 
 4 files changed, 41 insertions(+), 32 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index b86a976..f018968 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -27,7 +27,8 @@

 struct nbd_request {
 uint32_t magic;
-uint32_t type;
+uint16_t flags;
+uint16_t type;
 uint64_t handle;
 uint64_t from;
 uint32_t len;
@@ -39,6 +40,8 @@ struct nbd_reply {
 uint64_t handle;
 } QEMU_PACKED;

+/* Transmission (export) flags: sent from server to client during handshake,
+   but describe what will happen during transmission */
 #define NBD_FLAG_HAS_FLAGS  (1 << 0)/* Flags are there */
 #define NBD_FLAG_READ_ONLY  (1 << 1)/* Device is read-only */
 #define NBD_FLAG_SEND_FLUSH (1 << 2)/* Send FLUSH */
@@ -46,10 +49,12 @@ struct nbd_reply {
 #define NBD_FLAG_ROTATIONAL (1 << 4)/* Use elevator algorithm - 
rotational media */
 #define NBD_FLAG_SEND_TRIM  (1 << 5)/* Send TRIM (discard) */

-/* New-style global flags. */
+/* New-style handshake (global) flags, sent from server to client, and
+   control what will happen during handshake phase. */
 #define NBD_FLAG_FIXED_NEWSTYLE (1 << 0)/* Fixed newstyle protocol. */

-/* New-style client flags. */
+/* New-style client flags, sent from client to server to control what happens
+   during handshake phase. */
 #define NBD_FLAG_C_FIXED_NEWSTYLE   (1 << 0)/* Fixed newstyle protocol. */

 /* Reply types. */
@@ -60,10 +65,10 @@ struct nbd_reply {
 #define NBD_REP_ERR_INVALID ((UINT32_C(1) << 31) | 3) /* Invalid length. */
 #define NBD_REP_ERR_TLS_REQD((UINT32_C(1) << 31) | 5) /* TLS required */

+/* Request flags, sent from client to server during transmission phase */
+#define NBD_CMD_FLAG_FUA(1 << 0)

-#define NBD_CMD_MASK_COMMAND   0x
-#define NBD_CMD_FLAG_FUA   (1 << 16)
-
+/* Supported request types */
 enum {
 NBD_CMD_READ = 0,
 NBD_CMD_WRITE = 1,
diff --git a/block/nbd-client.c b/block/nbd-client.c
index 021a88b..8645be4 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -252,7 +252,7 @@ static int nbd_co_writev_1(BlockDriverState *bs, int64_t 
sector_num,

 if ((*flags & BDRV_REQ_FUA) && (client->nbdflags & NBD_FLAG_SEND_FUA)) {
 *flags &= ~BDRV_REQ_FUA;
-request.type |= NBD_CMD_FLAG_FUA;
+request.flags |= NBD_CMD_FLAG_FUA;
 }

 request.from = sector_num * 512;
@@ -319,8 +319,9 @@ int nbd_client_co_flush(BlockDriverState *bs)
 return 0;
 }

+/* XXX: NBD Protocol only documents use of FUA with WRITE */
 if (client->nbdflags & NBD_FLAG_SEND_FUA) {
-request.type |= NBD_CMD_FLAG_FUA;
+request.flags |= NBD_CMD_FLAG_FUA;
 }

 request.from = 0;
@@ -380,11 +381,7 @@ void nbd_client_attach_aio_context(BlockDriverState *bs,
 void nbd_client_close(BlockDriverState *bs)
 {
 NbdClientSession *client = nbd_get_client_session(bs);
-struct nbd_request request = {
-.type = NBD_CMD_DISC,
-.from = 0,
-.len = 0
-};
+struct nbd_request request = { .type = NBD_CMD_DISC };

 if (client->ioc == NULL) {
 return;
diff --git a/nbd/client.c b/nbd/client.c
index f89c0a1..5af9f26 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -628,15 +628,18 @@ ssize_t nbd_send_request(QIOChannel *ioc, struct 
nbd_request *request)
 uint8_t buf[NBD_REQUEST_SIZE];
 ssize_t ret;

-cpu_to_be32w((uint32_t*)buf, NBD_REQUEST_MAGIC);
-cpu_to_be32w((uint32_t*)(buf + 4), request->type);
-cpu_to_be64w((uint64_t*)(buf + 8), request->handle);
-cpu_to_be64w((uint64_t*)(buf + 16), request->from);
-cpu_to_be32w((uint32_t*)(buf + 24), request->len);
+cpu_to_be32w((uint32_t *)buf, NBD_REQUEST_MAGIC);
+cpu_to_be16w((uint16_t *)(buf + 4), request->flags);
+cpu_to_be16w((uint16_t *)(buf + 6), request->type);
+cpu_to_be64w((uint64_t *)(buf + 8), request->handle);
+cpu_to_be64w((uint64_t *)(buf + 16), request->from);
+cpu_to_be32w((uint32_t *)(buf + 24), request->len)