Re: [Qemu-block] [Qemu-devel] [PATCH v5 3/4] qmp: add monitor command to add/remove a child

2015-10-12 Thread Markus Armbruster
Max Reitz  writes:

> On 08.10.2015 08:15, Markus Armbruster wrote:
>> Max Reitz  writes:
>> 
>>> On 22.09.2015 09:44, Wen Congyang wrote:
>>>> The new QMP command name is x-blockdev-child-add, and x-blockdev-child-del.
>>>> It justs for adding/removing quorum's child now, and don't support all
>>>> kinds of children,
>>>
>>> It does support all kinds of children for quorum, doesn't it?
>>>
>>>> nor all block drivers. So it is experimental now.
>>>
>>> Well, that is not really a reason why we would have to make it
>>> experimental. For instance, blockdev-add (although some might argue it
>>> actually is experimental...) doesn't support all block drivers either.
>> 
>> Yup, and not calling it x-blockdev-add until it's done was a mistake.
>> People tried using it, then found its current limitations the painful
>> way.  Not nice.
>
> I knew I should have written s/some might/Markus does/. ;-)

:)

>>> The reason I am hesitant of adding an experimental QMP interface that is
>>> actually visible to the user (compare x-image in blkverify and blkdebug,
>>> which are not documented and not to be used by the user) is twofold:
>>>
>>> (1) At some point we have to say "OK, this is good enough now" and make
>>> it stable. What would that point be? Who can guarantee that we
>>> wouldn't want to make any interface changes after that point?
>> 
>> Nobody can, just like for any other interface.  So?
>
> The main question is "what would that point be". As I can see you're
> arguing that that point would be "once people want to use it", but I'm
> arguing that people want to use it today or we wouldn't need this
> interface at all.
>
> I'm against adding external experimental interface because having
> external interface indicates that someone wants to use them, but making
> them experimental indicates that nobody should use them.

Make that "nobody should use them in anger just yet."

They can and should be used to develop stuff.  Developing non-trivial
interfaces without actual users is risky.  Sometimes, you can't see
shortcomings in an interface until you try to use it.  Successful actual
use can build confidence the experimental interface is in fact ready to
be cast in stone.

> This interface is added for the COLO series. The documentation added in
> patch 5 there explains usage of COLO with x-child-add. I don't think
> that should be there, because it's experimental. But why have an
> external interface if nobody should use it anyway?
>
>> The x- prefix enables work spanning multiple releases.  Until the
>> feature is complete, we have a hard time seeing the whole picture, and
>> therefore the risk of interface mistakes is higher than normal.  Once
>> it's complete, we drop the x-.
>
> I'm arguing the feature is complete as far as what it's supposed to do goes.

When you say "the feature is complete", you're arguing this specific
interface is ready.  When you say you're "against adding external
experimental interface", you're arguing proper use of x-.  Let's try to
keep the discussion of principles separate from the discussion of the
specific instance.

On the former: maybe the interface is ready, but I can't judge offhand.
All I can do is ask questions.

On the latter: I emphatically disagree with the idea that experimental
interfaces are to be avoided because "someone wants to use them".

>>>   Would
>>> we actually remember to revisit this function once in a while and
>>> consider making it stable?
>> 
>> Has that been a problem in the past?
>
> I don't know, because I never witnessed an external experimental
> interface, but I haven't been closely involved with qemu for too long.

QMP itself started experimental, and was declared stable after fairly
heated discussion.

I think we've been dropping x- prefixes pretty routinely.  A quick,
superficial search finds commit 41310c6 (x-rdma) and commit 467b3f3
(x-iothread).

[...]



Re: [Qemu-block] [Qemu-devel] [PATCH v5 3/4] qmp: add monitor command to add/remove a child

2015-10-12 Thread Markus Armbruster
"Dr. David Alan Gilbert"  writes:

> * Max Reitz (mre...@redhat.com) wrote:
>> On 08.10.2015 08:15, Markus Armbruster wrote:
>> > Max Reitz  writes:
>> > 
>> >> On 22.09.2015 09:44, Wen Congyang wrote:
>> >>> The new QMP command name is x-blockdev-child-add, and
>> >>> x-blockdev-child-del.
>> >>> It justs for adding/removing quorum's child now, and don't support all
>> >>> kinds of children,
>> >>
>> >> It does support all kinds of children for quorum, doesn't it?
>> >>
>> >>>nor all block drivers. So it is experimental now.
>> >>
>> >> Well, that is not really a reason why we would have to make it
>> >> experimental. For instance, blockdev-add (although some might argue it
>> >> actually is experimental...) doesn't support all block drivers either.
>> > 
>> > Yup, and not calling it x-blockdev-add until it's done was a mistake.
>> > People tried using it, then found its current limitations the painful
>> > way.  Not nice.
>> 
>> I knew I should have written s/some might/Markus does/. ;-)
>> 
>> >> The reason I am hesitant of adding an experimental QMP interface that is
>> >> actually visible to the user (compare x-image in blkverify and blkdebug,
>> >> which are not documented and not to be used by the user) is twofold:
>> >>
>> >> (1) At some point we have to say "OK, this is good enough now" and make
>> >> it stable. What would that point be? Who can guarantee that we
>> >> wouldn't want to make any interface changes after that point?
>> > 
>> > Nobody can, just like for any other interface.  So?
>> 
>> The main question is "what would that point be". As I can see you're
>> arguing that that point would be "once people want to use it", but I'm
>> arguing that people want to use it today or we wouldn't need this
>> interface at all.
>> 
>> I'm against adding external experimental interface because having
>> external interface indicates that someone wants to use them, but making
>> them experimental indicates that nobody should use them.
>> 
>> This interface is added for the COLO series. The documentation added in
>> patch 5 there explains usage of COLO with x-child-add. I don't think
>> that should be there, because it's experimental. But why have an
>> external interface if nobody should use it anyway?
>
> Because it lets people move forward; the COLO series is pretty huge, there
> already seem to be side discussions spawning off about dynamic reconfiguration
> of stuff, who knows how long those will take to pan out.
> Adding the experimental stuff makes it easier for people to try and
> get some feedback on.
> If everyone turns out to love it then it only takes a trivial patch to promote
> it; if people actually realise there is a better interface then it's
> no problem to change it either - x- doesn't stop any one using it, but it
> does remove their right to moan if it changes.

Exactly.
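
The convention described above (anyone may call an x- command, but nobody may rely on it staying put) could be enforced on the client side roughly as follows. This is only a sketch: the helper names qmp_name_is_experimental() and client_may_use_command() are hypothetical and exist in neither QEMU nor libvirt, and the allow_experimental flag stands in for whatever opt-in a development or feature branch would use.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: a QMP name is treated as experimental when it
 * carries the "x-" prefix discussed in this thread. */
static bool qmp_name_is_experimental(const char *name)
{
    return strncmp(name, "x-", 2) == 0;
}

/* Hypothetical client policy: stable commands are always allowed,
 * experimental ones only when the caller explicitly opted in. */
static bool client_may_use_command(const char *name, bool allow_experimental)
{
    return allow_experimental || !qmp_name_is_experimental(name);
}
```

A production build of a management tool would pass allow_experimental = false, while a feature branch that tracks COLO development could pass true and accept that the command may change or disappear.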



Re: [Qemu-block] [Qemu-devel] [PATCH v5 3/4] qmp: add monitor command to add/remove a child

2015-10-12 Thread Dr. David Alan Gilbert
* Max Reitz (mre...@redhat.com) wrote:
> On 09.10.2015 18:42, Dr. David Alan Gilbert wrote:
> > * Max Reitz (mre...@redhat.com) wrote:
> >> On 08.10.2015 08:15, Markus Armbruster wrote:
> >>> Max Reitz  writes:
> >>>
> >>>> On 22.09.2015 09:44, Wen Congyang wrote:
> >>>>> The new QMP command name is x-blockdev-child-add, and
> >>>>> x-blockdev-child-del.
> >>>>> It justs for adding/removing quorum's child now, and don't support all
> >>>>> kinds of children,
> >>>>
> >>>> It does support all kinds of children for quorum, doesn't it?
> >>>>
> >>>>> nor all block drivers. So it is experimental now.
> >>>>
> >>>> Well, that is not really a reason why we would have to make it
> >>>> experimental. For instance, blockdev-add (although some might argue it
> >>>> actually is experimental...) doesn't support all block drivers either.
> >>>
> >>> Yup, and not calling it x-blockdev-add until it's done was a mistake.
> >>> People tried using it, then found its current limitations the painful
> >>> way.  Not nice.
> >>
> >> I knew I should have written s/some might/Markus does/. ;-)
> >>
> >>>> The reason I am hesitant of adding an experimental QMP interface that is
> >>>> actually visible to the user (compare x-image in blkverify and blkdebug,
> >>>> which are not documented and not to be used by the user) is twofold:
> >>>>
> >>>> (1) At some point we have to say "OK, this is good enough now" and make
> >>>> it stable. What would that point be? Who can guarantee that we
> >>>> wouldn't want to make any interface changes after that point?
> >>>
> >>> Nobody can, just like for any other interface.  So?
> >>
> >> The main question is "what would that point be". As I can see you're
> >> arguing that that point would be "once people want to use it", but I'm
> >> arguing that people want to use it today or we wouldn't need this
> >> interface at all.
> >>
> >> I'm against adding external experimental interface because having
> >> external interface indicates that someone wants to use them, but making
> >> them experimental indicates that nobody should use them.
> >>
> >> This interface is added for the COLO series. The documentation added in
> >> patch 5 there explains usage of COLO with x-child-add. I don't think
> >> that should be there, because it's experimental. But why have an
> >> external interface if nobody should use it anyway?
> > 
> > Because it lets people move forward; the COLO series is pretty huge, there
> > already seem to be side discussions spawning off about dynamic reconfiguration
> > of stuff, who knows how long those will take to pan out.
> 
> Yes, and my point is that with these functions
> (blockdev-child-{add,del}) the result of that side discussion doesn't
> matter.
> 
> > Adding the experimental stuff makes it easier for people to try and
> > get some feedback on.
> 
> The thing is, I cannot imagine any feedback that would necessitate an
> incompatible change. “I want to change quorum's options while
> adding/removing children” can easily be accomplished with an additional
> optional parameter.
> 
> But I do know that we want to keep things experimental exactly because
> there can be feedback which I cannot imagine right now.
> 
> > If everyone turns out to love it then it only takes a trivial patch to promote
> > it; if people actually realise there is a better interface then it's
> > no problem to change it either - x- doesn't stop any one using it,
> 
> But it should, shouldn't it? No management tool should be using an x-
> command, as far as I know. And these are functions which are clearly
> designed for management tools.
> 
> If management tools are indeed free to use x- functions, then I'm
> completely fine with making these experimental for now. It's just that
> it looks to me like “Hey, look, we have these two new functions you can
> use!” and then, two versions later we remove them because we have a
> general reconfiguration option, and we'll say “It's your own fault for
> using experimental functions” if someone complains. That sounds
> hypocritical to me, but I'm probably being to “legal” here.
>
> (i.e. it's more like “Hey, look, two new cool functions! But don't use
> them.” which sounds like a contradiction to me, whereas it actually
> means “Feel free to use them but don't blame us”)
> 
> tl;dr: May management tools use x- functions? And is it actually
> conceivable for them to do so? If so, my whole argument becomes moot, so
> let's make these functions x-.

My guess is the libvirt guys won't take the code to drive the x- methods;
but it still makes it easier if someone wants to try this stuff out, they
won't need to apply 2/3 sets of COLO code and then any management tools.

> Mainly I'd like to know about some example where we had an x- function
> in the past. Markus seemed to imply that was the case.

The RDMA code used to have x- for migration protocol and some of the
capabilities; we've recently added Jason Herne's cpu throttling with
sim

Re: [Qemu-block] [Qemu-devel] [PATCH v5 3/4] qmp: add monitor command to add/remove a child

2015-10-12 Thread Kevin Wolf
Am 09.10.2015 um 20:24 hat Max Reitz geschrieben:
> On 09.10.2015 18:42, Dr. David Alan Gilbert wrote:
> > * Max Reitz (mre...@redhat.com) wrote:
> >> On 08.10.2015 08:15, Markus Armbruster wrote:
> >>> Max Reitz  writes:
> >>>
> >>>> On 22.09.2015 09:44, Wen Congyang wrote:
> >>>>> The new QMP command name is x-blockdev-child-add, and
> >>>>> x-blockdev-child-del.
> >>>>> It justs for adding/removing quorum's child now, and don't support all
> >>>>> kinds of children,
> >>>>
> >>>> It does support all kinds of children for quorum, doesn't it?
> >>>>
> >>>>> nor all block drivers. So it is experimental now.
> >>>>
> >>>> Well, that is not really a reason why we would have to make it
> >>>> experimental. For instance, blockdev-add (although some might argue it
> >>>> actually is experimental...) doesn't support all block drivers either.
> >>>
> >>> Yup, and not calling it x-blockdev-add until it's done was a mistake.
> >>> People tried using it, then found its current limitations the painful
> >>> way.  Not nice.
> >>
> >> I knew I should have written s/some might/Markus does/. ;-)
> >>
> >>>> The reason I am hesitant of adding an experimental QMP interface that is
> >>>> actually visible to the user (compare x-image in blkverify and blkdebug,
> >>>> which are not documented and not to be used by the user) is twofold:
> >>>>
> >>>> (1) At some point we have to say "OK, this is good enough now" and make
> >>>> it stable. What would that point be? Who can guarantee that we
> >>>> wouldn't want to make any interface changes after that point?
> >>>
> >>> Nobody can, just like for any other interface.  So?
> >>
> >> The main question is "what would that point be". As I can see you're
> >> arguing that that point would be "once people want to use it", but I'm
> >> arguing that people want to use it today or we wouldn't need this
> >> interface at all.
> >>
> >> I'm against adding external experimental interface because having
> >> external interface indicates that someone wants to use them, but making
> >> them experimental indicates that nobody should use them.
> >>
> >> This interface is added for the COLO series. The documentation added in
> >> patch 5 there explains usage of COLO with x-child-add. I don't think
> >> that should be there, because it's experimental. But why have an
> >> external interface if nobody should use it anyway?
> > 
> > Because it lets people move forward; the COLO series is pretty huge, there
> > already seem to be side discussions spawning off about dynamic reconfiguration
> > of stuff, who knows how long those will take to pan out.
> 
> Yes, and my point is that with these functions
> (blockdev-child-{add,del}) the result of that side discussion doesn't
> matter.
> 
> > Adding the experimental stuff makes it easier for people to try and
> > get some feedback on.
> 
> The thing is, I cannot imagine any feedback that would necessitate an
> incompatible change. “I want to change quorum's options while
> adding/removing children” can easily be accomplished with an additional
> optional parameter.
> 
> But I do know that we want to keep things experimental exactly because
> there can be feedback which I cannot imagine right now.
> 
> > If everyone turns out to love it then it only takes a trivial patch to promote
> > it; if people actually realise there is a better interface then it's
> > no problem to change it either - x- doesn't stop any one using it,
> 
> But it should, shouldn't it? No management tool should be using an x-
> command, as far as I know. And these are functions which are clearly
> designed for management tools.

It should stop people from using it in production, but it shouldn't stop
them from using it for development and testing.

We know that child-add/del is probably not the interface that we want to
have in the end (and I would like to avoid accumulating tons of
compatibility commands once we have what we want).

If the COLO people say that they need an experimental command in order
to make progress, that's fine with me. I think we'll all agree that
while 'blockdev-add' can't reasonably be used in production yet, without
it we couldn't have made much of the progress in the block layer that we
made in the past year. If COLO people are in the same situation, let's
give them what they need, without setting an unwanted interface in
stone.

> If management tools are indeed free to use x- functions, then I'm
> completely fine with making these experimental for now. It's just that
> it looks to me like “Hey, look, we have these two new functions you can
> use!” and then, two versions later we remove them because we have a
> general reconfiguration option, and we'll say “It's your own fault for
> using experimental functions” if someone complains. That sounds
> hypocritical to me, but I'm probably being to “legal” here.

Experimental features in management tools (e.g. in some feature branch)
can use them, they just can't rely on it kee

Re: [Qemu-block] [PATCH 05/12] aio: introduce aio_{disable, enable}_clients

2015-10-12 Thread Kevin Wolf
Am 09.10.2015 um 18:27 hat Fam Zheng geschrieben:
> On Fri, 10/09 16:31, Kevin Wolf wrote:
> > Am 09.10.2015 um 07:45 hat Fam Zheng geschrieben:
> > > Signed-off-by: Fam Zheng 
> > > ---
> > >  aio-posix.c |  3 ++-
> > >  aio-win32.c |  3 ++-
> > >  async.c | 42 ++
> > >  include/block/aio.h | 30 ++
> > >  4 files changed, 76 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/aio-posix.c b/aio-posix.c
> > > index d25fcfc..a261892 100644
> > > --- a/aio-posix.c
> > > +++ b/aio-posix.c
> > > @@ -261,7 +261,8 @@ bool aio_poll(AioContext *ctx, bool blocking)
> > >  
> > >      /* fill pollfds */
> > >      QLIST_FOREACH(node, &ctx->aio_handlers, node) {
> > > -        if (!node->deleted && node->pfd.events) {
> > > +        if (!node->deleted && node->pfd.events
> > > +            && !aio_type_disabled(ctx, node->type)) {
> > >              add_pollfd(node);
> > >          }
> > >      }
> > > diff --git a/aio-win32.c b/aio-win32.c
> > > index f5ecf57..66cff60 100644
> > > --- a/aio-win32.c
> > > +++ b/aio-win32.c
> > > @@ -309,7 +309,8 @@ bool aio_poll(AioContext *ctx, bool blocking)
> > >      /* fill fd sets */
> > >      count = 0;
> > >      QLIST_FOREACH(node, &ctx->aio_handlers, node) {
> > > -        if (!node->deleted && node->io_notify) {
> > > +        if (!node->deleted && node->io_notify
> > > +            && !aio_type_disabled(ctx, node->type)) {
> > >              events[count++] = event_notifier_get_handle(node->e);
> > >          }
> > >      }
> > > diff --git a/async.c b/async.c
> > > index 244bf79..855b9d5 100644
> > > --- a/async.c
> > > +++ b/async.c
> > > @@ -361,3 +361,45 @@ void aio_context_release(AioContext *ctx)
> > >  {
> > >      rfifolock_unlock(&ctx->lock);
> > >  }
> > > +
> > > +bool aio_type_disabled(AioContext *ctx, int type)
> > > +{
> > > +    int i = 1;
> > > +    int n = 0;
> > > +
> > > +    while (type) {
> > > +        bool b = type & 0x1;
> > > +        type >>= 1;
> > > +        n++;
> > 
> > Any specific reason for leaving client_disable_counters[0] unused?
> 
> No, I should have started from 0.
> 
> > 
> > > +        i <<= 1;
> > 
> > i is never read.
> > 
> > > +        if (!b) {
> > > +            continue;
> > > +        }
> > > +        if (ctx->client_disable_counters[n]) {
> > > +            return true;
> > > +        }
> > > +    }
> > > +    return false;
> > > +}
> > 
> > In general I wonder whether this function really needs to take a mask
> > with possibly multiple set bits instead of just a single type.
> 
> Previous versions used to have more types than "internal" and "external", so it
> has been a mask. So yes, I think a single type will be better now.
> 
> > 
> > > +void aio_disable_enable_clients(AioContext *ctx, int clients_mask,
> > > +                                bool is_disable)
> > > +{
> > > +    int i = 1;
> > > +    int n = 0;
> > > +    aio_context_acquire(ctx);
> > > +
> > > +    while (clients_mask) {
> > > +        bool b = clients_mask & 0x1;
> > > +        clients_mask >>= 1;
> > > +        n++;
> > > +        i <<= 1;
> > 
> > This i isn't used either.
> > 
> > > +        if (!b) {
> > > +            continue;
> > > +        }
> > > +        if (ctx->client_disable_counters[n]) {
> > > +            return true;
> > > +        }
> > 
> > Wait, why are you checking the state instead of setting it?
> 
> Oops, apparent I screwed my workspaces as I do remember coding this assignment.
> And I must have used a wrong command when building the tree so that I don't
> even catch the compiling error. :(
> 
> > 
> > How did you test this series?
> 
> So far only smoke testing and qemu-iotests, because I don't have a good idea of
> testifying the transaction's atomicity. Any suggestions?

Perhaps you could use blkdebug to delay something in the middle of the
transaction while your guest keeps writing stuff? That should result in
100% reproducibility.

I guess you actually need to make sure that your guest doesn't do any
I/O, then set the blkdebug breakpoint, send the transaction, and once a
request is stopped, you start some I/O in the guest. Resume as soon as
you know that something bad happened.

Possibly you need to add a new blkdebug event to find a good place to
suspend a transaction request.

Kevin
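
For reference, the review points above (start indexing at 0, drop the unused i, take a single type instead of a mask, and actually modify the counter rather than test it) could come together roughly like this. It is a minimal standalone sketch, not the code from the series: SketchAioContext, the AIO_CLIENT_* constants, and the split into separate disable/enable functions are assumptions made for illustration, and the aio_context_acquire()/release locking of the real code is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical client types; the thread only names "internal" and
 * "external", the constants themselves are made up for this sketch. */
enum { AIO_CLIENT_INTERNAL, AIO_CLIENT_EXTERNAL, AIO_CLIENT_MAX };

/* Stand-in for the part of AioContext that holds the counters. */
typedef struct {
    int client_disable_counters[AIO_CLIENT_MAX];
} SketchAioContext;

/* A type is disabled while its counter is non-zero. */
static bool aio_type_disabled(const SketchAioContext *ctx, int type)
{
    assert(type >= 0 && type < AIO_CLIENT_MAX);
    return ctx->client_disable_counters[type] > 0;
}

/* Disabling nests: each call must be paired with one enable call. */
static void aio_disable_clients(SketchAioContext *ctx, int type)
{
    assert(type >= 0 && type < AIO_CLIENT_MAX);
    ctx->client_disable_counters[type]++;
}

static void aio_enable_clients(SketchAioContext *ctx, int type)
{
    assert(type >= 0 && type < AIO_CLIENT_MAX);
    assert(ctx->client_disable_counters[type] > 0);
    ctx->client_disable_counters[type]--;
}
```

Using a counter rather than a flag means nested disable/enable pairs from different callers do not re-enable a client type too early.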



[Qemu-block] [PATCH v7 1/5] block: check for existing device IDs in external_snapshot_prepare()

2015-10-12 Thread Alberto Garcia
The 'snapshot-node-name' parameter of blockdev-snapshot-sync allows
setting the node name of the image that is going to be created.

Before creating the image, external_snapshot_prepare() checks that the
name is not already being used. The check is however incomplete since
it only considers existing node names, but node names must not clash
with device IDs either because they share the same namespace.

If the user attempts to create a snapshot using the name of an
existing device for the 'snapshot-node-name' parameter the operation
will eventually fail, but only after the new image has been created.

This patch replaces bdrv_find_node() with bdrv_lookup_bs() to extend
the check to existing device IDs, and thus detect possible name
clashes before the new image is created.

Signed-off-by: Alberto Garcia 
---
 blockdev.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 4731843..0898d1f 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1570,8 +1570,9 @@ static void external_snapshot_prepare(BlkTransactionState *common,
         return;
     }
 
-    if (has_snapshot_node_name && bdrv_find_node(snapshot_node_name)) {
-        error_setg(errp, "New snapshot node name already existing");
+    if (has_snapshot_node_name &&
+        bdrv_lookup_bs(snapshot_node_name, snapshot_node_name, NULL)) {
+        error_setg(errp, "New snapshot node name already in use");
         return;
     }
 
-- 
2.6.1
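
The difference between the old and the new check can be illustrated with a toy model. This is only a sketch under stated assumptions: the two static tables stand in for QEMU's real device and node registries, and in_list(), name_taken_nodes_only() and name_taken() are hypothetical helpers, not QEMU functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy registries standing in for existing device IDs and node names. */
static const char *const device_ids[] = { "virtio0", "ide0-hd0" };
static const char *const node_names[] = { "node0", "snap_1" };

static bool in_list(const char *const *list, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(list[i], name) == 0) {
            return true;
        }
    }
    return false;
}

/* Incomplete check: only looks at node names, so a clash with a
 * device ID goes unnoticed until after the image has been created. */
static bool name_taken_nodes_only(const char *name)
{
    return in_list(node_names, sizeof(node_names) / sizeof(node_names[0]), name);
}

/* Complete check: device IDs and node names share one namespace,
 * so both tables are consulted before anything is created. */
static bool name_taken(const char *name)
{
    return in_list(device_ids, sizeof(device_ids) / sizeof(device_ids[0]), name)
        || name_taken_nodes_only(name);
}
```

With these tables, "virtio0" slips past the node-only check but is rejected by the combined one, which is the case the patch is meant to catch early.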




[Qemu-block] [PATCH v7 3/5] block: support passing 'backing': '' to 'blockdev-add'

2015-10-12 Thread Alberto Garcia
Passing an empty string allows opening an image but not its backing
file. This was already described in the API documentation, only the
implementation was missing.

This is useful for creating snapshots using images opened with
blockdev-add, since they are not supposed to have a backing image
before the operation.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Kevin Wolf 
---
 block.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/block.c b/block.c
index c9e3c6c..31d1b6e 100644
--- a/block.c
+++ b/block.c
@@ -1406,6 +1406,7 @@ static int bdrv_open_inherit(BlockDriverState **pbs, const char *filename,
     BlockDriverState *file = NULL, *bs;
     BlockDriver *drv = NULL;
     const char *drvname;
+    const char *backing;
     Error *local_err = NULL;
     int snapshot_flags = 0;
 
@@ -1473,6 +1474,12 @@ static int bdrv_open_inherit(BlockDriverState **pbs, const char *filename,
 
     assert(drvname || !(flags & BDRV_O_PROTOCOL));
 
+    backing = qdict_get_try_str(options, "backing");
+    if (backing && *backing == '\0') {
+        flags |= BDRV_O_NO_BACKING;
+        qdict_del(options, "backing");
+    }
+
     bs->open_flags = flags;
     bs->options = options;
     options = qdict_clone_shallow(options);
-- 
2.6.1
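
The three cases the new check distinguishes can be spelled out in isolation. This is a minimal sketch: backing_explicitly_disabled() is a hypothetical helper that mirrors only the condition in the hunk above, with a plain string (or NULL when the option is absent) standing in for the result of qdict_get_try_str().

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true only when the 'backing' option is present and is the
 * empty string, i.e. the caller asked not to open a backing file.
 * NULL models an absent option; any other string is left untouched. */
static bool backing_explicitly_disabled(const char *backing)
{
    return backing != NULL && *backing == '\0';
}
```

An absent option and a non-empty value both leave the option handling as it was; only the empty string triggers the no-backing path.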




[Qemu-block] [PATCH v7 0/5] Add 'blockdev-snapshot' command

2015-10-12 Thread Alberto Garcia
This series adds a new 'blockdev-snapshot' command, that is similar to
'blockdev-snapshot-sync' but takes references to two existing block
devices.

This depends on Max's "BlockBackend and media" series:

https://lists.gnu.org/archive/html/qemu-block/2015-09/msg00497.html

v7:
- Rebase on top of the current master.
  qmp_marshal_input_blockdev_snapshot is renamed to
  qmp_marshal_blockdev_snapshot in order to make it build.
- New patch to use bdrv_lookup_bs() instead of bdrv_find_node() in
  external_snapshot_prepare(). This way, if the user attempts to use
  blockdev-snapshot-sync using an existing device ID in the
  snapshot-node-name parameter, the code will detect the error before
  the new image is created.

v6: https://lists.gnu.org/archive/html/qemu-block/2015-09/msg00575.html
- Update documentation and parameter names following Eric's
  suggestions: 'device' -> 'node', 'snapshot' -> 'overlay'.
- Rebased on top of Max's "BlockBackend and media" v5

v5: https://lists.gnu.org/archive/html/qemu-block/2015-09/msg00483.html
- Don't delete the 'backing' option if it contains something different
  from an empty string.
- Rebase on top of the current master.

v4: https://lists.gnu.org/archive/html/qemu-block/2015-09/msg00372.html
- Implement the support for 'backing': '', drop 'ignore-backing',
  and update iotest 085 accordingly.
- Include sample 'blockdev-add' call in the 'blockdev-snapshot'
  documentation.
- Clarify that the snapshot must not have a backing file in the
  BlockdevSnapshot documentation.
- Update error message ("...node name already existing" -> "...exists").

v3: https://lists.gnu.org/archive/html/qemu-block/2015-09/msg00272.html
- Add 'ignore-backing' field to BlockdevOptionsGenericCOWFormat. This
  allows opening images but not their backing images.
- Check for op blockers in the snapshot node and make sure that it
  doesn't have any backing image.
- Remove extra check for the existence of the snapshot node:
  bdrv_open() already does that.
- Extend iotest 085 to add tests for 'blockdev-snapshot'.
- Replace local_err with errp in some places where the former is
  unnecessary.
- Update command description.
- Add 'since' tag to the 'blockdev-snapshot' field in TransactionAction.

v2: https://lists.gnu.org/archive/html/qemu-block/2015-09/msg00094.html
- Add 'blockdev-snapshot' command instead of allowing passing options
  to 'blockdev-snapshot-sync'.
- Rename BlockdevSnapshot to BlockdevSnapshotSync

v1: https://lists.gnu.org/archive/html/qemu-block/2015-08/msg00236.html

Alberto Garcia (5):
  block: check for existing device IDs in external_snapshot_prepare()
  block: rename BlockdevSnapshot to BlockdevSnapshotSync
  block: support passing 'backing': '' to 'blockdev-add'
  block: add a 'blockdev-snapshot' QMP command
  block: add tests for the 'blockdev-snapshot' command

 block.c|   7 ++
 blockdev.c | 166 -
 qapi-schema.json   |   4 +-
 qapi/block-core.json   |  34 +-
 qmp-commands.hx|  38 +++
 tests/qemu-iotests/085 | 102 ++--
 tests/qemu-iotests/085.out |  34 +-
 7 files changed, 312 insertions(+), 73 deletions(-)

-- 
2.6.1




[Qemu-block] [PATCH v7 4/5] block: add a 'blockdev-snapshot' QMP command

2015-10-12 Thread Alberto Garcia
One of the limitations of the 'blockdev-snapshot-sync' command is that
it does not allow passing BlockdevOptions to the newly created
snapshots, so they are always opened using the default values.

Extending the command to allow passing options is not a practical
solution because there is overlap between those options and some of
the existing parameters of the command.

This patch introduces a new 'blockdev-snapshot' command with a simpler
interface: it just takes two references to existing block devices that
will be used as the source and target for the snapshot.

Since the main difference between the two commands is that one of them
creates and opens the target image, while the other uses an already
opened one, the bulk of the implementation is shared.

Signed-off-by: Alberto Garcia 
Cc: Eric Blake 
Reviewed-by: Max Reitz 
---
 blockdev.c   | 165 ---
 qapi-schema.json |   2 +
 qapi/block-core.json |  28 +
 qmp-commands.hx  |  38 
 4 files changed, 172 insertions(+), 61 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 12741a0..b5470c9 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1183,6 +1183,18 @@ void qmp_blockdev_snapshot_sync(bool has_device, const char *device,
                        &snapshot, errp);
 }
 
+void qmp_blockdev_snapshot(const char *node, const char *overlay,
+                           Error **errp)
+{
+    BlockdevSnapshot snapshot_data = {
+        .node = (char *) node,
+        .overlay = (char *) overlay
+    };
+
+    blockdev_do_action(TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT,
+                       &snapshot_data, errp);
+}
+
 void qmp_blockdev_snapshot_internal_sync(const char *device,
                                          const char *name,
                                          Error **errp)
@@ -1521,58 +1533,48 @@ typedef struct ExternalSnapshotState {
 static void external_snapshot_prepare(BlkTransactionState *common,
                                       Error **errp)
 {
-    int flags, ret;
-    QDict *options;
+    int flags = 0, ret;
+    QDict *options = NULL;
     Error *local_err = NULL;
-    bool has_device = false;
+    /* Device and node name of the image to generate the snapshot from */
     const char *device;
-    bool has_node_name = false;
     const char *node_name;
-    bool has_snapshot_node_name = false;
-    const char *snapshot_node_name;
+    /* Reference to the new image (for 'blockdev-snapshot') */
+    const char *snapshot_ref;
+    /* File name of the new image (for 'blockdev-snapshot-sync') */
     const char *new_image_file;
-    const char *format = "qcow2";
-    enum NewImageMode mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
     ExternalSnapshotState *state =
                              DO_UPCAST(ExternalSnapshotState, common, common);
     TransactionAction *action = common->action;
 
-    /* get parameters */
-    g_assert(action->kind == TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC);
-
-    has_device = action->blockdev_snapshot_sync->has_device;
-    device = action->blockdev_snapshot_sync->device;
-    has_node_name = action->blockdev_snapshot_sync->has_node_name;
-    node_name = action->blockdev_snapshot_sync->node_name;
-    has_snapshot_node_name =
-        action->blockdev_snapshot_sync->has_snapshot_node_name;
-    snapshot_node_name = action->blockdev_snapshot_sync->snapshot_node_name;
-
-    new_image_file = action->blockdev_snapshot_sync->snapshot_file;
-    if (action->blockdev_snapshot_sync->has_format) {
-        format = action->blockdev_snapshot_sync->format;
-    }
-    if (action->blockdev_snapshot_sync->has_mode) {
-        mode = action->blockdev_snapshot_sync->mode;
+    /* 'blockdev-snapshot' and 'blockdev-snapshot-sync' have similar
+     * purpose but a different set of parameters */
+    switch (action->kind) {
+    case TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT:
+        {
+            BlockdevSnapshot *s = action->blockdev_snapshot;
+            device = s->node;
+            node_name = s->node;
+            new_image_file = NULL;
+            snapshot_ref = s->overlay;
+        }
+        break;
+    case TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC:
+        {
+            BlockdevSnapshotSync *s = action->blockdev_snapshot_sync;
+            device = s->has_device ? s->device : NULL;
+            node_name = s->has_node_name ? s->node_name : NULL;
+            new_image_file = s->snapshot_file;
+            snapshot_ref = NULL;
+        }
+        break;
+    default:
+        g_assert_not_reached();
     }
 
     /* start processing */
-    state->old_bs = bdrv_lookup_bs(has_device ? device : NULL,
-                                   has_node_name ? node_name : NULL,
-                                   &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
-        return;
-    }
-
-    if (has_node_name && !has_snapshot_node_name) {
-        error_setg(errp, "New snapshot node name missi

[Qemu-block] [PATCH v7 5/5] block: add tests for the 'blockdev-snapshot' command

2015-10-12 Thread Alberto Garcia
Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
---
 tests/qemu-iotests/085 | 102 ++---
 tests/qemu-iotests/085.out |  34 ++-
 2 files changed, 128 insertions(+), 8 deletions(-)

diff --git a/tests/qemu-iotests/085 b/tests/qemu-iotests/085
index 56cd6f8..9484117 100755
--- a/tests/qemu-iotests/085
+++ b/tests/qemu-iotests/085
@@ -7,6 +7,7 @@
 # snapshots are performed.
 #
 # Copyright (C) 2014 Red Hat, Inc.
+# Copyright (C) 2015 Igalia, S.L.
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -34,17 +35,17 @@ status=1# failure is the default!
 snapshot_virt0="snapshot-v0.qcow2"
 snapshot_virt1="snapshot-v1.qcow2"
 
-MAX_SNAPSHOTS=10
+SNAPSHOTS=10
 
 _cleanup()
 {
 _cleanup_qemu
-for i in $(seq 1 ${MAX_SNAPSHOTS})
+for i in $(seq 1 ${SNAPSHOTS})
 do
 rm -f "${TEST_DIR}/${i}-${snapshot_virt0}"
 rm -f "${TEST_DIR}/${i}-${snapshot_virt1}"
 done
-   _cleanup_test_img
+rm -f "${TEST_IMG}.1" "${TEST_IMG}.2"
 
 }
 trap "_cleanup; exit \$status" 0 1 2 3 15
@@ -85,18 +86,50 @@ function create_group_snapshot()
 _send_qemu_cmd $h "${cmd}" "return"
 }
 
+# ${1}: unique identifier for the snapshot filename
+# ${2}: true: open backing images; false: don't open them (default)
+function add_snapshot_image()
+{
+if [ "${2}" = "true" ]; then
+extra_params=""
+else
+extra_params="'backing': '', "
+fi
+base_image="${TEST_DIR}/$((${1}-1))-${snapshot_virt0}"
+snapshot_file="${TEST_DIR}/${1}-${snapshot_virt0}"
+_make_test_img -b "${base_image}" "$size"
+mv "${TEST_IMG}" "${snapshot_file}"
+cmd="{ 'execute': 'blockdev-add', 'arguments':
+   { 'options':
+ { 'driver': 'qcow2', 'node-name': 'snap_"${1}"', "${extra_params}"
+   'file':
+   { 'driver': 'file', 'filename': '"${snapshot_file}"' } } } }"
+_send_qemu_cmd $h "${cmd}" "return"
+}
+
+# ${1}: unique identifier for the snapshot filename
+# ${2}: expected response, defaults to 'return'
+function blockdev_snapshot()
+{
+cmd="{ 'execute': 'blockdev-snapshot',
+  'arguments': { 'node': 'virtio0',
+ 'overlay':'snap_"${1}"' } }"
+_send_qemu_cmd $h "${cmd}" "${2:-return}"
+}
+
 size=128M
 
 _make_test_img $size
-mv "${TEST_IMG}" "${TEST_IMG}.orig"
+mv "${TEST_IMG}" "${TEST_IMG}.1"
 _make_test_img $size
+mv "${TEST_IMG}" "${TEST_IMG}.2"
 
 echo
 echo === Running QEMU ===
 echo
 
 qemu_comm_method="qmp"
-_launch_qemu -drive file="${TEST_IMG}.orig",if=virtio -drive file="${TEST_IMG}",if=virtio
+_launch_qemu -drive file="${TEST_IMG}.1",if=virtio -drive file="${TEST_IMG}.2",if=virtio
 h=$QEMU_HANDLE
 
 echo
@@ -105,6 +138,8 @@ echo
 
 _send_qemu_cmd $h "{ 'execute': 'qmp_capabilities' }" "return"
 
+# Tests for the blockdev-snapshot-sync command
+
 echo
 echo === Create a single snapshot on virtio0 ===
 echo
@@ -132,11 +167,66 @@ echo
 echo === Create several transactional group snapshots ===
 echo
 
-for i in $(seq 2 ${MAX_SNAPSHOTS})
+for i in $(seq 2 ${SNAPSHOTS})
 do
 create_group_snapshot ${i}
 done
 
+# Tests for the blockdev-snapshot command
+
+echo
+echo === Create a couple of snapshots using blockdev-snapshot ===
+echo
+
+SNAPSHOTS=$((${SNAPSHOTS}+1))
+add_snapshot_image ${SNAPSHOTS}
+blockdev_snapshot ${SNAPSHOTS}
+
+SNAPSHOTS=$((${SNAPSHOTS}+1))
+add_snapshot_image ${SNAPSHOTS}
+blockdev_snapshot ${SNAPSHOTS}
+
+echo
+echo === Invalid command - snapshot node used as active layer ===
+echo
+
+blockdev_snapshot ${SNAPSHOTS} error
+
+_send_qemu_cmd $h "{ 'execute': 'blockdev-snapshot',
+ 'arguments': { 'node':'virtio0',
+'overlay':'virtio0' }
+   }" "error"
+
+_send_qemu_cmd $h "{ 'execute': 'blockdev-snapshot',
+ 'arguments': { 'node':'virtio0',
+'overlay':'virtio1' }
+   }" "error"
+
+echo
+echo === Invalid command - snapshot node used as backing hd ===
+echo
+
+blockdev_snapshot $((${SNAPSHOTS}-1)) error
+
+echo
+echo === Invalid command - snapshot node has a backing image ===
+echo
+
+SNAPSHOTS=$((${SNAPSHOTS}+1))
+add_snapshot_image ${SNAPSHOTS} true
+blockdev_snapshot ${SNAPSHOTS} error
+
+echo
+echo === Invalid command - The node does not exist ===
+echo
+
+blockdev_snapshot $((${SNAPSHOTS}+1)) error
+
+_send_qemu_cmd $h "{ 'execute': 'blockdev-snapshot',
+ 'arguments': { 'node':'nodevice',
+'overlay':'snap_"${SNAPSHOTS}"' }
+   }" "error"
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/085.out b/tests/qemu-iotests/085.out
index a6cf19e..52292ea 100644
--- a/tests/qemu-iotests/085.out
+++ b/tests/qemu-iotests/085.out
@@ -11,7 +11,7 @@ Formatting 'TES

[Qemu-block] [PATCH v7 2/5] block: rename BlockdevSnapshot to BlockdevSnapshotSync

2015-10-12 Thread Alberto Garcia
We will introduce the 'blockdev-snapshot' command that will require
its own struct for the parameters, so we need to rename this one in
order to avoid name clashes.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
Reviewed-by: Kevin Wolf 
---
 blockdev.c   | 2 +-
 qapi-schema.json | 2 +-
 qapi/block-core.json | 8 
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 0898d1f..12741a0 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1166,7 +1166,7 @@ void qmp_blockdev_snapshot_sync(bool has_device, const char *device,
 bool has_format, const char *format,
 bool has_mode, NewImageMode mode, Error **errp)
 {
-BlockdevSnapshot snapshot = {
+BlockdevSnapshotSync snapshot = {
 .has_device = has_device,
 .device = (char *) device,
 .has_node_name = has_node_name,
diff --git a/qapi-schema.json b/qapi-schema.json
index a05794e..65701dc 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1534,7 +1534,7 @@
 ##
 { 'union': 'TransactionAction',
   'data': {
-   'blockdev-snapshot-sync': 'BlockdevSnapshot',
+   'blockdev-snapshot-sync': 'BlockdevSnapshotSync',
'drive-backup': 'DriveBackup',
'blockdev-backup': 'BlockdevBackup',
'abort': 'Abort',
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 5f12af7..6b5ac02 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -682,7 +682,7 @@
   'data': [ 'existing', 'absolute-paths' ] }
 
 ##
-# @BlockdevSnapshot
+# @BlockdevSnapshotSync
 #
 # Either @device or @node-name must be set but not both.
 #
@@ -699,7 +699,7 @@
 # @mode: #optional whether and how QEMU should create a new image, default is
 #'absolute-paths'.
 ##
-{ 'struct': 'BlockdevSnapshot',
+{ 'struct': 'BlockdevSnapshotSync',
   'data': { '*device': 'str', '*node-name': 'str',
 'snapshot-file': 'str', '*snapshot-node-name': 'str',
 '*format': 'str', '*mode': 'NewImageMode' } }
@@ -790,7 +790,7 @@
 #
 # Generates a synchronous snapshot of a block device.
 #
-# For the arguments, see the documentation of BlockdevSnapshot.
+# For the arguments, see the documentation of BlockdevSnapshotSync.
 #
 # Returns: nothing on success
 #  If @device is not a valid block device, DeviceNotFound
@@ -798,7 +798,7 @@
 # Since 0.14.0
 ##
 { 'command': 'blockdev-snapshot-sync',
-  'data': 'BlockdevSnapshot' }
+  'data': 'BlockdevSnapshotSync' }
 
 ##
 # @change-backing-file
-- 
2.6.1




[Qemu-block] [PATCH 1/3] aio: Move AioHandler struct to header

2015-10-12 Thread Fam Zheng
The win32 AioHandler is a superset of its counterpart in aio-posix, so move it
to a new header "aio-internal.h" and drop the POSIX variant.

Signed-off-by: Fam Zheng 
---
 aio-posix.c  | 11 +--
 aio-win32.c  | 12 +---
 include/block/aio-internal.h | 30 ++
 3 files changed, 32 insertions(+), 21 deletions(-)
 create mode 100644 include/block/aio-internal.h

diff --git a/aio-posix.c b/aio-posix.c
index d477033..7ae54fc 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -17,16 +17,7 @@
 #include "block/block.h"
 #include "qemu/queue.h"
 #include "qemu/sockets.h"
-
-struct AioHandler
-{
-GPollFD pfd;
-IOHandler *io_read;
-IOHandler *io_write;
-int deleted;
-void *opaque;
-QLIST_ENTRY(AioHandler) node;
-};
+#include "block/aio-internal.h"
 
 static AioHandler *find_aio_handler(AioContext *ctx, int fd)
 {
diff --git a/aio-win32.c b/aio-win32.c
index 50a6867..f018934 100644
--- a/aio-win32.c
+++ b/aio-win32.c
@@ -19,17 +19,7 @@
 #include "block/block.h"
 #include "qemu/queue.h"
 #include "qemu/sockets.h"
-
-struct AioHandler {
-EventNotifier *e;
-IOHandler *io_read;
-IOHandler *io_write;
-EventNotifierHandler *io_notify;
-GPollFD pfd;
-int deleted;
-void *opaque;
-QLIST_ENTRY(AioHandler) node;
-};
+#include "block/aio-internal.h"
 
 void aio_set_fd_handler(AioContext *ctx,
 int fd,
diff --git a/include/block/aio-internal.h b/include/block/aio-internal.h
new file mode 100644
index 000..2ffbcdc
--- /dev/null
+++ b/include/block/aio-internal.h
@@ -0,0 +1,30 @@
+/*
+ * QEMU aio internal interface
+ *
+ * Copyright Red Hat, Inc. 2015
+ *
+ * Authors:
+ *  Fam Zheng 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_AIO_INTERNAL_H
+#define QEMU_AIO_INTERNAL_H
+
+#include "block/aio.h"
+
+struct AioHandler {
+EventNotifier *e;
+IOHandler *io_read;
+IOHandler *io_write;
+EventNotifierHandler *io_notify;
+GPollFD pfd;
+int deleted;
+void *opaque;
+QLIST_ENTRY(AioHandler) node;
+};
+
+#endif
-- 
2.6.1




[Qemu-block] [PATCH 0/3] aio: Use epoll in aio_poll()

2015-10-12 Thread Fam Zheng
This series adds the ability to use epoll in aio_poll() on Linux. It is
switched on dynamically rather than statically for two reasons: 1) when the
number of fds is not high enough, using epoll has little advantage; 2) when an
epoll-incompatible fd needs to be handled, we need to fall back.  epoll is
enabled once the number of fds reaches a threshold.



Fam Zheng (3):
  aio: Move AioHandler struct to header
  aio: Introduce aio_context_setup
  aio: Introduce aio-epoll.c

 Makefile.objs|   1 +
 aio-epoll.c  | 150 +++
 aio-posix.c  |  31 +
 aio-win32.c  |  16 ++---
 async.c  |  14 +++-
 include/block/aio-internal.h |  47 ++
 include/block/aio.h  |   5 ++
 stubs/Makefile.objs  |   1 +
 stubs/aio-epoll.c|  37 +++
 9 files changed, 277 insertions(+), 25 deletions(-)
 create mode 100644 aio-epoll.c
 create mode 100644 include/block/aio-internal.h
 create mode 100644 stubs/aio-epoll.c

-- 
2.6.1




[Qemu-block] [PATCH 2/3] aio: Introduce aio_context_setup

2015-10-12 Thread Fam Zheng
This is the place to initialize platform specific bits of AioContext.

Signed-off-by: Fam Zheng 
---
 aio-posix.c  |  4 
 aio-win32.c  |  4 
 async.c  | 14 --
 include/block/aio-internal.h |  2 ++
 4 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/aio-posix.c b/aio-posix.c
index 7ae54fc..4fd2383 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -288,3 +288,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
 
 return progress;
 }
+
+void aio_context_setup(AioContext *ctx, Error **errp)
+{
+}
diff --git a/aio-win32.c b/aio-win32.c
index f018934..7873141 100644
--- a/aio-win32.c
+++ b/aio-win32.c
@@ -353,3 +353,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
 aio_context_release(ctx);
 return progress;
 }
+
+void aio_context_setup(AioContext *ctx, Error **errp)
+{
+}
diff --git a/async.c b/async.c
index efce14b..72cdc9b 100644
--- a/async.c
+++ b/async.c
@@ -27,6 +27,7 @@
 #include "block/thread-pool.h"
 #include "qemu/main-loop.h"
 #include "qemu/atomic.h"
+#include "block/aio-internal.h"
 
 /***/
 /* bottom halves (can be seen as timers which expire ASAP) */
@@ -320,12 +321,18 @@ AioContext *aio_context_new(Error **errp)
 {
 int ret;
 AioContext *ctx;
+Error *local_err = NULL;
+
 ctx = (AioContext *) g_source_new(&aio_source_funcs, sizeof(AioContext));
+aio_context_setup(ctx, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+goto fail;
+}
 ret = event_notifier_init(&ctx->notifier, false);
 if (ret < 0) {
-g_source_destroy(&ctx->source);
 error_setg_errno(errp, -ret, "Failed to initialize event notifier");
-return NULL;
+goto fail;
 }
 g_source_set_can_recurse(&ctx->source, true);
 aio_set_event_notifier(ctx, &ctx->notifier,
@@ -339,6 +346,9 @@ AioContext *aio_context_new(Error **errp)
 ctx->notify_dummy_bh = aio_bh_new(ctx, notify_dummy_bh, NULL);
 
 return ctx;
+fail:
+g_source_destroy(&ctx->source);
+return NULL;
 }
 
 void aio_context_ref(AioContext *ctx)
diff --git a/include/block/aio-internal.h b/include/block/aio-internal.h
index 2ffbcdc..f50a37c 100644
--- a/include/block/aio-internal.h
+++ b/include/block/aio-internal.h
@@ -27,4 +27,6 @@ struct AioHandler {
 QLIST_ENTRY(AioHandler) node;
 };
 
+void aio_context_setup(AioContext *ctx, Error **errp);
+
 #endif
-- 
2.6.1




[Qemu-block] [PATCH 3/3] aio: Introduce aio-epoll.c

2015-10-12 Thread Fam Zheng
To minimize code duplication, epoll is hooked into aio-posix's
aio_poll() instead of rolling its own. This approach also has the
advantage that we can switch between the two at both compile time and
run time:

1) If the configure script doesn't find epoll, the libqemustub.a nop
functions are used, which selects the usual ppoll.

2) When QEMU starts with a small number of fds in the event loop, ppoll
is used.

3) When QEMU starts with a large number of fds, or when more devices are
hot plugged after starting up, epoll automatically kicks in once the
number of fds hits the threshold.

4) Some fds may not support epoll, such as tty based stdio. In this
case, we can fall back to ppoll.
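
The four cases above can be modelled as a small state machine. What follows
is only a rough sketch, not the code from this patch: PollState,
poll_state_add_fd() and the other names are hypothetical, no real epoll calls
are made, and the fd_epoll_ok argument simply stands in for whether
epoll_ctl(EPOLL_CTL_ADD) would accept the fd. It also assumes that a rejected
fd disables epoll for good, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constant; the patch itself uses EPOLL_ENABLE_THRESHOLD 64. */
#define SKETCH_EPOLL_THRESHOLD 64

typedef enum { BACKEND_PPOLL, BACKEND_EPOLL } PollBackend;

typedef struct {
    unsigned nfds;         /* fds currently registered in the event loop */
    bool epoll_available;  /* false if built without epoll or an fd was rejected */
    bool epoll_enabled;    /* true while epoll is the active backend */
} PollState;

/* have_epoll == false models case 1: only the nop stubs are linked in. */
static void poll_state_init(PollState *s, bool have_epoll)
{
    assert(s != NULL);
    s->nfds = 0;
    s->epoll_available = have_epoll;
    s->epoll_enabled = false;
}

/* Register one more fd and re-evaluate which backend should be used. */
static void poll_state_add_fd(PollState *s, bool fd_epoll_ok)
{
    assert(s != NULL);
    s->nfds++;
    if (!fd_epoll_ok) {
        /* Case 4: an fd that epoll cannot handle forces the ppoll fallback. */
        s->epoll_available = false;
        s->epoll_enabled = false;
        return;
    }
    if (s->epoll_available && s->nfds >= SKETCH_EPOLL_THRESHOLD) {
        /* Case 3: enough fds, so epoll kicks in. */
        s->epoll_enabled = true;
    }
}

/* Case 2 is the default: ppoll stays in use until epoll gets enabled. */
static PollBackend poll_state_backend(const PollState *s)
{
    assert(s != NULL);
    return s->epoll_enabled ? BACKEND_EPOLL : BACKEND_PPOLL;
}
```

A caller would use poll_state_init() once, call poll_state_add_fd() for each
new handler, and ask poll_state_backend() before every poll iteration.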

Signed-off-by: Fam Zheng 
---
 Makefile.objs|   1 +
 aio-epoll.c  | 150 +++
 aio-posix.c  |  16 -
 include/block/aio-internal.h |  15 +
 include/block/aio.h  |   5 ++
 stubs/Makefile.objs  |   1 +
 stubs/aio-epoll.c|  37 +++
 7 files changed, 223 insertions(+), 2 deletions(-)
 create mode 100644 aio-epoll.c
 create mode 100644 stubs/aio-epoll.c

diff --git a/Makefile.objs b/Makefile.objs
index bc43e5c..8f401b7 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -10,6 +10,7 @@ util-obj-y += qmp-introspect.o qapi-types.o qapi-visit.o qapi-event.o
 block-obj-y = async.o thread-pool.o
 block-obj-y += nbd.o block.o blockjob.o
 block-obj-y += main-loop.o iohandler.o qemu-timer.o
+block-obj-$(CONFIG_EPOLL) += aio-epoll.o
 block-obj-$(CONFIG_POSIX) += aio-posix.o
 block-obj-$(CONFIG_WIN32) += aio-win32.o
 block-obj-y += block/
diff --git a/aio-epoll.c b/aio-epoll.c
new file mode 100644
index 000..4557dcb
--- /dev/null
+++ b/aio-epoll.c
@@ -0,0 +1,150 @@
+/*
+ * QEMU aio implementation
+ *
+ * Copyright Red Hat, Inc, 2015
+ *
+ * Authors:
+ *  Fam Zheng 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Contributions after 2012-01-13 are licensed under the terms of the
+ * GNU GPL, version 2 or (at your option) any later version.
+ */
+
+#include "qemu-common.h"
+#include "block/block.h"
+#include "qemu/queue.h"
+#include "block/aio-internal.h"
+#include 
+
+/* The fd number threshold to switch to epoll */
+#define EPOLL_ENABLE_THRESHOLD 64
+
+static void aio_epoll_disable(AioContext *ctx)
+{
+ctx->epoll_available = false;
+if (!ctx->epoll_enabled) {
+return;
+}
+ctx->epoll_enabled = false;
+close(ctx->epollfd);
+}
+
+static inline int epoll_events_from_pfd(int pfd_events)
+{
+return (pfd_events & G_IO_IN ? EPOLLIN : 0) |
+   (pfd_events & G_IO_OUT ? EPOLLOUT : 0) |
+   (pfd_events & G_IO_HUP ? EPOLLHUP : 0) |
+   (pfd_events & G_IO_ERR ? EPOLLERR : 0);
+}
+
+static bool aio_epoll_try_enable(AioContext *ctx)
+{
+AioHandler *node;
+struct epoll_event event;
+if (!ctx->epoll_available) {
+return false;
+}
+
+QLIST_FOREACH(node, &ctx->aio_handlers, node) {
+int r;
+if (node->deleted || !node->pfd.events) {
+continue;
+}
+event.events = epoll_events_from_pfd(node->pfd.events);
+event.data.ptr = node;
+r = epoll_ctl(ctx->epollfd, EPOLL_CTL_ADD, node->pfd.fd, &event);
+if (r) {
+return false;
+}
+}
+ctx->epoll_enabled = true;
+return true;
+}
+
+void aio_epoll_update(AioContext *ctx, AioHandler *node, bool is_new)
+{
+struct epoll_event event;
+int r;
+
+if (!ctx->epoll_enabled) {
+return;
+}
+if (!node->pfd.events) {
+r = epoll_ctl(ctx->epollfd, EPOLL_CTL_DEL, node->pfd.fd, &event);
+assert(!r);
+} else {
+event.data.ptr = node;
+event.events = epoll_events_from_pfd(node->pfd.events);
+if (is_new) {
+r = epoll_ctl(ctx->epollfd, EPOLL_CTL_ADD, node->pfd.fd, &event);
+if (r) {
+aio_epoll_disable(ctx);
+}
+} else {
+r = epoll_ctl(ctx->epollfd, EPOLL_CTL_MOD, node->pfd.fd, &event);
+assert(!r);
+}
+}
+}
+
+int aio_epoll(AioContext *ctx, GPollFD *pfds, unsigned npfd, int64_t timeout)
+{
+AioHandler *node;
+int i, ret = 0;
+struct epoll_event events[128];
+
+assert(npfd == 1);
+assert(pfds[0].fd == ctx->epollfd);
+if (timeout > 0) {
+ret = qemu_poll_ns(pfds, npfd, timeout);
+}
+if (timeout <= 0 || ret > 0) {
+ret = epoll_wait(ctx->epollfd, events,
+ sizeof(events) / sizeof(events[0]),
+ timeout);
+if (ret <= 0) {
+goto out;
+}
+for (i = 0; i < ret; i++) {
+int ev = events[i].events;
+node = events[i].data.ptr;
+node->pfd.revents = (ev & EPOLLIN ? G_IO_IN : 0) |
+(ev & EPOLLOUT ? G_IO_OUT : 0) |
+ 

Re: [Qemu-block] [PATCH 3/3] aio: Introduce aio-epoll.c

2015-10-12 Thread Paolo Bonzini


On 12/10/2015 11:55, Fam Zheng wrote:
> Signed-off-by: Fam Zheng 
> ---
>  Makefile.objs|   1 +
>  aio-epoll.c  | 150 +++
>  aio-posix.c  |  16 -
>  include/block/aio-internal.h |  15 +
>  include/block/aio.h  |   5 ++
>  stubs/Makefile.objs  |   1 +
>  stubs/aio-epoll.c|  37 +++
>  7 files changed, 223 insertions(+), 2 deletions(-)
>  create mode 100644 aio-epoll.c
>  create mode 100644 stubs/aio-epoll.c

aio-epoll.c seems small enough that you can just include everything in
aio-posix.c.  This should simplify everything and make patch 1
unnecessary, too.

Paolo



Re: [Qemu-block] [Qemu-devel] [PATCH 3/3] aio: Introduce aio-epoll.c

2015-10-12 Thread Fam Zheng
On Mon, 10/12 12:06, Paolo Bonzini wrote:
> 
> 
> On 12/10/2015 11:55, Fam Zheng wrote:
> > Signed-off-by: Fam Zheng 
> > ---
> >  Makefile.objs|   1 +
> >  aio-epoll.c  | 150 +++
> >  aio-posix.c  |  16 -
> >  include/block/aio-internal.h |  15 +
> >  include/block/aio.h  |   5 ++
> >  stubs/Makefile.objs  |   1 +
> >  stubs/aio-epoll.c|  37 +++
> >  7 files changed, 223 insertions(+), 2 deletions(-)
> >  create mode 100644 aio-epoll.c
> >  create mode 100644 stubs/aio-epoll.c
> 
> aio-epoll.c seems small enough that you can just include everything in
> aio-posix.c.  This should simplify everything and make patch 1
> unnecessary, too.

OK, I can do that.

Fam



Re: [Qemu-block] [Qemu-devel] [PATCH 05/12] aio: introduce aio_{disable, enable}_clients

2015-10-12 Thread Fam Zheng
On Mon, 10/12 10:31, Kevin Wolf wrote:
> Am 09.10.2015 um 18:27 hat Fam Zheng geschrieben:
> > On Fri, 10/09 16:31, Kevin Wolf wrote:
> > > Am 09.10.2015 um 07:45 hat Fam Zheng geschrieben:
> > > > Signed-off-by: Fam Zheng 
> > > > ---
> > > >  aio-posix.c |  3 ++-
> > > >  aio-win32.c |  3 ++-
> > > >  async.c | 42 ++
> > > >  include/block/aio.h | 30 ++
> > > >  4 files changed, 76 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/aio-posix.c b/aio-posix.c
> > > > index d25fcfc..a261892 100644
> > > > --- a/aio-posix.c
> > > > +++ b/aio-posix.c
> > > > @@ -261,7 +261,8 @@ bool aio_poll(AioContext *ctx, bool blocking)
> > > >  
> > > >  /* fill pollfds */
> > > >  QLIST_FOREACH(node, &ctx->aio_handlers, node) {
> > > > -if (!node->deleted && node->pfd.events) {
> > > > +if (!node->deleted && node->pfd.events
> > > > +&& !aio_type_disabled(ctx, node->type)) {
> > > >  add_pollfd(node);
> > > >  }
> > > >  }
> > > > diff --git a/aio-win32.c b/aio-win32.c
> > > > index f5ecf57..66cff60 100644
> > > > --- a/aio-win32.c
> > > > +++ b/aio-win32.c
> > > > @@ -309,7 +309,8 @@ bool aio_poll(AioContext *ctx, bool blocking)
> > > >  /* fill fd sets */
> > > >  count = 0;
> > > >  QLIST_FOREACH(node, &ctx->aio_handlers, node) {
> > > > -if (!node->deleted && node->io_notify) {
> > > > +if (!node->deleted && node->io_notify
> > > > +&& !aio_type_disabled(ctx, node->type)) {
> > > >  events[count++] = event_notifier_get_handle(node->e);
> > > >  }
> > > >  }
> > > > diff --git a/async.c b/async.c
> > > > index 244bf79..855b9d5 100644
> > > > --- a/async.c
> > > > +++ b/async.c
> > > > @@ -361,3 +361,45 @@ void aio_context_release(AioContext *ctx)
> > > >  {
> > > >  rfifolock_unlock(&ctx->lock);
> > > >  }
> > > > +
> > > > +bool aio_type_disabled(AioContext *ctx, int type)
> > > > +{
> > > > +int i = 1;
> > > > +int n = 0;
> > > > +
> > > > +while (type) {
> > > > +bool b = type & 0x1;
> > > > +type >>= 1;
> > > > +n++;
> > > 
> > > Any specific reason for leaving client_disable_counters[0] unused?
> > 
> > No, I should have started from 0.
> > 
> > > 
> > > > +i <<= 1;
> > > 
> > > i is never read.
> > > 
> > > > +if (!b) {
> > > > +continue;
> > > > +}
> > > > +if (ctx->client_disable_counters[n]) {
> > > > +return true;
> > > > +}
> > > > +}
> > > > +return false;
> > > > +}
> > > 
> > > In general I wonder whether this function really needs to take a mask
> > > with possibly multiple set bits instead of just a single type.
> > 
> > Previous versions used to have more types than "internal" and "external",
> > so it has been a mask. So yes, I think a single type will be better now.
> > 
> > > 
> > > > +void aio_disable_enable_clients(AioContext *ctx, int clients_mask,
> > > > +bool is_disable)
> > > > +{
> > > > +int i = 1;
> > > > +int n = 0;
> > > > +aio_context_acquire(ctx);
> > > > +
> > > > +while (clients_mask) {
> > > > +bool b = clients_mask & 0x1;
> > > > +clients_mask >>= 1;
> > > > +n++;
> > > > +i <<= 1;
> > > 
> > > This i isn't used either.
> > > 
> > > > +if (!b) {
> > > > +continue;
> > > > +}
> > > > +if (ctx->client_disable_counters[n]) {
> > > > +return true;
> > > > +}
> > > 
> > > Wait, why are you checking the state instead of setting it?
> > 
> > Oops, apparently I screwed up my workspaces, as I do remember coding this
> > assignment. And I must have used a wrong command when building the tree, so
> > that I didn't even catch the compile error. :(
> > 
> > > 
> > > How did you test this series?
> > 
> > So far only smoke testing and qemu-iotests, because I don't have a good
> > idea of how to test the transaction's atomicity. Any suggestions?
> 
> Perhaps you could use blkdebug to delay something in the middle of the
> transaction while your guest keeps writing stuff? That should result in
> 100% reproducibility.
> 
> I guess you actually need to make sure that your guest doesn't do any
> I/O, then set the blkdebug breakpoint, send the transaction, and once a
> request is stopped, you start some I/O in the guest. Resume as soon as
> you know that something bad happened.
> 
> Possibly you need to add a new blkdebug event to find a good place to
> suspend a transaction request.
> 

It's difficult to "start some I/O" in the guest in the middle of a transaction,
even with the help of blkdebug, because the BQL is held during the whole
transaction.

I think it would be a bit easier to program a VCPU to constantly submit I/O
requests to the vq, but that's far from enough.

Anyway I'll start by writing some unit test code instead, in 

[Qemu-block] [PATCH v2 01/12] aio: Add "is_external" flag for event handlers

2015-10-12 Thread Fam Zheng
All callers pass in false, and the real external ones will switch to
true in coming patches.

Signed-off-by: Fam Zheng 
---
 aio-posix.c |  6 -
 aio-win32.c |  5 
 async.c |  3 ++-
 block/curl.c| 14 +-
 block/iscsi.c   |  9 +++
 block/linux-aio.c   |  5 ++--
 block/nbd-client.c  | 10 ---
 block/nfs.c | 17 +---
 block/sheepdog.c| 38 ++-
 block/ssh.c |  5 ++--
 block/win32-aio.c   |  5 ++--
 hw/block/dataplane/virtio-blk.c |  6 +++--
 hw/scsi/virtio-scsi-dataplane.c | 24 +++--
 include/block/aio.h |  2 ++
 iohandler.c |  3 ++-
 nbd.c   |  4 ++-
 tests/test-aio.c| 58 +++--
 17 files changed, 130 insertions(+), 84 deletions(-)

diff --git a/aio-posix.c b/aio-posix.c
index d477033..f0f9122 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -25,6 +25,7 @@ struct AioHandler
 IOHandler *io_write;
 int deleted;
 void *opaque;
+bool is_external;
 QLIST_ENTRY(AioHandler) node;
 };
 
@@ -43,6 +44,7 @@ static AioHandler *find_aio_handler(AioContext *ctx, int fd)
 
 void aio_set_fd_handler(AioContext *ctx,
 int fd,
+bool is_external,
 IOHandler *io_read,
 IOHandler *io_write,
 void *opaque)
@@ -82,6 +84,7 @@ void aio_set_fd_handler(AioContext *ctx,
 node->io_read = io_read;
 node->io_write = io_write;
 node->opaque = opaque;
+node->is_external = is_external;
 
 node->pfd.events = (io_read ? G_IO_IN | G_IO_HUP | G_IO_ERR : 0);
 node->pfd.events |= (io_write ? G_IO_OUT | G_IO_ERR : 0);
@@ -92,10 +95,11 @@ void aio_set_fd_handler(AioContext *ctx,
 
 void aio_set_event_notifier(AioContext *ctx,
 EventNotifier *notifier,
+bool is_external,
 EventNotifierHandler *io_read)
 {
 aio_set_fd_handler(ctx, event_notifier_get_fd(notifier),
-   (IOHandler *)io_read, NULL, notifier);
+   is_external, (IOHandler *)io_read, NULL, notifier);
 }
 
 bool aio_prepare(AioContext *ctx)
diff --git a/aio-win32.c b/aio-win32.c
index 50a6867..3110d85 100644
--- a/aio-win32.c
+++ b/aio-win32.c
@@ -28,11 +28,13 @@ struct AioHandler {
 GPollFD pfd;
 int deleted;
 void *opaque;
+bool is_external;
 QLIST_ENTRY(AioHandler) node;
 };
 
 void aio_set_fd_handler(AioContext *ctx,
 int fd,
+bool is_external,
 IOHandler *io_read,
 IOHandler *io_write,
 void *opaque)
@@ -86,6 +88,7 @@ void aio_set_fd_handler(AioContext *ctx,
 node->opaque = opaque;
 node->io_read = io_read;
 node->io_write = io_write;
+node->is_external = is_external;
 
 event = event_notifier_get_handle(&ctx->notifier);
 WSAEventSelect(node->pfd.fd, event,
@@ -98,6 +101,7 @@ void aio_set_fd_handler(AioContext *ctx,
 
 void aio_set_event_notifier(AioContext *ctx,
 EventNotifier *e,
+bool is_external,
 EventNotifierHandler *io_notify)
 {
 AioHandler *node;
@@ -133,6 +137,7 @@ void aio_set_event_notifier(AioContext *ctx,
 node->e = e;
 node->pfd.fd = (uintptr_t)event_notifier_get_handle(e);
 node->pfd.events = G_IO_IN;
+node->is_external = is_external;
 QLIST_INSERT_HEAD(&ctx->aio_handlers, node, node);
 
 g_source_add_poll(&ctx->source, &node->pfd);
diff --git a/async.c b/async.c
index efce14b..bdc64a3 100644
--- a/async.c
+++ b/async.c
@@ -247,7 +247,7 @@ aio_ctx_finalize(GSource *source)
 }
 qemu_mutex_unlock(&ctx->bh_lock);
 
-aio_set_event_notifier(ctx, &ctx->notifier, NULL);
+aio_set_event_notifier(ctx, &ctx->notifier, false, NULL);
 event_notifier_cleanup(&ctx->notifier);
 rfifolock_destroy(&ctx->lock);
 qemu_mutex_destroy(&ctx->bh_lock);
@@ -329,6 +329,7 @@ AioContext *aio_context_new(Error **errp)
 }
 g_source_set_can_recurse(&ctx->source, true);
 aio_set_event_notifier(ctx, &ctx->notifier,
+   false,
(EventNotifierHandler *)
event_notifier_dummy_cb);
 ctx->thread_pool = NULL;
diff --git a/block/curl.c b/block/curl.c
index 032cc8a..8994182 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -154,18 +154,20 @@ static int curl_sock_cb(CURL *curl, curl_socket_t fd, int action,
 DPRINTF("CURL (AIO): Sock action %d on fd %d\n", action, fd);
 switch (acti

[Qemu-block] [PATCH v2 00/12] block: bdrv_drained_begin/end for transactions on dataplane devices

2015-10-12 Thread Fam Zheng
v2: Use a "bool" for external/internal instead of a bit mask in the interface.
Because the patches changed, I didn't pick up Kevin's Reviewed-by.
Add a unit test in tests/test-aio.c.

External I/O requests mustn't come in while we're in the middle of a
QMP transaction. This series adds bdrv_drained_begin and bdrv_drained_end
around "prepare" and "cleanup" in the transaction actions to ensure that.

Note that "backup" action already starts the block job coroutine in prepare,
and "snapshot" action already creates the snapshot, but both are OK because we
call bdrv_drained_begin first.

The nested event loops also dispatch timers and BHs on the same AioContext. The
existing timers are iscsi, curl, qcow2 and qed. The only one that does I/O is
qed, which is dealt with by a new ".bdrv_drain" callback.
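
To illustrate the idea behind the "external" flag and the drained section, a
minimal sketch follows. It is not the code from this series: the Sketch*
names are hypothetical and C11 atomics stand in for QEMU's own atomic helpers.
A counter is raised while a drained section is open, and the poll loop skips
handlers marked external for as long as the counter is non-zero. Because it
is a counter rather than a flag, drained sections may nest.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    /* Greater than zero while at least one drained section is open. */
    atomic_int external_disable_cnt;
} SketchAioContext;

/* Entering a drained section: stop polling external event sources. */
static void sketch_disable_external(SketchAioContext *ctx)
{
    atomic_fetch_add(&ctx->external_disable_cnt, 1);
}

/* Leaving a drained section: must pair with a previous disable call. */
static void sketch_enable_external(SketchAioContext *ctx)
{
    int old = atomic_fetch_sub(&ctx->external_disable_cnt, 1);
    assert(old > 0);
}

/* Internal handlers are always polled; external ones only when enabled. */
static bool sketch_node_pollable(SketchAioContext *ctx, bool is_external)
{
    return !is_external || atomic_load(&ctx->external_disable_cnt) == 0;
}
```

In this model a transaction would call sketch_disable_external() before
"prepare" and sketch_enable_external() after "cleanup", so guest-triggered
handlers stay parked for the whole window.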

Fam Zheng (12):
  aio: Add "is_external" flag for event handlers
  nbd: Mark fd handlers client type as "external"
  dataplane: Mark host notifiers' client type as "external"
  aio: introduce aio_{disable,enable}_external
  block: Introduce "drained begin/end" API
  block: Add "drained begin/end" for transactional external snapshot
  block: Add "drained begin/end" for transactional backup
  block: Add "drained begin/end" for transactional blockdev-backup
  block: Add "drained begin/end" for internal snapshot
  block: Introduce BlockDriver.bdrv_drain callback
  qed: Implement .bdrv_drain
  tests: Add test case for aio_disable_external

 aio-posix.c |  9 -
 aio-win32.c |  8 +++-
 async.c |  3 +-
 block.c |  2 +
 block/curl.c| 14 ---
 block/io.c  | 21 +++
 block/iscsi.c   |  9 ++---
 block/linux-aio.c   |  5 ++-
 block/nbd-client.c  | 10 +++--
 block/nfs.c | 17 -
 block/qed.c |  7 
 block/sheepdog.c| 38 ---
 block/ssh.c |  5 ++-
 block/win32-aio.c   |  5 ++-
 blockdev.c  | 27 +++---
 hw/block/dataplane/virtio-blk.c |  5 ++-
 hw/scsi/virtio-scsi-dataplane.c | 22 +++
 include/block/aio.h | 39 
 include/block/block.h   | 19 ++
 include/block/block_int.h   |  8 
 iohandler.c |  3 +-
 nbd.c   |  4 +-
 tests/test-aio.c| 82 -
 23 files changed, 270 insertions(+), 92 deletions(-)

-- 
2.6.1




[Qemu-block] [PATCH v2 02/12] nbd: Mark fd handlers client type as "external"

2015-10-12 Thread Fam Zheng
So that we can distinguish them from internally used fds, and thus avoid
handling unwanted events in nested aio polls.

Signed-off-by: Fam Zheng 
---
 nbd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/nbd.c b/nbd.c
index 32a1f66..b599e62 100644
--- a/nbd.c
+++ b/nbd.c
@@ -1446,7 +1446,7 @@ static void nbd_set_handlers(NBDClient *client)
 {
 if (client->exp && client->exp->ctx) {
 aio_set_fd_handler(client->exp->ctx, client->sock,
-   false,
+   true,
client->can_read ? nbd_read : NULL,
client->send_coroutine ? nbd_restart_write : NULL,
client);
@@ -1457,7 +1457,7 @@ static void nbd_unset_handlers(NBDClient *client)
 {
 if (client->exp && client->exp->ctx) {
 aio_set_fd_handler(client->exp->ctx, client->sock,
-   false, NULL, NULL, NULL);
+   true, NULL, NULL, NULL);
 }
 }
 
-- 
2.6.1




[Qemu-block] [PATCH v2 04/12] aio: introduce aio_{disable, enable}_external

2015-10-12 Thread Fam Zheng
Signed-off-by: Fam Zheng 
---
 aio-posix.c |  3 ++-
 aio-win32.c |  3 ++-
 include/block/aio.h | 37 +
 3 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/aio-posix.c b/aio-posix.c
index f0f9122..0467f23 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -261,7 +261,8 @@ bool aio_poll(AioContext *ctx, bool blocking)
 
 /* fill pollfds */
 QLIST_FOREACH(node, &ctx->aio_handlers, node) {
-if (!node->deleted && node->pfd.events) {
+if (!node->deleted && node->pfd.events
+&& aio_node_check(ctx, node->is_external)) {
 add_pollfd(node);
 }
 }
diff --git a/aio-win32.c b/aio-win32.c
index 3110d85..43c4c79 100644
--- a/aio-win32.c
+++ b/aio-win32.c
@@ -309,7 +309,8 @@ bool aio_poll(AioContext *ctx, bool blocking)
 /* fill fd sets */
 count = 0;
 QLIST_FOREACH(node, &ctx->aio_handlers, node) {
-if (!node->deleted && node->io_notify) {
+if (!node->deleted && node->io_notify
+&& aio_node_check(ctx, node->is_external)) {
 events[count++] = event_notifier_get_handle(node->e);
 }
 }
diff --git a/include/block/aio.h b/include/block/aio.h
index 12f1141..80151d1 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -122,6 +122,8 @@ struct AioContext {
 
 /* TimerLists for calling timers - one per clock type */
 QEMUTimerListGroup tlg;
+
+int external_disable_cnt;
 };
 
 /**
@@ -375,4 +377,39 @@ static inline void aio_timer_init(AioContext *ctx,
  */
 int64_t aio_compute_timeout(AioContext *ctx);
 
+/**
+ * aio_disable_external:
+ * @ctx: the aio context
+ *
+ * Disable the further processing of external clients.
+ */
+static inline void aio_disable_external(AioContext *ctx)
+{
+atomic_inc(&ctx->external_disable_cnt);
+}
+
+/**
+ * aio_enable_external:
+ * @ctx: the aio context
+ *
+ * Enable the processing of external clients.
+ */
+static inline void aio_enable_external(AioContext *ctx)
+{
+atomic_dec(&ctx->external_disable_cnt);
+}
+
+/**
+ * aio_node_check:
+ * @ctx: the aio context
+ * @is_external: Whether or not the checked node is an external event source.
+ *
+ * Check whether a node with the given is_external flag may be polled by the
+ * ctx at this moment. True means green light.
+ */
+static inline bool aio_node_check(AioContext *ctx, bool is_external)
+{
+return !is_external || !atomic_read(&ctx->external_disable_cnt);
+}
+
 #endif
-- 
2.6.1




[Qemu-block] [PATCH v2 03/12] dataplane: Mark host notifiers' client type as "external"

2015-10-12 Thread Fam Zheng
They will be excluded by type in the nested event loops in block layer,
so that unwanted events won't be processed there.

Signed-off-by: Fam Zheng 
---
 hw/block/dataplane/virtio-blk.c |  5 ++---
 hw/scsi/virtio-scsi-dataplane.c | 18 --
 2 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index f8716bc..c42ddeb 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -283,7 +283,7 @@ void virtio_blk_data_plane_start(VirtIOBlockDataPlane *s)
 
 /* Get this show started by hooking up our callbacks */
 aio_context_acquire(s->ctx);
-aio_set_event_notifier(s->ctx, &s->host_notifier, false,
+aio_set_event_notifier(s->ctx, &s->host_notifier, true,
handle_notify);
 aio_context_release(s->ctx);
 return;
@@ -320,8 +320,7 @@ void virtio_blk_data_plane_stop(VirtIOBlockDataPlane *s)
 aio_context_acquire(s->ctx);
 
 /* Stop notifications for new requests from guest */
-aio_set_event_notifier(s->ctx, &s->host_notifier, false,
-   NULL);
+aio_set_event_notifier(s->ctx, &s->host_notifier, true, NULL);
 
 /* Drain and switch bs back to the QEMU main loop */
 blk_set_aio_context(s->conf->conf.blk, qemu_get_aio_context());
diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index d149418..1c188f0 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -60,8 +60,7 @@ static VirtIOSCSIVring *virtio_scsi_vring_init(VirtIOSCSI *s,
 r = g_slice_new(VirtIOSCSIVring);
 r->host_notifier = *virtio_queue_get_host_notifier(vq);
 r->guest_notifier = *virtio_queue_get_guest_notifier(vq);
-aio_set_event_notifier(s->ctx, &r->host_notifier, false,
-   handler);
+aio_set_event_notifier(s->ctx, &r->host_notifier, true, handler);
 
 r->parent = s;
 
@@ -72,8 +71,7 @@ static VirtIOSCSIVring *virtio_scsi_vring_init(VirtIOSCSI *s,
 return r;
 
 fail_vring:
-aio_set_event_notifier(s->ctx, &r->host_notifier, false,
-   NULL);
+aio_set_event_notifier(s->ctx, &r->host_notifier, true, NULL);
 k->set_host_notifier(qbus->parent, n, false);
 g_slice_free(VirtIOSCSIVring, r);
 return NULL;
@@ -165,16 +163,16 @@ static void virtio_scsi_clear_aio(VirtIOSCSI *s)
 
 if (s->ctrl_vring) {
 aio_set_event_notifier(s->ctx, &s->ctrl_vring->host_notifier,
-   false, NULL);
+   true, NULL);
 }
 if (s->event_vring) {
 aio_set_event_notifier(s->ctx, &s->event_vring->host_notifier,
-   false, NULL);
+   true, NULL);
 }
 if (s->cmd_vrings) {
 for (i = 0; i < vs->conf.num_queues && s->cmd_vrings[i]; i++) {
 aio_set_event_notifier(s->ctx, &s->cmd_vrings[i]->host_notifier,
-   false, NULL);
+   true, NULL);
 }
 }
 }
@@ -296,12 +294,12 @@ void virtio_scsi_dataplane_stop(VirtIOSCSI *s)
 aio_context_acquire(s->ctx);
 
 aio_set_event_notifier(s->ctx, &s->ctrl_vring->host_notifier,
-   false, NULL);
+   true, NULL);
 aio_set_event_notifier(s->ctx, &s->event_vring->host_notifier,
-   false, NULL);
+   true, NULL);
 for (i = 0; i < vs->conf.num_queues; i++) {
 aio_set_event_notifier(s->ctx, &s->cmd_vrings[i]->host_notifier,
-   false, NULL);
+   true, NULL);
 }
 
 blk_drain_all(); /* ensure there are no in-flight requests */
-- 
2.6.1




[Qemu-block] [PATCH v2 06/12] block: Add "drained begin/end" for transactional external snapshot

2015-10-12 Thread Fam Zheng
This ensures the atomicity of the transaction by avoiding processing of
external requests such as those from ioeventfd.

Signed-off-by: Fam Zheng 
---
 blockdev.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 32b04b4..90f1e15 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1479,6 +1479,7 @@ static void external_snapshot_prepare(BlkTransactionState *common,
 /* Acquire AioContext now so any threads operating on old_bs stop */
 state->aio_context = bdrv_get_aio_context(state->old_bs);
 aio_context_acquire(state->aio_context);
+bdrv_drained_begin(state->old_bs);
 
 if (!bdrv_is_inserted(state->old_bs)) {
 error_setg(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
@@ -1548,8 +1549,6 @@ static void external_snapshot_commit(BlkTransactionState *common)
  * don't want to abort all of them if one of them fails the reopen */
 bdrv_reopen(state->new_bs, state->new_bs->open_flags & ~BDRV_O_RDWR,
 NULL);
-
-aio_context_release(state->aio_context);
 }
 
 static void external_snapshot_abort(BlkTransactionState *common)
@@ -1559,7 +1558,14 @@ static void external_snapshot_abort(BlkTransactionState *common)
 if (state->new_bs) {
 bdrv_unref(state->new_bs);
 }
+}
+
+static void external_snapshot_clean(BlkTransactionState *common)
+{
+ExternalSnapshotState *state =
+ DO_UPCAST(ExternalSnapshotState, common, common);
 if (state->aio_context) {
+bdrv_drained_end(state->old_bs);
 aio_context_release(state->aio_context);
 }
 }
@@ -1724,6 +1730,7 @@ static const BdrvActionOps actions[] = {
 .prepare  = external_snapshot_prepare,
 .commit   = external_snapshot_commit,
 .abort = external_snapshot_abort,
+.clean = external_snapshot_clean,
 },
 [TRANSACTION_ACTION_KIND_DRIVE_BACKUP] = {
 .instance_size = sizeof(DriveBackupState),
-- 
2.6.1
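
Moving the release into a .clean callback works because clean runs after
both the commit path and the abort path. A minimal sketch of that ordering
follows, assuming (this is my reading, not something the patch states) that
the transaction code calls abort and then clean for every action whose
prepare was attempted. SketchAction and the sketch_* functions are
hypothetical.

```c
#include <stddef.h>

typedef struct {
    int fail_prepare;  /* test knob: make prepare fail */
    int locked;        /* models the acquired context plus drained section */
    int committed;
    int aborted;
} SketchAction;

static int sketch_prepare(SketchAction *a)
{
    a->locked = 1;     /* acquire first, so clean always has work to undo */
    return a->fail_prepare ? -1 : 0;
}

static void sketch_commit(SketchAction *a) { a->committed = 1; }
static void sketch_abort(SketchAction *a)  { a->aborted = 1; }

/* Single place that releases what prepare acquired. */
static void sketch_clean(SketchAction *a)
{
    if (a->locked) {
        a->locked = 0;
    }
}

static int sketch_transaction(SketchAction *actions, size_t n)
{
    size_t i, prepared = 0;
    int failed = 0;

    for (i = 0; i < n && !failed; i++) {
        prepared++;
        failed = sketch_prepare(&actions[i]) < 0;
    }
    for (i = 0; i < prepared; i++) {
        if (failed) {
            sketch_abort(&actions[i]);
        } else {
            sketch_commit(&actions[i]);
        }
    }
    for (i = 0; i < prepared; i++) {
        sketch_clean(&actions[i]);
    }
    return failed ? -1 : 0;
}
```

With the release in one callback, commit and abort no longer need to
duplicate it, which is what the hunks above remove.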




[Qemu-block] [PATCH v2 07/12] block: Add "drained begin/end" for transactional backup

2015-10-12 Thread Fam Zheng
This ensures the atomicity of the transaction by avoiding processing of
external requests such as those from ioeventfd.

Move the assignment to state->bs up right after bdrv_drained_begin, so
that we can use it in the clean callback. The abort callback will still
check bs->job and state->job, so it's OK.

Signed-off-by: Fam Zheng 
---
 blockdev.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index 90f1e15..232bc21 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1599,6 +1599,8 @@ static void drive_backup_prepare(BlkTransactionState *common, Error **errp)
 /* AioContext is released in .clean() */
 state->aio_context = bdrv_get_aio_context(bs);
 aio_context_acquire(state->aio_context);
+bdrv_drained_begin(bs);
+state->bs = bs;
 
 qmp_drive_backup(backup->device, backup->target,
  backup->has_format, backup->format,
@@ -1614,7 +1616,6 @@ static void drive_backup_prepare(BlkTransactionState *common, Error **errp)
 return;
 }
 
-state->bs = bs;
 state->job = state->bs->job;
 }
 
@@ -1634,6 +1635,7 @@ static void drive_backup_clean(BlkTransactionState *common)
 DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
 
 if (state->aio_context) {
+bdrv_drained_end(state->bs);
 aio_context_release(state->aio_context);
 }
 }
-- 
2.6.1




[Qemu-block] [PATCH v2 11/12] qed: Implement .bdrv_drain

2015-10-12 Thread Fam Zheng
The "need_check_timer" is used to clear the "NEED_CHECK" flag in the
image header after a grace period once the metadata update has finished. To
comply with the bdrv_drain semantics, we should make sure it remains
deleted once .bdrv_drain is called.

Call the qed_need_check_timer_cb manually to update the header
immediately.

Signed-off-by: Fam Zheng 
---
 block/qed.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/block/qed.c b/block/qed.c
index a7ff1d9..23bd273 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -381,6 +381,12 @@ static void bdrv_qed_attach_aio_context(BlockDriverState *bs,
 }
 }
 
+static void bdrv_qed_drain(BlockDriverState *bs)
+{
+qed_cancel_need_check_timer(bs->opaque);
+qed_need_check_timer_cb(bs->opaque);
+}
+
 static int bdrv_qed_open(BlockDriverState *bs, QDict *options, int flags,
  Error **errp)
 {
@@ -1683,6 +1689,7 @@ static BlockDriver bdrv_qed = {
 .bdrv_check   = bdrv_qed_check,
 .bdrv_detach_aio_context  = bdrv_qed_detach_aio_context,
 .bdrv_attach_aio_context  = bdrv_qed_attach_aio_context,
+.bdrv_drain   = bdrv_qed_drain,
 };
 
 static void bdrv_qed_init(void)
-- 
2.6.1




[Qemu-block] [PATCH v2 08/12] block: Add "drained begin/end" for transactional blockdev-backup

2015-10-12 Thread Fam Zheng
Similar to the previous patch, make sure that external events are not
dispatched during transaction operations.

Signed-off-by: Fam Zheng 
---
 blockdev.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index 232bc21..015afbf 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1680,6 +1680,8 @@ static void blockdev_backup_prepare(BlkTransactionState *common, Error **errp)
 return;
 }
 aio_context_acquire(state->aio_context);
+state->bs = bs;
+bdrv_drained_begin(bs);
 
 qmp_blockdev_backup(backup->device, backup->target,
 backup->sync,
@@ -1692,7 +1694,6 @@ static void blockdev_backup_prepare(BlkTransactionState *common, Error **errp)
 return;
 }
 
-state->bs = bs;
 state->job = state->bs->job;
 }
 
@@ -1712,6 +1713,7 @@ static void blockdev_backup_clean(BlkTransactionState *common)
 BlockdevBackupState *state = DO_UPCAST(BlockdevBackupState, common, common);
 
 if (state->aio_context) {
+bdrv_drained_end(state->bs);
 aio_context_release(state->aio_context);
 }
 }
-- 
2.6.1




[Qemu-block] [PATCH v2 05/12] block: Introduce "drained begin/end" API

2015-10-12 Thread Fam Zheng
The semantics is that after bdrv_drained_begin(bs), bs will not get new external
requests until the matching bdrv_drained_end(bs).

Signed-off-by: Fam Zheng 
---
 block.c   |  2 ++
 block/io.c| 18 ++
 include/block/block.h | 19 +++
 include/block/block_int.h |  2 ++
 4 files changed, 41 insertions(+)

diff --git a/block.c b/block.c
index 1f90b47..9b28a07 100644
--- a/block.c
+++ b/block.c
@@ -2058,6 +2058,8 @@ static void bdrv_move_feature_fields(BlockDriverState *bs_dest,
 bs_dest->device_list = bs_src->device_list;
 bs_dest->blk = bs_src->blk;
 
+bs_dest->quiesce_counter = bs_src->quiesce_counter;
+
 memcpy(bs_dest->op_blockers, bs_src->op_blockers,
sizeof(bs_dest->op_blockers));
 }
diff --git a/block/io.c b/block/io.c
index 94e18e6..5c088d5 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2618,3 +2618,21 @@ void bdrv_flush_io_queue(BlockDriverState *bs)
 }
 bdrv_start_throttled_reqs(bs);
 }
+
+void bdrv_drained_begin(BlockDriverState *bs)
+{
+if (bs->quiesce_counter++) {
+return;
+}
+aio_disable_external(bdrv_get_aio_context(bs));
+bdrv_drain(bs);
+}
+
+void bdrv_drained_end(BlockDriverState *bs)
+{
+assert(bs->quiesce_counter > 0);
+if (--bs->quiesce_counter > 0) {
+return;
+}
+aio_enable_external(bdrv_get_aio_context(bs));
+}
diff --git a/include/block/block.h b/include/block/block.h
index 2dd6630..c4f6eef 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -619,4 +619,23 @@ void bdrv_flush_io_queue(BlockDriverState *bs);
 
 BlockAcctStats *bdrv_get_stats(BlockDriverState *bs);
 
+/**
+ * bdrv_drained_begin:
+ *
+ * Begin a quiesced section for exclusive access to the BDS, by disabling
+ * external request sources including NBD server and device model. Note that
+ * this doesn't block timers or coroutines from submitting more requests, which
+ * means block_job_pause is still necessary.
+ *
+ * Calls to this function can be nested.
+ */
+void bdrv_drained_begin(BlockDriverState *bs);
+
+/**
+ * bdrv_drained_end:
+ *
+ * End a quiescent section started by bdrv_drained_begin().
+ */
+void bdrv_drained_end(BlockDriverState *bs);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 14ad4c3..7c58221 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -456,6 +456,8 @@ struct BlockDriverState {
 /* threshold limit for writes, in bytes. "High water mark". */
 uint64_t write_threshold_offset;
 NotifierWithReturn write_threshold_notifier;
+
+int quiesce_counter;
 };
 
 
-- 
2.6.1
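
The nesting behaviour of bdrv_drained_begin()/bdrv_drained_end() can be
modelled with plain counters. This is a single-threaded sketch, not QEMU
code: SketchContext, SketchBDS and the sketch_* functions are hypothetical,
and the drain step is a stub that only counts how often it ran.

```c
#include <assert.h>

/* Hypothetical stand-in for AioContext. */
typedef struct {
    int external_disable_cnt;
} SketchContext;

/* Hypothetical stand-in for BlockDriverState. */
typedef struct {
    SketchContext *ctx;
    int quiesce_counter;
    int drain_calls;   /* how often the stubbed drain ran */
} SketchBDS;

/* Stub for bdrv_drain(): the real one waits for in-flight requests. */
static void sketch_drain(SketchBDS *bs)
{
    bs->drain_calls++;
}

static void sketch_drained_begin(SketchBDS *bs)
{
    if (bs->quiesce_counter++) {
        return;        /* already quiesced by an outer section */
    }
    bs->ctx->external_disable_cnt++;
    sketch_drain(bs);
}

static void sketch_drained_end(SketchBDS *bs)
{
    assert(bs->quiesce_counter > 0);
    if (--bs->quiesce_counter > 0) {
        return;        /* an outer section is still open */
    }
    bs->ctx->external_disable_cnt--;
}
```

Only the outermost begin disables external sources and drains, and only
the outermost end re-enables them, so inner sections are cheap.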




[Qemu-block] [PATCH v2 09/12] block: Add "drained begin/end" for internal snapshot

2015-10-12 Thread Fam Zheng
This ensures the atomicity of the transaction by avoiding processing of
external requests such as those from ioeventfd.

state->bs is now assigned right after bdrv_drained_begin. Because abort
previously used it as the flag for whether the snapshot has to be deleted,
we now need a separate flag: InternalSnapshotState.created.

Signed-off-by: Fam Zheng 
---
 blockdev.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 015afbf..c3da2c6 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1280,6 +1280,7 @@ typedef struct InternalSnapshotState {
 BlockDriverState *bs;
 AioContext *aio_context;
 QEMUSnapshotInfo sn;
+bool created;
 } InternalSnapshotState;
 
 static void internal_snapshot_prepare(BlkTransactionState *common,
@@ -1318,6 +1319,8 @@ static void internal_snapshot_prepare(BlkTransactionState *common,
 /* AioContext is released in .clean() */
 state->aio_context = bdrv_get_aio_context(bs);
 aio_context_acquire(state->aio_context);
+bdrv_drained_begin(bs);
+state->bs = bs;
 
 if (!bdrv_is_inserted(bs)) {
 error_setg(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
@@ -1375,7 +1378,7 @@ static void internal_snapshot_prepare(BlkTransactionState *common,
 }
 
 /* 4. succeed, mark a snapshot is created */
-state->bs = bs;
+state->created = true;
 }
 
 static void internal_snapshot_abort(BlkTransactionState *common)
@@ -1386,7 +1389,7 @@ static void internal_snapshot_abort(BlkTransactionState *common)
 QEMUSnapshotInfo *sn = &state->sn;
 Error *local_error = NULL;
 
-if (!bs) {
+if (!state->created) {
 return;
 }
 
@@ -1407,6 +1410,7 @@ static void internal_snapshot_clean(BlkTransactionState *common)
  common, common);
 
 if (state->aio_context) {
+bdrv_drained_end(state->bs);
 aio_context_release(state->aio_context);
 }
 }
-- 
2.6.1




[Qemu-block] [PATCH v2 12/12] tests: Add test case for aio_disable_external

2015-10-12 Thread Fam Zheng
Signed-off-by: Fam Zheng 
---
 tests/test-aio.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/tests/test-aio.c b/tests/test-aio.c
index 03cd45d..1623803 100644
--- a/tests/test-aio.c
+++ b/tests/test-aio.c
@@ -374,6 +374,29 @@ static void test_flush_event_notifier(void)
 event_notifier_cleanup(&data.e);
 }
 
+static void test_aio_external_client(void)
+{
+int i, j;
+
+for (i = 1; i < 3; i++) {
+EventNotifierTestData data = { .n = 0, .active = 10, .auto_set = true };
+event_notifier_init(&data.e, false);
+aio_set_event_notifier(ctx, &data.e, true, event_ready_cb);
+event_notifier_set(&data.e);
+for (j = 0; j < i; j++) {
+aio_disable_external(ctx);
+}
+for (j = 0; j < i; j++) {
+assert(!aio_poll(ctx, false));
+assert(event_notifier_test_and_clear(&data.e));
+event_notifier_set(&data.e);
+aio_enable_external(ctx);
+}
+assert(aio_poll(ctx, false));
+event_notifier_cleanup(&data.e);
+}
+}
+
 static void test_wait_event_notifier_noflush(void)
 {
 EventNotifierTestData data = { .n = 0 };
@@ -832,6 +855,7 @@ int main(int argc, char **argv)
 g_test_add_func("/aio/event/wait",  test_wait_event_notifier);
 g_test_add_func("/aio/event/wait/no-flush-cb",  test_wait_event_notifier_noflush);
 g_test_add_func("/aio/event/flush", test_flush_event_notifier);
+g_test_add_func("/aio/external-client", test_aio_external_client);
 g_test_add_func("/aio/timer/schedule",  test_timer_schedule);
 
 g_test_add_func("/aio-gsource/flush",   test_source_flush);
-- 
2.6.1




[Qemu-block] [PATCH v2 10/12] block: Introduce BlockDriver.bdrv_drain callback

2015-10-12 Thread Fam Zheng
Drivers can have internal request sources that generate IO, like the
need_check_timer in QED. Since we want quiesced periods that contain
nested event loops in block layer, we need to have a way to disable such
event sources.

A block driver must implement the "bdrv_drain" callback if it has any
internal sources that can generate I/O activity, like a timer or a
worker thread (even in a library) that can schedule QEMUBH in an
asynchronous callback.

Signed-off-by: Fam Zheng 
---
 block/io.c| 3 +++
 include/block/block_int.h | 6 ++
 2 files changed, 9 insertions(+)

diff --git a/block/io.c b/block/io.c
index 5c088d5..0e6d77c 100644
--- a/block/io.c
+++ b/block/io.c
@@ -247,6 +247,9 @@ void bdrv_drain(BlockDriverState *bs)
 {
 bool busy = true;
 
+if (bs->drv && bs->drv->bdrv_drain) {
+bs->drv->bdrv_drain(bs);
+}
 while (busy) {
 /* Keep iterating */
  bdrv_flush_io_queue(bs);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 7c58221..99359b2 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -288,6 +288,12 @@ struct BlockDriver {
  */
 int (*bdrv_probe_geometry)(BlockDriverState *bs, HDGeometry *geo);
 
+/**
+ * Drain and stop any internal sources of requests in the driver, and
+ * remain so until next I/O callback (e.g. bdrv_co_writev) is called.
+ */
+void (*bdrv_drain)(BlockDriverState *bs);
+
 QLIST_ENTRY(BlockDriver) list;
 };
 
-- 
2.6.1
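
The optional-callback dispatch added to bdrv_drain() can be sketched as
follows. All names here are hypothetical; the example driver callback only
mimics what the QED patch in this series describes (cancel the timer, then
do its work immediately), and the request-completion loop of the real
function is left out.

```c
#include <stddef.h>

typedef struct SketchBDS SketchBDS;

/* Hypothetical driver table with one optional callback. */
typedef struct {
    void (*drain)(SketchBDS *bs);
} SketchDriver;

struct SketchBDS {
    const SketchDriver *drv;
    int timer_armed;      /* models a driver-internal request source */
    int header_flushed;   /* models the work the timer would have done */
};

/* Example callback: stop the internal source and run its work now. */
static void sketch_timer_drain(SketchBDS *bs)
{
    bs->timer_armed = 0;
    bs->header_flushed = 1;
}

/* Generic entry point: call the driver hook only if one is provided. */
static void sketch_drain(SketchBDS *bs)
{
    if (bs->drv && bs->drv->drain) {
        bs->drv->drain(bs);
    }
    /* The real code would now loop until in-flight requests complete. */
}
```

Drivers without internal request sources simply leave the pointer NULL
and pay nothing for the hook.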




Re: [Qemu-block] [PATCH v5 2/4] quorum: implement bdrv_add_child() and bdrv_del_child()

2015-10-12 Thread Alberto Garcia
On Fri 09 Oct 2015 05:51:55 PM CEST, Max Reitz  wrote:
 +s->bs = g_renew(BlockDriverState *, s->bs, s->max_children + 1);
 +s->bs[s->num_children] = NULL;
 +s->max_children++;
 +}
>>>
>>> Just a suggestion, please feel free to ignore it completely:
>>>
>>> You can drop the s->max_children field and just always call g_renew()
>>> with s->num_children + 1 as the @count parameter. There shouldn't be
>>> any (visible) performance penalty, but it would simplify the code.
>> 
>> If s->num_children has decreased since the previous g_renew() call
>> (because the user called quorum_del_child()) that could actually reduce
>> the array size.
>
> Yes, it could. And that would be just fine. ;-)
>
> We'd just keep the array exactly as big as it needs to be. I find that
> pretty intuitive. It's just counter-intuitive if you think one should
> never use realloc() for reducing the size of a buffer (and I know I
> myself tend to write my code thinking that).

If the goal is to keep the array exactly as big as it needs to be then
we should use g_renew() in quorum_del_child()...

Anyway we're digressing :-) this array is one pointer per Quorum child,
so the amount of memory we're talking about here is probably negligible.
I'm fine with any solution.

Berto
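
For reference, the "keep the array exactly as big as it needs to be"
variant discussed above could look roughly like this. It is a sketch that
uses plain realloc() instead of g_renew() (which, unlike realloc, aborts on
allocation failure), and SketchQuorum and the sketch_* functions are
hypothetical names, not the quorum driver's.

```c
#include <stdlib.h>

typedef struct {
    void **children;       /* exactly num_children entries */
    size_t num_children;
} SketchQuorum;

/* Grow by one slot. Returns 0 on success, -1 if allocation fails
 * (the array is left unchanged in that case). */
static int sketch_add_child(SketchQuorum *q, void *child)
{
    void **grown = realloc(q->children,
                           (q->num_children + 1) * sizeof(*grown));
    if (!grown) {
        return -1;
    }
    grown[q->num_children++] = child;
    q->children = grown;
    return 0;
}

/* Remove the entry at index and shrink by one slot.
 * Returns 0 on success, -1 if index is out of range. */
static int sketch_del_child(SketchQuorum *q, size_t index)
{
    void **shrunk;
    size_t i;

    if (index >= q->num_children) {
        return -1;
    }
    for (i = index; i + 1 < q->num_children; i++) {
        q->children[i] = q->children[i + 1];
    }
    q->num_children--;
    if (q->num_children == 0) {
        free(q->children);
        q->children = NULL;
        return 0;
    }
    shrunk = realloc(q->children, q->num_children * sizeof(*shrunk));
    if (shrunk) {
        q->children = shrunk;  /* a failed shrink keeps the old block */
    }
    return 0;
}
```

With one pointer per child the saving is tiny either way; the main gain
is that no separate capacity field has to be kept in sync.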



[Qemu-block] [PATCH 1/4] ide/atapi: make PIO read requests async

2015-10-12 Thread Peter Lieven
PIO read requests on the ATAPI interface used to be sync blk requests.
This has two significant drawbacks. First, the main loop hangs until an
I/O request is completed; second, if the I/O request does not
complete (e.g. due to unresponsive storage), QEMU hangs completely.

Signed-off-by: Peter Lieven 
---
 hw/ide/atapi.c | 93 --
 1 file changed, 84 insertions(+), 9 deletions(-)

diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index 747f466..2271ea2 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -105,11 +105,16 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
 memset(buf, 0, 288);
 }
 
-static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
+static int
+cd_read_sector_sync(IDEState *s, int lba, uint8_t *buf)
 {
 int ret;
 
-switch(sector_size) {
+#ifdef DEBUG_IDE_ATAPI
+printf("cd_read_sector_sync: lba=%d\n", lba);
+#endif
+
+switch (s->cd_sector_size) {
 case 2048:
 block_acct_start(blk_get_stats(s->blk), &s->acct,
  4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
@@ -129,9 +134,71 @@ static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
 ret = -EIO;
 break;
 }
+
+if (!ret) {
+s->lba++;
+s->io_buffer_index = 0;
+}
+
 return ret;
 }
 
+static void cd_read_sector_cb(void *opaque, int ret)
+{
+IDEState *s = opaque;
+
+block_acct_done(blk_get_stats(s->blk), &s->acct);
+
+#ifdef DEBUG_IDE_ATAPI
+printf("cd_read_sector_cb: lba=%d ret=%d\n", s->lba, ret);
+#endif
+
+if (ret < 0) {
+ide_atapi_io_error(s, ret);
+return;
+}
+
+if (s->cd_sector_size == 2352) {
+cd_data_to_raw(s->io_buffer, s->lba);
+}
+
+s->lba++;
+s->io_buffer_index = 0;
+s->status &= ~BUSY_STAT;
+
+ide_atapi_cmd_reply_end(s);
+}
+
+static int cd_read_sector(IDEState *s, int lba, void *buf)
+{
+if (s->cd_sector_size != 2048 && s->cd_sector_size != 2352) {
+return -EINVAL;
+}
+
+if (s->cd_sector_size == 2352) {
+buf += 16;
+}
+s->iov.iov_base = buf;
+
+s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
+qemu_iovec_init_external(&s->qiov, &s->iov, 1);
+
+#ifdef DEBUG_IDE_ATAPI
+printf("cd_read_sector: lba=%d\n", lba);
+#endif
+
+if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
+  cd_read_sector_cb, s) == NULL) {
+return -EIO;
+}
+
+block_acct_start(blk_get_stats(s->blk), &s->acct,
+ 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
+
+s->status |= BUSY_STAT;
+return 0;
+}
+
 void ide_atapi_cmd_ok(IDEState *s)
 {
 s->error = 0;
@@ -182,18 +249,27 @@ void ide_atapi_cmd_reply_end(IDEState *s)
 ide_atapi_cmd_ok(s);
 ide_set_irq(s->bus);
 #ifdef DEBUG_IDE_ATAPI
-printf("status=0x%x\n", s->status);
+printf("end of transfer, status=0x%x\n", s->status);
 #endif
 } else {
 /* see if a new sector must be read */
 if (s->lba != -1 && s->io_buffer_index >= s->cd_sector_size) {
-ret = cd_read_sector(s, s->lba, s->io_buffer, s->cd_sector_size);
-if (ret < 0) {
-ide_atapi_io_error(s, ret);
+if (!s->elementary_transfer_size) {
+ret = cd_read_sector(s, s->lba, s->io_buffer);
+if (ret < 0) {
+ide_atapi_io_error(s, ret);
+}
 return;
+} else {
+/* rebuffering within an elementary transfer is
+ * only possible with a sync request because we
+ * end up with a race condition otherwise */
+ret = cd_read_sector_sync(s, s->lba, s->io_buffer);
+if (ret < 0) {
+ide_atapi_io_error(s, ret);
+return;
+}
 }
-s->lba++;
-s->io_buffer_index = 0;
 }
 if (s->elementary_transfer_size > 0) {
 /* there are some data left to transmit in this elementary
@@ -275,7 +351,6 @@ static void ide_atapi_cmd_read_pio(IDEState *s, int lba, int nb_sectors,
 s->io_buffer_index = sector_size;
 s->cd_sector_size = sector_size;
 
-s->status = READY_STAT | SEEK_STAT;
 ide_atapi_cmd_reply_end(s);
 }
 
-- 
1.9.1
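
The address arithmetic used by the read paths in this patch can be checked
on its own: one 2048-byte CD data sector is four 512-byte block-layer
sectors, hence the lba << 2 and the count of 4, and for the 2352-byte raw
format the data is placed 16 bytes into the buffer (the existing sync path
reads to buf + 16 before cd_data_to_raw() runs). A small sketch, with
SketchCdRead and sketch_cd_read_params as hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SECTOR_SIZE 512   /* stands in for BDRV_SECTOR_SIZE */

typedef struct {
    int64_t sector_num;   /* start, in 512-byte sectors */
    int nb_sectors;       /* length, in 512-byte sectors */
    size_t buf_offset;    /* where the data starts in the buffer */
    size_t len;           /* bytes transferred */
} SketchCdRead;

/* Returns 0 on success, -1 for an unsupported CD sector size. */
static int sketch_cd_read_params(int lba, int cd_sector_size,
                                 SketchCdRead *out)
{
    if (cd_sector_size != 2048 && cd_sector_size != 2352) {
        return -1;
    }
    out->sector_num = (int64_t)lba << 2;   /* 2048 / 512 == 4 */
    out->nb_sectors = 4;
    out->len = 4 * SKETCH_SECTOR_SIZE;
    /* Raw sectors keep 16 bytes free in front of the user data. */
    out->buf_offset = (cd_sector_size == 2352) ? 16 : 0;
    return 0;
}
```

The cast to int64_t before the shift matters for large LBAs, since the
shift would otherwise be done in int.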




[Qemu-block] [PATCH 2/4] ide/atapi: blk_aio_readv may return NULL

2015-10-12 Thread Peter Lieven
Signed-off-by: Peter Lieven 
---
 hw/ide/atapi.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index 2271ea2..e0cf066 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -429,6 +429,10 @@ static void ide_atapi_cmd_read_dma_cb(void *opaque, int ret)
 s->bus->dma->aiocb = blk_aio_readv(s->blk, (int64_t)s->lba << 2,
&s->bus->dma->qiov, n * 4,
ide_atapi_cmd_read_dma_cb, s);
+if (s->bus->dma->aiocb == NULL) {
+ide_atapi_io_error(s, -EIO);
+goto eot;
+}
 return;
 
 eot:
-- 
1.9.1




[Qemu-block] [PATCH 3/4] ide: add support for cancelable read requests

2015-10-12 Thread Peter Lieven
This patch adds a new aio readv compatible function which copies
all data through a bounce buffer. The benefit is that these requests
can be flagged as canceled to avoid guest memory corruption when
a canceled request is completed by the backend at a later stage.

If an IDE protocol wants to use this function it has to pipe
all read requests through ide_readv_cancelable and it may then
enable requests_cancelable in the IDEState.

If this flag is enabled, we can avoid the blocking blk_drain_all
in case of a BMDMA reset.

Currently only read operations are cancelable thus we can only
use this logic for read-only devices.

Signed-off-by: Peter Lieven 
---
 hw/ide/core.c | 54 ++
 hw/ide/internal.h | 16 
 hw/ide/pci.c  | 42 --
 3 files changed, 98 insertions(+), 14 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 317406d..24547ce 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -561,6 +561,59 @@ static bool ide_sect_range_ok(IDEState *s,
 return true;
 }
 
+static void ide_readv_cancelable_cb(void *opaque, int ret)
+{
+IDECancelableRequest *req = opaque;
+if (!req->canceled) {
+if (!ret) {
+qemu_iovec_from_buf(req->org_qiov, 0, req->buf, req->org_qiov->size);
+}
+req->org_cb(req->org_opaque, ret);
+}
+QLIST_REMOVE(req, list);
+qemu_vfree(req->buf);
+qemu_iovec_destroy(&req->qiov);
+g_free(req);
+}
+
+#define MAX_CANCELABLE_REQS 16
+
+BlockAIOCB *ide_readv_cancelable(IDEState *s, int64_t sector_num,
+ QEMUIOVector *iov, int nb_sectors,
+ BlockCompletionFunc *cb, void *opaque)
+{
+BlockAIOCB *aioreq;
+IDECancelableRequest *req;
+int c = 0;
+
+QLIST_FOREACH(req, &s->cancelable_requests, list) {
+c++;
+}
+if (c > MAX_CANCELABLE_REQS) {
+return NULL;
+}
+
+req = g_new0(IDECancelableRequest, 1);
+qemu_iovec_init(&req->qiov, 1);
+req->buf = qemu_blockalign(blk_bs(s->blk), iov->size);
+qemu_iovec_add(&req->qiov, req->buf, iov->size);
+req->org_qiov = iov;
+req->org_cb = cb;
+req->org_opaque = opaque;
+
+aioreq = blk_aio_readv(s->blk, sector_num, &req->qiov, nb_sectors,
+   ide_readv_cancelable_cb, req);
+if (aioreq == NULL) {
+qemu_vfree(req->buf);
+qemu_iovec_destroy(&req->qiov);
+g_free(req);
+} else {
+QLIST_INSERT_HEAD(&s->cancelable_requests, req, list);
+}
+
+return aioreq;
+}
+
 static void ide_sector_read(IDEState *s);
 
 static void ide_sector_read_cb(void *opaque, int ret)
@@ -805,6 +858,7 @@ void ide_start_dma(IDEState *s, BlockCompletionFunc *cb)
 s->bus->retry_unit = s->unit;
 s->bus->retry_sector_num = ide_get_sector(s);
 s->bus->retry_nsector = s->nsector;
+s->bus->s = s;
 if (s->bus->dma->ops->start_dma) {
 s->bus->dma->ops->start_dma(s->bus->dma, s, cb);
 }
diff --git a/hw/ide/internal.h b/hw/ide/internal.h
index 05e93ff..ad188c2 100644
--- a/hw/ide/internal.h
+++ b/hw/ide/internal.h
@@ -343,6 +343,16 @@ enum ide_dma_cmd {
 #define ide_cmd_is_read(s) \
((s)->dma_cmd == IDE_DMA_READ)
 
+typedef struct IDECancelableRequest {
+QLIST_ENTRY(IDECancelableRequest) list;
+QEMUIOVector qiov;
+uint8_t *buf;
+QEMUIOVector *org_qiov;
+BlockCompletionFunc *org_cb;
+void *org_opaque;
+bool canceled;
+} IDECancelableRequest;
+
 /* NOTE: IDEState represents in fact one drive */
 struct IDEState {
 IDEBus *bus;
@@ -396,6 +406,8 @@ struct IDEState {
 BlockAIOCB *pio_aiocb;
 struct iovec iov;
 QEMUIOVector qiov;
+QLIST_HEAD(, IDECancelableRequest) cancelable_requests;
+bool requests_cancelable;
 /* ATA DMA state */
 int32_t io_buffer_offset;
 int32_t io_buffer_size;
@@ -468,6 +480,7 @@ struct IDEBus {
 uint8_t retry_unit;
 int64_t retry_sector_num;
 uint32_t retry_nsector;
+IDEState *s;
 };
 
 #define TYPE_IDE_DEVICE "ide-device"
@@ -572,6 +585,9 @@ void ide_set_inactive(IDEState *s, bool more);
 BlockAIOCB *ide_issue_trim(BlockBackend *blk,
 int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
 BlockCompletionFunc *cb, void *opaque);
+BlockAIOCB *ide_readv_cancelable(IDEState *s, int64_t sector_num,
+ QEMUIOVector *iov, int nb_sectors,
+ BlockCompletionFunc *cb, void *opaque);
 
 /* hw/ide/atapi.c */
 void ide_atapi_cmd(IDEState *s);
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index d31ff88..5587183 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -240,21 +240,35 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val)
 /* Ignore writes to SSBM if it keeps the old value */
 if ((val & BM_CMD_START) != (bm->cmd & BM_CMD_START)) {
 if (!(val & BM_CMD_START)) {
-/*
- * We can't cancel Sc

[Qemu-block] [PATCH V2 0/4] ide: avoid main-loop hang on CDROM/NFS failure

2015-10-12 Thread Peter Lieven
This series aims at avoiding a hanging main loop if a vserver has a
CDROM image mounted from an NFS share and that NFS share goes down.
A typical situation is that users mount a CDROM ISO to install something
and then forget to eject that CDROM afterwards.
As a consequence this mounted CD is able to bring down the
whole vserver if the backend NFS share is unreachable. This is bad
especially if the CDROM itself is not needed anymore at this point.

This series aims at fixing 2 blocking I/O operations that would
hang if the NFS server is unavailable:
 - ATAPI PIO read requests used sync calls to blk_read, convert
   them to an async variant where possible.
 - If a busmaster DMA request is cancelled all requests are drained.
   Convert the drain to an async request canceling.

v1->v2: - fix offset for 2352 byte sector size [Kevin]
- use a sync request if we continue an elementary transfer.
  As John pointed out, we would otherwise enter a race condition
  between the next IDE command and the async transfer. This is still
  not optimal, but it fixes the NFS down problems for all cases where
  the NFS server goes down while there is no PIO CD activity.
  Of course, it could still happen during a PIO transfer, but I
  expect this to be the unlikelier case.
  I spent some effort trying to read more sectors at once and
  avoiding continuation of elementary transfers, but whatever I
  came up with broke migration between different
  QEMU versions. I have a quite hackish patch that works and
  should survive migration, but I am not happy with it. So I
  would like to start with this version as it is a big improvement
  already.
- Dropped Patch 5 because it is upstream meanwhile.

Peter Lieven (4):
  ide/atapi: make PIO read requests async
  ide/atapi: blk_aio_readv may return NULL
  ide: add support for cancelable read requests
  ide/atapi: enable cancelable requests

 hw/ide/atapi.c| 99 +--
 hw/ide/core.c | 55 +++
 hw/ide/internal.h | 16 +
 hw/ide/pci.c  | 42 +++
 4 files changed, 188 insertions(+), 24 deletions(-)

-- 
1.9.1




[Qemu-block] [PATCH 4/4] ide/atapi: enable cancelable requests

2015-10-12 Thread Peter Lieven
Signed-off-by: Peter Lieven 
---
 hw/ide/atapi.c | 4 ++--
 hw/ide/core.c  | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index e0cf066..8d38b1d 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -187,7 +187,7 @@ static int cd_read_sector(IDEState *s, int lba, void *buf)
 printf("cd_read_sector: lba=%d\n", lba);
 #endif
 
-if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
+if (ide_readv_cancelable(s, (int64_t)lba << 2, &s->qiov, 4,
   cd_read_sector_cb, s) == NULL) {
 return -EIO;
 }
@@ -426,7 +426,7 @@ static void ide_atapi_cmd_read_dma_cb(void *opaque, int ret)
 s->bus->dma->iov.iov_len = n * 4 * 512;
 qemu_iovec_init_external(&s->bus->dma->qiov, &s->bus->dma->iov, 1);
 
-s->bus->dma->aiocb = blk_aio_readv(s->blk, (int64_t)s->lba << 2,
+s->bus->dma->aiocb = ide_readv_cancelable(s, (int64_t)s->lba << 2,
&s->bus->dma->qiov, n * 4,
ide_atapi_cmd_read_dma_cb, s);
 if (s->bus->dma->aiocb == NULL) {
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 24547ce..5c7a346 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2330,6 +2330,7 @@ int ide_init_drive(IDEState *s, BlockBackend *blk, IDEDriveKind kind,
 if (kind == IDE_CD) {
 blk_set_dev_ops(blk, &ide_cd_block_ops, s);
 blk_set_guest_block_size(blk, 2048);
+s->requests_cancelable = true;
 } else {
 if (!blk_is_inserted(s->blk)) {
 error_report("Device needs media, but drive is empty");
-- 
1.9.1
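
The bounce-buffer idea behind the cancelable read requests in this series
can be reduced to a few lines. This is a sketch under simplifying
assumptions: a flat byte buffer instead of a QEMUIOVector, malloc instead
of qemu_blockalign, and no request list. SketchRequest and the sketch_*
names are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef void SketchCompletion(void *opaque, int ret);

typedef struct {
    unsigned char *bounce;   /* private buffer the backend writes into */
    unsigned char *guest;    /* destination visible to the guest */
    size_t len;
    SketchCompletion *cb;
    void *opaque;
    bool canceled;           /* set by the device on reset or abort */
} SketchRequest;

/* Example completion callback: counts how often it was invoked. */
static void sketch_count_cb(void *opaque, int ret)
{
    (void)ret;
    (*(int *)opaque)++;
}

/* Returns NULL if allocation fails. */
static SketchRequest *sketch_request_new(unsigned char *guest, size_t len,
                                         SketchCompletion *cb, void *opaque)
{
    SketchRequest *req = calloc(1, sizeof(*req));
    if (!req) {
        return NULL;
    }
    req->bounce = malloc(len);
    if (!req->bounce) {
        free(req);
        return NULL;
    }
    req->guest = guest;
    req->len = len;
    req->cb = cb;
    req->opaque = opaque;
    return req;
}

/* Called when the backend finishes, possibly long after a cancel.
 * Guest memory is touched only if the request is still wanted. */
static void sketch_request_complete(SketchRequest *req, int ret)
{
    if (!req->canceled) {
        if (ret == 0) {
            memcpy(req->guest, req->bounce, req->len);
        }
        req->cb(req->opaque, ret);
    }
    free(req->bounce);
    free(req);
}
```

The cost is one extra copy per read; in exchange, a late completion from
an unresponsive backend can no longer overwrite memory the guest has
already reused.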




Re: [Qemu-block] [PATCH v3 07/16] block: Convert bs->backing_hd to BdrvChild

2015-10-12 Thread Alberto Garcia
On Fri 09 Oct 2015 02:15:32 PM CEST, Kevin Wolf wrote:
> This is the final step in converting all of the BlockDriverState
> pointers that block drivers use to BdrvChild.
>
> After this patch, bs->children contains the full list of child nodes
> that are referenced by a given BDS, and these children are only
> referenced through BdrvChild, so that updating the pointer in there is
> enough for changing edges in the graph.
>
> Signed-off-by: Kevin Wolf 

Reviewed-by: Alberto Garcia 

Berto



Re: [Qemu-block] [PATCH v3 08/16] block: Manage backing file references in bdrv_set_backing_hd()

2015-10-12 Thread Alberto Garcia
On Fri 09 Oct 2015 02:15:33 PM CEST, Kevin Wolf wrote:
> This simplifies the code somewhat, especially when dropping whole
> backing file subchains.
>
> The exception is the mirroring code that does adventurous things with
> bdrv_swap() and in order to keep it working, I had to duplicate most of
> bdrv_set_backing_hd() locally. We'll get rid again of this ugliness
> shortly.
>
> Signed-off-by: Kevin Wolf 

Reviewed-by: Alberto Garcia 

Berto



Re: [Qemu-block] [PATCH v10 02/10] Backup: clear all bitmap when doing block checkpoint

2015-10-12 Thread Stefan Hajnoczi
On Fri, Sep 25, 2015 at 02:17:30PM +0800, Wen Congyang wrote:
> Signed-off-by: Wen Congyang 
> Signed-off-by: zhanghailiang 
> Signed-off-by: Gonglei 
> Reviewed-by: Jeff Cody 
> ---
>  block/backup.c   | 14 ++
>  blockjob.c   | 11 +++
>  include/block/blockjob.h | 12 
>  3 files changed, 37 insertions(+)
> 
> diff --git a/block/backup.c b/block/backup.c
> index c61e4c3..5e5995e 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -214,11 +214,25 @@ static void backup_iostatus_reset(BlockJob *job)
>  }
>  }
>  
> +static void backup_do_checkpoint(BlockJob *job, Error **errp)
> +{
> +BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
> +
> +if (backup_job->sync_mode != MIRROR_SYNC_MODE_NONE) {
> +error_setg(errp, "The backup job only supports block checkpoint in"
> +   " sync=none mode");
> +return;
> +}
> +
> +hbitmap_reset_all(backup_job->bitmap);
> +}

Is this a fast way to stop and then start a new backup blockjob without
emitting block job lifecycle events?

Not sure the blockjob_do_checkpoint() interface is appropriate.  Is
there any other block job type that will implement .do_checkpoint()?

COLO block replication could call a public backup_do_checkpoint()
function.  That way the direct coupling between COLO and the backup
block job is obvious.  I'm not convinced a generic interface like
blockjob_do_checkpoint() makes sense since it's really not a generic
operation that makes sense for other block job types.

void backup_do_checkpoint(BlockJob *job, Error **errp)
{
BackupBlockJob *s;

if (job->driver != backup_job_driver) {
error_setg(errp, "expected backup block job type for "
   "checkpoint, got %d", job->driver->job_type);
return;
}

s = container_of(job, BackupBlockJob, common);
...
}

Please also make the function name and documentation more specific.
Instead of "do" maybe this should be "pre" or "post" to indicate whether
this happens before or after the checkpoint commit.  What happens if
this function returns an error?
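
The driver check in the sketch above can be tried out in isolation. What follows is a minimal, self-contained model, not QEMU code: the struct layouts, the dirty_bits field and the int return value are assumptions made for illustration, and only the names BlockJob, BackupBlockJob, backup_job_driver and backup_do_checkpoint() come from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Toy stand-ins; the real QEMU types carry far more state. */
typedef struct BlockJobDriver {
    const char *name;
} BlockJobDriver;

typedef struct BlockJob {
    const BlockJobDriver *driver;
} BlockJob;

typedef struct BackupBlockJob {
    BlockJob common;   /* embedded base object */
    int dirty_bits;    /* hypothetical stand-in for the job's bitmap */
} BackupBlockJob;

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static const BlockJobDriver backup_job_driver = { "backup" };

/* Returns 0 on success, -1 if the job is not a backup job. */
static int backup_do_checkpoint(BlockJob *job)
{
    BackupBlockJob *s;

    /* Refuse to downcast a job that belongs to another driver. */
    if (job->driver != &backup_job_driver) {
        fprintf(stderr, "expected backup block job, got %s\n",
                job->driver->name);
        return -1;
    }

    s = container_of(job, BackupBlockJob, common);
    s->dirty_bits = 0; /* models clearing the whole bitmap */
    return 0;
}
```

Because the check happens before container_of(), a caller that passes the wrong job type gets an error instead of a pointer into unrelated memory.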



Re: [Qemu-block] [PATCH v3 09/16] block: Split bdrv_move_feature_fields()

2015-10-12 Thread Alberto Garcia
On Fri 09 Oct 2015 02:15:34 PM CEST, Kevin Wolf wrote:
> After bdrv_swap(), some fields must be moved back to their original BDS
> to compensate for the effects that a swap of the contents of the objects
> has while keeping the old addresses. Other fields must be moved back
> because they should logically be moved and must stay on top
>
> When replacing bdrv_swap() with operations changing the pointers in the
> parents, we only need the latter and must avoid swapping the former.
> Split the function accordingly.
>
> Signed-off-by: Kevin Wolf 
> Reviewed-by: Max Reitz 
> Reviewed-by: Fam Zheng 

Reviewed-by: Alberto Garcia 

Berto



Re: [Qemu-block] [PATCH v2 05/12] block: Introduce "drained begin/end" API

2015-10-12 Thread Kevin Wolf
Am 12.10.2015 um 13:50 hat Fam Zheng geschrieben:
> The semantics is that after bdrv_drained_begin(bs), bs will not get new
> external requests until the matching bdrv_drained_end(bs).
> 
> Signed-off-by: Fam Zheng 
> ---
>  block.c   |  2 ++
>  block/io.c| 18 ++
>  include/block/block.h | 19 +++
>  include/block/block_int.h |  2 ++
>  4 files changed, 41 insertions(+)
> 
> diff --git a/block.c b/block.c
> index 1f90b47..9b28a07 100644
> --- a/block.c
> +++ b/block.c
> @@ -2058,6 +2058,8 @@ static void bdrv_move_feature_fields(BlockDriverState *bs_dest,
>  bs_dest->device_list = bs_src->device_list;
>  bs_dest->blk = bs_src->blk;
>  
> +bs_dest->quiesce_counter = bs_src->quiesce_counter;
> +
>  memcpy(bs_dest->op_blockers, bs_src->op_blockers,
> sizeof(bs_dest->op_blockers));
>  }

This feels wrong. As I understand it, bdrv_drained_begin/end works on
specific nodes and not on trees. Including the field in
bdrv_move_feature_fields() means that it moves to the top of the tree
(i.e. it stays in the same C object, which however belongs to a
different logical node now).

What I could imagine is that you did this so you can use
bdrv_drained_end() on the same BDS as you called bdrv_drained_begin() on.
However, that's not the interface of bdrv_swap(), which really means
that the BDSes are swapped. So with this hunk you just end up having a
bug that cancels out the weirdness of the bdrv_swap() interface.

If you rebase on my bdrv_swap() removal series, things become a bit more
obvious. If you don't, you should drop this hunk and change some
bdrv_drained_end() calls, e.g. in the next patch, you'd have to call
bdrv_drained_begin(state->old_bs), but bdrv_drained_end(state->new_bs).
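
A toy model may make the begin/end pairing easier to see. It is only a sketch under the assumption that a swap exchanges the entire contents of two objects while their addresses stay put; the Node type and the helper names are hypothetical and stand in for BlockDriverState and the drained API.

```c
#include <assert.h>

typedef struct Node {
    const char *name;      /* identifies the logical node */
    int quiesce_counter;   /* travels with the contents in this model */
} Node;

static void drained_begin(Node *n)
{
    n->quiesce_counter++;
}

static void drained_end(Node *n)
{
    assert(n->quiesce_counter > 0);
    n->quiesce_counter--;
}

/* Models a swap without the hunk under review: all contents,
 * including the counter, trade places; the addresses do not. */
static void swap_contents(Node *a, Node *b)
{
    Node tmp = *a;
    *a = *b;
    *b = tmp;
}
```

In this model, a counter raised through the old_bs pointer sits in the new_bs object after the swap, so the matching drained_end() has to go through new_bs.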

The rest of this patch looks good.

Kevin



Re: [Qemu-block] [PATCH v2 10/12] block: Introduce BlockDriver.bdrv_drain callback

2015-10-12 Thread Kevin Wolf
Am 12.10.2015 um 13:50 hat Fam Zheng geschrieben:
> Drivers can have internal request sources that generate IO, like the
> need_check_timer in QED. Since we want quiesced periods that contain
> nested event loops in block layer, we need to have a way to disable such
> event sources.
> 
> Block drivers must implement the "bdrv_drain" callback if it has any
> internal sources that can generate I/O activity, like a timer or a
> worker thread (even in a library) that can schedule QEMUBH in an
> asynchronous callback.
> 
> Signed-off-by: Fam Zheng 

I think the right interface would be .bdrv_drain_begin/end callbacks so
that the timers or background work can be reenabled again after the
drained section.
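
As a rough illustration of that shape, here is a self-contained sketch. The Driver and Node types, the timer_armed flag and the toy_* callbacks are all hypothetical; this is not the interface of any posted patch, only one way a begin/end pair could bracket a nested drained section.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct Node Node;

/* Hypothetical driver vtable with paired callbacks. */
typedef struct Driver {
    void (*drain_begin)(Node *n);
    void (*drain_end)(Node *n);
} Driver;

struct Node {
    const Driver *drv;
    int quiesce_counter;
    bool timer_armed;   /* stands in for an internal request source */
};

static void toy_drain_begin(Node *n) { n->timer_armed = false; }
static void toy_drain_end(Node *n)   { n->timer_armed = true; }

static const Driver toy_driver = { toy_drain_begin, toy_drain_end };

static void drained_begin(Node *n)
{
    /* Only the outermost begin quiesces the driver. */
    if (n->quiesce_counter++ == 0 && n->drv->drain_begin) {
        n->drv->drain_begin(n);
    }
}

static void drained_end(Node *n)
{
    assert(n->quiesce_counter > 0);
    /* Only the outermost end lets the driver resume. */
    if (--n->quiesce_counter == 0 && n->drv->drain_end) {
        n->drv->drain_end(n);
    }
}
```

The internal source stays disabled across nested sections and is reenabled only when the last section ends.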

As it happens, QED doesn't need this because you chose to complete the
outstanding work and the timer only needs to be reenabled on the next
write operation. Fine with me, we can extend the interface as soon as we
really need it.

(Though, actually, I'm not sure... I think I'll comment on the QED
patch.)

Kevin



Re: [Qemu-block] [PATCH v2 11/12] qed: Implement .bdrv_drain

2015-10-12 Thread Kevin Wolf
Am 12.10.2015 um 13:50 hat Fam Zheng geschrieben:
> The "need_check_timer" is used to clear the "NEED_CHECK" flag in the
> image header after a grace period once metadata update has finished. In
> compliance to the bdrv_drain semantics we should make sure it remains
> deleted once .bdrv_drain is called.
> 
> Call the qed_need_check_timer_cb manually to update the header
> immediately.
> 
> Signed-off-by: Fam Zheng 

What happens if a new allocating write request is issued during the
drained section and the timer gets reenabled?

Kevin

>  block/qed.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/block/qed.c b/block/qed.c
> index a7ff1d9..23bd273 100644
> --- a/block/qed.c
> +++ b/block/qed.c
> @@ -381,6 +381,12 @@ static void bdrv_qed_attach_aio_context(BlockDriverState *bs,
>  }
>  }
>  
> +static void bdrv_qed_drain(BlockDriverState *bs)
> +{
> +qed_cancel_need_check_timer(bs->opaque);
> +qed_need_check_timer_cb(bs->opaque);
> +}
> +
>  static int bdrv_qed_open(BlockDriverState *bs, QDict *options, int flags,
>   Error **errp)
>  {
> @@ -1683,6 +1689,7 @@ static BlockDriver bdrv_qed = {
>  .bdrv_check   = bdrv_qed_check,
>  .bdrv_detach_aio_context  = bdrv_qed_detach_aio_context,
>  .bdrv_attach_aio_context  = bdrv_qed_attach_aio_context,
> +.bdrv_drain   = bdrv_qed_drain,
>  };
>  
>  static void bdrv_qed_init(void)
> -- 
> 2.6.1
> 



Re: [Qemu-block] [PATCH v3 13/16] block: Implement bdrv_append() without bdrv_swap()

2015-10-12 Thread Alberto Garcia
On Fri 09 Oct 2015 02:15:38 PM CEST, Kevin Wolf wrote:
> +static void change_parent_backing_link(BlockDriverState *from,
> +   BlockDriverState *to)
> +{
> +BdrvChild *c, *next;
> +
> +QLIST_FOREACH_SAFE(c, &from->parents, next_parent, next) {
> +assert(c->role != &child_backing);
> +c->bs = to;
> +QLIST_REMOVE(c, next_parent);
> +QLIST_INSERT_HEAD(&to->parents, c, next_parent);
> +bdrv_ref(to);
> +bdrv_unref(from);
> +}
> +if (from->blk) {
> +blk_set_bs(from->blk, to);
> +if (!to->device_list.tqe_prev) {
> +QTAILQ_INSERT_BEFORE(from, to, device_list);
> +}

Is it even possible that this last condition is false? In what case
would 'to' be already in bdrv_states?

I understand that it would mean that it would already be attached to a
BlockBackend, but that's not possible in this case.
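
For readers unfamiliar with the idiom, the tqe_prev test can be reproduced with the BSD <sys/queue.h> macros. This is a sketch, not QEMU's QTAILQ code: struct node and node_is_listed() are made up, and it assumes the convention that an element which is not on the list has a NULL tqe_prev (true for a zero-initialized element; the plain BSD macros do not reset it on removal).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

struct node {
    const char *name;
    TAILQ_ENTRY(node) device_list;   /* list linkage */
};

TAILQ_HEAD(node_list, node);

/* An element that was never inserted still has tqe_prev == NULL. */
static bool node_is_listed(const struct node *n)
{
    return n->device_list.tqe_prev != NULL;
}
```

With this helper, the hunk corresponds to inserting 'to' in front of 'from' only when node_is_listed(to) is false.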

Berto



Re: [Qemu-block] [PATCH v2 05/12] block: Introduce "drained begin/end" API

2015-10-12 Thread Paolo Bonzini


On 12/10/2015 13:50, Fam Zheng wrote:
> +void bdrv_drained_begin(BlockDriverState *bs)
> +{
> +if (bs->quiesce_counter++) {
> +return;
> +}
> +aio_disable_external(bdrv_get_aio_context(bs));
> +bdrv_drain(bs);
> +}

I think bdrv_drain should be called unconditionally, i.e. before the
"if".  This should also solve Kevin's doubt about new allocating write
request reenabling the timer: any write request from the drained section
happens normally, until you get a nested drain request and then the
callback completes the requests.
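
A small model of that ordering, for illustration only: the Node type, the in_flight counter and the external_disabled flag are hypothetical stand-ins, and drain() merely pretends to complete outstanding work.

```c
#include <assert.h>

typedef struct Node {
    int quiesce_counter;
    int in_flight;           /* requests not yet completed */
    int external_disabled;   /* models aio_disable_external() */
} Node;

/* Pretend to wait until every outstanding request has completed. */
static void drain(Node *n)
{
    n->in_flight = 0;
}

static void drained_begin(Node *n)
{
    drain(n);                     /* runs on every call, nested or not */
    if (n->quiesce_counter++) {
        return;                   /* nested section: already quiesced */
    }
    n->external_disabled = 1;
}
```

A request issued from inside a drained section is then flushed by the next nested drained_begin() instead of lingering until the section ends.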

Paolo



Re: [Qemu-block] [PATCH v10 08/10] Implement new driver for block replication

2015-10-12 Thread Stefan Hajnoczi
On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote:
> +static void backup_job_completed(void *opaque, int ret)
> +{
> +BDRVReplicationState *s = opaque;
> +
> +if (s->replication_state != BLOCK_REPLICATION_DONE) {
> +/* The backup job is cancelled unexpectedly */
> +s->error = -EIO;
> +}
> +
> +bdrv_op_block(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
> +  s->active_disk->backing_blocker);
> +bdrv_op_block(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE,
> +  s->hidden_disk->backing_blocker);
> +
> +bdrv_put_ref_bh_schedule(s->secondary_disk);

Why is bdrv_put_ref_bh_schedule() necessary?



Re: [Qemu-block] [PATCH v10 08/10] Implement new driver for block replication

2015-10-12 Thread Stefan Hajnoczi
On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote:
> +/* start backup job now */
> +bdrv_op_unblock(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
> +s->active_disk->backing_blocker);
> +bdrv_op_unblock(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE,
> +s->hidden_disk->backing_blocker);

Why is it safe to unblock these operations?

Why do they have to be blocked for non-replication users?

Stefan



Re: [Qemu-block] [PATCH v5 3/4] qmp: add monitor command to add/remove a child

2015-10-12 Thread Max Reitz
On 07.10.2015 21:42, Max Reitz wrote:
> On 22.09.2015 09:44, Wen Congyang wrote:
>> The new QMP command name is x-blockdev-child-add, and x-blockdev-child-del.
>> It justs for adding/removing quorum's child now, and don't support all
>> kinds of children,
> 
> It does support all kinds of children for quorum, doesn't it?
> 
>>nor all block drivers. So it is experimental now.
> 
> Well, that is not really a reason why we would have to make it
> experimental. For instance, blockdev-add (although some might argue it
> actually is experimental...) doesn't support all block drivers either.

OK, after a rather long discussion, my opinion has changed. Adding them
as experimental interfaces is good (although the reason noted here is
not exactly what I feel is the reason that came out in the discussion).

Thanks to everyone who argued with me! I took a good chunk of your time,
and I'll have you know that I'm grateful for it.

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v10 08/10] Implement new driver for block replication

2015-10-12 Thread Stefan Hajnoczi
On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote:
> +static void replication_start(BlockDriverState *bs, ReplicationMode mode,
> +  Error **errp)
> +{
> +BDRVReplicationState *s = bs->opaque;
> +int64_t active_length, hidden_length, disk_length;
> +AioContext *aio_context;
> +Error *local_err = NULL;
> +
> +if (s->replication_state != BLOCK_REPLICATION_NONE) {
> +error_setg(errp, "Block replication is running or done");
> +return;
> +}
> +
> +if (s->mode != mode) {
> +error_setg(errp, "The parameter mode's value is invalid, needs %d,"
> +   " but receives %d", s->mode, mode);
> +return;
> +}
> +
> +switch (s->mode) {
> +case REPLICATION_MODE_PRIMARY:
> +break;
> +case REPLICATION_MODE_SECONDARY:
> +s->active_disk = bs->file;
> +if (!bs->file->backing_hd) {
> +error_setg(errp, "Active disk doesn't have backing file");
> +return;
> +}
> +
> +s->hidden_disk = s->active_disk->backing_hd;
> +if (!s->hidden_disk->backing_hd) {
> +error_setg(errp, "Hidden disk doesn't have backing file");
> +return;
> +}
> +
> +s->secondary_disk = s->hidden_disk->backing_hd;
> +if (!s->secondary_disk->blk) {
> +error_setg(errp, "The secondary disk doesn't have block backend");
> +return;
> +}
...
> +aio_context = bdrv_get_aio_context(bs);
> +aio_context_acquire(aio_context);
> +bdrv_set_aio_context(s->secondary_disk, aio_context);

Why is this bdrv_set_aio_context() call necessary?

Child BDS nodes are in the same AioContext as their parents.  Other
block jobs need something like this because they operate on a second BDS
which is not bs' backing file chain.  I think you have a different
situation here so it's not needed.



Re: [Qemu-block] [PATCH v3 07/16] block: Convert bs->backing_hd to BdrvChild

2015-10-12 Thread Max Reitz
On 09.10.2015 14:15, Kevin Wolf wrote:
> This is the final step in converting all of the BlockDriverState
> pointers that block drivers use to BdrvChild.
> 
> After this patch, bs->children contains the full list of child nodes
> that are referenced by a given BDS, and these children are only
> referenced through BdrvChild, so that updating the pointer in there is
> enough for changing edges in the graph.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block.c   | 105 +++---
>  block/io.c|  24 +--
>  block/mirror.c|   6 +--
>  block/qapi.c  |   8 ++--
>  block/qcow.c  |   4 +-
>  block/qcow2-cluster.c |   4 +-
>  block/qcow2.c |   6 +--
>  block/qed.c   |  12 +++---
>  block/stream.c|   8 ++--
>  block/vmdk.c  |  21 +-
>  block/vvfat.c |   6 +--
>  blockdev.c|   4 +-
>  include/block/block_int.h |  12 --
>  qemu-img.c|   4 +-
>  14 files changed, 115 insertions(+), 109 deletions(-)

Reviewed-by: Max Reitz 





Re: [Qemu-block] [Qemu-devel] [RFC] transactions: add transaction-wide property

2015-10-12 Thread John Snow
Ping -- any consensus on how we should implement the "do-or-die"
argument for transactions that start block jobs? :)

This patch may look a little hokey in how it boxes arguments, but I can
re-do it on top of Eric Blake's very official way of boxing arguments,
when the QAPI dust settles.

--js

On 09/24/2015 05:40 PM, John Snow wrote:
> This replaces the per-action property as in Fam's series.
> Instead, we have a transaction-wide property that is shared
> with each action.
> 
> At the action level, if a property supplied transaction-wide
> is disagreeable, we return error and the transaction is aborted.
> 
> RFC:
> 
> Where this makes sense: Any transactional actions that aren't
> prepared to accept this new paradigm of transactional behavior
> can error_setg and return.
> 
> Where this may not make sense: consider the transactions which
> do not use BlockJobs to perform their actions, i.e. they are
> synchronous during the transactional phase. Because they either
> fail or succeed so early, we might say that of course they can
> support this property ...
> 
> ...however, consider the case where we create a new bitmap and
> perform a full backup, using allow_partial=false. If the backup
> fails, we might well expect the bitmap to be deleted because a
> member of the transaction ultimately/eventually failed. However,
> the bitmap creation was not undone because it does not have a
> pending/delayed abort/commit action -- those are only for jobs
> in this implementation.
> 
> How do we fix this?
> 
> (1) We just say "No, you can't use the new block job transaction
> completion mechanic in conjunction with these commands,"
> 
> (2) We make Bitmap creation/resetting small, synchronous blockjobs
> that can join the BlockJobTxn
> 
> Signed-off-by: John Snow 
> ---
>  blockdev.c | 87 --
>  blockjob.c |  2 +-
>  qapi-schema.json   | 15 +++--
>  qapi/block-core.json   | 26 ---
>  qmp-commands.hx|  2 +-
>  tests/qemu-iotests/124 | 12 +++
>  6 files changed, 83 insertions(+), 61 deletions(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 45a9fe7..02b1a83 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -1061,7 +1061,7 @@ static void blockdev_do_action(int kind, void *data, Error **errp)
>  action.data = data;
>  list.value = &action;
>  list.next = NULL;
> -qmp_transaction(&list, errp);
> +qmp_transaction(&list, false, NULL, errp);
>  }
>  
>  void qmp_blockdev_snapshot_sync(bool has_device, const char *device,
> @@ -1286,6 +1286,7 @@ struct BlkActionState {
>  TransactionAction *action;
>  const BlkActionOps *ops;
>  BlockJobTxn *block_job_txn;
> +TransactionProperties *txn_props;
>  QSIMPLEQ_ENTRY(BlkActionState) entry;
>  };
>  
> @@ -1322,6 +1323,12 @@ static void internal_snapshot_prepare(BlkActionState *common,
>  name = internal->name;
>  
>  /* 2. check for validation */
> +if (!common->txn_props->allow_partial) {
> +error_setg(errp,
> +   "internal_snapshot does not support allow_partial = false");
> +return;
> +}
> +
>  blk = blk_by_name(device);
>  if (!blk) {
>  error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
> @@ -1473,6 +1480,12 @@ static void external_snapshot_prepare(BlkActionState *common,
>  }
>  
>  /* start processing */
> +if (!common->txn_props->allow_partial) {
> +error_setg(errp,
> +   "external_snapshot does not support allow_partial = false");
> +return;
> +}
> +
>  state->old_bs = bdrv_lookup_bs(has_device ? device : NULL,
> has_node_name ? node_name : NULL,
> &local_err);
> @@ -1603,14 +1616,11 @@ static void drive_backup_prepare(BlkActionState *common, Error **errp)
>  DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
>  BlockDriverState *bs;
>  BlockBackend *blk;
> -DriveBackupTxn *backup_txn;
>  DriveBackup *backup;
> -BlockJobTxn *txn = NULL;
>  Error *local_err = NULL;
>  
>  assert(common->action->kind == TRANSACTION_ACTION_KIND_DRIVE_BACKUP);
> -backup_txn = common->action->drive_backup;
> -backup = backup_txn->base;
> +backup = common->action->drive_backup->base;
>  
>  blk = blk_by_name(backup->device);
>  if (!blk) {
> @@ -1624,11 +1634,6 @@ static void drive_backup_prepare(BlkActionState *common, Error **errp)
>  state->aio_context = bdrv_get_aio_context(bs);
>  aio_context_acquire(state->aio_context);
>  
> -if (backup_txn->has_transactional_cancel &&
> -backup_txn->transactional_cancel) {
> -txn = common->block_job_txn;
> -}
> -
>  do_drive_backup(backup->device, backup->target,
>  backup->has_format, backup->format,
>  backup->sync,
> @@ -1637,7 +1642,7 @@ static void dr

Re: [Qemu-block] [PATCH v3 08/16] block: Manage backing file references in bdrv_set_backing_hd()

2015-10-12 Thread Max Reitz
On 09.10.2015 14:15, Kevin Wolf wrote:
> This simplifies the code somewhat, especially when dropping whole
> backing file subchains.
> 
> The exception is the mirroring code that does adventurous things with
> bdrv_swap() and in order to keep it working, I had to duplicate most of
> bdrv_set_backing_hd() locally. We'll get rid again of this ugliness
> shortly.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block.c   | 68 ++-
>  block/mirror.c| 16 +---
>  block/stream.c| 30 +--
>  block/vvfat.c |  6 -
>  include/block/block.h |  1 +
>  5 files changed, 37 insertions(+), 84 deletions(-)

Reviewed-by: Max Reitz 





[Qemu-block] [PATCH v6 00/39] blockdev: BlockBackend and media

2015-10-12 Thread Max Reitz
*** This series is based on v3 of Kevin's  ***
*** "block: Get rid of bdrv_swap()" series ***

This series reworks a lot regarding BlockBackend and media. Basically,
it allows empty BlockBackends, that is BBs without a BDS tree.

Before this series, empty drives are represented by a BlockBackend with
an empty BDS attached to it (a BDS with a NULL driver). However, now we
have BlockBackends, thus an empty drive should be represented by a
BlockBackend without any BDS tree attached to it. This is what this
series does.


Quick and early summary for the v6 changes:
- Rebase on master and Kevin's bdrv_swap() series
- Addressed Kevin's comments for v5


Justification for each of the patches and their order:

-- Preparation before _is_inserted() patches --
 1: Patch 9 will not take care not to break host floppy support, so that
support needs to be removed first.
 2: Needed for patch 3, so that blockdev-added BDSs without a BB still
get the BDRV_O_INCOMING flag set.
 3: Needed for patch 4. Patch 26 is a follow-up after BDS-less BBs are
allowed.
 4: bdrv_close_all() is broken ("block: Rework bdrv_close_all()"). Patch
7 will break iotest 071 (actually, just make the problem apparent).
So this patch is required to work around the issue.
(with "the issue" being that bdrv_close_all() does not unref() the
BDSs it is closing, but just force-closes everything, even if the
BDS may still be in use somewhere)

-- _is_inserted() patches --
 5: General clean-up work, nice to have before patch 7 (and goes in tune
with patch 6).
 6: Using the same BB as a guest device means that the data read from
there should be exactly the same. Opening the guest tray should
therefore result in no data being readable. This is what we then
need this function for.
 7: General clean-up work (in the _is_inserted() area).
 8: General clean-up work (in the _is_inserted() area).
 9: General clean-up work (also regarding _is_inserted()).
10: Required so inserting a floppy will not result in the tray being
reported as closed (you need to "push in" the floppy first, using
blockdev-close-tray). It's here in the "_is_inserted() patches area"
because I feel like that's a very related topic.

-- Support for BDS-less BBs --
11: Preparation for BDS-less BBs
12: Preparation for BDS-less BBs
13: Preparation for BDS-less BBs (BB properties should be in the BB, and
not in the root BDS)
14: Patch 15 removes BlockAcctStats from the BDS, but wr_highest_sector
is BDS-dependent, so it needs to stay here
15: Preparation for BDS-less BBs (BB properties should be in the BB, and
not in the root BDS)
16: Preparation for BDS-less BBs (BB properties should be in the BB, and
not in the root BDS)
17: Needed for patch 18
18: Preparation for BDS-less BBs (Removing a BDS tree should retain some
properties for legacy reasons, which must therefore be stored in the
BB(RS))
19: Preparation for BDS-less BBs
20: Preparation for BDS-less BBs
21: Preparation for BDS-less BBs
22: Ability to add BDS trees to empty BBs ("inserting a medium")
23: Preparation for BDS-less BBs (needs patch 22)
24: One goal of this series, and fixes the "opening tray" event for
empty drives when shutting down qemu
25: Needed for patch 26
26: Completion of what patch 3 began
27: Ability to detach BDS trees from BBs

-- "Atomic" QMP tray operations --
28: blockdev-open-tray
29: blockdev-close-tray
30: blockdev-remove-medium
31: blockdev-insert-medium

-- Reimplementation of change/eject --
32: eject
33: change
34: Clean-up patch

-- New QMP blockdev-change-medium command --
35: New QMP command
36: Use for HMP change command
37: Add flag to that command for changing the read-only access mode
(which was my original intention for this series)
38: Same flag for HMP

-- Tests --
39: iotests are always nice, so here is one


v6:
- Patch 8: Trivial rebase conflict due to Kevin's bdrv_swap() series
- Patch 17: Added, so that the throttle group can be strongly referenced
  by the BBRS
- Patch 18:
  - Keep a strong reference to the throttle group in the BBRS [Kevin];
this is done by keeping the name of the group (which the external
interface of the throttle implementation generally uses for
identification) and a pointer to the ThrottleState. Just storing the
name is not enough, we actually do need the ThrottleState for patch
24.
The ThrottleConfig is dropped since the ThrottleGroup's
configuration should not be overwritten when a new BDS is added.
- Patch 19: blk_{set_,}enable_write_cache() can fall back to the BBRS,
  too [Kevin]
- Patch 21: Added blk_drain(), removed blk_{set_,}enable_write_cache()
  [Kevin]
- Patch 22: blk_insert_bs() is no longer idempotent [Kevin, Eric]
  (i.e., inserting a BDS into its attached BB will now fail an
  assertion)
- Patch 24:
  - Adapted to how the BBRS stores the ThrottleGroup reference now. An
important point is that if a BDS tree is created,
bdrv_set_io_limit

[Qemu-block] [PATCH v6 02/39] block: Set BDRV_O_INCOMING in bdrv_fill_options()

2015-10-12 Thread Max Reitz
This flag should not be set for the root BDS only, but for any BDS that
is being created while incoming migration is pending, so setting it is
moved from blockdev_init() to bdrv_fill_options().

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Kevin Wolf 
Reviewed-by: Alberto Garcia 
---
 block.c| 4 
 blockdev.c | 4 
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index f38146e..0ae3fcf 100644
--- a/block.c
+++ b/block.c
@@ -1076,6 +1076,10 @@ static int bdrv_fill_options(QDict **options, const char **pfilename,
 }
 }
 
+if (runstate_check(RUN_STATE_INMIGRATE)) {
+*flags |= BDRV_O_INCOMING;
+}
+
 return 0;
 }
 
diff --git a/blockdev.c b/blockdev.c
index 6c8cce4..f937526 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -539,10 +539,6 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
 bdrv_flags |= BDRV_O_COPY_ON_READ;
 }
 
-if (runstate_check(RUN_STATE_INMIGRATE)) {
-bdrv_flags |= BDRV_O_INCOMING;
-}
-
 bdrv_flags |= ro ? 0 : BDRV_O_RDWR;
 
 blk = blk_new_open(qemu_opts_id(opts), file, NULL, bs_opts, bdrv_flags,
-- 
2.6.1




[Qemu-block] [PATCH v6 06/39] block: Add blk_is_available()

2015-10-12 Thread Max Reitz
blk_is_available() returns true iff the BDS is inserted (which means
blk_bs() is not NULL and bdrv_is_inserted() returns true) and if the
tray of the guest device is closed.

blk_is_inserted() is changed to return true only if blk_bs() is not
NULL.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Alberto Garcia 
Reviewed-by: Kevin Wolf 
---
 block/block-backend.c  | 7 ++-
 include/sysemu/block-backend.h | 1 +
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 1db002c..74642dc 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -771,7 +771,12 @@ void blk_invalidate_cache(BlockBackend *blk, Error **errp)
 
 bool blk_is_inserted(BlockBackend *blk)
 {
-return bdrv_is_inserted(blk->bs);
+return blk->bs && bdrv_is_inserted(blk->bs);
+}
+
+bool blk_is_available(BlockBackend *blk)
+{
+return blk_is_inserted(blk) && !blk_dev_is_tray_open(blk);
 }
 
 void blk_lock_medium(BlockBackend *blk, bool locked)
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 8f2bf10..1e19d1b 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -131,6 +131,7 @@ int blk_enable_write_cache(BlockBackend *blk);
 void blk_set_enable_write_cache(BlockBackend *blk, bool wce);
 void blk_invalidate_cache(BlockBackend *blk, Error **errp);
 bool blk_is_inserted(BlockBackend *blk);
+bool blk_is_available(BlockBackend *blk);
 void blk_lock_medium(BlockBackend *blk, bool locked);
 void blk_eject(BlockBackend *blk, bool eject_flag);
 int blk_get_flags(BlockBackend *blk);
-- 
2.6.1




[Qemu-block] [PATCH v6 04/39] iotests: Only create BB if necessary

2015-10-12 Thread Max Reitz
Tests 071 and 081 test giving references in blockdev-add. It is not
necessary to create a BlockBackend here, so omit it.

While at it, fix up some blockdev-add invocations in the vicinity
(s/raw/$IMGFMT/ in 081, drop the format BDS for blkverify's raw child in
071).

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Kevin Wolf 
---
 tests/qemu-iotests/071 | 54 ++
 tests/qemu-iotests/071.out | 12 +++
 tests/qemu-iotests/081 | 18 +---
 tests/qemu-iotests/081.out |  5 +++--
 4 files changed, 71 insertions(+), 18 deletions(-)

diff --git a/tests/qemu-iotests/071 b/tests/qemu-iotests/071
index 9eaa49b..92ab991 100755
--- a/tests/qemu-iotests/071
+++ b/tests/qemu-iotests/071
@@ -104,11 +104,20 @@ echo
 echo "=== Testing blkdebug on existing block device ==="
 echo
 
-run_qemu -drive "file=$TEST_IMG,format=raw,if=none,id=drive0" <

[Qemu-block] [PATCH v6 01/39] block: Remove host floppy support

2015-10-12 Thread Max Reitz
It has been deprecated as of 2.3, so we can now remove it.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Kevin Wolf 
---
 block/raw-posix.c| 222 ++-
 qapi/block-core.json |   9 +--
 2 files changed, 9 insertions(+), 222 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index cc1b874..afd1c59 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -127,11 +127,6 @@ do { \
 
 #define FTYPE_FILE   0
 #define FTYPE_CD 1
-#define FTYPE_FD 2
-
-/* if the FD is not accessed during that time (in ns), we try to
-   reopen it to see if the disk has been changed */
-#define FD_OPEN_TIMEOUT (10)
 
 #define MAX_BLOCKSIZE  4096
 
@@ -141,13 +136,6 @@ typedef struct BDRVRawState {
 int open_flags;
 size_t buf_align;
 
-#if defined(__linux__)
-/* linux floppy specific */
-int64_t fd_open_time;
-int64_t fd_error_time;
-int fd_got_error;
-int fd_media_changed;
-#endif
 #ifdef CONFIG_LINUX_AIO
 int use_aio;
 void *aio_ctx;
@@ -626,7 +614,7 @@ static int raw_reopen_prepare(BDRVReopenState *state,
 }
 #endif
 
-if (s->type == FTYPE_FD || s->type == FTYPE_CD) {
+if (s->type == FTYPE_CD) {
 raw_s->open_flags |= O_NONBLOCK;
 }
 
@@ -2178,47 +2166,6 @@ static int hdev_open(BlockDriverState *bs, QDict *options, int flags,
 }
 
 #if defined(__linux__)
-/* Note: we do not have a reliable method to detect if the floppy is
-   present. The current method is to try to open the floppy at every
-   I/O and to keep it opened during a few hundreds of ms. */
-static int fd_open(BlockDriverState *bs)
-{
-BDRVRawState *s = bs->opaque;
-int last_media_present;
-
-if (s->type != FTYPE_FD)
-return 0;
-last_media_present = (s->fd >= 0);
-if (s->fd >= 0 &&
-(qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - s->fd_open_time) >= FD_OPEN_TIMEOUT) {
-qemu_close(s->fd);
-s->fd = -1;
-DPRINTF("Floppy closed\n");
-}
-if (s->fd < 0) {
-if (s->fd_got_error &&
-(qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - s->fd_error_time) < FD_OPEN_TIMEOUT) {
-DPRINTF("No floppy (open delayed)\n");
-return -EIO;
-}
-s->fd = qemu_open(bs->filename, s->open_flags & ~O_NONBLOCK);
-if (s->fd < 0) {
-s->fd_error_time = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
-s->fd_got_error = 1;
-if (last_media_present)
-s->fd_media_changed = 1;
-DPRINTF("No floppy\n");
-return -EIO;
-}
-DPRINTF("Floppy opened\n");
-}
-if (!last_media_present)
-s->fd_media_changed = 1;
-s->fd_open_time = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
-s->fd_got_error = 0;
-return 0;
-}
-
 static int hdev_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
 {
 BDRVRawState *s = bs->opaque;
@@ -2247,8 +2194,8 @@ static BlockAIOCB *hdev_aio_ioctl(BlockDriverState *bs,
 pool = aio_get_thread_pool(bdrv_get_aio_context(bs));
 return thread_pool_submit_aio(pool, aio_worker, acb, cb, opaque);
 }
+#endif /* linux */
 
-#elif defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
 static int fd_open(BlockDriverState *bs)
 {
 BDRVRawState *s = bs->opaque;
@@ -2258,14 +2205,6 @@ static int fd_open(BlockDriverState *bs)
 return 0;
 return -EIO;
 }
-#else /* !linux && !FreeBSD */
-
-static int fd_open(BlockDriverState *bs)
-{
-return 0;
-}
-
-#endif /* !linux && !FreeBSD */
 
 static coroutine_fn BlockAIOCB *hdev_aio_discard(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors,
@@ -2309,14 +2248,13 @@ static int hdev_create(const char *filename, QemuOpts *opts,
 int64_t total_size = 0;
 bool has_prefix;
 
-/* This function is used by all three protocol block drivers and therefore
- * any of these three prefixes may be given.
+/* This function is used by both protocol block drivers and therefore either
+ * of these prefixes may be given.
  * The return value has to be stored somewhere, otherwise this is an error
  * due to -Werror=unused-value. */
 has_prefix =
 strstart(filename, "host_device:", &filename) ||
-strstart(filename, "host_cdrom:" , &filename) ||
-strstart(filename, "host_floppy:", &filename);
+strstart(filename, "host_cdrom:" , &filename);
 
 (void)has_prefix;
 
@@ -2396,155 +2334,6 @@ static BlockDriver bdrv_host_device = {
 #endif
 };
 
-#ifdef __linux__
-static void floppy_parse_filename(const char *filename, QDict *options,
-  Error **errp)
-{
-/* The prefix is optional, just as for "file". */
-strstart(filename, "host_floppy:", &filename);
-
-qdict_put_obj(options, "filename", QOBJECT(qstring_from_str(filename)));
-}
-
-static int floppy_open(BlockDriverState *bs, QDict *options, int flags,
-   Error **errp)
-{
-BDRVRawState

[Qemu-block] [PATCH v6 03/39] blockdev: Allow creation of BDS trees without BB

2015-10-12 Thread Max Reitz
If the "id" field is missing from the options given to blockdev-add,
just omit the BlockBackend and create the BlockDriverState tree alone.

However, if "id" is missing, "node-name" must be specified; otherwise,
the BDS tree would no longer be accessible.

Many BDS options which are not parsed by bdrv_open() (like caching)
cannot be specified for these BB-less BDS trees yet. A future patch will
remove this limitation.
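
The rule described above can be pictured with a small standalone sketch.
It is not the QEMU code: classify_root() and the RootKind labels are
hypothetical names made up for illustration, and the two string arguments
stand in for the top-level "id" and "node-name" options.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome labels; these do not exist in QEMU. */
typedef enum {
    ROOT_WITH_BACKEND,   /* "id" given: BlockBackend plus BDS tree */
    ROOT_NODE_ONLY,      /* only "node-name" given: BDS tree alone */
    ROOT_REJECTED        /* neither given: the tree would be unreachable */
} RootKind;

/* Decide what a top-level blockdev-add request would create. */
RootKind classify_root(const char *id, const char *node_name)
{
    if (id != NULL) {
        return ROOT_WITH_BACKEND;
    }
    if (node_name != NULL) {
        return ROOT_NODE_ONLY;
    }
    return ROOT_REJECTED;
}
```

Passing NULL for both arguments yields ROOT_REJECTED, which corresponds to
the "'id' and/or 'node-name' need to be specified" error path in the patch.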

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Kevin Wolf 
Reviewed-by: Alberto Garcia 
---
 blockdev.c | 44 +++-
 qapi/block-core.json   | 13 +
 tests/qemu-iotests/087 |  2 +-
 tests/qemu-iotests/087.out |  4 ++--
 4 files changed, 43 insertions(+), 20 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index f937526..07b9a3d 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3028,17 +3028,12 @@ out:
 void qmp_blockdev_add(BlockdevOptions *options, Error **errp)
 {
 QmpOutputVisitor *ov = qmp_output_visitor_new();
-BlockBackend *blk;
+BlockDriverState *bs;
+BlockBackend *blk = NULL;
 QObject *obj;
 QDict *qdict;
 Error *local_err = NULL;
 
-/* Require an ID in the top level */
-if (!options->has_id) {
-error_setg(errp, "Block device needs an ID");
-goto fail;
-}
-
 /* TODO Sort it out in raw-posix and drive_new(): Reject aio=native with
  * cache.direct=false instead of silently switching to aio=threads, except
  * when called from drive_new().
@@ -3066,14 +3061,37 @@ void qmp_blockdev_add(BlockdevOptions *options, Error **errp)
 
 qdict_flatten(qdict);
 
-blk = blockdev_init(NULL, qdict, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
-goto fail;
+if (options->has_id) {
+blk = blockdev_init(NULL, qdict, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+goto fail;
+}
+
+bs = blk_bs(blk);
+} else {
+int ret;
+
+if (!qdict_get_try_str(qdict, "node-name")) {
+error_setg(errp, "'id' and/or 'node-name' need to be specified for "
+   "the root node");
+goto fail;
+}
+
+bs = NULL;
+ret = bdrv_open(&bs, NULL, NULL, qdict, BDRV_O_RDWR | BDRV_O_CACHE_WB,
+errp);
+if (ret < 0) {
+goto fail;
+}
 }
 
-if (bdrv_key_required(blk_bs(blk))) {
-blk_unref(blk);
+if (bs && bdrv_key_required(bs)) {
+if (blk) {
+blk_unref(blk);
+} else {
+bdrv_unref(bs);
+}
 error_setg(errp, "blockdev-add doesn't support encrypted devices");
 goto fail;
 }
diff --git a/qapi/block-core.json b/qapi/block-core.json
index c042561..425fdab 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1393,9 +1393,12 @@
 #
 # @driver:block driver name
 # @id:#optional id by which the new block device can be referred to.
-# This is a required option on the top level of blockdev-add, and
-# currently not allowed on any other level.
-# @node-name: #optional the name of a block driver state node (Since 2.0)
+# This option is only allowed on the top level of blockdev-add.
+# A BlockBackend will be created by blockdev-add if and only if
+# this option is given.
+# @node-name: #optional the name of a block driver state node (Since 2.0).
+# This option is required on the top level of blockdev-add if
+# the @id option is not given there.
 # @discard:   #optional discard-related options (default: ignore)
 # @cache: #optional cache-related options
 # @aio:   #optional AIO backend (default: threads)
@@ -1859,7 +1862,9 @@
 ##
 # @blockdev-add:
 #
-# Creates a new block device.
+# Creates a new block device. If the @id option is given at the top level, a
+# BlockBackend will be created; otherwise, @node-name is mandatory at the top
+# level and no BlockBackend will be created.
 #
 # This command is still a work in progress.  It doesn't support all
 # block drivers, it lacks a matching blockdev-del, and more.  Stay
diff --git a/tests/qemu-iotests/087 b/tests/qemu-iotests/087
index 8694749..af44299 100755
--- a/tests/qemu-iotests/087
+++ b/tests/qemu-iotests/087
@@ -54,7 +54,7 @@ size=128M
 _make_test_img $size
 
 echo
-echo === Missing ID ===
+echo === Missing ID and node-name ===
 echo
 
 run_qemu <

[Qemu-block] [PATCH v6 05/39] block: Make bdrv_is_inserted() return a bool

2015-10-12 Thread Max Reitz
Make bdrv_is_inserted(), blk_is_inserted(), and the callback
BlockDriver.bdrv_is_inserted() return a bool.

Suggested-by: Eric Blake 
Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Alberto Garcia 
Reviewed-by: Kevin Wolf 
---
 block.c| 12 +++-
 block/block-backend.c  |  2 +-
 block/raw-posix.c  |  8 +++-
 block/raw_bsd.c|  2 +-
 include/block/block.h  |  2 +-
 include/block/block_int.h  |  2 +-
 include/sysemu/block-backend.h |  2 +-
 7 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/block.c b/block.c
index 0ae3fcf..15a11bf 100644
--- a/block.c
+++ b/block.c
@@ -3130,14 +3130,16 @@ void bdrv_invalidate_cache_all(Error **errp)
 /**
  * Return TRUE if the media is present
  */
-int bdrv_is_inserted(BlockDriverState *bs)
+bool bdrv_is_inserted(BlockDriverState *bs)
 {
 BlockDriver *drv = bs->drv;
 
-if (!drv)
-return 0;
-if (!drv->bdrv_is_inserted)
-return 1;
+if (!drv) {
+return false;
+}
+if (!drv->bdrv_is_inserted) {
+return true;
+}
 return drv->bdrv_is_inserted(bs);
 }
 
diff --git a/block/block-backend.c b/block/block-backend.c
index 2256551..1db002c 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -769,7 +769,7 @@ void blk_invalidate_cache(BlockBackend *blk, Error **errp)
 bdrv_invalidate_cache(blk->bs, errp);
 }
 
-int blk_is_inserted(BlockBackend *blk)
+bool blk_is_inserted(BlockBackend *blk)
 {
 return bdrv_is_inserted(blk->bs);
 }
diff --git a/block/raw-posix.c b/block/raw-posix.c
index afd1c59..c014724 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -2389,15 +2389,13 @@ out:
 return prio;
 }
 
-static int cdrom_is_inserted(BlockDriverState *bs)
+static bool cdrom_is_inserted(BlockDriverState *bs)
 {
 BDRVRawState *s = bs->opaque;
 int ret;
 
 ret = ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT);
-if (ret == CDS_DISC_OK)
-return 1;
-return 0;
+return ret == CDS_DISC_OK;
 }
 
 static void cdrom_eject(BlockDriverState *bs, bool eject_flag)
@@ -2523,7 +2521,7 @@ static int cdrom_reopen(BlockDriverState *bs)
 return 0;
 }
 
-static int cdrom_is_inserted(BlockDriverState *bs)
+static bool cdrom_is_inserted(BlockDriverState *bs)
 {
 return raw_getlength(bs) > 0;
 }
diff --git a/block/raw_bsd.c b/block/raw_bsd.c
index 63ee911..3c7b413 100644
--- a/block/raw_bsd.c
+++ b/block/raw_bsd.c
@@ -154,7 +154,7 @@ static int raw_truncate(BlockDriverState *bs, int64_t offset)
 return bdrv_truncate(bs->file->bs, offset);
 }
 
-static int raw_is_inserted(BlockDriverState *bs)
+static bool raw_is_inserted(BlockDriverState *bs)
 {
 return bdrv_is_inserted(bs->file->bs);
 }
diff --git a/include/block/block.h b/include/block/block.h
index 1520dee..cb3e312 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -399,7 +399,7 @@ int bdrv_is_read_only(BlockDriverState *bs);
 int bdrv_is_sg(BlockDriverState *bs);
 int bdrv_enable_write_cache(BlockDriverState *bs);
 void bdrv_set_enable_write_cache(BlockDriverState *bs, bool wce);
-int bdrv_is_inserted(BlockDriverState *bs);
+bool bdrv_is_inserted(BlockDriverState *bs);
 int bdrv_media_changed(BlockDriverState *bs);
 void bdrv_lock_medium(BlockDriverState *bs, bool locked);
 void bdrv_eject(BlockDriverState *bs, bool eject_flag);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index c0e6513..40d40df 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -212,7 +212,7 @@ struct BlockDriver {
 const char *backing_file, const char *backing_fmt);
 
 /* removable device specific */
-int (*bdrv_is_inserted)(BlockDriverState *bs);
+bool (*bdrv_is_inserted)(BlockDriverState *bs);
 int (*bdrv_media_changed)(BlockDriverState *bs);
 void (*bdrv_eject)(BlockDriverState *bs, bool eject_flag);
 void (*bdrv_lock_medium)(BlockDriverState *bs, bool locked);
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 8fc960f..8f2bf10 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -130,7 +130,7 @@ int blk_is_sg(BlockBackend *blk);
 int blk_enable_write_cache(BlockBackend *blk);
 void blk_set_enable_write_cache(BlockBackend *blk, bool wce);
 void blk_invalidate_cache(BlockBackend *blk, Error **errp);
-int blk_is_inserted(BlockBackend *blk);
+bool blk_is_inserted(BlockBackend *blk);
 void blk_lock_medium(BlockBackend *blk, bool locked);
 void blk_eject(BlockBackend *blk, bool eject_flag);
 int blk_get_flags(BlockBackend *blk);
-- 
2.6.1




[Qemu-block] [PATCH v6 13/39] block: Move guest_block_size into BlockBackend

2015-10-12 Thread Max Reitz
guest_block_size is a guest device property so it should be moved into
the interface between block layer and guest devices, which is the
BlockBackend.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Alberto Garcia 
Reviewed-by: Kevin Wolf 
---
 block.c   | 7 ---
 block/block-backend.c | 7 +--
 include/block/block.h | 1 -
 include/block/block_int.h | 3 ---
 4 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/block.c b/block.c
index ccdef82..baad2b4 100644
--- a/block.c
+++ b/block.c
@@ -852,7 +852,6 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file,
 goto fail_opts;
 }
 
-bs->guest_block_size = 512;
 bs->request_alignment = 512;
 bs->zero_beyond_eof = true;
 open_flags = bdrv_open_flags(bs, flags);
@@ -1992,7 +1991,6 @@ static void bdrv_move_feature_fields(BlockDriverState *bs_dest,
 /* move some fields that need to stay attached to the device */
 
 /* dev info */
-bs_dest->guest_block_size   = bs_src->guest_block_size;
 bs_dest->copy_on_read   = bs_src->copy_on_read;
 
 bs_dest->enable_write_cache = bs_src->enable_write_cache;
@@ -3197,11 +3195,6 @@ void bdrv_lock_medium(BlockDriverState *bs, bool locked)
 }
 }
 
-void bdrv_set_guest_block_size(BlockDriverState *bs, int align)
-{
-bs->guest_block_size = align;
-}
-
 BdrvDirtyBitmap *bdrv_find_dirty_bitmap(BlockDriverState *bs, const char *name)
 {
 BdrvDirtyBitmap *bm;
diff --git a/block/block-backend.c b/block/block-backend.c
index c7e0f7b..7bc2eb1 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -31,6 +31,9 @@ struct BlockBackend {
 /* TODO change to DeviceState when all users are qdevified */
 const BlockDevOps *dev_ops;
 void *dev_opaque;
+
+/* the block size for which the guest device expects atomicity */
+int guest_block_size;
 };
 
 typedef struct BlockBackendAIOCB {
@@ -351,7 +354,7 @@ void blk_detach_dev(BlockBackend *blk, void *dev)
 blk->dev = NULL;
 blk->dev_ops = NULL;
 blk->dev_opaque = NULL;
-bdrv_set_guest_block_size(blk->bs, 512);
+blk->guest_block_size = 512;
 blk_unref(blk);
 }
 
@@ -806,7 +809,7 @@ int blk_get_max_transfer_length(BlockBackend *blk)
 
 void blk_set_guest_block_size(BlockBackend *blk, int align)
 {
-bdrv_set_guest_block_size(blk->bs, align);
+blk->guest_block_size = align;
 }
 
 void *blk_blockalign(BlockBackend *blk, size_t size)
diff --git a/include/block/block.h b/include/block/block.h
index cb3e312..f923a01 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -466,7 +466,6 @@ void bdrv_img_create(const char *filename, const char *fmt,
 size_t bdrv_min_mem_align(BlockDriverState *bs);
 /* Returns optimal alignment in bytes for bounce buffer */
 size_t bdrv_opt_mem_align(BlockDriverState *bs);
-void bdrv_set_guest_block_size(BlockDriverState *bs, int align);
 void *qemu_blockalign(BlockDriverState *bs, size_t size);
 void *qemu_blockalign0(BlockDriverState *bs, size_t size);
 void *qemu_try_blockalign(BlockDriverState *bs, size_t size);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 40d40df..e79d8c0 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -411,9 +411,6 @@ struct BlockDriverState {
 /* Alignment requirement for offset/length of I/O requests */
 unsigned int request_alignment;
 
-/* the block size for which the guest device expects atomicity */
-int guest_block_size;
-
 /* do we need to tell the quest if we have a volatile write cache? */
 int enable_write_cache;
 
-- 
2.6.1




[Qemu-block] [PATCH v6 08/39] block/raw_bsd: Drop raw_is_inserted()

2015-10-12 Thread Max Reitz
With the new automatically-recursive implementation of
bdrv_is_inserted() checking by default whether all the children of a BDS
are inserted, we can drop raw's own implementation.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Kevin Wolf 
Reviewed-by: Alberto Garcia 
---
Note that this patch differs functionally from v5; I kept the R-bs,
however, since this is a trivial conflict in the code that is being
removed.
---
 block/raw_bsd.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/block/raw_bsd.c b/block/raw_bsd.c
index 3c7b413..0aded31 100644
--- a/block/raw_bsd.c
+++ b/block/raw_bsd.c
@@ -154,11 +154,6 @@ static int raw_truncate(BlockDriverState *bs, int64_t offset)
 return bdrv_truncate(bs->file->bs, offset);
 }
 
-static bool raw_is_inserted(BlockDriverState *bs)
-{
-return bdrv_is_inserted(bs->file->bs);
-}
-
 static int raw_media_changed(BlockDriverState *bs)
 {
 return bdrv_media_changed(bs->file->bs);
@@ -264,7 +259,6 @@ BlockDriver bdrv_raw = {
 .bdrv_refresh_limits  = &raw_refresh_limits,
 .bdrv_probe_blocksizes = &raw_probe_blocksizes,
 .bdrv_probe_geometry  = &raw_probe_geometry,
-.bdrv_is_inserted = &raw_is_inserted,
 .bdrv_media_changed   = &raw_media_changed,
 .bdrv_eject   = &raw_eject,
 .bdrv_lock_medium = &raw_lock_medium,
-- 
2.6.1




[Qemu-block] [PATCH v6 15/39] block: Move BlockAcctStats into BlockBackend

2015-10-12 Thread Max Reitz
As the comment above bdrv_get_stats() says, BlockAcctStats is something
which belongs to the device instead of each BlockDriverState. This patch
therefore moves it into the BlockBackend.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Alberto Garcia 
Reviewed-by: Kevin Wolf 
---
 block.c   | 11 ---
 block/block-backend.c |  5 -
 block/io.c|  6 +-
 block/qapi.c  | 24 ++--
 include/block/block.h |  2 --
 include/block/block_int.h |  3 ---
 6 files changed, 23 insertions(+), 28 deletions(-)

diff --git a/block.c b/block.c
index baad2b4..3e13b7f 100644
--- a/block.c
+++ b/block.c
@@ -4143,14 +4143,3 @@ void bdrv_refresh_filename(BlockDriverState *bs)
 QDECREF(json);
 }
 }
-
-/* This accessor function purpose is to allow the device models to access the
- * BlockAcctStats structure embedded inside a BlockDriverState without being
- * aware of the BlockDriverState structure layout.
- * It will go away when the BlockAcctStats structure will be moved inside
- * the device models.
- */
-BlockAcctStats *bdrv_get_stats(BlockDriverState *bs)
-{
-return &bs->stats;
-}
diff --git a/block/block-backend.c b/block/block-backend.c
index 7bc2eb1..a52037b 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -34,6 +34,9 @@ struct BlockBackend {
 
 /* the block size for which the guest device expects atomicity */
 int guest_block_size;
+
+/* I/O stats (display with "info blockstats"). */
+BlockAcctStats stats;
 };
 
 typedef struct BlockBackendAIOCB {
@@ -892,7 +895,7 @@ void blk_io_unplug(BlockBackend *blk)
 
 BlockAcctStats *blk_get_stats(BlockBackend *blk)
 {
-return bdrv_get_stats(blk->bs);
+return &blk->stats;
 }
 
 void *blk_aio_get(const AIOCBInfo *aiocb_info, BlockBackend *blk,
diff --git a/block/io.c b/block/io.c
index b80044b..2fd7a1d 100644
--- a/block/io.c
+++ b/block/io.c
@@ -23,6 +23,7 @@
  */
 
 #include "trace.h"
+#include "sysemu/block-backend.h"
 #include "block/blockjob.h"
 #include "block/block_int.h"
 #include "block/throttle-groups.h"
@@ -1905,7 +1906,10 @@ static int multiwrite_merge(BlockDriverState *bs, BlockRequest *reqs,
 }
 }
 
-block_acct_merge_done(&bs->stats, BLOCK_ACCT_WRITE, num_reqs - outidx - 1);
+if (bs->blk) {
+block_acct_merge_done(blk_get_stats(bs->blk), BLOCK_ACCT_WRITE,
+  num_reqs - outidx - 1);
+}
 
 return outidx + 1;
 }
diff --git a/block/qapi.c b/block/qapi.c
index 0360126..7c8209b 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -344,16 +344,20 @@ static BlockStats *bdrv_query_stats(const BlockDriverState *bs,
 }
 
 s->stats = g_malloc0(sizeof(*s->stats));
-s->stats->rd_bytes = bs->stats.nr_bytes[BLOCK_ACCT_READ];
-s->stats->wr_bytes = bs->stats.nr_bytes[BLOCK_ACCT_WRITE];
-s->stats->rd_operations = bs->stats.nr_ops[BLOCK_ACCT_READ];
-s->stats->wr_operations = bs->stats.nr_ops[BLOCK_ACCT_WRITE];
-s->stats->rd_merged = bs->stats.merged[BLOCK_ACCT_READ];
-s->stats->wr_merged = bs->stats.merged[BLOCK_ACCT_WRITE];
-s->stats->flush_operations = bs->stats.nr_ops[BLOCK_ACCT_FLUSH];
-s->stats->wr_total_time_ns = bs->stats.total_time_ns[BLOCK_ACCT_WRITE];
-s->stats->rd_total_time_ns = bs->stats.total_time_ns[BLOCK_ACCT_READ];
-s->stats->flush_total_time_ns = bs->stats.total_time_ns[BLOCK_ACCT_FLUSH];
+if (bs->blk) {
+BlockAcctStats *stats = blk_get_stats(bs->blk);
+
+s->stats->rd_bytes = stats->nr_bytes[BLOCK_ACCT_READ];
+s->stats->wr_bytes = stats->nr_bytes[BLOCK_ACCT_WRITE];
+s->stats->rd_operations = stats->nr_ops[BLOCK_ACCT_READ];
+s->stats->wr_operations = stats->nr_ops[BLOCK_ACCT_WRITE];
+s->stats->rd_merged = stats->merged[BLOCK_ACCT_READ];
+s->stats->wr_merged = stats->merged[BLOCK_ACCT_WRITE];
+s->stats->flush_operations = stats->nr_ops[BLOCK_ACCT_FLUSH];
+s->stats->wr_total_time_ns = stats->total_time_ns[BLOCK_ACCT_WRITE];
+s->stats->rd_total_time_ns = stats->total_time_ns[BLOCK_ACCT_READ];
+s->stats->flush_total_time_ns = stats->total_time_ns[BLOCK_ACCT_FLUSH];
+}
 
 s->stats->wr_highest_offset = bs->wr_highest_offset;
 
diff --git a/include/block/block.h b/include/block/block.h
index f923a01..d19903a 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -621,6 +621,4 @@ void bdrv_io_plug(BlockDriverState *bs);
 void bdrv_io_unplug(BlockDriverState *bs);
 void bdrv_flush_io_queue(BlockDriverState *bs);
 
-BlockAcctStats *bdrv_get_stats(BlockDriverState *bs);
-
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index b8e1c59..f9c7ec5 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -399,9 +399,6 @@ struct BlockDriverState {
 unsigned   pending_reqs[2];
 QLIST_ENTRY(BlockDriverState) round_robin;
 
-/* I/O stats (display with "info blockstats"

[Qemu-block] [PATCH v6 07/39] block: Make bdrv_is_inserted() recursive

2015-10-12 Thread Max Reitz
If bdrv_is_inserted() is called on the top level BDS, it should make
sure all nodes in the BDS tree are actually inserted.
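
As a rough illustration of the recursion, here is a toy model that can be
compiled on its own. The Node type, its fields and always_empty() are
assumptions made up for this sketch; the real code walks bs->children with
QLIST_FOREACH instead of a NULL-terminated array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a BDS node; field names are illustrative only. */
typedef struct Node Node;
struct Node {
    bool has_driver;                 /* models bs->drv != NULL */
    bool (*is_inserted)(Node *n);    /* optional driver callback */
    Node **children;                 /* NULL-terminated array, or NULL */
};

bool node_is_inserted(Node *n)
{
    if (!n->has_driver) {
        return false;                /* closed node: never inserted */
    }
    if (n->is_inserted) {
        return n->is_inserted(n);    /* the driver's answer is final */
    }
    /* No callback: inserted only if every child is inserted. */
    for (Node **c = n->children; c && *c; c++) {
        if (!node_is_inserted(*c)) {
            return false;
        }
    }
    return true;
}

/* Example driver callback that always reports "no medium". */
bool always_empty(Node *n)
{
    (void)n;
    return false;
}
```

A node without a callback and without children is reported as inserted,
which matches the previous default of returning true.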

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Kevin Wolf 
---
 block.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index 15a11bf..363088c 100644
--- a/block.c
+++ b/block.c
@@ -3133,14 +3133,20 @@ void bdrv_invalidate_cache_all(Error **errp)
 bool bdrv_is_inserted(BlockDriverState *bs)
 {
 BlockDriver *drv = bs->drv;
+BdrvChild *child;
 
 if (!drv) {
 return false;
 }
-if (!drv->bdrv_is_inserted) {
-return true;
+if (drv->bdrv_is_inserted) {
+return drv->bdrv_is_inserted(bs);
 }
-return drv->bdrv_is_inserted(bs);
+QLIST_FOREACH(child, &bs->children, next) {
+if (!bdrv_is_inserted(child->bs)) {
+return false;
+}
+}
+return true;
 }
 
 /**
-- 
2.6.1




[Qemu-block] [PATCH v6 09/39] block: Invoke change media CB before NULLing drv

2015-10-12 Thread Max Reitz
In order to handle host device passthrough, some guest device models
may call blk_is_inserted() to check whether the medium is inserted on
the host, when checking the guest tray status.

This tray status is inquired by blk_dev_change_media_cb(); because
bdrv_is_inserted() (invoked by blk_is_inserted()) always returns false
for BDS with drv set to NULL, blk_dev_change_media_cb() should therefore
be called before drv is set to NULL.
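
The ordering dependency can be shown with a toy model. Everything below
(ToyDev, toy_change_media_cb() and the two close variants) is hypothetical
and only demonstrates what the callback observes in each order; it makes no
claim about what the real device models do with that result.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy device state; all names here are made up for illustration. */
typedef struct ToyDev {
    bool has_driver;      /* models bs->drv != NULL */
    bool medium_present;  /* what the host reports */
    bool cb_saw_medium;   /* what the change-media callback observed */
} ToyDev;

/* Mirrors the rule that a node without a driver is never inserted. */
bool toy_is_inserted(const ToyDev *d)
{
    return d->has_driver && d->medium_present;
}

/* Stand-in for the callback that inquires the tray status. */
void toy_change_media_cb(ToyDev *d)
{
    d->cb_saw_medium = toy_is_inserted(d);
}

/* Order after this patch: callback first, then drop the driver. */
void toy_close_cb_first(ToyDev *d)
{
    toy_change_media_cb(d);
    d->has_driver = false;
}

/* Order before this patch: the callback always sees "not inserted". */
void toy_close_cb_last(ToyDev *d)
{
    d->has_driver = false;
    toy_change_media_cb(d);
}
```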

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Alberto Garcia 
Reviewed-by: Kevin Wolf 
---
 block.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index 363088c..ccdef82 100644
--- a/block.c
+++ b/block.c
@@ -1902,6 +1902,10 @@ void bdrv_close(BlockDriverState *bs)
 bdrv_drain(bs); /* in case flush left pending I/O */
 notifier_list_notify(&bs->close_notifiers, bs);
 
+if (bs->blk) {
+blk_dev_change_media_cb(bs->blk, false);
+}
+
 if (bs->drv) {
 BdrvChild *child, *next;
 
@@ -1940,10 +1944,6 @@ void bdrv_close(BlockDriverState *bs)
 bs->full_open_options = NULL;
 }
 
-if (bs->blk) {
-blk_dev_change_media_cb(bs->blk, false);
-}
-
 QLIST_FOREACH_SAFE(ban, &bs->aio_notifiers, list, ban_next) {
 g_free(ban);
 }
-- 
2.6.1




[Qemu-block] [PATCH v6 19/39] block: Make some BB functions fall back to BBRS

2015-10-12 Thread Max Reitz
If there is no BDS tree attached to a BlockBackend, functions that can
do so should fall back to the BlockBackendRootState structure.
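
A minimal sketch of the fallback pattern, assuming toy types in place of
BlockBackend, BlockDriverState and BlockBackendRootState. TOY_O_CACHE_WB is
a made-up flag value standing in for BDRV_O_CACHE_WB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_O_CACHE_WB 0x1   /* hypothetical value, not QEMU's */

typedef struct ToyTree {
    bool read_only;
    bool write_cache;
} ToyTree;

typedef struct ToyRootState {
    bool read_only;
    int open_flags;
} ToyRootState;

typedef struct ToyBackend {
    ToyTree *bs;              /* NULL when no tree is attached */
    ToyRootState root_state;  /* consulted only when bs is NULL */
} ToyBackend;

bool toy_is_read_only(const ToyBackend *blk)
{
    return blk->bs ? blk->bs->read_only : blk->root_state.read_only;
}

bool toy_write_cache_enabled(const ToyBackend *blk)
{
    if (blk->bs) {
        return blk->bs->write_cache;
    }
    return (blk->root_state.open_flags & TOY_O_CACHE_WB) != 0;
}

void toy_set_write_cache(ToyBackend *blk, bool wce)
{
    if (blk->bs) {
        blk->bs->write_cache = wce;
    } else if (wce) {
        blk->root_state.open_flags |= TOY_O_CACHE_WB;
    } else {
        blk->root_state.open_flags &= ~TOY_O_CACHE_WB;
    }
}
```

With this shape, a setter called on an empty backend records the value in
the root state instead of dereferencing a NULL tree pointer.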

Signed-off-by: Max Reitz 
---
 block/block-backend.c | 28 
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 6a3f0c7..d790870 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -872,7 +872,11 @@ void blk_error_action(BlockBackend *blk, BlockErrorAction action,
 
 int blk_is_read_only(BlockBackend *blk)
 {
-return bdrv_is_read_only(blk->bs);
+if (blk->bs) {
+return bdrv_is_read_only(blk->bs);
+} else {
+return blk->root_state.read_only;
+}
 }
 
 int blk_is_sg(BlockBackend *blk)
@@ -882,12 +886,24 @@ int blk_is_sg(BlockBackend *blk)
 
 int blk_enable_write_cache(BlockBackend *blk)
 {
-return bdrv_enable_write_cache(blk->bs);
+if (blk->bs) {
+return bdrv_enable_write_cache(blk->bs);
+} else {
+return !!(blk->root_state.open_flags & BDRV_O_CACHE_WB);
+}
 }
 
 void blk_set_enable_write_cache(BlockBackend *blk, bool wce)
 {
-bdrv_set_enable_write_cache(blk->bs, wce);
+if (blk->bs) {
+bdrv_set_enable_write_cache(blk->bs, wce);
+} else {
+if (wce) {
+blk->root_state.open_flags |= BDRV_O_CACHE_WB;
+} else {
+blk->root_state.open_flags &= ~BDRV_O_CACHE_WB;
+}
+}
 }
 
 void blk_invalidate_cache(BlockBackend *blk, Error **errp)
@@ -917,7 +933,11 @@ void blk_eject(BlockBackend *blk, bool eject_flag)
 
 int blk_get_flags(BlockBackend *blk)
 {
-return bdrv_get_flags(blk->bs);
+if (blk->bs) {
+return bdrv_get_flags(blk->bs);
+} else {
+return blk->root_state.open_flags;
+}
 }
 
 int blk_get_max_transfer_length(BlockBackend *blk)
-- 
2.6.1




[Qemu-block] [PATCH v6 14/39] block: Remove wr_highest_sector from BlockAcctStats

2015-10-12 Thread Max Reitz
BlockAcctStats contains statistics about the data transferred from and
to the device; wr_highest_sector does not fit in with the rest.

Furthermore, those statistics are supposed to be specific to a certain
device and not necessarily to a BDS (see the comment above
bdrv_get_stats()); on the other hand, wr_highest_sector may be rather
important information to know for each BDS. When BlockAcctStats is
finally removed from the BDS, we will want to keep wr_highest_sector in
the BDS.

Finally, wr_highest_sector is renamed to wr_highest_offset and given the
appropriate meaning. Externally, it is represented as an offset so there
is no point in doing something different internally. Its definition is
changed to match that in qapi/block-core.json which is "the offset after
the greatest byte written to". Doing so should not cause any harm since
if external programs tried to calculate the volume usage by
(wr_highest_offset + 512) / volume_size, after this patch they will just
assume the volume to be full slightly earlier than before.
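
To make the size of the change concrete, the following sketch compares the
value reported for a single write under the old and the new definition. The
helper names are hypothetical, a sector size of 512 bytes is assumed, and
the "keep the maximum seen so far" bookkeeping is left out.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SECTOR_SIZE 512   /* assumed sector size in bytes */

/* Old definition: index of the highest sector written, scaled to bytes. */
uint64_t old_reported_offset(int64_t sector_num, unsigned int nb_sectors)
{
    return (uint64_t)(sector_num + nb_sectors - 1) * TOY_SECTOR_SIZE;
}

/* New definition: the offset just past the last byte written. */
uint64_t new_reported_offset(int64_t offset, unsigned int bytes)
{
    return (uint64_t)(offset + bytes);
}
```

For a write of 8 sectors starting at sector 16, the old value is
23 * 512 = 11776 and the new one is 24 * 512 = 12288, so the new value is
larger by exactly one sector.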

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Alberto Garcia 
Reviewed-by: Kevin Wolf 
---
 block/accounting.c | 8 
 block/io.c | 4 +++-
 block/qapi.c   | 4 ++--
 include/block/accounting.h | 3 ---
 include/block/block_int.h  | 3 +++
 qmp-commands.hx| 4 ++--
 6 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/block/accounting.c b/block/accounting.c
index 01d594f..a423560 100644
--- a/block/accounting.c
+++ b/block/accounting.c
@@ -47,14 +47,6 @@ void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie)
 }
 
 
-void block_acct_highest_sector(BlockAcctStats *stats, int64_t sector_num,
-   unsigned int nb_sectors)
-{
-if (stats->wr_highest_sector < sector_num + nb_sectors - 1) {
-stats->wr_highest_sector = sector_num + nb_sectors - 1;
-}
-}
-
 void block_acct_merge_done(BlockAcctStats *stats, enum BlockAcctType type,
   int num_requests)
 {
diff --git a/block/io.c b/block/io.c
index 5311473..b80044b 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1151,7 +1151,9 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
 
 bdrv_set_dirty(bs, sector_num, nb_sectors);
 
-block_acct_highest_sector(&bs->stats, sector_num, nb_sectors);
+if (bs->wr_highest_offset < offset + bytes) {
+bs->wr_highest_offset = offset + bytes;
+}
 
 if (ret >= 0) {
 bs->total_sectors = MAX(bs->total_sectors, sector_num + nb_sectors);
diff --git a/block/qapi.c b/block/qapi.c
index 355ba32..0360126 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -350,13 +350,13 @@ static BlockStats *bdrv_query_stats(const BlockDriverState *bs,
 s->stats->wr_operations = bs->stats.nr_ops[BLOCK_ACCT_WRITE];
 s->stats->rd_merged = bs->stats.merged[BLOCK_ACCT_READ];
 s->stats->wr_merged = bs->stats.merged[BLOCK_ACCT_WRITE];
-s->stats->wr_highest_offset =
-bs->stats.wr_highest_sector * BDRV_SECTOR_SIZE;
 s->stats->flush_operations = bs->stats.nr_ops[BLOCK_ACCT_FLUSH];
 s->stats->wr_total_time_ns = bs->stats.total_time_ns[BLOCK_ACCT_WRITE];
 s->stats->rd_total_time_ns = bs->stats.total_time_ns[BLOCK_ACCT_READ];
 s->stats->flush_total_time_ns = bs->stats.total_time_ns[BLOCK_ACCT_FLUSH];
 
+s->stats->wr_highest_offset = bs->wr_highest_offset;
+
 if (bs->file) {
 s->has_parent = true;
 s->parent = bdrv_query_stats(bs->file->bs, query_backing);
diff --git a/include/block/accounting.h b/include/block/accounting.h
index 4c406cf..66637cd 100644
--- a/include/block/accounting.h
+++ b/include/block/accounting.h
@@ -40,7 +40,6 @@ typedef struct BlockAcctStats {
 uint64_t nr_ops[BLOCK_MAX_IOTYPE];
 uint64_t total_time_ns[BLOCK_MAX_IOTYPE];
 uint64_t merged[BLOCK_MAX_IOTYPE];
-uint64_t wr_highest_sector;
 } BlockAcctStats;
 
 typedef struct BlockAcctCookie {
@@ -52,8 +51,6 @@ typedef struct BlockAcctCookie {
 void block_acct_start(BlockAcctStats *stats, BlockAcctCookie *cookie,
   int64_t bytes, enum BlockAcctType type);
 void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie);
-void block_acct_highest_sector(BlockAcctStats *stats, int64_t sector_num,
-   unsigned int nb_sectors);
 void block_acct_merge_done(BlockAcctStats *stats, enum BlockAcctType type,
int num_requests);
 
diff --git a/include/block/block_int.h b/include/block/block_int.h
index e79d8c0..b8e1c59 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -402,6 +402,9 @@ struct BlockDriverState {
 /* I/O stats (display with "info blockstats"). */
 BlockAcctStats stats;
 
+/* Offset after the highest byte written to */
+uint64_t wr_highest_offset;
+
 /* I/O Limits */
 BlockLimits bl;
 
diff --git a/qmp-commands.hx b/qmp-commands.hx
index d2ba800..785ecf6 100644
--- a/qmp-c

[Qemu-block] [PATCH v6 12/39] block: Fix BB AIOCB AioContext without BDS

2015-10-12 Thread Max Reitz
Fix the BlockBackend's AIOCB AioContext for aborting AIO in case there
is no BDS. If there is no implementation of AIOCBInfo::get_aio_context(),
the AioContext is derived from the BDS the AIOCB belongs to. If that BDS
is NULL (because it has been removed from the BB), this will not work.

This patch makes blk_get_aio_context() fall back to the main loop
context if the BDS pointer is NULL and implements
AIOCBInfo::get_aio_context() (blk_aiocb_get_aio_context()) which invokes
blk_get_aio_context().
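
The two pieces (a fallback context and a per-request lookup that goes
through the owning backend) might be modelled as below. All Toy* names are
assumptions for this sketch, and the pointer arithmetic with offsetof()
only approximates what DO_UPCAST does in the patch.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyCtx { const char *name; } ToyCtx;
typedef struct ToyTree { ToyCtx *ctx; } ToyTree;
typedef struct ToyBackend { ToyTree *bs; } ToyBackend;

/* Generic request header, standing in for BlockAIOCB. */
typedef struct ToyRequest { int unused; } ToyRequest;

/* Backend-specific request that embeds the generic header. */
typedef struct ToyBackendRequest {
    ToyRequest common;
    ToyBackend *blk;      /* remembered so the context can be found later */
    int ret;
} ToyBackendRequest;

/* Stand-in for the main loop context. */
ToyCtx toy_main_ctx = { "main-loop" };

ToyCtx *toy_backend_get_ctx(ToyBackend *blk)
{
    return blk->bs ? blk->bs->ctx : &toy_main_ctx;
}

ToyCtx *toy_request_get_ctx(ToyRequest *req)
{
    /* Recover the containing structure from its embedded header. */
    ToyBackendRequest *breq = (ToyBackendRequest *)
        ((char *)req - offsetof(ToyBackendRequest, common));
    return toy_backend_get_ctx(breq->blk);
}
```

Because the request stores its backend rather than a tree pointer, the
lookup still succeeds after the tree has been detached from the backend.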

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Alberto Garcia 
Reviewed-by: Kevin Wolf 
---
 block/block-backend.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 74642dc..c7e0f7b 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -18,6 +18,8 @@
 /* Number of coroutines to reserve per attached device model */
 #define COROUTINE_POOL_RESERVATION 64
 
+static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb);
+
 struct BlockBackend {
 char *name;
 int refcnt;
@@ -34,10 +36,12 @@ struct BlockBackend {
 typedef struct BlockBackendAIOCB {
 BlockAIOCB common;
 QEMUBH *bh;
+BlockBackend *blk;
 int ret;
 } BlockBackendAIOCB;
 
 static const AIOCBInfo block_backend_aiocb_info = {
+.get_aio_context = blk_aiocb_get_aio_context,
 .aiocb_size = sizeof(BlockBackendAIOCB),
 };
 
@@ -558,6 +562,7 @@ static BlockAIOCB *abort_aio_request(BlockBackend *blk, BlockCompletionFunc *cb,
 QEMUBH *bh;
 
 acb = blk_aio_get(&block_backend_aiocb_info, blk, cb, opaque);
+acb->blk = blk;
 acb->ret = ret;
 
 bh = aio_bh_new(blk_get_aio_context(blk), error_callback_bh, acb);
@@ -831,7 +836,17 @@ void blk_op_unblock_all(BlockBackend *blk, Error *reason)
 
 AioContext *blk_get_aio_context(BlockBackend *blk)
 {
-return bdrv_get_aio_context(blk->bs);
+if (blk->bs) {
+return bdrv_get_aio_context(blk->bs);
+} else {
+return qemu_get_aio_context();
+}
+}
+
+static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb)
+{
+BlockBackendAIOCB *blk_acb = DO_UPCAST(BlockBackendAIOCB, common, acb);
+return blk_get_aio_context(blk_acb->blk);
 }
 
 void blk_set_aio_context(BlockBackend *blk, AioContext *new_context)
-- 
2.6.1




[Qemu-block] [PATCH v6 21/39] block: Prepare remaining BB functions for NULL BDS

2015-10-12 Thread Max Reitz
There are several BlockBackend functions which, in theory, cannot fail.
This patch makes them cope with the BlockDriverState pointer being NULL
by making them fall back to some default action like ignoring the value
in setters and returning the default in getters.

Signed-off-by: Max Reitz 
---
 block/block-backend.c | 72 +++
 1 file changed, 56 insertions(+), 16 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 2779c22..a5c58c5 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -677,7 +677,11 @@ int64_t blk_getlength(BlockBackend *blk)
 
 void blk_get_geometry(BlockBackend *blk, uint64_t *nb_sectors_ptr)
 {
-bdrv_get_geometry(blk->bs, nb_sectors_ptr);
+if (!blk->bs) {
+*nb_sectors_ptr = 0;
+} else {
+bdrv_get_geometry(blk->bs, nb_sectors_ptr);
+}
 }
 
 int64_t blk_nb_sectors(BlockBackend *blk)
@@ -813,7 +817,9 @@ int blk_flush_all(void)
 
 void blk_drain(BlockBackend *blk)
 {
-bdrv_drain(blk->bs);
+if (blk->bs) {
+bdrv_drain(blk->bs);
+}
 }
 
 void blk_drain_all(void)
@@ -909,6 +915,10 @@ int blk_is_read_only(BlockBackend *blk)
 
 int blk_is_sg(BlockBackend *blk)
 {
+if (!blk->bs) {
+return 0;
+}
+
 return bdrv_is_sg(blk->bs);
 }
 
@@ -956,12 +966,16 @@ bool blk_is_available(BlockBackend *blk)
 
 void blk_lock_medium(BlockBackend *blk, bool locked)
 {
-bdrv_lock_medium(blk->bs, locked);
+if (blk->bs) {
+bdrv_lock_medium(blk->bs, locked);
+}
 }
 
 void blk_eject(BlockBackend *blk, bool eject_flag)
 {
-bdrv_eject(blk->bs, eject_flag);
+if (blk->bs) {
+bdrv_eject(blk->bs, eject_flag);
+}
 }
 
 int blk_get_flags(BlockBackend *blk)
@@ -975,7 +989,11 @@ int blk_get_flags(BlockBackend *blk)
 
 int blk_get_max_transfer_length(BlockBackend *blk)
 {
-return blk->bs->bl.max_transfer_length;
+if (blk->bs) {
+return blk->bs->bl.max_transfer_length;
+} else {
+return 0;
+}
 }
 
 void blk_set_guest_block_size(BlockBackend *blk, int align)
@@ -990,22 +1008,32 @@ void *blk_blockalign(BlockBackend *blk, size_t size)
 
 bool blk_op_is_blocked(BlockBackend *blk, BlockOpType op, Error **errp)
 {
+if (!blk->bs) {
+return false;
+}
+
 return bdrv_op_is_blocked(blk->bs, op, errp);
 }
 
 void blk_op_unblock(BlockBackend *blk, BlockOpType op, Error *reason)
 {
-bdrv_op_unblock(blk->bs, op, reason);
+if (blk->bs) {
+bdrv_op_unblock(blk->bs, op, reason);
+}
 }
 
 void blk_op_block_all(BlockBackend *blk, Error *reason)
 {
-bdrv_op_block_all(blk->bs, reason);
+if (blk->bs) {
+bdrv_op_block_all(blk->bs, reason);
+}
 }
 
 void blk_op_unblock_all(BlockBackend *blk, Error *reason)
 {
-bdrv_op_unblock_all(blk->bs, reason);
+if (blk->bs) {
+bdrv_op_unblock_all(blk->bs, reason);
+}
 }
 
 AioContext *blk_get_aio_context(BlockBackend *blk)
@@ -1025,15 +1053,19 @@ static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb)
 
 void blk_set_aio_context(BlockBackend *blk, AioContext *new_context)
 {
-bdrv_set_aio_context(blk->bs, new_context);
+if (blk->bs) {
+bdrv_set_aio_context(blk->bs, new_context);
+}
 }
 
 void blk_add_aio_context_notifier(BlockBackend *blk,
 void (*attached_aio_context)(AioContext *new_context, void *opaque),
 void (*detach_aio_context)(void *opaque), void *opaque)
 {
-bdrv_add_aio_context_notifier(blk->bs, attached_aio_context,
-  detach_aio_context, opaque);
+if (blk->bs) {
+bdrv_add_aio_context_notifier(blk->bs, attached_aio_context,
+  detach_aio_context, opaque);
+}
 }
 
 void blk_remove_aio_context_notifier(BlockBackend *blk,
@@ -1042,23 +1074,31 @@ void blk_remove_aio_context_notifier(BlockBackend *blk,
  void (*detach_aio_context)(void *),
  void *opaque)
 {
-bdrv_remove_aio_context_notifier(blk->bs, attached_aio_context,
- detach_aio_context, opaque);
+if (blk->bs) {
+bdrv_remove_aio_context_notifier(blk->bs, attached_aio_context,
+ detach_aio_context, opaque);
+}
 }
 
 void blk_add_close_notifier(BlockBackend *blk, Notifier *notify)
 {
-bdrv_add_close_notifier(blk->bs, notify);
+if (blk->bs) {
+bdrv_add_close_notifier(blk->bs, notify);
+}
 }
 
 void blk_io_plug(BlockBackend *blk)
 {
-bdrv_io_plug(blk->bs);
+if (blk->bs) {
+bdrv_io_plug(blk->bs);
+}
 }
 
 void blk_io_unplug(BlockBackend *blk)
 {
-bdrv_io_unplug(blk->bs);
+if (blk->bs) {
+bdrv_io_unplug(blk->bs);
+}
 }
 
 BlockAcctStats *blk_get_stats(BlockBackend *blk)
-- 
2.6.1
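
Editor's note: the pattern this patch applies can be shown in isolation. The
following is a minimal sketch, not QEMU code: ToyBDS, ToyBackend,
toy_get_geometry() and toy_lock_medium() are hypothetical stand-ins for the
real types and functions. It only illustrates "return a default in getters,
ignore the value in setters" when the BDS pointer is NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for BlockDriverState and BlockBackend. */
typedef struct ToyBDS {
    uint64_t nb_sectors;
    int locked;
} ToyBDS;

typedef struct ToyBackend {
    ToyBDS *bs;   /* NULL when no BDS tree is attached */
} ToyBackend;

/* Getter: report a default (0 sectors) when there is no BDS. */
void toy_get_geometry(const ToyBackend *blk, uint64_t *nb_sectors_ptr)
{
    *nb_sectors_ptr = blk->bs ? blk->bs->nb_sectors : 0;
}

/* Setter: silently ignore the value when there is no BDS. */
void toy_lock_medium(ToyBackend *blk, int locked)
{
    if (blk->bs) {
        blk->bs->locked = locked;
    }
}
```

Callers of such functions have no error path, which is presumably why the
patch falls back to a default action instead of reporting a failure.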




[Qemu-block] [PATCH v6 18/39] block: Add BlockBackendRootState

2015-10-12 Thread Max Reitz
This structure will store some of the state of the root BDS if the BDS
tree is removed, so that state can be restored once a new BDS tree is
inserted.

Signed-off-by: Max Reitz 
---
 block/block-backend.c  | 40 
 include/block/block_int.h  | 10 ++
 include/qemu/typedefs.h|  1 +
 include/sysemu/block-backend.h |  2 ++
 4 files changed, 53 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 2708ad1..6a3f0c7 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -13,6 +13,7 @@
 #include "sysemu/block-backend.h"
 #include "block/block_int.h"
 #include "block/blockjob.h"
+#include "block/throttle-groups.h"
 #include "sysemu/blockdev.h"
 #include "sysemu/sysemu.h"
 #include "qapi-event.h"
@@ -37,6 +38,10 @@ struct BlockBackend {
 /* the block size for which the guest device expects atomicity */
 int guest_block_size;
 
+/* If the BDS tree is removed, some of its options are stored here (which
+ * can be used to restore those options in the new BDS on insert) */
+BlockBackendRootState root_state;
+
 /* I/O stats (display with "info blockstats"). */
 BlockAcctStats stats;
 
@@ -161,6 +166,10 @@ static void blk_delete(BlockBackend *blk)
 bdrv_unref(blk->bs);
 blk->bs = NULL;
 }
+if (blk->root_state.throttle_state) {
+g_free(blk->root_state.throttle_group);
+throttle_group_unref(blk->root_state.throttle_state);
+}
 /* Avoid double-remove after blk_hide_on_behalf_of_hmp_drive_del() */
 if (blk->name[0]) {
 QTAILQ_REMOVE(&blk_backends, blk, link);
@@ -1067,3 +1076,34 @@ int blk_probe_geometry(BlockBackend *blk, HDGeometry *geo)
 {
 return bdrv_probe_geometry(blk->bs, geo);
 }
+
+/*
+ * Updates the BlockBackendRootState object with data from the currently
+ * attached BlockDriverState.
+ */
+void blk_update_root_state(BlockBackend *blk)
+{
+assert(blk->bs);
+
+blk->root_state.open_flags= blk->bs->open_flags;
+blk->root_state.read_only = blk->bs->read_only;
+blk->root_state.detect_zeroes = blk->bs->detect_zeroes;
+
+if (blk->root_state.throttle_group) {
+g_free(blk->root_state.throttle_group);
+throttle_group_unref(blk->root_state.throttle_state);
+}
+if (blk->bs->throttle_state) {
+const char *name = throttle_group_get_name(blk->bs);
+blk->root_state.throttle_group = g_strdup(name);
+blk->root_state.throttle_state = throttle_group_incref(name);
+} else {
+blk->root_state.throttle_group = NULL;
+blk->root_state.throttle_state = NULL;
+}
+}
+
+BlockBackendRootState *blk_get_root_state(BlockBackend *blk)
+{
+return &blk->root_state;
+}
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 009d6ea..e472a03 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -26,6 +26,7 @@
 
 #include "block/accounting.h"
 #include "block/block.h"
+#include "block/throttle-groups.h"
 #include "qemu/option.h"
 #include "qemu/queue.h"
 #include "block/coroutine.h"
@@ -449,6 +450,15 @@ struct BlockDriverState {
 NotifierWithReturn write_threshold_notifier;
 };
 
+struct BlockBackendRootState {
+int open_flags;
+bool read_only;
+BlockdevDetectZeroesOptions detect_zeroes;
+
+char *throttle_group;
+ThrottleState *throttle_state;
+};
+
 static inline BlockDriverState *backing_bs(BlockDriverState *bs)
 {
 return bs->backing ? bs->backing->bs : NULL;
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index ee1ce1d..2e39751 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -11,6 +11,7 @@ typedef struct AddressSpace AddressSpace;
 typedef struct AioContext AioContext;
 typedef struct AudioState AudioState;
 typedef struct BlockBackend BlockBackend;
+typedef struct BlockBackendRootState BlockBackendRootState;
 typedef struct BlockDriverState BlockDriverState;
 typedef struct BusClass BusClass;
 typedef struct BusState BusState;
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index eafcef0..52e35a1 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -163,6 +163,8 @@ void blk_add_close_notifier(BlockBackend *blk, Notifier *notify);
 void blk_io_plug(BlockBackend *blk);
 void blk_io_unplug(BlockBackend *blk);
 BlockAcctStats *blk_get_stats(BlockBackend *blk);
+BlockBackendRootState *blk_get_root_state(BlockBackend *blk);
+void blk_update_root_state(BlockBackend *blk);
 
 void *blk_aio_get(const AIOCBInfo *aiocb_info, BlockBackend *blk,
   BlockCompletionFunc *cb, void *opaque);
-- 
2.6.1
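
Editor's note: a minimal sketch of the idea behind BlockBackendRootState,
assuming a toy reference-counted throttle group. All Toy* names and the
toy_* helpers are hypothetical and only mirror the shape of the patch: copy
the root BDS's options into the backend, and hold one reference on the
throttle group for as long as its name is stored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted throttle group. */
typedef struct ToyThrottleGroup {
    char name[32];
    int refcount;
} ToyThrottleGroup;

/* Hypothetical root BDS carrying the options we want to remember. */
typedef struct ToyBDS {
    int open_flags;
    bool read_only;
    ToyThrottleGroup *throttle;   /* NULL if throttling is disabled */
} ToyBDS;

/* Hypothetical counterpart of BlockBackendRootState. */
typedef struct ToyRootState {
    int open_flags;
    bool read_only;
    char *throttle_group;
    ToyThrottleGroup *throttle_state;
} ToyRootState;

static char *toy_strdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy) {
        memcpy(copy, s, n);
    }
    return copy;
}

/* Drop the stored group name and the reference held on the group. */
void toy_root_state_clear_throttle(ToyRootState *rs)
{
    if (rs->throttle_state) {
        free(rs->throttle_group);
        rs->throttle_state->refcount--;
        rs->throttle_group = NULL;
        rs->throttle_state = NULL;
    }
}

/* Snapshot the options of the currently attached root BDS. */
void toy_update_root_state(ToyRootState *rs, const ToyBDS *bs)
{
    rs->open_flags = bs->open_flags;
    rs->read_only = bs->read_only;

    toy_root_state_clear_throttle(rs);
    if (bs->throttle) {
        rs->throttle_group = toy_strdup(bs->throttle->name);
        rs->throttle_state = bs->throttle;
        bs->throttle->refcount++;
    }
}
```

Releasing the old reference before taking the new one keeps repeated updates
from leaking references, which is the property the sketch is meant to show.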




[Qemu-block] [PATCH v6 23/39] block: Prepare for NULL BDS

2015-10-12 Thread Max Reitz
blk_bs() will not necessarily return a non-NULL value any more, unless
blk_is_available() is true or a non-NULL value can otherwise be assumed
(e.g. because blk_bs() is called immediately after a successful
blk_new_with_bs() or blk_new_open()).

Signed-off-by: Max Reitz 
---
 block.c |   5 ++
 block/qapi.c|   4 +-
 blockdev.c  | 201 ++--
 hw/block/xen_disk.c |   4 +-
 migration/block.c   |   5 ++
 monitor.c   |   4 ++
 6 files changed, 153 insertions(+), 70 deletions(-)

diff --git a/block.c b/block.c
index 48f7067..e5f00e4 100644
--- a/block.c
+++ b/block.c
@@ -2673,6 +2673,11 @@ BlockDriverState *bdrv_lookup_bs(const char *device,
 blk = blk_by_name(device);
 
 if (blk) {
+if (!blk_bs(blk)) {
+error_setg(errp, "Device '%s' has no medium", device);
+return NULL;
+}
+
 return blk_bs(blk);
 }
 }
diff --git a/block/qapi.c b/block/qapi.c
index 3b46f97..ec0f513 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -306,12 +306,12 @@ static void bdrv_query_info(BlockBackend *blk, BlockInfo **p_info,
 info->io_status = blk_iostatus(blk);
 }
 
-if (!QLIST_EMPTY(&bs->dirty_bitmaps)) {
+if (bs && !QLIST_EMPTY(&bs->dirty_bitmaps)) {
 info->has_dirty_bitmaps = true;
 info->dirty_bitmaps = bdrv_query_dirty_bitmaps(bs);
 }
 
-if (bs->drv) {
+if (bs && bs->drv) {
 info->has_inserted = true;
 info->inserted = bdrv_block_device_info(bs, errp);
 if (info->inserted == NULL) {
diff --git a/blockdev.c b/blockdev.c
index 25959eb..35efe84 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -124,14 +124,16 @@ void blockdev_mark_auto_del(BlockBackend *blk)
 return;
 }
 
-aio_context = bdrv_get_aio_context(bs);
-aio_context_acquire(aio_context);
+if (bs) {
+aio_context = bdrv_get_aio_context(bs);
+aio_context_acquire(aio_context);
 
-if (bs->job) {
-block_job_cancel(bs->job);
-}
+if (bs->job) {
+block_job_cancel(bs->job);
+}
 
-aio_context_release(aio_context);
+aio_context_release(aio_context);
+}
 
 dinfo->auto_del = 1;
 }
@@ -229,8 +231,8 @@ bool drive_check_orphaned(void)
 dinfo->type != IF_NONE) {
 fprintf(stderr, "Warning: Orphaned drive without device: "
 "id=%s,file=%s,if=%s,bus=%d,unit=%d\n",
-blk_name(blk), blk_bs(blk)->filename, if_name[dinfo->type],
-dinfo->bus, dinfo->unit);
+blk_name(blk), blk_bs(blk) ? blk_bs(blk)->filename : "",
+if_name[dinfo->type], dinfo->bus, dinfo->unit);
 rs = true;
 }
 }
@@ -1040,6 +1042,10 @@ void hmp_commit(Monitor *mon, const QDict *qdict)
 monitor_printf(mon, "Device '%s' not found\n", device);
 return;
 }
+if (!blk_is_available(blk)) {
+monitor_printf(mon, "Device '%s' has no medium\n", device);
+return;
+}
 ret = bdrv_commit(blk_bs(blk));
 }
 if (ret < 0) {
@@ -1119,7 +1125,9 @@ SnapshotInfo *qmp_blockdev_snapshot_delete_internal_sync(const char *device,
   "Device '%s' not found", device);
 return NULL;
 }
-bs = blk_bs(blk);
+
+aio_context = blk_get_aio_context(blk);
+aio_context_acquire(aio_context);
 
 if (!has_id) {
 id = NULL;
@@ -1131,11 +1139,14 @@ SnapshotInfo *qmp_blockdev_snapshot_delete_internal_sync(const char *device,
 
 if (!id && !name) {
 error_setg(errp, "Name or id must be provided");
-return NULL;
+goto out_aio_context;
 }
 
-aio_context = bdrv_get_aio_context(bs);
-aio_context_acquire(aio_context);
+if (!blk_is_available(blk)) {
+error_setg(errp, "Device '%s' has no medium", device);
+goto out_aio_context;
+}
+bs = blk_bs(blk);
 
 if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_INTERNAL_SNAPSHOT_DELETE, errp)) {
 goto out_aio_context;
@@ -1309,16 +1320,16 @@ static void internal_snapshot_prepare(BlkTransactionState *common,
   "Device '%s' not found", device);
 return;
 }
-bs = blk_bs(blk);
 
 /* AioContext is released in .clean() */
-state->aio_context = bdrv_get_aio_context(bs);
+state->aio_context = blk_get_aio_context(blk);
 aio_context_acquire(state->aio_context);
 
-if (!bdrv_is_inserted(bs)) {
+if (!blk_is_available(blk)) {
 error_setg(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
 return;
 }
+bs = blk_bs(blk);
 
 if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_INTERNAL_SNAPSHOT, errp)) {
 return;
@@ -1570,7 +1581,6 @@ typedef struct DriveBackupState {
 static void drive_backup_prepare(BlkTransactionState *common, Error **errp)
 {
 DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common

[Qemu-block] [PATCH v6 11/39] hw/usb-storage: Check whether BB is inserted

2015-10-12 Thread Max Reitz
Only call bdrv_add_key() on the BlockDriverState if it is not NULL.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Alberto Garcia 
Reviewed-by: Kevin Wolf 
---
 hw/usb/dev-storage.c | 30 --
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/hw/usb/dev-storage.c b/hw/usb/dev-storage.c
index 9a4e7dc..597d8fd 100644
--- a/hw/usb/dev-storage.c
+++ b/hw/usb/dev-storage.c
@@ -613,20 +613,22 @@ static void usb_msd_realize_storage(USBDevice *dev, Error **errp)
 return;
 }
 
-bdrv_add_key(blk_bs(blk), NULL, &err);
-if (err) {
-if (monitor_cur_is_qmp()) {
-error_propagate(errp, err);
-return;
-}
-error_free(err);
-err = NULL;
-if (cur_mon) {
-monitor_read_bdrv_key_start(cur_mon, blk_bs(blk),
-usb_msd_password_cb, s);
-s->dev.auto_attach = 0;
-} else {
-autostart = 0;
+if (blk_bs(blk)) {
+bdrv_add_key(blk_bs(blk), NULL, &err);
+if (err) {
+if (monitor_cur_is_qmp()) {
+error_propagate(errp, err);
+return;
+}
+error_free(err);
+err = NULL;
+if (cur_mon) {
+monitor_read_bdrv_key_start(cur_mon, blk_bs(blk),
+usb_msd_password_cb, s);
+s->dev.auto_attach = 0;
+} else {
+autostart = 0;
+}
 }
 }
 
-- 
2.6.1




[Qemu-block] [PATCH v6 10/39] hw/block/fdc: Implement tray status

2015-10-12 Thread Max Reitz
The tray of an FDD is open iff there is no medium inserted (there are
only two states for an FDD: "medium inserted" or "no medium inserted").

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Kevin Wolf 
---
 hw/block/fdc.c   | 20 
 tests/fdc-test.c |  4 +---
 2 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index 6686a72..4292ece 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -192,6 +192,8 @@ typedef struct FDrive {
 uint8_t ro;   /* Is read-only   */
 uint8_t media_changed;/* Is media changed   */
 uint8_t media_rate;   /* Data rate of medium*/
+
+bool media_inserted;  /* Is there a medium in the tray */
 } FDrive;
 
 static void fd_init(FDrive *drv)
@@ -261,7 +263,7 @@ static int fd_seek(FDrive *drv, uint8_t head, uint8_t track, uint8_t sect,
 #endif
 drv->head = head;
 if (drv->track != track) {
-if (drv->blk != NULL && blk_is_inserted(drv->blk)) {
+if (drv->media_inserted) {
 drv->media_changed = 0;
 }
 ret = 1;
@@ -270,7 +272,7 @@ static int fd_seek(FDrive *drv, uint8_t head, uint8_t track, uint8_t sect,
 drv->sect = sect;
 }
 
-if (drv->blk == NULL || !blk_is_inserted(drv->blk)) {
+if (!drv->media_inserted) {
 ret = 2;
 }
 
@@ -296,7 +298,7 @@ static void fd_revalidate(FDrive *drv)
 ro = blk_is_read_only(drv->blk);
 pick_geometry(drv->blk, &nb_heads, &max_track,
   &last_sect, drv->drive, &drive, &rate);
-if (!blk_is_inserted(drv->blk)) {
+if (!drv->media_inserted) {
 FLOPPY_DPRINTF("No disk in drive\n");
 } else {
 FLOPPY_DPRINTF("Floppy disk (%d h %d t %d s) %s\n", nb_heads,
@@ -692,7 +694,7 @@ static bool fdrive_media_changed_needed(void *opaque)
 {
 FDrive *drive = opaque;
 
-return (drive->blk != NULL && drive->media_changed != 1);
+return (drive->media_inserted && drive->media_changed != 1);
 }
 
 static const VMStateDescription vmstate_fdrive_media_changed = {
@@ -2184,12 +2186,21 @@ static void fdctrl_change_cb(void *opaque, bool load)
 {
 FDrive *drive = opaque;
 
+drive->media_inserted = load && drive->blk && blk_is_inserted(drive->blk);
+
 drive->media_changed = 1;
 fd_revalidate(drive);
 }
 
+static bool fdctrl_is_tray_open(void *opaque)
+{
+FDrive *drive = opaque;
+return !drive->media_inserted;
+}
+
 static const BlockDevOps fdctrl_block_ops = {
 .change_media_cb = fdctrl_change_cb,
+.is_tray_open = fdctrl_is_tray_open,
 };
 
 /* Init functions */
@@ -2217,6 +2228,7 @@ static void fdctrl_connect_drives(FDCtrl *fdctrl, Error **errp)
 fdctrl_change_cb(drive, 0);
 if (drive->blk) {
 blk_set_dev_ops(drive->blk, &fdctrl_block_ops, drive);
+drive->media_inserted = blk_is_inserted(drive->blk);
 }
 }
 }
diff --git a/tests/fdc-test.c b/tests/fdc-test.c
index 416394f..b5a4696 100644
--- a/tests/fdc-test.c
+++ b/tests/fdc-test.c
@@ -304,9 +304,7 @@ static void test_media_insert(void)
 qmp_discard_response("{'execute':'change', 'arguments':{"
  " 'device':'floppy0', 'target': %s, 'arg': 'raw' }}",
  test_image);
-qmp_discard_response(""); /* ignore event
- (FIXME open -> open transition?!) */
-qmp_discard_response(""); /* ignore event */
+qmp_discard_response(""); /* ignore event (open -> close) */
 
 dir = inb(FLOPPY_BASE + reg_dir);
 assert_bit_set(dir, DSKCHG);
-- 
2.6.1




[Qemu-block] [PATCH v6 20/39] block: Fail requests to empty BlockBackend

2015-10-12 Thread Max Reitz
If there is no BlockDriverState in a BlockBackend or if the tray of the
guest device is open, fail all requests (where that is possible) with
-ENOMEDIUM.

The status of the guest device is taken into account because once the
guest device's tray is opened, any request on the same BlockBackend as
the guest uses should fail. If the BDS tree is supposed to be usable even
after ejecting it from the guest, a different BlockBackend must be used.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block/block-backend.c | 55 ++-
 1 file changed, 54 insertions(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index d790870..2779c22 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -529,7 +529,7 @@ static int blk_check_byte_request(BlockBackend *blk, int64_t offset,
 return -EIO;
 }
 
-if (!blk_is_inserted(blk)) {
+if (!blk_is_available(blk)) {
 return -ENOMEDIUM;
 }
 
@@ -668,6 +668,10 @@ int blk_pwrite(BlockBackend *blk, int64_t offset, const void *buf, int count)
 
 int64_t blk_getlength(BlockBackend *blk)
 {
+if (!blk_is_available(blk)) {
+return -ENOMEDIUM;
+}
+
 return bdrv_getlength(blk->bs);
 }
 
@@ -678,6 +682,10 @@ void blk_get_geometry(BlockBackend *blk, uint64_t *nb_sectors_ptr)
 
 int64_t blk_nb_sectors(BlockBackend *blk)
 {
+if (!blk_is_available(blk)) {
+return -ENOMEDIUM;
+}
+
 return bdrv_nb_sectors(blk->bs);
 }
 
@@ -708,6 +716,10 @@ BlockAIOCB *blk_aio_writev(BlockBackend *blk, int64_t sector_num,
 BlockAIOCB *blk_aio_flush(BlockBackend *blk,
   BlockCompletionFunc *cb, void *opaque)
 {
+if (!blk_is_available(blk)) {
+return abort_aio_request(blk, cb, opaque, -ENOMEDIUM);
+}
+
 return bdrv_aio_flush(blk->bs, cb, opaque);
 }
 
@@ -749,12 +761,20 @@ int blk_aio_multiwrite(BlockBackend *blk, BlockRequest *reqs, int num_reqs)
 
 int blk_ioctl(BlockBackend *blk, unsigned long int req, void *buf)
 {
+if (!blk_is_available(blk)) {
+return -ENOMEDIUM;
+}
+
 return bdrv_ioctl(blk->bs, req, buf);
 }
 
 BlockAIOCB *blk_aio_ioctl(BlockBackend *blk, unsigned long int req, void *buf,
   BlockCompletionFunc *cb, void *opaque)
 {
+if (!blk_is_available(blk)) {
+return abort_aio_request(blk, cb, opaque, -ENOMEDIUM);
+}
+
 return bdrv_aio_ioctl(blk->bs, req, buf, cb, opaque);
 }
 
@@ -770,11 +790,19 @@ int blk_co_discard(BlockBackend *blk, int64_t sector_num, int nb_sectors)
 
 int blk_co_flush(BlockBackend *blk)
 {
+if (!blk_is_available(blk)) {
+return -ENOMEDIUM;
+}
+
 return bdrv_co_flush(blk->bs);
 }
 
 int blk_flush(BlockBackend *blk)
 {
+if (!blk_is_available(blk)) {
+return -ENOMEDIUM;
+}
+
 return bdrv_flush(blk->bs);
 }
 
@@ -908,6 +936,11 @@ void blk_set_enable_write_cache(BlockBackend *blk, bool wce)
 
 void blk_invalidate_cache(BlockBackend *blk, Error **errp)
 {
+if (!blk->bs) {
+error_setg(errp, "Device '%s' has no medium", blk->name);
+return;
+}
+
 bdrv_invalidate_cache(blk->bs, errp);
 }
 
@@ -1063,6 +1096,10 @@ int blk_write_compressed(BlockBackend *blk, int64_t sector_num,
 
 int blk_truncate(BlockBackend *blk, int64_t offset)
 {
+if (!blk_is_available(blk)) {
+return -ENOMEDIUM;
+}
+
 return bdrv_truncate(blk->bs, offset);
 }
 
@@ -1079,21 +1116,37 @@ int blk_discard(BlockBackend *blk, int64_t sector_num, int nb_sectors)
 int blk_save_vmstate(BlockBackend *blk, const uint8_t *buf,
  int64_t pos, int size)
 {
+if (!blk_is_available(blk)) {
+return -ENOMEDIUM;
+}
+
 return bdrv_save_vmstate(blk->bs, buf, pos, size);
 }
 
 int blk_load_vmstate(BlockBackend *blk, uint8_t *buf, int64_t pos, int size)
 {
+if (!blk_is_available(blk)) {
+return -ENOMEDIUM;
+}
+
 return bdrv_load_vmstate(blk->bs, buf, pos, size);
 }
 
 int blk_probe_blocksizes(BlockBackend *blk, BlockSizes *bsz)
 {
+if (!blk_is_available(blk)) {
+return -ENOMEDIUM;
+}
+
 return bdrv_probe_blocksizes(blk->bs, bsz);
 }
 
 int blk_probe_geometry(BlockBackend *blk, HDGeometry *geo)
 {
+if (!blk_is_available(blk)) {
+return -ENOMEDIUM;
+}
+
 return bdrv_probe_geometry(blk->bs, geo);
 }
 
-- 
2.6.1




[Qemu-block] [PATCH v6 26/39] blockdev: Allow more options for BB-less BDS tree

2015-10-12 Thread Max Reitz
Most of the options which blockdev_init() parses for both the
BlockBackend and the root BDS are valid for just the root BDS as well
(e.g. read-only). This patch allows specifying these options even if not
creating a BlockBackend.

Signed-off-by: Max Reitz 
---
 blockdev.c | 160 ++---
 1 file changed, 154 insertions(+), 6 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index e0f04dd..69a6cb2 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -612,6 +612,65 @@ err_no_opts:
 return NULL;
 }
 
+static QemuOptsList qemu_root_bds_opts;
+
+/* Takes the ownership of bs_opts */
+static BlockDriverState *bds_tree_init(QDict *bs_opts, Error **errp)
+{
+BlockDriverState *bs;
+QemuOpts *opts;
+Error *local_error = NULL;
+ThrottleConfig cfg;
+BlockdevDetectZeroesOptions detect_zeroes;
+const char *throttling_group = NULL;
+int ret;
+int bdrv_flags = 0;
+
+opts = qemu_opts_create(&qemu_root_bds_opts, NULL, 1, errp);
+if (!opts) {
+goto fail;
+}
+
+qemu_opts_absorb_qdict(opts, bs_opts, &local_error);
+if (local_error) {
+error_propagate(errp, local_error);
+goto fail;
+}
+
+extract_common_blockdev_options(opts, &bdrv_flags, &cfg, &detect_zeroes,
+&throttling_group, &local_error);
+if (local_error) {
+error_propagate(errp, local_error);
+goto fail;
+}
+
+bs = NULL;
+ret = bdrv_open(&bs, NULL, NULL, bs_opts, bdrv_flags, errp);
+if (ret < 0) {
+goto fail_no_bs_opts;
+}
+
+bs->detect_zeroes = detect_zeroes;
+
+/* disk I/O throttling */
+if (throttle_enabled(&cfg)) {
+if (!throttling_group) {
+throttling_group = bdrv_get_node_name(bs);
+}
+bdrv_io_limits_enable(bs, throttling_group);
+bdrv_set_io_limits(bs, &cfg);
+}
+
+fail_no_bs_opts:
+qemu_opts_del(opts);
+return bs;
+
+fail:
+qemu_opts_del(opts);
+QDECREF(bs_opts);
+return NULL;
+}
+
 static void qemu_opt_rename(QemuOpts *opts, const char *from, const char *to,
 Error **errp)
 {
@@ -3170,18 +3229,14 @@ void qmp_blockdev_add(BlockdevOptions *options, Error **errp)
 
 bs = blk_bs(blk);
 } else {
-int ret;
-
 if (!qdict_get_try_str(qdict, "node-name")) {
 error_setg(errp, "'id' and/or 'node-name' need to be specified for "
"the root node");
 goto fail;
 }
 
-bs = NULL;
-ret = bdrv_open(&bs, NULL, NULL, qdict, BDRV_O_RDWR | BDRV_O_CACHE_WB,
-errp);
-if (ret < 0) {
+bs = bds_tree_init(qdict, errp);
+if (!bs) {
 goto fail;
 }
 }
@@ -3336,6 +3391,99 @@ QemuOptsList qemu_common_drive_opts = {
 },
 };
 
+static QemuOptsList qemu_root_bds_opts = {
+.name = "root-bds",
+.head = QTAILQ_HEAD_INITIALIZER(qemu_common_drive_opts.head),
+.desc = {
+{
+.name = "discard",
+.type = QEMU_OPT_STRING,
+.help = "discard operation (ignore/off, unmap/on)",
+},{
+.name = "cache.writeback",
+.type = QEMU_OPT_BOOL,
+.help = "enables writeback mode for any caches",
+},{
+.name = "cache.direct",
+.type = QEMU_OPT_BOOL,
+.help = "enables use of O_DIRECT (bypass the host page cache)",
+},{
+.name = "cache.no-flush",
+.type = QEMU_OPT_BOOL,
+.help = "ignore any flush requests for the device",
+},{
+.name = "aio",
+.type = QEMU_OPT_STRING,
+.help = "host AIO implementation (threads, native)",
+},{
+.name = "read-only",
+.type = QEMU_OPT_BOOL,
+.help = "open drive file as read-only",
+},{
+.name = "throttling.iops-total",
+.type = QEMU_OPT_NUMBER,
+.help = "limit total I/O operations per second",
+},{
+.name = "throttling.iops-read",
+.type = QEMU_OPT_NUMBER,
+.help = "limit read operations per second",
+},{
+.name = "throttling.iops-write",
+.type = QEMU_OPT_NUMBER,
+.help = "limit write operations per second",
+},{
+.name = "throttling.bps-total",
+.type = QEMU_OPT_NUMBER,
+.help = "limit total bytes per second",
+},{
+.name = "throttling.bps-read",
+.type = QEMU_OPT_NUMBER,
+.help = "limit read bytes per second",
+},{
+.name = "throttling.bps-write",
+.type = QEMU_OPT_NUMBER,
+.help = "limit write bytes per second",
+},{
+.name = "throttling.iops-total-max",
+.type = QEMU_OPT_NUMBER,
+.help = "I/O operations burst",
+},{
+  

[Qemu-block] [PATCH v6 24/39] blockdev: Do not create BDS for empty drive

2015-10-12 Thread Max Reitz
Do not use "rudimentary" BDSs for empty drives any longer (for
freshly created drives).

After a follow-up patch, empty drives will generally use a NULL BDS, not
only the freshly created drives.

Signed-off-by: Max Reitz 
---
 blockdev.c | 72 ++
 1 file changed, 44 insertions(+), 28 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 35efe84..845a1c1 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -514,16 +514,44 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
 goto early_err;
 }
 
+if (snapshot) {
+/* always use cache=unsafe with snapshot */
+bdrv_flags &= ~BDRV_O_CACHE_MASK;
+bdrv_flags |= (BDRV_O_SNAPSHOT|BDRV_O_CACHE_WB|BDRV_O_NO_FLUSH);
+}
+
+if (copy_on_read) {
+bdrv_flags |= BDRV_O_COPY_ON_READ;
+}
+
+if (runstate_check(RUN_STATE_INMIGRATE)) {
+bdrv_flags |= BDRV_O_INCOMING;
+}
+
+bdrv_flags |= ro ? 0 : BDRV_O_RDWR;
+
 /* init */
 if ((!file || !*file) && !has_driver_specific_opts) {
-blk = blk_new_with_bs(qemu_opts_id(opts), errp);
+BlockBackendRootState *blk_rs;
+
+blk = blk_new(qemu_opts_id(opts), errp);
 if (!blk) {
 goto early_err;
 }
 
-bs = blk_bs(blk);
-bs->open_flags = snapshot ? BDRV_O_SNAPSHOT : 0;
-bs->read_only = ro;
+blk_rs = blk_get_root_state(blk);
+blk_rs->open_flags= bdrv_flags;
+blk_rs->read_only = ro;
+blk_rs->detect_zeroes = detect_zeroes;
+
+if (throttle_enabled(&cfg)) {
+if (!throttling_group) {
+throttling_group = blk_name(blk);
+}
+blk_rs->throttle_group = g_strdup(throttling_group);
+blk_rs->throttle_state = throttle_group_incref(throttling_group);
+blk_rs->throttle_state->cfg = cfg;
+}
 
 QDECREF(bs_opts);
 } else {
@@ -531,42 +559,30 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
 file = NULL;
 }
 
-if (snapshot) {
-/* always use cache=unsafe with snapshot */
-bdrv_flags &= ~BDRV_O_CACHE_MASK;
-bdrv_flags |= (BDRV_O_SNAPSHOT|BDRV_O_CACHE_WB|BDRV_O_NO_FLUSH);
-}
-
-if (copy_on_read) {
-bdrv_flags |= BDRV_O_COPY_ON_READ;
-}
-
-bdrv_flags |= ro ? 0 : BDRV_O_RDWR;
-
 blk = blk_new_open(qemu_opts_id(opts), file, NULL, bs_opts, bdrv_flags,
errp);
 if (!blk) {
 goto err_no_bs_opts;
 }
 bs = blk_bs(blk);
-}
 
-bs->detect_zeroes = detect_zeroes;
+bs->detect_zeroes = detect_zeroes;
 
-blk_set_on_error(blk, on_read_error, on_write_error);
+/* disk I/O throttling */
+if (throttle_enabled(&cfg)) {
+if (!throttling_group) {
+throttling_group = blk_name(blk);
+}
+bdrv_io_limits_enable(bs, throttling_group);
+bdrv_set_io_limits(bs, &cfg);
+}
 
-/* disk I/O throttling */
-if (throttle_enabled(&cfg)) {
-if (!throttling_group) {
-throttling_group = blk_name(blk);
+if (bdrv_key_required(bs)) {
+autostart = 0;
 }
-bdrv_io_limits_enable(bs, throttling_group);
-bdrv_set_io_limits(bs, &cfg);
 }
 
-if (bdrv_key_required(bs)) {
-autostart = 0;
-}
+blk_set_on_error(blk, on_read_error, on_write_error);
 
 err_no_bs_opts:
 qemu_opts_del(opts);
-- 
2.6.1




[Qemu-block] [PATCH v6 22/39] block: Add blk_insert_bs()

2015-10-12 Thread Max Reitz
This function associates the given BlockDriverState with the given
BlockBackend.

Signed-off-by: Max Reitz 
---
 block/block-backend.c  | 11 +++
 include/sysemu/block-backend.h |  1 +
 2 files changed, 12 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index a5c58c5..19fdaae 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -334,6 +334,17 @@ void blk_hide_on_behalf_of_hmp_drive_del(BlockBackend *blk)
 }
 
 /*
+ * Associates a new BlockDriverState with @blk.
+ */
+void blk_insert_bs(BlockBackend *blk, BlockDriverState *bs)
+{
+assert(!blk->bs && !bs->blk);
+bdrv_ref(bs);
+blk->bs = bs;
+bs->blk = blk;
+}
+
+/*
  * Attach device model @dev to @blk.
  * Return 0 on success, -EBUSY when a device model is attached already.
  */
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 52e35a1..9306a52 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -72,6 +72,7 @@ BlockBackend *blk_by_name(const char *name);
 BlockBackend *blk_next(BlockBackend *blk);
 
 BlockDriverState *blk_bs(BlockBackend *blk);
+void blk_insert_bs(BlockBackend *blk, BlockDriverState *bs);
 
 void blk_hide_on_behalf_of_hmp_drive_del(BlockBackend *blk);
 
-- 
2.6.1




[Qemu-block] [PATCH v6 31/39] blockdev: Add blockdev-insert-medium

2015-10-12 Thread Max Reitz
Add the blockdev-insert-medium QMP command, along with a helper function
which directly takes a pointer to the BDS to be inserted instead of its
node-name (the helper will be used for implementing 'change' using
blockdev-insert-medium).

Signed-off-by: Max Reitz 
---
 blockdev.c   | 54 
 qapi/block-core.json | 17 +
 qmp-commands.hx  | 37 +++
 3 files changed, 108 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 6d0a5eb..706e7e1 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2161,6 +2161,60 @@ void qmp_blockdev_remove_medium(const char *device, Error **errp)
 }
 }
 
+static void qmp_blockdev_insert_anon_medium(const char *device,
+BlockDriverState *bs, Error **errp)
+{
+BlockBackend *blk;
+bool has_device;
+
+blk = blk_by_name(device);
+if (!blk) {
+error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
+  "Device '%s' not found", device);
+return;
+}
+
+/* For BBs without a device, we can exchange the BDS tree at will */
+has_device = blk_get_attached_dev(blk);
+
+if (has_device && !blk_dev_has_removable_media(blk)) {
+error_setg(errp, "Device '%s' is not removable", device);
+return;
+}
+
+if (has_device && !blk_dev_is_tray_open(blk)) {
+error_setg(errp, "Tray of device '%s' is not open", device);
+return;
+}
+
+if (blk_bs(blk)) {
+error_setg(errp, "There already is a medium in device '%s'", device);
+return;
+}
+
+blk_insert_bs(blk, bs);
+}
+
+void qmp_blockdev_insert_medium(const char *device, const char *node_name,
+Error **errp)
+{
+BlockDriverState *bs;
+
+bs = bdrv_find_node(node_name);
+if (!bs) {
+error_setg(errp, "Node '%s' not found", node_name);
+return;
+}
+
+if (bs->blk) {
+error_setg(errp, "Node '%s' is already in use by '%s'", node_name,
+   blk_name(bs->blk));
+return;
+}
+
+qmp_blockdev_insert_anon_medium(device, bs, errp);
+}
+
 /* throttling disk I/O limits */
 void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
int64_t bps_wr,
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 8edf5d9..81a1f19 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1930,6 +1930,23 @@
 { 'command': 'blockdev-remove-medium',
   'data': { 'device': 'str' } }
 
+##
+# @blockdev-insert-medium:
+#
+# Inserts a medium (a block driver state tree) into a block device. That block
+# device's tray must currently be open and there must be no medium inserted
+# already.
+#
+# @device:block device name
+#
+# @node-name: name of a node in the block driver state graph
+#
+# Since: 2.5
+##
+{ 'command': 'blockdev-insert-medium',
+  'data': { 'device': 'str',
+'node-name': 'str'} }
+
 
 ##
 # @BlockErrorAction
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 2d89e26..a9223ef 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -4040,6 +4040,43 @@ Example:
 EQMP
 
 {
+.name   = "blockdev-insert-medium",
+.args_type  = "device:s,node-name:s",
+.mhandler.cmd_new = qmp_marshal_blockdev_insert_medium,
+},
+
+SQMP
+blockdev-insert-medium
+--
+
+Inserts a medium (a block driver state tree) into a block device. That block
+device's tray must currently be open and there must be no medium inserted
+already.
+
+Arguments:
+
+- "device": block device name (json-string)
+- "node-name": root node of the BDS tree to insert into the block device
+
+Example:
+
+-> { "execute": "blockdev-add",
+ "arguments": { "options": { "node-name": "node0",
+ "driver": "raw",
+ "file": { "driver": "file",
+   "filename": "fedora.iso" } } } }
+
+<- { "return": {} }
+
+-> { "execute": "blockdev-insert-medium",
+ "arguments": { "device": "ide1-cd0",
+"node-name": "node0" } }
+
+<- { "return": {} }
+
+EQMP
+
+{
 .name   = "query-named-block-nodes",
 .args_type  = "",
 .mhandler.cmd_new = qmp_marshal_query_named_block_nodes,
-- 
2.6.1




[Qemu-block] [PATCH v6 35/39] qmp: Introduce blockdev-change-medium

2015-10-12 Thread Max Reitz
Introduce a new QMP command 'blockdev-change-medium' which is intended
to replace the 'change' command for block devices. The existing function
qmp_change_blockdev() is accordingly renamed to
qmp_blockdev_change_medium().

Signed-off-by: Max Reitz 
---
 blockdev.c|  7 ---
 include/sysemu/blockdev.h |  2 --
 qapi-schema.json  |  6 --
 qapi/block-core.json  | 23 +++
 qmp-commands.hx   | 31 +++
 qmp.c |  2 +-
 6 files changed, 63 insertions(+), 8 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index bcfc29d..4ca8a8d 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2108,8 +2108,9 @@ void qmp_blockdev_insert_medium(const char *device, const char *node_name,
 qmp_blockdev_insert_anon_medium(device, bs, errp);
 }
 
-void qmp_change_blockdev(const char *device, const char *filename,
- const char *format, Error **errp)
+void qmp_blockdev_change_medium(const char *device, const char *filename,
+bool has_format, const char *format,
+Error **errp)
 {
 BlockBackend *blk;
 BlockBackendRootState *blk_rs;
@@ -2133,7 +2134,7 @@ void qmp_change_blockdev(const char *device, const char *filename,
 bdrv_flags = blk_rs->read_only ? 0 : BDRV_O_RDWR;
 bdrv_flags |= blk_rs->open_flags & ~BDRV_O_RDWR;
 
-if (format) {
+if (has_format) {
 options = qdict_new();
 qdict_put(options, "driver", qstring_from_str(format));
 }
diff --git a/include/sysemu/blockdev.h b/include/sysemu/blockdev.h
index a00be94..b06a060 100644
--- a/include/sysemu/blockdev.h
+++ b/include/sysemu/blockdev.h
@@ -63,8 +63,6 @@ DriveInfo *drive_new(QemuOpts *arg, BlockInterfaceType block_default_type);
 
 /* device-hotplug */
 
-void qmp_change_blockdev(const char *device, const char *filename,
- const char *format, Error **errp);
 void hmp_commit(Monitor *mon, const QDict *qdict);
 void hmp_drive_del(Monitor *mon, const QDict *qdict);
 #endif
diff --git a/qapi-schema.json b/qapi-schema.json
index a386605..a9eda90 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1842,8 +1842,10 @@
 #  device's password.  The behavior of reads and writes to the block
 #  device between when these calls are executed is undefined.
 #
-# Notes:  It is strongly recommended that this interface is not used especially
-# for changing block devices.
+# Notes:  This interface is deprecated, and it is strongly recommended that you
+# avoid using it.  For changing block devices, use
+# blockdev-change-medium; for changing VNC parameters, use
+# change-vnc-password.
 #
 # Since: 0.14.0
 ##
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 81a1f19..b8cc18a 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1949,6 +1949,29 @@
 
 
 ##
+# @blockdev-change-medium:
+#
+# Changes the medium inserted into a block device by ejecting the current medium
+# and loading a new image file which is inserted as the new medium (this command
+# combines blockdev-open-tray, blockdev-remove-medium, blockdev-insert-medium
+# and blockdev-close-tray).
+#
+# @device:  block device name
+#
+# @filename:filename of the new image to be loaded
+#
+# @format:  #optional, format to open the new image with (defaults to
+#   the probed format)
+#
+# Since: 2.5
+##
+{ 'command': 'blockdev-change-medium',
+  'data': { 'device': 'str',
+            'filename': 'str',
+            '*format': 'str' } }
+
+
+##
 # @BlockErrorAction
 #
 # An enumeration of action that has been taken when a DISK I/O occurs
diff --git a/qmp-commands.hx b/qmp-commands.hx
index a9223ef..7a143a3 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -4139,6 +4139,37 @@ Example:
 EQMP
 
 {
+.name   = "blockdev-change-medium",
+.args_type  = "device:B,filename:F,format:s?",
+.mhandler.cmd_new = qmp_marshal_blockdev_change_medium,
+},
+
+SQMP
+blockdev-change-medium
+----------------------
+
+Changes the medium inserted into a block device by ejecting the current medium
+and loading a new image file which is inserted as the new medium.
+
+Arguments:
+
+- "device": device name (json-string)
+- "filename": filename of the new image (json-string)
+- "format": format of the new image (json-string, optional)
+
+Examples:
+
+1. Change a removable medium
+
+-> { "execute": "blockdev-change-medium",
+ "arguments": { "device": "ide1-cd0",
+"filename": "/srv/images/Fedora-12-x86_64-DVD.iso",
+"format": "raw" } }
+<- { "return": {} }
+
+EQMP
+
+{
 .name   = "query-memdev",
 .args_type  = "",
 .mhandler.cmd_new = qmp_marshal_query_memdev,
diff --git a/qmp.c b/qmp.c
index ff54e5a..4e44f98 100644
--- a/qmp.c
+++ b/qmp.c
@@ -414,7 +414,7 @@ void qm

[Qemu-block] [PATCH v6 30/39] blockdev: Add blockdev-remove-medium

2015-10-12 Thread Max Reitz
Signed-off-by: Max Reitz 
---
 blockdev.c   | 30 ++
 qapi/block-core.json | 15 +++
 qmp-commands.hx  | 45 +
 3 files changed, 90 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index a4ce1df..6d0a5eb 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2131,6 +2131,36 @@ void qmp_blockdev_close_tray(const char *device, Error **errp)
 blk_dev_change_media_cb(blk, true);
 }
 
+void qmp_blockdev_remove_medium(const char *device, Error **errp)
+{
+BlockBackend *blk;
+bool has_device;
+
+blk = blk_by_name(device);
+if (!blk) {
+error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
+  "Device '%s' not found", device);
+return;
+}
+
+/* For BBs without a device, we can exchange the BDS tree at will */
+has_device = blk_get_attached_dev(blk);
+
+if (has_device && !blk_dev_has_removable_media(blk)) {
+error_setg(errp, "Device '%s' is not removable", device);
+return;
+}
+
+if (has_device && !blk_dev_is_tray_open(blk)) {
+error_setg(errp, "Tray of device '%s' is not open", device);
+return;
+}
+
+if (blk_bs(blk)) {
+blk_remove_bs(blk);
+}
+}
+
 /* throttling disk I/O limits */
 void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
int64_t bps_wr,
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 1a51829..8edf5d9 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1915,6 +1915,21 @@
 { 'command': 'blockdev-close-tray',
   'data': { 'device': 'str' } }
 
+##
+# @blockdev-remove-medium:
+#
+# Removes a medium (a block driver state tree) from a block device. That block
+# device's tray must currently be open.
+#
+# If the tray is open and there is no medium inserted, this will be a no-op.
+#
+# @device: block device name
+#
+# Since: 2.5
+##
+{ 'command': 'blockdev-remove-medium',
+  'data': { 'device': 'str' } }
+
 
 ##
 # @BlockErrorAction
diff --git a/qmp-commands.hx b/qmp-commands.hx
index fae2d33..2d89e26 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -3995,6 +3995,51 @@ Example:
 EQMP
 
 {
+.name   = "blockdev-remove-medium",
+.args_type  = "device:s",
+.mhandler.cmd_new = qmp_marshal_blockdev_remove_medium,
+},
+
+SQMP
+blockdev-remove-medium
+----------------------
+
+Removes a medium (a block driver state tree) from a block device. That block
+device's tray must currently be open.
+
+If the tray is open and there is no medium inserted, this will be a no-op.
+
+Arguments:
+
+- "device": block device name (json-string)
+
+Example:
+
+-> { "execute": "blockdev-remove-medium",
+ "arguments": { "device": "ide1-cd0" } }
+
+<- { "error": { "class": "GenericError",
+"desc": "Tray of device 'ide1-cd0' is not open" } }
+
+-> { "execute": "blockdev-open-tray",
+ "arguments": { "device": "ide1-cd0" } }
+
+<- { "timestamp": { "seconds": 1418751627,
+"microseconds": 549958 },
+ "event": "DEVICE_TRAY_MOVED",
+ "data": { "device": "ide1-cd0",
+   "tray-open": true } }
+
+<- { "return": {} }
+
+-> { "execute": "blockdev-remove-medium",
+ "arguments": { "device": "ide1-cd0" } }
+
+<- { "return": {} }
+
+EQMP
+
+{
 .name   = "query-named-block-nodes",
 .args_type  = "",
 .mhandler.cmd_new = qmp_marshal_query_named_block_nodes,
-- 
2.6.1




[Qemu-block] [PATCH v6 34/39] block: Inquire tray state before tray-moved events

2015-10-12 Thread Max Reitz
blk_dev_change_media_cb() is called for all potential tray movements;
however, it is possible to request closing the tray without anything
actually happening (on a floppy disk drive without a medium).

Thus, the actual tray status should be inquired before sending a
tray-moved event (and an event should be sent whenever the status
changed).

Checking @load is now superfluous; it was necessary because it was
possible to change a medium without having explicitly opened the tray
and closed it again (or it might have been possible, at least). This is
no longer possible, though.
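
The before/after comparison described above can be illustrated with a
small stand-alone model. This is only a sketch: ToyDrive and the toy_*
functions are hypothetical stand-ins invented for this example, not
QEMU's BlockBackend API, and the "floppy without a medium ignores a
close request" behaviour is modelled by assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a drive; not QEMU's BlockBackend. */
typedef struct ToyDrive {
    bool has_medium;
    bool tray_open;
    int events_sent;       /* number of tray-moved events emitted */
    bool last_event_open;  /* "tray-open" payload of the last event */
} ToyDrive;

/* Models a floppy-like device callback: a request to close the tray
 * (load == true) has no effect while no medium is present. */
void toy_change_media_cb(ToyDrive *d, bool load)
{
    if (load) {
        if (d->has_medium) {
            d->tray_open = false;
        }
    } else {
        d->tray_open = true;
    }
}

/* Query the tray state before and after the callback and emit an
 * event only if the state actually changed. */
void toy_dev_change_media(ToyDrive *d, bool load)
{
    bool was_open = d->tray_open;

    toy_change_media_cb(d, load);

    bool is_open = d->tray_open;
    if (was_open != is_open) {
        d->events_sent++;
        d->last_event_open = is_open;
    }
}
```

With this model, a close request on an empty drive emits no event,
while the same request with a medium present emits exactly one event
carrying tray-open = false.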

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block/block-backend.c | 17 +++--
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index eb7409c..10e4d71 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -429,18 +429,15 @@ void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops,
 void blk_dev_change_media_cb(BlockBackend *blk, bool load)
 {
 if (blk->dev_ops && blk->dev_ops->change_media_cb) {
-bool tray_was_closed = !blk_dev_is_tray_open(blk);
+bool tray_was_open, tray_is_open;
 
+tray_was_open = blk_dev_is_tray_open(blk);
 blk->dev_ops->change_media_cb(blk->dev_opaque, load);
-if (tray_was_closed) {
-/* tray open */
-qapi_event_send_device_tray_moved(blk_name(blk),
-  true, &error_abort);
-}
-if (load) {
-/* tray close */
-qapi_event_send_device_tray_moved(blk_name(blk),
-  false, &error_abort);
+tray_is_open = blk_dev_is_tray_open(blk);
+
+if (tray_was_open != tray_is_open) {
+qapi_event_send_device_tray_moved(blk_name(blk), tray_is_open,
+  &error_abort);
 }
 }
 }
-- 
2.6.1




[Qemu-block] [PATCH v6 29/39] blockdev: Add blockdev-close-tray

2015-10-12 Thread Max Reitz
Signed-off-by: Max Reitz 
---
 blockdev.c   | 23 +++
 qapi/block-core.json | 16 
 qmp-commands.hx  | 35 +++
 3 files changed, 74 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index b90b1d6..a4ce1df 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2108,6 +2108,29 @@ out:
 }
 }
 
+void qmp_blockdev_close_tray(const char *device, Error **errp)
+{
+BlockBackend *blk;
+
+blk = blk_by_name(device);
+if (!blk) {
+error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
+  "Device '%s' not found", device);
+return;
+}
+
+if (!blk_dev_has_removable_media(blk)) {
+error_setg(errp, "Device '%s' is not removable", device);
+return;
+}
+
+if (!blk_dev_is_tray_open(blk)) {
+return;
+}
+
+blk_dev_change_media_cb(blk, true);
+}
+
 /* throttling disk I/O limits */
 void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
int64_t bps_wr,
diff --git a/qapi/block-core.json b/qapi/block-core.json
index b9b4a24..1a51829 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1899,6 +1899,22 @@
   'data': { 'device': 'str',
 '*force': 'bool' } }
 
+##
+# @blockdev-close-tray:
+#
+# Closes a block device's tray. If there is a block driver state tree associated
+# with the block device (which is currently ejected), that tree will be loaded
+# as the medium.
+#
+# If the tray was already closed before, this will be a no-op.
+#
+# @device: block device name
+#
+# Since: 2.5
+##
+{ 'command': 'blockdev-close-tray',
+  'data': { 'device': 'str' } }
+
 
 ##
 # @BlockErrorAction
diff --git a/qmp-commands.hx b/qmp-commands.hx
index f20681a..fae2d33 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -3960,6 +3960,41 @@ Example:
 EQMP
 
 {
+.name   = "blockdev-close-tray",
+.args_type  = "device:s",
+.mhandler.cmd_new = qmp_marshal_blockdev_close_tray,
+},
+
+SQMP
+blockdev-close-tray
+-------------------
+
+Closes a block device's tray. If there is a block driver state tree associated
+with the block device (which is currently ejected), that tree will be loaded as
+the medium.
+
+If the tray was already closed before, this will be a no-op.
+
+Arguments:
+
+- "device": block device name (json-string)
+
+Example:
+
+-> { "execute": "blockdev-close-tray",
+ "arguments": { "device": "ide1-cd0" } }
+
+<- { "timestamp": { "seconds": 1418751345,
+"microseconds": 272147 },
+ "event": "DEVICE_TRAY_MOVED",
+ "data": { "device": "ide1-cd0",
+   "tray-open": false } }
+
+<- { "return": {} }
+
+EQMP
+
+{
 .name   = "query-named-block-nodes",
 .args_type  = "",
 .mhandler.cmd_new = qmp_marshal_query_named_block_nodes,
-- 
2.6.1




[Qemu-block] [PATCH v6 32/39] blockdev: Implement eject with basic operations

2015-10-12 Thread Max Reitz
Implement 'eject' by calling blockdev-open-tray and
blockdev-remove-medium.
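
As a rough illustration of this composition, the following stand-alone
sketch models 'eject' as "open tray, then remove medium", stopping at
the first error. ToyCd and the toy_* functions are hypothetical names
for this example only; the locking rule is a simplification of the
"Device is locked" check in the old eject_device(), and errors are
reduced to static strings instead of QEMU's Error objects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical CD drive model; not a QEMU type. */
typedef struct ToyCd {
    bool locked;
    bool tray_open;
    bool has_medium;
} ToyCd;

/* Returns NULL on success, or a static error string. */
const char *toy_open_tray(ToyCd *cd, bool force)
{
    if (cd->tray_open) {
        return NULL;                 /* already open: nothing to do */
    }
    if (cd->locked && !force) {
        return "device is locked";   /* simplified locking rule */
    }
    cd->tray_open = true;
    return NULL;
}

const char *toy_remove_medium(ToyCd *cd)
{
    if (!cd->tray_open) {
        return "tray is not open";
    }
    cd->has_medium = false;          /* no-op if there was no medium */
    return NULL;
}

/* 'eject' expressed through the two basic operations. */
const char *toy_eject(ToyCd *cd, bool force)
{
    const char *err = toy_open_tray(cd, force);
    if (err) {
        return err;                  /* propagate the first failure */
    }
    return toy_remove_medium(cd);
}
```

In this model a locked drive rejects a non-forced eject and keeps its
medium, while a forced eject opens the tray and removes the medium.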

Signed-off-by: Max Reitz 
---
 blockdev.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 706e7e1..ff3b353 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1952,16 +1952,15 @@ out:
 
 void qmp_eject(const char *device, bool has_force, bool force, Error **errp)
 {
-BlockBackend *blk;
+Error *local_err = NULL;
 
-blk = blk_by_name(device);
-if (!blk) {
-error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
-  "Device '%s' not found", device);
+qmp_blockdev_open_tray(device, has_force, force, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
 return;
 }
 
-eject_device(blk, force, errp);
+qmp_blockdev_remove_medium(device, errp);
 }
 
 void qmp_block_passwd(bool has_device, const char *device,
-- 
2.6.1




[Qemu-block] [PATCH v6 17/39] block/throttle-groups: Make incref/decref public

2015-10-12 Thread Max Reitz
Throttle groups are not necessarily referenced by BDSs alone; a later
patch will essentially allow BBs to reference them, too. Make the
ref/unref functions public so that references can be properly accounted
for.

Their interface is slightly adjusted in that they return and take a
ThrottleState pointer, respectively, instead of a ThrottleGroup pointer.
Functionally, they are equivalent, but since ThrottleGroup is not meant
to be used outside of block/throttle-groups.c, ThrottleState is easier
to handle.
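
The pointer conversion this relies on can be sketched in isolation.
The example below is a simplified model, not the QEMU code: ToyGroup,
ToyState and toy_container_of are hypothetical names, the macro is a
plain offsetof-based variant, and the lookup by name, locking and
destruction at refcount zero are all left out.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of: recover the enclosing struct from a
 * pointer to one of its members. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Public handle handed out to callers. */
typedef struct ToyState {
    int cfg;
} ToyState;

/* Private group that embeds the public handle. */
typedef struct ToyGroup {
    char name[32];
    ToyState ts;
    unsigned refcount;
} ToyGroup;

/* Take a reference and return only the embedded ToyState. */
ToyState *toy_group_incref(ToyGroup *tg)
{
    tg->refcount++;
    return &tg->ts;
}

/* Drop a reference given only the ToyState; returns the new count. */
unsigned toy_group_unref(ToyState *ts)
{
    ToyGroup *tg = toy_container_of(ts, ToyGroup, ts);

    return --tg->refcount;
}
```

Callers outside the file only ever see ToyState pointers, yet the
group's reference count can still be reached from them.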

Signed-off-by: Max Reitz 
---
 block/throttle-groups.c | 19 +++
 include/block/throttle-groups.h |  3 +++
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index 1abc6fc..20cb216 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -76,9 +76,9 @@ static QTAILQ_HEAD(, ThrottleGroup) throttle_groups =
  * created.
  *
  * @name: the name of the ThrottleGroup
- * @ret:  the ThrottleGroup
+ * @ret:  the ThrottleState member of the ThrottleGroup
  */
-static ThrottleGroup *throttle_group_incref(const char *name)
+ThrottleState *throttle_group_incref(const char *name)
 {
 ThrottleGroup *tg = NULL;
 ThrottleGroup *iter;
@@ -108,7 +108,7 @@ static ThrottleGroup *throttle_group_incref(const char *name)
 
 qemu_mutex_unlock(&throttle_groups_lock);
 
-return tg;
+return &tg->ts;
 }
 
 /* Decrease the reference count of a ThrottleGroup.
@@ -116,10 +116,12 @@ static ThrottleGroup *throttle_group_incref(const char *name)
  * When the reference count reaches zero the ThrottleGroup is
  * destroyed.
  *
- * @tg:  The ThrottleGroup to unref
+ * @ts:  The ThrottleGroup to unref, given by its ThrottleState member
  */
-static void throttle_group_unref(ThrottleGroup *tg)
+void throttle_group_unref(ThrottleState *ts)
 {
+ThrottleGroup *tg = container_of(ts, ThrottleGroup, ts);
+
 qemu_mutex_lock(&throttle_groups_lock);
 if (--tg->refcount == 0) {
 QTAILQ_REMOVE(&throttle_groups, tg, list);
@@ -401,7 +403,8 @@ static void write_timer_cb(void *opaque)
 void throttle_group_register_bs(BlockDriverState *bs, const char *groupname)
 {
 int i;
-ThrottleGroup *tg = throttle_group_incref(groupname);
+ThrottleState *ts = throttle_group_incref(groupname);
+ThrottleGroup *tg = container_of(ts, ThrottleGroup, ts);
 int clock_type = QEMU_CLOCK_REALTIME;
 
 if (qtest_enabled()) {
@@ -409,7 +412,7 @@ void throttle_group_register_bs(BlockDriverState *bs, const char *groupname)
 clock_type = QEMU_CLOCK_VIRTUAL;
 }
 
-bs->throttle_state = &tg->ts;
+bs->throttle_state = ts;
 
 qemu_mutex_lock(&tg->lock);
 /* If the ThrottleGroup is new set this BlockDriverState as the token */
@@ -461,7 +464,7 @@ void throttle_group_unregister_bs(BlockDriverState *bs)
 throttle_timers_destroy(&bs->throttle_timers);
 qemu_mutex_unlock(&tg->lock);
 
-throttle_group_unref(tg);
+throttle_group_unref(&tg->ts);
 bs->throttle_state = NULL;
 }
 
diff --git a/include/block/throttle-groups.h b/include/block/throttle-groups.h
index fab113f..f3b75b3 100644
--- a/include/block/throttle-groups.h
+++ b/include/block/throttle-groups.h
@@ -30,6 +30,9 @@
 
 const char *throttle_group_get_name(BlockDriverState *bs);
 
+ThrottleState *throttle_group_incref(const char *name);
+void throttle_group_unref(ThrottleState *ts);
+
 void throttle_group_config(BlockDriverState *bs, ThrottleConfig *cfg);
 void throttle_group_get_config(BlockDriverState *bs, ThrottleConfig *cfg);
 
-- 
2.6.1




[Qemu-block] [PATCH v6 16/39] block: Move I/O status and error actions into BB

2015-10-12 Thread Max Reitz
These options are only relevant for the user of a whole BDS tree (like a
guest device or a block job) and should thus be moved into the
BlockBackend.

Signed-off-by: Max Reitz 
---
 block.c| 125 -
 block/backup.c |  17 --
 block/block-backend.c  | 116 --
 block/commit.c |   3 +-
 block/mirror.c |  17 --
 block/qapi.c   |   4 +-
 block/stream.c |   3 +-
 blockdev.c |   6 +-
 blockjob.c |   5 +-
 include/block/block.h  |  11 
 include/block/block_int.h  |   6 --
 include/sysemu/block-backend.h |   7 +++
 qmp.c  |   6 +-
 13 files changed, 158 insertions(+), 168 deletions(-)

diff --git a/block.c b/block.c
index 3e13b7f..48f7067 100644
--- a/block.c
+++ b/block.c
@@ -257,7 +257,6 @@ BlockDriverState *bdrv_new(void)
 for (i = 0; i < BLOCK_OP_TYPE_MAX; i++) {
 QLIST_INIT(&bs->op_blockers[i]);
 }
-bdrv_iostatus_disable(bs);
 notifier_list_init(&bs->close_notifiers);
 notifier_with_return_list_init(&bs->before_write_notifiers);
 qemu_co_queue_init(&bs->throttled_reqs[0]);
@@ -1995,14 +1994,6 @@ static void bdrv_move_feature_fields(BlockDriverState *bs_dest,
 
 bs_dest->enable_write_cache = bs_src->enable_write_cache;
 
-/* r/w error */
-bs_dest->on_read_error  = bs_src->on_read_error;
-bs_dest->on_write_error = bs_src->on_write_error;
-
-/* i/o status */
-bs_dest->iostatus_enabled   = bs_src->iostatus_enabled;
-bs_dest->iostatus   = bs_src->iostatus;
-
 /* dirty bitmap */
 bs_dest->dirty_bitmaps  = bs_src->dirty_bitmaps;
 }
@@ -2489,82 +2480,6 @@ void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr)
 *nb_sectors_ptr = nb_sectors < 0 ? 0 : nb_sectors;
 }
 
-void bdrv_set_on_error(BlockDriverState *bs, BlockdevOnError on_read_error,
-   BlockdevOnError on_write_error)
-{
-bs->on_read_error = on_read_error;
-bs->on_write_error = on_write_error;
-}
-
-BlockdevOnError bdrv_get_on_error(BlockDriverState *bs, bool is_read)
-{
-return is_read ? bs->on_read_error : bs->on_write_error;
-}
-
-BlockErrorAction bdrv_get_error_action(BlockDriverState *bs, bool is_read, int error)
-{
-BlockdevOnError on_err = is_read ? bs->on_read_error : bs->on_write_error;
-
-switch (on_err) {
-case BLOCKDEV_ON_ERROR_ENOSPC:
-return (error == ENOSPC) ?
-   BLOCK_ERROR_ACTION_STOP : BLOCK_ERROR_ACTION_REPORT;
-case BLOCKDEV_ON_ERROR_STOP:
-return BLOCK_ERROR_ACTION_STOP;
-case BLOCKDEV_ON_ERROR_REPORT:
-return BLOCK_ERROR_ACTION_REPORT;
-case BLOCKDEV_ON_ERROR_IGNORE:
-return BLOCK_ERROR_ACTION_IGNORE;
-default:
-abort();
-}
-}
-
-static void send_qmp_error_event(BlockDriverState *bs,
- BlockErrorAction action,
- bool is_read, int error)
-{
-IoOperationType optype;
-
-optype = is_read ? IO_OPERATION_TYPE_READ : IO_OPERATION_TYPE_WRITE;
-qapi_event_send_block_io_error(bdrv_get_device_name(bs), optype, action,
-   bdrv_iostatus_is_enabled(bs),
-   error == ENOSPC, strerror(error),
-   &error_abort);
-}
-
-/* This is done by device models because, while the block layer knows
- * about the error, it does not know whether an operation comes from
- * the device or the block layer (from a job, for example).
- */
-void bdrv_error_action(BlockDriverState *bs, BlockErrorAction action,
-   bool is_read, int error)
-{
-assert(error >= 0);
-
-if (action == BLOCK_ERROR_ACTION_STOP) {
-/* First set the iostatus, so that "info block" returns an iostatus
- * that matches the events raised so far (an additional error iostatus
- * is fine, but not a lost one).
- */
-bdrv_iostatus_set_err(bs, error);
-
-/* Then raise the request to stop the VM and the event.
- * qemu_system_vmstop_request_prepare has two effects.  First,
- * it ensures that the STOP event always comes after the
- * BLOCK_IO_ERROR event.  Second, it ensures that even if management
- * can observe the STOP event and do a "cont" before the STOP
- * event is issued, the VM will not stop.  In this case, vm_start()
- * also ensures that the STOP/RESUME pair of events is emitted.
- */
-qemu_system_vmstop_request_prepare();
-send_qmp_error_event(bs, action, is_read, error);
-qemu_system_vmstop_request(RUN_STATE_IO_ERROR);
-} else {
-send_qmp_error_event(bs, action, is_read, error);
-}
-}
-
 int bdrv_is_read_only(BlockDriverState *bs)
 {
 return bs->read_only;
@@ -3592,

[Qemu-block] [PATCH v6 27/39] block: Add blk_remove_bs()

2015-10-12 Thread Max Reitz
This function removes the BlockDriverState associated with the given
BlockBackend from that BB and sets the BDS pointer in the BB to NULL.

Signed-off-by: Max Reitz 
---
 block/block-backend.c  | 12 
 include/sysemu/block-backend.h |  1 +
 2 files changed, 13 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 19fdaae..eb7409c 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -334,6 +334,18 @@ void blk_hide_on_behalf_of_hmp_drive_del(BlockBackend *blk)
 }
 
 /*
+ * Disassociates the currently associated BlockDriverState from @blk.
+ */
+void blk_remove_bs(BlockBackend *blk)
+{
+blk_update_root_state(blk);
+
+blk->bs->blk = NULL;
+bdrv_unref(blk->bs);
+blk->bs = NULL;
+}
+
+/*
  * Associates a new BlockDriverState with @blk.
  */
 void blk_insert_bs(BlockBackend *blk, BlockDriverState *bs)
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 9306a52..14a6d32 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -72,6 +72,7 @@ BlockBackend *blk_by_name(const char *name);
 BlockBackend *blk_next(BlockBackend *blk);
 
 BlockDriverState *blk_bs(BlockBackend *blk);
+void blk_remove_bs(BlockBackend *blk);
 void blk_insert_bs(BlockBackend *blk, BlockDriverState *bs);
 
 void blk_hide_on_behalf_of_hmp_drive_del(BlockBackend *blk);
-- 
2.6.1




[Qemu-block] [PATCH v6 33/39] blockdev: Implement change with basic operations

2015-10-12 Thread Max Reitz
Implement 'change' on block devices by calling blockdev-open-tray,
blockdev-remove-medium, blockdev-insert-medium (a variation of that
which does not need a node-name) and blockdev-close-tray.

Signed-off-by: Max Reitz 
---
 blockdev.c | 180 +
 1 file changed, 74 insertions(+), 106 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index ff3b353..bcfc29d 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1915,41 +1915,6 @@ exit:
 }
 }
 
-
-static void eject_device(BlockBackend *blk, int force, Error **errp)
-{
-BlockDriverState *bs = blk_bs(blk);
-AioContext *aio_context;
-
-aio_context = blk_get_aio_context(blk);
-aio_context_acquire(aio_context);
-
-if (bs && bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_EJECT, errp)) {
-goto out;
-}
-if (!blk_dev_has_removable_media(blk)) {
-error_setg(errp, "Device '%s' is not removable",
-   bdrv_get_device_name(bs));
-goto out;
-}
-
-if (blk_dev_is_medium_locked(blk) && !blk_dev_is_tray_open(blk)) {
-blk_dev_eject_request(blk, force);
-if (!force) {
-error_setg(errp, "Device '%s' is locked",
-   bdrv_get_device_name(bs));
-goto out;
-}
-}
-
-if (bs) {
-bdrv_close(bs);
-}
-
-out:
-aio_context_release(aio_context);
-}
-
 void qmp_eject(const char *device, bool has_force, bool force, Error **errp)
 {
 Error *local_err = NULL;
@@ -1987,77 +1952,6 @@ void qmp_block_passwd(bool has_device, const char *device,
 aio_context_release(aio_context);
 }
 
-/* Assumes AioContext is held */
-static void qmp_bdrv_open_encrypted(BlockDriverState **pbs,
-const char *filename,
-int bdrv_flags, const char *format,
-const char *password, Error **errp)
-{
-BlockDriverState *bs;
-Error *local_err = NULL;
-QDict *options = NULL;
-int ret;
-
-if (format) {
-options = qdict_new();
-qdict_put(options, "driver", qstring_from_str(format));
-}
-
-ret = bdrv_open(pbs, filename, NULL, options, bdrv_flags, &local_err);
-if (ret < 0) {
-error_propagate(errp, local_err);
-return;
-}
-bs = *pbs;
-
-bdrv_add_key(bs, password, errp);
-}
-
-void qmp_change_blockdev(const char *device, const char *filename,
- const char *format, Error **errp)
-{
-BlockBackend *blk;
-BlockDriverState *bs;
-AioContext *aio_context;
-int bdrv_flags;
-bool new_bs;
-Error *err = NULL;
-
-blk = blk_by_name(device);
-if (!blk) {
-error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
-  "Device '%s' not found", device);
-return;
-}
-bs = blk_bs(blk);
-new_bs = !bs;
-
-aio_context = blk_get_aio_context(blk);
-aio_context_acquire(aio_context);
-
-eject_device(blk, 0, &err);
-if (err) {
-error_propagate(errp, err);
-goto out;
-}
-
-bdrv_flags = blk_is_read_only(blk) ? 0 : BDRV_O_RDWR;
-bdrv_flags |= blk_get_root_state(blk)->open_flags & ~BDRV_O_RDWR;
-
-qmp_bdrv_open_encrypted(&bs, filename, bdrv_flags, format, NULL, &err);
-if (err) {
-error_propagate(errp, err);
-} else if (new_bs) {
-blk_insert_bs(blk, bs);
-/* Has been sent automatically by bdrv_open() if blk_bs(blk) was not
- * NULL */
-blk_dev_change_media_cb(blk, true);
-}
-
-out:
-aio_context_release(aio_context);
-}
-
 void qmp_blockdev_open_tray(const char *device, bool has_force, bool force,
 Error **errp)
 {
@@ -2214,6 +2108,80 @@ void qmp_blockdev_insert_medium(const char *device, const char *node_name,
 qmp_blockdev_insert_anon_medium(device, bs, errp);
 }
 
+void qmp_change_blockdev(const char *device, const char *filename,
+ const char *format, Error **errp)
+{
+BlockBackend *blk;
+BlockBackendRootState *blk_rs;
+BlockDriverState *medium_bs = NULL;
+int bdrv_flags, ret;
+QDict *options = NULL;
+Error *err = NULL;
+
+blk = blk_by_name(device);
+if (!blk) {
+error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
+  "Device '%s' not found", device);
+goto fail;
+}
+
+if (blk_bs(blk)) {
+blk_update_root_state(blk);
+}
+
+blk_rs = blk_get_root_state(blk);
+bdrv_flags = blk_rs->read_only ? 0 : BDRV_O_RDWR;
+bdrv_flags |= blk_rs->open_flags & ~BDRV_O_RDWR;
+
+if (format) {
+options = qdict_new();
+qdict_put(options, "driver", qstring_from_str(format));
+}
+
+assert(!medium_bs);
+ret = bdrv_open(&medium_bs, filename, NULL, options, bdrv_flags, errp);
+if (ret < 0) {
+goto fail;
+}
+
+medium_bs->detect_zeroes = blk_rs->detect_zeroes;
+if (blk_rs->throttle_group) {
+bdrv_io_limit

[Qemu-block] [PATCH v6 25/39] blockdev: Pull out blockdev option extraction

2015-10-12 Thread Max Reitz
Extract some of the blockdev option extraction code from blockdev_init()
into its own function. This simplifies blockdev_init() and will allow
reusing the code in a different function added in a follow-up patch.

Signed-off-by: Max Reitz 
Reviewed-by: Alberto Garcia 
---
 blockdev.c | 209 +
 1 file changed, 113 insertions(+), 96 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 845a1c1..e0f04dd 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -350,25 +350,128 @@ static bool check_throttle_config(ThrottleConfig *cfg, Error **errp)
 
 typedef enum { MEDIA_DISK, MEDIA_CDROM } DriveMediaType;
 
+static void extract_common_blockdev_options(QemuOpts *opts, int *bdrv_flags,
+ThrottleConfig *throttle_cfg, BlockdevDetectZeroesOptions *detect_zeroes,
+const char **throttling_group, Error **errp)
+{
+const char *discard;
+Error *local_error = NULL;
+#ifdef CONFIG_LINUX_AIO
+const char *aio;
+#endif
+
+if (!qemu_opt_get_bool(opts, "read-only", false)) {
+*bdrv_flags |= BDRV_O_RDWR;
+}
+if (qemu_opt_get_bool(opts, "copy-on-read", false)) {
+*bdrv_flags |= BDRV_O_COPY_ON_READ;
+}
+
+if ((discard = qemu_opt_get(opts, "discard")) != NULL) {
+if (bdrv_parse_discard_flags(discard, bdrv_flags) != 0) {
+error_setg(errp, "Invalid discard option");
+return;
+}
+}
+
+if (qemu_opt_get_bool(opts, BDRV_OPT_CACHE_WB, true)) {
+*bdrv_flags |= BDRV_O_CACHE_WB;
+}
+if (qemu_opt_get_bool(opts, BDRV_OPT_CACHE_DIRECT, false)) {
+*bdrv_flags |= BDRV_O_NOCACHE;
+}
+if (qemu_opt_get_bool(opts, BDRV_OPT_CACHE_NO_FLUSH, false)) {
+*bdrv_flags |= BDRV_O_NO_FLUSH;
+}
+
+#ifdef CONFIG_LINUX_AIO
+if ((aio = qemu_opt_get(opts, "aio")) != NULL) {
+if (!strcmp(aio, "native")) {
+*bdrv_flags |= BDRV_O_NATIVE_AIO;
+} else if (!strcmp(aio, "threads")) {
+/* this is the default */
+} else {
+   error_setg(errp, "invalid aio option");
+   return;
+}
+}
+#endif
+
+/* disk I/O throttling */
+memset(throttle_cfg, 0, sizeof(*throttle_cfg));
+throttle_cfg->buckets[THROTTLE_BPS_TOTAL].avg =
+qemu_opt_get_number(opts, "throttling.bps-total", 0);
+throttle_cfg->buckets[THROTTLE_BPS_READ].avg  =
+qemu_opt_get_number(opts, "throttling.bps-read", 0);
+throttle_cfg->buckets[THROTTLE_BPS_WRITE].avg =
+qemu_opt_get_number(opts, "throttling.bps-write", 0);
+throttle_cfg->buckets[THROTTLE_OPS_TOTAL].avg =
+qemu_opt_get_number(opts, "throttling.iops-total", 0);
+throttle_cfg->buckets[THROTTLE_OPS_READ].avg =
+qemu_opt_get_number(opts, "throttling.iops-read", 0);
+throttle_cfg->buckets[THROTTLE_OPS_WRITE].avg =
+qemu_opt_get_number(opts, "throttling.iops-write", 0);
+
+throttle_cfg->buckets[THROTTLE_BPS_TOTAL].max =
+qemu_opt_get_number(opts, "throttling.bps-total-max", 0);
+throttle_cfg->buckets[THROTTLE_BPS_READ].max  =
+qemu_opt_get_number(opts, "throttling.bps-read-max", 0);
+throttle_cfg->buckets[THROTTLE_BPS_WRITE].max =
+qemu_opt_get_number(opts, "throttling.bps-write-max", 0);
+throttle_cfg->buckets[THROTTLE_OPS_TOTAL].max =
+qemu_opt_get_number(opts, "throttling.iops-total-max", 0);
+throttle_cfg->buckets[THROTTLE_OPS_READ].max =
+qemu_opt_get_number(opts, "throttling.iops-read-max", 0);
+throttle_cfg->buckets[THROTTLE_OPS_WRITE].max =
+qemu_opt_get_number(opts, "throttling.iops-write-max", 0);
+
+throttle_cfg->op_size =
+qemu_opt_get_number(opts, "throttling.iops-size", 0);
+
+*throttling_group = qemu_opt_get(opts, "throttling.group");
+
+if (!check_throttle_config(throttle_cfg, errp)) {
+return;
+}
+
+*detect_zeroes =
+qapi_enum_parse(BlockdevDetectZeroesOptions_lookup,
+qemu_opt_get(opts, "detect-zeroes"),
+BLOCKDEV_DETECT_ZEROES_OPTIONS_MAX,
+BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF,
+&local_error);
+if (local_error) {
+error_propagate(errp, local_error);
+return;
+}
+
+if (*detect_zeroes == BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP &&
+!(*bdrv_flags & BDRV_O_UNMAP))
+{
+error_setg(errp, "setting detect-zeroes to unmap is not allowed "
+ "without setting discard operation to unmap");
+return;
+}
+}
+
 /* Takes the ownership of bs_opts */
 static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
Error **errp)
 {
 const char *buf;
-int ro = 0;
 int bdrv_flags = 0;
 int on_read_error, on_write_error;
 BlockBackend *blk;
 BlockDriverState *bs;
 ThrottleConfig cfg;
 int snapshot = 0;
-bool copy_on_read;
 Error *error = NULL;
 QemuO

[Qemu-block] [PATCH v6 37/39] blockdev: read-only-mode for blockdev-change-medium

2015-10-12 Thread Max Reitz
Add an option to qmp_blockdev_change_medium() which allows changing the
read-only status of the block device whose medium is changed.

Some drives do not have an inherently fixed read-only status; for
instance, floppy disks can be set read-only or writable independently of
the drive. Some users may therefore find it useful to be able to change
the read-only status of a block device when changing the medium.
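
The mode-to-flags mapping can be modelled on its own as follows. This
is a sketch under assumptions: the TOY_* names and flag values are made
up for the example (they are not the BDRV_O_* constants), and only the
shape of the switch mirrors the one added to
qmp_blockdev_change_medium() below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical mirror of the three read-only modes. */
enum ToyReadOnlyMode { TOY_RETAIN, TOY_READ_ONLY, TOY_READ_WRITE };

/* Made-up flag values, for illustration only. */
#define TOY_O_RDWR    0x02
#define TOY_O_NOCACHE 0x20

/* Compute the open flags for the new medium: the mode decides the
 * read/write bit, all other saved flags are carried over unchanged. */
int toy_open_flags(enum ToyReadOnlyMode mode, bool was_read_only,
                   int saved_flags)
{
    int flags;

    switch (mode) {
    case TOY_RETAIN:
        flags = was_read_only ? 0 : TOY_O_RDWR;
        break;
    case TOY_READ_ONLY:
        flags = 0;
        break;
    case TOY_READ_WRITE:
        flags = TOY_O_RDWR;
        break;
    default:
        abort();  /* unknown mode */
    }

    return flags | (saved_flags & ~TOY_O_RDWR);
}
```

Note that the saved read/write bit is always masked out, so only the
selected mode (or, for 'retain', the previous read-only status) decides
whether the new medium is opened writable.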

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 blockdev.c   | 25 -
 hmp.c|  2 +-
 qapi/block-core.json | 24 +++-
 qmp-commands.hx  | 24 +++-
 qmp.c|  3 ++-
 5 files changed, 73 insertions(+), 5 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 4ca8a8d..2360c1f 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2110,6 +2110,8 @@ void qmp_blockdev_insert_medium(const char *device, const char *node_name,
 
 void qmp_blockdev_change_medium(const char *device, const char *filename,
 bool has_format, const char *format,
+bool has_read_only,
+BlockdevChangeReadOnlyMode read_only,
 Error **errp)
 {
 BlockBackend *blk;
@@ -2131,7 +2133,28 @@ void qmp_blockdev_change_medium(const char *device, const char *filename,
 }
 
 blk_rs = blk_get_root_state(blk);
-bdrv_flags = blk_rs->read_only ? 0 : BDRV_O_RDWR;
+
+if (!has_read_only) {
+read_only = BLOCKDEV_CHANGE_READ_ONLY_MODE_RETAIN;
+}
+
+switch (read_only) {
+case BLOCKDEV_CHANGE_READ_ONLY_MODE_RETAIN:
+bdrv_flags = blk_rs->read_only ? 0 : BDRV_O_RDWR;
+break;
+
+case BLOCKDEV_CHANGE_READ_ONLY_MODE_READ_ONLY:
+bdrv_flags = 0;
+break;
+
+case BLOCKDEV_CHANGE_READ_ONLY_MODE_READ_WRITE:
+bdrv_flags = BDRV_O_RDWR;
+break;
+
+default:
+abort();
+}
+
 bdrv_flags |= blk_rs->open_flags & ~BDRV_O_RDWR;
 
 if (has_format) {
diff --git a/hmp.c b/hmp.c
index b91821b..9e6b7e5 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1348,7 +1348,7 @@ void hmp_change(Monitor *mon, const QDict *qdict)
 }
 qmp_change("vnc", target, !!arg, arg, &err);
 } else {
-qmp_blockdev_change_medium(device, target, !!arg, arg, &err);
+qmp_blockdev_change_medium(device, target, !!arg, arg, false, 0, &err);
 if (err &&
 error_get_class(err) == ERROR_CLASS_DEVICE_ENCRYPTED) {
 error_free(err);
diff --git a/qapi/block-core.json b/qapi/block-core.json
index b8cc18a..5f12af7 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1949,6 +1949,24 @@
 
 
 ##
+# @BlockdevChangeReadOnlyMode:
+#
+# Specifies the new read-only mode of a block device subject to the
+# @blockdev-change-medium command.
+#
+# @retain:  Retains the current read-only mode
+#
+# @read-only:   Makes the device read-only
+#
+# @read-write:  Makes the device writable
+#
+# Since: 2.3
+##
+{ 'enum': 'BlockdevChangeReadOnlyMode',
+  'data': ['retain', 'read-only', 'read-write'] }
+
+
+##
 # @blockdev-change-medium:
 #
 # Changes the medium inserted into a block device by ejecting the current medium
@@ -1963,12 +1981,16 @@
 # @format:  #optional, format to open the new image with (defaults to
 #   the probed format)
 #
+# @read-only-mode:  #optional, change the read-only mode of the device; defaults
+#   to 'retain'
+#
 # Since: 2.5
 ##
 { 'command': 'blockdev-change-medium',
   'data': { 'device': 'str',
 'filename': 'str',
-'*format': 'str' } }
+'*format': 'str',
+'*read-only-mode': 'BlockdevChangeReadOnlyMode' } }
 
 
 ##
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 7a143a3..4f03d11 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -4140,7 +4140,7 @@ EQMP
 
 {
 .name   = "blockdev-change-medium",
-.args_type  = "device:B,filename:F,format:s?",
+.args_type  = "device:B,filename:F,format:s?,read-only-mode:s?",
 .mhandler.cmd_new = qmp_marshal_blockdev_change_medium,
 },
 
@@ -4156,6 +4156,8 @@ Arguments:
 - "device": device name (json-string)
 - "filename": filename of the new image (json-string)
 - "format": format of the new image (json-string, optional)
+- "read-only-mode": new read-only mode (json-string, optional)
+  - Possible values: "retain" (default), "read-only", "read-write"
 
 Examples:
 
@@ -4167,6 +4169,26 @@ Examples:
 "format": "raw" } }
 <- { "return": {} }
 
+2. Load a read-only medium into a writable drive
+
+-> { "execute": "blockdev-change-medium",
+ "arguments": { "device": "isa-fd0",
+"filename": "/srv/images/ro.img",
+"format": "raw",
+"read-only-mode": "retain" } }
+
+<- { "error":
+ { "class": "GenericError",
+  

[Qemu-block] [PATCH v6 28/39] blockdev: Add blockdev-open-tray

2015-10-12 Thread Max Reitz
Signed-off-by: Max Reitz 
---
 blockdev.c   | 49 +
 qapi/block-core.json | 23 +++
 qmp-commands.hx  | 39 +++
 3 files changed, 111 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 69a6cb2..b90b1d6 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2059,6 +2059,55 @@ out:
 aio_context_release(aio_context);
 }
 
+void qmp_blockdev_open_tray(const char *device, bool has_force, bool force,
+Error **errp)
+{
+BlockBackend *blk;
+BlockDriverState *bs;
+AioContext *aio_context = NULL;
+
+if (!has_force) {
+force = false;
+}
+
+blk = blk_by_name(device);
+if (!blk) {
+error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
+  "Device '%s' not found", device);
+return;
+}
+
+if (!blk_dev_has_removable_media(blk)) {
+error_setg(errp, "Device '%s' is not removable", device);
+return;
+}
+
+if (blk_dev_is_tray_open(blk)) {
+return;
+}
+
+bs = blk_bs(blk);
+if (bs) {
+aio_context = bdrv_get_aio_context(bs);
+aio_context_acquire(aio_context);
+
+if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_EJECT, errp)) {
+goto out;
+}
+}
+
+if (blk_dev_is_medium_locked(blk)) {
+blk_dev_eject_request(blk, force);
+} else {
+blk_dev_change_media_cb(blk, false);
+}
+
+out:
+if (aio_context) {
+aio_context_release(aio_context);
+}
+}
+
 /* throttling disk I/O limits */
 void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
int64_t bps_wr,
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 425fdab..b9b4a24 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1876,6 +1876,29 @@
 ##
 { 'command': 'blockdev-add', 'data': { 'options': 'BlockdevOptions' } }
 
+##
+# @blockdev-open-tray:
+#
+# Opens a block device's tray. If there is a block driver state tree inserted as
+# a medium, it will become inaccessible to the guest (but it will remain
+# associated to the block device, so closing the tray will make it accessible
+# again).
+#
+# If the tray was already open before, this will be a no-op.
+#
+# @device: block device name
+#
+# @force:  #optional if false (the default), an eject request will be sent to
+#  the guest if it has locked the tray (and the tray will not be opened
+#  immediately); if true, the tray will be opened regardless of whether
+#  it is locked
+#
+# Since: 2.5
+##
+{ 'command': 'blockdev-open-tray',
+  'data': { 'device': 'str',
+'*force': 'bool' } }
+
 
 ##
 # @BlockErrorAction
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 785ecf6..f20681a 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -3921,6 +3921,45 @@ Example (2):
 EQMP
 
 {
+.name   = "blockdev-open-tray",
+.args_type  = "device:s,force:b?",
+.mhandler.cmd_new = qmp_marshal_blockdev_open_tray,
+},
+
+SQMP
+blockdev-open-tray
+------------------
+
+Opens a block device's tray. If there is a block driver state tree inserted as a
+medium, it will become inaccessible to the guest (but it will remain associated
+to the block device, so closing the tray will make it accessible again).
+
+If the tray was already open before, this will be a no-op.
+
+Arguments:
+
+- "device": block device name (json-string)
+- "force": if false (the default), an eject request will be sent to the guest if
+   it has locked the tray (and the tray will not be opened immediately);
+   if true, the tray will be opened regardless of whether it is locked
+   (json-bool, optional)
+
+Example:
+
+-> { "execute": "blockdev-open-tray",
+ "arguments": { "device": "ide1-cd0" } }
+
+<- { "timestamp": { "seconds": 1418751016,
+"microseconds": 716996 },
+ "event": "DEVICE_TRAY_MOVED",
+ "data": { "device": "ide1-cd0",
+   "tray-open": true } }
+
+<- { "return": {} }
+
+EQMP
+
+{
 .name   = "query-named-block-nodes",
 .args_type  = "",
 .mhandler.cmd_new = qmp_marshal_query_named_block_nodes,
-- 
2.6.1
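
For anyone who wants to drive the new command from a script, here is a
minimal sketch of building the QMP message described in the schema above.
The helper name open_tray_command() is made up for illustration and is not
part of QEMU or iotests; only the command name and the "device" and "force"
arguments come from the patch.

```python
import json
from typing import Optional


def open_tray_command(device: str, force: Optional[bool] = None) -> str:
    """Serialize a blockdev-open-tray request (hypothetical helper).

    'force' is optional in the schema, so it is left out of the message
    unless the caller sets it explicitly; QEMU then applies its default
    (false).
    """
    arguments = {"device": device}
    if force is not None:
        arguments["force"] = force
    return json.dumps({"execute": "blockdev-open-tray",
                       "arguments": arguments})


if __name__ == "__main__":
    # Device name taken from the example in the patch.
    print(open_tray_command("ide1-cd0"))
    print(open_tray_command("ide1-cd0", force=True))
```

Leaving "force" out rather than sending false keeps the request identical to
the documented example, which only passes "device".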




[Qemu-block] [PATCH v6 38/39] hmp: Add read-only-mode option to change command

2015-10-12 Thread Max Reitz
Expose the new read-only-mode option of 'blockdev-change-medium' for the
'change' HMP command.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 hmp-commands.hx | 20 +---
 hmp.c   | 22 +-
 2 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 3a4ae39..814ea86 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -194,8 +194,8 @@ ETEXI
 
 {
 .name   = "change",
-.args_type  = "device:B,target:F,arg:s?",
-.params = "device filename [format]",
+.args_type  = "device:B,target:F,arg:s?,read-only-mode:s?",
+.params = "device filename [format [read-only-mode]]",
 .help   = "change a removable medium, optional format",
 .mhandler.cmd = hmp_change,
 },
@@ -206,7 +206,7 @@ STEXI
 Change the configuration of a device.
 
 @table @option
-@item change @var{diskdevice} @var{filename} [@var{format}]
+@item change @var{diskdevice} @var{filename} [@var{format} [@var{read-only-mode}]]
 Change the medium for a removable disk device to point to @var{filename}. eg
 
 @example
@@ -215,6 +215,20 @@ Change the medium for a removable disk device to point to @var{filename}. eg
 
 @var{format} is optional.
 
+@var{read-only-mode} may be used to change the read-only status of the device.
+It accepts the following values:
+
+@table @var
+@item retain
+Retains the current status; this is the default.
+
+@item read-only
+Makes the device read-only.
+
+@item read-write
+Makes the device writable.
+@end table
+
 @item change vnc @var{display},@var{options}
 Change the configuration of the VNC server. The valid syntax for @var{display}
 and @var{options} are described at @ref{sec_invocation}. eg
diff --git a/hmp.c b/hmp.c
index 9e6b7e5..28caa7d 100644
--- a/hmp.c
+++ b/hmp.c
@@ -27,6 +27,7 @@
 #include "qapi/opts-visitor.h"
 #include "qapi/qmp/qerror.h"
 #include "qapi/string-output-visitor.h"
+#include "qapi/util.h"
 #include "qapi-visit.h"
 #include "ui/console.h"
 #include "block/qapi.h"
@@ -1336,9 +1337,16 @@ void hmp_change(Monitor *mon, const QDict *qdict)
 const char *device = qdict_get_str(qdict, "device");
 const char *target = qdict_get_str(qdict, "target");
 const char *arg = qdict_get_try_str(qdict, "arg");
+const char *read_only = qdict_get_try_str(qdict, "read-only-mode");
+BlockdevChangeReadOnlyMode read_only_mode = 0;
 Error *err = NULL;
 
 if (strcmp(device, "vnc") == 0) {
+if (read_only) {
+monitor_printf(mon,
+   "Parameter 'read-only-mode' is invalid for VNC");
+return;
+}
 if (strcmp(target, "passwd") == 0 ||
 strcmp(target, "password") == 0) {
 if (!arg) {
@@ -1348,7 +1356,19 @@ void hmp_change(Monitor *mon, const QDict *qdict)
 }
 qmp_change("vnc", target, !!arg, arg, &err);
 } else {
-qmp_blockdev_change_medium(device, target, !!arg, arg, false, 0, &err);
+if (read_only) {
+read_only_mode =
+qapi_enum_parse(BlockdevChangeReadOnlyMode_lookup,
+read_only, BLOCKDEV_CHANGE_READ_ONLY_MODE_MAX,
+BLOCKDEV_CHANGE_READ_ONLY_MODE_RETAIN, &err);
+if (err) {
+hmp_handle_error(mon, &err);
+return;
+}
+}
+
+qmp_blockdev_change_medium(device, target, !!arg, arg,
+   !!read_only, read_only_mode, &err);
 if (err &&
 error_get_class(err) == ERROR_CLASS_DEVICE_ENCRYPTED) {
 error_free(err);
-- 
2.6.1
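
The way the three read-only-mode values turn into open flags may be easier
to see in isolation. The following is only a sketch that models the switch
in qmp_blockdev_change_medium() and the string parsing in hmp_change(); the
Python names are invented here, and the numeric value used for BDRV_O_RDWR
is an assumption for illustration rather than something this series defines.

```python
from enum import Enum
from typing import Optional

BDRV_O_RDWR = 0x0002  # assumed flag value, for illustration only


class ReadOnlyMode(Enum):
    RETAIN = "retain"
    READ_ONLY = "read-only"
    READ_WRITE = "read-write"


def parse_mode(text: Optional[str]) -> ReadOnlyMode:
    """Map the optional monitor argument to a mode; absent means 'retain'."""
    if text is None:
        return ReadOnlyMode.RETAIN
    return ReadOnlyMode(text)  # raises ValueError for an unknown string


def new_open_flags(was_read_only: bool, old_flags: int,
                   mode: ReadOnlyMode) -> int:
    """Compute the flags for the new medium, as the switch in the patch does."""
    if mode is ReadOnlyMode.RETAIN:
        rdwr = 0 if was_read_only else BDRV_O_RDWR
    elif mode is ReadOnlyMode.READ_ONLY:
        rdwr = 0
    else:
        rdwr = BDRV_O_RDWR
    # All other flags are inherited from the previous root state.
    return rdwr | (old_flags & ~BDRV_O_RDWR)


if __name__ == "__main__":
    print(hex(new_open_flags(True, 0x20, parse_mode("read-write"))))
```

Note that only the read-write bit is ever changed; everything else in the
old open flags carries over to the new medium unchanged.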




Re: [Qemu-block] [PATCH v7 1/5] block: check for existing device IDs in external_snapshot_prepare()

2015-10-12 Thread Max Reitz
On 12.10.2015 11:16, Alberto Garcia wrote:
> The 'snapshot-node-name' parameter of blockdev-snapshot-sync allows
> setting the node name of the image that is going to be created.
> 
> Before creating the image, external_snapshot_prepare() checks that the
> name is not already being used. The check is however incomplete since
> it only considers existing node names, but node names must not clash
> with device IDs either because they share the same namespace.
> 
> If the user attempts to create a snapshot using the name of an
> existing device for the 'snapshot-node-name' parameter the operation
> will eventually fail, but only after the new image has been created.
> 
> This patch replaces bdrv_find_node() with bdrv_lookup_bs() to extend
> the check to existing device IDs, and thus detect possible name
> clashes before the new image is created.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  blockdev.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)

Reviewed-by: Max Reitz 
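
To spell out why the lookup has to cover both kinds of names: device IDs and
node names live in one namespace, so a candidate name has to be checked
against both sets before the image is created. A small model of that check
follows; the function names and the two sets are hypothetical and only stand
in for what bdrv_lookup_bs() consults.

```python
from typing import Optional, Set


def name_in_use(name: str, device_ids: Set[str], node_names: Set[str]) -> bool:
    """True if 'name' clashes with a device ID or with a node name."""
    return name in device_ids or name in node_names


def check_snapshot_node_name(name: Optional[str], device_ids: Set[str],
                             node_names: Set[str]) -> None:
    """Reject a clashing name up front, before any image would be created."""
    if name is not None and name_in_use(name, device_ids, node_names):
        raise ValueError("New snapshot node name already in use")


if __name__ == "__main__":
    devices = {"virtio0"}
    nodes = {"node0"}
    check_snapshot_node_name("snap1", devices, nodes)  # accepted
    try:
        check_snapshot_node_name("virtio0", devices, nodes)
    except ValueError as err:
        print(err)
```

Checking only the node-name set, as the old code did, would let "virtio0"
through at this point and fail later, after the new image already exists.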





[Qemu-block] [PATCH v6 36/39] hmp: Use blockdev-change-medium for change command

2015-10-12 Thread Max Reitz
Use separate code paths for the two overloaded functions of the 'change'
HMP command, and invoke the 'blockdev-change-medium' QMP command if used
on a block device (by calling qmp_blockdev_change_medium()).

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 hmp.c | 27 +++
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/hmp.c b/hmp.c
index 5048eee..b91821b 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1338,22 +1338,25 @@ void hmp_change(Monitor *mon, const QDict *qdict)
 const char *arg = qdict_get_try_str(qdict, "arg");
 Error *err = NULL;
 
-if (strcmp(device, "vnc") == 0 &&
-(strcmp(target, "passwd") == 0 ||
- strcmp(target, "password") == 0)) {
-if (!arg) {
-monitor_read_password(mon, hmp_change_read_arg, NULL);
+if (strcmp(device, "vnc") == 0) {
+if (strcmp(target, "passwd") == 0 ||
+strcmp(target, "password") == 0) {
+if (!arg) {
+monitor_read_password(mon, hmp_change_read_arg, NULL);
+return;
+}
+}
+qmp_change("vnc", target, !!arg, arg, &err);
+} else {
+qmp_blockdev_change_medium(device, target, !!arg, arg, &err);
+if (err &&
+error_get_class(err) == ERROR_CLASS_DEVICE_ENCRYPTED) {
+error_free(err);
+monitor_read_block_device_key(mon, device, NULL, NULL);
 return;
 }
 }
 
-qmp_change(device, target, !!arg, arg, &err);
-if (err &&
-error_get_class(err) == ERROR_CLASS_DEVICE_ENCRYPTED) {
-error_free(err);
-monitor_read_block_device_key(mon, device, NULL, NULL);
-return;
-}
 hmp_handle_error(mon, &err);
 }
 
-- 
2.6.1




[Qemu-block] [PATCH v6 39/39] iotests: Add test for change-related QMP commands

2015-10-12 Thread Max Reitz
Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 tests/qemu-iotests/118 | 638 +
 tests/qemu-iotests/118.out |   5 +
 tests/qemu-iotests/group   |   1 +
 3 files changed, 644 insertions(+)
 create mode 100755 tests/qemu-iotests/118
 create mode 100644 tests/qemu-iotests/118.out

diff --git a/tests/qemu-iotests/118 b/tests/qemu-iotests/118
new file mode 100755
index 000..915e439
--- /dev/null
+++ b/tests/qemu-iotests/118
@@ -0,0 +1,638 @@
+#!/usr/bin/env python
+#
+# Test case for the QMP 'change' command and all other associated
+# commands
+#
+# Copyright (C) 2015 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+import os
+import stat
+import time
+import iotests
+from iotests import qemu_img
+
+old_img = os.path.join(iotests.test_dir, 'test0.img')
+new_img = os.path.join(iotests.test_dir, 'test1.img')
+
+class ChangeBaseClass(iotests.QMPTestCase):
+has_opened = False
+has_closed = False
+
+def process_events(self):
+for event in self.vm.get_qmp_events(wait=False):
+if (event['event'] == 'DEVICE_TRAY_MOVED' and
+event['data']['device'] == 'drive0'):
+if event['data']['tray-open'] == False:
+self.has_closed = True
+else:
+self.has_opened = True
+
+def wait_for_open(self):
+timeout = time.clock() + 3
+while not self.has_opened and time.clock() < timeout:
+self.process_events()
+if not self.has_opened:
+self.fail('Timeout while waiting for the tray to open')
+
+def wait_for_close(self):
+timeout = time.clock() + 3
+while not self.has_closed and time.clock() < timeout:
+self.process_events()
+if not self.has_closed:
+self.fail('Timeout while waiting for the tray to close')
+
+class GeneralChangeTestsBaseClass(ChangeBaseClass):
+def test_change(self):
+result = self.vm.qmp('change', device='drive0', target=new_img,
+   arg=iotests.imgfmt)
+self.assert_qmp(result, 'return', {})
+
+self.wait_for_open()
+self.wait_for_close()
+
+result = self.vm.qmp('query-block')
+self.assert_qmp(result, 'return[0]/tray_open', False)
+self.assert_qmp(result, 'return[0]/inserted/image/filename', new_img)
+
+def test_blockdev_change_medium(self):
+result = self.vm.qmp('blockdev-change-medium', device='drive0',
+   filename=new_img,
+   format=iotests.imgfmt)
+self.assert_qmp(result, 'return', {})
+
+self.wait_for_open()
+self.wait_for_close()
+
+result = self.vm.qmp('query-block')
+self.assert_qmp(result, 'return[0]/tray_open', False)
+self.assert_qmp(result, 'return[0]/inserted/image/filename', new_img)
+
+def test_eject(self):
+result = self.vm.qmp('eject', device='drive0', force=True)
+self.assert_qmp(result, 'return', {})
+
+self.wait_for_open()
+
+result = self.vm.qmp('query-block')
+self.assert_qmp(result, 'return[0]/tray_open', True)
+self.assert_qmp_absent(result, 'return[0]/inserted')
+
+def test_tray_eject_change(self):
+result = self.vm.qmp('eject', device='drive0', force=True)
+self.assert_qmp(result, 'return', {})
+
+self.wait_for_open()
+
+result = self.vm.qmp('query-block')
+self.assert_qmp(result, 'return[0]/tray_open', True)
+self.assert_qmp_absent(result, 'return[0]/inserted')
+
+result = self.vm.qmp('blockdev-change-medium', device='drive0',
+   filename=new_img,
+   format=iotests.imgfmt)
+self.assert_qmp(result, 'return', {})
+
+self.wait_for_close()
+
+result = self.vm.qmp('query-block')
+self.assert_qmp(result, 'return[0]/tray_open', False)
+self.assert_qmp(result, 'return[0]/inserted/image/filename', new_img)
+
+def test_tray_open_close(self):
+result = self.vm.qmp('blockdev-open-tray', device='drive0', force=True)
+self.assert_qmp(result, 'return', {})
+
+self.wait_for_open()
+
+r

Re: [Qemu-block] [PATCH v7 4/5] block: add a 'blockdev-snapshot' QMP command

2015-10-12 Thread Max Reitz
On 12.10.2015 11:16, Alberto Garcia wrote:
> One of the limitations of the 'blockdev-snapshot-sync' command is that
> it does not allow passing BlockdevOptions to the newly created
> snapshots, so they are always opened using the default values.
> 
> Extending the command to allow passing options is not a practical
> solution because there is overlap between those options and some of
> the existing parameters of the command.
> 
> This patch introduces a new 'blockdev-snapshot' command with a simpler
> interface: it just takes two references to existing block devices that
> will be used as the source and target for the snapshot.
> 
> Since the main difference between the two commands is that one of them
> creates and opens the target image, while the other uses an already
> opened one, the bulk of the implementation is shared.
> 
> Signed-off-by: Alberto Garcia 
> Cc: Eric Blake 
> Reviewed-by: Max Reitz 
> ---
>  blockdev.c   | 165 
> ---
>  qapi-schema.json |   2 +
>  qapi/block-core.json |  28 +
>  qmp-commands.hx  |  38 
>  4 files changed, 172 insertions(+), 61 deletions(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 12741a0..b5470c9 100644
> --- a/blockdev.c
> +++ b/blockdev.c

[...]

> @@ -1521,58 +1533,48 @@ typedef struct ExternalSnapshotState {

[...]

>  }
>  
>  /* start processing */
> -state->old_bs = bdrv_lookup_bs(has_device ? device : NULL,
> -   has_node_name ? node_name : NULL,
> -   &local_err);
> -if (local_err) {
> -error_propagate(errp, local_err);
> -return;
> -}
> -
> -if (has_node_name && !has_snapshot_node_name) {
> -error_setg(errp, "New snapshot node name missing");
> -return;
> -}
> -
> -if (has_snapshot_node_name &&
> -bdrv_lookup_bs(snapshot_node_name, snapshot_node_name, NULL)) {
> -error_setg(errp, "New snapshot node name already in use");

There's a difference from v6 here...

> +state->old_bs = bdrv_lookup_bs(device, node_name, errp);
> +if (!state->old_bs) {
>  return;
>  }
>  
> @@ -1602,35 +1604,70 @@ static void external_snapshot_prepare(BlkTransactionState *common,
>  return;
>  }
>  
> -flags = state->old_bs->open_flags;
> +if (action->kind == TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC) {
> +BlockdevSnapshotSync *s = action->blockdev_snapshot_sync;
> +const char *format = s->has_format ? s->format : "qcow2";
> +enum NewImageMode mode;
> +const char *snapshot_node_name =
> +s->has_snapshot_node_name ? s->snapshot_node_name : NULL;
>  
> -/* create new image w/backing file */
> -if (mode != NEW_IMAGE_MODE_EXISTING) {
> -bdrv_img_create(new_image_file, format,
> -state->old_bs->filename,
> -state->old_bs->drv->format_name,
> -NULL, -1, flags, &local_err, false);
> -if (local_err) {
> -error_propagate(errp, local_err);
> +if (node_name && !snapshot_node_name) {
> +error_setg(errp, "New snapshot node name missing");
>  return;
>  }
> -}
>  
> -options = qdict_new();
> -if (has_snapshot_node_name) {
> -qdict_put(options, "node-name",
> -  qstring_from_str(snapshot_node_name));
> +if (snapshot_node_name &&
> +bdrv_lookup_bs(snapshot_node_name, snapshot_node_name, NULL)) {
> +error_setg(errp, "New snapshot node name already in use");

...and here, but how to resolve the conflict resulting from the newly
added patch 1 was obvious, so my R-b stands, of course.

Anyway, this is not why I'm replying, that's further down:

> +return;
> +}
> +
> +flags = state->old_bs->open_flags;
> +
> +/* create new image w/backing file */
> +mode = s->has_mode ? s->mode : NEW_IMAGE_MODE_ABSOLUTE_PATHS;
> +if (mode != NEW_IMAGE_MODE_EXISTING) {
> +bdrv_img_create(new_image_file, format,
> +state->old_bs->filename,
> +state->old_bs->drv->format_name,
> +NULL, -1, flags, &local_err, false);
> +if (local_err) {
> +error_propagate(errp, local_err);
> +return;
> +}
> +}
> +
> +options = qdict_new();
> +if (s->has_snapshot_node_name) {
> +qdict_put(options, "node-name",
> +  qstring_from_str(snapshot_node_name));
> +}
> +qdict_put(options, "driver", qstring_from_str(format));
> +
> +flags |= BDRV_O_NO_BACKING;
>  }
> -qdict_put(options, "driver", qstring_from_str(format));
>  
> -/* TODO Inherit bs->options or only take explicit options with an
> - * extended QMP command? */

[Qemu-block] [PATCH v3 0/4]

2015-10-12 Thread Jeff Cody
Changes from v2:

Patch 1:  Fixed prototype for id_generate() (thanks Alberto)
  Used *const instead of * const (thanks Eric, Markus)
  Updated function comment (thanks Markus)
  Made random in range 0-99 instead of 0-98 (thanks, Markus)

Patch 2: Cleaned up comments (thanks Markus)
 use else if instead of nested if (thanks Markus)
 assign node_name on same line as gen_node_name (thanks Markus)

Patch 3,4: new - fix iotests (thanks Kevin)


Changes from RFC v1:

Patch 1: Several typos / grammatical errors (thanks Eric, John)
 Make id_subsys_str[] const pointer to const strings (thanks Eric)
 Moved id_subsys_str[] out from  id_generate() (thanks John)
 Assert on null string for given id (thanks Eric)
 Zero-pad the 2-digit random # (thanks John)

Patch 2: None

Born from the conversation on qemu-devel, this generation scheme uses the
format ultimately proposed by Kevin, after list discussion.

It attempts to keep the ID strings as small as possible, while fulfilling:

1.) Guarantee no collisions with a user-specified ID
2.) Identify the sub-system the ID belongs to
3.) Guarantee of uniqueness
4.) Spoiling predictability, to avoid creating an assumption
of object ordering and parsing (i.e., we don't want users to think
they can guess the next ID based on prior behavior).

See patch 1 for the generation scheme details.

Jeff Cody (4):
  util - add automated ID generation utility
  block: auto-generated node-names
  block: add filter for generated node-names
  qemu-iotests: update tests for generated node-names

 block.c  | 19 ---
 include/qemu-common.h|  8 
 tests/qemu-iotests/041   |  4 ++--
 tests/qemu-iotests/051   |  3 ++-
 tests/qemu-iotests/051.out   |  2 +-
 tests/qemu-iotests/067   |  3 ++-
 tests/qemu-iotests/067.out   |  5 +
 tests/qemu-iotests/081   |  3 ++-
 tests/qemu-iotests/081.out   |  2 +-
 tests/qemu-iotests/common.filter |  5 +
 util/id.c| 37 +
 11 files changed, 77 insertions(+), 14 deletions(-)

-- 
1.9.3




[Qemu-block] [PATCH v3 3/4] block: add filter for generated node-names

2015-10-12 Thread Jeff Cody
Signed-off-by: Jeff Cody 
---
 tests/qemu-iotests/common.filter | 5 +
 1 file changed, 5 insertions(+)

diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index d6d05de..cfdb633 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -128,6 +128,11 @@ _filter_date()
 -e 's/[A-Z][a-z][a-z] [A-z][a-z][a-z]  *[0-9][0-9]* [0-9][0-9]:[0-9][0-9]:[0-9][0-9] [0-9][0-9][0-9][0-9]$/DATE/'
 }
 
+_filter_generated_node_ids()
+{
+ sed -re 's/\#block[0-9]{3,}/NODE_NAME/'
+}
+
 # replace occurrences of the actual TEST_DIR value with TEST_DIR
 _filter_testdir()
 {
-- 
1.9.3
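
For the Python-based tests, which cannot use the shell filter directly, the
same substitution could be sketched as below. This is an illustration and
not part of the series; filter_generated_node_ids() is a hypothetical helper
name. It mirrors the sed expression, including the fact that without the
"g" flag sed only replaces the first match on each line.

```python
import re

# Same pattern as the sed expression: '#block' followed by 3 or more digits
# (at least one counter digit plus the two-digit random suffix).
_GENERATED_NODE_ID = re.compile(r"#block[0-9]{3,}")


def filter_generated_node_ids(text: str) -> str:
    """Replace the first generated node name on each line with NODE_NAME."""
    lines = text.split("\n")
    return "\n".join(_GENERATED_NODE_ID.sub("NODE_NAME", line, count=1)
                     for line in lines)


if __name__ == "__main__":
    print(filter_generated_node_ids(
        "ide0-hd0 (#block076): TEST_DIR/t.qcow2 (qcow2)"))
```

If a line could ever carry more than one generated name, dropping count=1
(and adding "g" on the sed side) would be needed to filter all of them.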




[Qemu-block] [PATCH v3 1/4] util - add automated ID generation utility

2015-10-12 Thread Jeff Cody
Multiple sub-systems in QEMU may find it useful to generate IDs
for objects that a user may reference via QMP or HMP.  This patch
presents a standardized way to do it, so that automatic ID generation
follows the same rules.

This patch enforces the following rules when generating an ID:

1.) Guarantee no collisions with a user-specified ID
2.) Identify the sub-system the ID belongs to
3.) Guarantee of uniqueness
4.) Spoiling predictability, to avoid creating an assumption
of object ordering and parsing (i.e., we don't want users to think
they can guess the next ID based on prior behavior).

The scheme for this is as follows (no spaces):

# subsys D RR
Reserved char --||   | |
Subsystem String |   | |
Unique number (64-bit) --| |
Two-digit random number ---|

For example, a generated node-name for the block sub-system may look
like this:

#block076

The caller of id_generate() is responsible for freeing the generated
node name string with g_free().

Reviewed-by: John Snow 
Reviewed-by: Eric Blake 
Reviewed-by: Alberto Garcia 
Signed-off-by: Jeff Cody 
---
 include/qemu-common.h |  8 
 util/id.c | 37 +
 2 files changed, 45 insertions(+)

diff --git a/include/qemu-common.h b/include/qemu-common.h
index 0bd212b..2f74540 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -246,6 +246,14 @@ int64_t qemu_strtosz_suffix_unit(const char *nptr, char **end,
 #define STR_OR_NULL(str) ((str) ? (str) : "null")
 
 /* id.c */
+
+typedef enum IdSubSystems {
+ID_QDEV,
+ID_BLOCK,
+ID_MAX  /* last element, used as array size */
+} IdSubSystems;
+
+char *id_generate(IdSubSystems id);
 bool id_wellformed(const char *id);
 
 /* path.c */
diff --git a/util/id.c b/util/id.c
index 09b22fb..bcc64d8 100644
--- a/util/id.c
+++ b/util/id.c
@@ -26,3 +26,40 @@ bool id_wellformed(const char *id)
 }
 return true;
 }
+
+#define ID_SPECIAL_CHAR '#'
+
+static const char *const id_subsys_str[] = {
+[ID_QDEV]  = "qdev",
+[ID_BLOCK] = "block",
+};
+
+/*
+ *  Generates an ID of the form PREFIX SUBSYSTEM NUMBER
+ *  where:
+ *
+ *  - PREFIX is the reserved character '#'
+ *  - SUBSYSTEM identifies the subsystem creating the ID
+ *  - NUMBER is a decimal number unique within SUBSYSTEM.
+ *
+ *Example: "#block146"
+ *
+ * Note that these IDs do not satisfy id_wellformed().
+ *
+ * The caller is responsible for freeing the returned string with g_free()
+ */
+char *id_generate(IdSubSystems id)
+{
+static uint64_t id_counters[ID_MAX];
+uint32_t rnd;
+
+assert(id < ID_MAX);
+assert(id_subsys_str[id]);
+
+rnd = g_random_int_range(0, 100);
+
+return g_strdup_printf("%c%s%" PRIu64 "%02" PRId32, ID_SPECIAL_CHAR,
+id_subsys_str[id],
+id_counters[id]++,
+rnd);
+}
-- 
1.9.3
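
As a quick way to play with the scheme outside of QEMU, here is a rough
Python model of what id_generate() does. It is a sketch, not the
implementation: the subsystem is passed as a plain string instead of the
IdSubSystems enum, and random.randrange() stands in for
g_random_int_range().

```python
import random
from collections import defaultdict

ID_SPECIAL_CHAR = "#"  # reserved prefix from the patch

# One counter per subsystem, standing in for the static id_counters[] array.
_id_counters = defaultdict(int)


def id_generate(subsys: str) -> str:
    """Return '#' + subsystem + per-subsystem counter + 2-digit random number."""
    if not subsys:
        raise ValueError("subsystem string must not be empty")
    counter = _id_counters[subsys]
    _id_counters[subsys] += 1
    rnd = random.randrange(0, 100)  # 0..99 inclusive, upper bound excluded
    return f"{ID_SPECIAL_CHAR}{subsys}{counter}{rnd:02d}"


if __name__ == "__main__":
    print(id_generate("block"))
    print(id_generate("block"))
    print(id_generate("qdev"))
```

The counter is what provides uniqueness within a subsystem; the zero-padded
random suffix is only there to keep the next ID from being guessable.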




[Qemu-block] [PATCH v3 2/4] block: auto-generated node-names

2015-10-12 Thread Jeff Cody
If a node-name is not specified, automatically generate the node-name.

Generated node-names will use the "block" sub-system identifier.

Reviewed-by: Eric Blake 
Reviewed-by: John Snow 
Signed-off-by: Jeff Cody 
---
 block.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/block.c b/block.c
index 1f90b47..5947704 100644
--- a/block.c
+++ b/block.c
@@ -763,12 +763,15 @@ static void bdrv_assign_node_name(BlockDriverState *bs,
   const char *node_name,
   Error **errp)
 {
+char *gen_node_name = NULL;
+
 if (!node_name) {
-return;
-}
-
-/* Check for empty string or invalid characters */
-if (!id_wellformed(node_name)) {
+node_name = gen_node_name = id_generate(ID_BLOCK);
+} else if (!id_wellformed(node_name)) {
+/*
+ * Check for empty string or invalid characters, but not if it is
+ * generated (generated names use characters not available to the user)
+ */
 error_setg(errp, "Invalid node name");
 return;
 }
@@ -777,18 +780,20 @@ static void bdrv_assign_node_name(BlockDriverState *bs,
 if (blk_by_name(node_name)) {
 error_setg(errp, "node-name=%s is conflicting with a device id",
node_name);
-return;
+goto out;
 }
 
 /* takes care of avoiding duplicates node names */
 if (bdrv_find_node(node_name)) {
 error_setg(errp, "Duplicate node name");
-return;
+goto out;
 }
 
 /* copy node name into the bs and insert it into the graph list */
 pstrcpy(bs->node_name, sizeof(bs->node_name), node_name);
 QTAILQ_INSERT_TAIL(&graph_bdrv_states, bs, node_list);
+out:
+g_free(gen_node_name);
 }
 
 static QemuOptsList bdrv_runtime_opts = {
-- 
1.9.3




[Qemu-block] [PATCH v3 4/4] qemu-iotests: update tests for generated node-names

2015-10-12 Thread Jeff Cody
Signed-off-by: Jeff Cody 
---
 tests/qemu-iotests/041 | 4 ++--
 tests/qemu-iotests/051 | 3 ++-
 tests/qemu-iotests/051.out | 2 +-
 tests/qemu-iotests/067 | 3 ++-
 tests/qemu-iotests/067.out | 5 +
 tests/qemu-iotests/081 | 3 ++-
 tests/qemu-iotests/081.out | 2 +-
 7 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index 59c1a76..05b5962 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -780,7 +780,7 @@ class TestRepairQuorum(iotests.QMPTestCase):
 # here we check that the last registered quorum file has not been
 # swapped out and unref
 result = self.vm.qmp('query-named-block-nodes')
-self.assert_qmp(result, 'return[0]/file', quorum_img3)
+self.assert_qmp(result, 'return[1]/file', quorum_img3)
 self.vm.shutdown()
 
 def test_cancel_after_ready(self):
@@ -799,7 +799,7 @@ class TestRepairQuorum(iotests.QMPTestCase):
 result = self.vm.qmp('query-named-block-nodes')
 # here we check that the last registered quorum file has not been
 # swapped out and unref
-self.assert_qmp(result, 'return[0]/file', quorum_img3)
+self.assert_qmp(result, 'return[1]/file', quorum_img3)
 self.vm.shutdown()
 self.assertTrue(iotests.compare_images(quorum_img2, quorum_repair_img),
 'target image does not match source after mirroring')
diff --git a/tests/qemu-iotests/051 b/tests/qemu-iotests/051
index 4a8055b..17dbf04 100755
--- a/tests/qemu-iotests/051
+++ b/tests/qemu-iotests/051
@@ -108,7 +108,8 @@ echo
 echo === Overriding backing file ===
 echo
 
-echo "info block" | run_qemu -drive file="$TEST_IMG",driver=qcow2,backing.file.filename="$TEST_IMG.orig" -nodefaults
+echo "info block" | run_qemu -drive file="$TEST_IMG",driver=qcow2,backing.file.filename="$TEST_IMG.orig" -nodefaults\
+  | _filter_generated_node_ids
 
 # Drivers that don't support backing files
 run_qemu -drive file="$TEST_IMG",driver=raw,backing.file.filename="$TEST_IMG.orig"
diff --git a/tests/qemu-iotests/051.out b/tests/qemu-iotests/051.out
index 0429be2..7765aa0 100644
--- a/tests/qemu-iotests/051.out
+++ b/tests/qemu-iotests/051.out
@@ -59,7 +59,7 @@ QEMU X.Y.Z monitor - type 'help' for more information
 Testing: -drive file=TEST_DIR/t.qcow2,driver=qcow2,backing.file.filename=TEST_DIR/t.qcow2.orig -nodefaults
 QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) iininfinfoinfo 
info binfo 
blinfo bloinfo 
blocinfo block
-ide0-hd0: TEST_DIR/t.qcow2 (qcow2)
+ide0-hd0 (NODE_NAME): TEST_DIR/t.qcow2 (qcow2)
 Cache mode:   writeback
 Backing file: TEST_DIR/t.qcow2.orig (chain depth: 1)
 (qemu) qququiquit
diff --git a/tests/qemu-iotests/067 b/tests/qemu-iotests/067
index 3e9a053..3788534 100755
--- a/tests/qemu-iotests/067
+++ b/tests/qemu-iotests/067
@@ -48,7 +48,8 @@ function do_run_qemu()
 function run_qemu()
 {
 do_run_qemu "$@" 2>&1 | _filter_testdir | _filter_qmp | _filter_qemu \
-  | sed -e 's/\("actual-size":\s*\)[0-9]\+/\1SIZE/g'
+  | sed -e 's/\("actual-size":\s*\)[0-9]\+/\1SIZE/g' \
+  | _filter_generated_node_ids
 }
 
 size=128M
diff --git a/tests/qemu-iotests/067.out b/tests/qemu-iotests/067.out
index 5fbc881..27ad56f 100644
--- a/tests/qemu-iotests/067.out
+++ b/tests/qemu-iotests/067.out
@@ -40,6 +40,7 @@ Testing: -drive file=TEST_DIR/t.qcow2,format=qcow2,if=none,id=disk -device virti
 },
 "iops_wr": 0,
 "ro": false,
+"node-name": "NODE_NAME",
 "backing_file_depth": 0,
 "drv": "qcow2",
 "iops": 0,
@@ -151,6 +152,7 @@ Testing: -drive file=TEST_DIR/t.qcow2,format=qcow2,if=none,id=disk
 },
 "iops_wr": 0,
 "ro": false,
+"node-name": "NODE_NAME",
 "backing_file_depth": 0,
 "drv": "qcow2",
 "iops": 0,
@@ -270,6 +272,7 @@ Testing:
 },
 "iops_wr": 0,
 "ro": false,
+"node-name": "NODE_NAME",
 "backing_file_depth": 0,
 "drv": "qcow2",
 "iops": 0,
@@ -390,6 +393,7 @@ Testing:
 },
 "iops_wr": 0,
 "ro": false,
+"node-name": "NODE_NAME",
 "backing_file_depth": 0,
 "drv": "qcow2",
 "iops": 0,
@@ -480,6 +484,7 @@ Testing:
 },
 "iops_wr": 0,
 "ro": false,
+"node-name": "NODE_NAME",
 "backing_file_depth": 0,
  
