[Qemu-block] does blk_commit_all need blk_all_next?

2016-10-24 Thread Paolo Bonzini
blk_commit_all is in block/block-backend.c only because it uses
blk_all_next.  This is also the only reason why it needs a stub
(stubs/blk-commit-all.c).

The only difference between blk_next and blk_all_next is that the latter
"iterates over all BlockBackends, even the ones which are hidden (i.e.
are not referenced by the monitor)".  Should blk_commit_all really
iterate over BlockBackends such as the NBD server or the block jobs'?

Thanks,

Paolo



Re: [Qemu-block] [Qemu-devel] [PATCH v14 02/21] qdict: implement a qdict_crumple method for un-flattening a dict

2016-10-24 Thread Markus Armbruster
Max Reitz  writes:

> On 21.10.2016 11:58, Markus Armbruster wrote:
>> "Daniel P. Berrange"  writes:
>> 
>>> On Tue, Oct 18, 2016 at 04:32:13PM +0200, Markus Armbruster wrote:
 "Daniel P. Berrange"  writes:

> The qdict_flatten() method will take a dict whose elements are
> further nested dicts/lists and flatten them by concatenating
> keys.
>
> The qdict_crumple() method aims to do the reverse, taking a flat
> qdict, and turning it into a set of nested dicts/lists. It will
> apply nesting based on the key name, with a '.' indicating a
> new level in the hierarchy. If the keys in the nested structure
> are all numeric, it will create a list, otherwise it will create
> a dict.
>
> If the keys are a mixture of numeric and non-numeric, or the
> numeric keys are not in strictly ascending order, an error will
> be reported.
>
> As an example, a flat dict containing
>
>  {
>'foo.0.bar': 'one',
>'foo.0.wizz': '1',
>'foo.1.bar': 'two',
>'foo.1.wizz': '2'
>  }
>
> will get turned into a dict with one element 'foo' whose
> value is a list. The list elements will each in turn be
> dicts.
>
>  {
>'foo': [
>  { 'bar': 'one', 'wizz': '1' },
>  { 'bar': 'two', 'wizz': '2' }
>],
>  }
>
> If the key is intended to contain a literal '.', then it must
> be escaped as '..'. ie a flat dict
>
>   {
>  'foo..bar': 'wizz',
>  'bar.foo..bar': 'eek',
>  'bar.hello': 'world'
>   }
>
> Will end up as
>
>   {
>  'foo.bar': 'wizz',
>  'bar': {
> 'foo.bar': 'eek',
> 'hello': 'world'
>  }
>   }
>
> The intent of this function is that it allows a set of QemuOpts
> to be turned into a nested data structure that mirrors the nesting
> used when the same object is defined over QMP.
>
> Reviewed-by: Eric Blake 
> Reviewed-by: Kevin Wolf 
> Reviewed-by: Marc-André Lureau 
> Signed-off-by: Daniel P. Berrange 
> ---
>  include/qapi/qmp/qdict.h |   1 +
>  qobject/qdict.c  | 289 +++
>  tests/check-qdict.c  | 261 ++
>  3 files changed, 551 insertions(+)
>
> diff --git a/include/qapi/qmp/qdict.h b/include/qapi/qmp/qdict.h
> index 71b8eb0..e0d24e1 100644
> --- a/include/qapi/qmp/qdict.h
> +++ b/include/qapi/qmp/qdict.h
> @@ -73,6 +73,7 @@ void qdict_flatten(QDict *qdict);
>  void qdict_extract_subqdict(QDict *src, QDict **dst, const char *start);
>  void qdict_array_split(QDict *src, QList **dst);
>  int qdict_array_entries(QDict *src, const char *subqdict);
> +QObject *qdict_crumple(const QDict *src, bool recursive, Error **errp);
>  
>  void qdict_join(QDict *dest, QDict *src, bool overwrite);
>  
> diff --git a/qobject/qdict.c b/qobject/qdict.c
> index 60f158c..c38e90e 100644
> --- a/qobject/qdict.c
> +++ b/qobject/qdict.c
 [...]
> +/**
> + * qdict_crumple:
> + * @src: the original flat dictionary (only scalar values) to crumple
> + * @recursive: true to recursively crumple nested dictionaries

 Is recursive=false used outside tests in this series?
>>>
>>> No, it's not used.
>>>
>>> It was suggested in a much earlier version by Max, but I'm not sure if his
>>> code uses it or not.
>> 
>> In general, I prefer features to be added right before they're used, and
>> I really dislike adding features "just in case".  YAGNI.
>> 
>> Max, do you actually need this one?  If yes, please explain your use
>> case.
>
> As far as I can tell from a quick glance, I made the point for v1 that
> qdict_crumple() could be simplified by using qdict_array_split() and
> qdict_array_entries().
>
> Dan then (correctly) said that using these functions would worsen
> runtime performance of qdict_crumple() and that instead we can replace
> qdict_array_split() by qdict_crumple(). However, for that to work, we
> need to be able to make qdict_crumple() non-recursive (because
> qdict_array_split() is non-recursive).
>
> Dan also said that in the long run we want to keep the tree structure in
> the block layer instead of flattening everything down which would rid us
> of several other QDict functions (and would make non-recursive behavior
> obsolete again). I believe that this is an idea for the (far?) future,
> as can be seen from the discussion you and others had on this very topic
> in this version here.

In the long run, the block layer should use proper C types instead of
mucking around with QDict.  That's what QAPI is for.

> However, clearly there are no patches out yet that replace
> qdict_array_split() by qdict_crumple(recursive=false), and I certainly
> won't write any until qdict_crumple() is merged.

All right, I'll go hunting for prior ve

Re: [Qemu-block] [Qemu-devel] [PATCH] nbd: Use CoQueue for free_sema instead of CoMutex

2016-10-24 Thread Paolo Bonzini


On 24/10/2016 03:44, Changlong Xie wrote:
> Ping. Any comments? It's really a problem for NBD.

Sorry, I haven't been sending pull requests.  I'll do it this week.

Paolo

> Thanks
> -Xie
> 
> On 10/12/2016 06:18 PM, Changlong Xie wrote:
>> NBD is using the CoMutex in a way that wasn't anticipated. For example,
>> if there are N (N=26, MAX_NBD_REQUESTS=16) nbd write requests, we will
>> invoke nbd_client_co_pwritev N times.
>> 
>>
>> time request Actions
>> 11   in_flight=1, Coroutine=C1
>> 22   in_flight=2, Coroutine=C2
>> ...
>> 15   15  in_flight=15, Coroutine=C15
>> 16   16  in_flight=16, Coroutine=C16, free_sema->holder=C16,
>> mutex->locked=true
>> 17   17  in_flight=16, Coroutine=C17, queue C17 into free_sema->queue
>> 18   18  in_flight=16, Coroutine=C18, queue C18 into free_sema->queue
>> ...
>> 26   N   in_flight=16, Coroutine=C26, queue C26 into free_sema->queue
>> 
>>
>>
>> Once the nbd client receives the reply for request No. 16, we will
>> re-enter C16. That's OK, because C16 equals 'free_sema->holder'.
>> 
>>
>> time request Actions
>> 27   16  in_flight=15, Coroutine=C16, free_sema->holder=C16,
>> mutex->locked=false
>> 
>>
>>
>> Then nbd_coroutine_end invokes qemu_co_mutex_unlock, which will pop
>> coroutines from the head of free_sema->queue and enter C17. Moreover,
>> free_sema->holder is C17 now.
>> 
>>
>> time request Actions
>> 28   17  in_flight=16, Coroutine=C17, free_sema->holder=C17,
>> mutex->locked=true
>> 
>>
>>
>> In the above scenario, we have only received the reply for request No. 16.
>> As time goes by, the nbd client will most likely receive the replies for
>> requests 1 to 15 before the reply for request 17, which owns C17. In this
>> case, the assertion "mutex->holder == self" introduced by Kevin's commit
>> 0e438cdc "coroutine: Let CoMutex remember who holds it" fails. For example,
>> if the nbd client receives the reply for request No. 15, qemu will stop
>> unexpectedly:
>> 
>>
>> time request   Actions
>> 29   15(most case) in_flight=15, Coroutine=C15, free_sema->holder=C17,
>> mutex->locked=false
>> 
>>
>>
>> Per Paolo's suggestion "The simplest fix is to change it to CoQueue,
>> which is like a condition
>> variable", this patch replaces CoMutex with CoQueue.
>>
>> Cc: Wen Congyang 
>> Reported-by: zhanghailiang 
>> Suggested-by: Paolo Bonzini 
>> Signed-off-by: Changlong Xie 
>> ---
>>   block/nbd-client.c | 8 
>>   block/nbd-client.h | 2 +-
>>   2 files changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/block/nbd-client.c b/block/nbd-client.c
>> index 2cf3237..40b28ab 100644
>> --- a/block/nbd-client.c
>> +++ b/block/nbd-client.c
>> @@ -199,8 +199,8 @@ static void nbd_coroutine_start(NbdClientSession *s,
>>   {
>>   /* Poor man semaphore.  The free_sema is locked when no other
>> request
>>* can be accepted, and unlocked after receiving one reply.  */
>> -if (s->in_flight >= MAX_NBD_REQUESTS - 1) {
>> -qemu_co_mutex_lock(&s->free_sema);
>> +if (s->in_flight == MAX_NBD_REQUESTS) {
>> +qemu_co_queue_wait(&s->free_sema);
>>   assert(s->in_flight < MAX_NBD_REQUESTS);
>>   }
>>   s->in_flight++;
>> @@ -214,7 +214,7 @@ static void nbd_coroutine_end(NbdClientSession *s,
>>   int i = HANDLE_TO_INDEX(s, request->handle);
>>   s->recv_coroutine[i] = NULL;
>>   if (s->in_flight-- == MAX_NBD_REQUESTS) {
>> -qemu_co_mutex_unlock(&s->free_sema);
>> +qemu_co_queue_next(&s->free_sema);
>>   }
>>   }
>>
>> @@ -386,7 +386,7 @@ int nbd_client_init(BlockDriverState *bs,
>>   }
>>
>>   qemu_co_mutex_init(&client->send_mutex);
>> -qemu_co_mutex_init(&client->free_sema);
>> +qemu_co_queue_init(&client->free_sema);
>>   client->sioc = sioc;
>>   object_ref(OBJECT(client->sioc));
>>
>> diff --git a/block/nbd-client.h b/block/nbd-client.h
>> index 044aca4..307b8b1 100644
>> --- a/block/nbd-client.h
>> +++ b/block/nbd-client.h
>> @@ -24,7 +24,7 @@ typedef struct NbdClientSession {
>>   off_t size;
>>
>>   CoMutex send_mutex;
>> -CoMutex free_sema;
>> +CoQueue free_sema;
>>   Coroutine *send_coroutine;
>>   int in_flight;
>>
>>



Re: [Qemu-block] [PATCH v8 00/36] block: Image locking series

2016-10-24 Thread Kevin Wolf
Am 22.10.2016 um 03:00 hat Max Reitz geschrieben:
> 
> 
> I personally still don't like making locking a qdev property very much
> because it doesn't make sense to me*. But I remember Kevin had his
> reasons (even though I can no longer remember them) for asking for it,
> and I don't have any besides "It doesn't make sense to me". After having
> though a bit about it (= having written three paragraphs and deleted
> them again), I guess I can make some sense of it, though it seems to be
> a rather esoteric use case still; it appears to me that a guest could
> use it to signal that it's fine with some block device being shared;
> then we could use a shared lock or none at all or I don't know.
> Otherwise, we should get an exclusive lock for write access and a shared
> lock for read access.

The reason is pretty simple if you think about this question: Why do we
need user input in the first place? It's just because we don't know the
requirements of the guest, and unfortunately the guest won't tell us
(because things like shared IDE disks simply don't exist on real
hardware). So we need to ask the user, and we should do this in a layer as
close to the guest (which is the origin of the requirement) as possible.
That layer is the device emulation.

Everything that isn't a guest device is under the control of qemu, so
we don't need user input there.

> (Although, fun question (that's the reason for the "I don't know"): What
> if guest A supports such a volatile block device, but guest B doesn't?
> Then imagine we run A with write access and B with read-only access. A
> will claim no lock or a shared lock, while B will probably claim a
> shared lock, too, expecting writing users to try to grab an exclusive
> lock. So suddenly both can open the image file, but B isn't actually
> prepared for that. Fun!)

Hmm, that's a good point actually.

And in fact, it reminds me how I suggested that the problem at hand is
really similar to what the new op blockers are supposed to do. And the
system I proposed for those already handles this situation just fine.

The relevant part here is how the proposal had two bit masks per user,
one describing which operations the user needs to be able to perform
itself, and the other one for operations that other users are still
allowed to perform. This means that for every operation you have four
possible states: exclusively used; shared use; unused, but blocking
other users; unused, but allowed for other users.

If we interpret the locking flag as the operation "writes to the image",
in your example, A would have the mode "shared use" and B would have
"unused, but blocking". These two modes are in conflict.

Now, the big question is how to translate this into file locking. This
could become a little tricky. I had a few thoughts involving another
lock on byte 2, but none of them actually worked out so far, because
what we want is essentially a lock that can be shared by readers, that
can also be shared by writers, but not by readers and writers at the
same time.

> So as far as I understand it, those qdev properties should eventually be
> changeable by the guest. And if that is so, the user probably shouldn't
> touch them at all because the guest OS really knows whether it can cope
> with volatile block devices or not, and it will tell qemu if that is so.

This is an interesting thought for PV devices, but it hasn't been part
of the plan so far. I'm not sure if the guest OS block device drivers
even have this information generally, because that's more a matter of
whether a cluster file system is used on the block device or not.

> That may be the reason why the qdev property doesn't make sense to me at
> the first glance: It does make sense for the guest OS to set this
> property at that level, but it doesn't make sense for the user to do so.
> Or maybe it does, but the user is really asking for things breaking then
> ("YES, I AM SURE THAT MY GUEST WILL BE COMPLETELY FINE WITH FLOPPY DISKS
> JUST CHANGING RANDOMLY" -- but I guess we've all been at that point
> ourselves...).
> 
> So after having convinced myself twice now, having the qdev property is
> probably correct.

Essentially the whole reason is that we need the information and the
guest doesn't tell us, so we need to ask someone who supposedly knows
enough about the guest - which is the user.

> 
> 
> But that's where the flags come in again: The guest may be fine with a
> shared lock or none at all. But what about the block layer? Say you're
> using a qcow2 image; that will not like sharing at all, independently of
> what the guest thinks. So the block layer needs to have internal
> mechanisms to elevate the locks proposed by the guest devices to e.g.
> exclusive ones. I don't think that is something addressed by this series
> in this version. Maybe you'll still need the flags for that.

I'm not completely sure about the mechanism and whether it should
involve flags, but yes, you need something to propagate the required
locking mode down 

Re: [Qemu-block] [Qemu-devel] [PATCH v14 02/21] qdict: implement a qdict_crumple method for un-flattening a dict

2016-10-24 Thread Daniel P. Berrange
On Mon, Oct 24, 2016 at 11:18:29AM +0200, Markus Armbruster wrote:
> Max Reitz  writes:
> 
> > On 21.10.2016 11:58, Markus Armbruster wrote:
> >> "Daniel P. Berrange"  writes:
> >> 
> >>> On Tue, Oct 18, 2016 at 04:32:13PM +0200, Markus Armbruster wrote:
>  "Daniel P. Berrange"  writes:
> 
> > The qdict_flatten() method will take a dict whose elements are
> > further nested dicts/lists and flatten them by concatenating
> > keys.
> >
> > The qdict_crumple() method aims to do the reverse, taking a flat
> > qdict, and turning it into a set of nested dicts/lists. It will
> > apply nesting based on the key name, with a '.' indicating a
> > new level in the hierarchy. If the keys in the nested structure
> > are all numeric, it will create a list, otherwise it will create
> > a dict.
> >
> > If the keys are a mixture of numeric and non-numeric, or the
> > numeric keys are not in strictly ascending order, an error will
> > be reported.
> >
> > As an example, a flat dict containing
> >
> >  {
> >'foo.0.bar': 'one',
> >'foo.0.wizz': '1',
> >'foo.1.bar': 'two',
> >'foo.1.wizz': '2'
> >  }
> >
> > will get turned into a dict with one element 'foo' whose
> > value is a list. The list elements will each in turn be
> > dicts.
> >
> >  {
> >'foo': [
> >  { 'bar': 'one', 'wizz': '1' },
> >  { 'bar': 'two', 'wizz': '2' }
> >],
> >  }
> >
> > If the key is intended to contain a literal '.', then it must
> > be escaped as '..'. ie a flat dict
> >
> >   {
> >  'foo..bar': 'wizz',
> >  'bar.foo..bar': 'eek',
> >  'bar.hello': 'world'
> >   }
> >
> > Will end up as
> >
> >   {
> >  'foo.bar': 'wizz',
> >  'bar': {
> > 'foo.bar': 'eek',
> > 'hello': 'world'
> >  }
> >   }
> >
> > The intent of this function is that it allows a set of QemuOpts
> > to be turned into a nested data structure that mirrors the nesting
> > used when the same object is defined over QMP.
> >
> > Reviewed-by: Eric Blake 
> > Reviewed-by: Kevin Wolf 
> > Reviewed-by: Marc-André Lureau 
> > Signed-off-by: Daniel P. Berrange 
> > ---
> >  include/qapi/qmp/qdict.h |   1 +
> >  qobject/qdict.c  | 289 +++
> >  tests/check-qdict.c  | 261 ++
> >  3 files changed, 551 insertions(+)
> >
> > diff --git a/include/qapi/qmp/qdict.h b/include/qapi/qmp/qdict.h
> > index 71b8eb0..e0d24e1 100644
> > --- a/include/qapi/qmp/qdict.h
> > +++ b/include/qapi/qmp/qdict.h
> > @@ -73,6 +73,7 @@ void qdict_flatten(QDict *qdict);
> >  void qdict_extract_subqdict(QDict *src, QDict **dst, const char *start);
> >  void qdict_array_split(QDict *src, QList **dst);
> >  int qdict_array_entries(QDict *src, const char *subqdict);
> > +QObject *qdict_crumple(const QDict *src, bool recursive, Error **errp);
> >  
> >  void qdict_join(QDict *dest, QDict *src, bool overwrite);
> >  
> > diff --git a/qobject/qdict.c b/qobject/qdict.c
> > index 60f158c..c38e90e 100644
> > --- a/qobject/qdict.c
> > +++ b/qobject/qdict.c
>  [...]
> > +/**
> > + * qdict_crumple:
> > + * @src: the original flat dictionary (only scalar values) to crumple
> > + * @recursive: true to recursively crumple nested dictionaries
> 
>  Is recursive=false used outside tests in this series?
> >>>
> >>> No, it's not used.
> >>>
> >>> It was suggested in a much earlier version by Max, but I'm not sure if his
> >>> code uses it or not.
> >> 
> >> In general, I prefer features to be added right before they're used, and
> >> I really dislike adding features "just in case".  YAGNI.
> >> 
> >> Max, do you actually need this one?  If yes, please explain your use
> >> case.
> >
> > As far as I can tell from a quick glance, I made the point for v1 that
> > qdict_crumple() could be simplified by using qdict_array_split() and
> > qdict_array_entries().
> >
> > Dan then (correctly) said that using these functions would worsen
> > runtime performance of qdict_crumple() and that instead we can replace
> > qdict_array_split() by qdict_crumple(). However, for that to work, we
> > need to be able to make qdict_crumple() non-recursive (because
> > qdict_array_split() is non-recursive).
> >
> > Dan also said that in the long run we want to keep the tree structure in
> > the block layer instead of flattening everything down which would rid us
> > of several other QDict functions (and would make non-recursive behavior
> > obsolete again). I believe that this is an idea for the (far?) future,
> > as can be seen from the discussion you and others had on this very topic
> > in this version here.
> 
> In t

Re: [Qemu-block] [PATCH 12/22] qcow2-bitmap: add IN_USE flag

2016-10-24 Thread Vladimir Sementsov-Ogievskiy

21.10.2016 22:58, Max Reitz wrote:

On 21.10.2016 17:34, Vladimir Sementsov-Ogievskiy wrote:

07.10.2016 22:44, Max Reitz wrote:

On 30.09.2016 12:53, Vladimir Sementsov-Ogievskiy wrote:

This flag means that the bitmap is now in use by the software or was not
successfully saved. Either way, with this flag set, the bitmap data must
be considered inconsistent and should not be loaded.

With the current implementation this flag is never set, as we just remove
bitmaps from the image after loading. But it is defined in the qcow2 spec and
must be handled. Also, it can be used in the future, if asynchronous schemes of
bitmap loading/saving are implemented.

We also remove in-use bitmaps from the image on open.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
   block/qcow2-bitmap.c | 17 -
   1 file changed, 16 insertions(+), 1 deletion(-)

Don't you want to make use of this flag? It would appear useful to me if
you just marked autoload bitmaps as in_use instead of deleting them from
the image when it's opened and then overwrite them when the image is
closed.

And what is the use of it?

You don't need to free any bitmaps when opening the file, and you don't
need to allocate any new bitmap space when closing it.


As bitmaps are sparse in the file, I need to allocate new space when
closing. Or free it...

Anyway, storing an invalid copy of online data is bad, I think.
Therefore I will have to free the bitmap data clusters and zero the bitmap
table in the file.

Why can't you just set this flag to mark it invalid? In case qemu
crashes, in one case you'll have some invalid bitmaps and in the other
you won't have any of them at all. I don't think there's a difference
from that perspective.

Max

--
Best regards,
Vladimir




Re: [Qemu-block] [PATCH v11 01/19] block: Add bdrv_drain_all_{begin, end}()

2016-10-24 Thread Paolo Bonzini


On 19/10/2016 19:11, Kevin Wolf wrote:
> Am 14.10.2016 um 15:08 hat Alberto Garcia geschrieben:
>> bdrv_drain_all() doesn't allow the caller to do anything after all
>> pending requests have been completed but before block jobs are
>> resumed.
>>
>> This patch splits bdrv_drain_all() into _begin() and _end() for that
>> purpose. It also adds aio_{disable,enable}_external() calls to disable
>> external clients in the meantime.
>>
>> Signed-off-by: Alberto Garcia 
> 
> This looks okay as a first step, possibly enough for this series (we'll
> have to review this carefully), but it leaves us with a rather limited
> version of bdrv_drain_all_begin/end that excludes many useful cases. One
> of them is that John wants to use it around QMP transactions.
> 
> Specifically, you can't add a new BDS or a new block job in a drain_all
> section because then bdrv_drain_all_end() would try to unpause the new
> thing even though it has never been paused. Depending on what else we
> did with it, this will either corrupt the pause counters or just
> directly fail an assertion.
> 
> My first thoughts were about how to let an unpause succeed without a
> previous pause for these objects, but actually I think this isn't what
> we should do. We rather want to actually do the pause instead because
> even new BDSes and block jobs should probably start in a quiesced state
> when created inside a drain_all section.

Yes, I agree with this.  It shouldn't be too hard to implement it.  It
would require a BlockDriverState to look at the global "inside
bdrv_drain_all_begin" state, and ask its BlockBackend to disable itself
upon bdrv_replace_child.

Basically, "foo->quiesce_counter" should become "foo->quiesce_counter ||
all_quiesce_counter", I think.  It may well be done as a separate patch
if there is a TODO comment in bdrv_replace_child; as Kevin said, there
are assertions to protect against misuse.

Paolo





Re: [Qemu-block] [PATCH 12/22] qcow2-bitmap: add IN_USE flag

2016-10-24 Thread Vladimir Sementsov-Ogievskiy

24.10.2016 13:32, Vladimir Sementsov-Ogievskiy wrote:

21.10.2016 22:58, Max Reitz wrote:

On 21.10.2016 17:34, Vladimir Sementsov-Ogievskiy wrote:

07.10.2016 22:44, Max Reitz wrote:

On 30.09.2016 12:53, Vladimir Sementsov-Ogievskiy wrote:
This flag means that the bitmap is now in use by the software or was not
successfully saved. Either way, with this flag set, the bitmap data must
be considered inconsistent and should not be loaded.

With the current implementation this flag is never set, as we just remove
bitmaps from the image after loading. But it is defined in the qcow2 spec and
must be handled. Also, it can be used in the future, if asynchronous schemes of
bitmap loading/saving are implemented.

We also remove in-use bitmaps from the image on open.

Signed-off-by: Vladimir Sementsov-Ogievskiy 


---
   block/qcow2-bitmap.c | 17 -
   1 file changed, 16 insertions(+), 1 deletion(-)
Don't you want to make use of this flag? It would appear useful to me if
you just marked autoload bitmaps as in_use instead of deleting them from
the image when it's opened and then overwrite them when the image is
closed.

And what is the use of it?

You don't need to free any bitmaps when opening the file, and you don't
need to allocate any new bitmap space when closing it.


As bitmaps are sparse in the file, I need to allocate new space when
closing. Or free it...



Is it a real problem to reallocate space in qcow2? If so, to minimize
allocations, I will have to make a list of the clusters of in_use bitmaps on
close, and then save the current bitmaps to these clusters (and allocate
some additional clusters, or free extra ones).


Also, if I don't free them on open, I'll have to free them on removal of
the bdrv dirty bitmap.


Anyway, storing an invalid copy of online data is bad, I think.
Therefore I will have to free the bitmap data clusters and zero the bitmap
table in the file.

Why can't you just set this flag to mark it invalid? In case qemu
crashes, in one case you'll have some invalid bitmaps and in the other
you won't have any of them at all. I don't think there's a difference
from that perspective.

Max

--
Best regards,
Vladimir




[Qemu-block] [PATCH v5 0/4] fdc: Use separate qdev device for drives

2016-10-24 Thread Kevin Wolf
We have been complaining for a long time about how the floppy controller and
floppy drives are combined in a single qdev device and how this makes the
device awkward to work with because it behaves differently from all other
block devices.

The latest reason to complain was when I noticed that using qdev device names
in QMP commands (e.g. for media change) doesn't really work when only the
controller is a qdev device, but the drives aren't.

So I decided to have a go at it, and this is the result.

It doesn't actually change any of the inner workings of the floppy controller,
but it wires things up differently on the qdev layer so that a floppy
controller now exposes a bus on which the floppy drives sit. This results in a
structure that is similar to IDE where the actual drive state is still in the
controller and the qdev device basically just contains the qdev properties -
not pretty, but quite workable.

The commit message of patch 3 explains how to use it. In short, there is a
'-device floppy' now and it does what you would expect if you ever used ide-cd.

The other problem is old command lines, especially those using things like
'-global isa-fdc,driveA=...'. In order to keep them working, we need to forward
the property to an internally created floppy drive device. This is a bit like
usb-storage, which we know is ugly, but works well enough in practice. The good
news here is that in contrast to usb-storage, the floppy controller only does
the forwarding for legacy configurations; as soon as you start using '-device
floppy', it doesn't happen any more.

So as you may have expected, this conversion doesn't result in a perfect
device, but I think it's definitely an improvement over the old state. I hope
you like it despite the warts. :-)

v5:
- Apply _filter_qemu to stderr, too [John]
- Rename the bus to floppy-bus [Frederic]
- Use FLOPPY_BUS() instead of DO_UPDATE() [Frederic]

v4:
- John says that his grep is broken and hangs at 100% CPU with my attempt to
  extract the floppy controller from info qtree. Use a simpler sed command
  instead (which, unlike the grep command, doesn't handle arbitrary indentation
  level of the next item, but we know what comes next, so just hardcoding 10
  spaces works, too).

v3:
- Fixed omissions in the conversion of sysbus-fdc and the Sun one. Nice, combining
  floppy controllers and weird platforms in a single series. [John]

v2:
- Added patch 4 (qemu-iotests case for floppy config on the command line)
- Patch 2: Create a floppy device only if a BlockBackend exists instead of
  always creating two of them
- Patch 2: Initialise drive->fdctrl even if no drive is attached, it is
  accessed anyway during migration
- Patch 3: Keep 'type' qdev property and FDrive->drive in sync
- Patch 3: Removed if with condition that is always true

Alberto Garcia (2):
  throttle: Correct access to wrong BlockBackendPublic structures
  qemu-iotests: Test I/O in a single drive from a throttling group

Halil Pasic (1):
  block: improve error handling in raw_open

Pino Toscano (1):
  qapi: fix memory leak in bdrv_image_info_specific_dump

 block/qapi.c   |  1 +
 block/raw-posix.c  |  1 +
 block/raw-win32.c  |  1 +
 block/throttle-groups.c| 27 +++
 tests/qemu-iotests/093 | 33 -
 tests/qemu-iotests/093.out |  4 ++--
 6 files changed, 56 insertions(+), 11 deletions(-)

-- 
1.8.3.1




[Qemu-block] [PATCH v5 1/4] block: improve error handling in raw_open

2016-10-24 Thread Kevin Wolf
From: Halil Pasic 

Make raw_open for POSIX more consistent in handling errors by setting
the error object also when qemu_open fails. The error object was
generally set in case of errors, but I guess this case was overlooked.
Do the same for win32.

Signed-off-by: Halil Pasic 
Reviewed-by: Sascha Silbe 
Tested-by: Marc Hartmayer  (POSIX only)
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Kevin Wolf 
---
 block/raw-posix.c | 1 +
 block/raw-win32.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 166e9d1..f481e57 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -443,6 +443,7 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
 fd = qemu_open(filename, s->open_flags, 0644);
 if (fd < 0) {
 ret = -errno;
+error_setg_errno(errp, errno, "Could not open '%s'", filename);
 if (ret == -EROFS) {
 ret = -EACCES;
 }
diff --git a/block/raw-win32.c b/block/raw-win32.c
index 734bb10..800fabd 100644
--- a/block/raw-win32.c
+++ b/block/raw-win32.c
@@ -373,6 +373,7 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags,
 if (s->hfile == INVALID_HANDLE_VALUE) {
 int err = GetLastError();
 
+error_setg_win32(errp, err, "Could not open '%s'", filename);
 if (err == ERROR_ACCESS_DENIED) {
 ret = -EACCES;
 } else {
-- 
1.8.3.1




[Qemu-block] [PATCH v5 2/4] qapi: fix memory leak in bdrv_image_info_specific_dump

2016-10-24 Thread Kevin Wolf
From: Pino Toscano 

The 'obj' result of the visitor was not properly freed, unlike in
other places doing a similar job.

Signed-off-by: Pino Toscano 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 block/qapi.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/qapi.c b/block/qapi.c
index 6f947e3..50d3090 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -698,6 +698,7 @@ void bdrv_image_info_specific_dump(fprintf_function func_fprintf, void *f,
 assert(qobject_type(obj) == QTYPE_QDICT);
 data = qdict_get(qobject_to_qdict(obj), "data");
 dump_qobject(func_fprintf, f, 1, data);
+qobject_decref(obj);
 visit_free(v);
 }
 
-- 
1.8.3.1




[Qemu-block] [PATCH v5 4/4] qemu-iotests: Test I/O in a single drive from a throttling group

2016-10-24 Thread Kevin Wolf
From: Alberto Garcia 

iotest 093 contains a test that creates a throttling group with
several drives and performs I/O in all of them. This patch adds a new
test that creates a similar setup but only performs I/O in one of the
drives at the same time.

This is useful to test that the round robin algorithm is behaving
properly in these scenarios, and is specifically written using the
regression introduced in 27ccdd52598290f0f8b58be56e as an example.

Signed-off-by: Alberto Garcia 
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/093 | 33 -
 tests/qemu-iotests/093.out |  4 ++--
 2 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/tests/qemu-iotests/093 b/tests/qemu-iotests/093
index ffcb271..2ed393a 100755
--- a/tests/qemu-iotests/093
+++ b/tests/qemu-iotests/093
@@ -53,7 +53,7 @@ class ThrottleTestCase(iotests.QMPTestCase):
 result = self.vm.qmp("block_set_io_throttle", conv_keys=False, **params)
 self.assert_qmp(result, 'return', {})
 
-def do_test_throttle(self, ndrives, seconds, params):
+def do_test_throttle(self, ndrives, seconds, params, first_drive = 0):
 def check_limit(limit, num):
 # IO throttling algorithm is discrete, allow 10% error so the test
 # is more robust
@@ -85,12 +85,14 @@ class ThrottleTestCase(iotests.QMPTestCase):
 # Send I/O requests to all drives
 for i in range(rd_nr):
 for drive in range(0, ndrives):
-self.vm.hmp_qemu_io("drive%d" % drive, "aio_read %d %d" %
+idx = first_drive + drive
+self.vm.hmp_qemu_io("drive%d" % idx, "aio_read %d %d" %
 (i * rq_size, rq_size))
 
 for i in range(wr_nr):
 for drive in range(0, ndrives):
-self.vm.hmp_qemu_io("drive%d" % drive, "aio_write %d %d" %
+idx = first_drive + drive
+self.vm.hmp_qemu_io("drive%d" % idx, "aio_write %d %d" %
 (i * rq_size, rq_size))
 
 # We'll store the I/O stats for each drive in these arrays
@@ -105,15 +107,17 @@ class ThrottleTestCase(iotests.QMPTestCase):
 
 # Read the stats before advancing the clock
 for i in range(0, ndrives):
+idx = first_drive + i
 start_rd_bytes[i], start_rd_iops[i], start_wr_bytes[i], \
-start_wr_iops[i] = self.blockstats('drive%d' % i)
+start_wr_iops[i] = self.blockstats('drive%d' % idx)
 
 self.vm.qtest("clock_step %d" % ns)
 
 # Read the stats after advancing the clock
 for i in range(0, ndrives):
+idx = first_drive + i
 end_rd_bytes[i], end_rd_iops[i], end_wr_bytes[i], \
-end_wr_iops[i] = self.blockstats('drive%d' % i)
+end_wr_iops[i] = self.blockstats('drive%d' % idx)
 
 # Check that the I/O is within the limits and evenly distributed
 for i in range(0, ndrives):
@@ -129,6 +133,7 @@ class ThrottleTestCase(iotests.QMPTestCase):
 self.assertTrue(check_limit(params['iops_rd'], rd_iops))
 self.assertTrue(check_limit(params['iops_wr'], wr_iops))
 
+# Connect N drives to a VM and test I/O in all of them
 def test_all(self):
 params = {"bps": 4096,
   "bps_rd": 4096,
@@ -146,6 +151,24 @@ class ThrottleTestCase(iotests.QMPTestCase):
 self.configure_throttle(ndrives, limits)
 self.do_test_throttle(ndrives, 5, limits)
 
+# Connect N drives to a VM and test I/O in just one of them a time
+def test_one(self):
+params = {"bps": 4096,
+  "bps_rd": 4096,
+  "bps_wr": 4096,
+  "iops": 10,
+  "iops_rd": 10,
+  "iops_wr": 10,
+ }
+# Repeat the test for each one of the drives
+for drive in range(0, self.max_drives):
+# Pick each out of all possible params and test
+for tk in params:
+limits = dict([(k, 0) for k in params])
+limits[tk] = params[tk] * self.max_drives
+self.configure_throttle(self.max_drives, limits)
+self.do_test_throttle(1, 5, limits, drive)
+
 def test_burst(self):
 params = {"bps": 4096,
   "bps_rd": 4096,
diff --git a/tests/qemu-iotests/093.out b/tests/qemu-iotests/093.out
index 914e373..2f7d390 100644
--- a/tests/qemu-iotests/093.out
+++ b/tests/qemu-iotests/093.out
@@ -1,5 +1,5 @@
-.....
+.......
 --
-Ran 5 tests
+Ran 7 tests
 
 OK
-- 
1.8.3.1
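
The 10% tolerance that check_limit applies in the test above boils down to the following sketch (a simplified model, not the iotests code itself; the exact bound handling in 093 may differ):

```python
def check_limit(limit, measured, seconds=5, error=0.10):
    """Return True if `measured` I/O over `seconds` stays within
    `limit` per second, give or take `error` slack.  The throttling
    algorithm is discrete, so some tolerance is needed."""
    expected = limit * seconds
    return expected * (1 - error) <= measured <= expected * (1 + error)
```

For example, with an iops limit of 10 over a 5-second window, roughly 50 operations are expected, and anything between 45 and 55 passes.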




[Qemu-block] [PATCH v5 3/4] throttle: Correct access to wrong BlockBackendPublic structures

2016-10-24 Thread Kevin Wolf
From: Alberto Garcia 

In 27ccdd52598290f0f8b58be56e235aff7aebfaf3 the throttling fields were
moved from BlockDriverState to BlockBackend. However in a few cases
the code started using throttling fields from the active BlockBackend
instead of the round-robin token, making the algorithm behave
incorrectly.

This can cause starvation if there's a throttling group with several
drives but only one of them has I/O.

Cc: qemu-sta...@nongnu.org
Reported-by: Paolo Bonzini 
Signed-off-by: Alberto Garcia 
Reviewed-by: Paolo Bonzini 
Signed-off-by: Kevin Wolf 
---
 block/throttle-groups.c | 27 +++
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index 59545e2..17b2efb 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -168,6 +168,22 @@ static BlockBackend *throttle_group_next_blk(BlockBackend *blk)
 return blk_by_public(next);
 }
 
+/*
+ * Return whether a BlockBackend has pending requests.
+ *
+ * This assumes that tg->lock is held.
+ *
+ * @blk: the BlockBackend
+ * @is_write:  the type of operation (read/write)
+ * @ret:   whether the BlockBackend has pending requests.
+ */
+static inline bool blk_has_pending_reqs(BlockBackend *blk,
+bool is_write)
+{
+const BlockBackendPublic *blkp = blk_get_public(blk);
+return blkp->pending_reqs[is_write];
+}
+
 /* Return the next BlockBackend in the round-robin sequence with pending I/O
  * requests.
  *
@@ -188,7 +204,7 @@ static BlockBackend *next_throttle_token(BlockBackend *blk, bool is_write)
 
 /* get next bs round in round robin style */
 token = throttle_group_next_blk(token);
-while (token != start && !blkp->pending_reqs[is_write]) {
+while (token != start && !blk_has_pending_reqs(token, is_write)) {
 token = throttle_group_next_blk(token);
 }
 
@@ -196,10 +212,13 @@ static BlockBackend *next_throttle_token(BlockBackend *blk, bool is_write)
  * then decide the token is the current bs because chances are
  * the current bs get the current request queued.
  */
-if (token == start && !blkp->pending_reqs[is_write]) {
+if (token == start && !blk_has_pending_reqs(token, is_write)) {
 token = blk;
 }
 
+/* Either we return the original BB, or one with pending requests */
+assert(token == blk || blk_has_pending_reqs(token, is_write));
+
 return token;
 }
 
@@ -257,7 +276,7 @@ static void schedule_next_request(BlockBackend *blk, bool is_write)
 
 /* Check if there's any pending request to schedule next */
 token = next_throttle_token(blk, is_write);
-if (!blkp->pending_reqs[is_write]) {
+if (!blk_has_pending_reqs(token, is_write)) {
 return;
 }
 
@@ -271,7 +290,7 @@ static void schedule_next_request(BlockBackend *blk, bool is_write)
 qemu_co_queue_next(&blkp->throttled_reqs[is_write])) {
 token = blk;
 } else {
-ThrottleTimers *tt = &blkp->throttle_timers;
+ThrottleTimers *tt = &blk_get_public(token)->throttle_timers;
 int64_t now = qemu_clock_get_ns(tt->clock_type);
 timer_mod(tt->timers[is_write], now + 1);
 tg->any_timer_armed[is_write] = true;
-- 
1.8.3.1
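
The corrected selection logic can be sketched as follows (a simplified Python model of next_throttle_token, not the QEMU code; the essential point of the fix is that pending requests are checked on the token being considered, not on the caller's own BlockBackend):

```python
def next_throttle_token(backends, rr_token, blk, is_write):
    """Pick the next backend index, round-robin from `rr_token`, that
    has pending requests of the given type.  Fall back to `blk` (the
    caller) if nobody has pending requests, since chances are the
    caller is about to queue one.
    `backends[i]["pending"]` maps is_write -> number of queued requests."""
    n = len(backends)
    token = (rr_token + 1) % n
    while token != rr_token and not backends[token]["pending"][is_write]:
        token = (token + 1) % n
    if token == rr_token and not backends[token]["pending"][is_write]:
        token = blk
    # Either we return the caller, or a backend with pending requests.
    assert token == blk or backends[token]["pending"][is_write]
    return token
```

With the pre-fix behavior (checking the caller's pending count instead of the token's), a group where only one drive has I/O could repeatedly skip that drive — the starvation described in the commit message.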




Re: [Qemu-block] [Qemu-devel] [PATCH v14 02/21] qdict: implement a qdict_crumple method for un-flattening a dict

2016-10-24 Thread Markus Armbruster
"Daniel P. Berrange"  writes:

> On Mon, Oct 24, 2016 at 11:18:29AM +0200, Markus Armbruster wrote:
>> Max Reitz  writes:
>> 
>> > On 21.10.2016 11:58, Markus Armbruster wrote:
>> >> "Daniel P. Berrange"  writes:
>> >> 
>> >>> On Tue, Oct 18, 2016 at 04:32:13PM +0200, Markus Armbruster wrote:
>>  "Daniel P. Berrange"  writes:
>> 
>> > The qdict_flatten() method will take a dict whose elements are
>> > further nested dicts/lists and flatten them by concatenating
>> > keys.
>> >
>> > The qdict_crumple() method aims to do the reverse, taking a flat
>> > qdict, and turning it into a set of nested dicts/lists. It will
>> > apply nesting based on the key name, with a '.' indicating a
>> > new level in the hierarchy. If the keys in the nested structure
>> > are all numeric, it will create a list, otherwise it will create
>> > a dict.
>> >
>> > If the keys are a mixture of numeric and non-numeric, or the
>> > numeric keys are not in strictly ascending order, an error will
>> > be reported.
>> >
>> > As an example, a flat dict containing
>> >
>> >  {
>> >'foo.0.bar': 'one',
>> >'foo.0.wizz': '1',
>> >'foo.1.bar': 'two',
>> >'foo.1.wizz': '2'
>> >  }
>> >
>> > will get turned into a dict with one element 'foo' whose
>> > value is a list. The list elements will each in turn be
>> > dicts.
>> >
>> >  {
>> >'foo': [
>> >  { 'bar': 'one', 'wizz': '1' },
>> >  { 'bar': 'two', 'wizz': '2' }
>> >],
>> >  }
>> >
>> > If the key is intended to contain a literal '.', then it must
>> > be escaped as '..'. ie a flat dict
>> >
>> >   {
>> >  'foo..bar': 'wizz',
>> >  'bar.foo..bar': 'eek',
>> >  'bar.hello': 'world'
>> >   }
>> >
>> > Will end up as
>> >
>> >   {
>> >  'foo.bar': 'wizz',
>> >  'bar': {
>> > 'foo.bar': 'eek',
>> > 'hello': 'world'
>> >  }
>> >   }
>> >
>> > The intent of this function is that it allows a set of QemuOpts
>> > to be turned into a nested data structure that mirrors the nesting
>> > used when the same object is defined over QMP.
>> >
>> > Reviewed-by: Eric Blake 
>> > Reviewed-by: Kevin Wolf 
>> > Reviewed-by: Marc-André Lureau 
>> > Signed-off-by: Daniel P. Berrange 
>> > ---
>> >  include/qapi/qmp/qdict.h |   1 +
>> >  qobject/qdict.c  | 289 +++
>> >  tests/check-qdict.c  | 261 ++
>> >  3 files changed, 551 insertions(+)
>> >
>> > diff --git a/include/qapi/qmp/qdict.h b/include/qapi/qmp/qdict.h
>> > index 71b8eb0..e0d24e1 100644
>> > --- a/include/qapi/qmp/qdict.h
>> > +++ b/include/qapi/qmp/qdict.h
>> > @@ -73,6 +73,7 @@ void qdict_flatten(QDict *qdict);
>> >  void qdict_extract_subqdict(QDict *src, QDict **dst, const char 
>> > *start);
>> >  void qdict_array_split(QDict *src, QList **dst);
>> >  int qdict_array_entries(QDict *src, const char *subqdict);
>> > +QObject *qdict_crumple(const QDict *src, bool recursive, Error 
>> > **errp);
>> >  
>> >  void qdict_join(QDict *dest, QDict *src, bool overwrite);
>> >  
>> > diff --git a/qobject/qdict.c b/qobject/qdict.c
>> > index 60f158c..c38e90e 100644
>> > --- a/qobject/qdict.c
>> > +++ b/qobject/qdict.c
>>  [...]
>> > +/**
>> > + * qdict_crumple:
>> > + * @src: the original flat dictionary (only scalar values) to crumple
>> > + * @recursive: true to recursively crumple nested dictionaries
>> 
>>  Is recursive=false used outside tests in this series?
>> >>>
>> >>> No, its not used.
>> >>>
>> >>> It was suggested in a way earlier version by Max, but not sure if his
>> >>> code uses it or not.
>> >> 
>> >> In general, I prefer features to be added right before they're used, and
>> >> I really dislike adding features "just in case".  YAGNI.
>> >> 
>> >> Max, do you actually need this one?  If yes, please explain your use
>> >> case.
>> >
>> > As far as I can tell from a quick glance, I made the point for v1 that
>> > qdict_crumple() could be simplified by using qdict_array_split() and
>> > qdict_array_entries().
>> >
>> > Dan then (correctly) said that using these functions would worsen
>> > runtime performance of qdict_crumple() and that instead we can replace
>> > qdict_array_split() by qdict_crumple(). However, for that to work, we
>> > need to be able to make qdict_crumple() non-recursive (because
>> > qdict_array_split() is non-recursive).
>> >
>> > Dan also said that in the long run we want to keep the tree structure in
>> > the block layer instead of flattening everything down which would rid us
>> > of several other QDict functions (and would make non-recursive behavior
>> > obsolete aga
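
For readers following the thread, the crumpling behaviour described above can be sketched in Python (a rough model of qdict_crumple's recursive mode; it omits the error reporting for mixed or non-consecutive numeric keys that the real patch performs):

```python
def crumple(flat):
    """Un-flatten a dict with dotted keys into nested dicts/lists.
    A literal '.' in a key is written as '..'."""

    def split_key(key):
        # Split on single '.', treating '..' as an escaped literal dot.
        parts, cur, i = [], "", 0
        while i < len(key):
            if key[i] == ".":
                if i + 1 < len(key) and key[i + 1] == ".":
                    cur += "."      # '..' -> literal '.'
                    i += 2
                    continue
                parts.append(cur)   # single '.' -> next nesting level
                cur = ""
                i += 1
            else:
                cur += key[i]
                i += 1
        parts.append(cur)
        return parts

    root = {}
    for key, value in flat.items():
        crumbs = split_key(key)
        d = root
        for p in crumbs[:-1]:
            d = d.setdefault(p, {})
        d[crumbs[-1]] = value

    def listify(obj):
        # Dicts whose keys are exactly 0..N-1 become lists.
        if not isinstance(obj, dict):
            return obj
        obj = {k: listify(v) for k, v in obj.items()}
        if obj and all(k.isdigit() for k in obj):
            idx = sorted(int(k) for k in obj)
            if idx == list(range(len(obj))):
                return [obj[str(i)] for i in idx]
        return obj

    return listify(root)
```

This reproduces both examples from the commit message: the 'foo.0.bar'/'foo.1.bar' keys become a list of dicts under 'foo', and 'foo..bar' survives as the literal key 'foo.bar'.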

Re: [Qemu-block] [PATCH 0/2] less confusing block file names

2016-10-24 Thread Kevin Wolf
Am 20.10.2016 um 03:11 hat Eric Blake geschrieben:
> Based on a suggestion here:
> https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg00350.html
> 
> Eric Blake (2):
>   block: Rename raw_bsd to raw.c
>   block: rename raw-{posix,win32} to file-*.c

Seems to do what it promises:
Reviewed-by: Kevin Wolf 

I'd like to hear a few more opinions on whether it's a good idea, but I
don't think this should stay for too long on the list because we already
have conflicting patches.

One effect that makes me less than fully happy is that 'git log
block/raw.c' without --follow mixes the history of the renamed driver
with the history of the old, differently licensed one. On the other
hand, the removal of the old driver happened three years ago, so it's
probably unlikely that people confuse what belongs to which one and
revive old code accidentally.

Kevin



Re: [Qemu-block] [PATCH v4 0/3] iotests: Fix test 162

2016-10-24 Thread Kevin Wolf
Am 18.10.2016 um 11:36 hat Kevin Wolf geschrieben:
> Am 17.10.2016 um 19:07 hat Max Reitz geschrieben:
> > On 28.09.2016 22:46, Max Reitz wrote:
> > > 162 is potentially racy and makes some invalid assumptions about what
> > > should happen when connecting to a non-existing domain name. This series
> > > fixes both issues.
> > > 
> > > 
> > > v4:
> > > - Added documentation for the new --fork option [Kevin]
> > > 
> > > 
> > > git-backport-diff against v3:
> > > 
> > > Key:
> > > [] : patches are identical
> > > [] : number of functional differences between upstream/downstream patch
> > > [down] : patch is downstream-only
> > > The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively
> > > 
> > > 001/3:[0004] [FC] 'qemu-nbd: Add --fork option'
> > > 002/3:[] [--] 'iotests: Remove raciness from 162'
> > > 003/3:[] [--] 'iotests: Do not rely on unavailable domains in 162'
> > > 
> > > 
> > > Max Reitz (3):
> > >   qemu-nbd: Add --fork option
> > >   iotests: Remove raciness from 162
> > >   iotests: Do not rely on unavailable domains in 162
> > > 
> > >  qemu-nbd.c | 17 -
> > >  qemu-nbd.texi  |  2 ++
> > >  tests/qemu-iotests/162 | 22 --
> > >  tests/qemu-iotests/162.out |  2 +-
> > >  4 files changed, 35 insertions(+), 8 deletions(-)
> > 
> > Ping
> 
> I expected that Sascha would review this as he had comments on earlier
> versions?

Apparently that's not happening, so I've applied the series now.

Kevin




Re: [Qemu-block] [PATCH 15/22] qcow2-bitmap: add autoclear bit

2016-10-24 Thread Vladimir Sementsov-Ogievskiy

07.10.2016 23:11, Max Reitz пишет:

On 30.09.2016 12:53, Vladimir Sementsov-Ogievskiy wrote:

Add autoclear bit for handling rewriting image by old qemu version.

"Add support for the autoclear bit [...]"?


finally "Add support for the autoclear bit to handle rewriting image by old
qemu version" - ok?

Or I can merge this patch into two earlier patches 06 and 07.




If autoclear bit is not set, but bitmaps extension is found it
would not be loaded and warning will be generated.

"If the autoclear bit is not set, but the bitmaps extension is found,
the bitmaps will not be loaded and a warning will be generated."


Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/qcow2-bitmap.c |  4 
  block/qcow2.c| 12 ++--
  block/qcow2.h|  9 +
  3 files changed, 23 insertions(+), 2 deletions(-)

Apart from the above, the patch looks good, but why does it come so late
in the series?

So with the commit message fixed:

Reviewed-by: Max Reitz 




--
Best regards,
Vladimir
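
The autoclear mechanism under review can be modelled in a few lines of Python (bit position and field names here are illustrative, not the actual qcow2 header layout):

```python
AUTOCLEAR_BITMAPS = 1 << 0  # illustrative bit position


def rewrite_image(header, understands_bitmaps):
    """A writer must clear every autoclear bit it does not understand,
    so a later reader can tell the corresponding extension went stale."""
    if not understands_bitmaps:
        header["autoclear"] &= ~AUTOCLEAR_BITMAPS
    return header


def load_bitmaps(header, has_bitmaps_extension):
    """Refuse to load bitmaps whose autoclear bit was cleared."""
    if has_bitmaps_extension and not header["autoclear"] & AUTOCLEAR_BITMAPS:
        return None  # caller should warn: bitmaps extension is stale
    return "bitmaps" if has_bitmaps_extension else None
```

In other words: an old qemu that rewrites the image clears the bit it does not know, and a new qemu then sees the extension with the bit unset and skips loading it with a warning, as the commit message describes.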




Re: [Qemu-block] [Qemu-devel] [PATCH 1/2] block: Rename raw_bsd to raw.c

2016-10-24 Thread Laszlo Ersek
On 10/20/16 03:11, Eric Blake wrote:
> The file has nothing to do with the BSD operating system,

The name carries "_bsd" because of the license that covers the file.
While the file has nothing to do with the BSD operating system, it is
fully related to the BSD license. Please see commit range

  e1c66c6d82fe^..e5b1d99f5528

So, I recommend removing the BSD/BDS language from the commit message;
it makes the current name appear to be a typo. It was not a typo.

Other than that, I entirely welcome this series; I still get confused
about protocol drivers vs. format drivers. Based on an explanation I got
from Kevin two and a half years ago :), for exactly this set of source
files, "raw_bsd.c" (now "raw.c") is a format driver, while "raw-posix.c"
(now "file-posix.c") is a protocol driver. I think the proposed names
render these roles easier to understand.

Thanks!
Laszlo

> but
> is rather dealing with the raw data format as a BDS.  Simplify
> the name to avoid further confusion.  [Plus I hate typing _ in
> file names - the shift key slows things down]
> 
> Suggested-by: Daniel P. Berrange 
> Signed-off-by: Eric Blake 
> ---
>  block/{raw_bsd.c => raw.c} | 0
>  MAINTAINERS| 2 +-
>  block/Makefile.objs| 2 +-
>  3 files changed, 2 insertions(+), 2 deletions(-)
>  rename block/{raw_bsd.c => raw.c} (100%)
> 
> diff --git a/block/raw_bsd.c b/block/raw.c
> similarity index 100%
> rename from block/raw_bsd.c
> rename to block/raw.c
> diff --git a/MAINTAINERS b/MAINTAINERS
> index b01fec0..6f984c3 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1677,7 +1677,7 @@ F: block/linux-aio.c
>  F: include/block/raw-aio.h
>  F: block/raw-posix.c
>  F: block/raw-win32.c
> -F: block/raw_bsd.c
> +F: block/raw.c
>  F: block/win32-aio.c
> 
>  qcow2
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index 67a036a..c10941e 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -1,4 +1,4 @@
> -block-obj-y += raw_bsd.o qcow.o vdi.o vmdk.o cloop.o bochs.o vpc.o vvfat.o dmg.o
> +block-obj-y += raw.o qcow.o vdi.o vmdk.o cloop.o bochs.o vpc.o vvfat.o dmg.o
>  block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
>  block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>  block-obj-y += qed-check.o
> 




Re: [Qemu-block] [PATCH 16/22] qmp: add persistent flag to block-dirty-bitmap-add

2016-10-24 Thread Vladimir Sementsov-Ogievskiy

07.10.2016 22:52, Eric Blake пишет:

On 09/30/2016 05:53 AM, Vladimir Sementsov-Ogievskiy wrote:

Add optional 'persistent' flag to qmp command block-dirty-bitmap-add.
Default is false.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Denis V. Lunev 
---
  blockdev.c   | 12 +++-
  qapi/block-core.json |  7 ++-
  qmp-commands.hx  |  5 -

This and later patches have conflicts with recent changes in the tree
that removed qmp-commands.hx; hopefully shouldn't impact review of this
series too badly.


+++ b/qapi/block-core.json
@@ -1235,10 +1235,15 @@
  # @granularity: #optional the bitmap granularity, default is 64k for
  #   block-dirty-bitmap-add
  #
+# @persistent: #optional the bitmap is persistent, i.e. it will be saved to
+#  corresponding block device on it's close. Default is false.

s/corresponding/the corresponding/
s/it's/its/

("it's" is only appropriate if you can substitute "it is" or "it has")



@@ -1458,6 +1458,9 @@ Arguments:
  - "node": device/node on which to create dirty bitmap (json-string)
  - "name": name of the new dirty bitmap (json-string)
  - "granularity": granularity to track writes with (int, optional)
+- "persistent": bitmap will be saved to corresponding block device
+on it's close. Block driver should maintain persistent bitmaps

One nice benefit of rebasing on top of qmp-commands.hx being removed is
that you don't have to repeat yourself :)


As I understand it, the duplicated description is still here, in
docs/qmp-commands.txt. I just don't have to repeat the parameter
definitions (which is good, anyway).




--
Best regards,
Vladimir
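
For illustration, the command under discussion would be issued over QMP roughly like this (node and bitmap names are hypothetical; the argument names are the ones listed in the patch):

```python
import json

cmd = {
    "execute": "block-dirty-bitmap-add",
    "arguments": {
        "node": "drive0",        # hypothetical device/node name
        "name": "bitmap0",       # hypothetical bitmap name
        "granularity": 65536,    # optional, default 64k
        "persistent": True,      # the new optional flag, default false
    },
}
wire = json.dumps(cmd)
```

The `wire` string is what a QMP client would send on the monitor socket.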




Re: [Qemu-block] [PATCH 0/2] less confusing block file names

2016-10-24 Thread Kevin Wolf
Am 24.10.2016 um 16:44 hat Paolo Bonzini geschrieben:
> On 24/10/2016 15:47, Kevin Wolf wrote:
> > One effect that makes me less than fully happy is that 'git log
> > block/raw.c' without --follow mixes the history of the renamed driver
> > with the history of the old, differently licensed one. On the other
> > hand, the removal of the old driver happened three years ago, so it's
> > probably unlikely that people confuse what belongs to which one and
> > revive old code accidentally.
> 
> Hmm, this makes it sensible to only apply patch 2, especially since Eric
> has courteously split them. :)

Maybe rename to something like raw-format.c instead then in patch 1?

I must admit that it's reasonable enough to make the assumption that
raw_bsd.c has something to do with a BSD-specific block driver.

Kevin



Re: [Qemu-block] [PATCH 1/2] block/nfs: Introduce runtime_opts in NFS

2016-10-24 Thread Kevin Wolf
Am 19.10.2016 um 14:38 hat Ashijeet Acharya geschrieben:
> Make NFS block driver use various fine grained runtime_opts.
> Set .bdrv_parse_filename() to nfs_parse_filename() and introduce two
> new functions nfs_parse_filename() and nfs_parse_uri() to help parsing
> the URI. This will help us to prepare the NFS for blockdev-add.
> 
> Signed-off-by: Ashijeet Acharya 
> ---
>  block/nfs.c | 360 +++-
>  1 file changed, 261 insertions(+), 99 deletions(-)
> 
> diff --git a/block/nfs.c b/block/nfs.c
> index 8602a44..5eb909e 100644
> --- a/block/nfs.c
> +++ b/block/nfs.c
> @@ -35,8 +35,12 @@
>  #include "qemu/uri.h"
>  #include "qemu/cutils.h"
>  #include "sysemu/sysemu.h"
> +#include "qapi/qmp/qdict.h"
> +#include "qapi/qmp/qint.h"
> +#include "qapi/qmp/qstring.h"
>  #include 
>  
> +
>  #define QEMU_NFS_MAX_READAHEAD_SIZE 1048576
>  #define QEMU_NFS_MAX_PAGECACHE_SIZE (8388608 / NFS_BLKSIZE)
>  #define QEMU_NFS_MAX_DEBUG_LEVEL 2
> @@ -61,6 +65,127 @@ typedef struct NFSRPC {
>  NFSClient *client;
>  } NFSRPC;
>  
> +static int nfs_parse_uri(const char *filename, QDict *options, Error **errp)
> +{
> +URI *uri = NULL;
> +QueryParams *qp = NULL;
> +int ret = 0, i;
> +const char *p;
> +
> +uri = uri_parse(filename);
> +if (!uri) {
> +error_setg(errp, "Invalid URI specified");
> +ret = -EINVAL;
> +goto out;
> +}
> +if (strcmp(uri->scheme, "nfs") != 0) {
> +error_setg(errp, "URI scheme must be 'nfs'");
> +ret = -EINVAL;
> +goto out;
> +}
> +
> +if (!uri->server || strcmp(uri->server, "") == 0) {

No need to use strcmp(), !*uri->server is enough.

> +error_setg(errp, "missing hostname in URI");
> +ret = -EINVAL;
> +goto out;
> +}
> +
> +if (!uri->path || strcmp(uri->path, "") == 0) {
> +error_setg(errp, "missing file path in URI");
> +ret = -EINVAL;
> +goto out;
> +}
> +
> +p = uri->path ? uri->path : "/";

You just checked that uri->path is non-NULL, so this is dead code.

> +p += strspn(p, "/");
> +if (p[0]) {
> +qdict_put(options, "export", qstring_from_str(p));
> +}

"export" isn't among the recognised options. You may have missed this
code fragment when removing the option from your patch.

> +qp = query_params_parse(uri->query);
> +if (!qp) {
> +error_setg(errp, "could not parse query parameters");
> +ret = -EINVAL;
> +goto out;
> +}
> +
> +qdict_put(options, "host", qstring_from_str(uri->server));
> +
> +qdict_put(options, "path", qstring_from_str(uri->path));
> +
> +for (i = 0; i < qp->n; i++) {
> +if (!qp->p[i].value) {
> +error_setg(errp, "Value for NFS parameter expected: %s",
> +   qp->p[i].name);
> +goto out;
> +}
> +if (parse_uint_full(qp->p[i].value, NULL, 0)) {
> +error_setg(errp, "Illegal value for NFS parameter: %s",
> +   qp->p[i].name);
> +goto out;
> +}
> +if (!strcmp(qp->p[i].name, "uid")) {
> +qdict_put(options, "uid",
> +  qstring_from_str(qp->p[i].value));
> +} else if (!strcmp(qp->p[i].name, "gid")) {
> +qdict_put(options, "gid",
> +  qstring_from_str(qp->p[i].value));
> +} else if (!strcmp(qp->p[i].name, "tcp-syncnt")) {
> +qdict_put(options, "tcp-syncnt",
> +  qstring_from_str(qp->p[i].value));
> +} else if (!strcmp(qp->p[i].name, "readahead")) {
> +qdict_put(options, "readahead",
> +  qstring_from_str(qp->p[i].value));
> +} else if (!strcmp(qp->p[i].name, "pagecache")) {
> +qdict_put(options, "pagecache",
> +  qstring_from_str(qp->p[i].value));
> +} else if (!strcmp(qp->p[i].name, "debug")) {
> +qdict_put(options, "debug",
> +  qstring_from_str(qp->p[i].value));
> +} else {
> +error_setg(errp, "Unknown NFS parameter name: %s",
> +   qp->p[i].name);
> +goto out;
> +}
> +}
> +out:
> +if (qp) {
> +query_params_free(qp);
> +}
> +if (uri) {
> +uri_free(uri);
> +}
> +return ret;
> +}
> +
> +static void nfs_parse_filename(const char *filename, QDict *options,
> +   Error **errp)
> +{
> +int ret = 0;
> +
> +if (qdict_haskey(options, "host") ||
> +qdict_haskey(options, "path") ||
> +qdict_haskey(options, "uid") ||
> +qdict_haskey(options, "gid") ||
> +qdict_haskey(options, "tcp-syncnt") ||
> +qdict_haskey(options, "readahead") ||
> +qdict_haskey(options, "pagecache") ||
> +qdict_haskey(options, "debug")) {
> +error_setg(errp, "host/path/uid/gid/tcp-syncnt/readahead/pagecach
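
The parsing logic being reviewed can be sketched as follows (a Python model of the intended behaviour — validating the scheme, requiring host and path, and routing only the known integer-valued query parameters into the options dict; not the actual QEMU code):

```python
from urllib.parse import urlparse, parse_qs

# Query parameters the driver recognises, per the review above.
NFS_PARAMS = {"uid", "gid", "tcp-syncnt", "readahead", "pagecache", "debug"}


def nfs_parse_uri(filename):
    uri = urlparse(filename)
    if uri.scheme != "nfs":
        raise ValueError("URI scheme must be 'nfs'")
    if not uri.netloc:
        raise ValueError("missing hostname in URI")
    if not uri.path or uri.path == "/":
        raise ValueError("missing file path in URI")
    options = {"host": uri.netloc, "path": uri.path}
    for name, values in parse_qs(uri.query).items():
        if name not in NFS_PARAMS:
            raise ValueError("Unknown NFS parameter name: %s" % name)
        if not values[0].isdigit():
            raise ValueError("Illegal value for NFS parameter: %s" % name)
        options[name] = values[0]
    return options
```

A URI such as nfs://server/export/image?uid=1000&readahead=4096 then yields a flat options dict suitable for the runtime_opts machinery, while unknown or non-numeric parameters are rejected with an error, matching the checks in the patch.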

Re: [Qemu-block] [PATCH 0/2] less confusing block file names

2016-10-24 Thread Eric Blake
On 10/24/2016 08:47 AM, Kevin Wolf wrote:
> Am 20.10.2016 um 03:11 hat Eric Blake geschrieben:
>> Based on a suggestion here:
>> https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg00350.html
>>
>> Eric Blake (2):
>>   block: Rename raw_bsd to raw.c
>>   block: rename raw-{posix,win32} to file-*.c
> 
> Seems to do what it promises:
> Reviewed-by: Kevin Wolf 
> 
> I'd like to hear a few more opinions on whether it's a good idea, but I
> don't think this should stay for too long on the list because we already
> have conflicting patches.

On the bright side, git with rename detection enabled is generally able
to patch the correct file, even after a rename.

> 
> One effect that makes me less than fully happy is that 'git log
> block/raw.c' without --follow mixes the history of the renamed driver
> with the history of the old, differently licensed one. On the other
> hand, the removal of the old driver happened three years ago, so it's
> probably unlikely that people confuse what belongs to which one and
> revive old code accidentally.

I guess knowing the history explains the current file name; and as you
mentioned, we're starting to get far enough away from the point the file
was created that the history of the name is less relevant and less
likely to cause mistakes in code motion.  And it's not the first time
we've reused a file name within the tree (look at the history of
top-level qjson.c, which has existed twice before being moved to another
location twice).

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-block] [PATCH 17/22] qmp: add autoload parameter to block-dirty-bitmap-add

2016-10-24 Thread Vladimir Sementsov-Ogievskiy

10.10.2016 19:25, Max Reitz wrote:

On 30.09.2016 12:53, Vladimir Sementsov-Ogievskiy wrote:

Optional. Default is false.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Denis V. Lunev 
---
  blockdev.c   | 22 --
  qapi/block-core.json |  7 ++-
  qmp-commands.hx  |  5 -
  3 files changed, 30 insertions(+), 4 deletions(-)

Design question: I see that being able to specify these flags when
creating bitmaps is useful. However, would about a way for the user to
change these flags on an existing dirty bitmap? Would you consider that
useful?

(Of course, if so, it can always be added later, we don't need it now.)


I vote for adding it later, if needed. For now we don't have such scenarios.


--
Best regards,
Vladimir




Re: [Qemu-block] [PATCH 0/2] less confusing block file names

2016-10-24 Thread Paolo Bonzini


On 24/10/2016 15:47, Kevin Wolf wrote:
> One effect that makes me less than fully happy is that 'git log
> block/raw.c' without --follow mixes the history of the renamed driver
> with the history of the old, differently licensed one. On the other
> hand, the removal of the old driver happened three years ago, so it's
> probably unlikely that people confuse what belongs to which one and
> revive old code accidentally.

Hmm, this makes it sensible to only apply patch 2, especially since Eric
has courteously split them. :)

Paolo



Re: [Qemu-block] [PATCH 16/22] qmp: add persistent flag to block-dirty-bitmap-add

2016-10-24 Thread Vladimir Sementsov-Ogievskiy

10.10.2016 19:08, Max Reitz wrote:

On 30.09.2016 12:53, Vladimir Sementsov-Ogievskiy wrote:



[...]


+}
  
   out:

  aio_context_release(aio_context);
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 31f9990..2bf56cd 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1235,10 +1235,15 @@
  # @granularity: #optional the bitmap granularity, default is 64k for
  #   block-dirty-bitmap-add
  #
+# @persistent: #optional the bitmap is persistent, i.e. it will be saved to
+#  corresponding block device on it's close. Default is false.
+#  For block-dirty-bitmap-add. (Since 2.8)

I'm not sure what the "For block-dirty-bitmap-add." is supposed to mean,
because this whole struct is for block-dirty-bitmap-add (and for
block-dirty-bitmap-add transactions, to be exact, but @persistent will
surely work there, too, won't it?).

Also, I'd say "will be saved to the corresponding block device image
file" instead of just "block device", because in my understanding a
block device and its image file are two separate things.


Hmm... But 'its close' refers to the block device's close, not the file's
close. And we call a common bdrv_* function to save it, so we actually save
it to the device, and the device then puzzles out how to actually save it.





+#
  # Since 2.4
  ##
  { 'struct': 'BlockDirtyBitmapAdd',
-  'data': { 'node': 'str', 'name': 'str', '*granularity': 'uint32' } }
+  'data': { 'node': 'str', 'name': 'str', '*granularity': 'uint32',
+  '*persistent': 'bool' } }

I think normally we'd align the new line so that the opening ' of
'*persistent' is under the opening ' of 'node'.

  
  ##

  # @block-dirty-bitmap-add
diff --git a/qmp-commands.hx b/qmp-commands.hx
index ba2a916..434b418 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1441,7 +1441,7 @@ EQMP
  
  {

  .name   = "block-dirty-bitmap-add",
-.args_type  = "node:B,name:s,granularity:i?",
+.args_type  = "node:B,name:s,granularity:i?,persistent:b?",
  .mhandler.cmd_new = qmp_marshal_block_dirty_bitmap_add,
  },
  
@@ -1458,6 +1458,9 @@ Arguments:

  - "node": device/node on which to create dirty bitmap (json-string)
  - "name": name of the new dirty bitmap (json-string)
  - "granularity": granularity to track writes with (int, optional)
+- "persistent": bitmap will be saved to corresponding block device
+on it's close. Block driver should maintain persistent bitmaps
+(json-bool, optional, default false) (Since 2.8)

And I don't know what the user is supposed to make of the information
that block drivers will take care of maintaining persistent bitmaps. All
they care about is that it will be stored in the corresponding image
file, so in my opinion it would be better to just omit the last sentence
here.


Users should know that only qcow2 supports persistent bitmaps. Instead
of the last sentence I can write "For now only qcow2 disks support
persistent bitmaps".



--
Best regards,
Vladimir




Re: [Qemu-block] [Qemu-devel] [PATCH 2/2] block: rename raw-{posix, win32} to file-*.c

2016-10-24 Thread Laszlo Ersek
On 10/20/16 03:11, Eric Blake wrote:
> These files deal with the file protocol, not the raw format (the
> file protocol is often used with other formats, and the raw
> protocol

s/raw protocol/raw format/

> is not forced to use the file format).

s/file format/file protocol/

I think.

Thanks!
Laszlo

> Rename things
> to make it a bit easier to follow.
> 
> Suggested-by: Daniel P. Berrange 
> Signed-off-by: Eric Blake 
> ---
>  include/block/block_int.h   | 2 +-
>  block/{raw-posix.c => file-posix.c} | 0
>  block/{raw-win32.c => file-win32.c} | 0
>  block/gluster.c | 4 ++--
>  MAINTAINERS | 4 ++--
>  block/Makefile.objs | 4 ++--
>  block/trace-events  | 4 ++--
>  configure   | 2 +-
>  8 files changed, 10 insertions(+), 10 deletions(-)
>  rename block/{raw-posix.c => file-posix.c} (100%)
>  rename block/{raw-win32.c => file-win32.c} (100%)
> 
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 3e79228..36d26cc 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -186,7 +186,7 @@ struct BlockDriver {
> 
>  /*
>   * Flushes all data that was already written to the OS all the way down to
> - * the disk (for example raw-posix calls fsync()).
> + * the disk (for example file-posix.c calls fsync()).
>   */
>  int coroutine_fn (*bdrv_co_flush_to_disk)(BlockDriverState *bs);
> 
> diff --git a/block/raw-posix.c b/block/file-posix.c
> similarity index 100%
> rename from block/raw-posix.c
> rename to block/file-posix.c
> diff --git a/block/raw-win32.c b/block/file-win32.c
> similarity index 100%
> rename from block/raw-win32.c
> rename to block/file-win32.c
> diff --git a/block/gluster.c b/block/gluster.c
> index af76d7d5..257ade5 100644
> --- a/block/gluster.c
> +++ b/block/gluster.c
> @@ -1172,7 +1172,7 @@ static int qemu_gluster_has_zero_init(BlockDriverState *bs)
>   * If @start is in a trailing hole or beyond EOF, return -ENXIO.
>   * If we can't find out, return a negative errno other than -ENXIO.
>   *
> - * (Shamefully copied from raw-posix.c, only miniscule adaptions.)
> + * (Shamefully copied from file-posix.c, only miniscule adaptions.)
>   */
>  static int find_allocation(BlockDriverState *bs, off_t start,
> off_t *data, off_t *hole)
> @@ -1262,7 +1262,7 @@ static int find_allocation(BlockDriverState *bs, off_t 
> start,
>   * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
>   * beyond the end of the disk image it will be clamped.
>   *
> - * (Based on raw_co_get_block_status() from raw-posix.c.)
> + * (Based on raw_co_get_block_status() from file-posix.c.)
>   */
>  static int64_t coroutine_fn qemu_gluster_co_get_block_status(
>  BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum,
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 6f984c3..950ebea 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1675,8 +1675,8 @@ L: qemu-block@nongnu.org
>  S: Supported
>  F: block/linux-aio.c
>  F: include/block/raw-aio.h
> -F: block/raw-posix.c
> -F: block/raw-win32.c
> +F: block/file-posix.c
> +F: block/file-win32.c
>  F: block/raw.c
>  F: block/win32-aio.c
> 
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index c10941e..3fe8910 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -6,8 +6,8 @@ block-obj-y += vhdx.o vhdx-endian.o vhdx-log.o
>  block-obj-y += quorum.o
>  block-obj-y += parallels.o blkdebug.o blkverify.o blkreplay.o
>  block-obj-y += block-backend.o snapshot.o qapi.o
> -block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
> -block-obj-$(CONFIG_POSIX) += raw-posix.o
> +block-obj-$(CONFIG_WIN32) += file-win32.o win32-aio.o
> +block-obj-$(CONFIG_POSIX) += file-posix.o
>  block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
>  block-obj-y += null.o mirror.o commit.o io.o
>  block-obj-y += throttle-groups.o
> diff --git a/block/trace-events b/block/trace-events
> index 05fa13c..627214b 100644
> --- a/block/trace-events
> +++ b/block/trace-events
> @@ -55,8 +55,8 @@ qmp_block_job_complete(void *job) "job %p"
>  block_job_cb(void *bs, void *job, int ret) "bs %p job %p ret %d"
>  qmp_block_stream(void *bs, void *job) "bs %p job %p"
> 
> -# block/raw-win32.c
> -# block/raw-posix.c
> +# block/file-win32.c
> +# block/file-posix.c
>  paio_submit_co(int64_t offset, int count, int type) "offset %"PRId64" count 
> %d type %d"
>  paio_submit(void *acb, void *opaque, int64_t offset, int count, int type) 
> "acb %p opaque %p offset %"PRId64" count %d type %d"
> 
> diff --git a/configure b/configure
> index dd9e679..e2b169f 100755
> --- a/configure
> +++ b/configure
> @@ -2727,7 +2727,7 @@ if compile_prog "" "" ; then
>  fi
> 
>  ##
> -# xfsctl() probe, used for raw-posix
> +# xfsctl() probe, used for file-posix.c
>  if test "$xfs" != "no" ; then
>cat > $TMPC << EOF
>  #include   /* NULL */
>

Re: [Qemu-block] [PATCH 12/22] qcow2-bitmap: add IN_USE flag

2016-10-24 Thread Max Reitz
On 24.10.2016 12:32, Vladimir Sementsov-Ogievskiy wrote:
> 21.10.2016 22:58, Max Reitz wrote:
>> On 21.10.2016 17:34, Vladimir Sementsov-Ogievskiy wrote:
>>> 07.10.2016 22:44, Max Reitz wrote:
 On 30.09.2016 12:53, Vladimir Sementsov-Ogievskiy wrote:
> This flag means that the bitmap is currently in use by the software or
> was not successfully saved. Either way, with this flag set the bitmap
> data must be considered inconsistent and should not be loaded.
>
> With the current implementation this flag is never set, as we just
> remove bitmaps from the image after loading. But it is defined in the
> qcow2 spec and must be handled. It can also be used in the future, if
> asynchronous schemes of bitmap loading/saving are implemented.
>
> We also remove in-use bitmaps from the image on open.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>block/qcow2-bitmap.c | 17 -
>1 file changed, 16 insertions(+), 1 deletion(-)
 Don't you want to make use of this flag? It would appear useful to me
 if you just marked autoload bitmaps as in_use instead of deleting them
 from the image when it's opened, and then overwrote them when the
 image is closed.
>>> And what is the use of it?
>> You don't need to free any bitmaps when opening the file, and you don't
>> need to allocate any new bitmap space when closing it.
> 
> As bitmaps are sparse in the file, I need to allocate new space when
> closing. Or free it...

May happen. But not necessarily, and it will probably still save time as
you can reuse existing allocations and don't have to free everything.

Max





[Qemu-block] [PULL 04/23] block: improve error handling in raw_open

2016-10-24 Thread Kevin Wolf
From: Halil Pasic 

Make raw_open for POSIX more consistent in handling errors by setting
the error object also when qemu_open fails. The error object was
generally set in case of errors, but I guess this case was overlooked.
Do the same for win32.

Signed-off-by: Halil Pasic 
Reviewed-by: Sascha Silbe 
Tested-by: Marc Hartmayer  (POSIX only)
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Kevin Wolf 
---
 block/raw-posix.c | 1 +
 block/raw-win32.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 166e9d1..f481e57 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -443,6 +443,7 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
 fd = qemu_open(filename, s->open_flags, 0644);
 if (fd < 0) {
 ret = -errno;
+error_setg_errno(errp, errno, "Could not open '%s'", filename);
 if (ret == -EROFS) {
 ret = -EACCES;
 }
diff --git a/block/raw-win32.c b/block/raw-win32.c
index 734bb10..800fabd 100644
--- a/block/raw-win32.c
+++ b/block/raw-win32.c
@@ -373,6 +373,7 @@ static int raw_open(BlockDriverState *bs, QDict *options, 
int flags,
 if (s->hfile == INVALID_HANDLE_VALUE) {
 int err = GetLastError();
 
+error_setg_win32(errp, err, "Could not open '%s'", filename);
 if (err == ERROR_ACCESS_DENIED) {
 ret = -EACCES;
 } else {
-- 
1.8.3.1




[Qemu-block] [PULL 05/23] qapi: fix memory leak in bdrv_image_info_specific_dump

2016-10-24 Thread Kevin Wolf
From: Pino Toscano 

The 'obj' result of the visitor was not properly freed, unlike in
other places doing a similar job.

Signed-off-by: Pino Toscano 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 block/qapi.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/qapi.c b/block/qapi.c
index 6f947e3..50d3090 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -698,6 +698,7 @@ void bdrv_image_info_specific_dump(fprintf_function 
func_fprintf, void *f,
 assert(qobject_type(obj) == QTYPE_QDICT);
 data = qdict_get(qobject_to_qdict(obj), "data");
 dump_qobject(func_fprintf, f, 1, data);
+qobject_decref(obj);
 visit_free(v);
 }
 
-- 
1.8.3.1




[Qemu-block] [PULL 00/23] Block layer patches

2016-10-24 Thread Kevin Wolf
The following changes since commit a3ae21ec3fe036f536dc94cad735931777143103:

  Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging 
(2016-10-24 15:03:09 +0100)

are available in the git repository at:


  git://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to 25493dc012e7c10dba51ee893b634a1dbfeed126:

  Merge remote-tracking branch 'mreitz/tags/pull-block-2016-10-24' into 
queue-block (2016-10-24 18:02:26 +0200)



Block layer patches


Alberto Garcia (2):
  throttle: Correct access to wrong BlockBackendPublic structures
  qemu-iotests: Test I/O in a single drive from a throttling group

Changlong Xie (1):
  block/replication: Clarify 'top-id' parameter usage

Fam Zheng (9):
  qcow2: Support BDRV_REQ_MAY_UNMAP
  block: Hide HBitmap in block dirty bitmap interface
  HBitmap: Introduce "meta" bitmap to track bit changes
  tests: Add test code for meta bitmap
  block: Support meta dirty bitmap
  block: Add two dirty bitmap getters
  block: Assert that bdrv_release_dirty_bitmap succeeded
  tests: Add test code for hbitmap serialization
  block: More operations for meta dirty bitmap

Halil Pasic (1):
  block: improve error handling in raw_open

Kevin Wolf (2):
  block: Remove "options" indirection from blockdev-add
  Merge remote-tracking branch 'mreitz/tags/pull-block-2016-10-24' into 
queue-block

Max Reitz (3):
  qemu-nbd: Add --fork option
  iotests: Remove raciness from 162
  iotests: Do not rely on unavailable domains in 162

Paolo Bonzini (2):
  quorum: change child_iter to children_read
  quorum: do not allocate multiple iovecs for FIFO strategy

Pino Toscano (1):
  qapi: fix memory leak in bdrv_image_info_specific_dump

Vladimir Sementsov-Ogievskiy (2):
  hbitmap: serialization
  block: BdrvDirtyBitmap serialization interface

Xu Tian (1):
  block: failed qemu-img command should return non-zero exit code

 block/backup.c   |  14 ++-
 block/dirty-bitmap.c | 160 -
 block/mirror.c   |  24 ++--
 block/qapi.c |   1 +
 block/qcow2-cluster.c|   9 +-
 block/qcow2.c|   3 +-
 block/qcow2.h|   3 +-
 block/quorum.c   |  93 +++
 block/raw-posix.c|   1 +
 block/raw-win32.c|   1 +
 block/replication.c  |   5 +
 block/throttle-groups.c  |  27 -
 docs/qmp-commands.txt|  84 +++--
 include/block/dirty-bitmap.h |  35 +-
 include/qemu/hbitmap.h   | 100 
 include/qemu/typedefs.h  |   1 +
 qapi/block-core.json |   7 +-
 qemu-img.c   |   2 +
 qemu-nbd.c   |  17 ++-
 qemu-nbd.texi|   2 +
 tests/qemu-iotests/041   |  11 +-
 tests/qemu-iotests/067   |  12 +-
 tests/qemu-iotests/071   | 118 ---
 tests/qemu-iotests/081   |  52 -
 tests/qemu-iotests/085   |   9 +-
 tests/qemu-iotests/087   |  76 +---
 tests/qemu-iotests/093   |  33 +-
 tests/qemu-iotests/093.out   |   4 +-
 tests/qemu-iotests/117   |  12 +-
 tests/qemu-iotests/118   |  42 +++
 tests/qemu-iotests/124   |  20 ++--
 tests/qemu-iotests/139   |  10 +-
 tests/qemu-iotests/141   |  13 +--
 tests/qemu-iotests/155   |  10 +-
 tests/qemu-iotests/162   |  22 +++-
 tests/qemu-iotests/162.out   |   2 +-
 tests/test-hbitmap.c | 272 +++
 util/hbitmap.c   | 206 +---
 38 files changed, 1140 insertions(+), 373 deletions(-)



[Qemu-block] [PULL 10/23] iotests: Do not rely on unavailable domains in 162

2016-10-24 Thread Kevin Wolf
From: Max Reitz 

There are some (mostly ISP-specific) name servers that will redirect
non-existing domains to special hosts. In this case, we will get a
different error message when trying to connect to such a host, which
breaks test 162.

162 needed this specific error message so that it could confirm that qemu
was indeed trying to connect to the user-specified port. However, we can
also confirm this by setting up a local NBD server on exactly that port;
so we can fix the issue by doing just that.

Reported-by: Stefan Hajnoczi 
Signed-off-by: Max Reitz 
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/162 | 19 +++
 tests/qemu-iotests/162.out |  2 +-
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/tests/qemu-iotests/162 b/tests/qemu-iotests/162
index c7e6593..f8eecb3 100755
--- a/tests/qemu-iotests/162
+++ b/tests/qemu-iotests/162
@@ -43,10 +43,21 @@ echo '=== NBD ==='
 $QEMU_IMG info 'json:{"driver": "nbd", "host": 42}'
 
 # And this should not treat @port as if it had not been specified
-# (We cannot use localhost with an invalid port here, but we need to use a
-#  non-existing domain, because otherwise the error message will not contain
-#  the port)
-$QEMU_IMG info 'json:{"driver": "nbd", "host": "does.not.exist.example.com", 
"port": 42}'
+# (We need to set up a server here, because the error message for "Connection
+#  refused" does not contain the destination port)
+
+# Launching qemu-nbd is done in a loop: We try to set up an NBD server on some
+# random port and continue until success, i.e. until we have found a port that
+# is not in use yet.
+while true; do
+port=$((RANDOM + 32768))
+if $QEMU_NBD -p $port -f raw --fork null-co:// 2> /dev/null; then
+break
+fi
+done
+
+$QEMU_IMG info "json:{'driver': 'nbd', 'host': 'localhost', 'port': $port}" \
+| grep '^image' | sed -e "s/$port/PORT/"
 
 # This is a test for NBD's bdrv_refresh_filename() implementation: It expects
 # either host or path to be set, but it must not assume that they are set to
diff --git a/tests/qemu-iotests/162.out b/tests/qemu-iotests/162.out
index 9bba723..3c5be2c 100644
--- a/tests/qemu-iotests/162.out
+++ b/tests/qemu-iotests/162.out
@@ -2,7 +2,7 @@ QA output created by 162
 
 === NBD ===
 qemu-img: Could not open 'json:{"driver": "nbd", "host": 42}': Failed to 
connect socket: Invalid argument
-qemu-img: Could not open 'json:{"driver": "nbd", "host": 
"does.not.exist.example.com", "port": 42}': address resolution failed for 
does.not.exist.example.com:42: Name or service not known
+image: nbd://localhost:PORT
 image: nbd+unix://?socket=42
 
 === SSH ===
-- 
1.8.3.1




[Qemu-block] [PULL 01/23] block: failed qemu-img command should return non-zero exit code

2016-10-24 Thread Kevin Wolf
From: Xu Tian 

If the backing file cannot be opened when doing qemu-img rebase, the
variable 'ret' was not assigned a non-zero value, and the qemu-img
process terminated with exit code zero. Fix this.

Signed-off-by: Xu Tian 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Kevin Wolf 
---
 qemu-img.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/qemu-img.c b/qemu-img.c
index 67e8512..ab395a9 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2956,6 +2956,7 @@ static int img_rebase(int argc, char **argv)
 error_reportf_err(local_err,
   "Could not open old backing file '%s': ",
   backing_name);
+ret = -1;
 goto out;
 }
 
@@ -2973,6 +2974,7 @@ static int img_rebase(int argc, char **argv)
 error_reportf_err(local_err,
   "Could not open new backing file '%s': ",
   out_baseimg);
+ret = -1;
 goto out;
 }
 }
-- 
1.8.3.1




[Qemu-block] [PULL 12/23] quorum: do not allocate multiple iovecs for FIFO strategy

2016-10-24 Thread Kevin Wolf
From: Paolo Bonzini 

In FIFO mode there are no parallel reads, hence there is no need to
allocate separate buffers and clone the iovecs.

The two cases of quorum_aio_cb are now even more different, and
most of quorum_aio_finalize is only needed in one of them, so split
them into separate functions.

Signed-off-by: Paolo Bonzini 
Message-id: 1475685327-22767-3-git-send-email-pbonz...@redhat.com
Reviewed-by: Max Reitz 
Reviewed-by: Alberto Garcia 
Signed-off-by: Max Reitz 
---
 block/quorum.c | 79 +-
 1 file changed, 40 insertions(+), 39 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index 435296e..d122299 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -156,21 +156,7 @@ static AIOCBInfo quorum_aiocb_info = {
 
 static void quorum_aio_finalize(QuorumAIOCB *acb)
 {
-int i, ret = 0;
-
-if (acb->vote_ret) {
-ret = acb->vote_ret;
-}
-
-acb->common.cb(acb->common.opaque, ret);
-
-if (acb->is_read) {
-for (i = 0; i < acb->children_read; i++) {
-qemu_vfree(acb->qcrs[i].buf);
-qemu_iovec_destroy(&acb->qcrs[i].qiov);
-}
-}
-
+acb->common.cb(acb->common.opaque, acb->vote_ret);
 g_free(acb->qcrs);
 qemu_aio_unref(acb);
 }
@@ -282,38 +268,52 @@ static void quorum_copy_qiov(QEMUIOVector *dest, 
QEMUIOVector *source)
 }
 }
 
-static void quorum_aio_cb(void *opaque, int ret)
+static void quorum_report_bad_acb(QuorumChildRequest *sacb, int ret)
+{
+QuorumAIOCB *acb = sacb->parent;
+QuorumOpType type = acb->is_read ? QUORUM_OP_TYPE_READ : 
QUORUM_OP_TYPE_WRITE;
+quorum_report_bad(type, acb->sector_num, acb->nb_sectors,
+  sacb->aiocb->bs->node_name, ret);
+}
+
+static void quorum_fifo_aio_cb(void *opaque, int ret)
 {
 QuorumChildRequest *sacb = opaque;
 QuorumAIOCB *acb = sacb->parent;
 BDRVQuorumState *s = acb->common.bs->opaque;
-bool rewrite = false;
 
-if (ret == 0) {
-acb->success_count++;
-} else {
-QuorumOpType type;
-type = acb->is_read ? QUORUM_OP_TYPE_READ : QUORUM_OP_TYPE_WRITE;
-quorum_report_bad(type, acb->sector_num, acb->nb_sectors,
-  sacb->aiocb->bs->node_name, ret);
-}
+assert(acb->is_read && s->read_pattern == QUORUM_READ_PATTERN_FIFO);
+
+if (ret < 0) {
+quorum_report_bad_acb(sacb, ret);
 
-if (acb->is_read && s->read_pattern == QUORUM_READ_PATTERN_FIFO) {
 /* We try to read next child in FIFO order if we fail to read */
-if (ret < 0 && acb->children_read < s->num_children) {
+if (acb->children_read < s->num_children) {
 read_fifo_child(acb);
 return;
 }
-
-if (ret == 0) {
-quorum_copy_qiov(acb->qiov, &acb->qcrs[acb->children_read - 
1].qiov);
-}
-acb->vote_ret = ret;
-quorum_aio_finalize(acb);
-return;
 }
 
+acb->vote_ret = ret;
+
+/* FIXME: rewrite failed children if acb->children_read > 1? */
+quorum_aio_finalize(acb);
+}
+
+static void quorum_aio_cb(void *opaque, int ret)
+{
+QuorumChildRequest *sacb = opaque;
+QuorumAIOCB *acb = sacb->parent;
+BDRVQuorumState *s = acb->common.bs->opaque;
+bool rewrite = false;
+int i;
+
 sacb->ret = ret;
+if (ret == 0) {
+acb->success_count++;
+} else {
+quorum_report_bad_acb(sacb, ret);
+}
 acb->count++;
 assert(acb->count <= s->num_children);
 assert(acb->success_count <= s->num_children);
@@ -324,6 +324,10 @@ static void quorum_aio_cb(void *opaque, int ret)
 /* Do the vote on read */
 if (acb->is_read) {
 rewrite = quorum_vote(acb);
+for (i = 0; i < s->num_children; i++) {
+qemu_vfree(acb->qcrs[i].buf);
+qemu_iovec_destroy(&acb->qcrs[i].qiov);
+}
 } else {
 quorum_has_too_much_io_failed(acb);
 }
@@ -672,12 +676,9 @@ static BlockAIOCB *read_fifo_child(QuorumAIOCB *acb)
 BDRVQuorumState *s = acb->common.bs->opaque;
 int n = acb->children_read++;
 
-acb->qcrs[n].buf = qemu_blockalign(s->children[n]->bs, acb->qiov->size);
-qemu_iovec_init(&acb->qcrs[n].qiov, acb->qiov->niov);
-qemu_iovec_clone(&acb->qcrs[n].qiov, acb->qiov, acb->qcrs[n].buf);
 acb->qcrs[n].aiocb = bdrv_aio_readv(s->children[n], acb->sector_num,
-&acb->qcrs[n].qiov, acb->nb_sectors,
-quorum_aio_cb, &acb->qcrs[n]);
+acb->qiov, acb->nb_sectors,
+quorum_fifo_aio_cb, &acb->qcrs[n]);
 
 return &acb->common;
 }
-- 
1.8.3.1




[Qemu-block] [PULL 07/23] qemu-iotests: Test I/O in a single drive from a throttling group

2016-10-24 Thread Kevin Wolf
From: Alberto Garcia 

iotest 093 contains a test that creates a throttling group with
several drives and performs I/O in all of them. This patch adds a new
test that creates a similar setup but only performs I/O in one of the
drives at the same time.

This is useful to test that the round-robin algorithm is behaving
properly in these scenarios, and is specifically written using the
regression introduced in 27ccdd52598290f0f8b58be56e as an example.

Signed-off-by: Alberto Garcia 
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/093 | 33 -
 tests/qemu-iotests/093.out |  4 ++--
 2 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/tests/qemu-iotests/093 b/tests/qemu-iotests/093
index ffcb271..2ed393a 100755
--- a/tests/qemu-iotests/093
+++ b/tests/qemu-iotests/093
@@ -53,7 +53,7 @@ class ThrottleTestCase(iotests.QMPTestCase):
 result = self.vm.qmp("block_set_io_throttle", conv_keys=False, 
**params)
 self.assert_qmp(result, 'return', {})
 
-def do_test_throttle(self, ndrives, seconds, params):
+def do_test_throttle(self, ndrives, seconds, params, first_drive = 0):
 def check_limit(limit, num):
 # IO throttling algorithm is discrete, allow 10% error so the test
 # is more robust
@@ -85,12 +85,14 @@ class ThrottleTestCase(iotests.QMPTestCase):
 # Send I/O requests to all drives
 for i in range(rd_nr):
 for drive in range(0, ndrives):
-self.vm.hmp_qemu_io("drive%d" % drive, "aio_read %d %d" %
+idx = first_drive + drive
+self.vm.hmp_qemu_io("drive%d" % idx, "aio_read %d %d" %
 (i * rq_size, rq_size))
 
 for i in range(wr_nr):
 for drive in range(0, ndrives):
-self.vm.hmp_qemu_io("drive%d" % drive, "aio_write %d %d" %
+idx = first_drive + drive
+self.vm.hmp_qemu_io("drive%d" % idx, "aio_write %d %d" %
 (i * rq_size, rq_size))
 
 # We'll store the I/O stats for each drive in these arrays
@@ -105,15 +107,17 @@ class ThrottleTestCase(iotests.QMPTestCase):
 
 # Read the stats before advancing the clock
 for i in range(0, ndrives):
+idx = first_drive + i
 start_rd_bytes[i], start_rd_iops[i], start_wr_bytes[i], \
-start_wr_iops[i] = self.blockstats('drive%d' % i)
+start_wr_iops[i] = self.blockstats('drive%d' % idx)
 
 self.vm.qtest("clock_step %d" % ns)
 
 # Read the stats after advancing the clock
 for i in range(0, ndrives):
+idx = first_drive + i
 end_rd_bytes[i], end_rd_iops[i], end_wr_bytes[i], \
-end_wr_iops[i] = self.blockstats('drive%d' % i)
+end_wr_iops[i] = self.blockstats('drive%d' % idx)
 
 # Check that the I/O is within the limits and evenly distributed
 for i in range(0, ndrives):
@@ -129,6 +133,7 @@ class ThrottleTestCase(iotests.QMPTestCase):
 self.assertTrue(check_limit(params['iops_rd'], rd_iops))
 self.assertTrue(check_limit(params['iops_wr'], wr_iops))
 
+# Connect N drives to a VM and test I/O in all of them
 def test_all(self):
 params = {"bps": 4096,
   "bps_rd": 4096,
@@ -146,6 +151,24 @@ class ThrottleTestCase(iotests.QMPTestCase):
 self.configure_throttle(ndrives, limits)
 self.do_test_throttle(ndrives, 5, limits)
 
+# Connect N drives to a VM and test I/O in just one of them a time
+def test_one(self):
+params = {"bps": 4096,
+  "bps_rd": 4096,
+  "bps_wr": 4096,
+  "iops": 10,
+  "iops_rd": 10,
+  "iops_wr": 10,
+ }
+# Repeat the test for each one of the drives
+for drive in range(0, self.max_drives):
+# Pick each out of all possible params and test
+for tk in params:
+limits = dict([(k, 0) for k in params])
+limits[tk] = params[tk] * self.max_drives
+self.configure_throttle(self.max_drives, limits)
+self.do_test_throttle(1, 5, limits, drive)
+
 def test_burst(self):
 params = {"bps": 4096,
   "bps_rd": 4096,
diff --git a/tests/qemu-iotests/093.out b/tests/qemu-iotests/093.out
index 914e373..2f7d390 100644
--- a/tests/qemu-iotests/093.out
+++ b/tests/qemu-iotests/093.out
@@ -1,5 +1,5 @@
-.
+...
 --
-Ran 5 tests
+Ran 7 tests
 
 OK
-- 
1.8.3.1




[Qemu-block] [PULL 06/23] throttle: Correct access to wrong BlockBackendPublic structures

2016-10-24 Thread Kevin Wolf
From: Alberto Garcia 

In 27ccdd52598290f0f8b58be56e235aff7aebfaf3 the throttling fields were
moved from BlockDriverState to BlockBackend. However in a few cases
the code started using throttling fields from the active BlockBackend
instead of the round-robin token, making the algorithm behave
incorrectly.

This can cause starvation if there's a throttling group with several
drives but only one of them has I/O.

Cc: qemu-sta...@nongnu.org
Reported-by: Paolo Bonzini 
Signed-off-by: Alberto Garcia 
Reviewed-by: Paolo Bonzini 
Signed-off-by: Kevin Wolf 
---
 block/throttle-groups.c | 27 +++
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index 59545e2..17b2efb 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -168,6 +168,22 @@ static BlockBackend *throttle_group_next_blk(BlockBackend 
*blk)
 return blk_by_public(next);
 }
 
+/*
+ * Return whether a BlockBackend has pending requests.
+ *
+ * This assumes that tg->lock is held.
+ *
+ * @blk: the BlockBackend
+ * @is_write:  the type of operation (read/write)
+ * @ret:   whether the BlockBackend has pending requests.
+ */
+static inline bool blk_has_pending_reqs(BlockBackend *blk,
+bool is_write)
+{
+const BlockBackendPublic *blkp = blk_get_public(blk);
+return blkp->pending_reqs[is_write];
+}
+
 /* Return the next BlockBackend in the round-robin sequence with pending I/O
  * requests.
  *
@@ -188,7 +204,7 @@ static BlockBackend *next_throttle_token(BlockBackend *blk, 
bool is_write)
 
 /* get next bs round in round robin style */
 token = throttle_group_next_blk(token);
-while (token != start && !blkp->pending_reqs[is_write]) {
+while (token != start && !blk_has_pending_reqs(token, is_write)) {
 token = throttle_group_next_blk(token);
 }
 
@@ -196,10 +212,13 @@ static BlockBackend *next_throttle_token(BlockBackend 
*blk, bool is_write)
  * then decide the token is the current bs because chances are
  * the current bs get the current request queued.
  */
-if (token == start && !blkp->pending_reqs[is_write]) {
+if (token == start && !blk_has_pending_reqs(token, is_write)) {
 token = blk;
 }
 
+/* Either we return the original BB, or one with pending requests */
+assert(token == blk || blk_has_pending_reqs(token, is_write));
+
 return token;
 }
 
@@ -257,7 +276,7 @@ static void schedule_next_request(BlockBackend *blk, bool 
is_write)
 
 /* Check if there's any pending request to schedule next */
 token = next_throttle_token(blk, is_write);
-if (!blkp->pending_reqs[is_write]) {
+if (!blk_has_pending_reqs(token, is_write)) {
 return;
 }
 
@@ -271,7 +290,7 @@ static void schedule_next_request(BlockBackend *blk, bool 
is_write)
 qemu_co_queue_next(&blkp->throttled_reqs[is_write])) {
 token = blk;
 } else {
-ThrottleTimers *tt = &blkp->throttle_timers;
+ThrottleTimers *tt = &blk_get_public(token)->throttle_timers;
 int64_t now = qemu_clock_get_ns(tt->clock_type);
 timer_mod(tt->timers[is_write], now + 1);
 tg->any_timer_armed[is_write] = true;
-- 
1.8.3.1




[Qemu-block] [PULL 08/23] qemu-nbd: Add --fork option

2016-10-24 Thread Kevin Wolf
From: Max Reitz 

Using the --fork option, one can make qemu-nbd fork the worker process.
The original process will exit if the worker fails or once the worker
enters the main loop.

Suggested-by: Sascha Silbe 
Signed-off-by: Max Reitz 
Signed-off-by: Kevin Wolf 
---
 qemu-nbd.c| 17 -
 qemu-nbd.texi |  2 ++
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/qemu-nbd.c b/qemu-nbd.c
index cca4a98..b757dc7 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -48,6 +48,7 @@
 #define QEMU_NBD_OPT_OBJECT260
 #define QEMU_NBD_OPT_TLSCREDS  261
 #define QEMU_NBD_OPT_IMAGE_OPTS262
+#define QEMU_NBD_OPT_FORK  263
 
 #define MBR_SIZE 512
 
@@ -92,6 +93,8 @@ static void usage(const char *name)
 "passwords and/or encryption keys\n"
 "  -T, --trace [[enable=]][,events=][,file=]\n"
 "specify tracing options\n"
+"  --forkfork off the server process and exit the parent\n"
+"once the server is running\n"
 #ifdef __linux__
 "Kernel NBD client support:\n"
 "  -c, --connect=DEV connect FILE to the local NBD device DEV\n"
@@ -503,6 +506,7 @@ int main(int argc, char **argv)
 { "tls-creds", required_argument, NULL, QEMU_NBD_OPT_TLSCREDS },
 { "image-opts", no_argument, NULL, QEMU_NBD_OPT_IMAGE_OPTS },
 { "trace", required_argument, NULL, 'T' },
+{ "fork", no_argument, NULL, QEMU_NBD_OPT_FORK },
 { NULL, 0, NULL, 0 }
 };
 int ch;
@@ -524,6 +528,8 @@ int main(int argc, char **argv)
 bool imageOpts = false;
 bool writethrough = true;
 char *trace_file = NULL;
+bool fork_process = false;
+int old_stderr = -1;
 
 /* The client thread uses SIGTERM to interrupt the server.  A signal
  * handler ensures that "qemu-nbd -v -c" exits with a nice status code.
@@ -715,6 +721,9 @@ int main(int argc, char **argv)
 g_free(trace_file);
 trace_file = trace_opt_parse(optarg);
 break;
+case QEMU_NBD_OPT_FORK:
+fork_process = true;
+break;
 }
 }
 
@@ -774,7 +783,7 @@ int main(int argc, char **argv)
 return 0;
 }
 
-if (device && !verbose) {
+if ((device && !verbose) || fork_process) {
 int stderr_fd[2];
 pid_t pid;
 int ret;
@@ -797,6 +806,7 @@ int main(int argc, char **argv)
 ret = qemu_daemon(1, 0);
 
 /* Temporarily redirect stderr to the parent's pipe...  */
+old_stderr = dup(STDERR_FILENO);
 dup2(stderr_fd[1], STDERR_FILENO);
 if (ret < 0) {
 error_report("Failed to daemonize: %s", strerror(errno));
@@ -960,6 +970,11 @@ int main(int argc, char **argv)
 exit(EXIT_FAILURE);
 }
 
+if (fork_process) {
+dup2(old_stderr, STDERR_FILENO);
+close(old_stderr);
+}
+
 state = RUNNING;
 do {
 main_loop_wait(false);
diff --git a/qemu-nbd.texi b/qemu-nbd.texi
index 91ebf04..b7a9c6d 100644
--- a/qemu-nbd.texi
+++ b/qemu-nbd.texi
@@ -86,6 +86,8 @@ the new style NBD protocol negotiation
 Enable mandatory TLS encryption for the server by setting the ID
 of the TLS credentials object previously created with the --object
 option.
+@item --fork
+Fork off the server process and exit the parent once the server is running.
 @item -v, --verbose
 Display extra debugging information
 @item -h, --help
-- 
1.8.3.1




[Qemu-block] [PULL 09/23] iotests: Remove raciness from 162

2016-10-24 Thread Kevin Wolf
From: Max Reitz 

With qemu-nbd's new --fork option, we no longer need to launch it the
hacky way.

Suggested-by: Sascha Silbe 
Signed-off-by: Max Reitz 
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/162 | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/162 b/tests/qemu-iotests/162
index 0b43ea3..c7e6593 100755
--- a/tests/qemu-iotests/162
+++ b/tests/qemu-iotests/162
@@ -51,8 +51,7 @@ $QEMU_IMG info 'json:{"driver": "nbd", "host": 
"does.not.exist.example.com", "po
 # This is a test for NBD's bdrv_refresh_filename() implementation: It expects
 # either host or path to be set, but it must not assume that they are set to
 # strings in the options QDict
-$QEMU_NBD -k "$PWD/42" -f raw null-co:// &
-sleep 0.5
+$QEMU_NBD -k "$PWD/42" -f raw --fork null-co://
 $QEMU_IMG info 'json:{"driver": "nbd", "path": 42}' | grep '^image'
 rm -f 42
 
-- 
1.8.3.1




[Qemu-block] [PULL 02/23] qcow2: Support BDRV_REQ_MAY_UNMAP

2016-10-24 Thread Kevin Wolf
From: Fam Zheng 

Handling this is similar to what is done to the L2 entry in the case of
compressed clusters.

Signed-off-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
---
 block/qcow2-cluster.c | 9 +
 block/qcow2.c | 3 ++-
 block/qcow2.h | 3 ++-
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 61d1ffd..928c1e2 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1558,7 +1558,7 @@ fail:
  * clusters.
  */
 static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
-  uint64_t nb_clusters)
+  uint64_t nb_clusters, int flags)
 {
 BDRVQcow2State *s = bs->opaque;
 uint64_t *l2_table;
@@ -1582,7 +1582,7 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t 
offset,
 
 /* Update L2 entries */
 qcow2_cache_entry_mark_dirty(bs, s->l2_table_cache, l2_table);
-if (old_offset & QCOW_OFLAG_COMPRESSED) {
+if (old_offset & QCOW_OFLAG_COMPRESSED || flags & BDRV_REQ_MAY_UNMAP) {
 l2_table[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
 qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
 } else {
@@ -1595,7 +1595,8 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t 
offset,
 return nb_clusters;
 }
 
-int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors)
+int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors,
+int flags)
 {
 BDRVQcow2State *s = bs->opaque;
 uint64_t nb_clusters;
@@ -1612,7 +1613,7 @@ int qcow2_zero_clusters(BlockDriverState *bs, uint64_t 
offset, int nb_sectors)
 s->cache_discards = true;
 
 while (nb_clusters > 0) {
-ret = zero_single_l2(bs, offset, nb_clusters);
+ret = zero_single_l2(bs, offset, nb_clusters, flags);
 if (ret < 0) {
 goto fail;
 }
diff --git a/block/qcow2.c b/block/qcow2.c
index e11c7c9..6d5689a 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1155,6 +1155,7 @@ static int qcow2_open(BlockDriverState *bs, QDict 
*options, int flags,
 
 /* Initialise locks */
 qemu_co_mutex_init(&s->lock);
+bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP;
 
 /* Repair image if dirty */
 if (!(flags & (BDRV_O_CHECK | BDRV_O_INACTIVE)) && !bs->read_only &&
@@ -2477,7 +2478,7 @@ static coroutine_fn int 
qcow2_co_pwrite_zeroes(BlockDriverState *bs,
 trace_qcow2_pwrite_zeroes(qemu_coroutine_self(), offset, count);
 
 /* Whatever is left can use real zero clusters */
-ret = qcow2_zero_clusters(bs, offset, count >> BDRV_SECTOR_BITS);
+ret = qcow2_zero_clusters(bs, offset, count >> BDRV_SECTOR_BITS, flags);
 qemu_co_mutex_unlock(&s->lock);
 
 return ret;
diff --git a/block/qcow2.h b/block/qcow2.h
index 9ce5a37..92203a8 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -547,7 +547,8 @@ uint64_t 
qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
 int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m);
 int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
 int nb_sectors, enum qcow2_discard_type type, bool full_discard);
-int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors);
+int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors,
+int flags);
 
 int qcow2_expand_zero_clusters(BlockDriverState *bs,
BlockDriverAmendStatusCB *status_cb,
-- 
1.8.3.1




[Qemu-block] [PULL 03/23] block: Remove "options" indirection from blockdev-add

2016-10-24 Thread Kevin Wolf
Now that QAPI supports boxed types, we can have unions at the top level
of a command, so let's put our real options directly there for
blockdev-add instead of having a single "options" dict that contains the
real arguments.

blockdev-add is still experimental and we already made substantial
changes to the API recently, so we're free to make changes like this
one, too.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Markus Armbruster 
Reviewed-by: Fam Zheng 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Max Reitz 
---
 docs/qmp-commands.txt  |  84 ---
 qapi/block-core.json   |   4 +-
 tests/qemu-iotests/041 |  11 +++--
 tests/qemu-iotests/067 |  12 +++--
 tests/qemu-iotests/071 | 118 +
 tests/qemu-iotests/081 |  52 ++
 tests/qemu-iotests/085 |   9 ++--
 tests/qemu-iotests/087 |  76 +--
 tests/qemu-iotests/117 |  12 ++---
 tests/qemu-iotests/118 |  42 +-
 tests/qemu-iotests/124 |  20 -
 tests/qemu-iotests/139 |  10 ++---
 tests/qemu-iotests/141 |  13 +++---
 tests/qemu-iotests/155 |  10 ++---
 14 files changed, 214 insertions(+), 259 deletions(-)

diff --git a/docs/qmp-commands.txt b/docs/qmp-commands.txt
index 3220fb1..284576d 100644
--- a/docs/qmp-commands.txt
+++ b/docs/qmp-commands.txt
@@ -1090,11 +1090,11 @@ Arguments:
 Example:
 
 -> { "execute": "blockdev-add",
-"arguments": { "options": { "driver": "qcow2",
-"node-name": "node1534",
-"file": { "driver": "file",
-  "filename": "hd1.qcow2" },
-"backing": "" } } }
+"arguments": { "driver": "qcow2",
+   "node-name": "node1534",
+   "file": { "driver": "file",
+ "filename": "hd1.qcow2" },
+   "backing": "" } }
 
 <- { "return": {} }
 
@@ -3130,41 +3130,37 @@ This command is still a work in progress.  It doesn't support all
 block drivers among other things.  Stay away from it unless you want
 to help with its development.
 
-Arguments:
-
-- "options": block driver options
+For the arguments, see the QAPI schema documentation of BlockdevOptions.
 
 Example (1):
 
 -> { "execute": "blockdev-add",
-"arguments": { "options" : { "driver": "qcow2",
- "file": { "driver": "file",
-   "filename": "test.qcow2" } } } }
+"arguments": { "driver": "qcow2",
+   "file": { "driver": "file",
+ "filename": "test.qcow2" } } }
 <- { "return": {} }
 
 Example (2):
 
 -> { "execute": "blockdev-add",
  "arguments": {
- "options": {
-   "driver": "qcow2",
-   "node-name": "my_disk",
-   "discard": "unmap",
-   "cache": {
-   "direct": true,
-   "writeback": true
-   },
-   "file": {
-   "driver": "file",
-   "filename": "/tmp/test.qcow2"
-   },
-   "backing": {
-   "driver": "raw",
-   "file": {
-   "driver": "file",
-   "filename": "/dev/fdset/4"
-   }
-   }
+ "driver": "qcow2",
+ "node-name": "my_disk",
+ "discard": "unmap",
+ "cache": {
+ "direct": true,
+ "writeback": true
+ },
+ "file": {
+ "driver": "file",
+ "filename": "/tmp/test.qcow2"
+ },
+ "backing": {
+ "driver": "raw",
+ "file": {
+ "driver": "file",
+ "filename": "/dev/fdset/4"
+ }
  }
}
  }
@@ -3191,13 +3187,11 @@ Example:
 
 -> { "execute": "blockdev-add",
  "arguments": {
- "options": {
- "driver": "qcow2",
- "node-name": "node0",
- "file": {
- "driver": "file",
- "filename": "test.qcow2"
- }
+ "driver": "qcow2",
+ "node-name": "node0",
+ "file": {
+ "driver": "file",
+ "filename": "test.qcow2"
  }
  }
}
@@ -3342,10 +3336,10 @@ Arguments:
 Example:
 
 -> { "execute": "blockdev-add",
- "arguments": { "options": { "node-name": "node0",
- "driver": "raw",
- "file": { "driver": "file",
-   "filename": "fedora.iso" } } } }
+ "arguments": { "node-name": "node0",
+  "driver": "raw",
+  "file": { "driver": "file",
+"filename": "fedora.iso" } } }
 
 <- { "return": {} }
 
@

[Qemu-block] [PULL 19/23] hbitmap: serialization

2016-10-24 Thread Kevin Wolf
From: Vladimir Sementsov-Ogievskiy 

Functions to serialize / deserialize (restore) an HBitmap. The HBitmap is
saved as a linear sequence of bits, independent of endianness and of the
bitmap array element (unsigned long) size; therefore Little Endian is
chosen.

These functions are suitable for dirty bitmap migration: the bitmap can be
restored in several steps. To save performance, every step writes only the
last level of the bitmap. All other levels are rebuilt by
hbitmap_deserialize_finish() as the last step of restoring, so the HBitmap
is inconsistent while a restore is in progress.
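The endianness-independent encoding described above can be illustrated with a
minimal standalone sketch (hypothetical helper names, not QEMU's hbitmap code):
each bit is written least-significant-bit-first into a byte stream, so the
result depends neither on host byte order nor on sizeof(unsigned long).

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Write the first @nbits bits of @bits into @out as a plain sequence of
 * bits, least significant first.  @out must start out zeroed.  The byte
 * stream is identical on big- and little-endian hosts and for 32- and
 * 64-bit longs. */
static void bits_serialize(const unsigned long *bits, size_t nbits,
                           uint8_t *out)
{
    for (size_t i = 0; i < nbits; i++) {
        if (bits[i / WORD_BITS] & (1UL << (i % WORD_BITS))) {
            out[i / 8] |= (uint8_t)(1u << (i % 8));
        }
    }
}

/* Inverse of bits_serialize(); @bits must start out zeroed. */
static void bits_deserialize(unsigned long *bits, size_t nbits,
                             const uint8_t *in)
{
    for (size_t i = 0; i < nbits; i++) {
        if (in[i / 8] & (1u << (i % 8))) {
            bits[i / WORD_BITS] |= 1UL << (i % WORD_BITS);
        }
    }
}
```

The real code additionally has to rebuild the upper HBitmap levels afterwards,
which is what hbitmap_deserialize_finish() is for.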

Signed-off-by: Vladimir Sementsov-Ogievskiy 
[Fix left shift operand to 1UL; add "finish" parameter. - Fam]
Signed-off-by: Fam Zheng 

Signed-off-by: John Snow 
Message-id: 1476395910-8697-8-git-send-email-js...@redhat.com
Signed-off-by: Max Reitz 
---
 include/qemu/hbitmap.h |  79 
 util/hbitmap.c | 137 +
 2 files changed, 216 insertions(+)

diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
index 1725919..eb46475 100644
--- a/include/qemu/hbitmap.h
+++ b/include/qemu/hbitmap.h
@@ -146,6 +146,85 @@ void hbitmap_reset_all(HBitmap *hb);
 bool hbitmap_get(const HBitmap *hb, uint64_t item);
 
 /**
+ * hbitmap_serialization_granularity:
+ * @hb: HBitmap to operate on.
+ *
+ * Granularity of serialization chunks, used by other serialization functions.
+ * For every chunk:
+ * 1. Chunk start should be aligned to this granularity.
+ * 2. Chunk size should be aligned too, except for last chunk (for which
+ *  start + count == hb->size)
+ */
+uint64_t hbitmap_serialization_granularity(const HBitmap *hb);
+
+/**
+ * hbitmap_serialization_size:
+ * @hb: HBitmap to operate on.
+ * @start: Starting bit
+ * @count: Number of bits
+ *
+ * Return number of bytes hbitmap_(de)serialize_part needs
+ */
+uint64_t hbitmap_serialization_size(const HBitmap *hb,
+uint64_t start, uint64_t count);
+
+/**
+ * hbitmap_serialize_part
+ * @hb: HBitmap to operate on.
+ * @buf: Buffer to store serialized bitmap.
+ * @start: First bit to store.
+ * @count: Number of bits to store.
+ *
+ * Stores HBitmap data corresponding to given region. The format of saved data
+ * is linear sequence of bits, so it can be used by hbitmap_deserialize_part
+ * independently of endianness and size of HBitmap level array elements
+ */
+void hbitmap_serialize_part(const HBitmap *hb, uint8_t *buf,
+uint64_t start, uint64_t count);
+
+/**
+ * hbitmap_deserialize_part
+ * @hb: HBitmap to operate on.
+ * @buf: Buffer to restore bitmap data from.
+ * @start: First bit to restore.
+ * @count: Number of bits to restore.
+ * @finish: Whether to call hbitmap_deserialize_finish automatically.
+ *
+ * Restores HBitmap data corresponding to given region. The format is the same
+ * as for hbitmap_serialize_part.
+ *
+ * If @finish is false, caller must call hbitmap_deserialize_finish before
+ * using the bitmap.
+ */
+void hbitmap_deserialize_part(HBitmap *hb, uint8_t *buf,
+  uint64_t start, uint64_t count,
+  bool finish);
+
+/**
+ * hbitmap_deserialize_zeroes
+ * @hb: HBitmap to operate on.
+ * @start: First bit to restore.
+ * @count: Number of bits to restore.
+ * @finish: Whether to call hbitmap_deserialize_finish automatically.
+ *
+ * Fills the bitmap with zeroes.
+ *
+ * If @finish is false, caller must call hbitmap_deserialize_finish before
+ * using the bitmap.
+ */
+void hbitmap_deserialize_zeroes(HBitmap *hb, uint64_t start, uint64_t count,
+bool finish);
+
+/**
+ * hbitmap_deserialize_finish
+ * @hb: HBitmap to operate on.
+ *
+ * Repair HBitmap after calling hbitmap_deserialize_part. Actually, all HBitmap
+ * layers are restored here.
+ */
+void hbitmap_deserialize_finish(HBitmap *hb);
+
+/**
  * hbitmap_free:
  * @hb: HBitmap to operate on.
  *
diff --git a/util/hbitmap.c b/util/hbitmap.c
index f303975..5d1a21c 100644
--- a/util/hbitmap.c
+++ b/util/hbitmap.c
@@ -397,6 +397,143 @@ bool hbitmap_get(const HBitmap *hb, uint64_t item)
 return (hb->levels[HBITMAP_LEVELS - 1][pos >> BITS_PER_LEVEL] & bit) != 0;
 }
 
+uint64_t hbitmap_serialization_granularity(const HBitmap *hb)
+{
+/* Require at least 64 bit granularity to be safe on both 64 bit and 32 bit
+ * hosts. */
+return 64 << hb->granularity;
+}
+
+/* Start should be aligned to serialization granularity, chunk size should be
+ * aligned to serialization granularity too, except for last chunk.
+ */
+static void serialization_chunk(const HBitmap *hb,
+uint64_t start, uint64_t count,
+unsigned long **first_el, uint64_t *el_count)
+{
+uint64_t last = start + count - 1;
+uint64_t gran = hbitmap_serialization_granularity(hb);
+
+assert((start & (gran - 1)) == 0);
+assert((last >> hb->granularity) < hb->size);

[Qemu-block] [PULL 21/23] tests: Add test code for hbitmap serialization

2016-10-24 Thread Kevin Wolf
From: Fam Zheng 

Signed-off-by: Fam Zheng 
[Fixed minor constant issue. --js]
Signed-off-by: John Snow 

Signed-off-by: John Snow 
Message-id: 1476395910-8697-10-git-send-email-js...@redhat.com
Reviewed-by: Max Reitz 
Signed-off-by: Max Reitz 
---
 tests/test-hbitmap.c | 156 +++
 1 file changed, 156 insertions(+)

diff --git a/tests/test-hbitmap.c b/tests/test-hbitmap.c
index e3abde1..9b7495c 100644
--- a/tests/test-hbitmap.c
+++ b/tests/test-hbitmap.c
@@ -11,6 +11,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/hbitmap.h"
+#include "qemu/bitmap.h"
 #include "block/block.h"
 
 #define LOG_BITS_PER_LONG  (BITS_PER_LONG == 32 ? 5 : 6)
@@ -737,6 +738,16 @@ static void test_hbitmap_meta_one(TestHBitmapData *data, const void *unused)
 }
 }
 
+static void test_hbitmap_serialize_granularity(TestHBitmapData *data,
+   const void *unused)
+{
+int r;
+
+hbitmap_test_init(data, L3 * 2, 3);
+r = hbitmap_serialization_granularity(data->hb);
+g_assert_cmpint(r, ==, 64 << 3);
+}
+
 static void test_hbitmap_meta_zero(TestHBitmapData *data, const void *unused)
 {
 hbitmap_test_init_meta(data, 0, 0, 1);
@@ -744,6 +755,142 @@ static void test_hbitmap_meta_zero(TestHBitmapData *data, const void *unused)
 hbitmap_check_meta(data, 0, 0);
 }
 
+static void hbitmap_test_serialize_range(TestHBitmapData *data,
+ uint8_t *buf, size_t buf_size,
+ uint64_t pos, uint64_t count)
+{
+size_t i;
+unsigned long *el = (unsigned long *)buf;
+
+assert(hbitmap_granularity(data->hb) == 0);
+hbitmap_reset_all(data->hb);
+memset(buf, 0, buf_size);
+if (count) {
+hbitmap_set(data->hb, pos, count);
+}
+hbitmap_serialize_part(data->hb, buf, 0, data->size);
+
+/* Serialized buffer is inherently LE, convert it back manually to test */
+for (i = 0; i < buf_size / sizeof(unsigned long); i++) {
+el[i] = (BITS_PER_LONG == 32 ? le32_to_cpu(el[i]) : le64_to_cpu(el[i]));
+}
+
+for (i = 0; i < data->size; i++) {
+int is_set = test_bit(i, (unsigned long *)buf);
+if (i >= pos && i < pos + count) {
+g_assert(is_set);
+} else {
+g_assert(!is_set);
+}
+}
+
+/* Re-serialize for deserialization testing */
+memset(buf, 0, buf_size);
+hbitmap_serialize_part(data->hb, buf, 0, data->size);
+hbitmap_reset_all(data->hb);
+hbitmap_deserialize_part(data->hb, buf, 0, data->size, true);
+
+for (i = 0; i < data->size; i++) {
+int is_set = hbitmap_get(data->hb, i);
+if (i >= pos && i < pos + count) {
+g_assert(is_set);
+} else {
+g_assert(!is_set);
+}
+}
+}
+
+static void test_hbitmap_serialize_basic(TestHBitmapData *data,
+ const void *unused)
+{
+int i, j;
+size_t buf_size;
+uint8_t *buf;
+uint64_t positions[] = { 0, 1, L1 - 1, L1, L2 - 1, L2, L2 + 1, L3 - 1 };
+int num_positions = sizeof(positions) / sizeof(positions[0]);
+
+hbitmap_test_init(data, L3, 0);
+buf_size = hbitmap_serialization_size(data->hb, 0, data->size);
+buf = g_malloc0(buf_size);
+
+for (i = 0; i < num_positions; i++) {
+for (j = 0; j < num_positions; j++) {
+hbitmap_test_serialize_range(data, buf, buf_size,
+ positions[i],
+ MIN(positions[j], L3 - positions[i]));
+}
+}
+
+g_free(buf);
+}
+
+static void test_hbitmap_serialize_part(TestHBitmapData *data,
+const void *unused)
+{
+int i, j, k;
+size_t buf_size;
+uint8_t *buf;
+uint64_t positions[] = { 0, 1, L1 - 1, L1, L2 - 1, L2, L2 + 1, L3 - 1 };
+int num_positions = sizeof(positions) / sizeof(positions[0]);
+
+hbitmap_test_init(data, L3, 0);
+buf_size = L2;
+buf = g_malloc0(buf_size);
+
+for (i = 0; i < num_positions; i++) {
+hbitmap_set(data->hb, positions[i], 1);
+}
+
+for (i = 0; i < data->size; i += buf_size) {
+unsigned long *el = (unsigned long *)buf;
+hbitmap_serialize_part(data->hb, buf, i, buf_size);
+for (j = 0; j < buf_size / sizeof(unsigned long); j++) {
+el[j] = (BITS_PER_LONG == 32 ? le32_to_cpu(el[j]) : le64_to_cpu(el[j]));
+}
+
+for (j = 0; j < buf_size; j++) {
+bool should_set = false;
+for (k = 0; k < num_positions; k++) {
+if (positions[k] == j + i) {
+should_set = true;
+break;
+}
+}
+g_assert_cmpint(should_set, ==, test_bit(j, (unsigned long *)buf));
+}
+}
+
+g_free(buf);
+}
+
+static void test_hbitmap_serialize_zeroes(TestHBitmapData *data,
+   

[Qemu-block] [PULL 11/23] quorum: change child_iter to children_read

2016-10-24 Thread Kevin Wolf
From: Paolo Bonzini 

This simplifies the code a bit by using the usual C "inclusive start,
exclusive end" pattern for ranges.
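The "inclusive start, exclusive end" idea — tracking how many children have
been read rather than the index of the last one — can be sketched as follows
(a standalone illustration with invented names, not the quorum driver itself):

```c
/* children_read is the exclusive end of the already-read range
 * [0, children_read): it doubles as the count of reads performed and
 * as the index of the next child to try. */
typedef struct {
    int num_children;
    int children_read;
} FifoState;

/* Return the index of the next child to read, or -1 once every child
 * has been tried. */
static int next_child(FifoState *s)
{
    if (s->children_read < s->num_children) {
        return s->children_read++;
    }
    return -1;
}
```

With a count instead of a "current index", loops over the processed range take
the natural `for (i = 0; i < children_read; i++)` form, which is what the
cleanup in quorum_aio_finalize() relies on.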

Signed-off-by: Paolo Bonzini 
Message-id: 1475685327-22767-2-git-send-email-pbonz...@redhat.com
Reviewed-by: Max Reitz 
Reviewed-by: Alberto Garcia 
Signed-off-by: Max Reitz 
---
 block/quorum.c | 30 +-
 1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index 9cf876f..435296e 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -130,7 +130,7 @@ struct QuorumAIOCB {
 
 bool is_read;
 int vote_ret;
-int child_iter; /* which child to read in fifo pattern */
+int children_read;  /* how many children have been read from */
 };
 
 static bool quorum_vote(QuorumAIOCB *acb);
@@ -165,8 +165,7 @@ static void quorum_aio_finalize(QuorumAIOCB *acb)
 acb->common.cb(acb->common.opaque, ret);
 
 if (acb->is_read) {
-/* on the quorum case acb->child_iter == s->num_children - 1 */
-for (i = 0; i <= acb->child_iter; i++) {
+for (i = 0; i < acb->children_read; i++) {
 qemu_vfree(acb->qcrs[i].buf);
 qemu_iovec_destroy(&acb->qcrs[i].qiov);
 }
@@ -301,14 +300,13 @@ static void quorum_aio_cb(void *opaque, int ret)
 
 if (acb->is_read && s->read_pattern == QUORUM_READ_PATTERN_FIFO) {
 /* We try to read next child in FIFO order if we fail to read */
-if (ret < 0 && (acb->child_iter + 1) < s->num_children) {
-acb->child_iter++;
+if (ret < 0 && acb->children_read < s->num_children) {
 read_fifo_child(acb);
 return;
 }
 
 if (ret == 0) {
-quorum_copy_qiov(acb->qiov, &acb->qcrs[acb->child_iter].qiov);
+quorum_copy_qiov(acb->qiov, &acb->qcrs[acb->children_read - 1].qiov);
 }
 acb->vote_ret = ret;
 quorum_aio_finalize(acb);
@@ -653,6 +651,7 @@ static BlockAIOCB *read_quorum_children(QuorumAIOCB *acb)
 BDRVQuorumState *s = acb->common.bs->opaque;
 int i;
 
+acb->children_read = s->num_children;
 for (i = 0; i < s->num_children; i++) {
acb->qcrs[i].buf = qemu_blockalign(s->children[i]->bs, acb->qiov->size);
 qemu_iovec_init(&acb->qcrs[i].qiov, acb->qiov->niov);
@@ -671,16 +670,14 @@ static BlockAIOCB *read_quorum_children(QuorumAIOCB *acb)
 static BlockAIOCB *read_fifo_child(QuorumAIOCB *acb)
 {
 BDRVQuorumState *s = acb->common.bs->opaque;
+int n = acb->children_read++;
 
-acb->qcrs[acb->child_iter].buf =
-qemu_blockalign(s->children[acb->child_iter]->bs, acb->qiov->size);
-qemu_iovec_init(&acb->qcrs[acb->child_iter].qiov, acb->qiov->niov);
-qemu_iovec_clone(&acb->qcrs[acb->child_iter].qiov, acb->qiov,
- acb->qcrs[acb->child_iter].buf);
-acb->qcrs[acb->child_iter].aiocb =
-bdrv_aio_readv(s->children[acb->child_iter], acb->sector_num,
-   &acb->qcrs[acb->child_iter].qiov, acb->nb_sectors,
-   quorum_aio_cb, &acb->qcrs[acb->child_iter]);
+acb->qcrs[n].buf = qemu_blockalign(s->children[n]->bs, acb->qiov->size);
+qemu_iovec_init(&acb->qcrs[n].qiov, acb->qiov->niov);
+qemu_iovec_clone(&acb->qcrs[n].qiov, acb->qiov, acb->qcrs[n].buf);
+acb->qcrs[n].aiocb = bdrv_aio_readv(s->children[n], acb->sector_num,
+&acb->qcrs[n].qiov, acb->nb_sectors,
+quorum_aio_cb, &acb->qcrs[n]);
 
 return &acb->common;
 }
@@ -696,13 +693,12 @@ static BlockAIOCB *quorum_aio_readv(BlockDriverState *bs,
 QuorumAIOCB *acb = quorum_aio_get(s, bs, qiov, sector_num,
   nb_sectors, cb, opaque);
 acb->is_read = true;
+acb->children_read = 0;
 
 if (s->read_pattern == QUORUM_READ_PATTERN_QUORUM) {
-acb->child_iter = s->num_children - 1;
 return read_quorum_children(acb);
 }
 
-acb->child_iter = 0;
 return read_fifo_child(acb);
 }
 
-- 
1.8.3.1




[Qemu-block] [PULL 15/23] tests: Add test code for meta bitmap

2016-10-24 Thread Kevin Wolf
From: Fam Zheng 

Signed-off-by: Fam Zheng 
Reviewed-by: John Snow 
Reviewed-by: Max Reitz 
Signed-off-by: John Snow 
Message-id: 1476395910-8697-4-git-send-email-js...@redhat.com
Signed-off-by: Max Reitz 
---
 tests/test-hbitmap.c | 116 +++
 1 file changed, 116 insertions(+)

diff --git a/tests/test-hbitmap.c b/tests/test-hbitmap.c
index c0e9895..e3abde1 100644
--- a/tests/test-hbitmap.c
+++ b/tests/test-hbitmap.c
@@ -11,6 +11,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/hbitmap.h"
+#include "block/block.h"
 
 #define LOG_BITS_PER_LONG  (BITS_PER_LONG == 32 ? 5 : 6)
 
@@ -20,6 +21,7 @@
 
 typedef struct TestHBitmapData {
 HBitmap   *hb;
+HBitmap   *meta;
 unsigned long *bits;
 size_t size;
 size_t old_size;
@@ -91,6 +93,14 @@ static void hbitmap_test_init(TestHBitmapData *data,
 }
 }
 
+static void hbitmap_test_init_meta(TestHBitmapData *data,
+   uint64_t size, int granularity,
+   int meta_chunk)
+{
+hbitmap_test_init(data, size, granularity);
+data->meta = hbitmap_create_meta(data->hb, meta_chunk);
+}
+
 static inline size_t hbitmap_test_array_size(size_t bits)
 {
 size_t n = DIV_ROUND_UP(bits, BITS_PER_LONG);
@@ -133,6 +143,9 @@ static void hbitmap_test_teardown(TestHBitmapData *data,
   const void *unused)
 {
 if (data->hb) {
+if (data->meta) {
+hbitmap_free_meta(data->hb);
+}
 hbitmap_free(data->hb);
 data->hb = NULL;
 }
@@ -634,6 +647,103 @@ static void test_hbitmap_truncate_shrink_large(TestHBitmapData *data,
 hbitmap_test_truncate(data, size, -diff, 0);
 }
 
+static void hbitmap_check_meta(TestHBitmapData *data,
+   int64_t start, int count)
+{
+int64_t i;
+
+for (i = 0; i < data->size; i++) {
+if (i >= start && i < start + count) {
+g_assert(hbitmap_get(data->meta, i));
+} else {
+g_assert(!hbitmap_get(data->meta, i));
+}
+}
+}
+
+static void hbitmap_test_meta(TestHBitmapData *data,
+  int64_t start, int count,
+  int64_t check_start, int check_count)
+{
+hbitmap_reset_all(data->hb);
+hbitmap_reset_all(data->meta);
+
+/* Test "unset" -> "unset" will not update meta. */
+hbitmap_reset(data->hb, start, count);
+hbitmap_check_meta(data, 0, 0);
+
+/* Test "unset" -> "set" will update meta */
+hbitmap_set(data->hb, start, count);
+hbitmap_check_meta(data, check_start, check_count);
+
+/* Test "set" -> "set" will not update meta */
+hbitmap_reset_all(data->meta);
+hbitmap_set(data->hb, start, count);
+hbitmap_check_meta(data, 0, 0);
+
+/* Test "set" -> "unset" will update meta */
+hbitmap_reset_all(data->meta);
+hbitmap_reset(data->hb, start, count);
+hbitmap_check_meta(data, check_start, check_count);
+}
+
+static void hbitmap_test_meta_do(TestHBitmapData *data, int chunk_size)
+{
+uint64_t size = chunk_size * 100;
+hbitmap_test_init_meta(data, size, 0, chunk_size);
+
+hbitmap_test_meta(data, 0, 1, 0, chunk_size);
+hbitmap_test_meta(data, 0, chunk_size, 0, chunk_size);
+hbitmap_test_meta(data, chunk_size - 1, 1, 0, chunk_size);
+hbitmap_test_meta(data, chunk_size - 1, 2, 0, chunk_size * 2);
+hbitmap_test_meta(data, chunk_size - 1, chunk_size + 1, 0, chunk_size * 2);
+hbitmap_test_meta(data, chunk_size - 1, chunk_size + 2, 0, chunk_size * 3);
+hbitmap_test_meta(data, 7 * chunk_size - 1, chunk_size + 2,
+  6 * chunk_size, chunk_size * 3);
+hbitmap_test_meta(data, size - 1, 1, size - chunk_size, chunk_size);
+hbitmap_test_meta(data, 0, size, 0, size);
+}
+
+static void test_hbitmap_meta_byte(TestHBitmapData *data, const void *unused)
+{
+hbitmap_test_meta_do(data, BITS_PER_BYTE);
+}
+
+static void test_hbitmap_meta_word(TestHBitmapData *data, const void *unused)
+{
+hbitmap_test_meta_do(data, BITS_PER_LONG);
+}
+
+static void test_hbitmap_meta_sector(TestHBitmapData *data, const void *unused)
+{
+hbitmap_test_meta_do(data, BDRV_SECTOR_SIZE * BITS_PER_BYTE);
+}
+
+/**
+ * Create an HBitmap and test set/unset.
+ */
+static void test_hbitmap_meta_one(TestHBitmapData *data, const void *unused)
+{
+int i;
+int64_t offsets[] = {
+0, 1, L1 - 1, L1, L1 + 1, L2 - 1, L2, L2 + 1, L3 - 1, L3, L3 + 1
+};
+
+hbitmap_test_init_meta(data, L3 * 2, 0, 1);
+for (i = 0; i < ARRAY_SIZE(offsets); i++) {
+hbitmap_test_meta(data, offsets[i], 1, offsets[i], 1);
+hbitmap_test_meta(data, offsets[i], L1, offsets[i], L1);
+hbitmap_test_meta(data, offsets[i], L2, offsets[i], L2);
+}
+}
+
+static void test_hbitmap_meta_zero(TestHBitmapData *data, const void *unused)
+{
+hbitmap_test_init_meta(data, 0, 0, 1);
+
+hbitmap_check_

[Qemu-block] [PULL 17/23] block: Add two dirty bitmap getters

2016-10-24 Thread Kevin Wolf
From: Fam Zheng 

For dirty bitmap users to get the size and the name of a
BdrvDirtyBitmap.

Signed-off-by: Fam Zheng 
Reviewed-by: John Snow 
Reviewed-by: Max Reitz 
Signed-off-by: John Snow 
Message-id: 1476395910-8697-6-git-send-email-js...@redhat.com
Signed-off-by: Max Reitz 
---
 block/dirty-bitmap.c | 10 ++
 include/block/dirty-bitmap.h |  2 ++
 2 files changed, 12 insertions(+)

diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 9c6febb..860acc9 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -154,6 +154,16 @@ void bdrv_dirty_bitmap_reset_meta(BlockDriverState *bs,
 hbitmap_reset(bitmap->meta, sector, nb_sectors);
 }
 
+int64_t bdrv_dirty_bitmap_size(const BdrvDirtyBitmap *bitmap)
+{
+return bitmap->size;
+}
+
+const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap)
+{
+return bitmap->name;
+}
+
 bool bdrv_dirty_bitmap_frozen(BdrvDirtyBitmap *bitmap)
 {
 return bitmap->successor;
diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index 69c500b..c4e7858 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -32,6 +32,8 @@ uint32_t bdrv_get_default_bitmap_granularity(BlockDriverState *bs);
 uint32_t bdrv_dirty_bitmap_granularity(BdrvDirtyBitmap *bitmap);
 bool bdrv_dirty_bitmap_enabled(BdrvDirtyBitmap *bitmap);
 bool bdrv_dirty_bitmap_frozen(BdrvDirtyBitmap *bitmap);
+const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap);
+int64_t bdrv_dirty_bitmap_size(const BdrvDirtyBitmap *bitmap);
 DirtyBitmapStatus bdrv_dirty_bitmap_status(BdrvDirtyBitmap *bitmap);
 int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
int64_t sector);
-- 
1.8.3.1




[Qemu-block] [PULL 13/23] block: Hide HBitmap in block dirty bitmap interface

2016-10-24 Thread Kevin Wolf
From: Fam Zheng 

HBitmap is an implementation detail of block dirty bitmap that should be hidden
from users. Introduce a BdrvDirtyBitmapIter to encapsulate the underlying
HBitmapIter.

A small difference in the interface: before, an HBitmapIter was initialized
in place; now the new BdrvDirtyBitmapIter must be dynamically allocated,
because the structure definition is private to block/dirty-bitmap.c.

Two current users are converted too.
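The pattern used here — keeping a struct definition private to one .c file so
that callers can only hold pointers obtained from a constructor — looks roughly
like this (a generic single-file sketch with invented names, not the
BdrvDirtyBitmapIter code):

```c
#include <stddef.h>
#include <stdlib.h>

/* What the header would expose: an incomplete type plus new/next/free.
 * Callers cannot place an IntIter on the stack because its size is
 * unknown to them; they must go through int_iter_new(). */
typedef struct IntIter IntIter;

/* What the .c file hides: the actual definition. */
struct IntIter {
    const int *data;
    size_t len;
    size_t pos;
};

static IntIter *int_iter_new(const int *data, size_t len)
{
    IntIter *it = malloc(sizeof(*it));   /* sketch: no OOM handling */
    it->data = data;
    it->len = len;
    it->pos = 0;
    return it;
}

/* Returns the next element, or -1 when the iterator is exhausted. */
static int int_iter_next(IntIter *it)
{
    return it->pos < it->len ? it->data[it->pos++] : -1;
}

static void int_iter_free(IntIter *it)
{
    free(it);
}
```

This is why bdrv_dirty_iter_new()/bdrv_dirty_iter_free() replace the old
in-place bdrv_dirty_iter_init().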

Signed-off-by: Fam Zheng 
Reviewed-by: Max Reitz 
Signed-off-by: John Snow 
Message-id: 1476395910-8697-2-git-send-email-js...@redhat.com
Signed-off-by: Max Reitz 
---
 block/backup.c   | 14 --
 block/dirty-bitmap.c | 39 +--
 block/mirror.c   | 24 +---
 include/block/dirty-bitmap.h |  7 +--
 include/qemu/typedefs.h  |  1 +
 5 files changed, 60 insertions(+), 25 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 582bd0f..02dbe48 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -372,14 +372,14 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
 int64_t end;
 int64_t last_cluster = -1;
 int64_t sectors_per_cluster = cluster_size_sectors(job);
-HBitmapIter hbi;
+BdrvDirtyBitmapIter *dbi;
 
 granularity = bdrv_dirty_bitmap_granularity(job->sync_bitmap);
 clusters_per_iter = MAX((granularity / job->cluster_size), 1);
-bdrv_dirty_iter_init(job->sync_bitmap, &hbi);
+dbi = bdrv_dirty_iter_new(job->sync_bitmap, 0);
 
 /* Find the next dirty sector(s) */
-while ((sector = hbitmap_iter_next(&hbi)) != -1) {
+while ((sector = bdrv_dirty_iter_next(dbi)) != -1) {
 cluster = sector / sectors_per_cluster;
 
 /* Fake progress updates for any clusters we skipped */
@@ -391,7 +391,7 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
 for (end = cluster + clusters_per_iter; cluster < end; cluster++) {
 do {
 if (yield_and_check(job)) {
-return ret;
+goto out;
 }
 ret = backup_do_cow(job, cluster * sectors_per_cluster,
 sectors_per_cluster, &error_is_read,
@@ -399,7 +399,7 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
 if ((ret < 0) &&
 backup_error_action(job, error_is_read, -ret) ==
 BLOCK_ERROR_ACTION_REPORT) {
-return ret;
+goto out;
 }
 } while (ret < 0);
 }
@@ -407,7 +407,7 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
 /* If the bitmap granularity is smaller than the backup granularity,
  * we need to advance the iterator pointer to the next cluster. */
 if (granularity < job->cluster_size) {
-bdrv_set_dirty_iter(&hbi, cluster * sectors_per_cluster);
+bdrv_set_dirty_iter(dbi, cluster * sectors_per_cluster);
 }
 
 last_cluster = cluster - 1;
@@ -419,6 +419,8 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
 job->common.offset += ((end - last_cluster - 1) * job->cluster_size);
 }
 
+out:
+bdrv_dirty_iter_free(dbi);
 return ret;
 }
 
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index f2bfdcf..c572dfa 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -42,9 +42,15 @@ struct BdrvDirtyBitmap {
 char *name; /* Optional non-empty unique ID */
 int64_t size;   /* Size of the bitmap (Number of sectors) */
 bool disabled;  /* Bitmap is read-only */
+int active_iterators;   /* How many iterators are active */
 QLIST_ENTRY(BdrvDirtyBitmap) list;
 };
 
+struct BdrvDirtyBitmapIter {
+HBitmapIter hbi;
+BdrvDirtyBitmap *bitmap;
+};
+
 BdrvDirtyBitmap *bdrv_find_dirty_bitmap(BlockDriverState *bs, const char *name)
 {
 BdrvDirtyBitmap *bm;
@@ -212,6 +218,7 @@ void bdrv_dirty_bitmap_truncate(BlockDriverState *bs)
 
 QLIST_FOREACH(bitmap, &bs->dirty_bitmaps, list) {
 assert(!bdrv_dirty_bitmap_frozen(bitmap));
+assert(!bitmap->active_iterators);
 hbitmap_truncate(bitmap->bitmap, size);
 bitmap->size = size;
 }
@@ -224,6 +231,7 @@ static void bdrv_do_release_matching_dirty_bitmap(BlockDriverState *bs,
 BdrvDirtyBitmap *bm, *next;
 QLIST_FOREACH_SAFE(bm, &bs->dirty_bitmaps, list, next) {
 if ((!bitmap || bm == bitmap) && (!only_named || bm->name)) {
+assert(!bm->active_iterators);
 assert(!bdrv_dirty_bitmap_frozen(bm));
 QLIST_REMOVE(bm, list);
 hbitmap_free(bm->bitmap);
@@ -320,9 +328,29 @@ uint32_t bdrv_dirty_bitmap_granularity(BdrvDirtyBitmap *bitmap)
 return BDRV_SECTOR_SIZE << hbitmap_granularity(bitmap->bitmap);
 }
 
-void bdrv_dirty_iter_init(BdrvDirt

[Qemu-block] [PULL 18/23] block: Assert that bdrv_release_dirty_bitmap succeeded

2016-10-24 Thread Kevin Wolf
From: Fam Zheng 

We use a loop over bs->dirty_bitmaps to make sure the caller is
only releasing a bitmap owned by bs. Let's also assert that in this case
the caller is releasing a bitmap that does exist.

Signed-off-by: Fam Zheng 
Reviewed-by: Max Reitz 
Signed-off-by: John Snow 
Message-id: 1476395910-8697-7-git-send-email-js...@redhat.com
Signed-off-by: Max Reitz 
---
 block/dirty-bitmap.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 860acc9..31d5296 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -305,6 +305,9 @@ static void bdrv_do_release_matching_dirty_bitmap(BlockDriverState *bs,
 }
 }
 }
+if (bitmap) {
+abort();
+}
 }
 
 void bdrv_release_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap)
-- 
1.8.3.1




[Qemu-block] [PULL 16/23] block: Support meta dirty bitmap

2016-10-24 Thread Kevin Wolf
From: Fam Zheng 

The added group of operations enables tracking of the changed bits in
the dirty bitmap.
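The core idea — a second bitmap whose bit i records whether anything in chunk i
of the primary bitmap has changed — can be sketched with plain arrays
(hypothetical names; the real code builds this on top of HBitmap):

```c
#include <stdbool.h>

enum { NBITS = 64, CHUNK = 8 };   /* one meta bit per 8 primary bits */

typedef struct {
    bool primary[NBITS];
    bool meta[NBITS / CHUNK];
} TrackedBits;

/* Change a primary bit; if its value actually toggles, mark the meta
 * bit covering its chunk dirty.  A "set" -> "set" write leaves the
 * meta bitmap untouched, which is the behaviour the meta-bitmap tests
 * in this series exercise. */
static void tracked_set(TrackedBits *t, int i, bool val)
{
    if (t->primary[i] != val) {
        t->primary[i] = val;
        t->meta[i / CHUNK] = true;
    }
}
```

A consumer (such as incremental migration of the dirty bitmap) can then scan
only the meta bitmap to find which chunks of the primary bitmap need
re-transmission, and reset those meta bits afterwards.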

Signed-off-by: Fam Zheng 
Reviewed-by: Max Reitz 
Signed-off-by: John Snow 
Message-id: 1476395910-8697-5-git-send-email-js...@redhat.com
Signed-off-by: Max Reitz 
---
 block/dirty-bitmap.c | 52 
 include/block/dirty-bitmap.h |  9 
 2 files changed, 61 insertions(+)

diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index c572dfa..9c6febb 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -38,6 +38,7 @@
  */
 struct BdrvDirtyBitmap {
 HBitmap *bitmap;/* Dirty sector bitmap implementation */
+HBitmap *meta;  /* Meta dirty bitmap */
 BdrvDirtyBitmap *successor; /* Anonymous child; implies frozen status */
 char *name; /* Optional non-empty unique ID */
 int64_t size;   /* Size of the bitmap (Number of sectors) */
@@ -103,6 +104,56 @@ BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
 return bitmap;
 }
 
+/* bdrv_create_meta_dirty_bitmap
+ *
+ * Create a meta dirty bitmap that tracks the changes of bits in @bitmap. I.e.
+ * when a dirty status bit in @bitmap is changed (either from reset to set or
+ * the other way around), its respective meta dirty bitmap bit will be marked
+ * dirty as well.
+ *
+ * @bitmap: the block dirty bitmap for which to create a meta dirty bitmap.
+ * @chunk_size: how many bytes of bitmap data does each bit in the meta bitmap
+ * track.
+ */
+void bdrv_create_meta_dirty_bitmap(BdrvDirtyBitmap *bitmap,
+   int chunk_size)
+{
+assert(!bitmap->meta);
+bitmap->meta = hbitmap_create_meta(bitmap->bitmap,
+   chunk_size * BITS_PER_BYTE);
+}
+
+void bdrv_release_meta_dirty_bitmap(BdrvDirtyBitmap *bitmap)
+{
+assert(bitmap->meta);
+hbitmap_free_meta(bitmap->bitmap);
+bitmap->meta = NULL;
+}
+
+int bdrv_dirty_bitmap_get_meta(BlockDriverState *bs,
+   BdrvDirtyBitmap *bitmap, int64_t sector,
+   int nb_sectors)
+{
+uint64_t i;
+int sectors_per_bit = 1 << hbitmap_granularity(bitmap->meta);
+
+/* To optimize: we can make hbitmap to internally check the range in a
+ * coarse level, or at least do it word by word. */
+for (i = sector; i < sector + nb_sectors; i += sectors_per_bit) {
+if (hbitmap_get(bitmap->meta, i)) {
+return true;
+}
+}
+return false;
+}
+
+void bdrv_dirty_bitmap_reset_meta(BlockDriverState *bs,
+  BdrvDirtyBitmap *bitmap, int64_t sector,
+  int nb_sectors)
+{
+hbitmap_reset(bitmap->meta, sector, nb_sectors);
+}
+
 bool bdrv_dirty_bitmap_frozen(BdrvDirtyBitmap *bitmap)
 {
 return bitmap->successor;
@@ -233,6 +284,7 @@ static void bdrv_do_release_matching_dirty_bitmap(BlockDriverState *bs,
 if ((!bitmap || bm == bitmap) && (!only_named || bm->name)) {
 assert(!bm->active_iterators);
 assert(!bdrv_dirty_bitmap_frozen(bm));
+assert(!bm->meta);
 QLIST_REMOVE(bm, list);
 hbitmap_free(bm->bitmap);
 g_free(bm->name);
diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index 0ef927d..69c500b 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -8,6 +8,9 @@ BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
   uint32_t granularity,
   const char *name,
   Error **errp);
+void bdrv_create_meta_dirty_bitmap(BdrvDirtyBitmap *bitmap,
+   int chunk_size);
+void bdrv_release_meta_dirty_bitmap(BdrvDirtyBitmap *bitmap);
 int bdrv_dirty_bitmap_create_successor(BlockDriverState *bs,
BdrvDirtyBitmap *bitmap,
Error **errp);
@@ -36,6 +39,12 @@ void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
int64_t cur_sector, int64_t nr_sectors);
 void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
  int64_t cur_sector, int64_t nr_sectors);
+int bdrv_dirty_bitmap_get_meta(BlockDriverState *bs,
+   BdrvDirtyBitmap *bitmap, int64_t sector,
+   int nb_sectors);
+void bdrv_dirty_bitmap_reset_meta(BlockDriverState *bs,
+  BdrvDirtyBitmap *bitmap, int64_t sector,
+  int nb_sectors);
 BdrvDirtyBitmapIter *bdrv_dirty_iter_new(BdrvDirtyBitmap *bitmap,
  uint64_t first_sector);
 void bdrv_dirty_iter_free(BdrvDirtyBitmapIter *iter);
-- 
1.8.3.1




Re: [Qemu-block] [PATCH 15/22] qcow2-bitmap: add autoclear bit

2016-10-24 Thread Max Reitz
On 24.10.2016 16:25, Vladimir Sementsov-Ogievskiy wrote:
> 07.10.2016 23:11, Max Reitz пишет:
>> On 30.09.2016 12:53, Vladimir Sementsov-Ogievskiy wrote:
>>> Add autoclear bit for handling rewriting image by old qemu version.
>> "Add support for the autoclear bit [...]"?
> 
> finally "Add support for the autoclear bit to handle rewriting image by old
> qemu version" - ok?
> 
> Or I can merge this patch into two earlier patches 06 and 07.

Sounds good.

Max



signature.asc
Description: OpenPGP digital signature


[Qemu-block] [PULL 14/23] HBitmap: Introduce "meta" bitmap to track bit changes

2016-10-24 Thread Kevin Wolf
From: Fam Zheng 

Upon each bit toggle, the corresponding bit in the meta bitmap will be
set.

Signed-off-by: Fam Zheng 
[Amended text inline. --js]
Reviewed-by: Max Reitz 

Signed-off-by: John Snow 
Message-id: 1476395910-8697-3-git-send-email-js...@redhat.com
Signed-off-by: Max Reitz 
---
 include/qemu/hbitmap.h | 21 +++
 util/hbitmap.c | 69 +++---
 2 files changed, 75 insertions(+), 15 deletions(-)

diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
index 8ab721e..1725919 100644
--- a/include/qemu/hbitmap.h
+++ b/include/qemu/hbitmap.h
@@ -178,6 +178,27 @@ void hbitmap_iter_init(HBitmapIter *hbi, const HBitmap *hb, uint64_t first);
  */
 unsigned long hbitmap_iter_skip_words(HBitmapIter *hbi);
 
+/* hbitmap_create_meta:
+ * Create a "meta" hbitmap to track dirtiness of the bits in this HBitmap.
+ * The caller owns the created bitmap and must call hbitmap_free_meta(hb) to
+ * free it.
+ *
+ * Currently, we only guarantee that if a bit in the hbitmap is changed it
+ * will be reflected in the meta bitmap, but we do not yet guarantee the
+ * opposite.
+ *
+ * @hb: The HBitmap to operate on.
+ * @chunk_size: How many bits in @hb does one bit in the meta track.
+ */
+HBitmap *hbitmap_create_meta(HBitmap *hb, int chunk_size);
+
+/* hbitmap_free_meta:
+ * Free the meta bitmap of @hb.
+ *
+ * @hb: The HBitmap whose meta bitmap should be freed.
+ */
+void hbitmap_free_meta(HBitmap *hb);
+
 /**
  * hbitmap_iter_next:
  * @hbi: HBitmapIter to operate on.
diff --git a/util/hbitmap.c b/util/hbitmap.c
index 99fd2ba..f303975 100644
--- a/util/hbitmap.c
+++ b/util/hbitmap.c
@@ -78,6 +78,9 @@ struct HBitmap {
  */
 int granularity;
 
+/* A meta dirty bitmap to track the dirtiness of bits in this HBitmap. */
+HBitmap *meta;
+
 /* A number of progressively less coarse bitmaps (i.e. level 0 is the
  * coarsest).  Each bit in level N represents a word in level N+1 that
  * has a set bit, except the last level where each bit represents the
@@ -209,25 +212,27 @@ static uint64_t hb_count_between(HBitmap *hb, uint64_t start, uint64_t last)
 }
 
 /* Setting starts at the last layer and propagates up if an element
- * changes from zero to non-zero.
+ * changes.
  */
 static inline bool hb_set_elem(unsigned long *elem, uint64_t start, uint64_t last)
 {
 unsigned long mask;
-bool changed;
+unsigned long old;
 
 assert((last >> BITS_PER_LEVEL) == (start >> BITS_PER_LEVEL));
 assert(start <= last);
 
 mask = 2UL << (last & (BITS_PER_LONG - 1));
 mask -= 1UL << (start & (BITS_PER_LONG - 1));
-changed = (*elem == 0);
+old = *elem;
 *elem |= mask;
-return changed;
+return old != *elem;
 }
 
-/* The recursive workhorse (the depth is limited to HBITMAP_LEVELS)... */
-static void hb_set_between(HBitmap *hb, int level, uint64_t start, uint64_t last)
+/* The recursive workhorse (the depth is limited to HBITMAP_LEVELS)...
+ * Returns true if at least one bit is changed. */
+static bool hb_set_between(HBitmap *hb, int level, uint64_t start,
+   uint64_t last)
 {
 size_t pos = start >> BITS_PER_LEVEL;
 size_t lastpos = last >> BITS_PER_LEVEL;
@@ -256,23 +261,28 @@ static void hb_set_between(HBitmap *hb, int level, uint64_t start, uint64_t last
 if (level > 0 && changed) {
 hb_set_between(hb, level - 1, pos, lastpos);
 }
+return changed;
 }
 
 void hbitmap_set(HBitmap *hb, uint64_t start, uint64_t count)
 {
 /* Compute range in the last layer.  */
+uint64_t first, n;
 uint64_t last = start + count - 1;
 
 trace_hbitmap_set(hb, start, count,
   start >> hb->granularity, last >> hb->granularity);
 
-start >>= hb->granularity;
+first = start >> hb->granularity;
 last >>= hb->granularity;
-count = last - start + 1;
 assert(last < hb->size);
+n = last - first + 1;
 
-hb->count += count - hb_count_between(hb, start, last);
-hb_set_between(hb, HBITMAP_LEVELS - 1, start, last);
+hb->count += n - hb_count_between(hb, first, last);
+if (hb_set_between(hb, HBITMAP_LEVELS - 1, first, last) &&
+hb->meta) {
+hbitmap_set(hb->meta, start, count);
+}
 }
 
 /* Resetting works the other way round: propagate up if the new
@@ -293,8 +303,10 @@ static inline bool hb_reset_elem(unsigned long *elem, uint64_t start, uint64_t l
 return blanked;
 }
 
-/* The recursive workhorse (the depth is limited to HBITMAP_LEVELS)... */
-static void hb_reset_between(HBitmap *hb, int level, uint64_t start, uint64_t last)
+/* The recursive workhorse (the depth is limited to HBITMAP_LEVELS)...
+ * Returns true if at least one bit is changed. */
+static bool hb_reset_between(HBitmap *hb, int level, uint64_t start,
+ uint64_t last)
 {
 size_t pos = start >> BITS_PER_LEVEL;
 size_t lastpos = last >> BITS_PER_LEVEL;
@@ -337,22 +349,29 @@ static void hb_

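The rule the commit message above states — upon each bit toggle, the corresponding bit in the meta bitmap is set — can be sketched as a small toy model. This is hedged, illustrative Python, not the actual HBitmap code; `chunk_size` mirrors the hbitmap_create_meta() parameter (one meta bit covers `chunk_size` tracked bits):

```python
# Toy model of the "meta" dirty-bitmap semantics (illustrative only).
class ToyBitmap:
    def __init__(self, size, chunk_size):
        self.bits = [False] * size
        self.chunk_size = chunk_size
        self.meta = set()              # indices of set meta bits

    def set_range(self, start, count):
        for i in range(start, start + count):
            if not self.bits[i]:       # only a real toggle dirties the meta bitmap
                self.bits[i] = True
                self.meta.add(i // self.chunk_size)

    def reset_range(self, start, count):
        for i in range(start, start + count):
            if self.bits[i]:
                self.bits[i] = False
                self.meta.add(i // self.chunk_size)

bm = ToyBitmap(size=64, chunk_size=8)
bm.set_range(3, 2)       # toggles bits 3..4 -> meta bit 0 set
bm.set_range(3, 2)       # no bit actually changes -> meta unchanged
print(sorted(bm.meta))   # [0]
```

This is also why the patch changes hb_set_between()/hb_reset_between() to return whether anything changed: the meta bitmap must only be touched on a real transition, not on every call.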
[Qemu-block] [PULL 22/23] block: More operations for meta dirty bitmap

2016-10-24 Thread Kevin Wolf
From: Fam Zheng 

Callers can create an iterator of meta bitmap with
bdrv_dirty_meta_iter_new(), then use the bdrv_dirty_iter_* operations on
it. Meta iterators are also counted by bitmap->active_iterators.

Also add a couple of functions to retrieve granularity and count.

Signed-off-by: Fam Zheng 
Reviewed-by: Max Reitz 
Signed-off-by: John Snow 
Message-id: 1476395910-8697-11-git-send-email-js...@redhat.com
Signed-off-by: Max Reitz 
---
 block/dirty-bitmap.c | 19 +++
 include/block/dirty-bitmap.h |  3 +++
 2 files changed, 22 insertions(+)

diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 384146b..519737c 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -393,6 +393,11 @@ uint32_t bdrv_dirty_bitmap_granularity(BdrvDirtyBitmap *bitmap)
 return BDRV_SECTOR_SIZE << hbitmap_granularity(bitmap->bitmap);
 }
 
+uint32_t bdrv_dirty_bitmap_meta_granularity(BdrvDirtyBitmap *bitmap)
+{
+return BDRV_SECTOR_SIZE << hbitmap_granularity(bitmap->meta);
+}
+
 BdrvDirtyBitmapIter *bdrv_dirty_iter_new(BdrvDirtyBitmap *bitmap,
  uint64_t first_sector)
 {
@@ -403,6 +408,15 @@ BdrvDirtyBitmapIter *bdrv_dirty_iter_new(BdrvDirtyBitmap *bitmap,
 return iter;
 }
 
+BdrvDirtyBitmapIter *bdrv_dirty_meta_iter_new(BdrvDirtyBitmap *bitmap)
+{
+BdrvDirtyBitmapIter *iter = g_new(BdrvDirtyBitmapIter, 1);
+hbitmap_iter_init(&iter->hbi, bitmap->meta, 0);
+iter->bitmap = bitmap;
+bitmap->active_iterators++;
+return iter;
+}
+
 void bdrv_dirty_iter_free(BdrvDirtyBitmapIter *iter)
 {
 if (!iter) {
@@ -514,3 +528,8 @@ int64_t bdrv_get_dirty_count(BdrvDirtyBitmap *bitmap)
 {
 return hbitmap_count(bitmap->bitmap);
 }
+
+int64_t bdrv_get_meta_dirty_count(BdrvDirtyBitmap *bitmap)
+{
+return hbitmap_count(bitmap->meta);
+}
diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index efc2965..9dea14b 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -30,6 +30,7 @@ void bdrv_enable_dirty_bitmap(BdrvDirtyBitmap *bitmap);
 BlockDirtyInfoList *bdrv_query_dirty_bitmaps(BlockDriverState *bs);
 uint32_t bdrv_get_default_bitmap_granularity(BlockDriverState *bs);
 uint32_t bdrv_dirty_bitmap_granularity(BdrvDirtyBitmap *bitmap);
+uint32_t bdrv_dirty_bitmap_meta_granularity(BdrvDirtyBitmap *bitmap);
 bool bdrv_dirty_bitmap_enabled(BdrvDirtyBitmap *bitmap);
 bool bdrv_dirty_bitmap_frozen(BdrvDirtyBitmap *bitmap);
 const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap);
@@ -47,12 +48,14 @@ int bdrv_dirty_bitmap_get_meta(BlockDriverState *bs,
 void bdrv_dirty_bitmap_reset_meta(BlockDriverState *bs,
   BdrvDirtyBitmap *bitmap, int64_t sector,
   int nb_sectors);
+BdrvDirtyBitmapIter *bdrv_dirty_meta_iter_new(BdrvDirtyBitmap *bitmap);
 BdrvDirtyBitmapIter *bdrv_dirty_iter_new(BdrvDirtyBitmap *bitmap,
  uint64_t first_sector);
 void bdrv_dirty_iter_free(BdrvDirtyBitmapIter *iter);
 int64_t bdrv_dirty_iter_next(BdrvDirtyBitmapIter *iter);
 void bdrv_set_dirty_iter(BdrvDirtyBitmapIter *hbi, int64_t sector_num);
 int64_t bdrv_get_dirty_count(BdrvDirtyBitmap *bitmap);
+int64_t bdrv_get_meta_dirty_count(BdrvDirtyBitmap *bitmap);
 void bdrv_dirty_bitmap_truncate(BlockDriverState *bs);
 
 uint64_t bdrv_dirty_bitmap_serialization_size(const BdrvDirtyBitmap *bitmap,
-- 
1.8.3.1




Re: [Qemu-block] [PATCH 12/22] qcow2-bitmap: add IN_USE flag

2016-10-24 Thread Max Reitz
On 24.10.2016 13:35, Vladimir Sementsov-Ogievskiy wrote:
> 24.10.2016 13:32, Vladimir Sementsov-Ogievskiy пишет:
>> 21.10.2016 22:58, Max Reitz пишет:
>>> On 21.10.2016 17:34, Vladimir Sementsov-Ogievskiy wrote:
 07.10.2016 22:44, Max Reitz пишет:
> On 30.09.2016 12:53, Vladimir Sementsov-Ogievskiy wrote:
>> This flag means that the bitmap is now in use by the software or
>> was not
>> successfully saved. In any way, with this flag set the bitmap data
>> must
>> be considered inconsistent and should not be loaded.
>>
>> With current implementation this flag is never set, as we just remove
>> bitmaps from the image after loading. But it defined in qcow2 spec
>> and
>> must be handled. Also, it can be used in future, if async schemes of
>> bitmap loading/saving are implemented.
>>
>> We also remove in-use bitmaps from the image on open.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy
>> 
>> ---
>>block/qcow2-bitmap.c | 17 -
>>1 file changed, 16 insertions(+), 1 deletion(-)
> Don't you want to make use of this flag? It would appear useful to
> me if
> you just marked autoload bitmaps as in_use instead of deleting them
> from
> the image when it's opened and then overwrite them when the image is
> closed.
 And what is the use of it?
>>> You don't need to free any bitmaps when opening the file, and you don't
>>> need to allocate any new bitmap space when closing it.
>>
>> As bitmaps are sparse in the file, I need to allocate new space when
>> closing. Or free it...
> 
> 
> Is it a real problem to reallocate space in qcow2? If so, to minimize
> allocations, I will have to make a list of clusters of in_use bitmaps on
> close, and then save current bitmaps to these clusters (and allocated
> some additional clusters, or free extra clusters).

It's not a real problem, but it does take time, and I maintain that it's
time it doesn't need to take because you can just use the in_use flag.

I wouldn't worry about reusing clusters of other bitmaps. Of course we
can do that later on in some optimization but not now.

I just mean the basic case of some auto-loaded bitmap that is not being
deleted while the VM is running and is just saved back to the image file
once it is closed. I don't expect that users will always consume all of
the auto-loaded bitmaps while the VM is active...

> Also, if I don't free them on open, I'll have to free them on remove of
> bdrv dirty bitmap..

Yes, so? That takes time, but that is something the user will probably
expect. I wouldn't expect opening of the file to take that time.

Overall, it doesn't matter time-wise whether you free the bitmap data
when opening the image file or when the dirty bitmap is actually deleted
by the user or when the image file is closed. Just setting the single
in_use flag for all of the auto-loaded bitmaps will basically not take
any time.

On the other hand, as soon as just a single auto-loaded bitmap survives
a VM (or qemu-img) invocation, you will very, very likely save at least
some time because writing the bitmap to the disk can reuse at least some
of the existing clusters.

Max




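The in_use lifecycle Max proposes above — mark auto-loaded bitmaps in_use on open instead of freeing them, then write the survivors back and clear the flag on close — can be sketched as a toy model (hedged Python, purely illustrative; none of these names exist in QEMU):

```python
# Toy model of the proposed in_use lifecycle for autoload bitmaps.
class ImageBitmap:
    def __init__(self, name):
        self.name = name
        self.in_use = False   # per the qcow2 spec: data inconsistent if set

def open_image(bitmaps):
    for b in bitmaps:         # on-disk data is now considered inconsistent
        b.in_use = True
    return bitmaps

def close_image(bitmaps, deleted):
    kept = [b for b in bitmaps if b.name not in deleted]
    for b in kept:            # write data back, mark consistent again
        b.in_use = False
    return kept

bms = open_image([ImageBitmap("a"), ImageBitmap("b")])
bms = close_image(bms, deleted={"b"})
print([b.name for b in bms], bms[0].in_use)   # ['a'] False
```

The point of the flag is that no cluster allocation or freeing happens at open time; only bitmaps the user actually deletes cost anything.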

Re: [Qemu-block] [PATCH 12/22] qcow2-bitmap: add IN_USE flag

2016-10-24 Thread Max Reitz
On 24.10.2016 19:08, Max Reitz wrote:
> On 24.10.2016 13:35, Vladimir Sementsov-Ogievskiy wrote:
>> 24.10.2016 13:32, Vladimir Sementsov-Ogievskiy пишет:
>>> 21.10.2016 22:58, Max Reitz пишет:
 On 21.10.2016 17:34, Vladimir Sementsov-Ogievskiy wrote:
> 07.10.2016 22:44, Max Reitz пишет:
>> On 30.09.2016 12:53, Vladimir Sementsov-Ogievskiy wrote:
>>> This flag means that the bitmap is now in use by the software or
>>> was not
>>> successfully saved. In any way, with this flag set the bitmap data
>>> must
>>> be considered inconsistent and should not be loaded.
>>>
>>> With current implementation this flag is never set, as we just remove
>>> bitmaps from the image after loading. But it defined in qcow2 spec
>>> and
>>> must be handled. Also, it can be used in future, if async schemes of
>>> bitmap loading/saving are implemented.
>>>
>>> We also remove in-use bitmaps from the image on open.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy
>>> 
>>> ---
>>>block/qcow2-bitmap.c | 17 -
>>>1 file changed, 16 insertions(+), 1 deletion(-)
>> Don't you want to make use of this flag? It would appear useful to
>> me if
>> you just marked autoload bitmaps as in_use instead of deleting them
>> from
>> the image when it's opened and then overwrite them when the image is
>> closed.
> And what is the use of it?
 You don't need to free any bitmaps when opening the file, and you don't
 need to allocate any new bitmap space when closing it.
>>>
>>> As bitmaps are sparse in the file, I need to allocate new space when
>>> closing. Or free it...
>>
>>
>> Is it a real problem to reallocate space in qcow2? If so, to minimize
>> allocations, I will have to make a list of clusters of in_use bitmaps on
>> close, and then save current bitmaps to these clusters (and allocated
>> some additional clusters, or free extra clusters).
> 
> It's not a real problem, but it does take time, and I maintain that it's
> time it doesn't need to take because you can just use the in_use flag.
> 
> I wouldn't worry about reusing clusters of other bitmaps. Of course we
> can do that later on in some optimization but not now.
> 
> I just mean the basic case of some auto-loaded bitmap that is not being
> deleted while the VM is running and is just saved back to the image file
> once it is closed. I don't expect that users will always consume all of
> the auto-loaded bitmaps while the VM is active...
> 
>> Also, if I don't free them on open, I'll have to free them on remove of
>> bdrv dirty bitmap..
> 
> Yes, so? That takes time, but that is something the user will probably
> expect. I wouldn't expect opening of the file to take that time.
> 
> Overall, it doesn't matter time-wise whether you free the bitmap data
> when opening the image file or when the dirty bitmap is actually deleted
> by the user or when the image file is closed. Just setting the single
> in_use flag for all of the auto-loaded bitmaps will basically not take
> any time.
> 
> On the other hand, as soon as just a single auto-loaded bitmap survives
> a VM (or qemu-img) invocation, you will very, very likely save at least
> some time because writing the bitmap to the disk can reuse at least some
> of the existing clusters.

By the way, dealing with removal of bitmaps when they are deleted by the
user is also positive when it comes to migration or read-only disks that
are later reopened R/W: Currently, as far as I can see, you just keep
the auto-load bitmaps in the image if the image is part of an incoming
migration or opened read-only, but then you continue under the
assumption that they are removed from the image. That's not very good.

If you delete the bitmaps only when the user asks you to, then you can
just return an error that the bitmap cannot be removed at this time
because the image file is read-only or used in migration (and I don't
think the latter can even happen when it's the user who has requested
removal of the bitmap).

Max





[Qemu-block] [PULL 23/23] block/replication: Clarify 'top-id' parameter usage

2016-10-24 Thread Kevin Wolf
From: Changlong Xie 

The replication driver only supports the 'top-id' parameter for the
secondary side; it must not be supplied for the primary side.

Reviewed-by: Eric Blake 
Signed-off-by: Changlong Xie 
Message-id: 1476247808-15646-1-git-send-email-xiecl.f...@cn.fujitsu.com
Signed-off-by: Max Reitz 
---
 block/replication.c  | 5 +
 qapi/block-core.json | 3 ++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/block/replication.c b/block/replication.c
index 3bd1cf1..8bbfc8f 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -101,6 +101,11 @@ static int replication_open(BlockDriverState *bs, QDict *options,
 
 if (!strcmp(mode, "primary")) {
 s->mode = REPLICATION_MODE_PRIMARY;
+top_id = qemu_opt_get(opts, REPLICATION_TOP_ID);
+if (top_id) {
+error_setg(&local_err, "The primary side does not support option top-id");
+goto fail;
+}
 } else if (!strcmp(mode, "secondary")) {
 s->mode = REPLICATION_MODE_SECONDARY;
 top_id = qemu_opt_get(opts, REPLICATION_TOP_ID);
diff --git a/qapi/block-core.json b/qapi/block-core.json
index c59047b..97b1205 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2197,7 +2197,8 @@
 # @mode: the replication mode
 #
 # @top-id: #optional In secondary mode, node name or device ID of the root
-#  node who owns the replication node chain. Ignored in primary mode.
+#  node who owns the replication node chain. Must not be given in
+#  primary mode.
 #
 # Since: 2.8
 ##
-- 
1.8.3.1


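The option check the patch above adds can be sketched outside QEMU as follows. This is a hedged model of replication_open()'s validation, not QEMU code; the helper name is made up, and the assumption that secondary mode conversely requires top-id reflects my reading of the driver:

```python
# Illustrative model of the replication driver's top-id validation.
def validate_replication_opts(opts):
    mode = opts.get("mode")
    if mode == "primary":
        if "top-id" in opts:   # the check added by this patch
            raise ValueError("The primary side does not support option top-id")
    elif mode == "secondary":
        if "top-id" not in opts:
            raise ValueError("top-id is required in secondary mode")
    else:
        raise ValueError("mode must be 'primary' or 'secondary'")
    return True

validate_replication_opts({"mode": "secondary", "top-id": "top"})   # accepted
```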


[Qemu-block] [PULL 20/23] block: BdrvDirtyBitmap serialization interface

2016-10-24 Thread Kevin Wolf
From: Vladimir Sementsov-Ogievskiy 

Several functions to provide necessary access to BdrvDirtyBitmap for
block-migration.c

Signed-off-by: Vladimir Sementsov-Ogievskiy 
[Add the "finish" parameters. - Fam]
Signed-off-by: Fam Zheng 
Reviewed-by: John Snow 
Reviewed-by: Max Reitz 

Signed-off-by: John Snow 
Message-id: 1476395910-8697-9-git-send-email-js...@redhat.com
Signed-off-by: Max Reitz 
---
 block/dirty-bitmap.c | 37 +
 include/block/dirty-bitmap.h | 14 ++
 2 files changed, 51 insertions(+)

diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 31d5296..384146b 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -453,6 +453,43 @@ void bdrv_undo_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap *in)
 hbitmap_free(tmp);
 }
 
+uint64_t bdrv_dirty_bitmap_serialization_size(const BdrvDirtyBitmap *bitmap,
+  uint64_t start, uint64_t count)
+{
+return hbitmap_serialization_size(bitmap->bitmap, start, count);
+}
+
+uint64_t bdrv_dirty_bitmap_serialization_align(const BdrvDirtyBitmap *bitmap)
+{
+return hbitmap_serialization_granularity(bitmap->bitmap);
+}
+
+void bdrv_dirty_bitmap_serialize_part(const BdrvDirtyBitmap *bitmap,
+  uint8_t *buf, uint64_t start,
+  uint64_t count)
+{
+hbitmap_serialize_part(bitmap->bitmap, buf, start, count);
+}
+
+void bdrv_dirty_bitmap_deserialize_part(BdrvDirtyBitmap *bitmap,
+uint8_t *buf, uint64_t start,
+uint64_t count, bool finish)
+{
+hbitmap_deserialize_part(bitmap->bitmap, buf, start, count, finish);
+}
+
+void bdrv_dirty_bitmap_deserialize_zeroes(BdrvDirtyBitmap *bitmap,
+  uint64_t start, uint64_t count,
+  bool finish)
+{
+hbitmap_deserialize_zeroes(bitmap->bitmap, start, count, finish);
+}
+
+void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap)
+{
+hbitmap_deserialize_finish(bitmap->bitmap);
+}
+
 void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
 int64_t nr_sectors)
 {
diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index c4e7858..efc2965 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -55,4 +55,18 @@ void bdrv_set_dirty_iter(BdrvDirtyBitmapIter *hbi, int64_t sector_num);
 int64_t bdrv_get_dirty_count(BdrvDirtyBitmap *bitmap);
 void bdrv_dirty_bitmap_truncate(BlockDriverState *bs);
 
+uint64_t bdrv_dirty_bitmap_serialization_size(const BdrvDirtyBitmap *bitmap,
+  uint64_t start, uint64_t count);
+uint64_t bdrv_dirty_bitmap_serialization_align(const BdrvDirtyBitmap *bitmap);
+void bdrv_dirty_bitmap_serialize_part(const BdrvDirtyBitmap *bitmap,
+  uint8_t *buf, uint64_t start,
+  uint64_t count);
+void bdrv_dirty_bitmap_deserialize_part(BdrvDirtyBitmap *bitmap,
+uint8_t *buf, uint64_t start,
+uint64_t count, bool finish);
+void bdrv_dirty_bitmap_deserialize_zeroes(BdrvDirtyBitmap *bitmap,
+  uint64_t start, uint64_t count,
+  bool finish);
+void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap);
+
 #endif
-- 
1.8.3.1


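The serialize_part/deserialize_part/deserialize_finish flow this interface exposes for block migration can be modelled in a few lines — a hedged Python toy, not the HBitmap wire format. The idea is that the bitmap is copied out in chunks and rebuilt on the destination, with the "finish" parameter committing the staged state:

```python
# Toy model of chunked bitmap serialization for migration (illustrative).
class ToySerialBitmap:
    def __init__(self, size):
        self.bits = [False] * size
        self._staging = None           # partially deserialized state

    def serialize_part(self, start, count):
        return self.bits[start:start + count]

    def deserialize_part(self, buf, start, finish):
        if self._staging is None:
            self._staging = list(self.bits)
        self._staging[start:start + len(buf)] = buf
        if finish:
            self.deserialize_finish()

    def deserialize_finish(self):
        self.bits = self._staging      # commit the staged bits
        self._staging = None

src = ToySerialBitmap(16)
src.bits[5] = src.bits[12] = True
dst = ToySerialBitmap(16)
for start in (0, 8):                   # migrate in two 8-bit parts
    dst.deserialize_part(src.serialize_part(start, 8), start,
                         finish=(start == 8))
print(dst.bits == src.bits)            # True
```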


Re: [Qemu-block] [Qemu-devel] [PULL 00/23] Block layer patches

2016-10-24 Thread Peter Maydell
On 24 October 2016 at 18:01, Kevin Wolf  wrote:
> The following changes since commit a3ae21ec3fe036f536dc94cad735931777143103:
>
>   Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging (2016-10-24 15:03:09 +0100)
>
> are available in the git repository at:
>
>
>   git://repo.or.cz/qemu/kevin.git tags/for-upstream
>
> for you to fetch changes up to 25493dc012e7c10dba51ee893b634a1dbfeed126:
>
>   Merge remote-tracking branch 'mreitz/tags/pull-block-2016-10-24' into queue-block (2016-10-24 18:02:26 +0200)
>
> 
>
> Block layer patches

Applied, thanks.

-- PMM



Re: [Qemu-block] [Qemu-devel] [PATCH v5 0/4] fdc: Use separate qdev device for drives

2016-10-24 Thread John Snow



On 10/24/2016 07:37 AM, Kevin Wolf wrote:

We have been complaining for a long time about how the floppy controller and
floppy drives are combined in a single qdev device and how this makes the
device awkward to work with because it behaves different from all other block
devices.

The latest reason to complain was when I noticed that using qdev device names
in QMP commands (e.g. for media change) doesn't really work when only the
controller is a qdev device, but the drives aren't.

So I decided to have a go at it, and this is the result.

It doesn't actually change any of the inner workings of the floppy controller,
but it wires things up differently on the qdev layer so that a floppy
controller now exposes a bus on which the floppy drives sit. This results in a
structure that is similar to IDE where the actual drive state is still in the
controller and the qdev device basically just contains the qdev properties -
not pretty, but quite workable.

The commit message of patch 3 explains how to use it. In short, there is a
'-device floppy' now and it does what you would expect if you ever used ide-cd.

The other problem is old command lines, especially those using things like
'-global isa-fdc,driveA=...'. In order to keep them working, we need to forward
the property to an internally created floppy drive device. This is a bit like
usb-storage, which we know is ugly, but works well enough in practice. The good
news here is that in contrast to usb-storage, the floppy controller only does
the forwarding for legacy configurations; as soon as you start using '-device
floppy', it doesn't happen any more.

So as you may have expected, this conversion doesn't result in a perfect
device, but I think it's definitely an improvement over the old state. I hope
you like it despite the warts. :-)

v5:
- Apply _filter_qemu to stderr, too [John]
- Rename the bus to floppy-bus [Frederic]
- Use FLOPPY_BUS() instead of DO_UPDATE() [Frederic]

v4:
- John says that his grep is broken and hangs at 100% CPU with my attempt to
  extract the floppy controller from info qtree. Use a simpler sed command
  instead (which, unlike the grep command, doesn't handle arbitrary indentation
  level of the next item, but we know what comes next, so just hardcoding 10
  spaces works, too).

v3:
- Fixed omissons in the conversion sysbus-fdc and the Sun one. Nice, combining
  floppy controllers and weird platforms in a single series. [John]

v2:
- Added patch 4 (qemu-iotests case for floppy config on the command line)
- Patch 2: Create a floppy device only if a BlockBackend exists instead of
  always creating two of them
- Patch 2: Initialise drive->fdctrl even if no drive is attached, it is
  accessed anyway during migration
- Patch 3: Keep 'type' qdev property and FDrive->drive in sync
- Patch 3: Removed if with condition that is always true

Alberto Garcia (2):
  throttle: Correct access to wrong BlockBackendPublic structures
  qemu-iotests: Test I/O in a single drive from a throttling group

Halil Pasic (1):
  block: improve error handling in raw_open

Pino Toscano (1):
  qapi: fix memory leak in bdrv_image_info_specific_dump



uhhh

http://i.imgur.com/60YQvmg.png


 block/qapi.c   |  1 +
 block/raw-posix.c  |  1 +
 block/raw-win32.c  |  1 +
 block/throttle-groups.c| 27 +++
 tests/qemu-iotests/093 | 33 -
 tests/qemu-iotests/093.out |  4 ++--
 6 files changed, 56 insertions(+), 11 deletions(-)





Re: [Qemu-block] [PATCH v8 00/36] block: Image locking series

2016-10-24 Thread Max Reitz
On 24.10.2016 12:11, Kevin Wolf wrote:

[...]

> Now, the big question is how to translate this into file locking. This
> could become a little tricky. I had a few thoughts involving another
> lock on byte 2, but none of them actually worked out so far, because
> what we want is essentially a lock that can be shared by readers, that
> can also be shared by writers, but not by readers and writers at the
> same time.

You can also share it between readers and writers, as long as everyone
can cope with volatile data.

I agree that it's very similar to the proposed op blocker style, but I
can't really come up with a meaningful translation either.

Maybe something like this (?): All readers who do not want the file to
be modified grab a shared lock on byte 1. All writers who can deal with
volatile data grab a shared lock on byte 2. Exclusive writers grab an
exclusive lock on byte 1 and 2. Readers who can cope with volatile data
get no lock at all.

When opening, the first and second group would always have to test
whether there is a lock on the other byte, respectively. E.g. sharing
writers would first grab an exclusive lock on byte 1, then the shared
lock on byte 2 and then release the exclusive lock again.

Would that work?

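For concreteness, the byte-range scheme sketched above would translate into POSIX advisory locks roughly as follows. This is a hedged illustration, not qemu code: byte offsets and helper names are invented, real code needs error handling, and note that POSIX locks never conflict within one process, so the scheme only bites across processes:

```python
# Sketch of the proposed byte-range lock protocol using fcntl.lockf().
# Offset 0 = "no concurrent writers" byte, offset 1 = "sharing writers" byte.
import fcntl
import tempfile

def lock_byte(f, offset, exclusive):
    kind = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    # lockf(fd, cmd, len, start): lock exactly one byte at the given offset
    fcntl.lockf(f, kind | fcntl.LOCK_NB, 1, offset)

def unlock_byte(f, offset):
    fcntl.lockf(f, fcntl.LOCK_UN, 1, offset)

with tempfile.TemporaryFile() as f:
    f.write(b"\0\0")
    f.flush()
    # A sharing writer: briefly take byte 0 exclusively to probe for
    # non-sharing readers, then hold a shared lock on byte 1.
    lock_byte(f, 0, exclusive=True)
    lock_byte(f, 1, exclusive=False)
    unlock_byte(f, 0)
    print("sharing-writer locks acquired")
```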
>> So as far as I understand it, those qdev properties should eventually be
>> changeable by the guest. And if that is so, the user probably shouldn't
>> touch them at all because the guest OS really knows whether it can cope
>> with volatile block devices or not, and it will tell qemu if that is so.
> 
> This is an interesting thought for PV devices, but it hasn't been part
> of the plan so far. I'm not sure if the guest OS block device drivers
> even have this information generally, because that's more a matter of
> whether a cluster file system is used on the block device or not.
> 
>> That may be the reason why the qdev property doesn't make sense to me at
>> the first glance: It does make sense for the guest OS to set this
>> property at that level, but it doesn't make sense for the user to do so.
>> Or maybe it does, but the user is really asking for things breaking then
>> ("YES, I AM SURE THAT MY GUEST WILL BE COMPLETELY FINE WITH FLOPPY DISKS
>> JUST CHANGING RANDOMLY" -- but I guess we've all been at that point
>> ourselves...).
>>
>> So after having convinced myself twice now, having the qdev property is
>> probably correct.
> 
> Essentially the whole reason is that we need the information and the
> guest doesn't tell us, so we need to ask someone who supposedly knows
> enough about the guest - which is the user.

Hm, OK, fair enough.

>> 
>>
>> But that's where the flags come in again: The guest may be fine with a
>> shared lock or none at all. But what about the block layer? Say you're
>> using a qcow2 image; that will not like sharing at all, independently of
>> what the guest things. So the block layer needs to have internal
>> mechanisms to elevate the locks proposed by the guest devices to e.g.
>> exclusive ones. I don't think that is something addressed by this series
>> in this version. Maybe you'll still need the flags for that.
> 
> I'm not completely sure about the mechanism and whether it should
> involve flags, but yes, you need something to propagate the required
> locking mode down recursively (just like op blockers will need to be
> done - maybe we should really use the same infrastructure and only
> do file locking as its implementation of the lowest layer).

Sounds good to me, but that implies that we have the new op blockers
rather soon. ;-)

> What you don't need, however, is user input. We already know that qcow2
> doesn't like concurrent writes.
> 
>> Final thing: Say you have a user who's crazy. Maybe they want to debug
>> something. Anyway, they have some qcow2 image attached to some guest
>> device, and they want to access that image simultaneously from some
>> other process; as I said, crazy, but there may be legitimate uses for
>> that so we should allow it.
>>
>> Now first that user of course has to set the device's lock_mode to
>> shared or none, thus promising qemu that the guest will be fine with
>> volatile data. But of course qcow2 will override that lock mode, because
>> qcow2 doesn't care about the guest: It primarily cares about its own
>> metadata integrity. So at this point the user will need some way to
>> override qcow2's choice, too. Therefore, I think we also need a per-node
>> flag to override the locking mode.
>>
>> qcow2 would probably make this flag default to exclusive for its
>> underlying node, and the user would then have to override that flag for
>> exactly that underlying node.
>>
>> Therefore I think we still need a block layer property for the lock
>> mode. While I now agree that we do need a qdev property, I think that
>> alone won't be sufficient.
> 
> Attaching a qcow2 image to two processes simply doesn't work unless both
> are read-only, so I disagree that we should allow it. Or can you really
> come up with a specific 

Re: [Qemu-block] [PATCH 16/22] qmp: add persistent flag to block-dirty-bitmap-add

2016-10-24 Thread Max Reitz
On 24.10.2016 17:12, Vladimir Sementsov-Ogievskiy wrote:
> 10.10.2016 19:08, Max Reitz wrote:
>> On 30.09.2016 12:53, Vladimir Sementsov-Ogievskiy wrote:
>>
> 
> [...]
> 
>>> +}
>>>  out:
>>>   aio_context_release(aio_context);
>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>> index 31f9990..2bf56cd 100644
>>> --- a/qapi/block-core.json
>>> +++ b/qapi/block-core.json
>>> @@ -1235,10 +1235,15 @@
>>>   # @granularity: #optional the bitmap granularity, default is 64k for
>>>   #   block-dirty-bitmap-add
>>>   #
>>> +# @persistent: #optional the bitmap is persistent, i.e. it will be
>>> saved to
>>> +#  corresponding block device on it's close. Default is
>>> false.
>>> +#  For block-dirty-bitmap-add. (Since 2.8)
>> I'm not sure what the "For block-dirty-bitmap-add." is supposed to mean,
>> because this whole struct is for block-dirty-bitmap-add (and for
>> block-dirty-bitmap-add transactions, to be exact, but @persistent will
>> surely work there, too, won't it?).
>>
>> Also, I'd say "will be saved to the corresponding block device image
>> file" instead of just "block device", because in my understanding a
>> block device and its image file are two separate things.
> 
> Hmm.. But 'its close' is block device close, not file close.

In my understanding, it is the file close.

>  And we call
> common bdrv_* function to save it, so we actually save it to device, and
> then the device puzzles out, how to actually save it.

Well, OK, it depends on what you mean by "block device". There are many
things we call a "block device", but nowadays I think it mostly refers
to either a guest block device or a BlockBackend (and as of lately,
we're more and more trying to hide the BlockBackend from the user, so
you could argue that it's only the guest device now).

The bdrv_* functions operate on block layer BDS nodes, and I don't think
we call them "block devices" (at least not any more).

In any case, I think for users the term "block device" refers to either
the device presented to the guest or to all of the block layer stuff
that's underneath, and it's not quite clear how you could save a bitmap
to that, or what it's supposed to mean to "close" a block device (you
can remove it, you can destroy it, you can delete it, but "close" it?).

But saying that it will be saved to the image file that is attached to
the block device will make it absolutely clear what we mean.

So what you have called a "device" here is neither what I'd call a
device (I'd call it a "node" or "BDS"), nor what I think users would
call a device. Also, it's not the BDS that puzzles out how to save it
but some block driver.

>>> +#
>>>   # Since 2.4
>>>   ##
>>>   { 'struct': 'BlockDirtyBitmapAdd',
>>> -  'data': { 'node': 'str', 'name': 'str', '*granularity': 'uint32' } }
>>> +  'data': { 'node': 'str', 'name': 'str', '*granularity': 'uint32',
>>> +  '*persistent': 'bool' } }
>> I think normally we'd align the new line so that the opening ' of
>> '*persistent' is under the opening ' of 'node'.
>>
>>> ##
>>>   # @block-dirty-bitmap-add
>>> diff --git a/qmp-commands.hx b/qmp-commands.hx
>>> index ba2a916..434b418 100644
>>> --- a/qmp-commands.hx
>>> +++ b/qmp-commands.hx
>>> @@ -1441,7 +1441,7 @@ EQMP
>>> {
>>>   .name   = "block-dirty-bitmap-add",
>>> -.args_type  = "node:B,name:s,granularity:i?",
>>> +.args_type  = "node:B,name:s,granularity:i?,persistent:b?",
>>>   .mhandler.cmd_new = qmp_marshal_block_dirty_bitmap_add,
>>>   },
>>>   @@ -1458,6 +1458,9 @@ Arguments:
>>>   - "node": device/node on which to create dirty bitmap (json-string)
>>>   - "name": name of the new dirty bitmap (json-string)
>>>   - "granularity": granularity to track writes with (int, optional)
>>> +- "persistent": bitmap will be saved to corresponding block device
>>> +on it's close. Block driver should maintain
>>> persistent bitmaps
>>> +(json-bool, optional, default false) (Since 2.8)
>> And I don't know what the user is supposed to make of the information
>> that block drivers will take care of maintaining persistent bitmaps. All
>> they care about is that it will be stored in the corresponding image
>> file, so in my opinion it would be better to just omit the last sentence
>> here.
> 
> Users should know that only qcow2 supports persistent bitmaps. Instead
> of last sentence I can write "For now only Qcow2 disks supports
> persistent bitmaps".

s/supports/support/, but yes, that sounds preferable to me.

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH 1/2] block/nfs: Introduce runtime_opts in NFS

2016-10-24 Thread Ashijeet Acharya
On Mon, Oct 24, 2016 at 8:40 PM, Kevin Wolf  wrote:
> Am 19.10.2016 um 14:38 hat Ashijeet Acharya geschrieben:
>> Make NFS block driver use various fine grained runtime_opts.
>> Set .bdrv_parse_filename() to nfs_parse_filename() and introduce two
>> new functions nfs_parse_filename() and nfs_parse_uri() to help parsing
>> the URI. This will help us to prepare the NFS for blockdev-add.
>>
>> Signed-off-by: Ashijeet Acharya 
>> ---
>>>  block/nfs.c | 360 +++-
>>  1 file changed, 261 insertions(+), 99 deletions(-)
>>
>> diff --git a/block/nfs.c b/block/nfs.c
>> index 8602a44..5eb909e 100644
>> --- a/block/nfs.c
>> +++ b/block/nfs.c
>> @@ -35,8 +35,12 @@
>>  #include "qemu/uri.h"
>>  #include "qemu/cutils.h"
>>  #include "sysemu/sysemu.h"
>> +#include "qapi/qmp/qdict.h"
>> +#include "qapi/qmp/qint.h"
>> +#include "qapi/qmp/qstring.h"
>>  #include 
>>
>> +
>>  #define QEMU_NFS_MAX_READAHEAD_SIZE 1048576
>>  #define QEMU_NFS_MAX_PAGECACHE_SIZE (8388608 / NFS_BLKSIZE)
>>  #define QEMU_NFS_MAX_DEBUG_LEVEL 2
>> @@ -61,6 +65,127 @@ typedef struct NFSRPC {
>>  NFSClient *client;
>>  } NFSRPC;
>>
>> +static int nfs_parse_uri(const char *filename, QDict *options, Error **errp)
>> +{
>> +URI *uri = NULL;
>> +QueryParams *qp = NULL;
>> +int ret = 0, i;
>> +const char *p;
>> +
>> +uri = uri_parse(filename);
>> +if (!uri) {
>> +error_setg(errp, "Invalid URI specified");
>> +ret = -EINVAL;
>> +goto out;
>> +}
>> +if (strcmp(uri->scheme, "nfs") != 0) {
>> +error_setg(errp, "URI scheme must be 'nfs'");
>> +ret = -EINVAL;
>> +goto out;
>> +}
>> +
>> +if (!uri->server || strcmp(uri->server, "") == 0) {
>
> No need to use strcmp(), !*uri->server is enough.

Yes, fixed the same for uri->path too.

>
>> +error_setg(errp, "missing hostname in URI");
>> +ret = -EINVAL;
>> +goto out;
>> +}
>> +
>> +if (!uri->path || strcmp(uri->path, "") == 0) {
>> +error_setg(errp, "missing file path in URI");
>> +ret = -EINVAL;
>> +goto out;
>> +}
>> +
>> +p = uri->path ? uri->path : "/";
>
> You just checked that uri->path is non-NULL, so this is dead code.
>
>> +p += strspn(p, "/");
>> +if (p[0]) {
>> +qdict_put(options, "export", qstring_from_str(p));
>> +}
>
> "export" isn't among the recognised options. You may have missed this
> code fragment when removing the option from your patch.

I dropped "export" as we discussed on the IRC.

>
>> +qp = query_params_parse(uri->query);
>> +if (!qp) {
>> +error_setg(errp, "could not parse query parameters");
>> +ret = -EINVAL;
>> +goto out;
>> +}
>> +
>> +qdict_put(options, "host", qstring_from_str(uri->server));
>> +
>> +qdict_put(options, "path", qstring_from_str(uri->path));
>> +
>> +for (i = 0; i < qp->n; i++) {
>> +if (!qp->p[i].value) {
>> +error_setg(errp, "Value for NFS parameter expected: %s",
>> +   qp->p[i].name);
>> +goto out;
>> +}
>> +if (parse_uint_full(qp->p[i].value, NULL, 0)) {
>> +error_setg(errp, "Illegal value for NFS parameter: %s",
>> +   qp->p[i].name);
>> +goto out;
>> +}
>> +if (!strcmp(qp->p[i].name, "uid")) {
>> +qdict_put(options, "uid",
>> +  qstring_from_str(qp->p[i].value));
>> +} else if (!strcmp(qp->p[i].name, "gid")) {
>> +qdict_put(options, "gid",
>> +  qstring_from_str(qp->p[i].value));
>> +} else if (!strcmp(qp->p[i].name, "tcp-syncnt")) {
>> +qdict_put(options, "tcp-syncnt",
>> +  qstring_from_str(qp->p[i].value));
>> +} else if (!strcmp(qp->p[i].name, "readahead")) {
>> +qdict_put(options, "readahead",
>> +  qstring_from_str(qp->p[i].value));
>> +} else if (!strcmp(qp->p[i].name, "pagecache")) {
>> +qdict_put(options, "pagecache",
>> +  qstring_from_str(qp->p[i].value));
>> +} else if (!strcmp(qp->p[i].name, "debug")) {
>> +qdict_put(options, "debug",
>> +  qstring_from_str(qp->p[i].value));
>> +} else {
>> +error_setg(errp, "Unknown NFS parameter name: %s",
>> +   qp->p[i].name);
>> +goto out;
>> +}
>> +}
>> +out:
>> +if (qp) {
>> +query_params_free(qp);
>> +}
>> +if (uri) {
>> +uri_free(uri);
>> +}
>> +return ret;
>> +}
>> +
>> +static void nfs_parse_filename(const char *filename, QDict *options,
>> +   Error **errp)
>> +{
>> +int ret = 0;
>> +
>> +if (qdict_haskey(options, "host") ||
>> +qdict_haskey(options, "path") ||
>> +qdict_haskey(options, "uid") ||
>> +qdict_haskey(opt

[Qemu-block] [PATCH v2 2/2] qapi: allow blockdev-add for NFS

2016-10-24 Thread Ashijeet Acharya
Introduce a new object 'BlockdevOptionsNfs' in qapi/block-core.json to
support blockdev-add for the NFS network protocol driver. Also add a new
struct NFSServer to support TCP connections.
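
For illustration, once this is applied a blockdev-add for NFS would take
arguments shaped roughly like the following sketch (the node name and all
values are hypothetical; only the field names follow BlockdevOptionsNfs and
NFSServer from this patch):

```python
# Hypothetical QMP arguments for blockdev-add with driver "nfs";
# the shape mirrors BlockdevOptionsNfs / NFSServer from this patch.
blockdev_add_args = {
    "driver": "nfs",
    "node-name": "nfs0",                               # hypothetical node name
    "server": {"type": "tcp", "host": "192.168.0.1"},  # NFSServer
    "path": "/export/disk.img",
    "readahead": 131072,                               # optional, bytes
}

# Mandatory fields per the schema: server (NFSServer) and path.
for key in ("server", "path"):
    assert key in blockdev_add_args
assert blockdev_add_args["server"]["type"] == "tcp"
print(blockdev_add_args["server"]["host"])
```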

Signed-off-by: Ashijeet Acharya 
---
 qapi/block-core.json | 56 
 1 file changed, 52 insertions(+), 4 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 9d797b8..3ab028d 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1714,9 +1714,9 @@
 { 'enum': 'BlockdevDriver',
   'data': [ 'archipelago', 'blkdebug', 'blkverify', 'bochs', 'cloop',
 'dmg', 'file', 'ftp', 'ftps', 'gluster', 'host_cdrom',
-'host_device', 'http', 'https', 'luks', 'null-aio', 'null-co',
-'parallels', 'qcow', 'qcow2', 'qed', 'quorum', 'raw',
-   'replication', 'tftp', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
+'host_device', 'http', 'https', 'luks', 'nfs', 'null-aio',
+'null-co', 'parallels', 'qcow', 'qcow2', 'qed', 'quorum', 'raw',
+'replication', 'tftp', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
 
 ##
 # @BlockdevOptionsFile
@@ -2212,6 +2212,54 @@
 '*top-id': 'str' } }
 
 ##
+# @NFSServer
+#
+# Captures the address of the socket
+#
+# @type:transport type used for NFS (only TCP supported)
+#
+# @host:host part of the address
+#
+# Since 2.8
+##
+{ 'struct': 'NFSServer',
+  'data': { 'type': 'str',
+'host': 'str' } }
+
+##
+# @BlockdevOptionsNfs
+#
+# Driver specific block device option for NFS
+#
+# @server:host address
+#
+# @path:  path of the image on the host
+#
+# @uid:   #optional UID value to use when talking to the server
+#
+# @gid:   #optional GID value to use when talking to the server
+#
+# @tcp-syncnt:#optional number of SYNs during the session establishment
+#
+# @readahead: #optional set the readahead size in bytes
+#
+# @pagecache: #optional set the pagecache size in bytes
+#
+# @debug: #optional set the NFS debug level (max 2)
+#
+# Since 2.8
+##
+{ 'struct': 'BlockdevOptionsNfs',
+  'data': { 'server': 'NFSServer',
+'path': 'str',
+'*uid': 'int',
+'*gid': 'int',
+'*tcp-syncnt': 'int',
+'*readahead': 'int',
+'*pagecache': 'int',
+'*debug': 'int' } }
+
+##
 # @BlockdevOptionsCurl
 #
 # Driver specific block device options for the curl backend.
@@ -2269,7 +2317,7 @@
 # TODO iscsi: Wait for structured options
   'luks':   'BlockdevOptionsLUKS',
 # TODO nbd: Should take InetSocketAddress for 'host'?
-# TODO nfs: Wait for structured options
+  'nfs':'BlockdevOptionsNfs',
   'null-aio':   'BlockdevOptionsNull',
   'null-co':'BlockdevOptionsNull',
   'parallels':  'BlockdevOptionsGenericFormat',
-- 
2.6.2




[Qemu-block] [PATCH v2 1/2] block/nfs: Introduce runtime_opts in NFS

2016-10-24 Thread Ashijeet Acharya
Make the NFS block driver use various fine-grained runtime_opts.
Set .bdrv_parse_filename() to nfs_parse_filename() and introduce two
new functions, nfs_parse_filename() and nfs_parse_uri(), to help parse
the URI.

Signed-off-by: Ashijeet Acharya 
---
 block/nfs.c | 348 +++-
 1 file changed, 253 insertions(+), 95 deletions(-)

diff --git a/block/nfs.c b/block/nfs.c
index 8602a44..666ccf2 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -35,8 +35,12 @@
 #include "qemu/uri.h"
 #include "qemu/cutils.h"
 #include "sysemu/sysemu.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qint.h"
+#include "qapi/qmp/qstring.h"
 #include 
 
+
 #define QEMU_NFS_MAX_READAHEAD_SIZE 1048576
 #define QEMU_NFS_MAX_PAGECACHE_SIZE (8388608 / NFS_BLKSIZE)
 #define QEMU_NFS_MAX_DEBUG_LEVEL 2
@@ -61,6 +65,118 @@ typedef struct NFSRPC {
 NFSClient *client;
 } NFSRPC;
 
+static int nfs_parse_uri(const char *filename, QDict *options, Error **errp)
+{
+URI *uri = NULL;
+QueryParams *qp = NULL;
+int ret = 0, i;
+
+uri = uri_parse(filename);
+if (!uri) {
+error_setg(errp, "Invalid URI specified");
+ret = -EINVAL;
+goto out;
+}
+if (strcmp(uri->scheme, "nfs") != 0) {
+error_setg(errp, "URI scheme must be 'nfs'");
+ret = -EINVAL;
+goto out;
+}
+
+if (!uri->server) {
+error_setg(errp, "missing hostname in URI");
+ret = -EINVAL;
+goto out;
+}
+
+if (!uri->path) {
+error_setg(errp, "missing file path in URI");
+ret = -EINVAL;
+goto out;
+}
+
+qp = query_params_parse(uri->query);
+if (!qp) {
+error_setg(errp, "could not parse query parameters");
+ret = -EINVAL;
+goto out;
+}
+
+qdict_put(options, "host", qstring_from_str(uri->server));
+
+qdict_put(options, "path", qstring_from_str(uri->path));
+
+for (i = 0; i < qp->n; i++) {
+if (!qp->p[i].value) {
+error_setg(errp, "Value for NFS parameter expected: %s",
+   qp->p[i].name);
+goto out;
+}
+if (parse_uint_full(qp->p[i].value, NULL, 0)) {
+error_setg(errp, "Illegal value for NFS parameter: %s",
+   qp->p[i].name);
+goto out;
+}
+if (!strcmp(qp->p[i].name, "uid")) {
+qdict_put(options, "uid",
+  qstring_from_str(qp->p[i].value));
+} else if (!strcmp(qp->p[i].name, "gid")) {
+qdict_put(options, "gid",
+  qstring_from_str(qp->p[i].value));
+} else if (!strcmp(qp->p[i].name, "tcp-syncnt")) {
+qdict_put(options, "tcp-syncnt",
+  qstring_from_str(qp->p[i].value));
+} else if (!strcmp(qp->p[i].name, "readahead")) {
+qdict_put(options, "readahead",
+  qstring_from_str(qp->p[i].value));
+} else if (!strcmp(qp->p[i].name, "pagecache")) {
+qdict_put(options, "pagecache",
+  qstring_from_str(qp->p[i].value));
+} else if (!strcmp(qp->p[i].name, "debug")) {
+qdict_put(options, "debug",
+  qstring_from_str(qp->p[i].value));
+} else {
+error_setg(errp, "Unknown NFS parameter name: %s",
+   qp->p[i].name);
+goto out;
+}
+}
+out:
+if (qp) {
+query_params_free(qp);
+}
+if (uri) {
+uri_free(uri);
+}
+return ret;
+}
+
+static void nfs_parse_filename(const char *filename, QDict *options,
+   Error **errp)
+{
+int ret = 0;
+
+if (qdict_haskey(options, "host") ||
+qdict_haskey(options, "path") ||
+qdict_haskey(options, "uid") ||
+qdict_haskey(options, "gid") ||
+qdict_haskey(options, "tcp-syncnt") ||
+qdict_haskey(options, "readahead") ||
+qdict_haskey(options, "pagecache") ||
+qdict_haskey(options, "debug")) {
+error_setg(errp, "host/path/uid/gid/tcp-syncnt/readahead/pagecache"
+ "/debug and a filename may not be used at the same"
+ " time");
+return;
+}
+
+ret = nfs_parse_uri(filename, options, errp);
+if (ret < 0) {
+error_setg(errp, "No valid URL specified");
+}
+return;
+}
+
 static void nfs_process_read(void *arg);
 static void nfs_process_write(void *arg);
 
@@ -228,15 +344,49 @@ static int coroutine_fn nfs_co_flush(BlockDriverState *bs)
 return task.ret;
 }
 
-/* TODO Convert to fine grained options */
 static QemuOptsList runtime_opts = {
 .name = "nfs",
 .head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
 .desc = {
 {
-.name = "filename",
+.name = "host",
+.type = QEMU_OPT_STRING,
+.help = "Host to connect to",
+},
+{
+.name = "path",
 

[Qemu-block] [PATCH v2 0/2] block: allow blockdev-add for NFS

2016-10-24 Thread Ashijeet Acharya
Previously posted series patches:
v1: https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg04487.html

This series adds blockdev-add support for NFS block driver.

Patch 1 prepares the NFS driver to make use of several runtime_opts as
they appear in the URI. This makes NFS behave similarly to the other
drivers available in the block layer.
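
The filename-to-options split that patch 1 implements can be sketched like
this (pure illustration, not QEMU code: Python's urllib.parse stands in for
QEMU's uri_parse(), the URL is hypothetical, and only the option names
follow the patch):

```python
from urllib.parse import urlsplit, parse_qs

# Query parameters the patch recognises; anything else is rejected.
KNOWN = {"uid", "gid", "tcp-syncnt", "readahead", "pagecache", "debug"}

def nfs_parse_uri(filename):
    """Rough analogue of nfs_parse_uri(): URL -> flat options dict."""
    u = urlsplit(filename)
    if u.scheme != "nfs":
        raise ValueError("URI scheme must be 'nfs'")
    if not u.hostname:
        raise ValueError("missing hostname in URI")
    if not u.path:
        raise ValueError("missing file path in URI")
    options = {"host": u.hostname, "path": u.path}
    for name, values in parse_qs(u.query).items():
        if name not in KNOWN:
            raise ValueError("Unknown NFS parameter name: %s" % name)
        if not values[-1].isdigit():        # analogue of parse_uint_full()
            raise ValueError("Illegal value for NFS parameter: %s" % name)
        options[name] = values[-1]
    return options

opts = nfs_parse_uri("nfs://server.example.com/export/disk.img?readahead=131072")
print(opts)
# → {'host': 'server.example.com', 'path': '/export/disk.img', 'readahead': '131072'}
```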

Patch 2 adds blockdev-add support for the NFS block driver by making
the 'nfs' driver option available.

Changes in v2:
- drop strcmp() condition check for host and path in nfs_parse_uri()
- drop "export" completely
- initialize client->context before setting query parameters
- fix the QDict options being passed to nfs_client_open() and make use of url

Ashijeet Acharya (2):
  block/nfs: Introduce runtime_opts in NFS
  qapi: allow blockdev-add for NFS

 block/nfs.c  | 348 +--
 qapi/block-core.json |  56 -
 2 files changed, 305 insertions(+), 99 deletions(-)

-- 
2.6.2




Re: [Qemu-block] [PATCH 0/2] less confusing block file names

2016-10-24 Thread Eric Blake
On 10/24/2016 10:14 AM, Kevin Wolf wrote:
> Am 24.10.2016 um 16:44 hat Paolo Bonzini geschrieben:
>> On 24/10/2016 15:47, Kevin Wolf wrote:
>>> One effect that makes me less than fully happy is that 'git log
>>> block/raw.c' without --follow mixes the history of the renamed driver
>>> with the history of the old, differently licensed one. On the other
>>> hand, the removal of the old driver happened three years ago, so it's
>>> probably unlikely that people confuse what belongs to which one and
>>> revive old code accidentally.
>>
>> Hmm, this makes it sensible to only apply patch 2, especially since Eric
>> has corteously split them. :)
> 
> Maybe rename to something like raw-format.c instead then in patch 1?

raw-format.c would indeed be a project-unique name; I can respin along
those lines if there is consensus on that front, as well as touch up the
commit message to point out the history of the bsd name (as Laszlo
pointed out, it was the choice of licensing that determined the original
name)

> 
> I must admit that it's reasonable enough to make the assumption that
> raw_bsd.c has something to do with a BSD-specific block driver.
> 
> Kevin
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-block] does blk_commit_all need blk_all_next?

2016-10-24 Thread Max Reitz
On 24.10.2016 10:45, Paolo Bonzini wrote:
> blk_commit_all is in block/block-backend.c only because it uses
> blk_all_next.  This is also the only reason why it needs a stub
> (stubs/blk-commit-all.c).
> 
> The only difference between blk_next and blk_all_next is that the latter
> "iterates over all BlockBackends, even the ones which are hidden (i.e.
> are not referenced by the monitor)".  Should blk_commit_all really
> iterate over BlockBackends such as the NBD server or the block jobs'?

I guess bdrv_commit_all() did, so blk_commit_all() does now, too.

The issue is, though, that currently "hidden BB" is equivalent to
"anonymous BB". And I think that Kevin is working towards basically all
BBs being anonymous, i.e. hidden.

When using -drive without id but with node-name, then you'll get a
hidden BB. So we can't get away without blk_all_next().

But maybe it would make sense to skip all BBs that are not attached to a
device?

Max





Re: [Qemu-block] [PATCH 1/2] block/nfs: Introduce runtime_opts in NFS

2016-10-24 Thread Ashijeet Acharya
On Tue, Oct 25, 2016 at 12:12 AM, Ashijeet Acharya
 wrote:
> On Mon, Oct 24, 2016 at 8:40 PM, Kevin Wolf  wrote:
>> Am 19.10.2016 um 14:38 hat Ashijeet Acharya geschrieben:
>>> Make NFS block driver use various fine grained runtime_opts.
>>> Set .bdrv_parse_filename() to nfs_parse_filename() and introduce two
>>> new functions nfs_parse_filename() and nfs_parse_uri() to help parsing
>>> the URI. This will help us to prepare the NFS for blockdev-add.
>>>
>>> Signed-off-by: Ashijeet Acharya 
>>> ---
>>>  block/nfs.c | 360 +++-
>>>  1 file changed, 261 insertions(+), 99 deletions(-)
>>>
>>> diff --git a/block/nfs.c b/block/nfs.c
>>> index 8602a44..5eb909e 100644
>>> --- a/block/nfs.c
>>> +++ b/block/nfs.c
>>> @@ -35,8 +35,12 @@
>>> +if (!uri->server || strcmp(uri->server, "") == 0) {
>>
>> No need to use strcmp(), !*uri->server is enough.
>
> Yes, fixed the same for uri->path too.

Can I ask why we do this check in SSH then?

Ashijeet

>> Kevin



Re: [Qemu-block] [Qemu-devel] Assertion failure on qcow2 disk with cluster_size != 64k

2016-10-24 Thread Eric Blake
On 10/21/2016 08:14 AM, Ed Swierk wrote:
> On Thu, Oct 20, 2016 at 6:38 PM, Eric Blake  wrote:
>> On 10/20/2016 07:24 PM, Ed Swierk wrote:
>>> Changing max_transfer in the normal write case to
>>> MIN_NON_ZERO(alignment, MAX_WRITE_ZEROES_BOUNCE_BUFFER) appears to fix
>>> the problem, but I don't pretend to understand all the subtleties
>>> here.
>>
>> That actually sounds like the right fix.  But since the bug was probably
>> caused by my code, I'll formalize it into a patch and see if I can
>> modify the testsuite to give it coverage.
> 
> If alignment > MAX_WRITE_ZEROES_BOUNCE_BUFFER (however unlikely) we
> have the same problem, so maybe this would be better?

Our qcow2 support is currently limited to a maximum cluster size of 2M,
while MAX_WRITE_ZEROES_BOUNCE_BUFFER is 32k * 512, or 16M.  The
maximum-size bounce buffer should not be the problem here; but for some
reason, it looks like alignment is larger than max_transfer which should
not normally be possible.  I'm still playing with what should be the
right patch, but hope to have something posted soon.
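
The difference between the two proposals can be sketched numerically (an
illustrative toy, not QEMU code; the 16M value for
MAX_WRITE_ZEROES_BOUNCE_BUFFER is taken from the text above):

```python
MAX_WRITE_ZEROES_BOUNCE_BUFFER = 32768 * 512   # 32k * 512 = 16 MiB

def min_non_zero(a, b):
    """Analogue of QEMU's MIN_NON_ZERO(): the smaller value, ignoring zeros."""
    if a == 0:
        return b
    if b == 0:
        return a
    return min(a, b)

def ed_variant(alignment):
    """Ed's proposal: always prefer a non-zero alignment."""
    return alignment if alignment > 0 else MAX_WRITE_ZEROES_BOUNCE_BUFFER

# The two agree except in the (unlikely) alignment > bounce-buffer case.
for alignment in (0, 2 * 1024 * 1024, 32 * 1024 * 1024):
    print(alignment,
          min_non_zero(alignment, MAX_WRITE_ZEROES_BOUNCE_BUFFER),
          ed_variant(alignment))
# → 0 16777216 16777216
# → 2097152 2097152 2097152
# → 33554432 16777216 33554432
```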

> 
> max_transfer = alignment > 0 ? alignment : MAX_WRITE_ZEROES_BOUNCE_BUFFER
> 
> --Ed
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





[Qemu-block] [PATCH v2 1/6] blockjob: fix dead pointer in txn list

2016-10-24 Thread John Snow
From: Vladimir Sementsov-Ogievskiy 

Though it is not intended to be reached under normal circumstances,
if we do not gracefully deconstruct the transaction QLIST, we may wind
up with stale pointers in the list.

The rest of this series attempts to address the underlying issues,
but this should fix list inconsistencies.
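
The problem can be modelled with a toy transaction list (pure illustration,
not QEMU code): without the removal, a completed and freed job would leave a
stale entry behind for later traversals to trip over.

```python
class Job:
    def __init__(self, name, txn):
        self.name = name
        self.txn = txn          # toy analogue of the QLIST txn list
        txn.append(self)

def complete(job, remove_from_txn):
    if remove_from_txn:
        job.txn.remove(job)     # analogue of QLIST_REMOVE(job, txn_list)
    # block_job_unref(job) would free the job here; the list must not
    # still reference it afterwards.

txn = []
a, b = Job("a", txn), Job("b", txn)
complete(a, remove_from_txn=False)
assert a in txn                 # stale entry left behind (the bug)

txn = []
a, b = Job("a", txn), Job("b", txn)
complete(a, remove_from_txn=True)
assert txn == [b]               # with the fix, only live jobs remain
print("ok")
```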

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Tested-by: John Snow 
Reviewed-by: John Snow 
[Rewrote commit message. --js]
Signed-off-by: John Snow 
Reviewed-by: Eric Blake 
Reviewed-by: Kevin Wolf 

Signed-off-by: John Snow 
---
 blockjob.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/blockjob.c b/blockjob.c
index e1d0382..f55bfec 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -247,6 +247,7 @@ static void block_job_completed_single(BlockJob *job)
 }
 
 if (job->txn) {
+QLIST_REMOVE(job, txn_list);
 block_job_txn_unref(job->txn);
 }
 block_job_unref(job);
-- 
2.7.4




[Qemu-block] [PATCH v2 3/6] blockjob: add .start field

2016-10-24 Thread John Snow
Add an explicit start field to specify the entrypoint. We already have
ownership of the coroutine itself and manage its lifetime, so let's take
control of its creation, too.

This will allow us to delay creation of the actual coroutine until we
know we'll actually start a BlockJob in block_job_start. This avoids
the sticky question of how to "un-create" a Coroutine that hasn't been
started yet.
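
The idea can be sketched with Python generators standing in for coroutines
(hypothetical names, not the QEMU API):

```python
def backup_run(job):
    """Toy job entrypoint; a generator stands in for a QEMU coroutine."""
    job["done"] = True
    yield

backup_driver = {"start": backup_run}   # analogue of BlockJobDriver.start

def block_job_create(driver):
    # No coroutine exists yet -- nothing to "un-create" if we bail out.
    return {"driver": driver, "co": None, "done": False}

def block_job_start(job):
    # The coroutine is created only once we are committed to running the job.
    job["co"] = job["driver"]["start"](job)
    next(job["co"], None)               # analogue of qemu_coroutine_enter()

job = block_job_create(backup_driver)
assert job["co"] is None                # safe to abandon before starting
block_job_start(job)
assert job["done"]
print("ok")
```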

Signed-off-by: John Snow 
---
 block/backup.c   | 23 ---
 block/commit.c   |  3 ++-
 block/mirror.c   |  4 +++-
 block/stream.c   |  3 ++-
 include/block/blockjob_int.h |  3 +++
 5 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index ed6d74a..622f64e 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -307,16 +307,6 @@ void backup_cow_request_end(CowRequest *req)
 cow_request_end(req);
 }
 
-static const BlockJobDriver backup_job_driver = {
-.instance_size  = sizeof(BackupBlockJob),
-.job_type   = BLOCK_JOB_TYPE_BACKUP,
-.set_speed  = backup_set_speed,
-.commit = backup_commit,
-.abort  = backup_abort,
-.clean  = backup_clean,
-.attached_aio_context   = backup_attached_aio_context,
-};
-
 static BlockErrorAction backup_error_action(BackupBlockJob *job,
 bool read, int error)
 {
@@ -526,6 +516,17 @@ static void coroutine_fn backup_run(void *opaque)
 block_job_defer_to_main_loop(&job->common, backup_complete, data);
 }
 
+static const BlockJobDriver backup_job_driver = {
+.instance_size  = sizeof(BackupBlockJob),
+.job_type   = BLOCK_JOB_TYPE_BACKUP,
+.start  = backup_run,
+.set_speed  = backup_set_speed,
+.commit = backup_commit,
+.abort  = backup_abort,
+.clean  = backup_clean,
+.attached_aio_context   = backup_attached_aio_context,
+};
+
 void backup_start(const char *job_id, BlockDriverState *bs,
   BlockDriverState *target, int64_t speed,
   MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
@@ -637,7 +638,7 @@ void backup_start(const char *job_id, BlockDriverState *bs,
 
 bdrv_op_block_all(target, job->common.blocker);
 job->common.len = len;
-job->common.co = qemu_coroutine_create(backup_run, job);
+job->common.co = qemu_coroutine_create(job->common.driver->start, job);
 block_job_txn_add_job(txn, &job->common);
 qemu_coroutine_enter(job->common.co);
 return;
diff --git a/block/commit.c b/block/commit.c
index d555600..cc2030d 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -205,6 +205,7 @@ static const BlockJobDriver commit_job_driver = {
 .instance_size = sizeof(CommitBlockJob),
 .job_type  = BLOCK_JOB_TYPE_COMMIT,
 .set_speed = commit_set_speed,
+.start = commit_run,
 };
 
 void commit_start(const char *job_id, BlockDriverState *bs,
@@ -274,7 +275,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
 s->backing_file_str = g_strdup(backing_file_str);
 
 s->on_error = on_error;
-s->common.co = qemu_coroutine_create(commit_run, s);
+s->common.co = qemu_coroutine_create(s->common.driver->start, s);
 
 trace_commit_start(bs, base, top, s, s->common.co);
 qemu_coroutine_enter(s->common.co);
diff --git a/block/mirror.c b/block/mirror.c
index c81b5e0..3a29b94 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -891,6 +891,7 @@ static const BlockJobDriver mirror_job_driver = {
 .instance_size  = sizeof(MirrorBlockJob),
 .job_type   = BLOCK_JOB_TYPE_MIRROR,
 .set_speed  = mirror_set_speed,
+.start  = mirror_run,
 .complete   = mirror_complete,
 .pause  = mirror_pause,
 .attached_aio_context   = mirror_attached_aio_context,
@@ -900,6 +901,7 @@ static const BlockJobDriver commit_active_job_driver = {
 .instance_size  = sizeof(MirrorBlockJob),
 .job_type   = BLOCK_JOB_TYPE_COMMIT,
 .set_speed  = mirror_set_speed,
+.start  = mirror_run,
 .complete   = mirror_complete,
 .pause  = mirror_pause,
 .attached_aio_context   = mirror_attached_aio_context,
@@ -968,7 +970,7 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
 
 bdrv_op_block_all(target, s->common.blocker);
 
-s->common.co = qemu_coroutine_create(mirror_run, s);
+s->common.co = qemu_coroutine_create(s->common.driver->start, s);
 trace_mirror_start(bs, s, s->common.co, opaque);
 qemu_coroutine_enter(s->common.co);
 }
diff --git a/block/stream.c b/block/stream.c
index 906f7f3..8ffed9c 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -212,6 +212,7 @@ static con

[Qemu-block] [PATCH v2 2/6] blockjob: add .clean property

2016-10-24 Thread John Snow
Cleaning up after we have deferred to the main thread but before the
transaction has converged can be dangerous and result in deadlocks
if the job cleanup invokes any BH polling loops.

A job may attempt to begin cleaning up, but may induce another job to
enter its cleanup routine. The second job, part of our same transaction,
will block waiting for the first job to finish, so neither job may now
make progress.

To rectify this, allow jobs to register a cleanup operation that will
always run regardless of if the job was in a transaction or not, and
if the transaction job group completed successfully or not.

Move sensitive cleanup to this callback instead; it is guaranteed to
run only after the transaction has converged, which removes the sensitive
timing constraints from said cleanup.

Furthermore, in future patches these cleanup operations will be performed
regardless of whether or not we actually started the job. Therefore,
cleanup callbacks should essentially confine themselves to undoing create
operations, e.g. setup actions taken in what is now backup_start.
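
The completion flow described above can be sketched as follows (a toy model
of the ordering, not the actual QEMU functions):

```python
def completed_single(job, txn_succeeded):
    """Toy analogue of block_job_completed_single() with the .clean hook."""
    driver = job["driver"]
    if txn_succeeded:
        if driver.get("commit"):
            driver["commit"](job)
    else:
        if driver.get("abort"):
            driver["abort"](job)
    if driver.get("clean"):
        driver["clean"](job)    # always runs, after the txn has converged

calls = []
driver = {
    "commit": lambda j: calls.append("commit"),
    "abort":  lambda j: calls.append("abort"),
    "clean":  lambda j: calls.append("clean"),   # e.g. blk_unref(target)
}
completed_single({"driver": driver}, txn_succeeded=True)
completed_single({"driver": driver}, txn_succeeded=False)
print(calls)
# → ['commit', 'clean', 'abort', 'clean']
```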

Reported-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: John Snow 
---
 block/backup.c   | 13 +
 blockjob.c   |  3 +++
 include/block/blockjob_int.h |  8 
 3 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 6d12100..ed6d74a 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -242,6 +242,13 @@ static void backup_abort(BlockJob *job)
 }
 }
 
+static void backup_clean(BlockJob *job)
+{
+BackupBlockJob *s = container_of(job, BackupBlockJob, common);
+assert(s->target);
+blk_unref(s->target);
+}
+
 static void backup_attached_aio_context(BlockJob *job, AioContext *aio_context)
 {
 BackupBlockJob *s = container_of(job, BackupBlockJob, common);
@@ -306,6 +313,7 @@ static const BlockJobDriver backup_job_driver = {
 .set_speed  = backup_set_speed,
 .commit = backup_commit,
 .abort  = backup_abort,
+.clean  = backup_clean,
 .attached_aio_context   = backup_attached_aio_context,
 };
 
@@ -327,11 +335,8 @@ typedef struct {
 
 static void backup_complete(BlockJob *job, void *opaque)
 {
-BackupBlockJob *s = container_of(job, BackupBlockJob, common);
 BackupCompleteData *data = opaque;
 
-blk_unref(s->target);
-
 block_job_completed(job, data->ret);
 g_free(data);
 }
@@ -642,7 +647,7 @@ void backup_start(const char *job_id, BlockDriverState *bs,
 bdrv_reclaim_dirty_bitmap(bs, sync_bitmap, NULL);
 }
 if (job) {
-blk_unref(job->target);
+backup_clean(&job->common);
 block_job_unref(&job->common);
 }
 }
diff --git a/blockjob.c b/blockjob.c
index f55bfec..150b87e 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -232,6 +232,9 @@ static void block_job_completed_single(BlockJob *job)
 job->driver->abort(job);
 }
 }
+if (job->driver->clean) {
+job->driver->clean(job);
+}
 
 if (job->cb) {
 job->cb(job->opaque, job->ret);
diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index 10ebb38..1c4bc90 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -74,6 +74,14 @@ struct BlockJobDriver {
 void (*abort)(BlockJob *job);
 
 /**
+ * If the callback is not NULL, it will be invoked after a call to either
+ * .commit() or .abort(). Regardless of which callback is invoked after
+ * completion, .clean() will always be called, even if the job does not
+ * belong to a transaction group.
+ */
+void (*clean)(BlockJob *job);
+
+/**
  * If the callback is not NULL, it will be invoked when the job transitions
  * into the paused state.  Paused jobs must not perform any asynchronous
  * I/O or event loop activity.  This callback is used to quiesce jobs.
-- 
2.7.4




[Qemu-block] [PATCH v2 0/6] jobs: fix transactional race condition

2016-10-24 Thread John Snow
Requires: [Qemu-devel] [PATCH 0/7] blockjobs: preliminary refactoring work, Pt 1

There are a few problems with transactional job completion right now.

First, if jobs complete so quickly that they finish before the remaining
jobs get a chance to join the transaction, the completion mode can leave
well-known state, the QLIST can get corrupted, and the transactional jobs
can complete in batches or phases instead of all together.

Second, if two or more jobs defer to the main loop at roughly the same
time, it's possible for one job's cleanup to directly invoke the other
job's cleanup from within the same thread, leading to a situation that
will deadlock the entire transaction.

Thanks to Vladimir for pointing out these modes of failure.

I have omitted the test for right now, but wanted to air the patches on-list.
It makes no attempt to change the locking mechanisms around qmp_transaction
right now, asserting instead that things are no more broken than they were,
especially in the case of dataplane. I will make further attempts to clarify
the locking mechanisms around qmp_transaction after Paolo's changes go in.

===
v2:
===

- Correct Vladimir's email (Sorry!)
- Add test as a variant of an existing test [Vladimir]



For convenience, this branch is available at:
https://github.com/jnsnow/qemu.git branch job-fix-race-condition
https://github.com/jnsnow/qemu/tree/job-fix-race-condition

This version is tagged job-fix-race-condition-v2:
https://github.com/jnsnow/qemu/releases/tag/job-fix-race-condition-v2

John Snow (5):
  blockjob: add .clean property
  blockjob: add .start field
  blockjob: add block_job_start
  blockjob: refactor backup_start as backup_job_create
  iotests: add transactional failure race test

Vladimir Sementsov-Ogievskiy (1):
  blockjob: fix dead pointer in txn list

 block/backup.c   | 59 +--
 block/commit.c   |  4 +--
 block/mirror.c   |  5 +--
 block/replication.c  | 12 ---
 block/stream.c   |  4 +--
 blockdev.c   | 83 
 blockjob.c   | 55 ++---
 include/block/block_int.h| 23 ++--
 include/block/blockjob.h |  9 +
 include/block/blockjob_int.h | 11 ++
 tests/qemu-iotests/124   | 53 ++--
 tests/qemu-iotests/124.out   |  4 +--
 tests/test-blockjob-txn.c| 12 +++
 13 files changed, 219 insertions(+), 115 deletions(-)

-- 
2.7.4




[Qemu-block] [PATCH v2 6/6] iotests: add transactional failure race test

2016-10-24 Thread John Snow
Add a regression test for the case found by Vladimir.

Reported-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: John Snow 
---
 tests/qemu-iotests/124 | 53 ++
 tests/qemu-iotests/124.out |  4 ++--
 2 files changed, 37 insertions(+), 20 deletions(-)

diff --git a/tests/qemu-iotests/124 b/tests/qemu-iotests/124
index 2f0bc24..bc855ed 100644
--- a/tests/qemu-iotests/124
+++ b/tests/qemu-iotests/124
@@ -395,19 +395,7 @@ class TestIncrementalBackup(TestIncrementalBackupBase):
 self.check_backups()
 
 
-def test_transaction_failure(self):
-'''Test: Verify backups made from a transaction that partially fails.
-
-Add a second drive with its own unique pattern, and add a bitmap to each
-drive. Use blkdebug to interfere with the backup on just one drive and
-attempt to create a coherent incremental backup across both drives.
-
-verify a failure in one but not both, then delete the failed stubs and
-re-run the same transaction.
-
-verify that both incrementals are created successfully.
-'''
-
+def do_transaction_failure_test(self, race=False):
 # Create a second drive, with pattern:
 drive1 = self.add_node('drive1')
 self.img_create(drive1['file'], drive1['fmt'])
@@ -451,9 +439,10 @@ class TestIncrementalBackup(TestIncrementalBackupBase):
 self.assertFalse(self.vm.get_qmp_events(wait=False))
 
 # Emulate some writes
-self.hmp_io_writes(drive0['id'], (('0xab', 0, 512),
-  ('0xfe', '16M', '256k'),
-  ('0x64', '32736k', '64k')))
+if not race:
+self.hmp_io_writes(drive0['id'], (('0xab', 0, 512),
+  ('0xfe', '16M', '256k'),
+  ('0x64', '32736k', '64k')))
 self.hmp_io_writes(drive1['id'], (('0xba', 0, 512),
   ('0xef', '16M', '256k'),
   ('0x46', '32736k', '64k')))
@@ -463,7 +452,8 @@ class TestIncrementalBackup(TestIncrementalBackupBase):
 target1 = self.prepare_backup(dr1bm0)
 
 # Ask for a new incremental backup per-each drive,
-# expecting drive1's backup to fail:
+# expecting drive1's backup to fail. In the 'race' test,
+# we expect drive1 to attempt to cancel the empty drive0 job.
 transaction = [
 transaction_drive_backup(drive0['id'], target0, sync='incremental',
  format=drive0['fmt'], mode='existing',
@@ -488,9 +478,15 @@ class TestIncrementalBackup(TestIncrementalBackupBase):
 self.assert_no_active_block_jobs()
 
 # Delete drive0's successful target and eliminate our record of the
-# unsuccessful drive1 target. Then re-run the same transaction.
+# unsuccessful drive1 target.
 dr0bm0.del_target()
 dr1bm0.del_target()
+if race:
+# Don't re-run the transaction, we only wanted to test the race.
+self.vm.shutdown()
+return
+
+# Re-run the same transaction:
 target0 = self.prepare_backup(dr0bm0)
 target1 = self.prepare_backup(dr1bm0)
 
@@ -511,6 +507,27 @@ class TestIncrementalBackup(TestIncrementalBackupBase):
 self.vm.shutdown()
 self.check_backups()
 
+def test_transaction_failure(self):
+'''Test: Verify backups made from a transaction that partially fails.
+
+Add a second drive with its own unique pattern, and add a bitmap to each
+drive. Use blkdebug to interfere with the backup on just one drive and
+attempt to create a coherent incremental backup across both drives.
+
+verify a failure in one but not both, then delete the failed stubs and
+re-run the same transaction.
+
+verify that both incrementals are created successfully.
+'''
+self.do_transaction_failure_test()
+
+def test_transaction_failure_race(self):
+'''Test: Verify that transactions with jobs that have no data to
+transfer do not cause race conditions in the cancellation of the entire
+transaction job group.
+'''
+self.do_transaction_failure_test(race=True)
+
 
 def test_sync_dirty_bitmap_missing(self):
 self.assert_no_active_block_jobs()
diff --git a/tests/qemu-iotests/124.out b/tests/qemu-iotests/124.out
index 36376be..e56cae0 100644
--- a/tests/qemu-iotests/124.out
+++ b/tests/qemu-iotests/124.out
@@ -1,5 +1,5 @@
-..
+...
 --
-Ran 10 tests
+Ran 11 tests
 
 OK
-- 
2.7.4




[Qemu-block] [PATCH v2 5/6] blockjob: refactor backup_start as backup_job_create

2016-10-24 Thread John Snow
Refactor backup_start as backup_job_create, which only creates the job,
but does not automatically start it. The old interface, 'backup_start',
is not kept, in favor of limiting the number of nearly-identical interfaces
that would have to be edited to keep up with QAPI changes in the future.

Callers that wish to synchronously start the backup_block_job can
instead just call block_job_start immediately after calling
backup_job_create.

Transactions are updated to use the new interface, calling block_job_start
only during the .commit phase, which helps prevent race conditions where
jobs may finish before we even finish building the transaction. This may
happen, for instance, during empty block backup jobs.

Reported-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: John Snow 
---
 block/backup.c| 26 ---
 block/replication.c   | 12 ---
 blockdev.c| 83 ++-
 include/block/block_int.h | 23 ++---
 4 files changed, 87 insertions(+), 57 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 2ce5115..d7e5c48 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -527,7 +527,7 @@ static const BlockJobDriver backup_job_driver = {
 .attached_aio_context   = backup_attached_aio_context,
 };
 
-void backup_start(const char *job_id, BlockDriverState *bs,
+BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
   BlockDriverState *target, int64_t speed,
   MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
   bool compress,
@@ -547,52 +547,52 @@ void backup_start(const char *job_id, BlockDriverState 
*bs,
 
 if (bs == target) {
 error_setg(errp, "Source and target cannot be the same");
-return;
+return NULL;
 }
 
 if (!bdrv_is_inserted(bs)) {
 error_setg(errp, "Device is not inserted: %s",
bdrv_get_device_name(bs));
-return;
+return NULL;
 }
 
 if (!bdrv_is_inserted(target)) {
 error_setg(errp, "Device is not inserted: %s",
bdrv_get_device_name(target));
-return;
+return NULL;
 }
 
 if (compress && target->drv->bdrv_co_pwritev_compressed == NULL) {
 error_setg(errp, "Compression is not supported for this drive %s",
bdrv_get_device_name(target));
-return;
+return NULL;
 }
 
 if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKUP_SOURCE, errp)) {
-return;
+return NULL;
 }
 
 if (bdrv_op_is_blocked(target, BLOCK_OP_TYPE_BACKUP_TARGET, errp)) {
-return;
+return NULL;
 }
 
 if (sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
 if (!sync_bitmap) {
 error_setg(errp, "must provide a valid bitmap name for "
  "\"incremental\" sync mode");
-return;
+return NULL;
 }
 
 /* Create a new bitmap, and freeze/disable this one. */
 if (bdrv_dirty_bitmap_create_successor(bs, sync_bitmap, errp) < 0) {
-return;
+return NULL;
 }
 } else if (sync_bitmap) {
 error_setg(errp,
"a sync_bitmap was provided to backup_run, "
"but received an incompatible sync_mode (%s)",
MirrorSyncMode_lookup[sync_mode]);
-return;
+return NULL;
 }
 
 len = bdrv_getlength(bs);
@@ -639,8 +639,8 @@ void backup_start(const char *job_id, BlockDriverState *bs,
 bdrv_op_block_all(target, job->common.blocker);
 job->common.len = len;
 block_job_txn_add_job(txn, &job->common);
-block_job_start(&job->common);
-return;
+
+return &job->common;
 
  error:
 if (sync_bitmap) {
@@ -650,4 +650,6 @@ void backup_start(const char *job_id, BlockDriverState *bs,
 backup_clean(&job->common);
 block_job_unref(&job->common);
 }
+
+return NULL;
 }
diff --git a/block/replication.c b/block/replication.c
index d4f4a7b..ca4a381 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -409,6 +409,7 @@ static void replication_start(ReplicationState *rs, 
ReplicationMode mode,
 int64_t active_length, hidden_length, disk_length;
 AioContext *aio_context;
 Error *local_err = NULL;
+BlockJob *job;
 
 aio_context = bdrv_get_aio_context(bs);
 aio_context_acquire(aio_context);
@@ -496,17 +497,18 @@ static void replication_start(ReplicationState *rs, 
ReplicationMode mode,
 bdrv_op_block_all(top_bs, s->blocker);
 bdrv_op_unblock(top_bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker);
 
-backup_start(NULL, s->secondary_disk->bs, s->hidden_disk->bs, 0,
- MIRROR_SYNC_MODE_NONE, NULL, false,
- BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
- BLOCK_JOB_INTERNAL, backup_job_completed, s,
- NULL, &local_err);
+j

[Qemu-block] [PATCH v2 4/6] blockjob: add block_job_start

2016-10-24 Thread John Snow
Instead of automatically starting jobs at creation time via backup_start
et al, we'd like to return a job object pointer that can be started
manually at a later point in time.

For now, add the block_job_start mechanism and start the jobs
automatically as we have been doing, with conversions job-by-job coming
in later patches.

Of note: cancellation of unstarted jobs will perform all the normal
cleanup as if the job had started, particularly abort and clean. The
only difference is that we will not emit any events, because the job
never actually started.

Signed-off-by: John Snow 
---
 block/backup.c|  3 +--
 block/commit.c|  3 +--
 block/mirror.c|  3 +--
 block/stream.c|  3 +--
 blockjob.c| 51 ---
 include/block/blockjob.h  |  9 +
 tests/test-blockjob-txn.c | 12 +--
 7 files changed, 58 insertions(+), 26 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 622f64e..2ce5115 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -638,9 +638,8 @@ void backup_start(const char *job_id, BlockDriverState *bs,
 
 bdrv_op_block_all(target, job->common.blocker);
 job->common.len = len;
-job->common.co = qemu_coroutine_create(job->common.driver->start, job);
 block_job_txn_add_job(txn, &job->common);
-qemu_coroutine_enter(job->common.co);
+block_job_start(&job->common);
 return;
 
  error:
diff --git a/block/commit.c b/block/commit.c
index cc2030d..89820d7 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -275,10 +275,9 @@ void commit_start(const char *job_id, BlockDriverState *bs,
 s->backing_file_str = g_strdup(backing_file_str);
 
 s->on_error = on_error;
-s->common.co = qemu_coroutine_create(s->common.driver->start, s);
 
 trace_commit_start(bs, base, top, s, s->common.co);
-qemu_coroutine_enter(s->common.co);
+block_job_start(&s->common);
 }
 
 
diff --git a/block/mirror.c b/block/mirror.c
index 3a29b94..8130474 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -970,9 +970,8 @@ static void mirror_start_job(const char *job_id, 
BlockDriverState *bs,
 
 bdrv_op_block_all(target, s->common.blocker);
 
-s->common.co = qemu_coroutine_create(s->common.driver->start, s);
 trace_mirror_start(bs, s, s->common.co, opaque);
-qemu_coroutine_enter(s->common.co);
+block_job_start(&s->common);
 }
 
 void mirror_start(const char *job_id, BlockDriverState *bs,
diff --git a/block/stream.c b/block/stream.c
index 8ffed9c..3e3a7d3 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -231,7 +231,6 @@ void stream_start(const char *job_id, BlockDriverState *bs,
 s->backing_file_str = g_strdup(backing_file_str);
 
 s->on_error = on_error;
-s->common.co = qemu_coroutine_create(s->common.driver->start, s);
 trace_stream_start(bs, base, s, s->common.co);
-qemu_coroutine_enter(s->common.co);
+block_job_start(&s->common);
 }
diff --git a/blockjob.c b/blockjob.c
index 150b87e..f574bc8 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -171,7 +171,8 @@ void *block_job_create(const char *job_id, const 
BlockJobDriver *driver,
 job->blk   = blk;
 job->cb= cb;
 job->opaque= opaque;
-job->busy  = true;
+job->busy  = false;
+job->paused= true;
 job->refcnt= 1;
 bs->job = job;
 
@@ -199,6 +200,21 @@ bool block_job_is_internal(BlockJob *job)
 return (job->id == NULL);
 }
 
+static bool block_job_started(BlockJob *job)
+{
+return job->co;
+}
+
+void block_job_start(BlockJob *job)
+{
+assert(job && !block_job_started(job) && job->paused &&
+   !job->busy && job->driver->start);
+job->paused = false;
+job->busy = true;
+job->co = qemu_coroutine_create(job->driver->start, job);
+qemu_coroutine_enter(job->co);
+}
+
 void block_job_ref(BlockJob *job)
 {
 ++job->refcnt;
@@ -239,14 +255,18 @@ static void block_job_completed_single(BlockJob *job)
 if (job->cb) {
 job->cb(job->opaque, job->ret);
 }
-if (block_job_is_cancelled(job)) {
-block_job_event_cancelled(job);
-} else {
-const char *msg = NULL;
-if (job->ret < 0) {
-msg = strerror(-job->ret);
+
+/* Emit events only if we actually started */
+if (block_job_started(job)) {
+if (block_job_is_cancelled(job)) {
+block_job_event_cancelled(job);
+} else {
+const char *msg = NULL;
+if (job->ret < 0) {
+msg = strerror(-job->ret);
+}
+block_job_event_completed(job, msg);
 }
-block_job_event_completed(job, msg);
 }
 
 if (job->txn) {
@@ -354,7 +374,8 @@ void block_job_complete(BlockJob *job, Error **errp)
 {
 /* Should not be reachable via external interface for internal jobs */
 assert(job->id);
-if (job->pause_count || job->cancelled || !job->driver->complete) {
+if (j

Re: [Qemu-block] does blk_commit_all need blk_all_next?

2016-10-24 Thread Paolo Bonzini
On Monday, October 24, 2016 8:12:48 PM, "Max Reitz"  wrote:
> On 24.10.2016 10:45, Paolo Bonzini wrote:
> > blk_commit_all is in block/block-backend.c only because it uses
> > blk_all_next.  This is also the only reason why it needs a stub
> > (stubs/blk-commit-all.c).
> > 
> > The only difference between blk_next and blk_all_next is that the latter
> > "iterates over all BlockBackends, even the ones which are hidden (i.e.
> > are not referenced by the monitor)".  Should blk_commit_all really
> > iterate over BlockBackends such as the NBD server or the block jobs'?
> 
> I guess bdrv_commit_all() did, so blk_commit_all() does now, too.
> 
> The issue is, though, that currently "hidden BB" is equivalent to
> "anonymous BB". And I think that Kevin is working towards basically all
> BBs being anonymous, i.e. hidden.
> 
> When using -drive without id but with node-name, then you'll get a
> hidden BB. So we can't get away without blk_all_next().  But maybe
> it would make sense to skip all BBs that are not attached to a
> device?

Yeah, it sounds like it shouldn't be blk_all_next() but another public
API.

Paolo



Re: [Qemu-block] [Qemu-devel] Assertion failure on qcow2 disk with cluster_size != 64k

2016-10-24 Thread Eric Blake
On 10/20/2016 07:24 PM, Ed Swierk wrote:
> Shortly after I start qemu 2.7.0 with a qcow2 disk image created with
> -o cluster_size=1048576, it prints the following and dies:
> 
> block/qcow2.c:2451: qcow2_co_pwrite_zeroes: Assertion `head + count <=
> s->cluster_size' failed.
> 
> I narrowed the problem to bdrv_co_do_pwrite_zeroes(), called by
> bdrv_aligned_pwritev() with flags & BDRV_REQ_ZERO_WRITE set.
> 
> On the first loop iteration, offset=8003584, count=2093056,
> head=663552, tail=659456 and num=2093056. qcow2_co_pwrite_zeroes() is
> called with offset=8003584 and count=385024 and finds that the head
> portion is not already zero, so it returns -ENOTSUP.
> bdrv_co_do_pwrite_zeroes() falls back to a normal write, with
> max_transfer=65536.

How are you getting max_transfer == 65536?  I can't reproduce it with
the following setup:

$ qemu-img create -f qcow2 -o cluster_size=1M file 10M
$ qemu-io -f qcow2 -c 'w 7m 1k' file
$ qemu-io -f qcow2 -c 'w -z 8003584 2093056' file

although I did confirm that the above sequence was enough to get the
-ENOTSUP failure and fall into the code calculating max_transfer.

I'm guessing that you are using something other than a file system as
the backing protocol for your qcow2 image.  But do you really have a
protocol that takes AT MOST 64k per transaction, while still trying to use a
cluster size of 1M in the qcow2 format?  That's rather awkward, as it
means that you are required to do 16 transactions per cluster (the whole
point of using larger clusters is usually to get fewer transactions).  I
think we need to get to a root cause of why you are seeing such a small
max_transfer, before I can propose the right patch, since I haven't been
able to reproduce it locally yet (although I admit I haven't tried to
see if blkdebug could reliably introduce artificial limits to simulate
your setup).  And it may turn out that I just have to fix the
bdrv_co_do_pwrite_zeroes() code to loop multiple times if the size of
the unaligned head really does exceed the max_transfer size that the
underlying protocol is able to support, rather than assuming that the
unaligned head/tail always fit in a single fallback write.

Can you also try this patch? If I'm right, you'll still fail, but the
assertion will be slightly different.  (Again, I'm passing locally, but
that's because I'm using the file protocol, and my file system does not
impose a puny 64k max transfer).

diff --git i/block/io.c w/block/io.c
index b136c89..8757063 100644
--- i/block/io.c
+++ w/block/io.c
@@ -1179,6 +1179,8 @@ static int coroutine_fn
bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
 int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_pwrite_zeroes, INT_MAX);
 int alignment = MAX(bs->bl.pwrite_zeroes_alignment,
 bs->bl.request_alignment);
+int max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
+MAX_WRITE_ZEROES_BOUNCE_BUFFER);

 assert(alignment % bs->bl.request_alignment == 0);
 head = offset % alignment;
@@ -1197,6 +1199,8 @@ static int coroutine_fn
bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
 /* Make a small request up to the first aligned sector.  */
 num = MIN(count, alignment - head);
 head = 0;
+assert(num < max_write_zeroes);
+assert(num < max_transfer);
 } else if (tail && num > alignment) {
 /* Shorten the request to the last aligned sector.  */
 num -= tail;
@@ -1222,8 +1226,6 @@ static int coroutine_fn
bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,

 if (ret == -ENOTSUP) {
 /* Fall back to bounce buffer if write zeroes is unsupported */
-int max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
-                                MAX_WRITE_ZEROES_BOUNCE_BUFFER);
 BdrvRequestFlags write_flags = flags & ~BDRV_REQ_ZERO_WRITE;

 if ((flags & BDRV_REQ_FUA) &&


-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] Assertion failure on qcow2 disk with cluster_size != 64k

2016-10-24 Thread Ed Swierk
On Mon, Oct 24, 2016 at 2:21 PM, Eric Blake  wrote:
> How are you getting max_transfer == 65536?  I can't reproduce it with
> the following setup:
>
> $ qemu-img create -f qcow2 -o cluster_size=1M file 10M
> $ qemu-io -f qcow2 -c 'w 7m 1k' file
> $ qemu-io -f qcow2 -c 'w -z 8003584 2093056' file
>
> although I did confirm that the above sequence was enough to get the
> -ENOTSUP failure and fall into the code calculating max_transfer.
>
> I'm guessing that you are using something other than a file system as
> the backing protocol for your qcow2 image.  But do you really have a
> protocol that takes AT MOST 64k per transaction, while still trying to use a
> cluster size of 1M in the qcow2 format?  That's rather awkward, as it
> means that you are required to do 16 transactions per cluster (the whole
> point of using larger clusters is usually to get fewer transactions).  I
> think we need to get to a root cause of why you are seeing such a small
> max_transfer, before I can propose the right patch, since I haven't been
> able to reproduce it locally yet (although I admit I haven't tried to
> see if blkdebug could reliably introduce artificial limits to simulate
> your setup).  And it may turn out that I just have to fix the
> bdrv_co_do_pwrite_zeroes() code to loop multiple times if the size of
> the unaligned head really does exceed the max_transfer size that the
> underlying protocol is able to support, rather than assuming that the
> unaligned head/tail always fit in a single fallback write.

In this case I'm using a qcow2 image that's stored directly in a raw
dm-crypt/LUKS container, which is itself a loop device on an ext4
filesystem.

It appears loop devices (with or without dm-crypt/LUKS) report a
255-sector maximum per request via the BLKSECTGET ioctl, which qemu
rounds down to 64k in raw_refresh_limits(). However this maximum
appears to be just a hint: bdrv_driver_pwritev() succeeds even with a
385024-byte buffer of zeroes.

As for the 1M cluster size, this is a temporary workaround for another
qemu issue (the default qcow2 L2 table cache size performs well with
random reads covering only up to 8 GB of image data with 64k clusters;
beyond that the L2 table cache thrashes). I agree this is not an
optimal configuration for writes.

> Can you also try this patch? If I'm right, you'll still fail, but the
> assertion will be slightly different.  (Again, I'm passing locally, but
> that's because I'm using the file protocol, and my file system does not
> impose a puny 64k max transfer).
>
> diff --git i/block/io.c w/block/io.c
> index b136c89..8757063 100644
> --- i/block/io.c
> +++ w/block/io.c
> @@ -1179,6 +1179,8 @@ static int coroutine_fn
> bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>  int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_pwrite_zeroes, INT_MAX);
>  int alignment = MAX(bs->bl.pwrite_zeroes_alignment,
>  bs->bl.request_alignment);
> +int max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
> +MAX_WRITE_ZEROES_BOUNCE_BUFFER);
>
>  assert(alignment % bs->bl.request_alignment == 0);
>  head = offset % alignment;
> @@ -1197,6 +1199,8 @@ static int coroutine_fn
> bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>  /* Make a small request up to the first aligned sector.  */
>  num = MIN(count, alignment - head);
>  head = 0;
> +assert(num < max_write_zeroes);
> +assert(num < max_transfer);
>  } else if (tail && num > alignment) {
>  /* Shorten the request to the last aligned sector.  */
>  num -= tail;
> @@ -1222,8 +1226,6 @@ static int coroutine_fn
> bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>
>  if (ret == -ENOTSUP) {
>  /* Fall back to bounce buffer if write zeroes is unsupported */
> -int max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
> -                                MAX_WRITE_ZEROES_BOUNCE_BUFFER);
>  BdrvRequestFlags write_flags = flags & ~BDRV_REQ_ZERO_WRITE;
>
>  if ((flags & BDRV_REQ_FUA) &&

With this change, the num < max_transfer assertion fails on the first
iteration (with num=385024 and max_transfer=65536).

--Ed



Re: [Qemu-block] [Qemu-devel] [PATCH] nbd: Use CoQueue for free_sema instead of CoMutex

2016-10-24 Thread Changlong Xie

On 10/24/2016 05:36 PM, Paolo Bonzini wrote:



On 24/10/2016 03:44, Changlong Xie wrote:

Ping. Any comments? It's really a problem for NBD.


Sorry, I haven't been sending pull requests.  I'll do it this week.



Thanks : )


Paolo


Thanks
 -Xie

On 10/12/2016 06:18 PM, Changlong Xie wrote:

NBD is using the CoMutex in a way that wasn't anticipated. For example,
if there are N (N=26, MAX_NBD_REQUESTS=16) NBD write requests, we will
invoke nbd_client_co_pwritev N times.


time request Actions
1    1   in_flight=1, Coroutine=C1
2    2   in_flight=2, Coroutine=C2
...
15   15  in_flight=15, Coroutine=C15
16   16  in_flight=16, Coroutine=C16, free_sema->holder=C16, mutex->locked=true
17   17  in_flight=16, Coroutine=C17, queue C17 into free_sema->queue
18   18  in_flight=16, Coroutine=C18, queue C18 into free_sema->queue
...
26   N   in_flight=16, Coroutine=C26, queue C26 into free_sema->queue



Once the NBD client receives the reply to request No. 16, we will
re-enter C16. That is fine, because C16 is equal to 'free_sema->holder'.


time request Actions
27   16  in_flight=15, Coroutine=C16, free_sema->holder=C16, mutex->locked=false



Then nbd_coroutine_end invokes qemu_co_mutex_unlock, which pops
coroutines from the head of free_sema->queue and enters C17. Moreover,
free_sema->holder is C17 now.


time request Actions
28   17  in_flight=16, Coroutine=C17, free_sema->holder=C17, mutex->locked=true



In the above scenario, we have only received the reply to request No. 16.
As time goes by, the NBD client will most likely receive replies for
requests 1 to 15 rather than for request 17, which owns C17. In that
case we will hit the failed assertion "mutex->holder == self" introduced
by Kevin's commit 0e438cdc "coroutine: Let CoMutex remember who holds
it". For example, if the NBD client receives the reply to request
No. 15, qemu will stop unexpectedly:

time request   Actions
29   15 (most cases)  in_flight=15, Coroutine=C15, free_sema->holder=C17, mutex->locked=false



Per Paolo's suggestion "The simplest fix is to change it to CoQueue,
which is like a condition
variable", this patch replaces CoMutex with CoQueue.

Cc: Wen Congyang 
Reported-by: zhanghailiang 
Suggested-by: Paolo Bonzini 
Signed-off-by: Changlong Xie 
---
   block/nbd-client.c | 8 
   block/nbd-client.h | 2 +-
   2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 2cf3237..40b28ab 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -199,8 +199,8 @@ static void nbd_coroutine_start(NbdClientSession *s,
   {
    /* Poor man semaphore.  The free_sema is locked when no other request
     * can be accepted, and unlocked after receiving one reply.  */
-if (s->in_flight >= MAX_NBD_REQUESTS - 1) {
-qemu_co_mutex_lock(&s->free_sema);
+if (s->in_flight == MAX_NBD_REQUESTS) {
+qemu_co_queue_wait(&s->free_sema);
   assert(s->in_flight < MAX_NBD_REQUESTS);
   }
   s->in_flight++;
@@ -214,7 +214,7 @@ static void nbd_coroutine_end(NbdClientSession *s,
   int i = HANDLE_TO_INDEX(s, request->handle);
   s->recv_coroutine[i] = NULL;
   if (s->in_flight-- == MAX_NBD_REQUESTS) {
-qemu_co_mutex_unlock(&s->free_sema);
+qemu_co_queue_next(&s->free_sema);
   }
   }

@@ -386,7 +386,7 @@ int nbd_client_init(BlockDriverState *bs,
   }

   qemu_co_mutex_init(&client->send_mutex);
-qemu_co_mutex_init(&client->free_sema);
+qemu_co_queue_init(&client->free_sema);
   client->sioc = sioc;
   object_ref(OBJECT(client->sioc));

diff --git a/block/nbd-client.h b/block/nbd-client.h
index 044aca4..307b8b1 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -24,7 +24,7 @@ typedef struct NbdClientSession {
   off_t size;

   CoMutex send_mutex;
-CoMutex free_sema;
+CoQueue free_sema;
   Coroutine *send_coroutine;
   int in_flight;











[Qemu-block] [PATCH 0/4] block/curl: Fix FTP

2016-10-24 Thread Max Reitz
At least for me, the FTP support of our curl block driver currently
doesn't work at all. This is due to (at least) three issues, for each of
which this series provides a patch (and the first patch is just a minor
clean-up).

1. When establishing an FTP connection, libcurl hands us some data we do
   not expect because we have not really been asking for it. Not an
   issue in theory, because we can just ignore it. Unfortunately, qemu
   has decided to be more direct about the issue and tell libcurl that
   we did not process any of that data. libcurl doesn't like that. At
   all. Therefore, it returns the favor and just cancels the connection.
   In effect, it is impossible to even open a connection to an FTP
   server (at least in my test environment).
   Patch 2 fixes this by just playing along nicely.

2. libcurl has an old function called curl_multi_socket_all(). It allows
   you to kick off action on all of the sockets there are.
   Unfortunately, it is deprecated. Therefore, our code decides to be
   good and use the non-deprecated curl_multi_socket_action() function.
   However, that one only works on a single socket and wants to know
   which. So our code remembers the socket of the current connection.
   Works great for HTTP which generally only uses one socket.
   Unfortunately, FTP generally uses at least two, one for the control
   and one for the data stream. So us remembering only one of those two
   results in qemu only being able to receive the first 16 kB of any
   request (and maybe even of any connection).
   Patch 3 fixes this by putting the sockets into a list and thus being
   able to remember more than one.

3. The first two patches make curl work on files with file sizes that
   are multiples of 512, but not so well with others. curl still uses
   the sector-based interface, so it may receive requests beyond the
   EOF into the partial last sector. While it will actually not pass a
   request beyond the EOF to libcurl, it will unfortunately still wait
   to receive data from there. Which of course will not happen. So every
   request into that last sector makes the whole BDS hang indefinitely.
   Patch 4 fixes this by letting go of the futile hope for data from
   where there is none.


Max Reitz (4):
  block/curl: Use BDRV_SECTOR_SIZE
  block/curl: Fix return value from curl_read_cb
  block/curl: Remember all sockets
  block/curl: Do not wait for data beyond EOF

 block/curl.c | 99 +---
 1 file changed, 75 insertions(+), 24 deletions(-)

-- 
2.10.1




[Qemu-block] [PATCH 4/4] block/curl: Do not wait for data beyond EOF

2016-10-24 Thread Max Reitz
libcurl will only give us as much data as there is, not more. The block
layer will deny requests beyond the end of file for us; but since this
block driver is still using a sector-based interface, we can still get
in trouble if the file size is not a multiple of 512.

While we have already made sure not to attempt transfers beyond the end
of the file, we are currently still trying to receive data from there if
the original request exceeds the file size. This patch fixes this issue
and invokes qemu_iovec_memset() on the iovec's tail.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Max Reitz 
---
 block/curl.c | 32 +++-
 1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/block/curl.c b/block/curl.c
index 4fbba5c..2cb875a 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -253,8 +253,17 @@ static size_t curl_read_cb(void *ptr, size_t size, size_t 
nmemb, void *opaque)
 continue;
 
 if ((s->buf_off >= acb->end)) {
+size_t request_length = acb->nb_sectors * BDRV_SECTOR_SIZE;
+
 qemu_iovec_from_buf(acb->qiov, 0, s->orig_buf + acb->start,
 acb->end - acb->start);
+
+if (acb->end - acb->start < request_length) {
+size_t offset = acb->end - acb->start;
+qemu_iovec_memset(acb->qiov, offset, 0,
+  request_length - offset);
+}
+
 acb->common.cb(acb->common.opaque, 0);
 qemu_aio_unref(acb);
 s->acb[i] = NULL;
@@ -271,6 +280,8 @@ static int curl_find_buf(BDRVCURLState *s, size_t start, 
size_t len,
 {
 int i;
 size_t end = start + len;
+size_t clamped_end = MIN(end, s->len);
+size_t clamped_len = clamped_end - start;
 
    for (i=0; i<CURL_NUM_STATES; i++) {
        CURLState *state = &s->states[i];
@@ -285,12 +296,15 @@ static int curl_find_buf(BDRVCURLState *s, size_t start, 
size_t len,
 // Does the existing buffer cover our section?
 if ((start >= state->buf_start) &&
 (start <= buf_end) &&
-(end >= state->buf_start) &&
-(end <= buf_end))
+(clamped_end >= state->buf_start) &&
+(clamped_end <= buf_end))
 {
 char *buf = state->orig_buf + (start - state->buf_start);
 
-qemu_iovec_from_buf(acb->qiov, 0, buf, len);
+qemu_iovec_from_buf(acb->qiov, 0, buf, clamped_len);
+if (clamped_len < len) {
+qemu_iovec_memset(acb->qiov, clamped_len, 0, len - clamped_len);
+}
 acb->common.cb(acb->common.opaque, 0);
 
 return FIND_RET_OK;
@@ -300,13 +314,13 @@ static int curl_find_buf(BDRVCURLState *s, size_t start, 
size_t len,
 if (state->in_use &&
 (start >= state->buf_start) &&
 (start <= buf_fend) &&
-(end >= state->buf_start) &&
-(end <= buf_fend))
+(clamped_end >= state->buf_start) &&
+(clamped_end <= buf_fend))
 {
 int j;
 
 acb->start = start - state->buf_start;
-acb->end = acb->start + len;
+acb->end = acb->start + clamped_len;
 
            for (j=0; j<CURL_NUM_ACB; j++) {
                if (!state->acb[j]) {
@@ -799,13 +813,13 @@ static void curl_readv_bh_cb(void *p)
 }
 
 acb->start = 0;
-acb->end = (acb->nb_sectors * BDRV_SECTOR_SIZE);
+acb->end = MIN(acb->nb_sectors * BDRV_SECTOR_SIZE, s->len - start);
 
 state->buf_off = 0;
 g_free(state->orig_buf);
 state->buf_start = start;
-state->buf_len = acb->end + s->readahead_size;
-end = MIN(start + state->buf_len, s->len) - 1;
+state->buf_len = MIN(acb->end + s->readahead_size, s->len - start);
+end = start + state->buf_len - 1;
 state->orig_buf = g_try_malloc(state->buf_len);
 if (state->buf_len && state->orig_buf == NULL) {
 curl_clean_state(state);
-- 
2.10.1




[Qemu-block] [PATCH 3/4] block/curl: Remember all sockets

2016-10-24 Thread Max Reitz
For some connection types (like FTP, generally), more than one socket
may be used (in FTP's case: control vs. data stream). As of commit
838ef602498b8d1985a231a06f5e328e2946a81d ("curl: Eliminate unnecessary
use of curl_multi_socket_all"), we have to remember all of the sockets
used by libcurl, but in fact we only did that for a single one. Since
one libcurl connection may use multiple sockets, however, we have to
remember them all.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Max Reitz 
---
 block/curl.c | 47 +--
 1 file changed, 41 insertions(+), 6 deletions(-)

diff --git a/block/curl.c b/block/curl.c
index 095ffda..4fbba5c 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -104,12 +104,17 @@ typedef struct CURLAIOCB {
 size_t end;
 } CURLAIOCB;
 
+typedef struct CURLSocket {
+int fd;
+QLIST_ENTRY(CURLSocket) next;
+} CURLSocket;
+
 typedef struct CURLState
 {
 struct BDRVCURLState *s;
 CURLAIOCB *acb[CURL_NUM_ACB];
 CURL *curl;
-curl_socket_t sock_fd;
+QLIST_HEAD(, CURLSocket) sockets;
 char *orig_buf;
 size_t buf_start;
 size_t buf_off;
@@ -163,10 +168,27 @@ static int curl_sock_cb(CURL *curl, curl_socket_t fd, int action,
 {
 BDRVCURLState *s;
 CURLState *state = NULL;
+CURLSocket *socket;
+
 curl_easy_getinfo(curl, CURLINFO_PRIVATE, (char **)&state);
-state->sock_fd = fd;
 s = state->s;
 
+QLIST_FOREACH(socket, &state->sockets, next) {
+if (socket->fd == fd) {
+if (action == CURL_POLL_REMOVE) {
+QLIST_REMOVE(socket, next);
+g_free(socket);
+}
+break;
+}
+}
+if (!socket) {
+socket = g_new0(CURLSocket, 1);
+socket->fd = fd;
+QLIST_INSERT_HEAD(&state->sockets, socket, next);
+}
+socket = NULL;
+
 DPRINTF("CURL (AIO): Sock action %d on fd %d\n", action, (int)fd);
 switch (action) {
 case CURL_POLL_IN:
@@ -354,6 +376,7 @@ static void curl_multi_check_completion(BDRVCURLState *s)
 static void curl_multi_do(void *arg)
 {
 CURLState *s = (CURLState *)arg;
+CURLSocket *socket, *next_socket;
 int running;
 int r;
 
@@ -361,10 +384,13 @@ static void curl_multi_do(void *arg)
 return;
 }
 
-do {
-r = curl_multi_socket_action(s->s->multi, s->sock_fd, 0, &running);
-} while(r == CURLM_CALL_MULTI_PERFORM);
-
+/* Need to use _SAFE because curl_multi_socket_action() may trigger
+ * curl_sock_cb() which might modify this list */
+QLIST_FOREACH_SAFE(socket, &s->sockets, next, next_socket) {
+do {
+r = curl_multi_socket_action(s->s->multi, socket->fd, 0, &running);
+} while (r == CURLM_CALL_MULTI_PERFORM);
+}
 }
 
 static void curl_multi_read(void *arg)
@@ -468,6 +494,7 @@ static CURLState *curl_init_state(BlockDriverState *bs, BDRVCURLState *s)
 #endif
 }
 
+QLIST_INIT(&state->sockets);
 state->s = s;
 
 return state;
@@ -477,6 +504,14 @@ static void curl_clean_state(CURLState *s)
 {
 if (s->s->multi)
 curl_multi_remove_handle(s->s->multi, s->curl);
+
+while (!QLIST_EMPTY(&s->sockets)) {
+CURLSocket *socket = QLIST_FIRST(&s->sockets);
+
+QLIST_REMOVE(socket, next);
+g_free(socket);
+}
+
 s->in_use = 0;
 }
 
-- 
2.10.1




[Qemu-block] [PATCH 1/4] block/curl: Use BDRV_SECTOR_SIZE

2016-10-24 Thread Max Reitz
Currently, curl defines its own constant SECTOR_SIZE. There is no
advantage over using the global BDRV_SECTOR_SIZE, so drop it.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Max Reitz 
---
 block/curl.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/block/curl.c b/block/curl.c
index e5eaa7b..12afa15 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -73,7 +73,6 @@ static CURLMcode __curl_multi_socket_action(CURLM *multi_handle,
 
 #define CURL_NUM_STATES 8
 #define CURL_NUM_ACB8
-#define SECTOR_SIZE 512
 #define READ_AHEAD_DEFAULT (256 * 1024)
 #define CURL_TIMEOUT_DEFAULT 5
 #define CURL_TIMEOUT_MAX 1
@@ -738,12 +737,12 @@ static void curl_readv_bh_cb(void *p)
 CURLAIOCB *acb = p;
 BDRVCURLState *s = acb->common.bs->opaque;
 
-size_t start = acb->sector_num * SECTOR_SIZE;
+size_t start = acb->sector_num * BDRV_SECTOR_SIZE;
 size_t end;
 
 // In case we have the requested data already (e.g. read-ahead),
 // we can just call the callback and be done.
-switch (curl_find_buf(s, start, acb->nb_sectors * SECTOR_SIZE, acb)) {
+switch (curl_find_buf(s, start, acb->nb_sectors * BDRV_SECTOR_SIZE, acb)) {
 case FIND_RET_OK:
 qemu_aio_unref(acb);
 // fall through
@@ -762,7 +761,7 @@ static void curl_readv_bh_cb(void *p)
 }
 
 acb->start = 0;
-acb->end = (acb->nb_sectors * SECTOR_SIZE);
+acb->end = (acb->nb_sectors * BDRV_SECTOR_SIZE);
 
 state->buf_off = 0;
 g_free(state->orig_buf);
@@ -779,8 +778,8 @@ static void curl_readv_bh_cb(void *p)
 state->acb[0] = acb;
 
 snprintf(state->range, 127, "%zd-%zd", start, end);
-DPRINTF("CURL (AIO): Reading %d at %zd (%s)\n",
-(acb->nb_sectors * SECTOR_SIZE), start, state->range);
+DPRINTF("CURL (AIO): Reading %llu at %zd (%s)\n",
+(acb->nb_sectors * BDRV_SECTOR_SIZE), start, state->range);
 curl_easy_setopt(state->curl, CURLOPT_RANGE, state->range);
 
 curl_multi_add_handle(s->multi, state->curl);
-- 
2.10.1




[Qemu-block] [PATCH 2/4] block/curl: Fix return value from curl_read_cb

2016-10-24 Thread Max Reitz
Commit 38bbc0a580f9f10570b1d1b5d3e92f0e6feb2970 is correct in that
the callback is supposed to return the number of bytes handled; what it
does not mention is that libcurl will throw an error if the callback did
not "handle" all of the data passed to it.

Therefore, if the callback receives some data that it cannot handle
(either because the receive buffer has not been set up yet or because it
would not fit into the receive buffer) and we have to ignore it, we
still have to report that the data has been handled.

Obviously, this should not happen normally. But it does happen at least
for FTP connections where some data (that we do not expect) may be
generated when the connection is established.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Max Reitz 
---
 block/curl.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/block/curl.c b/block/curl.c
index 12afa15..095ffda 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -212,12 +212,13 @@ static size_t curl_read_cb(void *ptr, size_t size, size_t nmemb, void *opaque)
 
 DPRINTF("CURL: Just reading %zd bytes\n", realsize);
 
-if (!s || !s->orig_buf)
-return 0;
+if (!s || !s->orig_buf) {
+goto read_end;
+}
 
 if (s->buf_off >= s->buf_len) {
 /* buffer full, read nothing */
-return 0;
+goto read_end;
 }
 realsize = MIN(realsize, s->buf_len - s->buf_off);
 memcpy(s->orig_buf + s->buf_off, ptr, realsize);
@@ -238,7 +239,9 @@ static size_t curl_read_cb(void *ptr, size_t size, size_t nmemb, void *opaque)
 }
 }
 
-return realsize;
+read_end:
+/* curl will error out if we do not return this value */
+return size * nmemb;
 }
 
 static int curl_find_buf(BDRVCURLState *s, size_t start, size_t len,
-- 
2.10.1




Re: [Qemu-block] [PATCH v8 02/36] qapi: Add ImageLockMode

2016-10-24 Thread Fam Zheng
On Fri, 10/21 22:45, Max Reitz wrote:
> On 30.09.2016 14:09, Fam Zheng wrote:
> > Signed-off-by: Fam Zheng 
> > ---
> >  qapi/block-core.json | 18 ++
> >  1 file changed, 18 insertions(+)
> > 
> > diff --git a/qapi/block-core.json b/qapi/block-core.json
> > index 92193ab..22e8d04 100644
> > --- a/qapi/block-core.json
> > +++ b/qapi/block-core.json
> > @@ -2754,3 +2754,21 @@
> >'data' : { 'parent': 'str',
> >   '*child': 'str',
> >   '*node': 'str' } }
> > +
> > +##
> > +# @ImageLockMode:
> > +#
> > +# @auto: defer to the block driver to use the least strict mode, based on
> > +#the nature of format and read-only flag, and the supported locking
> > +#operations of the protocol.
> 
> I have some difficulty understanding this description. I'd intuitively
> assume no locking to be the "least strict mode"; however, since it
> should be always possible not to lock an image, this would mean that
> auto=nolock. Which is hopefully isn't.
> 
> If it's not easy to come up with a thorough explanation, perhaps it
> would be best to give some examples which help to understand the concept
> behind "auto" intuitively.

It could have been more specific; my bad for being too terse here. Maybe
something like this:

@auto: defer to the block layer to use an appropriate lock mode, based on
   the driver used and read-only option: for read-only images, shared
   lock mode, or otherwise exclusive lock mode, will be attempted; if
   the driver doesn't support this mode (or sharing is particularly
   desired by its design), nolock will be used.

?

Fam



Re: [Qemu-block] [Qemu-devel] [PATCH v8 15/36] qdev: Add "lock-mode" to block device options

2016-10-24 Thread Fam Zheng
On Sat, 10/22 02:11, Max Reitz wrote:
> On 30.09.2016 14:09, Fam Zheng wrote:
> > Signed-off-by: Fam Zheng 
> > ---
> >  hw/core/qdev-properties.c| 10 ++
> >  include/hw/block/block.h |  1 +
> >  include/hw/qdev-properties.h |  3 +++
> >  3 files changed, 14 insertions(+)
> 
> Why don't you add _conf.lock_mode in this very patch (instead of patch
> 3) and pull it in front of patch 12? Setting the mode won't do anything
> until it's implemented for the various block devices, but I don't think
> that's a bad thing.

Sounds good!

Fam



Re: [Qemu-block] [PATCH v8 03/36] block: Introduce image file locking

2016-10-24 Thread Fam Zheng
On Fri, 10/21 23:04, Max Reitz wrote:
> > +ImageLockMode bdrv_lock_mode_from_flags(int flags)
> > +{
> > +if (flags & BDRV_O_NO_LOCK) {
> > +return IMAGE_LOCK_MODE_NOLOCK;
> > +} else if (flags & BDRV_O_SHARED_LOCK) {
> > +return IMAGE_LOCK_MODE_SHARED;
> > +} else if (flags & BDRV_O_EXCLUSIVE_LOCK) {
> > +return IMAGE_LOCK_MODE_EXCLUSIVE;
> > +} else {
> > +return IMAGE_LOCK_MODE_AUTO;
> > +}
> > +}
> 
> I don't know if there's been any discussion about the order of the flags
> here, but I personally would order them exactly the other way around:
> Asking for exclusive locking should override nolock, in my opinion.

The idea was to assert that no two bits are set at the same time, but I seem to
have forgotten to actually add the assertion.

> 
> > +
> > +ImageLockMode bdrv_get_lock_mode(BlockDriverState *bs)
> > +{
> > +return bs->cur_lock;
> > +}
> > +
> > +int bdrv_set_lock_mode(BlockDriverState *bs, ImageLockMode mode)
> > +{
> > +int ret;
> > +
> > +if (bs->cur_lock == mode) {
> > +return 0;
> > +} else if (!bs->drv) {
> > +return -ENOMEDIUM;
> > +} else if (!bs->drv->bdrv_lockf) {
> > +if (bs->file) {
> > +return bdrv_set_lock_mode(bs->file->bs, mode);
> > +}
> > +return 0;
> > +}
> > +ret = bs->drv->bdrv_lockf(bs, mode);
> > +if (ret == -ENOTSUP) {
> > +/* Handle it the same way as !bs->drv->bdrv_lockf */
> > +ret = 0;
> 
> Yes, well, why do you handle both as success? Wouldn't returning
> -ENOTSUP make more sense?
> 
> I guess the caller can find out itself by checking whether bs->cur_lock
> has changed, but...

I can't think of a reason for any caller to treat -ENOTSUP differently from
success, hence the check here.

> 
> > +} else if (ret == 0) {
> > +bs->cur_lock = mode;
> > +}
> > +return ret;
> > +}
> > +
> >  static QemuOptsList bdrv_runtime_opts = {
> >  .name = "bdrv_common",
> >  .head = QTAILQ_HEAD_INITIALIZER(bdrv_runtime_opts.head),
> > @@ -1076,6 +1119,10 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file,
> >  goto free_and_fail;
> >  }
> >  
> > +if (open_flags & BDRV_O_INACTIVE) {
> > +open_flags = (open_flags & ~BDRV_O_LOCK_MASK) & BDRV_O_NO_LOCK;
> 
> I suppose the second & is supposed to be a |?

Yes. Thanks for catching it.

> 
> > +}
> > +
> >  ret = refresh_total_sectors(bs, bs->total_sectors);
> >  if (ret < 0) {
> >  error_setg_errno(errp, -ret, "Could not refresh total sector count");
> > @@ -2273,6 +2320,7 @@ static void bdrv_close(BlockDriverState *bs)
> >  if (bs->drv) {
> >  BdrvChild *child, *next;
> >  
> > +bdrv_set_lock_mode(bs, IMAGE_LOCK_MODE_NOLOCK);
> >  bs->drv->bdrv_close(bs);
> >  bs->drv = NULL;
> >  
> > @@ -3188,6 +3236,9 @@ void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
> 
> This function's name is pretty weird... Maybe it would be better to
> rename it to "bdrv_complete_incoming" or something. (Unrelated to this
> series, of course.)
> 
> >  error_setg_errno(errp, -ret, "Could not refresh total sector count");
> >  return;
> >  }
> > +if (bs->cur_lock != IMAGE_LOCK_MODE__MAX) {
> > +bdrv_set_lock_mode(bs, bs->cur_lock);
> > +}
> >  }
> >  
> >  void bdrv_invalidate_cache_all(Error **errp)
> > @@ -3230,6 +3281,7 @@ static int bdrv_inactivate_recurse(BlockDriverState *bs,
> >  }
> >  
> >  if (setting_flag) {
> > +ret = bdrv_set_lock_mode(bs, IMAGE_LOCK_MODE_NOLOCK);
> 
> Maybe it would make sense to do something with the return value...? :-)

Yes, sounds good.

Fam



Re: [Qemu-block] [PATCH v8 05/36] raw-posix: Add image locking support

2016-10-24 Thread Fam Zheng
On Sat, 10/22 01:40, Max Reitz wrote:
> On 30.09.2016 14:09, Fam Zheng wrote:
> > virtlockd in libvirt locks the first byte, we lock byte 1 to avoid
> > the intervene.
> 
> s/the intervene/a conflict/?

OK.

> 
> > 
> > Both file and host device protocols are covered.
> > 
> > The complication is with reopen. We have three different locking states,
> > namely "unlocked", "shared locked" and "exclusively locked".
> > 
> > When we reopen, the new fd may need a new locking mode. Moving away to or 
> > from
> > exclusive is a bit tricky because we cannot do it atomically. This patch 
> > solves
> > it by dup() s->fd to s->lock_fd and avoid close(), so that there isn't a 
> > racy
> > window where we drop the lock on one fd before acquiring the exclusive lock 
> > on
> > the other.
> > 
> > To make the logic easier to manage, and allow better reuse, the code is
> > internally organized by state transition table (old_lock -> new_lock).
> > 
> > Signed-off-by: Fam Zheng 
> > ---
> >  block/raw-posix.c | 318 +-
> >  1 file changed, 317 insertions(+), 1 deletion(-)
> > 
> > diff --git a/block/raw-posix.c b/block/raw-posix.c
> > index 6ed7547..22de242 100644
> > --- a/block/raw-posix.c
> > +++ b/block/raw-posix.c
> > @@ -133,6 +133,7 @@ do { \
> >  
> >  typedef struct BDRVRawState {
> >  int fd;
> > +int lock_fd;
> 
> I think it would be a good idea to put a comment about the semantics of
> this lock_fd here.

Will do.

> 
> >  int type;
> >  int open_flags;
> >  size_t buf_align;
> > @@ -149,6 +150,7 @@ typedef struct BDRVRawState {
> >  
> >  typedef struct BDRVRawReopenState {
> >  int fd;
> > +int lock_fd;
> >  int open_flags;
> >  } BDRVRawReopenState;
> >  
> > @@ -367,6 +369,43 @@ static void raw_parse_flags(int bdrv_flags, int *open_flags)
> >  }
> >  }
> >  
> > +static int raw_lock_fd(int fd, ImageLockMode mode)
> > +{
> > +assert(fd >= 0);
> > +/* Locking byte 1 avoids interfering with virtlockd. */
> > +switch (mode) {
> > +case IMAGE_LOCK_MODE_EXCLUSIVE:
> > +return qemu_lock_fd(fd, 1, 1, true);
> > +case IMAGE_LOCK_MODE_SHARED:
> > +return qemu_lock_fd(fd, 1, 1, false);
> > +case IMAGE_LOCK_MODE_NOLOCK:
> > +return qemu_unlock_fd(fd, 1, 1);
> > +default:
> > +abort();
> > +}
> > +}
> > +
> > +static int raw_lockf(BlockDriverState *bs, ImageLockMode mode)
> > +{
> > +BDRVRawState *s = bs->opaque;
> > +
> > +if (s->lock_fd < 0) {
> > +if (mode == IMAGE_LOCK_MODE_NOLOCK) {
> > +return 0;
> > +}
> > +s->lock_fd = qemu_dup(s->fd);
> > +if (s->lock_fd < 0) {
> > +return s->lock_fd;
> 
> You should probably return -errno instead (qemu_dup apparently doesn't
> return that but just -1).

Yes.

> 
> (I'm getting further and further from a high-level review, am I not?)
> 
> > +}
> > +}
> > +if (mode == IMAGE_LOCK_MODE_AUTO) {
> > +mode = bdrv_get_flags(bs) & BDRV_O_RDWR ?
> > +   IMAGE_LOCK_MODE_EXCLUSIVE :
> > +   IMAGE_LOCK_MODE_SHARED;
> > +}
> > +return raw_lock_fd(s->lock_fd, mode);
> 
> Without a comment for how lock_fd is supposed to work, this (for
> example) looks a bit weird. After all, the user is trying to lock the
> fd, and I don't find it immediately clear that lock_fd always points to
> the same file as fd, and you're using it only for locking (for reasons
> the comment should explain as well).
> 
> > +}
> > +
> >  #ifdef CONFIG_LINUX_AIO
> >  static bool raw_use_aio(int bdrv_flags)
> >  {
> > @@ -433,6 +472,7 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
> >  raw_parse_flags(bdrv_flags, &s->open_flags);
> >  
> >  s->fd = -1;
> > +s->lock_fd = -1;
> >  fd = qemu_open(filename, s->open_flags, 0644);
> >  if (fd < 0) {
> >  ret = -errno;
> > @@ -529,6 +569,268 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags,
> >  return raw_open_common(bs, options, flags, 0, errp);
> >  }
> >  
> > +typedef enum {
> > +RAW_REOPEN_PREPARE,
> > +RAW_REOPEN_COMMIT,
> > +RAW_REOPEN_ABORT
> > +} RawReopenOperation;
> > +
> > +typedef int (*RawReopenFunc)(BDRVReopenState *state,
> > + RawReopenOperation op,
> > + ImageLockMode old_lock,
> > + ImageLockMode new_lock,
> > + Error **errp);
> > +
> > +static int
> > +raw_reopen_identical(BDRVReopenState *state,
> > + RawReopenOperation op,
> > + ImageLockMode old_lock,
> > + ImageLockMode new_lock,
> > + Error **errp)
> > +{
> > +assert(old_lock == new_lock);
> > +return 0;
> > +}
> > +
> > +static int
> > +raw_reopen_from_unlock(BDRVReopenState *state,
> > +   RawReopenOperation op,