Re: [Qemu-block] [PATCH v2] hw/block/fdc: floppy command FIFO memory initialization

2019-05-29 Thread Andrey Shinkevich
On 30/05/2019 00:38, Max Reitz wrote:
> On 29.05.19 20:20, Andrey Shinkevich wrote:
>> The uninitialized memory allocated for the command FIFO of the
>> floppy controller during the VM hardware initialization incurs
>> many unwanted reports by Valgrind when VM state is being saved.
>> That verbosity hardens a search for the real memory issues when
>> the iotests run. Particularly, the patch eliminates 20 unnecessary
>> reports of the Valgrind tool in the iotest #169.
>>
>> Signed-off-by: Andrey Shinkevich 
>> ---
>> v2:
>>01: The pointer unnecessary check 'if (fdctrl->fifo)' was removed
>>as suggested by John.
>>
>>   hw/block/fdc.c | 1 +
>>   1 file changed, 1 insertion(+)
> 
> Thanks, applied to my block-on-kevin branch:
> 
> https://git.xanclic.moe/XanClic/qemu/commits/branch/block-on-kevin
> 
> (To become my block branch when my current pull request is done.)
> 
> Max
> 

Thank you very much.
Andrey


Re: [Qemu-block] [RFC 1/3] block: Add ImageRotationalInfo

2019-05-29 Thread Kevin Wolf
Am 24.05.2019 um 19:28 hat Max Reitz geschrieben:
> This enum indicates whether a file is stored on a rotating disk or a
> solid-state drive.  Drivers report it via the .bdrv_get_info() callback,
> and if they do not, the global bdrv_get_info() implementation
> automatically takes it from bs->file or bs->backing, if available.

Good that you wrote "bs->file or bs->backing" explicitly. Otherwise, I
might have missed that it begs one big question: What is the correct
answer for a qcow2 file that has bs->file on an SSD, but bs->backing on
a rotating disk?

I don't think there is a correct answer for the whole device, so maybe
this information shouldn't be per device in BlockDriverInfo, but per
block in bdrv_co_block_status() (optionally determined if the caller
requests it)?

Kevin



Re: [Qemu-block] [Qemu-devel] [PATCH v2 2/6] tests/qapi-schema: Test for good feature lists in structs

2019-05-29 Thread Kevin Wolf
Am 24.05.2019 um 15:29 hat Markus Armbruster geschrieben:
> Let's add
> 
>{ 'command': 'test-features',
>  'data': { 'fs0': 'FeatureStruct0',
>'fs1': 'FeatureStruct1',
>'fs2': 'FeatureStruct2',
>'fs3': 'FeatureStruct3',
>'cfs1': 'CondFeatureStruct1',
>'cfs2': 'CondFeatureStruct2',
>'cfs3': 'CondFeatureStruct3' } }
> 
> because without it, the feature test cases won't generate introspection
> code.

Of course, like everything else you requested, I'll just do this to get
the series off my table, but I'm still curious: Where would
introspection code ever be generated for the test cases? I saw neither
test code that generates the source files nor reference output that it
would be compared against.

Kevin



Re: [Qemu-block] [Qemu-devel] [PATCH v2 4/6] qapi: Disentangle QAPIDoc code

2019-05-29 Thread Kevin Wolf
Am 24.05.2019 um 18:11 hat Markus Armbruster geschrieben:
> Kevin Wolf  writes:
> 
> > Documentation comment follow a certain structure: First, we have a text
> > with a general description (called QAPIDoc.body). After this,
> > descriptions of the arguments follow. Finally, we have part that
> > contains various named sections.
> >
> > The code doesn't show this structure but just checks the right side
> > conditions so it happens to do the right set of things in the right
> 
> What are "side conditions"?
> 
> > phase. This is hard to follow, and adding support for documentation of
> > features would be even harder.
> >
> > This restructures the code so that the three parts are clearly
> > separated. The code becomes a bit longer, but easier to follow.
> 
> Recommend to mention that output remains unchanged.
> 
> >
> > Signed-off-by: Kevin Wolf 
> > ---
> >  scripts/qapi/common.py | 107 -
> >  1 file changed, 83 insertions(+), 24 deletions(-)
> >
> > diff --git a/scripts/qapi/common.py b/scripts/qapi/common.py
> > index 71944e2e30..1d0f4847db 100644
> > --- a/scripts/qapi/common.py
> > +++ b/scripts/qapi/common.py
> > @@ -120,6 +120,27 @@ class QAPIDoc(object):
> >  def connect(self, member):
> >  self.member = member
> >  
> > +class SymbolPart:
> > +"""
> > +Describes which part of the documentation we're parsing right now.
> 
> So SymbolPart is a part of the documentation.  Shouldn't it be named
> DocPart then?

That's a better name. I was stuck in the old code (which was concerned
about what a symbol name means at which point) rather than thinking
about high-level concepts.

> > +
> > +BODY means that we're ready to process freeform text into 
> > self.body. A
> 
> s/freeform/free-form/

Both are valid spellings and I generally don't expect correct spellings
to be corrected, but arguably "free-form" is more standard. I'll change
it. (If we were consistent, the method should have been named
_append_free_form rather than _append_freeform originally...)

> > +symbol name is only allowed if no other text was parsed yet. It is
> 
> Start your sentences with a capital letter.

I would gladly correct a sentence not starting with a capital letter if
I could see any. The quoted sentence starts with a capital "A" in the
previous line.

> > +interpreted as the symbol name that describes the currently 
> > documented
> > +object. On getting the second symbol name, we proceed to ARGS.
> > +
> > +ARGS means that we're parsing the arguments section. Any symbol 
> > name is
> > +interpreted as an argument and an ArgSection is created for it.
> > +
> > +VARIOUS is the final part where freeform sections may appear. This
> > +includes named sections such as "Return:" as well as unnamed
> > +paragraphs. No symbols are allowed any more in this part.
> 
> s/any more/anymore/

Again both are valid, but this time, "any more" is the more standard
spelling.

> > +# Can't make it a subclass of Enum because of Python 2
> 
> Thanks for documenting Python 2 contortions!  Let's phrase it as a TODO
> comment.
> 
> > +BODY = 0
> 
> Any particular reason for 0?
> 
> As far as I can tell, Python enum values commonly start with 1, to make
> them all true.

If you use enums in a boolean context, you're doing something wrong
anyway. *shrug*

I'll change it, it's consistent with real Enum classes where the values
becomes non-integer objects (which therefore evaluate as True in boolean
contexts).

> > +ARGS = 1
> > +VARIOUS = 2
> > +
> >  def __init__(self, parser, info):
> >  # self._parser is used to report errors with QAPIParseError.  The
> >  # resulting error position depends on the state of the parser.
> > @@ -135,6 +156,7 @@ class QAPIDoc(object):
> >  self.sections = []
> >  # the current section
> >  self._section = self.body
> > +self._part = QAPIDoc.SymbolPart.BODY
> 
> The right hand side is tiresome, but I don't have better ideas.

This is just what Python enums look like... I could move the class
outside of QAPIDoc to save that part of the prefix, but I'd prefer not
to.

> >  
> >  def has_section(self, name):
> >  """Return True if we have a section with this name."""
> > @@ -154,37 +176,84 @@ class QAPIDoc(object):
>def append(self, line):
>"""Parse a comment line and add it to the documentation."""
>line = line[1:]
>if not line:
>self._append_freeform(line)
>return
> 
>if line[0] != ' ':
> >  raise QAPIParseError(self._parser, "Missing space after #")
> >  line = line[1:]
> >  
> > +if self._part == QAPIDoc.SymbolPart.BODY:
> > +self._append_body_line(line)
> > +elif self._part == QAPIDoc.SymbolPart.ARGS:
> > +self._append_args_line(li

Re: [Qemu-block] [PATCH v2] hw/block/fdc: floppy command FIFO memory initialization

2019-05-29 Thread Max Reitz
On 29.05.19 20:20, Andrey Shinkevich wrote:
> The uninitialized memory allocated for the command FIFO of the
> floppy controller during the VM hardware initialization incurs
> many unwanted reports by Valgrind when VM state is being saved.
> That verbosity hardens a search for the real memory issues when
> the iotests run. Particularly, the patch eliminates 20 unnecessary
> reports of the Valgrind tool in the iotest #169.
> 
> Signed-off-by: Andrey Shinkevich 
> ---
> v2:
>   01: The pointer unnecessary check 'if (fdctrl->fifo)' was removed
>   as suggested by John.
> 
>  hw/block/fdc.c | 1 +
>  1 file changed, 1 insertion(+)

Thanks, applied to my block-on-kevin branch:

https://git.xanclic.moe/XanClic/qemu/commits/branch/block-on-kevin

(To become my block branch when my current pull request is done.)

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v2] hw/block/fdc: floppy command FIFO memory initialization

2019-05-29 Thread John Snow



On 5/29/19 2:20 PM, Andrey Shinkevich wrote:
> The uninitialized memory allocated for the command FIFO of the
> floppy controller during the VM hardware initialization incurs
> many unwanted reports by Valgrind when VM state is being saved.
> That verbosity hardens a search for the real memory issues when
> the iotests run. Particularly, the patch eliminates 20 unnecessary
> reports of the Valgrind tool in the iotest #169.
> 
> Signed-off-by: Andrey Shinkevich 
> ---
> v2:
>   01: The pointer unnecessary check 'if (fdctrl->fifo)' was removed
>   as suggested by John.
> 
>  hw/block/fdc.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/block/fdc.c b/hw/block/fdc.c
> index 6f19f12..9af762b 100644
> --- a/hw/block/fdc.c
> +++ b/hw/block/fdc.c
> @@ -2647,6 +2647,7 @@ static void fdctrl_realize_common(DeviceState *dev, 
> FDCtrl *fdctrl,
>  
>  FLOPPY_DPRINTF("init controller\n");
>  fdctrl->fifo = qemu_memalign(512, FD_SECTOR_LEN);
> +memset(fdctrl->fifo, 0, FD_SECTOR_LEN);
>  fdctrl->fifo_size = 512;
>  fdctrl->result_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
>   fdctrl_result_timer, fdctrl);
> 

I guess technically I would send a PR for this but it's just a single
patch, so it'd be nice if it can just get staged in the next block
roundup by whomever.

Max/Kevin, if you would be so kind?

--js



Re: [Qemu-block] [PATCH v2] hw/block/fdc: floppy command FIFO memory initialization

2019-05-29 Thread John Snow



On 5/29/19 2:20 PM, Andrey Shinkevich wrote:
> The uninitialized memory allocated for the command FIFO of the
> floppy controller during the VM hardware initialization incurs
> many unwanted reports by Valgrind when VM state is being saved.
> That verbosity hardens a search for the real memory issues when
> the iotests run. Particularly, the patch eliminates 20 unnecessary
> reports of the Valgrind tool in the iotest #169.
> 
> Signed-off-by: Andrey Shinkevich 
> ---
> v2:
>   01: The pointer unnecessary check 'if (fdctrl->fifo)' was removed
>   as suggested by John.
> 
>  hw/block/fdc.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/block/fdc.c b/hw/block/fdc.c
> index 6f19f12..9af762b 100644
> --- a/hw/block/fdc.c
> +++ b/hw/block/fdc.c
> @@ -2647,6 +2647,7 @@ static void fdctrl_realize_common(DeviceState *dev, 
> FDCtrl *fdctrl,
>  
>  FLOPPY_DPRINTF("init controller\n");
>  fdctrl->fifo = qemu_memalign(512, FD_SECTOR_LEN);
> +memset(fdctrl->fifo, 0, FD_SECTOR_LEN);
>  fdctrl->fifo_size = 512;
>  fdctrl->result_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
>   fdctrl_result_timer, fdctrl);
> 

Great, thanks!

Reviewed-by: John Snow 



[Qemu-block] [PATCH v2] hw/block/fdc: floppy command FIFO memory initialization

2019-05-29 Thread Andrey Shinkevich
The uninitialized memory allocated for the command FIFO of the
floppy controller during the VM hardware initialization incurs
many unwanted reports by Valgrind when VM state is being saved.
That verbosity hardens a search for the real memory issues when
the iotests run. Particularly, the patch eliminates 20 unnecessary
reports of the Valgrind tool in the iotest #169.

Signed-off-by: Andrey Shinkevich 
---
v2:
  01: The pointer unnecessary check 'if (fdctrl->fifo)' was removed
  as suggested by John.

 hw/block/fdc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index 6f19f12..9af762b 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -2647,6 +2647,7 @@ static void fdctrl_realize_common(DeviceState *dev, 
FDCtrl *fdctrl,
 
 FLOPPY_DPRINTF("init controller\n");
 fdctrl->fifo = qemu_memalign(512, FD_SECTOR_LEN);
+memset(fdctrl->fifo, 0, FD_SECTOR_LEN);
 fdctrl->fifo_size = 512;
 fdctrl->result_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
  fdctrl_result_timer, fdctrl);
-- 
1.8.3.1




Re: [Qemu-block] [PATCH 3/3] block/qcow2-bitmap: rewrite bitmap reopening logic

2019-05-29 Thread Max Reitz
On 29.05.19 17:58, Vladimir Sementsov-Ogievskiy wrote:
> 29.05.2019 18:33, Max Reitz wrote:
>> On 23.05.19 17:47, Vladimir Sementsov-Ogievskiy wrote:
>>> Current logic
>>> =
>>>
>>> Reopen rw -> ro:
>>>
>>> Store bitmaps and release BdrvDirtyBitmap's.
>>>
>>> Reopen ro -> rw:
>>>
>>> Load bitmap list
>>> Skip bitmaps which for which we don't have BdrvDirtyBitmap [this is
>>> the worst thing]
>>> A kind of fail with error message if we see not readonly bitmap
>>>
>>> This leads to at least the following bug:
>>>
>>> Assume we have bitmap0.
>>> Create external snapshot
>>>bitmap0 is stored and therefore removed
>>> Commit snapshot
>>>now we have no bitmaps
>>> Do some writes from guest (*)
>>>they are not marked in bitmap
>>> Shutdown
>>> Start
>>>bitmap0 is loaded as valid, but it is actually broken! It misses
>>>writes (*)
>>> Incremental backup
>>>it will be inconsistent
>>>
>>> New logic
>>> =
>>>
>>> Reopen rw -> ro:
>>>
>>> Store bitmaps and don't remove them from RAM, only mark them readonly
>>> (the latter is already done, but didn't work properly because of
>>> removing in storing function)
>>>
>>> Reopen to rw (don't filter case with reopen rw -> rw, it is supported
>>> now in qcow2_reopen_bitmaps_rw)
>>>
>>> Load bitmap list
>>>
>>> Carefully consider all possible combinations of bitmaps in RAM and in
>>> the image, try to fix corruptions, print corresponding error messages.
>>>
>>> Also, call qcow2_reopen_bitmaps_rw after the whole reopen queue
>>> commited, as otherwise we can't write to the qcow2 file child on reopen
>>> ro -> rw.
>>>
>>> Also, add corresponding test-cases to 255.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> ---
>>>   block/qcow2.h  |   5 +-
>>>   include/block/block_int.h  |   2 +-
>>>   block.c|  29 ++
>>>   block/qcow2-bitmap.c   | 197 ++---
>>>   block/qcow2.c  |   2 +-
>>>   tests/qemu-iotests/255 |   2 +
>>>   tests/qemu-iotests/255.out |  35 +++
>>>   7 files changed, 193 insertions(+), 79 deletions(-)
>>
>> [...]
>>
>>> diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
>>> index 2b84bfa007..4e565db11f 100644
>>> --- a/block/qcow2-bitmap.c
>>> +++ b/block/qcow2-bitmap.c
>>
>> [...]
>>
>>> @@ -1103,76 +1110,150 @@ Qcow2BitmapInfoList 
>>> *qcow2_get_bitmap_info_list(BlockDriverState *bs,
>>>   return list;
>>>   }
>>>   
>>> -int qcow2_reopen_bitmaps_rw_hint(BlockDriverState *bs, bool 
>>> *header_updated,
>>> - Error **errp)
>>> +/*
>>> + * qcow2_reopen_bitmaps_rw
>>> + *
>>> + * The function is called in bdrv_reopen_multiple after all calls to
>>> + * bdrv_reopen_commit, when final bs is writable. It is done so, as we need
>>> + * write access to child bs, and with current reopening architecture, when
>>> + * reopen ro -> rw it is possible only after all reopen commits.
>>> + *
>>> + * So, we can't fail here.
>>
>> Why?  Because the current design forbids it?
>>
>> Then the current design seems flawed to me.
>>
>> [CC-ing Berto]
>>
>> Without having looked too close into this, it seems from your commit
>> message that it is possible to run into unrecoverable fatal errors here.
>>   We cannot just ignore that on the basis that the current design cannot
>> deal with that.
>>
>> It just appears wrong to me to update any flags in the image in the
>> .commit() part of a transaction.  We should try to do so in .prepare().
>>   If it works, great, we’re set for .commit().  If it doesn’t, welp, time
>> for .abort() and pretend the image is still read-only (even though it
>> kind of may be half-prepared for writing).
>>
>> If we fail to set IN_USE in .commit(), that’s a fatal error in my opinion.
>>
>> After some chatting with John, I’ve come to the following belief:
>>
>> When switching a node from RO to R/W, it must be able to write to its
>> children in .prepare() -- precisely because performing this switch may
>> need some metadata change.  If we cannot do this change, then we cannot
>> switch the node to R/W.  So it’s clear to me that this needs to be done
>> in .prepare().
>>
>> So I think a node’s children must be R/W before we can .prepare() the
>> node.  That means we need to .commit() them.  That means we cannot have
>> a single transaction that switches a whole tree to be R/W, but it must
>> consist of one transaction per level.
>>
>> If something fails, we need to roll back all the previous layers.  Well,
>> it depends.
>>
>> If we switch to R/W (and something fails), then we need to try to revert
>> the children we have already made R/W to be RO.  If that doesn’t work,
>> welp, too bad, but not fatal.
> 
> And, than, why not do full reopen rw in .prepare, and just organize
> recursion so that children reopened rw before parent?

(1) Because .abort() must always succeed.  What I’m proposing is a
solution where we have to actively roll back, and that may not always work.

(2) 

Re: [Qemu-block] [Qemu-devel] [PATCH 3/3] block/qcow2-bitmap: rewrite bitmap reopening logic

2019-05-29 Thread John Snow
Max has picked this thread up for block discussion, so I'm going to
stick to slightly more bitmap related discussion here; we'll resume
block discussion in the other tail of this thread.

On 5/29/19 5:10 AM, Vladimir Sementsov-Ogievskiy wrote:
> 29.05.2019 2:24, John Snow wrote:
>>
>>
>> On 5/23/19 11:47 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Current logic
>>> =
>>>
>>> Reopen rw -> ro:
>>>
>>> Store bitmaps and release BdrvDirtyBitmap's.
>>>
>>> Reopen ro -> rw:
>>>
>>> Load bitmap list
>>> Skip bitmaps which for which we don't have BdrvDirtyBitmap [this is
>>> the worst thing]
>>
>> ...ah.
>>
>>> A kind of fail with error message if we see not readonly bitmap
>>>
>>> This leads to at least the following bug:
>>>
>>> Assume we have bitmap0.
>>
>> Presumably a normal, persistent bitmap.
>>
>>> Create external snapshot
>>>bitmap0 is stored and therefore removed
>>
>> Written out to the backing file and removed from memory, because we've
>> reopened rw->ro; this is because of the migration "safety" clause where
>> we simply drop these bitmaps.
>>
>> ...Ah, and then we don't actually open them readonly again; that entire
>> stanza in reopen_ro never fires off at all because the bitmaps are
>> already gone.
>>
>>> Commit snapshot
>>>now we have no bitmaps
>>
>> When we reopen the base as rw as you note, we skipped bitmaps for which
>> we had no in-memory bitmap for -- because the readonly logic was really
>> expecting to have these in-memory.
>>
>> I should probably say that for the sake of migration correctness, the
>> way we flush things to disk and remove it from memory on write is really
>> bothersome to keep correct. The migration logic is so particular that it
>> keeps causing issues elsewhere, of which this is another symptom.
>>
>>> Do some writes from guest (*)
>>>they are not marked in bitmap
>>
>> Yikes, right.
>>
>>> Shutdown
>>> Start
>>>bitmap0 is loaded as valid, but it is actually broken! It misses
>>>writes (*)
>>
>> Yikes; because it was consistent on flush and we skipped it on load,
>> it's not marked as IN_USE and we are free to load it up again.
>>
>>> Incremental backup
>>>it will be inconsistent
>>>
>>
>> Good writeup, thank you.
>>
>>> New logic
>>> =
>>>
>>> Reopen rw -> ro:
>>>
>>> Store bitmaps and don't remove them from RAM, only mark them readonly
>>> (the latter is already done, but didn't work properly because of
>>> removing in storing function)
>>>
>>
>> Yes! I like this change.
>>
>>> Reopen to rw (don't filter case with reopen rw -> rw, it is supported
>>> now in qcow2_reopen_bitmaps_rw)
>>>
>>
>> OK, so we allow rw --> rw without trying to be fussy about it. That
>> seems fine to me.
>>
>>> Load bitmap list
>>>
>>> Carefully consider all possible combinations of bitmaps in RAM and in
>>> the image, try to fix corruptions, print corresponding error messages.
>>>
>>> Also, call qcow2_reopen_bitmaps_rw after the whole reopen queue
>>> commited, as otherwise we can't write to the qcow2 file child on reopen
>>> ro -> rw.
>>>
>>
>> I take it this is the change that moved the invocation logic into
>> bdrv_reopen_multiple instead of bdrv_reopen_commit; also notably we no
>> longer check the transition edge which is much simpler.
>>
>> oh, I see why you're doing it there now...
>>
>>> Also, add corresponding test-cases to 255.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> ---
>>>   block/qcow2.h  |   5 +-
>>>   include/block/block_int.h  |   2 +-
>>>   block.c|  29 ++
>>>   block/qcow2-bitmap.c   | 197 ++---
>>>   block/qcow2.c  |   2 +-
>>>   tests/qemu-iotests/255 |   2 +
>>>   tests/qemu-iotests/255.out |  35 +++
>>>   7 files changed, 193 insertions(+), 79 deletions(-)
>>>
>>> diff --git a/block/qcow2.h b/block/qcow2.h
>>> index 8d92ef1fee..5928306e62 100644
>>> --- a/block/qcow2.h
>>> +++ b/block/qcow2.h
>>> @@ -725,9 +725,10 @@ Qcow2BitmapInfoList 
>>> *qcow2_get_bitmap_info_list(BlockDriverState *bs,
>>>   Error **errp);
>>>   int qcow2_reopen_bitmaps_rw_hint(BlockDriverState *bs, bool 
>>> *header_updated,
>>>Error **errp);
>>> -int qcow2_reopen_bitmaps_rw(BlockDriverState *bs, Error **errp);
>>> +void qcow2_reopen_bitmaps_rw(BlockDriverState *bs);
>>>   int qcow2_truncate_bitmaps_check(BlockDriverState *bs, Error **errp);
>>> -void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, Error 
>>> **errp);
>>> +void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs,
>>> +  bool release_stored, Error 
>>> **errp);
>>>   int qcow2_reopen_bitmaps_ro(BlockDriverState *bs, Error **errp);
>>>   bool qcow2_can_store_new_dirty_bitmap(BlockDriverState *bs,
>>> const char *name,
>>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>>> index 1

[Qemu-block] [PATCH v7 1/3] block: include base when checking image chain for block allocation

2019-05-29 Thread Andrey Shinkevich
This patch is used in the 'block/stream: introduce a bottom node'
that is following. Instead of the base node, the caller may pass
the node that has the base as its backing image to the function
bdrv_is_allocated_above() with a new parameter include_base = true
and get rid of the dependency on the base that may change during
commit/stream parallel jobs. Now, if the specified base is not
found in the backing image chain, the QEMU will abort.

Suggested-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Andrey Shinkevich 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Alberto Garcia 
---
 block/commit.c|  2 +-
 block/io.c| 21 +++--
 block/mirror.c|  2 +-
 block/replication.c   |  2 +-
 block/stream.c|  2 +-
 include/block/block.h |  3 ++-
 6 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/block/commit.c b/block/commit.c
index 14e5bb3..62ca90b 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -176,7 +176,7 @@ static int coroutine_fn commit_run(Job *job, Error **errp)
 break;
 }
 /* Copy if allocated above the base */
-ret = bdrv_is_allocated_above(blk_bs(s->top), blk_bs(s->base),
+ret = bdrv_is_allocated_above(blk_bs(s->top), blk_bs(s->base), false,
   offset, COMMIT_BUFFER_SIZE, &n);
 copy = (ret == 1);
 trace_commit_one_iteration(s, offset, n, ret);
diff --git a/block/io.c b/block/io.c
index 3134a60..43cefc3 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2287,10 +2287,11 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState 
*bs, int64_t offset,
 /*
  * Given an image chain: ... -> [BASE] -> [INTER1] -> [INTER2] -> [TOP]
  *
- * Return true if (a prefix of) the given range is allocated in any image
- * between BASE and TOP (inclusive).  BASE can be NULL to check if the given
- * offset is allocated in any image of the chain.  Return false otherwise,
- * or negative errno on failure.
+ * Return 1 if (a prefix of) the given range is allocated in any image
+ * between BASE and TOP (BASE is only included if include_base is set).
+ * BASE can be NULL to check if the given offset is allocated in any
+ * image of the chain.  Return 0 otherwise, or negative errno on
+ * failure.
  *
  * 'pnum' is set to the number of bytes (including and immediately
  * following the specified offset) that are known to be in the same
@@ -2302,17 +2303,21 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState 
*bs, int64_t offset,
  */
 int bdrv_is_allocated_above(BlockDriverState *top,
 BlockDriverState *base,
-int64_t offset, int64_t bytes, int64_t *pnum)
+bool include_base, int64_t offset,
+int64_t bytes, int64_t *pnum)
 {
 BlockDriverState *intermediate;
 int ret;
 int64_t n = bytes;
 
+assert(base || !include_base);
+
 intermediate = top;
-while (intermediate && intermediate != base) {
+while (include_base || intermediate != base) {
 int64_t pnum_inter;
 int64_t size_inter;
 
+assert(intermediate);
 ret = bdrv_is_allocated(intermediate, offset, bytes, &pnum_inter);
 if (ret < 0) {
 return ret;
@@ -2331,6 +2336,10 @@ int bdrv_is_allocated_above(BlockDriverState *top,
 n = pnum_inter;
 }
 
+if (intermediate == base) {
+break;
+}
+
 intermediate = backing_bs(intermediate);
 }
 
diff --git a/block/mirror.c b/block/mirror.c
index ec4bd9f..81c2967 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -807,7 +807,7 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
 return 0;
 }
 
-ret = bdrv_is_allocated_above(bs, base, offset, bytes, &count);
+ret = bdrv_is_allocated_above(bs, base, false, offset, bytes, &count);
 if (ret < 0) {
 return ret;
 }
diff --git a/block/replication.c b/block/replication.c
index 3d4dedd..fc8d2ad 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -272,7 +272,7 @@ static coroutine_fn int 
replication_co_writev(BlockDriverState *bs,
 while (remaining_sectors > 0) {
 int64_t count;
 
-ret = bdrv_is_allocated_above(top->bs, base->bs,
+ret = bdrv_is_allocated_above(top->bs, base->bs, false,
   sector_num * BDRV_SECTOR_SIZE,
   remaining_sectors * BDRV_SECTOR_SIZE,
   &count);
diff --git a/block/stream.c b/block/stream.c
index 1a906fd..97fddb2 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -160,7 +160,7 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
 } else if (ret >= 0) {
 /* Copy if allocated in the intermediate images.  Limit to the
  * known-unallocated area [offset, offset+n*BDRV_SECTOR_SIZE).  */
-ret = bdrv_is_a

[Qemu-block] [PATCH v7 0/3] block/stream: get rid of the base

2019-05-29 Thread Andrey Shinkevich
This series introduces a bottom intermediate node that eliminates the
dependency on the base that may change while stream job is running.
It happens when stream/commit parallel jobs are running on the same
backing chain. The base node of the stream job may be a top node of
the parallel commit job and can change before the stream job is
completed. We avoid that dependency by introducing the bottom node.

v7: [resend by Andrey]
  01: assert(intermediate) was inserted before the call to
  bdrv_is_allocated() in the intermediate node loop of the
  bdrv_is_allocated_above() as suggested by Max.
  02: The change of the intermediate node loop in the stream_start() was
  rolled back to its original design and the reassignment of the base
  node pointer was added as Vladimir and Max suggested. The relevant
  comment was amended.

v6: [resend by Vladimir]
  01: improve comment in block/io.c, suggested by Alberto

v5: [resend by Vladimir]
  01: use comment wording in block/io.c suggested by Alberto

v4:
trace_stream_start reverted to the base.
bdrv_is_allocated_above_inclusive() deleted and the new parameter
'bool include_base' was added to the bdrv_is_allocated_above().

Andrey Shinkevich (3):
  block: include base when checking image chain for block allocation
  block/stream: refactor stream_run: drop goto
  block/stream: introduce a bottom node

 block/commit.c |  2 +-
 block/io.c | 21 +--
 block/mirror.c |  2 +-
 block/replication.c|  2 +-
 block/stream.c | 56 --
 include/block/block.h  |  3 ++-
 tests/qemu-iotests/245 |  4 ++--
 7 files changed, 49 insertions(+), 41 deletions(-)

-- 
1.8.3.1




[Qemu-block] [PATCH v7 3/3] block/stream: introduce a bottom node

2019-05-29 Thread Andrey Shinkevich
The bottom node is the intermediate block device that has the base as its
backing image. It is used instead of the base node while a block stream
job is running to avoid dependency on the base that may change due to the
parallel jobs. The change may take place due to a filter node as well that
is inserted between the base and the intermediate bottom node. It occurs
when the base node is the top one for another commit or stream job.
After the introduction of the bottom node, don't freeze its backing child,
that's the base, anymore.

Suggested-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Andrey Shinkevich 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Alberto Garcia 
---
 block/stream.c | 43 +++
 tests/qemu-iotests/245 |  4 ++--
 2 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index 65b13b2..cd5e2ba 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -31,7 +31,7 @@ enum {
 
 typedef struct StreamBlockJob {
 BlockJob common;
-BlockDriverState *base;
+BlockDriverState *bottom;
 BlockdevOnError on_error;
 char *backing_file_str;
 bool bs_read_only;
@@ -54,7 +54,7 @@ static void stream_abort(Job *job)
 
 if (s->chain_frozen) {
 BlockJob *bjob = &s->common;
-bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->base);
+bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->bottom);
 }
 }
 
@@ -63,11 +63,11 @@ static int stream_prepare(Job *job)
 StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
 BlockJob *bjob = &s->common;
 BlockDriverState *bs = blk_bs(bjob->blk);
-BlockDriverState *base = s->base;
+BlockDriverState *base = backing_bs(s->bottom);
 Error *local_err = NULL;
 int ret = 0;
 
-bdrv_unfreeze_backing_chain(bs, base);
+bdrv_unfreeze_backing_chain(bs, s->bottom);
 s->chain_frozen = false;
 
 if (bs->backing) {
@@ -110,7 +110,7 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
 StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
 BlockBackend *blk = s->common.blk;
 BlockDriverState *bs = blk_bs(blk);
-BlockDriverState *base = s->base;
+bool enable_cor = !backing_bs(s->bottom);
 int64_t len;
 int64_t offset = 0;
 uint64_t delay_ns = 0;
@@ -119,7 +119,8 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
 int64_t n = 0; /* bytes */
 void *buf;
 
-if (!bs->backing) {
+if (bs == s->bottom) {
+/* Nothing to stream */
 return 0;
 }
 
@@ -136,7 +137,7 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
  * backing chain since the copy-on-read operation does not take base into
  * account.
  */
-if (!base) {
+if (enable_cor) {
 bdrv_enable_copy_on_read(bs);
 }
 
@@ -159,9 +160,8 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
 } else if (ret >= 0) {
 /* Copy if allocated in the intermediate images.  Limit to the
  * known-unallocated area [offset, offset+n*BDRV_SECTOR_SIZE).  */
-ret = bdrv_is_allocated_above(backing_bs(bs), base, false,
+ret = bdrv_is_allocated_above(backing_bs(bs), s->bottom, true,
   offset, n, &n);
-
 /* Finish early if end of backing file has been reached */
 if (ret == 0 && n == 0) {
 n = len - offset;
@@ -198,7 +198,7 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
 }
 }
 
-if (!base) {
+if (enable_cor) {
 bdrv_disable_copy_on_read(bs);
 }
 
@@ -230,8 +230,10 @@ void stream_start(const char *job_id, BlockDriverState *bs,
 StreamBlockJob *s;
 BlockDriverState *iter;
 bool bs_read_only;
+int basic_flags = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED;
+BlockDriverState *bottom = bdrv_find_overlay(bs, base);
 
-if (bdrv_freeze_backing_chain(bs, base, errp) < 0) {
+if (bdrv_freeze_backing_chain(bs, bottom, errp) < 0) {
 return;
 }
 
@@ -248,10 +250,8 @@ void stream_start(const char *job_id, BlockDriverState *bs,
  * already have our own plans. Also don't allow resize as the image size is
  * queried only at the job start and then cached. */
 s = block_job_create(job_id, &stream_job_driver, NULL, bs,
- BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
- BLK_PERM_GRAPH_MOD,
- BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
- BLK_PERM_WRITE,
+ basic_flags | BLK_PERM_GRAPH_MOD,
+ basic_flags | BLK_PERM_WRITE,
  speed, creation_flags, NULL, NULL, errp);
 if (!s) {
 goto fail;
@@ -259,15 +259,18 @@ void stream_start(const char *job_id, BlockDriverState 
*bs,
 
 /* Block all intermediate nodes between

[Qemu-block] [PATCH v7 2/3] block/stream: refactor stream_run: drop goto

2019-05-29 Thread Andrey Shinkevich
The goto is unnecessary in the stream_run() since the common exit
code was removed in the commit eb23654dbe43b549ea2a9ebff9d8e:
"jobs: utilize job_exit shim".

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Andrey Shinkevich 
Reviewed-by: Alberto Garcia 
Reviewed-by: Max Reitz 
---
 block/stream.c | 13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index 97fddb2..65b13b2 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -120,13 +120,12 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
 void *buf;
 
 if (!bs->backing) {
-goto out;
+return 0;
 }
 
 len = bdrv_getlength(bs);
 if (len < 0) {
-ret = len;
-goto out;
+return len;
 }
 job_progress_set_remaining(&s->common.job, len);
 
@@ -203,14 +202,10 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
 bdrv_disable_copy_on_read(bs);
 }
 
-/* Do not remove the backing file if an error was there but ignored.  */
-ret = error;
-
 qemu_vfree(buf);
 
-out:
-/* Modify backing chain and close BDSes in main loop */
-return ret;
+/* Do not remove the backing file if an error was there but ignored. */
+return error;
 }
 
 static const BlockJobDriver stream_job_driver = {
-- 
1.8.3.1




[Qemu-block] [RFC PATCH] qemu-io-cmds: use clock_gettime for benchmarking

2019-05-29 Thread Alex Bennée
The previous use of gettimeofday() ran into undefined behaviour when
we ended up doing a div 0 for a very short operation. This is because
gettimeofday only works at the microsecond level as well as being
prone to discontinuous jumps in system time. Using clock_gettime with
CLOCK_MONOTONIC gives greater precision and alleviates some of the
potential problems with time jumping around.

We could use CLOCK_MONOTONIC_RAW to avoid being tripped up by NTP and
adjtime but that is Linux specific so I decided it would do for now.

Signed-off-by: Alex Bennée 
---
 qemu-io-cmds.c | 69 +-
 1 file changed, 35 insertions(+), 34 deletions(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 30a7d9a13bf..4ace2969a9e 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -248,20 +248,21 @@ static void cvtstr(double value, char *str, size_t size)
 
 
 
-static struct timeval tsub(struct timeval t1, struct timeval t2)
+static struct timespec tsub(struct timespec t1, struct timespec t2)
 {
-t1.tv_usec -= t2.tv_usec;
-if (t1.tv_usec < 0) {
-t1.tv_usec += 100;
+t1.tv_nsec -= t2.tv_nsec;
+if (t1.tv_nsec < 0) {
+t1.tv_nsec += 10;
 t1.tv_sec--;
 }
 t1.tv_sec -= t2.tv_sec;
 return t1;
 }
 
-static double tdiv(double value, struct timeval tv)
+static double tdiv(double value, struct timespec tv)
 {
-return value / ((double)tv.tv_sec + ((double)tv.tv_usec / 100.0));
+double time = (double)tv.tv_sec + ((double)tv.tv_nsec / 10.0);
+return value / time;
 }
 
 #define HOURS(sec)  ((sec) / (60 * 60))
@@ -274,16 +275,16 @@ enum {
 VERBOSE_FIXED_TIME  = 0x2,
 };
 
-static void timestr(struct timeval *tv, char *ts, size_t size, int format)
+static void timestr(struct timespec *tv, char *ts, size_t size, int format)
 {
-double usec = (double)tv->tv_usec / 100.0;
+double nsec = (double)tv->tv_nsec / 10.0;
 
 if (format & TERSE_FIXED_TIME) {
 if (!HOURS(tv->tv_sec)) {
 snprintf(ts, size, "%u:%02u.%02u",
 (unsigned int) MINUTES(tv->tv_sec),
 (unsigned int) SECONDS(tv->tv_sec),
-(unsigned int) (usec * 100));
+(unsigned int) (nsec * 10));
 return;
 }
 format |= VERBOSE_FIXED_TIME; /* fallback if hours needed */
@@ -294,9 +295,9 @@ static void timestr(struct timeval *tv, char *ts, size_t 
size, int format)
 (unsigned int) HOURS(tv->tv_sec),
 (unsigned int) MINUTES(tv->tv_sec),
 (unsigned int) SECONDS(tv->tv_sec),
-(unsigned int) (usec * 100));
+(unsigned int) (nsec * 10));
 } else {
-snprintf(ts, size, "0.%04u sec", (unsigned int) (usec * 1));
+snprintf(ts, size, "0.%04u sec", (unsigned int) (nsec * 1000));
 }
 }
 
@@ -376,7 +377,7 @@ static void dump_buffer(const void *buffer, int64_t offset, 
int64_t len)
 }
 }
 
-static void print_report(const char *op, struct timeval *t, int64_t offset,
+static void print_report(const char *op, struct timespec *t, int64_t offset,
  int64_t count, int64_t total, int cnt, bool Cflag)
 {
 char s1[64], s2[64], ts[64];
@@ -649,7 +650,7 @@ static const cmdinfo_t read_cmd = {
 
 static int read_f(BlockBackend *blk, int argc, char **argv)
 {
-struct timeval t1, t2;
+struct timespec t1, t2;
 bool Cflag = false, qflag = false, vflag = false;
 bool Pflag = false, sflag = false, lflag = false, bflag = false;
 int c, cnt, ret;
@@ -758,13 +759,13 @@ static int read_f(BlockBackend *blk, int argc, char 
**argv)
 
 buf = qemu_io_alloc(blk, count, 0xab);
 
-gettimeofday(&t1, NULL);
+clock_gettime(CLOCK_MONOTONIC, &t1);
 if (bflag) {
 ret = do_load_vmstate(blk, buf, offset, count, &total);
 } else {
 ret = do_pread(blk, buf, offset, count, &total);
 }
-gettimeofday(&t2, NULL);
+clock_gettime(CLOCK_MONOTONIC, &t2);
 
 if (ret < 0) {
 printf("read failed: %s\n", strerror(-ret));
@@ -836,7 +837,7 @@ static const cmdinfo_t readv_cmd = {
 
 static int readv_f(BlockBackend *blk, int argc, char **argv)
 {
-struct timeval t1, t2;
+struct timespec t1, t2;
 bool Cflag = false, qflag = false, vflag = false;
 int c, cnt, ret;
 char *buf;
@@ -891,9 +892,9 @@ static int readv_f(BlockBackend *blk, int argc, char **argv)
 return -EINVAL;
 }
 
-gettimeofday(&t1, NULL);
+clock_gettime(CLOCK_MONOTONIC,&t1);
 ret = do_aio_readv(blk, &qiov, offset, &total);
-gettimeofday(&t2, NULL);
+clock_gettime(CLOCK_MONOTONIC,&t2);
 
 if (ret < 0) {
 printf("readv failed: %s\n", strerror(-ret));
@@ -972,7 +973,7 @@ static const cmdinfo_t write_cmd = {
 
 static int write_f(BlockBackend *blk, int argc, char **argv)
 {
-struct timeval t1, t2;
+struct timespec t1, t2;
 bool 

Re: [Qemu-block] [PATCH 3/3] block/qcow2-bitmap: rewrite bitmap reopening logic

2019-05-29 Thread Vladimir Sementsov-Ogievskiy
29.05.2019 18:33, Max Reitz wrote:
> On 23.05.19 17:47, Vladimir Sementsov-Ogievskiy wrote:
>> Current logic
>> =
>>
>> Reopen rw -> ro:
>>
>> Store bitmaps and release BdrvDirtyBitmap's.
>>
>> Reopen ro -> rw:
>>
>> Load bitmap list
>> Skip bitmaps which for which we don't have BdrvDirtyBitmap [this is
>> the worst thing]
>> A kind of fail with error message if we see not readonly bitmap
>>
>> This leads to at least the following bug:
>>
>> Assume we have bitmap0.
>> Create external snapshot
>>bitmap0 is stored and therefore removed
>> Commit snapshot
>>now we have no bitmaps
>> Do some writes from guest (*)
>>they are not marked in bitmap
>> Shutdown
>> Start
>>bitmap0 is loaded as valid, but it is actually broken! It misses
>>writes (*)
>> Incremental backup
>>it will be inconsistent
>>
>> New logic
>> =
>>
>> Reopen rw -> ro:
>>
>> Store bitmaps and don't remove them from RAM, only mark them readonly
>> (the latter is already done, but didn't work properly because of
>> removing in storing function)
>>
>> Reopen to rw (don't filter case with reopen rw -> rw, it is supported
>> now in qcow2_reopen_bitmaps_rw)
>>
>> Load bitmap list
>>
>> Carefully consider all possible combinations of bitmaps in RAM and in
>> the image, try to fix corruptions, print corresponding error messages.
>>
>> Also, call qcow2_reopen_bitmaps_rw after the whole reopen queue
>> commited, as otherwise we can't write to the qcow2 file child on reopen
>> ro -> rw.
>>
>> Also, add corresponding test-cases to 255.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> ---
>>   block/qcow2.h  |   5 +-
>>   include/block/block_int.h  |   2 +-
>>   block.c|  29 ++
>>   block/qcow2-bitmap.c   | 197 ++---
>>   block/qcow2.c  |   2 +-
>>   tests/qemu-iotests/255 |   2 +
>>   tests/qemu-iotests/255.out |  35 +++
>>   7 files changed, 193 insertions(+), 79 deletions(-)
> 
> [...]
> 
>> diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
>> index 2b84bfa007..4e565db11f 100644
>> --- a/block/qcow2-bitmap.c
>> +++ b/block/qcow2-bitmap.c
> 
> [...]
> 
>> @@ -1103,76 +1110,150 @@ Qcow2BitmapInfoList 
>> *qcow2_get_bitmap_info_list(BlockDriverState *bs,
>>   return list;
>>   }
>>   
>> -int qcow2_reopen_bitmaps_rw_hint(BlockDriverState *bs, bool *header_updated,
>> - Error **errp)
>> +/*
>> + * qcow2_reopen_bitmaps_rw
>> + *
>> + * The function is called in bdrv_reopen_multiple after all calls to
>> + * bdrv_reopen_commit, when final bs is writable. It is done so, as we need
>> + * write access to child bs, and with current reopening architecture, when
>> + * reopen ro -> rw it is possible only after all reopen commits.
>> + *
>> + * So, we can't fail here.
> 
> Why?  Because the current design forbids it?
> 
> Then the current design seems flawed to me.
> 
> [CC-ing Berto]
> 
> Without having looked too close into this, it seems from your commit
> message that it is possible to run into unrecoverable fatal errors here.
>   We cannot just ignore that on the basis that the current design cannot
> deal with that.
> 
> It just appears wrong to me to update any flags in the image in the
> .commit() part of a transaction.  We should try to do so in .prepare().
>   If it works, great, we’re set for .commit().  If it doesn’t, welp, time
> for .abort() and pretend the image is still read-only (even though it
> kind of may be half-prepared for writing).
> 
> If we fail to set IN_USE in .commit(), that’s a fatal error in my opinion.
> 
> After some chatting with John, I’ve come to the following belief:
> 
> When switching a node from RO to R/W, it must be able to write to its
> children in .prepare() -- precisely because performing this switch may
> need some metadata change.  If we cannot do this change, then we cannot
> switch the node to R/W.  So it’s clear to me that this needs to be done
> in .prepare().
> 
> So I think a node’s children must be R/W before we can .prepare() the
> node.  That means we need to .commit() them.  That means we cannot have
> a single transaction that switches a whole tree to be R/W, but it must
> consist of one transaction per level.
> 
> If something fails, we need to roll back all the previous layers.  Well,
> it depends.
> 
> If we switch to R/W (and something fails), then we need to try to revert
> the children we have already made R/W to be RO.  If that doesn’t work,
> welp, too bad, but not fatal.

And, than, why not do full reopen rw in .prepare, and just organize
recursion so that children reopened rw before parent? Than we again have
one transaction for the tree, but abort may fail to rollback it in worst case.
But we cant avoid it anyway, with one transaction or with transactions per 
level..

So, just move all code from .commit to .prepare for reopen-rw and try to 
roll-back
it in .abort?

Or do you have an idea for a patch?

> 
> Switchi

[Qemu-block] [PATCH v8 7/7] block/backup: use backup-top instead of write notifiers

2019-05-29 Thread Vladimir Sementsov-Ogievskiy
Drop write notifiers and use filter node instead. Changes:

1. copy-before-writes now handled by filter node, so, drop all
   is_write_notifier arguments.

2. we don't have intersecting requests, so their handling is dropped.
Instead, synchronization works as follows:
when backup or backup-top starts copying of some area it firstly
clears copy-bitmap bits, and nobody touches areas, not marked with
dirty bits in copy-bitmap, so there is no intersection. Also, backup
job copy operations are surrounded by bdrv region lock, which is
actually serializing request, to not interfere with guest writes and
not read changed data from source (before reading we clear
corresponding bit in copy-bitmap, so, this area is not more handled by
backup-top).

3. To sync with in-flight requests at job finish we now have drained
removing of the filter, we don't need rw-lock.

== RFC part ==

iotests changed:
56: op-blocker doesn't shot now, as we set it on source, but then check
on filter, when trying to start second backup... Should I workaround it
somehow?

129: Hmm, now it is not busy at this moment.. But it's illegal to check
busy, as job has pause-points and set busy to false in these points.
Why we assert it in this test?

141: Obvious, as drv0 is not root node now, but backing of the filter,
when we try to remove it.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/backup.c | 171 ++---
 tests/qemu-iotests/056 |   2 +-
 tests/qemu-iotests/129 |   1 -
 tests/qemu-iotests/141.out |   2 +-
 4 files changed, 68 insertions(+), 108 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 00f4f8af53..a5b8e04c9c 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -2,6 +2,7 @@
  * QEMU backup
  *
  * Copyright (C) 2013 Proxmox Server Solutions
+ * Copyright (c) 2019 Virtuozzo International GmbH. All rights reserved.
  *
  * Authors:
  *  Dietmar Maurer (diet...@proxmox.com)
@@ -26,14 +27,9 @@
 #include "qemu/bitmap.h"
 #include "qemu/error-report.h"
 
-#define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
+#include "block/backup-top.h"
 
-typedef struct CowRequest {
-int64_t start_byte;
-int64_t end_byte;
-QLIST_ENTRY(CowRequest) list;
-CoQueue wait_queue; /* coroutines blocked on this request */
-} CowRequest;
+#define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
 
 typedef struct BackupBlockJob {
 BlockJob common;
@@ -43,13 +39,10 @@ typedef struct BackupBlockJob {
 MirrorSyncMode sync_mode;
 BlockdevOnError on_source_error;
 BlockdevOnError on_target_error;
-CoRwlock flush_rwlock;
 uint64_t len;
 uint64_t bytes_read;
 int64_t cluster_size;
 bool compress;
-NotifierWithReturn before_write;
-QLIST_HEAD(, CowRequest) inflight_reqs;
 
 HBitmap *copy_bitmap;
 bool use_copy_range;
@@ -60,56 +53,17 @@ typedef struct BackupBlockJob {
 
 static const BlockJobDriver backup_job_driver;
 
-/* See if in-flight requests overlap and wait for them to complete */
-static void coroutine_fn wait_for_overlapping_requests(BackupBlockJob *job,
-   int64_t start,
-   int64_t end)
-{
-CowRequest *req;
-bool retry;
-
-do {
-retry = false;
-QLIST_FOREACH(req, &job->inflight_reqs, list) {
-if (end > req->start_byte && start < req->end_byte) {
-qemu_co_queue_wait(&req->wait_queue, NULL);
-retry = true;
-break;
-}
-}
-} while (retry);
-}
-
-/* Keep track of an in-flight request */
-static void cow_request_begin(CowRequest *req, BackupBlockJob *job,
-  int64_t start, int64_t end)
-{
-req->start_byte = start;
-req->end_byte = end;
-qemu_co_queue_init(&req->wait_queue);
-QLIST_INSERT_HEAD(&job->inflight_reqs, req, list);
-}
-
-/* Forget about a completed request */
-static void cow_request_end(CowRequest *req)
-{
-QLIST_REMOVE(req, list);
-qemu_co_queue_restart_all(&req->wait_queue);
-}
-
 /* Copy range to target with a bounce buffer and return the bytes copied. If
  * error occurred, return a negative error number */
 static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
   int64_t start,
   int64_t end,
-  bool is_write_notifier,
   bool *error_is_read,
   void **bounce_buffer)
 {
 int ret;
 BlockBackend *blk = job->common.blk;
 int nbytes;
-int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
 int write_flags = job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0;
 
 assert(QEMU_IS_ALIGNED(start, job->cluster_size));
@@ -119,7 +73,7 @@ static int coroutine_fn 
b

[Qemu-block] [PATCH v8 6/7] block: add lock/unlock range functions

2019-05-29 Thread Vladimir Sementsov-Ogievskiy
From: Vladimir Sementsov-Ogievskiy 

Introduce lock/unlock range functionality, based on serialized
requests. This is needed to refactor backup, dropping local
tracked-request-like synchronization.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/block_int.h |  4 
 block/io.c| 44 ++-
 2 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 1eebc7c8f3..9d9d4346e9 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -66,6 +66,7 @@ enum BdrvTrackedRequestType {
 BDRV_TRACKED_WRITE,
 BDRV_TRACKED_DISCARD,
 BDRV_TRACKED_TRUNCATE,
+BDRV_TRACKED_LOCK,
 };
 
 typedef struct BdrvTrackedRequest {
@@ -928,6 +929,9 @@ int coroutine_fn bdrv_co_preadv(BdrvChild *child,
 int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
 int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
 BdrvRequestFlags flags);
+void *coroutine_fn bdrv_co_try_lock(BlockDriverState *bs,
+int64_t offset, unsigned int bytes);
+void coroutine_fn bdrv_co_unlock(void *opaque);
 
 static inline int coroutine_fn bdrv_co_pread(BdrvChild *child,
 int64_t offset, unsigned int bytes, void *buf, BdrvRequestFlags flags)
diff --git a/block/io.c b/block/io.c
index c347f90722..488e01e0a1 100644
--- a/block/io.c
+++ b/block/io.c
@@ -720,6 +720,15 @@ void bdrv_dec_in_flight(BlockDriverState *bs)
 bdrv_wakeup(bs);
 }
 
+static bool ignore_intersection(BdrvTrackedRequest *a, BdrvTrackedRequest *b)
+{
+return a == b || (!a->serialising && !b->serialising) ||
+(a->type == BDRV_TRACKED_LOCK && b->type == BDRV_TRACKED_READ &&
+ !b->serialising) ||
+(b->type == BDRV_TRACKED_LOCK && a->type == BDRV_TRACKED_READ &&
+ !a->serialising);
+}
+
 static bool coroutine_fn do_wait_serialising_requests(BdrvTrackedRequest *self,
   bool wait)
 {
@@ -736,7 +745,7 @@ static bool coroutine_fn 
do_wait_serialising_requests(BdrvTrackedRequest *self,
 retry = false;
 qemu_co_mutex_lock(&bs->reqs_lock);
 QLIST_FOREACH(req, &bs->tracked_requests, list) {
-if (req == self || (!req->serialising && !self->serialising)) {
+if (ignore_intersection(self, req)) {
 continue;
 }
 if (tracked_request_overlaps(req, self->overlap_offset,
@@ -774,6 +783,12 @@ static bool coroutine_fn 
wait_serialising_requests(BdrvTrackedRequest *self)
 return do_wait_serialising_requests(self, true);
 }
 
+static bool coroutine_fn should_wait_serialising_requests(
+BdrvTrackedRequest *self)
+{
+return do_wait_serialising_requests(self, false);
+}
+
 static int bdrv_check_byte_request(BlockDriverState *bs, int64_t offset,
size_t size)
 {
@@ -3184,3 +3199,30 @@ int bdrv_truncate(BdrvChild *child, int64_t offset, 
PreallocMode prealloc,
 
 return tco.ret;
 }
+
+void *coroutine_fn bdrv_co_try_lock(BlockDriverState *bs,
+int64_t offset, unsigned int bytes)
+{
+BdrvTrackedRequest *req = g_new(BdrvTrackedRequest, 1);
+
+tracked_request_begin(req, bs, offset, bytes, BDRV_TRACKED_LOCK);
+mark_request_serialising(req, bdrv_get_cluster_size(bs));
+
+if (should_wait_serialising_requests(req)) {
+tracked_request_end(req);
+g_free(req);
+return NULL;
+}
+
+return req;
+}
+
+void coroutine_fn bdrv_co_unlock(void *opaque)
+{
+BdrvTrackedRequest *req = opaque;
+
+assert(req->type == BDRV_TRACKED_LOCK);
+
+tracked_request_end(req);
+g_free(req);
+}
-- 
2.18.0




[Qemu-block] [PATCH v8 0/7] backup-top filter driver for backup

2019-05-29 Thread Vladimir Sementsov-Ogievskiy
Hi all!

These series introduce backup-top driver. It's a filter-node, which
do copy-before-write operation. Mirror uses filter-node for handling
guest writes, let's move to filter-node (from write-notifiers) for
backup too

v8:
01-03: new
05: add Max's r-b
others changed, change description in each patch mail in Notes section.

v6-v7 was preparing refactoring, which now is in Max's pull request, and
these series based on it:
Based-on: https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg06526.html

Vladimir Sementsov-Ogievskiy (7):
  block: teach bdrv_debug_breakpoint skip filters with backing
  block: swap operation order in bdrv_append
  block: allow not one child for implicit node
  block: introduce backup-top filter driver
  block/io: refactor wait_serialising_requests
  block: add lock/unlock range functions
  block/backup: use backup-top instead of write notifiers

 block/backup-top.h |  64 
 include/block/block_int.h  |   4 +
 block.c|  60 +--
 block/backup-top.c | 322 +
 block/backup.c | 171 
 block/io.c |  68 ++--
 block/Makefile.objs|   2 +
 tests/qemu-iotests/056 |   2 +-
 tests/qemu-iotests/085.out |   2 +-
 tests/qemu-iotests/129 |   1 -
 tests/qemu-iotests/141.out |   2 +-
 11 files changed, 567 insertions(+), 131 deletions(-)
 create mode 100644 block/backup-top.h
 create mode 100644 block/backup-top.c

-- 
2.18.0




[Qemu-block] [PATCH v8 3/7] block: allow not one child for implicit node

2019-05-29 Thread Vladimir Sementsov-Ogievskiy
Upcoming backup-top filter wants to operate like usual implicit filter
node with fall-through to backing child. But also needs additional
target child, let's support that.

On the other hand, after backup completion (before job dismiss) filter
is still attached to job blk, but don't have any children. Support this
too.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 57216f4115..3f4de3ae32 100644
--- a/block.c
+++ b/block.c
@@ -6200,9 +6200,20 @@ void bdrv_refresh_filename(BlockDriverState *bs)
 }
 
 if (bs->implicit) {
-/* For implicit nodes, just copy everything from the single child */
+/*
+ * For implicit nodes, just copy everything from the single child or
+ * from backing, if there are several children.
+ * If there are no children for some reason (filter is still attached
+ * to block-job blk, but already removed from backing chain of device)
+ * do nothing.
+ */
 child = QLIST_FIRST(&bs->children);
-assert(QLIST_NEXT(child, next) == NULL);
+if (!child) {
+return;
+} else if (QLIST_NEXT(child, next)) {
+assert(bs->backing);
+child = bs->backing;
+}
 
 pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
 child->bs->exact_filename);
-- 
2.18.0




[Qemu-block] [PATCH v8 4/7] block: introduce backup-top filter driver

2019-05-29 Thread Vladimir Sementsov-Ogievskiy
Backup-top filter does copy-before-write operation. It should be
inserted above active disk and has a target node for CBW, like the
following:

+---+
| Guest |
+---+
|r,w
v
++  target   +---+
| backup_top |-->| target(qcow2) |
++   CBW +---+
|
backing |r,w
v
+-+
| Active disk |
+-+

The driver will be used in backup instead of write-notifiers.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/backup-top.h  |  64 +
 block/backup-top.c  | 322 
 block/Makefile.objs |   2 +
 3 files changed, 388 insertions(+)
 create mode 100644 block/backup-top.h
 create mode 100644 block/backup-top.c

diff --git a/block/backup-top.h b/block/backup-top.h
new file mode 100644
index 00..788e18c358
--- /dev/null
+++ b/block/backup-top.h
@@ -0,0 +1,64 @@
+/*
+ * backup-top filter driver
+ *
+ * The driver performs Copy-Before-Write (CBW) operation: it is injected above
+ * some node, and before each write it copies _old_ data to the target node.
+ *
+ * Copyright (c) 2018 Virtuozzo International GmbH. All rights reserved.
+ *
+ * Author:
+ *  Sementsov-Ogievskiy Vladimir 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see .
+ */
+
+#ifndef BACKUP_TOP_H
+#define BACKUP_TOP_H
+
+#include "qemu/osdep.h"
+
+#include "block/block_int.h"
+
+typedef void (*BackupTopProgressCallback)(uint64_t done, void *opaque);
+typedef struct BDRVBackupTopState {
+HBitmap *copy_bitmap; /* what should be copied to @target on guest write. 
*/
+BdrvChild *target;
+
+BackupTopProgressCallback progress_cb;
+void *progress_opaque;
+} BDRVBackupTopState;
+
+/*
+ * bdrv_backup_top_append
+ *
+ * Append backup_top filter node above @source node. @target node will receive
+ * the data backed up during CBE operations. New filter together with @target
+ * node are attached to @source aio context.
+ *
+ * The resulting filter node is implicit.
+ *
+ * @copy_bitmap selects regions which needs CBW. Furthermore, backup_top will
+ * use exactly this bitmap, so it may be used to control backup_top behavior
+ * dynamically. Caller should not release @copy_bitmap during life-time of
+ * backup_top. Progress is tracked by calling @progress_cb function.
+ */
+BlockDriverState *bdrv_backup_top_append(
+BlockDriverState *source, BlockDriverState *target,
+HBitmap *copy_bitmap, Error **errp);
+void bdrv_backup_top_set_progress_callback(
+BlockDriverState *bs, BackupTopProgressCallback progress_cb,
+void *progress_opaque);
+void bdrv_backup_top_drop(BlockDriverState *bs);
+
+#endif /* BACKUP_TOP_H */
diff --git a/block/backup-top.c b/block/backup-top.c
new file mode 100644
index 00..1daa02f539
--- /dev/null
+++ b/block/backup-top.c
@@ -0,0 +1,322 @@
+/*
+ * backup-top filter driver
+ *
+ * The driver performs Copy-Before-Write (CBW) operation: it is injected above
+ * some node, and before each write it copies _old_ data to the target node.
+ *
+ * Copyright (c) 2018 Virtuozzo International GmbH. All rights reserved.
+ *
+ * Author:
+ *  Sementsov-Ogievskiy Vladimir 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see .
+ */
+
+#include "qemu/osdep.h"
+
+#include "qemu/cutils.h"
+#include "qapi/error.h"
+#include "block/block_int.h"
+#include "block/qdict.h"
+
+#include "block/backup-top.h"
+
+static coroutine_fn int backup_top_co_preadv(
+BlockDriverState *bs, uint64_t offset, uint64_t bytes,
+QEMUIOVector *qiov, int flags)
+{
+/*
+ * Features to be implemented:
+ * F1. COR. save read data to fleecing target for fast access
+ * (to reduce reads). This possibly may be done

[Qemu-block] [PATCH v8 1/7] block: teach bdrv_debug_breakpoint skip filters with backing

2019-05-29 Thread Vladimir Sementsov-Ogievskiy
Teach bdrv_debug_breakpoint and bdrv_debug_remove_breakpoint skip
filters with backing. This is needed to implement and use in backup job
it's own backup_top filter driver (like mirror already has one), and
without this improvement, breakpoint removal will fail at least in 55
iotest.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block.c | 34 ++
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/block.c b/block.c
index 1a73e310c1..e6e9770704 100644
--- a/block.c
+++ b/block.c
@@ -4982,14 +4982,35 @@ void bdrv_debug_event(BlockDriverState *bs, 
BlkdebugEvent event)
 bs->drv->bdrv_debug_event(bs, event);
 }
 
-int bdrv_debug_breakpoint(BlockDriverState *bs, const char *event,
-  const char *tag)
+static BlockDriverState *bdrv_find_debug_node(BlockDriverState *bs)
 {
 while (bs && bs->drv && !bs->drv->bdrv_debug_breakpoint) {
-bs = bs->file ? bs->file->bs : NULL;
+if (bs->file) {
+bs = bs->file->bs;
+continue;
+}
+
+if (bs->drv->is_filter && bs->backing) {
+bs = bs->backing->bs;
+continue;
+}
+
+break;
 }
 
 if (bs && bs->drv && bs->drv->bdrv_debug_breakpoint) {
+assert(bs->drv->bdrv_debug_remove_breakpoint);
+return bs;
+}
+
+return NULL;
+}
+
+int bdrv_debug_breakpoint(BlockDriverState *bs, const char *event,
+  const char *tag)
+{
+bs = bdrv_find_debug_node(bs);
+if (bs) {
 return bs->drv->bdrv_debug_breakpoint(bs, event, tag);
 }
 
@@ -4998,11 +5019,8 @@ int bdrv_debug_breakpoint(BlockDriverState *bs, const 
char *event,
 
 int bdrv_debug_remove_breakpoint(BlockDriverState *bs, const char *tag)
 {
-while (bs && bs->drv && !bs->drv->bdrv_debug_remove_breakpoint) {
-bs = bs->file ? bs->file->bs : NULL;
-}
-
-if (bs && bs->drv && bs->drv->bdrv_debug_remove_breakpoint) {
+bs = bdrv_find_debug_node(bs);
+if (bs) {
 return bs->drv->bdrv_debug_remove_breakpoint(bs, tag);
 }
 
-- 
2.18.0




[Qemu-block] [PATCH v8 2/7] block: swap operation order in bdrv_append

2019-05-29 Thread Vladimir Sementsov-Ogievskiy
bs_top parents may conflict with bs_new backing child permissions, so
let's do bdrv_replace_node first, it covers more possible cases.

It is needed for further implementation of backup-top filter, which
don't want to share write permission on its backing child.

Side effect is that we may set backing hd when device name is already
available, so 085 iotest output is changed.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block.c| 11 ---
 tests/qemu-iotests/085.out |  2 +-
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index e6e9770704..57216f4115 100644
--- a/block.c
+++ b/block.c
@@ -4088,22 +4088,27 @@ void bdrv_append(BlockDriverState *bs_new, 
BlockDriverState *bs_top,
 {
 Error *local_err = NULL;
 
-bdrv_set_backing_hd(bs_new, bs_top, &local_err);
+bdrv_ref(bs_top);
+
+bdrv_replace_node(bs_top, bs_new, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
+error_prepend(errp, "Failed to replace node: ");
 goto out;
 }
 
-bdrv_replace_node(bs_top, bs_new, &local_err);
+bdrv_set_backing_hd(bs_new, bs_top, &local_err);
 if (local_err) {
+bdrv_replace_node(bs_new, bs_top, &error_abort);
 error_propagate(errp, local_err);
-bdrv_set_backing_hd(bs_new, NULL, &error_abort);
+error_prepend(errp, "Failed to set backing: ");
 goto out;
 }
 
 /* bs_new is now referenced by its new parents, we don't need the
  * additional reference any more. */
 out:
+bdrv_unref(bs_top);
 bdrv_unref(bs_new);
 }
 
diff --git a/tests/qemu-iotests/085.out b/tests/qemu-iotests/085.out
index 6edf107f55..e5a2645bf5 100644
--- a/tests/qemu-iotests/085.out
+++ b/tests/qemu-iotests/085.out
@@ -74,7 +74,7 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 
backing_file=TEST_DIR/
 
 === Invalid command - snapshot node used as backing hd ===
 
-{"error": {"class": "GenericError", "desc": "Node 'snap_11' is busy: node is 
used as backing hd of 'snap_12'"}}
+{"error": {"class": "GenericError", "desc": "Node 'snap_11' is busy: node is 
used as backing hd of 'virtio0'"}}
 
 === Invalid command - snapshot node has a backing image ===
 
-- 
2.18.0




[Qemu-block] [PATCH v8 5/7] block/io: refactor wait_serialising_requests

2019-05-29 Thread Vladimir Sementsov-Ogievskiy
Split out do_wait_serialising_requests with additional possibility to
not actually wait but just check, that there is something to wait for.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 block/io.c | 24 
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/block/io.c b/block/io.c
index 3134a60a48..c347f90722 100644
--- a/block/io.c
+++ b/block/io.c
@@ -720,12 +720,13 @@ void bdrv_dec_in_flight(BlockDriverState *bs)
 bdrv_wakeup(bs);
 }
 
-static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
+static bool coroutine_fn do_wait_serialising_requests(BdrvTrackedRequest *self,
+  bool wait)
 {
 BlockDriverState *bs = self->bs;
 BdrvTrackedRequest *req;
 bool retry;
-bool waited = false;
+bool found = false;
 
 if (!atomic_read(&bs->serialising_in_flight)) {
 return false;
@@ -751,11 +752,13 @@ static bool coroutine_fn 
wait_serialising_requests(BdrvTrackedRequest *self)
  * will wait for us as soon as it wakes up, then just go on
  * (instead of producing a deadlock in the former case). */
 if (!req->waiting_for) {
-self->waiting_for = req;
-qemu_co_queue_wait(&req->wait_queue, &bs->reqs_lock);
-self->waiting_for = NULL;
-retry = true;
-waited = true;
+found = true;
+if (wait) {
+self->waiting_for = req;
+qemu_co_queue_wait(&req->wait_queue, &bs->reqs_lock);
+self->waiting_for = NULL;
+retry = true;
+}
 break;
 }
 }
@@ -763,7 +766,12 @@ static bool coroutine_fn 
wait_serialising_requests(BdrvTrackedRequest *self)
 qemu_co_mutex_unlock(&bs->reqs_lock);
 } while (retry);
 
-return waited;
+return found;
+}
+
+static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
+{
+return do_wait_serialising_requests(self, true);
 }
 
 static int bdrv_check_byte_request(BlockDriverState *bs, int64_t offset,
-- 
2.18.0




Re: [Qemu-block] [PATCH 3/3] block/qcow2-bitmap: rewrite bitmap reopening logic

2019-05-29 Thread Max Reitz
On 23.05.19 17:47, Vladimir Sementsov-Ogievskiy wrote:
> Current logic
> =
> 
> Reopen rw -> ro:
> 
> Store bitmaps and release BdrvDirtyBitmap's.
> 
> Reopen ro -> rw:
> 
> Load bitmap list
> Skip bitmaps which for which we don't have BdrvDirtyBitmap [this is
>the worst thing]
> A kind of fail with error message if we see not readonly bitmap
> 
> This leads to at least the following bug:
> 
> Assume we have bitmap0.
> Create external snapshot
>   bitmap0 is stored and therefore removed
> Commit snapshot
>   now we have no bitmaps
> Do some writes from guest (*)
>   they are not marked in bitmap
> Shutdown
> Start
>   bitmap0 is loaded as valid, but it is actually broken! It misses
>   writes (*)
> Incremental backup
>   it will be inconsistent
> 
> New logic
> =
> 
> Reopen rw -> ro:
> 
> Store bitmaps and don't remove them from RAM, only mark them readonly
> (the latter is already done, but didn't work properly because of
> removing in storing function)
> 
> Reopen to rw (don't filter case with reopen rw -> rw, it is supported
> now in qcow2_reopen_bitmaps_rw)
> 
> Load bitmap list
> 
> Carefully consider all possible combinations of bitmaps in RAM and in
> the image, try to fix corruptions, print corresponding error messages.
> 
> Also, call qcow2_reopen_bitmaps_rw after the whole reopen queue
> commited, as otherwise we can't write to the qcow2 file child on reopen
> ro -> rw.
> 
> Also, add corresponding test-cases to 255.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/qcow2.h  |   5 +-
>  include/block/block_int.h  |   2 +-
>  block.c|  29 ++
>  block/qcow2-bitmap.c   | 197 ++---
>  block/qcow2.c  |   2 +-
>  tests/qemu-iotests/255 |   2 +
>  tests/qemu-iotests/255.out |  35 +++
>  7 files changed, 193 insertions(+), 79 deletions(-)

[...]

> diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
> index 2b84bfa007..4e565db11f 100644
> --- a/block/qcow2-bitmap.c
> +++ b/block/qcow2-bitmap.c

[...]

> @@ -1103,76 +1110,150 @@ Qcow2BitmapInfoList 
> *qcow2_get_bitmap_info_list(BlockDriverState *bs,
>  return list;
>  }
>  
> -int qcow2_reopen_bitmaps_rw_hint(BlockDriverState *bs, bool *header_updated,
> - Error **errp)
> +/*
> + * qcow2_reopen_bitmaps_rw
> + *
> + * The function is called in bdrv_reopen_multiple after all calls to
> + * bdrv_reopen_commit, when final bs is writable. It is done so, as we need
> + * write access to child bs, and with current reopening architecture, when
> + * reopen ro -> rw it is possible only after all reopen commits.
> + *
> + * So, we can't fail here.

Why?  Because the current design forbids it?

Then the current design seems flawed to me.

[CC-ing Berto]

Without having looked too close into this, it seems from your commit
message that it is possible to run into unrecoverable fatal errors here.
 We cannot just ignore that on the basis that the current design cannot
deal with that.

It just appears wrong to me to update any flags in the image in the
.commit() part of a transaction.  We should try to do so in .prepare().
 If it works, great, we’re set for .commit().  If it doesn’t, welp, time
for .abort() and pretend the image is still read-only (even though it
kind of may be half-prepared for writing).

If we fail to set IN_USE in .commit(), that’s a fatal error in my opinion.

After some chatting with John, I’ve come to the following belief:

When switching a node from RO to R/W, it must be able to write to its
children in .prepare() -- precisely because performing this switch may
need some metadata change.  If we cannot do this change, then we cannot
switch the node to R/W.  So it’s clear to me that this needs to be done
in .prepare().

So I think a node’s children must be R/W before we can .prepare() the
node.  That means we need to .commit() them.  That means we cannot have
a single transaction that switches a whole tree to be R/W, but it must
consist of one transaction per level.

If something fails, we need to roll back all the previous layers.  Well,
it depends.

If we switch to R/W (and something fails), then we need to try to revert
the children we have already made R/W to be RO.  If that doesn’t work,
welp, too bad, but not fatal.

Switching to RO goes the other way around (parents to children), so if
we encounter an error there, we cannot make that node’s children RO,
too.  We could try to revert the whole change and make the whole tree
R/W again, or we just don’t.  I think “just don’t” is reasonable.

Max

> On the other hand, we may face different kinds of
> + * corruptions and/or just can't write IN_USE flags to the image due to EIO.
> + *
> + * Try to handle as many cases as we can.
> + */
> +void qcow2_reopen_bitmaps_rw(BlockDriverState *bs)



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH v2 0/3] Update documentation and help related to the preallocation parameter

2019-05-29 Thread Markus Armbruster
Stefano Garzarella  writes:

> Following Markus' advice, I updated the documentation of preallocation
> parameter in qapi/block-core.json adding default and allowed values (patch 1).
> I also updated the help related to BLOCK_OPT_PREALLOC in the QemuOptsList of
> file-posix (patch 2) and gluster (patch 3).

Queued.  Thanks!



Re: [Qemu-block] [PATCH] iotests: Fix intermittent failure in 219

2019-05-29 Thread Max Reitz
On 16.05.19 18:11, Max Reitz wrote:
> In 219, we wait for the job to make progress before we emit its status.
> This makes the output reliable.
> 
> Unfortunately, there is a bug: We do not wait for any more progress if
> the job has already reached its total-progress.  Right after the job has
> been started, it is possible that total-progress is still 0, though.  In
> that case, we may skip the first progress-making step and keep ending up
> 64 kB short.
> 
> To fix that bug, we cab simply wait for total-progress to reach 4 MB
> (the image size) after starting the job.
> 
> Reported-by: Karen Mezick 
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1686651
> Signed-off-by: Max Reitz 
> ---
>  tests/qemu-iotests/219 | 13 ++---
>  1 file changed, 10 insertions(+), 3 deletions(-)

Adjusted the commit message to what I understood John to mean, and
applied to my block(-on-kevin) branch.

(Thanks for the review!)

Max



signature.asc
Description: OpenPGP digital signature


[Qemu-block] [PATCH v3] qemu-io: add pattern file for write command

2019-05-29 Thread Denis Plotnikov
The patch allows to provide a pattern file for write
command. There was no similar ability before.
---
 qemu-io-cmds.c | 80 ++
 1 file changed, 74 insertions(+), 6 deletions(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 09750a23ce..b75ffaa739 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -21,6 +21,7 @@
 #include "qemu/option.h"
 #include "qemu/timer.h"
 #include "qemu/cutils.h"
+#include "string.h"
 
 #define CMD_NOFILE_OK   0x01
 
@@ -343,6 +344,61 @@ static void *qemu_io_alloc(BlockBackend *blk, size_t len, 
int pattern)
 return buf;
 }
 
+static void *qemu_io_alloc_from_file(BlockBackend *blk, size_t len,
+ char *file_name)
+{
+char *buf, *buf_pos;
+FILE *f = fopen(file_name, "r");
+int l;
+
+if (!f) {
+printf("'%s': %s\n", file_name, strerror(errno));
+return NULL;
+}
+
+if (qemuio_misalign) {
+len += MISALIGN_OFFSET;
+}
+buf = blk_blockalign(blk, len);
+memset(buf, 0, len);
+
+buf_pos = buf;
+
+while (len > 0) {
+l = fread(buf_pos, sizeof(char), len, f);
+
+if (feof(f)) {
+   rewind(f);
+}
+
+if (ferror(f)) {
+printf("'%s': %s\n", file_name, strerror(errno));
+goto error;
+}
+
+if (l == 0) {
+printf("'%s' is empty\n", file_name);
+goto error;
+}
+
+len -= l;
+buf_pos += l;
+}
+
+if (qemuio_misalign) {
+buf += MISALIGN_OFFSET;
+}
+
+goto out;
+
+error:
+qemu_vfree(buf);
+buf = NULL;
+out:
+fclose(f);
+return buf;
+}
+
 static void qemu_io_free(void *p)
 {
 if (qemuio_misalign) {
@@ -965,7 +1021,7 @@ static const cmdinfo_t write_cmd = {
 .perm   = BLK_PERM_WRITE,
 .argmin = 2,
 .argmax = -1,
-.args   = "[-bcCfnquz] [-P pattern] off len",
+.args   = "[-bcCfnquz] [-P pattern | -s source_file] off len",
 .oneline= "writes a number of bytes at a specified offset",
 .help   = write_help,
 };
@@ -974,7 +1030,7 @@ static int write_f(BlockBackend *blk, int argc, char 
**argv)
 {
 struct timeval t1, t2;
 bool Cflag = false, qflag = false, bflag = false;
-bool Pflag = false, zflag = false, cflag = false;
+bool Pflag = false, zflag = false, cflag = false, sflag = false;
 int flags = 0;
 int c, cnt, ret;
 char *buf = NULL;
@@ -983,8 +1039,9 @@ static int write_f(BlockBackend *blk, int argc, char 
**argv)
 /* Some compilers get confused and warn if this is not initialized.  */
 int64_t total = 0;
 int pattern = 0xcd;
+char *file_name;
 
-while ((c = getopt(argc, argv, "bcCfnpP:quz")) != -1) {
+while ((c = getopt(argc, argv, "bcCfnpP:quzs:")) != -1) {
 switch (c) {
 case 'b':
 bflag = true;
@@ -1020,6 +1077,10 @@ static int write_f(BlockBackend *blk, int argc, char 
**argv)
 case 'z':
 zflag = true;
 break;
+case 's':
+sflag = true;
+file_name = g_strdup(optarg);
+break;
 default:
 qemuio_command_usage(&write_cmd);
 return -EINVAL;
@@ -1051,8 +1112,8 @@ static int write_f(BlockBackend *blk, int argc, char 
**argv)
 return -EINVAL;
 }
 
-if (zflag && Pflag) {
-printf("-z and -P cannot be specified at the same time\n");
+if ((int)zflag + (int)Pflag + (int)sflag > 1) {
+printf("Only one of -z, -P, and -s can be specified at the same 
time\n");
 return -EINVAL;
 }
 
@@ -1088,7 +1149,14 @@ static int write_f(BlockBackend *blk, int argc, char 
**argv)
 }
 
 if (!zflag) {
-buf = qemu_io_alloc(blk, count, pattern);
+if (sflag) {
+buf = qemu_io_alloc_from_file(blk, count, file_name);
+if (!buf) {
+return -EINVAL;
+}
+} else {
+buf = qemu_io_alloc(blk, count, pattern);
+}
 }
 
 gettimeofday(&t1, NULL);
-- 
2.17.0




Re: [Qemu-block] [Qemu-devel] [PATCH] hw/block/fdc: floppy command FIFO memory initialization

2019-05-29 Thread John Snow



On 5/29/19 9:56 AM, Andrey Shinkevich wrote:
> 
> 
> On 29/05/2019 16:40, John Snow wrote:
>>
>>
>> On 5/29/19 8:22 AM, Andrey Shinkevich wrote:
>>> The uninitialized memory allocated for the command FIFO of the
>>> floppy controller during the VM hardware initialization incurs
>>> many unwanted reports by Valgrind when VM state is being saved.
>>> That verbosity hardens a search for the real memory issues when
>>> the iotests run. Particularly, the patch eliminates 20 unnecessary
>>> reports of the Valgrind tool in the iotest #169.
>>>
>>> Signed-off-by: Andrey Shinkevich 
>>> ---
>>>   hw/block/fdc.c | 4 
>>>   1 file changed, 4 insertions(+)
>>>
>>> diff --git a/hw/block/fdc.c b/hw/block/fdc.c
>>> index 6f19f12..54e470c 100644
>>> --- a/hw/block/fdc.c
>>> +++ b/hw/block/fdc.c
>>> @@ -2647,6 +2647,10 @@ static void fdctrl_realize_common(DeviceState *dev, 
>>> FDCtrl *fdctrl,
>>>   
>>>   FLOPPY_DPRINTF("init controller\n");
>>>   fdctrl->fifo = qemu_memalign(512, FD_SECTOR_LEN);
>>> +if (fdctrl->fifo) {
>>> +/* To avoid using the uninitialized memory while saving VM state */
>>> +memset(fdctrl->fifo, 0, FD_SECTOR_LEN);
>>> +}
>>
>> qemu_memalign doesn't look like it can fail (looking at
>> util/oslib-posix); is this conditional necessary?
>>
>> I think you could just:
>>
>> fdctrl->fifo = qemu_memalign(512, FD_SECTOR_LEN);
>> memset(fdctrl->fifo, 0, FD_SECTOR_LEN);
>>
>>>   fdctrl->fifo_size = 512;
>>>   fdctrl->result_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
>>>fdctrl_result_timer, fdctrl);
>>>
> 
> Yes, that's right.
> Thank you, John.
> 
> Andrey
> 

Thanks for valgrinding QEMU :)

--js



Re: [Qemu-block] [Qemu-devel] [PULL 0/3] Bitmaps patches

2019-05-29 Thread John Snow



On 5/28/19 8:54 PM, no-re...@patchew.org wrote:
> Patchew URL: https://patchew.org/QEMU/20190528235842.29453-1-js...@redhat.com/
> 
> 
> 
> Hi,
> 
> This series failed the asan build test. Please find the testing commands and
> their output below. If you have Docker installed, you can probably reproduce 
> it
> locally.
> 

Hm, I don't understand this output too well; is this truly related to my
patches? It looks like some noise around aio_notify, if this is the
actual error text that caused the failure:

PASS 19 test-bdrv-drain /bdrv-drain/iothread/drain
=
==10624==ERROR: AddressSanitizer: heap-use-after-free on address
0x61230e70 at pc 0x558de21c3386 bp 0x7fcb8aeb8000 sp 0x7fcb8aeb7ff8
WRITE of size 1 at 0x61230e70 thread T15
PASS 1 ahci-test /x86_64/ahci/sanity
==10643==WARNING: ASan doesn't fully support makecontext/swapcontext
functions and may produce false positives in some cases!
#0 0x558de21c3385 in aio_notify /tmp/qemu-test/src/util/async.c:352:9
#1 0x558de21c4ffb in qemu_bh_schedule
/tmp/qemu-test/src/util/async.c:168:9
#2 0x558de21c8202 in aio_co_schedule
/tmp/qemu-test/src/util/async.c:465:5
#3 0x558de21c84d9 in aio_co_enter /tmp/qemu-test/src/util/async.c:484:9
#4 0x558de21c845d in aio_co_wake /tmp/qemu-test/src/util/async.c:478:5
#5 0x558de1abada4 in co_reenter_bh
/tmp/qemu-test/src/tests/test-bdrv-drain.c:63:5
#6 0x558de21c3c6a in aio_bh_call /tmp/qemu-test/src/util/async.c:90:5
#7 0x558de21c4382 in aio_bh_poll /tmp/qemu-test/src/util/async.c:118:13
#8 0x558de21e9ee3 in aio_poll /tmp/qemu-test/src/util/aio-posix.c:729:17
#9 0x558de208d92a in iothread_run
/tmp/qemu-test/src/tests/iothread.c:51:9
#10 0x558de21fdb12 in qemu_thread_start
/tmp/qemu-test/src/util/qemu-thread-posix.c:502:9
#11 0x7fcb9d63c58d in start_thread (/lib64/libpthread.so.0+0x858d)
#12 0x7fcb9d54a682 in __GI___clone (/lib64/libc.so.6+0xfd682)

0x61230e70 is located 176 bytes inside of 312-byte region
[0x61230dc0,0x61230ef8)
freed by thread T0 here:
#0 0x558de1a6c3a0 in free
(/tmp/qemu-test/build/tests/test-bdrv-drain+0x5193a0)
#1 0x7fcb9da4fed1 in g_free (/lib64/libglib-2.0.so.0+0x54ed1)

previously allocated by thread T14 here:
#0 0x558de1a6c9cf in calloc
(/tmp/qemu-test/build/tests/test-bdrv-drain+0x5199cf)
#1 0x7fcb9da4fe1d in g_malloc0 (/lib64/libglib-2.0.so.0+0x54e1d)

Thread T15 created by T0 here:
#0 0x558de19c3074 in __interceptor_pthread_create
(/tmp/qemu-test/build/tests/test-bdrv-drain+0x470074)
#1 0x558de21fd40f in qemu_thread_create
/tmp/qemu-test/src/util/qemu-thread-posix.c:539:11
#2 0x558de208cfc7 in iothread_new
/tmp/qemu-test/src/tests/iothread.c:75:5
#3 0x558de1abbdd4 in test_iothread_common
/tmp/qemu-test/src/tests/test-bdrv-drain.c:664:19
#4 0x558de1ab6c9e in test_iothread_drain_subtree
/tmp/qemu-test/src/tests/test-bdrv-drain.c:771:5
#5 0x7fcb9da71fc9  (/lib64/libglib-2.0.so.0+0x76fc9)

Thread T14 created by T0 here:
#0 0x558de19c3074 in __interceptor_pthread_create
(/tmp/qemu-test/build/tests/test-bdrv-drain+0x470074)
#1 0x558de21fd40f in qemu_thread_create
/tmp/qemu-test/src/util/qemu-thread-posix.c:539:11
#2 0x558de208cfc7 in iothread_new
/tmp/qemu-test/src/tests/iothread.c:75:5
#3 0x558de1abbdc8 in test_iothread_common
/tmp/qemu-test/src/tests/test-bdrv-drain.c:663:19
#4 0x558de1ab6c9e in test_iothread_drain_subtree
/tmp/qemu-test/src/tests/test-bdrv-drain.c:771:5
#5 0x7fcb9da71fc9  (/lib64/libglib-2.0.so.0+0x76fc9)

SUMMARY: AddressSanitizer: heap-use-after-free
/tmp/qemu-test/src/util/async.c:352:9 in aio_notify
Shadow bytes around the buggy address:
  0x0c247fffe170: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c247fffe180: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c247fffe190: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c247fffe1a0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
  0x0c247fffe1b0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
=>0x0c247fffe1c0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd[fd]fd
  0x0c247fffe1d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
  0x0c247fffe1e0: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
  0x0c247fffe1f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c247fffe200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa fa
  0x0c247fffe210: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:   00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:   fa
  Freed heap region:   fd
  Stack left redzone:  f1
  Stack mid redzone:   f2
  Stack right redzone: f3
  Stack after return:  f5
  Stack use after scope:   f8
  Global redzone:  f9
  Global init order:   f6
  Poisoned by user:f7
  Container overflow:  fc
  Array cookie:ac
  Intra object redzone:bb
  ASan inter

Re: [Qemu-block] [Qemu-devel] [PATCH] hw/block/fdc: floppy command FIFO memory initialization

2019-05-29 Thread Andrey Shinkevich


On 29/05/2019 16:40, John Snow wrote:
> 
> 
> On 5/29/19 8:22 AM, Andrey Shinkevich wrote:
>> The uninitialized memory allocated for the command FIFO of the
>> floppy controller during the VM hardware initialization incurs
>> many unwanted reports by Valgrind when VM state is being saved.
>> That verbosity hardens a search for the real memory issues when
>> the iotests run. Particularly, the patch eliminates 20 unnecessary
>> reports of the Valgrind tool in the iotest #169.
>>
>> Signed-off-by: Andrey Shinkevich 
>> ---
>>   hw/block/fdc.c | 4 
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/hw/block/fdc.c b/hw/block/fdc.c
>> index 6f19f12..54e470c 100644
>> --- a/hw/block/fdc.c
>> +++ b/hw/block/fdc.c
>> @@ -2647,6 +2647,10 @@ static void fdctrl_realize_common(DeviceState *dev, 
>> FDCtrl *fdctrl,
>>   
>>   FLOPPY_DPRINTF("init controller\n");
>>   fdctrl->fifo = qemu_memalign(512, FD_SECTOR_LEN);
>> +if (fdctrl->fifo) {
>> +/* To avoid using the uninitialized memory while saving VM state */
>> +memset(fdctrl->fifo, 0, FD_SECTOR_LEN);
>> +}
> 
> qemu_memalign doesn't look like it can fail (looking at
> util/oslib-posix); is this conditional necessary?
> 
> I think you could just:
> 
> fdctrl->fifo = qemu_memalign(512, FD_SECTOR_LEN);
> memset(fdctrl->fifo, 0, FD_SECTOR_LEN);
> 
>>   fdctrl->fifo_size = 512;
>>   fdctrl->result_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
>>fdctrl_result_timer, fdctrl);
>>

Yes, that's right.
Thank you, John.

Andrey



Re: [Qemu-block] [Qemu-devel] [PATCH] hw/block/fdc: floppy command FIFO memory initialization

2019-05-29 Thread John Snow



On 5/29/19 8:22 AM, Andrey Shinkevich wrote:
> The uninitialized memory allocated for the command FIFO of the
> floppy controller during the VM hardware initialization incurs
> many unwanted reports by Valgrind when VM state is being saved.
> That verbosity hardens a search for the real memory issues when
> the iotests run. Particularly, the patch eliminates 20 unnecessary
> reports of the Valgrind tool in the iotest #169.
> 
> Signed-off-by: Andrey Shinkevich 
> ---
>  hw/block/fdc.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/hw/block/fdc.c b/hw/block/fdc.c
> index 6f19f12..54e470c 100644
> --- a/hw/block/fdc.c
> +++ b/hw/block/fdc.c
> @@ -2647,6 +2647,10 @@ static void fdctrl_realize_common(DeviceState *dev, 
> FDCtrl *fdctrl,
>  
>  FLOPPY_DPRINTF("init controller\n");
>  fdctrl->fifo = qemu_memalign(512, FD_SECTOR_LEN);
> +if (fdctrl->fifo) {
> +/* To avoid using the uninitialized memory while saving VM state */
> +memset(fdctrl->fifo, 0, FD_SECTOR_LEN);
> +}

qemu_memalign doesn't look like it can fail (looking at
util/oslib-posix); is this conditional necessary?

I think you could just:

fdctrl->fifo = qemu_memalign(512, FD_SECTOR_LEN);
memset(fdctrl->fifo, 0, FD_SECTOR_LEN);

>  fdctrl->fifo_size = 512;
>  fdctrl->result_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
>   fdctrl_result_timer, fdctrl);
> 



Re: [Qemu-block] [PATCH v3] iotests: Filter 175's allocation information

2019-05-29 Thread Max Reitz
On 16.05.19 16:43, Max Reitz wrote:
> It is possible for an empty file to take up blocks on a filesystem, for
> example:
> 
> $ qemu-img create -f raw test.img 1G
> Formatting 'test.img', fmt=raw size=1073741824
> $ mkfs.ext4 -I 128 -q test.img
> $ mkdir test-mount
> $ sudo mount -o loop test.img test-mount
> $ sudo touch test-mount/test-file
> $ stat -c 'blocks=%b' test-mount/test-file
> blocks=8
> 
> These extra blocks (one cluster) are apparently used for metadata,
> because they are always there, on top of blocks used for data:
> 
> $ sudo dd if=/dev/zero of=test-mount/test-file bs=1M count=1
> 1+0 records in
> 1+0 records out
> 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00135339 s, 775 MB/s
> $ stat -c 'blocks=%b' test-mount/test-file
> blocks=2056
> 
> Make iotest 175 take this into account.
> 
> Reported-by: Thomas Huth 
> Signed-off-by: Max Reitz 
> Reviewed-by: Eric Blake 
> Reviewed-by: Nir Soffer 
> ---
> v3:
> - Actually tested this on an FS with the behavior in question and
>   noticed the patterns were lacking a $.  Since I'm now sending a v3
>   anyway, I might as well fix it with the heavy hammer and make it a
>   ($|[^0-9]).
> - Added example configuration to the commit message [Nir]
> - Kept the R-bs because I didn't feel too bad about doing so.
> ---
>  tests/qemu-iotests/175 | 26 ++
>  tests/qemu-iotests/175.out |  8 
>  2 files changed, 26 insertions(+), 8 deletions(-)

Applied to my block branch.

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH v2 0/3] Update documentation and help related to the preallocation parameter

2019-05-29 Thread Max Reitz
On 28.05.19 08:15, Markus Armbruster wrote:
> Kevin, Max, this series looks ready to me.  Feel free to ask me to take
> it through my tree.

*ask*

:-)



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PULL 0/3] Bitmaps patches

2019-05-29 Thread no-reply
Patchew URL: https://patchew.org/QEMU/20190528235842.29453-1-js...@redhat.com/



Hi,

This series failed the asan build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
time make docker-test-debug@fedora TARGET_LIST=x86_64-softmmu J=14 NETWORK=1
=== TEST SCRIPT END ===

PASS 3 endianness-test /x86_64/endianness/combine/pc
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
tests/test-coroutine -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl 
--test-name="test-coroutine" 
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img 
tests/fdc-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl 
--test-name="fdc-test" 
==10394==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 1 test-coroutine /basic/no-dangling-access
PASS 2 test-coroutine /basic/lifecycle
PASS 3 test-coroutine /basic/yield
==10394==WARNING: ASan is ignoring requested __asan_handle_no_return: stack 
top: 0x7ffe0e2b1000; bottom 0x7f0537af8000; size: 0x00f8d67b9000 (1068750311424)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 4 test-coroutine /basic/nesting
---
PASS 1 fdc-test /x86_64/fdc/cmos
PASS 2 fdc-test /x86_64/fdc/no_media_on_start
PASS 3 fdc-test /x86_64/fdc/read_without_media
==10399==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 4 fdc-test /x86_64/fdc/media_change
PASS 1 test-iov /basic/iov/from-to-buf
PASS 5 fdc-test /x86_64/fdc/sense_interrupt
---
PASS 11 test-aio /aio/event/wait
PASS 12 test-aio /aio/event/flush
PASS 13 test-aio /aio/event/wait/no-flush-cb
==10417==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 14 test-aio /aio/timer/schedule
PASS 15 test-aio /aio/coroutine/queue-chaining
PASS 16 test-aio /aio-gsource/flush
---
PASS 28 test-aio /aio-gsource/timer/schedule
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
tests/test-aio-multithread -m=quick -k --tap < /dev/null | 
./scripts/tap-driver.pl --test-name="test-aio-multithread" 
PASS 1 test-aio-multithread /aio/multi/lifecycle
==10423==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 2 test-aio-multithread /aio/multi/schedule
PASS 3 test-aio-multithread /aio/multi/mutex/contended
PASS 12 fdc-test /x86_64/fdc/read_no_dma_19
PASS 13 fdc-test /x86_64/fdc/fuzz-registers
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img 
tests/ide-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl 
--test-name="ide-test" 
PASS 4 test-aio-multithread /aio/multi/mutex/handoff
==10451==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 1 ide-test /x86_64/ide/identify
==10462==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 5 test-aio-multithread /aio/multi/mutex/mcs
PASS 2 ide-test /x86_64/ide/flush
==10473==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 6 test-aio-multithread /aio/multi/mutex/pthread
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
tests/test-throttle -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl 
--test-name="test-throttle" 
PASS 1 test-throttle /throttle/leak_bucket
---
PASS 6 test-throttle /throttle/detach_attach
PASS 7 test-throttle /throttle/config_functions
PASS 8 test-throttle /throttle/accounting
==10481==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 9 test-throttle /throttle/groups
PASS 10 test-throttle /throttle/config/enabled
PASS 11 test-throttle /throttle/config/conflicting
---
PASS 1 test-thread-pool /thread-pool/submit
PASS 2 test-thread-pool /thread-pool/submit-aio
PASS 3 test-thread-pool /thread-pool/submit-co
==10488==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 4 test-thread-pool /thread-pool/submit-many
==10485==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 4 ide-test /x86_64/ide/bmdma/trim
==10539==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 5 ide-test /x86_64/ide/bmdma/short_prdt
==10546==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some 

Re: [Qemu-block] [PATCH v2] qemu-io: add pattern file for write command

2019-05-29 Thread Max Reitz
On 29.05.19 13:46, Denis Plotnikov wrote:
> The patch allows to provide a pattern file for write
> command. There was no similar ability before.
> ---
>  qemu-io-cmds.c | 58 ++
>  1 file changed, 54 insertions(+), 4 deletions(-)
> 
> diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
> index 09750a23ce..148f2ff92a 100644
> --- a/qemu-io-cmds.c
> +++ b/qemu-io-cmds.c
> @@ -21,6 +21,7 @@
>  #include "qemu/option.h"
>  #include "qemu/timer.h"
>  #include "qemu/cutils.h"
> +#include "string.h"
>  
>  #define CMD_NOFILE_OK   0x01
>  
> @@ -343,6 +344,35 @@ static void *qemu_io_alloc(BlockBackend *blk, size_t 
> len, int pattern)
>  return buf;
>  }
>  
> +static void *qemu_io_alloc_from_file(BlockBackend *blk, size_t len,
> + char *file_name)
> +{
> +void *buf;
> +FILE *f = fopen(file_name, "r");
> +
> +if (!f) {
> +printf("cannot open file '%s'\n", file_name);

strerror(errno) would probably be helpful.

> +return NULL;
> +}
> +
> +if (qemuio_misalign) {
> +len += MISALIGN_OFFSET;
> +}
> +buf = blk_blockalign(blk, len);
> +memset(buf, 0, len);
> +
> +if (!fread(buf, sizeof(char), len, f)) {

(1) You should probably check ferror(f),

(2) Currently, on a short read, the rest in buf is unallocated.  I think
it would be nicest if the pattern would be repeated to fill the whole
buffer.

> +printf("file '%s' is empty\n", file_name);
> +qemu_vfree(buf);
> +return NULL;
> +}
> +
> +if (qemuio_misalign) {
> +buf += MISALIGN_OFFSET;

This still uses pointer arithmetic on a void * pointer as Vladimir has
said.  Making buf a char * pointer should be the simplest way to fix this.

> +}
> +return buf;
> +}
> +
>  static void qemu_io_free(void *p)
>  {
>  if (qemuio_misalign) {
> @@ -965,7 +995,7 @@ static const cmdinfo_t write_cmd = {
>  .perm   = BLK_PERM_WRITE,
>  .argmin = 2,
>  .argmax = -1,
> -.args   = "[-bcCfnquz] [-P pattern] off len",
> +.args   = "[-bcCfnquz] [-P pattern | -s source_file] off len",
>  .oneline= "writes a number of bytes at a specified offset",
>  .help   = write_help,
>  };
> @@ -974,7 +1004,7 @@ static int write_f(BlockBackend *blk, int argc, char 
> **argv)
>  {
>  struct timeval t1, t2;
>  bool Cflag = false, qflag = false, bflag = false;
> -bool Pflag = false, zflag = false, cflag = false;
> +bool Pflag = false, zflag = false, cflag = false, sflag = false;
>  int flags = 0;
>  int c, cnt, ret;
>  char *buf = NULL;
> @@ -983,8 +1013,9 @@ static int write_f(BlockBackend *blk, int argc, char 
> **argv)
>  /* Some compilers get confused and warn if this is not initialized.  */
>  int64_t total = 0;
>  int pattern = 0xcd;
> +char *file_name;
>  
> -while ((c = getopt(argc, argv, "bcCfnpP:quz")) != -1) {
> +while ((c = getopt(argc, argv, "bcCfnpP:quzs:")) != -1) {
>  switch (c) {
>  case 'b':
>  bflag = true;
> @@ -1020,6 +1051,10 @@ static int write_f(BlockBackend *blk, int argc, char 
> **argv)
>  case 'z':
>  zflag = true;
>  break;
> +case 's':
> +sflag = true;
> +file_name = g_strdup(optarg);
> +break;
>  default:
>  qemuio_command_usage(&write_cmd);
>  return -EINVAL;
> @@ -1056,6 +1091,14 @@ static int write_f(BlockBackend *blk, int argc, char 
> **argv)
>  return -EINVAL;
>  }
>  
> +if (sflag && Pflag) {
> +printf("-s and -P cannot be specified at the same time\n");
> +}
> +
> +if (sflag && zflag) {
> +printf("-s and -z cannot be specified at the same time\n");
> +}
> +

Well, you also need to return from this function, just printing the
message is not sufficient to abort the operation.

Also, the shorter way would be something like:

if ((int)zflag + (int)Pflag + (int)sflag >= 1) {
printf("Only one of -z, -P, and -s can be specified at the same
time\n");
return -EINVAL;
}

Max

>  offset = cvtnum(argv[optind]);
>  if (offset < 0) {
>  print_cvtnum_err(offset, argv[optind]);
> @@ -1088,7 +1131,14 @@ static int write_f(BlockBackend *blk, int argc, char 
> **argv)
>  }
>  
>  if (!zflag) {
> -buf = qemu_io_alloc(blk, count, pattern);
> +if (sflag) {
> +buf = qemu_io_alloc_from_file(blk, count, file_name);
> +if (!buf) {
> +return -EINVAL;
> +}
> +} else {
> +buf = qemu_io_alloc(blk, count, pattern);
> +}
>  }
>  
>  gettimeofday(&t1, NULL);
> 




signature.asc
Description: OpenPGP digital signature


[Qemu-block] [PATCH] hw/block/fdc: floppy command FIFO memory initialization

2019-05-29 Thread Andrey Shinkevich
The uninitialized memory allocated for the command FIFO of the
floppy controller during the VM hardware initialization incurs
many unwanted reports by Valgrind when VM state is being saved.
That verbosity hardens a search for the real memory issues when
the iotests run. Particularly, the patch eliminates 20 unnecessary
reports of the Valgrind tool in the iotest #169.

Signed-off-by: Andrey Shinkevich 
---
 hw/block/fdc.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index 6f19f12..54e470c 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -2647,6 +2647,10 @@ static void fdctrl_realize_common(DeviceState *dev, 
FDCtrl *fdctrl,
 
 FLOPPY_DPRINTF("init controller\n");
 fdctrl->fifo = qemu_memalign(512, FD_SECTOR_LEN);
+if (fdctrl->fifo) {
+/* To avoid using the uninitialized memory while saving VM state */
+memset(fdctrl->fifo, 0, FD_SECTOR_LEN);
+}
 fdctrl->fifo_size = 512;
 fdctrl->result_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
  fdctrl_result_timer, fdctrl);
-- 
1.8.3.1




Re: [Qemu-block] [PATCH v6 3/3] block/stream: introduce a bottom node

2019-05-29 Thread Max Reitz
On 29.05.19 13:44, Vladimir Sementsov-Ogievskiy wrote:
> 29.05.2019 14:23, Max Reitz wrote:
>> On 29.05.19 09:34, Vladimir Sementsov-Ogievskiy wrote:
>>> 28.05.2019 20:33, Max Reitz wrote:
 On 06.05.19 17:34, Vladimir Sementsov-Ogievskiy wrote:
> From: Andrey Shinkevich 
>
> The bottom node is the intermediate block device that has the base as its
> backing image. It is used instead of the base node while a block stream
> job is running to avoid dependency on the base that may change due to the
> parallel jobs. The change may take place due to a filter node as well that
> is inserted between the base and the intermediate bottom node. It occurs
> when the base node is the top one for another commit or stream job.
> After the introduction of the bottom node, don't freeze its backing child,
> that's the base, anymore.
>
> Suggested-by: Vladimir Sementsov-Ogievskiy 
> Signed-off-by: Andrey Shinkevich 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> Reviewed-by: Alberto Garcia 
> ---
>block/stream.c | 49 +-
>tests/qemu-iotests/245 |  4 ++--
>2 files changed, 27 insertions(+), 26 deletions(-)
>
> diff --git a/block/stream.c b/block/stream.c
> index 65b13b27e0..fc97c89f81 100644
> --- a/block/stream.c
> +++ b/block/stream.c

 [...]

> @@ -248,26 +250,25 @@ void stream_start(const char *job_id, 
> BlockDriverState *bs,
> * already have our own plans. Also don't allow resize as the 
> image size is
> * queried only at the job start and then cached. */
>s = block_job_create(job_id, &stream_job_driver, NULL, bs,
> - BLK_PERM_CONSISTENT_READ | 
> BLK_PERM_WRITE_UNCHANGED |
> - BLK_PERM_GRAPH_MOD,
> - BLK_PERM_CONSISTENT_READ | 
> BLK_PERM_WRITE_UNCHANGED |
> - BLK_PERM_WRITE,
> + basic_flags | BLK_PERM_GRAPH_MOD,
> + basic_flags | BLK_PERM_WRITE,
> speed, creation_flags, NULL, NULL, errp);
>if (!s) {
>goto fail;
>}
>
> -/* Block all intermediate nodes between bs and base, because they 
> will
> - * disappear from the chain after this operation. The streaming job 
> reads
> - * every block only once, assuming that it doesn't change, so block 
> writes
> - * and resizes. */
> -for (iter = backing_bs(bs); iter && iter != base; iter = 
> backing_bs(iter)) {
> -block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
> -   BLK_PERM_CONSISTENT_READ | 
> BLK_PERM_WRITE_UNCHANGED,
> -   &error_abort);
> +/*
> + * Block all intermediate nodes between bs and bottom (inclusive), 
> because
> + * they will disappear from the chain after this operation. The 
> streaming
> + * job reads every block only once, assuming that it doesn't change, 
> so
> + * forbid writes and resizes.
> + */
> +for (iter = bs; iter != bottom; iter = backing_bs(iter)) {
> +block_job_add_bdrv(&s->common, "intermediate node", 
> backing_bs(iter),
> +   0, basic_flags, &error_abort);

 I don’t understand this change.  Isn’t it doing exactly the same as before?

 (If so, I just find it harder to read because every iteration isn’t
 about @iter but backing_bs(iter).)
>>>
>>> Hm, it's the same, but not using base. We may save old loop if calculate 
>>> base exactly before
>>> the loop (or at least not separated from it by any yield-point)
>>
>> But we are still in stream_start() here.  base cannot have changed yet,
>> can it?
>>
>> (I don’t even think this is about yield points.  As long as
>> stream_start() doesn’t return, the QMP monitor won’t receive any new
>> commands.)
>>
> 
> But block graph may be modified not only from qmp. From block jobs too. If 
> base is another filter, it may
> be dropped in any time.

Ah, yes, that’s true.  And I suppose that can happen in
bdrv_reopen_set_read_only().  Hm.

OK, I still don’t like the loop how it is currently written.  Maybe I’d
like it better with s/iter/parent_bs/?

Well, or you proposed would work, too, i.e. base = backing_bs(bottom)
just before the loop – with a comment that explains why we need that.
That’s probably better.

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH v2] qemu-io: add pattern file for write command

2019-05-29 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/20190529114624.23107-1-dplotni...@virtuozzo.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [Qemu-devel] [PATCH v2] qemu-io: add pattern file for write command
Type: series
Message-id: 20190529114624.23107-1-dplotni...@virtuozzo.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

From https://github.com/patchew-project/qemu
 * [new tag]   
patchew/20190529114624.23107-1-dplotni...@virtuozzo.com -> 
patchew/20190529114624.23107-1-dplotni...@virtuozzo.com
Switched to a new branch 'test'
3b7f7726d7 qemu-io: add pattern file for write command

=== OUTPUT BEGIN ===
ERROR: Missing Signed-off-by: line(s)

total: 1 errors, 0 warnings, 107 lines checked

Commit 3b7f7726d7d4 (qemu-io: add pattern file for write command) has style 
problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20190529114624.23107-1-dplotni...@virtuozzo.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

[Qemu-block] [PATCH v2] qemu-io: add pattern file for write command

2019-05-29 Thread Denis Plotnikov
The patch allows to provide a pattern file for write
command. There was no similar ability before.
---
 qemu-io-cmds.c | 58 ++
 1 file changed, 54 insertions(+), 4 deletions(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 09750a23ce..148f2ff92a 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -21,6 +21,7 @@
 #include "qemu/option.h"
 #include "qemu/timer.h"
 #include "qemu/cutils.h"
+#include "string.h"
 
 #define CMD_NOFILE_OK   0x01
 
@@ -343,6 +344,35 @@ static void *qemu_io_alloc(BlockBackend *blk, size_t len, 
int pattern)
 return buf;
 }
 
+static void *qemu_io_alloc_from_file(BlockBackend *blk, size_t len,
+ char *file_name)
+{
+void *buf;
+FILE *f = fopen(file_name, "r");
+
+if (!f) {
+printf("cannot open file '%s'\n", file_name);
+return NULL;
+}
+
+if (qemuio_misalign) {
+len += MISALIGN_OFFSET;
+}
+buf = blk_blockalign(blk, len);
+memset(buf, 0, len);
+
+if (!fread(buf, sizeof(char), len, f)) {
+printf("file '%s' is empty\n", file_name);
+qemu_vfree(buf);
+return NULL;
+}
+
+if (qemuio_misalign) {
+buf += MISALIGN_OFFSET;
+}
+return buf;
+}
+
 static void qemu_io_free(void *p)
 {
 if (qemuio_misalign) {
@@ -965,7 +995,7 @@ static const cmdinfo_t write_cmd = {
 .perm   = BLK_PERM_WRITE,
 .argmin = 2,
 .argmax = -1,
-.args   = "[-bcCfnquz] [-P pattern] off len",
+.args   = "[-bcCfnquz] [-P pattern | -s source_file] off len",
 .oneline= "writes a number of bytes at a specified offset",
 .help   = write_help,
 };
@@ -974,7 +1004,7 @@ static int write_f(BlockBackend *blk, int argc, char 
**argv)
 {
 struct timeval t1, t2;
 bool Cflag = false, qflag = false, bflag = false;
-bool Pflag = false, zflag = false, cflag = false;
+bool Pflag = false, zflag = false, cflag = false, sflag = false;
 int flags = 0;
 int c, cnt, ret;
 char *buf = NULL;
@@ -983,8 +1013,9 @@ static int write_f(BlockBackend *blk, int argc, char 
**argv)
 /* Some compilers get confused and warn if this is not initialized.  */
 int64_t total = 0;
 int pattern = 0xcd;
+char *file_name;
 
-while ((c = getopt(argc, argv, "bcCfnpP:quz")) != -1) {
+while ((c = getopt(argc, argv, "bcCfnpP:quzs:")) != -1) {
 switch (c) {
 case 'b':
 bflag = true;
@@ -1020,6 +1051,10 @@ static int write_f(BlockBackend *blk, int argc, char 
**argv)
 case 'z':
 zflag = true;
 break;
+case 's':
+sflag = true;
+file_name = g_strdup(optarg);
+break;
 default:
 qemuio_command_usage(&write_cmd);
 return -EINVAL;
@@ -1056,6 +1091,14 @@ static int write_f(BlockBackend *blk, int argc, char 
**argv)
 return -EINVAL;
 }
 
+if (sflag && Pflag) {
+printf("-s and -P cannot be specified at the same time\n");
+}
+
+if (sflag && zflag) {
+printf("-s and -z cannot be specified at the same time\n");
+}
+
 offset = cvtnum(argv[optind]);
 if (offset < 0) {
 print_cvtnum_err(offset, argv[optind]);
@@ -1088,7 +1131,14 @@ static int write_f(BlockBackend *blk, int argc, char 
**argv)
 }
 
 if (!zflag) {
-buf = qemu_io_alloc(blk, count, pattern);
+if (sflag) {
+buf = qemu_io_alloc_from_file(blk, count, file_name);
+if (!buf) {
+return -EINVAL;
+}
+} else {
+buf = qemu_io_alloc(blk, count, pattern);
+}
 }
 
 gettimeofday(&t1, NULL);
-- 
2.17.0




Re: [Qemu-block] [PATCH v6 3/3] block/stream: introduce a bottom node

2019-05-29 Thread Vladimir Sementsov-Ogievskiy
29.05.2019 14:23, Max Reitz wrote:
> On 29.05.19 09:34, Vladimir Sementsov-Ogievskiy wrote:
>> 28.05.2019 20:33, Max Reitz wrote:
>>> On 06.05.19 17:34, Vladimir Sementsov-Ogievskiy wrote:
 From: Andrey Shinkevich 

 The bottom node is the intermediate block device that has the base as its
 backing image. It is used instead of the base node while a block stream
 job is running to avoid dependency on the base that may change due to the
 parallel jobs. The change may take place due to a filter node as well that
 is inserted between the base and the intermediate bottom node. It occurs
 when the base node is the top one for another commit or stream job.
 After the introduction of the bottom node, don't freeze its backing child,
 that's the base, anymore.

 Suggested-by: Vladimir Sementsov-Ogievskiy 
 Signed-off-by: Andrey Shinkevich 
 Reviewed-by: Vladimir Sementsov-Ogievskiy 
 Reviewed-by: Alberto Garcia 
 ---
block/stream.c | 49 +-
tests/qemu-iotests/245 |  4 ++--
2 files changed, 27 insertions(+), 26 deletions(-)

 diff --git a/block/stream.c b/block/stream.c
 index 65b13b27e0..fc97c89f81 100644
 --- a/block/stream.c
 +++ b/block/stream.c
>>>
>>> [...]
>>>
 @@ -248,26 +250,25 @@ void stream_start(const char *job_id, 
 BlockDriverState *bs,
 * already have our own plans. Also don't allow resize as the image 
 size is
 * queried only at the job start and then cached. */
s = block_job_create(job_id, &stream_job_driver, NULL, bs,
 - BLK_PERM_CONSISTENT_READ | 
 BLK_PERM_WRITE_UNCHANGED |
 - BLK_PERM_GRAPH_MOD,
 - BLK_PERM_CONSISTENT_READ | 
 BLK_PERM_WRITE_UNCHANGED |
 - BLK_PERM_WRITE,
 + basic_flags | BLK_PERM_GRAPH_MOD,
 + basic_flags | BLK_PERM_WRITE,
 speed, creation_flags, NULL, NULL, errp);
if (!s) {
goto fail;
}

 -/* Block all intermediate nodes between bs and base, because they will
 - * disappear from the chain after this operation. The streaming job 
 reads
 - * every block only once, assuming that it doesn't change, so block 
 writes
 - * and resizes. */
 -for (iter = backing_bs(bs); iter && iter != base; iter = 
 backing_bs(iter)) {
 -block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
 -   BLK_PERM_CONSISTENT_READ | 
 BLK_PERM_WRITE_UNCHANGED,
 -   &error_abort);
 +/*
 + * Block all intermediate nodes between bs and bottom (inclusive), 
 because
 + * they will disappear from the chain after this operation. The 
 streaming
 + * job reads every block only once, assuming that it doesn't change, 
 so
 + * forbid writes and resizes.
 + */
 +for (iter = bs; iter != bottom; iter = backing_bs(iter)) {
 +block_job_add_bdrv(&s->common, "intermediate node", 
 backing_bs(iter),
 +   0, basic_flags, &error_abort);
>>>
>>> I don’t understand this change.  Isn’t it doing exactly the same as before?
>>>
>>> (If so, I just find it harder to read because every iteration isn’t
>>> about @iter but backing_bs(iter).)
>>
>> Hm, it's the same, but not using base. We may save old loop if calculate 
>> base exactly before
>> the loop (or at least not separated from it by any yield-point)
> 
> But we are still in stream_start() here.  base cannot have changed yet,
> can it?
> 
> (I don’t even think this is about yield points.  As long as
> stream_start() doesn’t return, the QMP monitor won’t receive any new
> commands.)
> 

But block graph may be modified not only from qmp. From block jobs too. If base 
is another filter, it may
be dropped in any time.



-- 
Best regards,
Vladimir


Re: [Qemu-block] [PATCH v0 1/3] qcow2: introduce compression type feature

2019-05-29 Thread Vladimir Sementsov-Ogievskiy
28.05.2019 17:37, Denis Plotnikov wrote:
> The patch adds some preparation parts for incompatible compression type
> feature to QCOW2 header that indicates that *all* compressed clusters
> must be (de)compressed using a certain compression type.
> 
> It is implied that the compression type is set on the image creation and
> can be changed only later by image conversion, thus compression type
> defines the only compression algorithm used for the image.
> 
> The goal of the feature is to add support of other compression algorithms
> to qcow2. For example, ZSTD which is more effective on compression than ZLIB.
> It works roughly x2 faster than ZLIB providing a comparable compression ratio
> and therefore provide a performance advantage in backup scenarios.
> 
> The default compression is ZLIB. Images created with ZLIB compression type
> is backward compatible with older qemu versions.
> 
> Signed-off-by: Denis Plotnikov 
> ---

[...]

> diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
> index af5711e533..cebcbc4f2f 100644
> --- a/docs/interop/qcow2.txt
> +++ b/docs/interop/qcow2.txt
> @@ -109,7 +109,14 @@ in the description of a field.
>   An External Data File Name header extension 
> may
>   be present if this bit is set.
>   
> -Bits 3-63:  Reserved (set to 0)
> +Bit 3:  Compression type. If the bit is set, then the
> +type of compression the image uses is set in 
> the
> +header extension. When the bit is set the
> +compression type extension header must be 
> present.
> +When the bit is not set the compression type
> +header must absent.
> +
> +Bits 4-63:  Reserved (set to 0)
>   
>80 -  87:  compatible_features
>   Bitmask of compatible features. An implementation can
> @@ -175,6 +182,7 @@ be stored. Each extension has a structure like the 
> following:
>   0x23852875 - Bitmaps extension
>   0x0537be77 - Full disk encryption header pointer
>   0x44415441 - External data file name string
> +0x434D5052 - Compression type extension
>   other  - Unknown header extension, can be safely
>ignored
>   
> @@ -771,3 +779,30 @@ In the image file the 'enabled' state is reflected by 
> the 'auto' flag. If this
>   flag is set, the software must consider the bitmap as 'enabled' and start
>   tracking virtual disk changes to this bitmap from the first write to the
>   virtual disk. If this flag is not set then the bitmap is disabled.
> +
> +
> +== Compression type extension ==
> +
> +The compression type extension is an optional header extension. It stores the
> +compression type used for disk clusters (de)compression.
> +A single compression type is applied to all compressed disk clusters,
> +with no way to change compression types per cluster. Two clusters of the 
> image
> +couldn't be compressed with different compression types.
> +
> +The compression type is set on image creation. The only way to change
> +the compression type is to convert the image explicitly.
> +
> +The compression type extension is present if and only if the incompatible
> +compression type bit is set. When the bit is not set the compression type
> +header must be absent.

Hmm, not the first time we introduce incompatible bit to make incompatible
header extension, as all header extensions are compatible by default, as for
unknown header extension we have:

 other  - Unknown header extension, can be safely
  ignored

Hm. Should we instead define one incompatible bit for all such future cases,
i.e. add incompatible bit HEADER_EXTENSION_FLAGS, add flag field to header
extension declaration, and define one flag in it: COMPATIBLE, which will mean,
that this extension may be safely ignored (old default behavior).

> +
> +When the compression type bit is not set and the compression type header
> +extension is absent, ZLIB compression is used for compressed clusters.
> +This defines default image compression type: ZLIB.
> +Qemu < 4.1 can use images created with compression type ZLIB without any
> +additional preparations and cannot use images created with compression
> +types != ZLIB.
> +
> +Available compression types:
> +0: ZLIB
> +1: ZSTD



-- 
Best regards,
Vladimir


Re: [Qemu-block] [PATCH v6 3/3] block/stream: introduce a bottom node

2019-05-29 Thread Max Reitz
On 29.05.19 09:34, Vladimir Sementsov-Ogievskiy wrote:
> 28.05.2019 20:33, Max Reitz wrote:
>> On 06.05.19 17:34, Vladimir Sementsov-Ogievskiy wrote:
>>> From: Andrey Shinkevich 
>>>
>>> The bottom node is the intermediate block device that has the base as its
>>> backing image. It is used instead of the base node while a block stream
>>> job is running to avoid dependency on the base that may change due to the
>>> parallel jobs. The change may take place due to a filter node as well that
>>> is inserted between the base and the intermediate bottom node. It occurs
>>> when the base node is the top one for another commit or stream job.
>>> After the introduction of the bottom node, don't freeze its backing child,
>>> that's the base, anymore.
>>>
>>> Suggested-by: Vladimir Sementsov-Ogievskiy 
>>> Signed-off-by: Andrey Shinkevich 
>>> Reviewed-by: Vladimir Sementsov-Ogievskiy 
>>> Reviewed-by: Alberto Garcia 
>>> ---
>>>   block/stream.c | 49 +-
>>>   tests/qemu-iotests/245 |  4 ++--
>>>   2 files changed, 27 insertions(+), 26 deletions(-)
>>>
>>> diff --git a/block/stream.c b/block/stream.c
>>> index 65b13b27e0..fc97c89f81 100644
>>> --- a/block/stream.c
>>> +++ b/block/stream.c
>>
>> [...]
>>
>>> @@ -248,26 +250,25 @@ void stream_start(const char *job_id, 
>>> BlockDriverState *bs,
>>>* already have our own plans. Also don't allow resize as the image 
>>> size is
>>>* queried only at the job start and then cached. */
>>>   s = block_job_create(job_id, &stream_job_driver, NULL, bs,
>>> - BLK_PERM_CONSISTENT_READ | 
>>> BLK_PERM_WRITE_UNCHANGED |
>>> - BLK_PERM_GRAPH_MOD,
>>> - BLK_PERM_CONSISTENT_READ | 
>>> BLK_PERM_WRITE_UNCHANGED |
>>> - BLK_PERM_WRITE,
>>> + basic_flags | BLK_PERM_GRAPH_MOD,
>>> + basic_flags | BLK_PERM_WRITE,
>>>speed, creation_flags, NULL, NULL, errp);
>>>   if (!s) {
>>>   goto fail;
>>>   }
>>>   
>>> -/* Block all intermediate nodes between bs and base, because they will
>>> - * disappear from the chain after this operation. The streaming job 
>>> reads
>>> - * every block only once, assuming that it doesn't change, so block 
>>> writes
>>> - * and resizes. */
>>> -for (iter = backing_bs(bs); iter && iter != base; iter = 
>>> backing_bs(iter)) {
>>> -block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
>>> -   BLK_PERM_CONSISTENT_READ | 
>>> BLK_PERM_WRITE_UNCHANGED,
>>> -   &error_abort);
>>> +/*
>>> + * Block all intermediate nodes between bs and bottom (inclusive), 
>>> because
>>> + * they will disappear from the chain after this operation. The 
>>> streaming
>>> + * job reads every block only once, assuming that it doesn't change, so
>>> + * forbid writes and resizes.
>>> + */
>>> +for (iter = bs; iter != bottom; iter = backing_bs(iter)) {
>>> +block_job_add_bdrv(&s->common, "intermediate node", 
>>> backing_bs(iter),
>>> +   0, basic_flags, &error_abort);
>>
>> I don’t understand this change.  Isn’t it doing exactly the same as before?
>>
>> (If so, I just find it harder to read because every iteration isn’t
>> about @iter but backing_bs(iter).)
> 
> Hm, it's the same, but not using base. We may save old loop if calculate base 
> exactly before
> the loop (or at least not separated from it by any yield-point)

But we are still in stream_start() here.  base cannot have changed yet,
can it?

(I don’t even think this is about yield points.  As long as
stream_start() doesn’t return, the QMP monitor won’t receive any new
commands.)

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v0 2/3] qcow2: add compression type processing

2019-05-29 Thread Vladimir Sementsov-Ogievskiy
28.05.2019 17:37, Denis Plotnikov wrote:
> With the patch, qcow2 is able to process image compression type
> defined in the image header and choose the corresponding method
> for clusters compressing.
> 
> Also, it rework the cluster compression code for adding more
> compression types.
> 
> Signed-off-by: Denis Plotnikov 
> ---
>   block/qcow2.c | 103 --
>   1 file changed, 92 insertions(+), 11 deletions(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index c4b5b93408..90f15cc3c9 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -400,11 +400,39 @@ static int qcow2_read_extensions(BlockDriverState *bs, 
> uint64_t start_offset,
>   break;
>   
>   case QCOW2_EXT_MAGIC_COMPRESSION_TYPE:
> +/* Compression type always goes with the compression type bit 
> set */
> +if (!(s->incompatible_features & 
> QCOW2_INCOMPAT_COMPRESSION_TYPE)) {
> +error_setg(errp,
> +   "compression_type_ext: "
> +   "expect compression type bit set");
> +return -EINVAL;
> +}
> +
> +ret = bdrv_pread(bs->file, offset, &s->compression_type, 
> ext.len);
> +s->compression_type = be32_to_cpu(s->compression_type);
> +
> +if (ret < 0) {
> +error_setg_errno(errp, -ret,
> + "ERROR: Could not read compression type");
> +return ret;
> +}
> +
>   /*
> - * Setting compression type to BDRVQcow2State->compression_type
> - * from the image header is going to be here
> + * The default compression type is not allowed when the extension
> + * is present. ZLIB is used as the default compression type.
> + * When compression type extension header is present then
> + * compression_type should have a value different from the 
> default.
>*/
> - break;
> +if (s->compression_type == QCOW2_COMPRESSION_TYPE_ZLIB) {
> +error_setg(errp,
> +   "compression_type_ext:"
> +   "invalid compression type %d",
> +   QCOW2_COMPRESSION_TYPE_ZLIB);
> +}
> +#ifdef DEBUG_EXT
> +printf("Qcow2: image compression type %s\n", 
> s->compression_type);
> +#endif
> +break;
>   
>   case QCOW2_EXT_MAGIC_DATA_FILE:
>   {
> @@ -1492,6 +1520,9 @@ static int coroutine_fn qcow2_do_open(BlockDriverState 
> *bs, QDict *options,
>   QLIST_INIT(&s->cluster_allocs);
>   QTAILQ_INIT(&s->discards);
>   
> +/* Set default compression type */
> +s->compression_type = QCOW2_COMPRESSION_TYPE_ZLIB;
> +
>   /* read qcow2 extensions */
>   if (qcow2_read_extensions(bs, header.header_length, ext_end, NULL,
> flags, &update_header, &local_err)) {
> @@ -1500,6 +1531,34 @@ static int coroutine_fn qcow2_do_open(BlockDriverState 
> *bs, QDict *options,
>   goto fail;
>   }
>   
> +/*
> + * The compression type is set on the extension header processing
> + * if the compression type extension header is present.
> + * When the compression type is different from ZLIB (default) there
> + * should be both the compression type bit and the compression
> + * type extension header set. When the ZLIB (default) compression
> + * type is used there should be neither the compression type bit nor
> + * the compression type extension header set.
> + */
> +
> +if ((s->incompatible_features & QCOW2_INCOMPAT_COMPRESSION_TYPE) &
> +(s->compression_type == QCOW2_COMPRESSION_TYPE_ZLIB)) {
> +error_setg(errp, "Illegal compression type setting");
> +ret = -EINVAL;
> +goto fail;
> +}
> +
> +/* Check available compression types */
> +switch (s->compression_type) {
> +case QCOW2_COMPRESSION_TYPE_ZLIB:
> +break;
> +
> +default:
> +error_setg(errp, "Unknown compression type");
> +ret = -EINVAL;
> +goto fail;
> +}
> +
>   /* Open external data file */
>   s->data_file = bdrv_open_child(NULL, options, "data-file", bs, 
> &child_file,
>  true, &local_err);
> @@ -3970,7 +4029,7 @@ fail:
>   }
>   
>   /*
> - * qcow2_compress()
> + * zlib_compress()
>*
>* @dest - destination buffer, @dest_size bytes
>* @src - source buffer, @src_size bytes
> @@ -3979,7 +4038,7 @@ fail:
>*  -1 destination buffer is not enough to store compressed data
>*  -2 on any other error
>*/
> -static ssize_t qcow2_compress(void *dest, size_t dest_size,
> +static ssize_t zlib_compress(void *dest, size_t dest_size,
> const void *src, size_t src_size)
>   {
>   ssize_t ret;
> @@ -4013,7 +4072,7 @@ stati

Re: [Qemu-block] [PATCH] pr-manager-helper: fix pr process been killed when reconectting

2019-05-29 Thread Paolo Bonzini
On 29/05/19 10:37, Jie Wang wrote:
> when the problem appeared:
> 
> 1. qemu will initialize pr-helper and connect to it cyclically, but
> always failed because no running pr-helper process to connect.
> 
> 2. libvirt will always waiting for connected event, but will never to
> start new pr-helper process because not receive disconnect event.
> 
> I'm not found the best way to solve this problem, can you give me some
> suggestion?

I can't find a way that is better than your patch, either.  Another
possible problem is that this could cause libvirt to spawn two helpers
if you have a race like

qemu: report DISCONNECTED
libvirt: start pr-helper #1
qemu: report DISCONNECTED
libvirt: start pr-helper #2
pr-helper #1: create socket
pr-helper #2: fail to start

But it should not be an issue since one of the two pr-helpers will clean
up after itself.

Paolo



Re: [Qemu-block] [PATCH] pr-manager-helper: fix pr process been killed when reconectting

2019-05-29 Thread Paolo Bonzini
On 29/05/19 09:33, Michal Privoznik wrote:
> On 5/28/19 7:45 PM, Paolo Bonzini wrote:
>> On 28/05/19 15:06, Jie Wang wrote:
>>> if pr-helper been killed and qemu send disconnect event to libvirt
>>> and libvirt started a new pr-helper process, the new pr-heleper
>>> been killed again when qemu is connectting to the new pr-helper,
>>> qemu will never send disconnect to libvirt, and libvirt will never
>>> receive connected event.
>>
>> I think this would let a guest "spam" events just by sending a lot PR
>> commands.  Also, as you said, in this case QEMU has never sent a
>> "connected" event, so I'm not sure why it should send a disconnection
>> event.
> 
> So pr manager is initialized on the first PR command and not when qemu
> is starting?

It is initialized when QEMU is started, but if it dies, that's not
detected until the first PR command.  The command is retried for 5
seconds, which should give libvirt ample time to restart the PR manager
(and it's an exceptional situation anyway).

> If a user inside the guest could somehow kill pr-helper process in the
> host then yes, they could spam libvirt/qemu. But if a user from inside a
> guest can kill a process in the host that is much bigger problem than
> spaming libvirt.

This is true.

>> Does libvirt monitor at all the pr-helper to check if it dies?  Or does
>> it rely exclusively on QEMU's events?
> 
> Libvirt relies solely on QEMU's events. Just like with qemu process
> itself, libvirt can't rely on SIGCHILD because the daemon might be
> restarted which would reparent all qemu and pr-helper processes
> rendering libvirt wait for SIGCHILD useless.
> 
> But there is an exception to this: when libvirt is spawning pr-helper it
> does so by following these steps:
> 
> 1) Try to acquire (lock) pidfile
> 2) unlink(socket)
> 3) spawn pr-helper process (this yields child's PID)
> 4) wait some time until socket is created
> 5) some follow up work (move child's PID into same cgroup as qemu's main
> thread, relabel the socket so that qemu can access it)
> 
> If any of these steps fails then child is killed. However, the PID is
> not recorded anywhere and thus is forgotten once control jumps out of
> the function.

Note that qemu-pr-helper supports the systemd socket activation
protocol.  Would it help if libvirt used it?

Paolo



Re: [Qemu-block] [Qemu-devel] [PATCH 3/3] block/qcow2-bitmap: rewrite bitmap reopening logic

2019-05-29 Thread Vladimir Sementsov-Ogievskiy
29.05.2019 2:24, John Snow wrote:
> 
> 
> On 5/23/19 11:47 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Current logic
>> =
>>
>> Reopen rw -> ro:
>>
>> Store bitmaps and release BdrvDirtyBitmap's.
>>
>> Reopen ro -> rw:
>>
>> Load bitmap list
>> Skip bitmaps which for which we don't have BdrvDirtyBitmap [this is
>> the worst thing]
> 
> ...ah.
> 
>> A kind of fail with error message if we see not readonly bitmap
>>
>> This leads to at least the following bug:
>>
>> Assume we have bitmap0.
> 
> Presumably a normal, persistent bitmap.
> 
>> Create external snapshot
>>bitmap0 is stored and therefore removed
> 
> Written out to the backing file and removed from memory, because we've
> reopened rw->ro; this is because of the migration "safety" clause where
> we simply drop these bitmaps.
> 
> ...Ah, and then we don't actually open them readonly again; that entire
> stanza in reopen_ro never fires off at all because the bitmaps are
> already gone.
> 
>> Commit snapshot
>>now we have no bitmaps
> 
> When we reopen the base as rw as you note, we skipped bitmaps for which
> we had no in-memory bitmap for -- because the readonly logic was really
> expecting to have these in-memory.
> 
> I should probably say that for the sake of migration correctness, the
> way we flush things to disk and remove it from memory on write is really
> bothersome to keep correct. The migration logic is so particular that it
> keeps causing issues elsewhere, of which this is another symptom.
> 
>> Do some writes from guest (*)
>>they are not marked in bitmap
> 
> Yikes, right.
> 
>> Shutdown
>> Start
>>bitmap0 is loaded as valid, but it is actually broken! It misses
>>writes (*)
> 
> Yikes; because it was consistent on flush and we skipped it on load,
> it's not marked as IN_USE and we are free to load it up again.
> 
>> Incremental backup
>>it will be inconsistent
>>
> 
> Good writeup, thank you.
> 
>> New logic
>> =
>>
>> Reopen rw -> ro:
>>
>> Store bitmaps and don't remove them from RAM, only mark them readonly
>> (the latter is already done, but didn't work properly because of
>> removing in storing function)
>>
> 
> Yes! I like this change.
> 
>> Reopen to rw (don't filter case with reopen rw -> rw, it is supported
>> now in qcow2_reopen_bitmaps_rw)
>>
> 
> OK, so we allow rw --> rw without trying to be fussy about it. That
> seems fine to me.
> 
>> Load bitmap list
>>
>> Carefully consider all possible combinations of bitmaps in RAM and in
>> the image, try to fix corruptions, print corresponding error messages.
>>
>> Also, call qcow2_reopen_bitmaps_rw after the whole reopen queue
>> commited, as otherwise we can't write to the qcow2 file child on reopen
>> ro -> rw.
>>
> 
> I take it this is the change that moved the invocation logic into
> bdrv_reopen_multiple instead of bdrv_reopen_commit; also notably we no
> longer check the transition edge which is much simpler.
> 
> oh, I see why you're doing it there now...
> 
>> Also, add corresponding test-cases to 255.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> ---
>>   block/qcow2.h  |   5 +-
>>   include/block/block_int.h  |   2 +-
>>   block.c|  29 ++
>>   block/qcow2-bitmap.c   | 197 ++---
>>   block/qcow2.c  |   2 +-
>>   tests/qemu-iotests/255 |   2 +
>>   tests/qemu-iotests/255.out |  35 +++
>>   7 files changed, 193 insertions(+), 79 deletions(-)
>>
>> diff --git a/block/qcow2.h b/block/qcow2.h
>> index 8d92ef1fee..5928306e62 100644
>> --- a/block/qcow2.h
>> +++ b/block/qcow2.h
>> @@ -725,9 +725,10 @@ Qcow2BitmapInfoList 
>> *qcow2_get_bitmap_info_list(BlockDriverState *bs,
>>   Error **errp);
>>   int qcow2_reopen_bitmaps_rw_hint(BlockDriverState *bs, bool 
>> *header_updated,
>>Error **errp);
>> -int qcow2_reopen_bitmaps_rw(BlockDriverState *bs, Error **errp);
>> +void qcow2_reopen_bitmaps_rw(BlockDriverState *bs);
>>   int qcow2_truncate_bitmaps_check(BlockDriverState *bs, Error **errp);
>> -void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, Error 
>> **errp);
>> +void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs,
>> +  bool release_stored, Error 
>> **errp);
>>   int qcow2_reopen_bitmaps_ro(BlockDriverState *bs, Error **errp);
>>   bool qcow2_can_store_new_dirty_bitmap(BlockDriverState *bs,
>> const char *name,
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index 1eebc7c8f3..9a694f3da0 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -536,7 +536,7 @@ struct BlockDriver {
>>* as rw. This handler should realize it. It also should unset readonly
>>* field of BlockDirtyBitmap's in case of success.
>>*/
>> -int (*bdrv_reopen_bitmaps_rw)(BlockDriverState *bs, Err

[Qemu-block] [PATCH v1] qemu-io: add pattern file for write command

2019-05-29 Thread Denis Plotnikov
The patch allows to provide a pattern file for write
command. There was no similar ability before.

Signed-off-by: Denis Plotnikov 
---
 qemu-io-cmds.c | 58 ++
 1 file changed, 54 insertions(+), 4 deletions(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 09750a23ce..6786536be7 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -21,6 +21,7 @@
 #include "qemu/option.h"
 #include "qemu/timer.h"
 #include "qemu/cutils.h"
+#include "string.h"
 
 #define CMD_NOFILE_OK   0x01
 
@@ -343,6 +344,35 @@ static void *qemu_io_alloc(BlockBackend *blk, size_t len, 
int pattern)
 return buf;
 }
 
+static void *qemu_io_alloc_from_file(BlockBackend *blk, size_t len,
+ char *file_name)
+{
+void *buf;
+FILE *f = fopen(file_name, "r");
+
+if (!f) {
+printf("cannot open file '%s'\n", file_name);
+return NULL;
+}
+
+if (qemuio_misalign) {
+len += MISALIGN_OFFSET;
+}
+buf = blk_blockalign(blk, len);
+memset(buf, 0, len);
+
+if (!fread(buf, sizeof(char), len, f)) {
+printf("file '%s' is empty\n", file_name);
+g_free(buf);
+return NULL;
+}
+
+if (qemuio_misalign) {
+buf += MISALIGN_OFFSET;
+}
+return buf;
+}
+
 static void qemu_io_free(void *p)
 {
 if (qemuio_misalign) {
@@ -965,7 +995,7 @@ static const cmdinfo_t write_cmd = {
 .perm   = BLK_PERM_WRITE,
 .argmin = 2,
 .argmax = -1,
-.args   = "[-bcCfnquz] [-P pattern] off len",
+.args   = "[-bcCfnquz] [-P pattern | -s source_file] off len",
 .oneline= "writes a number of bytes at a specified offset",
 .help   = write_help,
 };
@@ -974,7 +1004,7 @@ static int write_f(BlockBackend *blk, int argc, char 
**argv)
 {
 struct timeval t1, t2;
 bool Cflag = false, qflag = false, bflag = false;
-bool Pflag = false, zflag = false, cflag = false;
+bool Pflag = false, zflag = false, cflag = false, sflag = false;
 int flags = 0;
 int c, cnt, ret;
 char *buf = NULL;
@@ -983,8 +1013,9 @@ static int write_f(BlockBackend *blk, int argc, char 
**argv)
 /* Some compilers get confused and warn if this is not initialized.  */
 int64_t total = 0;
 int pattern = 0xcd;
+char *file_name;
 
-while ((c = getopt(argc, argv, "bcCfnpP:quz")) != -1) {
+while ((c = getopt(argc, argv, "bcCfnpP:quzs:")) != -1) {
 switch (c) {
 case 'b':
 bflag = true;
@@ -1020,6 +1051,10 @@ static int write_f(BlockBackend *blk, int argc, char 
**argv)
 case 'z':
 zflag = true;
 break;
+case 's':
+sflag = true;
+file_name = g_strdup(optarg);
+break;
 default:
 qemuio_command_usage(&write_cmd);
 return -EINVAL;
@@ -1056,6 +1091,14 @@ static int write_f(BlockBackend *blk, int argc, char 
**argv)
 return -EINVAL;
 }
 
+if (sflag && Pflag) {
+printf("-s and -P cannot be specified at the same time\n");
+}
+
+if (sflag && zflag) {
+printf("-s and -z cannot be specified at the same time\n");
+}
+
 offset = cvtnum(argv[optind]);
 if (offset < 0) {
 print_cvtnum_err(offset, argv[optind]);
@@ -1088,7 +1131,14 @@ static int write_f(BlockBackend *blk, int argc, char 
**argv)
 }
 
 if (!zflag) {
-buf = qemu_io_alloc(blk, count, pattern);
+if (sflag) {
+buf = qemu_io_alloc_from_file(blk, count, file_name);
+if (!buf) {
+return -EINVAL;
+}
+} else {
+buf = qemu_io_alloc(blk, count, pattern);
+}
 }
 
 gettimeofday(&t1, NULL);
-- 
2.17.0




Re: [Qemu-block] [PATCH v0] qemu-io: add pattern file for write command

2019-05-29 Thread Denis Plotnikov


On 29.05.2019 11:11, Vladimir Sementsov-Ogievskiy wrote:
> 29.05.2019 9:48, Denis Plotnikov wrote:
>> The patch allows to provide a pattern file for write
>> command. There was no similar ability before.
>>
>> Signed-off-by: Denis Plotnikov 
>> ---
>>qemu-io-cmds.c | 58 ++
>>1 file changed, 54 insertions(+), 4 deletions(-)
>>
>> diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
>> index 09750a23ce..b93955116f 100644
>> --- a/qemu-io-cmds.c
>> +++ b/qemu-io-cmds.c
>> @@ -21,6 +21,7 @@
>>#include "qemu/option.h"
>>#include "qemu/timer.h"
>>#include "qemu/cutils.h"
>> +#include "string.h"
>>
>>#define CMD_NOFILE_OK   0x01
>>
>> @@ -343,6 +344,35 @@ static void *qemu_io_alloc(BlockBackend *blk, size_t 
>> len, int pattern)
>>return buf;
>>}
>>
>> +static void *qemu_io_alloc_from_file(BlockBackend *blk, size_t len,
>> + char *file_name)
>> +{
>> +void *buf;
>> +FILE *f = fopen(file_name, "r");
>> +
>> +if (!f) {
>> +printf("cannot open file '%s'\n", file_name);
> 
> I see, printf fro errors is preexisting in qemu-io-cmds.c, but error_report 
> is used here too,
> I think it's better to use error_report.
write_f don't use error_report. I think it's worth to go with printf in 
this case - not to mix up printf-s with error_report
> 
>> +return NULL;
>> +}
>> +
>> +if (qemuio_misalign) {
>> +len += MISALIGN_OFFSET;
>> +}
>> +buf = blk_blockalign(blk, len);
>> +memset(buf, 0, len);
>> +
>> +if (!fread(buf, sizeof(char), len, f)) {
>> +printf("file '%s' is empty\n", file_name);
>> +free(buf);
>> +return NULL;
>> +}
>> +
>> +if (qemuio_misalign) {
>> +buf += MISALIGN_OFFSET;
> 
> hmm, preexisting in qemu_io_alloc and qemu_io_free, but If I remember 
> correctly, pointer arithmetic
> on void* is not guaranteed to work as expected here..
will change to g_malloc/g_free
> 
>> +}
>> +return buf;
>> +}
>> +
>>static void qemu_io_free(void *p)
>>{
>>if (qemuio_misalign) {
>> @@ -965,7 +995,7 @@ static const cmdinfo_t write_cmd = {
>>.perm   = BLK_PERM_WRITE,
>>.argmin = 2,
>>.argmax = -1,
>> -.args   = "[-bcCfnquz] [-P pattern] off len",
>> +.args   = "[-bcCfnquz] [-P pattern | -s source_file] off len",
>>.oneline= "writes a number of bytes at a specified offset",
>>.help   = write_help,
>>};
>> @@ -974,7 +1004,7 @@ static int write_f(BlockBackend *blk, int argc, char 
>> **argv)
>>{
>>struct timeval t1, t2;
>>bool Cflag = false, qflag = false, bflag = false;
>> -bool Pflag = false, zflag = false, cflag = false;
>> +bool Pflag = false, zflag = false, cflag = false, sflag = false;
>>int flags = 0;
>>int c, cnt, ret;
>>char *buf = NULL;
>> @@ -983,8 +1013,9 @@ static int write_f(BlockBackend *blk, int argc, char 
>> **argv)
>>/* Some compilers get confused and warn if this is not initialized.  
>> */
>>int64_t total = 0;
>>int pattern = 0xcd;
>> +char file_name[255] = { 0 };
>>
>> -while ((c = getopt(argc, argv, "bcCfnpP:quz")) != -1) {
>> +while ((c = getopt(argc, argv, "bcCfnpP:quzs:")) != -1) {
>>switch (c) {
>>case 'b':
>>bflag = true;
>> @@ -1020,6 +1051,10 @@ static int write_f(BlockBackend *blk, int argc, char 
>> **argv)
>>case 'z':
>>zflag = true;
>>break;
>> +case 's':
>> +sflag = true;
>> +strncpy(file_name, optarg, sizeof(file_name) - 1);
> 
> Maybe, g_strdup and don't care about file_name length?
yeah, that would be better
> 
>> +break;
>>default:
>>qemuio_command_usage(&write_cmd);
>>return -EINVAL;
>> @@ -1056,6 +1091,14 @@ static int write_f(BlockBackend *blk, int argc, char 
>> **argv)
>>return -EINVAL;
>>}
>>
>> +if (sflag && Pflag) {
>> +printf("-s and -P cannot be specified at the same time\n");
>> +}
>> +
>> +if (zflag && Pflag) {
>> +printf("-z and -P cannot be specified at the same time\n");
>> +}
yes, this is an error. Here should be zflag && sflag
> 
> strange, that you add this last check.. I see it in master branch already.
> 
>> +
>>offset = cvtnum(argv[optind]);
>>if (offset < 0) {
>>print_cvtnum_err(offset, argv[optind]);
>> @@ -1088,7 +1131,14 @@ static int write_f(BlockBackend *blk, int argc, char 
>> **argv)
>>}
>>
>>if (!zflag) {
>> -buf = qemu_io_alloc(blk, count, pattern);
>> +if (sflag) {
>> +buf = qemu_io_alloc_from_file(blk, count, file_name);
>> +if (!buf) {
>> +return -EINVAL;
>> +}
>> +} else {
>> +buf = qemu_io

Re: [Qemu-block] [PATCH] pr-manager-helper: fix pr process been killed when reconectting

2019-05-29 Thread Jie Wang
Hi, Paolo and Michal:

Let me add some details about this problem.


reappear steps:

1. in Host, execute the following command many times quickly:

"ps aux|grep helper|grep -v grep|grep -v qemu-kvm|awk '{print $2}';ps aux|grep 
helper|grep -v grep|grep -v qemu-kvm|awk '{print $2}'|xargs -n1 kill -9"

2. at the same time , execute PR command continuously in Guest

just execute step 1 and 2 for a moment, the problem will appear.


when the problem appeared:

1. qemu will initialize pr-helper and connect to it cyclically, but always 
failed because no running pr-helper process to connect.

2. libvirt will always waiting for connected event, but will never to start new 
pr-helper process because not receive disconnect event.


I'm not found the best way to solve this problem, can you give me some 
suggestion?


On 2019/5/29 15:33, Michal Privoznik wrote:
> On 5/28/19 7:45 PM, Paolo Bonzini wrote:
>> On 28/05/19 15:06, Jie Wang wrote:
>>> if pr-helper been killed and qemu send disconnect event to libvirt
>>> and libvirt started a new pr-helper process, the new pr-heleper
>>> been killed again when qemu is connectting to the new pr-helper,
>>> qemu will never send disconnect to libvirt, and libvirt will never
>>> receive connected event.
>>
>> I think this would let a guest "spam" events just by sending a lot PR
>> commands.  Also, as you said, in this case QEMU has never sent a
>> "connected" event, so I'm not sure why it should send a disconnection event.
>
> So pr manager is initialized on the first PR command and not when qemu is 
> starting?
>
> If a user inside the guest could somehow kill pr-helper process in the host 
> then yes, they could spam libvirt/qemu. But if a user from inside a guest can 
> kill a process in the host that is much bigger problem than spaming libvirt.
>
>>
>> Does libvirt monitor at all the pr-helper to check if it dies?  Or does
>> it rely exclusively on QEMU's events?
>
> Libvirt relies solely on QEMU's events. Just like with qemu process itself, 
> libvirt can't rely on SIGCHILD because the daemon might be restarted which 
> would reparent all qemu and pr-helper processes rendering libvirt wait for 
> SIGCHILD useless.
>
> But there is an exception to this: when libvirt is spawning pr-helper it does 
> so by following these steps:
>
> 1) Try to acquire (lock) pidfile
> 2) unlink(socket)
> 3) spawn pr-helper process (this yields child's PID)
> 4) wait some time until socket is created
> 5) some follow up work (move child's PID into same cgroup as qemu's main 
> thread, relabel the socket so that qemu can access it)
>
> If any of these steps fails then child is killed. However, the PID is not 
> recorded anywhere and thus is forgotten once control jumps out of the 
> function.
>
> Michal
>
> .
>


Re: [Qemu-block] [PATCH v0] qemu-io: add pattern file for write command

2019-05-29 Thread Vladimir Sementsov-Ogievskiy
29.05.2019 9:48, Denis Plotnikov wrote:
> The patch allows to provide a pattern file for write
> command. There was no similar ability before.
> 
> Signed-off-by: Denis Plotnikov 
> ---
>   qemu-io-cmds.c | 58 ++
>   1 file changed, 54 insertions(+), 4 deletions(-)
> 
> diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
> index 09750a23ce..b93955116f 100644
> --- a/qemu-io-cmds.c
> +++ b/qemu-io-cmds.c
> @@ -21,6 +21,7 @@
>   #include "qemu/option.h"
>   #include "qemu/timer.h"
>   #include "qemu/cutils.h"
> +#include "string.h"
>   
>   #define CMD_NOFILE_OK   0x01
>   
> @@ -343,6 +344,35 @@ static void *qemu_io_alloc(BlockBackend *blk, size_t 
> len, int pattern)
>   return buf;
>   }
>   
> +static void *qemu_io_alloc_from_file(BlockBackend *blk, size_t len,
> + char *file_name)
> +{
> +void *buf;
> +FILE *f = fopen(file_name, "r");
> +
> +if (!f) {
> +printf("cannot open file '%s'\n", file_name);

I see, printf fro errors is preexisting in qemu-io-cmds.c, but error_report is 
used here too,
I think it's better to use error_report.

> +return NULL;
> +}
> +
> +if (qemuio_misalign) {
> +len += MISALIGN_OFFSET;
> +}
> +buf = blk_blockalign(blk, len);
> +memset(buf, 0, len);
> +
> +if (!fread(buf, sizeof(char), len, f)) {
> +printf("file '%s' is empty\n", file_name);
> +free(buf);
> +return NULL;
> +}
> +
> +if (qemuio_misalign) {
> +buf += MISALIGN_OFFSET;

hmm, preexisting in qemu_io_alloc and qemu_io_free, but If I remember 
correctly, pointer arithmetic
on void* is not guaranteed to work as expected here..

> +}
> +return buf;
> +}
> +
>   static void qemu_io_free(void *p)
>   {
>   if (qemuio_misalign) {
> @@ -965,7 +995,7 @@ static const cmdinfo_t write_cmd = {
>   .perm   = BLK_PERM_WRITE,
>   .argmin = 2,
>   .argmax = -1,
> -.args   = "[-bcCfnquz] [-P pattern] off len",
> +.args   = "[-bcCfnquz] [-P pattern | -s source_file] off len",
>   .oneline= "writes a number of bytes at a specified offset",
>   .help   = write_help,
>   };
> @@ -974,7 +1004,7 @@ static int write_f(BlockBackend *blk, int argc, char 
> **argv)
>   {
>   struct timeval t1, t2;
>   bool Cflag = false, qflag = false, bflag = false;
> -bool Pflag = false, zflag = false, cflag = false;
> +bool Pflag = false, zflag = false, cflag = false, sflag = false;
>   int flags = 0;
>   int c, cnt, ret;
>   char *buf = NULL;
> @@ -983,8 +1013,9 @@ static int write_f(BlockBackend *blk, int argc, char 
> **argv)
>   /* Some compilers get confused and warn if this is not initialized.  */
>   int64_t total = 0;
>   int pattern = 0xcd;
> +char file_name[255] = { 0 };
>   
> -while ((c = getopt(argc, argv, "bcCfnpP:quz")) != -1) {
> +while ((c = getopt(argc, argv, "bcCfnpP:quzs:")) != -1) {
>   switch (c) {
>   case 'b':
>   bflag = true;
> @@ -1020,6 +1051,10 @@ static int write_f(BlockBackend *blk, int argc, char 
> **argv)
>   case 'z':
>   zflag = true;
>   break;
> +case 's':
> +sflag = true;
> +strncpy(file_name, optarg, sizeof(file_name) - 1);

Maybe, g_strdup and don't care about file_name length?

> +break;
>   default:
>   qemuio_command_usage(&write_cmd);
>   return -EINVAL;
> @@ -1056,6 +1091,14 @@ static int write_f(BlockBackend *blk, int argc, char 
> **argv)
>   return -EINVAL;
>   }
>   
> +if (sflag && Pflag) {
> +printf("-s and -P cannot be specified at the same time\n");
> +}
> +
> +if (zflag && Pflag) {
> +printf("-z and -P cannot be specified at the same time\n");
> +}

strange, that you add this last check.. I see it in master branch already.

> +
>   offset = cvtnum(argv[optind]);
>   if (offset < 0) {
>   print_cvtnum_err(offset, argv[optind]);
> @@ -1088,7 +1131,14 @@ static int write_f(BlockBackend *blk, int argc, char 
> **argv)
>   }
>   
>   if (!zflag) {
> -buf = qemu_io_alloc(blk, count, pattern);
> +if (sflag) {
> +buf = qemu_io_alloc_from_file(blk, count, file_name);
> +if (!buf) {
> +return -EINVAL;
> +}
> +} else {
> +buf = qemu_io_alloc(blk, count, pattern);
> +}
>   }
>   
>   gettimeofday(&t1, NULL);
> 


-- 
Best regards,
Vladimir


Re: [Qemu-block] [PATCH] pr-manager-helper: fix pr process been killed when reconectting

2019-05-29 Thread Michal Privoznik

On 5/28/19 7:45 PM, Paolo Bonzini wrote:

On 28/05/19 15:06, Jie Wang wrote:

if pr-helper been killed and qemu send disconnect event to libvirt
and libvirt started a new pr-helper process, the new pr-heleper
been killed again when qemu is connectting to the new pr-helper,
qemu will never send disconnect to libvirt, and libvirt will never
receive connected event.


I think this would let a guest "spam" events just by sending a lot PR
commands.  Also, as you said, in this case QEMU has never sent a
"connected" event, so I'm not sure why it should send a disconnection event.


So pr manager is initialized on the first PR command and not when qemu 
is starting?


If a user inside the guest could somehow kill pr-helper process in the 
host then yes, they could spam libvirt/qemu. But if a user from inside a 
guest can kill a process in the host that is much bigger problem than 
spaming libvirt.




Does libvirt monitor at all the pr-helper to check if it dies?  Or does
it rely exclusively on QEMU's events?


Libvirt relies solely on QEMU's events. Just like with qemu process 
itself, libvirt can't rely on SIGCHILD because the daemon might be 
restarted which would reparent all qemu and pr-helper processes 
rendering libvirt wait for SIGCHILD useless.


But there is an exception to this: when libvirt is spawning pr-helper it 
does so by following these steps:


1) Try to acquire (lock) pidfile
2) unlink(socket)
3) spawn pr-helper process (this yields child's PID)
4) wait some time until socket is created
5) some follow up work (move child's PID into same cgroup as qemu's main 
thread, relabel the socket so that qemu can access it)


If any of these steps fails then child is killed. However, the PID is 
not recorded anywhere and thus is forgotten once control jumps out of 
the function.


Michal



Re: [Qemu-block] [PATCH v6 3/3] block/stream: introduce a bottom node

2019-05-29 Thread Vladimir Sementsov-Ogievskiy
28.05.2019 20:33, Max Reitz wrote:
> On 06.05.19 17:34, Vladimir Sementsov-Ogievskiy wrote:
>> From: Andrey Shinkevich 
>>
>> The bottom node is the intermediate block device that has the base as its
>> backing image. It is used instead of the base node while a block stream
>> job is running to avoid dependency on the base that may change due to the
>> parallel jobs. The change may take place due to a filter node as well that
>> is inserted between the base and the intermediate bottom node. It occurs
>> when the base node is the top one for another commit or stream job.
>> After the introduction of the bottom node, don't freeze its backing child,
>> that's the base, anymore.
>>
>> Suggested-by: Vladimir Sementsov-Ogievskiy 
>> Signed-off-by: Andrey Shinkevich 
>> Reviewed-by: Vladimir Sementsov-Ogievskiy 
>> Reviewed-by: Alberto Garcia 
>> ---
>>   block/stream.c | 49 +-
>>   tests/qemu-iotests/245 |  4 ++--
>>   2 files changed, 27 insertions(+), 26 deletions(-)
>>
>> diff --git a/block/stream.c b/block/stream.c
>> index 65b13b27e0..fc97c89f81 100644
>> --- a/block/stream.c
>> +++ b/block/stream.c
> 
> [...]
> 
>> @@ -248,26 +250,25 @@ void stream_start(const char *job_id, BlockDriverState 
>> *bs,
>>* already have our own plans. Also don't allow resize as the image 
>> size is
>>* queried only at the job start and then cached. */
>>   s = block_job_create(job_id, &stream_job_driver, NULL, bs,
>> - BLK_PERM_CONSISTENT_READ | 
>> BLK_PERM_WRITE_UNCHANGED |
>> - BLK_PERM_GRAPH_MOD,
>> - BLK_PERM_CONSISTENT_READ | 
>> BLK_PERM_WRITE_UNCHANGED |
>> - BLK_PERM_WRITE,
>> + basic_flags | BLK_PERM_GRAPH_MOD,
>> + basic_flags | BLK_PERM_WRITE,
>>speed, creation_flags, NULL, NULL, errp);
>>   if (!s) {
>>   goto fail;
>>   }
>>   
>> -/* Block all intermediate nodes between bs and base, because they will
>> - * disappear from the chain after this operation. The streaming job 
>> reads
>> - * every block only once, assuming that it doesn't change, so block 
>> writes
>> - * and resizes. */
>> -for (iter = backing_bs(bs); iter && iter != base; iter = 
>> backing_bs(iter)) {
>> -block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
>> -   BLK_PERM_CONSISTENT_READ | 
>> BLK_PERM_WRITE_UNCHANGED,
>> -   &error_abort);
>> +/*
>> + * Block all intermediate nodes between bs and bottom (inclusive), 
>> because
>> + * they will disappear from the chain after this operation. The 
>> streaming
>> + * job reads every block only once, assuming that it doesn't change, so
>> + * forbid writes and resizes.
>> + */
>> +for (iter = bs; iter != bottom; iter = backing_bs(iter)) {
>> +block_job_add_bdrv(&s->common, "intermediate node", 
>> backing_bs(iter),
>> +   0, basic_flags, &error_abort);
> 
> I don’t understand this change.  Isn’t it doing exactly the same as before?
> 
> (If so, I just find it harder to read because every iteration isn’t
> about @iter but backing_bs(iter).)

Hm, it's the same, but not using base. We may save old loop if calculate base 
exactly before
the loop (or at least not separated from it by any yield-point)

> 
> The rest looks good to me.
> 
> Max
> 
>>   }
>>   
>> -s->base = base;
>> +s->bottom = bottom;
>>   s->backing_file_str = g_strdup(backing_file_str);
>>   s->bs_read_only = bs_read_only;
>>   s->chain_frozen = true;
> 


-- 
Best regards,
Vladimir