Re: [Qemu-block] [PATCH] iotests: fix 075 and 078

2017-11-21 Thread Lukáš Doktor
Dne 22.11.2017 v 01:16 John Snow napsal(a):
> Both of these tests are for formats which now stipulate that they are
> read-only. Adjust the tests to match.
> 
> Signed-off-by: John Snow 
> ---
>  tests/qemu-iotests/075 | 18 +-
>  tests/qemu-iotests/078 | 14 +++---
>  2 files changed, 16 insertions(+), 16 deletions(-)

Reviewed-by: Lukáš Doktor  

Yep, fixes the issue with "Opening cloop images without an explicit 
read-only=on option is deprecated. Future versions will refuse to open the 
image instead of automatically marking the image read-only.".

Lukáš
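
For anyone hitting the same deprecation warning outside the iotests,
being explicit about read-only looks roughly like this (illustrative
only; the image name is made up):

  $ qemu-io -r -c 'read 0 512' my-image.cloop
  $ qemu-system-x86_64 -drive file=my-image.cloop,format=cloop,read-only=on ...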





Re: [Qemu-block] [Qemu-devel] [PATCH] iotests: fix 075 and 078

2017-11-21 Thread Eric Blake
On 11/21/2017 06:16 PM, John Snow wrote:
> Both of these tests are for formats which now stipulate that they are
> read-only. Adjust the tests to match.
> 
> Signed-off-by: John Snow 
> ---
>  tests/qemu-iotests/075 | 18 +-
>  tests/qemu-iotests/078 | 14 +++---
>  2 files changed, 16 insertions(+), 16 deletions(-)
> 

Reviewed-by: Eric Blake 

Safe for 2.11, although now that -rc2 is out, it's also okay if this
slips into 2.12.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





[Qemu-block] [PATCH] iotests: fix 075 and 078

2017-11-21 Thread John Snow
Both of these tests are for formats which now stipulate that they are
read-only. Adjust the tests to match.

Signed-off-by: John Snow 
---
 tests/qemu-iotests/075 | 18 +-
 tests/qemu-iotests/078 | 14 +++---
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/tests/qemu-iotests/075 b/tests/qemu-iotests/075
index 770d51c6cb..caa30d4743 100755
--- a/tests/qemu-iotests/075
+++ b/tests/qemu-iotests/075
@@ -48,56 +48,56 @@ offsets_offset=136
 echo
 echo "== check that the first sector can be read =="
 _use_sample_img simple-pattern.cloop.bz2
-$QEMU_IO -c "read 0 512" $TEST_IMG 2>&1 | _filter_qemu_io | _filter_testdir
+$QEMU_IO -r -c "read 0 512" $TEST_IMG 2>&1 | _filter_qemu_io | _filter_testdir
 
 echo
 echo "== check that the last sector can be read =="
 _use_sample_img simple-pattern.cloop.bz2
-$QEMU_IO -c "read $((1024 * 1024 - 512)) 512" $TEST_IMG 2>&1 | _filter_qemu_io 
| _filter_testdir
+$QEMU_IO -r -c "read $((1024 * 1024 - 512)) 512" $TEST_IMG 2>&1 | 
_filter_qemu_io | _filter_testdir
 
 echo
 echo "== block_size must be a multiple of 512 =="
 _use_sample_img simple-pattern.cloop.bz2
 poke_file "$TEST_IMG" "$block_size_offset" "\x00\x00\x02\x01"
-$QEMU_IO -c "read 0 512" $TEST_IMG 2>&1 | _filter_qemu_io | _filter_testdir
+$QEMU_IO -r -c "read 0 512" $TEST_IMG 2>&1 | _filter_qemu_io | _filter_testdir
 
 echo
 echo "== block_size cannot be zero =="
 _use_sample_img simple-pattern.cloop.bz2
 poke_file "$TEST_IMG" "$block_size_offset" "\x00\x00\x00\x00"
-$QEMU_IO -c "read 0 512" $TEST_IMG 2>&1 | _filter_qemu_io | _filter_testdir
+$QEMU_IO -r -c "read 0 512" $TEST_IMG 2>&1 | _filter_qemu_io | _filter_testdir
 
 echo
 echo "== huge block_size ==="
 _use_sample_img simple-pattern.cloop.bz2
 poke_file "$TEST_IMG" "$block_size_offset" "\xff\xff\xfe\x00"
-$QEMU_IO -c "read 0 512" $TEST_IMG 2>&1 | _filter_qemu_io | _filter_testdir
+$QEMU_IO -r -c "read 0 512" $TEST_IMG 2>&1 | _filter_qemu_io | _filter_testdir
 
 echo
 echo "== offsets_size overflow ==="
 _use_sample_img simple-pattern.cloop.bz2
 poke_file "$TEST_IMG" "$n_blocks_offset" "\xff\xff\xff\xff"
-$QEMU_IO -c "read 0 512" $TEST_IMG 2>&1 | _filter_qemu_io | _filter_testdir
+$QEMU_IO -r -c "read 0 512" $TEST_IMG 2>&1 | _filter_qemu_io | _filter_testdir
 
 echo
 echo "== refuse images that require too many offsets ==="
 _use_sample_img simple-pattern.cloop.bz2
 poke_file "$TEST_IMG" "$n_blocks_offset" "\x04\x00\x00\x01"
-$QEMU_IO -c "read 0 512" $TEST_IMG 2>&1 | _filter_qemu_io | _filter_testdir
+$QEMU_IO -r -c "read 0 512" $TEST_IMG 2>&1 | _filter_qemu_io | _filter_testdir
 
 echo
 echo "== refuse images with non-monotonically increasing offsets =="
 _use_sample_img simple-pattern.cloop.bz2
 poke_file "$TEST_IMG" "$offsets_offset" "\x00\x00\x00\x00\xff\xff\xff\xff"
 poke_file "$TEST_IMG" $((offsets_offset + 8)) 
"\x00\x00\x00\x00\xff\xfe\x00\x00"
-$QEMU_IO -c "read 0 512" $TEST_IMG 2>&1 | _filter_qemu_io | _filter_testdir
+$QEMU_IO -r -c "read 0 512" $TEST_IMG 2>&1 | _filter_qemu_io | _filter_testdir
 
 echo
 echo "== refuse images with invalid compressed block size =="
 _use_sample_img simple-pattern.cloop.bz2
 poke_file "$TEST_IMG" "$offsets_offset" "\x00\x00\x00\x00\x00\x00\x00\x00"
 poke_file "$TEST_IMG" $((offsets_offset + 8)) 
"\xff\xff\xff\xff\xff\xff\xff\xff"
-$QEMU_IO -c "read 0 512" $TEST_IMG 2>&1 | _filter_qemu_io | _filter_testdir
+$QEMU_IO -r -c "read 0 512" $TEST_IMG 2>&1 | _filter_qemu_io | _filter_testdir
 
 # success, all done
 echo "*** done"
diff --git a/tests/qemu-iotests/078 b/tests/qemu-iotests/078
index f333e9ac84..a106c26f6b 100755
--- a/tests/qemu-iotests/078
+++ b/tests/qemu-iotests/078
@@ -48,41 +48,41 @@ disk_size_offset=$((0x58))
 echo
 echo "== Read from a valid image =="
 _use_sample_img empty.bochs.bz2
-{ $QEMU_IO -c "read 0 512" $TEST_IMG; } 2>&1 | _filter_qemu_io | 
_filter_testdir
+{ $QEMU_IO -r -c "read 0 512" $TEST_IMG; } 2>&1 | _filter_qemu_io | 
_filter_testdir
 
 echo
 echo "== Negative catalog size =="
 _use_sample_img empty.bochs.bz2
 poke_file "$TEST_IMG" "$catalog_size_offset" "\xff\xff\xff\xff"
-{ $QEMU_IO -c "read 0 512" $TEST_IMG; } 2>&1 | _filter_qemu_io | _filter_testdir
+{ $QEMU_IO -r -c "read 0 512" $TEST_IMG; } 2>&1 | _filter_qemu_io | _filter_testdir
 
 echo
 echo "== Overflow for catalog size * sizeof(uint32_t) =="
 _use_sample_img empty.bochs.bz2
 poke_file "$TEST_IMG" "$catalog_size_offset" "\x00\x00\x00\x40"
-{ $QEMU_IO -c "read 0 512" $TEST_IMG; } 2>&1 | _filter_qemu_io | _filter_testdir
+{ $QEMU_IO -r -c "read 0 512" $TEST_IMG; } 2>&1 | _filter_qemu_io | _filter_testdir
 
 echo
 echo "== Too small catalog bitmap for image size =="
 _use_sample_img empty.bochs.bz2
 poke_file "$TEST_IMG" "$disk_size_offset" "\x00\xc0\x0f\x00\x00\x00\x00\x7f"
-{ $QEMU_IO -c "read 2T 4k" $TEST_IMG; } 2>&1 | _filter_qemu_io | 
_filter_testdir
+{ $QEMU_IO -r -c "read 2T 4k" $TEST_IMG; } 2>&1 | _filter_qemu_io | 
_filter_testdir
 
 _use_sample_img empty.bochs.bz2
 poke_file "$TEST_I

Re: [Qemu-block] segfault in parallel blockjobs (iotest 30)

2017-11-21 Thread John Snow
CC Jeff Cody

... who may or may not be preoccupied with Thanksgiving travel now.

Convenient URL for reading past replies:
https://lists.nongnu.org/archive/html/qemu-devel/2017-11/msg03844.html

On 11/21/2017 10:31 AM, Alberto Garcia wrote:
> On Tue 21 Nov 2017 04:18:13 PM CET, Anton Nefedov wrote:
> 
 Or, perhaps another approach, keep BlockJob referenced while it is
 paused (by block_job_pause/resume_all()). That should prevent it
 from deleting the BB.
>>>
>>> Yes, I tried this and it actually solves the issue. But I still think
>>> that the problem is that block jobs are allowed to finish when they
>>> are paused.
>>
>> Agree, but
>>
>>> Adding block_job_pause_point(&s->common) at the end of stream_run()
>>> fixes the problem too.
>>
>> would be a nice fix, but it only works unless the job is already
>> deferred, right?
> 
> Right, I didn't mean to propose it as the proper solution (it would
> still leave mirror job vulnerable because it's already paused by the
> time it calls defer_to_main_loop()).
> 
>> This:
>>
>>  >> keep BlockJob referenced while it is
>>  >> paused (by block_job_pause/resume_all()). That should prevent it from
>>  >> deleting the BB.
>>
>> looks kind of hacky; maybe referencing in block_job_pause() (and not
>> just pause_all) seems more correct? I think it didn't work for me
>> right away though. But I can look more.
> 
> You have to be careful when you unref the block job because you may
> destroy it, and therefore block_job_next() in block_job_resume_all()
> would be using freed memory.
> 
> Berto
> 
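
For reference, the idea under discussion might look roughly like this
(a sketch against the 2.11-era blockjob API, written from memory; it
omits the AioContext locking the real pause/resume_all loops do, and
the ref/unref placement is exactly the part being debated):

  /* Sketch: pin each paused job so pausing cannot let its BB go away. */
  void block_job_pause_all(void)
  {
      BlockJob *job = NULL;

      while ((job = block_job_next(job))) {
          block_job_ref(job);     /* keep job (and its BB) alive while paused */
          block_job_pause(job);
      }
  }

  void block_job_resume_all(void)
  {
      BlockJob *job, *next;

      for (job = block_job_next(NULL); job; job = next) {
          next = block_job_next(job);  /* fetch next before unref can free job */
          block_job_resume(job);
          block_job_unref(job);
      }
  }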



Re: [Qemu-block] [Qemu-devel] [PATCH for-2.12 0/4] qmp dirty bitmap API

2017-11-21 Thread John Snow


On 11/21/2017 12:23 PM, Kevin Wolf wrote:
> Am 17.11.2017 um 22:35 hat John Snow geschrieben:
> usage is like this:
>
> 1. we have dirty bitmap bitmap0 for incremental backup.
>
> 2. prepare image fleecing (create temporary image with backing=our_disk)
> 3. in qmp transaction:
>     - disable bitmap0
>     - create bitmap1
>     - start image fleecing (backup sync=none our_disk -> temp_disk)
 This could probably just be its own command, though:

 block-job-fleece node=foobar bitmap=bitmap0 etc=etera etc=etera

 Could handle forking the bitmap. I'm not sure what the arguments would
 look like, but we could name the NBD export here, too. (Assuming the
 server is already started and we just need to create the share.)

 Then, we can basically do what mirror does:

 (1) Cancel
 (2) Complete

 Cancel would instruct QEMU to keep the bitmap changes (i.e. roll back),
 and Complete would instruct QEMU to discard the changes.

 This way we don't need to expose commands like split or merge that will
 almost always be dangerous to use over QMP.

 In fact, a fleecing job would be really convenient even without a
 bitmap, because it'd still be nice to have a convenience command for it.
 Using an existing infrastructure and understood paradigm is just a bonus.
>>>
>>> 1. If I understand correctly, Kevin and Max said in their report in
>>> Prague about new block-job approach,
>>>   using filter nodes, so I'm not sure that this is a good Idea to
>>> introduce now new old-style block-job, where we can
>>>   do without it.
>>
>> We could do without it, but it might be a lot better to have everything
>> wrapped up in a command that's easy to digest instead of releasing 10
>> smaller commands that have to be executed in a very specific way in
>> order to work correctly.
>>
>> I'm thinking about the complexity of error checking here with all the
>> smaller commands, versus error checking on a larger workflow we
>> understand and can quality test better.
>>
>> I'm not sure that filter nodes becoming the new normal for block jobs
>> precludes our ability to use the job-management API as a handle for
>> managing the lifetime of a long-running task like fleecing, but I'll
>> check with Max and Kevin about this.
> 
> We have a general tendency at least in the block layer, but in fact I
> think in qemu in general, to switch from exposing high-level
> functionality to exposing lower-level operations via QAPI.
> 

I am aware of that, yeah. I worry about going too far to the other
extreme in some cases. Even at the API level, where we don't care about
the feelings of (or the ease of use by) a robot, if a certain action
requires several API commands to be issued in a very specific order,
that increases our test matrix and the complexity of the management
API.

There's a middle ground, I think.

"Fleecing" is one of those cases where we can already fleece today with
component commands, but a composite command that encapsulates that
functionality would be helpful.

In this case, I worry about adding low-level commands for bitmaps that
will almost always be incorrect to use except in conjunction with other
commands -- and even then generally only useful when issued via
transaction specifically.

(You might be able to make the case to me that we should add these
commands but ONLY as transaction primitives, foregoing their traditional
standalone QMP command counterparts.)

If I capitulate and let the more targeted primitives into QEMU instead
of an encompassing job, it means a higher number of QMP commands
overall, more tests, and more interfaces to test and maintain.

Maybe I am being wrong-headed, but I actually think a new job would
give us less to maintain, test and verify than several new primitives
would. Considering that these primitives will in general only be used
in transactions with other commands anyway, that only strengthens the
case that the right paradigm here is a new job, not more transaction
primitives.

...maybe. I won't draw a line in the sand, but it's an approach I would
like you to consider.

> If we expose high-level commands, then every new use case will require a
> change in both qemu and libvirt. With low-level commands, often libvirt
> already has all of the required tools to implement what it needs.
> 

I am receptive to how "big" commands often need to change frequently,
though. Primitives certainly have a purity of design about them that
larger job commands do not possess.

> So I do see good reasons for exposing low-level commands.
> 
> 
> On another note, I would double check before adding a new block job type
> that this is the right way to go. We have recently noticed that many, if
> not all, of the existing block jobs (at least mirror, commit and backup)
> are so similar that they implement the same things multiple times and
> are just lacking different 

[Qemu-block] qemu-iotest 059 vmdk failure

2017-11-21 Thread John Snow
The last sub-test in 059, which uses an AFL fuzzer image to test how a
large L1 table of a specific size is handled, has a slight regression.

Previously, QEMU expected -EFBIG to come out of the vmdk_open() call.
Now we get -EINVAL. Not too ominous.

Now, QEMU actually allocates the L1 table (1.6GB) and gets a little
further, only to eventually trip up here:

#0  vmdk_read_cid (bs=0x55d4c5a0, parent=0, pcid=0x55d5786c) at
/home/bos/jhuston/src/qemu/block/vmdk.c:272
#1  0x555947cb in vmdk_open (bs=0x55d4c5a0,
options=0x55d509b0, flags=65536, errp=0x7fffd950)
at /home/bos/jhuston/src/qemu/block/vmdk.c:989

where p_name is NULL, so we return -EINVAL back up the stack. We get all
the way through vmdk_open_sparse --> vmdk_open_vmdk4 before we
eventually catch an error in the header and return a fairly nondescript
EINVAL which doesn't tell the end user much.

We don't error out earlier because vmdk_add_extent() checks whether we
exceed 512*1024*1024 ... entries, which means that later, when we get
to vmdk_init_tables(), we multiply this number by four (the size of
each L1 entry) again, so our actual maximum L1 table size in bytes is
2GiB, not 512MiB.

(Maybe you knew that; I didn't, so enjoy the useless facts. The comment
in the code seemed to suggest the size was a literal size in bytes, but
it's treated more as the number of L1 entries, which is different.)

Anyway, it still seems slightly silly that we don't catch obviously
bogus images a little sooner, so I want to rewind and see why we don't
get -EFBIG anymore.
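
A sketch of what catching it sooner could look like in
vmdk_add_extent(), i.e. bounding the table by its allocated size in
bytes rather than by the raw entry count (names follow block/vmdk.c,
but this is illustrative, not the actual code):

  /* Sketch: limit the L1 table by bytes (entries * sizeof(uint32_t)),
   * so a bogus header fails with -EFBIG before we try to allocate it. */
  if (l1_size > (512 * 1024 * 1024) / sizeof(uint32_t)) {
      error_setg(errp, "L1 size too big");
      return -EFBIG;
  }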

The output changed as of commit 9877860e, which adds the p_name check,
so we appear to actually be failing even earlier -- just not before we
apparently load up a 2GB L1 table from a file that clocks in at less
than 1KB.

(Wait, why does that work? Am I tracing this wrong? -EOUTTATIME)

Anyway, we can fix this by amending the reference test output, or we
might want to shore up the vmdk input validation a bit.

Not a regression (the test was already broken in v2.10), so it's not
critical to fix during the rc phase, I think.

Anyway, I need to go prepare some food for Thanksgiving, so I won't come
back to this one for a few days. If it's not important to anyone else,
I'll get back around to it eventually.

--js



[Qemu-block] qemu iotest 020 failing for vmdk after 2b7731938d9

2017-11-21 Thread John Snow
Commit 2b7731938d9 adds a blkdebug driver test for failing commits, but
the vmdk driver doesn't appear to appreciate this format:

+_qemu_img_wrapper create -f vmdk -b "json:{'driver': 'raw',
+ 'file': {
+ 'driver': 'blkdebug',
+ 'inject-error': [{
+ 'event': 'write_aio',
+ 'errno': 28,
+ 'once': true
+ }],
+ 'image': {
+ 'driver': 'null-co'
+ }}}"
"/home/bos/jhuston/src/qemu/bin/git/tests/qemu-iotests/scratch/t.vmdk"
+qemu-img: TEST_DIR/t.IMGFMT: Could not create image: Invalid argument


...so:

(A) VMDK should be dropped from 020, or
(B) This sub-test should be rewritten, or
(C) This sub-test should be split out into a new unit where VMDK can be
dropped.

I don't like (A) very much because I like testing our weird formats
when possible, and I don't like (B) very much because I don't really
like wrangling QMP commands inside of the bash unit tests.

(C) could work, though it's odd to have it away from its kin in 020.

Opinions?
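
For what it's worth, (A) would be a one-liner near the top of 020
(assuming common.rc's _unsupported_fmt helper; sketch only):

  # skip this test entirely for vmdk
  _unsupported_fmt vmdk

and (C) would start from the same _supported_fmt boilerplate in a
fresh test file, just without vmdk in the list.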

--js



Re: [Qemu-block] [PULL 0/4] Late blockjob / coroutine patches for -rc2

2017-11-21 Thread Peter Maydell
On 21 November 2017 at 17:03, Jeff Cody  wrote:
> The following changes since commit 7c3d1917fd7a3de906170fa3d6d3d4c5918b1e49:
>
>   build: disarm the TCG unit test trap (2017-11-21 15:42:47 +)
>
> are available in the git repository at:
>
>   git://github.com/codyprime/qemu-kvm-jtc.git tags/block-pull-request
>
> for you to fetch changes up to d975301dc8ae56fb3154348878e47a6211843c0b:
>
>   qemu-iotest: add test for blockjob coroutine race condition (2017-11-21 
> 11:58:12 -0500)
>
> 
> Late fix for blockjob / coroutine segfaults w/iothreads
> 
>

Applied, thanks.

-- PMM



Re: [Qemu-block] [PATCH 2/5] qcow2: multiple clusters write compressed

2017-11-21 Thread Kevin Wolf
Am 15.11.2017 um 17:30 hat Max Reitz geschrieben:
> On 2017-11-15 17:28, Anton Nefedov wrote:
> > On 15/11/2017 6:11 PM, Max Reitz wrote:
> >> On 2017-11-14 11:16, Anton Nefedov wrote:
> >>> From: Pavel Butsykin 
> >>>
> >>> At the moment, qcow2_co_pwritev_compressed can process the requests size
> >>> less than or equal to one cluster. This patch added possibility to write
> >>> compressed data in the QCOW2 more than one cluster. The implementation
> >>> is simple, we just split large requests into separate clusters and write
> >>> using existing functionality. For unaligned requests we use a workaround
> >>> and do write data without compression.
> >>>
> >>> Signed-off-by: Anton Nefedov 
> >>> ---
> >>>   block/qcow2.c | 77
> >>> +++
> >>>   1 file changed, 56 insertions(+), 21 deletions(-)
> >>
> >> On one hand, it might be better to do this centrally somewhere in
> >> block/io.c.  On the other, that would require more work because it would
> >> probably mean having to introduce another field in BlockLimits, and it
> >> wouldn't do much -- because qcow (v1) is, well, qcow v1...  And vmdk
> >> seems to completely not care about this single cluster limitation.  So
> >> for now we probably wouldn't even gain anything by doing this in
> >> block/io.c.
> >>
> >> So long story short, it's OK to do this locally in qcow2, yes.
> >>
> > 
> > [..]
> > 
> >>> +    qemu_iovec_reset(&hd_qiov);
> >>> +    chunk_size = MIN(bytes, s->cluster_size);
> >>> +    qemu_iovec_concat(&hd_qiov, qiov, curr_off, chunk_size);
> >>> +
> >>> +    ret = qcow2_co_pwritev_cluster_compressed(bs, offset +
> >>> curr_off,
> >>> +  chunk_size,
> >>> &hd_qiov);
> >>> +    if (ret == -ENOTSUP) {
> >>
> >> Why this?  I mean, I can see the appeal, but then we should probably
> >> actually return -ENOTSUP somewhere (e.g. for unaligned clusters and the
> >> like) -- and we should not abort if offset_into_cluster(s, cluster) is
> >> true, but we should write the header uncompressed and compress the main
> >> bulk.
> >>
> >> Max
> >>
> > 
> > Right, sorry, missed this part when porting the patch.
> > 
> > I think this needs to be removed (and the commit message fixed
> > accordingly).
> > Returning an error, rather than silently falling back to uncompressed
> > seems preferable to me. "Compressing writers" (backup, img convert and
> > now stream) are aware that they have to cluster-align, and if they fail
> > to do so that means there is an error somewhere.
> 
> OK for me.
> 
> > If it won't return an error anymore, then the unaligned tail shouldn't
> > either. So we can end up 'letting' the callers send small unaligned
> > requests which will never get compressed.
> 
> Either way is fine.  It just looks to me like vmdk falls back to
> uncompressed writes, so it's not like it would be completely new behavior...
> 
> (But I won't judge whether what vmdk does is a good idea.)

Probably not.

If we let io.c know about the cluster-size alignment requirement for
compressed writes, the usual RMW code path could kick in. Wouldn't this
actually be a better solution than uncompressed writes or erroring out?

In fact, with this, we might even be very close to an option that turns
every write into a compressed write, so you get images that stay
compressed even while they are in use.
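
To make the io.c idea concrete, the hint could be a new BlockLimits
field that compressed-write-capable drivers fill in from
.bdrv_refresh_limits(); the field below is invented for illustration
and does not exist today:

  /* Hypothetical sketch, e.g. in qcow2_refresh_limits(): advertise the
   * alignment needed for compressed writes so the generic layer could
   * apply its usual read-modify-write handling. */
  bs->bl.pwritev_compressed_alignment = s->cluster_size;  /* invented field */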

Kevin




Re: [Qemu-block] [Qemu-devel] [PATCH for-2.12 0/4] qmp dirty bitmap API

2017-11-21 Thread Kevin Wolf
Am 17.11.2017 um 22:35 hat John Snow geschrieben:
> >>> usage is like this:
> >>>
> >>> 1. we have dirty bitmap bitmap0 for incremental backup.
> >>>
> >>> 2. prepare image fleecing (create temporary image with backing=our_disk)
> >>> 3. in qmp transaction:
> >>>     - disable bitmap0
> >>>     - create bitmap1
> >>>     - start image fleecing (backup sync=none our_disk -> temp_disk)
> >> This could probably just be its own command, though:
> >>
> >> block-job-fleece node=foobar bitmap=bitmap0 etc=etera etc=etera
> >>
> >> Could handle forking the bitmap. I'm not sure what the arguments would
> >> look like, but we could name the NBD export here, too. (Assuming the
> >> server is already started and we just need to create the share.)
> >>
> >> Then, we can basically do what mirror does:
> >>
> >> (1) Cancel
> >> (2) Complete
> >>
> >> Cancel would instruct QEMU to keep the bitmap changes (i.e. roll back),
> >> and Complete would instruct QEMU to discard the changes.
> >>
> >> This way we don't need to expose commands like split or merge that will
> >> almost always be dangerous to use over QMP.
> >>
> >> In fact, a fleecing job would be really convenient even without a
> >> bitmap, because it'd still be nice to have a convenience command for it.
> >> Using an existing infrastructure and understood paradigm is just a bonus.
> > 
> > 1. If I understand correctly, Kevin and Max said in their report in
> > Prague about new block-job approach,
> >   using filter nodes, so I'm not sure that this is a good Idea to
> > introduce now new old-style block-job, where we can
> >   do without it.
> 
> We could do without it, but it might be a lot better to have everything
> wrapped up in a command that's easy to digest instead of releasing 10
> smaller commands that have to be executed in a very specific way in
> order to work correctly.
> 
> I'm thinking about the complexity of error checking here with all the
> smaller commands, versus error checking on a larger workflow we
> understand and can quality test better.
> 
> I'm not sure that filter nodes becoming the new normal for block jobs
> precludes our ability to use the job-management API as a handle for
> managing the lifetime of a long-running task like fleecing, but I'll
> check with Max and Kevin about this.

We have a general tendency at least in the block layer, but in fact I
think in qemu in general, to switch from exposing high-level
functionality to exposing lower-level operations via QAPI.

If we expose high-level commands, then every new use case will require a
change in both qemu and libvirt. With low-level commands, often libvirt
already has all of the required tools to implement what it needs.

So I do see good reasons for exposing low-level commands.


On another note, I would double check before adding a new block job type
that this is the right way to go. We have recently noticed that many, if
not all, of the existing block jobs (at least mirror, commit and backup)
are so similar that they implement the same things multiple times and
are just lacking different options and have different bugs. I'm
seriously considering merging all of them into a single job type
internally that just provides options that effectively turn it into one
of the existing job types.

So even if we want to tie the bitmap management to a block job, we
should consider adding it as an option to the existing backup job rather
than adding a completely new fourth job type that again does almost the
same except for some bitmap management stuff on completion.
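
For readers trying to follow the quoted usage at the top of this mail,
the three-step transaction would look something like this in QMP (a
sketch: the bitmap-disable action is only a proposal at this point, and
the node/bitmap/target names are made up):

  { "execute": "transaction",
    "arguments": { "actions": [
      { "type": "block-dirty-bitmap-disable",
        "data": { "node": "drive0", "name": "bitmap0" } },
      { "type": "block-dirty-bitmap-add",
        "data": { "node": "drive0", "name": "bitmap1" } },
      { "type": "drive-backup",
        "data": { "device": "drive0", "target": "temp.qcow2",
                  "sync": "none", "mode": "existing" } } ] } }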

Kevin



Re: [Qemu-block] [PULL 0/7] Block layer patches for 2.11.0-rc2

2017-11-21 Thread Peter Maydell
On 21 November 2017 at 15:10, Kevin Wolf  wrote:
> The following changes since commit d0dead3b6df7f6cd970ed02e8369ab8730aac9d3:
>
>   scripts/make-release: ship u-boot source as a tarball (2017-11-21 12:48:20 
> +)
>
> are available in the git repository at:
>
>   git://repo.or.cz/qemu/kevin.git tags/for-upstream
>
> for you to fetch changes up to 4fd0295c151e596944be2f8062c4563cdf9106a0:
>
>   Merge remote-tracking branch 'mreitz/tags/pull-block-2017-11-21' into 
> queue-block (2017-11-21 15:09:54 +0100)
>
> 
> Block layer patches for 2.11.0-rc2
>
> 

Applied, thanks.

-- PMM



[Qemu-block] [PULL 3/4] qemu-iotests: add option in common.qemu for mismatch only

2017-11-21 Thread Jeff Cody
Add an option to echo the response to a QMP / HMP command only on
mismatch.

Useful for ignoring all normal responses while still catching things
like segfaults.

Signed-off-by: Jeff Cody 
Reviewed-by: Stefan Hajnoczi 
---
 tests/qemu-iotests/common.qemu | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/common.qemu b/tests/qemu-iotests/common.qemu
index 7b3052d..85f66b8 100644
--- a/tests/qemu-iotests/common.qemu
+++ b/tests/qemu-iotests/common.qemu
@@ -50,6 +50,8 @@ _in_fd=4
 #
 # If $silent is set to anything but an empty string, then
 # response is not echoed out.
+# If $mismatch_only is set, only non-matching responses will
+# be echoed.
 function _timed_wait_for()
 {
 local h=${1}
@@ -58,14 +60,18 @@ function _timed_wait_for()
 QEMU_STATUS[$h]=0
 while IFS= read -t ${QEMU_COMM_TIMEOUT} resp <&${QEMU_OUT[$h]}
 do
-if [ -z "${silent}" ]; then
+if [ -z "${silent}" ] && [ -z "${mismatch_only}" ]; then
 echo "${resp}" | _filter_testdir | _filter_qemu \
| _filter_qemu_io | _filter_qmp | _filter_hmp
 fi
 grep -q "${*}" < <(echo "${resp}")
 if [ $? -eq 0 ]; then
 return
+elif [ -z "${silent}" ] && [ -n "${mismatch_only}" ]; then
+echo "${resp}" | _filter_testdir | _filter_qemu \
+   | _filter_qemu_io | _filter_qmp | _filter_hmp
 fi
+
 done
 QEMU_STATUS[$h]=-1
 if [ -z "${qemu_error_no_exit}" ]; then
-- 
2.9.5
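
For context, the test added by PULL 4/4 in this series uses the new
knob as a per-call environment prefix, along these lines (paraphrased
from that test):

  mismatch_only='y' qemu_error_no_exit='n' _send_qemu_cmd $h1 \
      "{ 'execute': 'block-job-cancel', 'arguments': { 'device': 'job-1' } }" \
      "{.*{.*}.*}"   # only responses that do NOT match get echoed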




[Qemu-block] [PULL 4/4] qemu-iotest: add test for blockjob coroutine race condition

2017-11-21 Thread Jeff Cody
Signed-off-by: Jeff Cody 
Reviewed-by: Stefan Hajnoczi 
---
 tests/qemu-iotests/200 | 99 ++
 tests/qemu-iotests/200.out | 14 +++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 114 insertions(+)
 create mode 100755 tests/qemu-iotests/200
 create mode 100644 tests/qemu-iotests/200.out

diff --git a/tests/qemu-iotests/200 b/tests/qemu-iotests/200
new file mode 100755
index 000..d8787dd
--- /dev/null
+++ b/tests/qemu-iotests/200
@@ -0,0 +1,99 @@
+#!/bin/bash
+#
+# Block job co-routine race condition test.
+#
+# See: https://bugzilla.redhat.com/show_bug.cgi?id=1508708
+#
+# Copyright (C) 2017 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=jc...@redhat.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+status=1# failure is the default!
+
+_cleanup()
+{
+_cleanup_qemu
+rm -f "${TEST_IMG}" "${BACKING_IMG}"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+. ./common.qemu
+
+_supported_fmt qcow2 qed
+_supported_proto file
+_supported_os Linux
+
+BACKING_IMG="${TEST_DIR}/backing.img"
+TEST_IMG="${TEST_DIR}/test.img"
+
+${QEMU_IMG} create -f $IMGFMT "${BACKING_IMG}" 512M | _filter_img_create
+${QEMU_IMG} create -f $IMGFMT -F $IMGFMT "${TEST_IMG}" -b "${BACKING_IMG}" 512M | _filter_img_create
+
+${QEMU_IO} -c "write -P 0xa5 512 300M" "${BACKING_IMG}" | _filter_qemu_io
+
+echo
+echo === Starting QEMU VM ===
+echo
+qemu_comm_method="qmp"
+_launch_qemu -device pci-bridge,id=bridge1,chassis_nr=1,bus=pci.0 \
+ -object iothread,id=iothread0 \
+ -device virtio-scsi-pci,bus=bridge1,addr=0x1f,id=scsi0,iothread=iothread0 \
+ -drive file="${TEST_IMG}",media=disk,if=none,cache=none,id=drive_sysdisk,aio=native,format=$IMGFMT \
+ -device scsi-hd,drive=drive_sysdisk,bus=scsi0.0,id=sysdisk,bootindex=0
+h1=$QEMU_HANDLE
+
+_send_qemu_cmd $h1 "{ 'execute': 'qmp_capabilities' }" 'return'
+
+echo
+echo === Sending stream/cancel, checking for SIGSEGV only  ===
+echo
+for (( i=1;i<500;i++ ))
+do
+mismatch_only='y' qemu_error_no_exit='n' _send_qemu_cmd $h1 \
+   "{
+'execute': 'block-stream',
+'arguments': {
+'device': 'drive_sysdisk',
+'speed': 1000,
+'on-error': 'report',
+'job-id': 'job-$i'
+ }
+}
+{
+'execute': 'block-job-cancel',
+'arguments': {
+'device': 'job-$i'
+ }
+}" \
+   "{.*{.*}.*}"  # should match all well-formed QMP 
responses
+done
+
+silent='y' _send_qemu_cmd $h1  "{ 'execute': 'quit' }" 'return'
+
+echo "$i iterations performed"
+
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/200.out b/tests/qemu-iotests/200.out
new file mode 100644
index 000..af6a809
--- /dev/null
+++ b/tests/qemu-iotests/200.out
@@ -0,0 +1,14 @@
+QA output created by 200
+Formatting 'TEST_DIR/backing.img', fmt=IMGFMT size=536870912
+Formatting 'TEST_DIR/test.img', fmt=IMGFMT size=536870912 backing_file=TEST_DIR/backing.img backing_fmt=IMGFMT
+wrote 314572800/314572800 bytes at offset 512
+300 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+=== Starting QEMU VM ===
+
+{"return": {}}
+
+=== Sending stream/cancel, checking for SIGSEGV only ===
+
+500 iterations performed
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 1fad602..3e68867 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -196,3 +196,4 @@
 196 rw auto quick
 197 rw auto quick
 198 rw auto
+200 rw auto
-- 
2.9.5




[Qemu-block] [PULL 2/4] coroutine: abort if we try to schedule or enter a pending coroutine

2017-11-21 Thread Jeff Cody
The previous patch fixed a race condition in which coroutines were
being executed doubly, or after coroutine deletion.

We can detect common scenarios when this happens, and print an error
message and abort before we corrupt memory / data, or segfault.

This patch will abort if an attempt to enter a coroutine is made while
it is currently pending execution, either in a specific AioContext bh,
or pending execution via a timer.  It will also abort if a coroutine
is scheduled, before a prior scheduled run has occurred.

We cannot rely on the existing co->caller check for recursive re-entry
to catch this, as the coroutine may run and exit with
COROUTINE_TERMINATE before the scheduled coroutine executes.

(This is the scenario that was occurring and fixed in the previous
patch).

This patch also re-orders the Coroutine struct elements in an attempt to
optimize caching.

Signed-off-by: Jeff Cody 
Reviewed-by: Stefan Hajnoczi 
---
 include/qemu/coroutine_int.h | 13 ++---
 util/async.c | 13 +
 util/qemu-coroutine-sleep.c  | 12 
 util/qemu-coroutine.c| 14 ++
 4 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
index cb98892..59e8406 100644
--- a/include/qemu/coroutine_int.h
+++ b/include/qemu/coroutine_int.h
@@ -46,14 +46,21 @@ struct Coroutine {
 
 size_t locks_held;
 
+/* Only used when the coroutine has yielded.  */
+AioContext *ctx;
+
+/* Used to catch and abort on illegal co-routine entry.
+ * Will contain the name of the function that had first
+ * scheduled the coroutine. */
+const char *scheduled;
+
+QSIMPLEQ_ENTRY(Coroutine) co_queue_next;
+
 /* Coroutines that should be woken up when we yield or terminate.
  * Only used when the coroutine is running.
  */
 QSIMPLEQ_HEAD(, Coroutine) co_queue_wakeup;
 
-/* Only used when the coroutine has yielded.  */
-AioContext *ctx;
-QSIMPLEQ_ENTRY(Coroutine) co_queue_next;
 QSLIST_ENTRY(Coroutine) co_scheduled_next;
 };
 
diff --git a/util/async.c b/util/async.c
index 0e1bd87..4dd9d95 100644
--- a/util/async.c
+++ b/util/async.c
@@ -388,6 +388,9 @@ static void co_schedule_bh_cb(void *opaque)
 QSLIST_REMOVE_HEAD(&straight, co_scheduled_next);
 trace_aio_co_schedule_bh_cb(ctx, co);
 aio_context_acquire(ctx);
+
+/* Protected by write barrier in qemu_aio_coroutine_enter */
+atomic_set(&co->scheduled, NULL);
 qemu_coroutine_enter(co);
 aio_context_release(ctx);
 }
@@ -438,6 +441,16 @@ fail:
 void aio_co_schedule(AioContext *ctx, Coroutine *co)
 {
 trace_aio_co_schedule(ctx, co);
+const char *scheduled = atomic_cmpxchg(&co->scheduled, NULL,
+   __func__);
+
+if (scheduled) {
+fprintf(stderr,
+"%s: Co-routine was already scheduled in '%s'\n",
+__func__, scheduled);
+abort();
+}
+
 QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines,
   co, co_scheduled_next);
 qemu_bh_schedule(ctx->co_schedule_bh);
diff --git a/util/qemu-coroutine-sleep.c b/util/qemu-coroutine-sleep.c
index 9c56550..254349c 100644
--- a/util/qemu-coroutine-sleep.c
+++ b/util/qemu-coroutine-sleep.c
@@ -13,6 +13,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/coroutine.h"
+#include "qemu/coroutine_int.h"
 #include "qemu/timer.h"
 #include "block/aio.h"
 
@@ -25,6 +26,8 @@ static void co_sleep_cb(void *opaque)
 {
 CoSleepCB *sleep_cb = opaque;
 
+/* Write of schedule protected by barrier write in aio_co_schedule */
+atomic_set(&sleep_cb->co->scheduled, NULL);
 aio_co_wake(sleep_cb->co);
 }
 
@@ -34,6 +37,15 @@ void coroutine_fn co_aio_sleep_ns(AioContext *ctx, QEMUClockType type,
 CoSleepCB sleep_cb = {
 .co = qemu_coroutine_self(),
 };
+
+const char *scheduled = atomic_cmpxchg(&sleep_cb.co->scheduled, NULL,
+   __func__);
+if (scheduled) {
+fprintf(stderr,
+"%s: Co-routine was already scheduled in '%s'\n",
+__func__, scheduled);
+abort();
+}
 sleep_cb.ts = aio_timer_new(ctx, type, SCALE_NS, co_sleep_cb, &sleep_cb);
 timer_mod(sleep_cb.ts, qemu_clock_get_ns(type) + ns);
 qemu_coroutine_yield();
diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
index d6095c1..9eff7fd 100644
--- a/util/qemu-coroutine.c
+++ b/util/qemu-coroutine.c
@@ -107,8 +107,22 @@ void qemu_aio_coroutine_enter(AioContext *ctx, Coroutine *co)
 Coroutine *self = qemu_coroutine_self();
 CoroutineAction ret;
 
+/* Cannot rely on the read barrier for co in aio_co_wake(), as there are
+ * callers outside of aio_co_wake() */
+const char *scheduled = atomic_mb_read(&co->scheduled);
+
 trace_qemu_aio_coroutine_enter(ctx, self, co, co->entry_arg);
 
+/* if the Coroutine

[Qemu-block] [PULL 1/4] blockjob: do not allow coroutine double entry or entry-after-completion

2017-11-21 Thread Jeff Cody
When block_job_sleep_ns() is called, the co-routine is scheduled for
future execution.  If we allow the job to be re-entered prior to the
scheduled time, we present a race condition in which a coroutine can be
entered recursively, or even entered after the coroutine is deleted.

The job->busy flag is used by blockjobs when a coroutine is busy
executing. The function 'block_job_enter()' obeys the busy flag,
and will not enter a coroutine if set.  If we sleep a job, we need to
leave the busy flag set, so that subsequent calls to block_job_enter()
are prevented.

This changes the prior behavior of block_job_cancel() being able to
immediately wake up and cancel a job; in practice, this should not be an
issue, as the coroutine sleep times are generally very small, and the
cancel will occur the next time the coroutine wakes up.

This fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1508708

Signed-off-by: Jeff Cody 
Reviewed-by: Stefan Hajnoczi 
---
 blockjob.c   | 7 +--
 include/block/blockjob_int.h | 3 ++-
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 3a0c491..ff9a614 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -797,11 +797,14 @@ void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
 return;
 }
 
-job->busy = false;
+/* We need to leave job->busy set here, because when we have
+ * put a coroutine to 'sleep', we have scheduled it to run in
+ * the future.  We cannot enter that same coroutine again before
+ * it wakes and runs, otherwise we risk double-entry or entry after
+ * completion. */
 if (!block_job_should_pause(job)) {
 co_aio_sleep_ns(blk_get_aio_context(job->blk), type, ns);
 }
-job->busy = true;
 
 block_job_pause_point(job);
 }
diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index f13ad05..43f3be2 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -143,7 +143,8 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
  * @ns: How many nanoseconds to stop for.
  *
  * Put the job to sleep (assuming that it wasn't canceled) for @ns
- * nanoseconds.  Canceling the job will interrupt the wait immediately.
+ * nanoseconds.  Canceling the job will not interrupt the wait, so the
+ * cancel will not process until the coroutine wakes up.
  */
 void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns);
 
-- 
2.9.5
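
For context on the commit message above: the guard it relies on is
block_job_enter()'s busy check, which (simplified, from memory of the
2.11 code) amounts to:

  void block_job_enter(BlockJob *job)
  {
      /* Sketch: never re-enter a coroutine that is running -- or, after
       * this patch, one that is sleeping. */
      if (job->co && !job->busy) {
          bdrv_coroutine_enter(blk_bs(job->blk), job->co);
      }
  }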




[Qemu-block] [PULL 0/4] Late blockjob / coroutine patches for -rc2

2017-11-21 Thread Jeff Cody
The following changes since commit 7c3d1917fd7a3de906170fa3d6d3d4c5918b1e49:

  build: disarm the TCG unit test trap (2017-11-21 15:42:47 +)

are available in the git repository at:

  git://github.com/codyprime/qemu-kvm-jtc.git tags/block-pull-request

for you to fetch changes up to d975301dc8ae56fb3154348878e47a6211843c0b:

  qemu-iotest: add test for blockjob coroutine race condition (2017-11-21 
11:58:12 -0500)


Late fix for blockjob / coroutine segfaults w/iothreads


Jeff Cody (4):
  blockjob: do not allow coroutine double entry or
entry-after-completion
  coroutine: abort if we try to schedule or enter a pending coroutine
  qemu-iotests: add option in common.qemu for mismatch only
  qemu-iotest: add test for blockjob coroutine race condition

 blockjob.c |  7 ++-
 include/block/blockjob_int.h   |  3 +-
 include/qemu/coroutine_int.h   | 13 --
 tests/qemu-iotests/200 | 99 ++
 tests/qemu-iotests/200.out | 14 ++
 tests/qemu-iotests/common.qemu |  8 +++-
 tests/qemu-iotests/group   |  1 +
 util/async.c   | 13 ++
 util/qemu-coroutine-sleep.c| 12 +
 util/qemu-coroutine.c  | 14 ++
 10 files changed, 177 insertions(+), 7 deletions(-)
 create mode 100755 tests/qemu-iotests/200
 create mode 100644 tests/qemu-iotests/200.out

-- 
2.9.5




Re: [Qemu-block] [PATCH v3 for-2.11 0/4] Fix segfault in blockjob race condition

2017-11-21 Thread Jeff Cody
On Tue, Nov 21, 2017 at 10:38:49AM -0500, Jeff Cody wrote:
> Changes from v2 -> v3:
> ---
> 
> Patch 1: Updated commit message to include why immediate cancel is
>  ok to remove (Thanks Paolo)
> 
>  Dropped useless hunk (Thanks Stefan)
> 
> 
> Patch 2: Use correct atomic primitives, and document implicit
>  assumptions (Thanks Stefan, Paolo)
> 
> Fix spelling in commit message (Thanks Eric)
> 
> Patch 3/4: Unchanged.
> 
> 
> Changes from v1 -> v2:
> ---
> 
> Patch 1: Updated docs in blockjob_int.h (Thanks Stefan)
> 
> Patch 2/3: Squashed, and used const char * to hold the __func__ name of
>the original scheduler (Thanks Paolo)
> 
> Patch 4: Unchanged.
> 
> Patch 5: Dropped qcow format for the test, it was so slow the test times
>  out, and it doesn't add any new dimension to the test.
> 
> 
> # git-backport-diff -r qemu/master.. -u github/bz1508708
> 
> 001/4:[0003] [FC] 'blockjob: do not allow coroutine double entry or 
> entry-after-completion'
> 002/4:[down] 'coroutine: abort if we try to schedule or enter a pending 
> coroutine'
> 003/4:[] [--] 'qemu-iotests: add option in common.qemu for mismatch only'
> 004/4:[0002] [FC] 'qemu-iotest: add test for blockjob coroutine race 
> condition'
> 
> 
> This series fixes a race condition segfault when using iothreads with
> blockjobs.
> 
> The qemu iotest in this series is a reproducer, as is the reproducer
> script attached in this bug report:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1508708
> 
> There are two additional patches to try and catch this sort of scenario
> with an abort, before a segfault or memory corruption occurs.
> 
> 
> Jeff Cody (4):
>   blockjob: do not allow coroutine double entry or
> entry-after-completion
>   coroutine: abort if we try to schedule or enter a pending coroutine
>   qemu-iotests: add option in common.qemu for mismatch only
>   qemu-iotest: add test for blockjob coroutine race condition
> 
>  blockjob.c |  7 ++-
>  include/block/blockjob_int.h   |  3 +-
>  include/qemu/coroutine_int.h   |  6 +++
>  tests/qemu-iotests/200 | 99 
> ++
>  tests/qemu-iotests/200.out | 14 ++
>  tests/qemu-iotests/common.qemu |  8 +++-
>  tests/qemu-iotests/group   |  1 +
>  util/async.c   | 13 ++
>  util/qemu-coroutine-sleep.c| 12 +
>  util/qemu-coroutine.c  | 13 ++
>  10 files changed, 172 insertions(+), 4 deletions(-)
>  create mode 100755 tests/qemu-iotests/200
>  create mode 100644 tests/qemu-iotests/200.out
> 
> -- 
> 2.9.5
> 


Thanks,

Made the change suggested by Stefan and Paolo.

Applied to my block branch:

git://github.com/codyprime/qemu-kvm-jtc block

-Jeff



Re: [Qemu-block] [PATCH v3 for-2.11 0/4] Fix segfault in blockjob race condition

2017-11-21 Thread Stefan Hajnoczi
On Tue, Nov 21, 2017 at 10:38:49AM -0500, Jeff Cody wrote:
> Changes from v2 -> v3:
> ---
> 
> Patch 1: Updated commit message to include why immediate cancel is
>  ok to remove (Thanks Paolo)
> 
>  Dropped useless hunk (Thanks Stefan)
> 
> 
> Patch 2: Use correct atomic primitives, and document implicit
>  assumptions (Thanks Stefan, Paolo)
> 
> Fix spelling in commit message (Thanks Eric)
> 
> Patch 3/4: Unchanged.
> 
> 
> Changes from v1 -> v2:
> ---
> 
> Patch 1: Updated docs in blockjob_int.h (Thanks Stefan)
> 
> Patch 2/3: Squashed, and used const char * to hold the __func__ name of
>the original scheduler (Thanks Paolo)
> 
> Patch 4: Unchanged.
> 
> Patch 5: Dropped qcow format for the test, it was so slow the test times
>  out, and it doesn't add any new dimension to the test.
> 
> 
> # git-backport-diff -r qemu/master.. -u github/bz1508708
> 
> 001/4:[0003] [FC] 'blockjob: do not allow coroutine double entry or 
> entry-after-completion'
> 002/4:[down] 'coroutine: abort if we try to schedule or enter a pending 
> coroutine'
> 003/4:[] [--] 'qemu-iotests: add option in common.qemu for mismatch only'
> 004/4:[0002] [FC] 'qemu-iotest: add test for blockjob coroutine race 
> condition'
> 
> 
> This series fixes a race condition segfault when using iothreads with
> blockjobs.
> 
> The qemu iotest in this series is a reproducer, as is the reproducer
> script attached in this bug report:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1508708
> 
> There are two additional patches to try and catch this sort of scenario
> with an abort, before a segfault or memory corruption occurs.
> 
> 
> Jeff Cody (4):
>   blockjob: do not allow coroutine double entry or
> entry-after-completion
>   coroutine: abort if we try to schedule or enter a pending coroutine
>   qemu-iotests: add option in common.qemu for mismatch only
>   qemu-iotest: add test for blockjob coroutine race condition
> 
>  blockjob.c |  7 ++-
>  include/block/blockjob_int.h   |  3 +-
>  include/qemu/coroutine_int.h   |  6 +++
>  tests/qemu-iotests/200 | 99 
> ++
>  tests/qemu-iotests/200.out | 14 ++
>  tests/qemu-iotests/common.qemu |  8 +++-
>  tests/qemu-iotests/group   |  1 +
>  util/async.c   | 13 ++
>  util/qemu-coroutine-sleep.c| 12 +
>  util/qemu-coroutine.c  | 13 ++
>  10 files changed, 172 insertions(+), 4 deletions(-)
>  create mode 100755 tests/qemu-iotests/200
>  create mode 100644 tests/qemu-iotests/200.out

Besides the read memory barrier issue:

Reviewed-by: Stefan Hajnoczi 




Re: [Qemu-block] [PATCH v3 for-2.11 2/4] coroutine: abort if we try to schedule or enter a pending coroutine

2017-11-21 Thread Stefan Hajnoczi
On Tue, Nov 21, 2017 at 10:38:51AM -0500, Jeff Cody wrote:
> diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
> index d6095c1..9e4ed43 100644
> --- a/util/qemu-coroutine.c
> +++ b/util/qemu-coroutine.c
> @@ -107,8 +107,21 @@ void qemu_aio_coroutine_enter(AioContext *ctx, Coroutine *co)
>  Coroutine *self = qemu_coroutine_self();
>  CoroutineAction ret;
>  
> +/* Prior read of co protected by callers - e.g., aio_co_wake */
> +const char *scheduled = atomic_read(&co->scheduled);

Consider: user code -> qemu_coroutine_enter() ->
qemu_aio_coroutine_enter().  Unlike the
aio_co_wake() code path, there is no read barrier here.

I think atomic_mb_read() is needed to guarantee the thread sees the
latest value.
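
Concretely, the suggestion amounts to something like this in
qemu_aio_coroutine_enter() (restating the idea, not quoting the final
patch):

  /* atomic_mb_read() includes the barrier, so this path no longer
   * depends on the caller (e.g. aio_co_wake) having issued one. */
  const char *scheduled = atomic_mb_read(&co->scheduled);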




[Qemu-block] [PATCH v3 for-2.11 4/4] qemu-iotest: add test for blockjob coroutine race condition

2017-11-21 Thread Jeff Cody
Signed-off-by: Jeff Cody 
---
 tests/qemu-iotests/200 | 99 ++
 tests/qemu-iotests/200.out | 14 +++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 114 insertions(+)
 create mode 100755 tests/qemu-iotests/200
 create mode 100644 tests/qemu-iotests/200.out

diff --git a/tests/qemu-iotests/200 b/tests/qemu-iotests/200
new file mode 100755
index 000..d8787dd
--- /dev/null
+++ b/tests/qemu-iotests/200
@@ -0,0 +1,99 @@
+#!/bin/bash
+#
+# Block job co-routine race condition test.
+#
+# See: https://bugzilla.redhat.com/show_bug.cgi?id=1508708
+#
+# Copyright (C) 2017 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=jc...@redhat.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+status=1# failure is the default!
+
+_cleanup()
+{
+_cleanup_qemu
+rm -f "${TEST_IMG}" "${BACKING_IMG}"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+. ./common.qemu
+
+_supported_fmt qcow2 qed
+_supported_proto file
+_supported_os Linux
+
+BACKING_IMG="${TEST_DIR}/backing.img"
+TEST_IMG="${TEST_DIR}/test.img"
+
+${QEMU_IMG} create -f $IMGFMT "${BACKING_IMG}" 512M | _filter_img_create
+${QEMU_IMG} create -f $IMGFMT -F $IMGFMT "${TEST_IMG}" -b "${BACKING_IMG}" 512M | _filter_img_create
+
+${QEMU_IO} -c "write -P 0xa5 512 300M" "${BACKING_IMG}" | _filter_qemu_io
+
+echo
+echo === Starting QEMU VM ===
+echo
+qemu_comm_method="qmp"
+_launch_qemu -device pci-bridge,id=bridge1,chassis_nr=1,bus=pci.0 \
+ -object iothread,id=iothread0 \
+ -device virtio-scsi-pci,bus=bridge1,addr=0x1f,id=scsi0,iothread=iothread0 \
+ -drive file="${TEST_IMG}",media=disk,if=none,cache=none,id=drive_sysdisk,aio=native,format=$IMGFMT \
+ -device scsi-hd,drive=drive_sysdisk,bus=scsi0.0,id=sysdisk,bootindex=0
+h1=$QEMU_HANDLE
+
+_send_qemu_cmd $h1 "{ 'execute': 'qmp_capabilities' }" 'return'
+
+echo
+echo === Sending stream/cancel, checking for SIGSEGV only  ===
+echo
+for (( i=1;i<500;i++ ))
+do
+mismatch_only='y' qemu_error_no_exit='n' _send_qemu_cmd $h1 \
+   "{
+'execute': 'block-stream',
+'arguments': {
+'device': 'drive_sysdisk',
+'speed': 1000,
+'on-error': 'report',
+'job-id': 'job-$i'
+ }
+}
+{
+'execute': 'block-job-cancel',
+'arguments': {
+'device': 'job-$i'
+ }
+}" \
+   "{.*{.*}.*}"  # should match all well-formed QMP 
responses
+done
+
+silent='y' _send_qemu_cmd $h1  "{ 'execute': 'quit' }" 'return'
+
+echo "$i iterations performed"
+
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/200.out b/tests/qemu-iotests/200.out
new file mode 100644
index 000..af6a809
--- /dev/null
+++ b/tests/qemu-iotests/200.out
@@ -0,0 +1,14 @@
+QA output created by 200
+Formatting 'TEST_DIR/backing.img', fmt=IMGFMT size=536870912
+Formatting 'TEST_DIR/test.img', fmt=IMGFMT size=536870912 backing_file=TEST_DIR/backing.img backing_fmt=IMGFMT
+wrote 314572800/314572800 bytes at offset 512
+300 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+=== Starting QEMU VM ===
+
+{"return": {}}
+
+=== Sending stream/cancel, checking for SIGSEGV only ===
+
+500 iterations performed
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 24e5ad1..25d9adf 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -194,3 +194,4 @@
 194 rw auto migration quick
 195 rw auto quick
 197 rw auto quick
+200 rw auto
-- 
2.9.5




[Qemu-block] [PATCH v3 for-2.11 3/4] qemu-iotests: add option in common.qemu for mismatch only

2017-11-21 Thread Jeff Cody
Add an option to echo the response to a QMP / HMP command only on
mismatch.

Useful for ignoring all normal responses while still catching things
like segfaults.

Signed-off-by: Jeff Cody 
---
 tests/qemu-iotests/common.qemu | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/common.qemu b/tests/qemu-iotests/common.qemu
index 7b3052d..85f66b8 100644
--- a/tests/qemu-iotests/common.qemu
+++ b/tests/qemu-iotests/common.qemu
@@ -50,6 +50,8 @@ _in_fd=4
 #
 # If $silent is set to anything but an empty string, then
 # response is not echoed out.
+# If $mismatch_only is set, only non-matching responses will
+# be echoed.
 function _timed_wait_for()
 {
 local h=${1}
@@ -58,14 +60,18 @@ function _timed_wait_for()
 QEMU_STATUS[$h]=0
 while IFS= read -t ${QEMU_COMM_TIMEOUT} resp <&${QEMU_OUT[$h]}
 do
-if [ -z "${silent}" ]; then
+if [ -z "${silent}" ] && [ -z "${mismatch_only}" ]; then
 echo "${resp}" | _filter_testdir | _filter_qemu \
| _filter_qemu_io | _filter_qmp | _filter_hmp
 fi
 grep -q "${*}" < <(echo "${resp}")
 if [ $? -eq 0 ]; then
 return
+elif [ -z "${silent}" ] && [ -n "${mismatch_only}" ]; then
+echo "${resp}" | _filter_testdir | _filter_qemu \
+   | _filter_qemu_io | _filter_qmp | _filter_hmp
 fi
+
 done
 QEMU_STATUS[$h]=-1
 if [ -z "${qemu_error_no_exit}" ]; then
-- 
2.9.5




[Qemu-block] [PATCH v3 for-2.11 1/4] blockjob: do not allow coroutine double entry or entry-after-completion

2017-11-21 Thread Jeff Cody
When block_job_sleep_ns() is called, the co-routine is scheduled for
future execution.  If we allow the job to be re-entered prior to the
scheduled time, we present a race condition in which a coroutine can be
entered recursively, or even entered after the coroutine is deleted.

The job->busy flag is used by blockjobs when a coroutine is busy
executing. The function 'block_job_enter()' obeys the busy flag,
and will not enter a coroutine if set.  If we sleep a job, we need to
leave the busy flag set, so that subsequent calls to block_job_enter()
are prevented.

This changes the prior behavior of block_job_cancel() being able to
immediately wake up and cancel a job; in practice, this should not be an
issue, as the coroutine sleep times are generally very small, and the
cancel will occur the next time the coroutine wakes up.

This fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1508708

Signed-off-by: Jeff Cody 
---
 blockjob.c   | 7 +--
 include/block/blockjob_int.h | 3 ++-
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 3a0c491..ff9a614 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -797,11 +797,14 @@ void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
 return;
 }
 
-job->busy = false;
+/* We need to leave job->busy set here, because when we have
+ * put a coroutine to 'sleep', we have scheduled it to run in
+ * the future.  We cannot enter that same coroutine again before
+ * it wakes and runs, otherwise we risk double-entry or entry after
+ * completion. */
 if (!block_job_should_pause(job)) {
 co_aio_sleep_ns(blk_get_aio_context(job->blk), type, ns);
 }
-job->busy = true;
 
 block_job_pause_point(job);
 }
diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index f13ad05..43f3be2 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -143,7 +143,8 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
  * @ns: How many nanoseconds to stop for.
  *
  * Put the job to sleep (assuming that it wasn't canceled) for @ns
- * nanoseconds.  Canceling the job will interrupt the wait immediately.
+ * nanoseconds.  Canceling the job will not interrupt the wait, so the
+ * cancel will not process until the coroutine wakes up.
  */
 void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns);
 
-- 
2.9.5




[Qemu-block] [PATCH v3 for-2.11 2/4] coroutine: abort if we try to schedule or enter a pending coroutine

2017-11-21 Thread Jeff Cody
The previous patch fixed a race condition in which coroutines were
being executed doubly, or after coroutine deletion.

We can detect common scenarios when this happens, and print an error
message and abort before we corrupt memory / data, or segfault.

This patch will abort if an attempt to enter a coroutine is made while
it is currently pending execution, either in a specific AioContext bh,
or pending execution via a timer.  It will also abort if a coroutine
is scheduled, before a prior scheduled run has occurred.

We cannot rely on the existing co->caller check for recursive re-entry
to catch this, as the coroutine may run and exit with
COROUTINE_TERMINATE before the scheduled coroutine executes.

(This is the scenario that was occurring and fixed in the previous
patch).

Signed-off-by: Jeff Cody 
---
 include/qemu/coroutine_int.h |  6 ++
 util/async.c | 13 +
 util/qemu-coroutine-sleep.c  | 12 
 util/qemu-coroutine.c| 13 +
 4 files changed, 44 insertions(+)

diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
index cb98892..56e4c48 100644
--- a/include/qemu/coroutine_int.h
+++ b/include/qemu/coroutine_int.h
@@ -53,6 +53,12 @@ struct Coroutine {
 
 /* Only used when the coroutine has yielded.  */
 AioContext *ctx;
+
+/* Used to catch and abort on illegal co-routine entry.
+ * Will contain the name of the function that had first
+ * scheduled the coroutine. */
+const char *scheduled;
+
 QSIMPLEQ_ENTRY(Coroutine) co_queue_next;
 QSLIST_ENTRY(Coroutine) co_scheduled_next;
 };
diff --git a/util/async.c b/util/async.c
index 0e1bd87..4dd9d95 100644
--- a/util/async.c
+++ b/util/async.c
@@ -388,6 +388,9 @@ static void co_schedule_bh_cb(void *opaque)
 QSLIST_REMOVE_HEAD(&straight, co_scheduled_next);
 trace_aio_co_schedule_bh_cb(ctx, co);
 aio_context_acquire(ctx);
+
+/* Protected by write barrier in qemu_aio_coroutine_enter */
+atomic_set(&co->scheduled, NULL);
 qemu_coroutine_enter(co);
 aio_context_release(ctx);
 }
@@ -438,6 +441,16 @@ fail:
 void aio_co_schedule(AioContext *ctx, Coroutine *co)
 {
 trace_aio_co_schedule(ctx, co);
+const char *scheduled = atomic_cmpxchg(&co->scheduled, NULL,
+   __func__);
+
+if (scheduled) {
+fprintf(stderr,
+"%s: Co-routine was already scheduled in '%s'\n",
+__func__, scheduled);
+abort();
+}
+
 QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines,
   co, co_scheduled_next);
 qemu_bh_schedule(ctx->co_schedule_bh);
diff --git a/util/qemu-coroutine-sleep.c b/util/qemu-coroutine-sleep.c
index 9c56550..254349c 100644
--- a/util/qemu-coroutine-sleep.c
+++ b/util/qemu-coroutine-sleep.c
@@ -13,6 +13,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/coroutine.h"
+#include "qemu/coroutine_int.h"
 #include "qemu/timer.h"
 #include "block/aio.h"
 
@@ -25,6 +26,8 @@ static void co_sleep_cb(void *opaque)
 {
 CoSleepCB *sleep_cb = opaque;
 
+/* Write of schedule protected by barrier write in aio_co_schedule */
+atomic_set(&sleep_cb->co->scheduled, NULL);
 aio_co_wake(sleep_cb->co);
 }
 
@@ -34,6 +37,15 @@ void coroutine_fn co_aio_sleep_ns(AioContext *ctx, QEMUClockType type,
 CoSleepCB sleep_cb = {
 .co = qemu_coroutine_self(),
 };
+
+const char *scheduled = atomic_cmpxchg(&sleep_cb.co->scheduled, NULL,
+   __func__);
+if (scheduled) {
+fprintf(stderr,
+"%s: Co-routine was already scheduled in '%s'\n",
+__func__, scheduled);
+abort();
+}
 sleep_cb.ts = aio_timer_new(ctx, type, SCALE_NS, co_sleep_cb, &sleep_cb);
 timer_mod(sleep_cb.ts, qemu_clock_get_ns(type) + ns);
 qemu_coroutine_yield();
diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
index d6095c1..9e4ed43 100644
--- a/util/qemu-coroutine.c
+++ b/util/qemu-coroutine.c
@@ -107,8 +107,21 @@ void qemu_aio_coroutine_enter(AioContext *ctx, Coroutine 
*co)
 Coroutine *self = qemu_coroutine_self();
 CoroutineAction ret;
 
+/* Prior read of co protected by callers - e.g., aio_co_wake */
+const char *scheduled = atomic_read(&co->scheduled);
+
 trace_qemu_aio_coroutine_enter(ctx, self, co, co->entry_arg);
 
+/* if the Coroutine has already been scheduled, entering it again will
+ * cause us to enter it twice, potentially even after the coroutine has
+ * been deleted */
+if (scheduled) {
+fprintf(stderr,
+"%s: Co-routine was already scheduled in '%s'\n",
+__func__, scheduled);
+abort();
+}
+
 if (co->caller) {
 fprintf(stderr, "Co-routine re-entered recursively\n");
 abort();
-- 
2.9.5
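
For reference, the kind of misuse this patch turns into a clean abort can be
sketched roughly as follows; this is an illustrative snippet against the public
coroutine API, not part of the series, and the names sleeper/buggy_caller are
made up:

    #include "qemu/osdep.h"
    #include "qemu/coroutine.h"
    #include "block/aio.h"

    /* Coroutine that yields once, so it is still alive when we (mis)schedule it */
    static void coroutine_fn sleeper(void *opaque)
    {
        qemu_coroutine_yield();
    }

    static void buggy_caller(AioContext *ctx)
    {
        Coroutine *co = qemu_coroutine_create(sleeper, NULL);

        qemu_coroutine_enter(co);   /* runs until the yield */

        aio_co_schedule(ctx, co);   /* ok: co->scheduled is set to __func__ */
        aio_co_schedule(ctx, co);   /* bug: the bh has not run yet, so the
                                     * atomic_cmpxchg() sees a non-NULL
                                     * co->scheduled and abort()s instead of
                                     * queueing a double entry */
    }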




[Qemu-block] [PATCH v3 for-2.11 0/4] Fix segfault in blockjob race condition

2017-11-21 Thread Jeff Cody
Changes from v2 -> v3:
---

Patch 1: Updated commit message to include why immediate cancel is
 ok to remove (Thanks Paolo)

 Dropped useless hunk (Thanks Stefan)


Patch 2: Use correct atomic primitives, and document implicit
 assumptions (Thanks Stefan, Paolo)

Fix spelling in commit message (Thanks Eric)

Patch 3/4: Unchanged.


Changes from v1 -> v2:
---

Patch 1: Updated docs in blockjob_int.h (Thanks Stefan)

Patch 2/3: Squashed, and used const char * to hold the __func__ name of
   the original scheduler (Thanks Paolo)

Patch 4: Unchanged.

Patch 5: Dropped qcow format for the test, it was so slow the test times
 out, and it doesn't add any new dimension to the test.


# git-backport-diff -r qemu/master.. -u github/bz1508708

001/4:[0003] [FC] 'blockjob: do not allow coroutine double entry or 
entry-after-completion'
002/4:[down] 'coroutine: abort if we try to schedule or enter a pending 
coroutine'
003/4:[] [--] 'qemu-iotests: add option in common.qemu for mismatch only'
004/4:[0002] [FC] 'qemu-iotest: add test for blockjob coroutine race condition'


This series fixes a race condition segfault when using iothreads with
blockjobs.

The qemu iotest in this series is a reproducer, as is the reproducer
script attached in this bug report:

https://bugzilla.redhat.com/show_bug.cgi?id=1508708

There are two additional patches to try and catch this sort of scenario
with an abort, before a segfault or memory corruption occurs.


Jeff Cody (4):
  blockjob: do not allow coroutine double entry or
entry-after-completion
  coroutine: abort if we try to schedule or enter a pending coroutine
  qemu-iotests: add option in common.qemu for mismatch only
  qemu-iotest: add test for blockjob coroutine race condition

 blockjob.c |  7 ++-
 include/block/blockjob_int.h   |  3 +-
 include/qemu/coroutine_int.h   |  6 +++
 tests/qemu-iotests/200 | 99 ++
 tests/qemu-iotests/200.out | 14 ++
 tests/qemu-iotests/common.qemu |  8 +++-
 tests/qemu-iotests/group   |  1 +
 util/async.c   | 13 ++
 util/qemu-coroutine-sleep.c| 12 +
 util/qemu-coroutine.c  | 13 ++
 10 files changed, 172 insertions(+), 4 deletions(-)
 create mode 100755 tests/qemu-iotests/200
 create mode 100644 tests/qemu-iotests/200.out

-- 
2.9.5




Re: [Qemu-block] segfault in parallel blockjobs (iotest 30)

2017-11-21 Thread Alberto Garcia
On Tue 21 Nov 2017 04:18:13 PM CET, Anton Nefedov wrote:

>>> Or, perhaps another approach, keep BlockJob referenced while it is
>>> paused (by block_job_pause/resume_all()). That should prevent it
>>> from deleting the BB.
>> 
>> Yes, I tried this and it actually solves the issue. But I still think
>> that the problem is that block jobs are allowed to finish when they
>> are paused.
>
> Agree, but
>
>> Adding block_job_pause_point(&s->common) at the end of stream_run()
>> fixes the problem too.
>
> would be a nice fix, but it only works unless the job is already
> deferred, right?

Right, I didn't mean to propose it as the proper solution (it would
still leave mirror job vulnerable because it's already paused by the
time it calls defer_to_main_loop()).

> This:
>
>  >> keep BlockJob referenced while it is
>  >> paused (by block_job_pause/resume_all()). That should prevent it from
>  >> deleting the BB.
>
> looks kind of hacky; maybe referencing in block_job_pause() (and not
> just pause_all) seems more correct? I think it didn't work for me
> right away though. But I can look more.

You have to be careful when you unref the block job because you may
destroy it, and therefore block_job_next() in block_job_resume_all()
would be using freed memory.

Berto



Re: [Qemu-block] segfault in parallel blockjobs (iotest 30)

2017-11-21 Thread Anton Nefedov



On 21/11/2017 3:51 PM, Alberto Garcia wrote:

On Thu 16 Nov 2017 05:09:59 PM CET, Anton Nefedov wrote:

I have the impression that one major source of headaches is the
fact that the reopen queue contains nodes that don't need to be
reopened at all. Ideally this should be detected early on in
bdrv_reopen_queue(), so there's no chance that the queue contains
nodes used by a different block job. If we had that then op
blockers should be enough to prevent these things. Or am I missing
something?


After applying Max's patch I tried the similar approach; that is
keep BDSes referenced while they are in the reopen queue.  Now I get
the stream job hanging. Somehow one blk_root_drained_begin() is not
paired with blk_root_drained_end(). So the job stays paused.


I can see this if I apply Max's patch and keep refs to BDSs in the
reopen queue:

#0  block_job_pause (...) at blockjob.c:130
#1  0x55c143cb586d in block_job_drained_begin (...) at blockjob.c:227
#2  0x55c143d08067 in blk_set_dev_ops (...) at block/block-backend.c:887
#3  0x55c143cb69db in block_job_create (...) at blockjob.c:678
#4  0x55c143d17c0c in mirror_start_job (...) at block/mirror.c:1177

There's a ops->drained_begin(opaque) call in blk_set_dev_ops() that
doesn't seem to be paired.


My understanding for now is

1. in bdrv_drain_all_begin(), BDS gets drained with
   bdrv_parent_drained_begin(), all the parents get
   blk_root_drained_begin(), pause their jobs.
2. one of the jobs finishes and deletes BB.
3. in bdrv_drain_all_end(), bdrv_parent_drained_end() is never
   called because even though BDS still exists (referenced in the
   queue), it cannot be accessed as bdrv_next() takes BDS from the
   global BB list (and the corresponding BB is deleted).


Yes, I was debugging this and I got to a similar conclusion. With
test_stream_commit from iotest 030 I can see that

1. the block-stream job is created first, then stream_run begins and
   starts copying the data.
2. block-commit starts and tries to reopen the top image in
   read-write mode. This pauses the stream block job and drains all
   BDSs.
3. The draining causes the stream job to finish, it is deferred to
   the main loop, then stream_complete finishes and unrefs the block
   job, deleting it. At the point of deletion the pause count was
   still > 0 (because of step (2))


Max's patch v1 could have helped:
http://lists.nongnu.org/archive/html/qemu-devel/2017-11/msg01835.html


This doesn't solve the problem.


Or, perhaps another approach, keep BlockJob referenced while it is
paused (by block_job_pause/resume_all()). That should prevent it from
deleting the BB.


Yes, I tried this and it actually solves the issue. But I still think
that the problem is that block jobs are allowed to finish when they are
paused.



Agree, but


Adding block_job_pause_point(&s->common) at the end of stream_run()
fixes the problem too.



would be a nice fix, but it only works unless the job is already
deferred, right? And the BH can't be held off while job->paused because
with mirror job it is always the case.
So we cover most of the cases but also lose the stable reproduction.

This:

>> keep BlockJob referenced while it is
>> paused (by block_job_pause/resume_all()). That should prevent it from
>> deleting the BB.

looks kind of hacky; maybe referencing in block_job_pause() (and not
just pause_all) seems more correct? I think it didn't work for me
right away though. But I can look more.

/Anton


Berto





Re: [Qemu-block] [PATCH v2 for-2.11 2/4] coroutine: abort if we try to schedule or enter a pending coroutine

2017-11-21 Thread Paolo Bonzini
On 21/11/2017 14:47, Kevin Wolf wrote:
> On 21.11.2017 at 03:23, Jeff Cody wrote:
>> The previous patch fixed a race condition, in which there were
>> coroutines being executing doubly, or after coroutine deletion.
>>
>> We can detect common scenarios when this happens, and print an error
>> message and abort before we corrupt memory / data, or segfault.
>>
>> This patch will abort if an attempt to enter a coroutine is made while
>> it is currently pending execution, either in a specific AioContext bh,
>> or pending execution via a timer.  It will also abort if a coroutine
>> is scheduled, before a prior scheduled run has occured.
>>
>> We cannot rely on the existing co->caller check for recursive re-entry
>> to catch this, as the coroutine may run and exit with
>> COROUTINE_TERMINATE before the scheduled coroutine executes.
>>
>> (This is the scenario that was occuring and fixed in the previous
>> patch).
>>
>> Signed-off-by: Jeff Cody 
>> ---
>>  include/qemu/coroutine_int.h |  6 ++
>>  util/async.c | 11 +++
>>  util/qemu-coroutine-sleep.c  | 11 +++
>>  util/qemu-coroutine.c| 11 +++
>>  4 files changed, 39 insertions(+)
>>
>> diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
>> index cb98892..56e4c48 100644
>> --- a/include/qemu/coroutine_int.h
>> +++ b/include/qemu/coroutine_int.h
>> @@ -53,6 +53,12 @@ struct Coroutine {
>>  
>>  /* Only used when the coroutine has yielded.  */
>>  AioContext *ctx;
>> +
>> +/* Used to catch and abort on illegal co-routine entry.
>> + * Will contain the name of the function that had first
>> + * scheduled the coroutine. */
>> +const char *scheduled;
> 
> Not sure if it makes any difference in practice, but I just want to
> mention that the new field is right after a cacheline boundary and
> the only field that is used in qemu_aio_coroutine_enter() and accesses
> this second cacheline.
> 
> I'm not paying much attention to this kind of thing in most contexts,
> but entering a coroutine is a hot path that we want to be fast, so maybe
> it's worth having a second look.

Makes sense!  Since co_queue_wakeup is used on *yield*, maybe the order
should be: ctx, scheduled, co_queue_next, co_queue_wakeup,
co_scheduled_next.

Thanks,

Paolo
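
For illustration, the ordering Paolo suggests would look roughly like this
(only the members discussed here are shown; the rest of struct Coroutine is
omitted and the comments are paraphrased):

    struct Coroutine {
        CoroutineEntry *entry;
        void *entry_arg;
        Coroutine *caller;

        /* ... other fields used while the coroutine is running ... */

        /* Only used when the coroutine has yielded. */
        AioContext *ctx;

        /* Name of the function that first scheduled the coroutine. */
        const char *scheduled;

        QSIMPLEQ_ENTRY(Coroutine) co_queue_next;

        /* Coroutines to be woken up when this one yields or terminates. */
        QSIMPLEQ_HEAD(, Coroutine) co_queue_wakeup;

        QSLIST_ENTRY(Coroutine) co_scheduled_next;
    };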



[Qemu-block] [PULL 5/7] block: Error out on load_vm with active dirty bitmaps

2017-11-21 Thread Kevin Wolf
Loading a snapshot invalidates the bitmap. Just marking all blocks dirty
is not a useful response in practice; instead, the user needs to be aware
that we switch to a completely different state. If they are okay with
losing the dirty bitmap, they can just explicitly delete it.

This effectively reverts commit 04dec3c3ae5.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Denis V. Lunev 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: John Snow 
---
 block/snapshot.c | 15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/block/snapshot.c b/block/snapshot.c
index 8585599579..8cb70dbad5 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -182,25 +182,16 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
 {
 BlockDriver *drv = bs->drv;
 int ret, open_ret;
-int64_t len;
 
 if (!drv) {
 error_setg(errp, "Block driver is closed");
 return -ENOMEDIUM;
 }
 
-len = bdrv_getlength(bs);
-if (len < 0) {
-error_setg_errno(errp, -len, "Cannot get block device size");
-return len;
+if (!QLIST_EMPTY(&bs->dirty_bitmaps)) {
+error_setg(errp, "Device has active dirty bitmaps");
+return -EBUSY;
 }
-/* We should set all bits in all enabled dirty bitmaps, because dirty
- * bitmaps reflect active state of disk and snapshot switch operation
- * actually dirties active state.
- * TODO: It may make sense not to set all bits but analyze block status of
- * current state and destination snapshot and do not set bits corresponding
- * to both-zero or both-unallocated areas. */
-bdrv_set_dirty(bs, 0, len);
 
 if (drv->bdrv_snapshot_goto) {
 ret = drv->bdrv_snapshot_goto(bs, snapshot_id);
-- 
2.13.6
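
For users who now get the EBUSY error on loadvm, the bitmap can be removed
explicitly beforehand, e.g. via QMP (the node and bitmap names below are only
placeholders):

    { "execute": "block-dirty-bitmap-remove",
      "arguments": { "node": "drive0", "name": "bitmap0" } }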




[Qemu-block] [PULL 7/7] iotests: Fix 176 on 32-bit host

2017-11-21 Thread Kevin Wolf
From: Eric Blake 

The contents of a qcow2 bitmap are rounded up to a size that
matches the number of bits available for the granularity, but
that granularity differs for 32-bit hosts (our default 64k
cluster allows for 2M bitmap coverage per 'long') and 64-bit
hosts (4M bitmap per 'long').  If the image is a multiple of
2M but not 4M, then the number of bytes occupied by the array
of longs in memory differs between architecture, thus
resulting in different SHA256 hashes.

Furthermore (but untested by me), if our computation of the
SHA256 hash is at all endian-dependent because of how we store
data in memory, that's another variable we'd have to account
for (ideally, we specified the bitmap stored in qcow2 as
fixed-endian on disk, because the same qcow2 file must be
usable across any architecture; but that says nothing about
how we represent things in memory).  But we already have test
165 to validate that bitmaps are stored correctly on disk,
while this test is merely testing that the bitmap exists.

So for this test, the easiest solution is to filter out the
actual hash value.  Broken in commit 4096974e.

Reported-by: Max Reitz 
Signed-off-by: Eric Blake 
Message-id: 20171117190422.23626-1-ebl...@redhat.com
Reviewed-by: John Snow 
Reviewed-by: Max Reitz 
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/176 | 3 ++-
 tests/qemu-iotests/176.out | 8 
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/tests/qemu-iotests/176 b/tests/qemu-iotests/176
index 0f31a20294..b8dc17c592 100755
--- a/tests/qemu-iotests/176
+++ b/tests/qemu-iotests/176
@@ -52,7 +52,8 @@ _supported_os Linux
 function run_qemu()
 {
 $QEMU -nographic -qmp stdio -serial none "$@" 2>&1 \
-   | _filter_testdir | _filter_qmp | _filter_qemu
+   | _filter_testdir | _filter_qmp | _filter_qemu \
+| sed 's/"sha256": ".\{64\}"/"sha256": HASH/'
 }
 
 for reason in snapshot bitmap; do
diff --git a/tests/qemu-iotests/176.out b/tests/qemu-iotests/176.out
index e62085cd0a..f03a2e776c 100644
--- a/tests/qemu-iotests/176.out
+++ b/tests/qemu-iotests/176.out
@@ -205,7 +205,7 @@ Offset  Length  File
 QMP_VERSION
 {"return": {}}
 {"return": {}}
-{"return": {"sha256": 
"e12600978d86b5a453861ae5c17d275204673fef3874b7c3c5433c6153d84706"}}
+{"return": {"sha256": HASH}}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"SHUTDOWN", "data": {"guest": false}}
 
@@ -255,7 +255,7 @@ Offset  Length  File
 QMP_VERSION
 {"return": {}}
 {"return": {}}
-{"return": {"sha256": 
"e12600978d86b5a453861ae5c17d275204673fef3874b7c3c5433c6153d84706"}}
+{"return": {"sha256": HASH}}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"SHUTDOWN", "data": {"guest": false}}
 
@@ -305,7 +305,7 @@ Offset  Length  File
 QMP_VERSION
 {"return": {}}
 {"return": {}}
-{"return": {"sha256": 
"e12600978d86b5a453861ae5c17d275204673fef3874b7c3c5433c6153d84706"}}
+{"return": {"sha256": HASH}}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"SHUTDOWN", "data": {"guest": false}}
 
@@ -352,7 +352,7 @@ Offset  Length  File
 QMP_VERSION
 {"return": {}}
 {"return": {}}
-{"return": {"sha256": 
"e12600978d86b5a453861ae5c17d275204673fef3874b7c3c5433c6153d84706"}}
+{"return": {"sha256": HASH}}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"SHUTDOWN", "data": {"guest": false}}
 *** done
-- 
2.13.6




[Qemu-block] [PULL 3/7] block: Add errp to bdrv_snapshot_goto()

2017-11-21 Thread Kevin Wolf
Signed-off-by: Kevin Wolf 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: John Snow 
---
 include/block/snapshot.h |  3 ++-
 block/snapshot.c | 23 +--
 qemu-img.c   |  6 +++---
 3 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/include/block/snapshot.h b/include/block/snapshot.h
index e5c0553115..aeb80405e8 100644
--- a/include/block/snapshot.h
+++ b/include/block/snapshot.h
@@ -57,7 +57,8 @@ int bdrv_can_snapshot(BlockDriverState *bs);
 int bdrv_snapshot_create(BlockDriverState *bs,
  QEMUSnapshotInfo *sn_info);
 int bdrv_snapshot_goto(BlockDriverState *bs,
-   const char *snapshot_id);
+   const char *snapshot_id,
+   Error **errp);
 int bdrv_snapshot_delete(BlockDriverState *bs,
  const char *snapshot_id,
  const char *name,
diff --git a/block/snapshot.c b/block/snapshot.c
index be0743abac..75562df4cc 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -177,18 +177,21 @@ int bdrv_snapshot_create(BlockDriverState *bs,
 }
 
 int bdrv_snapshot_goto(BlockDriverState *bs,
-   const char *snapshot_id)
+   const char *snapshot_id,
+   Error **errp)
 {
 BlockDriver *drv = bs->drv;
 int ret, open_ret;
 int64_t len;
 
 if (!drv) {
+error_setg(errp, "Block driver is closed");
 return -ENOMEDIUM;
 }
 
 len = bdrv_getlength(bs);
 if (len < 0) {
+error_setg_errno(errp, -len, "Cannot get block device size");
 return len;
 }
 /* We should set all bits in all enabled dirty bitmaps, because dirty
@@ -200,13 +203,18 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
 bdrv_set_dirty(bs, 0, len);
 
 if (drv->bdrv_snapshot_goto) {
-return drv->bdrv_snapshot_goto(bs, snapshot_id);
+ret = drv->bdrv_snapshot_goto(bs, snapshot_id);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to load snapshot");
+}
+return ret;
 }
 
 if (bs->file) {
 BlockDriverState *file;
 QDict *options = qdict_clone_shallow(bs->options);
 QDict *file_options;
+Error *local_err = NULL;
 
 file = bs->file->bs;
 /* Prevent it from getting deleted when detached from bs */
@@ -220,13 +228,15 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
 bdrv_unref_child(bs, bs->file);
 bs->file = NULL;
 
-ret = bdrv_snapshot_goto(file, snapshot_id);
-open_ret = drv->bdrv_open(bs, options, bs->open_flags, NULL);
+ret = bdrv_snapshot_goto(file, snapshot_id, errp);
+open_ret = drv->bdrv_open(bs, options, bs->open_flags, &local_err);
 QDECREF(options);
 if (open_ret < 0) {
 bdrv_unref(file);
 bs->drv = NULL;
-return open_ret;
+/* A bdrv_snapshot_goto() error takes precedence */
+error_propagate(errp, local_err);
+return ret < 0 ? ret : open_ret;
 }
 
 assert(bs->file->bs == file);
@@ -234,6 +244,7 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
 return ret;
 }
 
+error_setg(errp, "Block driver does not support snapshots");
 return -ENOTSUP;
 }
 
@@ -467,7 +478,7 @@ int bdrv_all_goto_snapshot(const char *name, 
BlockDriverState **first_bad_bs)
 
 aio_context_acquire(ctx);
 if (bdrv_can_snapshot(bs)) {
-err = bdrv_snapshot_goto(bs, name);
+err = bdrv_snapshot_goto(bs, name, NULL);
 }
 aio_context_release(ctx);
 if (err < 0) {
diff --git a/qemu-img.c b/qemu-img.c
index 02a6e27beb..68b375f998 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2989,10 +2989,10 @@ static int img_snapshot(int argc, char **argv)
 break;
 
 case SNAPSHOT_APPLY:
-ret = bdrv_snapshot_goto(bs, snapshot_name);
+ret = bdrv_snapshot_goto(bs, snapshot_name, &err);
 if (ret) {
-error_report("Could not apply snapshot '%s': %d (%s)",
-snapshot_name, ret, strerror(-ret));
+error_reportf_err(err, "Could not apply snapshot '%s': ",
+  snapshot_name);
 }
 break;
 
-- 
2.13.6




[Qemu-block] [PULL 2/7] block: Don't request I/O permission with BDRV_O_NO_IO

2017-11-21 Thread Kevin Wolf
'qemu-img info' makes sense even when BLK_PERM_CONSISTENT_READ cannot be
granted because of a block job in a running qemu process. It already
sets BDRV_O_NO_IO to indicate that it doesn't access the guest visible
data at all.

Check the BDRV_O_NO_IO flags in blk_new_open(), so that I/O related
permissions are not unnecessarily requested and 'qemu-img info' can work
even if BLK_PERM_CONSISTENT_READ cannot be granted.

Signed-off-by: Kevin Wolf 
Reviewed-by: Fam Zheng 
Reviewed-by: Alberto Garcia 
---
 block/block-backend.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 5836cb3087..baef8e7abc 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -299,7 +299,7 @@ BlockBackend *blk_new_open(const char *filename, const char 
*reference,
 {
 BlockBackend *blk;
 BlockDriverState *bs;
-uint64_t perm;
+uint64_t perm = 0;
 
 /* blk_new_open() is mainly used in .bdrv_create implementations and the
  * tools where sharing isn't a concern because the BDS stays private, so we
@@ -309,9 +309,11 @@ BlockBackend *blk_new_open(const char *filename, const 
char *reference,
  * caller of blk_new_open() doesn't make use of the permissions, but they
  * shouldn't hurt either. We can still share everything here because the
  * guest devices will add their own blockers if they can't share. */
-perm = BLK_PERM_CONSISTENT_READ;
-if (flags & BDRV_O_RDWR) {
-perm |= BLK_PERM_WRITE;
+if ((flags & BDRV_O_NO_IO) == 0) {
+perm |= BLK_PERM_CONSISTENT_READ;
+if (flags & BDRV_O_RDWR) {
+perm |= BLK_PERM_WRITE;
+}
 }
 if (flags & BDRV_O_RESIZE) {
 perm |= BLK_PERM_RESIZE;
-- 
2.13.6




[Qemu-block] [PULL 4/7] block: Add errp to bdrv_all_goto_snapshot()

2017-11-21 Thread Kevin Wolf
Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Denis V. Lunev 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: John Snow 
---
 include/block/snapshot.h |  3 ++-
 block/snapshot.c | 11 ++-
 migration/savevm.c   |  6 +++---
 3 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/include/block/snapshot.h b/include/block/snapshot.h
index aeb80405e8..9407799941 100644
--- a/include/block/snapshot.h
+++ b/include/block/snapshot.h
@@ -84,7 +84,8 @@ int bdrv_snapshot_load_tmp_by_id_or_name(BlockDriverState *bs,
 bool bdrv_all_can_snapshot(BlockDriverState **first_bad_bs);
 int bdrv_all_delete_snapshot(const char *name, BlockDriverState **first_bsd_bs,
  Error **err);
-int bdrv_all_goto_snapshot(const char *name, BlockDriverState **first_bsd_bs);
+int bdrv_all_goto_snapshot(const char *name, BlockDriverState **first_bad_bs,
+   Error **errp);
 int bdrv_all_find_snapshot(const char *name, BlockDriverState **first_bad_bs);
 int bdrv_all_create_snapshot(QEMUSnapshotInfo *sn,
  BlockDriverState *vm_state_bs,
diff --git a/block/snapshot.c b/block/snapshot.c
index 75562df4cc..8585599579 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -467,9 +467,10 @@ fail:
 }
 
 
-int bdrv_all_goto_snapshot(const char *name, BlockDriverState **first_bad_bs)
+int bdrv_all_goto_snapshot(const char *name, BlockDriverState **first_bad_bs,
+   Error **errp)
 {
-int err = 0;
+int ret = 0;
 BlockDriverState *bs;
 BdrvNextIterator it;
 
@@ -478,10 +479,10 @@ int bdrv_all_goto_snapshot(const char *name, 
BlockDriverState **first_bad_bs)
 
 aio_context_acquire(ctx);
 if (bdrv_can_snapshot(bs)) {
-err = bdrv_snapshot_goto(bs, name, NULL);
+ret = bdrv_snapshot_goto(bs, name, errp);
 }
 aio_context_release(ctx);
-if (err < 0) {
+if (ret < 0) {
 bdrv_next_cleanup(&it);
 goto fail;
 }
@@ -489,7 +490,7 @@ int bdrv_all_goto_snapshot(const char *name, 
BlockDriverState **first_bad_bs)
 
 fail:
 *first_bad_bs = bs;
-return err;
+return ret;
 }
 
 int bdrv_all_find_snapshot(const char *name, BlockDriverState **first_bad_bs)
diff --git a/migration/savevm.c b/migration/savevm.c
index 4a88228614..192f2d82cd 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2346,10 +2346,10 @@ int load_snapshot(const char *name, Error **errp)
 /* Flush all IO requests so they don't interfere with the new state.  */
 bdrv_drain_all_begin();
 
-ret = bdrv_all_goto_snapshot(name, &bs);
+ret = bdrv_all_goto_snapshot(name, &bs, errp);
 if (ret < 0) {
-error_setg(errp, "Error %d while activating snapshot '%s' on '%s'",
- ret, name, bdrv_get_device_name(bs));
+error_prepend(errp, "Could not load snapshot '%s' on '%s': ",
+  name, bdrv_get_device_name(bs));
 goto err_drain;
 }
 
-- 
2.13.6




[Qemu-block] [PULL 0/7] Block layer patches for 2.11.0-rc2

2017-11-21 Thread Kevin Wolf
The following changes since commit d0dead3b6df7f6cd970ed02e8369ab8730aac9d3:

  scripts/make-release: ship u-boot source as a tarball (2017-11-21 12:48:20 
+)

are available in the git repository at:

  git://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to 4fd0295c151e596944be2f8062c4563cdf9106a0:

  Merge remote-tracking branch 'mreitz/tags/pull-block-2017-11-21' into 
queue-block (2017-11-21 15:09:54 +0100)


Block layer patches for 2.11.0-rc2


Alberto Garcia (1):
  block: Close a BlockDriverState completely even when bs->drv is NULL

Eric Blake (1):
  iotests: Fix 176 on 32-bit host

Kevin Wolf (6):
  block: Don't use BLK_PERM_CONSISTENT_READ for format probing
  block: Don't request I/O permission with BDRV_O_NO_IO
  block: Add errp to bdrv_snapshot_goto()
  block: Add errp to bdrv_all_goto_snapshot()
  block: Error out on load_vm with active dirty bitmaps
  Merge remote-tracking branch 'mreitz/tags/pull-block-2017-11-21' into 
queue-block

 include/block/snapshot.h   |  6 +++--
 block.c| 62 --
 block/block-backend.c  | 10 +---
 block/snapshot.c   | 45 +
 migration/savevm.c |  6 ++---
 qemu-img.c |  6 ++---
 tests/qemu-iotests/060 | 13 ++
 tests/qemu-iotests/060.out | 12 +
 tests/qemu-iotests/176 |  3 ++-
 tests/qemu-iotests/176.out |  8 +++---
 10 files changed, 103 insertions(+), 68 deletions(-)



[Qemu-block] [PULL 6/7] block: Close a BlockDriverState completely even when bs->drv is NULL

2017-11-21 Thread Kevin Wolf
From: Alberto Garcia 

bdrv_close() skips much of its logic when bs->drv is NULL. This is
fine when we're closing a BlockDriverState that has just been created
(because e.g the initialization process failed), but it's not enough
in other cases.

For example, when a valid qcow2 image is found to be corrupted then
QEMU marks it as such in the file header and then sets bs->drv to
NULL in order to make the BlockDriverState unusable. When that BDS is
later closed then many of its data structures are not freed (leaking
their memory) and none of its children are detached. This results in
bdrv_close_all() failing to close all BDSs and making this assertion
fail when QEMU is being shut down:

   bdrv_close_all: Assertion `QTAILQ_EMPTY(&all_bdrv_states)' failed.

This patch makes bdrv_close() do the full uninitialization process
in all cases. This fixes the problem with corrupted images and still
works fine with freshly created BDSs.

Signed-off-by: Alberto Garcia 
Message-id: 20171106145345.12038-1-be...@igalia.com
Reviewed-by: Eric Blake 
Signed-off-by: Max Reitz 
---
 block.c| 57 +++---
 tests/qemu-iotests/060 | 13 +++
 tests/qemu-iotests/060.out | 12 ++
 3 files changed, 53 insertions(+), 29 deletions(-)

diff --git a/block.c b/block.c
index 68b724206d..9a1a0d1e73 100644
--- a/block.c
+++ b/block.c
@@ -3198,6 +3198,7 @@ void bdrv_reopen_abort(BDRVReopenState *reopen_state)
 static void bdrv_close(BlockDriverState *bs)
 {
 BdrvAioNotifier *ban, *ban_next;
+BdrvChild *child, *next;
 
 assert(!bs->job);
 assert(!bs->refcnt);
@@ -3207,43 +3208,41 @@ static void bdrv_close(BlockDriverState *bs)
 bdrv_drain(bs); /* in case flush left pending I/O */
 
 if (bs->drv) {
-BdrvChild *child, *next;
-
 bs->drv->bdrv_close(bs);
 bs->drv = NULL;
+}
 
-bdrv_set_backing_hd(bs, NULL, &error_abort);
+bdrv_set_backing_hd(bs, NULL, &error_abort);
 
-if (bs->file != NULL) {
-bdrv_unref_child(bs, bs->file);
-bs->file = NULL;
-}
+if (bs->file != NULL) {
+bdrv_unref_child(bs, bs->file);
+bs->file = NULL;
+}
 
-QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
-/* TODO Remove bdrv_unref() from drivers' close function and use
- * bdrv_unref_child() here */
-if (child->bs->inherits_from == bs) {
-child->bs->inherits_from = NULL;
-}
-bdrv_detach_child(child);
+QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
+/* TODO Remove bdrv_unref() from drivers' close function and use
+ * bdrv_unref_child() here */
+if (child->bs->inherits_from == bs) {
+child->bs->inherits_from = NULL;
 }
-
-g_free(bs->opaque);
-bs->opaque = NULL;
-atomic_set(&bs->copy_on_read, 0);
-bs->backing_file[0] = '\0';
-bs->backing_format[0] = '\0';
-bs->total_sectors = 0;
-bs->encrypted = false;
-bs->sg = false;
-QDECREF(bs->options);
-QDECREF(bs->explicit_options);
-bs->options = NULL;
-bs->explicit_options = NULL;
-QDECREF(bs->full_open_options);
-bs->full_open_options = NULL;
+bdrv_detach_child(child);
 }
 
+g_free(bs->opaque);
+bs->opaque = NULL;
+atomic_set(&bs->copy_on_read, 0);
+bs->backing_file[0] = '\0';
+bs->backing_format[0] = '\0';
+bs->total_sectors = 0;
+bs->encrypted = false;
+bs->sg = false;
+QDECREF(bs->options);
+QDECREF(bs->explicit_options);
+bs->options = NULL;
+bs->explicit_options = NULL;
+QDECREF(bs->full_open_options);
+bs->full_open_options = NULL;
+
 bdrv_release_named_dirty_bitmaps(bs);
 assert(QLIST_EMPTY(&bs->dirty_bitmaps));
 
diff --git a/tests/qemu-iotests/060 b/tests/qemu-iotests/060
index 1eca09417b..14797dd3b0 100755
--- a/tests/qemu-iotests/060
+++ b/tests/qemu-iotests/060
@@ -426,6 +426,19 @@ echo '--- Repairing ---'
 _check_test_img -q -r all
 _check_test_img -r all
 
+echo
+echo "=== Testing the QEMU shutdown with a corrupted image ==="
+echo
+_make_test_img 64M
+poke_file "$TEST_IMG" "$rt_offset""\x00\x00\x00\x00\x00\x00\x00\x00"
+echo "{'execute': 'qmp_capabilities'}
+  {'execute': 'human-monitor-command',
+   'arguments': {'command-line': 'qemu-io drive \"write 0 512\"'}}
+  {'execute': 'quit'}" \
+| $QEMU -qmp stdio -nographic -nodefaults \
+-drive if=none,node-name=drive,file="$TEST_IMG",driver=qcow2 \
+| _filter_qmp | _filter_qemu_io
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/060.out b/tests/qemu-iotests/060.out
index 56f5eb15d8..c4cb7c665e 100644
--- a/tests/qemu-iotests/060.out
+++ b/tests/qemu-iotests/060.out
@@ -399,4 +399,16 @@ The following inconsistencies were found and repaired:
 
 Double checking the fixed ima

[Qemu-block] [PULL 1/7] block: Don't use BLK_PERM_CONSISTENT_READ for format probing

2017-11-21 Thread Kevin Wolf
For format probing, we don't really care whether all of the image
content is consistent. The only thing we're looking at is the image
header, and specifically the magic numbers that are expected to never
change, no matter how inconsistent the guest visible disk content is.

Therefore, don't request BLK_PERM_CONSISTENT_READ. This allows using
format probing, e.g. in the context of 'qemu-img info', even while the
guest visible data in the image is inconsistent during a running block
job.

Signed-off-by: Kevin Wolf 
Reviewed-by: Fam Zheng 
---
 block.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 6c8ef98dfa..68b724206d 100644
--- a/block.c
+++ b/block.c
@@ -2579,7 +2579,10 @@ static BlockDriverState *bdrv_open_inherit(const char 
*filename,
 goto fail;
 }
 if (file_bs != NULL) {
-file = blk_new(BLK_PERM_CONSISTENT_READ, BLK_PERM_ALL);
+/* Not requesting BLK_PERM_CONSISTENT_READ because we're only
+ * looking at the header to guess the image format. This works even
+ * in cases where a guest would not see a consistent state. */
+file = blk_new(0, BLK_PERM_ALL);
 blk_insert_bs(file, file_bs, &local_err);
 bdrv_unref(file_bs);
 if (local_err) {
-- 
2.13.6




Re: [Qemu-block] [PATCH v2 for-2.11 2/4] coroutine: abort if we try to schedule or enter a pending coroutine

2017-11-21 Thread Kevin Wolf
On 21.11.2017 at 03:23, Jeff Cody wrote:
> The previous patch fixed a race condition, in which there were
> coroutines being executing doubly, or after coroutine deletion.
> 
> We can detect common scenarios when this happens, and print an error
> message and abort before we corrupt memory / data, or segfault.
> 
> This patch will abort if an attempt to enter a coroutine is made while
> it is currently pending execution, either in a specific AioContext bh,
> or pending execution via a timer.  It will also abort if a coroutine
> is scheduled, before a prior scheduled run has occured.
> 
> We cannot rely on the existing co->caller check for recursive re-entry
> to catch this, as the coroutine may run and exit with
> COROUTINE_TERMINATE before the scheduled coroutine executes.
> 
> (This is the scenario that was occuring and fixed in the previous
> patch).
> 
> Signed-off-by: Jeff Cody 
> ---
>  include/qemu/coroutine_int.h |  6 ++
>  util/async.c | 11 +++
>  util/qemu-coroutine-sleep.c  | 11 +++
>  util/qemu-coroutine.c| 11 +++
>  4 files changed, 39 insertions(+)
> 
> diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
> index cb98892..56e4c48 100644
> --- a/include/qemu/coroutine_int.h
> +++ b/include/qemu/coroutine_int.h
> @@ -53,6 +53,12 @@ struct Coroutine {
>  
>  /* Only used when the coroutine has yielded.  */
>  AioContext *ctx;
> +
> +/* Used to catch and abort on illegal co-routine entry.
> + * Will contain the name of the function that had first
> + * scheduled the coroutine. */
> +const char *scheduled;

Not sure if it makes any difference in practice, but I just want to
mention that the new field is right after a cacheline boundary and
the only field that is used in qemu_aio_coroutine_enter() and accesses
this second cacheline.

I'm not paying much attention to this kind of thing in most contexts,
but entering a coroutine is a hot path that we want to be fast, so maybe
it's worth having a second look.

Kevin



Re: [Qemu-block] [PATCH v2 for-2.11 1/4] blockjob: do not allow coroutine double entry or entry-after-completion

2017-11-21 Thread Jeff Cody
On Tue, Nov 21, 2017 at 02:12:32PM +0100, Paolo Bonzini wrote:
> On 21/11/2017 11:49, Stefan Hajnoczi wrote:
> > On Mon, Nov 20, 2017 at 09:23:23PM -0500, Jeff Cody wrote:
> >> @@ -291,10 +291,10 @@ void block_job_start(BlockJob *job)
> >>  {
> >>  assert(job && !block_job_started(job) && job->paused &&
> >> job->driver && job->driver->start);
> >> -job->co = qemu_coroutine_create(block_job_co_entry, job);
> >>  job->pause_count--;
> >>  job->busy = true;
> >>  job->paused = false;
> >> +job->co = qemu_coroutine_create(block_job_co_entry, job);
> >>  bdrv_coroutine_enter(blk_bs(job->blk), job->co);
> >>  }
> > 
> > Please see discussion on v1 about this hunk.
> > 
> > The rest looks good.
> 
> I'm okay with this hunk, but I would appreciate that the commit message
> said why it's okay to delay block job cancellation after
> block_job_sleep_ns returns.
> 

Stefan is right in his reply to my v1, so I'll go ahead and drop this hunk
for v3.  I'll also add the info you requested to the commit message.

Jeff



Re: [Qemu-block] [PATCH v2 for-2.11 1/4] blockjob: do not allow coroutine double entry or entry-after-completion

2017-11-21 Thread Paolo Bonzini
On 21/11/2017 11:49, Stefan Hajnoczi wrote:
> On Mon, Nov 20, 2017 at 09:23:23PM -0500, Jeff Cody wrote:
>> @@ -291,10 +291,10 @@ void block_job_start(BlockJob *job)
>>  {
>>  assert(job && !block_job_started(job) && job->paused &&
>> job->driver && job->driver->start);
>> -job->co = qemu_coroutine_create(block_job_co_entry, job);
>>  job->pause_count--;
>>  job->busy = true;
>>  job->paused = false;
>> +job->co = qemu_coroutine_create(block_job_co_entry, job);
>>  bdrv_coroutine_enter(blk_bs(job->blk), job->co);
>>  }
> 
> Please see discussion on v1 about this hunk.
> 
> The rest looks good.

I'm okay with this hunk, but I would appreciate that the commit message
said why it's okay to delay block job cancellation after
block_job_sleep_ns returns.

Thanks,

Paolo





Re: [Qemu-block] [PATCH v2 for-2.11 2/4] coroutine: abort if we try to schedule or enter a pending coroutine

2017-11-21 Thread Paolo Bonzini
On 21/11/2017 11:59, Stefan Hajnoczi wrote:
> On Mon, Nov 20, 2017 at 09:23:24PM -0500, Jeff Cody wrote:
>> @@ -438,6 +439,16 @@ fail:
>>  void aio_co_schedule(AioContext *ctx, Coroutine *co)
>>  {
>>  trace_aio_co_schedule(ctx, co);
>> +const char *scheduled = atomic_read(&co->scheduled);
>> +
>> +if (scheduled) {
>> +fprintf(stderr,
>> +"%s: Co-routine was already scheduled in '%s'\n",
>> +__func__, scheduled);
>> +abort();
>> +}
>> +atomic_set(&co->scheduled, __func__);
> 
> According to docs/devel/atomics.txt atomic_set()/atomic_read() are
> weakly ordered.  They require memory barriers to provide guarantees
> about ordering.  Your patch doesn't include barriers or comments about
> where the implicit barriers are.
> 
> The docs recommend using the following instead of
> atomic_read()/atomic_set() to get ordering:
> 
>   typeof(*ptr) atomic_mb_read(ptr)
>   void atomic_mb_set(ptr, val)

Even with atomic_mb_read/atomic_mb_set we should document what memory
ordering is required for correctness (i.e. what should be ordered
against what, or equivalently what operation has release/acquire semantics).

My reasoning is that the NULL write to co->scheduled should be ordered
before a write of co (anywhere), but the same is true for co->ctx so the
NULL write is ordered by the smp_wmb in qemu_aio_coroutine_enter.  So
that is fine but it needs a comment.

Dually, the read should be ordered after a read of co, but that's
handled by the callers via atomic_rcu_read if they need the ordering.
No comment needed here I think.

What's left is the __func__ write, where atomic_mb_set should be used
indeed. Or, perhaps better:

const char *scheduled = atomic_cmpxchg(&co->scheduled,
   NULL, __func__);
if (scheduled) {
...
}

... which also removes the atomic_read.

Thanks,

Paolo





Re: [Qemu-block] segfault in parallel blockjobs (iotest 30)

2017-11-21 Thread Alberto Garcia
On Thu 16 Nov 2017 05:09:59 PM CET, Anton Nefedov wrote:
 I have the impression that one major source of headaches is the
 fact that the reopen queue contains nodes that don't need to be
 reopened at all. Ideally this should be detected early on in
 bdrv_reopen_queue(), so there's no chance that the queue contains
 nodes used by a different block job. If we had that then op
 blockers should be enough to prevent these things. Or am I missing
 something?

>>> After applying Max's patch I tried the similar approach; that is
>>> keep BDSes referenced while they are in the reopen queue.  Now I get
>>> the stream job hanging. Somehow one blk_root_drained_begin() is not
>>> paired with blk_root_drained_end(). So the job stays paused.
>> 
>> I can see this if I apply Max's patch and keep refs to BDSs in the
>> reopen queue:
>> 
>> #0  block_job_pause (...) at blockjob.c:130
>> #1  0x55c143cb586d in block_job_drained_begin (...) at blockjob.c:227
>> #2  0x55c143d08067 in blk_set_dev_ops (...) at block/block-backend.c:887
>> #3  0x55c143cb69db in block_job_create (...) at blockjob.c:678
>> #4  0x55c143d17c0c in mirror_start_job (...) at block/mirror.c:1177
>> 
>> There's a ops->drained_begin(opaque) call in blk_set_dev_ops() that
>> doesn't seem to be paired. 
>
> My understanding for now is
>
>1. in bdrv_drain_all_begin(), BDS gets drained with
>   bdrv_parent_drained_begin(), all the parents get
>   blk_root_drained_begin(), pause their jobs.
>2. one of the jobs finishes and deletes BB.
>3. in bdrv_drain_all_end(), bdrv_parent_drained_end() is never
>   called because even though BDS still exists (referenced in the
>   queue), it cannot be accessed as bdrv_next() takes BDS from the
>   global BB list (and the corresponding BB is deleted).

Yes, I was debugging this and I got to a similar conclusion. With
test_stream_commit from iotest 030 I can see that

   1. the block-stream job is created first, then stream_run begins and
  starts copying the data.
   2. block-commit starts and tries to reopen the top image in
  read-write mode. This pauses the stream block job and drains all
  BDSs.
   3. The draining causes the stream job to finish, it is deferred to
  the main loop, then stream_complete finishes and unrefs the block
  job, deleting it. At the point of deletion the pause count was
  still > 0 (because of step (2))

> Max's patch v1 could have helped:
> http://lists.nongnu.org/archive/html/qemu-devel/2017-11/msg01835.html

This doesn't solve the problem.

> Or, perhaps another approach, keep BlockJob referenced while it is
> paused (by block_job_pause/resume_all()). That should prevent it from
> deleting the BB.

Yes, I tried this and it actually solves the issue. But I still think
that the problem is that block jobs are allowed to finish when they are
paused.

Adding block_job_pause_point(&s->common) at the end of stream_run()
fixes the problem too.

Berto
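
For illustration, the change described above would sit roughly here (a
simplified sketch, not the actual block/stream.c code):

    static void coroutine_fn stream_run(void *opaque)
    {
        StreamBlockJob *s = opaque;
        StreamCompleteData *data;
        int ret = 0;

        /* ... the actual copy loop of block/stream.c elided ... */

        data = g_malloc(sizeof(*data));
        data->ret = ret;

        /* Proposed addition: give a pending pause a chance to take effect
         * before the job defers its completion to the main loop. */
        block_job_pause_point(&s->common);

        block_job_defer_to_main_loop(&s->common, stream_complete, data);
    }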



Re: [Qemu-block] [PATCH 1/5] blockjob: do not allow coroutine double entry or entry-after-completion

2017-11-21 Thread Kevin Wolf
On 20.11.2017 at 23:25, Paolo Bonzini wrote:
> On 20/11/2017 12:16, Stefan Hajnoczi wrote:
> > This raises questions about the ability to cancel sleep:
> > 
> > 1. Does something depend on cancelling sleep?
> 
> block_job_cancel does, but in practice the sleep time is so small
> (smaller than SLICE_TIME, which is 100 ms) that we probably don't care.

Just note that this is something that can happen during the final
migration phase when the VM is already stopped. In other words, with
non-shared storage, these up to 100 ms are added to the migration
downtime.

Kevin

> I agree with Jeff that canceling the sleep by force-entering the
> coroutine seemed clever but is probably a very bad idea.
> 
> Paolo




Re: [Qemu-block] [Qemu-devel] [PATCH v2 for-2.11 2/4] coroutine: abort if we try to schedule or enter a pending coroutine

2017-11-21 Thread Eric Blake
On 11/20/2017 08:23 PM, Jeff Cody wrote:
> The previous patch fixed a race condition, in which there were
> coroutines being executing doubly, or after coroutine deletion.
> 
> We can detect common scenarios when this happens, and print an error
> message and abort before we corrupt memory / data, or segfault.
> 
> This patch will abort if an attempt to enter a coroutine is made while
> it is currently pending execution, either in a specific AioContext bh,
> or pending execution via a timer.  It will also abort if a coroutine
> is scheduled, before a prior scheduled run has occured.

s/occured/occurred/

> 
> We cannot rely on the existing co->caller check for recursive re-entry
> to catch this, as the coroutine may run and exit with
> COROUTINE_TERMINATE before the scheduled coroutine executes.
> 
> (This is the scenario that was occuring and fixed in the previous

s/occuring/occurring/

> patch).
> 
> Signed-off-by: Jeff Cody 
> ---

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-block] [PATCH v2 for-2.11 2/4] coroutine: abort if we try to schedule or enter a pending coroutine

2017-11-21 Thread Stefan Hajnoczi
On Mon, Nov 20, 2017 at 09:23:24PM -0500, Jeff Cody wrote:
> @@ -438,6 +439,16 @@ fail:
>  void aio_co_schedule(AioContext *ctx, Coroutine *co)
>  {
>  trace_aio_co_schedule(ctx, co);
> +const char *scheduled = atomic_read(&co->scheduled);
> +
> +if (scheduled) {
> +fprintf(stderr,
> +"%s: Co-routine was already scheduled in '%s'\n",
> +__func__, scheduled);
> +abort();
> +}
> +atomic_set(&co->scheduled, __func__);

According to docs/devel/atomics.txt atomic_set()/atomic_read() are
weakly ordered.  They require memory barriers to provide guarantees
about ordering.  Your patch doesn't include barriers or comments about
where the implicit barriers are.

The docs recommend using the following instead of
atomic_read()/atomic_set() to get ordering:

  typeof(*ptr) atomic_mb_read(ptr)
  void atomic_mb_set(ptr, val)




Re: [Qemu-block] [PATCH v2 for-2.11 1/4] blockjob: do not allow coroutine double entry or entry-after-completion

2017-11-21 Thread Stefan Hajnoczi
On Mon, Nov 20, 2017 at 09:23:23PM -0500, Jeff Cody wrote:
> @@ -291,10 +291,10 @@ void block_job_start(BlockJob *job)
>  {
>  assert(job && !block_job_started(job) && job->paused &&
> job->driver && job->driver->start);
> -job->co = qemu_coroutine_create(block_job_co_entry, job);
>  job->pause_count--;
>  job->busy = true;
>  job->paused = false;
> +job->co = qemu_coroutine_create(block_job_co_entry, job);
>  bdrv_coroutine_enter(blk_bs(job->blk), job->co);
>  }

Please see discussion on v1 about this hunk.

The rest looks good.




Re: [Qemu-block] [PATCH 1/5] blockjob: do not allow coroutine double entry or entry-after-completion

2017-11-21 Thread Stefan Hajnoczi
On Mon, Nov 20, 2017 at 08:36:19AM -0500, Jeff Cody wrote:
> On Mon, Nov 20, 2017 at 11:16:53AM +, Stefan Hajnoczi wrote:
> > On Sun, Nov 19, 2017 at 09:46:42PM -0500, Jeff Cody wrote:
> > > --- a/blockjob.c
> > > +++ b/blockjob.c
> > > @@ -291,10 +291,10 @@ void block_job_start(BlockJob *job)
> > >  {
> > >  assert(job && !block_job_started(job) && job->paused &&
> > > job->driver && job->driver->start);
> > > -job->co = qemu_coroutine_create(block_job_co_entry, job);
> > >  job->pause_count--;
> > >  job->busy = true;
> > >  job->paused = false;
> > > +job->co = qemu_coroutine_create(block_job_co_entry, job);
> > >  bdrv_coroutine_enter(blk_bs(job->blk), job->co);
> > >  }
> > >  
> > 
> > This hunk makes no difference.  The coroutine is only entered by
> > bdrv_coroutine_enter() so the order of job field initialization doesn't
> > matter.
> > 
> 
> It likely makes no difference with the current code (unless there is a
> latent bug). However I made the change to protect against the following
> scenario - which, perhaps to your point, would be a bug in any case:
> 
> 1. job->co = qemu_coroutine_create()
> 
> * Now block_job_started() returns true, as it just checks for job->co
> 
> 2. Another thread calls block_job_enter(), before we call
>bdrv_coroutine_enter().

The job is protected by AioContext acquire/release.  Other threads
cannot touch it because the block_job_start() caller has already
acquired the AioContext.
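
(For illustration, the locking pattern being relied on looks roughly like
this; the wrapper name is made up:)

    static void start_job_locked(BlockJob *job)
    {
        AioContext *ctx = blk_get_aio_context(job->blk);

        aio_context_acquire(ctx);
        block_job_start(job);   /* job->co is created and entered under the lock */
        aio_context_release(ctx);
    }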

> 
> * block_job_enter() checks job->busy and block_job_started() to
>   determine if coroutine entry is allowed.  Without this change, these
>   checks could pass and coroutine entry could occur.
> 
> * I don't think this can happen in the current code, but the above hunk
>   change is still correct, and would protect against such an
>   occurrence.
>
> I guess the question is, "is it worth doing?", to try and prevent that sort
> of buggy behavior. My thought was "yes" because:
> 
> A) there is no penalty in doing it this way
> 
> B) while a bug, double entry like this can lead to memory and/or
> data corruption, and the checks for co->caller et al. might not
> catch it.  This is particularly true if the coroutine exits
> (COROUTINE_TERMINATE) before the re-entry.
> 
> But maybe if we are concerned about that we should figure out a way to
> abort() instead.  Of course, that makes allowing recursive coroutines more
> difficult in the future.

The compiler and CPU can reorder memory accesses so simply reordering
assignment statements is ineffective against threads.

I'm against merging this hunk because:

1. There is a proper thread-safety mechanism in place that callers are
   already using, so this is the wrong way to attempt to provide
   thread-safety.

2. This change doesn't protect against the multi-threaded scenario you
   described because the memory order isn't being controlled.

> 
> 
> > > @@ -797,11 +797,14 @@ void block_job_sleep_ns(BlockJob *job, 
> > > QEMUClockType type, int64_t ns)
> > >  return;
> > >  }
> > >  
> > > -job->busy = false;
> > > +/* We need to leave job->busy set here, because when we have
> > > + * put a coroutine to 'sleep', we have scheduled it to run in
> > > + * the future.  We cannot enter that same coroutine again before
> > > + * it wakes and runs, otherwise we risk double-entry or entry after
> > > + * completion. */
> > >  if (!block_job_should_pause(job)) {
> > >  co_aio_sleep_ns(blk_get_aio_context(job->blk), type, ns);
> > >  }
> > > -job->busy = true;
> > >  
> > >  block_job_pause_point(job);
> > 
> > This leaves a stale doc comment in include/block/blockjob_int.h:
> > 
> >   /**
> >* block_job_sleep_ns:
> >* @job: The job that calls the function.
> >* @clock: The clock to sleep on.
> >* @ns: How many nanoseconds to stop for.
> >*
> >* Put the job to sleep (assuming that it wasn't canceled) for @ns
> >* nanoseconds.  Canceling the job will interrupt the wait immediately.
> >~~
> >*/
> 
> I didn't catch the doc, that should be changed as well.
> 
> >   void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns);
> > 
> > This raises questions about the ability to cancel sleep:
> > 
> > 1. Does something depend on cancelling sleep?
> > 
> 
> Not that I can tell.  The advantage is that you don't have to wait for the
> timer, so something like qmp_block_job_cancel() will cancel sooner.
> 
> But it is obviously broken with the current coroutine implementation to try
> to do that.
> 
> > 2. Did cancellation work properly in commit
> >4513eafe928ff47486f4167c28d364c72b5ff7e3 ("block: add
> >block_job_sleep_ns") and was it broken afterwards?
> > 
> 
> With iothreads, the answer is complicated.  It was broken for a while for
> other reasons.
> 
> It broke after using aio_co_wake(

Re: [Qemu-block] [PATCH 3/5] coroutines: abort if we try to enter a still-sleeping coroutine

2017-11-21 Thread Stefan Hajnoczi
On Mon, Nov 20, 2017 at 08:45:21AM -0500, Jeff Cody wrote:
> On Mon, Nov 20, 2017 at 11:43:34AM +, Stefan Hajnoczi wrote:
> > On Sun, Nov 19, 2017 at 09:46:44PM -0500, Jeff Cody wrote:
> > BTW an alternative to adding individual bools is to implement a finite
> > state machine for the entire coroutine lifecycle.  A single function can
> > validate all state transitions:
> > 
> >   void check_state_transition(CoState old, CoState new,
> >   const char *action)
> >   {
> >   const char *errmsg = fsm[old][new];
> >   if (!errmsg) {
> >   return; /* valid transition! */
> >   }
> > 
> >   fprintf(stderr, "Cannot %s coroutine from %s state\n",
> >   action, state_name[old]);
> >   abort();
> >   }
> > 
> > Specifying fsm[][] forces us to think through all possible state
> > transitions.  This approach is proactive whereas adding bool flags is
> > reactive since it only covers a subset of states that were encountered
> > after crashes.  I'm not sure if it's worth it though :).
> 
> Interesting idea; maybe more for 2.12 instead of 2.11, though?

Sure.

Stefan
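
As an illustration of the shape such a table could take (state names and
transitions are invented here, not taken from any patch):

    typedef enum {
        CO_CREATED,
        CO_SCHEDULED,
        CO_RUNNING,
        CO_YIELDED,
        CO_TERMINATED,
        CO_STATE_MAX,
    } CoState;

    static const char *const state_name[CO_STATE_MAX] = {
        [CO_CREATED]    = "created",
        [CO_SCHEDULED]  = "scheduled",
        [CO_RUNNING]    = "running",
        [CO_YIELDED]    = "yielded",
        [CO_TERMINATED] = "terminated",
    };

    /* fsm[old][new]: a NULL entry means the transition is legal; any
     * non-NULL string marks it illegal (the string documents why). */
    static const char *const fsm[CO_STATE_MAX][CO_STATE_MAX] = {
        [CO_SCHEDULED]  = {
            [CO_SCHEDULED] = "coroutine is already scheduled",
        },
        [CO_TERMINATED] = {
            [CO_RUNNING]   = "coroutine has already terminated",
            [CO_SCHEDULED] = "coroutine has already terminated",
        },
    };

In a real implementation every row would presumably be spelled out explicitly,
so that no transition becomes legal merely by omission.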




[Qemu-block] [PATCH v5 for-2.11] QAPI & interop: Clarify events emitted by 'block-job-cancel'

2017-11-21 Thread Kashyap Chamarthy
When you cancel an in-progress 'mirror' job (or "active `block-commit`")
with QMP `block-job-cancel`, it emits the event: BLOCK_JOB_CANCELLED.
However, when `block-job-cancel` is issued *after* `drive-mirror` has
indicated (via the event BLOCK_JOB_READY) that the source and
destination have reached synchronization:

[...] # Snip `drive-mirror` invocation & outputs
{
  "execute":"block-job-cancel",
  "arguments":{
"device":"virtio0"
  }
}

{"return": {}}

It (`block-job-cancel`) will counterintuitively emit the event
'BLOCK_JOB_COMPLETED':

{
  "timestamp":{
"seconds":1510678024,
"microseconds":526240
  },
  "event":"BLOCK_JOB_COMPLETED",
  "data":{
"device":"virtio0",
"len":41126400,
"offset":41126400,
"speed":0,
"type":"mirror"
  }
}

But this is expected behaviour, where the _COMPLETED event indicates
that synchronization has successfully ended (and the destination now has
a point-in-time copy, which is at the time of cancel).

So add a small note to this effect in 'block-core.json'.  While at it,
also update the "Live disk synchronization -- drive-mirror and
blockdev-mirror" section in 'live-block-operations.rst'.

(Thanks: Max Reitz for reminding me of this caveat on IRC.)

Signed-off-by: Kashyap Chamarthy 
Reviewed-by: Eric Blake 
---
Changes in v5:
 - Fix the incorrect claim that 'block-job-complete' is used as part of
   live storage migration [Dave Gilbert]

Changes in v4:
 - Make the wording nicer [Eric Blake]

Changes in v3:
 - Fix / rewrite the section "Live disk synchronization -- drive-mirror
   and blockdev-mirror" to note this gotcha [John Snow]

Changes in v2:
 - "Note:" seems to be a special construct in Patchew, my usage caused a
build failure.  So do: s/Note:/Note that/
 - Add the missing 'Signed-off-by'
---
 docs/interop/live-block-operations.rst | 50 ++
 qapi/block-core.json   |  6 
 2 files changed, 38 insertions(+), 18 deletions(-)

diff --git a/docs/interop/live-block-operations.rst 
b/docs/interop/live-block-operations.rst
index 5f01797..0824827 100644
--- a/docs/interop/live-block-operations.rst
+++ b/docs/interop/live-block-operations.rst
@@ -506,26 +506,40 @@ Again, given our familiar disk image chain::
 
 [A] <-- [B] <-- [C] <-- [D]
 
-The ``drive-mirror`` (and its newer equivalent ``blockdev-mirror``) allows
-you to copy data from the entire chain into a single target image (which
-can be located on a different host).
-
-Once a 'mirror' job has started, there are two possible actions while a
-``drive-mirror`` job is active:
-
-(1) Issuing the command ``block-job-cancel`` after it emits the event
-``BLOCK_JOB_CANCELLED``: will (after completing synchronization of
-the content from the disk image chain to the target image, [E])
-create a point-in-time (which is at the time of *triggering* the
-cancel command) copy, contained in image [E], of the the entire disk
+The ``drive-mirror`` (and its newer equivalent ``blockdev-mirror``)
+allows you to copy data from the entire chain into a single target image
+(which can be located on a different host), [E].
+
+.. note::
+
+When you cancel an in-progress 'mirror' job *before* the source and
+target are synchronized, ``block-job-cancel`` will emit the event
+``BLOCK_JOB_CANCELLED``.  However, note that if you cancel a
+'mirror' job *after* it has indicated (via the event
+``BLOCK_JOB_READY``) that the source and target have reached
+synchronization, then the event emitted by ``block-job-cancel``
+changes to ``BLOCK_JOB_COMPLETED``.
+
+Besides the 'mirror' job, the "active ``block-commit``" is the only
+other block device job that emits the event ``BLOCK_JOB_READY``.
+The rest of the block device jobs ('stream', "non-active
+``block-commit``", and 'backup') end automatically.
+
+So there are two possible actions to take, after a 'mirror' job has
+emitted the event ``BLOCK_JOB_READY``, indicating that the source and
+target have reached synchronization:
+
+(1) Issuing the command ``block-job-cancel`` (after it emits the event
+``BLOCK_JOB_COMPLETED``) will create a point-in-time (which is at
+the time of *triggering* the cancel command) copy of the entire disk
 image chain (or only the top-most image, depending on the ``sync``
-mode).
+mode), contained in the target image [E]. One use case for this is
+live storage migration.
 
-(2) Issuing the command ``block-job-complete`` after it emits the event
-``BLOCK_JOB_COMPLETED``: will, after completing synchronization of
-the content, adjust the guest device (i.e. live QEMU) to point to
-the target image, and, causing all the new writes from this point on
-to happen there.  One use case for this is live storage migration.
+(2) Issuing the command ``block-job-complete`` (after it emits the event
+``BLOCK_JOB_COMPLETE

Re: [Qemu-block] [Qemu-devel] [PATCH v2 1/4] qcow2: multiple clusters write compressed

2017-11-21 Thread Stefan Hajnoczi
On Mon, Nov 20, 2017 at 06:03:00PM +0300, Denis V. Lunev wrote:
> On 11/20/2017 05:53 PM, Stefan Hajnoczi wrote:
> > On Thu, Nov 16, 2017 at 07:54:55PM +0300, Anton Nefedov wrote:
> >> From: Pavel Butsykin 
> >>
> >> At the moment, qcow2_co_pwritev_compressed can process the requests size
> >> less than or equal to one cluster. This patch added possibility to write
> >> compressed data in the QCOW2 more than one cluster. The implementation
> >> is simple, we just split large requests into separate clusters and write
> >> using existing functionality.
> >>
> >> Signed-off-by: Anton Nefedov 
> >> ---
> >>  block/qcow2.c | 73 
> >> ++-
> >>  1 file changed, 52 insertions(+), 21 deletions(-)
> > Kevin: qcow2_alloc_compressed_cluster_offset() sets up an L2 table entry
> > before doing the compressed data write.  Can the following scenario
> > occur if there is another request racing with the compressed write?
> >
> > 1. Compressed cluster L2 table entry added to qcow2 in-memory cache
> > 2. Another request forces cached L2 table to be written to disk
> > 3. Power failure or crash before compressed data is written
> >
> > Now there is an L2 table entry pointing to garbage data.  This violates
> > qcow2's data integrity model.
> >
> > I'm not sure if compressed writes are safe...  It may have been okay for
> > qemu-img convert but the risk is increased when running a VM.
> qemu-img is now multi-coroutine, thus the danger is exactly the same.

Yes, the problem is exactly the same, but users delete incomplete
output files when qemu-img convert is interrupted due to power failure.
That's probably why it hasn't been a priority to fix this qcow2 code
path.

Stefan




Re: [Qemu-block] [Qemu-devel] [PATCH v4 for-2.11] QAPI & interop: Clarify events emitted by 'block-job-cancel'

2017-11-21 Thread Kashyap Chamarthy
On Fri, Nov 17, 2017 at 07:54:44PM +0100, Kashyap Chamarthy wrote:

[...]

> +So there are two possible actions to take, after a 'mirror' job has
> +emitted the event ``BLOCK_JOB_READY``, indicating that the source and
> +target have reached synchronization:
> +
> +(1) Issuing the command ``block-job-cancel`` (after it emits the event
> +``BLOCK_JOB_COMPLETED``) will create a point-in-time (which is at
> +the time of *triggering* the cancel command) copy of the entire disk
>  image chain (or only the top-most image, depending on the ``sync``
> -mode).
> +mode), contained in the target image [E].
>  
> -(2) Issuing the command ``block-job-complete`` after it emits the event
> -``BLOCK_JOB_COMPLETED``: will, after completing synchronization of
> -the content, adjust the guest device (i.e. live QEMU) to point to
> -the target image, and, causing all the new writes from this point on
> -to happen there.  One use case for this is live storage migration.
> +(2) Issuing the command ``block-job-complete`` (after it emits the event
> +``BLOCK_JOB_COMPLETED``) will adjust the guest device (i.e. live
> +QEMU) to point to the target image, [E], causing all the new writes
> +from this point on to happen there.  One use case for this is live
> +storage migration.

Dave Gilbert pointed out on IRC that last sentence ("One use case for
this is live storage migration") should apply to point (1) above.  He's
right, I'll send a v5 with that (and the commit message tweak I noted
earlier) fix.

[...]


-- 
/kashyap