date:20170428

Re: [Qemu-block] [Qemu-devel] [PATCH v17 00/24] qcow2: persistent dirty bitmaps

2017-04-28 Thread Vladimir Sementsov-Ogievskiy

This is fixed by not fail on ! can_write() if there are no persistent 
BdrvDirtyBitmap's. But the problem is, what if we have readonly image with 
bitmaps? This should not fail. I'll have to add bool 'changed' field to 
BdrvDirtyBitmap to handle this. Or refactor all bool's to int flags

Best regards, 
Vladimir. 

От: John Snow [js...@redhat.com]
Отправлено: 27 апреля 2017 г. 19:43
Кому: Vladimir Sementsov-Ogievskiy; qemu-block@nongnu.org; qemu-de...@nongnu.org
Копия: kw...@redhat.com; f...@redhat.com; arm...@redhat.com; mre...@redhat.com; 
stefa...@redhat.com; pbonz...@redhat.com; Denis Lunev
Тема: Re: [Qemu-devel] [PATCH v17 00/24] qcow2: persistent dirty bitmaps

On 04/26/2017 07:30 AM, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
>
> There is a new update of qcow2-bitmap series - v17.
>
> web: 
> https://src.openvz.org/users/vsementsov/repos/qemu/browse?at=qcow2-bitmap-v17
> git: https://src.openvz.org/scm/~vsementsov/qemu.git (tag qcow2-bitmap-v17)
>

Hm, what is this branch based on? My patches tools are having trouble
finding common ancestors -- 2d6752d38d8acda6aae674a72b72be05482a58eb I
guess, but that's pre-rc0, and the histories don't quite match up.

I did a rebase myself, but I notice that patch 14/24; "qcow2: add
persistent dirty bitmaps support" causes a large number of iotest
regressions:

Failures: 007 009 010 011 012 013 014 015 017 018 019 020 022 023 024
025 026 028 029 030 031 032 034 035 037 038 039 040 041 042 043 044 046
047 048 050 051 052 053 055 056 058 060 061 062 063 065 066 073 074 080
082 086 089 090 095 097 098 102 104 110 112 114 115 117 121 122 124 129
130 132 133 138 140 141 142 150 152 154 155 156 158 159 170 176

Can you give that a look and if you respin, rebase on top of origin/master?

Thanks,
-John

> v17:
>
> 08: add r-b's by Max and John
> 09: clear unknown autoclear features from BDRVQcow2State before calling
> qcow2_load_autoloading_dirty_bitmaps(), and also do not extra update
> header if it is updated by qcow2_load_autoloading_dirty_bitmaps().
> 11: new patch, splitted out from 16
> 12: rewrite commit message
> 14: changing bdrv_close moved to separate new patch 11
> s/1/1ULL/ ; s/%u/%llu/ for two errors
> 16: s/No/Not/
> add Max's r-b
> 24: new patch
>
>
> v16:
>
> mostly by Kevin's comments:
> 07: just moved here, as preparation for merging refcounts to 08
> 08: move "qcow2-bitmap: refcounts" staff to this patch, to not break 
> qcow2-img check
> drop r-b's
> move necessary supporting static functions to this patch too (to not 
> break compilation)
> fprintf -> error_report
> other small changes with error messages and comments
> code style
> for qcow2-bitmap.c:
>   'include "exec/log.h"' was dropped
>   s/1/(1U << 0) for BME_FLAG_IN_USE
>   add BME_TABLE_ENTRY_FLAG_ALL_ONES to replace magic 1
>   don't check 'cluster_size <= 0' in check_table_entry
> old "[PATCH v15 08/25] block: introduce auto-loading bitmaps" was dropped
> 09: was "[PATCH v15 09/25] qcow2: add .bdrv_load_autoloading_dirty_bitmaps"
> drop r-b's
> some staff was moved to 08
> update_header_sync - error on bdrv_flush fail
> rename disk_sectors_in_bitmap_cluster to sectors_covered_by_bitmap_cluster
>  and adjust comment.
>  so, variable for storing this function result: s/dsc/sbc/
> s/1/BME_TABLE_ENTRY_FLAG_ALL_ONES/
> also, as Qcow2BitmapTable already introduced in 08,
>s/table_offset/table.offset/ and s/table_size/table.size, etc..
> update_ext_header_and_dir_in_place: add comments, add additional
>   update_header_sync, to reduce indeterminacy in case of error.
> call qcow2_load_autoloading_dirty_bitmaps directly from qcow2_open
> no .bdrv_load_autoloading_dirty_bitmaps
> 11: drop bdrv_store_persistent_dirty_bitmaps at all.
> drop r-b's
> 13: rename patch, rewrite commit msg
> drop r-b's
> move bdrv_release_named_dirty_bitmaps in bdrv_close() after 
> drv->bdrv_close()
> Qcow2BitmapTable is already introduced, so no
>   s/table_offset/table.offset/ and s/table_size/table.size, etc.. here
> old 25 with check_constraints_on_bitmap() improvements merged here (Eric)
> like in 09, s/dsc/sbc/ and 
> s/disk_sectors_in_bitmap_cluster/sectors_covered_by_bitmap_cluster/
> no .bdrv_store_persistent_dirty_bitmaps
> call qcow2_store_persistent_dirty_bitmaps directly from qcow2_inactivate
> 15: corresponding part of 25 merged here. Add John's r-b, as 25 was also 
> reviewed by John.
> 16: add John's r-b
>
>
> v15:
> 13,14: add John's r-b
> 15: qcow2_can_store_new_dirty_bitmap:
>   - check max dirty bitmaps and bitmap directory overhead
>   - switch to error_prepend
> rm Max's r-b
> not add John's r-b
> 17-24: add John's r-b
> 25: changed because 15 changed,
> not add John's r-b
>
>
> v14:
>
> 07: use '|=' to update need_update_header
> add John's r-b
> add Max's r-b
> 09: remove unused bitma

Re: [Qemu-block] RFC: some problems with bdrv_get_block_status

2017-04-28 Thread Denis V. Lunev

On 04/28/2017 09:31 PM, Eric Blake wrote:
> On 04/28/2017 11:17 AM, Denis V. Lunev wrote:
>> Hello, All!
>>
>> Recently we have experienced problems with very slow
>> bdrv_get_block_status call, which is massive called f.e.
>> from mirror_run().
>>
>> The problem was narrowed down to slow lseek64() system
>> call, which can take 1-2-3 seconds.
> I'm guessing you meant one-to-three (the range), not one-two-three
> (three separate digits), and just had an unfortunate abbreviation of
> 'to' turning into the phonetically-similar '2'.
>
from 1 to 3, you are correct


>> root@s158 ~]# strace -f -p 77048 -T  -e lseek
>> Process 77048 attached with 14 threads
>> [pid 77053] lseek(23, 6359613440, SEEK_DATA) = 6360276992 <1.250323>
> That sounds like a bug in your choice of filesystem.  It's been
> mentioned before that lseek has some pathetically poor behavior (I think
> tmpfs was one of the culprits), but I maintain that it is better to
> hammer on the kernel folks to fix the poor behavior than it is to have
> to implement user-space workarounds in every single affected program.
>
> That said, a workaround of being able to request the avoidance of lseek
> has been brought up on this list before.
>
>
>> The problem comes from this branch of the code
>>
>> bdrv_co_get_block_status
>> ...
>> if (bs->file &&
>> (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
>> (ret & BDRV_BLOCK_OFFSET_VALID)) {
>> int file_pnum;
>>
>> ret2 = bdrv_co_get_block_status(bs->file, ret >> BDRV_SECTOR_BITS,
>> *pnum, &file_pnum);
>> if (ret2 >= 0) {
>> /* Ignore errors.  This is just providing extra information, it
>>  * is useful but not necessary.
>>  */
>> if (!file_pnum) {
>> /* !file_pnum indicates an offset at or beyond the EOF;
>> it is
>>  * perfectly valid for the format block driver to point
>> to such
>>  * offsets, so catch it and mark everything as zero */
>> ret |= BDRV_BLOCK_ZERO;
>> } else {
>> /* Limit request to the range reported by the protocol
>> driver */
>> *pnum = file_pnum;
>> ret |= (ret2 & BDRV_BLOCK_ZERO);
>> }
>> }
>> }
> I don't see anything wrong with this code. It's nice to know that we
> have data (qcow2 says we have allocated bytes due to this layer, so
> don't refer to the backing files), but when the underlying file can ALSO
> tell us that the underlying protocol is sparse and we are on a hole,
> then we know that we have BDRV_BLOCK_ZERO condition which can in turn
> enable other optimizations in quite a bit of the stack.  It IS valid to
> return BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO (we don't have to limit
> ourselves to BDRV_BLOCK_DATA), and we should probably do so any time
> that such computation is cheap.
>
> I agree with your analysis that on a poor filesystem where lseek()
> behavior sucks that it is no longer cheap to determine where the holes are.
>
> But the proper workaround for those filesystems should NOT be made by
> undoing this part of bdrv_co_get_block_status(), but should rather be
> fixed in file-posix.c by the addition of a user-controllable knob on
> whether to skip lseek().  In other words, if we're going to work around
> the poor filesystem performance, the workaround should live in
> file-posix.c, not in the generic block/io.c.  Once the workaround is
> enabled, the 'ret2 = bdrv_co_get_block_status()' nested call should be
> lightning fast (because file-posix.c would no longer be using lseek when
> you've requested the workaround).
>
>> Frankly speaking, this optimization should not give much.
>> If upper layer format (QCOW2) says that we have data
>> here, then nowadays in 99.9% we do have the data.
> You are correct that we have BDRV_BLOCK_DATA; but it IS useful to ALSO
> know if that data reads back as all zeros (so we can skip reading it).
> Just because qcow2 reports something as data does NOT rule out whether
> the data still reads as zeros.
>
>> Meanwhile this branch can cause problems. We would
>> need block cleared entirely to get the benefit for most
>> cases in mirror and backup operations.
>>
>> At my opinion it worth to drop this at all.
>>
>> Guys, do you have any opinion?
> Again, my opinion is to not change this part of block/io.c.  Rather,
> work should be expended on individual protocol drivers to quit wasting
> unnecessary time on reporting BDRV_BLOCK_ZERO if the time spent to
> determine that is not worth the effort (as it is when using lseek() on
> inefficient filesystems).  It is always safe for a protocol driver to
> report BDRV_BLOCK_DATA instead of BDRV_BLOCK_ZERO, if asking the
> question is too expensive (treating holes as data may pessimize other
> operations, but that's okay if the penalty for asking is worse than what
> optimizations are able to regain).
>
>> Den
>>

Re: [Qemu-block] [Qemu-devel] [PATCH v5 4/4] RFC: shutdown: Expose full ShutdownCause across QMP

2017-04-28 Thread Eric Blake

[adding Dan in cc]

On 04/27/2017 09:13 PM, Eric Blake wrote:
> Since all reset/shutdown requests have been associated with a
> reason, we can expose the full reason rather than a simple bool
> to the guest. Document that any future additions to the enum
> type will still use a 'host-' or 'guest-' prefix.
> 
> Signed-off-by: Eric Blake 
> 

This is RFC because I really want to know whether libvirt and upper
layer management apps will care about the various causes, or if exposing
just a boolean is okay.  We can maintain backwards compatibility if we
release with just a boolean and later decide we want to expose the enum,
but it's easier to start with a design that meets everyone's needs that
it is to have to add stuff later or wish that we hadn't exposed quite so
many enum values.


> +++ b/qapi/event.json
> @@ -10,9 +10,7 @@
>  # Emitted when the virtual machine has shut down, indicating that qemu is
>  # about to exit.
>  #
> -# @guest: If true, the shutdown was triggered by a guest request (such as
> -# a guest-initiated ACPI shutdown request or other hardware-specific action)
> -# rather than a host request (such as sending qemu a SIGINT). (since 2.10)
> +# @reason: What triggered the reset. (since 2.10)
>  #
>  # Note: If the command-line option "-no-shutdown" has been specified, qemu 
> will
>  # not exit, and a STOP event will eventually follow the SHUTDOWN event
> @@ -21,11 +19,11 @@
>  #
>  # Example:
>  #
> -# <- { "event": "SHUTDOWN", "data": { "guest": true },
> +# <- { "event": "SHUTDOWN", "data": { "reason": "guest-shutdown" },
>  #  "timestamp": { "seconds": 1267040730, "microseconds": 682951 } }
>  #
>  ##
> -{ 'event': 'SHUTDOWN', 'data': { 'guest': 'bool' } }
> +{ 'event': 'SHUTDOWN', 'data': { 'reason': 'ShutdownCause' } }
> 


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [Qemu-devel] [PULL 14/34] iotests: Launch qemu-nbd with -e 42

2017-04-28 Thread Eric Blake

On 04/28/2017 03:33 PM, Kevin Wolf wrote:
> From: Max Reitz 
> 
> There is no reason for the qemu-nbd server used for tests not to accept
> an arbitrary number of clients. In fact, test 181 will require it to
> accept two clients at the same time (and thus it fails before this
> patch).

Quick question:

Is this mention of test 181 the same test that occurs earlier in the
pull request as 8/34?  If so, should the series have been reordered for
bisectability?  If not, should the commit message be touched up?

At any rate, bisectability of qemu-iotests is less of a concern than
bisectability of 'make check' success, so I don't think your answer
invalidates the pull request.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH v10 09/17] qcow2: Optimize write zero of unaligned tail cluster

2017-04-28 Thread Eric Blake

On 04/28/2017 03:48 PM, Max Reitz wrote:
> On 27.04.2017 03:46, Eric Blake wrote:
>> We've already improved discards to operate efficiently on the tail
>> of an unaligned qcow2 image; it's time to make a similar improvement
>> to write zeroes.  The special case is only valid at the tail
>> cluster of a file, where we must recognize that any sectors beyond
>> the image end would implicitly read as zero, and therefore should
>> not penalize our logic for widening a partial cluster into writing
>> the whole cluster as zero.
>>
>> Update test 154 to cover the new scenarios, using two images of
>> intentionally differing length.
>>
>> While at it, fix the test to gracefully skip when run as
>> ./check -qcow2 -o compat=0.10 154
>> since the older format lacks zero clusters already required earlier
>> in the test.
>>
>> Signed-off-by: Eric Blake 
>>

>>
>> +echo
>> +echo == unaligned image tail cluster, no allocation needed ==
>> +
>> +CLUSTER_SIZE=1024 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024))
> 
> Any reason for the CLUSTER_SIZE? It passes with 64 kB as well, and I
> actually think that would be a better test because as it is, the image's
> "unaligned" tail is exactly one cluster (so it isn't really unaligned).

Uhhh - copy-and-paste?  You're right, that 1024 is too small for what
I'm actually doing with it.  :(

> 
>> +_make_test_img -b "$TEST_IMG.base" $((size + 2048))
>> +
>> +# With no backing file, write to all or part of unallocated partial cluster
>> +
>> +# Backing file: 128m: ... | 00 --
> 
> Not sure what the "00 --" is supposed to mean. The cluster size is 1 kB,
> so this is just a single cluster; which will be converted from
> unallocated to a zero cluster because the tail reads as zero.
> 
>> +$QEMU_IO -c "write -z $size 512" "$TEST_IMG.base" | _filter_qemu_io

I'm doing a write of 512 bytes, so I was trying to show that in the
cluster starting at 128m, we have a write request for one sector to be
written to 00, while the rest of the sector is left uninitialized. The
obvious intent is that we note that the uninitialized portion reads as
zero, so we can optimize the entire cluster (even though it is a partial
cluster) to be a cluster write-zero instead of a sector-based allocating
write.  But since you've pointed out that my setup is flawed, I'll have
to double-check that I'm actually testing what I thought I was (I know I
hit the right breakpoints in gdb, but its the iotest expected output
that needs to show the difference between pre- and post-patched qemu
with regards to the partial-tail-cluster optimizations).

>> +
>> +# Backing file: 128m: ... | -- 00
> 
> Same here, but now it's the head that reads as zero (and it's already a
> zero cluster so nothing changes here).
> 
>> +$QEMU_IO -c "write -z $size 512" "$TEST_IMG.base" | _filter_qemu_io
> 
> But I suppose you mean $((size + 512))?

Umm. Yeah.  (Hey - you're not the only one that writes patches late at
night).

> 
>> +
>> +# Backing file: 128m: ... | 00 00
> 
> And finally this doesn't change anything either -- but then again this
> is a fully aligned request, so I don't quite see the point in testing
> this here.

At one point during my development, I had a bug where a partial write
request at the head of the cluster behaved differently from a partial
write request at the end (one allocated while the other did not).

Maybe it's best if I do a single write, then inspect the map, then reset
the image, rather than doing all three writes in a row and proving that
the net result of the three writes had no allocation.  In v9, when I was
trying to use unallocated clusters as a shortcut on files with no
backing image, this actually worked (after all three writes, the cluster
would still be unallocated); but since Kevin convinced me that v10 has
to set the zero flag on all three writes, I'm back to the point of
needing to clear the zero flag between writes.

Okay, I'll definitely have to fix it.

>> +# With backing file that reads as zero, write to all or part of entire 
>> cluster
>> +
>> +# Backing file: 128m: ... | 00 00
>> +# Active layer: 128m: ... | 00 00 00 00
>> +$QEMU_IO -c "write -z $size 2048" "$TEST_IMG" | _filter_qemu_io
>> +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
>> +
>> +# Backing file: 128m: ... | 00 00
>> +# Active layer: 128m: ... | 00 00 -- --
> 
> How so? Shouldn't this just stay a zero cluster because the rest of the
> cluster already reads as zero?

Same problem as above.  In v9, I was trying to exploit code that left a
cluster unallocated; but that's not viable, so I'll have to reset the
image between steps.

> 
> Also, I'm not quite sure what you are testing by just issuing three
> consecutive write requests without checking the result each time. You
> assume it to be something which you wrote in the comments above each,
> but you can't be sure (yes, if something goes wrong in between, the
> following final map should catch it, but it's not for certain).

Well, it helped me during devel

Re: [Qemu-block] RFC: some problems with bdrv_get_block_status

2017-04-28 Thread Denis V. Lunev

On 04/28/2017 09:31 PM, Eric Blake wrote:
> On 04/28/2017 11:17 AM, Denis V. Lunev wrote:
>> Hello, All!
>>
>> Recently we have experienced problems with very slow
>> bdrv_get_block_status call, which is massive called f.e.
>> from mirror_run().
>>
>> The problem was narrowed down to slow lseek64() system
>> call, which can take 1-2-3 seconds.
> I'm guessing you meant one-to-three (the range), not one-two-three
> (three separate digits), and just had an unfortunate abbreviation of
> 'to' turning into the phonetically-similar '2'.
First of all, Eric, thank you very much for the prompt answer.

yes. I have meant 1..3 seconds. Thank you for clarifying that.

>> root@s158 ~]# strace -f -p 77048 -T  -e lseek
>> Process 77048 attached with 14 threads
>> [pid 77053] lseek(23, 6359613440, SEEK_DATA) = 6360276992 <1.250323>
> That sounds like a bug in your choice of filesystem.  It's been
> mentioned before that lseek has some pathetically poor behavior (I think
> tmpfs was one of the culprits), but I maintain that it is better to
> hammer on the kernel folks to fix the poor behavior than it is to have
> to implement user-space workarounds in every single affected program.
>
> That said, a workaround of being able to request the avoidance of lseek
> has been brought up on this list before.
I have found that letter (Feb 2017). The problem there was with ZFS.
I have problems with EXT4. The problem comes not immediately
but after some time under load.

This call will also cost a lot for the case of NFSv4, which supports
SEEK_DATA and SEEK_HOLE, as it will require additional
network exchange. Thus this is not cheap for at least some cases
and thus it is dangerous to follow current way. May be other
filesystems are also affected.

>
>> The problem comes from this branch of the code
>>
>> bdrv_co_get_block_status
>> ...
>> if (bs->file &&
>> (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
>> (ret & BDRV_BLOCK_OFFSET_VALID)) {
>> int file_pnum;
>>
>> ret2 = bdrv_co_get_block_status(bs->file, ret >> BDRV_SECTOR_BITS,
>> *pnum, &file_pnum);
>> if (ret2 >= 0) {
>> /* Ignore errors.  This is just providing extra information, it
>>  * is useful but not necessary.
>>  */
>> if (!file_pnum) {
>> /* !file_pnum indicates an offset at or beyond the EOF;
>> it is
>>  * perfectly valid for the format block driver to point
>> to such
>>  * offsets, so catch it and mark everything as zero */
>> ret |= BDRV_BLOCK_ZERO;
>> } else {
>> /* Limit request to the range reported by the protocol
>> driver */
>> *pnum = file_pnum;
>> ret |= (ret2 & BDRV_BLOCK_ZERO);
>> }
>> }
>> }
> I don't see anything wrong with this code. It's nice to know that we
> have data (qcow2 says we have allocated bytes due to this layer, so
> don't refer to the backing files), but when the underlying file can ALSO
> tell us that the underlying protocol is sparse and we are on a hole,
> then we know that we have BDRV_BLOCK_ZERO condition which can in turn
> enable other optimizations in quite a bit of the stack.  It IS valid to
> return BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO (we don't have to limit
> ourselves to BDRV_BLOCK_DATA), and we should probably do so any time
> that such computation is cheap.
>
> I agree with your analysis that on a poor filesystem where lseek()
> behavior sucks that it is no longer cheap to determine where the holes are.
>
> But the proper workaround for those filesystems should NOT be made by
> undoing this part of bdrv_co_get_block_status(), but should rather be
> fixed in file-posix.c by the addition of a user-controllable knob on
> whether to skip lseek().  In other words, if we're going to work around
> the poor filesystem performance, the workaround should live in
> file-posix.c, not in the generic block/io.c.  Once the workaround is
> enabled, the 'ret2 = bdrv_co_get_block_status()' nested call should be
> lightning fast (because file-posix.c would no longer be using lseek when
> you've requested the workaround).
I understand that it is very good to know, but what benefit we will have
really:
- backup will skip cluster only if the cluster is all zeroes
- mirror skips the cluster only if it will be full zeroes
- the same applies for 'qemu-img convert'

I think we should somehow measure the benefit against the loss.
We have at least 2 (or probably 3) different filesystems which do
not work well with that.

I have seen the question to try disable 'detect-zeroes' and
this could look like a worthy option to put original behavior
under, i.e. we could perform lseek(SEEK_DATA) only if
'detect-zeroes' option is on, which well match the meaning
of that option and will save the flag namespace a bit.

Other option would be to split this API to one rough call whi

Re: [Qemu-block] RFC: some problems with bdrv_get_block_status

2017-04-28 Thread Denis V. Lunev

On 04/28/2017 10:35 PM, Denis V. Lunev wrote:
> On 04/28/2017 09:31 PM, Eric Blake wrote:
>> On 04/28/2017 11:17 AM, Denis V. Lunev wrote:
>>> Hello, All!
>>>
>>> Recently we have experienced problems with very slow
>>> bdrv_get_block_status call, which is massive called f.e.
>>> from mirror_run().
>>>
>>> The problem was narrowed down to slow lseek64() system
>>> call, which can take 1-2-3 seconds.
>> I'm guessing you meant one-to-three (the range), not one-two-three
>> (three separate digits), and just had an unfortunate abbreviation of
>> 'to' turning into the phonetically-similar '2'.
>>
> from 1 to 3, you are correct
>
>
>>> root@s158 ~]# strace -f -p 77048 -T  -e lseek
>>> Process 77048 attached with 14 threads
>>> [pid 77053] lseek(23, 6359613440, SEEK_DATA) = 6360276992 <1.250323>
>> That sounds like a bug in your choice of filesystem.  It's been
>> mentioned before that lseek has some pathetically poor behavior (I think
>> tmpfs was one of the culprits), but I maintain that it is better to
>> hammer on the kernel folks to fix the poor behavior than it is to have
>> to implement user-space workarounds in every single affected program.
>>
>> That said, a workaround of being able to request the avoidance of lseek
>> has been brought up on this list before.
>>
>>
>>> The problem comes from this branch of the code
>>>
>>> bdrv_co_get_block_status
>>> ...
>>> if (bs->file &&
>>> (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
>>> (ret & BDRV_BLOCK_OFFSET_VALID)) {
>>> int file_pnum;
>>>
>>> ret2 = bdrv_co_get_block_status(bs->file, ret >> BDRV_SECTOR_BITS,
>>> *pnum, &file_pnum);
>>> if (ret2 >= 0) {
>>> /* Ignore errors.  This is just providing extra information, it
>>>  * is useful but not necessary.
>>>  */
>>> if (!file_pnum) {
>>> /* !file_pnum indicates an offset at or beyond the EOF;
>>> it is
>>>  * perfectly valid for the format block driver to point
>>> to such
>>>  * offsets, so catch it and mark everything as zero */
>>> ret |= BDRV_BLOCK_ZERO;
>>> } else {
>>> /* Limit request to the range reported by the protocol
>>> driver */
>>> *pnum = file_pnum;
>>> ret |= (ret2 & BDRV_BLOCK_ZERO);
>>> }
>>> }
>>> }
>> I don't see anything wrong with this code. It's nice to know that we
>> have data (qcow2 says we have allocated bytes due to this layer, so
>> don't refer to the backing files), but when the underlying file can ALSO
>> tell us that the underlying protocol is sparse and we are on a hole,
>> then we know that we have BDRV_BLOCK_ZERO condition which can in turn
>> enable other optimizations in quite a bit of the stack.  It IS valid to
>> return BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO (we don't have to limit
>> ourselves to BDRV_BLOCK_DATA), and we should probably do so any time
>> that such computation is cheap.
>>
>> I agree with your analysis that on a poor filesystem where lseek()
>> behavior sucks that it is no longer cheap to determine where the holes are.
>>
>> But the proper workaround for those filesystems should NOT be made by
>> undoing this part of bdrv_co_get_block_status(), but should rather be
>> fixed in file-posix.c by the addition of a user-controllable knob on
>> whether to skip lseek().  In other words, if we're going to work around
>> the poor filesystem performance, the workaround should live in
>> file-posix.c, not in the generic block/io.c.  Once the workaround is
>> enabled, the 'ret2 = bdrv_co_get_block_status()' nested call should be
>> lightning fast (because file-posix.c would no longer be using lseek when
>> you've requested the workaround).
>>
>>> Frankly speaking, this optimization should not give much.
>>> If upper layer format (QCOW2) says that we have data
>>> here, then nowadays in 99.9% we do have the data.
>> You are correct that we have BDRV_BLOCK_DATA; but it IS useful to ALSO
>> know if that data reads back as all zeros (so we can skip reading it).
>> Just because qcow2 reports something as data does NOT rule out whether
>> the data still reads as zeros.
>>
>>> Meanwhile this branch can cause problems. We would
>>> need block cleared entirely to get the benefit for most
>>> cases in mirror and backup operations.
>>>
>>> At my opinion it worth to drop this at all.
>>>
>>> Guys, do you have any opinion?
>> Again, my opinion is to not change this part of block/io.c.  Rather,
>> work should be expended on individual protocol drivers to quit wasting
>> unnecessary time on reporting BDRV_BLOCK_ZERO if the time spent to
>> determine that is not worth the effort (as it is when using lseek() on
>> inefficient filesystems).  It is always safe for a protocol driver to
>> report BDRV_BLOCK_DATA instead of BDRV_BLOCK_ZERO, if asking the
>> question is too expensive (treating holes

Re: [Qemu-block] [PATCH v10 09/17] qcow2: Optimize write zero of unaligned tail cluster

2017-04-28 Thread Max Reitz

On 27.04.2017 03:46, Eric Blake wrote:
> We've already improved discards to operate efficiently on the tail
> of an unaligned qcow2 image; it's time to make a similar improvement
> to write zeroes.  The special case is only valid at the tail
> cluster of a file, where we must recognize that any sectors beyond
> the image end would implicitly read as zero, and therefore should
> not penalize our logic for widening a partial cluster into writing
> the whole cluster as zero.
> 
> Update test 154 to cover the new scenarios, using two images of
> intentionally differing length.
> 
> While at it, fix the test to gracefully skip when run as
> ./check -qcow2 -o compat=0.10 154
> since the older format lacks zero clusters already required earlier
> in the test.
> 
> Signed-off-by: Eric Blake 
> 
> ---
> v10: rebase to better reporting of preallocated zero clusters
> v9: new patch
> ---
>  block/qcow2.c  |   7 +++
>  tests/qemu-iotests/154 | 112 
> -
>  tests/qemu-iotests/154.out |  83 +
>  3 files changed, 200 insertions(+), 2 deletions(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 5c1573c..8038793 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -2449,6 +2449,10 @@ static bool is_zero_sectors(BlockDriverState *bs, 
> int64_t start,
>  BlockDriverState *file;
>  int64_t res;
> 
> +if (start + count > bs->total_sectors) {
> +count = bs->total_sectors - start;
> +}
> +
>  if (!count) {
>  return true;
>  }
> @@ -2467,6 +2471,9 @@ static coroutine_fn int 
> qcow2_co_pwrite_zeroes(BlockDriverState *bs,
>  uint32_t tail = (offset + count) % s->cluster_size;
> 
>  trace_qcow2_pwrite_zeroes_start_req(qemu_coroutine_self(), offset, 
> count);
> +if (offset + count == bs->total_sectors * BDRV_SECTOR_SIZE) {
> +tail = 0;
> +}
> 
>  if (head || tail) {
>  int64_t cl_start = (offset - head) >> BDRV_SECTOR_BITS;
> diff --git a/tests/qemu-iotests/154 b/tests/qemu-iotests/154
> index 7ca7219..005f09f 100755
> --- a/tests/qemu-iotests/154
> +++ b/tests/qemu-iotests/154
> @@ -2,7 +2,7 @@
>  #
>  # qcow2 specific bdrv_pwrite_zeroes tests with backing files (complements 
> 034)
>  #
> -# Copyright (C) 2016 Red Hat, Inc.
> +# Copyright (C) 2016-2017 Red Hat, Inc.
>  #
>  # This program is free software; you can redistribute it and/or modify
>  # it under the terms of the GNU General Public License as published by
> @@ -42,7 +42,10 @@ _supported_proto file
>  _supported_os Linux
> 
>  CLUSTER_SIZE=4k
> -size=128M
> +size=$((128 * 1024 * 1024))
> +
> +# This test requires zero clusters, added in v3 images
> +_unsupported_imgopts compat=0.10
> 
>  echo
>  echo == backing file contains zeros ==
> @@ -299,6 +302,111 @@ $QEMU_IO -c "read -P 0 75k 1k" "$TEST_IMG" | 
> _filter_qemu_io
> 
>  $QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
> 
> +echo
> +echo == unaligned image tail cluster, no allocation needed ==
> +
> +CLUSTER_SIZE=1024 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024))

Any reason for the CLUSTER_SIZE? It passes with 64 kB as well, and I
actually think that would be a better test because as it is, the image's
"unaligned" tail is exactly one cluster (so it isn't really unaligned).

> +_make_test_img -b "$TEST_IMG.base" $((size + 2048))
> +
> +# With no backing file, write to all or part of unallocated partial cluster
> +
> +# Backing file: 128m: ... | 00 --

Not sure what the "00 --" is supposed to mean. The cluster size is 1 kB,
so this is just a single cluster; which will be converted from
unallocated to a zero cluster because the tail reads as zero.

> +$QEMU_IO -c "write -z $size 512" "$TEST_IMG.base" | _filter_qemu_io
> +
> +# Backing file: 128m: ... | -- 00

Same here, but now it's the head that reads as zero (and it's already a
zero cluster so nothing changes here).

> +$QEMU_IO -c "write -z $size 512" "$TEST_IMG.base" | _filter_qemu_io

But I suppose you mean $((size + 512))?

> +
> +# Backing file: 128m: ... | 00 00

And finally this doesn't change anything either -- but then again this
is a fully aligned request, so I don't quite see the point in testing
this here.

> +$QEMU_IO -c "write -z $size 1024" "$TEST_IMG.base" | _filter_qemu_io
> +
> +# No offset should be allocated, although the cluster itself is considered
> +# allocated due to the zero flag
> +$QEMU_IO -c "alloc $size 1024" "$TEST_IMG.base" | _filter_qemu_io
> +$QEMU_IMG map --output=json "$TEST_IMG.base" | _filter_qemu_img_map
> +
> +# With backing file that reads as zero, write to all or part of entire 
> cluster
> +
> +# Backing file: 128m: ... | 00 00
> +# Active layer: 128m: ... | 00 00 00 00
> +$QEMU_IO -c "write -z $size 2048" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
> +
> +# Backing file: 128m: ... | 00 00
> +# Active layer: 128m: ... | 00 00 -- --

How so? Shouldn't this just stay a ze

Re: [Qemu-block] [PATCH v10 07/17] qemu-io: Switch 'alloc' command to byte-based length

2017-04-28 Thread Max Reitz

On 28.04.2017 22:36, Eric Blake wrote:
> On 04/28/2017 03:09 PM, Max Reitz wrote:
>> On 28.04.2017 21:59, Eric Blake wrote:
>>> On 04/28/2017 02:46 PM, Max Reitz wrote:
 On 27.04.2017 03:46, Eric Blake wrote:
> For the 'alloc' command, accepting an offset in bytes but a length
> in sectors, and reporting output in sectors, is confusing.  Do
> everything in bytes, and adjust the expected output accordingly.
>
> Signed-off-by: Eric Blake 
> Reviewed-by: Philippe Mathieu-Daudé 
>
>>>
>  }
> +if (!QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE)) {
> +printf("bytes %" PRId64 " is not sector aligned\n",

 This isn't real English. :-)
>>>
>>> But, it's just copy-and-paste from the other instances you just reviewed
>>> in 6/17!  [Translation - if I change this one, I also get to redo that one]
>>
>> No, those are "offset" and "count" -- all singular. "bytes" is plural. ;-)
> 
> Then an obvious solution: s/bytes/count/ in the parameter name :)
> 
> But I still get to redo those, to add the '-' in 'sector-aligned'.

Oh, right! Didn't even notice. Well, in real languages stuff like that
would have to be joined into a single word anyway.

>>> Which of these various alternatives (if any) looks better:
>>>
>>> bytes=511 is not sector-aligned
>>> 511 is not a sector-aligned value for 'bytes'
>>> requested 'bytes' of 511 is not sector-aligned
>>> alignment error: 511 bytes is not sector-aligned
>>> 'bytes' must be sector-aligned: 511
>>> your clever entry here...
>>
>> How about "byte count" instead of "bytes" or "bytes value", if really
>> want to have the exact spelling in there?
>>
>> For your entries above: (1) and (2) work for me (I like (2) a bit
>> better), (3) doesn't sound like real English either, and it should be
>> s/is/are/ in (4) (but it still sounds off with that change). (5) I
>> mostly dislike because I dislike error message of the form "This should
>> be X: $foo", I like "$foo is not X" better.
> 
> Maybe this variation of (3) solves the singular/plural disconnect:
> 
> request of 511 for 'bytes' is not sector-aligned
> 
> which makes it obvious that the "request of 511" (singular) and not the
> parameter name (whether singular 'count' or plural 'bytes') is the
> subject.  But it's a bit wordier than (2).  So it looks like (2) may be
> a winner in all the situations.  But I also think you convinced me to
> rename the command parameter; in my next spin, the help text will read:
> 
> alloc offset [count] -- checks if offset is allocated in the file
> 
> which starts to be non-trivial enough to drop R-b that you were willing
> to give for just an error message wording change.

Well, reviewing the change will be simple enough, so it doesn't really
matter to me. :-)

(I'm still a bit upset why you think that the average qemu-io user
cannot make the connection between "byte count" and a parameter named
"bytes". Because I am the average qemu-io user. Huff!)

Max



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH v10 07/17] qemu-io: Switch 'alloc' command to byte-based length

2017-04-28 Thread Eric Blake

On 04/28/2017 03:09 PM, Max Reitz wrote:
> On 28.04.2017 21:59, Eric Blake wrote:
>> On 04/28/2017 02:46 PM, Max Reitz wrote:
>>> On 27.04.2017 03:46, Eric Blake wrote:
 For the 'alloc' command, accepting an offset in bytes but a length
 in sectors, and reporting output in sectors, is confusing.  Do
 everything in bytes, and adjust the expected output accordingly.

 Signed-off-by: Eric Blake 
 Reviewed-by: Philippe Mathieu-Daudé 

>>
  }
 +if (!QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE)) {
 +printf("bytes %" PRId64 " is not sector aligned\n",
>>>
>>> This isn't real English. :-)
>>
>> But, it's just copy-and-paste from the other instances you just reviewed
>> in 6/17!  [Translation - if I change this one, I also get to redo that one]
> 
> No, those are "offset" and "count" -- all singular. "bytes" is plural. ;-)

Then an obvious solution: s/bytes/count/ in the parameter name :)

But I still get to redo those, to add the '-' in 'sector-aligned'.

> 
>>
>> Which of these various alternatives (if any) looks better:
>>
>> bytes=511 is not sector-aligned
>> 511 is not a sector-aligned value for 'bytes'
>> requested 'bytes' of 511 is not sector-aligned
>> alignment error: 511 bytes is not sector-aligned
>> 'bytes' must be sector-aligned: 511
>> your clever entry here...
> 
> How about "byte count" instead of "bytes" or "bytes value", if really
> want to have the exact spelling in there?
> 
> For your entries above: (1) and (2) work for me (I like (2) a bit
> better), (3) doesn't sound like real English either, and it should be
> s/is/are/ in (4) (but it still sounds off with that change). (5) I
> mostly dislike because I dislike error message of the form "This should
> be X: $foo", I like "$foo is not X" better.

Maybe this variation of (3) solves the singular/plural disconnect:

request of 511 for 'bytes' is not sector-aligned

which makes it obvious that the "request of 511" (singular) and not the
parameter name (whether singular 'count' or plural 'bytes') is the
subject.  But it's a bit wordier than (2).  So it looks like (2) may be
a winner in all the situations.  But I also think you convinced me to
rename the command parameter; in my next spin, the help text will read:

alloc offset [count] -- checks if offset is allocated in the file

which starts to be non-trivial enough to drop R-b that you were willing
to give for just an error message wording change.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

[Qemu-block] [PULL 30/34] qemu-img: improve convert_iteration_sectors()

2017-04-28 Thread Kevin Wolf

From: Vladimir Sementsov-Ogievskiy 

Do not do extra call to _get_block_status()

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-id: 20170407113404.9351-1-vsement...@virtuozzo.com
Reviewed-by: John Snow 
Signed-off-by: Max Reitz 
---
 qemu-img.c | 32 ++--
 1 file changed, 10 insertions(+), 22 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 9eb8283..9b6e728 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1554,9 +1554,15 @@ static int convert_iteration_sectors(ImgConvertState *s, 
int64_t sector_num)
 
 if (s->sector_next_status <= sector_num) {
 BlockDriverState *file;
-ret = bdrv_get_block_status(blk_bs(s->src[src_cur]),
-sector_num - src_cur_offset,
-n, &n, &file);
+if (s->target_has_backing) {
+ret = bdrv_get_block_status(blk_bs(s->src[src_cur]),
+sector_num - src_cur_offset,
+n, &n, &file);
+} else {
+ret = bdrv_get_block_status_above(blk_bs(s->src[src_cur]), NULL,
+  sector_num - src_cur_offset,
+  n, &n, &file);
+}
 if (ret < 0) {
 return ret;
 }
@@ -1565,26 +1571,8 @@ static int convert_iteration_sectors(ImgConvertState *s, 
int64_t sector_num)
 s->status = BLK_ZERO;
 } else if (ret & BDRV_BLOCK_DATA) {
 s->status = BLK_DATA;
-} else if (!s->target_has_backing) {
-/* Without a target backing file we must copy over the contents of
- * the backing file as well. */
-/* Check block status of the backing file chain to avoid
- * needlessly reading zeroes and limiting the iteration to the
- * buffer size */
-ret = bdrv_get_block_status_above(blk_bs(s->src[src_cur]), NULL,
-  sector_num - src_cur_offset,
-  n, &n, &file);
-if (ret < 0) {
-return ret;
-}
-
-if (ret & BDRV_BLOCK_ZERO) {
-s->status = BLK_ZERO;
-} else {
-s->status = BLK_DATA;
-}
 } else {
-s->status = BLK_BACKING_FILE;
+s->status = s->target_has_backing ? BLK_BACKING_FILE : BLK_DATA;
 }
 
 s->sector_next_status = sector_num + n;
-- 
1.8.3.1

[Qemu-block] [PULL 32/34] iotests: clarify help text

2017-04-28 Thread Kevin Wolf

From: John Snow 

Split the help text to highlight the groups of options
a little better, carving out a clear "format" and
"protocols" section.

Signed-off-by: John Snow 
Message-id: 20170427205100.9505-2-js...@redhat.com
Reviewed-by: Eric Blake 
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/common | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/common b/tests/qemu-iotests/common
index 9c6f972..fa8e69e 100644
--- a/tests/qemu-iotests/common
+++ b/tests/qemu-iotests/common
@@ -138,7 +138,7 @@ common options
 -v  verbose
 -d  debug
 
-check options
+image format options
 -rawtest raw (default)
 -bochs  test bochs
 -cloop  test cloop
@@ -150,14 +150,18 @@ check options
 -vpctest vpc
 -vhdx   test vhdx
 -vmdk   test vmdk
+-luks   test luks
+
+image protocol options
 -file   test file (default)
 -rbdtest rbd
 -sheepdog   test sheepdog
 -nbdtest nbd
 -sshtest ssh
 -nfstest nfs
--luks   test luks
 -vxhs   test vxhs
+
+other options
 -xdiff  graphical mode diff
 -nocacheuse O_DIRECT on backing file
 -misalign   misalign memory allocations
-- 
1.8.3.1

[Qemu-block] [PULL 29/34] block: assert no image modification under BDRV_O_INACTIVE

2017-04-28 Thread Kevin Wolf

From: "Denis V. Lunev" 

As long as BDRV_O_INACTIVE is set, the image file is only opened so we
have a file descriptor for it. We're definitely not supposed to modify
the image, it's still owned by the migration source.

This commit is an addition to 09e0c771 but the assert() is added to
bdrv_truncate().

Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Max Reitz 
Message-id: 1491405505-31620-3-git-send-email-...@openvz.org
Reviewed-by: Eric Blake 
Signed-off-by: Max Reitz 
---
 block.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block.c b/block.c
index 76bf00f..6c6bb3e 100644
--- a/block.c
+++ b/block.c
@@ -3328,6 +3328,8 @@ int bdrv_truncate(BdrvChild *child, int64_t offset, Error 
**errp)
 return -EACCES;
 }
 
+assert(!(bs->open_flags & BDRV_O_INACTIVE));
+
 ret = drv->bdrv_truncate(bs, offset, errp);
 if (ret == 0) {
 ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
-- 
1.8.3.1

[Qemu-block] [PULL 34/34] progress: Show current progress on SIGINFO

2017-04-28 Thread Kevin Wolf

From: Max Reitz 

Currently we only print progress information on retrieval of SIGUSR1.
Some systems have a dedicated SIGINFO for this, however, so it should be
handled appropriately if it is available.

Buglink: https://bugs.launchpad.net/qemu/+bug/1662468
Signed-off-by: Max Reitz 
Message-id: 20170207235757.2026-1-mre...@redhat.com
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Max Reitz 
---
 qemu-img.texi| 3 ++-
 util/qemu-progress.c | 3 +++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/qemu-img.texi b/qemu-img.texi
index 8c573ae..50a2364 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -84,7 +84,8 @@ with or without a command shows help and lists the supported 
formats
 @item -p
 display progress bar (compare, convert and rebase commands only).
 If the @var{-p} option is not used for a command that supports it, the
-progress is reported when the process receives a @code{SIGUSR1} signal.
+progress is reported when the process receives a @code{SIGUSR1} or
+@code{SIGINFO} signal.
 @item -q
 Quiet mode - do not print any output (except errors). There's no progress bar
 in case both @var{-q} and @var{-p} options are used.
diff --git a/util/qemu-progress.c b/util/qemu-progress.c
index f745233..3c2223c 100644
--- a/util/qemu-progress.c
+++ b/util/qemu-progress.c
@@ -88,6 +88,9 @@ static void progress_dummy_init(void)
 action.sa_handler = sigusr_print;
 action.sa_flags = 0;
 sigaction(SIGUSR1, &action, NULL);
+#ifdef SIGINFO
+sigaction(SIGINFO, &action, NULL);
+#endif
 
 /*
  * SIGUSR1 is SIG_IPI and gets blocked in qemu_init_main_loop(). In the
-- 
1.8.3.1

[Qemu-block] [PULL 33/34] iotests: fix exclusion option

2017-04-28 Thread Kevin Wolf

From: John Snow 

If you are running out-of-tree, the -x option to exclude
a certain iotest is broken.

Replace porcelain usage of ls with a sturdier awk command.

Reviewed-by: Fam Zheng 
Signed-off-by: John Snow 
Message-id: 20170427205100.9505-3-js...@redhat.com
Reviewed-by: Eric Blake 
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/common | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/common b/tests/qemu-iotests/common
index fa8e69e..f2a7199 100644
--- a/tests/qemu-iotests/common
+++ b/tests/qemu-iotests/common
@@ -86,7 +86,8 @@ s/ .*//p
 elif $xgroup
 then
 # arg after -x
-[ ! -s $tmp.list ] && ls [0-9][0-9][0-9] [0-9][0-9][0-9][0-9] 
>$tmp.list 2>/dev/null
+# Populate $tmp.list with all tests
+awk '/^[0-9]{3,}/ {print $1}' "${source_iotests}/group" > $tmp.list 
2>/dev/null
 group_list=`sed -n <"$source_iotests/group" -e 's/$/ /' -e 
"/^[0-9][0-9][0-9].* $r /"'{
 s/ .*//p
 }'`
-- 
1.8.3.1

[Qemu-block] [PULL 25/34] block: Add errp to BD.bdrv_truncate()

2017-04-28 Thread Kevin Wolf

From: Max Reitz 

Add an Error parameter to the block drivers' bdrv_truncate() interface.
If a block driver does not set this in case of an error, the generic
bdrv_truncate() implementation will do so.

Where it is obvious, this patch also makes some block drivers set this
value.

Signed-off-by: Max Reitz 
Message-id: 20170328205129.15138-4-mre...@redhat.com
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Max Reitz 
---
 block.c   |  4 ++--
 block/blkdebug.c  |  4 ++--
 block/crypto.c|  5 +++--
 block/file-posix.c|  2 +-
 block/file-win32.c|  6 +++---
 block/gluster.c   |  3 ++-
 block/iscsi.c |  4 ++--
 block/nfs.c   |  2 +-
 block/qcow2.c |  8 
 block/qed.c   |  2 +-
 block/raw-format.c|  4 ++--
 block/rbd.c   |  2 +-
 block/sheepdog.c  | 14 ++
 include/block/block_int.h |  2 +-
 14 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/block.c b/block.c
index 6a1937e..ff232a2 100644
--- a/block.c
+++ b/block.c
@@ -3328,13 +3328,13 @@ int bdrv_truncate(BdrvChild *child, int64_t offset, 
Error **errp)
 return -EACCES;
 }
 
-ret = drv->bdrv_truncate(bs, offset);
+ret = drv->bdrv_truncate(bs, offset, errp);
 if (ret == 0) {
 ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
 bdrv_dirty_bitmap_truncate(bs);
 bdrv_parent_cb_resize(bs);
 ++bs->write_gen;
-} else {
+} else if (errp && !*errp) {
 error_setg_errno(errp, -ret, "Failed to resize image");
 }
 return ret;
diff --git a/block/blkdebug.c b/block/blkdebug.c
index 9bd066e..d2a7561 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -659,9 +659,9 @@ static int64_t blkdebug_getlength(BlockDriverState *bs)
 return bdrv_getlength(bs->file->bs);
 }
 
-static int blkdebug_truncate(BlockDriverState *bs, int64_t offset)
+static int blkdebug_truncate(BlockDriverState *bs, int64_t offset, Error 
**errp)
 {
-return bdrv_truncate(bs->file, offset, NULL);
+return bdrv_truncate(bs->file, offset, errp);
 }
 
 static void blkdebug_refresh_filename(BlockDriverState *bs, QDict *options)
diff --git a/block/crypto.c b/block/crypto.c
index 7ad64b5..6828180 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -381,7 +381,8 @@ static int block_crypto_create_generic(QCryptoBlockFormat 
format,
 return ret;
 }
 
-static int block_crypto_truncate(BlockDriverState *bs, int64_t offset)
+static int block_crypto_truncate(BlockDriverState *bs, int64_t offset,
+ Error **errp)
 {
 BlockCrypto *crypto = bs->opaque;
 size_t payload_offset =
@@ -389,7 +390,7 @@ static int block_crypto_truncate(BlockDriverState *bs, 
int64_t offset)
 
 offset += payload_offset;
 
-return bdrv_truncate(bs->file, offset, NULL);
+return bdrv_truncate(bs->file, offset, errp);
 }
 
 static void block_crypto_close(BlockDriverState *bs)
diff --git a/block/file-posix.c b/block/file-posix.c
index ade71db..9c0d701 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1407,7 +1407,7 @@ static void raw_close(BlockDriverState *bs)
 }
 }
 
-static int raw_truncate(BlockDriverState *bs, int64_t offset)
+static int raw_truncate(BlockDriverState *bs, int64_t offset, Error **errp)
 {
 BDRVRawState *s = bs->opaque;
 struct stat st;
diff --git a/block/file-win32.c b/block/file-win32.c
index e132ba1..7872e00 100644
--- a/block/file-win32.c
+++ b/block/file-win32.c
@@ -460,7 +460,7 @@ static void raw_close(BlockDriverState *bs)
 }
 }
 
-static int raw_truncate(BlockDriverState *bs, int64_t offset)
+static int raw_truncate(BlockDriverState *bs, int64_t offset, Error **errp)
 {
 BDRVRawState *s = bs->opaque;
 LONG low, high;
@@ -475,11 +475,11 @@ static int raw_truncate(BlockDriverState *bs, int64_t 
offset)
  */
 dwPtrLow = SetFilePointer(s->hfile, low, &high, FILE_BEGIN);
 if (dwPtrLow == INVALID_SET_FILE_POINTER && GetLastError() != NO_ERROR) {
-fprintf(stderr, "SetFilePointer error: %lu\n", GetLastError());
+error_setg_win32(errp, GetLastError(), "SetFilePointer error");
 return -EIO;
 }
 if (SetEndOfFile(s->hfile) == 0) {
-fprintf(stderr, "SetEndOfFile error: %lu\n", GetLastError());
+error_setg_win32(errp, GetLastError(), "SetEndOfFile error");
 return -EIO;
 }
 return 0;
diff --git a/block/gluster.c b/block/gluster.c
index cf29b5f..be45c08 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -1092,7 +1092,8 @@ static coroutine_fn int 
qemu_gluster_co_rw(BlockDriverState *bs,
 return acb.ret;
 }
 
-static int qemu_gluster_truncate(BlockDriverState *bs, int64_t offset)
+static int qemu_gluster_truncate(BlockDriverState *bs, int64_t offset,
+ Error **errp)
 {
 int ret;
 BDRVGlusterState *s = bs->opaque;
diff --git a/block/iscsi.c b/block/iscsi.c
index 42fb0b0..1

[Qemu-block] [PULL 19/34] block: fix alignment calculations in bdrv_co_do_zero_pwritev

2017-04-28 Thread Kevin Wolf

From: "Denis V. Lunev" 

tail_padding_bytes is calculated wrong. F.e. for
offset = 0
bytes = 2048
align = 512
we will have tail_padding_bytes = 512 which is definitely wrong. The patch
fixes that arithmetics.

Fortunately this problem is harmless, we will have 1 extra allocation and
free thus there is no need to put this into stable. The problem is here
from the very beginning.

Signed-off-by: Denis V. Lunev 
CC: Stefan Hajnoczi 
CC: Fam Zheng 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 block/io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index 80d6c62..40bd94f 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1444,7 +1444,7 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild 
*child,
 int ret = 0;
 
 head_padding_bytes = offset & (align - 1);
-tail_padding_bytes = align - ((offset + bytes) & (align - 1));
+tail_padding_bytes = (align - (offset + bytes)) & (align - 1);
 
 
 assert(flags & BDRV_REQ_ZERO_WRITE);
-- 
1.8.3.1

[Qemu-block] [PULL 24/34] block: Add errp to b{lk,drv}_truncate()

2017-04-28 Thread Kevin Wolf

From: Max Reitz 

For one thing, this allows us to drop the error message generation from
qemu-img.c and blockdev.c and instead have it unified in
bdrv_truncate().

Signed-off-by: Max Reitz 
Message-id: 20170328205129.15138-3-mre...@redhat.com
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Max Reitz 
---
 block.c| 16 
 block/blkdebug.c   |  2 +-
 block/block-backend.c  |  5 +++--
 block/commit.c |  5 +++--
 block/crypto.c |  2 +-
 block/mirror.c |  2 +-
 block/parallels.c  | 13 -
 block/qcow.c   |  6 +++---
 block/qcow2-refcount.c |  5 -
 block/qcow2.c  | 14 +-
 block/qed.c|  2 +-
 block/raw-format.c |  2 +-
 block/vdi.c|  4 ++--
 block/vhdx-log.c   |  2 +-
 block/vhdx.c   | 10 +++---
 block/vmdk.c   | 13 +++--
 block/vpc.c| 13 +++--
 blockdev.c | 21 +
 include/block/block.h  |  2 +-
 include/sysemu/block-backend.h |  2 +-
 qemu-img.c | 17 -
 qemu-io-cmds.c |  5 +++--
 22 files changed, 73 insertions(+), 90 deletions(-)

diff --git a/block.c b/block.c
index 7b557f3..6a1937e 100644
--- a/block.c
+++ b/block.c
@@ -3307,7 +3307,7 @@ exit:
 /**
  * Truncate file to 'offset' bytes (needed only for file protocols)
  */
-int bdrv_truncate(BdrvChild *child, int64_t offset)
+int bdrv_truncate(BdrvChild *child, int64_t offset, Error **errp)
 {
 BlockDriverState *bs = child->bs;
 BlockDriver *drv = bs->drv;
@@ -3315,12 +3315,18 @@ int bdrv_truncate(BdrvChild *child, int64_t offset)
 
 assert(child->perm & BLK_PERM_RESIZE);
 
-if (!drv)
+if (!drv) {
+error_setg(errp, "No medium inserted");
 return -ENOMEDIUM;
-if (!drv->bdrv_truncate)
+}
+if (!drv->bdrv_truncate) {
+error_setg(errp, "Image format driver does not support resize");
 return -ENOTSUP;
-if (bs->read_only)
+}
+if (bs->read_only) {
+error_setg(errp, "Image is read-only");
 return -EACCES;
+}
 
 ret = drv->bdrv_truncate(bs, offset);
 if (ret == 0) {
@@ -3328,6 +3334,8 @@ int bdrv_truncate(BdrvChild *child, int64_t offset)
 bdrv_dirty_bitmap_truncate(bs);
 bdrv_parent_cb_resize(bs);
 ++bs->write_gen;
+} else {
+error_setg_errno(errp, -ret, "Failed to resize image");
 }
 return ret;
 }
diff --git a/block/blkdebug.c b/block/blkdebug.c
index cc4a146..9bd066e 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -661,7 +661,7 @@ static int64_t blkdebug_getlength(BlockDriverState *bs)
 
 static int blkdebug_truncate(BlockDriverState *bs, int64_t offset)
 {
-return bdrv_truncate(bs->file, offset);
+return bdrv_truncate(bs->file, offset, NULL);
 }
 
 static void blkdebug_refresh_filename(BlockDriverState *bs, QDict *options)
diff --git a/block/block-backend.c b/block/block-backend.c
index ae3d771..f5bf13e 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1746,13 +1746,14 @@ int blk_pwrite_compressed(BlockBackend *blk, int64_t 
offset, const void *buf,
BDRV_REQ_WRITE_COMPRESSED);
 }
 
-int blk_truncate(BlockBackend *blk, int64_t offset)
+int blk_truncate(BlockBackend *blk, int64_t offset, Error **errp)
 {
 if (!blk_is_available(blk)) {
+error_setg(errp, "No medium inserted");
 return -ENOMEDIUM;
 }
 
-return bdrv_truncate(blk->root, offset);
+return bdrv_truncate(blk->root, offset, errp);
 }
 
 static void blk_pdiscard_entry(void *opaque)
diff --git a/block/commit.c b/block/commit.c
index 91d2c34..76a0d98 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -151,7 +151,7 @@ static void coroutine_fn commit_run(void *opaque)
 }
 
 if (base_len < s->common.len) {
-ret = blk_truncate(s->base, s->common.len);
+ret = blk_truncate(s->base, s->common.len, NULL);
 if (ret) {
 goto out;
 }
@@ -511,8 +511,9 @@ int bdrv_commit(BlockDriverState *bs)
  * grow the backing file image if possible.  If not possible,
  * we must return an error */
 if (length > backing_length) {
-ret = blk_truncate(backing, length);
+ret = blk_truncate(backing, length, &local_err);
 if (ret < 0) {
+error_report_err(local_err);
 goto ro_cleanup;
 }
 }
diff --git a/block/crypto.c b/block/crypto.c
index 34549b2..7ad64b5 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -389,7 +389,7 @@ static int block_crypto_truncate(BlockDriverState *bs, 
int64_t offset)
 
 offset += payload_offset;
 
-return bdrv_truncate(bs->file, offset);
+return bdrv_truncate(bs->file, offset, NULL);
 }
 
 static void block_crypto_close(BlockDriverState *bs)
diff

[Qemu-block] [PULL 26/34] block: Add .bdrv_truncate() error messages

2017-04-28 Thread Kevin Wolf

From: Max Reitz 

Add missing error messages for the block driver implementations of
.bdrv_truncate(); drop the generic one from block.c's bdrv_truncate().

Since one of these changes touches a mis-indented block in
block/file-posix.c, this patch fixes that coding style issue along the
way.

Signed-off-by: Max Reitz 
Message-id: 20170328205129.15138-5-mre...@redhat.com
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Max Reitz 
---
 block.c|  2 --
 block/file-posix.c | 17 -
 block/gluster.c|  4 +++-
 block/iscsi.c  |  2 ++
 block/nfs.c| 10 +-
 block/qcow2.c  |  2 ++
 block/qed.c|  4 +++-
 block/raw-format.c |  2 ++
 block/rbd.c|  1 +
 9 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/block.c b/block.c
index ff232a2..76bf00f 100644
--- a/block.c
+++ b/block.c
@@ -3334,8 +3334,6 @@ int bdrv_truncate(BdrvChild *child, int64_t offset, Error 
**errp)
 bdrv_dirty_bitmap_truncate(bs);
 bdrv_parent_cb_resize(bs);
 ++bs->write_gen;
-} else if (errp && !*errp) {
-error_setg_errno(errp, -ret, "Failed to resize image");
 }
 return ret;
 }
diff --git a/block/file-posix.c b/block/file-posix.c
index 9c0d701..1941fb6 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1411,20 +1411,27 @@ static int raw_truncate(BlockDriverState *bs, int64_t 
offset, Error **errp)
 {
 BDRVRawState *s = bs->opaque;
 struct stat st;
+int ret;
 
 if (fstat(s->fd, &st)) {
-return -errno;
+ret = -errno;
+error_setg_errno(errp, -ret, "Failed to fstat() the file");
+return ret;
 }
 
 if (S_ISREG(st.st_mode)) {
 if (ftruncate(s->fd, offset) < 0) {
-return -errno;
+ret = -errno;
+error_setg_errno(errp, -ret, "Failed to resize the file");
+return ret;
 }
 } else if (S_ISCHR(st.st_mode) || S_ISBLK(st.st_mode)) {
-   if (offset > raw_getlength(bs)) {
-   return -EINVAL;
-   }
+if (offset > raw_getlength(bs)) {
+error_setg(errp, "Cannot grow device files");
+return -EINVAL;
+}
 } else {
+error_setg(errp, "Resizing this file is not supported");
 return -ENOTSUP;
 }
 
diff --git a/block/gluster.c b/block/gluster.c
index be45c08..1d4e2f7 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -1100,7 +1100,9 @@ static int qemu_gluster_truncate(BlockDriverState *bs, 
int64_t offset,
 
 ret = glfs_ftruncate(s->fd, offset);
 if (ret < 0) {
-return -errno;
+ret = -errno;
+error_setg_errno(errp, -ret, "Failed to truncate file");
+return ret;
 }
 
 return 0;
diff --git a/block/iscsi.c b/block/iscsi.c
index 1ef38cf..5daa201 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -2065,6 +2065,7 @@ static int iscsi_truncate(BlockDriverState *bs, int64_t 
offset, Error **errp)
 Error *local_err = NULL;
 
 if (iscsilun->type != TYPE_DISK) {
+error_setg(errp, "Cannot resize non-disk iSCSI devices");
 return -ENOTSUP;
 }
 
@@ -2075,6 +2076,7 @@ static int iscsi_truncate(BlockDriverState *bs, int64_t 
offset, Error **errp)
 }
 
 if (offset > iscsi_getlength(bs)) {
+error_setg(errp, "Cannot grow iSCSI devices");
 return -EINVAL;
 }
 
diff --git a/block/nfs.c b/block/nfs.c
index 5ae665a..76572ae 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -767,7 +767,15 @@ static int64_t 
nfs_get_allocated_file_size(BlockDriverState *bs)
 static int nfs_file_truncate(BlockDriverState *bs, int64_t offset, Error 
**errp)
 {
 NFSClient *client = bs->opaque;
-return nfs_ftruncate(client->context, client->fh, offset);
+int ret;
+
+ret = nfs_ftruncate(client->context, client->fh, offset);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to truncate file");
+return ret;
+}
+
+return 0;
 }
 
 /* Note that this will not re-establish a connection with the NFS server
diff --git a/block/qcow2.c b/block/qcow2.c
index 6c34798..4ca4cf0 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2551,6 +2551,7 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t 
offset, Error **errp)
 new_l1_size = size_to_l1(s, offset);
 ret = qcow2_grow_l1_table(bs, new_l1_size, true);
 if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to grow the L1 table");
 return ret;
 }
 
@@ -2559,6 +2560,7 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t 
offset, Error **errp)
 ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, size),
&offset, sizeof(uint64_t));
 if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to update the image size");
 return ret;
 }
 
diff --git a/block/qed.c b/block/qed.c
index fa2aeee..fd76817 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -1526,11 +1526,12 @@ static int bdrv_qed_truncate(BlockDriverState *bs, 
int64_t offs

[Qemu-block] [PULL 07/34] qemu-iotests: Filter HMP readline escape characters

2017-04-28 Thread Kevin Wolf

The only thing the escape characters achieve is making the reference
output unreadable and lines that are potentially so long that git
doesn't want to put them into an email any more. Let's filter them out.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
---
 tests/qemu-iotests/028.out   |   2 +-
 tests/qemu-iotests/051   |   3 +-
 tests/qemu-iotests/051.out   | 106 +++
 tests/qemu-iotests/051.pc.out| 132 +++
 tests/qemu-iotests/068   |   4 +-
 tests/qemu-iotests/068.out   |   6 +-
 tests/qemu-iotests/130.out   |   4 +-
 tests/qemu-iotests/142   |   2 +-
 tests/qemu-iotests/142.out   |  10 +--
 tests/qemu-iotests/145   |   3 +-
 tests/qemu-iotests/145.out   |   2 +-
 tests/qemu-iotests/common.filter |   7 +++
 tests/qemu-iotests/common.qemu   |   4 +-
 13 files changed, 147 insertions(+), 138 deletions(-)

diff --git a/tests/qemu-iotests/028.out b/tests/qemu-iotests/028.out
index acd2870..7d54aeb 100644
--- a/tests/qemu-iotests/028.out
+++ b/tests/qemu-iotests/028.out
@@ -469,7 +469,7 @@ No errors were found on the image.
 block-backup
 
 Formatting 'TEST_DIR/t.IMGFMT.copy', fmt=IMGFMT size=4294968832 
backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT
-(qemu) i[K[Din[K[D[Dinf[K[D[D[Dinfo[K[D[D[D[Dinfo 
[K[D[D[D[D[Dinfo b[K[D[D[D[D[D[Dinfo 
bl[K[D[D[D[D[D[D[Dinfo blo[K[D[D[D[D[D[D[D[Dinfo 
bloc[K[D[D[D[D[D[D[D[D[Dinfo 
block[K[D[D[D[D[D[D[D[D[D[Dinfo 
block-[K[D[D[D[D[D[D[D[D[D[D[Dinfo 
block-j[K[D[D[D[D[D[D[D[D[D[D[D[Dinfo 
block-jo[K[D[D[D[D[D[D[D[D[D[D[D[D[Dinfo 
block-job[K[D[D[D[D[D[D[D[D[D[D[D[D[D[Dinfo block-jobs[K
+(qemu) info block-jobs
 No active jobs
 === IO: pattern 195
 read 512/512 bytes at offset 3221194240
diff --git a/tests/qemu-iotests/051 b/tests/qemu-iotests/051
index 630cb7a..7c78a02 100755
--- a/tests/qemu-iotests/051
+++ b/tests/qemu-iotests/051
@@ -60,7 +60,8 @@ function do_run_qemu()
 
 function run_qemu()
 {
-do_run_qemu "$@" 2>&1 | _filter_testdir | _filter_qemu | 
_filter_generated_node_ids
+do_run_qemu "$@" 2>&1 | _filter_testdir | _filter_qemu |
+_filter_generated_node_ids | _filter_hmp
 }
 
 size=128M
diff --git a/tests/qemu-iotests/051.out b/tests/qemu-iotests/051.out
index 7524c62..3b7b040 100644
--- a/tests/qemu-iotests/051.out
+++ b/tests/qemu-iotests/051.out
@@ -58,12 +58,12 @@ QEMU X.Y.Z monitor - type 'help' for more information
 
 Testing: -drive 
file=TEST_DIR/t.qcow2,driver=qcow2,backing.file.filename=TEST_DIR/t.qcow2.orig,if=none,id=drive0
 -nodefaults
 QEMU X.Y.Z monitor - type 'help' for more information
-(qemu) i[K[Din[K[D[Dinf[K[D[D[Dinfo[K[D[D[D[Dinfo 
[K[D[D[D[D[Dinfo b[K[D[D[D[D[D[Dinfo 
bl[K[D[D[D[D[D[D[Dinfo blo[K[D[D[D[D[D[D[D[Dinfo 
bloc[K[D[D[D[D[D[D[D[D[Dinfo block[K
+(qemu) info block
 drive0 (NODE_NAME): TEST_DIR/t.qcow2 (qcow2)
 Removable device: not locked, tray closed
 Cache mode:   writeback
 Backing file: TEST_DIR/t.qcow2.orig (chain depth: 1)
-(qemu) q[K[Dqu[K[D[Dqui[K[D[D[Dquit[K
+(qemu) quit
 
 Testing: -drive 
file=TEST_DIR/t.qcow2,driver=raw,backing.file.filename=TEST_DIR/t.qcow2.orig
 QEMU_PROG: -drive 
file=TEST_DIR/t.qcow2,driver=raw,backing.file.filename=TEST_DIR/t.qcow2.orig: 
Driver doesn't support backing files
@@ -79,11 +79,11 @@ QEMU_PROG: -drive 
file=TEST_DIR/t.qcow2,file.backing.driver=qcow2,file.backing.f
 
 Testing: -drive file=TEST_DIR/t.qcow2,format=qcow2,lazy-refcounts=on
 QEMU X.Y.Z monitor - type 'help' for more information
-(qemu) q[K[Dqu[K[D[Dqui[K[D[D[Dquit[K
+(qemu) quit
 
 Testing: -drive file=TEST_DIR/t.qcow2,format=qcow2,lazy-refcounts=off
 QEMU X.Y.Z monitor - type 'help' for more information
-(qemu) q[K[Dqu[K[D[Dqui[K[D[D[Dquit[K
+(qemu) quit
 
 Testing: -drive file=TEST_DIR/t.qcow2,format=qcow2,lazy-refcounts=
 QEMU_PROG: -drive file=TEST_DIR/t.qcow2,format=qcow2,lazy-refcounts=: 
Parameter 'lazy-refcounts' expects 'on' or 'off'
@@ -103,7 +103,7 @@ QEMU_PROG: -drive 
file=TEST_DIR/t.qcow2,format=qcow2,lazy-refcounts=on: Lazy ref
 
 Testing: -drive file=TEST_DIR/t.qcow2,format=qcow2,lazy-refcounts=off
 QEMU X.Y.Z monitor - type 'help' for more information
-(qemu) q[K[Dqu[K[D[Dqui[K[D[D[Dquit[K
+(qemu) quit
 
 
 === No medium ===
@@ -117,93 +117,93 @@ QEMU X.Y.Z monitor - type 'help' for more information
 
 Testing: -drive file=TEST_DIR/t.qcow2,if=virtio,readonly=on
 QEMU X.Y.Z monitor - type 'help' for more information
-(qemu) q[K[Dqu[K[D[Dqui[K[D[D[Dquit[K
+(qemu) quit
 
 
 === Cache modes ===
 
 Testing: -drive driver=null-co,cache=none
 QEMU X.Y.Z monitor - type 'help' for more information
-(qemu) q[K[Dqu[K[D[Dqui[K[D[D[Dquit[K
+(qemu) quit
 
 Testing: -drive driver=null-co,cache=directsync
 QEMU X.Y.Z monitor - t

[Qemu-block] [PULL 31/34] qemu-img: use blk_co_pwrite_zeroes for zero sectors when compressed

2017-04-28 Thread Kevin Wolf

From: Lidong Chen 

When the buffer is zero, blk_co_pwrite_zeroes is more effective than
blk_co_pwritev with BDRV_REQ_WRITE_COMPRESSED. This patch can reduce
the time for converting qcow2 images with lots of zero data.

Signed-off-by: Lidong Chen 
Message-id: 1493261907-18734-1-git-send-email-lidongc...@tencent.com
Signed-off-by: Max Reitz 
---
 qemu-img.c | 44 ++--
 1 file changed, 14 insertions(+), 30 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 9b6e728..c719636 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1649,6 +1649,8 @@ static int coroutine_fn convert_co_write(ImgConvertState 
*s, int64_t sector_num,
 
 while (nb_sectors > 0) {
 int n = nb_sectors;
+BdrvRequestFlags flags = s->compressed ? BDRV_REQ_WRITE_COMPRESSED : 0;
+
 switch (status) {
 case BLK_BACKING_FILE:
 /* If we have a backing file, leave clusters unallocated that are
@@ -1658,43 +1660,24 @@ static int coroutine_fn 
convert_co_write(ImgConvertState *s, int64_t sector_num,
 break;
 
 case BLK_DATA:
-/* We must always write compressed clusters as a whole, so don't
- * try to find zeroed parts in the buffer. We can only save the
- * write if the buffer is completely zeroed and we're allowed to
- * keep the target sparse. */
-if (s->compressed) {
-if (s->has_zero_init && s->min_sparse &&
-buffer_is_zero(buf, n * BDRV_SECTOR_SIZE))
-{
-assert(!s->target_has_backing);
-break;
-}
-
-iov.iov_base = buf;
-iov.iov_len = n << BDRV_SECTOR_BITS;
-qemu_iovec_init_external(&qiov, &iov, 1);
-
-ret = blk_co_pwritev(s->target, sector_num << BDRV_SECTOR_BITS,
- n << BDRV_SECTOR_BITS, &qiov,
- BDRV_REQ_WRITE_COMPRESSED);
-if (ret < 0) {
-return ret;
-}
-break;
-}
-
-/* If there is real non-zero data or we're told to keep the target
- * fully allocated (-S 0), we must write it. Otherwise we can treat
- * it as zero sectors. */
+/* If we're told to keep the target fully allocated (-S 0) or there
+ * is real non-zero data, we must write it. Otherwise we can treat
+ * it as zero sectors.
+ * Compressed clusters need to be written as a whole, so in that
+ * case we can only save the write if the buffer is completely
+ * zeroed. */
 if (!s->min_sparse ||
-is_allocated_sectors_min(buf, n, &n, s->min_sparse))
+(!s->compressed &&
+ is_allocated_sectors_min(buf, n, &n, s->min_sparse)) ||
+(s->compressed &&
+ !buffer_is_zero(buf, n * BDRV_SECTOR_SIZE)))
 {
 iov.iov_base = buf;
 iov.iov_len = n << BDRV_SECTOR_BITS;
 qemu_iovec_init_external(&qiov, &iov, 1);
 
 ret = blk_co_pwritev(s->target, sector_num << BDRV_SECTOR_BITS,
- n << BDRV_SECTOR_BITS, &qiov, 0);
+ n << BDRV_SECTOR_BITS, &qiov, flags);
 if (ret < 0) {
 return ret;
 }
@@ -1704,6 +1687,7 @@ static int coroutine_fn convert_co_write(ImgConvertState 
*s, int64_t sector_num,
 
 case BLK_ZERO:
 if (s->has_zero_init) {
+assert(!s->target_has_backing);
 break;
 }
 ret = blk_co_pwrite_zeroes(s->target,
-- 
1.8.3.1

[Qemu-block] [PULL 28/34] block: fix obvious coding style mistakes in block_int.h

2017-04-28 Thread Kevin Wolf

From: Klim Kireev 

Signed-off-by: Klim Kireev 
Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Max Reitz 
Message-id: 1491405505-31620-2-git-send-email-...@openvz.org
Reviewed-by: Eric Blake 
Signed-off-by: Max Reitz 
---
 include/block/block_int.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index dcde90a..8773940 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -252,7 +252,7 @@ struct BlockDriver {
  * Returns 0 for completed check, -errno for internal errors.
  * The check results are stored in result.
  */
-int (*bdrv_check)(BlockDriverState* bs, BdrvCheckResult *result,
+int (*bdrv_check)(BlockDriverState *bs, BdrvCheckResult *result,
 BdrvCheckMode fix);
 
 int (*bdrv_amend_options)(BlockDriverState *bs, QemuOpts *opts,
@@ -454,13 +454,13 @@ struct BdrvChildRole {
 /* Returns a name that is supposedly more useful for human users than the
  * node name for identifying the node in question (in particular, a BB
  * name), or NULL if the parent can't provide a better name. */
-const char* (*get_name)(BdrvChild *child);
+const char *(*get_name)(BdrvChild *child);
 
 /* Returns a malloced string that describes the parent of the child for a
  * human reader. This could be a node-name, BlockBackend name, qdev ID or
  * QOM path of the device owning the BlockBackend, job type and ID etc. The
  * caller is responsible for freeing the memory. */
-char* (*get_parent_desc)(BdrvChild *child);
+char *(*get_parent_desc)(BdrvChild *child);
 
 /*
  * If this pair of functions is implemented, the parent doesn't issue new
-- 
1.8.3.1

[Qemu-block] [PULL 27/34] qcow2: Allow discard of final unaligned cluster

2017-04-28 Thread Kevin Wolf

From: Eric Blake 

As mentioned in commit 0c1bd46, we ignored requests to
discard the trailing cluster of an unaligned image.  While
discard is an advisory operation from the guest standpoint,
(and we are therefore free to ignore any request), our
qcow2 implementation exploits the fact that a discarded
cluster reads back as 0.  As long as we discard on cluster
boundaries, we are fine; but that means we could observe
non-zero data leaked at the tail of an unaligned image.

Enhance iotest 66 to cover this case, and fix the implementation
to honor a discard request on the final partial cluster.

Signed-off-by: Eric Blake 
Message-id: 20170407013709.18440-1-ebl...@redhat.com
Signed-off-by: Max Reitz 
---
 block/qcow2.c  |  7 ++-
 tests/qemu-iotests/066 | 12 +++-
 tests/qemu-iotests/066.out | 12 
 3 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 4ca4cf0..5c1573c 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2515,7 +2515,12 @@ static coroutine_fn int 
qcow2_co_pdiscard(BlockDriverState *bs,
 
 if (!QEMU_IS_ALIGNED(offset | count, s->cluster_size)) {
 assert(count < s->cluster_size);
-return -ENOTSUP;
+/* Ignore partial clusters, except for the special case of the
+ * complete partial cluster at the end of an unaligned file */
+if (!QEMU_IS_ALIGNED(offset, s->cluster_size) ||
+offset + count != bs->total_sectors * BDRV_SECTOR_SIZE) {
+return -ENOTSUP;
+}
 }
 
 qemu_co_mutex_lock(&s->lock);
diff --git a/tests/qemu-iotests/066 b/tests/qemu-iotests/066
index 364166d..c2116a3 100755
--- a/tests/qemu-iotests/066
+++ b/tests/qemu-iotests/066
@@ -42,16 +42,18 @@ _supported_fmt qcow2
 _supported_proto generic
 _supported_os Linux
 
+# Intentionally create an unaligned image
 IMGOPTS="compat=1.1"
-IMG_SIZE=64M
+IMG_SIZE=$((64 * 1024 * 1024 + 512))
 
 echo
-echo "=== Testing snapshotting an image with zero clusters ==="
+echo "=== Testing cluster discards ==="
 echo
 _make_test_img $IMG_SIZE
-# Write some normal clusters, zero them (creating preallocated zero clusters)
-# and discard those
-$QEMU_IO -c "write 0 256k" -c "write -z 0 256k" -c "discard 0 256k" 
"$TEST_IMG" \
+# Write some normal clusters, zero some of them (creating preallocated
+# zero clusters) and discard everything. Everything should now read as 0.
+$QEMU_IO -c "write 0 256k" -c "write -z 0 256k" -c "write 64M 512" \
+-c "discard 0 $IMG_SIZE" -c "read -P 0 0 $IMG_SIZE" "$TEST_IMG" \
  | _filter_qemu_io
 # Check the image (there shouldn't be any leaks)
 _check_test_img
diff --git a/tests/qemu-iotests/066.out b/tests/qemu-iotests/066.out
index 7bc9a10..7c1f31a 100644
--- a/tests/qemu-iotests/066.out
+++ b/tests/qemu-iotests/066.out
@@ -1,13 +1,17 @@
 QA output created by 066
 
-=== Testing snapshotting an image with zero clusters ===
+=== Testing cluster discards ===
 
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67109376
 wrote 262144/262144 bytes at offset 0
 256 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 262144/262144 bytes at offset 0
 256 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-discard 262144/262144 bytes at offset 0
-256 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 512/512 bytes at offset 67108864
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+discard 67109376/67109376 bytes at offset 0
+64 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 67109376/67109376 bytes at offset 0
+64 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 No errors were found on the image.
 *** done
-- 
1.8.3.1

[Qemu-block] [PULL 14/34] iotests: Launch qemu-nbd with -e 42

2017-04-28 Thread Kevin Wolf

From: Max Reitz 

There is no reason for the qemu-nbd server used for tests not to accept
an arbitrary number of clients. In fact, test 181 will require it to
accept two clients at the same time (and thus it fails before this
patch).

This patch updates common.rc to launch qemu-nbd with -e 42 which should
be enough for all of our current and future tests.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/common.rc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index 62529ee..9fd3130 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -171,7 +171,9 @@ _make_test_img()
 
 # Start an NBD server on the image file, which is what we'll be talking to
 if [ $IMGPROTO = "nbd" ]; then
-eval "$QEMU_NBD -v -t -b 127.0.0.1 -p 10810 -f $IMGFMT  $TEST_IMG_FILE 
>/dev/null &"
+# Pass a sufficiently high number to -e that should be enough for all
+# tests
+eval "$QEMU_NBD -v -t -b 127.0.0.1 -p 10810 -f $IMGFMT -e 42  
$TEST_IMG_FILE >/dev/null &"
 sleep 1 # FIXME: qemu-nbd needs to be listening before we continue
 fi
 
-- 
1.8.3.1

[Qemu-block] [PULL 23/34] block/vhdx: Make vhdx_create() always set errp

2017-04-28 Thread Kevin Wolf

From: Max Reitz 

This patch makes vhdx_create() always set errp in case of an error. It
also adds errp parameters to vhdx_create_bat() and
vhdx_create_new_region_table() so we can pass on the error object
generated by blk_truncate() as of a future commit.

Signed-off-by: Max Reitz 
Reviewed-by: Kevin Wolf 
Message-id: 20170328205129.15138-2-mre...@redhat.com
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Max Reitz 
---
 block/vhdx.c | 23 +++
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/block/vhdx.c b/block/vhdx.c
index 052a753..d25bcd9 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -1586,7 +1586,7 @@ exit:
 static int vhdx_create_bat(BlockBackend *blk, BDRVVHDXState *s,
uint64_t image_size, VHDXImageType type,
bool use_zero_blocks, uint64_t file_offset,
-   uint32_t length)
+   uint32_t length, Error **errp)
 {
 int ret = 0;
 uint64_t data_file_offset;
@@ -1609,14 +1609,19 @@ static int vhdx_create_bat(BlockBackend *blk, 
BDRVVHDXState *s,
  * is the furthest thing we have written yet */
 ret = blk_truncate(blk, data_file_offset);
 if (ret < 0) {
+error_setg_errno(errp, -ret,
+"Failed to resize the underlying file");
 goto exit;
 }
 } else if (type == VHDX_TYPE_FIXED) {
 ret = blk_truncate(blk, data_file_offset + image_size);
 if (ret < 0) {
+error_setg_errno(errp, -ret,
+"Failed to resize the underlying file");
 goto exit;
 }
 } else {
+error_setg(errp, "Unsupported image type");
 ret = -ENOTSUP;
 goto exit;
 }
@@ -1627,6 +1632,7 @@ static int vhdx_create_bat(BlockBackend *blk, 
BDRVVHDXState *s,
 /* for a fixed file, the default BAT entry is not zero */
 s->bat = g_try_malloc0(length);
 if (length && s->bat == NULL) {
+error_setg(errp, "Failed to allocate memory for the BAT");
 ret = -ENOMEM;
 goto exit;
 }
@@ -1646,6 +1652,7 @@ static int vhdx_create_bat(BlockBackend *blk, 
BDRVVHDXState *s,
 }
 ret = blk_pwrite(blk, file_offset, s->bat, length, 0);
 if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to write the BAT");
 goto exit;
 }
 }
@@ -1671,7 +1678,8 @@ static int vhdx_create_new_region_table(BlockBackend *blk,
 uint32_t log_size,
 bool use_zero_blocks,
 VHDXImageType type,
-uint64_t *metadata_offset)
+uint64_t *metadata_offset,
+Error **errp)
 {
 int ret = 0;
 uint32_t offset = 0;
@@ -1740,7 +1748,7 @@ static int vhdx_create_new_region_table(BlockBackend *blk,
 /* The region table gives us the data we need to create the BAT,
  * so do that now */
 ret = vhdx_create_bat(blk, s, image_size, type, use_zero_blocks,
-  bat_file_offset, bat_length);
+  bat_file_offset, bat_length, errp);
 if (ret < 0) {
 goto exit;
 }
@@ -1749,12 +1757,14 @@ static int vhdx_create_new_region_table(BlockBackend 
*blk,
 ret = blk_pwrite(blk, VHDX_REGION_TABLE_OFFSET, buffer,
  VHDX_HEADER_BLOCK_SIZE, 0);
 if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to write first region table");
 goto exit;
 }
 
 ret = blk_pwrite(blk, VHDX_REGION_TABLE2_OFFSET, buffer,
  VHDX_HEADER_BLOCK_SIZE, 0);
 if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to write second region table");
 goto exit;
 }
 
@@ -1825,6 +1835,7 @@ static int vhdx_create(const char *filename, QemuOpts 
*opts, Error **errp)
 ret = -ENOTSUP;
 goto exit;
 } else {
+error_setg(errp, "Invalid subformat '%s'", type);
 ret = -EINVAL;
 goto exit;
 }
@@ -1879,12 +1890,14 @@ static int vhdx_create(const char *filename, QemuOpts 
*opts, Error **errp)
 ret = blk_pwrite(blk, VHDX_FILE_ID_OFFSET, &signature, sizeof(signature),
  0);
 if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to write file signature");
 goto delete_and_exit;
 }
 if (creator) {
 ret = blk_pwrite(blk, VHDX_FILE_ID_OFFSET + sizeof(signature),
  creator, creator_items * sizeof(gunichar2), 0);
 if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to write creator field");
 goto delete_and_exit;
 }
 }
@@ -1893,13 +1906,14 @@ static int vhdx_create(const char *filename, QemuOpts 
*opts, Error **errp)
 /* Creates (B),(C) */
 ret = vhdx_creat

[Qemu-block] [PULL 17/34] iotests: 109: Filter out "len" of failed jobs

2017-04-28 Thread Kevin Wolf

From: Fam Zheng 

Mirror calculates job len from current I/O progress:

s->common.len = s->common.offset +
(cnt + s->sectors_in_flight) * BDRV_SECTOR_SIZE;

The final "len" of a failed mirror job in iotests 109 depends on the
subtle timing of the completion of read and write issued in the first
mirror iteration.  The second iteration may or may not have run when the
I/O error happens, resulting in non-deterministic output of the
BLOCK_JOB_COMPLETED event text.

Similar to what was done in a752e4786, filter out the field to make the
test robust.

Signed-off-by: Fam Zheng 
Reviewed-by: Eric Blake 
Tested-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/109   |  6 --
 tests/qemu-iotests/109.out   | 20 ++--
 tests/qemu-iotests/common.filter |  6 ++
 3 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/tests/qemu-iotests/109 b/tests/qemu-iotests/109
index 6161633..3b496a3 100755
--- a/tests/qemu-iotests/109
+++ b/tests/qemu-iotests/109
@@ -81,7 +81,8 @@ for fmt in qcow qcow2 qed vdi vmdk vpc; do
 
 # This first test should fail: The image format was probed, we may not
 # write an image header at the start of the image
-run_qemu "$TEST_IMG" "$TEST_IMG.src" "" "BLOCK_JOB_ERROR"
+run_qemu "$TEST_IMG" "$TEST_IMG.src" "" "BLOCK_JOB_ERROR" |
+_filter_block_job_len
 $QEMU_IO -c 'read -P 0 0 64k' "$TEST_IMG" | _filter_qemu_io
 
 
@@ -104,7 +105,8 @@ for sample_img in empty.bochs iotest-dirtylog-10G-4M.vhdx 
parallels-v1 \
 _make_test_img 64M
 bzcat "$SAMPLE_IMG_DIR/$sample_img.bz2" > "$TEST_IMG.src"
 
-run_qemu "$TEST_IMG" "$TEST_IMG.src" "" "BLOCK_JOB_ERROR" | 
_filter_block_job_offset
+run_qemu "$TEST_IMG" "$TEST_IMG.src" "" "BLOCK_JOB_ERROR" |
+_filter_block_job_offset | _filter_block_job_len
 $QEMU_IO -c 'read -P 0 0 64k' "$TEST_IMG" | _filter_qemu_io
 
 run_qemu "$TEST_IMG" "$TEST_IMG.src" "'format': 'raw'," "BLOCK_JOB_READY"
diff --git a/tests/qemu-iotests/109.out b/tests/qemu-iotests/109.out
index 55fe536..dc02f9e 100644
--- a/tests/qemu-iotests/109.out
+++ b/tests/qemu-iotests/109.out
@@ -10,7 +10,7 @@ Automatically detecting the format is dangerous for raw 
images, write operations
 Specify the 'raw' format explicitly to remove the restrictions.
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_ERROR", "data": {"device": "src", "operation": "write", "action": 
"report"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_COMPLETED", "data": {"device": "src", "len": 65536, "offset": 0, 
"speed": 0, "type": "mirror", "error": "Operation not permitted"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_COMPLETED", "data": {"device": "src", "len": LEN, "offset": 0, 
"speed": 0, "type": "mirror", "error": "Operation not permitted"}}
 {"return": []}
 read 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
@@ -31,7 +31,7 @@ Automatically detecting the format is dangerous for raw 
images, write operations
 Specify the 'raw' format explicitly to remove the restrictions.
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_ERROR", "data": {"device": "src", "operation": "write", "action": 
"report"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_COMPLETED", "data": {"device": "src", "len": 197120, "offset": 512, 
"speed": 0, "type": "mirror", "error": "Operation not permitted"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_COMPLETED", "data": {"device": "src", "len": LEN, "offset": 512, 
"speed": 0, "type": "mirror", "error": "Operation not permitted"}}
 {"return": []}
 read 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
@@ -52,7 +52,7 @@ Automatically detecting the format is dangerous for raw 
images, write operations
 Specify the 'raw' format explicitly to remove the restrictions.
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_ERROR", "data": {"device": "src", "operation": "write", "action": 
"report"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_COMPLETED", "data": {"device": "src", "len": 327680, "offset": 
262144, "speed": 0, "type": "mirror", "error": "Operation not permitted"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_COMPLETED", "data": {"device": "src", "len": LEN, "offset": 262144, 
"speed": 0, "type": "mirror", "error": "Operation not permitted"}}
 {"return": []}
 read 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
@@ -73,7 +73,7 @@ Automatically detecting the format is dangerous for raw 
images, write operations
 Specify the 'raw' fo

[Qemu-block] [PULL 21/34] qemu-img/convert: Move bs_n > 1 && -B check down

2017-04-28 Thread Kevin Wolf

From: Max Reitz 

It does not make much sense to use a backing image for the target when
you concatenate multiple images (because then there is no correspondence
between the source images' backing files and the target's); but it was
still possible to give one by using -o backing_file=X instead of -B X.

Fix this by moving the check.

(Also, change the error message because -B is not the only way to
 specify the backing file, evidently.)

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 qemu-img.c | 13 +++--
 tests/qemu-iotests/122.out |  4 ++--
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index e0503ae..7044884 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2089,12 +2089,6 @@ static int img_convert(int argc, char **argv)
 }
 
 
-if (s.src_num > 1 && out_baseimg) {
-error_report("-B makes no sense when concatenating multiple input "
- "images");
-goto fail_getopt;
-}
-
 /* ret is still -EINVAL until here */
 ret = bdrv_parse_cache_mode(src_cache, &src_flags, &src_writethrough);
 if (ret < 0) {
@@ -2208,6 +2202,13 @@ static int img_convert(int argc, char **argv)
 }
 s.target_has_backing = (bool) out_baseimg;
 
+if (s.src_num > 1 && out_baseimg) {
+error_report("Having a backing file for the target makes no sense when 
"
+ "concatenating multiple input images");
+ret = -1;
+goto out;
+}
+
 /* Check if compression is supported */
 if (s.compressed) {
 bool encryption =
diff --git a/tests/qemu-iotests/122.out b/tests/qemu-iotests/122.out
index 98814de..9317d80 100644
--- a/tests/qemu-iotests/122.out
+++ b/tests/qemu-iotests/122.out
@@ -61,8 +61,8 @@ read 65536/65536 bytes at offset 4194304
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 read 65536/65536 bytes at offset 8388608
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-qemu-img: -B makes no sense when concatenating multiple input images
-qemu-img: -B makes no sense when concatenating multiple input images
+qemu-img: Having a backing file for the target makes no sense when 
concatenating multiple input images
+qemu-img: Having a backing file for the target makes no sense when 
concatenating multiple input images
 
 === Compression with misaligned allocations and image sizes ===
 
-- 
1.8.3.1

[Qemu-block] [PULL 16/34] iotests: Fix typo in 026

2017-04-28 Thread Kevin Wolf

From: Eric Blake 

s/refcout/refcount/

CC: qemu-triv...@nongnu.org
Signed-off-by: Eric Blake 
Reviewed-by: Laurent Vivier 
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/026 | 2 +-
 tests/qemu-iotests/026.out | 2 +-
 tests/qemu-iotests/026.out.nocache | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/026 b/tests/qemu-iotests/026
index f5a7f02..7fadfba 100755
--- a/tests/qemu-iotests/026
+++ b/tests/qemu-iotests/026
@@ -119,7 +119,7 @@ done
 
 
 echo
-echo === Refcout table growth tests ===
+echo === Refcount table growth tests ===
 echo
 CLUSTER_SIZE=512
 
diff --git a/tests/qemu-iotests/026.out b/tests/qemu-iotests/026.out
index 59b8f74..86a50a2 100644
--- a/tests/qemu-iotests/026.out
+++ b/tests/qemu-iotests/026.out
@@ -462,7 +462,7 @@ Event: cluster_alloc; errno: 28; imm: off; once: off; write 
-b
 write failed: No space left on device
 No errors were found on the image.
 
-=== Refcout table growth tests ===
+=== Refcount table growth tests ===
 
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
diff --git a/tests/qemu-iotests/026.out.nocache 
b/tests/qemu-iotests/026.out.nocache
index b4aeebc..ea2e166 100644
--- a/tests/qemu-iotests/026.out.nocache
+++ b/tests/qemu-iotests/026.out.nocache
@@ -470,7 +470,7 @@ Event: cluster_alloc; errno: 28; imm: off; once: off; write 
-b
 write failed: No space left on device
 No errors were found on the image.
 
-=== Refcout table growth tests ===
+=== Refcount table growth tests ===
 
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
-- 
1.8.3.1

[Qemu-block] [PULL 22/34] qemu-img: Document backing options

2017-04-28 Thread Kevin Wolf

From: Max Reitz 

The create and convert subcommands have shorthands to set the
backing_file and, in the case of create, the backing_fmt options for the
new image. However, they have not been documented so far, which is
remedied by this patch.

Reported-by: Eric Blake 
Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 qemu-img-cmds.hx | 8 
 qemu-img.texi| 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 8ac7822..bf4ce59 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -22,9 +22,9 @@ STEXI
 ETEXI
 
 DEF("create", img_create,
-"create [-q] [--object objectdef] [-f fmt] [-o options] filename [size]")
+"create [-q] [--object objectdef] [-f fmt] [-b backing_file] [-F 
backing_fmt] [-o options] filename [size]")
 STEXI
-@item create [--object @var{objectdef}] [-q] [-f @var{fmt}] [-o @var{options}] 
@var{filename} [@var{size}]
+@item create [--object @var{objectdef}] [-q] [-f @var{fmt}] [-b 
@var{backing_file}] [-F @var{backing_fmt}] [-o @var{options}] @var{filename} 
[@var{size}]
 ETEXI
 
 DEF("commit", img_commit,
@@ -40,9 +40,9 @@ STEXI
 ETEXI
 
 DEF("convert", img_convert,
-"convert [--object objectdef] [--image-opts] [-c] [-p] [-q] [-n] [-f fmt] 
[-t cache] [-T src_cache] [-O output_fmt] [-o options] [-s snapshot_id_or_name] 
[-l snapshot_param] [-S sparse_size] [-m num_coroutines] [-W] filename 
[filename2 [...]] output_filename")
+"convert [--object objectdef] [--image-opts] [-c] [-p] [-q] [-n] [-f fmt] 
[-t cache] [-T src_cache] [-O output_fmt] [-B backing_file] [-o options] [-s 
snapshot_id_or_name] [-l snapshot_param] [-S sparse_size] [-m num_coroutines] 
[-W] filename [filename2 [...]] output_filename")
 STEXI
-@item convert [--object @var{objectdef}] [--image-opts] [-c] [-p] [-q] [-n] 
[-f @var{fmt}] [-t @var{cache}] [-T @var{src_cache}] [-O @var{output_fmt}] [-o 
@var{options}] [-s @var{snapshot_id_or_name}] [-l @var{snapshot_param}] [-S 
@var{sparse_size}] [-m @var{num_coroutines}] [-W] @var{filename} 
[@var{filename2} [...]] @var{output_filename}
+@item convert [--object @var{objectdef}] [--image-opts] [-c] [-p] [-q] [-n] 
[-f @var{fmt}] [-t @var{cache}] [-T @var{src_cache}] [-O @var{output_fmt}] [-B 
@var{backing_file}] [-o @var{options}] [-s @var{snapshot_id_or_name}] [-l 
@var{snapshot_param}] [-S @var{sparse_size}] [-m @var{num_coroutines}] [-W] 
@var{filename} [@var{filename2} [...]] @var{output_filename}
 ETEXI
 
 DEF("dd", img_dd,
diff --git a/qemu-img.texi b/qemu-img.texi
index c81db3e..8c573ae 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -224,7 +224,7 @@ If @code{-r} is specified, exit codes representing the 
image state refer to the
 state after (the attempt at) repairing it. That is, a successful @code{-r all}
 will yield the exit code 0, independently of the image state before.
 
-@item create [-f @var{fmt}] [-o @var{options}] @var{filename} [@var{size}]
+@item create [-f @var{fmt}] [-b @var{backing_file}] [-F @var{backing_fmt}] [-o 
@var{options}] @var{filename} [@var{size}]
 
 Create the new disk image @var{filename} of size @var{size} and format
 @var{fmt}. Depending on the file format, you can add one or more @var{options}
@@ -302,7 +302,7 @@ Error on reading data
 
 @end table
 
-@item convert [-c] [-p] [-n] [-f @var{fmt}] [-t @var{cache}] [-T 
@var{src_cache}] [-O @var{output_fmt}] [-o @var{options}] [-s 
@var{snapshot_id_or_name}] [-l @var{snapshot_param}] [-m @var{num_coroutines}] 
[-W] [-S @var{sparse_size}] @var{filename} [@var{filename2} [...]] 
@var{output_filename}
+@item convert [-c] [-p] [-n] [-f @var{fmt}] [-t @var{cache}] [-T 
@var{src_cache}] [-O @var{output_fmt}] [-B @var{backing_file}] [-o 
@var{options}] [-s @var{snapshot_id_or_name}] [-l @var{snapshot_param}] [-m 
@var{num_coroutines}] [-W] [-S @var{sparse_size}] @var{filename} 
[@var{filename2} [...]] @var{output_filename}
 
 Convert the disk image @var{filename} or a snapshot 
@var{snapshot_param}(@var{snapshot_id_or_name} is deprecated)
 to disk image @var{output_filename} using format @var{output_fmt}. It can be 
optionally compressed (@code{-c}
-- 
1.8.3.1

[Qemu-block] [PULL 18/34] block: Do not unref bs->file on error in BD's open

2017-04-28 Thread Kevin Wolf

From: Max Reitz 

The block layer takes care of removing the bs->file child if the block
driver's bdrv_open()/bdrv_file_open() implementation fails. The block
driver therefore does not need to do so, and indeed should not unless it
sets bs->file to NULL afterwards -- because if this is not done, the
bdrv_unref_child() in bdrv_open_inherit() will dereference the freed
memory block at bs->file afterwards, which is not good.

We can now decide whether to add a "bs->file = NULL;" after each of the
offending bdrv_unref_child() invocations, or just drop them altogether.
The latter is simpler, so let's do that.

Cc: qemu-stable 
Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Kevin Wolf 
---
 block/blkdebug.c  | 4 +---
 block/blkreplay.c | 3 ---
 block/blkverify.c | 3 ---
 3 files changed, 1 insertion(+), 9 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 67e8024..cc4a146 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -389,14 +389,12 @@ static int blkdebug_open(BlockDriverState *bs, QDict 
*options, int flags,
 } else if (align) {
 error_setg(errp, "Invalid alignment");
 ret = -EINVAL;
-goto fail_unref;
+goto out;
 }
 
 ret = 0;
 goto out;
 
-fail_unref:
-bdrv_unref_child(bs, bs->file);
 out:
 if (ret < 0) {
 g_free(s->config_file);
diff --git a/block/blkreplay.c b/block/blkreplay.c
index e110211..6aa5fd4 100755
--- a/block/blkreplay.c
+++ b/block/blkreplay.c
@@ -37,9 +37,6 @@ static int blkreplay_open(BlockDriverState *bs, QDict 
*options, int flags,
 
 ret = 0;
 fail:
-if (ret < 0) {
-bdrv_unref_child(bs, bs->file);
-}
 return ret;
 }
 
diff --git a/block/blkverify.c b/block/blkverify.c
index 9a1e21c..af23281 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -142,9 +142,6 @@ static int blkverify_open(BlockDriverState *bs, QDict 
*options, int flags,
 
 ret = 0;
 fail:
-if (ret < 0) {
-bdrv_unref_child(bs, bs->file);
-}
 qemu_opts_del(opts);
 return ret;
 }
-- 
1.8.3.1

[Qemu-block] [PULL 20/34] qemu-img/convert: Use @opts for one thing only

2017-04-28 Thread Kevin Wolf

From: Max Reitz 

After storing the creation options for the new image into @opts, we
fetch some things for our own information, like the backing file name,
or whether to use encryption or preallocation.

With the -n parameter, there will not be any creation options; this is
not too bad because this just means that querying a NULL @opts will
always return the default value.

However, we also use @opts for the --object options. Therefore, @opts is
not necessarily NULL if -n was specified; instead, it may contain those
options. In practice, this probably does not cause any problems because
there most likely is no object that supports any of the parameters we
query here, but this is neither something we should rely on nor does
this variable reuse make the code very nice to read.

Therefore, just use a separate variable for the --object options.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 qemu-img.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index b94fc11..e0503ae 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2049,13 +2049,15 @@ static int img_convert(int argc, char **argv)
 case 'W':
 s.wr_in_order = false;
 break;
-case OPTION_OBJECT:
-opts = qemu_opts_parse_noisily(&qemu_object_opts,
-   optarg, true);
-if (!opts) {
+case OPTION_OBJECT: {
+QemuOpts *object_opts;
+object_opts = qemu_opts_parse_noisily(&qemu_object_opts,
+  optarg, true);
+if (!object_opts) {
 goto fail_getopt;
 }
 break;
+}
 case OPTION_IMAGE_OPTS:
 image_opts = true;
 break;
-- 
1.8.3.1

[Qemu-block] [PULL 12/34] qemu_iotests: Remove _readlink()

2017-04-28 Thread Kevin Wolf

It is unused.

Suggested-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Fam Zheng 
---
 tests/qemu-iotests/common.config | 18 --
 1 file changed, 18 deletions(-)

diff --git a/tests/qemu-iotests/common.config b/tests/qemu-iotests/common.config
index 5ee00a1..d1b45f5 100644
--- a/tests/qemu-iotests/common.config
+++ b/tests/qemu-iotests/common.config
@@ -217,23 +217,5 @@ fi
 
 export SAMPLE_IMG_DIR
 
-_readlink()
-{
-if [ $# -ne 1 ]; then
-echo "Usage: _readlink filename" 1>&2
-exit 1
-fi
-
-perl -e "\$in=\"$1\";" -e '
-$lnk = readlink($in);
-if ($lnk =~ m!^/.*!) {
-print "$lnk\n";
-}
-else {
-chomp($dir = `dirname $in`);
-print "$dir/$lnk\n";
-}'
-}
-
 # make sure this script returns success
 true
-- 
1.8.3.1

[Qemu-block] [PULL 13/34] block: Remove NULL check in bdrv_co_flush

2017-04-28 Thread Kevin Wolf

From: Fam Zheng 

Reported by Coverity. We already use bs in bdrv_inc_in_flight before
checking for NULL. It is unnecessary as all callers pass non-NULL bs, so
drop it.

Signed-off-by: Fam Zheng 
Reviewed-by: Max Reitz 
Signed-off-by: Kevin Wolf 
---
 block/io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index a32fd1d..80d6c62 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2300,7 +2300,7 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
 
 bdrv_inc_in_flight(bs);
 
-if (!bs || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs) ||
+if (!bdrv_is_inserted(bs) || bdrv_is_read_only(bs) ||
 bdrv_is_sg(bs)) {
 goto early_exit;
 }
-- 
1.8.3.1

[Qemu-block] [PULL 15/34] Issue a deprecation warning if the user specifies the "-hdachs" option.

2017-04-28 Thread Kevin Wolf

From: Thomas Huth 

If the user needs to specify the disk geometry, the corresponding
parameters of the "-device ide-hd" option should be used instead.
"-hdachs" is considered as deprecated and might be removed soon.

Signed-off-by: Thomas Huth 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 qemu-options.hx | 4 ++--
 vl.c| 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 787b9c3..f68829f 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -803,8 +803,8 @@ STEXI
 Force hard disk 0 physical geometry (1 <= @var{c} <= 16383, 1 <=
 @var{h} <= 16, 1 <= @var{s} <= 63) and optionally force the BIOS
 translation mode (@var{t}=none, lba or auto). Usually QEMU can guess
-all those parameters. This option is useful for old MS-DOS disk
-images.
+all those parameters. This option is deprecated, please use
+@code{-device ide-hd,cyls=c,heads=h,secs=s,...} instead.
 ETEXI
 
 DEF("fsdev", HAS_ARG, QEMU_OPTION_fsdev,
diff --git a/vl.c b/vl.c
index f46e070..42d4bce 100644
--- a/vl.c
+++ b/vl.c
@@ -3231,6 +3231,8 @@ int main(int argc, char **argv, char **envp)
 }
 }
 }
+error_report("'-hdachs' is deprecated, please use '-device"
+ " ide-hd,cyls=c,heads=h,secs=s,...' instead");
 break;
 case QEMU_OPTION_numa:
 opts = qemu_opts_parse_noisily(qemu_find_opts("numa"),
-- 
1.8.3.1

[Qemu-block] [PULL 10/34] iotests/051: Add test for empty filename

2017-04-28 Thread Kevin Wolf

From: Max Reitz 

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/051| 1 +
 tests/qemu-iotests/051.out| 3 +++
 tests/qemu-iotests/051.pc.out | 3 +++
 3 files changed, 7 insertions(+)

diff --git a/tests/qemu-iotests/051 b/tests/qemu-iotests/051
index 7c78a02..26c29de 100755
--- a/tests/qemu-iotests/051
+++ b/tests/qemu-iotests/051
@@ -232,6 +232,7 @@ echo === Leaving out required options ===
 echo
 
 run_qemu -drive driver=file
+run_qemu -drive driver=file,filename=
 run_qemu -drive driver=nbd
 run_qemu -drive driver=raw
 run_qemu -drive file.driver=file
diff --git a/tests/qemu-iotests/051.out b/tests/qemu-iotests/051.out
index 3b7b040..4d3b1ff 100644
--- a/tests/qemu-iotests/051.out
+++ b/tests/qemu-iotests/051.out
@@ -221,6 +221,9 @@ QEMU X.Y.Z monitor - type 'help' for more information
 Testing: -drive driver=file
 QEMU_PROG: -drive driver=file: The 'file' block driver requires a file name
 
+Testing: -drive driver=file,filename=
+QEMU_PROG: -drive driver=file,filename=: The 'file' block driver requires a 
file name
+
 Testing: -drive driver=nbd
 QEMU_PROG: -drive driver=nbd: NBD server address missing
 
diff --git a/tests/qemu-iotests/051.pc.out b/tests/qemu-iotests/051.pc.out
index 7f9a121..76d7205 100644
--- a/tests/qemu-iotests/051.pc.out
+++ b/tests/qemu-iotests/051.pc.out
@@ -319,6 +319,9 @@ QEMU X.Y.Z monitor - type 'help' for more information
 Testing: -drive driver=file
 QEMU_PROG: -drive driver=file: The 'file' block driver requires a file name
 
+Testing: -drive driver=file,filename=
+QEMU_PROG: -drive driver=file,filename=: The 'file' block driver requires a 
file name
+
 Testing: -drive driver=nbd
 QEMU_PROG: -drive driver=nbd: NBD server address missing
 
-- 
1.8.3.1

[Qemu-block] [PULL 11/34] qemu-iotests: Remove PERL_PROG and BC_PROG

2017-04-28 Thread Kevin Wolf

We test for the presence of perl and bc and save their path in the
variables PERL_PROG and BC_PROG, but never actually make use of them.
Remove the checks and assignments so qemu-iotests can run even when
bc isn't installed.

Reported-by: Yash Mankad 
Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Fam Zheng 
---
 tests/qemu-iotests/common.config | 6 --
 1 file changed, 6 deletions(-)

diff --git a/tests/qemu-iotests/common.config b/tests/qemu-iotests/common.config
index c4b51b3..5ee00a1 100644
--- a/tests/qemu-iotests/common.config
+++ b/tests/qemu-iotests/common.config
@@ -75,18 +75,12 @@ _fatal()
 exit 1
 }
 
-export PERL_PROG="`set_prog_path perl`"
-[ "$PERL_PROG" = "" ] && _fatal "perl not found"
-
 export AWK_PROG="`set_prog_path awk`"
 [ "$AWK_PROG" = "" ] && _fatal "awk not found"
 
 export SED_PROG="`set_prog_path sed`"
 [ "$SED_PROG" = "" ] && _fatal "sed not found"
 
-export BC_PROG="`set_prog_path bc`"
-[ "$BC_PROG" = "" ] && _fatal "bc not found"
-
 export PS_ALL_FLAGS="-ef"
 
 if [ -z "$QEMU_PROG" ]; then
-- 
1.8.3.1

[Qemu-block] [PULL 03/34] file-win32: Remove unnecessary include

2017-04-28 Thread Kevin Wolf

Signed-off-by: Kevin Wolf 
---
 block/file-win32.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/block/file-win32.c b/block/file-win32.c
index 800fabd..e132ba1 100644
--- a/block/file-win32.c
+++ b/block/file-win32.c
@@ -24,7 +24,6 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "qemu/cutils.h"
-#include "qemu/timer.h"
 #include "block/block_int.h"
 #include "qemu/module.h"
 #include "block/raw-aio.h"
-- 
1.8.3.1

[Qemu-block] [PULL 09/34] block: An empty filename counts as no filename

2017-04-28 Thread Kevin Wolf

From: Max Reitz 

Reproducer:
$ ./qemu-img info ''
qemu-img: ./block.c:1008: bdrv_open_driver: Assertion
`!drv->bdrv_needs_filename || bs->filename[0]' failed.
[1]26105 abort (core dumped)  ./qemu-img info ''

This patch fixes this to be:
$ ./qemu-img info ''
qemu-img: Could not open '': The 'file' block driver requires a file
name

Cc: qemu-stable 
Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Fam Zheng 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Kevin Wolf 
---
 block.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block.c b/block.c
index ceaca44..7b557f3 100644
--- a/block.c
+++ b/block.c
@@ -1204,7 +1204,7 @@ static int bdrv_open_common(BlockDriverState *bs, 
BlockBackend *file,
 filename = qdict_get_try_str(options, "filename");
 }
 
-if (drv->bdrv_needs_filename && !filename) {
+if (drv->bdrv_needs_filename && (!filename || !filename[0])) {
 error_setg(errp, "The '%s' block driver requires a file name",
drv->format_name);
 ret = -EINVAL;
-- 
1.8.3.1

[Qemu-block] [PULL 08/34] qemu-iotests: Test postcopy migration

2017-04-28 Thread Kevin Wolf

Signed-off-by: Kevin Wolf 
Reviewed-by: Dr. David Alan Gilbert 
---
 tests/qemu-iotests/181 | 119 +
 tests/qemu-iotests/181.out |  38 +++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 158 insertions(+)
 create mode 100755 tests/qemu-iotests/181
 create mode 100644 tests/qemu-iotests/181.out

diff --git a/tests/qemu-iotests/181 b/tests/qemu-iotests/181
new file mode 100755
index 000..e969a2a
--- /dev/null
+++ b/tests/qemu-iotests/181
@@ -0,0 +1,119 @@
+#!/bin/bash
+#
+# Test postcopy live migration with shared storage
+#
+# Copyright (C) 2017 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=kw...@redhat.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+status=1   # failure is the default!
+
+MIG_SOCKET="${TEST_DIR}/migrate"
+
+_cleanup()
+{
+rm -f "${MIG_SOCKET}"
+   _cleanup_test_img
+_cleanup_qemu
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+. ./common.qemu
+
+_supported_fmt generic
+_supported_proto generic
+_supported_os Linux
+
+size=64M
+_make_test_img $size
+
+echo
+echo === Starting VMs ===
+echo
+
+qemu_comm_method="monitor"
+
+_launch_qemu \
+-drive file="${TEST_IMG}",cache=${CACHEMODE},driver=$IMGFMT,id=disk
+src=$QEMU_HANDLE
+
+_launch_qemu \
+-drive file="${TEST_IMG}",cache=${CACHEMODE},driver=$IMGFMT,id=disk \
+-incoming "unix:${MIG_SOCKET}"
+dest=$QEMU_HANDLE
+
+echo
+echo === Write something on the source ===
+echo
+
+silent=
+_send_qemu_cmd $src 'qemu-io disk "write -P 0x55 0 64k"' "(qemu)"
+_send_qemu_cmd $src "" "ops/sec"
+_send_qemu_cmd $src 'qemu-io disk "read -P 0x55 0 64k"' "(qemu)"
+_send_qemu_cmd $src "" "ops/sec"
+
+echo
+echo === Do postcopy migration to destination ===
+echo
+
+# Slow down migration so much that it definitely won't finish before we can
+# switch to postcopy
+silent=yes
+_send_qemu_cmd $src 'migrate_set_speed 4k' "(qemu)"
+_send_qemu_cmd $src 'migrate_set_capability postcopy-ram on' "(qemu)"
+_send_qemu_cmd $src "migrate -d unix:${MIG_SOCKET}" "(qemu)"
+_send_qemu_cmd $src 'migrate_start_postcopy' "(qemu)"
+
+QEMU_COMM_TIMEOUT=1 qemu_cmd_repeat=10 silent=yes \
+_send_qemu_cmd $src "info migrate" "completed\|failed"
+silent=yes _send_qemu_cmd $src "" "(qemu)"
+
+echo
+echo === Do some I/O on the destination ===
+echo
+
+# It is important that we use the BlockBackend of the guest device here instead
+# of the node name, which would create a new BlockBackend and not test whether
+# the guest has the necessary permissions to access the image now
+silent=
+_send_qemu_cmd $dest 'qemu-io disk "read -P 0x55 0 64k"' "(qemu)"
+_send_qemu_cmd $dest "" "ops/sec"
+_send_qemu_cmd $dest 'qemu-io disk "write -P 0x66 1M 64k"' "(qemu)"
+_send_qemu_cmd $dest "" "ops/sec"
+
+echo
+echo === Shut down and check image ===
+echo
+
+_send_qemu_cmd $src 'quit' ""
+_send_qemu_cmd $dest 'quit' ""
+wait=1 _cleanup_qemu
+
+_check_test_img
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/181.out b/tests/qemu-iotests/181.out
new file mode 100644
index 000..6534ba2
--- /dev/null
+++ b/tests/qemu-iotests/181.out
@@ -0,0 +1,38 @@
+QA output created by 181
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+
+=== Starting VMs ===
+
+
+=== Write something on the source ===
+
+QEMU X.Y.Z monitor - type 'help' for more information
+(qemu) qemu-io disk "write -P 0x55 0 64k"
+wrote 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+(qemu) 
+(qemu) qemu-io disk "read -P 0x55 0 64k"
+read 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+=== Do postcopy migration to destination ===
+
+
+=== Do some I/O on the destination ===
+
+QEMU X.Y.Z monitor - type 'help' for more information
+(qemu) qemu-io disk "read -P 0x55 0 64k"
+read 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+(qemu) 
+(qemu) qemu-io disk "write -P 0x66 1M 64k"
+wrote 65536/65536 bytes at offset 1048576
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+=== Shut down and check image ===
+
+(qemu) quit
+(qemu) 
+(qemu) quit
+No errors were found on the image.
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotes

[Qemu-block] [PULL 05/34] qemu-img: simplify img_convert

2017-04-28 Thread Kevin Wolf

From: Peter Lieven 

img_convert has been around before there was an ImgConvertState or
a block backend, but it has never been modified to directly use
these structs. Change this by parsing parameters directly into
the ImgConvertState and directly use BlockBackend where possible.
Furthermore variable initialization has been reworked and sorted.

Signed-off-by: Peter Lieven 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 qemu-img.c | 201 +
 1 file changed, 81 insertions(+), 120 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index bbe1574..b94fc11 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1522,7 +1522,7 @@ typedef struct ImgConvertState {
 int min_sparse;
 size_t cluster_sectors;
 size_t buf_sectors;
-int num_coroutines;
+long num_coroutines;
 int running_coroutines;
 Coroutine *co[MAX_COROUTINES];
 int64_t wait_sector_num[MAX_COROUTINES];
@@ -1916,39 +1916,29 @@ static int convert_do_copy(ImgConvertState *s)
 
 static int img_convert(int argc, char **argv)
 {
-int c, bs_n, bs_i, compress, cluster_sectors, skip_create;
-int64_t ret = 0;
-int progress = 0, flags, src_flags;
-bool writethrough, src_writethrough;
-const char *fmt, *out_fmt, *cache, *src_cache, *out_baseimg, *out_filename;
+int c, bs_i, flags, src_flags = 0;
+const char *fmt = NULL, *out_fmt = "raw", *cache = "unsafe",
+   *src_cache = BDRV_DEFAULT_CACHE, *out_baseimg = NULL,
+   *out_filename, *out_baseimg_param, *snapshot_name = NULL;
 BlockDriver *drv, *proto_drv;
-BlockBackend **blk = NULL, *out_blk = NULL;
-BlockDriverState **bs = NULL, *out_bs = NULL;
-int64_t total_sectors;
-int64_t *bs_sectors = NULL;
-size_t bufsectors = IO_BUF_SIZE / BDRV_SECTOR_SIZE;
 BlockDriverInfo bdi;
-QemuOpts *opts = NULL;
+BlockDriverState *out_bs;
+QemuOpts *opts = NULL, *sn_opts = NULL;
 QemuOptsList *create_opts = NULL;
-const char *out_baseimg_param;
 char *options = NULL;
-const char *snapshot_name = NULL;
-int min_sparse = 8; /* Need at least 4k of zeros for sparse detection */
-bool quiet = false;
 Error *local_err = NULL;
-QemuOpts *sn_opts = NULL;
-ImgConvertState state;
-bool image_opts = false;
-bool wr_in_order = true;
-long num_coroutines = 8;
+bool writethrough, src_writethrough, quiet = false, image_opts = false,
+ skip_create = false, progress = false;
+int64_t ret = -EINVAL;
+
+ImgConvertState s = (ImgConvertState) {
+/* Need at least 4k of zeros for sparse detection */
+.min_sparse = 8,
+.buf_sectors= IO_BUF_SIZE / BDRV_SECTOR_SIZE,
+.wr_in_order= true,
+.num_coroutines = 8,
+};
 
-fmt = NULL;
-out_fmt = "raw";
-cache = "unsafe";
-src_cache = BDRV_DEFAULT_CACHE;
-out_baseimg = NULL;
-compress = 0;
-skip_create = 0;
 for(;;) {
 static const struct option long_options[] = {
 {"help", no_argument, 0, 'h'},
@@ -1981,22 +1971,19 @@ static int img_convert(int argc, char **argv)
 out_baseimg = optarg;
 break;
 case 'c':
-compress = 1;
+s.compressed = true;
 break;
 case 'e':
 error_report("option -e is deprecated, please use \'-o "
   "encryption\' instead!");
-ret = -1;
 goto fail_getopt;
 case '6':
 error_report("option -6 is deprecated, please use \'-o "
   "compat6\' instead!");
-ret = -1;
 goto fail_getopt;
 case 'o':
 if (!is_valid_option_list(optarg)) {
 error_report("Invalid option list: %s", optarg);
-ret = -1;
 goto fail_getopt;
 }
 if (!options) {
@@ -2017,7 +2004,6 @@ static int img_convert(int argc, char **argv)
 if (!sn_opts) {
 error_report("Failed in parsing snapshot param '%s'",
  optarg);
-ret = -1;
 goto fail_getopt;
 }
 } else {
@@ -2031,15 +2017,14 @@ static int img_convert(int argc, char **argv)
 sval = cvtnum(optarg);
 if (sval < 0) {
 error_report("Invalid minimum zero buffer size for sparse 
output specified");
-ret = -1;
 goto fail_getopt;
 }
 
-min_sparse = sval / BDRV_SECTOR_SIZE;
+s.min_sparse = sval / BDRV_SECTOR_SIZE;
 break;
 }
 case 'p':
-progress = 1;
+progress = true;
 break;
 case 't':
 cache = optarg;
@@ -2051,19 +2036,18 @@ static int img_convert(int argc, char **argv)
 quiet = true;
 break;
 case 'n

[Qemu-block] [PULL 06/34] migration: Call blk_resume_after_migration() for postcopy

2017-04-28 Thread Kevin Wolf

Commit d35ff5e6 ('block: Ignore guest dev permissions during incoming
migration') added blk_resume_after_migration() to the precopy migration
path, but neglected to add it to the duplicated code that is used for
postcopy migration. This means that the guest device doesn't request the
necessary permissions, which ultimately led to failing assertions.

Add the missing blk_resume_after_migration() to the postcopy path.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
---
 migration/savevm.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index 03ae1bd..a00c1ab 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1622,6 +1622,14 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
 error_report_err(local_err);
 }
 
+/* If we get an error here, just don't restart the VM yet. */
+blk_resume_after_migration(&local_err);
+if (local_err) {
+error_free(local_err);
+local_err = NULL;
+autostart = false;
+}
+
 trace_loadvm_postcopy_handle_run_cpu_sync();
 cpu_synchronize_all_post_init();
 
-- 
1.8.3.1

[Qemu-block] [PULL 04/34] Revert "block/io: Comment out permission assertions"

2017-04-28 Thread Kevin Wolf

From: Max Reitz 

This reverts commit e3e0003a8f6570aba1421ef99a0b383a43371a74.

This commit was necessary for the 2.9 release because we were unable to
fix the underlying issue(s) in time. However, we will be for 2.10.

Signed-off-by: Max Reitz 
Acked-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
---
 block.c|  6 +-
 block/io.c | 12 ++--
 2 files changed, 3 insertions(+), 15 deletions(-)

diff --git a/block.c b/block.c
index 5db266b..ceaca44 100644
--- a/block.c
+++ b/block.c
@@ -3313,11 +3313,7 @@ int bdrv_truncate(BdrvChild *child, int64_t offset)
 BlockDriver *drv = bs->drv;
 int ret;
 
-/* FIXME: Some format block drivers use this function instead of implicitly
- *growing their file by writing beyond its end.
- *See bdrv_aligned_pwritev() for an explanation why we currently
- *cannot assert this permission in that case. */
-// assert(child->perm & BLK_PERM_RESIZE);
+assert(child->perm & BLK_PERM_RESIZE);
 
 if (!drv)
 return -ENOMEDIUM;
diff --git a/block/io.c b/block/io.c
index a7142e0..a32fd1d 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1362,16 +1362,8 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild 
*child,
 assert(!waited || !req->serialising);
 assert(req->overlap_offset <= offset);
 assert(offset + bytes <= req->overlap_offset + req->overlap_bytes);
-/* FIXME: Block migration uses the BlockBackend of the guest device at a
- *point when it has not yet taken write permissions. This will be
- *fixed by a future patch, but for now we have to bypass this
- *assertion for block migration to work. */
-// assert(child->perm & BLK_PERM_WRITE);
-/* FIXME: Because of the above, we also cannot guarantee that all format
- *BDS take the BLK_PERM_RESIZE permission on their file BDS, since
- *they are not obligated to do so if they do not have any parent
- *that has taken the permission to write to them. */
-// assert(end_sector <= bs->total_sectors || child->perm & 
BLK_PERM_RESIZE);
+assert(child->perm & BLK_PERM_WRITE);
+assert(end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE);
 
 ret = notifier_with_return_list_notify(&bs->before_write_notifiers, req);
 
-- 
1.8.3.1

[Qemu-block] [PULL 00/34] Block layer patches

2017-04-28 Thread Kevin Wolf

The following changes since commit 81b2d5ceb0cfb4cdc2163492e3169ed714b0cda9:

  Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20170426' into 
staging (2017-04-26 20:50:49 +0100)

are available in the git repository at:


  git://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to 5fc0fe383fff318b38291dcdf2cf38e329ec232a:

  Merge remote-tracking branch 'mreitz/tags/pull-block-2017-04-28' into 
queue-block (2017-04-28 20:52:17 +0200)



Block layer patches


Denis V. Lunev (2):
  block: fix alignment calculations in bdrv_co_do_zero_pwritev
  block: assert no image modification under BDRV_O_INACTIVE

Eric Blake (2):
  iotests: Fix typo in 026
  qcow2: Allow discard of final unaligned cluster

Fam Zheng (2):
  block: Remove NULL check in bdrv_co_flush
  iotests: 109: Filter out "len" of failed jobs

John Snow (2):
  iotests: clarify help text
  iotests: fix exclusion option

Kevin Wolf (8):
  file-posix: Remove unnecessary includes
  file-win32: Remove unnecessary include
  migration: Call blk_resume_after_migration() for postcopy
  qemu-iotests: Filter HMP readline escape characters
  qemu-iotests: Test postcopy migration
  qemu-iotests: Remove PERL_PROG and BC_PROG
  qemu_iotests: Remove _readlink()
  Merge remote-tracking branch 'mreitz/tags/pull-block-2017-04-28' into 
queue-block

Klim Kireev (1):
  block: fix obvious coding style mistakes in block_int.h

Krzysztof Kozlowski (1):
  block: Constify data passed by pointer to blk_name

Lidong Chen (1):
  qemu-img: use blk_co_pwrite_zeroes for zero sectors when compressed

Max Reitz (13):
  Revert "block/io: Comment out permission assertions"
  block: An empty filename counts as no filename
  iotests/051: Add test for empty filename
  iotests: Launch qemu-nbd with -e 42
  block: Do not unref bs->file on error in BD's open
  qemu-img/convert: Use @opts for one thing only
  qemu-img/convert: Move bs_n > 1 && -B check down
  qemu-img: Document backing options
  block/vhdx: Make vhdx_create() always set errp
  block: Add errp to b{lk,drv}_truncate()
  block: Add errp to BD.bdrv_truncate()
  block: Add .bdrv_truncate() error messages
  progress: Show current progress on SIGINFO

Peter Lieven (1):
  qemu-img: simplify img_convert

Thomas Huth (1):
  Issue a deprecation warning if the user specifies the "-hdachs" option.

Vladimir Sementsov-Ogievskiy (1):
  qemu-img: improve convert_iteration_sectors()

 block.c|  26 +--
 block/blkdebug.c   |   8 +-
 block/blkreplay.c  |   3 -
 block/blkverify.c  |   3 -
 block/block-backend.c  |   7 +-
 block/commit.c |   5 +-
 block/crypto.c |   5 +-
 block/file-posix.c |  21 ++-
 block/file-win32.c |   7 +-
 block/gluster.c|   7 +-
 block/io.c |  16 +-
 block/iscsi.c  |   6 +-
 block/mirror.c |   2 +-
 block/nfs.c|  12 +-
 block/parallels.c  |  13 +-
 block/qcow.c   |   6 +-
 block/qcow2-refcount.c |   5 +-
 block/qcow2.c  |  31 ++--
 block/qed.c|   8 +-
 block/raw-format.c |   6 +-
 block/rbd.c|   3 +-
 block/sheepdog.c   |  14 +-
 block/vdi.c|   4 +-
 block/vhdx-log.c   |   2 +-
 block/vhdx.c   |  25 ++-
 block/vmdk.c   |  13 +-
 block/vpc.c|  13 +-
 blockdev.c |  21 +--
 include/block/block.h  |   2 +-
 include/block/block_int.h  |   8 +-
 include/sysemu/block-backend.h |   4 +-
 migration/savevm.c |   8 +
 qemu-img-cmds.hx   |   8 +-
 qemu-img.c | 313 ++---
 qemu-img.texi  |   7 +-
 qemu-io-cmds.c |   5 +-
 qemu-options.hx|   4 +-
 tests/qemu-iotests/026 |   2 +-
 tests/qemu-iotests/026.out |   2 +-
 tests/qemu-iotests/026.out.nocache |   2 +-
 tests/qemu-iotests/028.out |   2 +-
 tests/qemu-iotests/051 |   4 +-
 tests/qemu-iotests/051.out | 109 ++---
 tests/qemu-iotests/051.pc.out  | 135 
 tests/qemu-iotests/066 |  12 +-
 tests/qemu-iotests/066.out |  12 +-
 tests/qemu-iotests/068 |   4 +-
 tests/qemu-iotests/068.out |   6 +-
 tests/qemu-iotests/109 |   6 +-
 tests/qemu-iotests/109.out |  20

[Qemu-block] [PULL 02/34] file-posix: Remove unnecessary includes

2017-04-28 Thread Kevin Wolf

Signed-off-by: Kevin Wolf 
---
 block/file-posix.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 0c48968..ade71db 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -25,8 +25,6 @@
 #include "qapi/error.h"
 #include "qemu/cutils.h"
 #include "qemu/error-report.h"
-#include "qemu/timer.h"
-#include "qemu/log.h"
 #include "block/block_int.h"
 #include "qemu/module.h"
 #include "trace.h"
-- 
1.8.3.1

[Qemu-block] [PULL 01/34] block: Constify data passed by pointer to blk_name

2017-04-28 Thread Kevin Wolf

From: Krzysztof Kozlowski 

blk_name() is not modifying data passed to it through pointer and it
returns also a pointer to const so the argument can be made const for
code safeness.

Signed-off-by: Krzysztof Kozlowski 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Kevin Wolf 
---
 block/block-backend.c  | 2 +-
 include/sysemu/block-backend.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 7405024..ae3d771 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -420,7 +420,7 @@ void monitor_remove_blk(BlockBackend *blk)
  * Return @blk's name, a non-null string.
  * Returns an empty string iff @blk is not referenced by the monitor.
  */
-const char *blk_name(BlockBackend *blk)
+const char *blk_name(const BlockBackend *blk)
 {
 return blk->name ?: "";
 }
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 7462228..133e542 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -99,7 +99,7 @@ int blk_get_refcnt(BlockBackend *blk);
 void blk_ref(BlockBackend *blk);
 void blk_unref(BlockBackend *blk);
 void blk_remove_all_bs(void);
-const char *blk_name(BlockBackend *blk);
+const char *blk_name(const BlockBackend *blk);
 BlockBackend *blk_by_name(const char *name);
 BlockBackend *blk_next(BlockBackend *blk);
 bool monitor_add_blk(BlockBackend *blk, const char *name, Error **errp);
-- 
1.8.3.1

Re: [Qemu-block] [Qemu-devel] [PATCH v10 01/17] block: Update comments on BDRV_BLOCK_* meanings

2017-04-28 Thread Eric Blake

On 04/26/2017 08:46 PM, Eric Blake wrote:
> We had some conflicting documentation: a nice 8-way table that
> described all possible combinations of DATA, ZERO, and
> OFFSET_VALID, couple with text that implied that OFFSET_VALID
> always meant raw data could be read directly.  As the 8-way
> table is the intended semantics, simplify the rest of the text
> to get rid of the confusion.
> 
> Suggested-by: Max Reitz 
> Signed-off-by: Eric Blake 
> 
> ---

>  /*
>   * Allocation status flags
> - * BDRV_BLOCK_DATA: data is read from a file returned by 
> bdrv_get_block_status.
> + * BDRV_BLOCK_DATA: data is read from a file returned by 
> bdrv_get_block_status

I guess we always return the BDS where the data is read from, even when
we can't actually provide an offset to that data (such as when the
actual data is encrypted or compressed, and therefore has no raw
offset).  But I'm still wondering if this line can be shortened.

>   * BDRV_BLOCK_ZERO: sectors read as zero
> - * BDRV_BLOCK_OFFSET_VALID: sector stored as raw data in a file returned by
> - *  bdrv_get_block_status.

I think when this was originally written, we blindly assumed that offset
pointed into bs->file.  Then with the exposure of the BDS graph, we
changed bdrv_get_block_status() to have a BlockDriverState **file
parameter (offset points into the returned file), since bs->file need
not always be the right BDS.  This old comment is on track...

> + * BDRV_BLOCK_OFFSET_VALID: an associated offset exists for accessing raw 
> data
>   * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
>   *   layer (as opposed to the backing file)
>   * BDRV_BLOCK_RAW: used internally to indicate that the request
>   * was answered by the raw driver and that one
>   * should look in bs->file directly.
>   *
> - * If BDRV_BLOCK_OFFSET_VALID is set, bits 9-62 represent the offset in
> - * bs->file where sector data can be read from as raw data.

...but this one is stale...

> - *
>   * DATA == 0 && ZERO == 0 means that data is read from backing_hd if present.
>   *
> + * If BDRV_BLOCK_OFFSET_VALID is set, bits 9-62 represent the offset
> + * in bs->file that is allocated for the corresponding raw data;

...and I reused the stale wording.  Oops.

> + * however, whether that offset actually contains data also depends on
> + * BDRV_BLOCK_DATA, as follows:
> + *
>   * DATA ZERO OFFSET_VALID
>   *  ttt   sectors read as zero, bs->file is zero at offset
>   *  tft   sectors read as valid from bs->file at offset

And these references to bs->file are stale too.

Guess I have more to fix on the respin.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH v10 07/17] qemu-io: Switch 'alloc' command to byte-based length

2017-04-28 Thread Max Reitz

On 28.04.2017 21:59, Eric Blake wrote:
> On 04/28/2017 02:46 PM, Max Reitz wrote:
>> On 27.04.2017 03:46, Eric Blake wrote:
>>> For the 'alloc' command, accepting an offset in bytes but a length
>>> in sectors, and reporting output in sectors, is confusing.  Do
>>> everything in bytes, and adjust the expected output accordingly.
>>>
>>> Signed-off-by: Eric Blake 
>>> Reviewed-by: Philippe Mathieu-Daudé 
>>>
> 
>>>  }
>>> +if (!QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE)) {
>>> +printf("bytes %" PRId64 " is not sector aligned\n",
>>
>> This isn't real English. :-)
> 
> But, it's just copy-and-paste from the other instances you just reviewed
> in 6/17!  [Translation - if I change this one, I also get to redo that one]

No, those are "offset" and "count" -- all singular. "bytes" is plural. ;-)

> 
> Which of these various alternatives (if any) looks better:
> 
> bytes=511 is not sector-aligned
> 511 is not a sector-aligned value for 'bytes'
> requested 'bytes' of 511 is not sector-aligned
> alignment error: 511 bytes is not sector-aligned> 'bytes' must be 
> sector-aligned: 511
> your clever entry here...

How about "byte count" instead of "bytes" or "bytes value", if really
want to have the exact spelling in there?

For your entries above: (1) and (2) work for me (I like (2) a bit
better), (3) doesn't sound like real English either, and it should be
s/is/are/ in (4) (but it still sounds off with that change). (5) I
mostly dislike because I dislike error message of the form "This should
be X: $foo", I like "$foo is not X" better.

>> With that fixed (somehow, you know better than me how to):
> 
> Re-reading my various alternatives, I do think that /sector
> aligned/sector-aligned/ helps no matter what; and that the remaining
> trick is to use quoting or '=' or some other lexical trick to make it
> obvious that 'bytes' is a parameter name whose value 511 is invalid,
> rather than part of the actual error of a value that is not properly
> aligned.

First of all, I think the user is clever enough to figure out that a
description like "byte count" refers to the "bytes" parameter. ;-)

Secondly, this is even more true because this is a debugging tool so we
don't have to worry too much about... Let's say inexperienced users.

Max

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH v10 08/17] qemu-io: Switch 'map' output to byte-based reporting

2017-04-28 Thread Eric Blake

On 04/28/2017 02:53 PM, Max Reitz wrote:
> On 27.04.2017 03:46, Eric Blake wrote:
>> Mixing byte offset and sector allocation counts is a bit
>> confusing.  Also, reporting n/m sectors, where m decreases
>> according to the remaining size of the file, isn't really
>> adding any useful information.
> 
> Since this map doesn't leave out any range in the image file -- not
> really, no. :-)
> 
>> Update the output to use
>> byte counts, and adjust the affected tests (./check -qcow2 102,
>> ./check -vpc 146).

My commit message is stale if test 179 gets committed first (although,
as we mentioned earlier, I'm trying to split this series in two, which
would delay the introduction of 179...)

I still wonder if an even more concise representation is worth it; the
resulting output lines are still quite long.  We don't make any
guarantees about qemu-io back-compat, so I may still tweak it further
(but if I do, I'll drop R-b)

I should probably also add a blurb to the commit message about how
'qemu-img map' and 'qemu-io map' are two different beasts.

>>
>> Signed-off-by: Eric Blake 
>>
>> ---
>> v10: rebase to updated test 179
>> v9: new patch
>> ---
>>  qemu-io-cmds.c |  5 ++---
>>  tests/qemu-iotests/102.out |  4 ++--
>>  tests/qemu-iotests/146.out | 30 +++---
>>  tests/qemu-iotests/179.out | 22 +++---
>>  4 files changed, 30 insertions(+), 31 deletions(-)
> 
> Reviewed-by: Max Reitz 
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH v10 07/17] qemu-io: Switch 'alloc' command to byte-based length

2017-04-28 Thread Eric Blake

On 04/28/2017 02:46 PM, Max Reitz wrote:
> On 27.04.2017 03:46, Eric Blake wrote:
>> For the 'alloc' command, accepting an offset in bytes but a length
>> in sectors, and reporting output in sectors, is confusing.  Do
>> everything in bytes, and adjust the expected output accordingly.
>>
>> Signed-off-by: Eric Blake 
>> Reviewed-by: Philippe Mathieu-Daudé 
>>

>>  }
>> +if (!QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE)) {
>> +printf("bytes %" PRId64 " is not sector aligned\n",
> 
> This isn't real English. :-)

But, it's just copy-and-paste from the other instances you just reviewed
in 6/17!  [Translation - if I change this one, I also get to redo that one]

Which of these various alternatives (if any) looks better:

bytes=511 is not sector-aligned
511 is not a sector-aligned value for 'bytes'
requested 'bytes' of 511 is not sector-aligned
alignment error: 511 bytes is not sector-aligned
'bytes' must be sector-aligned: 511
your clever entry here...

> 
> With that fixed (somehow, you know better than me how to):

Re-reading my various alternatives, I do think that /sector
aligned/sector-aligned/ helps no matter what; and that the remaining
trick is to use quoting or '=' or some other lexical trick to make it
obvious that 'bytes' is a parameter name whose value 511 is invalid,
rather than part of the actual error of a value that is not properly
aligned.

> 
> Reviewed-by: Max Reitz 

If you state a preference for one of my variants, then the respin will
use that variant consistently and add your R-b.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH v10 08/17] qemu-io: Switch 'map' output to byte-based reporting

2017-04-28 Thread Max Reitz

On 27.04.2017 03:46, Eric Blake wrote:
> Mixing byte offset and sector allocation counts is a bit
> confusing.  Also, reporting n/m sectors, where m decreases
> according to the remaining size of the file, isn't really
> adding any useful information.

Since this map doesn't leave out any range in the image file -- not
really, no. :-)

> Update the output to use
> byte counts, and adjust the affected tests (./check -qcow2 102,
> ./check -vpc 146).
> 
> Signed-off-by: Eric Blake 
> 
> ---
> v10: rebase to updated test 179
> v9: new patch
> ---
>  qemu-io-cmds.c |  5 ++---
>  tests/qemu-iotests/102.out |  4 ++--
>  tests/qemu-iotests/146.out | 30 +++---
>  tests/qemu-iotests/179.out | 22 +++---
>  4 files changed, 30 insertions(+), 31 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH v10 05/17] iotests: Add test 179 to cover write zeroes with unmap

2017-04-28 Thread Eric Blake

On 04/28/2017 02:28 PM, Max Reitz wrote:
> On 27.04.2017 03:46, Eric Blake wrote:
>> No tests were covering write zeroes with unmap.  Additionally,
>> I needed to prove that my previous patches for correct status
>> reporting and write zeroes optimizations actually had an impact.
>>
>> The test works for cluster_size between 8k and 2M (for smaller
>> sizes, it fails because our allocation patterns are not contiguous
>> with small clusters).
>>
>> Signed-off-by: Eric Blake 
>>
>> ---
>> v10: drop any changes to v2 files, rewrite test to work with updates
>> earlier in the series, add a blkdebug probe
>> v9: new patch
>> ---
>>  tests/qemu-iotests/179 | 123 
>> +
>>  tests/qemu-iotests/179.out |  90 +
>>  tests/qemu-iotests/group   |   1 +
>>  3 files changed, 214 insertions(+)
>>  create mode 100755 tests/qemu-iotests/179
>>  create mode 100644 tests/qemu-iotests/179.out
> 
> Can I convince you to write this test in a different way? I really have
> a hard time following all of the writes and building a mental image of
> how the qcow2 file should look in the end so I can compare it to the map
> output.
> 
> I'd like it better if you would check the qcow2 file after each group of
> write operations, maybe even creating a new image for every such group.
> But even just having more qemu-img map calls (and changing nothing else)
> would help me a lot.

Will do.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH v10 07/17] qemu-io: Switch 'alloc' command to byte-based length

2017-04-28 Thread Max Reitz

On 27.04.2017 03:46, Eric Blake wrote:
> For the 'alloc' command, accepting an offset in bytes but a length
> in sectors, and reporting output in sectors, is confusing.  Do
> everything in bytes, and adjust the expected output accordingly.
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: Philippe Mathieu-Daudé 
> 
> ---
> v10: rebase to code cleanup
> v9: new patch
> ---
>  qemu-io-cmds.c| 30 ++
>  tests/qemu-iotests/019.out|  8 
>  tests/qemu-iotests/common.pattern |  2 +-
>  3 files changed, 23 insertions(+), 17 deletions(-)
> 
> diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
> index fabc394..34f6707 100644
> --- a/qemu-io-cmds.c
> +++ b/qemu-io-cmds.c
> @@ -1760,7 +1760,7 @@ out:
>  static int alloc_f(BlockBackend *blk, int argc, char **argv)
>  {
>  BlockDriverState *bs = blk_bs(blk);
> -int64_t offset, sector_num, nb_sectors, remaining;
> +int64_t offset, sector_num, nb_sectors, remaining, bytes;
>  char s1[64];
>  int num, ret;
>  int64_t sum_alloc;
> @@ -1776,18 +1776,24 @@ static int alloc_f(BlockBackend *blk, int argc, char 
> **argv)
>  }
> 
>  if (argc == 3) {
> -nb_sectors = cvtnum(argv[2]);
> -if (nb_sectors < 0) {
> -print_cvtnum_err(nb_sectors, argv[2]);
> +bytes = cvtnum(argv[2]);
> +if (bytes < 0) {
> +print_cvtnum_err(bytes, argv[2]);
>  return 0;
> -} else if (nb_sectors > INT_MAX) {
> -printf("length argument cannot exceed %d, given %s\n",
> -   INT_MAX, argv[2]);
> +} else if (bytes > INT_MAX * BDRV_SECTOR_SIZE) {
> +printf("length argument cannot exceed %llu, given %s\n",
> +   INT_MAX * BDRV_SECTOR_SIZE, argv[2]);
>  return 0;
>  }
>  } else {
> -nb_sectors = 1;
> +bytes = BDRV_SECTOR_SIZE;
>  }
> +if (!QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE)) {
> +printf("bytes %" PRId64 " is not sector aligned\n",

This isn't real English. :-)

With that fixed (somehow, you know better than me how to):

Reviewed-by: Max Reitz 

> +   bytes);
> +return 0;
> +}
> +nb_sectors = bytes >> BDRV_SECTOR_BITS;
> 
>  remaining = nb_sectors;
>  sum_alloc = 0;



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH v10 06/17] qemu-io: Don't open-code QEMU_IS_ALIGNED

2017-04-28 Thread Max Reitz

On 27.04.2017 03:46, Eric Blake wrote:
> Manual comparison against 0x1ff is not as clean as using our
> alignment macros from osdep.h.
> 
> Suggested-by: Philippe Mathieu-Daudé 
> Signed-off-by: Eric Blake 
> 
> ---
> v10: new patch
> ---
>  qemu-io-cmds.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH v10 05/17] iotests: Add test 179 to cover write zeroes with unmap

2017-04-28 Thread Max Reitz

On 27.04.2017 03:46, Eric Blake wrote:
> No tests were covering write zeroes with unmap.  Additionally,
> I needed to prove that my previous patches for correct status
> reporting and write zeroes optimizations actually had an impact.
> 
> The test works for cluster_size between 8k and 2M (for smaller
> sizes, it fails because our allocation patterns are not contiguous
> with small clusters).
> 
> Signed-off-by: Eric Blake 
> 
> ---
> v10: drop any changes to v2 files, rewrite test to work with updates
> earlier in the series, add a blkdebug probe
> v9: new patch
> ---
>  tests/qemu-iotests/179 | 123 
> +
>  tests/qemu-iotests/179.out |  90 +
>  tests/qemu-iotests/group   |   1 +
>  3 files changed, 214 insertions(+)
>  create mode 100755 tests/qemu-iotests/179
>  create mode 100644 tests/qemu-iotests/179.out

Can I convince you to write this test in a different way? I really have
a hard time following all of the writes and building a mental image of
how the qcow2 file should look in the end so I can compare it to the map
output.

I'd like it better if you would check the qcow2 file after each group of
write operations, maybe even creating a new image for every such group.
But even just having more qemu-img map calls (and changing nothing else)
would help me a lot.

Max

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH v10 04/17] qcow2: Optimize zero_single_l2() to minimize L2 churn

2017-04-28 Thread Eric Blake

On 04/28/2017 01:00 PM, Max Reitz wrote:
> On 27.04.2017 03:46, Eric Blake wrote:
>> Similar to discard_single_l2(), we should try to avoid dirtying
>> the L2 cache when the cluster we are changing already has the
>> right characteristics.
>>

>> -/* Update L2 entries */
>> +/*
>> + * Minimize L2 changes if the cluster already reads back as
>> + * zeroes with correct allocation.
>> + */
>> +if (qcow2_get_cluster_type(old_offset) == QCOW2_CLUSTER_ZERO &&
>> +!(unmap && old_offset & L2E_OFFSET_MASK)) {
>> +continue;
>> +}
>> +
>>  qcow2_cache_entry_mark_dirty(bs, s->l2_table_cache, l2_table);
>> -if (old_offset & QCOW_OFLAG_COMPRESSED || flags & 
>> BDRV_REQ_MAY_UNMAP) {
>> +if (old_offset & QCOW_OFLAG_COMPRESSED || unmap) {
> 
> Works, but I'd like it better to store the cluster type somewhere and
> then use it here instead of checking the COMPRESSED flag.

Indeed, qcow2_get_cluster_type() checks QCOW_OFLAG_COMPRESSED first -
but I like the idea of storing it in a temporary variable now that we
are using the cluster type in more than one place.

> 
> But it works, so it's up to you:
> 
> Reviewed-by: Max Reitz 
> 
>>  l2_table[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
>>  qcow2_free_any_clusters(bs, old_offset, 1, 
>> QCOW2_DISCARD_REQUEST);
>>  } else {
>>
> 
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH v10 02/17] qcow2: Correctly report status of preallocated zero clusters

2017-04-28 Thread Eric Blake

On 04/28/2017 12:35 PM, Max Reitz wrote:
> On 27.04.2017 03:46, Eric Blake wrote:
>> We were throwing away the preallocation information associated with
>> zero clusters.  But we should be matching the well-defined semantics
>> in bdrv_get_block_status(), where (BDRV_BLOCK_ZERO |
>> BDRV_BLOCK_OFFSET_VALID) informs the user which offset is reserved,
>> while still reminding the user that reading from that offset is
>> likely to read garbage.
>>
>> Making this change lets us see which portions of an image are zero
>> but preallocated, when using qemu-img map --output=json.  The
>> --output=human side intentionally ignores all zero clusters, whether
>> or not they are preallocated.
>>
>> The fact that there is no change to qemu-iotests './check -qcow2'
>> merely means that we aren't yet testing this aspect of qemu-img;
>> a later patch will add a test.
>>
>> Signed-off-by: Eric Blake 
>>
>> ---
>> v10: new patch
>> ---
>>  block/qcow2-cluster.c | 32 +++-
>>  1 file changed, 27 insertions(+), 5 deletions(-)
> 
> I'd propose you split the qcow2 changes off of this series.

Yeah, it's sort of morphed into a well-tested later part and an earlier
part that is still undergoing heavy review.  I don't know how much churn
there would be to rebase things to do the blkdebug changes first (it may
invalidate portions of my test 177; but I could always tag such spots as
FIXME while waiting on the qcow2 part to settle and land).  I'll see
what I can do, as I've now got multiple series queued up, and should at
least get SOMETHING to land to start clearing up the logjam.


>> @@ -335,9 +339,10 @@ static int count_contiguous_clusters_by_type(int 
>> nb_clusters,
>>  int i;
>>
>>  for (i = 0; i < nb_clusters; i++) {
>> -int type = qcow2_get_cluster_type(be64_to_cpu(l2_table[i]));
>> +uint64_t entry = be64_to_cpu(l2_table[i]);
>> +int type = qcow2_get_cluster_type(entry);
>>
>> -if (type != wanted_type) {
>> +if (type != wanted_type || entry & L2E_OFFSET_MASK) {
> 
> This works only for wanted_type \in \{ ZERO, UNALLOCATED \} -- which is
> what the comment you added describes, but this may still warrant a
> renaming of this function (count_contiguous_unallocated_clusters?) and
> probably an assertion that wanted_type is ZERO or UNALLOCATED.

Good idea.


>> @@ -559,9 +564,26 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, 
>> uint64_t offset,
>>  ret = -EIO;
>>  goto fail;
>>  }
>> -c = count_contiguous_clusters_by_type(nb_clusters, 
>> &l2_table[l2_index],
>> -  QCOW2_CLUSTER_ZERO);
>> -*cluster_offset = 0;
>> +/* Distinguish between pure zero clusters and pre-allocated ones */
>> +if (*cluster_offset & L2E_OFFSET_MASK) {
>> +c = count_contiguous_clusters(nb_clusters, s->cluster_size,
>> +  &l2_table[l2_index], 
>> QCOW_OFLAG_ZERO);
> 
> You should probably switch this patch and the next one -- or I just send
> my patches myself and you base your series on top of it...

I tested with your patches in, then rebased with your patches out to see
what broke, and...

> 
> Because before the next patch, this happens:
> 

> $ ./qemu-img map foo.qcow2
> Offset  Length  Mapped to   File
> qemu-img: block/qcow2-cluster.c:319: count_contiguous_clusters:
> Assertion `qcow2_get_cluster_type(first_entry) == QCOW2_CLUSTER_NORMAL'
> failed.

got that same assert. So I realized that your patch was necessary and
rebased it back in, but obviously forgot to re-test at all points in the
series. Yes, your should go first, and I'd be more than happy if you
post yours formally and I rebase my qcow2 portions on top of yours (all
the more reason for me to split this series into two, and get the
blkdebug portions in now while we still hammer on the qcow2 stuff).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [Qemu-devel] [PATCH] MAINTAINERS: Add qemu-progress to the block layer

2017-04-28 Thread Eric Blake

On 04/28/2017 11:55 AM, Max Reitz wrote:
> util/qemu-progress.c is currently unmaintained. The only user of its
> functionality is qemu-img, so it effectively is part of the block layer.
> 
> Suggested-by: Fam Zheng 
> Signed-off-by: Max Reitz 
> ---
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)

Reviewed-by: Eric Blake 

> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index a1d2b3a4d3..768b4d94ec 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1168,6 +1168,7 @@ F: include/block/
>  F: qemu-img*
>  F: qemu-io*
>  F: tests/qemu-iotests/
> +F: util/qemu-progress.c
>  T: git git://repo.or.cz/qemu/kevin.git block
>  
>  Block I/O path
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] RFC: some problems with bdrv_get_block_status

2017-04-28 Thread Eric Blake

On 04/28/2017 11:17 AM, Denis V. Lunev wrote:
> Hello, All!
> 
> Recently we have experienced problems with very slow
> bdrv_get_block_status call, which is massive called f.e.
> from mirror_run().
> 
> The problem was narrowed down to slow lseek64() system
> call, which can take 1-2-3 seconds.

I'm guessing you meant one-to-three (the range), not one-two-three
(three separate digits), and just had an unfortunate abbreviation of
'to' turning into the phonetically-similar '2'.

> 
> root@s158 ~]# strace -f -p 77048 -T  -e lseek
> Process 77048 attached with 14 threads
> [pid 77053] lseek(23, 6359613440, SEEK_DATA) = 6360276992 <1.250323>

That sounds like a bug in your choice of filesystem.  It's been
mentioned before that lseek has some pathetically poor behavior (I think
tmpfs was one of the culprits), but I maintain that it is better to
hammer on the kernel folks to fix the poor behavior than it is to have
to implement user-space workarounds in every single affected program.

That said, a workaround of being able to request the avoidance of lseek
has been brought up on this list before.

> The problem comes from this branch of the code
> 
> bdrv_co_get_block_status
> ...
> if (bs->file &&
> (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
> (ret & BDRV_BLOCK_OFFSET_VALID)) {
> int file_pnum;
> 
> ret2 = bdrv_co_get_block_status(bs->file, ret >> BDRV_SECTOR_BITS,
> *pnum, &file_pnum);
> if (ret2 >= 0) {
> /* Ignore errors.  This is just providing extra information, it
>  * is useful but not necessary.
>  */
> if (!file_pnum) {
> /* !file_pnum indicates an offset at or beyond the EOF;
> it is
>  * perfectly valid for the format block driver to point
> to such
>  * offsets, so catch it and mark everything as zero */
> ret |= BDRV_BLOCK_ZERO;
> } else {
> /* Limit request to the range reported by the protocol
> driver */
> *pnum = file_pnum;
> ret |= (ret2 & BDRV_BLOCK_ZERO);
> }
> }
> }

I don't see anything wrong with this code. It's nice to know that we
have data (qcow2 says we have allocated bytes due to this layer, so
don't refer to the backing files), but when the underlying file can ALSO
tell us that the underlying protocol is sparse and we are on a hole,
then we know that we have BDRV_BLOCK_ZERO condition which can in turn
enable other optimizations in quite a bit of the stack.  It IS valid to
return BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO (we don't have to limit
ourselves to BDRV_BLOCK_DATA), and we should probably do so any time
that such computation is cheap.

I agree with your analysis that on a poor filesystem where lseek()
behavior sucks that it is no longer cheap to determine where the holes are.

But the proper workaround for those filesystems should NOT be made by
undoing this part of bdrv_co_get_block_status(), but should rather be
fixed in file-posix.c by the addition of a user-controllable knob on
whether to skip lseek().  In other words, if we're going to work around
the poor filesystem performance, the workaround should live in
file-posix.c, not in the generic block/io.c.  Once the workaround is
enabled, the 'ret2 = bdrv_co_get_block_status()' nested call should be
lightning fast (because file-posix.c would no longer be using lseek when
you've requested the workaround).

> Frankly speaking, this optimization should not give much.
> If upper layer format (QCOW2) says that we have data
> here, then nowadays in 99.9% we do have the data.

You are correct that we have BDRV_BLOCK_DATA; but it IS useful to ALSO
know if that data reads back as all zeros (so we can skip reading it).
Just because qcow2 reports something as data does NOT rule out whether
the data still reads as zeros.

> Meanwhile this branch can cause problems. We would
> need block cleared entirely to get the benefit for most
> cases in mirror and backup operations.
> 
> At my opinion it worth to drop this at all.
> 
> Guys, do you have any opinion?

Again, my opinion is to not change this part of block/io.c.  Rather,
work should be expended on individual protocol drivers to quit wasting
unnecessary time on reporting BDRV_BLOCK_ZERO if the time spent to
determine that is not worth the effort (as it is when using lseek() on
inefficient filesystems).  It is always safe for a protocol driver to
report BDRV_BLOCK_DATA instead of BDRV_BLOCK_ZERO, if asking the
question is too expensive (treating holes as data may pessimize other
operations, but that's okay if the penalty for asking is worse than what
optimizations are able to regain).

> 
> Den
> 
> P.S. The kernel is one based on RedHat 3.10.0-514. The same
>   problem was observed in 3.10.0-327 too.

Red Hat is always two words (I'm allowed to point that out, as they
employ me ;).  But if it r

Re: [Qemu-block] [PATCH v15 20/21] file-posix: Add image locking to perm operations

2017-04-28 Thread Kevin Wolf

Am 28.04.2017 um 17:30 hat Fam Zheng geschrieben:
> On Fri, 04/28 15:45, Kevin Wolf wrote:
> > Am 26.04.2017 um 05:34 hat Fam Zheng geschrieben:
> > > This extends the permission bits of op blocker API to external using
> > > Linux OFD locks.
> > > 
> > > Each permission in @perm and @shared_perm is represented by a locked
> > > byte in the image file.  Requesting a permission in @perm is translated
> > > to a shared lock of the corresponding byte; rejecting to share the same
> > > permission is translated to a shared lock of a separate byte. With that,
> > > we use 2x number of bytes of distinct permission types.
> > > 
> > > virtlockd in libvirt locks the first byte, so we do locking from a
> > > higher offset.
> > > 
> > > Suggested-by: Kevin Wolf 
> > > Signed-off-by: Fam Zheng 
> > 
> > >  BlockDriver bdrv_file = {
> > >  .format_name = "file",
> > >  .protocol_name = "file",
> > > @@ -1977,7 +2234,11 @@ BlockDriver bdrv_file = {
> > >  .bdrv_get_info = raw_get_info,
> > >  .bdrv_get_allocated_file_size
> > >  = raw_get_allocated_file_size,
> > > -
> > > +.bdrv_inactivate = raw_inactivate,
> > > +.bdrv_invalidate_cache = raw_invalidate_cache,
> > > +.bdrv_check_perm = raw_check_perm,
> > > +.bdrv_set_perm   = raw_set_perm,
> > > +.bdrv_abort_perm_update = raw_abort_perm_update,
> > >  .create_opts = &raw_create_opts,
> > >  };
> > 
> > By the way, is it intentional that we apply locking only to bdrv_file,
> > but not to bdrv_host_device or bdrv_host_cdrom?
> 
> Good question.
> 
> Though CDROM is not very interesting, I am not sure about bdrv_host_device:
> 
> 1) On the one hand, a host device can contain a qcow2 image so maybe locking 
> is
> useful.  Another reason to lock is that they share the same QAPI option,
> BlockdevOptionsFile. That last reason is, it should be very easy to add it.
> 
> 2) On the other hand, unlike files, host devices get pretty high chances in
> being accesses by multiple readers/writers, outside of QEMU, such as partition
> detection, mount, fsck, etc.
> 
> What is your opinion?

I would add it, if nothing else to be consistent with regular files. You
don't even need qcow2 on a block device for it to be useful, even for
raw images the guest may not be able to tolerate a second writer (in
this case, share-rw of the qdev device will directly control the locking
mode).

As for 2), I don't think these other users are a problem because we're
only taking shared locks. We'll prevent other users from taking an
exclusive lock, but that's exactly right because it's true that we're
using the image.

Kevin

Re: [Qemu-block] [PATCH v10 04/17] qcow2: Optimize zero_single_l2() to minimize L2 churn

2017-04-28 Thread Max Reitz

On 27.04.2017 03:46, Eric Blake wrote:
> Similar to discard_single_l2(), we should try to avoid dirtying
> the L2 cache when the cluster we are changing already has the
> right characteristics.
> 
> Note that by the time we get to zero_single_l2(), BDRV_REQ_MAY_UNMAP
> is a requirement to unallocate a cluster (this is because the block
> layer clears that flag if discard.* flags during open requested that
> we never punch holes - see the conversation around commit 170f4b2e,
> https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg07306.html).
> Therefore, this patch can only reuse a zero cluster as-is if either
> unmapping is not requested, or if the zero cluster was not associated
> with an allocation.
> 
> Technically, there are some cases where an unallocated cluster
> already reads as all zeroes (namely, when there is no backing file
> [easy: check bs->backing], or when the backing file also reads as
> zeroes [harder: we can't check bdrv_get_block_status since we are
> already holding the lock]), where the guest would not immediately see
> a difference if we left that cluster unallocated.  But if the user
> did not request unmapping, leaving an unallocated cluster is wrong;
> and even if the user DID request unmapping, keeping a cluster
> unallocated risks a subtle semantic change of guest-visible contents
> if a backing file is later added, and it is not worth auditing
> whether all internal uses such as mirror properly avoid an unmap
> request.  Thus, this patch is intentionally limited to just clusters
> that are already marked as zero.
> 
> Signed-off-by: Eric Blake 
> 
> ---
> v10: new patch, replacing earlier attempt to use unallocated clusters,
> and ditching any optimization of v2 files
> ---
>  block/qcow2-cluster.c | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index db3d937..d542894 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1614,6 +1614,7 @@ static int zero_single_l2(BlockDriverState *bs, 
> uint64_t offset,
>  int l2_index;
>  int ret;
>  int i;
> +bool unmap = !!(flags & BDRV_REQ_MAY_UNMAP);
> 
>  ret = get_cluster_table(bs, offset, &l2_table, &l2_index);
>  if (ret < 0) {
> @@ -1629,9 +1630,17 @@ static int zero_single_l2(BlockDriverState *bs, 
> uint64_t offset,
> 
>  old_offset = be64_to_cpu(l2_table[l2_index + i]);
> 
> -/* Update L2 entries */
> +/*
> + * Minimize L2 changes if the cluster already reads back as
> + * zeroes with correct allocation.
> + */
> +if (qcow2_get_cluster_type(old_offset) == QCOW2_CLUSTER_ZERO &&
> +!(unmap && old_offset & L2E_OFFSET_MASK)) {
> +continue;
> +}
> +
>  qcow2_cache_entry_mark_dirty(bs, s->l2_table_cache, l2_table);
> -if (old_offset & QCOW_OFLAG_COMPRESSED || flags & 
> BDRV_REQ_MAY_UNMAP) {
> +if (old_offset & QCOW_OFLAG_COMPRESSED || unmap) {

Works, but I'd like it better to store the cluster type somewhere and
then use it here instead of checking the COMPRESSED flag.

But it works, so it's up to you:

Reviewed-by: Max Reitz 

>  l2_table[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
>  qcow2_free_any_clusters(bs, old_offset, 1, 
> QCOW2_DISCARD_REQUEST);
>  } else {
> 




signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH v10 03/17] qcow2: Reuse preallocated zero clusters

2017-04-28 Thread Max Reitz

On 27.04.2017 03:46, Eric Blake wrote:
> From: Max Reitz 
> 
> Instead of just freeing preallocated zero clusters and completely
> allocating them from scratch, reuse them.
> 
> We cannot do this in handle_copied(), however, since this is a COW
> operation. Therefore, we have to add the new logic to handle_alloc() and
> simply return the existing offset if it exists. The only catch is that
> we have to convince qcow2_alloc_cluster_link_l2() not to free the old
> clusters (because we have reused them).
> 
> Reported-by: Eric Blake 
> Signed-off-by: Max Reitz 
> Signed-off-by: Eric Blake 
> 
> ---
> v10: new patch. Max hasn't posted the patch directly on list, but
> did mention it here:
> https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg03936.html
> and that he would post it once he has tests. Well, my later patches
> add a test that requires this one :)  The other two patches that
> he mentioned there are also good, but not essential for my series
> (and I didn't want to write tests to expose the behavior difference,
> because it would deprive Max of that fun).

Well, the main reason I didn't send the patches yet is because I was
tired while writing them ("Date: Sat Apr 22 01:17:52 2017 +0200") and I
wanted to take another look before sending them. I guess now's as good a
time as any.

> ---
>  block/qcow2.h |  3 ++
>  block/qcow2-cluster.c | 83 
> +++
>  2 files changed, 60 insertions(+), 26 deletions(-)

[...]

> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index d1063df..db3d937 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c

[...]

> @@ -1192,31 +1199,53 @@ static int handle_alloc(BlockDriverState *bs, 
> uint64_t guest_offset,
>   * wrong with our code. */
>  assert(nb_clusters > 0);
> 
> +if (!*host_offset && qcow2_get_cluster_type(entry) == QCOW2_CLUSTER_ZERO 
> &&
> +(entry & L2E_OFFSET_MASK) != 0 && (entry & QCOW_OFLAG_COPIED))

*host_offset works with this, too, if
start_of_cluster(s, *host_offset) == (entry & L2E_OFFSET_MASK).

If !(entry & QCOW_OFLAG_COPIED), we should check whether the refcount
maybe is 1 and then set OFLAG_COPIED. But that is something we don't
even do for normal clusters yet, so it's something to fix another day.

> +{
> +/* Try to reuse preallocated zero clusters; contiguous normal 
> clusters
> + * would be fine, too, but count_cow_clusters() above has limited
> + * nb_clusters already to a range of COW clusters */
> +int preallocated_nb_clusters =
> +count_contiguous_clusters(nb_clusters, s->cluster_size, l2_table,

s/l2_table/&l2_table[l2_index]/

> +  QCOW_OFLAG_COPIED);
> +
> +if (preallocated_nb_clusters) {

preallocated_nb_clusters must be at least 1, so an assertion would be
better.

Max

> +nb_clusters = preallocated_nb_clusters;
> +alloc_cluster_offset = entry & L2E_OFFSET_MASK;
> +
> +/* We want to reuse these clusters, so 
> qcow2_alloc_cluster_link_l2()
> + * should not free them. */
> +keep_old_clusters = true;
> +}
> +}
> +
>  qcow2_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
> 
> -/* Allocate, if necessary at a given offset in the image file */
> -alloc_cluster_offset = start_of_cluster(s, *host_offset);
> -ret = do_alloc_cluster_offset(bs, guest_offset, &alloc_cluster_offset,
> -  &nb_clusters);
> -if (ret < 0) {
> -goto fail;
> -}
> -
> -/* Can't extend contiguous allocation */
> -if (nb_clusters == 0) {
> -*bytes = 0;
> -return 0;
> -}
> -
> -/* !*host_offset would overwrite the image header and is reserved for "no
> - * host offset preferred". If 0 was a valid host offset, it'd trigger the
> - * following overlap check; do that now to avoid having an invalid value 
> in
> - * *host_offset. */
>  if (!alloc_cluster_offset) {
> -ret = qcow2_pre_write_overlap_check(bs, 0, alloc_cluster_offset,
> -nb_clusters * s->cluster_size);
> -assert(ret < 0);
> -goto fail;
> +/* Allocate, if necessary at a given offset in the image file */
> +alloc_cluster_offset = start_of_cluster(s, *host_offset);
> +ret = do_alloc_cluster_offset(bs, guest_offset, 
> &alloc_cluster_offset,
> +  &nb_clusters);
> +if (ret < 0) {
> +goto fail;
> +}
> +
> +/* Can't extend contiguous allocation */
> +if (nb_clusters == 0) {
> +*bytes = 0;
> +return 0;
> +}
> +
> +/* !*host_offset would overwrite the image header and is reserved for
> + * "no host offset preferred". If 0 was a valid host offset, it'd
> + * trigger the following overlap check; do that now to avoid havi

Re: [Qemu-block] [PATCH v10 02/17] qcow2: Correctly report status of preallocated zero clusters

2017-04-28 Thread Max Reitz

On 27.04.2017 03:46, Eric Blake wrote:
> We were throwing away the preallocation information associated with
> zero clusters.  But we should be matching the well-defined semantics
> in bdrv_get_block_status(), where (BDRV_BLOCK_ZERO |
> BDRV_BLOCK_OFFSET_VALID) informs the user which offset is reserved,
> while still reminding the user that reading from that offset is
> likely to read garbage.
> 
> Making this change lets us see which portions of an image are zero
> but preallocated, when using qemu-img map --output=json.  The
> --output=human side intentionally ignores all zero clusters, whether
> or not they are preallocated.
> 
> The fact that there is no change to qemu-iotests './check -qcow2'
> merely means that we aren't yet testing this aspect of qemu-img;
> a later patch will add a test.
> 
> Signed-off-by: Eric Blake 
> 
> ---
> v10: new patch
> ---
>  block/qcow2-cluster.c | 32 +++-
>  1 file changed, 27 insertions(+), 5 deletions(-)

I'd propose you split the qcow2 changes off of this series.

> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 100398c..d1063df 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -328,6 +328,10 @@ static int count_contiguous_clusters(int nb_clusters, 
> int cluster_size,
>   return i;
>  }
> 
> +/*
> + * Checks how many consecutive clusters in a given L2 table have the same
> + * cluster type with no corresponding allocation.
> + */
>  static int count_contiguous_clusters_by_type(int nb_clusters,
>   uint64_t *l2_table,
>   int wanted_type)
> @@ -335,9 +339,10 @@ static int count_contiguous_clusters_by_type(int 
> nb_clusters,
>  int i;
> 
>  for (i = 0; i < nb_clusters; i++) {
> -int type = qcow2_get_cluster_type(be64_to_cpu(l2_table[i]));
> +uint64_t entry = be64_to_cpu(l2_table[i]);
> +int type = qcow2_get_cluster_type(entry);
> 
> -if (type != wanted_type) {
> +if (type != wanted_type || entry & L2E_OFFSET_MASK) {

This works only for wanted_type \in \{ ZERO, UNALLOCATED \} -- which is
what the comment you added describes, but this may still warrant a
renaming of this function (count_contiguous_unallocated_clusters?) and
probably an assertion that wanted_type is ZERO or UNALLOCATED.

>  break;
>  }
>  }
> @@ -559,9 +564,26 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, 
> uint64_t offset,
>  ret = -EIO;
>  goto fail;
>  }
> -c = count_contiguous_clusters_by_type(nb_clusters, 
> &l2_table[l2_index],
> -  QCOW2_CLUSTER_ZERO);
> -*cluster_offset = 0;
> +/* Distinguish between pure zero clusters and pre-allocated ones */
> +if (*cluster_offset & L2E_OFFSET_MASK) {
> +c = count_contiguous_clusters(nb_clusters, s->cluster_size,
> +  &l2_table[l2_index], 
> QCOW_OFLAG_ZERO);

You should probably switch this patch and the next one -- or I just send
my patches myself and you base your series on top of it...

Because before the next patch, this happens:

$ ./qemu-img create -f qcow2 foo.qcow2 64M
Formatting 'foo.qcow2', fmt=qcow2 size=67108864 encryption=off
cluster_size=65536 lazy_refcounts=off refcount_bits=16
$ ./qemu-io -c 'write 0 64M' -c 'write -z 0 64M' foo.qcow2
wrote 67108864/67108864 bytes at offset 0
64 MiB, 1 ops; 0.1846 sec (346.590 MiB/sec and 5.4155 ops/sec)
wrote 67108864/67108864 bytes at offset 0
64 MiB, 1 ops; 0.0004 sec (127.551 GiB/sec and 2040.8163 ops/sec)
$ ./qemu-img map foo.qcow2
Offset  Length  Mapped to   File
qemu-img: block/qcow2-cluster.c:319: count_contiguous_clusters:
Assertion `qcow2_get_cluster_type(first_entry) == QCOW2_CLUSTER_NORMAL'
failed.
[1]4970 abort (core dumped)  ./qemu-img map foo.qcow2

Max

> +*cluster_offset &= L2E_OFFSET_MASK;
> +if (offset_into_cluster(s, *cluster_offset)) {
> +qcow2_signal_corruption(bs, true, -1, -1,
> +"Preallocated zero cluster offset %#"
> +PRIx64 " unaligned (L2 offset: %#"
> +PRIx64 ", L2 index: %#x)",
> +*cluster_offset, l2_offset, 
> l2_index);
> +ret = -EIO;
> +goto fail;
> +}
> +} else {
> +c = count_contiguous_clusters_by_type(nb_clusters,
> +  &l2_table[l2_index],
> +  QCOW2_CLUSTER_ZERO);
> +*cluster_offset = 0;
> +}
>  break;
>  case QCOW2_CLUSTER_UNALLOCATED:
>  /* how many empty clusters ? */
> 




signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH v10 01/17] block: Update comments on BDRV_BLOCK_* meanings

2017-04-28 Thread Max Reitz

On 27.04.2017 03:46, Eric Blake wrote:
> We had some conflicting documentation: a nice 8-way table that
> described all possible combinations of DATA, ZERO, and
> OFFSET_VALID, couple with text that implied that OFFSET_VALID
> always meant raw data could be read directly.  As the 8-way
> table is the intended semantics, simplify the rest of the text
> to get rid of the confusion.
> 
> Suggested-by: Max Reitz 
> Signed-off-by: Eric Blake 
> 
> ---
> v10: new patch
> ---
>  include/block/block.h | 13 +++--
>  1 file changed, 7 insertions(+), 6 deletions(-)

Reviewed-by: Max Reitz 

Thanks!



signature.asc
Description: OpenPGP digital signature

[Qemu-block] [PATCH] MAINTAINERS: Add qemu-progress to the block layer

2017-04-28 Thread Max Reitz

util/qemu-progress.c is currently unmaintained. The only user of its
functionality is qemu-img, so it effectively is part of the block layer.

Suggested-by: Fam Zheng 
Signed-off-by: Max Reitz 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index a1d2b3a4d3..768b4d94ec 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1168,6 +1168,7 @@ F: include/block/
 F: qemu-img*
 F: qemu-io*
 F: tests/qemu-iotests/
+F: util/qemu-progress.c
 T: git git://repo.or.cz/qemu/kevin.git block
 
 Block I/O path
-- 
2.12.2

Re: [Qemu-block] [PATCH] progress: Show current progress on SIGINFO

2017-04-28 Thread Max Reitz

On 08.02.2017 00:57, Max Reitz wrote:
> Currently we only print progress information on retrieval of SIGUSR1.
> Some systems have a dedicated SIGINFO for this, however, so it should be
> handled appropriately if it is available.
> 
> Buglink: https://bugs.launchpad.net/qemu/+bug/1662468
> Signed-off-by: Max Reitz 
> ---
> Can anyone test this on a BSD system? O:-)
> ---
>  util/qemu-progress.c | 3 +++
>  qemu-img.texi| 3 ++-
>  2 files changed, 5 insertions(+), 1 deletion(-)

Applied to my block branch.

(https://github.com/XanClic/qemu/commits/block)

Max



signature.asc
Description: OpenPGP digital signature

[Qemu-block] RFC: some problems with bdrv_get_block_status

2017-04-28 Thread Denis V. Lunev

Hello, All!

Recently we have experienced problems with very slow
bdrv_get_block_status call, which is massive called f.e.
from mirror_run().

The problem was narrowed down to slow lseek64() system
call, which can take 1-2-3 seconds.

root@s158 ~]# strace -f -p 77048 -T  -e lseek
Process 77048 attached with 14 threads
[pid 77053] lseek(23, 6359613440, SEEK_DATA) = 6360276992 <1.250323>
[pid 77053] lseek(23, 6360334336, SEEK_DATA) = 6360334336 <0.21>
[pid 77053] lseek(23, 6360334336, SEEK_HOLE) = 6360412160 <0.007850>
[pid 77053] lseek(23, 6360465408, SEEK_DATA) = 6360596480 <0.244460>
[pid 77053] lseek(23, 6360596480, SEEK_DATA) = 6360596480 <0.12>
[pid 77053] lseek(23, 6360596480, SEEK_HOLE) = 6361440256 <0.009102>
[pid 77053] lseek(23, 6361448448, SEEK_DATA) = 6362243072 <1.482338>
[pid 77053] lseek(23, 6361710592, SEEK_DATA) = 6362243072 <0.987192>
[pid 77053] lseek(23, 6362300416, SEEK_DATA) = 6362300416 <0.12>
[pid 77053] lseek(23, 6362300416, SEEK_HOLE) = 6363058176 <0.008983>
[pid 77053] lseek(23, 6363086848, SEEK_DATA) = 6364467200 <2.554859>
[pid 77053] lseek(23, 6363807744, SEEK_DATA) = 6364467200 <1.220386>
[pid 77053] lseek(23, 6364528640, SEEK_DATA) = 6364528640 <0.12>
[pid 77053] lseek(23, 6364528640, SEEK_HOLE) = 6365327360 <0.007906>
[pid 77053] lseek(23, 6365380608, SEEK_DATA) = 6366162944 <1.442824>
[pid 77053] lseek(23, 6365904896, SEEK_DATA) = 6366162944 <0.478188>
[pid 77053] lseek(23, 6366167040, SEEK_DATA) = 6366167040 <0.13>
[pid 77053] lseek(23, 6366167040, SEEK_HOLE) = 6367002624 <0.009230>
[pid 77053] lseek(23, 6367019008, SEEK_DATA^CProcess 77048 detached


The process eats 100% in system time.

The problem comes from this branch of the code

bdrv_co_get_block_status
...
if (bs->file &&
(ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
(ret & BDRV_BLOCK_OFFSET_VALID)) {
int file_pnum;

ret2 = bdrv_co_get_block_status(bs->file, ret >> BDRV_SECTOR_BITS,
*pnum, &file_pnum);
if (ret2 >= 0) {
/* Ignore errors.  This is just providing extra information, it
 * is useful but not necessary.
 */
if (!file_pnum) {
/* !file_pnum indicates an offset at or beyond the EOF;
it is
 * perfectly valid for the format block driver to point
to such
 * offsets, so catch it and mark everything as zero */
ret |= BDRV_BLOCK_ZERO;
} else {
/* Limit request to the range reported by the protocol
driver */
*pnum = file_pnum;
ret |= (ret2 & BDRV_BLOCK_ZERO);
}
}
}

which was added in the commit

commit 5daa74a6ebce7543aaad178c4061dc087bb4c705
Author: Paolo Bonzini 
Date:   Wed Sep 4 19:00:38 2013 +0200

block: look for zero blocks in bs->file
   
Reviewed-by: Eric Blake 
Signed-off-by: Paolo Bonzini 
Signed-off-by: Stefan Hajnoczi 

without much details.

Frankly speaking, this optimization should not give much.
If upper layer format (QCOW2) says that we have data
here, then nowadays in 99.9% we do have the data.
Meanwhile this branch can cause problems. We would
need block cleared entirely to get the benefit for most
cases in mirror and backup operations.

At my opinion it worth to drop this at all.

Guys, do you have any opinion?

Den

P.S. The kernel is one based on RedHat 3.10.0-514. The same
  problem was observed in 3.10.0-327 too.

Re: [Qemu-block] [PATCH v3] qemu-img: use blk_co_pwrite_zeroes for zero sectors when compressed

2017-04-28 Thread Max Reitz

On 27.04.2017 04:58, jemmy858...@gmail.com wrote:
> From: Lidong Chen 
> 
> When the buffer is zero, blk_co_pwrite_zeroes is more effective than
> blk_co_pwritev with BDRV_REQ_WRITE_COMPRESSED. This patch can reduce
> the time for converting qcow2 images with lots of zero data.
> 
> Signed-off-by: Lidong Chen 
> ---
>v3 changelog:
>fix some spelling errors and code style.
>v2 changelog:
>unify the compressed and non-compressed code paths.
> ---
>  qemu-img.c | 44 ++--
>  1 file changed, 14 insertions(+), 30 deletions(-)

Thank you, I have applied the patch to my block branch:

https://github.com/XanClic/qemu/commits/block

Max



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH v15 20/21] file-posix: Add image locking to perm operations

2017-04-28 Thread Fam Zheng

On Fri, 04/28 15:45, Kevin Wolf wrote:
> Am 26.04.2017 um 05:34 hat Fam Zheng geschrieben:
> > This extends the permission bits of op blocker API to external using
> > Linux OFD locks.
> > 
> > Each permission in @perm and @shared_perm is represented by a locked
> > byte in the image file.  Requesting a permission in @perm is translated
> > to a shared lock of the corresponding byte; rejecting to share the same
> > permission is translated to a shared lock of a separate byte. With that,
> > we use 2x number of bytes of distinct permission types.
> > 
> > virtlockd in libvirt locks the first byte, so we do locking from a
> > higher offset.
> > 
> > Suggested-by: Kevin Wolf 
> > Signed-off-by: Fam Zheng 
> 
> >  BlockDriver bdrv_file = {
> >  .format_name = "file",
> >  .protocol_name = "file",
> > @@ -1977,7 +2234,11 @@ BlockDriver bdrv_file = {
> >  .bdrv_get_info = raw_get_info,
> >  .bdrv_get_allocated_file_size
> >  = raw_get_allocated_file_size,
> > -
> > +.bdrv_inactivate = raw_inactivate,
> > +.bdrv_invalidate_cache = raw_invalidate_cache,
> > +.bdrv_check_perm = raw_check_perm,
> > +.bdrv_set_perm   = raw_set_perm,
> > +.bdrv_abort_perm_update = raw_abort_perm_update,
> >  .create_opts = &raw_create_opts,
> >  };
> 
> By the way, is it intentional that we apply locking only to bdrv_file,
> but not to bdrv_host_device or bdrv_host_cdrom?

Good question.

Though CDROM is not very interesting, I am not sure about bdrv_host_device:

1) On the one hand, a host device can contain a qcow2 image so maybe locking is
useful.  Another reason to lock is that they share the same QAPI option,
BlockdevOptionsFile. That last reason is, it should be very easy to add it.

2) On the other hand, unlike files, host devices get pretty high chances in
being accesses by multiple readers/writers, outside of QEMU, such as partition
detection, mount, fsck, etc.

What is your opinion?

Fam

Re: [Qemu-block] [Qemu-devel] [PATCH v5 3/4] shutdown: Add source information to SHUTDOWN and RESET

2017-04-28 Thread Markus Armbruster

Eric Blake  writes:

> Libvirt would like to be able to distinguish between a SHUTDOWN
> event triggered solely by guest request and one triggered by a
> SIGTERM or other action on the host.  While qemu_kill_report() is
> already able to tell whether a shutdown was triggered by a host
> signal (but NOT by a host UI event, such as clicking the X on
> the window), that information was then lost after being printed
> to stderr.  The previous patch prepped things to use an enum
> internally; now it's time to wire it up through all callers, and
> to extend the SHUTDOWN and RESET events to report the details.
>
> Enhance the shutdown request path to take a parameter of which
> way it is being triggered, and update ALL callers.  It would have
> been less churn to keep the common case with no arguments as
> meaning guest-triggered, and only modified the host-triggered
> code paths, via a wrapper function, but then we'd still have to
> audit that I didn't miss any host-triggered spots; changing the
> signature forces us to double-check that I correctly categorized
> all callers.
>
> Since command line options can change whether a guest reset request
> causes an actual reset vs. a shutdown, it's easy to also add the
> information to the RESET event, even though libvirt has not yet
> expressed a need to know that.
>
> For the moment, we keep the enum ShutdownCause for internal use
> only, and merely expose a single boolean of 'guest':true|false
> to the QMP client; this is because we don't yet have evidence that
> the further distinctions will be useful, or whether the addition
> of new enum members would cause problems to clients coded to an
> older version of the enum.
>
> Update expected iotest outputs to match the new data.
>
> Here is output from 'virsh qemu-monitor-event --loop' with the
> patch installed:
>
> event SHUTDOWN at 1492639680.731251 for domain fedora_13: {"guest":true}
> event STOP at 1492639680.732116 for domain fedora_13: 
> event SHUTDOWN at 1492639680.732830 for domain fedora_13: {"guest":false}
>
> Note that libvirt runs qemu with -no-quit: the first SHUTDOWN event
> was triggered by an action I took directly in the guest (shutdown -h),
> at which point qemu stops the vcpus and waits for libvirt to do any
> final cleanups; the second SHUTDOWN event is the result of libvirt
> sending SIGTERM now that it has completed cleanup.
>
> The replay driver needs a followup patch if we want to be able to
> faithfully replay the difference between a host- and guest-initiated
> shutdown (for now, the replayed event is always attributed to host).

I'd prefer to get this right from the start, but that requires input
from replay guys.

Scandalously, replay/ is not covered by MAINTAINERS.
scripts/get_maintainers.pl blames it on Pavel Dovgalyuk (cc'ed), and git
shows recent activity.  Pavel, please post a suitable patch to
MAINTAINERS, and please help us figure out what to do about replaying
reset here.

> See also https://bugzilla.redhat.com/1384007
>
> Signed-off-by: Eric Blake

Re: [Qemu-block] [PATCH v2 3/3] qemu-img: Document backing options

2017-04-28 Thread Max Reitz

On 28.04.2017 11:38, Kevin Wolf wrote:
> Am 27.04.2017 um 17:36 hat Eric Blake geschrieben:
>> On 04/26/2017 08:46 AM, Max Reitz wrote:
>>> I think originally the idea was to remove the individual special case
>>> options in the long term in favour of the uniform -o. But I don't think
>>> the short options have fallen out of use, so we can as well document
>>> them again...

Hm, I'm not really sure what the use of deprecating them would be. We
would be able to use the single-letter options -b/-B/-F for something
else, but would we really want to do that...?

(And it should always be trivial to map these legacy parameters to
normal options.)

>>> Maybe we could change the implementation, however, so that the old
>>> options are really just aliases for the respective -o version.
>>
>> As in, a followup patch that adds a paragraph or two to the .texi file
>> stating that '-B @var{backing_file}' is shorthand for '-o
>> backing_file=@var{backing_file}'?
> 
> First and foremost, I meant in the C code where we have separate
> variables for all the shorthand options and then need to handle both
> ways explicitly in the code further down. Instead, -B could just
> directly add a 'backing_file' key to a QemuOpts (and -o would do the
> same instead of building a string that is parsed only later).

We already do that more or less for convert, actually (for -B, that is,
not for -o). We just have to keep out_baseimg separately because we
won't have a QemuOpts with -n.

Using add_old_style_options() in img_create() should be easy enough.

> But the documentation change that you're suggesting is probably a good
> idea as well.

Well, I more or less deliberately did not really document -b/-B/-F. I
just documented them in the syntax overview and named the parameters
backing_file and backing_fmt, both of which are valid options and are
*as such* documented in the explanatory text. So from my point of view,
it's already documented to be a shorthand.

Max

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH v15 20/21] file-posix: Add image locking to perm operations

2017-04-28 Thread Kevin Wolf

Am 26.04.2017 um 05:34 hat Fam Zheng geschrieben:
> This extends the permission bits of op blocker API to external using
> Linux OFD locks.
> 
> Each permission in @perm and @shared_perm is represented by a locked
> byte in the image file.  Requesting a permission in @perm is translated
> to a shared lock of the corresponding byte; rejecting to share the same
> permission is translated to a shared lock of a separate byte. With that,
> we use 2x number of bytes of distinct permission types.
> 
> virtlockd in libvirt locks the first byte, so we do locking from a
> higher offset.
> 
> Suggested-by: Kevin Wolf 
> Signed-off-by: Fam Zheng 

>  BlockDriver bdrv_file = {
>  .format_name = "file",
>  .protocol_name = "file",
> @@ -1977,7 +2234,11 @@ BlockDriver bdrv_file = {
>  .bdrv_get_info = raw_get_info,
>  .bdrv_get_allocated_file_size
>  = raw_get_allocated_file_size,
> -
> +.bdrv_inactivate = raw_inactivate,
> +.bdrv_invalidate_cache = raw_invalidate_cache,
> +.bdrv_check_perm = raw_check_perm,
> +.bdrv_set_perm   = raw_set_perm,
> +.bdrv_abort_perm_update = raw_abort_perm_update,
>  .create_opts = &raw_create_opts,
>  };

By the way, is it intentional that we apply locking only to bdrv_file,
but not to bdrv_host_device or bdrv_host_cdrom?

Kevin

Re: [Qemu-block] [Qemu-devel] [PATCH v17 00/24] qcow2: persistent dirty bitmaps

2017-04-28 Thread Vladimir Sementsov-Ogievskiy


27.04.2017 19:43, John Snow wrote:


On 04/26/2017 07:30 AM, Vladimir Sementsov-Ogievskiy wrote:

Hi all!

There is a new update of qcow2-bitmap series - v17.

web: 
https://src.openvz.org/users/vsementsov/repos/qemu/browse?at=qcow2-bitmap-v17
git: https://src.openvz.org/scm/~vsementsov/qemu.git (tag qcow2-bitmap-v17)


Hm, what is this branch based on? My patches tools are having trouble
finding common ancestors -- 2d6752d38d8acda6aae674a72b72be05482a58eb I
guess, but that's pre-rc0, and the histories don't quite match up.

I did a rebase myself, but I notice that patch 14/24; "qcow2: add
persistent dirty bitmaps support" causes a large number of iotest
regressions:

Failures: 007 009 010 011 012 013 014 015 017 018 019 020 022 023 024
025 026 028 029 030 031 032 034 035 037 038 039 040 041 042 043 044 046
047 048 050 051 052 053 055 056 058 060 061 062 063 065 066 073 074 080
082 086 089 090 095 097 098 102 104 110 112 114 115 117 121 122 124 129
130 132 133 138 140 141 142 150 152 154 155 156 158 159 170 176

Can you give that a look and if you respin, rebase on top of origin/master?


ohh, sorry, I've not rebased. will do



Thanks,
-John


v17:

08: add r-b's by Max and John
09: clear unknown autoclear features from BDRVQcow2State before calling
 qcow2_load_autoloading_dirty_bitmaps(), and also do not extra update
 header if it is updated by qcow2_load_autoloading_dirty_bitmaps().
11: new patch, splitted out from 16
12: rewrite commit message
14: changing bdrv_close moved to separate new patch 11
 s/1/1ULL/ ; s/%u/%llu/ for two errors
16: s/No/Not/
 add Max's r-b
24: new patch


v16:

mostly by Kevin's comments:
07: just moved here, as preparation for merging refcounts to 08
08: move "qcow2-bitmap: refcounts" staff to this patch, to not break qcow2-img 
check
 drop r-b's
 move necessary supporting static functions to this patch too (to not break 
compilation)
 fprintf -> error_report
 other small changes with error messages and comments
 code style
 for qcow2-bitmap.c:
   'include "exec/log.h"' was dropped
   s/1/(1U << 0) for BME_FLAG_IN_USE
   add BME_TABLE_ENTRY_FLAG_ALL_ONES to replace magic 1
   don't check 'cluster_size <= 0' in check_table_entry
old "[PATCH v15 08/25] block: introduce auto-loading bitmaps" was dropped
09: was "[PATCH v15 09/25] qcow2: add .bdrv_load_autoloading_dirty_bitmaps"
 drop r-b's
 some staff was moved to 08
 update_header_sync - error on bdrv_flush fail
 rename disk_sectors_in_bitmap_cluster to sectors_covered_by_bitmap_cluster
  and adjust comment.
  so, variable for storing this function result: s/dsc/sbc/
 s/1/BME_TABLE_ENTRY_FLAG_ALL_ONES/
 also, as Qcow2BitmapTable already introduced in 08,
s/table_offset/table.offset/ and s/table_size/table.size, etc..
 update_ext_header_and_dir_in_place: add comments, add additional
   update_header_sync, to reduce indeterminacy in case of error.
 call qcow2_load_autoloading_dirty_bitmaps directly from qcow2_open
 no .bdrv_load_autoloading_dirty_bitmaps
11: drop bdrv_store_persistent_dirty_bitmaps at all.
 drop r-b's
13: rename patch, rewrite commit msg
 drop r-b's
 move bdrv_release_named_dirty_bitmaps in bdrv_close() after 
drv->bdrv_close()
 Qcow2BitmapTable is already introduced, so no
   s/table_offset/table.offset/ and s/table_size/table.size, etc.. here
 old 25 with check_constraints_on_bitmap() improvements merged here (Eric)
 like in 09, s/dsc/sbc/ and 
s/disk_sectors_in_bitmap_cluster/sectors_covered_by_bitmap_cluster/
 no .bdrv_store_persistent_dirty_bitmaps
 call qcow2_store_persistent_dirty_bitmaps directly from qcow2_inactivate
15: corresponding part of 25 merged here. Add John's r-b, as 25 was also 
reviewed by John.
16: add John's r-b


v15:
13,14: add John's r-b
15: qcow2_can_store_new_dirty_bitmap:
   - check max dirty bitmaps and bitmap directory overhead
   - switch to error_prepend
 rm Max's r-b
 not add John's r-b
17-24: add John's r-b
25: changed because 15 changed,
 not add John's r-b


v14:

07: use '|=' to update need_update_header
 add John's r-b
 add Max's r-b
09: remove unused bitmap_table_to_cpu()
 left Max's r-b, hope it's ok
 add John's r-b
10: remove extra new line
 add John's r-b
11: add John's r-b
12: add John's r-b
13: small fixes by John's review:
- remove weird g_free of NULL pointer from
if (tb == NULL) {
g_free(tb);
return -EINVAL;
}
- remove extra comment "/* errp is already set */"
- s/"Too large bitmap directory"/"Bitmap directory is too large"/
 left Max's r-b, hope you don't mind
22: add Max's r-b
23: add Max's r-b
24: add Max's r-b
25: new patch to improve error message on check_constraints_on_bitmap fail
 


v13: Just a fix for style checker.
13: line over 80
14: line over 80
22: s/if () \n{/if () {/

v12:
07: d

Re: [Qemu-block] [PATCH v2 3/3] qemu-img: Document backing options

2017-04-28 Thread Kevin Wolf

Am 27.04.2017 um 17:36 hat Eric Blake geschrieben:
> On 04/26/2017 08:46 AM, Max Reitz wrote:
> > I think originally the idea was to remove the individual special case
> > options in the long term in favour of the uniform -o. But I don't think
> > the short options have fallen out of use, so we can as well document
> > them again...
> >
> > Maybe we could change the implementation, however, so that the old
> > options are really just aliases for the respective -o version.
> 
> As in, a followup patch that adds a paragraph or two to the .texi file
> stating that '-B @var{backing_file}' is shorthand for '-o
> backing_file=@var{backing_file}'?

First and foremost, I meant in the C code where we have separate
variables for all the shorthand options and then need to handle both
ways explicitly in the code further down. Instead, -B could just
directly add a 'backing_file' key to a QemuOpts (and -o would do the
same instead of building a string that is parsed only later).

But the documentation change that you're suggesting is probably a good
idea as well.

Kevin

pgpFsEMVAqaVh.pgp
Description: PGP signature

Re: [Qemu-block] [Qemu-devel] [PATCH v5 06/10] qobject: Use simpler QDict/QList scalar insertion macros

2017-04-28 Thread Markus Armbruster

Eric Blake  writes:

> We now have macros in place to make it less verbose to add a scalar
> to QDict and QList, so use them.  To make this patch smaller to
> review, a couple of subdirectories were done in earlier patches.

Scratch the last sentence.  Can do on commit.

> Patch created mechanically via:
>   spatch --sp-file scripts/coccinelle/qobject.cocci \
> --macro-file scripts/cocci-macro-file.h --dir . --in-place
> then touched up manually to fix a couple of '?:' back to original
> spacing, as well as avoiding a long line in monitor.c.
>
> Signed-off-by: Eric Blake 
> Reviewed-by: Markus Armbruster 
>
> ---
> v5: rebase to master (Coccinelle found a couple new spots), squash 3
> patches into 1, adjust R-b to only list Markus (while there were other
> reviews on the pre-squashed patches, Markus was the only one on all 3)

The block: part had

Acked-by: Richard W.M. Jones 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Alberto Garcia 

The tests and qobject parts had

Reviewed-by: Philippe Mathieu-Daudé 

Richard, Stefan, Alberto, Philippe, let me know if you'd like me to
convert your R-by of parts to an Acked-by of the combined patch.  Feel
free to review the combined patch, of course.

> v4: no change
> v3: new patch

Re: [Qemu-block] [PATCH 0/7] sockets: Flatten SocketAddress except in external interfaces

2017-04-28 Thread Markus Armbruster

Markus Armbruster  writes:

> SocketAddress is a simple union, and simple unions are awkward: they
> have their variant members wrapped in a "data" object on the wire, and
> require additional indirections in C.  Flatten it as follows: rename
> SocketAddress to SocketAddressLegacy, rename its flat sibling
> SocketAddressFlat to SocketAddress, convert all users of
> SocketAddressLegacy to SocketAddress, except for existing external
> interfaces.  This completes the work outlined in commit 9445673.

This series could go in via "Sockets" maintainers, but if you guys don't
mind, I'll simply route it through qapi-next.

75 matches

Mail list logo