[Qemu-block] [PULL 1/2] block/ssh: fix possible segmentation fault when .desc is not null-terminated

2018-01-31 Thread Jeff Cody
From: Murilo Opsfelder Araujo 

This patch prevents a possible segmentation fault when .desc members are checked
against NULL.

The ssh_runtime_opts was added by commit
8a6a80896d6af03b8ee0c17cdf37219eca2588a7 ("block/ssh: Use QemuOpts for runtime
options").

This fix was inspired by
http://lists.nongnu.org/archive/html/qemu-devel/2018-01/msg00883.html.

Fixes: 8a6a80896d6af03b8ee0c17cdf37219eca2588a7 ("block/ssh: Use QemuOpts for 
runtime options")
Cc: Max Reitz 
Cc: Eric Blake 
Signed-off-by: Murilo Opsfelder Araujo 
Reviewed-by: Eric Blake 
Reviewed-by: Jeff Cody 
Signed-off-by: Jeff Cody 
---
 block/ssh.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/ssh.c b/block/ssh.c
index b049a16..8890a0c 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -556,6 +556,7 @@ static QemuOptsList ssh_runtime_opts = {
 .type = QEMU_OPT_STRING,
 .help = "Defines how and what to check the host key against",
 },
+{ /* end of list */ }
 },
 };
 
-- 
2.9.5




[Qemu-block] [PULL 0/2] Block patches

2018-01-31 Thread Jeff Cody
The following changes since commit b05631954d6dfe93340d516660397e2c1a2a5dd6:

  Merge remote-tracking branch 'remotes/rth/tags/pull-hppa-20180131' into 
staging (2018-01-31 15:50:29 +)

are available in the git repository at:

  git://github.com/codyprime/qemu-kvm-jtc.git tags/block-pull-request

for you to fetch changes up to 45a79646ea746fa3f32083d0aa70512aae29f6b3:

  iotests: Make 200 run on tmpfs (2018-01-31 22:37:00 -0500)


Block patches


Max Reitz (1):
  iotests: Make 200 run on tmpfs

Murilo Opsfelder Araujo (1):
  block/ssh: fix possible segmentation fault when .desc is not
null-terminated

 block/ssh.c| 1 +
 tests/qemu-iotests/200 | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

-- 
2.9.5




[Qemu-block] [PULL 2/2] iotests: Make 200 run on tmpfs

2018-01-31 Thread Jeff Cody
From: Max Reitz 

200 currently fails on tmpfs because it sets cache=none.  However,
without that (and aio=native), the test still works now and it fails
before Jeff's series (on fc7dbc119e0852a70dc9fa68bb41a318e49e4cd6).  So
we can probably remove the aio=native safely, and replace cache=none by
cache=$CACHEMODE.

Signed-off-by: Max Reitz 
Reviewed-by: Jeff Cody 
Message-id: 20180117135015.15051-1-mre...@redhat.com
Signed-off-by: Jeff Cody 
---
 tests/qemu-iotests/200 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/200 b/tests/qemu-iotests/200
index d8787dd..ddbdedc 100755
--- a/tests/qemu-iotests/200
+++ b/tests/qemu-iotests/200
@@ -60,7 +60,7 @@ qemu_comm_method="qmp"
 _launch_qemu -device pci-bridge,id=bridge1,chassis_nr=1,bus=pci.0 \
  -object iothread,id=iothread0 \
  -device 
virtio-scsi-pci,bus=bridge1,addr=0x1f,id=scsi0,iothread=iothread0 \
- -drive 
file="${TEST_IMG}",media=disk,if=none,cache=none,id=drive_sysdisk,aio=native,format=$IMGFMT
 \
+ -drive 
file="${TEST_IMG}",media=disk,if=none,cache=$CACHEMODE,id=drive_sysdisk,format=$IMGFMT
 \
  -device 
scsi-hd,drive=drive_sysdisk,bus=scsi0.0,id=sysdisk,bootindex=0
 h1=$QEMU_HANDLE
 
-- 
2.9.5




Re: [Qemu-block] [Qemu-devel] [PATCH] block/mirror: change the semantic of 'force' of block-job-cancel

2018-01-31 Thread Liang Li
On Tue, Jan 30, 2018 at 03:18:31PM -0500, John Snow wrote:
> 
> 
> On 01/30/2018 03:38 AM, Liang Li wrote:
>> When doing drive mirror to a low speed shared storage, if there was heavy
>> BLK IO write workload in VM after the 'ready' event, drive mirror block job
>> can't be canceled immediately, it would keep running until the heavy BLK IO
>> workload stopped in the VM.
>> 
>> Because libvirt depends on block-job-cancel for block live migration, the
>> current block-job-cancel has the semantic to make sure data is in sync after
>> the 'ready' event.  This semantic can't meet some requirement, for example,
>> people may use drive mirror for realtime backup while need the ability of
>> block live migration. If drive mirror can't not be cancelled immediately,
>> it means block live migration need to wait, because libvirt make use drive
>> mirror to implement block live migration and only one drive mirror block
>> job is allowed at the same time for a give block dev.
>> 
>> We need a new interface for 'force cancel', which could quit block job
>> immediately if don't care about whether data is in sync or not.
>> 
>> 'force' is not used by libvirt currently, to make things simple, change
>> it's semantic slightly, hope it will not break some use case which need its
>> original semantic.
>> 
>> Cc: Paolo Bonzini 
>> Cc: Jeff Cody 
>> Cc: Kevin Wolf 
>> Cc: Max Reitz 
>> Cc: Eric Blake 
>> Cc: John Snow 
>> Reported-by: Huaitong Han 
>> Signed-off-by: Huaitong Han 
>> Signed-off-by: Liang Li 
>> ---
>> block/mirror.c|  9 +++--
>> blockdev.c|  4 ++--
>> blockjob.c| 11 ++-
>> hmp-commands.hx   |  3 ++-
>> include/block/blockjob.h  |  9 -
>> qapi/block-core.json  |  6 --
>> tests/test-blockjob-txn.c |  8 
>> 7 files changed, 29 insertions(+), 21 deletions(-)
>> 
>> diff --git a/block/mirror.c b/block/mirror.c
>> index c9badc1..c22dff9 100644
>> --- a/block/mirror.c
>> +++ b/block/mirror.c
>> @@ -869,11 +869,8 @@ static void coroutine_fn mirror_run(void *opaque)
>> 
>> ret = 0;
>> trace_mirror_before_sleep(s, cnt, s->synced, delay_ns);
>> -if (!s->synced) {
>> -block_job_sleep_ns(&s->common, delay_ns);
>> -if (block_job_is_cancelled(&s->common)) {
>> -break;
>> -}
>> +if (block_job_is_cancelled(&s->common) && s->common.force) {
>> +break;
> 
> what's the justification for removing the sleep in the case that
> !s->synced && !block_job_is_cancelled(...) ?
> 
if !block_job_is_cancelled() satisfied, the code in 'if (!should_complete) {}'
will execute, there is a block_job_sleep_ns there.

block_job_sleep_ns is for rate throttling, if there is no more data to sync, 
sleep is not needed, right?

>> } else if (!should_complete) {
>> delay_ns = (s->in_flight == 0 && cnt == 0 ? SLICE_TIME : 0);
>> block_job_sleep_ns(&s->common, delay_ns);
>> @@ -887,7 +884,7 @@ immediate_exit:
>>  * or it was cancelled prematurely so that we do not guarantee that
>>  * the target is a copy of the source.
>>  */
>> -assert(ret < 0 || (!s->synced && 
>> block_job_is_cancelled(&s->common)));
>> +assert(ret < 0 || block_job_is_cancelled(&s->common));
> 
> This assertion gets weaker in the case where force isn't provided, is
> that desired?
> 
yes. if force quit is used, the following condition can be true

(ret >= 0) && (s->synced) && (block_job_is_cancelled(&s->common)) 

so the above assert should be changed, or it will be failed.

>> assert(need_drain);
>> mirror_wait_for_all_io(s);
>> }
>> diff --git a/blockdev.c b/blockdev.c
>> index 8e977ee..039f156 100644
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -145,7 +145,7 @@ void blockdev_mark_auto_del(BlockBackend *blk)
>> aio_context_acquire(aio_context);
>> 
>> if (bs->job) {
>> -block_job_cancel(bs->job);
>> +block_job_cancel(bs->job, false);
>> }
>> 
>> aio_context_release(aio_context);
>> @@ -3802,7 +3802,7 @@ void qmp_block_job_cancel(const char *device,
>> }
>> 
>> trace_qmp_block_job_cancel(job);
>> -block_job_cancel(job);
>> +block_job_cancel(job, force);
>> out:
>> aio_context_release(aio_context);
>> }
>> diff --git a/blockjob.c b/blockjob.c
>> index f5cea84..0aacb50 100644
>> --- a/blockjob.c
>> +++ b/blockjob.c
>> @@ -365,7 +365,7 @@ static void block_job_completed_single(BlockJob *job)
>> block_job_unref(job);
>> }
>> 
>> -static void block_job_cancel_async(BlockJob *job)
>> +static void block_job_cancel_async(BlockJob *job, bool force)
>> {
>> if (job->iostatus != BLOCK_DEVICE_IO_STATUS_OK) {
>> block_job_iostatus_reset(job);
>> @@ -376,6 +376,7 @@ static void block_job_cancel_async(BlockJob *job)
>> job->pause_count--;
>> }
>> job->cancelled = true;
>> +job->force = force;
>> }
>> 
> 
> I suppose thi

Re: [Qemu-block] [PATCH] block/mirror: change the semantic of 'force' of block-job-cancel

2018-01-31 Thread Liang Li
On Tue, Jan 30, 2018 at 08:20:03AM -0600, Eric Blake wrote:
> On 01/30/2018 02:38 AM, Liang Li wrote:
>> When doing drive mirror to a low speed shared storage, if there was heavy
>> BLK IO write workload in VM after the 'ready' event, drive mirror block job
>> can't be canceled immediately, it would keep running until the heavy BLK IO
>> workload stopped in the VM.
> 
> So far so good.   But the grammar and explanation in the rest of the
> commit is a bit hard to read; let me give a shot at an alternative wording:
> 
> Libvirt depends on the current block-job-cancel semantics, which is that
> when used without a flag after the 'ready' event, the command blocks
> until data is in sync.  However, these semantics are awkward in other
> situations, for example, people may use drive mirror for realtime
> backups while still wanting to use block live migration.  Libvirt cannot
> start a block live migration while another drive mirror is in progress,
> but the user would rather abandon the backup attempt as broken and
> proceed with the live migration than be stuck waiting for the current
> drive mirror backup to finish.
> 
> The drive-mirror command already includes a 'force' flag, which libvirt
> does not use, although it documented the flag as only being useful to
> quit a job which is paused.  However, since quitting a paused job has
> the same effect as abandoning a backup in a non-paused job (namely, the
> destination file is not in sync, and the command completes immediately),
> we can just improve the documentation to make the force flag obviously
> useful.
> 

much better, will include in the v2. Thanks!
>> 
>> Cc: Paolo Bonzini 
>> Cc: Jeff Cody 
>> Cc: Kevin Wolf 
>> Cc: Max Reitz 
>> Cc: Eric Blake 
>> Cc: John Snow 
>> Reported-by: Huaitong Han 
>> Signed-off-by: Huaitong Han 
>> Signed-off-by: Liang Li 
>> ---
> 
> 
>> +++ b/hmp-commands.hx
>> @@ -106,7 +106,8 @@ ETEXI
>> .args_type  = "force:-f,device:B",
>> .params = "[-f] device",
>> .help   = "stop an active background block operation (use -f"
>> -  "\n\t\t\t if the operation is currently paused)",
>> +  "\n\t\t\t if you want to abort the operation 
>> immediately"
>> +  "\n\t\t\t instead of keep running until data is in 
>> sync )",
> 
> s/sync )/sync)/
> 

done
>> .cmd= hmp_block_job_cancel,
>> },
>> 
>> diff --git a/include/block/blockjob.h b/include/block/blockjob.h
>> index 00403d9..4a96c42 100644
>> --- a/include/block/blockjob.h
>> +++ b/include/block/blockjob.h
>> @@ -63,6 +63,12 @@ typedef struct BlockJob {
>> bool cancelled;
>> 
>> /**
>> + * Set to true if the job should be abort immediately without waiting
> 
> s/be //

done
> 
>> + * for data is in sync.
> 
> s/is/to be/
> 

done
>> + */
>> +bool force;
>> +
>> +/**
>>  * Counter for pause request. If non-zero, the block job is either 
>> paused,
>>  * or if busy == true will pause itself as soon as possible.
>>  */
>> @@ -218,10 +224,11 @@ void block_job_start(BlockJob *job);
>> /**
>>  * block_job_cancel:
>>  * @job: The job to be canceled.
>> + * @force: Quit a job without waiting data is in sync.
> 
> s/data is/for data to be/
> 

done
>> +++ b/qapi/block-core.json
>> @@ -2098,8 +2098,10 @@
>> #  the name of the parameter), but since QEMU 2.7 it can have
>> #  other values.
>> #
>> -# @force: whether to allow cancellation of a paused job (default
>> -# false).  Since 1.3.
>> +# @force: #optional whether to allow cancellation a job without waiting 
>> data is
> 
> The '#optional' tag should no longer be added.
> 
>> +# in sync, please not that since 2.12 it's semantic is not exactly 
>> the
>> +# same as before, from 1.3 to 2.11 it means whether to allow 
>> cancellation
>> +# of a paused job (default false).  Since 1.3.
> 
> Reads awkwardly.  I suggest:
> 
> @force: If true, and the job has already emitted the event
> BLOCK_JOB_READY, abandon the job immediately (even if it is paused)
> instead of waiting for the destination to complete its final
> synchronization (since 1.3)
> 

much more clear
> 
>> +++ b/tests/test-blockjob-txn.c
>> @@ -125,7 +125,7 @@ static void test_single_job(int expected)
>> block_job_start(job);
>> 
>> if (expected == -ECANCELED) {
>> -block_job_cancel(job);
>> +block_job_cancel(job, false);
>> }
>> 
>> while (result == -EINPROGRESS) {
>> @@ -173,10 +173,10 @@ static void test_pair_jobs(int expected1, int 
>> expected2)
>> block_job_txn_unref(txn);
>> 
>> if (expected1 == -ECANCELED) {
>> -block_job_cancel(job1);
>> +block_job_cancel(job1, false);
>> }
>> if (expected2 == -ECANCELED) {
>> -block_job_cancel(job2);
>> +block_job_cancel(job2, false);
>> }
>> 
>> while (result1 == -EINPROGRESS || result2 == -EINPROGRESS) {
>> @@ -231,7 +231,7 @@ static void test_pair_jobs_fa

Re: [Qemu-block] [PATCH] iotests: Fix CID for VMDK afl image

2018-01-31 Thread Fam Zheng
On Thu, Feb 1, 2018 at 2:58 AM, Max Reitz  wrote:
> On 2018-01-30 07:25, Fam Zheng wrote:
>> This reverts commit 76bf133c4 which updated the reference output, and
>> fixed the reference image, because the code path we want to exercise is
>> actually the invalid image size.
>>
>> The descriptor block in the image, which includes the CID to verify, has been
>> invalid since the reference image was added. Since commit 9877860e7bd we 
>> report
>> this error earlier than the "file too large", so 059.out mismatches.
>>
>> The binary change is generated along the operations of:
>>
>>   $ bunzip2 afl9.vmdk.bz2
>>   $ qemu-img create -f vmdk fix.vmdk 1G
>>   $ dd if=afl9.vmdk.bz2 of=fix.vmdk bs=512 count=1 conv=notrunc
>>   $ mv fix.vmdk afl9.vmdk
>>   $ bzip2 afl9.vmdk
>>
>> Signed-off-by: Fam Zheng 
>>
>> ---
>>
>> v2: Fix commit message "qcow2 -> vmdk". [Kevin]
>> Revert 76bf133c4.
>
> H, now this fails again on my 32 bit build. :-(
>
> The issue there is that you get a "Cannot allocate memory" when trying
> to open the file.  My current fix was 2291712c39111a732 which simply
> converted that to "Invalid argument", but now it's getting a bit more
> complicated...  Should I just continue to play the game and check the
> output for "Cannot allocate memory" and print exactly what the reference
> output is expecting...?

Ahhh. OK, then, with a big comment.

I'd say let's just _notrun on 32 bit.

Fam



Re: [Qemu-block] [Qemu-devel] [RFC v3 00/14] blockjobs: add explicit job management

2018-01-31 Thread John Snow


On 01/26/2018 09:05 PM, John Snow wrote:
> For jobs that complete when a monitor isn't looking, there's no way to
> tell what the job's final return code was. We need to allow jobs to
> remain in the list until queried for reliable management.
> 
> Furthermore, it's not viable to have graph changes when the monitor
> isn't looking either. We need at least another event for that.
> 
> This series is a rough sketch for the WAITING, PENDING and CONCLUDED
> events that accompany an expanded job management scheme that a management
> client can opt-in to.
> 
> V3:
>  - Added WAITING and PENDING events
>  - Added block_job_finalize verb
>  - Added .pending() callback for jobs
>  - Tweaked how .commit/.abort work
> 
> V2:
>  - Added tests!
>  - Changed property name (Jeff, Paolo)
> 
> RFC / Known problems:
> - I need a lot more tests.
> - Jobs need to actually implement their .pending callback for this to be of 
> any
>   actual use.
> - Mirror needs to be refactored to use the commit/abort/pending/clean 
> callbacks
>   to fulfill the promise made by "no graph changes without user authorization"
>   that PENDING is supposed to offer
> - Jobs beyond backup will be able to opt-in to the new management scheme in 
> the
>   next version.
> - V4 will include a forced synchronicity for jobs in a transaction; i.e. all
>   jobs will be forced to use either the old or new styles, but not a mix.
> 
> Please take a look and shout loudly if I'm wandering in the wrong direction.
> 
> 
> 
> For convenience, this branch is available at:
> https://github.com/jnsnow/qemu.git branch block-job-reap
> https://github.com/jnsnow/qemu/tree/block-job-reap
> 
> This version is tagged block-job-reap-v3:
> https://github.com/jnsnow/qemu/releases/tag/block-job-reap-v3
> 
> John Snow (14):
>   blockjobs: add manual property
>   blockjobs: Add status enum
>   blockjobs: add state transition table
>   blockjobs: RFC add block_job_verb permission table
>   blockjobs: add block_job_dismiss
>   blockjobs: add CONCLUDED state
>   blockjobs: ensure abort is called for cancelled jobs
>   blockjobs: add commit, abort, clean helpers
>   blockjobs: add prepare callback
>   blockjobs: Add waiting event
>   blockjobs: add PENDING status and event
>   blockjobs: add block-job-finalize
>   blockjobs: Expose manual property
>   iotests: test manual job dismissal
> 
>  block/backup.c   |  22 ++--
>  block/commit.c   |   2 +-
>  block/mirror.c   |   2 +-
>  block/replication.c  |   5 +-
>  block/stream.c   |   2 +-
>  block/trace-events   |   2 +
>  blockdev.c   |  42 ++-
>  blockjob.c   | 279 
> ---
>  include/block/block_int.h|   9 +-
>  include/block/blockjob.h |  38 ++
>  include/block/blockjob_int.h |  12 +-
>  qapi/block-core.json | 135 +++--
>  tests/qemu-iotests/056   | 241 +
>  tests/qemu-iotests/056.out   |   4 +-
>  tests/test-bdrv-drain.c  |   4 +-
>  tests/test-blockjob-txn.c|   2 +-
>  tests/test-blockjob.c|   4 +-
>  17 files changed, 750 insertions(+), 55 deletions(-)
> 

NACK

There are a lot of changes I've already made to this series based on
Kevin's recommendations; please wait for the next revision.

And a lot of bugs in this series I've already found.

--js



Re: [Qemu-block] [PATCH v3 20/39] qcow2: Update qcow2_get_cluster_offset() to support L2 slices

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> qcow2_get_cluster_offset() checks how many contiguous bytes are
> available at a given offset. The returned number of bytes is limited
> by the amount that can be addressed without having to load more than
> one L2 table.
> 
> Since we'll be loading L2 slices instead of full tables this patch
> changes the limit accordingly using the size of the L2 slice for the
> calculations instead of the full table size.
> 
> One consequence of this is that with small L2 slices operations such
> as 'qemu-img map' will need to iterate in more steps because each
> qcow2_get_cluster_offset() call will potentially return a smaller
> number. However the code is already prepared for that so this doesn't
> break semantics.
> 
> The l2_table variable is also renamed to l2_slice to reflect this, and
> offset_to_l2_index() is replaced with offset_to_l2_slice_index().
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cluster.c | 30 +++---
>  1 file changed, 15 insertions(+), 15 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 19/39] qcow2: Update get_cluster_table() to support L2 slices

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> This patch updates get_cluster_table() to return L2 slices instead of
> full L2 tables.
> 
> The code itself needs almost no changes, it only needs to call
> offset_to_l2_slice_index() instead of offset_to_l2_index(). This patch
> also renames all the relevant variables and the documentation.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  block/qcow2-cluster.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 18/39] qcow2: Refactor get_cluster_table()

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> After the previous patch we're now always using l2_load() in
> get_cluster_table() regardless of whether a new L2 table has to be
> allocated or not.
> 
> This patch refactors that part of the code to use one single l2_load()
> call.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  block/qcow2-cluster.c | 21 +++--
>  1 file changed, 7 insertions(+), 14 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 2a53c1dc5f..0c0cab85e8 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -695,15 +695,7 @@ static int get_cluster_table(BlockDriverState *bs, 
> uint64_t offset,
>  return -EIO;
>  }
>  
> -/* seek the l2 table of the given l2 offset */
> -
> -if (s->l1_table[l1_index] & QCOW_OFLAG_COPIED) {
> -/* load the l2 table in memory */
> -ret = l2_load(bs, offset, l2_offset, &l2_table);
> -if (ret < 0) {
> -return ret;
> -}
> -} else {
> +if (!(s->l1_table[l1_index] & QCOW_OFLAG_COPIED)) {
>  /* First allocate a new L2 table (and do COW if needed) */
>  ret = l2_allocate(bs, l1_index);
>  if (ret < 0) {
> @@ -719,11 +711,12 @@ static int get_cluster_table(BlockDriverState *bs, 
> uint64_t offset,
>  /* Get the offset of the newly-allocated l2 table */
>  l2_offset = s->l1_table[l1_index] & L1E_OFFSET_MASK;
>  assert(offset_into_cluster(s, l2_offset) == 0);
> -/* Load the l2 table in memory */
> -ret = l2_load(bs, offset, l2_offset, &l2_table);
> -if (ret < 0) {
> -return ret;
> -}
> +}
> +
> +/* load the l2 table in memory */
> +ret = l2_load(bs, offset, l2_offset, &l2_table);
> +if (ret < 0) {
> +return ret;
>  }

I'd pull the l2_offset assignment (and alignment check) down below the
QCOW_OFLAG_COPIED check, saving us another two lines (!!1!) which we
could then spend on an

assert(s->l1_table[l1_index] & QCOW_OFLAG_COPIED);

Max

>  
>  /* find the cluster offset for the given disk offset */
> 




signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 17/39] qcow2: Update l2_allocate() to support L2 slices

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> This patch updates l2_allocate() to support the qcow2 cache returning
> L2 slices instead of full L2 tables.
> 
> The old code simply gets an L2 table from the cache and initializes it
> with zeroes or with the contents of an existing table. With a cache
> that returns slices instead of tables the idea remains the same, but
> the code must now iterate over all the slices that are contained in an
> L2 table.
> 
> Since now we're operating with slices the function can no longer
> return the newly-allocated table, so it's up to the caller to retrieve
> the appropriate L2 slice after calling l2_allocate() (note that with
> this patch the caller is still loading full L2 tables, but we'll deal
> with that in a separate patch).
> 
> Signed-off-by: Alberto Garcia 
> ---
>  block/qcow2-cluster.c | 56 
> +++
>  1 file changed, 34 insertions(+), 22 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 57349928a9..2a53c1dc5f 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -264,11 +264,12 @@ int qcow2_write_l1_entry(BlockDriverState *bs, int 
> l1_index)
>   *
>   */
>  
> -static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
> +static int l2_allocate(BlockDriverState *bs, int l1_index)
>  {
>  BDRVQcow2State *s = bs->opaque;
>  uint64_t old_l2_offset;
> -uint64_t *l2_table = NULL;
> +uint64_t *l2_slice = NULL;
> +unsigned slice, slice_size2, n_slices;

I'd personally prefer size_t, but oh well.

(And also maybe slice_size_bytes or slice_bytes instead of the
not-so-intuitive slice_size2.  I know we're using *_size2 in other
places, but that's bad enough as it is.)

Overall (with that fixed or not, and with the spelling fixed as pointed
out by Eric);

Reviewed-by: Max Reitz 

However, I'm wondering whether this is the best approach.  The old L2
table is probably not going to be used after this function, so we're
basically polluting the cache here.  That was bad enough so far, but now
that actually means wasting multiple cache entries on it.

Sure, the code is simpler this way.  But maybe it would still be better
to manually copy the data over from the old offset...  (As long as it's
not much more complicated.)

Max

>  int64_t l2_offset;
>  int ret;
>  



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 16/39] qcow2: Prepare l2_allocate() for adding L2 slice support

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> Adding support for L2 slices to l2_allocate() needs (among other
> things) an extra loop that iterates over all slices of a new L2 table.
> 
> Putting all changes in one patch would make it hard to read because
> all semantic changes would be mixed with pure indentation changes.
> 
> To make things easier this patch simply creates a new block and
> changes the indentation of all lines of code inside it. Thus, all
> modifications in this patch are cosmetic. There are no semantic
> changes and no variables are renamed yet. The next patch will take
> care of that.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  block/qcow2-cluster.c | 55 
> ---
>  1 file changed, 30 insertions(+), 25 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 15/39] qcow2: Update l2_load() to support L2 slices

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> Each entry in the qcow2 L2 cache stores a full L2 table (which uses a
> complete cluster in the qcow2 image). A cluster is usually too large
> to be used efficiently as the size for a cache entry, so we want to
> decouple both values by allowing smaller cache entries. Therefore the
> qcow2 L2 cache will no longer return full L2 tables but slices
> instead.
> 
> This patch updates l2_load() so it can handle L2 slices correctly.
> Apart from the offset of the L2 table (which we already had) we also
> need the guest offset in order to calculate which one of the slices
> we need.
> 
> An L2 slice has currently the same size as an L2 table (one cluster),
> so for now this function will load exactly the same data as before.
> 
> This patch also removes a stale comment about the return value being
> a pointer to the L2 table. This function returns an error code since
> 55c17e9821c474d5fcdebdc82ed2fc096777d611.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cluster.c | 28 +---
>  1 file changed, 17 insertions(+), 11 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 14/39] qcow2: Add offset_to_l2_slice_index()

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> Similar to offset_to_l2_index(), this function takes a guest offset
> and returns the index in the L2 slice that contains its L2 entry.
> 
> An L2 slice has currently the same size as an L2 table (one cluster),
> so both functions return the same value for now.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2.h | 5 +
>  1 file changed, 5 insertions(+)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 13/39] qcow2: Add l2_slice_size field to BDRVQcow2State

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> The BDRVQcow2State structure contains an l2_size field, which stores
> the number of 64-bit entries in an L2 table.
> 
> For efficiency reasons we want to be able to load slices instead of
> full L2 tables, so we need to know how many entries an L2 slice can
> hold.
> 
> An L2 slice is the portion of an L2 table that is loaded by the qcow2
> cache. At the moment that cache can only load complete tables,
> therefore an L2 slice has the same size as an L2 table (one cluster)
> and l2_size == l2_slice_size.
> 
> Later we'll allow smaller slices, but until then we have to use this
> new l2_slice_size field to make the rest of the code ready for that.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2.c | 3 +++
>  block/qcow2.h | 1 +
>  2 files changed, 4 insertions(+)

Am I missing something or does this patch miss setting l2_slice_size in
qcow2_do_open()?

Max

> diff --git a/block/qcow2.c b/block/qcow2.c
> index e2d4bf7ad5..78f067cae7 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -805,6 +805,7 @@ static void read_cache_sizes(BlockDriverState *bs, 
> QemuOpts *opts,
>  typedef struct Qcow2ReopenState {
>  Qcow2Cache *l2_table_cache;
>  Qcow2Cache *refcount_block_cache;
> +int l2_slice_size; /* Number of entries in a slice of the L2 table */
>  bool use_lazy_refcounts;
>  int overlap_check;
>  bool discard_passthrough[QCOW2_DISCARD_MAX];
> @@ -886,6 +887,7 @@ static int qcow2_update_options_prepare(BlockDriverState 
> *bs,
>  }
>  }
>  
> +r->l2_slice_size = s->cluster_size / sizeof(uint64_t);
>  r->l2_table_cache = qcow2_cache_create(bs, l2_cache_size);
>  r->refcount_block_cache = qcow2_cache_create(bs, refcount_cache_size);
>  if (r->l2_table_cache == NULL || r->refcount_block_cache == NULL) {
> @@ -1049,6 +1051,7 @@ static void 
> qcow2_update_options_commit(BlockDriverState *bs,
>  }
>  s->l2_table_cache = r->l2_table_cache;
>  s->refcount_block_cache = r->refcount_block_cache;
> +s->l2_slice_size = r->l2_slice_size;
>  
>  s->overlap_check = r->overlap_check;
>  s->use_lazy_refcounts = r->use_lazy_refcounts;
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 0559afbc63..e0aee88811 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -251,6 +251,7 @@ typedef struct BDRVQcow2State {
>  int cluster_bits;
>  int cluster_size;
>  int cluster_sectors;
> +int l2_slice_size;
>  int l2_bits;
>  int l2_size;
>  int l1_size;
> 




signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 12/39] qcow2: Add offset_to_l1_index()

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> Similar to offset_to_l2_index(), this function returns the index in
> the L1 table for a given guest offset. This is only used in a couple
> of places and it's not a particularly complex calculation, but it
> makes the code a bit more readable.
> 
> Although in the qcow2_get_cluster_offset() case the old code was
> taking advantage of the l1_bits variable, we're going to get rid of
> the other uses of l1_bits in a later patch anyway, so it doesn't make
> sense to keep it just for this.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cluster.c | 4 ++--
>  block/qcow2.h | 5 +
>  2 files changed, 7 insertions(+), 2 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 11/39] qcow2: Remove BDS parameter from qcow2_cache_is_table_offset()

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> This function was only using the BlockDriverState parameter to pass it
> to qcow2_cache_get_table_addr(). This is no longer necessary so this
> parameter can be removed.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cache.c| 3 +--
>  block/qcow2-refcount.c | 6 +++---
>  block/qcow2.h  | 3 +--
>  3 files changed, 5 insertions(+), 7 deletions(-)

I was hoping you'd smuggle something funny into one of the commit
messages of these patches to test whether we'd actually read them. :(

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 10/39] qcow2: Remove BDS parameter from qcow2_cache_discard()

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> This function was only using the BlockDriverState parameter to pass it
> to qcow2_cache_get_table_idx() and qcow2_cache_table_release(). This
> is no longer necessary so this parameter can be removed.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cache.c| 2 +-
>  block/qcow2-refcount.c | 6 +++---
>  block/qcow2.h  | 2 +-
>  3 files changed, 5 insertions(+), 5 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 09/39] qcow2: Remove BDS parameter from qcow2_cache_clean_unused()

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> This function was only using the BlockDriverState parameter to pass it
> to qcow2_cache_table_release(). This is no longer necessary so this
> parameter can be removed.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cache.c | 2 +-
>  block/qcow2.c   | 4 ++--
>  block/qcow2.h   | 2 +-
>  3 files changed, 4 insertions(+), 4 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 08/39] qcow2: Remove BDS parameter from qcow2_cache_destroy()

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> This function was never using the BlockDriverState parameter so it can
> be safely removed.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cache.c |  2 +-
>  block/qcow2.c   | 16 
>  block/qcow2.h   |  2 +-
>  3 files changed, 10 insertions(+), 10 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 07/39] qcow2: Remove BDS parameter from qcow2_cache_put()

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> This function was only using the BlockDriverState parameter to pass it
> to qcow2_cache_get_table_idx(). This is no longer necessary so this
> parameter can be removed.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cache.c|  2 +-
>  block/qcow2-cluster.c  | 28 ++--
>  block/qcow2-refcount.c | 30 +++---
>  block/qcow2.h  |  2 +-
>  4 files changed, 31 insertions(+), 31 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 06/39] qcow2: Remove BDS parameter from qcow2_cache_entry_mark_dirty()

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> This function was only using the BlockDriverState parameter to pass it
> to qcow2_cache_get_table_idx(). This is no longer necessary so this
> parameter can be removed.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cache.c|  3 +--
>  block/qcow2-cluster.c  | 12 ++--
>  block/qcow2-refcount.c | 14 ++
>  block/qcow2.h  |  3 +--
>  4 files changed, 14 insertions(+), 18 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 05/39] qcow2: Remove BDS parameter from qcow2_cache_table_release()

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> This function was only using the BlockDriverState parameter to get the
> cache table size (since it was equal to the cluster size). This is no
> longer necessary so this parameter can be removed.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cache.c | 9 -
>  1 file changed, 4 insertions(+), 5 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 04/39] qcow2: Remove BDS parameter from qcow2_cache_get_table_idx()

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> This function was only using the BlockDriverState parameter to get the
> cache table size (since it was equal to the cluster size). This is no
> longer necessary so this parameter can be removed.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cache.c | 9 -
>  1 file changed, 4 insertions(+), 5 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 03/39] qcow2: Remove BDS parameter from qcow2_cache_get_table_addr()

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> This function was only using the BlockDriverState parameter to get the
> cache table size (since it was equal to the cluster size). This is no
> longer necessary so this parameter can be removed.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cache.c | 13 ++---
>  1 file changed, 6 insertions(+), 7 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 02/39] qcow2: Add table size field to Qcow2Cache

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> The table size in the qcow2 cache is currently equal to the cluster
> size. This doesn't allow us to use the cache memory efficiently,
> particularly with large cluster sizes, so we need to be able to have
> smaller cache tables that are independent from the cluster size. This
> patch adds a new field to Qcow2Cache that we can use instead of the
> cluster size.
> 
> The current table size is still being initialized to the cluster size,
> so there are no semantic changes yet, but this patch will allow us to
> prepare the rest of the code and simplify a few function calls.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cache.c | 29 ++---
>  1 file changed, 14 insertions(+), 15 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 39/39] iotests: Add l2-cache-entry-size to iotest 137

2018-01-31 Thread Eric Blake
On 01/26/2018 09:00 AM, Alberto Garcia wrote:
> This test tries reopening a qcow2 image with valid and invalid
> options. This patch adds l2-cache-entry-size to the set.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  tests/qemu-iotests/137 | 5 +
>  tests/qemu-iotests/137.out | 2 ++
>  2 files changed, 7 insertions(+)
> 

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 38/39] iotests: Test downgrading an image using a small L2 slice size

2018-01-31 Thread Eric Blake
On 01/26/2018 09:00 AM, Alberto Garcia wrote:
> expand_zero_clusters_in_l1() is used when downgrading qcow2 images
> from v3 to v2 (compat=0.10). This is one of the functions that needed
> more changes to support L2 slices, so this patch extends iotest 061 to
> test downgrading a qcow2 image using a smaller slice size.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  tests/qemu-iotests/061 | 16 
>  tests/qemu-iotests/061.out | 61 
> ++
>  2 files changed, 77 insertions(+)
> 

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 37/39] iotests: Test valid values of l2-cache-entry-size

2018-01-31 Thread Eric Blake
On 01/26/2018 09:00 AM, Alberto Garcia wrote:
> The l2-cache-entry-size setting can only contain values that are
> powers of two between 512 and the cluster size.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  tests/qemu-iotests/103 | 17 +
>  tests/qemu-iotests/103.out |  3 +++
>  2 files changed, 20 insertions(+)
> 

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 36/39] qcow2: Allow configuring the L2 slice size

2018-01-31 Thread Eric Blake
On 01/26/2018 09:00 AM, Alberto Garcia wrote:
> Now that the code is ready to handle L2 slices we can finally add an
> option to allow configuring their size.
> 
> An L2 slice is the portion of an L2 table that is read by the qcow2
> cache. Until now the cache was always reading full L2 tables, and
> since the L2 table size is equal to the cluster size this was not very
> efficient with large clusters. Here's a more detailed explanation of
> why it makes sense to have smaller cache entries in order to load L2
> data:
> 
>https://lists.gnu.org/archive/html/qemu-block/2017-09/msg00635.html
> 
> This patch introduces a new command-line option to the qcow2 driver
> named l2-cache-entry-size (cf. l2-cache-size). The cache entry size
> has the same restrictions as the cluster size: it must be a power of
> two and it has the same range of allowed values, with the additional
> requirement that it must not be larger than the cluster size.
> 
> The L2 cache entry size (L2 slice size) remains equal to the cluster
> size for now by default, so this feature must be explicitly enabled.
> Although my tests show that 4KB slices consistently improve
> performance and give the best results, let's wait and make more tests
> with different cluster sizes before deciding on an optimal default.
> 
> Now that the cache entry size is not necessarily equal to the cluster
> size we need to reflect that in the MIN_L2_CACHE_SIZE documentation.
> That minimum value is a requirement of the COW algorithm: we need to
> read two L2 slices (and not two L2 tables) in order to do COW, see
> l2_allocate() for the actual code.
> 
> Signed-off-by: Alberto Garcia 
> ---
Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 01/39] qcow2: Fix documentation of get_cluster_table()

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> This function has not been returning the offset of the L2 table since
> commit 3948d1d4876065160583e79533bf604481063833
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cluster.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH] hbitmap: fix missing restore count when finish deserialization

2018-01-31 Thread Max Reitz
On 2018-01-31 20:01, John Snow wrote:
> 
> 
> On 01/31/2018 01:54 PM, Max Reitz wrote:
>> On 2018-01-18 11:58, Liang Li wrote:
>>> The .count of HBitmap is forgot to set in function
>>> hbitmap_deserialize_finish, let's set it to the right value.
>>>
>>> Cc: Vladimir Sementsov-Ogievskiy 
>>> Cc: Fam Zheng 
>>> Cc: Max Reitz 
>>> Cc: John Snow 
>>> Signed-off-by: weiping zhang 
>>> Signed-off-by: Liang Li 
>>> ---
>>>  util/hbitmap.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/util/hbitmap.c b/util/hbitmap.c
>>> index 289778a..58a2c93 100644
>>> --- a/util/hbitmap.c
>>> +++ b/util/hbitmap.c
>>> @@ -630,6 +630,7 @@ void hbitmap_deserialize_finish(HBitmap *bitmap)
>>>  }
>>>  
>>>  bitmap->levels[0][0] |= 1UL << (BITS_PER_LONG - 1);
>>> +bitmap->count = hb_count_between(bitmap, 0, bitmap->size - 1);
>>>  }
>>>  
>>>  void hbitmap_free(HBitmap *hb)
>>
>> Actually CC-ing John...
>>
>> (Looks good to me, though.)
>>
>> Max
>>
> 
> Staged already, sorry. Will send the PR this Friday.
> 
> https://github.com/jnsnow/qemu/commit/78ad6913bd34a54f658d5182990fe149614d6402

OK, good. :-)

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v2] block: maintain persistent disabled bitmaps

2018-01-31 Thread Max Reitz
On 2018-01-29 19:43, Max Reitz wrote:
> On 2018-01-22 11:41, Vladimir Sementsov-Ogievskiy wrote:
>> To maintain load/store disabled bitmap there is new approach:
>>
>>  - deprecate @autoload flag of block-dirty-bitmap-add, make it ignored
>>  - store enabled bitmaps as "auto" to qcow2
>>  - store disabled bitmaps without "auto" flag to qcow2
>>  - on qcow2 open load "auto" bitmaps as enabled and others
>>as disabled (except in_use bitmaps)
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> Reviewed-by: John Snow 
>> ---
> 
> Thanks, looks very reasonable.  Applied to my block branch:
> 
> https://github.com/XanClic/qemu/commits/block

...aaand I've only just now seen that iotest 176 will need to be fixed
along with this, so I'm going to unqueue this patch for now.

And when I'm already at it: Should we add deprecation information to
qemu-doc.texi?

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH] hbitmap: fix missing restore count when finish deserialization

2018-01-31 Thread John Snow


On 01/31/2018 01:54 PM, Max Reitz wrote:
> On 2018-01-18 11:58, Liang Li wrote:
>> The .count of HBitmap is forgot to set in function
>> hbitmap_deserialize_finish, let's set it to the right value.
>>
>> Cc: Vladimir Sementsov-Ogievskiy 
>> Cc: Fam Zheng 
>> Cc: Max Reitz 
>> Cc: John Snow 
>> Signed-off-by: weiping zhang 
>> Signed-off-by: Liang Li 
>> ---
>>  util/hbitmap.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/util/hbitmap.c b/util/hbitmap.c
>> index 289778a..58a2c93 100644
>> --- a/util/hbitmap.c
>> +++ b/util/hbitmap.c
>> @@ -630,6 +630,7 @@ void hbitmap_deserialize_finish(HBitmap *bitmap)
>>  }
>>  
>>  bitmap->levels[0][0] |= 1UL << (BITS_PER_LONG - 1);
>> +bitmap->count = hb_count_between(bitmap, 0, bitmap->size - 1);
>>  }
>>  
>>  void hbitmap_free(HBitmap *hb)
> 
> Actually CC-ing John...
> 
> (Looks good to me, though.)
> 
> Max
> 

Staged already, sorry. Will send the PR this Friday.

https://github.com/jnsnow/qemu/commit/78ad6913bd34a54f658d5182990fe149614d6402



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH] iotests: Fix CID for VMDK afl image

2018-01-31 Thread Max Reitz
On 2018-01-30 07:25, Fam Zheng wrote:
> This reverts commit 76bf133c4 which updated the reference output, and
> fixed the reference image, because the code path we want to exercise is
> actually the invalid image size.
> 
> The descriptor block in the image, which includes the CID to verify, has been
> invalid since the reference image was added. Since commit 9877860e7bd we 
> report
> this error earlier than the "file too large", so 059.out mismatches.
> 
> The binary change is generated along the operations of:
> 
>   $ bunzip2 afl9.vmdk.bz2
>   $ qemu-img create -f vmdk fix.vmdk 1G
>   $ dd if=afl9.vmdk.bz2 of=fix.vmdk bs=512 count=1 conv=notrunc
>   $ mv fix.vmdk afl9.vmdk
>   $ bzip2 afl9.vmdk
> 
> Signed-off-by: Fam Zheng 
> 
> ---
> 
> v2: Fix commit message "qcow2 -> vmdk". [Kevin]
> Revert 76bf133c4.

H, now this fails again on my 32 bit build. :-(

The issue there is that you get a "Cannot allocate memory" when trying
to open the file.  My current fix was 2291712c39111a732 which simply
converted that to "Invalid argument", but now it's getting a bit more
complicated...  Should I just continue to play the game and check the
output for "Cannot allocate memory" and print exactly what the reference
output is expecting...?

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH] hbitmap: fix missing restore count when finish deserialization

2018-01-31 Thread Max Reitz
On 2018-01-18 11:58, Liang Li wrote:
> The .count of HBitmap is forgot to set in function
> hbitmap_deserialize_finish, let's set it to the right value.
> 
> Cc: Vladimir Sementsov-Ogievskiy 
> Cc: Fam Zheng 
> Cc: Max Reitz 
> Cc: John Snow 
> Signed-off-by: weiping zhang 
> Signed-off-by: Liang Li 
> ---
>  util/hbitmap.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/util/hbitmap.c b/util/hbitmap.c
> index 289778a..58a2c93 100644
> --- a/util/hbitmap.c
> +++ b/util/hbitmap.c
> @@ -630,6 +630,7 @@ void hbitmap_deserialize_finish(HBitmap *bitmap)
>  }
>  
>  bitmap->levels[0][0] |= 1UL << (BITS_PER_LONG - 1);
> +bitmap->count = hb_count_between(bitmap, 0, bitmap->size - 1);
>  }
>  
>  void hbitmap_free(HBitmap *hb)

Actually CC-ing John...

(Looks good to me, though.)

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v7 8/9] qcow2: skip writing zero buffers to empty COW areas

2018-01-31 Thread Eric Blake
On 01/31/2018 12:35 PM, Max Reitz wrote:

>>> Not sure how useful 2 is, though.  (I don't know.  I always hear about
>>> people wanting to optimize for such a case where a backing file is
>>> shorter than the overlay, but I can't imagine a real use case for that.)
>>
>> I can; here's what's happened to me personally.  I created an image, and
>> took internal snapshots (yeah, I know those aren't in fashion these
>> days, but this was long ago).  Later, I ran out of space.  I wanted to
>> resize the image, but am not convinced whether resizing the image will
>> play nicely with the internal snapshots (in fact, I don't recall
>> off-hand whether this was something we prevented in the past and now
>> support, or if it is still unsupported now...) - so the easiest way is
>> to create a larger overlay image.
> 
> But you were convinced that creating an overlay would play nicely with
> the internal snapshots? ;-)

Yes. As long as I don't mind pointing back to the original file (rather
than the overlay) at the point I attempt to revert to the internal
snapshot, then loading the snapshot doesn't have to worry about a size
change.

> 
> I'm not sure whether that sounds like a use case I'd want to optimize
> for, but, well.

I guess it boils down to whether the optimization is a low-maintenance
freebie.  There's no reason to reject the optimization if a patch
demonstrates it is easy to support and it has iotests coverage.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [Bug 1739304] Re: Passing a directory to (eg.) -cdrom results in misleading error message

2018-01-31 Thread John Snow


On 01/31/2018 01:36 PM, Michal Suchánek wrote:
> On Wed, 20 Dec 2017 21:57:00 -0500
> John Snow  wrote:
> 
>> On 12/20/2017 05:31 AM, Dr. David Alan Gilbert wrote:
>>> Yep, can repeat it here, it seems pretty random which error it
>>> gives:
>>>
>>> [dgilbert@dgilbert-t530 try]$ ./x86_64-softmmu/qemu-system-x86_64
>>> -cdrom /tmp qemu-system-x86_64: -cdrom /tmp: Could not refresh
>>> total sector count: Invalid argument [dgilbert@dgilbert-t530
>>> try]$ ./x86_64-softmmu/qemu-system-x86_64 -cdrom /var
>>> qemu-system-x86_64: -cdrom /var: Could not read image for
>>> determining its format: File too large
>>>
>>>
>>> ** Changed in: qemu
>>>Status: New => Confirmed
>>>   
>>
>> Looks like directories play funny games.
>>
> 
> ...
> 
>> specifically:
>>
>> (gdb) s
>> 1963 size = lseek(s->fd, 0, SEEK_END);
>> (gdb) s
>> 1964 if (size < 0) {
>> (gdb) print size
>> $37 = 9223372036854775807
>>
>> cool, cool, cool. This value is 0x7fff and errno isn't
>> set. cool and good function.
> 
> Indeed: The behavior of lseek() on devices which are incapable of
> seeking is implementation-defined.
> 
>>
>> so, lseek on a folder returns crazy nonsense. Perhaps we ought to
>> actually specifically disallow folders, we don't appear to.
> 
> It probably returns -1 which it is supposed to do on error. It should
> also set errno in that case, though.
> 
> So this is probably bug in the error handling code in lseek.
> 
> Thanks
> 
> Michal
> 

Thanks. It looks like we can't make stronger guarantees about the
behavior of lseek, so I submitted a patch to ratchet down QEMU's
acceptable file types:

https://lists.gnu.org/archive/html/qemu-devel/2018-01/msg05055.html

Thanks,
--John



Re: [Qemu-block] [Qemu-devel] [PATCH v4] file-posix: specify expected filetypes

2018-01-31 Thread John Snow
ping

On 01/19/2018 06:03 PM, Eric Blake wrote:
> On 01/19/2018 04:47 PM, John Snow wrote:
>> Adjust each caller of raw_open_common to specify if they are expecting
>> host and character devices or not. Tighten expectations of file types upon
>> open in the common code and refuse types that are not expected.
>>
>> This has two effects:
>>
>> (1) Character and block devices are now considered deprecated for the
>> 'file' driver, which expects only S_IFREG, and
>> (2) no file-posix driver (file, host_cdrom, or host_device) can open
>> directories now.
>>
>> I don't think there's a legitimate reason to open directories as if
>> they were files. This prevents QEMU from opening and attempting to probe
>> a directory inode, which can break in exciting ways. One of those ways
>> is lseek on ext4/xfs, which will return 0x7fff as the file
>> size instead of EISDIR. This can coax QEMU into responding with a
>> confusing "file too big" instead of "Hey, that's not a file".
>>
>> See: https://bugs.launchpad.net/qemu/+bug/1739304/
>> Signed-off-by: John Snow 
>> ---
> 
> Reviewed-by: Eric Blake 
> 



Re: [Qemu-block] [Qemu-devel] [Bug 1739304] Re: Passing a directory to (eg.) -cdrom results in misleading error message

2018-01-31 Thread Michal Suchánek
On Wed, 20 Dec 2017 21:57:00 -0500
John Snow  wrote:

> On 12/20/2017 05:31 AM, Dr. David Alan Gilbert wrote:
> > Yep, can repeat it here, it seems pretty random which error it
> > gives:
> > 
> > [dgilbert@dgilbert-t530 try]$ ./x86_64-softmmu/qemu-system-x86_64
> > -cdrom /tmp qemu-system-x86_64: -cdrom /tmp: Could not refresh
> > total sector count: Invalid argument [dgilbert@dgilbert-t530
> > try]$ ./x86_64-softmmu/qemu-system-x86_64 -cdrom /var
> > qemu-system-x86_64: -cdrom /var: Could not read image for
> > determining its format: File too large
> > 
> > 
> > ** Changed in: qemu
> >Status: New => Confirmed
> >   
> 
> Looks like directories play funny games.
> 

...

> specifically:
> 
> (gdb) s
> 1963  size = lseek(s->fd, 0, SEEK_END);
> (gdb) s
> 1964  if (size < 0) {
> (gdb) print size
> $37 = 9223372036854775807
> 
> cool, cool, cool. This value is 0x7fff and errno isn't
> set. cool and good function.

Indeed: The behavior of lseek() on devices which are incapable of
seeking is implementation-defined.

> 
> so, lseek on a folder returns crazy nonsense. Perhaps we ought to
> actually specifically disallow folders, we don't appear to.

It probably returns -1 which it is supposed to do on error. It should
also set errno in that case, though.

So this is probably bug in the error handling code in lseek.

Thanks

Michal



Re: [Qemu-block] [PATCH v7 8/9] qcow2: skip writing zero buffers to empty COW areas

2018-01-31 Thread Max Reitz
On 2018-01-31 19:32, Eric Blake wrote:
> On 01/31/2018 11:40 AM, Max Reitz wrote:
> 
>>> Maybe it's good enough to cover these cases:
>>>  1. no backing
>>>  2. beyond bdrv_getlength() in backing
>>>  3. unallocated in backing qcow2 (covers 'beyond bdrv_getlength()
>>>   in backing->file')
>>>
>>> 1 & 2 are easy to check;
>>
>> Not sure how useful 2 is, though.  (I don't know.  I always hear about
>> people wanting to optimize for such a case where a backing file is
>> shorter than the overlay, but I can't imagine a real use case for that.)
> 
> I can; here's what's happened to me personally.  I created an image, and
> took internal snapshots (yeah, I know those aren't in fashion these
> days, but this was long ago).  Later, I ran out of space.  I wanted to
> resize the image, but am not convinced whether resizing the image will
> play nicely with the internal snapshots (in fact, I don't recall
> off-hand whether this was something we prevented in the past and now
> support, or if it is still unsupported now...) - so the easiest way is
> to create a larger overlay image.

But you were convinced that creating an overlay would play nicely with
the internal snapshots? ;-)

I'm not sure whether that sounds like a use case I'd want to optimize
for, but, well.

>>> 3: if that's not too hacky maybe we can do the bdrv_is_allocated() check
>>> for qcow2 exclusively and if there is raw (or any other format) backing
>>> image - do the COW
>>
>> Yes, well, it seems useful in principle...  But it would require general
>> block layer work indeed.  Maybe a new flag for bdrv_block_status*() that
>> says only to look into the format layer?
> 
> Maybe indeed - but first I want my byte-based block_status stuff to land ;)

It could be done as an optimization in a follow-up to this series
anyway, yes.

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v7 8/9] qcow2: skip writing zero buffers to empty COW areas

2018-01-31 Thread Eric Blake
On 01/31/2018 11:40 AM, Max Reitz wrote:

>> Maybe it's good enough to cover these cases:
>>  1. no backing
>>  2. beyond bdrv_getlength() in backing
>>  3. unallocated in backing qcow2 (covers 'beyond bdrv_getlength()
>>   in backing->file')
>>
>> 1 & 2 are easy to check;
> 
> Not sure how useful 2 is, though.  (I don't know.  I always hear about
> people wanting to optimize for such a case where a backing file is
> shorter than the overlay, but I can't imagine a real use case for that.)

I can; here's what's happened to me personally.  I created an image, and
took internal snapshots (yeah, I know those aren't in fashion these
days, but this was long ago).  Later, I ran out of space.  I wanted to
resize the image, but am not convinced whether resizing the image will
play nicely with the internal snapshots (in fact, I don't recall
off-hand whether this was something we prevented in the past and now
support, or if it is still unsupported now...) - so the easiest way is
to create a larger overlay image.

> 
>> 3: if that's not too hacky maybe we can do the bdrv_is_allocated() check
>> for qcow2 exclusively and if there is raw (or any other format) backing
>> image - do the COW
> 
> Yes, well, it seems useful in principle...  But it would require general
> block layer work indeed.  Maybe a new flag for bdrv_block_status*() that
> says only to look into the format layer?

Maybe indeed - but first I want my byte-based block_status stuff to land ;)

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH] block: implement the bdrv_reopen_prepare helper for LUKS driver

2018-01-31 Thread Max Reitz
On 2018-01-18 11:31, Daniel P. Berrange wrote:
> If the bdrv_reopen_prepare helper isn't provided, the qemu-img commit
> command fails to re-open the base layer after committing changes into
> it. Provide a no-op implementation for the LUKS driver, since there
> is not any custom work that needs doing to re-open it.
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  block/crypto.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/block/crypto.c b/block/crypto.c
> index 60ddf8623e..bb9a8f5376 100644
> --- a/block/crypto.c
> +++ b/block/crypto.c
> @@ -382,6 +382,12 @@ static void block_crypto_close(BlockDriverState *bs)
>  qcrypto_block_free(crypto->block);
>  }
>  
> +static int block_crypto_reopen_prepare(BDRVReopenState *state,
> +   BlockReopenQueue *queue, Error **errp)
> +{
> +/* nothing needs checking */
> +return 0;
> +}

Unfortunately I have to admit I'm not quite an expert on reopen
myself...  But generally it is used to change options, so in theory one
could provide a new key-secret here.  It's up to you whether you want to
support that or not (with this implementation the user will get an error
when they want to change the key-secret).

Apart from that, I think a no-op should be OK.

Max

>  
>  /*
>   * 1 MB bounce buffer gives good performance / memory tradeoff
> @@ -620,6 +626,7 @@ BlockDriver bdrv_crypto_luks = {
>  .bdrv_truncate  = block_crypto_truncate,
>  .create_opts= &block_crypto_create_opts_luks,
>  
> +.bdrv_reopen_prepare = block_crypto_reopen_prepare,
>  .bdrv_refresh_limits = block_crypto_refresh_limits,
>  .bdrv_co_preadv = block_crypto_co_preadv,
>  .bdrv_co_pwritev= block_crypto_co_pwritev,
> 




signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v7 8/9] qcow2: skip writing zero buffers to empty COW areas

2018-01-31 Thread Max Reitz
On 2018-01-30 15:23, Anton Nefedov wrote:
> 
> 
> On 29/1/2018 11:28 PM, Max Reitz wrote:
>> On 2018-01-18 18:49, Anton Nefedov wrote:
>>> If COW areas of the newly allocated clusters are zeroes on the
>>> backing image,
>>> efficient bdrv_write_zeroes(flags=BDRV_REQ_ALLOCATE) can be used on
>>> the whole
>>> cluster instead of writing explicit zero buffers later in perform_cow().
>>>
>>> iotest 060:
>>> write to the discarded cluster does not trigger COW anymore.
>>> Use a backing image instead.
>>>
>>> iotest 066:
>>> cluster-alignment areas that were not really COWed are now detected
>>> as zeroes, hence the initial write has to be exactly the same size for
>>> the maps to match
>>>
>>> Signed-off-by: Anton Nefedov 
>>> ---
>>>   qapi/block-core.json   |  4 ++-
>>>   block/qcow2.h  |  6 +
>>>   block/qcow2-cluster.c  |  2 +-
>>>   block/qcow2.c  | 66
>>> --
>>>   block/trace-events |  1 +
>>>   tests/qemu-iotests/060 | 26 +++---
>>>   tests/qemu-iotests/060.out |  5 +++-
>>>   tests/qemu-iotests/066 |  2 +-
>>>   tests/qemu-iotests/066.out |  4 +--
>>>   9 files changed, 98 insertions(+), 18 deletions(-)
>>
>> [...]
>>
>>> @@ -1875,6 +1880,52 @@ static bool is_zero(BlockDriverState *bs,
>>> int64_t offset, int64_t bytes)
>>>   return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == bytes;
>>>   }
>>>   +static bool is_zero_cow(BlockDriverState *bs, QCowL2Meta *m)
>>> +{
>>> +    return is_zero(bs, m->offset + m->cow_start.offset,
>>> +   m->cow_start.nb_bytes) &&
>>> +   is_zero(bs, m->offset + m->cow_end.offset,
>>> m->cow_end.nb_bytes);
>>> +}
>>> +
>>> +static int handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
>>> +{
>>> +    BDRVQcow2State *s = bs->opaque;
>>> +    QCowL2Meta *m;
>>> +
>>> +    for (m = l2meta; m != NULL; m = m->next) {
>>> +    int ret;
>>> +
>>> +    if (!m->cow_start.nb_bytes && !m->cow_end.nb_bytes) {
>>> +    continue;
>>> +    }
>>> +
>>> +    if (bs->encrypted) {
>>> +    continue;
>>> +    }
>>
>> Not sure if the compiler optimizes this anyway, but I'd pull this out of
>> the loop.
>>
> 
> An imprint of the following patches (which were dropped from this
> series) - preallocation ahead of image EOF, which takes action
> regardless of image encryption.
> 
> But I'll leave the check outside the loop until it's needed
> there.
> 
> 
>> Maybe you could put all of the conditions under which this function can
>> actually do something at its beginning: That is, we can't do anything if
>> the BDS is encrypted or if bs->file does not support BDRV_REQ_ALLOCATE
>> (and then you just call this function unconditionally in
>> qcow2_co_pwritev()).
>>
> 
> Done.
> 
>>> +    if (!is_zero_cow(bs, m)) {
>>> +    continue;
>>> +    }
>>
>> Is this really efficient?  I remember someone complaining about
>> bdrv_co_block_status() being kind of slow on some filesystems, so
>> there'd be a tradeoff depending on how it compares to just reading up to
>> two clusters from the backing file -- especially considering that the OS
>> can query the same information just as quickly, and thus the only
>> overhead the read should have is a memset() (assuming the OS is clever).
>>
>> So basically my question is whether it would be better to just skip this
>> if we have any backing file at all and only do this optimization if
>> there is none.
>>
> 
> So what we trade between is
> (read+write) vs (lseek+fallocate or lseek+read+write).
> 
> Indeed if it comes to lseek the profit is smaller, and we're probably
> unlikely to find a hole anyway.
> 
> Maybe it's good enough to cover these cases:
>  1. no backing
>  2. beyond bdrv_getlength() in backing
>  3. unallocated in backing qcow2 (covers 'beyond bdrv_getlength()
>   in backing->file')
> 
> 1 & 2 are easy to check;

Not sure how useful 2 is, though.  (I don't know.  I always hear about
people wanting to optimize for such a case where a backing file is
shorter than the overlay, but I can't imagine a real use case for that.)

> 3: if that's not too hacky maybe we can do the bdrv_is_allocated() check
> for qcow2 exclusively and if there is raw (or any other format) backing
> image - do the COW

Yes, well, it seems useful in principle...  But it would require general
block layer work indeed.  Maybe a new flag for bdrv_block_status*() that
says only to look into the format layer?

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v7 4/9] block: treat BDRV_REQ_ALLOCATE as serialising

2018-01-31 Thread Max Reitz
On 2018-01-30 13:36, Anton Nefedov wrote:
> 
> 
> On 29/1/2018 10:48 PM, Max Reitz wrote:
>> On 2018-01-18 18:49, Anton Nefedov wrote:
>>> The idea is that ALLOCATE requests may overlap with other requests.
>>> Reuse the existing block layer infrastructure for serialising requests.
>>> Use the following approach:
>>>    - mark ALLOCATE serialising, so subsequent requests to the area wait
>>>    - ALLOCATE request itself must never wait if another request is in
>>> flight
>>>  already. Return EAGAIN, let the caller reconsider.
>>>
>>> Signed-off-by: Anton Nefedov 
>>> Reviewed-by: Eric Blake 
>>> ---
>>>   block/io.c | 27 +++
>>>   1 file changed, 19 insertions(+), 8 deletions(-)
>>
>> The basic principle looks good to me.
>>
>>> diff --git a/block/io.c b/block/io.c
>>> index cf2f84c..4b0d34f 100644
>>> --- a/block/io.c
>>> +++ b/block/io.c
>>
>> [...]
>>
>>> @@ -1717,7 +1728,7 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
>>>   struct iovec head_iov;
>>>     mark_request_serialising(&req, align);
>>> -    wait_serialising_requests(&req);
>>> +    wait_serialising_requests(&req, false);
>>
>> What if someone calls bdrv_co_pwritev() with BDRV_REQ_ZERO_WRITE |
>> BDRV_REQ_ALLOCATE?  
> 
> Either
> 
>     assert(!(qiov && (flags & BDRV_REQ_ALLOCATE)));
> 
> will fail or bdrv_co_do_zero_pwritev() will be used.

Ah, right, I didn't see that condition there.

>> .. Then this should do exactly the same as
>> bdrv_co_do_zero_pwritev(), which it currently does not -- besides this
>> serialization, this includes returning -ENOTSUP if there is a head or
>> tail to write.
>>
> 
> Another question is if that assertion is ok.
> In other words: should (qiov!=NULL && REQ_ALLOCATE) be a valid case?
> e.g. with qiov filled with zeroes?

I think it's OK to leave the assertion that way.  But maybe move it
down, under the bdrv_co_do_zero_pwritev() call (and then omit the qiov
!= NULL, because that's guaranteed then)?

(But maybe not everyone's as blind as me.)

> I'd rather document that not supported (and leave the assertion).
> 
> Actually, even (qiov!=NULL && REQ_ZERO_WRITE) looks kind of
> unsupported/broken? Alignment code in bdrv_co_pwritev() zeroes out the
> head and tail by passing the flag down bdrv_aligned_pwritev()

Yes.  Maybe we should have an assertion that you aren't allowed to pass
a qiov with REQ_ZERO_WRITE...

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v7 3/9] block: introduce BDRV_REQ_ALLOCATE flag

2018-01-31 Thread Max Reitz
On 2018-01-30 13:34, Anton Nefedov wrote:
> 
> 
> On 29/1/2018 10:37 PM, Max Reitz wrote:
>> On 2018-01-18 18:49, Anton Nefedov wrote:
>>> The flag is supposed to indicate that the region of the disk image has
>>> to be sufficiently allocated so it reads as zeroes.
>>>
>>> The call with the flag set must return -ENOTSUP if allocation cannot
>>> be done efficiently.
>>> This has to be made sure of by both
>>>    - the drivers that support the flag
>>>    - and the common block layer (so it will not fall back to any
>>> slowpath
>>>  (like writing zero buffers) in case the driver does not support
>>>  the flag).
>>>
>>> Signed-off-by: Anton Nefedov 
>>> Reviewed-by: Eric Blake 
>>> Reviewed-by: Alberto Garcia 
>>> ---
>>>   include/block/block.h |  6 +-
>>>   include/block/block_int.h |  2 +-
>>>   block/io.c    | 20 +---
>>>   3 files changed, 23 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/include/block/block.h b/include/block/block.h
>>> index 9b12774..3e31b89 100644
>>> --- a/include/block/block.h
>>> +++ b/include/block/block.h
>>> @@ -65,9 +65,13 @@ typedef enum {
>>>   BDRV_REQ_NO_SERIALISING = 0x8,
>>>   BDRV_REQ_FUA    = 0x10,
>>>   BDRV_REQ_WRITE_COMPRESSED   = 0x20,
>>> +    /* The BDRV_REQ_ALLOCATE flag is used to indicate that the
>>> driver has to
>>> + * efficiently allocate the space so it reads as zeroes, or
>>> return an error.
>>
>> What happens if you specify this for a normal write operation that does
>> not write zeroes?
>>
>> (I suppose the answer is "don't do that", but that would need to be
>> documented more clearly here.)
>>
> 
> I can't quite come up with what a regular write with ALLOCATE flag can
> suppose to mean.

It could mean that when zero detection is active, that zero range will
be allocated.  But considering ALLOCATE explicitly means "do not write
data", it probably doesn't make sense for data writes in general.

> Will document that.

Thanks!

>>> + */
>>> +    BDRV_REQ_ALLOCATE   = 0x40,
>>>     /* Mask of valid flags */
>>> -    BDRV_REQ_MASK   = 0x3f,
>>> +    BDRV_REQ_MASK   = 0x7f,
>>>   } BdrvRequestFlags;
>>>     typedef struct BlockSizes {
>>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>>> index 29cafa4..b141710 100644
>>> --- a/include/block/block_int.h
>>> +++ b/include/block/block_int.h
>>> @@ -632,7 +632,7 @@ struct BlockDriverState {
>>>   /* Flags honored during pwrite (so far: BDRV_REQ_FUA) */
>>>   unsigned int supported_write_flags;
>>>   /* Flags honored during pwrite_zeroes (so far: BDRV_REQ_FUA,
>>> - * BDRV_REQ_MAY_UNMAP) */
>>> + * BDRV_REQ_MAY_UNMAP, BDRV_REQ_ALLOCATE) */
>>>   unsigned int supported_zero_flags;
>>>     /* the following member gives a name to every node on the bs
>>> graph. */
>>> diff --git a/block/io.c b/block/io.c
>>> index 7ea4023..cf2f84c 100644
>>> --- a/block/io.c
>>> +++ b/block/io.c
>>> @@ -1424,7 +1424,7 @@ static int coroutine_fn
>>> bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>>>   assert(!bs->supported_zero_flags);
>>>   }
>>>   -    if (ret == -ENOTSUP) {
>>> +    if (ret == -ENOTSUP && !(flags & BDRV_REQ_ALLOCATE)) {
>>>   /* Fall back to bounce buffer if write zeroes is
>>> unsupported */
>>>   BdrvRequestFlags write_flags = flags &
>>> ~BDRV_REQ_ZERO_WRITE;
>>>   @@ -1514,8 +1514,8 @@ static int coroutine_fn
>>> bdrv_aligned_pwritev(BdrvChild *child,
>>>   ret =
>>> notifier_with_return_list_notify(&bs->before_write_notifiers, req);
>>>     if (!ret && bs->detect_zeroes !=
>>> BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF &&
>>> -    !(flags & BDRV_REQ_ZERO_WRITE) && drv->bdrv_co_pwrite_zeroes &&
>>> -    qemu_iovec_is_zero(qiov)) {
>>> +    !(flags & BDRV_REQ_ZERO_WRITE) && !(flags &
>>> BDRV_REQ_ALLOCATE) &&
>>> +    drv->bdrv_co_pwrite_zeroes && qemu_iovec_is_zero(qiov)) {
>>
>> Do we really need to add the BDRV_REQ_ALLOCATE check here?  If the
>> caller specifies that flag, then we won't invalidate it by adding the
>> BDRV_REQ_ZERO_WRITE flag (as long as we don't add BDRV_REQ_MAY_UNMAP).
>>
> 
> Now !(flags & BDRV_REQ_ALLOCATE) is always true here, as REQ_ALLOCATE
> implies REQ_ZERO_WRITE.
> But conceptually yes I think the check should only forbid setting
> MAY_UNMAP.
> 
> Offtop: does REQ_ZERO_WRITE override REQ_WRITE_COMPRESSED in this
> function? at least with !REQ_MAY_UNMAP it looks wrong

Looks like zero detection will indeed override compression.  I think
that was intended, but I don't even have an opinion either way.

Of course, it wouldn't be so nice if you tried to compress something and
then, because the zero write failed, you actually write uncompressed
zeroes...  But since zero detection is an optional feature, it might be
your own fault if you enable it when you want compression anyway, and if
you write to some format/protocol combination that doesn't allow z

Re: [Qemu-block] [PATCH v7 1/9] mirror: inherit supported write/zero flags

2018-01-31 Thread Max Reitz
On 2018-01-30 13:15, Anton Nefedov wrote:
> 
> 
> On 29/1/2018 10:26 PM, Eric Blake wrote:
>> On 01/29/2018 01:21 PM, Max Reitz wrote:
>>> On 2018-01-18 18:48, Anton Nefedov wrote:
 Signed-off-by: Anton Nefedov 
 Reviewed-by: Eric Blake 
 Reviewed-by: Alberto Garcia 
 ---
   block/mirror.c | 5 +
   1 file changed, 5 insertions(+)

 diff --git a/block/mirror.c b/block/mirror.c
 index c9badc1..d18ec65 100644
 --- a/block/mirror.c
 +++ b/block/mirror.c
 @@ -1064,6 +1064,11 @@ static void
 bdrv_mirror_top_refresh_filename(BlockDriverState *bs, QDict *opts)
   bdrv_refresh_filename(bs->backing->bs);
   pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
   bs->backing->bs->filename);
 +    bs->supported_write_flags = BDRV_REQ_FUA &
 +    bs->backing->bs->supported_write_flags;
 +    bs->supported_zero_flags =
 +    (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
 +    bs->backing->bs->supported_zero_flags;
   }
     static void bdrv_mirror_top_close(BlockDriverState *bs)
>>>
>>> Fundamentally OK, but why is this in *_refresh_filename()?
>>
>> Indeed, I missed that (or maybe it got moved during a botched rebase?).
>> For comparison, blkdebug sets it during blkdebug_open(), and nbd sets it
>> during nbd_client_init() (called during nbd_open()).
>>
> 
> We need a backing bs here and I believe it's not generally set at the
> time of .bdrv_open().

Well, but *_refresh_filename() is just wrong.

This driver doesn't have .bdrv_open() at all, and we create it fully
manually in mirror_start_job() anyway (as Eric has said).  Therefore, we
can just do this right after bdrv_append() there has succeeded.

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v7 4/9] block: treat BDRV_REQ_ALLOCATE as serialising

2018-01-31 Thread Anton Nefedov



On 31/1/2018 6:11 PM, Alberto Garcia wrote:

On Thu 18 Jan 2018 06:49:02 PM CET, Anton Nefedov wrote:


-static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
+static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self,
+   bool nowait)


It's a bit confusing to have a function called wait_foo() with a
parameter that says "don't wait"...

How about

  check_serialising_requests(BdrvTrackedRequest *self, bool wait)



I think it might be more important to emphasize in the name that the
function _might_ wait.

i.e. it feels worse to read
  check_serialising_requests(req, true);
when one needs to follow the function to find out that it might yield.

Personally I'd vote for

static int check_or_wait_serialising_requests(
BdrvTrackedRequest *self, bool wait) {}

and maybe even:

static int check_serialising_requests(BdrvTrackedRequest *self) {
return check_or_wait_serialising_requests(self, false);

static int wait_serialising_requests(BdrvTrackedRequest *self) {
return check_or_wait_serialising_requests(self, true);
}


-waited = wait_serialising_requests(req);
+waited = wait_serialising_requests(req, flags & BDRV_REQ_ALLOCATE);
+if (waited && flags & BDRV_REQ_ALLOCATE) {
+return -EAGAIN;
+}


I find this more readable (even if not strictly necessary):

if (waited && (flags & BDRV_REQ_ALLOCATE)) {



Done!


None of my two comments are blockers, though, so

Reviewed-by: Alberto Garcia 

Berto





Re: [Qemu-block] [Qemu-devel] [PATCH v2] qcow2: Replace align_offset() with ROUND_UP()

2018-01-31 Thread Philippe Mathieu-Daudé
On 01/31/2018 07:21 AM, Alberto Garcia wrote:
> The align_offset() function is equivalent to the ROUND_UP() macro so
> there's no need to use the former. The ROUND_UP() name is also a bit
> more explicit.
> 
> This patch uses ROUND_UP() instead of the slower QEMU_ALIGN_UP()
> because align_offset() already requires that the second parameter is a
> power of two.
> 
> Signed-off-by: Alberto Garcia 

Reviewed-by: Philippe Mathieu-Daudé 

> ---
>  block/qcow2-bitmap.c   |  4 ++--
>  block/qcow2-cluster.c  |  4 ++--
>  block/qcow2-refcount.c |  4 ++--
>  block/qcow2-snapshot.c | 10 +-
>  block/qcow2.c  | 14 +++---
>  block/qcow2.h  |  6 --
>  6 files changed, 18 insertions(+), 24 deletions(-)
> 
> diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
> index efa10c6663..0a3803aa34 100644
> --- a/block/qcow2-bitmap.c
> +++ b/block/qcow2-bitmap.c
> @@ -413,8 +413,8 @@ static inline void 
> bitmap_dir_entry_to_be(Qcow2BitmapDirEntry *entry)
>  
>  static inline int calc_dir_entry_size(size_t name_size, size_t 
> extra_data_size)
>  {
> -return align_offset(sizeof(Qcow2BitmapDirEntry) +
> -name_size + extra_data_size, 8);
> +int size = sizeof(Qcow2BitmapDirEntry) + name_size + extra_data_size;
> +return ROUND_UP(size, 8);
>  }
>  
>  static inline int dir_entry_size(Qcow2BitmapDirEntry *entry)
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index a3fec27bf9..29d70e1f3e 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -127,11 +127,11 @@ int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t 
> min_size,
>  
>  new_l1_size2 = sizeof(uint64_t) * new_l1_size;
>  new_l1_table = qemu_try_blockalign(bs->file->bs,
> -   align_offset(new_l1_size2, 512));
> +   ROUND_UP(new_l1_size2, 512));
>  if (new_l1_table == NULL) {
>  return -ENOMEM;
>  }
> -memset(new_l1_table, 0, align_offset(new_l1_size2, 512));
> +memset(new_l1_table, 0, ROUND_UP(new_l1_size2, 512));
>  
>  if (s->l1_size) {
>  memcpy(new_l1_table, s->l1_table, s->l1_size * sizeof(uint64_t));
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 92701ab7af..1d520615a8 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -1202,7 +1202,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
>   * l1_table_offset when it is the current s->l1_table_offset! Be careful
>   * when changing this! */
>  if (l1_table_offset != s->l1_table_offset) {
> -l1_table = g_try_malloc0(align_offset(l1_size2, 512));
> +l1_table = g_try_malloc0(ROUND_UP(l1_size2, 512));
>  if (l1_size2 && l1_table == NULL) {
>  ret = -ENOMEM;
>  goto fail;
> @@ -2545,7 +2545,7 @@ int qcow2_check_metadata_overlap(BlockDriverState *bs, 
> int ign, int64_t offset,
>  }
>  
>  /* align range to test to cluster boundaries */
> -size = align_offset(offset_into_cluster(s, offset) + size, 
> s->cluster_size);
> +size = ROUND_UP(offset_into_cluster(s, offset) + size, s->cluster_size);
>  offset = start_of_cluster(s, offset);
>  
>  if ((chk & QCOW2_OL_ACTIVE_L1) && s->l1_size) {
> diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
> index 44243e0e95..cee25f582b 100644
> --- a/block/qcow2-snapshot.c
> +++ b/block/qcow2-snapshot.c
> @@ -66,7 +66,7 @@ int qcow2_read_snapshots(BlockDriverState *bs)
>  
>  for(i = 0; i < s->nb_snapshots; i++) {
>  /* Read statically sized part of the snapshot header */
> -offset = align_offset(offset, 8);
> +offset = ROUND_UP(offset, 8);
>  ret = bdrv_pread(bs->file, offset, &h, sizeof(h));
>  if (ret < 0) {
>  goto fail;
> @@ -155,7 +155,7 @@ static int qcow2_write_snapshots(BlockDriverState *bs)
>  offset = 0;
>  for(i = 0; i < s->nb_snapshots; i++) {
>  sn = s->snapshots + i;
> -offset = align_offset(offset, 8);
> +offset = ROUND_UP(offset, 8);
>  offset += sizeof(h);
>  offset += sizeof(extra);
>  offset += strlen(sn->id_str);
> @@ -215,7 +215,7 @@ static int qcow2_write_snapshots(BlockDriverState *bs)
>  assert(id_str_size <= UINT16_MAX && name_size <= UINT16_MAX);
>  h.id_str_size = cpu_to_be16(id_str_size);
>  h.name_size = cpu_to_be16(name_size);
> -offset = align_offset(offset, 8);
> +offset = ROUND_UP(offset, 8);
>  
>  ret = bdrv_pwrite(bs->file, offset, &h, sizeof(h));
>  if (ret < 0) {
> @@ -441,7 +441,7 @@ int qcow2_snapshot_create(BlockDriverState *bs, 
> QEMUSnapshotInfo *sn_info)
>  /* The VM state isn't needed any more in the active L1 table; in fact, it
>   * hurts by causing expensive COW for the next snapshot. */
>  qcow2_cluster_discard(bs, qcow2_vm_state_offset(s),
> -  align_offset(sn->vm_state_size,

Re: [Qemu-block] [PATCH v2 6/6] block: Deprecate "backing": ""

2018-01-31 Thread Eric Blake
On 01/20/2018 09:44 AM, Max Reitz wrote:
> We have a clear replacement, so let's deprecate it.
> 
> Signed-off-by: Max Reitz 
> ---
>  qapi/block-core.json | 4 ++--
>  block.c  | 4 
>  qemu-doc.texi| 7 +++
>  qemu-options.hx  | 4 ++--
>  4 files changed, 15 insertions(+), 4 deletions(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v2 4/6] qapi: Make more of qobject_to()

2018-01-31 Thread Alberto Garcia
On Sat 20 Jan 2018 04:44:10 PM CET, Max Reitz wrote:
> This patch reworks some places which use either qobject_type() checks
> plus qobject_to(), where the latter alone is sufficient, or NULL checks
> plus qobject_type() checks where we can simply do a qobject_to() != NULL
> check.
>
> Signed-off-by: Max Reitz 

Reviewed-by: Alberto Garcia 

Berto



Re: [Qemu-block] [Qemu-devel] [PATCH v2 2/6] qapi: Replace qobject_to_X(o) by qobject_to(o, X)

2018-01-31 Thread Eric Blake
On 01/31/2018 10:11 AM, Eric Blake wrote:
> On 01/20/2018 09:44 AM, Max Reitz wrote:
>> This patch was generated using the following Coccinelle script:
>>
>> @@
>> expression Obj;
>> @@
>> (
>> - qobject_to_qnum(Obj)
>> + qobject_to(Obj, QNum)
>> |
>> - qobject_to_qstring(Obj)
>> + qobject_to(Obj, QString)
>> |
>> - qobject_to_qdict(Obj)
>> + qobject_to(Obj, QDict)
>> |
>> - qobject_to_qlist(Obj)
>> + qobject_to(Obj, QList)
>> |
>> - qobject_to_qbool(Obj)
>> + qobject_to(Obj, QBool)
>> )
> 
> Worth adding these to scripts/coccinelle/qobject.cocci?

Answering myself - perhaps not, since 3/6 deletes the functions meaning
they won't reappear as something we need to keep checking for.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v2 2/6] qapi: Replace qobject_to_X(o) by qobject_to(o, X)

2018-01-31 Thread Eric Blake
On 01/20/2018 09:44 AM, Max Reitz wrote:
> This patch was generated using the following Coccinelle script:
> 
> @@
> expression Obj;
> @@
> (
> - qobject_to_qnum(Obj)
> + qobject_to(Obj, QNum)
> |
> - qobject_to_qstring(Obj)
> + qobject_to(Obj, QString)
> |
> - qobject_to_qdict(Obj)
> + qobject_to(Obj, QDict)
> |
> - qobject_to_qlist(Obj)
> + qobject_to(Obj, QList)
> |
> - qobject_to_qbool(Obj)
> + qobject_to(Obj, QBool)
> )

Worth adding these to scripts/coccinelle/qobject.cocci?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v2 3/6] qapi: Remove qobject_to_X() functions

2018-01-31 Thread Alberto Garcia
On Sat 20 Jan 2018 04:44:09 PM CET, Max Reitz wrote:
> They are no longer needed now.
>
> Signed-off-by: Max Reitz 

Reviewed-by: Alberto Garcia 

Berto



Re: [Qemu-block] [PATCH v2 2/6] qapi: Replace qobject_to_X(o) by qobject_to(o, X)

2018-01-31 Thread Alberto Garcia
On Sat 20 Jan 2018 04:44:08 PM CET, Max Reitz wrote:
> This patch was generated using the following Coccinelle script:
>
> @@
> expression Obj;
> @@
> (
> - qobject_to_qnum(Obj)
> + qobject_to(Obj, QNum)
> |
> - qobject_to_qstring(Obj)
> + qobject_to(Obj, QString)
> |
> - qobject_to_qdict(Obj)
> + qobject_to(Obj, QDict)
> |
> - qobject_to_qlist(Obj)
> + qobject_to(Obj, QList)
> |
> - qobject_to_qbool(Obj)
> + qobject_to(Obj, QBool)
> )
>
> and a bit of manual fix-up for overly long lines and three places in
> tests/check-qjson.c that Coccinelle did not find.
>
> Signed-off-by: Max Reitz 

Reviewed-by: Alberto Garcia 

Berto



Re: [Qemu-block] [PATCH v2 1/6] qapi: Add qobject_to()

2018-01-31 Thread Alberto Garcia
On Sat 20 Jan 2018 04:44:07 PM CET, Max Reitz wrote:
> This is a dynamic casting macro that, given a QObject type, returns an
> object as that type or NULL if the object is of a different type (or
> NULL itself).
>
> The macro uses lower-case letters because:
> 1. There does not seem to be a hard rule on whether qemu macros have to
>be upper-cased,
> 2. The current situation in qapi/qmp is inconsistent (compare e.g.
>QINCREF() vs. qdict_put()),
> 3. qobject_to() will evaluate its @obj parameter only once, thus it is
>generally not important to the caller whether it is a macro or not,
> 4. I prefer it aesthetically.
>
> Signed-off-by: Max Reitz 

Reviewed-by: Alberto Garcia 

Berto



Re: [Qemu-block] Fwd: [RFC 2/4] qemu-socket: Allow custom socket options in socket_connect

2018-01-31 Thread Daniel P . Berrangé
On Wed, Jan 31, 2018 at 11:20:16PM +0800, Zihan Yang wrote:
> Hi, Daniel
> 
> >You've added all this extra functionality to pass arbitrary
> >options, but then not used it in any of the later patches.
> >We've been trying to remove complexity from this code, so
> >I'm not in favour of adding new functionality that is not
> >even used.
> 
> You are right, unused functionality should not be added. I was thinking
> about future usage, but that seems really unnecessary now.
> 
> >I'm not seeing the point of adding the support for the O_NONBLOCK
> >in the listener socket either - that can easily be turned on after
> >you have the listener socket created.
> 
> I don't quite understand how I can turn it on in socket_listen, because
> socket_listen will create a listener socket inside it, then bind and listen.
> Are there other ways than passing some kind of 'config parameter'?

You don't need to turn it on in socket_listen(). You can just do

   fd = socket_listen()
   qemu_set_nonblock(fd)
   ...do stuff...

> >The O_NONBLOCK functionality makes more sense in this context
> >but the implementation is really broken.
> 
> Well.. sorry for the broken implementation, I guess I need more practice.
> 
> >These functions do hostname lookups, so can never do non-blocking
> >mode correctly as the hostname lookup itself does blocking I/O
> >to the nameserver(s).  Ignoring that, the way you handle the
> >connect() is wrong too. You're iterating over many addresses
> >returned by getaddrinfo() and doing a non-blocking connect
> >on each of them. These will essentially all fail with EAGAIN
> >and you'll skip onto the next address which is wrong.
> 
> Why would the hostname lookup affect the listener socket
> afterwards? The socket is created after the lookup procedure is done.
> Therefore, the config should only affect the listener socket, not the
> hostname lookup process. Would you explain in more detail? I'm not
> an expert in socket programming, so I'm confused.
> 
> Also, connect() indeed could return EAGAIN, however, the continue
> expression is inside the do-while loop of inet_connect_addr(),
> rather than the for loop inside inet_connect_saddr(), which is the
> caller of inet_connect_addr(). So it would just try to connect again
> instead of skipping to next address.

The point of using  O_NONBLOCK is so that the caller does not get
delayed for a long time while network traffic runs.  There are two
places network traffic is used in socket_connect(). First it is
used to resolve the hostname to an IP address via DNS servers, and
second it is used in the TCP connection handshake.

The O_NONBLOCK code only affects the connection handshake, so the
caller can still be delayed an abitrary amount of time by the DNS
lookup.

IOW, if the caller must *not* be delayed, then simply using O_NONBLOCK
is not sufficient to avoid the delay. If the caller can tolerate
delays, then using O_NONBLOCK is pointless.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



[Qemu-block] Fwd: [RFC 2/4] qemu-socket: Allow custom socket options in socket_connect

2018-01-31 Thread Zihan Yang
Hi, Daniel

>You've added all this extra functionality to pass arbitrary
>options, but then not used it in any of the later patches.
>We've been trying to remove complexity from this code, so
>I'm not in favour of adding new functionality that is not
>even used.

You are right, unused functionality should not be added. I was thinking
about future usage, but that seems really unnecessary now.

>I'm not seeing the point of adding the support for the O_NONBLOCK
>in the listener socket either - that can easily be turned on after
>you have the listener socket created.

I don't quite understand how I can turn it on in socket_listen, because
socket_listen will create a listener socket inside it, then bind and listen.
Are there other ways than passing some kind of 'config parameter'?

>The O_NONBLOCK functionality makes more sense in this context
>but the implementation is really broken.

Well.. sorry for the broken implementation, I guess I need more practice.

>These functions do hostname lookups, so can never do non-blocking
>mode correctly as the hostname lookup itself does blocking I/O
>to the nameserver(s).  Ignoring that, the way you handle the
>connect() is wrong too. You're iterating over many addresses
>returned by getaddrinfo() and doing a non-blocking connect
>on each of them. These will essentially all fail with EAGAIN
>and you'll skip onto the next address which is wrong.

Why would the hostname lookup affect the listener socket
afterwards? The socket is created after the lookup procedure is done.
Therefore, the config should only affect the listener socket, not the
hostname lookup process. Would you explain in more detail? I'm not
an expert in socket programming, so I'm confused.

Also, connect() indeed could return EAGAIN, however, the continue
expression is inside the do-while loop of inet_connect_addr(),
rather than the for loop inside inet_connect_saddr(), which is the
caller of inet_connect_addr(). So it would just try to connect again
instead of skipping to next address.

I know this is not a well-written patch, but if you forget about the broken
implementation for a while, it actually won't fail them all. I did a little
test
by connecting two VMs through socket, and they can ping each other.
The docker test fails because I missed the NULL pointer check for sconf,
which I ignored in the ping test above, but it is irrelevant to the problem
here. After I fix it, the test passes.

>This code used to have support for O_NONBLOCK but it was removed
>because the DNS problem means that any code relying on it was
>already broken.  The rest of the QEMU codebase has been converted
>to use QIOChannelSocket instead which can handle non-blocking
>DNS properly

I didn't know this, thanks for clarifying it. Do you mean if I want to use
non-blocking socket, it's better to use QIOChannelSocket instead of
socket_listen/socket_connect? If I want to set some other
options, do I have to use the bind/listen/connect series functions
directly? That seems the way in some functions in net/socket.c.

Anyway, thanks for your comment, it helps clarify things. I had
intended to get start with this task, but it seems to have already been
fully explored before. I guess I need to review my patches throughly,

Regards
Zihan


Re: [Qemu-block] [PATCH v7 4/9] block: treat BDRV_REQ_ALLOCATE as serialising

2018-01-31 Thread Alberto Garcia
On Thu 18 Jan 2018 06:49:02 PM CET, Anton Nefedov wrote:

> -static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
> +static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self,
> +   bool nowait)

It's a bit confusing to have a function called wait_foo() with a
parameter that says "don't wait"...

How about

 check_serialising_requests(BdrvTrackedRequest *self, bool wait)

> -waited = wait_serialising_requests(req);
> +waited = wait_serialising_requests(req, flags & BDRV_REQ_ALLOCATE);
> +if (waited && flags & BDRV_REQ_ALLOCATE) {
> +return -EAGAIN;
> +}

I find this more readable (even if not strictly necessary):

   if (waited && (flags & BDRV_REQ_ALLOCATE)) {

None of my two comments are blockers, though, so

Reviewed-by: Alberto Garcia 

Berto



Re: [Qemu-block] [PATCH] vl: pause vcpus before stopping iothreads

2018-01-31 Thread Kevin Wolf
Am 31.01.2018 um 14:56 hat Stefan Hajnoczi geschrieben:
> On Tue, Jan 30, 2018 at 05:54:56PM +0100, Kevin Wolf wrote:
> > Am 30.01.2018 um 16:38 hat Stefan Hajnoczi geschrieben:
> > > Commit dce8921b2baaf95974af8176406881872067adfa ("iothread: Stop threads
> > > before main() quits") introduced iothread_stop_all() to avoid the
> > > following virtio-scsi assertion failure:
> > > 
> > >   assert(blk_get_aio_context(d->conf.blk) == s->ctx);
> > > 
> > > Back then the assertion failed because when bdrv_close_all() made
> > > d->conf.blk NULL, blk_get_aio_context() returned the global AioContext
> > > instead of s->ctx.
> > > 
> > > The same assertion can still fail today when vcpus submit new I/O
> > > requests after iothread_stop_all() has moved the BDS to the global
> > > AioContext.
> > > 
> > > This patch hardens the iothread_stop_all() approach by pausing vcpus
> > > before calling iothread_stop_all().
> > > 
> > > Note that the assertion failure is a race condition.  It is not possible
> > > to reproduce it reliably.
> > > 
> > > Signed-off-by: Stefan Hajnoczi 
> > 
> > Does pausing the vcpus actually make sure that the iothread isn't active
> > any more, or do we still have a small window where the vcpu is already
> > stopped, but the iothread is still processing requests?
> > 
> > Essentially, I think the bdrv_set_aio_context() in iothread_stop_all()
> > does either not have any effect, or if it does have an effect, it's
> > wrong. You can't just force an in-use BDS into a different AioContext
> > when the user that set the AioContext is still there.
> > 
> > At the very least, do we need a blk_drain_all() before stopping the
> > iothreads?
> 
> bdrv_set_aio_context() contains aio_disable_external() +
> bdrv_parent_drained_begin() + bdrv_drain(bs).  This should complete all
> requests, even those sitting in a descriptor ring that hasn't been
> processed yet.

Ah, yes. Not very obvious, so I wouldn't mind a comment, but you can
have my R-b either way then:

Reviewed-by: Kevin Wolf 

> > It would still just be a hack, the proper way seens to be
> > getting the virtio device out of dataplane mode so that the iothread is
> > actually unused and doesn't just happen to not process something at the
> > moment.
> 
> Agreed, the existing approach is a hack.  I'm not keen on implementing
> a proper device<->IOThread detach operation because vl.c:main() seems to
> be the only place that needs it - and it can get away with just
> quiescing requests and the IOThread instead.

As long as we don't want to switch devices between iothreads at runtime
(and create/delete iothreads over QMP), we probably won't really need
it, yes.

Kevin


signature.asc
Description: PGP signature


Re: [Qemu-block] [PATCH v6 2/2] qemu-img: Document --force-share / -U

2018-01-31 Thread Kevin Wolf
Am 31.01.2018 um 15:12 hat Stefan Hajnoczi geschrieben:
> On Tue, Jan 30, 2018 at 05:38:20PM +0100, Kevin Wolf wrote:
> > Am 30.01.2018 um 15:23 hat Eric Blake geschrieben:
> > > On 01/30/2018 12:34 AM, Fam Zheng wrote:
> > > > Signed-off-by: Fam Zheng 
> > > > Signed-off-by: Kevin Wolf 
> > > > ---
> > > >  qemu-img.texi | 7 +++
> > > >  1 file changed, 7 insertions(+)
> > > > 
> > > > diff --git a/qemu-img.texi b/qemu-img.texi
> > > > index 60a0e080c6..ec7e2f5d1e 100644
> > > > --- a/qemu-img.texi
> > > > +++ b/qemu-img.texi
> > > > @@ -86,6 +86,13 @@ exclusive with the @var{-O} parameters. It is 
> > > > currently required to also use
> > > >  the @var{-n} parameter to skip image creation. This restriction may be 
> > > > relaxed
> > > >  in a future release.
> > > >  
> > > > +@item --force-share (-U)
> > > > +If specified, @code{qemu-img} will open the image in shared mode, 
> > > > allowing
> > > > +concurrent writers. For example, this can be used to get the image 
> > > > information
> > > 
> > > Actually, we only permit one writer at a time.  Would it be better to
> > > say "allowing a concurrent writer"?
> > 
> > As far as qemu-img is concerned, multiple writers to the image are
> > allowed. There may be further restrictions imposed by a writer so that a
> > second writer isn't allowed, but with raw images and share-rw=on nothing
> > prevents two qemu instances writing to the same image while qemu-img is
> > accessing it read-only.
> 
> I agree with what you wrote and it's technically correct, but this is a
> confusing place to mention concurrent writers.  They are a common source
> of user error and that's why locking was introduced in the first place.

So you understand "concurrent writers" as at least two writers? Because
I wouldn't understood it as a writer that is concurrent to the
(read-only) qemu-img.

Maybe rephrase it as "...allowing other processes to write to the
image" then?

> There should be a separate paragraph in docs/qemu-block-drivers.texi
> explaining that share-rw=on can be used safely with format=raw if the
> guests are configured to safely access a shared disk.  It should also
> mention that share-rw=on is unsafe for image formats.

share-rw=on is a -device option and only about the guest, not about the
backend. It is never unsafe if the guest can cope with external writers.
It just doesn't prevent qemu from locking the image file for image
formats.

Kevin


signature.asc
Description: PGP signature


Re: [Qemu-block] [PATCH v6 2/2] qemu-img: Document --force-share / -U

2018-01-31 Thread Stefan Hajnoczi
On Tue, Jan 30, 2018 at 05:38:20PM +0100, Kevin Wolf wrote:
> Am 30.01.2018 um 15:23 hat Eric Blake geschrieben:
> > On 01/30/2018 12:34 AM, Fam Zheng wrote:
> > > Signed-off-by: Fam Zheng 
> > > Signed-off-by: Kevin Wolf 
> > > ---
> > >  qemu-img.texi | 7 +++
> > >  1 file changed, 7 insertions(+)
> > > 
> > > diff --git a/qemu-img.texi b/qemu-img.texi
> > > index 60a0e080c6..ec7e2f5d1e 100644
> > > --- a/qemu-img.texi
> > > +++ b/qemu-img.texi
> > > @@ -86,6 +86,13 @@ exclusive with the @var{-O} parameters. It is 
> > > currently required to also use
> > >  the @var{-n} parameter to skip image creation. This restriction may be 
> > > relaxed
> > >  in a future release.
> > >  
> > > +@item --force-share (-U)
> > > +If specified, @code{qemu-img} will open the image in shared mode, 
> > > allowing
> > > +concurrent writers. For example, this can be used to get the image 
> > > information
> > 
> > Actually, we only permit one writer at a time.  Would it be better to
> > say "allowing a concurrent writer"?
> 
> As far as qemu-img is concerned, multiple writers to the image are
> allowed. There may be further restrictions imposed by a writer so that a
> second writer isn't allowed, but with raw images and share-rw=on nothing
> prevents two qemu instances writing to the same image while qemu-img is
> accessing it read-only.

I agree with what you wrote and it's technically correct, but this is a
confusing place to mention concurrent writers.  They are a common source
of user error and that's why locking was introduced in the first place.

There should be a separate paragraph in docs/qemu-block-drivers.texi
explaining that share-rw=on can be used safely with format=raw if the
guests are configured to safely access a shared disk.  It should also
mention that share-rw=on is unsafe for image formats.

Stefan


signature.asc
Description: PGP signature


Re: [Qemu-block] [PATCH v2] qcow2: Replace align_offset() with ROUND_UP()

2018-01-31 Thread Eric Blake
On 01/31/2018 04:21 AM, Alberto Garcia wrote:
> The align_offset() function is equivalent to the ROUND_UP() macro so
> there's no need to use the former. The ROUND_UP() name is also a bit
> more explicit.
> 
> This patch uses ROUND_UP() instead of the slower QEMU_ALIGN_UP()
> because align_offset() already requires that the second parameter is a
> power of two.
> 
> Signed-off-by: Alberto Garcia 
> ---

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH] vl: pause vcpus before stopping iothreads

2018-01-31 Thread Stefan Hajnoczi
On Tue, Jan 30, 2018 at 05:54:56PM +0100, Kevin Wolf wrote:
> Am 30.01.2018 um 16:38 hat Stefan Hajnoczi geschrieben:
> > Commit dce8921b2baaf95974af8176406881872067adfa ("iothread: Stop threads
> > before main() quits") introduced iothread_stop_all() to avoid the
> > following virtio-scsi assertion failure:
> > 
> >   assert(blk_get_aio_context(d->conf.blk) == s->ctx);
> > 
> > Back then the assertion failed because when bdrv_close_all() made
> > d->conf.blk NULL, blk_get_aio_context() returned the global AioContext
> > instead of s->ctx.
> > 
> > The same assertion can still fail today when vcpus submit new I/O
> > requests after iothread_stop_all() has moved the BDS to the global
> > AioContext.
> > 
> > This patch hardens the iothread_stop_all() approach by pausing vcpus
> > before calling iothread_stop_all().
> > 
> > Note that the assertion failure is a race condition.  It is not possible
> > to reproduce it reliably.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> 
> Does pausing the vcpus actually make sure that the iothread isn't active
> any more, or do we still have a small window where the vcpu is already
> stopped, but the iothread is still processing requests?
> 
> Essentially, I think the bdrv_set_aio_context() in iothread_stop_all()
> does either not have any effect, or if it does have an effect, it's
> wrong. You can't just force an in-use BDS into a different AioContext
> when the user that set the AioContext is still there.
> 
> At the very least, do we need a blk_drain_all() before stopping the
> iothreads?

bdrv_set_aio_context() contains aio_disable_external() +
bdrv_parent_drained_begin() + bdrv_drain(bs).  This should complete all
requests, even those sitting in a descriptor ring that hasn't been
processed yet.

> It would still just be a hack, the proper way seens to be
> getting the virtio device out of dataplane mode so that the iothread is
> actually unused and doesn't just happen to not process something at the
> moment.

Agreed, the existing approach is a hack.  I'm not keen on implementing
a proper device<->IOThread detach operation because vl.c:main() seems to
be the only place that needs it - and it can get away with just
quiescing requests and the IOThread instead.

Stefan


signature.asc
Description: PGP signature


Re: [Qemu-block] [PATCH] qcow2: Replace align_offset() with ROUND_UP()

2018-01-31 Thread Alberto Garcia
On Tue 30 Jan 2018 05:17:47 PM CET, Eric Blake wrote:

 -virtual_size = align_offset(qemu_opt_get_size_del(opts, 
 BLOCK_OPT_SIZE, 0),
 +virtual_size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 
 0),
  cluster_size);
>> 
>> I just realized that the first parameter here is a function call with
>> side effects, it it safe to use ROUND_UP() in this case?
>> 
>> #define ROUND_UP(n, d) (((n) + (d) - 1) & -(0 ? (n) : (d)))
>
> Oh, good catch.  No, we need a temporary variable to hold the result
> of the function call (or we could rewrite ROUND_UP to have
> evaluate-once semantics using gcc/clang extensions).

It was actually not a problem since C guarantees that the (n) in the
conditional expression is only evaluated if the first operand is != 0.
So (n) is only evaluated once.

Anyway, I rewrote it in the new patch as you suggested since it looks a
bit more readable.

Berto



[Qemu-block] [PATCH v2] qcow2: Replace align_offset() with ROUND_UP()

2018-01-31 Thread Alberto Garcia
The align_offset() function is equivalent to the ROUND_UP() macro so
there's no need to use the former. The ROUND_UP() name is also a bit
more explicit.

This patch uses ROUND_UP() instead of the slower QEMU_ALIGN_UP()
because align_offset() already requires that the second parameter is a
power of two.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-bitmap.c   |  4 ++--
 block/qcow2-cluster.c  |  4 ++--
 block/qcow2-refcount.c |  4 ++--
 block/qcow2-snapshot.c | 10 +-
 block/qcow2.c  | 14 +++---
 block/qcow2.h  |  6 --
 6 files changed, 18 insertions(+), 24 deletions(-)

diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
index efa10c6663..0a3803aa34 100644
--- a/block/qcow2-bitmap.c
+++ b/block/qcow2-bitmap.c
@@ -413,8 +413,8 @@ static inline void 
bitmap_dir_entry_to_be(Qcow2BitmapDirEntry *entry)
 
 static inline int calc_dir_entry_size(size_t name_size, size_t extra_data_size)
 {
-return align_offset(sizeof(Qcow2BitmapDirEntry) +
-name_size + extra_data_size, 8);
+int size = sizeof(Qcow2BitmapDirEntry) + name_size + extra_data_size;
+return ROUND_UP(size, 8);
 }
 
 static inline int dir_entry_size(Qcow2BitmapDirEntry *entry)
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index a3fec27bf9..29d70e1f3e 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -127,11 +127,11 @@ int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t 
min_size,
 
 new_l1_size2 = sizeof(uint64_t) * new_l1_size;
 new_l1_table = qemu_try_blockalign(bs->file->bs,
-   align_offset(new_l1_size2, 512));
+   ROUND_UP(new_l1_size2, 512));
 if (new_l1_table == NULL) {
 return -ENOMEM;
 }
-memset(new_l1_table, 0, align_offset(new_l1_size2, 512));
+memset(new_l1_table, 0, ROUND_UP(new_l1_size2, 512));
 
 if (s->l1_size) {
 memcpy(new_l1_table, s->l1_table, s->l1_size * sizeof(uint64_t));
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 92701ab7af..1d520615a8 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1202,7 +1202,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
  * l1_table_offset when it is the current s->l1_table_offset! Be careful
  * when changing this! */
 if (l1_table_offset != s->l1_table_offset) {
-l1_table = g_try_malloc0(align_offset(l1_size2, 512));
+l1_table = g_try_malloc0(ROUND_UP(l1_size2, 512));
 if (l1_size2 && l1_table == NULL) {
 ret = -ENOMEM;
 goto fail;
@@ -2545,7 +2545,7 @@ int qcow2_check_metadata_overlap(BlockDriverState *bs, 
int ign, int64_t offset,
 }
 
 /* align range to test to cluster boundaries */
-size = align_offset(offset_into_cluster(s, offset) + size, 
s->cluster_size);
+size = ROUND_UP(offset_into_cluster(s, offset) + size, s->cluster_size);
 offset = start_of_cluster(s, offset);
 
 if ((chk & QCOW2_OL_ACTIVE_L1) && s->l1_size) {
diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 44243e0e95..cee25f582b 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -66,7 +66,7 @@ int qcow2_read_snapshots(BlockDriverState *bs)
 
 for(i = 0; i < s->nb_snapshots; i++) {
 /* Read statically sized part of the snapshot header */
-offset = align_offset(offset, 8);
+offset = ROUND_UP(offset, 8);
 ret = bdrv_pread(bs->file, offset, &h, sizeof(h));
 if (ret < 0) {
 goto fail;
@@ -155,7 +155,7 @@ static int qcow2_write_snapshots(BlockDriverState *bs)
 offset = 0;
 for(i = 0; i < s->nb_snapshots; i++) {
 sn = s->snapshots + i;
-offset = align_offset(offset, 8);
+offset = ROUND_UP(offset, 8);
 offset += sizeof(h);
 offset += sizeof(extra);
 offset += strlen(sn->id_str);
@@ -215,7 +215,7 @@ static int qcow2_write_snapshots(BlockDriverState *bs)
 assert(id_str_size <= UINT16_MAX && name_size <= UINT16_MAX);
 h.id_str_size = cpu_to_be16(id_str_size);
 h.name_size = cpu_to_be16(name_size);
-offset = align_offset(offset, 8);
+offset = ROUND_UP(offset, 8);
 
 ret = bdrv_pwrite(bs->file, offset, &h, sizeof(h));
 if (ret < 0) {
@@ -441,7 +441,7 @@ int qcow2_snapshot_create(BlockDriverState *bs, 
QEMUSnapshotInfo *sn_info)
 /* The VM state isn't needed any more in the active L1 table; in fact, it
  * hurts by causing expensive COW for the next snapshot. */
 qcow2_cluster_discard(bs, qcow2_vm_state_offset(s),
-  align_offset(sn->vm_state_size, s->cluster_size),
+  ROUND_UP(sn->vm_state_size, s->cluster_size),
   QCOW2_DISCARD_NEVER, false);
 
 #ifdef DEBUG_ALLOC
@@ -710,7 +710,7 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs,
 }
 new_l1_bytes = sn->l1_size * sizeof(uint64_t);
 new_l1_table

Re: [Qemu-block] [RFC 2/4] qemu-socket: Allow custom socket options in socket_connect

2018-01-31 Thread Daniel P . Berrangé
On Tue, Jan 30, 2018 at 03:13:42AM +0800, Zihan Yang wrote:
> Currently, socket_connect doesn't allow custom socket options,
> which is inconvenient when the caller wants a different kind of
> socket from that the socket_connect provides. This patch allows
> custom config in socket_connect by providing an extra parameter.
> Existing functions can just pass a NULL pointer.

Again adding options functionality hwich is not actually used
is bad.

The O_NONBLOCK functionality makes more sense in this context
but the implementation is really broken.

These functions do hostname lookups, so can never do non-blocking
mode correctly as the hostname lookup itself does blocking I/O
to the nameserver(s).  Ignoring that, the way you handle the
connect() is wrong too. You're iterating over many addresses
returned by getaddrinfo() and doing a non-blocking connect
on each of them. These will essentially all fail with EAGAIN
and you'll skip onto the next address which is wrong.

This code used to have support for O_NONBLOCK but it was removed
because the DNS problem means that any code relying on it was
already broken.  The rest of the QEMU codebase has been converted
to use QIOChannelSocket instead which can handle non-blocking
DNS properly

So overall I'm against this patch too.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|