Re: [Qemu-block] [Qemu-devel] [PATCH for-2.12 0/4] qmp dirty bitmap API

2017-12-12 Thread Fam Zheng
On Mon, 11/13 19:20, Vladimir Sementsov-Ogievskiy wrote:
> Hi all.
> 
> There are three qmp commands, needed to implement external backup API.
> 
> Using these three commands, client may do all needed bitmap management by
> hand:
> 
> on backup start we need to do a transaction:
>  {disable old bitmap, create new bitmap}
> 
> on backup success:
>  drop old bitmap
> 
> on backup fail:
>  enable old bitmap
>  merge new bitmap to old bitmap
>  drop new bitmap
> 
> Question: it may be better to make one command instead of two:
> block-dirty-bitmap-set-enabled(bool enabled)
> 
> Vladimir Sementsov-Ogievskiy (4):
>   block/dirty-bitmap: add lock to bdrv_enable/disable_dirty_bitmap
>   qapi: add block-dirty-bitmap-enable/disable
>   qmp: transaction support for block-dirty-bitmap-enable/disable
>   qapi: add block-dirty-bitmap-merge
> 
>  qapi/block-core.json |  80 +++
>  qapi/transaction.json|   4 ++
>  include/block/dirty-bitmap.h |   2 +
>  block/dirty-bitmap.c |  21 ++
>  blockdev.c   | 151 +++
>  5 files changed, 258 insertions(+)
> 

I think tests are required to merge new features/commands.  Can we include
tests for this new code, please?  We should cover error handling, and also
write tests that demonstrate the intended real-world use cases.
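
For example, an iotests-style sketch of the flow from the cover letter could
look roughly like this (just a sketch, not tested; command names are taken
from the patch subjects, while the transaction action names and the merge
arguments are my guesses and should match whatever the series actually
implements):

def backup_cycle(vm, node, old_bitmap, new_bitmap, do_external_backup):
    # on backup start: atomically disable the old bitmap and create a new one
    result = vm.qmp('transaction', actions=[
        {'type': 'block-dirty-bitmap-disable',
         'data': {'node': node, 'name': old_bitmap}},
        {'type': 'block-dirty-bitmap-add',
         'data': {'node': node, 'name': new_bitmap}}])
    assert result == {'return': {}}

    if do_external_backup(old_bitmap):
        # on success: the old bitmap is not needed any more
        vm.qmp('block-dirty-bitmap-remove', node=node, name=old_bitmap)
    else:
        # on failure: re-enable the old bitmap, merge the new writes back
        # into it and drop the temporary bitmap
        vm.qmp('block-dirty-bitmap-enable', node=node, name=old_bitmap)
        vm.qmp('block-dirty-bitmap-merge', node=node,
               dst_name=old_bitmap, src_name=new_bitmap)  # argument names guessed
        vm.qmp('block-dirty-bitmap-remove', node=node, name=new_bitmap)

Such a case would also double as documentation of the intended usage.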

Also should we add new sections to docs/interop/bitmaps.rst?

Meta: John started a long discussion about the API design, but I think in the
end exposing dirty bitmap objects and their primitives is a reasonable
approach to implementing incremental backup functionality. The comment I have
is that we should make sure we have also reviewed it from a higher level
(e.g. all the potential user requirements) so that this low-level API is both
sound and flexible. We shouldn't introduce a minimal set of low-level commands
just to support one particular use case, but rather look a bit further and
broader and come up with a more complete design. Writing docs and tests might
force us to think in this direction, which I think is a good thing to have for
this series, too.

Fam



Re: [Qemu-block] [Qemu-devel] [PATCH] qcow2: get rid of qcow2_backing_read1 routine

2017-12-12 Thread Eric Blake
On 12/12/2017 08:40 AM, Edgar Kaziakhmedov wrote:
> Since bdrv_co_preadv does all neccessary checks including

s/neccessary/necessary/

> reading after the end of the backing file, avoid duplication
> of verification before bdrv_co_preadv call.
> 
> Signed-off-by: Edgar Kaziakhmedov 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/qcow2.c | 51 ---
>  block/qcow2.h |  3 ---
>  2 files changed, 8 insertions(+), 46 deletions(-)
> 

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH for-2.12 0/4] qmp dirty bitmap API

2017-12-12 Thread John Snow


On 12/11/2017 07:18 AM, Vladimir Sementsov-Ogievskiy wrote:
> 11.12.2017 14:15, Kevin Wolf wrote:
>> Am 09.12.2017 um 01:57 hat John Snow geschrieben:
>>> Here's an idea of what this API might look like without revealing
>>> explicit merge/split primitives.
>>>
>>> A new bitmap property that lets us set retention:
>>>
>>> :: block-dirty-bitmap-set-retention bitmap=foo slices=10
>>>
>>> Or something similar, where the default property for all bitmaps is
>>> zero -- the current behavior: no copies retained.
>>>
>>> By setting it to a non-zero positive integer, the incremental backup
>>> mode will automatically save a disabled copy when possible.
>> -EMAGIC
>>
>> Operations that create or delete user-visible objects should be
>> explicit, not automatic. You're trying to implement management layer
>> functionality in qemu here, but incomplete enough that the artifacts of
>> it are still visible externally. (A complete solution within qemu
>> wouldn't expose low-level concepts such as bitmaps on an external
>> interface, but you would expose something like checkpoints.)
>>
>> Usually it's not a good idea to have a design where qemu implements
>> enough to restrict management tools to whatever use case we had in mind,
>> but not enough to make the management tool's life substantially easier
>> (by not having to care about some low-level concepts).
>>
>>> "What happens if we exceed our retention?"
>>>
>>> (A) We push the last one out automatically, or
>>> (B) We fail the operation immediately.
>>>
>>> A is more convenient, but potentially unsafe if the management tool or
>>> user wasn't aware that was going to happen.
>>> B is more annoying, but definitely more safe as it means we cannot lose
>>> a bitmap accidentally.
>> Both mean that the management layer has not only to deal with the
>> deletion of bitmaps as it wants to have them, but also to keep the
>> retention counter somewhere and predict what qemu is going to do to the
>> bitmaps and whether any corrective action needs to be taken.
>>
>> This is making things more complex rather than simpler.
>>
>>> I would argue for B with perhaps a force-cycle=true|false that defaults
>>> to false to let management tools say "Yes, go ahead, remove the old one"
>>> with additionally some return to let us know it happened:
>>>
>>> {"return": {
>>>    "dropped-slices": [ {"bitmap0": 0}, ...]
>>> }}
>>>
>>> This would introduce some concept of bitmap slices into the mix as ID'd
>>> children of a bitmap. I would propose that these slices are numbered and
>>> monotonically increasing. "bitmap0" as an object starts with no slices,
>>> but every incremental backup creates slice 0, slice 1, slice 2, and so
>>> on. Even after we start deleting some, they stay ordered. These numbers
>>> then stand in for points in time.
>>>
>>> The counter can (must?) be reset and all slices forgotten when
>>> performing a full backup while providing a bitmap argument.
>>>
>>> "How can a user make use of the slices once they're made?"
>>>
>>> Let's consider something like mode=partial in contrast to
>>> mode=incremental, and an example where we have 6 prior slices:
>>> 0,1,2,3,4,5, (and, unnamed, the 'active' slice.)
>>>
>>> mode=partial bitmap=foo slice=4
>>>
>>> This would create a backup from slice 4 to the current time α. This
>>> includes all clusters from 4, 5, and the active bitmap.
>>>
>>> I don't think it is meaningful to define any end point that isn't the
>>> current time, so I've omitted that as a possibility.
>> John, what are you doing here? This adds option after option, and even
>> additional slice object, only complicating an easy thing more and more.
>> I'm not sure if that was your intention, but I feel I'm starting to
>> understand better how Linus's rants come about.
>>
>> Let me summarise what this means for management layer:
>>
>> * The management layer has to manage bitmaps. They have direct control
>>    over creation and deletion of bitmaps. So far so good.
>>
>> * It also has to manage slices in those bitmaps objects; and these
>>    slices are what contains the actual bitmaps. In order to identify a
>>    bitmap in qemu, you need:
>>
>>  a) the node name
>>  b) the bitmap ID, and
>>  c) the slice number
>>
>>    The slice number is assigned by qemu and libvirt has to wait until
>>    qemu tells it about the slice number of a newly created slice. If
>>    libvirt doesn't receive the reply to the command that started the
>>    block job, it needs to be able to query this information from qemu,
>>    e.g. in query-block-jobs.
>>
>> * Slices are automatically created when you start a backup job with a
>>    bitmap. It doesn't matter whether you even intend to do an incremental
>>    backup against this point in time. qemu knows better.
>>
>> * In order to delete a slice that you don't need any more, you have to
>>    create more slices (by doing more backups), but you don't get to
>>    decide which one is dropped. qemu helpfully just drops the oldest one.

Re: [Qemu-block] [Qemu-devel] [PATCH] blockjob: kick jobs on set-speed

2017-12-12 Thread John Snow


On 12/11/2017 07:08 PM, Jeff Cody wrote:
> On Mon, Dec 11, 2017 at 06:46:09PM -0500, John Snow wrote:
>> If users set an unreasonably low speed (like one byte per second), the
>> calculated delay may exceed many hours. While we like to punish users
>> for asking for stupid things, we do also like to allow users to correct
>> their wicked ways.
>>
>> When a user provides a new speed, kick the job to allow it to recalculate
>> its delay.
>>
>> Signed-off-by: John Snow 
>> ---
>>  blockjob.c | 5 +
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/blockjob.c b/blockjob.c
>> index 715c2c2680..43f01ad190 100644
>> --- a/blockjob.c
>> +++ b/blockjob.c
>> @@ -483,6 +483,7 @@ static void block_job_completed_txn_success(BlockJob *job)
>>  void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
>>  {
>>  Error *local_err = NULL;
>> +int64_t old_speed = job->speed;
>>  
>>  if (!job->driver->set_speed) {
>>  error_setg(errp, QERR_UNSUPPORTED);
>> @@ -495,6 +496,10 @@ void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
>>  }
>>  
>>  job->speed = speed;
>> +/* Kick the job to recompute its delay */
>> +if ((speed > old_speed) && timer_pending(&job->sleep_timer)) {
> 
> job->sleep_timer is protected by block_job_mutex (via
> block_job_lock/unlock); is it safe for us to check it here outside the
> mutex?
> 

My hunch is that in this specific case it is, but only because of
assumptions about holding the aio_context and the QEMU global mutex here.

> But in any case, I think we could get rid of the timer_pending check, and
> just always kick the job if we have a speed increase.  block_job_enter()
> should do the right thing (mutex protected check on job->busy and
> job->sleep_timer).
> 

I could lock it for inarguable correctness; I just didn't want to kick a
job that didn't actually require any kicking to limit any potential
problems from that interaction.

(I'm fond of the extra conditional because I feel like it makes the
intent of the kick explicit.)

I can remove it.

>> +block_job_enter(job);
>> +}
>>  }
>>  
>>  void block_job_complete(BlockJob *job, Error **errp)
>> -- 
>> 2.14.3
>>
> 



Re: [Qemu-block] [RFC 1/1] virtio: fix IO request length in virtio SCSI/block

2017-12-12 Thread Paolo Bonzini
On 12/12/2017 16:16, Denis V. Lunev wrote:
> The dark side of this patch is that we are tweaking a guest-visible
> parameter, though this should be relatively safe as the transport layer
> support mentioned above has been present in QEMU/host Linux for a very
> long time. The patch adds a configurable property for VirtIO SCSI with
> a new default, and a hardcoded value for VirtIO Block (virtio-blk),
> which does not provide a good configuration framework.

The dark side is fine, but you still need to do backwards-compatibility
properties and also use a property in virtio-blk.

Thanks,

Paolo



Re: [Qemu-block] Block Migration and CPU throttling

2017-12-12 Thread Dr. David Alan Gilbert
* Peter Lieven (p...@kamp.de) wrote:
> Am 21.09.2017 um 14:36 schrieb Dr. David Alan Gilbert:
> > * Peter Lieven (p...@kamp.de) wrote:
> > > Am 19.09.2017 um 16:41 schrieb Dr. David Alan Gilbert:
> > > > * Peter Lieven (p...@kamp.de) wrote:
> > > > > Am 19.09.2017 um 16:38 schrieb Dr. David Alan Gilbert:
> > > > > > * Peter Lieven (p...@kamp.de) wrote:
> > > > > > > Hi,
> > > > > > > 
> > > > > > > I just noticed that CPU throttling and Block Migration don't work 
> > > > > > > together very well.
> > > > > > > During block migration the throttling heuristic detects that we 
> > > > > > > obviously make no progress
> > > > > > > in ram transfer. But the reason is the running block migration 
> > > > > > > and not a too high dirty pages rate.
> > > > > > > 
> > > > > > > The result is that any VM is throttled by 99% during block 
> > > > > > > migration.
> > > > > > Hmm that's unfortunate; do you have a bandwidth set lower than your
> > > > > > actual network connection? I'm just wondering if it's actually going
> > > > > > between the block and RAM iterative sections or getting stuck in one.
> > > > > It happens also if source and dest are on the same machine and speed 
> > > > > is set to 100G.
> > > > But does it happen if they're not and the speed is set low?
> > > Yes, it does. I noticed it in our test environment between different 
> > > nodes with a 10G
> > > link in between. But its totally clear why it happens. During block 
> > > migration we transfer
> > > all dirty memory pages in each round (if there is moderate memory load), 
> > > but all dirty
> > > pages are obviously more than 50% of the transferred ram in that round.
> > > It is more exactly 100%. But the current logic triggers on this condition.
> > > 
> > > I think I will go forward and send a patch which disables auto converge 
> > > during
> > > block migration bulk stage.
> > Yes, that's fair;  it probably would also make sense to throttle the RAM
> > migration during the block migration bulk stage, since the chances are
> > it's not going to get far.  (I think in the nbd setup, the main
> > migration process isn't started until the end of bulk).
> 
> Catching up with the idea of delaying ram migration until block bulk has 
> completed.
> What do you think is the easiest way to achieve this?



I think the answer depends on whether we think this is a 'special case' or we
need a new general-purpose mechanism.

If it was really general then we'd probably want to split the iterative
stage in two somehow, and only do RAM in the second half.

But I'm not sure it's worth it; I suspect the easiest way is:

   a) Add a counter in migration/ram.c or in the RAM state somewhere
   b) Make ram_save_inhibit increment the counter
   c) Check the counter at the head of ram_save_iterate and just exit
 if it's non-zero
   d) Call ram_save_inhibit from block_save_setup
   e) Then release it when you've finished the bulk stage

Make sure you still count the RAM in the pending totals, otherwise
migration might think it's finished a bit early.

Dave

> Peter
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-block] [PATCH v6 09/20] parallels: Switch to .bdrv_co_block_status()

2017-12-12 Thread Vladimir Sementsov-Ogievskiy

12.12.2017 19:32, Eric Blake wrote:

> On 12/11/2017 09:24 AM, Vladimir Sementsov-Ogievskiy wrote:
>
>
>>> On the other hand, '<< BDRV_SECTOR_BITS' only produces the same size
>>> output as the input; if the left side is 32 bits, it risks overflowing.
>>> But '* BDRV_SECTOR_SIZE' always produces a 64-bit value.  So I've
>>> learned (from past mistakes in other byte-conversion series) that the
>>> multiply form is less likely to introduce unintended truncation bugs.
>>
>> hm, now I doubt. I've wrote tiny program:
>>
>> #include 
>> #include 
>>
>> int main() {
>>      uint32_t blocks = 1 << 20;
>>      int block_bits = 15;
>>      uint32_t block_size = 1 << block_bits;
>>      uint64_t a = blocks * block_size;
>>      uint64_t b = blocks << block_bits;
>>      uint64_t c = (uint64_t)blocks * block_size;
>
> Not the same.  'block_size' in your code is 'uint32_t', so your
> multiplication is still 32-bit if you don't cast blocks; while qemu has::
>
> include/block/block.h:#define BDRV_SECTOR_BITS   9
> include/block/block.h:#define BDRV_SECTOR_SIZE   (1ULL << BDRV_SECTOR_BITS)
>
> and the 1ULL in the qemu definition forces 'unsigned long long' results
> whether or not you cast.



Ah, cool, I missed this. And we can't do such a thing for BDRV_SECTOR_BITS
because it would be unspecified behavior.

Thank you for explanation.

--
Best regards,
Vladimir




[Qemu-block] [PATCH] qcow2: get rid of qcow2_backing_read1 routine

2017-12-12 Thread Edgar Kaziakhmedov
Since bdrv_co_preadv does all neccessary checks including
reading after the end of the backing file, avoid duplication
of verification before bdrv_co_preadv call.

Signed-off-by: Edgar Kaziakhmedov 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.c | 51 ---
 block/qcow2.h |  3 ---
 2 files changed, 8 insertions(+), 46 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 1914a940e5..4348b2c0c5 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1672,34 +1672,12 @@ static int64_t coroutine_fn qcow2_co_get_block_status(BlockDriverState *bs,
 return status;
 }
 
-/* handle reading after the end of the backing file */
-int qcow2_backing_read1(BlockDriverState *bs, QEMUIOVector *qiov,
-int64_t offset, int bytes)
-{
-uint64_t bs_size = bs->total_sectors * BDRV_SECTOR_SIZE;
-int n1;
-
-if ((offset + bytes) <= bs_size) {
-return bytes;
-}
-
-if (offset >= bs_size) {
-n1 = 0;
-} else {
-n1 = bs_size - offset;
-}
-
-qemu_iovec_memset(qiov, n1, 0, bytes - n1);
-
-return n1;
-}
-
 static coroutine_fn int qcow2_co_preadv(BlockDriverState *bs, uint64_t offset,
 uint64_t bytes, QEMUIOVector *qiov,
 int flags)
 {
 BDRVQcow2State *s = bs->opaque;
-int offset_in_cluster, n1;
+int offset_in_cluster;
 int ret;
 unsigned int cur_bytes; /* number of bytes in current iteration */
 uint64_t cluster_offset = 0;
@@ -1734,26 +1712,13 @@ static coroutine_fn int qcow2_co_preadv(BlockDriverState *bs, uint64_t offset,
 case QCOW2_CLUSTER_UNALLOCATED:
 
 if (bs->backing) {
-/* read from the base image */
-n1 = qcow2_backing_read1(bs->backing->bs, &hd_qiov,
- offset, cur_bytes);
-if (n1 > 0) {
-QEMUIOVector local_qiov;
-
-qemu_iovec_init(&local_qiov, hd_qiov.niov);
-qemu_iovec_concat(&local_qiov, &hd_qiov, 0, n1);
-
-BLKDBG_EVENT(bs->file, BLKDBG_READ_BACKING_AIO);
-qemu_co_mutex_unlock(&s->lock);
-ret = bdrv_co_preadv(bs->backing, offset, n1,
- &local_qiov, 0);
-qemu_co_mutex_lock(&s->lock);
-
-qemu_iovec_destroy(&local_qiov);
-
-if (ret < 0) {
-goto fail;
-}
+BLKDBG_EVENT(bs->file, BLKDBG_READ_BACKING_AIO);
+qemu_co_mutex_unlock(&s->lock);
+ret = bdrv_co_preadv(bs->backing, offset, cur_bytes,
+ &hd_qiov, 0);
+qemu_co_mutex_lock(&s->lock);
+if (ret < 0) {
+goto fail;
 }
 } else {
 /* Note: in this case, no need to wait */
diff --git a/block/qcow2.h b/block/qcow2.h
index 6f0ff15dd0..46c8cf44ec 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -528,9 +528,6 @@ uint32_t offset_to_reftable_index(BDRVQcow2State *s, uint64_t offset)
 }
 
 /* qcow2.c functions */
-int qcow2_backing_read1(BlockDriverState *bs, QEMUIOVector *qiov,
-  int64_t sector_num, int nb_sectors);
-
 int64_t qcow2_refcount_metadata_size(int64_t clusters, size_t cluster_size,
 int refcount_order, bool generous_increase,
  uint64_t *refblock_count);
-- 
2.11.0




Re: [Qemu-block] [PATCH v6 09/20] parallels: Switch to .bdrv_co_block_status()

2017-12-12 Thread Eric Blake
On 12/11/2017 09:24 AM, Vladimir Sementsov-Ogievskiy wrote:

>> On the other hand, '<< BDRV_SECTOR_BITS' only produces the same size
>> output as the input; if the left side is 32 bits, it risks overflowing.
>> But '* BDRV_SECTOR_SIZE' always produces a 64-bit value.  So I've
>> learned (from past mistakes in other byte-conversion series) that the
>> multiply form is less likely to introduce unintended truncation bugs.
> 
> hm, now I doubt. I've wrote tiny program:
> 
> #include 
> #include 
> 
> int main() {
>     uint32_t blocks = 1 << 20;
>     int block_bits = 15;
>     uint32_t block_size = 1 << block_bits;
>     uint64_t a = blocks * block_size;
>     uint64_t b = blocks << block_bits;
>     uint64_t c = (uint64_t)blocks * block_size;

Not the same.  'block_size' in your code is 'uint32_t', so your
multiplication is still 32-bit if you don't cast blocks; while qemu has::

include/block/block.h:#define BDRV_SECTOR_BITS   9
include/block/block.h:#define BDRV_SECTOR_SIZE   (1ULL << BDRV_SECTOR_BITS)

and the 1ULL in the qemu definition forces 'unsigned long long' results
whether or not you cast.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-block] [PATCH v2 3/3] iotests: add dirty bitmap migration test

2017-12-12 Thread Vladimir Sementsov-Ogievskiy
The test creates two VMs (vm_a, vm_b), creates a dirty bitmap in
the first one, does several writes to the corresponding device and
then migrates vm_a to vm_b with dirty bitmaps.

For now, only migration through shared storage for persistent
bitmaps is available, so that is what is tested here. Only the offline
variant (a kind of suspend-resume) is tested for now, as it is needed
to verify that this case is successfully fixed by the preceding patch.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: John Snow 
---
 tests/qemu-iotests/169 | 82 ++
 tests/qemu-iotests/169.out |  5 +++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 88 insertions(+)
 create mode 100755 tests/qemu-iotests/169
 create mode 100644 tests/qemu-iotests/169.out

diff --git a/tests/qemu-iotests/169 b/tests/qemu-iotests/169
new file mode 100755
index 00..282970df8a
--- /dev/null
+++ b/tests/qemu-iotests/169
@@ -0,0 +1,82 @@
+#!/usr/bin/env python
+#
+# Tests for dirty bitmaps migration.
+#
+# Copyright (c) 2016-2017 Virtuozzo International GmbH. All rights reserved.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+import os
+import iotests
+import time
+from iotests import qemu_img
+
+disk = os.path.join(iotests.test_dir, 'disk')
+migfile = os.path.join(iotests.test_dir, 'migfile')
+
+class TestPersistentDirtyBitmapSuspendResume(iotests.QMPTestCase):
+
+def tearDown(self):
+self.vm_a.shutdown()
+self.vm_b.shutdown()
+os.remove(disk)
+os.remove(migfile)
+
+def setUp(self):
+qemu_img('create', '-f', iotests.imgfmt, disk, '1M')
+
+self.vm_a = iotests.VM(path_suffix='a').add_drive(disk)
+self.vm_a.launch()
+
+self.vm_b = iotests.VM(path_suffix='b').add_drive(disk)
+self.vm_b.add_incoming("exec: cat '" + migfile + "'")
+
+def test_migration_persistent_shared_offline(self):
+""" A kind of suspend-resume """
+granularity = 512
+regions = [
+{ 'start': 0,   'count': 0x1 },
+{ 'start': 0xf, 'count': 0x1 },
+{ 'start': 0xa0201, 'count': 0x1000  }
+]
+
+result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
+   name='bitmap0', granularity=granularity,
+   persistent=True, autoload=True)
+self.assert_qmp(result, 'return', {});
+
+for r in regions:
+self.vm_a.hmp_qemu_io('drive0',
+  'write %d %d' % (r['start'], r['count']))
+
+result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
+   node='drive0', name='bitmap0')
+sha256 = result['return']['sha256']
+
+result = self.vm_a.qmp('migrate', uri='exec:cat>' + migfile)
+self.assert_qmp(result, 'return', {});
+self.assertNotEqual(self.vm_a.event_wait("STOP"), None)
+self.vm_a.shutdown()
+
+self.vm_b.launch()
+self.vm_b.event_wait("RESUME", timeout=10.0)
+
+result = self.vm_b.qmp('x-debug-block-dirty-bitmap-sha256',
+   node='drive0', name='bitmap0')
+self.assert_qmp(result, 'return/sha256', sha256);
+
+
+if __name__ == '__main__':
+iotests.main(supported_fmts=['qcow2'])
diff --git a/tests/qemu-iotests/169.out b/tests/qemu-iotests/169.out
new file mode 100644
index 00..ae1213e6f8
--- /dev/null
+++ b/tests/qemu-iotests/169.out
@@ -0,0 +1,5 @@
+.
+--
+Ran 1 tests
+
+OK
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 3e688678dd..e879c7ebc7 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -169,6 +169,7 @@
 162 auto quick
 163 rw auto quick
 165 rw auto quick
+169 rw auto quick
 170 rw auto quick
 171 rw auto quick
 172 auto
-- 
2.11.1




[Qemu-block] [PATCH v2 2/3] qcow2: handle reopening bitmaps on bdrv_invalidate_cache

2017-12-12 Thread Vladimir Sementsov-Ogievskiy
Consider migration with shared storage. Persistent bitmaps are stored
on bdrv_inactivate. Then, on the destination,
process_incoming_migration_bh() calls bdrv_invalidate_cache_all(), which
leads to qcow2_load_autoloading_dirty_bitmaps(), which fails if the
bitmaps were already loaded when the destination started. In this case
we should call qcow2_reopen_bitmaps_rw instead.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: John Snow 
---
 block/qcow2.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 1914a940e5..f5f205e1b7 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1450,7 +1450,13 @@ static int qcow2_do_open(BlockDriverState *bs, QDict *options, int flags,
 s->autoclear_features &= QCOW2_AUTOCLEAR_MASK;
 }
 
-if (qcow2_load_autoloading_dirty_bitmaps(bs, &local_err)) {
+if (bdrv_has_readonly_bitmaps(bs)) {
+if (!bdrv_is_read_only(bs) && !(bdrv_get_flags(bs) & BDRV_O_INACTIVE)) {
+bool header_updated = false;
+qcow2_reopen_bitmaps_rw_hint(bs, &header_updated, &local_err);
+update_header = update_header && !header_updated;
+}
+} else if (qcow2_load_autoloading_dirty_bitmaps(bs, &local_err)) {
 update_header = false;
 }
 if (local_err != NULL) {
-- 
2.11.1




[Qemu-block] [PATCH v2 0/3] fix bitmaps migration through shared storage

2017-12-12 Thread Vladimir Sementsov-Ogievskiy
Hi all.

This fixes bitmaps migration through shared storage. Look at 02 for
details.

The bug was introduced in 2.10 with the whole qcow2 bitmaps feature, so
qemu-stable is in CC. However, I doubt that anyone has really suffered from it.

v2:
   John, thank you for reviewing v1.
   changes:
add John's r-bs, change s/timeout=10/timeout=10.0/ in last patch
and drop old 03 patch, related to this timeout fix.

Vladimir Sementsov-Ogievskiy (3):
  qcow2-bitmap: add qcow2_reopen_bitmaps_rw_hint()
  qcow2: handle reopening bitmaps on bdrv_invalidate_cache
  iotests: add dirty bitmap migration test

 block/qcow2.h  |  2 ++
 block/qcow2-bitmap.c   | 15 -
 block/qcow2.c  |  8 -
 tests/qemu-iotests/169 | 82 ++
 tests/qemu-iotests/169.out |  5 +++
 tests/qemu-iotests/group   |  1 +
 6 files changed, 111 insertions(+), 2 deletions(-)
 create mode 100755 tests/qemu-iotests/169
 create mode 100644 tests/qemu-iotests/169.out

-- 
2.11.1




[Qemu-block] [PATCH v2 1/3] qcow2-bitmap: add qcow2_reopen_bitmaps_rw_hint()

2017-12-12 Thread Vladimir Sementsov-Ogievskiy
Add a version of qcow2_reopen_bitmaps_rw which does the same work but
also returns a hint about whether the header was updated or not. This
will be used in the following fix for reloading bitmaps after migration.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: John Snow 
---
 block/qcow2.h|  2 ++
 block/qcow2-bitmap.c | 15 ++-
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 6f0ff15dd0..40fa5b7cfe 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -667,6 +667,8 @@ int qcow2_check_bitmaps_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
   void **refcount_table,
   int64_t *refcount_table_size);
 bool qcow2_load_autoloading_dirty_bitmaps(BlockDriverState *bs, Error **errp);
+int qcow2_reopen_bitmaps_rw_hint(BlockDriverState *bs, bool *header_updated,
+ Error **errp);
 int qcow2_reopen_bitmaps_rw(BlockDriverState *bs, Error **errp);
 void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, Error **errp);
 int qcow2_reopen_bitmaps_ro(BlockDriverState *bs, Error **errp);
diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
index f45e46cfbd..6c80dcda69 100644
--- a/block/qcow2-bitmap.c
+++ b/block/qcow2-bitmap.c
@@ -1002,7 +1002,8 @@ fail:
 return false;
 }
 
-int qcow2_reopen_bitmaps_rw(BlockDriverState *bs, Error **errp)
+int qcow2_reopen_bitmaps_rw_hint(BlockDriverState *bs, bool *header_updated,
+ Error **errp)
 {
 BDRVQcow2State *s = bs->opaque;
 Qcow2BitmapList *bm_list;
@@ -1010,6 +1011,10 @@ int qcow2_reopen_bitmaps_rw(BlockDriverState *bs, Error **errp)
 GSList *ro_dirty_bitmaps = NULL;
 int ret = 0;
 
+if (header_updated != NULL) {
+*header_updated = false;
+}
+
 if (s->nb_bitmaps == 0) {
 /* No bitmaps - nothing to do */
 return 0;
@@ -1053,6 +1058,9 @@ int qcow2_reopen_bitmaps_rw(BlockDriverState *bs, Error **errp)
 error_setg_errno(errp, -ret, "Can't update bitmap directory");
 goto out;
 }
+if (header_updated != NULL) {
+*header_updated = true;
+}
 g_slist_foreach(ro_dirty_bitmaps, set_readonly_helper, false);
 }
 
@@ -1063,6 +1071,11 @@ out:
 return ret;
 }
 
+int qcow2_reopen_bitmaps_rw(BlockDriverState *bs, Error **errp)
+{
+return qcow2_reopen_bitmaps_rw_hint(bs, NULL, errp);
+}
+
 /* store_bitmap_data()
  * Store bitmap to image, filling bitmap table accordingly.
  */
-- 
2.11.1




[Qemu-block] [RFC 1/1] virtio: fix IO request length in virtio SCSI/block

2017-12-12 Thread Denis V. Lunev
Linux guests submit IO requests no longer than PAGE_SIZE * max_seg, where
max_seg is the field reported by the SCSI controller. Thus a typical
sequential read of 1 MB results in the following IO pattern from the guest:
  8,16   115754 2.766095122  2071  D   R 2095104 + 1008 [dd]
  8,16   115755 2.766108785  2071  D   R 2096112 + 1008 [dd]
  8,16   115756 2.766113486  2071  D   R 2097120 + 32 [dd]
  8,16   115757 2.767668961 0  C   R 2095104 + 1008 [0]
  8,16   115758 2.768534315 0  C   R 2096112 + 1008 [0]
  8,16   115759 2.768539782 0  C   R 2097120 + 32 [0]
The IO was generated by
  dd if=/dev/sda of=/dev/null bs=1024 iflag=direct

This effectively means that on rotational disks we will observe 3 IOPS
for each 2 MBs processed. This definitely negatively affects both
guest and host IO performance.

The cure is relatively simple - we should report the longer scatter-gather
capability of the SCSI controller. Fortunately the situation here is very
good. The VirtIO transport layer can accommodate 1024 items in one request
while we are using only 128. This has been the case since almost the very
beginning. 2 items are dedicated to request metadata, thus we should
publish VIRTQUEUE_MAX_SIZE - 2 as max_seg.

The following pattern is observed after the patch:
  8,16   1 9921 2.662721340  2063  D   R 2095104 + 1024 [dd]
  8,16   1 9922 2.662737585  2063  D   R 2096128 + 1024 [dd]
  8,16   1 9923 2.665188167 0  C   R 2095104 + 1024 [0]
  8,16   1 9924 2.665198777 0  C   R 2096128 + 1024 [0]
which is much better.

The dark side of this patch is that we are tweaking a guest-visible
parameter, though this should be relatively safe as the transport layer
support mentioned above has been present in QEMU/host Linux for a very
long time. The patch adds a configurable property for VirtIO SCSI with
a new default, and a hardcoded value for VirtIO Block (virtio-blk),
which does not provide a good configuration framework.

Signed-off-by: Denis V. Lunev 
CC: "Michael S. Tsirkin" 
CC: Stefan Hajnoczi 
CC: Kevin Wolf 
CC: Max Reitz 
CC: Paolo Bonzini 
---
Guys, we should decide whether it is acceptable to make such a guest-visible
change, or whether we should tie it to the machine type. We had a very short
discussion with Stefan on IRC, but without a final conclusion.
This patch is posted to start the discussion.

 include/hw/virtio/virtio-scsi.h | 1 +
 hw/block/virtio-blk.c   | 2 +-
 hw/scsi/vhost-scsi.c| 2 ++
 hw/scsi/vhost-user-scsi.c   | 2 ++
 hw/scsi/virtio-scsi.c   | 4 +++-
 5 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index 4c0bcdb..1e5805e 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -49,6 +49,7 @@ struct VirtIOSCSIConf {
 uint32_t num_queues;
 uint32_t virtqueue_size;
 uint32_t max_sectors;
+uint32_t max_segments;
 uint32_t cmd_per_lun;
 #ifdef CONFIG_VHOST_SCSI
 char *vhostfd;
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 05d1440..3f272e1 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -736,7 +736,7 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
 blk_get_geometry(s->blk, &capacity);
 memset(&blkcfg, 0, sizeof(blkcfg));
 virtio_stq_p(vdev, &blkcfg.capacity, capacity);
-virtio_stl_p(vdev, &blkcfg.seg_max, 128 - 2);
+virtio_stl_p(vdev, &blkcfg.seg_max, VIRTQUEUE_MAX_SIZE - 2);
 virtio_stw_p(vdev, &blkcfg.geometry.cylinders, conf->cyls);
 virtio_stl_p(vdev, &blkcfg.blk_size, blk_size);
 virtio_stw_p(vdev, &blkcfg.min_io_size, conf->min_io_size / blk_size);
diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index 9c1bea8..f93eac6 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -238,6 +238,8 @@ static Property vhost_scsi_properties[] = {
 DEFINE_PROP_UINT32("max_sectors", VirtIOSCSICommon, conf.max_sectors,
 0xFFFF),
 DEFINE_PROP_UINT32("cmd_per_lun", VirtIOSCSICommon, conf.cmd_per_lun, 128),
+DEFINE_PROP_UINT32("max_segments", VirtIOSCSICommon, conf.max_segments,
+   VIRTQUEUE_MAX_SIZE - 2),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index f7561e2..8b02ab1 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -146,6 +146,8 @@ static Property vhost_user_scsi_properties[] = {
 DEFINE_PROP_BIT64("param_change", VHostUserSCSI, host_features,
  VIRTIO_SCSI_F_CHANGE,
  true),
+DEFINE_PROP_UINT32("max_segments", VirtIOSCSICommon, conf.max_segments,
+   VIRTQUEUE_MAX_SIZE - 2),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 3aa9971..5404dde 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c

[Qemu-block] [PATCH] block: Don't acquire AioContext in hmp_qemu_io()

2017-12-12 Thread Kevin Wolf
Commit 15afd94a047 added code to acquire and release the AioContext in
qemuio_command(). This means that the lock is taken twice now in the
call path from hmp_qemu_io(). This causes BDRV_POLL_WHILE() to hang for
any requests issued to nodes in a non-mainloop AioContext.

Dropping the first locking from hmp_qemu_io() fixes the problem.

Signed-off-by: Kevin Wolf 
---
 hmp.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/hmp.c b/hmp.c
index 35a7041824..2d72f94193 100644
--- a/hmp.c
+++ b/hmp.c
@@ -2318,7 +2318,6 @@ void hmp_qemu_io(Monitor *mon, const QDict *qdict)
 {
 BlockBackend *blk;
 BlockBackend *local_blk = NULL;
-AioContext *aio_context;
 const char* device = qdict_get_str(qdict, "device");
 const char* command = qdict_get_str(qdict, "command");
 Error *err = NULL;
@@ -2338,9 +2337,6 @@ void hmp_qemu_io(Monitor *mon, const QDict *qdict)
 }
 }
 
-aio_context = blk_get_aio_context(blk);
-aio_context_acquire(aio_context);
-
 /*
  * Notably absent: Proper permission management. This is sad, but it seems
  * almost impossible to achieve without changing the semantics and thereby
@@ -2368,8 +2364,6 @@ void hmp_qemu_io(Monitor *mon, const QDict *qdict)
  */
 qemuio_command(blk, command);
 
-aio_context_release(aio_context);
-
 fail:
 blk_unref(local_blk);
 hmp_handle_error(mon, &err);
-- 
2.13.6




Re: [Qemu-block] [PATCH V4] pci: removed the is_express field since a uniform interface was inserted

2017-12-12 Thread Marcel Apfelbaum

On 12/12/2017 7:36, Yoni Bettan wrote:

 * according to Eduardo Habkost's commit
   fd3b02c8896d597dd8b9e053dec579cf0386aee1

 * since all PCIe devices now implement INTERFACE_PCIE_DEVICE we
   don't need this field anymore

 * Devices that were only INTERFACE_PCIE_DEVICE (is_express == 1)
   or
   devices that were only INTERFACE_CONVENTIONAL_PCI_DEVICE (is_express == 0)
   were not affected by the change

   The only devices that were affected are those that are hybrid and also
   had (is_express == 1) - therefore only:
 - hw/vfio/pci.c
 - hw/usb/hcd-xhci.c

   For both I made sure that QEMU_PCI_CAP_EXPRESS is on

Signed-off-by: Yoni Bettan 
---
  docs/pcie_pci_bridge.txt   | 2 +-
  hw/block/nvme.c| 1 -
  hw/net/e1000e.c| 1 -
  hw/pci-bridge/pcie_pci_bridge.c| 1 -
  hw/pci-bridge/pcie_root_port.c | 1 -
  hw/pci-bridge/xio3130_downstream.c | 1 -
  hw/pci-bridge/xio3130_upstream.c   | 1 -
  hw/pci-host/xilinx-pcie.c  | 1 -
  hw/pci/pci.c   | 8 ++--
  hw/scsi/megasas.c  | 4 
  hw/usb/hcd-xhci.c  | 9 -
  hw/vfio/pci.c  | 5 -
  include/hw/pci/pci.h   | 3 ---
  13 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/docs/pcie_pci_bridge.txt b/docs/pcie_pci_bridge.txt
index 5a4203f97c..ab35ebf3ca 100644
--- a/docs/pcie_pci_bridge.txt
+++ b/docs/pcie_pci_bridge.txt
@@ -110,5 +110,5 @@ To enable device hot-plug into the bridge on Linux there're 
3 ways:
  Implementation
  ==
  The PCIE-PCI bridge is based on PCI-PCI bridge, but also accumulates PCI 
Express
-features as a PCI Express device (is_express=1).
+features as a PCI Express device.
  
diff --git a/hw/block/nvme.c b/hw/block/nvme.c

index 441e21ed1f..9325bc0911 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1087,7 +1087,6 @@ static void nvme_class_init(ObjectClass *oc, void *data)
  pc->vendor_id = PCI_VENDOR_ID_INTEL;
  pc->device_id = 0x5845;
  pc->revision = 2;
-pc->is_express = 1;
  
  set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);

  dc->desc = "Non-Volatile Memory Express";
diff --git a/hw/net/e1000e.c b/hw/net/e1000e.c
index f1af279e8d..c360f0d8c9 100644
--- a/hw/net/e1000e.c
+++ b/hw/net/e1000e.c
@@ -675,7 +675,6 @@ static void e1000e_class_init(ObjectClass *class, void 
*data)
  c->revision = 0;
  c->romfile = "efi-e1000e.rom";
  c->class_id = PCI_CLASS_NETWORK_ETHERNET;
-c->is_express = 1;
  
  dc->desc = "Intel 82574L GbE Controller";

  dc->reset = e1000e_qdev_reset;
diff --git a/hw/pci-bridge/pcie_pci_bridge.c b/hw/pci-bridge/pcie_pci_bridge.c
index a4d827c99d..b7d9ebbec2 100644
--- a/hw/pci-bridge/pcie_pci_bridge.c
+++ b/hw/pci-bridge/pcie_pci_bridge.c
@@ -169,7 +169,6 @@ static void pcie_pci_bridge_class_init(ObjectClass *klass, 
void *data)
  DeviceClass *dc = DEVICE_CLASS(klass);
  HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(klass);
  
-k->is_express = 1;

  k->is_bridge = 1;
  k->vendor_id = PCI_VENDOR_ID_REDHAT;
  k->device_id = PCI_DEVICE_ID_REDHAT_PCIE_BRIDGE;
diff --git a/hw/pci-bridge/pcie_root_port.c b/hw/pci-bridge/pcie_root_port.c
index 9b6e4ce512..45f9e8cd4a 100644
--- a/hw/pci-bridge/pcie_root_port.c
+++ b/hw/pci-bridge/pcie_root_port.c
@@ -145,7 +145,6 @@ static void rp_class_init(ObjectClass *klass, void *data)
  DeviceClass *dc = DEVICE_CLASS(klass);
  PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
  
-k->is_express = 1;

  k->is_bridge = 1;
  k->config_write = rp_write_config;
  k->realize = rp_realize;
diff --git a/hw/pci-bridge/xio3130_downstream.c 
b/hw/pci-bridge/xio3130_downstream.c
index 1e09d2afb7..613a0d6bb7 100644
--- a/hw/pci-bridge/xio3130_downstream.c
+++ b/hw/pci-bridge/xio3130_downstream.c
@@ -177,7 +177,6 @@ static void xio3130_downstream_class_init(ObjectClass 
*klass, void *data)
  DeviceClass *dc = DEVICE_CLASS(klass);
  PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
  
-k->is_express = 1;

  k->is_bridge = 1;
  k->config_write = xio3130_downstream_write_config;
  k->realize = xio3130_downstream_realize;
diff --git a/hw/pci-bridge/xio3130_upstream.c b/hw/pci-bridge/xio3130_upstream.c
index 227997ce46..d4645bddee 100644
--- a/hw/pci-bridge/xio3130_upstream.c
+++ b/hw/pci-bridge/xio3130_upstream.c
@@ -148,7 +148,6 @@ static void xio3130_upstream_class_init(ObjectClass *klass, 
void *data)
  DeviceClass *dc = DEVICE_CLASS(klass);
  PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
  
-k->is_express = 1;

  k->is_bridge = 1;
  k->config_write = xio3130_upstream_write_config;
  k->realize = xio3130_upstream_realize;
diff --git a/hw/pci-host/xilinx-pcie.c b/hw/pci-host/xilinx-pcie.c
index 7659253090..a4ca3ba30f 100644
--- a/hw/pci-host/xilinx-pcie.c

Re: [Qemu-block] [PATCH v4] qemu-img: Document --force-share / -U

2017-12-12 Thread Kevin Wolf
Am 12.12.2017 um 01:51 hat Fam Zheng geschrieben:
> Signed-off-by: Fam Zheng 
> 
> ---
> 
> v4: "images". [Kevin]
> 
> v3: Document that the option is not allowed for read-write. [Stefan]
> 
> v2: - "code{qemu-img}". [Kashyap, Eric]
> - "etc.." -> "etc.".
> ---
>  qemu-img.texi | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/qemu-img.texi b/qemu-img.texi
> index fdcf120f36..d93501f94f 100644
> --- a/qemu-img.texi
> +++ b/qemu-img.texi
> @@ -57,6 +57,15 @@ exclusive with the @var{-O} parameters. It is currently 
> required to also use
>  the @var{-n} parameter to skip image creation. This restriction may be 
> relaxed
>  in a future release.
>  
> +@item --force-share (-U)
> +
> +If specified, @code{qemu-img} will open the image with shared permissions,
> +which makes it less likely to conflict with a running guest's permissions due
> +to image locking. For example, this can be used to get the image information
> +(with 'info' subcommand) when the image is used by a running guest. Note that
> +this could produce inconsistent result because of concurrent metadata 
> changes,

Each time you read a text, you find something new...

s/result/results/

> +etc. This option is only allowed when opening images in read-only mode.
> +

Thanks, fixed the above and applied the patch to block-next.

Kevin



Re: [Qemu-block] [Qemu-devel] [PATCH for-2.12 0/4] qmp dirty bitmap API

2017-12-12 Thread Kevin Wolf
Am 11.12.2017 um 19:40 hat John Snow geschrieben:
> 
> 
> On 12/11/2017 06:15 AM, Kevin Wolf wrote:
> > Am 09.12.2017 um 01:57 hat John Snow geschrieben:
> >> Here's an idea of what this API might look like without revealing
> >> explicit merge/split primitives.
> >>
> >> A new bitmap property that lets us set retention:
> >>
> >> :: block-dirty-bitmap-set-retention bitmap=foo slices=10
> >>
> >> Or something similar, where the default property for all bitmaps is
> >> zero -- the current behavior: no copies retained.
> >>
> >> By setting it to a non-zero positive integer, the incremental backup
> >> mode will automatically save a disabled copy when possible.
> > 
> > -EMAGIC
> > 
> > Operations that create or delete user-visible objects should be
> > explicit, not automatic. You're trying to implement management layer
> > functionality in qemu here, but incomplete enough that the artifacts of
> > it are still visible externally. (A complete solution within qemu
> > wouldn't expose low-level concepts such as bitmaps on an external
> > interface, but you would expose something like checkpoints.)
> > 
> > Usually it's not a good idea to have a design where qemu implements
> > enough to restrict management tools to whatever use case we had in mind,
> > but not enough to make the management tool's life substantially easier
> > (by not having to care about some low-level concepts).
> > 
> >> "What happens if we exceed our retention?"
> >>
> >> (A) We push the last one out automatically, or
> >> (B) We fail the operation immediately.
> >>
> >> A is more convenient, but potentially unsafe if the management tool or
> >> user wasn't aware that was going to happen.
> >> B is more annoying, but definitely more safe as it means we cannot lose
> >> a bitmap accidentally.
> > 
> > Both mean that the management layer has not only to deal with the
> > deletion of bitmaps as it wants to have them, but also to keep the
> > retention counter somewhere and predict what qemu is going to do to the
> > bitmaps and whether any corrective action needs to be taken.
> > 
> > This is making things more complex rather than simpler.
> > 
> >> I would argue for B with perhaps a force-cycle=true|false that defaults
> >> to false to let management tools say "Yes, go ahead, remove the old one"
> >> with additionally some return to let us know it happened:
> >>
> >> {"return": {
> >>   "dropped-slices": [ {"bitmap0": 0}, ...]
> >> }}
> >>
> >> This would introduce some concept of bitmap slices into the mix as ID'd
> >> children of a bitmap. I would propose that these slices are numbered and
> >> monotonically increasing. "bitmap0" as an object starts with no slices,
> >> but every incremental backup creates slice 0, slice 1, slice 2, and so
> >> on. Even after we start deleting some, they stay ordered. These numbers
> >> then stand in for points in time.
> >>
> >> The counter can (must?) be reset and all slices forgotten when
> >> performing a full backup while providing a bitmap argument.
> >>
> >> "How can a user make use of the slices once they're made?"
> >>
> >> Let's consider something like mode=partial in contrast to
> >> mode=incremental, and an example where we have 6 prior slices:
> >> 0,1,2,3,4,5, (and, unnamed, the 'active' slice.)
> >>
> >> mode=partial bitmap=foo slice=4
> >>
> >> This would create a backup from slice 4 to the current time α. This
> >> includes all clusters from 4, 5, and the active bitmap.
> >>
> >> I don't think it is meaningful to define any end point that isn't the
> >> current time, so I've omitted that as a possibility.
> > 
> > John, what are you doing here? This adds option after option, and even
> > additional slice object, only complicating an easy thing more and more.
> > I'm not sure if that was your intention, but I feel I'm starting to
> > understand better how Linus's rants come about.
> > 
> > Let me summarise what this means for management layer:
> > 
> > * The management layer has to manage bitmaps. They have direct control
> >   over creation and deletion of bitmaps. So far so good.
> > 
> > * It also has to manage slices in those bitmaps objects; and these
> >   slices are what contains the actual bitmaps. In order to identify a
> >   bitmap in qemu, you need:
> > 
> > a) the node name
> > b) the bitmap ID, and
> > c) the slice number
> > 
> >   The slice number is assigned by qemu and libvirt has to wait until
> >   qemu tells it about the slice number of a newly created slice. If
> >   libvirt doesn't receive the reply to the command that started the
> >   block job, it needs to be able to query this information from qemu,
> >   e.g. in query-block-jobs.
> > 
> > * Slices are automatically created when you start a backup job with a
> >   bitmap. It doesn't matter whether you even intend to do an incremental
> >   backup against this point in time. qemu knows better.
> > 
> > * In order to delete a slice that you don't need any more, you have to
> >   create more slices (by doing more backups), but you don't get to
> >   decide which one is dropped. qemu helpfully just drops the oldest one.

Re: [Qemu-block] [PATCH v4] qemu-img: Document --force-share / -U

2017-12-12 Thread Kashyap Chamarthy
On Tue, Dec 12, 2017 at 08:51:13AM +0800, Fam Zheng wrote:
> Signed-off-by: Fam Zheng 

[...]

>  qemu-img.texi | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/qemu-img.texi b/qemu-img.texi
> index fdcf120f36..d93501f94f 100644
> --- a/qemu-img.texi
> +++ b/qemu-img.texi
> @@ -57,6 +57,15 @@ exclusive with the @var{-O} parameters. It is currently 
> required to also use
>  the @var{-n} parameter to skip image creation. This restriction may be 
> relaxed
>  in a future release.
>  
> +@item --force-share (-U)
> +
> +If specified, @code{qemu-img} will open the image with shared permissions,
> +which makes it less likely to conflict with a running guest's permissions due
> +to image locking. For example, this can be used to get the image information
> +(with 'info' subcommand) when the image is used by a running guest. Note that
> +this could produce inconsistent result because of concurrent metadata 
> changes,
> +etc. This option is only allowed when opening images in read-only mode.
> +
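
FWIW, in an iotests-style test this could be exercised with something like the
following sketch ('disk' here stands for whatever image path a running guest
has locked):

from iotests import qemu_img_pipe

def check_image_info(disk):
    # read-only query with shared locking; works while a guest holds the lock
    return qemu_img_pipe('info', '-U', disk)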

Reviewed-by: Kashyap Chamarthy 

-- 
/kashyap



Re: [Qemu-block] [PATCH v2 0/2] virtio-blk: miscellaneous changes

2017-12-12 Thread Stefan Hajnoczi
On Mon, Dec 11, 2017 at 09:16:23AM -0600, Mark Kanda wrote:
> v2: add check for maximum queue size [Stefan]
> 
> This series is for two minor virtio-blk changes. The first patch
> makes the virtio-blk queue size user configurable. The second patch
> rejects logical block size > physical block configurations (similar
> to a recent change in virtio-scsi).
> 
> Mark Kanda (2):
>   virtio-blk: make queue size configurable
>   virtio-blk: reject configs with logical block size > physical block
> size
> 
>  hw/block/virtio-blk.c  | 17 -
>  include/hw/virtio/virtio-blk.h |  1 +
>  2 files changed, 17 insertions(+), 1 deletion(-)
> 
> -- 
> 1.8.3.1
> 

Thanks, applied to my block-next tree:
https://github.com/stefanha/qemu/commits/block-next

Stefan


signature.asc
Description: PGP signature


Re: [Qemu-block] [PATCH v4] qemu-img: Document --force-share / -U

2017-12-12 Thread Stefan Hajnoczi
On Tue, Dec 12, 2017 at 08:51:13AM +0800, Fam Zheng wrote:
> Signed-off-by: Fam Zheng 
> 
> ---
> 
> v4: "images". [Kevin]
> 
> v3: Document that the option is not allowed for read-write. [Stefan]
> 
> v2: - "code{qemu-img}". [Kashyap, Eric]
> - "etc.." -> "etc.".
> ---
>  qemu-img.texi | 9 +
>  1 file changed, 9 insertions(+)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature