date:20150727

Re: [Qemu-block] [RFC PATCH 00/11] aio: Introduce handler type to fix nested aio_poll for dataplane

2015-07-27 Thread Paolo Bonzini



On 27/07/2015 08:55, Fam Zheng wrote:
 On Fri, 07/24 09:35, Paolo Bonzini wrote:
 That way, aio_context_acquire can be dropped by:

 /* @pause_owner_thread: a callback which will be called when _main 
 thread_
  * wants exclusively operate on the AioContext, for example with a 
 nested
  * aio_poll().
  * @resume_owner_thread: a callback which will be called when _main 
 thread_
  * has done the exclusive operation.
  */
 AioContext *aio_context_new(AioContextPauseResumeFn *pause_owner_thread,
 AioContextPauseResumeFn 
 *resume_owner_thread,
 void *opaque,
 Error **errp):

 /* Try to pause the owning thread */
 void aio_context_pause(AioContext *ctx, Error **errp);

 /* Try to resume the paused owning thread, cannot fail */
 void aio_context_resume(AioContext *ctx);

 Then, in iothread.c:

 iothread-ctx = aio_context_new(iothread_pause, iothread_resume,
 iothread, local_err);

 Where iothread_pause can block the thread with a QemuCond.

 Condition variables don't mix well with recursive mutexes...  Once we
 have bottom halves outside the AioContext lock, however, we could use a
 separate lock for this condition variable.  That would work.
 
 Yes, I thought so.
 
 I like it, but I still ask myself... what's the difference between
 aio_context_pause/resume and aio_disable/enable_clients? :)  There is
 still state, just in the iothread rather than in the AioContext.
 
 I don't know, maybe this will make aio_context_acquire no longer necessary so
 we get virtio-scsi dataplane fixed?

What you would have is:

1) the main thread calls aio_context_pause/resume, which wait for the
dataplane thread to pause

2) a paused I/O thread use a condition variable or something to wait for
the main thread to call aio_context_resume

In the end, this is still a lock.  For example in RFifoLock the critical
sections look like they are contained within rfifolock_lock and
rfifolock_unlock, but still RFifoLock is a lock and (as far as deadlocks
are concerned) behaves as if there is a single critical section between
rfifolock_lock and rfifolock_unlock.

So, the bdrv_aio_poll approach works great with the huge critical
sections that we have now.  We're effectively using the AioContext lock
_not_ to protect data, but just to shield iothread's usage of block
devices from the dataplane and vice versa.

That worked well as an initial step towards thread-safety (in fact, the
BQL itself is a single huge lock that protects code).  However, we will
however move towards fine-grained critical sections sooner or later;
besides virtio-scsi dataplane, they are also necessary in order to do
real multiqueue with multiple iothreads per BDS.

Once you make your locks protect data, things do become more complex
(more locks, more critical sections, etc.) but they can at the same time
be less scary.  You can reason in terms of invariants that hold at the
end of each critical section, and concurrency is not a big deal anymore
because your new tool (invariants) is both powerful and independent of
concurrency.

You don't have to worry about that now, however.
aio_disable_clients/aio_enable_clients for now can be protected by the
AioContext lock.  Later, it can be protected by whatever fine-grained
lock will protect the list of AioHandlers.

Paolo

[Qemu-block] [RESEND] RFC chaning zlib windowBits for QCOW2

2015-07-27 Thread Peter Lieven


Hi,

I was experimenting a little with optimized zlib versions (like zlib-ng) to 
optimize the speed of compressed qcow2 generation.

I found out that chaning the windowBits from -12 to -15 alone increases 
compression speed by about 15% while lowering
the compressed size by about 6%. As a test case I used a Debian 8.1 disk image.

What is your opinion in chaning the windowBits? The cost is a higher memory usage. 
The usage changes from 272k - 384k for
compression and from 4k - 32k for decompression.

Peter

Re: [Qemu-block] [PATCH V4 3/3] virtio-blk: only clear VIRTIO_F_ANY_LAYOUT for legacy device

2015-07-27 Thread Michael S. Tsirkin

On Mon, Jul 27, 2015 at 12:30:19PM +0200, Paolo Bonzini wrote:
 
 
 On 27/07/2015 11:49, Jason Wang wrote:
  So this patch only clear VIRTIO_F_LAYOUT for legacy device.
  
  Cc: Stefan Hajnoczi stefa...@redhat.com
  Cc: Kevin Wolf kw...@redhat.com
  Cc: qemu-block@nongnu.org
  Signed-off-by: Jason Wang jasow...@redhat.com
  ---
   hw/block/virtio-blk.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)
  
  diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
  index 9acbc3a..1d3f26c 100644
  --- a/hw/block/virtio-blk.c
  +++ b/hw/block/virtio-blk.c
  @@ -731,7 +731,6 @@ static uint64_t virtio_blk_get_features(VirtIODevice 
  *vdev, uint64_t features,
   virtio_add_feature(features, VIRTIO_BLK_F_GEOMETRY);
   virtio_add_feature(features, VIRTIO_BLK_F_TOPOLOGY);
   virtio_add_feature(features, VIRTIO_BLK_F_BLK_SIZE);
  -virtio_clear_feature(features, VIRTIO_F_ANY_LAYOUT);
   if (__virtio_has_feature(features, VIRTIO_F_VERSION_1)) {
   if (s-conf.scsi) {
   error_setg(errp, Virtio 1.0 does not support scsi 
  passthrough!);
  @@ -739,6 +738,7 @@ static uint64_t virtio_blk_get_features(VirtIODevice 
  *vdev, uint64_t features,
   }
   virtio_add_feature(features, VIRTIO_F_ANY_LAYOUT);
   } else {
  +virtio_clear_feature(features, VIRTIO_F_ANY_LAYOUT);
   virtio_add_feature(features, VIRTIO_BLK_F_SCSI);
   }
 
 This patch is unnecessary, since the feature is added back below under
 if (__virtio_has_feature(features, VIRTIO_F_VERSION_1)).
 
 Paolo

It's needed so we can apply
virtio: set any_layout in virtio core

Re: [Qemu-block] [PATCH V4 3/3] virtio-blk: only clear VIRTIO_F_ANY_LAYOUT for legacy device

2015-07-27 Thread Cornelia Huck

On Mon, 27 Jul 2015 14:22:37 +0300
Michael S. Tsirkin m...@redhat.com wrote:

 On Mon, Jul 27, 2015 at 12:30:19PM +0200, Paolo Bonzini wrote:
  
  
  On 27/07/2015 11:49, Jason Wang wrote:
   So this patch only clear VIRTIO_F_LAYOUT for legacy device.
   
   Cc: Stefan Hajnoczi stefa...@redhat.com
   Cc: Kevin Wolf kw...@redhat.com
   Cc: qemu-block@nongnu.org
   Signed-off-by: Jason Wang jasow...@redhat.com
   ---
hw/block/virtio-blk.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
   
   diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
   index 9acbc3a..1d3f26c 100644
   --- a/hw/block/virtio-blk.c
   +++ b/hw/block/virtio-blk.c
   @@ -731,7 +731,6 @@ static uint64_t virtio_blk_get_features(VirtIODevice 
   *vdev, uint64_t features,
virtio_add_feature(features, VIRTIO_BLK_F_GEOMETRY);
virtio_add_feature(features, VIRTIO_BLK_F_TOPOLOGY);
virtio_add_feature(features, VIRTIO_BLK_F_BLK_SIZE);
   -virtio_clear_feature(features, VIRTIO_F_ANY_LAYOUT);
if (__virtio_has_feature(features, VIRTIO_F_VERSION_1)) {
if (s-conf.scsi) {
error_setg(errp, Virtio 1.0 does not support scsi 
   passthrough!);
   @@ -739,6 +738,7 @@ static uint64_t virtio_blk_get_features(VirtIODevice 
   *vdev, uint64_t features,
}
virtio_add_feature(features, VIRTIO_F_ANY_LAYOUT);
} else {
   +virtio_clear_feature(features, VIRTIO_F_ANY_LAYOUT);
virtio_add_feature(features, VIRTIO_BLK_F_SCSI);
}
  
  This patch is unnecessary, since the feature is added back below under
  if (__virtio_has_feature(features, VIRTIO_F_VERSION_1)).
  
  Paolo
 
 It's needed so we can apply
 virtio: set any_layout in virtio core

So what's the plan on all those virtio feature patches? It's hard to
keep track about what is based upon what, and what the end result looks
like. I don't have a good feeling about doing this that late in the 2.4
cycle.

Re: [Qemu-block] [PATCH V4 3/3] virtio-blk: only clear VIRTIO_F_ANY_LAYOUT for legacy device

2015-07-27 Thread Paolo Bonzini



On 27/07/2015 13:22, Michael S. Tsirkin wrote:
  This patch is unnecessary, since the feature is added back below under
  if (__virtio_has_feature(features, VIRTIO_F_VERSION_1)).
 
 It's needed so we can apply
 virtio: set any_layout in virtio core

Ah, okay.

Paolo

[Qemu-block] [PULL for-2.4 1/1] sheepdog: serialize requests to overwrapping area

2015-07-27 Thread Jeff Cody

From: Hitoshi Mitake mitake.hito...@lab.ntt.co.jp

Current sheepdog driver only serializes create requests in oid
unit. This mechanism isn't enough for handling requests to
overwrapping area spanning multiple oids, so it can result bugs like
below:
https://bugs.launchpad.net/sheepdog-project/+bug/1456421

This patch adds a new serialization mechanism for the problem. The
difference from the old one is:
1. serialize entire aiocb if their targetting areas overwrap
2. serialize all requests (read, write, and discard), not only creates

This patch also removes the old mechanism because the new one can be
an alternative.

Cc: Kevin Wolf kw...@redhat.com
Cc: Stefan Hajnoczi stefa...@redhat.com
Cc: Teruaki Ishizaki ishizaki.teru...@lab.ntt.co.jp
Cc: Vasiliy Tolstov v.tols...@selfip.ru
Signed-off-by: Hitoshi Mitake mitake.hito...@lab.ntt.co.jp
Tested-by: Vasiliy Tolstov v.tols...@selfip.ru
Signed-off-by: Jeff Cody jc...@redhat.com
---
 block/sheepdog.c | 152 ++-
 1 file changed, 71 insertions(+), 81 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index bd7cbed..9585beb 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -318,6 +318,10 @@ enum AIOCBState {
 AIOCB_DISCARD_OBJ,
 };
 
+#define AIOCBOverwrapping(x, y) \
+(!(x-max_affect_data_idx  y-min_affect_data_idx  \
+   || y-max_affect_data_idx  x-min_affect_data_idx))
+
 struct SheepdogAIOCB {
 BlockAIOCB common;
 
@@ -334,6 +338,11 @@ struct SheepdogAIOCB {
 
 bool cancelable;
 int nr_pending;
+
+uint32_t min_affect_data_idx;
+uint32_t max_affect_data_idx;
+
+QLIST_ENTRY(SheepdogAIOCB) aiocb_siblings;
 };
 
 typedef struct BDRVSheepdogState {
@@ -362,8 +371,10 @@ typedef struct BDRVSheepdogState {
 
 /* Every aio request must be linked to either of these queues. */
 QLIST_HEAD(inflight_aio_head, AIOReq) inflight_aio_head;
-QLIST_HEAD(pending_aio_head, AIOReq) pending_aio_head;
 QLIST_HEAD(failed_aio_head, AIOReq) failed_aio_head;
+
+CoQueue overwrapping_queue;
+QLIST_HEAD(inflight_aiocb_head, SheepdogAIOCB) inflight_aiocb_head;
 } BDRVSheepdogState;
 
 static const char * sd_strerror(int err)
@@ -498,13 +509,7 @@ static void sd_aio_cancel(BlockAIOCB *blockacb)
 AIOReq *aioreq, *next;
 
 if (sd_acb_cancelable(acb)) {
-/* Remove outstanding requests from pending and failed queues.  */
-QLIST_FOREACH_SAFE(aioreq, s-pending_aio_head, aio_siblings,
-   next) {
-if (aioreq-aiocb == acb) {
-free_aio_req(s, aioreq);
-}
-}
+/* Remove outstanding requests from failed queue.  */
 QLIST_FOREACH_SAFE(aioreq, s-failed_aio_head, aio_siblings,
next) {
 if (aioreq-aiocb == acb) {
@@ -529,6 +534,10 @@ static SheepdogAIOCB *sd_aio_setup(BlockDriverState *bs, 
QEMUIOVector *qiov,
int64_t sector_num, int nb_sectors)
 {
 SheepdogAIOCB *acb;
+uint32_t object_size;
+BDRVSheepdogState *s = bs-opaque;
+
+object_size = (UINT32_C(1)  s-inode.block_size_shift);
 
 acb = qemu_aio_get(sd_aiocb_info, bs, NULL, NULL);
 
@@ -542,6 +551,11 @@ static SheepdogAIOCB *sd_aio_setup(BlockDriverState *bs, 
QEMUIOVector *qiov,
 acb-coroutine = qemu_coroutine_self();
 acb-ret = 0;
 acb-nr_pending = 0;
+
+acb-min_affect_data_idx = acb-sector_num * BDRV_SECTOR_SIZE / 
object_size;
+acb-max_affect_data_idx = (acb-sector_num * BDRV_SECTOR_SIZE +
+  acb-nb_sectors * BDRV_SECTOR_SIZE) / 
object_size;
+
 return acb;
 }
 
@@ -703,38 +717,6 @@ static int reload_inode(BDRVSheepdogState *s, uint32_t 
snapid, const char *tag);
 static int get_sheep_fd(BDRVSheepdogState *s, Error **errp);
 static void co_write_request(void *opaque);
 
-static AIOReq *find_pending_req(BDRVSheepdogState *s, uint64_t oid)
-{
-AIOReq *aio_req;
-
-QLIST_FOREACH(aio_req, s-pending_aio_head, aio_siblings) {
-if (aio_req-oid == oid) {
-return aio_req;
-}
-}
-
-return NULL;
-}
-
-/*
- * This function searchs pending requests to the object `oid', and
- * sends them.
- */
-static void coroutine_fn send_pending_req(BDRVSheepdogState *s, uint64_t oid)
-{
-AIOReq *aio_req;
-SheepdogAIOCB *acb;
-
-while ((aio_req = find_pending_req(s, oid)) != NULL) {
-acb = aio_req-aiocb;
-/* move aio_req from pending list to inflight one */
-QLIST_REMOVE(aio_req, aio_siblings);
-QLIST_INSERT_HEAD(s-inflight_aio_head, aio_req, aio_siblings);
-add_aio_request(s, aio_req, acb-qiov-iov, acb-qiov-niov,
-acb-aiocb_type);
-}
-}
-
 static coroutine_fn void reconnect_to_sdog(void *opaque)
 {
 BDRVSheepdogState *s = opaque;
@@ -840,12 +822,6 @@ static void coroutine_fn aio_read_response(void *opaque)

[Qemu-block] [PULL for-2.4 0/1] block patches for 2.4-rc3

2015-07-27 Thread Jeff Cody

The following changes since commit f793d97e454a56d17e404004867985622ca1a63b:

  Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging 
(2015-07-24 13:07:10 +0100)

are available in the git repository at:


  https://github.com/codyprime/qemu-kvm-jtc.git 
tags/jtc-for-upstream-pull-request

for you to fetch changes up to 87b33736235fe3b64d167bfaf1e39a53a2abef43:

  sheepdog: serialize requests to overwrapping area (2015-07-27 11:17:31 -0400)


Block patches for 2.4-rc3


Hitoshi Mitake (1):
  sheepdog: serialize requests to overwrapping area

 block/sheepdog.c | 152 ++-
 1 file changed, 71 insertions(+), 81 deletions(-)

-- 
1.9.3

Re: [Qemu-block] [PATCH V4 3/3] virtio-blk: only clear VIRTIO_F_ANY_LAYOUT for legacy device

2015-07-27 Thread Michael S. Tsirkin

On Mon, Jul 27, 2015 at 03:28:51PM +0200, Cornelia Huck wrote:
 On Mon, 27 Jul 2015 14:22:37 +0300
 Michael S. Tsirkin m...@redhat.com wrote:
 
  On Mon, Jul 27, 2015 at 12:30:19PM +0200, Paolo Bonzini wrote:
   
   
   On 27/07/2015 11:49, Jason Wang wrote:
So this patch only clear VIRTIO_F_LAYOUT for legacy device.

Cc: Stefan Hajnoczi stefa...@redhat.com
Cc: Kevin Wolf kw...@redhat.com
Cc: qemu-block@nongnu.org
Signed-off-by: Jason Wang jasow...@redhat.com
---
 hw/block/virtio-blk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 9acbc3a..1d3f26c 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -731,7 +731,6 @@ static uint64_t 
virtio_blk_get_features(VirtIODevice *vdev, uint64_t features,
 virtio_add_feature(features, VIRTIO_BLK_F_GEOMETRY);
 virtio_add_feature(features, VIRTIO_BLK_F_TOPOLOGY);
 virtio_add_feature(features, VIRTIO_BLK_F_BLK_SIZE);
-virtio_clear_feature(features, VIRTIO_F_ANY_LAYOUT);
 if (__virtio_has_feature(features, VIRTIO_F_VERSION_1)) {
 if (s-conf.scsi) {
 error_setg(errp, Virtio 1.0 does not support scsi 
passthrough!);
@@ -739,6 +738,7 @@ static uint64_t 
virtio_blk_get_features(VirtIODevice *vdev, uint64_t features,
 }
 virtio_add_feature(features, VIRTIO_F_ANY_LAYOUT);
 } else {
+virtio_clear_feature(features, VIRTIO_F_ANY_LAYOUT);
 virtio_add_feature(features, VIRTIO_BLK_F_SCSI);
 }
   
   This patch is unnecessary, since the feature is added back below under
   if (__virtio_has_feature(features, VIRTIO_F_VERSION_1)).
   
   Paolo
  
  It's needed so we can apply
  virtio: set any_layout in virtio core
 
 So what's the plan on all those virtio feature patches? It's hard to
 keep track about what is based upon what, and what the end result looks
 like.

I pushed everything out to my pci branch, pls take a look.
This is what I have:

b787b35 acpi: fix pvpanic device is not shown in ui
49009db hw/acpi/ich9: clean up stale comment about KVM not supporting SMM
e513e9c hw/acpi/ich9: clear smi_en on reset
c9b11f9 virtio-blk: only clear VIRTIO_F_ANY_LAYOUT for legacy device
efb8206 virtio-blk: fail get_features when both scsi and 1.0 were set
9d5b731 virtio: get_features() can fail
2746269 virtio-pci: fix memory MR cleanup for modern
0a5 virtio: set any_layout in virtio core
cd4bfbb virtio-9p: fix any_layout
7882080 virtio-serial: fix ANY_LAYOUT
5f45607 virtio: hide legacy features from modern guests


 I don't have a good feeling about doing this that late in the 2.4
 cycle.

Well there will always be bugs. Given modern is disabled by default,
even if more bugs surface after release it's not the end of the
world.

-- 
MST

[Qemu-block] [PULL 0/2] Block layer patches for 2.4.0-rc3

2015-07-27 Thread Kevin Wolf

The following changes since commit 122e7dab8ac549c8c5a9e1e13aa2464190e888de:

  Merge remote-tracking branch 'remotes/stefanha/tags/net-pull-request' into 
staging (2015-07-27 14:53:42 +0100)

are available in the git repository at:


  git://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to 77c102c26ead946fe7589d4bddcdfa5cb431ebfe:

  block: qemu-iotests - add check for multiplication overflow in vpc 
(2015-07-27 17:19:07 +0200)


Block layer patches for 2.4.0-rc3


Jeff Cody (2):
  block: vpc - prevent overflow if max_table_entries = 0x4000
  block: qemu-iotests - add check for multiplication overflow in vpc

 block/vpc.c   |  18 +++--
 tests/qemu-iotests/135|  54 ++
 tests/qemu-iotests/135.out|   5 +++
 tests/qemu-iotests/group  |   1 +
 tests/qemu-iotests/sample_images/afl5.img.bz2 | Bin 0 - 175 bytes
 5 files changed, 74 insertions(+), 4 deletions(-)
 create mode 100755 tests/qemu-iotests/135
 create mode 100644 tests/qemu-iotests/135.out
 create mode 100644 tests/qemu-iotests/sample_images/afl5.img.bz2

[Qemu-block] [PATCH v9 08/10] qcow2: Invoke refcount order amendment function

2015-07-27 Thread Max Reitz

Make use of qcow2_change_refcount_order() to support changing the
refcount order with qemu-img amend.

Signed-off-by: Max Reitz mre...@redhat.com
---
 block/qcow2.c | 44 +++-
 1 file changed, 35 insertions(+), 9 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 8b5dc73..ae4477d 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2638,13 +2638,7 @@ static int qcow2_downgrade(BlockDriverState *bs, int 
target_version,
 }
 
 if (s-refcount_order != 4) {
-/* we would have to convert the image to a refcount_order == 4 image
- * here; however, since qemu (at the time of writing this) does not
- * support anything different than 4 anyway, there is no point in doing
- * so right now; however, we should error out (if qemu supports this in
- * the future and this code has not been adapted) */
-error_report(qcow2_downgrade: Image refcount orders other than 4 are 
- currently not supported.);
+error_report(compat=0.10 requires refcount_bits=16);
 return -ENOTSUP;
 }
 
@@ -2692,6 +2686,7 @@ typedef enum Qcow2AmendOperation {
  * invocation from an operation change */
 QCOW2_NO_OPERATION = 0,
 
+QCOW2_CHANGING_REFCOUNT_ORDER,
 QCOW2_DOWNGRADING,
 } Qcow2AmendOperation;
 
@@ -2767,6 +2762,7 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 const char *compat = NULL;
 uint64_t cluster_size = s-cluster_size;
 bool encrypt;
+int refcount_bits = s-refcount_bits;
 int ret;
 QemuOptDesc *desc = opts-list-desc;
 Qcow2AmendHelperCBInfo helper_cb_info;
@@ -2818,8 +2814,16 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 lazy_refcounts = qemu_opt_get_bool(opts, BLOCK_OPT_LAZY_REFCOUNTS,
lazy_refcounts);
 } else if (!strcmp(desc-name, BLOCK_OPT_REFCOUNT_BITS)) {
-error_report(Cannot change refcount entry width);
-return -ENOTSUP;
+refcount_bits = qemu_opt_get_number(opts, BLOCK_OPT_REFCOUNT_BITS,
+refcount_bits);
+
+if (refcount_bits = 0 || refcount_bits  64 ||
+!is_power_of_2(refcount_bits))
+{
+error_report(Refcount width must be a power of two and may 
+ not exceed 64 bits);
+return -EINVAL;
+}
 } else {
 /* if this point is reached, this probably means a new option was
  * added without having it covered here */
@@ -2833,6 +2837,7 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 .original_status_cb = status_cb,
 .original_cb_opaque = cb_opaque,
 .total_operations = (new_version  old_version)
+  + (s-refcount_bits != refcount_bits)
 };
 
 /* Upgrade first (some features may require compat=1.1) */
@@ -2845,6 +2850,27 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 }
 }
 
+if (s-refcount_bits != refcount_bits) {
+int refcount_order = ctz32(refcount_bits);
+Error *local_error = NULL;
+
+if (new_version  3  refcount_bits != 16) {
+error_report(Different refcount widths than 16 bits require 
+ compatibility level 1.1 or above (use compat=1.1 or 
+ greater));
+return -EINVAL;
+}
+
+helper_cb_info.current_operation = QCOW2_CHANGING_REFCOUNT_ORDER;
+ret = qcow2_change_refcount_order(bs, refcount_order,
+  qcow2_amend_helper_cb,
+  helper_cb_info, local_error);
+if (ret  0) {
+error_report_err(local_error);
+return ret;
+}
+}
+
 if (backing_file || backing_format) {
 ret = qcow2_change_backing_file(bs,
 backing_file ?: s-image_backing_file,
-- 
2.4.6

Re: [Qemu-block] [Qemu-devel] [PATCH COLO-BLOCK v7 02/17] quorum: implement block driver interfaces add/delete a BDS's child

2015-07-27 Thread Eric Blake

On 07/02/2015 09:21 AM, Alberto Garcia wrote:

 
 3) I don't think it's necessary to set to NULL the pointers in
 s-bs[i] when i = num_children. There's no way to access those
 pointers anyway. Same for the ' s-bs[s-num_children] = NULL; ' bit
 in quorum_del_child(). I also think that using memset() for setting
 NULL pointers is not portable, although QEMU is already doing this in
 a few places.

 OK, will remove it in the next version. Just a question: why is using
 memset() for setting NULL pointers is not prtable?
 
 The standard allows for null pointers to be internally represented by
 nonzero bit patterns. However I'm not aware of any system that we
 support that does that.
 
http://c-faq.com/null/confusion4.html
http://c-faq.com/null/machexamp.html

What's more, POSIX has very recently taken the stance that memset() to 0
will work on all POSIX systems, even if someone is insane enough to use
a non-zero bit pattern for NULL on modern hardware:

http://austingroupbugs.net/view.php?id=940

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

[Qemu-block] [PATCH v9 00/10] qcow2: Support refcount order amendment

2015-07-27 Thread Max Reitz

(v1..v7 were named qcow2: Support refcount orders != 4)

This series contains the final 10 patches of my qcow2 refcount order
series, which add refcount order amendment functionality to qemu-img.


v9:
- Rebase on master
- Patch 8: s/qerror_report_err/error_report_err/


git backport-diff against v8:

Key:
[] : patches are identical
[] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/10:[] [--] 'progress: Allow regressing progress'
002/10:[] [--] 'block: Add opaque value to the amend CB'
003/10:[] [-C] 'qcow2: Use error_report() in qcow2_amend_options()'
004/10:[] [--] 'qcow2: Use abort() instead of assert(false)'
005/10:[] [--] 'qcow2: Split upgrade/downgrade paths for amend'
006/10:[] [--] 'qcow2: Use intermediate helper CB for amend'
007/10:[] [--] 'qcow2: Add function for refcount order amendment'
008/10:[0002] [FC] 'qcow2: Invoke refcount order amendment function'
009/10:[] [--] 'qcow2: Point to amend function in check'
010/10:[] [--] 'iotests: Extend test 112 for qemu-img amend'


Max Reitz (10):
  progress: Allow regressing progress
  block: Add opaque value to the amend CB
  qcow2: Use error_report() in qcow2_amend_options()
  qcow2: Use abort() instead of assert(false)
  qcow2: Split upgrade/downgrade paths for amend
  qcow2: Use intermediate helper CB for amend
  qcow2: Add function for refcount order amendment
  qcow2: Invoke refcount order amendment function
  qcow2: Point to amend function in check
  iotests: Extend test 112 for qemu-img amend

 block.c|   4 +-
 block/qcow2-cluster.c  |  14 +-
 block/qcow2-refcount.c | 450 +
 block/qcow2.c  | 178 ++
 block/qcow2.h  |   7 +-
 include/block/block.h  |   4 +-
 include/block/block_int.h  |   3 +-
 qemu-img.c |   5 +-
 tests/qemu-iotests/061.out |  14 +-
 tests/qemu-iotests/112 | 109 +++
 tests/qemu-iotests/112.out |  71 +++
 util/qemu-progress.c   |   3 +-
 12 files changed, 803 insertions(+), 59 deletions(-)

-- 
2.4.6

[Qemu-block] [PATCH v9 04/10] qcow2: Use abort() instead of assert(false)

2015-07-27 Thread Max Reitz

Signed-off-by: Max Reitz mre...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
Reviewed-by: Stefan Hajnoczi stefa...@redhat.com
Reviewed-by: Alberto Garcia be...@igalia.com
---
 block/qcow2.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 3bcb96e..a7f50db 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2751,9 +2751,9 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 error_report(Cannot change refcount entry width);
 return -ENOTSUP;
 } else {
-/* if this assertion fails, this probably means a new option was
+/* if this point is reached, this probably means a new option was
  * added without having it covered here */
-assert(false);
+abort();
 }
 
 desc++;
-- 
2.4.6

Re: [Qemu-block] [PATCH v9 08/10] qcow2: Invoke refcount order amendment function

2015-07-27 Thread Eric Blake

On 07/27/2015 09:51 AM, Max Reitz wrote:
 Make use of qcow2_change_refcount_order() to support changing the
 refcount order with qemu-img amend.
 
 Signed-off-by: Max Reitz mre...@redhat.com
 ---
  block/qcow2.c | 44 +++-
  1 file changed, 35 insertions(+), 9 deletions(-)
 

Reviewed-by: Eric Blake ebl...@redhat.com

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

[Qemu-block] [PATCH v9 01/10] progress: Allow regressing progress

2015-07-27 Thread Max Reitz

Progress may regress; this should be displayed correctly by
qemu_progress_print().

While touching that area of code, drop the redundant parentheses in the
same condition.

Signed-off-by: Max Reitz mre...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
Reviewed-by: Stefan Hajnoczi stefa...@redhat.com
---
 util/qemu-progress.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/util/qemu-progress.c b/util/qemu-progress.c
index 4ee5cd0..532333e 100644
--- a/util/qemu-progress.c
+++ b/util/qemu-progress.c
@@ -152,7 +152,8 @@ void qemu_progress_print(float delta, int max)
 state.current = current;
 
 if (current  (state.last_print + state.min_skip) ||
-(current == 100) || (current == 0)) {
+current  (state.last_print - state.min_skip) ||
+current == 100 || current == 0) {
 state.last_print = state.current;
 state.print();
 }
-- 
2.4.6

Re: [Qemu-block] [RESEND] RFC chaning zlib windowBits for QCOW2

2015-07-27 Thread Stefan Hajnoczi

On Mon, Jul 27, 2015 at 2:58 PM, Peter Lieven p...@kamp.de wrote:
 I was experimenting a little with optimized zlib versions (like zlib-ng) to
 optimize the speed of compressed qcow2 generation.

 I found out that chaning the windowBits from -12 to -15 alone increases
 compression speed by about 15% while lowering
 the compressed size by about 6%. As a test case I used a Debian 8.1 disk
 image.

 What is your opinion in chaning the windowBits? The cost is a higher memory
 usage. The usage changes from 272k - 384k for
 compression and from 4k - 32k for decompression.

Kevin is the qcow2 maintainer, but judging purely by the numbers you
posted it seems attractive to make this change.

Does it have any impact on the file format?  Right now -12 is
hard-coded in QEMU, so we need to make sure that existing files and
programs that can access qcow2 files continue to work with both
old/new images.  If compatibility is not possible then you'll have to
investigate feature bits.

Stefan

[Qemu-block] [PATCH v9 10/10] iotests: Extend test 112 for qemu-img amend

2015-07-27 Thread Max Reitz

Add tests for conversion between different refcount widths.

Signed-off-by: Max Reitz mre...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 tests/qemu-iotests/112 | 109 +
 tests/qemu-iotests/112.out |  71 +
 2 files changed, 180 insertions(+)

diff --git a/tests/qemu-iotests/112 b/tests/qemu-iotests/112
index 3f054a3..34ba06a 100755
--- a/tests/qemu-iotests/112
+++ b/tests/qemu-iotests/112
@@ -180,6 +180,115 @@ $QEMU_IMG snapshot -c foo $TEST_IMG
 # leaked (refcount=UINT64_MAX reference=1)
 _check_test_img
 
+echo
+echo '=== Amend from refcount_bits=16 to refcount_bits=1 ==='
+echo
+
+_make_test_img 64M
+print_refcount_bits
+
+$QEMU_IO -c 'write 16M 32M' $TEST_IMG | _filter_qemu_io
+$QEMU_IMG amend -o refcount_bits=1 $TEST_IMG
+_check_test_img
+print_refcount_bits
+
+echo
+echo '=== Amend from refcount_bits=1 to refcount_bits=64 ==='
+echo
+
+$QEMU_IMG amend -o refcount_bits=64 $TEST_IMG
+_check_test_img
+print_refcount_bits
+
+echo
+echo '=== Amend to compat=0.10 ==='
+echo
+
+# Should not work because refcount_bits needs to be 16 for compat=0.10
+$QEMU_IMG amend -o compat=0.10 $TEST_IMG
+print_refcount_bits
+# Should work
+$QEMU_IMG amend -o compat=0.10,refcount_bits=16 $TEST_IMG
+_check_test_img
+print_refcount_bits
+
+# Get back to compat=1.1 and refcount_bits=16
+$QEMU_IMG amend -o compat=1.1 $TEST_IMG
+print_refcount_bits
+# Should not work
+$QEMU_IMG amend -o refcount_bits=32,compat=0.10 $TEST_IMG
+print_refcount_bits
+
+echo
+echo '=== Amend with snapshot ==='
+echo
+
+$QEMU_IMG snapshot -c foo $TEST_IMG
+# Just to have different refcounts across the image
+$QEMU_IO -c 'write 0 16M' $TEST_IMG | _filter_qemu_io
+
+# Should not work (may work in the future by first decreasing all refcounts so
+# they fit into the target range by copying them)
+$QEMU_IMG amend -o refcount_bits=1 $TEST_IMG
+_check_test_img
+print_refcount_bits
+
+# Should work
+$QEMU_IMG amend -o refcount_bits=2 $TEST_IMG
+_check_test_img
+print_refcount_bits
+
+echo
+echo '=== Testing too many references for check ==='
+echo
+
+IMGOPTS=$IMGOPTS,refcount_bits=1 _make_test_img 64M
+print_refcount_bits
+
+# This cluster should be created at 0x5
+$QEMU_IO -c 'write 0 64k' $TEST_IMG | _filter_qemu_io
+# Now make the second L2 entry (the L2 table should be at 0x4) point to 
that
+# cluster, so we have two references
+poke_file $TEST_IMG $((0x40008)) \x80\x00\x00\x00\x00\x05\x00\x00
+
+# This should say please use amend
+_check_test_img -r all
+
+# So we do that
+$QEMU_IMG amend -o refcount_bits=2 $TEST_IMG
+print_refcount_bits
+
+# And try again
+_check_test_img -r all
+
+echo
+echo '=== Multiple walks necessary during amend ==='
+echo
+
+IMGOPTS=$IMGOPTS,refcount_bits=1,cluster_size=512 _make_test_img 64k
+
+# Cluster 0 is the image header, clusters 1 to 4 are used by the L1 table, a
+# single L2 table, the reftable and a single refblock. This creates 58 data
+# clusters (actually, the L2 table is created here, too), so in total there are
+# then 63 used clusters in the image. With a refcount width of 64, one refblock
+# describes 64 clusters (512 bytes / 64 bits/entry = 64 entries), so this will
+# make the first refblock in the amended image have exactly one free entry.
+$QEMU_IO -c write 0 $((58 * 512)) $TEST_IMG | _filter_qemu_io
+
+# Now change the refcount width; since the first new refblock will have exactly
+# one free entry, that entry will be used to store its own reference. No other
+# refblocks are needed, so then the new reftable will be allocated; since the
+# first new refblock is completely filled up, this will require a new refblock
+# which is why the refcount width changing function will need to run through
+# everything one more time until the allocations are stable.
+# Having more walks than usual should be visible as regressing progress (from
+# 66.67 % (2/3 walks) to 50.00 % (2/4 walks)).
+$QEMU_IMG amend -o refcount_bits=64 -p $TEST_IMG | tr '\r' '\n' \
+   | grep -A 1 '66.67'
+print_refcount_bits
+
+_check_test_img
+
 
 # success, all done
 echo '*** done'
diff --git a/tests/qemu-iotests/112.out b/tests/qemu-iotests/112.out
index 9a98633..4438076 100644
--- a/tests/qemu-iotests/112.out
+++ b/tests/qemu-iotests/112.out
@@ -81,4 +81,75 @@ Leaked cluster 6 refcount=1 reference=0
 
 2 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
+
+=== Amend from refcount_bits=16 to refcount_bits=1 ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+refcount bits: 16
+wrote 33554432/33554432 bytes at offset 16777216
+32 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+No errors were found on the image.
+refcount bits: 1
+
+=== Amend from refcount_bits=1 to refcount_bits=64 ===
+
+No errors were found on the image.
+refcount bits: 64
+
+=== Amend to compat=0.10 ===
+
+qemu-img: compat=0.10 requires refcount_bits=16
+qemu-img:

[Qemu-block] [PATCH] raw-posix.c: Make GetBSDPath() handle caching options

2015-07-27 Thread Programmingkid

Add support for caching options that can be specified from
the command line. 

Signed-off-by: John Arbuckle programmingk...@gmail.com

---
 block/raw-posix.c |   15 +--
 1 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index cbe6574..67d1d48 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1973,8 +1973,8 @@ BlockDriver bdrv_file = {
 
 #if defined(__APPLE__)  defined(__MACH__)
 static kern_return_t FindEjectableCDMedia( io_iterator_t *mediaIterator );
-static kern_return_t GetBSDPath( io_iterator_t mediaIterator, char *bsdPath, 
CFIndex maxPathSize );
-
+static kern_return_t GetBSDPath(io_iterator_t mediaIterator, char *bsdPath,
+CFIndex maxPathSize, int flags);
 kern_return_t FindEjectableCDMedia( io_iterator_t *mediaIterator )
 {
 kern_return_t   kernResult;
@@ -2001,7 +2001,8 @@ kern_return_t FindEjectableCDMedia( io_iterator_t 
*mediaIterator )
 return kernResult;
 }
 
-kern_return_t GetBSDPath( io_iterator_t mediaIterator, char *bsdPath, CFIndex 
maxPathSize )
+kern_return_t GetBSDPath(io_iterator_t mediaIterator, char *bsdPath,
+ CFIndex maxPathSize, int flags)
 {
 io_object_t nextMedia;
 kern_return_t   kernResult = KERN_FAILURE;
@@ -2014,7 +2015,9 @@ kern_return_t GetBSDPath( io_iterator_t mediaIterator, 
char *bsdPath, CFIndex ma
 if ( bsdPathAsCFString ) {
 size_t devPathLength;
 strcpy( bsdPath, _PATH_DEV );
-strcat( bsdPath, r );
+if (flags  BDRV_O_NOCACHE) {
+strcat(bsdPath, r);
+}
 devPathLength = strlen( bsdPath );
 if ( CFStringGetCString( bsdPathAsCFString, bsdPath + 
devPathLength, maxPathSize - devPathLength, kCFStringEncodingASCII ) ) {
 kernResult = KERN_SUCCESS;
@@ -2126,8 +2129,8 @@ static int hdev_open(BlockDriverState *bs, QDict 
*options, int flags,
 int fd;
 
 kernResult = FindEjectableCDMedia( mediaIterator );
-kernResult = GetBSDPath( mediaIterator, bsdPath, sizeof( bsdPath ) );
-
+kernResult = GetBSDPath(mediaIterator, bsdPath, sizeof(bsdPath),
+flags);
 if ( bsdPath[ 0 ] != '\0' ) {
 strcat(bsdPath,s0);
 /* some CDs don't have a partition 0 */
-- 
1.7.5.4

[Qemu-block] [PATCH v5] raw-posix.c: Make physical devices usable in QEMU under Mac OS X host

2015-07-27 Thread Programmingkid

Mac OS X can be picky when it comes to allowing the user to use physical devices
in QEMU. Most mounted volumes appear to be off limits to QEMU. If an issue is
detected, a message is displayed showing the user how to unmount a volume.

Signed-off-by: John Arbuckle programmingk...@gmail.com

---
Removed changes to GetBSDPath() to a separate patch.
This patch now depends on the GetBSDPath patch.

 block/raw-posix.c |   90 +---
 1 files changed, 64 insertions(+), 26 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 67d1d48..a41006f 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -42,9 +42,8 @@
 #include IOKit/storage/IOMediaBSDClient.h
 #include IOKit/storage/IOMedia.h
 #include IOKit/storage/IOCDMedia.h
-//#include IOKit/storage/IOCDTypes.h
 #include CoreFoundation/CoreFoundation.h
-#endif
+#endif /* (__APPLE__)  (__MACH__) */
 
 #ifdef __sun__
 #define _POSIX_PTHREAD_SEMANTICS 1
@@ -1972,7 +1971,7 @@ BlockDriver bdrv_file = {
 /* host device */
 
 #if defined(__APPLE__)  defined(__MACH__)
-static kern_return_t FindEjectableCDMedia( io_iterator_t *mediaIterator );
+static kern_return_t FindEjectableCDMedia(io_iterator_t *mediaIterator);
 static kern_return_t GetBSDPath(io_iterator_t mediaIterator, char *bsdPath,
 CFIndex maxPathSize, int flags);
 kern_return_t FindEjectableCDMedia( io_iterator_t *mediaIterator )
@@ -2030,7 +2029,34 @@ kern_return_t GetBSDPath(io_iterator_t mediaIterator, 
char *bsdPath,
 return kernResult;
 }
 
-#endif
+/* Sets up a real cdrom for use in QEMU */
+static bool setupCDROM(char *bsdPath)
+{
+int index, numOfTestPartitions = 2, fd;
+char testPartition[MAXPATHLEN];
+bool partitionFound = false;
+
+/* look for a working partition */
+for (index = 0; index  numOfTestPartitions; index++) {
+snprintf(testPartition, sizeof(testPartition), %ss%d, bsdPath, 
index);
+fd = qemu_open(testPartition, O_RDONLY | O_BINARY | O_LARGEFILE);
+if (fd = 0) {
+partitionFound = true;
+qemu_close(fd);
+break;
+}
+}
+
+/* if a working partition on the device was not found */
+if (partitionFound == false) {
+printf(Error: Failed to find a working partition on disc!\n);
+} else {
+DPRINTF(Using %s as optical disc\n, testPartition);
+pstrcpy(bsdPath, MAXPATHLEN, testPartition);
+}
+return partitionFound;
+}
+#endif /* defined(__APPLE__)  defined(__MACH__) */
 
 static int hdev_probe_device(const char *filename)
 {
@@ -2118,34 +2144,32 @@ static int hdev_open(BlockDriverState *bs, QDict 
*options, int flags,
 BDRVRawState *s = bs-opaque;
 Error *local_err = NULL;
 int ret;
+bool cdromOK = true;
 
 #if defined(__APPLE__)  defined(__MACH__)
 const char *filename = qdict_get_str(options, filename);
 
-if (strstart(filename, /dev/cdrom, NULL)) {
-kern_return_t kernResult;
-io_iterator_t mediaIterator;
-char bsdPath[ MAXPATHLEN ];
-int fd;
-
-kernResult = FindEjectableCDMedia( mediaIterator );
-kernResult = GetBSDPath(mediaIterator, bsdPath, sizeof(bsdPath), 
-flags);
-if ( bsdPath[ 0 ] != '\0' ) {
-strcat(bsdPath,s0);
-/* some CDs don't have a partition 0 */
-fd = qemu_open(bsdPath, O_RDONLY | O_BINARY | O_LARGEFILE);
-if (fd  0) {
-bsdPath[strlen(bsdPath)-1] = '1';
+/* If using a physical device */
+if (strstart(filename, /dev/, NULL)) {
+char bsdPath[MAXPATHLEN];
+
+/* If the physical device is a cdrom */
+if (strcmp(filename, /dev/cdrom) == 0) {
+io_iterator_t mediaIterator;
+FindEjectableCDMedia(mediaIterator);
+GetBSDPath(mediaIterator, bsdPath, sizeof(bsdPath), flags);
+if (bsdPath[0] == '\0') {
+printf(Error: failed to obtain bsd path for optical 
drive!\n);
 } else {
-qemu_close(fd);
+cdromOK = setupCDROM(bsdPath);
+filename = bsdPath;
 }
-filename = bsdPath;
-qdict_put(options, filename, qstring_from_str(filename));
-}
 
-if ( mediaIterator )
-IOObjectRelease( mediaIterator );
+if (mediaIterator) {
+IOObjectRelease(mediaIterator);
+}
+}
+qdict_put(options, filename, qstring_from_str(filename));
 }
 #endif
 
@@ -2156,7 +2180,21 @@ static int hdev_open(BlockDriverState *bs, QDict 
*options, int flags,
 if (local_err) {
 error_propagate(errp, local_err);
 }
-return ret;
+}
+
+#if defined(__APPLE__)  defined(__MACH__)
+/* if a physical device experienced an error while being opened */
+if (strncmp(filename, /dev/, 5) == 0

[Qemu-block] [PATCH] qapi: use 'type' in generated C code to match QMP union wire form

2015-07-27 Thread Eric Blake

When dealing with simple qapi unions, the code was generating a
discriminator field of 'kind' even though the discriminator is
sent as 'type' over QMP.  Renaming things to match gets us one
step closer to reusing common generator code for both simple and
flat unions, without having to special case the naming choice
for simple unions.

However, this patch does not rename the generated enum, which is
still 'unionnameKind'; if we wanted, a further patch could
generate implicit enums as 'unionnameType', with even more
churn to C code to react, and probably update the qapi generator
to reserve the 'fooType' namespace instead of 'fooKind'.  But
that is a lot harder, as we already have existing qapi usage
of types that end in 'Type'.

Signed-off-by: Eric Blake ebl...@redhat.com
---

Applies on top of Markus' [PATCH RFC v2 00/47] qapi: QMP
introspection. Making this cleanup would make it easier
to switch between simple and flat unions in the future as
needed, because the C code would always use the discriminator
name passed over QMP.

 block/qcow2.c   |  2 +-
 block/vmdk.c|  2 +-
 blockdev.c  | 16 
 hmp.c   | 12 ++--
 hw/input/hid.c  |  2 +-
 hw/input/ps2.c  |  2 +-
 hw/input/virtio-input-hid.c |  2 +-
 hw/mem/pc-dimm.c|  2 +-
 net/dump.c  |  2 +-
 net/hub.c   |  2 +-
 net/l2tpv3.c|  2 +-
 net/net.c   | 20 ++--
 net/slirp.c |  2 +-
 net/socket.c|  2 +-
 net/tap.c   |  4 ++--
 net/vhost-user.c|  2 +-
 numa.c  |  4 ++--
 qemu-char.c | 24 
 scripts/qapi-types.py   |  2 +-
 scripts/qapi-visit.py   |  7 ++-
 scripts/qapi.py |  2 +-
 tpm.c   |  2 +-
 ui/input-keymap.c   | 10 +-
 ui/input-legacy.c   |  2 +-
 ui/input.c  | 22 +++---
 util/qemu-sockets.c | 12 ++--
 26 files changed, 80 insertions(+), 83 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 76c331b..9e1fcc5 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2535,7 +2535,7 @@ static ImageInfoSpecific 
*qcow2_get_specific_info(BlockDriverState *bs)
 ImageInfoSpecific *spec_info = g_new(ImageInfoSpecific, 1);

 *spec_info = (ImageInfoSpecific){
-.kind  = IMAGE_INFO_SPECIFIC_KIND_QCOW2,
+.type  = IMAGE_INFO_SPECIFIC_KIND_QCOW2,
 {
 .qcow2 = g_new(ImageInfoSpecificQCow2, 1),
 },
diff --git a/block/vmdk.c b/block/vmdk.c
index fbaab67..180f416 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -2153,7 +2153,7 @@ static ImageInfoSpecific 
*vmdk_get_specific_info(BlockDriverState *bs)
 ImageInfoList **next;

 *spec_info = (ImageInfoSpecific){
-.kind = IMAGE_INFO_SPECIFIC_KIND_VMDK,
+.type = IMAGE_INFO_SPECIFIC_KIND_VMDK,
 {
 .vmdk = g_new0(ImageInfoSpecificVmdk, 1),
 },
diff --git a/blockdev.c b/blockdev.c
index 62a4586..8f16f03 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1046,12 +1046,12 @@ void hmp_commit(Monitor *mon, const QDict *qdict)
 }
 }

-static void blockdev_do_action(int kind, void *data, Error **errp)
+static void blockdev_do_action(int type, void *data, Error **errp)
 {
 TransactionAction action;
 TransactionActionList list;

-action.kind = kind;
+action.type = type;
 action.data = data;
 list.value = action;
 list.next = NULL;
@@ -1291,7 +1291,7 @@ static void internal_snapshot_prepare(BlkTransactionState 
*common,
 InternalSnapshotState *state;
 int ret1;

-g_assert(common-action-kind ==
+g_assert(common-action-type ==
  TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_INTERNAL_SYNC);
 internal = common-action-blockdev_snapshot_internal_sync;
 state = DO_UPCAST(InternalSnapshotState, common, common);
@@ -1434,7 +1434,7 @@ static void external_snapshot_prepare(BlkTransactionState 
*common,
 TransactionAction *action = common-action;

 /* get parameters */
-g_assert(action-kind == TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC);
+g_assert(action-type == TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC);

 has_device = action-blockdev_snapshot_sync-has_device;
 device = action-blockdev_snapshot_sync-device;
@@ -1579,7 +1579,7 @@ static void drive_backup_prepare(BlkTransactionState 
*common, Error **errp)
 DriveBackup *backup;
 Error *local_err = NULL;

-assert(common-action-kind == TRANSACTION_ACTION_KIND_DRIVE_BACKUP);
+assert(common-action-type == TRANSACTION_ACTION_KIND_DRIVE_BACKUP);
 backup = common-action-drive_backup;

 blk = blk_by_name(backup-device);
@@ -1647,7 +1647,7 @@ static void blockdev_backup_prepare(BlkTransactionState 
*common, Error **errp)
 BlockBackend *blk;
 Error *local_err =

[Qemu-block] [PATCH v9 05/10] qcow2: Split upgrade/downgrade paths for amend

2015-07-27 Thread Max Reitz

If the image version should be upgraded, that is the first we should do;
if it should be downgraded, that is the last we should do. So split the
version change block into an upgrade part at the start and a downgrade
part at the end.

Signed-off-by: Max Reitz mre...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
Reviewed-by: Stefan Hajnoczi stefa...@redhat.com
Reviewed-by: Alberto Garcia be...@igalia.com
---
 block/qcow2.c | 31 ---
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index a7f50db..fe91974 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2759,20 +2759,13 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 desc++;
 }
 
-if (new_version != old_version) {
-if (new_version  old_version) {
-/* Upgrade */
-s-qcow_version = new_version;
-ret = qcow2_update_header(bs);
-if (ret  0) {
-s-qcow_version = old_version;
-return ret;
-}
-} else {
-ret = qcow2_downgrade(bs, new_version, status_cb, cb_opaque);
-if (ret  0) {
-return ret;
-}
+/* Upgrade first (some features may require compat=1.1) */
+if (new_version  old_version) {
+s-qcow_version = new_version;
+ret = qcow2_update_header(bs);
+if (ret  0) {
+s-qcow_version = old_version;
+return ret;
 }
 }
 
@@ -2787,7 +2780,7 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 
 if (s-use_lazy_refcounts != lazy_refcounts) {
 if (lazy_refcounts) {
-if (s-qcow_version  3) {
+if (new_version  3) {
 error_report(Lazy refcounts only supported with compatibility 

  level 1.1 and above (use compat=1.1 or 
greater));
 return -EINVAL;
@@ -2823,6 +2816,14 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 }
 }
 
+/* Downgrade last (so unsupported features can be removed before) */
+if (new_version  old_version) {
+ret = qcow2_downgrade(bs, new_version, status_cb, cb_opaque);
+if (ret  0) {
+return ret;
+}
+}
+
 return 0;
 }
 
-- 
2.4.6

[Qemu-block] [PATCH v9 06/10] qcow2: Use intermediate helper CB for amend

2015-07-27 Thread Max Reitz

If there is more than one time-consuming operation to be performed for
qcow2_amend_options(), we need an intermediate CB which coordinates the
progress of the individual operations and passes the result to the
original status callback.

Signed-off-by: Max Reitz mre...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
Reviewed-by: Stefan Hajnoczi stefa...@redhat.com
---
 block/qcow2.c | 80 ++-
 1 file changed, 79 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index fe91974..8b5dc73 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2686,6 +2686,75 @@ static int qcow2_downgrade(BlockDriverState *bs, int 
target_version,
 return 0;
 }
 
+typedef enum Qcow2AmendOperation {
+/* This is the value Qcow2AmendHelperCBInfo::last_operation will be
+ * statically initialized to so that the helper CB can discern the first
+ * invocation from an operation change */
+QCOW2_NO_OPERATION = 0,
+
+QCOW2_DOWNGRADING,
+} Qcow2AmendOperation;
+
+typedef struct Qcow2AmendHelperCBInfo {
+/* The code coordinating the amend operations should only modify
+ * these four fields; the rest will be managed by the CB */
+BlockDriverAmendStatusCB *original_status_cb;
+void *original_cb_opaque;
+
+Qcow2AmendOperation current_operation;
+
+/* Total number of operations to perform (only set once) */
+int total_operations;
+
+/* The following fields are managed by the CB */
+
+/* Number of operations completed */
+int operations_completed;
+
+/* Cumulative offset of all completed operations */
+int64_t offset_completed;
+
+Qcow2AmendOperation last_operation;
+int64_t last_work_size;
+} Qcow2AmendHelperCBInfo;
+
+static void qcow2_amend_helper_cb(BlockDriverState *bs,
+  int64_t operation_offset,
+  int64_t operation_work_size, void *opaque)
+{
+Qcow2AmendHelperCBInfo *info = opaque;
+int64_t current_work_size;
+int64_t projected_work_size;
+
+if (info-current_operation != info-last_operation) {
+if (info-last_operation != QCOW2_NO_OPERATION) {
+info-offset_completed += info-last_work_size;
+info-operations_completed++;
+}
+
+info-last_operation = info-current_operation;
+}
+
+assert(info-total_operations  0);
+assert(info-operations_completed  info-total_operations);
+
+info-last_work_size = operation_work_size;
+
+current_work_size = info-offset_completed + operation_work_size;
+
+/* current_work_size is the total work size for (operations_completed + 1)
+ * operations (which includes this one), so multiply it by the number of
+ * operations not covered and divide it by the number of operations
+ * covered to get a projection for the operations not covered */
+projected_work_size = current_work_size * (info-total_operations -
+   info-operations_completed - 1)
+/ (info-operations_completed + 1);
+
+info-original_status_cb(bs, info-offset_completed + operation_offset,
+ current_work_size + projected_work_size,
+ info-original_cb_opaque);
+}
+
 static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
BlockDriverAmendStatusCB *status_cb,
void *cb_opaque)
@@ -2700,6 +2769,7 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 bool encrypt;
 int ret;
 QemuOptDesc *desc = opts-list-desc;
+Qcow2AmendHelperCBInfo helper_cb_info;
 
 while (desc  desc-name) {
 if (!qemu_opt_find(opts, desc-name)) {
@@ -2759,6 +2829,12 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 desc++;
 }
 
+helper_cb_info = (Qcow2AmendHelperCBInfo){
+.original_status_cb = status_cb,
+.original_cb_opaque = cb_opaque,
+.total_operations = (new_version  old_version)
+};
+
 /* Upgrade first (some features may require compat=1.1) */
 if (new_version  old_version) {
 s-qcow_version = new_version;
@@ -2818,7 +2894,9 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 
 /* Downgrade last (so unsupported features can be removed before) */
 if (new_version  old_version) {
-ret = qcow2_downgrade(bs, new_version, status_cb, cb_opaque);
+helper_cb_info.current_operation = QCOW2_DOWNGRADING;
+ret = qcow2_downgrade(bs, new_version, qcow2_amend_helper_cb,
+  helper_cb_info);
 if (ret  0) {
 return ret;
 }
-- 
2.4.6

[Qemu-block] [PATCH v9 07/10] qcow2: Add function for refcount order amendment

2015-07-27 Thread Max Reitz

Add a function qcow2_change_refcount_order() which allows changing the
refcount order of a qcow2 image.

Signed-off-by: Max Reitz mre...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 block/qcow2-refcount.c | 447 +
 block/qcow2.h  |   4 +
 2 files changed, 451 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index b0ee42d..50f1757 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -2451,3 +2451,450 @@ int qcow2_pre_write_overlap_check(BlockDriverState *bs, 
int ign, int64_t offset,
 
 return 0;
 }
+
+/* A pointer to a function of this type is given to walk_over_reftable(). That
+ * function will create refblocks and pass them to a RefblockFinishOp once they
+ * are completed (@refblock). @refblock_empty is set if the refblock is
+ * completely empty.
+ *
+ * Along with the refblock, a corresponding reftable entry is passed, in the
+ * reftable @reftable (which may be reallocated) at @reftable_index.
+ *
+ * @allocated should be set to true if a new cluster has been allocated.
+ */
+typedef int (RefblockFinishOp)(BlockDriverState *bs, uint64_t **reftable,
+   uint64_t reftable_index, uint64_t 
*reftable_size,
+   void *refblock, bool refblock_empty,
+   bool *allocated, Error **errp);
+
+/**
+ * This operation for walk_over_reftable() allocates the refblock on disk (if
+ * it is not empty) and inserts its offset into the new reftable. The size of
+ * this new reftable is increased as required.
+ */
+static int alloc_refblock(BlockDriverState *bs, uint64_t **reftable,
+  uint64_t reftable_index, uint64_t *reftable_size,
+  void *refblock, bool refblock_empty, bool *allocated,
+  Error **errp)
+{
+BDRVQcowState *s = bs-opaque;
+int64_t offset;
+
+if (!refblock_empty  reftable_index = *reftable_size) {
+uint64_t *new_reftable;
+uint64_t new_reftable_size;
+
+new_reftable_size = ROUND_UP(reftable_index + 1,
+ s-cluster_size / sizeof(uint64_t));
+if (new_reftable_size  QCOW_MAX_REFTABLE_SIZE / sizeof(uint64_t)) {
+error_setg(errp,
+   This operation would make the refcount table grow 
+   beyond the maximum size supported by QEMU, aborting);
+return -ENOTSUP;
+}
+
+new_reftable = g_try_realloc(*reftable, new_reftable_size *
+sizeof(uint64_t));
+if (!new_reftable) {
+error_setg(errp, Failed to increase reftable buffer size);
+return -ENOMEM;
+}
+
+memset(new_reftable + *reftable_size, 0,
+   (new_reftable_size - *reftable_size) * sizeof(uint64_t));
+
+*reftable  = new_reftable;
+*reftable_size = new_reftable_size;
+}
+
+if (!refblock_empty  !(*reftable)[reftable_index]) {
+offset = qcow2_alloc_clusters(bs, s-cluster_size);
+if (offset  0) {
+error_setg_errno(errp, -offset, Failed to allocate refblock);
+return offset;
+}
+(*reftable)[reftable_index] = offset;
+*allocated = true;
+}
+
+return 0;
+}
+
+/**
+ * This operation for walk_over_reftable() writes the refblock to disk at the
+ * offset specified by the new reftable's entry. It does not modify the new
+ * reftable or change any refcounts.
+ */
+static int flush_refblock(BlockDriverState *bs, uint64_t **reftable,
+  uint64_t reftable_index, uint64_t *reftable_size,
+  void *refblock, bool refblock_empty, bool *allocated,
+  Error **errp)
+{
+BDRVQcowState *s = bs-opaque;
+int64_t offset;
+int ret;
+
+if (reftable_index  *reftable_size  (*reftable)[reftable_index]) {
+offset = (*reftable)[reftable_index];
+
+ret = qcow2_pre_write_overlap_check(bs, 0, offset, s-cluster_size);
+if (ret  0) {
+error_setg_errno(errp, -ret, Overlap check failed);
+return ret;
+}
+
+ret = bdrv_pwrite(bs-file, offset, refblock, s-cluster_size);
+if (ret  0) {
+error_setg_errno(errp, -ret, Failed to write refblock);
+return ret;
+}
+} else {
+assert(refblock_empty);
+}
+
+return 0;
+}
+
+/**
+ * This function walks over the existing reftable and every referenced 
refblock;
+ * if @new_set_refcount is non-NULL, it is called for every refcount entry to
+ * create an equal new entry in the passed @new_refblock. Once that
+ * @new_refblock is completely filled, @operation will be called.
+ *
+ * @status_cb and @cb_opaque are used for the amend operation's status 
callback.
+ * @index is the index of the walk_over_reftable() calls and @total is the

[Qemu-block] [PATCH v9 03/10] qcow2: Use error_report() in qcow2_amend_options()

2015-07-27 Thread Max Reitz

Signed-off-by: Max Reitz mre...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
Reviewed-by: Stefan Hajnoczi stefa...@redhat.com
Reviewed-by: Alberto Garcia be...@igalia.com
---
 block/qcow2.c  | 14 ++
 tests/qemu-iotests/061.out | 14 +++---
 2 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 4051fdd..3bcb96e 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2717,11 +2717,11 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 } else if (!strcmp(compat, 1.1)) {
 new_version = 3;
 } else {
-fprintf(stderr, Unknown compatibility level %s.\n, compat);
+error_report(Unknown compatibility level %s, compat);
 return -EINVAL;
 }
 } else if (!strcmp(desc-name, BLOCK_OPT_PREALLOC)) {
-fprintf(stderr, Cannot change preallocation mode.\n);
+error_report(Cannot change preallocation mode);
 return -ENOTSUP;
 } else if (!strcmp(desc-name, BLOCK_OPT_SIZE)) {
 new_size = qemu_opt_get_size(opts, BLOCK_OPT_SIZE, 0);
@@ -2734,16 +2734,14 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 !!s-cipher);
 
 if (encrypt != !!s-cipher) {
-fprintf(stderr, Changing the encryption flag is not 
-supported.\n);
+error_report(Changing the encryption flag is not supported);
 return -ENOTSUP;
 }
 } else if (!strcmp(desc-name, BLOCK_OPT_CLUSTER_SIZE)) {
 cluster_size = qemu_opt_get_size(opts, BLOCK_OPT_CLUSTER_SIZE,
  cluster_size);
 if (cluster_size != s-cluster_size) {
-fprintf(stderr, Changing the cluster size is not 
-supported.\n);
+error_report(Changing the cluster size is not supported);
 return -ENOTSUP;
 }
 } else if (!strcmp(desc-name, BLOCK_OPT_LAZY_REFCOUNTS)) {
@@ -2790,8 +2788,8 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 if (s-use_lazy_refcounts != lazy_refcounts) {
 if (lazy_refcounts) {
 if (s-qcow_version  3) {
-fprintf(stderr, Lazy refcounts only supported with 
compatibility 
-level 1.1 and above (use compat=1.1 or greater)\n);
+error_report(Lazy refcounts only supported with compatibility 

+ level 1.1 and above (use compat=1.1 or 
greater));
 return -EINVAL;
 }
 s-compatible_features |= QCOW2_COMPAT_LAZY_REFCOUNTS;
diff --git a/tests/qemu-iotests/061.out b/tests/qemu-iotests/061.out
index 5ec248f..713ca6b 100644
--- a/tests/qemu-iotests/061.out
+++ b/tests/qemu-iotests/061.out
@@ -281,18 +281,18 @@ No errors were found on the image.
 === Testing invalid configurations ===
 
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
-Lazy refcounts only supported with compatibility level 1.1 and above (use 
compat=1.1 or greater)
+qemu-img: Lazy refcounts only supported with compatibility level 1.1 and above 
(use compat=1.1 or greater)
 qemu-img: Error while amending options: Invalid argument
-Lazy refcounts only supported with compatibility level 1.1 and above (use 
compat=1.1 or greater)
+qemu-img: Lazy refcounts only supported with compatibility level 1.1 and above 
(use compat=1.1 or greater)
 qemu-img: Error while amending options: Invalid argument
-Unknown compatibility level 0.42.
+qemu-img: Unknown compatibility level 0.42
 qemu-img: Error while amending options: Invalid argument
 qemu-img: Invalid parameter 'foo'
-Changing the cluster size is not supported.
+qemu-img: Changing the cluster size is not supported
 qemu-img: Error while amending options: Operation not supported
-Changing the encryption flag is not supported.
+qemu-img: Changing the encryption flag is not supported
 qemu-img: Error while amending options: Operation not supported
-Cannot change preallocation mode.
+qemu-img: Cannot change preallocation mode
 qemu-img: Error while amending options: Operation not supported
 
 === Testing correct handling of unset value ===
@@ -300,7 +300,7 @@ qemu-img: Error while amending options: Operation not 
supported
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 Should work:
 Should not work:
-Changing the cluster size is not supported.
+qemu-img: Changing the cluster size is not supported
 qemu-img: Error while amending options: Operation not supported
 
 === Testing zero expansion on inactive clusters ===
-- 
2.4.6

[Qemu-block] [PATCH v5 10/17] qcow2/overlaps: Protect inactive L1 tables

2015-07-27 Thread Max Reitz

Keep track of the inactive L1 tables in the metadata list to protect
them against accidental modifications.

Signed-off-by: Max Reitz mre...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 block/qcow2-snapshot.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 6d1ae00..95afd87 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -122,6 +122,21 @@ int qcow2_read_snapshots(BlockDriverState *bs)
 ret = -EFBIG;
 goto fail;
 }
+
+if (!(s-overlap_check  QCOW2_OL_INACTIVE_L1)) {
+continue;
+}
+
+if (sn-l1_size  INT_MAX / sizeof(uint64_t)) {
+/* Do not fail opening the image because a snapshot is broken which
+ * might not be used anyway */
+continue;
+}
+
+qcow2_metadata_list_enter(bs, sn-l1_table_offset,
+  size_to_clusters(s, sn-l1_size *
+  sizeof(uint64_t)),
+  QCOW2_OL_INACTIVE_L1);
 }
 
 assert(offset - s-snapshots_offset = INT_MAX);
@@ -415,6 +430,11 @@ int qcow2_snapshot_create(BlockDriverState *bs, 
QEMUSnapshotInfo *sn_info)
 g_free(l1_table);
 l1_table = NULL;
 
+qcow2_metadata_list_enter(bs, sn-l1_table_offset,
+  size_to_clusters(s, sn-l1_size *
+  sizeof(uint64_t)),
+  QCOW2_OL_INACTIVE_L1);
+
 /*
  * Increase the refcounts of all clusters and make sure everything is
  * stable on disk before updating the snapshot table to contain a pointer
@@ -635,6 +655,11 @@ int qcow2_snapshot_delete(BlockDriverState *bs,
 g_free(sn.id_str);
 g_free(sn.name);
 
+qcow2_metadata_list_remove(bs, sn.l1_table_offset,
+   size_to_clusters(s, sn.l1_size *
+   sizeof(uint64_t)),
+   QCOW2_OL_INACTIVE_L1);
+
 /*
  * Now decrease the refcounts of clusters referenced by the snapshot and
  * free the L1 table.
-- 
2.4.6

[Qemu-block] [PATCH v5 02/17] qcow2: Pull up overlap check option evaluation

2015-07-27 Thread Max Reitz

Pull up the absorption of the qcow2-relevant command line options and
the evaluation of the overlap check options in qcow2_open().

Signed-off-by: Max Reitz mre...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 block/qcow2.c | 96 +--
 1 file changed, 48 insertions(+), 48 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 76c331b..97b624a 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -708,6 +708,54 @@ static int qcow2_open(BlockDriverState *bs, QDict 
*options, int flags,
 bs-encrypted = 1;
 }
 
+opts = qemu_opts_create(qcow2_runtime_opts, NULL, 0, error_abort);
+qemu_opts_absorb_qdict(opts, options, local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+ret = -EINVAL;
+goto fail;
+}
+
+opt_overlap_check = qemu_opt_get(opts, QCOW2_OPT_OVERLAP);
+opt_overlap_check_template = qemu_opt_get(opts, 
QCOW2_OPT_OVERLAP_TEMPLATE);
+if (opt_overlap_check_template  opt_overlap_check 
+strcmp(opt_overlap_check_template, opt_overlap_check))
+{
+error_setg(errp, Conflicting values for qcow2 options '
+   QCOW2_OPT_OVERLAP ' ('%s') and ' 
QCOW2_OPT_OVERLAP_TEMPLATE
+   ' ('%s'), opt_overlap_check, opt_overlap_check_template);
+ret = -EINVAL;
+goto fail;
+}
+if (!opt_overlap_check) {
+opt_overlap_check = opt_overlap_check_template ?: cached;
+}
+
+if (!strcmp(opt_overlap_check, none)) {
+overlap_check_template = 0;
+} else if (!strcmp(opt_overlap_check, constant)) {
+overlap_check_template = QCOW2_OL_CONSTANT;
+} else if (!strcmp(opt_overlap_check, cached)) {
+overlap_check_template = QCOW2_OL_CACHED;
+} else if (!strcmp(opt_overlap_check, all)) {
+overlap_check_template = QCOW2_OL_ALL;
+} else {
+error_setg(errp, Unsupported value '%s' for qcow2 option 
+   'overlap-check'. Allowed are either of the following: 
+   none, constant, cached, all, opt_overlap_check);
+ret = -EINVAL;
+goto fail;
+}
+
+s-overlap_check = 0;
+for (i = 0; i  QCOW2_OL_MAX_BITNR; i++) {
+/* overlap-check defines a template bitmask, but every flag may be
+ * overwritten through the associated boolean option */
+s-overlap_check |=
+qemu_opt_get_bool(opts, overlap_bool_option_names[i],
+  overlap_check_template  (1  i))  i;
+}
+
 s-l2_bits = s-cluster_bits - 3; /* L2 is always one cluster */
 s-l2_size = 1  s-l2_bits;
 /* 2^(s-refcount_order - 3) is the refcount width in bytes */
@@ -803,14 +851,6 @@ static int qcow2_open(BlockDriverState *bs, QDict 
*options, int flags,
 }
 
 /* get L2 table/refcount block cache size from command line options */
-opts = qemu_opts_create(qcow2_runtime_opts, NULL, 0, error_abort);
-qemu_opts_absorb_qdict(opts, options, local_err);
-if (local_err) {
-error_propagate(errp, local_err);
-ret = -EINVAL;
-goto fail;
-}
-
 read_cache_sizes(bs, opts, l2_cache_size, refcount_cache_size,
  local_err);
 if (local_err) {
@@ -946,46 +986,6 @@ static int qcow2_open(BlockDriverState *bs, QDict 
*options, int flags,
 s-discard_passthrough[QCOW2_DISCARD_OTHER] =
 qemu_opt_get_bool(opts, QCOW2_OPT_DISCARD_OTHER, false);
 
-opt_overlap_check = qemu_opt_get(opts, QCOW2_OPT_OVERLAP);
-opt_overlap_check_template = qemu_opt_get(opts, 
QCOW2_OPT_OVERLAP_TEMPLATE);
-if (opt_overlap_check_template  opt_overlap_check 
-strcmp(opt_overlap_check_template, opt_overlap_check))
-{
-error_setg(errp, Conflicting values for qcow2 options '
-   QCOW2_OPT_OVERLAP ' ('%s') and ' 
QCOW2_OPT_OVERLAP_TEMPLATE
-   ' ('%s'), opt_overlap_check, opt_overlap_check_template);
-ret = -EINVAL;
-goto fail;
-}
-if (!opt_overlap_check) {
-opt_overlap_check = opt_overlap_check_template ?: cached;
-}
-
-if (!strcmp(opt_overlap_check, none)) {
-overlap_check_template = 0;
-} else if (!strcmp(opt_overlap_check, constant)) {
-overlap_check_template = QCOW2_OL_CONSTANT;
-} else if (!strcmp(opt_overlap_check, cached)) {
-overlap_check_template = QCOW2_OL_CACHED;
-} else if (!strcmp(opt_overlap_check, all)) {
-overlap_check_template = QCOW2_OL_ALL;
-} else {
-error_setg(errp, Unsupported value '%s' for qcow2 option 
-   'overlap-check'. Allowed are either of the following: 
-   none, constant, cached, all, opt_overlap_check);
-ret = -EINVAL;
-goto fail;
-}
-
-s-overlap_check = 0;
-for (i = 0; i  QCOW2_OL_MAX_BITNR; i++) {
-/* overlap-check defines a template bitmask, but every flag may be
- * overwritten through the

[Qemu-block] [PATCH v5 13/17] qcow2/overlaps: Add memory limit reached event

2015-07-27 Thread Max Reitz

Later, a mechanism to set a limit on how much memory may be used for the
overlap prevention structures will be introduced. If that limit is about
to be exceeded, a QMP event should be emitted. This very event is
specified by this patch.

Signed-off-by: Max Reitz mre...@redhat.com
---
 docs/qmp/qmp-events.txt | 27 +++
 qapi/event.json | 27 +++
 2 files changed, 54 insertions(+)

diff --git a/docs/qmp/qmp-events.txt b/docs/qmp/qmp-events.txt
index d92cc48..00c70d1 100644
--- a/docs/qmp/qmp-events.txt
+++ b/docs/qmp/qmp-events.txt
@@ -279,6 +279,33 @@ Example:
 { event: POWERDOWN,
 timestamp: { seconds: 1267040730, microseconds: 682951 } }
 
+QCOW2_OVERLAP_CHECK_MEMORY_LIMIT_REACHED
+
+
+Emitted by the qcow2 block driver if the preset size limit for the in-memory
+structures for metadata overlap prevention has been reached and an allocation
+of further resources has been denied. This means that it cannot be guaranteed
+that overlap checks will be performed for the specified range; if no range is
+given, no checks can be expected to be performed whatsoever.
+
+Note that this event does not guarantee that no check will be performed in the
+given range (or on the whole image, if no range is given), but only signals the
+possibility that this might be the case. The ability to check the full range or
+a part of it may be restored at any point in time.
+
+Data:
+- reference: Device name if set; node name otherwise. (json-string)
+- start: Offset of the range of clusters (possibly) no longer being
+   checked for writes overlapping with existing metadata.
+   (json-int, optional)
+- length:Length of that range in bytes. (json-int, optional)
+
+Example:
+
+{ event: QCOW2_OVERLAP_CHECK_MEMORY_LIMIT_REACHED,
+data: { reference: virtio0, start: 805306368, length: 268435456 
},
+timestamp: { seconds: 1429331400, microseconds: 519454 } }
+
 QUORUM_FAILURE
 --
 
diff --git a/qapi/event.json b/qapi/event.json
index f0cef01..996cd3d 100644
--- a/qapi/event.json
+++ b/qapi/event.json
@@ -356,3 +356,30 @@
 ##
 { 'event': 'MEM_UNPLUG_ERROR',
   'data': { 'device': 'str', 'msg': 'str' } }
+
+##
+# @QCOW2_OVERLAP_CHECK_MEMORY_LIMIT_REACHED
+#
+# Emitted by the qcow2 block driver if the preset size limit for the in-memory
+# structures for metadata overlap prevention has been reached and an allocation
+# of further resources has been denied. This means that no overlap checks will
+# be performed for the specified range.
+#
+# Note that this event does not guarantee that no check will be performed in 
the
+# given range (or on the whole image, if no range is given), but only signals
+# the possibility that this might be the case. The ability to check the full
+# range or a part of it may be restored at any point in time.
+#
+# @reference: device name if set; node name otherwise
+#
+# @start: #optional offset of the range of clusters (possibly) no longer
+# being checked for writes overlapping with existing metadata
+#
+# @length:#optional length of that range in bytes
+#
+# Since: 2.5
+##
+{ 'event': 'QCOW2_OVERLAP_CHECK_MEMORY_LIMIT_REACHED',
+  'data': { 'reference': 'str',
+'*start':'int',
+'*length':   'int' } }
-- 
2.4.6

[Qemu-block] [PATCH v5 12/17] qcow2: Use new metadata overlap check function

2015-07-27 Thread Max Reitz

Make the static new overlap check function global and drop the old
function.

Signed-off-by: Max Reitz mre...@redhat.com
---
 block/qcow2-overlap.c  |   8 +---
 block/qcow2-refcount.c | 120 -
 2 files changed, 2 insertions(+), 126 deletions(-)

diff --git a/block/qcow2-overlap.c b/block/qcow2-overlap.c
index 35390a5..92097dd 100644
--- a/block/qcow2-overlap.c
+++ b/block/qcow2-overlap.c
@@ -356,12 +356,8 @@ static int single_check_metadata_overlap(Qcow2MetadataList 
*mdl, int ign,
 return window-bitmap[bitmap_i]  ~ign;
 }
 
-/* This will replace qcow2_check_metadata_overlap() */
-static int check_metadata_overlap(BlockDriverState *bs, int ign, int64_t 
offset,
-  int64_t size) __attribute__((used));
-
-static int check_metadata_overlap(BlockDriverState *bs, int ign, int64_t 
offset,
-  int64_t size)
+int qcow2_check_metadata_overlap(BlockDriverState *bs, int ign, int64_t offset,
+ int64_t size)
 {
 BDRVQcowState *s = bs-opaque;
 uint64_t start_cluster = offset  s-cluster_bits;
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 2cdf535..940f424 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -2371,126 +2371,6 @@ fail:
 return ret;
 }
 
-#define overlaps_with(ofs, sz) \
-ranges_overlap(offset, size, ofs, sz)
-
-/*
- * Checks if the given offset into the image file is actually free to use by
- * looking for overlaps with important metadata sections (L1/L2 tables etc.),
- * i.e. a sanity check without relying on the refcount tables.
- *
- * The ign parameter specifies what checks not to perform (being a bitmask of
- * QCow2MetadataOverlap values), i.e., what sections to ignore.
- *
- * Returns:
- * - 0 if writing to this offset will not affect the mentioned metadata
- * - a positive QCow2MetadataOverlap value indicating one overlapping section
- * - a negative value (-errno) indicating an error while performing a check,
- *   e.g. when bdrv_read failed on QCOW2_OL_INACTIVE_L2
- */
-int qcow2_check_metadata_overlap(BlockDriverState *bs, int ign, int64_t offset,
- int64_t size)
-{
-BDRVQcowState *s = bs-opaque;
-int chk = s-overlap_check  ~ign;
-int i, j;
-
-if (!size) {
-return 0;
-}
-
-if (chk  QCOW2_OL_MAIN_HEADER) {
-if (offset  s-cluster_size) {
-return QCOW2_OL_MAIN_HEADER;
-}
-}
-
-/* align range to test to cluster boundaries */
-size = align_offset(offset_into_cluster(s, offset) + size, 
s-cluster_size);
-offset = start_of_cluster(s, offset);
-
-if ((chk  QCOW2_OL_ACTIVE_L1)  s-l1_size) {
-if (overlaps_with(s-l1_table_offset, s-l1_size * sizeof(uint64_t))) {
-return QCOW2_OL_ACTIVE_L1;
-}
-}
-
-if ((chk  QCOW2_OL_REFCOUNT_TABLE)  s-refcount_table_size) {
-if (overlaps_with(s-refcount_table_offset,
-s-refcount_table_size * sizeof(uint64_t))) {
-return QCOW2_OL_REFCOUNT_TABLE;
-}
-}
-
-if ((chk  QCOW2_OL_SNAPSHOT_TABLE)  s-snapshots_size) {
-if (overlaps_with(s-snapshots_offset, s-snapshots_size)) {
-return QCOW2_OL_SNAPSHOT_TABLE;
-}
-}
-
-if ((chk  QCOW2_OL_INACTIVE_L1)  s-snapshots) {
-for (i = 0; i  s-nb_snapshots; i++) {
-if (s-snapshots[i].l1_size 
-overlaps_with(s-snapshots[i].l1_table_offset,
-s-snapshots[i].l1_size * sizeof(uint64_t))) {
-return QCOW2_OL_INACTIVE_L1;
-}
-}
-}
-
-if ((chk  QCOW2_OL_ACTIVE_L2)  s-l1_table) {
-for (i = 0; i  s-l1_size; i++) {
-if ((s-l1_table[i]  L1E_OFFSET_MASK) 
-overlaps_with(s-l1_table[i]  L1E_OFFSET_MASK,
-s-cluster_size)) {
-return QCOW2_OL_ACTIVE_L2;
-}
-}
-}
-
-if ((chk  QCOW2_OL_REFCOUNT_BLOCK)  s-refcount_table) {
-for (i = 0; i  s-refcount_table_size; i++) {
-if ((s-refcount_table[i]  REFT_OFFSET_MASK) 
-overlaps_with(s-refcount_table[i]  REFT_OFFSET_MASK,
-s-cluster_size)) {
-return QCOW2_OL_REFCOUNT_BLOCK;
-}
-}
-}
-
-if ((chk  QCOW2_OL_INACTIVE_L2)  s-snapshots) {
-for (i = 0; i  s-nb_snapshots; i++) {
-uint64_t l1_ofs = s-snapshots[i].l1_table_offset;
-uint32_t l1_sz  = s-snapshots[i].l1_size;
-uint64_t l1_sz2 = l1_sz * sizeof(uint64_t);
-uint64_t *l1 = g_try_malloc(l1_sz2);
-int ret;
-
-if (l1_sz2  l1 == NULL) {
-return -ENOMEM;
-}
-
-ret = bdrv_pread(bs-file, l1_ofs, l1, l1_sz2);
-if (ret  0) {
-g_free(l1);
-return ret;
-}
-
-for (j = 0; j  l1_sz; j++) {
-

[Qemu-block] [PATCH v5 11/17] qcow2/overlaps: Protect inactive L2 tables

2015-07-27 Thread Max Reitz

Keep track of the inactive L2 tables in the metadata list to protect
them against accidental modifications.

Signed-off-by: Max Reitz mre...@redhat.com
---
 block/qcow2-refcount.c | 20 
 block/qcow2-snapshot.c | 43 ---
 2 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 76dd2bc..2cdf535 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1211,8 +1211,28 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
 }
 if (addend  0) {
 if (!l1_allocated) {
+/* This is easy */
 qcow2_metadata_list_remove(bs, l2_offset, 1,
QCOW2_OL_ACTIVE_L2);
+} else {
+/* If refcount == 0, this is, too. If refcount  1, we
+ * know that there must be some other inactive L2
+ * reference; and for refcount == 1, if this is an
+ * active L2 table, this was the last inactive L2
+ * reference. */
+bool remove;
+if (refcount == 0) {
+remove = true;
+} else if (refcount == 1) {
+remove = qcow2_check_metadata_overlap(bs,
+~QCOW2_OL_ACTIVE_L2, 
l2_offset,s-cluster_size);
+} else {
+remove = false;
+}
+if (remove) {
+qcow2_metadata_list_remove(bs, l2_offset, 1,
+   QCOW2_OL_INACTIVE_L2);
+}
 }
 }
 }
diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 95afd87..e781bf2 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -47,9 +47,10 @@ int qcow2_read_snapshots(BlockDriverState *bs)
 QCowSnapshotHeader h;
 QCowSnapshotExtraData extra;
 QCowSnapshot *sn;
-int i, id_str_size, name_size;
+int i, j, id_str_size, name_size;
 int64_t offset;
 uint32_t extra_data_size;
+uint64_t *l1_table;
 int ret;
 
 if (!s-nb_snapshots) {
@@ -123,11 +124,12 @@ int qcow2_read_snapshots(BlockDriverState *bs)
 goto fail;
 }
 
-if (!(s-overlap_check  QCOW2_OL_INACTIVE_L1)) {
+if (!(s-overlap_check  (QCOW2_OL_INACTIVE_L1 | 
QCOW2_OL_INACTIVE_L2)))
+{
 continue;
 }
 
-if (sn-l1_size  INT_MAX / sizeof(uint64_t)) {
+if (sn-l1_size  QCOW_MAX_L1_SIZE) {
 /* Do not fail opening the image because a snapshot is broken which
  * might not be used anyway */
 continue;
@@ -137,6 +139,34 @@ int qcow2_read_snapshots(BlockDriverState *bs)
   size_to_clusters(s, sn-l1_size *
   sizeof(uint64_t)),
   QCOW2_OL_INACTIVE_L1);
+
+if (!(s-overlap_check  QCOW2_OL_INACTIVE_L2)) {
+continue;
+}
+
+l1_table = qemu_try_blockalign(bs-file,
+   sn-l1_size * sizeof(uint64_t));
+if (!l1_table) {
+/* Do not fail opening the image just because a snapshot's L2 
tables
+ * cannot be covered by the overlap checks */
+continue;
+}
+
+ret = bdrv_pread(bs-file, sn-l1_table_offset, l1_table,
+ sn-l1_size * sizeof(uint64_t));
+if (ret  0) {
+qemu_vfree(l1_table);
+continue;
+}
+for (j = 0; j  sn-l1_size; j++) {
+uint64_t l2_offset = be64_to_cpu(l1_table[j])  L1E_OFFSET_MASK;
+if (l2_offset) {
+qcow2_metadata_list_enter(bs, l2_offset, 1,
+  QCOW2_OL_INACTIVE_L2);
+}
+}
+
+qemu_vfree(l1_table);
 }
 
 assert(offset - s-snapshots_offset = INT_MAX);
@@ -435,6 +465,13 @@ int qcow2_snapshot_create(BlockDriverState *bs, 
QEMUSnapshotInfo *sn_info)
   sizeof(uint64_t)),
   QCOW2_OL_INACTIVE_L1);
 
+for (i = 0; i  s-l1_size; i++) {
+uint64_t l2_offset = s-l1_table[i]  L1E_OFFSET_MASK;
+if (l2_offset) {
+qcow2_metadata_list_enter(bs, l2_offset, 1, QCOW2_OL_INACTIVE_L2);
+}
+}
+
 /*
  * Increase the refcounts of all clusters and make sure everything is
  * stable on disk before updating the snapshot table to contain a pointer
-- 
2.4.6

Re: [Qemu-block] [Qemu-devel] [PATCH] qapi: use 'type' in generated C code to match QMP union wire form

2015-07-27 Thread Eric Blake

On 07/27/2015 11:30 AM, Eric Blake wrote:
 When dealing with simple qapi unions, the code was generating a
 discriminator field of 'kind' even though the discriminator is
 sent as 'type' over QMP.  Renaming things to match gets us one
 step closer to reusing common generator code for both simple and
 flat unions, without having to special case the naming choice
 for simple unions.
 
 However, this patch does not rename the generated enum, which is
 still 'unionnameKind'; if we wanted, a further patch could
 generate implicit enums as 'unionnameType', with even more
 churn to C code to react, and probably update the qapi generator
 to reserve the 'fooType' namespace instead of 'fooKind'.  But
 that is a lot harder, as we already have existing qapi usage
 of types that end in 'Type'.
 
 Signed-off-by: Eric Blake ebl...@redhat.com
 ---
 
 Applies on top of Markus' [PATCH RFC v2 00/47] qapi: QMP
 introspection. Making this cleanup would make it easier
 to switch between simple and flat unions in the future as
 needed, because the C code would always use the discriminator
 name passed over QMP.

And missing changes to tests/test-qmp-* that prevent 'make
check-qapi-schema check-unit' from passing. But I'll wait to send a v2
until after Markus' series is reposted, as there may be some rebasing
required.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

[Qemu-block] [PATCH v5 04/17] qcow2/overlaps: Protect image header

2015-07-27 Thread Max Reitz

Enter the image header into the metadata list to protect it against
accidental modifications.

Signed-off-by: Max Reitz mre...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 block/qcow2.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index d30d008..9815325 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -774,6 +774,8 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, 
int flags,
 }
 }
 
+qcow2_metadata_list_enter(bs, 0, 1, QCOW2_OL_MAIN_HEADER);
+
 s-refcount_table_offset = header.refcount_table_offset;
 s-refcount_table_size =
 header.refcount_table_clusters  (s-cluster_bits - 3);
-- 
2.4.6

[Qemu-block] [PATCH v5 00/17] qcow2: Add new overlap check functions

2015-07-27 Thread Max Reitz

This is a continuation of previous versions of this series. v2's cover
letter was the most elaborate, which you can find here (includes
benchmarks):
http://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg03430.html

See patch 1 for an explanation of why this series exists and what it
does. Patch 1 is basically the core of this series, the rest just
employs the functions introduced there.

In a later patch, we may want to change the meaning of the constant
overlap checking option to mean the same as cached, which is
everything except for inactive L2 tables. This series does make
checking for overlaps with inactive L2 tables at runtime just as cheap
as everything else (constant time plus caching), but using these checks
means qemu has to read all the snapshot L1 tables when opening a qcow2
file. This does not take long, of course, but it does result in a bit of
overhead so I did not want to enable it by default.

--- v4 part (RFC is still relevant) ---

The new thing about v4 is that there now is a way to limit the total
combined size of all data structures used for the new overlap prevention
algorithm. Bad news: The limit is disabled by default. This is mainly
because for image creation, it is pretty hard to find a sane default
(it depends on the desired image size; but the image is not created with
that size, but rather with size 0 and then truncated; inferring the
default from the image size seems to make sense, but we cannot do this
for images that are going to be truncated).

RFC: An alternative would be to infer the limit like so:
MAX(1M, guest_image_size / 1M). It seems sane to me and I guess it
should work even for images with very, very large guest sizes. But I
think it will break if you're trying to preallocate like a 1 PB image
(which is not really a concern right now, but let's see about it in
ten years)...
On the other hand, the worst that happens is that the event is generated
and the overlap checks won't be performed for some parts of the image
during its creation. Doesn't sound too bad to me. What do you think,
dears reviewers?


v5:
- Rebased on master
- Patch 13:
  - Fixed JSON [Eric]
  - s/2\.4/2.5/ (I know, I know...)
- Patch 14:
  - If a new bitmap is to be created, but the memory limit would be
exceeded, it's better to evict old bitmaps from the cache until
enough memory is available instead of just aborting.
- Patch 16:
  - Simplified description [Eric]
  - s/type/struct/ [Eric]
  - s/2\.4/2.5/


git-backport-diff against v4:

Key:
[] : patches are identical
[] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/17:[] [-C] 'qcow2: Add new overlap check functions'
002/17:[] [-C] 'qcow2: Pull up overlap check option evaluation'
003/17:[] [--] 'qcow2: Create metadata list'
004/17:[] [--] 'qcow2/overlaps: Protect image header'
005/17:[] [--] 'qcow2/overlaps: Protect refcount table'
006/17:[] [--] 'qcow2/overlaps: Protect refcount blocks'
007/17:[] [--] 'qcow2/overlaps: Protect active L1 table'
008/17:[] [--] 'qcow2/overlaps: Protect active L2 tables'
009/17:[] [--] 'qcow2/overlaps: Protect snapshot table'
010/17:[] [--] 'qcow2/overlaps: Protect inactive L1 tables'
011/17:[] [--] 'qcow2/overlaps: Protect inactive L2 tables'
012/17:[] [--] 'qcow2: Use new metadata overlap check function'
013/17:[0005] [FC] 'qcow2/overlaps: Add memory limit reached event'
014/17:[0044] [FC] 'qcow2/overlaps: Add memory usage limit'
015/17:[] [--] 'qcow2: Add overlap structure memory size options'
016/17:[0022] [FC] 'qapi: Expose new qcow2 overlap check options'
017/17:[] [--] 'iotests: Test qcow2's overlap check memory limit'


Max Reitz (17):
  qcow2: Add new overlap check functions
  qcow2: Pull up overlap check option evaluation
  qcow2: Create metadata list
  qcow2/overlaps: Protect image header
  qcow2/overlaps: Protect refcount table
  qcow2/overlaps: Protect refcount blocks
  qcow2/overlaps: Protect active L1 table
  qcow2/overlaps: Protect active L2 tables
  qcow2/overlaps: Protect snapshot table
  qcow2/overlaps: Protect inactive L1 tables
  qcow2/overlaps: Protect inactive L2 tables
  qcow2: Use new metadata overlap check function
  qcow2/overlaps: Add memory limit reached event
  qcow2/overlaps: Add memory usage limit
  qcow2: Add overlap structure memory size options
  qapi: Expose new qcow2 overlap check options
  iotests: Test qcow2's overlap check memory limit

 block/Makefile.objs|   3 +-
 block/qcow2-cluster.c  |  13 +
 block/qcow2-overlap.c  | 610 +
 block/qcow2-refcount.c | 202 ++-
 block/qcow2-snapshot.c | 105 +++-
 block/qcow2.c  | 162 
 block/qcow2.h  |  15 ++
 docs/qmp/qmp-events.txt|  27 ++
 qapi/block-core.json   |  31 +++
 qapi/event.json

[Qemu-block] [PATCH v5 17/17] iotests: Test qcow2's overlap check memory limit

2015-07-27 Thread Max Reitz

This patch adds some test cases for the memory limit concerning the
in-memory structures used to detect and prevent accidental metadata
overlaps.

Signed-off-by: Max Reitz mre...@redhat.com
---
 tests/qemu-iotests/060 | 222 +
 tests/qemu-iotests/060.out |  47 ++
 tests/qemu-iotests/group   |   2 +-
 3 files changed, 270 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/060 b/tests/qemu-iotests/060
index c81319c..b0bc86a 100755
--- a/tests/qemu-iotests/060
+++ b/tests/qemu-iotests/060
@@ -37,6 +37,7 @@ trap _cleanup; exit \$status 0 1 2 3 15
 # get standard environment, filters and checks
 . ./common.rc
 . ./common.filter
+. ./common.qemu
 
 # This tests qocw2-specific low-level functionality
 _supported_fmt qcow2
@@ -243,6 +244,227 @@ poke_file $TEST_IMG $(($l2_offset+8)) 
\x80\x00\x00\x00\x00\x06\x2a\x00
 # Should emit two error messages
 $QEMU_IO -c discard 0 64k -c read 64k 64k $TEST_IMG | _filter_qemu_io
 
+echo
+echo === Testing memory limit ===
+echo
+
+_make_test_img 64M
+# Use blockdev-add instead of adding the drive at startup, so that events which
+# are generated when the image is opened will be sent over QMP
+_launch_qemu
+_send_qemu_cmd $QEMU_HANDLE { 'execute': 'qmp_capabilities' } 'return'
+
+echo
+echo '--- Zero limit ---'
+echo
+
+# This should fail (because allocating the metadata structure list itself 
fails)
+_send_qemu_cmd $QEMU_HANDLE \
+{ 'execute': 'blockdev-add', \
+  'arguments': { \
+  'options': { \
+  'id': 'drv0', \
+  'driver': 'qcow2', \
+  'overlap-structures': { \
+  'total-size-limit': 0 \
+  }, \
+  'file': { \
+ 'driver': 'file', \
+ 'filename': '$TEST_IMG' \
+  } } } } \
+'error'
+
+echo
+echo '--- Zero limit with overlap checks disabled ---'
+echo
+
+# This should work (because overlap checks are disabled)
+_send_qemu_cmd $QEMU_HANDLE \
+{ 'execute': 'blockdev-add', \
+  'arguments': { \
+  'options': { \
+  'id': 'drv1', \
+  'driver': 'qcow2', \
+  'overlap-check': 'none', \
+  'overlap-structures': { \
+  'total-size-limit': 0 \
+  }, \
+  'file': { \
+ 'driver': 'file', \
+ 'filename': '$TEST_IMG' \
+  } } } } \
+'return'
+
+echo
+echo '--- Limit too small for entering metadata ---'
+echo
+
+# For the initial metadata structures, the following data is required:
+#  - Qcow2MetadataList (32 B on x86, 56 B on x64)
+#  - nb_windows * Qcow2MetadataWindow (nb_windows is
+#ceil(x / (64k * WINDOW_SIZE)) = 1 (WINDOW_SIZE = 4096; x is the size of 
the
+#underlying file, which is probably below 1 MB)); therefore, this is 24 B 
on
+#x86, and 32 B on x64)
+#  - cache_entries * int (cache_entries is cache_size / WINDOW_SIZE =
+#65536 / 4096 = 16, therefore this is 64 B on both x86 and x64)
+# In total, we need 120 B on x86 and 152 B on x64 (Linux/gcc).
+#
+# Creating a single bytemap takes WINDOW_SIZE (4096) B. Therefore, setting the
+# limit to 256 B will definitely suffice for the structures required to create
+# the metadata structures, but trying to enter even a single metadata structure
+# will fail.
+# The start field of the event(s) generated will be 0; the length field 
will
+# be WINDOW_SIZE * cluster_size = 4096 * 64k = 256M.
+_send_qemu_cmd $QEMU_HANDLE \
+{ 'execute': 'blockdev-add', \
+  'arguments': { \
+  'options': { \
+  'id': 'drv2', \
+  'driver': 'qcow2', \
+  'overlap-structures': { \
+  'bitmap-size': 65536, \
+  'total-size-limit': 256 \
+  }, \
+  'file': { \
+ 'driver': 'file', \
+ 'filename': '$TEST_IMG' \
+  } } } } \
+'return'
+
+_cleanup_qemu
+
+echo
+echo '--- Limit too small for both bitmap and fragment list ---'
+echo
+
+# Now test the case of trying to convert the bitmap back to the fragment list,
+# but where memory does not suffice to hold both. Because the bitmap is 
released
+# after the fragment list has been created, it should work anyway, because the
+# implementation can consider the bitmap freed before it is actually freed.
+# In order for this to work on both x86 and x64, the fragment list needs to 
take
+# up more than 32 B (152 - 120) of memory. Every entry has 4 B, so we need 9
+# entries. One is the image header, then there is the L1 table, the refcount
+# table, a refcount block, and then there are more than 1000 free clusters; a
+# maximum of 256 clusters can be represented by a single fragment, so these are
+# another 4 entries. In total, there are thus 8 entries so far, so we need
+# another one, which we can do by writing data and thus creating an L2 table.
+$QEMU_IO -c 'write 0 64k' $TEST_IMG |

[Qemu-block] [PATCH v5 07/17] qcow2/overlaps: Protect active L1 table

2015-07-27 Thread Max Reitz

Keep track of the active L1 table in the metadata list to protect it
against accidental modifications.

Signed-off-by: Max Reitz mre...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 block/qcow2-cluster.c  | 11 +++
 block/qcow2-snapshot.c | 10 ++
 block/qcow2.c  |  4 
 3 files changed, 25 insertions(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index b43f186..b95f6fe 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -125,6 +125,17 @@ int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t 
min_size,
 s-l1_table = new_l1_table;
 old_l1_size = s-l1_size;
 s-l1_size = new_l1_size;
+
+qcow2_metadata_list_remove(bs, old_l1_table_offset,
+   size_to_clusters(s, old_l1_size *
+   sizeof(uint64_t)),
+   QCOW2_OL_ACTIVE_L1);
+
+qcow2_metadata_list_enter(bs, s-l1_table_offset,
+  size_to_clusters(s, s-l1_size *
+  sizeof(uint64_t)),
+  QCOW2_OL_ACTIVE_L1);
+
 qcow2_free_clusters(bs, old_l1_table_offset, old_l1_size * 
sizeof(uint64_t),
 QCOW2_DISCARD_OTHER);
 return 0;
diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index b6f58c1..05e814d 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -719,6 +719,11 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs,
 return ret;
 }
 
+qcow2_metadata_list_remove(bs, s-l1_table_offset,
+   size_to_clusters(s, s-l1_size *
+   sizeof(uint64_t)),
+   QCOW2_OL_ACTIVE_L1);
+
 /* Switch the L1 table */
 qemu_vfree(s-l1_table);
 
@@ -730,5 +735,10 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs,
 be64_to_cpus(s-l1_table[i]);
 }
 
+qcow2_metadata_list_enter(bs, s-l1_table_offset,
+  size_to_clusters(s, s-l1_size *
+  sizeof(uint64_t)),
+  QCOW2_OL_ACTIVE_L1);
+
 return 0;
 }
diff --git a/block/qcow2.c b/block/qcow2.c
index e34cd7c..c186452 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -844,6 +844,10 @@ static int qcow2_open(BlockDriverState *bs, QDict 
*options, int flags,
 }
 s-l1_table_offset = header.l1_table_offset;
 
+qcow2_metadata_list_enter(bs, s-l1_table_offset,
+  size_to_clusters(s, s-l1_size *
+  sizeof(uint64_t)),
+  QCOW2_OL_ACTIVE_L1);
 
 if (s-l1_size  0) {
 s-l1_table = qemu_try_blockalign(bs-file,
-- 
2.4.6

[Qemu-block] [PATCH v5 14/17] qcow2/overlaps: Add memory usage limit

2015-07-27 Thread Max Reitz

This adds an adjustable limit for the total memory usage of the overlap
prevention structures.

Signed-off-by: Max Reitz mre...@redhat.com
---
 block/qcow2-overlap.c | 183 +++---
 block/qcow2.c |   2 +-
 block/qcow2.h |   2 +-
 3 files changed, 177 insertions(+), 10 deletions(-)

diff --git a/block/qcow2-overlap.c b/block/qcow2-overlap.c
index 92097dd..8a418c9 100644
--- a/block/qcow2-overlap.c
+++ b/block/qcow2-overlap.c
@@ -23,9 +23,11 @@
  */
 
 #include block/block_int.h
+#include sysemu/block-backend.h
 #include qemu-common.h
 #include qemu/range.h
 #include qcow2.h
+#include qapi-event.h
 
 /* Number of clusters which are covered by each metadata window;
  * note that this may not exceed 2^16 as long as
@@ -68,12 +70,50 @@ struct Qcow2MetadataList {
 
 unsigned current_age;
 
+size_t mem_usage, max_mem_usage;
+
 /* Index into the windows array */
 int *cached_windows;
 size_t nb_cached_windows;
 };
 
 /**
+ * Logs newly allocated memory in the @mdl-mem_usage field. Returns true iff
+ * @mdl-mem_usage + @size * @nmemb = @mdl-max_mem_usage, that is, if
+ * allocating that memory is still within limits. In that case, @mdl-mem_usage
+ * is increased by @size * @nmemb; otherwise, it is left unchanged.
+ */
+static bool increase_mdl_mem_usage(Qcow2MetadataList *mdl, size_t size,
+   size_t nmemb)
+{
+if (nmemb  SIZE_MAX / nmemb  size) {
+/* Overflow */
+return false;
+}
+
+size *= nmemb;
+if (mdl-mem_usage + size  mdl-mem_usage ||
+mdl-mem_usage + size  mdl-max_mem_usage)
+{
+return false;
+}
+
+mdl-mem_usage += size;
+return true;
+}
+
+static void signal_memory_excess(BlockDriverState *bs, uint64_t start_cluster,
+ uint64_t nb_clusters)
+{
+BDRVQcowState *s = bs-opaque;
+const char *ref = bs-blk ? blk_name(bs-blk) : bs-node_name;
+
+qapi_event_send_qcow2_overlap_check_memory_limit_reached(ref, true,
+start_cluster * s-cluster_size, true, nb_clusters * s-cluster_size,
+error_abort);
+}
+
+/**
  * Destroys the cached window bitmap. If it has been modified, the fragment 
list
  * will be rebuilt accordingly.
  */
@@ -81,11 +121,16 @@ static void destroy_window_bitmap(Qcow2MetadataList *mdl,
   Qcow2MetadataWindow *window)
 {
 Qcow2MetadataFragment *new_fragments;
+bool increase_ret;
 
 if (!window-bitmap) {
 return;
 }
 
+/* Treat the bitmap as freed already; while not really correct, this gives
+ * us space for the fragment list */
+mdl-mem_usage -= WINDOW_SIZE;
+
 if (window-bitmap_modified) {
 int bitmap_i, fragment_i = 0;
 QCow2MetadataOverlap current_types = 0;
@@ -105,6 +150,9 @@ static void destroy_window_bitmap(Qcow2MetadataList *mdl,
 } else {
 if (current_types  current_nb_clusters) {
 if (fragment_i = window-fragments_array_size) {
+mdl-mem_usage -= sizeof(Qcow2MetadataFragment)
+  * window-fragments_array_size;
+
 window-fragments_array_size =
 3 * window-fragments_array_size / 2 + 1;
 if (sizeof(Qcow2MetadataFragment)
@@ -115,6 +163,16 @@ static void destroy_window_bitmap(Qcow2MetadataList *mdl,
 goto fail;
 }
 
+increase_ret = increase_mdl_mem_usage(mdl,
+sizeof(Qcow2MetadataFragment),
+window-fragments_array_size);
+/* This can never be false: Because mem_usage was
+ * reduced by WINDOW_SIZE at the beginning of this
+ * function and the size of the fragment list is 
certain
+ * to be less than WINDOW_SIZE, there must be enough
+ * memory available. */
+assert(increase_ret);
+
 new_fragments =
 g_try_renew(Qcow2MetadataFragment,
 window-fragments,
@@ -148,6 +206,7 @@ static void destroy_window_bitmap(Qcow2MetadataList *mdl,
 new_fragments = g_try_renew(Qcow2MetadataFragment, window-fragments,
 window-nb_fragments);
 if (new_fragments) {
+mdl-mem_usage -= window-fragments_array_size - 
window-nb_fragments;
 window-fragments = new_fragments;
 window-fragments_array_size = window-nb_fragments;
 }
@@ -163,12 +222,50 @@ fail:
 window-fragments = NULL;
 window-nb_fragments = 0;
 window-fragments_array_size = 0;
+
+/* window-bitmap is not destroyed, so we need to account for it (because
+ * we borrowed the space before) */

[Qemu-block] [PATCH v5 01/17] qcow2: Add new overlap check functions

2015-07-27 Thread Max Reitz

The existing qcow2 metadata overlap detection function used existing
structures to determine the location of the image metadata, from plain
fields such as l1_table_offset and l1_size in the BDRVQcowState, over
image structures in memory such as the L1 table for the L2 tables'
positions, or it even read the required data directly from disk for
every requested check, such as the snapshot L1 tables for the inactive
L2 tables' positions.

These new functions instead keep a dedicated structure for keeping track
of the metadata positions in memory. It consists of two parts: First,
there is one structure which is basically a list of all metadata
structures. Each entry has a bitmask of types (because some metadata
structures may actually overlap, such as active and inactive L2 tables),
a number of clusters occupied and the offset from the previous entry in
clusters. This structure requires relatively little memory, but checking
a certain point may take relatively long. Each entry is called a
fragment.

Therefore, there is another representation which is a bitmap, or rather
a bytemap, of metadata types. The previously described list is split
into multiple windows with each describing a constant number of clusters
(WINDOW_SIZE). If the list is to be queried or changed, the respective
window is selected in constant time and the bitmap is generated from the
fragments belonging to the window. This bitmap can then be queried in
constant time and easily be changed.

Because the bitmap representation generally requires more memory, it is
only used as a cache. Whenever a window is removed from the cache, the
fragment list will be rebuilt from the bitmap if the latter has been
modified; except for if the fragment list would actually be bigger than
the bitmap, in which case the bitmap is kept but no longer being
considered part of the cache. Therefore, the fragment list is only used
as the background representation to save memory, whereas the bitmap is
used whenever possible.

Signed-off-by: Max Reitz mre...@redhat.com
---
 block/Makefile.objs   |   3 +-
 block/qcow2-overlap.c | 447 ++
 block/qcow2.h |  13 ++
 3 files changed, 462 insertions(+), 1 deletion(-)
 create mode 100644 block/qcow2-overlap.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 58ef2ef..b974db4 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -1,5 +1,6 @@
 block-obj-y += raw_bsd.o qcow.o vdi.o vmdk.o cloop.o bochs.o vpc.o vvfat.o
-block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o 
qcow2-cache.o
+block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
+block-obj-y += qcow2-cache.o qcow2-overlap.o
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
 block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o vhdx-log.o
diff --git a/block/qcow2-overlap.c b/block/qcow2-overlap.c
new file mode 100644
index 000..35390a5
--- /dev/null
+++ b/block/qcow2-overlap.c
@@ -0,0 +1,447 @@
+/*
+ * QCOW2 runtime metadata overlap detection
+ *
+ * Copyright (c) 2015 Max Reitz mre...@redhat.com
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the Software), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include block/block_int.h
+#include qemu-common.h
+#include qemu/range.h
+#include qcow2.h
+
+/* Number of clusters which are covered by each metadata window;
+ * note that this may not exceed 2^16 as long as
+ * Qcow2MetadataFragment::relative_start is a uint16_t */
+#define WINDOW_SIZE 4096
+
+/* Describes a fragment of or a whole metadata range; does not necessarily
+ * describe the whole range because it needs to be split on window boundaries 
*/
+typedef struct Qcow2MetadataFragment {
+/* Bitmask of QCow2MetadataOverlap values */
+uint8_t types;
+uint8_t nb_clusters_minus_one;
+/* Number of clusters between the start of the window and this range */
+uint16_t relative_start;
+} QEMU_PACKED Qcow2MetadataFragment;
+

Re: [Qemu-block] [PATCH] megasas: Add write function to handle write access to PCI BAR 3

2015-07-27 Thread Paolo Bonzini



On 27/07/2015 10:57, Hannes Reinecke wrote:
 Acked-by: Hannes Reinecke h...@suse.com

Thanks, applied for 2.4.

Paolo

Re: [Qemu-block] [PATCH v3] raw-posix.c: Make physical devices usable in QEMU under Mac OS X host

2015-07-27 Thread Stefan Hajnoczi

On Fri, Jul 24, 2015 at 11:37:50AM -0400, Programmingkid wrote:
 
 On Jul 24, 2015, at 11:00 AM, Stefan Hajnoczi wrote:
 
  On Fri, Jul 17, 2015 at 08:19:16PM -0400, Programmingkid wrote:
  @@ -2014,7 +2015,9 @@ kern_return_t GetBSDPath( io_iterator_t 
  mediaIterator, char *bsdPath, CFIndex ma
  if ( bsdPathAsCFString ) {
  size_t devPathLength;
  strcpy( bsdPath, _PATH_DEV );
  -strcat( bsdPath, r );
  +if (flags  BDRV_O_NOCACHE) {
  +strcat(bsdPath, r);
  +}
  devPathLength = strlen( bsdPath );
  if ( CFStringGetCString( bsdPathAsCFString, bsdPath + 
  devPathLength, maxPathSize - devPathLength, kCFStringEncodingASCII ) ) {
  kernResult = KERN_SUCCESS;
  
  This hunk should be a separate patch.  
 
 If I made the changes to the GetBSDPath() function its own patch, QEMU would 
 no longer compile. The addition of a flags argument would cause this issue.

Please include the addition of the flags argument and the change to the
call site in the separate patch.

This is a single logical change which deserves its own commit.  That way
you can explain that /dev/cdrom was opening the raw char device but not
probing alignment, causing bdrv_read() to fail.


pgp0YfK0eZU0m.pgp
Description: PGP signature

Re: [Qemu-block] [PATCH V4 3/3] virtio-blk: only clear VIRTIO_F_ANY_LAYOUT for legacy device

2015-07-27 Thread Paolo Bonzini



On 27/07/2015 11:49, Jason Wang wrote:
 So this patch only clear VIRTIO_F_LAYOUT for legacy device.
 
 Cc: Stefan Hajnoczi stefa...@redhat.com
 Cc: Kevin Wolf kw...@redhat.com
 Cc: qemu-block@nongnu.org
 Signed-off-by: Jason Wang jasow...@redhat.com
 ---
  hw/block/virtio-blk.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
 index 9acbc3a..1d3f26c 100644
 --- a/hw/block/virtio-blk.c
 +++ b/hw/block/virtio-blk.c
 @@ -731,7 +731,6 @@ static uint64_t virtio_blk_get_features(VirtIODevice 
 *vdev, uint64_t features,
  virtio_add_feature(features, VIRTIO_BLK_F_GEOMETRY);
  virtio_add_feature(features, VIRTIO_BLK_F_TOPOLOGY);
  virtio_add_feature(features, VIRTIO_BLK_F_BLK_SIZE);
 -virtio_clear_feature(features, VIRTIO_F_ANY_LAYOUT);
  if (__virtio_has_feature(features, VIRTIO_F_VERSION_1)) {
  if (s-conf.scsi) {
  error_setg(errp, Virtio 1.0 does not support scsi 
 passthrough!);
 @@ -739,6 +738,7 @@ static uint64_t virtio_blk_get_features(VirtIODevice 
 *vdev, uint64_t features,
  }
  virtio_add_feature(features, VIRTIO_F_ANY_LAYOUT);
  } else {
 +virtio_clear_feature(features, VIRTIO_F_ANY_LAYOUT);
  virtio_add_feature(features, VIRTIO_BLK_F_SCSI);
  }

This patch is unnecessary, since the feature is added back below under
if (__virtio_has_feature(features, VIRTIO_F_VERSION_1)).

Paolo

Re: [Qemu-block] [Qemu-devel] [PULL for-2.4 0/1] block patches for 2.4-rc3

2015-07-27 Thread Jeff Cody

On Mon, Jul 27, 2015 at 11:21:42AM -0400, Jeff Cody wrote:
 The following changes since commit f793d97e454a56d17e404004867985622ca1a63b:
 
   Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into 
 staging (2015-07-24 13:07:10 +0100)
 
 are available in the git repository at:
 
 
   https://github.com/codyprime/qemu-kvm-jtc.git 
 tags/jtc-for-upstream-pull-request
 
 for you to fetch changes up to 87b33736235fe3b64d167bfaf1e39a53a2abef43:
 
   sheepdog: serialize requests to overwrapping area (2015-07-27 11:17:31 
 -0400)
 
 
 Block patches for 2.4-rc3
 
 
 Hitoshi Mitake (1):
   sheepdog: serialize requests to overwrapping area
 
  block/sheepdog.c | 152 
 ++-
  1 file changed, 71 insertions(+), 81 deletions(-)
 
 -- 
 1.9.3
 



This pull request is missing 1 patch, resending with both patches.

Thanks,
Jeff

[Qemu-block] [PULL for-2.4 0/2] block patches for 2.4-rc3

2015-07-27 Thread Jeff Cody

The following changes since commit f8787f8723eaca1be99e3b1873e54de163fffa93:

  Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20150727' into 
staging (2015-07-27 19:37:09 +0100)

are available in the git repository at:


  g...@github.com:codyprime/qemu-kvm-jtc.git tags/jtc-for-upstream-pull-request

for you to fetch changes up to 325e3904210c779a13fbbc9ee156056d045d7eee:

  block/ssh: Avoid segfault if inet_connect doesn't set errno. (2015-07-28 
00:19:05 -0400)


Block patches for 2.4-rc3


Hitoshi Mitake (1):
  sheepdog: serialize requests to overwrapping area

Richard W.M. Jones (1):
  block/ssh: Avoid segfault if inet_connect doesn't set errno.

 block/sheepdog.c | 152 ++-
 block/ssh.c  |   2 +-
 2 files changed, 72 insertions(+), 82 deletions(-)

-- 
1.9.3

Re: [Qemu-block] [PATCH v3] block/ssh: Avoid segfault if inet_connect doesn't set errno.

2015-07-27 Thread Jeff Cody

On Wed, Jul 22, 2015 at 02:27:46PM +0100, Richard W.M. Jones wrote:
 v3:
 Same as v2 but set the return value to -EIO.


Thanks, applied to my block tree:

https://github.com/codyprime/qemu-kvm-jtc/tree/block

Thanks,
Jeff

[Qemu-block] [PULL for-2.4 1/2] sheepdog: serialize requests to overwrapping area

2015-07-27 Thread Jeff Cody

From: Hitoshi Mitake mitake.hito...@lab.ntt.co.jp

Current sheepdog driver only serializes create requests in oid
unit. This mechanism isn't enough for handling requests to
overwrapping area spanning multiple oids, so it can result bugs like
below:
https://bugs.launchpad.net/sheepdog-project/+bug/1456421

This patch adds a new serialization mechanism for the problem. The
difference from the old one is:
1. serialize entire aiocb if their targetting areas overwrap
2. serialize all requests (read, write, and discard), not only creates

This patch also removes the old mechanism because the new one can be
an alternative.

Cc: Kevin Wolf kw...@redhat.com
Cc: Stefan Hajnoczi stefa...@redhat.com
Cc: Teruaki Ishizaki ishizaki.teru...@lab.ntt.co.jp
Cc: Vasiliy Tolstov v.tols...@selfip.ru
Signed-off-by: Hitoshi Mitake mitake.hito...@lab.ntt.co.jp
Tested-by: Vasiliy Tolstov v.tols...@selfip.ru
Signed-off-by: Jeff Cody jc...@redhat.com
---
 block/sheepdog.c | 152 ++-
 1 file changed, 71 insertions(+), 81 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index bd7cbed..9585beb 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -318,6 +318,10 @@ enum AIOCBState {
 AIOCB_DISCARD_OBJ,
 };
 
+#define AIOCBOverwrapping(x, y) \
+(!(x-max_affect_data_idx  y-min_affect_data_idx  \
+   || y-max_affect_data_idx  x-min_affect_data_idx))
+
 struct SheepdogAIOCB {
 BlockAIOCB common;
 
@@ -334,6 +338,11 @@ struct SheepdogAIOCB {
 
 bool cancelable;
 int nr_pending;
+
+uint32_t min_affect_data_idx;
+uint32_t max_affect_data_idx;
+
+QLIST_ENTRY(SheepdogAIOCB) aiocb_siblings;
 };
 
 typedef struct BDRVSheepdogState {
@@ -362,8 +371,10 @@ typedef struct BDRVSheepdogState {
 
 /* Every aio request must be linked to either of these queues. */
 QLIST_HEAD(inflight_aio_head, AIOReq) inflight_aio_head;
-QLIST_HEAD(pending_aio_head, AIOReq) pending_aio_head;
 QLIST_HEAD(failed_aio_head, AIOReq) failed_aio_head;
+
+CoQueue overwrapping_queue;
+QLIST_HEAD(inflight_aiocb_head, SheepdogAIOCB) inflight_aiocb_head;
 } BDRVSheepdogState;
 
 static const char * sd_strerror(int err)
@@ -498,13 +509,7 @@ static void sd_aio_cancel(BlockAIOCB *blockacb)
 AIOReq *aioreq, *next;
 
 if (sd_acb_cancelable(acb)) {
-/* Remove outstanding requests from pending and failed queues.  */
-QLIST_FOREACH_SAFE(aioreq, s-pending_aio_head, aio_siblings,
-   next) {
-if (aioreq-aiocb == acb) {
-free_aio_req(s, aioreq);
-}
-}
+/* Remove outstanding requests from failed queue.  */
 QLIST_FOREACH_SAFE(aioreq, s-failed_aio_head, aio_siblings,
next) {
 if (aioreq-aiocb == acb) {
@@ -529,6 +534,10 @@ static SheepdogAIOCB *sd_aio_setup(BlockDriverState *bs, 
QEMUIOVector *qiov,
int64_t sector_num, int nb_sectors)
 {
 SheepdogAIOCB *acb;
+uint32_t object_size;
+BDRVSheepdogState *s = bs-opaque;
+
+object_size = (UINT32_C(1)  s-inode.block_size_shift);
 
 acb = qemu_aio_get(sd_aiocb_info, bs, NULL, NULL);
 
@@ -542,6 +551,11 @@ static SheepdogAIOCB *sd_aio_setup(BlockDriverState *bs, 
QEMUIOVector *qiov,
 acb-coroutine = qemu_coroutine_self();
 acb-ret = 0;
 acb-nr_pending = 0;
+
+acb-min_affect_data_idx = acb-sector_num * BDRV_SECTOR_SIZE / 
object_size;
+acb-max_affect_data_idx = (acb-sector_num * BDRV_SECTOR_SIZE +
+  acb-nb_sectors * BDRV_SECTOR_SIZE) / 
object_size;
+
 return acb;
 }
 
@@ -703,38 +717,6 @@ static int reload_inode(BDRVSheepdogState *s, uint32_t 
snapid, const char *tag);
 static int get_sheep_fd(BDRVSheepdogState *s, Error **errp);
 static void co_write_request(void *opaque);
 
-static AIOReq *find_pending_req(BDRVSheepdogState *s, uint64_t oid)
-{
-AIOReq *aio_req;
-
-QLIST_FOREACH(aio_req, s-pending_aio_head, aio_siblings) {
-if (aio_req-oid == oid) {
-return aio_req;
-}
-}
-
-return NULL;
-}
-
-/*
- * This function searchs pending requests to the object `oid', and
- * sends them.
- */
-static void coroutine_fn send_pending_req(BDRVSheepdogState *s, uint64_t oid)
-{
-AIOReq *aio_req;
-SheepdogAIOCB *acb;
-
-while ((aio_req = find_pending_req(s, oid)) != NULL) {
-acb = aio_req-aiocb;
-/* move aio_req from pending list to inflight one */
-QLIST_REMOVE(aio_req, aio_siblings);
-QLIST_INSERT_HEAD(s-inflight_aio_head, aio_req, aio_siblings);
-add_aio_request(s, aio_req, acb-qiov-iov, acb-qiov-niov,
-acb-aiocb_type);
-}
-}
-
 static coroutine_fn void reconnect_to_sdog(void *opaque)
 {
 BDRVSheepdogState *s = opaque;
@@ -840,12 +822,6 @@ static void coroutine_fn aio_read_response(void *opaque)

Re: [Qemu-block] [RFC PATCH 00/11] aio: Introduce handler type to fix nested aio_poll for dataplane

2015-07-27 Thread Fam Zheng

On Fri, 07/24 09:35, Paolo Bonzini wrote:
  That way, aio_context_acquire can be dropped by:
  
  /* @pause_owner_thread: a callback which will be called when _main 
  thread_
   * wants exclusively operate on the AioContext, for example with a 
  nested
   * aio_poll().
   * @resume_owner_thread: a callback which will be called when _main 
  thread_
   * has done the exclusive operation.
   */
  AioContext *aio_context_new(AioContextPauseResumeFn *pause_owner_thread,
  AioContextPauseResumeFn 
  *resume_owner_thread,
  void *opaque,
  Error **errp):
  
  /* Try to pause the owning thread */
  void aio_context_pause(AioContext *ctx, Error **errp);
  
  /* Try to resume the paused owning thread, cannot fail */
  void aio_context_resume(AioContext *ctx);
  
  Then, in iothread.c:
  
  iothread-ctx = aio_context_new(iothread_pause, iothread_resume,
  iothread, local_err);
  
  Where iothread_pause can block the thread with a QemuCond.
 
 Condition variables don't mix well with recursive mutexes...  Once we
 have bottom halves outside the AioContext lock, however, we could use a
 separate lock for this condition variable.  That would work.

Yes, I thought so.

 
 I like it, but I still ask myself... what's the difference between
 aio_context_pause/resume and aio_disable/enable_clients? :)  There is
 still state, just in the iothread rather than in the AioContext.

I don't know, maybe this will make aio_context_acquire no longer necessary so
we get virtio-scsi dataplane fixed?

Fam

[Qemu-block] [PATCH] megasas: Add write function to handle write access to PCI BAR 3

2015-07-27 Thread Salva Peiró

This patch fixes a QEMU SEGFAULT when a write operation is performed on
the memory region of the PCI BAR 3 (base address space).
When a writeb(0xe000) is performed the .write function is invoked to
handle the write access, however, since the .write is not initialised,
the call to 0, causes QEMU to SEGFAULT.

Signed-off-by: Salva Peiró speir...@gmail.com
---
 hw/scsi/megasas.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
index 51ba9e0..a04369c 100644
--- a/hw/scsi/megasas.c
+++ b/hw/scsi/megasas.c
@@ -2202,8 +2202,15 @@ static uint64_t megasas_queue_read(void *opaque, hwaddr 
addr,
 return 0;
 }
 
+static void megasas_queue_write(void *opaque, hwaddr addr,
+   uint64_t val, unsigned size)
+{
+return;
+}
+
 static const MemoryRegionOps megasas_queue_ops = {
 .read = megasas_queue_read,
+.write = megasas_queue_write,
 .endianness = DEVICE_LITTLE_ENDIAN,
 .impl = {
 .min_access_size = 8,
-- 
2.1.4

Re: [Qemu-block] [PATCH] megasas: Add write function to handle write access to PCI BAR 3

2015-07-27 Thread Hannes Reinecke

On 07/27/2015 10:51 AM, Salva Peiró wrote:
 This patch fixes a QEMU SEGFAULT when a write operation is performed on
 the memory region of the PCI BAR 3 (base address space).
 When a writeb(0xe000) is performed the .write function is invoked to
 handle the write access, however, since the .write is not initialised,
 the call to 0, causes QEMU to SEGFAULT.
 
 Signed-off-by: Salva Peiró speir...@gmail.com
 ---
  hw/scsi/megasas.c | 7 +++
  1 file changed, 7 insertions(+)
 
 diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
 index 51ba9e0..a04369c 100644
 --- a/hw/scsi/megasas.c
 +++ b/hw/scsi/megasas.c
 @@ -2202,8 +2202,15 @@ static uint64_t megasas_queue_read(void *opaque, 
 hwaddr addr,
  return 0;
  }
  
 +static void megasas_queue_write(void *opaque, hwaddr addr,
 +   uint64_t val, unsigned size)
 +{
 +return;
 +}
 +
  static const MemoryRegionOps megasas_queue_ops = {
  .read = megasas_queue_read,
 +.write = megasas_queue_write,
  .endianness = DEVICE_LITTLE_ENDIAN,
  .impl = {
  .min_access_size = 8,
 
Yep, that's the correct fix.

Acked-by: Hannes Reinecke h...@suse.com

Cheers,

Hannes
-- 
Dr. Hannes ReineckezSeries  Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

[Qemu-block] [PATCH V4 3/3] virtio-blk: only clear VIRTIO_F_ANY_LAYOUT for legacy device

2015-07-27 Thread Jason Wang

Chapter 6.3 of spec said


Transitional devices MUST offer, and if offered by the device
transitional drivers MUST accept the following:

VIRTIO_F_ANY_LAYOUT (27)


So this patch only clear VIRTIO_F_LAYOUT for legacy device.

Cc: Stefan Hajnoczi stefa...@redhat.com
Cc: Kevin Wolf kw...@redhat.com
Cc: qemu-block@nongnu.org
Signed-off-by: Jason Wang jasow...@redhat.com
---
 hw/block/virtio-blk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 9acbc3a..1d3f26c 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -731,7 +731,6 @@ static uint64_t virtio_blk_get_features(VirtIODevice *vdev, 
uint64_t features,
 virtio_add_feature(features, VIRTIO_BLK_F_GEOMETRY);
 virtio_add_feature(features, VIRTIO_BLK_F_TOPOLOGY);
 virtio_add_feature(features, VIRTIO_BLK_F_BLK_SIZE);
-virtio_clear_feature(features, VIRTIO_F_ANY_LAYOUT);
 if (__virtio_has_feature(features, VIRTIO_F_VERSION_1)) {
 if (s-conf.scsi) {
 error_setg(errp, Virtio 1.0 does not support scsi passthrough!);
@@ -739,6 +738,7 @@ static uint64_t virtio_blk_get_features(VirtIODevice *vdev, 
uint64_t features,
 }
 virtio_add_feature(features, VIRTIO_F_ANY_LAYOUT);
 } else {
+virtio_clear_feature(features, VIRTIO_F_ANY_LAYOUT);
 virtio_add_feature(features, VIRTIO_BLK_F_SCSI);
 }
 
-- 
2.1.4

47 matches

Mail list logo