Re: [Qemu-devel] [PATCH v3] util/async: use atomic_mb_set in qemu_bh_cancel

2017-11-08 Thread Pavel Butsykin



On 08.11.2017 17:15, Paolo Bonzini wrote:

On 08/11/2017 15:10, Sergio Lopez wrote:

I'm not quite sure that prefetching is involved in this issue, because a
prefetched read of a given address should be invalidated by a write to the
same address on another core. In our case, the write req->state = THREAD_DONE
should invalidate the read in req->state == THREAD_DONE. I am inclined to
think that this is a read being reordered with a write. That is a very real
case on x86, and I don't see anything that would prevent it:


Yes, you're right. This is actually a memory reordering issue. I'm
going to rewrite that paragraph.


Well, memory reordering _is_ caused by speculative prefetching, delayed
cache invalidation (store buffers), and so on.


What do you mean?

If we are talking about x86, then a write on another core
(like req->state = THREAD_DONE in this issue) should invalidate the
prefetched read of req->state, and the hardware guarantees this. The
prefetch is held in L1; when another CPU invalidates the cache line,
the prefetch is invalidated as well (as far as I understand it).
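Cache coherence does invalidate stale copies of a line, but a store that is still waiting in the writing core's store buffer is not yet visible to other cores, so a later load on the same core can be satisfied first. That store-load reordering is the one reordering x86 permits, and it matches the pattern here: the I/O thread stores bh->scheduled = 0 and then loads req->state. The classic store-buffering litmus test below is a minimal C11 sketch of the effect. It is not QEMU code, and `x`/`y` only loosely stand in for the two shared fields. With a `memory_order_seq_cst` fence between each thread's store and load, the C11 memory model forbids both loads from reading 0; without the fences that outcome is allowed.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Store-buffering litmus test; an illustration, not QEMU code. */
static atomic_int x, y;
static int r0, r1;      /* each written by one thread, read after join */
static int use_fence;   /* set before the threads are created */

static void *thread0(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    if (use_fence) {
        /* Full barrier between the store to x and the load of y. */
        atomic_thread_fence(memory_order_seq_cst);
    }
    r0 = atomic_load_explicit(&y, memory_order_relaxed);
    return NULL;
}

static void *thread1(void *arg)
{
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    if (use_fence) {
        atomic_thread_fence(memory_order_seq_cst);
    }
    r1 = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

/* Runs the test @iterations times and returns how many runs ended with
 * r0 == 0 && r1 == 0, i.e. each load overtook the other thread's store. */
static int sb_litmus(int iterations, int fence)
{
    int both_zero = 0;

    use_fence = fence;
    for (int i = 0; i < iterations; i++) {
        pthread_t t0, t1;

        atomic_store(&x, 0);
        atomic_store(&y, 0);
        if (pthread_create(&t0, NULL, thread0, NULL) != 0 ||
            pthread_create(&t1, NULL, thread1, NULL) != 0) {
            abort();
        }
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);
        if (r0 == 0 && r1 == 0) {
            both_zero++;
        }
    }
    return both_zero;
}
```

`sb_litmus(N, 1)` must always return 0. `sb_litmus(N, 0)` may return a non-zero count on x86, but thread start-up costs usually keep the two threads from overlapping, so a zero count there proves nothing.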


But it's probably better indeed to replace "pre-fetched" with
"outdated".  Whoever commits the patch can do the substitution (I can too).

Paolo





Re: [Qemu-devel] [PATCH v3] util/async: use atomic_mb_set in qemu_bh_cancel

2017-11-08 Thread Pavel Butsykin

On 08.11.2017 17:24, Sergio Lopez wrote:

On Wed, Nov 8, 2017 at 3:15 PM, Paolo Bonzini  wrote:

On 08/11/2017 15:10, Sergio Lopez wrote:

I'm not quite sure that prefetching is involved in this issue, because a
prefetched read of a given address should be invalidated by a write to the
same address on another core. In our case, the write req->state = THREAD_DONE
should invalidate the read in req->state == THREAD_DONE. I am inclined to
think that this is a read being reordered with a write. That is a very real
case on x86, and I don't see anything that would prevent it:


Yes, you're right. This is actually a memory reordering issue. I'm
going to rewrite that paragraph.


Well, memory reordering _is_ caused by speculative prefetching, delayed
cache invalidation (store buffers), and so on.

But it's probably better indeed to replace "pre-fetched" with
"outdated".  Whoever commits the patch can do the substitution (I can too).



Alternatively, if we want to explicitly mention the memory barrier, we
can replace the third paragraph with something like this:


This was considered to be safe, as the completion function restarts the
loop just after the call to qemu_bh_cancel. But, as this loop lacks a
hardware memory barrier, the read of req->state may actually happen
_before_ the call and still see THREAD_QUEUED, so the completion function
ends without having processed a pending TPE linked at pool->head:



Yes, that's better. Thank you.


---
Sergio
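The memory barrier mentioned in the proposed paragraph can be sketched with C11 atomics. The structures below are hypothetical, heavily simplified stand-ins for `QEMUBH` and `ThreadPoolElement`, and `atomic_mb_set()` is modelled by a sequentially consistent store (on x86, compilers typically emit `xchg`, or `mov` plus `mfence`, for it). Because every access is `memory_order_seq_cst`, the interleaving in which the completion side clears `scheduled` after the worker set it and still reads a stale `state` is ruled out: either the completion side sees `THREAD_DONE`, or the bottom half stays scheduled.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; not the real QEMU definitions. */
enum { THREAD_QUEUED, THREAD_ACTIVE, THREAD_DONE };

struct bh_model {
    atomic_int scheduled;
};

struct elem_model {
    atomic_int state;
};

/* Worker side: publish the result, then schedule the completion BH.
 * atomic_exchange() is a seq_cst read-modify-write. */
static void worker_done(struct elem_model *req, struct bh_model *bh)
{
    atomic_store(&req->state, THREAD_DONE);
    atomic_exchange(&bh->scheduled, 1);
}

/* Completion side after the fix: the seq_cst store plays the role of
 * atomic_mb_set(&bh->scheduled, 0), so the following load of req->state
 * cannot be ordered before the cancellation. */
static bool cancel_and_recheck(struct bh_model *bh, struct elem_model *req)
{
    atomic_store(&bh->scheduled, 0);
    return atomic_load(&req->state) == THREAD_DONE;
}
```

A single-threaded run can only check the plumbing; the ordering guarantee itself comes from the memory model.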





Re: [Qemu-devel] [PATCH v3] util/async: use atomic_mb_set in qemu_bh_cancel

2017-11-08 Thread Pavel Butsykin

On 08.11.2017 09:34, Sergio Lopez wrote:

Commit b7a745d added a qemu_bh_cancel call to the completion function
as an optimization to prevent it from unnecessarily rescheduling itself.

This completion function is scheduled from worker_thread, after setting
the state of a ThreadPoolElement to THREAD_DONE.



Great! We are seeing the same problem, and I was describing my fix,
when I came across your patch :)


This was considered to be safe, as the completion function restarts the
loop just after the call to qemu_bh_cancel. But, under certain access
patterns and scheduling conditions, the loop may wrongly use a
pre-fetched elem->state value, reading it as THREAD_QUEUED, and ending
the completion function without having processed a pending TPE linked at
pool->head:


I'm not quite sure that prefetching is involved in this issue, because a
prefetched read of a given address should be invalidated by a write to the
same address on another core. In our case, the write req->state = THREAD_DONE
should invalidate the read in req->state == THREAD_DONE. I am inclined to
think that this is a read being reordered with a write. That is a very real
case on x86, and I don't see anything that would prevent it:

.text:0060E21E loc_60E21E:                  ; CODE XREF: .text:0060E2F4j
.text:0060E21E         mov     rbx, [r12+98h]
.text:0060E226         test    rbx, rbx
.text:0060E229         jnz     short loc_60E238
.text:0060E22B         jmp     short exit_0
.text:0060E22B ; ---
.text:0060E22D         align 10h
.text:0060E230 loc_60E230:                  ; CODE XREF: .text:0060E240j
.text:0060E230         test    rbp, rbp
.text:0060E233         jz      short exit_0
.text:0060E235
.text:0060E235 loc_60E235:                  ; CODE XREF: .text:0060E289j
.text:0060E235         mov     rbx, rbp
.text:0060E238
.text:0060E238 loc_60E238:                  ; CODE XREF: .text:0060E229j
.text:0060E238         cmp     [rbx+ThreadPoolElement.state], 2 ; THREAD_DONE
.text:0060E23C         mov     rbp, [rbx+ThreadPoolElement.all.link_next]
.text:0060E240         jnz     short loc_60E230
.text:0060E242         mov     r15d, [rbx+ThreadPoolElement.ret]
.text:0060E246         mov     r13, [rbx+ThreadPoolElement.common.opaque]
.text:0060E24A         nop
.text:0060E24B         lea     rax, trace_events_enabled_count
.text:0060E252         mov     eax, [rax]
.text:0060E254         test    eax, eax
.text:0060E256         mov     rax, rbp
.text:0060E259         jnz     loc_60E2F9
 ...

.text:0060E2BC loc_60E2BC:                  ; CODE XREF: .text:0060E27Cj
.text:0060E2BC         mov     rdi, [r12+8]
.text:0060E2C1         call    qemu_bh_schedule
.text:0060E2C6         mov     rdi, [r12]
.text:0060E2CA         call    aio_context_release
.text:0060E2CF         mov     esi, [rbx+44h]
.text:0060E2D2         mov     rdi, [rbx+18h]
.text:0060E2D6         call    qword ptr [rbx+10h]
.text:0060E2D9         mov     rdi, [r12]
.text:0060E2DD         call    aio_context_acquire
.text:0060E2E2         mov     rdi, [r12+8]
.text:0060E2E7         call    qemu_bh_cancel
.text:0060E2EC         mov     rdi, rbx
.text:0060E2EF         call    qemu_aio_unref
.text:0060E2F4         jmp     loc_60E21E


The read in (req->state == THREAD_DONE) can be reordered before
qemu_bh_cancel(p->completion_bh), and then we get the same picture:

   worker thread                       | I/O thread
   ------------------------------------+----------------------------------
                                       | reordered read req->state
   req->state = THREAD_DONE;           |
   qemu_bh_schedule(p->completion_bh)  |
     bh->scheduled = 1;                |
                                       | qemu_bh_cancel(p->completion_bh)
                                       |   bh->scheduled = 0;
                                       | if (req->state == THREAD_DONE)
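The interleaving in the diagram can be replayed deterministically in a few lines of C. This is a sequential model with hypothetical names, not a reproduction of the real race: it simply executes the steps in the order shown, with the I/O thread's read hoisted to the top. The reordered run ends with the request in `THREAD_DONE`, the bottom half no longer scheduled and the request unprocessed, so nothing will ever complete it. Reading `req->state` after the cancel avoids that (and if the I/O thread runs entirely before the worker, the bottom half simply stays scheduled).

```c
#include <stdbool.h>

/* Hypothetical model of the two shared fields in the diagram. */
enum { THREAD_QUEUED, THREAD_DONE };

struct race_model {
    int req_state;      /* req->state */
    int bh_scheduled;   /* bh->scheduled */
    bool processed;     /* did the completion function handle req? */
};

/* Replays the diagram: the I/O thread's read of req->state happens first
 * (reordered), so its check after qemu_bh_cancel() uses a stale value. */
static struct race_model replay_reordered(void)
{
    struct race_model m = { THREAD_QUEUED, 0, false };
    int seen = m.req_state;        /* I/O: reordered read, sees QUEUED */

    m.req_state = THREAD_DONE;     /* worker: req->state = THREAD_DONE */
    m.bh_scheduled = 1;            /* worker: qemu_bh_schedule() */
    m.bh_scheduled = 0;            /* I/O: qemu_bh_cancel() */
    if (seen == THREAD_DONE) {     /* I/O: check on the stale value */
        m.processed = true;
    }
    return m;
}

/* Same worker steps, but the I/O thread reads req->state after the cancel. */
static struct race_model replay_in_order(void)
{
    struct race_model m = { THREAD_QUEUED, 0, false };

    m.req_state = THREAD_DONE;
    m.bh_scheduled = 1;
    m.bh_scheduled = 0;
    if (m.req_state == THREAD_DONE) {
        m.processed = true;
    }
    return m;
}
```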
  

Re: [Qemu-devel] [PATCH] qcow2: Emit errp when truncating the image tail

2017-10-09 Thread Pavel Butsykin

On 09.10.2017 18:54, Max Reitz wrote:

bdrv_truncate() has an errp parameter which is always set when an error
occurs.  Let's use that instead of a plain strerror().

Signed-off-by: Max Reitz 
---
  block/qcow2.c | 13 +++--
  1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index dff903e05c..2f6a8e1ff8 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3150,12 +3150,13 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset,
             return last_cluster;
         }
         if ((last_cluster + 1) * s->cluster_size < old_file_size) {
-            ret = bdrv_truncate(bs->file, (last_cluster + 1) * s->cluster_size,
-                                PREALLOC_MODE_OFF, NULL);
-            if (ret < 0) {
-                warn_report("Failed to truncate the tail of the image: %s",
-                            strerror(-ret));
-                ret = 0;
+            Error *local_err = NULL;
+
+            bdrv_truncate(bs->file, (last_cluster + 1) * s->cluster_size,
+                          PREALLOC_MODE_OFF, &local_err);
+            if (local_err) {
+                warn_reportf_err(local_err,
+                                 "Failed to truncate the tail of the image: ");
             }
         }
     } else {



Reviewed-by: Pavel Butsykin 
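The patch replaces a `strerror()`-based warning with QEMU's `Error **` convention: the callee fills in an error object, and the caller collects it in a local variable, reports it with a prefix and frees it. The sketch below models that flow with hypothetical, minimal stand-ins (`ErrorModel`, `error_set_model()`, `truncate_model()`); it is not QEMU's error API, and the warning is formatted into a buffer instead of being printed so that it can be inspected.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for QEMU's Error object. */
typedef struct ErrorModel {
    char msg[128];
} ErrorModel;

/* Fills *errp only if the caller asked for error details (errp may be NULL). */
static void error_set_model(ErrorModel **errp, const char *msg)
{
    if (errp) {
        ErrorModel *err = malloc(sizeof(*err));
        if (!err) {
            abort();
        }
        snprintf(err->msg, sizeof(err->msg), "%s", msg);
        *errp = err;
    }
}

/* Models a callee like bdrv_truncate() on a protocol that cannot shrink. */
static int truncate_model(long long offset, ErrorModel **errp)
{
    (void)offset;
    error_set_model(errp, "Shrinking is not supported");  /* example text */
    return -ENOTSUP;
}

/* Caller side, mirroring the patch: local_err decides, the warning gets a
 * prefix, and the error object is freed. Returns the warning length. */
static int cut_tail_model(char *warning, size_t len)
{
    ErrorModel *local_err = NULL;
    int n = 0;

    warning[0] = '\0';
    truncate_model(65536, &local_err);
    if (local_err) {
        n = snprintf(warning, len,
                     "warning: Failed to truncate the tail of the image: %s",
                     local_err->msg);
        free(local_err);
    }
    return n;
}
```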



[Qemu-devel] [PATCH v3 2/2] qcow2: truncate the tail of the image file after shrinking the image

2017-09-28 Thread Pavel Butsykin
After shrinking the image, there might be a tail at the end of the image file
that will probably never be used. So we can find the last used cluster and
cut off the tail.

Signed-off-by: Pavel Butsykin 
Reviewed-by: John Snow 
---
 block/qcow2-refcount.c | 22 ++
 block/qcow2.c  | 23 +++
 block/qcow2.h  |  1 +
 3 files changed, 46 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 88d5a3f1ad..aa3fd6cf17 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -3181,3 +3181,25 @@ out:
     g_free(reftable_tmp);
     return ret;
 }
+
+int64_t qcow2_get_last_cluster(BlockDriverState *bs, int64_t size)
+{
+    BDRVQcow2State *s = bs->opaque;
+    int64_t i;
+
+    for (i = size_to_clusters(s, size) - 1; i >= 0; i--) {
+        uint64_t refcount;
+        int ret = qcow2_get_refcount(bs, i, &refcount);
+        if (ret < 0) {
+            fprintf(stderr, "Can't get refcount for cluster %" PRId64 ": %s\n",
+                    i, strerror(-ret));
+            return ret;
+        }
+        if (refcount > 0) {
+            return i;
+        }
+    }
+    qcow2_signal_corruption(bs, true, -1, -1,
+                            "There are no references in the refcount table.");
+    return -EIO;
+}
diff --git a/block/qcow2.c b/block/qcow2.c
index 8a4311d338..f08c69ccd9 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3106,6 +3106,7 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset,
     new_l1_size = size_to_l1(s, offset);
 
     if (offset < old_length) {
+        int64_t last_cluster, old_file_size;
         if (prealloc != PREALLOC_MODE_OFF) {
             error_setg(errp,
                        "Preallocation can't be used for shrinking an image");
@@ -3134,6 +3135,28 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset,
                              "Failed to discard unused refblocks");
             return ret;
         }
+
+        old_file_size = bdrv_getlength(bs->file->bs);
+        if (old_file_size < 0) {
+            error_setg_errno(errp, -old_file_size,
+                             "Failed to inquire current file length");
+            return old_file_size;
+        }
+        last_cluster = qcow2_get_last_cluster(bs, old_file_size);
+        if (last_cluster < 0) {
+            error_setg_errno(errp, -last_cluster,
+                             "Failed to find the last cluster");
+            return last_cluster;
+        }
+        if ((last_cluster + 1) * s->cluster_size < old_file_size) {
+            ret = bdrv_truncate(bs->file, (last_cluster + 1) * s->cluster_size,
+                                PREALLOC_MODE_OFF, NULL);
+            if (ret < 0) {
+                warn_report("Failed to truncate the tail of the image: "
+                            "ret = %d", ret);
+                ret = 0;
+            }
+        }
     } else {
         ret = qcow2_grow_l1_table(bs, new_l1_size, true);
         if (ret < 0) {
diff --git a/block/qcow2.h b/block/qcow2.h
index 5a289a81e2..782a206ecb 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -597,6 +597,7 @@ int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order,
                                 BlockDriverAmendStatusCB *status_cb,
                                 void *cb_opaque, Error **errp);
 int qcow2_shrink_reftable(BlockDriverState *bs);
+int64_t qcow2_get_last_cluster(BlockDriverState *bs, int64_t size);
 
 /* qcow2-cluster.c functions */
 int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
-- 
2.14.1
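For experimenting with the search outside QEMU, here is a self-contained C model of the backward scan that `qcow2_get_last_cluster()` performs. The refcount table is a plain array, and `get_refcount_model()` is a hypothetical stand-in for `qcow2_get_refcount()`; in this model, clusters past the end of the array count as unused, and a lookup failure can be injected at `bad_cluster`.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical in-memory refcount table; not QEMU's refcount structures. */
struct refcount_model {
    const uint64_t *refcounts;  /* refcount of each host cluster */
    int64_t nb_entries;
    int64_t bad_cluster;        /* lookup of this index fails; -1 for none */
};

/* Stand-in for qcow2_get_refcount(): 0 on success, negative errno on error. */
static int get_refcount_model(const struct refcount_model *t, int64_t i,
                              uint64_t *refcount)
{
    if (i == t->bad_cluster) {
        return -EINVAL;         /* arbitrary error code for the illustration */
    }
    *refcount = i < t->nb_entries ? t->refcounts[i] : 0;
    return 0;
}

/* Backward scan: start at the cluster holding the last byte of @size and
 * stop at the first cluster that is still referenced. */
static int64_t last_cluster_model(const struct refcount_model *t,
                                  int64_t size, int64_t cluster_size)
{
    assert(cluster_size > 0 && size >= 0);
    int64_t nb_clusters = (size + cluster_size - 1) / cluster_size;

    for (int64_t i = nb_clusters - 1; i >= 0; i--) {
        uint64_t refcount;
        int ret = get_refcount_model(t, i, &refcount);
        if (ret < 0) {
            return ret;         /* propagate lookup errors to the caller */
        }
        if (refcount > 0) {
            return i;
        }
    }
    return -EIO;                /* nothing referenced at all: metadata is broken */
}
```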




[Qemu-devel] [PATCH v3 0/2] Truncate the tail of the image file in qcow2 shrinking

2017-09-28 Thread Pavel Butsykin
After shrinking a qcow2 image, there might be a tail at the end of the image
file that will probably never be used. Although it will not bring any tangible
benefit, we can cut off the tail if there is one. Yes, it will not free up disk
space, but if the blocks were allocated sequentially and the image is not
heavily fragmented, then the virtual size of the image file will be
commensurate with its real size. That doesn't look like a great plus either.
Well, at least we can discuss it.
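The arithmetic behind cutting the tail is small: keep everything up to and including the last cluster that is still referenced, and shrink the file only if that length is below the current one. A minimal sketch with a hypothetical helper name:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: computes the file length that keeps clusters
 * 0..last_cluster and reports whether the file has a tail beyond it. */
static bool image_tail_to_cut(int64_t last_cluster, int64_t cluster_size,
                              int64_t old_file_size, int64_t *new_file_size)
{
    *new_file_size = (last_cluster + 1) * cluster_size;
    return *new_file_size < old_file_size;
}
```

With 64 KiB clusters, a 1 MiB file whose last referenced cluster is index 9 would be cut to 640 KiB; a partial cluster after the last referenced one also counts as tail.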

Changes from v1:
- rewrite qcow2_get_last_cluster() function according to Max's comments. (2)

Changes from v2:
- report a warning if truncation of the tail of the image file failed. (2)

Pavel Butsykin (2):
  qcow2: fix return error code in qcow2_truncate()
  qcow2: truncate the tail of the image file after shrinking the image

 block/qcow2-refcount.c | 22 ++
 block/qcow2.c  | 27 +--
 block/qcow2.h  |  1 +
 3 files changed, 48 insertions(+), 2 deletions(-)

-- 
2.14.1




[Qemu-devel] [PATCH v3 1/2] qcow2: fix return error code in qcow2_truncate()

2017-09-28 Thread Pavel Butsykin
Signed-off-by: Pavel Butsykin 
Reviewed-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Max Reitz 
---
 block/qcow2.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 2174a84d1f..8a4311d338 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3166,7 +3166,7 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset,
         if (old_file_size < 0) {
             error_setg_errno(errp, -old_file_size,
                              "Failed to inquire current file length");
-            return ret;
+            return old_file_size;
         }
 
         nb_new_data_clusters = DIV_ROUND_UP(offset - old_length,
@@ -3195,7 +3195,7 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset,
         if (allocation_start < 0) {
             error_setg_errno(errp, -allocation_start,
                              "Failed to resize refcount structures");
-            return -allocation_start;
+            return allocation_start;
         }
 
         clusters_allocated = qcow2_alloc_clusters_at(bs, allocation_start,
-- 
2.14.1
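Both hunks restore the usual convention that a failing function returns a negative errno value and callers propagate it unchanged. The first hunk returned `ret`, which does not carry the error reported in `old_file_size`; the second negated an already negative value. A minimal sketch of the convention, with hypothetical function names:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical allocator: returns a non-negative offset or a negative errno. */
static int64_t alloc_model(int64_t wanted, int64_t limit)
{
    return wanted <= limit ? wanted : -ENOSPC;
}

/* Correct caller: propagate the negative errno as-is. */
static int grow_fixed(int64_t wanted, int64_t limit)
{
    int64_t start = alloc_model(wanted, limit);
    if (start < 0) {
        return start;
    }
    return 0;
}

/* The old pattern: negating turns -ENOSPC into +ENOSPC, which a caller
 * that checks "ret < 0" mistakes for success. */
static int grow_buggy(int64_t wanted, int64_t limit)
{
    int64_t start = alloc_model(wanted, limit);
    if (start < 0) {
        return -start;
    }
    return 0;
}
```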




Re: [Qemu-devel] [PATCH v2 2/2] qcow2: truncate the tail of the image file after shrinking the image

2017-09-28 Thread Pavel Butsykin

On 27.09.2017 19:36, Max Reitz wrote:

On 2017-09-27 18:27, Pavel Butsykin wrote:

On 27.09.2017 19:00, Max Reitz wrote:

On 2017-09-22 11:39, Pavel Butsykin wrote:

After shrinking the image, there might be a tail at the end of the image file
that will probably never be used. So we can find the last used cluster and
cut off the tail.

Signed-off-by: Pavel Butsykin 
---
   block/qcow2-refcount.c | 22 ++
   block/qcow2.c  | 23 +++
   block/qcow2.h  |  1 +
   3 files changed, 46 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 88d5a3f1ad..aa3fd6cf17 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -3181,3 +3181,25 @@ out:
   g_free(reftable_tmp);
   return ret;
   }
+
+int64_t qcow2_get_last_cluster(BlockDriverState *bs, int64_t size)
+{
+BDRVQcow2State *s = bs->opaque;
+int64_t i;
+
+for (i = size_to_clusters(s, size) - 1; i >= 0; i--) {
+uint64_t refcount;
+int ret = qcow2_get_refcount(bs, i, &refcount);
+if (ret < 0) {
+fprintf(stderr, "Can't get refcount for cluster %" PRId64 ": %s\n",
+i, strerror(-ret));
+return ret;
+}
+if (refcount > 0) {
+return i;
+}
+}
+qcow2_signal_corruption(bs, true, -1, -1,
+"There are no references in the refcount table.");
+return -EIO;
+}
diff --git a/block/qcow2.c b/block/qcow2.c
index 8a4311d338..8dfb5393a7 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3106,6 +3106,7 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset,
   new_l1_size = size_to_l1(s, offset);
 if (offset < old_length) {
+int64_t last_cluster, old_file_size;
   if (prealloc != PREALLOC_MODE_OFF) {
   error_setg(errp,
  "Preallocation can't be used for shrinking an image");
@@ -3134,6 +3135,28 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset,
"Failed to discard unused refblocks");
   return ret;
   }
+
+old_file_size = bdrv_getlength(bs->file->bs);
+if (old_file_size < 0) {
+error_setg_errno(errp, -old_file_size,
+ "Failed to inquire current file length");
+return old_file_size;
+}
+last_cluster = qcow2_get_last_cluster(bs, old_file_size);
+if (last_cluster < 0) {
+error_setg_errno(errp, -last_cluster,
+ "Failed to find the last cluster");
+return last_cluster;
+}


My idea was rather that you just wouldn't truncate the image file if
something fails here.  So in any of these new cases where you currently
just report the whole truncate operation as having failed, you could
just emit a warning and not do the truncation of bs->file.


I'm not sure that's right. If qcow2_get_last_cluster() returned an
error, there was probably a problem with the image. Can we continue to
work with the image without risking damaging it even more? If something
bad happens to the reftable, we usually mark the image as corrupted;
this is the same thing.


Well, the only thing that's left to do is to write the new size into the
image header, so I think that should work just fine...


Yes, but what difference does it make whether we update the size in the
header or not, if the reftable is corrupted? The much more important
point here is that qcow2_truncate() should return an error and the
caller must stop.


I won't disagree that bdrv_getlength() or qcow2_get_last_cluster()
failing may be reasons to stop truncation (although I don't think they
necessarily are at this point).

But I could well imagine that the below bdrv_truncate() of bs->file
fails for benign reasons, e.g. because the underlying protocol does not
support shrinking of images or something.  Then we probably should carry on.
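The policy converged on here (and implemented in v3) can be sketched as follows: a failure to determine the last used cluster fails the whole truncate, while a failure of the final truncation of `bs->file` is only reported as a warning. The hooks below are hypothetical stand-ins for `qcow2_get_last_cluster()` and `bdrv_truncate()`, not QEMU's API, and the example hooks at the end exist only to exercise the three outcomes.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical hooks standing in for qcow2_get_last_cluster() and
 * bdrv_truncate(bs->file, ...). */
struct tail_ops {
    int64_t (*last_cluster)(void);
    int (*truncate_file)(int64_t new_size);
};

static int cut_tail_best_effort(const struct tail_ops *ops,
                                int64_t cluster_size, int64_t old_file_size)
{
    int64_t last = ops->last_cluster();
    if (last < 0) {
        return (int)last;       /* metadata trouble: fail the whole truncate */
    }
    int64_t new_size = (last + 1) * cluster_size;
    if (new_size < old_file_size) {
        int ret = ops->truncate_file(new_size);
        if (ret < 0) {
            /* Benign, e.g. the protocol cannot shrink: warn and carry on. */
            fprintf(stderr,
                    "warning: Failed to truncate the tail of the image: %d\n",
                    ret);
        }
    }
    return 0;
}

/* Example hooks for trying the policy out. */
static int64_t truncated_to = -1;
static int64_t last_is_9(void) { return 9; }
static int64_t last_fails(void) { return -EIO; }
static int truncate_ok(int64_t n) { truncated_to = n; return 0; }
static int truncate_unsupported(int64_t n) { (void)n; return -ENOTSUP; }
```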


Yes, I agree. If the bdrv_truncate() of bs->file fails, we can just
print a warning :) So I'll send a new version of the patch with this
change.


Max


Although, if the shrink is almost complete, truncating bs->file isn't
that important, and we could still update the qcow2 header.


I can live with the current version, though, so:

Reviewed-by: Max Reitz 

But I'll wait for a response from you before merging this series.

Max


+if ((last_cluster + 1) * s->cluster_size < old_file_size) {
+ret = bdrv_truncate(bs->file, (last_cluster + 1) * s->cluster_size,
+PREALLOC_MODE_OFF, NULL);
+if (ret < 0) {
+error_setg_errno(errp, -ret,
+ "Failed to truncate the tail of the image");

Re: [Qemu-devel] [PATCH v2 2/2] qcow2: truncate the tail of the image file after shrinking the image

2017-09-27 Thread Pavel Butsykin

On 27.09.2017 19:00, Max Reitz wrote:

On 2017-09-22 11:39, Pavel Butsykin wrote:

After shrinking the image, there might be a tail at the end of the image file
that will probably never be used. So we can find the last used cluster and
cut off the tail.

Signed-off-by: Pavel Butsykin 
---
  block/qcow2-refcount.c | 22 ++
  block/qcow2.c  | 23 +++
  block/qcow2.h  |  1 +
  3 files changed, 46 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 88d5a3f1ad..aa3fd6cf17 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -3181,3 +3181,25 @@ out:
  g_free(reftable_tmp);
  return ret;
  }
+
+int64_t qcow2_get_last_cluster(BlockDriverState *bs, int64_t size)
+{
+BDRVQcow2State *s = bs->opaque;
+int64_t i;
+
+for (i = size_to_clusters(s, size) - 1; i >= 0; i--) {
+uint64_t refcount;
+int ret = qcow2_get_refcount(bs, i, &refcount);
+if (ret < 0) {
+fprintf(stderr, "Can't get refcount for cluster %" PRId64 ": %s\n",
+i, strerror(-ret));
+return ret;
+}
+if (refcount > 0) {
+return i;
+}
+}
+qcow2_signal_corruption(bs, true, -1, -1,
+"There are no references in the refcount table.");
+return -EIO;
+}
diff --git a/block/qcow2.c b/block/qcow2.c
index 8a4311d338..8dfb5393a7 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3106,6 +3106,7 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset,
  new_l1_size = size_to_l1(s, offset);
  
  if (offset < old_length) {
+int64_t last_cluster, old_file_size;
  if (prealloc != PREALLOC_MODE_OFF) {
  error_setg(errp,
 "Preallocation can't be used for shrinking an image");
@@ -3134,6 +3135,28 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset,
   "Failed to discard unused refblocks");
  return ret;
  }
+
+old_file_size = bdrv_getlength(bs->file->bs);
+if (old_file_size < 0) {
+error_setg_errno(errp, -old_file_size,
+ "Failed to inquire current file length");
+return old_file_size;
+}
+last_cluster = qcow2_get_last_cluster(bs, old_file_size);
+if (last_cluster < 0) {
+error_setg_errno(errp, -last_cluster,
+ "Failed to find the last cluster");
+return last_cluster;
+}


My idea was rather that you just wouldn't truncate the image file if
something fails here.  So in any of these new cases where you currently
just report the whole truncate operation as having failed, you could
just emit a warning and not do the truncation of bs->file.


I'm not sure that's right. If qcow2_get_last_cluster() returned an
error, there was probably a problem with the image. Can we continue to
work with the image without risking damaging it even more? If something
bad happens to the reftable, we usually mark the image as corrupted;
this is the same thing.

Although, if the shrink is almost complete, truncating bs->file isn't
that important, and we could still update the qcow2 header.


I can live with the current version, though, so:

Reviewed-by: Max Reitz 

But I'll wait for a response from you before merging this series.

Max


+if ((last_cluster + 1) * s->cluster_size < old_file_size) {
+ret = bdrv_truncate(bs->file, (last_cluster + 1) * s->cluster_size,
+PREALLOC_MODE_OFF, NULL);
+if (ret < 0) {
+error_setg_errno(errp, -ret,
+ "Failed to truncate the tail of the image");
+return ret;
+}
+}
  } else {
  ret = qcow2_grow_l1_table(bs, new_l1_size, true);
  if (ret < 0) {
diff --git a/block/qcow2.h b/block/qcow2.h
index 5a289a81e2..782a206ecb 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -597,6 +597,7 @@ int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order,
  BlockDriverAmendStatusCB *status_cb,
  void *cb_opaque, Error **errp);
  int qcow2_shrink_reftable(BlockDriverState *bs);
+int64_t qcow2_get_last_cluster(BlockDriverState *bs, int64_t size);
  
  /* qcow2-cluster.c functions */
  int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,








Re: [Qemu-devel] [PATCH v2 0/2] Truncate the tail of the image file in qcow2 shrinking

2017-09-22 Thread Pavel Butsykin

On 22.09.2017 12:50, Daniel P. Berrange wrote:

On Fri, Sep 22, 2017 at 12:39:24PM +0300, Pavel Butsykin wrote:

After shrinking a qcow2 image, there might be a tail at the end of the image
file that will probably never be used. Although it will not bring any tangible
benefit, we can cut off the tail if there is one. Yes, it will not free up disk
space, but if the blocks were allocated sequentially and the image is not
heavily fragmented, then the virtual size of the image file will be
commensurate with its real size. That doesn't look like a great plus either.
Well, at least we can discuss it.


If the block backend has discard support enabled, can't we get the tail
to be discarded rather than merely truncated ?



It has already been implemented (see
https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg04581.html).

Sorry, I forgot to mention that this patch is based on Max's block
branch (https://github.com/XanClic/qemu/commits/block). In fact, the
truncation will always be done on an area that has already been
discarded, so it is only useful if the block backend doesn't support
discard or the file system doesn't support sparse files.
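The distinction here is between freeing blocks and shortening the file. Discarding (on Linux, for example, `fallocate()` with `FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE`) releases the blocks behind a range but leaves the file length unchanged, whereas truncation changes the length itself. The POSIX sketch below demonstrates only the truncation side: it writes three 64 KiB "clusters" to a scratch file (the path template and `truncate_demo()` are arbitrary choices for the example) and cuts the last one off with `ftruncate()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_CLUSTER 65536

/* Writes three DEMO_CLUSTER-sized chunks to a scratch file, truncates the
 * last one away and returns the resulting st_size, or -1 on error. */
static long long truncate_demo(void)
{
    static char buf[DEMO_CLUSTER];
    char path[] = "/tmp/qcow2-tail-demo-XXXXXX";
    struct stat st;
    long long result = -1;
    int fd = mkstemp(path);

    if (fd < 0) {
        return -1;
    }
    unlink(path);   /* the file disappears once fd is closed */
    for (int i = 0; i < 3; i++) {
        memset(buf, 'a' + i, sizeof(buf));
        if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
            goto out;
        }
    }
    if (ftruncate(fd, 2 * DEMO_CLUSTER) == 0 && fstat(fd, &st) == 0) {
        result = (long long)st.st_size;   /* file length after cutting the tail */
    }
out:
    close(fd);
    return result;
}
```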


Regards,
Daniel





[Qemu-devel] [PATCH v2 2/2] qcow2: truncate the tail of the image file after shrinking the image

2017-09-22 Thread Pavel Butsykin
After shrinking the image, there might be a tail at the end of the image file
that will probably never be used. So we can find the last used cluster and
cut off the tail.

Signed-off-by: Pavel Butsykin 
---
 block/qcow2-refcount.c | 22 ++
 block/qcow2.c  | 23 +++
 block/qcow2.h  |  1 +
 3 files changed, 46 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 88d5a3f1ad..aa3fd6cf17 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -3181,3 +3181,25 @@ out:
 g_free(reftable_tmp);
 return ret;
 }
+
+int64_t qcow2_get_last_cluster(BlockDriverState *bs, int64_t size)
+{
+BDRVQcow2State *s = bs->opaque;
+int64_t i;
+
+for (i = size_to_clusters(s, size) - 1; i >= 0; i--) {
+uint64_t refcount;
+int ret = qcow2_get_refcount(bs, i, &refcount);
+if (ret < 0) {
+fprintf(stderr, "Can't get refcount for cluster %" PRId64 ": %s\n",
+i, strerror(-ret));
+return ret;
+}
+if (refcount > 0) {
+return i;
+}
+}
+qcow2_signal_corruption(bs, true, -1, -1,
+"There are no references in the refcount table.");
+return -EIO;
+}
diff --git a/block/qcow2.c b/block/qcow2.c
index 8a4311d338..8dfb5393a7 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3106,6 +3106,7 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset,
 new_l1_size = size_to_l1(s, offset);
 
 if (offset < old_length) {
+int64_t last_cluster, old_file_size;
 if (prealloc != PREALLOC_MODE_OFF) {
 error_setg(errp,
"Preallocation can't be used for shrinking an image");
@@ -3134,6 +3135,28 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset,
  "Failed to discard unused refblocks");
 return ret;
 }
+
+old_file_size = bdrv_getlength(bs->file->bs);
+if (old_file_size < 0) {
+error_setg_errno(errp, -old_file_size,
+ "Failed to inquire current file length");
+return old_file_size;
+}
+last_cluster = qcow2_get_last_cluster(bs, old_file_size);
+if (last_cluster < 0) {
+error_setg_errno(errp, -last_cluster,
+ "Failed to find the last cluster");
+return last_cluster;
+}
+if ((last_cluster + 1) * s->cluster_size < old_file_size) {
+ret = bdrv_truncate(bs->file, (last_cluster + 1) * s->cluster_size,
+PREALLOC_MODE_OFF, NULL);
+if (ret < 0) {
+error_setg_errno(errp, -ret,
+ "Failed to truncate the tail of the image");
+return ret;
+}
+}
 } else {
 ret = qcow2_grow_l1_table(bs, new_l1_size, true);
 if (ret < 0) {
diff --git a/block/qcow2.h b/block/qcow2.h
index 5a289a81e2..782a206ecb 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -597,6 +597,7 @@ int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order,
 BlockDriverAmendStatusCB *status_cb,
 void *cb_opaque, Error **errp);
 int qcow2_shrink_reftable(BlockDriverState *bs);
+int64_t qcow2_get_last_cluster(BlockDriverState *bs, int64_t size);
 
 /* qcow2-cluster.c functions */
 int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
-- 
2.14.1




[Qemu-devel] [PATCH v2 0/2] Truncate the tail of the image file in qcow2 shrinking

2017-09-22 Thread Pavel Butsykin
After shrinking a qcow2 image, there might be a tail at the end of the image
file that will probably never be used. Although it will not bring any tangible
benefit, we can cut off the tail if there is one. Yes, it will not free up disk
space, but if the blocks were allocated sequentially and the image is not
heavily fragmented, then the virtual size of the image file will be
commensurate with its real size. That doesn't look like a great plus either.
Well, at least we can discuss it.

Changes from v1:
- rewrite qcow2_get_last_cluster() function according to Max's comments. (2)

Pavel Butsykin (2):
  qcow2: fix return error code in qcow2_truncate()
  qcow2: truncate the tail of the image file after shrinking the image

 block/qcow2-refcount.c | 22 ++
 block/qcow2.c  | 27 +--
 block/qcow2.h  |  1 +
 3 files changed, 48 insertions(+), 2 deletions(-)

-- 
2.14.1




[Qemu-devel] [PATCH v2 1/2] qcow2: fix return error code in qcow2_truncate()

2017-09-22 Thread Pavel Butsykin
Signed-off-by: Pavel Butsykin 
Reviewed-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Max Reitz 
---
 block/qcow2.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 2174a84d1f..8a4311d338 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3166,7 +3166,7 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset,
 if (old_file_size < 0) {
 error_setg_errno(errp, -old_file_size,
  "Failed to inquire current file length");
-return ret;
+return old_file_size;
 }
 
 nb_new_data_clusters = DIV_ROUND_UP(offset - old_length,
@@ -3195,7 +3195,7 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset,
 if (allocation_start < 0) {
 error_setg_errno(errp, -allocation_start,
  "Failed to resize refcount structures");
-return -allocation_start;
+return allocation_start;
 }
 
 clusters_allocated = qcow2_alloc_clusters_at(bs, allocation_start,
-- 
2.14.1




Re: [Qemu-devel] [PATCH 2/2] qcow2: truncate the tail of the image file after shrinking the image

2017-09-21 Thread Pavel Butsykin

On 21.09.2017 18:28, Max Reitz wrote:

On 2017-09-20 15:58, Pavel Butsykin wrote:

After shrinking the image, there might be a tail at the end of the image file
that will probably never be used. So we can find the last used cluster and
cut off the tail.

Signed-off-by: Pavel Butsykin 
---
  block/qcow2-refcount.c | 21 +
  block/qcow2.c  | 19 +++
  block/qcow2.h  |  1 +
  3 files changed, 41 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 88d5a3f1ad..5e221a166c 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -3181,3 +3181,24 @@ out:
  g_free(reftable_tmp);
  return ret;
  }
+
+int64_t qcow2_get_last_cluster(BlockDriverState *bs, int64_t size)
+{
+BDRVQcow2State *s = bs->opaque;
+int64_t i, last_cluster, nb_clusters = size_to_clusters(s, size);
+uint64_t refcount;
+
+for (i = 0, last_cluster = 0; i < nb_clusters; i++) {
+int ret = qcow2_get_refcount(bs, i, &refcount);
+if (ret < 0) {
+fprintf(stderr, "Can't get refcount for cluster %" PRId64 ": %s\n",
+i, strerror(-ret));
+continue;
+}
+
+if (refcount > 0) {
+last_cluster = i;
+}
+}
+return last_cluster;
+}


Wouldn't it make more sense to start from the end of the image?


If it reduces the number of iterations, then yes. But it depends on the
situation: if you truncate the image by more than half, it can increase the
number of iterations. But intuitively, starting from the end seems
better :)
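One detail helps settle the question: the v1 loop has no early exit, so it always performs one refcount lookup per cluster of the old file. A backward scan stops at the first cluster in use, so it performs `nb_clusters - last_cluster` lookups, which is never more; its advantage only shrinks as the unused tail grows. A small sketch that counts lookups over a plain array of refcounts (the function names are hypothetical):

```c
#include <stdint.h>

/* v1-style forward scan: remembers the last cluster in use, never stops early. */
static int64_t forward_scan(const uint64_t *refcounts, int64_t nb_clusters,
                            int64_t *lookups)
{
    int64_t last = 0;

    *lookups = 0;
    for (int64_t i = 0; i < nb_clusters; i++) {
        (*lookups)++;
        if (refcounts[i] > 0) {
            last = i;
        }
    }
    return last;
}

/* Backward scan: the first cluster in use, seen from the end, is the answer. */
static int64_t backward_scan(const uint64_t *refcounts, int64_t nb_clusters,
                             int64_t *lookups)
{
    *lookups = 0;
    for (int64_t i = nb_clusters - 1; i >= 0; i--) {
        (*lookups)++;
        if (refcounts[i] > 0) {
            return i;
        }
    }
    return -1;   /* nothing in use */
}
```

With the first half of eight clusters in use, the forward scan does 8 lookups and the backward scan 5; with only cluster 0 in use, both do 8.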

Max





Re: [Qemu-devel] [PATCH 2/2] qcow2: truncate the tail of the image file after shrinking the image

2017-09-21 Thread Pavel Butsykin

On 21.09.2017 18:30, Max Reitz wrote:

On 2017-09-20 15:58, Pavel Butsykin wrote:

After shrinking the image, there might be a tail at the end of the image file
that will probably never be used. So we can find the last used cluster and
cut off the tail.

Signed-off-by: Pavel Butsykin 
---
  block/qcow2-refcount.c | 21 +
  block/qcow2.c  | 19 +++
  block/qcow2.h  |  1 +
  3 files changed, 41 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 88d5a3f1ad..5e221a166c 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -3181,3 +3181,24 @@ out:
  g_free(reftable_tmp);
  return ret;
  }
+
+int64_t qcow2_get_last_cluster(BlockDriverState *bs, int64_t size)
+{
+BDRVQcow2State *s = bs->opaque;
+int64_t i, last_cluster, nb_clusters = size_to_clusters(s, size);
+uint64_t refcount;
+
+for (i = 0, last_cluster = 0; i < nb_clusters; i++) {
+int ret = qcow2_get_refcount(bs, i, &refcount);
+if (ret < 0) {
+fprintf(stderr, "Can't get refcount for cluster %" PRId64 ": %s\n",
+i, strerror(-ret));
+continue;


Oh, and another thing: If you decide to ignore errors here, I'd at least
consider the cluster allocated.

Of course it would also be possible not to ignore errors, and instead
return them to the caller which would then just not truncate the file.
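The first suggestion, treating a cluster whose refcount cannot be read as allocated, can be sketched on top of the v1 loop. Here each per-cluster lookup result is modelled as an array entry in which a negative value stands for a failed `qcow2_get_refcount()` call (a hypothetical simplification). With the flag set, an unreadable cluster can never end up beyond the truncation point; with it cleared, as in v1, the file could be cut below a cluster that may still be in use.

```c
#include <stdbool.h>
#include <stdint.h>

/* lookups[i] > 0: cluster in use; 0: free; < 0: the refcount lookup failed. */
static int64_t last_cluster_v1(const int64_t *lookups, int64_t nb_clusters,
                               bool errors_mean_allocated)
{
    int64_t last_cluster = 0;

    for (int64_t i = 0; i < nb_clusters; i++) {
        if (lookups[i] < 0) {
            if (errors_mean_allocated) {
                last_cluster = i;   /* unknown state: keep this cluster */
            }
            continue;
        }
        if (lookups[i] > 0) {
            last_cluster = i;
        }
    }
    return last_cluster;
}
```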


Yes, that seems safer.


Max


+}
+
+if (refcount > 0) {
+last_cluster = i;
+}
+}
+return last_cluster;
+}






Re: [Qemu-devel] [Qemu-block] [PATCH 2/2] qcow2: truncate the tail of the image file after shrinking the image

2017-09-21 Thread Pavel Butsykin

On 21.09.2017 00:38, John Snow wrote:



On 09/20/2017 09:58 AM, Pavel Butsykin wrote:

After shrinking the image, there might be a tail at the end of the image file
that will probably never be used. So we can find the last used cluster and
cut off the tail.

Signed-off-by: Pavel Butsykin 
---
  block/qcow2-refcount.c | 21 +
  block/qcow2.c  | 19 +++
  block/qcow2.h  |  1 +
  3 files changed, 41 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 88d5a3f1ad..5e221a166c 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -3181,3 +3181,24 @@ out:
  g_free(reftable_tmp);
  return ret;
  }
+
+int64_t qcow2_get_last_cluster(BlockDriverState *bs, int64_t size)
+{
+BDRVQcow2State *s = bs->opaque;
+int64_t i, last_cluster, nb_clusters = size_to_clusters(s, size);
+uint64_t refcount;
+
+for (i = 0, last_cluster = 0; i < nb_clusters; i++) {
+int ret = qcow2_get_refcount(bs, i, &refcount);
+if (ret < 0) {
+fprintf(stderr, "Can't get refcount for cluster %" PRId64 ": %s\n",
+i, strerror(-ret));
+continue;
+}
+
+if (refcount > 0) {
+last_cluster = i;
+}
+}
+return last_cluster;
+}
diff --git a/block/qcow2.c b/block/qcow2.c
index 8a4311d338..c3b6dd44c4 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3106,6 +3106,7 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset,
  new_l1_size = size_to_l1(s, offset);
  
  if (offset < old_length) {
+int64_t image_end_offset, old_file_size;
  if (prealloc != PREALLOC_MODE_OFF) {
  error_setg(errp,
 "Preallocation can't be used for shrinking an image");
@@ -3134,6 +3135,24 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset,
   "Failed to discard unused refblocks");
  return ret;
  }
+
+old_file_size = bdrv_getlength(bs->file->bs);
+if (old_file_size < 0) {
+error_setg_errno(errp, -old_file_size,
+ "Failed to inquire current file length");
+return old_file_size;
+}
+image_end_offset = (qcow2_get_last_cluster(bs, old_file_size) + 1) *
+   s->cluster_size;
+if (image_end_offset < old_file_size) {
+ret = bdrv_truncate(bs->file, image_end_offset,
+PREALLOC_MODE_OFF, NULL);
+if (ret < 0) {
+error_setg_errno(errp, -ret,
+ "Failed to truncate the tail of the image");


I've recently become skeptical of what partial resize successes look
like, but that's an issue for another day entirely.


+return ret;
+}
+}
  } else {
  ret = qcow2_grow_l1_table(bs, new_l1_size, true);
  if (ret < 0) {
diff --git a/block/qcow2.h b/block/qcow2.h
index 5a289a81e2..782a206ecb 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -597,6 +597,7 @@ int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order,
  BlockDriverAmendStatusCB *status_cb,
  void *cb_opaque, Error **errp);
  int qcow2_shrink_reftable(BlockDriverState *bs);
+int64_t qcow2_get_last_cluster(BlockDriverState *bs, int64_t size);
  
  /* qcow2-cluster.c functions */
  int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,



Reviewed-by: John Snow 

Looks sane to me, but under which circumstances might we grow such a
tail? I assume the actual truncate call aligns to cluster boundaries as
appropriate, so is this a bit of a "quick fix" to cull unused clusters
that happened to be near the truncate boundary?

It might be worth documenting the circumstances that produce this
unused space that will never get used. My hunch is that such unused
space should likely be getting reclaimed elsewhere and not here, but
perhaps I'm misunderstanding the causal factors.



This is a consequence of how we implemented shrinking of qcow2 images
(https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg04580.html).
On the other hand, if we need to shrink a qcow2 image without copying
the data, this is the only way. A guest offset can map to almost any
host offset in the file, i.e. the first guest cluster may be located
somewhere in the middle or at the end of the image file. So we can't
simply truncate the image file at the truncation boundary; instead, to
shrink the image we discard the clusters that correspond to the
truncated area. The result is a sparse image file whose apparent size
differs from its actual (allocated) size. The tail in this case is the
range between the end of the last used cluster and the end of the file,
so cutting off the tail does not change the actual disk usage, only the
apparent file size.


--js
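The distinction between apparent and actual size that this reply relies on can be observed with plain POSIX calls. The sketch below is illustrative only and not part of the patch: `sparse_sizes` and the demo path are hypothetical. It extends a file with `ftruncate()` without writing any data and reports `st_size` (the apparent size that `ls -l` shows) next to `st_blocks` (the allocated space that `du` shows; it is counted in 512-byte units on Linux, while POSIX leaves the unit unspecified).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 0 on success and fills in both sizes, -1 on failure. */
static int sparse_sizes(const char *path, off_t length,
                        off_t *apparent, off_t *allocated)
{
    struct stat st;
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);

    if (fd < 0) {
        return -1;
    }
    /* Extending with ftruncate() writes no data, so filesystems that
     * support holes need not allocate blocks for the new range. */
    if (ftruncate(fd, length) < 0 || fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    close(fd);
    *apparent = st.st_size;                  /* apparent size */
    *allocated = (off_t)st.st_blocks * 512;  /* allocated space (Linux units) */
    return 0;
}
```

On a filesystem with hole support, `allocated` is typically far smaller than `apparent` here, which is the same situation a shrunk qcow2 image is left in after its clusters have been discarded.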





[Qemu-devel] [PATCH 0/2] Truncate the tail of the image file in qcow2 shrinking

2017-09-20 Thread Pavel Butsykin
After shrinking a qcow2 image, the image file may be left with a tail that
will probably never be used. Although it brings no tangible benefit, we can
cut off the tail if there is one. It will not free up disk space, but if the
clusters were allocated sequentially and the image is not heavily fragmented,
the apparent size of the image file will be commensurate with its real size.
That is not a great advantage either... Well, at least we can discuss it.

Pavel Butsykin (2):
  qcow2: fix return error code in qcow2_truncate()
  qcow2: truncate the tail of the image file after shrinking the image

 block/qcow2-refcount.c | 21 +
 block/qcow2.c  | 23 +--
 block/qcow2.h  |  1 +
 3 files changed, 43 insertions(+), 2 deletions(-)

-- 
2.14.1



[Qemu-devel] [PATCH 1/2] qcow2: fix return error code in qcow2_truncate()

2017-09-20 Thread Pavel Butsykin
Signed-off-by: Pavel Butsykin 
---
 block/qcow2.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 2174a84d1f..8a4311d338 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3166,7 +3166,7 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset,
 if (old_file_size < 0) {
 error_setg_errno(errp, -old_file_size,
  "Failed to inquire current file length");
-return ret;
+return old_file_size;
 }
 
 nb_new_data_clusters = DIV_ROUND_UP(offset - old_length,
@@ -3195,7 +3195,7 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset,
 if (allocation_start < 0) {
 error_setg_errno(errp, -allocation_start,
  "Failed to resize refcount structures");
-return -allocation_start;
+return allocation_start;
 }
 
 clusters_allocated = qcow2_alloc_clusters_at(bs, allocation_start,
-- 
2.14.1




[Qemu-devel] [PATCH 2/2] qcow2: truncate the tail of the image file after shrinking the image

2017-09-20 Thread Pavel Butsykin
After shrinking the image, the image file may be left with a tail at its end
that will probably never be used. So we can find the last used cluster and
cut off the tail.

Signed-off-by: Pavel Butsykin 
---
 block/qcow2-refcount.c | 21 +
 block/qcow2.c  | 19 +++
 block/qcow2.h  |  1 +
 3 files changed, 41 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 88d5a3f1ad..5e221a166c 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -3181,3 +3181,24 @@ out:
 g_free(reftable_tmp);
 return ret;
 }
+
+int64_t qcow2_get_last_cluster(BlockDriverState *bs, int64_t size)
+{
+BDRVQcow2State *s = bs->opaque;
+int64_t i, last_cluster, nb_clusters = size_to_clusters(s, size);
+uint64_t refcount;
+
+for (i = 0, last_cluster = 0; i < nb_clusters; i++) {
+int ret = qcow2_get_refcount(bs, i, &refcount);
+if (ret < 0) {
+fprintf(stderr, "Can't get refcount for cluster %" PRId64 ": %s\n",
+i, strerror(-ret));
+continue;
+}
+
+if (refcount > 0) {
+last_cluster = i;
+}
+}
+return last_cluster;
+}
diff --git a/block/qcow2.c b/block/qcow2.c
index 8a4311d338..c3b6dd44c4 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3106,6 +3106,7 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset,
 new_l1_size = size_to_l1(s, offset);
 
 if (offset < old_length) {
+int64_t image_end_offset, old_file_size;
 if (prealloc != PREALLOC_MODE_OFF) {
 error_setg(errp,
"Preallocation can't be used for shrinking an image");
@@ -3134,6 +3135,24 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset,
  "Failed to discard unused refblocks");
 return ret;
 }
+
+old_file_size = bdrv_getlength(bs->file->bs);
+if (old_file_size < 0) {
+error_setg_errno(errp, -old_file_size,
+ "Failed to inquire current file length");
+return old_file_size;
+}
+image_end_offset = (qcow2_get_last_cluster(bs, old_file_size) + 1) *
+   s->cluster_size;
+if (image_end_offset < old_file_size) {
+ret = bdrv_truncate(bs->file, image_end_offset,
+PREALLOC_MODE_OFF, NULL);
+if (ret < 0) {
+error_setg_errno(errp, -ret,
+ "Failed to truncate the tail of the image");
+return ret;
+}
+}
 } else {
 ret = qcow2_grow_l1_table(bs, new_l1_size, true);
 if (ret < 0) {
diff --git a/block/qcow2.h b/block/qcow2.h
index 5a289a81e2..782a206ecb 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -597,6 +597,7 @@ int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order,
 BlockDriverAmendStatusCB *status_cb,
 void *cb_opaque, Error **errp);
 int qcow2_shrink_reftable(BlockDriverState *bs);
+int64_t qcow2_get_last_cluster(BlockDriverState *bs, int64_t size);
 
 /* qcow2-cluster.c functions */
 int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
-- 
2.14.1
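The arithmetic the patch uses to decide where to truncate, `image_end_offset = (last_cluster + 1) * cluster_size`, can be isolated as below. This is a hypothetical helper for illustration (`tail_bytes` does not exist in QEMU); it returns how many bytes the tail truncation would remove, or 0 when the file already ends at the last used cluster, matching the `image_end_offset < old_file_size` check in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes removed by cutting the file right after the last used cluster. */
static int64_t tail_bytes(int64_t last_cluster, int64_t cluster_size,
                          int64_t file_size)
{
    int64_t image_end_offset = (last_cluster + 1) * cluster_size;

    assert(last_cluster >= 0 && cluster_size > 0);
    /* Only truncate if the file extends past the last used cluster. */
    return image_end_offset < file_size ? file_size - image_end_offset : 0;
}
```

For example, with 64 KiB clusters and cluster 2047 as the last one in use, the image ends at 128 MiB, so a 1 GiB file would lose a 896 MiB tail.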




[Qemu-devel] [PATCH v2] virtio-serial: add enable_backend callback

2017-09-19 Thread Pavel Butsykin
We should guarantee that RAM will not be modified while the VM is stopped,
otherwise it can lead to negative consequences during post-copy
migration. In the RUN_STATE_FINISH_MIGRATE step, RAM on the source side is
expected not to be modified, as this could lead to an inconsistent VM state
on the destination side. Also, RAM access during postcopy-ram migration with
the release-ram capability enabled can have serious consequences.

Let's add an enable_backend() callback to avoid undesirable virtqueue changes
in the guest memory.

Signed-off-by: Pavel Butsykin 
---
Changes from v1:
- rebase on master a9158a5cba

 hw/char/virtio-console.c  | 21 +
 hw/char/virtio-serial-bus.c   |  7 +++
 include/hw/virtio/virtio-serial.h |  3 +++
 3 files changed, 31 insertions(+)

diff --git a/hw/char/virtio-console.c b/hw/char/virtio-console.c
index 198b2a89c0..172c72d06c 100644
--- a/hw/char/virtio-console.c
+++ b/hw/char/virtio-console.c
@@ -187,6 +187,26 @@ static int chr_be_change(void *opaque)
 return 0;
 }
 
+static void virtconsole_enable_backend(VirtIOSerialPort *port, bool enable)
+{
+VirtConsole *vcon = VIRTIO_CONSOLE(port);
+
+if (!qemu_chr_fe_backend_connected(&vcon->chr)) {
+return;
+}
+
+if (enable) {
+VirtIOSerialPortClass *k = VIRTIO_SERIAL_PORT_GET_CLASS(port);
+
+qemu_chr_fe_set_handlers(&vcon->chr, chr_can_read, chr_read,
+ k->is_console ? NULL : chr_event,
+ chr_be_change, vcon, NULL, false);
+} else {
+qemu_chr_fe_set_handlers(&vcon->chr, NULL, NULL, NULL,
+ NULL, NULL, NULL, false);
+}
+}
+
 static void virtconsole_realize(DeviceState *dev, Error **errp)
 {
 VirtIOSerialPort *port = VIRTIO_SERIAL_PORT(dev);
@@ -258,6 +278,7 @@ static void virtserialport_class_init(ObjectClass *klass, void *data)
 k->unrealize = virtconsole_unrealize;
 k->have_data = flush_buf;
 k->set_guest_connected = set_guest_connected;
+k->enable_backend = virtconsole_enable_backend;
 k->guest_writable = guest_writable;
 dc->props = virtserialport_properties;
 }
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 17a1bb008a..9470bd7be7 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -637,6 +637,13 @@ static void set_status(VirtIODevice *vdev, uint8_t status)
 if (!(status & VIRTIO_CONFIG_S_DRIVER_OK)) {
 guest_reset(vser);
 }
+
+QTAILQ_FOREACH(port, &vser->ports, next) {
+VirtIOSerialPortClass *vsc = VIRTIO_SERIAL_PORT_GET_CLASS(port);
+if (vsc->enable_backend) {
+vsc->enable_backend(port, vdev->vm_running);
+}
+}
 }
 
 static void vser_reset(VirtIODevice *vdev)
diff --git a/include/hw/virtio/virtio-serial.h b/include/hw/virtio/virtio-serial.h
index b19c44727f..12657a9f39 100644
--- a/include/hw/virtio/virtio-serial.h
+++ b/include/hw/virtio/virtio-serial.h
@@ -58,6 +58,9 @@ typedef struct VirtIOSerialPortClass {
 /* Guest opened/closed device. */
 void (*set_guest_connected)(VirtIOSerialPort *port, int guest_connected);
 
+/* Enable/disable backend for virtio serial port */
+void (*enable_backend)(VirtIOSerialPort *port, bool enable);
+
 /* Guest is now ready to accept data (virtqueues set up). */
 void (*guest_ready)(VirtIOSerialPort *port);
 
-- 
2.14.1
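The pattern this patch introduces, an optional per-class hook that `set_status()` calls for every port with the current run state, can be modeled in isolation as below. All types and names here (`Port`, `PortClass`, `console_enable_backend`, `model_set_status`) are simplified, hypothetical stand-ins for `VirtIOSerialPort`, `VirtIOSerialPortClass` and the real callbacks; a boolean stands in for the chardev handlers that QEMU installs or clears with `qemu_chr_fe_set_handlers()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Port Port;

typedef struct PortClass {
    /* Optional hook; NULL when the port type has no backend to toggle. */
    void (*enable_backend)(Port *port, bool enable);
} PortClass;

struct Port {
    const PortClass *klass;
    bool handlers_installed;  /* stands in for the chardev handlers */
};

static void console_enable_backend(Port *port, bool enable)
{
    assert(port != NULL);
    /* In QEMU: install the read/event handlers, or pass NULLs to clear them,
     * so that no new data reaches the guest's virtqueues while stopped. */
    port->handlers_installed = enable;
}

/* Called on each device status change with the VM's current run state. */
static void model_set_status(Port *ports, size_t nb_ports, bool vm_running)
{
    for (size_t i = 0; i < nb_ports; i++) {
        const PortClass *k = ports[i].klass;

        if (k->enable_backend) {
            k->enable_backend(&ports[i], vm_running);
        }
    }
}
```

Keeping the hook optional means port types without a chardev backend need no changes; only classes that set `enable_backend` are affected.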




[Qemu-devel] [PATCH v8 1/4] qemu-img: add --shrink flag for resize

2017-09-18 Thread Pavel Butsykin
The flag is an additional precaution against data loss. In the future,
shrinking without this flag may be refused for all formats, but for now
we need to maintain compatibility with raw.

Signed-off-by: Pavel Butsykin 
Reviewed-by: Max Reitz 
Reviewed-by: John Snow 
---
 qemu-img-cmds.hx   |  4 ++--
 qemu-img.c | 23 +++
 qemu-img.texi  |  6 +-
 tests/qemu-iotests/102 |  4 ++--
 tests/qemu-iotests/106 |  2 +-
 5 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index b47d409665..2fe31893cf 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -89,9 +89,9 @@ STEXI
 ETEXI
 
 DEF("resize", img_resize,
-"resize [--object objectdef] [--image-opts] [-q] filename [+ | -]size")
+"resize [--object objectdef] [--image-opts] [-q] [--shrink] filename [+ | -]size")
 STEXI
-@item resize [--object @var{objectdef}] [--image-opts] [-q] @var{filename} [+ | -]@var{size}
+@item resize [--object @var{objectdef}] [--image-opts] [-q] [--shrink] @var{filename} [+ | -]@var{size}
 ETEXI
 
 STEXI
diff --git a/qemu-img.c b/qemu-img.c
index 56ef49e214..b7b2386cbd 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -65,6 +65,7 @@ enum {
 OPTION_TARGET_IMAGE_OPTS = 263,
 OPTION_SIZE = 264,
 OPTION_PREALLOCATION = 265,
+OPTION_SHRINK = 266,
 };
 
 typedef enum OutputFormat {
@@ -3437,6 +3438,7 @@ static int img_resize(int argc, char **argv)
 },
 };
 bool image_opts = false;
+bool shrink = false;
 
 /* Remove size from argv manually so that negative numbers are not treated
  * as options by getopt. */
@@ -3455,6 +3457,7 @@ static int img_resize(int argc, char **argv)
 {"object", required_argument, 0, OPTION_OBJECT},
 {"image-opts", no_argument, 0, OPTION_IMAGE_OPTS},
 {"preallocation", required_argument, 0, OPTION_PREALLOCATION},
+{"shrink", no_argument, 0, OPTION_SHRINK},
 {0, 0, 0, 0}
 };
 c = getopt_long(argc, argv, ":f:hq",
@@ -3498,6 +3501,9 @@ static int img_resize(int argc, char **argv)
 return 1;
 }
 break;
+case OPTION_SHRINK:
+shrink = true;
+break;
 }
 }
 if (optind != argc - 1) {
@@ -3571,6 +3577,23 @@ static int img_resize(int argc, char **argv)
 goto out;
 }
 
+if (total_size < current_size && !shrink) {
+warn_report("Shrinking an image will delete all data beyond the "
+"shrunken image's end. Before performing such an "
+"operation, make sure there is no important data there.");
+
+if (g_strcmp0(bdrv_get_format_name(blk_bs(blk)), "raw") != 0) {
+error_report(
+  "Use the --shrink option to perform a shrink operation.");
+ret = -1;
+goto out;
+} else {
+warn_report("Using the --shrink option will suppress this message. "
+"Note that future versions of qemu-img may refuse to "
+"shrink images without this option.");
+}
+}
+
 ret = blk_truncate(blk, total_size, prealloc, &err);
 if (!ret) {
 qprintf(quiet, "Image resized.\n");
diff --git a/qemu-img.texi b/qemu-img.texi
index 72dabd6b3e..ea5d04b873 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -536,7 +536,7 @@ qemu-img rebase -b base.img diff.qcow2
 At this point, @code{modified.img} can be discarded, since
 @code{base.img + diff.qcow2} contains the same information.
 
-@item resize [--preallocation=@var{prealloc}] @var{filename} [+ | -]@var{size}
+@item resize [--shrink] [--preallocation=@var{prealloc}] @var{filename} [+ | -]@var{size}
 
 Change the disk image as if it had been created with @var{size}.
 
@@ -544,6 +544,10 @@ Before using this command to shrink a disk image, you MUST use file system and
 partitioning tools inside the VM to reduce allocated file systems and partition
 sizes accordingly.  Failure to do so will result in data loss!
 
+When shrinking images, the @code{--shrink} option must be given. This informs
+qemu-img that the user acknowledges all loss of data beyond the truncated
+image's end.
+
 After using this command to grow a disk image, you must use file system and
 partitioning tools inside the VM to actually begin using the new space on the
 device.
diff --git a/tests/qemu-iotests/102 b/tests/qemu-iotests/102
index 87db1bb1bf..d7ad8d9840 100755
--- a/tests/qemu-iotests/102
+++ b/tests/qemu-iotests/102
@@ -54,7 +54,7 @@ _make_test_img $IMG_SIZE
 $QEMU_IO -c 'write 0 64k' "$TEST_IMG" | _filter_qemu_io
 # Remove data cluster from image (first cluster: image header, second: reftable,
 # third: refbloc
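The decision `img_resize()` makes with the new flag can be summarized as a small pure function. This is a hypothetical model for illustration (`ResizePolicy` and `shrink_policy` are not QEMU names, and the real code compares the format with `g_strcmp0(bdrv_get_format_name(...), "raw")`): growing, or shrinking with `--shrink`, proceeds; shrinking a raw image without the flag proceeds with a warning; shrinking any other format without the flag is refused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef enum {
    RESIZE_OK,      /* not a shrink, or --shrink was given */
    RESIZE_WARN,    /* raw image shrunk without --shrink: allowed, warn */
    RESIZE_REFUSE,  /* other formats shrunk without --shrink: error */
} ResizePolicy;

static ResizePolicy shrink_policy(int64_t current_size, int64_t total_size,
                                  bool shrink_flag, const char *format)
{
    if (total_size >= current_size || shrink_flag) {
        return RESIZE_OK;
    }
    /* Keep the old behaviour for raw to stay compatible with existing users. */
    return strcmp(format, "raw") == 0 ? RESIZE_WARN : RESIZE_REFUSE;
}
```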

[Qemu-devel] [PATCH v8 4/4] qemu-iotests: add shrinking image test

2017-09-18 Thread Pavel Butsykin
Signed-off-by: Pavel Butsykin 
Reviewed-by: Max Reitz 
Reviewed-by: John Snow 
---
 tests/qemu-iotests/163 | 170 +
 tests/qemu-iotests/163.out |   5 ++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 176 insertions(+)
 create mode 100644 tests/qemu-iotests/163
 create mode 100644 tests/qemu-iotests/163.out

diff --git a/tests/qemu-iotests/163 b/tests/qemu-iotests/163
new file mode 100644
index 00..403842354e
--- /dev/null
+++ b/tests/qemu-iotests/163
@@ -0,0 +1,170 @@
+#!/usr/bin/env python
+#
+# Tests for shrinking images
+#
+# Copyright (c) 2016-2017 Parallels International GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import os, random, iotests, struct, qcow2
+from iotests import qemu_img, qemu_io, image_size
+
+test_img = os.path.join(iotests.test_dir, 'test.img')
+check_img = os.path.join(iotests.test_dir, 'check.img')
+
+def size_to_int(str):
+suff = ['B', 'K', 'M', 'G', 'T']
+return int(str[:-1]) * 1024**suff.index(str[-1:])
+
+class ShrinkBaseClass(iotests.QMPTestCase):
+image_len = '128M'
+shrink_size = '10M'
+chunk_size = '16M'
+refcount_bits = '16'
+
+def __qcow2_check(self, filename):
+entry_bits = 3
+entry_size = 1 << entry_bits
+l1_mask = 0x00fffe00
+div_roundup = lambda n, d: (n + d - 1) / d
+
+def split_by_n(data, n):
+for x in xrange(0, len(data), n):
+yield struct.unpack('>Q', data[x:x + n])[0] & l1_mask
+
+def check_l1_table(h, l1_data):
+l1_list = list(split_by_n(l1_data, entry_size))
+real_l1_size = div_roundup(h.size,
+   1 << (h.cluster_bits*2 - entry_size))
+used, unused = l1_list[:real_l1_size], l1_list[real_l1_size:]
+
+self.assertTrue(len(used) != 0, "Verifying l1 table content")
+self.assertFalse(any(unused), "Verifying l1 table content")
+
+def check_reftable(fd, h, reftable):
+for offset in split_by_n(reftable, entry_size):
+if offset != 0:
+fd.seek(offset)
+cluster = fd.read(1 << h.cluster_bits)
+self.assertTrue(any(cluster), "Verifying reftable content")
+
+with open(filename, "rb") as fd:
+h = qcow2.QcowHeader(fd)
+
+fd.seek(h.l1_table_offset)
+l1_table = fd.read(h.l1_size << entry_bits)
+
+fd.seek(h.refcount_table_offset)
+reftable = fd.read(h.refcount_table_clusters << h.cluster_bits)
+
+check_l1_table(h, l1_table)
+check_reftable(fd, h, reftable)
+
+def __raw_check(self, filename):
+pass
+
+image_check = {
+'qcow2' : __qcow2_check,
+'raw' : __raw_check
+}
+
+def setUp(self):
+if iotests.imgfmt == 'raw':
+qemu_img('create', '-f', iotests.imgfmt, test_img, self.image_len)
+qemu_img('create', '-f', iotests.imgfmt, check_img,
+ self.shrink_size)
+else:
+qemu_img('create', '-f', iotests.imgfmt,
+ '-o', 'cluster_size=' + self.cluster_size +
+ ',refcount_bits=' + self.refcount_bits,
+ test_img, self.image_len)
+qemu_img('create', '-f', iotests.imgfmt,
+ '-o', 'cluster_size=%s'% self.cluster_size,
+ check_img, self.shrink_size)
+qemu_io('-c', 'write -P 0xff 0 ' + self.shrink_size, check_img)
+
+def tearDown(self):
+os.remove(test_img)
+os.remove(check_img)
+
+def image_verify(self):
+self.assertEqual(image_size(test_img), image_size(check_img),
+ "Verifying image size")
+self.image_check[iotests.imgfmt](self, test_img)
+
+if iotests.imgfmt == 'raw':
+return
+self.assertEqual(qemu_img('check', test_img

[Qemu-devel] [PATCH v8 0/4] Add shrink image for qcow2

2017-09-18 Thread Pavel Butsykin
This patch adds shrinking of the image file for qcow2. This allows us to
reduce the virtual image size and free up disk space without copying the
image. The image may be fragmented; shrinking is done by punching holes in
the image file.

# ./qemu-img create -f qcow2 image.qcow2 4G
Formatting 'image.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16

# ./qemu-io -c "write -P 0x22 0 1G" image.qcow2
wrote 1073741824/1073741824 bytes at offset 0
1 GiB, 1 ops; 0:00:04.59 (222.886 MiB/sec and 0.2177 ops/sec)

# ./qemu-img resize image.qcow2 512M
warning: qemu-img: Shrinking an image will delete all data beyond the shrunken image's end. Before performing such an operation, make sure there is no important data there.
error: qemu-img: Use the --shrink option to perform a shrink operation.

# ./qemu-img resize --shrink image.qcow2 128M
Image resized.

# ./qemu-img info image.qcow2
image: image.qcow2
file format: qcow2
virtual size: 128M (134217728 bytes)
disk size: 128M
cluster_size: 65536
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false

# du -h image.qcow2
129M    image.qcow2

Changes from v1:
- add --shrink flag for qemu-img resize
- add qcow2_cache_discard
- simplify qcow2_shrink_l1_table() to reduce the likelihood of image corruption
- add new qemu-iotests for shrinking images

Changes from v2:
- replace qprintf() with error_report() (1)
- rewrite warning messages (1)
- enforce --shrink flag for all formats except raw (1)
- split qcow2_cache_discard() (2)
- minor fixes according to comments (3)
- rewrite the last part of qcow2_shrink_reftable() to avoid
  qcow2_free_clusters() calls inside (3)
- improve test for shrinking image (4)

Changes from v3:
- rebase on "Implement a warning_report function" Alistair's patch-set (1)
- spelling fixes (1)
- the man page fix according to the discussion (1)
- add a call to qcow2_signal_corruption() in case of image corruption (3)

Changes from v4:
- rebase on https://github.com/XanClic/qemu/commits/block Max's block branch

Changes from v5:
- the condition refcount == 0 should be enough to evict the l2/refcount cluster
  from the cache (2)
- overwrite the l1/refcount table in memory with zeros, even if overwriting the
  l1/refcount table on disk has failed (3)
- replace g_try_malloc() with g_malloc() for allocating reftable_tmp (3)

Changes from v6:
- rebase on master 1f29673387

Changes from v7:
- fix 106 iotest (1)
- minor fixes according to comments (2, 3)
- add documentation of the new enum members (3)
- add r-b's by Max and John

Pavel Butsykin (4):
  qemu-img: add --shrink flag for resize
  qcow2: add qcow2_cache_discard
  qcow2: add shrink image support
  qemu-iotests: add shrinking image test

 block/qcow2-cache.c|  26 +++
 block/qcow2-cluster.c  |  50 +
 block/qcow2-refcount.c | 140 -
 block/qcow2.c  |  43 +---
 block/qcow2.h  |  17 +
 qapi/block-core.json   |   8 ++-
 qemu-img-cmds.hx   |   4 +-
 qemu-img.c |  23 ++
 qemu-img.texi  |   6 +-
 tests/qemu-iotests/102 |   4 +-
 tests/qemu-iotests/106 |   2 +-
 tests/qemu-iotests/163 | 170 +
 tests/qemu-iotests/163.out |   5 ++
 tests/qemu-iotests/group   |   1 +
 14 files changed, 481 insertions(+), 18 deletions(-)
 create mode 100644 tests/qemu-iotests/163
 create mode 100644 tests/qemu-iotests/163.out

-- 
2.14.1
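How much of the L1 table a shrink can drop follows from the qcow2 layout: each L1 entry points to one L2 table, which occupies one cluster of 8-byte entries, each mapping one data cluster. The hypothetical helper below (QEMU's own equivalent in the patch context is `size_to_l1()`) computes the number of L1 entries needed for a given virtual size. With 64 KiB clusters, one L1 entry covers 512 MiB, so the 4G image from the example above needs 8 entries, while the 128M result needs only 1.

```c
#include <assert.h>
#include <stdint.h>

/* Number of L1 entries needed to map 'size' bytes of virtual disk. */
static uint64_t l1_entries_for_size(uint64_t size, unsigned cluster_bits)
{
    unsigned l2_bits = cluster_bits - 3;  /* L2 entries are 8 bytes each */
    uint64_t bytes_per_l1_entry = UINT64_C(1) << (cluster_bits + l2_bits);

    /* Round up: a partially used L2 table still needs its L1 entry. */
    return (size + bytes_per_l1_entry - 1) / bytes_per_l1_entry;
}
```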




[Qemu-devel] [PATCH v8 2/4] qcow2: add qcow2_cache_discard

2017-09-18 Thread Pavel Butsykin
Whenever L2/refcount table clusters are discarded from the file, we can
automatically drop the corresponding, now unnecessary, entries from the
cache tables. This reduces the chance of evicting useful cache data and
eliminates cache contents that are inconsistent with the data in the file.

Signed-off-by: Pavel Butsykin 
Reviewed-by: Max Reitz 
Reviewed-by: John Snow 
---
 block/qcow2-cache.c| 26 ++
 block/qcow2-refcount.c | 20 ++--
 block/qcow2.h  |  3 +++
 3 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index 1d25147392..75746a7f43 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -411,3 +411,29 @@ void qcow2_cache_entry_mark_dirty(BlockDriverState *bs, Qcow2Cache *c,
 assert(c->entries[i].offset != 0);
 c->entries[i].dirty = true;
 }
+
+void *qcow2_cache_is_table_offset(BlockDriverState *bs, Qcow2Cache *c,
+  uint64_t offset)
+{
+int i;
+
+for (i = 0; i < c->size; i++) {
+if (c->entries[i].offset == offset) {
+return qcow2_cache_get_table_addr(bs, c, i);
+}
+}
+return NULL;
+}
+
+void qcow2_cache_discard(BlockDriverState *bs, Qcow2Cache *c, void *table)
+{
+int i = qcow2_cache_get_table_idx(bs, c, table);
+
+assert(c->entries[i].ref == 0);
+
+c->entries[i].offset = 0;
+c->entries[i].lru_counter = 0;
+c->entries[i].dirty = false;
+
+qcow2_cache_table_release(bs, c, i, 1);
+}
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 168fc32e7b..8c17c0e3aa 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -861,8 +861,24 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
 }
 s->set_refcount(refcount_block, block_index, refcount);
 
-if (refcount == 0 && s->discard_passthrough[type]) {
-update_refcount_discard(bs, cluster_offset, s->cluster_size);
+if (refcount == 0) {
+void *table;
+
+table = qcow2_cache_is_table_offset(bs, s->refcount_block_cache,
+offset);
+if (table != NULL) {
+qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
+qcow2_cache_discard(bs, s->refcount_block_cache, table);
+}
+
+table = qcow2_cache_is_table_offset(bs, s->l2_table_cache, offset);
+if (table != NULL) {
+qcow2_cache_discard(bs, s->l2_table_cache, table);
+}
+
+if (s->discard_passthrough[type]) {
+update_refcount_discard(bs, cluster_offset, s->cluster_size);
+}
 }
 }
 
diff --git a/block/qcow2.h b/block/qcow2.h
index 96a8d43c17..52c374e9ed 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -649,6 +649,9 @@ int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
 int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
 void **table);
 void qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
+void *qcow2_cache_is_table_offset(BlockDriverState *bs, Qcow2Cache *c,
+  uint64_t offset);
+void qcow2_cache_discard(BlockDriverState *bs, Qcow2Cache *c, void *table);
 
 /* qcow2-bitmap.c functions */
 int qcow2_check_bitmaps_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
-- 
2.14.1
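The two cache helpers added by this patch can be modeled with a plain array of entries, as sketched below. `CacheEntry`, `cache_find` and `cache_discard` are hypothetical simplifications of `Qcow2Cache`, `qcow2_cache_is_table_offset()` and `qcow2_cache_discard()`: the model returns a slot index rather than a table pointer and omits the memory release that the real code performs via `qcow2_cache_table_release()`. Discarding clears the dirty flag, so stale data for a freed cluster is never written back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t offset;       /* host offset of the cached table; 0 = free slot */
    uint64_t lru_counter;
    int ref;               /* outstanding users of this table */
    bool dirty;
} CacheEntry;

/* Return the slot caching the table at 'offset', or -1 if it is not cached. */
static int cache_find(const CacheEntry *entries, int size, uint64_t offset)
{
    assert(offset != 0);  /* 0 marks free slots, never a real table */
    for (int i = 0; i < size; i++) {
        if (entries[i].offset == offset) {
            return i;
        }
    }
    return -1;
}

/* Forget a table whose cluster was freed; its contents are never flushed. */
static void cache_discard(CacheEntry *entries, int i)
{
    assert(entries[i].ref == 0);
    entries[i].offset = 0;
    entries[i].lru_counter = 0;
    entries[i].dirty = false;
}
```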




[Qemu-devel] [PATCH v8 3/4] qcow2: add shrink image support

2017-09-18 Thread Pavel Butsykin
This patch adds shrinking of the image file for qcow2. This allows us to
reduce the virtual image size and free up disk space without copying the
image. The image may be fragmented; shrinking is done by punching holes in
the image file.

Signed-off-by: Pavel Butsykin 
Reviewed-by: Max Reitz 
Reviewed-by: John Snow 
---
 block/qcow2-cluster.c  |  50 +
 block/qcow2-refcount.c | 120 +
 block/qcow2.c  |  43 ++
 block/qcow2.h  |  14 ++
 qapi/block-core.json   |   8 +++-
 5 files changed, 225 insertions(+), 10 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 0d4824993c..d2518d1893 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -32,6 +32,56 @@
 #include "qemu/bswap.h"
 #include "trace.h"
 
+int qcow2_shrink_l1_table(BlockDriverState *bs, uint64_t exact_size)
+{
+BDRVQcow2State *s = bs->opaque;
+int new_l1_size, i, ret;
+
+if (exact_size >= s->l1_size) {
+return 0;
+}
+
+new_l1_size = exact_size;
+
+#ifdef DEBUG_ALLOC2
+fprintf(stderr, "shrink l1_table from %d to %d\n", s->l1_size, new_l1_size);
+#endif
+
+BLKDBG_EVENT(bs->file, BLKDBG_L1_SHRINK_WRITE_TABLE);
+ret = bdrv_pwrite_zeroes(bs->file, s->l1_table_offset +
+   new_l1_size * sizeof(uint64_t),
+ (s->l1_size - new_l1_size) * sizeof(uint64_t), 0);
+if (ret < 0) {
+goto fail;
+}
+
+ret = bdrv_flush(bs->file->bs);
+if (ret < 0) {
+goto fail;
+}
+
+BLKDBG_EVENT(bs->file, BLKDBG_L1_SHRINK_FREE_L2_CLUSTERS);
+for (i = s->l1_size - 1; i > new_l1_size - 1; i--) {
+if ((s->l1_table[i] & L1E_OFFSET_MASK) == 0) {
+continue;
+}
+qcow2_free_clusters(bs, s->l1_table[i] & L1E_OFFSET_MASK,
+s->cluster_size, QCOW2_DISCARD_ALWAYS);
+s->l1_table[i] = 0;
+}
+return 0;
+
+fail:
+/*
+ * If the write in the l1_table failed the image may contain a partially
+ * overwritten l1_table. In this case it would be better to clear the
+ * l1_table in memory to avoid possible image corruption.
+ */
+memset(s->l1_table + new_l1_size, 0,
+   (s->l1_size - new_l1_size) * sizeof(uint64_t));
+return ret;
+}
+
 int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
 bool exact_size)
 {
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 8c17c0e3aa..88d5a3f1ad 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -29,6 +29,7 @@
 #include "block/qcow2.h"
 #include "qemu/range.h"
 #include "qemu/bswap.h"
+#include "qemu/cutils.h"
 
 static int64_t alloc_clusters_noref(BlockDriverState *bs, uint64_t size);
 static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
@@ -3061,3 +3062,122 @@ done:
 qemu_vfree(new_refblock);
 return ret;
 }
+
+static int qcow2_discard_refcount_block(BlockDriverState *bs,
+uint64_t discard_block_offs)
+{
+BDRVQcow2State *s = bs->opaque;
+uint64_t refblock_offs = get_refblock_offset(s, discard_block_offs);
+uint64_t cluster_index = discard_block_offs >> s->cluster_bits;
+uint32_t block_index = cluster_index & (s->refcount_block_size - 1);
+void *refblock;
+int ret;
+
+assert(discard_block_offs != 0);
+
+ret = qcow2_cache_get(bs, s->refcount_block_cache, refblock_offs,
+  &refblock);
+if (ret < 0) {
+return ret;
+}
+
+if (s->get_refcount(refblock, block_index) != 1) {
+qcow2_signal_corruption(bs, true, -1, -1, "Invalid refcount:"
+" refblock offset %#" PRIx64
+", reftable index %u"
+", block offset %#" PRIx64
+", refcount %#" PRIx64,
+refblock_offs,
+offset_to_reftable_index(s, discard_block_offs),
+discard_block_offs,
+s->get_refcount(refblock, block_index));
+qcow2_cache_put(bs, s->refcount_block_cache, &refblock);
+return -EINVAL;
+}
+s->set_refcount(refblock, block_index, 0);
+
+qcow2_cache_entry_mark_dirty(bs, s->refcount_block_cache, refblock);
+
+qcow2_cache_put(bs, s->refcount_block_cache, &refblock);
+
+if (cluster_index < s->free_cluster_index) {
+s->free_cluster_index = cluster_index;
+}
+
+refblock = qcow2_cache_is_table_offset(bs, s->refcount_block_cache,
+   

Re: [Qemu-devel] [PATCH v7 0/4] Add shrink image for qcow2

2017-09-18 Thread Pavel Butsykin

On 16.09.2017 17:56, Max Reitz wrote:

On 2017-08-22 01:31, John Snow wrote:



On 08/17/2017 05:15 AM, Pavel Butsykin wrote:

This patch adds shrinking of the image file for qcow2. This allows us to
reduce the virtual image size and free up disk space without copying the
image. The image may be fragmented; shrinking is done by punching holes in
the image file.

# ./qemu-img create -f qcow2 image.qcow2 4G
Formatting 'image.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16

# ./qemu-io -c "write -P 0x22 0 1G" image.qcow2
wrote 1073741824/1073741824 bytes at offset 0
1 GiB, 1 ops; 0:00:04.59 (222.886 MiB/sec and 0.2177 ops/sec)

# ./qemu-img resize image.qcow2 512M
warning: qemu-img: Shrinking an image will delete all data beyond the shrunken image's end. Before performing such an operation, make sure there is no important data there.
error: qemu-img: Use the --shrink option to perform a shrink operation.

# ./qemu-img resize --shrink image.qcow2 128M
Image resized.

# ./qemu-img info image.qcow2
image: image.qcow2
file format: qcow2
virtual size: 128M (134217728 bytes)
disk size: 128M
cluster_size: 65536
Format specific information:
 compat: 1.1
 lazy refcounts: false
 refcount bits: 16
 corrupt: false

# du -h image.qcow2
129M    image.qcow2



It looks sane to me, and already has a full set of R-Bs from Max. Are we
waiting for Kevin?


Now I found that I was waiting for my own R-b for patch 3.  I hadn't yet
reviewed that patch in its current state (I had reviewed it for v5, but
it has changed quite a bit in v6)


I assumed that small changes would not invalidate the R-b :) I'm sorry for
the confusion with the R-b.



Max





Re: [Qemu-devel] [PATCH] virtio-serial: add enable_backend callback

2017-09-18 Thread Pavel Butsykin

On 15.09.2017 20:09, Paolo Bonzini wrote:

On 07/07/2017 16:21, Pavel Butsykin wrote:

We should guarantee that RAM will not be modified while the VM is in a stopped
state; otherwise it can lead to negative consequences during post-copy
migration. In the RUN_STATE_FINISH_MIGRATE step, RAM on the source side is
expected to stay unmodified, as any change could lead to an inconsistent VM
state on the destination side. Also, RAM access during postcopy-ram migration
with the release-ram capability enabled can lead to sad consequences.

Let's add an enable_backend() callback to avoid undesirable virtqueue changes
in the guest memory.
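A minimal self-contained sketch of the idea, assuming a hypothetical chardev frontend model (DemoFrontend, demo_enable_backend() and the other demo_* names are illustrative, not QEMU APIs): while the handlers are removed, backend input never reaches the port, so nothing is written into guest memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for chardev frontend handlers; the real patch
 * calls qemu_chr_fe_set_handlers(), which is not modelled here. */
typedef int  (*DemoCanReadFn)(void *opaque);
typedef void (*DemoReadFn)(void *opaque, const unsigned char *buf, int size);

typedef struct DemoFrontend {
    DemoCanReadFn can_read;   /* NULL while the backend is disabled */
    DemoReadFn    read;
    void         *opaque;
} DemoFrontend;

typedef struct DemoPort {
    DemoFrontend fe;
    unsigned     bytes_to_guest;  /* stands in for writes into guest RAM */
} DemoPort;

static int demo_can_read(void *opaque)
{
    (void)opaque;
    return 64;                    /* room for up to 64 bytes */
}

static void demo_read(void *opaque, const unsigned char *buf, int size)
{
    DemoPort *port = opaque;

    assert(buf != NULL && size > 0);
    port->bytes_to_guest += (unsigned)size;
}

/* Same shape as the enable_backend() hook: install the handlers while the
 * VM runs, remove them while it is stopped. */
static void demo_enable_backend(DemoPort *port, bool enable)
{
    port->fe.can_read = enable ? demo_can_read : NULL;
    port->fe.read     = enable ? demo_read : NULL;
    port->fe.opaque   = enable ? port : NULL;
}

/* Chardev side: input is delivered only if handlers are installed. */
static void demo_backend_input(DemoFrontend *fe, const unsigned char *buf,
                               int size)
{
    if (fe->can_read && fe->read && fe->can_read(fe->opaque) >= size) {
        fe->read(fe->opaque, buf, size);
    }
}
```

Calling demo_enable_backend(port, vm_running) from a status-change hook, as set_status() does in the patch, is what keeps a stopped VM's memory untouched in this model.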

Signed-off-by: Pavel Butsykin 


To understand the patch better: this doesn't fix _all_ stopped states,
only migration, right?  That said, it's a valid bugfix even independent
of the effects for stopped runstate.

Yes, the bug only appears during migration. Actually, this approach is
already used to protect memory during migration for other virtio
devices, for example net_vhost; see virtio_net_vhost_status().


Thanks,

Paolo


---
  hw/char/virtio-console.c  | 21 +
  hw/char/virtio-serial-bus.c   |  7 +++
  include/hw/virtio/virtio-serial.h |  3 +++
  3 files changed, 31 insertions(+)

diff --git a/hw/char/virtio-console.c b/hw/char/virtio-console.c
index 0cb1668c8a..b55905892e 100644
--- a/hw/char/virtio-console.c
+++ b/hw/char/virtio-console.c
@@ -163,6 +163,26 @@ static void chr_event(void *opaque, int event)
  }
  }
  
+static void virtconsole_enable_backend(VirtIOSerialPort *port, bool enable)
+{
+    VirtConsole *vcon = VIRTIO_CONSOLE(port);
+
+    if (!qemu_chr_fe_get_driver(&vcon->chr)) {
+        return;
+    }
+
+    if (enable) {
+        VirtIOSerialPortClass *k = VIRTIO_SERIAL_PORT_GET_CLASS(port);
+
+        qemu_chr_fe_set_handlers(&vcon->chr, chr_can_read, chr_read,
+                                 k->is_console ? NULL : chr_event,
+                                 vcon, NULL, false);
+    } else {
+        qemu_chr_fe_set_handlers(&vcon->chr, NULL, NULL,
+                                 NULL, NULL, NULL, false);
+    }
+}
+
  static void virtconsole_realize(DeviceState *dev, Error **errp)
  {
  VirtIOSerialPort *port = VIRTIO_SERIAL_PORT(dev);
@@ -233,6 +253,7 @@ static void virtserialport_class_init(ObjectClass *klass, void *data)
  k->unrealize = virtconsole_unrealize;
  k->have_data = flush_buf;
  k->set_guest_connected = set_guest_connected;
+    k->enable_backend = virtconsole_enable_backend;
  k->guest_writable = guest_writable;
  dc->props = virtserialport_properties;
  }
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index f5bc173844..f0f18c8e7c 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -637,6 +637,13 @@ static void set_status(VirtIODevice *vdev, uint8_t status)
  if (!(status & VIRTIO_CONFIG_S_DRIVER_OK)) {
  guest_reset(vser);
  }
+
+    QTAILQ_FOREACH(port, &vser->ports, next) {
+        VirtIOSerialPortClass *vsc = VIRTIO_SERIAL_PORT_GET_CLASS(port);
+        if (vsc->enable_backend) {
+            vsc->enable_backend(port, vdev->vm_running);
+        }
+    }
  }
  
  static void vser_reset(VirtIODevice *vdev)

diff --git a/include/hw/virtio/virtio-serial.h b/include/hw/virtio/virtio-serial.h
index b19c44727f..12657a9f39 100644
--- a/include/hw/virtio/virtio-serial.h
+++ b/include/hw/virtio/virtio-serial.h
@@ -58,6 +58,9 @@ typedef struct VirtIOSerialPortClass {
  /* Guest opened/closed device. */
  void (*set_guest_connected)(VirtIOSerialPort *port, int guest_connected);
  
+    /* Enable/disable backend for virtio serial port */
+    void (*enable_backend)(VirtIOSerialPort *port, bool enable);
+
  /* Guest is now ready to accept data (virtqueues set up). */
  void (*guest_ready)(VirtIOSerialPort *port);
  







Re: [Qemu-devel] [PATCH] virtio-serial: add enable_backend callback

2017-09-15 Thread Pavel Butsykin

On 17.07.2017 16:56, Pavel Butsykin wrote:

On 07.07.2017 17:21, Pavel Butsykin wrote:

We should guarantee that RAM will not be modified while the VM is in a stopped
state; otherwise it can lead to negative consequences during post-copy
migration. In the RUN_STATE_FINISH_MIGRATE step, RAM on the source side is
expected to stay unmodified, as any change could lead to an inconsistent VM
state on the destination side. Also, RAM access during postcopy-ram migration
with the release-ram capability enabled can lead to sad consequences.

Let's add an enable_backend() callback to avoid undesirable virtqueue changes
in the guest memory.

Signed-off-by: Pavel Butsykin 
---
  hw/char/virtio-console.c  | 21 +
  hw/char/virtio-serial-bus.c   |  7 +++
  include/hw/virtio/virtio-serial.h |  3 +++
  3 files changed, 31 insertions(+)

diff --git a/hw/char/virtio-console.c b/hw/char/virtio-console.c
index 0cb1668c8a..b55905892e 100644
--- a/hw/char/virtio-console.c
+++ b/hw/char/virtio-console.c
@@ -163,6 +163,26 @@ static void chr_event(void *opaque, int event)
  }
  }
+static void virtconsole_enable_backend(VirtIOSerialPort *port, bool enable)
+{
+    VirtConsole *vcon = VIRTIO_CONSOLE(port);
+
+    if (!qemu_chr_fe_get_driver(&vcon->chr)) {
+        return;
+    }
+
+    if (enable) {
+        VirtIOSerialPortClass *k = VIRTIO_SERIAL_PORT_GET_CLASS(port);
+
+        qemu_chr_fe_set_handlers(&vcon->chr, chr_can_read, chr_read,
+                                 k->is_console ? NULL : chr_event,
+                                 vcon, NULL, false);
+    } else {
+        qemu_chr_fe_set_handlers(&vcon->chr, NULL, NULL,
+                                 NULL, NULL, NULL, false);
+    }
+}
+
  static void virtconsole_realize(DeviceState *dev, Error **errp)
  {
  VirtIOSerialPort *port = VIRTIO_SERIAL_PORT(dev);
@@ -233,6 +253,7 @@ static void virtserialport_class_init(ObjectClass *klass, void *data)
  k->unrealize = virtconsole_unrealize;
  k->have_data = flush_buf;
  k->set_guest_connected = set_guest_connected;
+    k->enable_backend = virtconsole_enable_backend;
  k->guest_writable = guest_writable;
  dc->props = virtserialport_properties;
  }
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index f5bc173844..f0f18c8e7c 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -637,6 +637,13 @@ static void set_status(VirtIODevice *vdev, uint8_t status)
  if (!(status & VIRTIO_CONFIG_S_DRIVER_OK)) {
  guest_reset(vser);
  }
+
+    QTAILQ_FOREACH(port, &vser->ports, next) {
+        VirtIOSerialPortClass *vsc = VIRTIO_SERIAL_PORT_GET_CLASS(port);
+        if (vsc->enable_backend) {
+            vsc->enable_backend(port, vdev->vm_running);
+        }
+    }
  }
  static void vser_reset(VirtIODevice *vdev)
diff --git a/include/hw/virtio/virtio-serial.h b/include/hw/virtio/virtio-serial.h
index b19c44727f..12657a9f39 100644
--- a/include/hw/virtio/virtio-serial.h
+++ b/include/hw/virtio/virtio-serial.h
@@ -58,6 +58,9 @@ typedef struct VirtIOSerialPortClass {
  /* Guest opened/closed device. */
  void (*set_guest_connected)(VirtIOSerialPort *port, int guest_connected);

+    /* Enable/disable backend for virtio serial port */
+    void (*enable_backend)(VirtIOSerialPort *port, bool enable);
+
  /* Guest is now ready to accept data (virtqueues set up). */
  void (*guest_ready)(VirtIOSerialPort *port);



ping


ping^2




Re: [Qemu-devel] [PATCH v7 0/4] Add shrink image for qcow2

2017-09-15 Thread Pavel Butsykin

On 17.08.2017 12:15, Pavel Butsykin wrote:

This patch adds shrinking of the image file for qcow2. As a result, this allows
us to reduce the virtual image size and free up space on the disk without
copying the image. The image may be fragmented; shrinking is done by punching
holes in the image file.

# ./qemu-img create -f qcow2 image.qcow2 4G
Formatting 'image.qcow2', fmt=qcow2 size=4294967296 encryption=off 
cluster_size=65536 lazy_refcounts=off refcount_bits=16

# ./qemu-io -c "write -P 0x22 0 1G" image.qcow2
wrote 1073741824/1073741824 bytes at offset 0
1 GiB, 1 ops; 0:00:04.59 (222.886 MiB/sec and 0.2177 ops/sec)

# ./qemu-img resize image.qcow2 512M
warning: qemu-img: Shrinking an image will delete all data beyond the shrunken 
image's end. Before performing such an operation, make sure there is no 
important data there.
error: qemu-img: Use the --shrink option to perform a shrink operation.

# ./qemu-img resize --shrink image.qcow2 128M
Image resized.

# ./qemu-img info image.qcow2
image: image.qcow2
file format: qcow2
virtual size: 128M (134217728 bytes)
disk size: 128M
cluster_size: 65536
Format specific information:
 compat: 1.1
 lazy refcounts: false
 refcount bits: 16
 corrupt: false

# du -h image.qcow2
129M    image.qcow2


ping


Hi guys, I would like to ask: are there any plans to take this small patch
series? :) And thanks for the review.



Re: [Qemu-devel] [PATCH] qcow2: move qcow2_store_persistent_dirty_bitmaps() before cache flushing

2017-09-06 Thread Pavel Butsykin

On 05.09.2017 22:30, Eric Blake wrote:

On 09/04/2017 05:18 AM, Pavel Butsykin wrote:

After calling qcow2_inactivate(), all qcow2 caches must be flushed, but this
may not happen, because the final call to
qcow2_store_persistent_dirty_bitmaps() can mark the l2/refcount cache as
dirty.

Let's move qcow2_store_persistent_dirty_bitmaps() before the cache flushing
to fix it.
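A toy model of the ordering problem, under the assumption that storing the bitmaps does nothing more than mark the metadata cache dirty (DemoMetaCache and the demo_* functions are hypothetical, not qcow2 code): flushing before storing leaves the cache dirty at the end, while storing first leaves it clean.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one metadata cache with a single dirty flag. */
typedef struct DemoMetaCache {
    bool dirty;
    int  writebacks;   /* how many flushes actually wrote data */
} DemoMetaCache;

static void demo_cache_flush(DemoMetaCache *c)
{
    if (c->dirty) {
        c->writebacks++;
        c->dirty = false;
    }
}

/* Stand-in for qcow2_store_persistent_dirty_bitmaps(): it updates
 * L2/refcount metadata through the cache and so marks it dirty. */
static void demo_store_bitmaps(DemoMetaCache *c)
{
    c->dirty = true;
}

/* Returns true if the cache is clean when inactivation finishes. */
static bool demo_inactivate(DemoMetaCache *c, bool store_bitmaps_first)
{
    if (store_bitmaps_first) {
        demo_store_bitmaps(c);   /* order used by the patch */
        demo_cache_flush(c);
    } else {
        demo_cache_flush(c);     /* old order: flush first ... */
        demo_store_bitmaps(c);   /* ... then dirty the cache again */
    }
    return !c->dirty;
}
```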

Signed-off-by: Pavel Butsykin 
---
  block/qcow2.c | 16 
  1 file changed, 8 insertions(+), 8 deletions(-)



Should this cc: qemu-stable?


The latest stable branch (2.8?) doesn't contain the persistent dirty bitmap.


Reviewed-by: Eric Blake 





[Qemu-devel] [PATCH] qcow2: move qcow2_store_persistent_dirty_bitmaps() before cache flushing

2017-09-04 Thread Pavel Butsykin
After calling qcow2_inactivate(), all qcow2 caches must be flushed, but this
may not happen, because the final call to
qcow2_store_persistent_dirty_bitmaps() can mark the l2/refcount cache as
dirty.

Let's move qcow2_store_persistent_dirty_bitmaps() before the cache flushing
to fix it.

Signed-off-by: Pavel Butsykin 
---
 block/qcow2.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index a3679c69e8..dcf49084c5 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2037,6 +2037,14 @@ static int qcow2_inactivate(BlockDriverState *bs)
     int ret, result = 0;
     Error *local_err = NULL;
 
+    qcow2_store_persistent_dirty_bitmaps(bs, &local_err);
+    if (local_err != NULL) {
+        result = -EINVAL;
+        error_report_err(local_err);
+        error_report("Persistent bitmaps are lost for node '%s'",
+                     bdrv_get_device_or_node_name(bs));
+    }
+
     ret = qcow2_cache_flush(bs, s->l2_table_cache);
     if (ret) {
         result = ret;
@@ -2051,14 +2059,6 @@ static int qcow2_inactivate(BlockDriverState *bs)
                      strerror(-ret));
     }
 
-    qcow2_store_persistent_dirty_bitmaps(bs, &local_err);
-    if (local_err != NULL) {
-        result = -EINVAL;
-        error_report_err(local_err);
-        error_report("Persistent bitmaps are lost for node '%s'",
-                     bdrv_get_device_or_node_name(bs));
-    }
-
     if (result == 0) {
         qcow2_mark_clean(bs);
     }
-- 
2.14.1




[Qemu-devel] [PATCH] follow-up patch - "qcow2: add shrink image support"

2017-08-17 Thread Pavel Butsykin
Signed-off-by: Pavel Butsykin 
---
 qapi/block-core.json | 5 +
 1 file changed, 5 insertions(+)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index d6172bfe15..c55cd0c8db 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2479,6 +2479,11 @@
 #
 # Trigger events supported by blkdebug.
 #
+# @l1_shrink_write_table:  write zeros to the l1 table to shrink image.
+#  (since 2.11)
+#
+# @l1_shrink_free_l2_clusters: discard the l2 tables. (since 2.11)
+#
 # Since: 2.9
 ##
 { 'enum': 'BlkdebugEvent', 'prefix': 'BLKDBG',
-- 
2.14.1




Re: [Qemu-devel] [Qemu-block] [PATCH v6 0/4] Add shrink image for qcow2

2017-08-17 Thread Pavel Butsykin

On 17.08.2017 00:07, John Snow wrote:

Over a month with no replies and we're nearing the next QEMU release. If
this patchset is still applicable, can you rebase and resend for 2.11?


Thanks for digging up these patches :) I've sent the rebased version.


--js

On 07/14/2017 11:37 AM, Pavel Butsykin wrote:

This patch adds shrinking of the image file for qcow2. As a result, this allows
us to reduce the virtual image size and free up space on the disk without
copying the image. The image may be fragmented; shrinking is done by punching
holes in the image file.

# ./qemu-img create -f qcow2 image.qcow2 4G
Formatting 'image.qcow2', fmt=qcow2 size=4294967296 encryption=off 
cluster_size=65536 lazy_refcounts=off refcount_bits=16

# ./qemu-io -c "write -P 0x22 0 1G" image.qcow2
wrote 1073741824/1073741824 bytes at offset 0
1 GiB, 1 ops; 0:00:04.59 (222.886 MiB/sec and 0.2177 ops/sec)

# ./qemu-img resize image.qcow2 512M
warning: qemu-img: Shrinking an image will delete all data beyond the shrunken 
image's end. Before performing such an operation, make sure there is no 
important data there.
error: qemu-img: Use the --shrink option to perform a shrink operation.

# ./qemu-img resize --shrink image.qcow2 128M
Image resized.

# ./qemu-img info image.qcow2
image: image.qcow2
file format: qcow2
virtual size: 128M (134217728 bytes)
disk size: 128M
cluster_size: 65536
Format specific information:
 compat: 1.1
 lazy refcounts: false
 refcount bits: 16
 corrupt: false

# du -h image.qcow2
129M    image.qcow2

Changes from v1:
- add --shrink flag for qemu-img resize
- add qcow2_cache_discard
- simplify qcow2_shrink_l1_table() to reduce the likelihood of image corruption
- add new qemu-iotests for shrinking images

Changes from v2:
- replace qprintf() with error_report() (1)
- rewrite warning messages (1)
- enforce --shrink flag for all formats except raw (1)
- split qcow2_cache_discard() (2)
- minor fixes according to comments (3)
- rewrite the last part of qcow2_shrink_reftable() to avoid
   qcow2_free_clusters() calls inside (3)
- improve test for shrinking image (4)

Changes from v3:
- rebase on Alistair's "Implement a warning_report function" patch set (1)
- spelling fixes (1)
- fix the man page according to the discussion (1)
- add a qcow2_signal_corruption() call in case of image corruption (3)

Changes from v4:
- rebase on https://github.com/XanClic/qemu/commits/block Max's block branch

Changes from v5:
- the refcount == 0 condition should be enough to evict the l2/refcount cluster
   from the cache (2)
- overwrite the l1/refcount table in memory with zeros, even if overwriting the
   l1/refcount table on disk has failed (3)
- replace g_try_malloc() with g_malloc() for allocating reftable_tmp (3)

Pavel Butsykin (4):
   qemu-img: add --shrink flag for resize
   qcow2: add qcow2_cache_discard
   qcow2: add shrink image support
   qemu-iotests: add shrinking image test

  block/qcow2-cache.c|  26 +++
  block/qcow2-cluster.c  |  50 +
  block/qcow2-refcount.c | 140 -
  block/qcow2.c  |  43 +---
  block/qcow2.h  |  17 +
  qapi/block-core.json   |   3 +-
  qemu-img-cmds.hx   |   4 +-
  qemu-img.c |  23 ++
  qemu-img.texi  |   6 +-
  tests/qemu-iotests/102 |   4 +-
  tests/qemu-iotests/163 | 170 +
  tests/qemu-iotests/163.out |   5 ++
  tests/qemu-iotests/group   |   1 +
  13 files changed, 475 insertions(+), 17 deletions(-)
  create mode 100644 tests/qemu-iotests/163
  create mode 100644 tests/qemu-iotests/163.out





[Qemu-devel] [PATCH v7 4/4] qemu-iotests: add shrinking image test

2017-08-17 Thread Pavel Butsykin
Signed-off-by: Pavel Butsykin 
Reviewed-by: Max Reitz 
---
 tests/qemu-iotests/163 | 170 +
 tests/qemu-iotests/163.out |   5 ++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 176 insertions(+)
 create mode 100644 tests/qemu-iotests/163
 create mode 100644 tests/qemu-iotests/163.out

diff --git a/tests/qemu-iotests/163 b/tests/qemu-iotests/163
new file mode 100644
index 00..403842354e
--- /dev/null
+++ b/tests/qemu-iotests/163
@@ -0,0 +1,170 @@
+#!/usr/bin/env python
+#
+# Tests for shrinking images
+#
+# Copyright (c) 2016-2017 Parallels International GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import os, random, iotests, struct, qcow2
+from iotests import qemu_img, qemu_io, image_size
+
+test_img = os.path.join(iotests.test_dir, 'test.img')
+check_img = os.path.join(iotests.test_dir, 'check.img')
+
+def size_to_int(str):
+    suff = ['B', 'K', 'M', 'G', 'T']
+    return int(str[:-1]) * 1024**suff.index(str[-1:])
+
+class ShrinkBaseClass(iotests.QMPTestCase):
+    image_len = '128M'
+    shrink_size = '10M'
+    chunk_size = '16M'
+    refcount_bits = '16'
+
+    def __qcow2_check(self, filename):
+        entry_bits = 3
+        entry_size = 1 << entry_bits
+        l1_mask = 0x00fffe00
+        div_roundup = lambda n, d: (n + d - 1) / d
+
+        def split_by_n(data, n):
+            for x in xrange(0, len(data), n):
+                yield struct.unpack('>Q', data[x:x + n])[0] & l1_mask
+
+        def check_l1_table(h, l1_data):
+            l1_list = list(split_by_n(l1_data, entry_size))
+            real_l1_size = div_roundup(h.size,
+                                       1 << (h.cluster_bits*2 - entry_size))
+            used, unused = l1_list[:real_l1_size], l1_list[real_l1_size:]
+
+            self.assertTrue(len(used) != 0, "Verifying l1 table content")
+            self.assertFalse(any(unused), "Verifying l1 table content")
+
+        def check_reftable(fd, h, reftable):
+            for offset in split_by_n(reftable, entry_size):
+                if offset != 0:
+                    fd.seek(offset)
+                    cluster = fd.read(1 << h.cluster_bits)
+                    self.assertTrue(any(cluster), "Verifying reftable content")
+
+        with open(filename, "rb") as fd:
+            h = qcow2.QcowHeader(fd)
+
+            fd.seek(h.l1_table_offset)
+            l1_table = fd.read(h.l1_size << entry_bits)
+
+            fd.seek(h.refcount_table_offset)
+            reftable = fd.read(h.refcount_table_clusters << h.cluster_bits)
+
+            check_l1_table(h, l1_table)
+            check_reftable(fd, h, reftable)
+
+    def __raw_check(self, filename):
+        pass
+
+    image_check = {
+        'qcow2' : __qcow2_check,
+        'raw' : __raw_check
+    }
+
+    def setUp(self):
+        if iotests.imgfmt == 'raw':
+            qemu_img('create', '-f', iotests.imgfmt, test_img, self.image_len)
+            qemu_img('create', '-f', iotests.imgfmt, check_img,
+                     self.shrink_size)
+        else:
+            qemu_img('create', '-f', iotests.imgfmt,
+                     '-o', 'cluster_size=' + self.cluster_size +
+                     ',refcount_bits=' + self.refcount_bits,
+                     test_img, self.image_len)
+            qemu_img('create', '-f', iotests.imgfmt,
+                     '-o', 'cluster_size=%s'% self.cluster_size,
+                     check_img, self.shrink_size)
+        qemu_io('-c', 'write -P 0xff 0 ' + self.shrink_size, check_img)
+
+    def tearDown(self):
+        os.remove(test_img)
+        os.remove(check_img)
+
+    def image_verify(self):
+        self.assertEqual(image_size(test_img), image_size(check_img),
+                         "Verifying image size")
+        self.image_check[iotests.imgfmt](self, test_img)
+
+        if iotests.imgfmt == 'raw':
+            return
+        self.assertEqual(qemu_img('check', test_img), 0,
+ 

[Qemu-devel] [PATCH v7 3/4] qcow2: add shrink image support

2017-08-17 Thread Pavel Butsykin
This patch adds shrinking of the image file for qcow2. As a result, this allows
us to reduce the virtual image size and free up space on the disk without
copying the image. The image may be fragmented; shrinking is done by punching
holes in the image file.
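For reference, a sketch of the standard qcow2 geometry that the shrink code works with, assuming 8-byte L2 entries and refcount blocks made of refcount_bits-wide entries; the helper names are hypothetical and are not functions from this patch. With 64 KiB clusters one L2 table maps 512 MiB, so the 128M image from the cover letter needs a single L1 entry, and every L1 entry beyond that can be zeroed and its L2 cluster freed.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of guest address space mapped by one L2 table:
 * (cluster_size / 8 entries) * cluster_size. */
static uint64_t l2_coverage(unsigned cluster_bits)
{
    assert(cluster_bits >= 9 && cluster_bits <= 21);  /* 512 B .. 2 MiB */
    return UINT64_C(1) << (2 * cluster_bits - 3);
}

/* L1 entries still needed for a disk of @virtual_size bytes (rounded up). */
static uint64_t l1_entries_needed(uint64_t virtual_size, unsigned cluster_bits)
{
    uint64_t cov = l2_coverage(cluster_bits);
    return (virtual_size + cov - 1) / cov;
}

/* Number of refcount entries that fit in one refcount block. */
static uint64_t refblock_entries(unsigned cluster_bits, unsigned refcount_bits)
{
    assert(refcount_bits >= 1 && refcount_bits <= 64 &&
           (refcount_bits & (refcount_bits - 1)) == 0);
    return (UINT64_C(1) << cluster_bits) * 8 / refcount_bits;
}

/* Refcount table slot and refcount block entry describing @offset. */
static void refcount_location(uint64_t offset, unsigned cluster_bits,
                              unsigned refcount_bits,
                              uint64_t *reftable_index, uint64_t *block_index)
{
    uint64_t cluster_index = offset >> cluster_bits;
    uint64_t per_block = refblock_entries(cluster_bits, refcount_bits);

    *reftable_index = cluster_index / per_block;
    *block_index = cluster_index & (per_block - 1);  /* power of two */
}
```

The masking in refcount_location() follows the same pattern as `cluster_index & (s->refcount_block_size - 1)` in qcow2_discard_refcount_block() below.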

Signed-off-by: Pavel Butsykin 
Reviewed-by: Max Reitz 
---
 block/qcow2-cluster.c  |  50 +
 block/qcow2-refcount.c | 120 +
 block/qcow2.c  |  43 ++
 block/qcow2.h  |  14 ++
 qapi/block-core.json   |   3 +-
 5 files changed, 220 insertions(+), 10 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index f06c08f64c..0c7a9a920c 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -32,6 +32,56 @@
 #include "qemu/bswap.h"
 #include "trace.h"
 
+int qcow2_shrink_l1_table(BlockDriverState *bs, uint64_t exact_size)
+{
+    BDRVQcow2State *s = bs->opaque;
+    int new_l1_size, i, ret;
+
+    if (exact_size >= s->l1_size) {
+        return 0;
+    }
+
+    new_l1_size = exact_size;
+
+#ifdef DEBUG_ALLOC2
+    fprintf(stderr, "shrink l1_table from %d to %d\n", s->l1_size,
+            new_l1_size);
+#endif
+
+    BLKDBG_EVENT(bs->file, BLKDBG_L1_SHRINK_WRITE_TABLE);
+    ret = bdrv_pwrite_zeroes(bs->file, s->l1_table_offset +
+                                       new_l1_size * sizeof(uint64_t),
+                             (s->l1_size - new_l1_size) * sizeof(uint64_t), 0);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    ret = bdrv_flush(bs->file->bs);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    BLKDBG_EVENT(bs->file, BLKDBG_L1_SHRINK_FREE_L2_CLUSTERS);
+    for (i = s->l1_size - 1; i > new_l1_size - 1; i--) {
+        if ((s->l1_table[i] & L1E_OFFSET_MASK) == 0) {
+            continue;
+        }
+        qcow2_free_clusters(bs, s->l1_table[i] & L1E_OFFSET_MASK,
+                            s->cluster_size, QCOW2_DISCARD_ALWAYS);
+        s->l1_table[i] = 0;
+    }
+    return 0;
+
+fail:
+    /*
+     * If the write to the l1_table failed, the image may contain a partially
+     * overwritten l1_table. In this case it is better to clear the l1_table
+     * in memory to avoid possible image corruption.
+     */
+    memset(s->l1_table + exact_size, 0,
+           (s->l1_size - new_l1_size) * sizeof(uint64_t));
+    return ret;
+}
+
 int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
 bool exact_size)
 {
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 8c17c0e3aa..15af9a795f 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -29,6 +29,7 @@
 #include "block/qcow2.h"
 #include "qemu/range.h"
 #include "qemu/bswap.h"
+#include "qemu/cutils.h"
 
 static int64_t alloc_clusters_noref(BlockDriverState *bs, uint64_t size);
 static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
@@ -3061,3 +3062,122 @@ done:
 qemu_vfree(new_refblock);
 return ret;
 }
+
+static int qcow2_discard_refcount_block(BlockDriverState *bs,
+                                        uint64_t discard_block_offs)
+{
+    BDRVQcow2State *s = bs->opaque;
+    uint64_t refblock_offs = get_refblock_offset(s, discard_block_offs);
+    uint64_t cluster_index = discard_block_offs >> s->cluster_bits;
+    uint32_t block_index = cluster_index & (s->refcount_block_size - 1);
+    void *refblock;
+    int ret;
+
+    assert(discard_block_offs != 0);
+
+    ret = qcow2_cache_get(bs, s->refcount_block_cache, refblock_offs,
+                          &refblock);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (s->get_refcount(refblock, block_index) != 1) {
+        qcow2_signal_corruption(bs, true, -1, -1, "Invalid refcount:"
+                                " refblock offset %#" PRIx64
+                                ", reftable index %u"
+                                ", block offset %#" PRIx64
+                                ", refcount %#" PRIx64,
+                                refblock_offs,
+                                offset_to_reftable_index(s, discard_block_offs),
+                                discard_block_offs,
+                                s->get_refcount(refblock, block_index));
+        qcow2_cache_put(bs, s->refcount_block_cache, &refblock);
+        return -EINVAL;
+    }
+    s->set_refcount(refblock, block_index, 0);
+
+    qcow2_cache_entry_mark_dirty(bs, s->refcount_block_cache, refblock);
+
+    qcow2_cache_put(bs, s->refcount_block_cache, &refblock);
+
+    if (cluster_index < s->free_cluster_index) {
+        s->free_cluster_index = cluster_index;
+    }
+
+    refblock = qcow2_cache_is_table_offset(bs, s->refcount_block_cache,
+

[Qemu-devel] [PATCH v7 2/4] qcow2: add qcow2_cache_discard

2017-08-17 Thread Pavel Butsykin
Whenever l2/refcount table clusters are discarded from the file, we can
automatically drop the corresponding content of the cache tables. This reduces
the chance of evicting useful cache data and eliminates cache content that is
inconsistent with the data in the file.
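A simplified, hypothetical model of the lookup-and-discard step: DemoTableCache and its helpers only mirror the shape of qcow2_cache_is_table_offset() and qcow2_cache_discard() and are not the real implementation (no table memory is released here). Once a cluster's refcount drops to zero, any cached table at that offset is forgotten without being written back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_CACHE_SLOTS 4

typedef struct DemoEntry {
    uint64_t offset;       /* 0 means "slot unused", as in the qcow2 cache */
    uint64_t lru_counter;
    bool     dirty;
    int      ref;          /* users currently holding the table */
} DemoEntry;

typedef struct DemoTableCache {
    DemoEntry entries[DEMO_CACHE_SLOTS];
} DemoTableCache;

/* Slot index caching @offset, or -1 (cf. qcow2_cache_is_table_offset()). */
static int demo_cache_lookup(const DemoTableCache *c, uint64_t offset)
{
    for (int i = 0; i < DEMO_CACHE_SLOTS; i++) {
        if (c->entries[i].offset == offset) {
            return i;
        }
    }
    return -1;
}

/* Forget a cached table (cf. qcow2_cache_discard()). */
static void demo_cache_discard(DemoTableCache *c, int i)
{
    assert(c->entries[i].ref == 0);  /* nobody may still use the table */
    c->entries[i].offset = 0;
    c->entries[i].lru_counter = 0;
    c->entries[i].dirty = false;     /* never write stale data back */
}

/* Called when the refcount of the cluster at @offset drops to zero. */
static void demo_on_cluster_freed(DemoTableCache *c, uint64_t offset)
{
    int i;

    assert(offset != 0);             /* 0 would match every unused slot */
    i = demo_cache_lookup(c, offset);
    if (i >= 0) {
        demo_cache_discard(c, i);
    }
}
```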

Signed-off-by: Pavel Butsykin 
Reviewed-by: Max Reitz 
---
 block/qcow2-cache.c| 26 ++
 block/qcow2-refcount.c | 20 ++--
 block/qcow2.h  |  3 +++
 3 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index 1d25147392..75746a7f43 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -411,3 +411,29 @@ void qcow2_cache_entry_mark_dirty(BlockDriverState *bs, Qcow2Cache *c,
     assert(c->entries[i].offset != 0);
     c->entries[i].dirty = true;
 }
+
+void *qcow2_cache_is_table_offset(BlockDriverState *bs, Qcow2Cache *c,
+                                  uint64_t offset)
+{
+    int i;
+
+    for (i = 0; i < c->size; i++) {
+        if (c->entries[i].offset == offset) {
+            return qcow2_cache_get_table_addr(bs, c, i);
+        }
+    }
+    return NULL;
+}
+
+void qcow2_cache_discard(BlockDriverState *bs, Qcow2Cache *c, void *table)
+{
+    int i = qcow2_cache_get_table_idx(bs, c, table);
+
+    assert(c->entries[i].ref == 0);
+
+    c->entries[i].offset = 0;
+    c->entries[i].lru_counter = 0;
+    c->entries[i].dirty = false;
+
+    qcow2_cache_table_release(bs, c, i, 1);
+}
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 168fc32e7b..8c17c0e3aa 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -861,8 +861,24 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
         }
         s->set_refcount(refcount_block, block_index, refcount);
 
-        if (refcount == 0 && s->discard_passthrough[type]) {
-            update_refcount_discard(bs, cluster_offset, s->cluster_size);
+        if (refcount == 0) {
+            void *table;
+
+            table = qcow2_cache_is_table_offset(bs, s->refcount_block_cache,
+                                                offset);
+            if (table != NULL) {
+                qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
+                qcow2_cache_discard(bs, s->refcount_block_cache, table);
+            }
+
+            table = qcow2_cache_is_table_offset(bs, s->l2_table_cache, offset);
+            if (table != NULL) {
+                qcow2_cache_discard(bs, s->l2_table_cache, table);
+            }
+
+            if (s->discard_passthrough[type]) {
+                update_refcount_discard(bs, cluster_offset, s->cluster_size);
+            }
         }
     }
 
diff --git a/block/qcow2.h b/block/qcow2.h
index 96a8d43c17..52c374e9ed 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -649,6 +649,9 @@ int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
 int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
 void **table);
 void qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
+void *qcow2_cache_is_table_offset(BlockDriverState *bs, Qcow2Cache *c,
+  uint64_t offset);
+void qcow2_cache_discard(BlockDriverState *bs, Qcow2Cache *c, void *table);
 
 /* qcow2-bitmap.c functions */
 int qcow2_check_bitmaps_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
-- 
2.14.1




[Qemu-devel] [PATCH v7 1/4] qemu-img: add --shrink flag for resize

2017-08-17 Thread Pavel Butsykin
The flag is an additional precaution against data loss. Perhaps in the future,
shrinking without this flag will be blocked for all formats, but for now we
need to maintain compatibility with raw.
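The policy described above can be sketched as a pure function; the enum and function names are hypothetical, and the real img_resize() additionally prints the warnings shown in the cover letter before refusing or proceeding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef enum DemoResizeDecision {
    DEMO_RESIZE_PROCEED,          /* grow, same size, or shrink with --shrink */
    DEMO_RESIZE_WARN_AND_PROCEED, /* raw: still allowed, for compatibility */
    DEMO_RESIZE_REFUSE,           /* other formats need an explicit --shrink */
} DemoResizeDecision;

static DemoResizeDecision demo_resize_decision(int64_t current_size,
                                               int64_t total_size,
                                               bool shrink,
                                               const char *format)
{
    if (total_size >= current_size || shrink) {
        return DEMO_RESIZE_PROCEED;
    }
    /* Shrinking without --shrink: only raw is let through. */
    if (format != NULL && strcmp(format, "raw") == 0) {
        return DEMO_RESIZE_WARN_AND_PROCEED;
    }
    return DEMO_RESIZE_REFUSE;
}
```

Keeping the check a function of (sizes, flag, format) alone makes it easy to tighten later, for example by dropping the raw exception, without touching the truncate path.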

Signed-off-by: Pavel Butsykin 
Reviewed-by: Max Reitz 
---
 qemu-img-cmds.hx   |  4 ++--
 qemu-img.c | 23 +++
 qemu-img.texi  |  6 +-
 tests/qemu-iotests/102 |  4 ++--
 4 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index b47d409665..2fe31893cf 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -89,9 +89,9 @@ STEXI
 ETEXI
 
 DEF("resize", img_resize,
-"resize [--object objectdef] [--image-opts] [-q] filename [+ | -]size")
+"resize [--object objectdef] [--image-opts] [-q] [--shrink] filename [+ | -]size")
 STEXI
-@item resize [--object @var{objectdef}] [--image-opts] [-q] @var{filename} [+ | -]@var{size}
+@item resize [--object @var{objectdef}] [--image-opts] [-q] [--shrink] @var{filename} [+ | -]@var{size}
 ETEXI
 
 STEXI
diff --git a/qemu-img.c b/qemu-img.c
index 56ef49e214..b7b2386cbd 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -65,6 +65,7 @@ enum {
 OPTION_TARGET_IMAGE_OPTS = 263,
 OPTION_SIZE = 264,
 OPTION_PREALLOCATION = 265,
+    OPTION_SHRINK = 266,
 };
 
 typedef enum OutputFormat {
@@ -3437,6 +3438,7 @@ static int img_resize(int argc, char **argv)
 },
 };
 bool image_opts = false;
+    bool shrink = false;
 
 /* Remove size from argv manually so that negative numbers are not treated
  * as options by getopt. */
@@ -3455,6 +3457,7 @@ static int img_resize(int argc, char **argv)
 {"object", required_argument, 0, OPTION_OBJECT},
 {"image-opts", no_argument, 0, OPTION_IMAGE_OPTS},
 {"preallocation", required_argument, 0, OPTION_PREALLOCATION},
+            {"shrink", no_argument, 0, OPTION_SHRINK},
 {0, 0, 0, 0}
 };
 c = getopt_long(argc, argv, ":f:hq",
@@ -3498,6 +3501,9 @@ static int img_resize(int argc, char **argv)
 return 1;
 }
 break;
+        case OPTION_SHRINK:
+            shrink = true;
+            break;
 }
 }
 if (optind != argc - 1) {
@@ -3571,6 +3577,23 @@ static int img_resize(int argc, char **argv)
 goto out;
 }
 
+    if (total_size < current_size && !shrink) {
+        warn_report("Shrinking an image will delete all data beyond the "
+                    "shrunken image's end. Before performing such an "
+                    "operation, make sure there is no important data there.");
+
+        if (g_strcmp0(bdrv_get_format_name(blk_bs(blk)), "raw") != 0) {
+            error_report(
+              "Use the --shrink option to perform a shrink operation.");
+            ret = -1;
+            goto out;
+        } else {
+            warn_report("Using the --shrink option will suppress this message. "
+                        "Note that future versions of qemu-img may refuse to "
+                        "shrink images without this option.");
+        }
+    }
+
 ret = blk_truncate(blk, total_size, prealloc, &err);
 if (!ret) {
 qprintf(quiet, "Image resized.\n");
diff --git a/qemu-img.texi b/qemu-img.texi
index 72dabd6b3e..ea5d04b873 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -536,7 +536,7 @@ qemu-img rebase -b base.img diff.qcow2
 At this point, @code{modified.img} can be discarded, since
 @code{base.img + diff.qcow2} contains the same information.
 
-@item resize [--preallocation=@var{prealloc}] @var{filename} [+ | -]@var{size}
+@item resize [--shrink] [--preallocation=@var{prealloc}] @var{filename} [+ | -]@var{size}
 
 Change the disk image as if it had been created with @var{size}.
 
@@ -544,6 +544,10 @@ Before using this command to shrink a disk image, you MUST use file system and
 partitioning tools inside the VM to reduce allocated file systems and partition
 sizes accordingly.  Failure to do so will result in data loss!
 
+When shrinking images, the @code{--shrink} option must be given. This informs
+qemu-img that the user acknowledges all loss of data beyond the truncated
+image's end.
+
 After using this command to grow a disk image, you must use file system and
 partitioning tools inside the VM to actually begin using the new space on the
 device.
diff --git a/tests/qemu-iotests/102 b/tests/qemu-iotests/102
index 87db1bb1bf..d7ad8d9840 100755
--- a/tests/qemu-iotests/102
+++ b/tests/qemu-iotests/102
@@ -54,7 +54,7 @@ _make_test_img $IMG_SIZE
 $QEMU_IO -c 'write 0 64k' "$TEST_IMG" | _filter_qemu_io
 # Remove data cluster from image (first cluster: image header, second: reftable,
 # third: refblock, fourth: L1 table, fifth: L2 table)
-$QEMU_IMG resize 

[Qemu-devel] [PATCH v7 0/4] Add shrink image for qcow2

2017-08-17 Thread Pavel Butsykin
This patch adds shrinking of the image file for qcow2. As a result, this allows
us to reduce the virtual image size and free up space on the disk without
copying the image. The image may be fragmented; shrinking is done by punching
holes in the image file.

# ./qemu-img create -f qcow2 image.qcow2 4G
Formatting 'image.qcow2', fmt=qcow2 size=4294967296 encryption=off 
cluster_size=65536 lazy_refcounts=off refcount_bits=16

# ./qemu-io -c "write -P 0x22 0 1G" image.qcow2
wrote 1073741824/1073741824 bytes at offset 0
1 GiB, 1 ops; 0:00:04.59 (222.886 MiB/sec and 0.2177 ops/sec)

# ./qemu-img resize image.qcow2 512M
warning: qemu-img: Shrinking an image will delete all data beyond the shrunken 
image's end. Before performing such an operation, make sure there is no 
important data there.
error: qemu-img: Use the --shrink option to perform a shrink operation.

# ./qemu-img resize --shrink image.qcow2 128M
Image resized.

# ./qemu-img info image.qcow2
image: image.qcow2
file format: qcow2
virtual size: 128M (134217728 bytes)
disk size: 128M
cluster_size: 65536
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false

# du -h image.qcow2
129M    image.qcow2

Changes from v1:
- add --shrink flag for qemu-img resize
- add qcow2_cache_discard
- simplify qcow2_shrink_l1_table() to reduce the likelihood of image corruption
- add new qemu-iotests for shrinking images

Changes from v2:
- replace qprintf() with error_report() (1)
- rewrite warning messages (1)
- enforce --shrink flag for all formats except raw (1)
- split qcow2_cache_discard() (2)
- minor fixes according to comments (3)
- rewrite the last part of qcow2_shrink_reftable() to avoid
  qcow2_free_clusters() calls inside (3)
- improve test for shrinking image (4)

Changes from v3:
- rebase on Alistair's "Implement a warning_report function" patch set (1)
- spelling fixes (1)
- fix the man page according to the discussion (1)
- add a qcow2_signal_corruption() call in case of image corruption (3)

Changes from v4:
- rebase on https://github.com/XanClic/qemu/commits/block Max's block branch

Changes from v5:
- the refcount == 0 condition should be enough to evict the l2/refcount cluster
  from the cache (2)
- overwrite the l1/refcount table in memory with zeros, even if overwriting the
  l1/refcount table on disk has failed (3)
- replace g_try_malloc() with g_malloc() for allocating reftable_tmp (3)

Changes from v6:
- rebase on master 1f29673387

Pavel Butsykin (4):
  qemu-img: add --shrink flag for resize
  qcow2: add qcow2_cache_discard
  qcow2: add shrink image support
  qemu-iotests: add shrinking image test

 block/qcow2-cache.c        |  26 +++
 block/qcow2-cluster.c      |  50 +
 block/qcow2-refcount.c     | 140 -
 block/qcow2.c              |  43 +---
 block/qcow2.h              |  17 +
 qapi/block-core.json       |   3 +-
 qemu-img-cmds.hx           |   4 +-
 qemu-img.c                 |  23 ++
 qemu-img.texi              |   6 +-
 tests/qemu-iotests/102     |   4 +-
 tests/qemu-iotests/163     | 170 +
 tests/qemu-iotests/163.out |   5 ++
 tests/qemu-iotests/group   |   1 +
 13 files changed, 475 insertions(+), 17 deletions(-)
 create mode 100644 tests/qemu-iotests/163
 create mode 100644 tests/qemu-iotests/163.out

-- 
2.14.1




Re: [Qemu-devel] [PATCH] virtio-serial: add enable_backend callback

2017-07-17 Thread Pavel Butsykin

On 07.07.2017 17:21, Pavel Butsykin wrote:

We should guarantee that RAM is not modified while the VM is stopped;
otherwise post-copy migration can run into problems. During the
RUN_STATE_FINISH_MIGRATE step, RAM on the source side is expected to stay
unmodified, since any change could leave the VM state on the destination
side inconsistent. Also, accessing RAM during postcopy-ram migration with
the release-ram capability enabled is dangerous, because pages that have
already been sent may have been released on the source.

Add an enable_backend() callback to avoid undesirable virtqueue changes
in guest memory.

Signed-off-by: Pavel Butsykin 
---
 hw/char/virtio-console.c          | 21 +
 hw/char/virtio-serial-bus.c       |  7 +++
 include/hw/virtio/virtio-serial.h |  3 +++
 3 files changed, 31 insertions(+)

diff --git a/hw/char/virtio-console.c b/hw/char/virtio-console.c
index 0cb1668c8a..b55905892e 100644
--- a/hw/char/virtio-console.c
+++ b/hw/char/virtio-console.c
@@ -163,6 +163,26 @@ static void chr_event(void *opaque, int event)
     }
 }
 
+static void virtconsole_enable_backend(VirtIOSerialPort *port, bool enable)
+{
+    VirtConsole *vcon = VIRTIO_CONSOLE(port);
+
+    if (!qemu_chr_fe_get_driver(&vcon->chr)) {
+        return;
+    }
+
+    if (enable) {
+        VirtIOSerialPortClass *k = VIRTIO_SERIAL_PORT_GET_CLASS(port);
+
+        qemu_chr_fe_set_handlers(&vcon->chr, chr_can_read, chr_read,
+                                 k->is_console ? NULL : chr_event,
+                                 vcon, NULL, false);
+    } else {
+        qemu_chr_fe_set_handlers(&vcon->chr, NULL, NULL,
+                                 NULL, NULL, NULL, false);
+    }
+}
+
 static void virtconsole_realize(DeviceState *dev, Error **errp)
 {
     VirtIOSerialPort *port = VIRTIO_SERIAL_PORT(dev);
@@ -233,6 +253,7 @@ static void virtserialport_class_init(ObjectClass *klass, void *data)
     k->unrealize = virtconsole_unrealize;
     k->have_data = flush_buf;
     k->set_guest_connected = set_guest_connected;
+    k->enable_backend = virtconsole_enable_backend;
     k->guest_writable = guest_writable;
     dc->props = virtserialport_properties;
 }
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index f5bc173844..f0f18c8e7c 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -637,6 +637,13 @@ static void set_status(VirtIODevice *vdev, uint8_t status)
     if (!(status & VIRTIO_CONFIG_S_DRIVER_OK)) {
         guest_reset(vser);
     }
+
+    QTAILQ_FOREACH(port, &vser->ports, next) {
+        VirtIOSerialPortClass *vsc = VIRTIO_SERIAL_PORT_GET_CLASS(port);
+        if (vsc->enable_backend) {
+            vsc->enable_backend(port, vdev->vm_running);
+        }
+    }
 }
 
 static void vser_reset(VirtIODevice *vdev)

diff --git a/include/hw/virtio/virtio-serial.h b/include/hw/virtio/virtio-serial.h
index b19c44727f..12657a9f39 100644
--- a/include/hw/virtio/virtio-serial.h
+++ b/include/hw/virtio/virtio-serial.h
@@ -58,6 +58,9 @@ typedef struct VirtIOSerialPortClass {
     /* Guest opened/closed device. */
     void (*set_guest_connected)(VirtIOSerialPort *port, int guest_connected);
 
+    /* Enable/disable backend for virtio serial port */
+    void (*enable_backend)(VirtIOSerialPort *port, bool enable);
+
     /* Guest is now ready to accept data (virtqueues set up). */
     void (*guest_ready)(VirtIOSerialPort *port);
 



ping
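As a rough illustration of the dispatch the patch adds to set_status(), here is a self-contained C model. Port, PortClass, console_enable_backend() and ports_set_running() are hypothetical simplifications, not QEMU types or functions; the backend_enabled flag stands in for installing or removing the chardev handlers with qemu_chr_fe_set_handlers().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for VirtIOSerialPort and
 * VirtIOSerialPortClass; the real QEMU types carry much more state. */
typedef struct Port Port;

typedef struct PortClass {
    /* Optional hook: NULL for port types that have no backend to toggle. */
    void (*enable_backend)(Port *port, bool enable);
} PortClass;

struct Port {
    const PortClass *klass;
    bool backend_enabled;   /* models "chardev handlers are installed" */
};

/* Models virtconsole_enable_backend(): installing the handlers lets host
 * input flow into the guest's virtqueues, removing them stops it. */
static void console_enable_backend(Port *port, bool enable)
{
    assert(port != NULL);
    port->backend_enabled = enable;
}

/* Models the loop added to set_status(): every port whose class implements
 * the hook is told whether the VM is running. */
static void ports_set_running(Port *ports, size_t n, bool vm_running)
{
    for (size_t i = 0; i < n; i++) {
        const PortClass *k = ports[i].klass;
        if (k->enable_backend) {
            k->enable_backend(&ports[i], vm_running);
        }
    }
}

static const PortClass console_class = { console_enable_backend };
static const PortClass plain_class = { NULL };
```

In this model, calling ports_set_running() with vm_running set to false disables only the ports whose class provides the hook; ports without it are left untouched, just as the NULL check in the patch skips them.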



[Qemu-devel] [PATCH v6 1/4] qemu-img: add --shrink flag for resize

2017-07-14 Thread Pavel Butsykin
The flag is an additional precaution against data loss. In the future,
shrinking without this flag may be refused for all formats, but for now
we need to maintain compatibility with raw.

Signed-off-by: Pavel Butsykin 
Reviewed-by: Max Reitz 
---
 qemu-img-cmds.hx       |  4 ++--
 qemu-img.c             | 23 +++
 qemu-img.texi          |  6 +-
 tests/qemu-iotests/102 |  4 ++--
 4 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index ac5946bc4f..e36957a2ca 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -82,9 +82,9 @@ STEXI
 ETEXI
 
 DEF("resize", img_resize,
-    "resize [--object objectdef] [--image-opts] [-q] filename [+ | -]size")
+    "resize [--object objectdef] [--image-opts] [-q] [--shrink] filename [+ | -]size")
 STEXI
-@item resize [--object @var{objectdef}] [--image-opts] [-q] @var{filename} [+ | -]@var{size}
+@item resize [--object @var{objectdef}] [--image-opts] [-q] [--shrink] @var{filename} [+ | -]@var{size}
 ETEXI
 
 DEF("amend", img_amend,
diff --git a/qemu-img.c b/qemu-img.c
index 28022145d5..b4dc4bb5c4 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -64,6 +64,7 @@ enum {
     OPTION_TARGET_IMAGE_OPTS = 263,
     OPTION_SIZE = 264,
     OPTION_PREALLOCATION = 265,
+    OPTION_SHRINK = 266,
 };
 
 typedef enum OutputFormat {
@@ -3430,6 +3431,7 @@ static int img_resize(int argc, char **argv)
         },
     };
     bool image_opts = false;
+    bool shrink = false;
 
     /* Remove size from argv manually so that negative numbers are not treated
      * as options by getopt. */
@@ -3448,6 +3450,7 @@ static int img_resize(int argc, char **argv)
             {"object", required_argument, 0, OPTION_OBJECT},
             {"image-opts", no_argument, 0, OPTION_IMAGE_OPTS},
             {"preallocation", required_argument, 0, OPTION_PREALLOCATION},
+            {"shrink", no_argument, 0, OPTION_SHRINK},
             {0, 0, 0, 0}
         };
         c = getopt_long(argc, argv, ":f:hq",
@@ -3491,6 +3494,9 @@ static int img_resize(int argc, char **argv)
                 return 1;
             }
             break;
+        case OPTION_SHRINK:
+            shrink = true;
+            break;
         }
     }
     if (optind != argc - 1) {
@@ -3564,6 +3570,23 @@ static int img_resize(int argc, char **argv)
         goto out;
     }
 
+    if (total_size < current_size && !shrink) {
+        warn_report("Shrinking an image will delete all data beyond the "
+                    "shrunken image's end. Before performing such an "
+                    "operation, make sure there is no important data there.");
+
+        if (g_strcmp0(bdrv_get_format_name(blk_bs(blk)), "raw") != 0) {
+            error_report(
+              "Use the --shrink option to perform a shrink operation.");
+            ret = -1;
+            goto out;
+        } else {
+            warn_report("Using the --shrink option will suppress this message. "
+                        "Note that future versions of qemu-img may refuse to "
+                        "shrink images without this option.");
+        }
+    }
+
     ret = blk_truncate(blk, total_size, prealloc, &err);
     if (!ret) {
         qprintf(quiet, "Image resized.\n");
diff --git a/qemu-img.texi b/qemu-img.texi
index f11f6036ad..9a930f5e6d 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -529,7 +529,7 @@ qemu-img rebase -b base.img diff.qcow2
 At this point, @code{modified.img} can be discarded, since
 @code{base.img + diff.qcow2} contains the same information.
 
-@item resize [--preallocation=@var{prealloc}] @var{filename} [+ | -]@var{size}
+@item resize [--shrink] [--preallocation=@var{prealloc}] @var{filename} [+ | -]@var{size}
 
 Change the disk image as if it had been created with @var{size}.
 
@@ -537,6 +537,10 @@ Before using this command to shrink a disk image, you MUST use file system and
 partitioning tools inside the VM to reduce allocated file systems and partition
 sizes accordingly.  Failure to do so will result in data loss!
 
+When shrinking images, the @code{--shrink} option must be given. This informs
+qemu-img that the user acknowledges all loss of data beyond the truncated
+image's end.
+
 After using this command to grow a disk image, you must use file system and
 partitioning tools inside the VM to actually begin using the new space on the
 device.
diff --git a/tests/qemu-iotests/102 b/tests/qemu-iotests/102
index 87db1bb1bf..d7ad8d9840 100755
--- a/tests/qemu-iotests/102
+++ b/tests/qemu-iotests/102
@@ -54,7 +54,7 @@ _make_test_img $IMG_SIZE
 $QEMU_IO -c 'write 0 64k' "$TEST_IMG" | _filter_qemu_io
 # Remove data cluster from image (first cluster: image header, second: reftable,
 # third: refblock, fourth: L1 table, fifth: L2 tab
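The size checks that img_resize() performs after this patch can be modeled as a pair of small functions. This is a hedged sketch rather than qemu-img code: resolve_size(), check_shrink() and ResizeVerdict are hypothetical names, parsing of size suffixes (K, M, G) is assumed to have happened already, and the real code prints warnings and errors instead of returning a verdict.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Possible outcomes of the --shrink check (hypothetical names). */
typedef enum {
    RESIZE_OK,        /* grow, same size, or shrink with --shrink */
    RESIZE_OK_WARN,   /* raw image shrunk without --shrink: warn only */
    RESIZE_REFUSED,   /* any other format shrunk without --shrink */
} ResizeVerdict;

/* Apply "[+ | -]size" to the current size: '+' and '-' are offsets from
 * current_size, any other sign means an absolute size. Returns false if a
 * relative decrease would go below zero. */
static bool resolve_size(int64_t current_size, char sign, int64_t size,
                         int64_t *total_size)
{
    switch (sign) {
    case '+':
        *total_size = current_size + size;
        return true;
    case '-':
        if (size > current_size) {
            return false;
        }
        *total_size = current_size - size;
        return true;
    default:
        *total_size = size;
        return true;
    }
}

static ResizeVerdict check_shrink(int64_t current_size, int64_t total_size,
                                  bool shrink_flag, const char *format)
{
    assert(format != NULL);
    if (total_size >= current_size || shrink_flag) {
        return RESIZE_OK;
    }
    /* Only raw keeps the old behaviour, for compatibility. */
    return strcmp(format, "raw") == 0 ? RESIZE_OK_WARN : RESIZE_REFUSED;
}
```

For example, shrinking a 4 GiB qcow2 image to 512 MiB without --shrink yields RESIZE_REFUSED, which corresponds to the error shown in the cover letter's qemu-img session.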

[Qemu-devel] [PATCH v6 0/4] Add shrink image for qcow2

2017-07-14 Thread Pavel Butsykin
This patch adds image shrinking support for qcow2. It allows reducing the
virtual image size and freeing up disk space without copying the image.
Since the image may be fragmented, shrinking is done by punching holes in
the image file.
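How much of the L1 table survives a shrink follows from the qcow2 layout: each L2 table fills one cluster of 8-byte entries, and each entry maps one data cluster, so one L1 entry covers 2^(2 * cluster_bits - 3) bytes. The helper names below are hypothetical; this is a sketch of that arithmetic, not code from the patch, and it assumes the qcow2 cluster size range of 512 bytes to 2 MiB (cluster_bits 9 to 21).

```c
#include <assert.h>
#include <stdint.h>

/* Each L2 table fills one cluster and holds 8-byte entries, each mapping
 * one data cluster, so a single L1 entry covers
 * (cluster_size / 8) * cluster_size = 2^(2 * cluster_bits - 3) bytes. */
static uint64_t l1_entry_coverage(unsigned cluster_bits)
{
    assert(cluster_bits >= 9 && cluster_bits <= 21);
    return UINT64_C(1) << (2 * cluster_bits - 3);
}

/* Number of L1 entries needed for a given virtual size, rounded up. */
static uint64_t l1_entries_for_size(uint64_t virtual_size,
                                    unsigned cluster_bits)
{
    uint64_t per_entry = l1_entry_coverage(cluster_bits);
    return (virtual_size + per_entry - 1) / per_entry;
}
```

With the default 64 KiB clusters (cluster_bits = 16) one L1 entry covers 512 MiB, so a 4G image needs 8 L1 entries while a 128M image needs only 1; entries 1 to 7 become unused and any L2 tables they point to can be freed.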

# ./qemu-img create -f qcow2 image.qcow2 4G
Formatting 'image.qcow2', fmt=qcow2 size=4294967296 encryption=off 
cluster_size=65536 lazy_refcounts=off refcount_bits=16

# ./qemu-io -c "write -P 0x22 0 1G" image.qcow2
wrote 1073741824/1073741824 bytes at offset 0
1 GiB, 1 ops; 0:00:04.59 (222.886 MiB/sec and 0.2177 ops/sec)

# ./qemu-img resize image.qcow2 512M
warning: qemu-img: Shrinking an image will delete all data beyond the shrunken 
image's end. Before performing such an operation, make sure there is no 
important data there.
error: qemu-img: Use the --shrink option to perform a shrink operation.

# ./qemu-img resize --shrink image.qcow2 128M
Image resized.

# ./qemu-img info image.qcow2
image: image.qcow2
file format: qcow2
virtual size: 128M (134217728 bytes)
disk size: 128M
cluster_size: 65536
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false

# du -h image.qcow2
129M    image.qcow2

Changes from v1:
- add --shrink flag for qemu-img resize
- add qcow2_cache_discard
- simplify qcow2_shrink_l1_table() to reduce the likelihood of image corruption
- add new qemu-iotests for shrinking images

Changes from v2:
- replace qprintf() on error_report() (1)
- rewrite warning messages (1)
- enforce --shrink flag for all formats except raw (1)
- split qcow2_cache_discard() (2)
- minor fixes according to comments (3)
- rewrite the last part of qcow2_shrink_reftable() to avoid
  qcow2_free_clusters() calls inside (3)
- improve test for shrinking image (4)

Changes from v3:
- rebase on "Implement a warning_report function" Alistair's patch-set (1)
- spelling fixes (1)
- the man page fix according to the discussion (1)
- add call qcow2_signal_corruption() in case of image corruption (3)

Changes from v4:
- rebase on https://github.com/XanClic/qemu/commits/block Max's block branch

Changes from v5:
- the condition refcount == 0 should be enough to evict the l2/refcount cluster
  from the cache (2)
- overwrite the l1/refcount table in memory with zeros, even if overwriting the
  l1/refcount table on disk has failed (3)
- replace g_try_malloc() on g_malloc() for allocation reftable_tmp (3)

Pavel Butsykin (4):
  qemu-img: add --shrink flag for resize
  qcow2: add qcow2_cache_discard
  qcow2: add shrink image support
  qemu-iotests: add shrinking image test

 block/qcow2-cache.c        |  26 +++
 block/qcow2-cluster.c      |  50 +
 block/qcow2-refcount.c     | 140 -
 block/qcow2.c              |  43 +---
 block/qcow2.h              |  17 +
 qapi/block-core.json       |   3 +-
 qemu-img-cmds.hx           |   4 +-
 qemu-img.c                 |  23 ++
 qemu-img.texi              |   6 +-
 tests/qemu-iotests/102     |   4 +-
 tests/qemu-iotests/163     | 170 +
 tests/qemu-iotests/163.out |   5 ++
 tests/qemu-iotests/group   |   1 +
 13 files changed, 475 insertions(+), 17 deletions(-)
 create mode 100644 tests/qemu-iotests/163
 create mode 100644 tests/qemu-iotests/163.out

-- 
2.13.0




[Qemu-devel] [PATCH v6 3/4] qcow2: add shrink image support

2017-07-14 Thread Pavel Butsykin
This patch adds image shrinking support for qcow2. It allows reducing the
virtual image size and freeing up disk space without copying the image.
Since the image may be fragmented, shrinking is done by punching holes in
the image file.

Signed-off-by: Pavel Butsykin 
Reviewed-by: Max Reitz 
---
 block/qcow2-cluster.c  |  50 +
 block/qcow2-refcount.c | 120 +
 block/qcow2.c          |  43 ++
 block/qcow2.h          |  14 ++
 qapi/block-core.json   |   3 +-
 5 files changed, 220 insertions(+), 10 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index f06c08f64c..405bc2e7af 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -32,6 +32,56 @@
 #include "qemu/bswap.h"
 #include "trace.h"
 
+int qcow2_shrink_l1_table(BlockDriverState *bs, uint64_t exact_size)
+{
+    BDRVQcow2State *s = bs->opaque;
+    int new_l1_size, i, ret;
+
+    if (exact_size >= s->l1_size) {
+        return 0;
+    }
+
+    new_l1_size = exact_size;
+
+#ifdef DEBUG_ALLOC2
+    fprintf(stderr, "shrink l1_table from %d to %d\n", s->l1_size, new_l1_size);
+#endif
+
+    BLKDBG_EVENT(bs->file, BLKDBG_L1_SHRINK_WRITE_TABLE);
+    ret = bdrv_pwrite_zeroes(bs->file, s->l1_table_offset +
+                                       new_l1_size * sizeof(uint64_t),
+                             (s->l1_size - new_l1_size) * sizeof(uint64_t), 0);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    ret = bdrv_flush(bs->file->bs);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    BLKDBG_EVENT(bs->file, BLKDBG_L1_SHRINK_FREE_L2_CLUSTERS);
+    for (i = s->l1_size - 1; i > new_l1_size - 1; i--) {
+        if ((s->l1_table[i] & L1E_OFFSET_MASK) == 0) {
+            continue;
+        }
+        qcow2_free_clusters(bs, s->l1_table[i] & L1E_OFFSET_MASK,
+                            s->cluster_size, QCOW2_DISCARD_ALWAYS);
+        s->l1_table[i] = 0;
+    }
+    return 0;
+
+fail:
+    /*
+     * If the write to the l1_table failed, the image may contain a partially
+     * overwritten l1_table. In this case it is better to clear the l1_table
+     * in memory to avoid possible image corruption.
+     */
+    memset(s->l1_table + exact_size, 0,
+           (s->l1_size - new_l1_size) * sizeof(uint64_t));
+    return ret;
+}
+
 int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
 bool exact_size)
 {
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index bbe5a2b2cc..6f7c3132c6 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -29,6 +29,7 @@
 #include "block/qcow2.h"
 #include "qemu/range.h"
 #include "qemu/bswap.h"
+#include "qemu/cutils.h"
 
 static int64_t alloc_clusters_noref(BlockDriverState *bs, uint64_t size);
 static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
@@ -3061,3 +3062,122 @@ done:
 qemu_vfree(new_refblock);
 return ret;
 }
+
+static int qcow2_discard_refcount_block(BlockDriverState *bs,
+                                        uint64_t discard_block_offs)
+{
+    BDRVQcow2State *s = bs->opaque;
+    uint64_t refblock_offs = get_refblock_offset(s, discard_block_offs);
+    uint64_t cluster_index = discard_block_offs >> s->cluster_bits;
+    uint32_t block_index = cluster_index & (s->refcount_block_size - 1);
+    void *refblock;
+    int ret;
+
+    assert(discard_block_offs != 0);
+
+    ret = qcow2_cache_get(bs, s->refcount_block_cache, refblock_offs,
+                          &refblock);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (s->get_refcount(refblock, block_index) != 1) {
+        qcow2_signal_corruption(bs, true, -1, -1, "Invalid refcount:"
+                                " refblock offset %#" PRIx64
+                                ", reftable index %u"
+                                ", block offset %#" PRIx64
+                                ", refcount %#" PRIx64,
+                                refblock_offs,
+                                offset_to_reftable_index(s, discard_block_offs),
+                                discard_block_offs,
+                                s->get_refcount(refblock, block_index));
+        qcow2_cache_put(bs, s->refcount_block_cache, &refblock);
+        return -EINVAL;
+    }
+    s->set_refcount(refblock, block_index, 0);
+
+    qcow2_cache_entry_mark_dirty(bs, s->refcount_block_cache, refblock);
+
+    qcow2_cache_put(bs, s->refcount_block_cache, &refblock);
+
+    if (cluster_index < s->free_cluster_index) {
+        s->free_cluster_index = cluster_index;
+    }
+
+    refblock = qcow2_cache_is_table_offset(bs, s->refcount_block_cache,
+
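The index arithmetic in qcow2_discard_refcount_block() follows the qcow2 refcount layout: a refcount block is one cluster holding cluster_size * 8 / refcount_bits entries, and 16-bit entries are stored big-endian. The sketch below assumes refcount_bits = 16; all function names are hypothetical, and discard_refblock_slot() only mirrors the patch's "refcount must be exactly 1" sanity check, returning -1 where the patch calls qcow2_signal_corruption() and returns -EINVAL.

```c
#include <assert.h>
#include <stdint.h>

/* Entries per refcount block: cluster_size * 8 / refcount_bits. */
static uint64_t refblock_entries(unsigned cluster_bits, unsigned refcount_bits)
{
    return (UINT64_C(1) << cluster_bits) * 8 / refcount_bits;
}

/* Slot of the host cluster containing 'offset' inside its refcount block;
 * the entry count is a power of two, so masking equals taking the modulo. */
static uint64_t refblock_index(uint64_t offset, unsigned cluster_bits,
                               unsigned refcount_bits)
{
    uint64_t cluster_index = offset >> cluster_bits;
    return cluster_index & (refblock_entries(cluster_bits, refcount_bits) - 1);
}

/* Index into the refcount table of the refcount block covering 'offset'. */
static uint64_t reftable_index(uint64_t offset, unsigned cluster_bits,
                               unsigned refcount_bits)
{
    uint64_t cluster_index = offset >> cluster_bits;
    return cluster_index / refblock_entries(cluster_bits, refcount_bits);
}

/* 16-bit refcounts are stored big-endian. */
static uint16_t get_refcount16(const uint8_t *refblock, uint64_t index)
{
    return (uint16_t)((refblock[2 * index] << 8) | refblock[2 * index + 1]);
}

static void set_refcount16(uint8_t *refblock, uint64_t index, uint16_t value)
{
    refblock[2 * index] = (uint8_t)(value >> 8);
    refblock[2 * index + 1] = (uint8_t)value;
}

/* Clear a slot that is expected to hold refcount 1; anything else means
 * the metadata is inconsistent and nothing is changed. */
static int discard_refblock_slot(uint8_t *refblock, uint64_t index)
{
    if (get_refcount16(refblock, index) != 1) {
        return -1;
    }
    set_refcount16(refblock, index, 0);
    return 0;
}
```

With 64 KiB clusters and 16-bit refcounts a refcount block holds 32768 entries, so host cluster 32773 is tracked in slot 5 of the refcount block referenced by reftable entry 1.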

[Qemu-devel] [PATCH v6 4/4] qemu-iotests: add shrinking image test

2017-07-14 Thread Pavel Butsykin
Signed-off-by: Pavel Butsykin 
Reviewed-by: Max Reitz 
---
 tests/qemu-iotests/163     | 170 +
 tests/qemu-iotests/163.out |   5 ++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 176 insertions(+)
 create mode 100644 tests/qemu-iotests/163
 create mode 100644 tests/qemu-iotests/163.out

diff --git a/tests/qemu-iotests/163 b/tests/qemu-iotests/163
new file mode 100644
index 00..403842354e
--- /dev/null
+++ b/tests/qemu-iotests/163
@@ -0,0 +1,170 @@
+#!/usr/bin/env python
+#
+# Tests for shrinking images
+#
+# Copyright (c) 2016-2017 Parallels International GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import os, random, iotests, struct, qcow2
+from iotests import qemu_img, qemu_io, image_size
+
+test_img = os.path.join(iotests.test_dir, 'test.img')
+check_img = os.path.join(iotests.test_dir, 'check.img')
+
+def size_to_int(str):
+    suff = ['B', 'K', 'M', 'G', 'T']
+    return int(str[:-1]) * 1024**suff.index(str[-1:])
+
+class ShrinkBaseClass(iotests.QMPTestCase):
+    image_len = '128M'
+    shrink_size = '10M'
+    chunk_size = '16M'
+    refcount_bits = '16'
+
+    def __qcow2_check(self, filename):
+        entry_bits = 3
+        entry_size = 1 << entry_bits
+        l1_mask = 0x00fffe00
+        div_roundup = lambda n, d: (n + d - 1) / d
+
+        def split_by_n(data, n):
+            for x in xrange(0, len(data), n):
+                yield struct.unpack('>Q', data[x:x + n])[0] & l1_mask
+
+        def check_l1_table(h, l1_data):
+            l1_list = list(split_by_n(l1_data, entry_size))
+            # One L1 entry covers 2^(2 * cluster_bits - entry_bits) bytes
+            real_l1_size = div_roundup(h.size,
+                                       1 << (h.cluster_bits*2 - entry_bits))
+            used, unused = l1_list[:real_l1_size], l1_list[real_l1_size:]
+
+            self.assertTrue(len(used) != 0, "Verifying l1 table content")
+            self.assertFalse(any(unused), "Verifying l1 table content")
+
+        def check_reftable(fd, h, reftable):
+            for offset in split_by_n(reftable, entry_size):
+                if offset != 0:
+                    fd.seek(offset)
+                    cluster = fd.read(1 << h.cluster_bits)
+                    self.assertTrue(any(cluster), "Verifying reftable content")
+
+        with open(filename, "rb") as fd:
+            h = qcow2.QcowHeader(fd)
+
+            fd.seek(h.l1_table_offset)
+            l1_table = fd.read(h.l1_size << entry_bits)
+
+            fd.seek(h.refcount_table_offset)
+            reftable = fd.read(h.refcount_table_clusters << h.cluster_bits)
+
+            check_l1_table(h, l1_table)
+            check_reftable(fd, h, reftable)
+
+    def __raw_check(self, filename):
+        pass
+
+    image_check = {
+        'qcow2' : __qcow2_check,
+        'raw' : __raw_check
+    }
+
+    def setUp(self):
+        if iotests.imgfmt == 'raw':
+            qemu_img('create', '-f', iotests.imgfmt, test_img, self.image_len)
+            qemu_img('create', '-f', iotests.imgfmt, check_img,
+                     self.shrink_size)
+        else:
+            qemu_img('create', '-f', iotests.imgfmt,
+                     '-o', 'cluster_size=' + self.cluster_size +
+                     ',refcount_bits=' + self.refcount_bits,
+                     test_img, self.image_len)
+            qemu_img('create', '-f', iotests.imgfmt,
+                     '-o', 'cluster_size=%s'% self.cluster_size,
+                     check_img, self.shrink_size)
+        qemu_io('-c', 'write -P 0xff 0 ' + self.shrink_size, check_img)
+
+    def tearDown(self):
+        os.remove(test_img)
+        os.remove(check_img)
+
+    def image_verify(self):
+        self.assertEqual(image_size(test_img), image_size(check_img),
+                         "Verifying image size")
+        self.image_check[iotests.imgfmt](self, test_img)
+
+        if iotests.imgfmt == 'raw':
+            return
+        self.assertEqual(qemu_img('check', test_img), 0,
+ 

[Qemu-devel] [PATCH v6 2/4] qcow2: add qcow2_cache_discard

2017-07-14 Thread Pavel Butsykin
Whenever L2/refcount table clusters are discarded from the file, we can
automatically drop the corresponding entries from the cache tables. This
reduces the chance of evicting useful cached data and eliminates cache
contents that are inconsistent with the data in the file.

Signed-off-by: Pavel Butsykin 
Reviewed-by: Max Reitz 
---
 block/qcow2-cache.c    | 26 ++
 block/qcow2-refcount.c | 20 ++--
 block/qcow2.h          |  3 +++
 3 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index 1d25147392..75746a7f43 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -411,3 +411,29 @@ void qcow2_cache_entry_mark_dirty(BlockDriverState *bs, Qcow2Cache *c,
     assert(c->entries[i].offset != 0);
     c->entries[i].dirty = true;
 }
+
+void *qcow2_cache_is_table_offset(BlockDriverState *bs, Qcow2Cache *c,
+                                  uint64_t offset)
+{
+    int i;
+
+    for (i = 0; i < c->size; i++) {
+        if (c->entries[i].offset == offset) {
+            return qcow2_cache_get_table_addr(bs, c, i);
+        }
+    }
+    return NULL;
+}
+
+void qcow2_cache_discard(BlockDriverState *bs, Qcow2Cache *c, void *table)
+{
+    int i = qcow2_cache_get_table_idx(bs, c, table);
+
+    assert(c->entries[i].ref == 0);
+
+    c->entries[i].offset = 0;
+    c->entries[i].lru_counter = 0;
+    c->entries[i].dirty = false;
+
+    qcow2_cache_table_release(bs, c, i, 1);
+}
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index c9b0dcb4f3..bbe5a2b2cc 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -861,8 +861,24 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
         }
         s->set_refcount(refcount_block, block_index, refcount);
 
-        if (refcount == 0 && s->discard_passthrough[type]) {
-            update_refcount_discard(bs, cluster_offset, s->cluster_size);
+        if (refcount == 0) {
+            void *table;
+
+            table = qcow2_cache_is_table_offset(bs, s->refcount_block_cache,
+                                                offset);
+            if (table != NULL) {
+                qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
+                qcow2_cache_discard(bs, s->refcount_block_cache, table);
+            }
+
+            table = qcow2_cache_is_table_offset(bs, s->l2_table_cache, offset);
+            if (table != NULL) {
+                qcow2_cache_discard(bs, s->l2_table_cache, table);
+            }
+
+            if (s->discard_passthrough[type]) {
+                update_refcount_discard(bs, cluster_offset, s->cluster_size);
+            }
         }
     }
 
diff --git a/block/qcow2.h b/block/qcow2.h
index 96a8d43c17..52c374e9ed 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -649,6 +649,9 @@ int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
 int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
     void **table);
 void qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
+void *qcow2_cache_is_table_offset(BlockDriverState *bs, Qcow2Cache *c,
+                                  uint64_t offset);
+void qcow2_cache_discard(BlockDriverState *bs, Qcow2Cache *c, void *table);
 
 /* qcow2-bitmap.c functions */
 int qcow2_check_bitmaps_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
-- 
2.13.0
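To make the two new helpers, qcow2_cache_is_table_offset() and qcow2_cache_discard(), concrete, here is a toy model of a table cache with lookup by file offset and discard of a freed table. It is a sketch under stated assumptions, not qcow2-cache.c: TableCache, cache_lookup() and cache_discard() are hypothetical, the cache has a fixed four slots of 64 bytes, and memset() stands in for qcow2_cache_table_release().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SIZE 4
#define TABLE_BYTES 64      /* tiny stand-in for a cluster-sized table */

/* Hypothetical, simplified model of a Qcow2Cache entry. */
typedef struct {
    uint64_t offset;        /* 0 means "slot unused", as in the patch */
    uint64_t lru_counter;
    int ref;                /* outstanding get() references */
    int dirty;
    uint8_t table[TABLE_BYTES];
} CacheEntry;

typedef struct {
    CacheEntry entries[CACHE_SIZE];
} TableCache;

/* Return the cached table for 'offset', or NULL if it is not cached. */
static uint8_t *cache_lookup(TableCache *c, uint64_t offset)
{
    assert(offset != 0);    /* 0 marks a free slot, never a real table */
    for (int i = 0; i < CACHE_SIZE; i++) {
        if (c->entries[i].offset == offset) {
            return c->entries[i].table;
        }
    }
    return NULL;
}

/* Drop a table whose cluster was freed in the image file. */
static void cache_discard(TableCache *c, uint8_t *table)
{
    for (int i = 0; i < CACHE_SIZE; i++) {
        CacheEntry *e = &c->entries[i];
        if (e->table == table) {
            assert(e->ref == 0);        /* nobody may still hold it */
            e->offset = 0;
            e->lru_counter = 0;
            e->dirty = 0;               /* never write freed data back */
            memset(e->table, 0, TABLE_BYTES);
            return;
        }
    }
    assert(!"table is not in this cache");
}
```

Clearing the dirty flag is the important part of the design: once the cluster has been freed, flushing the stale table back to its old offset could overwrite whatever reuses that cluster later.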




Re: [Qemu-devel] [PATCH v5 3/4] qcow2: add shrink image support

2017-07-13 Thread Pavel Butsykin

On 13.07.2017 17:36, Max Reitz wrote:

On 2017-07-13 10:41, Kevin Wolf wrote:

On 12.07.2017 18:58, Max Reitz wrote:

On 2017-07-12 16:52, Kevin Wolf wrote:

On 12.07.2017 13:46, Pavel Butsykin wrote:

This patch adds image shrinking support for qcow2. It allows reducing the
virtual image size and freeing up disk space without copying the image.
Since the image may be fragmented, shrinking is done by punching holes in
the image file.

Signed-off-by: Pavel Butsykin 
Reviewed-by: Max Reitz 
---
  block/qcow2-cluster.c  |  40 ++
  block/qcow2-refcount.c | 110 +
  block/qcow2.c          |  43 +++
  block/qcow2.h          |  14 +++
  qapi/block-core.json   |   3 +-
  5 files changed, 200 insertions(+), 10 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index f06c08f64c..518429c64b 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -32,6 +32,46 @@
  #include "qemu/bswap.h"
  #include "trace.h"
  
+int qcow2_shrink_l1_table(BlockDriverState *bs, uint64_t exact_size)
+{
+    BDRVQcow2State *s = bs->opaque;
+    int new_l1_size, i, ret;
+
+    if (exact_size >= s->l1_size) {
+        return 0;
+    }
+
+    new_l1_size = exact_size;
+
+#ifdef DEBUG_ALLOC2
+    fprintf(stderr, "shrink l1_table from %d to %d\n", s->l1_size, new_l1_size);
+#endif
+
+    BLKDBG_EVENT(bs->file, BLKDBG_L1_SHRINK_WRITE_TABLE);
+    ret = bdrv_pwrite_zeroes(bs->file, s->l1_table_offset +
+                                       sizeof(uint64_t) * new_l1_size,
+                             (s->l1_size - new_l1_size) * sizeof(uint64_t), 0);
+    if (ret < 0) {
+        return ret;
+    }
+
+    ret = bdrv_flush(bs->file->bs);
+    if (ret < 0) {
+        return ret;
+    }


If we have an error here (or after a partial bdrv_pwrite_zeroes()), we
have entries zeroed out on disk, but in memory we still have the
original L1 table.

Should the in-memory L1 table be zeroed first? Then we can't
accidentally reuse stale entries, but would have to allocate new ones,
which would get on-disk state and in-memory state back in sync again.


Yes, I thought of the same.  But this implies that the allocation is
able to modify the L1 table, and I find that unlikely if that
bdrv_flush() failed already...

So I concluded not to have an opinion on which order is better.


Well, let me ask the other way round: Is there any disadvantage in first
zeroing the in-memory table and then writing to the image?


I was informed that the code would be harder to write. :-)


If I have a choice between "always safe" and "not completely safe, but I
think it's unlikely to happen", especially in image formats, then I will
certainly take the "always safe".


+    BLKDBG_EVENT(bs->file, BLKDBG_L1_SHRINK_FREE_L2_CLUSTERS);
+    for (i = s->l1_size - 1; i > new_l1_size - 1; i--) {
+        if ((s->l1_table[i] & L1E_OFFSET_MASK) == 0) {
+            continue;
+        }
+        qcow2_free_clusters(bs, s->l1_table[i] & L1E_OFFSET_MASK,
+                            s->cluster_size, QCOW2_DISCARD_ALWAYS);
+        s->l1_table[i] = 0;
+    }
+    return 0;
+}
+
  int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
  bool exact_size)
  {


I haven't checked qcow2_shrink_reftable() for similar kinds of problems,
I hope Max has.


Well, it's exactly the same there.


Ok, so I'll object to this code without really having looked at it.

I won't object to your objection. O:-)


Kevin,

Could you help me reduce the number of patch-set versions? :)
Please also take a look at the rest of the series, thanks!


Max





Re: [Qemu-devel] [PATCH v5 3/4] qcow2: add shrink image support

2017-07-13 Thread Pavel Butsykin

On 13.07.2017 11:41, Kevin Wolf wrote:

On 12.07.2017 18:58, Max Reitz wrote:

On 2017-07-12 16:52, Kevin Wolf wrote:

On 12.07.2017 13:46, Pavel Butsykin wrote:

This patch adds image shrinking support for qcow2. It allows reducing the
virtual image size and freeing up disk space without copying the image.
Since the image may be fragmented, shrinking is done by punching holes in
the image file.

Signed-off-by: Pavel Butsykin 
Reviewed-by: Max Reitz 
---
  block/qcow2-cluster.c  |  40 ++
  block/qcow2-refcount.c | 110 +
  block/qcow2.c          |  43 +++
  block/qcow2.h          |  14 +++
  qapi/block-core.json   |   3 +-
  5 files changed, 200 insertions(+), 10 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index f06c08f64c..518429c64b 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -32,6 +32,46 @@
  #include "qemu/bswap.h"
  #include "trace.h"
  
+int qcow2_shrink_l1_table(BlockDriverState *bs, uint64_t exact_size)
+{
+    BDRVQcow2State *s = bs->opaque;
+    int new_l1_size, i, ret;
+
+    if (exact_size >= s->l1_size) {
+        return 0;
+    }
+
+    new_l1_size = exact_size;
+
+#ifdef DEBUG_ALLOC2
+    fprintf(stderr, "shrink l1_table from %d to %d\n", s->l1_size, new_l1_size);
+#endif
+
+    BLKDBG_EVENT(bs->file, BLKDBG_L1_SHRINK_WRITE_TABLE);
+    ret = bdrv_pwrite_zeroes(bs->file, s->l1_table_offset +
+                                       sizeof(uint64_t) * new_l1_size,
+                             (s->l1_size - new_l1_size) * sizeof(uint64_t), 0);
+    if (ret < 0) {
+        return ret;
+    }
+
+    ret = bdrv_flush(bs->file->bs);
+    if (ret < 0) {
+        return ret;
+    }


If we have an error here (or after a partial bdrv_pwrite_zeroes()), we
have entries zeroed out on disk, but in memory we still have the
original L1 table.

Should the in-memory L1 table be zeroed first? Then we can't
accidentally reuse stale entries, but would have to allocate new ones,
which would get on-disk state and in-memory state back in sync again.


Yes, I thought of the same.  But this implies that the allocation is
able to modify the L1 table, and I find that unlikely if that
bdrv_flush() failed already...

So I concluded not to have an opinion on which order is better.


Well, let me ask the other way round: Is there any disadvantage in first
zeroing the in-memory table and then writing to the image?


If bdrv_flush()/bdrv_pwrite_zeroes() fails, subsequent writes to the
truncated area lead to the allocation of new L2 tables. This implies two
things:

1. We need to call qcow2_free_clusters() after bdrv_flush()/bdrv_pwrite_zeroes()
anyway; otherwise the L1 table may end up with two identical offsets.

2. The old L2 tables may be lost and become dead weight in the image.


If I have a choice between "always safe" and "not completely safe, but I
think it's unlikely to happen", especially in image formats, then I will
certainly take the "always safe".


As I understand it, both orders are "unsafe", because both may leave the
image and memory in inconsistent states. When writing this code I followed
the existing approach to this kind of issue in qcow2*.c:
...
static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
...
    trace_qcow2_l2_allocate_write_l1(bs, l1_index);
    s->l1_table[l1_index] = l2_offset | QCOW_OFLAG_COPIED;
    ret = qcow2_write_l1_entry(bs, l1_index);
    if (ret < 0) {
        goto fail;
    }

    *table = l2_table;
    trace_qcow2_l2_allocate_done(bs, l1_index, 0);
    return 0;

fail:
    trace_qcow2_l2_allocate_done(bs, l1_index, ret);
    if (l2_table != NULL) {
        qcow2_cache_put(bs, s->l2_table_cache, (void**) table);
    }
    s->l1_table[l1_index] = old_l2_offset;
    if (l2_offset > 0) {
        qcow2_free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t),
                            QCOW2_DISCARD_ALWAYS);
    }
    return ret;

Do I understand correctly that if the qcow2_write_l1_entry() call fails
somewhere in the middle of bdrv_flush(), then l1_table still contains the
old l2_offset, but the image may already contain the new l2_offset? We can
continue to use the image and write data, but in that case we lose the
data after reopening the image.

So it seemed to me that, under the current conditions, we can't completely
avoid this kind of problem. But I understand your desire to do better even
in small things, and I agree that "first zeroing the in-memory table and
then writing" would be a little safer. So if this solution suits everyone,
I'll write the next patch-set version :)
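The ordering agreed on here (clear the dropped entries in memory first, then update the image, and free the old L2 clusters only once the write has succeeded) can be sketched with a fake image. This is a hypothetical model, not the v6 code: FakeImage, image_zero_l1() and shrink_l1_table() stand in for bdrv_pwrite_zeroes()/bdrv_flush() and qcow2_free_clusters(), the L1 table is a plain array of at most 16 entries, and -EIO is an arbitrary injected error.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define L1E_OFFSET_MASK UINT64_C(0x00fffffffffffe00)
#define MAX_L1 16

/* Hypothetical stand-in for the image file: an on-disk L1 copy plus a
 * record of freed L2 clusters. 'fail_write' injects an I/O error after a
 * partial write, as bdrv_pwrite_zeroes()/bdrv_flush() might produce. */
typedef struct {
    uint64_t disk_l1[MAX_L1];
    int fail_write;
    uint64_t freed[MAX_L1];
    int nfreed;
} FakeImage;

static int image_zero_l1(FakeImage *img, int first, int count)
{
    if (img->fail_write) {
        img->disk_l1[first] = 0;    /* only the first entry made it */
        return -EIO;
    }
    memset(&img->disk_l1[first], 0, count * sizeof(uint64_t));
    return 0;
}

static int shrink_l1_table(FakeImage *img, uint64_t *l1_table, int l1_size,
                           int new_size)
{
    uint64_t saved[MAX_L1];
    int count = l1_size - new_size;

    assert(l1_size <= MAX_L1 && new_size >= 0);
    if (count <= 0) {
        return 0;
    }

    /* 1. Detach the dropped entries from memory first, so that no later
     *    write can be routed through a stale L2 table. */
    memcpy(saved, l1_table + new_size, count * sizeof(uint64_t));
    memset(l1_table + new_size, 0, count * sizeof(uint64_t));

    /* 2. Update the image. */
    int ret = image_zero_l1(img, new_size, count);
    if (ret < 0) {
        /* The old L2 clusters stay allocated: at worst they leak, instead
         * of being freed while the image may still reference them. */
        return ret;
    }

    /* 3. Free L2 clusters only after the image no longer references them. */
    for (int i = 0; i < count; i++) {
        uint64_t off = saved[i] & L1E_OFFSET_MASK;
        if (off != 0) {
            img->freed[img->nfreed++] = off;
        }
    }
    return 0;
}
```

On failure the old L2 clusters are simply not freed, which is the "dead weight" case from point 2 above: wasted space, but no L1 entry on disk can point at a cluster that has already been handed out again.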


+    BLKDBG_EVENT(bs->file, BLKDBG_L1_SHRINK_FREE_L2_CLUSTERS);
+    for (i = s->l1_size - 1; i > new_l1_size - 1; i--)

Re: [Qemu-devel] [PATCH v5 2/4] qcow2: add qcow2_cache_discard

2017-07-13 Thread Pavel Butsykin

On 12.07.2017 17:45, Kevin Wolf wrote:

On 12.07.2017 13:46, Pavel Butsykin wrote:

Whenever L2/refcount table clusters are discarded from the file, we can
automatically drop the corresponding entries from the cache tables. This
reduces the chance of evicting useful cached data and eliminates cache
contents that are inconsistent with the data in the file.

Signed-off-by: Pavel Butsykin 
Reviewed-by: Max Reitz 
---
  block/qcow2-cache.c    | 26 ++
  block/qcow2-refcount.c | 14 ++
  block/qcow2.h          |  3 +++
  3 files changed, 43 insertions(+)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index 1d25147392..75746a7f43 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -411,3 +411,29 @@ void qcow2_cache_entry_mark_dirty(BlockDriverState *bs, Qcow2Cache *c,
     assert(c->entries[i].offset != 0);
     c->entries[i].dirty = true;
 }
+
+void *qcow2_cache_is_table_offset(BlockDriverState *bs, Qcow2Cache *c,
+                                  uint64_t offset)
+{
+    int i;
+
+    for (i = 0; i < c->size; i++) {
+        if (c->entries[i].offset == offset) {
+            return qcow2_cache_get_table_addr(bs, c, i);
+        }
+    }
+    return NULL;
+}
+
+void qcow2_cache_discard(BlockDriverState *bs, Qcow2Cache *c, void *table)
+{
+    int i = qcow2_cache_get_table_idx(bs, c, table);
+
+    assert(c->entries[i].ref == 0);
+
+    c->entries[i].offset = 0;
+    c->entries[i].lru_counter = 0;
+    c->entries[i].dirty = false;
+
+    qcow2_cache_table_release(bs, c, i, 1);
+}
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index c9b0dcb4f3..8050db4544 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -862,6 +862,20 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
  s->set_refcount(refcount_block, block_index, refcount);
  
  if (refcount == 0 && s->discard_passthrough[type]) {

+void *table;
+
+table = qcow2_cache_is_table_offset(bs, s->refcount_block_cache,
+offset);
+if (table != NULL) {
+qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
+qcow2_cache_discard(bs, s->refcount_block_cache, table);
+}
+
+table = qcow2_cache_is_table_offset(bs, s->l2_table_cache, offset);
+if (table != NULL) {
+qcow2_cache_discard(bs, s->l2_table_cache, table);
+}
+
  update_refcount_discard(bs, cluster_offset, s->cluster_size);
  }
  }


This is not wrong, but wouldn't actually even refcount == 0 be enough as
a condition to evict the cluster from the cache? I don't think it really
matters whether we actually send a discard request or not for the image
file.


Yes, you're right. Will fix.


Kevin





[Qemu-devel] [PATCH v5 1/4] qemu-img: add --shrink flag for resize

2017-07-12 Thread Pavel Butsykin
The flag is an additional precaution against data loss. Perhaps in the future
shrinking without this flag will be blocked for all formats, but for now we
need to maintain compatibility with raw.
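The policy described above boils down to a small decision table. A minimal sketch, assuming the rule as stated here (shrinking needs `--shrink`, except for raw, which only warns); `check_shrink()` and the `enum shrink_policy` values are hypothetical names, not part of qemu-img.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical outcome codes for this sketch. */
enum shrink_policy { RESIZE_OK, RESIZE_WARN_RAW, RESIZE_REFUSE };

/* Growing, or shrinking with --shrink, is always allowed. Shrinking without
 * the flag is refused for every format except raw, which only gets a
 * warning for backward compatibility. */
static enum shrink_policy check_shrink(int64_t current_size, int64_t total_size,
                                       bool shrink_flag, const char *format)
{
    assert(format != NULL);
    if (total_size >= current_size || shrink_flag) {
        return RESIZE_OK;
    }
    return strcmp(format, "raw") == 0 ? RESIZE_WARN_RAW : RESIZE_REFUSE;
}
```

Keeping the raw exception in one place makes it easy to drop later if shrinking without the flag is eventually refused for all formats.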

Signed-off-by: Pavel Butsykin 
Reviewed-by: Max Reitz 
---
 qemu-img-cmds.hx       |  4 ++--
 qemu-img.c             | 23 +++
 qemu-img.texi          |  6 +-
 tests/qemu-iotests/102 |  4 ++--
 4 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index ac5946bc4f..e36957a2ca 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -82,9 +82,9 @@ STEXI
 ETEXI
 
 DEF("resize", img_resize,
-"resize [--object objectdef] [--image-opts] [-q] filename [+ | -]size")
+"resize [--object objectdef] [--image-opts] [-q] [--shrink] filename [+ | -]size")
 STEXI
-@item resize [--object @var{objectdef}] [--image-opts] [-q] @var{filename} [+ | -]@var{size}
+@item resize [--object @var{objectdef}] [--image-opts] [-q] [--shrink] @var{filename} [+ | -]@var{size}
 ETEXI
 
 DEF("amend", img_amend,
diff --git a/qemu-img.c b/qemu-img.c
index 28022145d5..b4dc4bb5c4 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -64,6 +64,7 @@ enum {
 OPTION_TARGET_IMAGE_OPTS = 263,
 OPTION_SIZE = 264,
 OPTION_PREALLOCATION = 265,
+OPTION_SHRINK = 266,
 };
 
 typedef enum OutputFormat {
@@ -3430,6 +3431,7 @@ static int img_resize(int argc, char **argv)
 },
 };
 bool image_opts = false;
+bool shrink = false;
 
 /* Remove size from argv manually so that negative numbers are not treated
  * as options by getopt. */
@@ -3448,6 +3450,7 @@ static int img_resize(int argc, char **argv)
 {"object", required_argument, 0, OPTION_OBJECT},
 {"image-opts", no_argument, 0, OPTION_IMAGE_OPTS},
 {"preallocation", required_argument, 0, OPTION_PREALLOCATION},
+{"shrink", no_argument, 0, OPTION_SHRINK},
 {0, 0, 0, 0}
 };
 c = getopt_long(argc, argv, ":f:hq",
@@ -3491,6 +3494,9 @@ static int img_resize(int argc, char **argv)
 return 1;
 }
 break;
+case OPTION_SHRINK:
+shrink = true;
+break;
 }
 }
 if (optind != argc - 1) {
@@ -3564,6 +3570,23 @@ static int img_resize(int argc, char **argv)
 goto out;
 }
 
+if (total_size < current_size && !shrink) {
+warn_report("Shrinking an image will delete all data beyond the "
+"shrunken image's end. Before performing such an "
+"operation, make sure there is no important data there.");
+
+if (g_strcmp0(bdrv_get_format_name(blk_bs(blk)), "raw") != 0) {
+error_report(
+  "Use the --shrink option to perform a shrink operation.");
+ret = -1;
+goto out;
+} else {
+warn_report("Using the --shrink option will suppress this message. "
+"Note that future versions of qemu-img may refuse to "
+"shrink images without this option.");
+}
+}
+
 ret = blk_truncate(blk, total_size, prealloc, &err);
 if (!ret) {
 qprintf(quiet, "Image resized.\n");
diff --git a/qemu-img.texi b/qemu-img.texi
index f11f6036ad..9a930f5e6d 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -529,7 +529,7 @@ qemu-img rebase -b base.img diff.qcow2
 At this point, @code{modified.img} can be discarded, since
 @code{base.img + diff.qcow2} contains the same information.
 
-@item resize [--preallocation=@var{prealloc}] @var{filename} [+ | -]@var{size}
+@item resize [--shrink] [--preallocation=@var{prealloc}] @var{filename} [+ | -]@var{size}
 
 Change the disk image as if it had been created with @var{size}.
 
@@ -537,6 +537,10 @@ Before using this command to shrink a disk image, you MUST use file system and
 partitioning tools inside the VM to reduce allocated file systems and partition
 sizes accordingly.  Failure to do so will result in data loss!
 
+When shrinking images, the @code{--shrink} option must be given. This informs
+qemu-img that the user acknowledges all loss of data beyond the truncated
+image's end.
+
 After using this command to grow a disk image, you must use file system and
 partitioning tools inside the VM to actually begin using the new space on the
 device.
diff --git a/tests/qemu-iotests/102 b/tests/qemu-iotests/102
index 87db1bb1bf..d7ad8d9840 100755
--- a/tests/qemu-iotests/102
+++ b/tests/qemu-iotests/102
@@ -54,7 +54,7 @@ _make_test_img $IMG_SIZE
 $QEMU_IO -c 'write 0 64k' "$TEST_IMG" | _filter_qemu_io
 # Remove data cluster from image (first cluster: image header, second: reftable,
 # third: refblock, fourth: L1 table, fifth: L2 tab

[Qemu-devel] [PATCH v5 4/4] qemu-iotests: add shrinking image test

2017-07-12 Thread Pavel Butsykin
Signed-off-by: Pavel Butsykin 
Reviewed-by: Max Reitz 
---
 tests/qemu-iotests/163     | 170 +
 tests/qemu-iotests/163.out |   5 ++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 176 insertions(+)
 create mode 100644 tests/qemu-iotests/163
 create mode 100644 tests/qemu-iotests/163.out

diff --git a/tests/qemu-iotests/163 b/tests/qemu-iotests/163
new file mode 100644
index 00..403842354e
--- /dev/null
+++ b/tests/qemu-iotests/163
@@ -0,0 +1,170 @@
+#!/usr/bin/env python
+#
+# Tests for shrinking images
+#
+# Copyright (c) 2016-2017 Parallels International GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import os, random, iotests, struct, qcow2
+from iotests import qemu_img, qemu_io, image_size
+
+test_img = os.path.join(iotests.test_dir, 'test.img')
+check_img = os.path.join(iotests.test_dir, 'check.img')
+
+def size_to_int(str):
+suff = ['B', 'K', 'M', 'G', 'T']
+return int(str[:-1]) * 1024**suff.index(str[-1:])
+
+class ShrinkBaseClass(iotests.QMPTestCase):
+image_len = '128M'
+shrink_size = '10M'
+chunk_size = '16M'
+refcount_bits = '16'
+
+def __qcow2_check(self, filename):
+entry_bits = 3
+entry_size = 1 << entry_bits
+l1_mask = 0x00fffe00
+div_roundup = lambda n, d: (n + d - 1) / d
+
+def split_by_n(data, n):
+for x in xrange(0, len(data), n):
+yield struct.unpack('>Q', data[x:x + n])[0] & l1_mask
+
+def check_l1_table(h, l1_data):
+l1_list = list(split_by_n(l1_data, entry_size))
+real_l1_size = div_roundup(h.size,
+   1 << (h.cluster_bits*2 - entry_size))
+used, unused = l1_list[:real_l1_size], l1_list[real_l1_size:]
+
+self.assertTrue(len(used) != 0, "Verifying l1 table content")
+self.assertFalse(any(unused), "Verifying l1 table content")
+
+def check_reftable(fd, h, reftable):
+for offset in split_by_n(reftable, entry_size):
+if offset != 0:
+fd.seek(offset)
+cluster = fd.read(1 << h.cluster_bits)
+self.assertTrue(any(cluster), "Verifying reftable content")
+
+with open(filename, "rb") as fd:
+h = qcow2.QcowHeader(fd)
+
+fd.seek(h.l1_table_offset)
+l1_table = fd.read(h.l1_size << entry_bits)
+
+fd.seek(h.refcount_table_offset)
+reftable = fd.read(h.refcount_table_clusters << h.cluster_bits)
+
+check_l1_table(h, l1_table)
+check_reftable(fd, h, reftable)
+
+def __raw_check(self, filename):
+pass
+
+image_check = {
+'qcow2' : __qcow2_check,
+'raw' : __raw_check
+}
+
+def setUp(self):
+if iotests.imgfmt == 'raw':
+qemu_img('create', '-f', iotests.imgfmt, test_img, self.image_len)
+qemu_img('create', '-f', iotests.imgfmt, check_img,
+ self.shrink_size)
+else:
+qemu_img('create', '-f', iotests.imgfmt,
+ '-o', 'cluster_size=' + self.cluster_size +
+ ',refcount_bits=' + self.refcount_bits,
+ test_img, self.image_len)
+qemu_img('create', '-f', iotests.imgfmt,
+ '-o', 'cluster_size=%s'% self.cluster_size,
+ check_img, self.shrink_size)
+qemu_io('-c', 'write -P 0xff 0 ' + self.shrink_size, check_img)
+
+def tearDown(self):
+os.remove(test_img)
+os.remove(check_img)
+
+def image_verify(self):
+self.assertEqual(image_size(test_img), image_size(check_img),
+ "Verifying image size")
+self.image_check[iotests.imgfmt](self, test_img)
+
+if iotests.imgfmt == 'raw':
+return
+self.assertEqual(qemu_img('check', test_img), 0,
+ 

[Qemu-devel] [PATCH v5 3/4] qcow2: add shrink image support

2017-07-12 Thread Pavel Butsykin
This patch adds shrinking of the image file for qcow2. This allows us to
reduce the virtual image size and free up disk space without copying the
image. The image can be fragmented; shrinking is done by punching holes in
the image file.
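`qcow2_shrink_l1_table()` receives the new number of L1 entries as `exact_size`; the excerpt here does not show how the caller computes it. As a sketch of the standard qcow2 arithmetic (an illustration, not the patch's code): an L2 table occupies one cluster of 8-byte entries, each mapping one cluster, so one L1 entry covers 2^(2*cluster_bits - 3) bytes. `l1_entries_needed()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Number of L1 entries needed to map @virtual_size bytes. Each L2 table
 * fills one cluster with 8-byte entries, so one L1 entry covers
 * (cluster_size / 8) * cluster_size = 2^(2 * cluster_bits - 3) bytes.
 * Sketch only: does not guard against overflow near UINT64_MAX. */
static uint64_t l1_entries_needed(uint64_t virtual_size, unsigned cluster_bits)
{
    assert(cluster_bits >= 9 && cluster_bits <= 21);  /* qcow2 limits */
    unsigned shift = 2 * cluster_bits - 3;
    return (virtual_size + (UINT64_C(1) << shift) - 1) >> shift;
}
```

With the 64 KiB clusters used in the cover letter example, one L1 entry covers 512 MiB, so a 4G image needs 8 entries and the 128M result needs 1.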

Signed-off-by: Pavel Butsykin 
Reviewed-by: Max Reitz 
---
 block/qcow2-cluster.c  |  40 ++
 block/qcow2-refcount.c | 110 +
 block/qcow2.c          |  43 +++
 block/qcow2.h          |  14 +++
 qapi/block-core.json   |   3 +-
 5 files changed, 200 insertions(+), 10 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index f06c08f64c..518429c64b 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -32,6 +32,46 @@
 #include "qemu/bswap.h"
 #include "trace.h"
 
+int qcow2_shrink_l1_table(BlockDriverState *bs, uint64_t exact_size)
+{
+BDRVQcow2State *s = bs->opaque;
+int new_l1_size, i, ret;
+
+if (exact_size >= s->l1_size) {
+return 0;
+}
+
+new_l1_size = exact_size;
+
+#ifdef DEBUG_ALLOC2
+fprintf(stderr, "shrink l1_table from %d to %d\n", s->l1_size, new_l1_size);
+#endif
+
+BLKDBG_EVENT(bs->file, BLKDBG_L1_SHRINK_WRITE_TABLE);
+ret = bdrv_pwrite_zeroes(bs->file, s->l1_table_offset +
+   sizeof(uint64_t) * new_l1_size,
+ (s->l1_size - new_l1_size) * sizeof(uint64_t), 0);
+if (ret < 0) {
+return ret;
+}
+
+ret = bdrv_flush(bs->file->bs);
+if (ret < 0) {
+return ret;
+}
+
+BLKDBG_EVENT(bs->file, BLKDBG_L1_SHRINK_FREE_L2_CLUSTERS);
+for (i = s->l1_size - 1; i > new_l1_size - 1; i--) {
+if ((s->l1_table[i] & L1E_OFFSET_MASK) == 0) {
+continue;
+}
+qcow2_free_clusters(bs, s->l1_table[i] & L1E_OFFSET_MASK,
+s->cluster_size, QCOW2_DISCARD_ALWAYS);
+s->l1_table[i] = 0;
+}
+return 0;
+}
+
 int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
 bool exact_size)
 {
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 8050db4544..5c8d606d29 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -29,6 +29,7 @@
 #include "block/qcow2.h"
 #include "qemu/range.h"
 #include "qemu/bswap.h"
+#include "qemu/cutils.h"
 
 static int64_t alloc_clusters_noref(BlockDriverState *bs, uint64_t size);
 static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
@@ -3059,3 +3060,112 @@ done:
 qemu_vfree(new_refblock);
 return ret;
 }
+
+int qcow2_shrink_reftable(BlockDriverState *bs)
+{
+BDRVQcow2State *s = bs->opaque;
+uint64_t *reftable_tmp =
+g_try_malloc(sizeof(uint64_t) * s->refcount_table_size);
+int i, ret;
+
+if (s->refcount_table_size && reftable_tmp == NULL) {
+return -ENOMEM;
+}
+
+for (i = 0; i < s->refcount_table_size; i++) {
+int64_t refblock_offs = s->refcount_table[i] & REFT_OFFSET_MASK;
+void *refblock;
+bool unused_block;
+
+if (refblock_offs == 0) {
+reftable_tmp[i] = 0;
+continue;
+}
+ret = qcow2_cache_get(bs, s->refcount_block_cache, refblock_offs,
+  &refblock);
+if (ret < 0) {
+goto out;
+}
+
+/* the refblock has own reference */
+if (i == offset_to_reftable_index(s, refblock_offs)) {
+uint64_t block_index = (refblock_offs >> s->cluster_bits) &
+   (s->refcount_block_size - 1);
+uint64_t refcount = s->get_refcount(refblock, block_index);
+
+s->set_refcount(refblock, block_index, 0);
+
+unused_block = buffer_is_zero(refblock, s->cluster_size);
+
+s->set_refcount(refblock, block_index, refcount);
+} else {
+unused_block = buffer_is_zero(refblock, s->cluster_size);
+}
+qcow2_cache_put(bs, s->refcount_block_cache, &refblock);
+
+reftable_tmp[i] = unused_block ? 0 : cpu_to_be64(s->refcount_table[i]);
+}
+
+ret = bdrv_pwrite_sync(bs->file, s->refcount_table_offset, reftable_tmp,
+   sizeof(uint64_t) * s->refcount_table_size);
+if (ret < 0) {
+goto out;
+}
+
+for (i = 0; i < s->refcount_table_size; i++) {
+if (s->refcount_table[i] && !reftable_tmp[i]) {
+uint64_t discard_offs = s->refcount_table[i] & REFT_OFFSET_MASK;
+uint64_t refblock_offs = get_refblock_offset(s, discard_offs);
+uint64_t cluster_index = discard_offs >> s->cluster_bits;
+uint32_t block_index = cluster_i

[Qemu-devel] [PATCH v5 2/4] qcow2: add qcow2_cache_discard

2017-07-12 Thread Pavel Butsykin
Whenever L2/refcount table clusters are discarded from the file, we can
automatically drop the corresponding entries from the table caches. This
reduces the chance of evicting useful cache data and eliminates cached data
that is inconsistent with the data in the file.
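In the `update_refcount()` hunk, the refblock the caller is holding is released with `qcow2_cache_put()` before `qcow2_cache_discard()` runs, because the freed cluster may be that very refblock and the discard asserts that the entry has no users. A toy model of that ordering; `struct slot` and its helpers are hypothetical and track only the reference count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified cache slot with a user count. */
struct slot {
    uint64_t offset;   /* 0 = slot holds nothing */
    int ref;
};

static void slot_get(struct slot *s)
{
    s->ref++;
}

/* Like qcow2_cache_put(): drop a reference and clear the caller's pointer. */
static void slot_put(struct slot **s)
{
    assert((*s)->ref > 0);
    (*s)->ref--;
    *s = NULL;
}

/* Discarding is only legal once nobody holds the slot. */
static void slot_discard(struct slot *s)
{
    assert(s->ref == 0);
    s->offset = 0;
}

/* The caller may be holding the very table whose cluster was just freed,
 * so release the held reference first, then discard. */
static void release_then_discard(struct slot **held, struct slot *target)
{
    if (*held != NULL) {
        slot_put(held);
    }
    slot_discard(target);
}
```

Reversing the two calls would trip the `ref == 0` assertion whenever the freed cluster is the refblock currently in use.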

Signed-off-by: Pavel Butsykin 
Reviewed-by: Max Reitz 
---
 block/qcow2-cache.c    | 26 ++
 block/qcow2-refcount.c | 14 ++
 block/qcow2.h          |  3 +++
 3 files changed, 43 insertions(+)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index 1d25147392..75746a7f43 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -411,3 +411,29 @@ void qcow2_cache_entry_mark_dirty(BlockDriverState *bs, Qcow2Cache *c,
 assert(c->entries[i].offset != 0);
 c->entries[i].dirty = true;
 }
+
+void *qcow2_cache_is_table_offset(BlockDriverState *bs, Qcow2Cache *c,
+  uint64_t offset)
+{
+int i;
+
+for (i = 0; i < c->size; i++) {
+if (c->entries[i].offset == offset) {
+return qcow2_cache_get_table_addr(bs, c, i);
+}
+}
+return NULL;
+}
+
+void qcow2_cache_discard(BlockDriverState *bs, Qcow2Cache *c, void *table)
+{
+int i = qcow2_cache_get_table_idx(bs, c, table);
+
+assert(c->entries[i].ref == 0);
+
+c->entries[i].offset = 0;
+c->entries[i].lru_counter = 0;
+c->entries[i].dirty = false;
+
+qcow2_cache_table_release(bs, c, i, 1);
+}
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index c9b0dcb4f3..8050db4544 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -862,6 +862,20 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
 s->set_refcount(refcount_block, block_index, refcount);
 
 if (refcount == 0 && s->discard_passthrough[type]) {
+void *table;
+
+table = qcow2_cache_is_table_offset(bs, s->refcount_block_cache,
+offset);
+if (table != NULL) {
+qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
+qcow2_cache_discard(bs, s->refcount_block_cache, table);
+}
+
+table = qcow2_cache_is_table_offset(bs, s->l2_table_cache, offset);
+if (table != NULL) {
+qcow2_cache_discard(bs, s->l2_table_cache, table);
+}
+
 update_refcount_discard(bs, cluster_offset, s->cluster_size);
 }
 }
diff --git a/block/qcow2.h b/block/qcow2.h
index 96a8d43c17..52c374e9ed 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -649,6 +649,9 @@ int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
 int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
 void **table);
 void qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
+void *qcow2_cache_is_table_offset(BlockDriverState *bs, Qcow2Cache *c,
+  uint64_t offset);
+void qcow2_cache_discard(BlockDriverState *bs, Qcow2Cache *c, void *table);
 
 /* qcow2-bitmap.c functions */
 int qcow2_check_bitmaps_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
-- 
2.13.0




[Qemu-devel] [PATCH v5 0/4] Add shrink image for qcow2

2017-07-12 Thread Pavel Butsykin
This patch adds shrinking of the image file for qcow2. This allows us to
reduce the virtual image size and free up disk space without copying the
image. The image can be fragmented; shrinking is done by punching holes in
the image file.
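Hole punching is what lets the file give space back even when the freed clusters sit in the middle of it. The following Linux-only sketch shows the underlying system call on a plain file; it is an illustration, not QEMU's code path (QEMU reaches the file through its block layer's discard handling), and `punch_hole()` is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Deallocate [offset, offset + len) without changing the file size.
 * Linux-specific; falls back to writing zeros if hole punching is not
 * supported. Returns 0 on success or a negative errno value. */
static int punch_hole(int fd, off_t offset, off_t len)
{
    assert(offset >= 0 && len >= 0);
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  offset, len) == 0) {
        return 0;
    }
    if (errno != EOPNOTSUPP && errno != ENOSYS) {
        return -errno;
    }

    /* Fallback: the range reads back as zeros, but no space is freed. */
    char zeros[4096];
    memset(zeros, 0, sizeof(zeros));
    while (len > 0) {
        size_t chunk = len < (off_t)sizeof(zeros) ? (size_t)len : sizeof(zeros);
        ssize_t n = pwrite(fd, zeros, chunk, offset);
        if (n < 0) {
            return -errno;
        }
        if (n == 0) {
            return -EIO;
        }
        offset += n;
        len -= n;
    }
    return 0;
}
```

`FALLOC_FL_PUNCH_HOLE` must be combined with `FALLOC_FL_KEEP_SIZE`, which is also why the virtual layout of the file stays unchanged while `du` reports less usage.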

# ./qemu-img create -f qcow2 image.qcow2 4G
Formatting 'image.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16

# ./qemu-io -c "write -P 0x22 0 1G" image.qcow2
wrote 1073741824/1073741824 bytes at offset 0
1 GiB, 1 ops; 0:00:04.59 (222.886 MiB/sec and 0.2177 ops/sec)

# ./qemu-img resize image.qcow2 512M
warning: qemu-img: Shrinking an image will delete all data beyond the shrunken image's end. Before performing such an operation, make sure there is no important data there.
error: qemu-img: Use the --shrink option to perform a shrink operation.

# ./qemu-img resize --shrink image.qcow2 128M
Image resized.

# ./qemu-img info image.qcow2
image: image.qcow2
file format: qcow2
virtual size: 128M (134217728 bytes)
disk size: 128M
cluster_size: 65536
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false

# du -h image.qcow2 
129M    image.qcow2

Changes from v1:
- add --shrink flag for qemu-img resize
- add qcow2_cache_discard
- simplify qcow2_shrink_l1_table() to reduce the likelihood of image corruption
- add new qemu-iotests for shrinking images

Changes from v2:
- replace qprintf() with error_report() (1)
- rewrite warning messages (1)
- enforce --shrink flag for all formats except raw (1)
- split qcow2_cache_discard() (2)
- minor fixes according to comments (3)
- rewrite the last part of qcow2_shrink_reftable() to avoid
  qcow2_free_clusters() calls inside (3)
- improve test for shrinking image (4)

Changes from v3:
- rebase on Alistair's "Implement a warning_report function" patch set (1)
- spelling fixes (1)
- fix the man page according to the discussion (1)
- add a qcow2_signal_corruption() call in case of image corruption (3)

Changes from v4:
- rebase on Max's block branch (https://github.com/XanClic/qemu/commits/block)

Pavel Butsykin (4):
  qemu-img: add --shrink flag for resize
  qcow2: add qcow2_cache_discard
  qcow2: add shrink image support
  qemu-iotests: add shrinking image test

 block/qcow2-cache.c        |  26 +++
 block/qcow2-cluster.c      |  40 +++
 block/qcow2-refcount.c     | 124 +
 block/qcow2.c              |  43 +---
 block/qcow2.h              |  17 +
 qapi/block-core.json       |   3 +-
 qemu-img-cmds.hx           |   4 +-
 qemu-img.c                 |  23 ++
 qemu-img.texi              |   6 +-
 tests/qemu-iotests/102     |   4 +-
 tests/qemu-iotests/163     | 170 +
 tests/qemu-iotests/163.out |   5 ++
 tests/qemu-iotests/group   |   1 +
 13 files changed, 451 insertions(+), 15 deletions(-)
 create mode 100644 tests/qemu-iotests/163
 create mode 100644 tests/qemu-iotests/163.out

-- 
2.13.0




Re: [Qemu-devel] [PATCH v4 0/4] Add shrink image for qcow2

2017-07-11 Thread Pavel Butsykin

On 11.07.2017 16:14, no-re...@patchew.org wrote:

Hi,

This series failed build test on s390x host. Please find the details below.

Type: series
Subject: [Qemu-devel] [PATCH v4 0/4] Add shrink image for qcow2
Message-id: 20170711124024.1396-1-pbutsy...@virtuozzo.com



This is because the series is rebased on the pending "Implement a
warning_report function" patch set.

Can I exclude this check? (If I know in advance that the test will fail
because of the rebase.)



[Qemu-devel] [PATCH v4 4/4] qemu-iotests: add shrinking image test

2017-07-11 Thread Pavel Butsykin
Signed-off-by: Pavel Butsykin 
---
 tests/qemu-iotests/163     | 170 +
 tests/qemu-iotests/163.out |   5 ++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 176 insertions(+)
 create mode 100644 tests/qemu-iotests/163
 create mode 100644 tests/qemu-iotests/163.out

diff --git a/tests/qemu-iotests/163 b/tests/qemu-iotests/163
new file mode 100644
index 00..403842354e
--- /dev/null
+++ b/tests/qemu-iotests/163
@@ -0,0 +1,170 @@
+#!/usr/bin/env python
+#
+# Tests for shrinking images
+#
+# Copyright (c) 2016-2017 Parallels International GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import os, random, iotests, struct, qcow2
+from iotests import qemu_img, qemu_io, image_size
+
+test_img = os.path.join(iotests.test_dir, 'test.img')
+check_img = os.path.join(iotests.test_dir, 'check.img')
+
+def size_to_int(str):
+suff = ['B', 'K', 'M', 'G', 'T']
+return int(str[:-1]) * 1024**suff.index(str[-1:])
+
+class ShrinkBaseClass(iotests.QMPTestCase):
+image_len = '128M'
+shrink_size = '10M'
+chunk_size = '16M'
+refcount_bits = '16'
+
+def __qcow2_check(self, filename):
+entry_bits = 3
+entry_size = 1 << entry_bits
+l1_mask = 0x00fffe00
+div_roundup = lambda n, d: (n + d - 1) / d
+
+def split_by_n(data, n):
+for x in xrange(0, len(data), n):
+yield struct.unpack('>Q', data[x:x + n])[0] & l1_mask
+
+def check_l1_table(h, l1_data):
+l1_list = list(split_by_n(l1_data, entry_size))
+real_l1_size = div_roundup(h.size,
+   1 << (h.cluster_bits*2 - entry_size))
+used, unused = l1_list[:real_l1_size], l1_list[real_l1_size:]
+
+self.assertTrue(len(used) != 0, "Verifying l1 table content")
+self.assertFalse(any(unused), "Verifying l1 table content")
+
+def check_reftable(fd, h, reftable):
+for offset in split_by_n(reftable, entry_size):
+if offset != 0:
+fd.seek(offset)
+cluster = fd.read(1 << h.cluster_bits)
+self.assertTrue(any(cluster), "Verifying reftable content")
+
+with open(filename, "rb") as fd:
+h = qcow2.QcowHeader(fd)
+
+fd.seek(h.l1_table_offset)
+l1_table = fd.read(h.l1_size << entry_bits)
+
+fd.seek(h.refcount_table_offset)
+reftable = fd.read(h.refcount_table_clusters << h.cluster_bits)
+
+check_l1_table(h, l1_table)
+check_reftable(fd, h, reftable)
+
+def __raw_check(self, filename):
+pass
+
+image_check = {
+'qcow2' : __qcow2_check,
+'raw' : __raw_check
+}
+
+def setUp(self):
+if iotests.imgfmt == 'raw':
+qemu_img('create', '-f', iotests.imgfmt, test_img, self.image_len)
+qemu_img('create', '-f', iotests.imgfmt, check_img,
+ self.shrink_size)
+else:
+qemu_img('create', '-f', iotests.imgfmt,
+ '-o', 'cluster_size=' + self.cluster_size +
+ ',refcount_bits=' + self.refcount_bits,
+ test_img, self.image_len)
+qemu_img('create', '-f', iotests.imgfmt,
+ '-o', 'cluster_size=%s'% self.cluster_size,
+ check_img, self.shrink_size)
+qemu_io('-c', 'write -P 0xff 0 ' + self.shrink_size, check_img)
+
+def tearDown(self):
+os.remove(test_img)
+os.remove(check_img)
+
+def image_verify(self):
+self.assertEqual(image_size(test_img), image_size(check_img),
+ "Verifying image size")
+self.image_check[iotests.imgfmt](self, test_img)
+
+if iotests.imgfmt == 'raw':
+return
+self.assertEqual(qemu_img('check', test_img), 0,
+ "V

[Qemu-devel] [PATCH v4 3/4] qcow2: add shrink image support

2017-07-11 Thread Pavel Butsykin
This patch adds shrinking of the image file for qcow2. This allows us to
reduce the virtual image size and free up disk space without copying the
image. The image can be fragmented; shrinking is done by punching holes in
the image file.
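The less obvious step in `qcow2_shrink_reftable()` is deciding whether a refcount block is unused when it may describe its own cluster: the patch temporarily sets that self-referencing refcount to zero, checks the block with `buffer_is_zero()`, and restores it. A sketch of the same idea, assuming 16-bit refcounts held as host-endian integers in an array (on disk they are big-endian); `refblock_unused()` and `all_zero()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool all_zero(const uint16_t *refcounts, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (refcounts[i] != 0) {
            return false;
        }
    }
    return true;
}

/* A refblock may describe its own cluster; that self-reference alone must
 * not keep it alive. @self_index is the slot describing the refblock's own
 * cluster, or -1 if its cluster is covered by a different refblock. */
static bool refblock_unused(uint16_t *refcounts, size_t n, long self_index)
{
    if (self_index < 0) {
        return all_zero(refcounts, n);
    }
    assert((size_t)self_index < n);
    uint16_t saved = refcounts[self_index];
    refcounts[self_index] = 0;         /* ignore the self-reference */
    bool unused = all_zero(refcounts, n);
    refcounts[self_index] = saved;     /* leave the block unchanged */
    return unused;
}
```

Restoring the saved value matters because the block is still live in the cache; only the reftable entry pointing to it is dropped afterwards.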

Signed-off-by: Pavel Butsykin 
Reviewed-by: Max Reitz 
---
 block/qcow2-cluster.c  |  40 ++
 block/qcow2-refcount.c | 110 +
 block/qcow2.c          |  42 ++-
 block/qcow2.h          |  14 +++
 qapi/block-core.json   |   3 +-
 5 files changed, 198 insertions(+), 11 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 3d341fd9cb..04d48878af 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -32,6 +32,46 @@
 #include "qemu/bswap.h"
 #include "trace.h"
 
+int qcow2_shrink_l1_table(BlockDriverState *bs, uint64_t exact_size)
+{
+BDRVQcow2State *s = bs->opaque;
+int new_l1_size, i, ret;
+
+if (exact_size >= s->l1_size) {
+return 0;
+}
+
+new_l1_size = exact_size;
+
+#ifdef DEBUG_ALLOC2
+fprintf(stderr, "shrink l1_table from %d to %d\n", s->l1_size, new_l1_size);
+#endif
+
+BLKDBG_EVENT(bs->file, BLKDBG_L1_SHRINK_WRITE_TABLE);
+ret = bdrv_pwrite_zeroes(bs->file, s->l1_table_offset +
+   sizeof(uint64_t) * new_l1_size,
+ (s->l1_size - new_l1_size) * sizeof(uint64_t), 0);
+if (ret < 0) {
+return ret;
+}
+
+ret = bdrv_flush(bs->file->bs);
+if (ret < 0) {
+return ret;
+}
+
+BLKDBG_EVENT(bs->file, BLKDBG_L1_SHRINK_FREE_L2_CLUSTERS);
+for (i = s->l1_size - 1; i > new_l1_size - 1; i--) {
+if ((s->l1_table[i] & L1E_OFFSET_MASK) == 0) {
+continue;
+}
+qcow2_free_clusters(bs, s->l1_table[i] & L1E_OFFSET_MASK,
+s->cluster_size, QCOW2_DISCARD_ALWAYS);
+s->l1_table[i] = 0;
+}
+return 0;
+}
+
 int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
 bool exact_size)
 {
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 0141c9cbe7..e52d1698b5 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -29,6 +29,7 @@
 #include "block/qcow2.h"
 #include "qemu/range.h"
 #include "qemu/bswap.h"
+#include "qemu/cutils.h"
 
 static int64_t alloc_clusters_noref(BlockDriverState *bs, uint64_t size);
 static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
@@ -2945,3 +2946,112 @@ done:
 qemu_vfree(new_refblock);
 return ret;
 }
+
+int qcow2_shrink_reftable(BlockDriverState *bs)
+{
+BDRVQcow2State *s = bs->opaque;
+uint64_t *reftable_tmp =
+g_try_malloc(sizeof(uint64_t) * s->refcount_table_size);
+int i, ret;
+
+if (s->refcount_table_size && reftable_tmp == NULL) {
+return -ENOMEM;
+}
+
+for (i = 0; i < s->refcount_table_size; i++) {
+int64_t refblock_offs = s->refcount_table[i] & REFT_OFFSET_MASK;
+void *refblock;
+bool unused_block;
+
+if (refblock_offs == 0) {
+reftable_tmp[i] = 0;
+continue;
+}
+ret = qcow2_cache_get(bs, s->refcount_block_cache, refblock_offs,
+  &refblock);
+if (ret < 0) {
+goto out;
+}
+
+/* the refblock has own reference */
+if (i == offset_to_reftable_index(s, refblock_offs)) {
+uint64_t block_index = (refblock_offs >> s->cluster_bits) &
+   (s->refcount_block_size - 1);
+uint64_t refcount = s->get_refcount(refblock, block_index);
+
+s->set_refcount(refblock, block_index, 0);
+
+unused_block = buffer_is_zero(refblock, s->cluster_size);
+
+s->set_refcount(refblock, block_index, refcount);
+} else {
+unused_block = buffer_is_zero(refblock, s->cluster_size);
+}
+qcow2_cache_put(bs, s->refcount_block_cache, &refblock);
+
+reftable_tmp[i] = unused_block ? 0 : cpu_to_be64(s->refcount_table[i]);
+}
+
+ret = bdrv_pwrite_sync(bs->file, s->refcount_table_offset, reftable_tmp,
+   sizeof(uint64_t) * s->refcount_table_size);
+if (ret < 0) {
+goto out;
+}
+
+for (i = 0; i < s->refcount_table_size; i++) {
+if (s->refcount_table[i] && !reftable_tmp[i]) {
+uint64_t discard_offs = s->refcount_table[i] & REFT_OFFSET_MASK;
+uint64_t refblock_offs = get_refblock_offset(s, discard_offs);
+uint64_t cluster_index = discard_offs >> s->cluster_bits;
+uint32_t block_index = cluster_i

[Qemu-devel] [PATCH v4 0/4] Add shrink image for qcow2

2017-07-11 Thread Pavel Butsykin
This patch adds shrinking of the image file for qcow2. This allows us to
reduce the virtual image size and free up disk space without copying the
image. The image can be fragmented; shrinking is done by punching holes in
the image file.
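The freeing loop in `qcow2_shrink_l1_table()` walks the L1 table from its last entry down to the new size, skips entries without an L2 table, and clears the rest. A sketch of that walk, assuming the qcow2 L1 entry layout (L2 offset in bits 9-55, "copied" flag in bit 63); instead of calling `qcow2_free_clusters()` it records the offsets it would free, and `collect_l1_tail()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mask for the L2 table offset in an L1 entry (bits 9-55); same value as
 * QEMU's L1E_OFFSET_MASK. Bit 63 is the "copied" flag and is masked off. */
#define L1E_OFFSET_MASK UINT64_C(0x00fffffffffffe00)

/* Walk the L1 table from the last entry down to @new_size, record the
 * offset of every allocated L2 table in @freed_out (which must have room
 * for old_size - new_size entries) and clear the entry. Returns the count. */
static size_t collect_l1_tail(uint64_t *l1, size_t old_size, size_t new_size,
                              uint64_t *freed_out)
{
    size_t n = 0;
    assert(new_size <= old_size);
    for (size_t i = old_size; i-- > new_size; ) {
        uint64_t l2_offset = l1[i] & L1E_OFFSET_MASK;
        if (l2_offset == 0) {
            continue;   /* no L2 table allocated for this range */
        }
        freed_out[n++] = l2_offset;   /* the patch frees the cluster here */
        l1[i] = 0;
    }
    return n;
}
```

Using `i-- > new_size` keeps the countdown correct with an unsigned index, including when `new_size` is 0.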

# ./qemu-img create -f qcow2 image.qcow2 4G
Formatting 'image.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16

# ./qemu-io -c "write -P 0x22 0 1G" image.qcow2
wrote 1073741824/1073741824 bytes at offset 0
1 GiB, 1 ops; 0:00:04.59 (222.886 MiB/sec and 0.2177 ops/sec)

# ./qemu-img resize image.qcow2 512M
warning: qemu-img: Shrinking an image will delete all data beyond the shrunken image's end. Before performing such an operation, make sure there is no important data there.
error: qemu-img: Use the --shrink option to perform a shrink operation.

# ./qemu-img resize --shrink image.qcow2 128M
Image resized.

# ./qemu-img info image.qcow2
image: image.qcow2
file format: qcow2
virtual size: 128M (134217728 bytes)
disk size: 128M
cluster_size: 65536
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false

# du -h image.qcow2 
129M    image.qcow2

Changes from v1:
- add --shrink flag for qemu-img resize
- add qcow2_cache_discard
- simplify qcow2_shrink_l1_table() to reduce the likelihood of image corruption
- add new qemu-iotests for shrinking images

Changes from v2:
- replace qprintf() with error_report() (1)
- rewrite warning messages (1)
- enforce --shrink flag for all formats except raw (1)
- split qcow2_cache_discard() (2)
- minor fixes according to comments (3)
- rewrite the last part of qcow2_shrink_reftable() to avoid
  qcow2_free_clusters() calls inside (3)
- improve test for shrinking image (4)

Changes from v3:
- rebase on Alistair's "Implement a warning_report function" patch set (1)
- spelling fixes (1)
- fix the man page according to the discussion (1)
- add a qcow2_signal_corruption() call in case of image corruption (3)

Pavel Butsykin (4):
  qemu-img: add --shrink flag for resize
  qcow2: add qcow2_cache_discard
  qcow2: add shrink image support
  qemu-iotests: add shrinking image test

 block/qcow2-cache.c        |  26 +++
 block/qcow2-cluster.c      |  40 +++
 block/qcow2-refcount.c     | 124 +
 block/qcow2.c              |  42 ---
 block/qcow2.h              |  17 +
 qapi/block-core.json       |   3 +-
 qemu-img-cmds.hx           |   4 +-
 qemu-img.c                 |  23 ++
 qemu-img.texi              |   6 +-
 tests/qemu-iotests/102     |   4 +-
 tests/qemu-iotests/163     | 170 +
 tests/qemu-iotests/163.out |   5 ++
 tests/qemu-iotests/group   |   1 +
 13 files changed, 449 insertions(+), 16 deletions(-)
 create mode 100644 tests/qemu-iotests/163
 create mode 100644 tests/qemu-iotests/163.out

-- 
2.13.0




[Qemu-devel] [PATCH v4 2/4] qcow2: add qcow2_cache_discard

2017-07-11 Thread Pavel Butsykin
Whenever L2/refcount table clusters are discarded from the file, we can
automatically drop the corresponding entries from the table caches. This
reduces the chance of evicting useful cache data and eliminates cached data
that is inconsistent with the data in the file.

Signed-off-by: Pavel Butsykin 
Reviewed-by: Max Reitz 
---
 block/qcow2-cache.c    | 26 ++
 block/qcow2-refcount.c | 14 ++
 block/qcow2.h          |  3 +++
 3 files changed, 43 insertions(+)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index 1d25147392..75746a7f43 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -411,3 +411,29 @@ void qcow2_cache_entry_mark_dirty(BlockDriverState *bs, Qcow2Cache *c,
 assert(c->entries[i].offset != 0);
 c->entries[i].dirty = true;
 }
+
+void *qcow2_cache_is_table_offset(BlockDriverState *bs, Qcow2Cache *c,
+  uint64_t offset)
+{
+int i;
+
+for (i = 0; i < c->size; i++) {
+if (c->entries[i].offset == offset) {
+return qcow2_cache_get_table_addr(bs, c, i);
+}
+}
+return NULL;
+}
+
+void qcow2_cache_discard(BlockDriverState *bs, Qcow2Cache *c, void *table)
+{
+int i = qcow2_cache_get_table_idx(bs, c, table);
+
+assert(c->entries[i].ref == 0);
+
+c->entries[i].offset = 0;
+c->entries[i].lru_counter = 0;
+c->entries[i].dirty = false;
+
+qcow2_cache_table_release(bs, c, i, 1);
+}
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 7c06061aae..0141c9cbe7 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -767,6 +767,20 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
         s->set_refcount(refcount_block, block_index, refcount);
 
         if (refcount == 0 && s->discard_passthrough[type]) {
+            void *table;
+
+            table = qcow2_cache_is_table_offset(bs, s->refcount_block_cache,
+                                                offset);
+            if (table != NULL) {
+                qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
+                qcow2_cache_discard(bs, s->refcount_block_cache, table);
+            }
+
+            table = qcow2_cache_is_table_offset(bs, s->l2_table_cache, offset);
+            if (table != NULL) {
+                qcow2_cache_discard(bs, s->l2_table_cache, table);
+            }
+
             update_refcount_discard(bs, cluster_offset, s->cluster_size);
         }
     }
diff --git a/block/qcow2.h b/block/qcow2.h
index 87b15eb4aa..bf6691dbd0 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -604,5 +604,8 @@ int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
 int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
     void **table);
 void qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
+void *qcow2_cache_is_table_offset(BlockDriverState *bs, Qcow2Cache *c,
+                                  uint64_t offset);
+void qcow2_cache_discard(BlockDriverState *bs, Qcow2Cache *c, void *table);
 
 #endif
-- 
2.13.0
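
The cache helpers in this patch rely on QEMU internals (qcow2_cache_get_table_addr(), qcow2_cache_table_release()), so they cannot run on their own. As a rough standalone sketch, assuming a hypothetical fixed-size cache (Cache, CacheEntry, cache_lookup() and cache_discard() are illustrative names, not QEMU APIs, and the lookup returns a slot index rather than a table pointer), the lookup-then-discard logic amounts to:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of a qcow2 metadata cache.  Each entry
 * caches one table cluster, identified by its offset in the image file;
 * offset 0 marks an empty slot (cluster 0 is the header, never a table). */
#define CACHE_SIZE 4

typedef struct CacheEntry {
    uint64_t offset;
    uint64_t lru_counter;
    int ref;        /* number of users currently holding the table */
    bool dirty;     /* contents must be written back before eviction */
} CacheEntry;

typedef struct Cache {
    CacheEntry entries[CACHE_SIZE];
} Cache;

/* Return the slot caching the table at @offset, or -1 if it is not cached. */
static int cache_lookup(const Cache *c, uint64_t offset)
{
    for (int i = 0; i < CACHE_SIZE; i++) {
        if (c->entries[i].offset == offset) {
            return i;
        }
    }
    return -1;
}

/* Forget a cached table whose cluster has just been freed in the file.
 * Clearing the dirty flag ensures stale contents are never written back
 * over a cluster that may later be reused for something else. */
static void cache_discard(Cache *c, int i)
{
    assert(i >= 0 && i < CACHE_SIZE);
    assert(c->entries[i].ref == 0);   /* nobody may still be using it */

    c->entries[i].offset = 0;
    c->entries[i].lru_counter = 0;
    c->entries[i].dirty = false;
}
```

Resetting lru_counter as well makes the freed slot the first candidate for reuse, which is the point of dropping the entry instead of waiting for normal LRU eviction.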




[Qemu-devel] [PATCH v4 1/4] qemu-img: add --shrink flag for resize

2017-07-11 Thread Pavel Butsykin
The flag is an additional precaution against data loss. In the future, shrinking
without this flag may be blocked for all formats, but for now we need to
maintain compatibility with raw.

Signed-off-by: Pavel Butsykin 
Reviewed-by: Max Reitz 
---
 qemu-img-cmds.hx   |  4 ++--
 qemu-img.c | 23 +++
 qemu-img.texi  |  6 +-
 tests/qemu-iotests/102 |  4 ++--
 4 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index a39fcdba71..3b2eab9d20 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -76,9 +76,9 @@ STEXI
 ETEXI
 
 DEF("resize", img_resize,
-    "resize [--object objectdef] [--image-opts] [-q] filename [+ | -]size")
+    "resize [--object objectdef] [--image-opts] [-q] [--shrink] filename [+ | -]size")
 STEXI
-@item resize [--object @var{objectdef}] [--image-opts] [-q] @var{filename} [+ | -]@var{size}
+@item resize [--object @var{objectdef}] [--image-opts] [-q] [--shrink] @var{filename} [+ | -]@var{size}
 ETEXI
 
 DEF("amend", img_amend,
diff --git a/qemu-img.c b/qemu-img.c
index 91ad6bebbf..6c28dc439d 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -61,6 +61,7 @@ enum {
     OPTION_FLUSH_INTERVAL = 261,
     OPTION_NO_DRAIN = 262,
     OPTION_TARGET_IMAGE_OPTS = 263,
+    OPTION_SHRINK = 264,
 };
 
 typedef enum OutputFormat {
@@ -3458,6 +3459,7 @@ static int img_resize(int argc, char **argv)
         },
     };
     bool image_opts = false;
+    bool shrink = false;
 
     /* Remove size from argv manually so that negative numbers are not treated
      * as options by getopt. */
@@ -3475,6 +3477,7 @@ static int img_resize(int argc, char **argv)
             {"help", no_argument, 0, 'h'},
             {"object", required_argument, 0, OPTION_OBJECT},
             {"image-opts", no_argument, 0, OPTION_IMAGE_OPTS},
+            {"shrink", no_argument, 0, OPTION_SHRINK},
             {0, 0, 0, 0}
         };
         c = getopt_long(argc, argv, ":f:hq",
@@ -3509,6 +3512,9 @@ static int img_resize(int argc, char **argv)
         case OPTION_IMAGE_OPTS:
             image_opts = true;
             break;
+        case OPTION_SHRINK:
+            shrink = true;
+            break;
         }
     }
     if (optind != argc - 1) {
@@ -3568,6 +3574,23 @@ static int img_resize(int argc, char **argv)
         goto out;
     }
 
+    if (total_size < blk_getlength(blk) && !shrink) {
+        warn_report("Shrinking an image will delete all data beyond the "
+                    "shrunken image's end. Before performing such an "
+                    "operation, make sure there is no important data there.");
+
+        if (g_strcmp0(bdrv_get_format_name(blk_bs(blk)), "raw") != 0) {
+            error_report(
+                "Use the --shrink option to perform a shrink operation.");
+            ret = -1;
+            goto out;
+        } else {
+            warn_report("Using the --shrink option will suppress this message. "
+                        "Note that future versions of qemu-img may refuse to "
+                        "shrink images without this option.");
+        }
+    }
+
     ret = blk_truncate(blk, total_size, &err);
     if (!ret) {
         qprintf(quiet, "Image resized.\n");
diff --git a/qemu-img.texi b/qemu-img.texi
index 5b925ecf41..79ba802481 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -499,7 +499,7 @@ qemu-img rebase -b base.img diff.qcow2
 At this point, @code{modified.img} can be discarded, since
 @code{base.img + diff.qcow2} contains the same information.
 
-@item resize @var{filename} [+ | -]@var{size}
+@item resize [--shrink] @var{filename} [+ | -]@var{size}
 
 Change the disk image as if it had been created with @var{size}.
 
@@ -507,6 +507,10 @@ Before using this command to shrink a disk image, you MUST use file system and
 partitioning tools inside the VM to reduce allocated file systems and partition
 sizes accordingly.  Failure to do so will result in data loss!
 
+When shrinking images, the @code{--shrink} option must be given. This informs
+qemu-img that the user acknowledges all loss of data beyond the truncated
+image's end.
+
 After using this command to grow a disk image, you must use file system and
 partitioning tools inside the VM to actually begin using the new space on the
 device.
diff --git a/tests/qemu-iotests/102 b/tests/qemu-iotests/102
index 87db1bb1bf..d7ad8d9840 100755
--- a/tests/qemu-iotests/102
+++ b/tests/qemu-iotests/102
@@ -54,7 +54,7 @@ _make_test_img $IMG_SIZE
 $QEMU_IO -c 'write 0 64k' "$TEST_IMG" | _filter_qemu_io
 # Remove data cluster from image (first cluster: image header, second: reftable,
 # third: refblock, fourth: L1 table, fifth: L2 table)
-$QEMU_IMG resize -f raw "$TEST_IMG" $((5 * 64 *

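
The pre-check that this patch adds to img_resize() boils down to a three-way decision. A minimal restatement in plain C, under stated assumptions: resize_check() and ResizeCheck are illustrative names, not qemu-img code, and strcmp() stands in for the NULL-safe g_strcmp0(), so the format name is assumed to be non-NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical outcome codes for the resize pre-check. */
typedef enum {
    RESIZE_OK,      /* growing, same size, or --shrink was given */
    RESIZE_WARN,    /* raw shrunk without --shrink: warn but proceed */
    RESIZE_REFUSE,  /* any other format shrunk without --shrink: error */
} ResizeCheck;

static ResizeCheck resize_check(int64_t current_size, int64_t total_size,
                                const char *format, bool shrink)
{
    assert(format != NULL);   /* strcmp(), unlike g_strcmp0(), needs this */

    if (total_size >= current_size || shrink) {
        return RESIZE_OK;
    }
    /* Only raw keeps the old permissive behaviour, for compatibility. */
    return strcmp(format, "raw") == 0 ? RESIZE_WARN : RESIZE_REFUSE;
}
```

Keeping raw on the warning path is what preserves compatibility for existing scripts; every other format fails closed.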
Re: [Qemu-devel] [PATCH] virtio-serial: add enable_backend callback

2017-07-10 Thread Pavel Butsykin

On 10.07.2017 17:13, Laurent Vivier wrote:

On 07/07/2017 16:21, Pavel Butsykin wrote:

We should guarantee that RAM is not modified while the VM is stopped;
otherwise this can have negative consequences during post-copy migration.
In the RUN_STATE_FINISH_MIGRATE step, RAM on the source side is expected to
remain unmodified, since modifying it could lead to an inconsistent VM state
on the destination side. RAM access during postcopy-ram migration with the
release-ram capability enabled can also lead to undesirable consequences.

Let's add an enable_backend() callback to avoid undesirable virtqueue changes
in guest memory.

Signed-off-by: Pavel Butsykin 
---
  hw/char/virtio-console.c  | 21 +
  hw/char/virtio-serial-bus.c   |  7 +++
  include/hw/virtio/virtio-serial.h |  3 +++
  3 files changed, 31 insertions(+)

diff --git a/hw/char/virtio-console.c b/hw/char/virtio-console.c
index 0cb1668c8a..b55905892e 100644
--- a/hw/char/virtio-console.c
+++ b/hw/char/virtio-console.c
@@ -163,6 +163,26 @@ static void chr_event(void *opaque, int event)
     }
 }
 
+static void virtconsole_enable_backend(VirtIOSerialPort *port, bool enable)
+{
+    VirtConsole *vcon = VIRTIO_CONSOLE(port);
+
+    if (!qemu_chr_fe_get_driver(&vcon->chr)) {
+        return;
+    }
+
+    if (enable) {
+        VirtIOSerialPortClass *k = VIRTIO_SERIAL_PORT_GET_CLASS(port);
+
+        qemu_chr_fe_set_handlers(&vcon->chr, chr_can_read, chr_read,
+                                 k->is_console ? NULL : chr_event,
+                                 vcon, NULL, false);
+    } else {
+        qemu_chr_fe_set_handlers(&vcon->chr, NULL, NULL,
+                                 NULL, NULL, NULL, false);
+    }
+}


I think you could also refactor virtconsole_realize() to call this new
function.


 static void virtconsole_realize(DeviceState *dev, Error **errp)
 {
     VirtIOSerialPort *port = VIRTIO_SERIAL_PORT(dev);
@@ -233,6 +253,7 @@ static void virtserialport_class_init(ObjectClass *klass, void *data)
     k->unrealize = virtconsole_unrealize;
     k->have_data = flush_buf;
     k->set_guest_connected = set_guest_connected;
+    k->enable_backend = virtconsole_enable_backend;


Why don't you register a vm_state change handler to change the state of
the virtconsole according to the state of the machine, instead of adding
a new function to VirtIOSerialPortClass?

See a23a6d1 ("virtio-rng: stop virtqueue while the CPU is stopped")


I tried to follow the existing approach; look at vhost_net or
virtio-input. It follows a hierarchical structure: the root device
notifies the virtio devices at the levels above it
(virtio-device -> virtio-bus -> virtio_*_device). This ensures that
device state is modified consistently: first the bus, then the devices
on that bus. I'm not sure that following this order during device state
changes is absolutely necessary, but it looks nicer than registering
notifications for each device.

That said, this is just a guess; I have no clear idea of how to do it
right, so I'm waiting for comments :)


Thanks,
Laurent
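
The two designs discussed above differ mainly in who triggers the change: a per-device vm_state handler, or the bus forwarding the run state to its ports. The bus-driven variant from the patch can be modeled standalone as below. Port, PortClass, console_enable_backend() and bus_propagate_run_state() are hypothetical stand-ins for VirtIOSerialPort, VirtIOSerialPortClass, virtconsole_enable_backend() and the loop added to set_status(); installing the chardev handlers is reduced to a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Port Port;

/* Hypothetical stand-in for VirtIOSerialPortClass: the hook is optional. */
typedef struct PortClass {
    void (*enable_backend)(Port *port, bool enable);
} PortClass;

/* Hypothetical stand-in for VirtIOSerialPort. */
struct Port {
    const PortClass *klass;
    bool handlers_installed;  /* models the qemu_chr_fe_set_handlers() state */
};

/* Models virtconsole_enable_backend(): install or remove the handlers. */
static void console_enable_backend(Port *port, bool enable)
{
    port->handlers_installed = enable;
}

/* Models the loop added to set_status(): forward the VM run state to
 * every port on the bus that implements the hook. */
static void bus_propagate_run_state(Port *ports, size_t nports, bool vm_running)
{
    for (size_t i = 0; i < nports; i++) {
        const PortClass *k = ports[i].klass;

        assert(k != NULL);
        if (k->enable_backend) {
            k->enable_backend(&ports[i], vm_running);
        }
    }
}
```

A port class that leaves the hook NULL is simply skipped, mirroring the `if (vsc->enable_backend)` check in set_status().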





[Qemu-devel] [PATCH v3 4/4] qemu-iotests: add shrinking image test

2017-07-07 Thread Pavel Butsykin
Signed-off-by: Pavel Butsykin 
---
 tests/qemu-iotests/163 | 170 +
 tests/qemu-iotests/163.out |   5 ++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 176 insertions(+)
 create mode 100644 tests/qemu-iotests/163
 create mode 100644 tests/qemu-iotests/163.out

diff --git a/tests/qemu-iotests/163 b/tests/qemu-iotests/163
new file mode 100644
index 00..403842354e
--- /dev/null
+++ b/tests/qemu-iotests/163
@@ -0,0 +1,170 @@
+#!/usr/bin/env python
+#
+# Tests for shrinking images
+#
+# Copyright (c) 2016-2017 Parallels International GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import os, random, iotests, struct, qcow2
+from iotests import qemu_img, qemu_io, image_size
+
+test_img = os.path.join(iotests.test_dir, 'test.img')
+check_img = os.path.join(iotests.test_dir, 'check.img')
+
+def size_to_int(str):
+    suff = ['B', 'K', 'M', 'G', 'T']
+    return int(str[:-1]) * 1024**suff.index(str[-1:])
+
+class ShrinkBaseClass(iotests.QMPTestCase):
+    image_len = '128M'
+    shrink_size = '10M'
+    chunk_size = '16M'
+    refcount_bits = '16'
+
+    def __qcow2_check(self, filename):
+        entry_bits = 3
+        entry_size = 1 << entry_bits
+        l1_mask = 0x00fffe00
+        div_roundup = lambda n, d: (n + d - 1) / d
+
+        def split_by_n(data, n):
+            for x in xrange(0, len(data), n):
+                yield struct.unpack('>Q', data[x:x + n])[0] & l1_mask
+
+        def check_l1_table(h, l1_data):
+            l1_list = list(split_by_n(l1_data, entry_size))
+            real_l1_size = div_roundup(h.size,
+                                       1 << (h.cluster_bits*2 - entry_bits))
+            used, unused = l1_list[:real_l1_size], l1_list[real_l1_size:]
+
+            self.assertTrue(len(used) != 0, "Verifying l1 table content")
+            self.assertFalse(any(unused), "Verifying l1 table content")
+
+        def check_reftable(fd, h, reftable):
+            for offset in split_by_n(reftable, entry_size):
+                if offset != 0:
+                    fd.seek(offset)
+                    cluster = fd.read(1 << h.cluster_bits)
+                    self.assertTrue(any(cluster), "Verifying reftable content")
+
+        with open(filename, "rb") as fd:
+            h = qcow2.QcowHeader(fd)
+
+            fd.seek(h.l1_table_offset)
+            l1_table = fd.read(h.l1_size << entry_bits)
+
+            fd.seek(h.refcount_table_offset)
+            reftable = fd.read(h.refcount_table_clusters << h.cluster_bits)
+
+            check_l1_table(h, l1_table)
+            check_reftable(fd, h, reftable)
+
+    def __raw_check(self, filename):
+        pass
+
+    image_check = {
+        'qcow2' : __qcow2_check,
+        'raw' : __raw_check
+    }
+
+    def setUp(self):
+        if iotests.imgfmt == 'raw':
+            qemu_img('create', '-f', iotests.imgfmt, test_img, self.image_len)
+            qemu_img('create', '-f', iotests.imgfmt, check_img,
+                     self.shrink_size)
+        else:
+            qemu_img('create', '-f', iotests.imgfmt,
+                     '-o', 'cluster_size=' + self.cluster_size +
+                     ',refcount_bits=' + self.refcount_bits,
+                     test_img, self.image_len)
+            qemu_img('create', '-f', iotests.imgfmt,
+                     '-o', 'cluster_size=%s'% self.cluster_size,
+                     check_img, self.shrink_size)
+        qemu_io('-c', 'write -P 0xff 0 ' + self.shrink_size, check_img)
+
+    def tearDown(self):
+        os.remove(test_img)
+        os.remove(check_img)
+
+    def image_verify(self):
+        self.assertEqual(image_size(test_img), image_size(check_img),
+                         "Verifying image size")
+        self.image_check[iotests.imgfmt](self, test_img)
+
+        if iotests.imgfmt == 'raw':
+            return
+        self.assertEqual(qemu_img('check', test_img), 0,
+                         "V

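
check_l1_table() in the test above derives how many L1 entries a given virtual size still needs. Assuming the standard qcow2 layout (an L2 table fills exactly one cluster with 8-byte entries, each mapping one data cluster), the arithmetic can be sketched in C as follows; l2_coverage() and l1_entries_needed() are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of guest data mapped by one L2 table: the table holds
 * cluster_size / 8 entries and each entry maps one cluster, so it
 * covers 2^(2 * cluster_bits - 3) bytes. */
static uint64_t l2_coverage(unsigned cluster_bits)
{
    assert(cluster_bits >= 9 && cluster_bits <= 21);  /* 512 B .. 2 MiB */
    return UINT64_C(1) << (2 * cluster_bits - 3);
}

/* Number of L1 entries needed to map @virtual_size bytes (rounded up). */
static uint64_t l1_entries_needed(uint64_t virtual_size, unsigned cluster_bits)
{
    uint64_t cov = l2_coverage(cluster_bits);

    return (virtual_size + cov - 1) / cov;
}
```

With the default 64 KiB clusters one L2 table covers 512 MiB, so a shrink to 10M keeps a single L1 entry; with 512-byte clusters the same size needs 320 entries.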
[Qemu-devel] [PATCH v3 2/4] qcow2: add qcow2_cache_discard

2017-07-07 Thread Pavel Butsykin
Whenever L2/refcount table clusters are discarded from the file, we can
automatically drop the now-unnecessary contents of the cache tables. This
reduces the chance of evicting useful cache data and eliminates cache contents
that are inconsistent with the data in the file.

Signed-off-by: Pavel Butsykin 
---
 block/qcow2-cache.c| 26 ++
 block/qcow2-refcount.c | 14 ++
 block/qcow2.h  |  3 +++
 3 files changed, 43 insertions(+)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index 1d25147392..75746a7f43 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -411,3 +411,29 @@ void qcow2_cache_entry_mark_dirty(BlockDriverState *bs, Qcow2Cache *c,
     assert(c->entries[i].offset != 0);
     c->entries[i].dirty = true;
 }
+
+void *qcow2_cache_is_table_offset(BlockDriverState *bs, Qcow2Cache *c,
+                                  uint64_t offset)
+{
+    int i;
+
+    for (i = 0; i < c->size; i++) {
+        if (c->entries[i].offset == offset) {
+            return qcow2_cache_get_table_addr(bs, c, i);
+        }
+    }
+    return NULL;
+}
+
+void qcow2_cache_discard(BlockDriverState *bs, Qcow2Cache *c, void *table)
+{
+    int i = qcow2_cache_get_table_idx(bs, c, table);
+
+    assert(c->entries[i].ref == 0);
+
+    c->entries[i].offset = 0;
+    c->entries[i].lru_counter = 0;
+    c->entries[i].dirty = false;
+
+    qcow2_cache_table_release(bs, c, i, 1);
+}
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 7c06061aae..0141c9cbe7 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -767,6 +767,20 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
         s->set_refcount(refcount_block, block_index, refcount);
 
         if (refcount == 0 && s->discard_passthrough[type]) {
+            void *table;
+
+            table = qcow2_cache_is_table_offset(bs, s->refcount_block_cache,
+                                                offset);
+            if (table != NULL) {
+                qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
+                qcow2_cache_discard(bs, s->refcount_block_cache, table);
+            }
+
+            table = qcow2_cache_is_table_offset(bs, s->l2_table_cache, offset);
+            if (table != NULL) {
+                qcow2_cache_discard(bs, s->l2_table_cache, table);
+            }
+
             update_refcount_discard(bs, cluster_offset, s->cluster_size);
         }
     }
diff --git a/block/qcow2.h b/block/qcow2.h
index 87b15eb4aa..bf6691dbd0 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -604,5 +604,8 @@ int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
 int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
     void **table);
 void qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
+void *qcow2_cache_is_table_offset(BlockDriverState *bs, Qcow2Cache *c,
+                                  uint64_t offset);
+void qcow2_cache_discard(BlockDriverState *bs, Qcow2Cache *c, void *table);
 
 #endif
-- 
2.13.0




[Qemu-devel] [PATCH v3 3/4] qcow2: add shrink image support

2017-07-07 Thread Pavel Butsykin
This patch adds shrinking of qcow2 image files. It allows reducing the virtual
image size and freeing up disk space without copying the image. The image may
be fragmented; shrinking is done by punching holes in the image file.

Signed-off-by: Pavel Butsykin 
---
 block/qcow2-cluster.c  |  40 +++
 block/qcow2-refcount.c | 103 +
 block/qcow2.c  |  42 +++-
 block/qcow2.h  |  14 +++
 qapi/block-core.json   |   3 +-
 5 files changed, 191 insertions(+), 11 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 3d341fd9cb..04d48878af 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -32,6 +32,46 @@
 #include "qemu/bswap.h"
 #include "trace.h"
 
+int qcow2_shrink_l1_table(BlockDriverState *bs, uint64_t exact_size)
+{
+    BDRVQcow2State *s = bs->opaque;
+    int new_l1_size, i, ret;
+
+    if (exact_size >= s->l1_size) {
+        return 0;
+    }
+
+    new_l1_size = exact_size;
+
+#ifdef DEBUG_ALLOC2
+    fprintf(stderr, "shrink l1_table from %d to %d\n", s->l1_size, new_l1_size);
+#endif
+
+    BLKDBG_EVENT(bs->file, BLKDBG_L1_SHRINK_WRITE_TABLE);
+    ret = bdrv_pwrite_zeroes(bs->file, s->l1_table_offset +
+                                       sizeof(uint64_t) * new_l1_size,
+                             (s->l1_size - new_l1_size) * sizeof(uint64_t), 0);
+    if (ret < 0) {
+        return ret;
+    }
+
+    ret = bdrv_flush(bs->file->bs);
+    if (ret < 0) {
+        return ret;
+    }
+
+    BLKDBG_EVENT(bs->file, BLKDBG_L1_SHRINK_FREE_L2_CLUSTERS);
+    for (i = s->l1_size - 1; i > new_l1_size - 1; i--) {
+        if ((s->l1_table[i] & L1E_OFFSET_MASK) == 0) {
+            continue;
+        }
+        qcow2_free_clusters(bs, s->l1_table[i] & L1E_OFFSET_MASK,
+                            s->cluster_size, QCOW2_DISCARD_ALWAYS);
+        s->l1_table[i] = 0;
+    }
+    return 0;
+}
+
 int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
                         bool exact_size)
 {
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 0141c9cbe7..443fca6a98 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -29,6 +29,7 @@
 #include "block/qcow2.h"
 #include "qemu/range.h"
 #include "qemu/bswap.h"
+#include "qemu/cutils.h"
 
 static int64_t alloc_clusters_noref(BlockDriverState *bs, uint64_t size);
 static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
@@ -2945,3 +2946,105 @@ done:
     qemu_vfree(new_refblock);
     return ret;
 }
+
+int qcow2_shrink_reftable(BlockDriverState *bs)
+{
+    BDRVQcow2State *s = bs->opaque;
+    uint64_t *reftable_tmp =
+        g_try_malloc(sizeof(uint64_t) * s->refcount_table_size);
+    int i, ret;
+
+    if (s->refcount_table_size && reftable_tmp == NULL) {
+        return -ENOMEM;
+    }
+
+    for (i = 0; i < s->refcount_table_size; i++) {
+        int64_t refblock_offs = s->refcount_table[i] & REFT_OFFSET_MASK;
+        void *refblock;
+        bool unused_block;
+
+        if (refblock_offs == 0) {
+            reftable_tmp[i] = 0;
+            continue;
+        }
+        ret = qcow2_cache_get(bs, s->refcount_block_cache, refblock_offs,
+                              &refblock);
+        if (ret < 0) {
+            goto out;
+        }
+
+        /* the refblock has own reference */
+        if (i == offset_to_reftable_index(s, refblock_offs)) {
+            uint64_t block_index = (refblock_offs >> s->cluster_bits) &
+                                   (s->refcount_block_size - 1);
+            uint64_t refcount = s->get_refcount(refblock, block_index);
+
+            s->set_refcount(refblock, block_index, 0);
+
+            unused_block = buffer_is_zero(refblock, s->cluster_size);
+
+            s->set_refcount(refblock, block_index, refcount);
+        } else {
+            unused_block = buffer_is_zero(refblock, s->cluster_size);
+        }
+        qcow2_cache_put(bs, s->refcount_block_cache, &refblock);
+
+        reftable_tmp[i] = unused_block ? 0 : cpu_to_be64(s->refcount_table[i]);
+    }
+
+    ret = bdrv_pwrite_sync(bs->file, s->refcount_table_offset, reftable_tmp,
+                           sizeof(uint64_t) * s->refcount_table_size);
+    if (ret < 0) {
+        goto out;
+    }
+
+    for (i = 0; i < s->refcount_table_size; i++) {
+        if (s->refcount_table[i] && !reftable_tmp[i]) {
+            uint64_t discard_offs = s->refcount_table[i] & REFT_OFFSET_MASK;
+            uint64_t refblock_offs = get_refblock_offset(s, discard_offs);
+            uint64_t cluster_index = discard_offs >> s->cluster_bits;
+            uint32_t block_index = cluster_index & (s->refcoun

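
The unused-refblock test in qcow2_shrink_reftable() above temporarily clears the refblock's own refcount, calls buffer_is_zero() and then restores the value. A standalone sketch of the same predicate, assuming refcount_bits=16 so that entries can be modeled as uint16_t (refblock_unused() is a hypothetical name; real refblocks store big-endian values, which does not matter for a zero test), skips the self entry instead of mutating the block:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if a refblock can be dropped: every refcount it holds is
 * zero, except possibly the refblock's own entry when it describes the
 * cluster it lives in (a self-referencing refblock). */
static bool refblock_unused(const uint16_t *refblock, size_t nentries,
                            bool self_referencing, size_t self_index)
{
    assert(!self_referencing || self_index < nentries);

    for (size_t i = 0; i < nentries; i++) {
        if (self_referencing && i == self_index) {
            continue;   /* the refblock's reference to itself does not count */
        }
        if (refblock[i] != 0) {
            return false;
        }
    }
    return true;
}
```

Skipping the entry avoids writing to a buffer that lives in the shared refcount-block cache, at the cost of not being able to reuse an optimized helper like buffer_is_zero().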
[Qemu-devel] [PATCH v3 1/4] qemu-img: add --shrink flag for resize

2017-07-07 Thread Pavel Butsykin
The flag is an additional precaution against data loss. In the future, shrinking
without this flag may be blocked for all formats, but for now we need to
maintain compatibility with raw.

Signed-off-by: Pavel Butsykin 
---
 qemu-img-cmds.hx   |  4 ++--
 qemu-img.c | 23 +++
 qemu-img.texi  |  7 ++-
 tests/qemu-iotests/102 |  4 ++--
 4 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index a39fcdba71..3b2eab9d20 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -76,9 +76,9 @@ STEXI
 ETEXI
 
 DEF("resize", img_resize,
-    "resize [--object objectdef] [--image-opts] [-q] filename [+ | -]size")
+    "resize [--object objectdef] [--image-opts] [-q] [--shrink] filename [+ | -]size")
 STEXI
-@item resize [--object @var{objectdef}] [--image-opts] [-q] @var{filename} [+ | -]@var{size}
+@item resize [--object @var{objectdef}] [--image-opts] [-q] [--shrink] @var{filename} [+ | -]@var{size}
 ETEXI
 
 DEF("amend", img_amend,
diff --git a/qemu-img.c b/qemu-img.c
index 91ad6bebbf..9773e835e4 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -61,6 +61,7 @@ enum {
     OPTION_FLUSH_INTERVAL = 261,
     OPTION_NO_DRAIN = 262,
     OPTION_TARGET_IMAGE_OPTS = 263,
+    OPTION_SHRINK = 264,
 };
 
 typedef enum OutputFormat {
@@ -3458,6 +3459,7 @@ static int img_resize(int argc, char **argv)
         },
     };
     bool image_opts = false;
+    bool shrink = false;
 
     /* Remove size from argv manually so that negative numbers are not treated
      * as options by getopt. */
@@ -3475,6 +3477,7 @@ static int img_resize(int argc, char **argv)
             {"help", no_argument, 0, 'h'},
             {"object", required_argument, 0, OPTION_OBJECT},
             {"image-opts", no_argument, 0, OPTION_IMAGE_OPTS},
+            {"shrink", no_argument, 0, OPTION_SHRINK},
             {0, 0, 0, 0}
         };
         c = getopt_long(argc, argv, ":f:hq",
@@ -3509,6 +3512,9 @@ static int img_resize(int argc, char **argv)
         case OPTION_IMAGE_OPTS:
             image_opts = true;
             break;
+        case OPTION_SHRINK:
+            shrink = true;
+            break;
         }
     }
     if (optind != argc - 1) {
@@ -3568,6 +3574,23 @@ static int img_resize(int argc, char **argv)
         goto out;
     }
 
+    if (total_size < blk_getlength(blk) && !shrink) {
+        error_report("Warning: Shrinking an image will delete all data beyond"
+                     "the shrunken image's end. Before performing such an"
+                     "operation, make sure there is no important data there.");
+
+        if (g_strcmp0(bdrv_get_format_name(blk_bs(blk)), "raw") != 0) {
+            error_report(
+                "Use the --shrink option to perform a shrink operation.");
+            ret = -1;
+            goto out;
+        } else {
+            error_report("Using the --shrink option will suppress this message."
+                         "Note that future versions of qemu-img may refuse to "
+                         "shrink images without this option!");
+        }
+    }
+
     ret = blk_truncate(blk, total_size, &err);
     if (!ret) {
         qprintf(quiet, "Image resized.\n");
diff --git a/qemu-img.texi b/qemu-img.texi
index 5b925ecf41..6324abef48 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -499,7 +499,7 @@ qemu-img rebase -b base.img diff.qcow2
 At this point, @code{modified.img} can be discarded, since
 @code{base.img + diff.qcow2} contains the same information.
 
-@item resize @var{filename} [+ | -]@var{size}
+@item resize [--shrink] @var{filename} [+ | -]@var{size}
 
 Change the disk image as if it had been created with @var{size}.
 
@@ -507,6 +507,11 @@ Before using this command to shrink a disk image, you MUST use file system and
 partitioning tools inside the VM to reduce allocated file systems and partition
 sizes accordingly.  Failure to do so will result in data loss!
 
+@code{--shrink} informs qemu-img that the user is certain about wanting
+to shrink an image and is aware that any data beyond the truncated
+image's end will be lost. Trying to shrink an image without this option
+results in a warning; future versions may make it an error.
+
 After using this command to grow a disk image, you must use file system and
 partitioning tools inside the VM to actually begin using the new space on the
 device.
diff --git a/tests/qemu-iotests/102 b/tests/qemu-iotests/102
index 87db1bb1bf..d7ad8d9840 100755
--- a/tests/qemu-iotests/102
+++ b/tests/qemu-iotests/102
@@ -54,7 +54,7 @@ _make_test_img $IMG_SIZE
 $QEMU_IO -c 'write 0 64k' "$TEST_IMG" | _filter_qemu_io
 # Remove data cluster from image (first cluster: image header, second: reftable,
 # third: refblock, fourt

[Qemu-devel] [PATCH v3 0/4] Add shrink image for qcow2

2017-07-07 Thread Pavel Butsykin
This patch series adds shrinking of qcow2 image files. It allows reducing the
virtual image size and freeing up disk space without copying the image. The
image may be fragmented; shrinking is done by punching holes in the image file.

# ./qemu-img create -f qcow2 image.qcow2 4G
Formatting 'image.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16

# ./qemu-io -c "write -P 0x22 0 1G" image.qcow2
wrote 1073741824/1073741824 bytes at offset 0
1 GiB, 1 ops; 0:00:04.59 (222.886 MiB/sec and 0.2177 ops/sec)

# ./qemu-img resize image.qcow2 512M
qemu-img: Warning: Shrinking an image will delete all data beyondthe shrunken image's end. Before performing such anoperation, make sure there is no important data there.
qemu-img: Use the --shrink option to perform a shrink operation.

# ./qemu-img resize --shrink image.qcow2 128M
Image resized.

# ./qemu-img info image.qcow2
image: image.qcow2
file format: qcow2
virtual size: 128M (134217728 bytes)
disk size: 128M
cluster_size: 65536
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false

# du -h image.qcow2
129M    image.qcow2

Changes from v1:
- add --shrink flag for qemu-img resize
- add qcow2_cache_discard
- simplify qcow2_shrink_l1_table() to reduce the likelihood of image corruption
- add new qemu-iotests for shrinking images

Changes from v2:
- replace qprintf() with error_report() (1)
- rewrite warning messages (1)
- enforce the --shrink flag for all formats except raw (1)
- split qcow2_cache_discard() (2)
- minor fixes based on review comments (3)
- rewrite the last part of qcow2_shrink_reftable() to avoid
  qcow2_free_clusters() calls inside (3)
- improve the test for shrinking images (4)

Pavel Butsykin (4):
  qemu-img: add --shrink flag for resize
  qcow2: add qcow2_cache_discard
  qcow2: add shrink image support
  qemu-iotests: add shrinking image test

 block/qcow2-cache.c|  26 +++
 block/qcow2-cluster.c  |  40 +++
 block/qcow2-refcount.c | 117 +++
 block/qcow2.c  |  42 ---
 block/qcow2.h  |  17 +
 qapi/block-core.json   |   3 +-
 qemu-img-cmds.hx   |   4 +-
 qemu-img.c |  23 ++
 qemu-img.texi  |   7 +-
 tests/qemu-iotests/102 |   4 +-
 tests/qemu-iotests/163 | 170 +
 tests/qemu-iotests/163.out |   5 ++
 tests/qemu-iotests/group   |   1 +
 13 files changed, 443 insertions(+), 16 deletions(-)
 create mode 100644 tests/qemu-iotests/163
 create mode 100644 tests/qemu-iotests/163.out

-- 
2.13.0




[Qemu-devel] [PATCH] virtio-serial: add enable_backend callback

2017-07-07 Thread Pavel Butsykin
We should guarantee that RAM is not modified while the VM is stopped;
otherwise this can have negative consequences during post-copy migration.
In the RUN_STATE_FINISH_MIGRATE step, RAM on the source side is expected to
remain unmodified, since modifying it could lead to an inconsistent VM state
on the destination side. RAM access during postcopy-ram migration with the
release-ram capability enabled can also lead to undesirable consequences.

Let's add an enable_backend() callback to avoid undesirable virtqueue changes
in guest memory.

Signed-off-by: Pavel Butsykin 
---
 hw/char/virtio-console.c  | 21 +
 hw/char/virtio-serial-bus.c   |  7 +++
 include/hw/virtio/virtio-serial.h |  3 +++
 3 files changed, 31 insertions(+)

diff --git a/hw/char/virtio-console.c b/hw/char/virtio-console.c
index 0cb1668c8a..b55905892e 100644
--- a/hw/char/virtio-console.c
+++ b/hw/char/virtio-console.c
@@ -163,6 +163,26 @@ static void chr_event(void *opaque, int event)
     }
 }
 
+static void virtconsole_enable_backend(VirtIOSerialPort *port, bool enable)
+{
+    VirtConsole *vcon = VIRTIO_CONSOLE(port);
+
+    if (!qemu_chr_fe_get_driver(&vcon->chr)) {
+        return;
+    }
+
+    if (enable) {
+        VirtIOSerialPortClass *k = VIRTIO_SERIAL_PORT_GET_CLASS(port);
+
+        qemu_chr_fe_set_handlers(&vcon->chr, chr_can_read, chr_read,
+                                 k->is_console ? NULL : chr_event,
+                                 vcon, NULL, false);
+    } else {
+        qemu_chr_fe_set_handlers(&vcon->chr, NULL, NULL,
+                                 NULL, NULL, NULL, false);
+    }
+}
+
 static void virtconsole_realize(DeviceState *dev, Error **errp)
 {
     VirtIOSerialPort *port = VIRTIO_SERIAL_PORT(dev);
@@ -233,6 +253,7 @@ static void virtserialport_class_init(ObjectClass *klass, void *data)
     k->unrealize = virtconsole_unrealize;
     k->have_data = flush_buf;
     k->set_guest_connected = set_guest_connected;
+    k->enable_backend = virtconsole_enable_backend;
     k->guest_writable = guest_writable;
     dc->props = virtserialport_properties;
 }
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index f5bc173844..f0f18c8e7c 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -637,6 +637,13 @@ static void set_status(VirtIODevice *vdev, uint8_t status)
     if (!(status & VIRTIO_CONFIG_S_DRIVER_OK)) {
         guest_reset(vser);
     }
+
+    QTAILQ_FOREACH(port, &vser->ports, next) {
+        VirtIOSerialPortClass *vsc = VIRTIO_SERIAL_PORT_GET_CLASS(port);
+        if (vsc->enable_backend) {
+            vsc->enable_backend(port, vdev->vm_running);
+        }
+    }
 }
 
 static void vser_reset(VirtIODevice *vdev)
diff --git a/include/hw/virtio/virtio-serial.h b/include/hw/virtio/virtio-serial.h
index b19c44727f..12657a9f39 100644
--- a/include/hw/virtio/virtio-serial.h
+++ b/include/hw/virtio/virtio-serial.h
@@ -58,6 +58,9 @@ typedef struct VirtIOSerialPortClass {
     /* Guest opened/closed device. */
     void (*set_guest_connected)(VirtIOSerialPort *port, int guest_connected);
 
+    /* Enable/disable backend for virtio serial port */
+    void (*enable_backend)(VirtIOSerialPort *port, bool enable);
+
     /* Guest is now ready to accept data (virtqueues set up). */
     void (*guest_ready)(VirtIOSerialPort *port);
 
-- 
2.13.0




Re: [Qemu-devel] [PATCH 0/5] qemu-iotests: test savevm/loadvm iothread (and make it work!)

2017-07-05 Thread Pavel Butsykin

On 05.07.2017 15:55, Stefan Hajnoczi wrote:

On Mon, Jun 19, 2017 at 03:26:56PM +0300, Pavel Butsykin wrote:

On 15.06.2017 19:38, Stefan Hajnoczi wrote:

This series extends qemu-iotests 068 to also run with iothread enabled.  Doing
so was harder than expected because:

1. ioeventfd is disabled without -M accel=kvm even though it should work
2. loadvm still has an iothread bug

Instead of adding a ./check -iothread option I decided to always run the test.
Kevin recently recommended this approach; the advantage is that iothread *will*
always be tested iothread mode whereas people won't run ./check -iothread.


Why not just add an -iothread option to check-block.sh? We could do an
additional run with -iothread, as is done for the different block
formats.

Because all the test cases already exist, we can just reuse them.


From the email:

   "the advantage is that iothread *will* always be tested iothread mode
   whereas people won't run ./check -iothread"

Both approaches have pros and cons.


Yeah, it'd just be nice to add iothread support for all iotests.


If we decide to use ./check -iothread then it's necessary to figure out
which tests -iothread even applies to.  Tests that use qemu-img/qemu-io
do not use iothread.  Only tests that launch QEMU can take advantage of
iothread today.


Well, for tests which don't support the -iothread option we can define
something like this:

_unsupported_iothread

or

_unsupported_opts "iothread"

But overall it looks like a separate patch series, which can be done in
the future.


Stefan





Re: [Qemu-devel] [PATCH v2 3/4] qcow2: add shrink image support

2017-06-28 Thread Pavel Butsykin

On 28.06.2017 16:59, Max Reitz wrote:

On 2017-06-27 17:06, Pavel Butsykin wrote:

On 26.06.2017 20:47, Max Reitz wrote:

On 2017-06-26 17:23, Pavel Butsykin wrote:

[]


Is there any guarantee that this will not change in the future? Otherwise
it could become a potential danger.


Since this behavior is not documented anywhere, there is no guarantee.


I can add a comment... Or add a new variable holding the size of
reftable_tmp and compute min(s->refcount_table_size, reftable_tmp_size)
every time before accessing s->refcount_table[]/reftable_tmp[].


Or (1) you add an assertion that refcount_table_size doesn't change
along with a comment why that is the case, which also explains in detail
why the call to qcow2_free_clusters() should be safe: The on-disk
reftable differs from the one in memory. qcow2_free_clusters()and
update_refcount() themselves do not access the reftable, so they are
safe. However, update_refcount() calls alloc_refcount_block(), and that
function does access the reftable: Now, as long as
s->refcount_table_size does not shrink (which I can't see why it would),
refcount_table_index should always be smaller. Now we're accessing
s->refcount_table: This will always return an existing refblock because
this will either be the refblock itself (for self-referencing refblocks)
or another one that is not going to be freed by qcow2_shrink_reftable()
because this function will not free refblocks which cover other clusters
than themselves.
We will then proceed to update the refblock which is either right (if it
is not the refblock to be freed) or won't do anything (if it is the one
to be freed).
In any case, we will never write to the reftable and reading from the
basically outdated cached version will never do anything bad.


OK, SGTM.


Or (2) you copy reftable_tmp into s->refcount_table[] *before* any call
to qcow2_free_clusters(). To make this work, you would need to also
discard all refblocks from the cache in this function here (and not in
update_refcount()) and then only call qcow2_free_clusters() on refblocks
which were not self-referencing. An alternative hack would be to simply
mark the image dirty and just not do any qcow2_free_clusters() call...


The main purpose of the qcow2_reftable_shrink() function is to discard all
unnecessary refblocks from the file. If we only rewrite refcount_table
and discard the non-self-referencing refblocks (which are actually very
rare), the function loses its purpose.


It would do exactly the same. The idea is that you do not need to call
qcow2_free_clusters() on self-referencing refblocks at all, since they
are freed automatically when their reftable entry is overwritten with 0.


Not sure. For self-referencing refblocks we would also need to:
1. check whether refcount > 1
2. update s->free_cluster_index
3. call update_refcount_discard() (so that fallocate() PUNCH_HOLE is
   eventually called on the refblock offset)

It would be practically a copy-paste of qcow2_free_clusters(), so it is
better to avoid it. I think that if qcow2_reftable_shrink() makes sense
at all, it is only because it lets us slightly reduce the image size.


Or (3) of course it would be possible to not clean up refcount
structures at all...


Nice solution :)


It is, because as I said refcount structures only have a small overhead.


Yes, I agree.


Max





Re: [Qemu-devel] [PATCH v2 3/4] qcow2: add shrink image support

2017-06-27 Thread Pavel Butsykin

On 26.06.2017 20:47, Max Reitz wrote:

On 2017-06-26 17:23, Pavel Butsykin wrote:

[]


Is there any guarantee that this will not change in the future? Otherwise
it could become a potential danger.


Since this behavior is not documented anywhere, there is no guarantee.


I can add a comment... Or add a new variable with the size of
reftable_tmp, and compute min(s->refcount_table_size, reftable_tmp_size)
every time before accessing s->refcount_table[]/reftable_tmp[].


Or (1) you add an assertion that refcount_table_size doesn't change
along with a comment why that is the case, which also explains in detail
why the call to qcow2_free_clusters() should be safe: The on-disk
reftable differs from the one in memory. qcow2_free_clusters() and
update_refcount() themselves do not access the reftable, so they are
safe. However, update_refcount() calls alloc_refcount_block(), and that
function does access the reftable: Now, as long as
s->refcount_table_size does not shrink (which I can't see why it would),
refcount_table_index should always be smaller. Now we're accessing
s->refcount_table: This will always return an existing refblock because
this will either be the refblock itself (for self-referencing refblocks)
or another one that is not going to be freed by qcow2_shrink_reftable()
because this function will not free refblocks which cover other clusters
than themselves.
We will then proceed to update the refblock which is either right (if it
is not the refblock to be freed) or won't do anything (if it is the one
to be freed).
In any case, we will never write to the reftable and reading from the
basically outdated cached version will never do anything bad.


OK, SGTM.


Or (2) you copy reftable_tmp into s->refcount_table[] *before* any call
to qcow2_free_clusters(). To make this work, you would need to also
discard all refblocks from the cache in this function here (and not in
update_refcount()) and then only call qcow2_free_clusters() on refblocks
which were not self-referencing. An alternative hack would be to simply
mark the image dirty and just not do any qcow2_free_clusters() call...


The main purpose of the qcow2_reftable_shrink() function is to discard all
unnecessary refblocks from the file. If we only rewrite refcount_table
and discard the non-self-referencing refblocks (which are actually very
rare), the function loses its purpose.


Or (3) of course it would be possible to not clean up refcount
structures at all...


Nice solution :)


Max





Re: [Qemu-devel] [PATCH v2 3/4] qcow2: add shrink image support

2017-06-26 Thread Pavel Butsykin

On 23.06.2017 18:46, Max Reitz wrote:

On 2017-06-22 15:57, Pavel Butsykin wrote:


On 22.06.2017 01:55, Max Reitz wrote:

On 2017-06-13 14:16, Pavel Butsykin wrote:

[]

+}
+qcow2_cache_put(bs, s->refcount_block_cache, &refblock);
+
+reftable_tmp[i] = unused_block ? 0 : cpu_to_be64(s->refcount_table[i]);
+}
+
+ret = bdrv_pwrite_sync(bs->file, s->refcount_table_offset, reftable_tmp,
+   sizeof(uint64_t) * s->refcount_table_size);
+if (ret < 0) {
+goto out;
+}
+
+for (i = 0; i < s->refcount_table_size; i++) {
+if (s->refcount_table[i] && !reftable_tmp[i]) {
+qcow2_free_clusters(bs, s->refcount_table[i] & REFT_OFFSET_MASK,
+s->cluster_size, QCOW2_DISCARD_ALWAYS);


This doesn't feel like a very good idea. The bdrv_pwrite_sync() before
has brought the on-disk refcount structures into a different state than
what we have cached.


That is exactly why the cache is discarded by qcow2_cache_discard()
inside qcow2_free_clusters()->update_refcount().


This doesn't change the fact that the in-memory reftable is different
from the on-disk reftable and that qcow2_free_clusters() may trip up on
that; the main issue is the allocate_refcount_block() call before.


before what?

If we are talking about the allocate_refcount_block() calls after
bdrv_pwrite_sync(), then... allocate_refcount_block() always calls
load_refcount_block(), which is not actually dangerous even if
refcount_block_cache is empty, because the refblock offset is always
taken from s->refcount_table.


So we need a guarantee that update_refcount() won't touch the reftable
if the refcount is decreased. It will call alloc_refcount_block() and
that should definitely find the respective refblock to already exist
because of course it has a refcount already.


We don't touch refblocks that contain references to other refblocks;
this ensures that update_refcount() will not try to reload a discarded
refblock.


But here's an issue: It tries to read from s->refcount_table[], and you
are slowly overwriting it in the same loop here. So it may not actually
find the refcount (if a refblock is described by an earlier one).
(After more than an hour of debugging, I realized this is not true: You
will only zero reftable entries if the refblock describes nothing or
only itself. So overwriting one reftable entry cannot have effects
on other refblocks. Or at least it should not.)


As you've noticed, a simple approach is used here:
we discard only refblocks that contain nothing or only their own
reference. If we have a refblock that is actually empty but contains a
reference to another empty refblock, we don't touch it. Maybe it's not
the best solution, but at least it's simple and safe.
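The "own reference" test in qcow2_shrink_reftable(), i == refblock_offs >> (s->refcount_block_bits + s->cluster_bits), is plain index arithmetic. A standalone sketch of it follows; the toy_* helpers are hypothetical, and the relation refcount_block_bits = cluster_bits - (refcount_order - 3) is an assumption about how qcow2 derives it (a refblock is one cluster of 2^refcount_order-bit entries).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* log2 of the number of refcount entries in one refblock (assumed formula). */
static int toy_refcount_block_bits(int cluster_bits, int refcount_order)
{
    return cluster_bits - (refcount_order - 3);
}

/* Reftable index whose refblock holds the refcount of the cluster at @offset. */
static uint64_t toy_reftable_index(uint64_t offset, int cluster_bits,
                                   int refcount_block_bits)
{
    return offset >> (refcount_block_bits + cluster_bits);
}

/* Position of that refcount inside the refblock. */
static uint64_t toy_refblock_index(uint64_t offset, int cluster_bits,
                                   int refcount_block_bits)
{
    uint64_t refcount_block_size = UINT64_C(1) << refcount_block_bits;

    return (offset >> cluster_bits) & (refcount_block_size - 1);
}

/* True if reftable entry @i points to a refblock that stores its own refcount. */
static bool toy_refblock_is_self_referencing(uint64_t i, uint64_t refblock_offs,
                                             int cluster_bits,
                                             int refcount_block_bits)
{
    return i == toy_reftable_index(refblock_offs, cluster_bits,
                                   refcount_block_bits);
}
```

For example, with 64 KiB clusters and 16-bit refcounts a refblock covers 2^15 clusters (2 GiB), so a refblock at 0x30000 referenced from reftable entry 0 describes itself at index 3. With cluster_size=512 and refcount_bits=64, as in Max's test image, one refblock covers only 64 clusters (32 KiB).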

There is another approach that could be applied here:

1. decrease the refcounts of all refblocks
2. find all empty refblocks
3. increase the refcounts of all refblocks again
4. rewrite the refcount_table on disk (with the empty reftable entries)
5. release all the empty refblocks in reverse order (starting at the end
   of s->refcount_table)


This would certainly allow us to get rid of all empty refblocks, but the
code would be less friendly :) Also, the case where a refblock contains
a reference to another refblock is quite rare.
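Steps 1 to 3 of this alternative (finding refblocks that are empty once the refblocks' own references are ignored) can be illustrated with a toy model in C. Everything here is hypothetical: refcounts are a flat array instead of cached refblocks, and the on-disk rewrite and the release order of steps 4 and 5 are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NO_REFBLOCK (-1)

/*
 * refcount[c] is the refcount of cluster c (all refblock contents,
 * flattened); reftable[i] is the cluster holding refblock i, or
 * TOY_NO_REFBLOCK.  Refblock i covers clusters [i * cover, (i + 1) * cover).
 * Sets empty[i] and returns the number of empty refblocks; refcount[] is
 * left unchanged on return.
 */
static size_t toy_find_empty_refblocks(uint64_t *refcount,
                                       const int64_t *reftable,
                                       size_t n_entries, size_t cover,
                                       bool *empty)
{
    size_t i, c, n_empty = 0;

    /* 1. Drop the reference each refblock holds on its own cluster. */
    for (i = 0; i < n_entries; i++) {
        if (reftable[i] != TOY_NO_REFBLOCK) {
            refcount[reftable[i]]--;
        }
    }

    /* 2. A refblock is empty if everything it covers now has refcount 0. */
    for (i = 0; i < n_entries; i++) {
        empty[i] = false;
        if (reftable[i] == TOY_NO_REFBLOCK) {
            continue;
        }
        empty[i] = true;
        for (c = i * cover; c < (i + 1) * cover; c++) {
            if (refcount[c] != 0) {
                empty[i] = false;
                break;
            }
        }
        n_empty += empty[i];
    }

    /* 3. Restore the refcounts; the real release would happen later. */
    for (i = 0; i < n_entries; i++) {
        if (reftable[i] != TOY_NO_REFBLOCK) {
            refcount[reftable[i]]++;
        }
    }
    return n_empty;
}
```

For instance, with 4 clusters per refblock, refcounts {1,1,1,1, 0,0,0,0, 1,1,0,0} and reftable {2, 8, 9}, refblocks 1 and 2 are reported empty. Refblock 2 only holds the refcounts of refblock 1 and of itself, so the simple "nothing or only its own reference" rule would keep it.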


Another potential issue is that you're assuming s->refcount_table_size
to be constant. I cannot find a way for it not to be, but investigating
this is painful and I can't claim I know for sure that it is constant.
If it isn't, you may get overflows when accessing reftable_tmp[].

(Yes, it may be constant; but the reader of this code has to read
through qcow2_free_clusters(), allocate_refcount_block() and
update_refcount() to know (or at least to guess) that's the case.)


Is there any guarantee that this will not change in the future? Otherwise
it could become a potential danger.

I can add a comment... Or add a new variable with the size of
reftable_tmp, and compute min(s->refcount_table_size, reftable_tmp_size)
every time before accessing s->refcount_table[]/reftable_tmp[].


I don't really want to look deeper into this, but here's an image that I
produced while trying to somehow break all of this. It makes qemu-img
check pass but fails after qemu-img resize --shrink shrink.qcow2 32M:
https://xanclic.moe/shrink.qcow2

(The image has been created with cluster_size=512 and refcount_bits=64;
then I filled the penultimate two entries of the reftable with pointers
to 0x1f and 0x1f0200, respectively (so the first of these refblocks
would describe both), giving me this:
https://xanclic.moe/shrink-template.qcow2
I then put some data onto it with qemu-io -c 'write 0 1457K', which gave
me shrink.qcow2.)



Thank you for the samples! The mistake was quite na

Re: [Qemu-devel] [PATCH v2 1/4] qemu-img: add --shrink flag for resize

2017-06-22 Thread Pavel Butsykin

On 22.06.2017 17:49, Kevin Wolf wrote:

Am 22.06.2017 um 15:54 hat Pavel Butsykin geschrieben:

On 22.06.2017 01:17, Max Reitz wrote:

On 2017-06-13 14:16, Pavel Butsykin wrote:

The flag serves as an additional precaution against data loss. In the
future, shrinking without this flag may be forbidden, but for now we
need to maintain compatibility.

Signed-off-by: Pavel Butsykin 



The functional changes look good to me; even though I'd rather have it
an error for qcow2 now (even if that means having to check the image
format in img_resize(), and being inconsistent because you wouldn't need
--shrink for raw, but for qcow2 you would). But, well, I'm not going to
stop this series over that.



Why is shrinking a qcow2 image dangerous, but shrinking a raw image is
not? I think we should provide the same behavior for all formats. When
the --shrink option becomes mandatory, it should be mandatory for all
image formats.


It is dangerous for both, but for raw we can't enforce the flag
immediately without a deprecation period. With qcow2 we can (because it
is new functionality), so we might as well enforce it from the start.



Ah, exactly. I like the proposal to print a warning for raw and to
enforce the flag for the other formats.
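The policy agreed on here (warn for raw, require --shrink for every other format) boils down to a small decision function. A sketch only: toy_shrink_policy() and the ToyResizeAction values are hypothetical, and the real img_resize() compares total_size with blk_getlength(blk).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef enum {
    TOY_RESIZE_OK,      /* grow, same size, or shrink with --shrink */
    TOY_RESIZE_WARN,    /* raw without --shrink: warn on stderr, then shrink */
    TOY_RESIZE_REFUSE,  /* any other format without --shrink: error out */
} ToyResizeAction;

/* Hypothetical helper for the proposed policy; not qemu-img code. */
static ToyResizeAction toy_shrink_policy(const char *format,
                                         int64_t current_size,
                                         int64_t new_size, bool shrink_flag)
{
    if (new_size >= current_size || shrink_flag) {
        return TOY_RESIZE_OK;
    }
    return strcmp(format, "raw") == 0 ? TOY_RESIZE_WARN : TOY_RESIZE_REFUSE;
}
```

Keeping raw on the warning path preserves compatibility for existing scripts while qcow2 shrinking, being new functionality, can require the flag from the start.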


Kevin





Re: [Qemu-devel] [PATCH v2 3/4] qcow2: add shrink image support

2017-06-22 Thread Pavel Butsykin


On 22.06.2017 01:55, Max Reitz wrote:

On 2017-06-13 14:16, Pavel Butsykin wrote:

This patch adds shrinking of the image file for qcow2. This allows us to
reduce the virtual image size and free up disk space without copying the
image. The image can be fragmented; shrinking is done by punching holes
in the image file.

Signed-off-by: Pavel Butsykin 
---
  block/qcow2-cluster.c  | 42 
  block/qcow2-refcount.c | 65 ++
  block/qcow2.c  | 40 +++
  block/qcow2.h  |  2 ++
  qapi/block-core.json   |  3 ++-
  5 files changed, 141 insertions(+), 11 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index d779ea19cf..a84b7e607e 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -32,6 +32,48 @@
  #include "qemu/bswap.h"
  #include "trace.h"
  
+int qcow2_shrink_l1_table(BlockDriverState *bs, uint64_t max_size)


It's not really a max_size but always an exact size. You don't want it
to be any smaller than this.


+{
+BDRVQcow2State *s = bs->opaque;
+int new_l1_size, i, ret;
+
+if (max_size >= s->l1_size) {
+return 0;
+}
+
+new_l1_size = max_size;
+
+#ifdef DEBUG_ALLOC2
+fprintf(stderr, "shrink l1_table from %d to %" PRId64 "\n",
+s->l1_size, new_l1_size);


new_l1_size is of type int, not int64_t.
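Since both sizes are plain ints, both take %d; passing an int where the format expects PRId64 is undefined behaviour in C. A tiny sketch (the helper name is hypothetical; the patch prints directly with fprintf(stderr, ...)):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Both arguments are int, so %d is the matching conversion for both. */
static int toy_format_l1_shrink(char *buf, size_t len, int l1_size,
                                int new_l1_size)
{
    return snprintf(buf, len, "shrink l1_table from %d to %d\n",
                    l1_size, new_l1_size);
}
```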


+#endif
+
+BLKDBG_EVENT(bs->file, BLKDBG_L1_SHRINK_WRITE_TABLE);
+ret = bdrv_pwrite_zeroes(bs->file, s->l1_table_offset +
+   sizeof(uint64_t) * new_l1_size,
+ (s->l1_size - new_l1_size) * sizeof(uint64_t), 0);
+if (ret < 0) {
+return ret;
+}
+
+ret = bdrv_flush(bs->file->bs);
+if (ret < 0) {
+return ret;
+}
+
+BLKDBG_EVENT(bs->file, BLKDBG_L1_SHRINK_FREE_L2_CLUSTERS);
+for (i = s->l1_size - 1; i > new_l1_size - 1; i--) {
+if ((s->l1_table[i] & L1E_OFFSET_MASK) == 0) {
+continue;
+}
+qcow2_free_clusters(bs, s->l1_table[i] & L1E_OFFSET_MASK,
+s->l2_size * sizeof(uint64_t),


I'm more of a fan of s->cluster_size instead of s->l2_size *
sizeof(uint64_t) but it's not like it matters...


+QCOW2_DISCARD_ALWAYS);
+s->l1_table[i] = 0;


I'd probably clear the overhanging s->l1_table entries before
bdrv_flush() (because you shouldn't really use them after
bdrv_pwrite_zeroes() has returned, even if bdrv_flush() has failed), but
it's not absolutely necessary. As long as they still have a refcount of
at least one, writing to them will just be useless but not destroy any data.



You're right, but if it's not necessary, I would prefer to leave it as
is, simply because the overhanging s->l1_table entries are used to
release the clusters :)
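The loop that frees the overhanging L2 tables can be modelled in isolation. A hedged sketch: TOY_L1E_OFFSET_MASK is assumed to match L1E_OFFSET_MASK (bits 9 to 55 of an L1 entry hold the L2 table offset), and the offsets are collected in an array instead of being passed to qcow2_free_clusters(). As in the patch, each entry is read before it is cleared, which is why the in-memory entries are still needed here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to equal L1E_OFFSET_MASK: bits 9-55 of an L1 entry. */
#define TOY_L1E_OFFSET_MASK 0x00fffffffffffe00ULL

/*
 * Walk the overhanging entries from the old end down to new_l1_size
 * ("i > new_l1_size - 1" in the patch is the same as "i >= new_l1_size"),
 * record each L2 table offset that would be freed, and clear the in-memory
 * entry.  Returns the number of offsets written to @freed.
 */
static int toy_shrink_l1(uint64_t *l1_table, int l1_size, int new_l1_size,
                         uint64_t *freed)
{
    int i, n = 0;

    for (i = l1_size - 1; i >= new_l1_size; i--) {
        uint64_t l2_offset = l1_table[i] & TOY_L1E_OFFSET_MASK;

        if (l2_offset == 0) {
            continue;               /* unallocated: nothing to free */
        }
        freed[n++] = l2_offset;     /* real code: qcow2_free_clusters() */
        l1_table[i] = 0;
    }
    return n;
}
```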


+}
+return 0;
+}
+
  int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
  bool exact_size)
  {
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 576ab551d6..e98306acd8 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -29,6 +29,7 @@
  #include "block/qcow2.h"
  #include "qemu/range.h"
  #include "qemu/bswap.h"
+#include "qemu/cutils.h"
  
  static int64_t alloc_clusters_noref(BlockDriverState *bs, uint64_t size);

  static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
@@ -2936,3 +2937,67 @@ done:
  qemu_vfree(new_refblock);
  return ret;
  }
+
+int qcow2_shrink_reftable(BlockDriverState *bs)
+{
+BDRVQcow2State *s = bs->opaque;
+uint64_t *reftable_tmp =
+g_try_malloc(sizeof(uint64_t) * s->refcount_table_size);
+int i, ret;
+
+if (s->refcount_table_size && reftable_tmp == NULL) {
+return -ENOMEM;
+}
+
+for (i = 0; i < s->refcount_table_size; i++) {
+int64_t refblock_offs = s->refcount_table[i] & REFT_OFFSET_MASK;
+void *refblock;
+bool unused_block;
+
+if (refblock_offs == 0) {
+reftable_tmp[i] = 0;
+continue;
+}
+ret = qcow2_cache_get(bs, s->refcount_block_cache, refblock_offs,
+  &refblock);
+if (ret < 0) {
+goto out;
+}
+
+/* the refblock has own reference */
+if (i == refblock_offs >> (s->refcount_block_bits + s->cluster_bits)) {
+uint64_t blk_index = (refblock_offs >> s->cluster_bits) &
+ (s->refcount_block_size - 1);
+uint64_t refcount = s->get_refcount(refblock, blk_index);
+
+s->set_refcount(refblock, blk_index, 0);
+
+   

Re: [Qemu-devel] [PATCH v2 2/4] qcow2: add qcow2_cache_discard

2017-06-22 Thread Pavel Butsykin

On 22.06.2017 01:29, Max Reitz wrote:

On 2017-06-13 14:16, Pavel Butsykin wrote:

Whenever L2/refcount table clusters are discarded from the file, we can
automatically drop the now unnecessary content of the cache tables. This
reduces the chance of evicting useful cache data and eliminates cache
data that is inconsistent with the data in the file.

Signed-off-by: Pavel Butsykin 
---
  block/qcow2-cache.c| 21 +
  block/qcow2-refcount.c |  5 +
  block/qcow2.h  |  1 +
  3 files changed, 27 insertions(+)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index 1d25147392..7931edf237 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -411,3 +411,24 @@ void qcow2_cache_entry_mark_dirty(BlockDriverState *bs, Qcow2Cache *c,
  assert(c->entries[i].offset != 0);
  c->entries[i].dirty = true;
  }
+
+void qcow2_cache_discard(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset)
+{
+int i;
+
+for (i = 0; i < c->size; i++) {
+if (c->entries[i].offset == offset) {
+goto found; /* table offset */
+}
+}
+return;
+
+found:
+assert(c->entries[i].ref == 0);
+
+c->entries[i].offset = 0;
+c->entries[i].lru_counter = 0;
+c->entries[i].dirty = false;
+
+qcow2_cache_table_release(bs, c, i, 1);
+}
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 7c06061aae..576ab551d6 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -767,6 +767,11 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
  s->set_refcount(refcount_block, block_index, refcount);
  
  if (refcount == 0 && s->discard_passthrough[type]) {

+qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);


I don't like this very much. It works, but it feels bad.

Would it be possible to store this refblock's offset somewhere and only
put it back if @offset is equal to that?


+qcow2_cache_discard(bs, s->refcount_block_cache, offset);
+
+qcow2_cache_discard(bs, s->l2_table_cache, offset);
+


So you're blindly calling qcow2_cache_discard() on @offset because
@offset may be pointing to a refblock or an L2 table? Right, that works,
but it still deserves a comment, I think (that we have no idea whether
@offset actually points to any of these refcount structures, but that we
also just don't have to care).

Looks OK to me, functionally, but I'd at least like to have that comment.



Hmm.. We can split qcow2_cache_discard() and kill two birds with one
stone.

void *qcow2_cache_is_table_offset(BlockDriverState *bs, Qcow2Cache *c,
                                  uint64_t offset)
{
    int i;

    for (i = 0; i < c->size; i++) {
        if (c->entries[i].offset == offset) {
            return qcow2_cache_get_table_addr(bs, c, i);
        }
    }
    return NULL;
}

void qcow2_cache_discard(BlockDriverState *bs, Qcow2Cache *c, void *table)
{
    int i = qcow2_cache_get_table_idx(bs, c, table);

    assert(c->entries[i].ref == 0);

    c->entries[i].offset = 0;
    c->entries[i].lru_counter = 0;
    c->entries[i].dirty = false;

    qcow2_cache_table_release(bs, c, i, 1);
}



if (refcount == 0 && s->discard_passthrough[type]) {
    void *table;

    table = qcow2_cache_is_table_offset(bs, s->refcount_block_cache,
                                        offset);
    if (table != NULL) {
        qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
        qcow2_cache_discard(bs, s->refcount_block_cache, table);
    }

    table = qcow2_cache_is_table_offset(bs, s->l2_table_cache, offset);
    if (table != NULL) {
        qcow2_cache_discard(bs, s->l2_table_cache, table);
    }
    ...
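The proposed split separates the lookup from the discard, so update_refcount() only puts its own refblock back when @offset really is a cached table. A self-contained model of the two halves, with hypothetical Toy* types that return a slot index instead of a table pointer (and that treat offset 0 as "unused slot", so a lookup of 0 never matches):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified cache entry: only the fields a discard touches. */
typedef struct {
    uint64_t offset;        /* 0 means "slot unused" */
    int ref;                /* outstanding get()-style references */
    uint64_t lru_counter;
    bool dirty;
} ToyCacheEntry;

typedef struct {
    ToyCacheEntry *entries;
    int size;
} ToyCache;

/* Lookup half: slot index of the cached table at @offset, or -1. */
static int toy_cache_find(const ToyCache *c, uint64_t offset)
{
    if (offset == 0) {
        return -1;
    }
    for (int i = 0; i < c->size; i++) {
        if (c->entries[i].offset == offset) {
            return i;
        }
    }
    return -1;
}

/* Discard half: the caller must already have dropped its reference. */
static void toy_cache_discard(ToyCache *c, int i)
{
    assert(c->entries[i].ref == 0);
    c->entries[i].offset = 0;
    c->entries[i].lru_counter = 0;
    c->entries[i].dirty = false;
    /* the real code also calls qcow2_cache_table_release() here */
}
```

Because the caller checks the lookup result first, the unconditional qcow2_cache_put() that Max objected to is only needed on the path where the refblock being freed is actually cached.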


Max


  update_refcount_discard(bs, cluster_offset, s->cluster_size);
  }
  }
diff --git a/block/qcow2.h b/block/qcow2.h
index 1801dc30dc..07faa6dc78 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -597,5 +597,6 @@ int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
  int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
  void **table);
  void qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
+void qcow2_cache_discard(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset);
  
  #endif









Re: [Qemu-devel] [PATCH v2 1/4] qemu-img: add --shrink flag for resize

2017-06-22 Thread Pavel Butsykin

On 22.06.2017 01:17, Max Reitz wrote:

On 2017-06-13 14:16, Pavel Butsykin wrote:

The flag serves as an additional precaution against data loss. In the
future, shrinking without this flag may be forbidden, but for now we
need to maintain compatibility.

Signed-off-by: Pavel Butsykin 
---
  qemu-img-cmds.hx   |  4 ++--
  qemu-img.c | 15 +++
  qemu-img.texi  |  5 -
  tests/qemu-iotests/102 |  4 ++--
  4 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index a39fcdba71..3b2eab9d20 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -76,9 +76,9 @@ STEXI
  ETEXI
  
  DEF("resize", img_resize,

-"resize [--object objectdef] [--image-opts] [-q] filename [+ | -]size")
+"resize [--object objectdef] [--image-opts] [-q] [--shrink] filename [+ | -]size")
  STEXI
-@item resize [--object @var{objectdef}] [--image-opts] [-q] @var{filename} [+ | -]@var{size}
+@item resize [--object @var{objectdef}] [--image-opts] [-q] [--shrink] @var{filename} [+ | -]@var{size}
  ETEXI
  
  DEF("amend", img_amend,

diff --git a/qemu-img.c b/qemu-img.c
index 0ad698d7f1..bfe5f61b0b 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -61,6 +61,7 @@ enum {
  OPTION_FLUSH_INTERVAL = 261,
  OPTION_NO_DRAIN = 262,
  OPTION_TARGET_IMAGE_OPTS = 263,
+OPTION_SHRINK = 264,
  };
  
  typedef enum OutputFormat {

@@ -3452,6 +3453,7 @@ static int img_resize(int argc, char **argv)
  },
  };
  bool image_opts = false;
+bool shrink = false;
  
  /* Remove size from argv manually so that negative numbers are not treated

   * as options by getopt. */
@@ -3469,6 +3471,7 @@ static int img_resize(int argc, char **argv)
  {"help", no_argument, 0, 'h'},
  {"object", required_argument, 0, OPTION_OBJECT},
  {"image-opts", no_argument, 0, OPTION_IMAGE_OPTS},
+{"shrink", no_argument, 0, OPTION_SHRINK},
  {0, 0, 0, 0}
  };
  c = getopt_long(argc, argv, ":f:hq",
@@ -3503,6 +3506,9 @@ static int img_resize(int argc, char **argv)
  case OPTION_IMAGE_OPTS:
  image_opts = true;
  break;
+case OPTION_SHRINK:
+shrink = true;
+break;
  }
  }
  if (optind != argc - 1) {
@@ -3562,6 +3568,15 @@ static int img_resize(int argc, char **argv)
  goto out;
  }
  
+if (total_size < blk_getlength(blk) && !shrink) {

+qprintf(quiet, "Warning: shrinking of the image can lead to data loss. "


I think this should always be printed to stderr, even if quiet is true;
especially considering we (or at least I) plan to make it mandatory.



OK.


+   "Before performing shrink operation you must make sure "


*Before performaing a shrink operation


s/performaing/performing/


Also, I'd rather use the imperative: "Before ... operation, make sure
that the..."

(English isn't my native language either, but as far as I remember
"must" is generally used for external influences. But it's not like a
force of nature is forcing you to confirm there's no important data; you
can just ignore this advice and see all of your non-backed-up childhood
pictures go to /dev/null.)



Yes, it should just be a recommendation.

(As you might have already guessed, English isn't my native language
either :) I appreciate your understanding. Many thanks for helping to
improve the text.)


+   "that the shrink part of image doesn't contain important"


Hmm... I don't think "shrink part" really works.

Maybe the following would work better:

   Warning: Shrinking an image will delete all data beyond the shrunken
   image's end. Before performing such an operation, make sure there is
   no important data there.


+   " data.\n");
+qprintf(quiet,
+"If you don't want to see this message use --shrink option.\n");


You should make a note that --shrink may (and I hope it will) become
necessary in the future, like:

   Using the --shrink option will suppress this message. Note that future
   versions of qemu-img may refuse to shrink images without this option!



Will fix.


+}
+
  ret = blk_truncate(blk, total_size, &err);
  if (!ret) {
  qprintf(quiet, "Image resized.\n");
diff --git a/qemu-img.texi b/qemu-img.texi
index 5b925ecf41..c2b694cd00 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -499,7 +499,7 @@ qemu-img rebase -b base.img diff.qcow2
  At this point, @code{modified.img} can be discarded, since
  @code{base.img + diff.qcow2} contains the same information.
  
-@item resize @var{filename} [+ | -]@var{size}

+@item resize [--shrink] @var{filen

Re: [Qemu-devel] [PATCH 0/5] qemu-iotests: test savevm/loadvm iothread (and make it work!)

2017-06-19 Thread Pavel Butsykin

On 15.06.2017 19:38, Stefan Hajnoczi wrote:

This series extends qemu-iotests 068 to also run with iothread enabled.  Doing
so was harder than expected because:

1. ioeventfd is disabled without -M accel=kvm even though it should work
2. loadvm still has an iothread bug

Instead of adding a ./check -iothread option I decided to always run the test.
Kevin recently recommended this approach; the advantage is that iothread mode *will*
always be tested, whereas people won't run ./check -iothread.


Why not just add an -iothread option to check-block.sh? We can do an
additional run with -iothread, as is done for the different block
formats.

Because all the test cases already exist, we can just reuse them.

Stefan Hajnoczi (5):
   virtio-pci: use ioeventfd even when KVM is disabled
   migration: hold AioContext lock for loadvm qemu_fclose()
   qemu-iotests: 068: extract _qemu() function
   qemu-iotests: 068: use -drive/-device instead of -hda
   qemu-iotests: 068: test iothread mode

  hw/virtio/virtio-pci.c |  2 +-
  migration/savevm.c |  2 +-
  tests/qemu-iotests/068 | 37 +
  tests/qemu-iotests/068.out | 11 ++-
  4 files changed, 37 insertions(+), 15 deletions(-)





Re: [Qemu-devel] [PATCH v3 3/4] migration: avoid recursive AioContext locking in save_vmstate()

2017-06-15 Thread Pavel Butsykin

On 14.06.2017 17:43, Kevin Wolf wrote:

Am 14.06.2017 um 15:15 hat Pavel Butsykin geschrieben:

On 14.06.2017 13:10, Pavel Butsykin wrote:


On 22.05.2017 16:57, Stefan Hajnoczi wrote:

AioContext was designed to allow nested acquire/release calls.  It uses
a recursive mutex so callers don't need to worry about nesting...or so
we thought.

BDRV_POLL_WHILE() is used to wait for block I/O requests.  It releases
the AioContext temporarily around aio_poll().  This gives IOThreads a
chance to acquire the AioContext to process I/O completions.

It turns out that recursive locking and BDRV_POLL_WHILE() don't mix.
BDRV_POLL_WHILE() only releases the AioContext once, so the IOThread
will not be able to acquire the AioContext if it was acquired
multiple times.

Instead of trying to release AioContext n times in BDRV_POLL_WHILE(),
this patch simply avoids nested locking in save_vmstate().  It's the
simplest fix and we should step back to consider the big picture with
all the recent changes to block layer threading.

This patch is the final fix to solve 'savevm' hanging with -object
iothread.
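The failure mode described in this commit message can be reproduced outside QEMU with a recursive POSIX mutex standing in for the AioContext lock. In this hedged sketch (all names hypothetical) the main thread takes the lock @depth times, releases it once as BDRV_POLL_WHILE() does around aio_poll(), and a second thread standing in for the IOThread tries to take it. pthread_mutex_trylock() is used instead of a blocking lock so the demo reports the problem instead of hanging.

```c
#define _XOPEN_SOURCE 700   /* for PTHREAD_MUTEX_RECURSIVE in strict -std modes */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Stand-in for an IOThread that wants the AioContext lock. */
static void *toy_iothread_try_acquire(void *arg)
{
    pthread_mutex_t *ctx_lock = arg;

    if (pthread_mutex_trylock(ctx_lock) == 0) {
        pthread_mutex_unlock(ctx_lock);
        return (void *)1;   /* lock was free */
    }
    return NULL;            /* still owned by the main thread (EBUSY) */
}

/* Returns 1 if the "IOThread" could take the lock during the poll. */
static int toy_iothread_can_acquire_during_poll(int depth)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t ctx_lock;
    pthread_t iothread;
    void *res = NULL;
    int i;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&ctx_lock, &attr);
    pthread_mutexattr_destroy(&attr);

    for (i = 0; i < depth; i++) {
        pthread_mutex_lock(&ctx_lock);      /* nested acquire */
    }

    pthread_mutex_unlock(&ctx_lock);        /* released only once */
    pthread_create(&iothread, NULL, toy_iothread_try_acquire, &ctx_lock);
    pthread_join(iothread, &res);
    pthread_mutex_lock(&ctx_lock);          /* re-acquired after the poll */

    for (i = 0; i < depth; i++) {
        pthread_mutex_unlock(&ctx_lock);
    }
    pthread_mutex_destroy(&ctx_lock);
    return res != NULL;
}
```

With depth 1 the second thread gets the lock; with depth 2 or more it does not, because a single release only drops one level of nesting. In QEMU the IOThread would block in aio_context_acquire() while the main loop waits in aio_poll() for it, which is the hang.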


I see the same in external_snapshot_prepare():
[...]
and at the moment of BDRV_POLL_WHILE(bs, flush_co.ret == NOT_DONE)
the lock is held at least twice. So here is another deadlock.


Sorry, this is a different kind of deadlock. In the external_snapshot
case, the deadlock can happen only if state->old_bs->aio_context ==
my_iothread->ctx, because in this case aio_co_enter() always calls aio_co_schedule():


Can you please write qemu-iotests case for any deadlock case that we're
seeing? Stefan, we could also use one for the bug fixed in this series.


It's test 085; you only need to enable iothread (patch attached):
# ./check -qcow2 085 -iothread
...
+Timeout waiting for return on handle 0
Failures: 085
Failed 1 of 1 tests

The timeout is caused by the deadlock.


Actually, the deadlock happens for the same reason :D

A recursive lock occurs when the transaction contains more than one action:
void qmp_transaction(TransactionActionList *dev_list,
...
    /* We don't do anything in this loop that commits us to the operations */
    while (NULL != dev_entry) {
        ...
        state->ops->prepare(state, &local_err);
        if (local_err) {
            error_propagate(errp, local_err);
            goto delete_and_fail;
        }
    }

    QSIMPLEQ_FOREACH(state, &snap_bdrv_states, entry) {
        if (state->ops->commit) {
            state->ops->commit(state);

Here the context lock is acquired in state->ops->prepare() but released
in state->ops->commit() (in bdrv_reopen_multiple()). So when a
transaction contains two or more actions, we get a recursive lock.
Unfortunately, I have no idea how cheap it would be to fix this.


Kevin

From 674132232f94c1db8015ed780ba84f49fb0fd2bc Mon Sep 17 00:00:00 2001
From: Pavel Butsykin 
Date: Thu, 15 Jun 2017 15:42:26 +0300
Subject: [PATCH] qemu-iotests: add -iothread option

Signed-off-by: Pavel Butsykin 
---
 tests/qemu-iotests/085 | 9 -
 tests/qemu-iotests/common  | 7 +++
 tests/qemu-iotests/common.qemu | 9 +++--
 3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/085 b/tests/qemu-iotests/085
index b97adcd8db..7b4419deeb 100755
--- a/tests/qemu-iotests/085
+++ b/tests/qemu-iotests/085
@@ -131,7 +131,14 @@ echo === Running QEMU ===
 echo
 
 qemu_comm_method="qmp"
-_launch_qemu -drive file="${TEST_IMG}.1",if=virtio -drive file="${TEST_IMG}.2",if=virtio
+if [ "${IOTHREAD_QEMU}" = "y" ]; then
+_launch_qemu -drive file="${TEST_IMG}.1",if=none,id=virtio0 \
+-device virtio-blk-pci,iothread=iothread1,drive=virtio0 \
+-drive file="${TEST_IMG}.2",if=none,id=virtio1 \
+-device virtio-blk-pci,iothread=iothread1,drive=virtio1
+else
+_launch_qemu -drive file="${TEST_IMG}.1",if=virtio -drive file="${TEST_IMG}.2",if=virtio
+fi
 h=$QEMU_HANDLE
 
 echo
diff --git a/tests/qemu-iotests/common b/tests/qemu-iotests/common
index f2a7199c4b..fffbf39d55 100644
--- a/tests/qemu-iotests/common
+++ b/tests/qemu-iotests/common
@@ -53,6 +53,7 @@ export QEMU_IO_OPTIONS=""
 export CACHEMODE_IS_DEFAULT=true
 export QEMU_OPTIONS="-nodefaults -machine accel=qtest"
 export VALGRIND_QEMU=
+export IOTHREAD_QEMU=
 export IMGKEYSECRET=
 export IMGOPTSSYNTAX=false
 
@@ -165,6 +166,7 @@ image protocol options
 other options
 -xdiff  graphical mode diff
 -nocacheuse O_DIRECT on backing file
+-iothread   enable iothread
 -misalign   misalign memory allocations
 -n  show me, do not run tests
 -o options  -o options to pass to qemu-img create/convert
@@ -297,6 +299,11 @@ testlist options
 xpand=false
 ;;
 
+-iothread)
+IOTHREAD_QEMU='y'
+xpand=f

Re: [Qemu-devel] [PATCH v3 3/4] migration: avoid recursive AioContext locking in save_vmstate()

2017-06-14 Thread Pavel Butsykin

On 14.06.2017 13:10, Pavel Butsykin wrote:


On 22.05.2017 16:57, Stefan Hajnoczi wrote:

AioContext was designed to allow nested acquire/release calls.  It uses
a recursive mutex so callers don't need to worry about nesting...or so
we thought.

BDRV_POLL_WHILE() is used to wait for block I/O requests.  It releases
the AioContext temporarily around aio_poll().  This gives IOThreads a
chance to acquire the AioContext to process I/O completions.

It turns out that recursive locking and BDRV_POLL_WHILE() don't mix.
BDRV_POLL_WHILE() only releases the AioContext once, so the IOThread
will not be able to acquire the AioContext if it was acquired
multiple times.

Instead of trying to release AioContext n times in BDRV_POLL_WHILE(),
this patch simply avoids nested locking in save_vmstate().  It's the
simplest fix and we should step back to consider the big picture with
all the recent changes to block layer threading.

This patch is the final fix to solve 'savevm' hanging with -object
iothread.


I see the same in external_snapshot_prepare():
 /* Acquire AioContext now so any threads operating on old_bs stop */
 state->aio_context = bdrv_get_aio_context(state->old_bs);
 aio_context_acquire(state->aio_context);
 bdrv_drained_begin(state->old_bs);

 if (!bdrv_is_inserted(state->old_bs)) {
 error_setg(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
 return;
 }

 if (bdrv_op_is_blocked(state->old_bs,
BLOCK_OP_TYPE_EXTERNAL_SNAPSHOT, errp)) {
 return;
 }

 if (!bdrv_is_read_only(state->old_bs)) {
 if (bdrv_flush(state->old_bs)) {  <---!!!

and at the moment of BDRV_POLL_WHILE(bs, flush_co.ret == NOT_DONE)
the lock is held at least twice. So here is another deadlock.


Sorry, this is a different kind of deadlock. In the external_snapshot
case, the deadlock can happen only if state->old_bs->aio_context ==
my_iothread->ctx, because in this case aio_co_enter() always calls aio_co_schedule():

void bdrv_coroutine_enter(BlockDriverState *bs, Coroutine *co)
{
aio_co_enter(bdrv_get_aio_context(bs), co);
}
...
void aio_co_enter(AioContext *ctx, struct Coroutine *co)
{
if (ctx != qemu_get_current_aio_context()) {
aio_co_schedule(ctx, co);
return;
}
...

But the iothread will not be able to perform coroutine, because the 
my_iothread->ctx is already under lock here:


static void external_snapshot_prepare(BlkActionState *common,
...
/* Acquire AioContext now so any threads operating on old_bs stop */
state->aio_context = bdrv_get_aio_context(state->old_bs);
aio_context_acquire(state->aio_context);
bdrv_drained_begin(state->old_bs);
...

As a result, we can see the following deadlock:

Thread 1 (Thread 0x7f0da6af9bc0 (LWP 973294)):
#0  0x7f0da49b5ebf in __GI_ppoll (fds=0x7f0da87434b0, nfds=1, timeout=, sigmask=0x0)
    at ../sysdeps/unix/sysv/linux/ppoll.c:56
#1  0x7f0da71e8234 in qemu_poll_ns (fds=0x7f0da87434b0, nfds=1, timeout=-1) at util/qemu-timer.c:322
#2  0x7f0da71eaf32 in aio_poll (ctx=0x7f0da870e7c0, blocking=true) at util/aio-posix.c:622
#3  0x7f0da715ff9c in bdrv_flush (bs=0x7f0da876c270) at block/io.c:2397
#4  0x7f0da6e98344 in external_snapshot_prepare (common=0x7f0da9d727b0, errp=0x7ffec3893f38)
    at blockdev.c:1686
#5  0x7f0da6e99537 in qmp_transaction (dev_list=0x7f0da98abb40, has_props=false,
    props=0x7f0daa2788c0, errp=0x7ffec3893fc0) at blockdev.c:2205
#6  0x7f0da6ebee21 in qmp_marshal_transaction (args=0x7f0da9e7b700, ret=0x7ffec3894030,
    errp=0x7ffec3894028) at qmp-marshal.c:5952
#7  0x7f0da71d9783 in do_qmp_dispatch (cmds=0x7f0da785f940 , request=0x7f0da87b8400,
    errp=0x7ffec3894090) at qapi/qmp-dispatch.c:104
#8  0x7f0da71d98bb in qmp_dispatch (cmds=0x7f0da785f940 , request=0x7f0da87b8400)
    at qapi/qmp-dispatch.c:131
#9  0x7f0da6d6a0a1 in handle_qmp_command (parser=0x7f0da874c150, tokens=0x7f0da870e2e0)
    at /root/qemu/monitor.c:3834
#10 0x7f0da71e0c62 in json_message_process_token (lexer=0x7f0da874c158, input=0x7f0da870e200,
    type=JSON_RCURLY, x=422, y=35) at qobject/json-streamer.c:105
#11 0x7f0da720ba27 in json_lexer_feed_char (lexer=0x7f0da874c158, ch=125 '}', flush=false)
    at qobject/json-lexer.c:319
#12 0x7f0da720bb67 in json_lexer_feed (lexer=0x7f0da874c158, buffer=0x7ffec3894320 "}", size=1)
    at qobject/json-lexer.c:369
#13 0x7f0da71e0d02 in json_message_parser_feed (parser=0x7f0da874c150, buffer=0x7ffec3894320 "}",
    size=1) at qobject/json-streamer.c:124
#14 0x7f0da6d6a263 in monitor_qmp_read (opaque=0x7f0da874c0d0, buf=0x7ffec3894320 "}", size=1)
    at /root/qemu/monitor.c:3876
#15 0x7f0da717af53 in qemu_chr_be_write_impl (s=0x7f0da871c3f0, buf=0x7ffec3894320 "}", len=1)
    at chardev/ch

Re: [Qemu-devel] [PATCH v3 3/4] migration: avoid recursive AioContext locking in save_vmstate()

2017-06-14 Thread Pavel Butsykin


On 22.05.2017 16:57, Stefan Hajnoczi wrote:

AioContext was designed to allow nested acquire/release calls.  It uses
a recursive mutex so callers don't need to worry about nesting...or so
we thought.

BDRV_POLL_WHILE() is used to wait for block I/O requests.  It releases
the AioContext temporarily around aio_poll().  This gives IOThreads a
chance to acquire the AioContext to process I/O completions.

It turns out that recursive locking and BDRV_POLL_WHILE() don't mix.
BDRV_POLL_WHILE() only releases the AioContext once, so the IOThread
will not be able to acquire the AioContext if it was acquired
multiple times.

Instead of trying to release AioContext n times in BDRV_POLL_WHILE(),
this patch simply avoids nested locking in save_vmstate().  It's the
simplest fix and we should step back to consider the big picture with
all the recent changes to block layer threading.

This patch is the final fix to solve 'savevm' hanging with -object
iothread.


I see the same thing in external_snapshot_prepare():
    /* Acquire AioContext now so any threads operating on old_bs stop */
    state->aio_context = bdrv_get_aio_context(state->old_bs);
    aio_context_acquire(state->aio_context);
    bdrv_drained_begin(state->old_bs);

    if (!bdrv_is_inserted(state->old_bs)) {
        error_setg(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
        return;
    }

    if (bdrv_op_is_blocked(state->old_bs,
                           BLOCK_OP_TYPE_EXTERNAL_SNAPSHOT, errp)) {
        return;
    }

    if (!bdrv_is_read_only(state->old_bs)) {
        if (bdrv_flush(state->old_bs)) {  <---!!!

and at the moment of BDRV_POLL_WHILE(bs, flush_co.ret == NOT_DONE) we hold
the lock at least twice, so here is another deadlock.


Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Reviewed-by: Paolo Bonzini 
---
  migration/savevm.c | 12 +++-
  1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index f5e8194..3ca319f 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2150,6 +2150,14 @@ int save_vmstate(const char *name, Error **errp)
         goto the_end;
     }
 
+    /* The bdrv_all_create_snapshot() call that follows acquires the AioContext
+     * for itself.  BDRV_POLL_WHILE() does not support nested locking because
+     * it only releases the lock once.  Therefore synchronous I/O will deadlock
+     * unless we release the AioContext before bdrv_all_create_snapshot().
+     */
+    aio_context_release(aio_context);
+    aio_context = NULL;
+
     ret = bdrv_all_create_snapshot(sn, bs, vm_state_size, &bs);
     if (ret < 0) {
         error_setg(errp, "Error while creating snapshot on '%s'",
@@ -2160,7 +2168,9 @@ int save_vmstate(const char *name, Error **errp)
     ret = 0;
 
  the_end:
-    aio_context_release(aio_context);
+    if (aio_context) {
+        aio_context_release(aio_context);
+    }
     if (saved_vm_running) {
         vm_start();
     }





[Qemu-devel] [PATCH v2 4/4] qemu-iotests: add shrinking image test

2017-06-13 Thread Pavel Butsykin
Signed-off-by: Pavel Butsykin 
---
 tests/qemu-iotests/163 | 113 +
 tests/qemu-iotests/163.out |   5 ++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 119 insertions(+)
 create mode 100644 tests/qemu-iotests/163
 create mode 100644 tests/qemu-iotests/163.out

diff --git a/tests/qemu-iotests/163 b/tests/qemu-iotests/163
new file mode 100644
index 00..2cb0116173
--- /dev/null
+++ b/tests/qemu-iotests/163
@@ -0,0 +1,113 @@
+#!/usr/bin/env python
+#
+# Tests for shrinking images
+#
+# Copyright (c) 2016-2017 Parallels International GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import os, random, iotests
+from iotests import qemu_img, qemu_io, image_size
+
+test_img = os.path.join(iotests.test_dir, 'test.img')
+check_img = os.path.join(iotests.test_dir, 'check.img')
+
+def size_to_int(str):
+    suff = ['B', 'K', 'M', 'G', 'T']
+    return int(str[:-1]) * 1024**suff.index(str[-1:])
+
+class TestShrink(iotests.QMPTestCase):
+    image_len = '1G'
+    shrink_size = '128M'
+    chank_size = '256M'
+
+    def setUp(self):
+        qemu_img('create', '-f', iotests.imgfmt, test_img, TestShrink.image_len)
+        qemu_img('create', '-f', iotests.imgfmt, check_img,
+                 TestShrink.shrink_size)
+
+    def tearDown(self):
+        os.remove(test_img)
+        os.remove(check_img)
+
+    def image_verify(self):
+        self.assertEqual(image_size(test_img), image_size(check_img),
+                         "Verifying image size")
+
+        if iotests.imgfmt == 'raw':
+            return
+
+        self.assertEqual(qemu_img('check', test_img),
+                         qemu_img('check', check_img),
+                         "Verifying image corruption")
+
+    def test_empty_image(self):
+        qemu_img('resize',  '-f', iotests.imgfmt, '--shrink', test_img,
+                 TestShrink.shrink_size)
+
+        self.assertEqual(
+            qemu_io('-c', 'read -P 0x00 %s'%TestShrink.shrink_size, test_img),
+            qemu_io('-c', 'read -P 0x00 %s'%TestShrink.shrink_size, check_img),
+            "Verifying image content")
+
+        TestShrink.image_verify(self)
+
+    def test_sequential_write(self):
+        for offs in range(0, size_to_int(TestShrink.image_len),
+                          size_to_int(TestShrink.chank_size)):
+            qemu_io('-c', 'write -P 0xff %d %s' % (offs, TestShrink.chank_size),
+                    test_img)
+
+        qemu_img('resize',  '-f', iotests.imgfmt, '--shrink', test_img,
+                 TestShrink.shrink_size)
+
+        self.assertEqual(
+            qemu_io('-c', 'read -P 0xff %s'%TestShrink.image_len, test_img),
+            qemu_io('-c', 'read -P 0xff %s'%TestShrink.image_len, check_img),
+            "Verifying image content")
+
+        self.assertEqual(
+            qemu_io('-c', 'read -P 0xff %s'%TestShrink.shrink_size, test_img),
+            qemu_io('-c', 'read -P 0xff %s'%TestShrink.shrink_size, check_img),
+            "Verifying image content")
+
+        TestShrink.image_verify(self)
+
+    def test_random_write(self):
+        offs_list = list(range(0, size_to_int(TestShrink.image_len),
+                               size_to_int(TestShrink.chank_size)))
+        random.shuffle(offs_list)
+        for offs in offs_list:
+            qemu_io('-c', 'write -P 0xff %d %s' % (offs, TestShrink.chank_size),
+                    test_img)
+
+        qemu_img('resize',  '-f', iotests.imgfmt, '--shrink', test_img,
+                 TestShrink.shrink_size)
+
+        self.assertEqual(
+            qemu_io('-c', 'read -P 0xff %s'%TestShrink.image_len, test_img),
+            qemu_io('-c', 'read -P 0xff %s'%TestShrink.image_len, check_img),
+            "Verifying image content")
+
+        self.assertEqual(
+            qemu_io('-c', 'read -P 0xff %s'%TestShrink.

[Qemu-devel] [PATCH v2 3/4] qcow2: add shrink image support

2017-06-13 Thread Pavel Butsykin
This patch adds shrinking of qcow2 image files. This allows us to reduce the
virtual image size and free up disk space without copying the image. The image
can be fragmented; the shrink is done by punching holes in the image file.

Signed-off-by: Pavel Butsykin 
---
 block/qcow2-cluster.c  | 42 
 block/qcow2-refcount.c | 65 ++
 block/qcow2.c  | 40 +++
 block/qcow2.h  |  2 ++
 qapi/block-core.json   |  3 ++-
 5 files changed, 141 insertions(+), 11 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index d779ea19cf..a84b7e607e 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -32,6 +32,48 @@
 #include "qemu/bswap.h"
 #include "trace.h"
 
+int qcow2_shrink_l1_table(BlockDriverState *bs, uint64_t max_size)
+{
+    BDRVQcow2State *s = bs->opaque;
+    int new_l1_size, i, ret;
+
+    if (max_size >= s->l1_size) {
+        return 0;
+    }
+
+    new_l1_size = max_size;
+
+#ifdef DEBUG_ALLOC2
+    fprintf(stderr, "shrink l1_table from %d to %d\n",
+            s->l1_size, new_l1_size);
+#endif
+
+    BLKDBG_EVENT(bs->file, BLKDBG_L1_SHRINK_WRITE_TABLE);
+    ret = bdrv_pwrite_zeroes(bs->file, s->l1_table_offset +
+                                       sizeof(uint64_t) * new_l1_size,
+                             (s->l1_size - new_l1_size) * sizeof(uint64_t), 0);
+    if (ret < 0) {
+        return ret;
+    }
+
+    ret = bdrv_flush(bs->file->bs);
+    if (ret < 0) {
+        return ret;
+    }
+
+    BLKDBG_EVENT(bs->file, BLKDBG_L1_SHRINK_FREE_L2_CLUSTERS);
+    for (i = s->l1_size - 1; i > new_l1_size - 1; i--) {
+        if ((s->l1_table[i] & L1E_OFFSET_MASK) == 0) {
+            continue;
+        }
+        qcow2_free_clusters(bs, s->l1_table[i] & L1E_OFFSET_MASK,
+                            s->l2_size * sizeof(uint64_t),
+                            QCOW2_DISCARD_ALWAYS);
+        s->l1_table[i] = 0;
+    }
+    return 0;
+}
+
 int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
 bool exact_size)
 {
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 576ab551d6..e98306acd8 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -29,6 +29,7 @@
 #include "block/qcow2.h"
 #include "qemu/range.h"
 #include "qemu/bswap.h"
+#include "qemu/cutils.h"
 
 static int64_t alloc_clusters_noref(BlockDriverState *bs, uint64_t size);
 static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
@@ -2936,3 +2937,67 @@ done:
 qemu_vfree(new_refblock);
 return ret;
 }
+
+int qcow2_shrink_reftable(BlockDriverState *bs)
+{
+    BDRVQcow2State *s = bs->opaque;
+    uint64_t *reftable_tmp =
+        g_try_malloc(sizeof(uint64_t) * s->refcount_table_size);
+    int i, ret;
+
+    if (s->refcount_table_size && reftable_tmp == NULL) {
+        return -ENOMEM;
+    }
+
+    for (i = 0; i < s->refcount_table_size; i++) {
+        int64_t refblock_offs = s->refcount_table[i] & REFT_OFFSET_MASK;
+        void *refblock;
+        bool unused_block;
+
+        if (refblock_offs == 0) {
+            reftable_tmp[i] = 0;
+            continue;
+        }
+        ret = qcow2_cache_get(bs, s->refcount_block_cache, refblock_offs,
+                              &refblock);
+        if (ret < 0) {
+            goto out;
+        }
+
+        /* the refblock has its own reference */
+        if (i == refblock_offs >> (s->refcount_block_bits + s->cluster_bits)) {
+            uint64_t blk_index = (refblock_offs >> s->cluster_bits) &
+                                 (s->refcount_block_size - 1);
+            uint64_t refcount = s->get_refcount(refblock, blk_index);
+
+            s->set_refcount(refblock, blk_index, 0);
+
+            unused_block = buffer_is_zero(refblock, s->cluster_size);
+
+            s->set_refcount(refblock, blk_index, refcount);
+        } else {
+            unused_block = buffer_is_zero(refblock, s->cluster_size);
+        }
+        qcow2_cache_put(bs, s->refcount_block_cache, &refblock);
+
+        reftable_tmp[i] = unused_block ? 0 : cpu_to_be64(s->refcount_table[i]);
+    }
+
+    ret = bdrv_pwrite_sync(bs->file, s->refcount_table_offset, reftable_tmp,
+                           sizeof(uint64_t) * s->refcount_table_size);
+    if (ret < 0) {
+        goto out;
+    }
+
+    for (i = 0; i < s->refcount_table_size; i++) {
+        if (s->refcount_table[i] && !reftable_tmp[i]) {
+            qcow2_free_clusters(bs, s->refcount_table[i] & REFT_OFFSET_MASK,
+                                s->cluster_size, QCOW2_DISCARD_ALWAYS);
+            s->refcount_tab

[Qemu-devel] [PATCH v2 1/4] qemu-img: add --shrink flag for resize

2017-06-13 Thread Pavel Butsykin
The flag is an additional precaution against data loss. Perhaps in the future
shrinking without this flag will be forbidden, but for now we need to maintain
compatibility.

Signed-off-by: Pavel Butsykin 
---
 qemu-img-cmds.hx   |  4 ++--
 qemu-img.c | 15 +++
 qemu-img.texi  |  5 -
 tests/qemu-iotests/102 |  4 ++--
 4 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index a39fcdba71..3b2eab9d20 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -76,9 +76,9 @@ STEXI
 ETEXI
 
 DEF("resize", img_resize,
-    "resize [--object objectdef] [--image-opts] [-q] filename [+ | -]size")
+    "resize [--object objectdef] [--image-opts] [-q] [--shrink] filename [+ | -]size")
 STEXI
-@item resize [--object @var{objectdef}] [--image-opts] [-q] @var{filename} [+ | -]@var{size}
+@item resize [--object @var{objectdef}] [--image-opts] [-q] [--shrink] @var{filename} [+ | -]@var{size}
 ETEXI
 
 DEF("amend", img_amend,
diff --git a/qemu-img.c b/qemu-img.c
index 0ad698d7f1..bfe5f61b0b 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -61,6 +61,7 @@ enum {
     OPTION_FLUSH_INTERVAL = 261,
     OPTION_NO_DRAIN = 262,
     OPTION_TARGET_IMAGE_OPTS = 263,
+    OPTION_SHRINK = 264,
 };
 
 typedef enum OutputFormat {
@@ -3452,6 +3453,7 @@ static int img_resize(int argc, char **argv)
         },
     };
     bool image_opts = false;
+    bool shrink = false;
 
     /* Remove size from argv manually so that negative numbers are not treated
      * as options by getopt. */
@@ -3469,6 +3471,7 @@ static int img_resize(int argc, char **argv)
             {"help", no_argument, 0, 'h'},
             {"object", required_argument, 0, OPTION_OBJECT},
             {"image-opts", no_argument, 0, OPTION_IMAGE_OPTS},
+            {"shrink", no_argument, 0, OPTION_SHRINK},
             {0, 0, 0, 0}
         };
         c = getopt_long(argc, argv, ":f:hq",
@@ -3503,6 +3506,9 @@ static int img_resize(int argc, char **argv)
         case OPTION_IMAGE_OPTS:
             image_opts = true;
             break;
+        case OPTION_SHRINK:
+            shrink = true;
+            break;
         }
     }
     if (optind != argc - 1) {
@@ -3562,6 +3568,15 @@ static int img_resize(int argc, char **argv)
         goto out;
     }
 
+    if (total_size < blk_getlength(blk) && !shrink) {
+        qprintf(quiet, "Warning: shrinking of the image can lead to data loss. "
+                       "Before performing shrink operation you must make sure "
+                       "that the shrink part of image doesn't contain important"
+                       " data.\n");
+        qprintf(quiet,
+                "If you don't want to see this message use --shrink option.\n");
+    }
+
     ret = blk_truncate(blk, total_size, &err);
     if (!ret) {
         qprintf(quiet, "Image resized.\n");
diff --git a/qemu-img.texi b/qemu-img.texi
index 5b925ecf41..c2b694cd00 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -499,7 +499,7 @@ qemu-img rebase -b base.img diff.qcow2
 At this point, @code{modified.img} can be discarded, since
 @code{base.img + diff.qcow2} contains the same information.
 
-@item resize @var{filename} [+ | -]@var{size}
+@item resize [--shrink] @var{filename} [+ | -]@var{size}
 
 Change the disk image as if it had been created with @var{size}.
 
@@ -507,6 +507,9 @@ Before using this command to shrink a disk image, you MUST use file system and
 partitioning tools inside the VM to reduce allocated file systems and partition
 sizes accordingly.  Failure to do so will result in data loss!
 
+If @code{--shrink} is specified, the warning about data loss is not printed for
+the shrink operation.
+
 After using this command to grow a disk image, you must use file system and
 partitioning tools inside the VM to actually begin using the new space on the
 device.
diff --git a/tests/qemu-iotests/102 b/tests/qemu-iotests/102
index 87db1bb1bf..d7ad8d9840 100755
--- a/tests/qemu-iotests/102
+++ b/tests/qemu-iotests/102
@@ -54,7 +54,7 @@ _make_test_img $IMG_SIZE
 $QEMU_IO -c 'write 0 64k' "$TEST_IMG" | _filter_qemu_io
 # Remove data cluster from image (first cluster: image header, second: reftable,
 # third: refblock, fourth: L1 table, fifth: L2 table)
-$QEMU_IMG resize -f raw "$TEST_IMG" $((5 * 64 * 1024))
+$QEMU_IMG resize -f raw --shrink "$TEST_IMG" $((5 * 64 * 1024))
 
 $QEMU_IO -c map "$TEST_IMG"
 $QEMU_IMG map "$TEST_IMG"
@@ -69,7 +69,7 @@ $QEMU_IO -c 'write 0 64k' "$TEST_IMG" | _filter_qemu_io
 
 qemu_comm_method=monitor _launch_qemu -drive if=none,file="$TEST_IMG",id=drv0
 
-$QEMU_IMG resize -f raw "$TEST_IMG" $((5 * 64 * 1024))
+$QEMU_IMG resize -f raw --shrink "$TEST_IMG" $((5 * 64 * 1024))
 
 _send_qemu_cmd $QEMU_HANDLE 'qemu-io drv0 map' 'allocated' \
 | sed -e 's/^(qemu).*qemu-io drv0 map...$/(qemu) qemu-io drv0 map/'
-- 
2.13.0




[Qemu-devel] [PATCH v2 2/4] qcow2: add qcow2_cache_discard

2017-06-13 Thread Pavel Butsykin
Whenever L2/refcount table clusters are discarded from the file, we can
automatically drop their now-unnecessary content from the table caches. This
reduces the chance of evicting useful cache data and eliminates cache data
that is inconsistent with the data in the file.

Signed-off-by: Pavel Butsykin 
---
 block/qcow2-cache.c| 21 +
 block/qcow2-refcount.c |  5 +
 block/qcow2.h  |  1 +
 3 files changed, 27 insertions(+)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index 1d25147392..7931edf237 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -411,3 +411,24 @@ void qcow2_cache_entry_mark_dirty(BlockDriverState *bs, Qcow2Cache *c,
     assert(c->entries[i].offset != 0);
     c->entries[i].dirty = true;
 }
+
+void qcow2_cache_discard(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset)
+{
+    int i;
+
+    for (i = 0; i < c->size; i++) {
+        if (c->entries[i].offset == offset) {
+            goto found; /* table offset */
+        }
+    }
+    return;
+
+found:
+    assert(c->entries[i].ref == 0);
+
+    c->entries[i].offset = 0;
+    c->entries[i].lru_counter = 0;
+    c->entries[i].dirty = false;
+
+    qcow2_cache_table_release(bs, c, i, 1);
+}
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 7c06061aae..576ab551d6 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -767,6 +767,11 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
         s->set_refcount(refcount_block, block_index, refcount);
 
         if (refcount == 0 && s->discard_passthrough[type]) {
+            qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
+            qcow2_cache_discard(bs, s->refcount_block_cache, offset);
+
+            qcow2_cache_discard(bs, s->l2_table_cache, offset);
+
             update_refcount_discard(bs, cluster_offset, s->cluster_size);
         }
     }
diff --git a/block/qcow2.h b/block/qcow2.h
index 1801dc30dc..07faa6dc78 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -597,5 +597,6 @@ int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
 int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
     void **table);
 void qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
+void qcow2_cache_discard(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset);
 
 #endif
-- 
2.13.0




[Qemu-devel] [PATCH v2 0/4] Add shrink image for qcow2

2017-06-13 Thread Pavel Butsykin
This patch adds shrinking of qcow2 image files. This allows us to reduce the
virtual image size and free up disk space without copying the image. The image
can be fragmented; the shrink is done by punching holes in the image file.

# ./qemu-img create -f qcow2 -o size=4G image.qcow2
Formatting 'image.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16

# ./qemu-io -c "write -P 0x22 0 1G" image.qcow2
wrote 1073741824/1073741824 bytes at offset 0
1 GiB, 1 ops; 0:00:02.59 (395.180 MiB/sec and 0.3859 ops/sec)

# ./qemu-img resize image.qcow2 512M
Warning: shrinking of the image can lead to data loss. Before performing shrink operation you must make sure that the shrink part of image doesn't contain important data.
If you don't want to see this message use --shrink option.
Image resized.

# ./qemu-img resize --shrink image.qcow2 128M
Image resized.

# ./qemu-img info image.qcow2
image: image.qcow2
file format: qcow2
virtual size: 128M (134217728 bytes)
disk size: 128M
cluster_size: 65536
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false

# du -h ./image.qcow2
129M    ./image.qcow2

Changes from v1:
- add --shrink flag for qemu-img resize
- add qcow2_cache_discard
- simplify qcow2_shrink_l1_table() to reduce the likelihood of image corruption
- add new qemu-iotests for shrinking images

Pavel Butsykin (4):
  qemu-img: add --shrink flag for resize
  qcow2: add qcow2_cache_discard
  qcow2: add shrink image support
  qemu-iotests: add shrinking image test

 block/qcow2-cache.c|  21 +
 block/qcow2-cluster.c  |  42 +
 block/qcow2-refcount.c |  70 
 block/qcow2.c  |  40 
 block/qcow2.h  |   3 ++
 qapi/block-core.json   |   3 +-
 qemu-img-cmds.hx   |   4 +-
 qemu-img.c |  15 ++
 qemu-img.texi  |   5 +-
 tests/qemu-iotests/102 |   4 +-
 tests/qemu-iotests/163 | 113 +
 tests/qemu-iotests/163.out |   5 ++
 tests/qemu-iotests/group   |   1 +
 13 files changed, 310 insertions(+), 16 deletions(-)
 create mode 100644 tests/qemu-iotests/163
 create mode 100644 tests/qemu-iotests/163.out

-- 
2.13.0




Re: [Qemu-devel] [PATCH 1/2] qcow2: add reduce image support

2017-06-02 Thread Pavel Butsykin



On 01.06.2017 17:41, Kevin Wolf wrote:

On 31.05.2017 at 16:43, Pavel Butsykin wrote:

This patch adds support for reducing qcow2 image files. This allows us to
reduce the virtual image size and free up disk space without copying the image.
The image can be fragmented; the reduction is done by punching holes in the
image file.

Signed-off-by: Pavel Butsykin 
---
  block/qcow2-cache.c|  8 +
  block/qcow2-cluster.c  | 83 ++
  block/qcow2-refcount.c | 65 +++
  block/qcow2.c  | 40 ++--
  block/qcow2.h  |  4 +++
  qapi/block-core.json   |  4 ++-
  6 files changed, 193 insertions(+), 11 deletions(-)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index 1d25147392..da55118ca7 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -411,3 +411,11 @@ void qcow2_cache_entry_mark_dirty(BlockDriverState *bs, Qcow2Cache *c,
     assert(c->entries[i].offset != 0);
     c->entries[i].dirty = true;
 }
+
+void qcow2_cache_entry_mark_clean(BlockDriverState *bs, Qcow2Cache *c,
+                                  void *table)
+{
+    int i = qcow2_cache_get_table_idx(bs, c, table);
+    assert(c->entries[i].offset != 0);
+    c->entries[i].dirty = false;
+}


This is an interesting function. We can use it whenever we're not
interested in the content of the table any more. However, we still keep
that data in the cache and may even evict other tables before this one.
The data in the cache also becomes inconsistent with the data in the
file, which should not be a problem in theory (because nobody should be
using it), but it surely could be confusing when debugging something in
the cache.



Good idea!


We can easily improve this a little: Make it qcow2_cache_discard(), a
function that gets a cluster offset, asserts that a table at this
offset isn't in use (not cached or ref == 0), and then just directly
drops it from the cache. This can be called from update_refcount()
whenever a refcount goes to 0, immediately before or after calling
update_refcount_discard() - those two are closely related. Then this
would automatically also be used for L2 tables.



Did I understand correctly that every time we need to check the incoming
offset, to make sure it points to an L2/refcount table (and not to guest
data)?



Adding this mechanism could be a patch of its own

...


Kevin





Re: [Qemu-devel] [PATCH 0/2] Add reduce image for qcow2

2017-05-31 Thread Pavel Butsykin


On 31.05.2017 19:10, Richard W.M. Jones wrote:

On Wed, May 31, 2017 at 06:54:33PM +0300, Pavel Butsykin wrote:

It is assumed that the user has already prepared the image:
1. freeing space at the end of the image
2. reducing the last partition on the disk
3. rebuilding the file system
Only after these steps can the user reduce the image with qemu-img.


It's tricky with GPT, as GPT puts a second copy of the header in the
last few sectors of the disk.  I always advise people to use trim (or
virt-sparsify) instead of trying to adjust virtual disk size.


virt-sparsify is a very useful utility, I use it too :) I agree that trim is
much better if the only goal is to free up space on the host disk. But there
is another goal: limiting further growth of the image.



Nevertheless I don't think we should prevent shrinking qcow2 virtual
size.  It's likely useful to someone.

Rich.





Re: [Qemu-devel] [PATCH 0/2] Add reduce image for qcow2

2017-05-31 Thread Pavel Butsykin

On 31.05.2017 19:03, Max Reitz wrote:

On 2017-05-31 17:54, Pavel Butsykin wrote:

On 31.05.2017 18:03, Eric Blake wrote:

On 05/31/2017 09:43 AM, Pavel Butsykin wrote:

This patch adds support for reducing qcow2 image files. This allows us to
reduce the virtual image size and free up disk space without copying the image.
The image can be fragmented; the reduction is done by punching holes in the
image file.

# ./qemu-img create -f qcow2 -o size=4G image.qcow2
Formatting 'image.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16

# ./qemu-io -c "write -P 0x22 0 1G" image.qcow2


So this is 1G of guest-visible data...


# ./qemu-img resize image.qcow2 128M
Image resized.


...and you are truncating the image by throwing away guest-visible
content, with no warning or double-checking (such as requiring a -f
force parameter or something) about the data loss.  Shrinking images is
something we should allow, but it feels like a rare enough operation
that you don't want it to be casually easy to throw away data.


It is assumed that the user has already prepared the image:
1. freeing space at the end of the image
2. reducing the last partition on the disk
3. rebuilding the file system
Only after these steps can the user reduce the image with qemu-img.


But your implementation allows the user to reduce it anyway, even if
these steps have not been performed.

I very much agree that we have to be careful because otherwise you might
ruin all your data if you hand-type a resize command and drop a digit.



We could check that the part of the image being removed doesn't contain
non-zero clusters and just print a warning. But on the other hand, if the
user has not performed a trim, there will still be dirty clusters at the
end of the disk, and we can't force users to trim :)

We can add a --force flag and, without it, just print a warning.


I don't think it's such a rare case; sometimes people run out of disk space,
and this is another way to solve the problem (along with trim).

We already have all the interfaces; they only need to be supported :)


Is it feasible to require that a shrink operation will not be performed
unless all clusters being eliminated have been previously discarded (or
maybe written to zero), as an assurance that the guest did not care
about the tail of the image?



Yes.

# ./qemu-img create -f qcow2 -o size=4G image.qcow2

# ./qemu-io -c "write -P 0x22 0 1G" image.qcow2
# ./qemu-io -c "write -P 0x22 1G 1G" image.qcow2

# qemu-img map ./image.qcow2
Offset  Length  Mapped to   File
0   0x2000  0x5 ./image.qcow2
0x2000  0x2000  0x2006  ./image.qcow2
0x4000  0x2000  0x4007  ./image.qcow2
0x6000  0x2000  0x6008  ./image.qcow2

# ./qemu-io -c "discard 1G 1G" ./image.qcow2

# qemu-img map ./image.qcow2
Offset  Length  Mapped to   File
0   0x2000  0x5 ./image.qcow2
0x2000  0x2000  0x2006  ./image.qcow2


No, it isn't, because qemu-io is a debugging tool and a debugging tool only.

We could require the user to perform a trim operation or something in
the guest instead, but I'd prefer a plain new flag for qemu-img resize
that says the user is OK with shrinking the image and thus throwing data
away.

I think it's fine to have this flag only as part of the qemu-img
interface, not e.g. for the block-resize QMP command. I think it's
reasonable to assume that someone sending a QMP command (i.e. usually the
management layer) knows exactly what they are doing. OTOH, I wouldn't
oppose a flag there; I just don't think it's absolutely necessary.



I agree that the flag can serve only as protection against human error.


Max





Re: [Qemu-devel] [PATCH 0/2] Add reduce image for qcow2

2017-05-31 Thread Pavel Butsykin

On 31.05.2017 18:03, Eric Blake wrote:

On 05/31/2017 09:43 AM, Pavel Butsykin wrote:

This patch adds support for reducing qcow2 image files. This allows us to
reduce the virtual image size and free up disk space without copying the image.
The image can be fragmented; the reduction is done by punching holes in the
image file.

# ./qemu-img create -f qcow2 -o size=4G image.qcow2
Formatting 'image.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16

# ./qemu-io -c "write -P 0x22 0 1G" image.qcow2


So this is 1G of guest-visible data...


# ./qemu-img resize image.qcow2 128M
Image resized.


...and you are truncating the image by throwing away guest-visible
content, with no warning or double-checking (such as requiring a -f
force parameter or something) about the data loss.  Shrinking images is
something we should allow, but it feels like a rare enough operation
that you don't want it to be casually easy to throw away data.


It is assumed that the user has already prepared the image:
1. freeing space at the end of the image
2. reducing the last partition on the disk
3. rebuilding the file system
Only after these steps can the user reduce the image with qemu-img.

I don't think it's such a rare case; sometimes people run out of disk space,
and this is another way to solve the problem (along with trim).

We already have all the interfaces; they only need to be supported :)


Is it feasible to require that a shrink operation will not be performed
unless all clusters being eliminated have been previously discarded (or
maybe written to zero), as an assurance that the guest did not care
about the tail of the image?



Yes.

# ./qemu-img create -f qcow2 -o size=4G image.qcow2

# ./qemu-io -c "write -P 0x22 0 1G" image.qcow2
# ./qemu-io -c "write -P 0x22 1G 1G" image.qcow2

# qemu-img map ./image.qcow2
Offset  Length  Mapped to   File
0   0x2000  0x5 ./image.qcow2
0x2000  0x2000  0x2006  ./image.qcow2
0x4000  0x2000  0x4007  ./image.qcow2
0x6000  0x2000  0x6008  ./image.qcow2

# ./qemu-io -c "discard 1G 1G" ./image.qcow2

# qemu-img map ./image.qcow2
Offset  Length  Mapped to   File
0   0x2000  0x5 ./image.qcow2
0x2000  0x2000  0x2006  ./image.qcow2



Re: [Qemu-devel] [PATCH 2/2] qemu-iotests: add reducing image test in 025

2017-05-31 Thread Pavel Butsykin



On 31.05.2017 17:43, Pavel Butsykin wrote:

Signed-off-by: Pavel Butsykin 
---
  tests/qemu-iotests/025 | 19 +--
  tests/qemu-iotests/025.out | 12 +++-
  2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/025 b/tests/qemu-iotests/025
index f5e672e6b3..658601579b 100755
--- a/tests/qemu-iotests/025
+++ b/tests/qemu-iotests/025
@@ -38,7 +38,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
  . ./common.filter
  . ./common.pattern
  
-_supported_fmt raw qcow2 qed
+_supported_fmt raw qcow2

I'm not sure whether I can drop QED here so blatantly, but this place is
very suitable for the image reduction case. An alternative would be to add
a new test; I just didn't want to duplicate tests that check almost the
same thing.


  _supported_proto file sheepdog rbd nfs
  _supported_os Linux
  
@@ -46,6 +46,7 @@ echo "=== Creating image"
  echo
  small_size=$((128 * 1024 * 1024))
  big_size=$((384 * 1024 * 1024))
+bigger_size=$((512 * 1024 * 1024))
  _make_test_img $small_size
  
  echo
@@ -54,7 +55,20 @@ io_pattern write 0 $small_size 0 1 0xc5
  _check_test_img
  
  echo
-echo "=== Resizing image"
+echo "=== Growing image"
+$QEMU_IO "$TEST_IMG" <  
  # success, all done
  echo "*** done"
diff --git a/tests/qemu-iotests/025.out b/tests/qemu-iotests/025.out
index f13fc2863c..a0293711c7 100644
--- a/tests/qemu-iotests/025.out
+++ b/tests/qemu-iotests/025.out
@@ -9,8 +9,16 @@ wrote 134217728/134217728 bytes at offset 0
  128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
  No errors were found on the image.
  
-=== Resizing image
+=== Growing image
  128 MiB
+512 MiB
+No errors were found on the image.
+
+=== Verifying image size after reopen
+512 MiB
+
+=== Reducing image
+512 MiB
  384 MiB
  No errors were found on the image.
  
@@ -24,4 +32,6 @@ read 134217728/134217728 bytes at offset 0
  === IO: pattern 0
  read 268435456/268435456 bytes at offset 134217728
  256 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+=== IO: pattern 0
+read failed: Input/output error
  *** done





[Qemu-devel] [PATCH 0/2] Add reduce image for qcow2

2017-05-31 Thread Pavel Butsykin
This patch adds support for shrinking qcow2 images. This makes it possible
to reduce the virtual image size and free up disk space without copying the
image. The image can be fragmented; the reduction is done by punching holes
in the image file.

# ./qemu-img create -f qcow2 -o size=4G image.qcow2
Formatting 'image.qcow2', fmt=qcow2 size=4294967296 encryption=off 
cluster_size=65536 lazy_refcounts=off refcount_bits=16

# ./qemu-io -c "write -P 0x22 0 1G" image.qcow2
wrote 1073741824/1073741824 bytes at offset 0
1 GiB, 1 ops; 0:00:02.59 (395.180 MiB/sec and 0.3859 ops/sec)

# ./qemu-img resize image.qcow2 128M
Image resized.

# qemu-img info image.qcow2 
image: image.qcow2
file format: qcow2
virtual size: 128M (134217728 bytes)
disk size: 128M
cluster_size: 65536
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false

# du -h image.qcow2 
129M    image.qcow2

Pavel Butsykin (2):
  qcow2: add reduce image support
  qemu-iotests: add reducing image test in 025

 block/qcow2-cache.c|  8 +
 block/qcow2-cluster.c  | 83 ++
 block/qcow2-refcount.c | 65 
 block/qcow2.c  | 40 --
 block/qcow2.h  |  4 +++
 qapi/block-core.json   |  4 ++-
 tests/qemu-iotests/025 | 19 +--
 tests/qemu-iotests/025.out | 12 ++-
 8 files changed, 221 insertions(+), 14 deletions(-)

-- 
2.11.0
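
On Linux, the primitive for punching holes in a file is fallocate(2) with FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE: the byte range is deallocated, the file size stays the same, and the range reads back as zeros. A standalone sketch on a temporary file; the helper names and the 64 KiB "cluster" are illustrative assumptions, and the patch itself frees clusters through qcow2's discard path (qcow2_free_clusters() with QCOW2_DISCARD_ALWAYS) rather than calling fallocate() directly.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

enum { DEMO_CLUSTER = 65536 };

/* Deallocate [offset, offset + len) without changing the file size; the
 * range reads back as zeros afterwards. Returns 0 or -errno. */
static int punch_hole(int fd, off_t offset, off_t len)
{
    assert(offset >= 0 && len > 0);
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  offset, len) < 0) {
        return -errno;
    }
    return 0;
}

/* Writes three 64 KiB "clusters" of 0x22 to a temporary file, punches out
 * the middle one and checks the result. Returns 0 on success, 1 if the
 * file system does not support hole punching, -1 on any other failure. */
static int punch_hole_demo(void)
{
    static unsigned char buf[3 * DEMO_CLUSTER];
    char path[] = "/tmp/punch-demo-XXXXXX";
    struct stat st;
    int fd, ret, result = -1;

    fd = mkstemp(path);
    if (fd < 0) {
        return -1;
    }
    unlink(path);                       /* removed once fd is closed */

    memset(buf, 0x22, sizeof(buf));
    if (pwrite(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf)) {
        goto out;
    }
    ret = punch_hole(fd, DEMO_CLUSTER, DEMO_CLUSTER);
    if (ret == -EOPNOTSUPP || ret == -ENOSYS) {
        result = 1;
        goto out;
    }
    if (ret < 0 || fstat(fd, &st) < 0 || st.st_size != (off_t)sizeof(buf) ||
        pread(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf)) {
        goto out;
    }
    result = 0;
    for (size_t i = 0; i < sizeof(buf); i++) {
        int in_hole = i >= DEMO_CLUSTER && i < 2 * DEMO_CLUSTER;
        if (buf[i] != (in_hole ? 0x00 : 0x22)) {
            result = -1;
            break;
        }
    }
out:
    close(fd);
    return result;
}
```

punch_hole_demo() returns 1 rather than failing on file systems that report EOPNOTSUPP, since not every file system supports hole punching.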




[Qemu-devel] [PATCH 2/2] qemu-iotests: add reducing image test in 025

2017-05-31 Thread Pavel Butsykin
Signed-off-by: Pavel Butsykin 
---
 tests/qemu-iotests/025 | 19 +--
 tests/qemu-iotests/025.out | 12 +++-
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/025 b/tests/qemu-iotests/025
index f5e672e6b3..658601579b 100755
--- a/tests/qemu-iotests/025
+++ b/tests/qemu-iotests/025
@@ -38,7 +38,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.filter
 . ./common.pattern
 
-_supported_fmt raw qcow2 qed
+_supported_fmt raw qcow2
 _supported_proto file sheepdog rbd nfs
 _supported_os Linux
 
@@ -46,6 +46,7 @@ echo "=== Creating image"
 echo
 small_size=$((128 * 1024 * 1024))
 big_size=$((384 * 1024 * 1024))
+bigger_size=$((512 * 1024 * 1024))
 _make_test_img $small_size
 
 echo
@@ -54,7 +55,20 @@ io_pattern write 0 $small_size 0 1 0xc5
 _check_test_img
 
 echo
-echo "=== Resizing image"
+echo "=== Growing image"
+$QEMU_IO "$TEST_IMG" <

[Qemu-devel] [PATCH 1/2] qcow2: add reduce image support

2017-05-31 Thread Pavel Butsykin
This patch adds support for shrinking qcow2 images. This makes it possible
to reduce the virtual image size and free up disk space without copying the
image. The image can be fragmented; the reduction is done by punching holes
in the image file.

Signed-off-by: Pavel Butsykin 
---
 block/qcow2-cache.c|  8 +
 block/qcow2-cluster.c  | 83 ++
 block/qcow2-refcount.c | 65 +++
 block/qcow2.c  | 40 ++--
 block/qcow2.h  |  4 +++
 qapi/block-core.json   |  4 ++-
 6 files changed, 193 insertions(+), 11 deletions(-)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index 1d25147392..da55118ca7 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -411,3 +411,11 @@ void qcow2_cache_entry_mark_dirty(BlockDriverState *bs, Qcow2Cache *c,
 assert(c->entries[i].offset != 0);
 c->entries[i].dirty = true;
 }
+
+void qcow2_cache_entry_mark_clean(BlockDriverState *bs, Qcow2Cache *c,
+ void *table)
+{
+int i = qcow2_cache_get_table_idx(bs, c, table);
+assert(c->entries[i].offset != 0);
+c->entries[i].dirty = false;
+}
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 347d94b0d2..47e04d7317 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -32,6 +32,89 @@
 #include "qemu/bswap.h"
 #include "trace.h"
 
+int qcow2_reduce_l1_table(BlockDriverState *bs, uint64_t max_size)
+{
+BDRVQcow2State *s = bs->opaque;
+int64_t new_l1_size_bytes, free_l1_clusters;
+uint64_t *new_l1_table;
+int new_l1_size, i, ret;
+
+if (max_size >= s->l1_size) {
+return 0;
+}
+
+new_l1_size = max_size;
+
+#ifdef DEBUG_ALLOC2
+fprintf(stderr, "reduce l1_table from %d to %d\n",
+s->l1_size, new_l1_size);
+#endif
+
+ret = qcow2_cache_flush(bs, s->l2_table_cache);
+if (ret < 0) {
+return ret;
+}
+
+BLKDBG_EVENT(bs->file, BLKDBG_L1_REDUCE_FREE_L2_CLUSTERS);
+for (i = s->l1_size - 1; i > new_l1_size - 1; i--) {
+if ((s->l1_table[i] & L1E_OFFSET_MASK) == 0) {
+continue;
+}
+qcow2_free_clusters(bs, s->l1_table[i] & L1E_OFFSET_MASK,
+s->l2_size * sizeof(uint64_t),
+QCOW2_DISCARD_ALWAYS);
+}
+
+new_l1_size_bytes = sizeof(uint64_t) * new_l1_size;
+
+BLKDBG_EVENT(bs->file, BLKDBG_L1_REDUCE_WRITE_TABLE);
+ret = bdrv_pwrite_zeroes(bs->file, s->l1_table_offset + new_l1_size_bytes,
+ s->l1_size * sizeof(uint64_t) - new_l1_size_bytes,
+ 0);
+if (ret < 0) {
+return ret;
+}
+
+ret = bdrv_flush(bs->file->bs);
+if (ret < 0) {
+return ret;
+}
+
+/* set new table size */
+BLKDBG_EVENT(bs->file, BLKDBG_L1_REDUCE_ACTIVATE_TABLE);
+new_l1_size = cpu_to_be32(new_l1_size);
+ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, l1_size),
+   &new_l1_size, sizeof(new_l1_size));
+new_l1_size = be32_to_cpu(new_l1_size);
+if (ret < 0) {
+return ret;
+}
+
+BLKDBG_EVENT(bs->file, BLKDBG_L1_REDUCE_FREE_L1_CLUSTERS);
+free_l1_clusters =
+DIV_ROUND_UP(s->l1_size * sizeof(uint64_t), s->cluster_size) -
+DIV_ROUND_UP(new_l1_size_bytes, s->cluster_size);
+if (free_l1_clusters) {
+qcow2_free_clusters(bs, s->l1_table_offset +
+ROUND_UP(new_l1_size_bytes, s->cluster_size),
+free_l1_clusters << s->cluster_bits,
+QCOW2_DISCARD_ALWAYS);
+}
+
+new_l1_table = qemu_try_blockalign(bs->file->bs,
+   align_offset(new_l1_size_bytes, 512));
+if (new_l1_table == NULL) {
+return -ENOMEM;
+}
+memcpy(new_l1_table, s->l1_table, new_l1_size_bytes);
+
+qemu_vfree(s->l1_table);
+s->l1_table = new_l1_table;
+s->l1_size = new_l1_size;
+
+return 0;
+}
+
 int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
 bool exact_size)
 {
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 7c06061aae..5481b623cd 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -29,6 +29,7 @@
 #include "block/qcow2.h"
 #include "qemu/range.h"
 #include "qemu/bswap.h"
+#include "qemu/cutils.h"
 
 static int64_t alloc_clusters_noref(BlockDriverState *bs, uint64_t size);
 static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
@@ -2931,3 +2932,67 @@ done:
 qemu_vfree(new_refblock);
 return ret;
 }
+
+int qcow2_reftable_shrink(BlockDriverState *bs)
+{
+BDRVQcow2State *s = bs->opaque;
+int 
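
The sizes involved in qcow2_reduce_l1_table() follow from the qcow2 layout: an L2 table is one cluster of 8-byte entries, so each L1 entry maps (cluster_size / 8) * cluster_size bytes of guest data, which is 512 MiB with the default 64 KiB clusters. A small sketch with hypothetical helper names; the second helper mirrors the free_l1_clusters expression in the patch, while the first computes only the minimum L1 size for a given virtual size.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Minimum number of L1 entries needed to map 'virtual_size' bytes: each
 * L2 table is one cluster of 8-byte entries and each L2 entry maps one
 * cluster, so one L1 entry covers (cluster_size / 8) * cluster_size bytes. */
static uint64_t l1_entries_for_size(uint64_t virtual_size,
                                    uint64_t cluster_size)
{
    uint64_t l2_entries = cluster_size / sizeof(uint64_t);
    uint64_t bytes_per_l1_entry = l2_entries * cluster_size;

    return SKETCH_DIV_ROUND_UP(virtual_size, bytes_per_l1_entry);
}

/* Clusters of the on-disk L1 table that become free when it shrinks from
 * old_l1_size to new_l1_size entries (same formula as free_l1_clusters). */
static uint64_t freed_l1_table_clusters(uint64_t old_l1_size,
                                        uint64_t new_l1_size,
                                        uint64_t cluster_size)
{
    assert(new_l1_size <= old_l1_size);
    return SKETCH_DIV_ROUND_UP(old_l1_size * sizeof(uint64_t), cluster_size) -
           SKETCH_DIV_ROUND_UP(new_l1_size * sizeof(uint64_t), cluster_size);
}
```

For example, with 64 KiB clusters a 4 GiB image needs 8 L1 entries and a 128 MiB image needs 1, yet shrinking frees no L1 table clusters because the whole 64-byte table fits in one cluster. With 512-byte clusters the same shrink goes from 131072 to 4096 entries and frees 1984 of the table's 2048 clusters.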

Re: [Qemu-devel] [PATCH v2 0/3] migration capability to discard the migrated ram pages

2017-03-07 Thread Pavel Butsykin

On 07.03.2017 16:56, Dr. David Alan Gilbert wrote:

* Pavel Butsykin (pbutsy...@virtuozzo.com) wrote:

On 14.02.2017 17:02, Dr. David Alan Gilbert wrote:

Hi Pavel,
I was curious, having merged this, how you're using postcopy; do you switch
into postcopy mode immediately or wait until the first sync or what?
Do you find yourself in postcopy mode long enough that it's worth
doing the release?  If so on what size VMs are you working with?

Dave


Hi Dave,
I'm sorry it took so long; I was on vacation. And thanks for the quick
review and merge!

We want to use this feature to update QEMU without rebooting VMs. It
works like a local migration on the same host. Currently, we switch into
postcopy mode immediately, and the size of the VMs can vary widely.


Thanks! Do you have libvirt magic to do that update?


Yes, there are patches that add a "--local" flag for migration. The
flag starts the incoming VM and does the other magic needed to replace
the source with the destination.


Dave


* Pavel Butsykin (pbutsy...@virtuozzo.com) wrote:

This feature frees the migrated memory on the source during postcopy-ram
migration. In the second step of postcopy-ram migration, when the source VM
is paused, we can free the memory that is no longer needed. In particular,
this makes it possible to start relieving memory pressure on the source host
in a load-balancing scenario.

Changes from v1:
- changed name of the interfaces (discard to release)
- fix make check error
- add more comments to qemu_iovec_release_ram()
- rebase on "Postcopy: Hugepage support" (David's patch series)
- removed ram_discard_page for xbzrle
- fix erroneous release memory in complete precopy (tie release-ram to postcopy)

Pavel Butsykin (3):
migration: add MigrationState arg for ram_save_/compressed_/page()
add 'release-ram' migrate capability
migration: discard non-dirty ram pages after the start of postcopy

   include/migration/migration.h |  2 ++
   include/migration/qemu-file.h |  3 ++-
   migration/migration.c | 13 ++
   migration/qemu-file.c | 59 ++-
   migration/ram.c   | 56 ++--
   qapi-schema.json  |  5 +++-
   6 files changed, 121 insertions(+), 17 deletions(-)

--
2.11.0



--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK


--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK





Re: [Qemu-devel] [PATCH v2 0/3] migration capability to discard the migrated ram pages

2017-03-07 Thread Pavel Butsykin

On 07.03.2017 17:46, Dr. David Alan Gilbert wrote:

* Pavel Butsykin (pbutsy...@virtuozzo.com) wrote:

On 07.03.2017 16:56, Dr. David Alan Gilbert wrote:

* Pavel Butsykin (pbutsy...@virtuozzo.com) wrote:

On 14.02.2017 17:02, Dr. David Alan Gilbert wrote:

Hi Pavel,
 I was curious, having merged this, how you're using postcopy; do you switch
into postcopy mode immediately or wait until the first sync or what?
Do you find yourself in postcopy mode long enough that it's worth
doing the release?  If so on what size VMs are you working with?

Dave


Hi Dave,
I'm sorry it took so long; I was on vacation. And thanks for the quick
review and merge!

We want to use this feature to update QEMU without rebooting VMs. It
works like a local migration on the same host. Currently, we switch into
postcopy mode immediately, and the size of the VMs can vary widely.


Thanks! Do you have libvirt magic to do that update?


Yes, there are patches that add a "--local" flag for migration. The
flag starts the incoming VM and does the other magic needed to replace
the source with the destination.


Nice; are those posted to libvirt's list somewhere?


Not yet, but we're going to do it.


Dave


Dave


* Pavel Butsykin (pbutsy...@virtuozzo.com) wrote:

This feature frees the migrated memory on the source during postcopy-ram
migration. In the second step of postcopy-ram migration, when the source VM
is paused, we can free the memory that is no longer needed. In particular,
this makes it possible to start relieving memory pressure on the source host
in a load-balancing scenario.

Changes from v1:
- changed name of the interfaces (discard to release)
- fix make check error
- add more comments to qemu_iovec_release_ram()
- rebase on "Postcopy: Hugepage support" (David's patch series)
- removed ram_discard_page for xbzrle
- fix erroneous release memory in complete precopy (tie release-ram to postcopy)

Pavel Butsykin (3):
 migration: add MigrationState arg for ram_save_/compressed_/page()
 add 'release-ram' migrate capability
 migration: discard non-dirty ram pages after the start of postcopy

include/migration/migration.h |  2 ++
include/migration/qemu-file.h |  3 ++-
migration/migration.c | 13 ++
migration/qemu-file.c | 59 ++-
migration/ram.c   | 56 ++--
qapi-schema.json  |  5 +++-
6 files changed, 121 insertions(+), 17 deletions(-)

--
2.11.0



--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK


--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK


--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK





Re: [Qemu-devel] [PATCH v2 0/3] migration capability to discard the migrated ram pages

2017-03-03 Thread Pavel Butsykin

On 14.02.2017 17:02, Dr. David Alan Gilbert wrote:

Hi Pavel,
   I was curious, having merged this, how you're using postcopy; do you switch
into postcopy mode immediately or wait until the first sync or what?
Do you find yourself in postcopy mode long enough that it's worth
doing the release?  If so on what size VMs are you working with?

Dave


Hi Dave,
I'm sorry it took so long; I was on vacation. And thanks for the quick
review and merge!

We want to use this feature to update QEMU without rebooting VMs. It
works like a local migration on the same host. Currently, we switch into
postcopy mode immediately, and the size of the VMs can vary widely.


* Pavel Butsykin (pbutsy...@virtuozzo.com) wrote:

This feature frees the migrated memory on the source during postcopy-ram
migration. In the second step of postcopy-ram migration, when the source VM
is paused, we can free the memory that is no longer needed. In particular,
this makes it possible to start relieving memory pressure on the source host
in a load-balancing scenario.

Changes from v1:
- changed name of the interfaces (discard to release)
- fix make check error
- add more comments to qemu_iovec_release_ram()
- rebase on "Postcopy: Hugepage support" (David's patch series)
- removed ram_discard_page for xbzrle
- fix erroneous release memory in complete precopy (tie release-ram to postcopy)

Pavel Butsykin (3):
   migration: add MigrationState arg for ram_save_/compressed_/page()
   add 'release-ram' migrate capability
   migration: discard non-dirty ram pages after the start of postcopy

  include/migration/migration.h |  2 ++
  include/migration/qemu-file.h |  3 ++-
  migration/migration.c | 13 ++
  migration/qemu-file.c | 59 ++-
  migration/ram.c   | 56 ++--
  qapi-schema.json  |  5 +++-
  6 files changed, 121 insertions(+), 17 deletions(-)

--
2.11.0



--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK





[Qemu-devel] [PATCH] migration: madvise error_report fixup!

2017-02-10 Thread Pavel Butsykin
Signed-off-by: Pavel Butsykin 
---
 migration/qemu-file.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 82dbef3c86..195fa94fcf 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -156,13 +156,13 @@ static void qemu_iovec_release_ram(QEMUFile *f)
 continue;
 }
 if (qemu_madvise(iov.iov_base, iov.iov_len, QEMU_MADV_DONTNEED) < 0) {
-error_report("migrate: madvise DONTNEED failed %p %ld: %s",
+error_report("migrate: madvise DONTNEED failed %p %zd: %s",
  iov.iov_base, iov.iov_len, strerror(errno));
 }
 iov = f->iov[idx];
 }
 if (qemu_madvise(iov.iov_base, iov.iov_len, QEMU_MADV_DONTNEED) < 0) {
-error_report("migrate: madvise DONTNEED failed %p %ld: %s",
+error_report("migrate: madvise DONTNEED failed %p %zd: %s",
  iov.iov_base, iov.iov_len, strerror(errno));
 }
 memset(f->may_free, 0, sizeof(f->may_free));
-- 
2.11.0
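
iov_len is a size_t, which C99 prints with the z length modifier; %ld only matches where size_t is an unsigned long, so it can trigger -Wformat warnings on 32-bit hosts, where size_t is unsigned int. Strictly, %zu is the conversion for the unsigned size_t and %zd the one for ssize_t. A sketch of the same error message built with %zu; format_release_error() is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/uio.h>

/* Formats the madvise() failure message with %zu for the size_t length.
 * Call it right after the failing call so that errno is still valid.
 * Returns snprintf()'s result. */
static int format_release_error(char *out, size_t out_len,
                                const struct iovec *iov)
{
    assert(out != NULL && out_len > 0 && iov != NULL);
    return snprintf(out, out_len,
                    "migrate: madvise DONTNEED failed %p %zu: %s",
                    iov->iov_base, iov->iov_len, strerror(errno));
}
```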




[Qemu-devel] [PATCH v2 0/3] migration capability to discard the migrated ram pages

2017-02-03 Thread Pavel Butsykin
This feature frees the migrated memory on the source during postcopy-ram
migration. In the second step of postcopy-ram migration, when the source VM
is paused, we can free the memory that is no longer needed. In particular,
this makes it possible to start relieving memory pressure on the source host
in a load-balancing scenario.

Changes from v1:
- changed name of the interfaces (discard to release)
- fix make check error
- add more comments to qemu_iovec_release_ram()
- rebase on "Postcopy: Hugepage support" (David's patch series)
- removed ram_discard_page for xbzrle 
- fix erroneous release memory in complete precopy (tie release-ram to postcopy)

Pavel Butsykin (3):
  migration: add MigrationState arg for ram_save_/compressed_/page()
  add 'release-ram' migrate capability
  migration: discard non-dirty ram pages after the start of postcopy

 include/migration/migration.h |  2 ++
 include/migration/qemu-file.h |  3 ++-
 migration/migration.c | 13 ++
 migration/qemu-file.c | 59 ++-
 migration/ram.c   | 56 ++--
 qapi-schema.json  |  5 +++-
 6 files changed, 121 insertions(+), 17 deletions(-)

-- 
2.11.0
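
Releasing already-sent pages on the source relies on madvise() with MADV_DONTNEED: for private anonymous memory, Linux drops the backing pages at once and a later access sees fresh zero-filled pages. A minimal standalone sketch of that effect, using plain mmap()/madvise() rather than QEMU's qemu_madvise()/QEMU_MADV_DONTNEED wrappers; the function name is hypothetical. Guest RAM can also be file- or hugetlbfs-backed, where the semantics differ, so this only illustrates the anonymous-memory case.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Maps 'pages' private anonymous pages, fills them with 0xc5, releases the
 * middle page with MADV_DONTNEED and checks that it reads back as zeros
 * while its neighbours keep their contents. Returns 0 on success, -1 on
 * failure. */
static int release_middle_page_demo(size_t pages)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = pages * page;
    size_t mid = (pages / 2) * page;
    unsigned char *mem;
    int result = 0;

    assert(pages >= 3);
    mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) {
        return -1;
    }
    memset(mem, 0xc5, len);

    if (madvise(mem + mid, page, MADV_DONTNEED) < 0) {
        result = -1;
    } else {
        for (size_t i = 0; i < len; i++) {
            int released = i >= mid && i < mid + page;
            if (mem[i] != (released ? 0x00 : 0xc5)) {
                result = -1;
                break;
            }
        }
    }
    munmap(mem, len);
    return result;
}
```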




[Qemu-devel] [PATCH v2 1/3] migration: add MigrationState arg for ram_save_/compressed_/page()

2017-02-03 Thread Pavel Butsykin
Cosmetic patch. Using the ms variable instead of calling migrate_get_current()
looks nicer, especially when it is used more than once.

Signed-off-by: Pavel Butsykin 
---
 migration/ram.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index a683f4bb9e..d866b6518b 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -734,13 +734,14 @@ static int save_zero_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
  *  >=0 - Number of pages written - this might legally be 0
  *if xbzrle noticed the page was the same.
  *
+ * @ms: The current migration state.
  * @f: QEMUFile where to send the data
  * @block: block that contains the page we want to send
  * @offset: offset inside the block for the page
  * @last_stage: if we are at the completion stage
  * @bytes_transferred: increase it with the number of transferred bytes
  */
-static int ram_save_page(QEMUFile *f, PageSearchStatus *pss,
+static int ram_save_page(MigrationState *ms, QEMUFile *f, PageSearchStatus *pss,
  bool last_stage, uint64_t *bytes_transferred)
 {
 int pages = -1;
@@ -786,8 +787,7 @@ static int ram_save_page(QEMUFile *f, PageSearchStatus *pss,
  */
 xbzrle_cache_zero_page(current_addr);
 } else if (!ram_bulk_stage &&
-   !migration_in_postcopy(migrate_get_current()) &&
-   migrate_use_xbzrle()) {
+   !migration_in_postcopy(ms) && migrate_use_xbzrle()) {
 pages = save_xbzrle_page(f, &p, current_addr, block,
  offset, last_stage, bytes_transferred);
 if (!last_stage) {
@@ -914,14 +914,15 @@ static int compress_page_with_multi_thread(QEMUFile *f, RAMBlock *block,
  *
  * Returns: Number of pages written.
  *
+ * @ms: The current migration state.
  * @f: QEMUFile where to send the data
  * @block: block that contains the page we want to send
  * @offset: offset inside the block for the page
  * @last_stage: if we are at the completion stage
  * @bytes_transferred: increase it with the number of transferred bytes
  */
-static int ram_save_compressed_page(QEMUFile *f, PageSearchStatus *pss,
-bool last_stage,
+static int ram_save_compressed_page(MigrationState *ms, QEMUFile *f,
+PageSearchStatus *pss, bool last_stage,
 uint64_t *bytes_transferred)
 {
 int pages = -1;
@@ -1252,11 +1253,11 @@ static int ram_save_target_page(MigrationState *ms, QEMUFile *f,
 if (migration_bitmap_clear_dirty(dirty_ram_abs)) {
 unsigned long *unsentmap;
 if (compression_switch && migrate_use_compression()) {
-res = ram_save_compressed_page(f, pss,
+res = ram_save_compressed_page(ms, f, pss,
last_stage,
bytes_transferred);
 } else {
-res = ram_save_page(f, pss, last_stage,
+res = ram_save_page(ms, f, pss, last_stage,
 bytes_transferred);
 }
 
-- 
2.11.0




[Qemu-devel] [PATCH v2 3/3] migration: discard non-dirty ram pages after the start of postcopy

2017-02-03 Thread Pavel Butsykin
After the start of postcopy migration there are some non-dirty pages which have
already been migrated. These pages are no longer needed on the source VM, so
we can free them without affecting the completion of the migration.

Signed-off-by: Pavel Butsykin 
---
 include/migration/migration.h |  1 +
 migration/migration.c |  4 
 migration/ram.c   | 19 +++
 3 files changed, 24 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 401fbe1f77..3a5f8c469e 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -288,6 +288,7 @@ int ram_postcopy_send_discard_bitmap(MigrationState *ms);
 int ram_discard_range(MigrationIncomingState *mis, const char *block_name,
   uint64_t start, size_t length);
 int ram_postcopy_incoming_init(MigrationIncomingState *mis);
+void ram_postcopy_migrated_memory_release(MigrationState *ms);
 
 /**
  * @migrate_add_blocker - prevent migration from proceeding
diff --git a/migration/migration.c b/migration/migration.c
index 8d5a5f8a6e..a9aa6a0f8b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1713,6 +1713,10 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
  */
 qemu_savevm_send_ping(ms->to_dst_file, 4);
 
+if (migrate_release_ram()) {
+ram_postcopy_migrated_memory_release(ms);
+}
+
 ret = qemu_file_get_error(ms->to_dst_file);
 if (ret) {
 error_report("postcopy_start: Migration stream errored");
diff --git a/migration/ram.c b/migration/ram.c
index 5a43f716d1..ae1f10b145 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1562,6 +1562,25 @@ void ram_debug_dump_bitmap(unsigned long *todump, bool expected)
 
 /*  functions for postcopy * */
 
+void ram_postcopy_migrated_memory_release(MigrationState *ms)
+{
+struct RAMBlock *block;
+unsigned long *bitmap = atomic_rcu_read(&migration_bitmap_rcu)->bmap;
+
+QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+unsigned long first = block->offset >> TARGET_PAGE_BITS;
+unsigned long range = first + (block->used_length >> TARGET_PAGE_BITS);
+unsigned long run_start = find_next_zero_bit(bitmap, range, first);
+
+while (run_start < range) {
+unsigned long run_end = find_next_bit(bitmap, range, run_start + 1);
+ram_discard_range(NULL, block->idstr, run_start << TARGET_PAGE_BITS,
+  (run_end - run_start) << TARGET_PAGE_BITS);
+run_start = find_next_zero_bit(bitmap, range, run_end + 1);
+}
+}
+}
+
 /*
  * Callback from postcopy_each_ram_send_discard for each RAMBlock
  * Note: At this point the 'unsentmap' is the processed bitmap combined
-- 
2.11.0
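
The loop in ram_postcopy_migrated_memory_release() walks each RAMBlock's slice of the migration bitmap and discards every run of clear bits, i.e. pages that were already sent and not redirtied. A standalone sketch of the same run-finding, written with a plain bit loop instead of QEMU's find_next_bit()/find_next_zero_bit(). It reports page numbers relative to the start of the block's slice, on the assumption that a per-block discard call expects block-relative offsets; struct page_run and find_clean_runs() are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

struct page_run {
    size_t start;   /* first page of the run, relative to 'first' */
    size_t len;     /* number of pages in the run */
};

static int sketch_test_bit(const unsigned long *map, size_t nr)
{
    return (map[nr / SKETCH_BITS_PER_LONG] >> (nr % SKETCH_BITS_PER_LONG)) & 1;
}

/*
 * Scans pages [first, first + npages) of 'dirty' (bit set = page must be
 * kept) and records every maximal run of clear bits. Up to 'max_runs' runs
 * are stored in 'out'; the return value is the total number of runs found.
 */
static size_t find_clean_runs(const unsigned long *dirty, size_t first,
                              size_t npages, struct page_run *out,
                              size_t max_runs)
{
    size_t end = first + npages, i = first, runs = 0;

    assert(out != NULL || max_runs == 0);
    while (i < end) {
        if (sketch_test_bit(dirty, i)) {
            i++;
            continue;
        }
        size_t start = i;
        while (i < end && !sketch_test_bit(dirty, i)) {
            i++;
        }
        if (runs < max_runs) {
            out[runs].start = start - first;
            out[runs].len = i - start;
        }
        runs++;
    }
    return runs;
}
```

Each recorded run would then be converted to bytes (start and len shifted by the target page bits) and passed to the discard call.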




[Qemu-devel] [PATCH v2 2/3] add 'release-ram' migrate capability

2017-02-03 Thread Pavel Butsykin
This feature frees the migrated memory on the source during postcopy-ram
migration. In the second step of postcopy-ram migration, when the source VM
is paused, we can free the memory that is no longer needed. In particular,
this makes it possible to start relieving memory pressure on the source host
in a load-balancing scenario.

Signed-off-by: Pavel Butsykin 
---
 include/migration/migration.h |  1 +
 include/migration/qemu-file.h |  3 ++-
 migration/migration.c |  9 +++
 migration/qemu-file.c | 59 ++-
 migration/ram.c   | 22 +++-
 qapi-schema.json  |  5 +++-
 6 files changed, 89 insertions(+), 10 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index bd399fc0df..401fbe1f77 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -307,6 +307,7 @@ int migrate_add_blocker(Error *reason, Error **errp);
  */
 void migrate_del_blocker(Error *reason);
 
+bool migrate_release_ram(void);
 bool migrate_postcopy_ram(void);
 bool migrate_zero_blocks(void);
 
diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index abedd466c9..0cd648a733 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -132,7 +132,8 @@ void qemu_put_byte(QEMUFile *f, int v);
  * put_buffer without copying the buffer.
  * The buffer should be available till it is sent asynchronously.
  */
-void qemu_put_buffer_async(QEMUFile *f, const uint8_t *buf, size_t size);
+void qemu_put_buffer_async(QEMUFile *f, const uint8_t *buf, size_t size,
+   bool may_free);
 bool qemu_file_mode_is_not_valid(const char *mode);
 bool qemu_file_is_writable(QEMUFile *f);
 
diff --git a/migration/migration.c b/migration/migration.c
index 1ae68be0c7..8d5a5f8a6e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1302,6 +1302,15 @@ void qmp_migrate_set_downtime(double value, Error **errp)
 qmp_migrate_set_parameters(&p, errp);
 }
 
+bool migrate_release_ram(void)
+{
+MigrationState *s;
+
+s = migrate_get_current();
+
+return s->enabled_capabilities[MIGRATION_CAPABILITY_RELEASE_RAM];
+}
+
 bool migrate_postcopy_ram(void)
 {
 MigrationState *s;
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index e9fae31158..82dbef3c86 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -49,6 +49,7 @@ struct QEMUFile {
 int buf_size; /* 0 when writing */
 uint8_t buf[IO_BUF_SIZE];
 
+DECLARE_BITMAP(may_free, MAX_IOV_SIZE);
 struct iovec iov[MAX_IOV_SIZE];
 unsigned int iovcnt;
 
@@ -132,6 +133,41 @@ bool qemu_file_is_writable(QEMUFile *f)
 return f->ops->writev_buffer;
 }
 
+static void qemu_iovec_release_ram(QEMUFile *f)
+{
+struct iovec iov;
+unsigned long idx;
+
+/* Find and release all the contiguous memory ranges marked as may_free. */
+idx = find_next_bit(f->may_free, f->iovcnt, 0);
+if (idx >= f->iovcnt) {
+return;
+}
+iov = f->iov[idx];
+
+/* madvise() is called inside the loop for each contiguous range, after
+ * which iov is reinitialized with the next buffer. Finally, madvise() is
+ * called for the last range.
+ */
+while ((idx = find_next_bit(f->may_free, f->iovcnt, idx + 1)) < f->iovcnt) {
+/* check for adjacent buffer and coalesce them */
+if (iov.iov_base + iov.iov_len == f->iov[idx].iov_base) {
+iov.iov_len += f->iov[idx].iov_len;
+continue;
+}
+if (qemu_madvise(iov.iov_base, iov.iov_len, QEMU_MADV_DONTNEED) < 0) {
+error_report("migrate: madvise DONTNEED failed %p %ld: %s",
+ iov.iov_base, iov.iov_len, strerror(errno));
+}
+iov = f->iov[idx];
+}
+if (qemu_madvise(iov.iov_base, iov.iov_len, QEMU_MADV_DONTNEED) < 0) {
+error_report("migrate: madvise DONTNEED failed %p %ld: %s",
+ iov.iov_base, iov.iov_len, strerror(errno));
+}
+memset(f->may_free, 0, sizeof(f->may_free));
+}
+
 /**
  * Flushes QEMUFile buffer
  *
@@ -151,6 +187,8 @@ void qemu_fflush(QEMUFile *f)
 if (f->iovcnt > 0) {
 expect = iov_size(f->iov, f->iovcnt);
 ret = f->ops->writev_buffer(f->opaque, f->iov, f->iovcnt, f->pos);
+
+qemu_iovec_release_ram(f);
 }
 
 if (ret >= 0) {
@@ -304,13 +342,19 @@ int qemu_fclose(QEMUFile *f)
 return ret;
 }
 
-static void add_to_iovec(QEMUFile *f, const uint8_t *buf, size_t size)
+static void add_to_iovec(QEMUFile *f, const uint8_t *buf, size_t size,
+ bool may_free)
 {
 /* check for adjacent buffer and coalesce them */
 if (f->iovcnt > 0 && buf == f->iov[f->iovcnt - 1].iov_base +
-f->iov[f->iovcnt - 1].iov_len) {
+f->iov[f->iovcnt - 1].iov_
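
qemu_iovec_release_ram() merges flagged buffers that are contiguous in memory before calling madvise(), so a single system call can cover several pages. The coalescing step on its own can be sketched as follows; unlike the patch, this only collects the ranges (no madvise()) and uses a plain bool array instead of the may_free bitmap, and coalesce_releasable() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/uio.h>

/* Collects the buffers flagged in may_free[] into as few ranges as
 * possible: a flagged buffer that starts exactly where the previous
 * flagged range ends is merged into it. Returns the number of ranges
 * written to 'out' (at most 'n'). */
static size_t coalesce_releasable(const struct iovec *iov,
                                  const bool *may_free, size_t n,
                                  struct iovec *out)
{
    size_t nout = 0;

    assert(n == 0 || (iov != NULL && may_free != NULL && out != NULL));
    for (size_t i = 0; i < n; i++) {
        if (!may_free[i]) {
            continue;
        }
        if (nout > 0 &&
            (char *)out[nout - 1].iov_base + out[nout - 1].iov_len ==
            (char *)iov[i].iov_base) {
            out[nout - 1].iov_len += iov[i].iov_len;   /* extend range */
        } else {
            out[nout++] = iov[i];                      /* start new range */
        }
    }
    return nout;
}
```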

Re: [Qemu-devel] [PATCH 1/2] add 'discard-ram' migrate capability

2017-01-30 Thread Pavel Butsykin

On 27.01.2017 14:01, Dr. David Alan Gilbert wrote:

* Pavel Butsykin (pbutsy...@virtuozzo.com) wrote:

This feature frees the migrated memory on the source during postcopy-ram
migration. In the second step of postcopy-ram migration, when the source VM
is paused, we can free the memory that is no longer needed. In particular,
this makes it possible to start relieving memory pressure on the source host
in a load-balancing scenario.

Signed-off-by: Pavel Butsykin 


Hi Pavel,
   Firstly a higher-level thing, can we use a different word than 'discard'
because I already use 'discard' in postcopy to mean the request from the source
to the destination to discard pages that are redirtied.  I suggest 'release-ram'
to just pick a different word that means the same thing.



Hi David,

Yeah, I thought about it... using the same name could be confusing here, so
I agree to change it.



Also, see patchew's build error it spotted.


---
  include/migration/migration.h |  1 +
  include/migration/qemu-file.h |  3 ++-
  migration/migration.c |  9 +++
  migration/qemu-file.c | 59 ++-
  migration/ram.c   | 24 +-
  qapi-schema.json  |  5 +++-
  6 files changed, 91 insertions(+), 10 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index c309d23370..d7bd404365 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -294,6 +294,7 @@ void migrate_add_blocker(Error *reason);
   */
  void migrate_del_blocker(Error *reason);

+bool migrate_discard_ram(void);
  bool migrate_postcopy_ram(void);
  bool migrate_zero_blocks(void);

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index abedd466c9..0cd648a733 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -132,7 +132,8 @@ void qemu_put_byte(QEMUFile *f, int v);
   * put_buffer without copying the buffer.
   * The buffer should be available till it is sent asynchronously.
   */
-void qemu_put_buffer_async(QEMUFile *f, const uint8_t *buf, size_t size);
+void qemu_put_buffer_async(QEMUFile *f, const uint8_t *buf, size_t size,
+   bool may_free);
  bool qemu_file_mode_is_not_valid(const char *mode);
  bool qemu_file_is_writable(QEMUFile *f);

diff --git a/migration/migration.c b/migration/migration.c
index f498ab84f2..391db6f28b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1251,6 +1251,15 @@ void qmp_migrate_set_downtime(double value, Error **errp)
  qmp_migrate_set_parameters(&p, errp);
  }

+bool migrate_discard_ram(void)
+{
+MigrationState *s;
+
+s = migrate_get_current();
+
+return s->enabled_capabilities[MIGRATION_CAPABILITY_DISCARD_RAM];
+}
+
  bool migrate_postcopy_ram(void)
  {
  MigrationState *s;
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index e9fae31158..f85a0ecd9e 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -30,6 +30,7 @@
  #include "qemu/coroutine.h"
  #include "migration/migration.h"
  #include "migration/qemu-file.h"
+#include "sysemu/sysemu.h"
  #include "trace.h"

  #define IO_BUF_SIZE 32768
@@ -49,6 +50,7 @@ struct QEMUFile {
  int buf_size; /* 0 when writing */
  uint8_t buf[IO_BUF_SIZE];

+DECLARE_BITMAP(may_free, MAX_IOV_SIZE);
  struct iovec iov[MAX_IOV_SIZE];
  unsigned int iovcnt;

@@ -132,6 +134,40 @@ bool qemu_file_is_writable(QEMUFile *f)
  return f->ops->writev_buffer;
  }

+static void qemu_iovec_discard_ram(QEMUFile *f)
+{
+struct iovec iov;
+unsigned long idx;
+
+if (!migrate_discard_ram() || !runstate_check(RUN_STATE_FINISH_MIGRATE)) {
+return;
+}


Can we split this out into a separate function please; so qemu_iovec_discard_ram
always does it, and then you have something that only calls it if enabled.



Of course. I think the migrate_discard_ram() check is not needed here,
because it is already done in ram.c:

static int ram_save_page(QEMUFile *f, PageSearchStatus *pss,
...
        if (send_async) {
            qemu_put_buffer_async(
                f, p, TARGET_PAGE_SIZE, migrate_discard_ram());
        } else {
            qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
...


Removing it will also fix the 'build error'.
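The design described above can be sketched in plain C: the capability decision is made once per buffer, at enqueue time, and recorded in a bitmap parallel to the iovec array (the role `may_free` plays in the patch), so the discard pass needs no capability check of its own. This is a minimal sketch under stated assumptions: `struct buf_queue`, `queue_buffer()` and `flush_queue()` are hypothetical names, not QEMU API, and the sketch only totals the bytes that would be discarded instead of calling madvise.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for QEMU's MAX_IOV_SIZE and bitmap helpers. */
#define MAX_BUFS 64
#define BITS_PER_WORD (CHAR_BIT * sizeof(unsigned long))
#define WORDS_FOR(n) (((n) + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct buf_queue {
    const void *base[MAX_BUFS];
    size_t len[MAX_BUFS];
    unsigned long may_free[WORDS_FOR(MAX_BUFS)]; /* one bit per queued buffer */
    unsigned int count;
};

static void set_flag(unsigned long *map, unsigned int i)
{
    map[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
}

static bool test_flag(const unsigned long *map, unsigned int i)
{
    return (map[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1UL;
}

/* The caller decides once, when queueing, whether this buffer may be
 * discarded later (e.g. "capability enabled and this is guest RAM"). */
static void queue_buffer(struct buf_queue *q, const void *base, size_t len,
                         bool may_free)
{
    assert(q->count < MAX_BUFS);
    q->base[q->count] = base;
    q->len[q->count] = len;
    if (may_free) {
        set_flag(q->may_free, q->count);
    }
    q->count++;
}

/* After the data has been written out, discard every flagged buffer
 * unconditionally: the bits already encode the capability decision.
 * Returns the number of bytes that would be passed to madvise(). */
static size_t flush_queue(struct buf_queue *q)
{
    size_t discarded = 0;

    for (unsigned int i = 0; i < q->count; i++) {
        if (test_flag(q->may_free, i)) {
            discarded += q->len[i]; /* real code would discard here */
        }
    }
    memset(q->may_free, 0, sizeof(q->may_free));
    q->count = 0;
    return discarded;
}
```

Keeping the decision with the producer means the flush path stays a pure "discard what is flagged" loop, which is the separation the review asks for.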


+    idx = find_next_bit(f->may_free, f->iovcnt, 0);
+    if (idx >= f->iovcnt) {
+        return;
+    }
+    iov = f->iov[idx];
+
+    while ((idx = find_next_bit(f->may_free, f->iovcnt, idx + 1)) < f->iovcnt) {
+        /* check for adjacent buffer and coalesce them */
+        if (iov.iov_base + iov.iov_len == f->iov[idx].iov_base) {
+            iov.iov_len += f->iov[idx].iov_len;
+            continue;
+        }
+        if (qemu_madvise(iov.iov_base, iov.iov_len, QEMU_MADV_DONTNEED) < 0) {
+            error_report("migrate: madvise DON
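The loop in `qemu_iovec_discard_ram()` above merges flagged buffers that are contiguous in memory, so each madvise call covers as large a range as possible. A minimal standalone sketch of the same idea follows. It is an illustration under stated assumptions: the hypothetical `struct span` stands in for `struct iovec`, a `bool` array stands in for the `may_free` bitmap, and the merged ranges are collected in an output array instead of being discarded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified stand-in for struct iovec. */
struct span {
    uint8_t *base;
    size_t len;
};

/*
 * Walk the queued buffers, skip those whose may_free flag is clear, and
 * merge flagged buffers that are contiguous in memory into one span.
 * Each finished span is stored in out[]; returns the number of spans.
 * In the patch, each span would be passed to qemu_madvise().
 */
static size_t coalesce_flagged(const struct span *bufs, const bool *may_free,
                               size_t n, struct span *out)
{
    size_t nout = 0;
    bool open = false;
    struct span cur = { NULL, 0 };

    for (size_t i = 0; i < n; i++) {
        if (!may_free[i]) {
            continue;
        }
        if (open && cur.base + cur.len == bufs[i].base) {
            cur.len += bufs[i].len;   /* adjacent: extend the current span */
            continue;
        }
        if (open) {
            out[nout++] = cur;        /* gap: emit the finished span */
        }
        cur = bufs[i];
        open = true;
    }
    if (open) {
        out[nout++] = cur;            /* emit the last span */
    }
    return nout;
}
```

Using `uint8_t *` for the base keeps the adjacency check in standard C; arithmetic on `void *`, as in the patch, relies on a GCC extension.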

Re: [Qemu-devel] [PATCH 2/2] migration: discard non-dirty ram pages after the start of postcopy

2017-01-30 Thread Pavel Butsykin

On 27.01.2017 14:39, Dr. David Alan Gilbert wrote:

* Pavel Butsykin (pbutsy...@virtuozzo.com) wrote:

After the start of postcopy migration, some non-dirty pages have already been
migrated. These pages are no longer needed on the source VM, so we can free
them; doing so does not hurt the completion of the migration.

Signed-off-by: Pavel Butsykin 
---
  include/migration/migration.h |  1 +
  migration/migration.c |  2 ++
  migration/ram.c   | 25 +
  3 files changed, 28 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index d7bd404365..0d9b81545c 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -279,6 +279,7 @@ int ram_postcopy_send_discard_bitmap(MigrationState *ms);
  int ram_discard_range(MigrationIncomingState *mis, const char *block_name,
uint64_t start, size_t length);
  int ram_postcopy_incoming_init(MigrationIncomingState *mis);
+void ram_postcopy_migrated_memory_discard(MigrationState *ms);

  /**
   * @migrate_add_blocker - prevent migration from proceeding
diff --git a/migration/migration.c b/migration/migration.c
index 391db6f28b..20490ed020 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1662,6 +1662,8 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
   */
  qemu_savevm_send_ping(ms->to_dst_file, 4);

+    ram_postcopy_migrated_memory_discard(ms);
+


Did you intend this to be selected based on your capability flag?



I did, but apparently lost it.




  ret = qemu_file_get_error(ms->to_dst_file);
  if (ret) {
  error_report("postcopy_start: Migration stream errored");
diff --git a/migration/ram.c b/migration/ram.c
index b0322a0b5c..8a6b614b0d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1546,6 +1546,31 @@ void ram_debug_dump_bitmap(unsigned long *todump, bool expected)

  /*  functions for postcopy * */

+void ram_postcopy_migrated_memory_discard(MigrationState *ms)
+{
+    struct RAMBlock *block;
+    unsigned long *bitmap = atomic_rcu_read(&migration_bitmap_rcu)->bmap;
+
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+        unsigned long first = block->offset >> TARGET_PAGE_BITS;
+        unsigned long range = first + (block->used_length >> TARGET_PAGE_BITS);
+        unsigned long run_start = find_next_zero_bit(bitmap, range, first);
+
+        while (run_start < range) {
+            unsigned long run_end = find_next_bit(bitmap, range, run_start + 1);
+            uint8_t *addr = block->host + (run_start << TARGET_PAGE_BITS);
+            size_t chunk_size = (run_end - run_start) << TARGET_PAGE_BITS;
+
+            if (qemu_madvise(addr, chunk_size, QEMU_MADV_DONTNEED) < 0) {
+                error_report("migrate: madvise DONTNEED failed %p %ld: %s",
+                             addr, chunk_size, strerror(errno));
+            }


Can you use your ram_discard_page() here? It keeps all the uses of madvise
together.


ok.
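The request above is to route every discard through one helper rather than calling madvise directly in several places. A sketch of such a helper, assuming Linux and guest RAM backed by a private anonymous mapping, is below. `discard_host_range()` is a hypothetical name; it is not QEMU's `ram_discard_page()`, whose signature is not shown in this thread.

```c
#define _DEFAULT_SOURCE /* expose madvise(), MADV_DONTNEED, MAP_ANONYMOUS in glibc */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical single entry point for discarding host memory, so that
 * every caller shares the same madvise() call and error reporting.
 * Returns 0 on success or a negative errno value on failure.
 */
static int discard_host_range(void *addr, size_t len)
{
    assert(len > 0); /* callers are expected never to pass an empty range */

    if (madvise(addr, len, MADV_DONTNEED) < 0) {
        int err = errno;
        fprintf(stderr, "discard: madvise DONTNEED failed %p %zu: %s\n",
                addr, len, strerror(err));
        return -err;
    }
    return 0;
}
```

The length is printed with `%zu`, the conversion specifier for `size_t`; `%ld` only happens to work where `size_t` and `long` have the same width. On Linux, a private anonymous page discarded this way reads back as zero on its next access, and an unaligned `addr` fails with `EINVAL`.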


+
+            run_start = find_next_zero_bit(bitmap, range, run_end + 1);
+        }
+    }
+}


Dave


  /*
   * Callback from postcopy_each_ram_send_discard for each RAMBlock
   * Note: At this point the 'unsentmap' is the processed bitmap combined
--
2.11.0



--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
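The loop in `ram_postcopy_migrated_memory_discard()` scans each RAMBlock's slice of the migration bitmap for runs of clear bits (non-dirty pages that have already been migrated) and discards each run. A self-contained sketch of that run-finding step follows, under stated assumptions: linear-scan stand-ins replace QEMU's `find_next_bit()`/`find_next_zero_bit()`, the page size is assumed to be 4 KiB, and `clean_runs()` is a hypothetical name. Offsets are computed relative to `first`, so each one can be added to the start of the block's host mapping.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))
#define PAGE_SHIFT 12 /* assumption: 4 KiB target pages */

static bool bit_is_set(const unsigned long *map, size_t i)
{
    return (map[i / BITS_PER_LONG] >> (i % BITS_PER_LONG)) & 1UL;
}

/* Stand-in for find_next_bit()/find_next_zero_bit(): return the first
 * index in [start, size) whose bit equals 'want', or size if none. */
static size_t find_next(const unsigned long *map, size_t size, size_t start,
                        bool want)
{
    for (size_t i = start; i < size; i++) {
        if (bit_is_set(map, i) == want) {
            return i;
        }
    }
    return size;
}

/* Report each run of clear (non-dirty) bits in pages [first, last) as a
 * byte offset relative to 'first' plus a byte length. Returns the count. */
static size_t clean_runs(const unsigned long *dirty, size_t first, size_t last,
                         size_t *off, size_t *len, size_t max_runs)
{
    size_t n = 0;
    size_t run_start = find_next(dirty, last, first, false);

    while (run_start < last && n < max_runs) {
        size_t run_end = find_next(dirty, last, run_start + 1, true);

        off[n] = (run_start - first) << PAGE_SHIFT;
        len[n] = (run_end - run_start) << PAGE_SHIFT;
        n++;
        /* run_end is dirty (or == last), so resume just past it */
        run_start = find_next(dirty, last, run_end + 1, false);
    }
    return n;
}
```

For example, with pages 2, 3 and 7 dirty in a 10-page block, the clean runs are pages 0-1, 4-6 and 8-9.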




