Re: [Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts

2012-06-07 Thread Stefan Hajnoczi
On Wed, Jun 6, 2012 at 3:53 PM, Zhi Yong Wu zwu.ker...@gmail.com wrote:
 On Wed, Jun 6, 2012 at 6:32 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Fri, Jun 1, 2012 at 9:26 AM, Zhi Yong Wu zwu.ker...@gmail.com wrote:
 On Fri, Jun 1, 2012 at 4:06 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Fri, Jun 1, 2012 at 6:22 AM, Zhi Yong Wu zwu.ker...@gmail.com wrote:
 On Thu, May 31, 2012 at 5:26 PM, Stefan Hajnoczi stefa...@gmail.com 
 wrote:
 On Wed, May 30, 2012 at 9:31 AM, Zhi Yong Wu zwu.ker...@gmail.com 
 wrote:
 On Sat, May 12, 2012 at 12:48 AM, Kevin Wolf kw...@redhat.com wrote:
 A prerequisite for a QED mode in qcow2, which doesn't update the 
 refcount
 Recently some new concepts such as QED mode in qcow2 are seen
 frequently. Can anyone explain what it means? Thanks.

 qcow2 has more metadata than qed.  More metadata means more write
 operations when allocating new clusters.

 In order to overcome this performance issue qcow2 has a metadata
 cache.  But when QEMU is launched with -drive ...,cache=writethrough
 (the default) the metadata cache *must* be in writethrough mode
 Why must it be? If -drive ...,cache=writethrough is specified,
 it means that the host page cache is on while the guest disk cache
 is off. Since the metadata cache lives in the host page cache, not the
 guest, I think that it is in writeback mode.

 Since the emulated disk write cache is off, we must ensure that guest
 writes are on disk before completing them.  Therefore we cannot cache
 metadata updates in host RAM - it would be lost on power failure but
 But the host page cache is *on* in this mode, which means that metadata
 should be cached in host RAM. How do you explain this?

 cache=writethrough means that the file is opened with O_SYNC.  Every
 single write reaches the physical disk - that's why it's called a
 writethrough cache.  Read requests, however, can be satisfied from
 the host page cache.

 In other words, cache=writethrough ensures that all data reaches the
 disk but may give performance benefits to read-heavy workloads
 (especially when guest RAM is much smaller than host RAM, so the host
 page cache would have a high hit rate).
 Ah, I see now: cache=writethrough means that the host page cache is
 applied to read requests, not writes. Thanks.

Writes are placed in the host page cache so future reads can be served
from the cache.  But O_SYNC also forces the kernel to immediately sync
the data in the host page cache to disk.
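
A minimal sketch of the O_SYNC behaviour described above, assuming a
POSIX host. The function names and the file name are hypothetical and
this is not QEMU code; it only shows that a write on an O_SYNC file
descriptor returns after the kernel has pushed the data to the device,
while a later read can still be served from the host page cache.

```python
import os
import tempfile


def write_through(path, data):
    """Write data through an O_SYNC descriptor and return the byte count."""
    # O_SYNC is POSIX-only; fall back to 0 where it is missing (assumption
    # made purely so the sketch still runs on other platforms).
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_SYNC", 0)
    fd = os.open(path, flags, 0o600)
    try:
        # With O_SYNC set, this call returns only after the kernel has
        # written the data out, not merely cached it.
        return os.write(fd, data)
    finally:
        os.close(fd)


def demo():
    """Write a buffer write-through, then read it back (likely from cache)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "disk.img")  # hypothetical image name
        write_through(path, b"guest data")
        with open(path, "rb") as f:
            return f.read()
```

Calling demo() returns the bytes that were just written; the read does
not need to touch the disk because the write also populated the page
cache.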

Stefan



Re: [Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts

2012-06-06 Thread Stefan Hajnoczi
On Fri, Jun 1, 2012 at 9:26 AM, Zhi Yong Wu zwu.ker...@gmail.com wrote:
 On Fri, Jun 1, 2012 at 4:06 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Fri, Jun 1, 2012 at 6:22 AM, Zhi Yong Wu zwu.ker...@gmail.com wrote:
 On Thu, May 31, 2012 at 5:26 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Wed, May 30, 2012 at 9:31 AM, Zhi Yong Wu zwu.ker...@gmail.com wrote:
 On Sat, May 12, 2012 at 12:48 AM, Kevin Wolf kw...@redhat.com wrote:
 A prerequisite for a QED mode in qcow2, which doesn't update the 
 refcount
 Recently some new concepts such as QED mode in qcow2 are seen
 frequently. Can anyone explain what it means? Thanks.

 qcow2 has more metadata than qed.  More metadata means more write
 operations when allocating new clusters.

 In order to overcome this performance issue qcow2 has a metadata
 cache.  But when QEMU is launched with -drive ...,cache=writethrough
 (the default) the metadata cache *must* be in writethrough mode
 Why must it be? If -drive ...,cache=writethrough is specified,
 it means that the host page cache is on while the guest disk cache
 is off. Since the metadata cache lives in the host page cache, not the
 guest, I think that it is in writeback mode.

 Since the emulated disk write cache is off, we must ensure that guest
 writes are on disk before completing them.  Therefore we cannot cache
 metadata updates in host RAM - it would be lost on power failure but
 But the host page cache is *on* in this mode, which means that metadata
 should be cached in host RAM. How do you explain this?

cache=writethrough means that the file is opened with O_SYNC.  Every
single write reaches the physical disk - that's why it's called a
writethrough cache.  Read requests, however, can be satisfied from
the host page cache.

In other words, cache=writethrough ensures that all data reaches the
disk but may give performance benefits to read-heavy workloads
(especially when guest RAM is much smaller than host RAM, so the host
page cache would have a high hit rate).

 we promised the guest its writes reached the disk!

 instead of writeback mode.  In other words, every metadata update
 needs to be written to the image file before we complete the guest's
 What does it mean for a guest's write request to be completed?

 For example, virtio-blk fills in the success status code and raises an
 interrupt.  This notifies the guest that the write is done.
 Great, thanks.

 write request.  This means the metadata cache only hides the metadata
 performance issue when -drive ...,cache=none|writeback are used
 because there we can keep metadata changes buffered in memory until
 the guest flushes the emulated disk write cache.

 QED mode is a solution for -drive ...,cache=writethrough|directsync.
  It simply doesn't update refcount metadata in the qcow2 image file
 Does the L1/L2 info need to be written to the qcow2 image file?

Yes, this is necessary to ensure written data is accessible in the
future.  Without the L1/L2 tables we cannot find the data we wrote.
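
A rough sketch of why the L1/L2 tables are needed to find data again,
assuming 64 KiB clusters and 8-byte L2 entries as in the qcow2 format.
The tables are modelled as plain dicts and the function names are
hypothetical, so this is an illustration rather than the real on-disk
layout or QEMU's code.

```python
CLUSTER_BITS = 16                    # 64 KiB clusters (assumed)
CLUSTER_SIZE = 1 << CLUSTER_BITS
L2_ENTRIES = CLUSTER_SIZE // 8       # one L2 table fills a cluster of 8-byte entries


def split_offset(guest_offset):
    """Split a guest offset into (L1 index, L2 index, offset in cluster)."""
    cluster_index = guest_offset >> CLUSTER_BITS
    l1_index = cluster_index // L2_ENTRIES
    l2_index = cluster_index % L2_ENTRIES
    in_cluster = guest_offset & (CLUSTER_SIZE - 1)
    return l1_index, l2_index, in_cluster


def lookup(l1_table, guest_offset):
    """Return the offset in the image file, or None if unallocated.

    l1_table maps an L1 index to an L2 table (itself a dict mapping an
    L2 index to the cluster's offset in the image file).
    """
    l1_index, l2_index, in_cluster = split_offset(guest_offset)
    l2_table = l1_table.get(l1_index)
    if l2_table is None:
        return None                  # no L2 table: nothing allocated here
    cluster_offset = l2_table.get(l2_index)
    if cluster_offset is None:
        return None                  # cluster was never written
    return cluster_offset + in_cluster
```

If an L2 entry never reaches the image file, lookup() returns None for
that guest offset even though the data cluster itself was written,
which is the situation described above.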

Stefan



Re: [Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts

2012-06-06 Thread Zhi Yong Wu
On Wed, Jun 6, 2012 at 6:32 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Fri, Jun 1, 2012 at 9:26 AM, Zhi Yong Wu zwu.ker...@gmail.com wrote:
 On Fri, Jun 1, 2012 at 4:06 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Fri, Jun 1, 2012 at 6:22 AM, Zhi Yong Wu zwu.ker...@gmail.com wrote:
 On Thu, May 31, 2012 at 5:26 PM, Stefan Hajnoczi stefa...@gmail.com 
 wrote:
 On Wed, May 30, 2012 at 9:31 AM, Zhi Yong Wu zwu.ker...@gmail.com wrote:
 On Sat, May 12, 2012 at 12:48 AM, Kevin Wolf kw...@redhat.com wrote:
 A prerequisite for a QED mode in qcow2, which doesn't update the 
 refcount
 Recently some new concepts such as QED mode in qcow2 are seen
 frequently. Can anyone explain what it means? Thanks.

 qcow2 has more metadata than qed.  More metadata means more write
 operations when allocating new clusters.

 In order to overcome this performance issue qcow2 has a metadata
 cache.  But when QEMU is launched with -drive ...,cache=writethrough
 (the default) the metadata cache *must* be in writethrough mode
 Why must it be? If -drive ...,cache=writethrough is specified,
 it means that the host page cache is on while the guest disk cache
 is off. Since the metadata cache lives in the host page cache, not the
 guest, I think that it is in writeback mode.

 Since the emulated disk write cache is off, we must ensure that guest
 writes are on disk before completing them.  Therefore we cannot cache
 metadata updates in host RAM - it would be lost on power failure but
 But the host page cache is *on* in this mode, which means that metadata
 should be cached in host RAM. How do you explain this?

 cache=writethrough means that the file is opened with O_SYNC.  Every
 single write reaches the physical disk - that's why it's called a
 writethrough cache.  Read requests, however, can be satisfied from
 the host page cache.

 In other words, cache=writethrough ensures that all data reaches the
 disk but may give performance benefits to read-heavy workloads
 (especially when guest RAM is much smaller than host RAM, so the host
 page cache would have a high hit rate).
Ah, I see now: cache=writethrough means that the host page cache is
applied to read requests, not writes. Thanks.

 we promised the guest its writes reached the disk!

 instead of writeback mode.  In other words, every metadata update
 needs to be written to the image file before we complete the guest's
 What does it mean for a guest's write request to be completed?

 For example, virtio-blk fills in the success status code and raises an
 interrupt.  This notifies the guest that the write is done.
 Great, thanks.

 write request.  This means the metadata cache only hides the metadata
 performance issue when -drive ...,cache=none|writeback are used
 because there we can keep metadata changes buffered in memory until
 the guest flushes the emulated disk write cache.

 QED mode is a solution for -drive ...,cache=writethrough|directsync.
  It simply doesn't update refcount metadata in the qcow2 image file
 Does the L1/L2 info need to be written to the qcow2 image file?

 Yes, this is necessary to ensure written data is accessible in the
 future.  Without the L1/L2 tables we cannot find the data we wrote.

 Stefan



-- 
Regards,

Zhi Yong Wu



Re: [Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts

2012-06-01 Thread Stefan Hajnoczi
On Fri, Jun 1, 2012 at 6:22 AM, Zhi Yong Wu zwu.ker...@gmail.com wrote:
 On Thu, May 31, 2012 at 5:26 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Wed, May 30, 2012 at 9:31 AM, Zhi Yong Wu zwu.ker...@gmail.com wrote:
 On Sat, May 12, 2012 at 12:48 AM, Kevin Wolf kw...@redhat.com wrote:
 A prerequisite for a QED mode in qcow2, which doesn't update the refcount
 Recently some new concepts such as QED mode in qcow2 are seen
 frequently. Can anyone explain what it means? Thanks.

 qcow2 has more metadata than qed.  More metadata means more write
 operations when allocating new clusters.

 In order to overcome this performance issue qcow2 has a metadata
 cache.  But when QEMU is launched with -drive ...,cache=writethrough
 (the default) the metadata cache *must* be in writethrough mode
 Why must it be? If -drive ...,cache=writethrough is specified,
 it means that the host page cache is on while the guest disk cache
 is off. Since the metadata cache lives in the host page cache, not the
 guest, I think that it is in writeback mode.

Since the emulated disk write cache is off, we must ensure that guest
writes are on disk before completing them.  Therefore we cannot cache
metadata updates in host RAM - it would be lost on power failure but
we promised the guest its writes reached the disk!

 instead of writeback mode.  In other words, every metadata update
 needs to be written to the image file before we complete the guest's
 What does it mean for a guest's write request to be completed?

For example, virtio-blk fills in the success status code and raises an
interrupt.  This notifies the guest that the write is done.

 write request.  This means the metadata cache only hides the metadata
 performance issue when -drive ...,cache=none|writeback are used
 because there we can keep metadata changes buffered in memory until
 the guest flushes the emulated disk write cache.

 QED mode is a solution for -drive ...,cache=writethrough|directsync.
  It simply doesn't update refcount metadata in the qcow2 image file
 immediately in exchange for a refcount fixup step that is introduced
 Can you explain this in more detail? Why is this step needed only when
 the image file is opened? After the image file is opened and some guest
 write requests are completed, maybe the refcount fixup step needs to be
 done once.

If we don't update refcounts on disk then they become outdated and no
longer reflect the true allocation information.  It's not safe to rely
on outdated refcount information since we could allocate the same
cluster multiple times - this means data corruption.  By running a
consistency check when opening a dirty image file we guarantee that we
have accurate refcount information again.

As an optimization we will commit refcount information to disk when
closing the image file and mark it clean.  This means a clean QEMU
shutdown does not require a consistency check on startup - but in the
worst case (power failure or crash) we will have a dirty image file.
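
The dirty flag and the fixup step can be sketched with a toy model. The
class and attribute names below are hypothetical and the "image" is
just a few dicts, so this shows the idea only, not the qcow2
implementation: L2 mappings always reach the "disk", refcounts are
committed only on a clean close, and a dirty image has its refcounts
rebuilt from the L2 mappings when it is opened.

```python
def rebuild_refcounts(l2):
    """Recompute refcounts by counting references from the L2 mappings."""
    counts = {}
    for host_cluster in l2.values():
        counts[host_cluster] = counts.get(host_cluster, 0) + 1
    return counts


class ToyImage:
    def __init__(self):
        self.disk_l2 = {}           # guest cluster -> host cluster, always on disk
        self.disk_refcounts = {}    # on-disk refcounts, stale while dirty
        self.disk_dirty = False     # on-disk dirty flag
        self.mem_refcounts = {}     # in-memory refcounts, lost on a crash

    def allocate(self, guest_cluster, host_cluster):
        self.disk_dirty = True                        # mark dirty first
        self.disk_l2[guest_cluster] = host_cluster    # L2 update hits disk
        self.mem_refcounts[host_cluster] = (
            self.mem_refcounts.get(host_cluster, 0) + 1
        )                                             # refcount stays in RAM

    def close(self):
        """Clean shutdown: commit refcounts and mark the image clean."""
        self.disk_refcounts = dict(self.mem_refcounts)
        self.disk_dirty = False

    def crash(self):
        """Power failure: everything held only in RAM is gone."""
        self.mem_refcounts = {}

    def open(self):
        """Open the image; return True if a refcount fixup was needed."""
        repaired = False
        if self.disk_dirty:
            self.disk_refcounts = rebuild_refcounts(self.disk_l2)
            self.disk_dirty = False
            repaired = True
        self.mem_refcounts = dict(self.disk_refcounts)
        return repaired
```

After crash(), open() has to rebuild the refcounts before any new
allocation is allowed; after close(), open() finds a clean image and
skips the check.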

Stefan



Re: [Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts

2012-06-01 Thread Zhi Yong Wu
On Fri, Jun 1, 2012 at 4:06 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Fri, Jun 1, 2012 at 6:22 AM, Zhi Yong Wu zwu.ker...@gmail.com wrote:
 On Thu, May 31, 2012 at 5:26 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Wed, May 30, 2012 at 9:31 AM, Zhi Yong Wu zwu.ker...@gmail.com wrote:
 On Sat, May 12, 2012 at 12:48 AM, Kevin Wolf kw...@redhat.com wrote:
 A prerequisite for a QED mode in qcow2, which doesn't update the 
 refcount
 Recently some new concepts such as QED mode in qcow2 are seen
 frequently. Can anyone explain what it means? Thanks.

 qcow2 has more metadata than qed.  More metadata means more write
 operations when allocating new clusters.

 In order to overcome this performance issue qcow2 has a metadata
 cache.  But when QEMU is launched with -drive ...,cache=writethrough
 (the default) the metadata cache *must* be in writethrough mode
 Why must it be? If -drive ...,cache=writethrough is specified,
 it means that the host page cache is on while the guest disk cache
 is off. Since the metadata cache lives in the host page cache, not the
 guest, I think that it is in writeback mode.

 Since the emulated disk write cache is off, we must ensure that guest
 writes are on disk before completing them.  Therefore we cannot cache
 metadata updates in host RAM - it would be lost on power failure but
But the host page cache is *on* in this mode, which means that metadata
should be cached in host RAM. How do you explain this?

 we promised the guest its writes reached the disk!

 instead of writeback mode.  In other words, every metadata update
 needs to be written to the image file before we complete the guest's
 What does it mean for a guest's write request to be completed?

 For example, virtio-blk fills in the success status code and raises an
 interrupt.  This notifies the guest that the write is done.
Great, thanks.

 write request.  This means the metadata cache only hides the metadata
 performance issue when -drive ...,cache=none|writeback are used
 because there we can keep metadata changes buffered in memory until
 the guest flushes the emulated disk write cache.

 QED mode is a solution for -drive ...,cache=writethrough|directsync.
  It simply doesn't update refcount metadata in the qcow2 image file
Does the L1/L2 info need to be written to the qcow2 image file?
 immediately in exchange for a refcount fixup step that is introduced
 Can you explain this in more detail? Why is this step needed only when
 the image file is opened? After the image file is opened and some guest
 write requests are completed, maybe the refcount fixup step needs to be
 done once.

 If we don't update refcounts on disk then they become outdated and no
 longer reflect the true allocation information.  It's not safe to rely
 on outdated refcount information since we could allocate the same
 cluster multiple times - this means data corruption.  By running a
 consistency check when opening a dirty image file we guarantee that we
 have accurate refcount information again.
Ah, I got it now.

 As an optimization we will commit refcount information to disk when
 closing the image file and mark it clean.  This means a clean QEMU
 shutdown does not require a consistency check on startup - but in the
 worst case (power failure or crash) we will have a dirty image file.
Yeah, a consistency check on startup is good, I think. Thanks.

 Stefan



-- 
Regards,

Zhi Yong Wu



Re: [Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts

2012-05-31 Thread Stefan Hajnoczi
On Wed, May 30, 2012 at 9:31 AM, Zhi Yong Wu zwu.ker...@gmail.com wrote:
 On Sat, May 12, 2012 at 12:48 AM, Kevin Wolf kw...@redhat.com wrote:
 A prerequisite for a QED mode in qcow2, which doesn't update the refcount
 Recently some new concepts such as QED mode in qcow2 are seen
 frequently. Can anyone explain what it means? Thanks.

qcow2 has more metadata than qed.  More metadata means more write
operations when allocating new clusters.

In order to overcome this performance issue qcow2 has a metadata
cache.  But when QEMU is launched with -drive ...,cache=writethrough
(the default) the metadata cache *must* be in writethrough mode
instead of writeback mode.  In other words, every metadata update
needs to be written to the image file before we complete the guest's
write request.  This means the metadata cache only hides the metadata
performance issue when -drive ...,cache=none|writeback are used
because there we can keep metadata changes buffered in memory until
the guest flushes the emulated disk write cache.
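
To make the difference concrete, here is a toy sketch that counts how
many metadata writes reach the image file in each mode. The class,
method, and table names are made up for illustration and do not
correspond to QEMU's cache code.

```python
class MetadataCache:
    """Toy metadata cache that counts writes reaching the image file."""

    def __init__(self, writeback):
        self.writeback = writeback
        self.dirty = set()        # tables modified in RAM, not yet on disk
        self.disk_writes = 0

    def update(self, table_id):
        if self.writeback:
            # Writeback: remember the table is dirty and defer the write.
            self.dirty.add(table_id)
        else:
            # Writethrough: the update must hit the image file right away,
            # before the guest's write request can be completed.
            self.disk_writes += 1

    def flush(self):
        # Guest flushed its disk write cache: write each dirty table once.
        self.disk_writes += len(self.dirty)
        self.dirty.clear()


def simulate(writeback, updates):
    """Apply a list of table updates, flush once, return the write count."""
    cache = MetadataCache(writeback)
    for table_id in updates:
        cache.update(table_id)
    cache.flush()
    return cache.disk_writes
```

With 50 allocations that each touch the same L2 table and the same
refcount block, simulate(False, ...) performs 100 metadata writes while
simulate(True, ...) performs 2, which is the gap that QED mode tries to
close for the writethrough case.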

QED mode is a solution for -drive ...,cache=writethrough|directsync.
 It simply doesn't update refcount metadata in the qcow2 image file
immediately in exchange for a refcount fixup step that is introduced
when opening the image file.  It's like doing an fsck operation on a
file system when mounting it.

Stefan



Re: [Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts

2012-05-31 Thread Zhi Yong Wu
On Thu, May 31, 2012 at 5:26 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Wed, May 30, 2012 at 9:31 AM, Zhi Yong Wu zwu.ker...@gmail.com wrote:
 On Sat, May 12, 2012 at 12:48 AM, Kevin Wolf kw...@redhat.com wrote:
 A prerequisite for a QED mode in qcow2, which doesn't update the refcount
 Recently some new concepts such as QED mode in qcow2 are seen
 frequently. Can anyone explain what it means? Thanks.

 qcow2 has more metadata than qed.  More metadata means more write
 operations when allocating new clusters.

 In order to overcome this performance issue qcow2 has a metadata
 cache.  But when QEMU is launched with -drive ...,cache=writethrough
 (the default) the metadata cache *must* be in writethrough mode
Why must it be? If -drive ...,cache=writethrough is specified,
it means that the host page cache is on while the guest disk cache
is off. Since the metadata cache lives in the host page cache, not the
guest, I think that it is in writeback mode.

 instead of writeback mode.  In other words, every metadata update
 needs to be written to the image file before we complete the guest's
What does it mean for a guest's write request to be completed?
 write request.  This means the metadata cache only hides the metadata
 performance issue when -drive ...,cache=none|writeback are used
 because there we can keep metadata changes buffered in memory until
 the guest flushes the emulated disk write cache.

 QED mode is a solution for -drive ...,cache=writethrough|directsync.
  It simply doesn't update refcount metadata in the qcow2 image file
 immediately in exchange for a refcount fixup step that is introduced
Can you explain this in more detail? Why is this step needed only when
the image file is opened? After the image file is opened and some guest
write requests are completed, maybe the refcount fixup step needs to be
done once.
 when opening the image file.  It's like doing an fsck operation on a
 file system when mounting it.

 Stefan



-- 
Regards,

Zhi Yong Wu



Re: [Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts

2012-05-30 Thread Zhi Yong Wu
On Sat, May 12, 2012 at 12:48 AM, Kevin Wolf kw...@redhat.com wrote:
 A prerequisite for a QED mode in qcow2, which doesn't update the refcount
Recently some new concepts such as QED mode in qcow2 have come up
frequently. Can anyone explain what it means? Thanks.

 table except on clean shutdown, is that refcounts can be repaired when the
 image is opened the next time after a crash.

 This series adds a qemu-img check option that doesn't only check, but also
 tries to fix the errors that it found.

 Kevin Wolf (3):
  qemu-img check -r for repairing images
  qemu-img check: Print fixed clusters and recheck
  qcow2: Support for fixing refcount inconsistencies

  block.c                |    4 ++--
  block.h                |    9 -
  block/qcow2-refcount.c |   27 +--
  block/qcow2.c          |    5 +++--
  block/qcow2.h          |    3 ++-
  block/qed-check.c      |    2 ++
  block/qed.c            |    5 +++--
  block/vdi.c            |    7 ++-
  block_int.h            |    3 ++-
  qemu-img-cmds.hx       |    4 ++--
  qemu-img.c             |   35 ---
  qemu-img.texi          |    7 ++-
  12 files changed, 93 insertions(+), 18 deletions(-)

 --
 1.7.6.5





-- 
Regards,

Zhi Yong Wu



Re: [Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts

2012-05-25 Thread Stefan Hajnoczi
On Fri, May 11, 2012 at 5:48 PM, Kevin Wolf kw...@redhat.com wrote:
 A prerequisite for a QED mode in qcow2, which doesn't update the refcount
 table except on clean shutdown, is that refcounts can be repaired when the
 image is opened the next time after a crash.

 This series adds a qemu-img check option that doesn't only check, but also
 tries to fix the errors that it found.

 Kevin Wolf (3):
  qemu-img check -r for repairing images
  qemu-img check: Print fixed clusters and recheck
  qcow2: Support for fixing refcount inconsistencies

  block.c                |    4 ++--
  block.h                |    9 -
  block/qcow2-refcount.c |   27 +--
  block/qcow2.c          |    5 +++--
  block/qcow2.h          |    3 ++-
  block/qed-check.c      |    2 ++
  block/qed.c            |    5 +++--
  block/vdi.c            |    7 ++-
  block_int.h            |    3 ++-
  qemu-img-cmds.hx       |    4 ++--
  qemu-img.c             |   35 ---
  qemu-img.texi          |    7 ++-
  12 files changed, 93 insertions(+), 18 deletions(-)

Looks good except for the one comment I posted.

Stefan



[Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts

2012-05-11 Thread Kevin Wolf
A prerequisite for a QED mode in qcow2, which doesn't update the refcount
table except on clean shutdown, is that refcounts can be repaired when the
image is opened the next time after a crash.

This series adds a qemu-img check option that doesn't only check, but also
tries to fix the errors that it found.

Kevin Wolf (3):
  qemu-img check -r for repairing images
  qemu-img check: Print fixed clusters and recheck
  qcow2: Support for fixing refcount inconsistencies

 block.c                |    4 ++--
 block.h                |    9 -
 block/qcow2-refcount.c |   27 +--
 block/qcow2.c          |    5 +++--
 block/qcow2.h          |    3 ++-
 block/qed-check.c      |    2 ++
 block/qed.c            |    5 +++--
 block/vdi.c            |    7 ++-
 block_int.h            |    3 ++-
 qemu-img-cmds.hx       |    4 ++--
 qemu-img.c             |   35 ---
 qemu-img.texi          |    7 ++-
 12 files changed, 93 insertions(+), 18 deletions(-)

-- 
1.7.6.5