Re: [Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts
On Wed, Jun 6, 2012 at 3:53 PM, Zhi Yong Wu zwu.ker...@gmail.com wrote:
> On Wed, Jun 6, 2012 at 6:32 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
>> Read requests, however, can be satisfied from the host page cache.
>
> Ah, I see now: with cache=writethrough the host page cache is applied to read requests, not writes. Thanks.

Writes are placed in the host page cache too, so future reads can be served from the cache. But O_SYNC also forces the kernel to immediately sync the data in the host page cache to disk.

Stefan
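A minimal Python sketch of the behaviour described in this exchange, assuming a POSIX host that provides `os.O_SYNC`. The helper names are hypothetical and this is not QEMU code. It only illustrates that a write on a file opened with `O_SYNC` returns after the kernel has written the data through to the underlying device, while the written data also stays in the host page cache where later reads can find it.

```python
import os
import tempfile


def open_writethrough(path):
    """Open an image file the way cache=writethrough is described above:
    with O_SYNC, so each write returns only after the data has been
    written through to the storage device."""
    return os.open(path, os.O_RDWR | os.O_CREAT | os.O_SYNC, 0o600)


def guest_write(fd, offset, data):
    # No separate fsync() is needed: once pwrite() returns, the data is on
    # disk and the guest's write request could be completed.
    written = os.pwrite(fd, data, offset)
    if written != len(data):
        raise OSError("short write")
    return written


def guest_read(fd, offset, length):
    # O_SYNC does not affect reads; they may be served from the page cache.
    return os.pread(fd, length, offset)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fd = open_writethrough(os.path.join(tmp, "disk.img"))
        try:
            guest_write(fd, 4096, b"guest data")
            print(guest_read(fd, 4096, 10))  # prints b'guest data'
        finally:
            os.close(fd)
```

The cost of this mode is on the write side: every write waits for the device, which is why metadata updates that must be written through are expensive.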
Re: [Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts
On Fri, Jun 1, 2012 at 9:26 AM, Zhi Yong Wu zwu.ker...@gmail.com wrote:
> On Fri, Jun 1, 2012 at 4:06 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
>> Since the emulated disk write cache is off, we must ensure that guest writes are on disk before completing them. Therefore we cannot cache metadata updates in host RAM: they would be lost on power failure, but we promised the guest its writes reached the disk!
>
> But the host page cache is *on* in this mode, which means that metadata should be cached in host RAM. How do you explain this?

cache=writethrough means that the file is opened with O_SYNC. Every single write reaches the physical disk; that's why it's called a writethrough cache. Read requests, however, can be satisfied from the host page cache.

In other words, cache=writethrough ensures that all data reaches the disk but may give performance benefits to read-heavy workloads (especially when guest RAM is much smaller than host RAM, so the host page cache would have a high hit rate).

>>> QED mode is a solution for -drive ...,cache=writethrough|directsync. It simply doesn't update refcount metadata in the qcow2 image file immediately in exchange for a refcount fixup step that is introduced when opening the image file.
>
> Do the L1/L2 tables still need to be updated in the qcow2 image file?

Yes, this is necessary to ensure written data is accessible in the future. Without the L1/L2 tables we cannot find the data we wrote.

Stefan
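To make "without the L1/L2 tables we cannot find the data" concrete, here is a small Python sketch of qcow2's two-level lookup. The index arithmetic follows the qcow2 layout (an L2 table fills one cluster with 8-byte entries, so it holds cluster_size / 8 entries); the dict-based tables and the function names are hypothetical simplifications, not QEMU's implementation, and the flag bits that real L2 entries carry are ignored.

```python
from typing import Dict, Optional, Tuple


def split_offset(guest_offset: int, cluster_bits: int = 16) -> Tuple[int, int, int]:
    """Split a guest offset into (L1 index, L2 index, offset in cluster)."""
    l2_bits = cluster_bits - 3          # 8-byte entries per cluster-sized table
    cluster_size = 1 << cluster_bits
    l1_index = guest_offset >> (l2_bits + cluster_bits)
    l2_index = (guest_offset >> cluster_bits) & ((1 << l2_bits) - 1)
    in_cluster = guest_offset & (cluster_size - 1)
    return l1_index, l2_index, in_cluster


def lookup(l1: Dict[int, Dict[int, int]], guest_offset: int,
           cluster_bits: int = 16) -> Optional[int]:
    """Return the host file offset for guest_offset, or None if unallocated.

    Hypothetical model: l1 maps L1 index -> L2 table, and an L2 table maps
    L2 index -> host offset of the data cluster."""
    l1_index, l2_index, in_cluster = split_offset(guest_offset, cluster_bits)
    l2_table = l1.get(l1_index)
    if l2_table is None:
        # No L2 table: the whole region it would cover (512 MiB with
        # 64 KiB clusters) reads as unallocated.
        return None
    host_cluster = l2_table.get(l2_index)
    if host_cluster is None:
        # Never allocated, or the L2 update never reached the image file.
        return None
    return host_cluster + in_cluster


if __name__ == "__main__":
    l1 = {0: {3: 0x50000}}
    print(lookup(l1, 3 * 65536 + 100))  # prints 327780 (0x50000 + 100)
```

If an allocation wrote the data cluster but lost the L2 entry, `lookup` returns None and the data is unreachable; that is why L1/L2 updates cannot be deferred the way refcounts can.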
Re: [Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts
On Wed, Jun 6, 2012 at 6:32 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
> On Fri, Jun 1, 2012 at 9:26 AM, Zhi Yong Wu zwu.ker...@gmail.com wrote:
>> But the host page cache is *on* in this mode, which means that metadata should be cached in host RAM. How do you explain this?
>
> cache=writethrough means that the file is opened with O_SYNC. Every single write reaches the physical disk; that's why it's called a writethrough cache. Read requests, however, can be satisfied from the host page cache.
>
> In other words, cache=writethrough ensures that all data reaches the disk but may give performance benefits to read-heavy workloads (especially when guest RAM is much smaller than host RAM, so the host page cache would have a high hit rate).

Ah, I see now: with cache=writethrough the host page cache is applied to read requests, not writes. Thanks.

--
Regards,

Zhi Yong Wu
Re: [Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts
On Fri, Jun 1, 2012 at 6:22 AM, Zhi Yong Wu zwu.ker...@gmail.com wrote:
> On Thu, May 31, 2012 at 5:26 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
>> But when QEMU is launched with -drive ...,cache=writethrough (the default) the metadata cache *must* be in writethrough mode instead of writeback mode.
>
> Why must it be? If -drive ...,cache=writethrough is specified, it means that the host page cache is on while the guest disk cache is off. Since the metadata cache lives in the host page cache, not in the guest, I think it is in writeback mode.

Since the emulated disk write cache is off, we must ensure that guest writes are on disk before completing them. Therefore we cannot cache metadata updates in host RAM: they would be lost on power failure, but we promised the guest its writes reached the disk!

>> In other words, every metadata update needs to be written to the image file before we complete the guest's write request.
>
> What does it mean for a guest's write request to be completed?

For example, virtio-blk fills in the success status code and raises an interrupt. This notifies the guest that the write is done.

>> QED mode is a solution for -drive ...,cache=writethrough|directsync. It simply doesn't update refcount metadata in the qcow2 image file immediately in exchange for a refcount fixup step that is introduced when opening the image file.
>
> Can you say this in more detail? Why is this step needed only when the image file is opened? After the image file is opened and some guest write requests have completed, maybe the refcount fixup step needs to be done once at that point.

If we don't update refcounts on disk, they become outdated and no longer reflect the true allocation information. It's not safe to rely on outdated refcount information, since we could allocate the same cluster multiple times, which means data corruption. By running a consistency check when opening a dirty image file, we guarantee that we have accurate refcount information again.

As an optimization, we will commit refcount information to disk when closing the image file and mark it clean. This means a clean QEMU shutdown does not require a consistency check on startup, but in the worst case (power failure or crash) we will have a dirty image file.

Stefan
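The open/close protocol described above can be modelled in a few lines of Python. Everything here (the `ToyImage` class, a dict standing in for the on-disk L2 mapping, a list for the on-disk refcounts, the dirty flag field) is a hypothetical toy, not the qcow2 on-disk format or QEMU code. L2 updates are "persisted" immediately, refcounts only on clean close, and opening a dirty image rebuilds refcounts from the L2 mapping.

```python
from collections import Counter


class ToyImage:
    """Hypothetical in-memory model of an image with deferred refcounts."""

    def __init__(self, num_clusters):
        # "On-disk" state: survives a crash.
        self.disk_l2 = {}                        # guest cluster -> host cluster
        self.disk_refcounts = [0] * num_clusters
        self.disk_dirty = False
        # In-memory state: lost on a crash.
        self.refcounts = None

    def open(self):
        if self.disk_dirty:
            # Consistency check: on-disk refcounts may be stale, so recompute
            # them from the mappings that are known to be on disk.
            used = Counter(self.disk_l2.values())
            self.refcounts = [used[i] for i in range(len(self.disk_refcounts))]
        else:
            self.refcounts = list(self.disk_refcounts)
        # Simplification: mark the image dirty as soon as it is opened.
        self.disk_dirty = True

    def allocate(self, guest_cluster):
        host = self.refcounts.index(0)           # first free cluster; ValueError if full
        self.refcounts[host] = 1                 # refcount update stays in RAM
        self.disk_l2[guest_cluster] = host       # L2 update is written immediately
        return host

    def close(self):
        self.disk_refcounts = list(self.refcounts)  # commit refcounts ...
        self.disk_dirty = False                     # ... then mark the image clean

    def crash(self):
        self.refcounts = None                    # in-memory refcounts are lost


if __name__ == "__main__":
    img = ToyImage(4)
    img.open()
    img.allocate(guest_cluster=10)               # uses host cluster 0
    img.crash()                                  # refcount for cluster 0 never hit disk
    print(img.disk_refcounts)                    # prints [0, 0, 0, 0] (stale)
    img.open()                                   # dirty, so refcounts are rebuilt
    print(img.allocate(guest_cluster=11))        # prints 1, not 0
```

Trusting the stale `[0, 0, 0, 0]` after the crash would hand host cluster 0 out a second time, which is exactly the double allocation the check on open prevents.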
Re: [Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts
On Fri, Jun 1, 2012 at 4:06 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
> On Fri, Jun 1, 2012 at 6:22 AM, Zhi Yong Wu zwu.ker...@gmail.com wrote:
>> Why must it be? If -drive ...,cache=writethrough is specified, it means that the host page cache is on while the guest disk cache is off. Since the metadata cache lives in the host page cache, not in the guest, I think it is in writeback mode.
>
> Since the emulated disk write cache is off, we must ensure that guest writes are on disk before completing them. Therefore we cannot cache metadata updates in host RAM: they would be lost on power failure, but we promised the guest its writes reached the disk!

But the host page cache is *on* in this mode, which means that metadata should be cached in host RAM. How do you explain this?

>> What does it mean for a guest's write request to be completed?
>
> For example, virtio-blk fills in the success status code and raises an interrupt. This notifies the guest that the write is done.

Great, thanks.

>>> QED mode is a solution for -drive ...,cache=writethrough|directsync. It simply doesn't update refcount metadata in the qcow2 image file immediately in exchange for a refcount fixup step that is introduced when opening the image file.

Do the L1/L2 tables still need to be updated in the qcow2 image file?

> If we don't update refcounts on disk, they become outdated and no longer reflect the true allocation information. It's not safe to rely on outdated refcount information, since we could allocate the same cluster multiple times, which means data corruption. By running a consistency check when opening a dirty image file, we guarantee that we have accurate refcount information again.

Ah, I got it now.

> As an optimization, we will commit refcount information to disk when closing the image file and mark it clean. This means a clean QEMU shutdown does not require a consistency check on startup, but in the worst case (power failure or crash) we will have a dirty image file.

Yeah, a consistency check on startup is good, I think. Thanks.

--
Regards,

Zhi Yong Wu
Re: [Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts
On Wed, May 30, 2012 at 9:31 AM, Zhi Yong Wu zwu.ker...@gmail.com wrote:
> On Sat, May 12, 2012 at 12:48 AM, Kevin Wolf kw...@redhat.com wrote:
>> A prerequisite for a QED mode in qcow2, which doesn't update the refcount table except on clean shutdown, is that refcounts can be repaired when the image is opened the next time after a crash.
>
> Recently, new concepts such as "QED mode in qcow2" have come up frequently. Can anyone explain what it means? Thanks.

qcow2 has more metadata than QED. More metadata means more write operations when allocating new clusters. To overcome this performance issue, qcow2 has a metadata cache.

But when QEMU is launched with -drive ...,cache=writethrough (the default), the metadata cache *must* be in writethrough mode instead of writeback mode. In other words, every metadata update needs to be written to the image file before we complete the guest's write request. This means the metadata cache only hides the metadata performance issue when -drive ...,cache=none|writeback is used, because there we can keep metadata changes buffered in memory until the guest flushes the emulated disk write cache.

QED mode is a solution for -drive ...,cache=writethrough|directsync. It simply doesn't update refcount metadata in the qcow2 image file immediately, in exchange for a refcount fixup step that is introduced when opening the image file. It's like doing an fsck operation on a file system when mounting it.

Stefan
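A hypothetical Python sketch of why a metadata cache only pays off in writeback mode. The `MetadataCache` class and its write counter are illustrative assumptions, not QEMU's qcow2 cache. In writethrough mode every update is written to the image before the guest request can complete; in writeback mode dirty tables stay in RAM until the guest flushes its emulated write cache, so many updates to one table cost a single write.

```python
class MetadataCache:
    """Toy cache of metadata tables (for example L2 tables or refcount blocks)."""

    def __init__(self, writethrough):
        self.writethrough = writethrough
        self.tables = {}          # table offset -> {index: value}, cached in RAM
        self.dirty = set()        # offsets modified but not yet written
        self.image_writes = 0     # metadata writes issued to the image file

    def _write_to_image(self, offset):
        self.image_writes += 1    # stand-in for writing the table to the file

    def update(self, offset, index, value):
        table = self.tables.setdefault(offset, {})
        table[index] = value
        if self.writethrough:
            # The guest request cannot complete until this reaches the image.
            self._write_to_image(offset)
        else:
            self.dirty.add(offset)

    def flush(self):
        # The guest flushed its emulated disk write cache: write each dirty
        # table once, however many times it was updated.
        for offset in sorted(self.dirty):
            self._write_to_image(offset)
        self.dirty.clear()


if __name__ == "__main__":
    for mode in (True, False):
        cache = MetadataCache(writethrough=mode)
        for i in range(100):      # 100 allocations landing in one L2 table
            cache.update(offset=0x30000, index=i, value=i)
        cache.flush()
        print(mode, cache.image_writes)  # prints "True 100", then "False 1"
```

QED mode attacks the writethrough case from a different angle: rather than caching refcount updates, it skips writing them until a clean close.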
Re: [Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts
On Thu, May 31, 2012 at 5:26 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
> But when QEMU is launched with -drive ...,cache=writethrough (the default) the metadata cache *must* be in writethrough mode instead of writeback mode.

Why must it be? If -drive ...,cache=writethrough is specified, it means that the host page cache is on while the guest disk cache is off. Since the metadata cache lives in the host page cache, not in the guest, I think it is in writeback mode.

> In other words, every metadata update needs to be written to the image file before we complete the guest's write request.

What does it mean for a guest's write request to be completed?

> QED mode is a solution for -drive ...,cache=writethrough|directsync. It simply doesn't update refcount metadata in the qcow2 image file immediately in exchange for a refcount fixup step that is introduced when opening the image file. It's like doing an fsck operation on a file system when mounting it.

Can you say this in more detail? Why is this step needed only when the image file is opened? After the image file is opened and some guest write requests have completed, maybe the refcount fixup step needs to be done once at that point.

--
Regards,

Zhi Yong Wu
Re: [Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts
On Sat, May 12, 2012 at 12:48 AM, Kevin Wolf kw...@redhat.com wrote:
> A prerequisite for a QED mode in qcow2, which doesn't update the refcount table except on clean shutdown, is that refcounts can be repaired when the image is opened the next time after a crash.

Recently, new concepts such as "QED mode in qcow2" have come up frequently. Can anyone explain what it means? Thanks.

--
Regards,

Zhi Yong Wu
Re: [Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts
On Fri, May 11, 2012 at 5:48 PM, Kevin Wolf kw...@redhat.com wrote:
> A prerequisite for a QED mode in qcow2, which doesn't update the refcount table except on clean shutdown, is that refcounts can be repaired when the image is opened the next time after a crash. This series adds a qemu-img check option that doesn't only check, but also tries to fix the errors that it found.
>
> Kevin Wolf (3):
>   qemu-img check -r for repairing images
>   qemu-img check: Print fixed clusters and recheck
>   qcow2: Support for fixing refcount inconsistencies

Looks good except for the one comment I posted.

Stefan
[Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts
A prerequisite for a QED mode in qcow2, which doesn't update the refcount table except on clean shutdown, is that refcounts can be repaired when the image is opened the next time after a crash. This series adds a qemu-img check option that not only checks, but also tries to fix the errors that it finds.

Kevin Wolf (3):
  qemu-img check -r for repairing images
  qemu-img check: Print fixed clusters and recheck
  qcow2: Support for fixing refcount inconsistencies

 block.c                |  4
 block.h                |  9
 block/qcow2-refcount.c | 27
 block/qcow2.c          |  5
 block/qcow2.h          |  3
 block/qed-check.c      |  2
 block/qed.c            |  5
 block/vdi.c            |  7
 block_int.h            |  3
 qemu-img-cmds.hx       |  4
 qemu-img.c             | 35
 qemu-img.texi          |  7
 12 files changed, 93 insertions(+), 18 deletions(-)

--
1.7.6.5
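A rough Python sketch of what checking and repairing refcounts involves, assuming a simplified image in which every reference (from the L1/L2 tables, snapshots and the image's own metadata) has already been collected as a list of host cluster indices within range. The function and dataclass are hypothetical and are not a transcription of these patches. The two categories mirror the leaked-cluster vs error distinction that qemu-img check reports: a refcount higher than the number of references only wastes space, while a lower one lets a cluster be handed out twice.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class CheckResult:
    leaks: int = 0        # refcount higher than the number of references
    corruptions: int = 0  # refcount lower than the number of references
    fixed: int = 0


def check_refcounts(stored: List[int], references: Iterable[int],
                    fix_leaks: bool = False,
                    fix_errors: bool = False) -> CheckResult:
    """Compare stored refcounts with counts recomputed from all references
    and optionally repair `stored` in place. Assumes every reference is a
    valid index into `stored`."""
    computed = Counter(references)
    result = CheckResult()
    for cluster, refcount in enumerate(stored):
        expected = computed[cluster]
        if refcount == expected:
            continue
        if refcount > expected:
            result.leaks += 1            # wasted space, no data at risk
            if fix_leaks:
                stored[cluster] = expected
                result.fixed += 1
        else:
            result.corruptions += 1      # cluster could be allocated twice
            if fix_errors:
                stored[cluster] = expected
                result.fixed += 1
    return result


if __name__ == "__main__":
    refs = [0, 1, 1, 2]     # cluster 1 is referenced twice (e.g. by a snapshot)
    stored = [1, 1, 1, 1]   # cluster 1 undercounted, cluster 3 leaked
    print(check_refcounts(stored, refs, fix_leaks=True, fix_errors=True))
    # prints CheckResult(leaks=1, corruptions=1, fixed=2)
    print(check_refcounts(stored, refs))  # recheck after repair
    # prints CheckResult(leaks=0, corruptions=0, fixed=0)
```

Running the check a second time after repairing, as in the usage above, is a cheap way to confirm that the fixes left the refcounts consistent.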