Re: [Qemu-devel] [Qemu-block] [PATCH v3] block/vdi: Use bdrv_flush after metadata updates

2015-05-07 Thread Eric Blake
On 05/06/2015 10:04 PM, Zhe Qiu wrote:
> From: phoeagon 
> 
> In reference to 
> b0ad5a455d7e5352d4c86ba945112011dbeadfb8~078a458e077d6b0db262c4b05fee51d01de2d1d2,
>  metadata writes to qcow2/cow/qcow/vpc/vmdk are all synced prior to 
> succeeding writes.
> 

Please wrap commit comments to be under 80 columns (in fact, under 72 is
good, because 'git log' adds spaces when displaying commit bodies).

Your notation of commit~commit is unusual; ranges in git are usually
spelled commit..commit.  Also, it's okay to abbreviate commit SHAs to 8
characters or so.  So to shrink your long line, I'd write b0ad5a45..078a458e.
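For example, the abbreviated range can then be fed straight to git to list
the commits in question:

```
git log --oneline b0ad5a45..078a458e
```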

> Only when the write is successful is bdrv_flush called.
> 
> Signed-off-by: Zhe Qiu 

Your 'From:' and 'Signed-off-by:' lines have different spellings, which
makes it more confusing when searching for patches by you.  You can add
an entry to .mailmap (as a separate patch) to retro-actively consolidate
entries, but it is better to catch things like this up front before we
even have the problem of separate names.

I suspect you want "Zhe Qiu" as the spelling of your name in both lines
(git config can be taught to set up the preferred name to attribute both
for signing patches and for sending email).  It is also acceptable to
use UTF-8 to spell your name in native characters, or even a combination
of "native name (ASCII counterpart)".
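For reference, a .mailmap entry mapping the commit-time identity to the
preferred one takes the following shape (the addresses here are
placeholders, since the real ones are not shown above):

```
Zhe Qiu <preferred@example.com> phoeagon <commit@example.com>
```

The preferred name itself is set with `git config user.name "Zhe Qiu"`.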

-- 
Eric Blake   eblake redhat com   +1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [Qemu-block] [PATCH v3] block/vdi: Use bdrv_flush after metadata updates

2015-05-08 Thread Kevin Wolf
On 07.05.2015 at 06:04, Zhe Qiu wrote:
> From: phoeagon 
> 
> In reference to 
> b0ad5a455d7e5352d4c86ba945112011dbeadfb8~078a458e077d6b0db262c4b05fee51d01de2d1d2,
>  metadata writes to qcow2/cow/qcow/vpc/vmdk are all synced prior to 
> succeeding writes.

The justification for these patches (in 2010!) was more or less that we
didn't know whether the implementations were safe without the flushes.
Many of the flushes added back then have since been removed again because
they turned out to be unnecessary.

> Only when the write is successful is bdrv_flush called.
> 
> Signed-off-by: Zhe Qiu 

Please describe why VDI needs this flush to be safe. This is best done
by describing a case where not doing the flush would lead to image
corruption in case of a crash.

To my knowledge, the VDI driver is completely safe without it.

Kevin



Re: [Qemu-devel] [Qemu-block] [PATCH v3] block/vdi: Use bdrv_flush after metadata updates

2015-05-08 Thread phoeagon
In terms of correctness, lacking a sync here does not introduce any data
corruption that I can think of. But adding the sync does shrink the
volatile window during which the metadata changes are NOT guaranteed to be
on disk. Without a barrier, in case of power loss you may end up with the
bitmap changes on disk but not the header block, or vice versa. Neither
introduces data corruption directly, but since qemu-img has no proper
repair mechanism for VDI, once a leak is introduced you have to "convert"
the image to fix it, which takes a long time if the disk is large.

This patch does not fix the issue entirely, and it does not substitute for
a proper check-and-fix implementation. But it should cause only minor
performance degradation (one extra sync per allocation) while greatly
reducing the metadata inconsistency window.

On Fri, May 8, 2015 at 6:43 PM Kevin Wolf  wrote:

> On 07.05.2015 at 06:04, Zhe Qiu wrote:
> > From: phoeagon 
> >
> > In reference to
> > b0ad5a455d7e5352d4c86ba945112011dbeadfb8~078a458e077d6b0db262c4b05fee51d01de2d1d2,
> > metadata writes to qcow2/cow/qcow/vpc/vmdk are all synced prior to
> > succeeding writes.
>
> The justification for these patches (in 2010!) was more or less that we
> didn't know whether the implementations were safe without the flushes.
> Many of the flushes added back then have since been removed again because
> they turned out to be unnecessary.
>
> > Only when the write is successful is bdrv_flush called.
> >
> > Signed-off-by: Zhe Qiu 
>
> Please describe why VDI needs this flush to be safe. This is best done
> by describing a case where not doing the flush would lead to image
> corruption in case of a crash.
>
> To my knowledge, the VDI driver is completely safe without it.
>
> Kevin
>


Re: [Qemu-devel] [Qemu-block] [PATCH v3] block/vdi: Use bdrv_flush after metadata updates

2015-05-08 Thread Kevin Wolf
On 08.05.2015 at 13:50, phoeagon wrote:
> In terms of correctness, lacking a sync here does not introduce any data
> corruption that I can think of. But adding the sync does shrink the
> volatile window during which the metadata changes are NOT guaranteed to
> be on disk. Without a barrier, in case of power loss you may end up with
> the bitmap changes on disk but not the header block, or vice versa.
> Neither introduces data corruption directly, but since qemu-img has no
> proper repair mechanism for VDI, once a leak is introduced you have to
> "convert" the image to fix it, which takes a long time if the disk is
> large.

This is true. I'm not sure how big a problem this is in practice,
though.

> This patch does not fix the issue entirely, and it does not substitute
> for a proper check-and-fix implementation. But it should cause only minor
> performance degradation (one extra sync per allocation) while greatly
> reducing the metadata inconsistency window.

Did you benchmark this? From past experience with flushes in qemu block
drivers, one sync per allocation certainly doesn't sound "minor".

What could possibly save us from the worst is that VDI has a relatively
large block size (or rather, that we don't support images with different
block sizes).

Kevin



Re: [Qemu-devel] [Qemu-block] [PATCH v3] block/vdi: Use bdrv_flush after metadata updates

2015-05-08 Thread phoeagon
I tested it with a btrfs host FS and an ext4 guest FS (connected to the
guest via virtualized IDE), on a 1G VDI image, running "dbench 10":

synced:
Writeback: 39.4852 MB/s, 326.812 ms
Unsafe: 432.72 MB/s, 1029.313 ms

normal:
Writeback: 39.1884 MB/s, 311.383 ms
Unsafe: 413.592 MB/s, 280.951 ms

Although I hear that dbench is not a great I/O benchmark (and iozone is
just a little too much hassle), I'm pretty sure the difference, if any, is
within the measurement fluctuation in this case. Under my setup, even a
raw "dd of=/dev/sdb" (from within a VM) doesn't get more than 20 MB/s,
even without the extra write barrier from the proposed patch.

So although the test above is not comprehensive (you could probably
extract the most overhead by deliberately doing O_DIRECT writes at a 1 MB
stride, 512 KB each, with no application-level sync, in "writeback" cache
mode: what previously stayed mostly in the host page cache must now be
flushed to the hard disk, probably in an inconvenient order if the host FS
is highly fragmented), I doubt that consecutive raw writes covering
several megabytes, or a guest-FS-based workload, would see much
performance overhead after this extra sync is introduced.

On Fri, May 8, 2015 at 8:02 PM Kevin Wolf  wrote:

> On 08.05.2015 at 13:50, phoeagon wrote:
> > In terms of correctness, lacking a sync here does not introduce any
> > data corruption that I can think of. But adding the sync does shrink
> > the volatile window during which the metadata changes are NOT
> > guaranteed to be on disk. Without a barrier, in case of power loss you
> > may end up with the bitmap changes on disk but not the header block,
> > or vice versa. Neither introduces data corruption directly, but since
> > qemu-img has no proper repair mechanism for VDI, once a leak is
> > introduced you have to "convert" the image to fix it, which takes a
> > long time if the disk is large.
>
> This is true. I'm not sure how big a problem this is in practice,
> though.
>
> > This patch does not fix the issue entirely, and it does not
> > substitute for a proper check-and-fix implementation. But it should
> > cause only minor performance degradation (one extra sync per
> > allocation) while greatly reducing the metadata inconsistency window.
>
> Did you benchmark this? From the past experience with flushes in qemu
> block drivers, one sync per allocation certainly doesn't sound "minor".
>
> What could possibly save us from the worst is that VDI has a relatively
> large block size (or rather, that we don't support images with different
> block sizes).
>
> Kevin
>