Re: [Qemu-devel] disk image: self-organized format or raw file

2014-08-15 Thread Kevin Wolf
On 14.08.2014 at 22:53, Xingbo Wu wrote:
> >> >> The main trick of QED was to introduce a dirty flag, which allowed to
> >> >> call fdatasync() less often because it was okay for image metadata to
> >> >> become inconsistent. After a crash, you have to repair the image then.
> >> >>
> >> >
> >> > I'm very curious about this dirty flag trick. I was surprised when I
> >> > observed very fast 'sync write' performance on QED.
> >> > If it skips the fdatasync when processing the device 'flush' command from
> >> > guest, it literally cheats the guest as the data can be lost. Am I that 
> >> > correct?
> >> > Does the repairing make sure all the data written before the last
> >> > successful 'flush'
> >> > can be recovered?
> >> > To my understanding, the 'flush' command in guest asks for persistence.
> >> > Data has to be persistent on host storage after flush except for the
> >> > image opened with 'cache=unsafe' mode.
> >> >
> >>
> >> I have some different ideas. Please correct me if I make any mistake.
> >> The trick may not cause true consistency issues. The relaxed write
> >> ordering (less fdatasync) seems to be safe.
> >> The analysis on this is described in this
> >> [http://lists.nongnu.org/archive/html/qemu-devel/2010-09/msg00515.html].
> >
> > Yes, specifically point 3. Without the dirty flag, you would have to
> > ensure that the file size is updated first and then the L2 table entry
> > is written. (This would still allow cluster leaks that cannot be
> > reclaimed, but at least no data corruption.)
> >
> >> In my opinion the reason why the ordering is irreverent is that any
> >> uninitialized block could exist in a block device.
> >> Unordered update l1 and alloc-write l2 are also safe because
> >> uninitialized blocks in a file is always zero or beyond the EOF.
> >
> > Yes. This holds true because QED (unlike qcow2) cannot be used directly
> > on block devices. This is a real limitation.
> >
> 
> I don't know much about the best practices in virtualization. Could
> you give me some examples? Thanks.
> Do some products provide resizeable (automatically?) Logical Volumes
> and put one qcow2 on each LV?
> Anyway, does someone use a physical disk to hold only one qcow2 image
> for some special usage?

I would be surprised if someone used a whole physical disk for a single
qcow2 image, but some people always do crazier things than you can
imagine...

Anyway, oVirt uses LVs to store qcow2 images on. It resizes the LVs on
the fly as they fill up. This seems to be working quite well.
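
A minimal sketch of the "resize as it fills up" idea, assuming a simple
watermark policy. The function name, the threshold and the chunk size are
hypothetical illustrations and are not taken from oVirt:

```python
GIB = 1 << 30


def next_lv_size(lv_size, highest_written, threshold=0.5, chunk=GIB):
    """Return the size the LV should have for the given image usage.

    lv_size         -- current size of the logical volume in bytes
    highest_written -- highest offset the qcow2 image has written so far
    threshold/chunk -- hypothetical policy: extend by one chunk as soon as
                       less than threshold * chunk bytes remain free
    """
    free = lv_size - highest_written
    if free < threshold * chunk:
        return lv_size + chunk
    return lv_size
```

With these assumed defaults, an LV of 2 GiB with 1 GiB written stays at
2 GiB, while one with only 256 MiB left would be extended to 3 GiB.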

Kevin



Re: [Qemu-devel] disk image: self-organized format or raw file

2014-08-14 Thread Xingbo Wu
>> >> The main trick of QED was to introduce a dirty flag, which allowed to
>> >> call fdatasync() less often because it was okay for image metadata to
>> >> become inconsistent. After a crash, you have to repair the image then.
>> >>
>> >
>> > I'm very curious about this dirty flag trick. I was surprised when I
>> > observed very fast 'sync write' performance on QED.
>> > If it skips the fdatasync when processing the device 'flush' command from
>> > guest, it literally cheats the guest as the data can be lost. Am I that 
>> > correct?
>> > Does the repairing make sure all the data written before the last
>> > successful 'flush'
>> > can be recovered?
>> > To my understanding, the 'flush' command in guest asks for persistence.
>> > Data has to be persistent on host storage after flush except for the
>> > image opened with 'cache=unsafe' mode.
>> >
>>
>> I have some different ideas. Please correct me if I make any mistake.
>> The trick may not cause true consistency issues. The relaxed write
>> ordering (less fdatasync) seems to be safe.
>> The analysis on this is described in this
>> [http://lists.nongnu.org/archive/html/qemu-devel/2010-09/msg00515.html].
>
> Yes, specifically point 3. Without the dirty flag, you would have to
> ensure that the file size is updated first and then the L2 table entry
> is written. (This would still allow cluster leaks that cannot be
> reclaimed, but at least no data corruption.)
>
>> In my opinion the reason why the ordering is irreverent is that any
>> uninitialized block could exist in a block device.
>> Unordered update l1 and alloc-write l2 are also safe because
>> uninitialized blocks in a file is always zero or beyond the EOF.
>
> Yes. This holds true because QED (unlike qcow2) cannot be used directly
> on block devices. This is a real limitation.
>

I don't know much about the best practices in virtualization. Could
you give me some examples? Thanks.
Do some products provide (automatically?) resizable Logical Volumes
and put one qcow2 image on each LV?
Also, does anyone use a physical disk to hold only one qcow2 image
for some special purpose?

>> Any unsuccessful write of the l1/l2/data would cause the loss of the
>> data. However, at that point the guest must not have returned from its
>> last 'flush' so the guest won't have consistency issue on its data.
>> The repair process (qed-check.c) doesn't recover data, it only does
>> some scanning for processing new requests. the 'check' can be
>> considered as a normal operation of bdrv_open().
>>
>> BTW, filesystems heavily use this kind of 'tricks' to improve performance.
>> The sync write could return as a indication of data being persistently
>> written, while the data may have only been committed to the journal.
>> Scanning and recovering from journal is considered as the normal job
>> of filesystems.
>
> But this is not a journal. It is something like fsck in ext2 times.
>
> I believe qcow2 could be optimised a bit more if we added a journal to
> it, but currently qcow2 performance isn't a problem urgent enough that I
> could easily find the time to implement it. (We've discussed it several
> times in the past.)
>
> Kevin



-- 

Cheers!
   吴兴博  Wu, Xingbo 



Re: [Qemu-devel] disk image: self-organized format or raw file

2014-08-14 Thread Kevin Wolf
On 14.08.2014 at 04:42, Xingbo Wu wrote:
> On Wed, Aug 13, 2014 at 5:04 PM, Xingbo Wu  wrote:
> > On Wed, Aug 13, 2014 at 2:32 PM, Kevin Wolf  wrote:
> >> On 13.08.2014 at 18:38, Xingbo Wu wrote:
> >>> On Wed, Aug 13, 2014 at 11:54 AM, Kevin Wolf  wrote:
> >>> > On 12.08.2014 at 01:38, 吴兴博 wrote:
> >>> >> Hello,
> >>> >>
> >>> >>   The introduction in the wiki page present several advantages of 
> >>> >> qcow2 [1].
> >>> >> But I'm a little confused. I really appreciate if any one can give me 
> >>> >> some help
> >>> >> on this :).
> >>> >>
> >>> >>  (1) Currently the raw format doesn't support COW. In other words, a 
> >>> >> raw image
> >>> >> cannot have a backing file. COW depends on the mapping table on which 
> >>> >> we it
> >>> >> knows whether each block/cluster is present (has been modified) in the 
> >>> >> current
> >>> >> image file. Modern file-systems like xfs/ext4/etc. provide extent/block
> >>> >> allocation information to user-level. Like what 'filefrag' does with 
> >>> >> ioctl
> >>> >> 'FIBMAP' and 'FIEMAP'. I guess the raw file driver (maybe 
> >>> >> block/raw-posix.c)
> >>> >> may obtain correct 'present information about blocks. However this 
> >>> >> information
> >>> >> may be limited to be aligned with file allocation unit size. Maybe 
> >>> >> it's just
> >>> >> because a raw file has no space to store the "backing file name"? I 
> >>> >> don't think
> >>> >> this could hinder the useful feature.
> >>> >>
> >>> >>  (2) As most popular filesystems support delay-allocation/on-demand 
> >>> >> allocation/
> >>> >> holes, whatever, a raw image is also thin provisioned as other 
> >>> >> formats. It
> >>> >> doesn't consume much disk space by storing useless zeros. However, I 
> >>> >> don't know
> >>> >> if there is any concern on whether fragmented extents would become a 
> >>> >> burden of
> >>> >> the host filesystem.
> >>> >>
> >>> >>  (3) For compression and encryption, I'm not an export on these topics 
> >>> >> at all
> >>> >> but I think these features may not be vital to a image format as both 
> >>> >> guest/
> >>> >> host's filesystem can also provide similar functionality.
> >>> >>
> >>> >>  (4) I don't have too much understanding on how snapshot works but I 
> >>> >> think
> >>> >> theoretically it would be using the techniques no more than that used 
> >>> >> in COW
> >>> >> and backing file.
> >>> >>
> >>> >> After all these thoughts, I still found no reason to not using a 'raw' 
> >>> >> file
> >>> >> image (engineering efforts in Qemu should not count as we don't ask  
> >>> >> for more
> >>> >> features from outside world).
> >>> >> I would be very sorry if my ignorance wasted your time.
> >>> >
> >>> > Even if it did work (that it's problematic is already discussed in other
> >>> > subthreads) what advantage would you get from using an extended raw
> >>> > driver compared to simply using qcow2, which supports all of this today?
> >>> >
> >>> > Kevin
> >>>
> >>>
> >>> I read several messages from this thread: "[RFC] qed: Add QEMU
> >>> Enhanced Disk format". To my understanding, if the new format can be
> >>> acceptable to the community:
> >>>   It needs to retain all the key features provided by qcow2,
> >>> especially for compression, encryption, and internal snapshot, as
> >>> mentioned in that thread.
> >>>   And, needless to say, it must run faster.
> >>>
> >>> Yes I agree it's at least a subset of the homework one need to do
> >>> before selling the new format to the community.
> >>
> >> So your goal is improved performance?
> >>
> >
> > Yes if performance is not improved I won't spend more time on it :).
> > I believe it's gonna be very difficult.
> >
> >> Why do you think that a raw driver with backing file support would run
> >> much faster than qcow2? It would have to solve the same problems, like
> >> doing efficient COW.
> >>
> >>> Thanks and another question:
> >>> What's the magic that makes QED runs faster than QCOW2?
> >>
> >> During cluster allocation (which is the real critical part), QED is a
> >> lot slower than today's qcow2. And by that I mean not just a few
> >> percent, but like half the performance. After that, when accessing
> >> already allocated data, both perform similar. Mailing list discussions
> >> of four years ago don't reflect accurately how qemu works today.
> >>
> >> The main trick of QED was to introduce a dirty flag, which allowed to
> >> call fdatasync() less often because it was okay for image metadata to
> >> become inconsistent. After a crash, you have to repair the image then.
> >>
> >
> > I'm very curious about this dirty flag trick. I was surprised when I
> > observed very fast 'sync write' performance on QED.
> > If it skips the fdatasync when processing the device 'flush' command from
> > guest, it literally cheats the guest as the data can be lost. Am I that 
> > correct?
> > Does the repairing make sure all the data written before the last
> > successful 'flush'
> > can be recovered?
> >

Re: [Qemu-devel] disk image: self-organized format or raw file

2014-08-13 Thread Xingbo Wu
On Wed, Aug 13, 2014 at 5:04 PM, Xingbo Wu  wrote:
> On Wed, Aug 13, 2014 at 2:32 PM, Kevin Wolf  wrote:
>> On 13.08.2014 at 18:38, Xingbo Wu wrote:
>>> On Wed, Aug 13, 2014 at 11:54 AM, Kevin Wolf  wrote:
>>> > On 12.08.2014 at 01:38, 吴兴博 wrote:
>>> >> Hello,
>>> >>
>>> >>   The introduction in the wiki page present several advantages of qcow2 
>>> >> [1].
>>> >> But I'm a little confused. I really appreciate if any one can give me 
>>> >> some help
>>> >> on this :).
>>> >>
>>> >>  (1) Currently the raw format doesn't support COW. In other words, a raw 
>>> >> image
>>> >> cannot have a backing file. COW depends on the mapping table on which we 
>>> >> it
>>> >> knows whether each block/cluster is present (has been modified) in the 
>>> >> current
>>> >> image file. Modern file-systems like xfs/ext4/etc. provide extent/block
>>> >> allocation information to user-level. Like what 'filefrag' does with 
>>> >> ioctl
>>> >> 'FIBMAP' and 'FIEMAP'. I guess the raw file driver (maybe 
>>> >> block/raw-posix.c)
>>> >> may obtain correct 'present information about blocks. However this 
>>> >> information
>>> >> may be limited to be aligned with file allocation unit size. Maybe it's 
>>> >> just
>>> >> because a raw file has no space to store the "backing file name"? I 
>>> >> don't think
>>> >> this could hinder the useful feature.
>>> >>
>>> >>  (2) As most popular filesystems support delay-allocation/on-demand 
>>> >> allocation/
>>> >> holes, whatever, a raw image is also thin provisioned as other formats. 
>>> >> It
>>> >> doesn't consume much disk space by storing useless zeros. However, I 
>>> >> don't know
>>> >> if there is any concern on whether fragmented extents would become a 
>>> >> burden of
>>> >> the host filesystem.
>>> >>
>>> >>  (3) For compression and encryption, I'm not an export on these topics 
>>> >> at all
>>> >> but I think these features may not be vital to a image format as both 
>>> >> guest/
>>> >> host's filesystem can also provide similar functionality.
>>> >>
>>> >>  (4) I don't have too much understanding on how snapshot works but I 
>>> >> think
>>> >> theoretically it would be using the techniques no more than that used in 
>>> >> COW
>>> >> and backing file.
>>> >>
>>> >> After all these thoughts, I still found no reason to not using a 'raw' 
>>> >> file
>>> >> image (engineering efforts in Qemu should not count as we don't ask  for 
>>> >> more
>>> >> features from outside world).
>>> >> I would be very sorry if my ignorance wasted your time.
>>> >
>>> > Even if it did work (that it's problematic is already discussed in other
>>> > subthreads) what advantage would you get from using an extended raw
>>> > driver compared to simply using qcow2, which supports all of this today?
>>> >
>>> > Kevin
>>>
>>>
>>> I read several messages from this thread: "[RFC] qed: Add QEMU
>>> Enhanced Disk format". To my understanding, if the new format can be
>>> acceptable to the community:
>>>   It needs to retain all the key features provided by qcow2,
>>> especially for compression, encryption, and internal snapshot, as
>>> mentioned in that thread.
>>>   And, needless to say, it must run faster.
>>>
>>> Yes I agree it's at least a subset of the homework one need to do
>>> before selling the new format to the community.
>>
>> So your goal is improved performance?
>>
>
> Yes if performance is not improved I won't spend more time on it :).
> I believe it's gonna be very difficult.
>
>> Why do you think that a raw driver with backing file support would run
>> much faster than qcow2? It would have to solve the same problems, like
>> doing efficient COW.
>>
>>> Thanks and another question:
>>> What's the magic that makes QED runs faster than QCOW2?
>>
>> During cluster allocation (which is the real critical part), QED is a
>> lot slower than today's qcow2. And by that I mean not just a few
>> percent, but like half the performance. After that, when accessing
>> already allocated data, both perform similar. Mailing list discussions
>> of four years ago don't reflect accurately how qemu works today.
>>
>> The main trick of QED was to introduce a dirty flag, which allowed to
>> call fdatasync() less often because it was okay for image metadata to
>> become inconsistent. After a crash, you have to repair the image then.
>>
>
> I'm very curious about this dirty flag trick. I was surprised when I
> observed very fast 'sync write' performance on QED.
> If it skips the fdatasync when processing the device 'flush' command from
> guest, it literally cheats the guest as the data can be lost. Am I that 
> correct?
> Does the repairing make sure all the data written before the last
> successful 'flush'
> can be recovered?
> To my understanding, the 'flush' command in guest asks for persistence.
> Data has to be persistent on host storage after flush except for the
> image opened with 'cache=unsafe' mode.
>

I have some different ideas. Please correct me if I make any mistake.
The trick may no

Re: [Qemu-devel] disk image: self-organized format or raw file

2014-08-13 Thread Eric Blake
On 08/13/2014 03:04 PM, Xingbo Wu wrote:

>>> I read several messages from this thread: "[RFC] qed: Add QEMU
>>> Enhanced Disk format". To my understanding, if the new format can be
>>> acceptable to the community:
>>>   It needs to retain all the key features provided by qcow2,
>>> especially for compression, encryption, and internal snapshot, as
>>> mentioned in that thread.

Encryption in qcow2 is currently a joke that no one in their right mind
should be relying on.  If your new format approaches encryption in a
cryptographically sound manner, then your format might be considered
better even without beating qcow2 in benchmarks.

But from the sound of this thread, you aren't out to improve encrypted
images.  And even if you ARE hoping to improve encrypted images, it
might STILL be better to investigate how to enhance qcow2 to do a
cryptographically sound encryption (the idea floated on the list is to
let qcow2 do LUKS encryption of the guest-visible payload, while still
leaving the metadata unencrypted), rather than trying to do a completely
new format.

>>>   And, needless to say, it must run faster.
>>>
>>> Yes I agree it's at least a subset of the homework one need to do
>>> before selling the new format to the community.
>>
>> So your goal is improved performance?
>>
> 
> Yes if performance is not improved I won't spend more time on it :).
> I believe it's gonna be very difficult.

Good luck if you are willing to try it.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] disk image: self-organized format or raw file

2014-08-13 Thread Xingbo Wu
On Wed, Aug 13, 2014 at 2:32 PM, Kevin Wolf  wrote:
> On 13.08.2014 at 18:38, Xingbo Wu wrote:
>> On Wed, Aug 13, 2014 at 11:54 AM, Kevin Wolf  wrote:
>> > On 12.08.2014 at 01:38, 吴兴博 wrote:
>> >> Hello,
>> >>
>> >>   The introduction in the wiki page present several advantages of qcow2 
>> >> [1].
>> >> But I'm a little confused. I really appreciate if any one can give me 
>> >> some help
>> >> on this :).
>> >>
>> >>  (1) Currently the raw format doesn't support COW. In other words, a raw 
>> >> image
>> >> cannot have a backing file. COW depends on the mapping table on which we 
>> >> it
>> >> knows whether each block/cluster is present (has been modified) in the 
>> >> current
>> >> image file. Modern file-systems like xfs/ext4/etc. provide extent/block
>> >> allocation information to user-level. Like what 'filefrag' does with ioctl
>> >> 'FIBMAP' and 'FIEMAP'. I guess the raw file driver (maybe 
>> >> block/raw-posix.c)
>> >> may obtain correct 'present information about blocks. However this 
>> >> information
>> >> may be limited to be aligned with file allocation unit size. Maybe it's 
>> >> just
>> >> because a raw file has no space to store the "backing file name"? I don't 
>> >> think
>> >> this could hinder the useful feature.
>> >>
>> >>  (2) As most popular filesystems support delay-allocation/on-demand 
>> >> allocation/
>> >> holes, whatever, a raw image is also thin provisioned as other formats. It
>> >> doesn't consume much disk space by storing useless zeros. However, I 
>> >> don't know
>> >> if there is any concern on whether fragmented extents would become a 
>> >> burden of
>> >> the host filesystem.
>> >>
>> >>  (3) For compression and encryption, I'm not an export on these topics at 
>> >> all
>> >> but I think these features may not be vital to a image format as both 
>> >> guest/
>> >> host's filesystem can also provide similar functionality.
>> >>
>> >>  (4) I don't have too much understanding on how snapshot works but I think
>> >> theoretically it would be using the techniques no more than that used in 
>> >> COW
>> >> and backing file.
>> >>
>> >> After all these thoughts, I still found no reason to not using a 'raw' 
>> >> file
>> >> image (engineering efforts in Qemu should not count as we don't ask  for 
>> >> more
>> >> features from outside world).
>> >> I would be very sorry if my ignorance wasted your time.
>> >
>> > Even if it did work (that it's problematic is already discussed in other
>> > subthreads) what advantage would you get from using an extended raw
>> > driver compared to simply using qcow2, which supports all of this today?
>> >
>> > Kevin
>>
>>
>> I read several messages from this thread: "[RFC] qed: Add QEMU
>> Enhanced Disk format". To my understanding, if the new format can be
>> acceptable to the community:
>>   It needs to retain all the key features provided by qcow2,
>> especially for compression, encryption, and internal snapshot, as
>> mentioned in that thread.
>>   And, needless to say, it must run faster.
>>
>> Yes I agree it's at least a subset of the homework one need to do
>> before selling the new format to the community.
>
> So your goal is improved performance?
>

Yes if performance is not improved I won't spend more time on it :).
I believe it's gonna be very difficult.

> Why do you think that a raw driver with backing file support would run
> much faster than qcow2? It would have to solve the same problems, like
> doing efficient COW.
>
>> Thanks and another question:
>> What's the magic that makes QED runs faster than QCOW2?
>
> During cluster allocation (which is the real critical part), QED is a
> lot slower than today's qcow2. And by that I mean not just a few
> percent, but like half the performance. After that, when accessing
> already allocated data, both perform similar. Mailing list discussions
> of four years ago don't reflect accurately how qemu works today.
>
> The main trick of QED was to introduce a dirty flag, which allowed to
> call fdatasync() less often because it was okay for image metadata to
> become inconsistent. After a crash, you have to repair the image then.
>

I'm very curious about this dirty flag trick. I was surprised when I
observed very fast 'sync write' performance on QED.
If it skips the fdatasync when processing the device 'flush' command from
the guest, it literally cheats the guest, as the data can be lost. Am I correct?
Does the repair make sure that all the data written before the last
successful 'flush' can be recovered?
To my understanding, the 'flush' command in the guest asks for persistence.
Data has to be persistent on host storage after a flush, except for
images opened in 'cache=unsafe' mode.

> qcow2 supports the same with lazy_refcounts=on, but it's really only
> useful in rare cases, mostly with cache=writethrough.
>
>> In some simple
>> parallel IO tests QED can run a magnitude faster than QCOW2.  I saw
>> differences on simple/complex metadata organization, and coroutine/

Re: [Qemu-devel] disk image: self-organized format or raw file

2014-08-13 Thread Kevin Wolf
On 13.08.2014 at 18:38, Xingbo Wu wrote:
> On Wed, Aug 13, 2014 at 11:54 AM, Kevin Wolf  wrote:
> > On 12.08.2014 at 01:38, 吴兴博 wrote:
> >> Hello,
> >>
> >>   The introduction in the wiki page present several advantages of qcow2 
> >> [1].
> >> But I'm a little confused. I really appreciate if any one can give me some 
> >> help
> >> on this :).
> >>
> >>  (1) Currently the raw format doesn't support COW. In other words, a raw 
> >> image
> >> cannot have a backing file. COW depends on the mapping table on which we it
> >> knows whether each block/cluster is present (has been modified) in the 
> >> current
> >> image file. Modern file-systems like xfs/ext4/etc. provide extent/block
> >> allocation information to user-level. Like what 'filefrag' does with ioctl
> >> 'FIBMAP' and 'FIEMAP'. I guess the raw file driver (maybe 
> >> block/raw-posix.c)
> >> may obtain correct 'present information about blocks. However this 
> >> information
> >> may be limited to be aligned with file allocation unit size. Maybe it's 
> >> just
> >> because a raw file has no space to store the "backing file name"? I don't 
> >> think
> >> this could hinder the useful feature.
> >>
> >>  (2) As most popular filesystems support delay-allocation/on-demand 
> >> allocation/
> >> holes, whatever, a raw image is also thin provisioned as other formats. It
> >> doesn't consume much disk space by storing useless zeros. However, I don't 
> >> know
> >> if there is any concern on whether fragmented extents would become a 
> >> burden of
> >> the host filesystem.
> >>
> >>  (3) For compression and encryption, I'm not an export on these topics at 
> >> all
> >> but I think these features may not be vital to a image format as both 
> >> guest/
> >> host's filesystem can also provide similar functionality.
> >>
> >>  (4) I don't have too much understanding on how snapshot works but I think
> >> theoretically it would be using the techniques no more than that used in 
> >> COW
> >> and backing file.
> >>
> >> After all these thoughts, I still found no reason to not using a 'raw' file
> >> image (engineering efforts in Qemu should not count as we don't ask  for 
> >> more
> >> features from outside world).
> >> I would be very sorry if my ignorance wasted your time.
> >
> > Even if it did work (that it's problematic is already discussed in other
> > subthreads) what advantage would you get from using an extended raw
> > driver compared to simply using qcow2, which supports all of this today?
> >
> > Kevin
> 
> 
> I read several messages from this thread: "[RFC] qed: Add QEMU
> Enhanced Disk format". To my understanding, if the new format can be
> acceptable to the community:
>   It needs to retain all the key features provided by qcow2,
> especially for compression, encryption, and internal snapshot, as
> mentioned in that thread.
>   And, needless to say, it must run faster.
> 
> Yes I agree it's at least a subset of the homework one need to do
> before selling the new format to the community.

So your goal is improved performance?

Why do you think that a raw driver with backing file support would run
much faster than qcow2? It would have to solve the same problems, like
doing efficient COW.

> Thanks and another question:
> What's the magic that makes QED runs faster than QCOW2?

During cluster allocation (which is the really critical part), QED is a
lot slower than today's qcow2. And by that I mean not just a few
percent, but like half the performance. After that, when accessing
already allocated data, both perform similarly. Mailing list discussions
from four years ago don't accurately reflect how qemu works today.

The main trick of QED was to introduce a dirty flag, which allowed
calling fdatasync() less often because it was okay for image metadata to
become inconsistent. After a crash, you then have to repair the image.

qcow2 supports the same with lazy_refcounts=on, but it's really only
useful in rare cases, mostly with cache=writethrough.
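
A minimal sketch of the dirty flag idea, under stated assumptions: the
header layout, the magic value and all function names below are
hypothetical and do not reflect QED's or qcow2's real on-disk format or
code. The point is only the protocol: mark the image dirty (and sync) once
before relaxed metadata updates, clear the flag on a clean close, and run
an fsck-like repair on open if the flag is still set.

```python
import os
import struct

# Hypothetical header for this sketch: 4-byte magic plus a 32-bit flags word.
HEADER = struct.Struct("<4sI")
MAGIC = b"SKCH"
FLAG_DIRTY = 1

# fdatasync() is not available on every platform; fall back to fsync().
_sync = getattr(os, "fdatasync", os.fsync)


def _write_flags(fd, flags):
    os.pwrite(fd, HEADER.pack(MAGIC, flags), 0)
    _sync(fd)  # the flag change itself must be durable


def _read_flags(fd):
    magic, flags = HEADER.unpack(os.pread(fd, HEADER.size, 0))
    if magic != MAGIC:
        raise ValueError("not a sketch image")
    return flags


def create_image(path):
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    _write_flags(fd, 0)
    os.close(fd)


def open_image(path, repair):
    """Open the image; call repair(fd) first if the last session crashed."""
    fd = os.open(path, os.O_RDWR)
    was_dirty = bool(_read_flags(fd) & FLAG_DIRTY)
    if was_dirty:
        repair(fd)           # stands in for a real consistency check
        _write_flags(fd, 0)  # metadata is consistent again
    return fd, was_dirty


def begin_metadata_updates(fd):
    """Set the dirty flag once; later metadata writes may skip the sync."""
    if not _read_flags(fd) & FLAG_DIRTY:
        _write_flags(fd, FLAG_DIRTY)


def close_image(fd):
    _sync(fd)            # make all pending updates durable first
    _write_flags(fd, 0)  # then declare the image clean
    os.close(fd)
```

Closing the file descriptor without calling close_image() simulates a
crash: the next open_image() sees the flag and runs the repair callback.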

> In some simple
> parallel IO tests QED can run a magnitude faster than QCOW2.  I saw
> differences on simple/complex metadata organization, and coroutine/aio
> (however "bdrv_co_"s finally call "bdrv_aio_"s via "_em". If you can
> provide some insight on this I would be really appreciate.

Today, everything is a coroutine operation internally, so every request
goes through bdrv_co_do_preadv/pwritev. The aio_* versions are just
wrappers around it for callers and block drivers that prefer a
callback-based interface.
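
The layering can be illustrated with a small Python asyncio sketch. This
is only an analogy: co_preadv and aio_preadv are hypothetical names that
mimic the shape of the interfaces, not QEMU's actual functions.

```python
import asyncio


async def co_preadv(storage, offset, length):
    """Coroutine-style read: the single real implementation."""
    await asyncio.sleep(0)  # stand-in for waiting on real I/O
    return storage[offset:offset + length]


def aio_preadv(storage, offset, length, callback):
    """Callback-style wrapper: start the coroutine, report via callback."""
    task = asyncio.ensure_future(co_preadv(storage, offset, length))
    task.add_done_callback(lambda t: callback(t.result()))
    return task


async def demo():
    results = []
    task = aio_preadv(b"hello world", 6, 5, results.append)
    await task
    await asyncio.sleep(0)  # give the done-callback a chance to run
    return results
```

Both entry points end up in the same coroutine; the callback variant adds
no separate I/O path.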

Kevin



Re: [Qemu-devel] disk image: self-organized format or raw file

2014-08-13 Thread Xingbo Wu
On Wed, Aug 13, 2014 at 11:54 AM, Kevin Wolf  wrote:
> On 12.08.2014 at 01:38, 吴兴博 wrote:
>> Hello,
>>
>>   The introduction in the wiki page present several advantages of qcow2 [1].
>> But I'm a little confused. I really appreciate if any one can give me some 
>> help
>> on this :).
>>
>>  (1) Currently the raw format doesn't support COW. In other words, a raw 
>> image
>> cannot have a backing file. COW depends on the mapping table on which we it
>> knows whether each block/cluster is present (has been modified) in the 
>> current
>> image file. Modern file-systems like xfs/ext4/etc. provide extent/block
>> allocation information to user-level. Like what 'filefrag' does with ioctl
>> 'FIBMAP' and 'FIEMAP'. I guess the raw file driver (maybe block/raw-posix.c)
>> may obtain correct 'present information about blocks. However this 
>> information
>> may be limited to be aligned with file allocation unit size. Maybe it's just
>> because a raw file has no space to store the "backing file name"? I don't 
>> think
>> this could hinder the useful feature.
>>
>>  (2) As most popular filesystems support delay-allocation/on-demand 
>> allocation/
>> holes, whatever, a raw image is also thin provisioned as other formats. It
>> doesn't consume much disk space by storing useless zeros. However, I don't 
>> know
>> if there is any concern on whether fragmented extents would become a burden 
>> of
>> the host filesystem.
>>
>>  (3) For compression and encryption, I'm not an export on these topics at all
>> but I think these features may not be vital to a image format as both guest/
>> host's filesystem can also provide similar functionality.
>>
>>  (4) I don't have too much understanding on how snapshot works but I think
>> theoretically it would be using the techniques no more than that used in COW
>> and backing file.
>>
>> After all these thoughts, I still found no reason to not using a 'raw' file
>> image (engineering efforts in Qemu should not count as we don't ask  for more
>> features from outside world).
>> I would be very sorry if my ignorance wasted your time.
>
> Even if it did work (that it's problematic is already discussed in other
> subthreads) what advantage would you get from using an extended raw
> driver compared to simply using qcow2, which supports all of this today?
>
> Kevin


I read several messages from the thread "[RFC] qed: Add QEMU
Enhanced Disk format". To my understanding, for a new format to be
acceptable to the community:
  It needs to retain all the key features provided by qcow2,
especially compression, encryption, and internal snapshots, as
mentioned in that thread.
  And, needless to say, it must run faster.

Yes, I agree it's at least a subset of the homework one needs to do
before selling the new format to the community.

Thanks, and another question:
What's the magic that makes QED run faster than QCOW2? In some simple
parallel IO tests QED can run an order of magnitude faster than QCOW2.
I saw differences in simple/complex metadata organization, and
coroutine/aio (however, the "bdrv_co_"s finally call the "bdrv_aio_"s
via "_em"). If you can provide some insight on this I would really
appreciate it.


-- 

Cheers!
   吴兴博  Wu, Xingbo 



Re: [Qemu-devel] disk image: self-organized format or raw file

2014-08-13 Thread Kevin Wolf
On 12.08.2014 at 01:38, 吴兴博 wrote:
> Hello,
> 
>   The introduction in the wiki page present several advantages of qcow2 [1].
> But I'm a little confused. I really appreciate if any one can give me some 
> help
> on this :).
> 
>  (1) Currently the raw format doesn't support COW. In other words, a raw image
> cannot have a backing file. COW depends on the mapping table on which we it
> knows whether each block/cluster is present (has been modified) in the current
> image file. Modern file-systems like xfs/ext4/etc. provide extent/block
> allocation information to user-level. Like what 'filefrag' does with ioctl
> 'FIBMAP' and 'FIEMAP'. I guess the raw file driver (maybe block/raw-posix.c)
> may obtain correct 'present information about blocks. However this information
> may be limited to be aligned with file allocation unit size. Maybe it's just
> because a raw file has no space to store the "backing file name"? I don't 
> think
> this could hinder the useful feature.
> 
>  (2) As most popular filesystems support delayed allocation/on-demand
> allocation/holes, whatever, a raw image is also thin provisioned like other
> formats. It doesn't consume much disk space by storing useless zeros. However,
> I don't know if there is any concern about whether fragmented extents would
> become a burden on the host filesystem.
> 
>  (3) For compression and encryption, I'm not an expert on these topics at all
> but I think these features may not be vital to an image format as both the
> guest's and the host's filesystems can provide similar functionality.
> 
>  (4) I don't have much understanding of how snapshots work but I think
> theoretically they would use techniques no more than those used in COW
> and backing files.
> 
> After all these thoughts, I still found no reason not to use a 'raw' file
> image (engineering efforts in Qemu should not count as we don't ask for more
> features from the outside world).
> I would be very sorry if my ignorance wasted your time.

Even if it did work (that it's problematic is already discussed in other
subthreads) what advantage would you get from using an extended raw
driver compared to simply using qcow2, which supports all of this today?

Kevin



Re: [Qemu-devel] disk image: self-organized format or raw file

2014-08-13 Thread Kevin Wolf
On 12.08.2014 at 17:30, Eric Blake wrote:
> On 08/12/2014 08:14 AM, 吴兴博 wrote:
> >>> However FVD seems to have been ignored by community.
> >>
> >> Care to give a pointer to a URL describing the FVD format?
> >>
> >> http://lists.nongnu.org/archive/html/qemu-devel/2011-01/msg00398.html
> > 
> > This thread could be the clearest message on FVD.
> 
> That very message also points out WHY the community has appeared to
> ignore FVD:
> 
> "For any feature to be seriously considered for inclusion in QEMU,
> patches need to be posted to the mailing list against the latest git
> tree. That's a pre-requisite for any real discussion."
> 
> > It also has a paper published on USENIX conference.
> > https://www.usenix.org/event/atc11/tech/final_files/Tang.pdf
> 
> Thanks for the references.  Are you interested in posting patches to
> revive the work on that format?

Just to be clear upfront so that you don't waste your time: A new native
image format is not going to be merged. You would have to prove that
your format is capable of replacing qcow2 with all its features, that
it's better in some respect and that qcow2 cannot be extended to provide
the same. Other proposals, including FVD, have failed to provide that
and I'd consider it unlikely to happen this time. (QED fell short of it
and was merged anyway for political reasons; it's clear today that this
was a mistake.)

Kevin




Re: [Qemu-devel] disk image: self-organized format or raw file

2014-08-12 Thread Fam Zheng
On Tue, 08/12 12:22, Xingbo Wu wrote:
> On Tue, Aug 12, 2014 at 11:30 AM, Eric Blake  wrote:
> 
> > On 08/12/2014 08:14 AM, 吴兴博 wrote:
> > >>> However FVD seems to have been ignored by community.
> > >>
> > >> Care to give a pointer to a URL describing the FVD format?
> > >>
> > >> http://lists.nongnu.org/archive/html/qemu-devel/2011-01/msg00398.html
> > >
> > > This thread could be the clearest message on FVD.
> >
> > That very message also points out WHY the community has appeared to
> > ignore FVD:
> >
> > "For any feature to be seriously considered for inclusion in QEMU,
> > patches need to be posted to the mailing list against the latest git
> > tree. That's a pre-requisite for any real discussion."
> >
> > > It also has a paper published on USENIX conference.
> > > https://www.usenix.org/event/atc11/tech/final_files/Tang.pdf
> >
> > Thanks for the references.  Are you interested in posting patches to
> > revive the work on that format?
> >
> > I'm going to study it first. It would take some time :)
> 

Please don't add your text after the quote leaders "> >". You should start a
new line.

Fam



Re: [Qemu-devel] disk image: self-organized format or raw file

2014-08-12 Thread Richard W.M. Jones
On Tue, Aug 12, 2014 at 03:23:38PM -0400, Xingbo Wu wrote:
> would be very space efficient for distribution :).
> Would you consider replacing xz with lz4? It has faster decompression speed
> (~500MB/s)[1] and client-side decompression would be made painless.

No.  The main benefit of xz is it has a well defined stable API and a
file format that supports seeking.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html



Re: [Qemu-devel] disk image: self-organized format or raw file

2014-08-12 Thread Xingbo Wu
On Tue, Aug 12, 2014 at 2:52 PM, Richard W.M. Jones wrote:

> On Tue, Aug 12, 2014 at 07:46:30PM +0100, Daniel P. Berrange wrote:
> > Taking the compression feature - arguably the biggest benefit of that
> > is when you distribute disk images. eg if someone provides a root disk
> > image on a web server, using compression in qcow2 can dramatically
> > lower the download size, while still allowing QEMU to directly run
> > from that qcow2 file. Sure you could wrap your disk images in gzip
> > and then convert to your local filesystem at time of use but this
> > introduces multiple extra steps.
>
> It would be nice if qemu could handle xz-compressed files
> transparently, since (when prepared correctly) these files are
> seekable.
>
> I have written code to do this here:
>
>   https://github.com/libguestfs/nbdkit/tree/master/plugins/xz
>
> I believe it's ideal for a read-only backing file; the xz-compressed image
would be very space efficient for distribution :).
Would you consider replacing xz with lz4? It has faster decompression speed
(~500MB/s)[1] and client-side decompression would be made painless.

[1]
http://linuxaria.com/article/linux-compressors-comparison-on-centos-6-5-x86-64-lzo-vs-lz4-vs-gzip-vs-bzip2-vs-lzma

> Rich.
>
> --
> Richard Jones, Virtualization Group, Red Hat
> http://people.redhat.com/~rjones
> Read my programming and virtualization blog: http://rwmj.wordpress.com
> libguestfs lets you edit virtual machines.  Supports shell scripting,
> bindings from many languages.  http://libguestfs.org
>



-- 

Cheers!
   吴兴博  Wu, Xingbo 


Re: [Qemu-devel] disk image: self-organized format or raw file

2014-08-12 Thread Richard W.M. Jones
On Tue, Aug 12, 2014 at 07:46:30PM +0100, Daniel P. Berrange wrote:
> Taking the compression feature - arguably the biggest benefit of that
> is when you distribute disk images. eg if someone provides a root disk
> image on a web server, using compression in qcow2 can dramatically
> lower the download size, while still allowing QEMU to directly run
> from that qcow2 file. Sure you could wrap your disk images in gzip
> and then convert to your local filesystem at time of use but this
> introduces multiple extra steps.

It would be nice if qemu could handle xz-compressed files
transparently, since (when prepared correctly) these files are
seekable.

I have written code to do this here:

  https://github.com/libguestfs/nbdkit/tree/master/plugins/xz

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org



Re: [Qemu-devel] disk image: self-organized format or raw file

2014-08-12 Thread Daniel P. Berrange
On Mon, Aug 11, 2014 at 07:38:50PM -0400, 吴兴博 wrote:
> Hello,
> 
>   The introduction in the wiki page presents several advantages of qcow2
> [1]. But I'm a little confused. I would really appreciate it if anyone
> could give me some help on this :).
> 
>  (1) Currently the raw format doesn't support COW. In other words, a raw
> image cannot have a backing file. COW depends on a mapping table from which
> it knows whether each block/cluster is present (has been modified) in
> the current image file. Modern file-systems like xfs/ext4/etc. provide
> extent/block allocation information to user-level, like what 'filefrag'
> does with the ioctls 'FIBMAP' and 'FIEMAP'. I guess the raw file driver
> (maybe block/raw-posix.c) may obtain correct 'present' information about
> blocks. However, this information may only be aligned to the file
> allocation unit size. Maybe it's just because a raw file has no space to
> store the "backing file name"? I don't think this should hinder such a
> useful feature.
> 
>  (2) As most popular filesystems support delayed allocation/on-demand
> allocation/holes, whatever, a raw image is also thin provisioned like other
> formats. It doesn't consume much disk space by storing useless zeros.
> However, I don't know if there is any concern about whether fragmented
> extents would become a burden on the host filesystem.
> 
>  (3) For compression and encryption, I'm not an expert on these topics at
> all but I think these features may not be vital to an image format as both
> the guest's and the host's filesystems can provide similar functionality.
> 
>  (4) I don't have much understanding of how snapshots work but I think
> theoretically they would use techniques no more than those used in
> COW and backing files.
> 
> After all these thoughts, I still found no reason not to use a 'raw' file
> image (engineering efforts in Qemu should not count as we don't ask for
> more features from the outside world).
> I would be very sorry if my ignorance wasted your time.

FWIW, much of what you say about features supported in filesystems is
correct, however, that is only considering the needs of deployment on
your specific platform. One value of QCow2 is that it is a portable
format you can use on any platform where QEMU builds, whether it be
Linux, Windows, *BSD or Solaris. If you were to rely on the host
filesystem then obviously you'd have to figure out a different
solution for each particular OS you deploy on.

Taking the compression feature - arguably the biggest benefit of that
is when you distribute disk images. eg if someone provides a root disk
image on a web server, using compression in qcow2 can dramatically
lower the download size, while still allowing QEMU to directly run
from that qcow2 file. Sure you could wrap your disk images in gzip
and then convert to your local filesystem at time of use but this
introduces multiple extra steps.

There are similar arguments for other features in qcow2. That's not to
say you are wrong in your analysis of your own needs. It is simply the
case that different scenarios imply different solutions, so for some
qcow2 may be optimal, while for others using native filesystem features
might be better.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] disk image: self-organized format or raw file

2014-08-12 Thread Richard W.M. Jones
On Tue, Aug 12, 2014 at 08:07:55AM -0600, Eric Blake wrote:
> On 08/12/2014 07:45 AM, 吴兴博 wrote:
> 
> [please don't top-post on technical lists]
> 
> > Thanks for your information. It's really helpful.
> > I think adding a bitmap alongside the raw file ( or just within that file)
> 
> Umm, how do you propose to add a bitmap within a raw file?  The moment
> the file contains metadata, it is no longer raw, but some other format.
>  You'd need a way to reliably delineate the portion of the file that
> contains the bitmap and therefore must not be exposed to the guest.

There was an MSFT format where they used raw but added metadata after
the end of the raw file data.

https://en.wikipedia.org/wiki/VHD_%28file_format%29

This is crazy BTW - I'm not advocating we do it :-)
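
To make the trailing-metadata idea concrete, here is a rough Python sketch.
It assumes the fixed-size VHD layout, where the raw data is followed by a
512-byte footer whose first 8 bytes are the cookie "conectix". The function
name guest_visible_size is made up for illustration; none of this is QEMU
code.

```python
import os

VHD_FOOTER_SIZE = 512        # length of the footer appended after the raw data
VHD_COOKIE = b"conectix"     # first 8 bytes of that footer


def guest_visible_size(path):
    """Return how many leading bytes of the file are guest data.

    If the file ends with something that looks like a VHD footer, the
    footer is excluded; otherwise the file is treated as plain raw.
    """
    size = os.path.getsize(path)
    if size < VHD_FOOTER_SIZE:
        return size
    with open(path, "rb") as f:
        f.seek(size - VHD_FOOTER_SIZE)
        footer = f.read(VHD_FOOTER_SIZE)
    if footer.startswith(VHD_COOKIE):
        return size - VHD_FOOTER_SIZE
    return size
```

Note that a plain raw image whose last sector happens to begin with the same
8 bytes would be misdetected by this check, which is the delineation problem
Eric describes above.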

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html



Re: [Qemu-devel] disk image: self-organized format or raw file

2014-08-12 Thread Xingbo Wu
On Tue, Aug 12, 2014 at 11:30 AM, Eric Blake  wrote:

> On 08/12/2014 08:14 AM, 吴兴博 wrote:
> >>> However FVD seems to have been ignored by community.
> >>
> >> Care to give a pointer to a URL describing the FVD format?
> >>
> >> http://lists.nongnu.org/archive/html/qemu-devel/2011-01/msg00398.html
> >
> > This thread could be the clearest message on FVD.
>
> That very message also points out WHY the community has appeared to
> ignore FVD:
>
> "For any feature to be seriously considered for inclusion in QEMU,
> patches need to be posted to the mailing list against the latest git
> tree. That's a pre-requisite for any real discussion."
>
> > It also has a paper published on USENIX conference.
> > https://www.usenix.org/event/atc11/tech/final_files/Tang.pdf
>
> Thanks for the references.  Are you interested in posting patches to
> revive the work on that format?
>
> I'm going to study it first. It would take some time :)

>  --
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
>


-- 

Cheers!
   吴兴博  Wu, Xingbo 


Re: [Qemu-devel] disk image: self-organized format or raw file

2014-08-12 Thread Eric Blake
On 08/12/2014 08:14 AM, 吴兴博 wrote:
>>> However FVD seems to have been ignored by community.
>>
>> Care to give a pointer to a URL describing the FVD format?
>>
>> http://lists.nongnu.org/archive/html/qemu-devel/2011-01/msg00398.html
> 
> This thread could be the clearest message on FVD.

That very message also points out WHY the community has appeared to
ignore FVD:

"For any feature to be seriously considered for inclusion in QEMU,
patches need to be posted to the mailing list against the latest git
tree. That's a pre-requisite for any real discussion."

> It also has a paper published on USENIX conference.
> https://www.usenix.org/event/atc11/tech/final_files/Tang.pdf

Thanks for the references.  Are you interested in posting patches to
revive the work on that format?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] disk image: self-organized format or raw file

2014-08-12 Thread 吴兴博
On Tue, Aug 12, 2014 at 10:07 AM, Eric Blake  wrote:

> On 08/12/2014 07:45 AM, 吴兴博 wrote:
>
> [please don't top-post on technical lists]
>
> Sorry about that..

> > Thanks for your information. It's really helpful.
> > I think adding a bitmap alongside the raw file ( or just within that file)
>
> Umm, how do you propose to add a bitmap within a raw file?  The moment
> the file contains metadata, it is no longer raw, but some other format.
>  You'd need a way to reliably delineate the portion of the file that
> contains the bitmap and therefore must not be exposed to the guest.
>
> Yes, I agree. It's not raw anymore. It should be some 'lightweight' format.

> > would be suffice to distinguish between present or in backing file.
> > The idea in FVD looks similar to 'addcow'---use bitmap but delegating
> > allocation to FS. However FVD seems to have been ignored by community.
>
> Care to give a pointer to a URL describing the FVD format?
>
> http://lists.nongnu.org/archive/html/qemu-devel/2011-01/msg00398.html

This thread could be the clearest message on FVD.
It also has a paper published at the USENIX conference.
https://www.usenix.org/event/atc11/tech/final_files/Tang.pdf

> --
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
>


Re: [Qemu-devel] disk image: self-organized format or raw file

2014-08-12 Thread Eric Blake
On 08/12/2014 07:45 AM, 吴兴博 wrote:

[please don't top-post on technical lists]

> Thanks for your information. It's really helpful.
> I think adding a bitmap alongside the raw file ( or just within that file)

Umm, how do you propose to add a bitmap within a raw file?  The moment
the file contains metadata, it is no longer raw, but some other format.
 You'd need a way to reliably delineate the portion of the file that
contains the bitmap and therefore must not be exposed to the guest.

> would be suffice to distinguish between present or in backing file.
> The idea in FVD looks similar to 'addcow'---use bitmap but delegating
> allocation to FS. However FVD seems to have been ignored by community.

Care to give a pointer to a URL describing the FVD format?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] disk image: self-organized format or raw file

2014-08-12 Thread 吴兴博
Thanks for your information. It's really helpful.
I think adding a bitmap alongside the raw file (or just within that file)
would suffice to distinguish between clusters that are present and clusters
that are still in the backing file.
The idea in FVD looks similar to 'addcow': use a bitmap but delegate
allocation to the FS. However FVD seems to have been ignored by the community.
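
To illustrate what I mean by a bitmap, here is a toy Python sketch. It is
only my own illustration, not the actual 'addcow' or FVD layout; the class
name BitmapCow and the tiny 4-byte cluster size are assumptions made for
readability.

```python
class BitmapCow:
    """Toy copy-on-write overlay: a bitmap records which clusters are present."""

    def __init__(self, backing, cluster_size=4):
        self.backing = bytes(backing)             # read-only base image
        self.cluster_size = cluster_size
        n = -(-len(backing) // cluster_size)      # ceiling division
        self.present = [False] * n                # explicit allocation bitmap
        self.data = bytearray(len(backing))       # overlay storage, raw layout

    def write_cluster(self, index, payload):
        assert len(payload) == self.cluster_size
        off = index * self.cluster_size
        self.data[off:off + self.cluster_size] = payload
        self.present[index] = True                # recorded even for all-zero data

    def read_cluster(self, index):
        off = index * self.cluster_size
        src = self.data if self.present[index] else self.backing
        return bytes(src[off:off + self.cluster_size])


img = BitmapCow(b"AAAABBBBCCCC")
img.write_cluster(1, b"\0\0\0\0")    # the guest explicitly writes zeros
print(img.read_cluster(0))           # not present: falls through to the backing data
print(img.read_cluster(1))           # present: zeros from the overlay, not b"BBBB"
```

The point of the sketch is that "present" is stored explicitly, so a cluster
of zeros written by the guest is never mistaken for a cluster that should be
read from the backing file.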

Cheers!
   吴兴博  Wu, Xingbo 


On Tue, Aug 12, 2014 at 9:23 AM, Eric Blake  wrote:

> On 08/11/2014 05:38 PM, 吴兴博 wrote:
> > Hello,
> >
> >   The introduction in the wiki page presents several advantages of qcow2
> > [1]. But I'm a little confused. I would really appreciate it if anyone
> > could give me some help on this :).
> >
> >  (1) Currently the raw format doesn't support COW. In other words, a raw
> > image cannot have a backing file. COW depends on a mapping table from which
> > it knows whether each block/cluster is present (has been modified) in
> > the current image file. Modern file-systems like xfs/ext4/etc. provide
> > extent/block allocation information to user-level, like what 'filefrag'
> > does with the ioctls 'FIBMAP' and 'FIEMAP'. I guess the raw file driver
> > (maybe block/raw-posix.c) may obtain correct 'present' information about
> > blocks. However, this information may only be aligned to the file
> > allocation unit size. Maybe it's just because a raw file has no space to
> > store the "backing file name"? I don't think this should hinder such a
> > useful feature.
>
> Search the list archives; at one point in the past, an 'addcow' format
> was proposed, which is an additional file alongside a raw which provides
> enough information to (temporarily) add cow to raw (or any other file
> without a native backing file).  I don't know why that format was not
> pursued further.
>
> You could use xattr to store a user attribute of a backing file or
> addcow file to associate with a raw file.  But file system holes are NOT
> a good metadata tool for distinguishing between data not present (refer
> to the backing file) vs. data explicitly all zero.  Your proposal of
> using holes in raw files as metadata is NOT going to reliably work.
>
> Also, using SEEK_HOLE/SEEK_DATA is a much nicer interface for iterating
> raw file holes than FIEMAP.  It conveys less information, but that
> information is more portable (POSIX will be adding requirements for
> SEEK_HOLE/SEEK_DATA, and even NFSv4.2 is considering[1] adding this
> support because of POSIX).  GNU cp is capable of using both FIEMAP and
> SEEK_HOLE to optimize copies where the destination tries to preserve the
> same hole layout as the source (not always possible, given that not all
> systems have the same granularities of holes, and also given that not
> all consecutive blocks of all-zero bytes have to be reported as holes).
>  The SEEK_HOLE implementation has ALWAYS worked, but the FIEMAP
> implementation uncovered various bugs in file systems, and at one point
> would corrupt the copy unless cp did a sync() first, which slowed down
> the operation and defeated the point of attempting to use it for
> optimizations.  While holes are a cool thing, they are best only for
> optimizations, and not for reliable metadata information.
>
> [1]
> http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-26#section-15.12
>
> --
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
>


Re: [Qemu-devel] disk image: self-organized format or raw file

2014-08-12 Thread Eric Blake
On 08/11/2014 05:38 PM, 吴兴博 wrote:
> Hello,
> 
>   The introduction in the wiki page presents several advantages of qcow2
> [1]. But I'm a little confused. I would really appreciate it if anyone
> could give me some help on this :).
> 
>  (1) Currently the raw format doesn't support COW. In other words, a raw
> image cannot have a backing file. COW depends on a mapping table from which
> it knows whether each block/cluster is present (has been modified) in
> the current image file. Modern file-systems like xfs/ext4/etc. provide
> extent/block allocation information to user-level, like what 'filefrag'
> does with the ioctls 'FIBMAP' and 'FIEMAP'. I guess the raw file driver
> (maybe block/raw-posix.c) may obtain correct 'present' information about
> blocks. However, this information may only be aligned to the file
> allocation unit size. Maybe it's just because a raw file has no space to
> store the "backing file name"? I don't think this should hinder such a
> useful feature.

Search the list archives; at one point in the past, an 'addcow' format
was proposed, which is an additional file alongside a raw which provides
enough information to (temporarily) add cow to raw (or any other file
without a native backing file).  I don't know why that format was not
pursued further.

You could use xattr to store a user attribute of a backing file or
addcow file to associate with a raw file.  But file system holes are NOT
a good metadata tool for distinguishing between data not present (refer
to the backing file) vs. data explicitly all zero.  Your proposal of
using holes in raw files as metadata is NOT going to reliably work.
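
A quick way to see the ambiguity, as a minimal Python sketch: the file names
and the CLUSTER constant below are illustrative assumptions, and whether the
skipped range really becomes a hole depends on the host filesystem. Either
way, reading it back gives zeros, exactly like the explicitly zeroed file.

```python
import os
import tempfile

CLUSTER = 4096  # illustrative cluster size, not taken from QEMU


def read_cluster(path, index):
    """Return the bytes of one cluster of the file at `path`."""
    with open(path, "rb") as f:
        f.seek(index * CLUSTER)
        return f.read(CLUSTER)


def make_images(directory):
    """Create two 3-cluster files whose middle cluster differs only in how
    it came to be zero: never written in one, explicitly zeroed in the other."""
    never_written = os.path.join(directory, "never_written.img")
    zeroed = os.path.join(directory, "zeroed.img")

    with open(never_written, "wb") as f:
        f.write(b"A" * CLUSTER)
        f.seek(2 * CLUSTER)          # skip the middle cluster entirely
        f.write(b"C" * CLUSTER)

    with open(zeroed, "wb") as f:
        f.write(b"A" * CLUSTER)
        f.write(b"\0" * CLUSTER)     # the guest really wrote zeros here
        f.write(b"C" * CLUSTER)

    return never_written, zeroed


with tempfile.TemporaryDirectory() as d:
    a, b = make_images(d)
    same_contents = read_cluster(a, 1) == read_cluster(b, 1)
    print("middle clusters read identically:", same_contents)
```

From the file contents alone, a reader cannot tell "refer to the backing
file" apart from "the guest wrote zeros here".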

Also, using SEEK_HOLE/SEEK_DATA is a much nicer interface for iterating
raw file holes than FIEMAP.  It conveys less information, but that
information is more portable (POSIX will be adding requirements for
SEEK_HOLE/SEEK_DATA, and even NFSv4.2 is considering[1] adding this
support because of POSIX).  GNU cp is capable of using both FIEMAP and
SEEK_HOLE to optimize copies where the destination tries to preserve the
same hole layout as the source (not always possible, given that not all
systems have the same granularities of holes, and also given that not
all consecutive blocks of all-zero bytes have to be reported as holes).
 The SEEK_HOLE implementation has ALWAYS worked, but the FIEMAP
implementation uncovered various bugs in file systems, and at one point
would corrupt the copy unless cp did a sync() first, which slowed down
the operation and defeated the point of attempting to use it for
optimizations.  While holes are a cool thing, they are best only for
optimizations, and not for reliable metadata information.

[1]
http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-26#section-15.12
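
For reference, iterating the data extents of a file with SEEK_DATA/SEEK_HOLE
can be sketched in Python as follows. This assumes a platform where
os.SEEK_DATA and os.SEEK_HOLE are available (e.g. Linux); the helper name
data_extents is made up for illustration.

```python
import errno
import os


def data_extents(path):
    """Return a list of (start, end) byte ranges the filesystem reports as data.

    Uses lseek() with SEEK_DATA/SEEK_HOLE. A filesystem without hole
    support may simply report the whole file as one data extent.
    """
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:   # no more data after offset
                    break
                raise
            # End of file counts as an implicit hole, so this always succeeds.
            end = os.lseek(fd, start, os.SEEK_HOLE)
            extents.append((start, end))
            offset = end
    finally:
        os.close(fd)
    return extents
```

The extents it returns are only a hint about allocation at the filesystem's
own granularity, which is why they suit copy optimizations better than
metadata.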

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] disk image: self-organized format or raw file

2014-08-12 Thread Kirill Batuzov
On Tue, 12 Aug 2014, Fam Zheng wrote:

> On Mon, 08/11 19:38, 吴兴博 wrote:
> > Hello,
> > 
> >   The introduction in the wiki page presents several advantages of qcow2
> > [1]. But I'm a little confused. I would really appreciate it if anyone
> > could give me some help on this :).
> > 
> >  (1) Currently the raw format doesn't support COW. In other words, a raw
> > image cannot have a backing file. COW depends on a mapping table from which
> > it knows whether each block/cluster is present (has been modified) in
> > the current image file. Modern file-systems like xfs/ext4/etc. provide
> > extent/block allocation information to user-level, like what 'filefrag'
> > does with the ioctls 'FIBMAP' and 'FIEMAP'. I guess the raw file driver
> > (maybe block/raw-posix.c) may obtain correct 'present' information about
> > blocks. However, this information may only be aligned to the file
> > allocation unit size. Maybe it's just because a raw file has no space to
> > store the "backing file name"? I don't think this should hinder such a
> > useful feature.
> > 
> >  (2) As most popular filesystems support delayed allocation/on-demand
> > allocation/holes, whatever, a raw image is also thin provisioned like other
> > formats. It doesn't consume much disk space by storing useless zeros.
> > However, I don't know if there is any concern about whether fragmented
> > extents would become a burden on the host filesystem.
> > 
> >  (3) For compression and encryption, I'm not an expert on these topics at
> > all but I think these features may not be vital to an image format as both
> > the guest's and the host's filesystems can provide similar functionality.
> > 
> >  (4) I don't have much understanding of how snapshots work but I think
> > theoretically they would use techniques no more than those used in
> > COW and backing files.
> > 
> > After all these thoughts, I still found no reason not to use a 'raw' file
> > image (engineering efforts in Qemu should not count as we don't ask for
> > more features from the outside world).
> > I would be very sorry if my ignorance wasted your time.
> 
> Hi! I think what you described is theoretically possible, but I'm not so
> positive about this feature. What would be the advantages, compared to qcow2?
> 

I think this idea was exploited in the FVD format. The research paper
reported a large performance gain compared to qcow2. The patches can be
found in the mailing list archives (Feb. 2011).

http://wiki.qemu.org/Features/FVD

> My major concern is the file system hole's transparency, meaning that
> users normally can't tell if a "hole" is really zeroes or unallocated, which
> would cause data loss more easily: the user may expect scp (1) or cp (1) to
> work on an image file, just as always, but these tools can legitimately fill
> the hole with actual zeroes if the target filesystem does not support holes.
> That's too dangerous but totally out of control of QEMU.
> 
> Fam
> 
>

-- 
Kirill

Re: [Qemu-devel] disk image: self-organized format or raw file

2014-08-12 Thread Fam Zheng
On Tue, 08/12 08:03, 吴兴博 wrote:
> I read your reply carefully and thought it over. I'm sorry that
> when I said "I get it" I actually meant "I believe you" but not "I
> understand it".
> The problem would not come from cp or rsync -- it's not their fault. They
> just have no way to get it right.
> The real reason would be that filesystems have different allocation
> unit sizes.
> 
> For example, a file appears to be 16KB in size, and the range 4KB-12KB of
> it is a hole (0KB-4KB and 12KB-16KB hold valid data).
> The FS holding it has a 4KB block size, so it *could* be allocated like this.
> Copying this file to a filesystem with a 16KB block size would cause the
> entire 16KB to be filled with data; to be specific, the hole is filled with
> zeros and cp/rsync have NO way to do otherwise.
> 
> That's not an engineering issue of cp/rsync. It's a real issue caused by the
> fact that (most) filesystems have a configurable block size.
> 

Correct.

It's not the fault of any party, because there is no contract on this part at
all. What you suggested is not a good use case for file system holes.
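
The arithmetic behind the 16KB example can be sketched in a few lines of
Python. This is only a model of block-granular allocation, not how any
particular filesystem or cp implementation works; the helper names
allocated_blocks and holes are made up for illustration.

```python
def allocated_blocks(data_ranges, block_size):
    """Return the sorted block indexes needed to hold the given byte ranges.

    Each range is (start, end) with `end` exclusive. A block holding even
    one byte of data must be allocated in full.
    """
    blocks = set()
    for start, end in data_ranges:
        first = start // block_size
        last = (end - 1) // block_size
        blocks.update(range(first, last + 1))
    return sorted(blocks)


def holes(data_ranges, block_size, file_size):
    """Return the byte ranges that can stay unallocated at this block size."""
    used = set(allocated_blocks(data_ranges, block_size))
    total = -(-file_size // block_size)           # ceiling division
    return [(b * block_size, min((b + 1) * block_size, file_size))
            for b in range(total) if b not in used]


KB = 1024
data = [(0, 4 * KB), (12 * KB, 16 * KB)]   # valid data in the 16KB example file

print(holes(data, 4 * KB, 16 * KB))    # 4KB blocks: the 4KB-12KB range stays a hole
print(holes(data, 16 * KB, 16 * KB))   # 16KB blocks: no hole survives the copy
```

With 16KB blocks the single block is fully allocated, so the "not present"
information that the hole was supposed to carry is gone after the copy.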

Fam



Re: [Qemu-devel] disk image: self-organized format or raw file

2014-08-12 Thread 吴兴博
I read your reply carefully and thought it over. I'm sorry that
when I said "I get it" I actually meant "I believe you" but not "I
understand it".
The problem would not come from cp or rsync -- it's not their fault. They
just have no way to get it right.
The real reason would be that filesystems have different allocation
unit sizes.

For example, a file appears to be 16KB in size, and the range 4KB-12KB of
it is a hole (0KB-4KB and 12KB-16KB hold valid data).
The FS holding it has a 4KB block size, so it *could* be allocated like this.
Copying this file to a filesystem with a 16KB block size would cause the
entire 16KB to be filled with data; to be specific, the hole is filled with
zeros and cp/rsync have NO way to do otherwise.

That's not an engineering issue of cp/rsync. It's a real issue caused by the
fact that (most) filesystems have a configurable block size.

Is that correct?
I would really appreciate a confirmation.


Cheers!
   吴兴博  Wu, Xingbo 


On Tue, Aug 12, 2014 at 7:39 AM, Fam Zheng  wrote:

> On Tue, 08/12 07:22, 吴兴博 wrote:
> > Thanks, I get it.
> > Does rsync have exactly the same problem?
>
> Yes.
>
> Fam
>


Re: [Qemu-devel] disk image: self-organized format or raw file

2014-08-12 Thread Fam Zheng
On Tue, 08/12 06:46, 吴兴博 wrote:
> Hi Fam,
>   Glad to hear from you.
> It is said in this post that "All files systems that support inodes
> (ext2/3/4, xfs, btfs, etc) support files with holes while creating the
> files..."
> [
> http://serverfault.com/questions/558761/best-linux-filesystem-for-sparse-files
> ]
> 
> I have also heard this claim from other sources, and the only "popular"
> filesystems that don't support holes in the real world are the old FAT32
> and other FAT* variants.
> Note that holes appear when a sparse file is created on an inode-based
> filesystem. "Punching holes", by contrast, removes existing contents
> from the file, and it was newly added, to xfs/ext4 only, in newer Linux
> kernels.
> 
> In qemu's disk image, a hole delivers a clear message: the corresponding
> sectors/blocks/clusters were never written. So it's up to the guest whether
> to initialize the sectors to zero or just ignore them (filesystems are never
> confused by an uninitialized sector, right?). Filesystems should ignore
> uninitialized data just because it's meaningless. Once written, the data
> would always be meaningful to the guest.
> 
> "Punching holes" would add support for "DISCARD" to an image, which could
> then behave like an SSD. Otherwise the image behaves like a magnetic disk.
> 
> The information below may not be accurate:
> * cp has --sparse option to support read and create sparse files.
> * Sadly scp doesn't support sparse files.
> * rsync also has a -S --sparse option to properly handle sparse files.
> 
> Not until recently did I realize that holes are widely supported in
> *almost* all filesystems. That's why I came up with this idea.
> I understand your concern about the support for holes. Is this just because
> the "hole" was never standardized in POSIX or somewhere else?
> 
> So now I get one clear reason: holes are not guaranteed by any filesystem
> standard (I guess POSIX would be enough).
> Is there something else? If that's the only reason for not using a sparse raw
> file as an image, and the only impediment is no-one-should-ever-use FAT32 or,
> say, POSIX, we may be very close to moving one step forward.
> 

The problem is that cp wouldn't maintain the correctness of a copied
raw-with-hole image, whereas cp does maintain the correctness of any other
thin image type that has explicit cluster allocation info.

We can't overcome that, unless we tell users "never use `cp' to copy the image,
it will break your data, you have to use `qemu-img convert'". That's
counterintuitive and a step back.

Fam



Re: [Qemu-devel] disk image: self-organized format or raw file

2014-08-12 Thread 吴兴博
Hi Fam,
  Glad to hear from you.
It is said in this post that "All files systems that support inodes
(ext2/3/4, xfs, btfs, etc) support files with holes while creating the
files..."
[
http://serverfault.com/questions/558761/best-linux-filesystem-for-sparse-files
]

I have also heard this claim from other sources, and the only "popular"
filesystems that don't support holes in the real world are the old FAT32
and other FAT* variants.
Note that holes appear when a sparse file is created on an inode-based
filesystem. "Punching holes", by contrast, removes existing contents
from the file, and it was newly added, to xfs/ext4 only, in newer Linux
kernels.

In a QEMU disk image, a hole delivers a clear message: the corresponding
sectors/blocks/clusters have never been written. So it's up to the guest
whether to initialize the sectors to zero or just ignore them (filesystems
are never confused by an uninitialized sector, right?). Filesystems should
ignore uninitialized data simply because it's meaningless. Once written, the
data stays meaningful to the guest.

"Punching holes" would add "DISCARD" support to an image, which could then
behave like an SSD. Otherwise the image behaves like a magnetic disk.

The statements below may not be accurate:
* cp has --sparse option to support read and create sparse files.
* Sadly scp doesn't support sparse files.
* rsync also has a -S --sparse option to properly handle sparse files.

Not until recently did I realize that holes are widely supported in
*almost* all filesystems. That's why I came up with this idea.
I understand your concern about hole support. Is this just because
the "hole" has never been standardized in POSIX or something else?

So now I get one clear reason: holes are not guaranteed by standardized
filesystems (I guess POSIX would be enough).
Is there something else? If the only reason for not using a sparse raw
file as an image, and the only impediment, is no-one-should-ever-use FAT32
or, say, POSIX, we may be very close to moving one step forward.





Cheers!
   吴兴博  Wu, Xingbo 


On Mon, Aug 11, 2014 at 8:52 PM, Fam Zheng  wrote:

> On Mon, 08/11 19:38, 吴兴博 wrote:
> > Hello,
> >
> >   The introduction on the wiki page presents several advantages of qcow2
> > [1]. But I'm a little confused. I would really appreciate it if anyone
> > could give me some help on this :).
> >
> >  (1) Currently the raw format doesn't support COW. In other words, a raw
> > image cannot have a backing file. COW depends on a mapping table from
> > which it knows whether each block/cluster is present (has been modified)
> > in the current image file. Modern filesystems like xfs/ext4/etc. provide
> > extent/block allocation information to user level, as 'filefrag' does
> > with the 'FIBMAP' and 'FIEMAP' ioctls. I guess the raw file driver (maybe
> > block/raw-posix.c) could obtain correct 'present' information about
> > blocks. However, this information may be limited to the granularity of
> > the file allocation unit size. Maybe it's just because a raw file has no
> > space to store the "backing file name"? I don't think this should hinder
> > such a useful feature.
> >
> >  (2) As most popular filesystems support delayed allocation/on-demand
> > allocation/holes, whatever, a raw image is also thin provisioned like
> > other formats. It doesn't consume much disk space by storing useless
> > zeros. However, I don't know if there is any concern about fragmented
> > extents becoming a burden on the host filesystem.
> >
> >  (3) For compression and encryption, I'm not an expert on these topics
> > at all, but I think these features may not be vital to an image format,
> > as both the guest's and the host's filesystems can provide similar
> > functionality.
> >
> >  (4) I don't have much understanding of how snapshots work, but I think
> > theoretically they would use no more than the techniques used in COW and
> > backing files.
> >
> > After all these thoughts, I still found no reason not to use a 'raw'
> > file image (engineering effort in QEMU should not count, as we don't ask
> > for more features from the outside world).
> > I would be very sorry if my ignorance wasted your time.
>
> Hi! I think what you described is theoretically possible, but I'm not so
> positive about this feature. What would be the advantages, compared to
> qcow2?
>
> My major concern is the transparency of file system holes: users normally
> can't tell if a "hole" is really zeroes or unallocated, which would cause
> data loss more easily. The user may expect scp (1) or cp (1) to work on an
> image file, just as always, but these tools can legitimately fill the hole
> with actual zeroes if the target filesystem does not support holes.
> That's too dangerous but totally out of QEMU's control.
>
> Fam
>


Re: [Qemu-devel] disk image: self-organized format or raw file

2014-08-11 Thread Fam Zheng
On Mon, 08/11 19:38, 吴兴博 wrote:
> Hello,
> 
>   The introduction on the wiki page presents several advantages of qcow2
> [1]. But I'm a little confused. I would really appreciate it if anyone
> could give me some help on this :).
> 
>  (1) Currently the raw format doesn't support COW. In other words, a raw
> image cannot have a backing file. COW depends on a mapping table from which
> it knows whether each block/cluster is present (has been modified) in
> the current image file. Modern filesystems like xfs/ext4/etc. provide
> extent/block allocation information to user level, as 'filefrag' does
> with the 'FIBMAP' and 'FIEMAP' ioctls. I guess the raw file driver (maybe
> block/raw-posix.c) could obtain correct 'present' information about blocks.
> However, this information may be limited to the granularity of the file
> allocation unit size. Maybe it's just because a raw file has no space to
> store the "backing file name"? I don't think this should hinder such a
> useful feature.
> 
>  (2) As most popular filesystems support delayed allocation/on-demand
> allocation/holes, whatever, a raw image is also thin provisioned like other
> formats. It doesn't consume much disk space by storing useless zeros.
> However, I don't know if there is any concern about fragmented extents
> becoming a burden on the host filesystem.
> 
>  (3) For compression and encryption, I'm not an expert on these topics at
> all, but I think these features may not be vital to an image format, as both
> the guest's and the host's filesystems can provide similar functionality.
> 
>  (4) I don't have much understanding of how snapshots work, but I think
> theoretically they would use no more than the techniques used in COW and
> backing files.
> 
> After all these thoughts, I still found no reason not to use a 'raw' file
> image (engineering effort in QEMU should not count, as we don't ask for
> more features from the outside world).
> I would be very sorry if my ignorance wasted your time.

Hi! I think what you described is theoretically possible, but I'm not so
positive about this feature. What would be the advantages, compared to qcow2?

My major concern is the transparency of file system holes: users normally
can't tell if a "hole" is really zeroes or unallocated, which would cause
data loss more easily. The user may expect scp (1) or cp (1) to work on an
image file, just as always, but these tools can legitimately fill the hole
with actual zeroes if the target filesystem does not support holes.
That's too dangerous but totally out of QEMU's control.
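
A rough sketch of what I mean, in Python and assuming Linux, where lseek
accepts SEEK_DATA/SEEK_HOLE. The naive_copy helper is a stand-in for any
tool that reads and writes every byte; it is not meant to describe what cp
or scp actually do internally.

```python
import errno
import os
import tempfile


def data_extents(path):
    """Return [(start, end)] ranges the filesystem reports as data."""
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        pos = 0
        while pos < size:
            try:
                start = os.lseek(fd, pos, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:   # only a hole remains after pos
                    break
                raise
            end = os.lseek(fd, start, os.SEEK_HOLE)
            extents.append((start, end))
            pos = end
    finally:
        os.close(fd)
    return extents


def naive_copy(src, dst, bufsize=64 * 1024):
    """Copy every byte, including the zeros that reads return for holes."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(bufsize)
            if not buf:
                break
            fout.write(buf)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "src.img")
        dst = os.path.join(d, "dst.img")
        with open(src, "wb") as f:
            f.truncate(8 * 1024 * 1024)      # 8 MiB, mostly hole
            f.seek(4 * 1024 * 1024)
            f.write(b"x" * 4096)
        naive_copy(src, dst)
        print("source extents:", data_extents(src))
        print("copy extents:  ", data_extents(dst))
```

The two files compare equal byte for byte, so nothing looks wrong to the
user, but where holes are supported the copy is typically reported as data
throughout. If QEMU treated "hole" as "unallocated, look in the backing
file", the copy would silently mean something different from the original.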

Fam