On Wed, Nov 21, 2012 at 10:01:00AM +0100, Dietmar Maurer wrote:
> +==Disadvantages==
> +
> +* we need to define a new archive format
> +
> +Note: Most existing archive formats are optimized to store small files
> +including file attributes. We simply do not need that for VM archives.

Did you look at the VMDK "Stream-Optimized Compressed" subformat?

http://www.vmware.com/support/developer/vddk/vmdk_50_technote.pdf?src=vmdk

It is a stream of compressed "grains" (data). The grains are out-of-order
and each one comes with the virtual disk LBA where the data should be
visible to the guest.

The stream also contains "grain tables" and "grain directories". This
metadata makes random read access to the file possible once you have
downloaded the entire file (i.e. it is seekable), although tools can
also choose to consume the stream in sequential order and ignore the
metadata.

In other words, the format is an out-of-order stream of data chunks plus
random access lookup tables at the end.

QEMU's block/vmdk.c already has some support for this format, although I
don't think we generate out-of-order streams yet.

The benefit of reusing this code is that existing tools can consume
these files.

Stefan
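
The layout described above (out-of-order chunks, each tagged with its
LBA, followed by lookup tables at the end) can be sketched in Python.
This is a toy format for illustration only, not the VMDK on-disk layout:
the header, index, and footer structs and the function names
write_stream, read_sequential, and read_random are all made up here.

```python
import io
import struct
import zlib

# All three record layouts are assumptions for this sketch, not VMDK structures.
GRAIN_HDR = struct.Struct("<QI")   # toy grain header: guest LBA, compressed size
INDEX_ENT = struct.Struct("<QQ")   # toy index entry: guest LBA, file offset
FOOTER = struct.Struct("<QI")      # toy footer: index offset, entry count


def write_stream(out, grains):
    """Write (lba, data) pairs in the order given, then the lookup table."""
    index = []
    for lba, data in grains:
        payload = zlib.compress(data)
        index.append((lba, out.tell()))
        out.write(GRAIN_HDR.pack(lba, len(payload)))
        out.write(payload)
    # A header with size 0 marks the end of the grains (zlib output is never empty).
    out.write(GRAIN_HDR.pack(0, 0))
    index_offset = out.tell()
    for lba, offset in sorted(index):
        out.write(INDEX_ENT.pack(lba, offset))
    out.write(FOOTER.pack(index_offset, len(index)))


def read_sequential(inp):
    """Consume the stream front to back, ignoring the lookup table."""
    while True:
        lba, size = GRAIN_HDR.unpack(inp.read(GRAIN_HDR.size))
        if size == 0:
            return
        yield lba, zlib.decompress(inp.read(size))


def read_random(inp, wanted_lba):
    """Use the table at the end of a seekable file to fetch one grain."""
    inp.seek(-FOOTER.size, io.SEEK_END)
    index_offset, count = FOOTER.unpack(inp.read(FOOTER.size))
    inp.seek(index_offset)
    table = dict(
        INDEX_ENT.unpack(inp.read(INDEX_ENT.size)) for _ in range(count)
    )
    inp.seek(table[wanted_lba])
    _, size = GRAIN_HDR.unpack(inp.read(GRAIN_HDR.size))
    return zlib.decompress(inp.read(size))
```

A writer can emit grains in whatever order the data becomes available,
since each grain carries its own LBA. A reader that only has a pipe uses
read_sequential, while a reader with the complete file can seek to the
footer and use read_random.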