On 09/09/2010 01:59 AM, Avi Kivity wrote:
> On 09/08/2010 06:07 PM, Stefan Hajnoczi wrote:
>>>> uint32_t table_size; /* table size, in clusters */
>>> Presumably the L1 table size? Or any table size?
>>> Hm. It would be nicer not to require contiguous sectors anywhere. How
>>> about a variable- or fixed-height tree?
>> Both extents and fancier trees don't fit the philosophy, which is to
>> keep things straightforward and fast by doing less. With extents and
>> trees you've got something that looks much more like a full-blown
>> filesystem. Is there an essential feature or characteristic that QED
>> cannot provide in its current design?
> Not using extents means that random workloads on very large disks will
> continuously need to page in L2s (which are quite large; at 256KB you
> need to account for read time, not just seek time). Keeping it to two
> levels means that the image size is limited, which is not very good
> for an image format designed in 2010.
Define "very large disks".
My target for VM images is 100GB-1TB. Practically speaking, that at
least covers us for the next 5 years.
Since QED has rich support for features, we can continue to evolve the
format over time in a backwards-compatible way. I'd rather delay
supporting massively huge disks until we better understand the true
nature of the problem.
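
Assuming the defaults in the current draft (64KB clusters, a table_size
of 4 clusters, i.e. the 256KB tables mentioned above, and 8-byte table
entries), the back-of-the-envelope math looks like this. Treat it as a
sketch; all three knobs can change:

#include <inttypes.h>
#include <stdio.h>

/* Maximum image size for a two-level table. Assumptions (draft
 * defaults, not fixed in stone): 64KB clusters, table_size = 4
 * clusters per table, 8-byte table entries. */
int main(void)
{
    uint64_t cluster_size = 64 * 1024;
    uint64_t table_size = 4; /* clusters per L1/L2 table */
    uint64_t entries = table_size * cluster_size / sizeof(uint64_t);
    uint64_t l2_coverage = entries * cluster_size; /* data mapped by one L2 */
    uint64_t max_image = entries * l2_coverage;    /* data mapped by the L1 */

    printf("entries per table: %" PRIu64 "\n", entries);              /* 32768 */
    printf("one L2 covers: %" PRIu64 " GB\n", l2_coverage >> 30);     /* 2 */
    printf("max image size: %" PRIu64 " TB\n", max_image >> 40);      /* 64 */
    return 0;
}

So with those defaults a single L2 covers 2GB and the format tops out
at 64TB, well beyond the 1TB target, and table_size can push that
further at the cost of bigger tables.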
>>> Is the physical image size always derived from the host file
>>> metadata? Is this always safe?
>> In my email summarizing crash scenarios and recovery we cover the
>> bases, and I think it is safe to rely on file size as the physical
>> image size. The drawback is that you need a host filesystem and
>> cannot directly use a bare block device. I think that is acceptable
>> for a sparse format; otherwise we'd be using raw.
> Hm, we do have a use case for qcow2-over-lvm. I can't say it's
> something I like, but a point to consider.
We specifically are not supporting that use case in QED today, and
there's a good reason for it. For cluster allocation, we achieve good
performance because we can avoid synchronous metadata updates when
updating L2 tables (only L1 updates require them).

We get the effect of synchronous metadata updates by leveraging the
underlying filesystem's own metadata. The underlying filesystems are
much smarter about their metadata updates: they keep a journal to
delay synchronous updates and do other fancy things.
If we tried to represent the disk size in the header, we would have to
do an fsync() on every cluster allocation.
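
As a sketch of the difference (qed_alloc_cluster() is a made-up helper
here, not the actual code), cluster allocation amounts to extending the
file at EOF, so the filesystem's own size metadata records the
allocation for us:

#include <stdint.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sketch only: the physical image size is just the file size, so
 * allocating a cluster means "extend the file by one cluster at EOF".
 * The filesystem journals its own size metadata, so no header update
 * and no fsync() are needed here. */
static int64_t qed_alloc_cluster(int fd, uint64_t cluster_size)
{
    struct stat st;

    if (fstat(fd, &st) < 0) {
        return -1;
    }
    int64_t offset = st.st_size;    /* current EOF = next free cluster */
    if (ftruncate(fd, offset + cluster_size) < 0) {
        return -1;
    }
    return offset;                  /* caller writes data or an L2 here */
}

/* By contrast, if the header recorded the physical size, every
 * allocation would need a header write plus an fsync() so the size on
 * disk could never run behind the clusters it describes. */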
I can only imagine the use case for qcow2-over-LVM is performance. But
the performance of QED on a filesystem is so much better than qcow2's
that you can safely just use a filesystem and avoid the complexity of
qcow2-over-LVM.
Regards,
Anthony Liguori