On 09/09/2010 01:59 AM, Avi Kivity wrote:
 On 09/08/2010 06:07 PM, Stefan Hajnoczi wrote:
     uint32_t table_size;          /* table size, in clusters */
Presumably L1 table size?  Or any table size?

Hm.  It would be nicer not to require contiguous sectors anywhere.  How
about a variable- or fixed-height tree?
Both extents and fancier trees don't fit the philosophy, which is to
keep things straightforward and fast by doing less.  With extents and
trees you've got something that looks much more like a full-blown
filesystem.  Is there an essential feature or characteristic that QED
cannot provide in its current design?


Not using extents means that random workloads on very large disks will continuously need to page in L2 tables (which are quite large; at 256KB you need to account for read time, not just seek time). Keeping it to two levels means that the image size is limited, which is not very good for an image format designed in 2010.
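For concreteness, here is a back-of-the-envelope sketch of the two-level geometry in C. The 64KB cluster size and 4-cluster table size are assumed example values, not mandated anywhere, though they do reproduce the 256KB table figure above.

  #include <inttypes.h>
  #include <stdio.h>

  int main(void)
  {
      /* Assumed example values; the format lets both vary. */
      uint64_t cluster_size = 64 * 1024;  /* bytes per cluster */
      uint64_t table_size   = 4;          /* table size, in clusters */

      /* A table is table_size clusters of 64-bit offsets. */
      uint64_t entries = table_size * cluster_size / sizeof(uint64_t);

      /* Two levels: each L1 entry points at an L2 table, and each
       * L2 entry maps one data cluster. */
      uint64_t max_image = entries * entries * cluster_size;

      printf("%" PRIu64 " entries/table, %" PRIu64 "-byte tables, "
             "%" PRIu64 "-byte max image\n",
             entries, table_size * cluster_size, max_image);
      return 0;
  }

With these values that works out to 32768 entries per 256KB table and a ceiling of 64TB per image.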

Define "very large disks".

My target for VM images is 100GB-1TB. Practically speaking, that at least covers us for the next 5 years.

Since QED has rich support for features, we can continue to evolve the format over time in a backwards-compatible way. I'd rather delay supporting massively huge disks until we better understand the true nature of the problem.
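The backwards-compatibility mechanism is the header's feature bitmasks. A minimal sketch of how an implementation might gate on them; the mask names and values here are illustrative, not taken from the spec:

  #include <stdbool.h>
  #include <stdint.h>

  /* Illustrative masks; real values accrue as features are added. */
  #define FEATURES_SUPPORTED   0x1ULL
  #define AUTOCLEAR_SUPPORTED  0x0ULL

  static bool image_openable(uint64_t features, uint64_t *autoclear_features)
  {
      /* Unknown feature bits mean the image depends on something we
       * don't implement: refuse to open rather than corrupt it. */
      if (features & ~FEATURES_SUPPORTED) {
          return false;
      }

      /* Unknown autoclear bits are safe to clear and then ignore. */
      *autoclear_features &= AUTOCLEAR_SUPPORTED;
      return true;
  }

Old implementations stay safe against new images, and new implementations can open old images unchanged.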

Is the physical image size always derived from the host file metadata? Is
this always safe?
In my email summarizing crash scenarios and recovery, we cover the bases, and I think it is safe to rely on the file size as the physical image size. The drawback is that you need a host filesystem and cannot directly use a bare block device. I think that is acceptable for a sparse format; otherwise we'd be using raw.
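Concretely, deriving the physical image size is a single fstat() call. A minimal sketch, assuming POSIX:

  #include <stdint.h>
  #include <sys/stat.h>

  /* Derive the physical image size from host file metadata.  On a
   * bare block device st_size is typically 0, which is exactly why
   * this scheme requires a host filesystem. */
  static int64_t physical_image_size(int fd)
  {
      struct stat st;

      if (fstat(fd, &st) < 0) {
          return -1;
      }
      return (int64_t)st.st_size;
  }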

Hm, we do have a use case for qcow2-over-lvm. I can't say it's something I like, but it's a point to consider.

We specifically are not supporting that use case in QED today, and there's a good reason for it. We achieve good cluster allocation performance because L2 table updates require no synchronous metadata updates (only L1 updates do).

We avoid synchronous metadata updates by leveraging the underlying filesystem's metadata. The underlying filesystems are much smarter about their metadata updates: they keep a journal to delay synchronous updates, among other fancy things.

If we tried to represent the disk size in the header, we would have to do an fsync() on every cluster allocation.
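A hedged sketch of what that would cost; the header offset, raw integer write, and missing byte-order handling are all illustrative simplifications:

  #include <stdint.h>
  #include <unistd.h>

  /* If the physical size lived in the header, every cluster
   * allocation would have to persist the new size before the
   * cluster could be used: one pwrite() plus one fsync() each. */
  static int persist_header_size(int fd, uint64_t new_physical_size)
  {
      const off_t size_field_offset = 40;  /* hypothetical offset */

      if (pwrite(fd, &new_physical_size, sizeof(new_physical_size),
                 size_field_offset) != sizeof(new_physical_size)) {
          return -1;
      }
      return fsync(fd);
  }

Deriving the size from the file instead makes allocation a plain write past EOF, with the filesystem's journal carrying the metadata update.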

I can only imagine the use case for qcow2-over-lvm being performance. But the performance of QED on a filesystem is so much better than qcow2 that you can safely just use a filesystem and avoid the complexity of qcow2 over lvm.

Regards,

Anthony Liguori


