On 09/10/2010 02:29 PM, Stefan Hajnoczi wrote:

They only guarantee that the filesystem is consistent.  A write() that
extends a file may be reordered with the L2 write() that references the new
cluster.  Requiring fsck on unclean shutdown is very backwards for a 2010
format.

I'm interested in understanding how preallocation will work in a way
that does not introduce extra flushes in the common case or require
fsck.

It seems to me that you can either preallocate and then rely on an
fsck on startup to figure out which clusters are now really in use, or
you can keep an exact max_cluster but this requires an extra write
operation for each allocating write (and perhaps a flush?).

Can you go into more detail in how preallocation should work?

You simply leak the preallocated clusters.

That's not as bad as it sounds - if you never write() the clusters they don't occupy any space on disk, so you only leak address space, not actual storage. If you copy the image then you actually do lose storage.
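
To make the "just leak them" idea concrete, here is a rough sketch in Python. It is a toy model, not qcow2 or any proposed format: the class name, the `persisted_max` field (standing in for a high-water mark stored in the image header) and the window size are all assumptions made for illustration. The point is that the persisted mark moves once per window rather than once per allocating write, and that after a crash you simply resume from the persisted mark.

```python
class PreallocatingAllocator:
    """Toy cluster allocator; all names here are hypothetical."""

    def __init__(self, window, persisted_max=0):
        self.window = window
        # Assumed to be stored in the image header and flushed when changed.
        self.persisted_max = persisted_max
        # In-memory only: this value is lost on an unclean shutdown.
        self.next_free = persisted_max
        self.header_writes = 0

    def allocate(self):
        if self.next_free >= self.persisted_max:
            # Reserve a whole window with one header update instead of
            # updating the header on every allocating write.
            self.persisted_max += self.window
            self.header_writes += 1
        cluster = self.next_free
        self.next_free += 1
        return cluster


def recover_after_crash(alloc):
    """Resume from the persisted mark; no scan of the image is needed.

    Clusters that were reserved but never handed out are leaked.
    """
    leaked = alloc.persisted_max - alloc.next_free
    return PreallocatingAllocator(alloc.window, alloc.persisted_max), leaked
```

With a window of 8, ten allocations cost two header updates, and a crash at that point leaks six cluster addresses that were never written and so never occupied storage.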

If you really wanted to recover the lost storage you could start a thread in the background that looks for unallocated blocks. Unlike fsck, you don't have to wait for it since data integrity does not depend on it. I don't think it's worthwhile, though.
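
A sketch of what such a background pass might look like, again in Python and again with made-up names. `referenced` stands for the set of cluster indexes reachable from the metadata and `max_cluster` for the persisted high-water mark; how either would be obtained from a real image is not shown.

```python
import threading


def find_leaked_clusters(referenced, max_cluster):
    """Return cluster indexes below max_cluster that nothing references."""
    return [c for c in range(max_cluster) if c not in referenced]


def start_background_scan(referenced, max_cluster, on_done):
    """Run the scan in a daemon thread; nothing waits for it to finish."""
    snapshot = set(referenced)  # work on a copy taken at start time

    def worker():
        on_done(find_leaked_clusters(snapshot, max_cluster))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

The image is usable from the moment it is opened; the scan only reports address space that could be reused.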

Another game you can play with preallocation is varying the preallocation window with the workload: start with no preallocation, and as the guest starts to allocate, increase the window. When the guest goes idle again, you can return the storage to the operating system and reduce the window back to zero.
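
One possible shape for that policy, as a Python sketch. The doubling growth and the `max_window` cap are my assumptions here; any growth rule would do, and actually returning the storage to the operating system is left out.

```python
class AdaptiveWindow:
    """Hypothetical policy: grow the window under load, drop it when idle."""

    def __init__(self, max_window=64):
        self.max_window = max_window
        self.window = 0  # start with no preallocation

    def on_allocation(self):
        # Grow from 0 to 1, then double, capped at max_window.
        self.window = min(self.max_window, max(1, self.window * 2))
        return self.window

    def on_idle(self):
        # Report how many clusters' worth of window is given up,
        # then fall back to no preallocation.
        released = self.window
        self.window = 0
        return released
```

Repeated calls to `on_allocation()` give windows of 1, 2, 4, ... up to the cap, and `on_idle()` resets the window to zero.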


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

