Re: JFYI: ext4 bug triggerable by kvm

Anthony Liguori Tue, 17 Aug 2010 07:55:26 -0700

On 08/17/2010 09:45 AM, Christoph Hellwig wrote:
> On Tue, Aug 17, 2010 at 09:39:15AM -0500, Anthony Liguori wrote:
>> The type of cache we present to the guest should relate only to how
>> the hypervisor caches the storage.  It should be independent of how
>> data is cached by the disk.
>
> It is.
>
>> There can be many levels of caching in a storage hierarchy, and each
>> level is cached independently of the next level.
>>
>> If the user has a disk with a writeback cache, if we expose a
>> writethrough cache to the guest, it's not our responsibility to make
>> sure that we break through the writeback cache on the disk.
>
> The user doesn't know or have to care about the caching.  The
> user uses O_SYNC/fsync to say that it wants data on disk, and it's the
> operating system's job to make that happen.  The situation with qemu
> is the same - if we tell the guest that we do not have a volatile write
> cache that needs explicit management the guest can rely on the fact
> that it does not have to do manual cache management.

This is simply unrealistic.  O_SYNC might force data to be on a platter
when using a directly attached disk, but many NAS devices actually do
writeback caching and rely on having a UPS to preserve data integrity.
There's really no way in the general case to ensure that data is
actually on a platter once you've involved a complex storage setup or
you assume FUA.

Let me put it another way.  If an admin knows the disks on a machine
have battery-backed cache, he's likely to leave writeback caching enabled.

We are currently giving the admin two choices with QEMU: either ignore
the fact that the disk is battery backed and do writethrough caching of
the disk, or do writeback caching in the host, which expands the disk
cache from something very small and non-volatile (the on-disk cache) to
something very large and volatile (the page cache).  To make the page
cache non-volatile, you would need to have a UPS for the hypervisor
with enough power to flush the page cache.

So basically, we're not presenting a model that makes sensible use of
reliable disks.

cache=none does the right thing here but doesn't benefit from the host's
page cache for reads.  This is really the missing behavior.
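
To make the gap concrete, here is a rough sketch of how the three
cache= settings could correspond to host open(2) flags.  It assumes the
usual mapping (cache=none to O_DIRECT, cache=writethrough to O_DSYNC,
cache=writeback to plain buffered I/O); the names CACHE_FLAGS,
open_flags and uses_page_cache are illustrative only and are not taken
from QEMU.

```python
import os

# Assumption: these flags approximate what each cache= mode asks of the
# host.  getattr() falls back to 0 where a flag is missing (O_DIRECT is
# Linux-specific).
O_DIRECT = getattr(os, "O_DIRECT", 0)
O_DSYNC = getattr(os, "O_DSYNC", 0)

CACHE_FLAGS = {
    "writethrough": O_DSYNC,  # host page cache used; each write is synchronous
    "writeback": 0,           # host page cache used; writes buffered in host RAM
    "none": O_DIRECT,         # host page cache bypassed for reads and writes
}


def open_flags(cache_mode):
    """Return hypothetical open(2) flags for an image under cache_mode."""
    return os.O_RDWR | CACHE_FLAGS[cache_mode]


def uses_page_cache(cache_mode):
    """True if reads under cache_mode can be served from the host page cache."""
    return cache_mode != "none"
```

Under this mapping no row both uses the page cache for reads and leaves
write durability entirely to the (battery-backed) disk cache, which is
the combination described above as missing.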


Regards,

Anthony Liguori


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html