On 09/20/2010 10:08 AM, Kevin Wolf wrote:
If you're comfortable with a writeback cache for metadata, then you
should also be comfortable with a writeback cache for data, in which
case cache=writeback is the answer.
Well, there is a difference: We don't pollute the host page cache with
guest data and we don't get a virtual "disk cache" as big as the host
RAM, but only a very limited queue of metadata.
Basically, in qemu we have three different types of caching:
1. O_DSYNC, everything is always synced without any explicit request.
This is cache=writethrough.
I actually think O_DSYNC is the wrong implementation of
cache=writethrough. cache=writethrough should behave just like
cache=none except that data goes through the page cache.
2. Nothing is ever synced. This is cache=unsafe.
3. We present a writeback disk cache to the guest and the guest needs
to explicitly flush to get its data safely onto disk. This is
cache=writeback and cache=none.
We shouldn't tie the virtual disk cache to which cache= option is used
in the host. cache=none means that all requests go directly to the
disk. cache=writeback means the host acts as a writeback cache.
If your disk is in writethrough mode, exposing cache=none as a writeback
disk cache is not correct.
We're still lacking modes for O_DSYNC | O_DIRECT and unsafe | O_DIRECT,
but they are entirely possible, because these are two different dimensions.
(And I think Christoph was planning to actually make it two independent
options)
I don't really think O_DSYNC | O_DIRECT makes much sense.
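
To make the "two dimensions" point concrete, here is a rough Python sketch of the classification as Kevin describes it in this thread. It is only an illustration, not qemu code: the CacheMode type, the policy names "always"/"on_flush"/"never", and the helper functions are made up for this example, and the flag mapping is an assumption based on the description above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheMode:
    """Hypothetical model: one axis per dimension discussed in the thread."""
    host_page_cache: bool  # False corresponds to O_DIRECT
    sync_policy: str       # "always" (O_DSYNC), "on_flush", or "never"


# The four existing modes, classified the way the thread describes them.
MODES = {
    "writethrough": CacheMode(host_page_cache=True, sync_policy="always"),
    "writeback": CacheMode(host_page_cache=True, sync_policy="on_flush"),
    "none": CacheMode(host_page_cache=False, sync_policy="on_flush"),
    "unsafe": CacheMode(host_page_cache=True, sync_policy="never"),
}


def open_flags(mode):
    """Assumed mapping from a mode to open(2) flags (0 if a flag is unavailable)."""
    flags = os.O_RDWR
    if not mode.host_page_cache:
        flags |= getattr(os, "O_DIRECT", 0)
    if mode.sync_policy == "always":
        flags |= getattr(os, "O_DSYNC", 0)
    return flags


def missing_combinations():
    """Return the (host_page_cache, sync_policy) pairs that no mode covers."""
    have = set(MODES.values())
    return sorted(
        (hc, sp)
        for hc in (True, False)
        for sp in ("always", "on_flush", "never")
        if CacheMode(hc, sp) not in have
    )
```

In this model, missing_combinations() reports exactly the two gaps named above: O_DIRECT with "always" (O_DSYNC | O_DIRECT) and O_DIRECT with "never" (unsafe | O_DIRECT).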
If it's a matter of batching, batching can't occur if you have a barrier
between steps 3 and 5. The only way you can get batching is by doing a
writeback cache for the metadata such that you can complete your request
before the metadata is written.
Am I misunderstanding the idea?
No, I think you understand it right, but maybe you were not completely
aware that cache=none doesn't mean writethrough.
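
The batching argument above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: MetadataStore and its methods are hypothetical names, a "sync" stands for one barrier/flush to disk, and nothing here reflects the actual block layer code.

```python
class MetadataStore:
    """Toy model that counts how many syncs a series of metadata updates costs."""

    def __init__(self, writeback):
        self.writeback = writeback  # True: queue updates until an explicit flush
        self.pending = []           # updates acknowledged but not yet on disk
        self.on_disk = []           # updates that have been synced
        self.syncs = 0              # number of sync operations issued

    def _sync(self):
        # One sync writes out everything queued so far.
        if self.pending:
            self.on_disk.extend(self.pending)
            self.pending.clear()
            self.syncs += 1

    def update(self, entry):
        # The request completes here; in writethrough mode only after a sync.
        self.pending.append(entry)
        if not self.writeback:
            self._sync()

    def flush(self):
        # Explicit flush from the guest.
        self._sync()


wt = MetadataStore(writeback=False)
wb = MetadataStore(writeback=True)
for store in (wt, wb):
    for i in range(4):
        store.update(i)
    store.flush()
```

With four updates followed by one flush, the writethrough store issues four syncs and the writeback store issues one. The price is that anything still in pending is lost if the host goes down before the flush.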
No, cache=none means don't cache on the host.
In my mind, cache=none|cache=writethrough is specifically about
eliminating the host from the cache hierarchy. This is not a
correctness issue with respect to integrity but rather about data loss.
If you have strong storage with battery-backed caches, then you can
relax flushes. However, if you've got a cache in the host and the host
isn't battery-backed, that's no longer safe to do.
So even with cache=none, if we added a writeback cache for metadata, it
would really need to be an optional feature. Something like
cache=none|writethrough|metadata|writeback.
Regards,
Anthony Liguori
Kevin