Mark McLoughlin wrote:
> i.e. with write-back caching enabled, the IDE protocol makes no
> guarantees about when data is committed to disk.
>
> So, from a protocol correctness POV, qemu is behaving correctly with
> cache=on and write-back caching enabled on the disk.

Yes, the host page cache is basically a big on-disk cache.

>> For SCSI, an unordered queue is advertised. Again, everything depends
>> on whether or not write-back caching is enabled. Again, perfectly
>> happy to take patches here.

> Queue ordering and write-back caching sound like very different things.
> Are they two distinct SCSI options, or ...?

Yes.

> Surely an ordered queue doesn't do much to help prevent fs corruption
> if the host crashes, right? You would still need write-back caching
> disabled?

You need both. In theory, a guest would use queue ordering to guarantee that certain writes make it to disk before other writes. Enabling write-through guarantees that the data is actually on disk. Since we advertise an unordered queue, we're okay from a safety point of view, but for performance reasons we'll want to do ordered queuing.
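
To make the "you need both" point concrete, here's a userspace analogy (just a sketch of the general idea, nothing to do with qemu's code; the file name and record contents are made up). Ordering only determines which write is issued first; an explicit flush is what guarantees the data has actually reached stable storage:

/*
 * Userspace analogy only -- not guest or qemu code.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("journal.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    const char record[] = "journal record\n";
    const char commit[] = "commit record\n";

    /* Ordering: the journal record is written before the commit record... */
    if (write(fd, record, strlen(record)) < 0)
        perror("write record");

    /*
     * ...but only a flush guarantees it is on stable storage before the
     * commit is issued.  Without it, both writes may sit in a volatile
     * cache and be lost (or become durable out of order) on a crash.
     */
    fdatasync(fd);

    if (write(fd, commit, strlen(commit)) < 0)
        perror("write commit");
    fdatasync(fd);   /* the commit record itself must also reach the disk */

    close(fd);
    return 0;
}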

>> More importantly, the most common journaled filesystem, ext3, does not
>> enable write barriers by default (even for journal updates). This is
>> how it ships in Red Hat distros.

> i.e. implementing barriers for virtio won't help most ext3 deployments?

Yes, ext3 doesn't use barriers by default. See http://kerneltrap.org/mailarchive/linux-kernel/2008/5/19/1865314

> And again, if barriers are just about ordering, don't you need to
> disable caching anyway?

Well, virtio doesn't have a notion of write-caching. My thinking is that we ought to implement barriers via fdatasync, because posix-aio already has an op for it. This would effectively use a barrier as a point at which to force data onto disk. I think this would take care of most of the data corruption issues, since the cases where the guest actually cares about corruption would be handled (barriers should be used for journal writes and any O_DIRECT write, for instance, although, yeah, that's not the case today with ext3).
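
Roughly, I'm thinking of something along these lines (just a sketch, not the real qemu code; submit_barrier() and "disk.img" are made-up names, and you'd link with -lrt). A barrier from the guest becomes an aio_fsync() on the backing image, and the barrier is only completed back to the guest once the flush finishes:

/*
 * Sketch only: mapping a guest barrier onto posix-aio's fsync op.
 */
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int submit_barrier(int image_fd, struct aiocb *cb)
{
    memset(cb, 0, sizeof(*cb));
    cb->aio_fildes = image_fd;
    /* O_DSYNC asks for an fdatasync()-style flush of the image file */
    return aio_fsync(O_DSYNC, cb);
}

int main(void)
{
    int fd = open("disk.img", O_RDWR);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct aiocb cb;
    if (submit_barrier(fd, &cb) < 0) {
        perror("aio_fsync");
        return 1;
    }

    /* Only complete the barrier back to the guest once the flush is done */
    const struct aiocb *const list[1] = { &cb };
    while (aio_error(&cb) == EINPROGRESS)
        aio_suspend(list, 1, NULL);

    printf("barrier flush returned %d\n", (int)aio_return(&cb));
    close(fd);
    return 0;
}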

>> So there is no greater risk of corrupting a journal in QEMU than there
>> is on bare metal.

> This is the bit I really don't buy - we're equating qemu caching to IDE
> write-back caching and saying the risk of corruption is the same in both
> cases.

Yes.

> But doesn't qemu cache data for far, far longer than a typical IDE disk
> with write-back caching would do? Doesn't that mean you're far, far more
> likely to see fs corruption with qemu caching?

It caches more data; I don't know how much longer it caches it than a typical IDE disk does. The guest can crash and that won't cause data loss. The only thing that will really cause data loss is the host crashing, so it's slightly better than write-back caching in that regard.

> Or put it another way, if we fix it by implementing the disabling of
> write-back caching ... users running a virtual machine will need to run
> "hdparm -W 0 /dev/sda" where they would never have run it on bare metal?

I don't see it as something needing to be fixed because I don't see that the exposure is significantly greater for a VM than for a real machine.

And let's take a step back too. If people are really concerned about this point, let's introduce a sync=on option that opens the image with O_SYNC. This will effectively make the cache write-through without the baggage associated with O_DIRECT.

While I object to libvirt always setting cache=off, I think sync=on for IDE and SCSI may be reasonable (you don't want it for virtio-blk once we implement proper barriers with fdatasync, I think).
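
To spell out the difference (again, only a sketch; open_image() is a made-up helper, not qemu's block layer code): cache=off means O_DIRECT, which bypasses the host page cache entirely, while sync=on would keep the page cache for reads but make writes write-through with O_SYNC:

#define _GNU_SOURCE          /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0           /* O_DIRECT is Linux-specific */
#endif

static int open_image(const char *path, int cache_off, int sync_on)
{
    int flags = O_RDWR;

    if (cache_off)
        flags |= O_DIRECT;   /* cache=off: bypass the host page cache; alignment rules apply */
    if (sync_on)
        flags |= O_SYNC;     /* sync=on: cached reads, but writes hit disk before returning */

    return open(path, flags);
}

int main(void)
{
    /* "disk.img" is just a placeholder image file */
    int fd = open_image("disk.img", 0, 1);   /* sync=on behaviour */
    if (fd < 0) {
        perror("open_image");
        return 1;
    }
    close(fd);
    return 0;
}

The point is that sync=on gives the write-through guarantee without O_DIRECT's alignment and performance baggage.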

Regards,

Anthony Liguori

> Cheers,
> Mark.


