----- Original Message -----
> From: "Stefan Hajnoczi" <stefa...@gmail.com>
> To: "Andrew Martin" <amar...@xes-inc.com>
> Cc: qemu-devel@nongnu.org
> Sent: Tuesday, August 19, 2014 9:59:25 AM
> Subject: Re: [Qemu-devel] Using cache=writeback safely on qemu 1.4.0 and later
>
> If you strace -f the QEMU process on the host, you will see fdatasync(2)
> system calls when the guest flushes the disk.
>
> You can find the file descriptor number by checking ls -l
> /proc/$PID_OF_QEMU/fd and looking for the disk image file.

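For reference, the file descriptor lookup described above can be scripted. This is only a minimal sketch, assuming a Linux host with /proc mounted and enough privileges to read the QEMU process's fd directory; the helper name find_image_fds is hypothetical and not part of QEMU or libvirt.

```python
import os


def find_image_fds(pid, image_path):
    """Return the fd numbers of process `pid` that point at `image_path`.

    Hypothetical helper: it walks /proc/<pid>/fd, the same directory that
    `ls -l /proc/$PID_OF_QEMU/fd` lists, and compares each symlink target
    with the resolved image path.
    """
    fd_dir = "/proc/{}/fd".format(pid)
    target = os.path.realpath(image_path)
    matches = []
    for name in os.listdir(fd_dir):
        try:
            link = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The fd was closed between listdir() and readlink().
            continue
        if link == target:
            matches.append(int(name))
    return sorted(matches)
```

With the descriptor number known, the strace output can be narrowed to fdatasync(2) calls on that descriptor only, for example by grepping for "fdatasync(N)" where N is the number returned for the disk image.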
When the disk is set to cache=writethrough on one of the same VMs, I see
frequent fdatasync(2) calls (every few seconds). However, when I change the
disk over to cache=writeback, since boot I have not yet seen a single
fdatasync(2) call, even after writing data 2x the amount of RAM:

# time strace -ft -p4113 2>&1 | grep fdatasync
^C
real    15m39.245s
user    0m7.940s
sys     0m18.280s

Note that the disk is defined as follows:

<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' cache='writeback'/>
  <source file='/var/lib/libvirt/images/vm.img'/>
  <target dev='vda' bus='virtio'/>
  <alias name='virtio-disk0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</disk>

> > I recently experienced UPS failure on several hosts which caused a hard
> > shutdown. After restarting, 3 of the guests had corruption on their disks
> > and required a fairly long fsck to fix. Afterwards, data that had been
> > written to the disks several hours before the crash was corrupted, which
> > makes me think that it was never fsync()-ed to the non-volatile storage.
>
> What exactly was the "corruption" you encountered? Which application,
> error message, etc.

Two of the servers are web servers with apache2. In one case, a python daemon
copies JPGs onto the server - the last 100 copied onto the server were
corrupted. In another case, some files had been uploaded several days prior to
the www-root, but after the hard reset said files were no longer present in the
filesystem.

> > Is it safe in this setup to use cache=writeback? Or, should I use
> > cache=writethrough instead?
>
> Ubuntu 12.04 is recent and sends write cache flushes.
>
> Are you sure the file system and/or application workload are flushing
> the disk cache? Please check the mount options and application-specific
> configuration.

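On the application side, one way to rule out missing flushes is to have the copy routine force its data to disk explicitly. The following is a minimal sketch of that idea, not the actual daemon's code; the function name durable_copy and the ".tmp" suffix are assumptions made for illustration.

```python
import os
import shutil


def durable_copy(src, dst):
    """Copy src to dst and flush both the data and the directory entry.

    Hypothetical helper: writes to a temporary name, fsyncs the file,
    renames it over dst, then fsyncs the containing directory so the
    rename itself is persisted.
    """
    tmp = dst + ".tmp"  # assumed temporary name, not from the original setup
    with open(src, "rb") as fin, open(tmp, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()             # push Python's buffer to the kernel
        os.fsync(fout.fileno())  # ask the kernel to write the data out
    os.replace(tmp, dst)         # atomic rename within one filesystem
    dir_fd = os.open(os.path.dirname(os.path.abspath(dst)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)         # persist the new directory entry
    finally:
        os.close(dir_fd)
```

If the guest passes flushes through as expected, each such copy should show up as fdatasync(2) activity in the host-side strace.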
The mount options for the ext4 filesystem in the VM in both cases are:
rw,relatime,errors=remount-ro,data=ordered

Similarly, the host's ext4 filesystem holding the images is mounted with:
rw,relatime,data=ordered

I did not see any errors in the kernel log in the guest, probably because the
root filesystem was read-only until the fsck had completed.

Thanks,

Andrew