On Fri, Jan 14, 2011 at 03:56:00PM -0500, Chunqiang Tang wrote:
> P2) Overhead of storing an image on a host file system. Specifically, a
> RAW image stored on ext3 is 50-63% slower than a RAW image stored on a raw
> partition.
Sorry, benchmarking this against ext3 really doesn't matter.  Benchmark it
against xfs or ext4 with a preallocated image (fallocate or dd).

> For P1), I use the term compact image instead of sparse image, because a
> RAW image stored as a sparse file in ext3 is a sparse image, but is not a
> compact image. A compact image stores data in such a way that the file
> size of the image file is smaller than the size of the virtual disk
> perceived by the VM. QCOW2 is a compact image. The disadvantage of a
> compact image is that the data layout perceived by the guest OS differs
> from the actual layout on the physical disk, which defeats many
> optimizations in guest file systems.

It's something filesystems have to deal with.  Real storage is getting
increasingly virtualized.  While this didn't matter for the real high-end
storage, which has been doing this for a long time, it's getting more and
more exposed to the filesystem.  That includes LVM layouts and thinly
provisioned disk arrays, which are getting increasingly popular.  That
doesn't mean the 64k (or, until recently, 4k) cluster size in qcow2 is a
good idea: we'd want extents at least an order of magnitude or two larger
to perform well.  But it does mean filesystems really have to cope with it.

> For P2), using a host file system is inefficient, because 1) historically
> file systems are optimized for small files rather than large images,

I'm not sure what hole you're pulling this bullshit out of, but it is
absolutely not correct.  Since the dawn of time there have been filesystems
optimized for small files, for large or really large files, or trying to
strike a tradeoff in between.

> 2) certain functions of a host file system are simply redundant with
> respect to the function of a compact image, e.g., performing storage
> allocation. Moreover, using a host file system not only adds overhead, but
> also introduces data integrity issues.
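(As an aside, the preallocation suggested above is trivial from the shell;
the path and size below are made-up examples, not from this thread:)

```shell
# Fully preallocate a raw image before handing it to qemu, so the
# filesystem assigns all blocks up front and steady-state I/O does
# no allocation.  Path and size are illustrative only.
fallocate -l 20G /var/lib/images/disk.img

# Fallback where fallocate is not supported: write the whole file
# once with dd so every block is really allocated on disk.
dd if=/dev/zero of=/var/lib/images/disk.img bs=1M count=20480
```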
I/O into fully preallocated files uses exactly the same codepath as I/O to
the block device, except for an identity logical-to-physical block mapping
in the block device and a non-trivial one in the filesystem.  Note that the
block mapping is cached and does not affect performance.  I published the
numbers for qemu in the various caching modes and on all major filesystems
a while ago, so I'm not making this up.

> Specifically, if I/Os use O_DSYNC, it may be too slow. If I/Os use
> O_DIRECT, it cannot guarantee data integrity in the event of a host
> crash. See http://lwn.net/Articles/348739/ .

I/O to block devices does not guarantee data integrity without O_DSYNC
either.

> Storage over-commit means that, e.g., a 100GB physical disk can be used to
> host 10 VMs, each with a 20GB virtual disk.

The current storage industry buzzword for that is thin provisioning.
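(For anyone unfamiliar with the term: at the file level this is just sparse
allocation.  A quick sketch, with made-up paths and sizes; the apparent
size the guest sees is much larger than the space actually consumed:)

```shell
# Create a sparse 20G raw image; no blocks are allocated yet.
truncate -s 20G /tmp/thin.img

# Apparent size, i.e. what the guest sees as its disk: 20G.
ls -lh /tmp/thin.img

# Actual space consumed on the host: effectively zero until the
# guest writes data.  Ten such files "fit" on a 100GB disk, which
# is exactly the over-commit described above.
du -h /tmp/thin.img
```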