On Tue, Aug 13, 2013 at 07:03:56PM +0200, Kaveh Razavi wrote:
> Using copy-on-write images with the base image stored remotely is common
> practice in data centers. This saves significant network traffic by
> avoiding the transfer of the complete base image. However, the data
> blocks needed for a VM boot still need to be transferred to the node that
> runs the VM. On slower networks, this will create a bottleneck when
> booting many VMs simultaneously from a single VM image. Also,
> simultaneously booting VMs from more than one VM image creates a
> bottleneck at the storage device of the base image, if the storage
> device does not fare well with the random access pattern that happens
> during booting.
>
> This patch introduces a block-level caching mechanism by introducing a
> copy-on-read image that supports quota and goes in between the base
> image and copy-on-write image. This cache image can either be stored on
> the nodes that run VMs or on a storage device that can handle random
> access well (e.g. memory, SSD, etc.). This cache image is effective
> since usually only a very small part of the image is necessary for
> booting a VM. We measured 100MB to be enough for default CentOS and
> Debian installations.
>
> A cache image with a quota of 100MB can be created using these commands:
>
> $ qemu-img create -f qcow2 -o cache_img_quota=104857600,backing_file=/path/to/base /path/to/cache
> $ qemu-img create -f qcow2 -o backing_file=/path/to/cache /path/to/cow
>
> The first time a VM boots from the copy-on-write image, the cache gets
> warm. Subsequent boots do not need to read from the base image.
100 MB is small enough for RAM. Did you try enabling the host kernel page
cache for the backing file? That way all guests running on this host share
a single RAM-cached version of the backing file.

The other existing solution is to use the image streaming feature, which
was designed to speed up deployment of image files over the network. It
copies the contents of the image from a remote server onto the host while
allowing immediate random access from the guest.

This isn't a cache; it is a full copy of the image. I'll share an idea of
how to turn this into a cache in a second, but first, here is how to
deploy it safely.

Since multiple QEMU processes can share a backing file and the cache must
not suffer from corruption due to races, you can use one qemu-nbd per
backing image. The QEMU processes connect to the local read-only qemu-nbd
server.

If you want a cache, you could enable copy-on-read without the image
streaming feature (block_stream command) and evict old data using discard
commands. No qcow2 image format changes are necessary to do this.

> @@ -730,6 +751,31 @@ static coroutine_fn int qcow2_co_readv(BlockDriverState *bs, int64_t sector_num,
>      if (ret < 0) {
>          goto fail;
>      }
> +    /* do copy-on-read if this is a cache image */
> +    if (bs->is_cache_img && !s->is_cache_full &&
> +        !s->is_writing_on_cache)
> +    {
> +        qemu_co_mutex_unlock(&s->lock);
> +        s->is_writing_on_cache = true;
> +        ret = bdrv_co_writev(bs,
> +                             sector_num,
> +                             n1,
> +                             &hd_qiov);
> +        s->is_writing_on_cache = false;
> +        qemu_co_mutex_lock(&s->lock);
> +        if (ret < 0) {
> +            if (ret == (-ENOSPC))
> +            {
> +                s->is_cache_full = true;
> +            }
> +            else {
> +                /* error is other than cache space */
> +                fprintf(stderr, "Cache write error (%d)\n",
> +                        ret);
> +                goto fail;
> +            }
> +        }
> +    }

This is unsafe because other QEMU processes on the host are not
synchronizing with each other. The s->is_writing_on_cache flag lives in
the memory of a single QEMU process. It cannot stop another process that
has the same cache image open from allocating clusters and updating qcow2
metadata at the same time. The image file will be corrupted.

Stefan