On Wed, Jan 23, 2013 at 09:15:47PM +0800, Liu Yuan wrote:
> On 01/23/2013 08:34 PM, Stefan Hajnoczi wrote:
> > On Wed, Jan 23, 2013 at 06:47:55PM +0800, Liu Yuan wrote:
> >> On 01/23/2013 06:14 PM, Daniel P. Berrange wrote:
> >>> On Wed, Jan 23, 2013 at 06:09:01PM +0800, Liu Yuan wrote:
> >>>> On 01/23/2013 05:30 PM, Daniel P. Berrange wrote:
> >>>>> FYI there is a patch proposed for customization:
> >>>>>
> >>>>> https://review.openstack.org/#/c/18042/
> >>>>>
> >>>>
> >>>> It seems that this patch was dropped and declined?
> >>>>
> >>>>>
> >>>>> I should note that it is wrong to assume that enabling the cache mode
> >>>>> will improve performance in general. Allowing caching in the host
> >>>>> requires a non-negligible amount of host RAM to have a benefit. RAM is
> >>>>> usually the most constrained resource in any virtualization
> >>>>> environment, so while the cache may help performance when only one or
> >>>>> two VMs are running on the host, it may well hurt performance once the
> >>>>> host is running enough VMs to max out RAM. So allowing caching will
> >>>>> give you quite variable performance, while cache=none will give you
> >>>>> consistent performance regardless of host RAM utilization (underlying
> >>>>> contention on the storage device may of course still impact things).
> >>>>
> >>>> Yeah, allowing page cache in the host might not be a good idea when
> >>>> running multiple VMs, but the cache type in QEMU has a different
> >>>> meaning for network block devices. For example, we use the cache type
> >>>> to control the client-side cache of a Sheepdog cluster, which
> >>>> implements an object cache on the local disk for a performance boost
> >>>> and to reduce network traffic. This doesn't consume memory at all; it
> >>>> just occupies disk space on the node where the sheep daemon runs.
> >
> > How can it be a "client-side cache" if it doesn't consume memory on the
> > client?
> >
> > Please explain how the "client-side cache" feature works. I'm not
> > familiar with sheepdog internals.
> >
>
> Let me start with a local file as the backend of a QEMU block device. It
> basically uses host memory pages to cache blocks of the emulated device.
> The kernel internally maps those blocks into pages of the file (a.k.a.
> the page cache) and then we rely on the kernel memory subsystem to write
> back those cached pages. When the VM reads/writes some blocks, the
> kernel allocates pages on demand to serve the read/write requests
> operated on those pages.
>
>  QEMU <----> VM
>    ^
>    | writeback/readahead pages
>    V                  |
>  POSIX file < --- > page cache < --- > disk
>                       |
>        kernel does page wb/ra and reclaim
>
> The object cache of Sheepdog does something similar; the difference is
> that we map the requested blocks into objects (which are plain
> fixed-size files on each node) and the sheep daemon plays the role of
> the kernel, writing back dirty objects and reclaiming clean objects to
> make room for objects allocated by other requests.
>
>  QEMU <----> VM
>    ^
>    | push/pull objects
>    V                  |
>  SD device < --- > object cache < --- > SD replicated object storage
>                       |
>        sheep daemon does object push/pull and reclaim
>
> Objects are implemented as fixed-size files on disk, so for the object
> cache those objects are all fixed-size files on the node where the sheep
> daemon runs, and sheep does direct I/O on them. In this sense we don't
> consume memory, except for those objects' metadata (inode & dentry) on
> the node.
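
Thanks, that makes the scheme clearer. To check my understanding, here is
a rough sketch of what I imagine the object cache write path to look
like. This is purely hypothetical and not sheepdog code; names like
OBJ_SIZE, cache_dir and obj_cache_pwrite are made up for illustration:

/* Hypothetical sketch of an object cache write, not sheepdog code. */
#define _GNU_SOURCE             /* for O_DIRECT */
#include <fcntl.h>
#include <inttypes.h>
#include <limits.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define OBJ_SIZE (4ULL * 1024 * 1024)   /* fixed object size (assumed) */

static const char *cache_dir = "/var/lib/objcache";   /* made-up path */

/* Write a chunk that lies entirely within one object.  The cache is a set
 * of fixed-size files on the local disk, written with O_DIRECT, so it
 * occupies disk space rather than host page cache memory. */
static ssize_t obj_cache_pwrite(uint64_t vdi_off, const void *buf, size_t len)
{
    char path[PATH_MAX];
    uint64_t idx = vdi_off / OBJ_SIZE;   /* which object file       */
    off_t off = vdi_off % OBJ_SIZE;      /* offset inside that file */
    ssize_t ret;
    int fd;

    snprintf(path, sizeof(path), "%s/%016" PRIx64, cache_dir, idx);

    /* Real code must keep buf/off/len block-aligned for O_DIRECT. */
    fd = open(path, O_CREAT | O_RDWR | O_DIRECT, 0600);
    if (fd < 0) {
        return -1;
    }
    ret = pwrite(fd, buf, len, off);
    close(fd);

    /* A background daemon would later push dirty objects to the
     * replicated store and reclaim clean ones to free local disk space. */
    return ret;
}

Is that roughly the idea?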
Does QEMU usually talk to a local sheepdog daemon?  I guess it must,
otherwise the cache doesn't avoid network traffic.

> >>> That is a serious abuse of the QEMU cache type variable. You now have
> >>> one setting with two completely different meanings for the same value.
> >>> If you want to control whether the sheepdog driver uses a local disk
> >>> for the object cache, you should have a completely separate QEMU
> >>> command line setting which can be controlled independently of the
> >>> cache= setting.
> >>>
> >>
> >> Hello Stefan and Kevin,
> >>
> >> Should the sheepdog driver use a separate new command-line setting to
> >> control its internal cache?
> >>
> >> For a network block device, which simply forwards the I/O requests from
> >> VMs over the network and never has a chance to touch the host's memory,
> >> I think it is okay to multiplex 'cache=type', but it looks like it
> >> causes confusion for the libvirt code.
> >
> > From block/sheepdog.c:
> >
> >     /*
> >      * QEMU block layer emulates writethrough cache as 'writeback + flush', so
> >      * we always set SD_FLAG_CMD_CACHE (writeback cache) as default.
> >      */
> >     s->cache_flags = SD_FLAG_CMD_CACHE;
> >     if (flags & BDRV_O_NOCACHE) {
> >         s->cache_flags = SD_FLAG_CMD_DIRECT;
> >     }
> >
> > That means -drive cache=none and -drive cache=directsync use
> > SD_FLAG_CMD_DIRECT.
> >
> > And -drive cache=writeback and cache=writethrough use SD_FLAG_CMD_CACHE.
> >
> > This matches the behavior that QEMU uses for local files:
> > none/directsync mean O_DIRECT, and writeback/writethrough go via the
> > page cache.
> >
> > When you use NFS, O_DIRECT also means bypassing the client-side cache.
> > Where is the issue?
>
> I don't have any issue with this; it's just that Daniel complained that
> Sheepdog possibly abuses the cache flags, which he thinks should be
> page-cache oriented only, if I understand correctly.

Daniel: I know of users setting cache= differently depending on local
files vs NFS.  That's because O_DIRECT isn't well-defined and has no
impact on guest I/O semantics.  It's purely a performance option that you
can choose according to your workload and host configuration - just like
Sheepdog's SD_FLAG_CMD_DIRECT.

Stefan
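
P.S. For concreteness, the mapping described above as I read
block/sheepdog.c (command lines abbreviated; the image name "myvol" is
made up):

  -drive file=sheepdog:myvol,cache=none         -> BDRV_O_NOCACHE -> SD_FLAG_CMD_DIRECT
  -drive file=sheepdog:myvol,cache=directsync   -> BDRV_O_NOCACHE -> SD_FLAG_CMD_DIRECT
  -drive file=sheepdog:myvol,cache=writethrough ->                   SD_FLAG_CMD_CACHE (+ flush)
  -drive file=sheepdog:myvol,cache=writeback    ->                   SD_FLAG_CMD_CACHE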