On 12/30/2016 05:31 PM, Pavel Butsykin wrote: > The prefetch cache aims to improve the performance of sequential read data. > Of most interest here are the requests of a small size of data for sequential > read, such requests can be optimized by extending them and moving into > the prefetch cache. However, there are 2 issues: > - In aggregate only a small portion of requests is sequential, so delays > caused > by the need to read more volumes of data will lead to an overall decrease > in performance. > - The presence of redundant data in the cache memory with a large number of > random requests. > This pcache implementation solves the above and other problems prefetching > data. > The pcache algorithm can be summarised by the following main steps. > > 1. Monitor I/O requests to identify typical sequences. > This implementation of prefetch cache works at the storage system level and > has > information only about the physical block addresses of I/O requests. > Statistics > are collected only from read requests to a maximum size of 64kb(by default), > each request that matches the criteria falls into a pool of requests. In order > to store request statistics used by the rb-tree, it's simple but for > this issue a quite efficient data structure. > > 2. Identifying sequential I/O streams. > For each read request to be carried out attempting to lift the chain sequence > from the tree statistics, where this request will be element of a sequential > chain of requests. The key to search for consecutive requests is the area of > bytes > preceding the current request. The size of this area should not be too small > to > avoid false readahead. The sequential stream data requests can be identified > even when a large number of random requests. For example, if there is access > to > the blocks 100, 1157, 27520, 4, 101, 312, 1337, 102, in the context of request > processing 102 will be identified the chain of sequential requests 100, 101. > 102 > and then should a decision be made to do readahead. Also a situation may arise > when multiple applications A, B, C simultaneously perform sequential read of > data. For each separate application that will be sequential read data > A(100, 101, 102), B(300, 301, 302), C(700, 701, 702), but for block devices > it > may look like a random data reading: 100,300,700,101,301,701,102,302,702. > In this case, the sequential streams will also be recognised because location > requests in the rb-tree will allow to separate the sequential I/O streams. > > 3. Do the readahead into the cache for recognized sequential data streams. > After the issue of the detection of pcache case was resolved, need using > larger > requests to bring data into the cache. In this implementation the pcache used > readahead instead of the extension request, therefore the request goes as is. > There is not any reason to put data in the cache that will never be picked > up, > but this will always happen in the case of extension requests. In order to > store > areas of cached blocks is also used the rb-tree, it's simple but for this > issue > a quite efficient data structure. > > 4. Control size of the prefetch cache pool and the requests statistic pool > For control the border of the pool statistic of requests, the data of > requests > are placed and replaced according to the FIFO principle, everything is simple. > For control the boundaries of the memory cache used LRU list, it allows to > limit > the max amount memory that we can allocate for pcache. But the LRU is there > mainly to prevent displacement of the cache blocks that was read partially. > The main way the memory is pushed out immediately after use, as soon as a > chunk > of memory from the cache has been completely read, since the probability of > repetition of the request is very low. Cases when one and the same portion of > the cache memory has been read several times are not optimized and do not > apply > to the cases that can optimize the pcache. Thus, using a cache memory of small > volume, by the optimization of the operations read-ahead and clear memory, we > can read entire volumes of data, providing a 100% cache hit. Also does not > decrease the effectiveness of random read requests. > > PCache is implemented as a qemu block filter driver, has some configurable > parameters, such as: total cache size, statistics size, readahead size, > maximum size of block that can be processed. > > For performance evaluation has been used several test cases with different > sequential and random read data on SSD disk. Here are the results of tests and > qemu parameters: > > qemu parameters: > -machine pc,accel=kvm,usb=off,vmport=off -m 1024 -smp 8 > -drive > file=/img/harddisk.hdd,if=none,cache=none,id=drive-scsi0-0-0-0,aio=native > -drive driver=pcache,image=drive-scsi0-0-0-0,if=virtio > > ***************************************************************** > * Testcase * Results in iops * > * ******************************* > * * clean qemu * pcache * > ***************************************************************** > * Create/open 16 file(s) of total * 21645 req/s * 74793 req/s * > * size 2048.00 MB named * 20385 req/s * 66481 req/s * > * /tmp/tmp.tmp, start 4 thread(s) * 20616 req/s * 69007 req/s * > * and do uncached sequential read * * * > * by 4KB blocks * * * > ***************************************************************** > * Create/open 16 file(s) of total * 84033 req/s * 87828 req/s * > * size 2048.00 MB named * 84602 req/s * 89678 req/s * > * /tmp/tmp.tmp, start 4 thread(s) * 83163 req/s * 96202 req/s * > * and do uncached sequential read * * * > * by 4KB blocks with constant * * * > * queue len 32 * * * > ***************************************************************** > * Create/open 16 file(s) of total * 14104 req/s * 14164 req/s * > * size 2048.00 MB named * 14130 req/s * 14232 req/s * > * /tmp/tmp.tmp, start 4 thread(s) * 14183 req/s * 14080 req/s * > * and do uncached random read by * * * > * 4KB blocks * * * > ***************************************************************** > * Create/open 16 file(s) of total * 23480 req/s * 23483 req/s * > * size 2048.00 MB named * 23070 req/s * 22432 req/s * > * /tmp/tmp.tmp, start 4 thread(s) * 24090 req/s * 23499 req/s * > * and do uncached random read by * * * > * 4KB blocks with constant queue * * * > * len 32 * * * > ***************************************************************** > > Changes from v1: > - avoid bdrv_aio_*() interfaces > - add pcache to the QAPI schema > - fix remarks and add more comments for rbcache > - add more scenarios for "/rbcache/insert" test > - fix rbcache/shrink/* tests > - pcache: up-to-date cache for removed nodes > - rewrite "block/pcache: pick up parts of the cache" patch > - changed the statuses of nodes for a more flexible determination of > the node state > > Pavel Butsykin (18): > block/pcache: empty pcache driver filter > util/rbtree: add rbtree from linux kernel > util/rbcache: range-based cache core > tests/test-rbcache: add test cases > block/pcache: statistics collection read requests > block/pcache: skip large aio read > block/pcache: updating statistics for overlapping requests > block/pcache: add AIO readahead > block/pcache: skip readahead for unallocated clusters > block/pcache: cache invalidation on write requests > block/pcache: add reading data from the cache > block/pcache: inflight readahead request waiting for read > block/pcache: write through > block/pcache: up-to-date cache for removed nodes > block/pcache: pick up parts of the cache > block/pcache: drop used pcache nodes > qapi: allow blockdev-add for pcache > block/pcache: add tracepoints > > MAINTAINERS | 13 + > block/Makefile.objs | 1 + > block/pcache.c | 764 > ++++++++++++++++++++++++++++++++++++++++ > block/trace-events | 10 + > include/qemu/rbcache.h | 128 +++++++ > include/qemu/rbtree.h | 107 ++++++ > include/qemu/rbtree_augmented.h | 235 ++++++++++++ > qapi/block-core.json | 30 +- > tests/Makefile.include | 3 + > tests/test-rbcache.c | 431 +++++++++++++++++++++++ > util/Makefile.objs | 2 + > util/rbcache.c | 253 +++++++++++++ > util/rbtree.c | 570 ++++++++++++++++++++++++++++++ > 13 files changed, 2545 insertions(+), 2 deletions(-) > create mode 100644 block/pcache.c > create mode 100644 include/qemu/rbcache.h > create mode 100644 include/qemu/rbtree.h > create mode 100644 include/qemu/rbtree_augmented.h > create mode 100644 tests/test-rbcache.c > create mode 100644 util/rbcache.c > create mode 100644 util/rbtree.c > ping?