During the last few releases we have got rid of most of the overhead of metadata writes during cluster allocation. What's left is the COW for unaligned allocating write requests, and it's quite expensive.
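
To make the cost concrete, here is a minimal sketch of which byte ranges have to be copied for an unaligned allocating write. It is for illustration only and is not code from this series: the `CowRegion` type and the `cow_regions()` helper are hypothetical names, and it assumes a power-of-two cluster size and a non-empty request.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration type; not the Qcow2COWRegion from this series. */
typedef struct CowRegion {
    uint64_t offset;    /* start of the region, in bytes from image start */
    uint64_t nb_bytes;  /* length in bytes; 0 means nothing to copy */
} CowRegion;

/*
 * Compute the areas before and after a write request that must be filled
 * by COW so that every newly allocated cluster ends up completely written.
 *
 * Assumptions: cluster_size is a power of two and bytes > 0.
 */
static void cow_regions(uint64_t offset, uint64_t bytes,
                        uint64_t cluster_size,
                        CowRegion *head, CowRegion *tail)
{
    uint64_t mask = cluster_size - 1;
    uint64_t start, end;

    assert(cluster_size != 0 && (cluster_size & mask) == 0);
    assert(bytes > 0);

    /* Round the request out to cluster boundaries. */
    start = offset & ~mask;
    end = (offset + bytes + mask) & ~mask;

    /* Head: from the start of the first cluster up to the request. */
    head->offset = start;
    head->nb_bytes = offset - start;

    /* Tail: from the end of the request up to the end of the last cluster. */
    tail->offset = offset + bytes;
    tail->nb_bytes = end - (offset + bytes);
}
```

With 64k clusters, an 8k write at offset 4k gives a 4k head and a 52k tail, so 56k are copied to service an 8k request. A fully cluster-aligned request gives two empty regions.
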
In the general case, this cost cannot be avoided. However, if we're lucky enough that the data copied during COW is overwritten before the next flush, we can do without the COW. Sequential writes always overwrite the COW area at the end of the cluster immediately, so delaying the COW a bit and cancelling it if it's overwritten is a worthwhile optimisation.

The really interesting part of this series should be close to final; however, you only see the improvements with the last patch applied, which isn't quite correct yet. Doing it right requires some additional refactoring, so I thought I'd get this out for a first round of review before fixing it.

iozone results with and without this series show a significant difference for allocating writes:

                                                       random   random
              KB  reclen    write  rewrite     read   reread     read    write
base       65536       8     1727     1945    12546    12539     2449     1836
patch      65536       8     1934     1949    12263    12521     2463     1796
base     1048576     256    22344    38437   105823   106135    37743    32167
patch    1048576     256    35989    38542   105231   105994    38301    33036

Kevin Wolf (16):
  qcow2: Round QCowL2Meta.offset down to cluster boundary
  qcow2: Introduce Qcow2COWRegion
  qcow2: Allocate l2meta dynamically
  qcow2: Drop l2meta.cluster_offset
  qcow2: Allocate l2meta only for cluster allocations
  qcow2: Enable dirty flag in qcow2_alloc_cluster_link_l2
  qcow2: Factor out handle_dependencies()
  qcow2: Reading from areas not in L2 tables yet
  qcow2: Move COW and L2 update into own coroutine
  qcow2: Delay the COW
  qcow2: Add error handling to the l2meta coroutine
  qcow2: Handle dependencies earlier
  qcow2: Change handle_dependency to byte granularity
  qcow2: Execute run_dependent_requests() without lock
  qcow2: Cancel COW when overwritten
  [BROKEN] qcow2: Overwrite COW and allocate new cluster at the same time

 block.c               |    5 +
 block/qcow2-cluster.c |  432 ++++++++++++++++++++++++++++++++++++++-----------
 block/qcow2.c         |  239 +++++++++++++++++++++++-----
 block/qcow2.h         |  153 +++++++++++++++++-
 block_int.h           |    3 +
 5 files changed, 692 insertions(+), 140 deletions(-)

-- 
1.7.6.5