On 28.11.22 15:15, Nir Soffer wrote:
Add a coroutine-based loop inspired by the `qemu-img convert` design.

Changes compared to `qemu-img convert`:
- State for the entire image is kept in ImgChecksumState.
- State for a single worker coroutine is kept in ImgChecksumworker.
- "Writes" are always done in order, which is ensured using a queue.
- Block status is called once per image extent, when the current extent
  has been consumed by the workers.
- A 1 MiB buffer size is used; testing shows that this gives the best read
  performance with both buffered and direct I/O.
- The number of coroutines is not configurable; testing does not show any
  improvement when using more than 8 coroutines.
- Progress includes the entire image, not only the allocated areas.

Compared to the simple read loop, this version is up to 4.67 times
faster when computing a checksum for an image full of zeroes. For real
images it is 1.59 times faster with direct I/O, while with buffered I/O
there is no significant difference.

Test results on a Dell PowerEdge R640 in a CentOS Stream 9 container:
| image    | size | i/o      | before         | after          | change |
|----------|------|----------|----------------|----------------|--------|
| zero [1] | 6g   | buffered | 1.600s ±0.014s | 0.342s ±0.016s | x4.67  |
| zero     | 6g   | direct   | 4.684s ±0.093s | 2.211s ±0.009s | x2.12  |
| real [2] | 6g   | buffered | 1.841s ±0.075s | 1.806s ±0.036s | x1.02  |
| real     | 6g   | direct   | 3.094s ±0.079s | 1.947s ±0.017s | x1.59  |
| nbd [3]  | 6g   | buffered | 2.455s ±0.183s | 1.808s ±0.016s | x1.36  |
| nbd      | 6g   | direct   | 3.540s ±0.020s | 1.749s ±0.018s | x2.02  |
[1] raw image full of zeroes
[2] raw Fedora 35 image with additional random data, 50% full
[3] image [2] exported by qemu-nbd via a Unix socket
Signed-off-by: Nir Soffer <nsof...@redhat.com>
---
qemu-img.c | 350 ++++++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 277 insertions(+), 73 deletions(-)
Reviewed-by: Hanna Reitz <hre...@redhat.com>