This patch series refactors QEMU's FUSE export module to leverage coroutines
for read/write operations, addressing concurrency limitations and aligning
with QEMU's asynchronous I/O model. The changes demonstrate measurable
performance improvements while simplifying resource management.
1. Technical Implementation
Following Stefan's suggestion, I moved the processing logic of
read_from_fuse_export into a coroutine for buffer management, and changed
fuse_getattr to call bdrv_co_get_allocated_file_size().
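
To make the idea of handing a request to a coroutine concrete, here is a toy,
stand-alone sketch. It is built on POSIX ucontext (assuming Linux/glibc), not
on QEMU's coroutine API (qemu_coroutine_create(), qemu_coroutine_enter(),
qemu_coroutine_yield()), and it is not the code in this patch. Every Demo*
and demo_* name is hypothetical. The handler yields once to mimic waiting for
block-layer I/O, so one request costs two enters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <ucontext.h>

/* Toy coroutine on POSIX ucontext. All Demo* names are hypothetical
 * stand-ins for illustration; this is not QEMU's Coroutine type. */
typedef struct DemoCoroutine DemoCoroutine;
typedef void DemoEntry(DemoCoroutine *co, void *opaque);

struct DemoCoroutine {
    ucontext_t ctx;          /* the coroutine's own execution context */
    ucontext_t caller;       /* where to resume on yield or completion */
    char stack[64 * 1024];   /* private stack for the coroutine */
    DemoEntry *entry;
    void *opaque;
    int finished;
    unsigned enter_count;    /* how many times the coroutine was entered */
};

typedef struct DemoRequest {
    const char *src;         /* data supplied by the (simulated) client */
    char *dst;               /* simulated backing storage */
    size_t len;
    size_t done;             /* bytes written so far */
} DemoRequest;

static DemoCoroutine *demo_current;   /* coroutine currently being entered */

static void demo_trampoline(void)
{
    DemoCoroutine *co = demo_current;
    co->entry(co, co->opaque);
    co->finished = 1;
    /* Returning here resumes co->caller through uc_link. */
}

static void demo_co_init(DemoCoroutine *co, DemoEntry *entry, void *opaque)
{
    int rc = getcontext(&co->ctx);
    assert(rc == 0);
    (void)rc;
    co->ctx.uc_stack.ss_sp = co->stack;
    co->ctx.uc_stack.ss_size = sizeof(co->stack);
    co->ctx.uc_link = &co->caller;
    co->entry = entry;
    co->opaque = opaque;
    co->finished = 0;
    co->enter_count = 0;
    makecontext(&co->ctx, demo_trampoline, 0);
}

/* Switch into the coroutine until it yields or finishes. */
static void demo_co_enter(DemoCoroutine *co)
{
    co->enter_count++;
    demo_current = co;
    swapcontext(&co->caller, &co->ctx);
}

/* Switch back to whoever entered the coroutine. */
static void demo_co_yield(DemoCoroutine *co)
{
    swapcontext(&co->ctx, &co->caller);
}

/* Request handler: copies the payload in two halves and yields in between,
 * mimicking a handler that waits for block-layer I/O to complete. */
static void demo_handle_write(DemoCoroutine *co, void *opaque)
{
    DemoRequest *req = opaque;
    size_t half = req->len / 2;

    memcpy(req->dst, req->src, half);
    req->done = half;
    demo_co_yield(co);
    memcpy(req->dst + half, req->src + half, req->len - half);
    req->done = req->len;
}

/* Runs one request to completion; returns how many enters it needed. */
static unsigned demo_run_request(DemoRequest *req)
{
    static DemoCoroutine co;

    demo_co_init(&co, demo_handle_write, req);
    while (!co.finished) {
        demo_co_enter(&co);
    }
    return co.enter_count;
}
```

Each additional yield inside a handler adds one more enter per request, which
is the kind of cost examined in section 3.3.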
2. Performance Summary
For the coroutine_integration_fuse test, the average results for iodepth=1
and iodepth=64 are as follows:

Average results for iodepth=1:
-----------------------------------------------------------------------
Metric      coroutine_integration_fuse  origin         improvement
Read_IOPS   4492.88                     4309.39        4.25%
Write_IOPS  4500.68                     4318.68        4.21%
Read_BW     17971.00 KB/s               17237.30 KB/s  4.26%
Write_BW    18002.50 KB/s               17274.30 KB/s  4.23%
-----------------------------------------------------------------------

Average results for iodepth=64:
-----------------------------------------------------------------------
Metric      coroutine_integration_fuse  origin         improvement
Read_IOPS   5576.93                     5347.13        4.29%
Write_IOPS  5569.55                     5337.33        4.33%
Read_BW     22311.40 KB/s               21392.20 KB/s  4.31%
Write_BW    22282.20 KB/s               21353.20 KB/s  4.34%
-----------------------------------------------------------------------
Although all metrics show improvements, the gains are concentrated in the
4.2%–4.3% range, which is lower than expected. Further investigation using
gprof reveals the reasons for this limited improvement.
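
For anyone who wants to re-derive the improvement column, a minimal sketch of
the calculation follows. The helper name improvement_pct is hypothetical and
not part of the patch. Recomputing from the rounded averages lands within a
few hundredths of a percentage point of the reported values, so small
differences from the figures above are to be expected.

```c
#include <assert.h>

/* Relative improvement of `patched` over `baseline`, in percent.
 * Hypothetical helper for checking the summary figures. */
static double improvement_pct(double patched, double baseline)
{
    assert(baseline > 0.0);
    return (patched - baseline) / baseline * 100.0;
}
```

For example, improvement_pct(4500.68, 4318.68) evaluates to about 4.21,
matching the Write_IOPS figure for iodepth=1.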
3. Performance Bottlenecks Identified via gprof
After running a fio test with the following command:
    fio --ioengine=io_uring --numjobs=1 --runtime=30 --ramp_time=5 \
        --rw=randrw --bs=4k --time_based=1 --name=job1 \
        --filename=/mnt/qemu-fuse --iodepth=64
and analyzing the execution profile using gprof, the following issues were
identified:
3.1 Increased Overall Execution Time
In the original implementation, fuse_write + blk_pwrite accounted for 8.7%
of total execution time (6.0% + 2.7%).
After refactoring, fuse_write_coroutine + blk_co_pwrite now accounts for
43.1% (22.9% + 20.2%).
This suggests that coroutine overhead is contributing significantly to
execution time.
3.2 Increased Read and Write Calls
fuse_write calls increased from 173,400 → 333,232.
fuse_read calls increased from 173,526 → 332,931.
This indicates that the coroutine-based approach is introducing redundant
I/O calls, likely due to unnecessary coroutine switches.
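
To put a number on the growth, the sketch below computes the ratio of the two
call counts. The helper name call_growth is hypothetical, and the comparison
is only meaningful under the assumption that both profiles were collected
with the same fio job and runtime.

```c
#include <assert.h>

/* Growth factor of a call count between two profiles (after / before).
 * Hypothetical helper; the counts are taken from the gprof output. */
static double call_growth(unsigned long before, unsigned long after)
{
    assert(before > 0);
    return (double)after / (double)before;
}
```

With the counts quoted above, call_growth(173400, 333232) and
call_growth(173526, 332931) both come out at roughly 1.92, far more than the
roughly 4.3% throughput gain would account for.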
3.3 Significant Coroutine Overhead
qemu_coroutine_enter is now called 1,572,803 times, compared to ~476,057
previously.
This frequent coroutine switching introduces unnecessary overhead, limiting
the expected performance improvements.
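
One rough way to read these counts is as coroutine enters per FUSE read/write
handler call. The sketch below does that arithmetic; the helper name
enters_per_request is hypothetical, the "before" enter count is approximate,
and qemu_coroutine_enter is also called from other QEMU code paths, so the
result is an upper bound on what the FUSE handlers themselves cause.

```c
#include <assert.h>

/* Average number of coroutine enters per read/write handler call.
 * Hypothetical helper; inputs are call counts from the gprof output. */
static double enters_per_request(unsigned long enters,
                                 unsigned long reads,
                                 unsigned long writes)
{
    unsigned long requests = reads + writes;

    assert(requests > 0);
    return (double)enters / (double)requests;
}
```

Using the figures above, enters_per_request(476057, 173526, 173400) is about
1.37 before the refactoring and enters_per_request(1572803, 332931, 333232)
is about 2.36 after it.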
saz97 (1):
  Integration coroutines into fuse export

 block/export/fuse.c | 190 +++++++++++++++++++++++++++++---------------
 1 file changed, 126 insertions(+), 64 deletions(-)

--
2.34.1