Hello everyone,

This patch series adds native io_uring support to the FUSE storage
export to overcome the scalability limits of the /dev/fuse interface.
By using shared-memory ring buffers and per-core queues, it reduces
context-switch overhead and lock contention, which lets FUSE export
daemons achieve higher throughput and lower latency (see the
performance results below).

More details on FUSE-over-io_uring:
https://docs.kernel.org/filesystems/fuse/fuse-io-uring.html


Changes in this version:

- Reorganized patch structure.
- Unified naming of Uring data structures (e.g. FuseRing -> FuseUring)
- Refactored FUSE_IN/OUT_OP_STRUCT_LEGACY
- Code cleanup and logic simplification:
        - Used the io_uring flag to indicate the intention to enable
          FUSE-over-io_uring.
        - Used uring_started to track the active state.
        - Removed unnecessary #ifdef CONFIG_LINUX_IO_URING guards.
- Moved fuse_fd closing to BH in uring mode to prevent data races.
- Updated tests: they now use mount to verify that the test image
  mount is fully gone.

More detail in the v3 cover letter:
https://lists.nongnu.org/archive/html/qemu-block/2025-08/msg00325.html

V2 cover letter:
https://lists.nongnu.org/archive/html/qemu-block/2025-08/msg00140.html

V1 cover letter:
https://lists.nongnu.org/archive/html/qemu-block/2025-07/msg00280.html

We used fio to test a 1GB file under both legacy FUSE and
FUSE-over-io_uring modes. Each mode was run with four
iodepth-numjobs configurations (1-1, 64-1, 1-4, and 64-4) and a
70% read / 30% write mix, giving 8 test cases in total. Both
latency and throughput were measured.
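
For anyone who wants to reproduce a similar matrix, the four fio
invocations can be generated roughly as sketched below. Only the file
size, the 70/30 mix, and the iodepth/numjobs values come from this
letter. The access pattern (randrw), the I/O engine (libaio), direct
I/O, and the file path are assumptions made for illustration and may
not match the options actually used for the numbers below.

```python
from itertools import product

# Values stated in the cover letter.
IODEPTHS = (1, 64)
NUMJOBS = (1, 4)
FILE_SIZE = "1G"
READ_PERCENT = 70  # 70% read / 30% write


def fio_command(test_file, iodepth, numjobs):
    """Build one fio argv list. test_file is a hypothetical path on the FUSE mount."""
    return [
        "fio",
        f"--name=qd{iodepth}-jobs{numjobs}",
        f"--filename={test_file}",
        f"--size={FILE_SIZE}",
        f"--rwmixread={READ_PERCENT}",
        f"--iodepth={iodepth}",
        f"--numjobs={numjobs}",
        # Assumptions, not stated in the cover letter:
        "--rw=randrw",
        "--ioengine=libaio",
        "--direct=1",
        "--group_reporting",
    ]


def all_commands(test_file):
    """Return the four commands in the order 1-1, 64-1, 1-4, 64-4 (iodepth-numjobs)."""
    return [
        fio_command(test_file, iodepth, numjobs)
        for numjobs, iodepth in product(NUMJOBS, IODEPTHS)
    ]


if __name__ == "__main__":
    # Hypothetical path; run the same set once per mode (legacy, io_uring).
    for cmd in all_commands("/mnt/fuse-export/test.img"):
        print(" ".join(cmd))
```

Running the same four commands once against a legacy export and once
against an io_uring export gives the 8 test cases.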

Performance Results:

[Bandwidth (MiB/s)]
| Config (Job/QD)  | Read (Leg -> Uring) | Write (Leg -> Uring)|
|------------------|---------------------|---------------------|
| 1 Job, QD=1      | 72.2 -> 104         | 30.9 -> 44.7        |
| 1 Job, QD=64     | 114  -> 181         | 48.8 -> 77.7        |
| 4 Jobs, QD=1     | 109  -> 159         | 47.0 -> 68.5        |
| 4 Jobs, QD=64    | 106  -> 160         | 45.7 -> 68.9        |

[Latency (usec)]
| Config (Job/QD)  | Read (Leg -> Uring) | Write (Leg -> Uring)|
|------------------|---------------------|---------------------|
| 1 Job, QD=1      | 37.0 -> 23.7        | 36.9 -> 29.5        |
| 1 Job, QD=64     | 1537 -> 964         | 1535 -> 967         |
| 4 Jobs, QD=1     | 96.6 -> 66.4        | 114.2 -> 71.9       |
| 4 Jobs, QD=64    | 6560 -> 4234        | 6600 -> 4280        |
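
To make the tables easier to compare, the relative change per
configuration can be computed with a short script such as the sketch
below. The numbers are copied from the two tables above; the helper
names are illustrative only. With the figures as posted, it reports
bandwidth gains of roughly 44% to 59% and latency reductions of
roughly 20% to 37%.

```python
# (legacy, uring) pairs copied from the tables in this letter.
BANDWIDTH_MIB_S = {
    "1 Job, QD=1": {"read": (72.2, 104), "write": (30.9, 44.7)},
    "1 Job, QD=64": {"read": (114, 181), "write": (48.8, 77.7)},
    "4 Jobs, QD=1": {"read": (109, 159), "write": (47.0, 68.5)},
    "4 Jobs, QD=64": {"read": (106, 160), "write": (45.7, 68.9)},
}

LATENCY_USEC = {
    "1 Job, QD=1": {"read": (37.0, 23.7), "write": (36.9, 29.5)},
    "1 Job, QD=64": {"read": (1537, 964), "write": (1535, 967)},
    "4 Jobs, QD=1": {"read": (96.6, 66.4), "write": (114.2, 71.9)},
    "4 Jobs, QD=64": {"read": (6560, 4234), "write": (6600, 4280)},
}


def percent_change(legacy, uring):
    """Relative change from legacy to uring, in percent."""
    return (uring - legacy) / legacy * 100.0


def summarize(table):
    """Map each config to its read/write percent change, rounded to 0.1."""
    return {
        config: {op: round(percent_change(*pair), 1) for op, pair in ops.items()}
        for config, ops in table.items()
    }


if __name__ == "__main__":
    for title, table in (
        ("Bandwidth change", BANDWIDTH_MIB_S),
        ("Latency change", LATENCY_USEC),
    ):
        print(title)
        for config, changes in summarize(table).items():
            print(
                f"  {config:14s} read {changes['read']:+.1f}%"
                f"  write {changes['write']:+.1f}%"
            )
```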

Brian Song (7):
  [Patch v4 1/7] aio-posix: enable 128-byte SQEs
  [Patch v4 2/7] fuse: io_uring mode init
  [Patch v4 3/7] fuse: uring support for write ops
  [Patch v4 4/7] fuse: refactor FUSE request handler
  [Patch v4 5/7] fuse: safe termination for io_uring
  [Patch v4 6/7] fuse: add 'io-uring' option
  [Patch v4 7/7] fuse: add io_uring test support

 block/export/fuse.c                  | 958 +++++++++++++++++++++++----
 docs/tools/qemu-storage-daemon.rst   |   7 +-
 qapi/block-export.json               |   5 +-
 storage-daemon/qemu-storage-daemon.c |   1 +
 tests/qemu-iotests/check             |   2 +
 tests/qemu-iotests/common.rc         |  47 +-
 util/fdmon-io_uring.c                |   7 +-
 7 files changed, 879 insertions(+), 148 deletions(-)

--
2.43.0

