On Wed, Jun 11, 2025 at 9:21 PM Darrick J. Wong <[email protected]> wrote:
>
> On Wed, Jun 04, 2025 at 02:52:34PM +0530, Kundan Kumar wrote:
> > > > > For xfs used this command:
> > > > > xfs_io -c "stat" /mnt/testfile
> > > > > And for ext4 used this:
> > > > > filefrag /mnt/testfile
> > > >
> > > > filefrag merges contiguous extents, and only counts up for discontiguous
> > > > mappings, while fsxattr.nextents counts all extents even if they are
> > > > contiguous. So you probably want to use filefrag for both cases.
> > >
> > > Got it — thanks for the clarification. We'll switch to using filefrag
> > > and will share updated extent count numbers accordingly.
> >
> > Using filefrag, we recorded extent counts on xfs and ext4 at three
> > stages:
> > a. Just after a 1G random write,
> > b. After a 30-second wait,
> > c. After unmounting and remounting the filesystem,
> >
> > xfs
> > Base
> > a. 6251 b. 2526 c. 2526
> > Parallel writeback
> > a. 6183 b. 2326 c. 2326
>
> Interesting that the mapping record count goes down...
>
> I wonder, you said the xfs filesystem has 4 AGs and 12 cores, so I guess
> wb_ctx_arr[] is 12? I wonder, do you see a knee point in writeback
> throughput when the # of wb contexts exceeds the AG count?
>
> Though I guess for the (hopefully common) case of pure overwrites, we
> don't have to do any metadata updates so we wouldn't really hit a
> scaling limit due to ag count or log contention or whatever. Does that
> square with what you see?
>
Hi Darrick,
We analyzed AG count vs. the number of writeback contexts to identify any
knee point. Earlier, wb_ctx_arr[] was fixed at 12; this time we varied
nr_wb_ctx and measured the impact.
To make the throughput measurements easier, we made the number of writeback
contexts configurable; this knob will be exposed in the next series. It is
set with: echo <nr_wb_ctx> > /sys/class/bdi/259:2/nwritebacks.
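For anyone reproducing this, the BDI device number can be derived from the
block device instead of hard-coding 259:2; a minimal sketch, assuming
/dev/nvme0n1 and the nwritebacks knob added by this series:

  # derive MAJOR:MINOR of the test device and set the writeback context count
  dev=/dev/nvme0n1
  bdi=$(lsblk -dno MAJ:MIN "$dev" | tr -d ' ')
  echo 6 > /sys/class/bdi/$bdi/nwritebacks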
In our test, writing 1G across 12 directories, bandwidth improved as
nr_wb_ctx increased up to roughly the number of allocation groups (AGs),
which appears to be the knee point; gains tapered off beyond that. We also
see a large bandwidth increase, about 16x, from the base to nr_wb_ctx = 6.
Base (single threaded) : 9799KiB/s
Parallel Writeback (nr_wb_ctx = 1) : 9727KiB/s
Parallel Writeback (nr_wb_ctx = 2) : 18.1MiB/s
Parallel Writeback (nr_wb_ctx = 3) : 46.4MiB/s
Parallel Writeback (nr_wb_ctx = 4) : 135MiB/s
Parallel Writeback (nr_wb_ctx = 5) : 160MiB/s
Parallel Writeback (nr_wb_ctx = 6) : 163MiB/s
Parallel Writeback (nr_wb_ctx = 7) : 162MiB/s
Parallel Writeback (nr_wb_ctx = 8) : 154MiB/s
Parallel Writeback (nr_wb_ctx = 9) : 152MiB/s
Parallel Writeback (nr_wb_ctx = 10) : 145MiB/s
Parallel Writeback (nr_wb_ctx = 11) : 145MiB/s
Parallel Writeback (nr_wb_ctx = 12) : 138MiB/s
System config
===========
Number of CPUs = 12
System RAM = 9G
Number of XFS AGs = 4 (see the check below)
Device = 3.84 TB NVMe SSD (Enterprise SSD PM1733a)
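The AG count can be confirmed against the mounted filesystem, e.g.:
  xfs_info /mnt | grep -o 'agcount=[0-9]*'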
Script
=====
mkfs.xfs -f /dev/nvme0n1
mount /dev/nvme0n1 /mnt
echo <nr_wb_ctx> > /sys/class/bdi/259:2/nwritebacks
sync
echo 3 > /proc/sys/vm/drop_caches
for i in {1..12}; do
mkdir -p /mnt/dir$i
done
fio job_nvme.fio
umount /mnt
echo 3 > /proc/sys/vm/drop_caches
sync
fio job
=====
[global]
bs=4k
iodepth=1
rw=randwrite
ioengine=io_uring
nrfiles=12
numjobs=1 # Each job writes to a different file
size=1g
direct=0 # Buffered I/O to trigger writeback
group_reporting=1
create_on_open=1
name=test
[job1]
directory=/mnt/dir1
[job2]
directory=/mnt/dir2
...
...
[job12]
directory=/mnt/dir12
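The per-directory job sections elided above all follow the same pattern; for
convenience they can be generated with a small loop and appended to the job
file, e.g.:

  for i in {1..12}; do
          printf '[job%d]\ndirectory=/mnt/dir%d\n' "$i" "$i"
  done >> job_nvme.fio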
> > ext4
> > Base
> > a. 7080 b. 7080 c. 11
> > Parallel writeback
> > a. 5961 b. 5961 c. 11
>
> Hum, that's particularly ... interesting. I wonder what the mapping
> count behaviors are when you turn off delayed allocation?
>
> --D
>
I attempted to disable delayed allocation by setting allocsize=4096
during mount (mount -o allocsize=4096 /dev/pmem0 /mnt), but still
observed a reduction in file fragments after a delay. Is there something
I'm overlooking?
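For clarity, the check was essentially the following sequence (testfile path
and the 30-second wait as in the earlier extent-count test):

  mount -o allocsize=4096 /dev/pmem0 /mnt
  # ... 1G random buffered write to /mnt/testfile, as in the earlier test ...
  filefrag /mnt/testfile        # extent count right after the write
  sleep 30
  filefrag /mnt/testfile        # extent count after the delay -- still lower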
-Kundan