Hi Wang Yugui,

On Sun, May 28, 2023 at 5:53 PM Wang Yugui <[email protected]> wrote:
> Hi,
>
> > Hi,
> >
> > gfs2 write bandwidth regression on 6.4-rc3 compare to 5.15.y.
> >
> > we added linux-xfs@ and linux-fsdevel@ because some related problem[1]
> > and related patches[2].
> >
> > we compared 6.4-rc3(rather than 6.1.y) to 5.15.y because some related
> > patches[2]
> > work only for 6.4 now.
> >
> > [1]
> > https://lore.kernel.org/linux-xfs/[email protected]/
> > [2]
> > https://lore.kernel.org/linux-xfs/[email protected]/
> >
> >
> > test case:
> > 1) PCIe3 SSD *4 with LVM
> > 2) gfs2 lock_nolock
> > gfs2 attr(T) GFS2_AF_ORLOV
> > # chattr +T /mnt/test
> > 3) fio
> > fio --name=global --rw=write -bs=1024Ki -size=32Gi -runtime=30 -iodepth 1
> > -ioengine sync -zero_buffers=1 -direct=0 -end_fsync=1 -numjobs=1 \
> > -name write-bandwidth-1 -filename=/mnt/test/sub1/1.txt \
> > -name write-bandwidth-2 -filename=/mnt/test/sub2/1.txt \
> > -name write-bandwidth-3 -filename=/mnt/test/sub3/1.txt \
> > -name write-bandwidth-4 -filename=/mnt/test/sub4/1.txt
> > 4) patches[2] are applied to 6.4-rc3.
> >
> >
> > 5.15.y result
> > fio WRITE: bw=5139MiB/s (5389MB/s),
> > 6.4-rc3 result
> > fio WRITE: bw=2599MiB/s (2725MB/s)
>
> more test result:
>
> 5.17.0 WRITE: bw=4988MiB/s (5231MB/s)
> 5.18.0 WRITE: bw=5165MiB/s (5416MB/s)
> 5.19.0 WRITE: bw=5511MiB/s (5779MB/s)
> 6.0.5 WRITE: bw=3055MiB/s (3203MB/s), WRITE: bw=3225MiB/s (3382MB/s)
> 6.1.30 WRITE: bw=2579MiB/s (2705MB/s)
>
> so this regression happen in some code introduced in 6.0,
> and maybe some minor regression in 6.1 too?

Thanks for this bug report. Bob has noticed a similar-looking
performance regression recently, and it turned out that commit
e1fa9ea85ce8 ("gfs2: Stop using glock holder auto-demotion for now")
inadvertently caused buffered writes to fall back to writing single
pages instead of multiple pages at once. That patch was added in
v5.18, so it doesn't perfectly align with the regression history
you're reporting, but maybe there's something else going on that we're
not aware of.
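
To make the mismatch concrete, here is a rough Python sketch that takes
the bandwidth figures quoted above and prints the change from each
tested kernel to the next. The helper names are made up for
illustration, and averaging the two 6.0.5 runs is an assumption of this
sketch, not something the report did.

```python
# Bandwidths in MiB/s as quoted in the report above. 6.0.5 had two runs,
# which this sketch averages (an assumption made here for illustration).
REPORTED_MIB_S = [
    ("5.15.y", [5139]),
    ("5.17.0", [4988]),
    ("5.18.0", [5165]),
    ("5.19.0", [5511]),
    ("6.0.5", [3055, 3225]),
    ("6.1.30", [2579]),
    ("6.4-rc3", [2599]),
]


def step_changes(results):
    """Return (version, mean MiB/s, percent change vs. the previous entry)."""
    rows = []
    prev = None
    for version, runs in results:
        mean = sum(runs) / len(runs)
        change = None if prev is None else (mean - prev) / prev * 100.0
        rows.append((version, mean, change))
        prev = mean
    return rows


def largest_drop(results):
    """Return the version whose bandwidth fell most relative to its predecessor."""
    rows = [r for r in step_changes(results) if r[2] is not None]
    return min(rows, key=lambda r: r[2])[0]


if __name__ == "__main__":
    for version, mean, change in step_changes(REPORTED_MIB_S):
        delta = "" if change is None else f"{change:+.1f}%"
        print(f"{version:8s} {mean:7.0f} MiB/s {delta}")
    print("largest step down:", largest_drop(REPORTED_MIB_S))
```

On these numbers, 5.18.0 is slightly faster than 5.17.0, and the largest
step down (roughly 43%) is between 5.19.0 and 6.0.5.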

In any case, the regression introduced by commit e1fa9ea85ce8 should
be fixed by commit c8ed1b359312 ("gfs2: Fix duplicate
should_fault_in_pages() call"), which ended up in v6.5-rc1.

Could you please check where we end up with that fix?

Thank you very much,
Andreas