Hi, Filipe Manana

Here are some dbench (sync mode) results on the same hardware,
but with different Linux kernels.
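
Each run used the same command as in the quoted mail below (assuming the
invocation was kept identical across kernels):

  dbench -s -t 60 -D /btrfs/ 32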

4.14.200
Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 WriteX        225281     5.163    82.143
 Flush          32161     2.250    62.669
Throughput 236.719 MB/sec (sync open)  32 clients  32 procs  max_latency=82.149 ms

4.19.21
Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 WriteX        118842    10.946   116.345
 Flush          16506     0.115    44.575
Throughput 125.973 MB/sec (sync open)  32 clients  32 procs  max_latency=116.390 ms

4.19.150
 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 WriteX        144509     9.151   117.353
 Flush          20563     0.128    52.014
Throughput 153.707 MB/sec (sync open)  32 clients  32 procs  max_latency=117.379 ms

5.4.91
 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 WriteX        367033     4.377  1908.724
 Flush          52037     0.159    39.871
Throughput 384.554 MB/sec (sync open)  32 clients  32 procs  max_latency=1908.968 ms

5.10.12+patches
Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 WriteX        429696     3.960  2239.973
 Flush          60771     0.621     6.794
Throughput 452.385 MB/sec (sync open)  32 clients  32 procs  max_latency=1963.312 ms


The MaxLat / AvgLat ratio of WriteX increased from 82.143/5.163 = 15.9
(4.14.200) to 2239.973/3.960 = 565.6 (5.10.12+patches).
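
For reference, this ratio can be computed directly from a saved copy of the
dbench output with something like the following (dbench.log is just a
placeholder name for that file):

  awk '/WriteX/ { printf "WriteX MaxLat/AvgLat = %.1f\n", $4/$3 }' dbench.log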

For QoS, can we have an option to tune the MaxLat / AvgLat ratio of
WriteX down to less than 100?

Best Regards
Wang Yugui (wangyu...@e16-tech.com)
2021/02/02

> Hi, Filipe Manana
> 
> > On Tue, Feb 2, 2021 at 5:42 AM Wang Yugui <wangyu...@e16-tech.com> wrote:
> > >
> > > Hi, Filipe Manana
> > >
> > > The dbench result with these patches is very good. Thanks a lot.
> > >
> > > This is the dbench (synchronous mode) result, and then a question.
> > >
> > > command: dbench -s -t 60 -D /btrfs/ 32
> > > mount option: ssd,space_cache=v2
> > > kernel:5.10.12 + patchset 1 + this patchset
> > 
> > Patchset 1 and "this patchset" are the same; did you mean two
> > different patchsets or just a single patchset?
> 
> patchset1:
> btrfs: some performance improvements for dbench alike workloads
> 
> patchset2:
> btrfs: more performance improvements for dbench workloads
> https://patchwork.kernel.org/project/linux-btrfs/list/?series=422801
> 
> I'm sorry that I replied to the wrong patchset.
> 
> > 
> > >
> > > Question:
> > > for synchronous mode, the result type 1 is perfect?
> > 
> > What do you mean by perfect? You mean if result 1 is better than result 2?
> 
> In result 1, the MaxLat of Flush in dbench synchronous mode is as fast as
> expected, at the same level as kernel 5.4.91.
> 
> But in result 2, the MaxLat of Flush is as big as the WriteX level, even
> though this is synchronous mode and most of the work should already be
> done before the flush.
> 
> > > and there is still some minor place about the flush to do for
> > > the result type2?
> > 
> > By "minor place" you mean the huge difference I suppose.
> > 
> > >
> > >
> > > result type 1:
> > >
> > >  Operation      Count    AvgLat    MaxLat
> > >  ----------------------------------------
> > >  NTCreateX     868942     0.028     3.017
> > >  Close         638536     0.003     0.061
> > >  Rename         36851     0.663     4.000
> > >  Unlink        175182     0.399     5.358
> > >  Qpathinfo     789014     0.014     1.846
> > >  Qfileinfo     137684     0.002     0.047
> > >  Qfsinfo       144241     0.004     0.059
> > >  Sfileinfo      70913     0.008     0.046
> > >  Find          304554     0.057     1.889
> > > ** WriteX        429696     3.960  2239.973
> > >  ReadX        1363356     0.005     0.358
> > >  LockX           2836     0.004     0.038
> > >  UnlockX         2836     0.002     0.018
> > > ** Flush          60771     0.621     6.794
> > >
> > > Throughput 452.385 MB/sec (sync open)  32 clients  32 procs  max_latency=1963.312 ms
> > > + stat -f -c %T /btrfs/
> > > btrfs
> > > + uname -r
> > > 5.10.12-4.el7.x86_64
> > >
> > >
> > > result type 2:
> > >  Operation      Count    AvgLat    MaxLat
> > >  ----------------------------------------
> > >  NTCreateX     888943     0.028     2.679
> > >  Close         652765     0.002     0.058
> > >  Rename         37705     0.572     3.962
> > >  Unlink        179713     0.383     3.983
> > >  Qpathinfo     806705     0.014     2.294
> > >  Qfileinfo     140752     0.002     0.125
> > >  Qfsinfo       147909     0.004     0.049
> > >  Sfileinfo      72374     0.008     0.104
> > >  Find          311839     0.058     2.305
> > > ** WriteX        439656     3.854  1872.109
> > >  ReadX        1396868     0.005     0.324
> > >  LockX           2910     0.004     0.026
> > >  UnlockX         2910     0.002     0.025
> > > ** Flush          62260     0.750  1659.364
> > >
> > > Throughput 461.856 MB/sec (sync open)  32 clients  32 procs  max_latency=1872.118 ms
> > > + stat -f -c %T /btrfs/
> > > btrfs
> > > + uname -r
> > > 5.10.12-4.el7.x86_64
> > 
> > I'm not sure what your question is exactly.
> > 
> > Are both results after applying the same patchset, or are they before
> > and after applying the patchset, respectively?
> 
> Both results are after applying the same patchset,
> and both are on the same server with the same SAS SSD.
> But the results are not stable, and the major difference is the MaxLat of Flush.
> 
> Server:Dell T7610
> CPU: E5-2660 v2(10core 20threads) x2
> SSD:TOSHIBA  PX05SMQ040
> Memory:192G (with ECC)
> 
> 
> > If they are both with the patchset applied, and you wonder about the
> > big variation in the "Flush" operations, I am not sure why it is
> > so.
> > Both throughput and max latency are better in result 2.
> > 
> > It's normal to have variations across dbench runs, I get them too, and
> > I do several runs (5 or 6) to check things out.
> > 
> > I don't use virtualization (testing on bare metal), I set the cpu
> > governor mode to performance (instead of the "powersave" default) and
> > use a non-debug kernel configuration, because otherwise I get
> > significant variations in latencies and throughput too (though I never
> > got a huge difference such as from 6.794 to 1659.364).
> 
> This is bare metal (a Dell T7610).
> The CPU mode is set to performance in the BIOS, and I checked it with
> 'cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor'.
> 
> Maybe it matters that I used a SAS SSD; its queue depth is 254,
> smaller than the 1023 of an NVMe SSD, but that should still be enough
> for dbench with 32 threads?
> 
> 
> The huge difference in the MaxLat of Flush, such as from 6.794 to 1659.364,
> is a problem.
> It is not easy to reproduce both: sometimes the small one is easy to
> reproduce, sometimes the big one.
> 
> 
> Best Regards
> Wang Yugui (wangyu...@e16-tech.com)
> 2021/02/02
> 

