On 7/15/2020 7:04 PM, Ritesh Harjani wrote:
Hello Xing,

On 4/7/20 1:30 PM, kernel test robot wrote:
Greetings,

FYI, we noticed a -60.5% regression of stress-ng.fiemap.ops_per_sec due to commit:


commit: d3b6f23f71670007817a5d59f3fbafab2b794e8c ("ext4: move ext4_fiemap to use iomap framework")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: stress-ng
on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory
with following parameters:

    nr_threads: 10%
    disk: 1HDD
    testtime: 1s
    class: os
    cpufreq_governor: performance
    ucode: 0x500002c
    fs: ext4

I started looking into this issue, but in my unit testing I didn't
find any perf issue with the fiemap ioctl call. I haven't yet explored
how stress-ng takes its fiemap performance numbers; it could be doing
something differently. In my testing I just made sure to create a
file with a large number of extents and used the xfs_io -c "fiemap -v"
cmd to check how much time it takes to read all the entries in the 1st
and subsequent iterations.
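
(For reference, the xfs_io fiemap command boils down to the FS_IOC_FIEMAP
ioctl. Below is a minimal standalone sketch of that call, not part of my
actual test setup; the file name "bigfile" and the batch size of 256 are
just placeholders. It fetches only the first batch of extents; a full
listing would loop, advancing fm_start past the last returned extent.)

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>		/* FS_IOC_FIEMAP */
#include <linux/fiemap.h>	/* struct fiemap, FIEMAP_* flags */

int main(void)
{
	unsigned int max = 256;		/* extents fetched per ioctl call */
	struct fiemap *fm = calloc(1, sizeof(*fm) +
				   max * sizeof(struct fiemap_extent));
	int fd = open("bigfile", O_RDONLY);

	if (fd < 0 || !fm)
		return 1;

	fm->fm_start = 0;
	fm->fm_length = FIEMAP_MAX_OFFSET;	/* map the whole file */
	fm->fm_flags = FIEMAP_FLAG_SYNC;	/* flush dirty data first */
	fm->fm_extent_count = max;

	if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
		perror("FS_IOC_FIEMAP");
		return 1;
	}
	for (unsigned int i = 0; i < fm->fm_mapped_extents; i++)
		printf("%u: logical %llu physical %llu len %llu flags 0x%x\n",
		       i,
		       (unsigned long long)fm->fm_extents[i].fe_logical,
		       (unsigned long long)fm->fm_extents[i].fe_physical,
		       (unsigned long long)fm->fm_extents[i].fe_length,
		       fm->fm_extents[i].fe_flags);
	free(fm);
	close(fd);
	return 0;
}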


The setup comprised a qemu machine on x86_64 running the latest linux branch.

1. Created a 10G file using fallocate (this allocates unwritten
extents for the file); see the command below.
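
(The command was along these lines; "bigfile" is the name used in step 3:)
fallocate -l 10G bigfile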

2. Then I punched a hole in every alternate block of the file. This step
took a long time, and after sufficiently long I had to cancel it.
for i in $(seq 1 2 xxxxx); do echo $i; fallocate -p -o $(($i*4096)) -l 4096 bigfile; done

3. Then issued the fiemap call via xfs_io and took the time measurement:
time xfs_io -c "fiemap -v" bigfile > /dev/null


Perf numbers on the latest default kernel build for the above xfs_io cmd:

1st iteration
==============
real    0m31.684s
user    0m1.593s
sys     0m24.174s

2nd and subsequent iteration
============================
real    0m3.379s
user    0m1.300s
sys     0m2.080s


4. Then I reverted all the iomap_fiemap patches and re-tested.
With this, the older ext4_fiemap implementation is exercised:


1st iteration
==============
real    0m31.591s
user    0m1.400s
sys     0m24.243s


2nd and subsequent iteration (had to cancel it since it was taking more than 15m)
============================
^C^C
real    15m49.884s
user    0m0.032s
sys     15m49.722s

I guess the reason the 2nd iteration takes so much time with the older
implementation is that it never cached extent entries in the
extent_status tree. Also, in the 1st iteration the page cache may get
filled with a lot of buffer_head entries, so page reclaims may be
taking more time.

With the latest implementation using iomap_fiemap(), the extent blocks
are queried via ext4_map_blocks(), which by default also caches the
extent entries in the extent_status tree. Hence during the 2nd
iteration we read the entries directly from the extent_status tree and
do no disk I/O.
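
One way to sanity-check this caching hypothesis (I haven't tried it here)
would be to drop the page cache and slab caches between iterations; the
extent_status entries are reclaimable via a shrinker, so if the caching
explains the difference, the timing should move back toward the
1st-iteration numbers:

sync
echo 3 > /proc/sys/vm/drop_caches    # as root; drops page cache + slab
time xfs_io -c "fiemap -v" bigfile > /dev/null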

-ritesh

Could you try the stress-ng (https://kernel.ubuntu.com/~cking/stress-ng/) test cases? The tarballs can be downloaded from https://kernel.ubuntu.com/~cking/tarballs/stress-ng/. For this case you can try the command:
stress-ng --timeout 1 --times --verify --metrics-brief --sequential 9 --class os --minimize --exclude spawn,exec,swap
I re-tested it on v5.8-rc6, and the regression still exists.
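
FWIW, to isolate just the regressing stressor (assuming a stress-ng build
that includes the fiemap stressor; --fiemap N starts N fiemap workers),
something like this should also work:

stress-ng --fiemap 8 --timeout 60 --metrics-brief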

=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/debug-setup/nr_threads/disk/testtime/fs/class/cpufreq_governor/ucode:

lkp-csl-2sp5/stress-ng/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/test/10%/1HDD/1s/ext4/os/performance/0x5002f01

commit:
  b2c5764262edded1b1cfff5a6ca82c3d61bb4a4a
  d3b6f23f71670007817a5d59f3fbafab2b794e8c
  v5.8-rc6

b2c5764262edded1 d3b6f23f71670007817a5d59f3f                    v5.8-rc6
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
     20419 ±  3%      -4.9%      19423 ±  4%     +27.1%      25959            stress-ng.af-alg.ops
     19655 ±  3%      -5.7%      18537 ±  4%     +27.8%      25111            stress-ng.af-alg.ops_per_sec
     64.67 ±  5%     -17.0%      53.67 ± 38%     +22.2%      79.00 ±  9%      stress-ng.chdir.ops
     55.34 ±  3%     -13.3%      47.99 ± 38%     +26.4%      69.96 ± 10%      stress-ng.chdir.ops_per_sec
     64652 ±  7%     -14.1%      55545 ± 11%     -13.6%      55842 ±  6%      stress-ng.chown.ops
     64683 ±  7%     -14.1%      55565 ± 11%     -13.6%      55858 ±  6%      stress-ng.chown.ops_per_sec
      2805 ±  2%      +0.6%       2820 ±  2%    +130.0%       6452            stress-ng.clone.ops
      2802 ±  2%      +0.6%       2818 ±  2%    +129.9%       6443            stress-ng.clone.ops_per_sec
     34.67            +1.9%      35.33 ±  3%      -9.6%      31.33 ±  3%      stress-ng.copy-file.ops
     22297 ± 23%     +26.7%      28258 ±  2%     +38.1%      30783 ± 14%      stress-ng.dir.ops_per_sec
      4743 ±  7%     -12.6%       4147 ±  4%     -19.9%       3801 ± 14%      stress-ng.dirdeep.ops_per_sec
     24499 ±  8%      -9.3%      22223 ±  3%     -19.8%      19647 ±  2%      stress-ng.enosys.ops
     24494 ±  8%      -9.3%      22207 ±  3%     -19.8%      19637 ±  2%      stress-ng.enosys.ops_per_sec
     76618           -63.8%      27756 ±  5%     -55.5%      34092 ±  2%      stress-ng.fiemap.ops
     76042           -63.8%      27492 ±  5%     -55.4%      33899 ±  2%      stress-ng.fiemap.ops_per_sec
   3816791            -6.2%    3581880 ±  2%     -10.7%    3409584 ±  3%      stress-ng.file-ioctl.ops
   3817421            -6.2%    3582248 ±  2%     -10.7%    3409544 ±  3%      stress-ng.file-ioctl.ops_per_sec
     27332 ± 17%     +30.2%      35574           +37.2%      37506 ±  6%      stress-ng.filename.ops
     26222 ± 16%     +28.6%      33722           +35.6%      35567 ±  6%      stress-ng.filename.ops_per_sec
     23431 ±  2%      +0.5%      23559           +22.5%      28711 ±  4%      stress-ng.fork.ops
     23433 ±  2%      +0.5%      23558           +22.5%      28709 ±  4%      stress-ng.fork.ops_per_sec
     48074 ± 15%      -0.6%      47786 ±  9%     -42.5%      27623            stress-ng.get.ops
     48088 ± 15%      -0.6%      47793 ±  9%     -42.6%      27624            stress-ng.get.ops_per_sec
    111379            -1.4%     109815 ±  2%      -7.8%     102662 ±  4%      stress-ng.getdent.ops
    111398            -1.4%     109818 ±  2%      -7.9%     102651 ±  4%      stress-ng.getdent.ops_per_sec
    226.67 ±  2%      -1.0%     224.33           -15.1%     192.33 ±  6%      stress-ng.inotify.ops
    226.39 ±  2%      -1.3%     223.53           -15.3%     191.82 ±  6%      stress-ng.inotify.ops_per_sec
     18.00            +9.3%      19.67 ±  2%      +3.7%      18.67 ±  2%      stress-ng.iomix.ops
      4846 ± 10%      +2.9%       4984 ±  7%     -36.8%       3062 ±  7%      stress-ng.kill.ops
      4846 ± 10%      +2.8%       4982 ±  7%     -36.9%       3058 ±  7%      stress-ng.kill.ops_per_sec
    851324 ±  3%      -9.9%     766988 ±  2%      +2.3%     870703            stress-ng.lease.ops
    850238 ±  3%      -9.9%     766255 ±  2%      +2.3%     870047            stress-ng.lease.ops_per_sec
    134492 ± 14%      -7.0%     125085 ± 16%     -19.9%     107736 ±  3%      stress-ng.mlock.ops_per_sec
      1.38            +2.9%       1.42 ±  5%     -20.2%       1.10 ± 12%      stress-ng.mmapfork.ops_per_sec
   1117721 ±  5%      -9.2%    1015321 ± 20%     -36.2%     713598            stress-ng.mq.ops
   1117186 ±  5%      -9.2%    1014593 ± 20%     -36.2%     712387            stress-ng.mq.ops_per_sec
   4584592 ±  2%      -2.7%    4460470 ±  7%     -12.0%    4036627 ±  8%      stress-ng.msg.ops_per_sec
      8179 ± 10%      +7.3%       8772 ±  9%     +24.0%      10144            stress-ng.opcode.ops
      8177 ± 10%      +7.2%       8770 ±  9%     +24.0%      10141            stress-ng.opcode.ops_per_sec
      5218 ±  3%      +3.6%       5408 ±  2%     +27.3%       6642            stress-ng.reboot.ops
      5217 ±  3%      +3.6%       5407 ±  2%     +27.3%       6640            stress-ng.reboot.ops_per_sec
      1724 ± 21%     +27.9%       2205 ± 18%     +54.3%       2660 ± 19%      stress-ng.resources.ops
   1217728           -31.4%     835808 ± 14%     -18.7%     989423 ± 10%      stress-ng.revio.ops
   1217899           -31.4%     835846 ± 14%     -18.8%     989313 ± 10%      stress-ng.revio.ops_per_sec
    752664 ±  9%      -7.6%     695116 ± 10%      -0.6%     748117 ±  6%      stress-ng.seal.ops
    752957 ±  9%      -7.7%     695278 ± 10%      -0.6%     748068 ±  6%      stress-ng.seal.ops_per_sec
    628.00 ± 83%     -20.1%     502.00 ±  8%    +176.2%       1734            stress-ng.seccomp.ops
    628.22 ± 83%     -20.1%     502.08 ±  8%    +176.1%       1734            stress-ng.seccomp.ops_per_sec
   1361741 ±  7%      +4.0%    1416628 ±  8%     +17.2%    1596549 ±  9%      stress-ng.seek.ops
   1361799 ±  7%      +4.0%    1416578 ±  8%     +17.2%    1596376 ±  9%      stress-ng.seek.ops_per_sec
   1537496            +1.2%    1556468            -9.9%    1385679            stress-ng.sigrt.ops
   1532297            +1.3%    1551993            -9.9%    1381175            stress-ng.sigrt.ops_per_sec
   1324589 ± 12%     -26.6%     972626 ±  9%     -14.8%    1128923 ±  4%      stress-ng.sigsuspend.ops
   1323957 ± 12%     -26.6%     972084 ±  9%     -14.8%    1128174 ±  4%      stress-ng.sigsuspend.ops_per_sec
    806.67 ± 13%     +11.2%     896.67 ± 10%     +31.3%       1059 ±  4%      stress-ng.sockabuse.ops
    806.73 ± 13%     +11.1%     896.44 ± 10%     +31.3%       1059 ±  4%      stress-ng.sockabuse.ops_per_sec
   6224635 ±  2%      +0.8%    6274521 ±  2%     -12.8%    5427883 ±  2%      stress-ng.sockpair.ops
   5383397            -1.9%    5279526 ±  2%     -14.9%    4579084 ±  4%      stress-ng.sockpair.ops_per_sec
     22409 ±  6%      -2.8%      21784 ±  5%     -41.2%      13172 ±  9%      stress-ng.sysbadaddr.ops
     22414 ±  6%      -2.8%      21784 ±  5%     -41.3%      13167 ±  9%      stress-ng.sysbadaddr.ops_per_sec
    525.33 ± 13%      +2.6%     539.00 ± 15%     -42.1%     304.33 ± 25%      stress-ng.sysfs.ops
    524.98 ± 13%      +2.6%     538.58 ± 15%     -42.1%     303.79 ± 25%      stress-ng.sysfs.ops_per_sec
      3077 ± 12%     +19.9%       3689 ±  4%     +23.0%       3786 ±  4%      stress-ng.sysinfo.ops
      3009 ± 11%     +22.6%       3688 ±  4%     +25.7%       3783 ±  4%      stress-ng.sysinfo.ops_per_sec
   3001932            +1.2%    3038017           +15.7%    3474274            stress-ng.udp.ops
   3002618            +1.2%    3037939           +15.7%    3473606            stress-ng.udp.ops_per_sec
    184634 ± 12%      -6.9%     171823           +24.5%     229788 ±  4%      stress-ng.utime.ops
    184678 ± 12%      -7.0%     171836           +24.4%     229784 ±  4%      stress-ng.utime.ops_per_sec
      1835 ±  5%      +2.2%       1875 ±  4%      +8.2%       1986 ±  3%      stress-ng.vforkmany.ops
      1829 ±  5%      +1.9%       1863 ±  4%      +8.1%       1978 ±  3%      stress-ng.vforkmany.ops_per_sec
     14829            -1.0%      14682 ±  2%     -20.1%      11848 ±  3%      stress-ng.vm-segv.ops
     14830            -1.0%      14682 ±  2%     -29.7%      10433 ± 13%      stress-ng.vm-segv.ops_per_sec
   3527835 ±  3%      +1.9%    3596613 ±  5%    +495.6%   21010377            stress-ng.vm-splice.ops
   3528845 ±  3%      +1.9%    3597219 ±  5%    +495.4%   21011045            stress-ng.vm-splice.ops_per_sec

--
Zhengjun Xing
