On 2018/12/12 11:01, Rong Chen wrote:
>
>
> On 12/11/2018 06:12 PM, Chao Yu wrote:
>> Hi all,
>>
>> The commit only cleans up code which is currently unused, so why would it
>> improve performance? Could you retest to make sure?
>
> Hi Chao,
>
> the improvement exists in the 0day environment.
Hi Rong,

Logically, the deleted code is dead, and removing it shouldn't affect any
code path. The patch I would expect to improve performance in some cases is
this one instead:

https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=dev-test&id=1e771e83ce26d0ba2ce6c7df7effb7822f032c4a

So I suspect there is a problem with the current test result; I'm not sure
whether the problem is in the test suite or the test environment. Any
thoughts?

Thanks,

> ➜ job cat /result/aim7/4BRD_12G-RAID1-f2fs-disk_rw-3000-performance/lkp-ivb-ep01/debian-x86_64-2018-04-03.cgz/x86_64-rhel-7.2/gcc-7/089842de5750f434aa016eb23f3d3a3a151083bd/*/aim7.json | grep -A1 min\"
>       "aim7.jobs-per-min": [
>         111406.82
> --
>       "aim7.jobs-per-min": [
>         110851.09
> --
>       "aim7.jobs-per-min": [
>         111399.93
> --
>       "aim7.jobs-per-min": [
>         110327.92
> --
>       "aim7.jobs-per-min": [
>         110321.16
>
> ➜ job cat /result/aim7/4BRD_12G-RAID1-f2fs-disk_rw-3000-performance/lkp-ivb-ep01/debian-x86_64-2018-04-03.cgz/x86_64-rhel-7.2/gcc-7/d6c66cd19ef322fe0d51ba09ce1b7f386acab04a/*/aim7.json | grep -A1 min\"
>       "aim7.jobs-per-min": [
>         97082.14
> --
>       "aim7.jobs-per-min": [
>         95959.06
> --
>       "aim7.jobs-per-min": [
>         95959.06
> --
>       "aim7.jobs-per-min": [
>         95851.75
> --
>       "aim7.jobs-per-min": [
>         96946.19
>
> Best Regards,
> Rong Chen
>
>> Thanks,
>>
>> On 2018/12/11 17:59, kernel test robot wrote:
>>> Greeting,
>>>
>>> FYI, we noticed a 15.4% improvement of aim7.jobs-per-min due to commit:
>>>
>>> commit: 089842de5750f434aa016eb23f3d3a3a151083bd ("f2fs: remove codes of unused wio_mutex")
>>> https://git.kernel.org/cgit/linux/kernel/git/jaegeuk/f2fs.git dev-test
>>>
>>> in testcase: aim7
>>> on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G memory
>>> with following parameters:
>>>
>>>     disk: 4BRD_12G
>>>     md: RAID1
>>>     fs: f2fs
>>>     test: disk_rw
>>>     load: 3000
>>>     cpufreq_governor: performance
>>>
>>> test-description: AIM7 is a traditional UNIX system level benchmark suite which is used
>>> to test and measure the performance of multiuser system.
>>> test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/
>>>
>>> In addition to that, the commit also has significant impact on the following tests:
>>>
>>> +------------------+-----------------------------------------------------------------------+
>>> | testcase: change | aim7: aim7.jobs-per-min 8.8% improvement                              |
>>> | test machine     | 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G memory |
>>> | test parameters  | cpufreq_governor=performance                                          |
>>> |                  | disk=4BRD_12G                                                         |
>>> |                  | fs=f2fs                                                               |
>>> |                  | load=3000                                                             |
>>> |                  | md=RAID1                                                              |
>>> |                  | test=disk_rr                                                          |
>>> +------------------+-----------------------------------------------------------------------+
>>>
>>> Details are as below:
>>> -------------------------------------------------------------------------------------------------->
>>>
>>> To reproduce:
>>>
>>>         git clone https://github.com/intel/lkp-tests.git
>>>         cd lkp-tests
>>>         bin/lkp install job.yaml  # job file is attached in this email
>>>         bin/lkp run     job.yaml
>>>
>>> =========================================================================================
>>> compiler/cpufreq_governor/disk/fs/kconfig/load/md/rootfs/tbox_group/test/testcase:
>>>   gcc-7/performance/4BRD_12G/f2fs/x86_64-rhel-7.2/3000/RAID1/debian-x86_64-2018-04-03.cgz/lkp-ivb-ep01/disk_rw/aim7
>>>
>>> commit:
>>>   d6c66cd19e ("f2fs: fix count of seg_freed to make sec_freed correct")
>>>   089842de57 ("f2fs: remove codes of unused wio_mutex")
>>>
>>> d6c66cd19ef322fe 089842de5750f434aa016eb23f
>>> ---------------- --------------------------
>>>          %stddev     %change         %stddev
>>>              \          |                \
>>>      96213            +15.4%     110996        aim7.jobs-per-min
>>>     191.50 ± 3%      -15.1%     162.52        aim7.time.elapsed_time
>>>     191.50 ± 3%      -15.1%     162.52        aim7.time.elapsed_time.max
>>>    1090253 ± 2%      -17.5%     899165        aim7.time.involuntary_context_switches
>>>     176713            -7.5%     163478        aim7.time.minor_page_faults
>>>       6882           -14.6%       5875        aim7.time.system_time
>>>     127.97            +4.7%     134.00        aim7.time.user_time
>>>     760923            +7.1%     814632        aim7.time.voluntary_context_switches
>>>      78499 ± 2%      -11.2%      69691        interrupts.CAL:Function_call_interrupts
>>>    3183861 ± 4%      -16.7%    2651390 ± 4%   softirqs.TIMER
>>>     191.54 ± 13%     +45.4%     278.59 ± 12%  iostat.md0.w/s
>>>       6118 ± 3%      +16.5%       7126 ± 2%   iostat.md0.wkB/s
>>>     151257 ± 2%      -10.1%     135958 ± 2%   meminfo.AnonHugePages
>>>      46754 ± 3%      +14.0%      53307 ± 3%   meminfo.max_used_kB
>>>       0.03 ± 62%       -0.0       0.01 ± 78%  mpstat.cpu.soft%
>>>       1.73 ± 3%        +0.4       2.13 ± 3%   mpstat.cpu.usr%
>>>   16062961 ± 2%      -12.1%   14124403 ± 2%   turbostat.IRQ
>>>       0.76 ± 37%      -71.8%      0.22 ± 83%  turbostat.Pkg%pc6
>>>       9435 ± 7%      -18.1%       7730 ± 4%   turbostat.SMI
>>>       6113 ± 3%      +16.5%       7120 ± 2%   vmstat.io.bo
>>>      11293 ± 2%      +12.3%      12688 ± 2%   vmstat.system.cs
>>>      81879 ± 2%       +2.5%      83951        vmstat.system.in
>>>       2584            -4.4%       2469 ± 2%   proc-vmstat.nr_active_file
>>>       2584            -4.4%       2469 ± 2%   proc-vmstat.nr_zone_active_file
>>>      28564 ± 4%      -23.6%      21817 ± 12%  proc-vmstat.numa_hint_faults
>>>      10958 ± 5%      -43.9%       6147 ± 26%  proc-vmstat.numa_hint_faults_local
>>>     660531 ± 3%      -10.7%     590059 ± 2%   proc-vmstat.pgfault
>>>       1191 ± 7%      -16.5%     995.25 ± 12%  slabinfo.UNIX.active_objs
>>>       1191 ± 7%      -16.5%     995.25 ± 12%  slabinfo.UNIX.num_objs
>>>      10552 ± 4%       -7.8%       9729        slabinfo.ext4_io_end.active_objs
>>>      10552 ± 4%       -7.8%       9729        slabinfo.ext4_io_end.num_objs
>>>      18395           +12.3%      20656 ± 8%   slabinfo.kmalloc-32.active_objs
>>>      18502 ± 2%      +12.3%      20787 ± 8%   slabinfo.kmalloc-32.num_objs
>>>  1.291e+12           -12.3%  1.131e+12        perf-stat.branch-instructions
>>>       0.66             +0.1       0.76 ± 3%   perf-stat.branch-miss-rate%
>>>  1.118e+10 ± 4%       -7.5%  1.034e+10        perf-stat.cache-misses
>>>  2.772e+10 ± 8%       -6.6%  2.589e+10        perf-stat.cache-references
>>>    2214958            -3.6%    2136237        perf-stat.context-switches
>>>       3.95 ± 2%       -5.8%       3.72        perf-stat.cpi
>>>   2.24e+13           -16.4%  1.873e+13        perf-stat.cpu-cycles
>>>  1.542e+12           -10.4%  1.382e+12        perf-stat.dTLB-loads
>>>       0.18 ± 6%        +0.0       0.19 ± 4%   perf-stat.dTLB-store-miss-rate%
>>>  5.667e+12           -11.3%  5.029e+12        perf-stat.instructions
>>>       5534           -13.1%       4809 ± 6%   perf-stat.instructions-per-iTLB-miss
>>>       0.25 ± 2%       +6.1%       0.27        perf-stat.ipc
>>>     647970 ± 2%      -10.7%     578955 ± 2%   perf-stat.minor-faults
>>>  2.783e+09 ± 18%     -17.8%  2.288e+09 ± 4%   perf-stat.node-loads
>>>  5.706e+09 ± 2%       -5.2%  5.407e+09        perf-stat.node-store-misses
>>>  7.693e+09            -4.4%  7.352e+09        perf-stat.node-stores
>>>     647979 ± 2%      -10.7%     578955 ± 2%   perf-stat.page-faults
>>>      70960 ± 16%     -26.6%      52062        sched_debug.cfs_rq:/.exec_clock.avg
>>>      70628 ± 16%     -26.7%      51787        sched_debug.cfs_rq:/.exec_clock.min
>>>      22499 ± 3%      -10.5%      20133 ± 3%   sched_debug.cfs_rq:/.load.avg
>>>       7838 ± 23%     -67.6%       2536 ± 81%  sched_debug.cfs_rq:/.load.min
>>>     362.19 ± 12%     +58.3%     573.50 ± 25%  sched_debug.cfs_rq:/.load_avg.max
>>>    3092960 ± 16%     -28.5%    2211400        sched_debug.cfs_rq:/.min_vruntime.avg
>>>    3244162 ± 15%     -27.0%    2367437 ± 2%   sched_debug.cfs_rq:/.min_vruntime.max
>>>    2984299 ± 16%     -28.9%    2121271        sched_debug.cfs_rq:/.min_vruntime.min
>>>       0.73 ± 4%      -65.7%       0.25 ± 57%  sched_debug.cfs_rq:/.nr_running.min
>>>       0.12 ± 13%    +114.6%       0.26 ± 9%   sched_debug.cfs_rq:/.nr_running.stddev
>>>       8.44 ± 23%     -36.8%       5.33 ± 15%  sched_debug.cfs_rq:/.nr_spread_over.max
>>>       1.49 ± 21%     -29.6%       1.05 ± 7%   sched_debug.cfs_rq:/.nr_spread_over.stddev
>>>      16.53 ± 20%     -38.8%      10.12 ± 23%  sched_debug.cfs_rq:/.runnable_load_avg.avg
>>>      15259 ± 7%      -33.3%      10176 ± 22%  sched_debug.cfs_rq:/.runnable_weight.avg
>>>     796.65 ± 93%     -74.8%     200.68 ± 17%  sched_debug.cfs_rq:/.util_est_enqueued.avg
>>>     669258 ± 3%      -13.3%     580068        sched_debug.cpu.avg_idle.avg
>>>     116020 ± 12%     -21.4%      91239        sched_debug.cpu.clock.avg
>>>     116076 ± 12%     -21.4%      91261        sched_debug.cpu.clock.max
>>>     115967 ± 12%     -21.3%      91215        sched_debug.cpu.clock.min
>>>     116020 ± 12%     -21.4%      91239        sched_debug.cpu.clock_task.avg
>>>     116076 ± 12%     -21.4%      91261        sched_debug.cpu.clock_task.max
>>>     115967 ± 12%     -21.3%      91215        sched_debug.cpu.clock_task.min
>>>      15.41 ± 4%      -32.0%      10.48 ± 24%  sched_debug.cpu.cpu_load[0].avg
>>>      15.71 ± 6%      -26.6%      11.53 ± 22%  sched_debug.cpu.cpu_load[1].avg
>>>      16.20 ± 8%      -22.9%      12.49 ± 21%  sched_debug.cpu.cpu_load[2].avg
>>>      16.92 ± 7%      -21.2%      13.33 ± 21%  sched_debug.cpu.cpu_load[3].avg
>>>       2650 ± 6%      -15.6%       2238 ± 3%   sched_debug.cpu.curr->pid.avg
>>>       1422 ± 8%      -68.5%     447.42 ± 57%  sched_debug.cpu.curr->pid.min
>>>       7838 ± 23%     -67.6%       2536 ± 81%  sched_debug.cpu.load.min
>>>      86066 ± 14%     -26.3%      63437        sched_debug.cpu.nr_load_updates.min
>>>       3.97 ± 88%     -70.9%       1.15 ± 10%  sched_debug.cpu.nr_running.avg
>>>       0.73 ± 4%      -65.7%       0.25 ± 57%  sched_debug.cpu.nr_running.min
>>>       1126 ± 16%     -27.6%     816.02 ± 9%   sched_debug.cpu.sched_count.stddev
>>>       1468 ± 16%     +31.1%       1925 ± 5%   sched_debug.cpu.sched_goidle.avg
>>>       1115 ± 16%     +37.8%       1538 ± 4%   sched_debug.cpu.sched_goidle.min
>>>       3979 ± 13%     -27.4%       2888 ± 5%   sched_debug.cpu.ttwu_local.max
>>>     348.96 ± 8%      -26.3%     257.16 ± 13%  sched_debug.cpu.ttwu_local.stddev
>>>     115966 ± 12%     -21.3%      91214        sched_debug.cpu_clk
>>>     113505 ± 12%     -21.8%      88773        sched_debug.ktime
>>>     116416 ± 12%     -21.3%      91663        sched_debug.sched_clk
>>>       0.26 ±100%      +0.3       0.57 ± 6%   perf-profile.calltrace.cycles-pp.security_file_permission.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>>       0.29 ±100%      +0.4       0.66 ± 5%   perf-profile.calltrace.cycles-pp.find_get_entry.pagecache_get_page.f2fs_write_begin.generic_perform_write.__generic_file_write_iter
>>>       0.67 ± 65%      +0.4       1.11        perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyin.iov_iter_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter
>>>       0.69 ± 65%      +0.5       1.14        perf-profile.calltrace.cycles-pp.copyin.iov_iter_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter.f2fs_file_write_iter
>>>       1.07 ± 57%      +0.5       1.61 ± 5%   perf-profile.calltrace.cycles-pp.pagecache_get_page.f2fs_write_begin.generic_perform_write.__generic_file_write_iter.f2fs_file_write_iter
>>>       0.79 ± 64%      +0.5       1.33        perf-profile.calltrace.cycles-pp.iov_iter_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter.f2fs_file_write_iter.__vfs_write
>>>       0.73 ± 63%      +0.6       1.32 ± 3%   perf-profile.calltrace.cycles-pp.syscall_return_via_sysret
>>>       0.81 ± 63%      +0.6       1.43 ± 3%   perf-profile.calltrace.cycles-pp.entry_SYSCALL_64
>>>       0.06 ± 58%      +0.0       0.09 ± 4%   perf-profile.children.cycles-pp.__pagevec_lru_add_fn
>>>       0.05 ± 58%      +0.0       0.09 ± 13%  perf-profile.children.cycles-pp.down_write_trylock
>>>       0.06 ± 58%      +0.0       0.10 ± 4%   perf-profile.children.cycles-pp.__x64_sys_write
>>>       0.07 ± 58%      +0.0       0.11 ± 3%   perf-profile.children.cycles-pp.account_page_dirtied
>>>       0.04 ± 57%      +0.0       0.09 ± 5%   perf-profile.children.cycles-pp.account_page_cleaned
>>>       0.06 ± 58%      +0.0       0.10 ± 7%   perf-profile.children.cycles-pp.free_pcppages_bulk
>>>       0.10 ± 58%      +0.1       0.15 ± 6%   perf-profile.children.cycles-pp.page_mapping
>>>       0.09 ± 57%      +0.1       0.14 ± 7%   perf-profile.children.cycles-pp.__lru_cache_add
>>>       0.10 ± 57%      +0.1       0.15 ± 9%   perf-profile.children.cycles-pp.__might_sleep
>>>       0.12 ± 58%      +0.1       0.19 ± 3%   perf-profile.children.cycles-pp.set_page_dirty
>>>       0.08 ± 64%      +0.1       0.15 ± 10%  perf-profile.children.cycles-pp.dquot_claim_space_nodirty
>>>       0.06 ± 61%      +0.1       0.13 ± 5%   perf-profile.children.cycles-pp.percpu_counter_add_batch
>>>       0.18 ± 57%      +0.1       0.27 ± 2%   perf-profile.children.cycles-pp.iov_iter_fault_in_readable
>>>       0.17 ± 57%      +0.1       0.26 ± 2%   perf-profile.children.cycles-pp.__set_page_dirty_nobuffers
>>>       0.09 ± 57%      +0.1       0.18 ± 27%  perf-profile.children.cycles-pp.free_unref_page_list
>>>       0.16 ± 58%      +0.1       0.30 ± 18%  perf-profile.children.cycles-pp.__pagevec_release
>>>       0.30 ± 57%      +0.1       0.43 ± 5%   perf-profile.children.cycles-pp.add_to_page_cache_lru
>>>       0.17 ± 58%      +0.1       0.31 ± 16%  perf-profile.children.cycles-pp.release_pages
>>>       0.29 ± 58%      +0.2       0.45 ± 7%   perf-profile.children.cycles-pp.selinux_file_permission
>>>       0.38 ± 57%      +0.2       0.58 ± 6%   perf-profile.children.cycles-pp.security_file_permission
>>>       0.78 ± 57%      +0.3       1.12        perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
>>>       0.80 ± 57%      +0.3       1.15        perf-profile.children.cycles-pp.copyin
>>>       0.92 ± 57%      +0.4       1.34        perf-profile.children.cycles-pp.iov_iter_copy_from_user_atomic
>>>       0.98 ± 54%      +0.5       1.43 ± 3%   perf-profile.children.cycles-pp.entry_SYSCALL_64
>>>       0.98 ± 53%      +0.5       1.50 ± 3%   perf-profile.children.cycles-pp.syscall_return_via_sysret
>>>       1.64 ± 57%      +0.8       2.45 ± 5%   perf-profile.children.cycles-pp.pagecache_get_page
>>>       0.04 ± 57%      +0.0       0.06        perf-profile.self.cycles-pp.__pagevec_lru_add_fn
>>>       0.04 ± 58%      +0.0       0.07 ± 7%   perf-profile.self.cycles-pp.release_pages
>>>       0.05 ± 58%      +0.0       0.08 ± 15%  perf-profile.self.cycles-pp._cond_resched
>>>       0.04 ± 58%      +0.0       0.08 ± 6%   perf-profile.self.cycles-pp.ksys_write
>>>       0.05 ± 58%      +0.0       0.09 ± 13%  perf-profile.self.cycles-pp.down_write_trylock
>>>       0.09 ± 58%      +0.1       0.14 ± 9%   perf-profile.self.cycles-pp.page_mapping
>>>       0.01 ±173%      +0.1       0.07 ± 7%   perf-profile.self.cycles-pp.__fdget_pos
>>>       0.11 ± 57%      +0.1       0.17 ± 7%   perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
>>>       0.05 ± 59%      +0.1       0.12 ± 5%   perf-profile.self.cycles-pp.percpu_counter_add_batch
>>>       0.12 ± 58%      +0.1       0.19 ± 4%   perf-profile.self.cycles-pp.iov_iter_copy_from_user_atomic
>>>       0.17 ± 57%      +0.1       0.24 ± 4%   perf-profile.self.cycles-pp.generic_perform_write
>>>       0.17 ± 58%      +0.1       0.26 ± 2%   perf-profile.self.cycles-pp.iov_iter_fault_in_readable
>>>       0.19 ± 57%      +0.1       0.30 ± 2%   perf-profile.self.cycles-pp.f2fs_set_data_page_dirty
>>>       0.18 ± 58%      +0.1       0.30 ± 4%   perf-profile.self.cycles-pp.pagecache_get_page
>>>       0.27 ± 57%      +0.1       0.41 ± 4%   perf-profile.self.cycles-pp.do_syscall_64
>>>       0.40 ± 57%      +0.2       0.62 ± 5%   perf-profile.self.cycles-pp.find_get_entry
>>>       0.77 ± 57%      +0.3       1.11        perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
>>>       0.96 ± 54%      +0.5       1.43 ± 3%   perf-profile.self.cycles-pp.entry_SYSCALL_64
>>>       0.98 ± 53%      +0.5       1.50 ± 2%   perf-profile.self.cycles-pp.syscall_return_via_sysret
>>>       0.72 ± 59%      +0.5       1.26 ± 10%  perf-profile.self.cycles-pp.f2fs_lookup_extent_cache
>>>
>>> [ASCII trend plots omitted; their alignment was lost in transit. Panels:
>>> aim7.jobs-per-min, aim7.time.system_time, aim7.time.elapsed_time,
>>> aim7.time.elapsed_time.max, aim7.time.involuntary_context_switches,
>>> each plotting bisect-good (+) against bisect-bad (O) samples per run.]
>>>
>>> [*] bisect-good sample
>>> [O] bisect-bad sample
>>>
>>> ***************************************************************************************************
>>> lkp-ivb-ep01: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G memory
>>> =========================================================================================
>>> compiler/cpufreq_governor/disk/fs/kconfig/load/md/rootfs/tbox_group/test/testcase:
>>>   gcc-7/performance/4BRD_12G/f2fs/x86_64-rhel-7.2/3000/RAID1/debian-x86_64-2018-04-03.cgz/lkp-ivb-ep01/disk_rr/aim7
>>>
>>> commit:
>>>   d6c66cd19e ("f2fs: fix count of seg_freed to make sec_freed correct")
>>>   089842de57 ("f2fs: remove codes of unused wio_mutex")
>>>
>>> d6c66cd19ef322fe 089842de5750f434aa016eb23f
>>> ---------------- --------------------------
>>>        fail:runs  %reproduction   fail:runs
>>>            |            |             |
>>>           :4           50%          2:4     dmesg.WARNING:at#for_ip_interrupt_entry/0x
>>>           :4           25%          1:4     kmsg.DHCP/BOOTP:Reply_not_for_us_on_eth#,op[#]xid[#]
>>>           :4           25%          1:4     kmsg.IP-Config:Reopening_network_devices
>>>          %stddev     %change         %stddev
>>>              \          |                \
>>>     102582            +8.8%     111626        aim7.jobs-per-min
>>>     176.57            -8.5%     161.64        aim7.time.elapsed_time
>>>     176.57            -8.5%     161.64        aim7.time.elapsed_time.max
>>>    1060618           -12.5%     927723        aim7.time.involuntary_context_switches
>>>       6408            -8.9%       5839        aim7.time.system_time
>>>     785554            +4.5%     820987        aim7.time.voluntary_context_switches
>>>    1077477            -9.5%     975130 ± 2%   softirqs.RCU
>>>     184.77 ± 6%      +41.2%     260.90 ± 11%  iostat.md0.w/s
>>>       6609 ± 2%       +9.6%       7246        iostat.md0.wkB/s
>>>       0.00 ± 94%       +0.0       0.02 ± 28%  mpstat.cpu.soft%
>>>       1.89 ± 4%        +0.3       2.15 ± 3%   mpstat.cpu.usr%
>>>       6546 ± 19%     -49.1%       3328 ± 63%  numa-numastat.node0.other_node
>>>       1470 ± 86%    +222.9%       4749 ± 45%  numa-numastat.node1.other_node
>>>     959.75 ± 8%      +16.8%       1120 ± 7%   slabinfo.UNIX.active_objs
>>>     959.75 ± 8%      +16.8%       1120 ± 7%   slabinfo.UNIX.num_objs
>>>      38.35            +3.2%      39.57 ± 2%   turbostat.RAMWatt
>>>       8800 ± 2%      -10.7%       7855 ± 3%   turbostat.SMI
>>>     103925 ± 27%     -59.5%      42134 ± 61%  numa-meminfo.node0.AnonHugePages
>>>      14267 ± 61%     -54.9%       6430 ± 76%  numa-meminfo.node0.Inactive(anon)
>>>      52220 ± 18%    +104.0%     106522 ± 40%  numa-meminfo.node1.AnonHugePages
>>>       6614 ± 2%       +9.6%       7248        vmstat.io.bo
>>>     316.00 ± 2%      -15.4%     267.25 ± 8%   vmstat.procs.r
>>>      12256 ± 2%       +6.9%      13098        vmstat.system.cs
>>>       2852 ± 3%      +12.5%       3208 ± 3%   numa-vmstat.node0.nr_active_file
>>>       3566 ± 61%     -54.9%       1607 ± 76%  numa-vmstat.node0.nr_inactive_anon
>>>       2852 ± 3%      +12.4%       3207 ± 3%   numa-vmstat.node0.nr_zone_active_file
>>>       3566 ± 61%     -54.9%       1607 ± 76%  numa-vmstat.node0.nr_zone_inactive_anon
>>>      95337            +2.3%      97499        proc-vmstat.nr_active_anon
>>>       5746 ± 2%       +4.3%       5990        proc-vmstat.nr_active_file
>>>      89732            +2.0%      91532        proc-vmstat.nr_anon_pages
>>>      95337            +2.3%      97499        proc-vmstat.nr_zone_active_anon
>>>       5746 ± 2%       +4.3%       5990        proc-vmstat.nr_zone_active_file
>>>      10407 ± 4%      -49.3%       5274 ± 52%  proc-vmstat.numa_hint_faults_local
>>>     615058            -6.0%     578344 ± 2%   proc-vmstat.pgfault
>>>  1.187e+12            -8.7%  1.084e+12        perf-stat.branch-instructions
>>>       0.65 ± 3%        +0.0       0.70 ± 2%   perf-stat.branch-miss-rate%
>>>    2219706            -2.5%    2164425        perf-stat.context-switches
>>>  2.071e+13           -10.0%  1.864e+13        perf-stat.cpu-cycles
>>>     641874            -2.7%     624703        perf-stat.cpu-migrations
>>>  1.408e+12            -7.3%  1.305e+12        perf-stat.dTLB-loads
>>>   39182891 ± 4%     +796.4%  3.512e+08 ±150%  perf-stat.iTLB-loads
>>>  5.184e+12            -8.0%   4.77e+12        perf-stat.instructions
>>>       5035 ± 2%      -14.1%       4325 ± 13%  perf-stat.instructions-per-iTLB-miss
>>>     604219            -6.2%     566725        perf-stat.minor-faults
>>>  4.962e+09            -2.7%  4.827e+09        perf-stat.node-stores
>>>     604097            -6.2%     566730        perf-stat.page-faults
>>>     110.81 ± 13%     +25.7%     139.25 ± 8%   sched_debug.cfs_rq:/.load_avg.stddev
>>>      12.76 ± 74%    +114.6%      27.39 ± 38%  sched_debug.cfs_rq:/.removed.load_avg.avg
>>>      54.23 ± 62%     +66.2%      90.10 ± 17%  sched_debug.cfs_rq:/.removed.load_avg.stddev
>>>     585.18 ± 74%    +115.8%       1262 ± 38%  sched_debug.cfs_rq:/.removed.runnable_sum.avg
>>>       2489 ± 62%     +66.9%       4153 ± 17%  sched_debug.cfs_rq:/.removed.runnable_sum.stddev
>>>      11909 ± 10%     +44.7%      17229 ± 18%  sched_debug.cfs_rq:/.runnable_weight.avg
>>>       1401 ± 2%      +36.5%       1913 ± 5%   sched_debug.cpu.sched_goidle.avg
>>>       2350 ± 2%      +21.9%       2863 ± 5%   sched_debug.cpu.sched_goidle.max
>>>       1082 ± 5%      +39.2%       1506 ± 4%   sched_debug.cpu.sched_goidle.min
>>>       7327           +14.7%       8401 ± 2%   sched_debug.cpu.ttwu_count.avg
>>>       5719 ± 3%      +18.3%       6767 ± 2%   sched_debug.cpu.ttwu_count.min
>>>       1518 ± 3%      +15.6%       1755 ± 3%   sched_debug.cpu.ttwu_local.min
>>>      88.70             -1.0      87.65        perf-profile.calltrace.cycles-pp.generic_perform_write.__generic_file_write_iter.f2fs_file_write_iter.__vfs_write.vfs_write
>>>      54.51             -1.0      53.48        perf-profile.calltrace.cycles-pp._raw_spin_lock.f2fs_inode_dirtied.f2fs_mark_inode_dirty_sync.f2fs_write_end.generic_perform_write
>>>      54.55             -1.0      53.53        perf-profile.calltrace.cycles-pp.f2fs_mark_inode_dirty_sync.f2fs_write_end.generic_perform_write.__generic_file_write_iter.f2fs_file_write_iter
>>>      56.32             -1.0      55.30        perf-profile.calltrace.cycles-pp.f2fs_write_end.generic_perform_write.__generic_file_write_iter.f2fs_file_write_iter.__vfs_write
>>>      54.54             -1.0      53.53        perf-profile.calltrace.cycles-pp.f2fs_inode_dirtied.f2fs_mark_inode_dirty_sync.f2fs_write_end.generic_perform_write.__generic_file_write_iter
>>>      88.93             -1.0      87.96        perf-profile.calltrace.cycles-pp.__generic_file_write_iter.f2fs_file_write_iter.__vfs_write.vfs_write.ksys_write
>>>      89.94             -0.8      89.14        perf-profile.calltrace.cycles-pp.f2fs_file_write_iter.__vfs_write.vfs_write.ksys_write.do_syscall_64
>>>      90.01             -0.8      89.26        perf-profile.calltrace.cycles-pp.__vfs_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>>      90.72             -0.7      90.00        perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>>      90.59             -0.7      89.87        perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>>      13.32             -0.3      13.01        perf-profile.calltrace.cycles-pp._raw_spin_lock.f2fs_inode_dirtied.f2fs_mark_inode_dirty_sync.f2fs_reserve_new_blocks.f2fs_reserve_block
>>>      13.33             -0.3      13.01        perf-profile.calltrace.cycles-pp.f2fs_inode_dirtied.f2fs_mark_inode_dirty_sync.f2fs_reserve_new_blocks.f2fs_reserve_block.f2fs_get_block
>>>      13.33             -0.3      13.01        perf-profile.calltrace.cycles-pp.f2fs_mark_inode_dirty_sync.f2fs_reserve_new_blocks.f2fs_reserve_block.f2fs_get_block.f2fs_write_begin
>>>      13.26             -0.3      12.94        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.f2fs_inode_dirtied.f2fs_mark_inode_dirty_sync.f2fs_reserve_new_blocks
>>>       1.30 ± 2%        +0.1       1.40 ± 2%   perf-profile.calltrace.cycles-pp.entry_SYSCALL_64
>>>       2.20 ± 6%        +0.2       2.40 ± 3%   perf-profile.calltrace.cycles-pp.generic_file_read_iter.__vfs_read.vfs_read.ksys_read.do_syscall_64
>>>       2.28 ± 5%        +0.2       2.52 ± 5%   perf-profile.calltrace.cycles-pp.__vfs_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>>       2.85 ± 4%        +0.3       3.16 ± 5%   perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>>       2.97 ± 4%        +0.3       3.31 ± 5%   perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>>      88.74             -1.0      87.70        perf-profile.children.cycles-pp.generic_perform_write
>>>      56.33             -1.0      55.31        perf-profile.children.cycles-pp.f2fs_write_end
>>>      88.95             -1.0      87.98        perf-profile.children.cycles-pp.__generic_file_write_iter
>>>      89.95             -0.8      89.15        perf-profile.children.cycles-pp.f2fs_file_write_iter
>>>      90.03             -0.8      89.28        perf-profile.children.cycles-pp.__vfs_write
>>>      90.73             -0.7      90.02        perf-profile.children.cycles-pp.ksys_write
>>>      90.60             -0.7      89.89        perf-profile.children.cycles-pp.vfs_write
>>>       0.22 ± 5%        -0.1       0.17 ± 19%  perf-profile.children.cycles-pp.f2fs_invalidate_page
>>>       0.08 ± 10%       +0.0       0.10 ± 5%   perf-profile.children.cycles-pp.page_mapping
>>>       0.09             +0.0       0.11 ± 7%   perf-profile.children.cycles-pp.__cancel_dirty_page
>>>       0.06 ± 6%        +0.0       0.09 ± 28%  perf-profile.children.cycles-pp.read_node_page
>>>       0.10 ± 4%        +0.0       0.14 ± 14%  perf-profile.children.cycles-pp.current_time
>>>       0.07 ± 12%       +0.0       0.11 ± 9%   perf-profile.children.cycles-pp.percpu_counter_add_batch
>>>       0.00             +0.1       0.05        perf-profile.children.cycles-pp.__x64_sys_write
>>>       0.38 ± 3%        +0.1       0.43 ± 5%   perf-profile.children.cycles-pp.selinux_file_permission
>>>       0.55 ± 4%        +0.1       0.61 ± 4%   perf-profile.children.cycles-pp.security_file_permission
>>>       1.30             +0.1       1.40 ± 2%   perf-profile.children.cycles-pp.entry_SYSCALL_64
>>>       2.21 ± 6%        +0.2       2.41 ± 3%   perf-profile.children.cycles-pp.generic_file_read_iter
>>>       2.29 ± 6%        +0.2       2.53 ± 5%   perf-profile.children.cycles-pp.__vfs_read
>>>       2.86 ± 4%        +0.3       3.18 ± 5%   perf-profile.children.cycles-pp.vfs_read
>>>       2.99 ± 4%        +0.3       3.32 ± 5%   perf-profile.children.cycles-pp.ksys_read
>>>       0.37             -0.1       0.24 ± 23%  perf-profile.self.cycles-pp.__get_node_page
>>>       0.21 ± 3%        -0.1       0.15 ± 16%  perf-profile.self.cycles-pp.f2fs_invalidate_page
>>>       0.07 ± 5%        +0.0       0.09 ± 11%  perf-profile.self.cycles-pp.page_mapping
>>>       0.06 ± 11%       +0.0       0.08 ± 8%   perf-profile.self.cycles-pp.vfs_read
>>>       0.07 ± 7%        +0.0       0.10 ± 21%  perf-profile.self.cycles-pp.__generic_file_write_iter
>>>       0.06 ± 14%       +0.0       0.10 ± 10%  perf-profile.self.cycles-pp.percpu_counter_add_batch
>>>       0.20 ± 11%       +0.0       0.25 ± 12%  perf-profile.self.cycles-pp.selinux_file_permission
>>>       0.05 ± 8%        +0.1       0.11 ± 52%  perf-profile.self.cycles-pp.__vfs_read
>>>       0.33 ± 9%        +0.1       0.41 ± 9%   perf-profile.self.cycles-pp.f2fs_lookup_extent_cache
>>>       1.30             +0.1       1.40 ± 2%   perf-profile.self.cycles-pp.entry_SYSCALL_64
>>>
>>>
>>> Disclaimer:
>>> Results have been estimated based on internal Intel analysis and are provided
>>> for informational purposes only. Any difference in system hardware or software
>>> design or configuration may affect actual performance.
>>>
>>>
>>> Thanks,
>>> Rong Chen
>>>
>> _______________________________________________
>> LKP mailing list
>> l...@lists.01.org
>> https://lists.01.org/mailman/listinfo/lkp
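As a quick sanity check on the numbers in this thread: averaging the five per-run "aim7.jobs-per-min" samples that Rong quoted above does reproduce a delta of about 15%, so the gap between the two commits is at least consistent across runs, whatever its root cause. A minimal sketch (the sample values are copied verbatim from the quoted results; the variable names are just illustrative):

```python
# Per-run aim7.jobs-per-min samples quoted upthread.
with_commit = [111406.82, 110851.09, 111399.93, 110327.92, 110321.16]  # 089842de57
parent      = [97082.14, 95959.06, 95959.06, 95851.75, 96946.19]       # d6c66cd19e

def mean(xs):
    return sum(xs) / len(xs)

# Relative change of the mean, in percent.
improvement = (mean(with_commit) / mean(parent) - 1) * 100
print(f"{mean(parent):.2f} -> {mean(with_commit):.2f} jobs-per-min "
      f"({improvement:+.1f}%)")
```

The ~15.0% computed from these five-run means is close to the 15.4% headline figure in the robot report, which presumably comes from a different set of runs.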