On Fri, Apr 24, 2015 at 12:15:59PM +1000, NeilBrown wrote:
> On Thu, 23 Apr 2015 14:55:59 +0800 Huang Ying <ying.hu...@intel.com> wrote:
>
> > FYI, we noticed the below changes on
> >
> >   git://neil.brown.name/md for-next
> >   commit 878ee6792799e2f88bdcac329845efadb205252f ("RAID5: batch adjacent full stripe write")
>
> Hi,
>  is there any chance that you could explain what some of this means?
> There is lots of data and some very pretty graphs, but no explanation.
Hi Neil,

(Sorry for the late response: Ying is on vacation.)

I guess you can simply ignore this report, as I already reported to you a
month ago that this patch makes fsmark perform better in most cases:

    https://lists.01.org/pipermail/lkp/2015-March/002411.html

> Which numbers are "good", which are "bad"? Which is "worst"?
> What do the graphs really show? and what would we like to see in them?
>
> I think it is really great that you are doing this testing and reporting the
> results.  It's just so sad that I completely fail to understand them.

Sorry, it is our fault that the data is hard to understand, and that we sent
a duplicate report (well, the commit hash is different ;).  We will take some
time to make these reports easier to understand.

	--yliu

> >
> > testbox/testcase/testparams:
> >   lkp-st02/dd-write/300-5m-11HDD-RAID5-cfq-xfs-1dd
> >
> > a87d7f782b47e030  878ee6792799e2f88bdcac3298
> > ----------------  --------------------------
> >          %stddev      %change          %stddev
> >              \            |                \
> >      59035 ±  0%      +18.4%      69913 ±  1%  softirqs.SCHED
> >       1330 ± 10%      +17.4%       1561 ±  4%  slabinfo.kmalloc-512.num_objs
> >       1330 ± 10%      +17.4%       1561 ±  4%  slabinfo.kmalloc-512.active_objs
> >     305908 ±  0%       -1.8%     300427 ±  0%  vmstat.io.bo
> >          1 ±  0%     +100.0%          2 ±  0%  vmstat.procs.r
> >       8266 ±  1%      -15.7%       6968 ±  0%  vmstat.system.cs
> >      14819 ±  0%       -2.1%      14503 ±  0%  vmstat.system.in
> >      18.20 ±  6%      +10.2%      20.05 ±  4%  perf-profile.cpu-cycles.raid_run_ops.handle_stripe.handle_active_stripes.raid5d.md_thread
> >       1.94 ±  9%      +90.6%       3.70 ±  9%  perf-profile.cpu-cycles.async_xor.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
> >       0.00 ±  0%       +Inf%      25.18 ±  3%  perf-profile.cpu-cycles.handle_active_stripes.isra.45.raid5d.md_thread.kthread.ret_from_fork
> >       0.00 ±  0%       +Inf%      14.14 ±  4%  perf-profile.cpu-cycles.async_copy_data.isra.42.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
> >       1.79 ±  7%     +102.9%       3.64 ±  9%  perf-profile.cpu-cycles.xor_blocks.async_xor.raid_run_ops.handle_stripe.handle_active_stripes
> >       3.09 ±  4%      -10.8%       2.76 ±  4%  perf-profile.cpu-cycles.get_active_stripe.make_request.md_make_request.generic_make_request.submit_bio
> >       0.80 ± 14%      +28.1%       1.02 ± 10%  perf-profile.cpu-cycles.mutex_lock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
> >      14.78 ±  6%     -100.0%       0.00 ±  0%  perf-profile.cpu-cycles.async_copy_data.isra.38.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
> >      25.68 ±  4%     -100.0%       0.00 ±  0%  perf-profile.cpu-cycles.handle_active_stripes.isra.41.raid5d.md_thread.kthread.ret_from_fork
> >       1.23 ±  5%     +140.0%       2.96 ±  7%  perf-profile.cpu-cycles.xor_sse_5_pf64.xor_blocks.async_xor.raid_run_ops.handle_stripe
> >       2.62 ±  6%      -95.6%       0.12 ± 33%  perf-profile.cpu-cycles.analyse_stripe.handle_stripe.handle_active_stripes.raid5d.md_thread
> >       0.96 ±  9%      +17.5%       1.12 ±  2%  perf-profile.cpu-cycles.xfs_ilock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
> >  1.461e+10 ±  0%       -5.3%  1.384e+10 ±  1%  perf-stat.L1-dcache-load-misses
> >  3.688e+11 ±  0%       -2.7%   3.59e+11 ±  0%  perf-stat.L1-dcache-loads
> >  1.124e+09 ±  0%      -27.7%  8.125e+08 ±  0%  perf-stat.L1-dcache-prefetches
> >  2.767e+10 ±  0%       -1.8%  2.717e+10 ±  0%  perf-stat.L1-dcache-store-misses
> >  2.352e+11 ±  0%       -2.8%  2.287e+11 ±  0%  perf-stat.L1-dcache-stores
> >  6.774e+09 ±  0%       -2.3%   6.62e+09 ±  0%  perf-stat.L1-icache-load-misses
> >  5.571e+08 ±  0%      +40.5%  7.826e+08 ±  1%  perf-stat.LLC-load-misses
> >  6.263e+09 ±  0%      -13.7%  5.407e+09 ±  1%  perf-stat.LLC-loads
> >  1.914e+11 ±  0%       -4.2%  1.833e+11 ±  0%  perf-stat.branch-instructions
> >  1.145e+09 ±  2%       -5.6%  1.081e+09 ±  0%  perf-stat.branch-load-misses
> >  1.911e+11 ±  0%       -4.3%  1.829e+11 ±  0%  perf-stat.branch-loads
> >  1.142e+09 ±  2%       -5.1%  1.083e+09 ±  0%  perf-stat.branch-misses
> >  1.218e+09 ±  0%      +19.8%   1.46e+09 ±  0%  perf-stat.cache-misses
> >  2.118e+10 ±  0%       -5.2%  2.007e+10 ±  0%  perf-stat.cache-references
> >    2510308 ±  1%      -15.7%    2115410 ±  0%  perf-stat.context-switches
> >      39623 ±  0%      +22.1%      48370 ±  1%  perf-stat.cpu-migrations
> >  4.179e+08 ± 40%     +165.7%  1.111e+09 ± 35%  perf-stat.dTLB-load-misses
> >  3.684e+11 ±  0%       -2.5%  3.592e+11 ±  0%  perf-stat.dTLB-loads
> >  1.232e+08 ± 15%      +62.5%  2.002e+08 ± 27%  perf-stat.dTLB-store-misses
> >  2.348e+11 ±  0%       -2.5%  2.288e+11 ±  0%  perf-stat.dTLB-stores
> >    3577297 ±  2%       +8.7%    3888986 ±  1%  perf-stat.iTLB-load-misses
> >  1.035e+12 ±  0%       -3.5%  9.988e+11 ±  0%  perf-stat.iTLB-loads
> >  1.036e+12 ±  0%       -3.7%  9.978e+11 ±  0%  perf-stat.instructions
> >        594 ± 30%     +130.3%       1369 ± 13%  sched_debug.cfs_rq[0]:/.blocked_load_avg
> >         17 ± 10%      -28.2%         12 ± 23%  sched_debug.cfs_rq[0]:/.nr_spread_over
> >        210 ± 21%      +42.1%        298 ± 28%  sched_debug.cfs_rq[0]:/.tg_runnable_contrib
> >       9676 ± 21%      +42.1%      13754 ± 28%  sched_debug.cfs_rq[0]:/.avg->runnable_avg_sum
> >        772 ± 25%     +116.5%       1672 ±  9%  sched_debug.cfs_rq[0]:/.tg_load_contrib
> >       8402 ±  9%      +83.3%      15405 ± 11%  sched_debug.cfs_rq[0]:/.tg_load_avg
> >       8356 ±  9%      +82.8%      15272 ± 11%  sched_debug.cfs_rq[1]:/.tg_load_avg
> >        968 ± 25%     +100.8%       1943 ± 14%  sched_debug.cfs_rq[1]:/.blocked_load_avg
> >      16242 ±  9%      -22.2%      12643 ± 14%  sched_debug.cfs_rq[1]:/.avg->runnable_avg_sum
> >        353 ±  9%      -22.1%        275 ± 14%  sched_debug.cfs_rq[1]:/.tg_runnable_contrib
> >       1183 ± 23%      +77.7%       2102 ± 12%  sched_debug.cfs_rq[1]:/.tg_load_contrib
> >        181 ±  8%      -31.4%        124 ± 26%  sched_debug.cfs_rq[2]:/.tg_runnable_contrib
> >       8364 ±  8%      -31.3%       5745 ± 26%  sched_debug.cfs_rq[2]:/.avg->runnable_avg_sum
> >       8297 ±  9%      +81.7%      15079 ± 12%  sched_debug.cfs_rq[2]:/.tg_load_avg
> >      30439 ± 13%      -45.2%      16681 ± 26%  sched_debug.cfs_rq[2]:/.exec_clock
> >      39735 ± 14%      -48.3%      20545 ± 29%  sched_debug.cfs_rq[2]:/.min_vruntime
> >       8231 ± 10%      +82.2%      15000 ± 12%  sched_debug.cfs_rq[3]:/.tg_load_avg
> >       1210 ± 14%     +110.3%       2546 ± 30%  sched_debug.cfs_rq[4]:/.tg_load_contrib
> >       8188 ± 10%      +82.8%      14964 ± 12%  sched_debug.cfs_rq[4]:/.tg_load_avg
> >       8132 ± 10%      +83.1%      14890 ± 12%  sched_debug.cfs_rq[5]:/.tg_load_avg
> >        749 ± 29%     +205.9%       2292 ± 34%  sched_debug.cfs_rq[5]:/.blocked_load_avg
> >        963 ± 30%     +169.9%       2599 ± 33%  sched_debug.cfs_rq[5]:/.tg_load_contrib
> >      37791 ± 32%      -38.6%      23209 ± 13%  sched_debug.cfs_rq[6]:/.min_vruntime
> >        693 ± 25%     +132.2%       1609 ± 29%  sched_debug.cfs_rq[6]:/.blocked_load_avg
> >      10838 ± 13%      -39.2%       6587 ± 13%  sched_debug.cfs_rq[6]:/.avg->runnable_avg_sum
> >      29329 ± 27%      -33.2%      19577 ± 10%  sched_debug.cfs_rq[6]:/.exec_clock
> >        235 ± 14%      -39.7%        142 ± 14%  sched_debug.cfs_rq[6]:/.tg_runnable_contrib
> >       8085 ± 10%      +83.6%      14848 ± 12%  sched_debug.cfs_rq[6]:/.tg_load_avg
> >        839 ± 25%     +128.5%       1917 ± 18%  sched_debug.cfs_rq[6]:/.tg_load_contrib
> >       8051 ± 10%      +83.6%      14779 ± 12%  sched_debug.cfs_rq[7]:/.tg_load_avg
> >        156 ± 34%      +97.9%        309 ± 19%  sched_debug.cpu#0.cpu_load[4]
> >        160 ± 25%      +64.0%        263 ± 16%  sched_debug.cpu#0.cpu_load[2]
> >        156 ± 32%      +83.7%        286 ± 17%  sched_debug.cpu#0.cpu_load[3]
> >        164 ± 20%      -35.1%        106 ± 31%  sched_debug.cpu#2.cpu_load[0]
> >        249 ± 15%      +80.2%        449 ± 10%  sched_debug.cpu#4.cpu_load[3]
> >        231 ± 11%     +101.2%        466 ± 13%  sched_debug.cpu#4.cpu_load[2]
> >        217 ± 14%     +189.9%        630 ± 38%  sched_debug.cpu#4.cpu_load[0]
> >      71951 ±  5%      +21.6%      87526 ±  7%  sched_debug.cpu#4.nr_load_updates
> >        214 ±  8%     +146.1%        527 ± 27%  sched_debug.cpu#4.cpu_load[1]
> >        256 ± 17%      +75.7%        449 ± 13%  sched_debug.cpu#4.cpu_load[4]
> >        209 ± 23%      +98.3%        416 ± 48%  sched_debug.cpu#5.cpu_load[2]
> >      68024 ±  2%      +18.8%      80825 ±  1%  sched_debug.cpu#5.nr_load_updates
> >        217 ± 26%      +74.9%        380 ± 45%  sched_debug.cpu#5.cpu_load[3]
> >        852 ± 21%      -38.3%        526 ± 22%  sched_debug.cpu#6.curr->pid
> >
> > lkp-st02: Core2
> > Memory: 8G
> >
> >   [ gnuplot ASCII time-series for perf-stat.cache-misses,
> >     perf-stat.L1-dcache-prefetches, perf-stat.LLC-load-misses,
> >     perf-stat.context-switches and vmstat.system.cs: one sample per run,
> >     with '*' marking bisect-good runs and 'O' marking bisect-bad runs ]
> >
> > [*] bisect-good sample
> > [O] bisect-bad sample
> >
> > To reproduce:
> >
> >         apt-get install ruby
> >         git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
> >         cd lkp-tests
> >         bin/setup-local job.yaml  # the job file attached in this email
> >         bin/run-local   job.yaml
> >
> > Disclaimer:
> > Results have been estimated based on internal Intel analysis and are provided
> > for informational purposes only. Any difference in system hardware or software
> > design or configuration may affect actual performance.
> >
> > Thanks,
> > Ying Huang

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
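[Editor's note on reading the comparison table: each row pairs the parent commit's mean (left) with the patched commit's mean (right), with ± N% giving the relative standard deviation across runs, and %change the relative delta of the two means. The sketch below shows that arithmetic, plus an illustrative noise check in the spirit of the bisect-good/bisect-bad split in the plots; the function names and the 3x-spread threshold are mine, not lkp-tests internals.]

```python
import statistics

def pct_change(parent_mean: float, patched_mean: float) -> float:
    """The %change column: relative delta of the patched vs. parent mean."""
    return (patched_mean - parent_mean) / parent_mean * 100.0

def significant_change(good_samples, bad_samples, noise_factor=3.0):
    """Illustrative noise check (not LKP's actual test): flag a metric only
    when the mean shift clearly exceeds the combined per-commit spread."""
    shift = abs(statistics.mean(bad_samples) - statistics.mean(good_samples))
    spread = statistics.stdev(good_samples) + statistics.stdev(bad_samples)
    return shift > noise_factor * spread

# vmstat.system.cs row from the table above: 8266 -> 6968
print(round(pct_change(8266, 6968), 1))   # -15.7, matching the reported -15.7%

# Synthetic samples shaped like the perf-stat.cache-misses plot:
good = [1.21e9, 1.22e9, 1.23e9]   # '*' bisect-good runs
bad  = [1.45e9, 1.46e9, 1.47e9]   # 'O' bisect-bad runs
print(significant_change(good, bad))      # True: large shift, tiny spread
```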