Re: [LKP] [RAID5] 878ee679279: -1.8% vmstat.io.bo, +40.5% perf-stat.LLC-load-misses

2015-04-29 Thread Yuanhan Liu
On Fri, Apr 24, 2015 at 12:15:59PM +1000, NeilBrown wrote:
> On Thu, 23 Apr 2015 14:55:59 +0800 Huang Ying  wrote:
> 
> > FYI, we noticed the below changes on
> > 
> > git://neil.brown.name/md for-next
> > commit 878ee6792799e2f88bdcac329845efadb205252f ("RAID5: batch adjacent 
> > full stripe write")
> 
> Hi,
>  is there any chance that you could explain what some of this means?
> There is lots of data and some very pretty graphs, but no explanation.

Hi Neil,

(Sorry for the late response: Ying is on vacation.)

I guess you can simply ignore this report, as I already reported to you a
month ago that this patch makes fsmark perform better in most cases:

https://lists.01.org/pipermail/lkp/2015-March/002411.html

> 
> Which numbers are "good", which are "bad"?  Which is "worst"?
> What do the graphs really show, and what would we like to see in them?
> 
> I think it is really great that you are doing this testing and reporting the
> results.  It's just so sad that I completely fail to understand them.

Sorry, it's our fault that the reports are hard to understand, and that we
sent a duplicate one (well, the commit hash is different ;).

We need to take some time to make the data easier to understand.
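
In the meantime, here is how I read one row of the comparison table quoted
below: the left pair of columns is the mean and its %stddev on the parent
commit (a87d7f782b47e030), the right pair is the same for the patched commit
(878ee6792799e2f88bdcac3298), and %change is the relative difference of the
two means. A small Python sketch with numbers copied from the table (the
helper name is mine):

    def pct_change(parent, patched):
        # relative change of the patched mean vs. the parent mean, in percent
        return (patched - parent) / parent * 100.0

    # softirqs.SCHED: 59035 -> 69913, reported as +18.4%
    print(f"softirqs.SCHED: {pct_change(59035, 69913):+.1f}%")     # +18.4%

    # Headline LLC counters: misses rose while total loads fell, so the
    # LLC load-miss ratio roughly goes from ~8.9% to ~14.5%.
    print(f"LLC miss ratio, parent:  {5.571e8 / 6.263e9:.1%}")     # 8.9%
    print(f"LLC miss ratio, patched: {7.826e8 / 5.407e9:.1%}")     # 14.5%

Whether that matters for this dd workload is a separate question; as noted
above, the fsmark numbers looked better with the patch.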

--yliu

> 
> > 
> > 
> > testbox/testcase/testparams: lkp-st02/dd-write/300-5m-11HDD-RAID5-cfq-xfs-1dd
> > 
> > a87d7f782b47e030  878ee6792799e2f88bdcac3298
> > ----------------  --------------------------
> >          %stddev     %change         %stddev
> >              \          |                \
> >  59035 ±  0% +18.4%  69913 ±  1%  softirqs.SCHED
> >   1330 ± 10% +17.4%   1561 ±  4%  slabinfo.kmalloc-512.num_objs
> >   1330 ± 10% +17.4%   1561 ±  4%  slabinfo.kmalloc-512.active_objs
> > 305908 ±  0%  -1.8% 300427 ±  0%  vmstat.io.bo
> >  1 ±  0%+100.0%  2 ±  0%  vmstat.procs.r
> >   8266 ±  1% -15.7%   6968 ±  0%  vmstat.system.cs
> >  14819 ±  0%  -2.1%  14503 ±  0%  vmstat.system.in
> >  18.20 ±  6% +10.2%  20.05 ±  4%  perf-profile.cpu-cycles.raid_run_ops.handle_stripe.handle_active_stripes.raid5d.md_thread
> >   1.94 ±  9% +90.6%   3.70 ±  9%  perf-profile.cpu-cycles.async_xor.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
> >   0.00 ±  0%  +Inf%  25.18 ±  3%  perf-profile.cpu-cycles.handle_active_stripes.isra.45.raid5d.md_thread.kthread.ret_from_fork
> >   0.00 ±  0%  +Inf%  14.14 ±  4%  perf-profile.cpu-cycles.async_copy_data.isra.42.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
> >   1.79 ±  7%+102.9%   3.64 ±  9%  perf-profile.cpu-cycles.xor_blocks.async_xor.raid_run_ops.handle_stripe.handle_active_stripes
> >   3.09 ±  4% -10.8%   2.76 ±  4%  perf-profile.cpu-cycles.get_active_stripe.make_request.md_make_request.generic_make_request.submit_bio
> >   0.80 ± 14% +28.1%   1.02 ± 10%  perf-profile.cpu-cycles.mutex_lock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
> >  14.78 ±  6%-100.0%   0.00 ±  0%  perf-profile.cpu-cycles.async_copy_data.isra.38.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
> >  25.68 ±  4%-100.0%   0.00 ±  0%  perf-profile.cpu-cycles.handle_active_stripes.isra.41.raid5d.md_thread.kthread.ret_from_fork
> >   1.23 ±  5%+140.0%   2.96 ±  7%  perf-profile.cpu-cycles.xor_sse_5_pf64.xor_blocks.async_xor.raid_run_ops.handle_stripe
> >   2.62 ±  6% -95.6%   0.12 ± 33%  perf-profile.cpu-cycles.analyse_stripe.handle_stripe.handle_active_stripes.raid5d.md_thread
> >   0.96 ±  9% +17.5%   1.12 ±  2%  perf-profile.cpu-cycles.xfs_ilock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
> >  1.461e+10 ±  0%  -5.3%  1.384e+10 ±  1%  perf-stat.L1-dcache-load-misses
> >  3.688e+11 ±  0%  -2.7%   3.59e+11 ±  0%  perf-stat.L1-dcache-loads
> >  1.124e+09 ±  0% -27.7%  8.125e+08 ±  0%  perf-stat.L1-dcache-prefetches
> >  2.767e+10 ±  0%  -1.8%  2.717e+10 ±  0%  perf-stat.L1-dcache-store-misses
> >  2.352e+11 ±  0%  -2.8%  2.287e+11 ±  0%  perf-stat.L1-dcache-stores
> >  6.774e+09 ±  0%  -2.3%   6.62e+09 ±  0%  perf-stat.L1-icache-load-misses
> >  5.571e+08 ±  0% +40.5%  7.826e+08 ±  1%  perf-stat.LLC-load-misses
> >  6.263e+09 ±  0% -13.7%  5.407e+09 ±  1%  perf-stat.LLC-loads
> >  1.914e+11 ±  0%  -4.2%  1.833e+11 ±  0%  perf-stat.branch-instructions
> >  1.145e+09 ±  2%  -5.6%  1.081e+09 ±  0%  perf-stat.branch-load-misses
> >  1.911e+11 ±  0%  -4.3%  1.829e+11 ±  0%  perf-stat.branch-loads
> >  1.142e+09 ±  2%  -5.1%  1.083e+09 ±  0%  perf-stat.branch-misses
> >  1.218e+09 ±  0% +19.8%   1.46e+09 ±  0%  perf-stat.cache-misses
> >  2.118e+10 ±  0%  -5.2%  2.007e+10 ±  0%  perf-stat.cache-references
> >    2510308 ±  1% -15.7%    2115410 ±  0%  perf-stat.context-switches

Re: [LKP] [RAID5] 878ee679279: -1.8% vmstat.io.bo, +40.5% perf-stat.LLC-load-misses

2015-04-23 Thread NeilBrown
On Thu, 23 Apr 2015 14:55:59 +0800 Huang Ying  wrote:

> FYI, we noticed the below changes on
> 
> git://neil.brown.name/md for-next
> commit 878ee6792799e2f88bdcac329845efadb205252f ("RAID5: batch adjacent full 
> stripe write")

Hi,
 is there any chance that you could explain what some of this means?
There is lots of data and some very pretty graphs, but no explanation.

Which numbers are "good", which are "bad"?  Which is "worst"?
What do the graphs really show, and what would we like to see in them?

I think it is really great that you are doing this testing and reporting the
results.  It's just so sad that I completely fail to understand them.

Thanks,
NeilBrown

> 
> 
> testbox/testcase/testparams: lkp-st02/dd-write/300-5m-11HDD-RAID5-cfq-xfs-1dd
> 
> a87d7f782b47e030  878ee6792799e2f88bdcac3298
> ----------------  --------------------------
>          %stddev     %change         %stddev
>              \          |                \
>  59035 ±  0% +18.4%  69913 ±  1%  softirqs.SCHED
>   1330 ± 10% +17.4%   1561 ±  4%  slabinfo.kmalloc-512.num_objs
>   1330 ± 10% +17.4%   1561 ±  4%  slabinfo.kmalloc-512.active_objs
> 305908 ±  0%  -1.8% 300427 ±  0%  vmstat.io.bo
>  1 ±  0%+100.0%  2 ±  0%  vmstat.procs.r
>   8266 ±  1% -15.7%   6968 ±  0%  vmstat.system.cs
>  14819 ±  0%  -2.1%  14503 ±  0%  vmstat.system.in
>  18.20 ±  6% +10.2%  20.05 ±  4%  perf-profile.cpu-cycles.raid_run_ops.handle_stripe.handle_active_stripes.raid5d.md_thread
>   1.94 ±  9% +90.6%   3.70 ±  9%  perf-profile.cpu-cycles.async_xor.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
>   0.00 ±  0%  +Inf%  25.18 ±  3%  perf-profile.cpu-cycles.handle_active_stripes.isra.45.raid5d.md_thread.kthread.ret_from_fork
>   0.00 ±  0%  +Inf%  14.14 ±  4%  perf-profile.cpu-cycles.async_copy_data.isra.42.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
>   1.79 ±  7%+102.9%   3.64 ±  9%  perf-profile.cpu-cycles.xor_blocks.async_xor.raid_run_ops.handle_stripe.handle_active_stripes
>   3.09 ±  4% -10.8%   2.76 ±  4%  perf-profile.cpu-cycles.get_active_stripe.make_request.md_make_request.generic_make_request.submit_bio
>   0.80 ± 14% +28.1%   1.02 ± 10%  perf-profile.cpu-cycles.mutex_lock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
>  14.78 ±  6%-100.0%   0.00 ±  0%  perf-profile.cpu-cycles.async_copy_data.isra.38.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
>  25.68 ±  4%-100.0%   0.00 ±  0%  perf-profile.cpu-cycles.handle_active_stripes.isra.41.raid5d.md_thread.kthread.ret_from_fork
>   1.23 ±  5%+140.0%   2.96 ±  7%  perf-profile.cpu-cycles.xor_sse_5_pf64.xor_blocks.async_xor.raid_run_ops.handle_stripe
>   2.62 ±  6% -95.6%   0.12 ± 33%  perf-profile.cpu-cycles.analyse_stripe.handle_stripe.handle_active_stripes.raid5d.md_thread
>   0.96 ±  9% +17.5%   1.12 ±  2%  perf-profile.cpu-cycles.xfs_ilock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
>  1.461e+10 ±  0%  -5.3%  1.384e+10 ±  1%  perf-stat.L1-dcache-load-misses
>  3.688e+11 ±  0%  -2.7%   3.59e+11 ±  0%  perf-stat.L1-dcache-loads
>  1.124e+09 ±  0% -27.7%  8.125e+08 ±  0%  perf-stat.L1-dcache-prefetches
>  2.767e+10 ±  0%  -1.8%  2.717e+10 ±  0%  perf-stat.L1-dcache-store-misses
>  2.352e+11 ±  0%  -2.8%  2.287e+11 ±  0%  perf-stat.L1-dcache-stores
>  6.774e+09 ±  0%  -2.3%   6.62e+09 ±  0%  perf-stat.L1-icache-load-misses
>  5.571e+08 ±  0% +40.5%  7.826e+08 ±  1%  perf-stat.LLC-load-misses
>  6.263e+09 ±  0% -13.7%  5.407e+09 ±  1%  perf-stat.LLC-loads
>  1.914e+11 ±  0%  -4.2%  1.833e+11 ±  0%  perf-stat.branch-instructions
>  1.145e+09 ±  2%  -5.6%  1.081e+09 ±  0%  perf-stat.branch-load-misses
>  1.911e+11 ±  0%  -4.3%  1.829e+11 ±  0%  perf-stat.branch-loads
>  1.142e+09 ±  2%  -5.1%  1.083e+09 ±  0%  perf-stat.branch-misses
>  1.218e+09 ±  0% +19.8%   1.46e+09 ±  0%  perf-stat.cache-misses
>  2.118e+10 ±  0%  -5.2%  2.007e+10 ±  0%  perf-stat.cache-references
>    2510308 ±  1% -15.7%    2115410 ±  0%  perf-stat.context-switches
>  39623 ±  0% +22.1%  48370 ±  1%  perf-stat.cpu-migrations
>  4.179e+08 ± 40%+165.7%  1.111e+09 ± 35%  perf-stat.dTLB-load-misses
>  3.684e+11 ±  0%  -2.5%  3.592e+11 ±  0%  perf-stat.dTLB-loads
>  1.232e+08 ± 15% +62.5%  2.002e+08 ± 27%  perf-stat.dTLB-store-misses
>  2.348e+11 ±  0%  -2.5%  2.288e+11 ±  0%  perf-stat.dTLB-stores
>    3577297 ±  2%  +8.7%    3888986 ±  1%  perf-stat.iTLB-load-misses
>  1.035e+12 ±  0%  -3.5%  9.988e+11 ±  0%  perf-stat.iTLB-loads
>  1.036e+12 ±  0%  -3.7%  9.978e+11 ±  0%  perf-stat.instructions
>    594 ± 30%+130.3%   1369 ± 13%  sched_debug.cfs_rq[0]:/.blocked_load_avg

[LKP] [RAID5] 878ee679279: -1.8% vmstat.io.bo, +40.5% perf-stat.LLC-load-misses

2015-04-22 Thread Huang Ying
FYI, we noticed the below changes on

git://neil.brown.name/md for-next
commit 878ee6792799e2f88bdcac329845efadb205252f ("RAID5: batch adjacent full 
stripe write")


testbox/testcase/testparams: lkp-st02/dd-write/300-5m-11HDD-RAID5-cfq-xfs-1dd
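
For reference, the parameter string above seems to describe the storage setup
and workload. Below is a rough, hypothetical reconstruction (not the actual
LKP job file): the device names are invented, and reading "300" as the runtime
in seconds and "5m" as the dd block size are assumptions.

    # Hypothetical sketch of "300-5m-11HDD-RAID5-cfq-xfs-1dd" (not the real LKP job).
    import os
    import subprocess
    import time

    def sh(*cmd):
        subprocess.run(cmd, check=True)

    disks = [f"/dev/sd{c}" for c in "bcdefghijkl"]        # 11 HDDs (invented names)
    sh("mdadm", "--create", "/dev/md0", "--level=5",
       f"--raid-devices={len(disks)}", *disks)            # RAID5 over the 11 disks
    for d in disks:                                       # cfq I/O scheduler per disk
        with open(f"/sys/block/{os.path.basename(d)}/queue/scheduler", "w") as f:
            f.write("cfq")
    sh("mkfs.xfs", "-f", "/dev/md0")                      # xfs on the array
    os.makedirs("/mnt/raid5", exist_ok=True)
    sh("mount", "/dev/md0", "/mnt/raid5")
    dd = subprocess.Popen(["dd", "if=/dev/zero",          # a single dd writer ("1dd")
                           "of=/mnt/raid5/testfile", "bs=5M"])
    time.sleep(300)                                       # assumed 300-second run
    dd.terminate()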

a87d7f782b47e030  878ee6792799e2f88bdcac3298
----------------  --------------------------
         %stddev     %change         %stddev
             \          |                \
 59035 ±  0% +18.4%  69913 ±  1%  softirqs.SCHED
  1330 ± 10% +17.4%   1561 ±  4%  slabinfo.kmalloc-512.num_objs
  1330 ± 10% +17.4%   1561 ±  4%  slabinfo.kmalloc-512.active_objs
305908 ±  0%  -1.8% 300427 ±  0%  vmstat.io.bo
 1 ±  0%+100.0%  2 ±  0%  vmstat.procs.r
  8266 ±  1% -15.7%   6968 ±  0%  vmstat.system.cs
 14819 ±  0%  -2.1%  14503 ±  0%  vmstat.system.in
 18.20 ±  6% +10.2%  20.05 ±  4%  perf-profile.cpu-cycles.raid_run_ops.handle_stripe.handle_active_stripes.raid5d.md_thread
  1.94 ±  9% +90.6%   3.70 ±  9%  perf-profile.cpu-cycles.async_xor.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
  0.00 ±  0%  +Inf%  25.18 ±  3%  perf-profile.cpu-cycles.handle_active_stripes.isra.45.raid5d.md_thread.kthread.ret_from_fork
  0.00 ±  0%  +Inf%  14.14 ±  4%  perf-profile.cpu-cycles.async_copy_data.isra.42.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
  1.79 ±  7%+102.9%   3.64 ±  9%  perf-profile.cpu-cycles.xor_blocks.async_xor.raid_run_ops.handle_stripe.handle_active_stripes
  3.09 ±  4% -10.8%   2.76 ±  4%  perf-profile.cpu-cycles.get_active_stripe.make_request.md_make_request.generic_make_request.submit_bio
  0.80 ± 14% +28.1%   1.02 ± 10%  perf-profile.cpu-cycles.mutex_lock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
 14.78 ±  6%-100.0%   0.00 ±  0%  perf-profile.cpu-cycles.async_copy_data.isra.38.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
 25.68 ±  4%-100.0%   0.00 ±  0%  perf-profile.cpu-cycles.handle_active_stripes.isra.41.raid5d.md_thread.kthread.ret_from_fork
  1.23 ±  5%+140.0%   2.96 ±  7%  perf-profile.cpu-cycles.xor_sse_5_pf64.xor_blocks.async_xor.raid_run_ops.handle_stripe
  2.62 ±  6% -95.6%   0.12 ± 33%  perf-profile.cpu-cycles.analyse_stripe.handle_stripe.handle_active_stripes.raid5d.md_thread
  0.96 ±  9% +17.5%   1.12 ±  2%  perf-profile.cpu-cycles.xfs_ilock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
 1.461e+10 ±  0%  -5.3%  1.384e+10 ±  1%  perf-stat.L1-dcache-load-misses
 3.688e+11 ±  0%  -2.7%   3.59e+11 ±  0%  perf-stat.L1-dcache-loads
 1.124e+09 ±  0% -27.7%  8.125e+08 ±  0%  perf-stat.L1-dcache-prefetches
 2.767e+10 ±  0%  -1.8%  2.717e+10 ±  0%  perf-stat.L1-dcache-store-misses
 2.352e+11 ±  0%  -2.8%  2.287e+11 ±  0%  perf-stat.L1-dcache-stores
 6.774e+09 ±  0%  -2.3%   6.62e+09 ±  0%  perf-stat.L1-icache-load-misses
 5.571e+08 ±  0% +40.5%  7.826e+08 ±  1%  perf-stat.LLC-load-misses
 6.263e+09 ±  0% -13.7%  5.407e+09 ±  1%  perf-stat.LLC-loads
 1.914e+11 ±  0%  -4.2%  1.833e+11 ±  0%  perf-stat.branch-instructions
 1.145e+09 ±  2%  -5.6%  1.081e+09 ±  0%  perf-stat.branch-load-misses
 1.911e+11 ±  0%  -4.3%  1.829e+11 ±  0%  perf-stat.branch-loads
 1.142e+09 ±  2%  -5.1%  1.083e+09 ±  0%  perf-stat.branch-misses
 1.218e+09 ±  0% +19.8%   1.46e+09 ±  0%  perf-stat.cache-misses
 2.118e+10 ±  0%  -5.2%  2.007e+10 ±  0%  perf-stat.cache-references
    2510308 ±  1% -15.7%    2115410 ±  0%  perf-stat.context-switches
 39623 ±  0% +22.1%  48370 ±  1%  perf-stat.cpu-migrations
 4.179e+08 ± 40%+165.7%  1.111e+09 ± 35%  perf-stat.dTLB-load-misses
 3.684e+11 ±  0%  -2.5%  3.592e+11 ±  0%  perf-stat.dTLB-loads
 1.232e+08 ± 15% +62.5%  2.002e+08 ± 27%  perf-stat.dTLB-store-misses
 2.348e+11 ±  0%  -2.5%  2.288e+11 ±  0%  perf-stat.dTLB-stores
    3577297 ±  2%  +8.7%    3888986 ±  1%  perf-stat.iTLB-load-misses
 1.035e+12 ±  0%  -3.5%  9.988e+11 ±  0%  perf-stat.iTLB-loads
 1.036e+12 ±  0%  -3.7%  9.978e+11 ±  0%  perf-stat.instructions
    594 ± 30%+130.3%   1369 ± 13%  sched_debug.cfs_rq[0]:/.blocked_load_avg
     17 ± 10% -28.2%     12 ± 23%  sched_debug.cfs_rq[0]:/.nr_spread_over
    210 ± 21% +42.1%    298 ± 28%  sched_debug.cfs_rq[0]:/.tg_runnable_contrib
   9676 ± 21% +42.1%  13754 ± 28%  sched_debug.cfs_rq[0]:/.avg->runnable_avg_sum
    772 ± 25%+116.5%   1672 ±  9%  sched_debug.cfs_rq[0]:/.tg_load_contrib
   8402 ±  9% +83.3%  15405 ± 11%  sched_debug.cfs_rq[0]:/.tg_load_avg
   8356 ±  9% +82.8%  15272 ± 11%  sched_debug.cfs_rq[1]:/.tg_load_avg
    968 ± 25%+100.8%   1943 ± 14%  sched_debug.cfs_rq[1]:/.blocked_load_avg
 16242 ±  9%