Re: [PATCH V3 2/2] powerpc/pseries: init fault_around_order for pseries

2014-05-06 Thread Ingo Molnar

* Rusty Russell  wrote:

> Ingo Molnar  writes:
> > * Madhavan Srinivasan  wrote:
> >
> >> Performance data for different FAULT_AROUND_ORDER values from 4 socket
> >> Power7 system (128 Threads and 128GB memory). perf stat with repeat of 5
> >> is used to get the stddev values. Test ran in v3.14 kernel (Baseline) and
> >> v3.15-rc1 for different fault around order values.
> >> 
> >> FAULT_AROUND_ORDER             Baseline              1              3              4              5              8
> >>
> >> Linux build (make -j64)
> >>   minor-faults               47,437,359     35,279,286     25,425,347     23,461,275     22,002,189     21,435,836
> >>   times in seconds        347.302528420  344.061588460  340.974022391  348.193508116  348.673900158  350.986543618
> >>   stddev for time              +- 1.50%       +- 0.73%       +- 1.13%       +- 1.01%       +- 1.89%       +- 1.55%
> >>   %chg time to baseline               -          -0.9%          -1.8%           0.2%          0.39%          1.06%
> >
> > Probably too noisy.
> 
> A little, but 3 still looks like the winner.
> 
> >> Linux rebuild (make -j64)
> >>   minor-faults                  941,552        718,319        486,625        440,124        410,510        397,416
> >>   times in seconds         30.569834718   31.219637539   31.319370649   31.434285472   31.972367174   31.443043580
> >>   stddev for time              +- 1.07%       +- 0.13%       +- 0.43%       +- 0.18%       +- 0.95%       +- 0.58%
> >>   %chg time to baseline               -           2.1%           2.4%           2.8%          4.58%          2.85%
> >
> > Here it looks like a speedup. Optimal value: 5+.
> 
> No, lower time is better.  Baseline (no faultaround) wins.
> 
> 
> etc.

ah, yeah, you are right. Brainfart of the week...

> It's not a huge surprise that a 64k page arch wants a smaller value 
> than a 4k system.  But I agree: I don't see much upside for FAO > 0, 
> but I do see downside.
> 
> Most extreme results:
> Order 1: 2% loss on recompile.  10% win 4% loss on seq.  9% loss random.
> Order 3: 2% loss on recompile.  6% win 5% loss on seq.  14% loss on random.
> Order 4: 2.8% loss on recompile. 10% win 7% loss on seq.  9% loss on random.
> 
> > I'm starting to suspect that maybe workloads ought to be given a 
> > choice in this matter, via madvise() or such.
> 
> I really don't think they'll be able to use it; it'll change far too 
> much with machine and kernel updates. [...]

Do we know that?

> [...] I think we should apply patch
> #1 (with fixes) to make it a variable, then set it to 0 for PPC.

Ok, agreed - at least until contrary data comes around.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V3 2/2] powerpc/pseries: init fault_around_order for pseries

2014-04-30 Thread Madhavan Srinivasan
On Wednesday 30 April 2014 12:34 PM, Rusty Russell wrote:
> Ingo Molnar  writes:
>> * Madhavan Srinivasan  wrote:
>>
>>> Performance data for different FAULT_AROUND_ORDER values from 4 socket
>>> Power7 system (128 Threads and 128GB memory). perf stat with repeat of 5
>>> is used to get the stddev values. Test ran in v3.14 kernel (Baseline) and
>>> v3.15-rc1 for different fault around order values.
>>>
>>> FAULT_AROUND_ORDER             Baseline              1              3              4              5              8
>>>
>>> Linux build (make -j64)
>>>   minor-faults               47,437,359     35,279,286     25,425,347     23,461,275     22,002,189     21,435,836
>>>   times in seconds        347.302528420  344.061588460  340.974022391  348.193508116  348.673900158  350.986543618
>>>   stddev for time              +- 1.50%       +- 0.73%       +- 1.13%       +- 1.01%       +- 1.89%       +- 1.55%
>>>   %chg time to baseline               -          -0.9%          -1.8%           0.2%          0.39%          1.06%
>>
>> Probably too noisy.
> 
> A little, but 3 still looks like the winner.
> 
>>> Linux rebuild (make -j64)
>>>   minor-faults                  941,552        718,319        486,625        440,124        410,510        397,416
>>>   times in seconds         30.569834718   31.219637539   31.319370649   31.434285472   31.972367174   31.443043580
>>>   stddev for time              +- 1.07%       +- 0.13%       +- 0.43%       +- 0.18%       +- 0.95%       +- 0.58%
>>>   %chg time to baseline               -           2.1%           2.4%           2.8%          4.58%          2.85%
>>
>> Here it looks like a speedup. Optimal value: 5+.
> 
> No, lower time is better.  Baseline (no faultaround) wins.
> 
> 
> etc.
> 
> It's not a huge surprise that a 64k page arch wants a smaller value than
> a 4k system.  But I agree: I don't see much upside for FAO > 0, but I do
> see downside.
> 
> Most extreme results:
> Order 1: 2% loss on recompile.  10% win 4% loss on seq.  9% loss random.
> Order 3: 2% loss on recompile.  6% win 5% loss on seq.  14% loss on random.
> Order 4: 2.8% loss on recompile. 10% win 7% loss on seq.  9% loss on random.
> 
>> I'm starting to suspect that maybe workloads ought to be given a 
>> choice in this matter, via madvise() or such.
> 
> I really don't think they'll be able to use it; it'll change far too
> much with machine and kernel updates.  I think we should apply patch #1
> (with fixes) to make it a variable, then set it to 0 for PPC.
> 

Ok. Will do.

Thanks for review
With regards
Maddy


> Cheers,
> Rusty.
> 



Re: [PATCH V3 2/2] powerpc/pseries: init fault_around_order for pseries

2014-04-30 Thread Rusty Russell
Ingo Molnar  writes:
> * Madhavan Srinivasan  wrote:
>
>> Performance data for different FAULT_AROUND_ORDER values from 4 socket
>> Power7 system (128 Threads and 128GB memory). perf stat with repeat of 5
>> is used to get the stddev values. Test ran in v3.14 kernel (Baseline) and
>> v3.15-rc1 for different fault around order values.
>> 
>> FAULT_AROUND_ORDER             Baseline              1              3              4              5              8
>>
>> Linux build (make -j64)
>>   minor-faults               47,437,359     35,279,286     25,425,347     23,461,275     22,002,189     21,435,836
>>   times in seconds        347.302528420  344.061588460  340.974022391  348.193508116  348.673900158  350.986543618
>>   stddev for time              +- 1.50%       +- 0.73%       +- 1.13%       +- 1.01%       +- 1.89%       +- 1.55%
>>   %chg time to baseline               -          -0.9%          -1.8%           0.2%          0.39%          1.06%
>
> Probably too noisy.

A little, but 3 still looks like the winner.

>> Linux rebuild (make -j64)
>>   minor-faults                  941,552        718,319        486,625        440,124        410,510        397,416
>>   times in seconds         30.569834718   31.219637539   31.319370649   31.434285472   31.972367174   31.443043580
>>   stddev for time              +- 1.07%       +- 0.13%       +- 0.43%       +- 0.18%       +- 0.95%       +- 0.58%
>>   %chg time to baseline               -           2.1%           2.4%           2.8%          4.58%          2.85%
>
> Here it looks like a speedup. Optimal value: 5+.

No, lower time is better.  Baseline (no faultaround) wins.


etc.
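
To put the sign convention and the noise concern in one place, here is a rough C sketch. It is only a heuristic: treating a change as noise when it is smaller than the sum of the two reported stddev percentages is an assumption made for this example, not a rule anyone in this thread applied, and the names below are made up.

```c
#include <assert.h>

/* Outcome of comparing one run against the baseline (hypothetical names). */
enum verdict { WITHIN_NOISE, SPEEDUP, SLOWDOWN };

/*
 * pct_change:      signed "%chg time to baseline" (negative = less time).
 * stddev_base_pct: perf stat "+-" percentage of the baseline run.
 * stddev_new_pct:  perf stat "+-" percentage of the compared run.
 *
 * Assumption: a change only counts when it exceeds the sum of both
 * stddev percentages. This is a crude, conservative threshold.
 */
enum verdict classify_change(double pct_change,
			     double stddev_base_pct,
			     double stddev_new_pct)
{
	double noise;

	assert(stddev_base_pct >= 0.0 && stddev_new_pct >= 0.0);
	noise = stddev_base_pct + stddev_new_pct;

	if (pct_change < -noise)
		return SPEEDUP;		/* lower time is better */
	if (pct_change > noise)
		return SLOWDOWN;
	return WITHIN_NOISE;
}
```

With the numbers above, order 3 on the Linux build (-1.8%, stddev 1.50% and 1.13%) classifies as within noise, while order 1 on the rebuild (+2.1%, stddev 1.07% and 0.13%) classifies as a slowdown.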

It's not a huge surprise that a 64k page arch wants a smaller value than
a 4k system.  But I agree: I don't see much upside for FAO > 0, but I do
see downside.
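
A back-of-the-envelope sketch of why the page size matters: assuming faultaround maps up to 1 << order pages around the faulting address (and ignoring any clipping at VMA or page-table boundaries), the window in bytes is the page size shifted left by the order. The helper name is made up for illustration.

```c
#include <assert.h>

/*
 * Bytes spanned by one fault-around window, assuming the window is
 * (1 << order) pages of (1 << page_shift) bytes each.
 * fault_around_window_bytes() is a hypothetical helper, not kernel code.
 */
unsigned long long fault_around_window_bytes(unsigned int page_shift,
					     unsigned int order)
{
	assert(page_shift + order < 64);	/* keep the shift well defined */
	return 1ULL << (page_shift + order);
}
```

So with 64k pages (page_shift 16) order 0 already spans 64KiB, the same as order 4 with 4k pages (page_shift 12), and order 4 with 64k pages spans 1MiB.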

Most extreme results:
Order 1: 2% loss on recompile.  10% win 4% loss on seq.  9% loss random.
Order 3: 2% loss on recompile.  6% win 5% loss on seq.  14% loss on random.
Order 4: 2.8% loss on recompile. 10% win 7% loss on seq.  9% loss on random.

> I'm starting to suspect that maybe workloads ought to be given a 
> choice in this matter, via madvise() or such.

I really don't think they'll be able to use it; it'll change far too
much with machine and kernel updates.  I think we should apply patch #1
(with fixes) to make it a variable, then set it to 0 for PPC.

Cheers,
Rusty.


Re: [PATCH V3 2/2] powerpc/pseries: init fault_around_order for pseries

2014-04-29 Thread Madhavan Srinivasan
On Tuesday 29 April 2014 12:36 PM, Ingo Molnar wrote:
> 
> * Madhavan Srinivasan  wrote:
> 
>> Performance data for different FAULT_AROUND_ORDER values from 4 socket
>> Power7 system (128 Threads and 128GB memory). perf stat with repeat of 5
>> is used to get the stddev values. Test ran in v3.14 kernel (Baseline) and
>> v3.15-rc1 for different fault around order values.
>>
>> FAULT_AROUND_ORDER             Baseline              1              3              4              5              8
>>
>> Linux build (make -j64)
>>   minor-faults               47,437,359     35,279,286     25,425,347     23,461,275     22,002,189     21,435,836
>>   times in seconds        347.302528420  344.061588460  340.974022391  348.193508116  348.673900158  350.986543618
>>   stddev for time              +- 1.50%       +- 0.73%       +- 1.13%       +- 1.01%       +- 1.89%       +- 1.55%
>>   %chg time to baseline               -          -0.9%          -1.8%           0.2%          0.39%          1.06%
> 
> Probably too noisy.

Ok. I should have included the formula used for %change to make the data
clearer. My bad.

To clarify, %change here is calculated as:

((new value - baseline) / baseline) * 100

So a negative %change means the run took less time than the baseline,
and a positive %change means it took more time.
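
As a quick cross-check of the convention, here is a minimal C sketch of that calculation. The function name pct_change is made up for this example and is not part of the patch; the factor of 100 assumes the table values are percentages.

```c
#include <assert.h>

/*
 * Percent change of a measurement relative to the baseline run.
 * Negative: new_value is smaller than baseline (less time taken).
 * Positive: new_value is larger than baseline (more time taken).
 * pct_change() is a hypothetical helper name, not kernel code.
 */
double pct_change(double baseline, double new_value)
{
	assert(baseline != 0.0);	/* ratio is undefined for a zero baseline */
	return (new_value - baseline) / baseline * 100.0;
}
```

For the Linux build row, pct_change(347.302528420, 344.061588460) comes out at roughly -0.93, which the table rounds to -0.9%.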

With regards
Maddy

> 
>> Linux rebuild (make -j64)
>>   minor-faults                  941,552        718,319        486,625        440,124        410,510        397,416
>>   times in seconds         30.569834718   31.219637539   31.319370649   31.434285472   31.972367174   31.443043580
>>   stddev for time              +- 1.07%       +- 0.13%       +- 0.43%       +- 0.18%       +- 0.95%       +- 0.58%
>>   %chg time to baseline               -           2.1%           2.4%           2.8%          4.58%          2.85%
> 
> Here it looks like a speedup. Optimal value: 5+.
> 
>> Binutils build (make all -j64 )
>>   minor-faults                  474,821        371,380        269,463        247,715        235,255        228,337
>>   times in seconds         53.882492432   53.584289348   53.882773216   53.755816431   53.607824348   53.423759642
>>   stddev for time              +- 0.08%       +- 0.56%       +- 0.17%       +- 0.11%       +- 0.60%       +- 0.69%
>>   %chg time to baseline               -         -0.55%           0.0%         -0.23%         -0.51%         -0.85%
> 
> Probably too noisy, but looks like a potential slowdown?
> 
>> Two synthetic tests: access every word in file in sequential/random order.
>>
>> Sequential access 16GiB file
>> FAULT_AROUND_ORDER             Baseline              1              3              4              5              8
>> 1 thread
>>   minor-faults                  263,148        131,166         32,908         16,514          8,260          1,093
>>   times in seconds         53.091138345   53.113191672   53.188776177   53.233017218   53.206841347   53.429979442
>>   stddev for time              +- 0.06%       +- 0.07%       +- 0.08%       +- 0.09%       +- 0.03%       +- 0.03%
>>   %chg time to baseline               -          0.04%          0.18%          0.26%          0.21%          0.63%
> 
> Speedup, optimal value: 8+.
> 
>> 8 threads
>>   minor-faults                2,097,267      1,048,753        262,237        131,397         65,621          8,274
>>   times in seconds         55.173790028   54.591880790   54.824623287   54.802162211   54.969680503   54.790387715
>>   stddev for time              +- 0.78%       +- 0.09%       +- 0.08%       +- 0.07%       +- 0.28%       +- 0.05%
>>   %chg time to baseline               -         -1.05%         -0.63%         -0.67%         -0.36%         -0.69%
> 
> Looks like a regression?
> 
>> 32 threads
>>   minor-faults                8,388,751      4,195,621      1,049,664        525,461        262,535         32,924
>>   times in seconds         60.431573046   60.669110744   60.485336388   60.697789706   60.077959564   60.588855032
>>   stddev for time              +- 0.44%       +- 0.27%       +- 0.46%       +- 0.67%       +- 0.31%       +- 0.49%
>>   %chg time to baseline               -          0.39%          0.08%          0.44%         -0.58%          0.25%
> 
> Probably too noisy.
> 
>> 64 threads
>>   minor-faults               16,777,409      8,607,527      2,289,766      1,202,264        598,405         67,587
>>   times in seconds         96.932617720  100.675418760  102.109880836  103.881733383  102.580199555  105.751194041
>>   stddev for time              +- 1.39%       +- 1.06%       +- 0.99%       +- 0.76%       +- 1.65%       +- 1.60%
>>   %chg time to baseline               -          3.86%          5.34%          7.16%          5.82%          9.09%
> 
> Speedup, optimal value: 4+
> 
>> 128 threads
>>   minor-faults               33,554,705     17,375,375      4,682,462      2,337,245      1,179

Re: [PATCH V3 2/2] powerpc/pseries: init fault_around_order for pseries

2014-04-29 Thread Madhavan Srinivasan
On Tuesday 29 April 2014 07:48 AM, Rusty Russell wrote:
> Madhavan Srinivasan  writes:
>> diff --git a/arch/powerpc/platforms/pseries/setup.c 
>> b/arch/powerpc/platforms/pseries/setup.c
>> index 2db8cc6..c87e6b6 100644
>> --- a/arch/powerpc/platforms/pseries/setup.c
>> +++ b/arch/powerpc/platforms/pseries/setup.c
>> @@ -74,6 +74,8 @@ int CMO_SecPSP = -1;
>>  unsigned long CMO_PageSize = (ASM_CONST(1) << IOMMU_PAGE_SHIFT_4K);
>>  EXPORT_SYMBOL(CMO_PageSize);
>>  
>> +extern unsigned int fault_around_order;
>> +
> 
> It's considered bad form to do this.  Put the declaration in linux/mm.h.
> 

ok. Will change it.

Thanks for review
With regards
Maddy

> Thanks,
> Rusty.
> PS.  But we're getting there! :)
> 



Re: [PATCH V3 2/2] powerpc/pseries: init fault_around_order for pseries

2014-04-29 Thread Ingo Molnar

* Madhavan Srinivasan  wrote:

> Performance data for different FAULT_AROUND_ORDER values from 4 socket
> Power7 system (128 Threads and 128GB memory). perf stat with repeat of 5
> is used to get the stddev values. Test ran in v3.14 kernel (Baseline) and
> v3.15-rc1 for different fault around order values.
> 
> FAULT_AROUND_ORDER             Baseline              1              3              4              5              8
>
> Linux build (make -j64)
>   minor-faults               47,437,359     35,279,286     25,425,347     23,461,275     22,002,189     21,435,836
>   times in seconds        347.302528420  344.061588460  340.974022391  348.193508116  348.673900158  350.986543618
>   stddev for time              +- 1.50%       +- 0.73%       +- 1.13%       +- 1.01%       +- 1.89%       +- 1.55%
>   %chg time to baseline               -          -0.9%          -1.8%           0.2%          0.39%          1.06%

Probably too noisy.

> Linux rebuild (make -j64)
>   minor-faults                  941,552        718,319        486,625        440,124        410,510        397,416
>   times in seconds         30.569834718   31.219637539   31.319370649   31.434285472   31.972367174   31.443043580
>   stddev for time              +- 1.07%       +- 0.13%       +- 0.43%       +- 0.18%       +- 0.95%       +- 0.58%
>   %chg time to baseline               -           2.1%           2.4%           2.8%          4.58%          2.85%

Here it looks like a speedup. Optimal value: 5+.

> Binutils build (make all -j64 )
>   minor-faults                  474,821        371,380        269,463        247,715        235,255        228,337
>   times in seconds         53.882492432   53.584289348   53.882773216   53.755816431   53.607824348   53.423759642
>   stddev for time              +- 0.08%       +- 0.56%       +- 0.17%       +- 0.11%       +- 0.60%       +- 0.69%
>   %chg time to baseline               -         -0.55%           0.0%         -0.23%         -0.51%         -0.85%

Probably too noisy, but looks like a potential slowdown?

> Two synthetic tests: access every word in file in sequential/random order.
> 
> Sequential access 16GiB file
> FAULT_AROUND_ORDER             Baseline              1              3              4              5              8
> 1 thread
>   minor-faults                  263,148        131,166         32,908         16,514          8,260          1,093
>   times in seconds         53.091138345   53.113191672   53.188776177   53.233017218   53.206841347   53.429979442
>   stddev for time              +- 0.06%       +- 0.07%       +- 0.08%       +- 0.09%       +- 0.03%       +- 0.03%
>   %chg time to baseline               -          0.04%          0.18%          0.26%          0.21%          0.63%

Speedup, optimal value: 8+.

> 8 threads
>   minor-faults                2,097,267      1,048,753        262,237        131,397         65,621          8,274
>   times in seconds         55.173790028   54.591880790   54.824623287   54.802162211   54.969680503   54.790387715
>   stddev for time              +- 0.78%       +- 0.09%       +- 0.08%       +- 0.07%       +- 0.28%       +- 0.05%
>   %chg time to baseline               -         -1.05%         -0.63%         -0.67%         -0.36%         -0.69%

Looks like a regression?

> 32 threads
>   minor-faults                8,388,751      4,195,621      1,049,664        525,461        262,535         32,924
>   times in seconds         60.431573046   60.669110744   60.485336388   60.697789706   60.077959564   60.588855032
>   stddev for time              +- 0.44%       +- 0.27%       +- 0.46%       +- 0.67%       +- 0.31%       +- 0.49%
>   %chg time to baseline               -          0.39%          0.08%          0.44%         -0.58%          0.25%

Probably too noisy.

> 64 threads
>   minor-faults               16,777,409      8,607,527      2,289,766      1,202,264        598,405         67,587
>   times in seconds         96.932617720  100.675418760  102.109880836  103.881733383  102.580199555  105.751194041
>   stddev for time              +- 1.39%       +- 1.06%       +- 0.99%       +- 0.76%       +- 1.65%       +- 1.60%
>   %chg time to baseline               -          3.86%          5.34%          7.16%          5.82%          9.09%

Speedup, optimal value: 4+

> 128 threads
>   minor-faults               33,554,705     17,375,375      4,682,462      2,337,245      1,179,007        134,819
>   times in seconds        128.766704495  115.659225437  120.353046307  115.291871270  115.450886036  113.991902150
>   stddev for time              +- 2.93%       +- 0.30%       +- 2.93%       +- 1.24%       +- 1.03%       +- 0.70%
>   %chg time to baseline               -        -10.17%         -6.53%        -10.46%        -10.34%        -11.47%

Rather significant regression at order 1 already.

> Random access 1GiB file
> FAULT_AROUND_ORDER             Baseline              1              3

Re: [PATCH V3 2/2] powerpc/pseries: init fault_around_order for pseries

2014-04-28 Thread Rusty Russell
Madhavan Srinivasan  writes:
> diff --git a/arch/powerpc/platforms/pseries/setup.c 
> b/arch/powerpc/platforms/pseries/setup.c
> index 2db8cc6..c87e6b6 100644
> --- a/arch/powerpc/platforms/pseries/setup.c
> +++ b/arch/powerpc/platforms/pseries/setup.c
> @@ -74,6 +74,8 @@ int CMO_SecPSP = -1;
>  unsigned long CMO_PageSize = (ASM_CONST(1) << IOMMU_PAGE_SHIFT_4K);
>  EXPORT_SYMBOL(CMO_PageSize);
>  
> +extern unsigned int fault_around_order;
> +

It's considered bad form to do this.  Put the declaration in linux/mm.h.
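
Roughly, the layout being asked for looks like the sketch below. This is a single-file userspace mock-up, not the actual patch: the comments mark which kernel file each piece would belong to, mm/memory.c as the defining file is an assumption, the initial value of 4 and the pseries value of 0 are placeholders, and pseries_init_fault_around() is a made-up name.

```c
#include <assert.h>

/* --- would go in include/linux/mm.h: one shared declaration --- */
extern unsigned int fault_around_order;

/* --- would stay in the defining .c file (assumed: mm/memory.c) --- */
unsigned int fault_around_order = 4;	/* placeholder default */

/* --- would go in arch/powerpc/platforms/pseries/setup.c --- */
/* setup.c would include <linux/mm.h> and carry no private extern line. */
void pseries_init_fault_around(void);	/* hypothetical helper */

void pseries_init_fault_around(void)
{
	const unsigned int order = 0;	/* placeholder platform value */

	assert(order < 32);		/* keep 1 << order well defined */
	fault_around_order = order;
}
```

The point is that every user sees the same declaration, so the compiler can catch a type mismatch if the definition ever changes.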

Thanks,
Rusty.
PS.  But we're getting there! :)


[PATCH V3 2/2] powerpc/pseries: init fault_around_order for pseries

2014-04-28 Thread Madhavan Srinivasan
Performance data for different FAULT_AROUND_ORDER values from a 4-socket
Power7 system (128 threads and 128GB memory). perf stat with a repeat count
of 5 was used to get the stddev values. Tests were run on a v3.14 kernel
(Baseline) and on v3.15-rc1 with different fault-around order values.

FAULT_AROUND_ORDER             Baseline              1              3              4              5              8

Linux build (make -j64)
  minor-faults               47,437,359     35,279,286     25,425,347     23,461,275     22,002,189     21,435,836
  times in seconds        347.302528420  344.061588460  340.974022391  348.193508116  348.673900158  350.986543618
  stddev for time              +- 1.50%       +- 0.73%       +- 1.13%       +- 1.01%       +- 1.89%       +- 1.55%
  %chg time to baseline               -          -0.9%          -1.8%           0.2%          0.39%          1.06%

Linux rebuild (make -j64)
  minor-faults                  941,552        718,319        486,625        440,124        410,510        397,416
  times in seconds         30.569834718   31.219637539   31.319370649   31.434285472   31.972367174   31.443043580
  stddev for time              +- 1.07%       +- 0.13%       +- 0.43%       +- 0.18%       +- 0.95%       +- 0.58%
  %chg time to baseline               -           2.1%           2.4%           2.8%          4.58%          2.85%

Binutils build (make all -j64 )
  minor-faults                  474,821        371,380        269,463        247,715        235,255        228,337
  times in seconds         53.882492432   53.584289348   53.882773216   53.755816431   53.607824348   53.423759642
  stddev for time              +- 0.08%       +- 0.56%       +- 0.17%       +- 0.11%       +- 0.60%       +- 0.69%
  %chg time to baseline               -         -0.55%           0.0%         -0.23%         -0.51%         -0.85%

Two synthetic tests: access every word in file in sequential/random order.

Sequential access 16GiB file
FAULT_AROUND_ORDER             Baseline              1              3              4              5              8
1 thread
  minor-faults                  263,148        131,166         32,908         16,514          8,260          1,093
  times in seconds         53.091138345   53.113191672   53.188776177   53.233017218   53.206841347   53.429979442
  stddev for time              +- 0.06%       +- 0.07%       +- 0.08%       +- 0.09%       +- 0.03%       +- 0.03%
  %chg time to baseline               -          0.04%          0.18%          0.26%          0.21%          0.63%
8 threads
  minor-faults                2,097,267      1,048,753        262,237        131,397         65,621          8,274
  times in seconds         55.173790028   54.591880790   54.824623287   54.802162211   54.969680503   54.790387715
  stddev for time              +- 0.78%       +- 0.09%       +- 0.08%       +- 0.07%       +- 0.28%       +- 0.05%
  %chg time to baseline               -         -1.05%         -0.63%         -0.67%         -0.36%         -0.69%
32 threads
  minor-faults                8,388,751      4,195,621      1,049,664        525,461        262,535         32,924
  times in seconds         60.431573046   60.669110744   60.485336388   60.697789706   60.077959564   60.588855032
  stddev for time              +- 0.44%       +- 0.27%       +- 0.46%       +- 0.67%       +- 0.31%       +- 0.49%
  %chg time to baseline               -          0.39%          0.08%          0.44%         -0.58%          0.25%
64 threads
  minor-faults               16,777,409      8,607,527      2,289,766      1,202,264        598,405         67,587
  times in seconds         96.932617720  100.675418760  102.109880836  103.881733383  102.580199555  105.751194041
  stddev for time              +- 1.39%       +- 1.06%       +- 0.99%       +- 0.76%       +- 1.65%       +- 1.60%
  %chg time to baseline               -          3.86%          5.34%          7.16%          5.82%          9.09%
128 threads
  minor-faults               33,554,705     17,375,375      4,682,462      2,337,245      1,179,007        134,819
  times in seconds        128.766704495  115.659225437  120.353046307  115.291871270  115.450886036  113.991902150
  stddev for time              +- 2.93%       +- 0.30%       +- 2.93%       +- 1.24%       +- 1.03%       +- 0.70%
  %chg time to baseline               -        -10.17%         -6.53%        -10.46%        -10.34%        -11.47%

Random access 1GiB file
FAULT_AROUND_ORDER             Baseline              1              3              4              5              8
1 thread
  minor-faults                   17,155          8,678          2,126          1,097            581            134
  times in seconds         51.904430523   51.658017987   51.919270792   51.560531738   52.354431597   51.976469502
  stddev for time              +- 3.19%       +- 1.35%       +- 1.56%       +- 0.91%       +- 1.70%       +- 2.02%
  %chg time to baseline               -         -0.47%          0.02%         -0.66%          0.8