* Aubrey Li <aubrey.in...@gmail.com> wrote:
> On Sun, Apr 28, 2019 at 5:33 PM Ingo Molnar <mi...@kernel.org> wrote:
> >
> > So because I'm a big fan of presenting data in a readable fashion,
> > here are your results, tabulated:
>
> I thought I tried my best to make it readable, but this one looks much
> better, thanks, ;-)
>
> > #
> > # Sysbench throughput comparison of 3 different kernels at different
> > # load levels, higher numbers are better:
> > #
> >
> > .--------------------------------------|----------------------------------------------------------------.
> > | NA/AVX    vanilla-SMT  [stddev%]     | coresched-SMT  [stddev%]    +/-  |  no-SMT   [stddev%]    +/-   |
> > |--------------------------------------|----------------------------------------------------------------|
> > |   1/1         508.5    [  0.2% ]     |      504.7     [  1.1% ]    0.8% |    509.0  [  0.2% ]    0.1%  |
> > |   2/2        1000.2    [  1.4% ]     |     1004.1     [  1.6% ]    0.4% |    997.6  [  1.2% ]    0.3%  |
> > |   4/4        1912.1    [  1.0% ]     |     1904.2     [  1.1% ]    0.4% |   1914.9  [  1.3% ]    0.1%  |
> > |   8/8        3753.5    [  0.3% ]     |     3748.2     [  0.3% ]    0.1% |   3751.3  [  0.4% ]    0.1%  |
> > |  16/16       7139.3    [  2.4% ]     |     7137.9     [  1.8% ]    0.0% |   7049.2  [  2.4% ]    1.3%  |
> > |  32/32      10899.0    [  4.2% ]     |    10780.3     [  4.4% ]   -1.1% |  10339.2  [  9.6% ]   -5.1%  |
> > |  64/64      15086.1    [ 11.5% ]     |    14262.0     [  8.2% ]   -5.5% |  11168.7  [ 22.2% ]  -26.0%  |
> > | 128/128     15371.9    [ 22.0% ]     |    14675.8     [ 14.4% ]   -4.5% |  10963.9  [ 18.5% ]  -28.7%  |
> > | 256/256     15990.8    [ 22.0% ]     |    12227.9     [ 10.3% ]  -23.5% |  10469.9  [ 19.6% ]  -34.5%  |
> > '--------------------------------------|----------------------------------------------------------------'
> >
> > One major thing that sticks out is that if we compare the stddev
> > numbers to the +/- comparisons then it's pretty clear that the
> > benchmarks are very noisy: in all but the last row stddev is actually
> > higher than the measured effect.
> >
> > So what does 'stddev' mean here, exactly? The stddev of multiple
> > runs, i.e. measured run-to-run variance? Or is it some internal
> > metric of the benchmark?
>
> The benchmark periodically reports intermediate statistics once per
> second; the raw log looks like this:
>
> [ 11s ] thds: 256 eps: 14346.72 lat (ms,95%): 44.17
> [ 12s ] thds: 256 eps: 14328.45 lat (ms,95%): 44.17
> [ 13s ] thds: 256 eps: 13773.06 lat (ms,95%): 43.39
> [ 14s ] thds: 256 eps: 13752.31 lat (ms,95%): 43.39
> [ 15s ] thds: 256 eps: 15362.79 lat (ms,95%): 43.39
> [ 16s ] thds: 256 eps: 26580.65 lat (ms,95%): 35.59
> [ 17s ] thds: 256 eps: 15011.78 lat (ms,95%): 36.89
> [ 18s ] thds: 256 eps: 15025.78 lat (ms,95%): 39.65
> [ 19s ] thds: 256 eps: 15350.87 lat (ms,95%): 39.65
> [ 20s ] thds: 256 eps: 15491.70 lat (ms,95%): 36.89
>
> I have a Python script to parse eps (events per second) and lat
> (latency) out of the log and compute the average and stddev. (And I
> can draw a curve locally.)
>
> It's indeed noisy when the number of tasks is greater than the number
> of CPUs. That's probably caused by frequent load balancing and context
> switching.

Ok, so it's basically an internal workload noise metric - it doesn't
represent run-to-run noise. So it's the real stddev of the workload -
but we don't know whether the measured performance figure is exactly in
the middle of the runtime probability distribution.

> Do you have any suggestions? Or any other information I can provide?

Yeah, so we don't just want to know the "standard deviation" of the
measured throughput values, but also the "standard error of the mean" -
see the sketch below for one way to compute both from the raw log. I
suspect the SEM is pretty low, below 1% for all rows?

Thanks,

	Ingo
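
Along the lines of the parsing script mentioned above, here is a minimal
sketch of how the per-second eps samples could be pulled out of such a
log and the mean, the sample stddev and the standard error of the mean
computed. The script and its file name are illustrative assumptions
based on the log sample quoted above, not Aubrey's actual tooling:

  #!/usr/bin/env python3
  #
  # Hypothetical sketch, not the actual script used in this thread:
  # parse the per-second 'eps' samples out of a sysbench log of the
  # form shown above, e.g.:
  #
  #   [ 11s ] thds: 256 eps: 14346.72 lat (ms,95%): 44.17
  #
  # and print the mean, the sample stddev and the standard error of
  # the mean (SEM). Note: the SEM computed here treats the per-second
  # samples as independent, which heavy load balancing may violate.

  import re
  import statistics
  import sys

  EPS_RE = re.compile(r'eps:\s*([0-9.]+)')

  def main(path):
      # Collect every eps value in the log; skip non-matching lines.
      with open(path) as f:
          eps = [float(m.group(1)) for m in map(EPS_RE.search, f) if m]
      if len(eps) < 2:
          sys.exit('need at least two samples')

      mean = statistics.mean(eps)
      stddev = statistics.stdev(eps)     # spread of the samples
      sem = stddev / len(eps) ** 0.5     # how well the mean is pinned down

      print('samples: %d' % len(eps))
      print('mean:    %8.1f eps' % mean)
      print('stddev:  %8.1f (%5.1f%% of mean)' % (stddev, 100.0 * stddev / mean))
      print('sem:     %8.1f (%5.2f%% of mean)' % (sem, 100.0 * sem / mean))

  if __name__ == '__main__':
      main(sys.argv[1])

Usage would be something like:

  $ python3 sysbench-sem.py sysbench-256.log

Because the SEM shrinks with the square root of the sample count, a 22%
per-sample stddev over N one-second intervals gives a SEM of roughly
22%/sqrt(N) of the mean; estimating true run-to-run SEM would instead
apply the same computation to the per-run means of repeated runs.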