I started to work on visualisation. IMHO it helps to understand the problem.
Let's create a large dataset: 500 samples (100 processes x 5 samples):
---
$ python3 telco.py --json-file=telco.json -p 100 -n 5
---

The attached plot.py script creates a histogram:
---
avg: 26.7 ms +- 0.2 ms; min = 26.2 ms

26.1 ms:   1 #
26.2 ms:  12 #####
26.3 ms:  34 ############
26.4 ms:  44 ################
26.5 ms: 109 ######################################
26.6 ms: 117 ########################################
26.7 ms:  86 ##############################
26.8 ms:  50 ##################
26.9 ms:  32 ###########
27.0 ms:  10 ####
27.1 ms:   3 ##
27.2 ms:   1 #
27.3 ms:   1 #

minimum 26.1 ms: 0.2% (1) of 500 samples
---

Replace "if 1" with "if 0" to produce a graphical view, or just view the attached distribution.png, the numpy+scipy histogram.

The distribution looks like a Gaussian curve:
https://en.wikipedia.org/wiki/Gaussian_function

The interesting thing is that only 1 sample out of 500 is in the minimum bucket (26.1 ms). If you say that the performance is 26.1 ms, only 0.2% of your users will be able to reproduce this timing.

The average and std dev are 26.7 ms +- 0.2 ms, which covers the buckets 26.5 ms .. 26.9 ms: we got 109+117+86+50+32 samples in this range, which gives us 394/500 = 79%.

IMHO saying "26.7 ms +- 0.2 ms" (79% of samples) is less of a lie than 26.1 ms (0.2%).

Victor
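For reference, the two percentages quoted in the message can be re-derived from the bucket counts of the text histogram. This is only a minimal sketch: the `counts` dictionary and the `share` helper are names made up for this illustration and are not part of plot.py.

```python
# Bucket counts copied from the text histogram in the message.
# Keys are tenths of a millisecond (261 means the 26.1 ms bucket),
# the same convention plot.py uses for its Counter keys.
counts = {
    261: 1, 262: 12, 263: 34, 264: 44, 265: 109, 266: 117, 267: 86,
    268: 50, 269: 32, 270: 10, 271: 3, 272: 1, 273: 1,
}

total = sum(counts.values())


def share(first, last):
    """Return (sample count, percentage) for buckets first..last inclusive."""
    n = sum(c for bucket, c in counts.items() if first <= bucket <= last)
    return n, 100.0 * n / total


min_bucket = min(counts)
n_min, pct_min = share(min_bucket, min_bucket)
n_mid, pct_mid = share(265, 269)  # the 26.5 ms .. 26.9 ms buckets

print("total samples: %d" % total)
print("minimum bucket: %d sample(s), %.1f%%" % (n_min, pct_min))
print("26.5 .. 26.9 ms: %d samples, %.1f%%" % (n_mid, pct_mid))
```

This gives 500 samples in total, 1 sample (0.2%) in the minimum bucket and 394 samples (78.8%, rounded to 79% in the message) in the 26.5 ms .. 26.9 ms buckets.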
Attachment: telco.json (application/json)
Attachment: plot.py
import perf
import sys

filename = sys.argv[1]
with open(filename) as fp:
    bench = perf.Benchmark.json_load(fp.read())
# Convert the samples to milliseconds
h = [sample * 1e3 for sample in bench.get_samples()]

# Replace "if 1" with "if 0" to draw the graphical view instead
if 1:
    import math
    import collections

    print("avg: %.1f ms +- %.1f ms; min = %.1f ms"
          % (perf.mean(h), perf.stdev(h), min(h)))
    print("")

    # Group the samples into 0.1 ms buckets (key = tenths of a millisecond)
    c = collections.Counter([int(value * 10) for value in h])
    # Scale the bars so that the largest bucket is 40 characters wide
    k = 40.0 / max(c.values())
    for ms in range(min(c), max(c)+1):
        value = c.get(ms, 0)
        linelen = int(math.ceil(value * k))
        print("%.1f ms: % 3s %s" % (float(ms) / 10, value, '#' * linelen))
    print("")

    cmin = min(c)
    value = c.get(cmin)
    print("minimum %.1f ms: %.1f%% (%s) of %s samples"
          % (float(cmin) / 10, value * 100.0 / len(h), value, len(h)))
else:
    import numpy as np
    import scipy.stats as stats
    import pylab as pl

    h.sort()
    # Normal distribution with the mean and standard deviation of the samples
    fit = stats.norm.pdf(h, np.mean(h), np.std(h))
    pl.plot(h, fit, '-o')
    # Normalised histogram of the samples
    pl.hist(h, density=True)
    # Display the figure
    pl.show()
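A note on the bucketing in the script: `int(value * 10)` truncates a sample down to its 0.1 ms bucket, whereas `%.1f` rounds to the nearest 0.1 ms. That is probably why the output reports "min = 26.2 ms" while the lowest bucket is labelled 26.1 ms. A small sketch of the difference follows; 26.18 ms is a made-up sample value (the actual minimum is not given in the message) and `bucket` is a hypothetical helper name.

```python
def bucket(value_ms):
    """Bucket key used by plot.py: tenths of a millisecond, truncated."""
    return int(value_ms * 10)


sample = 26.18  # hypothetical value, chosen only for illustration

key = bucket(sample)
bucket_label = "%.1f ms" % (float(key) / 10)  # label printed for the bucket
rounded_label = "%.1f ms" % sample            # how "min = ..." is formatted

print("bucket label: ", bucket_label)
print("rounded label:", rounded_label)
```

With this value the bucket label is 26.1 ms and the rounded label is 26.2 ms, so the two lines of the output can disagree by one bucket without either being wrong.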
