On Thursday, January 31, 2013 11:32:52 AM UTC-8, Joshua Cranmer wrote:
> On 1/31/2013 12:05 PM, Dave Mandelin wrote:
> > On Thursday, January 31, 2013 9:17:44 AM UTC-8, Joshua Cranmer wrote:
> >> For what it's worth, reading 
> >> <https://bugzilla.mozilla.org/show_bug.cgi?id=833890>, I do not get 
> >> the impression that dmandelin "proved" otherwise. His startup tests 
> >> have very low statistical confidence (n=2, n=3), and they come from 
> >> someone who disclaims his own findings. It may be evidence that PGO is 
> >> not a Ts win, but it is weak evidence at best. 
> 
> > I could certainly run a larger number of trials to see what happens. In 
> > that case, I stopped because the min values for warm startup were about 
> > equal (and also happened to be about equal to other warm startup times I 
> > had measured recently). For many timed benchmarks, "base value + positive 
> > random noise" seems like a good model, in which case mins seem like good 
> > things to compare.
> 
>  From a statistical hypothesis testing perspective, I think (I haven't 
> actually done the math) that the given data is unable to reject either 
> the hypothesis that PGO gives a benefit on startup time or the 
> hypothesis that it does not. Mostly, I was cringing at ehsan's statement 
> that your results "proved" the hypothesis. About what the best 
> statistical criteria are, I don't wish to argue here.

I don't think statistics ever claims to be able to "prove" anything about 
"actual reality", assuming I understood my stats class at all. Instead, you 
make some assumptions about the distribution of your data (is it normal, 
exponential, etc.; is the variance the same across conditions or not), and 
based on those assumptions you can compute the probability of your 
experimental outcome. Interpreting that probability is outside the scope of 
the math and is more of a judgment call. It's all very subtle and complex.
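Tangentially, the "base value + positive random noise" model quoted above can 
be illustrated with a small simulation (all numbers here are invented, not 
real startup measurements):

```python
import random

random.seed(1234)

BASE_MS = 1200.0  # hypothetical "true" warm startup time, in ms

def timed_run():
    # Measured time = base value + strictly positive random noise, so no
    # sample can ever fall below the base value.
    return BASE_MS + random.expovariate(1 / 50.0)  # noise averages ~50 ms

samples = [timed_run() for _ in range(10)]

# The mean stays offset above the base by roughly the average noise, while
# the minimum creeps down toward the base as more trials are taken -- which
# is why comparing mins is reasonable under this model.
print(min(samples), sum(samples) / len(samples))
```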

For example, in the SunSpider comparison, I did 10 trials each of PGO and 
non-PGO and then ran a t test. What the t test actually says is: "If the data 
are normally distributed, PGO and non-PGO have the same variance, and the 
true means are in fact equal, then there is a 0.06 probability of seeing a 
difference in sample averages at least as large as the one observed in this 
experiment". From that plus general knowledge, I judged that there 'probably' 
was some real difference, but that it's hard to know for sure. SunSpider 
scores do not have a normal distribution, though, so the 0.06 is talking 
about a fictional world.
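One way to get a probability that doesn't lean on the normality assumption is 
a permutation test: under the null hypothesis the PGO/non-PGO labels carry no 
information, so you enumerate every relabeling of the pooled runs and count 
how often the gap in averages is at least as large as the observed one. A 
rough sketch (the scores below are invented, not my actual SunSpider data):

```python
from itertools import combinations

# Hypothetical SunSpider totals in ms (lower is better); invented data.
pgo     = [940, 945, 938, 951, 942, 947, 939, 944, 950, 941]
non_pgo = [948, 952, 944, 957, 949, 953, 946, 951, 958, 947]

n = len(pgo)
observed = sum(non_pgo) / n - sum(pgo) / n  # observed gap in averages

# Under the null hypothesis every split of the 20 runs into two groups of
# 10 is equally likely, so enumerate all C(20, 10) splits and count how
# often a split produces a gap at least as large as the observed one.
pooled = pgo + non_pgo
total = sum(pooled)
extreme = 0
splits = 0
for group_a in combinations(range(len(pooled)), n):
    splits += 1
    sum_a = sum(pooled[i] for i in group_a)
    gap = (total - sum_a) / n - sum_a / n
    if gap >= observed:
        extreme += 1

p_value = extreme / splits  # one-sided p-value, no normality assumed
print(round(p_value, 4))
```

This is exact for samples this small; for larger samples you would draw 
random relabelings instead of enumerating all of them.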

> >> Our Talos results may be measuring imperfect things, but we have
> >> enough datapoints that we can draw statistical conclusions from
> >> them confidently.
> 
> > Statistics doesn't help if you're measuring the wrong things. Whether Ts is 
> > measuring the wrong thing, I don't know. It would be possible to learn 
> > something about that question by measuring startup with a camera, Telemetry 
> > simple measures, and Talos on the same machine and seeing how they compare.
> 
> I should clarify my previous statement: I want to avoid confirmation 
> bias in this decision. The proper way to do that is to lay out all the 
> criteria for acceptance or rejection before you run experiments and 
> measure the results. This, obviously, is impossible at this point, since 
> we have a mountain of data which has already biased our thought processes.

It's rare that you can tell exactly what experiments and criteria you will 
want to use before you start. In practice, the main safeguards I use are to 
make sure I understand the limits of the information I'm picking up, and to 
try to prove myself wrong when I get the chance.

> > By the way, there is a project (in a very early phase now) to do accurate 
> > measurements of startup time, both cold and warm, on machines that model 
> > user hardware, etc.
> 
> This is really starting to get off-topic, but I do think we need clear 
> guidelines on evaluating performance results, which includes things like 
> ensuring proper statistical testing on results, etc.

That is a primary goal of the project.

Dave
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
