On Fri, Aug 07, 2009 at 02:08:21PM -0500, Kevin Grittner wrote:
> With the 20 samples from that last round of tests, the answer (rounded
> to the nearest percent) is 60%, so "probably noise" is a good summary.
> Combined with the 12 samples from earlier comparable runs with the
> prior version of the patch, it goes to a 90% probability that noise
> would generate a difference at least that large, so I think we've
> gotten to "almost certainly noise".  :-)
>
> To me, that seems more valuable for this situation than saying "we
> haven't reached 90% confidence that it's a real difference."  I used
> the same calculations up through the t-statistic.

The stats people in our group just tend to say that things are
significant or not at a specific level; never bothered to find out why,
I'll ask someone when I get a chance.

> The one question I have left for this technique is why you went with
>
> ((avg1 - avg2) / (stddev * sqrt(2/samples)))
> instead of
> ((avg1 - avg2) / (stddev / sqrt(samples)))

I was just doing a literal translation of what was on the Wikipedia page:

  http://en.wikipedia.org/wiki/Student's_t-test#Independent_two-sample_t-test

If you really want to find out, there should be much better
implementations in the pl/r language already in PG.  I'd trust R much
more than Wikipedia, but for things like this Wikipedia is reasonable.

> I assume that it's because the baseline was a set of samples rather
> than a fixed mark, but I couldn't pick out a specific justification
> for this in the literature (although I might have just missed it), so
> I'd feel more comfy if you could clarify.

Sorry, that's about my limit!  I've never studied stats, I'm a computer
science person who just happens to be around people who use stats on a
day-to-day basis and think it needs more use in the software world.

I think you're right and you're aggregating the errors from two (assumed
independent) datasets hence you want to keep a bit more of the error in
there.  As to the formal justification (and probably proof) I've no real
idea.

> Given the convenience of capturing benchmarking data in a database,
> has anyone tackled implementation of something like the spreadsheet
> TDIST function within PostgreSQL?

Again, pl/r is what you want!

-- 
  Sam  http://samason.me.uk/
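
To illustrate the two denominators discussed above, here is a minimal
Python sketch.  It is not code from this thread: the function names and
the sample data are hypothetical, and it assumes two equal-sized,
independent samples with a pooled standard deviation (the equal-size
case on the Wikipedia page).  The only point is that comparing against
a second set of samples, rather than a fixed mark, widens the standard
error by a factor of sqrt(2).

```python
import math
import statistics


def two_sample_t(a, b):
    """t statistic for two independent samples of equal size.

    Both means carry sampling error, so the variance of their
    difference is 2 * s^2 / n, hence the sqrt(2 / n) factor.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    n = len(a)
    # Pooled standard deviation for equal n: root of the mean variance.
    pooled_sd = math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2)
    return (statistics.mean(a) - statistics.mean(b)) / (pooled_sd * math.sqrt(2 / n))


def one_sample_t(a, mark):
    """t statistic for one sample against a fixed mark.

    Only the sample mean carries error, so the standard error is
    s / sqrt(n).
    """
    n = len(a)
    return (statistics.mean(a) - mark) / (statistics.stdev(a) / math.sqrt(n))


if __name__ == "__main__":
    # Made-up timings, purely for illustration.
    baseline = [1.0, 2.0, 3.0, 4.0, 5.0]
    patched = [2.0, 3.0, 4.0, 5.0, 6.0]
    print(two_sample_t(baseline, patched))
    print(one_sample_t(baseline, statistics.mean(patched)))
```

With the same difference in means and the same spread, the one-sample
form gives a t value sqrt(2) times larger in magnitude, so treating a
sampled baseline as a fixed mark would overstate significance.  Turning
the t value into a probability (what TDIST does) still needs the
t-distribution CDF, which this sketch leaves to R.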