On 30-nov-08, at 16:51, Jason House wrote:
You've claimed to be non-statistical, so I'm hoping the following
is useful... You can compute the likelihood that you made an
improvement as:
Phi(# of standard deviations)
Where Phi is the normal CDF and # of standard deviations =
(win rate - 0.5) / (0.5 / sqrt(#games))
Phi has no closed-form expression, and in practice people use lookup
tables to translate between standard deviations and confidence
levels. More commonly, people set a goal confidence and translate it
directly into a number of standard deviations (3.0 for 99.87%). This
situation calls for a one-tailed test.
After about 20 or 30 games, this approximation is accurate and can
be used for early termination of your test.
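The recipe above can be sketched in a few lines. This is a minimal illustration, assuming each game is an independent coin flip so that the standard error of the observed win rate under the no-improvement hypothesis is 0.5/sqrt(#games); the function name is mine, not anything from the thread:

```python
import math

def improvement_confidence(wins, games):
    """One-tailed confidence that the true win rate exceeds 0.5.

    Assumes games are independent; under the null hypothesis of no
    improvement the standard error of the win rate is 0.5/sqrt(games).
    """
    win_rate = wins / games
    std_err = 0.5 / math.sqrt(games)
    z = (win_rate - 0.5) / std_err
    # Normal CDF expressed via erf: Phi(z) = (1 + erf(z/sqrt(2))) / 2
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

For example, 65 wins out of 100 games puts you 3 standard deviations above 0.5, which maps to roughly 99.87% confidence, while 50 out of 100 is exactly 50%: no evidence either way.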
Lately I use twogtp for my test runs. It computes the winning
percentage and puts a ± value after it in parentheses. Is that the
value of one standard deviation? (I had always assumed so.) Even
after 1,000 games it stays in the 1.5% neighbourhood.
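I can't confirm from the thread what twogtp actually reports, but the ~1.5% figure is consistent with one standard deviation of a binomial win rate. A quick check, assuming the usual sqrt(p(1-p)/n) formula:

```python
import math

def win_rate_std_err(p, n):
    """Standard error of an observed win rate p over n games,
    treating each game as an independent Bernoulli trial."""
    return math.sqrt(p * (1.0 - p) / n)

# At p = 0.5 and n = 1000 this is about 0.0158, i.e. ~1.58%,
# which matches the ~1.5% value quoted above.
print(win_rate_std_err(0.5, 1000))
```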
Maybe the approximation is usually accurate after 20-30 games. But
if you run tests often, you'll occasionally bump into that unlikely
event where what you thought was a big improvement turns out to be
no improvement at all, or the other way around. Only when I see 20+
games with a zero winning percentage do I stop a run, assuming I
made a mistake.
Mark
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/