On 30-nov-08, at 16:51, Jason House wrote:

You've claimed to be non-statistical, so I'm hoping the following is useful... You can estimate the likelihood that you made an improvement as:
0.5 * (1 + erf(# of standard deviations / sqrt(2)))
Where # of standard deviations =
(win rate - 0.5) / (0.5 / sqrt(#games))

Erf has no closed form, so in practice people use lookup tables to translate between standard deviations and confidence levels. More often, they set a goal confidence and translate it directly into a number of standard deviations (3.0 for about 99.87%). This situation calls for a one-tailed test.

After about 20 or 30 games, this approximation is accurate and can be used for early termination of your test.
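
As a rough illustration of the calculation described above, here is a minimal Python sketch. It assumes the null hypothesis is two equally strong programs, so that the standard deviation of the measured win rate is 0.5/sqrt(games), and it uses the normal approximation to the binomial. The function names are made up for this example.

```python
import math

def z_score(wins, games):
    """Number of standard deviations the win rate lies above 50%.

    Assumes equal strength as the null hypothesis, so the standard
    deviation of the win rate is 0.5 / sqrt(games).
    """
    win_rate = wins / games
    std_dev = 0.5 / math.sqrt(games)
    return (win_rate - 0.5) / std_dev

def confidence_of_improvement(wins, games):
    """One-tailed confidence from the normal CDF, written with erf."""
    z = z_score(wins, games)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

For example, 65 wins out of 100 games is 3 standard deviations above 50%, which corresponds to a one-tailed confidence of roughly 99.87%.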


Lately I use twogtp for my test runs. It computes the winning percentage and puts a ± value after it in parentheses. Is that the value of one standard deviation? (I had always assumed so.) Even after 1,000 games it stays in the 1.5% neighbourhood.
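
For comparison, here is a small Python sketch of one standard error of a measured win rate, using the usual binomial formula sqrt(p(1-p)/n). Whether twogtp's ± value is exactly this quantity is an assumption, not something confirmed here, and the function name is made up for this example.

```python
import math

def win_rate_std_error(win_rate, games):
    """One standard error of a win rate measured over `games` games.

    Uses the binomial formula sqrt(p * (1 - p) / n); this is an
    assumption about what a reported +/- value might mean.
    """
    return math.sqrt(win_rate * (1.0 - win_rate) / games)
```

With a win rate near 50% over 1,000 games this gives about 0.0158, that is, roughly 1.6%, which is in the same neighbourhood as the 1.5% mentioned above.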

Maybe 20-30 games is usually an accurate approximation. But if you perform tests often, you'll occasionally bump into that unlikely event where what you thought was a big improvement turned out to be no improvement at all. Or the other way around. Only when I see 20+ games with a zero winning percentage do I stop it, assuming I made a mistake.
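
The risk described above can be made concrete with a hedged Python sketch. It computes exactly, by dynamic programming over win counts, how often a program of truly equal strength (win probability 0.5) would cross a given z threshold at some point if the test is checked after every game from `min_games` onward. The stopping rule, the function name, and the parameter names are assumptions chosen for illustration, not anyone's actual test procedure.

```python
import math

def false_alarm_rate(max_games, z_threshold, min_games=20):
    """Probability that an equal-strength program is ever declared
    'improved' when the z-score is checked after every game from
    min_games up to max_games (hypothetical stopping rule)."""
    # dist[w] = probability of w wins so far without having stopped
    dist = {0: 1.0}
    stopped = 0.0
    for g in range(1, max_games + 1):
        new = {}
        for w, p in dist.items():
            # each game is a loss (w) or a win (w + 1), 50% each
            new[w] = new.get(w, 0.0) + 0.5 * p
            new[w + 1] = new.get(w + 1, 0.0) + 0.5 * p
        if g >= min_games:
            for w in list(new):
                z = (w / g - 0.5) * 2.0 * math.sqrt(g)
                if z >= z_threshold:
                    # test would be terminated early here
                    stopped += new.pop(w)
        dist = new
    return stopped
```

A single check at a threshold of 2 standard deviations has a one-tailed false-positive rate of about 2.3%. Calling `false_alarm_rate(1000, 2.0)` shows how much larger the rate becomes when the same threshold is checked after every game, which is why early termination needs a stricter threshold than a fixed-length test.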

Mark

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/