Tim, I stand corrected. I always struggle to remember how to calculate these significance tests. I only get to grips with it infrequently, then the knowledge leaches out of my head and I have to relearn it.
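For my own future reference, here is a rough sketch of the calculation behind the "5 sigma" figure for a 55% win rate over 3000 matches. It uses the normal approximation to the binomial, and it assumes each match is an independent win/loss trial tested against a null hypothesis of equally strong players (p0 = 0.5). The function names are my own, purely illustrative, and nothing here comes from gnubg itself.

```python
import math


def z_score(wins: int, trials: int, p0: float = 0.5) -> float:
    """Standard errors between the observed win rate and the null rate p0.

    Assumes independent win/loss trials (normal approximation to the binomial).
    """
    observed = wins / trials
    standard_error = math.sqrt(p0 * (1.0 - p0) / trials)
    return (observed - p0) / standard_error


def two_sided_p_value(z: float) -> float:
    """Two-sided tail probability of a standard normal beyond |z|."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# 55% of 3000 matches is 1650 wins.
z = z_score(wins=1650, trials=3000)
print(f"z = {z:.2f} sigma")  # z = 5.48 sigma
print(f"two-sided p = {two_sided_p_value(z):.1e}")
```

The standard error under the null is sqrt(0.25 / 3000), about 0.0091, so a 5-percentage-point excess is roughly 5.5 standard errors. This only tests whether the two settings are equally good; it does not pin down the size of the advantage.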
I was under the impression that many more games were required to separate backgammon bots. I suppose that a match can only have a 0 or 1 outcome, unlike $ games. This must reduce the number of required trials somewhat.

Out of interest, I left gnubg playing a money session 0-ply vs 2-ply over the weekend. It crashed before getting too far, so the only results I have to report are 2-ply leading 0-ply by 3034 to 2591 (443 points) over an unknown number of games.

-- Ian

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Timothy Y. Chow
Sent: 12 June 2015 17:21
To: [email protected]
Subject: Re: [Bug-gnubg] Confused

Lucas wrote:

> Last year i tested using Fibs,( were i did in the past 8 bots), 2 bots
> one set to play Worldclass and the other at grandmaster so 2 ply
> against 3 ply they played 3000 5 point matches Worldclass the lesser
> setting had a winrate of 55 %

Ian Shaw wrote:

> I'd be surprised if just 3000 5-point matches (maybe 12000 games) was
> sufficient to produce statistical significance.

A win rate of 55% for 3000 trials is significant at the 5 sigma level. Even if it turns out that the test was not statistically pure (an example of "impurity" would be failing to specify in advance the exact number of trials), Lucas's result is probably very significant.

Of course, what can be stated with high confidence is that the two settings are *not equally good*. One cannot state with equal confidence that 2-ply really does have a 55-45 advantage over 3-ply in 5-point matches.

Tim

_______________________________________________
Bug-gnubg mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/bug-gnubg
