Hello again Mark. I already have a script for my second experiment, but I still want to run the first one for 50,000-100,000 games just to see what the results will be.
I improved my scripts to keep track of the error rates and the number of cube decisions, so that I can do better calculations and make more sense of the results. To that end, I would like to confirm that I understand your previous comments below correctly, if you don't mind clarifying. On 1/28/2024 8:35 PM, Mark Higgins wrote:
> Make that happen once every three games, and maybe the average error size is 0.1, so we'd expect that the dumber strategy would lose, on average, about 0.03 cents per game.
By "0.03 cents" above, did you mean "0.03 percent"? I don't understand equities and points lost, etc. as well as winning chances and results in percentages. Could you reword your comments with that in mind?
> How many games do you need to simulate such that the statistical measurement error on the average score is much less than 0.03? The standard deviation of score in a regular backgammon money game is something like 1.3, IIRC;
Is this the generally accepted "imperfectness" of bots? For well over 10 years I have asked people how much "imperfectness" they actually mean by that word, but I never got any answers. It must mean "nearly perfect", i.e. very close to 100%, but what is that actual magic number: 1%, 5%, 10%, 25%, 49.9%? At what point does cube skill stop being nearly perfect and become garbage? Can you clarify what your 1.3 above means in this context?
> the statistical measurement error on the average is around 1.3 / sqrt(N), where N is the number of games you play. If you want that to be, say, 0.006 (5x smaller than the 0.03 signal we're trying to find), then N would be about 50k games.
> So you could run that and see whether the dumb strategy does, in fact, lose in head-to-head play against the standard; or whether it's about even, and all this fancy cube stuff is nonsense.
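To make sure I follow your arithmetic, here is how I read it, as a small sketch (my own illustration, not your code; the 1.3 and 0.03 figures are the ones from your comments above):

```python
# Sample-size arithmetic as I understand it: with a per-game score standard
# deviation of about 1.3, the standard error of the mean score after N games
# is roughly 1.3 / sqrt(N). Solving for N at a target standard error:

sigma = 1.3            # assumed std dev of score per money game
signal = 0.03          # expected edge (points per game) we want to detect
target_se = signal / 5 # want measurement error ~5x smaller than the signal

# target_se = sigma / sqrt(N)  =>  N = (sigma / target_se)^2
n_games = (sigma / target_se) ** 2
print(round(n_games))  # about 47,000 games, i.e. roughly 50k
```

Is that the calculation you meant?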
In this particular experiment, the mutant strategy will most likely lose big. My next mutant will probably still lose, but by less, and the mutant after that may come out even. In all of my experiments, what I will be looking at is how much the mutant strategies lose or win compared to what is expected from their cube error rates, rather than whether they win more than 50%, because that will be enough to show that the bots' equity and error calculations are wrong.

XG calculates a "should have won" number (if you are familiar with it). In order to calculate the same, I am extracting from the "match info" the "total-cube" (i.e. the number of actual cube actions), and the "error-skill" and "error-cost" (i.e. the normalized and un-normalized cube errors per cube decision). With those, I assume I will be able to calculate the total amount of error for the entire 50,000 games(?) At this point, I don't know whether I need to use the normalized or un-normalized error rate, but I hope to figure that out while the session is running. Any help with this would be appreciated.

Also, I have no idea how to calculate a "should have won" number like XG's. Can you or someone else here help me with that, by chance..?

MK
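For what it's worth, here is the calculation I am planning, as a hypothetical sketch. The field names ("total-cube", "error-skill", "error-cost") are just what I am pulling from the "match info", and the example numbers are made up; whether the normalized or un-normalized figure is the right one to multiply is exactly my open question:

```python
# Hypothetical sketch: total cube error over a session, assuming
# (number of cube decisions) x (average error per decision) is the right
# way to combine the "total-cube" and per-decision error fields.

def total_cube_error(total_cube, error_per_decision):
    """Total points given away = cube decisions x average error per decision."""
    return total_cube * error_per_decision

# Made-up example: 4 cube decisions per game over 50,000 games,
# average un-normalized error of 0.005 points per decision.
decisions = 4 * 50_000
print(total_cube_error(decisions, 0.005))  # 1000.0 points in total
```

Does that combination look right, or should the normalized figure be used instead?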