On 1/28/2024 8:35 PM, Mark Higgins wrote:

Hi Mark,
Thanks for taking an interest and commenting on the subject. I'm not going to dispute anything you said, but I will reply by further explaining my experiments. I have a feeling that once you understand them better, you will be able to contribute a lot more ideas.
> I think one problem with your current test is that it does way too much (random) doubling. So you get huge total scores coming out, which adds loads of noise to the results and makes it hard to be confident what it means.
As I said previously, I was first going to replicate an experiment done in RGB a few years ago, upon my urging, by a mathematician, in which the mutant doubled at >50%, took at >0% and never dropped. One of the many threads in RGB you may want to read on this is:

https://groups.google.com/g/rec.games.backgammon/c/k61QtBwlsBk/m/EGa4NXdmAgAJ

In it he had said:

"To summarize: Like for the expected value of a single game (as shown previously), we have a Petersburg Paradox occurring for the lead of GNU Backgammon in a session of such games, so the expected value of this lead does not exist (base of the exponential term > 1 for the math people, 'oscillations too wild' for the non-math people)."

I have the script for that and will do that experiment later. But first I wanted to do an even worse mutant experiment on purpose, causing extremely high, unlimited cube values. Yes, in this case it will take a lot more trials to derive any meaning out of the experiment, but I think it's useful as the lowest starting point.
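The quoted Petersburg Paradox is easy to illustrate: when the payout doubles with each successive coin flip won, every term of the expectation sum contributes equally, so the sum diverges and sample means never settle down no matter how many trials you run. Here is a minimal sketch of the classic paradox (illustrative only; it models the coin-flip game, not actual backgammon cube sequences):

```python
import random

def petersburg_payout(rng):
    """One St. Petersburg trial: the payout doubles for each
    successive coin flip won, so E = 1 + 1 + 1 + ... diverges."""
    payout = 1
    while rng.random() < 0.5:
        payout *= 2
    return payout

rng = random.Random(42)
for n in (10**3, 10**4, 10**5, 10**6):
    mean = sum(petersburg_payout(rng) for _ in range(n)) / n
    # The running means keep jumping around instead of converging,
    # which is the "oscillations too wild" behavior described above.
    print(n, round(mean, 2))
```

An unlimited cube behaves analogously: a strategy that never drops lets the stake grow exponentially, so session totals can oscillate wildly and huge trial counts are needed before averages mean anything.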
> A simpler test might be to compare gnubg's best strategy against a "dumber" doubling strategy that, say, never offers the cube, and always takes.
Well, this is an idea that may be worth trying. It will force the games to be played out, but it won't be as useful for the effort of debunking the "cube skill theory", which claims, among other things, that beavers and raccoons require even more skill than simple doubles/takes. BTW: I'm not saying that there is no cube skill at all, but that it's way too exaggerated. I'm arguing that early in the game, cubeful equities are so inaccurate that cube skill is pretty much non-existent. It becomes really decisive mostly towards the final moves of the game.
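For concreteness, the two cube policies mentioned so far can be written down as tiny decision rules. This is a hypothetical sketch, not gnubg's actual interface; `win_prob` stands for whatever cubeless winning-probability estimate the bot produces:

```python
class NeverDoubleAlwaysTake:
    """Mark's 'dumber' baseline: never offers the cube, always takes."""
    def should_double(self, win_prob: float) -> bool:
        return False
    def should_take(self, win_prob: float) -> bool:
        return True

class GreedyMutant:
    """The RGB experiment's mutant: doubles at >50%,
    takes at >0%, and never drops."""
    def should_double(self, win_prob: float) -> bool:
        return win_prob > 0.5
    def should_take(self, win_prob: float) -> bool:
        return win_prob > 0.0
```

Either rule can be dropped into a money-game session loop in place of the bot's cube decisions, leaving checker play unchanged, which is the shape of the mutant experiments described here.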
> If we assume gnubg is "perfect", then anytime the dumber strategy takes when it should be a pass, it'll lose expected value, in the amount of the equity error. ... so we'd expect that the dumber strategy would lose, on average, about 0.03 cents per game.
One of the goals of my experiments is to show that even the "dumbest" cube mutant will win more than what would be expected from its error rate. This is to show that equity and error calculations are inaccurate by an unknown amount, at least somewhat but maybe way beyond belief. I have a less dumb mutant experiment, and then one mutant cube strategy that I have concocted, which I believe will not only win more than expected from its error rate but actually win more than GnuBG World Class. I want to do these experiments in order from worst to best.
> The standard deviation of score in a regular backgammon money game is something like 1.3, IIRC; so the statistical measurement error on the average is around 1.3 / sqrt(N), where N is the number of games you play. If you want that to be, say, 0.006 (5x smaller than the 0.03 signal we're trying to find), then N would be about 50k games.
These are great comments, but to be honest, I'm struggling to understand how they apply to what I'm trying to do. If we keep communicating, I may come to fully appreciate them.
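For what it's worth, Mark's sample-size arithmetic checks out: with a per-game standard deviation of about 1.3 points, the standard error of the mean after N games is 1.3 / sqrt(N), and solving for a target error of 0.006 gives a bit under 50k games. A quick check, using only the numbers from the quoted text:

```python
# Numbers from the quoted post: per-game score std dev ~1.3 points,
# expected signal ~0.03 points/game, target standard error 5x smaller.
sigma = 1.3
signal = 0.03
target_se = signal / 5              # 0.006
# Solve sigma / sqrt(N) = target_se for N:
n_games = (sigma / target_se) ** 2
print(f"need about {n_games:,.0f} games")  # on the order of 50k
```

With an unlimited cube, the per-game standard deviation would be far larger than 1.3, which is why the "worst mutant" experiment needs many more trials than this.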
> So you could run that and see whether the dumb strategy does, in fact, lose in head-to-head play against the standard; or whether it's about even, and all this fancy cube stuff is nonsense.
Whether they lose by a lot, lose by a little, come out even, or, BG gods forbid, come out on top, I'm hoping that all my mutant cube strategies will poke holes of different sizes in the current so-called "cube skill theory" (how anyone dares call it a "theory" is another question). Measuring the size of the holes will come later and perhaps will be done better by others than by myself. I just want to at least provide the data.

The only "best/perfect cube strategy" I will accept will come from training the bots through cubeful and "matchful" (a term I coined) self-play, instead of extrapolating cubeful equities by applying "untested" formulas to cubeless equities, and extrapolating matchful equities by applying METs to cubeless equities... I only want better bots. But to create a need for them, I must first try to destroy the mediocre offspring of TD-Gammon v2. (TD-Gammon v1 was okay; it became human-biased later.)

MK