I think one problem with your current test is that it does far too much
(random) doubling. That produces huge total scores, which adds a lot of
noise to the results and makes it hard to be confident about what they
mean.

A simpler test might be to compare gnubg's best strategy against a "dumber"
doubling strategy that, say, never offers the cube and always takes. In
that world, the cube is never larger than 2, which removes the noise from
randomly humongous cube values.

If you do that, then you might wonder how many games those two strategies
need to play against each other to see a difference. First, you'd need to
estimate the size of the signal we're trying to find. If we assume gnubg is
"perfect", then any time the dumber strategy takes when it should pass,
it'll lose expected value equal to the equity error. Maybe that happens
once every three games, and maybe the average error size is 0.1, so we'd
expect the dumber strategy to lose, on average, about 0.03 points per
game.

How many games do you need to simulate so that the statistical measurement
error on the average score is much less than 0.03? The standard deviation
of the score in a regular backgammon money game is something like 1.3,
IIRC; so the statistical measurement error on the average is around
1.3 / sqrt(N), where N is the number of games you play. If you want that to
be, say, 0.006 (5x smaller than the 0.03 signal we're trying to find), then
N = (1.3 / 0.006)^2, which is about 50k games.

So that tells you how many games you need to play in your simulation to see
if there's a measurable difference: around 50k. Maybe my numbers are wrong
by a factor of 2 here or there, but that's the idea.
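
If it helps, here is the same back-of-envelope arithmetic as a small Python
sketch. The inputs (one bad take every three games, average error 0.1,
per-game standard deviation 1.3, target error 5x below the signal) are just
my rough guesses from above, and the function names are made up for
illustration; swap in your own numbers.

```python
import math

def signal_per_game(error_rate, avg_error):
    """Expected points lost per game: how often an error happens
    times the average equity cost of each error."""
    return error_rate * avg_error

def games_needed(sd_per_game, target_se):
    """Smallest N such that sd_per_game / sqrt(N) <= target_se."""
    return math.ceil((sd_per_game / target_se) ** 2)

# Rough guesses from the discussion above, not measured values.
signal = signal_per_game(error_rate=1 / 3, avg_error=0.1)  # about 0.03
target_se = 0.006                                          # 5x below 0.03
n_games = games_needed(sd_per_game=1.3, target_se=target_se)

print(f"signal per game : {signal:.3f} points")
print(f"games needed    : {n_games}")
```

Because N scales as 1 / target_se^2, halving the signal you are looking
for (or the error you will tolerate) quadruples the number of games.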

So you could run that and see whether the dumb strategy does, in fact, lose
in head to head play against the standard; or whether it's about even, and
all this fancy cube stuff is nonsense.




On Sun, Jan 28, 2024 at 10:14 PM MK <playbg-...@yahoo.com> wrote:

> On 1/28/2024 4:29 PM, Joseph Heled wrote:
>
> > I hope you realize you will need hundreds of thousands
> > of games, millions maybe, to get statistical significance.
>
> Okay, well, let's try to have a rational conversation
> about this. (I won't quote the entire previous post.)
>
> How many hundreds of thousands or millions of games
> have you guys ran to determine that the current cube
> strategy adopted by all bots and humans (except me),
> is indeed the best strategy?
>
> None. Zero. You all believe it "myth-ematically".
> (Ha! Aren't I the "monster punster"..? :)
>
> BTW: It's never too late and there is a fairly easy
> way of doing this, and won't take long if done in a
> spot checking manner for starters. Let me know if
> you guys would want to do it and dare the outcome.
>
> Still, I don't mind your asking from me something that
> you all haven't asked from yourselves. I'll be glad to
> run a hundred thousand games for starters (and more if
> needed later). But before I do that, we need to agree
> on certain things.
>
> 1- Will you trust my results? Probably not. Nobody in
> the past openly said that they trusted the results of
> my previous experiments. That's why I'm always urging
> you guys to run your own tests and sharing my scripts.
>
> 2- You and preferably a "statistically significant" ;)
> number of credible members of the BG community have to
> commit beforehand to what kind of results will convince
> you all that the current "cube skill theory" is bogus.
>
> If such a self-destructive mutant cube strategy in this
> script wins 5% against GnuBG World-Class, for example,
> will it be enough to convince you all? Maybe 10%? More?
> You have to name it and commit to it before I start.
>
> I have older spare CPU's that I can run 24/7 for this.
> But alternatively and preferably for more reasons than
> one, several of you can run shorter sessions concurrently
> and accummulate several hundreds of thousands, if not
> millions, of games in a matter of a few days, with your
> own results that you can trust.
>
> How about it guys...??
>
> BTW: this not the only mutant cube experiment I have in
> mind but let's start with this one from the very bottom.
>
> MK
>
>
>
