Re: [computer-go] Abstract analysis of Monte Carlo playout

Erik van der Werf Sat, 28 Jul 2007 02:23:36 -0700

Hi Antti,

I had a quick look at your numbers. Maybe I misunderstood something,
but at first glance there appears to be a parity effect (an even
number of 100% blunder moves always get it right).


How do the statistics look if the game length is odd?

If it matters, maybe you should sample over a reasonable distribution
of game lengths or otherwise just average odd and even.
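
A minimal sketch of such a check, in Python rather than the original C
program (which is not shown in this thread). It assumes the model from the
quoted message below: Black moves first from a position won for Black, only
the side that currently has a won position can blunder, and a play-out is
"correct" if the position is still won for Black after the last move. The
function name p_black_still_winning and the recurrence are illustrative
assumptions, not Antti's code.

```python
def p_black_still_winning(p_correct, n_moves):
    """Exact probability that the position is still won for Black after n_moves.

    p_correct(k) is the probability that move k (0-based) is correct,
    given that the side to move currently has a won position.
    """
    mover_wins = 1.0  # Black moves first and starts with a won position
    for k in range(n_moves):
        # The mover keeps the win only by playing correctly; in every other
        # case the side that moves next is the one holding the win.
        mover_wins = 1.0 - p_correct(k) * mover_wins
    # After an even number of moves Black is to move again, otherwise White.
    return mover_wins if n_moves % 2 == 0 else 1.0 - mover_wins


for name, c in [("0% flat", 0.0), ("50% flat", 0.5), ("100% flat", 1.0)]:
    even = p_black_still_winning(lambda k: c, 100)
    odd = p_black_still_winning(lambda k: c, 99)
    print(f"{name:>10} | length 100: {even:.4f} | length 99: {odd:.4f}")
```

Under these assumptions the 0% flat agent drops from 1.0 at length 100 to
0.0 at length 99, and the 50% flat agent from about 0.667 to about 0.333, so
in this model the parity of the game length does change the result.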

Erik


On 7/28/07, Antti Huima <[EMAIL PROTECTED]> wrote:
>
>  Hi,
>
>  some time ago there was a discussion about whether it pays off to improve
> the quality of an MC play-out agent or not, and how important it is to keep
> it "balanced", so I performed the following abstract experiment:
>
>  Assume that we start from a position that is a game-theoretic win for
> Black. If we play out moves from this position--say for instance 100--then
> every move can either switch the game-theoretic value of the present
> position (a blunder) or not (in general a correct move). Of course the only
> way for a player to switch the game-theoretic value is by blundering in a
> position that is won for the player and ending up in a position that is
> lost for the same player: there is no way to "blunder" a lost position into
> a won one.
>
>  I implemented a simple C program that calculates the probability of ending
> up with the correct game-theoretic value at the end of the simulation when
> the probability of blundering, when possible, is given as a function of the
> move number. Here are some results (explanations below):
>
>  Game length 100, simulations 1000000
>                         0% flat | 100.00%
>                         1% flat | 99.01%
>                         2% flat | 98.04%
>                         5% flat | 95.27%
>                        10% flat | 90.97%
>                        20% flat | 83.28%
>                        50% flat | 66.74%
>                        80% flat | 55.59%
>                        90% flat | 52.65%
>                        95% flat | 51.58%
>                        98% flat | 57.02%
>                        99% flat | 68.53%
>                      99.5% flat | 80.37%
>                      99.8% flat | 90.97%
>                      99.9% flat | 95.21%
>                       100% flat | 100.00%
>                  Linear ramp up | 50.17%
>                Linear ramp down | 99.03%
>                 Squared ramp up | 50.17%
>               Squared ramp down | 99.99%
>       Squared ramp up, inverted | 98.09%
>     Squared ramp down, inverted | 49.97%
>                           Spike | 0.00%
>        Spike with 10%/10% noise | 52.30%
>         Spike with 10%/0% noise | 9.95%
>         Spike with 0%/10% noise | 52.34%
>
>  Each row represents one million play-outs. The left column is the
> probability function (how probable a correct move is at each move number,
> as described below) and the right column is the probability that we get the
> "correct" result at the end of a play-out. Here are the descriptions of the
> functions:
>
>  - N% flat means that the move is correct with probability N% and a blunder
> with probability (1-N%), when possible (you can't blunder if you are in a
> lost position)
>  - Linear ramp up means that the probability of a correct move is
> 100% * (k/N) where k is the move number and N is the game length, i.e. moves
> tend to get better and better by the end of the game
>  - Linear ramp down is 100% * (1-k/N), i.e. inverted
>  - Squared ramp up is 100% * (k/N)^2
>  - Squared ramp down is 100% * (1 - (k/N)^2)
>  - Squared ramp up and down inverted are obtained by 100% - X where X is the
> squared ramp
>  - Spike means that Black makes one blunder in the middle but all other
> moves are correct
>  - Spike with 10%/10% noise is a 10% chance of a correct move in the middle
> and 90% elsewhere
>  - Spike with 10%/0% noise is a 10% chance of a correct move in the middle
> and 100% elsewhere
>  - Spike with 0%/10% noise is a 0% chance of a correct move in the middle
> and 90% elsewhere
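
The experiment described above could be sketched roughly as follows. This is
Python, not the original C program, and several details are assumptions:
Black moves first, moves are numbered from 0, the "middle" move of the spike
is move N/2 (a Black move), and only 10,000 play-outs per row are used
instead of one million to keep the run short. The names play_out and
estimate are hypothetical.

```python
import random


def play_out(p_correct, n_moves, rng):
    """Return True if the position is still won for Black after n_moves."""
    black_winning = True  # the start position is a win for Black
    for k in range(n_moves):
        black_to_move = (k % 2 == 0)  # assumption: Black moves first
        # Only the side that currently has a won position can blunder.
        if black_to_move == black_winning and rng.random() >= p_correct(k):
            black_winning = not black_winning  # a blunder flips the value
    return black_winning


def estimate(p_correct, n_moves=100, simulations=10_000, seed=1):
    """Fraction of play-outs that end with the position still won for Black."""
    rng = random.Random(seed)
    wins = sum(play_out(p_correct, n_moves, rng) for _ in range(simulations))
    return wins / simulations


N = 100
functions = {
    "50% flat": lambda k: 0.5,
    "Linear ramp up": lambda k: k / N,
    "Linear ramp down": lambda k: 1 - k / N,
    "Spike": lambda k: 0.0 if k == N // 2 else 1.0,
}
for name, f in functions.items():
    print(f"{name:>20} | {100 * estimate(f, N):.2f}%")
```

With far fewer play-outs than in the table, the estimates are noisier; the
simulations argument can be raised to tighten them.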
>
>  And here is some analysis:
>
>  - Obviously a move generator that always blunders with probability 1/2
> when possible is a great basis for MC analysis because it ends up with the
> correct game-theoretic value with 67% probability
>
>  - It is also obvious that of the ones sampled above, the worst probability
> patterns are rising ramps, i.e. playout agents that play badly in the
> beginning but get better and better towards the end of the game. For these
> agents the end result is basically just random noise. The reason is, I
> believe, that first both players blunder all the time and the game-theoretic
> value remains always won for Black (two blunders --> Black still winning),
> but when the blunder probability starts to drop, first the result becomes
> more or less random, and then the dropping probability "locks" the
> game-theoretic value to the random value.
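
The "locking" described above can be traced move by move without sampling.
The sketch below is an assumption-laden illustration, not the original
program: it takes Black to move first, numbers moves from 0, and uses k/N as
the probability of a correct move (linear ramp up). The function name
winning_trajectory is hypothetical.

```python
def winning_trajectory(p_correct, n_moves):
    """Exact P(position is won for Black) after 0, 1, ..., n_moves moves."""
    black = 1.0  # start: certainly won for Black
    trajectory = [black]
    for k in range(n_moves):
        c = p_correct(k)
        if k % 2 == 0:
            # Black to move: a won position survives only a correct move.
            black = black * c
        else:
            # White to move: White's won positions flip when White blunders.
            black = black + (1.0 - black) * (1.0 - c)
        trajectory.append(black)
    return trajectory


N = 100
ramp_up = winning_trajectory(lambda k: k / N, N)
for moves in (0, 2, 10, 50, 90, 100):
    print(f"after {moves:3d} moves: P(won for Black) = {ramp_up[moves]:.4f}")
```

Only even move counts are printed so that Black is always the side to move;
under these assumptions the value starts near 1 and settles close to 0.5 by
move 100.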
>
>  Finally, to those who question these numbers, here is an intuitive
> explanation of the mechanics behind them:
>
>  Suppose you play correctly with probability 50% and you start with Black's
> move from a position that is a win for Black.
>
>  With probability 50% you play correctly, White answers whatever, but you
> still have a won position (White cannot turn a lost position into a won one
> by playing a move.)
>
>  With probability 50% you play incorrectly, and the position is now won for
> White. But White also blunders now with probability 50%, so you get another
> 25% probability to have a won position after the two plies.
>
>  So even though the playout agent has only a 50% probability of playing
> correctly, the probability that after 2 plies the position is still won is
> 75%!
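
The two-ply arithmetic above can be written out with exact fractions. This is
an illustrative Python sketch; the helper two_ply_step and the idea of
repeating the two-ply round 50 times to cover 100 moves are assumptions added
here, not part of the original program.

```python
from fractions import Fraction

p = Fraction(1, 2)  # probability of a correct move when a blunder is possible

# Branch 1: Black plays correctly; White cannot turn a lost position around.
black_correct = p
# Branch 2: Black blunders, then White blunders the win straight back.
both_blunder = (1 - p) * (1 - p)

still_won_after_two_plies = black_correct + both_blunder
print(still_won_after_two_plies)  # 3/4


def two_ply_step(x, p):
    """One Black move plus one White move; x = P(position is won for Black)."""
    # Won stays won via the two branches above; lost becomes won if White
    # blunders on its move.
    return x * (p + (1 - p) ** 2) + (1 - x) * (1 - p)


x = Fraction(1)
for _ in range(50):  # 50 two-ply rounds = 100 moves
    x = two_ply_step(x, p)
print(float(x))  # very close to 2/3
```

Repeating the round shows where the 67% in the "50% flat" row comes from:
the value moves from 3/4 after the first round towards 2/3.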
>
>  All the best,
>
>
> --
>  Antti Huima
>
>
> _______________________________________________
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
>
