Hi,

Some time ago there was a discussion about whether it pays off to improve the quality of an MC play-out agent, and how important it is to keep it "balanced", so I performed the following abstract experiment:

Assume that we start from a position that is a game-theoretic win for Black. If we play out moves from this position, say 100 of them, then every move either switches the game-theoretic value of the current position (a blunder) or does not (in general, a correct move). Of course, the only way a player can switch the game-theoretic value is by blundering in a position that is won for that player, ending up in a position that is lost for the same player: there is no way to "blunder" a lost position into a won one.

I implemented a simple C program that calculates the probability of ending up with the correct game-theoretic value at the end of the simulation, when the probability of blundering (whenever a blunder is possible) is given as a function of the move number. Here are some results (explanations below):

Game length 100, simulations 1000000
                       0% flat | 100.00%
                       1% flat | 99.01%
                       2% flat | 98.04%
                       5% flat | 95.27%
                      10% flat | 90.97%
                      20% flat | 83.28%
                      50% flat | 66.74%
                      80% flat | 55.59%
                      90% flat | 52.65%
                      95% flat | 51.58%
                      98% flat | 57.02%
                      99% flat | 68.53%
                    99.5% flat | 80.37%
                    99.8% flat | 90.97%
                    99.9% flat | 95.21%
                     100% flat | 100.00%
                Linear ramp up | 50.17%
              Linear ramp down | 99.03%
               Squared ramp up | 50.17%
             Squared ramp down | 99.99%
     Squared ramp up, inverted | 98.09%
   Squared ramp down, inverted | 49.97%
                         Spike | 0.00%
      Spike with 10%/10% noise | 52.30%
       Spike with 10%/0% noise | 9.95%
       Spike with 0%/10% noise | 52.34%

Each row represents one million play-outs. The left column names the probability function, i.e. how probable a correct move is at each move number (the blunder probability is its complement), and the right column is the probability that we get the "correct" result at the end of a play-out. Here are the descriptions of the functions:

- N% flat means that each move is correct with probability N% and a blunder with probability (100% - N%), when a blunder is possible (you can't blunder if you are in a lost position)
- Linear ramp up means that the probability of a correct move is 100% * (k/N), where k is the move number and N is the game length, i.e. moves tend to get better and better towards the end of the game
- Linear ramp down is 100% * (1-k/N), i.e. inverted
- Squared ramp up is 100% * (k/N)^2
- Squared ramp down is 100% * (1 - k/N)^2
- Squared ramp up and down inverted are obtained by 100% - X where X is the squared ramp
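- Spike means that Black makes one blunder in the middle but all other moves are correct
- Spike with 10%/10% noise means that the middle move is correct with probability 10% and every other move with probability 90%
- Spike with 10%/0% noise means that the middle move is correct with probability 10% and every other move with probability 100%
- Spike with 0%/10% noise means that the middle move is correct with probability 0% and every other move with probability 90%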
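
For readers who want to reproduce the table, here is a minimal sketch of such a simulation. It is not the original program. All names (simulate, flat_50, spike and so on) are made up for illustration, the ply index k is assumed to be 0-based with Black moving on even plies, and the squared ramp down is taken as (1 - k/N)^2, an assumption chosen because it keeps the two inverted rows distinct from the plain ones.

```c
#include <stdlib.h>

/* Each schedule returns the probability (0..1) that ply k of an n-ply
   game is a correct move. k is 0-based here, which is an assumption. */
double flat_50(int k, int n)           { (void)k; (void)n; return 0.5; }
double linear_ramp_up(int k, int n)    { return (double)k / n; }
double linear_ramp_down(int k, int n)  { return 1.0 - (double)k / n; }
double squared_ramp_up(int k, int n)   { double x = (double)k / n; return x * x; }
double squared_ramp_down(int k, int n) { double x = 1.0 - (double)k / n; return x * x; }
double squared_ramp_up_inverted(int k, int n)   { return 1.0 - squared_ramp_up(k, n); }
double squared_ramp_down_inverted(int k, int n) { return 1.0 - squared_ramp_down(k, n); }
/* Pure spike: a certain blunder on the middle ply, correct play elsewhere. */
double spike(int k, int n)             { return k == n / 2 ? 0.0 : 1.0; }

/* Estimates the probability that a position won for Black (Black to move)
   is still won for Black after n plies. */
double simulate(double (*correct_prob)(int, int), int n, long playouts,
                unsigned seed)
{
    long still_won = 0;
    srand(seed);
    for (long i = 0; i < playouts; i++) {
        int black_wins = 1;                     /* current game-theoretic value */
        for (int k = 0; k < n; k++) {
            int black_to_move = (k % 2 == 0);
            /* Only the side that is currently winning can blunder. */
            if (black_to_move == black_wins) {
                double u = rand() / (RAND_MAX + 1.0);   /* uniform in [0, 1) */
                if (u >= correct_prob(k, n))
                    black_wins = !black_wins;           /* blunder flips the value */
            }
        }
        still_won += black_wins;
    }
    return (double)still_won / (double)playouts;
}
```

Calling simulate(flat_50, 100, 1000000, 1) from your own main should give a value near the 50% flat row, up to sampling noise and the quality of rand().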

And here is some analysis:

- Obviously a move generator that always blunders with probability 1/2 when possible is a great basis for MC analysis, because it ends up with the correct game-theoretic value with 67% probability

- It is also obvious that, of the patterns sampled above, the worst are the rising ramps, i.e. playout agents that play badly in the beginning but get better and better towards the end of the game. For these agents the end result is basically just random noise. The reason is, I believe, that at first both players blunder all the time and the game-theoretic value keeps returning to won for Black (two blunders --> Black still winning), but when the blunder probability starts to drop, the result first becomes more or less random, and then the shrinking blunder probability "locks" the game-theoretic value to that random value.
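
The lock-in can also be followed without sampling, by tracking the exact probability that the position is won for Black after each ply. The sketch below does this for a linear ramp up. The names ramp_up and trajectory are hypothetical, and the 0-based ply index with Black on even plies is an assumption.

```c
/* Correct-move probability for a linear ramp up (0-based ply k, assumed). */
double ramp_up(int k, int n) { return (double)k / n; }

/* Stores in won[k] the exact probability that the position is won for
   Black after ply k, starting from a Black win with Black to move.
   Returns the probability after the last ply. won must hold n entries. */
double trajectory(double (*correct_prob)(int, int), int n, double *won)
{
    double b = 1.0;                         /* P(position is won for Black) */
    for (int k = 0; k < n; k++) {
        double q = 1.0 - correct_prob(k, n);    /* blunder probability */
        if (k % 2 == 0)
            b -= b * q;             /* Black can only blunder away a Black win */
        else
            b += (1.0 - b) * q;     /* White can only blunder away a White win */
        won[k] = b;
    }
    return b;
}
```

With ramp_up and n = 100, the first two entries show the "two blunders" effect (the value drops to 0 after Black's certain blunder and comes straight back after White's near-certain one), while the final value settles close to 1/2.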

Finally, to those who question these numbers, here is an intuitive explanation of the mechanics behind them:

Suppose you play correctly with probability 50% and you start with Black's move from a position that is a win for Black.

With probability 50% you play correctly. White answers with whatever, but you still have a won position (White cannot turn a lost position into a won one by playing a move).

With probability 50% you play incorrectly, and the position is now won for White. But White now also blunders with probability 50%, so you get another 25% probability of having a won position after the two plies.

So even though the playout agent has only a 50% probability of playing correctly, the probability that the position is still won after 2 plies is 75%!
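
The same arithmetic for an arbitrary flat probability p of playing correctly can be written down directly. This is a small sketch with made-up function names. The long-run value is the fixed point of the two-ply step b -> b*p*p + (1 - p), which is my own derivation from the argument above and not something taken from the original program.

```c
/* Probability that a position won for Black (Black to move) is still won
   for Black after two plies: Black plays correctly (p), or Black blunders
   (1 - p) and White blunders straight back (1 - p). */
double won_after_two_plies(double p)
{
    return p + (1.0 - p) * (1.0 - p);
}

/* Long-run limit of repeating the two-ply step, valid for p < 1:
   solving b = b*p*p + (1 - p) gives b = 1 / (1 + p). */
double won_in_the_long_run(double p)
{
    return 1.0 / (1.0 + p);
}
```

For p = 0.5 the first function gives 0.75 and the second gives 2/3, in line with the 50% flat row. For p = 0.01 the long-run value is about 0.9901, in line with the 1% flat row.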

All the best,

--
Antti Huima

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
