Eric,

Yes, as Magnus also stated, MC play-outs don't really estimate the "real" winning probability accurately, but they still get the move order right most of the time.
The situation is that even if the position is really a win, that doesn't mean MC is able to find the proof tree. It does mean that wins are easier to find than losses, so the score, expressed as a winning percentage, goes up, but not to 1.0.

I have also found that if Lazarus says 60%, it is going to win much more than 60% of the time against equal opposition. Of course it's no surprise if it beats weaker opposition from this point.

I thought of mapping these percentages to actual percentages by playing a few thousand self-play games, but this is pretty much futile. If the score is, for instance, 65%, it will tend to grow higher and higher depending on how long I let it think. Unless it discovers a clever defense, that is, so it's possible that it will start declining at a deeper level.

So these numbers are really meaningless as absolute figures and have everything to do with the current context of the search.

- Don

Eric Boesch wrote:
> On 12/11/07, Mark Boon <[EMAIL PROTECTED]> wrote:
>
>> Question: how do MC programs perform with a long ladder on the board?
>>
>> My understanding of MC is limited but thinking about it, a crucial
>> long ladder would automatically make the chances of any playout
>> winning 50-50, regardless of the actual outcome of the ladder.
>>
>
> No, 50/50 would make too much sense. It might be higher or it might be
> lower, depending on whose move it is in the ladder and what style of
> playouts you use, but exactly 50/50 would be like flipping a coin and
> having it land on its edge. In test cases, MC-UCT evaluations tend to
> cluster near 50/50 in any case, because MC-UCT, especially dumb
> uniform MC-UCT, tends to be conservative about predicting the winner,
> especially in 19x19 where the opportunities for luck to overwhelm the
> actual advantage on the board are greater.
> But if you accept this as just a moot scaling issue -- that a clearly
> lopsided exchange can mean just a 2% increment in winning percentage
> even if read more or less correctly -- then the numbers may not look
> so even after all. It's certainly possible for MC-UCT to climb a
> broken ladder in a winning position (and climbing a broken ladder in
> an even position is at least half as bad as that anyhow).
>
> I tried testing this on 19x19 using libego at 1 million playouts per
> move. The behavior was not consistent, but the numbers trended in the
> defender's favor as the sides played out the ladder. In one bizarre
> case, the attacker played out the ladder until there were just 17
> plies left, and then backed off.
>
> Why would the attacker give up a winning ladder? It appears the MC-UCT
> was never actually reading the ladder to begin with; just four or five
> plies in, sometimes just a few thousand simulations were still
> following the key line. 1 million playouts were not nearly enough for
> that in this case; maybe 100 million would be enough, but I couldn't
> test that. Also, after enough simulations, decisively inferior moves
> lead to fewer losses than slightly inferior ones. Suppose you have
> three moves available: one wins 75% of the time, one 50%, and one 25%.
> In the long run, the 75% move will be simulated almost all the time,
> but the middle move will be simulated roughly four times as often as
> the 25% one, which is twice as bad compared to the best move
> available. Four times the simulations with half the loss per
> simulation adds up to twice the excess losses compared to the 25%
> move. That is apropos here, because giving up on an open-field ladder
> once it has been played out for a dozen moves is much more painful for
> the defender than for the attacker.
> The longer the ladder got, the more the evaluations trended in the
> defender's favor, and my best explanation would be the fact that --
> until you actually read the ladder all the way out and find that the
> defender is dead -- every move except pulling out of atari is so
> obviously bad that even uniform MC-UCT did a better job of focusing on
> that one good move.
>
> (Incidentally, the conservative nature of MC-UCT ratings largely
> explains why maximizing winning probabilities alone is not a bad
> strategy, at least in even games. The classic beginner mistake, when
> you already have a clear lead in theory, is to fail to fight hard to
> grab still more points as blunder insurance. But an MC-UCT evaluation
> of 90% typically means a >90% probability of actually winning against
> even opposition, not just a 90% likelihood of a theoretical win.
> Assigning a 65% evaluation to an obvious THEORETICAL win allows plenty
> of room to assign higher evaluations to even more lopsided advantages.
> As Don said, when MC-UCT starts blatantly throwing away points for no
> obvious reason, it's almost certainly because the game is REALLY over,
> because MC-UCT's errors tend to be probabilistic instead of absolute
> -- it may in effect evaluate a dead group as 75% alive, but it won't
> call it 100% alive except in the rare cases when the underlying random
> playout rules forbid the correct line of play.)
> _______________________________________________
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
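
Eric's three-move arithmetic above (moves winning 75%, 50%, and 25%) can be checked with a small bandit simulation. This is only a sketch: it assumes the plain UCB1 selection rule with exploration term sqrt(2 ln t / n), which the thread does not specify, and the function names are made up for illustration. Under that assumption a move's visit count scales roughly as 1/gap^2, where gap is its win-rate deficit to the best move, which is where the "four times the simulations, twice the excess losses" figures come from.

```python
import math
import random


def ucb1_pull_counts(win_rates, total_pulls, seed=0):
    """Simulate UCB1 on Bernoulli 'moves' and return visits per move.

    Assumption: plain UCB1 (mean + sqrt(2 ln t / n)); real MC-UCT
    programs may use a different constant or extra heuristics.
    """
    rng = random.Random(seed)
    pulls = [0] * len(win_rates)
    wins = [0] * len(win_rates)
    for t in range(1, total_pulls + 1):
        if t <= len(win_rates):
            arm = t - 1  # visit every move once before using the formula
        else:
            arm = max(
                range(len(win_rates)),
                key=lambda i: wins[i] / pulls[i]
                + math.sqrt(2.0 * math.log(t) / pulls[i]),
            )
        pulls[arm] += 1
        if rng.random() < win_rates[arm]:
            wins[arm] += 1
    return pulls


def predicted_pull_ratio(best, a, b):
    """Rough long-run visits(a) / visits(b), if visits scale as 1/gap^2."""
    return ((best - b) / (best - a)) ** 2


def predicted_excess_loss_ratio(best, a, b):
    """Rough long-run excess losses of move a relative to move b."""
    return predicted_pull_ratio(best, a, b) * (best - a) / (best - b)


if __name__ == "__main__":
    rates = [0.75, 0.50, 0.25]
    counts = ucb1_pull_counts(rates, 20000)
    print("visits per move:", counts)
    print("observed 50%/25% visit ratio:", counts[1] / counts[2])
    print("predicted visit ratio:", predicted_pull_ratio(*rates))
    print("predicted excess-loss ratio:", predicted_excess_loss_ratio(*rates))
```

The observed ratio from a single run is noisy, so it should only be read as roughly agreeing with the predicted factor of four, not as matching it exactly.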