Mark Boon wrote:
> On 6-dec-07, at 19:29, Don Dailey wrote:
>>
>> Here is an example of why this works so well and why your greedy
>> approach is so wrong:
>>
>> Consider a position where there are 2 groups left that are being
>> fought over. One of these groups is very large and the other is quite
>> small. The computer must win one of these groups or it will lose the
>> game - but if it wins either group it wins the game. The computer
>> estimates that attacking the huge group will result in an impressive win
>> with 80% confidence, but attacking the small group will result in a
>> "tiny" win with 85% confidence.
>
> Don, what I think they are trying to say is the following: in your
> scenario there may be a third group that is currently 'evaluated' as
> dead which is bigger than your small group. This evaluation may be
> wrong; call the chance that it is actually correct X. In this case,
> killing the big group will give you an 80% chance of winning, where
> killing the smaller group gives you 85% times X, the chance the other
> group was evaluated correctly. That may turn out to be far less than
> 80% in total because X needs to be >95%.
>
> So size makes up for some loss of confidence. The question is how to
> quantify it.
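For concreteness, the break-even point in the quoted argument is easy to compute. The 80% and 85% figures are the ones from the scenario above; everything else in this little sketch is just illustrative arithmetic, not something from either post:

```python
# Break-even reliability X for the "third group is dead" evaluation.
# Attacking the small group wins with probability 0.85 * X; attacking
# the huge group wins with probability 0.80 outright.
p_big = 0.80     # win probability when attacking the huge group
p_small = 0.85   # win probability when attacking the small group,
                 # given the third-group evaluation really is correct

break_even = p_big / p_small
print(f"break-even X = {break_even:.3f}")   # ~0.941, i.e. roughly 94-95%
```

In other words, the small-group attack only comes out ahead if the dead-group evaluation is reliable to better than about 94%, which is the ">95%" ballpark in the quote.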
But the point is that size should NEVER over-ride confidence. The size of
stones is very much a consideration in Monte Carlo play-outs, and the only
time it isn't a consideration is when it doesn't matter. The whole
conversation sprang from a misconception that a monte carlo program was
losing game after game due to ignoring a few stones it could have captured,
but this was not what happened - it was a difference in how handicap games
were scored.

Would you rather be 95% confident of a win or 90% confident? There is only
1 correct answer to that question.

> Of course 'evaluation' in MC is completely counterintuitive.

I disagree. This is the most intuitive kind of evaluation, but it's more
difficult to code up in a traditional evaluation function, if that is what
you mean. Trying to grab more stones than you need to win is a more
difficult goal to impose on a program. Your evaluation function DEFINES
the goal that your program will be playing for.

I think you are confusing this with "follow through" in other sports. When
you run a sprint you don't stop at the finish line, you have to run full
strength until you are clearly beyond the finish line, otherwise you pull
up a little short. But I think you imagine that playing to win is like not
following through in sports. You think this is the same as playing to win
by 0.5 points (and no more.) And this simply is NOT the case. In most
cases monte carlo programs are trying to win a lot more than 0.5 points -
precisely because it increases the chances of winning, period. In the many
cases where they win by exactly 0.5 it's because they saw the clear win
early, focused on not spoiling this for anything in the world, and
followed through on this. You would have them stop short of this and go
for some speculative big win and take foolish risks.

> It doesn't evaluate groups separately at all. It makes it hard to
> think about ways to improve it. But I'm absolutely convinced that MC
> programs will play better if they find a way to adjust confidence for
> the size of the win. The correct way just hasn't been found yet. And
> it may not be easy at all.

I have a reason for believing this won't work. First of all, material
evaluation in GO is an even WORSE evaluation function than in Chess. It
says that losing by 0.5 is ALMOST the same as winning by 0.5. When you
have no way to calculate the odds, that seems to be the simplest thing to
do. Monte Carlo play-outs represent something WAY more intelligent: they
know the difference between wins and losses. If you mix something inferior
with something superior, you hope that somehow the whole is greater than
the sum of the parts. But when you mix these two things you are getting
the average of the two, not the best of both.

It's pretty easy to see why this doesn't work. If you do straight stone
counting, a very large group that might be won quickly dominates the
score. Consider this:

1. A group of 5 stones that you can win for sure, and it wins the game.

2. A group of 200 stones - you have a 20% chance of winning it. You lose
   otherwise.

In stone counts, scenario 1 gives you 5 stones and scenario 2 gives you 40
stones (a 20% chance of winning 200 stones.) The difference is enormous!
The losing move wins by a large margin! Even more amazing is that even if
the big group were infinitely large, scenario 1 is a sure win and scenario
2 represents only a 20 percent chance of winning the game. This is the
nature of the error that you want to fold into the evaluation function.
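To make the two evaluations in that example concrete, here is a minimal sketch in Python. The numbers are the ones from the scenario above; the function and move names are mine, purely for illustration:

```python
# Two candidate moves from the scenario: a certain small capture that wins
# the game, versus a long-shot attack on a huge group.
moves = {
    "take the 5-stone group":     {"p_win": 1.00, "stones": 5},
    "attack the 200-stone group": {"p_win": 0.20, "stones": 200},
}

def stone_count_score(m):
    # greedy "material" view: expected number of stones gained
    return m["p_win"] * m["stones"]

def win_rate_score(m):
    # Monte Carlo view: fraction of play-outs that end in a win
    return m["p_win"]

best_by_stones = max(moves, key=lambda k: stone_count_score(moves[k]))
best_by_wins = max(moves, key=lambda k: win_rate_score(moves[k]))
print(best_by_stones)  # attack the 200-stone group (40 expected stones vs 5)
print(best_by_wins)    # take the 5-stone group (100% win vs 20%)
```

The stone counter picks the move that loses the game 80% of the time; the win-rate view picks the sure win, which is exactly the point of the example.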
It's natural to think that MC play-outs themselves are subject to error and
so need some help from some more "sensible" heuristics. For instance, a 30%
winning chance doesn't sound very good, but due to error it could be a lot
more. So should the program revert to stone counting, with the idea that if
you win a few more stones you will have a better chance of winning the
game?

Let's consider the difference in how both programs would play in this
scenario:

1. A stone counter will focus on a few small groups that are easy to win
   and minimize the sure loss.

2. A monte carlo program will see that these few small groups are
   irrelevant and will focus entirely on some line of play that gives it a
   fighting chance. Perhaps attacking a large group that it has about a 30%
   chance of winning. Not good odds, but certainly better than picking off
   the little stones that guarantee a loss.

With an alpha beta search, your horizon is limited. Let's say you are
doing a 5 ply search. Then your evaluation function should be heavily
biased towards counting potential territory. If there is a big group that
can be won in 5 ply and it's not clear whether you can waste any time,
then you should go for it, because you don't have any idea what will
happen later. You must be greedy. However, MC play-outs are not horizon
limited like this. It's stupid to make them greedy, because they may
notice that winning the big group leads to a loss every time and that some
other course of action is more productive.

I don't see how to mix the two, and many of us have tried to no avail. The
only possibility that I can think of is to put the stone count information
in the lower order bits and make them so low they cannot affect the higher
order probability bits.

- Don

> Mark

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
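Don's "lower order bits" suggestion can be read as a strict tie-break: pack the win-rate estimate into the high bits of an integer score and the stone margin into the low bits, so the margin can only decide between moves whose win rates are indistinguishable. A minimal sketch of that packing, assuming a win-rate resolution of 0.1% and a margin clamped into 9 bits (both choices are mine, not from the thread):

```python
# Pack the win rate into the high bits and the stone margin into the low
# bits of a single integer score.  The margin then breaks ties between
# moves whose win rates are equal at the chosen resolution, but it can
# never override a real difference in win rate.

def packed_score(win_rate, margin):
    high = int(round(win_rate * 1000))           # win rate to 0.1% (0..1000)
    low = max(0, min(511, int(round(margin))))   # clamp margin into 9 bits
    return (high << 9) | low

# A higher win rate always dominates, regardless of the margin at stake...
assert packed_score(0.85, 1) > packed_score(0.80, 200)
# ...and the margin decides only when the win rates are indistinguishable.
assert packed_score(0.80, 30) > packed_score(0.80, 5)
```

With this encoding a move that wins more play-outs always scores higher, and the stone count only matters when two moves have exactly the same quantized win rate, which is the behaviour the paragraph above describes.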