On Wed, 2007-07-04 at 11:34 +0200, Magnus Persson wrote: > but what really will make a > difference is in the quality in the playouts.
I would like to suggest a more abstract view of things. In the purest form of the algorithm there isn't an artificial distinction between the tree and the play-outs. The algorithm is applied as if the whole tree already exists (conceptually) and nodes are updated to the end of the game. We had to impose end nodes and a tree that grows in depth due to the fact that it's impractical to store the whole tree in memory. So we have a tree phase on the one hand, and on the other hand we have a play-out phase that simulates an unexplored tree (but without updates which introduces out of necessity a small inefficiency.) This makes everything a bit of a compromise but a well advised one due to hardware limitations. But then we started imposing our will on the play-outs in order to make them "smarter." But we didn't do the same to the tree portion because we now believe they are 2 separate things (even though they really are not.) So I prefer to think of the play-outs and the tree as the same thing. I think whatever is done can be applied to both. For instance Lazarus does a lot of pruning and the pruning rules are the same for tree portion and the play-out portion. Actually, Lazarus saw most of the improvement from the tree pruning when I test each without the other. But I notice that we are now looking at the tree as the "search" portion and the play-outs as the "evaluation function." I think that is incredible because I have always believed that "tree search" and "evaluation" are the same thing, just different forms or states. Like water and ice, or matter and energy. It's interesting that chess has this too. Traditionally programs have always had these 3 very crude phases, search, quiescence, evaluation. Modern programs have somewhat blurred these distinctions but it hasn't changed very much. UCT comes along and finally does away with the distinction altogether. Now you can call it all evaluation or search, whatever pleases you. But in it's purest form, UCT with totally random play-outs is a beautiful thing - a recursive evaluation function with zero (almost) domain specific knowledge. Of course now we just had to go and spoil it all by imposing domain specific rules. I have done the same and I admit it. It would be fun to see how far we could go if domain specific knowledge was forbidden as an experiment. Once patterns are introduced along with other direct Go knowledge, it's still fun but it feels a bit "wrong", kind of like cheating. It's clear that when we do this, we introduce strengths and weaknesses to the program, making it a bit more fragile, less "universal" or robust. Stronger too, but more susceptible to in-transitivity. Of course we do this in Chess programs in a big way. We very tediously "tell" the program what is good and what is bad. It has no choice, it must accept our definition of right and wrong, our morality. However in our great wisdom we provide a search mechanism in order to correct our bad judgments. The search mechanism is an admission that we know we are wrong about many things. Of course you are right - if the play-outs are improved, the quality of the moves will also improve. - Don _______________________________________________ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/