UCT, alpha-beta, Monte-Carlo, and many others (not related to game playing) are all methods of optimization. In college we all did it countless times to find the maximum or minimum of a function. It's the simplest form of optimization. In an optimization two things are usually required: the accuracy and the efficiency in time and resources. To address the efficiency problem one thing can be used. That is an early indication which direction the optimization should go. Most analytical methods use the first and the second order derivatives as indicators. In UCT the values of the top level nodes are used as indicators. Then there is another problem in an optimization. It's the local maximum or minimum which may not be the overall optimum value. Many methods implemented ways to get out of local maxima or minima. When come to the real world problems, another problem appears. It's the noise in the measured data. This problem can more or less be treated with the same method of treating local maxima or minima. In the real world there are other optimization problems. The function one seek to optimize could be a random function with a known or unknown probability distribution. It's the extreme case of the noise problem mentioned earlier, except that the probability distribution may be different. It can also be treated as an extreme case oflocal maxima or minina. In UCT the term exploitation vs exploration (EvE) is just refering to the method of treating this (though extreme here) local maxima or minima. It does this by discounting the values obtained by examine its child nodes. Thus, UCT not only does exploitation and exploration, it also does probability distribution analysis of the values. The main method used is the averaging. From this understanding one improvement of the UCT applied to Go can immediately be made. That is the values of the higher level nodes should be averaged more and less for deep level nodes. Or the values of deeper level nodes should be weighed more in the averaging. This is because the MC evaluation becomes more and more accurate in deeper levels. Is this effect already included in UCT? Probably not. Because UCT is derived for the 'Bandit problem'. In the 'bandit problem' the value of the reward has the same distribution at any level of nodes. However, I have not go through the mathematics of UCT in detail. This effect could be already included even though 'Bandit problem' doesn't require it. Daniel Liu ________________________________________________________________________ AOL now offers free email to everyone. Find out more about what's free from AOL at AOL.com.
_______________________________________________ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/