UCT, alpha-beta, Monte-Carlo, and many others (not related to game playing) are 
all methods of optimization. In college we all did this countless times to find 
the maximum or minimum of a function; that is the simplest form of optimization. 
An optimization method is usually judged on two things: accuracy, and 
efficiency in time and resources.
 
To address the efficiency problem, one thing that can be used is an early 
indication of which direction the optimization should go. Most analytical 
methods use the first- and second-order derivatives as indicators. In UCT, the 
values of the top-level nodes are used as indicators.
 
Then there is another problem in optimization: a local maximum or minimum may 
not be the overall optimum. Many methods implement ways to get out of local 
maxima or minima.
 
When it comes to real-world problems, another problem appears: noise in the 
measured data. This can more or less be treated with the same methods used for 
local maxima or minima.
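As a tiny illustration of the simplest treatment of noise, repeated measurement and averaging, here is a sketch in Python. The objective function and noise level are made up for the example:

```python
import random

def noisy_f(x, rng, sigma=0.5):
    # Hypothetical objective: true value is -(x - 1)^2, plus Gaussian noise.
    return -(x - 1.0) ** 2 + rng.gauss(0.0, sigma)

def averaged_f(x, rng, samples=100):
    # Averaging n independent measurements shrinks the noise by sqrt(n).
    return sum(noisy_f(x, rng) for _ in range(samples)) / samples

rng = random.Random(42)
# A single sample at the true optimum x=1 can be off by roughly sigma,
# but the averaged estimate is close to the true value 0.
est = averaged_f(1.0, rng, samples=10_000)
```

This is essentially what the averaging inside MC evaluation does for Go positions.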
 
In the real world there are other optimization problems. The function one seeks 
to optimize could be a random function with a known or unknown probability 
distribution. This is the extreme case of the noise problem mentioned earlier, 
except that the probability distribution may be different. It can also be 
treated as an extreme case of local maxima or minima.
 
In UCT, the term exploitation vs. exploration (EvE) refers precisely to the 
method of treating these (here, extreme) local maxima or minima. It does this 
by discounting the values obtained by examining its child nodes. Thus UCT not 
only does exploitation and exploration, it also does a probability-distribution 
analysis of the values, the main method being averaging. From this 
understanding, one improvement to UCT as applied to Go can immediately be made: 
the values of higher-level nodes should be averaged more, and those of deeper 
levels less. Equivalently, the values of deeper-level nodes should be weighted 
more heavily in the averaging, because the MC evaluation becomes more and more 
accurate at deeper levels.
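The proposed depth-dependent weighting could be sketched like this in Python. The exponential weight and its base are illustrative assumptions on my part, not tuned values:

```python
def depth_weighted_mean(values_by_depth, base=1.5):
    """Average evaluation values, weighting deeper levels more heavily.

    values_by_depth: list of (depth, value) pairs.
    base: weight growth per level; base**depth is an illustrative choice.
    """
    total_weight = 0.0
    total = 0.0
    for depth, value in values_by_depth:
        w = base ** depth      # deeper nodes get exponentially larger weight
        total += w * value
        total_weight += w
    return total / total_weight
```

With base=1 this reduces to the plain average that standard UCT uses; base>1 trusts the deeper, more accurate MC evaluations more.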
 
Is this effect already included in UCT? Probably not, because UCT was derived 
for the 'bandit problem', in which the reward has the same distribution at 
every level of nodes. However, I have not gone through the mathematics of UCT 
in detail; the effect could already be included even though the 'bandit 
problem' doesn't require it.
 
 
 
Daniel Liu
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/