On Tue, Apr 13, 2010 at 10:36:35PM -0400, [email protected] wrote:
> 
> The policy of basic UCT-MC is to choose the child node that has the largest 
> (or smallest) value. It has been found that with this policy alone the search 
> faces a big hump, which is impossible to overcome with present computing 
> power. To reduce the problem, online and off-line knowledge are used to help 
> choose the child node. The choosing criterion based on the online knowledge 
> is a policy; so is the one based on the off-line knowledge. Instead of 
> combining the three policies to choose a child node, three playout routines 
> can be written. Each playout routine chooses child nodes based on only one 
> of the above policies throughout the playout process. So run N1 playouts of 
> playout_1(), N2 of playout_2() and N3 of playout_3(). There is the 
> possibility of other policies.

Ah, I see. However, the offline knowledge seems beneficial only very
early after node expansion, the online knowledge only slightly later,
and for a well-expanded node not using the actual winrates seems
harmful; so giving up the per-node time dependence of the policy choice
does not seem very appealing.
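That time dependence can be sketched as a single selection rule whose mixing
weights depend on the child's visit count, in the spirit of RAVE-style
blending. This is only an illustrative sketch, not any particular engine's
code; the field names (`wins`, `rave_wins`, `prior`, etc.) and the constants
`equiv_rave` and `prior_weight` are assumptions:

```python
import math

def select_child(children, equiv_rave=1000, prior_weight=10):
    """Pick the child maximizing a visit-dependent blend of the
    offline prior, the online (AMAF/RAVE) value, and the real winrate.

    Each child is a dict with hypothetical fields:
      'wins', 'visits'           -- actual playout statistics
      'rave_wins', 'rave_visits' -- online (AMAF) statistics
      'prior'                    -- offline knowledge in [0, 1]
    """
    def value(c):
        n = c['visits']
        # The RAVE weight beta is ~1 right after expansion and decays
        # as the node accumulates real visits, so the online knowledge
        # helps early and the actual winrate takes over for a
        # well-expanded node.
        beta = math.sqrt(equiv_rave / (3 * n + equiv_rave))
        winrate = c['wins'] / n if n else 0.0
        rave = (c['rave_wins'] / c['rave_visits']
                if c['rave_visits'] else 0.0)
        blended = (1 - beta) * winrate + beta * rave
        # The offline prior acts like prior_weight virtual playouts,
        # so it dominates only very early and fades even faster.
        return (blended * n + c['prior'] * prior_weight) / (n + prior_weight)

    return max(children, key=value)
```

With per-node weights like these, each node gets whichever knowledge source
is most reliable at its current visit count, which is exactly what splitting
the budget into N1/N2/N3 fixed-policy playouts gives up.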

-- 
                                Petr "Pasky" Baudis
http://pasky.or.cz/ | "Ars longa, vita brevis." -- Hippocrates
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
