On Tue, Apr 13, 2010 at 10:36:35PM -0400, [email protected] wrote:
>
> The policy of basic UCT-MC is to choose the child node that has the largest
> (or smallest) value. It has been found that with this policy alone the search
> faces a big hump, which is impossible to overcome with present computing
> power. To reduce the problem, online and offline knowledge are used to help
> choose the child node. The choosing criterion based on the online knowledge
> is a policy, as is the one based on the offline knowledge. Instead of
> combining the three policies to choose a child node, three playout routines
> can be written. Each playout routine chooses child nodes based on only one
> of the above policies throughout the playout process. So run N1 playouts of
> playout_1(), N2 of playout_2(), and N3 of playout_3(). Other policies are
> possible as well.
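[For illustration only, a minimal sketch of the quoted proposal: rather than blending the three selection criteria into one formula, each playout commits to a single policy, and the search runs a fixed budget of playouts per policy. All names here (Node, the policy functions, run_search) are hypothetical, and the policies are trivial stand-ins, not real Go knowledge.]

```python
import random

class Node:
    """Hypothetical tree node with winrate statistics."""
    def __init__(self, children=None):
        self.children = children or []
        self.wins = 0
        self.visits = 0

def policy_winrate(node):
    # online knowledge: pick the child with the best observed winrate
    return max(node.children,
               key=lambda c: c.wins / c.visits if c.visits else 0.5)

def policy_prior(node):
    # offline knowledge: stand-in for a pattern/prior-based choice
    return random.choice(node.children)

def playout(root, choose, depth=10):
    """Descend the tree using one policy only; return a simulated result."""
    node = root
    for _ in range(depth):
        if not node.children:
            break
        node = choose(node)
    return random.random() < 0.5  # stand-in for a real game result

def run_search(root, budgets):
    """Run N_i playouts with policy i, as in the proposal above."""
    results = {}
    for (name, choose), n in budgets:
        results[name] = sum(playout(root, choose) for _ in range(n))
    return results
```

Usage would look like `run_search(root, [(("winrate", policy_winrate), N1), (("prior", policy_prior), N2)])`, returning the win count observed under each policy.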
Ah, I see. However, the offline knowledge seems beneficial only very
early after node expansion, the online knowledge only slightly later,
and for a well-expanded node, not using the actual winrates seems harmful;
so giving up the per-node time dependence of the policy choice does not
seem very appealing.
--
Petr "Pasky" Baudis
http://pasky.or.cz/ | "Ars longa, vita brevis." -- Hippocrates
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go