The policy of basic UCT-MC is to choose the child node that has the largest (or smallest) value. With this policy alone, the search runs into a big hump that is impossible to overcome with present computing power. To reduce the problem, online and off-line knowledge are used to help choose the child node. The selection criterion based on the online knowledge is a policy, and so is the one based on the off-line knowledge. Instead of combining the three policies to choose a child node, three playout routines can be written. Each playout routine chooses child nodes based on only one of the above policies throughout the playout process. So run N1 playouts of playout_1(), N2 playouts of playout_2() and N3 playouts of playout_3(). Other policies are possible as well.
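To make the idea concrete, here is a rough sketch of what I mean, not a tested engine. All names are made up for illustration: the Node fields `online` and `prior` stand in for whatever online and off-line knowledge a program has, the UCB1 formula is my assumption for the basic UCT-MC value, and the playout result is a coin flip where a real program would play out a random game. It also ignores the largest/smallest alternation between the two players.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    prior: float = 0.0    # hypothetical off-line knowledge score (e.g. patterns)
    online: float = 0.0   # hypothetical online knowledge score
    visits: int = 0
    wins: float = 0.0
    children: list = field(default_factory=list)


def uct_policy(parent, child, c=1.4):
    """Basic UCT-MC value (UCB1 assumed); unvisited children go first."""
    if child.visits == 0:
        return math.inf
    mean = child.wins / child.visits
    return mean + c * math.sqrt(math.log(parent.visits) / child.visits)


def online_policy(parent, child):
    """Rank children by online knowledge only."""
    return child.online


def offline_policy(parent, child):
    """Rank children by off-line knowledge only."""
    return child.prior


def playout(root, policy, rng):
    """Descend the shared tree using ONE policy all the way, then back up."""
    path = [root]
    node = root
    while node.children:
        node = max(node.children, key=lambda ch: policy(node, ch))
        path.append(node)
    # Placeholder: a real program would finish the game with random moves.
    result = 1.0 if rng.random() < 0.5 else 0.0
    for n in path:
        n.visits += 1
        n.wins += result
    return result


def run_search(root, n1, n2, n3, seed=0):
    """Run N1, N2 and N3 playouts, one batch per policy, over the same tree."""
    rng = random.Random(seed)
    batches = ((uct_policy, n1), (online_policy, n2), (offline_policy, n3))
    for policy, count in batches:
        for _ in range(count):
            playout(root, policy, rng)
    return root
```

All three routines update the same visit and win counts, so the statistics gathered under one policy are seen by the others. The ratio N1:N2:N3 is then the only knob that mixes the policies.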
DL

-----Original Message-----
From: Petr Baudis <[email protected]>
To: [email protected]
Cc: [email protected]
Sent: Tue, Apr 13, 2010 6:12 am
Subject: Re: [Computer-go] A different approach to RAVE

On Mon, Apr 12, 2010 at 11:20:19PM -0400, [email protected] wrote:
> Instead of using different policies to choose a child node, another
> possibility is to run different playouts over the same tree. Each playout
> uses a different policy. Standard UCT-MC is one of the policies. I think it
> would achieve the same results as RAVE.

I'm sorry, your idea is completely unclear to me. What would be the
other policies? What does this have to do with RAVE?

Petr "Pasky" Baudis
_______________________________________________ Computer-go mailing list [email protected] http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
