Thanks for posting that, Rémi. I do remember seeing it before, but somehow I didn't notice it when looking for RAVE-related material recently.

Mathematics is not my strong point, so I have a hard time making sense of those formulas. I do get the gist that it uses a UCT value and a RAVE value in a similar fashion, one based on actual playouts and the other based on virtual playouts (based on AMAF). The balance with which the two values influence node selection is set by beta, which favours UCT for frequently visited nodes and RAVE for infrequently visited nodes. But I'm not totally clear on what b_r and q_ur actually are in formula (11). (I don't know how to write subscripts in mail.) At first glance this seems to be a somewhat more sophisticated version of what Denis was trying to explain.
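To check my own reading of that gist, here is a rough Python sketch of blending the two values. It is only my assumption of the general shape: the beta schedule sqrt(k / (3n + k)), the constant k, the 0.5 default for unvisited nodes and all the function names are placeholders I made up for illustration, and none of it is meant to reproduce formula (11) or (14) from the attachment.

```python
import math


def beta(visits, k=1000.0):
    """Weight given to the RAVE value.

    Assumed placeholder schedule: equals 1 for an unvisited node and
    falls toward 0 as the number of real playouts grows.
    """
    return math.sqrt(k / (3.0 * visits + k))


def blended_value(wins, visits, amaf_wins, amaf_visits, k=1000.0):
    """Mix the real-playout win rate with the AMAF (virtual) win rate."""
    # Win rate from actual playouts through this node (0.5 if none yet).
    q_uct = wins / visits if visits > 0 else 0.5
    # Win rate from virtual (AMAF) playouts (0.5 if none yet).
    q_rave = amaf_wins / amaf_visits if amaf_visits > 0 else 0.5
    b = beta(visits, k)
    # Rarely visited node: b near 1, so RAVE dominates.
    # Frequently visited node: b near 0, so the real win rate dominates.
    return b * q_rave + (1.0 - b) * q_uct
```

With these placeholder numbers, a node with no real playouts is scored purely on its AMAF statistics, and a node with 1000 real playouts (and k = 1000) weighs the two values equally.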

What is also not clear to me from the article is how this UCT_RAVE value is used after it's calculated. In plain UCT search you select the node with the highest sum of win/loss ratio and UCT value. How does the virtual win/loss ratio get used in combination with the UCT_RAVE value resulting from formula (14)? Is this explained in the original paper by Gelly and Silver?
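To make the question concrete, this is plain UCT selection as I understand it, in a small Python sketch. The exploration constant c, the dictionary layout and the function names are my own assumptions for illustration. Where the virtual win/loss ratio would enter this selection step is exactly the part I am unsure about, so the sketch deliberately leaves it out.

```python
import math


def ucb1(wins, visits, parent_visits, c=1.0):
    """Win rate plus the UCT exploration term for one child node."""
    if visits == 0:
        # Unvisited children are tried first.
        return float("inf")
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(children, c=1.0):
    """Pick the move with the highest win rate + UCT value.

    `children` maps a move name to a (wins, visits) pair.
    """
    parent_visits = sum(visits for _, visits in children.values())
    return max(
        children,
        key=lambda move: ucb1(children[move][0], children[move][1],
                              parent_visits, c),
    )
```

For example, select_child({"a": (60, 100), "b": (5, 10)}) prefers "b" because its exploration term is much larger, while setting c=0 reduces the choice to the raw win rate and picks "a".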

        Mark


On 28-nov-08, at 07:38, Rémi Coulom wrote:

Hi Mark,

Maybe you missed the nice RAVE formula that David Silver posted in that message:
http://computer-go.org/pipermail/computer-go/2008-February/014095.html
Unfortunately, the list archive does not keep attachments. I attached another copy to this message.

I am not sure it is better than your formula, but I thought it would be good to repost it, since it seems that it is not available online anywhere.

Rémi

<rave.pdf>
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
