Thanks for posting that, Rémi. I do remember seeing that before but
somehow I didn't notice it when looking for RAVE-related stuff recently.
Mathematics is not my strong point, so I have a hard time making
sense of those formulas. I do get the gist that it uses a UCT value
and a RAVE value in a similar fashion, one based on actual playouts
and the other on virtual playouts (based on AMAF). The balance
with which the two values influence node selection is determined by
beta, which favours UCT for frequently visited nodes and RAVE for
infrequently visited nodes. But I'm not totally clear on what b_r and
q_ur actually are in formula (11). (I don't know how to write
subscripts in mail.) At first glance this seems to be a somewhat
more sophisticated version of what Denis was trying to explain.
What is also not clear to me from the article is how this UCT-RAVE
value is used after it's calculated. In plain UCT search you select
the node with the highest win/loss+UCT value. How does the virtual
win/loss ratio get used in combination with the UCT-RAVE value
resulting from formula (14)? Is this explained in the original paper by
Gelly and Silver?
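
To make my reading of the gist concrete, here is a rough Python sketch
of blending the two values and then selecting on the result. It is not
the formula from the PDF: the beta schedule sqrt(k / (3n + k)), the
constants k and c, the 0.5 default for unvisited nodes, the single
exploration term and all function and field names are assumptions of
mine, chosen only so that beta is near 1 for rarely visited nodes and
falls towards 0 as real playouts accumulate.

```python
import math

def beta(n, k=1000.0):
    # Assumed schedule (not formula (11)): 1.0 at n = 0, decaying
    # towards 0 as the number of real playouts n grows.
    return math.sqrt(k / (3.0 * n + k))

def blended_value(wins, n, rave_wins, rave_n, parent_n, c=1.0, k=1000.0):
    # Win ratio from actual playouts through this node.
    q_uct = wins / n if n > 0 else 0.5
    # Win ratio from virtual (AMAF) playouts.
    q_rave = rave_wins / rave_n if rave_n > 0 else 0.5
    b = beta(n, k)
    # Assumed UCB-style exploration bonus; the +1 terms avoid log(0)
    # and division by zero for unvisited nodes.
    explore = c * math.sqrt(math.log(parent_n + 1) / (n + 1))
    return b * q_rave + (1.0 - b) * q_uct + explore

def select_child(children, parent_n, c=1.0, k=1000.0):
    # Pick the child with the highest blended value, the same way
    # plain UCT picks the highest win/loss+UCT value.
    return max(
        children,
        key=lambda ch: blended_value(
            ch["wins"], ch["n"], ch["rave_wins"], ch["rave_n"],
            parent_n, c, k,
        ),
    )
```

In this sketch the virtual win/loss ratio is never used on its own: it
only enters through the blended value, and selection then works on that
single number. Whether that matches formula (14) is exactly what I am
unsure about.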
Mark
On 28-nov-08, at 07:38, Rémi Coulom wrote:
Hi Mark,
Maybe you missed the nice RAVE formula that David Silver posted in
that message:
http://computer-go.org/pipermail/computer-go/2008-February/014095.html
Unfortunately, the list archive does not keep attachments. I
attached another copy to this message.
I am not sure it is better than your formula, but I thought it
would be good to repost it, since it seems that it is not available
online anywhere.
Rémi
<rave.pdf>
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/