Re: [computer-go] Rapid action value estimation
On Mon, 5 Nov 2007, Jason House wrote:
> I implemented this yesterday. In doing so, I realized I didn't know the proper way to initialize new leaves in the UCT tree. The MoGo papers seem to describe a progression from always picking an unexplored leaf (i.e., using infinity for the upper confidence bound), to first-play urgency (using a fixed UCB value for new leaves), to using patterns.

What did you decide on?

What is the difference between 'hb-678-UCTRAVE-10k' and 'hb-675-UCT-10k'?

Thanks,
Christoph

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
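The first two leaf-initialization options Jason lists (infinity for unexplored leaves, or a fixed first-play-urgency value) can be compared with a small scoring function. This is only a sketch: the function name `ucb_score`, the parameters `fpu` and `c`, and the default exploration constant are assumptions made for illustration, not values taken from MoGo or from any program discussed in this thread. Pattern-based initialization is not covered.

```python
import math


def ucb_score(wins, visits, parent_visits, fpu=None, c=1.0):
    """Score a child for selection with plain UCB1.

    Hypothetical helper for illustration only.
    - fpu=None: an unvisited child scores infinity, so every child
      is tried once before any child is revisited.
    - fpu=<float>: an unvisited child gets that fixed score instead
      (first-play urgency), so a strong visited sibling can be
      preferred over unexplored ones.
    """
    if visits == 0:
        return math.inf if fpu is None else fpu
    mean = wins / visits
    # Standard UCB1 exploration bonus; c is a tunable constant.
    bonus = c * math.sqrt(math.log(parent_visits) / visits)
    return mean + bonus


if __name__ == "__main__":
    # Unvisited child, "always explore first" policy.
    print(ucb_score(0, 0, parent_visits=10))
    # Unvisited child, first-play urgency with an arbitrary value.
    print(ucb_score(0, 0, parent_visits=10, fpu=1.1))
    # Visited child: mean plus exploration bonus.
    print(ucb_score(5, 10, parent_visits=100))
```

With `fpu` set, a sibling whose score exceeds the fixed value keeps being selected before the remaining unexplored moves, which is the behaviour the first-play-urgency idea is meant to allow.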
Re: [computer-go] Rapid action value estimation
On Wed, 2007-11-07 at 14:34 -0800, Christoph Birk wrote:
> What is the difference between 'hb-678-UCTRAVE-10k' and 'hb-675-UCT-10k'?

It's probably obvious, but UCTRAVE uses RAVE instead of just (tuned) UCT.
[computer-go] Rapid action value estimation
I'd like to implement RAVE as described in [1]. I believe I have a very clear understanding of how to do this at the leaves of the UCT search tree. What I'm not sure about is how to apply RAVE results higher in the UCT tree. Does anyone have any experience with this that they're willing to share?

[1] http://www.machinelearning.org/proceedings/icml2007/papers/387.pdf
Re: [computer-go] Rapid action value estimation
On Fri, 2 Nov 2007, Benjamin Teuber wrote:
> I don't think there's anything different at different depths in the tree. To update RAVE after a simulation, for each child of a node you visited during that simulation, you update if the move leading to the child was played later (until the end of the playout). Then, whenever you calculate the UCT value, you combine it with the RAVE value using the weighted-average formula to give the final score. Of course, you need to be careful with signs :-)

That means you have one global 'RAVE' table? Or one at each node in the UCT tree?

Christoph
Re: [computer-go] Rapid action value estimation
I don't think there's anything different at different depths in the tree. To update RAVE after a simulation, for each child of a node you visited during that simulation, you update its RAVE statistics if the move leading to that child was played later (at any point up to the end of the playout). Then, whenever you calculate the UCT value, you combine it with the RAVE value using the weighted-average formula to give the final score. Of course, you need to be careful with signs :-)

Btw, I don't really see the point of calculating and adding the confidence bound for RAVE as well, since all moves will have been played almost equally often, so I dropped the term. Maybe Sylvain or someone else can comment on this.

Another thing: I didn't believe that you need to do RAVE separately for both colors (i.e., only consider later moves on the point by the same color), as e.g. Peter Drake mentioned in a paper of his. But after some experiments I changed my mind and think he is right =)

Cheers,
Benjamin

On 11/2/07, Jason House wrote:
> I'd like to implement RAVE as described in [1]. I believe I have a very clear understanding of how to do this at the leaves of the UCT search tree. What I'm not sure about is how to apply RAVE results higher in the UCT tree. Does anyone have any experience with this that they're willing to share?
>
> [1] http://www.machinelearning.org/proceedings/icml2007/papers/387.pdf
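The "weighted average formula" Benjamin refers to can be sketched as follows. The schedule used here, beta = sqrt(k / (3n + k)) with n the UCT visit count and k an equivalence parameter, is the one I recall from [1]; please check it against the paper before relying on it. The function names, the default `k=1000`, and the 0.5 fallback for moves with no statistics are assumptions for illustration. Following Benjamin's note, no confidence-bound term is added for the RAVE part, and the UCT exploration term is left out to keep the blend itself visible.

```python
import math


def rave_weight(uct_visits, k=1000):
    """Weight given to the RAVE estimate (hypothetical helper).

    Equals 1.0 with no UCT visits and decays toward 0 as visits
    grow; it is exactly 0.5 when uct_visits == k.
    """
    return math.sqrt(k / (3 * uct_visits + k))


def combined_value(uct_wins, uct_visits, rave_wins, rave_visits, k=1000):
    """Blend the UCT mean and the RAVE mean for one move.

    Means default to 0.5 when a counter is still zero; that prior
    is an assumption of this sketch, not something from the thread.
    """
    uct_mean = uct_wins / uct_visits if uct_visits else 0.5
    rave_mean = rave_wins / rave_visits if rave_visits else 0.5
    beta = rave_weight(uct_visits, k)
    return beta * rave_mean + (1.0 - beta) * uct_mean


if __name__ == "__main__":
    # No UCT visits yet: the value comes entirely from RAVE.
    print(combined_value(0, 0, 3, 4))
    # Many UCT visits: the UCT mean dominates the blend.
    print(combined_value(600, 1000, 30, 100))
```

The "careful with signs" remark applies to how `uct_wins` and `rave_wins` are counted: both should be from the point of view of the player making the move.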
Re: [computer-go] Rapid action value estimation
I store it in the normal UCT tree, so that each node has the variables raveVisits and raveWins besides uctVisits and uctWins. So a node in the UCT-DAG can represent either a position or a move.

On 11/2/07, Christoph Birk wrote:
> On Fri, 2 Nov 2007, Benjamin Teuber wrote:
>> I don't think there's anything different at different depths in the tree. To update RAVE after a simulation, for each child of a node you visited during that simulation, you update if the move leading to the child was played later (until the end of the playout). Then, whenever you calculate the UCT value, you combine it with the RAVE value using the weighted-average formula to give the final score. Of course, you need to be careful with signs :-)
>
> That means you have one global 'RAVE' table? Or one at each node in the UCT tree?
>
> Christoph
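The per-node storage described in this message, together with the update rule quoted above, might look roughly like the sketch below. It is not Benjamin's code: the class `Node`, the function `backup`, the `(color, point)` move encoding, and the `black_won` flag are all assumptions for illustration, and the fields are snake_case versions of raveVisits, raveWins, uctVisits, and uctWins. Because moves are keyed by color as well as point, RAVE is kept separately for both colors. As a simplification, any later occurrence of a move counts, not only the first play on that point.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One tree node; its counters describe the move leading to it."""
    uct_visits: int = 0
    uct_wins: int = 0
    rave_visits: int = 0
    rave_wins: int = 0
    # Maps (color, point) -> child Node, e.g. ("B", "D4").
    children: dict = field(default_factory=dict)


def backup(path, moves, black_won):
    """Update UCT and RAVE counters after one simulation.

    path  : nodes visited in the tree, root first.
    moves : every (color, point) played from the root to the end
            of the playout; moves[i] was played at path[i].
    """
    for depth, node in enumerate(path):
        later = set(moves[depth:])  # moves played from here onward
        for move, child in node.children.items():
            color, _point = move
            # Count wins from the mover's point of view ("signs").
            won = int((color == "B") == black_won)
            # UCT: only the child actually traversed in the tree.
            if depth < len(path) - 1 and move == moves[depth]:
                child.uct_visits += 1
                child.uct_wins += won
            # RAVE: any child whose move shows up later, same color.
            if move in later:
                child.rave_visits += 1
                child.rave_wins += won


if __name__ == "__main__":
    root, a, b = Node(), Node(), Node()
    root.children = {("B", "D4"): a, ("B", "Q16"): b}
    a.children = {("W", "Q16"): Node(), ("W", "C3"): Node()}
    backup([root, a],
           [("B", "D4"), ("W", "C3"), ("B", "Q16")],
           black_won=True)
    print(a, b, sep="\n")
```

In the usage example, the sibling move Q16 for Black receives a RAVE update at the root because Black played it later in the playout, while the White move on Q16 under the first child does not, since only Black played there.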