Re: [computer-go] Rapid action value estimation

2007-11-07 Thread Christoph Birk

On Mon, 5 Nov 2007, Jason House wrote:

> I implemented this yesterday.  In doing so, I realized I didn't know the
> proper way to initialize new leaves in the UCT tree.  MoGo papers seem to
> talk about a progression from always picking an unexplored leaf (AKA using
> infinity for the upper confidence bound), to first play urgency (using a
> fixed UCB for new leaves), to using patterns.
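
For readers following along, the first two options named above (an infinite
bound versus a fixed first-play-urgency value) can be sketched roughly as
below. This is a minimal Python sketch, not MoGo's code: the function name
`ucb_score`, the parameters `fpu` and `c`, and the plain UCB1 formula are all
assumptions made for illustration, and pattern-based initialization is not
shown.

```python
import math


def ucb_score(wins, visits, parent_visits, fpu=None, c=1.0):
    """UCB1-style score for one child of a node (illustrative sketch).

    fpu=None  -> unvisited children score infinity, so they are tried first.
    fpu=value -> unvisited children get a fixed "first play urgency" score
                 (the right value is engine-specific; any number used here
                 is an assumption).
    """
    if visits == 0:
        return math.inf if fpu is None else fpu
    mean = wins / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return mean + exploration
```

With a finite `fpu`, a well-performing visited child can outrank moves that
have never been tried, which is what lets the search go deep before every
sibling has been sampled.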


What did you decide on?
What is the difference between 'hb-678-UCTRAVE-10k' and 'hb-675-UCT-10k'?

Thanks,
Christoph

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Rapid action value estimation

2007-11-07 Thread Jason House

On Wed, 2007-11-07 at 14:34 -0800, Christoph Birk wrote:
> What is the difference between 'hb-678-UCTRAVE-10k' and 'hb-675-UCT-10k'?


It's probably obvious, but UCTRAVE uses RAVE instead of just (tuned)
UCT.



[computer-go] Rapid action value estimation

2007-11-02 Thread Jason House
I'd like to implement RAVE as described in [1].  I believe I have a very
clear understanding of how to do this at the leaves of the UCT search tree.
What I'm not sure about is how to apply RAVE results higher in the UCT
tree.  Does anyone have any experience with this that they're willing to
share?


[1] http://www.machinelearning.org/proceedings/icml2007/papers/387.pdf

Re: [computer-go] Rapid action value estimation

2007-11-02 Thread Christoph Birk

On Fri, 2 Nov 2007, Benjamin Teuber wrote:

> I don't think anything is different at different depths in the tree.
> To update RAVE after a simulation, for each child of a node you visited
> during that simulation, you update if the move leading to the child was
> played later (until the end of the playout).
> Then, whenever you calculate the UCT value, you combine it with the
> RAVE value using the weighted-average formula to give the final score.
> Of course, you need to be careful with signs :-)


That means you have one global 'RAVE' table?
Or one at each node in the UCT tree?

Christoph


Re: [computer-go] Rapid action value estimation

2007-11-02 Thread Benjamin Teuber
I don't think anything is different at different depths in the tree.
To update RAVE after a simulation, for each child of a node you visited
during that simulation, you update if the move leading to the child was
played later (until the end of the playout).
Then, whenever you calculate the UCT value, you combine it with the
RAVE value using the weighted-average formula to give the final score.
Of course, you need to be careful with signs :-)
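
The update and the weighted average described above can be sketched roughly
as follows. This is only an illustration under stated assumptions, not
anyone's actual engine code: `Node`, `update_rave`, `blended_value`, and the
constant `k` are hypothetical names, player 0 is assumed to move at even
plies, and the schedule beta = sqrt(k / (3n + k)) is the one I believe [1]
uses, so check it against the paper before relying on it.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    uctVisits: int = 0
    uctWins: float = 0.0
    raveVisits: int = 0
    raveWins: float = 0.0
    children: dict = field(default_factory=dict)  # move -> child Node


def update_rave(path, moves, winner):
    """Credit RAVE counters after one simulation.

    path[i]  : tree node for the position at ply i
    moves[i] : move played at ply i (tree descent, then playout)
    winner   : 0 or 1; player 0 is assumed to move at even plies
    """
    for ply, node in enumerate(path):
        later = set(moves[ply:])  # everything played from this node onward
        # "Careful with signs": score from the side to move at this node.
        win = 1.0 if winner == ply % 2 else 0.0
        for move, child in node.children.items():
            if move in later:
                child.raveVisits += 1
                child.raveWins += win


def blended_value(child, parent_visits, k=1000.0):
    """Weighted average of the RAVE mean and the UCT mean."""
    rave = child.raveWins / child.raveVisits if child.raveVisits else 0.0
    if child.uctVisits == 0:
        return rave  # no real visits yet, so only RAVE is available
    uct = child.uctWins / child.uctVisits
    if child.raveVisits == 0:
        return uct
    beta = math.sqrt(k / (3.0 * parent_visits + k))  # assumed schedule
    return beta * rave + (1.0 - beta) * uct
```

The same loop runs at every visited node, which matches the point that
nothing changes with depth; only the slice of later moves and the side to
move differ.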

Btw, I don't really see a point in calculating and adding the confidence
bound for RAVE as well, as all moves will have been played almost equally
often, so I dropped the term.
Maybe Sylvain or someone else can comment on this.

Another thing: at first I didn't believe that you need to do RAVE separately
for both colors (i.e. you should only consider later moves on the point by the
same color), as e.g. Peter Drake mentioned in a paper of his. But after some
experiments I changed my mind and think he is right =)
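
The same-color restriction can be sketched as a filter on the move list. This
is a hypothetical helper, not taken from any engine: it assumes the
simulation's moves are stored in play order with strict alternation, and that
passes are recorded as moves so the alternation is never broken.

```python
def later_moves_same_color(moves, ply):
    """Moves played at `ply` or later by the player to move at `ply`.

    Assumes moves[i] was played by player i % 2, so the same player's
    moves are exactly every second entry starting at `ply`.
    """
    return set(moves[ply::2])
```

A RAVE update at the node for ply `ply` would then credit only the children
whose move is in this set, instead of every move seen later in the playout.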

Cheers,
Benjamin

On 11/2/07, Jason House [EMAIL PROTECTED] wrote:

> I'd like to implement RAVE as described in [1].  I believe I have a very
> clear understanding of how to do this at the leaves of the UCT search tree.
> What I'm not sure about is how to apply RAVE results higher in the UCT
> tree.  Does anyone have any experience with this that they're willing to
> share?
>
> [1] http://www.machinelearning.org/proceedings/icml2007/papers/387.pdf



Re: [computer-go] Rapid action value estimation

2007-11-02 Thread Benjamin Teuber
I store it in the normal UCT tree, so that each node has the variables
raveVisits and raveWins besides uctVisits and uctWins.
So a node in the UCT-DAG can represent either a position or a move.
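
To make the "per node, not global" answer concrete, here is a rough sketch of
such a node. The four counter names come from the message above; the
`children` dict and the `child` helper are assumptions added for
illustration, and a plain tree is used rather than a DAG with transpositions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    uctVisits: int = 0
    uctWins: float = 0.0
    raveVisits: int = 0
    raveWins: float = 0.0
    children: dict = field(default_factory=dict)  # move -> child Node

    def child(self, move):
        """Return the child reached by `move`, creating it on first use."""
        return self.children.setdefault(move, Node())
```

Because the counters live on the child, the same move reached from two
different positions keeps two independent sets of RAVE statistics, unlike a
single global table indexed only by move.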

On 11/2/07, Christoph Birk [EMAIL PROTECTED] wrote:

> On Fri, 2 Nov 2007, Benjamin Teuber wrote:
> > I don't think anything is different at different depths in the tree.
> > To update RAVE after a simulation, for each child of a node you visited
> > during that simulation, you update if the move leading to the child was
> > played later (until the end of the playout).
> > Then, whenever you calculate the UCT value, you combine it with the
> > RAVE value using the weighted-average formula to give the final score.
> > Of course, you need to be careful with signs :-)
>
> That means you have one global 'RAVE' table?
> Or one at each node in the UCT tree?
>
> Christoph
