subject:"Re\: \[computer\-go\] More UCT \/ Monte\-Carlo questions \(Effect of rave\)"

Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

2008-02-07 Thread Erik van der Werf

Hmm, sounds like I should experiment some more. Thanks!

Erik


On Thu, Feb 7, 2008 at 1:11 AM, Hideki Kato [EMAIL PROTECTED] wrote:
 Hi Erik,

  My program is very based on MoGo's report and the paper.  Yes, I
  used FPU of 1.15.

  -Hideki

  Erik van der Werf: [EMAIL PROTECTED]:


 Hi Hideki,
  
  Your results look similar to those of Mogo as reported in their icml
  paper. When you ran this experiment, did you use anything like FPU or
  progressive widening, or did you use Levente's original design which
  always selects unvisited moves first?
  
  Regards,
  Erik
  
  
  On Wed, Feb 6, 2008 at 3:42 PM, Hideki Kato [EMAIL PROTECTED] wrote:
   I found some data.  GGMC Go v2r6, against GNU Go 3.7.10 level 10, 9x9,
komi 7.5, 3000 playouts/move, 2000 games match:
  
Without RAVE:   winning rate was 23.1 +- 0.9% (-209 +- 9 ELO)
With RAVE:  winning rate was 65.3 +- 1.1% (+110 +- 8 ELO)
  
Though this includes some other improvements, most come from RAVE.
Unlike MoGo, my best 'K' was 1000.
  
Following is my implementation of RAVE for GGMC v2r6.
1) Each playout returns the score and all moves with colors played.
2) While back-propagating the value (degitized score), computes the
mean and the variance according to UCB1 and do the same for RAVE
seperatelly.  For RAVE, the values of all (legal) moves, except played
one, in a node are updated.
3) In the computation of values for RAVE, the point is that there
appeares three colors (as someone, I remember GCP, mentioned before).
If the players' colors aren't the same then skip.  Count the value as
is or negate (1 - score, for me), depending on the color of the player
at the position and the color for the score.
4) Before back-propagating the value of each playout, I setup a color
table for all intersections of the board for speed-up, in fact
(initialized with EMPTY). That is, fill the board (table[move] =
color) by tracing the moves and the colors returned by the playout
forward (from leaf node to end of the game). Then, by tracing the
path from root to the leaf node, clear the table[move] (table[move] =
EMPTY), in order to avoid duplicate counting with UCB1.
5) While descending the tree, merge the values come from UCB1 and
RAVE with 'K' according to the formula in the paper.
  
#Though I'm writing this by reading my source code, this description
may include some errors.
  
Hope this helps,
  
   Hideki
  
Gian-Carlo Pascutto: [EMAIL PROTECTED]:
  
I also implemented RAVE in Mango. There was a few points of 
 improvements
 (around 60 Elo points with gnugo as reference), but as much as in the
 paper of Gelly and Silver :( (around 250 Elo points if I remember 
 well)

 It might be that the effect of RAVE depends a lot on the simulation
 strategy. Indeed, sometimes my RAVE was playing very good moves but 
 also
 very bad ones.

I don't think the simulation strategy is the key.

I suspect the improvement is largest when you don't do progressive 
 widening.

Nevertheless it would be quite interesting to see the implementation
details of ggmc's RAVE. RAVE performance is quite dependent on exact
implementation and parameters.
--
  
   [EMAIL PROTECTED] (Kato)
___
  
  
   computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
  
  ___
  computer-go mailing list
  computer-go@computer-go.org
  http://www.computer-go.org/mailman/listinfo/computer-go/
  --
  [EMAIL PROTECTED] (Kato)
  ___
  computer-go mailing list
  computer-go@computer-go.org
  http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

RE: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

2008-02-06 Thread Gian-Carlo Pascutto

 I also implemented RAVE in Mango. There was a few points of improvements
 (around 60 Elo points with gnugo as reference), but as much as in the
 paper of Gelly and Silver :( (around 250 Elo points if I remember well)

 It might be that the effect of RAVE depends a lot on the simulation
 strategy. Indeed, sometimes my RAVE was playing very good moves but also
 very bad ones.

I don't think the simulation strategy is the key.

I suspect the improvement is largest when you don't do progressive widening.

Nevertheless it would be quite interesting to see the implementation
details of ggmc's RAVE. RAVE performance is quite dependent on exact
implementation and parameters.

-- 
GCP
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

RE: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

2008-02-06 Thread Chaslot G (MICC)

I also implemented RAVE in Mango. There was a few points of improvements 
(around 60 Elo points with gnugo as reference), but as much as in the paper of 
Gelly and Silver :( (around 250 Elo points if I remember well)

It might be that the effect of RAVE depends a lot on the simulation strategy. 
Indeed, sometimes my RAVE was playing very good moves but also very bad ones.

Guillaume

-Original Message-
From: [EMAIL PROTECTED] on behalf of Magnus Persson
Sent: Wed 06/02/2008 00:42
To: computer-go@computer-go.org
Subject: Re: [computer-go] More UCT / Monte-Carlo questions
 
Quoting Gunnar Farnebäck [EMAIL PROTECTED]:

 I have never managed to implement RAVE successfully. It made my
 program significantly slower but no stronger even at a fixed number of
 simulations.

I get a small effect from RAVE, My rationalisation is that if the  
program is rich with other features to improve performance RAVE may  
not add that much.

-Magnus


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

winmail.dat___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

2008-02-06 Thread Erik van der Werf

Hi Guillaume,

I think we talked about this before, but others may be interested as
well. In my opinion the ICML paper on Rave has several weaknesses.
It's been a while since I read the paper, but here are some I
remember:

(1) They compared Rave to plain UCT. If they would have compared it to
a more sophisticated implementation (like the best Mogo before Rave)
they probably could not have shown a spectacular improvement.

(2) The paper suggest that you should add an upper confidence bound to
the rave values. Although they didn't actually make this explicit,
because IIRC the paper failed to provide an actual value for the
constant c, I guess most people naturally assume it to be positive.
However, Rave is already a greedy heuristic, so if anything you should
probably subtract the confidence bound. Depending on the playout
policy, adding an upper confidence bound to the rave values can push
some terrible bad moves up (like playing on 1-1). The reason seems to
be that such moves are normally sampled very infrequently (so the UCB
will be higher), and when they are selected (e.g. for a big capture
deep in the playout) they correlate more with winning (so the value
will also be higher) without generalizing to earlier positions.

Best,
Erik



On Wed, Feb 6, 2008 at 10:47 AM, Chaslot G (MICC)
[EMAIL PROTECTED] wrote:
 I also implemented RAVE in Mango. There was a few points of improvements 
 (around 60 Elo points with gnugo as reference), but as much as in the paper 
 of Gelly and Silver :( (around 250 Elo points if I remember well)

  It might be that the effect of RAVE depends a lot on the simulation 
 strategy. Indeed, sometimes my RAVE was playing very good moves but also very 
 bad ones.

  Guillaume

  -Original Message-
  From: [EMAIL PROTECTED] on behalf of Magnus Persson
  Sent: Wed 06/02/2008 00:42
  To: computer-go@computer-go.org
  Subject: Re: [computer-go] More UCT / Monte-Carlo questions

  Quoting Gunnar Farnebäck [EMAIL PROTECTED]:

   I have never managed to implement RAVE successfully. It made my
   program significantly slower but no stronger even at a fixed number of
   simulations.

  I get a small effect from RAVE, My rationalisation is that if the
  program is rich with other features to improve performance RAVE may
  not add that much.

  -Magnus


  ___
  computer-go mailing list
  computer-go@computer-go.org
  http://www.computer-go.org/mailman/listinfo/computer-go/


 ___
  computer-go mailing list
  computer-go@computer-go.org
  http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

2008-02-06 Thread Sylvain Gelly

Hi Erik,

 (1) They compared Rave to plain UCT. If they would have compared it to
 a more sophisticated implementation (like the best Mogo before Rave)
 they probably could not have shown a spectacular improvement.

The best Mogo before Rave was very close to plain UCT with the
sequence-like simulations. And indeed we exactly compared the best
Mogo before and after Rave. There is a table (I don't remember which
number), which show the incremental improvements from plain UCT, to
Rave, passing by plain UCT with sequence-like simulations. All
experiments have been done with MoGo's code, all other parts of the
code staying constant. There were not secret part of MoGo disabled
to make the improvement of Rave more interesting.

One discrepancy between our results and the one some of you observe,
as Gian-Carlo stated, is likely to come from the parameters and detail
of implementation. We heavily tuned those parameters and details
against gnugo, and that makes quite a big difference. I chatted more
closely with some of you about details and it did make a difference.
Maybe some of you can share what made a change, if you want.

Note as well that the current implementation of MoGo (not the one at
the time of the ICML paper) use a different tradeoff between UCT and
Rave value, thanks to an idea of David Silver, which brought
improvements in 19x19 (where the Rave values are the most useful),
while it was marginal (still better) in 9x9. But anyway we here are
talking about 9x9, so it can't explain what you are talking about.

 (2) () Depending on the playout
 policy, adding an upper confidence bound to the rave values can push
 some terrible bad moves up (like playing on 1-1). The reason seems to
 be that such moves are normally sampled very infrequently (so the UCB
 will be higher), and when they are selected (...)

That could be an explanation, but there are two points:
- the prior you put on top of Rave often avoid to first sample 1-1,
and even when you do, you very often loose just 1 playout because of
the UCT value you get right away.
- I never observed a big discrepancy between the number of Rave
samples for each move.

Best,
Sylvain
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

2008-02-06 Thread Erik van der Werf

Hi Sylvain,

Thanks for your reply! How do you like your new job? Do you miss CompGo? ;-)


On Wed, Feb 6, 2008 at 2:20 PM, Sylvain Gelly [EMAIL PROTECTED] wrote:
   (1) They compared Rave to plain UCT. If they would have compared it to
   a more sophisticated implementation (like the best Mogo before Rave)
   they probably could not have shown a spectacular improvement.

  The best Mogo before Rave was very close to plain UCT with the
  sequence-like simulations. And indeed we exactly compared the best
  Mogo before and after Rave. There is a table (I don't remember which
  number), which show the incremental improvements from plain UCT, to
  Rave, passing by plain UCT with sequence-like simulations. All
  experiments have been done with MoGo's code, all other parts of the
  code staying constant. There were not secret part of MoGo disabled
  to make the improvement of Rave more interesting.

  One discrepancy between our results and the one some of you observe,
  as Gian-Carlo stated, is likely to come from the parameters and detail
  of implementation. We heavily tuned those parameters and details
  against gnugo, and that makes quite a big difference. I chatted more
  closely with some of you about details and it did make a difference.
  Maybe some of you can share what made a change, if you want.

  Note as well that the current implementation of MoGo (not the one at
  the time of the ICML paper) use a different tradeoff between UCT and
  Rave value, thanks to an idea of David Silver, which brought
  improvements in 19x19 (where the Rave values are the most useful),
  while it was marginal (still better) in 9x9. But anyway we here are
  talking about 9x9, so it can't explain what you are talking about.

Well, since you say the improvement is marginal on 9x9 then I think we
are actually in agreement. I also get an improvement, but it's just
not that much. When I wrote 'spectacular' I meant the reported jump
from 25% to over 55% winrate against gnugo. I only got such an
improvement when I first dumbed down my implementation to a plain UCT
without a good move-ordering (and always expanding all unvisited nodes
first).


   (2) () Depending on the playout

  policy, adding an upper confidence bound to the rave values can push
   some terrible bad moves up (like playing on 1-1). The reason seems to
   be that such moves are normally sampled very infrequently (so the UCB
   will be higher), and when they are selected (...)

  That could be an explanation, but there are two points:
  - the prior you put on top of Rave often avoid to first sample 1-1,
  and even when you do, you very often loose just 1 playout because of
  the UCT value you get right away.

Yes, using more prior knowledge will probably reduce the problem.


  - I never observed a big discrepancy between the number of Rave
  samples for each move.

I guess this is because your playout policy is more uniform than mine?
The problem tends to disappear with uniform random playouts.
My program has some hard-reject patterns to discard moves that are
strictly inferior to adjacent alternatives, so in some situations I
can easily get a large difference between the number of Rave samples
for each move.

Best,
Erik
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

2008-02-06 Thread Hideki Kato

I found some data.  GGMC Go v2r6, against GNU Go 3.7.10 level 10, 9x9, 
komi 7.5, 3000 playouts/move, 2000 games match:

Without RAVE:   winning rate was 23.1 +- 0.9% (-209 +- 9 ELO)
With RAVE:  winning rate was 65.3 +- 1.1% (+110 +- 8 ELO)

Though this includes some other improvements, most come from RAVE.  
Unlike MoGo, my best 'K' was 1000.

Following is my implementation of RAVE for GGMC v2r6.
1) Each playout returns the score and all moves with colors played.
2) While back-propagating the value (degitized score), computes the 
mean and the variance according to UCB1 and do the same for RAVE 
seperatelly.  For RAVE, the values of all (legal) moves, except played 
one, in a node are updated.
3) In the computation of values for RAVE, the point is that there 
appeares three colors (as someone, I remember GCP, mentioned before).  
If the players' colors aren't the same then skip.  Count the value as 
is or negate (1 - score, for me), depending on the color of the player 
at the position and the color for the score.
4) Before back-propagating the value of each playout, I setup a color 
table for all intersections of the board for speed-up, in fact 
(initialized with EMPTY). That is, fill the board (table[move] = 
color) by tracing the moves and the colors returned by the playout 
forward (from leaf node to end of the game). Then, by tracing the 
path from root to the leaf node, clear the table[move] (table[move] = 
EMPTY), in order to avoid duplicate counting with UCB1.
5) While descending the tree, merge the values come from UCB1 and 
RAVE with 'K' according to the formula in the paper.

#Though I'm writing this by reading my source code, this description 
may include some errors.

Hope this helps,
Hideki

Gian-Carlo Pascutto: [EMAIL PROTECTED]:
 I also implemented RAVE in Mango. There was a few points of improvements
 (around 60 Elo points with gnugo as reference), but as much as in the
 paper of Gelly and Silver :( (around 250 Elo points if I remember well)

 It might be that the effect of RAVE depends a lot on the simulation
 strategy. Indeed, sometimes my RAVE was playing very good moves but also
 very bad ones.

I don't think the simulation strategy is the key.

I suspect the improvement is largest when you don't do progressive widening.

Nevertheless it would be quite interesting to see the implementation
details of ggmc's RAVE. RAVE performance is quite dependent on exact
implementation and parameters.
--
[EMAIL PROTECTED] (Kato)
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

2008-02-06 Thread Jason House

On Feb 6, 2008 11:39 AM, Gian-Carlo Pascutto [EMAIL PROTECTED] wrote:

 Hideki Kato wrote:

  4) Before back-propagating the value of each playout, I setup a color
  table for all intersections of the board for speed-up, in fact
  (initialized with EMPTY). That is, fill the board (table[move] =
  color) by tracing the moves and the colors returned by the playout
  forward (from leaf node to end of the game). Then, by tracing the
  path from root to the leaf node, clear the table[move] (table[move] =
  EMPTY), in order to avoid duplicate counting with UCB1.

 I don't understand this. What and how would you be double counting?


Let's a set of stones played in the playout get captured and then the void
gets filled again.  The key is to count only the first move at a set board
position.  If you don't, you can end up counting a particular move more than
once.
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

2008-02-06 Thread Gian-Carlo Pascutto


Hideki Kato wrote:

4) Before back-propagating the value of each playout, I setup a color 
table for all intersections of the board for speed-up, in fact 
(initialized with EMPTY). That is, fill the board (table[move] = 
color) by tracing the moves and the colors returned by the playout 
forward (from leaf node to end of the game). Then, by tracing the 
path from root to the leaf node, clear the table[move] (table[move] = 
EMPTY), in order to avoid duplicate counting with UCB1.


I don't understand this. What and how would you be double counting?

--
GCP
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

2008-02-06 Thread Hideki Kato


Gian-Carlo Pascutto: [EMAIL PROTECTED]:
Hideki Kato wrote:

 4) Before back-propagating the value of each playout, I setup a color 
 table for all intersections of the board for speed-up, in fact 
 (initialized with EMPTY). That is, fill the board (table[move] = 
 color) by tracing the moves and the colors returned by the playout 
 forward (from leaf node to end of the game). Then, by tracing the 
 path from root to the leaf node, clear the table[move] (table[move] = 
 EMPTY), in order to avoid duplicate counting with UCB1.

I don't understand this. What and how would you be double counting?

I mean the values of such moves are updated in both UCB and RAVE.  
That is, the moves in the path are updated by UCB and all moves in 
the nodes in the path are updated by RAVE.  As UCB values and RAVE 
values will be averaged later, perhaps I thought not updating the 
values of such moves in RAVE would be _natural_.

As this code was written last Oct. and I've been working on other 
staffs, I'm not sure I remember the idea correctly.  But I believe 
this improved some.

-Hideki
--
[EMAIL PROTECTED] (Kato)
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

2008-02-06 Thread Erik van der Werf

Hi Hideki,

Your results look similar to those of Mogo as reported in their icml
paper. When you ran this experiment, did you use anything like FPU or
progressive widening, or did you use Levente's original design which
always selects unvisited moves first?

Regards,
Erik


On Wed, Feb 6, 2008 at 3:42 PM, Hideki Kato [EMAIL PROTECTED] wrote:
 I found some data.  GGMC Go v2r6, against GNU Go 3.7.10 level 10, 9x9,
  komi 7.5, 3000 playouts/move, 2000 games match:

  Without RAVE:   winning rate was 23.1 +- 0.9% (-209 +- 9 ELO)
  With RAVE:  winning rate was 65.3 +- 1.1% (+110 +- 8 ELO)

  Though this includes some other improvements, most come from RAVE.
  Unlike MoGo, my best 'K' was 1000.

  Following is my implementation of RAVE for GGMC v2r6.
  1) Each playout returns the score and all moves with colors played.
  2) While back-propagating the value (degitized score), computes the
  mean and the variance according to UCB1 and do the same for RAVE
  seperatelly.  For RAVE, the values of all (legal) moves, except played
  one, in a node are updated.
  3) In the computation of values for RAVE, the point is that there
  appeares three colors (as someone, I remember GCP, mentioned before).
  If the players' colors aren't the same then skip.  Count the value as
  is or negate (1 - score, for me), depending on the color of the player
  at the position and the color for the score.
  4) Before back-propagating the value of each playout, I setup a color
  table for all intersections of the board for speed-up, in fact
  (initialized with EMPTY). That is, fill the board (table[move] =
  color) by tracing the moves and the colors returned by the playout
  forward (from leaf node to end of the game). Then, by tracing the
  path from root to the leaf node, clear the table[move] (table[move] =
  EMPTY), in order to avoid duplicate counting with UCB1.
  5) While descending the tree, merge the values come from UCB1 and
  RAVE with 'K' according to the formula in the paper.

  #Though I'm writing this by reading my source code, this description
  may include some errors.

  Hope this helps,

 Hideki

  Gian-Carlo Pascutto: [EMAIL PROTECTED]:

  I also implemented RAVE in Mango. There was a few points of improvements
   (around 60 Elo points with gnugo as reference), but as much as in the
   paper of Gelly and Silver :( (around 250 Elo points if I remember well)
  
   It might be that the effect of RAVE depends a lot on the simulation
   strategy. Indeed, sometimes my RAVE was playing very good moves but also
   very bad ones.
  
  I don't think the simulation strategy is the key.
  
  I suspect the improvement is largest when you don't do progressive widening.
  
  Nevertheless it would be quite interesting to see the implementation
  details of ggmc's RAVE. RAVE performance is quite dependent on exact
  implementation and parameters.
  --

 [EMAIL PROTECTED] (Kato)
  ___


 computer-go mailing list
  computer-go@computer-go.org
  http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

2008-02-06 Thread Sylvain Gelly

Hi Erik,

 In the ICML version of UCT without RAVE, you did not use your First
 Play Urgency, right?

 I think that using FPU has an effect similar to what others reported
 with their progressive widening. From what I've seen it looks like
 plain UCT, without FPU or progressive widening, has more to gain from
 RAVE.

 Am I wrong to assume that the strongest version of Mogo before you
 introduced RAVE used something like FPU or progressive widening?

You are right, but the effect of terms which were before is by far
negligeable. It may definitely be possible that stronger use of
knowledge like the progressive widening make the effect of RAVE much
less interesting. It was not our case at that time. I am also not sure
that it is the main reason of failure for people who tried and report
a small improvement. Of course, there can be so many reasons that it
is difficult to find out without looking at the implementation in
details :/.

Best,
Sylvain
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

2008-02-06 Thread Hideki Kato

Hi Erik,

My program is very based on MoGo's report and the paper.  Yes, I 
used FPU of 1.15.

-Hideki

Erik van der Werf: [EMAIL PROTECTED]:
Hi Hideki,

Your results look similar to those of Mogo as reported in their icml
paper. When you ran this experiment, did you use anything like FPU or
progressive widening, or did you use Levente's original design which
always selects unvisited moves first?

Regards,
Erik


On Wed, Feb 6, 2008 at 3:42 PM, Hideki Kato [EMAIL PROTECTED] wrote:
 I found some data.  GGMC Go v2r6, against GNU Go 3.7.10 level 10, 9x9,
  komi 7.5, 3000 playouts/move, 2000 games match:

  Without RAVE:   winning rate was 23.1 +- 0.9% (-209 +- 9 ELO)
  With RAVE:  winning rate was 65.3 +- 1.1% (+110 +- 8 ELO)

  Though this includes some other improvements, most come from RAVE.
  Unlike MoGo, my best 'K' was 1000.

  Following is my implementation of RAVE for GGMC v2r6.
  1) Each playout returns the score and all moves with colors played.
  2) While back-propagating the value (degitized score), computes the
  mean and the variance according to UCB1 and do the same for RAVE
  seperatelly.  For RAVE, the values of all (legal) moves, except played
  one, in a node are updated.
  3) In the computation of values for RAVE, the point is that there
  appeares three colors (as someone, I remember GCP, mentioned before).
  If the players' colors aren't the same then skip.  Count the value as
  is or negate (1 - score, for me), depending on the color of the player
  at the position and the color for the score.
  4) Before back-propagating the value of each playout, I setup a color
  table for all intersections of the board for speed-up, in fact
  (initialized with EMPTY). That is, fill the board (table[move] =
  color) by tracing the moves and the colors returned by the playout
  forward (from leaf node to end of the game). Then, by tracing the
  path from root to the leaf node, clear the table[move] (table[move] =
  EMPTY), in order to avoid duplicate counting with UCB1.
  5) While descending the tree, merge the values come from UCB1 and
  RAVE with 'K' according to the formula in the paper.

  #Though I'm writing this by reading my source code, this description
  may include some errors.

  Hope this helps,

 Hideki

  Gian-Carlo Pascutto: [EMAIL PROTECTED]:

  I also implemented RAVE in Mango. There was a few points of improvements
   (around 60 Elo points with gnugo as reference), but as much as in the
   paper of Gelly and Silver :( (around 250 Elo points if I remember well)
  
   It might be that the effect of RAVE depends a lot on the simulation
   strategy. Indeed, sometimes my RAVE was playing very good moves but also
   very bad ones.
  
  I don't think the simulation strategy is the key.
  
  I suspect the improvement is largest when you don't do progressive 
 widening.
  
  Nevertheless it would be quite interesting to see the implementation
  details of ggmc's RAVE. RAVE performance is quite dependent on exact
  implementation and parameters.
  --

 [EMAIL PROTECTED] (Kato)
  ___


 computer-go mailing list
  computer-go@computer-go.org
  http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
--
[EMAIL PROTECTED] (Kato)
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

RE: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

RE: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

13 matches

Site Navigation

Mail list logo

Footer information