Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

Hideki Kato Wed, 06 Feb 2008 16:11:51 -0800

Hi Erik,

My program is closely based on MoGo's report and the paper.  Yes, I 
used an FPU of 1.15.


-Hideki
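
For readers unfamiliar with FPU (first-play urgency): instead of always trying unvisited moves first, an unvisited move is given a fixed urgency and competes with the visited ones. A minimal sketch in Python, assuming plain UCB1 with the textbook exploration constant for visited moves; the function names and the (wins, visits) layout are illustrative and not taken from GGMC or MoGo:

```python
import math

FPU = 1.15  # first-play urgency value quoted above

def urgency(wins, visits, parent_visits, fpu=FPU):
    """UCB1 urgency of a move; an unvisited move gets the constant `fpu`."""
    if visits == 0:
        return fpu
    # Assumption: textbook UCB1 term, not necessarily what GGMC uses.
    return wins / visits + math.sqrt(2.0 * math.log(parent_visits) / visits)

def select(children, parent_visits):
    """children maps move -> (wins, visits); pick the most urgent move."""
    return max(children, key=lambda m: urgency(*children[m], parent_visits))

# A strong visited move can outrank an unvisited one...
strong = {"a": (9, 10), "b": (0, 0)}
# ...while a weak visited move loses to the unvisited one.
weak = {"a": (3, 10), "b": (0, 0)}
print(select(strong, 10), select(weak, 10))
```

With an infinite urgency for unvisited moves (the original design Erik mentions), "b" would be chosen in both cases.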

Erik van der Werf: <[EMAIL PROTECTED]>:
>Hi Hideki,
>
>Your results look similar to those of MoGo as reported in their ICML
>paper. When you ran this experiment, did you use anything like FPU or
>progressive widening, or did you use Levente's original design which
>always selects unvisited moves first?
>
>Regards,
>Erik
>
>
>On Wed, Feb 6, 2008 at 3:42 PM, Hideki Kato <[EMAIL PROTECTED]> wrote:
>> I found some data.  GGMC Go v2r6, against GNU Go 3.7.10 level 10, 9x9,
>>  komi 7.5, 3000 playouts/move, 2000 games match:
>>
>>  Without RAVE:   winning rate was 23.1 +- 0.9% (-209 +- 9 Elo)
>>  With RAVE:      winning rate was 65.3 +- 1.1% (+110 +- 8 Elo)
>>
>>  Though this includes some other improvements, most of the gain
>>  comes from RAVE.  Unlike MoGo, my best 'K' was 1000.
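
The figures above are consistent with the usual logistic Elo conversion, with the error bars read as one binomial standard error over 2000 games. A small check in Python; the function names are illustrative, and the Elo error uses a first-order (delta-method) approximation, which is an assumption about how the bounds were obtained:

```python
import math

def elo_from_winrate(p):
    """Elo difference implied by winning rate p under the logistic model."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def winrate_stderr(p, games):
    """One binomial standard error of a winning rate over `games` games."""
    return math.sqrt(p * (1.0 - p) / games)

def elo_stderr(p, games):
    """First-order propagation of the winning-rate error into Elo."""
    slope = 400.0 / (math.log(10.0) * p * (1.0 - p))  # d(Elo)/dp
    return slope * winrate_stderr(p, games)

for label, p in (("Without RAVE", 0.231), ("With RAVE", 0.653)):
    print(f"{label}: {100 * p:.1f} +- {100 * winrate_stderr(p, 2000):.1f}% "
          f"({elo_from_winrate(p):+.0f} +- {elo_stderr(p, 2000):.0f} Elo)")
```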
>>
>>  Following is my implementation of RAVE for GGMC v2r6.
>>  1) Each playout returns the score and all moves played, with their
>>  colors.
>>  2) While back-propagating the value (digitized score), compute the
>>  mean and the variance according to UCB1, and do the same for RAVE
>>  separately.  For RAVE, the values of all (legal) moves in a node,
>>  except the played one, are updated.
>>  3) In the computation of values for RAVE, the point is that three
>>  colors appear (as someone, GCP I think, mentioned before).
>>  If the players' colors aren't the same, skip the move.  Otherwise
>>  count the value as is or negated (1 - score, for me), depending on
>>  the color of the player at the position and the color for the score.
>>  4) Before back-propagating the value of each playout, I set up a
>>  color table for all intersections of the board (initialized with
>>  EMPTY); this is in fact for speed-up.  That is, fill the table
>>  (table[move] = color) by tracing the moves and the colors returned by
>>  the playout forward (from the leaf node to the end of the game).
>>  Then, by tracing the path from the root to the leaf node, clear
>>  table[move] (table[move] = EMPTY), in order to avoid duplicate
>>  counting with UCB1.
>>  5) While descending the tree, merge the values that come from UCB1
>>  and RAVE with 'K' according to the formula in the paper.
>>
>>  #Though I'm writing this by reading my source code, this description
>>  may include some errors.
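
Steps 1) to 5) above can be sketched roughly as follows in Python. This is not the GGMC source: the class and function names are hypothetical, the score is assumed to be 1 when Black wins and 0 otherwise, the variance bookkeeping is left out, moves are assumed to be board-point indices (no passes), and the merge assumes the schedule beta = sqrt(K / (3n + K)) with n the node's visit count, which should be checked against the paper.

```python
import math

EMPTY, BLACK, WHITE = 0, 1, 2
K = 1000  # the 'K' value quoted above

class Stats:
    """Running count and sum of 0/1 values (variance omitted for brevity)."""
    def __init__(self):
        self.n = 0
        self.total = 0.0
    def add(self, value):
        self.n += 1
        self.total += value
    def mean(self):
        return self.total / self.n if self.n else 0.0

class Node:
    def __init__(self, to_move, legal_moves):
        self.to_move = to_move  # color to play at this node
        self.visits = 0
        self.uct = {m: Stats() for m in legal_moves}
        self.rave = {m: Stats() for m in legal_moves}

def color_table(tree_moves, playout_moves, points):
    """Step 4: mark playout moves by color, then clear the in-tree moves."""
    table = [EMPTY] * points
    for move, color in playout_moves:  # leaf -> end of game
        table[move] = color
    for move in tree_moves:            # root -> leaf
        table[move] = EMPTY
    return table

def backpropagate(path, playout_moves, black_score, points=81):
    """Steps 2-3. `path` is [(node, move played at node), ...], root first."""
    table = color_table([m for _, m in path], playout_moves, points)
    for node, played in path:
        # Value from the point of view of the player to move at this node.
        value = black_score if node.to_move == BLACK else 1 - black_score
        node.visits += 1
        node.uct[played].add(value)
        for move, stats in node.rave.items():
            # Skip moves played by the other color (or not played at all).
            if move != played and table[move] == node.to_move:
                stats.add(value)

def merged_value(node, move, k=K):
    """Step 5, under the assumed schedule beta = sqrt(k / (3n + k))."""
    beta = math.sqrt(k / (3 * node.visits + k))
    return beta * node.rave[move].mean() + (1 - beta) * node.uct[move].mean()

# Tiny example: two tree moves, then a three-move playout won by Black.
root = Node(BLACK, legal_moves=range(81))
child = Node(WHITE, legal_moves=range(81))
path = [(root, 40), (child, 41)]
playout = [(30, BLACK), (31, WHITE), (32, BLACK)]
backpropagate(path, playout, black_score=1)
print(root.rave[30].n, root.rave[31].n, child.rave[31].mean())
```

Note that filling the table with a plain assignment means a later playout move on the same point overwrites an earlier one; the description above does not say whether that is intended.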
>>
>>  Hope this helps,
>>
>> Hideki
>>
>>  Gian-Carlo Pascutto: <[EMAIL PROTECTED]>:
>>
>>  >> I also implemented RAVE in Mango. There were a few points of improvement
>>  >> (around 60 Elo points with gnugo as reference), but not as much as in the
>>  >> paper of Gelly and Silver :( (around 250 Elo points if I remember well)
>>  >>
>>  >> It might be that the effect of RAVE depends a lot on the simulation
>>  >> strategy. Indeed, sometimes my RAVE was playing very good moves but also
>>  >> very bad ones.
>>  >
>>  >I don't think the simulation strategy is the key.
>>  >
>>  >I suspect the improvement is largest when you don't do progressive
>>  >widening.
>>  >
>>  >Nevertheless it would be quite interesting to see the implementation
>>  >details of ggmc's RAVE. RAVE performance is quite dependent on exact
>>  >implementation and parameters.
>>  --
>>
>> [EMAIL PROTECTED] (Kato)
>>  _______________________________________________
>>  computer-go mailing list
>>  computer-go@computer-go.org
>>  http://www.computer-go.org/mailman/listinfo/computer-go/
--
[EMAIL PROTECTED] (Kato)