Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)
Hmm, sounds like I should experiment some more. Thanks! Erik On Thu, Feb 7, 2008 at 1:11 AM, Hideki Kato [EMAIL PROTECTED] wrote: Hi Erik, My program is very based on MoGo's report and the paper. Yes, I used FPU of 1.15. -Hideki Erik van der Werf: [EMAIL PROTECTED]: Hi Hideki, Your results look similar to those of Mogo as reported in their icml paper. When you ran this experiment, did you use anything like FPU or progressive widening, or did you use Levente's original design which always selects unvisited moves first? Regards, Erik On Wed, Feb 6, 2008 at 3:42 PM, Hideki Kato [EMAIL PROTECTED] wrote: I found some data. GGMC Go v2r6, against GNU Go 3.7.10 level 10, 9x9, komi 7.5, 3000 playouts/move, 2000 games match: Without RAVE: winning rate was 23.1 +- 0.9% (-209 +- 9 ELO) With RAVE: winning rate was 65.3 +- 1.1% (+110 +- 8 ELO) Though this includes some other improvements, most come from RAVE. Unlike MoGo, my best 'K' was 1000. Following is my implementation of RAVE for GGMC v2r6. 1) Each playout returns the score and all moves with colors played. 2) While back-propagating the value (degitized score), computes the mean and the variance according to UCB1 and do the same for RAVE seperatelly. For RAVE, the values of all (legal) moves, except played one, in a node are updated. 3) In the computation of values for RAVE, the point is that there appeares three colors (as someone, I remember GCP, mentioned before). If the players' colors aren't the same then skip. Count the value as is or negate (1 - score, for me), depending on the color of the player at the position and the color for the score. 4) Before back-propagating the value of each playout, I setup a color table for all intersections of the board for speed-up, in fact (initialized with EMPTY). That is, fill the board (table[move] = color) by tracing the moves and the colors returned by the playout forward (from leaf node to end of the game). Then, by tracing the path from root to the leaf node, clear the table[move] (table[move] = EMPTY), in order to avoid duplicate counting with UCB1. 5) While descending the tree, merge the values come from UCB1 and RAVE with 'K' according to the formula in the paper. #Though I'm writing this by reading my source code, this description may include some errors. Hope this helps, Hideki Gian-Carlo Pascutto: [EMAIL PROTECTED]: I also implemented RAVE in Mango. There was a few points of improvements (around 60 Elo points with gnugo as reference), but as much as in the paper of Gelly and Silver :( (around 250 Elo points if I remember well) It might be that the effect of RAVE depends a lot on the simulation strategy. Indeed, sometimes my RAVE was playing very good moves but also very bad ones. I don't think the simulation strategy is the key. I suspect the improvement is largest when you don't do progressive widening. Nevertheless it would be quite interesting to see the implementation details of ggmc's RAVE. RAVE performance is quite dependent on exact implementation and parameters. -- [EMAIL PROTECTED] (Kato) ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ -- [EMAIL PROTECTED] (Kato) ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
RE: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)
I also implemented RAVE in Mango. There was a few points of improvements (around 60 Elo points with gnugo as reference), but as much as in the paper of Gelly and Silver :( (around 250 Elo points if I remember well) It might be that the effect of RAVE depends a lot on the simulation strategy. Indeed, sometimes my RAVE was playing very good moves but also very bad ones. I don't think the simulation strategy is the key. I suspect the improvement is largest when you don't do progressive widening. Nevertheless it would be quite interesting to see the implementation details of ggmc's RAVE. RAVE performance is quite dependent on exact implementation and parameters. -- GCP ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
RE: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)
I also implemented RAVE in Mango. There was a few points of improvements (around 60 Elo points with gnugo as reference), but as much as in the paper of Gelly and Silver :( (around 250 Elo points if I remember well) It might be that the effect of RAVE depends a lot on the simulation strategy. Indeed, sometimes my RAVE was playing very good moves but also very bad ones. Guillaume -Original Message- From: [EMAIL PROTECTED] on behalf of Magnus Persson Sent: Wed 06/02/2008 00:42 To: computer-go@computer-go.org Subject: Re: [computer-go] More UCT / Monte-Carlo questions Quoting Gunnar Farnebäck [EMAIL PROTECTED]: I have never managed to implement RAVE successfully. It made my program significantly slower but no stronger even at a fixed number of simulations. I get a small effect from RAVE, My rationalisation is that if the program is rich with other features to improve performance RAVE may not add that much. -Magnus ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ winmail.dat___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)
Hi Guillaume, I think we talked about this before, but others may be interested as well. In my opinion the ICML paper on Rave has several weaknesses. It's been a while since I read the paper, but here are some I remember: (1) They compared Rave to plain UCT. If they would have compared it to a more sophisticated implementation (like the best Mogo before Rave) they probably could not have shown a spectacular improvement. (2) The paper suggest that you should add an upper confidence bound to the rave values. Although they didn't actually make this explicit, because IIRC the paper failed to provide an actual value for the constant c, I guess most people naturally assume it to be positive. However, Rave is already a greedy heuristic, so if anything you should probably subtract the confidence bound. Depending on the playout policy, adding an upper confidence bound to the rave values can push some terrible bad moves up (like playing on 1-1). The reason seems to be that such moves are normally sampled very infrequently (so the UCB will be higher), and when they are selected (e.g. for a big capture deep in the playout) they correlate more with winning (so the value will also be higher) without generalizing to earlier positions. Best, Erik On Wed, Feb 6, 2008 at 10:47 AM, Chaslot G (MICC) [EMAIL PROTECTED] wrote: I also implemented RAVE in Mango. There was a few points of improvements (around 60 Elo points with gnugo as reference), but as much as in the paper of Gelly and Silver :( (around 250 Elo points if I remember well) It might be that the effect of RAVE depends a lot on the simulation strategy. Indeed, sometimes my RAVE was playing very good moves but also very bad ones. Guillaume -Original Message- From: [EMAIL PROTECTED] on behalf of Magnus Persson Sent: Wed 06/02/2008 00:42 To: computer-go@computer-go.org Subject: Re: [computer-go] More UCT / Monte-Carlo questions Quoting Gunnar Farnebäck [EMAIL PROTECTED]: I have never managed to implement RAVE successfully. It made my program significantly slower but no stronger even at a fixed number of simulations. I get a small effect from RAVE, My rationalisation is that if the program is rich with other features to improve performance RAVE may not add that much. -Magnus ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)
Hi Erik, (1) They compared Rave to plain UCT. If they would have compared it to a more sophisticated implementation (like the best Mogo before Rave) they probably could not have shown a spectacular improvement. The best Mogo before Rave was very close to plain UCT with the sequence-like simulations. And indeed we exactly compared the best Mogo before and after Rave. There is a table (I don't remember which number), which show the incremental improvements from plain UCT, to Rave, passing by plain UCT with sequence-like simulations. All experiments have been done with MoGo's code, all other parts of the code staying constant. There were not secret part of MoGo disabled to make the improvement of Rave more interesting. One discrepancy between our results and the one some of you observe, as Gian-Carlo stated, is likely to come from the parameters and detail of implementation. We heavily tuned those parameters and details against gnugo, and that makes quite a big difference. I chatted more closely with some of you about details and it did make a difference. Maybe some of you can share what made a change, if you want. Note as well that the current implementation of MoGo (not the one at the time of the ICML paper) use a different tradeoff between UCT and Rave value, thanks to an idea of David Silver, which brought improvements in 19x19 (where the Rave values are the most useful), while it was marginal (still better) in 9x9. But anyway we here are talking about 9x9, so it can't explain what you are talking about. (2) () Depending on the playout policy, adding an upper confidence bound to the rave values can push some terrible bad moves up (like playing on 1-1). The reason seems to be that such moves are normally sampled very infrequently (so the UCB will be higher), and when they are selected (...) That could be an explanation, but there are two points: - the prior you put on top of Rave often avoid to first sample 1-1, and even when you do, you very often loose just 1 playout because of the UCT value you get right away. - I never observed a big discrepancy between the number of Rave samples for each move. Best, Sylvain ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)
Hi Sylvain, Thanks for your reply! How do you like your new job? Do you miss CompGo? ;-) On Wed, Feb 6, 2008 at 2:20 PM, Sylvain Gelly [EMAIL PROTECTED] wrote: (1) They compared Rave to plain UCT. If they would have compared it to a more sophisticated implementation (like the best Mogo before Rave) they probably could not have shown a spectacular improvement. The best Mogo before Rave was very close to plain UCT with the sequence-like simulations. And indeed we exactly compared the best Mogo before and after Rave. There is a table (I don't remember which number), which show the incremental improvements from plain UCT, to Rave, passing by plain UCT with sequence-like simulations. All experiments have been done with MoGo's code, all other parts of the code staying constant. There were not secret part of MoGo disabled to make the improvement of Rave more interesting. One discrepancy between our results and the one some of you observe, as Gian-Carlo stated, is likely to come from the parameters and detail of implementation. We heavily tuned those parameters and details against gnugo, and that makes quite a big difference. I chatted more closely with some of you about details and it did make a difference. Maybe some of you can share what made a change, if you want. Note as well that the current implementation of MoGo (not the one at the time of the ICML paper) use a different tradeoff between UCT and Rave value, thanks to an idea of David Silver, which brought improvements in 19x19 (where the Rave values are the most useful), while it was marginal (still better) in 9x9. But anyway we here are talking about 9x9, so it can't explain what you are talking about. Well, since you say the improvement is marginal on 9x9 then I think we are actually in agreement. I also get an improvement, but it's just not that much. When I wrote 'spectacular' I meant the reported jump from 25% to over 55% winrate against gnugo. I only got such an improvement when I first dumbed down my implementation to a plain UCT without a good move-ordering (and always expanding all unvisited nodes first). (2) () Depending on the playout policy, adding an upper confidence bound to the rave values can push some terrible bad moves up (like playing on 1-1). The reason seems to be that such moves are normally sampled very infrequently (so the UCB will be higher), and when they are selected (...) That could be an explanation, but there are two points: - the prior you put on top of Rave often avoid to first sample 1-1, and even when you do, you very often loose just 1 playout because of the UCT value you get right away. Yes, using more prior knowledge will probably reduce the problem. - I never observed a big discrepancy between the number of Rave samples for each move. I guess this is because your playout policy is more uniform than mine? The problem tends to disappear with uniform random playouts. My program has some hard-reject patterns to discard moves that are strictly inferior to adjacent alternatives, so in some situations I can easily get a large difference between the number of Rave samples for each move. Best, Erik ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)
I found some data. GGMC Go v2r6, against GNU Go 3.7.10 level 10, 9x9, komi 7.5, 3000 playouts/move, 2000 games match: Without RAVE: winning rate was 23.1 +- 0.9% (-209 +- 9 ELO) With RAVE: winning rate was 65.3 +- 1.1% (+110 +- 8 ELO) Though this includes some other improvements, most come from RAVE. Unlike MoGo, my best 'K' was 1000. Following is my implementation of RAVE for GGMC v2r6. 1) Each playout returns the score and all moves with colors played. 2) While back-propagating the value (degitized score), computes the mean and the variance according to UCB1 and do the same for RAVE seperatelly. For RAVE, the values of all (legal) moves, except played one, in a node are updated. 3) In the computation of values for RAVE, the point is that there appeares three colors (as someone, I remember GCP, mentioned before). If the players' colors aren't the same then skip. Count the value as is or negate (1 - score, for me), depending on the color of the player at the position and the color for the score. 4) Before back-propagating the value of each playout, I setup a color table for all intersections of the board for speed-up, in fact (initialized with EMPTY). That is, fill the board (table[move] = color) by tracing the moves and the colors returned by the playout forward (from leaf node to end of the game). Then, by tracing the path from root to the leaf node, clear the table[move] (table[move] = EMPTY), in order to avoid duplicate counting with UCB1. 5) While descending the tree, merge the values come from UCB1 and RAVE with 'K' according to the formula in the paper. #Though I'm writing this by reading my source code, this description may include some errors. Hope this helps, Hideki Gian-Carlo Pascutto: [EMAIL PROTECTED]: I also implemented RAVE in Mango. There was a few points of improvements (around 60 Elo points with gnugo as reference), but as much as in the paper of Gelly and Silver :( (around 250 Elo points if I remember well) It might be that the effect of RAVE depends a lot on the simulation strategy. Indeed, sometimes my RAVE was playing very good moves but also very bad ones. I don't think the simulation strategy is the key. I suspect the improvement is largest when you don't do progressive widening. Nevertheless it would be quite interesting to see the implementation details of ggmc's RAVE. RAVE performance is quite dependent on exact implementation and parameters. -- [EMAIL PROTECTED] (Kato) ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)
On Feb 6, 2008 11:39 AM, Gian-Carlo Pascutto [EMAIL PROTECTED] wrote: Hideki Kato wrote: 4) Before back-propagating the value of each playout, I setup a color table for all intersections of the board for speed-up, in fact (initialized with EMPTY). That is, fill the board (table[move] = color) by tracing the moves and the colors returned by the playout forward (from leaf node to end of the game). Then, by tracing the path from root to the leaf node, clear the table[move] (table[move] = EMPTY), in order to avoid duplicate counting with UCB1. I don't understand this. What and how would you be double counting? Let's a set of stones played in the playout get captured and then the void gets filled again. The key is to count only the first move at a set board position. If you don't, you can end up counting a particular move more than once. ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)
Hideki Kato wrote: 4) Before back-propagating the value of each playout, I setup a color table for all intersections of the board for speed-up, in fact (initialized with EMPTY). That is, fill the board (table[move] = color) by tracing the moves and the colors returned by the playout forward (from leaf node to end of the game). Then, by tracing the path from root to the leaf node, clear the table[move] (table[move] = EMPTY), in order to avoid duplicate counting with UCB1. I don't understand this. What and how would you be double counting? -- GCP ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)
Gian-Carlo Pascutto: [EMAIL PROTECTED]: Hideki Kato wrote: 4) Before back-propagating the value of each playout, I setup a color table for all intersections of the board for speed-up, in fact (initialized with EMPTY). That is, fill the board (table[move] = color) by tracing the moves and the colors returned by the playout forward (from leaf node to end of the game). Then, by tracing the path from root to the leaf node, clear the table[move] (table[move] = EMPTY), in order to avoid duplicate counting with UCB1. I don't understand this. What and how would you be double counting? I mean the values of such moves are updated in both UCB and RAVE. That is, the moves in the path are updated by UCB and all moves in the nodes in the path are updated by RAVE. As UCB values and RAVE values will be averaged later, perhaps I thought not updating the values of such moves in RAVE would be _natural_. As this code was written last Oct. and I've been working on other staffs, I'm not sure I remember the idea correctly. But I believe this improved some. -Hideki -- [EMAIL PROTECTED] (Kato) ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)
Hi Hideki, Your results look similar to those of Mogo as reported in their icml paper. When you ran this experiment, did you use anything like FPU or progressive widening, or did you use Levente's original design which always selects unvisited moves first? Regards, Erik On Wed, Feb 6, 2008 at 3:42 PM, Hideki Kato [EMAIL PROTECTED] wrote: I found some data. GGMC Go v2r6, against GNU Go 3.7.10 level 10, 9x9, komi 7.5, 3000 playouts/move, 2000 games match: Without RAVE: winning rate was 23.1 +- 0.9% (-209 +- 9 ELO) With RAVE: winning rate was 65.3 +- 1.1% (+110 +- 8 ELO) Though this includes some other improvements, most come from RAVE. Unlike MoGo, my best 'K' was 1000. Following is my implementation of RAVE for GGMC v2r6. 1) Each playout returns the score and all moves with colors played. 2) While back-propagating the value (degitized score), computes the mean and the variance according to UCB1 and do the same for RAVE seperatelly. For RAVE, the values of all (legal) moves, except played one, in a node are updated. 3) In the computation of values for RAVE, the point is that there appeares three colors (as someone, I remember GCP, mentioned before). If the players' colors aren't the same then skip. Count the value as is or negate (1 - score, for me), depending on the color of the player at the position and the color for the score. 4) Before back-propagating the value of each playout, I setup a color table for all intersections of the board for speed-up, in fact (initialized with EMPTY). That is, fill the board (table[move] = color) by tracing the moves and the colors returned by the playout forward (from leaf node to end of the game). Then, by tracing the path from root to the leaf node, clear the table[move] (table[move] = EMPTY), in order to avoid duplicate counting with UCB1. 5) While descending the tree, merge the values come from UCB1 and RAVE with 'K' according to the formula in the paper. #Though I'm writing this by reading my source code, this description may include some errors. Hope this helps, Hideki Gian-Carlo Pascutto: [EMAIL PROTECTED]: I also implemented RAVE in Mango. There was a few points of improvements (around 60 Elo points with gnugo as reference), but as much as in the paper of Gelly and Silver :( (around 250 Elo points if I remember well) It might be that the effect of RAVE depends a lot on the simulation strategy. Indeed, sometimes my RAVE was playing very good moves but also very bad ones. I don't think the simulation strategy is the key. I suspect the improvement is largest when you don't do progressive widening. Nevertheless it would be quite interesting to see the implementation details of ggmc's RAVE. RAVE performance is quite dependent on exact implementation and parameters. -- [EMAIL PROTECTED] (Kato) ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)
Hi Erik, In the ICML version of UCT without RAVE, you did not use your First Play Urgency, right? I think that using FPU has an effect similar to what others reported with their progressive widening. From what I've seen it looks like plain UCT, without FPU or progressive widening, has more to gain from RAVE. Am I wrong to assume that the strongest version of Mogo before you introduced RAVE used something like FPU or progressive widening? You are right, but the effect of terms which were before is by far negligeable. It may definitely be possible that stronger use of knowledge like the progressive widening make the effect of RAVE much less interesting. It was not our case at that time. I am also not sure that it is the main reason of failure for people who tried and report a small improvement. Of course, there can be so many reasons that it is difficult to find out without looking at the implementation in details :/. Best, Sylvain ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)
Hi Erik, My program is very based on MoGo's report and the paper. Yes, I used FPU of 1.15. -Hideki Erik van der Werf: [EMAIL PROTECTED]: Hi Hideki, Your results look similar to those of Mogo as reported in their icml paper. When you ran this experiment, did you use anything like FPU or progressive widening, or did you use Levente's original design which always selects unvisited moves first? Regards, Erik On Wed, Feb 6, 2008 at 3:42 PM, Hideki Kato [EMAIL PROTECTED] wrote: I found some data. GGMC Go v2r6, against GNU Go 3.7.10 level 10, 9x9, komi 7.5, 3000 playouts/move, 2000 games match: Without RAVE: winning rate was 23.1 +- 0.9% (-209 +- 9 ELO) With RAVE: winning rate was 65.3 +- 1.1% (+110 +- 8 ELO) Though this includes some other improvements, most come from RAVE. Unlike MoGo, my best 'K' was 1000. Following is my implementation of RAVE for GGMC v2r6. 1) Each playout returns the score and all moves with colors played. 2) While back-propagating the value (degitized score), computes the mean and the variance according to UCB1 and do the same for RAVE seperatelly. For RAVE, the values of all (legal) moves, except played one, in a node are updated. 3) In the computation of values for RAVE, the point is that there appeares three colors (as someone, I remember GCP, mentioned before). If the players' colors aren't the same then skip. Count the value as is or negate (1 - score, for me), depending on the color of the player at the position and the color for the score. 4) Before back-propagating the value of each playout, I setup a color table for all intersections of the board for speed-up, in fact (initialized with EMPTY). That is, fill the board (table[move] = color) by tracing the moves and the colors returned by the playout forward (from leaf node to end of the game). Then, by tracing the path from root to the leaf node, clear the table[move] (table[move] = EMPTY), in order to avoid duplicate counting with UCB1. 5) While descending the tree, merge the values come from UCB1 and RAVE with 'K' according to the formula in the paper. #Though I'm writing this by reading my source code, this description may include some errors. Hope this helps, Hideki Gian-Carlo Pascutto: [EMAIL PROTECTED]: I also implemented RAVE in Mango. There was a few points of improvements (around 60 Elo points with gnugo as reference), but as much as in the paper of Gelly and Silver :( (around 250 Elo points if I remember well) It might be that the effect of RAVE depends a lot on the simulation strategy. Indeed, sometimes my RAVE was playing very good moves but also very bad ones. I don't think the simulation strategy is the key. I suspect the improvement is largest when you don't do progressive widening. Nevertheless it would be quite interesting to see the implementation details of ggmc's RAVE. RAVE performance is quite dependent on exact implementation and parameters. -- [EMAIL PROTECTED] (Kato) ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ -- [EMAIL PROTECTED] (Kato) ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/