Hi Sylvain,

Thanks for your reply! How do you like your new job? Do you miss CompGo? ;-)

On Wed, Feb 6, 2008 at 2:20 PM, Sylvain Gelly <[EMAIL PROTECTED]> wrote:
> > (1) They compared Rave to plain UCT. If they would have compared it to
> > a more sophisticated implementation (like the best Mogo before Rave)
> > they probably could not have shown a spectacular improvement.
>
> The best Mogo before Rave was very close to plain UCT with the
> sequence-like simulations. And indeed we exactly compared the best
> Mogo before and after Rave. There is a table (I don't remember which
> number), which show the incremental improvements from plain UCT, to
> Rave, passing by plain UCT with sequence-like simulations. All
> experiments have been done with MoGo's code, all other parts of the
> code staying constant. There were not "secret part" of MoGo disabled
> to make the improvement of Rave more interesting.
>
> One discrepancy between our results and the one some of you observe,
> as Gian-Carlo stated, is likely to come from the parameters and detail
> of implementation. We heavily tuned those parameters and details
> against gnugo, and that makes quite a big difference. I chatted more
> closely with some of you about details and it did make a difference.
> Maybe some of you can share what made a change, if you want.
>
> Note as well that the current implementation of MoGo (not the one at
> the time of the ICML paper) use a different tradeoff between UCT and
> Rave value, thanks to an idea of David Silver, which brought
> improvements in 19x19 (where the Rave values are the most useful),
> while it was marginal (still better) in 9x9. But anyway we here are
> talking about 9x9, so it can't explain what you are talking about.

Well, since you say the improvement is marginal on 9x9 then I think we
are actually in agreement. I also get an improvement, but it's just not
that much. When I wrote 'spectacular' I meant the reported jump from 25%
to over 55% winrate against gnugo. I only got such an improvement when I
first dumbed down my implementation to a plain UCT without a good
move-ordering (and always expanding all unvisited nodes first).

> > (2) (....) Depending on the playout
> > policy, adding an upper confidence bound to the rave values can push
> > some terrible bad moves up (like playing on 1-1). The reason seems to
> > be that such moves are normally sampled very infrequently (so the UCB
> > will be higher), and when they are selected (...)
>
> That could be an explanation, but there are two points:
> - the prior you put on top of Rave often avoid to first sample 1-1,
> and even when you do, you very often loose just 1 playout because of
> the UCT value you get right away.

Yes, using more prior knowledge will probably reduce the problem.

> - I never observed a big discrepancy between the number of Rave
> samples for each move.

I guess this is because your playout policy is more uniform than mine?
The problem tends to disappear with uniform random playouts. My program
has some hard-reject patterns to discard moves that are strictly
inferior to adjacent alternatives, so in some situations I can easily
get a large difference between the number of Rave samples for each move.

Best,
Erik
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
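
P.S. To make point (2) concrete, here is a minimal sketch of how an upper confidence bound on Rave values can favour a rarely sampled move such as 1-1. It is not taken from MoGo or from my program: the function name rave_ucb, the constant c, the UCB1-style bonus, and all the numbers are assumptions chosen only for illustration.

```python
import math

def rave_ucb(mean, samples, total_samples, c=1.0):
    """Hypothetical upper confidence bound on a Rave value.

    mean: average Rave outcome for the move, in [0, 1]
    samples: number of Rave samples collected for this move
    total_samples: Rave samples summed over all sibling moves
    c: exploration constant (illustrative, not tuned)
    """
    if samples == 0:
        return math.inf  # an unsampled move would be tried first
    # UCB1-style bonus: shrinks as the move collects more samples
    return mean + c * math.sqrt(math.log(total_samples) / samples)

# Made-up numbers: a playout policy with hard-reject patterns almost
# never plays 1-1, so that move has very few Rave samples.
moves = {
    "reasonable": {"mean": 0.55, "samples": 2000},
    "1-1": {"mean": 0.30, "samples": 5},
}
total = sum(m["samples"] for m in moves.values())
scores = {
    name: rave_ucb(m["mean"], m["samples"], total)
    for name, m in moves.items()
}
best = max(scores, key=scores.get)
print(scores, best)
```

With these numbers the bonus for 1-1 outweighs its poor mean, so it ranks above the reasonable move. Giving both moves the same sample count makes the ordering follow the means again, which matches what I see with uniform random playouts.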