Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

Erik van der Werf Wed, 06 Feb 2008 06:02:57 -0800

Hi Sylvain,

Thanks for your reply! How do you like your new job? Do you miss CompGo? ;-)



On Wed, Feb 6, 2008 at 2:20 PM, Sylvain Gelly <[EMAIL PROTECTED]> wrote:
>  > (1) They compared Rave to plain UCT. If they had compared it to
>  > a more sophisticated implementation (like the best Mogo before Rave)
>  > they probably could not have shown a spectacular improvement.
>
>  The best Mogo before Rave was very close to plain UCT with the
>  sequence-like simulations. And indeed we compared exactly the best
>  Mogo before and after Rave. There is a table (I don't remember which
>  number), which shows the incremental improvements from plain UCT to
>  Rave, passing through plain UCT with sequence-like simulations. All
>  experiments have been done with MoGo's code, all other parts of the
>  code staying constant. There was no "secret part" of MoGo disabled
>  to make the improvement of Rave more interesting.
>
>  One discrepancy between our results and the ones some of you observe,
>  as Gian-Carlo stated, is likely to come from the parameters and details
>  of the implementation. We heavily tuned those parameters and details
>  against gnugo, and that makes quite a big difference. I chatted more
>  closely with some of you about details and it did make a difference.
>  Maybe some of you can share what made a change, if you want.
>
>  Note as well that the current implementation of MoGo (not the one at
>  the time of the ICML paper) uses a different tradeoff between UCT and
>  Rave value, thanks to an idea of David Silver, which brought
>  improvements in 19x19 (where the Rave values are the most useful),
>  while it was marginal (still better) in 9x9. But anyway, here we are
>  talking about 9x9, so it can't explain what you are talking about.

Well, since you say the improvement is marginal on 9x9 then I think we
are actually in agreement. I also get an improvement, but it's just
not that much. When I wrote 'spectacular' I meant the reported jump
from 25% to over 55% winrate against gnugo. I only got such an
improvement when I first dumbed down my implementation to a plain UCT
without a good move-ordering (and always expanding all unvisited nodes
first).
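
For readers unfamiliar with the "tradeoff between UCT and Rave value" mentioned in the quoted text, here is a minimal sketch of one way such a blend can be written. The weighting schedule, the parameter k, and all function names are assumptions chosen for illustration; the thread does not say which schedule either version of MoGo uses, and this is not meant to reproduce it.

```python
import math

def rave_weight(n_uct, k=1000):
    """Weight given to the Rave value; k is an assumed tuning parameter.

    The weight is 1 when the move has no real visits and decays
    towards 0 as the UCT visit count n_uct grows.
    """
    return math.sqrt(k / (3 * n_uct + k))

def blended_value(q_uct, n_uct, q_rave, k=1000):
    """Linear blend of the UCT mean and the Rave mean (illustrative only)."""
    beta = rave_weight(n_uct, k)
    return beta * q_rave + (1 - beta) * q_uct
```

With this particular schedule the two estimates get equal weight when n_uct equals k, so k sets how many real playouts it takes before the UCT value starts to dominate.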


>  > (2) (....) Depending on the playout policy, adding an upper
>  > confidence bound to the rave values can push
>  > some terribly bad moves up (like playing on 1-1). The reason seems to
>  > be that such moves are normally sampled very infrequently (so the UCB
>  > will be higher), and when they are selected (...)
>
>  That could be an explanation, but there are two points:
>  - the prior you put on top of Rave often avoids sampling 1-1 first,
>  and even when you do, you very often lose just 1 playout because of
>  the UCT value you get right away.

Yes, using more prior knowledge will probably reduce the problem.


>  - I never observed a big discrepancy between the number of Rave
>  samples for each move.

I guess this is because your playout policy is more uniform than mine?
The problem tends to disappear with uniform random playouts.
My program has some hard-reject patterns to discard moves that are
strictly inferior to adjacent alternatives, so in some situations I
can easily get a large difference between the number of Rave samples
for each move.
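
To make the effect concrete, here is a small sketch of how a UCB1-style bonus on Rave values behaves when the sample counts are very unequal. The statistics, the constant c, and the function name are made up for illustration; this is not my actual code and not MoGo's.

```python
import math

def rave_ucb(wins, samples, total_samples, c=1.0):
    """Rave mean plus a UCB1-style exploration bonus (c is an assumed constant)."""
    if samples == 0:
        return float("inf")
    mean = wins / samples
    bonus = c * math.sqrt(math.log(total_samples) / samples)
    return mean + bonus

# Hypothetical Rave statistics per move: (wins, samples).
# The 1-1 move is rarely sampled because the playout policy rejects it.
stats = {
    "center": (520, 1000),
    "1-1": (1, 5),
}
total = sum(n for _, n in stats.values())

with_bonus = {m: rave_ucb(w, n, total) for m, (w, n) in stats.items()}
without_bonus = {m: rave_ucb(w, n, total, c=0.0) for m, (w, n) in stats.items()}

best_with_bonus = max(with_bonus, key=with_bonus.get)
best_without_bonus = max(without_bonus, key=without_bonus.get)
```

With these numbers the rarely sampled 1-1 move gets the highest score once the bonus is added, even though its mean is far lower. When the sample counts are similar for all moves the bonus is nearly the same for each and the ordering follows the means.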

Best,
Erik
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
