> Hello all,
>
> We just presented our paper describing MoGo's improvements at ICML,
> and we thought we would pass on some of the feedback and corrections
> we have received.
> (http://www.machinelearning.org/proceedings/icml2007/papers/387.pdf)
>
> I have the feeling that the paper is important, but it is completely
> obfuscated by the strange reinforcement learning notation and jargon.
I am sorry if the paper is not clear to you or other people on this
mailing list. However, we chose this notation for several good reasons:
1. We wish to reach a wide audience - the whole machine learning
community, for whom this notation is well-known.
2. We want other communities to find out about UCT, and start using
it in many different domains. It is not just a Go-programming algorithm!
3. We want to point out that UCT is a reinforcement learning
algorithm, and fits into an existing framework. This is an important
point for all of us - the established ideas and methods of RL can be
applied to our UCT Go programs.
4. There are already many papers describing UCT in the games
literature. There are very few papers describing UCT to the machine
learning community. So we hope to make a clear presentation of UCT to
them, and show that it can achieve good performance.
> Can anyone explain it in Go-programming words?
Maybe I can explain it using pictures :-)
I just updated my website, so you can see our ICML presentation. It
may help to understand the ideas: http://www.cs.ualberta.ca/~silver/
research/presentations/files/sylvain-silver.pdf
> Is the RLGO evaluation function used for evaluation, or just for
> selecting the best move (by doing a 1-ply search)?
We used the RLGO evaluation function in two different ways.
1. We tried using it for play-outs (as a "heavy" simulation), which
didn't work as well as MoGo's handcrafted play-outs. This is
surprising, because RLGO is much stronger than MoGo's simulation player.
2. We tried using it for new nodes in the tree. When a new position
is encountered and we add it to the UCT tree, what should the initial
value be? We use RLGO to provide an initial value, and we specify how
many games of simulation this initial value is worth. The RLGO value
function does better than any of the other heuristics we tried.
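The prior-initialization scheme described above can be sketched roughly as follows. This is not MoGo's actual code; the class and parameter names are hypothetical, and the details (win-rate statistics in [0, 1], UCB1 exploration) are one plausible reading of "an initial value worth some number of simulated games":

```python
# Sketch (not MoGo's actual code): a new UCT node is initialized as if
# n_equiv games had already been played with average outcome equal to
# the heuristic (e.g. RLGO) evaluation. Real play-out results then
# gradually wash the prior out.

import math

class UCTNode:
    def __init__(self, heuristic_value, n_equiv):
        # Pretend we have already seen n_equiv games whose mean outcome
        # equals the heuristic evaluation.
        self.visits = n_equiv
        self.total = heuristic_value * n_equiv

    def update(self, outcome):
        # Record one real simulated game (outcome in [0, 1]).
        self.visits += 1
        self.total += outcome

    def value(self):
        # Prior-blended mean: real results dilute the heuristic prior.
        return self.total / self.visits

    def ucb(self, parent_visits, c=1.0):
        # Standard UCB1 score used to select among children.
        return self.value() + c * math.sqrt(math.log(parent_visits) / self.visits)

# Example: a node whose heuristic value 0.6 is "worth" 10 games, after
# which 5 real simulations are all losses (0.0):
node = UCTNode(heuristic_value=0.6, n_equiv=10)
for _ in range(5):
    node.update(0.0)
print(node.value())  # (0.6 * 10 + 0) / 15 = 0.4
```

The point of the equivalent-experience parameter is that a strong heuristic can be trusted for many simulations before the play-out statistics take over, while a weak one can be overridden quickly.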
> Can anyone explain to me why it is necessary to obfuscate things at
> all? Why is a move an action and not just a move, a game an episode
> and not a game?
> Is it less scientific if coders like myself can understand it?
Not less scientific, but less general. We hope to make the point that
our ideas are not restricted to games.
> It was pointed out by Donald Knuth, in his paper on Alpha-Beta, that
> the simple algorithm was not understood for a long time because of
> inappropriate mathematical notation. For recursive functions,
> (pseudo-)code is much better suited than mathematical notation.
> Actually it's pseudo-mathematical notation.
> Why is this inappropriate notation still used?
Actually I think the best notation would be: description in plain
text + mathematical notation + pseudocode + many diagrams. But in a
conference paper we have just 8 pages to describe everything, so we
must make some compromises.
> I have built, just for fun, a simple backgammon engine. I think it
> does what the paper proposes for the Monte-Carlo part. It uses a
> simple evaluation function to select the next move in the rollout,
> aka Monte-Carlo simulation.
> The engine does not build up a UCT tree. It uses UCT only at the
> root. The rollout always starts at the first ply.
> The 1-ply engine has not the slightest chance against a sophisticated
> backgammon program. But the simple-minded UCT version is already a
> serious opponent.
Why do you call this UCT if there is no tree? Isn't this just roll-out
simulation, as used by Tesauro and Galperin in 1996?
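The root-only scheme in the backgammon example amounts to a bandit over the root moves: UCB1 decides which move to roll out next, with no tree below the root. A minimal sketch, assuming binary rollout outcomes and a hypothetical `simulate(move)` that plays one random game and returns 1 for a win:

```python
# Sketch of root-only UCB1 rollout selection (a bandit, not full UCT).
# `simulate` is a stand-in for a rollout policy; names are hypothetical.

import math
import random

def choose_move(moves, simulate, total_sims=1000, c=1.4):
    """Allocate rollouts among root moves with UCB1; return the
    most-visited move at the end."""
    visits = {m: 0 for m in moves}
    wins = {m: 0.0 for m in moves}
    # Sample every move once so each arm has a defined mean.
    for m in moves:
        wins[m] += simulate(m)
        visits[m] += 1
    for t in range(len(moves), total_sims):
        # UCB1: mean win rate plus an exploration bonus for
        # under-sampled moves.
        pick = max(moves, key=lambda m: wins[m] / visits[m]
                   + c * math.sqrt(math.log(t) / visits[m]))
        wins[pick] += simulate(pick)
        visits[pick] += 1
    return max(moves, key=lambda m: visits[m])

# Toy usage: move "b" wins 70% of rollouts, "a" only 30%, so UCB1
# should concentrate its simulations on "b".
random.seed(0)
best = choose_move(["a", "b"],
                   lambda m: random.random() < (0.7 if m == "b" else 0.3))
print(best)
```

This matches Dave's point: without a tree, the exploration/exploitation logic only shapes how rollouts are divided among root moves, which is rollout simulation with a bandit on top rather than UCT proper.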
-Dave
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/