> > Hello all,
> >
> > We just presented our paper describing MoGo's improvements at ICML,
> > and we thought we would pass on some of the feedback and corrections
> > we have received.
> > (http://www.machinelearning.org/proceedings/icml2007/papers/387.pdf)
> >
> I have the feeling that the paper is important, but it is completely
> obfuscated by the strange reinforcement learning notation and jargon.

I am sorry if the paper is not clear to you or other people on this mailing list. However, we chose this notation for several good reasons:

1. We wish to reach a wide audience - the whole machine learning community, for whom this notation is well known.
2. We want other communities to find out about UCT, and start using it in many different domains. It is not just a Go-programming algorithm!
3. We want to point out that UCT is a reinforcement learning algorithm, and fits into an existing framework. This is an important point for all of us - the established ideas and methods of RL can be applied to our UCT Go programs.
4. There are already many papers describing UCT in the games literature. There are very few papers describing UCT to the machine learning community. So we hope to make a clear presentation of UCT to them, and show that it can achieve good performance.

> Can anyone explain it in Go-programming words?

Maybe I can explain it using pictures :-)
I just updated my website, so you can see our ICML presentation. It may help to understand the ideas: http://www.cs.ualberta.ca/~silver/research/presentations/files/sylvain-silver.pdf

> Is the RLGO evaluation function used for evaluation, or just for selecting
> the best move (by doing a 1-ply search)?

We used the RLGO evaluation function in two different ways.

1. We tried using it for play-outs (as a "heavy" simulation), which didn't work as well as MoGo's handcrafted play-outs. This is surprising, because RLGO is much stronger than MoGo's simulation player. (See the first sketch below.)

2. We tried using it for new nodes in the tree. When a new position is encountered and we add it to the UCT tree, what should its initial value be? We use RLGO to provide an initial value, and we specify how many games of simulation this initial value is worth. The RLGO value function does better than any of the other heuristics we tried. (See the second sketch below.)
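
To make point 1 concrete, here is a minimal sketch (in Python) of what a play-out guided by an evaluation function might look like. This is not MoGo's or RLGO's actual code: the position interface (is_terminal, legal_moves, play, result) and the softmax move selection are assumptions for illustration only.

    import math
    import random

    def heavy_playout(position, evaluate, temperature=1.0):
        """Hypothetical 'heavy' play-out: instead of picking moves
        uniformly at random, score the position after each legal move
        with a learned evaluation function and sample from a softmax.
        evaluate is assumed to return the value of the resulting
        position from the mover's point of view."""
        while not position.is_terminal():
            moves = position.legal_moves()
            scores = [evaluate(position.play(m)) for m in moves]
            weights = [math.exp(s / temperature) for s in scores]
            move = random.choices(moves, weights=weights, k=1)[0]
            position = position.play(move)
        return position.result()  # e.g. 1 for a win, 0 for a loss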

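And to make point 2 concrete, a minimal sketch of initialising a new node with a heuristic prior that is "worth" some number of simulated games. Again, the class and field names are hypothetical, not the paper's code; the idea is just that the node starts as if it had already been visited n_prior times with an average result equal to the heuristic value, so the prior is gradually washed out by real simulations.

    class UCTNode:
        def __init__(self, heuristic_value=None, n_prior=0):
            """Hypothetical UCT node. If a heuristic value is supplied,
            start as if n_prior simulations with that average result
            had already been played through this node."""
            self.children = {}
            if heuristic_value is None:
                self.visits = 0
                self.total_value = 0.0
            else:
                self.visits = n_prior
                self.total_value = n_prior * heuristic_value

        def update(self, result):
            # Standard Monte-Carlo backup: real results are simply
            # averaged in with the prior.
            self.visits += 1
            self.total_value += result

        def mean_value(self):
            return self.total_value / self.visits if self.visits else 0.0

With, say, n_prior = 50 (an arbitrary number for this sketch), the heuristic estimate dominates the node's value until roughly 50 real simulations have passed through it, after which the simulation statistics take over.
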
> Can anyone explain to me why it is necessary to obfuscate things at all? Why is a move an action and not just a move, and a game an episode and not a game?
> Is it less scientific if coders like myself can understand it?

Not less scientific, but less general. We hope to make the point that our ideas are not restricted to games.


> It was pointed out by Donald Knuth in his paper on Alpha-Beta that the -
> simple - algorithm was not understood for a long time, because of the
> inappropriate mathematical notation. For recursive functions, (pseudo-)code
> is much better suited than mathematical notation. Actually it's
> pseudo-mathematical notation.
> Why is this inappropriate notation still used?

Actually I think the best notation would be: description in plain text + mathematical notation + pseudocode + many diagrams. But in a conference paper we have just 8 pages to describe everything, so we must make some compromises.


> I have built, just for fun, a simple Backgammon engine. I think it does what the paper proposes for the Monte-Carlo part. It uses a simple evaluation function to select the next move in the rollout, a.k.a. the Monte-Carlo simulation. The engine does not build up a UCT tree; it uses UCT only at the root. The rollout always starts at the first ply.
> The 1-ply engine has not the slightest chance against a sophisticated Backgammon program, but the simple-minded UCT version is already a serious opponent.

Why do you call this UCT if there is no tree? Isn't this just roll-out simulation, as used by Tesauro and Galperin in 1996?
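
To spell out the difference in a sketch (Python, with an assumed position/simulate interface, so purely illustrative): what you describe sounds like a bandit at the root only - each root move keeps its own visit count and average rollout result, UCB1 decides which move to simulate next, and no tree is ever built below the root.

    import math

    def ucb_at_root(position, simulate, n_simulations, c=math.sqrt(2)):
        """Sketch of 'UCT only at the root': UCB1 over the root moves,
        plain Monte-Carlo rollouts below, and no tree is ever built."""
        moves = position.legal_moves()
        visits = {m: 0 for m in moves}
        total = {m: 0.0 for m in moves}

        for t in range(1, n_simulations + 1):
            def ucb1(m):
                if visits[m] == 0:
                    return float("inf")  # try every move at least once
                return total[m] / visits[m] + c * math.sqrt(math.log(t) / visits[m])

            move = max(moves, key=ucb1)
            result = simulate(position.play(move))  # plain rollout to the end
            visits[move] += 1
            total[move] += result

        # Recommend the most-visited root move.
        return max(moves, key=lambda m: visits[m])

Full UCT would, in addition, add one new node to a tree after each simulation and apply the same UCB1 rule at every node already in the tree, not just at the root.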

-Dave
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
