> Hello all,
>
> We just presented our paper describing MoGo's improvements at ICML,
> and we thought we would pass on some of the feedback and corrections
> we have received.
> (http://www.machinelearning.org/proceedings/icml2007/papers/387.pdf)
>
> I have the feeling that the paper is important, but it is completely
> obfuscated by the strange reinforcement learning notation and jargon.
I am sorry if the paper is not clear to you or other people on this
mailing list. However, we chose this notation for several good reasons:
1. We wish to reach a wide audience - the whole machine learning
community, for whom this notation is well-known.
2. We want other communities to find out about UCT, and start using
it in many different domains. It is not just a Go-programming algorithm!
3. We want to point out that UCT is a reinforcement learning
algorithm, and fits into an existing framework. This is an important
point for all of us - the established ideas and methods of RL can be
applied to our UCT Go programs.
4. There are already many papers describing UCT in the games
literature. There are very few papers describing UCT to the machine
learning community. So we hope to make a clear presentation of UCT to
them, and show that it can achieve good performance.
> Can anyone explain it in Go-programming words?
Maybe I can explain it using pictures :-)
I just updated my website, so you can see our ICML presentation. It
may help to understand the ideas: http://www.cs.ualberta.ca/~silver/
research/presentations/files/sylvain-silver.pdf
> Is the RLGO evaluation function used for evaluation, or just for
> selecting the best move (by doing a 1-ply search)?
We used the RLGO evaluation function in two different ways.
1. We tried using it for play-outs (as a "heavy" simulation), which
didn't work as well as MoGo's handcrafted play-outs. This is
surprising, because RLGO is much stronger than MoGo's simulation player.
2. We tried using it for new nodes in the tree. When a new position
is encountered and we add it to the UCT tree, what should the initial
value be? We use RLGO to provide an initial value, and we specify how
many games of simulation this initial value is worth. The RLGO value
function does better than any of the other heuristics we tried.
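The prior-initialization scheme described above can be sketched roughly as follows. This is not MoGo's actual code; the class and parameter names are hypothetical, and the details (win-rate statistics in [0, 1], UCB1 exploration) are one plausible reading of "an initial value worth some number of simulated games":

```python
# Sketch (not MoGo's actual code): a new UCT node is initialized as if
# n_equiv games had already been played with average outcome equal to
# the heuristic (e.g. RLGO) evaluation. Real play-out results then
# gradually wash the prior out.

import math

class UCTNode:
    def __init__(self, heuristic_value, n_equiv):
        # Pretend we have already seen n_equiv games whose mean outcome
        # equals the heuristic evaluation.
        self.visits = n_equiv
        self.total = heuristic_value * n_equiv

    def update(self, outcome):
        # Record one real simulated game (outcome in [0, 1]).
        self.visits += 1
        self.total += outcome

    def value(self):
        # Prior-blended mean: real results dilute the heuristic prior.
        return self.total / self.visits

    def ucb(self, parent_visits, c=1.0):
        # Standard UCB1 score used to select among children.
        return self.value() + c * math.sqrt(math.log(parent_visits) / self.visits)

# Example: a node whose heuristic value 0.6 is "worth" 10 games, after
# which 5 real simulations are all losses (0.0):
node = UCTNode(heuristic_value=0.6, n_equiv=10)
for _ in range(5):
    node.update(0.0)
print(node.value())  # (0.6 * 10 + 0) / 15 = 0.4
```

The point of the equivalent-experience parameter is that a strong heuristic can be trusted for many simulations before the play-out statistics take over, while a weak one can be overridden quickly.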
> Can anyone explain to me why it is necessary to obfuscate things at
> all? Why is a move an action and not just a move, a game an episode
> and not a game?
> Is it less scientific if coders like myself can understand it?
Not less scientific, but less general. We hope to make the point that
our ideas are not restricted to games.
> It was pointed out by Donald Knuth, in his paper on Alpha-Beta, that
> the simple algorithm was not understood for a long time because of
> inappropriate mathematical notation. For recursive functions,
> (pseudo-)code is much better suited than mathematical notation.
> Actually it's pseudo-mathematical notation.
> Why is this inappropriate notation still used?
Actually I think the best notation would be: description in plain
text + mathematical notation + pseudocode + many diagrams. But in a
conference paper we have just 8 pages to describe everything, so we
must make some compromises.
> I have built, just for fun, a simple backgammon engine. I think it
> does what the paper proposes for the Monte-Carlo part. It uses a
> simple evaluation function to select the next move in the rollout,
> aka Monte-Carlo simulation.
> The engine does not build up a UCT tree. It uses UCT only at the
> root. The rollout always starts at the first ply.
> The 1-ply engine has not the slightest chance against a sophisticated
> backgammon program. But the simple-minded UCT version is already a
> serious opponent.
Why do you call this UCT if there is no tree? Isn't this just roll-out
simulation, as used by Tesauro and Galperin in 1996?
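The root-only scheme in the backgammon example amounts to a bandit over the root moves: UCB1 decides which move to roll out next, with no tree below the root. A minimal sketch, assuming binary rollout outcomes and a hypothetical `simulate(move)` that plays one random game and returns 1 for a win:

```python
# Sketch of root-only UCB1 rollout selection (a bandit, not full UCT).
# `simulate` is a stand-in for a rollout policy; names are hypothetical.

import math
import random

def choose_move(moves, simulate, total_sims=1000, c=1.4):
    """Allocate rollouts among root moves with UCB1; return the
    most-visited move at the end."""
    visits = {m: 0 for m in moves}
    wins = {m: 0.0 for m in moves}
    # Sample every move once so each arm has a defined mean.
    for m in moves:
        wins[m] += simulate(m)
        visits[m] += 1
    for t in range(len(moves), total_sims):
        # UCB1: mean win rate plus an exploration bonus for
        # under-sampled moves.
        pick = max(moves, key=lambda m: wins[m] / visits[m]
                   + c * math.sqrt(math.log(t) / visits[m]))
        wins[pick] += simulate(pick)
        visits[pick] += 1
    return max(moves, key=lambda m: visits[m])

# Toy usage: move "b" wins 70% of rollouts, "a" only 30%, so UCB1
# should concentrate its simulations on "b".
random.seed(0)
best = choose_move(["a", "b"],
                   lambda m: random.random() < (0.7 if m == "b" else 0.3))
print(best)
```

This matches Dave's point: without a tree, the exploration/exploitation logic only shapes how rollouts are divided among root moves, which is rollout simulation with a bandit on top rather than UCT proper.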
-Dave
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/