As Rémi said, MCTS has little to do with MCMC, except in the trivial
sense where iid sampling is a degenerate Markov chain.

In MCTS, the target distribution is sampled directly. The playout is
itself a Markov chain, but its role has nothing to do with MCMC: it is
just a somewhat roundabout way to obtain one exact sample.
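To make that concrete, here is a minimal sketch with a hypothetical toy game (random moves that add up to a score; the game, names, and payoff rule are all made up for illustration). The point is only that one playout, followed to its terminal state, yields an exact sample from the outcome distribution $p$, with no stationarity argument needed:

```python
import random

def random_playout(rng):
    """Play uniformly random 'moves' until the toy game ends.

    Each call runs one Markov chain on game states to termination and
    returns an exact sample from the outcome distribution p."""
    score = 0
    for _ in range(10):                 # ten random moves
        score += rng.choice([-1, 0, 1])
    if score > 0:
        return "win"
    if score < 0:
        return "loss"
    return "draw"

def estimate_outcome_distribution(n=100_000, seed=0):
    """Average many independent playouts, as MCTS does at a leaf."""
    rng = random.Random(seed)
    counts = {"loss": 0, "draw": 0, "win": 0}
    for _ in range(n):
        counts[random_playout(rng)] += 1
    return {k: v / n for k, v in counts.items()}
```

Each playout is independent of the previous one, which is exactly the "degenerate Markov chain" sense in which iid sampling is trivially Markovian.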

In MCMC, we have no (easy) way to sample the target distribution $p$
directly, but we can build a Markov chain whose stationary distribution
is $p$. Typically we know the likelihood ratios $p(A)/p(B)$ between
outcomes (say, we are three times more likely to win than to draw), but
we do not know the normalisation (does that mean a 10% chance of a draw
and 30% of a win, or 20% draw and 60% win?).
We then run the Markov chain for long enough, hoping to arrive at a
point that is distributed almost as if we had sampled it directly from
$p$.
Of course we would never do this when the state space has only three
outcomes (loss, draw, win), since in that case it is very easy to
compute the normalisation.
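For what it's worth, here is what that would look like as a Metropolis chain over the three outcomes (the 3:1 win:draw weights and the loss weight are hypothetical). As said above, this is overkill for three states, but it shows the mechanism: only unnormalised weights enter, because the unknown normalisation cancels in the acceptance ratio:

```python
import random

# Hypothetical unnormalised weights: win is 3x as likely as draw.
WEIGHTS = {"loss": 2.0, "draw": 1.0, "win": 3.0}
STATES = list(WEIGHTS)

def metropolis_chain(steps, seed=0):
    """Metropolis sampler whose stationary distribution is the
    normalised version of WEIGHTS, never computed explicitly."""
    rng = random.Random(seed)
    state = "draw"
    counts = {s: 0 for s in STATES}
    for _ in range(steps):
        proposal = rng.choice(STATES)   # symmetric (uniform) proposal
        # Accept with probability min(1, p(proposal)/p(state));
        # the normalisation constant cancels in this ratio.
        if rng.random() < min(1.0, WEIGHTS[proposal] / WEIGHTS[state]):
            state = proposal
        counts[state] += 1
    return {s: c / steps for s, c in counts.items()}
```

Run long enough, the empirical visit frequencies approach the normalised weights (here 1/3, 1/6, 1/2), which is the "hope" mentioned above made precise.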

Maybe it helps to notice one major difference between the Markov chains
in MCTS and MCMC: in MCMC the Markov chain lives in the same space as
the target distribution, so here its states would be just loss, draw,
and win. In MCTS the Markov chain lives on the whole board, and its
transitions are the moves...

Jonas
_______________________________________________
Computer-go mailing list
Computer-go@dvandva.org
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
