On Sat, Nov 15, 2008 at 11:38:34PM +0100, [EMAIL PROTECTED] wrote:
> Being a computer scientist but new to Go, I can grasp some of the theory.
> The question I was trying to get across was:
> 
> In a game of self play, if both parties are employing only Monte Carlo,
> surely it's not a good conceptual representation of a human, and if the
> reinforcement learning is based on random simulations wouldn't it be very
> weak when playing a real human?


Here is another amateur answering.

The way I understand it, modern Monte Carlo programs do not even try to
emulate a human player with a random player - obviously that would not work.

What they do is build a fairly traditional search tree starting from the
current position. They use random playouts as a crude way to evaluate a
position. Based on this evaluation, they decide which branch of the tree to
expand.
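
To make the tree-plus-playout idea concrete, here is a minimal sketch in
Python. It is not taken from any real Go program. As assumptions of mine, it
uses a toy game instead of Go (one pile of stones, take 1 to 3 per turn,
whoever takes the last stone wins) and the UCB1 formula to pick the branch
to expand, which is one common choice. The names Node, random_playout,
select_child and search are all made up for this example.

```python
import math
import random


class Node:
    """One position in the search tree of the toy take-1-to-3 game."""

    def __init__(self, pile, parent=None, move=None):
        self.pile = pile          # stones left; the side to move is implicit
        self.parent = parent
        self.move = move          # stones taken to reach this node
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= pile]
        self.visits = 0
        self.wins = 0.0           # wins for the player who moved INTO this node


def random_playout(pile, rng):
    """Play uniformly random moves; True if the side to move at `pile` wins."""
    mover_is_first = True
    while pile > 0:
        pile -= rng.randint(1, min(3, pile))
        if pile == 0:
            return mover_is_first
        mover_is_first = not mover_is_first
    return False  # pile was already empty: the side to move has lost


def select_child(node, c=1.4):
    """UCB1: prefer children that look good or are still rarely visited."""
    return max(
        node.children,
        key=lambda ch: ch.wins / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def search(root_pile, iterations, rng):
    """Grow a tree from `root_pile` and return the most visited first move."""
    root = Node(root_pile)
    for _ in range(iterations):
        node = root
        # 1. Walk down the tree while every move here has been tried already.
        while not node.untried and node.children:
            node = select_child(node)
        # 2. Expand one new branch, if this position has an untried move.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.pile - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Evaluate the new position crudely with one random playout.
        mover_won = not random_playout(node.pile, rng)
        # 4. Pass the result back up, flipping the point of view each level.
        while node is not None:
            node.visits += 1
            if mover_won:
                node.wins += 1
            mover_won = not mover_won
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move
```

Something like search(5, 3000, random.Random(0)) should settle on taking one
stone, which leaves the opponent with a pile of four that cannot be won
against correct play. The playouts themselves know nothing about that; the
tree finds it by spending more visits on branches whose playouts keep
coming back as wins.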

This is the way I understand the random playouts: If, in a given position,
white is clearly ahead, he will win the game if both sides play perfect
moves. He is also likely to win if both sides play reasonably good moves
(say, like human amateurs), but there is a bit more of a chance that one
player hits upon a good combination which the other misses, so the result is
not quite as reliable. If the playouts are totally random, there is still a
better chance for white to win, because both sides make equally bad moves.
The results have much more variation, of course.

So far it does not sound like a very good proposal, but things change if you
consider that we don't have perfect oracles, and that good humans are slow to
play out a position and cannot be integrated into a computer program. Random
playouts, on the other hand, can be done awfully fast: tens of thousands of
playouts per second. Averaging the results gives a fair indication of who is
more likely to win from that position, which is just what is needed to decide
which part of the search tree to expand.
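
The averaging can be sketched in a few lines of Python. As an assumption of
mine, tic-tac-toe stands in for Go here, simply because its rules fit in a
few lines; the point is only that the side that is ahead also wins most of
the totally random games. The names LINES, winner, random_playout and
estimate_win_rate are made up for this example.

```python
import random

# The eight winning lines of a 3x3 board, cells numbered 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that side has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


def random_playout(board, to_move, rng):
    """Finish the game with uniformly random moves; return the winner or None."""
    board = list(board)
    while True:
        won = winner(board)
        if won is not None:
            return won
        empty = [i for i, cell in enumerate(board) if cell == "."]
        if not empty:
            return None  # board full, nobody won: a draw
        board[rng.choice(empty)] = to_move
        to_move = "O" if to_move == "X" else "X"


def estimate_win_rate(board, to_move, player, n_playouts, rng):
    """Fraction of random playouts from `board` that `player` wins."""
    wins = sum(random_playout(board, to_move, rng) == player
               for _ in range(n_playouts))
    return wins / n_playouts
```

For example, in the position "XO..X...." (X holds a corner and the centre, O
holds one edge, O to move), estimate_win_rate for "X" should come out well
above the one for "O", even though neither side plays a single sensible
move in the playouts.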

The 'random' playouts are not totally random; they include a minimum of
tactical rules (do not fill your own eyes, do not pass as long as there are
valid moves). Even this little will produce a few blind spots (moves that the
playouts cannot see) and systematically wrong results. Adding more
Go-specific knowledge can make the results much better (more likely to be
right), but can also add some more blind spots. And it costs time, reducing
the number of playouts the program can make.
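
The two playout rules mentioned above could look roughly like this in
Python. This is a simplified sketch under assumptions of mine: the board is
a list of strings, an "eye" is just an empty point whose on-board
neighbours all belong to the same colour (real programs also look at the
diagonals), and legality checks such as suicide and ko are left out
entirely. The names neighbors, is_own_eye and choose_playout_move are made
up for this example.

```python
import random

EMPTY = "."


def neighbors(point, size):
    """Yield the orthogonal neighbours of `point` that are on the board."""
    row, col = point
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < size and 0 <= c < size:
            yield (r, c)


def is_own_eye(board, point, color):
    """Simplified eye test: an empty point surrounded only by `color`.

    Assumption: diagonal points are ignored, so false eyes pass this test.
    """
    row, col = point
    if board[row][col] != EMPTY:
        return False
    return all(board[r][c] == color for r, c in neighbors(point, len(board)))


def choose_playout_move(board, color, rng):
    """Pick a random empty point that is not one of `color`'s own eyes.

    Returns None (a pass) only when no such point is left.
    Assumption: suicide and ko are NOT checked in this sketch.
    """
    size = len(board)
    candidates = [(r, c)
                  for r in range(size)
                  for c in range(size)
                  if board[r][c] == EMPTY
                  and not is_own_eye(board, (r, c), color)]
    if not candidates:
        return None
    return rng.choice(candidates)
```

Even this tiny filter shows the trade-off: every candidate point now costs an
extra neighbour check, and any position where filling such an "eye" is
actually the right move becomes invisible to the playouts.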

Hope that explains something of the mystery.


Regards

   Heikki

-- 
Heikki Levanto   "In Murphy We Turst"     heikki (at) lsd (dot) dk

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
