On Sunday 16 November 2008, Heikki Levanto wrote:
> On Sat, Nov 15, 2008 at 11:38:34PM +0100, [EMAIL PROTECTED] wrote:
> > Being a computer scientist but new to Go, I can grasp some of the theory.
> > The question I was trying to get across was:
> >
> > In a game of self-play, if both parties are employing only Monte Carlo,
> > surely it's not a good conceptual representation of a human, and if the
> > reinforcement learning is based on random simulations, wouldn't it be very
> > weak when playing a real human?
>
> Here is another amateur answering.
>
> The way I understand it, modern Monte Carlo programs do not even try to
> emulate a human player with a random player - obviously that would not
> work.
>
> What they do is that they build a quite traditional search tree starting
> from the current position. They use a random playout as a crude way to
> evaluate a position. Based on this evaluation, they decide which branch of
> the tree to expand.
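
For illustration, here is a minimal sketch of "decide which branch of the tree to expand". It assumes the UCB1 rule used by UCT-style programs, which the text above does not name. The dictionary layout, the function names and the exploration constant are hypothetical choices for the example, not taken from any particular program.

```python
import math

def ucb1_score(wins, visits, parent_visits, exploration=math.sqrt(2)):
    """Average playout result plus a bonus for rarely tried branches (UCB1)."""
    if visits == 0:
        return math.inf  # an untried branch is always explored first
    win_rate = wins / visits
    bonus = exploration * math.sqrt(math.log(parent_visits) / visits)
    return win_rate + bonus

def select_branch(children):
    """Pick the child with the highest UCB1 score.

    Each child is a dict with hypothetical keys 'move', 'wins', 'visits'.
    """
    parent_visits = sum(child["visits"] for child in children)
    return max(
        children,
        key=lambda c: ucb1_score(c["wins"], c["visits"], parent_visits),
    )

children = [
    {"move": "A", "wins": 6, "visits": 10},
    {"move": "B", "wins": 3, "visits": 10},
]
print(select_branch(children)["move"])
```

With equal visit counts the bonus is the same for both branches, so the one with the better playout average is chosen. A branch with few visits gets a larger bonus and is revisited even if its average looks worse so far.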
>
> This is the way I understand the random playouts: If, in a given position,
> white is clearly ahead, he will win the game if both sides play perfect
> moves. He is also likely to win if both sides play reasonably good moves
> (say, like human amateurs), but there is a bit more of a chance that one
> player hits upon a good combination which the other misses, so the result
> is not quite as reliable. If the playouts are totally random, there is
> still a better chance for white to win, because both sides make equally bad
> moves. The results have much more variation, of course. So far it does not
> sound like a very good proposal, but things change if you consider that
> we don't have perfect oracles, and that good humans are slow to play
> out a position and cannot be integrated into a computer program. Random
> playouts, by contrast, can be done awfully fast: tens of thousands of
> playouts a second. Averaging the results gives a fair indication of who is
> more likely to win from that position, which is just what is needed to
> decide which part of the search tree to expand.

Do you know what use (if any) is made of the standard deviation of the 
results?
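
For illustration, a minimal sketch of how the spread of playout results could be measured alongside the average. Everything in it is an assumption for the example: `random_playout` is a hypothetical stand-in that flips a biased coin instead of playing Go, and the 0.55 win probability and the seed are made up. It does not show what any actual program does with the standard deviation.

```python
import math
import random

def random_playout(win_probability, rng):
    """Hypothetical stand-in for one playout: 1 if white wins, 0 otherwise."""
    return 1 if rng.random() < win_probability else 0

def summarize_playouts(results):
    """Return (mean, sample standard deviation, standard error of the mean)."""
    n = len(results)
    mean = sum(results) / n
    variance = sum((r - mean) ** 2 for r in results) / (n - 1)
    std_dev = math.sqrt(variance)
    std_err = std_dev / math.sqrt(n)
    return mean, std_dev, std_err

rng = random.Random(2008)  # fixed seed so the run is repeatable
results = [random_playout(0.55, rng) for _ in range(10000)]
mean, std_dev, std_err = summarize_playouts(results)
print(f"win rate {mean:.3f}, std dev {std_dev:.3f}, std error {std_err:.4f}")
```

For win/loss results the standard deviation of a single playout is fixed by the win rate p (it is close to sqrt(p * (1 - p))), so the more useful quantity is the standard error, which shrinks as 1/sqrt(n) and says how far the averaged estimate can be trusted after n playouts.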

>
> The 'random' playouts are not totally random, they include a minimum of
> tactical rules (do not fill own eyes, do not pass as long as there are
> valid moves). Even this little will produce a few blind spots, moves that
> the playouts cannot see, and systematically wrong results. Adding more
> go-specific knowledge can make the results much better (more likely to be
> right), but can also add some more blind spots. And it costs time, reducing
> the number of playouts the program can make.
>
> Hope that explains something of the mystery
>
>
> Regards
>
>    Heikki


_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
