On Sunday 16 November 2008, Heikki Levanto wrote:
> On Sat, Nov 15, 2008 at 11:38:34PM +0100, [EMAIL PROTECTED] wrote:
> > Being a computer scientist but new to Go, I can grasp some of the theory.
> > The question I was trying to get across was:
> >
> > In a game of self-play, if both parties are employing only Monte Carlo,
> > surely it's not a good conceptual representation of a human, and if the
> > reinforcement learning is based on random simulations, wouldn't it be very
> > weak when playing a real human?
>
> Here is another amateur answering.
>
> The way I understand it, modern Monte Carlo programs do not even try to
> emulate a human player with a random player; obviously that would not
> work.
>
> What they do is build a fairly traditional search tree starting
> from the current position. They use a random playout as a crude way to
> evaluate a position. Based on this evaluation, they decide which branch of
> the tree to expand.
>
> This is the way I understand the random playouts: if, in a given position,
> White is clearly ahead, he will win the game if both sides play perfect
> moves. He is also likely to win if both sides play reasonably good moves
> (say, like human amateurs), but there is a bit more of a chance that one
> player hits upon a good combination which the other misses, so the result
> is not quite as reliable. If the playouts are totally random, there is
> still a better chance for White to win, because both sides make equally bad
> moves. The results have much more variation, of course. So far it does not
> sound like a very good proposal, but things change if you consider the
> facts that we don't have perfect oracles, and that good humans are slow to
> play out a position and cannot be integrated into a computer program. Whereas
> random playouts can be done awfully fast, tens of thousands of playouts in
> a second.
> Averaging the results gives a fair indication of who is more
> likely to win from that position, just what is needed to decide which part
> of the search tree to expand.
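
To make the averaging idea concrete, here is a minimal sketch. It is not taken from any actual Go program: it uses tic-tac-toe as a toy stand-in for Go, and the names `playout` and `evaluate` are hypothetical. It plays uniformly random games from a position, averages the outcomes, and also reports the standard error of that average, which shrinks roughly as one over the square root of the number of playouts.

```python
import math
import random

# Toy stand-in for Go (an assumption for illustration): tic-tac-toe,
# with the board as a list of 9 cells holding 'X', 'O', or None.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that side has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None


def playout(board, to_move, rng):
    """Play uniformly random moves to the end of the game.

    Returns 1.0 if X wins, 0.0 if O wins, 0.5 for a draw.
    """
    board = list(board)  # copy, so the caller's position is untouched
    while True:
        w = winner(board)
        if w is not None:
            return 1.0 if w == 'X' else 0.0
        empty = [i for i, cell in enumerate(board) if cell is None]
        if not empty:
            return 0.5
        board[rng.choice(empty)] = to_move
        to_move = 'O' if to_move == 'X' else 'X'


def evaluate(board, to_move, n_playouts=2000, seed=0):
    """Estimate X's expected result by averaging random playouts.

    Returns (mean, standard_error_of_the_mean).
    """
    rng = random.Random(seed)
    results = [playout(board, to_move, rng) for _ in range(n_playouts)]
    mean = sum(results) / n_playouts
    variance = sum((r - mean) ** 2 for r in results) / (n_playouts - 1)
    return mean, math.sqrt(variance / n_playouts)


if __name__ == "__main__":
    # X already has two in a row and is to move: "clearly ahead".
    ahead = ['X', 'X', None,
             'O', None, None,
             None, 'O', None]
    print("X ahead:    ", evaluate(ahead, 'X'))
    print("empty board:", evaluate([None] * 9, 'X'))
```

Even with completely random moves, the position where X is ahead should score noticeably higher than the empty board, which is all a tree search needs in order to prefer one branch over another.
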
Do you know what use (if any) is made of the standard deviation of the results?

>
> The 'random' playouts are not totally random; they include a minimum of
> tactical rules (do not fill your own eyes, do not pass as long as there are
> valid moves). Even this little will produce a few blind spots, moves that
> the playouts cannot see, and systematically wrong results. Adding more
> Go-specific knowledge can make the results much better (more likely to be
> right), but can also add some more blind spots. And it costs time, reducing
> the number of playouts the program can make.
>
> Hope that explains something of the mystery
>
>
> Regards
>
>    Heikki
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/