From: tony tang [mailto:[EMAIL PROTECTED]]
Sent: Sat 15-11-2008 23:08
To: [EMAIL PROTECTED]
Subject: RE: [computer-go] Monte carlo play?
Thanks Dave, that was incredibly helpful. Hopefully this new hobby of mine will materialise into a decent project.

All the best,
Tony

________________________________
Subject: RE: [computer-go] Monte carlo play?
Date: Sat, 15 Nov 2008 22:44:51 +0100
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]

Hello Tony,

OK, so you know about the minimax algorithm and the like. My impression was wrong, and I'm sorry for my analogy. I'm no expert on Monte Carlo, so I can only say what I know, but most of the real experts have been rather quiet during the last week, so perhaps I can act as a stand-in.

The basic idea is indeed absolutely random play (alternating turns and removing captured stones as in normal play) until the game finishes. The only exceptions to random play are occupied intersections, simple ko recaptures, suicide moves (I'm not sure whether all programs prune suicide if the rules allow it) and moves that fill one-point eyes of the colour to play (there may be slight differences in the exact definition of a one-point eye). Without the one-point-eye pruning, random games would never finish, because both random players would keep killing their own stones and refilling the empty space after the opponent has captured. Even with eye detection and simple ko detection, the game could occasionally end up in a cycle because of a triple ko or more pathological situations, so one should probably also limit the number of moves and discard the result if that limit is exceeded. This is probably a much cheaper solution to the cycle problem than more sophisticated detection.

A purely random game is played out until both players are forced to pass. This is called a "light playout". Claus Reinke recently dubbed the term "random fill-in" for this process (http://www.mail-archive.com/computer-go@computer-go.org/msg09749.html). I like that term better than "light playout", but it doesn't seem to catch on here.
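A minimal sketch of such a light playout, assuming a tiny board and the crude one-point-eye definition from the text. All names (play, legal_moves, light_playout, ...) are illustrative inventions, not from any real engine, and simple ko detection is omitted for brevity:

```python
import random

SIZE = 5
EMPTY, BLACK, WHITE = '.', 'X', 'O'
MOVE_LIMIT = 4 * SIZE * SIZE   # stop if a cycle keeps the game running

def neighbours(p):
    x, y = p
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        n = (x + dx, y + dy)
        if 0 <= n[0] < SIZE and 0 <= n[1] < SIZE:
            yield n

def group_and_liberties(board, p):
    """Flood-fill the group containing p; return (stones, liberties)."""
    colour, group, libs, stack = board[p], {p}, set(), [p]
    while stack:
        for n in neighbours(stack.pop()):
            if board[n] == EMPTY:
                libs.add(n)
            elif board[n] == colour and n not in group:
                group.add(n)
                stack.append(n)
    return group, libs

def is_one_point_eye(board, p, colour):
    # Crude definition: every orthogonal neighbour is an own stone.
    return all(board[n] == colour for n in neighbours(p))

def play(board, p, colour):
    """Place a stone and remove any opponent groups left without liberties."""
    board[p] = colour
    enemy = WHITE if colour == BLACK else BLACK
    captured = []
    for n in neighbours(p):
        if board[n] == enemy:
            group, libs = group_and_liberties(board, n)
            if not libs:
                captured.extend(group)
                for q in group:
                    board[q] = EMPTY
    return captured

def legal_moves(board, colour):
    """Empty points, minus own one-point eyes and suicide moves."""
    moves = []
    for p, c in list(board.items()):
        if c != EMPTY or is_one_point_eye(board, p, colour):
            continue
        snapshot = dict(board)          # try the move to test for suicide
        play(board, p, colour)
        _, libs = group_and_liberties(board, p)
        board.clear()
        board.update(snapshot)
        if libs:
            moves.append(p)
    return moves

def light_playout(board):
    """Random fill-in: alternate random legal moves until two passes."""
    colour, passes, moves = BLACK, 0, 0
    while passes < 2 and moves < MOVE_LIMIT:
        cands = legal_moves(board, colour)
        if cands:
            play(board, random.choice(cands), colour)
            passes = 0
        else:
            passes += 1
        colour = WHITE if colour == BLACK else BLACK
        moves += 1
    return board
```

Because eye filling and suicide are pruned, the loop cannot refill captured space forever; the move limit catches the remaining ko-cycle cases exactly as described above.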
The board is then counted to determine the winner, and this binary result is added to a statistic for the move currently being evaluated. The accumulated statistic is effectively the evaluation value for the position; it is then backtracked to the game-tree root in a minimax way. In Go it is hard to evaluate a position statically in a quick way (or even slowly, for that matter), so it is nice to have a dynamic "evaluation" that becomes more accurate as you spend more CPU cycles on it.

You can see that it's important for random fill-ins to be fast, but doing a lot of playouts is still a rather large investment compared with a static evaluation function. So it is even more important to avoid wasting effort when evaluating positions by random fill-ins. The trick is to use crude statistics (few fill-ins) to decide which statistic to refine by investing more fill-ins. This is where the UCT value comes in (http://senseis.xmp.net/?UCT). It is derived from the statistic and decides which position to investigate further, preferring good statistics over bad ones, but also preferring crude statistics over accurate ones.

There is no training involved, and no a priori knowledge except for the rules and eye detection; the program has absolutely no idea about Go concepts. Would you call this process learning?

Programs using this algorithm are called "pure" MC programs, and although they get better with more CPU cycles, they aren't very strong under normal time limits on current hardware (or hardware in the near future). You are right: by itself, pure MC is just not effective enough. So programmers started to build in more Go knowledge to improve on this. That makes the fill-in process slower, so the results are called "heavy playouts". And this is also where my knowledge ends.
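The UCT trade-off described above (good statistics over bad, crude over accurate) can be sketched with the standard UCB1 formula; the exploration constant c and the (move, wins, visits) layout here are assumptions for illustration, not any particular engine's design:

```python
import math

def uct_value(wins, visits, parent_visits, c=1.4):
    # High win rate raises the value (prefer good statistics); few visits
    # relative to the parent also raise it (prefer crude statistics).
    if visits == 0:
        return float('inf')          # untried moves get maximal urgency
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)

def select_child(children):
    """children: list of (move, wins, visits) statistics for one position."""
    parent_visits = max(1, sum(v for _, _, v in children))
    return max(children, key=lambda ch: uct_value(ch[1], ch[2], parent_visits))

# A move with a weaker win rate but far fewer fill-ins can still be chosen:
stats = [('A', 30, 50), ('B', 5, 8), ('C', 0, 0)]
print(select_child(stats)[0])  # 'C': unvisited moves are explored first
```

The selected move gets the next batch of fill-ins, and the binary results are then backed up the tree as the text describes.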
I may guess that instead of selecting moves uniformly at random, moves are weighted by a priori Go knowledge built in by the programmer, but I know no details, and I don't know whether there have been publications on this. Perhaps there aren't any, because it is also a competitive field and commercial interests may be involved too. Other contributors can probably tell you more about this.

Best Regards,
Dave

________________________________
From: tony tang [mailto:[EMAIL PROTECTED]]
Sent: Sat 15-11-2008 21:22
To: [EMAIL PROTECTED]
Subject: RE: [computer-go] Monte carlo play?

Hello Dave,

Thank you for a thorough introduction to the theory, and I sincerely hope I am not wasting your time with amateur questions. It was quite late when I sent my question, so I think I didn't word it correctly and miscommunicated what I meant. Being a computer scientist but new to Go, I can grasp some of the theory. The question I was trying to get across was: in a game of self-play, if both parties are employing only Monte Carlo, surely that is not a good conceptual representation of a human, and if the reinforcement learning is based on random simulations, wouldn't it be very weak when playing a real human? According to Nick Wedd and some playing around, I am clearly mistaken, because it works quite well. Is it because they are training (not in the neural-network sense) their Monte Carlo based engine against another engine with prior knowledge, or something of that nature? I have seen papers implementing patterns and prior knowledge with Monte Carlo (dated 2006); has that become a "standard" now, so that when people refer to Monte Carlo they don't mean absolutely random?

Thank you,
Kind regards,
Tony
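Tony's last question, whether "Monte Carlo" still means absolutely random, is exactly the non-uniform selection Dave guesses at: a heavy playout keeps the random fill-in but biases the draw. A toy sketch, with entirely invented move tags and weights (no real engine's heuristics are implied):

```python
import random

def weighted_choice(moves, weight_fn):
    """Pick one move with probability proportional to its weight."""
    weights = [weight_fn(m) for m in moves]
    r = random.uniform(0, sum(weights))
    acc = 0.0
    for m, w in zip(moves, weights):
        acc += w
        if r <= acc:
            return m
    return moves[-1]                 # guard against floating-point drift

def toy_weight(move):
    # Invented numbers: pretend captures are 10x as likely as plain moves,
    # and escapes from atari 5x. A real engine would use pattern knowledge.
    _, tag = move
    return {'capture': 10.0, 'escape': 5.0}.get(tag, 1.0)

moves = [((2, 3), 'capture'), ((4, 4), 'plain'), ((1, 1), 'escape')]
print(weighted_choice(moves, toy_weight))
```

The playout loop stays the same; only the uniform random.choice is replaced by this biased draw, which is what makes heavy playouts slower but stronger per fill-in.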
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/