Re: [computer-go] FW: computer-go] Monte carlo play?
I use a method, initially from the Mogo team, that sort of randomizes the position before running the heavy playout: one simply plays a few uniformly random *non-contact* moves. The effect is that the shapes of stones on the board are preserved, while the heavy playouts are prevented from playing long sequences deterministically. I have not tested this rigorously, but my impression was that the evaluation on large boards got less noisy and/or biased this way.

Magnus

Quoting Michael Williams: It seems move selection in the playouts should be very random at first and more deterministic toward the end of the playout. Has anyone tried that?
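To make the idea concrete, here is a minimal sketch of such a randomizing prefix. The helper callables (legal_moves, is_contact, play, heavy_playout) and the prefix length are placeholders standing in for an engine's own code, not Mogo's or Valkyria's actual implementation.

import random

def randomized_heavy_playout(board, legal_moves, is_contact, play, heavy_playout,
                             n_prefix=4, rng=None):
    """Perturb the position with a few uniformly random non-contact moves,
    then hand the board to the normal heavy playout.

    board, legal_moves, is_contact, play and heavy_playout are placeholders
    for an engine's own code; n_prefix=4 is an arbitrary illustrative value.
    """
    rng = rng or random.Random()
    for _ in range(n_prefix):
        # Non-contact moves touch no existing stones, so local shapes are
        # preserved while the position is still perturbed.
        candidates = [m for m in legal_moves(board) if not is_contact(board, m)]
        if not candidates:
            break
        play(board, rng.choice(candidates))
    return heavy_playout(board)   # the usual pattern-guided playout finishes the game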
Re: [computer-go] FW: computer-go] Monte carlo play?
I have to add that it is possible that a large part of the advantage from using heavy playouts in Valkyria comes from using the same code to bias the exploration part of MCTS. I could probably test it by simply relying completely on AMAF with the proximity heuristic as the only bias. A second experiment would be to rewrite the playouts and make them superfast.

-Magnus

Quoting Mark Boon: On 17-nov-08, at 02:42, George Dahl wrote: "So you say that: '...I'm observing that most of the increase in level comes from the selection during exploration and only in small part from the selection during simulation.' Could you elaborate at all? This is very interesting. That almost suggests it might be fruitful to use the patterns in the tree, but keep lighter playouts." That's exactly what it's suggesting. But as I said, I need to do some more testing to make a hard case for that. Mark

-- Magnus Persson, Berlin, Germany
Re: [computer-go] FW: computer-go] Monte carlo play?
So you say that: "...I'm observing that most of the increase in level comes from the selection during exploration and only in small part from the selection during simulation." Could you elaborate at all? This is very interesting. That almost suggests it might be fruitful to use the patterns in the tree, but keep lighter playouts.

If I understand David Fotland [1] (and you) correctly, this is the same approach Many Faces uses: keep the playouts light, but spend effort on guiding the MCTS tree. I also think Remi was saying the same in his paper [2] on using move patterns. E.g. in section 4.1 he says he used just a few features in the simulations, but used all the features to control the progressive widening of the Monte Carlo search tree (section 4.2).

Darren

[1]: http://www.mail-archive.com/computer-go@computer-go.org/msg09759.html
[2]: http://remi.coulom.free.fr/Amsterdam2007/

-- Darren Cook, Software Researcher/Developer
http://dcook.org/mlsn/ (English-Japanese-German-Chinese-Arabic open source dictionary/semantic network)
http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)
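For readers unfamiliar with progressive widening: the tree searches only its top-ranked children at first and unlocks more as the node accumulates visits. A rough sketch follows; the sorting by a pattern prior and the base/growth schedule are assumptions for illustration, not the exact formula from the paper.

import math

def widened_children(children_sorted_by_prior, parent_visits, base=2, growth=1.4):
    """Progressive widening: search only the first few children (ranked by a
    pattern/prior score) and unlock more as the parent node is visited.

    The base/growth schedule is an illustrative assumption, not the formula
    used by any particular program.
    """
    n_allowed = base + int(math.log(parent_visits + 1, growth))
    return children_sorted_by_prior[:n_allowed]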
Re: [computer-go] FW: computer-go] Monte carlo play?
On Sat, Nov 15, 2008 at 11:38:34PM +0100, [EMAIL PROTECTED] wrote: Being a computer scientist but new to go, I can grasp some of the theory. The question I was trying to get across was: in a game of self play, if both parties are employing only Monte Carlo, surely it's not a good conceptual representation of a human, and if the reinforcement learning is based on random simulations wouldn't it be very weak when playing a real human?

Here is another amateur answering. The way I understand it, modern Monte Carlo programs do not even try to emulate a human player with a random player - obviously that would not work. What they do is build a quite traditional search tree starting from the current position. They use a random playout as a crude way to evaluate a position. Based on this evaluation, they decide which branch of the tree to expand.

This is the way I understand the random playouts: if, in a given position, white is clearly ahead, he will win the game if both players play perfect moves. He is also likely to win if both players play reasonably good moves (say, like human amateurs), but there is a bit more of a chance that one player hits upon a good combination which the other misses, so the result is not quite as reliable. If the playouts are totally random, there is still a better chance for white to win, because both players make equally bad moves. The results have much more variation, of course.

So far it does not sound like a very good proposal, but things change when you consider that we don't have perfect oracles, and that good humans are slow to play out a position and cannot be integrated into a computer program. Random playouts, on the other hand, can be done awfully fast - tens of thousands of playouts in a second. Averaging the results gives a fair indication of who is more likely to win from that position, which is just what is needed to decide which part of the search tree to expand.

The 'random' playouts are not totally random; they include a minimum of tactical rules (do not fill own eyes, do not pass as long as there are valid moves). Even this little will produce a few blind spots (moves that the playouts cannot see) and systematically wrong results. Adding more go-specific knowledge can make the results much better (more likely to be right), but can also add some more blind spots. And it costs time, reducing the number of playouts the program can make.

Hope that explains something of the mystery.

Regards, Heikki

-- Heikki Levanto In Murphy We Turst heikki (at) lsd (dot) dk
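As a concrete picture of "random playout as crude evaluation": average the win/loss results of many fast fill-ins from the position. A minimal sketch, assuming a hypothetical random_playout callable supplied by the engine:

import random

def playout_win_rate(position, random_playout, n_playouts=1000, rng=None):
    """Crude Monte Carlo evaluation: average the results of many fast random
    playouts from `position`.

    random_playout(position, rng) is a placeholder for the engine's own
    fill-in code and must return 1 if the side to move wins, else 0.
    """
    rng = rng or random.Random()
    wins = sum(random_playout(position, rng) for _ in range(n_playouts))
    return wins / n_playouts   # fraction of wins; used to pick which branch to expand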
Re: [computer-go] FW: computer-go] Monte carlo play?
On Sunday 16 November 2008, Heikki Levanto wrote: [...] Averaging the results gives a fair indication of who is more likely to win from that position, just what is needed to decide which part of the search tree to expand.

Do you know what use (if any) is made of the standard deviation of the results?
Re: [computer-go] FW: computer-go] Monte carlo play?
Hello Heikki,

Heikki Levanto wrote: The way I understand it, modern Monte Carlo programs do not even try to emulate a human player with a random player - obviously that would not work.

I believe CrazyStone's use of patterns does so and it seems successful.

Hideki
Re: [computer-go] FW: computer-go] Monte carlo play?
On Sun, Nov 16, 2008 at 11:46:28 AM, D Gilder wrote: [...] Do you know what use (if any) is made of the standard deviation of the results?

Now statistics isn't my strong point, but one of the first and most successful systems was called UCT, for Upper Confidence bounds applied to Trees. It calculated some sort of expected error and added that to the winning ratio, then expanded the branch that had the highest value. If that expansion was a win, the error margin would get smaller, but the average result would get higher. Perhaps some other branch would be tried next, but this one would still be a good candidate. If the result was a loss, the average would drop, and so would the error, so this move would become much less likely to be expanded.

I am sure someone who understands statistics will soon jump in and correct my explanation :-)

- Heikki

-- Heikki Levanto In Murphy We Turst heikki (at) lsd (dot) dk
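For reference, the selection rule Heikki is describing is usually the UCB1 formula applied at each tree node. A generic textbook version; the exploration constant here is a common default, not any particular program's tuning:

import math

def ucb1(wins, visits, parent_visits, c=1.4):
    """UCB1 value of a child node: average win rate plus an exploration term
    that shrinks as the child is visited more often.

    c is the exploration constant; 1.4 (about sqrt(2)) is a common textbook
    default, individual programs tune it.
    """
    if visits == 0:
        return float("inf")          # unvisited children are tried first
    win_rate = wins / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return win_rate + exploration    # expand the child that maximizes this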
Re: [computer-go] FW: computer-go] Monte carlo play?
Quoting Hideki Kato: "Heikki Levanto: 'The way I understand it, modern Monte Carlo programs do not even try to emulate a human player with a random player - obviously that would not work.' I believe CrazyStone's use of patterns does so and it seems successful."

With Valkyria I try to follow a few principles in heavy playouts:

1) In contact fights there are a lot of shapes that are played most of the time. Thus Valkyria checks each move played for an obvious local response. If there is one, it plays it deterministically. In many situations there are two or more such candidates, and then it plays one of those moves.

2) In many positions the last move played does not trigger any obvious response, and then a random move is chosen uniformly.

3) There are moves that are inferior 100% of the time, both locally and globally. These moves are pruned if they are selected, and a new random move is chosen as long as there are moves left to try.

I have hundreds of hand-coded patterns for both 1 and 3. It would be too time consuming to test these patterns individually, so I use my knowledge and intuition (European 2 dan) to decide which patterns to include. So Valkyria has a lot of go knowledge, but mostly the kind of knowledge that all go players have up to some strength, perhaps 8-10 kyu. It has no knowledge about global matters. The beauty of MC evaluation is that globally strong moves are most of the time evaluated better than globally weak moves. Heavy playouts remove noise from MC evaluation and make it more sensitive to the true value of moves. Still, there are biases with all heavy playouts, but they are overcome with MC Tree Search (MCTS), which corrects mistakes in the evaluation recursively.

Here is my latest scaling experiment on 9x9 for Valkyria. Valkyria plays 1150 random games per second on my 4 year old laptop. This test is against Gnugo 3.7.10, assumed to be Elo 1800. Most data points are based on 500 games. "N sims" means Valkyria plays N heavy playouts per move played. Win rates are in %.

N sims   WinRate   Elo (rel. Gnugo)
47        7.4      1361
94       22        1580
188      37        1708
375      53        1821
750      69.9      1946
1500     81.2      2054
3000     88        2146
6000     92.6      2239
12000    94        2278
24000    97.2      2416
48000    97.4      2429

The heavy playouts of Valkyria need just 375 random games per move to match Gnugo, using only 0.3 seconds per move. And even using only 47 simulations per move it can still win occasionally. So obviously the heavy playout code of Valkyria by itself is much weaker (Elo 1361) than Gnugo and most human opponents, but on CGOS a lot of programs with no knowledge are at about the same level, even though they use 2000 simulations or more. Searching efficiently using MCTS with AMAF, it apparently can be made arbitrarily strong.

Hope this explains how both the nature of the playouts and the MCTS contribute to the playing strength of a program. Should one go heavy or light? I do not know. I feel that Valkyria is a little bit too slow on equivalent hardware against most top programs. On the other hand, I think it could be tweaked and improved upon. Perhaps it can even be made faster by removing code that does not improve playing strength. And there is probably still room for adding code that improves strength without a noticeable slowdown. I just know that it is a lot of hard work doing it the way I did it.

Best
Magnus
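A rough sketch of the move-selection loop those three principles describe. The helper callables stand in for Valkyria's hand-coded patterns and pruning rules, which are not public; this illustrates the structure, not the actual code.

import random

def heavy_playout_move(board, last_move, local_responses, is_inferior,
                       legal_moves, rng=None):
    """Choose one move of a heavy playout following the principles above.

    local_responses(board, last_move) returns the hand-coded "obvious" local
    answers to the last move; is_inferior(board, move) flags moves that are
    always bad locally and globally.  Both are placeholders.
    """
    rng = rng or random.Random()
    # 1) Obvious local response to the last move: play one deterministically.
    candidates = local_responses(board, last_move) if last_move is not None else []
    if candidates:
        return rng.choice(candidates)
    # 2) Otherwise pick uniformly at random among the legal moves ...
    moves = list(legal_moves(board))
    rng.shuffle(moves)
    # 3) ... but re-draw as long as the chosen move matches an "always
    #    inferior" pattern and there are moves left to try.
    for move in moves:
        if not is_inferior(board, move):
            return move
    return None   # nothing reasonable left: pass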
Re: [computer-go] FW: computer-go] Monte carlo play?
The random playouts, or even heavy playouts, are not intended to emulate a human player. Heikki is exactly right: it's a crude measurement of how good the position is. The moves in a random playout are horrible, and so are the moves in a heavy playout. In fact, improving them arbitrarily will probably hurt the playouts.

Random playouts work reasonably well because, to a certain extent, bad moves cancel each other out. Chrilly called this "mutual stupidity", I think. For example, a group of stones may be weak and ultimately lost, but it is still possible to defend that group if the attacker isn't diligent. What happens in the playouts is that the attacker is NOT diligent, but neither is the defender! So the result comes out correct anyway. Of course this is not reliable, but it's amazingly good at getting a reasonable picture of what is weak, what is strong and who owns what.

You can improve that by adding some knowledge to the playouts, but you must do this with great care. In my example above, let's say you add a piece of knowledge that causes the defender to succeed. You can argue that the playouts play better go now, but the conclusion that you come to for the group we are talking about is now wrong.

- Don

On Sun, 2008-11-16 at 10:08 +0100, Heikki Levanto wrote: [...]
RE: [computer-go] FW: computer-go] Monte carlo play?
Yes, Valkyria does a lot of ladder reading as well. Actually, pattern matching in the case of Valkyria is a little unclear: it is a decision tree where the leaves are often procedure calls that look at a larger portion of the board. The ladder code is called for various reasons in the tree.

Is 3.7.10 level 10 weaker than the default settings? I will run a test using 5000 playouts and 375 playouts to see if it makes a difference. Also, have you tried this version on CGOS 9x9? This version http://cgos.boardspace.net/9x9/cross/Valkyria3.2.6_C2.html used a machine similar to yours and reached 3000 playouts per second using two threads. Your code is then 8 times faster and should be much stronger.

-Magnus

Quoting David Fotland: I thought Valkyria does local search (ladders) during the playouts. Many Faces is lighter on the playouts. I have 17 local 3x3 patterns, then go to uniform random without filling eyes. Against Gnugo 3.7.10 level 10 on 9x9, with 5000 playouts, I win 92%, so our performance is similar. I'm doing 23K playouts per second on my 2.2 GHz Core 2 Duo, so my performance might be a little better, depending on the specs of your old machine.
RE: [computer-go] FW: computer-go] Monte carlo play?
I think I added a small capture bias, but it didn't make much difference. Sorry, I forgot that it isn't quite pure random: before the uniform random move, if there is an enemy one-liberty group on the board, with some small probability I capture it.

A pattern includes don't-cares and is matched in any orientation. The 3x3 patterns are only matched adjacent to the last move (8 local places).

David

On Nov 16, 2008, Jason House wrote: On Nov 16, 2008, at 11:18 AM, David Fotland wrote: "I thought Valkyria does local search (ladders) during the playouts. Many Faces is lighter on the playouts. I have 17 local 3x3 patterns, then go to uniform random without filling eyes." No capture bias in the playouts? I thought that was a big strength boost. Out of curiosity, how do you count your patterns? For example, is it still one pattern if it includes a don't-care? How about rotations/reflections of the same basic pattern? [...]
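A sketch of a playout policy along the lines David describes: a small-probability capture of one-liberty enemy groups, then 3x3 patterns matched only around the last move, then uniform random. Every helper and the probability value are illustrative placeholders, not Many Faces' actual code.

import random

def light_playout_move(board, last_move, enemy_atari_groups, capture_move,
                       matches_3x3, neighbors8, uniform_random_move,
                       p_capture=0.3, rng=None):
    """Pick one playout move: with small probability capture a one-liberty
    enemy group, otherwise try hand-coded 3x3 patterns on the 8 points
    around the last move, otherwise fall back to a uniformly random legal
    move (eye filling excluded by uniform_random_move).
    """
    rng = rng or random.Random()
    groups = enemy_atari_groups(board)
    if groups and rng.random() < p_capture:
        return capture_move(board, rng.choice(groups))      # capture bias
    if last_move is not None:
        local = [p for p in neighbors8(last_move) if matches_3x3(board, p)]
        if local:
            return rng.choice(local)                        # local 3x3 pattern move
    return uniform_random_move(board, rng)                  # plain random fallback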
Re: [computer-go] FW: computer-go] Monte carlo play?
Some months ago I did several experiments with using tactics and patterns in playouts. Generally I found a big boost in strength using tactics. I also found a boost in strength using patterns, but with severely diminishing returns after a certain number, even becoming detrimental when using large numbers of patterns (1,000s to 10,000s). Since I was using a generalized pattern-matcher, using it slowed things down considerably. Although it played a lot better with the same number of playouts, if I compared MC playouts with patterns to MC playouts without patterns using the same amount of CPU time, the gain was not so obvious. Since most of the gain in strength came from just a few patterns, I concluded, just as David did, that it was probably better to use a handful of hard-coded patterns during playouts.

I only recently started to do real experiments with hard-coded patterns, and so far my results are rather inconclusive. I found that when mixing different things it's not always clear what contributes to any increased strength observed. So I'm still in the process of trying to dissect what is actually contributing where. I found, for example, that a lot of the increased level of play using patterns does not come from using them during playouts but from the effect they have on move exploration. I don't know if this is due to my particular way of implementing MC playouts in combination with UCT search, but moves matching a pattern (usually) automatically come first in the tree expansion as well, and generally I can say that so far I'm observing that most of the increase in level comes from the selection during exploration and only in small part from the selection during simulation. For example, in one particular experiment using just 5 patterns I saw a win rate of 65% against the same program not using patterns (with the same number of playouts). But when not using the patterns during exploration, the win rate dropped to just 55%.

I still have a lot of testing to do and it's too early to draw any hard conclusions. But I think it's worthwhile trying to distinguish where the strength is actually gained. Better yet, finding out exactly 'why' it gained strength, because with MC playouts I often find results during testing highly counter-intuitive, occasionally to the point of being (seemingly) nonsensical.

I also think what Don was proposing with his reference bot could be interesting: trying to make it play around Elo 1700 on CGOS using just 5,000 (light) playouts. I don't know if it's possible, but I think it's a fruitful exercise. At a time when most people are looking at using more and more hardware to increase playing strength, knowing what plays best at the other end of the spectrum is valuable as well. By that I mean finding what plays best using severely constrained resources.

Mark
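One common way to "use the patterns in the tree but keep lighter playouts" is to seed a freshly expanded node's statistics with virtual wins when its move matches a pattern, so the search visits it early. A minimal sketch; the prior numbers are illustrative assumptions, not anyone's tuned values, and this is not necessarily how Mark's program does it.

def seed_child_with_pattern_prior(child_stats, pattern_matched,
                                  prior_visits=10, prior_winrate=0.6):
    """Bias tree exploration toward pattern-matching moves by adding virtual
    playouts to a freshly expanded child's statistics.

    child_stats is a dict with 'wins' and 'visits'; the playout policy
    itself stays light.
    """
    if pattern_matched:
        child_stats["visits"] += prior_visits
        child_stats["wins"] += prior_visits * prior_winrate
    return child_stats

# Example: a pattern-matching move starts with a 60% "win rate" over 10
# virtual visits, so UCT will try it before unbiased siblings.
node = seed_child_with_pattern_prior({"wins": 0, "visits": 0}, pattern_matched=True)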
Re: [computer-go] FW: computer-go] Monte carlo play?
So you say that: "...I'm observing that most of the increase in level comes from the selection during exploration and only in small part from the selection during simulation." Could you elaborate at all? This is very interesting. That almost suggests it might be fruitful to use the patterns in the tree, but keep lighter playouts.

- George

On Sun, Nov 16, 2008 at 10:39 PM, Mark Boon wrote: [...]
Re: [computer-go] FW: computer-go] Monte carlo play?
On 17-nov-08, at 02:42, George Dahl wrote: "So you say that: '...I'm observing that most of the increase in level comes from the selection during exploration and only in small part from the selection during simulation.' Could you elaborate at all? This is very interesting. That almost suggests it might be fruitful to use the patterns in the tree, but keep lighter playouts."

That's exactly what it's suggesting. But as I said, I need to do some more testing to make a hard case for that.

Mark
Re: [computer-go] FW: computer-go] Monte carlo play?
I look forward to hearing more! Happy testing.

- George

On Sun, Nov 16, 2008 at 11:53 PM, Mark Boon wrote: [...]
Re: [computer-go] FW: computer-go] Monte carlo play?
It seems move selection in the playouts should be very random at first and more deterministic toward the end of the playout. Has anyone tried that?

Mark Boon wrote: [...]
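One way to experiment with that idea is a temperature schedule over the playout: early moves sampled nearly uniformly, late moves taken almost greedily from whatever move heuristic the playout has. A sketch under those assumptions; the heuristic_score callable and the linear cooling schedule are hypothetical, not a tested recipe.

import math
import random

def pick_playout_move(moves, heuristic_score, move_number, max_moves, rng=None):
    """Sample one playout move with a temperature that cools as the playout
    proceeds: near-uniform selection early, near-greedy (on the heuristic)
    late.
    """
    rng = rng or random.Random()
    progress = move_number / max(1, max_moves)        # 0.0 at the start, 1.0 at the end
    temperature = max(0.05, 2.0 * (1.0 - progress))   # high early, low late
    scores = [heuristic_score(m) for m in moves]
    top = max(scores)                                 # subtract the max for numeric stability
    weights = [math.exp((s - top) / temperature) for s in scores]
    return rng.choices(moves, weights=weights, k=1)[0]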
[computer-go] FW: computer-go] Monte carlo play?
From: tony tang [mailto:[EMAIL PROTECTED]]
Sent: Sat 15-11-2008 21:22
To: [EMAIL PROTECTED]
Subject: [RE: computer-go] Monte carlo play?

Hello Dave,

Thank you for a thorough introduction to the theory, and I sincerely hope I am not wasting your time with amateur questions. Because it was quite late when I sent the question, I think I didn't word it correctly and mis-communicated my question. Being a computer scientist but new to go, I can grasp some of the theory. The question I was trying to get across was: in a game of self play, if both parties are employing only Monte Carlo, surely it's not a good conceptual representation of a human, and if the reinforcement learning is based on random simulations wouldn't it be very weak when playing a real human? According to Nick Wedd and some playing around, I am clearly mistaken because it works quite well. Is it because they are training (not in the neural network sense) their Monte Carlo based engine against another engine with prior knowledge, or something of that nature? I have seen papers implementing patterns and prior knowledge with Monte Carlo (dated 2006); has that become standard now, and when people refer to Monte Carlo do they not mean absolutely random?

Thank you

Kind regards

Tony
[computer-go] FW: computer-go] Monte carlo play?
From: [EMAIL PROTECTED]
Sent: Sat 15-11-2008 22:44
To: tony tang
Subject: RE: computer-go] Monte carlo play?

Hello Tony,

Ok, so you know about the minimax algorithm and the like. My impression was wrong and I'm very sorry for my analogy. I'm no expert on Monte Carlo, so I can only say what I know. But most of the real experts have been rather quiet during the last week, so perhaps I can act as a stand-in.

The basic idea is indeed absolutely random play (alternating turns and removing captured stones as in normal play) until the game finishes. The only exceptions to random play are occupied intersections, simple ko recaptures, suicide moves (not sure if all programs prune suicide if the rules allow it), and moves that fill one-point eyes of the color who is to play (there may be slight differences in the exact definition of a one-point eye). Without the one-point-eye pruning, random games would never finish, because both random players would keep on killing their own stones and refilling the empty space after the opponent has captured. Even with eye detection and simple ko detection, the game could occasionally end up in a cycle because of triple ko or more pathological situations, so one should probably also limit the number of moves and discard the result if that limit is exceeded. This is probably a much cheaper solution to the cycle problem than more sophisticated detection.

A purely random game is played out until both players are forced to pass. This is called a light playout. Claus Reinke recently dubbed the term "random fill-in" for this process (http://www.mail-archive.com/computer-go@computer-go.org/msg09749.html). I like this term better than "light playout", but it doesn't seem to catch on here. The board is then counted to determine the winner, and this binary result is added to a statistic for the move that is currently being evaluated. The accumulated statistic is effectively the evaluation value for the position. It is then backtracked to the game tree root in a minimax way.

In go it is hard to quickly evaluate a position statically (or even slowly, for that matter). So it is nice to have a dynamic evaluation that becomes more accurate as you spend more CPU cycles on it. You can see that it's important that random fill-ins are fast, but it's still a rather large investment to do a lot of playouts when you compare this with a static evaluation function. So it is even more important to avoid wasting effort in evaluating positions by random fill-ins. The trick is to use crude statistics (few fill-ins) to decide which statistic to refine by investing more fill-ins. This is where the UCT value comes in (http://senseis.xmp.net/?UCT). It is derived from the statistic, and it decides which position to investigate further, preferring good statistics over bad ones, but also preferring crude statistics over accurate ones.

There is no training involved and there is no a priori knowledge involved here, except for the rules and eye detection. The program has absolutely no idea about go concepts. Would you call this process learning?

Programs using this algorithm are called pure MC programs, and although they get better with more CPU cycles, they aren't very strong with normal time limits on current hardware (or hardware in the near future). You are right: by itself, pure MC is just not effective enough. So programmers started to build in more Go knowledge to improve on this. This makes the fill-in process slower, so they are called heavy playouts. And this is also where my knowledge ends.
I may guess that instead of selecting moves in a uniformly random way, moves are weighted by a priori Go knowledge built in by the programmer, but I know no details and I don't know if there have been publications on this. Perhaps there aren't any, because it is also a competitive field and commercial interests might be involved too. Other contributors can probably tell you more about this.

Best Regards,

Dave
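A minimal sketch of the random fill-in Dave describes: uniformly random legal moves, never filling one's own one-point eyes or retaking a simple ko, with a move limit to break rare cycles. The helper callables are placeholders for an engine's own board code, not any particular program's API.

import random

def random_fill_in(board, legal_moves, fills_own_eye, is_simple_ko, play,
                   score_winner, max_moves=400, rng=None):
    """Light playout / "random fill-in" until both sides pass or a move
    limit is hit.

    Returns score_winner(board), or None if the playout hit the move limit
    and should be discarded (e.g. a triple-ko cycle).
    """
    rng = rng or random.Random()
    consecutive_passes = 0
    for _ in range(max_moves):
        moves = [m for m in legal_moves(board)
                 if not fills_own_eye(board, m) and not is_simple_ko(board, m)]
        if not moves:
            play(board, None)                 # forced pass
            consecutive_passes += 1
            if consecutive_passes == 2:
                return score_winner(board)    # game over: count the board
        else:
            consecutive_passes = 0
            play(board, rng.choice(moves))
    return None                               # move limit exceeded: discard result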
[computer-go] FW: computer-go] Monte carlo play?
From: tony tang [mailto:[EMAIL PROTECTED]]
Sent: Sat 15-11-2008 23:08
To: [EMAIL PROTECTED]
Subject: RE: computer-go] Monte carlo play?

Thanks Dave, that was incredibly helpful; hopefully this new hobby of mine will materialise into a decent project.

All the best

Tony

Subject: RE: computer-go] Monte carlo play? Date: Sat, 15 Nov 2008 22:44:51 +0100 From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] [...]
Re: [computer-go] FW: computer-go] Monte carlo play?
In a game of self play, if both parties are employing only monte carlo, ... random simulations... wouldn't it be very weak... ... and some playing around I am clearly mistaken because it works quite well.

Yes, it doesn't make sense, but it does indeed seem to work :-).

I have seen papers implementing patterns and prior knowledge with monte carlo (dated 2006); has that become standard now, and when people refer to monte carlo do they not mean absolutely random?

Just using Monte Carlo playouts is still weak. Monte Carlo + UCT is strong. (UCT is a specific algorithm, so people are now referring to the set of similar algorithms as MCTS, for Monte Carlo Tree Search.) No one is using purely random playouts in any top program, but David Fotland described Many Faces' playouts as fairly light. I think Crazy Stone (as described in detail in one of Rémi Coulom's papers) or Valkyria perhaps have the heaviest playouts, and Mogo is somewhere in between? Too much knowledge in the Monte Carlo playouts and you use too many CPU cycles and cannot do enough of them. (And the real situation is even more complex than that, as sometimes adding knowledge that seems really fundamental for human players does not seem to help, even before allowing for the extra CPU cycles used.)

Darren

-- Darren Cook, Software Researcher/Developer
http://dcook.org/mlsn/ (English-Japanese-German-Chinese-Arabic open source dictionary/semantic network)
http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)