Re: [Computer-go] Monte-Carlo Tree Search as Regularized Policy Optimization
I used stochastic sampling at internal nodes, because of this:

> During the forward simulation phase of SEARCH, the action at each node x is selected by sampling a ∼ π¯(·|x). As a result, the full imaginary trajectory is generated consistently according to policy π¯.
>
> In this section, we establish our main claim, namely that AlphaZero's action selection criteria can be interpreted as approximating the solution to a regularized policy-optimization objective.

I think they say UCT and PUCT are approximations of direct π¯ sampling, but I haven't understood section 3 well.

On Mon, Jul 20, 2020 at 2:51, Daniel wrote:
> @Kensuke I suppose all the proposed algorithms ACT, SEARCH and LEARN are meant to be used during training, no? I think I understand ACT and LEARN, but I am not sure about SEARCH, for which they say this:
>
> > During search, we propose to stochastically sample actions according to π¯ instead of the deterministic action selection rule of Eq. 1.
>
> This sounds much like the random selection done at the root with temperature, but this time applied at internal nodes. Does it mean the pUCT formula is not used? Why does the selection have to be stochastic now? On selection, you compute π_bar every time from (q, π_theta, n_visits), so I suppose π_bar has everything it needs to balance exploration and exploitation.

--
Kensuke Matsuzaki
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
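[The π_bar that Daniel and Kensuke discuss above has a closed form in the paper (Grill et al., 2020): π¯(a) ∝ λ_N·π_θ(a)/(α − q(a)), where α is a normalizing constant found numerically. A minimal sketch of how one might compute it follows; the function name, c_puct value, and iteration count are illustrative assumptions, not the paper's reference code.]

```python
import math

def pi_bar(q, prior, visits, c_puct=1.25, iters=60):
    """Sketch of the regularized policy from 'MCTS as Regularized
    Policy Optimization': pi_bar = argmax_pi  q.pi - lambda_N * KL(prior, pi),
    whose solution is pi_bar(a) = lambda_N * prior(a) / (alpha - q(a)),
    with alpha chosen by bisection so the probabilities sum to 1."""
    n = sum(visits)
    lam = c_puct * math.sqrt(n) / (n + len(q))  # multiplier lambda_N
    # alpha is bracketed between these bounds (sum >= 1 at lo, <= 1 at hi)
    lo = max(qa + lam * pa for qa, pa in zip(q, prior))
    hi = max(q) + lam
    for _ in range(iters):                      # bisection on the normalizer
        alpha = 0.5 * (lo + hi)
        s = sum(lam * pa / (alpha - qa) for qa, pa in zip(q, prior))
        if s > 1.0:
            lo = alpha
        else:
            hi = alpha
    alpha = 0.5 * (lo + hi)
    p = [lam * pa / (alpha - qa) for qa, pa in zip(q, prior)]
    total = sum(p)
    return [x / total for x in p]
```

Sampling a ∼ π¯(·|x) at every internal node, as Kensuke did, then just means drawing from this distribution instead of taking the pUCT argmax.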
Re: [Computer-go] Monte-Carlo Tree Search as Regularized Policy Optimization
@Kensuke I suppose all the proposed algorithms ACT, SEARCH and LEARN are meant to be used during training, no? I think I understand ACT and LEARN, but I am not sure about SEARCH, for which they say this:

> During search, we propose to stochastically sample actions according to π¯ instead of the deterministic action selection rule of Eq. 1.

This sounds much like the random selection done at the root with temperature, but this time applied at internal nodes. Does it mean the pUCT formula is not used? Why does the selection have to be stochastic now? On selection, you compute π_bar every time from (q, π_theta, n_visits), so I suppose π_bar has everything it needs to balance exploration and exploitation.

On Sun, Jul 19, 2020 at 8:10 AM David Wu wrote:
> I imagine that at low visits at least, "ACT" behaves similarly to Leela Zero's "LCB" move selection, which also has the effect of sometimes selecting a move that is not the max-visits move, if its value estimate has recently been found to be sufficiently larger to balance the fact that it is lower prior and lower visits (at least, typically, this is why the move wouldn't have been the max-visits move in the first place). It also scales in an interesting way with the empirically observed playout-by-playout variance of moves, but I think by far the most important part is that it can use a sufficiently confident high value to override max-visits.
>
> The gain from "LCB" in match play I recall is on the very, very rough order of 100 Elo, although it could be less or more depending on match conditions, what neural net is used, and other things. So for LZ at least, "ACT"-like behavior at low visits is not new.
Re: [Computer-go] Monte-Carlo Tree Search as Regularized Policy Optimization
I imagine that at low visits at least, "ACT" behaves similarly to Leela Zero's "LCB" move selection, which also has the effect of sometimes selecting a move that is not the max-visits move, if its value estimate has recently been found to be sufficiently larger to balance the fact that it is lower prior and lower visits (at least, typically, this is why the move wouldn't have been the max-visits move in the first place). It also scales in an interesting way with the empirically observed playout-by-playout variance of moves, but I think by far the most important part is that it can use a sufficiently confident high value to override max-visits.

The gain from "LCB" in match play I recall is on the very, very rough order of 100 Elo, although it could be less or more depending on match conditions, what neural net is used, and other things. So for LZ at least, "ACT"-like behavior at low visits is not new.

On Sun, Jul 19, 2020 at 5:39 AM Kensuke Matsuzaki wrote:
> Hi,
>
> I couldn't improve Leela Zero's strength by implementing SEARCH and ACT.
> https://github.com/zakki/leela-zero/commits/regularized_policy
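[David Wu's description of LCB move selection can be sketched as follows. This illustrates the general idea only, not Leela Zero's actual code; the z value, the visit floor for candidates, and the tuple layout are assumptions.]

```python
import math

def select_move_lcb(moves):
    """Pick the move with the best lower confidence bound on its winrate,
    restricted to moves with a reasonable visit count; this is how a
    confidently high value estimate can override the max-visits move.
    `moves` is a list of (name, visits, winrate, stdev) tuples."""
    z = 1.96                                  # ~95% confidence, assumed constant
    best_visits = max(m[1] for m in moves)
    # only consider moves with at least 10% of the max visits (assumed floor)
    candidates = [m for m in moves if m[1] >= 0.1 * best_visits]

    def lcb(move):
        name, visits, winrate, stdev = move
        return winrate - z * stdev / math.sqrt(visits)

    return max(candidates, key=lcb)[0]
```

With 400 visits at 60% winrate versus 1000 visits at 52%, the lower-visit move wins here because its bound is already higher, which matches the behavior David describes.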
Re: [Computer-go] Monte-Carlo Tree Search as Regularized Policy Optimization
Hi,

I couldn't improve Leela Zero's strength by implementing SEARCH and ACT.
https://github.com/zakki/leela-zero/commits/regularized_policy

On Fri, Jul 17, 2020 at 2:47, Rémi Coulom wrote:
> This looks very interesting.
>
> From a quick glance, it seems the improvement is mainly when the number of playouts is small. Also, they don't test on the game of Go. Has anybody tried it?
>
> I will take a deeper look later.

--
Kensuke Matsuzaki
Re: [Computer-go] Monte-Carlo Tree Search as Regularized Policy Optimization
This looks very interesting.

From a quick glance, it seems the improvement is mainly when the number of playouts is small. Also, they don't test on the game of Go. Has anybody tried it?

I will take a deeper look later.

On Thu, Jul 16, 2020 at 9:49 AM Ray Tayek wrote:
> https://old.reddit.com/r/MachineLearning/comments/hrzooh/r_montecarlo_tree_search_as_regularized_policy/
>
> --
> Honesty is a very expensive gift. So, don't expect it from cheap people - Warren Buffett
> http://tayek.com/
Re: [Computer-go] monte carlo search; all valid moves?
Exactly, you don't check superko in the playouts, and you generally stop playouts at 3 * boardsize * boardsize moves. But once you start adding more logic into the playouts (like not playing in single-point eyes), they tend not to hit this limit.

Sent from my iPhone

On 28.03.2015 at 11:53, hughperkins2 hughperki...@gmail.com wrote:
> Oh wait, superko check is not that cheap, but it is so rare that you can probably ignore it in playouts, and just check before submitting a move to the server. If it's superko, then just pass, perhaps.
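[The "check superko only before submitting a move" idea above is usually done with Zobrist hashing over the real game history rather than inside playouts. A minimal sketch, assuming a flat board array and positional superko; the board encoding and seed are illustrative.]

```python
import random

SIZE = 9
random.seed(0)  # fixed seed so the table is reproducible in this sketch
# one 64-bit random number per (point, state): 0 empty, 1 black, 2 white
ZOBRIST = [[random.getrandbits(64) for _ in range(3)] for _ in range(SIZE * SIZE)]

def board_hash(board):
    """board: list of SIZE*SIZE ints (0 empty, 1 black, 2 white)."""
    h = 0
    for i, color in enumerate(board):
        h ^= ZOBRIST[i][color]
    return h

def violates_superko(board_after_move, history_hashes):
    """True if the position after the move repeats any earlier position.
    Run on the real game history only, not inside playouts."""
    return board_hash(board_after_move) in history_hashes
```

Incremental engines XOR in/out only the changed points instead of rehashing the whole board, which is why the per-move cost is small even though the history set can grow.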
Re: [Computer-go] monte carlo search; all valid moves?
Well, you could check if a field is empty or not, but you could also check if you're putting stones in a dead area, or if it would be a suicide, etc.

On Sat, Mar 28, 2015 at 10:15:32AM +0100, Álvaro Begué wrote:
> I am not sure I understand the question. The only thing that is typically not checked in the playouts is superko. What other validity checks are you performing?
>
> Álvaro.
Re: [Computer-go] monte carlo search; all valid moves?
20 playouts per second is a bit slow, Folkert. But I've been there, too! Now my bot is at around 1000 pps on 19x19 and 5000 on 9x9 (but I also check to not play into single-point eyes). That's still quite slow, and further optimisation will be required. But OTOH, on a 4-core machine without UCT (only the AMAF heuristic) I get around 1250 Elo on CGOS 13x13 (http://cgos.boardspace.net/13x13/cross/Imrscl-017-A-N.html).

Urban

On Sat, Mar 28, 2015 at 11:30 AM, Petr Baudis pa...@ucw.cz wrote:
> On Sat, Mar 28, 2015 at 10:15:32AM +0100, Álvaro Begué wrote:
> > I am not sure I understand the question. The only thing that is typically not checked in the playouts is superko. What other validity checks are you performing?
>
> There are a few:
>
> (i) No single-stone suicide. Can't think of how to play out without that; you wouldn't know when to stop playing.
>
> (ii) No multi-stone suicide. Sometimes in the fastest implementations this rule is relaxed (i.e. you proclaim that you are using Tromp-Taylor or New Zealand rules). But in real-world engines, you need to check things like self-atari or 2-liberty semeai, and then a suicide check naturally falls out from that too.
>
> (iii) No ko. This is a pretty cheap check, and without it again your playouts may end up looping.
>
> BTW, the fastest playouts are probably done by libego, and that's open source. But as I said in the past - consider not optimizing first; write some reasonable playout heuristics first and *then* profile your code.
>
> Petr Baudis

--
Blog: http://bettong.net/
Twitter: https://twitter.com/ujh
Homepage: http://www.urbanhafner.com/
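[The "don't play into single-point eyes" filter Urban mentions is the classic stop condition for playouts. A minimal sketch of the check, assuming a flat board array; real engines also inspect diagonals to rule out false eyes, which this version deliberately omits.]

```python
SIZE = 9

def neighbours(idx):
    """Orthogonal neighbours of a point on a SIZE x SIZE board."""
    x, y = idx % SIZE, idx // SIZE
    if x > 0:
        yield idx - 1
    if x < SIZE - 1:
        yield idx + 1
    if y > 0:
        yield idx - SIZE
    if y < SIZE - 1:
        yield idx + SIZE

def is_single_point_eye(board, idx, color):
    """True if `idx` is an empty point fully surrounded by `color`
    (off-board edges count as friendly, so corner eyes work too)."""
    if board[idx] != 0:
        return False
    return all(board[n] == color for n in neighbours(idx))
```

A playout move generator would simply skip any candidate point for which this returns True for the side to move; without the filter, playouts fill their own eyes and rarely terminate before the hard move limit.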
Re: [Computer-go] monte carlo search; all valid moves?
20 playouts per second is pretty slow. Are you sure you're not just looping around endlessly or something? Playing in a dead area is legal, as is filling your own eyes, etc. Only suicide and ko are illegal. Even superko is a pretty cheap check.
Re: [Computer-go] monte carlo search; all valid moves?
> Because I read here that people can do 25k playouts per second while my program can only do ~20 per second when doing full validity checks on all steps.
>
> Do you have a reference, some context for the 25k playouts?

Look for the topic "What's a good playout speed?" in the mailing list archives: http://computer-go.org/pipermail/computer-go/

Folkert van Heusden
Re: [Computer-go] monte carlo search; all valid moves?
On Sat, Mar 28, 2015 at 08:51 +0100, folkert wrote:
> Hi,
>
> For a Monte Carlo search, are only valid moves performed? Or does it work from beginning to the end of a playout using whatever free position is available on the board?

I am also interested in the question, because I read here that people can do 25k playouts per second while my program can only do ~20 per second when doing full validity checks on all steps. Do you have a reference, some context for the 25k playouts?

holger
--
about me: http://holgerkrekel.net/about-me/
contracting: http://merlinux.eu
Re: [Computer-go] monte carlo search; all valid moves?
I am not sure I understand the question. The only thing that is typically not checked in the playouts is superko. What other validity checks are you performing?

Álvaro.

On Sat, Mar 28, 2015 at 9:54 AM, holger krekel hol...@merlinux.eu wrote:
> I am also interested in the question, because I read here that people can do 25k playouts per second while my program can only do ~20 per second when doing full validity checks on all steps. Do you have a reference, some context for the 25k playouts?
>
> holger
Re: [Computer-go] monte carlo search; all valid moves?
Oh wait, superko check is not that cheap, but it is so rare that you can probably ignore it in playouts, and just check before submitting a move to the server. If it's superko, then just pass, perhaps.
Re: [computer-go] monte carlo
What method are you guys using for the Monte Carlo search? What do you do? I pick a move at random, then for the resulting board that comes out of that pick another move, and so on. Then, after that, I evaluate the number of stones etc. What do you guys look at to select a move using Monte Carlo?

On Sat, Oct 17, 2009 at 05:02:33PM +0200, Folkert van Heusden wrote:
> People, I'm trying to implement a Monte Carlo algorithm in my go program. Now the results are dramatic: the Elo rating of my go program drops from 1150 to below 700. I tried:
> - evaluate the number of captured stones
> - evaluate strategic elements (without MC this strategic eval gives that 1150 Elo).
> Currently my program can evaluate 500 scenes per second and I let it think for 5 seconds. What could be the cause of these dramatic results? Wrong evaluation? Not enough nodes processed?

Folkert van Heusden
Re: [computer-go] monte carlo
On Sun, Oct 25, 2009 at 06:52:56PM +0100, Folkert van Heusden wrote:
> What method are you guys using for the Monte Carlo search? What do you do? I pick a move at random, then for the resulting board that comes out of that pick another move, and so on. Then, after that, I evaluate the number of stones etc.

Do you mean you count the score? Make sure you actually count the stones right, and (quite importantly) that you are picking moves really at random, not skewed in any way.

> What do you guys look at to select a move using Monte Carlo?

People use heuristics to prefer captures, escaping atari, playing local moves that match certain 3x3 patterns, etc. - there are plenty of papers covering this; I recommend you look at them. It's fairly important to have all the heuristics somewhat randomized (depending on other parts of my engine, I've found any value between 60% and 90% probability of applying each heuristic optimal).

--
Petr "Pasky" Baudis
A lot of people have my books on their bookshelves. That's the problem, they need to read them. -- Don Knuth
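[Petr's "somewhat randomized heuristics" advice can be sketched as a simple cascade: each heuristic fires only with some probability, and anything that falls through ends up as a uniformly random legal move. The function shape and the heuristic interface are illustrative assumptions.]

```python
import random

def choose_playout_move(heuristics, legal_moves, p_apply=0.8):
    """Pick a playout move from a prioritized list of heuristics.
    Each heuristic is a callable mapping the legal moves to a (possibly
    empty) list of suggestions. Each one is applied only with
    probability p_apply (Petr reports 60%-90% working well), which
    keeps the playouts from becoming deterministically skewed."""
    for heuristic in heuristics:
        if random.random() < p_apply:
            suggestions = heuristic(legal_moves)
            if suggestions:
                return random.choice(suggestions)
    # no heuristic fired: fall back to a uniformly random legal move
    return random.choice(legal_moves)
```

A typical ordering would be capture moves first, then atari escapes, then 3x3 pattern matches around the last move, exactly the priorities Petr lists.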
Re: [computer-go] monte carlo
> > What method are you guys using for the Monte Carlo search? What do you do? I pick a move at random, then for the resulting board that comes out of that pick another move, and so on. Then, after that, I evaluate the number of stones etc.
>
> Do you mean you count the score? Make sure you actually count the stones right, and (quite importantly) that you are picking moves really at random, not skewed in any way.

Yes, the score indeed. Well, only the number of stones removed from the board. I did not implement an algorithm to count areas belonging to a color yet.

> > What do you guys look at to select a move using Monte Carlo?
>
> People use heuristics to prefer captures, escaping atari, playing local moves that match certain 3x3 patterns, etc. - there are plenty of papers covering this; I recommend you look at them. It's fairly important to have all the heuristics somewhat randomized (depending on other parts of my engine, I've found any value between 60% and 90% probability of applying each heuristic optimal).

Yeah, I rated most of them equally. Some are weighted in depending on applicability.

May I ask: how many board settings (one move + all evaluation) do your programs calculate per second?

Folkert van Heusden
--
Ever wonder what is out there? Any alien races? Then please support the SETI@home project: setiathome.ssl.berkeley.edu
Re: [computer-go] monte carlo
On Sun, Oct 25, 2009 at 10:16:58PM +0100, Folkert van Heusden wrote:
> Yes, the score indeed. Well, only the number of stones removed from the board. I did not implement an algorithm to count areas belonging to a color yet.

That would seem to have only quite limited correlation with winning the game.

> May I ask: how many board settings (one move + all evaluation) do your programs calculate per second?

Depends on the hardware. ;-) We usually count in number of complete playouts per second. On 9x9, libego can play I think about 100k playouts per second on a decent CPU, but is very light. Pachi can do 35k light playouts per second and 15k heavy playouts per second on the same CPU (single-threaded). Both Pachi and libego are written in a compiled language; when your speed is in the right order of magnitude, I've found playout move selection improvements much more profitable than further optimizing for speed.

--
Petr "Pasky" Baudis
RE: [computer-go] monte carlo
Many Faces of Go is currently much slower. On 9x9 it does about 18k playouts per second on two CPU cores, so about 9k per thread. The average 9x9 game is about 92 moves, so that is about 1.6 million board settings per second.

The playouts are far from random. Many moves are local responses to the last move, and when there is no forced local response, the global moves are not fully random either. My original light code did about 55k playouts per second on one CPU core, but was far weaker.

David

> > May I ask: how many board settings (one move + all evaluation) do your programs calculate per second?
>
> Depends on the hardware. ;-) We usually count in number of complete playouts per second. On 9x9, libego can play I think about 100k playouts per second on a decent CPU, but is very light. Pachi can do 35k light playouts per second and 15k heavy playouts per second on the same CPU (single-threaded).
Re: [computer-go] Monte-Carlo Simulation Balancing
I admit I had trouble understanding the details of the paper. What I think is the biggest problem for applying this to bigger (up to 19x19) games is that you somehow need access to the true value of a move, i.e. whether it is a win or a loss. On the 5x5 board they used, this might be approximated pretty well, but there's no chance of doing so on 19x19.

On 13.08.2009 at 05:14, Michael Williams wrote:
> After about the 5th reading, I'm concluding that this is an excellent paper. Is anyone (besides the authors) doing research based on this? There is a lot to do.
Re: [computer-go] Monte-Carlo Simulation Balancing
In future papers they should avoid using a strong authority like Fuego for the training, and instead force it to learn from a naive uniform random playout policy (with 100x or 1000x more playouts) and then build on that with an iterative approach (as was suggested in the paper).

I also had another thought. Since they are training the policy to maximize the balance and not the winrate, wouldn't you be able to extract more information from each trial by using the score instead of the game result? The normal pitfalls to doing so do not apply here.

Isaac Deutsch wrote:
> I admit I had trouble understanding the details of the paper. What I think is the biggest problem for applying this to bigger (up to 19x19) games is that you somehow need access to the true value of a move, i.e. whether it is a win or a loss. On the 5x5 board they used, this might be approximated pretty well, but there's no chance of doing so on 19x19.
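[The simulation-balancing update being debated here (Silver & Tesauro's policy-gradient variant) moves the policy weights in the direction (V*(s) − V̂(s)) · E[z · Σ_t ∇log π_θ(a_t|s_t)]. A minimal sketch of one update step, assuming a `simulate` callback that returns an outcome and the summed log-policy gradients; per the suggestion above, the outcome z could be the final score rather than just the win/loss result. Names and the two-phase structure are illustrative, not the paper's code.]

```python
def balance_update(theta, simulate, v_star, state, alpha=0.1, m=100, n=100):
    """One policy-gradient simulation-balancing step.
    theta    : list of pattern weights for a softmax playout policy
    simulate : (theta, state) -> (outcome z, grad_log_prob_sum per weight)
    v_star   : the 'true' value of `state` (from deep search or an oracle)
    Phase 1 estimates the current simulation value V(s) from m rollouts;
    phase 2 accumulates the outcome-weighted gradient from n more rollouts;
    theta then moves along (V* - V) times that gradient."""
    v_hat = sum(simulate(theta, state)[0] for _ in range(m)) / m

    g = [0.0] * len(theta)
    for _ in range(n):
        z, grad = simulate(theta, state)
        for i in range(len(theta)):
            g[i] += z * grad[i] / n

    delta = v_star - v_hat  # the imbalance: zero means perfectly balanced
    return [theta[i] + alpha * delta * g[i] for i in range(len(theta))]
```

Note that when the rollout value already matches v_star, delta is zero and the weights are left alone, which is the sense in which this optimises balance rather than playout strength.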
Re: [computer-go] Monte-Carlo Simulation Balancing
A web search turned up a 2-page and an 8-page version. I read the short one. I agree that it's promising work that requires some follow-up research. Now that you've read it so many times, what excites you about it? Can you envision a way to scale it to larger patterns and boards on modern hardware?

Sent from my iPhone

On Aug 12, 2009, at 11:14 PM, Michael Williams michaelwilliam...@gmail.com wrote:
> After about the 5th reading, I'm concluding that this is an excellent paper. Is anyone (besides the authors) doing research based on this? There is a lot to do.
>
> David Silver wrote:
> > Hi everyone,
> >
> > Please find attached my ICML paper with Gerry Tesauro on automatically learning a simulation policy for Monte-Carlo Go. Our preliminary results show a 200+ Elo improvement over previous approaches, although our experiments were restricted to simple Monte-Carlo search with no tree on small boards.
> >
> > Abstract
> >
> > In this paper we introduce the first algorithms for efficiently learning a simulation policy for Monte-Carlo search. Our main idea is to optimise the balance of a simulation policy, so that an accurate spread of simulation outcomes is maintained, rather than optimising the direct strength of the simulation policy. We develop two algorithms for balancing a simulation policy by gradient descent. The first algorithm optimises the balance of complete simulations, using a policy gradient algorithm; whereas the second algorithm optimises the balance over every two steps of simulation. We compare our algorithms to reinforcement learning and supervised learning algorithms for maximising the strength of the simulation policy. We test each algorithm in the domain of 5x5 and 6x6 Computer Go, using a softmax policy that is parameterised by weights for a hundred simple patterns. When used in a simple Monte-Carlo search, the policies learnt by simulation balancing achieved significantly better performance, with half the mean squared error of a uniform random policy, and equal overall performance to a sophisticated Go engine.
> >
> > -Dave
Re: [computer-go] Monte-Carlo Simulation Balancing
> Pebbles has a Mogo playout design, where you check for patterns only around the last move (or two).

In MoGo, it's not only around the last move (at least with some probability, and when there are empty spaces on the board); this is the "fill board" modification. (This provides a big improvement in 19x19 with big numbers of simulations; see http://www.lri.fr/~rimmel/publi/EK_explo.pdf, Fig. 3, page 8 - not only a quantitative improvement but also, according to players, a qualitative improvement in the way MoGo plays.)

Best regards,
Olivier
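[Olivier's "fill board" idea is usually described as: occasionally try a few random points and play one whose whole neighbourhood is empty, restoring diversity that last-move patterns take away. A minimal sketch under those assumptions; the number of tries, the edge handling, and when the rule triggers are illustrative, not MoGo's exact code.]

```python
import random

def fill_board_candidate(board, size, tries=4):
    """Return a random empty point whose 8 neighbours are also empty,
    or None if the few sampled points all fail. `board` is a flat list
    of size*size ints (0 = empty). Edge points are skipped here purely
    to keep the neighbourhood check simple."""
    for _ in range(tries):
        x = random.randrange(1, size - 1)
        y = random.randrange(1, size - 1)
        block = [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
        if all(board[j * size + i] == 0 for i, j in block):
            return x, y
    return None
```

A playout policy would call this with some small probability before its pattern heuristics and play the returned point if there is one, otherwise fall through to the usual last-move pattern matching.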
Re: [computer-go] Monte-Carlo Simulation Balancing
> Just to clarify: I was not saying that Mogo's policy consisted *solely* of looking for patterns around the last move. Merely that it does not look for patterns around *every* point, which other playout policies (e.g., CrazyStone, if I understand Rémi's papers correctly) appear to do. The RL paper seems to require that playout design.
>
> Fine! BTW, FillBoard seems to help Pebbles, too. A few percent better on 9x9 games. No testing on larger boards. YMMV, and like everything about computer go: all predictions are guaranteed to be wrong, or your money back.

For us the improvement is essential in 19x19 - I'll look into whether fillboard generalizes, if it helps also for you :-) The loss of diversity due to patterns is really clear in some situations, so the problem solved by fillboard is understandable, and I believe it should work for you too - but, as you say, all predictions in computer go are almost guaranteed to be wrong :-)

Olivier
Re: [computer-go] Monte-Carlo Simulation Balancing
After about the 5th reading, I'm concluding that this is an excellent paper. Is anyone (besides the authors) doing research based on this? There is a lot to do. David Silver wrote: Hi everyone, Please find attached my ICML paper with Gerry Tesauro on automatically learning a simulation policy for Monte-Carlo Go. Our preliminary results show a 200+ Elo improvement over previous approaches, although our experiments were restricted to simple Monte-Carlo search with no tree on small boards. Abstract In this paper we introduce the first algorithms for efficiently learning a simulation policy for Monte-Carlo search. Our main idea is to optimise the balance of a simulation policy, so that an accurate spread of simulation outcomes is maintained, rather than optimising the direct strength of the simulation policy. We develop two algorithms for balancing a simulation policy by gradient descent. The first algorithm optimises the balance of complete simulations, using a policy gradient algorithm; whereas the second algorithm optimises the balance over every two steps of simulation. We compare our algorithms to reinforcement learning and supervised learning algorithms for maximising the strength of the simulation policy. We test each algorithm in the domain of 5x5 and 6x6 Computer Go, using a softmax policy that is parameterised by weights for a hundred simple patterns. When used in a simple Monte-Carlo search, the policies learnt by simulation balancing achieved significantly better performance, with half the mean squared error of a uniform random policy, and equal overall performance to a sophisticated Go engine. -Dave
Re: [computer-go] Monte-Carlo Simulation Balancing
Hi, We used alpha=0.1. There may well be a better setting of alpha, but this appeared to work nicely in our experiments. -Dave On 3-May-09, at 2:01 AM, elife wrote: Hi Dave, In your experiments what's the constant value alpha you set? Thanks. 2009/5/1 David Silver sil...@cs.ualberta.ca: Yes, in our experiments they were just constant numbers M=N=100.
Re: [computer-go] Monte-Carlo Simulation Balancing
Hi Yamato, If M and N are the same, is there any reason to run M simulations and N simulations separately? What happens if you combine them and calculate V and g in the single loop? I think it gives the wrong answer to do it in a single loop. Note that the simulation outcomes z are used to compute both V and g, so that they are quite strongly correlated. In general, if random variables X and Y are correlated then E[X]E[Y] != E[XY]. A simple example of this problem would be sampling E[X]E[X] for some random variable X. Let's say X is actually just noise with mean zero. If you just take one sample x1, then x1*x1 is always +ve, and you would estimate E[X]E[X]=E[X^2] > 0. But if you take two independent samples x1 and x2, then x1*x2 would be +ve half the time and -ve half the time, and you would correctly estimate E[X]E[X]=0. -Dave
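Dave's single-loop point is easy to check numerically. A minimal sketch (invented numbers, not from the paper): reusing one batch of samples for both factors estimates E[X^2], while two independent batches estimate E[X]E[X].

```python
import random

random.seed(0)
n = 100_000

# X is pure noise with mean zero, so the true value of E[X]*E[X] is 0.
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ys = [random.gauss(0.0, 1.0) for _ in range(n)]

# Single-loop version: the same sample appears in both factors,
# so this actually estimates E[X^2] = 1, not E[X]*E[X] = 0.
same = sum(x * x for x in xs) / n

# Two independent loops: x1*x2 is +ve and -ve equally often.
indep = sum(x * y for x, y in zip(xs, ys)) / n

print(same, indep)  # roughly 1.0 and roughly 0.0
```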
Re: [computer-go] Monte-Carlo Simulation Balancing
David Silver wrote: 2. Run another N simulations, average the value of psi(s,a) over all positions and moves in games that black won (call this g) This is strange: you do not take lost playouts into consideration. I believe there is a problem with your estimation of the gradient. Suppose for instance that you count z = +1 for a win, and z = -1 for a loss. Then you would take lost playouts into consideration. This makes me a little suspicious. The fundamental problem here may be that your estimate of the gradient is biased by the playout policy. You should probably sample X(s) uniformly at random to have an unbiased estimator. Maybe this can be fixed with importance sampling, and then you may get a formula that is symmetrical regarding wins and losses. I don't have time to do it now, but it may be worth taking a look. Rémi
Re: [computer-go] Monte-Carlo Simulation Balancing
Rémi Coulom wrote: The fundamental problem here may be that your estimate of the gradient is biased by the playout policy. You should probably sample X(s) uniformly at random to have an unbiased estimator. Maybe this can be fixed with importance sampling, and then you may get a formula that is symmetrical regarding wins and losses. I don't have time to do it now, but it may be worth taking a look. Rémi More precisely: you should estimate the value of N playouts as Sum p_i z_i / Sum p_i instead of Sum z_i. Then, take the gradient of Sum p_i z_i / Sum p_i. This would be better. Maybe Sum p_i z_i / Sum p_i would be better for MCTS, too ? Rémi
Re: [computer-go] Monte-Carlo Simulation Balancing
Hi Remi, This is strange: you do not take lost playouts into consideration. I believe there is a problem with your estimation of the gradient. Suppose for instance that you count z = +1 for a win, and z = -1 for a loss. Then you would take lost playouts into consideration. This makes me a little suspicious. Sorry, I should have made it clear that this assumes that we are treating black wins as z=1 and white wins as z=0. In this special case, the gradient is the average of games in which black won. But yes, more generally you need to include games won by both sides. The algorithms in the paper still cover this case - I was just trying to simplify their description to make it easy to understand the ideas. The fundamental problem here may be that your estimate of the gradient is biased by the playout policy. You should probably sample X(s) uniformly at random to have an unbiased estimator. Maybe this can be fixed with importance sampling, and then you may get a formula that is symmetrical regarding wins and losses. I don't have time to do it now, but it may be worth taking a look. The gradient already compensates for the playout policy (equation 9), so in fact it would bias the gradient to sample uniformly at random! In equation 9, the gradient is taken with respect to the playout policy parameters. Using the product rule (third line), the gradient is equal to the playout policy probabilities multiplied by the sum of likelihood ratios multiplied by the simulation outcomes z. This gradient can be computed by sampling playouts instead of multiplying by the playout policy probabilities. This is also why games with outcomes of z=0 can be ignored - they don't affect this gradient computation. More precisely: you should estimate the value of N playouts as Sum p_i z_i / Sum p_i instead of Sum z_i. Then, take the gradient of Sum p_i z_i / Sum p_i. This would be better. Maybe Sum p_i z_i / Sum p_i would be better for MCTS, too ? I think a similar point applies here.
We care about the expected value of the playout policy, which can be sampled directly from playouts, instead of multiplying by the policy probabilities. You would only need importance sampling if you were using a different playout policy to the one which you are evaluating. But I guess I'm not seeing any good reason to do this? -Dave
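Dave's "no importance weights needed" argument can be checked on a one-step toy policy (numbers invented; this is not equation 9 itself, just the same score-function idea): the sampled estimate of z times the likelihood ratio, with actions drawn from the policy being differentiated, matches the analytic gradient of the expected outcome.

```python
import math
import random

random.seed(1)

# One-step softmax policy over two actions: J(theta) = sum_a pi_a * r_a.
theta = [0.3, -0.2]
r = [1.0, 0.0]   # outcome z after each action

exps = [math.exp(t) for t in theta]
pi = [e / sum(exps) for e in exps]

J = sum(p * ri for p, ri in zip(pi, r))
# Analytic gradient of a softmax policy: dJ/dtheta_k = pi_k * (r_k - J).
analytic = [p * (ri - J) for p, ri in zip(pi, r)]

# Sampled score-function estimate: d log pi(a)/d theta_k = 1[a == k] - pi_k.
# No importance weights appear because we sample actions from the very policy
# we are differentiating.
n = 200_000
grad = [0.0, 0.0]
for _ in range(n):
    a = 0 if random.random() < pi[0] else 1
    z = r[a]
    for k in range(2):
        grad[k] += z * ((1.0 if a == k else 0.0) - pi[k])
grad = [g / n for g in grad]

print(analytic, grad)  # the two agree up to sampling noise
```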
Re: [computer-go] Monte-Carlo Simulation Balancing
David Silver wrote: Sorry, I should have made it clear that this assumes that we are treating black wins as z=1 and white wins as z=0. In this special case, the gradient is the average of games in which black won. But yes, more generally you need to include games won by both sides. The algorithms in the paper still cover this case - I was just trying to simplify their description to make it easy to understand the ideas. I understood this. What I find strange is that using -1/1 should be equivalent to using 0/1, but your algorithm behaves differently: it ignores lost games with 0/1, and uses them with -1/1. Imagine you add a big constant to z. One hundred, say. This does not change the problem. You get either 100 or 101 as the outcome of a playout. But then, your estimate of the gradient becomes complete noise. So maybe using -1/1 is better than 0/1 ? Since your algorithm depends so much on the definition of the reward, there must be an optimal way to set the reward. Or there must be a better way to define an algorithm that would not depend on an offset in the reward. The gradient already compensates for the playout policy (equation 9), so in fact it would bias the gradient to sample uniformly at random! Yes, you are right. There is still something wrong that I don't understand. There may be a way to quantify the amount of noise in the unbiased gradient estimate, and it would depend on the average reward. Probably setting the average reward to zero is what would minimize noise in the gradient estimate. This is just an intuitive guess. Rémi
Re: [computer-go] Monte-Carlo Simulation Balancing
Hi Remi, I understood this. What I find strange is that using -1/1 should be equivalent to using 0/1, but your algorithm behaves differently: it ignores lost games with 0/1, and uses them with -1/1. Imagine you add a big constant to z. One hundred, say. This does not change the problem. You get either 100 or 101 as the outcome of a playout. But then, your estimate of the gradient becomes complete noise. So maybe using -1/1 is better than 0/1 ? Since your algorithm depends so much on the definition of the reward, there must be an optimal way to set the reward. Or there must be a better way to define an algorithm that would not depend on an offset in the reward. There is still something wrong that I don't understand. There may be a way to quantify the amount of noise in the unbiased gradient estimate, and it would depend on the average reward. Probably setting the average reward to zero is what would minimize noise in the gradient estimate. This is just an intuitive guess. Okay, now I understand your point :-) It's a good question - and I think you're right. In REINFORCE any baseline can be subtracted from the reward, without affecting the expected gradient, but possibly reducing its variance. The baseline leading to the best estimate is indeed the average reward. So it should be the case that {-1,+1} would estimate the gradient g more efficiently than {0,1}, assuming that we see similar numbers of black wins as white wins across the training set. So to answer your question, we can safely modify the algorithm to use (z-b) instead of z, where b is the average reward. This would then make the {0,1} and {-1,+1} cases equivalent (with appropriate scaling of step-size). I don't think this would have affected the results we presented (because all of the learning algorithms converged anyway, at least approximately, during training) but it could be an important modification for larger boards.
-Dave
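The baseline argument can be illustrated with a one-step toy case (numbers invented, deliberately symmetric so black wins half the time). Both estimators below are unbiased for the same gradient component; subtracting the average reward b only changes the variance.

```python
import random

random.seed(2)

# One-step toy "playout": two equiprobable moves; Black wins (z = 1) after
# move 0 and loses (z = 0) after move 1.
pi0 = 0.5                            # probability of move 0 under the policy
r = [1.0, 0.0]                       # outcome z for each move
b = pi0 * r[0] + (1 - pi0) * r[1]    # average reward, the baseline

def sample(offset):
    a = 0 if random.random() < pi0 else 1
    score = (1.0 if a == 0 else 0.0) - pi0   # d log pi(a) / d theta_0
    return (r[a] - offset) * score

n = 100_000
raw = [sample(0.0) for _ in range(n)]   # REINFORCE with z in {0,1}
base = [sample(b) for _ in range(n)]    # same, with (z - b), like {-1,+1}/2

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Same mean (0.25), but in this symmetric case the baselined estimator
# is exactly noise-free, while the {0,1} version is not.
print(var(raw), var(base))
```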
Re: [computer-go] Monte-Carlo Simulation Balancing
I wish I was smart :(
Re: [computer-go] Monte-Carlo Simulation Balancing
Hi Yamato, Thanks for the detailed explanation. M, N and alpha are constant numbers, right? What did you set them to? You're welcome! Yes, in our experiments they were just constant numbers M=N=100. The feature vector is the set of patterns you use, with value 1 if a pattern is matched and 0 otherwise. The simulation policy selects actions in proportion to the exponentiated, weighted sum of all matching patterns. For example let's say move a matches patterns 1 and 2, move b matches patterns 1 and 3, and move c matches patterns 2 and 4. Then move a would be selected with probability e^(theta1 + theta2) / (e^(theta1 + theta2) + e^(theta1 + theta3) + e^(theta2 + theta4)). The theta values are the weights on the patterns which we would like to learn. They are the log of the Elo ratings in Remi Coulom's approach. OK, I guess it is the formula 5 in the paper. Yes, exactly. The only tricky part is computing the vector psi(s,a). Each component of psi(s,a) corresponds to a particular pattern, and is the difference between the observed feature (i.e. whether the pattern actually occurred after move a in position s) and the expected feature (the average value of the pattern, weighted by the probability of selecting each action). I still don't understand this. Is it the formula 6? Could you please give me an example like the above? Yes that's right, this is equation 6. Okay, let's continue the example above. Let's say that in position s, using the current theta, moves a, b and c will be selected with probability 0.5, 0.3 and 0.2 respectively. Let's say that move a was actually selected. Now consider pattern 1, this is matched after a. But the probability of matching was (0.5*1 +0.3*1 +0.2*0) = 0.8. So psi_1(s,a)=1-0.8 = 0.2. For pattern 2, psi_2(s,a)=1- (0.5*1+0.3*0+0.2*1)=0.3, etc.. So in this example the vector psi(s,a) = [0.2,0.3,-0.3,-0.2]. In other words, psi tells us whether each pattern was actually matched more or less than we could have expected. Hope this helps. 
-Dave
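Dave's worked example of equation 6 can be reproduced in a few lines (the list encoding of the patterns is just an illustration, not the RLGO representation):

```python
# Moves and matched patterns as in the message: a -> {1, 2}, b -> {1, 3},
# c -> {2, 4}; one 0/1 feature per pattern.
features = {
    "a": [1, 1, 0, 0],
    "b": [1, 0, 1, 0],
    "c": [0, 1, 0, 1],
}
prob = {"a": 0.5, "b": 0.3, "c": 0.2}   # selection probs under current theta
chosen = "a"                            # the move actually selected

# Expected value of each pattern, weighted by the selection probabilities.
expected = [sum(prob[m] * features[m][k] for m in features) for k in range(4)]

# psi(s, a) = observed feature - expected feature (equation 6).
psi = [features[chosen][k] - expected[k] for k in range(4)]

print([round(x, 1) for x in psi])  # [0.2, 0.3, -0.3, -0.2]
```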
Re: [computer-go] Monte-Carlo Simulation Balancing
IMO other people's equations/code/ideas/papers always seem smarter than your own. The stuff you understand and do yourself just seems like common sense, and the stuff you don't always has a mystical air of complexity, at least until you understand it too :-) On 30-Apr-09, at 1:59 PM, Michael Williams wrote: I wish I was smart :(
Re: [computer-go] Monte-Carlo Simulation Balancing
David Silver wrote: Yes, in our experiments they were just constant numbers M=N=100. If M and N are the same, is there any reason to run M simulations and N simulations separately? What happens if you combine them and calculate V and g in the single loop? Okay, let's continue the example above. Let's say that in position s, using the current theta, moves a, b and c will be selected with probability 0.5, 0.3 and 0.2 respectively. Let's say that move a was actually selected. Now consider pattern 1, this is matched after a. But the probability of matching was (0.5*1 +0.3*1 +0.2*0) = 0.8. So psi_1(s,a)=1-0.8 = 0.2. For pattern 2, psi_2(s,a)=1- (0.5*1+0.3*0+0.2*1)=0.3, etc.. So in this example the vector psi(s,a) = [0.2,0.3,-0.3,-0.2]. In other words, psi tells us whether each pattern was actually matched more or less than we could have expected. I understood what psi was. I am not sure how it works, but anyway I can see your algorithm now. Thanks. -- Yamato
Re: [computer-go] Monte-Carlo Simulation Balancing
Hi Yamato, Could you give us the source code which you used? Your algorithm is too complicated, so it would be very helpful if possible. Actually I think the source code would be much harder to understand! It is written inside RLGO, and makes use of a substantial existing framework that would take a lot of effort to understand. (On a separate note I am considering making RLGO open source at some point, but I'd prefer to spend some effort cleaning it up before making it public). But I think maybe Algorithm 1 is much easier than you think:

A: Estimate value V* of every position in a training set, using deep rollouts.
B: Repeat, for each position in the training set:
  1. Run M simulations, estimate the value of the position (call this V)
  2. Run another N simulations, average the value of psi(s,a) over all positions and moves in games that black won (call this g)
  3. Adjust parameters by alpha * (V* - V) * g

The feature vector is the set of patterns you use, with value 1 if a pattern is matched and 0 otherwise. The simulation policy selects actions in proportion to the exponentiated, weighted sum of all matching patterns. For example let's say move a matches patterns 1 and 2, move b matches patterns 1 and 3, and move c matches patterns 2 and 4. Then move a would be selected with probability e^(theta1 + theta2) / (e^(theta1 + theta2) + e^(theta1 + theta3) + e^(theta2 + theta4)). The theta values are the weights on the patterns which we would like to learn. They are the log of the Elo ratings in Remi Coulom's approach. The only tricky part is computing the vector psi(s,a). Each component of psi(s,a) corresponds to a particular pattern, and is the difference between the observed feature (i.e. whether the pattern actually occurred after move a in position s) and the expected feature (the average value of the pattern, weighted by the probability of selecting each action).
It's also very important to be careful about signs and the colour to play - it's easy to make a mistake and follow the gradient in the wrong direction. Is that any clearer? -Dave
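The A/B steps above can be sketched end to end on a toy domain. This is a stand-in, not the RLGO implementation: there is a single "position" with three moves, a "simulation" is one softmax move choice followed by a stochastic {0,1} outcome for Black, and the features, win probabilities and target value V_star are all invented for illustration.

```python
import math
import random

random.seed(3)

# Invented toy domain: one position, three moves, four patterns.
features = {"a": [1, 1, 0, 0], "b": [1, 0, 1, 0], "c": [0, 1, 0, 1]}
win_prob = {"a": 0.9, "b": 0.5, "c": 0.2}  # chance Black wins after each move
V_star = 0.8                               # "deep rollout" value of the position
theta = [0.0, 0.0, 0.0, 0.0]               # one weight per pattern
alpha, M, N = 1.0, 100, 100

def policy():
    w = {m: math.exp(sum(t * f for t, f in zip(theta, features[m])))
         for m in features}
    z = sum(w.values())
    return {m: w[m] / z for m in w}

def simulate(pi):
    u, acc, move = random.random(), 0.0, "c"
    for m, p in pi.items():
        acc += p
        if u <= acc:
            move = m
            break
    z = 1.0 if random.random() < win_prob[move] else 0.0
    psi = [features[move][k] - sum(pi[mv] * features[mv][k] for mv in features)
           for k in range(4)]
    return z, psi

for _ in range(300):
    pi = policy()
    # 1. Run M simulations to estimate the position's value V.
    V = sum(simulate(pi)[0] for _ in range(M)) / M
    # 2. Run N more simulations; average psi over the playouts Black won.
    g = [0.0] * 4
    for _ in range(N):
        z, psi = simulate(pi)
        if z == 1.0:
            g = [gi + pk for gi, pk in zip(g, psi)]
    g = [gi / N for gi in g]
    # 3. Adjust parameters by alpha * (V* - V) * g.
    theta = [t + alpha * (V_star - V) * gi for t, gi in zip(theta, g)]

pi = policy()
V_final = sum(pi[m] * win_prob[m] for m in pi)
print(round(V_final, 2))  # should have moved from ~0.53 toward V_star
```

Note the two separate batches of simulations for V and g, matching Dave's answer to Yamato about why a single loop would give a correlated (and therefore biased) product.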
Re: [computer-go] Monte-Carlo Simulation Balancing
David Silver wrote: Hi Michael, But one thing confuses me: You are using the value from Fuego's 10k simulations as an approximation of the actual value of the position. But isn't the actual value of the position either a win or a loss? On such small boards, can't you assume that Fuego is able to correctly determine who is winning and round its evaluation to the nearest win/loss? i.e. if it evaluates the position to 0.674, that gets rounded to 1. If such an assumption about Fuego's ability to read the position on a small board is valid, then it should improve the results of your balanced simulation strategy, right? Or am I missing something? It's true that 5x5 Go is solved, so in principle we could have used the true minimax values. However we chose to use an approach that can scale to larger boards, which means that we should treat the expert evaluations as approximate. And in fact Fuego was not always accurate on 6x6 boards, as we used only 10k simulations in our training set. Also, I think it really helps to have soft rather than hard expert evaluations. We want a simulation policy that helps differentiate e.g. a 90% winning position from an 85% winning position. Rounding all the expert evaluations to 0 or 1 would lose much of this advantage. -Dave By this argument (your last paragraph), you need to do some magical number of simulations for the training data. Not enough simulations and you have too much noise. And infinite simulations give you hard 0 or 1 results. But I can't argue with your results.
[computer-go] Re: [computer go] Monte-Carlo Simulation Balancing
Hi Remi, What komi did you use for 5x5 and 6x6 ? I used 7.5 komi for both board sizes. I find it strange that you get only 70 Elo points from supervised learning over uniform random. Don't you have any feature for atari extension ? This one alone should improve strength immensely (extend string in atari because of the previous move). Actually no. The features are very simple, and know how to capture but not how to defend ataris. I'm sure that a better set of features could improve by more than 70 Elo, but I expect we would still see a benefit to balancing the weights correctly. For example, the Fuego policy defends ataris and follows several other common-sense rules, but the results in 5x5 and 6x6 show that it is not well balanced on small boards. Let us extrapolate: I got more than 200 Elo points of improvements from my patterns in 9x9 over uniform random (I never really measured that, it may be even more than 200). I guess you got more than 200 Elo on 9x9; in Mogo (Gelly et al. 2006) the improvement from uniform random was at least 400 Elo and I think your simulation policy is probably at least as strong. By the way I was sad to hear you're not working on Crazystone any more. Is it because you are busy with other projects? So maybe I could get 600 more Elo points with your method. And even more on 19x19. I noticed that, in general, changes in the playout policy have a much bigger impact on larger boards than on smaller boards. As to whether we can extrapolate, I hope so :-) I share the same feeling that improving the simulation policy will be more impactful on bigger boards with longer simulations. On the other hand I've been surprised by many things in MC Go, so nothing is certain until we try it! -Dave
Re: [computer-go] Monte-Carlo Simulation Balancing
David Silver wrote: A: Estimate value V* of every position in a training set, using deep rollouts. B: Repeat, for each position in the training set 1. Run M simulations, estimate value of position (call this V) 2. Run another N simulations, average the value of psi(s,a) over all positions and moves in games that black won (call this g) 3. Adjust parameters by alpha * (V* - V) * g Thanks for the detailed explanation. M, N and alpha are constant numbers, right? What did you set them to? The feature vector is the set of patterns you use, with value 1 if a pattern is matched and 0 otherwise. The simulation policy selects actions in proportion to the exponentiated, weighted sum of all matching patterns. For example let's say move a matches patterns 1 and 2, move b matches patterns 1 and 3, and move c matches patterns 2 and 4. Then move a would be selected with probability e^(theta1 + theta2) / (e^(theta1 + theta2) + e^(theta1 + theta3) + e^(theta2 + theta4)). The theta values are the weights on the patterns which we would like to learn. They are the log of the Elo ratings in Remi Coulom's approach. OK, I guess it is the formula 5 in the paper. The only tricky part is computing the vector psi(s,a). Each component of psi(s,a) corresponds to a particular pattern, and is the difference between the observed feature (i.e. whether the pattern actually occurred after move a in position s) and the expected feature (the average value of the pattern, weighted by the probability of selecting each action). I still don't understand this. Is it the formula 6? Could you please give me an example like the above? -- Yamato
Re: [computer-go] Monte-Carlo Simulation Balancing
David Silver wrote: I don't think the 200+ Elo improvement is so impressive I agree that it would be much more impressive to report positive results on larger boards. But perhaps it is already interesting that tuning the balance of the simulation policy can make a big difference on small boards? Also, larger boards mean longer simulations, and so the importance of simulation balancing should become even more exaggerated. What komi did you use for 5x5 and 6x6 ? I find it strange that you get only 70 Elo points from supervised learning over uniform random. Don't you have any feature for atari extension ? This one alone should improve strength immensely (extend string in atari because of the previous move). Let us extrapolate: I got more than 200 Elo points of improvements from my patterns in 9x9 over uniform random (I never really measured that, it may be even more than 200). So maybe I could get 600 more Elo points with your method. And even more on 19x19. I noticed that, in general, changes in the playout policy have a much bigger impact on larger boards than on smaller boards. Rémi
Re: [computer-go] Monte-Carlo Simulation Balancing
I noticed that, in general, changes in the playout policy have a much bigger impact on larger boards than on smaller boards. Rémi I think rating differences are amplified on larger boards. This is easy to see if you think about it this way: Somehow a 19x19 board is like 4 9x9 boards. Let us define a new game that I would call 4-Go where instead of playing one game, you play 4 games simultaneously and determine the winner by calculating the sum of the scores of the four games. Certainly rating differences would be bigger with 4-go than with go (given the same two players). This explains why rating differences are bigger on 19x19 than 9x9. Ivan
Re: [computer-go] Monte-Carlo Simulation Balancing
A simplistic model that helps explain this is golf. On a single hole, even a casual golfer has a realistic chance of out-golfing Tiger Woods. Tiger occasionally shoots a 1 over par on some hole and even weak amateurs occasionally par or even birdie a hole. It's not going to happen a lot, but it's not ridiculous either. Years ago I taught a player how to golf, and on his third time out with me, he hit a hole in one on a short par 3. If Tiger Woods had been playing with us, he would have lost that hole to this beginner. But in a 9 hole match, the odds go down enormously - for all practical purposes there is no chance. I kind of think of GO like that, even though it's a pretty simplistic model. Each move is like a hole of golf, it can be a good shot or a bad one. With GO, however, probably a LOT of your moves are just as good as the moves of a good player. But it's the ones that fall short, that kill you. Go on a big board is like 18 holes of golf compared to just 1 or 2 holes of golf. The better player is far more likely to win the 18 hole match than the 1 hole match. - Don On Tue, Apr 28, 2009 at 1:53 PM, Ivan Dubois dubois_i...@yahoo.fr wrote: I noticed that, in general, changes in the playout policy have a much bigger impact on larger boards than on smaller boards. Rémi I think rating differences are amplified on larger boards. This is easy to see if you think about it this way: Somehow a 19x19 board is like 4 9x9 boards. Let us define a new game that I would call 4-Go where instead of playing one game, you play 4 games simultaneously and determine the winner by calculating the sum of the scores of the four games. Certainly rating differences would be bigger with 4-go than with go (given the same two players). This explains why rating differences are bigger on 19x19 than 9x9.
Ivan
Re: [computer-go] Monte-Carlo Simulation Balancing
also, i'm not sure that a lot of most amateurs' moves are very good. the spectrum of bad moves is wide, it's just that it takes someone many stones stronger to severely punish small differences between good and nearly-good moves. among players of relatively similar strength, these differences will go unnoticed and unpunished. s. 2009/4/28 Don Dailey dailey@gmail.com: A simplistic model that helps explain this is golf. On a single hole, even a casual golfer has a realistic chance of out-golfing Tiger Woods. Tiger occasionally shoots a 1 over par on some hole and even weak amateurs occasionally par or even birdie a hole. It's not going to happen a lot, but it's not ridiculous either. Years ago I taught a player how to golf, and on his third time out with me, he hit a hole in one on a short par 3. If Tiger Woods had been playing with us, he would have lost that hole to this beginner. But in a 9 hole match, the odds go down enormously - for all practical purposes there is no chance. I kind of think of GO like that, even though it's a pretty simplistic model. Each move is like a hole of golf, it can be a good shot or a bad one. With GO, however, probably a LOT of your moves are just as good as the moves of a good player. But it's the ones that fall short, that kill you. Go on a big board is like 18 holes of golf compared to just 1 or 2 holes of golf. The better player is far more likely to win the 18 hole match than the 1 hole match. - Don On Tue, Apr 28, 2009 at 1:53 PM, Ivan Dubois dubois_i...@yahoo.fr wrote: I noticed that, in general, changes in the playout policy have a much bigger impact on larger boards than on smaller boards. Rémi I think rating differences are amplified on larger boards. This is easy to see if you think about it this way: Somehow a 19x19 board is like 4 9x9 boards.
Let us define a new game that I would call 4-Go where instead of playing one game, you play 4 games simultaneously and determine the winner by calculating the sum of the scores of the four games. Certainly rating differences would be bigger with 4-go than with go (given the same two players). This explains why rating differences are bigger on 19x19 than 9x9. Ivan
Re: [computer-go] Monte-Carlo Simulation Balancing
A fair number of amateur moves are exactly what a pro would use; this is no great surprise, since the moves are actually copied from professional games. The weaker player can blindly emulate, but does not know why a pro plays A instead of B or C; does not know how to punish slack moves. I've been taking some classes from an 8d pro. Sadly, I'm at the bottom of the class. The top players in these classes are 8d amateurs; the pro can give them 3 stones and win. When going over their games, many of their moves are quite respectable, but the pro is able to find mistakes which can be exploited with devastating effect. You can't afford to be a little bit off against a pro, but weaker players find it hard to exploit slack moves. When explaining his moves, the pro often uses very deep reading. We've all seen simple cases, where move A is advisable if a ladder works, otherwise B is recommended. The pro knows six moves in advance that he'll need to read that ladder, before the ladder is actually on the board. These are the simplest expositions I've seen; things get more complex from there. Terry McIntyre terrymcint...@yahoo.com Government is an association of men who do violence to the rest of us. - Leo Tolstoy From: steve uurtamo uurt...@gmail.com To: computer-go computer-go@computer-go.org Sent: Tuesday, April 28, 2009 5:09:20 PM Subject: Re: [computer-go] Monte-Carlo Simulation Balancing
Re: [computer-go] Monte-Carlo Simulation Balancing
David Silver wrote: because the previous approaches were not optimized for such small boards. I'm not sure what you mean here? The supervised learning and reinforcement learning approaches that we compared against are both trained on the small boards, i.e. the pattern weights are specifically optimised for that size of board. Ah, sorry. I missed it. I agree that the handcrafted policy from Fuego was not optimised for small boards, which is why it performed poorly. But perhaps this is also interesting, i.e. it suggests that a handcrafted policy for 9x9 Go may be significantly suboptimal when used in 19x19 Go. So automatically learning a simulation policy may ultimately prove to be very beneficial. It is surprising Fuego was far worse than uniform random on 5x5 and 6x6. These results show the particularity of the very small boards. I suppose 5x5 is very different from 9x9, but 19x19 is not so different from 9x9. I'm looking forward to your results on larger boards. Me too :-) Coming soon, will let you know how it goes. But I hope that others will try these ideas too, it's always much better to compare multiple implementations of the same algorithm. Could you give us the source code which you used? Your algorithm is quite complicated, so the source would be very helpful if possible. -- Yamato
Re: [computer-go] Monte-Carlo Simulation Balancing
But I'm only trying to make a point, not pin the analogy down perfectly. Naturally the stronger the player, the more likely his moves will conform to the level of the top players. The basic principle is that the longer the contest, the more opportunities a strong player has to demonstrate his superiority over an inferior player and/or recover from mistakes or surprises. In my golf analogy, my friend hits a hole in one. Let's say it was on the first hole of 9 against Tiger Woods. Tiger likely would have hit it in 2 strokes, and my beginner friend would be ahead by 1 stroke! But that 1 stroke lead would almost certainly disappear by the time the next hole was completed, as my friend was typically taking 7 or 8 shots on that par 3 course! Another way to look at this, is that a random legal move generator has a chance to play a perfect game, although that chance is almost infinitesimal. But even that small chance can be an order of magnitude larger if you can shorten the contest even by 1 or 2 moves. - Don On Tue, Apr 28, 2009 at 8:09 PM, steve uurtamo uurt...@gmail.com wrote: also, i'm not sure that a lot of most amateurs' moves are very good.
Re: [computer-go] Monte-Carlo Simulation Balancing
Finally! I guess you can add this technique to your list, Lukasz. David Silver wrote: Hi everyone, Please find attached my ICML paper with Gerry Tesauro on automatically learning a simulation policy for Monte-Carlo Go. Our preliminary results show a 200+ Elo improvement over previous approaches, although our experiments were restricted to simple Monte-Carlo search with no tree on small boards. Abstract In this paper we introduce the first algorithms for efficiently learning a simulation policy for Monte-Carlo search. Our main idea is to optimise the balance of a simulation policy, so that an accurate spread of simulation outcomes is maintained, rather than optimising the direct strength of the simulation policy. We develop two algorithms for balancing a simulation policy by gradient descent. The first algorithm optimises the balance of complete simulations, using a policy gradient algorithm; whereas the second algorithm optimises the balance over every two steps of simulation. We compare our algorithms to reinforcement learning and supervised learning algorithms for maximising the strength of the simulation policy. We test each algorithm in the domain of 5x5 and 6x6 Computer Go, using a softmax policy that is parameterised by weights for a hundred simple patterns. When used in a simple Monte-Carlo search, the policies learnt by simulation balancing achieved significantly better performance, with half the mean squared error of a uniform random policy, and equal overall performance to a sophisticated Go engine. -Dave ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Monte-Carlo Simulation Balancing
David Silver wrote: Hi everyone, Please find attached my ICML paper with Gerry Tesauro on automatically learning a simulation policy for Monte-Carlo Go. Our preliminary results show a 200+ Elo improvement over previous approaches, although our experiments were restricted to simple Monte-Carlo search with no tree on small boards. Very interesting, thanks. If I understand correctly, your method makes your program 250 Elo points stronger than my pattern-learning algorithm on 5x5 and 6x6, by just learning better weights. That is impressive. I had stopped working on my program for a very long time, but maybe I will go back to work and try your algorithm. Rémi ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Monte-Carlo Simulation Balancing
My favorite part: One natural idea is to use the learned simulation policy in Monte-Carlo search, and generate new deep search values, in an iterative cycle. But one thing confuses me: You are using the value from Fuego's 10k simulations as an approximation of the actual value of the position. But isn't the actual value of the position either a win or a loss? On such small boards, can't you assume that Fuego is able to correctly determine who is winning and round its evaluation to the nearest win/loss? i.e. if it evaluates the position to 0.674, that gets rounded to 1. If such an assumption about Fuego's ability to read the position on a small board is valid, then it should improve the results of your balanced simulation strategy, right? Or am I missing something? David Silver wrote: Hi everyone, Please find attached my ICML paper with Gerry Tesauro on automatically learning a simulation policy for Monte-Carlo Go. Our preliminary results show a 200+ Elo improvement over previous approaches, although our experiments were restricted to simple Monte-Carlo search with no tree on small boards.
Re: [computer-go] Monte-Carlo Simulation Balancing
David Silver wrote: Please find attached my ICML paper with Gerry Tesauro on automatically learning a simulation policy for Monte-Carlo Go. Our preliminary results show a 200+ Elo improvement over previous approaches, although our experiments were restricted to simple Monte-Carlo search with no tree on small boards. I like your idea, but why do you use only 5x5 and 6x6 Go? I don't think the 200+ Elo improvement is so impressive because the previous approaches were not optimized for such small boards. I'm looking forward to your results on larger boards. -- Yamato ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Monte-Carlo Tree Search reference bot
Finally got a window of time. It's called the epsilon trick. Here is a cut and paste of Lukasz' explanation from the archives, including how to calculate N. - Dave Hillis -- -Original Message- From: [EMAIL PROTECTED] To: computer-go@computer-go.org Sent: Sun, 14 Jan 2007 4:50 AM Subject: [computer-go] Fast UCT (epsilon trick) Hi I've looked back into the article and I see that it is not trivial to understand how the epsilon trick (ET) works. Therefore I will describe it here on the example of UCT. The bottleneck of the UCT algorithm is choosing a move. The program has to iterate over all children, evaluating their UCB values. We want to execute this loop rarely. One way to do it is to group playouts together, i.e. play 2 or more MC simulations per tree traversal. But this is not good enough, as it doesn't distribute the playouts over the tree and some parts will not get enough of them. The ET works as follows: with each node of the tree the program keeps additional information: - child_to_visit - visits_to_go When we visit a node, if node->visits_to_go > 0 then node->visits_to_go = node->visits_to_go - 1 node = node->child_to_visit So we have a predetermined strategy to go to a particular child visits_to_go times. But if node->visits_to_go == 0 then // we have to set the strategy first node->child_to_visit = UCT_find_child(node) node->visits_to_go = Epsilon * node->visited_count // round up As we can see, when Epsilon is small or when a particular node has been visited only a few times, UCT-ET is identical to UCT. But when a node has been visited more times (near the root, far from the leaves), then UCT-ET may save some time and go to the precalculated child. That's it. I hope you like it :) Łukasz Lew PS. In Df-PN, a good Epsilon was 1/4. I believe that here it will be much smaller, 1/64 maybe. Or one may even change the Epsilon equation to: node->visits_to_go = Epsilon * sqrt(node->visited_count) If someone does some experiments, I'm eager to hear the results.
PS2 float InvSqrt (float x) { float xhalf = 0.5f * x; int i = *(int*)&x; i = 0x5f3759df - (i >> 1); x = *(float*)&i; x = x * (1.5f - xhalf * x * x); return x; } ;-) ___ -- -Original Message- From: [EMAIL PROTECTED] To: computer-go@computer-go.org Sent: Wed, 3 Dec 2008 12:56 am Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot Hmm.. it could be that N is picked randomly each time... now I can't seem to find the description and my memory is not serving me well here. Perhaps someone else remembers? I'll try to track it down. - Dave Hillis -Original Message- From: Michael Williams [EMAIL PROTECTED] To: computer-go computer-go@computer-go.org Sent: Wed, 3 Dec 2008 12:14 am Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot That's going to repeat the same exact path through the tree three times, isn't it? If so, it seems like it would be more efficient to do N playouts from the leaf after each traversal of the tree. [EMAIL PROTECTED] wrote: There is another speedup trick that might interest you. IIRC Lukasz Lew came up with it, but I forget what he called it. After calculating the selected move for an internal node (going through UCT and RAVE and whatnot) store that move. Then, for the next N visits to that node (where N is 5 or 10 or so), just repeat that move without having to calculate what the move would be. - Dave Hillis -Original Message- From: Mark Boon [EMAIL PROTECTED] To: computer-go computer-go@computer-go.org Sent: Tue, 2 Dec 2008 11:17 pm Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot I have made some minor performance improvements and this is as far as I intend to take this particular project. I might make some small changes if necessary, but most likely I'll leave this largely unchanged from now. I had set myself as an arbitrary goal that it should do at least 20K playouts. But with real liberties, AMAF and a RAVE formula I got stuck in the 16K-17K range.
According to my profiler that is mostly due to the expensive formula used to compare nodes, where it says it spends 25% of total time. The formula I'm using is: beta * (virtual-win-ratio + RAVE) + (1-beta) * (win-ratio + UCT) beta = 1 - log(parent-visits) / 20 UCT = exploration-factor *sqrt( log(parent-visits) / (nr-visits+1) ) RAVE = exploration-factor *sqrt( log(parent-visits) / (nr-virtual-visits+1) ) There are probably quite a few possibilities still to tune this program with regards to playing strength and performance. But I felt it doesn't help to obscure the implementation by too many details. The implementation of the search algorithm was entirely game-independent, until I introduced AMAF. I didn't see how to fix that, as there's no way getting around
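Lukasz's epsilon trick described above can be sketched in C roughly as follows. The field and function names are illustrative, and the toy uct_find_child stands in for the real, expensive loop over all children:

```c
#include <stddef.h>

#define EPSILON (1.0 / 64.0)  /* Lukasz suggests far less than Df-PN's 1/4 */

typedef struct Node Node;
struct Node {
    int    visited_count;   /* visits to this node so far             */
    int    visits_to_go;    /* remaining visits for the cached child  */
    Node  *child_to_visit;  /* child chosen by the last full pass     */
    Node  *children[2];     /* toy: two children; real trees have many */
    double value[2];        /* toy stand-in for the children's UCB scores */
    int    n_children;
};

/* Stand-in for the expensive full UCB loop over all children. */
static Node *uct_find_child(Node *node) {
    int best = 0;
    for (int a = 1; a < node->n_children; a++)
        if (node->value[a] > node->value[best]) best = a;
    return node->children[best];
}

/* Epsilon trick: follow the cached child for roughly
 * EPSILON * visited_count visits before paying for another full pass. */
Node *et_select_child(Node *node) {
    if (node->visits_to_go > 0) {
        node->visits_to_go--;
    } else {
        node->child_to_visit = uct_find_child(node);
        /* round up so the choice is kept for at least one later visit */
        node->visits_to_go = (int)(EPSILON * node->visited_count) + 1;
    }
    node->visited_count++;
    return node->child_to_visit;
}
```

Near the root, where visited_count is large, most calls take the cheap cached branch; near the leaves the behaviour is identical to plain UCT.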
Re: [computer-go] Monte-Carlo Tree Search reference bot
What I did (was a long time ago, I don't know if it is still used in Mogo), is to compute the m best moves every so often and most of the time just do the max over those m moves. m was on the order of 5, and every so often was an increasing function like sqrt or log (I don't remember). That speeds up quite a lot (when the max becomes the bottleneck), and if you tune the parameters well you almost don't lose any strength at a fixed number of playouts. Sylvain 2008/12/3 [EMAIL PROTECTED]: There is another speedup trick that might interest you. IIRC Lukasz Lew came up with it, but I forget what he called it. After calculating the selected move for an internal node (going through UCT and RAVE and whatnot) store that move. Then, for the next N visits to that node (where N is 5 or 10 or so), just repeat that move without having to calculate what the move would be. - Dave Hillis -Original Message- From: Mark Boon [EMAIL PROTECTED] To: computer-go computer-go@computer-go.org Sent: Tue, 2 Dec 2008 11:17 pm Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot I have made some minor performance improvements and this is as far as I intend to take this particular project. I might make some small changes if necessary, but most likely I'll leave this largely unchanged from now. I had set myself as an arbitrary goal that it should do at least 20K playouts. But with real liberties, AMAF and a RAVE formula I got stuck in the 16K-17K range. According to my profiler that is mostly due to the expensive formula used to compare nodes, where it says it spends 25% of total time. The formula I'm using is: beta * (virtual-win-ratio + RAVE) + (1-beta) * (win-ratio + UCT) beta = 1 - log(parent-visits) / 20 UCT = exploration-factor *sqrt( log(parent-visits) / (nr-visits+1) ) RAVE = exploration-factor *sqrt( log(parent-visits) / (nr-virtual-visits+1) ) There are probably quite a few possibilities still to tune this program with regards to playing strength and performance.
But I felt it doesn't help to obscure the implementation by too many details. The implementation of the search algorithm was entirely game-independent, until I introduced AMAF. I didn't see how to fix that, as there's no way getting around that it's based on the fact that a move is represented by a single coordinate, which is mapped to an array to store the statistical values. But strip the AMAF part, which is a block of 30 odd lines of code, and I think it can be used for other games basically as-is. I did this not because I ever see myself program another game, but because in my experience in doing so I get a cleaner separation between modules. At 2,000 playouts, it's still quite a bit weaker than the plain MC-AMAF refbot. It wins only about 33%. But that's probably because the 1,000-2,000 playouts range is the sweet-spot for that particular type of playing engine. It doesn't scale from there, whereas the MCTS ref-bot only just gets warmed up with 2,000 playouts. This leads me to a question. I suppose it might be of some interest to put this bot up on CGOS. But what parameters to use? The main one being the number of playouts, naturally. Mark ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Monte-Carlo Tree Search reference bot
On Wed, Dec 3, 2008 at 05:17, Mark Boon [EMAIL PROTECTED] wrote: I had set myself as an arbitrary goal that it should do at least 20K playouts. But with real liberties, AMAF and a RAVE formula I got stuck in the 16K-17K range. According to my profiler that is mostly due to the expensive formula used to compare nodes, where it says it spends 25% of total time. The formula I'm using is: beta * (virtual-win-ratio + RAVE) + (1-beta) * (win-ratio + UCT) beta = 1 - log(parent-visits) / 20 UCT = exploration-factor *sqrt( log(parent-visits) / (nr-visits+1) ) RAVE = exploration-factor *sqrt( log(parent-visits) / (nr-virtual-visits+1) ) Hi, This is probably something you already do, but just in case you don't, here comes a simple optimization trick: since parent-visits is an integer value and in a somewhat bounded range (0..max_playouts), the logarithm can be precomputed in a table. Then even sqrt(log()) can also be precomputed. Similarly, a table with sqrt(integer()) can be used for the other values. best regards, Vlad ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Monte-Carlo Tree Search reference bot
On 3-dec-08, at 06:09, Sylvain Gelly wrote: What I did (was a long time ago, I don't know if it is still used in Mogo), is to compute the m best moves every so often and most of the time just do the max over those m moves. m was on the order of 5, and every so often was an increasing function like sqrt or log (I don't remember). That speeds up quite a lot (when the max becomes the bottleneck), and if you tune the parameters well you almost don't lose any strength at a fixed number of playouts. I thought about that, but I was afraid the code would become too obscure. After all, this is supposed to be a reference implementation. But maybe I should actually give it a try to see what it would look like. I also played with precomputing a table of the log() function but found little speedup. Instead I decided to (p)recompute the log(nr-parent-visits) only once in 8 times. According to my profiler that reduced the log() from using 8% to 1% of computation time. Another thing I did was replace log(nr-virtual-parent-visits) in the RAVE calculation by the same log(nr-parent-visits) used in the UCT calculation, cutting both the number of times I need to call log in half and reducing the amount of storage in the node a little. After these modifications I didn't notice any degradation in play. In terms of memory use, I did a few obvious things to keep storage of the nodes to a minimum. I had seen people write about memory usage of the tree, but never understood the concerns. But that was because I always expanded the tree one node at a time. Of course, if you create all the children in one go like this reference implementation does, it's a little bit a different story. But still, I have to push my computer hard to run out of space before I run out of time so I didn't see much need to reduce memory too aggressively. Using just a moderately small number of playouts before expansion is already enough to never have memory problems.
But if memory is a concern to anyone I see two obvious ways to make significant reductions. One is to flatten the data-structure, reducing the object overhead Java imposes. That could save up to a third. The other is to use smaller placeholders for nodes that have been created, but not yet visited. Once a node gets visited it can be replaced by a full-blown node, but until then all you need is the move-coordinate and the AMAF statistics. That should save up to 75% or so. Mark ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
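Mark's second suggestion, a smaller placeholder for created-but-unvisited children, might look like this in C (the field widths are illustrative, not from the post):

```c
#include <stdint.h>

/* Placeholder for a child that has been created but never visited:
 * just the move coordinate and the AMAF (virtual) statistics. */
typedef struct {
    uint16_t move;
    uint16_t amaf_wins;
    uint16_t amaf_visits;
} StubNode;

/* Full node, allocated only once the child is actually visited. */
typedef struct FullNode {
    uint16_t move;
    uint32_t wins, visits;
    uint32_t amaf_wins, amaf_visits;
    struct FullNode **children;  /* or a StubNode array until expansion */
    uint16_t n_children;
} FullNode;
```

Since most leaves in an MCTS tree are never visited, keeping them as 6-byte stubs instead of full nodes with pointers and statistics is where the "up to 75% or so" saving would come from.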
Re: [computer-go] Monte-Carlo Tree Search reference bot
On Wed, Dec 03, 2008 at 09:55:07AM -0200, Mark Boon wrote: I thought about that, but I was afraid the code would become too obscure. After all, this is supposed to be a reference implementation. But maybe I should actually give it a try to see what it would look like. I agree that the reference implementation should be maximally clear code. Performance is not all that important here, correctness is. Having said that, I can't help responding to one detail: I had seen people write about memory usage of the tree, but never understood the concerns. One thing to remember is that more memory use means more cache misses, and more access to the main memory. On modern computers, those can cost as much as executing a thousand instructions! So memory optimizing can often pay off, also with respect to time! Of course, Java does a lot of memory management behind the scenes, so optimizing such can be harder... But when writing in C or C++, it does make sense. -H -- Heikki Levanto In Murphy We Turst heikki (at) lsd (dot) dk ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Monte-Carlo Tree Search reference bot
On 3-dec-08, at 10:31, Heikki Levanto wrote: Having said that, I can't help responding to one detail: I had seen people write about memory usage of the tree, but never understood the concerns. One thing to remember is that more memory use means more cache misses, and more access to the main memory. On modern computers, those can cost as much as executing a thousand instructions! So memory optimizing can often pay off, also with respect to time! Of course, Java does a lot of memory management behind the scenes, so optimizing such can be harder... But when writing in C or C++, it does make sense. I've seen this stated often. But color me skeptical. If what you say is true, then the MC-AMAF bot, which hardly uses any memory at all and which fits in the secondary cache entirely, code and data combined, should show an enormous speed-up in terms of number of playouts per second. Instead it does only about twice the number of playouts per second as the tree-search version, which can be almost if not entirely explained by the extra work that needs to be done to traverse and maintain the tree. If this is because I use Java, then Don's concise C implementation of the MC-AMAF bot should be a lot faster than my bloated Java version. Which, ahem, is not the case at all. I think a well-crafted C program can be up to twice as fast as a similar Java program. But I doubt that has much to do with cache misses. I think in computer-Go, trying to control cache misses is futile no matter the language. Maybe you can do it for a relatively trivial, well-understood and stable sub-part. But then you risk losing the gain again as soon as it gets incorporated into a larger whole. So your efforts go to waste. So this is in the show me category for me. You can use C or C++ if you like, but show me a tree-search bot that speeds up the number of playouts significantly by reducing the node-size. Maybe it's possible, but I don't believe it until I see it. 
This could be different for Chess programs, at least I've seen it claimed often. But I don't envy them, it must be excruciatingly frustrating to deal with. Mark ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Monte-Carlo Tree Search reference bot
On Wed, Dec 03, 2008 at 11:24:22AM -0200, Mark Boon wrote: Heikki wrote: One thing to remember is that more memory use means more cache misses, and more access to the main memory. On modern computers, those can cost as much as executing a thousand instructions! So memory optimizing can often pay off, also with respect to time! I've seen this stated often. But color me skeptical. Maybe I was quoting common wisdom without having checked it personally. I only remember one case - not related to go at all - where optimizing the memory stuff made a noticeable difference. I would also like to see hard evidence. But not badly enough to put in the necessary work to get some ;-) - H -- Heikki Levanto In Murphy We Turst heikki (at) lsd (dot) dk ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Monte-Carlo Tree Search reference bot
I had a chess program years ago that was blindingly fast on some computers, very slow on others. It was all about the cache. The move generator was hard coded for each piece on each square. For instance a white pawn on d7 had its very own specialized move generator. There was a function for each piece of each color on each square. By doing this I avoided the majority of conditional tests. I did not have to test for board edge, pawn on 7th, pawn on the second rank, etc. All the logic was unrolled and I basically had to write a program to write the move generator. The machine I developed this program on had a lot of cache and it was not a problem. But when running on, for instance a laptop, it would slow down enormously - something like 4 to 1 where other cache friendly programs would slow down perhaps 25%. I don't keep up much with processor technologies now and I don't know how much data vs instruction cache modern setups have, but I know it is far better than it used to be. I bet that old program would run well on modern computers. One trick, which you are probably doing, is to keep data structures together in memory. For instance it is better to have children next to each other so that you are not accessing memory all over the place, perhaps incurring several cache misses instead of one.
But color me skeptical. If what you say is true, then the MC-AMAF bot, which hardly uses any memory at all and which fits in the secondary cache entirely, code and data combined, should show an enormous speed-up in the number of playouts per second. Instead it does only about twice the number of playouts per second of the tree-search version, which can be almost if not entirely explained by the extra work needed to traverse and maintain the tree. If this were because I use Java, then Don's concise C implementation of the MC-AMAF bot should be a lot faster than my bloated Java version. Which, ahem, is not the case at all.

I think a well-crafted C program can be up to twice as fast as a similar Java program, but I doubt that has much to do with cache misses. I think in computer Go, trying to control cache misses is futile no matter the language. Maybe you can do it for a relatively trivial, well-understood and stable sub-part, but then you risk losing the gain again as soon as it gets incorporated into a larger whole. So your efforts go to waste.

So this is in the "show me" category for me. You can use C or C++ if you like, but show me a tree-search bot that speeds up the number of playouts significantly by reducing the node size. Maybe it's possible, but I don't believe it until I see it. This could be different for chess programs, at least I've seen it claimed often. But I don't envy them; it must be excruciatingly frustrating to deal with.

Mark

___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Monte-Carlo Tree Search reference bot
On Wed, 2008-12-03 at 11:24 -0200, Mark Boon wrote:
> If this is because I use Java, then Don's concise C implementation of the MC-AMAF bot should be a lot faster than my bloated Java version.

I don't remember the numbers, but my own Java and C implementations were written in the same style - same basic data structures, and just about the only difference was the language. I did use some kind of list trick with pointers that you cannot do in Java, at least not easily and efficiently, but I don't think it made that much of a difference. The Java version was not that bad for speed compared to C; I think Java ran at something like 2/3 of the speed of C, or close to that. But if you get into heavy-duty data structures, Java hurts - you won't get as big a tree in Java, for instance, and I'm sure caching will be a bigger issue.

- Don
Re: [computer-go] Monte-Carlo Tree Search reference bot
I have made some minor performance improvements, and this is as far as I intend to take this particular project. I might make some small changes if necessary, but most likely I'll leave this largely unchanged from now on.

I had set myself the arbitrary goal that it should do at least 20K playouts. But with real liberties, AMAF and a RAVE formula I got stuck in the 16K-17K range. According to my profiler that is mostly due to the expensive formula used to compare nodes, where it says it spends 25% of total time. The formula I'm using is:

    beta * (virtual-win-ratio + RAVE) + (1-beta) * (win-ratio + UCT)

    beta = 1 - log(parent-visits) / 20
    UCT  = exploration-factor * sqrt( log(parent-visits) / (nr-visits+1) )
    RAVE = exploration-factor * sqrt( log(parent-visits) / (nr-virtual-visits+1) )

There are probably quite a few possibilities still to tune this program with regard to playing strength and performance. But I felt it doesn't help to obscure the implementation with too many details.

The implementation of the search algorithm was entirely game-independent until I introduced AMAF. I didn't see how to fix that, as there's no getting around the fact that it relies on a move being represented by a single coordinate, which is mapped to an array that stores the statistical values. But strip out the AMAF part, which is a block of 30-odd lines of code, and I think it can be used for other games basically as-is. I did this not because I ever see myself programming another game, but because in my experience it gives a cleaner separation between modules.

At 2,000 playouts, it's still quite a bit weaker than the plain MC-AMAF refbot. It wins only about 33%. But that's probably because the 1,000-2,000 playout range is the sweet spot for that particular type of playing engine. It doesn't scale from there, whereas the MCTS refbot is only just getting warmed up at 2,000 playouts.

This leads me to a question. I suppose it might be of some interest to put this bot up on CGOS.
But what parameters to use? The main one being the number of playouts, naturally.

Mark
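The node-comparison formula Mark quotes can be transcribed directly. A sketch, not the refbot's actual code; in particular, clamping beta at zero once log(parent-visits) exceeds 20 is my assumption, since the mail doesn't say what happens past that point.

```java
public class NodeValue {
    // beta * (virtual-win-ratio + RAVE) + (1-beta) * (win-ratio + UCT)
    // with beta = 1 - log(parent-visits)/20, clamped to [0,1] (assumption).
    static double value(double winRatio, double virtualWinRatio,
                        int parentVisits, int visits, int virtualVisits,
                        double explorationFactor) {
        double logP = Math.log(parentVisits);
        double beta = Math.max(0.0, 1.0 - logP / 20.0);
        double uct  = explorationFactor * Math.sqrt(logP / (visits + 1));
        double rave = explorationFactor * Math.sqrt(logP / (virtualVisits + 1));
        return beta * (virtualWinRatio + rave) + (1 - beta) * (winRatio + uct);
    }
}
```

Note that with few parent visits beta is close to 1, so the AMAF ("virtual") statistics dominate; as the parent accumulates visits the real win ratio and UCT term take over.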
Re: [computer-go] Monte-Carlo Tree Search reference bot
There is another speedup trick that might interest you. IIRC Lukasz Lew came up with it, but I forget what he called it. After calculating the selected move for an internal node (going through UCT and RAVE and whatnot), store that move. Then, for the next N visits to that node (where N is 5 or 10 or so), just repeat that move without having to calculate what the move would be.

- Dave Hillis

-----Original Message-----
From: Mark Boon [EMAIL PROTECTED]
To: computer-go computer-go@computer-go.org
Sent: Tue, 2 Dec 2008 11:17 pm
Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot

[quoted text snipped]
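The trick Dave describes amounts to caching the selection at each node. A hedged sketch (names and structure are my own, not Lukasz Lew's implementation): the expensive UCT/RAVE pass runs only every N visits, and the cached move is replayed in between.

```java
import java.util.function.Supplier;

public class CachedSelection {
    static final int REUSE = 5;      // N: how long to reuse the cached move

    int cachedMove = -1;
    int reuseLeft = 0;
    int recomputations = 0;          // bookkeeping, for illustration only

    // expensiveSelector stands in for the full UCT/RAVE comparison over
    // all children; it is only consulted once every REUSE visits.
    int select(Supplier<Integer> expensiveSelector) {
        if (reuseLeft == 0) {
            cachedMove = expensiveSelector.get();
            recomputations++;
            reuseLeft = REUSE;
        }
        reuseLeft--;
        return cachedMove;
    }
}
```

Since the mail pins 25% of the total time on the node-comparison formula, reusing the result 4 out of every 5 visits would recover most of that, at the cost of slightly staler selection.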
Re: [computer-go] Monte-Carlo Tree Search reference bot
That's going to repeat the same exact path through the tree N times, isn't it? If so, it seems like it would be more efficient to do N playouts from the leaf after each traversal of the tree.

[EMAIL PROTECTED] wrote:
> There is another speedup trick that might interest you. IIRC Lukasz Lew came up with it, but I forget what he called it. After calculating the selected move for an internal node (going through UCT and RAVE and whatnot) store that move. Then, for the next N visits to that node (where N is 5 or 10 or so), just repeat that move without having to calculate what the move would be.
> - Dave Hillis

[rest of quoted text snipped]
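Michael's alternative can be sketched in a few lines (hypothetical names; the playout itself is passed in as a stub): walk the tree once, run N playouts from the reached leaf, and back up the aggregate once, instead of re-walking the identical path N times.

```java
import java.util.function.BooleanSupplier;

public class BatchedPlayouts {
    // Runs n playouts from one leaf and returns {wins, games}, to be
    // backed up a single time along the stored path to the root.
    static int[] playoutsFromLeaf(BooleanSupplier playout, int n) {
        int wins = 0;
        for (int i = 0; i < n; i++) {
            if (playout.getAsBoolean()) wins++;   // one Monte-Carlo game
        }
        return new int[]{wins, n};
    }
}
```

The trade-off versus the cached-move trick is that the statistics at every interior node along the path are updated in lumps of N rather than drifting move by move.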
Re: [computer-go] Monte-Carlo Tree Search reference bot
Hmm... it could be that N is picked randomly each time... now I can't seem to find the description, and my memory is not serving me well here. Perhaps someone else remembers? I'll try to track it down.

- Dave Hillis

-----Original Message-----
From: Michael Williams [EMAIL PROTECTED]
To: computer-go computer-go@computer-go.org
Sent: Wed, 3 Dec 2008 12:14 am
Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot

[quoted text snipped]
RE: [computer-go] Monte-Carlo Tree Search reference bot
My own experiments showed a very big increase in strength, especially at about 1,000 light playouts. I had many features and parameters which may make a difference, but my main hypothesis was that they shouldn't... I think that I weighted real simulations (aka the first move) 10 times more than virtual ones (aka AMAF moves).

I assume that by expansion you mean allocating data for every child of a node. So let's assume that your root node is called A; you would then allocate for every one of its children. Then when you run a simulation from A, you would first apply the UCT formula to determine which first move to make... then you would both feed back the AMAF score for every move played during the simulation (virtual scoring) AND use the true first move for real scoring (the normal UCT part).

Of course, this shouldn't behave exactly like AMAF, because for one thing you get a better judgment of the first move because of UCT, and you also make the first move weigh more. (Although you probably technically should also do so with plain AMAF, despite the first move being 100% random, which is part of what weighted AMAF does. But the first-move weight would be arbitrary, thus defeating the purpose of a standard.)

Last, you could try to reduce the weight of the virtual score as the number of simulations grows, with something like (if nbsim < 1000):

    winRatio = [realRatio * (1000+nbsim)] / 3000 + [virtualRatio * (2000-(1000+nbsim))] / 3000

I'm not sure about the formula :) but you get the idea of taking the realRatio more and more into account. I think what I did was not take the realRatio into account at all at first when estimating the winRatio, and then increase its weight linearly up to the point where I no longer took the virtualRatio into account.

The drawback with all that is that you will certainly have to make several versions, although they would share a lot of code. I don't think that trying to get one version to behave as several is a good idea. You probably should keep the code safe once you have done a lot of testing on it, for reference purposes. Then you'll be able to know the exact strength of a particular piece of software, even two years from now. I didn't do that; I tried to make it evolve. And with poor svn versioning capacity and poor management of the link between measurements and software, I'm now at a point where all I can do is make random guesses, although I made millions of experiments in the past years :) I just can't link those experiments back to a fixed piece of software.

Then, if by any means you think a piece of code is bug-free - which is hard to bet on - you really should put that somewhere, with all the proof (and experiments) you can associate with it (even on a web page if you ask me, or a thesis). To me that is the most useful part of the standard refbot: because a lot of people have tried it, you can be sure of the exact link between the software and the experimental results. It makes assertions reproducible, and that's really great.

From: [EMAIL PROTECTED]
Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot
Date: Fri, 28 Nov 2008 01:48:09 -0200
To: computer-go@computer-go.org

[quoted text snipped]
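Denis's linear ramp can be sketched in normalized form (his own coefficients don't sum to 1, and he says himself he's unsure of the exact formula, so this is the idea rather than his code; the RAMP constant is an arbitrary choice):

```java
public class AmafBlend {
    // Number of simulations at which the blend becomes pure real win ratio.
    // Arbitrary; Denis's example implies something on the order of 1000-2000.
    static final int RAMP = 2000;

    // Weight of the real win ratio grows linearly with the simulation
    // count, from pure AMAF ("virtual") ratio at 0 simulations to pure
    // real ratio at RAMP simulations.
    static double blended(double realRatio, double virtualRatio, int nbSim) {
        double w = Math.min(1.0, nbSim / (double) RAMP);
        return w * realRatio + (1 - w) * virtualRatio;
    }
}
```

This is the same shape as the beta schedule in Mark's RAVE formula earlier in the thread, just linear in visits rather than logarithmic.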
Re: [computer-go] Monte-Carlo Tree Search reference bot
On 27-nov-08, at 19:50, Denis fidaali wrote:
> So, you use AMAF for simulating the first UCT evaluations ?
[rest of quoted text snipped]
> What test process do you use for this version ?

I tested it mostly doing 2,000 playouts. When AMAF is true I create a map of virtual-win values of all the moves played during a playout. These values get accumulated over all the playouts (not just the first ones). The 'virtual value' of a move is calculated as follows:

    exploration-factor * UCT + ( (nr-wins*2 + nr-virtual-wins) / (nr-playouts*2 + nr-virtual-playouts) )

    exploration-factor = sqrt(0.2) currently
    UCT = sqrt( log( nr-parent-playouts ) / ( nr-playouts+1 ) )

Like I said, I haven't had time to experiment much, so this formula may not be any good. I had also expected to see some positive effect of the virtual-win / virtual-playout ratio from AMAF, but I see none. Of course it's also possible I still have a different kind of bug.

What happens in my 'formula' is that when it never expands beyond the first level, which is what happens if the number of simulations is equal to the number of simulations before expansion, the virtual value becomes completely determined by nr-virtual-wins / nr-virtual-playouts, making it equivalent to the original refbot. In case it does expand further and creates a tree, the actual win-loss ratio is weighed twice as heavily as the virtual win-loss ratio. That seemed like a reasonable first try. I have tried a few others, but usually got much the same results or much worse results.

Mark
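Mark's 'virtual value' transcribes directly; a sketch only, with hypothetical names (note the real results carry twice the weight of the virtual ones, as he says):

```java
public class VirtualValue {
    static final double EXPLORATION = Math.sqrt(0.2);

    // exploration-factor * UCT
    //   + (nr-wins*2 + nr-virtual-wins) / (nr-playouts*2 + nr-virtual-playouts)
    static double virtualValue(int wins, int playouts,
                               int virtualWins, int virtualPlayouts,
                               int parentPlayouts) {
        double uct = Math.sqrt(Math.log(parentPlayouts) / (playouts + 1));
        double ratio = (wins * 2.0 + virtualWins)
                     / (playouts * 2.0 + virtualPlayouts);
        return EXPLORATION * uct + ratio;
    }
}
```

With playouts = 0 the ratio reduces to nr-virtual-wins / nr-virtual-playouts, which is exactly the refbot-equivalent case Mark describes when the tree never expands past the first level.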
RE: [computer-go] Monte-Carlo Tree Search reference bot
So, you use AMAF for simulating the first UCT evaluations? I thought the classical way to use AMAF was to affect only the win/lose-ratio portion of the UCT equation. Obviously it should be allowed to use an arbitrarily large number of AMAF simulations, accumulating them longer than it takes to expand a node. I think the classical way to affect the win ratio is to decrease the effect of the AMAF correction as the number of simulations grows. If you test with a very low number of simulations (in the 1,000-3,000 range), I think you should be able to get a very nice improvement out of the AMAF version. If you don't, I would think that something is wrong somewhere. What test process do you use for this version?

From: [EMAIL PROTECTED]
Date: Thu, 27 Nov 2008 18:15:34 -0200
To: computer-go@computer-go.org
Subject: [computer-go] Monte-Carlo Tree Search reference bot

> I have added a MCTS implementation to my reference-bot project. It allows you to specify how many nodes to search, after how many nodes to expand, and whether to use AMAF or not. If you specify the number of nodes before expansion to be the same as the total number of nodes to search and set AMAF to true, you get the equivalent of Don's original ref-bot. When you set AMAF to false and set the number of nodes before expansion to 1, you get my original UCT search algorithm. Between those extremes there might be a good formula to combine AMAF with tree search, but unfortunately I haven't had time lately to look for one. The few small attempts I made show no benefit using AMAF in tree search, only when used on a single level. The contrast between the two extremes is very stark, so I'm actually convinced there should be a way to use both.
>
> This implementation is also quite a bit slower than my original search algorithm, but I also didn't have time yet to trace it. It might simply be due to the different expansion method, which is much more expensive with a value of 1. Also, searching for the best UCT node gets more expensive with more (unused) nodes on each level. Using a higher expansion value may alleviate the performance hit. Anyway, I think this is a reasonable starting point.
>
> At first I intended to create a different project for the search reference bot. But half of the code (the MC part) is the same, and I don't want to end up having to maintain the same code in two places. I also didn't want to separate out some code into a separate library, making the structure of the simple ref-bot more complicated. This organization may need some more thought though. I'll update the project pages tomorrow.
>
> Mark
RE: [computer-go] Monte carlo play
Hello Tony,

I'm just speaking for myself here, but I suspect you may not get the answers you are looking for, because you're not asking the right questions. I get the impression that the situation is similar to this scenario: a young guy with a surfboard and a Hawaiian shirt turns up at base camp on Mount Everest and says: Hey dudes, what's up! This morning I had this wonderful idea: I wanna climb Mount Everest! So here I am, and I even brought my hiking shoes! But can anyone give me directions to the nearest McDonald's, cuz I'm starving. Everybody in the tent stares at him speechlessly.

I may be wrong, but that's my impression. I'm not even sure whether I myself should be here at base camp, even though I am a 4d in Go and even though I have 25 years of programming experience in a couple of different languages (including the last 8 years as a full-time professional developer). It's only logical that ever since I learned Go 20 years ago I've had a special interest in computer Go, and I have spent some time over the years attempting to build a strong Go-playing program myself (although not nearly as much time as some others here). I'm a mountaineer, but climbing Mount Everest is something else. I feel a bit like a rookie among all the seasoned mountaineers hanging around here at the Everest base camp. Most of the time I keep my mouth shut when one of them is talking, because I'm a bit apprehensive that I might say something that is utterly obvious to the others, or even outright stupid. And then this guy turns up in his Hawaiian shirt.

I hope you don't feel offended. Indeed you took up a wonderful endeavour, but I sense that you're not quite ready to go for the summit today.
Best Regards,
Dave

Van: [EMAIL PROTECTED] namens tony tang
Verzonden: do 13-11-2008 22:03
Aan: go mailing list
Onderwerp: RE: [computer-go] Monte carlo play

Thanks Darren,

After reading a couple of interesting papers in this field, I am applying pattern matching to limit the number of possibilities in the Monte-Carlo tree (on 19x19 it appears to take a day to play). I was wondering if you could comment on whether my following concept is right regarding the evaluation part. The possible patterns for the current situation are passed to a class where self-play begins. Each different pattern runs in an instance, and at the end the results are written into a file. The game tree will then be constructed using the stats from the file.

Thanks again,
Tony

Date: Wed, 12 Nov 2008 10:34:21 +0900
From: [EMAIL PROTECTED]
To: computer-go@computer-go.org
Subject: Re: [computer-go] Monte carlo play

> I am new to programming go, could someone explain to me how a Monte-Carlo based evaluation manages to play random games by itself? i.e.: who/what is the opponent which supplies the opposing moves, which allows another move to be randomly played after making the initial

It is self-play, so both players are random. I'm not aware of any programs that use different playout strategies for black/white, though it has been briefly mentioned before on this list, and I personally think it might be interesting as a way of discovering and more accurately evaluating the imbalances present in most go positions (i.e. usually one side has more strength and one side has more territory, and so to correctly understand the position one side has to play aggressive moves and the other has to play defensive moves).

> move at the root. I am implementing in java, is there a package/framework which allows me to train my software.
This page, recently started by Eric Marchand, might be a good starting place: http://ricoh51.free.fr/go/engineeng.htm

Darren

--
Darren Cook, Software Researcher/Developer
http://dcook.org/mlsn/ (English-Japanese-German-Chinese-Arabic open source dictionary/semantic network)
http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)
Re: [computer-go] Monte carlo play
In message [EMAIL PROTECTED], [EMAIL PROTECTED] writes:
> Hawaiian shirt analogy snipped
> I hope you don't feel offended. Indeed you took up a wonderful endeavour, but I sense that you're not quite ready to go for the summit today.

But Tony never expressed any interest in going for the summit. Maybe he just wants to write a simple MC implementation. Here is what he wrote:

> I am new to programming go, could someone explain to me how a monte carlo based evaluation manages to play random games by itself? ie: who/what is the opponent which supplies the opposing moves which allows another move to be randomly played after making the initial move at the root. I am implementing in java, is there a package/framework which allows me to train my software.

I will try to answer these questions myself.

> who/what is the opponent which supplies the opposing moves

It is the same piece of your program that supplies your own moves. For each random game, or rollout, your program places both black and white stones, alternately, at random.

> is there a package/framework which allows me to train my software.

Nothing trains your software. This isn't a neural net. The way it works is:

A: you write a program; it plays, badly.
You modify it so it plays differently and, with luck, better.
Repeat from A.

Nick
--
Nick Wedd    [EMAIL PROTECTED]
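Nick's point that the same loop supplies both sides' moves can be shown with a toy rollout. To be clear, this is NOT legal Go (captures, ko, suicide and eye rules are all omitted, and the board is a bare array); it only illustrates the self-play mechanics of one random game.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class RandomRollout {
    // Plays one random "game" on a board of the given number of points:
    // the SAME loop places stones for both players, flipping the color
    // each turn until no empty point remains. 0 = empty, 1 = black,
    // 2 = white. Purely illustrative; real playouts need Go's rules.
    static int[] rollout(int points, Random rnd) {
        int[] board = new int[points];
        List<Integer> empty = new ArrayList<>();
        for (int i = 0; i < points; i++) empty.add(i);

        int color = 1;                              // black moves first
        while (!empty.isEmpty()) {
            int pick = rnd.nextInt(empty.size());   // uniform random move
            board[empty.remove(pick)] = color;      // one program, both sides
            color = 3 - color;                      // 1 <-> 2: switch player
        }
        return board;
    }
}
```

There is no opponent anywhere: the "opponent" is just the same `rollout` loop on its even-numbered turns.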
RE: [computer-go] Monte carlo play
You are absolutely right. Tony did not imply he wanted to go for the summit today. Also, Tony's qualifications are surely a lot better than a Hawaiian shirt and a surfboard. My analogy was an exaggeration with the intention of explaining to Tony that he has more catching up to do than he might have expected. By no means was it my intention to put Tony down. I hope Tony understands this and I sincerely hope that Tony won't give up on joining us in this wonderful endeavour, because we need all the help we can get.

Best Regards,
Dave

From: [EMAIL PROTECTED] on behalf of Nick Wedd
Sent: za 15-11-2008 15:48
To: computer-go
Subject: Re: [computer-go] Monte carlo play

> Nick's reply quoted in full snipped
RE: [computer-go] Monte carlo play
Just a quickstart here, and I'm not one of the real experts, so it may contain inaccuracies. Perhaps you already know all this, perhaps not.

The basis of many game-playing programs (like chess programs) is the minimax (or negamax) algorithm. It is a simple algorithm by itself and you can find a lot of background and examples for chess or other games if you google it. Basically, minimax explores all possible variations stemming from the current position and then selects the best move. This algorithm works in principle, but there is a problem with it: for interesting games like chess and Go, it will usually take the program billions of years to find the best move, given a game position.

In the past decades several adaptations of the minimax algorithm were invented to enable the algorithm to find a reasonably good move in a timespan that humans are prepared to wait for (like evaluation functions, alpha-beta pruning and transposition tables). The program keeps investigating possibilities for some time and then the move seeming best so far is played. The longer the program is allowed to think (and the faster the computer), the better the move it will come up with.

These adaptations work by reducing the number of possibilities the program explores, without weakening the program too much. The program must ignore many possibilities to save time (this is called pruning), but the more possibilities it ignores, the higher the risk that it accidentally ignores the best move. Also, there is always the danger of the horizon effect: the program misinterprets the situation because it simply doesn't get enough time to fully investigate every possibility. The challenge for the programmer is to strike an optimal balance between time consumption and losing by accidentally ignoring or misinterpreting the best move. Some game programs are even able to fine-tune themselves to optimize this balance over time (learning).
A combination of these adaptations has proven to be good enough to enable chess programs on current hardware to beat any human with reasonable thinking time. But computer Go is a harder problem than computer chess. The adaptations that are successful in computer chess are just not enough for computer Go, mainly because there are far more possibilities (and I mean FAR more) to explore in Go.

But now there is a recent adaptation that has proven to be fairly successful in computer Go: Monte Carlo playouts. The program reduces the work to be done by gambling a number of times and measuring the average payoff. If the average payoff is good, the move is assumed to be good for the time being. Also, it doesn't automatically ignore moves with a low payoff (remember the risk of ignoring the best move). The program just postpones further investigation of seemingly bad moves until the moves seeming good initially give more and more disappointing payoffs after further investigation. Although this invention has had only moderate success in computer chess, it has proven to be fairly successful in computer Go. But at the heart of a Monte Carlo Go program, you can still recognize the minimax algorithm.

Best Regards,
Dave

From: [EMAIL PROTECTED] on behalf of tony tang
Sent: wo 12-11-2008 2:23
To: computer-go@computer-go.org
Subject: [computer-go] Monte carlo play

> Greetings all Go gurus, I am new to programming Go, could someone explain to me how a Monte Carlo based evaluation manages to play random games by itself? I.e.: who/what is the opponent which supplies the opposing moves which allows another move to be randomly played after making the initial move at the root. I am implementing in Java, is there a package/framework which allows me to train my software. Thank you. Kind regards, Tony
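Dave's "measure the average payoff, but never permanently ignore a low-payoff move" description is essentially a multi-armed bandit; one common concrete form is the UCB1 rule used in UCT-style programs. A sketch (the UCB1 swap-in and all names are my illustration, not something from this thread):

```python
import math
import random

def ucb1(stats, total_playouts, c=1.4):
    """Pick the move with the best optimism-adjusted average payoff."""
    def score(move):
        wins, visits = stats[move]
        if visits == 0:
            return math.inf              # try every move at least once
        # average payoff + an exploration bonus that shrinks with visits
        return wins / visits + c * math.sqrt(math.log(total_playouts) / visits)
    return max(stats, key=score)

def choose_move(moves, playout, n_playouts=1000, rng=None):
    """Steer playouts with UCB1; return the most-visited move."""
    rng = rng or random.Random(0)
    stats = {m: [0, 0] for m in moves}   # move -> [wins, visits]
    for t in range(1, n_playouts + 1):
        m = ucb1(stats, t)
        stats[m][0] += playout(m, rng)   # one playout result: 1 = win, 0 = loss
        stats[m][1] += 1
    return max(stats, key=lambda m: stats[m][1])

# Toy stand-in for a real playout: each move wins with a hidden probability.
p_win = {"a": 0.1, "b": 0.9, "c": 0.3}
playout = lambda m, rng: 1 if rng.random() < p_win[m] else 0
best = choose_move(list(p_win), playout)
```

With gaps this large, `best` settles on "b", but the low-payoff moves "a" and "c" keep receiving occasional visits — exactly the "postpone, don't ignore" behaviour described above.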
Re: [computer-go] Monte carlo play
> I am new to programming Go, could someone explain to me how a Monte Carlo based evaluation manages to play random games by itself? I.e.: who/what is the opponent which supplies the opposing moves which allows another move to be randomly played after making the initial

It is self-play, so both players are random. I'm not aware of any programs that use different playout strategies for black/white, though it has been briefly mentioned before on this list, and I personally think it might be interesting as a way of discovering and more accurately evaluating the imbalances present in most Go positions (i.e. usually one side has more strength and one side has more territory, and so to correctly understand the position one side has to play aggressive moves and the other has to play defensive moves).

> move at the root. I am implementing in Java, is there a package/framework which allows me to train my software.

This page, recently started by Eric Marchand, might be a good starting place: http://ricoh51.free.fr/go/engineeng.htm

Darren
--
Darren Cook, Software Researcher/Developer
http://dcook.org/mlsn/ (English-Japanese-German-Chinese-Arabic open source dictionary/semantic network)
http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)
Re: [computer-go] Monte-Carlo and Japanese rules
Ingo Althöfer wrote:
> Hello all, two questions. (i) Do there exist strong 9x9 Go programs on Monte-Carlo base for Japanese rules? (ii) Having available only programs for Chinese rules, but playing in a tournament with Japanese rules, which special tricks and settings should be used to maximise winning chances? (This is meant especially in the light of MC's tendency to win games by 0.5 points according to the rules implemented.) Ingo.

Hi Ingo,

The standard trick is to pass as soon as your opponent passes. For this to work, you need to take a one-point security margin with the komi. You should also score seki the Japanese way. That is how Crazy Stone played the UEC Cup and the matches at FIT2008. In fact, for the UEC Cup, Crazy Stone did not understand seki, so I took a bigger security margin with the komi. But a bigger security margin is not good for 9x9.

Rémi
Re: [computer-go] Monte-Carlo and Japanese rules
On Thu, 2008-11-06 at 09:19 +0100, Ingo Althöfer wrote:
> Hello all, two questions. (i) Do there exist strong 9x9 Go programs on Monte-Carlo base for Japanese rules? (ii) Having available only programs for Chinese rules, but playing in a tournament with Japanese rules, which special tricks and settings should be used to maximise winning chances? (This is meant especially in the light of MC's tendency to win games by 0.5 points according to the rules implemented.) Ingo.

I've thought about those questions myself from time to time. Let me think out loud concerning this. I am by no means an expert in Japanese scoring or even Go in general, so I'm just giving some thoughts here and a plan for building a simple Japanese bot that you can be free to criticize.

It seems to me the primary difference between the two is knowing when to stop playing and of course scoring dead groups. The Chinese-style bots do not technically need to know about scoring. You can look at the combined statistics at the end of the games for a given point to get a sense of whether that point is still in play or whether it's a foregone conclusion. You can do the same to determine dead groups. I don't know how well that works in all cases, but I have used it and it works pretty well. But we also want to recognize dame, and not play to dame points early in the game even if it doesn't affect the final Chinese outcome. So here is my idea:

1. If ALL the stones of a particular group belong to the opponent with high certainty, they are dead.

2. If there are open spaces that belong to you or the opponent with high certainty, don't move to them.

3. If an uncertain point is touching stones of both colors and both colors have high certainty for the color they belong to, it is probably dame and you shouldn't move to them.

   Example: White has a stone on d4 that is clearly alive. Black has a stone on f4 that is clearly alive. An empty point on e4 is highly uncertain. Do not play to e4 - it is probably dame.

   Question: Is that a reasonably good rule or does it need some work?

4. If you have no moves other than these cases, you should pass.

You can test this idea by playing a bot on KGS under Japanese rules. You may have to tweak what you consider your uncertainty margin. Also, I'm not considering seki here but we would want to find a way to cope with that.

- Don
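Don's rules 1-4 can be sketched directly from a per-point ownership map (the fraction of playouts in which a point ends up black). This is only an illustration of the heuristic; the thresholds, names, and board representation are made up:

```python
EMPTY, BLACK, WHITE = ".", "X", "O"

def classify(p, board, ownership, neighbors, high=0.9, low=0.1):
    """ownership[p]: fraction of playouts in which point p ends up black."""
    o = ownership[p]
    stone = board[p]
    if stone != EMPTY:
        # Rule 1: a stone on a point the opponent almost surely owns is dead.
        if (stone == BLACK and o < low) or (stone == WHITE and o > high):
            return "dead"
        return "alive"
    # Rule 2: settled empty points are territory; don't play there.
    if o > high or o < low:
        return "territory"
    # Rule 3: an uncertain empty point touching confidently-alive stones of
    # both colors is probably dame.
    touches_black = any(board[n] == BLACK and ownership[n] > high
                        for n in neighbors(p))
    touches_white = any(board[n] == WHITE and ownership[n] < low
                        for n in neighbors(p))
    if touches_black and touches_white:
        return "dame"
    return "undecided"   # Rule 4: if nothing else is playable, pass

# Don's d4/e4/f4 example, as a three-point strip:
board = {"d4": BLACK, "e4": EMPTY, "f4": WHITE}
ownership = {"d4": 0.97, "e4": 0.5, "f4": 0.03}
nbrs = lambda p: {"d4": ["e4"], "e4": ["d4", "f4"], "f4": ["e4"]}[p]
label = classify("e4", board, ownership, nbrs)   # "dame"
```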
Re: [computer-go] Monte-Carlo and Japanese rules
I think simplistic handling of Japanese rules should play dame points that connect chains. This avoids some problems that can arise where ownership probability drops after the opponent plays the dame, and a point of territory must get filled. Even if not technically required, I can imagine bots acting like beginners and getting nervous over imagined vulnerabilities.

Sent from my iPhone

On Nov 6, 2008, at 9:12 AM, Don Dailey [EMAIL PROTECTED] wrote:
> Don's message quoted in full snipped
Re: [computer-go] Monte-Carlo and Japanese rules
On Thu, 2008-11-06 at 10:44 -0500, Jason House wrote:
> I think simplistic handling of Japanese rules should play dame points that connect chains. This avoids some problems that can arise where ownership probability drops after the opponent plays the dame, and a point of territory must get filled.

I'm trying to think of a counter example where my rule will not work. Of course this has to be understood within the context of the playing strength of the bot itself, so we don't expect extremely sophisticated handling of special cases.

In my example, both black and white have clearly live stones and there is an empty point touching both of them. Assuming that the map is correct of course and both groups are alive, then I suppose a connecting move could cause a weaker group to live. So that suggests another rule - in addition to the original condition of at least one white and one black live group touching, there should be no connections to a group that is not alive. A group that is alive passes some threshold of certainty. Otherwise, it is not-alive but not dead. The threshold might be something like 90-95%, but some experimentation would be required to find a good value for this.

- Don

> Even if not technically required, I can imagine bots acting like beginners and getting nervous over imagined vulnerabilities.
Re: [computer-go] Monte-Carlo and Japanese rules
IIRC under official Japanese rules at the end of the game all groups with liberties shared between opposing colours are by definition in seki. Therefore eventually (before counting) all dame have to be filled. Further, playing dame points is almost equally bad under Chinese rules as it is under Japanese rules. So, if you have a strong 'Chinese' engine no special tricks are needed, at least until you reach the endgame. The only thing that is severely penalized under Japanese rules is playing needless defensive moves inside your own territory while the opponent is passing.

Erik

On Thu, Nov 6, 2008 at 4:44 PM, Jason House [EMAIL PROTECTED] wrote:
> Jason's and Don's messages quoted in full snipped
Re: [computer-go] Monte-Carlo and Japanese rules
Although what Don writes is all correct, I understood the question to be rather different. It's not a matter of being able to determine the right score at the end or the right way to play, it's a matter of determining the right score after each playout. For performance reasons MC programs will cut corners there, which could be taken advantage of when playing by Japanese rules, because after the playout the program is prone to getting the wrong score in certain situations.

Mark

On 6-nov-08, at 12:12, Don Dailey wrote:
> Don's message quoted in full snipped
Re: [computer-go] Monte-Carlo and Japanese rules
On Thu, 2008-11-06 at 17:10 +0100, Erik van der Werf wrote:
> IIRC under official Japanese rules at the end of the game all groups with liberties shared between opposing colours are by definition in seki. Therefore eventually (before counting) all dame have to be filled. Further, playing dame points is almost equally bad under Chinese rules as it is under Japanese rules. So, if you have a strong 'Chinese' engine no special tricks are needed at least until you reach the endgame. The only thing that is severely penalized under Japanese rules is playing needless defensive moves inside your own territory while the opponent is passing.

The dame rule I assumed was needed for an MC bot because of the way MC bots play badly when winning or losing. The idea was to focus on what fights were left on the board so that you don't lose due to a Japanese technicality. But it's now not clear to me that my rule would fix that. Maybe it should be left out. I'm also not sure how my rule would affect seki positions.

- Don
Re: [computer-go] Monte-Carlo and Japanese rules
On Nov 6, 2008, at 11:09 AM, Don Dailey [EMAIL PROTECTED] wrote:
> I'm trying to think of a counter example where my rule will not work. Of course this has to be understood within the context of the playing strength of the bot itself, so we don't expect extremely sophisticated handling of special cases.

Consider a miai to connect a chain with two liberties. It's easy to get the ownership correct in all heavy playouts. If one of the two miai points is dame, your rule would say not to play it. If the opponent plays there, the chain ends up in atari...

> rest of Don's message quoted in full snipped
RE: [computer-go] Monte-Carlo and Japanese rules
Many Faces of Go's Monte Carlo engine plays strongly using Japanese rules. It's required for sales in America and Japan (as AI Igo). I don't use Rémi's trick, since there are sometimes points remaining when your opponent passes when playing against weaker players.

David

-Original Message-
From: [EMAIL PROTECTED] [mailto:computer-go-[EMAIL PROTECTED]] On Behalf Of Rémi Coulom
Sent: Thursday, November 06, 2008 1:02 AM
To: computer-go
Subject: Re: [computer-go] Monte-Carlo and Japanese rules

> Rémi's message quoted in full snipped
RE: [computer-go] Monte-Carlo and Japanese rules
What if the playout uses the AGA rule of paying 1 point for a pass and requiring white to pass last (so the game does not end by two passes if black plays the second pass)? Wouldn't the score then be equivalent to the Japanese score?

Dave

From: [EMAIL PROTECTED] on behalf of Mark Boon
Sent: do 6-11-2008 17:11
To: computer-go
Subject: Re: [computer-go] Monte-Carlo and Japanese rules

> Mark's and Don's messages quoted in full snipped
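Dave's conjecture is the standard argument for AGA-style pass stones: if every pass hands the opponent a prisoner and white passes last (so both sides take the same number of turns), the area margin and the territory margin coincide. A small sanity check of the arithmetic (my own sketch, ignoring captures for simplicity):

```python
import random

def area_margin(b_terr, w_terr, b_stones, w_stones):
    """Chinese-style: territory plus stones on the board (black minus white)."""
    return (b_terr + b_stones) - (w_terr + w_stones)

def territory_margin(b_terr, w_terr, b_passes, w_passes):
    """Japanese-style count, with each pass handing the opponent a prisoner."""
    return (b_terr + w_passes) - (w_terr + b_passes)

rng = random.Random(0)
for _ in range(1000):
    b_stones = rng.randint(0, 50)
    b_passes = rng.randint(0, 10)
    w_stones = rng.randint(0, b_stones + b_passes)
    # White passes last, so both sides make the same number of turns:
    w_passes = (b_stones + b_passes) - w_stones
    b_terr, w_terr = rng.randint(0, 30), rng.randint(0, 30)
    assert (area_margin(b_terr, w_terr, b_stones, w_stones)
            == territory_margin(b_terr, w_terr, b_passes, w_passes))
```

The equality reduces to `b_stones + b_passes == w_stones + w_passes`, which is exactly what "alternation plus white passes last" guarantees; without the pass-stone payment the two counts can differ by a point per unanswered pass.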
RE: [computer-go] Monte-Carlo and Japanese rules
And of course black should pay 1 point for each extra handicap stone. http://www.britgo.org/rules/compare.html#coun Dave Van: [EMAIL PROTECTED] namens [EMAIL PROTECTED] Verzonden: do 6-11-2008 19:28 Aan: computer-go Onderwerp: RE: [computer-go] Monte-Carlo and Japanese rules What if the playout uses the AGA rule of paying 1 point for a pass and requiring white to pass last (so the game does not end by two passes if black plays the second pass). Wouldn't the score then be equivalent to the japanese score? Dave Van: [EMAIL PROTECTED] namens Mark Boon Verzonden: do 6-11-2008 17:11 Aan: computer-go Onderwerp: Re: [computer-go] Monte-Carlo and Japanese rules Although what Don writes is all correct, I understood the question to be rather different. It's not a matter of being able to determine the right score at the end or the right way to play, it's a matter of determining the right score after each playout. For performance reasons MC programs will cut corners there which could be taken advantage of when playing by Japanese rules because the after the playout it is prone to getting the wrong score in certain situations. Mark On 6-nov-08, at 12:12, Don Dailey wrote: On Thu, 2008-11-06 at 09:19 +0100, Ingo Althöfer wrote: Hello all, two questions. (i) Do there exist strong 9x9-go programs on Monte-Carlo base for Japanese rules? (ii) Having available only programs for Chinese rules, but playing in a tournament with Japanese rules, which special tricks and settings should be used to maximise winning chances? (This is meant especially in the light of MC's tendency to win games by 0.5 points according to the rules implemented.) I've thought about those questions myself from time to time. Let me think out loud concerning this. 
I am by no means an expert in Japanese scoring or even Go in general, so I'm just giving some thoughts here and a plan for building a simple Japanese bot that you are free to criticize.

It seems to me the primary difference between the two is knowing when to stop playing and, of course, scoring dead groups. The Chinese-style bots do not technically need to know about scoring. You can look at the combined statistics at the end of the games for a given point to get a sense of whether that point is still in play or whether it's a foregone conclusion. You can do the same to determine dead groups. I don't know how well that works in all cases, but I have used it and it works pretty well. But we also want to recognize dame, and not play to dame points early in the game even if it doesn't affect the final Chinese outcome. So here is my idea:

1. If ALL the stones of a particular group belong to the opponent with high certainty, they are dead.

2. If there are open spaces that belong to you or the opponent with high certainty, don't move to them.

3. If an uncertain point is touching stones of both colors, and both colors have high certainty for the color they belong to, it is probably dame and you shouldn't move to it.

example: White has a stone on d4 that is clearly alive. Black has a stone on f4 that is clearly alive. An empty point on e4 is highly uncertain. Do not play to e4 - it is probably dame.

question: Is that a reasonably good rule or does it need some work?

4. If you have no moves other than these cases, you should pass.

You can test this idea by playing a bot on KGS under Japanese rules. You may have to tweak what you consider your uncertainty margin. Also, I'm not considering seki here, but we would want to find a way to cope with that.

- Don

Ingo.
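As a sketch of how rules 1-3 might look in code, here is a minimal hypothetical version. It assumes per-point ownership statistics averaged over many playouts: `ownership[p]` is in [-1, +1], with +1 meaning the point ends up Black's in nearly every playout and -1 White's. All names, and the 0.9 margin, are illustrative, not from any real bot:

```python
THRESH = 0.9  # the "high certainty" margin you may have to tweak

def is_dead(group_points, ownership, my_sign):
    """Rule 1: my group is dead if every one of its points belongs to the
    opponent with high certainty (my_sign is +1 for Black, -1 for White)."""
    return all(ownership[p] * my_sign < -THRESH for p in group_points)

def is_settled_empty(p, ownership):
    """Rule 2: an empty point that already belongs to someone with high
    certainty; don't move there."""
    return abs(ownership[p]) > THRESH

def is_probable_dame(p, neighbors, board, ownership):
    """Rule 3: an uncertain empty point touching clearly-alive stones of
    both colors is probably dame (board[q] is +1/-1 for a stone, 0 empty)."""
    if abs(ownership[p]) > THRESH:
        return False  # not uncertain
    alive_colors = {board[q] for q in neighbors(p)
                    if board[q] != 0 and ownership[q] * board[q] > THRESH}
    return alive_colors == {+1, -1}
```

On the e4 example, the point sits between a settled white stone (ownership near -1) and a settled black stone (near +1) while its own ownership hovers near 0, so `is_probable_dame` returns True. Rule 4 then falls out naturally: if no legal move escapes these three filters, pass.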
___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
RE: [computer-go] Monte-Carlo and Japanese rules
On Thu, 2008-11-06 at 09:43 -0800, David Fotland wrote:

> Many Faces of Go's Monte Carlo engine plays strongly using Japanese rules.

So what do you do in the playouts? Do you score with area or territory? Does your program play optimally where different rules would result in a different winner?

-Jeff
Re: [computer-go] Monte-Carlo and Japanese rules
I'm sure he meant: does your program play optimally in trivial situations where different rules would result in a different winner? I'm not sure if your last answer also applies to that more specific question.

David Fotland wrote:

> I score with area, and adjust for Japanese rules. It doesn't play optimally under any rule set :)
>
> David

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:computer-go-[EMAIL PROTECTED]] On Behalf Of Jeff Nowakowski
Sent: Thursday, November 06, 2008 11:57 AM
To: computer-go
Subject: RE: [computer-go] Monte-Carlo and Japanese rules

On Thu, 2008-11-06 at 09:43 -0800, David Fotland wrote:

> Many Faces of Go's Monte Carlo engine plays strongly using Japanese rules.

So what do you do in the playouts? Do you score with area or territory? Does your program play optimally where different rules would result in a different winner?

-Jeff
Re: [computer-go] Monte-Carlo and Japanese rules
> (ii) Having available only programs for Chinese rules, but playing in a tournament with Japanese rules, which special tricks and settings should be used to maximise winning chances? (This is meant especially in the light of MC's tendency to win games by 0.5 points according to the rules implemented.)

The difference I've noticed most in analyzing strong 9x9 games (in Japanese rules, using a bot playing by Chinese rules) is a half-point ko at the end of the game. In Chinese rules, once all the dame have been filled in, if one side has no ko threats he has to pass, and the opponent fills the ko and gets an extra point. In Japanese rules passing does not cost anything. So the score ends up one point different.

In these games what tends to happen is the Chinese-rule Monte-Carlo bot, playing for its 0.5-point win, ends up choosing the wrong move 20 moves from the end of the game. If that is as clear as mud, let me know and I'll try to hunt up an example game.

Darren

--
Darren Cook, Software Researcher/Developer
http://dcook.org/mlsn/ (English-Japanese-German-Chinese-Arabic open source dictionary/semantic network)
http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)
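The one-point gap can be put in numbers (invented for illustration; no captures, komi ignored). With all dame filled and White out of ko threats, Black's ko fill gains a stone under area counting but nothing under territory counting:

```python
territory_black, territory_white = 6, 6  # made-up, otherwise-even position

# Japanese: White's forced pass is free, and Black's fill of the neutral
# ko point gains no territory.
japanese_margin = territory_black - territory_white

# Chinese: the same sequence ends with one extra Black stone on the
# board, and under area counting that stone is worth a point (the rest
# of the stone counts are assumed equal and cancel out).
extra_black_stone = 1
chinese_margin = (territory_black + extra_black_stone) - territory_white

assert chinese_margin - japanese_margin == 1  # the one-point rules gap
```

A bot tuned to win by 0.5 under one counting method can therefore be half a point short under the other, which is exactly the failure mode described above.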
Re: [computer-go] Monte Carlo evaluation in chess
Álvaro Begué wrote:

>> On Sun, Jul 20, 2008 at 3:40 AM, Rémi Coulom wrote:
>>> Rybka 3 has Monte-Carlo evaluation: http://www.chessbase.com/newsdetail.asp?newsid=4772
>
> If I understand the release note correctly, Monte Carlo Analysis is something like a feature of the GUI for analyzing a position and giving statistics back to the user, not part of the evaluation function used in the search.

Correct. They play a lot of very fast games from a position and look at the winning rate. It is similar to basic Monte Carlo eval in go, which implies that it also inherits its weaknesses. Now if they would add UCT... :)

I am sure it is not used in the actual playing engine.

--
GCP
Re: [computer-go] Monte Carlo evaluation in chess
On Sun, Jul 20, 2008 at 3:40 AM, Rémi Coulom wrote:

> Rybka 3 has Monte-Carlo evaluation: http://www.chessbase.com/newsdetail.asp?newsid=4772

If I understand the release note correctly, Monte Carlo Analysis is something like a feature of the GUI for analyzing a position and giving statistics back to the user, not part of the evaluation function used in the search.

Álvaro.
Re: [computer-go] Monte Carlo evaluation in chess
On Sun, Jul 20, 2008 at 10:10 AM, Jason House wrote:

> On Jul 20, 2008, at 9:10 AM, Álvaro Begué wrote:
>> On Sun, Jul 20, 2008 at 3:40 AM, Rémi Coulom wrote:
>>> Rybka 3 has Monte-Carlo evaluation: http://www.chessbase.com/newsdetail.asp?newsid=4772
>> If I understand the release note correctly, Monte Carlo Analysis is something like a feature of the GUI for analyzing a position and giving statistics back to the user, not part of the evaluation function used in the search.
>
> It also mentions a recent strength gain of 80 Elo. I would not be surprised if they added MC to influence the search.

Well, I would be. I was a chess programmer for years before I started toying with go, and I can't believe there is any chance of MC having anything to do with the improved strength.
Re: [computer-go] monte carlo transposition reuse (MCTR)
Does it defeat it based on the number of samples taken or time allotted per turn?

On 9/14/07, Jason House wrote:

> I know I'm only wading in the kiddie pool of computer go with my 1-ply bots, but I think I may have found a useful enhancement to Monte Carlo. HouseBot supports three 1-ply search modes:
>
> 1plyMC - Uniform sampling
> 1plyShuffle - Uniform sampling with Monte Carlo transposition reuse
> 1plyUCT - Non-uniform sampling based on the UCT algorithm (AKA UCB)
>
> Obviously, 1plyMC is far inferior to 1plyUCT, as everyone probably expects. What may surprise many is that 1plyShuffle defeats 1plyUCT nearly every time. I'm basing this on self-play data from CGOS. Currently, http://cgos.boardspace.net/9x9/cross/housebot-617-UCB.html shows 10 matches in which housebot-617-UCB (1plyUCT) played housebot-618-shuff; housebot-617-UCB lost every time.
>
> While tricky, it should be possible to combine UCT and MCTR for an even stronger bot. MCTR can be thought of as a low-bias alternative to the AMAF heuristic. Rather than using all moves, MCTR takes only the top N moves, where N is computed based on which moves were played in the random game. From an open board position MCTR uses about 1/3 of the moves that AMAF would. Computation of the resulting winning percentage must also be weighted based on the probabilities of duplicating results (roughly speaking, it's 1/N). As a result of using MCTR, winning rates are no longer integers as one would expect.
>
> Here are the estimated winning rates for all three algorithms when asked for a white response to black G3:
>
> 1plyMC: 781 / 1272
> 1plyShuffle: 140.15 / 231.75
> 1plyUCT: 936 / 1515
>
> 1plyShuffle is slower because of the extra work of information tracking, but the variance in its estimates should be far lower than the numbers would indicate. I have yet to do the computations, but a sample size of 231.75 may have an estimation error comparable to that of around 6000 normal MC runs for that position.
> That is why my implementation of MCTR is defeating my (1-ply) implementation of UCT.
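For readers unfamiliar with the 1plyUCT mode, here is a minimal sketch of 1-ply UCB1 move selection (names and the exploration constant are illustrative, not HouseBot's actual code):

```python
import math

C = 1.4  # exploration constant, a common default

def ucb1_pick(stats, total_sims):
    """stats maps move -> (wins, sims); return the move maximizing the
    UCB1 upper bound: wins/sims + C*sqrt(ln(total_sims)/sims)."""
    def bound(move):
        wins, sims = stats[move]
        if sims == 0:
            return float("inf")  # every move gets tried at least once
        return wins / sims + C * math.sqrt(math.log(total_sims) / sims)
    return max(stats, key=bound)
```

Each playout then starts from the move this picks. 1plyMC instead samples the first move uniformly, and 1plyShuffle keeps uniform sampling but credits each playout to every candidate first move in its shuffle list.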
Re: [computer-go] monte carlo transposition reuse (MCTR)
Time management is identical. Based on quick profiling, the MCTR version does about 1/6 of the simulations. Actually, since MCTR does extra tracking of info, it can reuse some old simulations, so it may be more like 1/4 or 1/5 of the simulations. It's just using the results more efficiently.

On 9/14/07, Chris Fant wrote:

> Does it defeat it based on the number of samples taken or time allotted per turn?
>
> On 9/14/07, Jason House wrote:
>
>> I know I'm only wading in the kiddie pool of computer go with my 1-ply bots, but I think I may have found a useful enhancement to Monte Carlo. HouseBot supports three 1-ply search modes:
>>
>> 1plyMC - Uniform sampling
>> 1plyShuffle - Uniform sampling with Monte Carlo transposition reuse
>> 1plyUCT - Non-uniform sampling based on the UCT algorithm (AKA UCB)
>>
>> Obviously, 1plyMC is far inferior to 1plyUCT, as everyone probably expects. What may surprise many is that 1plyShuffle defeats 1plyUCT nearly every time. I'm basing this on self-play data from CGOS. Currently, http://cgos.boardspace.net/9x9/cross/housebot-617-UCB.html shows 10 matches in which housebot-617-UCB (1plyUCT) played housebot-618-shuff; housebot-617-UCB lost every time.
>>
>> While tricky, it should be possible to combine UCT and MCTR for an even stronger bot. MCTR can be thought of as a low-bias alternative to the AMAF heuristic. Rather than using all moves, MCTR takes only the top N moves, where N is computed based on which moves were played in the random game. From an open board position MCTR uses about 1/3 of the moves that AMAF would. Computation of the resulting winning percentage must also be weighted based on the probabilities of duplicating results (roughly speaking, it's 1/N). As a result of using MCTR, winning rates are no longer integers as one would expect.
>> Here are the estimated winning rates for all three algorithms when asked for a white response to black G3:
>>
>> 1plyMC: 781 / 1272
>> 1plyShuffle: 140.15 / 231.75
>> 1plyUCT: 936 / 1515
>>
>> 1plyShuffle is slower because of the extra work of information tracking, but the variance in its estimates should be far lower than the numbers would indicate. I have yet to do the computations, but a sample size of 231.75 may have an estimation error comparable to that of around 6000 normal MC runs for that position. That is why my implementation of MCTR is defeating my (1-ply) implementation of UCT.
Re: [computer-go] monte carlo transposition reuse (MCTR)
Sure. I'm not sure how best to answer your question, so I'll respond with mostly pseudo code. I can answer questions on theory, but I expect that to go back and forth a bit, since it's typically tough to explain. Where there are grey areas in implementation choice, I have text describing it (including basic theory). For the real source code, the logic is in search.d of HouseBot's (open) source. HouseBot's home page is http://housebot.sourceforge.net

Slowdown over basic MC occurs in two ways:

1. Maintaining a list of moves whose order can be shuffled around.
2. Applying the results to all affected leaves in the search tree.
(3. In my current implementation, I may save too much info, and memory management may be another problem.)

Here's some pseudo code.

Main thinking loop:

  while ( can think ){
    reset board
    reset shuffle list
    shuffleMode = true
    foreach(legal move){
      play(move)
      if ( shuffleMode ){
        shuffleMode = transposition still true  // see below
        if (shuffleMode || first move)
          insert move into shuffle list
      }
      apply result  // see below
    }
  }

Transposition still true:

By transposition or shuffle, I mean that I can reorder the moves by each color and arrive at the same board position without violating any rules along the way - still representing a mostly random game. Obviously, any move that's illegal in the starting board position will violate this and breaks the move-shuffling property. Illegal would be stuff like suicide plays, taking a ko, or playing on top of another stone.

If you assume all captures break the move-shuffling property, life is simple, and strange suicides that occur as combinations of stones won't occur. If you're smart enough to find a way to handle captures efficiently (or ignore the issue like I did), then you must check to ensure that captured stones are not in the shuffle list. If they are, order matters and the latest move breaks the move-shuffling property.
My code is far different, but when I think about it, the simplest form would be: return not (capture on current board). I should probably try that one out... Allowing captures does have the benefit that the shuffle list gets longer, but the control logic gets more complex (simple kos, suicide in the original position, or a complex reordered combination of moves that creates suicides, etc...).

How to apply the result:

Essentially, everything in the shuffle list can consider this as a game that began with that move first. At first, I had fractional game = 1 in a naive attempt to test out my theory, and it failed horribly. The problem is that positions with long shuffle lists can be reached in many ways by many starting moves. Because of this, sequences with long shuffle lists got counted way too much. I had to make it so that regardless of the shuffle list length, a position would not be over-counted in the final average. To do this, you should compute the probability of the position being arrived at by all candidate methods and then do a weighted sum: prob(this method) / [ prob(this method) + prob(alternate method 1) + prob(alternate method 2) + ... ]. I made the naive assumption that prob(method a) == prob(method b). I think that's reasonable for uniform sampling. That value is used in the pseudo code below.

  foreach(random game){
    float fractional game = 1 / number of moves in shuffle list with the same color to move
    foreach(move in shuffle list with same color to move){
      num sims (move of interest) += fractional game
      if (random game is a win)
        num wins (move of interest) += fractional game
      // Note that variance of the estimate increases by (0.5 * fractional game)^2
    }
  }

On 9/14/07, Brian Slesinsky wrote:

> Can you explain a bit more about how 1plyShuffle works?
>
> On 9/14/07, Jason House wrote:
>
>> Time management is identical. Based on quick profiling, the MCTR version does about 1/6 of the simulations.
>> Actually, since MCTR does extra tracking of info, it can reuse some old simulations, so it may be more like 1/4 or 1/5 of the simulations. It's just using the results more efficiently.
>>
>> On 9/14/07, Chris Fant wrote:
>>
>>> Does it defeat it based on the number of samples taken or time allotted per turn?
>>>
>>> On 9/14/07, Jason House wrote:
>>>
>>>> I know I'm only wading in the kiddie pool of computer go with my 1-ply bots, but I think I may have found a useful enhancement to Monte Carlo. HouseBot supports three 1-ply search modes:
>>>>
>>>> 1plyMC - Uniform sampling
>>>> 1plyShuffle - Uniform sampling with Monte Carlo transposition reuse
>>>> 1plyUCT - Non-uniform sampling based on the UCT algorithm (AKA UCB)
>>>>
>>>> Obviously, 1plyMC is far inferior to 1plyUCT, as everyone probably expects. What may surprise many is that 1plyShuffle defeats 1plyUCT nearly every time. I'm basing this on self-play data from
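Here is a runnable sketch of the "apply result" step described above (all names are hypothetical; it assumes the shuffle list for a finished playout is already known, as a list of (color, move) pairs):

```python
def apply_result(shuffle_list, to_move, winner, stats):
    """Credit one playout to every move of the to-move color in its
    shuffle list, weighted by a fractional game of 1/k so that positions
    reachable through k different first moves are not over-counted."""
    candidates = [mv for color, mv in shuffle_list if color == to_move]
    if not candidates:
        return
    frac = 1.0 / len(candidates)  # the "fractional game"
    for mv in candidates:
        wins, sims = stats.get(mv, (0.0, 0.0))
        if winner == to_move:
            wins += frac
        stats[mv] = (wins, sims + frac)
```

Because sims accumulates in fractions of 1/k, non-integer totals like the 140.15 / 231.75 shown for 1plyShuffle fall out naturally.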
Re: [computer-go] monte carlo transposition reuse (MCTR)
On 9/14/07, Jason House wrote:

> // Note that variance of the estimate increases by (0.5 * fractional game)^2

I should say that the variance of wins increases by that amount. The estimate of the winning percentage will be computed as wins / sims. The variance for that is (variance of the wins variable) / sims^2.

In normal Monte Carlo, each sim causes the variance of wins to increase by 0.5^2, for a total of 0.5^2 * sims. The final result is a variance of 0.5^2 / sims. Of course, final computations typically use the standard deviation (the square root of the variance). This gives the classical result of 0.5 / sqrt(sims). For MCTR, the result is less for the same value of sims.
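The classical figure is easy to check numerically. This throwaway simulation (not from the original thread) estimates the standard deviation of wins/sims for a fair coin and compares it to 0.5/sqrt(sims):

```python
import math, random

random.seed(0)
sims, trials = 400, 2000
# Each trial is a plain MC estimate of a 50% winrate from `sims` playouts.
estimates = [sum(random.random() < 0.5 for _ in range(sims)) / sims
             for _ in range(trials)]
mean = sum(estimates) / trials
sd = math.sqrt(sum((e - mean) ** 2 for e in estimates) / trials)
predicted = 0.5 / math.sqrt(sims)  # 0.025 for sims = 400
```

The empirical sd lands close to the predicted 0.025, confirming the 0.5/sqrt(sims) scaling.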
Re: [computer-go] Monte-Carlo Go simulation
On 9 Feb 2007, at 4:40 AM, Sylvain Gelly wrote:

>> Alain's point, that knowledge can both help narrow the search to good moves and at the same time steer you away from the best move, is absolutely true in SlugGo's case.
>
> I completely agree with that. However, can we agree that we want a player that is better as a whole, and not only better in some particular positions? So perhaps, I think, being far from the best move while always playing good moves is already good, no?
>
> Sylvain

Absolutely. I notice that when SlugGo makes moves that a professional said look quite playable, a huge mistake is going to happen very soon. Making the best move from the point of view of a really strong player is also NOT what I want SlugGo to do. SlugGo has no concept of what the implications are and what the required followup moves will be. It is clearly better for SlugGo to make many 90% moves in a row than to have it try to make any 100% moves. There is a much better chance of it finding the correct followup to slightly lower quality moves.

Cheers,
David
Re: [computer-go] Monte-Carlo Go simulation
> Alain's point, that knowledge can both help narrow the search to good moves and at the same time steer you away from the best move, is absolutely true in SlugGo's case.

I completely agree with that. However, can we agree that we want a player that is better as a whole, and not only better in some particular positions? So perhaps, I think, being far from the best move while always playing good moves is already good, no?

Sylvain
Re: [computer-go] Monte-Carlo Go simulation
On Thursday, 8 February 2007 at 22:09, Sylvain Gelly wrote:

It seems I was ambiguous: I was speaking of the simulation player too. What I meant is that a random simulation player is not biased, whereas a better simulation player is biased by its knowledge, and thus can give a wrong evaluation of a position.

I think we have to start by defining the bias. For me the bias is the difference between the expected value of the outcomes of playouts by the simulation player and the real minimax value. In this definition the uniform random simulation player is VERY biased and GNU Go much less.

OK, but I used bias in the common sense, to mean that the strong simulator has preferences for some moves and doesn't consider them all equally, or worse, doesn't consider some moves at all. So it will miss some good points due to its knowledge, whereas the random player will find the move.

A trivial example is GNU Go: its analysis is sometimes wrong. Of course, if not, computer go would be solved :-). Even though it is obviously much stronger than a random player, it would give wrong results if used as a simulation player.

Hum, are you sure?

I'm 100% sure of this :-)

I think that GNU Go with randomisation (and much faster, of course) would make a very good simulation player (much better than any existing simulation player).

Even with randomization, GNU Go considers only a few dozen possible moves, and makes systematic errors. Some time ago Rémi Coulom asked for positions illustrating computer stupidity (2006-11-22) http://computer-go.org/pipermail/computer-go/2006-November/007107.html and GNU Go provided some nice examples where its (wrong/misunderstood) knowledge induces a failure in play. One very impressive one was GNU Go 3.6 not invading where obviously it is possible to invade (Steven Clark, 2006-11-27) http://computer-go.org/pipermail/computer-go/2006-November/007184.html

But a weaker player than GNU Go can make an even better simulation player.

yes.
David Doshay's experiments with SlugGo showed that searching very deep/wide does not improve the strength of the engine a lot, which is bound by the underlying weaknesses of GNU Go.

Yes, this is a similar non-trivial result. I think there are more existing experimental and theoretical analyses of this, though. Perhaps such an analysis already exists for MC also; it is just that I don't know of it.

Or maybe I just understood nothing of what you explained ;)

It was not really explanations, just thoughts. I don't have the solution; I just think that it is an interesting question, and that it may be discussed. Maybe from a strong explanation of this phenomenon could come new ideas.

I understand all these counter-examples, I just think that it is more complicated than that.

Sylvain

I fully agree.

Alain
Re: [computer-go] Monte-Carlo Go Misnomer?
> Hello, is there any known (by theory or tests) function of how much an increase in the strength of the simulation policy increases the strength of the MC/UCT program as a whole?

I think that is a very interesting question. In our work on MoGo we found that there could be a decrease of the strength of the MC/UCT program while using a stronger simulation policy. That is why in MoGo it is more the "sequence" idea than the "strength" idea. Our best simulation policy is quite weak compared to others we tested.

But we have further experiments, in work with David Silver from the University of Alberta. We found out that the relation "strong simulation policy = strong MC program" is wrong at a much larger scale. So the intransitivity is true even with much, much stronger simulation policies. Of course there is the simple counter-example of a deterministic player, but our results hold even if we randomise the much stronger policy (in many ways, tuning the parameters as best we can).

I have some theory about this phenomenon in general, but it is not polished enough for the moment. I really think that understanding this experimental evidence deeply, deeper than some intuition, would help going further. But maybe some already did.

Sylvain
Re: [computer-go] Monte-Carlo Go Misnomer?
> In our work on MoGo we found that there could be a decrease of the strength of the MC/UCT program while using a stronger simulation policy. That is why in MoGo it is more the "sequence" idea than the "strength" idea. Our best simulation policy is quite weak compared to others we tested.

Could you elaborate on the sequence idea? Or point me to somewhere where you have already done so?
Re: [computer-go] Monte-Carlo Go Misnomer?
> Could you elaborate on the sequence idea? Or point me to somewhere where you have already done so?

Sure. We already describe it in our research report, which you can find here: http://hal.inria.fr/inria-00117266 or here, if you prefer the web interface in English but with a longer URL :-) http://hal.inria.fr/index.php?halsid=f72d067037f3b8f161448b400d117210&view_this_doc=inria-00117266&version=3

Sylvain
Re: [computer-go] Monte-Carlo Go simulation
One simple explanation could be that a random player shamelessly tries all moves (very bad ones but also very nice tesuji), whereas the stronger player is restricted by its knowledge and will always miss some kinds of moves. Here we are not speaking about the pruning in the tree, but about the simulation player.

The tree must explore every move, to avoid missing important ones. However, we totally don't care whether or not all possible games can be played by the simulation player. What we care about is the expectation of the wins by self-play. If the simulation player sometimes plays meaningful sequences, but with a very small probability, then it has very little influence on the expectation. It seems to me that the explanation may be more complicated.

Sylvain
Re: [computer-go] Monte-Carlo Go simulation
It seems I was ambiguous: I was speaking of the simulation player too. What I meant is that a random simulation player is not biased, whereas a better simulation player is biased by its knowledge, and thus can give a wrong evaluation of a position.

I think we have to start by defining the bias. For me the bias is the difference between the expected value of the outcomes of playouts by the simulation player and the real minimax value. In this definition the uniform random simulation player is VERY biased and GNU Go much less.

A trivial example is GNU Go: its analysis is sometimes wrong. Of course, if not, computer go would be solved :-). Even though it is obviously much stronger than a random player, it would give wrong results if used as a simulation player.

Hum, are you sure? I think that GNU Go with randomisation (and much faster, of course) would make a very good simulation player (much better than any existing simulation player). But a weaker player than GNU Go can make an even better simulation player.

David Doshay's experiments with SlugGo showed that searching very deep/wide does not improve the strength of the engine a lot, which is bound by the underlying weaknesses of GNU Go.

Yes, this is a similar non-trivial result. I think there are more existing experimental and theoretical analyses of this, though. Perhaps such an analysis already exists for MC also; it is just that I don't know of it.

Or maybe I just understood nothing of what you explained ;)

It was not really explanations, just thoughts. I don't have the solution; I just think that it is an interesting question, and that it may be discussed. Maybe from a strong explanation of this phenomenon could come new ideas.

I understand all these counter-examples, I just think that it is more complicated than that.

Sylvain