Re: [computer-go] How to design the stronger playout policy?
Yamato wrote:
I guess the current top programs have a much better playout policy than the classical MoGo-style one. The original policy of MoGo was:
(1) If the last move is an atari, play one saving move at random.
(2) If there are interesting moves among the 8 positions around the last move, play one at random.
(3) If there are moves capturing stones, play one at random.
(4) Play one random move on the board.
I (and maybe many others) use it with some improvements, but it will not be enough to catch up with the top programs.

What improvements did you try? The obvious ones I know of are prioritizing saving and capturing moves by the size of the string. Zen appears quite strong on CGOS. Leela using the above system was certainly weaker.

Yamato wrote:
Then I have tested a lot of changes to the probability distributions, but it was very hard to improve the strength. Any comments?

I had the same problem, i.e. it seems almost impossible to improve on the MoGo system by having a different pattern set for interesting moves, or even by varying the probability of interesting moves by pattern score. I tried 2 things:

a) I extracted about 5000 positions with a known winner (determined by UCT) from CGOS games, and measured the Mean Square Error of the result of my playouts against the known result (also described in one of the MoGo papers). Then I applied a genetic algorithm to optimize the playout patterns. This worked, in the sense that the MSE measured over the 5000 positions dropped. However, it did not produce a stronger program! I found that somewhat shocking. It makes me doubt the value of the MSE measure.

b) I made a simple genetic algorithm that makes a random pool of a few hundred playout policies, picks 2 random parents, crossovers/mutates them into 2 children, plays a 10-game match between the 2 children with simulations = 100, and then keeps the winner. This did not produce anything interesting either. My best guess is that the match results are simply too random.
So I did not find any way to automatically optimize the patterns. I finally improved my playouts by using Remi's ELO system to learn a set of interesting patterns, and by just randomly fiddling with the probabilities (compressing/expanding) until something improved my program in self-play by about +25%. Not a very satisfying method or an exceptional result. There could be some other magic combination that is even better, or maybe not.

I now got some improvement by merging (1), (2) and (3) in the MoGo system and using probabilities on that. It makes sense because the playouts won't try hopeless saving moves, for example.

What is so frustrating is that the playouts are essentially black magic. I know of no way to automatically determine what is good and what is not besides playing about 500 games between 2 strategies. The results are very often completely counterintuitive. There is no systematic way to improve.

-- GCP

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
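The merging of move classes (1), (2) and (3) under one probability distribution can be sketched in a few lines. This is an illustrative sketch only, not GCP's actual code: the Board methods (saving_moves, pattern_moves_near, capture_moves, legal_moves) and the per-class weights are hypothetical names.

```python
import random

def merged_policy(board, last_move, weights, rng=random):
    # Instead of the original MoGo cascade (try saving moves first, then
    # pattern moves, then captures), pool all three classes and sample from
    # one weighted distribution, so a hopeless saving move can lose out to
    # a strong pattern or capture move.
    pool = []
    for w, moves in ((weights['save'], board.saving_moves(last_move)),
                     (weights['pattern'], board.pattern_moves_near(last_move)),
                     (weights['capture'], board.capture_moves())):
        pool.extend((m, w) for m in moves)
    if not pool:
        # No tactical or pattern candidates: fall back to a uniform random move.
        return rng.choice(board.legal_moves())
    moves, ws = zip(*pool)
    return rng.choices(moves, weights=ws, k=1)[0]
```

In a real program the weights would come from something like Remi's ELO/MM fitting rather than being hand-set per class.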
[computer-go] update on networking from phils with new CGOS configuration
Strangely enough, today the network from the Philippines to boardspace.net is so bad that even Don's new configuration doesn't make up for the additional network lag (though I think it alleviates the bot's pain). With 0.2 seconds of thinking time, the net time deducted from time_remaining is still decreasing by an average of 1 second per move.

I ran this same bot from a friend's server at the same configuration, 0.2 sec/move, and the bot always still has 300 seconds remaining after every move [a 0.7 sec/move bot ALSO always still had 300 seconds], so Don's change works fine in CGOS in general; the problem is the network from the Philippines.

Strangely, my ping from the Philippines to boardspace.net at the time of this testing is 700 ms, BUT my ping from the Philippines to mit.edu is only 350 ms. So I thought maybe boardspace.net has a network problem. I tested from a server in the US: ping to boardspace.net - 20 ms; ping to mit.edu - 12 ms. I definitely cannot understand those results.

For the present time, it appears I may have to throw in the towel on getting dependable results running the bots from my laptop in the Philippines. Raw output of ping and the cgos script follows.

Peter

PS I did notice Fat1800 resign a lost game against my bot from my laptop in the Philippines, thanks ;-) Now, if my bot could only achieve winning positions against Fat1800 on a regular basis ;)

-- PING FROM PHILIPPINES
0 [EMAIL PROTECTED] ping mit.edu
PING mit.edu (18.7.22.69) 56(84) bytes of data.
64 bytes from WEB.MIT.EDU (18.7.22.69): icmp_seq=1 ttl=238 time=354 ms
64 bytes from WEB.MIT.EDU (18.7.22.69): icmp_seq=2 ttl=238 time=354 ms
64 bytes from WEB.MIT.EDU (18.7.22.69): icmp_seq=3 ttl=238 time=363 ms
64 bytes from WEB.MIT.EDU (18.7.22.69): icmp_seq=4 ttl=238 time=371 ms
64 bytes from WEB.MIT.EDU (18.7.22.69): icmp_seq=5 ttl=238 time=346 ms
64 bytes from WEB.MIT.EDU (18.7.22.69): icmp_seq=6 ttl=238 time=355 ms
64 bytes from WEB.MIT.EDU (18.7.22.69): icmp_seq=7 ttl=238 time=330 ms
64 bytes from WEB.MIT.EDU (18.7.22.69): icmp_seq=8 ttl=238 time=338 ms
--- mit.edu ping statistics ---
8 packets transmitted, 8 received, 0% packet loss, time 7086ms
rtt min/avg/max/mdev = 330.218/351.896/371.914/12.419 ms

0 [EMAIL PROTECTED] ping 208.100.19.102 [boardspace.net]
PING 208.100.19.102 (208.100.19.102) 56(84) bytes of data.
64 bytes from 208.100.19.102: icmp_seq=1 ttl=49 time=615 ms
64 bytes from 208.100.19.102: icmp_seq=2 ttl=49 time=668 ms
64 bytes from 208.100.19.102: icmp_seq=3 ttl=49 time=761 ms
64 bytes from 208.100.19.102: icmp_seq=4 ttl=49 time=651 ms
64 bytes from 208.100.19.102: icmp_seq=5 ttl=49 time=740 ms
--- 208.100.19.102 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4026ms
rtt min/avg/max/mdev = 615.330/687.537/761.351/55.010 ms

-- PING FROM AMAZON EC2 --
127 [EMAIL PROTECTED] ping boardspace.net
PING boardspace.net (208.100.19.102) 56(84) bytes of data.
64 bytes from boardspace.net (208.100.19.102): icmp_seq=1 ttl=50 time=20.6 ms
64 bytes from boardspace.net (208.100.19.102): icmp_seq=2 ttl=50 time=20.6 ms
64 bytes from boardspace.net (208.100.19.102): icmp_seq=3 ttl=50 time=20.7 ms
64 bytes from boardspace.net (208.100.19.102): icmp_seq=4 ttl=50 time=20.9 ms
64 bytes from boardspace.net (208.100.19.102): icmp_seq=5 ttl=50 time=20.9 ms
--- boardspace.net ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4003ms
rtt min/avg/max/mdev = 20.622/20.782/20.960/0.228 ms

0 [EMAIL PROTECTED] ping mit.edu
PING mit.edu (18.7.22.69) 56(84) bytes of data.
64 bytes from WEB.MIT.EDU (18.7.22.69): icmp_seq=1 ttl=239 time=12.1 ms
64 bytes from WEB.MIT.EDU (18.7.22.69): icmp_seq=2 ttl=239 time=12.4 ms
64 bytes from WEB.MIT.EDU (18.7.22.69): icmp_seq=3 ttl=239 time=12.4 ms
64 bytes from WEB.MIT.EDU (18.7.22.69): icmp_seq=4 ttl=239 time=12.9 ms
--- mit.edu ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2997ms
rtt min/avg/max/mdev = 12.171/12.511/12.929/0.292 ms

- Output from Philippines of 0.2 sec/move bot with cgos script
18:51:55 S-C genmove b 30
18:51:55 C-E time_left b 300 0
18:51:55 E-C =
18:51:55 C-E genmove b
18:51:55 E-C = G2
18:51:55 C-S G2
18:52:10 S-C info Estimated time until next round: 09:56
18:52:10 Estimated time until next round: 09:56
18:52:12 S-C play w F6 284916
18:52:12 C-E play w F6
18:52:12 E-C =
18:52:12 S-C genmove b 299823
18:52:12 C-E time_left b 299 0
18:52:12 E-C =
18:52:12 C-E genmove b
18:52:13 E-C = D8
18:52:13 C-S D8
18:52:25 S-C info Estimated time until next round: 09:56
18:52:25 Estimated time until next round: 09:56
18:52:28 S-C play w G4 271014
18:52:28 C-E play w G4
18:52:28 E-C =
18:52:29 S-C genmove b 298950
18:52:29 C-E time_left
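The roughly 1 second per move bleed described above can be measured directly from the server-reported time_left values in a log like this. A small sketch, assuming times in milliseconds and a fixed engine think time (function names are illustrative, not part of the cgos script):

```python
def net_overhead_ms(time_left_ms, think_ms):
    # Per-move time lost beyond actual thinking: the difference between
    # what the server deducted and what the engine actually spent.
    spent = [a - b for a, b in zip(time_left_ms, time_left_ms[1:])]
    return [s - think_ms for s in spent]

def padded_think_ms(think_ms, overheads):
    # One way to compensate client-side: shrink the think budget by the
    # average observed overhead so wall-clock spend stays near the target.
    avg = sum(overheads) / len(overheads) if overheads else 0
    return max(0, think_ms - int(avg))
```

With 1000 ms of lag per move and only a 200 ms think budget, no amount of client-side padding helps, which matches the "throw in the towel" conclusion; it only works when the budget comfortably exceeds the lag.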
RE: [computer-go] update on networking from phils with new CGOS configuration
It might be possible to automatically compensate for lag by looking up the geographic location of a bot's ISP, for instance via http://www.hostip.info/use.html .

Dave

From: [EMAIL PROTECTED] on behalf of Peter Christopher
Sent: Sat 5-1-2008 12:25
To: computer-go@computer-go.org
Subject: [computer-go] update on networking from phils with new CGOS configuration
Re: [computer-go] How to design the stronger playout policy?
Gian-Carlo Pascutto wrote:
What improvements did you try? The obvious ones I know of are prioritizing saving and capturing moves by the size of the string. Zen appears quite strong on CGOS. Leela using the above system was certainly weaker.

I use a static ladder search in the playouts. For example, if a move that matched a 3x3 pattern is capturable in a ladder, it is not interesting. Of course such a rule makes a program slower, but I believe it is an improvement.

I finally improved my playouts by using Remi's ELO system to learn a set of interesting patterns, and just randomly fiddling with the probabilities (compressing/expanding) until something improved my program in self-play by about +25%. Not a very satisfying method or an exceptional result. There could be some other magic combination that is even better, or maybe not.

I have also implemented Remi's Minorization-Maximization algorithm, but I could not find out how to use its result to improve the strength. Would you explain the details of your playout policy? Do you use only 3x3 patterns?

What is so frustrating is that the playouts are essentially black magic. I know of no way to automatically determine what is good and what is not besides playing about 500 games between 2 strategies. The results are very often completely counterintuitive. There is no systematic way to improve.

Yes. In addition, the big problem is that testing policies is very time consuming. I think at least 1000 games using 3000 or more playouts per move are needed to judge whether a change is good or bad.

-- Yamato
Re: [computer-go] How to design the stronger playout policy?
Lazarus uses a system very similar to the original MoGo policy as documented in the paper. However, I did find one significant improvement. I used Rémi's ELO system to rate patterns, and I simply throw out moves which match the weakest patterns in the play-outs. In the tree I also throw out these moves, but this is progressive: those moves still get explored, but not near leaf nodes.

It seems like I also found a minor improvement in giving captures a higher priority at some point, but I don't remember the details. I have not put a huge amount of energy into anything other than this.

Programs that play games tend to be highly idiosyncratic. What may work today may not work tomorrow, or may work for me and not for you. It depends on what is already in your program, what isn't, how you implemented it, etc. Some improvements work well with others, and some do not cooperate with other changes. The whole process is a bit of a black art, not just the play-outs.

The very biggest problem of all is how to test a change. It's not so difficult in the early stages, where you are testing major improvements and getting 100 ELO at a time. But when you get to the point where you are refining a program, you are talking about small but important improvements. You are looking for 10 or 20 ELO and hoping to put several of them together to get 100 or 200. But testing even 20 ELO points takes thousands of games to get a reliable measure. If we could measure 5 ELO improvements in 5 minutes, you can bet most of us would be able to produce very strong programs.

Some of the top computer chess guys have testing systems that play 50,000 games or more to measure the value of small improvements, such as a weight adjustment. They play those games on several computers (or CPUs) and play relatively fast - I have heard of testing systems that average a game per second per CPU at reasonably strong levels (they are still playing at least master strength).
The problem is that if you test too fast, you are not really stressing a Monte Carlo program or testing the sub-systems in the same way they would be tested in real games. Monte Carlo relies on statistics gathering, and that seems to require games that last at least a few seconds. So unless you have a large bank of workstations, it's difficult to get enough games to reliably measure small improvements - since this requires tens of thousands of games.

- Don
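The claim that resolving 20 ELO takes thousands of games follows directly from the binomial statistics. A rough sketch using the standard logistic Elo model and a normal approximation (ignoring draws, which are rare in Go anyway):

```python
import math

def elo_to_score(d):
    # Expected score for a player rated d Elo points above the opponent.
    return 1.0 / (1.0 + 10.0 ** (-d / 400.0))

def games_to_detect(d, sigmas=2.0):
    # Games needed before a d-Elo edge exceeds `sigmas` standard errors
    # of the measured win fraction (normal approximation).
    p = elo_to_score(d)
    return math.ceil(sigmas ** 2 * p * (1 - p) / (p - 0.5) ** 2)
```

At 20 ELO the expected score is only about 0.529, so even a 2-sigma detection needs on the order of 1200 games, and higher confidence pushes it into the thousands, consistent with the numbers quoted above.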
Re: [computer-go] How to design the stronger playout policy?
Yamato wrote:
I finally improved my playouts by using Remi's ELO system to learn a set of interesting patterns, and just randomly fiddling with the probabilities (compressing/expanding) until something improved my program in self-play by about +25%. Not a very satisfying method or an exceptional result. There could be some other magic combination that is even better, or maybe not.

I also have implemented Remi's Minorization-Maximization algorithm. But I could not find how to use its result to improve the strength. Would you explain the details of the playout policy?

(1) Captures of groups that could not save themselves after the last move.
(2) Saving groups put in atari by the last move, by capturing or extending.
(3) Patterns next to the last move.
(4) Global moves.

I quantize the MM pattern scores to 0..255 by multiplying them with a large constant and clipping. This causes the very good patterns to have close scores. I then use a threshold so that I do not play the very bad patterns at all. The remaining moves are played with the probabilities indicated by the quantized values. I also throw away very bad moves in phase (4) unless there are no alternatives. This gives a small but measurable improvement.

But now I believe all of the above is actually flawed. With this system I will play bad saving moves even if there are great pattern moves. It might be that your ladder detection avoids these problems somewhat. Considering the probabilities of all moves, as Crazy Stone does, avoids this problem. I am now trying to get a similar effect without incrementally updating all the urgencies.

Do you use only 3x3 patterns?

Yes. I have not tried bigger ones. For size = 4 the tables would become 2 x 16M. Might be worth a try.

-- GCP
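The quantize-threshold-sample scheme described above can be sketched as follows. The scaling constant and threshold are illustrative placeholders, not the real values:

```python
import random

SCALE = 64.0    # hypothetical "large constant" for scaling MM scores
THRESHOLD = 8   # hypothetical cutoff below which patterns are never played

def quantize(gamma):
    # Map an MM pattern strength to 0..255 by scaling and clipping, so
    # the very good patterns end up with similar (saturated) scores.
    return min(255, int(gamma * SCALE))

def pick_pattern_move(moves_with_gamma, rng=random):
    # Drop moves whose quantized score falls below the threshold, then
    # sample among the survivors in proportion to the quantized values.
    scored = [(m, quantize(g)) for m, g in moves_with_gamma]
    kept = [(m, q) for m, q in scored if q >= THRESHOLD]
    if not kept:
        return None  # caller falls through to the next move class
    moves, qs = zip(*kept)
    return rng.choices(moves, weights=qs, k=1)[0]
```

The flaw GCP points out lives in the `return None` fall-through: a bad saving move earlier in the cascade is still played before this function is ever consulted, which is what the Crazy Stone-style global distribution avoids.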
[computer-go] Re: update on networking from phils with new CGOS configuration
It's interesting to look at a graphic plot of a traceroute to see where the actual delays are. I use a program called PingPlotter for this, but there are many such programs. Be warned, though, that seeing a potential problem only leaves you feeling helpless, since there is typically nothing you can do about it.
Re: [computer-go] How to design the stronger playout policy?
Don Dailey wrote:
Lazarus uses a system very similar to the original MoGo policy as documented in the paper. However, I did find one significant improvement. I used Rémi's ELO system to rate patterns, and I simply throw out moves which match the weakest patterns in the play-outs. In the tree I also throw out these moves, but this is progressive: those moves still get explored, but not near leaf nodes.

It is interesting that many people say similar things. I'll try to use ELO ratings of patterns again.

Programs that play games tend to be highly idiosyncratic. What may work today may not work tomorrow, or may work for me and not for you. It depends on what is already in your program, what isn't, how you implemented it, etc. Some improvements work well with others, and some do not cooperate with other changes. The whole process is a bit of a black art, not just the play-outs.

I fully agree.

-- Yamato