Re: [computer-go] MoGo paper at ICML
Hi,

In the paper you only present results of UCT_RAVE with the MoGo default policy. Did you run tests with UCT_RAVE using pure random playouts too? I'm curious because I've tried millions (well, it feels that way) of uses for AMAF in my code... but so far all of them have proven useless, often yielding worse results.

/Christian Nilsson

On 6/23/07, Sylvain Gelly [EMAIL PROTECTED] wrote:

> Hello all,
>
> We just presented our paper describing MoGo's improvements at ICML, and we
> thought we would pass on some of the feedback and corrections we have received.
> (http://www.machinelearning.org/proceedings/icml2007/papers/387.pdf)
>
> The way that we incorporate prior knowledge in UCT can be seen as a Bayesian
> prior, and corresponds exactly to the Dirichlet prior (more precisely, to the
> beta prior, since the outcomes here are binomial).
>
> The cumulative result is only given using the prior knowledge on top of RAVE,
> but it could have been done the other way round and would give the same type
> of results. Each particular improvement is essentially independent of the others.
>
> On figure 5, the legend of the horizontal axis should be m_prior rather than
> n_prior.
>
> All experiments (except the default policy) were played against GnuGo level
> 10, not level 8.
>
> Any other comments are welcome!
>
> Sylvain & David
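To make the beta-prior remark concrete: a Beta(a, b) prior on a move's win rate is equivalent to seeding the move's counters with a + b virtual playouts, a of them wins. Below is a minimal sketch of that idea; the names are illustrative (this is not MoGo's actual code), and m_prior follows the paper's figure-5 notation:

    # Illustrative sketch: injecting prior knowledge into a UCT node as
    # a beta prior on the move's win rate. The prior simply seeds the
    # win/visit counters with "virtual" playouts, which the posterior
    # mean then gradually forgets as real playouts accumulate.

    class Node:
        def __init__(self, prior_winrate, m_prior):
            # m_prior virtual playouts, of which a fraction
            # prior_winrate were wins: value() starts at prior_winrate.
            self.visits = m_prior
            self.wins = prior_winrate * m_prior

        def update(self, won):
            # A real playout result: 1.0 for a win, 0.0 for a loss.
            self.visits += 1
            self.wins += won

        def value(self):
            # Posterior mean of the beta distribution over the win rate.
            return self.wins / self.visits

    node = Node(prior_winrate=0.6, m_prior=50)  # heuristic says ~60% move
    node.update(0.0)                            # one real playout, lost
    print(node.value())                         # ~0.588, still close to 0.6

The design point is that a single heuristic evaluation is converted into a count, so its influence fades at a controlled rate rather than being hand-tuned away.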
[computer-go] scalability study - final results
Someone just reminded me of the scalability study I did a few months back, where I reported that I would continue to run it for perhaps a few more weeks. I did run about 20% more games, but the data was quite useful because it increased the number of games sampled at the highest levels. I had started the highest level program late, but the auto-tester is designed to try to equalize the number of games played for each player.

As a reminder, the study was designed to test the improvement of modern UCT programs as the number of play-outs increases. In the study, I had two basic versions, each tested at 12 different levels. The L series is Lazarus running with light play-outs and the H series is Lazarus running with heavy play-outs. Since the study, Lazarus has actually improved significantly, so these are both older versions of Lazarus - still relatively strong, and perhaps better candidates for a study of this type since the older programs tend to be more universal (less prone to serious intransitivities).

I don't have a graph like I did before, but one can easily be constructed from the data:

--- Player Key ---

H_ is the heavy play-out version of Lazarus
L_ is the light play-out version of Lazarus

The numeric portion of the player name is the number of play-outs executed to play each move. TIME/GAME is the time per game in seconds.

PLAYER      TIME/GAME    RATING   GAMES
--------    ---------    ------   -----
H_2048       13350.17    2830.2     168
H_1024        6693.84    2768.0     169
H_0512        3147.28    2547.3     168
H_0256        1547.30    2399.3     168
L_2048        4549.37    2375.5     168
H_0128         758.64    2315.7     168
L_1024        2203.88    2287.8     169
H_0064         381.00    2240.3     339
L_0512        1064.80    2174.1     168
H_0032         214.12    2129.2     318
L_0256         523.12    2105.7     168
L_0128         258.54    2097.8     170
gg-3.7.9        68.97    2000.0     307   (standard GnuGo 3.7.9)
L_0064         134.17    1981.7     293
H_0016         125.93    1950.2     284
L_0032          72.72    1941.5     284
H_0008          62.27    1872.4     276
L_0016          43.49    1758.6     261
H_0004          31.22    1679.1     253
L_0008          21.07    1556.2     248
H_0002          14.90    1402.1     250
L_0004          10.55    1347.0     248
L_0002           5.03    1123.6     248
H_0001           7.44    1031.6     249
L_0001           2.49     863.6     248

Total games: 2895

Observations:

If you look at the entire range of the HEAVY player, you will notice that each doubling (on average) was worth 164 ELO points. You will also notice a gradual falloff in improvement as the levels increase.

As a general rule of thumb, there is about 150 ELO per doubling. I figured this by throwing out the highest and lowest rated HEAVY players and averaging the increase per doubling. It seems pragmatic to throw out the 2 extremes based on empirical observation - I have always noticed that in a pool of players the highest and lowest often have at least somewhat distorted ratings. After throwing out the low and high ratings, the top 5 players average about 132 ELO per doubling and the bottom 5 average an increase of about 210 per doubling. So there is a definite decrease per doubling, but it's quite gradual.

I did a similar study with 7x7 and found that the tapering is extremely pronounced. It was quite obvious which komi to use, because if it was too low black won every game, and if it was too high white won every game. The tapering was pronounced because at the higher levels the play was very close to perfect. If you are playing perfectly, there is no improvement to be had by doubling.

It appears as a general rule of thumb (and is supported by empirical evidence in similar studies of other games) that the rating/resource curve is almost linear when you are far from perfect play, but the tapering becomes pronounced as you approach perfect play. I suspect Lazarus at the highest level I tested is within a few hundred ELO points of perfect play.
It's still a long way off, especially considering that Lazarus at the highest level was spending almost 4 hours on each 9x9 game!

My auto-tester stores all data, including configuration, in a single sqlite3 database. In it are the SGF game records, individual results, and even the time spent on each move, and it is available to anyone who wants it upon request - so you can analyze the results for yourself and come to your own conclusions!

- Don
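For anyone who wants to check the per-doubling figures, here is a small sketch that recomputes them from the heavy-playout rows of the table above. The exact grouping Don used for the top-5/bottom-5 averages isn't fully spelled out, so this only reproduces the full-range and trimmed averages:

    # Recomputing "ELO per doubling" from the table above
    # (heavy-playout players only, ordered by playout count).
    heavy = [
        ("H_0001", 1031.6), ("H_0002", 1402.1), ("H_0004", 1679.1),
        ("H_0008", 1872.4), ("H_0016", 1950.2), ("H_0032", 2129.2),
        ("H_0064", 2240.3), ("H_0128", 2315.7), ("H_0256", 2399.3),
        ("H_0512", 2547.3), ("H_1024", 2768.0), ("H_2048", 2830.2),
    ]
    ratings = [r for _, r in heavy]

    # Average gain over the whole range: 11 doublings, H_0001 to H_2048.
    full = (ratings[-1] - ratings[0]) / (len(ratings) - 1)
    print(f"full range: {full:.1f} ELO per doubling")          # ~164

    # Don's rule of thumb: drop the highest and lowest rated player first.
    trimmed = (ratings[-2] - ratings[1]) / (len(ratings) - 3)
    print(f"extremes trimmed: {trimmed:.1f} ELO per doubling")  # ~152

    # Per-step gains show the gradual falloff at higher levels.
    for (lo, r_lo), (hi, r_hi) in zip(heavy, heavy[1:]):
        print(f"{lo} -> {hi}: {r_hi - r_lo:+.1f}")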
Re: [computer-go] MoGo paper at ICML
On 6/25/07, Sylvain Gelly [EMAIL PROTECTED] wrote:

> I have to admit that it took me several weeks to make the RAVE algorithm
> actually work, although the idea is so simple. That may explain your
> previous results. The description in the paper should be sufficient to
> make it work well.

Ok, I'll just have to work harder then. :)

Thanks,
Christian
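For what it's worth, the part that usually takes the weeks is the blending schedule rather than the AMAF bookkeeping. Here is a minimal sketch of the blending, assuming the equivalence-parameter schedule beta = sqrt(k / (3n + k)) described in the paper; the variable names are illustrative, not MoGo's:

    import math

    # Minimal RAVE blending sketch. Each move at a node keeps two sets of
    # statistics: ordinary UCT counts (playouts where the move was played
    # *now*) and AMAF/RAVE counts (playouts where the move was played at
    # any later point). The blend trusts RAVE early, UCT late.

    def rave_value(uct_wins, uct_visits, rave_wins, rave_visits, k=1000):
        """Blend the UCT and RAVE win-rate estimates.

        k is the "equivalence parameter": roughly the number of real
        visits at which the two estimates receive equal weight.
        """
        if rave_visits == 0:
            return uct_wins / uct_visits if uct_visits else 0.5
        beta = math.sqrt(k / (3 * uct_visits + k))
        uct = uct_wins / uct_visits if uct_visits else 0.5
        rave = rave_wins / rave_visits
        return beta * rave + (1 - beta) * uct

    # Early on (few real visits) the RAVE estimate dominates...
    print(rave_value(1, 2, 30, 50))       # beta ~ 0.997
    # ...and after many real visits the UCT estimate takes over.
    print(rave_value(600, 1000, 30, 50))  # beta = 0.5

A common failure mode, consistent with Christian's experience, is letting the AMAF statistics keep their full weight forever; without a schedule that decays beta, the biased AMAF values drown out the unbiased UCT values and results get worse.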
Re: [computer-go] scalability study - final results
These are very interesting results. Thanks for doing all this work.

- Dave Hillis
Re: [spam probable] [computer-go] scalability study - final results
Hi Don,

This is a very interesting study!

Sylvain
Re: [computer-go] evalgo autotesting system
On Mon, Jun 25, 2007 at 03:17:06PM -0400, Don Dailey wrote:

> However I'm releasing my testing system to the public if anyone is
> interested. It may have some features in it that you will be interested in.

Thanks. Looks promising.

> It DOES require having sqlite3 and a little bit (but not much) SQL
> knowledge. You DO have to manually insert registry records into the
> database to specify who the players are and how they should be invoked,
> but no other SQL knowledge is required beyond this - a report is furnished
> by the program, or if you want you can manually query the database to find
> out anything you want about the games.

I have it running, but at the moment I have no idea how to specify my program for it. sqlite3 seems not to understand the "describe" command that I normally use to sort out the table layouts before inserting anything.

Maybe you could add a one-line example to the README on how to add a program (say GNU Go) as a player.

Regards,
Heikki

--
Heikki Levanto   "In Murphy We Turst"   heikki (at) lsd (dot) dk
Re: [computer-go] evalgo autotesting system
On Mon, Jun 25, 2007 at 04:33:47PM -0400, Don Dailey wrote:

> Here is how you might set up gnugo:

Thanks! That certainly looks like enough to get me going.

- Heikki

--
Heikki Levanto   "In Murphy We Turst"   heikki (at) lsd (dot) dk
Re: [computer-go] evalgo autotesting system
Actually, the info table defaults to boardsize 9 and komi 7.5, so to change this you need to either delete and re-insert, or update, the first row.

- Don
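On the "describe" question: sqlite3's equivalent is the .schema dot-command, which prints the CREATE statement for every table and so reveals the real layout. Below is a hypothetical sketch of registering a player and updating the info row via Python's sqlite3 module; the table and column names for the player registry (players, name, invocation) are guesses for illustration, so read the actual names out of .schema first:

    # Hypothetical sketch only - inspect the real schema first, e.g.:
    #   sqlite3 autotest.db ".schema"
    import sqlite3

    con = sqlite3.connect("autotest.db")
    cur = con.cursor()

    # Register GNU Go as a player: a name plus the command line used to
    # invoke it over GTP. Column names here are illustrative guesses.
    cur.execute(
        "INSERT INTO players (name, invocation) VALUES (?, ?)",
        ("gg-3.7.9", "gnugo --mode gtp --level 10 --chinese-rules"),
    )

    # The info table ships with one row (boardsize 9, komi 7.5);
    # update that row rather than inserting a second one.
    cur.execute("UPDATE info SET boardsize = ?, komi = ?", (9, 7.5))

    con.commit()
    con.close()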
Re: [computer-go] evalgo autotesting system
Let me know if you get it working.

- Don
Re: [computer-go] scalability study - final results
Don,

That's exciting! If Lazarus with heavy playouts can achieve within a few hundred points of perfect play on a 9x9 board, in less than 4 hours of total game time, then it should do rather well on turn-based servers such as the Dragon Go Server. A 30-day clock should be more than adequate.

That would be something of a milestone: trouncing strong human players on the 9x9 board, with no excuses about the humans running out of time.

Terry McIntyre [EMAIL PROTECTED]

"They mean to govern well; but they mean to govern. They promise to be kind masters; but they mean to be masters." -- Daniel Webster
Re: [computer-go] scalability study - final results
Hi Don,

Thanks for doing this valuable work. Where can we get the data? I am interested in it.

Cai Qiang
Re: [computer-go] scalability study - final results
On Mon, 2007-06-25 at 15:07 -0700, terry mcintyre wrote:

> [quoted above]

I believe humans play much stronger too at those time controls - unless, of course, they are playing many games and are not really focused on any particular game. In fact, I'm quite convinced that a human really trying hard on a turn-based server would be a formidable opponent, playing much stronger than he normally would over the board. But then, so would the program!

- Don
Re: [computer-go] scalability study - final results
On Tue, 2007-06-26 at 06:50 +0800, elife wrote:

> Hi Don, Thanks for doing this valuable work. Where can we get the data?
> I am interested in it. Cai Qiang

I put everything on that web site. Just go to http://www.greencheeks.homelinux.org:8015/ and you can get the games from April, May, and June from CGOS, the autotester, the CGOS server, and scale.db - the data from the scalability study in sqlite3 format.

- Don
Re: [computer-go] scalability study - final results
> After throwing out the low and high ratings the top 5 players average
> about 132 ELO per doubling and the bottom 5 average an increase of about
> 210 per doubling.
> ...
> I suspect Lazarus at the highest level I tested is within a few hundred
> ELO points of perfect play. It's still a long way off, especially
> considering that Lazarus at the highest level was spending almost 4 hours
> on each 9x9 game!

You're suggesting that it would be practically perfect with, say, three more doublings (another 3 x 132 ≈ 400 ELO points), which is only 32 hours per game. At that level play should be relatively stable (statistically), and it would be great to run just 2 games of self-play (128 hours = 5 days?) and study the game records.

Do you feel that at these strong levels the experiment is distorted by the lack of an equally strong non-UCT program? UCT programs tend to be weaker in the opening and stronger at the end. Once they reach the level where GnuGo is cannon fodder, it is just a self-play experiment, and their remaining weaknesses are not being exploited.

Darren
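A back-of-the-envelope version of that extrapolation, taking the ~132 ELO per doubling at face value (Don's data suggests the gain keeps shrinking, so treat these as upper bounds):

    # Darren's extrapolation: three more doublings beyond H_2048,
    # assuming a constant ~132 ELO per doubling (optimistic per Don's
    # own falloff observation) and ~4 hours per game at the top level.
    elo_per_doubling = 132.0
    hours_per_game = 4.0    # H_2048: 13350 s, just under 4 hours
    rating = 2830.2         # H_2048's measured rating

    for d in range(1, 4):
        rating += elo_per_doubling
        hours_per_game *= 2
        print(f"+{d} doubling(s): ~{rating:.0f} ELO at {hours_per_game:.0f} h/game")

    # Two self-play games at 32 h per side: 2 games * 2 sides * 32 h = 128 h,
    # i.e. a bit over 5 days, matching Darren's figure.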