Weston,

Although those results sound intriguing, it also looks like a convoluted experiment. I wouldn't call gnugo an expert judge, although it is an impartial one. The fact alone that it scores the 5K ref-bot as ahead after 10 moves 46% of the time makes it suspect in my eyes. But it is curious that it consistently shows a much lower percentage for the bot with more playouts.
It would have been much more persuasive if you had simply run a 5K-playout bot against a 100K-playout bot and seen which wins more often. It shouldn't take much more than a day to gather a significant number of games; twogtp is perfect for this. Or connect both to CGOS and see which ends up with the higher rating, although in that case it will take a week or more before you get conclusive data, unless the difference is really clear.

I did in fact put a 100K+ ref-bot on CGOS for a little while, and it ended up with a rating slightly (possibly insignificantly) higher than the 2K ref-bot. Maybe I didn't leave it there long enough, certainly not for thousands of games. But it didn't look anywhere near supporting your findings. I say 100K+ because I didn't set it to a specific number of playouts; I just let it run as many as it could within the time allowed. Generally it would reach well over 100K per move, probably more like 250K-500K. That should only make things worse, according to your hypothesis.

So although I think the result of your experiment is very curious, I also think it might be a bit hasty to draw your conclusion from it.

Mark

On Mon, Dec 15, 2008 at 8:30 PM, Weston Markham <weston.mark...@gmail.com> wrote:
> Hi. This is a continuation of a month-old conversation about the
> possibility that the quality of AMAF Monte Carlo can degrade as the
> number of simulations increases:
>
> Me: "running 10k playouts can be significantly worse than running 5k
> playouts."
>
> On Tue, Nov 18, 2008 at 2:27 PM, Don Dailey <drdai...@cox.net> wrote:
>> On Tue, 2008-11-18 at 14:17 -0500, Weston Markham wrote:
>>> On Tue, Nov 18, 2008 at 12:02 PM, Michael Williams
>>> <michaelwilliam...@gmail.com> wrote:
>>> > It doesn't make any sense to me from a theoretical perspective.
>>> > Do you have empirical evidence?
>>>
>>> I used to have data on this, from a program that I think was very
>>> nearly identical to Don's reference spec. When I get a chance, I'll
>>> try to reproduce it.
>>
>> Unless the difference is large, you will have to run thousands of games
>> to back this up.
>>
>> - Don
>
> I am comparing the behavior of the AMAF reference bot with 5000
> playouts against its behavior with 100000 playouts, and I am only
> considering the first ten moves (five from each player) of the (9x9)
> games. I downloaded a copy of Don's reference bot, as well as a copy
> of Mogo, which is used as an opponent for each of the two settings.
> gnugo version 3.7.11 is also used, in order to judge which side won
> (jrefgo or mogo) after each individual match. gnugo was used because
> it is simple to set up for this sort of thing via command-line
> options, and it seems plausible that it should give a somewhat
> realistic assessment of the situation.
>
> jrefgo always plays black, and Mogo plays white. Komi is set to 0.5,
> so that jrefgo has a reasonable number of winning lines available to
> it, although the general superiority of Mogo means that egregiously
> bad individual moves will be punished.
>
> In the games played, Mogo would occasionally crash. (This was run
> under Windows Vista; perhaps there is some incompatibility in the
> binary I downloaded.) I have discarded these games (about 1 out of 50,
> I think) from the statistics gathered. As far as I know, there is no
> reason to think that this would skew the comparison between 5k
> playouts and 100k playouts. Apart from the occasional crashes, the
> behavior of Mogo seemed reasonable in the games I observed, and I
> have no reason to think that it was not playing at a relatively high
> level in the retained results.
>
> Out of 3637 matches using 5k playouts, jrefgo won (i.e., was ahead
> after 10 moves, as estimated by gnugo) 1688 of them. (46.4%)
> Out of 2949 matches using 100k playouts, jrefgo won 785. (26.6%)
>
> It appears clear to me that increasing the number of playouts from 5k
> to 100k degrades the performance of jrefgo.
> Below, I am including the commands that I used to run the tests and
> tally the results.
>
> Weston
>
>
> $ cat scratch5k.sh
>
> ../gogui-1.1.3/bin/gogui-twogtp -auto
>   -black "\"C:\\\\Program Files\\\\Java\\\\jdk1.6.0_06\\\\bin\\\\java.exe\" -jar jrefgo.jar 5000"
>   -games 10000 -komi 0.5 -maxmoves 10
>   -referee "gnugo --mode gtp --score aftermath --chinese-rules --positional-superko"
>   -sgffile games/jr5k-v-mogo -size 9
>   -white C:\\\\cygwin\\\\home\\\\Experience\\\\projects\\\\go\\\\MoGo_release3\\\\mogo.exe
>
>
> $ grep B+ games/jr5k-v-mogo.dat | grep -v unexp | wc -l
> 1688
>
> $ grep W+ games/jr5k-v-mogo.dat | grep -v unexp | wc -l
> 1949
>
> $ grep B+ games/jr100k-v-mogo.dat | grep -v unexp | wc -l
> 785
>
> $ grep W+ games/jr100k-v-mogo.dat | grep -v unexp | wc -l
> 2164
> _______________________________________________
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
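P.S. The head-to-head I'm suggesting is a small variation on Weston's own scratch5k.sh: pair the two playout settings against each other directly instead of against Mogo. Something like the command-line sketch below, untested on my end; the `java` on PATH, the relative jrefgo.jar location, and the game count are my assumptions, while the gogui-twogtp flags and the trailing playout-count argument mirror the script quoted above:

```shell
# Sketch only (paths assumed): 5K-playout ref bot as black vs. 100K as white.
# jrefgo takes its playout count as the final argument, as in scratch5k.sh.
../gogui-1.1.3/bin/gogui-twogtp -auto \
  -black "java -jar jrefgo.jar 5000" \
  -white "java -jar jrefgo.jar 100000" \
  -games 1000 -size 9 -komi 0.5 \
  -sgffile games/jr5k-v-jr100k
```

With self-play there is no need for a referee, since both engines score identically; alternating colors over two runs would cancel any first-move advantage.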
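One thing worth noting about the reported numbers themselves: whatever gnugo's judging is worth, the 46.4% vs. 26.6% gap is far too large to be sampling noise at these game counts. A quick two-proportion z-test on the reported counts (a sketch using only the Python standard library) puts the difference at roughly 16 standard errors:

```python
from math import sqrt

# Reported results: wins / games for the two playout settings.
w1, n1 = 1688, 3637   # 5k playouts:  jrefgo "ahead after 10 moves" 46.4%
w2, n2 = 785, 2949    # 100k playouts: 26.6%

p1, p2 = w1 / n1, w2 / n2
p = (w1 + w2) / (n1 + n2)                     # pooled proportion
se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))    # standard error of the difference
z = (p1 - p2) / se

print(f"p1={p1:.3f} p2={p2:.3f} z={z:.1f}")   # z is about 16.5
```

So if the gnugo-after-10-moves measurement means anything at all, the degradation it reports is not a statistical fluke; the open question is whether the measurement means anything.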
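For anyone re-running this, the grep tally can also be mirrored in a few lines of Python that compute win rates directly. The .dat format details here are my assumption, inferred from how the grep filters are used above (one game per line, a result field containing B+ or W+, and "unexp" marking the discarded games):

```python
def tally(dat_text):
    """Count black and white wins in gogui-twogtp .dat output,
    skipping any line flagged 'unexp' (e.g. crashed games)."""
    b = w = 0
    for line in dat_text.splitlines():
        if "unexp" in line:
            continue          # discarded game, mirrors `grep -v unexp`
        if "B+" in line:
            b += 1
        elif "W+" in line:
            w += 1
    return b, w

# Example with made-up lines in the assumed format:
sample = "1 B+2.5\n2 W+0.5\n3 B+unexpected\n4 B+6.5\n"
b, w = tally(sample)
print(b, w, f"{b / (b + w):.1%}")  # -> 2 1 66.7%
```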