When thinking about the apparent strength loss, I came up with a potential theory: consistency. With more simulations, noise has less of an impact. My guess is that the known bias of AMAF leads to blunders that are played more consistently. A bot with fewer simulations would make the same blunders too, but it would also sometimes pick "sub-optimal" moves instead, due to evaluation noise.
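
To make that concrete, here is a minimal sketch of the kind of AMAF
move selection I have in mind (this is not Don's reference bot, only
an illustration; run_playout is a hypothetical helper that returns
the playout result and the set of our moves that occurred in it):

import random

def amaf_choose_move(position, legal_moves, run_playout, n_playouts):
    # All-moves-as-first bookkeeping: per-move win and participation counts.
    wins = {m: 0.0 for m in legal_moves}
    plays = {m: 0.0 for m in legal_moves}
    for _ in range(n_playouts):
        first = random.choice(legal_moves)
        # result is 1.0 for a win, 0.0 for a loss; moves_by_us is the set
        # of our moves played anywhere in this random continuation.
        result, moves_by_us = run_playout(position, first)
        # Credit the outcome to every one of our moves in the playout, as
        # if each had been played first.  This is where the bias lives.
        for m in moves_by_us | {first}:
            if m in plays:
                plays[m] += 1.0
                wins[m] += result
    # More playouts shrink the noise in each wins[m] / plays[m] estimate,
    # but a move whose AMAF estimate is systematically wrong stays wrong,
    # so the biased choice gets picked more and more consistently.
    return max(legal_moves, key=lambda m: wins[m] / max(plays[m], 1.0))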

On Dec 16, 2008, at 3:48 AM, Denis fidaali <denis.fida...@hotmail.fr> wrote:


 I agree that the experiment is interesting in itself.
 I also agree that it's hard to draw any conclusion
 from it :) Running the games to the end would probably
 give a near-0% win rate for the AMAF bot.

 Running the 5k bot against the 100k bot is certainly
 something you would want to do if you were to argue
 that 5k is indeed stronger. It also might be that,
 for some reason, the 5k bot is simply better at
 the opening. The 5k bot has a wider spread of move
 choices than the 100k bot, so it's easy to imagine
 that it plays the good (opening) moves more often.

 All in all, trying to assess the strength of a bot
 is awfully hard. It can make very good moves
 and yet be very weak. It can have good global
 perception, or good move ordering, and still be
 very weak. It can predict pro moves with incredible
 accuracy, and still be very weak (although you
 could then use that prediction feature
 in a Monte Carlo bot, as CrazyStone does).

 I guess any hard data will always be welcome. Your
 experiment was very original, in that few people
 would have tried it. I have no idea what one should
 conclude from it, but it certainly can't hurt our
 understanding :) (or un-understanding). Maybe some day
 someone will look back at this particular experiment
 and come up with the next computer-go revolution :)

> Date: Mon, 15 Dec 2008 21:10:07 -0200
> From: tesujisoftw...@gmail.com
> To: computer-go@computer-go.org
> Subject: Re: [computer-go] RefBot (thought-) experiments
>
> Weston,
>
> Although those results sound intriguing, it also looks like a
> convoluted experiment. I wouldn't call gnu-go an expert judge,
> although it is an impartial one. The fact that it says the 5K
> ref-bot is ahead after 10 moves 46% of the time is alone enough to
> make it suspect in my eyes. But it is curious that it consistently
> shows a much lower percentage for the bot with more playouts.
>
> It would have been much more persuasive if you had simply run a 5K
> playout bot against a 100K playout bot and seen which wins more. It
> shouldn't take much more than a day to gather a significant number of
> games; twogtp is perfect for this. Or connect both to CGOS and see
> which ends up with a higher rating, but in that case it will take a
> week or more before you get conclusive data, unless the difference is
> really clear.
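
For what it's worth, such a head-to-head run might look roughly like
the line below (a sketch only; the jrefgo.jar invocation, playout
counts and game count are placeholders, not a command anyone has
actually run):

../gogui-1.1.3/bin/gogui-twogtp -auto \
  -black "java -jar jrefgo.jar 5000" \
  -white "java -jar jrefgo.jar 100000" \
  -games 1000 -komi 0.5 -size 9 -sgffile games/jr5k-v-jr100k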
>
> I did in fact put up a 100K+ ref-bot on CGOS for a little while, and
> it ended up with a rating slightly (possibly insignificantly) higher
> than the 2K ref-bot. Maybe I didn't put it there long enough,
> certainly not for thousands of games. But it didn't look anywhere near
> to supporting your findings.
>
> I say 100K+ because I didn't set it to a specific number; I just let
> it run as many playouts as it could within the time allowed. Generally
> it would reach well over 100K per move, probably more like 250K-500K.
> That should only make things worse according to your hypothesis.
>
> So although I think the result of your experiment is very curious, I
> think it might be a bit hasty to draw your conclusion from it.
>
> Mark
>
>
> On Mon, Dec 15, 2008 at 8:30 PM, Weston Markham
> <weston.mark...@gmail.com> wrote:
> > Hi. This is a continuation of a month-old conversation about the
> > possibility that the quality of AMAF Monte Carlo can degrade, as the
> > number of simulations increases:
> >
> > Me: "running 10k playouts can be significantly worse than running 5k playouts."
> >
> > On Tue, Nov 18, 2008 at 2:27 PM, Don Dailey <drdai...@cox.net> wrote:
> >> On Tue, 2008-11-18 at 14:17 -0500, Weston Markham wrote:
> >>> On Tue, Nov 18, 2008 at 12:02 PM, Michael Williams
> >>> <michaelwilliam...@gmail.com> wrote:
> >>> > It doesn't make any sense to me from a theoretical perspective. Do you have
> >>> > empirical evidence?
> >>>
> >>> I used to have data on this, from a program that I think was very
> >>> nearly identical to Don's reference spec. When I get a chance, I'll
> >>> try to reproduce it.
> >>
> >> Unless the difference is large, you will have to run thousands of games
> >> to back this up.
> >>
> >> - Don
> >
> > I am comparing the behavior of the AMAF reference bot with 5000
> > playouts against the behavior with 100000 playouts, and I am only
> > considering the first ten moves (five from each player) of the (9x9)
> > games. I downloaded a copy of Don's reference bot, as well as a copy
> > of Mogo, which is used as an opponent for each of the two settings.
> > gnugo version 3.7.11 is also used, in order to judge which side won
> > (jrefgo or mogo) after each individual match. gnugo was used because
> > it is simple to set it up for this sort of thing via command-line
> > options, and it seems plausible that it should give a somewhat
> > realistic assessment of the situation.
> >
> > jrefgo always plays black, and Mogo plays white. Komi is set to 0.5,
> > so that jrefgo has a reasonable number of winning lines available to
> > it, although the general superiority of Mogo means that egregiously
> > bad individual moves will be punished.
> >
> > In the games played, Mogo would occasionally crash. (This was run
> > under Windows Vista; perhaps there is some incompatibility with the
> > binary I downloaded.) I have discarded these games (about 1 out of 50,
> > I think) from the statistics gathered. As far as I know, there would
> > be no reason to think that this would skew the comparison between 5k
> > playouts and 100k playouts. Other than the occasional crashes, the
> > behavior of Mogo seemed reasonable in the other games that I observed.
> > I have no reason to think that it was not playing at a relatively high
> > level in the retained results.
> >
> > Out of 3637 matches using 5k playouts, jrefgo won (i.e., was ahead
> > after 10 moves, as estimated by gnugo) 1688 of them. (46.4%)
> > Out of 2949 matches using 100k playouts, jrefgo won 785. (26.6%)
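
A quick sanity check on those counts (my own back-of-the-envelope
arithmetic; it says nothing about whether gnugo's 10-move verdict
tracks real strength, only that the 5k/100k gap is far larger than
sampling noise):

from math import sqrt

# Two-proportion comparison of the reported results.
w5, n5 = 1688, 3637      # 5k playouts:   ahead after 10 moves 46.4% of games
w100, n100 = 785, 2949   # 100k playouts: ahead after 10 moves 26.6% of games

p5, p100 = w5 / n5, w100 / n100
se = sqrt(p5 * (1 - p5) / n5 + p100 * (1 - p100) / n100)
z = (p5 - p100) / se
print(f"difference = {p5 - p100:.3f} +/- {1.96 * se:.3f} (95% CI), z = {z:.1f}")
# Prints roughly: difference = 0.198 +/- 0.023 (95% CI), z = 17.1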
> >
> > It appears clear to me that increasing the number of playouts from 5k
> > to 100k degrades the performance of jrefgo. Below, I am including the
> > commands that I used to run the tests and tally the results.
> >
> > Weston
> >
> >
> > $ cat scratch5k.sh
> >
> > ../gogui-1.1.3/bin/gogui-twogtp -auto \
> >   -black "\"C:\\\\Program Files\\\\Java\\\\jdk1.6.0_06\\\\bin\\\\java.exe\" -jar jrefgo.jar 5000" \
> >   -games 10000 -komi 0.5 -maxmoves 10 \
> >   -referee "gnugo --mode gtp --score aftermath --chinese-rules --positional-superko" \
> >   -sgffile games/jr5k-v-mogo -size 9 \
> >   -white C:\\\\cygwin\\\\home\\\\Experience\\\\projects\\\\go\\\\MoGo_release3\\\\mogo.exe
> >
> >
> > $ grep B+ games/jr5k-v-mogo.dat | grep -v unexp | wc -l
> > 1688
> >
> > $ grep W+ games/jr5k-v-mogo.dat | grep -v unexp | wc -l
> > 1949
> >
> > $ grep B+ games/jr100k-v-mogo.dat | grep -v unexp | wc -l
> > 785
> >
> > $ grep W+ games/jr100k-v-mogo.dat | grep -v unexp | wc -l
> > 2164
