When thinking about the apparent strength loss, I came up with a potential theory: consistency. With more simulations, noise has less of an impact. My guess is that the known bias of AMAF leads to blunders that are played more consistently. A bot with fewer simulations would make the same blunders too, but it would also sometimes pick "sub-optimal" moves instead, due to evaluation noise.
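
To make that concrete, here is a minimal sketch of the kind of AMAF
move selection I have in mind (this is not Don's reference bot, only
an illustration; run_playout is a hypothetical helper that returns
the playout result and the set of our moves that occurred in it):

import random

def amaf_choose_move(position, legal_moves, run_playout, n_playouts):
    # All-moves-as-first bookkeeping: per-move win and participation counts.
    wins = {m: 0.0 for m in legal_moves}
    plays = {m: 0.0 for m in legal_moves}
    for _ in range(n_playouts):
        first = random.choice(legal_moves)
        # result is 1.0 for a win, 0.0 for a loss; moves_by_us is the set
        # of our moves played anywhere in this random continuation.
        result, moves_by_us = run_playout(position, first)
        # Credit the outcome to every one of our moves in the playout, as
        # if each had been played first.  This is where the bias lives.
        for m in moves_by_us | {first}:
            if m in plays:
                plays[m] += 1.0
                wins[m] += result
    # More playouts shrink the noise in each wins[m] / plays[m] estimate,
    # but a move whose AMAF estimate is systematically wrong stays wrong,
    # so the biased choice gets picked more and more consistently.
    return max(legal_moves, key=lambda m: wins[m] / max(plays[m], 1.0))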

On Dec 16, 2008, at 3:48 AM, Denis fidaali <denis.fida...@hotmail.fr> wrote:


 I agree that the experiment is interesting in itself.
 I also agree that it's hard to draw any conclusion
 from it :) Running the games to the end would probably
 give a near-0% win rate for the AMAF bot.

 Running the 5k bot against the 100k bot is certainly
 something you would want to do if you were to argue
 that 5k is indeed stronger. It also might be that,
 for some reason, the 5k bot is simply better at
 the opening. The 5k bot has a wider spread of move
 choices than the 100k bot, so it's easy to imagine
 that it plays the good (opening) moves more often.

 All in all, trying to assess the strength of a bot
 is awfully hard. It can make very good moves
 and yet be very weak. It can have good global
 perception, or good move ordering, and still be
 very weak. It can predict pro moves with incredible
 accuracy, and still be very weak (although you
 could then use that prediction feature
 in a Monte Carlo bot, as CrazyStone does).

 I guess any hard data will always be welcome. Your
 experiment was very original, in that few people
 would have tried it. I have no idea what one should
 conclude from it, but it certainly can't hurt our
 understanding :) (or un-understanding). Maybe some day
 someone will look back at this particular experiment
 and come up with the next computer-go revolution :)

> Date: Mon, 15 Dec 2008 21:10:07 -0200
> From: tesujisoftw...@gmail.com
> To: computer-go@computer-go.org
> Subject: Re: [computer-go] RefBot (thought-) experiments
>
> Weston,
>
> Although those results sound intriguing, it also looks like a
> convoluted experiment. I wouldn't call gnu-go an expert judge,
> although it is an impartial one. The fact that it says the 5K
> ref-bot is ahead after 10 moves 46% of the time is alone enough to
> make it suspect in my eyes. But it is curious that it consistently
> shows a much lower percentage for the bot with more playouts.
>
> It would have been much more persuasive if you had simply run a 5K
> playout bot against a 100K playout bot and seen which wins more. It
> shouldn't take much more than a day to gather a significant number of
> games; twogtp is perfect for this. Or connect both to CGOS and see
> which ends up with a higher rating, but in that case it will take a
> week or more before you get conclusive data, unless the difference is
> really clear.
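
For what it's worth, such a head-to-head run might look roughly like
the line below (a sketch only; the jrefgo.jar invocation, playout
counts and game count are placeholders, not a command anyone has
actually run):

../gogui-1.1.3/bin/gogui-twogtp -auto \
  -black "java -jar jrefgo.jar 5000" \
  -white "java -jar jrefgo.jar 100000" \
  -games 1000 -komi 0.5 -size 9 -sgffile games/jr5k-v-jr100k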
>
> I did in fact put up a 100K+ ref-bot on CGOS for a little while, and
> it ended up with a rating slightly (possibly insignificantly) higher
> than the 2K ref-bot. Maybe I didn't put it there long enough,
> certainly not for thousands of games. But it didn't look anywhere near
> to supporting your findings.
>
> I say 100K+ because I didn't set it to a specific number; I just let
> it run as many playouts as it could within the time allowed. Generally
> it would reach well over 100K per move, probably more like 250K-500K.
> That should only make things worse according to your hypothesis.
>
> So although I think the result of your experiment is very curious, I
> think it might be a bit hasty to draw your conclusion from it.
>
> Mark
>
>
> On Mon, Dec 15, 2008 at 8:30 PM, Weston Markham
> <weston.mark...@gmail.com> wrote:
> > Hi. This is a continuation of a month-old conversation about the
> > possibility that the quality of AMAF Monte Carlo can degrade, as the
> > number of simulations increases:
> >
> > Me: "running 10k playouts can be significantly worse than running 5k playouts."
> >
> > On Tue, Nov 18, 2008 at 2:27 PM, Don Dailey <drdai...@cox.net> wrote:
> >> On Tue, 2008-11-18 at 14:17 -0500, Weston Markham wrote:
> >>> On Tue, Nov 18, 2008 at 12:02 PM, Michael Williams
> >>> <michaelwilliam...@gmail.com> wrote:
> >>> > It doesn't make any sense to me from a theoretical perspective. Do you have
> >>> > empirical evidence?
> >>>
> >>> I used to have data on this, from a program that I think was very
> >>> nearly identical to Don's reference spec. When I get a chance, I'll
> >>> try to reproduce it.
> >>
> >> Unless the difference is large, you will have to run thousands of games
> >> to back this up.
> >>
> >> - Don
> >
> > I am comparing the behavior of the AMAF reference bot with 5000
> > playouts against the behavior with 100000 playouts, and I am only
> > considering the first ten moves (five from each player) of the (9x9)
> > games. I downloaded a copy of Don's reference bot, as well as a copy
> > of Mogo, which is used as an opponent for each of the two settings.
> > gnugo version 3.7.11 is also used, in order to judge which side won
> > (jrefgo or mogo) after each individual match. gnugo was used because
> > it is simple to set it up for this sort of thing via command-line
> > options, and it seems plausible that it should give a somewhat
> > realistic assessment of the situation.
> >
> > jrefgo always plays black, and Mogo plays white. Komi is set to 0.5,
> > so that jrefgo has a reasonable number of winning lines available to
> > it, although the general superiority of Mogo means that egregiously
> > bad individual moves will be punished.
> >
> > In the games played, Mogo would occasionally crash. (This was run
> > under Windows Vista; perhaps there is some incompatibility with the
> > binary I downloaded.) I have discarded these games (about 1 out of 50,
> > I think) from the statistics gathered. As far as I know, there would
> > be no reason to think that this would skew the comparison between 5k
> > playouts and 100k playouts. Other than the occasional crashes, the
> > behavior of Mogo seemed reasonable in the other games that I observed.
> > I have no reason to think that it was not playing at a relatively high
> > level in the retained results.
> >
> > Out of 3637 matches using 5k playouts, jrefgo won (i.e., was ahead
> > after 10 moves, as estimated by gnugo) 1688 of them. (46.4%)
> > Out of 2949 matches using 100k playouts, jrefgo won 785. (26.6%)
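
A quick sanity check on those counts (my own back-of-the-envelope
arithmetic; it says nothing about whether gnugo's 10-move verdict
tracks real strength, only that the 5k/100k gap is far larger than
sampling noise):

from math import sqrt

# Two-proportion comparison of the reported results.
w5, n5 = 1688, 3637      # 5k playouts:   ahead after 10 moves 46.4% of games
w100, n100 = 785, 2949   # 100k playouts: ahead after 10 moves 26.6% of games

p5, p100 = w5 / n5, w100 / n100
se = sqrt(p5 * (1 - p5) / n5 + p100 * (1 - p100) / n100)
z = (p5 - p100) / se
print(f"difference = {p5 - p100:.3f} +/- {1.96 * se:.3f} (95% CI), z = {z:.1f}")
# Prints roughly: difference = 0.198 +/- 0.023 (95% CI), z = 17.1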
> >
> > It appears clear to me that increasing the number of playouts from 5k
> > to 100k degrades the performance of jrefgo. Below, I am including the
> > commands that I used to run the tests and tally the results.
> >
> > Weston
> >
> >
> > $ cat scratch5k.sh
> >
> > ../gogui-1.1.3/bin/gogui-twogtp -auto \
> >   -black "\"C:\\\\Program Files\\\\Java\\\\jdk1.6.0_06\\\\bin\\\\java.exe\" -jar jrefgo.jar 5000" \
> >   -games 10000 -komi 0.5 -maxmoves 10 \
> >   -referee "gnugo --mode gtp --score aftermath --chinese-rules --positional-superko" \
> >   -sgffile games/jr5k-v-mogo -size 9 \
> >   -white C:\\\\cygwin\\\\home\\\\Experience\\\\projects\\\\go\\\\MoGo_release3\\\\mogo.exe
> >
> >
> > $ grep B+ games/jr5k-v-mogo.dat | grep -v unexp | wc -l
> > 1688
> >
> > $ grep W+ games/jr5k-v-mogo.dat | grep -v unexp | wc -l
> > 1949
> >
> > $ grep B+ games/jr100k-v-mogo.dat | grep -v unexp | wc -l
> > 785
> >
> > $ grep W+ games/jr100k-v-mogo.dat | grep -v unexp | wc -l
> > 2164
