RE: [computer-go] Testing Process

David Fotland Mon, 28 Sep 2009 23:15:51 -0700

You have inspired me to put Many Faces back on CGOS, both 9x9 and 19x19,
using just one core on each, so it doesn't take much of my computing
resources.  Testing against gnugo says going from 1 core to 4 cores is worth
about 150 Elo for Many Faces.  I should be able to keep Many Faces on CGOS
indefinitely, but I won't watch it much, so please let me know if a reboot
or crash drops it off so that I can restart it.
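For a sense of scale: under the standard logistic Elo model, a rating difference D gives the stronger side an expected score of 1 / (1 + 10^(-D/400)). The sketch below assumes that model and, as a further assumption, that the 150-point gain measured against gnugo carries over to a direct match between the 4-core and 1-core versions. The function name is only illustrative.

```python
def expected_score(elo_diff: float) -> float:
    """Expected score of the higher-rated side under the logistic Elo model.

    Assumes the usual 400-point scale; 0.5 means an even match.
    """
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


if __name__ == "__main__":
    # 150 Elo corresponds to winning roughly 70% of games.
    print(f"150 Elo -> expected score {expected_score(150):.3f}")
```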


I'd suggest you put Pebbles on 19x19 also.  My 19x19 engine is much stronger
than it was a year ago, while my 9x9 is about the same, so there is a lot of
progress to be made in playing go that can't be discovered on 9x9.

I test against gnugo as a fast opponent that gives quick regression results
after every change, since I like to play 1000 games or more.  I use KGS to get
a variety of stronger opponents that expose tactical flaws.
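One reason to play 1000 games or more per regression run is sampling noise. A minimal sketch, using the normal approximation to the binomial (an assumption that is reasonable for win rates not too close to 0% or 100%), shows how the 95% confidence interval on a measured win rate narrows as the number of games grows. The helper name is hypothetical.

```python
import math
from typing import Tuple


def win_rate_interval(wins: int, games: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation confidence interval for a win rate (z=1.96 gives ~95%)."""
    p = wins / games
    half_width = z * math.sqrt(p * (1.0 - p) / games)
    return p - half_width, p + half_width


if __name__ == "__main__":
    # Same 50% win rate, increasing sample sizes.
    for n in (100, 1000, 4000):
        lo, hi = win_rate_interval(n // 2, n)
        print(f"{n:5d} games at 50%: [{lo:.3f}, {hi:.3f}]")
```

At 1000 games and a 50% win rate the half-width comes out to about 3 percentage points, so a change smaller than that needs still more games before it can be told apart from noise.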

David

> -----Original Message-----
> From: computer-go-boun...@computer-go.org [mailto:computer-go-boun...@computer-go.org] On Behalf Of Brian Sheppard
> Sent: Monday, September 28, 2009 3:10 PM
> To: computer-go@computer-go.org
> Subject: [computer-go] Testing Process
> 
> >By now, I should probably find a better reference opponent than
> >gnugo... I wonder whether to pick fuego or mogo... ;-) Strength is
> >probably not _as_ important as the variety of techniques used, in order to
> >avoid selective blindness (that's why I don't like tuning by self-play).
> >Does anyone have a tip? Or do higher gnugo levels make much strength
> >difference?
> 
> Pebbles doesn't follow the norms, but I am very happy with my solution:
> just play constantly on CGOS.
> 
> CGOS provides a range of opponents of different styles, including non-MCTS
> opponents that use neural networks or alpha-beta. You can play 140 games
> per day, which is plenty.
> 
> You are right about selective blindness. Some programs, like GnuGo,
> regularly donate rating points to Pebbles. Others, like Valkyria, take
> those away. You need a range of opponents. Overall, CGOS gives a realistic
> measure of where you stand.
> 
> I have some suggestions for effective use of CGOS for testing purposes.
> 
> First, I don't aim to have the highest-rated program, because I am not
> caught up in the hardware race. Instead, I use a modest CPU and try to
> write good software. Some programs run on an 8-core processor, play exactly
> 100 games (to get a non-provisional rating), lose only 10, and then
> disappear. The programmers must have good reasons for this, but running on
> elite hardware obscures flaws in the software, so it isn't for me. Even
> when Pebbles has better hardware, I will continue to run a version on CGOS
> using low-end hardware. I learn more from losses!
> 
> Second, I run two identical copies of Pebbles; the second one is called
> PebblesToo. Both copies run on the same dual-core machine, so they have
> the same performance over the long run. PebblesToo serves three purposes.
> One, it doubles the pace of data collection. This is a far better use of
> the second core than playing twice as fast, IMO. Two, when Pebbles is
> matched up against Valkyria, PebblesToo is necessarily playing a weaker
> opponent, which provides a more balanced view. Three, Pebbles will get some
> self-play games, which are a necessary part of an overall testing strategy.
> 
> Third, I run a version of Fuego that is a little weaker than Pebbles. It
> is called fuego-0.4-slow. This program serves two purposes. One, it sucks
> up games against very low-rated players (because CGOS favors pairings of
> similarly rated opponents). Two, it provides a "nearby" program that is
> always available, so if Aya and Lingo are offline there is still an
> opponent of about the right level.
> 
> Fourth, Pebbles plays a lot of 9x9 games. Such games give your program an
> intense tactical workout at 10 minutes per game. I assure you that no
> current program has adequate tactics. Most strong programs have opening
> libraries and run on much more powerful computers. When Pebbles defeats
> such an opponent, it is invariably because the opponent overlooked a
> tactical shot. IMO, programmers who disdain 9x9 are learning about flaws
> more slowly than necessary.
> 
> Fifth, Pebbles saves two positions from every loss: the last position in
> which it thought it was winning (eval of the selected move >= 50%), and
> the position in which it thought it had the greatest advantage.
> 
> Pebbles regularly (~1 or 2 games per day) loses games in which it
> estimated its winning chances at >90%. I always learn something by
> analyzing those games.
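The loss-triage step described above might be sketched as follows. This is not Pebbles' actual code: the MoveRecord type, its fields, the helper names, and the 0.9 flag threshold are hypothetical, and the sketch assumes the engine logs, for each of its own moves, the position and its win-rate estimate for the move it selected.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MoveRecord:
    move_number: int
    position: str    # hypothetical serialized position, e.g. an SGF snapshot
    win_rate: float  # engine's estimate (0.0-1.0) for the move it selected


def positions_to_save(
    moves: List[MoveRecord],
) -> Tuple[Optional[MoveRecord], Optional[MoveRecord]]:
    """For a lost game, return (last position judged winning, position judged best)."""
    last_winning = None
    for record in moves:
        if record.win_rate >= 0.5:  # "thought it was winning"
            last_winning = record
    # Highest estimate during the game; the earliest one wins ties.
    best = max(moves, key=lambda r: r.win_rate, default=None)
    return last_winning, best


def is_upset(moves: List[MoveRecord], threshold: float = 0.9) -> bool:
    """Flag a lost game in which the engine's estimate ever exceeded the threshold."""
    return any(r.win_rate > threshold for r in moves)


if __name__ == "__main__":
    lost_game = [
        MoveRecord(10, "pos-10", 0.55),
        MoveRecord(12, "pos-12", 0.93),
        MoveRecord(14, "pos-14", 0.61),
        MoveRecord(16, "pos-16", 0.12),
    ]
    last, best = positions_to_save(lost_game)
    print(last.move_number, best.move_number, is_upset(lost_game))  # 14 12 True
```

Keeping both positions separates two different questions: where the evaluation first went wrong (the peak) and where the game was actually lost (the last "winning" position).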
> 
> Sixth and finally, CGOS is a community resource that becomes more valuable
> the more people use it. So use it! Run your program as often as possible.
> Pebbles has played ~16000 games. I rarely have new software, but Pebbles
> plays anyway.
> 
> Following this process raised Pebbles by ~1200 rating points over a
> 6-month period, all on the same hardware. Pebbles now beats GnuGo ~94% of
> the time with no specific tuning towards that goal. If I targeted GnuGo,
> the percentage would run well over 100%. :-)
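Inverting the logistic Elo model turns a win rate p into a rating gap of 400 * log10(p / (1 - p)). The sketch below assumes that model; it also shows why the joke works, since the gap grows without bound as p approaches 1 and a 100% win rate has no finite Elo equivalent. The function name is illustrative.

```python
import math


def elo_difference(win_rate: float) -> float:
    """Rating gap implied by a win rate under the logistic Elo model."""
    if not 0.0 < win_rate < 1.0:
        raise ValueError("win rate must be strictly between 0 and 1")
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))


if __name__ == "__main__":
    for p in (0.5, 0.75, 0.94, 0.99):
        print(f"{p:.2f} -> {elo_difference(p):+.0f} Elo")
```

By this estimate, ~94% against GnuGo corresponds to a gap of roughly 478 points, with the caveat that a gap measured against a single opponent need not hold against others.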
> 
> But there are larger and higher goals. Lingo, Aya, Valkyria, MFGO, Fuego,
> and Mogo take turns teaching me how to write Go programs. Someday Pebbles
> will return the favor. After I buy an 8-core, maybe. :-)
> 
> 
> 
> _______________________________________________
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
