[computer-go] Testing Process

Brian Sheppard Mon, 28 Sep 2009 15:10:07 -0700

>By now, I should probably find better reference opponent than
>gnugo... I wonder if to pick fuego or mogo... ;-) Strength is probably
>not _as_ important as the variety of techniques used in order to avoid
>selective blindness (that's why I don't like tuning by self-play),
>does anyone have a tip? Or do higher gnugo levels make much strength
>difference?


Pebbles doesn't follow the norms, but I am very happy with my solution: just
play constantly on CGOS.

CGOS provides a range of opponents of different styles, including non-MCTS
opponents that use neural networks or alpha-beta. You can play 140 games per
day, which is plenty.

You are right about selective blindness. Some programs, like GnuGo,
regularly donate rating points to Pebbles. Others, like Valkyria, take those
away. You need a range of opponents. Overall, CGOS gives a realistic measure
of where you stand.

I have some suggestions for effective use of CGOS for testing purposes.

First, I don't aim to have the highest-rated program, because I am not
caught up in the hardware race. Instead, I use a modest CPU and try to write
good software. Some programs run on an 8-core processor, play exactly 100
games (to get a non-provisional rating), lose only 10, and then disappear.
The programmers must have good reasons for this, but running on elite
hardware obscures flaws in the software, so it isn't for me. Even when
Pebbles has better hardware, I will continue to run a version on CGOS using
low-end hardware. I learn more from losses!

Second, I run two identical copies of Pebbles. The second is called
PebblesToo. Both copies run on the same dual-core machine, so they have the
same performance over the long run. PebblesToo serves three purposes. One,
it doubles the pace of data collection. This is a far better use of the
second core than playing twice as fast, IMO. Two, when Pebbles is matched up
against Valkyria, PebblesToo is necessarily playing a weaker opponent, which
provides a more balanced view. Three, Pebbles will get some self-play games,
which are a necessary part of an overall testing strategy.
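To put a rough number on what doubling the pace of data collection buys, here is a small sketch. It assumes games are independent win/loss trials (a simple binomial model, which is an assumption and not something CGOS or Pebbles reports), and the function name is hypothetical.

```python
import math


def win_rate_standard_error(games: int, p: float = 0.5) -> float:
    """Standard error of an observed win rate after `games` games.

    Assumes independent games with true win probability `p`
    (binomial model, draws ignored). This is an illustrative
    assumption, not a description of how CGOS computes anything.
    """
    if games <= 0:
        raise ValueError("games must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return math.sqrt(p * (1.0 - p) / games)


# One copy at ~140 games per day versus two copies at ~280 games per day.
one_copy = win_rate_standard_error(140)
two_copies = win_rate_standard_error(280)
```

Under these assumptions the uncertainty in a day's win rate drops from about 4.2 percentage points to about 3.0, a factor of sqrt(2).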

Third, I run a version of Fuego that is rated a little below Pebbles. It is
called fuego-0.4-slow. This program serves two purposes. One, it sucks up
games against very low-rated players (because CGOS favors pairings of equal
opponents). Two, it provides a "nearby" program that is always available, so
if Aya and Lingo are offline there is still an opponent of about the right
level.

Fourth, Pebbles plays a lot of 9x9 games. Such games give your program an
intense tactical workout at 10 minutes per game. I assure you that no
current program has adequate tactics. Most strong programs have opening
libraries and run on much more powerful computers. When Pebbles defeats such
an opponent, it is invariably because the opponent overlooked a tactical
shot. IMO, programmers who disdain 9x9 are learning about flaws more slowly than
necessary.

Fifth, Pebbles saves two positions from every loss: the last position in
which it thought it was winning (eval of the selected move >= 50%), and the
position in which it thought it had the greatest advantage.

Pebbles regularly (~1 or 2 games per day) loses games where it thinks it
will win >90% of the time. I always learn something by analyzing those
games.
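The bookkeeping in the fifth suggestion can be sketched as follows. This is a minimal illustration, not Pebbles' actual code: `MoveRecord`, `positions_to_save`, and the idea of storing the position as a string are all hypothetical names and choices made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MoveRecord:
    move_number: int
    position: str        # hypothetical: any serialized board state
    win_estimate: float  # engine's eval of the move it selected, 0.0 to 1.0


def positions_to_save(
    history: List[MoveRecord], threshold: float = 0.5
) -> Tuple[Optional[MoveRecord], Optional[MoveRecord]]:
    """For a lost game, pick the two positions worth analyzing.

    Returns a pair:
      - the last record whose win_estimate is >= threshold, or None
        if the engine never believed it was winning;
      - the record with the highest win_estimate (earliest one on ties),
        or None if the history is empty.
    """
    last_winning: Optional[MoveRecord] = None
    greatest_advantage: Optional[MoveRecord] = None
    for record in history:
        if record.win_estimate >= threshold:
            last_winning = record
        if (greatest_advantage is None
                or record.win_estimate > greatest_advantage.win_estimate):
            greatest_advantage = record
    return last_winning, greatest_advantage
```

Calling `positions_to_save(history)` once per lost game yields the two records to write to disk. Filtering afterwards on `greatest_advantage.win_estimate > 0.9` would single out the games where the program was most confidently wrong.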

Sixth and finally, CGOS is a community resource that is more valuable when
it is used more, by more people. So use it! Run your program as often as
possible. Pebbles has played ~16000 games. I rarely have new software, but
Pebbles plays anyway.

Following this process raised Pebbles by ~1200 rating points over a 6-month
period, all on the same hardware. Pebbles now beats GnuGo ~94% of the time
with no specific tuning towards that goal. If I targeted GnuGo, the
percentage would run well over 100%. :-)
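For a sense of scale, a win rate can be converted to a rating gap. The sketch below assumes the standard logistic Elo model with a 400-point scale. That is an assumption about the rating model, not a statement of how CGOS computes its ratings, and the function names are hypothetical.

```python
import math


def elo_diff_from_win_rate(p: float) -> float:
    """Rating gap implied by win probability p under the logistic Elo model."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return 400.0 * math.log10(p / (1.0 - p))


def win_rate_from_elo_diff(diff: float) -> float:
    """Expected win probability for a player rated `diff` points higher."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


gap = elo_diff_from_win_rate(0.94)
```

Under that model a 94% win rate corresponds to a gap of roughly 478 points.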

But there are larger and higher goals. Lingo, Aya, Valkyria, MFGO, Fuego,
and Mogo take turns teaching me how to write Go programs. Someday Pebbles
will return the favor. After I buy an 8-core, maybe. :-)



