[computer-go] experiments in alpha beta

Don Dailey Thu, 07 Feb 2008 08:42:36 -0800

I'm thinking about upgrading my palm OS  Go program and I'm looking for
ways to strengthen it without adding a lot of code or memory.  
Basically I want the strongest possible program that is truly "light" in
every way.


PDA's of course are MUCH slower than PC's,  and memory is at a premium
although that has been improved somewhat over the years.   However for
the current Palm OS,  using a lot of memory is a bit of a hack,
reminding me of the old days of DOS where you could use a lot of memory
but you had to do extra programming to get away with it.  

My first effort was to make the play-outs heavier.   It turns out that
this is pretty easy.    I won't elaborate but simply decreasing or
increasing the frequency of certain kinds of moves made a pretty
substantial improvement without adding too much code.   It does of
course slow the program considerably but there is net gain (and I have
not done an optimization pass.)   The original program does not improve
much over 5000 simulations and that is about the limit on a modern palm,
a happy coincidence for a light program.   The palm requires 2 sets of
code if you want to take advantage of the ARM processors, and this of
course increases the code footprint, but this must be done to enable
5000 play-outs in a reasonable amount of time.     I have not tested if
the heavy play-outs also "tops out" at 5000.    I probably cannot
currently have a practical playing level at 5000 play-outs with heavy
play-outs.  

Another idea is to have a UCT style program that severely limits the
tree, perhaps a 2-4 depth UCT program.  For the time being, I am not
going to walk that path since this is more complicated, I would probably
have to bring in math libraries to handle the calculation part  and this
may be a major drag on a palm device.    I think it would be a major
project to get this working really well without it being a resource hog.

Just playing around I discovered an interesting idea that I may try to
develop.   While thinking of the limitations of pure monte carlo without
a tree,   one of the simple cases that stands out is the programs
willingness to move into self-atari if the naive MC play-outs believes
the threat of the capture is strong.   The play-outs are naive, and the
threat outweighs the refutation from the point of view of the play-outs,
even though the opposite is usually  true in most positions.    Thus you
will often see pure MC playing stupid self-atari moves.       I use an
artificial fix in the current palm program (which is identical to
AnchorMan.)

All-moves-as-first is a very powerful resource saving technique that has
served Ogo and AnchorMan well, allowing a relatively passable game with
very little CPU and memory resources.    But some of kind of lookahead
would solve a lot of problems.     So that gave me the idea of ADDING a
search to the move logic.     The program first does it's play-outs,  
then it does a shallow alpha/beta search, using information from the
original play-outs as an evaluation function.   So here is my first attempt:

1.  Do N play-outs.

2.  At the end of each play-out, build an all-moves-as-first table for
BOTH colors,  tracking the count and score for each move (as many
programs do now.)

3.  After the N play-outs, do a shallow alpha/beta search.  (I tried 2-6
ply) 

4.  The evaluation function for the search is simply the sum of the
scores of all the moves leading to the leaf node divided by the sum of
the counts of all the moves leading the leaf nodes. 

This is a very simple, but not a very good evaluation function.   It
knows nothing about move order or timing of certain moves.   
Nevertheless, it is not completely useless either.   When combined with
alpha beta it can figure out some useful concepts.   For instance
self-atari gets punished to an extent because the opponent see's the
killer move.    On a 4 ply search, the opponent even can see that it
gets to occupy the vacated points - and these moves will tend to be high
scoring moves.  

I ran off about 500 games per depth (1, 2, 3 and 4 ply searches) and
found that the program improves substantially on the even ply
searches.   The 3 ply search, with reasonable statistical certainty, is
no better (or not substantially) better than the 2 ply search and shows
about 10 ELO weaker.    The 4 ply search shows a 100 ELO jump over 2 and
3.    

The 4 ply search is very fast because the evaluation function is trivial
- taking just a fraction of a second.   The 6 ply search (which still
shows a modest  improvement) takes less than a second on a core 2 duo
when I implemented reasonable move ordering. 

So my conclusion is that this, by itself (doing a 2 or 4 ply search), is
a reasonable way to get a modest improvement while keeping the program
extremely "light."    

Still,  it's not very satisfying.   I'm quite sure we could do far
better that this.   To generalize, what we want is this:    a way  to
pre-process position specific data to enable an alpha/beta searcher to
be especially effective.    So I have some more ideas for super-light
programs. 

In chess, the older programs made heavy use of what was called
piece-square-tables.   The tables stated exactly what score to give each
piece type on each square and these were all summed together to create a
super fast, and yet relatively powerful, evaluation function.    Before
the search begins, an "oracle" process determines what scores to assign
each table entry.   The oracle can afford to be very sophisticated,  it
could understand what type of position called for what kind of moves.   
For instance if the position called for queen side castling, the table
was adjusted to "encourage" queen side castling.     Knight outposts
could be located and those squares be given a bonus and so on.     The
table layed out  general goals,  and the search integrated those goals
into a coherent plan.  

It occurs to me that this is very similar to what I am doing here.    
It might be possible,  with the help of play-outs,  to build "something
like" a piece square table that enables a very simple and fast
evaluation function to effectively do it's job.       A trivial example
- if you can identify that a self-atari is a bad move, you make
adjustments to the table to ensure that a search will quickly discard
such a move.    If a certain piece of territory is a forgone conclusion,
you could tell the "piece square table" that moving there doesn't
contribute to the position (or even detracts from it.)  

It's my hope that if especially good techniques in this direction are
found,  it might even be applied effectively to other kinds of programs,
in a hybrid way.

- Don













_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

[computer-go] experiments in alpha beta

Reply via email to