I'm thinking about upgrading my palm OS Go program and I'm looking for ways to strengthen it without adding a lot of code or memory. Basically I want the strongest possible program that is truly "light" in every way.
PDA's of course are MUCH slower than PC's, and memory is at a premium although that has been improved somewhat over the years. However for the current Palm OS, using a lot of memory is a bit of a hack, reminding me of the old days of DOS where you could use a lot of memory but you had to do extra programming to get away with it. My first effort was to make the play-outs heavier. It turns out that this is pretty easy. I won't elaborate but simply decreasing or increasing the frequency of certain kinds of moves made a pretty substantial improvement without adding too much code. It does of course slow the program considerably but there is net gain (and I have not done an optimization pass.) The original program does not improve much over 5000 simulations and that is about the limit on a modern palm, a happy coincidence for a light program. The palm requires 2 sets of code if you want to take advantage of the ARM processors, and this of course increases the code footprint, but this must be done to enable 5000 play-outs in a reasonable amount of time. I have not tested if the heavy play-outs also "tops out" at 5000. I probably cannot currently have a practical playing level at 5000 play-outs with heavy play-outs. Another idea is to have a UCT style program that severely limits the tree, perhaps a 2-4 depth UCT program. For the time being, I am not going to walk that path since this is more complicated, I would probably have to bring in math libraries to handle the calculation part and this may be a major drag on a palm device. I think it would be a major project to get this working really well without it being a resource hog. Just playing around I discovered an interesting idea that I may try to develop. While thinking of the limitations of pure monte carlo without a tree, one of the simple cases that stands out is the programs willingness to move into self-atari if the naive MC play-outs believes the threat of the capture is strong. The play-outs are naive, and the threat outweighs the refutation from the point of view of the play-outs, even though the opposite is usually true in most positions. Thus you will often see pure MC playing stupid self-atari moves. I use an artificial fix in the current palm program (which is identical to AnchorMan.) All-moves-as-first is a very powerful resource saving technique that has served Ogo and AnchorMan well, allowing a relatively passable game with very little CPU and memory resources. But some of kind of lookahead would solve a lot of problems. So that gave me the idea of ADDING a search to the move logic. The program first does it's play-outs, then it does a shallow alpha/beta search, using information from the original play-outs as an evaluation function. So here is my first attempt: 1. Do N play-outs. 2. At the end of each play-out, build an all-moves-as-first table for BOTH colors, tracking the count and score for each move (as many programs do now.) 3. After the N play-outs, do a shallow alpha/beta search. (I tried 2-6 ply) 4. The evaluation function for the search is simply the sum of the scores of all the moves leading to the leaf node divided by the sum of the counts of all the moves leading the leaf nodes. This is a very simple, but not a very good evaluation function. It knows nothing about move order or timing of certain moves. Nevertheless, it is not completely useless either. When combined with alpha beta it can figure out some useful concepts. For instance self-atari gets punished to an extent because the opponent see's the killer move. On a 4 ply search, the opponent even can see that it gets to occupy the vacated points - and these moves will tend to be high scoring moves. I ran off about 500 games per depth (1, 2, 3 and 4 ply searches) and found that the program improves substantially on the even ply searches. The 3 ply search, with reasonable statistical certainty, is no better (or not substantially) better than the 2 ply search and shows about 10 ELO weaker. The 4 ply search shows a 100 ELO jump over 2 and 3. The 4 ply search is very fast because the evaluation function is trivial - taking just a fraction of a second. The 6 ply search (which still shows a modest improvement) takes less than a second on a core 2 duo when I implemented reasonable move ordering. So my conclusion is that this, by itself (doing a 2 or 4 ply search), is a reasonable way to get a modest improvement while keeping the program extremely "light." Still, it's not very satisfying. I'm quite sure we could do far better that this. To generalize, what we want is this: a way to pre-process position specific data to enable an alpha/beta searcher to be especially effective. So I have some more ideas for super-light programs. In chess, the older programs made heavy use of what was called piece-square-tables. The tables stated exactly what score to give each piece type on each square and these were all summed together to create a super fast, and yet relatively powerful, evaluation function. Before the search begins, an "oracle" process determines what scores to assign each table entry. The oracle can afford to be very sophisticated, it could understand what type of position called for what kind of moves. For instance if the position called for queen side castling, the table was adjusted to "encourage" queen side castling. Knight outposts could be located and those squares be given a bonus and so on. The table layed out general goals, and the search integrated those goals into a coherent plan. It occurs to me that this is very similar to what I am doing here. It might be possible, with the help of play-outs, to build "something like" a piece square table that enables a very simple and fast evaluation function to effectively do it's job. A trivial example - if you can identify that a self-atari is a bad move, you make adjustments to the table to ensure that a search will quickly discard such a move. If a certain piece of territory is a forgone conclusion, you could tell the "piece square table" that moving there doesn't contribute to the position (or even detracts from it.) It's my hope that if especially good techniques in this direction are found, it might even be applied effectively to other kinds of programs, in a hybrid way. - Don _______________________________________________ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/