Yamato wrote:
I guess the current top programs have much better playout policy than
the classical MoGo-style one.

The original policy of MoGo was,

(1) If the last move is an atari, play one saving move at random.
(2) If there are "interesting" moves in the 8 positions around the
    last move, play one at random.
(3) If there are moves that capture stones, play one at random.
(4) Play one random move on the board.
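
The four steps above form a priority cascade, which can be sketched roughly as follows. The function name and the four candidate lists are hypothetical; a real engine would compute them from the board after each move, and this is not MoGo's actual code.

```python
import random

def choose_playout_move(atari_saves, local_pattern_moves, capture_moves,
                        legal_moves, rng=random):
    """Pick a playout move using the four-step cascade.

    Each argument is a list of candidate moves, assumed to be computed
    elsewhere by (hypothetical) board code. Returns None when no move
    is available, which the caller can treat as a pass.
    """
    # Try each category in priority order; the first non-empty one wins.
    for candidates in (atari_saves, local_pattern_moves,
                       capture_moves, legal_moves):
        if candidates:
            return rng.choice(candidates)
    return None
```

Note that within a category every candidate is equally likely, which is exactly the part that the improvements discussed in this thread try to change.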

I (and maybe many others) use it with some improvements; however, it
will not be enough to catch up with the top programs.

What improvements did you try? The obvious ones I know are prioritizing saving and capturing moves by the size of the string.

Zen appears quite strong on CGOS. Leela using the above system was certainly weaker.

Then I tested many changes to the probability distributions, but
it was very hard to improve the strength.

Any comments?

I had the same problem, i.e. it seems almost impossible to improve the MoGo system by having a different pattern set for "interesting" moves, or even by varying the probability of "interesting" moves by pattern score.

I tried 2 things:

a) I extracted about 5000 positions with a known winner (determined by UCT) from CGOS games, and measured the Mean Square Error of the result of my playouts against the known result (also described in one of the MoGo papers). Then I applied a genetic algorithm to optimize the playout patterns.
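
One plausible reading of that MSE measure, as a minimal sketch: for each position, average the playout results and compare the average with the known outcome. The names playout_mse and run_playout are hypothetical, and the encoding of results as 1 (win for the reference colour) and 0 (loss) is an assumption.

```python
def playout_mse(positions, run_playout, n_playouts=100):
    """Mean squared error of playout win estimates against known results.

    positions   : list of (position, known_result) pairs, known_result
                  being 1.0 or 0.0 (assumed encoding).
    run_playout : hypothetical callable returning 1 or 0 for one playout
                  from the given position.
    """
    total = 0.0
    for position, known_result in positions:
        wins = sum(run_playout(position) for _ in range(n_playouts))
        estimate = wins / n_playouts          # playout win rate
        total += (estimate - known_result) ** 2
    return total / len(positions)
```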

This worked, in the sense that the MSE measured over the 5000 positions dropped. However, it did not produce a stronger program! I found that somewhat shocking.

It makes me doubt the value of the MSE measure.

b) I made a simple genetic algorithm that creates a random pool of a few hundred playout policies, picks 2 random parents and crosses over/mutates them into 2 children, plays a 10 game match between the 2 children with simulations = 100, and then keeps the winner.

This did not produce anything interesting either. My best guess is that the match results are simply too random.
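
For concreteness, one step of such a scheme might look like the sketch below. Everything here is an assumption made for illustration: a policy is represented as a list of at least two pattern weights, play_match is a hypothetical function returning the number of games the first policy won, and the winner replaces a random pool member (the original description only says the winner is kept).

```python
import random

def crossover(a, b, rng):
    """Single-point crossover of two equal-length weight lists."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(policy, rng, rate=0.1, scale=0.2):
    """Rescale each weight by up to +/-scale with probability rate."""
    return [w * (1 + rng.uniform(-scale, scale)) if rng.random() < rate else w
            for w in policy]

def evolve_step(pool, play_match, rng, games=10):
    """Breed two children, play a short match, keep the winner in the pool."""
    i, j = rng.sample(range(len(pool)), 2)
    child_a, child_b = crossover(pool[i], pool[j], rng)
    child_a, child_b = mutate(child_a, rng), mutate(child_b, rng)
    wins_a = play_match(child_a, child_b, games)   # hypothetical match runner
    winner = child_a if 2 * wins_a >= games else child_b
    # Assumption: the winner overwrites a randomly chosen pool member.
    pool[rng.randrange(len(pool))] = winner
    return winner
```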

So I did not find any way to automatically optimize the patterns.

I finally improved my playouts by using Remi's Elo system to learn a set of "interesting" patterns, and just randomly fiddling with the probabilities (compressing/expanding) until something improved my program in self-play by about +25%. Not a very satisfying method or an exceptional result. There could be some other magic combination that is even better, or maybe not.

I have now got some improvement by "merging" steps (1), (2) and (3) of the MoGo system and using probabilities on the result. It makes sense because the playouts won't try hopeless saving moves, for example.
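
A rough sketch of what such a merge could look like, assuming every saving, pattern and capturing candidate gets one weight and a single weighted draw replaces the cascade. The name choose_merged, the weights, and the fallback to a uniformly random legal move are illustrative assumptions, not the actual implementation.

```python
import random

def choose_merged(weighted_candidates, legal_moves, rng=random):
    """Draw one move from the merged candidate set by weight.

    weighted_candidates : list of (move, weight) pairs covering saving,
                          pattern and capturing moves together. A hopeless
                          saving move can simply be given weight 0.
    legal_moves         : fallback list used when no candidate has weight.
    """
    moves = [m for m, w in weighted_candidates if w > 0]
    weights = [w for _, w in weighted_candidates if w > 0]
    if moves:
        return rng.choices(moves, weights=weights, k=1)[0]
    # No weighted candidate: fall back to a uniformly random legal move.
    return rng.choice(legal_moves) if legal_moves else None
```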

What is so frustrating is that the playouts are essentially black magic. I know of no way to automatically determine what is good and not besides playing about 500 games between 2 strategies. The results are very often completely counterintuitive. There is no systematic way to improve.
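
The need for hundreds of games follows from the binomial noise on a measured win rate. A small sketch using the normal approximation (the function name is hypothetical, and the approximation is rough for very short matches):

```python
import math

def win_rate_margin(games, p=0.5, z=1.96):
    """Approximate 95% margin of error on a win rate measured over `games`.

    Uses the normal approximation to the binomial distribution, with
    p as the assumed true win rate.
    """
    return z * math.sqrt(p * (1 - p) / games)

# Roughly +/-0.31 for a 10 game match, roughly +/-0.044 for 500 games.
for n in (10, 500):
    print(n, round(win_rate_margin(n), 3))
```

So a 10 game match between two nearly equal policies says almost nothing, while 500 games can only resolve differences of several percent in win rate.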

--
GCP
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
