Re: [computer-go] How to design the stronger playout policy?

Gian-Carlo Pascutto Sat, 05 Jan 2008 02:48:56 -0800

Yamato wrote:

I guess the current top programs have a much better playout policy than
the classical MoGo-style one.


The original policy of MoGo was,

(1) If the last move is an atari, play one saving move at random.
(2) If there are "interesting" moves in the 8 positions around the
    last move, play one at random.
(3) If there are moves that capture stones, play one at random.
(4) Play one random move on the board.
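
A minimal Python sketch of the four-rule cascade above. It assumes the caller has already generated the candidate lists; the function name and the parameter names are hypothetical and are not taken from MoGo.

```python
import random

def choose_playout_move(atari_saves, local_pattern_moves, captures,
                        legal_moves, rng=random):
    """Pick a playout move using the rule order (1)-(4) described above.

    Each argument is a list of candidate moves produced by a (hypothetical)
    move generator. The first non-empty list wins, and a move is drawn
    uniformly at random from it.
    """
    for candidates in (atari_saves, local_pattern_moves, captures, legal_moves):
        if candidates:
            return rng.choice(candidates)
    return None  # no legal move left: the caller should pass

# Usage: no saving move is available, so a local pattern move is chosen.
move = choose_playout_move([], ["D4"], ["Q16"], ["D4", "Q16", "K10"])
```

The strict ordering means a lower rule is never consulted while a higher one has any candidate, however poor that candidate is.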

I (and maybe many others) use it with some improvements, however it
will not be enough to catch up with the top programs.

What improvements did you try? The obvious ones I know are prioritizing saving and capturing moves by the size of the string.

Zen appears quite strong on CGOS. Leela using the above system was certainly weaker.

Then I tested many changes to the probability distributions, but
it was very hard to improve the strength.

Any comments?

I had the same problem, i.e. it seems almost impossible to improve the MoGo system by having a different pattern set for "interesting" moves, or even by varying the probability of "interesting" moves by pattern score.


I tried 2 things:

1) I extracted about 5000 positions with a known winner (determined by UCT) from CGOS games, and measured the Mean Square Error of the result of my playouts against the known result (also described in one of the MoGo papers). Then I applied a genetic algorithm to optimize the playout patterns.

This worked, in the sense that the MSE measured over the 5000 positions dropped. However, it did not produce a stronger program! I found that somewhat shocking.


It makes me doubt the value of the MSE measure.
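
The MSE measure from the first experiment can be sketched roughly as follows. This is only an illustration: it assumes the error is taken between the mean playout result for a position and the known result, which the description above does not spell out. `run_playout` is a hypothetical callable returning 1 for a Black win and 0 otherwise.

```python
def playout_mse(positions, run_playout, n_playouts=100):
    """Mean squared error of playout outcomes against known results.

    positions   -- list of (position, known_result) pairs, where
                   known_result is 1.0 if Black wins and 0.0 otherwise
    run_playout -- hypothetical callable: position -> 1 or 0
    """
    total = 0.0
    for position, known in positions:
        # Average outcome of n_playouts playouts from this position.
        mean = sum(run_playout(position) for _ in range(n_playouts)) / n_playouts
        total += (mean - known) ** 2
    return total / len(positions)
```

A lower value only says the playouts predict the winner better on average over the sample; as noted above, that did not translate into a stronger program.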

2) I made a simple genetic algorithm that makes a random pool of a few hundred playout policies, picks 2 random parents and crosses over/mutates them into 2 children, plays a 10 game match between the 2 children with simulations = 100, and then keeps the winner.
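
One step of such a genetic algorithm might look like the sketch below. Several details are assumptions made for illustration only: a policy is a flat list of non-negative weights, crossover is uniform, mutation adds Gaussian noise, and the winning child replaces one of its parents in the pool. `play_match` is a hypothetical callable that returns how many games the first policy won.

```python
import random

def crossover(a, b, rng):
    """Uniform crossover: each gene goes to one child or the other."""
    c1, c2 = [], []
    for x, y in zip(a, b):
        if rng.random() < 0.5:
            x, y = y, x
        c1.append(x)
        c2.append(y)
    return c1, c2

def mutate(policy, rng, rate=0.1, scale=0.2):
    """Perturb some weights with Gaussian noise, clamped at zero."""
    return [max(0.0, w + rng.gauss(0, scale)) if rng.random() < rate else w
            for w in policy]

def evolve_step(pool, play_match, rng, games=10):
    """Pick 2 parents, breed 2 children, keep the match winner."""
    i, j = rng.sample(range(len(pool)), 2)
    c1, c2 = crossover(pool[i], pool[j], rng)
    c1, c2 = mutate(c1, rng), mutate(c2, rng)
    wins_c1 = play_match(c1, c2, games)
    winner = c1 if 2 * wins_c1 >= games else c2  # a tied match goes to c1
    pool[rng.choice((i, j))] = winner  # assumption: winner replaces a parent
    return winner
```
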

This did not produce anything interesting either. My best guess is that the match results are simply too random.
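
To get a feel for how random a 10 game match is, the following sketch computes the probability that the genuinely better side wins the match. It assumes independent games with a fixed per-game win probability and no draws, and it counts a tied match as a coin flip; these are modelling assumptions, not details from the experiment.

```python
from math import comb

def prob_better_wins_match(p, games=10):
    """Probability that a side with per-game win rate p wins the match."""
    total = 0.0
    for k in range(games + 1):
        # Binomial probability of exactly k wins out of `games`.
        pk = comb(games, k) * p**k * (1 - p) ** (games - k)
        if 2 * k > games:
            total += pk
        elif 2 * k == games:
            total += 0.5 * pk  # tied match: decided by a coin flip
    return total
```

Under these assumptions, a child that would win 55% of individual games (an illustrative figure) comes out ahead in only about 62% of 10 game matches, so the selection step is wrong more than a third of the time.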


So I did not find any way to automatically optimize the patterns.

I finally improved my playouts by using Remi's Elo system to learn a set of "interesting" patterns, and just randomly fiddling with the probabilities (compressing/expanding) until something improved my program in self-play by about +25%. Not a very satisfying method or an exceptional result. There could be some other magic combination that is even better, or maybe not.
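
One possible reading of "compressing/expanding" the probabilities is to raise each one to a power and renormalize. The sketch below shows that interpretation only; it is an assumption and not necessarily the transformation that was actually used.

```python
def reshape_probabilities(probs, exponent):
    """Flatten or sharpen a probability distribution.

    exponent < 1 compresses the distribution (moves it toward uniform),
    exponent > 1 expands it (strong patterns become even more likely),
    exponent == 1 leaves it unchanged.
    """
    powered = [p ** exponent for p in probs]
    total = sum(powered)
    return [p / total for p in powered]

# Usage: compress and expand the same two-pattern distribution.
flatter = reshape_probabilities([0.8, 0.2], 0.5)
sharper = reshape_probabilities([0.8, 0.2], 2.0)
```
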

I now got some improvement by "merging" rules (1), (2) and (3) of the MoGo system and using probabilities on that. It makes sense because the playouts won't try hopeless saving moves, for example.
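
A rough sketch of what merging the first three rules could look like: all saving, pattern and capture candidates go into one pool and a move is drawn in proportion to its weight. The function name and the way weights are assigned are hypothetical.

```python
import random

def choose_merged(candidates, rng=random):
    """Draw one move from a pooled list of (move, weight) pairs.

    candidates pools the moves that rules (1), (2) and (3) would have
    proposed separately. A weight of 0 removes a move from consideration,
    e.g. a saving move judged hopeless.
    """
    moves = [m for m, w in candidates if w > 0]
    weights = [w for m, w in candidates if w > 0]
    if not moves:
        return None  # caller falls back to a uniformly random move, rule (4)
    return rng.choices(moves, weights=weights, k=1)[0]
```

Unlike the strict cascade, a capture with a high weight can now be preferred over a weak saving move even when the last move was an atari.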

What is so frustrating is that the playouts are essentially black magic. I know of no way to automatically determine what is good and what is not, besides playing about 500 games between 2 strategies. The results are very often completely counterintuitive. There is no systematic way to improve.
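
As a rough guide to what 500 games can resolve, the sketch below computes a confidence interval for a measured win rate. It uses the normal approximation to the binomial and assumes independent games without draws; both are simplifying assumptions.

```python
from math import sqrt

def win_rate_interval(wins, games, z=1.96):
    """Approximate 95% confidence interval for a win rate (z = 1.96)."""
    p = wins / games
    half_width = z * sqrt(p * (1 - p) / games)
    return p - half_width, p + half_width

# Usage: an even 250-250 result over 500 games.
low, high = win_rate_interval(250, 500)
```

For an even result over 500 games the interval is about 4.4 percentage points either side of 50%, so differences smaller than that cannot be told apart from noise even at this sample size.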


--
GCP
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
