Yamato wrote:

I finally improved my playouts by using Remi's Elo system to learn a set of "interesting" patterns, and then randomly fiddling with the probabilities (compressing/expanding) until one setting improved my program by about +25% in self-play. Neither a very satisfying method nor an exceptional result. There could be some other magic combination that is even better, or maybe not.

I have also implemented Remi's Minorization-Maximization (MM) algorithm, but I could not find a way to use its results to improve playing strength.
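For reference, here is a minimal Python sketch of one MM iteration in the simplest setting. It assumes every candidate move is described by a single pattern (no combinations of features) and omits the prior (virtual games) that Coulom's method adds for regularisation. With W_i the number of times pattern i was the move actually played, E_j the summed strength of all candidates in position j, and c_ij the number of candidates in position j that have pattern i, the update is gamma_i <- W_i / sum_j (c_ij / E_j). The names `mm_iteration` and `Position` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Tuple

# One training position: (pattern of the move actually played,
#                         patterns of all legal candidate moves, incl. that move)
Position = Tuple[Hashable, List[Hashable]]

def mm_iteration(positions: List[Position],
                 gamma: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """One Minorization-Maximization step for single-pattern moves (no prior)."""
    wins: Dict[Hashable, float] = defaultdict(float)   # W_i
    denom: Dict[Hashable, float] = defaultdict(float)  # sum_j c_ij / E_j
    for played, candidates in positions:
        wins[played] += 1.0
        e_j = sum(gamma[p] for p in candidates)        # total strength in position j
        for p, c_ij in Counter(candidates).items():
            denom[p] += c_ij / e_j
    # Patterns that never appeared as candidates keep their old value.
    return {p: (wins[p] / denom[p] if denom[p] > 0 else g)
            for p, g in gamma.items()}
```

Repeating the step until the gammas stop changing gives the maximum-likelihood strengths, which are only defined up to a common scale factor. A pattern that was never played is driven to 0, which is one reason a prior is normally added.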
Could you explain the details of your playout policy?

(1) Captures of groups that could not save themselves on the last move.
(2) Saving groups put in atari by the last move, by capturing or extending.
(3) Patterns next to the last move.
(4) Global moves.
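The four phases above can be read as a priority cascade: each phase is tried in order, and the first one that yields a candidate decides the move. A minimal Python sketch, assuming each phase is a hypothetical callable returning (move, weight) pairs; the board interface and the weights are placeholders, not the actual implementation.

```python
import random
from typing import Callable, List, Optional, Tuple

Move = Tuple[int, int]
# A phase is a hypothetical callable returning weighted candidate moves.
Phase = Callable[[], List[Tuple[Move, float]]]

def pick_playout_move(phases: List[Phase], rng: random.Random) -> Optional[Move]:
    """Try each phase in priority order; sample from the first non-empty one."""
    for phase in phases:
        candidates = [(m, w) for m, w in phase() if w > 0]
        if candidates:
            moves, weights = zip(*candidates)
            return rng.choices(moves, weights=weights, k=1)[0]
    return None  # no candidate in any phase: pass
```

Note that in this structure a weak move from an earlier phase always beats a strong move from a later phase, which is the weakness discussed further on in the thread.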

I quantize the MM pattern scores to 0..255 by multiplying them by a large constant and clipping. Because of the clipping, the "very good" patterns end up with similar scores. I then use a threshold so that the very bad patterns are never played at all. The remaining moves are played with the probabilities given by the quantized values.
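A Python sketch of this quantization step. The scale factor and threshold below are hypothetical values, since the post does not give the actual constants: strengths are multiplied by the constant, rounded, clipped to 0..255, moves under the threshold are dropped, and the rest are sampled in proportion to their quantized weight.

```python
import random
from typing import Dict, Optional

SCALE = 1000.0   # hypothetical "large constant"
THRESHOLD = 4    # hypothetical cut-off for "very bad" patterns

def quantize(gamma: float, scale: float = SCALE) -> int:
    """Map an MM strength to an integer weight in 0..255 (clipped)."""
    return max(0, min(255, int(round(gamma * scale))))

def sample_pattern_move(gammas: Dict[str, float], rng: random.Random,
                        threshold: int = THRESHOLD) -> Optional[str]:
    """Drop moves below the threshold, sample the rest by quantized weight."""
    weights = {m: quantize(g) for m, g in gammas.items()}
    allowed = {m: w for m, w in weights.items() if w >= threshold}
    if not allowed:
        return None
    moves = list(allowed)
    return rng.choices(moves, weights=[allowed[m] for m in moves], k=1)[0]
```

With these example constants every pattern with a strength of 0.255 or more maps to 255, so all of them are played with equal weight; that is the compression effect on the "very good" patterns.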

I also throw away very bad moves in phase (4) unless there are no alternatives. This gives a small but measurable improvement.
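The phase (4) rule, "drop very bad moves unless nothing else is left", amounts to a filter with a fallback. A minimal Python sketch; `bad_below` is a hypothetical cut-off on the quantized weight.

```python
from typing import List, Tuple, TypeVar

M = TypeVar("M")

def prune_unless_empty(candidates: List[Tuple[M, int]],
                       bad_below: int) -> List[Tuple[M, int]]:
    """Drop 'very bad' global moves, but keep them if nothing else remains."""
    good = [(m, w) for m, w in candidates if w >= bad_below]
    return good if good else candidates
```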

But now I believe all of the above is actually flawed. With this system I will play bad saving moves even when great pattern moves are available, because the saving phase is tried first. It might be that your ladder detection avoids these problems somewhat.

Considering the probabilities of all moves together, as Crazy Stone does, avoids this problem.
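To illustrate the difference, here is a Python sketch of a single distribution over all moves, assuming each move carries a list of feature strengths (for example one for "saves a group from atari" and one for its 3x3 pattern) whose product is the move's strength. The feature values in the test are invented for illustration.

```python
import math
from typing import Dict, List

def move_strength(feature_gammas: List[float]) -> float:
    """Strength of a move: product of the gammas of all features it has."""
    return math.prod(feature_gammas)

def move_probabilities(moves: Dict[str, List[float]]) -> Dict[str, float]:
    """Normalise strengths of all legal moves into one distribution."""
    strengths = {m: move_strength(f) for m, f in moves.items()}
    total = sum(strengths.values())
    return {m: s / total for m, s in strengths.items()}
```

Here a weak saving move no longer pre-empts a strong pattern move; it simply receives a smaller share of the probability. The cost is that the strengths of many moves change after every move, which is what incremental urgency updates address.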

I am now trying to get a similar effect without incrementally updating all urgencies.

Do you use only 3x3 patterns?

Yes.

I have not tried bigger ones. For size 4 the tables would grow to 2 x 16M entries. Might be worth a try.
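A minimal Python sketch of how 3x3 patterns can index a table, assuming the common encoding in which each of the 8 points around the centre is empty, black, white or off-board (2 bits each). Under that assumption one table has 4^8 = 65,536 entries, and each extra surrounding point multiplies the size by 4, so 16M (2^24) entries corresponds to 12 points. The factor of 2 is not explained in the post; one table per side to move would be a common choice. All names here are hypothetical.

```python
from typing import Sequence

EMPTY, BLACK, WHITE, OFFBOARD = 0, 1, 2, 3  # assumed 2-bit point states

def pattern_index(neighbours: Sequence[int]) -> int:
    """Pack the 8 neighbours of a 3x3 window (centre excluded) into 16 bits."""
    assert len(neighbours) == 8
    index = 0
    for state in neighbours:
        index = (index << 2) | state
    return index

def table_entries(points: int) -> int:
    """Entries needed for one table when each point has 4 states."""
    return 4 ** points
```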

--
GCP
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
