Dear all,

Today I tried Professor Drake's "last good reply" heuristic in Erica. So far, I have gained at most 20-30 Elo from it.
I tested by self-play, with 3000 playouts/move on 19x19. The number of playouts may be too small, but I will test with more playouts only if the playing strength is not weaker at 3000 playouts.

At first I tried the original scheme: play the "last good reply" deterministically. It did not work at all. Then I tried increasing the probability of the "last good reply" instead (since I use probabilistic simulation in Erica), and the winning rate became almost 50% after 250 games. Finally I added "forgetting", and the winning rate increased to around 55% after 500 games. I also tried decreasing the probability of the "last-LOST-reply", but the winning rate was still 50% after 200 games.

From these preliminary experiments with 3000 playouts, I have some observations:

1. In Erica, it is better to apply this heuristic probabilistically.
2. In Prof. Drake's implementation, there is a weakness in learning. I think the main problem is that a reply which the default policy plays deterministically leaves no room to learn a new reply. For example, if "save by capture" produces a lost game, the default policy will still play "save by capture" in the next simulation. If I am wrong on this point, I would be glad to be corrected by anyone.
3. This heuristic has the potential to perform better in Erica.

I hope this brief result will encourage other authors to try it.

Aja
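
For readers who have not seen the scheme, here is a minimal Python sketch of the two ideas compared above: a reply table with optional "forgetting", and a probabilistic use of the stored reply. This is not Erica's code or Prof. Drake's implementation. The names ReplyTable and choose_move, the one-move context, and the boost factor of 10 are all hypothetical, chosen only for illustration.

```python
import random


class ReplyTable:
    """One remembered reply per (color, previous move). Hypothetical layout."""

    def __init__(self):
        self.reply = {}  # (color, prev_move) -> reply move

    def update(self, moves, winner, forgetting=True):
        """Update the table from one finished playout.

        moves is a list of (color, move) pairs in the order played.
        """
        for (_, prev), (color, move) in zip(moves, moves[1:]):
            key = (color, prev)
            if color == winner:
                # The winner's reply is remembered as the "last good reply".
                self.reply[key] = move
            elif forgetting and self.reply.get(key) == move:
                # "Forgetting": the loser played its stored reply, so drop it.
                del self.reply[key]


def choose_move(candidates, weights, table, color, prev_move,
                boost=10.0, rng=random):
    """Sample a move from the default-policy weights.

    The stored reply is not forced; its weight is only multiplied by
    boost (an assumed, untuned value), so other moves stay possible.
    """
    stored = table.reply.get((color, prev_move))
    adjusted = [w * boost if m == stored else w
                for m, w in zip(candidates, weights)]
    return rng.choices(candidates, weights=adjusted, k=1)[0]
```

Because the stored reply only changes a weight, a move that the default policy strongly prefers can still lose out to a learned reply, and the reverse. That is the room for learning which observation 2 says is missing when either side is played deterministically.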
_______________________________________________ Computer-go mailing list [email protected] http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
