[Computer-go] The heuristic "last good reply"

Aja Tue, 25 Jan 2011 10:19:42 -0800

Dear all,

Today I tried Professor Drake's "last good reply" heuristic in Erica. So far, I
have gained at most 20-30 Elo from it.


I tested it in self-play with 3000 playouts per move on 19x19. That may be too
few playouts, but I would like to test with more playouts only IF playing
strength is not weaker at 3000 playouts.
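Because the results in this post are self-play winning rates, it may help to see how a winning rate maps to an Elo difference and how noisy it is after a few hundred games. This is a minimal Python sketch that assumes the standard logistic Elo model and ignores draws. The function names are illustrative and do not come from Erica.

```python
import math

def elo_difference(win_rate):
    """Elo difference implied by a win rate under the logistic Elo model."""
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))

def win_rate_std_error(win_rate, games):
    """Binomial standard error of a measured win rate (draws ignored)."""
    return math.sqrt(win_rate * (1.0 - win_rate) / games)

if __name__ == "__main__":
    p, n = 0.55, 500
    print(f"{p:.0%} over {n} games: {elo_difference(p):+.1f} Elo, "
          f"standard error {win_rate_std_error(p, n):.3f}")
```

Under this model a 55% winning rate corresponds to about 35 Elo. After 500 games the standard error of the measured rate is about 2.2 percentage points, so differences of a few percentage points need hundreds of games to resolve.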

At first I tried the original scheme, playing the "last good reply"
deterministically, but it did not work at all. Then, since Erica uses
probabilistic simulations, I instead increased the probability of the "last good
reply"; the winning rate was then almost 50% after 250 games.
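The difference between the original deterministic scheme and the probabilistic variant described above can be sketched as follows: instead of always playing the stored reply, its weight is multiplied by a boost factor and the move is then sampled. This is a minimal Python sketch under stated assumptions. The move weights, the reply table keyed only by the opponent's previous move, and `LGR_BOOST` are all hypothetical; the post does not say how Erica computes its weights or how much it raises the reply's probability.

```python
import random

# Hypothetical boost factor for the stored reply; the post does not give
# Erica's actual value.
LGR_BOOST = 10.0

def boosted_weights(weights, last_move, reply_table, boost=LGR_BOOST):
    """Return a copy of the default-policy weights with the last good reply boosted.

    weights: dict mapping each candidate move to a positive weight.
    last_move: the opponent's previous move.
    reply_table: dict mapping an opponent move to the last good reply to it.
    """
    result = dict(weights)
    reply = reply_table.get(last_move)
    if reply in result:  # boost only if the stored reply is a current candidate
        result[reply] *= boost
    return result

def choose_move(weights, last_move, reply_table, rng=random):
    """Sample one playout move; every candidate keeps a nonzero probability."""
    w = boosted_weights(weights, last_move, reply_table)
    moves = list(w)
    return rng.choices(moves, weights=[w[m] for m in moves], k=1)[0]
```

With `boost` set very large this approaches the deterministic scheme; smaller values let the default policy keep exploring other replies.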

Finally, I added "forgetting", and the winning rate rose to around 55% after 500
games. I also tried decreasing the probability of the "last-LOST-reply"; the
winning rate was still 50% after 200 games.
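The "forgetting" step can be sketched as a table update after each playout: each reply played by the winner is stored, and a stored reply is deleted when the loser played it. This is a minimal Python sketch of that idea, not Erica's or Drake's code; the table keyed by (color, previous move) and the function name `update_replies` are assumptions for illustration.

```python
def update_replies(reply_table, playout, winner, forgetting=True):
    """Update a last-good-reply table after one playout.

    reply_table: dict keyed by (color, previous_move) -> reply move.
    playout: list of (color, move) pairs in the order they were played.
    winner: the color that won the playout.
    """
    for (_, prev), (color, move) in zip(playout, playout[1:]):
        key = (color, prev)
        if color == winner:
            reply_table[key] = move  # remember the winner's reply
        elif forgetting and reply_table.get(key) == move:
            del reply_table[key]  # forget a stored reply that just lost
```

With `forgetting=False` this reduces to the original scheme, where a losing reply stays in the table until a winning playout overwrites it.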

From these preliminary experiments with 3000 playouts, I have some observations:

1. In Erica, it is better to apply this heuristic probabilistically.

2. In Prof. Drake's implementation, there is a weakness in learning. I think
the main problem is that when a reply is played deterministically by the
default policy, there is no room to learn a new reply. For example, if "save by
capture" produces a lost game, the next simulation will still play "save by
capture" because of the default policy. If I am wrong on this point, I would be
glad to be corrected.

3. This heuristic has the potential to perform better in Erica. I hope this brief
result will encourage other authors to try it.
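One reading of observation 2 can be sketched as follows: if a playout first consults the reply table and otherwise falls back to a default policy whose rule is deterministic, then forgetting a losing reply simply hands the decision back to the same rule, which picks the same move again. No alternative reply is ever tried, so none can be stored. In this minimal Python sketch the rule `save_by_capture` and the moves "e5" and "f5" are hypothetical.

```python
def save_by_capture(last_move):
    """Hypothetical deterministic rule: always answer "e5" by capturing at "f5"."""
    return "f5"

def policy_move(last_move, reply_table, default_policy):
    """Play the stored reply if there is one, else ask the default policy."""
    reply = reply_table.get(last_move)
    if reply is not None:
        return reply, "reply table"
    return default_policy(last_move), "default policy"

if __name__ == "__main__":
    table = {"e5": "f5"}  # "f5" was stored after an earlier win
    print(policy_move("e5", table, save_by_capture))
    del table["e5"]       # the capture then lost, so forgetting removes it
    print(policy_move("e5", table, save_by_capture))  # same move again
```

With a probabilistic default policy, other moves keep a nonzero chance of being played, so a different reply can win a playout and be stored.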

Aja

_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
