Hi Aja,

I would be interested in your results. I think the LGRF policy is only a small first step in the direction of more adaptive playouts (and, hopefully, of overcoming the horizon effect). As for the Last-Bad-Reply idea, you can read about my experiences with this and related policies in my Master's thesis, if you're interested. It also contains the idea that resulted in the "Power of Forgetting" paper.
http://www.ke.tu-darmstadt.de/lehre/arbeiten/master/2010/Baier_Hendrik.pdf

regards,
Hendrik

I admit that it's difficult for me to include such a deterministic default
policy. :-)
With a softmax policy, using the "last-LOST-reply" information may be a good
direction.

Aja

_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
