On 08-03-18 18:47, Brian Sheppard via Computer-go wrote:
> I recall that someone investigated this question, but I don’t recall the
> result. What is the formula that AGZ actually uses?

The one mentioned in their paper, I assume.

I investigated both that and the original from the referenced paper, but
after tuning I saw little meaningful strength difference.

One thing of note is that (IIRC) the AGZ formula keeps scaling the
exploration term by the policy prior forever. In the original formula,
it is a diminishing term.

-- 
GCP
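
To make the difference concrete, here is a rough Python sketch, not taken from any engine. The first function follows the exploration term as printed in the AGZ paper, c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a)). The second assumes that "the original" means the prior penalty from Rosin's PUCB paper, and its exact form (2/P * sqrt(log t / t)) is written from memory, so treat it as an assumption. The function names and the c_puct default are made up for illustration.

```python
import math


def agz_exploration(prior, parent_visits, child_visits, c_puct=1.0):
    """Exploration term as given in the AGZ paper.

    The prior multiplies the whole term, so the ratio between two moves
    with equal visit counts stays equal to the ratio of their priors,
    no matter how many simulations have been run.
    """
    return c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)


def pucb_prior_penalty(prior, parent_visits):
    """Prior penalty in the style of Rosin's PUCB (assumed form, from memory).

    The penalty is subtracted from the move's score. It is larger for
    low-prior moves, but it shrinks roughly like sqrt(log t / t), so the
    prior matters less and less as the parent accumulates visits.
    """
    if parent_visits > 1:
        return (2.0 / prior) * math.sqrt(math.log(parent_visits) / parent_visits)
    return 2.0 / prior


if __name__ == "__main__":
    for n in (10, 1000, 100000):
        # Same child visit count for both moves, to isolate the prior's effect.
        ratio = agz_exploration(0.6, n, 5) / agz_exploration(0.2, n, 5)
        gap = pucb_prior_penalty(0.2, n) - pucb_prior_penalty(0.6, n)
        print(f"parent visits {n}: AGZ ratio {ratio:.2f}, PUCB penalty gap {gap:.4f}")
```

In this sketch the AGZ ratio between a 0.6-prior and a 0.2-prior move stays fixed as the visit count grows, while the gap between their PUCB penalties shrinks toward zero.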