On 08-03-18 18:47, Brian Sheppard via Computer-go wrote:
> I recall that someone investigated this question, but I don’t recall the
> result. What is the formula that AGZ actually uses?

The one mentioned in their paper, I assume.

I investigated both that and the original from the referenced paper, but
after tuning I saw little meaningful strength difference.

One thing of note is that (IIRC) the AGZ formula keeps scaling the
exploration term by the policy prior forever. In the original formula,
the prior's contribution is a diminishing term.
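
To make that difference concrete, here is a rough sketch in Python. The
AGZ exploration term is written as in their paper. For the "original" I am
assuming the referenced paper is Rosin's PUCB and reproducing its prior
penalty from memory, so treat that part (and all the function names, which
are made up for illustration) as an assumption rather than a reference
implementation.

```python
import math


def agz_exploration(prior, parent_visits, child_visits, c_puct=1.0):
    """AGZ exploration term:
    U(s, a) = c_puct * P(s, a) * sqrt(sum_b N(s, b)) / (1 + N(s, a)).
    The prior multiplies the whole term, at every visit count."""
    return c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)


def diminishing_prior_penalty(prior, parent_visits):
    """ASSUMPTION: additive prior penalty in the style of PUCB, from memory:
    m(n, i) = (2 / M_i) * sqrt(log(n) / n) for n > 1, else 2 / M_i.
    It is subtracted from the score and shrinks as n grows."""
    if parent_visits <= 1:
        return 2.0 / prior
    return (2.0 / prior) * math.sqrt(math.log(parent_visits) / parent_visits)


def agz_prior_ratio(p_a, p_b, parent_visits, child_visits):
    """Ratio of AGZ exploration terms for two moves with equal visit counts."""
    return (agz_exploration(p_a, parent_visits, child_visits)
            / agz_exploration(p_b, parent_visits, child_visits))


def penalty_gap(p_a, p_b, parent_visits):
    """How much more the lower-prior move b is penalized than move a."""
    return (diminishing_prior_penalty(p_b, parent_visits)
            - diminishing_prior_penalty(p_a, parent_visits))


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(n,
              agz_prior_ratio(0.4, 0.1, n, n // 10),
              penalty_gap(0.4, 0.1, n))
```

With equal visit counts, the AGZ ratio between two moves stays at the ratio
of their priors no matter how many playouts have been done, while the
additive penalty gap in the second form goes to zero as the parent visit
count grows.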
