It might be a mistake, but the formula for Elo on page 30 of the paper is off by a factor of log(10) = 2.3026 (natural log) with respect to the standard formula, which would mean their Elo differences are inflated by that same factor. But I suspect they just meant to write "10^" instead of "exp" in the paper, and they probably computed Elo correctly.
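To make the factor concrete, here is a minimal sketch comparing the two conventions. I'm assuming both formulas use the usual 1/400 scale and differ only in the base ("exp" versus "10^"); the function names are mine, not the paper's.

```python
import math

SCALE = 400.0  # assumed: the conventional Elo scale constant


def diff_from_score_standard(p):
    """Rating difference implied by expected score p under the standard 10^ formula."""
    return SCALE * math.log10(p / (1.0 - p))


def diff_from_score_exp(p):
    """Rating difference implied by the same p if exp is used in place of 10^."""
    return SCALE * math.log(p / (1.0 - p))


def expected_score_standard(diff):
    """Standard Elo expected score for a player rated `diff` points higher."""
    return 1.0 / (1.0 + 10.0 ** (-diff / SCALE))


def expected_score_exp(diff):
    """Same logistic shape, but with exp in place of 10^ (the suspected typo)."""
    return 1.0 / (1.0 + math.exp(-diff / SCALE))


p = 0.75  # any win rate other than 0.5 gives the same ratio
ratio = diff_from_score_exp(p) / diff_from_score_standard(p)
print(ratio, math.log(10))
```

For any win rate the exp-based difference comes out log(10) times larger than the standard one, so a gap reported under the "exp" convention would have to be divided by about 2.3026 to compare with ordinary Elo numbers.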
Álvaro.

On Wed, Oct 18, 2017 at 5:39 PM, Gian-Carlo Pascutto <g...@sjeng.org> wrote:

> On 18/10/2017 22:00, Brian Sheppard via Computer-go wrote:
> > This paper is required reading. When I read this team’s papers, I think
> > to myself “Wow, this is brilliant! And I think I see the next step.”
> > When I read their next paper, they show me the next *three* steps.
>
> Hmm, interesting way of seeing it. Once they had Lee Sedol AlphaGo, it
> was somewhat obvious that just self-playing that should lead to an
> improved policy and value net.
>
> And before someone accuses me of Captain Hindsighting here, this was
> pointed out on this list:
> http://computer-go.org/pipermail/computer-go/2017-January/009786.html
>
> It looks to me like the real devil is in the details. Don't use a
> residual stack? -600 Elo. Don't combine the networks? -600 Elo.
> Bootstrap the learning? -300 Elo
>
> We made 3 perfectly reasonable choices and somehow lost 1500 Elo along
> the way. I can't get over that number, actually.
>
> Getting the details right makes a difference. And they're getting them
> right, either because they're smart, because of experience from other
> domains, or because they're trying a ton of them. I'm betting on all 3.
>
> --
> GCP
> _______________________________________________
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go