2017-12-06 9:23 GMT+00:00 Gian-Carlo Pascutto <g...@sjeng.org>:

> On 03-12-17 17:57, Rémi Coulom wrote:
> > They have a Q(s,a) term in their node-selection formula, but they
> > don't tell what value they give to an action that has not yet been
> > visited. Maybe Aja can tell us.
>
> FWIW I already asked Aja this exact question a bit after the paper came
> out and he told me he cannot answer questions about unpublished details.
>
Yes, I did ask my manager if I could answer your question, but he
specifically said no. All I can say is that first-play-urgency is not a
significant technical detail, and that's why we didn't specify it in the
paper.

Aja

> This is not very promising regarding reproducibility considering the AZ
> paper is even lighter on them.
>
> Another issue which is up in the air is whether the choice of the number
> of playouts for the MCTS part represents an implicit balancing between
> self-play and training speed. This is particularly relevant if the
> evaluation step is removed. But it's possible even DeepMind doesn't know
> the answer for sure. They had a setup, and they optimized it. It's not
> clear which parts generalize.
>
> (Usually one wonders about such things in terms of algorithms, but here
> one wonders about it in terms of hardware!)
>
> --
> GCP
> _______________________________________________
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
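
For readers following the thread, here is a minimal sketch of where the open question sits in a PUCT-style selection rule. The exploration term follows the form given in the AlphaGo Zero paper; the value used for an unvisited action (first-play urgency) is not published, so it is exposed here as a hypothetical parameter `fpu_value`. The names `Edge`, `q_value`, and `select_action`, and the default constants, are illustrative assumptions and not DeepMind's implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    prior: float            # P(s, a), e.g. from a policy network
    visits: int = 0         # N(s, a)
    value_sum: float = 0.0  # W(s, a), sum of backed-up values


def q_value(edge, fpu_value):
    """Mean action value Q(s, a).

    For an unvisited edge the mean is undefined, so fall back to
    fpu_value. This fallback is the unpublished detail discussed in the
    thread; any particular choice here is an assumption.
    """
    if edge.visits == 0:
        return fpu_value
    return edge.value_sum / edge.visits


def select_action(edges, c_puct=1.0, fpu_value=0.0):
    """Return the action maximizing Q(s, a) + U(s, a).

    U(s, a) = c_puct * P(s, a) * sqrt(sum_b N(s, b)) / (1 + N(s, a))
    """
    sqrt_total = math.sqrt(sum(e.visits for e in edges.values()))

    def score(item):
        _, edge = item
        u = c_puct * edge.prior * sqrt_total / (1 + edge.visits)
        return q_value(edge, fpu_value) + u

    return max(edges.items(), key=score)[0]
```

With one well-visited, high-prior move and one unvisited, low-prior move, the choice can depend entirely on `fpu_value`: an optimistic value such as 1.0 pulls the search toward the unvisited move, while 0.0 keeps it on the visited one. How much that matters in a full training run is exactly what the thread leaves open.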