Re: [Computer-go] action-value Q for unexpanded nodes

Aja Huang Wed, 06 Dec 2017 04:19:40 -0800

2017-12-06 9:23 GMT+00:00 Gian-Carlo Pascutto <g...@sjeng.org>:

> On 03-12-17 17:57, Rémi Coulom wrote:
> > They have a Q(s,a) term in their node-selection formula, but they
> > don't tell what value they give to an action that has not yet been
> > visited. Maybe Aja can tell us.
>
> FWIW I already asked Aja this exact question a bit after the paper came
> out and he told me he cannot answer questions about unpublished details.
>


Yes, I did ask my manager if I could answer your question but he
specifically said no. All I can say is that first-play-urgency is not a
significant technical detail, and that's why we didn't specify it in the
paper.

Aja
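
For readers following the thread, here is a minimal sketch of where the
open question sits in a PUCT-style selection rule. The exploration term
follows the published formula, U(s,a) = c_puct * P(s,a) * sqrt(sum_b N(s,b))
/ (1 + N(s,a)). The value used for Q(s,a) when N(s,a) = 0 is not given in
the paper, so it appears here as a hypothetical `fpu_value` parameter. The
names `Edge`, `q_value`, `select_action`, and the defaults for `c_puct` and
`fpu_value` are assumptions made for illustration, not DeepMind's
implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    """Statistics for one action a from a state s (illustrative layout)."""
    prior: float            # P(s, a), from the policy network
    visits: int = 0         # N(s, a)
    value_sum: float = 0.0  # W(s, a), sum of backed-up values


def q_value(edge, fpu_value):
    """Mean action value; falls back to fpu_value for unvisited actions.

    fpu_value is an assumption: the paper does not say what Q is when
    N(s, a) == 0.
    """
    if edge.visits == 0:
        return fpu_value
    return edge.value_sum / edge.visits


def select_action(edges, c_puct=1.0, fpu_value=0.0):
    """Return the action maximising Q(s, a) + U(s, a)."""
    sqrt_total = math.sqrt(sum(e.visits for e in edges.values()))

    def score(item):
        _, e = item
        u = c_puct * e.prior * sqrt_total / (1 + e.visits)
        return q_value(e, fpu_value) + u

    return max(edges.items(), key=score)[0]


edges = {
    "a": Edge(prior=0.6, visits=10, value_sum=-5.0),  # visited, Q = -0.5
    "b": Edge(prior=0.4),                             # never visited
}
optimistic = select_action(edges, fpu_value=0.0)
pessimistic = select_action(edges, fpu_value=-2.0)
```

With these made-up numbers the choice flips between the unvisited and the
visited action depending only on `fpu_value`, which is why the thread asks
about it, whatever its practical significance turns out to be.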



> This is not very promising regarding reproducibility considering the AZ
> paper is even lighter on them.
>
> Another issue which is up in the air is whether the choice of the number
> of playouts for the MCTS part represents an implicit balancing between
> self-play and training speed. This is particularly relevant if the
> evaluation step is removed. But it's possible even DeepMind doesn't know
> the answer for sure. They had a setup, and they optimized it. It's not
> clear which parts generalize.
>
> (Usually one wonders about such things in terms of algorithms, but here
> one wonders about it in terms of hardware!)
>
> --
> GCP
> _______________________________________________
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>

