They have a Q(s,a) term in their node-selection formula, but they don't say 
what value they assign to an action that has not yet been visited. Maybe Aja 
can tell us.

----- Original message -----
From: "Álvaro Begué" <alvaro.be...@gmail.com>
To: "computer-go" <computer-go@computer-go.org>
Sent: Sunday, 3 December 2017 16:44:00
Subject: Re: [Computer-go] action-value Q for unexpanded nodes




I am not sure where in the paper you think they use Q(s,a) for a node s that 
hasn't been expanded yet. Q(s,a) is a property of an edge of the graph. At a 
leaf they only use the `value' output of the neural network. 
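
To make the edge-versus-node distinction concrete, here is a minimal sketch of 
how the statistics could be stored. The names Edge, Node, expand, backup and 
the evaluate callback are hypothetical, not taken from the paper, and the 
alternating sign in backup assumes the network value is from the point of view 
of the player to move at the leaf.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Edge:
    """Statistics for one (s, a) pair; a node itself stores no Q."""
    prior: float                     # P(s, a), from the policy output
    visits: int = 0                  # N(s, a)
    total_value: float = 0.0         # W(s, a), sum of backed-up values
    child: Optional["Node"] = None   # created when the edge is first followed


@dataclass
class Node:
    edges: Dict[str, Edge] = field(default_factory=dict)


# Hypothetical network interface: node -> (priors per action, value).
Evaluate = Callable[[Node], Tuple[Dict[str, float], float]]


def expand(leaf: Node, evaluate: Evaluate) -> float:
    """Create the leaf's edges from the policy and return the value output."""
    priors, value = evaluate(leaf)
    leaf.edges = {action: Edge(prior=p) for action, p in priors.items()}
    return value


def backup(path: List[Edge], value: float) -> None:
    """Propagate the leaf value up the path of edges that led to it."""
    for edge in reversed(path):
        value = -value               # assumed: players alternate each ply
        edge.visits += 1
        edge.total_value += value
```

In this sketch the leaf contributes only the network value, and every edge on 
the path accumulates it; a freshly created edge has N = 0 and W = 0 until a 
simulation passes through it.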

If this doesn't match your understanding of the paper, please point to the 
specific paragraph that you are having trouble with. 

Álvaro. 





On Sun, Dec 3, 2017 at 9:53 AM, Andy < andy.olsen...@gmail.com > wrote: 



I don't see the AGZ paper explain what the mean action-value Q(s,a) should be 
for a node that hasn't been expanded yet. The equation for Q(s,a) has the term 
1/N(s,a) in it because it's supposed to average over N(s,a) visits. But in this 
case N(s,a)=0 so that won't work. 
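
The division-by-zero issue can be sketched as follows. The selection score is 
Q(s,a) + U(s,a) with U(s,a) = c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + 
N(s,a)), as in the paper's selection formula. The fallback unvisited_q is a 
hypothetical parameter, and its default of 0.0 is only a placeholder 
assumption, not something this thread establishes; function and argument names 
are likewise made up for illustration.

```python
import math
from typing import Dict, Tuple

# Per-action statistics: (N(s, a), W(s, a), P(s, a)).
Stats = Dict[str, Tuple[int, float, float]]


def mean_value(total_value: float, visits: int, unvisited_q: float = 0.0) -> float:
    """Return W/N, or the assumed fallback when the edge has no visits."""
    if visits == 0:
        return unvisited_q           # assumption: placeholder for N(s, a) = 0
    return total_value / visits


def select_action(stats: Stats, c_puct: float = 1.0, unvisited_q: float = 0.0) -> str:
    """Pick the action maximising Q(s, a) + U(s, a)."""
    parent_visits = sum(n for n, _, _ in stats.values())

    def score(item: Tuple[str, Tuple[int, float, float]]) -> float:
        n, w, p = item[1]
        u = c_puct * p * math.sqrt(parent_visits) / (1 + n)
        return mean_value(w, n, unvisited_q) + u

    return max(stats.items(), key=score)[0]


if __name__ == "__main__":
    stats = {"a": (3, 1.5, 0.5), "b": (0, 0.0, 0.5)}
    print(select_action(stats))                    # fallback 0.0
    print(select_action(stats, unvisited_q=-1.0))  # pessimistic fallback
```

The choice of fallback matters: with the same statistics, a neutral fallback 
lets the exploration term pull the search toward the unvisited move, while a 
pessimistic one keeps it on the move that has already been tried.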


Does anyone know how this is supposed to work? Or is it another detail AGZ 
didn't spell out? 




_______________________________________________ 
Computer-go mailing list 
Computer-go@computer-go.org 
http://computer-go.org/mailman/listinfo/computer-go 

