Re: [Computer-go] action-value Q for unexpanded nodes

Brian Lee Sun, 03 Dec 2017 13:25:49 -0800

It should default to the Q of the parent node. Otherwise (say, with a
default of 0), suppose the root node is a losing position. After the first
followup move is tried, its Q will be updated to a very negative value, so
it will look worse than every unvisited sibling. That move won't get
explored again, at least not until all 362 top-level children (361 points
plus pass) have been explored and revealed to have negative values. So
without initializing Q to the parent's Q, you would end up wasting 362 MCTS
iterations.
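
To make the argument above concrete, here is a minimal sketch of Q + U
selection with a configurable default for unvisited edges. It is only an
illustration, not the AGZ implementation: the names Node, select_child,
losing_root and inherit_parent_q, the c_puct value of 1.5, and the
convention that all values are from the point of view of the player to
move at the parent are my own assumptions.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    prior: float            # P(s, a) from the policy output
    visits: int = 0         # N(s, a)
    value_sum: float = 0.0  # W(s, a), from the parent's player's point of view
    children: dict = field(default_factory=dict)

    def q(self, default: float) -> float:
        """Mean action value, or `default` when the edge is unvisited."""
        return self.value_sum / self.visits if self.visits else default


def select_child(parent: Node, parent_q: float, c_puct: float = 1.5,
                 inherit_parent_q: bool = True):
    """Return the (move, child) pair that maximizes Q + U."""
    # Default Q for unvisited edges: the parent's Q, or 0 (assumed alternative).
    default_q = parent_q if inherit_parent_q else 0.0
    total = sum(ch.visits for ch in parent.children.values())

    def score(item):
        _, ch = item
        u = c_puct * ch.prior * math.sqrt(total) / (1 + ch.visits)
        return ch.q(default_q) + u

    return max(parent.children.items(), key=score)


def losing_root() -> Node:
    """Hypothetical losing position: move A was tried once and scored -0.8."""
    root = Node(prior=1.0)
    root.children = {
        "A": Node(prior=0.6, visits=1, value_sum=-0.8),
        "B": Node(prior=0.2),
        "C": Node(prior=0.2),
    }
    return root


if __name__ == "__main__":
    root = losing_root()
    for inherit in (True, False):
        move, _ = select_child(root, parent_q=-0.9, inherit_parent_q=inherit)
        print(f"inherit_parent_q={inherit}: next move {move}")
```

With the parent's Q as the default, the search keeps following the policy's
preferred move A. With a default of 0 it moves on to an unvisited sibling
simply because everything visited so far looks bad.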


Brian

On Sun, Dec 3, 2017 at 3:25 PM <[email protected]> wrote:

> Today's Topics:
>
>    1. action-value Q for unexpanded nodes (Andy)
>    2. Re: action-value Q for unexpanded nodes (Álvaro Begué)
>    3. Re: action-value Q for unexpanded nodes (Andy)
>    4. Re: action-value Q for unexpanded nodes (Rémi Coulom)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sun, 3 Dec 2017 08:53:02 -0600
> From: Andy <[email protected]>
> To: computer-go <[email protected]>
> Subject: [Computer-go] action-value Q for unexpanded nodes
> Message-ID:
>         <
> caatbd5cguzt4arbsum8-d91j31znq+2tkzpbxv4u5fxthhd...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> I don't see the AGZ paper explain what the mean action-value Q(s,a) should
> be for a node that hasn't been expanded yet. The equation for Q(s,a) has
> the term 1/N(s,a) in it because it's supposed to average over N(s,a)
> visits. But in this case N(s,a)=0 so that won't work.
>
> Does anyone know how this is supposed to work? Or is it another detail AGZ
> didn't spell out?
>
> ------------------------------
>
> Message: 2
> Date: Sun, 3 Dec 2017 10:44:00 -0500
> From: Álvaro Begué <[email protected]>
> To: computer-go <[email protected]>
> Subject: Re: [Computer-go] action-value Q for unexpanded nodes
> Message-ID:
>         <
> caf8dvmu_f0ue2yykvbwvkrcsuy93wn-x9m8tgmcz+dqfbe4...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> I am not sure where in the paper you think they use Q(s,a) for a node s
> that hasn't been expanded yet. Q(s,a) is a property of an edge of the
> graph. At a leaf they only use the `value' output of the neural network.
>
> If this doesn't match your understanding of the paper, please point to the
> specific paragraph that you are having trouble with.
>
> Álvaro.
>
>
>
> > _______________________________________________
> > Computer-go mailing list
> > [email protected]
> > http://computer-go.org/mailman/listinfo/computer-go
> >
>
> ------------------------------
>
> Message: 3
> Date: Sun, 3 Dec 2017 10:27:16 -0600
> From: Andy <[email protected]>
> To: computer-go <[email protected]>
> Subject: Re: [Computer-go] action-value Q for unexpanded nodes
> Message-ID:
>         <
> caatbd5cbdtsj7whjm9mybrtdbzlhqdujitosn49ce8kut5_...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Figure 2a shows two bolded Q+U max values. The second one is going to a
> leaf that doesn't exist yet, i.e. not expanded yet. Where do they get that
> Q value from?
>
> The associated text doesn't clarify the situation: "Figure 2: Monte-Carlo
> tree search in AlphaGo Zero. a Each simulation traverses the tree by
> selecting the edge with maximum action-value Q, plus an upper confidence
> bound U that depends on a stored prior probability P and visit count N for
> that edge (which is incremented once traversed). b The leaf node is
> expanded..."
>
>
>
>
>
>
>
> ------------------------------
>
> Message: 4
> Date: Sun, 3 Dec 2017 17:57:51 +0100 (CET)
> From: Rémi Coulom <[email protected]>
> To: [email protected]
> Subject: Re: [Computer-go] action-value Q for unexpanded nodes
> Message-ID:
>         <1885878373.291683317.1512320271343.JavaMail.root@spooler6-g27>
> Content-Type: text/plain; charset=utf-8
>
> They have a Q(s,a) term in their node-selection formula, but they don't
> tell what value they give to an action that has not yet been visited. Maybe
> Aja can tell us.
>
>
> ------------------------------
>
> End of Computer-go Digest, Vol 95, Issue 5
> ******************************************
>

_______________________________________________
Computer-go mailing list
[email protected]
http://computer-go.org/mailman/listinfo/computer-go
