The Q of an unvisited action should default to the Q of the parent node. Otherwise (say, if it starts at 0), suppose the root node is a losing position. After the first followup move is tried, its Q is updated to a very negative value, and that node won't get explored again, at least until all 362 top-level children (361 points plus pass on a 19x19 board) have been explored and revealed to have negative values as well. So without initializing Q to the parent's Q, you would end up wasting 362 MCTS iterations.
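To make the effect concrete, here is a rough sketch of the selection step on a toy root, not the AGZ implementation. Everything in it is an assumption made for illustration: the function names, the priors (one move at 0.5, the rest uniform), c_puct = 1, every evaluation returning -0.9 from the root player's point of view, and sqrt(total + 1) in the U term so that the prior still matters on the very first visit.

```python
import math


def select_child(priors, visits, value_sums, parent_q, init, c_puct=1.0):
    """Return the index of the child maximizing Q + U (PUCT-style)."""
    total = sum(visits)
    best, best_score = 0, -math.inf
    for a, prior in enumerate(priors):
        n = visits[a]
        if n > 0:
            q = value_sums[a] / n  # mean of the values backed up so far
        else:
            # The detail under discussion: what Q is when N(s,a) = 0.
            q = parent_q if init == "parent" else 0.0
        # Assumption: sqrt(total + 1) instead of sqrt(total), so U is
        # nonzero before any child has been visited.
        u = c_puct * prior * math.sqrt(total + 1) / (1 + n)
        if q + u > best_score:
            best, best_score = a, q + u
    return best


def count_distinct_children(init, iterations=50, num_moves=362, leaf_value=-0.9):
    """Run a few iterations at a losing root; count how many children get visited."""
    # Hypothetical priors: one strongly preferred move, the rest uniform.
    priors = [0.5] + [0.5 / (num_moves - 1)] * (num_moves - 1)
    visits = [0] * num_moves
    value_sums = [0.0] * num_moves
    for _ in range(iterations):
        # Toy evaluation: every move scores leaf_value, and the parent's Q
        # is taken to be the same value.
        a = select_child(priors, visits, value_sums, parent_q=leaf_value, init=init)
        visits[a] += 1
        value_sums[a] += leaf_value
    return sum(1 for n in visits if n > 0)


if __name__ == "__main__":
    print("zero init:  ", count_distinct_children("zero"))
    print("parent init:", count_distinct_children("parent"))
```

With the zero default, each visited move immediately looks worse than every unvisited one, so the search spreads over many low-prior moves. With the parent default, all moves start on equal footing and the prior decides where the visits go.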
Brian

On Sun, Dec 3, 2017 at 3:25 PM <computer-go-requ...@computer-go.org> wrote:

> Today's Topics:
>
>    1. action-value Q for unexpanded nodes (Andy)
>    2. Re: action-value Q for unexpanded nodes (Álvaro Begué)
>    3. Re: action-value Q for unexpanded nodes (Andy)
>    4. Re: action-value Q for unexpanded nodes (Rémi Coulom)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sun, 3 Dec 2017 08:53:02 -0600
> From: Andy <andy.olsen...@gmail.com>
> To: computer-go <computer-go@computer-go.org>
> Subject: [Computer-go] action-value Q for unexpanded nodes
>
> I don't see the AGZ paper explain what the mean action-value Q(s,a) should
> be for a node that hasn't been expanded yet. The equation for Q(s,a) has
> the term 1/N(s,a) in it because it's supposed to average over N(s,a)
> visits. But in this case N(s,a)=0 so that won't work.
>
> Does anyone know how this is supposed to work? Or is it another detail AGZ
> didn't spell out?
>
> ------------------------------
>
> Message: 2
> Date: Sun, 3 Dec 2017 10:44:00 -0500
> From: Álvaro Begué <alvaro.be...@gmail.com>
> To: computer-go <computer-go@computer-go.org>
> Subject: Re: [Computer-go] action-value Q for unexpanded nodes
>
> I am not sure where in the paper you think they use Q(s,a) for a node s
> that hasn't been expanded yet. Q(s,a) is a property of an edge of the
> graph. At a leaf they only use the `value' output of the neural network.
>
> If this doesn't match your understanding of the paper, please point to the
> specific paragraph that you are having trouble with.
>
> Álvaro.
>
> ------------------------------
>
> Message: 3
> Date: Sun, 3 Dec 2017 10:27:16 -0600
> From: Andy <andy.olsen...@gmail.com>
> To: computer-go <computer-go@computer-go.org>
> Subject: Re: [Computer-go] action-value Q for unexpanded nodes
>
> Figure 2a shows two bolded Q+U max values. The second one is going to a
> leaf that doesn't exist yet, i.e. not expanded yet. Where do they get that
> Q value from?
>
> The associated text doesn't clarify the situation: "Figure 2: Monte-Carlo
> tree search in AlphaGo Zero. a Each simulation traverses the tree by
> selecting the edge with maximum action-value Q, plus an upper confidence
> bound U that depends on a stored prior probability P and visit count N for
> that edge (which is incremented once traversed). b The leaf node is
> expanded..."
>
> ------------------------------
>
> Message: 4
> Date: Sun, 3 Dec 2017 17:57:51 +0100 (CET)
> From: Rémi Coulom <remi.cou...@free.fr>
> To: computer-go@computer-go.org
> Subject: Re: [Computer-go] action-value Q for unexpanded nodes
>
> They have a Q(s,a) term in their node-selection formula, but they don't
> tell what value they give to an action that has not yet been visited. Maybe
> Aja can tell us.
>
> ------------------------------
>
> End of Computer-go Digest, Vol 95, Issue 5
> ******************************************
_______________________________________________ Computer-go mailing list Computer-go@computer-go.org http://computer-go.org/mailman/listinfo/computer-go