I opened a pull request against Leela and put some data in there. It shows that the details of how Q is initialized actually do matter: https://github.com/gcp/leela-zero/pull/238
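To make the three options discussed in the thread below concrete, here is a minimal sketch of the select step with a pluggable initial Q for unvisited edges. It is not Leela's or MuGo's actual code. The names (`Edge`, `q_value`, `select`), the value of `c_puct`, the toy priors and visit counts, and the convention that values are win rates in [0, 1] are all assumptions made for illustration; only the constant 1.1 and the three initialization choices come from the thread.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    prior: float            # P(s, a) from the policy output
    visits: int = 0         # N(s, a)
    value_sum: float = 0.0  # W(s, a)


def q_value(edge, parent_value, init, fpu=1.1):
    """Mean action value, with a configurable stand-in when N(s, a) = 0."""
    if edge.visits > 0:
        return edge.value_sum / edge.visits
    if init == "zero":    # Q = 0, as in the paper's "Expand and evaluate" text
        return 0.0
    if init == "parent":  # reuse the parent's value (the MuGo idea)
        return parent_value
    if init == "fpu":     # constant "first play urgency" (1.1 in the thread)
        return fpu
    raise ValueError(f"unknown init mode: {init}")


def select(edges, parent_value, init, c_puct=1.0):
    """Return the index of the edge that maximizes Q + U."""
    total_visits = sum(e.visits for e in edges)

    def score(e):
        u = c_puct * e.prior * math.sqrt(total_visits) / (1 + e.visits)
        return q_value(e, parent_value, init) + u

    return max(range(len(edges)), key=lambda i: score(edges[i]))


# Toy position: one move already visited 10 times, two moves never visited.
edges = [Edge(0.7, visits=10, value_sum=7.0), Edge(0.2), Edge(0.1)]
choices = {mode: select(edges, parent_value=0.7, init=mode)
           for mode in ("zero", "parent", "fpu")}
```

In this toy position, Q = 0 keeps selecting the already-visited move, while the parent-value and first-play-urgency variants both switch to the best unvisited move, so the choice changes which nodes get expanded early.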

2017-12-03 19:56 GMT-06:00 Álvaro Begué <alvaro.be...@gmail.com>:

> You are asking about the selection of the move that goes to a leaf. When
> the node before the move was expanded (in a previous playout), the value of
> Q(s,a) for that move was initialized to 0.
>
> The UCB-style formula they use in the tree part of the playout is such
> that the first few visits will follow the probability distribution from the
> policy output of the network, and over time it converges to using primarily
> the moves that have best results. So the details of how Q is initialized
> are not very relevant.
>
>
> On Sun, Dec 3, 2017 at 5:11 PM, Andy <andy.olsen...@gmail.com> wrote:
>
>> Álvaro, you are quoting from "Expand and evaluate (Figure 2b)". But my
>> question is about the section before that "Select (Figure 2a)". So the node
>> has not been expanded+initialized.
>>
>> As Brian Lee mentioned, his MuGo uses the parent's value, which assumes
>> without further information the value should be close to the same as before.
>>
>> LeelaZ uses 1.1 for a "first play urgency", which assumes you should
>> prioritize getting at least one evaluation from the NN for each node.
>> https://github.com/gcp/leela-zero/blob/master/src/UCTNode.cpp#L323
>>
>> Finally using a value of 0 would seem to place extra confidence in the
>> policy net values.
>>
>> I feel like MuGo's implementation makes sense, but I'm trying to get some
>> experimental evidence showing the impact before suggesting it to Leela's
>> author. So far my self-play tests with different settings do not show a big
>> impact, but I am changing other variables at the same time.
>>
>> - Andy
>>
>>
>> 2017-12-03 14:30 GMT-06:00 Álvaro Begué <alvaro.be...@gmail.com>:
>>
>>> The text in the appendix has the answer, in a paragraph titled "Expand
>>> and evaluate (Fig. 2b)":
>>> "[...]
>>> The leaf node is expanded and each edge (s_t, a) is
>>> initialized to {N(s_t, a) = 0, W(s_t, a) = 0, Q(s_t, a) = 0, P(s_t, a) =
>>> p_a}; [...]"
>>>
>>>
>>> On Sun, Dec 3, 2017 at 11:27 AM, Andy <andy.olsen...@gmail.com> wrote:
>>>
>>>> Figure 2a shows two bolded Q+U max values. The second one is going to a
>>>> leaf that doesn't exist yet, i.e. not expanded yet. Where do they get that
>>>> Q value from?
>>>>
>>>> The associated text doesn't clarify the situation: "Figure 2:
>>>> Monte-Carlo tree search in AlphaGo Zero. a Each simulation traverses the
>>>> tree by selecting the edge with maximum action-value Q, plus an upper
>>>> confidence bound U that depends on a stored prior probability P and visit
>>>> count N for that edge (which is incremented once traversed). b The leaf
>>>> node is expanded..."
>>>>
>>>>
>>>> 2017-12-03 9:44 GMT-06:00 Álvaro Begué <alvaro.be...@gmail.com>:
>>>>
>>>>> I am not sure where in the paper you think they use Q(s,a) for a node
>>>>> s that hasn't been expanded yet. Q(s,a) is a property of an edge of the
>>>>> graph. At a leaf they only use the `value' output of the neural network.
>>>>>
>>>>> If this doesn't match your understanding of the paper, please point to
>>>>> the specific paragraph that you are having trouble with.
>>>>>
>>>>> Álvaro.
>>>>>
>>>>>
>>>>> On Sun, Dec 3, 2017 at 9:53 AM, Andy <andy.olsen...@gmail.com> wrote:
>>>>>
>>>>>> I don't see the AGZ paper explain what the mean action-value Q(s,a)
>>>>>> should be for a node that hasn't been expanded yet. The equation for
>>>>>> Q(s,a) has the term 1/N(s,a) in it because it's supposed to average
>>>>>> over N(s,a) visits. But in this case N(s,a)=0 so that won't work.
>>>>>>
>>>>>> Does anyone know how this is supposed to work? Or is it another
>>>>>> detail AGZ didn't spell out?
_______________________________________________ Computer-go mailing list Computer-go@computer-go.org http://computer-go.org/mailman/listinfo/computer-go