Thank you for your answer. However, I am even more confused now. I
understand that "-" is for negamax, but I don't understand why it became
"1-". I am trying to implement your algorithm and I just want to know what
lines 7, 16 and 26 should be?


It became a "1-" because I made a mistake while answering. The "1" would
be there only to keep values always between 0 and 1 (instead of [0,1]
if black or [-1,0] if white), and only IF "value" were the average win and
not the total win. So my fault, sorry :-/.
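
To make the backup concrete, here is a minimal sketch in Python of the
"value := -value" version with total wins. It is not the pseudo-code from
the report: the names Node, backup and mean are assumptions made up for
this illustration, and so is the convention that the playout reward is
given from the point of view of the player to play at the leaf.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical node type for this sketch only.
    value: float = 0.0  # total reward, seen by the player to play in this node
    nb: int = 0         # number of simulations that went through this node


def backup(path, value):
    """Back up one playout result along `path` (root first, leaf last).

    `value` is the playout reward as seen by the player to play at the
    leaf (an assumption of this sketch).
    """
    for node in reversed(path):
        node.value += value  # sum, not average
        node.nb += 1
        value = -value       # negamax: change point of view one level up


def mean(node):
    """Average reward, computed from the stored total."""
    return node.value / node.nb


# Two playouts through the same root -> child -> leaf path:
# one won (reward 1) and one lost (reward 0) for the player to play at the leaf.
root, child, leaf = Node(), Node(), Node()
backup([root, child, leaf], 1.0)
backup([root, child, leaf], 0.0)
```

With rewards in {0,1}, the totals stay non-negative on the levels that share
the leaf's point of view and non-positive on the other levels, so the
averages fall in [0,1] or [-1,0] depending on the level.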

Does that make things clearer?

Sylvain




-----Original Message-----
From: "Sylvain Gelly" <[EMAIL PROTECTED]>
To: "Dmitry Kamenetsky" <[EMAIL PROTECTED]>
Date: Wed, 21 Feb 2007 11:03:08 +0100
Subject: Re: [computer-go] UCT vs MC

>
> Hello Dmitry,
>
>
> > >> Your code says that the value is backed up by sum and negation (line 26,
> > >> value := -value).  But I don't see any negative values in your sample
> > >> tree, or values greater than one.  How do you actually back up values to
> > >> the root?
> > >Sorry, it is value := 1-value. Thank you for pointing out the mistake.
> >
> > I am confused about value. What is it actually storing? I thought
> > node[i].value stores the number of wins (for Black) for node i. Then why
> > are some of the values in Figure 1 not integers?
> >
> > If line 26 is now value := 1-value, then should some of the other lines
> > also change? For example, should line 7 be
> > updateValue(node, 1-node[i].value), and line 16 be
> > else v[i]:= (1-node.childNode[i].value)/node.childNode[i].nb+sqrt(...)?
>
>
> You're right, there was some confusion :-).
> In fact it is very simple. The "-" is here because it is negamax and not
> minimax, so that you can always take the max of the value (but the value is
> negated every 2 levels). The value stored then corresponds to the value of
> "the player to play" in the node.
> It seems that node[i].value indeed keeps the number of wins for the player
> to play in node i. The "1-" does not exist.
> Figure 1 is an example of UCT in the general case, where the reward is
> not always in [0,1], and the values displayed in the nodes are the averages.
> That explains the non-integers and the values not in [0,1].
>
>
>
> > Can you also update all the changes in your report? Thank you.
>
>
> I'll try to find some time to do that. I can't say it will be soon, though.
>
> Regards,
> Sylvain
>
>

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
