On Wed, Apr 1, 2009 at 9:03 PM, Matthew Woodcraft
<matt...@woodcraft.me.uk> wrote:
> Erik van der Werf wrote:
>> >> Jonas Kahn wrote:
>>> No there is no danger. That's the whole point of weighting with N_{s,a}.
>>>
>>> N_{s,a} = number of times the node s has been visited, starting with parent
>>> a.
>>>
>>> You can write Value of a node a = (\sum_{s \in sons} N_{s,a} V_s) / (\sum
>>> N_{s,a})
>>>
>>> where V_s is ideally the «true» value of node s.
>>> In UCT2, they use V_s = Q_{s,a} the win average of simulations going
>>> through a, and then through s.
>>> In UCT3, they use V_s = Q_s the win average of all simulations through
>>> s.
>
>> There is a danger. The problem is that the selection policy also
>> implements the soft-max like behavior that ensures convergence to the
>> minimax result. If the you backup to all possible parents, including
>> those for which the child would have been an inferior choice, you may
>> get into trouble.
>
> That's what I was worried about.
>
> But I think it's ok the way Jonas describes above: you don't add
> anything to the false-parent node's simulation count, and you don't
> change the weight of the false-child in its value; you just change the
> evaluation of the false-child.
>
> (This means that the effect of backing up to alternate parents will be
> smaller than the effect of backing up to the 'true' parent, which is
> presumably part of the reason why this variant is less attractive.)


Ok, but I would not call that a back up; nothing goes up to the
alternative parents. Unless I missed something, with this you only
make adjustments to the statistics representing transposed occurrences
of the same position.  I don't see that this is how we should
interpret UCT3.

Erik
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to