> > By contrast, you
> > should test (in the tree) a kind of move that is either good or average,
> > but not either average or bad, even if it's the same amount of
> > information. In the tree, you look for the best move. Near the root,
> > at least; deeper down, where the evaluation is less precise, you
> > merely look for good moves that keep the evaluation of the position
> > trustworthy, and you try to avoid brittle ones.
> >   
> Again, you are semantically challenged but basically correct. A move
> that you can statically evaluate as being on the lower end of the scale
> does not have much information content - in other words, evaluating it
> in the tree has very little chance of changing the score.
>

OK, then we agree. In fact, that's what UCB does: it looks for more
information about whichever move could be the best.
By the way, there are schemes for assigning an a priori value to a node,
so as to widen the search progressively. A sketch follows.
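Something like this, perhaps (a minimal Python sketch; folding the
prior in as virtual wins and visits is only one of several possible
schemes, and the constant c = 1.4 is just a common default):

    import math

    def ucb1_with_prior(stats, total_visits, c=1.4):
        # stats: one (wins, visits, prior_value, prior_visits) tuple per
        # move, with prior_visits > 0.  The prior acts as virtual wins
        # and visits, so an unvisited move starts at a plausible value
        # instead of +infinity, and the search widens progressively.
        best, best_score = 0, float("-inf")
        for i, (wins, visits, pv, pn) in enumerate(stats):
            mean = (wins + pv * pn) / (visits + pn)
            explore = c * math.sqrt(math.log(total_visits) / (visits + pn))
            if mean + explore > best_score:
                best, best_score = i, mean + explore
        return best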

Has anyone tried changing the a priori uncertainty as well? An atari or
a similar move would get a higher uncertainty, though not necessarily a
higher a priori value.
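Concretely, I imagine something like this (again only a sketch; the
uncertainty factor of 2.0 for atari moves is an invented number):

    import math

    def ucb1_with_uncertainty(stats, total_visits, c=1.4):
        # stats: one (mean, visits, u) tuple per move, where u is an a
        # priori uncertainty factor -- e.g. u = 2.0 for an atari move,
        # u = 1.0 otherwise.  The prior mean is left untouched; only
        # the width of the exploration term is inflated.
        scores = [m + u * c * math.sqrt(math.log(total_visits) / max(v, 1))
                  for (m, v, u) in stats]
        return scores.index(max(scores))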
 
> The alpha-beta procedure in general is about information content in a
> big way. Many of the branches cut off in alpha-beta spring from great
> moves that have no chance of changing the score, so there is no useful
> information there. You would not look at those moves just because they
> happen to be "really great" moves.
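
Indeed. A minimal negamax sketch makes the point; evaluate, legal_moves
and play are hypothetical helpers here:

    def alphabeta(pos, depth, alpha, beta):
        # Plain fail-hard negamax alpha-beta.  Once one reply reaches
        # beta, the remaining replies are cut off: however great they
        # are, they cannot change the score backed up to the root, so
        # searching them would yield no information.
        if depth == 0:
            return evaluate(pos)            # hypothetical static eval
        for move in legal_moves(pos):       # hypothetical move generator
            score = -alphabeta(play(pos, move), depth - 1, -beta, -alpha)
            if score >= beta:
                return beta                 # cutoff
            if score > alpha:
                alpha = score
        return alpha
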
> 
> > In the playouts, that's another matter. I would say that (almost)
> > always playing 'out of atari' would add stability, much in the way
> > Magnus Persson explained so well.
> >
> > What do we want of playout policies?
> > As static evaluation functions, we would want them to give the right
> > ordering of move values, with differences as wide as possible.
> > More precisely, we would want that ordering among the best moves; it
> > does not matter much if the evaluation is imprecise for bad moves.
> >   
> Maybe I'm semantically challenged now, but the only correct ordering of
> move values is win or lose. I suppose you are saying that there should
> be a variety of scores that somehow reflect the difficulty of winning?

Yes, of course the true evaluation is zero or one; we cannot access it,
and, more to the point, we cannot even approach it.

Any evaluation function such that, in a given position, the winning
moves evaluate better than the losing moves would be just as ideal. In
fact, any evaluation whose top-ranked move is a winning move whenever
one exists would still be ideal; a sketch of that criterion follows.
And a useful ordering of the losing moves would still help when all
moves lose.
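
The weak criterion can be written down directly (a sketch; is_winning
stands for the inaccessible 0/1 truth about each move):

    def is_weakly_ideal(eval_fn, moves, is_winning):
        # eval_fn satisfies the weak criterion if its top-ranked move
        # is a winning move whenever some winning move exists; when all
        # moves lose, any ranking is allowed.
        best = max(moves, key=eval_fn)
        return is_winning[best] or not any(is_winning[m] for m in moves)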

Maybe one of these evaluation functions is realizable as a playout
policy. Of course we cannot actually find one, but we can use this
hypothetical policy as a reference.

Another, slightly more concrete, possible reference playout policy would
be: the full-fledged program plays the playout itself.
*IF* the programs are still random enough, this would make a lot of
sense: we are then evaluating the winning probability of the program
against itself. That is at least one opponent that is perfectly
modeled.
Such an ideal playout policy would not undervalue light play, for
example, since it would know how to connect when threatened with a cut.

Jonas