On Wed, 2007-07-04 at 11:34 +0200, Magnus Persson wrote:
> but what really will make a
> difference is in the quality in the playouts. 

I would like to suggest a more abstract view of things.  In the purest
form of the algorithm there isn't an artificial distinction between the
tree and the play-outs.  The algorithm is applied as if the whole tree
already exists (conceptually) and nodes are updated to the end of the
game.

We had to impose end nodes and a tree that grows in depth due to the
fact that it's impractical to store the whole tree in memory.  So we
have a tree phase on the one hand, and on the other hand we have a
play-out phase that simulates an unexplored tree (but without updates
which introduces out of necessity a small inefficiency.)    

This makes everything a bit of a compromise but a well advised one due
to hardware limitations.   But then we started imposing our will on the
play-outs in order to make them "smarter."   But we didn't do the same
to the tree portion because we now believe they are 2 separate things
(even though they really are not.)

So I prefer to think of the play-outs and the tree as the same thing.  I
think whatever is done can be applied to both.   For instance Lazarus
does a lot of pruning and the pruning rules are the same for tree
portion and the play-out portion.   Actually, Lazarus saw most of the
improvement from the tree pruning when I test each without the other.   

But I notice that we are now looking at the tree as the "search" portion
and the play-outs as the "evaluation function."   I think that is
incredible because I have always believed that "tree search" and
"evaluation" are the same thing, just different forms or states.  Like
water and ice,  or matter and energy.  

It's interesting that chess has this too.  Traditionally programs have
always had these 3 very crude phases,   search, quiescence, evaluation.
Modern programs have somewhat blurred these distinctions but it hasn't
changed very much.   

UCT comes along and finally does away with the distinction altogether.
Now you can call it all evaluation or search, whatever pleases you.
But in it's purest form, UCT with totally random play-outs is a
beautiful thing - a recursive evaluation function with zero (almost)
domain specific knowledge.

Of course now we just had to go and spoil it all by imposing domain
specific rules.  I have done the same and I admit it.    It would be fun
to see how far we could go if domain specific knowledge was forbidden as
an experiment.   Once patterns are introduced along with other direct Go
knowledge, it's still fun but it feels a bit "wrong", kind of like
cheating.   It's clear that when we do this, we introduce strengths and
weaknesses to the program,  making it a bit more fragile, less
"universal" or robust.  Stronger too, but more susceptible to
in-transitivity.  

Of course we do this in Chess programs in a big way.  We very tediously
"tell" the program what is good and what is bad.   It has no choice, it
must accept our definition of right and wrong, our morality.   However
in our great wisdom we provide a search mechanism in order to correct
our bad judgments.   The search mechanism is an admission that we know
we are wrong about many things.  

Of course you are right - if the play-outs are improved, the quality of
the moves will also improve.

- Don


   


_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to