Jonas Kahn wrote:
> I think there was some confusion in Don's post on ``out of atari'' in
> play-outs.
> For one thing, I do not agree with the maximal information argument.
>   
This is more a theory than an argument.   Maybe I didn't express it very
well either.

It's a pretty solid principle in tree search. For instance, in
computer chess the "checking" move and the responses to it are
considered highly important, and almost every program extends the
search on these moves. Is that because checking moves are usually
great moves? Of course not - the majority of them are blunders.

But they are super critical in importance because you really "just gotta
know", you cannot afford to be wrong about these.

Atari moves in Go are the closest thing to checks in chess. You
cannot thumb your nose at them, because there is more game-critical
information in them than in a typical random placement. I would say
that they win and lose games more than most other moves, and that
their resolution is likely to decide the game either way.

You may not believe that, or perhaps you think other moves are more
important and tell you more about who will win or lose. I'm not a Go
player, so I could be wrong about these moves being important, but I
noticed that the first MoGo paper addresses this issue as its very
first order of business.

So I don't know how you define information content,  but if I were
writing a classic program I would probably have as my first order of
business a routine to tell me if groups that have limited liberties
could be captured. And I would consider a ladder search super
important. Maybe David Fotland can tell us whether ladder search is
very important, but I believe it is. It is "information rich" in
that it doesn't matter whether the ladder works or not; you need to
know either way or you will lose the game.
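
To show what I mean, here is a toy ladder reader. It is only a
sketch: play(), group_at() and liberties() are hypothetical board
helpers (play() moves for the side to move and returns a new board),
and it ignores counter-ataris against the chasing stones, which a
real ladder reader must also check.

    def ladder_captured(board, group):
        # True if 'group' is caught in a simple ladder.
        libs = liberties(board, group)
        if len(libs) != 1:
            return len(libs) == 0  # 0 libs: captured; 2+: out of atari
        run = libs[0]
        after = play(board, run)   # defender extends to the last liberty
        bigger = group_at(after, run)
        new_libs = liberties(after, bigger)
        if len(new_libs) >= 3:
            return False           # broke out of the ladder
        if len(new_libs) <= 1:
            return True            # still in atari after running: dead
        # Two liberties left: the attacker tries each as the next atari.
        for chase in new_libs:
            chased = play(after, chase)
            if ladder_captured(chased, group_at(chased, run)):
                return True
        return False

Whichever way this comes out, the answer changes how the rest of the
game must be played - that is the information content.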

What is often not understood in game tree search is that you usually
have to look at certain classes of bad moves pretty hard to understand
that they are bad. The knee-jerk reaction is to throw out moves
that are "on average" bad, but you really should be more focused on
moves that have high information content. In chess these are the
so-called "high compulsion" moves: checks, captures, threats, etc.
Most checks, captures, and threats are stupid moves, but it's an
error to immediately prune them away.
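
As a sketch of what that rule might look like in practice - the
helpers is_check(), is_capture(), is_threat() and static_score() are
hypothetical, and the margin is just an illustrative number:

    FUTILITY_MARGIN = 150  # centipawns; purely illustrative

    def prunable(pos, move, alpha):
        # Never prune the forcing moves, however bad they look
        # statically: you have to search them to learn they are bad.
        if (is_check(pos, move) or is_capture(pos, move)
                or is_threat(pos, move)):
            return False
        # A quiet move with no realistic chance of reaching alpha
        # can go.
        return static_score(pos, move) + FUTILITY_MARGIN < alpha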

So I could be wrong about this, but I'm pretty convinced I am
right: you must look at high-information-content moves. These moves
are not "noisy," because they tend to tell you right away whether
you are going to win or lose.

> Testing ``out of atari'' moves is not good because they might be good,
> or might be bad, but merely because they might be good. 
That's what I mean by information content. You are semantically
challenged here; what you are saying agrees with what I am saying.
If I run a marathon just to "see if I can finish," you can also
rephrase this as "to see if I can finish or not." The only important
point is that I would run the marathon in order to make the
distinction.

> By contrast, you
> should test (in the tree) a kind of move that is either good or average,
> but not either average or bad, even if it's the same amount of
> information. In the tree, you look for the best move. Near the root at
> least; when going deeper, the evaluation being less precise, you
> merely look for good moves that keep a trustworthy evaluation of the
> position, and try to avoid brittle ones.
>   
Again, you are semantically challenged but basically correct. A
move that you can statically evaluate as being on the lower end of
the scale does not have much information content; in other words,
searching it in the tree has very little chance of changing the
score.

The alpha-beta procedure in general is about information content in
a big way. Many of the branches alpha-beta cuts off spring from
great moves that nevertheless have no chance of changing the score,
so there is no useful information in them. You would not look at
those moves just because they happen to be "really great" moves.
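
Here is a tiny self-contained toy - a hand-made two-ply tree with
scores from our point of view, the opponent picking the minimum
reply - showing a spectacular-looking move skipped precisely because
it carries no information:

    def root_value(tree):
        # tree: one list of opponent reply scores per move of ours.
        best, examined = -10**9, []
        for replies in tree:
            worst = 10**9
            for score in replies:
                examined.append(score)
                worst = min(worst, score)
                if worst <= best:
                    break  # cutoff: this move is already refuted
            best = max(best, worst)
        return best, examined

    # The 99 looks fantastic for us, but the opponent already has
    # the reply 2 on that branch, so 99 can never affect the root:
    print(root_value([[3, 5], [2, 99]]))  # -> (3, [3, 5, 2])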

> In the playouts, that's another matter. I would say that (almost) always
> playing 'out of atari' would add stability, much in the way Magnus
> Persson very well explained.
>
> What do we want of playout policies?
> As static evaluation functions, we would want them to give the right
> ordering of move values, with differences as wide as possible.
> More precisely, we would want that among the best moves; it's not
> important if the evaluation is imprecise for bad moves.
>   
Maybe I'm semantically challenged now,  but the only correct ordering of
move values is win or lose.    I suppose you are saying that there
should be a variety of scores that somehow reflect the difficulty of
winning?  

This is hard to talk about without quantifying exactly what should
be returned in an ideal world. If the evaluation could be perfect,
of course you would want it to always return 0 or 1. Since in the
real world it is not perfect, it's probably useful for it to return
a score that is somehow proportional to the difficulty of winning
when averaged over many playouts.
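
In code terms, the evaluation I have in mind is nothing more than
this (random_playout() is a hypothetical playout policy returning 1
if the side to move wins the playout, 0 otherwise):

    def mc_eval(pos, n=1000):
        # Mean of many 0/1 playout results: an estimate in [0, 1]
        # of the win probability - roughly, how hard this is to win.
        wins = sum(random_playout(pos) for _ in range(n))
        return wins / n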

I suspect, and this could be tested, that many heavy playout
policies are heavy at the extremes, returning either mostly wins or
mostly losses when tried from similar positions. I would imagine
that you are right about this if you are saying that the behavior
should be more in line with the "difficulty" of winning (which is a
concept we cannot really pin down).
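
One way to run that test, as a sketch (positions would be a set of
roughly comparable test positions; random_playout() as above):
histogram the per-position win rates and see where the mass lands.

    def winrate_histogram(positions, n=200, buckets=10):
        # A policy that is "heavy at the extremes" piles most of
        # its mass into the first and last buckets instead of
        # spreading it by how hard each position actually is.
        hist = [0] * buckets
        for pos in positions:
            rate = sum(random_playout(pos) for _ in range(n)) / n
            hist[min(int(rate * buckets), buckets - 1)] += 1
        return hist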

> Now we build a tree while computing the evaluation function, so we
> can allow for false good moves if they are quickly seen as such in the
> tree, that is, after a one- or three-ply search, provided the false
> good moves are not too numerous.
> False bad moves are much worse, since we might never explore the
> branch long enough to correct this impression.
>
> The latter paragraph applies also to pre-ordering (I keep a model like
> the one of Crazystone in mind, with a priori probabilities for each move
> in the tree, and also a distribution in the playouts).
>
>
> Conclusions:
> It does not matter if there is a bias in the playout policy, as long
> as it's the same for all moves.
>   
This is murky ground and I would prefer not to comment too much on
it. When I say bias, I mean that any playout strategy will naturally
be better at some types of positions than others. They will all have
blind spots, and heavier playouts may play much better but still
have blind spots.

I don't really understand your statement so I can't agree or disagree. 


> Bias toward solid moves is therefore a nuisance...
> Playing (not too numerous) nonsensical moves would only be a nuisance if
> there are positions whose associated playouts call for many more urgent
> moves than in others.
>
> What matters is distinguishing two moves having very different
> values. Here the reduction of noise comes into play: if there is a
> 50% chance in all playouts that a live group dies, and decides the
> game, then the
> difference in evaluation of the other positions is divided by two...
> Taking out all the common noise (that is the mistakes that appear in all
> playouts) makes the distinction easier.
> On the other hand, concentrating along a wrong evaluation (after this
> particular move, the group is dead) would be a catastrophe.
> If this comes from one particular move, it should be noticed and
> played/avoided systematically.
>
>
> About learning on the fly:
> I agree completely; that was one of my first posts.
> However, I really think we should have learnt patterns (and other
> properties) first: you cannot learn the whole of Go, or even your
> game, in one game.
> And learning is a good thing, but you must find the good move
> first, and as quickly as possible.
> For one thing, if we learn patterns here, they should obviously be
> localized (not translation-invariant). We could also learn short
> move sequences.
> In a setting with probability distribution on moves, taking that into
> account is merely changing the probability distribution. The
> question is: how?
>
> By the way, my guess is that learning on the fly would be more important
> in the playouts than in the tree: it would contribute to stabilizing the
> playouts. The tree should anyhow end up with the good moves.
>
> This learning should probably also come from the playouts (a great
> deal of information, and we could stick with information already
> calculated for the playouts, allowing easy re-use), automatically
> building a status for
> groups that are solved only near the end...
>
> Jonas, who always rereads thinking ``How do I manage to be that
> unclear!''
>   
A lot of these concepts are not easy to define and are subject to
semantic confusion, so it's no surprise. I have argued with people
before who actually agreed with me from the start, but they didn't
understand my point of view, nor I theirs.

- Don

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
