Mark Boon wrote:
> On 6-dec-07, at 19:29, Don Dailey wrote:
>>
>> Here is an example of why this works so well and why your greedy
>> approach is so wrong:
>>
>> Consider a position where there are 2 groups left that are being
>> fought over. One of these groups is very large and the other is quite
>> small. The computer must win one of these groups or it will lose the
>> game - but if it wins either group it wins the game. The computer
>> estimates that attacking the huge group will result in an impressive win
>> with 80% confidence, but attacking the small group will result in a
>> "tiny" win with 85% confidence.
>
> Don, what I think they are trying to say is the following: in your
> scenario there may be a third group that is currently 'evaluated' as
> dead which is bigger than your small group. This evaluation may be
> wrong; call the chance that it is actually correct X. In this case,
> killing the big group will give you an 80% chance of winning, where
> killing the smaller group gives you 85% times X, the chance the other
> group was evaluated correctly. That may turn out to be far less than
> 80% in total because X needs to be >95%.
>
> So size makes up for some loss of confidence. The question is how to
> quantify it.
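For concreteness, the break-even point in the quoted argument is easy to compute. The 80% and 85% figures are the ones from the scenario above; everything else in this little sketch is just illustrative arithmetic, not something from either post:

```python
# Break-even reliability X for the "third group is dead" evaluation.
# Attacking the small group wins with probability 0.85 * X; attacking
# the huge group wins with probability 0.80 outright.
p_big = 0.80     # win probability when attacking the huge group
p_small = 0.85   # win probability when attacking the small group,
                 # given the third-group evaluation really is correct

break_even = p_big / p_small
print(f"break-even X = {break_even:.3f}")   # ~0.941, i.e. roughly 94-95%
```

In other words, the small-group attack only comes out ahead if the dead-group evaluation is reliable to better than about 94%, which is the ">95%" ballpark in the quote.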
But the point is that size should NEVER over-ride confidence. The size of
stones is very much a consideration in Monte Carlo play-outs, and the only
time it isn't a consideration is when it doesn't matter. The whole
conversation sprang from a misconception that a monte carlo program was
losing game after game due to ignoring a few stones it could have captured,
but this was not what happened - it was a difference in how handicap games
were scored.

Would you rather be 95% confident of a win or 90% confident? There is only
1 correct answer to that question.

> Of course 'evaluation' in MC is completely counterintuitive.

I disagree. This is the most intuitive kind of evaluation, but it's more
difficult to code up in a traditional evaluation function, if that is what
you mean. Trying to grab more stones than you need to win is a more
difficult goal to impose on a program. Your evaluation function DEFINES
the goal that your program will be playing for.

I think you are confusing this with "follow through" in other sports. When
you run a sprint you don't stop at the finish line, you have to run full
strength until you are clearly beyond the finish line, otherwise you pull
up a little short. But I think you imagine that playing to win is like not
following through in sports. You think this is the same as playing to win
by 0.5 points (and no more.) And this simply is NOT the case. In most
cases monte carlo programs are trying to win a lot more than 0.5 points -
precisely because it increases the chances of winning, period. In the many
cases where they win by exactly 0.5 it's because they saw the clear win
early, focused on not spoiling this for anything in the world, and
followed through on this. You would have them stop short of this and go
for some speculative big win and take foolish risks.

> It doesn't evaluate groups separately at all. It makes it hard to
> think about ways to improve it. But I'm absolutely convinced that MC
> programs will play better if they find a way to adjust confidence for
> the size of the win. The correct way just hasn't been found yet. And
> it may not be easy at all.

I have a reason for believing this won't work. First of all, material
evaluation in GO is an even WORSE evaluation function than in Chess. It
says that losing by 0.5 is ALMOST the same as winning by 0.5. When you
have no way to calculate the odds, that seems to be the simplest thing to
do. Monte Carlo play-outs represent something WAY more intelligent: they
know the difference between wins and losses. If you mix something inferior
with something superior, you hope that somehow the whole is greater than
the sum of the parts. But when you mix these two things you are getting
the average of the two, not the best of both.

It's pretty easy to see why this doesn't work. If you do straight stone
counting, a very large group that might be won quickly dominates the
score. Consider this:

1. A group of 5 stones that you can win for sure, and it wins the game.

2. A group of 200 stones - you have a 20% chance of winning it. You lose
   otherwise.

In stone counts, scenario 1 gives you 5 stones and scenario 2 gives you 40
stones (a 20% chance of winning 200 stones.) The difference is enormous!
The losing move wins by a large margin! Even more amazing is that even if
the big group were infinitely large, scenario 1 is a sure win and scenario
2 represents only a 20 percent chance of winning the game. This is the
nature of the error that you want to fold into the evaluation function.
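To make the two evaluations in that example concrete, here is a minimal sketch in Python. The numbers are the ones from the scenario above; the function and move names are mine, purely for illustration:

```python
# Two candidate moves from the scenario: a certain small capture that wins
# the game, versus a long-shot attack on a huge group.
moves = {
    "take the 5-stone group":     {"p_win": 1.00, "stones": 5},
    "attack the 200-stone group": {"p_win": 0.20, "stones": 200},
}

def stone_count_score(m):
    # greedy "material" view: expected number of stones gained
    return m["p_win"] * m["stones"]

def win_rate_score(m):
    # Monte Carlo view: fraction of play-outs that end in a win
    return m["p_win"]

best_by_stones = max(moves, key=lambda k: stone_count_score(moves[k]))
best_by_wins = max(moves, key=lambda k: win_rate_score(moves[k]))
print(best_by_stones)  # attack the 200-stone group (40 expected stones vs 5)
print(best_by_wins)    # take the 5-stone group (100% win vs 20%)
```

The stone counter picks the move that loses the game 80% of the time; the win-rate view picks the sure win, which is exactly the point of the example.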
It's natural to think that MC play-outs themselves are subject to error and
so need some help from some more "sensible" heuristics. For instance, a 30%
winning chance doesn't sound very good, but due to error it could be a lot
more. So should the program revert to stone counting, with the idea that if
you win a few more stones you will have a better chance of winning the
game?

Let's consider the difference in how both programs would play in this
scenario:

1. A stone counter will focus on a few small groups that are easy to win
   and minimize the sure loss.

2. A monte carlo program will see that these few small groups are
   irrelevant and will focus entirely on some line of play that gives it a
   fighting chance. Perhaps attacking a large group that it has about a 30%
   chance of winning. Not good odds, but certainly better than picking off
   the little stones that guarantee a loss.

With an alpha beta search, your horizon is limited. Let's say you are
doing a 5 ply search. Then your evaluation function should be heavily
biased towards counting potential territory. If there is a big group that
can be won in 5 ply and it's not clear whether you can waste any time,
then you should go for it, because you don't have any idea what will
happen later. You must be greedy. However, MC play-outs are not horizon
limited like this. It's stupid to make them greedy, because they may
notice that winning the big group leads to a loss every time and that some
other course of action is more productive.

I don't see how to mix the two, and many of us have tried to no avail. The
only possibility that I can think of is to put the stone count information
in the lower order bits and make them so low they cannot affect the higher
order probability bits.

- Don

> Mark

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
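Don's "lower order bits" suggestion can be read as a strict tie-break: pack the win-rate estimate into the high bits of an integer score and the stone margin into the low bits, so the margin can only decide between moves whose win rates are indistinguishable. A minimal sketch of that packing, assuming a win-rate resolution of 0.1% and a margin clamped into 9 bits (both choices are mine, not from the thread):

```python
# Pack the win rate into the high bits and the stone margin into the low
# bits of a single integer score.  The margin then breaks ties between
# moves whose win rates are equal at the chosen resolution, but it can
# never override a real difference in win rate.

def packed_score(win_rate, margin):
    high = int(round(win_rate * 1000))           # win rate to 0.1% (0..1000)
    low = max(0, min(511, int(round(margin))))   # clamp margin into 9 bits
    return (high << 9) | low

# A higher win rate always dominates, regardless of the margin at stake...
assert packed_score(0.85, 1) > packed_score(0.80, 200)
# ...and the margin decides only when the win rates are indistinguishable.
assert packed_score(0.80, 30) > packed_score(0.80, 5)
```

With this encoding a move that wins more play-outs always scores higher, and the stone count only matters when two moves have exactly the same quantized win rate, which is the behaviour the paragraph above describes.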