Gian-Carlo Pascutto wrote:
> Don Dailey wrote:
>
>>> The rest of your story is rather anecdotal and I won't comment on it.
>> Are you trying to be politely condescending? 
>
> No! Thing is:
>
> 1) I disagree with quite a few things which I have no interest in
> arguing (much) about because...
> 2) I wouldn't trust any opinion (including mine) that's not backed by
> cold, hard data. It seems this data doesn't publicly exist or we're
> not aware of the right publication.
OK, I don't blame you for distrusting my previous study, especially
since it was many years ago and I cannot show you the actual data.

So, in order that this doesn't remain anecdotal, I'm doing another
study and I'm going to publish it on a web site somewhere. I'm also
going to do something that you don't see in most published papers in
computer science: I am going to make it possible to verify my results.

I'm very surprised that you would even challenge me on this, because
this is something that I thought was still common knowledge in the
computer chess community.

Here is what I'm going to do:  

I will take an open source chess program, Toga, and run a round-robin
tournament between 14 players: two versions of Toga, each at fixed
depths 1 through 7, where one version has pawn structure, king safety,
and passed pawns turned off.

BOTH versions have NullMove Pruning and History Pruning turned off,
because I feel that leaving them on would bias the test due to
interactions between selectivity and evaluation quality (I believe it
would make the strong version look even more scalable than it is).

I'll be using 200 different openings, each played from both sides, so
each program will play each other program 400 times. I'll publish the
exact Toga version and the "uci setoption" settings I used, as well as
the complete PGN game file, so people are free to interpret the
results any way they choose. In my experience, there will be a few
unreasonable people who won't believe the results no matter how the
testing is done and will rationalize them away, much as the Flat Earth
Society refuses any kind of empirical evidence. Nevertheless, I will
present it to the world as it is.
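
To make the bookkeeping concrete, here is a minimal Python sketch of
the schedule arithmetic. The player names and constants below are
illustrative placeholders of mine, not the actual test harness; the
real games will be driven by a separate UCI match tool.

    from itertools import combinations

    DEPTHS = range(1, 8)          # fixed depths 1..7
    EVALS = ["full", "stripped"]  # "stripped" = pawn structure, king
                                  # safety, and passed pawns turned off

    # 2 eval configurations x 7 depths = 14 players (names illustrative)
    players = [f"Toga-{e}-d{d}" for e in EVALS for d in DEPTHS]

    # every program plays every other program
    pairings = list(combinations(players, 2))

    OPENINGS = 200
    GAMES_PER_PAIRING = OPENINGS * 2  # each opening from both colors

    print(len(players), "players,", len(pairings), "pairings,",
          len(pairings) * GAMES_PER_PAIRING, "games in total")
    # -> 14 players, 91 pairings, 36400 games in total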

I believe a reasonable person will come to the conclusion that the
quality of the evaluation is not just another constant that contributes
a fixed Elo increment to the program, but that it has a synergistic (I
hate that word, but it applies here) effect on the whole program and
its scalability.
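
To spell out the distinction: under the standard logistic Elo model, a
match score s maps to an implied rating difference of
-400 * log10(1/s - 1). If evaluation quality were a fixed increment,
the full-eval version's implied gap over the stripped one would be
roughly constant across the fixed depths; the synergy claim is that it
grows. A minimal sketch (the scores below are hypothetical
placeholders, not results):

    import math

    def elo_diff(score):
        """Elo difference implied by a match score in (0, 1)."""
        return -400.0 * math.log10(1.0 / score - 1.0)

    # A fixed Elo increment would give a roughly constant gap across
    # depths; the synergy claim is that the gap widens as depth grows.
    for depth, score in [(1, 0.60), (4, 0.70), (7, 0.80)]:  # hypothetical
        print(f"depth {depth}: score {score:.2f} -> "
              f"{elo_diff(score):+.0f} Elo")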

Now, here is my comment about Tord's statement:

    "Fruit's evaluation function is actually very good. It is true that there
    are many programs with more knowledgeable evals, but as explained above,
    this is not the same as better evals. Fruit's evaluation is founded on a
    sound philosophy, and has very few bugs. This is far more important than how
    much knowledge it contains."

I agree with this completely, especially the very first sentence, that
Fruit has a very good evaluation function. Tord's statement doesn't
claim that a simple evaluation is better, only that Fruit's is better
than others that are less well engineered. The quality of an
evaluation function depends on many things that are, taken together,
rather difficult to assess, and it is not about the total number of
evaluation features; that's only one aspect of the whole picture.
 

>
>> I'm not sure what you mean by the "starting
>> premise."     What is the starting premise? 
>
> Better "knowledge" scales better.
But this is the point you are challenging me on, and the statement you
quoted from me was about chess, so I assume you are talking about Go?


>
>>> I was surprised by the original Mogo-Leela result but the "light"
>>> result seems to show it was a bit of a coincidence.
>> I'm sorry, I don't follow.   What is surprising about the original
>> Mogo-Leela result?   Is it better or worse than you expected? 
>
> Given that I started with Mogo playouts and improved them, why did we
> end up with two parallel lines?
>
> Perhaps Mogo's search is (was?) better.
We don't really know Mogo's exact evaluation function, or yours for
that matter. But it's interesting to me too that the lines are
parallel. I can only guess that the playout quality is similar between
the two programs.


>
>> What I mean is that the evaluation function is of better quality - knows
>> more about chess in some sense.  
>
> Yes, but "knows more" can be something very different from what one
> normally thinks.
>
> Would you rather have no evaluation for some feature or an evaluation
> that is wrong?
I answered that above: in summary, it's pretty difficult to assess an
evaluation function because so many factors interact. I'm not a big
knowledge advocate; I believe that the more knowledge you have, the
more difficult it is to get it right, and indeed it can conflict with
other knowledge.



- Don
   



