>2) Many go program authors have stated that "play to maximize wins"
>is stronger than "play to maximize points".  I think this is because
>their evaluation functions are imperfectly optimistic--the program
>counts points that future play does not deliver.

You could be right, since no one is really sure about this. But I would say
with about 90% confidence that you are wrong, and that using point
differential in MCTS *should* be weaker than using winning percentage.

The problem with point differential is the signal-to-noise ratio. The result
of a single trial could have a standard deviation of 100 points, so if you
are trying to identify the move with the highest point differential then you
have a lot of statistical noise to work through.
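
To put a rough number on that noise, here is a small sketch. It assumes the
two candidate moves are compared by independent playouts with the same
per-trial standard deviation, and it assumes a hypothetical true gap of 1
point between them (that gap is an illustration, not a measured figure). The
name `trials_needed` and the 1.96 threshold are likewise illustrative choices.

```python
import math

def trials_needed(sigma, gap, z=1.96):
    """Playouts per move so that a true mean gap of `gap` is about z standard errors wide.

    The difference of two sample means over n playouts each has
    standard error sigma * sqrt(2 / n); this solves gap = z * that for n.
    """
    return math.ceil(2 * (z * sigma / gap) ** 2)

if __name__ == "__main__":
    # sigma = 100 points per trial, as in the estimate above; gap = 1 point is assumed.
    print(trials_needed(sigma=100.0, gap=1.0))
```

The count grows with the square of sigma/gap, so halving the noise cuts the
required playouts by a factor of four.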

Winning percentage has only two outcomes, so the standard deviation is
bounded (at most 0.5 per trial). And it seems (experimentally) that the
difference between best and second-best plays based on winning percentage
(the "signal") is a larger fraction of the standard deviation.
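
One way this can happen is illustrated by the toy model below. Everything in
it is an assumption for illustration, not data from a real program: each
playout's score is a small local margin (Gaussian, sd 5) plus an occasional
60-point swing that is independent of the move chosen, and move A is 2 points
better than move B on average. With a plain Gaussian score and no swings the
comparison would favor point differential instead, so the outcome depends on
the heavy-tailed shape assumed here.

```python
import math
import random

def playout_score(mean_margin, rng):
    """One hypothetical playout: small local margin plus an occasional big swing."""
    margin = rng.gauss(mean_margin, 5.0)
    # Assumed: a large group lives or dies half the time, regardless of the move.
    swing = rng.choice([-60.0, 0.0, 0.0, 60.0])
    return margin + swing

def mean_and_sd(xs):
    """Sample mean and population standard deviation of a list of floats."""
    m = sum(xs) / len(xs)
    return m, math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def signal_to_noise(a, b):
    """Gap between the two sample means, in units of the average per-trial sd."""
    mean_a, sd_a = mean_and_sd(a)
    mean_b, sd_b = mean_and_sd(b)
    return (mean_a - mean_b) / ((sd_a + sd_b) / 2)

def compare_estimators(n=50_000, seed=1):
    """Return (signal-to-noise using points, signal-to-noise using win/loss)."""
    rng = random.Random(seed)
    scores_a = [playout_score(2.0, rng) for _ in range(n)]
    scores_b = [playout_score(0.0, rng) for _ in range(n)]
    wins_a = [1.0 if s > 0 else 0.0 for s in scores_a]
    wins_b = [1.0 if s > 0 else 0.0 for s in scores_b]
    return signal_to_noise(scores_a, scores_b), signal_to_noise(wins_a, wins_b)

if __name__ == "__main__":
    snr_points, snr_wins = compare_estimators()
    print(f"points: {snr_points:.3f}  wins: {snr_wins:.3f}")
```

In this model the win/loss signal comes out larger relative to its noise,
because the big swings inflate the spread of the score while the 0/1 outcome
stays capped.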

Now, this could just be an implementation artifact: if someone wrote a
better estimator of point differential, we might see different results.

But then every MCTS Go program does use winning percentage, so we believe
that this is implied by the domain.

Brian

_______________________________________________
Computer-go mailing list
Computer-go@dvandva.org
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
