At the end of a playout there is probably some code that says something
like
  reward = (score > komi) ? 1.0 : 0.0;

You can just replace it with
  reward = 1 / (1 + exp(- K * (score - komi)));

A huge value of K will reproduce the old behaviour, a tiny value will result
in a program that tries to maximize expected score, and values in the middle
will blend both things nicely. Of course you would precompute this in a
table.

This seems elegant and simple to me. Now we only need to know how it affects
performance. I bet there are values of K that would make everyone happy (no
measurable loss in strength, still play good-looking moves even if the game
is decided).


Álvaro.


On Dec 13, 2007 3:42 PM, Chris Fant <[EMAIL PROTECTED]> wrote:

> On Dec 13, 2007 3:33 PM, Chris Fant <[EMAIL PROTECTED]> wrote:
> > Seems like the final solution to this would need to build out the
> > search tree to the end of the game, finding a winning line.  And then
> > search again with a different evaluation function (one based on
> > points).  If the second search cannot find a line that wins bigger
> > than the first search did, just play the move returned by the first
> > search.  And you could get more clever by allowing the second search
> > to start with some information from the first search.  Note that when
> > I say "winning line", I mean all the way to the end.  No MC here.
> >
>
>
> Actually, I suppose it need not be to the absolute end of the game.
> As long as all MC sims that finish out the game prior to scoring lead
> to a win, then you can consider the tree portion a guaranteed winning
> line and try the second search to maximize points.
> _______________________________________________
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
>
