Nice idea and worth a try.  I predict that this will weaken the
program no matter what value of K you use, but that there may indeed
be a reasonable compromise that gives you the "better" behavior with
only a very small decline in strength.

I think this bothers people so much that they would be willing to
sacrifice a tiny bit of strength to get the greedy behavior.

- Don


Álvaro Begué wrote:
> At the end of a playout there is probably some code that says
> something like
>   reward = (score > komi) ? 1.0 : 0.0;
>
> You can just replace it with
>   reward = 1 / (1 + exp(- K * (score - komi)));
>
> A huge value of K will reproduce the old behaviour, a tiny value will
> result in a program that tries to maximize expected score, and values
> in the middle will blend both things nicely. Of course you would
> precompute this in a table.
>
> This seems elegant and simple to me. Now we only need to know how it
> affects performance. I bet there are values of K that would make
> everyone happy (no measurable loss in strength, still play
> good-looking moves even if the game is decided).
>
>
> Álvaro.
>
>
> On Dec 13, 2007 3:42 PM, Chris Fant <[EMAIL PROTECTED]> wrote:
>
>     On Dec 13, 2007 3:33 PM, Chris Fant <[EMAIL PROTECTED]> wrote:
>     > Seems like the final solution to this would need to build out the
>     > search tree to the end of the game, finding a winning line.  And then
>     > search again with a different evaluation function (one based on
>     > points).  If the second search cannot find a line that wins bigger
>     > than the first search did, just play the move returned by the first
>     > search.  And you could get more clever by allowing the second search
>     > to start with some information from the first search.  Note that when
>     > I say "winning line", I mean all the way to the end.  No MC here.
>     >
>
>
>     Actually, I suppose it need not be to the absolute end of the game.
>     As long as all MC sims that finish out the game prior to scoring lead
>     to a win, then you can consider the tree portion a guaranteed winning
>     line and try the second search to maximize points.
>     _______________________________________________
>     computer-go mailing list
>     computer-go@computer-go.org <mailto:computer-go@computer-go.org>
>     http://www.computer-go.org/mailman/listinfo/computer-go/
>
>
