Nice idea and worth a try. I predict that this will weaken the program no matter what value you use, but that there may indeed be a reasonable compromise that gives you the "better" behavior with only a very small decline in strength.
I think this bothers people so much that they would be willing to sacrifice a tiny bit of strength to get the greedy behavior.

- Don

Álvaro Begué wrote:
> At the end of a playout there is probably some code that says
> something like
>     reward = (score > komi) ? 1.0 : 0.0;
>
> You can just replace it with
>     reward = 1 / (1 + exp(- K * (score - komi)));
>
> A huge value of K will reproduce the old behaviour, a tiny value will
> result in a program that tries to maximize expected score, and values
> in the middle will blend both things nicely. Of course you would
> precompute this in a table.
>
> This seems elegant and simple to me. Now we only need to know how it
> affects performance. I bet there are values of K that would make
> everyone happy (no measurable loss in strength, still play
> good-looking moves even if the game is decided).
>
> Álvaro.
>
> On Dec 13, 2007 3:42 PM, Chris Fant <[EMAIL PROTECTED]> wrote:
> > On Dec 13, 2007 3:33 PM, Chris Fant <[EMAIL PROTECTED]> wrote:
> > > Seems like the final solution to this would need to build out the
> > > search tree to the end of the game, finding a winning line. And then
> > > search again with a different evaluation function (one based on
> > > points). If the second search cannot find a line that wins bigger
> > > than the first search did, just play the move returned by the first
> > > search. And you could get more clever by allowing the second search
> > > to start with some information from the first search. Note that when
> > > I say "winning line", I mean all the way to the end. No MC here.
> >
> > Actually, I suppose it need not be to the absolute end of the game.
> > As long as all MC sims that finish out the game prior to scoring lead
> > to a win, then you can consider the tree portion a guaranteed winning
> > line and try the second search to maximize points.
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/