Weston Markham wrote:

> But of course, it's not the size of the win that counts, it is rather
> the confidence that it really is a win.

Yes, and my reasoning was that a larger average win implied a higher confidence since there is more room for error. That intuition may not hold, though.
> In random playouts that
> continue from a position from a close game, the ones that result in a
> large victory are generally only ones where the opponent made a severe
> blunder.  (Put another way, the score of the game is affected more by
> how bad the bad moves are, rather than how good the good ones are, or
> even how good most of the moves are.  Others have commented on this
> effect in this list, in other contexts.)  Since you can't count on
> that happening in the real game, these simulations have a lower value
> in the context of ensuring a win.

That is the first argument I've heard that makes some sense of why this effect may be real. The opposite may of course be true as well: games that look close may not really be close, due to the same blunder effect. Perhaps it is just another symptom of the fact that most playouts are nonsense games.

> (snip)

> Given that people have reported such a strong effect, I am actually
> wondering if these simulations (those that result in a large score
> difference) should be _penalized_, for not being properly
> representative of the likely outcome of the game.  In other words:
>
> value = 1000 * win - score
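
For concreteness, that would rank a half-point win above a forty-point one, since the score term only separates results with the same winner. A minimal sketch in Python, assuming win is 1 for a win and 0 for a loss, and score is the final margin in points:

# Value function quoted above: a win is worth 1000 minus the margin,
# so a blowout win is valued slightly below a narrow win.
def playout_value(win, score):
    return 1000 * win - score

print(playout_value(1, 0.5))    # narrow win  -> 999.5
print(playout_value(1, 40.5))   # blowout win -> 959.5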

Instead of penalizing these simulations, what about keeping a frequency distribution of the simulation scores, throwing out the lower and upper extremes, and then using the average of what remains? With the extremes trimmed, it might be safer to use the scoring information in the evaluation. The same frequency distribution might also give a kind of quiescence or trust measure over a set of simulations: a single tight cluster of scores coming back from a position would suggest the evaluation can be trusted.
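
Here is a minimal sketch of what I mean, assuming we just have the raw playout scores for a node; the trim fraction is an arbitrary choice:

# Trimmed mean of playout scores: drop the most extreme results on each
# side before averaging, so blunder-driven blowouts carry less weight.
def trimmed_mean(scores, trim_fraction=0.2):
    ordered = sorted(scores)
    k = int(len(ordered) * trim_fraction)         # how many to drop per tail
    kept = ordered[k:len(ordered) - k] or ordered
    return sum(kept) / len(kept)

scores = [2.5, 3.5, 0.5, 1.5, 60.5, -70.5, 4.5]   # margins from playouts
print(trimmed_mean(scores))                       # 2.5: the two blowouts are dropped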

The real problem is with the poor simulations. Is there a way to measure the quality of a simulation somehow? If this were feasible, having the scores and confidence factors for each simulation would be pretty powerful for UCT evaluation, wouldn't it? Could the number of captured stones during a random simulation be an indicator? Are there other (cheap) heuristics that could be used to recognize nonsensical patterns during the playout?
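
For example, something like a quality-weighted win rate; this is only a sketch, and the capture-count heuristic and its scaling constant are just the guess from the question above, not something tested:

# Weight each playout by a crude quality estimate. Here the estimate is
# based on how many stones were captured during the playout, on the
# (untested) guess that heavy capturing in a random game signals blunders.
def playout_quality(captures, max_captures=30):
    return max(0.0, 1.0 - captures / max_captures)   # 1.0 = trusted, 0.0 = nonsense

def weighted_win_rate(results):
    # results: list of (win, captures) pairs, win being 1 or 0
    total = sum(playout_quality(c) for _, c in results)
    if total == 0.0:
        return 0.5   # nothing trustworthy; treat the node as unknown
    return sum(w * playout_quality(c) for w, c in results) / total

results = [(1, 4), (0, 25), (1, 8), (1, 40)]   # (win, captured stones)
print(weighted_win_rate(results))              # ~0.91 with these numbers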

A while back I suggested a "stratified sampling" method, where several different simulation distributions would each contribute playouts and their results would be combined, to combat the weaknesses of any single simulation method. Does anyone have any thoughts about this?
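
Roughly what I have in mind, as a sketch; the two policies here are just stand-ins, and the equal weighting and per-policy quota are arbitrary choices:

import random

# Stratified sampling over several playout policies: run a fixed quota of
# playouts under each policy, then combine the per-policy win rates so that
# no single policy's bias dominates the evaluation of the node.
def evaluate(position, policies, playouts_per_policy=100):
    means = []
    for policy in policies:
        wins = sum(policy(position) for _ in range(playouts_per_policy))
        means.append(wins / playouts_per_policy)
    return sum(means) / len(means)   # equal-weight combination of strata

# Toy stand-ins for real playout policies; each "playout" just returns 0 or 1.
uniform_random = lambda pos: random.randint(0, 1)
capture_biased = lambda pos: random.randint(0, 1)

print(evaluate(None, [uniform_random, capture_biased]))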

-Matt


_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
