On 5/18/07, Rémi Coulom <[EMAIL PROTECTED]> wrote:
> My idea was very similar to what you describe. The program built a collection of rules of the kind "if condition then move". A condition could be anything from a "tree-search rule" of the kind "in this particular position, play x" to a general rule such as "in atari, extend". It could also be anything in between, such as a miai specific to the current position. The strengths of moves were updated with an incremental Elo-rating algorithm, from the outcomes of random simulations.
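For concreteness, here is roughly how I picture such a rule collection in code. This is only a sketch: all the names, and the choice of sampling moves in proportion to an exponential of the rating, are my own guesses rather than anything Rémi has confirmed.

    import math
    import random

    class Rule:
        """A condition->move rule carrying an Elo-like rating (hypothetical layout)."""
        def __init__(self, name, condition, propose, rating=0.0):
            self.name = name
            self.condition = condition  # position -> bool: does the rule fire here?
            self.propose = propose      # position -> move proposed by the rule
            self.rating = rating        # Elo-like strength, updated from playouts

    def select_move(position, rules, scale=400.0):
        """Among the rules that fire, sample one with weight exp(rating/scale)."""
        firing = [r for r in rules if r.condition(position)]
        if not firing:
            return None, None
        weights = [math.exp(r.rating / scale) for r in firing]
        chosen = random.choices(firing, weights=weights, k=1)[0]
        return chosen, chosen.propose(position)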
The obvious way to update weights is to reward all the rules that fired for the winning side and penalize all the rules that fired for the losing side, with rewards and penalties decaying toward the end of the playout. But this is not quite Elo-like, since it doesn't model rules as beating one another. So one could make the reward depend on the relative weight of the chosen rule versus all alternatives, increasing the reward when the alternatives carried a lot of weight. Is that how your ratings worked?

I'm not sure how that compares with TD learning. Maybe someone more familiar with the latter can point out the differences.

regards,
-John
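P.S. In code, the update I have in mind looks something like the following (continuing the sketch above; again, every name is my own invention). The expected score is the chosen rule's share of the total weight of all firing alternatives, Bradley-Terry style, so winning against heavy alternatives moves the rating more, and the step size decays for moves nearer the end of the playout:

    import math

    def update_ratings(trace, winner, k=8.0, decay=0.95, scale=400.0):
        """trace: list of (ply, side, chosen_rule, firing_rules) per playout move.
        winner: the side that won the playout."""
        for ply, side, chosen, firing in trace:
            total = sum(math.exp(r.rating / scale) for r in firing)
            # Expected score: the chosen rule's share of the total weight,
            # as if it had played a match against the firing alternatives.
            expected = math.exp(chosen.rating / scale) / total
            outcome = 1.0 if side == winner else 0.0
            # Rewards and penalties shrink for moves later in the playout.
            step = k * (decay ** ply)
            chosen.rating += step * (outcome - expected)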