Darren Cook wrote:
I've been toying with the idea of having a set of playout algorithms and
allowing black and white to use different algorithms within the same playout.
 (The idea came from trying to think how I could apply genetic
algorithms to UCT playouts.)

Here's how it would work. Assume you have 4 algorithms, A/B/C/D, some
aggressive, some defensive, etc. All with a random element. For the
first 16 playouts you try all combinations:
  Black uses A, White uses A;
  Black uses A, White uses B;
  ...
  Black uses D, White uses D;

Now, if you noticed any trends then emphasize them in the choice of
future playout algorithms. So if black never won with algorithm A, but
always won with B, and won about half with C and D, then he'd choose A
5% of the time, B 45% of the time, C 25% and D 25% of the time. White
may have found playout algorithms A and B won 3 out of 4, whereas C and
D only won 1 out of 4. So white would choose A and B 35% of the time, C
and D 15% of the time.

In go terms, white may be ahead on territory but have a lot of
weaknesses. Algorithm A might weight responding to the last enemy
move. Algorithm B might encourage making good shape. Algorithm C
might encourage capturing, or giving atari, whenever possible. D might
prefer areas of equal influence.

So, after those initial 16 playouts each side would be choosing a
playout algorithm that better exploits their current position.

I have also considered, instead of 4 distinct algorithms, one algorithm
with some tunable parameters. The idea then would be to adjust the
parameters, based on which playouts win and lose, before doing the next
playout.
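For the tunable-parameter variant, one simple way to do that adjustment is a (1+1)-style stochastic search: perturb the parameters, keep the perturbation only if it wins at least as often. This is a sketch, not anyone's actual engine; `run_playout`, the parameter names, and the acceptance rule are all illustrative assumptions:

```python
import random

def tune_playout_params(params, run_playout, rounds=100, games=10, sigma=0.05):
    """Crude (1+1)-style search over playout-policy parameters.

    run_playout(params) -> True if our side won that playout
    (a hypothetical callback into the playout engine).
    Each round, gaussian-perturb the current best parameters and keep
    the candidate only if it wins at least as many test playouts."""
    best = dict(params)
    best_wins = sum(run_playout(best) for _ in range(games))
    for _ in range(rounds):
        candidate = {k: v + random.gauss(0, sigma) for k, v in best.items()}
        wins = sum(run_playout(candidate) for _ in range(games))
        if wins >= best_wins:
            best, best_wins = candidate, wins
    return best
```

With only a handful of playouts per candidate the win counts are very noisy, so in practice `games` would need to be larger, or the acceptance test replaced with something statistical.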

Darren
I've been thinking about that type of thing too. It seems like Remi's framework of learning from professional games could be applied in a similar manner.
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
