Don wrote:
> Presumably, if you run 1000 random play-outs from a given position you
> will get a fair indication of "how good" the position is.   
> 
> But what if you are able to prune out many of the bad moves in that
> simulation?   Would this improve the accuracy of the simulation?   
> 
> Probably, but not necessarily.   Suppose that during the play-outs, you
> are able to prune out 50% of the "bad" black moves, but only 30% of the
> "bad" white moves?     You would be playing 1000 simulations where BLACK
> was playing consistently stronger, regardless of how good the actual
> position was.   
> ...

Yes, I've noticed this.

I've been toying with the idea of having a set of playout algorithms
and allowing black and white to choose different algorithms within a
playout. (The idea came from thinking about how I could apply genetic
algorithms to UCT playouts.)

Here's how it would work. Assume you have 4 algorithms, A/B/C/D, some
aggressive, some defensive, etc. All with a random element. For the
first 16 playouts you try all combinations:
  Black uses A, White uses A;
  Black uses A, White uses B;
  ...
  Black uses D, White uses D;
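
The calibration phase above could be sketched as follows. This is only
a minimal illustration, assuming a `playout(position, black_algo,
white_algo)` function that runs one random playout and reports whether
black won; the algorithm labels are just placeholders.

```python
import itertools
import random

ALGOS = "ABCD"

def calibrate(position, playout):
    """Run one playout for every (black, white) algorithm pairing.

    Returns a dict mapping (black_algo, white_algo) -> True if black
    won that playout, i.e. 16 results for 4 candidate algorithms.
    """
    results = {}
    for b, w in itertools.product(ALGOS, repeat=2):
        results[(b, w)] = playout(position, b, w)
    return results
```

With 4 algorithms this is exactly the 16 playouts listed above; with n
algorithms it would be n squared, which is why the set has to stay
small.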

Now, if you notice any trends, emphasize them in the choice of future
playout algorithms. So if black never won with algorithm A, but always
won with B, and won about half the time with C and D, then black would
choose A 5% of the time, B 45% of the time, and C and D 25% each.
White may have found that algorithms A and B won 3 playouts out of 4,
whereas C and D won only 1 out of 4; so white would choose A and B 35%
of the time each, and C and D 15% each.

In go terms, white may be ahead on territory but have a lot of
weaknesses. Algorithm A might be weighted towards responding to the
last enemy move. Algorithm B might encourage making good shape.
Algorithm C might encourage capturing, or giving atari, whenever
possible. Algorithm D might prefer playing in areas of equal
influence.

So, after those initial 16 playouts each side would be choosing a
playout algorithm that better exploits their current position.

I have also considered using, instead of 4 distinct algorithms, a
single algorithm with some tunable parameters. The idea then would be
to adjust those parameters, based on what wins and what loses, before
doing the next playout.
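
The tunable-parameter variant might look like a simple stochastic
hill-climb. This is a sketch under my own assumptions: the parameter
names are illustrative, `play_out(params)` is a hypothetical function
returning True if this side won the playout, and the
perturb-and-keep-on-win rule is one possible update, not a method from
the post.

```python
import random

def tune(params, play_out, n_playouts, step=0.1, rng=random):
    """Stochastic hill-climb over playout-policy parameters.

    Before each playout, perturb one randomly chosen parameter
    (clamped to [0, 1]); keep the change only if that playout is won.
    """
    for _ in range(n_playouts):
        trial = dict(params)
        name = rng.choice(list(trial))
        delta = rng.uniform(-step, step)
        trial[name] = min(1.0, max(0.0, trial[name] + delta))
        if play_out(trial):
            params = trial
    return params
```

Compared with the 4-discrete-algorithm scheme this searches a
continuous space, so it could in principle fit the position more
closely, at the cost of needing more playouts before the parameters
settle.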

Darren

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
