Quoting Hideki Kato <[EMAIL PROTECTED]>:

Yes, UCT does.  From my recent experiments with a delay
line (a fixed-size FIFO queue) between a UCT searcher and an MC
simulator with RAVE against GNU Go 3.7.11 level 0 on 9x9 (single
thread):

delay   #po     wins    games   winning rate    ELO     1 sigma of wr
0       1,000   721     2,000   36.05%          -99.6   1.07%
1       1,000   721     2,000   36.05%          -99.6   1.07%
2       1,000   690     2,000   34.50%          -111.4  1.06%
3       1,000   663     2,000   33.15%          -121.8  1.05%
5       1,000   642     2,000   32.10%          -130.1  1.04%
10      1,000   522     2,000   26.10%          -180.8  0.98%
20      1,000   412     2,000   20.60%          -234.4  0.90%
50      1,000   82      2,000   4.10%           -547.6  0.44%
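
For anyone who wants to check the last two columns: they are consistent with the usual logistic ELO formula and the binomial standard deviation of the winning rate. The sketch below is only a sanity check under that assumption, not the script that produced the table; the function names are made up for illustration.

```python
import math

def elo_diff(win_rate):
    """ELO difference implied by a winning rate (logistic model, assumed)."""
    return -400.0 * math.log10(1.0 / win_rate - 1.0)

def sigma_of_win_rate(wins, games):
    """One standard deviation of the observed winning rate (binomial)."""
    p = wins / games
    return math.sqrt(p * (1.0 - p) / games)

# Rows taken from the table above: (delay, wins) out of 2,000 games.
for delay, wins in [(0, 721), (50, 82)]:
    p = wins / 2000
    print(delay, f"{p:.2%}", round(elo_diff(p), 1),
          f"{sigma_of_win_rate(wins, 2000):.2%}")
```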

If I understand this correctly, the simulation for delay 50 computes 50 playouts and then updates the tree.
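
To make my reading concrete, here is a minimal sketch of one way a fixed-size FIFO delay line could behave: each playout result is held back until `delay` newer results have been queued, and only then is it applied to the tree. This is my assumption about the setup, not the actual experiment code, and `DelayLine` and `push` are hypothetical names.

```python
from collections import deque

class DelayLine:
    """Fixed-size FIFO that holds back results by `delay` steps (assumed design)."""

    def __init__(self, delay):
        self.delay = delay
        self.pending = deque()

    def push(self, result):
        """Queue a playout result; return the one now due for the tree, or None."""
        self.pending.append(result)
        if len(self.pending) > self.delay:
            return self.pending.popleft()
        return None

# With delay 2, the result of playout 0 reaches the tree only after playout 2.
line = DelayLine(2)
released = [line.push(i) for i in range(5)]
```

With delay 0 every result is returned immediately, which corresponds to the ordinary sequential search.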

In Valkyria I do the following. Every simulation starts from the root in its own thread and marks all nodes on its way down the tree as visited before entering the heavy playout. This means that all moves made in the tree are temporarily counted as losses. When the playout has finished, the half of the moves that were played by the winner are updated as wins accordingly.
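
Roughly, the bookkeeping looks like the sketch below. It is a simplified, single-threaded illustration, not Valkyria's actual code: `Node`, `mark_path_visited` and `apply_playout_result` are names invented here, and a real multi-threaded version would need atomic updates or locking, which I leave out.

```python
class Node:
    def __init__(self, mover):
        self.mover = mover  # colour that played the move leading to this node
        self.visits = 0
        self.wins = 0

def mark_path_visited(path):
    """Before the playout: count a visit with no win, i.e. a temporary loss."""
    for node in path:
        node.visits += 1

def apply_playout_result(path, winner):
    """After the playout: credit a win to the moves played by the winner."""
    for node in path:
        if node.mover == winner:
            node.wins += 1

# A path of three moves in the tree: Black, White, Black.
path = [Node("B"), Node("W"), Node("B")]
mark_path_visited(path)          # every node now looks like a loss
apply_playout_result(path, "B")  # Black won, so Black's moves become wins
```

Between the two calls, other threads see a lower winning rate along this path and are therefore nudged toward other branches.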

The idea behind this is that it hopefully avoids searching the same path over and over again. Have you tried anything like this?

Also, your results clearly show that there is an inefficiency. But do you also have results where, for example, delay 50 also computes 50x1000 simulations, so that we can see what it means in practice?

Magnus





_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
