Re: [computer-go] Monte-Carlo Tree Search reference bot

Mark Boon Thu, 27 Nov 2008 19:48:58 -0800


On 27-nov-08, at 19:50, Denis fidaali wrote:

So, you use AMAF for "simulating" the first UCT evaluations?
I thought the classical way to use AMAF was to affect only the
win/loss ratio portion of the UCT equation.
Obviously it should be allowed to use an arbitrarily
large number of AMAF simulations, accumulating them for longer
than it takes to expand a node.
I think that a classical way to affect the win/loss ratio is to
decrease the effect of the AMAF correction as the number
of simulations grows.

If you test with a very low number of simulations
(in the 1000 - 3000 range), I think you should be
able to get a very nice improvement out of the
AMAF version. If you don't, I would think that something
is wrong somewhere.

What test process do you use for this version?

I tested it mostly doing 2,000 playouts. When AMAF is true I create a map of virtual-win values of all the moves played during a playout. These values get accumulated over all the playouts (not just the first ones). The 'virtual-value' of a move is calculated as follows:

exploration-factor * UCT + ( (nr-wins*2 + nr-virtual-wins) / (nr-playouts*2 + nr-virtual-playouts))

where the exploration-factor is currently sqrt(0.2) and UCT is sqrt( log( nr-parent-playouts ) / ( nr-playouts+1) )
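
For readers who want to try the numbers, here is a minimal Python sketch of the formula above. It is not the code from the bot itself; the function and parameter names are made up for illustration, and the guards for a zero denominator and for a parent with no playouts are assumptions the formula does not spell out.

```python
import math

# Exploration factor as stated in the text: sqrt(0.2).
EXPLORATION_FACTOR = math.sqrt(0.2)


def virtual_value(nr_wins, nr_playouts,
                  nr_virtual_wins, nr_virtual_playouts,
                  nr_parent_playouts):
    """Value of a move as described above (hypothetical helper).

    The real win/playout counts are counted twice, the AMAF
    (virtual) counts once.
    """
    # Assumption: clamp the parent count to 1 so log() is defined.
    parent = max(1, nr_parent_playouts)
    uct = math.sqrt(math.log(parent) / (nr_playouts + 1))

    denominator = nr_playouts * 2 + nr_virtual_playouts
    if denominator == 0:
        # Assumption: a move with no statistics at all scores 0.
        ratio = 0.0
    else:
        ratio = (nr_wins * 2 + nr_virtual_wins) / denominator

    return EXPLORATION_FACTOR * uct + ratio


# A move with 5 wins in 10 real playouts, 3 virtual wins in
# 10 virtual playouts, under a parent with 100 playouts.
print(virtual_value(5, 10, 3, 10, 100))
```
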

Like I said, I haven't had time to experiment much so this formula may not be any good. I had also expected to see some positive effect of the virtual-win / virtual-playout ratio from AMAF, but I see none. Of course it's also possible I have a different kind of bug still.

What happens in my 'formula' is that when it never expands beyond the first level, which is what happens if the number of simulations is equal to the number of simulations before expansion, the virtual-value becomes completely determined by nr-virtual-wins / nr-virtual-playouts making it equivalent to the original ref-bot. In case it does expand further and creates a tree, the actual win-loss ratio is weighed twice as heavily as the virtual win-loss ratio. That seemed like a reasonable first try. I have tried a few others, but usually didn't get much different results or much worse results.
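
The two cases in the paragraph above can be checked with a small Python sketch of the ratio part of the formula alone. The function name is hypothetical, and returning 0 when there are no statistics at all is an assumption.

```python
def blended_ratio(nr_wins, nr_playouts,
                  nr_virtual_wins, nr_virtual_playouts):
    """Ratio part of the virtual-value (hypothetical helper)."""
    denominator = nr_playouts * 2 + nr_virtual_playouts
    if denominator == 0:
        return 0.0  # assumption: no statistics means a score of 0
    return (nr_wins * 2 + nr_virtual_wins) / denominator


# Case 1: the node was never expanded, so it has no real playouts.
# Only the AMAF counts remain: 3 / 4 = 0.75.
print(blended_ratio(0, 0, 3, 4))

# Case 2: equal numbers of real and virtual playouts (10 each).
# Real rate 0.6 gets weight 2/3, virtual rate 0.2 gets weight 1/3,
# which gives 14 / 30, roughly 0.467.
print(blended_ratio(6, 10, 2, 10))
```

With unequal counts the weights are 2*nr-playouts and nr-virtual-playouts, so "twice as heavily" holds per playout rather than as a fixed 2:1 split.
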


Mark

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
