Re: [computer-go] Monte-Carlo Tree Search reference bot

dhillismail Sun, 07 Dec 2008 09:50:32 -0800

Finally got a window of time. It's called the "epsilon trick." Here is a cut 
and paste of Lukasz' explanation from the archives, including how to calculate 
N.
- Dave Hillis


--------------------------------------

-----Original Message-----
From: [EMAIL PROTECTED]
To: computer-go@computer-go.org
Sent: Sun, 14 Jan 2007 4:50 AM
Subject: [computer-go] Fast UCT (epsilon trick)


Hi 
I've looked back into the article and I see that it is not trivial to 
understand how does the epsilon trick (ET) works. Therefore I will 
describe it here on the example of UCT. 

The bottleneck of UCT algorithm is choosing a move. Program has to 
iterate over all children 
evaluating their UCB value. We want to use of this loop rarely. 
One way to do it is to group playouts together. I.e. play 2 or more MC 
simulations per each tree traversal. But this is not good enough as it 
doesn't distribute 
the playouts over the tree and some parts will not get enough of them. 

The ET works as follows: 
With each node of the tree program keeps additional information: 
- child_to_visit 
- visits_to_go 

When we visit node, 
if node.visits_to_go > 0 then 
node->visits_to_go <- node.visits_to_go - 1 
node = child_to_visit 

So we have predetermined strategy to go to particular child visits_to_go times. 

But 
if node.visis_to_go = 0 then // we have to set the strategy first 
node->child_to_visit = UCT_find_child (node) 
node->visits_to_go = Epsilon * node->visited_count // round up 

As we can see, when 
Epsilon is small or when a particular node is 
visited not too many times, UCT-ET is identical to UCT. 

But when node was visited more times (near the root, far from leaves), 
then UCT-ET may save some time and go to precalculated child. 

That's it. 
I hope You like it :) 
Łukasz Lew 

PS. 

In Df-PN, good Epsilon was 1/4 I believe that here it will be much 
smaller. 1/64 maybe. 
Or even one may change the Epsilon equation to: 
node->visits_to_go = Epsilon * sqrt(node->visited_count) 

If someone will do some experiments, I'm eager to hear the results. 

PS2 

float InvSqrt (float x) { 
  float xhalf = 0.5f * x; 
  int i = *(int*)&x; 
  i = 0x5f3759df - (i>>1); 
  x = *(float*)&i; 
  x = x * (1.5f - xhalf * x * x); 
  return x; 
} 

;-) 

_______________________________________________
--------------------------------------


-----Original Message-----
From: [EMAIL PROTECTED]
To: computer-go@computer-go.org
Sent: Wed, 3 Dec 2008 12:56 am
Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot


Hmm.. it could be that N is picked randomly each time... now I can't seem to 
find the description and my memory is not serving me well here. Perhaps someone 
else remembers? I'll try to track it down.

- Dave Hillis


-----Original Message-----
From: Michael Williams <[EMAIL PROTECTED]>
To: computer-go <computer-go@computer-go.org>
Sent: Wed, 3 Dec 2008 12:14 am
Subject: Re: [computer-go] Monte-Carlo Tree Search reference b
ot


That's going to repeat the same exact path through the tree three times, isn't 
it? If so, it seems like it would be more efficient to do N playouts from the 
leaf after each traversal of the tree. 
 
[EMAIL PROTECTED] wrote: 
> There is another speedup trick that might interest you. IIRC Lukasz Lew > 
> came up with it, but I forget what he called it. After calculating the > 
> selected move for an internal node (going through UCT and RAVE and > whatnot) 
> store that move. Then, for the next N visits to that node > (where N is 5 or 
> 10 or so), just repeat that move without having to > calculate what the move 
> would be. 
> > - Dave Hillis 
> > -----Original Message----- 
> From: Mark Boon <[EMAIL PROTECTED]> 
> To: computer-go <computer-go@computer-go.org> 
> Sent: Tue, 2 Dec 2008 11:17 pm 
> Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot 
> > I have made some minor performance improvements and this is as far as I > 
> > intend to take this particular project. I might make some small changes > 
> > if necessary, but most likely I'll leave this largely unchanged from now. > 
> > > I had set myself as an arbitrary goal that it should do at least 20K > 
> > playouts. But with real liberties, AMAF and a RAVE formula I got stuck > in 
> > the 16K-17K range. According to my profiler that is mostly due to the > 
> > expensive formula used to compare nodes, where it says it spends 25% of > 
> > total time. The formula I'm using is: > > beta * (virtu
al-win-ratio + RAVE) + (1-beta) * (win-ratio + UCT) > > beta = 1 - 
log(parent-visits) / 20 > UCT = exploration-factor *sqrt( log(parent-visits) / 
(nr-visits+1) ) > RAVE = exploration-factor *sqrt( log(parent-visits) / > 
(nr-virtual-visits+1) ) > > There are probably quite a few possibilities still 
to tune this program > with regards to playing streng th and performance. But I 
felt it doesn't > help to obscure the implementation by too many details. > > 
The implementation of the search algorithm was entirely > game-independent, 
until I introduced AMAF. I didn't see how to fix that, > as there's no way 
getting around that it's based on the fact that a move > is represented by a 
single coordinate, which is mapped to an array to > store the statistical 
values. But strip the AMAF part, which is a block > of 30 odd lines of code, 
and I think it can be used for other games > basically as-is. I did this not 
because I ever see myself program > another game, but because in my experience 
in doing so I get a cleaner > separation between modules. > > At 2,000 
playouts, it's still quite a bit weaker than the plain MC-AMAF > refbot. It 
wins only about 33%. But that's probably because the > 1,000-2,000 playouts 
range is the sweet-spot for that particular type of > playing engine. It doesn' 
t scale from there, whereas the MCTS ref-bot > only just gets war med up with 
2,000 playouts. > > This leads me to a question. I suppose it might be of some 
interest to > put this bot up on CGOS
. But what parameters to use? The main one being > the number of playouts, 
naturally. > > Mark > _______________________________________________ > 
computer-go mailing list > computer-go@computer-go.org 
<mailto:computer-go@computer-go.org> > 
http://www.computer-go.org/mailman/listinfo/computer-go/ > > 
------------------------------------------------------------------------ 
> Tis the season to save your money! Get the new AOL Holiday Toolbar > 
> <http://toolbar.aol.com/holiday/download.html?ncid=emlweusdown00000008> > for 
> money saving offers and gift ideas. 
> > > ------------------------------------------------------------------------ 
> > _______________________________________________ 
> computer-go mailing list 
> [EMAIL PROTECTED]
> http://www.computer-go.org/mailman/listinfo/computer-go/ 
 
_______________________________________________ 
computer-go mailing list 
[EMAIL PROTECTED]
http://www.computer-go.org/mailman/listinfo/computer-go/ 



Tis the season to save your money! Get the new AOL Holiday Toolbar for money 
saving offers and gift ideas. 

_______________________________________________
omputer-go mailing list
[EMAIL PROTECTED]
ttp://www.computer-go.org/mailman/listinfo/computer-go/





_______________________________________________
omputer-go mailing list
[EMAIL PROTECTED]
ttp://www.computer-go.org/mailman/listinfo/computer-go/

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Monte-Carlo Tree Search reference bot

Reply via email to