Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-07 Thread dhillismail
Finally got a window of time. It's called the "epsilon trick." Here is a cut 
and paste of Lukasz' explanation from the archives, including how to calculate 
N.
- Dave Hillis

--

-Original Message-
From: [EMAIL PROTECTED]
To: computer-go@computer-go.org
Sent: Sun, 14 Jan 2007 4:50 AM
Subject: [computer-go] Fast UCT (epsilon trick)


Hi 
I've looked back into the article and I see that it is not trivial to 
understand how does the epsilon trick (ET) works. Therefore I will 
describe it here on the example of UCT. 

The bottleneck of UCT algorithm is choosing a move. Program has to 
iterate over all children 
evaluating their UCB value. We want to use of this loop rarely. 
One way to do it is to group playouts together. I.e. play 2 or more MC 
simulations per each tree traversal. But this is not good enough as it 
doesn't distribute 
the playouts over the tree and some parts will not get enough of them. 

The ET works as follows: 
With each node of the tree program keeps additional information: 
- child_to_visit 
- visits_to_go 

When we visit node, 
if node.visits_to_go > 0 then 
node->visits_to_go <- node.visits_to_go - 1 
node = child_to_visit 

So we have predetermined strategy to go to particular child visits_to_go times. 

But 
if node.visis_to_go = 0 then // we have to set the strategy first 
node->child_to_visit = UCT_find_child (node) 
node->visits_to_go = Epsilon * node->visited_count // round up 

As we can see, when 
Epsilon is small or when a particular node is 
visited not too many times, UCT-ET is identical to UCT. 

But when node was visited more times (near the root, far from leaves), 
then UCT-ET may save some time and go to precalculated child. 

That's it. 
I hope You like it :) 
Łukasz Lew 

PS. 

In Df-PN, good Epsilon was 1/4 I believe that here it will be much 
smaller. 1/64 maybe. 
Or even one may change the Epsilon equation to: 
node->visits_to_go = Epsilon * sqrt(node->visited_count) 

If someone will do some experiments, I'm eager to hear the results. 

PS2 

float InvSqrt (float x) { 
  float xhalf = 0.5f * x; 
  int i = *(int*)&x; 
  i = 0x5f3759df - (i>>1); 
  x = *(float*)&i; 
  x = x * (1.5f - xhalf * x * x); 
  return x; 
} 

;-) 

___
--


-Original Message-
From: [EMAIL PROTECTED]
To: computer-go@computer-go.org
Sent: Wed, 3 Dec 2008 12:56 am
Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot


Hmm.. it could be that N is picked randomly each time... now I can't seem to 
find the description and my memory is not serving me well here. Perhaps someone 
else remembers? I'll try to track it down.

- Dave Hillis


-Original Message-
From: Michael Williams <[EMAIL PROTECTED]>
To: computer-go 
Sent: Wed, 3 Dec 2008 12:14 am
Subject: Re: [computer-go] Monte-Carlo Tree Search reference b
ot


That's going to repeat the same exact path through the tree three times, isn't 
it? If so, it seems like it would be more efficient to do N playouts from the 
leaf after each traversal of the tree. 
 
[EMAIL PROTECTED] wrote: 
> There is another speedup trick that might interest you. IIRC Lukasz Lew > 
> came up with it, but I forget what he called it. After calculating the > 
> selected move for an internal node (going through UCT and RAVE and > whatnot) 
> store that move. Then, for the next N visits to that node > (where N is 5 or 
> 10 or so), just repeat that move without having to > calculate what the move 
> would be. 
> > - Dave Hillis 
> > -Original Message----- 
> From: Mark Boon <[EMAIL PROTECTED]> 
> To: computer-go  
> Sent: Tue, 2 Dec 2008 11:17 pm 
> Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot 
> > I have made some minor performance improvements and this is as far as I > 
> > intend to take this particular project. I might make some small changes > 
> > if necessary, but most likely I'll leave this largely unchanged from now. > 
> > > I had set myself as an arbitrary goal that it should do at least 20K > 
> > playouts. But with real liberties, AMAF and a RAVE formula I got stuck > in 
> > the 16K-17K range. According to my profiler that is mostly due to the > 
> > expensive formula used to compare nodes, where it says it spends 25% of > 
> > total time. The formula I'm using is: > > beta * (virtu
al-win-ratio + RAVE) + (1-beta) * (win-ratio + UCT) > > beta = 1 - 
log(parent-visits) / 20 > UCT = exploration-factor *sqrt( log(parent-visits) / 
(nr-visits+1) ) > RAVE = exploration-factor *sqrt( log(parent-visits) / > 
(nr-virtual-visits+1) ) > > There are probably quite a few possibilities still 
to tune this program > with rega

Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-03 Thread Don Dailey
On Wed, 2008-12-03 at 11:24 -0200, Mark Boon wrote:
> If this is because I use Java, then  
> Don's concise C implementation of the MC-AMAF bot should be a lot  
> faster than my bloated Java version.

I don't remember the numbers, but my own java and C implementation were
written in the same style - same basic data structure and just about the
only difference was language.   I did do some kind of list trick using
pointers than you cannot do in java, at least not easily and efficiently
but I don't think it made that much of a difference.

The Java version was not that bad for speed compared to C, I think C was
something like 2/3 of the speed or close to that.But if you got into
heavy duty data structures Java hurts - you won't get as big a tree in
Java for instance and I'm sure caching will be a bigger issue.


- Don



signature.asc
Description: This is a digitally signed message part
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-03 Thread Don Dailey
I had a chess program years ago that was blindingly fast on some
computers,  very slow on others.   It was all about the cache.  The move
generator was hard coded for each piece on each square.  For instance a
white pawn on d7 had it's very own move specialized move generator.
There was a function for each piece of each color on each square. 

By doing this I avoided the majority of conditional tests.  I did not
have to test for board edge, pawn on 7th, pawn on the second rank, etc.
All the logic was unrolled and I basically had to write a program to
write the move generator.  

The machine I developed this program on had a lot of cache and it was
not a problem.  But when running on, for instance a laptop,  it would
slow down enormously - something like 4 to 1 where other cache friendly
programs would slow down perhaps 25%.   

I don't keep up much with processor technologies now and I don't know
how much data vs instruction cache modern setups have, but I know it is
far better than it used to be.   I bet that old program would run well
on modern computers. 

One trick, which you are probably doing, is to keep data structures
together in memory.  For instance it is better to have children next to
each other so that you are not accessing memory all over the place,
perhaps incurring several cache misses instead of one. 


On Wed, 2008-12-03 at 11:24 -0200, Mark Boon wrote:
> On 3-dec-08, at 10:31, Heikki Levanto wrote:
> >
> > Having said that, I can't help responding to one detail:
> >
> >> I had seen people write about memory usage of  the tree, but never
> >> understood the concerns.
> >
> > One thing to remember is that more memory use means more cache  
> > misses, and
> > more access to the main memory. On modern computers, those can cost  
> > as much
> > as executing a thousand instructions! So memory optimizing can  
> > often pay off,
> > also with respect to time!
> >
> > Of course, Java does a lot of memory management behind the scenes, so
> > optimizing such can be harder... But when writing in C or C++, it  
> > does make
> > sense.
> >
> 
> I've seen this stated often. But color me skeptical. If what you say  
> is true, then the MC-AMAF bot, which hardly uses any memory at all  
> and which fits in the secondary cache entirely, code and data  
> combined, should show an enormous speed-up in terms of number of  
> playouts per second. Instead it does only about twice the number of  
> playouts per second as the tree-search version, which can be almost  
> if not entirely explained by the extra work that needs to be done to  
> traverse and maintain the tree. If this is because I use Java, then  
> Don's concise C implementation of the MC-AMAF bot should be a lot  
> faster than my bloated Java version. Which, ahem, is not the case at  
> all. I think a well-crafted C program can be up to twice as fast as a  
> similar Java program. But I doubt that has much to do with cache misses.
> 
> I think in computer-Go, trying to control cache misses is futile no  
> matter the language. Maybe you can do it for a relatively trivial,  
> well-understood and stable sub-part. But then you risk losing the  
> gain again as soon as it gets incorporated into a larger whole. So  
> your efforts go to waste.
> 
> So this is in the "show me" category for me. You can use C or C++ if  
> you like, but show me a tree-search bot that speeds up the number of  
> playouts significantly by reducing the node-size. Maybe it's  
> possible, but I don't believe it until I see it. This could be  
> different for Chess programs, at least I've seen it claimed often.  
> But I don't envy them, it must be excruciatingly frustrating to deal  
> with.
> 
> Mark
> 
> ___
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/


signature.asc
Description: This is a digitally signed message part
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-03 Thread Heikki Levanto
On Wed, Dec 03, 2008 at 11:24:22AM -0200, Mark Boon wrote:
> 
> Heikki wrote:
> >One thing to remember is that more memory use means more cache  misses,
> >and more access to the main memory. On modern computers, those can cost
> >as much as executing a thousand instructions! So memory optimizing can
> >often pay off, also with respect to time!
>
> 
> I've seen this stated often. But color me skeptical.

Maybe I was quoting "common wisdom" without having checked it personally. I
only remember one case - not related to go at all - where optimizing the
memory stuff made a noticeable difference.

I would also like to see hard evidence. But not badly enough to put in the
necessary work to get some ;-)

  - H

-- 
Heikki Levanto   "In Murphy We Turst" heikki (at) lsd (dot) dk

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-03 Thread Mark Boon


On 3-dec-08, at 10:31, Heikki Levanto wrote:


Having said that, I can't help responding to one detail:


I had seen people write about memory usage of  the tree, but never
understood the concerns.


One thing to remember is that more memory use means more cache  
misses, and
more access to the main memory. On modern computers, those can cost  
as much
as executing a thousand instructions! So memory optimizing can  
often pay off,

also with respect to time!

Of course, Java does a lot of memory management behind the scenes, so
optimizing such can be harder... But when writing in C or C++, it  
does make

sense.



I've seen this stated often. But color me skeptical. If what you say  
is true, then the MC-AMAF bot, which hardly uses any memory at all  
and which fits in the secondary cache entirely, code and data  
combined, should show an enormous speed-up in terms of number of  
playouts per second. Instead it does only about twice the number of  
playouts per second as the tree-search version, which can be almost  
if not entirely explained by the extra work that needs to be done to  
traverse and maintain the tree. If this is because I use Java, then  
Don's concise C implementation of the MC-AMAF bot should be a lot  
faster than my bloated Java version. Which, ahem, is not the case at  
all. I think a well-crafted C program can be up to twice as fast as a  
similar Java program. But I doubt that has much to do with cache misses.


I think in computer-Go, trying to control cache misses is futile no  
matter the language. Maybe you can do it for a relatively trivial,  
well-understood and stable sub-part. But then you risk losing the  
gain again as soon as it gets incorporated into a larger whole. So  
your efforts go to waste.


So this is in the "show me" category for me. You can use C or C++ if  
you like, but show me a tree-search bot that speeds up the number of  
playouts significantly by reducing the node-size. Maybe it's  
possible, but I don't believe it until I see it. This could be  
different for Chess programs, at least I've seen it claimed often.  
But I don't envy them, it must be excruciatingly frustrating to deal  
with.


Mark

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-03 Thread Heikki Levanto
On Wed, Dec 03, 2008 at 09:55:07AM -0200, Mark Boon wrote:
> I thought about that, but I was afraid the code would become too  
> obscure. After all, this is supposed to be a reference  
> implementation. But maybe I should actually give it a try to see what  
> it would look like.

I agree that the reference implementation should be maximally clear code.
Performance is not all that important here, correctness is.


Having said that, I can't help responding to one detail:

> I had seen people write about memory usage of  the tree, but never
> understood the concerns. 

One thing to remember is that more memory use means more cache misses, and
more access to the main memory. On modern computers, those can cost as much
as executing a thousand instructions! So memory optimizing can often pay off,
also with respect to time!

Of course, Java does a lot of memory management behind the scenes, so
optimizing such can be harder... But when writing in C or C++, it does make
sense.

  

  -H

-- 
Heikki Levanto   "In Murphy We Turst" heikki (at) lsd (dot) dk

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-03 Thread Mark Boon


On 3-dec-08, at 06:09, Sylvain Gelly wrote:


What I did (was a "long" time ago, I don't know if it is still used in
Mogo), is to compute the m best moves every so often and most of the
time just do the max over those m moves.
m was on the order of 5, and "every so often" was an increasing
function like sqrt or log (I don't remember).
That speeds up quite a lot (when the max becomes the bottleneck), and
if you tune the parameters well you almost don't loose any strength at
a fixed number of playouts.



I thought about that, but I was afraid the code would become too  
obscure. After all, this is supposed to be a reference  
implementation. But maybe I should actually give it a try to see what  
it would look like.


I also played with precomputing a table of the log() function but  
found little speedup. Instead I decided to (p)recompute the log(nr- 
parent-visits) only once in 8 times. According to my profiler that  
reduced the log() from using 8% to 1% of computation time. Another  
thing I did was replace log(nr-virtual-parent-visits) in the RAVE  
calculation by the same log(nr-parent-visits) used in the UCT  
calculation, cutting both the number of times I need to call log in  
half and reducing the amount of storage in the node a little. After  
these modifications I didn't notice any degradation in play.


In terms of memory use, I did a few obvious things to keep storage of  
the nodes to a minimum. I had seen people write about memory usage of  
the tree, but never understood the concerns. But that was because I  
always expanded the tree one node at a time. Of course, if you create  
all the children in one go like this reference implementation does,  
it's a little bit a different story. But still, I have to push my  
computer hard to run out of space before I run out of time so I  
didn't see much need to reduce memory too aggressively. Using just a  
moderately small number of playouts before expansion is already  
enough to never have memory problems. But if memory is a concern to  
anyone I see two obvious ways to make make significant reductions.  
One is to flatten the data-structure, reducing the object overhead  
Java imposes. That could save up to a third.  The other is to use  
smaller placeholders for nodes that have been created, but not yet  
visited. Once a node gets visited it can be replaced by a full-blown  
node, but until then all you need is the move-coordinate and the AMAF  
statistics. That should save up to 75% or so.


Mark

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-03 Thread Vlad Dumitrescu
On Wed, Dec 3, 2008 at 05:17, Mark Boon <[EMAIL PROTECTED]> wrote:
> I had set myself as an arbitrary goal that it should do at least 20K
> playouts. But with real liberties, AMAF and a RAVE formula I got stuck in
> the 16K-17K range. According to my profiler that is mostly due to the
> expensive formula used to compare nodes, where it says it spends 25% of
> total time. The formula I'm using is:
>
>beta * (virtual-win-ratio + RAVE) + (1-beta) * (win-ratio + UCT)
>
>beta = 1 - log(parent-visits) / 20
>UCT = exploration-factor *sqrt( log(parent-visits) / (nr-visits+1) )
>RAVE = exploration-factor *sqrt( log(parent-visits) /
> (nr-virtual-visits+1) )

Hi,

This is probably something you already do, but just in case you don't,
here comes a simple optimization trick: since parents-visits is an
integer value and in a somewhat bounded range (0..max_playouts), the
logarithm can be precomputed in a table. Then even sqrt(log()) can be
precomputed also. Similarly, a table with sqrt(integer()) can be used
for the other values.

best regards,
Vlad
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-03 Thread Sylvain Gelly
What I did (was a "long" time ago, I don't know if it is still used in
Mogo), is to compute the m best moves every so often and most of the
time just do the max over those m moves.
m was on the order of 5, and "every so often" was an increasing
function like sqrt or log (I don't remember).
That speeds up quite a lot (when the max becomes the bottleneck), and
if you tune the parameters well you almost don't loose any strength at
a fixed number of playouts.

Sylvain

2008/12/3  <[EMAIL PROTECTED]>:
> There is another speedup trick that might interest you. IIRC Lukasz Lew came
> up with it, but I forget what he called it. After calculating the selected
> move for an internal node (going through UCT and RAVE and whatnot) store
> that move. Then, for the next N visits to that node (where N is 5 or 10 or
> so), just repeat that move without having to calculate what the move would
> be.
>
> - Dave Hillis
>
> -Original Message-
> From: Mark Boon <[EMAIL PROTECTED]>
> To: computer-go 
> Sent: Tue, 2 Dec 2008 11:17 pm
> Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot
>
> I have made some minor performance improvements and this is as far as I
> intend to take this particular project. I might make some small changes if
> necessary, but most likely I'll leave this largely unchanged from now.
>
> I had set myself as an arbitrary goal that it should do at least 20K
> playouts. But with real liberties, AMAF and a RAVE formula I got stuck in
> the 16K-17K range. According to my profiler that is mostly due to the
> expensive formula used to compare nodes, where it says it spends 25% of
> total time. The formula I'm using is:
>
>   beta * (virtual-win-ratio + RAVE) + (1-beta) * (win-ratio + UCT)
>
>   beta = 1 - log(parent-visits) / 20
>   UCT = exploration-factor *sqrt( log(parent-visits) / (nr-visits+1) )
>   RAVE = exploration-factor *sqrt( log(parent-visits) /
> (nr-virtual-visits+1) )
>
> There are probably quite a few possibilities still to tune this program with
> regards to playing strength and performance. But I felt it doesn't help to
> obscure the implementation by too many details.
>
> The implementation of the search algorithm was entirely game-independent,
> until I introduced AMAF. I didn't see how to fix that, as there's no way
> getting around that it's based on the fact that a move is represented by a
> single coordinate, which is mapped to an array to store the statistical
> values. But strip the AMAF part, which is a block of 30 odd lines of code,
> and I think it can be used for other games basically as-is. I did this not
> because I ever see myself program another game, but because in my experience
> in doing so I get a cleaner separation between modules.
>
> At 2,000 playouts, it's still quite a bit weaker than the plain MC-AMAF
> refbot. It wins only about 33%. But that's probably because the 1,000-2,000
> playouts range is the sweet-spot for that particular type of playing engine.
> It doesn't scale from there, whereas the MCTS ref-bot only just gets warmed
> up with 2,000 playouts.
>
> This leads me to a question. I suppose it might be of some interest to put
> this bot up on CGOS. But what parameters to use? The main one being the
> number of playouts, naturally.
>
> Mark
> ___
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
>
> 
> Tis the season to save your money! Get the new AOL Holiday Toolbar for money
> saving offers and gift ideas.
> ___
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
>
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-02 Thread dhillismail
Hmm.. it could be that N is picked randomly each time... now I can't seem to 
find the description and my memory is not serving me well here. Perhaps someone 
else remembers? I'll try to track it down.

- Dave Hillis


-Original Message-
From: Michael Williams <[EMAIL PROTECTED]>
To: computer-go 
Sent: Wed, 3 Dec 2008 12:14 am
Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot


That's going to repeat the same exact path through the tree three times, isn't 
it? If so, it seems like it would be more efficient to do N playouts from the 
leaf after each traversal of the tree.?
?
[EMAIL PROTECTED] wrote:?
> There is another speedup trick that might interest you. IIRC Lukasz Lew > 
> came up with it, but I forget what he called it. After calculating the > 
> selected move for an internal node (going through UCT and RAVE and > whatnot) 
> store that move. Then, for the next N visits to that node > (where N is 5 or 
> 10 or so), just repeat that move without having to > calculate what the move 
> would be.?
> > - Dave Hillis?
> > -Original Message-?
> From: Mark Boon <[EMAIL PROTECTED]>?
> To: computer-go ?
> Sent: Tue, 2 Dec 2008 11:17 pm?
> Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot?
> > I have made some minor performance improvements and this is as far as I > 
> > intend to take this particular project. I might make some small changes > 
> > if necessary, but most likely I'll leave this largely unchanged from now. > 
> > > I had set myself as an arbitrary goal that it should do at least 20K > 
> > playouts. But with real liberties, AMAF and a RAVE formula I got stuck > in 
> > the 16K-17K range. According to my profiler that is mostly due to the > 
> > expensive formula used to compare nodes, where it says it spends 25% of > 
> > total time. The formula I'm using is: > > beta * (virtual-win-ratio + RAVE) 
> > + (1-beta) * (win-ratio + UCT) > > beta = 1 - log(parent-visits) / 20 > UCT 
> > = exploration-factor *sqrt( log(parent-visits) / (nr-visits+1) ) > RAVE = 
> > exploration-factor *sqrt( log(parent-visits) / > (nr-virtual-visits+1) ) > 
> > > There are probably quite a few possibilities still to tune this program > 
> > with regards to playing strength and performance. But I felt it doesn't > 
> > help to obscure 
 the implementation by too many details. > > The implementation of the search 
algorithm was entirely > game-independent, until I introduced AMAF. I didn't 
see how to fix that, > as there's no way getting around that it's based on the 
fact that a move > is represented by a single coordinate, which is mapped to an 
array to > store the statistical values. But strip the AMAF part, which is a 
block > of 30 odd lines of code, and I think it can be used for other games > 
basically as-is. I did this not because I ever see myself program > another 
game, but because in my experience in doing so I get a cleaner > separation 
between modules. > > At 2,000 playouts, it's still quite a bit weaker than the 
plain MC-AMAF > refbot. It wins only about 33%. But that's probably because the 
> 1,000-2,000 playouts range is the sweet-spot for that particular type of > 
playing engine. It doesn't scale from there, whereas the MCTS ref-bot > only 
just gets warmed up with 2,000 playouts. > > This leads 
 me to a question. I suppose it might be of some interest to > put th

is bot up on CGOS. But what parameters to use? The main one being > the number 
of playouts, naturally. > > Mark > 
___ > computer-go mailing list > 
computer-go@computer-go.org <mailto:computer-go@computer-go.org> > 
http://www.computer-go.org/mailman/listinfo/computer-go/ > > 
?
> Tis the season to save your money! Get the new AOL Holiday Toolbar > 
> <http://toolbar.aol.com/holiday/download.html?ncid=emlweusdown0008> > for 
> money saving offers and gift ideas.?
> > > ?
> > ___?
> computer-go mailing list?
> [EMAIL PROTECTED]
> http://www.computer-go.org/mailman/listinfo/computer-go/?
?
___?
computer-go mailing list?
[EMAIL PROTECTED]
http://www.computer-go.org/mailman/listinfo/computer-go/?

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-02 Thread Michael Williams
That's going to repeat the same exact path through the tree three times, isn't it?  If so, it seems like it would be more efficient to do N playouts from the 
leaf after each traversal of the tree.



[EMAIL PROTECTED] wrote:
There is another speedup trick that might interest you. IIRC Lukasz Lew 
came up with it, but I forget what he called it. After calculating the 
selected move for an internal node (going through UCT and RAVE and 
whatnot) store that move. Then, for the next N visits to that node 
(where N is 5 or 10 or so), just repeat that move without having to 
calculate what the move would be.


- Dave Hillis

-Original Message-
From: Mark Boon <[EMAIL PROTECTED]>
To: computer-go 
Sent: Tue, 2 Dec 2008 11:17 pm
Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot

I have made some minor performance improvements and this is as far as I 
intend to take this particular project. I might make some small changes 
if necessary, but most likely I'll leave this largely unchanged from now. 
 
I had set myself as an arbitrary goal that it should do at least 20K 
playouts. But with real liberties, AMAF and a RAVE formula I got stuck 
in the 16K-17K range. According to my profiler that is mostly due to the 
expensive formula used to compare nodes, where it says it spends 25% of 
total time. The formula I'm using is: 
 
  beta * (virtual-win-ratio + RAVE) + (1-beta) * (win-ratio + UCT) 
 
  beta = 1 - log(parent-visits) / 20 
  UCT = exploration-factor *sqrt( log(parent-visits) / (nr-visits+1) ) 
  RAVE = exploration-factor *sqrt( log(parent-visits) / 
(nr-virtual-visits+1) ) 
 
There are probably quite a few possibilities still to tune this program 
with regards to playing strength and performance. But I felt it doesn't 
help to obscure the implementation by too many details. 
 
The implementation of the search algorithm was entirely 
game-independent, until I introduced AMAF. I didn't see how to fix that, 
as there's no way getting around that it's based on the fact that a move 
is represented by a single coordinate, which is mapped to an array to 
store the statistical values. But strip the AMAF part, which is a block 
of 30 odd lines of code, and I think it can be used for other games 
basically as-is. I did this not because I ever see myself program 
another game, but because in my experience in doing so I get a cleaner 
separation between modules. 
 
At 2,000 playouts, it's still quite a bit weaker than the plain MC-AMAF 
refbot. It wins only about 33%. But that's probably because the 
1,000-2,000 playouts range is the sweet-spot for that particular type of 
playing engine. It doesn't scale from there, whereas the MCTS ref-bot 
only just gets warmed up with 2,000 playouts. 
 
This leads me to a question. I suppose it might be of some interest to 
put this bot up on CGOS. But what parameters to use? The main one being 
the number of playouts, naturally. 
 
Mark 
___ 
computer-go mailing list 
computer-go@computer-go.org <mailto:computer-go@computer-go.org> 
http://www.computer-go.org/mailman/listinfo/computer-go/ 



Tis the season to save your money! Get the new AOL Holiday Toolbar 
<http://toolbar.aol.com/holiday/download.html?ncid=emlweusdown0008> 
for money saving offers and gift ideas.





___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-02 Thread dhillismail
There is another speedup trick that might interest you. IIRC Lukasz Lew came up 
with it, but I forget what he called it. After calculating the selected move 
for an internal node (going through UCT and RAVE and whatnot) store that move. 
Then, for the next N visits to that node (where N is 5 or 10 or so), just 
repeat that move without having to calculate what the move would be.

- Dave Hillis

-Original Message-
From: Mark Boon <[EMAIL PROTECTED]>
To: computer-go 
Sent: Tue, 2 Dec 2008 11:17 pm
Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot


I have made some minor performance improvements and this is as far as I intend 
to take this particular project. I might make some small changes if necessary, 
but most likely I'll leave this largely unchanged from now.?
?
I had set myself as an arbitrary goal that it should do at least 20K playouts. 
But with real liberties, AMAF and a RAVE formula I got stuck in the 16K-17K 
range. According to my profiler that is mostly due to the expensive formula 
used to compare nodes, where it says it spends 25% of total time. The formula 
I'm using is:?
?
? beta * (virtual-win-ratio + RAVE) + (1-beta) * (win-ratio + UCT)?
?
? beta = 1 - log(parent-visits) / 20?
? UCT = exploration-factor *sqrt( log(parent-visits) / (nr-visits+1) )?
? RAVE = exploration-factor *sqrt( log(parent-visits) / (nr-virtual-visits+1) )?
?
There are probably quite a few possibilities still to tune this program with 
regards to playing strength and performance. But I felt it doesn't help to 
obscure the implementation by too many details.?
?
The implementation of the search algorithm was entirely game-independent, until 
I introduced AMAF. I didn't see how to fix that, as there's no way getting 
around that it's based on the fact that a move is represented by a single 
coordinate, which is mapped to an array to store the statistical values. But 
strip the AMAF part, which is a block of 30 odd lines of code, and I think it 
can be used for other games basically as-is. I did this not because I ever see 
myself program another game, but because in my experience in doing so I get a 
cleaner separation between modules.?
?
At 2,000 playouts, it's still quite a bit weaker than the plain MC-AMAF refbot. 
It wins only about 33%. But that's probably because the 1,000-2,000 playouts 
range is the sweet-spot for that particular type of playing engine. It doesn't 
scale from there, whereas the MCTS ref-bot only just gets warmed up with 2,000 
playouts.?
?
This leads me to a question. I suppose it might be of some interest to put this 
bot up on CGOS. But what parameters to use? The main one being the number of 
playouts, naturally.?
?
Mark?
___?
computer-go mailing list?
[EMAIL PROTECTED]
http://www.computer-go.org/mailman/listinfo/computer-go/?

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-02 Thread Mark Boon
I have made some minor performance improvements and this is as far as  
I intend to take this particular project. I might make some small  
changes if necessary, but most likely I'll leave this largely  
unchanged from now.


I had set myself as an arbitrary goal that it should do at least 20K  
playouts. But with real liberties, AMAF and a RAVE formula I got  
stuck in the 16K-17K range. According to my profiler that is mostly  
due to the expensive formula used to compare nodes, where it says it  
spends 25% of total time. The formula I'm using is:


beta * (virtual-win-ratio + RAVE) + (1-beta) * (win-ratio + UCT)

beta = 1 - log(parent-visits) / 20
UCT = exploration-factor *sqrt( log(parent-visits) / (nr-visits+1) )
	RAVE = exploration-factor *sqrt( log(parent-visits) / (nr-virtual- 
visits+1) )


There are probably quite a few possibilities still to tune this  
program with regards to playing strength and performance. But I felt  
it doesn't help to obscure the implementation by too many details.


The implementation of the search algorithm was entirely game- 
independent, until I introduced AMAF. I didn't see how to fix that,  
as there's no way getting around that it's based on the fact that a  
move is represented by a single coordinate, which is mapped to an  
array to store the statistical values. But strip the AMAF part, which  
is a block of 30 odd lines of code, and I think it can be used for  
other games basically as-is. I did this not because I ever see myself  
program another game, but because in my experience in doing so I get  
a cleaner separation between modules.


At 2,000 playouts, it's still quite a bit weaker than the plain MC- 
AMAF refbot. It wins only about 33%. But that's probably because the  
1,000-2,000 playouts range is the sweet-spot for that particular type  
of playing engine. It doesn't scale from there, whereas the MCTS ref- 
bot only just gets warmed up with 2,000 playouts.


This leads me to a question. I suppose it might be of some interest  
to put this bot up on CGOS. But what parameters to use? The main one  
being the number of playouts, naturally.


Mark
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-11-28 Thread Mark Boon
eriments)
you can associate with it. (Even on a web page if you ask me , or a  
thesis)

To me that is the most usefull part of the standard ref bot :
because a lot of persons have tried it, you can be sure of the  
exact link
between the software, and the experimental results. It makes  
assertions

reproducibles, and that's really great.


> From: [EMAIL PROTECTED]
> Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot
> Date: Fri, 28 Nov 2008 01:48:09 -0200
> To: computer-go@computer-go.org
> CC:
>
>
> On 27-nov-08, at 19:50, Denis fidaali wrote:
>
> > So, you use AMAF for "simulating" the first UCT evaluations ?
> > I though the classical way to use AMAF, was to affect only the
> > win/lose ratio portion of the uct equation.
> > Obvioulsy it should be allowed to use an arbitrary
> > large number of AMAF simulation accumulating them longer
> > than what it take to expand a node.
> > I think that a classical way to affect the win/ratio is to
> > decrease the effect of the AMAF correction as the number
> > of simulation grows.
> >
> > If you test with a very low number of simulation
> > (in the 1000 - 3000 range), i think you should be
> > able to get out a very nice improvement out of the
> > AMAF version. If you don't, i would think that something
> > is wrong somewhere.
> >
> > What test process do you use for this version ?
>
> I tested it mostly doing 2,000 playouts. When AMAF is true I  
create a

> map of virtual-win values of all the moves played during a playout.
> These values get accumulated over all the playouts (not just the
> first ones). The 'virtual-value' of a move is calculated as follows:
>
> exploration-factor * UCT + ( (nr-wins*2 + nr-virtual-wins) / (nr-
> playouts*2 + nr-virtual-playouts))
>
> where the exploration-factor is currently sqrt(0.2) and UCT is sqrt
> ( log( nr-parent-playouts ) / ( nr-playouts+1) )
>
> Like I said, I haven't had time to experiment much so this formula
> may not be any good. I had also expected to see some positive effect
> of the virtual-win / virtual-playout ratio from AMAF, but I see  
none.

> Of course it's also possible I have a different kind of bug still.
>
> What happens in my 'formula' is that when it never expands beyond  
the

> first level, which is what happens if the number of simulations is
> equal to the number of simulations before expansion, the virtual-
> value becomes completely determined by nr-virtual-wins / nr-virtual-
> playouts making it equivalent to the original ref-bot. In case it
> does expand further and creates a tree, the actual win-loss ratio is
> weighed twice as heavily as the virtual win-loss ratio. That seemed
> like a reasonable first try. I have tried a few others, but usually
> didn't get much different results or much worse results.
>
> Mark
>
> ___
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/

Souhaitez vous  « être au bureau sans y être » ? Oui je le veux !
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

RE: [computer-go] Monte-Carlo Tree Search reference bot

2008-11-28 Thread Denis fidaali

 My own experiments showed a very big increase in strength
especially with about 1000 light-playouts. I had many features
and parameters witch may make a difference. But my
main hypothesis was that it shouldn't ...

I think that i weighted real simulations (aka first move) 10 times more than
virtual one (aka AMAF moves). 

I assume that by expansions you mean allowing data for every child
of a node. So let's assume that your root node is called A, you would
then allocate for every of it's child. Then when make up a simulation
running from A, you would first apply the UCT formula for
determining wich first move you should make ... then
 you would both get back the AMAF score for every moves 
that has been played during the simulation for virtual scoring
AND you would use the true first move for Real scoring (normal UCT part)

Of course, this shouldn't behave exactly like AMAF, because for one thing
you can get a better judgment for the first move because of UCT, and
then you also make the first move weight more. (although you probably
technically should also do so with AMAF despite that the first move
is still 100% random. Which is part of what the
weighted AMAF does. But first move weight would be arbitrary
thus defying the purpose of a standard way)

Last, you could try to downgrade the weight of the virtual score part
as the number of simulation grows.

like with a (if nbsim < 1000)
winRatio=[realRatio *(1000+nbsim)]/3000+[virtualRatio*(2000-(1000+nbsim))]/3000

I'm not sure about the formula :) but you get the idea of taking more and more 
into account
the realRatio. I think that what i did was that i didn't take the realRatio 
into accout at all at
first for estimating the winRatio, and then increased it linearly to the point 
where i wouldn't
take into accout the virtualRatio.


The drawback with all that, is that you will certainly have to make several 
versions, although
they would share a lot of code. I don't think that trying to get One version 
behave as
several is a good idea. You probably should keep the code safe, once you have 
made a lot
of testing of it. For reference purpose. Then you'll be able to exactly know 
the strength
of a particular piece of software, even in two years from now. I din't do that. 
I tried
to make it evolve. And with poor svn versionning capacity, and poor management 
of
the measures/software link : i'm now to a point where all i can do is make 
random
guesses, although i made millions of experiments in the last past years :) I 
just
can't link those experiments back to a fixed piece of software.

 Then, if by any means, you think a code is bug free. Which is hard to bet on,
you really should put that somewhere, with all the proof (and experiments)
you can associate with it. (Even on a web page if you ask me , or a thesis)
To me that is the most usefull part of the standard ref bot : 
because a lot of persons have tried it, you can be sure of the exact link
between the software, and the experimental results. It makes assertions 
reproducibles, and that's really great.


> From: [EMAIL PROTECTED]
> Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot
> Date: Fri, 28 Nov 2008 01:48:09 -0200
> To: computer-go@computer-go.org
> CC: 
> 
> 
> On 27-nov-08, at 19:50, Denis fidaali wrote:
> 
> > So, you use AMAF for "simulating" the first UCT evaluations ?
> > I though the classical way to use AMAF, was to affect only the
> > win/lose ratio portion of the uct equation.
> > Obvioulsy it should be allowed to use an arbitrary
> > large number of AMAF simulation accumulating them longer
> > than what it take to expand a node.
> > I think that a classical way to affect the win/ratio is to
> > decrease the effect of the AMAF correction as the number
> > of simulation grows.
> >
> > If you test with a very low number of simulation
> > (in the 1000 - 3000 range), i think you should be
> > able to get out a very nice improvement out of the
> > AMAF version. If you don't, i would think that something
> > is wrong somewhere.
> >
> > What test process do you use for this version ?
> 
> I tested it mostly doing 2,000 playouts. When AMAF is true I create a  
> map of virtual-win values of all the moves played during a playout.  
> These values get accumulated over all the playouts (not just the  
> first ones). The 'virtual-value' of a move is calculated as follows:
> 
> exploration-factor * UCT + ( (nr-wins*2 + nr-virtual-wins) / (nr- 
> playouts*2 + nr-virtual-playouts))
> 
> where the exploration-factor is currently sqrt(0.2) and UCT is sqrt 
> ( log( nr-parent-playouts ) / ( nr-playouts+1) )
> 
> Like I said, I haven't had time to experiment much so this formula  
> may not be any good. I had also exp

Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-11-27 Thread Mark Boon


On 27-nov-08, at 19:50, Denis fidaali wrote:


So, you use AMAF for "simulating" the first UCT evaluations ?
I though the classical way to use AMAF, was to affect only the
win/lose ratio portion of the uct equation.
Obvioulsy it should be allowed to use an arbitrary
large number of AMAF simulation accumulating them longer
than what it take to expand a node.
I think that a classical way to affect the win/ratio is to
decrease the effect of the AMAF correction as the number
of simulation grows.

If you test with a very low number of simulation
(in the 1000 - 3000 range), i think you should be
able to get out a very nice improvement out of the
AMAF version. If you don't, i would think that something
is wrong somewhere.

What test process do you use for this version ?


I tested it mostly doing 2,000 playouts. When AMAF is true I create a  
map of virtual-win values of all the moves played during a playout.  
These values get accumulated over all the playouts (not just the  
first ones). The 'virtual-value' of a move is calculated as follows:


exploration-factor * UCT + ( (nr-wins*2 + nr-virtual-wins) / (nr- 
playouts*2 + nr-virtual-playouts))


where the exploration-factor is currently sqrt(0.2) and UCT is sqrt 
( log( nr-parent-playouts ) / ( nr-playouts+1) )


Like I said, I haven't had time to experiment much so this formula  
may not be any good. I had also expected to see some positive effect  
of the virtual-win / virtual-playout ratio from AMAF, but I see none.  
Of course it's also possible I have a different kind of bug still.


What happens in my 'formula' is that when it never expands beyond the  
first level, which is what happens if the number of simulations is  
equal to the number of simulations before expansion, the virtual- 
value becomes completely determined by nr-virtual-wins / nr-virtual- 
playouts making it equivalent to the original ref-bot. In case it  
does expand further and creates a tree, the actual win-loss ratio is  
weighed twice as heavily as the virtual win-loss ratio. That seemed  
like a reasonable first try. I have tried a few others, but usually  
didn't get much different results or much worse results.


Mark

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


RE: [computer-go] Monte-Carlo Tree Search reference bot

2008-11-27 Thread Denis fidaali

So, you use AMAF for "simulating" the first UCT evaluations ?
I though the classical way to use AMAF, was to affect only the
win/lose ratio portion of the uct equation. 
Obvioulsy it should be allowed to use an arbitrary
large number of AMAF simulation accumulating them longer
than what it take to expand a node.
I think that a classical way to affect the win/ratio is to 
decrease the effect of the AMAF correction as the number
of simulation grows.

If you test with a very low number of simulation
(in the 1000 - 3000 range), i think you should be
able to get out a very nice improvement out of the 
AMAF version. If you don't, i would think that something
is wrong somewhere.

What test process do you use for this version ?

> From: [EMAIL PROTECTED]
> Date: Thu, 27 Nov 2008 18:15:34 -0200
> To: computer-go@computer-go.org
> CC: 
> Subject: [computer-go] Monte-Carlo Tree Search reference bot
> 
> I have added a MCTS implementation to my reference-bot project. It  
> allows you to specify how many nodes to search, after how many nodes  
> to expand and whether to use AMAF or not. If you specify the number  
> of nodes before expansion to be the same as the total number of nodes  
> to search and set AMAF to true you get the equivalent of Don's  
> original ref-bot. When you set AMAF to false and set the number of  
> nodes before epxansion to 1 you get my original UCT search algorithm.
> 
> Between those extremes there might be a good formula to combine AMAF  
> with tree-search, but unfortunately I haven't had time lately to look  
> for one. The few small attempts I made show no benefit using AMAF in  
> tree-search, only when used on a single level. The contrast between  
> the two exptremes is very stark, so I'm actually convinced there  
> should be a way to use both. This implementation is also quite a bit  
> slower than my original search algorithm but I also didn't have time  
> yet to trace it. It might simply be due to the different expansion  
> method, which is much more expensive with a value of 1. Also,  
> searching for the best UCT node gets more expensive with more  
> (unused) nodes on each level. Using a higher expansion value may  
> alleviate the performance hit. Anyway I think this is a reasonable  
> starting point.
> 
> At first I intended to create a different project for the search  
> reference bot. But half of the code (the MC part) is the same. And I  
> don't want to end up having to maintain the same code in two places.  
> I also didn't want to separate out some code into a separate library  
> and making the structure for the simple ref-bot more complicated.  
> This organization may need some more thought though.
> 
> I'll update the project pages tomorrow.
> 
> Mark
> 
> 
> 
> ___
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/

_
Email envoyé avec Windows Live Hotmail. Dites adieux aux spam et virus, passez 
à Hotmail ! C'est gratuit !
http://www.windowslive.fr/hotmail/default.asp___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

[computer-go] Monte-Carlo Tree Search reference bot

2008-11-27 Thread Mark Boon
I have added a MCTS implementation to my reference-bot project. It  
allows you to specify how many nodes to search, after how many nodes  
to expand and whether to use AMAF or not. If you specify the number  
of nodes before expansion to be the same as the total number of nodes  
to search and set AMAF to true you get the equivalent of Don's  
original ref-bot. When you set AMAF to false and set the number of  
nodes before epxansion to 1 you get my original UCT search algorithm.


Between those extremes there might be a good formula to combine AMAF  
with tree-search, but unfortunately I haven't had time lately to look  
for one. The few small attempts I made show no benefit using AMAF in  
tree-search, only when used on a single level. The contrast between  
the two exptremes is very stark, so I'm actually convinced there  
should be a way to use both. This implementation is also quite a bit  
slower than my original search algorithm but I also didn't have time  
yet to trace it. It might simply be due to the different expansion  
method, which is much more expensive with a value of 1. Also,  
searching for the best UCT node gets more expensive with more  
(unused) nodes on each level. Using a higher expansion value may  
alleviate the performance hit. Anyway I think this is a reasonable  
starting point.


At first I intended to create a different project for the search  
reference bot. But half of the code (the MC part) is the same. And I  
don't want to end up having to maintain the same code in two places.  
I also didn't want to separate out some code into a separate library  
and making the structure for the simple ref-bot more complicated.  
This organization may need some more thought though.


I'll update the project pages tomorrow.

Mark



___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/