Re: [computer-go] Monte-Carlo Tree Search reference bot
Finally got a window of time. It's called the "epsilon trick." Here is a cut and paste of Lukasz' explanation from the archives, including how to calculate N. - Dave Hillis -- -Original Message- From: [EMAIL PROTECTED] To: computer-go@computer-go.org Sent: Sun, 14 Jan 2007 4:50 AM Subject: [computer-go] Fast UCT (epsilon trick) Hi I've looked back into the article and I see that it is not trivial to understand how does the epsilon trick (ET) works. Therefore I will describe it here on the example of UCT. The bottleneck of UCT algorithm is choosing a move. Program has to iterate over all children evaluating their UCB value. We want to use of this loop rarely. One way to do it is to group playouts together. I.e. play 2 or more MC simulations per each tree traversal. But this is not good enough as it doesn't distribute the playouts over the tree and some parts will not get enough of them. The ET works as follows: With each node of the tree program keeps additional information: - child_to_visit - visits_to_go When we visit node, if node.visits_to_go > 0 then node->visits_to_go <- node.visits_to_go - 1 node = child_to_visit So we have predetermined strategy to go to particular child visits_to_go times. But if node.visis_to_go = 0 then // we have to set the strategy first node->child_to_visit = UCT_find_child (node) node->visits_to_go = Epsilon * node->visited_count // round up As we can see, when Epsilon is small or when a particular node is visited not too many times, UCT-ET is identical to UCT. But when node was visited more times (near the root, far from leaves), then UCT-ET may save some time and go to precalculated child. That's it. I hope You like it :) Łukasz Lew PS. In Df-PN, good Epsilon was 1/4 I believe that here it will be much smaller. 1/64 maybe. Or even one may change the Epsilon equation to: node->visits_to_go = Epsilon * sqrt(node->visited_count) If someone will do some experiments, I'm eager to hear the results. PS2 float InvSqrt (float x) { float xhalf = 0.5f * x; int i = *(int*)&x; i = 0x5f3759df - (i>>1); x = *(float*)&i; x = x * (1.5f - xhalf * x * x); return x; } ;-) ___ -- -Original Message- From: [EMAIL PROTECTED] To: computer-go@computer-go.org Sent: Wed, 3 Dec 2008 12:56 am Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot Hmm.. it could be that N is picked randomly each time... now I can't seem to find the description and my memory is not serving me well here. Perhaps someone else remembers? I'll try to track it down. - Dave Hillis -Original Message- From: Michael Williams <[EMAIL PROTECTED]> To: computer-go Sent: Wed, 3 Dec 2008 12:14 am Subject: Re: [computer-go] Monte-Carlo Tree Search reference b ot That's going to repeat the same exact path through the tree three times, isn't it? If so, it seems like it would be more efficient to do N playouts from the leaf after each traversal of the tree. [EMAIL PROTECTED] wrote: > There is another speedup trick that might interest you. IIRC Lukasz Lew > > came up with it, but I forget what he called it. After calculating the > > selected move for an internal node (going through UCT and RAVE and > whatnot) > store that move. Then, for the next N visits to that node > (where N is 5 or > 10 or so), just repeat that move without having to > calculate what the move > would be. > > - Dave Hillis > > -Original Message----- > From: Mark Boon <[EMAIL PROTECTED]> > To: computer-go > Sent: Tue, 2 Dec 2008 11:17 pm > Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot > > I have made some minor performance improvements and this is as far as I > > > intend to take this particular project. I might make some small changes > > > if necessary, but most likely I'll leave this largely unchanged from now. > > > > I had set myself as an arbitrary goal that it should do at least 20K > > > playouts. But with real liberties, AMAF and a RAVE formula I got stuck > in > > the 16K-17K range. According to my profiler that is mostly due to the > > > expensive formula used to compare nodes, where it says it spends 25% of > > > total time. The formula I'm using is: > > beta * (virtu al-win-ratio + RAVE) + (1-beta) * (win-ratio + UCT) > > beta = 1 - log(parent-visits) / 20 > UCT = exploration-factor *sqrt( log(parent-visits) / (nr-visits+1) ) > RAVE = exploration-factor *sqrt( log(parent-visits) / > (nr-virtual-visits+1) ) > > There are probably quite a few possibilities still to tune this program > with rega
Re: [computer-go] Monte-Carlo Tree Search reference bot
On Wed, 2008-12-03 at 11:24 -0200, Mark Boon wrote: > If this is because I use Java, then > Don's concise C implementation of the MC-AMAF bot should be a lot > faster than my bloated Java version. I don't remember the numbers, but my own java and C implementation were written in the same style - same basic data structure and just about the only difference was language. I did do some kind of list trick using pointers than you cannot do in java, at least not easily and efficiently but I don't think it made that much of a difference. The Java version was not that bad for speed compared to C, I think C was something like 2/3 of the speed or close to that.But if you got into heavy duty data structures Java hurts - you won't get as big a tree in Java for instance and I'm sure caching will be a bigger issue. - Don signature.asc Description: This is a digitally signed message part ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Monte-Carlo Tree Search reference bot
I had a chess program years ago that was blindingly fast on some computers, very slow on others. It was all about the cache. The move generator was hard coded for each piece on each square. For instance a white pawn on d7 had it's very own move specialized move generator. There was a function for each piece of each color on each square. By doing this I avoided the majority of conditional tests. I did not have to test for board edge, pawn on 7th, pawn on the second rank, etc. All the logic was unrolled and I basically had to write a program to write the move generator. The machine I developed this program on had a lot of cache and it was not a problem. But when running on, for instance a laptop, it would slow down enormously - something like 4 to 1 where other cache friendly programs would slow down perhaps 25%. I don't keep up much with processor technologies now and I don't know how much data vs instruction cache modern setups have, but I know it is far better than it used to be. I bet that old program would run well on modern computers. One trick, which you are probably doing, is to keep data structures together in memory. For instance it is better to have children next to each other so that you are not accessing memory all over the place, perhaps incurring several cache misses instead of one. On Wed, 2008-12-03 at 11:24 -0200, Mark Boon wrote: > On 3-dec-08, at 10:31, Heikki Levanto wrote: > > > > Having said that, I can't help responding to one detail: > > > >> I had seen people write about memory usage of the tree, but never > >> understood the concerns. > > > > One thing to remember is that more memory use means more cache > > misses, and > > more access to the main memory. On modern computers, those can cost > > as much > > as executing a thousand instructions! So memory optimizing can > > often pay off, > > also with respect to time! > > > > Of course, Java does a lot of memory management behind the scenes, so > > optimizing such can be harder... But when writing in C or C++, it > > does make > > sense. > > > > I've seen this stated often. But color me skeptical. If what you say > is true, then the MC-AMAF bot, which hardly uses any memory at all > and which fits in the secondary cache entirely, code and data > combined, should show an enormous speed-up in terms of number of > playouts per second. Instead it does only about twice the number of > playouts per second as the tree-search version, which can be almost > if not entirely explained by the extra work that needs to be done to > traverse and maintain the tree. If this is because I use Java, then > Don's concise C implementation of the MC-AMAF bot should be a lot > faster than my bloated Java version. Which, ahem, is not the case at > all. I think a well-crafted C program can be up to twice as fast as a > similar Java program. But I doubt that has much to do with cache misses. > > I think in computer-Go, trying to control cache misses is futile no > matter the language. Maybe you can do it for a relatively trivial, > well-understood and stable sub-part. But then you risk losing the > gain again as soon as it gets incorporated into a larger whole. So > your efforts go to waste. > > So this is in the "show me" category for me. You can use C or C++ if > you like, but show me a tree-search bot that speeds up the number of > playouts significantly by reducing the node-size. Maybe it's > possible, but I don't believe it until I see it. This could be > different for Chess programs, at least I've seen it claimed often. > But I don't envy them, it must be excruciatingly frustrating to deal > with. > > Mark > > ___ > computer-go mailing list > computer-go@computer-go.org > http://www.computer-go.org/mailman/listinfo/computer-go/ signature.asc Description: This is a digitally signed message part ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Monte-Carlo Tree Search reference bot
On Wed, Dec 03, 2008 at 11:24:22AM -0200, Mark Boon wrote: > > Heikki wrote: > >One thing to remember is that more memory use means more cache misses, > >and more access to the main memory. On modern computers, those can cost > >as much as executing a thousand instructions! So memory optimizing can > >often pay off, also with respect to time! > > > I've seen this stated often. But color me skeptical. Maybe I was quoting "common wisdom" without having checked it personally. I only remember one case - not related to go at all - where optimizing the memory stuff made a noticeable difference. I would also like to see hard evidence. But not badly enough to put in the necessary work to get some ;-) - H -- Heikki Levanto "In Murphy We Turst" heikki (at) lsd (dot) dk ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Monte-Carlo Tree Search reference bot
On 3-dec-08, at 10:31, Heikki Levanto wrote: Having said that, I can't help responding to one detail: I had seen people write about memory usage of the tree, but never understood the concerns. One thing to remember is that more memory use means more cache misses, and more access to the main memory. On modern computers, those can cost as much as executing a thousand instructions! So memory optimizing can often pay off, also with respect to time! Of course, Java does a lot of memory management behind the scenes, so optimizing such can be harder... But when writing in C or C++, it does make sense. I've seen this stated often. But color me skeptical. If what you say is true, then the MC-AMAF bot, which hardly uses any memory at all and which fits in the secondary cache entirely, code and data combined, should show an enormous speed-up in terms of number of playouts per second. Instead it does only about twice the number of playouts per second as the tree-search version, which can be almost if not entirely explained by the extra work that needs to be done to traverse and maintain the tree. If this is because I use Java, then Don's concise C implementation of the MC-AMAF bot should be a lot faster than my bloated Java version. Which, ahem, is not the case at all. I think a well-crafted C program can be up to twice as fast as a similar Java program. But I doubt that has much to do with cache misses. I think in computer-Go, trying to control cache misses is futile no matter the language. Maybe you can do it for a relatively trivial, well-understood and stable sub-part. But then you risk losing the gain again as soon as it gets incorporated into a larger whole. So your efforts go to waste. So this is in the "show me" category for me. You can use C or C++ if you like, but show me a tree-search bot that speeds up the number of playouts significantly by reducing the node-size. Maybe it's possible, but I don't believe it until I see it. This could be different for Chess programs, at least I've seen it claimed often. But I don't envy them, it must be excruciatingly frustrating to deal with. Mark ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Monte-Carlo Tree Search reference bot
On Wed, Dec 03, 2008 at 09:55:07AM -0200, Mark Boon wrote: > I thought about that, but I was afraid the code would become too > obscure. After all, this is supposed to be a reference > implementation. But maybe I should actually give it a try to see what > it would look like. I agree that the reference implementation should be maximally clear code. Performance is not all that important here, correctness is. Having said that, I can't help responding to one detail: > I had seen people write about memory usage of the tree, but never > understood the concerns. One thing to remember is that more memory use means more cache misses, and more access to the main memory. On modern computers, those can cost as much as executing a thousand instructions! So memory optimizing can often pay off, also with respect to time! Of course, Java does a lot of memory management behind the scenes, so optimizing such can be harder... But when writing in C or C++, it does make sense. -H -- Heikki Levanto "In Murphy We Turst" heikki (at) lsd (dot) dk ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Monte-Carlo Tree Search reference bot
On 3-dec-08, at 06:09, Sylvain Gelly wrote: What I did (was a "long" time ago, I don't know if it is still used in Mogo), is to compute the m best moves every so often and most of the time just do the max over those m moves. m was on the order of 5, and "every so often" was an increasing function like sqrt or log (I don't remember). That speeds up quite a lot (when the max becomes the bottleneck), and if you tune the parameters well you almost don't loose any strength at a fixed number of playouts. I thought about that, but I was afraid the code would become too obscure. After all, this is supposed to be a reference implementation. But maybe I should actually give it a try to see what it would look like. I also played with precomputing a table of the log() function but found little speedup. Instead I decided to (p)recompute the log(nr- parent-visits) only once in 8 times. According to my profiler that reduced the log() from using 8% to 1% of computation time. Another thing I did was replace log(nr-virtual-parent-visits) in the RAVE calculation by the same log(nr-parent-visits) used in the UCT calculation, cutting both the number of times I need to call log in half and reducing the amount of storage in the node a little. After these modifications I didn't notice any degradation in play. In terms of memory use, I did a few obvious things to keep storage of the nodes to a minimum. I had seen people write about memory usage of the tree, but never understood the concerns. But that was because I always expanded the tree one node at a time. Of course, if you create all the children in one go like this reference implementation does, it's a little bit a different story. But still, I have to push my computer hard to run out of space before I run out of time so I didn't see much need to reduce memory too aggressively. Using just a moderately small number of playouts before expansion is already enough to never have memory problems. But if memory is a concern to anyone I see two obvious ways to make make significant reductions. One is to flatten the data-structure, reducing the object overhead Java imposes. That could save up to a third. The other is to use smaller placeholders for nodes that have been created, but not yet visited. Once a node gets visited it can be replaced by a full-blown node, but until then all you need is the move-coordinate and the AMAF statistics. That should save up to 75% or so. Mark ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Monte-Carlo Tree Search reference bot
On Wed, Dec 3, 2008 at 05:17, Mark Boon <[EMAIL PROTECTED]> wrote: > I had set myself as an arbitrary goal that it should do at least 20K > playouts. But with real liberties, AMAF and a RAVE formula I got stuck in > the 16K-17K range. According to my profiler that is mostly due to the > expensive formula used to compare nodes, where it says it spends 25% of > total time. The formula I'm using is: > >beta * (virtual-win-ratio + RAVE) + (1-beta) * (win-ratio + UCT) > >beta = 1 - log(parent-visits) / 20 >UCT = exploration-factor *sqrt( log(parent-visits) / (nr-visits+1) ) >RAVE = exploration-factor *sqrt( log(parent-visits) / > (nr-virtual-visits+1) ) Hi, This is probably something you already do, but just in case you don't, here comes a simple optimization trick: since parents-visits is an integer value and in a somewhat bounded range (0..max_playouts), the logarithm can be precomputed in a table. Then even sqrt(log()) can be precomputed also. Similarly, a table with sqrt(integer()) can be used for the other values. best regards, Vlad ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Monte-Carlo Tree Search reference bot
What I did (was a "long" time ago, I don't know if it is still used in Mogo), is to compute the m best moves every so often and most of the time just do the max over those m moves. m was on the order of 5, and "every so often" was an increasing function like sqrt or log (I don't remember). That speeds up quite a lot (when the max becomes the bottleneck), and if you tune the parameters well you almost don't loose any strength at a fixed number of playouts. Sylvain 2008/12/3 <[EMAIL PROTECTED]>: > There is another speedup trick that might interest you. IIRC Lukasz Lew came > up with it, but I forget what he called it. After calculating the selected > move for an internal node (going through UCT and RAVE and whatnot) store > that move. Then, for the next N visits to that node (where N is 5 or 10 or > so), just repeat that move without having to calculate what the move would > be. > > - Dave Hillis > > -Original Message- > From: Mark Boon <[EMAIL PROTECTED]> > To: computer-go > Sent: Tue, 2 Dec 2008 11:17 pm > Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot > > I have made some minor performance improvements and this is as far as I > intend to take this particular project. I might make some small changes if > necessary, but most likely I'll leave this largely unchanged from now. > > I had set myself as an arbitrary goal that it should do at least 20K > playouts. But with real liberties, AMAF and a RAVE formula I got stuck in > the 16K-17K range. According to my profiler that is mostly due to the > expensive formula used to compare nodes, where it says it spends 25% of > total time. The formula I'm using is: > > beta * (virtual-win-ratio + RAVE) + (1-beta) * (win-ratio + UCT) > > beta = 1 - log(parent-visits) / 20 > UCT = exploration-factor *sqrt( log(parent-visits) / (nr-visits+1) ) > RAVE = exploration-factor *sqrt( log(parent-visits) / > (nr-virtual-visits+1) ) > > There are probably quite a few possibilities still to tune this program with > regards to playing strength and performance. But I felt it doesn't help to > obscure the implementation by too many details. > > The implementation of the search algorithm was entirely game-independent, > until I introduced AMAF. I didn't see how to fix that, as there's no way > getting around that it's based on the fact that a move is represented by a > single coordinate, which is mapped to an array to store the statistical > values. But strip the AMAF part, which is a block of 30 odd lines of code, > and I think it can be used for other games basically as-is. I did this not > because I ever see myself program another game, but because in my experience > in doing so I get a cleaner separation between modules. > > At 2,000 playouts, it's still quite a bit weaker than the plain MC-AMAF > refbot. It wins only about 33%. But that's probably because the 1,000-2,000 > playouts range is the sweet-spot for that particular type of playing engine. > It doesn't scale from there, whereas the MCTS ref-bot only just gets warmed > up with 2,000 playouts. > > This leads me to a question. I suppose it might be of some interest to put > this bot up on CGOS. But what parameters to use? The main one being the > number of playouts, naturally. > > Mark > ___ > computer-go mailing list > computer-go@computer-go.org > http://www.computer-go.org/mailman/listinfo/computer-go/ > > > Tis the season to save your money! Get the new AOL Holiday Toolbar for money > saving offers and gift ideas. > ___ > computer-go mailing list > computer-go@computer-go.org > http://www.computer-go.org/mailman/listinfo/computer-go/ > ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Monte-Carlo Tree Search reference bot
Hmm.. it could be that N is picked randomly each time... now I can't seem to find the description and my memory is not serving me well here. Perhaps someone else remembers? I'll try to track it down. - Dave Hillis -Original Message- From: Michael Williams <[EMAIL PROTECTED]> To: computer-go Sent: Wed, 3 Dec 2008 12:14 am Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot That's going to repeat the same exact path through the tree three times, isn't it? If so, it seems like it would be more efficient to do N playouts from the leaf after each traversal of the tree.? ? [EMAIL PROTECTED] wrote:? > There is another speedup trick that might interest you. IIRC Lukasz Lew > > came up with it, but I forget what he called it. After calculating the > > selected move for an internal node (going through UCT and RAVE and > whatnot) > store that move. Then, for the next N visits to that node > (where N is 5 or > 10 or so), just repeat that move without having to > calculate what the move > would be.? > > - Dave Hillis? > > -Original Message-? > From: Mark Boon <[EMAIL PROTECTED]>? > To: computer-go ? > Sent: Tue, 2 Dec 2008 11:17 pm? > Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot? > > I have made some minor performance improvements and this is as far as I > > > intend to take this particular project. I might make some small changes > > > if necessary, but most likely I'll leave this largely unchanged from now. > > > > I had set myself as an arbitrary goal that it should do at least 20K > > > playouts. But with real liberties, AMAF and a RAVE formula I got stuck > in > > the 16K-17K range. According to my profiler that is mostly due to the > > > expensive formula used to compare nodes, where it says it spends 25% of > > > total time. The formula I'm using is: > > beta * (virtual-win-ratio + RAVE) > > + (1-beta) * (win-ratio + UCT) > > beta = 1 - log(parent-visits) / 20 > UCT > > = exploration-factor *sqrt( log(parent-visits) / (nr-visits+1) ) > RAVE = > > exploration-factor *sqrt( log(parent-visits) / > (nr-virtual-visits+1) ) > > > > There are probably quite a few possibilities still to tune this program > > > with regards to playing strength and performance. But I felt it doesn't > > > help to obscure the implementation by too many details. > > The implementation of the search algorithm was entirely > game-independent, until I introduced AMAF. I didn't see how to fix that, > as there's no way getting around that it's based on the fact that a move > is represented by a single coordinate, which is mapped to an array to > store the statistical values. But strip the AMAF part, which is a block > of 30 odd lines of code, and I think it can be used for other games > basically as-is. I did this not because I ever see myself program > another game, but because in my experience in doing so I get a cleaner > separation between modules. > > At 2,000 playouts, it's still quite a bit weaker than the plain MC-AMAF > refbot. It wins only about 33%. But that's probably because the > 1,000-2,000 playouts range is the sweet-spot for that particular type of > playing engine. It doesn't scale from there, whereas the MCTS ref-bot > only just gets warmed up with 2,000 playouts. > > This leads me to a question. I suppose it might be of some interest to > put th is bot up on CGOS. But what parameters to use? The main one being > the number of playouts, naturally. > > Mark > ___ > computer-go mailing list > computer-go@computer-go.org <mailto:computer-go@computer-go.org> > http://www.computer-go.org/mailman/listinfo/computer-go/ > > ? > Tis the season to save your money! Get the new AOL Holiday Toolbar > > <http://toolbar.aol.com/holiday/download.html?ncid=emlweusdown0008> > for > money saving offers and gift ideas.? > > > ? > > ___? > computer-go mailing list? > [EMAIL PROTECTED] > http://www.computer-go.org/mailman/listinfo/computer-go/? ? ___? computer-go mailing list? [EMAIL PROTECTED] http://www.computer-go.org/mailman/listinfo/computer-go/? ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Monte-Carlo Tree Search reference bot
That's going to repeat the same exact path through the tree three times, isn't it? If so, it seems like it would be more efficient to do N playouts from the leaf after each traversal of the tree. [EMAIL PROTECTED] wrote: There is another speedup trick that might interest you. IIRC Lukasz Lew came up with it, but I forget what he called it. After calculating the selected move for an internal node (going through UCT and RAVE and whatnot) store that move. Then, for the next N visits to that node (where N is 5 or 10 or so), just repeat that move without having to calculate what the move would be. - Dave Hillis -Original Message- From: Mark Boon <[EMAIL PROTECTED]> To: computer-go Sent: Tue, 2 Dec 2008 11:17 pm Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot I have made some minor performance improvements and this is as far as I intend to take this particular project. I might make some small changes if necessary, but most likely I'll leave this largely unchanged from now. I had set myself as an arbitrary goal that it should do at least 20K playouts. But with real liberties, AMAF and a RAVE formula I got stuck in the 16K-17K range. According to my profiler that is mostly due to the expensive formula used to compare nodes, where it says it spends 25% of total time. The formula I'm using is: beta * (virtual-win-ratio + RAVE) + (1-beta) * (win-ratio + UCT) beta = 1 - log(parent-visits) / 20 UCT = exploration-factor *sqrt( log(parent-visits) / (nr-visits+1) ) RAVE = exploration-factor *sqrt( log(parent-visits) / (nr-virtual-visits+1) ) There are probably quite a few possibilities still to tune this program with regards to playing strength and performance. But I felt it doesn't help to obscure the implementation by too many details. The implementation of the search algorithm was entirely game-independent, until I introduced AMAF. I didn't see how to fix that, as there's no way getting around that it's based on the fact that a move is represented by a single coordinate, which is mapped to an array to store the statistical values. But strip the AMAF part, which is a block of 30 odd lines of code, and I think it can be used for other games basically as-is. I did this not because I ever see myself program another game, but because in my experience in doing so I get a cleaner separation between modules. At 2,000 playouts, it's still quite a bit weaker than the plain MC-AMAF refbot. It wins only about 33%. But that's probably because the 1,000-2,000 playouts range is the sweet-spot for that particular type of playing engine. It doesn't scale from there, whereas the MCTS ref-bot only just gets warmed up with 2,000 playouts. This leads me to a question. I suppose it might be of some interest to put this bot up on CGOS. But what parameters to use? The main one being the number of playouts, naturally. Mark ___ computer-go mailing list computer-go@computer-go.org <mailto:computer-go@computer-go.org> http://www.computer-go.org/mailman/listinfo/computer-go/ Tis the season to save your money! Get the new AOL Holiday Toolbar <http://toolbar.aol.com/holiday/download.html?ncid=emlweusdown0008> for money saving offers and gift ideas. ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Monte-Carlo Tree Search reference bot
There is another speedup trick that might interest you. IIRC Lukasz Lew came up with it, but I forget what he called it. After calculating the selected move for an internal node (going through UCT and RAVE and whatnot) store that move. Then, for the next N visits to that node (where N is 5 or 10 or so), just repeat that move without having to calculate what the move would be. - Dave Hillis -Original Message- From: Mark Boon <[EMAIL PROTECTED]> To: computer-go Sent: Tue, 2 Dec 2008 11:17 pm Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot I have made some minor performance improvements and this is as far as I intend to take this particular project. I might make some small changes if necessary, but most likely I'll leave this largely unchanged from now.? ? I had set myself as an arbitrary goal that it should do at least 20K playouts. But with real liberties, AMAF and a RAVE formula I got stuck in the 16K-17K range. According to my profiler that is mostly due to the expensive formula used to compare nodes, where it says it spends 25% of total time. The formula I'm using is:? ? ? beta * (virtual-win-ratio + RAVE) + (1-beta) * (win-ratio + UCT)? ? ? beta = 1 - log(parent-visits) / 20? ? UCT = exploration-factor *sqrt( log(parent-visits) / (nr-visits+1) )? ? RAVE = exploration-factor *sqrt( log(parent-visits) / (nr-virtual-visits+1) )? ? There are probably quite a few possibilities still to tune this program with regards to playing strength and performance. But I felt it doesn't help to obscure the implementation by too many details.? ? The implementation of the search algorithm was entirely game-independent, until I introduced AMAF. I didn't see how to fix that, as there's no way getting around that it's based on the fact that a move is represented by a single coordinate, which is mapped to an array to store the statistical values. But strip the AMAF part, which is a block of 30 odd lines of code, and I think it can be used for other games basically as-is. I did this not because I ever see myself program another game, but because in my experience in doing so I get a cleaner separation between modules.? ? At 2,000 playouts, it's still quite a bit weaker than the plain MC-AMAF refbot. It wins only about 33%. But that's probably because the 1,000-2,000 playouts range is the sweet-spot for that particular type of playing engine. It doesn't scale from there, whereas the MCTS ref-bot only just gets warmed up with 2,000 playouts.? ? This leads me to a question. I suppose it might be of some interest to put this bot up on CGOS. But what parameters to use? The main one being the number of playouts, naturally.? ? Mark? ___? computer-go mailing list? [EMAIL PROTECTED] http://www.computer-go.org/mailman/listinfo/computer-go/? ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Monte-Carlo Tree Search reference bot
I have made some minor performance improvements and this is as far as I intend to take this particular project. I might make some small changes if necessary, but most likely I'll leave this largely unchanged from now. I had set myself as an arbitrary goal that it should do at least 20K playouts. But with real liberties, AMAF and a RAVE formula I got stuck in the 16K-17K range. According to my profiler that is mostly due to the expensive formula used to compare nodes, where it says it spends 25% of total time. The formula I'm using is: beta * (virtual-win-ratio + RAVE) + (1-beta) * (win-ratio + UCT) beta = 1 - log(parent-visits) / 20 UCT = exploration-factor *sqrt( log(parent-visits) / (nr-visits+1) ) RAVE = exploration-factor *sqrt( log(parent-visits) / (nr-virtual- visits+1) ) There are probably quite a few possibilities still to tune this program with regards to playing strength and performance. But I felt it doesn't help to obscure the implementation by too many details. The implementation of the search algorithm was entirely game- independent, until I introduced AMAF. I didn't see how to fix that, as there's no way getting around that it's based on the fact that a move is represented by a single coordinate, which is mapped to an array to store the statistical values. But strip the AMAF part, which is a block of 30 odd lines of code, and I think it can be used for other games basically as-is. I did this not because I ever see myself program another game, but because in my experience in doing so I get a cleaner separation between modules. At 2,000 playouts, it's still quite a bit weaker than the plain MC- AMAF refbot. It wins only about 33%. But that's probably because the 1,000-2,000 playouts range is the sweet-spot for that particular type of playing engine. It doesn't scale from there, whereas the MCTS ref- bot only just gets warmed up with 2,000 playouts. This leads me to a question. I suppose it might be of some interest to put this bot up on CGOS. But what parameters to use? The main one being the number of playouts, naturally. Mark ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Monte-Carlo Tree Search reference bot
eriments) you can associate with it. (Even on a web page if you ask me , or a thesis) To me that is the most usefull part of the standard ref bot : because a lot of persons have tried it, you can be sure of the exact link between the software, and the experimental results. It makes assertions reproducibles, and that's really great. > From: [EMAIL PROTECTED] > Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot > Date: Fri, 28 Nov 2008 01:48:09 -0200 > To: computer-go@computer-go.org > CC: > > > On 27-nov-08, at 19:50, Denis fidaali wrote: > > > So, you use AMAF for "simulating" the first UCT evaluations ? > > I though the classical way to use AMAF, was to affect only the > > win/lose ratio portion of the uct equation. > > Obvioulsy it should be allowed to use an arbitrary > > large number of AMAF simulation accumulating them longer > > than what it take to expand a node. > > I think that a classical way to affect the win/ratio is to > > decrease the effect of the AMAF correction as the number > > of simulation grows. > > > > If you test with a very low number of simulation > > (in the 1000 - 3000 range), i think you should be > > able to get out a very nice improvement out of the > > AMAF version. If you don't, i would think that something > > is wrong somewhere. > > > > What test process do you use for this version ? > > I tested it mostly doing 2,000 playouts. When AMAF is true I create a > map of virtual-win values of all the moves played during a playout. > These values get accumulated over all the playouts (not just the > first ones). The 'virtual-value' of a move is calculated as follows: > > exploration-factor * UCT + ( (nr-wins*2 + nr-virtual-wins) / (nr- > playouts*2 + nr-virtual-playouts)) > > where the exploration-factor is currently sqrt(0.2) and UCT is sqrt > ( log( nr-parent-playouts ) / ( nr-playouts+1) ) > > Like I said, I haven't had time to experiment much so this formula > may not be any good. I had also expected to see some positive effect > of the virtual-win / virtual-playout ratio from AMAF, but I see none. > Of course it's also possible I have a different kind of bug still. > > What happens in my 'formula' is that when it never expands beyond the > first level, which is what happens if the number of simulations is > equal to the number of simulations before expansion, the virtual- > value becomes completely determined by nr-virtual-wins / nr-virtual- > playouts making it equivalent to the original ref-bot. In case it > does expand further and creates a tree, the actual win-loss ratio is > weighed twice as heavily as the virtual win-loss ratio. That seemed > like a reasonable first try. I have tried a few others, but usually > didn't get much different results or much worse results. > > Mark > > ___ > computer-go mailing list > computer-go@computer-go.org > http://www.computer-go.org/mailman/listinfo/computer-go/ Souhaitez vous « être au bureau sans y être » ? Oui je le veux ! ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
RE: [computer-go] Monte-Carlo Tree Search reference bot
My own experiments showed a very big increase in strength especially with about 1000 light-playouts. I had many features and parameters witch may make a difference. But my main hypothesis was that it shouldn't ... I think that i weighted real simulations (aka first move) 10 times more than virtual one (aka AMAF moves). I assume that by expansions you mean allowing data for every child of a node. So let's assume that your root node is called A, you would then allocate for every of it's child. Then when make up a simulation running from A, you would first apply the UCT formula for determining wich first move you should make ... then you would both get back the AMAF score for every moves that has been played during the simulation for virtual scoring AND you would use the true first move for Real scoring (normal UCT part) Of course, this shouldn't behave exactly like AMAF, because for one thing you can get a better judgment for the first move because of UCT, and then you also make the first move weight more. (although you probably technically should also do so with AMAF despite that the first move is still 100% random. Which is part of what the weighted AMAF does. But first move weight would be arbitrary thus defying the purpose of a standard way) Last, you could try to downgrade the weight of the virtual score part as the number of simulation grows. like with a (if nbsim < 1000) winRatio=[realRatio *(1000+nbsim)]/3000+[virtualRatio*(2000-(1000+nbsim))]/3000 I'm not sure about the formula :) but you get the idea of taking more and more into account the realRatio. I think that what i did was that i didn't take the realRatio into accout at all at first for estimating the winRatio, and then increased it linearly to the point where i wouldn't take into accout the virtualRatio. The drawback with all that, is that you will certainly have to make several versions, although they would share a lot of code. I don't think that trying to get One version behave as several is a good idea. You probably should keep the code safe, once you have made a lot of testing of it. For reference purpose. Then you'll be able to exactly know the strength of a particular piece of software, even in two years from now. I din't do that. I tried to make it evolve. And with poor svn versionning capacity, and poor management of the measures/software link : i'm now to a point where all i can do is make random guesses, although i made millions of experiments in the last past years :) I just can't link those experiments back to a fixed piece of software. Then, if by any means, you think a code is bug free. Which is hard to bet on, you really should put that somewhere, with all the proof (and experiments) you can associate with it. (Even on a web page if you ask me , or a thesis) To me that is the most usefull part of the standard ref bot : because a lot of persons have tried it, you can be sure of the exact link between the software, and the experimental results. It makes assertions reproducibles, and that's really great. > From: [EMAIL PROTECTED] > Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot > Date: Fri, 28 Nov 2008 01:48:09 -0200 > To: computer-go@computer-go.org > CC: > > > On 27-nov-08, at 19:50, Denis fidaali wrote: > > > So, you use AMAF for "simulating" the first UCT evaluations ? > > I though the classical way to use AMAF, was to affect only the > > win/lose ratio portion of the uct equation. > > Obvioulsy it should be allowed to use an arbitrary > > large number of AMAF simulation accumulating them longer > > than what it take to expand a node. > > I think that a classical way to affect the win/ratio is to > > decrease the effect of the AMAF correction as the number > > of simulation grows. > > > > If you test with a very low number of simulation > > (in the 1000 - 3000 range), i think you should be > > able to get out a very nice improvement out of the > > AMAF version. If you don't, i would think that something > > is wrong somewhere. > > > > What test process do you use for this version ? > > I tested it mostly doing 2,000 playouts. When AMAF is true I create a > map of virtual-win values of all the moves played during a playout. > These values get accumulated over all the playouts (not just the > first ones). The 'virtual-value' of a move is calculated as follows: > > exploration-factor * UCT + ( (nr-wins*2 + nr-virtual-wins) / (nr- > playouts*2 + nr-virtual-playouts)) > > where the exploration-factor is currently sqrt(0.2) and UCT is sqrt > ( log( nr-parent-playouts ) / ( nr-playouts+1) ) > > Like I said, I haven't had time to experiment much so this formula > may not be any good. I had also exp
Re: [computer-go] Monte-Carlo Tree Search reference bot
On 27-nov-08, at 19:50, Denis fidaali wrote: So, you use AMAF for "simulating" the first UCT evaluations ? I though the classical way to use AMAF, was to affect only the win/lose ratio portion of the uct equation. Obvioulsy it should be allowed to use an arbitrary large number of AMAF simulation accumulating them longer than what it take to expand a node. I think that a classical way to affect the win/ratio is to decrease the effect of the AMAF correction as the number of simulation grows. If you test with a very low number of simulation (in the 1000 - 3000 range), i think you should be able to get out a very nice improvement out of the AMAF version. If you don't, i would think that something is wrong somewhere. What test process do you use for this version ? I tested it mostly doing 2,000 playouts. When AMAF is true I create a map of virtual-win values of all the moves played during a playout. These values get accumulated over all the playouts (not just the first ones). The 'virtual-value' of a move is calculated as follows: exploration-factor * UCT + ( (nr-wins*2 + nr-virtual-wins) / (nr- playouts*2 + nr-virtual-playouts)) where the exploration-factor is currently sqrt(0.2) and UCT is sqrt ( log( nr-parent-playouts ) / ( nr-playouts+1) ) Like I said, I haven't had time to experiment much so this formula may not be any good. I had also expected to see some positive effect of the virtual-win / virtual-playout ratio from AMAF, but I see none. Of course it's also possible I have a different kind of bug still. What happens in my 'formula' is that when it never expands beyond the first level, which is what happens if the number of simulations is equal to the number of simulations before expansion, the virtual- value becomes completely determined by nr-virtual-wins / nr-virtual- playouts making it equivalent to the original ref-bot. In case it does expand further and creates a tree, the actual win-loss ratio is weighed twice as heavily as the virtual win-loss ratio. That seemed like a reasonable first try. I have tried a few others, but usually didn't get much different results or much worse results. Mark ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
RE: [computer-go] Monte-Carlo Tree Search reference bot
So, you use AMAF for "simulating" the first UCT evaluations ? I though the classical way to use AMAF, was to affect only the win/lose ratio portion of the uct equation. Obvioulsy it should be allowed to use an arbitrary large number of AMAF simulation accumulating them longer than what it take to expand a node. I think that a classical way to affect the win/ratio is to decrease the effect of the AMAF correction as the number of simulation grows. If you test with a very low number of simulation (in the 1000 - 3000 range), i think you should be able to get out a very nice improvement out of the AMAF version. If you don't, i would think that something is wrong somewhere. What test process do you use for this version ? > From: [EMAIL PROTECTED] > Date: Thu, 27 Nov 2008 18:15:34 -0200 > To: computer-go@computer-go.org > CC: > Subject: [computer-go] Monte-Carlo Tree Search reference bot > > I have added a MCTS implementation to my reference-bot project. It > allows you to specify how many nodes to search, after how many nodes > to expand and whether to use AMAF or not. If you specify the number > of nodes before expansion to be the same as the total number of nodes > to search and set AMAF to true you get the equivalent of Don's > original ref-bot. When you set AMAF to false and set the number of > nodes before epxansion to 1 you get my original UCT search algorithm. > > Between those extremes there might be a good formula to combine AMAF > with tree-search, but unfortunately I haven't had time lately to look > for one. The few small attempts I made show no benefit using AMAF in > tree-search, only when used on a single level. The contrast between > the two exptremes is very stark, so I'm actually convinced there > should be a way to use both. This implementation is also quite a bit > slower than my original search algorithm but I also didn't have time > yet to trace it. It might simply be due to the different expansion > method, which is much more expensive with a value of 1. Also, > searching for the best UCT node gets more expensive with more > (unused) nodes on each level. Using a higher expansion value may > alleviate the performance hit. Anyway I think this is a reasonable > starting point. > > At first I intended to create a different project for the search > reference bot. But half of the code (the MC part) is the same. And I > don't want to end up having to maintain the same code in two places. > I also didn't want to separate out some code into a separate library > and making the structure for the simple ref-bot more complicated. > This organization may need some more thought though. > > I'll update the project pages tomorrow. > > Mark > > > > ___ > computer-go mailing list > computer-go@computer-go.org > http://www.computer-go.org/mailman/listinfo/computer-go/ _ Email envoyé avec Windows Live Hotmail. Dites adieux aux spam et virus, passez à Hotmail ! C'est gratuit ! http://www.windowslive.fr/hotmail/default.asp___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
[computer-go] Monte-Carlo Tree Search reference bot
I have added a MCTS implementation to my reference-bot project. It allows you to specify how many nodes to search, after how many nodes to expand and whether to use AMAF or not. If you specify the number of nodes before expansion to be the same as the total number of nodes to search and set AMAF to true you get the equivalent of Don's original ref-bot. When you set AMAF to false and set the number of nodes before epxansion to 1 you get my original UCT search algorithm. Between those extremes there might be a good formula to combine AMAF with tree-search, but unfortunately I haven't had time lately to look for one. The few small attempts I made show no benefit using AMAF in tree-search, only when used on a single level. The contrast between the two exptremes is very stark, so I'm actually convinced there should be a way to use both. This implementation is also quite a bit slower than my original search algorithm but I also didn't have time yet to trace it. It might simply be due to the different expansion method, which is much more expensive with a value of 1. Also, searching for the best UCT node gets more expensive with more (unused) nodes on each level. Using a higher expansion value may alleviate the performance hit. Anyway I think this is a reasonable starting point. At first I intended to create a different project for the search reference bot. But half of the code (the MC part) is the same. And I don't want to end up having to maintain the same code in two places. I also didn't want to separate out some code into a separate library and making the structure for the simple ref-bot more complicated. This organization may need some more thought though. I'll update the project pages tomorrow. Mark ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/