Re: [computer-go] How to "properly" implement RAVE?
On Sun, Feb 8, 2009 at 1:42 PM, Petr Baudis wrote: > On Sat, Jan 17, 2009 at 08:29:32PM +0100, Sylvain Gelly wrote: > > A small point: in "PlayoutOutTree", just after "if > > (!played.AlreadyPlayed(move)) {", there should have a > "played.Play(move)". > > I believe it does not change the final result (as the check is also done > in > > the backup, and the move played in the backup), but I simply forgot that > > line (that should make moves_played_out_tree smaller). > > > > To avoid confusion, I repost the pseudo code with that correction (and > > hoping the indentation is not broken by the email editor once again). > > Thank you so much for this! I have switched my RAVE implementation to > this formula and the bot has gotten noticeably stronger, though I > apparently still have some bugs to chase, since it seems to have trouble > considering strongest opponent's responses and frequently focuses on > unreasonable opponent's replies instead of the obvious (e.g. keeping a > group of stones in atari). Maybe I need better prior hinting... > > I have few questions. Of course, please feel free to skip questions > about particular constants if you feel that's giving away too much. :-) > > > ChooseMove(node, board) { > > bias = 0.015 // I put a random number here, to be tuned > > b = bias * bias / 0.25 > > Maybe it would be cleaner to define b = 1 / rave_equiv, where rave_equiv > is the number of playouts RAVE is thought to be equivalent of? Or is the > meaning of this constant actually different? > The meaning is supposed to be the difference between the expected value of AMAF and the expected value of the tree. But at that point it is just a constant, so I am not sure there is a "good" interpretation. > > What value works best for people? I did not do much tuning yet, but I > use b=1/3000. I see Fuego uses b=1/5000. (This example b=1/.) I believe the actually value depends on the other parts of your implementation, namely the playouts, the prior you use in the tree and the exploration term (there was none in the pseudo code I posted, but you might use one). > > > > best_value = -1 > > best_move = PASSMOVE > > for (move in board.allmoves) { > > c = node.child(move).counts > > w = node.child(move).wins > > rc = node.rave_counts[move] > > rw = node.rave_wins[move] > > coefficient = 1 - rc / (rc + c + rc * c * b) > > value = w / c * coef + rw / rc * (1 - coef) // please here take care > of > > the c==0 and rc == 0 cases > > if (value > best_value) { > > best_value = value > > best_move = move > > } > > } > > return best_move > > } > > I have two questions here: > > * Is the FPU concept abandoned? Or what values are reasonable? It seems > to me 1.0, which is usually recommended, is obviously too big here > since that's the upper bound of the value already. So far I have tried > 0.6 and 0.7 but both just make my bot slightly weaker. > > * How to accomodate prior knowledge? (I'm using grand-parent heuristics, > atari liberties, and few patterns.) Do you use it to fill normal > counts, RAVE values or both? What count values work best for you? > I have settled on 50 playouts. The FPU concept is here replaced by the prior value. I used node.rave_wins[move] and node.rave_counts[move] when the node is initialized. If you don't want to use go knowledge, you can initialize those values to something like 7 and 14 respectively. Grand-parent heuristics did not work at all in my experiments (was even very bad). The heuristics I was using were: - save a chain (positive prior) - self atari (negative prior) - match a simple 3x3 pattern (hane, cut, and a few others I don't remember) (positive prior). For node.rave_counts[move], 50 seems a little high. I think I was using values around 15, but it was not very sensitive (50 was working almost as well as 15). I give those numbers from memory, they may be quite off. Sylvain > > > -- >Petr "Pasky" Baudis > The average, healthy, well-adjusted adult gets up at seven-thirty > in the morning feeling just terrible. -- Jean Kerr > ___ > computer-go mailing list > computer-go@computer-go.org > http://www.computer-go.org/mailman/listinfo/computer-go/ > ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
On Sun, Feb 8, 2009 at 3:02 PM, Isaac Deutsch wrote: > Hi, > > Can you explain what minimumNrNodes and nrSimulations do? In my program I > play 50k games regardless of the number of nodes, so I would like to adjust > this accordingly. > minimumNrNodes is the number of games played out. Originally I used to always create a new node when a playout happened. Maybe this should be renamed to minimumNrPlayouts. nrSimulationsBeforeExpansion is the minimum number of visits that have to be made to a node before the tree gets expanded any deeper. As an example, when the search begins, the root-node is expanded with all possible legal moves. But those children nodes are not expanded themselves until a certain number of simulations (playouts) have been done starting from the root-node. Until that time the children nodes are only used to store the AMAF values. I think this may be slightly different from how most people do it, but I figured it would be a waste to throw away the AMAF values for the first 'n' simulations in a node. > Otherwise, it works now. :) Good. Mark ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
On Fri, 2009-02-06 at 18:55 +0100, Isaac Deutsch wrote: > The rating of the bot still seems to be drifting upwards, but I think I can > conclude my UCT implementation is OK afterall. Many thanks to the bots > provided. Does someone have a bot that does 50k light playouts + RAVE? I > would be most grateful if you could put them online for a few days of > testing. :-) > > By the way, I've seen 2 games when checking my bot's status where one of the > "myCtest" bots lost because of an illegal ko move. Maybe there's a bug in > handling superko? > > Regards, > Isaac hb-0.7-RAVE-50k is now online. It should be identical to hb-797-RAVE-50k (Please disregard hb-0.7-50k-RAVE since it had a serious bug) ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
Hi, Can you explain what minimumNrNodes and nrSimulations do? In my program I play 50k games regardless of the number of nodes, so I would like to adjust this accordingly. Otherwise, it works now. :) -ibd Original-Nachricht > Datum: Sun, 8 Feb 2009 14:47:55 -0200 > Von: Mark Boon > An: computer-go > Betreff: Re: [computer-go] How to "properly" implement RAVE? > I had a little spare time to look at it. It seems indeed I forgot to > update the GoEngineGTP.jar file last time I made some changes. This > was easy to fix even from here and I think it should work now. > > Just as a note, if you want to change the number of playouts to 50K, > you need to change the minimumNrNodes from 4,000 to 50,000 in the > ReferenceMonteCarloTreeSearch definition in the TesujiRefBot.xml. But > you may also want to increase the 'nrSimulationsBeforeExpansion' value > to something higher like 8 or even 32. It depends on how much memory > you have available. You may want to set it to the same value you use > for your own program to get a good comparison anyway. Use the > MCTSRefBot to play, which means you'll be passing the following > command to two-gtp: java -xmx512M -jar GoEngineGTP.jar MCTSRefBot > > Adjust the -xmx???M to whatever you think is suitable on your > computer. I think if you set nrSimulationsBeforeExpansion=8 then 512M > should be enough but you may have to experiment a little to find the > optimal settings. > > Mark > > > On Feb 8, 2009, at 7:06 AM, Isaac Deutsch wrote: > > > No hurry, I will be away for a week next week. :-) > > > > > >> Isaac, > >> > >> I haven't looked at this stuff for a while. I'm not at home so I > >> can't > >> look at it now. > >> > >>>> From the error I understand that tesuji/games/general/ > >>>> MoveIterator is > >> missing. It is there in the Subversion source-tree however. So I > >> don't > >> know what could be wrong. Maybe it's somehow missing in the > >> GoEngineGTP.jar > >> > >> If you can wait until Wednesday I'll fix it for you then. > >> > >> Mark > > > > -- > > Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit > > allen: http://www.gmx.net/de/go/multimessenger01 > > ___ > > computer-go mailing list > > computer-go@computer-go.org > > http://www.computer-go.org/mailman/listinfo/computer-go/ > -- Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: http://www.gmx.net/de/go/multimessenger01 ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
I had a little spare time to look at it. It seems indeed I forgot to update the GoEngineGTP.jar file last time I made some changes. This was easy to fix even from here and I think it should work now. Just as a note, if you want to change the number of playouts to 50K, you need to change the minimumNrNodes from 4,000 to 50,000 in the ReferenceMonteCarloTreeSearch definition in the TesujiRefBot.xml. But you may also want to increase the 'nrSimulationsBeforeExpansion' value to something higher like 8 or even 32. It depends on how much memory you have available. You may want to set it to the same value you use for your own program to get a good comparison anyway. Use the MCTSRefBot to play, which means you'll be passing the following command to two-gtp: java -xmx512M -jar GoEngineGTP.jar MCTSRefBot Adjust the -xmx???M to whatever you think is suitable on your computer. I think if you set nrSimulationsBeforeExpansion=8 then 512M should be enough but you may have to experiment a little to find the optimal settings. Mark On Feb 8, 2009, at 7:06 AM, Isaac Deutsch wrote: No hurry, I will be away for a week next week. :-) Isaac, I haven't looked at this stuff for a while. I'm not at home so I can't look at it now. From the error I understand that tesuji/games/general/ MoveIterator is missing. It is there in the Subversion source-tree however. So I don't know what could be wrong. Maybe it's somehow missing in the GoEngineGTP.jar If you can wait until Wednesday I'll fix it for you then. Mark -- Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: http://www.gmx.net/de/go/multimessenger01 ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
On Sat, Jan 17, 2009 at 08:29:32PM +0100, Sylvain Gelly wrote: > A small point: in "PlayoutOutTree", just after "if > (!played.AlreadyPlayed(move)) {", there should have a "played.Play(move)". > I believe it does not change the final result (as the check is also done in > the backup, and the move played in the backup), but I simply forgot that > line (that should make moves_played_out_tree smaller). > > To avoid confusion, I repost the pseudo code with that correction (and > hoping the indentation is not broken by the email editor once again). Thank you so much for this! I have switched my RAVE implementation to this formula and the bot has gotten noticeably stronger, though I apparently still have some bugs to chase, since it seems to have trouble considering strongest opponent's responses and frequently focuses on unreasonable opponent's replies instead of the obvious (e.g. keeping a group of stones in atari). Maybe I need better prior hinting... I have few questions. Of course, please feel free to skip questions about particular constants if you feel that's giving away too much. :-) > ChooseMove(node, board) { > bias = 0.015 // I put a random number here, to be tuned > b = bias * bias / 0.25 Maybe it would be cleaner to define b = 1 / rave_equiv, where rave_equiv is the number of playouts RAVE is thought to be equivalent of? Or is the meaning of this constant actually different? What value works best for people? I did not do much tuning yet, but I use b=1/3000. I see Fuego uses b=1/5000. (This example b=1/.) > best_value = -1 > best_move = PASSMOVE > for (move in board.allmoves) { > c = node.child(move).counts > w = node.child(move).wins > rc = node.rave_counts[move] > rw = node.rave_wins[move] > coefficient = 1 - rc / (rc + c + rc * c * b) > value = w / c * coef + rw / rc * (1 - coef) // please here take care of > the c==0 and rc == 0 cases > if (value > best_value) { > best_value = value > best_move = move > } > } > return best_move > } I have two questions here: * Is the FPU concept abandoned? Or what values are reasonable? It seems to me 1.0, which is usually recommended, is obviously too big here since that's the upper bound of the value already. So far I have tried 0.6 and 0.7 but both just make my bot slightly weaker. * How to accomodate prior knowledge? (I'm using grand-parent heuristics, atari liberties, and few patterns.) Do you use it to fill normal counts, RAVE values or both? What count values work best for you? I have settled on 50 playouts. -- Petr "Pasky" Baudis The average, healthy, well-adjusted adult gets up at seven-thirty in the morning feeling just terrible. -- Jean Kerr ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
No hurry, I will be away for a week next week. :-) > Isaac, > > I haven't looked at this stuff for a while. I'm not at home so I can't > look at it now. > > >>From the error I understand that tesuji/games/general/MoveIterator is > missing. It is there in the Subversion source-tree however. So I don't > know what could be wrong. Maybe it's somehow missing in the > GoEngineGTP.jar > > If you can wait until Wednesday I'll fix it for you then. > > Mark -- Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: http://www.gmx.net/de/go/multimessenger01 ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
Isaac, I haven't looked at this stuff for a while. I'm not at home so I can't look at it now. >From the error I understand that tesuji/games/general/MoveIterator is missing. It is there in the Subversion source-tree however. So I don't know what could be wrong. Maybe it's somehow missing in the GoEngineGTP.jar If you can wait until Wednesday I'll fix it for you then. Mark On Sat, Feb 7, 2009 at 4:29 PM, Isaac Deutsch wrote: > I put all files in a folder like so: > > http://img21.imageshack.us/img21/5272/bild4ya1.png > > But I get various errors when I try to run, for example, GoEngineGTP.jar > with "java -jar ...", also I'm not able to select the RefBot as player. > I'm not sure what the others do, but one seems to connect to the 19x19 CGOS > server. I don't have any experience with java. Any ideas? > > Isaac > > The beginning of the error: > org.springframework.beans.factory.BeanCreationException: Error creating bean > with name 'TesujiRefBot' defined in file [/Users/ibd/Desktop/computer > go/Rango/CGOS/TesujiRefBot.xml]: Cannot resolve reference to bean > 'referenceLibertyAdministration' while setting bean property > 'monteCarloAdministration'; nested exception is > org.springframework.beans.factory.BeanCreationException: Error creating bean > with name 'referenceLibertyAdministration' defined in file > [/Users/ibd/Desktop/computer go/Rango/CGOS/TesujiRefBot.xml]: Instantiation > of bean failed; nested exception is java.lang.NoClassDefFoundError: > tesuji/games/general/MoveIterator > > > ---- Original-Nachricht >> Datum: Sat, 7 Feb 2009 09:30:53 -0200 >> Von: Mark Boon >> An: computer-go >> Betreff: Re: [computer-go] How to "properly" implement RAVE? > >> You can get everything here: >> >> http://plug-and-go.dev.java.net >> >> The MCTS program description is under 'Derived Projects'. >> >> You don't really need the source-code. You can just get the 'binaries >> & scripts' and then copy the files 'TesujiRefBot.jar' and >> 'TesujiRefBot.xml' to the directory where you put the binaries and >> everything works automagically. It's all plug and go. You may want to >> change the number of nodes to read in the XML file to 50K (it's called >> minimumNrNodes). >> >> Of course if you prefer to get the source you are free to do so. >> >> I do remember making a significant fix a little while ago that I don't >> think I submitted yet. But I won't be able to commit that for a few >> more days as I'm not at home. But if you implemented RAVE correctly >> your bot should do at least as well as this one. >> >> Hopefully it's any help. >> >> Mark >> >> >> On Sat, Feb 7, 2009 at 7:18 AM, Isaac Deutsch wrote: >> > >> >> I'm currently tied up but you can get my MCTS implementation, which >> >> includes RAVE, and set it up to play 50K playouts. It's only a matter >> >> of setting the right number in the configuration file. >> >> >> >> You can also use it to play through two-gtp, that way you can test an >> >> awful lot faster. >> >> >> >> Mark >> > >> > Hi Mark, >> > >> > This would be awesome. Can you send the source code to this eMail >> address? >> > >> > Thanks, ibd >> > -- >> > Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit >> allen: http://www.gmx.net/de/go/multimessenger01 >> > ___ >> > computer-go mailing list >> > computer-go@computer-go.org >> > http://www.computer-go.org/mailman/listinfo/computer-go/ >> > >> ___ >> computer-go mailing list >> computer-go@computer-go.org >> http://www.computer-go.org/mailman/listinfo/computer-go/ > > -- > Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: > http://www.gmx.net/de/go/multimessenger01 > ___ > computer-go mailing list > computer-go@computer-go.org > http://www.computer-go.org/mailman/listinfo/computer-go/ > ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
I put all files in a folder like so: http://img21.imageshack.us/img21/5272/bild4ya1.png But I get various errors when I try to run, for example, GoEngineGTP.jar with "java -jar ...", also I'm not able to select the RefBot as player. I'm not sure what the others do, but one seems to connect to the 19x19 CGOS server. I don't have any experience with java. Any ideas? Isaac The beginning of the error: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'TesujiRefBot' defined in file [/Users/ibd/Desktop/computer go/Rango/CGOS/TesujiRefBot.xml]: Cannot resolve reference to bean 'referenceLibertyAdministration' while setting bean property 'monteCarloAdministration'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'referenceLibertyAdministration' defined in file [/Users/ibd/Desktop/computer go/Rango/CGOS/TesujiRefBot.xml]: Instantiation of bean failed; nested exception is java.lang.NoClassDefFoundError: tesuji/games/general/MoveIterator Original-Nachricht > Datum: Sat, 7 Feb 2009 09:30:53 -0200 > Von: Mark Boon > An: computer-go > Betreff: Re: [computer-go] How to "properly" implement RAVE? > You can get everything here: > > http://plug-and-go.dev.java.net > > The MCTS program description is under 'Derived Projects'. > > You don't really need the source-code. You can just get the 'binaries > & scripts' and then copy the files 'TesujiRefBot.jar' and > 'TesujiRefBot.xml' to the directory where you put the binaries and > everything works automagically. It's all plug and go. You may want to > change the number of nodes to read in the XML file to 50K (it's called > minimumNrNodes). > > Of course if you prefer to get the source you are free to do so. > > I do remember making a significant fix a little while ago that I don't > think I submitted yet. But I won't be able to commit that for a few > more days as I'm not at home. But if you implemented RAVE correctly > your bot should do at least as well as this one. > > Hopefully it's any help. > > Mark > > > On Sat, Feb 7, 2009 at 7:18 AM, Isaac Deutsch wrote: > > > >> I'm currently tied up but you can get my MCTS implementation, which > >> includes RAVE, and set it up to play 50K playouts. It's only a matter > >> of setting the right number in the configuration file. > >> > >> You can also use it to play through two-gtp, that way you can test an > >> awful lot faster. > >> > >> Mark > > > > Hi Mark, > > > > This would be awesome. Can you send the source code to this eMail > address? > > > > Thanks, ibd > > -- > > Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit > allen: http://www.gmx.net/de/go/multimessenger01 > > ___ > > computer-go mailing list > > computer-go@computer-go.org > > http://www.computer-go.org/mailman/listinfo/computer-go/ > > > ___ > computer-go mailing list > computer-go@computer-go.org > http://www.computer-go.org/mailman/listinfo/computer-go/ -- Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: http://www.gmx.net/de/go/multimessenger01 ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
You can get everything here: http://plug-and-go.dev.java.net The MCTS program description is under 'Derived Projects'. You don't really need the source-code. You can just get the 'binaries & scripts' and then copy the files 'TesujiRefBot.jar' and 'TesujiRefBot.xml' to the directory where you put the binaries and everything works automagically. It's all plug and go. You may want to change the number of nodes to read in the XML file to 50K (it's called minimumNrNodes). Of course if you prefer to get the source you are free to do so. I do remember making a significant fix a little while ago that I don't think I submitted yet. But I won't be able to commit that for a few more days as I'm not at home. But if you implemented RAVE correctly your bot should do at least as well as this one. Hopefully it's any help. Mark On Sat, Feb 7, 2009 at 7:18 AM, Isaac Deutsch wrote: > >> I'm currently tied up but you can get my MCTS implementation, which >> includes RAVE, and set it up to play 50K playouts. It's only a matter >> of setting the right number in the configuration file. >> >> You can also use it to play through two-gtp, that way you can test an >> awful lot faster. >> >> Mark > > Hi Mark, > > This would be awesome. Can you send the source code to this eMail address? > > Thanks, ibd > -- > Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: > http://www.gmx.net/de/go/multimessenger01 > ___ > computer-go mailing list > computer-go@computer-go.org > http://www.computer-go.org/mailman/listinfo/computer-go/ > ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
> I'm currently tied up but you can get my MCTS implementation, which > includes RAVE, and set it up to play 50K playouts. It's only a matter > of setting the right number in the configuration file. > > You can also use it to play through two-gtp, that way you can test an > awful lot faster. > > Mark Hi Mark, This would be awesome. Can you send the source code to this eMail address? Thanks, ibd -- Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: http://www.gmx.net/de/go/multimessenger01 ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
I'm currently tied up but you can get my MCTS implementation, which includes RAVE, and set it up to play 50K playouts. It's only a matter of setting the right number in the configuration file. You can also use it to play through two-gtp, that way you can test an awful lot faster. Mark On Fri, Feb 6, 2009 at 3:55 PM, Isaac Deutsch wrote: > The rating of the bot still seems to be drifting upwards, but I think I can > conclude my UCT implementation is OK afterall. Many thanks to the bots > provided. Does someone have a bot that does 50k light playouts + RAVE? I > would be most grateful if you could put them online for a few days of > testing. :-) > > By the way, I've seen 2 games when checking my bot's status where one of the > "myCtest" bots lost because of an illegal ko move. Maybe there's a bug in > handling superko? > > Regards, > Isaac > -- > Jetzt 1 Monat kostenlos! GMX FreeDSL - Telefonanschluss + DSL > für nur 17,95 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K11308T4569a > ___ > computer-go mailing list > computer-go@computer-go.org > http://www.computer-go.org/mailman/listinfo/computer-go/ > ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
On Feb 6, 2009, at 12:55 PM, "Isaac Deutsch" wrote: The rating of the bot still seems to be drifting upwards, but I think I can conclude my UCT implementation is OK afterall. Many thanks to the bots provided. Does someone have a bot that does 50k light playouts + RAVE? I would be most grateful if you could put them online for a few days of testing. :-) I do, and I will. I'm just slow... I reimaged my machine a while back... By the way, I've seen 2 games when checking my bot's status where one of the "myCtest" bots lost because of an illegal ko move. Maybe there's a bug in handling superko? Regards, Isaac -- Jetzt 1 Monat kostenlos! GMX FreeDSL - Telefonanschluss + DSL für nur 17,95 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K11308T45 69a ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
On Feb 6, 2009, at 9:55 AM, Isaac Deutsch wrote: By the way, I've seen 2 games when checking my bot's status where one of the "myCtest" bots lost because of an illegal ko move. Maybe there's a bug in handling superko? Not a bug, I never implemented it :-( Christoph ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
The rating of the bot still seems to be drifting upwards, but I think I can conclude my UCT implementation is OK afterall. Many thanks to the bots provided. Does someone have a bot that does 50k light playouts + RAVE? I would be most grateful if you could put them online for a few days of testing. :-) By the way, I've seen 2 games when checking my bot's status where one of the "myCtest" bots lost because of an illegal ko move. Maybe there's a bug in handling superko? Regards, Isaac -- Jetzt 1 Monat kostenlos! GMX FreeDSL - Telefonanschluss + DSL für nur 17,95 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K11308T4569a ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
Thanks Christoph. I've changed my bot to play 50k games. I know the results aren't very statistically meaningful yet, but the 2 bots look close to each other already. ;-) I will let it run for some more days if I can. http://img145.imageshack.us/img145/3586/bild3bk3.png -ibd > I have started the following versions of 'myCtest': > > myCtest-50k-UCT (Bayes-ELO= 1618+-14) > myCtest-10k-AMAF (Bayes-ELO= 1481+-14) > myCtest-10k-UCT (Bayes-ELO= 1334+-12) > > Christoph -- Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: http://www.gmx.net/de/go/multimessenger01 ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
On Tue, 3 Feb 2009, Jason House wrote: That kind of setup should make it easier to compare. There have been a few times in the past where multiple authors posted similar bots with the same configurations and let them all duke it out on the server for a while. Once upon a time, I was the owner of a bot that was performing far worse than yours and trying to figure out why. I forget who owns myctest, but they were very willing to run a bot to help me out. I have started the following versions of 'myCtest': myCtest-50k-UCT (Bayes-ELO= 1618+-14) myCtest-10k-AMAF (Bayes-ELO= 1481+-14) myCtest-10k-UCT (Bayes-ELO= 1334+-12) Christoph ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
On Tue, Feb 3, 2009 at 1:53 PM, Isaac Deutsch wrote: > Hi Jason, > > Thanks for your numbers. I might try to limit my bot to 50k playouts and 1 > core, but I usually simulate as long as time permits. That kind of setup should make it easier to compare. There have been a few times in the past where multiple authors posted similar bots with the same configurations and let them all duke it out on the server for a while. Once upon a time, I was the owner of a bot that was performing far worse than yours and trying to figure out why. I forget who owns myctest, but they were very willing to run a bot to help me out. > Do you suspect my > pure UCT version has bugs, too, judging from its rating? Maybe, but it's not all that likely. It could be that your RAVE implementation isn't buggy either. IIRC, when I was running my bots, there were very few bots in the 1600-2100 ELO range. That may have skewed my results. > I find it hard to > find good tests for the correctness of a program depending on "randomness", > and even harder to find bugs. I created a bunch of unit tests to test very specific cases with my search. Looking at http://housebot.svn.sourceforge.net/viewvc/housebot/branches/0.7/search/uct.d?view=markupthose tests are as follows: - (line 743) Exploitation of perfect children - Two children with 100% winning rate. Run 100 simulations and ensure that only one child was explored. - (line 767) Evaluation of foced sequences - Commented out. I never added that functionality (Auto-expand when only one follow-up move is available) - (line 777) Recognition of a lost position - The end of the game is one move away, and the game is lost. Ensure that each leaf is evaluated once and that the search stops and declares the position as lost - (line 793) Recognition of a won position - The end of the game is one move away and the game is won. Ensure that only one leaf is evaluated and that the search stops and declares the position as won - (line 809) Two ply search of a won position - (line 834) Searches with solved children (some won, some lost) - (line 862) Searches where a winning subtree gets solved through a transposed position - Multiple paths exist to the same subtree. Force the evaluation through one path to complete and then force the evaluation through another path and ensure the conclusion is picked up - (line 894) Searches where a losing subtree gets solved through a transposed position - (line 924) Searches where a losing terminal gets solved through a transposed position I focused most of the tests on terminal positions that were easier to understand how they should turn out. It also had the nice side effect of helping me speed up my endgame handling in addition to helping me find several bugs that had been plaguing my bot. > Maybe we can set up our bots to play under > similar circumstances on CGOS. Yes, I'll set it up soon (but probably not today). > > > Regards, > Isaac > > > My bot is about 1/4-1/2 of the speed of yours, but here are my strength > > numbers (median of reported bayeselo numbers below): > > HouseBot Rango_004 > > RAVE - 1778 ELO 1721 ELO > > UCT- 1587 ELO 1642 ELO > > > > Given the tremendous speed difference between our bots, I suspect you may > > have some debugging to do :( I'll try to put my bot up on CGOS again in > > the > > next few days. Maybe some head to head games will best answer how our > > implementations compare to each other. > > > -- > Jetzt 1 Monat kostenlos! GMX FreeDSL - Telefonanschluss + DSL > für nur 17,95 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K11308T4569a > ___ > computer-go mailing list > computer-go@computer-go.org > http://www.computer-go.org/mailman/listinfo/computer-go/ > ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
Hi Jason, Thanks for your numbers. I might try to limit my bot to 50k playouts and 1 core, but I usually simulate as long as time permits. Do you suspect my pure UCT version has bugs, too, judging from its rating? I find it hard to find good tests for the correctness of a program depending on "randomness", and even harder to find bugs. Maybe we can set up our bots to play under similar circumstances on CGOS. Regards, Isaac > My bot is about 1/4-1/2 of the speed of yours, but here are my strength > numbers (median of reported bayeselo numbers below): > HouseBot Rango_004 > RAVE - 1778 ELO 1721 ELO > UCT- 1587 ELO 1642 ELO > > Given the tremendous speed difference between our bots, I suspect you may > have some debugging to do :( I'll try to put my bot up on CGOS again in > the > next few days. Maybe some head to head games will best answer how our > implementations compare to each other. -- Jetzt 1 Monat kostenlos! GMX FreeDSL - Telefonanschluss + DSL für nur 17,95 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K11308T4569a ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
On Sun, Feb 1, 2009 at 4:46 PM, Isaac Deutsch wrote: > By the way, I got about 75 ELO points (1650->1720) with light playouts out > of RAVE. Do you think this is in the expected range? It's not really similar > to the 20%->60% win rate rise vs. GnuGo described in some papers... My bot is about 1/4-1/2 of the speed of yours, but here are my strength numbers (median of reported bayeselo numbers below): HouseBot Rango_004 RAVE - 1778 ELO 1721 ELO UCT- 1587 ELO 1642 ELO Given the tremendous speed difference between our bots, I suspect you may have some debugging to do :( I'll try to put my bot up on CGOS again in the next few days. Maybe some head to head games will best answer how our implementations compare to each other. HouseBot parameters: One search thread Max 50k playouts per move (time control may reduce) No pondering Variants with RAVE in the name: UCT+RAVE Variants with UCT in the name: UCT without RAVE hb-797-RAVE-50k 1826 hb-791-RAVE-50k 1778 hb-819-RAVE-50k 1715 hb-799-UCT-50k 1606 hb-797-UCT-50k 1605 hb-770-UCT-50k 1570 hb-791-UCT-50k 1556 ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
I actually tried leaf parallelization first, but after reading the mentioned paper I switched to an implementation of root parallelization (as described). I'm not sure if I implemented it correctly (like in my description), but after testing a 2-core-version against a single- core-version with a few games, I can say the dual core version wins at least 75% of the games, which I think would be an ELO difference of about 200. I have yet to switch colors but the results should be similar. -ibd > There were a couple of papers [2] at CG2008 on this subject. The > consensus seemed to be that root parallelization [1] was best. In fact > Guillaume Chaslot's version got a strength speedup of 3.0 using 2 > threads, and 6.5 using 4 threads (dropping to 14.9 with 16 threads). > This is of course impossible, and implies the parallel version is > somehow doing MCTS better than the single thread algorithm! > > Darren > > [1]: Build multiple MCTS trees in parallel, one per thread. No > communication. At end add the trees together and use grand total to > select move. > > [2]: > http://www.cs.unimaas.nl/g.chaslot/papers/parallelMCTS.pdf > and > "A Parallel Monte-Carlo Tree Search Algorithm" by Tristan Cazenave (I > couldn't seem to find a PDF link.) -- Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: http://www.gmx.net/de/go/multimessenger01 ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
> My first (braindead) multi-threaded UCT played weaker with two cores > than one core. How do you combine search trees/results? How do you pick > a move to play? There were a couple of papers [2] at CG2008 on this subject. The consensus seemed to be that root parallelization [1] was best. In fact Guillaume Chaslot's version got a strength speedup of 3.0 using 2 threads, and 6.5 using 4 threads (dropping to 14.9 with 16 threads). This is of course impossible, and implies the parallel version is somehow doing MCTS better than the single thread algorithm! Darren [1]: Build multiple MCTS trees in parallel, one per thread. No communication. At end add the trees together and use grand total to select move. [2]: http://www.cs.unimaas.nl/g.chaslot/papers/parallelMCTS.pdf and "A Parallel Monte-Carlo Tree Search Algorithm" by Tristan Cazenave (I couldn't seem to find a PDF link.) -- Darren Cook, Software Researcher/Developer http://dcook.org/mlsn/ (English-Japanese-German-Chinese-Arabic open source dictionary/semantic network) http://dcook.org/work/ (About me and my work) http://dcook.org/blogs.html (My blogs and articles) ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
On Feb 2, 2009, at 12:09 PM, "Isaac Deutsch" wrote: Wow, thanks for all the answers! You're being really helpful. "Do you use UCT with a too large exploration term?" That's a good idea. I actually use a rather big value for c=0.5. I might try lowering it. Thanks! (Precisely, the formula is c*sqrt(log(p)/c).) That formula is wrong. Did you mean to have a denominator of c? "My first (braindead) multi-threaded UCT played weaker with two cores than one core. How do you combine search trees/results? How do you pick a move to play?" I have the threads write to a global array (of course one after another that stores the values and the numbers of games played (each thread adds to these values) of the first ply. I then pick the move with the most games played through that node. The value of this best move (sum/numplayed) determines if the bot should resign. That sounds a lot like what I did (two independent searches that are combined when genmove is called). I believe that strategy had a 100+ ELO loss in strength for me. I'm not sure why that was the case (but have some theories). You may want to experiment with one search thread to see if it hurts your performance too. "Yes, and you should go by the bayeselo page which is a better picture of what is going on." The ELOs posted here refer to these values. "I don't think many people realize that you have to play hundreds of games just to be within 40 or 50 ELO with much certainty. If you play less than 100 games you could easily be off by over 100 ELO." Maybe I'm a bit (a lot :) impatient, but I try to make a rough guess after about 150-200 games, which is about 1-2 days of letting the program run on the server. I think it's possible to say after 150 games that RAVE did not give me a 300 ELO boost. Ofline testing should be faster... Thanks again, Greetings, Isaac -- Jetzt 1 Monat kostenlos! GMX FreeDSL - Telefonanschluss + DSL für nur 17,95 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K11308T45 69a ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
On Mon, 2009-02-02 at 18:09 +0100, Isaac Deutsch wrote: > "I don't think many people realize that you have to play hundreds of > games just to be within 40 or 50 ELO with much certainty. If you > play > less than 100 games you could easily be off by over 100 ELO." > > Maybe I'm a bit (a lot :) impatient, but I try to make a rough guess > after > about 150-200 games, which is about 1-2 days of letting the program > run on > the server. I think it's possible to say after 150 games that RAVE did > not > give me a 300 ELO boost. You already reported 75 ELO improvement. Gelly said you should get 200-300. So you are only about 125 off of Gelly's lower bound (which sounded like a rough guess to me.) So you have a lot of uncertainty here and only 150 games. Of course this is plenty enough to suspect a problem and investigate, but it's not enough to conclude that you surely did something wrong. How much did you test the version that doesn't have this change? You have 3 sources of slop here: 1. Gelly's rough estimate of how much you should get. 2. The rating uncertainty of the unmodified program. 3. The rating uncertainty of the modified program. - Don ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
Wow, thanks for all the answers! You're being really helpful. "Do you use UCT with a too large exploration term?" That's a good idea. I actually use a rather big value for c=0.5. I might try lowering it. Thanks! (Precisely, the formula is c*sqrt(log(p)/c).) "My first (braindead) multi-threaded UCT played weaker with two cores than one core. How do you combine search trees/results? How do you pick a move to play?" I have the threads write to a global array (of course one after another that stores the values and the numbers of games played (each thread adds to these values) of the first ply. I then pick the move with the most games played through that node. The value of this best move (sum/numplayed) determines if the bot should resign. "Yes, and you should go by the bayeselo page which is a better picture of what is going on." The ELOs posted here refer to these values. "I don't think many people realize that you have to play hundreds of games just to be within 40 or 50 ELO with much certainty. If you play less than 100 games you could easily be off by over 100 ELO." Maybe I'm a bit (a lot :) impatient, but I try to make a rough guess after about 150-200 games, which is about 1-2 days of letting the program run on the server. I think it's possible to say after 150 games that RAVE did not give me a 300 ELO boost. Thanks again, Greetings, Isaac -- Jetzt 1 Monat kostenlos! GMX FreeDSL - Telefonanschluss + DSL für nur 17,95 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K11308T4569a ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
On Mon, 2009-02-02 at 09:40 -0500, Jason House wrote: > Also, I noticed your rank measurements were based on CGOS results > after relatively few games. It can retain significant bias for quite > a > while. Yes, and you should go by the bayeselo page which is a better picture of what is going on. I don't think many people realize that you have to play hundreds of games just to be within 40 or 50 ELO with much certainty. If you play less than 100 games you could easily be off by over 100 ELO. - Don ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
On Feb 2, 2009, at 9:40 AM, Jason House wrote: On Feb 2, 2009, at 6:57 AM, "Isaac Deutsch" wrote: Hi Issac, You should be more in the range of +200-300 ELO, at least with pattern based playouts. Sylvain Isaac. They are not pattern based playouts, but as I said uniformly random. I reckon the effect of RAVE is less with these? "How many playouts per second do you get with each version?" Actually, to make comparable tests with both versions, the version "without RAVE" just sets the coefficient of RAVE in the UCT-RAVE value calculation to zero. I get about 41k games/s on 9x9 (using 2 cores, about 20k/s/ core). The playouts are fairly optimized but the tree search isn't at all yet, so there is still some potential. My first (braindead) multi-threaded UCT played weaker with two cores than one core. How do you combine search trees/results? How do you pick a move to play? Single-threaded RAVE with no parameter tuning, an ancient RAVE formula, and 10k light playouts per second got me into the 1600's on CGOS. I used Don Dailey's AMAF methodology for RAVE (first color to play on a point, keep 7/8 of move list). Actually, that rating was probably using my own home-grown RAVE formula. Sorry for the misinformation. Also, I noticed your rank measurements were based on CGOS results after relatively few games. It can retain significant bias for quite a while. -- Jetzt 1 Monat kostenlos! GMX FreeDSL - Telefonanschluss + DSL für nur 17,95 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K11308T4569 a ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
On Feb 2, 2009, at 6:57 AM, "Isaac Deutsch" wrote: Hi Issac, You should be more in the range of +200-300 ELO, at least with pattern based playouts. Sylvain Isaac. They are not pattern based playouts, but as I said uniformly random. I reckon the effect of RAVE is less with these? "How many playouts per second do you get with each version?" Actually, to make comparable tests with both versions, the version "without RAVE" just sets the coefficient of RAVE in the UCT-RAVE value calculation to zero. I get about 41k games/s on 9x9 (using 2 cores, about 20k/s/ core). The playouts are fairly optimized but the tree search isn't at all yet, so there is still some potential. My first (braindead) multi-threaded UCT played weaker with two cores than one core. How do you combine search trees/results? How do you pick a move to play? Single-threaded RAVE with no parameter tuning, an ancient RAVE formula, and 10k light playouts per second got me into the 1600's on CGOS. I used Don Dailey's AMAF methodology for RAVE (first color to play on a point, keep 7/8 of move list). Also, I noticed your rank measurements were based on CGOS results after relatively few games. It can retain significant bias for quite a while. -- Jetzt 1 Monat kostenlos! GMX FreeDSL - Telefonanschluss + DSL für nur 17,95 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K11308T45 69a ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
I have about 200-300k games/move, so maybe the effect is even less. But, maybe I still have a grave bug somewhere. I will check again. Cheers, ibd > On Feb 2, 2009, at 9:57 AM, Isaac Deutsch wrote: > > > They are not pattern based playouts, but as I said uniformly random. > > I reckon the effect of RAVE is less with these? > > My MCTS implementation sees a gain of 200-400 ELO from RAVE using > uniformly random moves. 400 gain (90% win-rate) for 2K playouts, > becoming slowly less for larger numbers of playouts. > > Mark -- Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: http://www.gmx.net/de/go/multimessenger01 ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
On Feb 2, 2009, at 9:57 AM, Isaac Deutsch wrote: They are not pattern based playouts, but as I said uniformly random. I reckon the effect of RAVE is less with these? My MCTS implementation sees a gain of 200-400 ELO from RAVE using uniformly random moves. 400 gain (90% win-rate) for 2K playouts, becoming slowly less for larger numbers of playouts. Mark ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
> Hi Issac, > You should be more in the range of +200-300 ELO, at least with pattern > based > playouts. > > Sylvain Isaac. They are not pattern based playouts, but as I said uniformly random. I reckon the effect of RAVE is less with these? "How many playouts per second do you get with each version?" Actually, to make comparable tests with both versions, the version "without RAVE" just sets the coefficient of RAVE in the UCT-RAVE value calculation to zero. I get about 41k games/s on 9x9 (using 2 cores, about 20k/s/core). The playouts are fairly optimized but the tree search isn't at all yet, so there is still some potential. -- Jetzt 1 Monat kostenlos! GMX FreeDSL - Telefonanschluss + DSL für nur 17,95 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K11308T4569a ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
Hi Issac, You should be more in the range of +200-300 ELO, at least with pattern based playouts. Sylvain 2009/2/1 Isaac Deutsch > By the way, I got about 75 ELO points (1650->1720) with light playouts out > of RAVE. Do you think this is in the expected range? It's not really similar > to the 20%->60% win rate rise vs. GnuGo described in some papers... > > > At the moment I'm also tuning the bias in the range 0.001-0.1. Given > > uniformly random (light) playouts, is the bias expected to be bigger than > > with > > heavy playouts, meaning the constant has to be bigger aswell? > > Cheers, ibd > -- > Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: > http://www.gmx.net/de/go/multimessenger01 > ___ > computer-go mailing list > computer-go@computer-go.org > http://www.computer-go.org/mailman/listinfo/computer-go/ > ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
How many playouts per second do you get with each version? Sent from my iPhone On Feb 1, 2009, at 4:46 PM, "Isaac Deutsch" wrote: By the way, I got about 75 ELO points (1650->1720) with light playouts out of RAVE. Do you think this is in the expected range? It's not really similar to the 20%->60% win rate rise vs. GnuGo described in some papers... At the moment I'm also tuning the bias in the range 0.001-0.1. Given uniformly random (light) playouts, is the bias expected to be bigger than with heavy playouts, meaning the constant has to be bigger aswell? Cheers, ibd -- Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit al len: http://www.gmx.net/de/go/multimessenger01 ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
By the way, I got about 75 ELO points (1650->1720) with light playouts out of RAVE. Do you think this is in the expected range? It's not really similar to the 20%->60% win rate rise vs. GnuGo described in some papers... > At the moment I'm also tuning the bias in the range 0.001-0.1. Given > uniformly random (light) playouts, is the bias expected to be bigger than > with > heavy playouts, meaning the constant has to be bigger aswell? Cheers, ibd -- Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: http://www.gmx.net/de/go/multimessenger01 ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
At the moment I'm also tuning this constant in the range 0.001-0.1. Given uniformly random (light) playouts, is the bias expected to be bigger than with heavy playouts, meaning the constant has to be bigger aswell? -- Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: http://www.gmx.net/de/go/multimessenger01 ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
Quoting Sylvain Gelly : On Fri, Jan 30, 2009 at 9:29 AM, Magnus Persson wrote: Did you try to tune the bias constant at all or just took the one I posted? I wrote it from memory and I believe it is in the range of possibly good values, but it is certainly not optimal, I can be off by a significant factor in both directions. Yes I always tries to test at least 5 values. Three values centered around what I currently believe is the best value and two or more values to the extremes to get the big picture and make sure that I am at least the maximum. ConstantWinrate Against Gnugo 0.00015 43.9 0.0015 50 0.015 50 0.1543.1 1 7.1 10 6.7 The bias can't possibly be 1 because by definition it is the difference between the expected value of the normal tree search and the expected value of AMAF. As those two values are in [0,1] anyway, the bias is certainly not 1, and in most cases very small. I think I was confused about exactly what in the term that was the bias. I do not really remember what and why I thought these things. Anyway your explanation is very clear. On a side note I think Valkyria for some moves have very large biases because the playouts are very heavy. I am thinking of an experiment to test this. One way is to simply play uniform playouts from a single position N times for all moves and thus get accurate win rates for all those moves as well as AMAF values. Then taking the difference in values for AMAF and real win rates should reveal the size of bias in general and if there are systematic difference for certain patterns or situations. Alternatively one could play N playouts for a single move and see what AMAF values one get. Now can see to what extent the biases are stable when moves are played on the board. The real values will change for all moves and the AMAF as well, but will the bias be constant? Or will it be random as a function of the position. There is no tree search in these two cases. Selective search is often similar to the second situation, and my question is whether this can cause larger systematic biases for some moves, that thus is missed. And if one understands the nature of such biases maybe they can be neutralized somehow. It could be that using patterns to start with strong biases in the prior AMAF values is doing exactly this. Has anyone has done anything like this or have any interesting opinions I would be thankful. -Magnus ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
On Fri, Jan 30, 2009 at 9:29 AM, Magnus Persson wrote: > Quoting Sylvain Gelly : > > On Wed, Jan 28, 2009 at 10:01 PM, Isaac Deutsch wrote: >> >> And a final question: You calculate the (beta) coefficient as >>> c = rc / (rc+c+rc*c*BIAS); >>> which looks similar to the formula proposed by David Silver (If I recall >>> his name correctly). However, in his formula, the last term looks like >>> rc*c*BIAS/(q_ur*(1-q_ur)) >>> Is it correct that we could get q_ur from the current UCT-RAVE mean >>> value, >>> and that it is used like that? >>> >> >> >> Yes the formula looks very similar (David proposed that formula to me in >> the >> beginning of 2007). However my implementation did not contain >> the (q_ur*(1-q_ur) factor, that I approximated by a constant, taking q=0.5 >> so the factor=0.25. >> I did not try the other formula, maybe it works better in practice, while >> I >> would expect it is similar in practice. >> > > Valkyria uses an even more complicated version of what David Silver > proposed (I really did not understand it so I came up with something that > looked plausible to me that actually estimated the bias for each candidate > move rather than assuming it constant). > > When Sylvain proposed this simple version I tested that version against my > own interpretation. On 9x9 my complicated version might have a win rate 3% > better than the simple version for 3 data points (700 games each) near the > maximum. The standard error according to twogtp is 1.9. > > On 19x19 I got results where there no difference at all but with much > higher uncertainty because there was not many games played. Did you try to tune the bias constant at all or just took the one I posted? I wrote it from memory and I believe it is in the range of possibly good values, but it is certainly not optimal, I can be off by a significant factor in both directions. > But the term is important for sure, the constant BIAS used should be larger > than 0 but you should be careful to not set it too high. For Valkyria the > 0.015 value Sylvain posted here worked fine. But if it is higher for example > 0.15 leads to bad performance on 9x9 and catastrophic performance on 19x19. > Initially I thought this parameter should be something like 1 and that was > completely wrong. I was just lucky to get it right, just by visual > inspection of the search behavior when I played around with the parameter. The bias can't possibly be 1 because by definition it is the difference between the expected value of the normal tree search and the expected value of AMAF. As those two values are in [0,1] anyway, the bias is certainly not 1, and in most cases very small. > > > The reason is that the bias term of the denominator should be close to zero > initially is to allow AMAF to have strong impact on the search but at some > point (which is delayed by having a small BIAS constant) there is a quick > shift towards using the real win rate instead. > > -Magnus > > > > ___ > computer-go mailing list > computer-go@computer-go.org > http://www.computer-go.org/mailman/listinfo/computer-go/ > ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
Quoting Sylvain Gelly : On Wed, Jan 28, 2009 at 10:01 PM, Isaac Deutsch wrote: And a final question: You calculate the (beta) coefficient as c = rc / (rc+c+rc*c*BIAS); which looks similar to the formula proposed by David Silver (If I recall his name correctly). However, in his formula, the last term looks like rc*c*BIAS/(q_ur*(1-q_ur)) Is it correct that we could get q_ur from the current UCT-RAVE mean value, and that it is used like that? Yes the formula looks very similar (David proposed that formula to me in the beginning of 2007). However my implementation did not contain the (q_ur*(1-q_ur) factor, that I approximated by a constant, taking q=0.5 so the factor=0.25. I did not try the other formula, maybe it works better in practice, while I would expect it is similar in practice. Valkyria uses an even more complicated version of what David Silver proposed (I really did not understand it so I came up with something that looked plausible to me that actually estimated the bias for each candidate move rather than assuming it constant). When Sylvain proposed this simple version I tested that version against my own interpretation. On 9x9 my complicated version might have a win rate 3% better than the simple version for 3 data points (700 games each) near the maximum. The standard error according to twogtp is 1.9. On 19x19 I got results where there no difference at all but with much higher uncertainty because there was not many games played. But the term is important for sure, the constant BIAS used should be larger than 0 but you should be careful to not set it too high. For Valkyria the 0.015 value Sylvain posted here worked fine. But if it is higher for example 0.15 leads to bad performance on 9x9 and catastrophic performance on 19x19. Initially I thought this parameter should be something like 1 and that was completely wrong. I was just lucky to get it right, just by visual inspection of the search behavior when I played around with the parameter. The reason is that the bias term of the denominator should be close to zero initially is to allow AMAF to have strong impact on the search but at some point (which is delayed by having a small BIAS constant) there is a quick shift towards using the real win rate instead. -Magnus ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
On Wed, Jan 28, 2009 at 10:01 PM, Isaac Deutsch wrote: > Hi again ;) > > I found some time to actually implement this stuff. And, this has raised > some small questions. In this part of the code: > > for (j = index; j < moves_played_in_tree.size(); j += 2) { > //stuff > } > for (j = 0; j < moves_played_out_tree.size(); ++j) { > //more stuff > >// If it is either not the right color or the intersection is >// already played we ignore that move for that node >if (move < 0 || already_played.AlreadyPlayed(move)) continue > >already_played.Play(move) > //stuff > } > > 1. Shouldn't the first loop start at j=index+1? Starting at j=index would > mean that the RAVE value of the node is updated with the move of the node > itself, wouldn't it? It makes more sense to me to actually start at the > first child of the node that is being back-upped. Correct me if I'm wrong. No, you need to update the RAVE value of the node for the first move (move taken in the position of the node itself). So it is j=index, and that is important to make the algorithm work. > > 2. Shouldn't the order in the second loop be: > -if (already played): continue; > -update already played; > -if (wrong color): continue; > Otherwise, moves that are the wrong color don't get counted as already > played (because they never get updated). I'm not sure if it makes a > difference in this case because you check in the playouts, too, but maybe > it does. I think it is ok like that because indeed the check is already done in the playout. The "already_played.Play(move)" is actually also unnecessary in the second loop (not really rechecked as I speak, but I think so as far as I remember). > And a final question: You calculate the (beta) coefficient as > c = rc / (rc+c+rc*c*BIAS); > which looks similar to the formula proposed by David Silver (If I recall > his name correctly). However, in his formula, the last term looks like > rc*c*BIAS/(q_ur*(1-q_ur)) > Is it correct that we could get q_ur from the current UCT-RAVE mean value, > and that it is used like that? Yes the formula looks very similar (David proposed that formula to me in the beginning of 2007). However my implementation did not contain the (q_ur*(1-q_ur) factor, that I approximated by a constant, taking q=0.5 so the factor=0.25. I did not try the other formula, maybe it works better in practice, while I would expect it is similar in practice. Sylvain > > > Regards, > Isaac Deutsch > -- > Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: > http://www.gmx.net/de/go/multimessenger > ___ > computer-go mailing list > computer-go@computer-go.org > http://www.computer-go.org/mailman/listinfo/computer-go/ > ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
Hi again ;) I found some time to actually implement this stuff. And, this has raised some small questions. In this part of the code: for (j = index; j < moves_played_in_tree.size(); j += 2) { //stuff } for (j = 0; j < moves_played_out_tree.size(); ++j) { //more stuff // If it is either not the right color or the intersection is // already played we ignore that move for that node if (move < 0 || already_played.AlreadyPlayed(move)) continue already_played.Play(move) //stuff } 1. Shouldn't the first loop start at j=index+1? Starting at j=index would mean that the RAVE value of the node is updated with the move of the node itself, wouldn't it? It makes more sense to me to actually start at the first child of the node that is being back-upped. Correct me if I'm wrong. 2. Shouldn't the order in the second loop be: -if (already played): continue; -update already played; -if (wrong color): continue; Otherwise, moves that are the wrong color don't get counted as already played (because they never get updated). I'm not sure if it makes a difference in this case because you check in the playouts, too, but maybe it does. And a final question: You calculate the (beta) coefficient as c = rc / (rc+c+rc*c*BIAS); which looks similar to the formula proposed by David Silver (If I recall his name correctly). However, in his formula, the last term looks like rc*c*BIAS/(q_ur*(1-q_ur)) Is it correct that we could get q_ur from the current UCT-RAVE mean value, and that it is used like that? Regards, Isaac Deutsch -- Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: http://www.gmx.net/de/go/multimessenger ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
> Well, now mogo has an exploration term - but not at all UCB-like. >> > > I was talking about times where I was still there ... ages ago :) > Good old times :-) you've been helpful several times even from far away :-) ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
2009/1/21 Olivier Teytaud > > Of course you can do put much more clever prior if you are a player and >> know the subtleties of the game. >> > > E.g. patterns extracted from databases - but it's not enough, carefully > tune the coefficients for "empty triangles" (important!) and various other > importants patterns/rules (don't just keep the empirical probabilities from > datasets as coefficients). I'm afraid the coefficients are very > implementation-dependent unless the bandit and the random player are > specified with > a lot of details. > > You can put an exploration term, but the cases where it is needed are rare. >> I did a lot of experiments on that, and even at long thinking time, no >> exploration term was always better (statistically). >> > > Well, now mogo has an exploration term - but not at all UCB-like. > I was talking about times where I was still there ... ages ago :) Sylvain ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
> Of course you can do put much more clever prior if you are a player and > know the subtleties of the game. > E.g. patterns extracted from databases - but it's not enough, carefully tune the coefficients for "empty triangles" (important!) and various other importants patterns/rules (don't just keep the empirical probabilities from datasets as coefficients). I'm afraid the coefficients are very implementation-dependent unless the bandit and the random player are specified with a lot of details. You can put an exploration term, but the cases where it is needed are rare. > I did a lot of experiments on that, and even at long thinking time, no > exploration term was always better (statistically). > Well, now mogo has an exploration term - but not at all UCB-like. As said previously, I'm not the main author of this modification and I'll not explain it myself. Also, for the "main" node (the root), the bandit in mogo is different from the bandit elsewhere in the tree. More diversification is useful. But it's not a very important improvement, a few percents. Best regards, Olivier ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
Hi, I did not mention here the prior initialization that is done in each node. When you create a node, you can look at all possible move and if a pattern matches (the exact same as in the playout) you initialize rw and rc to 14. If the move saves a capture (same as in the playout), same initialization, rw and rc to 14. If it is a self atari, you initialize rw to 0 and rc to 14. Else you initialize rw to 7 and rc to 14. Of course you can do put much more clever prior if you are a player and know the subtleties of the game. You can put an exploration term, but the cases where it is needed are rare. I did a lot of experiments on that, and even at long thinking time, no exploration term was always better (statistically). Sylvain 2009/1/21 Mark Boon > > On Jan 21, 2009, at 10:23 AM, Magnus Persson wrote: > > Quoting Thomas Lavergne : >> >> - the best play is a good only if played immediatly and very bad if >>> played later in the game : >>> - the first playout for this play resulted in a lost. >>> score and RAVE score will be very low and this play will never be >>> considered again until a very long time. >>> >> >> >> You raise an interesting concern. >> >> The simple solution to your question is to add an exploration term using >> UCT for example. Then it becomes an empirical question what parameter for >> exploration gives the strongest play. My experience is that the best >> parameter is so small it can be set to zero. >> > > Well, empirically, when I set the exploration component to zero it starts > to play a lot worse. Like I wrote: the winning percentage drops to 24% vs. > the same program with the exploration component, which is a huge difference. > > So if you have a different experience, you must have something else that > overcomes this hurdle that's not part of a simple MCTS-RAVE implementation. > I'd be very interested to learn what that is. Sylvain didn't take the bait > ;-) > > Mark > > > ___ > computer-go mailing list > computer-go@computer-go.org > http://www.computer-go.org/mailman/listinfo/computer-go/ > ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
> > I'd like to make sure I understand what you mean exactly. You use some > heuristics to intialize all the moves (or maybe some of the moves) with a > certain win-loss and rave-win-loss ratios? Not only ratios, but also numbers of simulations. Thanks to patterns, expert rules. > > > To a certain extent I suppose these could come from the reading of the > previous move? I think I slowly start to make sense of things... > We have (almost) no "warm start" from the values of previous moves. This should be improved, as very clearly there is some time lost here (e.g. patterns should not be checked from scratch). A+ Olivier ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
On Jan 21, 2009, at 11:53 AM, Olivier Teytaud wrote: Here, we have a non-zero initialization of the number of wins, of the numbere of simulations, of the number of Rave-wins, of the number of Rave-losses. We have then a 0 constant for exploration, but also an exploratory term which is very different, and for which I am not the main author - therefore I let the main author give an explanation if he wants to :-) I point out that even before this exploratory term, the best UCB- like exploration-constant was 0 - as soon as the initializations of numbers of wins, of losses, of Rave-wins, of Rave-losses are heuristic values. I'd like to make sure I understand what you mean exactly. You use some heuristics to intialize all the moves (or maybe some of the moves) with a certain win-loss and rave-win-loss ratios? To a certain extent I suppose these could come from the reading of the previous move? I think I slowly start to make sense of things... Mark ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
> > Well, empirically, when I set the exploration component to zero it starts > to play a lot worse. Like I wrote: the winning percentage drops to 24% vs. > the same program with the exploration component, which is a huge difference. > > So if you have a different experience, you must have something else that > overcomes this hurdle that's not part of a simple MCTS-RAVE implementation. > I'd be very interested to learn what that is. Sylvain didn't take the bait > ;-) > Here, we have a non-zero initialization of the number of wins, of the numbere of simulations, of the number of Rave-wins, of the number of Rave-losses. We have then a 0 constant for exploration, but also an exploratory term which is very different, and for which I am not the main author - therefore I let the main author give an explanation if he wants to :-) I point out that even before this exploratory term, the best UCB-like exploration-constant was 0 - as soon as the initializations of numbers of wins, of losses, of Rave-wins, of Rave-losses are heuristic values. ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
On Jan 21, 2009, at 10:23 AM, Magnus Persson wrote: Quoting Thomas Lavergne : - the best play is a good only if played immediatly and very bad if played later in the game : - the first playout for this play resulted in a lost. score and RAVE score will be very low and this play will never be considered again until a very long time. You raise an interesting concern. The simple solution to your question is to add an exploration term using UCT for example. Then it becomes an empirical question what parameter for exploration gives the strongest play. My experience is that the best parameter is so small it can be set to zero. Well, empirically, when I set the exploration component to zero it starts to play a lot worse. Like I wrote: the winning percentage drops to 24% vs. the same program with the exploration component, which is a huge difference. So if you have a different experience, you must have something else that overcomes this hurdle that's not part of a simple MCTS-RAVE implementation. I'd be very interested to learn what that is. Sylvain didn't take the bait ;-) Mark ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
Quoting Thomas Lavergne : - the best play is a good only if played immediatly and very bad if played later in the game : - the first playout for this play resulted in a lost. score and RAVE score will be very low and this play will never be considered again until a very long time. You raise an interesting concern. The simple solution to your question is to add an exploration term using UCT for example. Then it becomes an empirical question what parameter for exploration gives the strongest play. My experience is that the best parameter is so small it can be set to zero. I think the conditions you defined are very rarely completely fulfilled. What can be true often however is that a single bad move makes the best move very bad if played later in the game. If the bad move happen to be the second best move, it will be searched a lot lowering the AMAF score (rw/rc) for the best move. This is likely to happen when there are several local moves that more or less solves the same problem. That is when one move is played the effect of the other move played later will overlap with the first. -Magnus ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
On Sat, Jan 17, 2009 at 08:29:32PM +0100, Sylvain Gelly wrote: > ChooseMove(node, board) { > bias = 0.015 // I put a random number here, to be tuned > b = bias * bias / 0.25 > best_value = -1 > best_move = PASSMOVE > for (move in board.allmoves) { > c = node.child(move).counts > w = node.child(move).wins > rc = node.rave_counts[move] > rw = node.rave_wins[move] > coefficient = 1 - rc / (rc + c + rc * c * b) > value = w / c * coef + rw / rc * (1 - coef) // please here take care of > the c==0 and rc == 0 cases > if (value > best_value) { > best_value = value > best_move = move > } > } > return best_move > } Hi, it seems to me that, when you select play in the tree, you don't have an exploration component. You use just a weighted average of score and RAVE score. So, if : - the best play is a good only if played immediatly and very bad if played later in the game : - the first playout for this play resulted in a lost. score and RAVE score will be very low and this play will never be considered again until a very long time. Is it simplified code and in reality you replace w/c and rw/rc by scores with exploration component or did you realy use it as is ? Tom -- Thomas Lavergne"Entia non sunt multiplicanda praeter necessitatem." (Guillaume d'Ockham) thomas.laver...@reveurs.orghttp://oniros.org ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
On Jan 18, 2009, at 5:38 PM, Magnus Persson wrote: In Valkyria I solved this by playing out the ladder on the playout board, and store all changes on a stack. When the ladder undos moves it just pop changes from the stack. In this way I can also use the rich board representation of Valkyria to setup the ladder fast. The drawback is that the code is ugly and slightly buggy when stones are sacrificed during the search. A todo thing is to write a more correct ladder reader that still uses the idea of sharing the array with board with the playout, so that no copying has to be done. My ladder code re-uses the board array of the playout module. But copying this array before reading a ladder doesn't slow things down all that much. As has been discussed on this list elsewhere, copying an array is fast nowadays. Re-using the array is actually a way to make sure your ladder-reading is without bugs when doing undo. If something goes wrong you notice immediately... Maybe what David alluded to is that the playout has no undo so he can't use it to play and undo moves during ladder search. In my case the ladder reader only needs an array with the position. For the rest it's completely self-contained. That adds maybe a little bit when setting up the ladder reading, but it's still fast. Mark -Magnus Quoting David Fotland : I think it is too expensive to read ladders during playouts. I remember that you have faster ladders search code so it might not cost you as much. My playout code has no ability to undo a move or do any kind of lookahead. David -- Magnus Persson Berlin, Germany ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
RE: [computer-go] How to "properly" implement RAVE?
In Valkyria I solved this by playing out the ladder on the playout board, and store all changes on a stack. When the ladder undos moves it just pop changes from the stack. In this way I can also use the rich board representation of Valkyria to setup the ladder fast. The drawback is that the code is ugly and slightly buggy when stones are sacrificed during the search. A todo thing is to write a more correct ladder reader that still uses the idea of sharing the array with board with the playout, so that no copying has to be done. -Magnus Quoting David Fotland : I think it is too expensive to read ladders during playouts. I remember that you have faster ladders search code so it might not cost you as much. My playout code has no ability to undo a move or do any kind of lookahead. David -- Magnus Persson Berlin, Germany ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
On Jan 18, 2009, at 4:11 PM, David Fotland wrote: I think it is too expensive to read ladders during playouts. I remember that you have faster ladders search code so it might not cost you as much. My playout code has no ability to undo a move or do any kind of lookahead. Yes, my ladder code is fast. But it has been public for many years, I had figured others would have caught up on that. My playout code also has no ability to do undo. If I remember well, adding a few simple tactical searches during playouts made the performance go from 20Kps to 15Kps. But it moved to a winning-rate of about 90%. Reading tactics has to be really slow for that not to be worth it. I did find a nice heuristic to cut in less than half the amount of tactical reading necessary without any noticeable loss in playing strength. I'll write it down one day, I think it may have applicability beyond tactical reading and be a general concept to be used during playouts. Mark ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
RE: [computer-go] How to "properly" implement RAVE?
I think it is too expensive to read ladders during playouts. I remember that you have faster ladders search code so it might not cost you as much. My playout code has no ability to undo a move or do any kind of lookahead. David > > Some examples: David Fotland wrote he does light playouts with just a > few patterns but no tactics. I find that using a moderate amount of > tactics actually is the biggest contributor to playing strength (save > one or more stones if can't be caught in ladder). However, augmenting > patterns with tactical information I found doesn't help at all, even > when disregarding the performance cost. Maybe David uses some patterns > to compensate for part of the tactics and relies on the faster > playouts to compensate for poorer playouts. I'm guessing here, but > otherwise I can't imagine why he would forego what otherwise seems to > be a big gain in strength. > > Mark > > > ___ > computer-go mailing list > computer-go@computer-go.org > http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
On Jan 17, 2009, at 5:41 PM, Sylvain Gelly wrote: For the first difference you mention, as far as I remember it makes a small but significant difference and is one of the main reason I was talking about "tricky details". OK, I ran a test and after 1,000 games with 2K semi-light playouts I get a winning percentage of 50.6% for your methods vs. mine. Of course it's possible I made some mistake, but my first impression is it makes no difference which way you do this particular detail. Your ChooseNode is also quite different from mine, mostly because I also still have a UCT component in there. I'll give your method a go one day, just to see if it changes anything. I've come to understand what you mean by "tricky details", sometimes I see a big difference in playing strength that I find hard to explain given the change(s) I made. Conversely I've been in quite a few cases where I thought something would make a difference, only to find out it all didn't matter one bit. It's also possible that some deficiencies that would be apparent in one implementation, get compensated for in another. Some examples: David Fotland wrote he does light playouts with just a few patterns but no tactics. I find that using a moderate amount of tactics actually is the biggest contributor to playing strength (save one or more stones if can't be caught in ladder). However, augmenting patterns with tactical information I found doesn't help at all, even when disregarding the performance cost. Maybe David uses some patterns to compensate for part of the tactics and relies on the faster playouts to compensate for poorer playouts. I'm guessing here, but otherwise I can't imagine why he would forego what otherwise seems to be a big gain in strength. I also tried to use ownership maps to modify the RAVE value. Remi Coulom wrote in a paper he used ownership information of up to 63 playouts. When I tried something similar it always makes play weaker. Maybe I should use it in a different way, but I haven't stumbled on the solution yet. When I think of it, AMAF information is already something very similar to ownership information. So maybe combining the two doesn't make much sense. Lastly, in an earlier UCT bot that I made I gained a lot by initially reducing the number of moves and slowly expanding it. After using AMAF it turns out this method hardly gains anything at all anymore. So the devil is not only in the details, it's also in the combination of the details. Mark ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
Good catch :)Indeed it makes no sense with the "*", sorry... Sylvain 2009/1/17 Magnus Persson > I think I found a bug in ChooseMove > > Quoting Sylvain Gelly : > > coefficient = 1 - rc * (rc + c + rc * c * b) >> > > I think this has to be > > coefficient = 1 - rc / (rc + c + rc * c * b) > > thus when c is 0 (initially) the coefficient is 0 > when c goes towards infinity the coefficent goes 1 > > which is how this function should behave. > > Magnus > > -- > Magnus Persson > Berlin, Germany > > ___ > computer-go mailing list > computer-go@computer-go.org > http://www.computer-go.org/mailman/listinfo/computer-go/ > ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
I think I found a bug in ChooseMove Quoting Sylvain Gelly : coefficient = 1 - rc * (rc + c + rc * c * b) I think this has to be coefficient = 1 - rc / (rc + c + rc * c * b) thus when c is 0 (initially) the coefficient is 0 when c goes towards infinity the coefficent goes 1 which is how this function should behave. Magnus -- Magnus Persson Berlin, Germany ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
Hi Issac, You are welcome, and I am happy there is finally a clearer of implementing RAVE out there. I believe I should have done it much earlier, sorry for that, but "better late than never", no? :) Best, Sylvain 2009/1/17 Isaac Deutsch > > Hi, > > > > Sorry for the slow reply. > > Hi > > I'd prefer quality over speed anytime. ;) > Your pseudo code is excellent and helped me to understand some of the > trickier things. Thanks again! > I think I will now be able to implement my own version. :) > > Regards, > Isaac > -- > Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: > http://www.gmx.net/de/go/multimessenger > ___ > computer-go mailing list > computer-go@computer-go.org > http://www.computer-go.org/mailman/listinfo/computer-go/ > ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
Hi Mark, For the first difference you mention, as far as I remember it makes a small but significant difference and is one of the main reason I was talking about "tricky details". For the second difference, I also don't want to "count a move if the opposite color played on that point first", and, unless I am mistaken, I indeed don't count them in the part of the playout which is outside the tree. However you are right for the part inside the tree, due to the "j+=2". I am a little confused and now believe it should not be a j+=2 but a j++, updating the "already_played" map for each move inside the tree during the backup, but doing the backup only once every too moves (to match the color). It may be a bug in what I proposed, I cannot test it anymore. However, what I am sure about is that this "j+=2" is what I was using and gave very good results. If what you point out is indeed a bug and makes a difference, then it can only be even better :). I prefer not modifying the pseudo code I posted until someone verifies it becomes better. Thanks, Sylvain 2009/1/17 Mark Boon > Thanks for taking the time Sylvain. I took a quick look to see how your > pseudo-code compared to the reference implementation I made. There are a few > small differences, and I think the place(s) where I deviated is exactly what > confused Daniel Waeber. > > The first difference is that when I check whether a point has been played, > I always start from the position of the root-node, whereas you start from > the position of the node where the moves_played_in_tree is played. The > second difference is I also don't count a move if the opposite color played > on that point first. The result is I only need to compute the alreadyPlayed > map once (in my code these are called weightMap and colorMap, I need two > maps because I use decreasing weights) instead of computing it at each step > back up in the tree. > > The only time these 'deviations' make a difference is in case of ko and > ishi-no-shita. When I have a little spare time I'll check to see if this > actually makes a difference in playing strength. If there's any noticeable > difference I'll try to modify the code in my reference implementation to > reflect the 'correct' method. > > Mark > > > > On Jan 17, 2009, at 11:48 AM, Sylvain Gelly wrote: > > Hi, >> >> Sorry for the slow reply. >> After those discussions, I figured out that pseudo code was the >> fastest clear and not ambiguous way to describe how to precisely >> implement the algorithm. I needed to find some time to write it. >> Note that I did not write only the backup phase because to clearly >> describe the backup you have to see the playouts as well. I also >> describe the choice of the best move. >> Note also the fact that the backup from the moves in the tree and from >> the moves out of the tree is different. That is the somehow more >> tricky part to check that a move has not been already played (that is >> different for each node in the tree and we obviously don't want to >> keep a vector of already played moves for each node). >> >> Please forgive the mistakes that are in that code, of course I did not >> make any test, and it has been quite a long time I am not in the >> computer go trip anymore :). Please correct any mistake, >> I hope it makes things clearer. >> >> Best, >> Sylvain >> >> class AmafBoard { >> AmafBoard() { >> offset = 0 >> fill(alread_played, 0) >> } >> Clear() { >> offset++; >> } >> Play(move) { >> already_played[move] = offset >> } >> AlreadyPlayed(move) { >> return already_played[move] == offset >> } >> vector already_played >> int offset >> } >> >> ChooseMove(node, board) { >> bias = 0.015 // I put a random number here, to be tuned >> b = bias * bias / 0.25 >> best_value = -1 >> best_move = PASSMOVE >> for (move in board.allmoves) { >> c = node.child(move).counts >> w = node.child(move).wins >> rc = node.rave_counts[move] >> rw = node.rave_wins[move] >> coefficient = 1 - rc * (rc + c + rc * c * b) >> value = w / c * coef + rw / rc * (1 - coef) // please here take >> care of the c==0 and rc == 0 cases >> if (value > best_value) { >> best_value = value >> best_move = move >> } >> } >> return best_move >> } >> PlayoutInTree(board, moves_played_in_tree, nodes_seen_in_tree) { >> node = tree.root >> while (node) { >> move = ChooseMove(node, board) >> moves_played_in_tree.append(move) >> nodes_seen_in_tree.append(node) >> board.PlayMove(move) >> node = node.child(move) >> } >> } >> PlayoutOutTree(board, AmafBoard played, moves) { >> int pass = 0 >> while (pass < 2) { >> move = board.ChooseMoveAndPlay() >> if (move == PASSMOVE) { >> pass ++ >> continue >> } else { >> pass = 0 >> } >> if (!played.AlreadyPlayed(move)) { >> if (!board.last_move_was_black()) { >> move = -move >> } >> moves.append(move) >> } >> } >> return board.black_wins() >> } >> >> BackupNode(node, index, black_wins, moves_pla
Re: [computer-go] How to "properly" implement RAVE?
A small point: in "PlayoutOutTree", just after "if (!played.AlreadyPlayed(move)) {", there should have a "played.Play(move)". I believe it does not change the final result (as the check is also done in the backup, and the move played in the backup), but I simply forgot that line (that should make moves_played_out_tree smaller). To avoid confusion, I repost the pseudo code with that correction (and hoping the indentation is not broken by the email editor once again). Sylvain class AmafBoard { AmafBoard() { offset = 0 fill(alread_played, 0) } Clear() { offset++; } Play(move) { already_played[move] = offset } AlreadyPlayed(move) { return already_played[move] == offset } vector already_played int offset } ChooseMove(node, board) { bias = 0.015 // I put a random number here, to be tuned b = bias * bias / 0.25 best_value = -1 best_move = PASSMOVE for (move in board.allmoves) { c = node.child(move).counts w = node.child(move).wins rc = node.rave_counts[move] rw = node.rave_wins[move] coefficient = 1 - rc * (rc + c + rc * c * b) value = w / c * coef + rw / rc * (1 - coef) // please here take care of the c==0 and rc == 0 cases if (value > best_value) { best_value = value best_move = move } } return best_move } PlayoutInTree(board, moves_played_in_tree, nodes_seen_in_tree) { node = tree.root while (node) { move = ChooseMove(node, board) moves_played_in_tree.append(move) nodes_seen_in_tree.append(node) board.PlayMove(move) node = node.child(move) } } PlayoutOutTree(board, AmafBoard played, moves) { int pass = 0 while (pass < 2) { move = board.ChooseMoveAndPlay() if (move == PASSMOVE) { pass ++ continue } else { pass = 0 } if (!played.AlreadyPlayed(move)) { played.Play(move) if (!board.last_move_was_black()) { move = -move } moves.append(move) } } return board.black_wins() } BackupNode(node, index, black_wins, moves_played_in_tree, moves_played_out_tree, already_played) { already_played.Clear() win = node.is_black_to_play ? black_wins : !black_wins // normal backup node.wins += win node.counts ++ // rave backup for (j = index; j < moves_played_in_tree.size(); j += 2) { move = moves_played_in_tree[j] if (already_played.AlreadyPlayed(move)) continue already_played.Play(move) node.rave_wins[move] += win node.rave_counts[move] ++ } for (j = 0; j < moves_played_out_tree.size(); ++j) { move = moves_played_out_tree[j] if (!node.is_black_to_play) move = -move // If it is either not the right color or the intersection is // already played we ignore that move for that node if (move < 0 || already_played.AlreadyPlayed(move)) continue already_played.Play(move) node.rave_wins[move] += win node.rave_counts[move] ++ } } Backup(black_wins, nodes_seen_in_tree, moves_played_in_tree, moves_played_out_tree, already_played) { for (i = 0; i < nodes_seen_in_tree.size(); ++i) { node = nodes_seen_in_tree[i] BackupNode(node, i, black_wins, moves_played_in_tree, moves_played_out_tree, already_played) } } OneIteration(board_position, AmafBoard already_played) { board = board_position.Copy() already_played.Clear() vector moves_played_in_tree vector nodes_seen_in_tree PlayoutInTree(&board, &moves_played_in_tree, &nodes_seen_in_tree) vector moves_played_out_tree bool black_wins = PlayoutOutTree(&board, &already_played, &moves_played_out_tree) Backup(black_wins, nodes_seen_in_tree, moves_played_in_tree, moves_played_out_tree, &already_played) } 2009/1/17 Sylvain Gelly : > Hi, > > Sorry for the slow reply. > After those discussions, I figured out that pseudo code was the > fastest clear and not ambiguous way to describe how to precisely > implement the algorithm. I needed to find some time to write it. > Note that I did not write only the backup phase because to clearly > describe the backup you have to see the playouts as well. I also > describe the choice of the best move. > Note also the fact that the backup from the moves in the tree and from > the moves out of the tree is different. That is the somehow more > tricky part to check that a move has not been already played (that is > different for each node in the tree and we obviously don't want to > keep a vector of already played moves for each node). > > Please forgive the mistakes that are in that code, of course I did not > make any test, and it has been quite a long time I am not in the > computer go trip anymore :). Please correct any mistake, > I hope it makes things clearer. > > Best, > Sylvain > > class AmafBoard { > AmafBoard() { >offset = 0 >fill(alread_played, 0) > } > Clear() { >offset++; > } > Play(move) { >already_play
Re: [computer-go] How to "properly" implement RAVE?
Thanks for taking the time Sylvain. I took a quick look to see how your pseudo-code compared to the reference implementation I made. There are a few small differences, and I think the place(s) where I deviated is exactly what confused Daniel Waeber. The first difference is that when I check whether a point has been played, I always start from the position of the root-node, whereas you start from the position of the node where the moves_played_in_tree is played. The second difference is I also don't count a move if the opposite color played on that point first. The result is I only need to compute the alreadyPlayed map once (in my code these are called weightMap and colorMap, I need two maps because I use decreasing weights) instead of computing it at each step back up in the tree. The only time these 'deviations' make a difference is in case of ko and ishi-no-shita. When I have a little spare time I'll check to see if this actually makes a difference in playing strength. If there's any noticeable difference I'll try to modify the code in my reference implementation to reflect the 'correct' method. Mark On Jan 17, 2009, at 11:48 AM, Sylvain Gelly wrote: Hi, Sorry for the slow reply. After those discussions, I figured out that pseudo code was the fastest clear and not ambiguous way to describe how to precisely implement the algorithm. I needed to find some time to write it. Note that I did not write only the backup phase because to clearly describe the backup you have to see the playouts as well. I also describe the choice of the best move. Note also the fact that the backup from the moves in the tree and from the moves out of the tree is different. That is the somehow more tricky part to check that a move has not been already played (that is different for each node in the tree and we obviously don't want to keep a vector of already played moves for each node). Please forgive the mistakes that are in that code, of course I did not make any test, and it has been quite a long time I am not in the computer go trip anymore :). Please correct any mistake, I hope it makes things clearer. Best, Sylvain class AmafBoard { AmafBoard() { offset = 0 fill(alread_played, 0) } Clear() { offset++; } Play(move) { already_played[move] = offset } AlreadyPlayed(move) { return already_played[move] == offset } vector already_played int offset } ChooseMove(node, board) { bias = 0.015 // I put a random number here, to be tuned b = bias * bias / 0.25 best_value = -1 best_move = PASSMOVE for (move in board.allmoves) { c = node.child(move).counts w = node.child(move).wins rc = node.rave_counts[move] rw = node.rave_wins[move] coefficient = 1 - rc * (rc + c + rc * c * b) value = w / c * coef + rw / rc * (1 - coef) // please here take care of the c==0 and rc == 0 cases if (value > best_value) { best_value = value best_move = move } } return best_move } PlayoutInTree(board, moves_played_in_tree, nodes_seen_in_tree) { node = tree.root while (node) { move = ChooseMove(node, board) moves_played_in_tree.append(move) nodes_seen_in_tree.append(node) board.PlayMove(move) node = node.child(move) } } PlayoutOutTree(board, AmafBoard played, moves) { int pass = 0 while (pass < 2) { move = board.ChooseMoveAndPlay() if (move == PASSMOVE) { pass ++ continue } else { pass = 0 } if (!played.AlreadyPlayed(move)) { if (!board.last_move_was_black()) { move = -move } moves.append(move) } } return board.black_wins() } BackupNode(node, index, black_wins, moves_played_in_tree, moves_played_out_tree, already_played) { already_played.Clear() win = node.is_black_to_play ? black_wins : !black_wins // normal backup node.wins += win node.counts ++ // rave backup for (j = index; j < moves_played_in_tree.size(); j += 2) { move = moves_played_in_tree[j] if (already_played.AlreadyPlayed(move)) continue already_played.Play(move) node.rave_wins[move] += win node.rave_counts[move] ++ } for (j = 0; j < moves_played_out_tree.size(); ++j) { move = moves_played_out_tree[j] if (!node.is_black_to_play) move = -move // If it is either not the right color or the intersection is already played we ignore that move for that node if (move < 0 || already_played.AlreadyPlayed(move)) continue already_played.Play(move) node.rave_wins[move] += win node.rave_counts[move] ++ } } Backup(black_wins, nodes_seen_in_tree, moves_played_in_tree, moves_played_out_tree, already_played) { for (i = 0; i < nodes_seen_in_tree.size(); ++i) { node = nodes_seen_in_tree[i] BackupNode(node, i, black_wins, moves_played_in_tree, moves_played_out_tree, already_played) } } OneIteration(board_position, AmafBoard already_played) { board = board_position.Copy() already_played.Clear() vector moves_played_in_tree vector nodes_seen_in_tree PlayoutInTree(&board, &moves_playe
Re: [computer-go] How to "properly" implement RAVE?
> Hi, > > Sorry for the slow reply. Hi I'd prefer quality over speed anytime. ;) Your pseudo code is excellent and helped me to understand some of the trickier things. Thanks again! I think I will now be able to implement my own version. :) Regards, Isaac -- Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: http://www.gmx.net/de/go/multimessenger ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
Hi, Sorry for the slow reply. After those discussions, I figured out that pseudo code was the fastest clear and not ambiguous way to describe how to precisely implement the algorithm. I needed to find some time to write it. Note that I did not write only the backup phase because to clearly describe the backup you have to see the playouts as well. I also describe the choice of the best move. Note also the fact that the backup from the moves in the tree and from the moves out of the tree is different. That is the somehow more tricky part to check that a move has not been already played (that is different for each node in the tree and we obviously don't want to keep a vector of already played moves for each node). Please forgive the mistakes that are in that code, of course I did not make any test, and it has been quite a long time I am not in the computer go trip anymore :). Please correct any mistake, I hope it makes things clearer. Best, Sylvain class AmafBoard { AmafBoard() { offset = 0 fill(alread_played, 0) } Clear() { offset++; } Play(move) { already_played[move] = offset } AlreadyPlayed(move) { return already_played[move] == offset } vector already_played int offset } ChooseMove(node, board) { bias = 0.015 // I put a random number here, to be tuned b = bias * bias / 0.25 best_value = -1 best_move = PASSMOVE for (move in board.allmoves) { c = node.child(move).counts w = node.child(move).wins rc = node.rave_counts[move] rw = node.rave_wins[move] coefficient = 1 - rc * (rc + c + rc * c * b) value = w / c * coef + rw / rc * (1 - coef) // please here take care of the c==0 and rc == 0 cases if (value > best_value) { best_value = value best_move = move } } return best_move } PlayoutInTree(board, moves_played_in_tree, nodes_seen_in_tree) { node = tree.root while (node) { move = ChooseMove(node, board) moves_played_in_tree.append(move) nodes_seen_in_tree.append(node) board.PlayMove(move) node = node.child(move) } } PlayoutOutTree(board, AmafBoard played, moves) { int pass = 0 while (pass < 2) { move = board.ChooseMoveAndPlay() if (move == PASSMOVE) { pass ++ continue } else { pass = 0 } if (!played.AlreadyPlayed(move)) { if (!board.last_move_was_black()) { move = -move } moves.append(move) } } return board.black_wins() } BackupNode(node, index, black_wins, moves_played_in_tree, moves_played_out_tree, already_played) { already_played.Clear() win = node.is_black_to_play ? black_wins : !black_wins // normal backup node.wins += win node.counts ++ // rave backup for (j = index; j < moves_played_in_tree.size(); j += 2) { move = moves_played_in_tree[j] if (already_played.AlreadyPlayed(move)) continue already_played.Play(move) node.rave_wins[move] += win node.rave_counts[move] ++ } for (j = 0; j < moves_played_out_tree.size(); ++j) { move = moves_played_out_tree[j] if (!node.is_black_to_play) move = -move // If it is either not the right color or the intersection is already played we ignore that move for that node if (move < 0 || already_played.AlreadyPlayed(move)) continue already_played.Play(move) node.rave_wins[move] += win node.rave_counts[move] ++ } } Backup(black_wins, nodes_seen_in_tree, moves_played_in_tree, moves_played_out_tree, already_played) { for (i = 0; i < nodes_seen_in_tree.size(); ++i) { node = nodes_seen_in_tree[i] BackupNode(node, i, black_wins, moves_played_in_tree, moves_played_out_tree, already_played) } } OneIteration(board_position, AmafBoard already_played) { board = board_position.Copy() already_played.Clear() vector moves_played_in_tree vector nodes_seen_in_tree PlayoutInTree(&board, &moves_played_in_tree, &nodes_seen_in_tree) vector moves_played_out_tree bool black_wins = PlayoutOutTree(&board, &already_played, &moves_played_out_tree) Backup(black_wins, nodes_seen_in_tree, moves_played_in_tree, moves_played_out_tree, &already_played) } ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
On 12:53 Thu 15 Jan , Mark Boon wrote: > > On Jan 15, 2009, at 12:33 PM, Daniel Waeber wrote: > > > yes, but the weight/color maps stay the same for all updated nodes. > > > > I think the first playoutNode (the one most deep inside the tree) only > > should get amaf values for the random playout, the next one form > > random > > playout + from the first playoutNode ... and the root node amaf values > > form all the nodes. > > OK, I think I see now what you're trying to say. This is something I > did think about. I hope my memory serves me well enough to say why I > didn't do it that way. > > - What you propose adds complexity and possibly computation (if it > means recalculating or adjusting the weight map). > - I don't think it makes all that much difference. > > The reason I don't think it makes much difference is that adding AMAF > values for the moves above (closer to the root) the playoutNode is > that most likely those points are now occupied. Since the AMAF values > are used to compute which empty points are the best next candidate, > the AMAF values at occupied points are immaterial. They are not even > added. So it only makes a difference in cases where played stones get > captured and those points then are occupied again. This brings us back > to the issue discussed earlier about ko and ishi-no-shita. Yes, I don't know if it would be a big improvement. It is not a real speed decrease, as you just move the point were you add the additional values to the maps. Perhaps I'll run some tests when I have the time. > Mark > > P.S. what do I need to open that file? Is it a SVN patch? it's standard diff format, outputed by git. You can view it with a text editor and apply it with unix "patch" commandline tool or use eclipse->team->apply-patch. it's done agains svn-root, so you have to ignore 2 leading pathnames if you apply it inside the refbot folder. regards wabu ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
On Jan 15, 2009, at 12:33 PM, Daniel Waeber wrote: yes, but the weight/color maps stay the same for all updated nodes. I think the first playoutNode (the one most deep inside the tree) only should get amaf values for the random playout, the next one form random playout + from the first playoutNode ... and the root node amaf values form all the nodes. OK, I think I see now what you're trying to say. This is something I did think about. I hope my memory serves me well enough to say why I didn't do it that way. - What you propose adds complexity and possibly computation (if it means recalculating or adjusting the weight map). - I don't think it makes all that much difference. The reason I don't think it makes much difference is that adding AMAF values for the moves above (closer to the root) the playoutNode is that most likely those points are now occupied. Since the AMAF values are used to compute which empty points are the best next candidate, the AMAF values at occupied points are immaterial. They are not even added. So it only makes a difference in cases where played stones get captured and those points then are occupied again. This brings us back to the issue discussed earlier about ko and ishi-no-shita. Mark P.S. what do I need to open that file? Is it a SVN patch? ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
Hi, On 11:24 Thu 15 Jan , Mark Boon wrote: > > On Jan 15, 2009, at 10:47 AM, Daniel Waeber wrote: > > > Thanks you. I think that I understand it now :) > > > > On 23:21 Wed 14 Jan , Mark Boon wrote: > >> You have to understand that the 'start' variable really starts at the > >> root from the position for which we do the search. So all the moves > >> 'below' the playoutNode are also taken into account. The check in the > >> earlier part "if (_colorMap[moveXY]==0)" makes sure there's no move > >> between the playoutNode and the n.move as you call it. > > > > Ah, yes, I did not get the meaning of start right. But still I think > > you > > have to incrementally add values to the maps as you go down the tree. > > But it does that. This happens when playoutNode is set to its parent > and the AMAF values are added again (now for the other color) until > the root is reached. yes, but the weight/color maps stay the same for all updated nodes. I think the first playoutNode (the one most deep inside the tree) only should get amaf values for the random playout, the next one form random playout + from the first playoutNode ... and the root node amaf values form all the nodes. I attached code for what I would do, but had no time to test it. > > I think there's a problem with caputres/ko-fights inside the tree. > > All nodes after the capture should get the amaf color/value for the > > stones put there after the capture and not the value of the stones > > that > > were put there and captured. > > > > Most likely I missed a little piece of code again, but hey, perhaps > > its > > a real bug. > > I did stop to think about ko and 'ishi no shita' (something like > 'playing under the stones') a little bit but couldn't come to an > immediate conclusion what would be the best thing to do. So I decided > to leave it as it is. You didn't miss any little piece of code there. > Maybe there's room for improvement in case of ko, but my guess is the > difference will be minimal at best. If you find it does make a big > difference everyone here will be delighted to hear it. > > Given how it's performing I doubt there are major problems with my > AMAF-RAVE implementation. regards wabu > > Mark > diff --git a/TesujiRefBot/source/tesuji/games/go/reference/search/MonteCarloTreeSearch.java b/TesujiRefBot/source/tesuji/games/go/reference/search/MonteCarloTreeSearch.java index 5e47e00..ee650e1 100644 --- a/TesujiRefBot/source/tesuji/games/go/reference/search/MonteCarloTreeSearch.java +++ b/TesujiRefBot/source/tesuji/games/go/reference/search/MonteCarloTreeSearch.java @@ -566,6 +566,7 @@ public class MonteCarloTreeSearch TreeNode> node = getNodeToExpand(_rootNode, _searchAdministration); TreeNode> playoutNode = node; + int start = _searchAdministration.getMoveStack().getSize(); // begin at playoutNode if (node != null) { @@ -580,7 +581,6 @@ public class MonteCarloTreeSearch { IntStack playoutMoves = _searchAdministration.getMoveStack(); byte color = _monteCarloAdministration.getColorToMove(); - int start = _monteCarloAdministration.getMoveStack().getSize(); int end = playoutMoves.getSize(); double weight = 1.0; double weightDelta = 1.0 / (end - start + 1); // Michael Williams' idea to use decreasing weights @@ -613,6 +613,13 @@ public class MonteCarloTreeSearch if (_colorMap[xy]==color) result.increaseVirtualPlayouts(_weightMap[xy]*score,_weightMap[xy]); } + // add the move to the maps now + GoMove move = (GoMove) playoutNode.getContent().getMove(); + int xy = move.getXY(); + byte col = move.getColor(); + _colorMap[xy] = col; + _weightMap[xy] = 1.0; //TODO: handle weight + playoutNode = playoutNode.getParent(); } } ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
On Jan 15, 2009, at 10:47 AM, Daniel Waeber wrote: Thanks you. I think that I understand it now :) On 23:21 Wed 14 Jan , Mark Boon wrote: You have to understand that the 'start' variable really starts at the root from the position for which we do the search. So all the moves 'below' the playoutNode are also taken into account. The check in the earlier part "if (_colorMap[moveXY]==0)" makes sure there's no move between the playoutNode and the n.move as you call it. Ah, yes, I did not get the meaning of start right. But still I think you have to incrementally add values to the maps as you go down the tree. But it does that. This happens when playoutNode is set to its parent and the AMAF values are added again (now for the other color) until the root is reached. I think there's a problem with caputres/ko-fights inside the tree. All nodes after the capture should get the amaf color/value for the stones put there after the capture and not the value of the stones that were put there and captured. Most likely I missed a little piece of code again, but hey, perhaps its a real bug. I did stop to think about ko and 'ishi no shita' (something like 'playing under the stones') a little bit but couldn't come to an immediate conclusion what would be the best thing to do. So I decided to leave it as it is. You didn't miss any little piece of code there. Maybe there's room for improvement in case of ko, but my guess is the difference will be minimal at best. If you find it does make a big difference everyone here will be delighted to hear it. Given how it's performing I doubt there are major problems with my AMAF-RAVE implementation. Mark ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
Thanks you. I think that I understand it now :) On 23:21 Wed 14 Jan , Mark Boon wrote: > You have to understand that the 'start' variable really starts at the > root from the position for which we do the search. So all the moves > 'below' the playoutNode are also taken into account. The check in the > earlier part "if (_colorMap[moveXY]==0)" makes sure there's no move > between the playoutNode and the n.move as you call it. Ah, yes, I did not get the meaning of start right. But still I think you have to incrementally add values to the maps as you go down the tree. I think there's a problem with caputres/ko-fights inside the tree. All nodes after the capture should get the amaf color/value for the stones put there after the capture and not the value of the stones that were put there and captured. Most likely I missed a little piece of code again, but hey, perhaps its a real bug. > That is, if I understand you correctly. Because I'm not quite sure > what you mean by 'finding all these nodes n'. This is not a sub-set of > RAVE as I understand it. What you see in the code is the accumulation > of the AMAF values after each playout. The RAVE value is calculated > using these values when comparing move-candidates, which is in an > altogether different place. (The MonteCarloTreeSearchResult class in > my project). > > I'm afraid I haven't made things much clearer for you. The problem is that I was confused, but now, as I understand it, all your and all the other comments suddenly make sense ;) > Mark Thank again, wabu ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
On Jan 14, 2009, at 10:55 PM, Daniel Waeber wrote: Accessing the AMAF values depends on your implementation. The following is a code-snippet from my MCTS reference implementation that updates the AMAF values in the tree: if (_useAMAF) { IntStack playoutMoves = _searchAdministration.getMoveStack(); byte color = _monteCarloAdministration.getColorToMove(); int start = _monteCarloAdministration.getMoveStack().getSize(); int end = playoutMoves.getSize(); double weight = 1.0; double weightDelta = 1.0 / (end - start + 1); // Michael Williams' idea to use decreasing weights GoArray.clear(_weightMap); GoArray.clear(_colorMap); for (int i=start; i until here it's clear to me. OK, I hope so. Until here it's pretty much the same as in the original AMAF ref-bot without tree-search as defined by Don. while (playoutNode!=null) { color = opposite(playoutNode.getContent().getMove().getColor()); boolean playerWins = (blackWins && color==BLACK) || (!blackWins && color==WHITE); double score = playerWins ? MonteCarloTreeSearchResult.MAX_SCORE : MonteCarloTreeSearchResult.MIN_SCORE; for (int i=0; i TreeNode> nextNode = playoutNode.getChildAt(i); MonteCarloTreeSearchResult result = nextNode.getContent(); GoMove move = (GoMove) result.getMove(); int xy = move.getXY(); if (_colorMap[xy]==color) result.increaseVirtualPlayouts(_weightMap[xy]*score,_weightMap[xy]); if i understand this code correctly, it updates the amaf vaules of all direct children of the playoutNode according to the weight/color maps. Yes. And that update is done for all nodes on the selected path. Yes, until the root, which is the starting position from which the search is performed. } playoutNode = playoutNode.getParent(); First of all, I miss an weight/colorMap update for xy here. Souldn't the move of the current playoutNode be considered as an amaf move for all the nodes below this one? This is in fact done in this code, see further remarks below. } } But, most of all, I miss that the code only updates the amaf values of all direct children, and not of all nodes n below the playoutNode, where there is no play at n.move on the path between n and the playoutNode. Finding all these nodes n would be a costy thing to do, but wouldn't that be the "right" thing to do? Implementing a realistic subset of RAVE is another story, but first of all I want to understand the pure concept of RAVE. You have to understand that the 'start' variable really starts at the root from the position for which we do the search. So all the moves 'below' the playoutNode are also taken into account. The check in the earlier part "if (_colorMap[moveXY]==0)" makes sure there's no move between the playoutNode and the n.move as you call it. That is, if I understand you correctly. Because I'm not quite sure what you mean by 'finding all these nodes n'. This is not a sub-set of RAVE as I understand it. What you see in the code is the accumulation of the AMAF values after each playout. The RAVE value is calculated using these values when comparing move-candidates, which is in an altogether different place. (The MonteCarloTreeSearchResult class in my project). I'm afraid I haven't made things much clearer for you. Mark ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
Ok, I still have same questions about the refbot code. On 10:29 Wed 14 Jan , Mark Boon wrote: > > On Jan 14, 2009, at 9:36 AM, Daniel Waeber wrote: > > > I have a question about the the part were the stats are updated. > > (l.15-25). having an array of amaf-values in every node seems very > > memory > > intensive and I don't get how you would access these values. > > You are right, it is memory intensive. I believe this is one of the > reasons that most implementations wait a certain number of playouts > before creating the next level of nodes. > > form http://pastie.org/pastes/357231 : > > node[visitedNode[i]].AMAFSum[visitedPos[j]]+=result; > > node[visitedNode[i]].AMAFPlayed[visitedPos[j]]+=1; I had some problems with these lines. looks to me like the nodes have an array of boardsize*boardsize inside *all* nodes, increasing the memory of the tree by a factor >10. But, correct me if I'm wrong, the refbot code just adds two simple values to the node for rave. > Accessing the AMAF values depends on your implementation. The > following is a code-snippet from my MCTS reference implementation that > updates the AMAF values in the tree: > > if (_useAMAF) > { > IntStack playoutMoves = _searchAdministration.getMoveStack(); > byte color = _monteCarloAdministration.getColorToMove(); > int start = _monteCarloAdministration.getMoveStack().getSize(); > int end = playoutMoves.getSize(); > double weight = 1.0; > double weightDelta = 1.0 / (end - start + 1); // Michael Williams' idea > to use decreasing weights > GoArray.clear(_weightMap); > GoArray.clear(_colorMap); > for (int i=start; i { > int moveXY = playoutMoves.get(i); > if (_colorMap[moveXY]==0) > { > _colorMap[moveXY] = color; > _weightMap[moveXY] = weight; > } > weight -= weightDelta; > color = opposite(color); > } until here it's clear to me. > while (playoutNode!=null) > { > color = opposite(playoutNode.getContent().getMove().getColor()); > boolean playerWins = (blackWins && color==BLACK) || (!blackWins > && color==WHITE); > double score = playerWins ? > MonteCarloTreeSearchResult.MAX_SCORE : MonteCarloTreeSearchResult.MIN_SCORE; > for (int i=0; i { > TreeNode> nextNode > = playoutNode.getChildAt(i); > MonteCarloTreeSearchResult result = > nextNode.getContent(); > GoMove move = (GoMove) result.getMove(); > int xy = move.getXY(); > if (_colorMap[xy]==color) > > result.increaseVirtualPlayouts(_weightMap[xy]*score,_weightMap[xy]); if i understand this code correctly, it updates the amaf vaules of all direct children of the playoutNode according to the weight/color maps. And that update is done for all nodes on the selected path. > } > playoutNode = playoutNode.getParent(); First of all, I miss an weight/colorMap update for xy here. Souldn't the move of the current playoutNode be considered as an amaf move for all the nodes below this one? > } > } But, most of all, I miss that the code only updates the amaf values of all direct children, and not of all nodes n below the playoutNode, where there is no play at n.move on the path between n and the playoutNode. Finding all these nodes n would be a costy thing to do, but wouldn't that be the "right" thing to do? Implementing a realistic subset of RAVE is another story, but first of all I want to understand the pure concept of RAVE. > playoutNode is the move-node from which the playout was done. The amaf > values are stored in its children by the increaseVirtualPlayout() > method. Note that it goes up the tree by assigning the parent to > playoutNode until it gets to the root. > > For more context it would be better to lookup the whole source at > http://plug-and-go.dev.java.net > If you think some more comments in the code could clarify things > better I'm open to suggestions. Thanks for the code, didn't know amaf already is inside the mcts refbot. > > Good luck. > > Mark Regards, wabu ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
> Hi, > > I'm also interested at the same thing. Glad I'm not alone. ;) > > It sounds good but it seems to lack the check of whether a given move is > > first played in a given intersection. When you add that, it because a > little > > more tricky to update in the tree. I see, I missed that. I actually prune second (third, fourth...) moves at the same intersection in the playouts. However, this means that I can't just count every 2nd move as a move for the given node color, so I'd have to change the update loop. >>You also update the value of each > move > > independently of the color, i.e. a position in which it is black turn > will > > be update with white moves. You should restrict the update. This I don't really understand. Are you talking about the issue raised by basically doing j+=2 in the loop instead of checking whether the move color is the same as the node color? If yes, I understand. > I have a question about the the part were the stats are updated. > (l.15-25). having an array of amaf-values in every node seems very memory > intensive and I don't get how you would access these values. > > Something like searching for all nodes below the visited node with the > same > pos+color as the ones visited and updating amaf-stats directly in these > nodes > would make more sense to me. (but also costs more cpu time) I have been thinking *exactly* the same thing. When you keep the values in the node, it's easiest to keep boardsize*boardsize values to be able to access AMAF values directly by the childrens' positions. This wastes a bit of memory, but not too much. Keeping the value in the children is more cpu expensive but saves memory. For me, the first way seems easier to implement. How would you exactly do "searching for all nodes below the visited node with the same pos+color as the ones visited"? I think by the "visited node" you mean the first move played after root, is that correct? On a side note, I found that using plain AMAF for MCTS (so a single AMAF table for all nodes in the tree) actually made my program play about 100 elo *worse*! Clearly, it's necessary to implement proper RAVE to get a benefit. Thanks again, Isaac -- Sensationsangebot verlängert: GMX FreeDSL - Telefonanschluss + DSL für nur 16,37 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K1308T4569a ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
On Jan 14, 2009, at 9:36 AM, Daniel Waeber wrote: I have a question about the the part were the stats are updated. (l.15-25). having an array of amaf-values in every node seems very memory intensive and I don't get how you would access these values. You are right, it is memory intensive. I believe this is one of the reasons that most implementations wait a certain number of playouts before creating the next level of nodes. Accessing the AMAF values depends on your implementation. The following is a code-snippet from my MCTS reference implementation that updates the AMAF values in the tree: if (_useAMAF) { IntStack playoutMoves = _searchAdministration.getMoveStack(); byte color = _monteCarloAdministration.getColorToMove(); int start = _monteCarloAdministration.getMoveStack().getSize(); int end = playoutMoves.getSize(); double weight = 1.0; double weightDelta = 1.0 / (end - start + 1); // Michael Williams' idea to use decreasing weights GoArray.clear(_weightMap); GoArray.clear(_colorMap); for (int i=start; i boolean playerWins = (blackWins && color==BLACK) || (!blackWins && color==WHITE); double score = playerWins ? MonteCarloTreeSearchResult.MAX_SCORE : MonteCarloTreeSearchResult.MIN_SCORE; for (int i=0; i TreeNode> nextNode = playoutNode.getChildAt(i); MonteCarloTreeSearchResult result = nextNode.getContent(); GoMove move = (GoMove) result.getMove(); int xy = move.getXY(); if (_colorMap[xy]==color) result.increaseVirtualPlayouts(_weightMap[xy]*score,_weightMap[xy]); } playoutNode = playoutNode.getParent(); } } playoutNode is the move-node from which the playout was done. The amaf values are stored in its children by the increaseVirtualPlayout() method. Note that it goes up the tree by assigning the parent to playoutNode until it gets to the root. For more context it would be better to lookup the whole source at http://plug-and-go.dev.java.net If you think some more comments in the code could clarify things better I'm open to suggestions. Good luck. Mark ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
Hi, I'm also interested at the same thing. > > > > > I tried putting this into pseudo code, but it already looks like C. ;p > > http://pastie.org/357231 > > If you could look at it, I would be most grateful. > > It sounds good but it seems to lack the check of whether a given move is > first played in a given intersection. When you add that, it because a little > more tricky to update in the tree. You also update the value of each move > independently of the color, i.e. a position in which it is black turn will > be update with white moves. You should restrict the update. I have a question about the the part were the stats are updated. (l.15-25). having an array of amaf-values in every node seems very memory intensive and I don't get how you would access these values. Something like searching for all nodes below the visited node with the same pos+color as the ones visited and updating amaf-stats directly in these nodes would make more sense to me. (but also costs more cpu time) Hope I don't bring in more confusion into the topic, but I really would like to get rave done right. > Hopes that helps, > Sylvain > > > > > > > > -Isaac > > thanks for you help wabu ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
2009/1/10 Isaac Deutsch > Hi Sylvain, > > I think it's starting to make sense now. :-) > > > > Sorry to be unclear. I wish we have a white board where we could discuss > > and > > that would sorted out in a few minutes :). > > Several results turn up in a google search ;p > http://www.google.com/search?q=online+white+board > > > > What I tried to mean is that when you do the backup for a given node, you > > look at the part of the playout that happen after that node (including > > that > > node), and you do a normal AMAF backup for that part of the playout. > > Does it make sense? > > I hope we make progress and I am not making things more confusing :). > > I should write a pseudo code I guess, but for today I am too lazy :). > > > > Sylvain > > I tried putting this into pseudo code, but it already looks like C. ;p > http://pastie.org/357231 > If you could look at it, I would be most grateful. It sounds good but it seems to lack the check of whether a given move is first played in a given intersection. When you add that, it because a little more tricky to update in the tree. You also update the value of each move independently of the color, i.e. a position in which it is black turn will be update with white moves. You should restrict the update. Hopes that helps, Sylvain > > > -Isaac > > -- > Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: > http://www.gmx.net/de/go/multimessenger > ___ > computer-go mailing list > computer-go@computer-go.org > http://www.computer-go.org/mailman/listinfo/computer-go/ > ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
Hi Sylvain, I think it's starting to make sense now. :-) > Sorry to be unclear. I wish we have a white board where we could discuss > and > that would sorted out in a few minutes :). Several results turn up in a google search ;p http://www.google.com/search?q=online+white+board > What I tried to mean is that when you do the backup for a given node, you > look at the part of the playout that happen after that node (including > that > node), and you do a normal AMAF backup for that part of the playout. > Does it make sense? > I hope we make progress and I am not making things more confusing :). > I should write a pseudo code I guess, but for today I am too lazy :). > > Sylvain I tried putting this into pseudo code, but it already looks like C. ;p http://pastie.org/357231 If you could look at it, I would be most grateful. -Isaac -- Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: http://www.gmx.net/de/go/multimessenger ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
Hi Isaac, 2009/1/9 Isaac Deutsch > Hi Sylvain, > > Thanks for your quick answer. > > > > in a nutshell RAVE is basically AMAF adapted for Monte Carlo Tree Search. > > The original paper describing it is > > http://www.machinelearning.org/proceedings/icml2007/papers/387.pdf and a > > paper for "broader audience" can be found here: > > http://www.lri.fr/~gelly/paper/MoGoNectar.pdf (the picture you posted > > comes > > from that paper). > > Yes, I took a screenshot. Another paper I looked at was > http://www.lri.fr/~teytaud/eg.pdf > > > > There are two important parts in the algorithms: the backup and the use > of > > the RAVE value. The second is the hardest to tune and to make it right. > > The > > proposed way of using the values in the original paper is not optimal > > (while > > already very useful). A much better way (especially in 19x19) has been > > described on that list by David Silver. > > Do you mean the calculation of the factor beta that the RAVE value is > multiplied with? > > > > For the backup (as it is your original question), for each node traversed > > by > > the simulation, back up the values exactly as it would be done in AMAF > > *if* > > the playout began at that node. Note that I call playout the whole > > simulation starting from the root and going to the end of the game. > > I see what you mean with the playout going from the root to the end of the > game. > How do you mean "back up the values ... if the playout began at that node"? > Since every playout starts > at the root (in my program, the root is the (previous) move of the opponent > player), wouldn't that mean > only updating the RAVE statistics for the root? I'm sorry if this question > sounds stupid. Sorry to be unclear. I wish we have a white board where we could discuss and that would sorted out in a few minutes :). In the quote of my sentence you did not put the "as" of the "as if the playout began..." (the "as" and the "if" where separated by a part of the sentence, which did not make things clearer, sorry...) What I tried to mean is that when you do the backup for a given node, you look at the part of the playout that happen after that node (including that node), and you do a normal AMAF backup for that part of the playout. Does it make sense? > > - Count only moves that happen after the node. > > How do you measure if a move is "after" another move? The amount of moves > taken from the root (i.e. the depth of the node in the tree/the playout)? Or > do you mean that the node is effectively a (grand-grand-...) father of the > move, so the playout has visited that node? By "after" I mean after in the sequence. If the playout is E5, A7, C4, D8, by "after" I mean that C4 is after E5, but not after D8. I hope we make progress and I am not making things more confusing :). I should write a pseudo code I guess, but for today I am too lazy :). Sylvain > I hope it will help you write a correct implementation. Don't hesitate to > > ask for precisions. > > I really appreciate your help. > > -ibd > -- > Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: > http://www.gmx.net/de/go/multimessenger > ___ > computer-go mailing list > computer-go@computer-go.org > http://www.computer-go.org/mailman/listinfo/computer-go/ > ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
Hi Sylvain, Thanks for your quick answer. > in a nutshell RAVE is basically AMAF adapted for Monte Carlo Tree Search. > The original paper describing it is > http://www.machinelearning.org/proceedings/icml2007/papers/387.pdf and a > paper for "broader audience" can be found here: > http://www.lri.fr/~gelly/paper/MoGoNectar.pdf (the picture you posted > comes > from that paper). Yes, I took a screenshot. Another paper I looked at was http://www.lri.fr/~teytaud/eg.pdf > There are two important parts in the algorithms: the backup and the use of > the RAVE value. The second is the hardest to tune and to make it right. > The > proposed way of using the values in the original paper is not optimal > (while > already very useful). A much better way (especially in 19x19) has been > described on that list by David Silver. Do you mean the calculation of the factor beta that the RAVE value is multiplied with? > For the backup (as it is your original question), for each node traversed > by > the simulation, back up the values exactly as it would be done in AMAF > *if* > the playout began at that node. Note that I call playout the whole > simulation starting from the root and going to the end of the game. I see what you mean with the playout going from the root to the end of the game. How do you mean "back up the values ... if the playout began at that node"? Since every playout starts at the root (in my program, the root is the (previous) move of the opponent player), wouldn't that mean only updating the RAVE statistics for the root? I'm sorry if this question sounds stupid. > - Count only moves that happen after the node. How do you measure if a move is "after" another move? The amount of moves taken from the root (i.e. the depth of the node in the tree/the playout)? Or do you mean that the node is effectively a (grand-grand-...) father of the move, so the playout has visited that node? > I hope it will help you write a correct implementation. Don't hesitate to > ask for precisions. I really appreciate your help. -ibd -- Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: http://www.gmx.net/de/go/multimessenger ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to "properly" implement RAVE?
Hi Isaac, in a nutshell RAVE is basically AMAF adapted for Monte Carlo Tree Search. The original paper describing it is http://www.machinelearning.org/proceedings/icml2007/papers/387.pdf and a paper for "broader audience" can be found here: http://www.lri.fr/~gelly/paper/MoGoNectar.pdf (the picture you posted comes from that paper). I think your description of RAVE is not completely correct, or at least I don't understand it completely :). If I understand that sentence "only the siblings of the last node in the tree are updated accordingly", then it is wrong. There are two important parts in the algorithms: the backup and the use of the RAVE value. The second is the hardest to tune and to make it right. The proposed way of using the values in the original paper is not optimal (while already very useful). A much better way (especially in 19x19) has been described on that list by David Silver. For the backup (as it is your original question), for each node traversed by the simulation, back up the values exactly as it would be done in AMAF *if* the playout began at that node. Note that I call playout the whole simulation starting from the root and going to the end of the game. Things you have to take care about, for each node, including the root, including the last node in the tree (most of them obvious, but believe it is really easy to forget small details and get something totally useless): - Count only moves that happen after the node. - Count only a move if it is played first on the intersection. Be careful with captures, especially if they occur within the tree. It is really easy to mess up the statistics. - Count only a move if it is the same color of the node (if in the position of the node, black is to play, count only black moves for that node) - Count all moves that happen after the node, including those happening within the tree, and including the move that happen just after the node. I hope it will help you write a correct implementation. Don't hesitate to ask for precisions. Sylvain 2009/1/9 Isaac Deutsch > I'm a bit confused about the difference between AMAF and RAVE (if there's > one). > As far as I understand, with AMAF, you sample in each playout (after it > leaves > the tree) the moves played (by both players), but only the first move at > any > position, then you reward all moves played either by a win or loss, > depending on > their colors. > > I tried comparing this to RAVE, as described in many papers. There might be > multiple definitions of RAVE, but the one that seems the most clear to me > is > this one (picture used because of math stuff): > > http://img352.imageshack.us/img352/443/bild1pb3.png > > Is it correct that with RAVE, after a playout (after the tree), only the > siblings of the last node in the tree are updated accordingly? The main > difference to AMAF would be that instead of a single array with values of > AMAFsumReward and AMAFnumPlayed, each node keeps for each of its children > separate variables that hold these values. When a playout is finished, only > the > values of these children are updated instead of the single array. > > I hope you're able to make any sense out of this, sometimes a text can be > confusing when the writer is confused. ;p > > Cheers, ibd > -- > Sensationsangebot verlängert: GMX FreeDSL - Telefonanschluss + DSL > für nur 16,37 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K1308T4569a > ___ > computer-go mailing list > computer-go@computer-go.org > http://www.computer-go.org/mailman/listinfo/computer-go/ > ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
[computer-go] How to "properly" implement RAVE?
I'm a bit confused about the difference between AMAF and RAVE (if there's one). As far as I understand, with AMAF, you sample in each playout (after it leaves the tree) the moves played (by both players), but only the first move at any position, then you reward all moves played either by a win or loss, depending on their colors. I tried comparing this to RAVE, as described in many papers. There might be multiple definitions of RAVE, but the one that seems the most clear to me is this one (picture used because of math stuff): http://img352.imageshack.us/img352/443/bild1pb3.png Is it correct that with RAVE, after a playout (after the tree), only the siblings of the last node in the tree are updated accordingly? The main difference to AMAF would be that instead of a single array with values of AMAFsumReward and AMAFnumPlayed, each node keeps for each of its children separate variables that hold these values. When a playout is finished, only the values of these children are updated instead of the single array. I hope you're able to make any sense out of this, sometimes a text can be confusing when the writer is confused. ;p Cheers, ibd -- Sensationsangebot verlängert: GMX FreeDSL - Telefonanschluss + DSL für nur 16,37 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K1308T4569a ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/