Re: [computer-go] How to properly implement RAVE?
Thanks you. I think that I understand it now :) On 23:21 Wed 14 Jan , Mark Boon wrote: You have to understand that the 'start' variable really starts at the root from the position for which we do the search. So all the moves 'below' the playoutNode are also taken into account. The check in the earlier part if (_colorMap[moveXY]==0) makes sure there's no move between the playoutNode and the n.move as you call it. Ah, yes, I did not get the meaning of start right. But still I think you have to incrementally add values to the maps as you go down the tree. I think there's a problem with caputres/ko-fights inside the tree. All nodes after the capture should get the amaf color/value for the stones put there after the capture and not the value of the stones that were put there and captured. Most likely I missed a little piece of code again, but hey, perhaps its a real bug. That is, if I understand you correctly. Because I'm not quite sure what you mean by 'finding all these nodes n'. This is not a sub-set of RAVE as I understand it. What you see in the code is the accumulation of the AMAF values after each playout. The RAVE value is calculated using these values when comparing move-candidates, which is in an altogether different place. (The MonteCarloTreeSearchResult class in my project). I'm afraid I haven't made things much clearer for you. The problem is that I was confused, but now, as I understand it, all your and all the other comments suddenly make sense ;) Mark Thank again, wabu ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to properly implement RAVE?
Hi, On 11:24 Thu 15 Jan , Mark Boon wrote: On Jan 15, 2009, at 10:47 AM, Daniel Waeber wrote: Thanks you. I think that I understand it now :) On 23:21 Wed 14 Jan , Mark Boon wrote: You have to understand that the 'start' variable really starts at the root from the position for which we do the search. So all the moves 'below' the playoutNode are also taken into account. The check in the earlier part if (_colorMap[moveXY]==0) makes sure there's no move between the playoutNode and the n.move as you call it. Ah, yes, I did not get the meaning of start right. But still I think you have to incrementally add values to the maps as you go down the tree. But it does that. This happens when playoutNode is set to its parent and the AMAF values are added again (now for the other color) until the root is reached. yes, but the weight/color maps stay the same for all updated nodes. I think the first playoutNode (the one most deep inside the tree) only should get amaf values for the random playout, the next one form random playout + from the first playoutNode ... and the root node amaf values form all the nodes. I attached code for what I would do, but had no time to test it. I think there's a problem with caputres/ko-fights inside the tree. All nodes after the capture should get the amaf color/value for the stones put there after the capture and not the value of the stones that were put there and captured. Most likely I missed a little piece of code again, but hey, perhaps its a real bug. I did stop to think about ko and 'ishi no shita' (something like 'playing under the stones') a little bit but couldn't come to an immediate conclusion what would be the best thing to do. So I decided to leave it as it is. You didn't miss any little piece of code there. Maybe there's room for improvement in case of ko, but my guess is the difference will be minimal at best. If you find it does make a big difference everyone here will be delighted to hear it. Given how it's performing I doubt there are major problems with my AMAF-RAVE implementation. regards wabu Mark diff --git a/TesujiRefBot/source/tesuji/games/go/reference/search/MonteCarloTreeSearch.java b/TesujiRefBot/source/tesuji/games/go/reference/search/MonteCarloTreeSearch.java index 5e47e00..ee650e1 100644 --- a/TesujiRefBot/source/tesuji/games/go/reference/search/MonteCarloTreeSearch.java +++ b/TesujiRefBot/source/tesuji/games/go/reference/search/MonteCarloTreeSearch.java @@ -566,6 +566,7 @@ public class MonteCarloTreeSearchMoveType extends Move TreeNodeMonteCarloTreeSearchResultMoveType node = getNodeToExpand(_rootNode, _searchAdministration); TreeNodeMonteCarloTreeSearchResultMoveType playoutNode = node; + int start = _searchAdministration.getMoveStack().getSize(); // begin at playoutNode if (node != null) { @@ -580,7 +581,6 @@ public class MonteCarloTreeSearchMoveType extends Move { IntStack playoutMoves = _searchAdministration.getMoveStack(); byte color = _monteCarloAdministration.getColorToMove(); - int start = _monteCarloAdministration.getMoveStack().getSize(); int end = playoutMoves.getSize(); double weight = 1.0; double weightDelta = 1.0 / (end - start + 1); // Michael Williams' idea to use decreasing weights @@ -613,6 +613,13 @@ public class MonteCarloTreeSearchMoveType extends Move if (_colorMap[xy]==color) result.increaseVirtualPlayouts(_weightMap[xy]*score,_weightMap[xy]); } + // add the move to the maps now + GoMove move = (GoMove) playoutNode.getContent().getMove(); + int xy = move.getXY(); + byte col = move.getColor(); + _colorMap[xy] = col; + _weightMap[xy] = 1.0; //TODO: handle weight + playoutNode = playoutNode.getParent(); } } ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to properly implement RAVE?
On 12:53 Thu 15 Jan , Mark Boon wrote: On Jan 15, 2009, at 12:33 PM, Daniel Waeber wrote: yes, but the weight/color maps stay the same for all updated nodes. I think the first playoutNode (the one most deep inside the tree) only should get amaf values for the random playout, the next one form random playout + from the first playoutNode ... and the root node amaf values form all the nodes. OK, I think I see now what you're trying to say. This is something I did think about. I hope my memory serves me well enough to say why I didn't do it that way. - What you propose adds complexity and possibly computation (if it means recalculating or adjusting the weight map). - I don't think it makes all that much difference. The reason I don't think it makes much difference is that adding AMAF values for the moves above (closer to the root) the playoutNode is that most likely those points are now occupied. Since the AMAF values are used to compute which empty points are the best next candidate, the AMAF values at occupied points are immaterial. They are not even added. So it only makes a difference in cases where played stones get captured and those points then are occupied again. This brings us back to the issue discussed earlier about ko and ishi-no-shita. Yes, I don't know if it would be a big improvement. It is not a real speed decrease, as you just move the point were you add the additional values to the maps. Perhaps I'll run some tests when I have the time. Mark P.S. what do I need to open that file? Is it a SVN patch? it's standard diff format, outputed by git. You can view it with a text editor and apply it with unix patch commandline tool or use eclipse-team-apply-patch. it's done agains svn-root, so you have to ignore 2 leading pathnames if you apply it inside the refbot folder. regards wabu ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to properly implement RAVE?
Hi, I'm also interested at the same thing. snip I tried putting this into pseudo code, but it already looks like C. ;p http://pastie.org/357231 If you could look at it, I would be most grateful. It sounds good but it seems to lack the check of whether a given move is first played in a given intersection. When you add that, it because a little more tricky to update in the tree. You also update the value of each move independently of the color, i.e. a position in which it is black turn will be update with white moves. You should restrict the update. I have a question about the the part were the stats are updated. (l.15-25). having an array of amaf-values in every node seems very memory intensive and I don't get how you would access these values. Something like searching for all nodes below the visited node with the same pos+color as the ones visited and updating amaf-stats directly in these nodes would make more sense to me. (but also costs more cpu time) Hope I don't bring in more confusion into the topic, but I really would like to get rave done right. Hopes that helps, Sylvain -Isaac thanks for you help wabu ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to properly implement RAVE?
Ok, I still have same questions about the refbot code. On 10:29 Wed 14 Jan , Mark Boon wrote: On Jan 14, 2009, at 9:36 AM, Daniel Waeber wrote: I have a question about the the part were the stats are updated. (l.15-25). having an array of amaf-values in every node seems very memory intensive and I don't get how you would access these values. You are right, it is memory intensive. I believe this is one of the reasons that most implementations wait a certain number of playouts before creating the next level of nodes. form http://pastie.org/pastes/357231 : node[visitedNode[i]].AMAFSum[visitedPos[j]]+=result; node[visitedNode[i]].AMAFPlayed[visitedPos[j]]+=1; I had some problems with these lines. looks to me like the nodes have an array of boardsize*boardsize inside *all* nodes, increasing the memory of the tree by a factor 10. But, correct me if I'm wrong, the refbot code just adds two simple values to the node for rave. Accessing the AMAF values depends on your implementation. The following is a code-snippet from my MCTS reference implementation that updates the AMAF values in the tree: if (_useAMAF) { IntStack playoutMoves = _searchAdministration.getMoveStack(); byte color = _monteCarloAdministration.getColorToMove(); int start = _monteCarloAdministration.getMoveStack().getSize(); int end = playoutMoves.getSize(); double weight = 1.0; double weightDelta = 1.0 / (end - start + 1); // Michael Williams' idea to use decreasing weights GoArray.clear(_weightMap); GoArray.clear(_colorMap); for (int i=start; iend; i++) { int moveXY = playoutMoves.get(i); if (_colorMap[moveXY]==0) { _colorMap[moveXY] = color; _weightMap[moveXY] = weight; } weight -= weightDelta; color = opposite(color); } until here it's clear to me. while (playoutNode!=null) { color = opposite(playoutNode.getContent().getMove().getColor()); boolean playerWins = (blackWins color==BLACK) || (!blackWins color==WHITE); double score = playerWins ? MonteCarloTreeSearchResult.MAX_SCORE : MonteCarloTreeSearchResult.MIN_SCORE; for (int i=0; iplayoutNode.getChildCount(); i++) { TreeNodeMonteCarloTreeSearchResultMoveType nextNode = playoutNode.getChildAt(i); MonteCarloTreeSearchResultMoveType result = nextNode.getContent(); GoMove move = (GoMove) result.getMove(); int xy = move.getXY(); if (_colorMap[xy]==color) result.increaseVirtualPlayouts(_weightMap[xy]*score,_weightMap[xy]); if i understand this code correctly, it updates the amaf vaules of all direct children of the playoutNode according to the weight/color maps. And that update is done for all nodes on the selected path. } playoutNode = playoutNode.getParent(); First of all, I miss an weight/colorMap update for xy here. Souldn't the move of the current playoutNode be considered as an amaf move for all the nodes below this one? } } But, most of all, I miss that the code only updates the amaf values of all direct children, and not of all nodes n below the playoutNode, where there is no play at n.move on the path between n and the playoutNode. Finding all these nodes n would be a costy thing to do, but wouldn't that be the right thing to do? Implementing a realistic subset of RAVE is another story, but first of all I want to understand the pure concept of RAVE. playoutNode is the move-node from which the playout was done. The amaf values are stored in its children by the increaseVirtualPlayout() method. Note that it goes up the tree by assigning the parent to playoutNode until it gets to the root. For more context it would be better to lookup the whole source at http://plug-and-go.dev.java.net If you think some more comments in the code could clarify things better I'm open to suggestions. Thanks for the code, didn't know amaf already is inside the mcts refbot. Good luck. Mark Regards, wabu ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/