Re: [computer-go] How to properly implement RAVE?
Hi, Sorry for the slow reply. After those discussions, I figured out that pseudo code was the fastest clear and not ambiguous way to describe how to precisely implement the algorithm. I needed to find some time to write it. Note that I did not write only the backup phase because to clearly describe the backup you have to see the playouts as well. I also describe the choice of the best move. Note also the fact that the backup from the moves in the tree and from the moves out of the tree is different. That is the somehow more tricky part to check that a move has not been already played (that is different for each node in the tree and we obviously don't want to keep a vector of already played moves for each node). Please forgive the mistakes that are in that code, of course I did not make any test, and it has been quite a long time I am not in the computer go trip anymore :). Please correct any mistake, I hope it makes things clearer. Best, Sylvain class AmafBoard { AmafBoard() { offset = 0 fill(alread_played, 0) } Clear() { offset++; } Play(move) { already_played[move] = offset } AlreadyPlayed(move) { return already_played[move] == offset } vector already_played int offset } ChooseMove(node, board) { bias = 0.015 // I put a random number here, to be tuned b = bias * bias / 0.25 best_value = -1 best_move = PASSMOVE for (move in board.allmoves) { c = node.child(move).counts w = node.child(move).wins rc = node.rave_counts[move] rw = node.rave_wins[move] coefficient = 1 - rc * (rc + c + rc * c * b) value = w / c * coef + rw / rc * (1 - coef) // please here take care of the c==0 and rc == 0 cases if (value best_value) { best_value = value best_move = move } } return best_move } PlayoutInTree(board, moves_played_in_tree, nodes_seen_in_tree) { node = tree.root while (node) { move = ChooseMove(node, board) moves_played_in_tree.append(move) nodes_seen_in_tree.append(node) board.PlayMove(move) node = node.child(move) } } PlayoutOutTree(board, AmafBoard played, moves) { int pass = 0 while (pass 2) { move = board.ChooseMoveAndPlay() if (move == PASSMOVE) { pass ++ continue } else { pass = 0 } if (!played.AlreadyPlayed(move)) { if (!board.last_move_was_black()) { move = -move } moves.append(move) } } return board.black_wins() } BackupNode(node, index, black_wins, moves_played_in_tree, moves_played_out_tree, already_played) { already_played.Clear() win = node.is_black_to_play ? black_wins : !black_wins // normal backup node.wins += win node.counts ++ // rave backup for (j = index; j moves_played_in_tree.size(); j += 2) { move = moves_played_in_tree[j] if (already_played.AlreadyPlayed(move)) continue already_played.Play(move) node.rave_wins[move] += win node.rave_counts[move] ++ } for (j = 0; j moves_played_out_tree.size(); ++j) { move = moves_played_out_tree[j] if (!node.is_black_to_play) move = -move // If it is either not the right color or the intersection is already played we ignore that move for that node if (move 0 || already_played.AlreadyPlayed(move)) continue already_played.Play(move) node.rave_wins[move] += win node.rave_counts[move] ++ } } Backup(black_wins, nodes_seen_in_tree, moves_played_in_tree, moves_played_out_tree, already_played) { for (i = 0; i nodes_seen_in_tree.size(); ++i) { node = nodes_seen_in_tree[i] BackupNode(node, i, black_wins, moves_played_in_tree, moves_played_out_tree, already_played) } } OneIteration(board_position, AmafBoard already_played) { board = board_position.Copy() already_played.Clear() vector moves_played_in_tree vector nodes_seen_in_tree PlayoutInTree(board, moves_played_in_tree, nodes_seen_in_tree) vector moves_played_out_tree bool black_wins = PlayoutOutTree(board, already_played, moves_played_out_tree) Backup(black_wins, nodes_seen_in_tree, moves_played_in_tree, moves_played_out_tree, already_played) } ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to properly implement RAVE?
Hi, Sorry for the slow reply. Hi I'd prefer quality over speed anytime. ;) Your pseudo code is excellent and helped me to understand some of the trickier things. Thanks again! I think I will now be able to implement my own version. :) Regards, Isaac -- Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: http://www.gmx.net/de/go/multimessenger ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to properly implement RAVE?
Thanks for taking the time Sylvain. I took a quick look to see how your pseudo-code compared to the reference implementation I made. There are a few small differences, and I think the place(s) where I deviated is exactly what confused Daniel Waeber. The first difference is that when I check whether a point has been played, I always start from the position of the root-node, whereas you start from the position of the node where the moves_played_in_tree is played. The second difference is I also don't count a move if the opposite color played on that point first. The result is I only need to compute the alreadyPlayed map once (in my code these are called weightMap and colorMap, I need two maps because I use decreasing weights) instead of computing it at each step back up in the tree. The only time these 'deviations' make a difference is in case of ko and ishi-no-shita. When I have a little spare time I'll check to see if this actually makes a difference in playing strength. If there's any noticeable difference I'll try to modify the code in my reference implementation to reflect the 'correct' method. Mark On Jan 17, 2009, at 11:48 AM, Sylvain Gelly wrote: Hi, Sorry for the slow reply. After those discussions, I figured out that pseudo code was the fastest clear and not ambiguous way to describe how to precisely implement the algorithm. I needed to find some time to write it. Note that I did not write only the backup phase because to clearly describe the backup you have to see the playouts as well. I also describe the choice of the best move. Note also the fact that the backup from the moves in the tree and from the moves out of the tree is different. That is the somehow more tricky part to check that a move has not been already played (that is different for each node in the tree and we obviously don't want to keep a vector of already played moves for each node). Please forgive the mistakes that are in that code, of course I did not make any test, and it has been quite a long time I am not in the computer go trip anymore :). Please correct any mistake, I hope it makes things clearer. Best, Sylvain class AmafBoard { AmafBoard() { offset = 0 fill(alread_played, 0) } Clear() { offset++; } Play(move) { already_played[move] = offset } AlreadyPlayed(move) { return already_played[move] == offset } vector already_played int offset } ChooseMove(node, board) { bias = 0.015 // I put a random number here, to be tuned b = bias * bias / 0.25 best_value = -1 best_move = PASSMOVE for (move in board.allmoves) { c = node.child(move).counts w = node.child(move).wins rc = node.rave_counts[move] rw = node.rave_wins[move] coefficient = 1 - rc * (rc + c + rc * c * b) value = w / c * coef + rw / rc * (1 - coef) // please here take care of the c==0 and rc == 0 cases if (value best_value) { best_value = value best_move = move } } return best_move } PlayoutInTree(board, moves_played_in_tree, nodes_seen_in_tree) { node = tree.root while (node) { move = ChooseMove(node, board) moves_played_in_tree.append(move) nodes_seen_in_tree.append(node) board.PlayMove(move) node = node.child(move) } } PlayoutOutTree(board, AmafBoard played, moves) { int pass = 0 while (pass 2) { move = board.ChooseMoveAndPlay() if (move == PASSMOVE) { pass ++ continue } else { pass = 0 } if (!played.AlreadyPlayed(move)) { if (!board.last_move_was_black()) { move = -move } moves.append(move) } } return board.black_wins() } BackupNode(node, index, black_wins, moves_played_in_tree, moves_played_out_tree, already_played) { already_played.Clear() win = node.is_black_to_play ? black_wins : !black_wins // normal backup node.wins += win node.counts ++ // rave backup for (j = index; j moves_played_in_tree.size(); j += 2) { move = moves_played_in_tree[j] if (already_played.AlreadyPlayed(move)) continue already_played.Play(move) node.rave_wins[move] += win node.rave_counts[move] ++ } for (j = 0; j moves_played_out_tree.size(); ++j) { move = moves_played_out_tree[j] if (!node.is_black_to_play) move = -move // If it is either not the right color or the intersection is already played we ignore that move for that node if (move 0 || already_played.AlreadyPlayed(move)) continue already_played.Play(move) node.rave_wins[move] += win node.rave_counts[move] ++ } } Backup(black_wins, nodes_seen_in_tree, moves_played_in_tree, moves_played_out_tree, already_played) { for (i = 0; i nodes_seen_in_tree.size(); ++i) { node = nodes_seen_in_tree[i] BackupNode(node, i, black_wins, moves_played_in_tree, moves_played_out_tree, already_played) } } OneIteration(board_position, AmafBoard already_played) { board = board_position.Copy() already_played.Clear() vector moves_played_in_tree vector nodes_seen_in_tree PlayoutInTree(board,
Re: [computer-go] How to properly implement RAVE?
A small point: in PlayoutOutTree, just after if (!played.AlreadyPlayed(move)) {, there should have a played.Play(move). I believe it does not change the final result (as the check is also done in the backup, and the move played in the backup), but I simply forgot that line (that should make moves_played_out_tree smaller). To avoid confusion, I repost the pseudo code with that correction (and hoping the indentation is not broken by the email editor once again). Sylvain class AmafBoard { AmafBoard() { offset = 0 fill(alread_played, 0) } Clear() { offset++; } Play(move) { already_played[move] = offset } AlreadyPlayed(move) { return already_played[move] == offset } vector already_played int offset } ChooseMove(node, board) { bias = 0.015 // I put a random number here, to be tuned b = bias * bias / 0.25 best_value = -1 best_move = PASSMOVE for (move in board.allmoves) { c = node.child(move).counts w = node.child(move).wins rc = node.rave_counts[move] rw = node.rave_wins[move] coefficient = 1 - rc * (rc + c + rc * c * b) value = w / c * coef + rw / rc * (1 - coef) // please here take care of the c==0 and rc == 0 cases if (value best_value) { best_value = value best_move = move } } return best_move } PlayoutInTree(board, moves_played_in_tree, nodes_seen_in_tree) { node = tree.root while (node) { move = ChooseMove(node, board) moves_played_in_tree.append(move) nodes_seen_in_tree.append(node) board.PlayMove(move) node = node.child(move) } } PlayoutOutTree(board, AmafBoard played, moves) { int pass = 0 while (pass 2) { move = board.ChooseMoveAndPlay() if (move == PASSMOVE) { pass ++ continue } else { pass = 0 } if (!played.AlreadyPlayed(move)) { played.Play(move) if (!board.last_move_was_black()) { move = -move } moves.append(move) } } return board.black_wins() } BackupNode(node, index, black_wins, moves_played_in_tree, moves_played_out_tree, already_played) { already_played.Clear() win = node.is_black_to_play ? black_wins : !black_wins // normal backup node.wins += win node.counts ++ // rave backup for (j = index; j moves_played_in_tree.size(); j += 2) { move = moves_played_in_tree[j] if (already_played.AlreadyPlayed(move)) continue already_played.Play(move) node.rave_wins[move] += win node.rave_counts[move] ++ } for (j = 0; j moves_played_out_tree.size(); ++j) { move = moves_played_out_tree[j] if (!node.is_black_to_play) move = -move // If it is either not the right color or the intersection is // already played we ignore that move for that node if (move 0 || already_played.AlreadyPlayed(move)) continue already_played.Play(move) node.rave_wins[move] += win node.rave_counts[move] ++ } } Backup(black_wins, nodes_seen_in_tree, moves_played_in_tree, moves_played_out_tree, already_played) { for (i = 0; i nodes_seen_in_tree.size(); ++i) { node = nodes_seen_in_tree[i] BackupNode(node, i, black_wins, moves_played_in_tree, moves_played_out_tree, already_played) } } OneIteration(board_position, AmafBoard already_played) { board = board_position.Copy() already_played.Clear() vector moves_played_in_tree vector nodes_seen_in_tree PlayoutInTree(board, moves_played_in_tree, nodes_seen_in_tree) vector moves_played_out_tree bool black_wins = PlayoutOutTree(board, already_played, moves_played_out_tree) Backup(black_wins, nodes_seen_in_tree, moves_played_in_tree, moves_played_out_tree, already_played) } 2009/1/17 Sylvain Gelly sylvain.ge...@m4x.org: Hi, Sorry for the slow reply. After those discussions, I figured out that pseudo code was the fastest clear and not ambiguous way to describe how to precisely implement the algorithm. I needed to find some time to write it. Note that I did not write only the backup phase because to clearly describe the backup you have to see the playouts as well. I also describe the choice of the best move. Note also the fact that the backup from the moves in the tree and from the moves out of the tree is different. That is the somehow more tricky part to check that a move has not been already played (that is different for each node in the tree and we obviously don't want to keep a vector of already played moves for each node). Please forgive the mistakes that are in that code, of course I did not make any test, and it has been quite a long time I am not in the computer go trip anymore :). Please correct any mistake, I hope it makes things clearer. Best, Sylvain class AmafBoard { AmafBoard() { offset = 0 fill(alread_played, 0) } Clear() { offset++; } Play(move) { already_played[move] = offset }
Re: [computer-go] How to properly implement RAVE?
Hi Mark, For the first difference you mention, as far as I remember it makes a small but significant difference and is one of the main reason I was talking about tricky details. For the second difference, I also don't want to count a move if the opposite color played on that point first, and, unless I am mistaken, I indeed don't count them in the part of the playout which is outside the tree. However you are right for the part inside the tree, due to the j+=2. I am a little confused and now believe it should not be a j+=2 but a j++, updating the already_played map for each move inside the tree during the backup, but doing the backup only once every too moves (to match the color). It may be a bug in what I proposed, I cannot test it anymore. However, what I am sure about is that this j+=2 is what I was using and gave very good results. If what you point out is indeed a bug and makes a difference, then it can only be even better :). I prefer not modifying the pseudo code I posted until someone verifies it becomes better. Thanks, Sylvain 2009/1/17 Mark Boon tesujisoftw...@gmail.com Thanks for taking the time Sylvain. I took a quick look to see how your pseudo-code compared to the reference implementation I made. There are a few small differences, and I think the place(s) where I deviated is exactly what confused Daniel Waeber. The first difference is that when I check whether a point has been played, I always start from the position of the root-node, whereas you start from the position of the node where the moves_played_in_tree is played. The second difference is I also don't count a move if the opposite color played on that point first. The result is I only need to compute the alreadyPlayed map once (in my code these are called weightMap and colorMap, I need two maps because I use decreasing weights) instead of computing it at each step back up in the tree. The only time these 'deviations' make a difference is in case of ko and ishi-no-shita. When I have a little spare time I'll check to see if this actually makes a difference in playing strength. If there's any noticeable difference I'll try to modify the code in my reference implementation to reflect the 'correct' method. Mark On Jan 17, 2009, at 11:48 AM, Sylvain Gelly wrote: Hi, Sorry for the slow reply. After those discussions, I figured out that pseudo code was the fastest clear and not ambiguous way to describe how to precisely implement the algorithm. I needed to find some time to write it. Note that I did not write only the backup phase because to clearly describe the backup you have to see the playouts as well. I also describe the choice of the best move. Note also the fact that the backup from the moves in the tree and from the moves out of the tree is different. That is the somehow more tricky part to check that a move has not been already played (that is different for each node in the tree and we obviously don't want to keep a vector of already played moves for each node). Please forgive the mistakes that are in that code, of course I did not make any test, and it has been quite a long time I am not in the computer go trip anymore :). Please correct any mistake, I hope it makes things clearer. Best, Sylvain class AmafBoard { AmafBoard() { offset = 0 fill(alread_played, 0) } Clear() { offset++; } Play(move) { already_played[move] = offset } AlreadyPlayed(move) { return already_played[move] == offset } vector already_played int offset } ChooseMove(node, board) { bias = 0.015 // I put a random number here, to be tuned b = bias * bias / 0.25 best_value = -1 best_move = PASSMOVE for (move in board.allmoves) { c = node.child(move).counts w = node.child(move).wins rc = node.rave_counts[move] rw = node.rave_wins[move] coefficient = 1 - rc * (rc + c + rc * c * b) value = w / c * coef + rw / rc * (1 - coef) // please here take care of the c==0 and rc == 0 cases if (value best_value) { best_value = value best_move = move } } return best_move } PlayoutInTree(board, moves_played_in_tree, nodes_seen_in_tree) { node = tree.root while (node) { move = ChooseMove(node, board) moves_played_in_tree.append(move) nodes_seen_in_tree.append(node) board.PlayMove(move) node = node.child(move) } } PlayoutOutTree(board, AmafBoard played, moves) { int pass = 0 while (pass 2) { move = board.ChooseMoveAndPlay() if (move == PASSMOVE) { pass ++ continue } else { pass = 0 } if (!played.AlreadyPlayed(move)) { if (!board.last_move_was_black()) { move = -move } moves.append(move) } } return board.black_wins() } BackupNode(node, index, black_wins, moves_played_in_tree, moves_played_out_tree, already_played) { already_played.Clear() win = node.is_black_to_play ? black_wins : !black_wins // normal backup node.wins += win node.counts ++ //
Re: [computer-go] How to properly implement RAVE?
Hi Issac, You are welcome, and I am happy there is finally a clearer of implementing RAVE out there. I believe I should have done it much earlier, sorry for that, but better late than never, no? :) Best, Sylvain 2009/1/17 Isaac Deutsch i...@gmx.ch Hi, Sorry for the slow reply. Hi I'd prefer quality over speed anytime. ;) Your pseudo code is excellent and helped me to understand some of the trickier things. Thanks again! I think I will now be able to implement my own version. :) Regards, Isaac -- Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: http://www.gmx.net/de/go/multimessenger ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to properly implement RAVE?
I think I found a bug in ChooseMove Quoting Sylvain Gelly sylvain.ge...@m4x.org: coefficient = 1 - rc * (rc + c + rc * c * b) I think this has to be coefficient = 1 - rc / (rc + c + rc * c * b) thus when c is 0 (initially) the coefficient is 0 when c goes towards infinity the coefficent goes 1 which is how this function should behave. Magnus -- Magnus Persson Berlin, Germany ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to properly implement RAVE?
Good catch :)Indeed it makes no sense with the *, sorry... Sylvain 2009/1/17 Magnus Persson magnus.pers...@phmp.se I think I found a bug in ChooseMove Quoting Sylvain Gelly sylvain.ge...@m4x.org: coefficient = 1 - rc * (rc + c + rc * c * b) I think this has to be coefficient = 1 - rc / (rc + c + rc * c * b) thus when c is 0 (initially) the coefficient is 0 when c goes towards infinity the coefficent goes 1 which is how this function should behave. Magnus -- Magnus Persson Berlin, Germany ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to properly implement RAVE?
On Jan 17, 2009, at 5:41 PM, Sylvain Gelly wrote: For the first difference you mention, as far as I remember it makes a small but significant difference and is one of the main reason I was talking about tricky details. OK, I ran a test and after 1,000 games with 2K semi-light playouts I get a winning percentage of 50.6% for your methods vs. mine. Of course it's possible I made some mistake, but my first impression is it makes no difference which way you do this particular detail. Your ChooseNode is also quite different from mine, mostly because I also still have a UCT component in there. I'll give your method a go one day, just to see if it changes anything. I've come to understand what you mean by tricky details, sometimes I see a big difference in playing strength that I find hard to explain given the change(s) I made. Conversely I've been in quite a few cases where I thought something would make a difference, only to find out it all didn't matter one bit. It's also possible that some deficiencies that would be apparent in one implementation, get compensated for in another. Some examples: David Fotland wrote he does light playouts with just a few patterns but no tactics. I find that using a moderate amount of tactics actually is the biggest contributor to playing strength (save one or more stones if can't be caught in ladder). However, augmenting patterns with tactical information I found doesn't help at all, even when disregarding the performance cost. Maybe David uses some patterns to compensate for part of the tactics and relies on the faster playouts to compensate for poorer playouts. I'm guessing here, but otherwise I can't imagine why he would forego what otherwise seems to be a big gain in strength. I also tried to use ownership maps to modify the RAVE value. Remi Coulom wrote in a paper he used ownership information of up to 63 playouts. When I tried something similar it always makes play weaker. Maybe I should use it in a different way, but I haven't stumbled on the solution yet. When I think of it, AMAF information is already something very similar to ownership information. So maybe combining the two doesn't make much sense. Lastly, in an earlier UCT bot that I made I gained a lot by initially reducing the number of moves and slowly expanding it. After using AMAF it turns out this method hardly gains anything at all anymore. So the devil is not only in the details, it's also in the combination of the details. Mark ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/