Re: [computer-go] How to properly implement RAVE?

2009-01-17 Thread Sylvain Gelly
Hi,

Sorry for the slow reply.
After those discussions, I figured out that pseudo code was the
fastest clear and not ambiguous way to describe how to precisely
implement the algorithm. I needed to find some time to write it.
Note that I did not write only the backup phase because to clearly
describe the backup you have to see the playouts as well. I also
describe the choice of the best move.
Note also the fact that the backup from the moves in the tree and from
the moves out of the tree is different. That is the somehow more
tricky part to check that a move has not been already played (that is
different for each node in the tree and we obviously don't want to
keep a vector of already played moves for each node).

Please forgive the mistakes that are in that code, of course I did not
make any test, and it has been quite a long time I am not in the
computer go trip anymore :). Please correct any mistake,
I hope it makes things clearer.

Best,
Sylvain

class AmafBoard {
  AmafBoard() {
offset = 0
fill(alread_played, 0)
  }
  Clear() {
offset++;
  }
  Play(move) {
already_played[move] = offset
  }
  AlreadyPlayed(move) {
return already_played[move] == offset
  }
  vector already_played
  int offset
}

ChooseMove(node, board) {
  bias = 0.015  // I put a random number here, to be tuned
  b = bias * bias / 0.25
  best_value = -1
  best_move = PASSMOVE
  for (move in board.allmoves) {
c = node.child(move).counts
w = node.child(move).wins
rc = node.rave_counts[move]
rw = node.rave_wins[move]
coefficient = 1 - rc * (rc + c + rc * c * b)
value = w / c * coef + rw / rc * (1 - coef)  // please here take
care of the c==0 and rc == 0 cases
if (value  best_value) {
  best_value = value
  best_move = move
}
  }
  return best_move
}
PlayoutInTree(board, moves_played_in_tree, nodes_seen_in_tree) {
  node = tree.root
  while (node) {
move = ChooseMove(node, board)
moves_played_in_tree.append(move)
nodes_seen_in_tree.append(node)
board.PlayMove(move)
node = node.child(move)
  }
}
PlayoutOutTree(board, AmafBoard played, moves) {
  int pass = 0
  while (pass  2) {
move = board.ChooseMoveAndPlay()
if (move == PASSMOVE) {
  pass ++
  continue
} else {
  pass = 0
}
if (!played.AlreadyPlayed(move)) {
  if (!board.last_move_was_black()) {
move = -move
  }
  moves.append(move)
}
  }
  return board.black_wins()
}

BackupNode(node, index, black_wins, moves_played_in_tree,
moves_played_out_tree, already_played) {
  already_played.Clear()
  win = node.is_black_to_play ? black_wins : !black_wins
  // normal backup
  node.wins += win
  node.counts ++
  // rave backup
  for (j = index; j  moves_played_in_tree.size(); j += 2) {
move = moves_played_in_tree[j]
if (already_played.AlreadyPlayed(move)) continue
already_played.Play(move)
node.rave_wins[move] += win
node.rave_counts[move] ++
  }
  for (j = 0; j  moves_played_out_tree.size(); ++j) {
move = moves_played_out_tree[j]
if (!node.is_black_to_play) move = -move
// If it is either not the right color or the intersection is
already played we ignore that move for that node
if (move  0 || already_played.AlreadyPlayed(move)) continue

already_played.Play(move)
node.rave_wins[move] += win
node.rave_counts[move] ++
  }
}

Backup(black_wins, nodes_seen_in_tree, moves_played_in_tree,
moves_played_out_tree, already_played) {
  for (i = 0; i  nodes_seen_in_tree.size(); ++i) {
node = nodes_seen_in_tree[i]
BackupNode(node, i, black_wins, moves_played_in_tree,
moves_played_out_tree, already_played)
  }
}
OneIteration(board_position, AmafBoard already_played) {
  board = board_position.Copy()
  already_played.Clear()

  vector moves_played_in_tree
  vector nodes_seen_in_tree
  PlayoutInTree(board, moves_played_in_tree, nodes_seen_in_tree)

  vector moves_played_out_tree
  bool black_wins = PlayoutOutTree(board, already_played,
moves_played_out_tree)
  Backup(black_wins, nodes_seen_in_tree, moves_played_in_tree,
moves_played_out_tree, already_played)
}
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] How to properly implement RAVE?

2009-01-17 Thread Isaac Deutsch
 Hi,
 
 Sorry for the slow reply.

Hi

I'd prefer quality over speed anytime. ;)
Your pseudo code is excellent and helped me to understand some of the trickier 
things. Thanks again!
I think I will now be able to implement my own version. :)

Regards,
Isaac
-- 
Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: 
http://www.gmx.net/de/go/multimessenger
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] How to properly implement RAVE?

2009-01-17 Thread Mark Boon
Thanks for taking the time Sylvain. I took a quick look to see how  
your pseudo-code compared to the reference implementation I made.  
There are a few small differences, and I think the place(s) where I  
deviated is exactly what confused Daniel Waeber.


The first difference is that when I check whether a point has been  
played, I always start from the position of the root-node, whereas you  
start from the position of the node where the moves_played_in_tree is  
played. The second difference is I also don't count a move if the  
opposite color played on that point first. The result is I only need  
to compute the alreadyPlayed map once (in my code these are called  
weightMap and colorMap, I need two maps because I use decreasing  
weights) instead of computing it at each step back up in the tree.


The only time these 'deviations' make a difference is in case of ko  
and ishi-no-shita. When I have a little spare time I'll check to see  
if this actually makes a difference in playing strength. If there's  
any noticeable difference I'll try to modify the code in my reference  
implementation to reflect the 'correct' method.


Mark


On Jan 17, 2009, at 11:48 AM, Sylvain Gelly wrote:


Hi,

Sorry for the slow reply.
After those discussions, I figured out that pseudo code was the
fastest clear and not ambiguous way to describe how to precisely
implement the algorithm. I needed to find some time to write it.
Note that I did not write only the backup phase because to clearly
describe the backup you have to see the playouts as well. I also
describe the choice of the best move.
Note also the fact that the backup from the moves in the tree and from
the moves out of the tree is different. That is the somehow more
tricky part to check that a move has not been already played (that is
different for each node in the tree and we obviously don't want to
keep a vector of already played moves for each node).

Please forgive the mistakes that are in that code, of course I did not
make any test, and it has been quite a long time I am not in the
computer go trip anymore :). Please correct any mistake,
I hope it makes things clearer.

Best,
Sylvain

class AmafBoard {
 AmafBoard() {
   offset = 0
   fill(alread_played, 0)
 }
 Clear() {
   offset++;
 }
 Play(move) {
   already_played[move] = offset
 }
 AlreadyPlayed(move) {
   return already_played[move] == offset
 }
 vector already_played
 int offset
}

ChooseMove(node, board) {
 bias = 0.015  // I put a random number here, to be tuned
 b = bias * bias / 0.25
 best_value = -1
 best_move = PASSMOVE
 for (move in board.allmoves) {
   c = node.child(move).counts
   w = node.child(move).wins
   rc = node.rave_counts[move]
   rw = node.rave_wins[move]
   coefficient = 1 - rc * (rc + c + rc * c * b)
   value = w / c * coef + rw / rc * (1 - coef)  // please here take
care of the c==0 and rc == 0 cases
   if (value  best_value) {
 best_value = value
 best_move = move
   }
 }
 return best_move
}
PlayoutInTree(board, moves_played_in_tree, nodes_seen_in_tree) {
 node = tree.root
 while (node) {
   move = ChooseMove(node, board)
   moves_played_in_tree.append(move)
   nodes_seen_in_tree.append(node)
   board.PlayMove(move)
   node = node.child(move)
 }
}
PlayoutOutTree(board, AmafBoard played, moves) {
 int pass = 0
 while (pass  2) {
   move = board.ChooseMoveAndPlay()
   if (move == PASSMOVE) {
 pass ++
 continue
   } else {
 pass = 0
   }
   if (!played.AlreadyPlayed(move)) {
 if (!board.last_move_was_black()) {
   move = -move
 }
 moves.append(move)
   }
 }
 return board.black_wins()
}

BackupNode(node, index, black_wins, moves_played_in_tree,
moves_played_out_tree, already_played) {
 already_played.Clear()
 win = node.is_black_to_play ? black_wins : !black_wins
 // normal backup
 node.wins += win
 node.counts ++
 // rave backup
 for (j = index; j  moves_played_in_tree.size(); j += 2) {
   move = moves_played_in_tree[j]
   if (already_played.AlreadyPlayed(move)) continue
   already_played.Play(move)
   node.rave_wins[move] += win
   node.rave_counts[move] ++
 }
 for (j = 0; j  moves_played_out_tree.size(); ++j) {
   move = moves_played_out_tree[j]
   if (!node.is_black_to_play) move = -move
   // If it is either not the right color or the intersection is
already played we ignore that move for that node
   if (move  0 || already_played.AlreadyPlayed(move)) continue

   already_played.Play(move)
   node.rave_wins[move] += win
   node.rave_counts[move] ++
 }
}

Backup(black_wins, nodes_seen_in_tree, moves_played_in_tree,
moves_played_out_tree, already_played) {
 for (i = 0; i  nodes_seen_in_tree.size(); ++i) {
   node = nodes_seen_in_tree[i]
   BackupNode(node, i, black_wins, moves_played_in_tree,
moves_played_out_tree, already_played)
 }
}
OneIteration(board_position, AmafBoard already_played) {
 board = board_position.Copy()
 already_played.Clear()

 vector moves_played_in_tree
 vector nodes_seen_in_tree
 PlayoutInTree(board, 

Re: [computer-go] How to properly implement RAVE?

2009-01-17 Thread Sylvain Gelly
A small point: in PlayoutOutTree, just after if
(!played.AlreadyPlayed(move)) {, there should have a played.Play(move).
I believe it does not change the final result (as the check is also done in
the backup, and the move played in the backup), but I simply forgot that
line (that should make moves_played_out_tree smaller).

To avoid confusion, I repost the pseudo code with that correction (and
hoping the indentation is not broken by the email editor once again).

Sylvain


class AmafBoard {
  AmafBoard() {
offset = 0
fill(alread_played, 0)
  }
  Clear() {
offset++;
  }  Play(move) {
 already_played[move] = offset
  }
  AlreadyPlayed(move) {
 return already_played[move] == offset
  }
  vector already_played
  int offset
}

ChooseMove(node, board) {
  bias = 0.015  // I put a random number here, to be tuned
  b = bias * bias / 0.25
  best_value = -1
  best_move = PASSMOVE
  for (move in board.allmoves) {
c = node.child(move).counts
w = node.child(move).wins
rc = node.rave_counts[move]
rw = node.rave_wins[move]
coefficient = 1 - rc * (rc + c + rc * c * b)
value = w / c * coef + rw / rc * (1 - coef)  // please here take care of
the c==0 and rc == 0 cases
if (value  best_value) {
  best_value = value
  best_move = move
}
  }
  return best_move
}


PlayoutInTree(board, moves_played_in_tree, nodes_seen_in_tree) {
  node = tree.root
  while (node) {
move = ChooseMove(node, board)
moves_played_in_tree.append(move)
nodes_seen_in_tree.append(node)
board.PlayMove(move)
node = node.child(move)
  }
}


PlayoutOutTree(board, AmafBoard played, moves) {
  int pass = 0
  while (pass  2) {
move = board.ChooseMoveAndPlay()
if (move == PASSMOVE) {
  pass ++
  continue
} else {
  pass = 0
}
if (!played.AlreadyPlayed(move)) {
  played.Play(move)
  if (!board.last_move_was_black()) {
move = -move
  }
  moves.append(move)
}
  }
  return board.black_wins()
}


BackupNode(node, index, black_wins, moves_played_in_tree,
moves_played_out_tree, already_played) {
  already_played.Clear()
  win = node.is_black_to_play ? black_wins : !black_wins
  // normal backup
  node.wins += win
  node.counts ++
  // rave backup
  for (j = index; j  moves_played_in_tree.size(); j += 2) {
move = moves_played_in_tree[j]
if (already_played.AlreadyPlayed(move)) continue
already_played.Play(move)
node.rave_wins[move] += win
node.rave_counts[move] ++
  }
  for (j = 0; j  moves_played_out_tree.size(); ++j) {
move = moves_played_out_tree[j]
if (!node.is_black_to_play) move = -move

// If it is either not the right color or the intersection is
// already played we ignore that move for that node
if (move  0 || already_played.AlreadyPlayed(move)) continue

already_played.Play(move)
node.rave_wins[move] += win
node.rave_counts[move] ++
  }
}


Backup(black_wins, nodes_seen_in_tree, moves_played_in_tree,
moves_played_out_tree, already_played) {
  for (i = 0; i  nodes_seen_in_tree.size(); ++i) {
node = nodes_seen_in_tree[i]
BackupNode(node, i, black_wins, moves_played_in_tree,
moves_played_out_tree, already_played)
  }
}


OneIteration(board_position, AmafBoard already_played) {
  board = board_position.Copy()
  already_played.Clear()

  vector moves_played_in_tree
  vector nodes_seen_in_tree
  PlayoutInTree(board, moves_played_in_tree, nodes_seen_in_tree)

  vector moves_played_out_tree
  bool black_wins = PlayoutOutTree(board, already_played,
 moves_played_out_tree)
  Backup(black_wins, nodes_seen_in_tree, moves_played_in_tree,
  moves_played_out_tree, already_played)
}

2009/1/17 Sylvain Gelly sylvain.ge...@m4x.org:
 Hi,

 Sorry for the slow reply.
 After those discussions, I figured out that pseudo code was the
 fastest clear and not ambiguous way to describe how to precisely
 implement the algorithm. I needed to find some time to write it.
 Note that I did not write only the backup phase because to clearly
 describe the backup you have to see the playouts as well. I also
 describe the choice of the best move.
 Note also the fact that the backup from the moves in the tree and from
 the moves out of the tree is different. That is the somehow more
 tricky part to check that a move has not been already played (that is
 different for each node in the tree and we obviously don't want to
 keep a vector of already played moves for each node).

 Please forgive the mistakes that are in that code, of course I did not
 make any test, and it has been quite a long time I am not in the
 computer go trip anymore :). Please correct any mistake,
 I hope it makes things clearer.

 Best,
 Sylvain

 class AmafBoard {
  AmafBoard() {
offset = 0
fill(alread_played, 0)
  }
  Clear() {
offset++;
  }
  Play(move) {
already_played[move] = offset
  }
  

Re: [computer-go] How to properly implement RAVE?

2009-01-17 Thread Sylvain Gelly
Hi Mark,
For the first difference you mention, as far as I remember it makes a small
but significant difference and is one of the main reason I was talking about
tricky details.
For the second difference, I also don't want to count a move if the
opposite color played on that point first, and, unless I am mistaken, I
indeed don't count them in the part of the playout which is outside the
tree.
However you are right for the part inside the tree, due to the j+=2. I am
a little confused and now believe it should not be a j+=2 but a j++,
updating the already_played map for each move inside the tree during the
backup, but doing the backup only once every too moves (to match the color).

It may be a bug in what I proposed, I cannot test it anymore. However, what
I am sure about is that this j+=2 is what I was using and gave very good
results. If what you point out is indeed a bug and makes a difference, then
it can only be even better :). I prefer not modifying the pseudo code I
posted until someone verifies it becomes better.

Thanks,
Sylvain

2009/1/17 Mark Boon tesujisoftw...@gmail.com

 Thanks for taking the time Sylvain. I took a quick look to see how your
 pseudo-code compared to the reference implementation I made. There are a few
 small differences, and I think the place(s) where I deviated is exactly what
 confused Daniel Waeber.

 The first difference is that when I check whether a point has been played,
 I always start from the position of the root-node, whereas you start from
 the position of the node where the moves_played_in_tree is played. The
 second difference is I also don't count a move if the opposite color played
 on that point first. The result is I only need to compute the alreadyPlayed
 map once (in my code these are called weightMap and colorMap, I need two
 maps because I use decreasing weights) instead of computing it at each step
 back up in the tree.

 The only time these 'deviations' make a difference is in case of ko and
 ishi-no-shita. When I have a little spare time I'll check to see if this
 actually makes a difference in playing strength. If there's any noticeable
 difference I'll try to modify the code in my reference implementation to
 reflect the 'correct' method.

 Mark



 On Jan 17, 2009, at 11:48 AM, Sylvain Gelly wrote:

  Hi,

 Sorry for the slow reply.
 After those discussions, I figured out that pseudo code was the
 fastest clear and not ambiguous way to describe how to precisely
 implement the algorithm. I needed to find some time to write it.
 Note that I did not write only the backup phase because to clearly
 describe the backup you have to see the playouts as well. I also
 describe the choice of the best move.
 Note also the fact that the backup from the moves in the tree and from
 the moves out of the tree is different. That is the somehow more
 tricky part to check that a move has not been already played (that is
 different for each node in the tree and we obviously don't want to
 keep a vector of already played moves for each node).

 Please forgive the mistakes that are in that code, of course I did not
 make any test, and it has been quite a long time I am not in the
 computer go trip anymore :). Please correct any mistake,
 I hope it makes things clearer.

 Best,
 Sylvain

 class AmafBoard {
  AmafBoard() {
   offset = 0
   fill(alread_played, 0)
  }
  Clear() {
   offset++;
  }
  Play(move) {
   already_played[move] = offset
  }
  AlreadyPlayed(move) {
   return already_played[move] == offset
  }
  vector already_played
  int offset
 }

 ChooseMove(node, board) {
  bias = 0.015  // I put a random number here, to be tuned
  b = bias * bias / 0.25
  best_value = -1
  best_move = PASSMOVE
  for (move in board.allmoves) {
   c = node.child(move).counts
   w = node.child(move).wins
   rc = node.rave_counts[move]
   rw = node.rave_wins[move]
   coefficient = 1 - rc * (rc + c + rc * c * b)
   value = w / c * coef + rw / rc * (1 - coef)  // please here take
 care of the c==0 and rc == 0 cases
   if (value  best_value) {
 best_value = value
 best_move = move
   }
  }
  return best_move
 }
 PlayoutInTree(board, moves_played_in_tree, nodes_seen_in_tree) {
  node = tree.root
  while (node) {
   move = ChooseMove(node, board)
   moves_played_in_tree.append(move)
   nodes_seen_in_tree.append(node)
   board.PlayMove(move)
   node = node.child(move)
  }
 }
 PlayoutOutTree(board, AmafBoard played, moves) {
  int pass = 0
  while (pass  2) {
   move = board.ChooseMoveAndPlay()
   if (move == PASSMOVE) {
 pass ++
 continue
   } else {
 pass = 0
   }
   if (!played.AlreadyPlayed(move)) {
 if (!board.last_move_was_black()) {
   move = -move
 }
 moves.append(move)
   }
  }
  return board.black_wins()
 }

 BackupNode(node, index, black_wins, moves_played_in_tree,
 moves_played_out_tree, already_played) {
  already_played.Clear()
  win = node.is_black_to_play ? black_wins : !black_wins
  // normal backup
  node.wins += win
  node.counts ++
  // 

Re: [computer-go] How to properly implement RAVE?

2009-01-17 Thread Sylvain Gelly
Hi Issac,
You are welcome, and I am happy there is finally a clearer of implementing
RAVE out there. I believe I should have done it much earlier, sorry for
that, but better late than never, no? :)

Best,
Sylvain

2009/1/17 Isaac Deutsch i...@gmx.ch

  Hi,
 
  Sorry for the slow reply.

 Hi

 I'd prefer quality over speed anytime. ;)
 Your pseudo code is excellent and helped me to understand some of the
 trickier things. Thanks again!
 I think I will now be able to implement my own version. :)

 Regards,
 Isaac
 --
 Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen:
 http://www.gmx.net/de/go/multimessenger
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-17 Thread Magnus Persson

I think I found a bug in ChooseMove

Quoting Sylvain Gelly sylvain.ge...@m4x.org:


coefficient = 1 - rc * (rc + c + rc * c * b)


I think this has to be

coefficient = 1 - rc / (rc + c + rc * c * b)

thus when c is 0 (initially) the coefficient is 0
when c goes towards infinity the coefficent goes 1

which is how this function should behave.

Magnus

--
Magnus Persson
Berlin, Germany
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] How to properly implement RAVE?

2009-01-17 Thread Sylvain Gelly
Good catch :)Indeed it makes no sense with the *, sorry...

Sylvain

2009/1/17 Magnus Persson magnus.pers...@phmp.se

 I think I found a bug in ChooseMove

 Quoting Sylvain Gelly sylvain.ge...@m4x.org:

 coefficient = 1 - rc * (rc + c + rc * c * b)


 I think this has to be

 coefficient = 1 - rc / (rc + c + rc * c * b)

 thus when c is 0 (initially) the coefficient is 0
 when c goes towards infinity the coefficent goes 1

 which is how this function should behave.

 Magnus

 --
 Magnus Persson
 Berlin, Germany

 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-17 Thread Mark Boon


On Jan 17, 2009, at 5:41 PM, Sylvain Gelly wrote:

For the first difference you mention, as far as I remember it makes  
a small but significant difference and is one of the main reason I  
was talking about tricky details.


OK, I ran a test and after 1,000 games with 2K semi-light playouts I  
get a winning percentage of 50.6% for your methods vs. mine. Of course  
it's possible I made some mistake, but my first impression is it makes  
no difference which way you do this particular detail.


Your ChooseNode is also quite different from mine, mostly because I  
also still have a UCT component in there. I'll give your method a go  
one day, just to see if it changes anything.


I've come to understand what you mean by tricky details, sometimes I  
see a big difference in playing strength that I find hard to explain  
given the change(s) I made. Conversely I've been in quite a few cases  
where I thought something would make a difference, only to find out it  
all didn't matter one bit.


It's also possible that some deficiencies that would be apparent in  
one implementation, get compensated for in another.


Some examples: David Fotland wrote he does light playouts with just a  
few patterns but no tactics. I find that using a moderate amount of  
tactics actually is the biggest contributor to playing strength (save  
one or more stones if can't be caught in ladder). However, augmenting  
patterns with tactical information I found doesn't help at all, even  
when disregarding the performance cost. Maybe David uses some patterns  
to compensate for part of the tactics and relies on the faster  
playouts to compensate for poorer playouts. I'm guessing here, but  
otherwise I can't imagine why he would forego what otherwise seems to  
be a big gain in strength.


I also tried to use ownership maps to modify the RAVE value. Remi  
Coulom wrote in a paper he used ownership information of up to 63  
playouts. When I tried something similar it always makes play weaker.  
Maybe I should use it in a different way, but I haven't stumbled on  
the solution yet. When I think of it, AMAF information is already  
something very similar to ownership information. So maybe combining  
the two doesn't make much sense.


Lastly, in an earlier UCT bot that I made I gained a lot by initially  
reducing the number of moves and slowly expanding it. After using AMAF  
it turns out this method hardly gains anything at all anymore.


So the devil is not only in the details, it's also in the combination  
of the details.


Mark


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/