Daniel Bump wrote: >> There's no particular need to do anything drastic with the existing >> code. The only thing that matters in the short term for making GNU Go >> stronger is to catch up with the Monte Carlo development, especially >> make the Monte Carlo code scale better to larger boards. Key ideas to >> look into there are RAVE (rapid action value estimation), >> progressive widening, and improved move ordering. > > Improved move ordering may be mainly a tuning matter. > > What is the process for tuning the existing Monte Carlo code?
There are two different processes involved here. One is move ordering for choosing the next unexplored move while building the search tree, the second is the probability distribution when doing a Monte Carlo simulation from a leaf node of the tree. The latter can be controlled by tuning 3x3 patterns and this is fairly well described in doc/montecarlo.texi. The code related to this is on lines 918-1516 of montecarlo.c, most of the function mc_generate_random_move() in the same file, and patterns/mkmcpat.c. The former is not so well-structured. Let's take a guided tour through the code. Starting point is the function uct_play_move(). It first computes uct values for the child nodes (i.e. so far explored moves). At line 1952 it starts looking for an unexplored move. The comment is misleading since it doesn't just choose a random move but there are some move ordering going on. First, if no move at all has been explored yet, it asks mc_generate_random_move() for a suggestion. The point of this is that certain reactive moves, such as capturing a self atari by the opponent, tend to be highly valued in the tuning of the probability distributions. Otherwise the ordering is decided by the tree->move_ordering array. This array is managed by the two functions init_move_ordering() and update_move_ordering(), which keep track of a numerical score for each move. The init function is called at the setup in uct_genmove() and sets the move scores to "(int) (10 * potential_moves[pos]) - 1", i.e. it uses the move values from the traditional move generator for initial move ordering (also notice the "FIXME: Quick and dirty experiment"). This is successively modified by calls to update_move_ordering() when collecting statistics from completed games in uct_traverse_tree() such that moves in the tree on the winning side increases their score by one. This is currently not tunable but there's nothing stopping this move ordering from being determined by patterns instead of what's done now. > How can we see what the code is doing? Some information can be obtained by setting mc_debug = 1 on line 38 of montecarlo.c (notice the FIXME) and/or enabling the call to uct_dump_tree() on line 2187. But it's not really easy to visualize what the Monte Carlo search is doing. /Gunnar _______________________________________________ gnugo-devel mailing list [email protected] http://lists.gnu.org/mailman/listinfo/gnugo-devel

