I find Monte Carlo Go a fascinating avenue of research, but what pains me is that a huge number of simulations are performed in each game, and at the end of the game the results are thrown out. So what I was thinking is that perhaps the knowledge generated by the simulations could be collapsed in some way.
Suppose that epsilon-greedy versions of a reasonably strong MC Go program were to play a very large number of games against themselves. By epsilon-greedy I mean that with probability epsilon a random move is played, and with probability 1 - epsilon the move the MC player would normally choose is played. Each position in these games would be stored along with the Monte Carlo/UCT evaluation of that position's desirability. This would produce an arbitrarily large database of position/score pairs.

At this point a general function approximator / learning algorithm (such as a neural network) could be trained to map positions to scores. If this were successful, it would produce something that could map positions to scores very quickly (even a large neural net evaluation, or what have you, would be much faster than doing a large number of MC playouts). Obviously the scores would not be perfect, since the Monte Carlo program did not play anywhere near perfect Go. But this static evaluator could then be plugged back into the Monte Carlo player and used to bias the random playouts. Wouldn't it be useful to be able to quickly estimate the MC score without doing any playouts? Clearly this idea could be extended recursively with a lot of offline training.

What makes this formulation attractive is that, given enough time and effort, someone familiar with machine learning should be able to produce a learning architecture that can actually learn the MC scores. It would be a straightforward, if potentially quite difficult, supervised learning task with effectively unlimited data, since more could be generated at will. Such a learning architecture could be used in the manner I described above or thrown at the more general reinforcement learning problem.

Does anyone have any thoughts on this idea? Does anyone know of it being tried before?
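
For concreteness, the data-generation step described above could be sketched roughly as follows. This is only a minimal illustration, not an implementation from any existing program: the hooks legal_moves_fn, play_fn and mc_search_fn are hypothetical stand-ins for whatever the MC/UCT program already provides, and mc_search_fn is assumed to return both its preferred move and its score for the position.

```python
import random


def epsilon_greedy_move(legal_moves, mc_best_move, epsilon, rng):
    """With probability epsilon play a uniformly random legal move,
    otherwise play the move the MC player would normally choose."""
    if rng.random() < epsilon:
        return rng.choice(legal_moves)
    return mc_best_move


def generate_dataset(initial_position, legal_moves_fn, play_fn, mc_search_fn,
                     n_games, epsilon, seed=0):
    """Play n_games of epsilon-greedy self-play and record
    (position, MC score) pairs for later supervised training.

    Hypothetical hooks (assumptions, not a real API):
      legal_moves_fn(position) -> list of moves (empty when the game is over)
      play_fn(position, move)  -> the resulting position
      mc_search_fn(position)   -> (best_move, score) from the MC/UCT search
    """
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_games):
        position = initial_position
        while True:
            moves = legal_moves_fn(position)
            if not moves:  # no legal moves: the game is over
                break
            best_move, score = mc_search_fn(position)
            dataset.append((position, score))  # one training example
            move = epsilon_greedy_move(moves, best_move, epsilon, rng)
            position = play_fn(position, move)
    return dataset
```

The resulting list of position/score pairs is what the function approximator would be trained on; how positions are encoded as inputs is left open here.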
- George