[computer-go] published stats on string merge frequencies, etc
Hi all,

I've been looking through the archives trying to find out if anybody had published information on the following during uniform random playouts and/or real games:

- Frequency of string merges
- Frequency of placing stones in empty space
- Average length of strings, etc.

I noticed that quite a few people had experimented; has anybody put their stats up anywhere on the web, or in a paper?

regards,
Christian
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
[computer-go] cgos: Donn Daily?
Donn, your email at d...@mit.edu is bouncing.
Re: [computer-go] Monte-Carlo Simulation Balancing
Hi Yamato,

> Could you give us the source code which you used? Your algorithm is too
> complicated, so it would be very helpful if possible.

Actually, I think the source code would be much harder to understand! It is written inside RLGO, and makes use of a substantial existing framework that would take a lot of effort to understand. (On a separate note, I am considering making RLGO open source at some point, but I'd prefer to spend some effort cleaning it up before making it public.)

But I think maybe Algorithm 1 is much easier than you think:

A: Estimate the value V* of every position in a training set, using deep rollouts.
B: Repeat, for each position in the training set:
  1. Run M simulations, estimate the value of the position (call this V).
  2. Run another N simulations, average the value of psi(s,a) over all positions and moves in games that black won (call this g).
  3. Adjust the parameters by alpha * (V* - V) * g.

The feature vector is the set of patterns you use, with value 1 if a pattern is matched and 0 otherwise. The simulation policy selects actions in proportion to the exponentiated, weighted sum of all matching patterns. For example, let's say move a matches patterns 1 and 2, move b matches patterns 1 and 3, and move c matches patterns 2 and 4. Then move a would be selected with probability

  e^(theta1 + theta2) / (e^(theta1 + theta2) + e^(theta1 + theta3) + e^(theta2 + theta4))

The theta values are the weights on the patterns which we would like to learn. They are the log of the Elo ratings in Remi Coulom's approach.

The only tricky part is computing the vector psi(s,a). Each component of psi(s,a) corresponds to a particular pattern, and is the difference between the observed feature (i.e. whether the pattern actually occurred after move a in position s) and the expected feature (the average value of the pattern, weighted by the probability of selecting each action).
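The softmax policy and the psi(s,a) computation described above can be sketched in a few lines. This is a minimal illustration only: the dict-of-pattern-tuples representation, the function names, and the gradient-step helper are my own assumptions, not code from RLGO.

```python
import math

def policy_probs(theta, moves):
    """Select each move in proportion to exp(sum of theta over its
    matching patterns): the softmax simulation policy."""
    scores = {m: math.exp(sum(theta[p] for p in pats))
              for m, pats in moves.items()}
    total = sum(scores.values())
    return {m: s / total for m, s in scores.items()}

def psi(theta, moves, chosen):
    """psi(s, a): for each pattern, the observed feature (did the pattern
    occur for the chosen move?) minus the expected feature (pattern
    occurrence averaged over the policy's move probabilities)."""
    probs = policy_probs(theta, moves)
    result = {}
    for p in {p for pats in moves.values() for p in pats}:
        observed = 1.0 if p in moves[chosen] else 0.0
        expected = sum(probs[m] for m, pats in moves.items() if p in pats)
        result[p] = observed - expected
    return result

def adjust(theta, g, v_star, v, alpha):
    """Step B.3: theta += alpha * (V* - V) * g."""
    for p, gp in g.items():
        theta[p] += alpha * (v_star - v) * gp

# The worked example from the text: move a matches patterns 1 and 2,
# move b matches 1 and 3, move c matches 2 and 4.
moves = {"a": (1, 2), "b": (1, 3), "c": (2, 4)}
theta = {p: 0.0 for p in (1, 2, 3, 4)}
print(policy_probs(theta, moves))  # all weights zero: each move has probability 1/3
print(psi(theta, moves, "a"))      # patterns 1, 2: 1 - 2/3; patterns 3, 4: 0 - 1/3
```

With equal weights each move is chosen with probability 1/3, so psi for move a is +1/3 on its own patterns and -1/3 on the others; adjust() then nudges the weights in the direction that makes the simulated value track V*.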
It's also very important to be careful about signs and the colour to play - it's easy to make a mistake and follow the gradient in the wrong direction.

Is that any clearer?
-Dave
Re: [computer-go] Monte-Carlo Simulation Balancing
David Silver wrote:
> Hi Michael,
>
>> But one thing confuses me: you are using the value from Fuego's 10k
>> simulations as an approximation of the actual value of the position.
>> But isn't the actual value of the position either a win or a loss? On
>> such small boards, can't you assume that Fuego is able to correctly
>> determine who is winning, and round its evaluation to the nearest
>> win/loss? i.e. if it evaluates the position to 0.674, that gets rounded
>> to 1. If such an assumption about Fuego's ability to read the position
>> on a small board is valid, then it should improve the results of your
>> balanced simulation strategy, right? Or am I missing something?
>
> It's true that 5x5 Go is solved, so in principle we could have used the
> true minimax values. However, we chose to use an approach that can scale
> to larger boards, which means that we should treat the expert
> evaluations as approximate. And in fact Fuego was not always accurate on
> 6x6 boards, as we used only 10k simulations in our training set.
>
> Also, I think it really helps to have soft rather than hard expert
> evaluations. We want a simulation policy that helps differentiate e.g. a
> 90% winning position from an 85% winning position. Rounding all the
> expert evaluations to 0 or 1 would lose much of this advantage.
>
> -Dave

By this argument (your last paragraph), you need to run some magical number of simulations for the training data. Not enough simulations and you have too much noise; infinite simulations gives you hard 0 or 1 results. But I can't argue with your results.
[computer-go] Re: [computer go] Monte-Carlo Simulation Balancing
Hi Remi,

> What komi did you use for 5x5 and 6x6?

I used 7.5 komi for both board sizes.

> I find it strange that you get only 70 Elo points from supervised
> learning over uniform random. Don't you have any feature for atari
> extension? This one alone should improve strength immensely (extend
> string in atari because of the previous move).

Actually, no. The features are very simple, and know how to capture but not how to defend ataris. I'm sure that a better set of features could improve by more than 70 Elo, but I expect we would still see a benefit to balancing the weights correctly. For example, the Fuego policy defends ataris and follows several other common-sense rules, but the results in 5x5 and 6x6 show that it is not well balanced on small boards.

> Let us extrapolate: I got more than 200 Elo points of improvement from
> my patterns in 9x9 over uniform random (I never really measured that,
> it may be even more than 200).

I guess you got more than 200 Elo on 9x9; in Mogo (Gelly et al. 2006) the improvement from uniform random was at least 400 Elo, and I think your simulation policy is probably at least as strong.

By the way, I was sad to hear you're not working on Crazystone any more. Is it because you are busy with other projects?

> So maybe I could get 600 more Elo points with your method. And even
> more on 19x19. I noticed that, in general, changes in the playout
> policy have a much bigger impact on larger boards than on smaller
> boards.

As to whether we can extrapolate, I hope so :-) I share the same feeling that improving the simulation policy will be more impactful on bigger boards with longer simulations. On the other hand, I've been surprised by many things in MC Go, so nothing is certain until we try it!

-Dave
Re: [computer-go] Monte-Carlo Simulation Balancing
David Silver wrote:
> A: Estimate the value V* of every position in a training set, using deep
> rollouts.
> B: Repeat, for each position in the training set:
>   1. Run M simulations, estimate the value of the position (call this V).
>   2. Run another N simulations, average the value of psi(s,a) over all
>      positions and moves in games that black won (call this g).
>   3. Adjust the parameters by alpha * (V* - V) * g.

Thanks for the detailed explanation. M, N and alpha are constant numbers, right? What did you set them to?

> The feature vector is the set of patterns you use, with value 1 if a
> pattern is matched and 0 otherwise. The simulation policy selects
> actions in proportion to the exponentiated, weighted sum of all
> matching patterns. For example, let's say move a matches patterns 1 and
> 2, move b matches patterns 1 and 3, and move c matches patterns 2 and
> 4. Then move a would be selected with probability e^(theta1 + theta2) /
> (e^(theta1 + theta2) + e^(theta1 + theta3) + e^(theta2 + theta4)). The
> theta values are the weights on the patterns which we would like to
> learn. They are the log of the Elo ratings in Remi Coulom's approach.

OK, I guess it is formula 5 in the paper.

> The only tricky part is computing the vector psi(s,a). Each component
> of psi(s,a) corresponds to a particular pattern, and is the difference
> between the observed feature (i.e. whether the pattern actually
> occurred after move a in position s) and the expected feature (the
> average value of the pattern, weighted by the probability of selecting
> each action).

I still don't understand this. Is it formula 6? Could you please give me an example like the one above?

--
Yamato