Re: [computer-go] Re: Amsterdam 2007 paper
Hi John,

> You mean they are positions in which humans passed and scored the game?
> And you're just testing if the simulation correctly converts all
> territories into 1-point eyes, and kills all invasions?

Actually MoGo (10k simulations) labelled the positions, then a human checked the labels. The positions were labelled if and only if MoGo thought the winner was sure at 80%. That is why they are mostly end-game positions, but not completely finished games. You can have a look at the positions in sgf here (unfortunately, those sgf files do not record who is to play; the native database is not in sgf): http://www.lri.fr/~gelly/mogo/positions/

Maybe there are some mistakes in the labels, but if so they are few and the result holds (we have strong evidence).

Sylvain

___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Re: Amsterdam 2007 paper
hi Sylvain,

>> Figure 3 in your UCT paper shows the accuracy of different simulation policies.
>> Could you repeat these experiments for accuracy of win/loss determination only?
>
> Actually the labelled positions are rather end-game positions, and are labelled as 0/1 (loss/win). So we are already in the case you propose.

You mean they are positions in which humans passed and scored the game? And you're just testing if the simulation correctly converts all territories into 1-point eyes, and kills all invasions?

regards, -John
Re: [computer-go] Re: Amsterdam 2007 paper
Hello John,

Thank you for your interest.

> Figure 3 in your UCT paper shows the accuracy of different simulation policies.
> Could you repeat these experiments for accuracy of win/loss determination only?

Actually the labelled positions are rather end-game positions, and are labelled as 0/1 (loss/win). So we are already in the case you propose.

BTW, experiments in actual play using the "best" simulation policy with RLGO (by "best" I mean the one giving the best MSE) have also been done (given in one table, I don't remember which) and showed the relevance of the MSE measure: the winning rate vs GNU Go was around 9%, against 8% with the random simulation policy.

Sylvain

> So for each test position, you determine if it's won or lost under perfect play, and then see how close each policy gets to either 0% or 100% wins. One might think that this measure of accuracy is what actually matters to UCT, since it is not concerned with margin of victory. Is the advantage of Mogo's policy still as pronounced?
>
> regards, -John
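The two evaluation measures being discussed can be sketched in code. This is a minimal illustration (the function and variable names are mine, not from the paper), assuming each test position carries a 0/1 label and a policy's estimate is the fraction of its simulations that end in a win:

```python
def evaluate_policy(labels, estimates):
    """Compare a simulation policy's position estimates against 0/1 labels.

    labels[i]    -- 1 if position i is a win for the side to move, else 0
    estimates[i] -- fraction of the policy's simulations won from position i

    Returns (mse, accuracy): the mean squared error of the estimates (the
    measure used in Figure 3), and the fraction of positions where
    thresholding the estimate at 0.5 predicts the label (the win/loss
    determination measure John suggests)."""
    n = len(labels)
    mse = sum((e - l) ** 2 for l, e in zip(labels, estimates)) / n
    accuracy = sum((e > 0.5) == (l == 1) for l, e in zip(labels, estimates)) / n
    return mse, accuracy
```

A policy could in principle rank differently on the two measures, which is why repeating the Figure 3 experiment with the second one is an interesting question.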
Re: [computer-go] Re: Amsterdam 2007 paper
hi Sylvain & David,

Figure 3 in your UCT paper shows the accuracy of different simulation policies. Could you repeat these experiments for accuracy of win/loss determination only? So for each test position, you determine whether it's won or lost under perfect play, and then see how close each policy gets to either 0% or 100% wins.

One might think that this measure of accuracy is what actually matters to UCT, since it is not concerned with margin of victory. Is the advantage of Mogo's policy still as pronounced?

regards, -John
Re: [computer-go] Re: Amsterdam 2007 paper (final version)
Hi,

I have just updated my web page with the final version of my paper: http://remi.coulom.free.fr/Amsterdam2007/

I have tried to improve it based on all your comments and questions, and those of the workshop reviewer. I thank you all very much for your interesting remarks.

I have not included results against GNU Go with progressive widening and no patterns in the random simulations: patterns are so deeply hard-wired into my current random player that it would be too much work to produce such a version. The results with no patterns in the random simulations were obtained with the old version of Crazy Stone, before I added the pattern code.

So, I got 57% against GNU Go at 128 minutes/game, 1 CPU: the first time I win against GNU on 19x19! I still have a lot of progress to make in order to catch up with MoGo, though.

Thanks again,

Rémi
Re: [computer-go] Re: Amsterdam 2007 paper
Yamato wrote:
> Rémi, May I ask you some more questions?
>
> (1) You define Dj as Dj = Mij*ci + Bij. Is it not Aij but Bij? What does this mean?

Yes, it is! Thanks for pointing that mistake out.

> (2) You have relatively few shape patterns. How large is each pattern? 5x5, 7x7, or more?

I use radii 3, 4, 5, 6, 7, 8, 9, 10, according to the distance defined in Table 1. This looks very much like the patterns used by de Groot and the Microsoft guys, with some very small differences. With a radius of 10 according to my distance, the most distant point is 5 vertices away from the center.

I did not make a big effort to learn more patterns, and bigger ones, because I found that they do not improve playing strength. They improve the prediction rate a lot, but not playing strength. Crazy Stone is not significantly stronger with patterns of sizes 3 to 10 than it was with patterns of sizes 3 and 4 only. That may be because it is not efficient to use knowledge in the widening algorithm that is not available to the random simulations. Also, large patterns are useful only in the opening, not in the middle game where most crucial tactics take place.

> (3) You say "the nth move is added when 40*1.4^(n-2) simulations have been run." How did you determine these numbers?

I tried plenty of alternatives and kept what produced the best strength against GNU Go. Remarkably, I found that the same formula produces good strength whatever the size of the board. The alternatives I tried were linear widening (which really does not work), and changing the values of 40 and 1.4. Performance is not very sensitive to those values. I tuned them when I was using less clever patterns, so they may no longer be optimal.

Thank you very much for your feedback.

Rémi
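The widening schedule quoted above can be sketched as follows. This is an illustrative reconstruction of the formula from the paper, not Crazy Stone's actual code:

```python
def widened_moves(simulations, a=40.0, b=1.4):
    """Number of candidate moves considered after a given number of
    simulations, under the schedule "the nth move is added when
    a * b^(n-2) simulations have been run" (for n >= 2); the first
    move is always available.  a=40, b=1.4 are the paper's values."""
    n = 1
    threshold = a  # simulations required before move n+1 is unpruned
    while simulations >= threshold:
        n += 1
        threshold *= b
    return n
```

With these constants the candidate list grows only logarithmically in the simulation count, which is consistent with the remark that linear widening does not work.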
Re: [computer-go] Re: Amsterdam 2007 paper
Rémi,

May I ask you some more questions?

(1) You define Dj as Dj = Mij*ci + Bij. Is it not Aij but Bij? What does this mean?

(2) You have relatively few shape patterns. How large is each pattern? 5x5, 7x7, or more?

(3) You say "the nth move is added when 40*1.4^(n-2) simulations have been run." How did you determine these numbers?

Thanks

-- Yamato
Re: [computer-go] Re: Amsterdam 2007 paper
Yamato wrote:
> Thanks for the nice paper.
>
>> I will run cleaner tests for the final version of the paper. I will also try to test what each feature brings.
>
> I hope that you do 70k simulations like MoGo, since "90s/game 1CPU" depends on the simulation speed and the time distribution.
>
> And I have a question. In section 4.1, you wrote:
>
>> Contiguity to the previous move is a very strong feature (gamma = 23)
>
> Where has the number 23 come from?

It was learnt by the algorithm. I did not re-use the pattern weights presented in Table 1. I ran the learning algorithm again, using a set of light-weight features. Contiguity to the previous move got a gamma of 23.

Rémi
Re: [computer-go] Re: Amsterdam 2007 paper
John Tromp wrote:
> On 5/18/07, Rémi Coulom <[EMAIL PROTECTED]> wrote:
>> My idea was very similar to what you describe. The program built a collection of rules of the kind "if condition then move". The condition could be anything from a "tree-search rule" of the kind "in this particular position, play x", to a general rule such as "in atari, extend". It could also be anything in between, such as a miai specific to the current position. The strengths of moves were updated with an incremental Elo-rating algorithm, from the outcomes of random simulations.
>
> The obvious way to update weights is to reward all the rules that fired for the winning side, and penalize all rules that fired for the losing side, with rewards and penalties decaying toward the end of the playout. But this is not quite Elo-like, since it doesn't consider rules beating each other. So one could make the reward depend on the relative weight of the chosen rule versus all alternatives, increasing the reward if the alternatives carried a lot of weight. Is that how your ratings worked?

It is Elo-like in the generalized Bradley-Terry sense I describe in my paper: you have one team of one color beating one team of the other color. What I do exactly is compute the total Elo rating of black moves (with a decay so that clean-up moves don't count, and moves close to the root count more), and the total Elo rating of white moves. Then I compute the difference between the real outcome and the expected outcome according to the Elo ratings, and correct the Elo ratings proportionally to that difference.

Rémi
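The update Rémi describes can be sketched roughly like this. The learning rate k, the logistic 400-point Elo scale, and the uniform correction are my assumptions for illustration; the actual method additionally decays the contribution of moves toward the end of the playout:

```python
def elo_update(black_ratings, white_ratings, black_won, k=1.0):
    """One incremental update of Elo-like move strengths after a playout:
    the 'team' rating of each color is the sum of the ratings of the
    moves it played, the expected outcome comes from the usual logistic
    Elo curve, and every rating is corrected proportionally to
    (real outcome - expected outcome)."""
    r_black = sum(black_ratings)
    r_white = sum(white_ratings)
    # expected probability that black wins, on the conventional Elo scale
    expected = 1.0 / (1.0 + 10.0 ** ((r_white - r_black) / 400.0))
    error = (1.0 if black_won else 0.0) - expected
    new_black = [r + k * error for r in black_ratings]
    new_white = [r - k * error for r in white_ratings]
    return new_black, new_white
```

If black wins a playout that was already expected at 90%, the correction is small; an upset produces a large correction, exactly as in ordinary Elo rating.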
Re: [computer-go] Re: Amsterdam 2007 paper
Thanks for the nice paper.

> I will run cleaner tests for the final version of the paper. I will also
> try to test what each feature brings.

I hope that you do 70k simulations like MoGo, since "90s/game 1CPU" depends on the simulation speed and the time distribution.

And I have a question. In section 4.1, you wrote:

> Contiguity to the previous move is a very strong feature (gamma = 23)

Where has the number 23 come from?

-- Yamato
Re: [computer-go] Re: Amsterdam 2007 paper
On 5/18/07, Rémi Coulom <[EMAIL PROTECTED]> wrote:
> My idea was very similar to what you describe. The program built a collection of rules of the kind "if condition then move". The condition could be anything from a "tree-search rule" of the kind "in this particular position, play x", to a general rule such as "in atari, extend". It could also be anything in between, such as a miai specific to the current position. The strengths of moves were updated with an incremental Elo-rating algorithm, from the outcomes of random simulations.

The obvious way to update weights is to reward all the rules that fired for the winning side, and penalize all rules that fired for the losing side, with rewards and penalties decaying toward the end of the playout. But this is not quite Elo-like, since it doesn't consider rules beating each other. So one could make the reward depend on the relative weight of the chosen rule versus all alternatives, increasing the reward if the alternatives carried a lot of weight. Is that how your ratings worked?

I'm not sure how that compares with TD learning. Maybe someone more familiar with the latter can point out the differences.

regards, -John
Re: [computer-go] Re: Amsterdam 2007 paper
David Silver wrote:
> I would be very interested to hear more about Dimwit's approach, and also Remi's experiments with online learning in CrazyStone.

Hi,

My idea was very similar to what you describe. The program built a collection of rules of the kind "if condition then move". The condition could be anything from a "tree-search rule" of the kind "in this particular position, play x", to a general rule such as "in atari, extend". It could also be anything in between, such as a miai specific to the current position. The strengths of moves were updated with an incremental Elo-rating algorithm, from the outcomes of random simulations.

I did not go very far in that direction, and my rule-based program is still very weak. I found that I could bring very big improvements to Crazy Stone with the techniques I described in my paper, so I focused on that. I will incorporate my patterns into the rule-based program in the future.

I found that my rule-based program scaled extremely well with larger board sizes. What about yours?

Rémi
Re: [computer-go] Re: Amsterdam 2007 paper
On Fri, 2007-05-18 at 11:43 -0600, David Silver wrote:
> I also use an online learning algorithm in RLGO to adjust feature
> weights during the game. I use around a million features (all possible
> patterns from 1x1 up to 3x3 at all locations on the board) and update
> the weights online from simulated games using temporal difference
> learning. I also use the sum of feature weights to estimate the value
> of a move, rather than a multiplicative estimate. The learning signal
> is only win/lose at the end of each simulation, rather than supervised
> learning like Remi. The results are encouraging (currently ~1820 Elo
> on CGOS, based on 5000 simulations per move) for a program that does
> not use UCT or Monte-Carlo Tree Search in any way.

This is impressive! Does your program use an alpha/beta tree search? I'm not clear on how it selects a move.

- Don
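The scheme David describes can be sketched as follows: a linear value function over binary pattern features, updated by temporal-difference learning from self-play simulations. The step size and the undiscounted TD(0) form are illustrative assumptions, not RLGO's exact settings:

```python
from collections import defaultdict

def td_update(weights, features_t, features_next, reward=0.0,
              alpha=0.1, terminal=False):
    """One TD(0) step for a linear (additive) value function over binary
    features: the value of a position is the sum of the weights of the
    features present in it, and the only reward is win/lose at the end
    of a simulated game."""
    v_t = sum(weights[f] for f in features_t)
    v_next = 0.0 if terminal else sum(weights[f] for f in features_next)
    delta = reward + v_next - v_t      # TD error
    for f in features_t:               # credit every active feature
        weights[f] += alpha * delta
    return weights

weights = defaultdict(float)           # e.g. one weight per local pattern
```

With around a million pattern features, only the handful active in a given position need to be touched per step, which is what makes online learning during the game feasible.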
Re: [computer-go] Re: Amsterdam 2007 paper
Rémi Coulom wrote:
> to Magnus: If you consider the example of section 2.2: 1,2,3 wins against 4,2 and 1,5,6,7. The probability is P = c1c2c3/(c1c2c3 + c4c2 + c1c5c6c7). For this example:
>
> N1j = c2c3, B1j = 0, M1j = c2c3 + c5c6c7, A1j = c4c2
> N2j = c1c3, B2j = 0
> N3j = c1c2, B3j = 0
> N4j = 0, B4j = c1c2c3
>
> I will add this example to the paper if it makes things clearer.

I'd recommend not using N_ij as an arbitrary constant when N is used for a completely different purpose in the paper.

My other stumbling block was the jump to the f(x) equation. Calling it f(y_i) and putting the definition of W_i near there would help some. Writing

log(N_ij*y_i + B_ij) = delta(N_ij)*[log(N_ij) + log(y_i)] + delta(B_ij)*log(B_ij),

where delta(x) = 0 when x = 0 and 1 otherwise (the identity holds because for each term either N_ij or B_ij is zero), would make the jump very easy.
Re: [computer-go] Re: Amsterdam 2007 paper
Hi Rémi,

2007/5/17, Rémi Coulom <[EMAIL PROTECTED]>:
> to Sylvain: Here are tests of Crazy Stone at 90s/game 1CPU against GNU 3.6 level 10, measured over about 200 games [...]

Thank you for your answer. These numbers are interesting. The improvement from the tree search is really huge. It is what I expected, and it is consistent with my own experiments (different of course, but comparable, being in the same class). It is good to be consistent; at least we have a chance to understand something :-p.

Unfortunately I don't have time in the next weeks to read it really carefully and make (I hope) more interesting comments. But there are already so many interesting comments on the list ;-).

Again, thank you for this interesting work.

Cheers,

Sylvain
Re: [computer-go] Re: Amsterdam 2007 paper
On 5/17/07, Rémi Coulom <[EMAIL PROTECTED]> wrote:
> Álvaro Begué wrote:
>> There are many things in the paper that we had never thought of, like
>> considering the distance to the penultimate move.
>
> That feature improved the effectiveness of progressive widening a lot. When I had only the distance to the previous move, and the opponent made a dangerous move, Crazy Stone sometimes wanted to tenuki at the root, to move the focus of local search away from the danger. Considering the distance to the penultimate move fixes a lot of the problems of local search. Maybe I should try to consider distances to even older moves.

Nice paper, thanks! It's funny to see the distance to the last move now being used as an effective feature in Monte Carlo. Only a few years ago I got a lot of religious criticism when I used it in my move predictor...

Erik
Re: [computer-go] Re: Amsterdam 2007 paper
Yes, now I understand. I think what made it hard for me conceptually was that P(Rj) can be rewritten in n different ways, one for each feature ci, 1 <= i <= n. I had this problem with your example too: I first thought that the lines with the factors were arbitrary, but when I realized that each line goes with one way of rewriting P(Rj), it all became clear.

Quoting Rémi Coulom <[EMAIL PROTECTED]>:
> to Magnus: If you consider the example of section 2.2: 1,2,3 wins against 4,2 and 1,5,6,7. The probability is P = c1c2c3/(c1c2c3 + c4c2 + c1c5c6c7). For this example:
>
> N1j = c2c3, B1j = 0, M1j = c2c3 + c5c6c7, A1j = c4c2
> N2j = c1c3, B2j = 0
> N3j = c1c2, B3j = 0
> N4j = 0, B4j = c1c2c3
>
> I will add this example to the paper if it makes things clearer.
>
> Regarding the definition of Wi, |{Nij|Nij!=0}| means "number of elements of the set of all the Nij that verify Nij!=0". It is incorrect. It should be |{j|Nij!=0}|. I'll fix it in the final version. As I wrote in the next sentence, it is equal to the number of wins of i.
Re: [computer-go] Re: Amsterdam 2007 paper
Álvaro Begué wrote:
> There are many things in the paper that we had never thought of, like
> considering the distance to the penultimate move.

That feature improved the effectiveness of progressive widening a lot. When I had only the distance to the previous move, and the opponent made a dangerous move, Crazy Stone sometimes wanted to tenuki at the root, to move the focus of local search away from the danger. Considering the distance to the penultimate move fixes a lot of the problems of local search. Maybe I should try to consider distances to even older moves.

Also, the approach you devised with John seems to be very similar to what I do, indeed. I also worked on the idea of adjusting the random policy online to adapt to the current position, with a method that looks like incremental Elo-rating algorithms. I am deeply convinced that this can bring huge improvements. If the random policy only uses general patterns, whatever tree search we do at the root won't be able to make a strong Go player. We have to find a way to influence random simulations, not only inside the tree near the root, but also far from the root. Once we know how to do this effectively, we'll have very strong programs.

Rémi
Re: [computer-go] Re: Amsterdam 2007 paper
Rémi:

John and I are planning a similar usage of patterns and features in the MC simulations, although we were thinking of a very different method to update "strengths". Although we derived our formulas from an extension of logistic regression instead of Bradley-Terry models, we arrived at very similar expressions. The main difference is that what we have been calling "score" seems to be the log of your "strength", and instead of multiplying strengths when considering "teams", we just add the scores of the different features. We use an online learning algorithm to adjust the scores during the game, which severely limits the kind of algorithms we can use, but also allows us to learn things that apply specifically to the situations appearing on the board in this game. Now we only have to make dimwit 400 points stronger to show the world that our approach works. :)

There are many things in the paper that we had never thought of, like considering the distance to the penultimate move. Most of those things won't be directly applicable to dimwit, but it gives us many suggestions of things to try.

Thanks for this important contribution to computer go.

Álvaro.

On 5/17/07, Rémi Coulom <[EMAIL PROTECTED]> wrote:
> Hi,
>
> It seems that e-mail at my university does not work any more. I have received none of the replies to my message of yesterday, but I could read them on the web archives of the list. So I have registered from another address, and will answer the questions I have read on the web.
>
> to Matt: In Crazy Stone, UCT is still used as before to select moves. Patterns are used to shorten the list of moves that UCT considers.
>
> to Sylvain: Here are tests of Crazy Stone at 90s/game 1CPU against GNU 3.6 level 10, measured over about 200 games:
>
> no pattern: 38.2% (version available on my web page)
> patterns in simulations: 68.2%
> patterns for pruning and simulations: 90.6%
>
> I found these numbers in my history of Crazy Stone versions, so they may not be extremely accurate, because other things have changed in between. I will run cleaner tests for the final version of the paper. I will also try to test what each feature brings.
>
> to Magnus: If you consider the example of section 2.2: 1,2,3 wins against 4,2 and 1,5,6,7. The probability is P = c1c2c3/(c1c2c3 + c4c2 + c1c5c6c7). For this example:
>
> N1j = c2c3, B1j = 0, M1j = c2c3 + c5c6c7, A1j = c4c2
> N2j = c1c3, B2j = 0
> N3j = c1c2, B3j = 0
> N4j = 0, B4j = c1c2c3
>
> I will add this example to the paper if it makes things clearer.
>
> Regarding the definition of Wi, |{Nij|Nij!=0}| means "number of elements of the set of all the Nij that verify Nij!=0". It is incorrect. It should be |{j|Nij!=0}|. I'll fix it in the final version. As I wrote in the next sentence, it is equal to the number of wins of i.
>
> Thanks to all for the kind words, and the interesting remarks and questions.
>
> Rémi
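The worked Bradley-Terry example quoted in this message can be checked numerically: the decomposition with respect to one feature must give the same probability as the direct formula. A small check (the gamma values are arbitrary, chosen only for the demonstration):

```python
# gammas c1..c7 for the seven features; arbitrary positive values
c = {i: g for i, g in zip(range(1, 8), [2.0, 0.5, 1.5, 3.0, 0.8, 1.2, 0.7])}

# Direct formula: P = c1c2c3 / (c1c2c3 + c4c2 + c1c5c6c7)
p_direct = (c[1] * c[2] * c[3]) / (
    c[1] * c[2] * c[3] + c[4] * c[2] + c[1] * c[5] * c[6] * c[7])

# Decomposition with respect to feature 1:
# P = (N1j*c1 + B1j) / (M1j*c1 + A1j), with the values given above
N1, B1 = c[2] * c[3], 0.0
M1, A1 = c[2] * c[3] + c[5] * c[6] * c[7], c[4] * c[2]
p_decomp = (N1 * c[1] + B1) / (M1 * c[1] + A1)

assert abs(p_direct - p_decomp) < 1e-12
```

The same check can be repeated for features 2, 3, and 4 with the other rows of the table: each row is one way of rewriting the same probability.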
Re: [computer-go] Re: Amsterdam 2007 paper
> It seems that e-mail at my university does not work any more. I have
> received none of the replies to my message of yesterday, but I could
> read them on the web archives of the list. So I have registered from
> another address, and will answer the questions I have read on the web.

In section 2.3, you say:

> Also, the generalization to teams assumes that the strength of a team is the sum (in terms of Elo ratings) of the strengths of its members.

But it seems from the equation just above that section that you mean "the product" and not "the sum". Or did I misunderstand something?

Never mind. I just remembered that an Elo rating is a log of y. So "the sum" makes sense now.
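The point resolved above is just that taking logs turns the product of strengths into a sum of Elo-like ratings. A two-line demonstration (the gamma values are arbitrary):

```python
import math

# Strengths (gammas) of the members of one "team" of features
gammas = [2.0, 0.5, 1.5]

# Multiplicative team strength, as in the generalized Bradley-Terry model
team_strength = math.prod(gammas)

# Elo-like ratings are proportional to the log of the strength
# (the conventional Elo scale uses 400/ln(10); any positive constant works)
scale = 400.0 / math.log(10.0)
ratings = [scale * math.log(g) for g in gammas]

# The team's rating is then the sum of its members' ratings
team_rating = sum(ratings)
assert abs(team_rating - scale * math.log(team_strength)) < 1e-9
```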
Re: [computer-go] Re: Amsterdam 2007 paper
It seems that e-mail at my university does not work any more. I have received none of the replies to my message of yesterday, but I could read them on the web archives of the list. So I have registered from another address, and will answer the questions I have read on the web.

In section 2.3, you say:

> Also, the generalization to teams assumes that the strength of a team is the sum (in terms of Elo ratings) of the strengths of its members.

But it seems from the equation just above that section that you mean "the product" and not "the sum". Or did I misunderstand something?