Re: [computer-go] Re: Amsterdam 2007 paper

2007-06-11 Thread Sylvain Gelly

Hi John,


You mean they are positions in which humans passed and scored the game?
And you're just testing if the simulation correctly converts all
territories into 1-point eyes,
and kills all invasions?


Actually MoGo (10k simulations) labelled the positions, and then a human
checked the labels. A position was labelled if and only if MoGo was at
least 80% sure of the winner. That is why these are mostly endgame
positions, though not completely finished games.
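
(In code, the labelling rule amounts to something like this minimal
sketch; the function and its interface are made up for illustration,
not MoGo's actual code:)

def label_position(mogo_win_rate, threshold=0.8):
    # Keep a position only if MoGo (10k simulations) is at least 80%
    # sure of the winner; otherwise it stays out of the test set.
    if mogo_win_rate >= threshold:
        return 1    # labelled a win for the side to move
    if mogo_win_rate <= 1.0 - threshold:
        return 0    # labelled a loss for the side to move
    return None     # too uncertain: position is not labelled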

You can have a look at the positions in SGF here (unfortunately those
SGF files do not say who is to play; the native database is not in
SGF):
http://www.lri.fr/~gelly/mogo/positions/

There may be some mistakes in the labels, but if so they are few and
the result holds (we have strong evidence).

Sylvain


Re: [computer-go] Re: Amsterdam 2007 paper

2007-06-11 Thread John Tromp

hi Sylvain,


> Figure 3 in your UCT paper shows the accuracy of different simulation policies.
> Could you repeat these experiments for accuracy of win/loss determination only?

Actually the labelled positions are mostly endgame positions, and are
labelled as 0/1 (loss/win). So we are already in the case you propose.


You mean they are positions in which humans passed and scored the game?
And you're just testing if the simulation correctly converts all
territories into 1-point eyes,
and kills all invasions?

regards,
-John


Re: [computer-go] Re: Amsterdam 2007 paper

2007-06-11 Thread Sylvain Gelly

Hello John,

Thank you for your interest.


Figure 3 in your UCT paper shows the accuracy of different simulation policies.
Could you repeat these experiments for accuracy of win/loss determination only?

Actually the labelled positions are mostly endgame positions, and are
labelled as 0/1 (loss/win). So we are already in the case you propose.

BTW, experiments in actual play using the "best" simulation policy with
RLGO (by "best" I mean the one giving the best MSE) have also been done
(reported in one of the tables, I don't remember which) and showed the
relevance of the MSE measure (the winning rate vs GNU Go was around 9%,
against 8% with the random simulation policy).

Sylvain


So for each test position, you determine if it's won or lost under perfect play,
and then see how close each policy gets to either 0% or 100% wins.
One might think that this measure of accuracy is what actually matters to UCT,
since it is not concerned with margin of victory.
Is the advantage of Mogo's policy still as pronounced?

regards,
-John


Re: [computer-go] Re: Amsterdam 2007 paper

2007-06-11 Thread John Tromp

hi Sylvain & David,

Figure 3 in your UCT paper shows the accuracy of different simulation policies.
Could you repeat these experiments for accuracy of win/loss determination only?
So for each test position, you determine if it's won or lost under perfect play,
and then see how close each policy gets to either 0% or 100% wins.
One might think that this measure of accuracy is what actually matters to UCT,
since it is not concerned with margin of victory.
Is the advantage of Mogo's policy still as pronounced?
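
(Concretely, the measurement I have in mind looks like this rough
sketch; the playout and position interfaces are hypothetical:)

def winloss_accuracy(playout, positions, n_sims=1000):
    # positions: list of (position, label) pairs, label 1 if the
    # position is won under perfect play, 0 if lost.  playout(pos)
    # runs one simulation with the policy under test and returns 1
    # on a win for the side the label refers to.
    total_error = 0.0
    for pos, label in positions:
        wins = sum(playout(pos) for _ in range(n_sims))
        total_error += abs(wins / n_sims - label)
    return total_error / len(positions)  # 0 = perfect 0%/100% calibration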

regards,
-John


Re: [computer-go] Re: Amsterdam 2007 paper (final version)

2007-05-22 Thread Rémi Coulom

Hi,

I have just updated my web page with the final version of my paper:
http://remi.coulom.free.fr/Amsterdam2007/
I have tried to improve it based on all your comments and questions, and 
those of the workshop reviewer. I thank you all very much for your 
interesting remarks.


I have not included results against GNU Go with progressive widening and
no patterns in the random simulations: patterns are so deeply hard-wired
into my current random player that it would be too much work to produce
such a version. The results with no patterns in the random simulations
were obtained with the old version of Crazy Stone, before I added the
pattern stuff.


So, I got 57% against GNU Go at 128 minutes/game on 1 CPU: the first
time I have won against GNU on 19x19! I still have a lot of progress to
make in order to catch up with MoGo, though.


Thanks again,

Rémi


Re: [computer-go] Re: Amsterdam 2007 paper

2007-05-22 Thread Rémi Coulom

Yamato wrote:

Rémi,

May I ask you some more questions?

(1) You define Dj as Dj = Mij*ci + Bij. Is it not Aij instead of Bij?
What does this mean?

Yes, it is! Thanks for pointing that mistake out.

(2) You have relatively few shape patterns. How large is each
pattern?  5x5, 7x7, or more?
  
I use radii 3, 4, 5, 6, 7, 8, 9 and 10, according to the distance
defined in Table 1. They look very much like the patterns used by de
Groot and the Microsoft guys, with some very small differences. With a
radius of 10 according to my distance, the most distant point is 5
vertices away from the center.


I did not make a big effort to learn more patterns, or bigger ones,
because I found that they do not improve playing strength. They improve
the prediction rate a lot, but not playing strength. Crazy Stone is not
significantly stronger with patterns of sizes 3 to 10 than it was with
patterns of sizes 3 and 4 only.


That may be because it is not efficient to use knowledge in the widening 
algorithm that is not available to the random simulations. Also, large 
patterns are useful only in the opening, not in the middle game where 
most crucial tactics take place.

(3) You say "the nth move is added when 40*1.4^(n-2) simulations
have been run." How did you determine these numbers?
  
I tried plenty of alternatives and kept what produced the best strength
against GNU Go. Remarkably, I found that the same formula produces good
strength whatever the size of the board. The alternatives I tried were
linear widening (which really does not work) and changing the values of
40 and 1.4. Performance is not very sensitive to those values. I tuned
them when I was using less clever patterns, so they may not be optimal.
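
(In code, the schedule amounts to something like the sketch below;
that exactly one move is searched before the first threshold is my
assumption:)

import math

def moves_considered(simulations, a=40.0, b=1.4):
    # The nth candidate move (n >= 2) is unpruned once a * b**(n - 2)
    # simulations have been run at the node.
    if simulations < a:
        return 1
    return 2 + int(math.floor(math.log(simulations / a, b)))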


Thank you very much for your feedback.

Rémi


Re: [computer-go] Re: Amsterdam 2007 paper

2007-05-21 Thread Yamato
Rémi,

May I ask you some more questions?

(1) You define Dj as Dj = Mij*ci + Bij. Is it not Aij instead of Bij?
What does this mean?

(2) You have relatively few shape patterns. How large is each
pattern?  5x5, 7x7, or more?

(3) You say "the nth move is added when 40*1.4^(n-2) simulations
have been run." How did you determine these numbers?

Thanks

--
Yamato


Re: [computer-go] Re: Amsterdam 2007 paper

2007-05-19 Thread Rémi Coulom

Yamato wrote:

Thanks for the nice paper.

  
I will run cleaner tests for the final version of the paper. I will also 
try to test what each feature brings.



I hope that you do 70k simulations like MoGo, since "90s/game 1CPU"
depends on the simulation speed and the time distribution.

And I have a question. In section 4.1, you wrote:

> Contiguity to the previous move is a very strong feature (gamma = 23)

Where has the number 23 come from?


It was learnt by the algorithm. I did not re-use the pattern weights
presented in Table 1. I ran the learning algorithm again, using a set of
light-weight features. Contiguity to the previous move got a gamma of 23.
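
(For scale, assuming the usual conversion between Bradley-Terry
strengths and Elo ratings, a gamma of 23 is worth roughly:)

    E(\gamma) = 400 \log_{10} \gamma
    \quad\Longrightarrow\quad
    E(23) = 400 \log_{10} 23 \approx 545 \text{ Elo points}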


Rémi


Re: [computer-go] Re: Amsterdam 2007 paper

2007-05-18 Thread Rémi Coulom

John Tromp wrote:

On 5/18/07, Rémi Coulom <[EMAIL PROTECTED]> wrote:


My idea was very similar to what you describe. The program built a
collection of rules of the kind "if condition then move". The condition
could be anything from a "tree-search rule" of the kind "in this
particular position play x" to a general rule such as "in atari,
extend". It could also be anything in between, such as a miai specific
to the current position. The strengths of moves were updated with an
incremental Elo-rating algorithm, from the outcomes of random
simulations.


The obvious way to update weights is to reward all the rules that fired
for the winning side, and penalize all rules that fired for the losing
side, with rewards and penalties decaying toward the end of the playout.
But this is not quite Elo-like, since it doesn't consider rules to beat
each other. So one could make the reward dependent on the relative
weight of the chosen rule versus all alternatives, increasing the reward
if the alternatives carried a lot of weight.
Is that how your ratings worked?

It is Elo-like in the generalized Bradley-Terry sense I describe in my 
paper: you have one team of one color beating one team of the other 
color. What I do exactly is compute the total Elo rating of black moves 
(with a decay so that clean-up moves don't count, and moves close to the 
root count more), and the total Elo rating of white moves. Then I 
compute the difference between the real outcome and the expected outcome 
according to Elo ratings, and correct Elo ratings proportionally to that 
difference.
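
(A minimal sketch of that update; the decay form, the learning rate
and the use of the standard Elo logistic are my assumptions:)

def update_move_ratings(rating, black_moves, white_moves, black_won,
                        lr=0.02, decay=0.99):
    # Total the decayed Elo ratings of each side's moves: moves near
    # the root get weight ~1, late clean-up moves get almost none.
    def total(moves):
        return sum(rating[m] * decay ** k for k, m in enumerate(moves))
    diff = total(black_moves) - total(white_moves)
    expected = 1.0 / (1.0 + 10.0 ** (-diff / 400.0))  # Elo expectation
    error = (1.0 if black_won else 0.0) - expected
    # Correct every rating proportionally to the surprise, with the
    # same decay applied to the size of the correction.
    for k, m in enumerate(black_moves):
        rating[m] += lr * error * decay ** k
    for k, m in enumerate(white_moves):
        rating[m] -= lr * error * decay ** k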


Rémi


Re: [computer-go] Re: Amsterdam 2007 paper

2007-05-18 Thread Yamato
Thanks for the nice paper.

> I will run cleaner tests for the final version of the paper. I will also 
> try to test what each feature brings.

I hope that you do 70k simulations like MoGo, since "90s/game 1CPU"
depends on the simulation speed and the time distribution.

And I have a question. In section 4.1, you wrote:

> Contiguity to the previous move is a very strong feature (gamma = 23)

Where has the number 23 come from?

--
Yamato


Re: [computer-go] Re: Amsterdam 2007 paper

2007-05-18 Thread John Tromp

On 5/18/07, Rémi Coulom <[EMAIL PROTECTED]> wrote:


My idea was very similar to what you describe. The program built a
collection of rules of the kind "if condition then move". The condition
could be anything from a "tree-search rule" of the kind "in this
particular position play x" to a general rule such as "in atari,
extend". It could also be anything in between, such as a miai specific
to the current position. The strengths of moves were updated with an
incremental Elo-rating algorithm, from the outcomes of random
simulations.


The obvious way to update weights is to reward all the
rules that fired for the winning side, and penalize all rules that fired for
the losing side, with rewards and penalties decaying toward the end
of the playout. But this is not quite Elo-like, since it doesn't consider rules
to beat each other. So one could make the reward dependent on the relative
weight of the chosen rule versus all alternatives, increasing the reward if the
alternatives carried a lot of weight.
Is that how your ratings worked?

I'm not sure how that compares with TD learning. Maybe someone more
familiar with the latter can point out the differences.

regards,
-John

Re: [computer-go] Re: Amsterdam 2007 paper

2007-05-18 Thread Rémi Coulom

David Silver wrote:


I would be very interested to hear more about Dimwit's approach, and 
also Remi's experiments with online learning in CrazyStone.




Hi,

My idea was very similar to what you describe. The program built a
collection of rules of the kind "if condition then move". The condition
could be anything from a "tree-search rule" of the kind "in this
particular position play x" to a general rule such as "in atari,
extend". It could also be anything in between, such as a miai specific
to the current position. The strengths of moves were updated with an
incremental Elo-rating algorithm, from the outcomes of random
simulations.


I did not go very far in that direction, and my rule-based program is 
still very weak.  I found that I could bring very big improvements to 
Crazy Stone with the techniques I described in my paper, so I focused on 
that. I will incorporate my patterns into the rule-based program in the 
future.


I found that my rule-based program scaled extremely well with larger 
board sizes. What about yours?


Rémi


Re: [computer-go] Re: Amsterdam 2007 paper

2007-05-18 Thread Don Dailey
On Fri, 2007-05-18 at 11:43 -0600, David Silver wrote:
> I also use an online learning algorithm in RLGO to adjust feature
> weights during the game. I use around a million features (all possible
> patterns from 1x1 up to 3x3 at all locations on the board) and update
> the weights online from simulated games using temporal difference
> learning. I also use the sum of feature weights to estimate the value
> of a move, rather than a multiplicative estimate. The learning signal
> is only win/lose at the end of each simulation, rather than supervised
> learning like Remi. The results are encouraging (currently ~1820 Elo
> on CGOS, based on 5000 simulations per move) for a program that does
> not use UCT or Monte-Carlo Tree Search in any way.  
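
(A minimal sketch of the TD(0) update described above, linear in
binary features; the learning rate, any eligibility traces and any
squashing that RLGO actually uses are unknown to me:)

def td_update(weights, feature_sets, outcome, alpha=0.1):
    # feature_sets[t]: indices of the binary features active in the
    # t-th position of one simulated game; a position's value is the
    # plain sum of their weights.  The only reward is the final
    # outcome (1 = win, 0 = loss).
    def value(s):
        return sum(weights[i] for i in s)
    for t in range(len(feature_sets) - 1):
        delta = value(feature_sets[t + 1]) - value(feature_sets[t])
        for i in feature_sets[t]:
            weights[i] += alpha * delta   # bootstrap on the next state
    delta = outcome - value(feature_sets[-1])
    for i in feature_sets[-1]:
        weights[i] += alpha * delta       # ground the last state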

This is impressive!   Does your program use an alpha/beta tree search?
I'm not clear on how it selects a move.

- Don




Re: [computer-go] Re: Amsterdam 2007 paper

2007-05-17 Thread Jason House

Rémi Coulom wrote:
to Magnus: If you consider the example of section 2.2: 1,2,3 wins 
against 4,2 and 1,5,6,7. The probability is 
P=c1c2c3/(c1c2c3+c4c2+c1c5c6c7). For this example:

N1j = c2c3, B1j = 0, M1j = c2c3 + c5c6c7, A1j = c4c2
N2j = c1c3, B2j = 0
N3j = c1c2, B3j = 0
N4j = 0, B4j = c1c2c3
I will add this example to the paper if it makes things clearer.


I'd recommend not using N_ij as an arbitrary constant when N is used for 
a completely different purpose in the paper. 

My other stumbling block was the jump to the f(x) equation.  Calling it 
f(y_i) and putting the definition of W_i near there would help some. 

Putting log(N_ij*y_i + B_ij) = delta(N_ij)*[log(N_ij) + log(y_i)] +
delta(B_ij)*log(B_ij), with delta(x) = 1 when x != 0 and 0 otherwise,
would make the jump very easy.



Re: [computer-go] Re: Amsterdam 2007 paper

2007-05-17 Thread Sylvain Gelly

Hi Rémi,

2007/5/17, Rémi Coulom <[EMAIL PROTECTED]>:

to Sylvain: Here are tests of Crazy Stone at 90s/game 1CPU against GNU
3.6 level 10, measured over about 200 games
[...]


Thank you for your answer. These numbers are interesting.
The improvement in the tree search is really huge. It is what I
expected and is consistent with my own experiments (different of
course, but comparing in the same class).
That is good to be consistent, at least we have a chance to understand
something :-p.

Unfortunately I don't have time in the next weeks to read it really
carefully and make (I hope) more interesting comments. But there are
already so many interesting comments on the list ;-).

Again thank you for this interesting work.

Cheers,
Sylvain


Re: [computer-go] Re: Amsterdam 2007 paper

2007-05-17 Thread Erik van der Werf

On 5/17/07, Rémi Coulom <[EMAIL PROTECTED]> wrote:

Álvaro Begué wrote:
> There are many things in the paper that we had never thought of, like
> considering the distance to the penultimate move.

That feature improved the effectiveness of progressive widening a lot.
When I had only the distance to the previous move, and the opponent made
a dangerous move, Crazy Stone sometimes wanted to tenuki at the root, to
move the focus of local search away from the danger. Considering the
distance to the penultimate move fixes a lot of the problems of local
search. Maybe I should try to consider distances to even older moves.


Nice paper, thanks! It's funny to see the distance to the last move
now being used as an effective feature in Monte Carlo. Only a few years
ago I got a lot of religious criticism when I used it in my move
predictor...

Erik


Re: [computer-go] Re: Amsterdam 2007 paper

2007-05-17 Thread Magnus Persson

Yes, now I understand. I think what made it hard for me conceptually was
that P(Rj) can be rewritten in n different ways, one for each feature ci,
1 <= i <= n. I had this problem with your example too. I first thought
that the lines with the factors were arbitrary, but once I realized that
each line goes with one way of rewriting P(Rj), it all became clear.

Quoting Rémi Coulom <[EMAIL PROTECTED]>:



to Magnus: If you consider the example of section 2.2: 1,2,3 wins
against 4,2 and 1,5,6,7. The probability is
P=c1c2c3/(c1c2c3+c4c2+c1c5c6c7). For this example:
N1j = c2c3, B1j = 0, M1j = c2c3 + c5c6c7, A1j = c4c2
N2j = c1c3, B2j = 0
N3j = c1c2, B3j = 0
N4j = 0, B4j = c1c2c3
I will add this example to the paper if it makes things clearer.

Regarding the definition of Wi, |{Nij|Nij!=0}| means "number of
elements of the set of all the Nij that verify Nij!=0". It is
incorrect. It should be |{j|Nij!=0}|. I'll fix it in the final
version. As I wrote in the next sentence, it is equal to the number
of wins of i.




Re: [computer-go] Re: Amsterdam 2007 paper

2007-05-17 Thread Rémi Coulom

Álvaro Begué wrote:

There are many things in the paper that we had never thought of, like
considering the distance to the penultimate move.

That feature improved the effectiveness of progressive widening a lot.
When I had only the distance to the previous move, and the opponent made 
a dangerous move, Crazy Stone sometimes wanted to tenuki at the root, to 
move the focus of local search away from the danger. Considering the 
distance to the penultimate move fixes a lot of the problems of local 
search. Maybe I should try to consider distances to even older moves.


Also the approach you devised with John seems to be very similar to what 
I do, indeed. I also worked on the idea of adjusting the random policy 
online to adapt to the current position, with a method that looks like 
incremental Elo rating algorithms. I am deeply convinced that this can 
bring huge improvements. If the random policy only uses general 
patterns, whatever tree search we do at the root won't be able to make a 
strong Go player. We have to find a way to influence random simulations, 
not only inside the tree near the root, but also far from the root. Once 
we know how to do this effectively, we'll have very strong programs.


Rémi


Re: [computer-go] Re: Amsterdam 2007 paper

2007-05-17 Thread Álvaro Begué

Rémi:

John and I are planning a similar usage of patterns and features in
the MC simulations, although we were thinking of a very different
method to update "strengths". Although we derived our formulas from an
extension of the logistic regression instead of Bradley-Terry models,
we arrived at very similar expressions. The main difference is that
what we have been calling "score" seems to be the log of your
"strength", and instead of multiplying strengths when considering
"teams", we just add the scores of different features. We use an
online learning algorithm to adjust the scores during the game, which
severely limits the kind of algorithms we can use, but also allows us
to learn things that apply specifically to the situations that are
appearing on the board in this game.

Now we only have to make dimwit 400 points stronger to show the world
that our approach works. :)

There are many things in the paper that we had never thought of, like
considering the distance to the penultimate move. Most of those things
won't be directly applicable to dimwit, but it gives us many
suggestions of things to try.

Thanks for this important contribution to computer go.


Álvaro.


On 5/17/07, Rémi Coulom <[EMAIL PROTECTED]> wrote:

Hi,

It seems that e-mail at my university does not work any more. I have
received none of the replies to my message of yesterday, but I could
read them on the web archives of the list. So I have registered from
another address, and will answer to the questions I have read on the web.

to Matt: In Crazy Stone, UCT is still used as before to select moves.
Patterns are used to shorten the list of moves that UCT considers.

to Sylvain: Here are tests of Crazy Stone at 90s/game 1CPU against GNU
3.6 level 10, measured over about 200 games:
no patterns:                          38.2% (version available on my web page)
patterns in simulations:              68.2%
patterns for pruning and simulations: 90.6%
I found these numbers in my history of Crazy Stone versions, so they may
not be extremely accurate, because other things have changed in between.
I will run cleaner tests for the final version of the paper. I will also
try to test what each feature brings.

to Magnus: If you consider the example of section 2.2: 1,2,3 wins
against 4,2 and 1,5,6,7. The probability is
P=c1c2c3/(c1c2c3+c4c2+c1c5c6c7). For this example:
N1j = c2c3, B1j = 0, M1j = c2c3 + c5c6c7, A1j = c4c2
N2j = c1c3, B2j = 0
N3j = c1c2, B3j = 0
N4j = 0, B4j = c1c2c3
I will add this example to the paper if it makes things clearer.
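
(The example can be checked numerically in a few lines of Python,
assuming the factored form P(Rj) = (ci*Nij + Bij) / (ci*Mij + Aij)
from the paper:)

c = {i: 1.0 + 0.5 * i for i in range(1, 8)}  # arbitrary positive strengths

P = c[1]*c[2]*c[3] / (c[1]*c[2]*c[3] + c[4]*c[2] + c[1]*c[5]*c[6]*c[7])

N1, B1 = c[2]*c[3], 0.0                      # factor with respect to i = 1
M1, A1 = c[2]*c[3] + c[5]*c[6]*c[7], c[4]*c[2]
assert abs(P - (c[1]*N1 + B1) / (c[1]*M1 + A1)) < 1e-12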

Regarding the definition of Wi, |{Nij|Nij!=0}| means "number of elements
of the set of all the Nij that verify Nij!=0". It is incorrect. It
should be |{j|Nij!=0}|. I'll fix it in the final version. As I wrote in
the next sentence, it is equal to the number of wins of i.

Thanks to all for the kind words, and the interesting remarks and questions.

Rémi


Re: [computer-go] Re: Amsterdam 2007 paper

2007-05-17 Thread Chris Fant

> It seems that e-mail at my university does not work any more. I have
> received none of the replies to my message of yesterday, but I could
> read them on the web archives of the list. So I have registered from
> another address, and will answer to the questions I have read on the web.


In section 2.3, you say:

Also, the generalization to teams assumes that the strength of a team
is the sum (in terms of Elo ratings) of the strengths of its members.

But it seems from the equation just above that section that you mean
"the product" and not "the sum".  Or did I misunderstand something?


Never mind. I just remembered that Elo is the log of y. So "the sum"
makes sense now.
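
(Spelled out, with the standard Elo scale for the strengths gamma_i:)

    \mathrm{Elo}(\gamma) = 400 \log_{10} \gamma
    \quad\Longrightarrow\quad
    \mathrm{Elo}\Big(\prod_i \gamma_i\Big) = \sum_i \mathrm{Elo}(\gamma_i)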


Re: [computer-go] Re: Amsterdam 2007 paper

2007-05-17 Thread Chris Fant

It seems that e-mail at my university does not work any more. I have
received none of the replies to my message of yesterday, but I could
read them on the web archives of the list. So I have registered from
another address, and will answer to the questions I have read on the web.



In section 2.3, you say:

Also, the generalization to teams assumes that the strength of a team
is the sum (in terms of Elo ratings) of the strengths of its members.

But it seems from the equation just above that section that you mean
"the product" and not "the sum".  Or did I misunderstand something?