[computer-go] published stats on string merge frequencies, etc

2009-04-29 Thread Christian Nentwich

Hi all,

I've been looking through the archives trying to find out if anybody had 
published information on the following during uniform random playouts 
and/or real games:

 - Frequency of string merges
 - Frequency of placing stones in empty space
 - Average length of strings, etc.

I noticed that quite a few people had experimented with this. Has anybody put 
their stats up anywhere on the web, or in a paper?


regards,
Christian


[computer-go] cgos: Donn Daily?

2009-04-29 Thread Dave Dyer

Donn, your email at d...@mit.edu is bouncing. 



Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-29 Thread David Silver

Hi Yamato,


Could you give us the source code you used? Your algorithm is
too complicated, so the source code would be very helpful if possible.


Actually I think the source code would be much harder to understand!  
It is written inside RLGO, and makes use of a substantial existing  
framework that would take a lot of effort to understand. (On a  
separate note I am considering making RLGO open source at some point,  
but I'd prefer to spend some effort cleaning it up before making it  
public).


But I think maybe Algorithm 1 is much easier than you think:

A: Estimate the value V* of every position in a training set, using deep rollouts.

B: Repeat, for each position in the training set:
  1. Run M simulations and estimate the value of the position (call this V).
  2. Run another N simulations; average the value of psi(s,a) over all positions and moves in games that black won (call this g).
  3. Adjust the parameters by alpha * (V* - V) * g (a rough sketch of this step follows below).
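
To make step B concrete, here is a rough Python sketch of the update for a single training position. This is not RLGO code: the data layout (pattern-id dicts, per-move psi values passed in as lists) and the normalisation of g by N are my own assumptions, so check them against the paper; psi(s,a) itself is explained further down in this message.

def balancing_update(theta, v_star, value_sims, gradient_sims, alpha):
    """One pass of step B for a single training position (sketch only, hypothetical data layout).

    theta         -- dict: pattern id -> weight
    v_star        -- deep-rollout estimate V* of this position (from step A)
    value_sims    -- outcomes of M simulations (1 = black win, 0 = black loss)
    gradient_sims -- list of (outcome, psi_dicts) pairs from N further simulations,
                     where psi_dicts holds one dict (pattern id -> psi component)
                     for each move played in that simulation
    """
    # B.1: estimate the value V of the position from the first M simulations.
    v = sum(value_sims) / len(value_sims)

    # B.2: average psi(s,a) over all positions and moves in games that black won.
    #      (Normalising by N is an assumption; the paper may normalise differently.)
    n = len(gradient_sims)
    g = {}
    for outcome, psi_dicts in gradient_sims:
        if outcome == 1:
            for psi_dict in psi_dicts:
                for pattern, value in psi_dict.items():
                    g[pattern] = g.get(pattern, 0.0) + value

    # B.3: adjust the parameters by alpha * (V* - V) * g.
    for pattern, total in g.items():
        theta[pattern] = theta.get(pattern, 0.0) + alpha * (v_star - v) * (total / n)
    return theta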

The feature vector is the set of patterns you use, with value 1 if a  
pattern is matched and 0 otherwise. The simulation policy selects  
actions in proportion to the exponentiated, weighted sum of all  
matching patterns. For example let's say move a matches patterns 1 and  
2, move b matches patterns 1 and 3, and move c matches patterns 2 and  
4. Then move a would be selected with probability e^(theta1 +  
theta2) / (e^(theta1 + theta2) + e^(theta1 + theta3) + e^(theta2 +  
theta4)). The theta values are the weights on the patterns which we  
would like to learn. They are the log of the Elo ratings in Remi  
Coulom's approach.
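
As a hedged illustration of that selection rule (not David's implementation; the pattern ids, weights and helper names below are made up), the worked example with moves a, b and c looks like this in Python:

import math
import random

def move_probabilities(theta, matches):
    """Softmax over moves: score(move) = exp(sum of theta for the patterns it matches)."""
    scores = {move: math.exp(sum(theta[p] for p in patterns))
              for move, patterns in matches.items()}
    total = sum(scores.values())
    return {move: score / total for move, score in scores.items()}

def select_move(theta, matches):
    """Sample a move in proportion to its exponentiated, weighted pattern sum."""
    probs = move_probabilities(theta, matches)
    moves, weights = zip(*probs.items())
    return random.choices(moves, weights=weights)[0]

# The example from the email: a matches patterns 1 and 2, b matches 1 and 3, c matches 2 and 4.
theta = {1: 0.5, 2: -0.2, 3: 0.1, 4: 0.0}            # arbitrary illustrative weights
matches = {'a': [1, 2], 'b': [1, 3], 'c': [2, 4]}
# P(a) = e^(theta1 + theta2) / (e^(theta1 + theta2) + e^(theta1 + theta3) + e^(theta2 + theta4))
print(move_probabilities(theta, matches))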


The only tricky part is computing the vector psi(s,a). Each component  
of psi(s,a) corresponds to a particular pattern, and is the difference  
between the observed feature (i.e. whether the pattern actually  
occurred after move a in position s) and the expected feature (the  
average value of the pattern, weighted by the probability of selecting  
each action).
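
Continuing the same hypothetical representation as in the sketch above, psi(s,a) would then be the matched-pattern indicator for the chosen move minus its expectation under the current policy:

def psi(theta, matches, chosen_move):
    """psi(s,a): observed pattern indicators for the chosen move minus their expected value."""
    probs = move_probabilities(theta, matches)

    # Expected feature: each pattern's indicator averaged over all moves,
    # weighted by the probability of selecting each move.
    expected = {}
    for move, patterns in matches.items():
        for p in patterns:
            expected[p] = expected.get(p, 0.0) + probs[move]

    # Observed feature: 1 for every pattern matched by the move actually played.
    observed = {p: 1.0 for p in matches[chosen_move]}

    return {p: observed.get(p, 0.0) - expected.get(p, 0.0)
            for p in set(expected) | set(observed)}

# For the example above, psi(s, 'a') is positive on patterns 1 and 2 (observed more than expected)
# and negative on patterns 3 and 4 (expected but not observed).
print(psi(theta, matches, 'a'))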


It's also very important to be careful about signs and the colour to  
play - it's easy to make a mistake and follow the gradient in the  
wrong direction.


Is that any clearer?
-Dave



Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-29 Thread Michael Williams

David Silver wrote:

Hi Michael,

But one thing confuses me: you are using the value from Fuego's 10k simulations as an approximation of the actual value of the position. But isn't the actual value of the position either a win or a loss? On such small boards, can't you assume that Fuego is able to correctly determine who is winning, and round its evaluation to the nearest win/loss? I.e. if it evaluates the position to 0.674, that gets rounded to 1. If such an assumption about Fuego's ability to read the position on a small board is valid, then it should improve the results of your balanced simulation strategy, right? Or am I missing something?


It's true that 5x5 Go is solved, so in principle we could have used the 
true minimax values. However, we chose to use an approach that can scale 
to larger boards, which means that we should treat the expert 
evaluations as approximate. And in fact Fuego was not always accurate on 
6x6 boards, as we used only 10k simulations in our training set.


Also, I think it really helps to have soft rather than hard expert 
evaluations. We want a simulation policy that helps differentiate e.g. a 
90% winning position from an 85% winning position. Rounding all the 
expert evaluations to 0 or 1 would lose much of this advantage.


-Dave


By this argument (your last paragraph), you need to do some magical
number of simulations for the training data.  Not enough simulations
and you have too much noise. And infinite simulations give you hard
0 or 1 results.  But I can't argue with your results.



[computer-go] Re: [computer go] Monte-Carlo Simulation Balancing

2009-04-29 Thread David Silver

Hi Remi,


What komi did you use for 5x5 and 6x6?

I used 7.5 komi for both board sizes.


I find it strange that you get only 70 Elo points from supervised
learning over uniform random. Don't you have any feature for atari
extension? This one alone should improve strength immensely (extend
string in atari because of the previous move).

Actually no. The features are very simple, and know how to capture but  
not how to defend ataris. I'm sure that a better set of features could  
improve by more than 70 Elo, but I expect we would still see a benefit  
to balancing the weights correctly. For example, the Fuego policy  
defends ataris and follows several other common-sense rules, but the  
results in 5x5 and 6x6 show that it is not well balanced on small  
boards.


Let us extrapolate: I got more than 200 Elo points of improvement from my patterns in 9x9 over uniform random (I never really measured that, it may be even more than 200).

I guess you got more than 200 Elo on 9x9; in MoGo (Gelly et al. 2006) the improvement from uniform random was at least 400 Elo, and I think your simulation policy is probably at least as strong.


By the way, I was sad to hear you're not working on Crazystone any more. Is it because you are busy with other projects?



So maybe I could get 600 more Elo points
with your method. And even more on 19x19.
I noticed that, in general, changes in the playout policy have a much
bigger impact on larger boards than on smaller boards.


As to whether we can extrapolate, I hope so :-)
I share the same feeling that improving the simulation policy will be  
more impactful on bigger boards with longer simulations.
On the other hand I've been surprised by many things in MC Go, so  
nothing is certain until we try it!


-Dave



Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-29 Thread Yamato
David Silver wrote:
A: Estimate value V* of every position in a training set, using deep  
rollouts.

B: Repeat, for each position in the training set
   1. Run M simulations, estimate value of position (call this V)
   2. Run another N simulations, average the value of psi(s,a) over all  
positions and moves in games that black won (call this g)
   3. Adjust parameters by alpha * (V* - V) * g

Thanks for the detailed explanation.
M, N and alpha are constant numbers, right?  What did you set them to?

The feature vector is the set of patterns you use, with value 1 if a  
pattern is matched and 0 otherwise. The simulation policy selects  
actions in proportion to the exponentiated, weighted sum of all  
matching patterns. For example let's say move a matches patterns 1 and  
2, move b matches patterns 1 and 3, and move c matches patterns 2 and  
4. Then move a would be selected with probability e^(theta1 +  
theta2) / (e^(theta1 + theta2) + e^(theta1 + theta3) + e^(theta2 +  
theta4)). The theta values are the weights on the patterns which we  
would like to learn. They are the log of the Elo ratings in Remi  
Coulom's approach.

OK, I guess it is formula 5 in the paper.

The only tricky part is computing the vector psi(s,a). Each component  
of psi(s,a) corresponds to a particular pattern, and is the difference  
between the observed feature (i.e. whether the pattern actually  
occurred after move a in position s) and the expected feature (the  
average value of the pattern, weighted by the probability of selecting  
each action).

I still don't understand this. Is it formula 6?
Could you please give me an example like the one above?

--
Yamato