Re: [computer-go] How to properly implement RAVE?

2009-01-30 Thread Magnus Persson

Quoting Sylvain Gelly sylvain.ge...@m4x.org:


On Wed, Jan 28, 2009 at 10:01 PM, Isaac Deutsch i...@gmx.ch wrote:


And a final question: You calculate the (beta) coefficient as
c = rc / (rc+c+rc*c*BIAS);
which looks similar to the formula proposed by David Silver (If I recall
his name correctly). However, in his formula, the last term looks like
rc*c*BIAS/(q_ur*(1-q_ur))
Is it correct that we could get q_ur from the current UCT-RAVE mean value,
and that it is used like that?



Yes, the formula looks very similar (David proposed that formula to me in the
beginning of 2007). However, my implementation did not contain
the q_ur*(1-q_ur) factor, which I approximated by a constant, taking q=0.5
so the factor=0.25.
I did not try the other formula; maybe it works better in practice, though I
would expect the results to be similar.


Valkyria uses an even more complicated version of what David Silver  
proposed (I really did not understand it, so I came up with something  
that looked plausible to me and that actually estimated the bias for each  
candidate move rather than assuming it constant).


When Sylvain proposed this simple version I tested it against my own  
interpretation. On 9x9 my complicated version might have a win rate 3%  
better than the simple version, based on 3 data points (700 games each)  
near the maximum. The standard error according to twogtp is 1.9.


On 19x19 I got results where there was no difference at all, but with much  
higher uncertainty because not many games were played.


But the term is important for sure. The constant BIAS used should be  
larger than 0, but you should be careful not to set it too high. For  
Valkyria the 0.015 value Sylvain posted here worked fine, but a higher  
value, for example 0.15, leads to bad performance on 9x9 and  
catastrophic performance on 19x19. Initially I thought this parameter  
should be something like 1, and that was completely wrong. I was just  
lucky to get it right, by visual inspection of the search behavior when  
I played around with the parameter.


The reason the bias term of the denominator should be close to zero  
initially is to allow AMAF to have a strong impact on the search; at  
some point (which is delayed by having a small BIAS constant) there is  
a quick shift towards using the real win rate instead.


-Magnus


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] How to properly implement RAVE?

2009-01-30 Thread Sylvain Gelly
On Fri, Jan 30, 2009 at 9:29 AM, Magnus Persson magnus.pers...@phmp.se wrote:



 Valkyria uses an even more complicated version of what David Silver
 proposed (I really did not understand it, so I came up with something that
 looked plausible to me and that actually estimated the bias for each
 candidate move rather than assuming it constant).

 When Sylvain proposed this simple version I tested it against my own
 interpretation. On 9x9 my complicated version might have a win rate 3%
 better than the simple version, based on 3 data points (700 games each)
 near the maximum. The standard error according to twogtp is 1.9.

 On 19x19 I got results where there was no difference at all, but with much
 higher uncertainty because not many games were played.


Did you try to tune the bias constant at all, or did you just take the one I
posted? I wrote it from memory and I believe it is in the range of possibly
good values, but it is certainly not optimal; I could be off by a significant
factor in either direction.


 But the term is important for sure. The constant BIAS used should be larger
 than 0, but you should be careful not to set it too high. For Valkyria the
 0.015 value Sylvain posted here worked fine, but a higher value, for example
 0.15, leads to bad performance on 9x9 and catastrophic performance on 19x19.
 Initially I thought this parameter should be something like 1, and that was
 completely wrong. I was just lucky to get it right, by visual inspection of
 the search behavior when I played around with the parameter.


The bias can't possibly be 1 because by definition it is the difference
between the expected value of the normal tree search and the expected value
of AMAF. As those two values are in [0,1] anyway, the bias is certainly not
1, and in most cases very small.



 The reason the bias term of the denominator should be close to zero
 initially is to allow AMAF to have a strong impact on the search; at some
 point (which is delayed by having a small BIAS constant) there is a quick
 shift towards using the real win rate instead.

 -Magnus




Re: [computer-go] How to properly implement RAVE?

2009-01-30 Thread Magnus Persson

Quoting Sylvain Gelly sylvain.ge...@m4x.org:

On Fri, Jan 30, 2009 at 9:29 AM, Magnus Persson   
magnus.pers...@phmp.se wrote:



Did you try to tune the bias constant at all or just took the one I posted?
I wrote it from memory and I believe it is in the range of possibly good
values, but it is certainly not optimal, I can be off by a significant
factor in both directions.


Yes, I always try to test at least 5 values: three values centered  
around what I currently believe is the best value, and two or more  
values at the extremes to get the big picture and make sure that I am  
at least near the maximum.


Constant   Winrate against GnuGo
0.00015    43.9
0.0015     50
0.015      50
0.15       43.1
1           7.1
10          6.7


The bias can't possibly be 1 because by definition it is the difference
between the expected value of the normal tree search and the expected value
of AMAF. As those two values are in [0,1] anyway, the bias is certainly not
1, and in most cases very small.


I think I was confused about exactly which part of the term was the  
bias. I do not really remember what I thought, or why.  
Anyway, your explanation is very clear.


On a side note, I think Valkyria has very large biases for some moves  
because the playouts are very heavy. I am thinking of an experiment to  
test this. One way is to simply play uniform playouts from a single  
position N times for all moves, and thus get accurate win rates for all  
those moves as well as AMAF values. Taking the difference between the  
AMAF values and the real win rates should then reveal the size of the  
bias in general, and whether there are systematic differences for  
certain patterns or situations.


Alternatively, one could play N playouts for a single move and see what  
AMAF values one gets. Then one can see to what extent the biases are  
stable as moves are played on the board. The real values will change for  
all moves, and the AMAF values as well, but will the bias be constant?  
Or will it be random as a function of the position?


There is no tree search in these two cases.

Selective search is often similar to the second situation, and my  
question is whether this can cause larger systematic biases for some  
moves, which are thus missed. And if one understands the nature of such  
biases, maybe they can be neutralized somehow.


It could be that using patterns to start with strong biases in the  
prior AMAF values is doing exactly this.


If anyone has done anything like this, or has any interesting  
opinions, I would be thankful.


-Magnus
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] How to properly implement RAVE?

2009-01-30 Thread Isaac Deutsch
At the moment I'm also tuning this constant in the range 0.001-0.1. Given
uniformly random (light) playouts, is the bias expected to be bigger than with
heavy playouts, meaning the constant has to be bigger as well?