Re: [computer-go] How to properly implement RAVE?
Quoting Sylvain Gelly <sylvain.ge...@m4x.org>:

> On Wed, Jan 28, 2009 at 10:01 PM, Isaac Deutsch <i...@gmx.ch> wrote:
>
>> And a final question: you calculate the (beta) coefficient as
>>
>>     c = rc / (rc + c + rc*c*BIAS);
>>
>> which looks similar to the formula proposed by David Silver (if I
>> recall his name correctly). However, in his formula the last term
>> looks like
>>
>>     rc*c*BIAS / (q_ur*(1-q_ur))
>>
>> Is it correct that we could get q_ur from the current UCT-RAVE mean
>> value, and that it is used like that?
>
> Yes, the formula looks very similar (David proposed that formula to me
> at the beginning of 2007). However, my implementation did not contain
> the q_ur*(1-q_ur) factor, which I approximated by a constant, taking
> q = 0.5 so that the factor is 0.25. I did not try the other formula;
> maybe it works better in practice, though I would expect the two to
> behave similarly.

Valkyria uses an even more complicated version of what David Silver proposed (I really did not understand it, so I came up with something that looked plausible to me and that actually estimated the bias for each candidate move rather than assuming it constant). When Sylvain posted this simple version, I tested it against my own interpretation. On 9x9 my complicated version might have a win rate 3% better than the simple version, for 3 data points (700 games each) near the maximum; the standard error according to twogtp is 1.9. On 19x19 I got results showing no difference at all, but with much higher uncertainty because not many games were played.

But the term is important for sure: the BIAS constant should be larger than 0, but you should be careful not to set it too high. For Valkyria the 0.015 value Sylvain posted here worked fine, but a higher value, for example 0.15, leads to bad performance on 9x9 and catastrophic performance on 19x19. Initially I thought this parameter should be something like 1, and that was completely wrong. I was just lucky to get it right, simply by visual inspection of the search behavior while playing around with the parameter. The reason the bias term in the denominator should be close to zero initially is to allow AMAF to have a strong impact on the search; at some point (which is delayed by a small BIAS constant) there is a quick shift towards using the real win rate instead.

-Magnus

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to properly implement RAVE?
On Fri, Jan 30, 2009 at 9:29 AM, Magnus Persson <magnus.pers...@phmp.se> wrote:

> [earlier quoted exchange about the beta formula snipped]
>
> Valkyria uses an even more complicated version of what David Silver
> proposed (I really did not understand it, so I came up with something
> that looked plausible to me and that actually estimated the bias for
> each candidate move rather than assuming it constant). When Sylvain
> posted this simple version, I tested it against my own interpretation.
> On 9x9 my complicated version might have a win rate 3% better than the
> simple version, for 3 data points (700 games each) near the maximum;
> the standard error according to twogtp is 1.9. On 19x19 I got results
> showing no difference at all, but with much higher uncertainty because
> not many games were played.

Did you try to tune the bias constant at all, or did you just take the one I posted? I wrote it from memory and I believe it is in the range of possibly good values, but it is certainly not optimal; I could be off by a significant factor in both directions.

> But the term is important for sure: the BIAS constant should be larger
> than 0, but you should be careful not to set it too high. For Valkyria
> the 0.015 value Sylvain posted here worked fine, but a higher value,
> for example 0.15, leads to bad performance on 9x9 and catastrophic
> performance on 19x19. Initially I thought this parameter should be
> something like 1, and that was completely wrong. I was just lucky to
> get it right, simply by visual inspection of the search behavior when
> I played around with the parameter.

The bias can't possibly be 1, because by definition it is the difference between the expected value of the normal tree search and the expected value of AMAF. As those two values are in [0,1] anyway, the bias is certainly not 1, and in most cases it is very small.

> The reason the bias term in the denominator should be close to zero
> initially is to allow AMAF to have a strong impact on the search; at
> some point (which is delayed by a small BIAS constant) there is a
> quick shift towards using the real win rate instead.
>
> -Magnus
Re: [computer-go] How to properly implement RAVE?
Quoting Sylvain Gelly <sylvain.ge...@m4x.org>:

> On Fri, Jan 30, 2009 at 9:29 AM, Magnus Persson <magnus.pers...@phmp.se> wrote:
>
> Did you try to tune the bias constant at all, or did you just take the
> one I posted? I wrote it from memory and I believe it is in the range
> of possibly good values, but it is certainly not optimal; I could be
> off by a significant factor in both directions.

Yes, I always try to test at least 5 values: three values centered around what I currently believe is the best value, and two or more values towards the extremes to get the big picture and make sure I am at least close to the maximum.

Constant    Winrate against GNU Go
0.00015     43.9
0.0015      50
0.015       50
0.15        43.1
1            7.1
10           6.7

> The bias can't possibly be 1, because by definition it is the
> difference between the expected value of the normal tree search and
> the expected value of AMAF. As those two values are in [0,1] anyway,
> the bias is certainly not 1, and in most cases it is very small.

I think I was confused about exactly which part of the term was the bias. I do not really remember what I thought or why. Anyway, your explanation is very clear.

On a side note, I think that for some moves Valkyria has very large biases, because the playouts are very heavy. I am thinking of an experiment to test this. One way is to simply play uniform playouts from a single position N times for each move, and thus get accurate win rates for all those moves as well as AMAF values. Taking the difference between the AMAF values and the real win rates should then reveal the size of the bias in general, and whether there are systematic differences for certain patterns or situations.

Alternatively, one could play N playouts for a single move and see what AMAF values one gets. Then one can see to what extent the biases are stable when moves are played on the board. The real values will change for all moves, and the AMAF values as well, but will the bias be constant? Or will it be random as a function of the position? There is no tree search in either of these two cases.

Selective search is often similar to the second situation, and my question is whether this can cause larger systematic biases for some moves that are thus missed. If one understands the nature of such biases, maybe they can be neutralized somehow. It could be that using patterns to start with strong biases in the prior AMAF values does exactly this. If anyone has done anything like this, or has any interesting opinions, I would be thankful.

-Magnus
Re: [computer-go] How to properly implement RAVE?
At the moment I'm also tuning this constant in the range 0.001-0.1. Given uniformly random (light) playouts, is the bias expected to be bigger than with heavy playouts, meaning the constant has to be bigger as well?