When I complete the new server, I hope it will be easier to collect
larger samples of games. I think this will help the situation a little.

There will be multiple time controls, but they will be kept in sync, so
that your program can always play a shorter time control game without
missing a game at the longer time control. The idea is to keep your bot
busy while waiting for future rounds: you play in the longest time
control, and when you are finished you can play fast games while
waiting. I will have 2 or 3 levels of this; I haven't decided yet. If I
have 3 levels, the slowest time control will probably need to be a
little slower than CGOS uses now.
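To make the "in sync" idea concrete, here is a rough sketch with
made-up numbers (the 60- and 15-minute rounds are assumed for
illustration only, nothing is final): as long as the fast rounds divide
the slow round evenly, a fast game never runs past the start of the
next slow round.

    /* Rough sketch of the round synchronization; round lengths are
       assumed for illustration, not actual server settings. */
    #include <stdio.h>

    int main(void) {
        const int slowRound = 60;  /* minutes between slow rounds (assumed) */
        const int fastRound = 15;  /* minutes between fast rounds (assumed) */
        for (int t = 0; t <= 2 * slowRound; t += fastRound)
            printf("t=%3d min: fast round starts%s\n", t,
                   t % slowRound == 0 ? " (slow round starts too)" : "");
        return 0;
    }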

I will also have a test mode for new bots.  The server itself will play test
games with your bot while you debug it.

I haven't decided if each time control should be rated separately, but
I'm leaning toward separate ratings.

- Don


On Fri, Jun 5, 2009 at 11:03 AM, Magnus Persson <magnus.pers...@phmp.se> wrote:

> Hi Brian!
>
> In my tests with Valkyria I get something like a 4-5% improvement in
> winrate against GnuGo using ownership. But I think you need to be much
> more careful in how you test these things.
>
> Testing on CGOS is a no-no for me, because the opposition changes from
> hour to hour; unless an effect on playing strength is large, it is very
> hard to detect there. I am currently testing against GnuGo, Fuego, or
> Valkyria itself using twogtp.jar from GoGui. The nice thing about
> testing MC programs is that one can set the number of playouts low and
> play a lot of games.
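>
> As a concrete example (untested, and the engine command lines here are
> placeholders for your own programs), a twogtp run against GnuGo might
> look something like:
>
>   java -jar gogui-twogtp.jar \
>       -black "myprogram --gtp" \
>       -white "gnugo --mode gtp" \
>       -size 9 -komi 7.5 -games 1000 -alternate -auto \
>       -sgffile ownership-test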
>
> You might also consider simplifying your code, starting with something
> simple like this:
>
> CombinedWinrate = AMAFWinRate + k * Ownership
>
> I would then vary k over the values 0, 0.001, 0.01, 0.1, and 1.
>
> Here I am expecting really bad performance for k=1, but I always try to
> include some extreme values so that I know for sure that there is no bug and
> the results make sense.
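>
> A minimal sketch of that blend in C (the inputs stand in for whatever
> per-move statistics your program already tracks):
>
>   #include <stdio.h>
>
>   /* Linear blend of AMAF winrate and ownership; k is the weight. */
>   static double combinedWinrate(double amafWinRate, double ownership,
>                                 double k) {
>       return amafWinRate + k * ownership;
>   }
>
>   int main(void) {
>       const double kValues[] = { 0.0, 0.001, 0.01, 0.1, 1.0 };
>       /* Hypothetical move: 52% AMAF winrate, 1/3 ownership. */
>       for (int i = 0; i < 5; i++)
>           printf("k=%.3f -> %.4f\n", kValues[i],
>                  combinedWinrate(0.52, 1.0 / 3.0, kValues[i]));
>       return 0;
>   }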
>
> I will run 500-2500 games per parameter setting, because often the
> effects are really small and need tons of data to be detected. I
> learned the hard way that it is too tempting to draw quick conclusions
> from insufficient data.
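>
> To get a feel for how much data a small effect needs, here is a
> back-of-the-envelope sketch (the two-standard-error threshold is my
> own assumption for illustration):
>
>   #include <stdio.h>
>
>   /* Games needed so a winrate shift of size delta (near p = 0.5) is
>      about z standard errors: n = p*(1-p) * (z/delta)^2. */
>   int main(void) {
>       const double p = 0.5, z = 2.0;
>       const double deltas[] = { 0.05, 0.02, 0.01 };
>       for (int i = 0; i < 3; i++) {
>           double r = z / deltas[i];
>           printf("delta=%.2f -> about %.0f games\n",
>                  deltas[i], p * (1.0 - p) * r * r);
>       }
>       return 0;
>   }
>
> This gives roughly 400 games for a 5% effect and 2500 for a 2% effect,
> which is why effects of a few percent need runs of this size.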
>
> For every test I also need to think hard about how fast the programs
> should play. Fast play gives more data, but may not generalize to slow
> play on CGOS, for example.
>
> When you know how this works, you can start experimenting with more complex
> code and more parameters.
>
> That is the philosophy I try to follow for my testing of Valkyria, and I
> hope some of it could be helpful.
>
>
> Quoting Brian Sheppard <sheppar...@aol.com>:
>
>> In a paper published a while ago, Remi Coulom showed that 64 MC trials
>> (i.e., just random, no tree) was a useful predictor of move quality.
>>
>> In particular, Remi counted how often each point ended up in possession
>> of the side to move. He then measured the probability of being the best
>> move as a function of the frequency of possession. Remi found that if
>> the possession frequency was around 1/3 then the move was most likely
>> to be best, with decreasing probabilities elsewhere.
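>>
>> A minimal sketch of that statistic (playRandomGame and ownerAt are
>> hypothetical stand-ins for the engine's own playout and scoring code):
>>
>>   extern void playRandomGame(void);  /* one playout (hypothetical) */
>>   extern int ownerAt(int point);     /* owner after playout (hypothetical) */
>>
>>   enum { NUM_TRIALS = 64, BOARD_POINTS = 81 };  /* 9x9 board assumed */
>>
>>   /* Run 64 random playouts from the current position and record how
>>      often each point ends up owned by the side to move.  In Remi's
>>      data, values near 1/3 marked the most promising moves. */
>>   void computeOwnership(double ownership[BOARD_POINTS], int sideToMove) {
>>       int counts[BOARD_POINTS] = { 0 };
>>       for (int trial = 0; trial < NUM_TRIALS; trial++) {
>>           playRandomGame();                     /* one MC trial */
>>           for (int p = 0; p < BOARD_POINTS; p++)
>>               if (ownerAt(p) == sideToMove)
>>                   counts[p]++;
>>       }
>>       for (int p = 0; p < BOARD_POINTS; p++)
>>           ownership[p] = (double)counts[p] / NUM_TRIALS;
>>   }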
>>
>> I have been trying to extract more information from each trial, since
>> it seems to me that we are discarding useful information when we use
>> only the result of a trial. So I tried to implement Remi's idea in a UCT
>> program.
>>
>> This is very different from Remi's situation, in which the MC trials are
>> done before the predictor is used in a tree search. Here, we will have
>> a tree search going on concurrently with collecting data about point
>> ownership.
>>
>> My implementation used the first N trials of each UCT node to collect
>> point ownership information. After the first M trials, it would use that
>> information to bias the RAVE statistics. That is, in the selectBest
>> routine I had an expression like this:
>>
>>   for all moves {
>>      // Get the observed RAVE values:
>>      nRAVE = RAVETrials[move];
>>      wRAVE = RAVEWins[move];
>>
>>      // Dynamically adjust according to point ownership, once the
>>      // first M trials have been collected.  Ownership[move] is the
>>      // fraction of trials (0..1) in which the point ended up owned
>>      // by the side to move, so it falls into one of 8 buckets of
>>      // width 0.125:
>>      if (trialCount >= M) {
>>           int bucket = (int)(Ownership[move] * 8);
>>           if (bucket > 7) bucket = 7;  // clamp Ownership[move] == 1.0
>>           nRAVE += ownershipTrialsParams[bucket];
>>           wRAVE += ownershipWinsParams[bucket];
>>      }
>>
>>      // Now use nRAVE and wRAVE to order the moves for expansion....
>>   }
>>
>> The bottom line is that the result was negative. In the test period,
>> Pebbles won 69% (724 out of 1039) of CGOS games when not using this
>> feature and less than 59% when using it. I tried a few parameter
>> settings; far from exhaustive, but mostly in line with Remi's paper.
>> The best setting showed 59% (110 out of 184), which is 2.4 standard
>> deviations lower. But maybe you can learn from my mistakes and figure
>> out how to make it work.
>>
>> I have no idea why this implementation doesn't work. Maybe RAVE does a
>> good job already of determining where to play, so ownership information
>> is redundant. Maybe different parameter settings would work. Maybe just
>> overhead (but I doubt that; the overhead wouldn't account for such a
>> significant drop).
>>
>> Anyway, if you try something like this, please let me know how it works
>> out.
>> Or if you have other ideas about how to extract more information from
>> trials.
>>
>> Best,
>> Brian
>>
>
>
> --
> Magnus Persson
> Berlin, Germany
>
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
