Thanks, I will try rolling these positions out again. Do you have any experience with how to do good rollouts of race positions? Which rollout settings work well for them?

-Øystein

On Sun, Jun 9, 2019 at 11:38 PM Philippe Michel <[email protected]> wrote:

> On Fri, Jun 07, 2019 at 08:30:10PM +0200, Øystein Schønning-Johansen wrote:
>
> > (Of course I remove any position duplicated in the two datasets, such that
> > the training and validation set are disjoint.)
>
> Is it really important (in general)? I know one shouldn't use the same
> dataset but is some limited random overlap really an issue? I didn't
> verify how limited it is in the case of gnubg's databases, though...
>
> > I train a neural network. If I validate the training with a 10% fraction of
> > the training dataset itself, I get an MSE error of about 1.0e-04. But if I
> > validate against the dataset generated from train.bm-1.00.bz2 I get an MSE
> > error of 7e-04. About 7 times higher!
> >
> > This makes me believe that the rolled out positions in the race-train-data
> > file were rolled out in another way (different tool, different settings,
> > different neural net?) than the positions in train.bm-1.00.bz2.
>
> Different tool and different neural net.
>
> For the benchmark databases it is recorded as a comment at the beginning
> of the file:
>
> s version 1.93 weights 1.00 moves2plyLimit 20 rolloutLimit 5 nRollOutGames
> 1296 cubeAway 7 include0Ply 1 evalPlies 2 shortCuts 1 osrGames 1296
> osrInRoll 1
>
> This is version 1.93 of the sagnubg tool, using the 1.00 weights file
> (the current one). I rerolled the benchmark databases with it after the
> new weights file was generated.
>
> The training database was rolled out with a slightly modified gnubg
> (merely to have gnubg -t print the rollout results in the right format).
>
> This was done with earlier weights. I didn't keep notes but I think I
> used one intermediate weights set for the race and possibly more than
> one for the crashed net (roll out the training database with the 0.90
> net, train a new net, reroll the training database with it, etc...). For
> the contact net I'm not sure.
>
> In any case, this was with different weights than the current benchmark
> database.
>
> > Joseph? Philippe? Ian? Others? Do you know how these data were generated?
> > Is it maybe worth rolling these positions out again? I do remember that
> > Joseph made a separate rollout tool, but I'm not sure what Philippe did?
>
> It is likely the different errors you got have another cause: as far as
> I can see, the sagnubg tool used for creating the benchmark databases
> doesn't use variance reduction.
>
> That should be enough of a reason to seriously consider rerolling them,
> but we would have to implement variance reduction in sagnubg first or
> use gnubg with some substantial pre- and post-processing.
>
> > (I also remember that the original benchmark was move based, and it
> > calculates the loss based on incorrect moves picked, and that it might not
> > be that interesting if the rollout values are a bit wrong....)
>
> I'm afraid they may not be just a bit wrong. It seems the standard
> deviation of a 1296-trial rollout without variance reduction is larger
> than the vast majority of the "errors" found when running the benchmark.

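As a rough illustration of the point about 1296 trials: for a plain Monte Carlo rollout with independent games and no variance reduction, the standard error of the averaged result is the per-game standard deviation divided by sqrt(n), and sqrt(1296) = 36. The sketch below assumes a per-game standard deviation of 1.0 point purely for illustration; that value and the function names are hypothetical and are not taken from gnubg or sagnubg.

```python
import math


def rollout_standard_error(per_game_sd: float, n_games: int) -> float:
    """Standard error of the mean of n_games independent game results."""
    if n_games <= 0:
        raise ValueError("n_games must be positive")
    return per_game_sd / math.sqrt(n_games)


def games_needed(per_game_sd: float, target_se: float) -> int:
    """Smallest number of independent games whose standard error is <= target_se."""
    if target_se <= 0:
        raise ValueError("target_se must be positive")
    return math.ceil((per_game_sd / target_se) ** 2)


if __name__ == "__main__":
    # ASSUMPTION: 1.0 point per game is a made-up value, not a measured one.
    assumed_sd = 1.0
    se = rollout_standard_error(assumed_sd, 1296)
    print(f"standard error with 1296 games: {se:.4f}")  # 1/36, about 0.0278
    print(f"games needed for SE 0.005: {games_needed(assumed_sd, 0.005)}")
```

Under that assumption, the noise in each rolled-out value is of the same order as, or larger than, many of the move errors a benchmark would try to measure, which is why variance reduction (or many more trials) matters here.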
_______________________________________________
Bug-gnubg mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/bug-gnubg
