On Fri, Jun 07, 2019 at 08:30:10PM +0200, Øystein Schønning-Johansen wrote:
> (Of course I remove any position duplicated in the two datasets,
> such that the training and validation set are disjoint.)

Is it really important (in general)? I know one shouldn't use the same
dataset, but is some limited random overlap really an issue? I didn't
verify how limited it is in the case of gnubg's databases, though...

> I train a neural network. If I validate the training with a 10%
> fraction of the training dataset itself, I get an MSE error of about
> 1.0e-04. But if I validate against the dataset generated from
> train.bm-1.00.bz2 I get an MSE error of 7e-04. About 7 times higher!
>
> This makes me believe that the rolled out positions in the
> race-train-data file are rolled out in another way (different tool,
> different settings, different neural net?) than the positions in
> train.bm-1.00.bz2.

Different tool and different neural net.

For the benchmark databases it is recorded as a comment at the
beginning of the file:

    s version 1.93 weights 1.00 moves2plyLimit 20 rolloutLimit 5
    nRollOutGames 1296 cubeAway 7 include0Ply 1 evalPlies 2
    shortCuts 1 osrGames 1296 osrInRoll 1

This is version 1.93 of the sagnubg tool, using the 1.00 weights file
(the current one). I rerolled the benchmark databases with it after the
new weights file was generated.

The training database was rolled out with a slightly modified gnubg
(merely to have gnubg -t print the rollout results in the right
format). This was done with earlier weights. I didn't keep notes, but I
think I used one intermediate weights set for the race net and possibly
more than one for the crashed net (roll out the training database with
the 0.90 net, train a new net, reroll the training database with it,
etc.). For the contact net I'm not sure. In any case, this was with
different weights than the current benchmark database.

> Joseph? Philippe? Ian? Others? Do you know how these data were
> generated? Is it maybe worth rolling these positions out again? I do
> remember that Joseph made a separate rollout tool, but I'm not sure
> what Philippe did?

It is likely that the different errors you got have another cause: as
far as I can see, the sagnubg tool used for creating the benchmark
databases doesn't use variance reduction.

That should be enough of a reason to seriously consider rerolling them,
but we would have to implement variance reduction in sagnubg first, or
use gnubg with some substantial pre- and post-processing.

> (I also remember that the original benchmark was move based, and it
> calculates the loss based on incorrect moves picked, and that it
> might not be that interesting if the rollout values are a bit
> wrong....)

I'm afraid they may not be just a bit wrong. It seems the standard
deviation of a 1296-trial rollout without variance reduction is larger
than the vast majority of the "errors" found when running the
benchmark.
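
A rough order-of-magnitude check on that last claim (a sketch of mine,
not from the original message; the per-game figure of about 1.0 equity
is an assumed value typical of cubeless money play, not something
measured on these databases):

    import math

    # Assumed: one game's cubeless equity outcome has a standard
    # deviation of roughly 1.0 (illustrative figure, not measured).
    per_game_sd = 1.0
    n_games = 1296

    # The standard error of the rollout mean shrinks as 1/sqrt(N).
    standard_error = per_game_sd / math.sqrt(n_games)
    print(f"{n_games}-game rollout standard error: {standard_error:.4f}")
    # ~0.028 equity -- larger than the few-millipoint move "errors"
    # the benchmark typically flags.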
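
To illustrate why variance reduction matters so much here, a toy Monte
Carlo sketch (the luck model and all figures are invented for
illustration; gnubg's actual variance reduction estimates the luck of
each roll with the neural net, which this stand-in does not attempt):

    import random
    import statistics

    random.seed(1)
    TRUE_EQUITY = 0.10  # made-up "true" value of the position
    N_GAMES = 1296

    def one_game():
        # Toy model: the observed result is the true equity plus
        # "luck" (dice variation an evaluator can estimate roll by
        # roll) plus residual noise from the play itself.
        luck = random.gauss(0.0, 0.9)
        noise = random.gauss(0.0, 0.3)
        return TRUE_EQUITY + luck + noise, luck

    def rollout(variance_reduction):
        results = []
        for _ in range(N_GAMES):
            result, luck = one_game()
            # Subtracting the estimated luck keeps the expectation
            # but removes most of the variance (a control variate).
            results.append(result - luck if variance_reduction else result)
        return statistics.mean(results)

    raw = [rollout(False) for _ in range(50)]
    vr = [rollout(True) for _ in range(50)]
    print("sd of plain rollouts:", statistics.stdev(raw))  # ~0.026
    print("sd of VR rollouts:   ", statistics.stdev(vr))   # ~0.008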
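
Going back to the first point above, keeping the sets disjoint is cheap
whatever the answer turns out to be; a minimal sketch (position_key is
a hypothetical stand-in for something like gnubg's position ID string,
not an existing API):

    def disjoint_validation(train_positions, validation_positions,
                            position_key):
        # Drop from the validation set any position that also occurs
        # in the training set, keyed by e.g. its position ID.
        seen = {position_key(p) for p in train_positions}
        return [p for p in validation_positions
                if position_key(p) not in seen]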
