That is a good idea. I would also consider including a few other optional fields and making the result human-readable. In the system I work on, all our data gets this kind of "body tag", and we include other things like:
- the machine it was built on, and perhaps the OS user who did the run
- build date
- source path to the input data (in this case the training set)
- maybe a hash of the training set
- major/minor version number
- maybe the training tool could let you pass a set of arbitrary key/value pairs, so the above could be defined in an Ant script or what have you

This way, when you find this model sitting on a disk some day, you can actually figure out whether you trust it. There is nothing like going into production with a model only to find it was built on your intern's laptop as a test that everyone forgot about.

Best,
C

On May 5, 2011, at 6:39 AM, Jörn Kottmann wrote:

> On 5/3/11 5:05 PM, Jason Baldridge wrote:
>> Sure. But that proposal will involve blasting things apart. ;)
>>
>
> What do you think about defining some kind of training attribute file,
> which specifies all the parameters needed to train a model?
>
> This file could contain the training algorithm combined with several
> attributes, e.g. cutoff, iterations, etc. The attributes could also be
> algorithm-dependent, e.g. for Perceptron there could be a property which
> defines the number of iterations over which the accuracy must stay
> identical in order to stop.
>
> Such a file would make our code simpler in some places, e.g. command line
> argument handling, writing these attributes into the model packages,
> simple APIs for training with all kinds of parameters, etc.
>
> Any opinions?
>
> Jörn
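To make the "body tag" idea above concrete, here is a minimal sketch in Python of a human-readable tag that combines the suggested provenance fields (host, OS user, build date, training data path and hash, version, arbitrary key/value pairs) with the training parameters from Jörn's attribute-file proposal. It is an illustration under stated assumptions, not OpenNLP's actual model format: every key name (`build.host`, `training.param.*`, `user.*`, and so on) and every function name is hypothetical, and the `key=value` text is a simplified, escape-free format loosely in the spirit of Java `.properties` files.

```python
import getpass
import hashlib
import platform
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, Mapping, Optional


def sha256_of(path: Path) -> str:
    """Hash the training data in 64 KiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def current_user() -> str:
    """Return the OS user running the build, or 'unknown' if it cannot be determined."""
    try:
        return getpass.getuser()
    except Exception:  # e.g. no USER/LOGNAME variable and no passwd entry
        return "unknown"


def build_model_tag(
    training_data: Path,
    version: str,
    params: Mapping[str, object],
    extra: Optional[Mapping[str, object]] = None,
) -> Dict[str, str]:
    """Collect provenance and training parameters into a flat key/value tag.

    All key names are hypothetical, chosen for illustration only.
    """
    tag = {
        "build.host": platform.node(),
        "build.user": current_user(),
        "build.date": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "training.data.path": str(training_data.resolve()),
        "training.data.sha256": sha256_of(training_data),
        "model.version": version,
    }
    # Training parameters such as the algorithm, cutoff and iterations.
    for key, value in params.items():
        tag["training.param." + key] = str(value)
    # Arbitrary key/value pairs, e.g. passed through from a build script.
    for key, value in (extra or {}).items():
        tag["user." + key] = str(value)
    return tag


def to_properties(tag: Mapping[str, str]) -> str:
    """Render the tag as sorted key=value lines (no escaping: values must be single-line)."""
    return "".join(f"{key}={value}\n" for key, value in sorted(tag.items()))


def parse_properties(text: str) -> Dict[str, str]:
    """Read key=value lines back, skipping blank lines and '#' comments."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        result[key.strip()] = value.strip()
    return result
```

For example, `to_properties(build_model_tag(Path("train.txt"), "1.0", {"Algorithm": "PERCEPTRON", "Cutoff": 5}))` would produce lines such as `model.version=1.0` and `training.param.Cutoff=5`. Sorting the keys keeps the output stable and easy to diff. The stored hash lets you check later whether the training file still on disk is really the one the model was built from.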
