Right, I guess so:

#Mon Mar 28 12:17:52 PDT 2011
Training-Eventhash=d61e8fc9af7e230ff91060f27e0d2959
Manifest-Version=1.0
Language=de
useTokenEnd=true
Training-Cutoff=5
Training-Iterations=100
OpenNLP-Version=1.5.0
Timestamp=1301339872213
Component-Name=SentenceDetectorME

Though I also meant a major/minor version that the person doing the build can
provide for the version of the data, not of the OpenNLP software (and don't forget
the data location, e.g.
/Users/chris/model_training/en/me_playing_around_dont_use_in_production :-})
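To make the idea concrete, here is a minimal sketch using only `java.util.Properties`, not any OpenNLP API. It copies an existing manifest and adds builder-supplied key/value pairs. The key names `Data-Version`, `Data-Location`, `Build-Host` and `Build-User` are hypothetical and are not fields OpenNLP defines.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.net.InetAddress;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class ManifestTagger {

    // Returns a copy of the manifest with caller-supplied key/value pairs added.
    // The input manifest is left unchanged. Key names are illustrative only.
    public static Properties tag(Properties manifest, Map<String, String> extra) {
        Properties tagged = new Properties();
        tagged.putAll(manifest);
        for (Map.Entry<String, String> e : extra.entrySet()) {
            tagged.setProperty(e.getKey(), e.getValue());
        }
        return tagged;
    }

    // Best-effort host name lookup; falls back to "unknown".
    static String hostName() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (IOException e) {
            return "unknown";
        }
    }

    public static void main(String[] args) throws IOException {
        // A few of the values from the manifest shown above.
        Properties manifest = new Properties();
        manifest.setProperty("Component-Name", "SentenceDetectorME");
        manifest.setProperty("Language", "de");

        // Hypothetical extra fields supplied by whoever runs the build.
        Map<String, String> extra = new LinkedHashMap<>();
        extra.put("Data-Version", "1.2");
        extra.put("Data-Location",
                "/Users/chris/model_training/en/me_playing_around_dont_use_in_production");
        extra.put("Build-Host", hostName());
        extra.put("Build-User", System.getProperty("user.name", "unknown"));

        StringWriter out = new StringWriter();
        tag(manifest, extra).store(out, null);
        System.out.print(out);
    }
}
```

In a real build the `extra` map could be filled from an Ant script or command-line arguments, as suggested in the quoted mail.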

C
On May 5, 2011, at 9:01 AM, Jörn Kottmann wrote:

> On 5/5/11 5:57 PM, Chris Collins wrote:
>> That is a good idea. I would also consider including a few other optional
>> fields and making it human readable. In the system I work on, all our data
>> gets this type of "body tag"; we include other things like:
>> 
>> - machine it was built on and perhaps the OS user that did the run.
>> - build date
>> - source path to the input data (in this case the training set)
>> - maybe a hash of the training set.
>> - major/minor version number
>> - maybe the training tool could accept a set of arbitrary key/value
>> pairs; that way the above could be defined in an Ant script or what have you.
>> 
>> This way, when you find this model sitting on a disk some day, you can
>> actually figure out whether you trust it. Nothing like going into production
>> with something like this only to find it was built on your intern's laptop
>> as a test that everyone forgot about.
>> 
> 
> That just sounds like what we already write into the model, except the
> machine name, OS and user.
> The model itself is a zip package and includes a manifest containing
> these values.
>
> Maybe we should extend the command-line tooling to display it; then you
> do not need to unpack the zip package.
> 
> Jörn
> 
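Displaying the manifest without unpacking the model could look roughly like the standalone sketch below. It is not part of the OpenNLP command-line tools. It assumes the manifest is stored in the model zip as an entry named `manifest.properties`, which is worth verifying against your OpenNLP version. It reads that one entry in place with `java.util.zip.ZipFile`, so nothing is extracted to disk.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ShowManifest {

    // Assumed name of the manifest entry inside the model zip.
    static final String MANIFEST_ENTRY = "manifest.properties";

    // Loads the manifest entry directly from the zip; nothing is written to disk.
    public static Properties readManifest(String modelPath) throws IOException {
        try (ZipFile zip = new ZipFile(modelPath)) {
            ZipEntry entry = zip.getEntry(MANIFEST_ENTRY);
            if (entry == null) {
                throw new IOException("No " + MANIFEST_ENTRY + " in " + modelPath);
            }
            Properties manifest = new Properties();
            try (InputStream in = zip.getInputStream(entry)) {
                manifest.load(in);
            }
            return manifest;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java ShowManifest <model-file>");
            System.exit(1);
        }
        Properties manifest = readManifest(args[0]);
        // Print keys in sorted order so the output is stable between runs.
        for (String key : new TreeSet<>(manifest.stringPropertyNames())) {
            System.out.println(key + "=" + manifest.getProperty(key));
        }
    }
}
```

If the entry name differs in a given release, only `MANIFEST_ENTRY` needs to change.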
