Hello
Has anyone already used the UIMA TokenizerTrainer component? I am a
bit confused, since it does not create any model file.
This is what I get on stdout:
Indexing events using cutoff of 5
Computing event counts...
done. 69669 events
Indexing... done.
Sorting and merging events... done. Reduced 69669 events to 16467.
Done indexing.
Incorporating indexed data for training...
done.
Number of Event Tokens: 16467
Number of Outcomes: 1
Number of Predicates: 5624
...done.
Computing model parameters...
Performing 100 iterations.
1: .. loglikelihood=0.0 1.0
2: .. loglikelihood=0.0 1.0
This looks like a problem I ran into when I trained the model from the
command line without using the '<SPLIT>' tag (note the single outcome
and the log-likelihood of 0.0 above). The difference is that on the
command line I also got the following exception:
Exception in thread "main" java.lang.IllegalArgumentException: The
maxent model is not compatible!
I solved that problem by adding the tag, as described in the post
"maxent model is not compatible with Tokenizer training" (Fri, 13 May,
09:33):
http://mail-archives.apache.org/mod_mbox/incubator-opennlp-users/201105.mbox/browser
Does anyone know whether it is the same problem? If so, how do I
specify the '<SPLIT>' tag in the UIMA version? As far as I understand
its role, it is important to give the user the possibility of setting
it.
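
For reference, here is a minimal sketch of what the tag does in the
command-line training data: one sentence per line, tokens separated
either by whitespace or by '<SPLIT>' where there is no whitespace
between them. The helper name to_training_line is hypothetical and is
not part of OpenNLP or UIMA; it only illustrates the data format and
says nothing about how the UIMA trainer should be configured.

```python
SPLIT_TAG = "<SPLIT>"


def to_training_line(sentence, tokens):
    """Rebuild a sentence as one tokenizer training line.

    Hypothetical helper (not an OpenNLP API). Adjacent tokens that are
    separated by whitespace in the original sentence are joined with a
    single space; adjacent tokens with nothing between them are joined
    with the <SPLIT> tag.
    """
    parts = []
    pos = 0
    for i, token in enumerate(tokens):
        # Locate the token at or after the current position.
        start = sentence.index(token, pos)
        gap = sentence[pos:start]
        if gap.strip():
            raise ValueError("text not covered by any token: %r" % gap)
        if i > 0:
            parts.append(" " if gap else SPLIT_TAG)
        parts.append(token)
        pos = start + len(token)
    if sentence[pos:].strip():
        raise ValueError("text not covered by any token: %r" % sentence[pos:])
    return "".join(parts)


if __name__ == "__main__":
    print(to_training_line("Hello, world!", ["Hello", ",", "world", "!"]))
    # prints: Hello<SPLIT>, world<SPLIT>!
```

If no line of the training data contains the tag, every candidate
position gets the same "no split" label, which would be consistent with
the "Number of Outcomes: 1" line in the log above.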
More generally, I am interested in any feedback from people who have
successfully built models with the UIMA OpenNLP *Trainer components.
So far, I have also had some trouble with the SentenceTrainer, and I
have not tested the others.
/Nicolas
--
http://enicolashernandez.blogspot.com
http://www.univ-nantes.fr/hernandez-n
#
Laboratoire LINA-TALN CNRS UMR 6241
tel. +33 (0)2 51 12 58 55
#
Université de Nantes - Institut Universitaire de Technologie -
Département Informatique
tel. +33 (0)2 40 30 60 67