I was trying to run the OpenNLP POSTagger on Android and noticed that it
took far too long to load a large model (4-5 MB). I had to settle for a
much smaller model to get the POS tagger running in reasonable time on
Android.

I then ran some experiments (on my PC) with the POSTagger, training models
on varying numbers of sentences from the Penn Treebank. Here is what I
found (all of the models are maxent, except the last one, which I believe
is a perceptron model):

#Sentences   Model size   Accuracy   Loading Time
240K         -            84.92      0.111
2000         482 KB       93.35      0.136
4000         818 KB       94.46      0.181
8000         1.3 MB       95.33      0.221
38015        3.7 MB       96.47      0.503
-            5.4 MB       95.95      1.15

While it seems I can accept a small sacrifice in accuracy for a large
improvement in model load time (e.g., the 482 KB model vs. the 5.4 MB
model), I was wondering whether we can expect any improvements in model
load times in upcoming releases?
I am using OpenNLP-1.5.1-incubating.
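For reference, a minimal sketch of the kind of timing harness I used for the
load-time numbers above. The generic timer runs standalone; the OpenNLP-specific
lines are commented out because they need the OpenNLP 1.5.x jar on the
classpath, and the model file name is just an example:

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class LoadTimer {
    // Times an arbitrary loader and prints the elapsed wall-clock time.
    static <T> T timeLoad(String label, Supplier<T> loader) {
        long start = System.nanoTime();
        T result = loader.get();
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println(label + " loaded in " + elapsedMs + " ms");
        return result;
    }

    public static void main(String[] args) {
        // With OpenNLP on the classpath, timing a model load would look like:
        // POSModel model = timeLoad("en-pos-maxent.bin", () -> {
        //     try (InputStream in = new FileInputStream("en-pos-maxent.bin")) {
        //         return new POSModel(in);
        //     } catch (IOException e) {
        //         throw new RuntimeException(e);
        //     }
        // });
        // POSTaggerME tagger = new POSTaggerME(model);

        // Placeholder loader so this sketch runs without the OpenNLP jar:
        String dummy = timeLoad("dummy-model", () -> "model-bytes");
        System.out.println(dummy);
    }
}
```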

Thanks
Vatsan
