I was trying to run the OpenNLP POSTagger on Android and noticed that it took far too long to load a large model (4-5 MB). I had to settle for a much smaller model to get the POS tagger running in reasonable time on Android.
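For reference, this is roughly how I am loading and timing the models (a minimal sketch against the 1.5.1 API; the model file name and test sentence are just placeholders):

    import java.io.FileInputStream;
    import java.io.InputStream;

    import opennlp.tools.postag.POSModel;
    import opennlp.tools.postag.POSTaggerME;

    public class ModelLoadTimer {
        public static void main(String[] args) throws Exception {
            // Time the model deserialization step in isolation.
            long start = System.currentTimeMillis();
            InputStream in = new FileInputStream("en-pos-maxent.bin"); // placeholder model file
            POSModel model = new POSModel(in);
            in.close();
            long elapsed = System.currentTimeMillis() - start;
            System.out.println("Model loaded in " + (elapsed / 1000.0) + " s");

            // Sanity check: tag one pre-tokenized sentence.
            POSTaggerME tagger = new POSTaggerME(model);
            String[] tokens = { "This", "is", "a", "test", "." };
            String[] tags = tagger.tag(tokens);
            for (int i = 0; i < tokens.length; i++) {
                System.out.print(tokens[i] + "/" + tags[i] + " ");
            }
            System.out.println();
        }
    }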
I then ran some experiments on my PC with the POSTagger, training models on varying numbers of sentences from the Penn Treebank. Here is what I found (all the models are maxent, except the last one, which I believe is a perceptron model):

    #Sentences (model size)    Accuracy    Loading Time
    240K                       84.92       0.111
    2000 (482 KB)              93.35       0.136
    4000 (818 KB)              94.46       0.181
    8000 (1.3 MB)              95.33       0.221
    38015 (3.7 MB)             96.47       0.503
    (5.4 MB)                   95.95       1.15

It seems I can accept a small sacrifice in accuracy in exchange for a large improvement in model load time (e.g. a 400 KB model vs. the 5.4 MB model). Still, I was wondering: can we expect any improvements in model load times in upcoming releases? I am using OpenNLP-1.5.1-incubating.

Thanks,
Vatsan
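P.S. In case it helps, the training runs look roughly like this (a sketch against the 1.5.1 API; the file names are placeholders, the training data is assumed to be in word_tag format, and cutoff/iterations are left at the stock defaults of 5 and 100):

    import java.io.BufferedOutputStream;
    import java.io.FileOutputStream;
    import java.io.FileReader;
    import java.io.OutputStream;

    import opennlp.tools.postag.POSModel;
    import opennlp.tools.postag.POSSample;
    import opennlp.tools.postag.POSTaggerME;
    import opennlp.tools.postag.WordTagSampleStream;
    import opennlp.tools.util.ObjectStream;
    import opennlp.tools.util.model.ModelType;

    public class TrainPosModel {
        public static void main(String[] args) throws Exception {
            // Training data: one sentence per line, tokens as word_tag pairs.
            ObjectStream<POSSample> samples =
                    new WordTagSampleStream(new FileReader("ptb-subset.train")); // placeholder file

            // ModelType.MAXENT here; swap in ModelType.PERCEPTRON for the perceptron model.
            // The two nulls skip the optional tag dictionary and ngram dictionary.
            POSModel model = POSTaggerME.train("en", samples, ModelType.MAXENT,
                    null, null, 5, 100);

            // Serialize the trained model to disk for later loading.
            OutputStream out = new BufferedOutputStream(new FileOutputStream("pos-model.bin"));
            model.serialize(out);
            out.close();
        }
    }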
