Just today I was testing the TokenizerTrainer and found a bug with the isSkipAlphaNumerics parameter: in the initialize() method it is also declared as a local variable, which shadows the instance variable. The instance variable therefore never gets assigned, and this causes an NPE in collectionProcessComplete(). The fix is simply to remove the "Boolean" type declaration at line 111 of TokenizerTrainer [1], so that the configuration parameter value is assigned to the instance variable.
Tommaso
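To make the shadowing problem concrete, here is a minimal, self-contained sketch. It is NOT the actual TokenizerTrainer code and does not use the UIMA API: the class names and the "skipAlphaNumerics" key are hypothetical, and a plain Map stands in for the configuration parameter lookup.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the trainer (no UIMA dependency).
// A Map plays the role of the configuration parameter lookup.
public class ShadowingDemo {

    static class BuggyTrainer {
        private Boolean isSkipAlphaNumerics; // instance field, stays null

        void initialize(Map<String, Object> config) {
            // Bug: the "Boolean" type declaration creates a new local variable
            // that shadows the field, so the field is never assigned.
            Boolean isSkipAlphaNumerics = (Boolean) config.get("skipAlphaNumerics");
        }

        boolean collectionProcessComplete() {
            // Auto-unboxing the still-null field throws a NullPointerException.
            return isSkipAlphaNumerics;
        }
    }

    static class FixedTrainer {
        private Boolean isSkipAlphaNumerics;

        void initialize(Map<String, Object> config) {
            // Fix: no type declaration, so this assigns the instance field.
            isSkipAlphaNumerics = (Boolean) config.get("skipAlphaNumerics");
        }

        boolean collectionProcessComplete() {
            return isSkipAlphaNumerics;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> config = new HashMap<>();
        config.put("skipAlphaNumerics", Boolean.TRUE);

        BuggyTrainer buggy = new BuggyTrainer();
        buggy.initialize(config);
        try {
            buggy.collectionProcessComplete();
            System.out.println("buggy: no exception");
        } catch (NullPointerException e) {
            System.out.println("buggy: NullPointerException");
        }

        FixedTrainer fixed = new FixedTrainer();
        fixed.initialize(config);
        System.out.println("fixed: " + fixed.collectionProcessComplete());
    }
}
```

Running main() should print "buggy: NullPointerException" followed by "fixed: true". javac accepts the buggy version without complaint (at most an IDE warns about the unused local), which is why this kind of bug tends to surface only at runtime.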
[1]: http://svn.apache.org/viewvc/incubator/opennlp/trunk/opennlp-uima/src/main/java/opennlp/uima/tokenize/TokenizerTrainer.java?view=markup

2011/4/1 Tommaso Teofili <[email protected]>

> 2011/4/1 Jörn Kottmann <[email protected]>
>
>> On 4/1/11 12:58 PM, Tommaso Teofili wrote:
>>
>>> One issue I found is that the opennlp.uima.Language parameter is not
>>> defined in the trainers' descriptors, causing them to fail during
>>> initialization, since the *Trainer classes need the language as a
>>> mandatory parameter (which is good, I think, since the statistical model
>>> built is language-dependent). Am I right or am I missing something?
>>
>> No, that really sounds like a mistake; it seems I simply forgot to put the
>> parameter declaration into the descriptor. I will change it on Monday, or
>> of course a patch is welcome :)
>
> I didn't run into any other issues; I will provide a patch for the
> descriptors tomorrow or Sunday :)
> Tommaso
>
>>> p.s.: Also, within that failure case it seems the
>>> org.apache.uima.UIMAException_Messages is missing, but I'd not consider
>>> this a bug at the moment, since I am doing tests in a separate project
>>> which could need some tweaks; I thought it was still useful to report.
>>
>> I will have a look, thanks for pointing it out; even if it's not a bug, we
>> might want to improve it.
>>
>> Jörn
