On Thu, Oct 27, 2016 at 4:41 PM, Russ, Daniel (NIH/CIT) [E] < [email protected]> wrote:
> Hello, > > Background: > I am developing a tool that uses OpenNLP. I have a model that extends > BaseModel, and several AbstractModels. I allow the user (myself) to > specify the TrainerType (GIS/QN) for each model by using a list of > TrainingParameters. > > Potential Bugs: > > 1) Whenever I use QNTrainer, I get an error (number of threads <1). I > think the problem is that the parameters are initialized in the isValid() > method instead of the init() method. This works for GIS because in the > doTrain(DataIndexer) method, the number of threads is a local variable > taken from the TrainingParameters not a field in GIS. This leads to > another question. When it the isValid() method supposed to be called? I am > surprised that the TrainerFactory does not call it. > > It should be be called from the factory I think. Currently it is only called when the training starts in AbstractEventTrainer.train(ObjectStream<Event>). It is always better to make things fail as early as possible. Can you share the exception stack trace? I don't really understand yet why you get this error with the QNTrainer. I would like to investigate that. > > 2) The psf (public static final) String variables used by the > TrainingParameters are all over the place. The variables > THEADS_(PARAM/DEFAULT) are defined in both QNTrainer and > TrainingParameters. It should be defined in one of the places. I am not > sure that AbstractTrainer isn’t the best place to put THREADS_(P/D). It > isn’t just the variables Threads_(P/D), All the Training psf String > variables from TrainingParameters are duplicated in AbstractTrainer. > > I agree, the commonly used variables should only be in one place. Some trainer have specific variables which are not shared. It would be nice to get this re-factored. > > 3) Should the Interface EventTrainer have a doTrainDataIndexer and a > getDataIndexer method? This is important to me because I extended > OnePassDataIndexer to pre-assign the outputs. I know the outputs aprori, > and I want to quick combine the results of the multiple models. Since the > getEventTrainer returns an EventTrainer instead of an AbstractEventTrainer, > I cannot call doTrain(DataIndexer). I cannot use the > doTrain(ObjectStream<Event>); it creates a new OnePassIndexer. > I think it would be fine to add a second train method to EventTrainer which takes a DataIndexer. The current train method should probably be changed a bit, and not do the init things. It would like to understand your usage here a bit better. So you want to have control over the DataIndexer which is used for training, right? Another option could be to have a second train(ObjectStream<Event>, DataIndexer) method. And why would you need a getDataIndexer method if you can pass in your own instance? Jörn
