Re: new tool training

Joern Kottmann Thu, 27 Oct 2016 08:16:17 -0700

On Thu, Oct 27, 2016 at 4:41 PM, Russ, Daniel (NIH/CIT) [E] <
[email protected]> wrote:


> Hello,
>
> Background:
>    I am developing a tool that uses OpenNLP.  I have a model that extends
> BaseModel, and several AbstractModels.  I allow the user (myself) to
> specify the TrainerType (GIS/QN) for each model by using a list of
> TrainingParameters.
>
> Potential Bugs:
>
> 1)    Whenever I use QNTrainer, I get an error (number of threads <1).  I
> think the problem is that the parameters are initialized in the isValid()
> method instead of the init() method.  This works for GIS because in the
> doTrain(DataIndexer) method, the number of threads is a local variable
> taken from the TrainingParameters not a field in GIS.  This leads to
> another question. When it the isValid() method supposed to be called?  I am
> surprised that the TrainerFactory does not call it.
>
>
It should be be called from the factory I think. Currently it is only
called when the training starts in
AbstractEventTrainer.train(ObjectStream<Event>). It is always better to
make things fail as early as possible.

Can you share the exception stack trace? I don't really understand yet why
you get this error with the QNTrainer. I would like to investigate that.


>
> 2)    The psf (public static final) String variables used by the
> TrainingParameters are all over the place.  The variables
> THEADS_(PARAM/DEFAULT) are defined in both QNTrainer and
> TrainingParameters.  It should be defined in one of the places. I am not
> sure that AbstractTrainer isn’t the best place to put THREADS_(P/D).  It
> isn’t just the variables Threads_(P/D), All the Training psf String
> variables from TrainingParameters are duplicated in AbstractTrainer.
>
>
I agree, the commonly used variables should only be in one place. Some
trainer have specific variables which are not shared. It would be nice to
get this re-factored.



>
> 3)    Should the Interface EventTrainer have a doTrainDataIndexer and a
> getDataIndexer method?  This is important to me because I extended
> OnePassDataIndexer to pre-assign the outputs.  I know the outputs aprori,
> and I want to quick combine the results of the multiple models.  Since the
> getEventTrainer returns an EventTrainer instead of an AbstractEventTrainer,
> I cannot call doTrain(DataIndexer).  I cannot use the
> doTrain(ObjectStream<Event>); it creates a new OnePassIndexer.
>


I think it would be fine to add a second train method to EventTrainer which
takes a DataIndexer. The current train method should probably be changed a
bit, and not do the init things.
It would like to understand your usage here a bit better.

So you want to have control over the DataIndexer which is used for
training, right?
Another option could be to have a second train(ObjectStream<Event>,
DataIndexer) method.

And why would you need a getDataIndexer method if you can pass in your own
instance?


Jörn

Re: new tool training

Reply via email to