Re: OpenNLP UD models

Jeff Zemerick Sat, 23 Jan 2021 15:59:29 -0800

Thanks!

I think the evaluation results are also in that Dropbox folder. I will
double-check to be sure.


I don't think the Named Entity Finder currently supports the CoNLL-U format
that UD uses. I think we need to add CoNLL-U support to the Named Entity
Finder, and then we can train those models.
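
For anyone unfamiliar with the format, here is a minimal sketch of reading
CoNLL-U in Python. It is an illustration only and not OpenNLP code: the
parse_conllu helper and the sample sentence are made up for this example.
The ten tab-separated column names come from the CoNLL-U specification. How
entity labels would be carried in that layout is left open here.

```python
# Minimal sketch of a CoNLL-U reader (hypothetical helper, not part of OpenNLP).
# CoNLL-U: one token per line, ten tab-separated columns, "#" comment lines,
# and a blank line between sentences.

FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text):
    """Return a list of sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Sentence-level comments such as "# text = ..." are skipped.
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError("expected 10 columns, got %d" % len(cols))
        token = dict(zip(FIELDS, cols))
        # Skip multiword token ranges ("1-2") and empty nodes ("1.1").
        if "-" in token["id"] or "." in token["id"]:
            continue
        current.append(token)
    if current:
        sentences.append(current)
    return sentences


# Made-up sample sentence, for illustration only.
SAMPLE = "\n".join([
    "# text = Jeff trains models.",
    "\t".join(["1", "Jeff", "Jeff", "PROPN", "NNP", "_", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "trains", "train", "VERB", "VBZ", "_", "0", "root", "_", "_"]),
    "\t".join(["3", "models", "model", "NOUN", "NNS", "_", "2", "obj", "_", "SpaceAfter=No"]),
    "\t".join(["4", ".", ".", "PUNCT", ".", "_", "2", "punct", "_", "_"]),
    "",
])

if __name__ == "__main__":
    for sentence in parse_conllu(SAMPLE):
        print([(tok["form"], tok["upos"]) for tok in sentence])
```

The sketch keeps only plain word lines, which is roughly the information the
tokenizer, sentence, and POS trainers need.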

I picked those languages mostly at random. I wanted languages that I
thought might appeal to the most users for a first release. We can
certainly expand the models to each of the other languages in the future,
assuming those languages have sufficient training data to make a decent
model.

Thanks,
Jeff


On Mon, Jan 18, 2021 at 12:50 PM William Colen <[email protected]>
wrote:

> Hello Jeff! Nice work!!
>
> Did you store the evaluation results somewhere?
>
> Does UD have Named Entity annotation? Do you have any reference to share?
>
> Why did you select only these languages? Any restrictions?
>
> Thank you
> William
>
> On Sun, Jan 17, 2021 at 9:15 PM Jeff Zemerick <[email protected]>
> wrote:
>
> > Thanks, Bruno.
> >
> > If there aren't any major concerns I will kick off a VOTE thread for
> > releasing these models.
> >
> > The overall plan is to:
> >
> > 1. Release these models by making them available for download on the
> > website.
> > 2. Submit the pull request to enable automatic downloading for the
> > tokenizer, sentence, and POS tagger models.
> > 3. Update the user's guide and release a new version.
> > 4. Get NameFinder models trained and available.
> > 5. Establish a more automated and documented process for training the
> > models.
> >
> > Always open to suggestions and comments! Otherwise watch for a VOTE
> > thread over the next few days.
> >
> > Thanks,
> > Jeff
> >
> >
> > On Wed, Jan 6, 2021 at 7:24 PM Bruno P. Kinoshita <[email protected]>
> > wrote:
> >
> > >  Hi Jeff,
> > >
> > > Cannot comment much on the process or direction, except that it looks
> > > good to me.
> > >
> > > > While decent performance is always beneficial, the primary purpose
> > > > of this task is to provide working OpenNLP models the project can
> > > > distribute. Having these models will help reduce the barrier to entry
> > > > for users new to OpenNLP.
> > >
> > > +1! I had a read of the UD page; it looks well maintained and even
> > > includes a pt-br dataset.
> > >
> > > Thanks!
> > > Bruno
> > >
> > >
> > >
> > >
> > >
> > >     On Wednesday, 6 January 2021, 11:31:32 am NZDT, Jeff Zemerick <
> > > [email protected]> wrote:
> > >
> > >  Hi all,
> > >
> > > I have created a script [1] to train OpenNLP models from Universal
> > > Dependencies [2] data, giving us OpenNLP models that can be distributed
> > > under the Apache license.
> > >
> > > The script automates the training of tokenizer, sentence, and POS models
> > > for English, Dutch, French, German, and Italian. (The NameFinder does not
> > > currently support the input annotation format, so those models will come
> > > later.) While decent performance is always beneficial, the primary purpose
> > > of this task is to provide working OpenNLP models the project can
> > > distribute. Having these models will help reduce the barrier to entry for
> > > users new to OpenNLP.
> > >
> > > Once voted and approved, the trained models will be pushed to Subversion
> > > alongside the current OpenNLP language detection model. From there, the
> > > models can be made available for download on the OpenNLP website and
> > > programmatically through OPENNLP-1318 [3]. The script to train the models
> > > and instructions will be added to the OpenNLP repository.
> > >
> > > To use the script:
> > >
> > > 1. Download and extract UD.
> > > 2. Download and extract OpenNLP.
> > > 3. Create a directory to store the trained models.
> > > 4. Modify the ud-train.sh script to set the path to those three
> > > directories.
> > > 5. Execute the ud-train.sh script.
> > >
> > > The training log, evaluation output, and model files will be saved to the
> > > $OUTPUT_MODELS directory. The models I trained with this script, along
> > > with their output files, can be viewed on Dropbox [4].
> > >
> > > Before calling a vote to release the models, I would like to see if there
> > > is any feedback on the process or direction. If you have any comments,
> > > please feel free to share them.
> > >
> > > Thanks,
> > > Jeff
> > >
> > > [1] https://github.com/jzonthemtn/opennlp/blob/ud-models/scripts/ud-train.sh
> > > [2] https://universaldependencies.org/
> > > [3] https://issues.apache.org/jira/browse/OPENNLP-1318
> > > [4] https://www.dropbox.com/sh/p8focuz0qwvw84b/AAC6GqO8mqZn_xkAqHZsVAsoa?dl=0
> > >
> >
>
