Re: OpenNLP UD models

William Colen Mon, 18 Jan 2021 09:50:47 -0800

Hello Jeff! Nice work!!

Did you store the evaluation results somewhere?


Does UD have Named Entity annotation? Do you have any reference to share?

Why did you select only these languages? Any restrictions?

Thank you
William

On Sun, Jan 17, 2021 at 21:15, Jeff Zemerick <jzemer...@apache.org>
wrote:

> Thanks, Bruno.
>
> If there aren't any major concerns I will kick off a VOTE thread for
> releasing these models.
>
> The overall plan is to:
>
> 1. Release these models by making them available for download on the
> website.
> 2. Submit the pull request to enable automatic downloading for the
> tokenizer, sentence, and POS tagger models.
> 3. Update the user's guide and release a new version.
> 4. Get NameFinder models trained and available.
> 5. Establish a more automated and documented process for training the
> models.
>
> Always open to suggestions and comments! Otherwise watch for a VOTE
> thread over the next few days.
>
> Thanks,
> Jeff
>
>
> On Wed, Jan 6, 2021 at 7:24 PM Bruno P. Kinoshita <ki...@apache.org>
> wrote:
>
> >  Hi Jeff,
> >
> > Cannot comment much on the process or direction, except that it looks
> > good to me.
> >
> > > While decent performance is always beneficial, the primary purpose
> > > of this task is to provide working OpenNLP models the project can
> > > distribute. Having these models will help reduce the barrier to entry
> > > for users new to OpenNLP.
> >
> > +1! Had a read of the UD page, and it looks well maintained, and even
> > includes a pt-br dataset.
> >
> > Thanks!
> > Bruno
> >
> >
> >
> >
> >
> >     On Wednesday, 6 January 2021, 11:31:32 am NZDT, Jeff Zemerick <
> > jzemer...@apache.org> wrote:
> >
> >  Hi all,
> >
> > I have created a script [1] to train OpenNLP models from Universal
> > Dependencies [2] data to give OpenNLP models that can be distributed
> > under the Apache license.
> >
> > The script automates the training of tokenizer, sentence, and POS models
> > for English, Dutch, French, German, and Italian. (The NameFinder does not
> > currently support the input annotation format so those models will come
> > later.) While decent performance is always beneficial, the primary
> > purpose of this task is to provide working OpenNLP models the project
> > can distribute. Having these models will help reduce the barrier to
> > entry for users new to OpenNLP.
> >
> > Once voted and approved, the trained models will be pushed to Subversion
> > alongside the current OpenNLP language detection model. From there, the
> > models can be made available for download on the OpenNLP website and
> > programmatically through OPENNLP-1318 [3]. The script to train the models
> > and instructions will be added to the OpenNLP repository.
> >
> > To use the script:
> >
> > 1. Download and extract UD.
> > 2. Download and extract OpenNLP.
> > 3. Create a directory to store the trained models.
> > 4. Modify the ud-train.sh script to set the paths to those three
> > directories.
> > 5. Execute the ud-train.sh script.
> >
> > The training log, evaluation output, and model files will be saved to the
> > $OUTPUT_MODELS directory. The models and output files I produced with
> > this script can be viewed on Dropbox [4].
> >
> > Before calling a vote to release the models, I would like to see if there
> > is any feedback on the process or direction. If you have any comments,
> > please feel free to share them.
> >
> > Thanks,
> > Jeff
> >
> > [1]
> > https://github.com/jzonthemtn/opennlp/blob/ud-models/scripts/ud-train.sh
> > [2] https://universaldependencies.org/
> > [3] https://issues.apache.org/jira/browse/OPENNLP-1318
> > [4]
> > https://www.dropbox.com/sh/p8focuz0qwvw84b/AAC6GqO8mqZn_xkAqHZsVAsoa?dl=0
> >
>
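
For readers following the "To use the script" steps in Jeff's original message, here is a rough sketch of what a training run of this kind might look like. It is not the actual ud-train.sh. The variable names UD_HOME and OPENNLP_HOME, the DRY_RUN switch, the model file names, and the example treebank (UD_English-EWT) are all assumptions made for illustration; only $OUTPUT_MODELS is named in the thread. The trainer names and flags follow the OpenNLP command-line tools as I understand them, so check the usage output of each tool before relying on them. Evaluation and log capture are left out.

```shell
#!/bin/sh
# Hypothetical sketch of a UD training run; NOT the actual ud-train.sh.

# The three directories from the steps in the message. UD_HOME and
# OPENNLP_HOME are assumed names; OUTPUT_MODELS is the name used in the thread.
UD_HOME="${UD_HOME:-./ud-treebanks}"
OPENNLP_HOME="${OPENNLP_HOME:-./apache-opennlp}"
OUTPUT_MODELS="${OUTPUT_MODELS:-./ud-models}"

# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: train_language LANGCODE TREEBANK_DIR FILE_PREFIX
# Trains tokenizer, sentence detector, and POS tagger models from one
# CoNLL-U training file.
train_language() {
  lang="$1"
  data="$UD_HOME/$2/$3-ud-train.conllu"
  opennlp="$OPENNLP_HOME/bin/opennlp"

  run "$opennlp" TokenizerTrainer.conllu \
    -model "$OUTPUT_MODELS/$lang-tokens.bin" \
    -lang "$lang" -data "$data" -encoding UTF-8
  run "$opennlp" SentenceDetectorTrainer.conllu \
    -model "$OUTPUT_MODELS/$lang-sentence.bin" \
    -lang "$lang" -data "$data" -encoding UTF-8 -sentencesPerSample 10
  run "$opennlp" POSTaggerTrainer.conllu \
    -model "$OUTPUT_MODELS/$lang-pos.bin" \
    -lang "$lang" -data "$data" -encoding UTF-8
}

run mkdir -p "$OUTPUT_MODELS"
# Example treebank only; the thread does not say which treebanks were used.
train_language en UD_English-EWT en_ewt
```

With the defaults above the sketch only prints the commands it would run. Setting DRY_RUN=0 and pointing the three variables at real directories would execute them, and adding one train_language call per language would cover the other languages mentioned in the message.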
