Thanks! I think the evaluation results are also in that Dropbox folder. I will double-check to be sure.
I don't think the Named Entity Finder currently supports the CoNLL-U format
that UD uses. I think we need to add CoNLL-U support to the Named Entity
Finder, and then we can train those models.

I picked those languages mostly at random. I wanted languages that I thought
might appeal to the most users for a first release. We can certainly expand
the models to the other languages in the future, assuming those languages
have sufficient training data to make a decent model.

Thanks,
Jeff

On Mon, Jan 18, 2021 at 12:50 PM William Colen <[email protected]> wrote:

> Hello Jeff! Nice work!!
>
> Did you store the evaluation results somewhere?
>
> Does UD have Named Entity annotation? Do you have any reference to share?
>
> Why did you select only these languages? Any restrictions?
>
> Thank you
> William
>
> On Sun, Jan 17, 2021 at 9:15 PM, Jeff Zemerick <[email protected]> wrote:
>
> > Thanks, Bruno.
> >
> > If there aren't any major concerns I will kick off a VOTE thread for
> > releasing these models.
> >
> > The overall plan is to:
> >
> > 1. Release these models by making them available for download on the
> >    website.
> > 2. Submit the pull request to enable automatic downloading for the
> >    tokenizer, sentence, and POS tagger models.
> > 3. Update the user's guide and release a new version.
> > 4. Get NameFinder models trained and available.
> > 5. Establish a more automated and documented process for training the
> >    models.
> >
> > Always open to suggestions and comments! Otherwise watch for a VOTE
> > thread over the next few days.
> >
> > Thanks,
> > Jeff
> >
> > On Wed, Jan 6, 2021 at 7:24 PM Bruno P. Kinoshita <[email protected]> wrote:
> >
> > > Hi Jeff,
> > >
> > > Cannot comment much on the process or direction, except that it looks
> > > good to me.
> > >
> > > > While decent performance is always beneficial, the primary purpose
> > > > of this task is to provide working OpenNLP models the project can
> > > > distribute.
> > > > Having these models will help reduce the barrier to entry for
> > > > users new to OpenNLP.
> > >
> > > +1! Had a read on the UD page, and it looks well maintained, and even
> > > includes a pt-br dataset.
> > >
> > > Thanks!
> > > Bruno
> > >
> > > On Wednesday, 6 January 2021, 11:31:32 am NZDT, Jeff Zemerick
> > > <[email protected]> wrote:
> > >
> > > Hi all,
> > >
> > > I have created a script [1] to train OpenNLP models from Universal
> > > Dependencies [2] data to give OpenNLP models that can be distributed
> > > under the Apache license.
> > >
> > > The script automates the training of tokenizer, sentence, and POS
> > > models for English, Dutch, French, German, and Italian. (The NameFinder
> > > does not currently support the input annotation format so those models
> > > will come later.) While decent performance is always beneficial, the
> > > primary purpose of this task is to provide working OpenNLP models the
> > > project can distribute. Having these models will help reduce the
> > > barrier to entry for users new to OpenNLP.
> > >
> > > Once voted and approved, the trained models will be pushed to
> > > Subversion alongside the current OpenNLP language detection model.
> > > From there, the models can be made available for download on the
> > > OpenNLP website and programmatically through OPENNLP-1318 [3]. The
> > > script to train the models and instructions will be added to the
> > > OpenNLP repository.
> > >
> > > To use the script:
> > >
> > > 1. Download and extract UD.
> > > 2. Download and extract OpenNLP.
> > > 3. Create a directory to store the trained models.
> > > 4. Modify the ud-train.sh script to set the paths to those three
> > >    directories.
> > > 5. Execute the ud-train.sh script.
> > >
> > > The training log, evaluation output, and model files will be saved to
> > > the $OUTPUT_MODELS directory. Models and the output files I trained
> > > using this script can be viewed on Dropbox [4].
> > >
> > > Before calling a vote to release the models, I would like to see if
> > > there is any feedback on the process or direction. If you have any
> > > comments, please feel free to share them.
> > >
> > > Thanks,
> > > Jeff
> > >
> > > [1] https://github.com/jzonthemtn/opennlp/blob/ud-models/scripts/ud-train.sh
> > > [2] https://universaldependencies.org/
> > > [3] https://issues.apache.org/jira/browse/OPENNLP-1318
> > > [4] https://www.dropbox.com/sh/p8focuz0qwvw84b/AAC6GqO8mqZn_xkAqHZsVAsoa?dl=0
