Hi Joern,

The training data I generated from the Reuters dataset is in the following
format. The process produced three files: eng.train, eng.testa, and eng.testb.

A DT I-NP O
rare JJ I-NP O
early JJ I-NP O
handwritten JJ I-NP O
draft NN I-NP O
of IN I-PP O
a DT I-NP O
song NN I-NP O
by IN I-PP O
U.S. NNP I-NP I-LOC
guitar NN I-NP O
legend NN I-NP O
Jimi NNP I-NP I-PER

When I ran the following command using this training data file:
./opennlp TokenNameFinderTrainer -model en-ner-person.bin -lang en -data /home/centos/ner/eng.train -encoding UTF-8

I got the following error:
ERROR: Not enough training data
The provided training data is not sufficient to create enough events to
train a model.
To resolve this error use more training data, if this doesn't help there might
be some fundamental problem with the training data itself.

The format required for training OpenNLP models is in the form of sentences,
but the training data prepared from the Reuters dataset is in the column
format shown above. Could you please tell me how training data can be
generated in the required format, or how the existing training data format
can be used for generating models?
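
For illustration, the conversion I am asking about could be sketched roughly
as below. This is only a minimal sketch under assumptions, not a confirmed
OpenNLP recipe: it assumes the last column holds the entity tag (O, I-PER,
B-PER, ...), that blank lines and -DOCSTART- lines separate sentences, and
that the lowercased tag suffix (per, loc, ...) is acceptable as the type name
inside the <START:type> ... <END> markup described in the manual. The function
name conll_to_opennlp is hypothetical. The manual also appears to describe a
CoNLL 2003 input option for the trainer, which may make a manual conversion
unnecessary; that should be checked against the OpenNLP version in use.

```python
def conll_to_opennlp(lines):
    """Turn CoNLL-2003 style column lines into one marked-up sentence per string.

    Assumption: first column is the token, last column is the entity tag.
    """
    sentences = []
    tokens = []
    open_type = None  # entity type of the span currently open, if any

    def close_span():
        nonlocal open_type
        if open_type is not None:
            tokens.append("<END>")
            open_type = None

    def flush():
        # End the current sentence, closing any entity span still open.
        close_span()
        if tokens:
            sentences.append(" ".join(tokens))
            tokens.clear()

    for raw in lines:
        line = raw.strip()
        # Blank lines and document markers end the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            flush()
            continue
        cols = line.split()
        word, tag = cols[0], cols[-1]
        if tag == "O":
            close_span()
        else:
            prefix, etype = tag.split("-", 1)
            etype = etype.lower()
            # A B- tag or a change of type starts a new entity span.
            if prefix == "B" or etype != open_type:
                close_span()
                tokens.append("<START:%s>" % etype)
                open_type = etype
        tokens.append(word)
    flush()
    return sentences


if __name__ == "__main__":
    sample = [
        "by IN I-PP O",
        "U.S. NNP I-NP I-LOC",
        "guitar NN I-NP O",
        "legend NN I-NP O",
        "Jimi NNP I-NP I-PER",
    ]
    for sentence in conll_to_opennlp(sample):
        print(sentence)
```

The sketch drops document boundaries; if the trainer should see them, an empty
line could be written wherever a -DOCSTART- line occurred.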

With Regards
Madhvi Gupta
*(Senior Software Engineer)*

On Mon, Feb 20, 2017 at 5:52 PM, Joern Kottmann <kottm...@gmail.com> wrote:

> Please explain to us what is not working. Any error messages or exceptions?
>
> The name finder by default trains on the default format, which you can see
> in the documentation link I shared.
>
> Jörn
>
> On Mon, Feb 20, 2017 at 6:04 AM, Madhvi Gupta <mgmahi....@gmail.com>
> wrote:
>
> > Hi Joern,
> >
> > I have got the data from the following link, which consists of a corpus
> > of news articles.
> > http://trec.nist.gov/data/reuters/reuters.html
> >
> > Following the steps given in the link below, I have created training and
> > test data, but it is not working with the NameFinder of the OpenNLP API.
> > http://www.clips.uantwerpen.be/conll2003/ner/000README
> >
> > So can you please help me create training data out of that corpus
> > and use it to create named entity detection models?
> >
> > With Regards
> > Madhvi Gupta
> > *(Senior Software Engineer)*
> >
> > On Mon, Feb 20, 2017 at 1:00 AM, Joern Kottmann <kottm...@gmail.com>
> > wrote:
> >
> > > Hello,
> > >
> > > to train the name finder you need training data that contains the
> > > entities you would like to detect.
> > > Is that the case with the data you have?
> > >
> > > Take a look at our documentation:
> > > https://opennlp.apache.org/documentation/1.7.2/manual/
> > > opennlp.html#tools.namefind.training
> > >
> > > At the beginning of that section you can see how the data has to be
> > > marked up.
> > >
> > > Please note that you need many sentences to train the name finder.
> > >
> > > HTH,
> > > Jörn
> > >
> > >
> > > On Sat, Feb 18, 2017 at 11:28 AM, Madhvi Gupta <mgmahi....@gmail.com>
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > I have got Reuters data from NIST. Now I want to generate training
> > > > data from it to create a model for detecting named entities. Can
> > > > anyone tell me how the models can be generated from that?
> > > >
> > > > --
> > > > With Regards
> > > > Madhvi Gupta
> > > > *(Senior Software Engineer)*
> > > >
