Re: How to train a Named entity detection model

Madhvi Gupta Sun, 26 Feb 2017 21:43:01 -0800

Please let me know if anyone has any idea about this.

With Regards
Madhvi Gupta
*(Senior Software Engineer)*


On Tue, Feb 21, 2017 at 10:51 AM, Madhvi Gupta <mgmahi....@gmail.com> wrote:

> Hi Joern,
>
> The training data generated from the Reuters dataset is in the following
> format. The process produced three files: eng.train, eng.testa, and eng.testb.
>
> A DT I-NP O
> rare JJ I-NP O
> early JJ I-NP O
> handwritten JJ I-NP O
> draft NN I-NP O
> of IN I-PP O
> a DT I-NP O
> song NN I-NP O
> by IN I-PP O
> U.S. NNP I-NP I-LOC
> guitar NN I-NP O
> legend NN I-NP O
> Jimi NNP I-NP I-PER
>
> I ran the following command using this training data file:
> ./opennlp TokenNameFinderTrainer -model en-ner-person.bin -lang en -data
> /home/centos/ner/eng.train -encoding UTF-8
>
> It is giving me the following error:
> ERROR: Not enough training data
> The provided training data is not sufficient to create enough events to
> train a model.
> To resolve this error use more training data, if this doesn't help there might
> be some fundamental problem with the training data itself.
>
> OpenNLP models require training data in the form of sentences, but the
> training data prepared from the Reuters dataset is in the format shown
> above. Please tell me how the training data can be generated in the
> required format, or how the existing format can be used for generating
> models.
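
For illustration, a minimal sketch of one way to turn the column format shown
above into the one-sentence-per-line format described in the name finder
training section of the OpenNLP manual, where names are wrapped as
<START:type> ... <END>. It rests on several assumptions: the last column holds
the entity tag, a blank line ends a sentence, -DOCSTART- lines mark document
boundaries, and the output type names (person, location, ...) are chosen here
only for illustration. The function name conll_to_opennlp and the TYPE_NAMES
mapping are hypothetical and not part of OpenNLP. The corpora chapter of the
manual also appears to describe built-in support for the CoNLL 2003 data,
which may make a manual conversion unnecessary.

```python
# Assumed mapping from CoNLL-2003 tag suffixes to output type names.
TYPE_NAMES = {"PER": "person", "LOC": "location",
              "ORG": "organization", "MISC": "misc"}


def conll_to_opennlp(lines, types=("PER",)):
    """Convert CoNLL-2003 style lines into name finder training lines.

    Returns a list of strings, one per sentence. An empty string is
    inserted at document boundaries (-DOCSTART- lines).
    """
    out = []        # finished output lines
    tokens = []     # tokens and markers of the sentence being built
    current = None  # entity type of the span that is currently open

    def close_span():
        nonlocal current
        if current is not None:
            tokens.append("<END>")
            current = None

    def end_sentence():
        close_span()
        if tokens:
            out.append(" ".join(tokens))
            tokens.clear()

    for raw in lines:
        line = raw.strip()
        if not line:                      # blank line: sentence boundary
            end_sentence()
            continue
        if line.startswith("-DOCSTART-"):  # document boundary
            end_sentence()
            if out and out[-1] != "":
                out.append("")
            continue
        fields = line.split()
        word, tag = fields[0], fields[-1]
        prefix, _, etype = tag.partition("-")
        if tag == "O" or etype not in types:
            close_span()                  # token is outside any kept entity
        elif prefix == "B" or etype != current:
            close_span()                  # a new entity starts here
            tokens.append("<START:%s>" % TYPE_NAMES.get(etype, etype.lower()))
            current = etype
        tokens.append(word)
    end_sentence()
    return out


if __name__ == "__main__":
    sample = [
        "song NN I-NP O",
        "by IN I-PP O",
        "U.S. NNP I-NP I-LOC",
        "guitar NN I-NP O",
        "legend NN I-NP O",
        "Jimi NNP I-NP I-PER",
    ]
    for sentence in conll_to_opennlp(sample, types=("PER", "LOC")):
        print(sentence)
```

With the six sample lines above and both PER and LOC kept, this prints
"song by <START:location> U.S. <END> guitar legend <START:person> Jimi <END>".
To convert a whole file, the lines of eng.train could be passed in and the
result written out one entry per line, then given to TokenNameFinderTrainer
through -data.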
>
> With Regards
> Madhvi Gupta
> *(Senior Software Engineer)*
>
> On Mon, Feb 20, 2017 at 5:52 PM, Joern Kottmann <kottm...@gmail.com>
> wrote:
>
>> Please explain to us what is not working. Any error messages or
>> exceptions?
>>
>> By default, the name finder trains on the default format, which you can see
>> in the documentation link I shared.
>>
>> Jörn
>>
>> On Mon, Feb 20, 2017 at 6:04 AM, Madhvi Gupta <mgmahi....@gmail.com>
>> wrote:
>>
>> > Hi Joern,
>> >
>> > I got the data from the following link, which consists of a corpus of
>> > news articles.
>> > http://trec.nist.gov/data/reuters/reuters.html
>> >
>> > Following the steps given in the link below, I created training and
>> > test data, but it does not work with the NameFinder in the OpenNLP API.
>> > http://www.clips.uantwerpen.be/conll2003/ner/000README
>> >
>> > Can you please help me create training data from that corpus
>> > and use it to build named entity detection models?
>> >
>> > With Regards
>> > Madhvi Gupta
>> > *(Senior Software Engineer)*
>> >
>> > On Mon, Feb 20, 2017 at 1:00 AM, Joern Kottmann <kottm...@gmail.com>
>> > wrote:
>> >
>> > > Hello,
>> > >
>> > > To train the name finder you need training data that contains the
>> > > entities you would like to detect.
>> > > Is that the case with the data you have?
>> > >
>> > > Take a look at our documentation:
>> > > https://opennlp.apache.org/documentation/1.7.2/manual/opennlp.html#tools.namefind.training
>> > >
>> > > At the beginning of that section you can see how the data has to be
>> > > marked up.
>> > >
>> > > Please note that you need many sentences to train the name finder.
>> > >
>> > > HTH,
>> > > Jörn
>> > >
>> > >
>> > > On Sat, Feb 18, 2017 at 11:28 AM, Madhvi Gupta <mgmahi....@gmail.com>
>> > > wrote:
>> > >
>> > > > Hi All,
>> > > >
>> > > > I have got Reuters data from NIST. Now I want to generate training
>> > > > data from it to create a model for detecting named entities. Can
>> > > > anyone tell me how the models can be generated from that data?
>> > > >
>> > > > --
>> > > > With Regards
>> > > > Madhvi Gupta
>> > > > *(Senior Software Engineer)*
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> >
>>
>
>
