Re: How to train a Named entity detection model

Madhvi Gupta Mon, 27 Feb 2017 02:29:23 -0800

Hi Madhav,

My training data is not in the format mentioned in the wiki [0].

It is in the format generated by following the steps at this link:
http://www.clips.uantwerpen.be/conll2003/ner/000README

The format is shown in the earlier mail quoted below.
I just want to know how OpenNLP models can be trained using that format. If
that is not possible, how can the required format be generated?
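Below is a minimal sketch of such a conversion. It assumes the CoNLL-2003 layout shown in the quoted sample below: word, POS tag, chunk tag and NE tag on each line, blank lines between sentences, "-DOCSTART-" lines between documents, and IOB1 entity tags. The file name conll2opennlp.py and the TYPE_NAMES mapping are hypothetical choices, not part of OpenNLP.

```python
"""Sketch: convert CoNLL-2003 NER column data to OpenNLP name finder markup.

Assumptions: one token per line with whitespace-separated columns and the NE
tag in the last column; blank lines end sentences; "-DOCSTART-" lines separate
documents; NE tags are IOB1 (I-XXX, with B-XXX only between two adjacent
entities of the same type).
"""
import sys
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# Hypothetical mapping from CoNLL entity types to OpenNLP type names.
TYPE_NAMES = {"PER": "person", "LOC": "location", "ORG": "organization", "MISC": "misc"}


def read_sentences(lines: Iterable[str]) -> Iterator[List[Tuple[str, str]]]:
    """Yield each sentence as a list of (word, ne_tag) pairs."""
    sentence: List[Tuple[str, str]] = []
    for line in lines:
        cols = line.split()
        if not cols or cols[0] == "-DOCSTART-":
            if sentence:
                yield sentence
                sentence = []
            continue
        sentence.append((cols[0], cols[-1]))
    if sentence:
        yield sentence


def to_opennlp(sentence: List[Tuple[str, str]],
               keep: Optional[Dict[str, str]] = None) -> str:
    """Render one sentence, wrapping kept entity spans in <START:type> ... <END>."""
    keep = TYPE_NAMES if keep is None else keep
    out: List[str] = []
    current: Optional[str] = None  # CoNLL type of the currently open span
    for word, tag in sentence:
        prefix, _, etype = tag.partition("-")
        # An open span ends on "O", on a type change, or on B- (IOB1 boundary).
        if current is not None and (tag == "O" or etype != current or prefix == "B"):
            out.append("<END>")
            current = None
        # Open a new span only for entity types we want to keep.
        if tag != "O" and current is None and etype in keep:
            out.append(f"<START:{keep[etype]}>")
            current = etype
        out.append(word)
    if current is not None:
        out.append("<END>")
    return " ".join(out)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python conll2opennlp.py eng.train > eng.train.opennlp
    with open(sys.argv[1], encoding="utf-8") as f:
        for sent in read_sentences(f):
            print(to_opennlp(sent))
```

For a person-only model such as en-ner-person.bin, calling to_opennlp(sent, {"PER": "person"}) leaves the other entity types untagged. The resulting file, one sentence per line, should then be usable as the -data argument of the TokenNameFinderTrainer command quoted below.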

With Regards
Madhvi Gupta
*(Senior Software Engineer)*

On Mon, Feb 27, 2017 at 12:47 PM, Madhav Sharan <msha...@usc.edu> wrote:

> Hi - Can you make sure that your training data is in the format described
> in the wiki [0]?
>
> As mentioned in the wiki, the training data should look something like this:
>
> <START:person> Pierre Vinken <END> 61 years old , will join the board as a
> nonexecutive director Nov. 29
>
> Here the entity type is "person", and "Pierre Vinken" is one of the persons
> in the training data.
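To make the markup concrete, here is a small illustrative parser (a sketch, not OpenNLP's own reader) that splits such a line into tokens and typed spans. It assumes whitespace-separated tokens and no nested tags, as in the example above; parse_markup is a hypothetical helper name.

```python
"""Sketch: split one line of OpenNLP name finder markup into tokens and spans.

Assumption: tokens, <START:type> and <END> are separated by whitespace and
spans are not nested, as in the Pierre Vinken example.
"""
from typing import List, Optional, Tuple


def parse_markup(line: str) -> Tuple[List[str], List[Tuple[str, int, int]]]:
    """Return (tokens, spans); each span is (type, start, end) with end exclusive."""
    tokens: List[str] = []
    spans: List[Tuple[str, int, int]] = []
    open_type: Optional[str] = None
    open_start = 0
    for piece in line.split():
        if piece.startswith("<START:") and piece.endswith(">"):
            if open_type is not None:
                raise ValueError("nested <START:...> tag")
            open_type = piece[len("<START:"):-1]  # e.g. "person"
            open_start = len(tokens)
        elif piece == "<END>":
            if open_type is None:
                raise ValueError("<END> without <START:...>")
            spans.append((open_type, open_start, len(tokens)))
            open_type = None
        else:
            tokens.append(piece)
    if open_type is not None:
        raise ValueError("unclosed <START:...> tag")
    return tokens, spans
```

For the example sentence, " ".join(tokens[s:e]) over the single span gives "Pierre Vinken". A line with no <START:...> tags yields no spans at all, so it contributes no examples of names.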
>
> I looked at the links you shared, and your data appears to be in a different
> format. Can you make sure your eng.train is in the above format?
>
> I think you can write your own code to read the training file and convert it
> into the OpenNLP format. Also look at [1] in case you can make use of one of
> the pre-trained models available for OpenNLP.
>
> HTH
>
> [0] https://opennlp.apache.org/documentation/1.7.2/manual/opennlp.html#tools.namefind.training
> [1] http://opennlp.sourceforge.net/models-1.5/
>
>
> --
> Madhav Sharan
>
>
> On Sun, Feb 26, 2017 at 9:42 PM, Madhvi Gupta <mgmahi....@gmail.com>
> wrote:
>
> > Please let me know if anyone has any idea about this.
> >
> > With Regards
> > Madhvi Gupta
> > *(Senior Software Engineer)*
> >
> > On Tue, Feb 21, 2017 at 10:51 AM, Madhvi Gupta <mgmahi....@gmail.com>
> > wrote:
> >
> > > Hi Joern,
> > >
> > > The training data generated from the Reuters dataset is in the
> > > following format. Three files were generated: eng.train, eng.testa and
> > > eng.testb.
> > >
> > > A DT I-NP O
> > > rare JJ I-NP O
> > > early JJ I-NP O
> > > handwritten JJ I-NP O
> > > draft NN I-NP O
> > > of IN I-PP O
> > > a DT I-NP O
> > > song NN I-NP O
> > > by IN I-PP O
> > > U.S. NNP I-NP I-LOC
> > > guitar NN I-NP O
> > > legend NN I-NP O
> > > Jimi NNP I-NP I-PER
> > >
> > > When I ran the following command on this training data file:
> > > ./opennlp TokenNameFinderTrainer -model en-ner-person.bin -lang en -data /home/centos/ner/eng.train -encoding UTF-8
> > >
> > > It is giving me the following error:
> > > ERROR: Not enough training data
> > > The provided training data is not sufficient to create enough events to
> > > train a model.
> > > To resolve this error use more training data, if this doesn't help there
> > > might be some fundamental problem with the training data itself.
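One plausible cause, not confirmed in this thread, is that the default format reads each line as an annotated sentence, so raw CoNLL columns contain no <START:...> tags at all. The hypothetical helper below is a quick sanity check to run before training: it counts annotated lines and lines that look like four-column CoNLL data.

```python
"""Sketch: sanity-check a file intended for OpenNLP's default name finder format.

Assumption: annotated lines contain "<START:" tags; a file without any such
tag (for example raw CoNLL-2003 columns) gives the trainer no names to learn.
"""
from typing import Dict, Iterable


def check_training_lines(lines: Iterable[str]) -> Dict[str, int]:
    stats = {"lines": 0, "annotated_lines": 0, "column_like_lines": 0}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines are not counted
        stats["lines"] += 1
        if "<START:" in line:
            stats["annotated_lines"] += 1
        cols = line.split()
        # Heuristic: four columns ending in O / I-XXX / B-XXX looks like CoNLL-2003.
        if len(cols) == 4 and (cols[3] == "O" or cols[3][:2] in ("I-", "B-")):
            stats["column_like_lines"] += 1
    return stats
```

Running it on eng.train (opened with encoding="utf-8") and seeing annotated_lines == 0 together with many column-like lines would suggest converting the file before training.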
> > >
> > > The format required for training OpenNLP models is one sentence per
> > > line, but the training data prepared from the Reuters dataset is in the
> > > above format. So please tell me how training data can be generated in
> > > the required format, or how the existing training data format can be
> > > used for generating models.
> > >
> > > With Regards
> > > Madhvi Gupta
> > > *(Senior Software Engineer)*
> > >
> > > On Mon, Feb 20, 2017 at 5:52 PM, Joern Kottmann <kottm...@gmail.com>
> > > wrote:
> > >
> > >> Please explain to us what is not working. Any error messages or
> > >> exceptions?
> > >>
> > >> By default the name finder trains on the default format, which you can
> > >> see in the documentation link I shared.
> > >>
> > >> Jörn
> > >>
> > >> On Mon, Feb 20, 2017 at 6:04 AM, Madhvi Gupta <mgmahi....@gmail.com>
> > >> wrote:
> > >>
> > >> > Hi Joern,
> > >> >
> > >> > I got the data from the following link, which consists of a corpus of
> > >> > news articles.
> > >> > http://trec.nist.gov/data/reuters/reuters.html
> > >> >
> > >> > Following the steps given in the link below, I have created training
> > >> > and test data, but it is not working with the NameFinder of the
> > >> > OpenNLP API.
> > >> > http://www.clips.uantwerpen.be/conll2003/ner/000README
> > >> >
> > >> > So can you please help me create training data out of that corpus
> > >> > and use it to create named entity detection models?
> > >> >
> > >> > With Regards
> > >> > Madhvi Gupta
> > >> > *(Senior Software Engineer)*
> > >> >
> > >> > On Mon, Feb 20, 2017 at 1:00 AM, Joern Kottmann <kottm...@gmail.com>
> > >> > wrote:
> > >> >
> > >> > > Hello,
> > >> > >
> > >> > > to train the name finder you need training data that contains the
> > >> > > entities you would like to detect.
> > >> > > Is that the case with the data you have?
> > >> > >
> > >> > > Take a look at our documentation:
> > >> > > https://opennlp.apache.org/documentation/1.7.2/manual/opennlp.html#tools.namefind.training
> > >> > >
> > >> > > At the beginning of that section you can see how the data has to be
> > >> > > marked up.
> > >> > >
> > >> > > Please note that you need many sentences to train the name finder.
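Both points (whether the data contains the wanted entities, and whether there are enough sentences) can be checked on CoNLL-2003 style data before converting it. The sketch below uses a hypothetical helper, corpus_stats, and assumes the IOB1 column layout shown earlier in the thread. It counts sentences and entity mentions per type.

```python
"""Sketch: count sentences and entity mentions per type in CoNLL-2003 style data.

Assumptions: NE tag in the last column, IOB1 tags, blank lines between
sentences, "-DOCSTART-" lines between documents.
"""
from collections import Counter
from typing import Iterable, Optional, Tuple


def corpus_stats(lines: Iterable[str]) -> Tuple[int, Counter]:
    sentences = 0
    mentions: Counter = Counter()
    in_sentence = False
    prev_type: Optional[str] = None  # entity type of the previous token, None for "O"
    for line in lines:
        cols = line.split()
        if not cols or cols[0] == "-DOCSTART-":
            if in_sentence:
                sentences += 1
            in_sentence = False
            prev_type = None
            continue
        in_sentence = True
        tag = cols[-1]
        if tag == "O":
            prev_type = None
            continue
        prefix, _, etype = tag.partition("-")
        # IOB1: a new mention starts on B- or when the type differs from the previous token.
        if prefix == "B" or etype != prev_type:
            mentions[etype] += 1
        prev_type = etype
    if in_sentence:
        sentences += 1
    return sentences, mentions
```

A call such as corpus_stats(open("eng.train", encoding="utf-8")) returns the sentence count and a Counter; a missing or tiny "PER" count would mean the file cannot train a useful person model.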
> > >> > >
> > >> > > HTH,
> > >> > > Jörn
> > >> > >
> > >> > >
> > >> > > On Sat, Feb 18, 2017 at 11:28 AM, Madhvi Gupta <mgmahi....@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > Hi All,
> > >> > > >
> > >> > > > I have got the Reuters data from NIST. Now I want to generate
> > >> > > > training data from it to create a model for detecting named
> > >> > > > entities. Can anyone tell me how the models can be generated from
> > >> > > > it?
> > >> > > >
> > >> > > > --
> > >> > > > With Regards
> > >> > > > Madhvi Gupta
> > >> > > > *(Senior Software Engineer)*
> > >> > > >
> > >> > >
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> >
> > >>
> > >
> > >
> >
>
