Hi Madhav,

My training data is not in the format mentioned in the wiki [0].
It is in the format generated by following the steps at this link:
http://www.clips.uantwerpen.be/conll2003/ner/000README

The format is shown in the trailing mail. I just want to know how OpenNLP
models can be trained using data in that format. If that is not possible, how
can the required format be generated?

With Regards
Madhvi Gupta
*(Senior Software Engineer)*

On Mon, Feb 27, 2017 at 12:47 PM, Madhav Sharan <msha...@usc.edu> wrote:

> Hi - Can you ensure that your training data is in the format mentioned in
> the wiki? [0]
>
> As mentioned in the wiki, training data should look something like this:
>
> <START:person> Pierre Vinken <END> 61 years old , will join the board as a nonexecutive director Nov. 29
>
> Here the entity type is "person" and "Pierre Vinken" is one of the persons
> in the training data.
>
> I was looking at the links you shared, and your data appears to be in a
> different format. Can you ensure your eng.train is in the above format?
>
> I think you can write your own code to read the training file and convert
> it into the OpenNLP format. Also look at [1] in case you can make use of
> one of the pre-trained models available for OpenNLP.
>
> HTH
>
> [0] https://opennlp.apache.org/documentation/1.7.2/manual/opennlp.html#tools.namefind.training
> [1] http://opennlp.sourceforge.net/models-1.5/
>
> --
> Madhav Sharan
>
> On Sun, Feb 26, 2017 at 9:42 PM, Madhvi Gupta <mgmahi....@gmail.com> wrote:
>
> > Please let me know if anyone has any idea about this.
> >
> > With Regards
> > Madhvi Gupta
> > *(Senior Software Engineer)*
> >
> > On Tue, Feb 21, 2017 at 10:51 AM, Madhvi Gupta <mgmahi....@gmail.com> wrote:
> >
> > > Hi Joern,
> > >
> > > The training data generated from the Reuters dataset is in the
> > > following format. Three files were generated: eng.train, eng.testa,
> > > and eng.testb.
> > >
> > > A DT I-NP O
> > > rare JJ I-NP O
> > > early JJ I-NP O
> > > handwritten JJ I-NP O
> > > draft NN I-NP O
> > > of IN I-PP O
> > > a DT I-NP O
> > > song NN I-NP O
> > > by IN I-PP O
> > > U.S. NNP I-NP I-LOC
> > > guitar NN I-NP O
> > > legend NN I-NP O
> > > Jimi NNP I-NP I-PER
> > >
> > > Using this training data file, I ran the command:
> > >
> > > ./opennlp TokenNameFinderTrainer -model en-ner-person.bin -lang en -data /home/centos/ner/eng.train -encoding UTF-8
> > >
> > > It gives me the following error:
> > >
> > > ERROR: Not enough training data
> > > The provided training data is not sufficient to create enough events to
> > > train a model.
> > > To resolve this error use more training data, if this doesn't help there
> > > might be some fundamental problem with the training data itself.
> > >
> > > The format required for training OpenNLP models consists of sentences,
> > > but the training data prepared from the Reuters dataset is in the
> > > format shown above. So please tell me how training data can be
> > > generated in the required format, or how the existing training data
> > > format can be used for generating models.
> > >
> > > With Regards
> > > Madhvi Gupta
> > > *(Senior Software Engineer)*
> > >
> > > On Mon, Feb 20, 2017 at 5:52 PM, Joern Kottmann <kottm...@gmail.com> wrote:
> > >
> > >> Please explain to us what is not working. Any error messages or
> > >> exceptions?
> > >>
> > >> The name finder by default trains on the default format, which you can
> > >> see in the documentation link I shared.
> > >>
> > >> Jörn
> > >>
> > >> On Mon, Feb 20, 2017 at 6:04 AM, Madhvi Gupta <mgmahi....@gmail.com> wrote:
> > >>
> > >> > Hi Joern,
> > >> >
> > >> > I got the data from the following link, which consists of a corpus
> > >> > of news articles:
> > >> > http://trec.nist.gov/data/reuters/reuters.html
> > >> >
> > >> > Following the steps given in the link below, I created training and
> > >> > test data, but it is not working with the NameFinder of the OpenNLP
> > >> > API.
> > >> > http://www.clips.uantwerpen.be/conll2003/ner/000README
> > >> >
> > >> > So can you please help me create training data from that corpus and
> > >> > use it to create named entity detection models?
> > >> >
> > >> > With Regards
> > >> > Madhvi Gupta
> > >> > *(Senior Software Engineer)*
> > >> >
> > >> > On Mon, Feb 20, 2017 at 1:00 AM, Joern Kottmann <kottm...@gmail.com> wrote:
> > >> >
> > >> > > Hello,
> > >> > >
> > >> > > to train the name finder you need training data that contains the
> > >> > > entities you would like to detect.
> > >> > > Is that the case with the data you have?
> > >> > >
> > >> > > Take a look at our documentation:
> > >> > > https://opennlp.apache.org/documentation/1.7.2/manual/opennlp.html#tools.namefind.training
> > >> > >
> > >> > > At the beginning of that section you can see how the data has to
> > >> > > be marked up.
> > >> > >
> > >> > > Please note that you need many sentences to train the name finder.
> > >> > >
> > >> > > HTH,
> > >> > > Jörn
> > >> > >
> > >> > > On Sat, Feb 18, 2017 at 11:28 AM, Madhvi Gupta <mgmahi....@gmail.com> wrote:
> > >> > >
> > >> > > > Hi All,
> > >> > > >
> > >> > > > I have got Reuters data from NIST. Now I want to generate
> > >> > > > training data from it to create a model for detecting named
> > >> > > > entities. Can anyone tell me how the models can be generated
> > >> > > > from that?
> > >> > > >
> > >> > > > --
> > >> > > > With Regards
> > >> > > > Madhvi Gupta
> > >> > > > *(Senior Software Engineer)*
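
The conversion suggested in the thread (read the column-per-token training file and write the sentence-per-line OpenNLP name finder format) could be sketched roughly as follows in Python. This is only a minimal sketch under stated assumptions, not an official OpenNLP tool: it assumes the last column of each row holds the entity tag, that sentences are separated by blank lines, and that documents are introduced by a -DOCSTART- row. The function names, the output file name, and the mapping from PER/LOC/ORG/MISC to type names such as "person" are hypothetical choices made for illustration.

```python
# Assumed mapping from CoNLL-2003 tag suffixes to entity type names.
# The names on the right are illustrative; any consistent names should do.
TYPE_NAMES = {"PER": "person", "LOC": "location",
              "ORG": "organization", "MISC": "misc"}


def sentence_to_opennlp(rows):
    """Turn one sentence, given as (token, ner_tag) pairs, into one line
    with <START:type> ... <END> markup around each entity span."""
    out = []
    open_type = None  # tag suffix of the span that is currently open
    for token, tag in rows:
        prefix, _, suffix = tag.partition("-")
        entity = suffix if tag != "O" else None
        # Close the open span if the entity ends, changes type,
        # or a B- tag signals that a new entity starts here.
        if open_type is not None and (entity != open_type or prefix == "B"):
            out.append("<END>")
            open_type = None
        if entity is not None and open_type is None:
            out.append("<START:%s>" % TYPE_NAMES.get(entity, entity.lower()))
            open_type = entity
        out.append(token)
    if open_type is not None:
        out.append("<END>")
    return " ".join(out)


def convert(lines):
    """Yield output lines (one sentence each) from CoNLL-2003 style lines.
    An empty output line is emitted between documents."""
    rows = []
    emitted = False  # True once at least one sentence has been produced
    for line in lines:
        fields = line.split()
        is_docstart = bool(fields) and fields[0] == "-DOCSTART-"
        if not fields or is_docstart:
            if rows:
                yield sentence_to_opennlp(rows)
                emitted = True
                rows = []
            if is_docstart and emitted:
                yield ""  # blank line as document boundary
            continue
        rows.append((fields[0], fields[-1]))
    if rows:
        yield sentence_to_opennlp(rows)


def convert_file(src_path, dst_path, encoding="utf-8"):
    """Convert a whole file; the encoding default is an assumption."""
    with open(src_path, encoding=encoding) as src, \
            open(dst_path, "w", encoding=encoding) as dst:
        for out_line in convert(src):
            dst.write(out_line + "\n")
```

Calling convert_file("eng.train", "eng.opennlp.train") (the second name is made up) would write a file that the TokenNameFinderTrainer command quoted above could then be pointed at with -data. Since that command trains a person model, rows with other entity types may need to be filtered or trained separately.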