Please let me know if anyone has any idea about this.

With Regards
Madhvi Gupta
*(Senior Software Engineer)*

On Tue, Feb 21, 2017 at 10:51 AM, Madhvi Gupta <mgmahi....@gmail.com> wrote:

> Hi Joern,
>
> Training data generated from the Reuters dataset is in the following format.
> It has generated three files: eng.train, eng.testa, eng.testb.
>
> A DT I-NP O
> rare JJ I-NP O
> early JJ I-NP O
> handwritten JJ I-NP O
> draft NN I-NP O
> of IN I-PP O
> a DT I-NP O
> song NN I-NP O
> by IN I-PP O
> U.S. NNP I-NP I-LOC
> guitar NN I-NP O
> legend NN I-NP O
> Jimi NNP I-NP I-PER
>
> Using this training data file, I ran the command:
> ./opennlp TokenNameFinderTrainer -model en-ner-person.bin -lang en -data /home/centos/ner/eng.train -encoding UTF-8
>
> It gives me the following error:
> ERROR: Not enough training data
> The provided training data is not sufficient to create enough events to
> train a model.
> To resolve this error use more training data, if this doesn't help there
> might be some fundamental problem with the training data itself.
>
> The format required for training OpenNLP models is one sentence per line,
> but the training data prepared from the Reuters dataset is in the format
> shown above. So please tell me how training data can be generated in the
> required format, or how the existing training data format can be used for
> generating models.
>
> With Regards
> Madhvi Gupta
> *(Senior Software Engineer)*
>
> On Mon, Feb 20, 2017 at 5:52 PM, Joern Kottmann <kottm...@gmail.com> wrote:
>
>> Please explain to us what is not working. Any error messages or
>> exceptions?
>>
>> The name finder by default trains on the default format, which you can see
>> in the documentation link I shared.
>>
>> Jörn
>>
>> On Mon, Feb 20, 2017 at 6:04 AM, Madhvi Gupta <mgmahi....@gmail.com> wrote:
>>
>> > Hi Joern,
>> >
>> > I have got the data from the following link, which consists of a corpus
>> > of news articles.
>> > http://trec.nist.gov/data/reuters/reuters.html
>> >
>> > Following the steps given in the link below, I have created training and
>> > test data, but it is not working with the NameFinder of the OpenNLP API.
>> > http://www.clips.uantwerpen.be/conll2003/ner/000README
>> >
>> > So can you please help me create training data out of that corpus
>> > and use it to create named entity detection models?
>> >
>> > With Regards
>> > Madhvi Gupta
>> > *(Senior Software Engineer)*
>> >
>> > On Mon, Feb 20, 2017 at 1:00 AM, Joern Kottmann <kottm...@gmail.com> wrote:
>> >
>> > > Hello,
>> > >
>> > > to train the name finder you need training data that contains the
>> > > entities you would like to detect.
>> > > Is that the case with the data you have?
>> > >
>> > > Take a look at our documentation:
>> > > https://opennlp.apache.org/documentation/1.7.2/manual/opennlp.html#tools.namefind.training
>> > >
>> > > At the beginning of that section you can see how the data has to be
>> > > marked up.
>> > >
>> > > Please note that you need many sentences to train the name finder.
>> > >
>> > > HTH,
>> > > Jörn
>> > >
>> > > On Sat, Feb 18, 2017 at 11:28 AM, Madhvi Gupta <mgmahi....@gmail.com> wrote:
>> > >
>> > > > Hi All,
>> > > >
>> > > > I have got Reuters data from NIST. Now I want to generate the
>> > > > training data from that to create a model for detecting named
>> > > > entities. Can anyone tell me how the models can be generated
>> > > > from that?
>> > > >
>> > > > --
>> > > > With Regards
>> > > > Madhvi Gupta
>> > > > *(Senior Software Engineer)*
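
The mismatch discussed in this thread is between the CoNLL-2003 column layout (one token per line, entity tag in the last column) and the default name finder training format, which expects one whitespace-tokenized sentence per line with names wrapped in <START:type> ... <END> markup. The following is only a rough Python sketch of one way to bridge the two, not the OpenNLP project's own converter. The function name conll_to_opennlp and the TYPE_MAP entity names are assumptions made for illustration, and the sketch assumes sentences are separated by blank lines and documents by -DOCSTART- lines. The OpenNLP manual also describes built-in CoNLL 2003 support in its corpora chapter, which is worth checking before relying on a hand-written script.

```python
# Hypothetical mapping from CoNLL-2003 tags to name types; adjust as needed.
TYPE_MAP = {"PER": "person", "LOC": "location",
            "ORG": "organization", "MISC": "misc"}


def conll_to_opennlp(lines):
    """Turn CoNLL-2003 style lines into sentence-per-line name finder markup.

    Returns a list of strings. An empty string marks a document boundary.
    """
    sentences = []
    tokens = []        # output tokens of the sentence being built
    open_type = None   # entity type of the span currently open, if any

    def close_span():
        nonlocal open_type
        if open_type is not None:
            tokens.append("<END>")
            open_type = None

    def flush():
        close_span()
        if tokens:
            sentences.append(" ".join(tokens))
            tokens.clear()

    for raw in lines:
        line = raw.strip()
        if not line:                       # blank line ends a sentence
            flush()
            continue
        if line.startswith("-DOCSTART-"):  # document separator
            flush()
            if sentences and sentences[-1] != "":
                sentences.append("")
            continue
        fields = line.split()
        word, tag = fields[0], fields[-1]  # last column holds the entity tag
        if tag == "O":
            close_span()
        else:
            prefix, _, etype = tag.partition("-")
            etype = TYPE_MAP.get(etype, etype.lower())
            # A B- tag or a change of type starts a new span.
            if prefix == "B" or etype != open_type:
                close_span()
                tokens.append(f"<START:{etype}>")
                open_type = etype
        tokens.append(word)
    flush()
    return sentences


if __name__ == "__main__":
    sample = [
        "by IN I-PP O",
        "U.S. NNP I-NP I-LOC",
        "guitar NN I-NP O",
        "legend NN I-NP O",
        "Jimi NNP I-NP I-PER",
    ]
    for sentence in conll_to_opennlp(sample):
        print(sentence)
```

Writing each returned string on its own line to a file should give input in the shape the default trainer expects. The name finder still needs many such sentences, as noted earlier in the thread, so a small excerpt like the one above will not be enough to train a model.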