Re: OpenNLP: Named Entity Recognition ( Token Name Finder )
In reply to William Colen (Thu, Jun 18, 2015 at 2:27 AM):

Thanks for the feedback. I have evaluated F1 with Maxent and will try with Perceptron as well.

Nikhil Jain
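For readers following the F1 evaluation mentioned in this message, here is a minimal sketch of how F1 is derived from precision and recall. The class name F1Sketch and the counts in main are hypothetical and chosen only for illustration. This is the standard formula, not OpenNLP's evaluator.

```java
/** Hypothetical helper: the standard precision/recall/F1 formulas, not OpenNLP code. */
final class F1Sketch {

    /** Fraction of predicted names that are correct. */
    static double precision(int truePositives, int falsePositives) {
        int predicted = truePositives + falsePositives;
        return predicted == 0 ? 0.0 : (double) truePositives / predicted;
    }

    /** Fraction of reference names that were found. */
    static double recall(int truePositives, int falseNegatives) {
        int actual = truePositives + falseNegatives;
        return actual == 0 ? 0.0 : (double) truePositives / actual;
    }

    /** Harmonic mean of precision and recall. */
    static double f1(double precision, double recall) {
        double sum = precision + recall;
        return sum == 0.0 ? 0.0 : 2.0 * precision * recall / sum;
    }

    public static void main(String[] args) {
        // Made-up counts: 80 names found correctly, 20 spurious, 40 missed.
        int tp = 80, fp = 20, fn = 40;
        double p = precision(tp, fp);
        double r = recall(tp, fn);
        System.out.printf("precision=%.3f recall=%.3f F1=%.3f%n", p, r, f1(p, r));
    }
}
```

Comparing Maxent and Perceptron on the same held-out data with the same measure keeps the comparison fair. Training-set accuracy from the trainer log is not a substitute for this.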
Re: OpenNLP: Named Entity Recognition ( Token Name Finder )
In reply to Nikhil Jain (2015-06-17 13:25 GMT-03:00):

I can't remember if the iterations parameter is used in PERCEPTRON. In my experience with other tools, you should use a cutoff of 0, because Perceptron takes advantage of every feature you add. Did you try the evaluation tools to compute F1?
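To make the cutoff advice concrete, here is a minimal sketch of what a feature cutoff does. It assumes a feature is kept when it occurs at least `cutoff` times in the training events. The class name CutoffSketch, the feature strings, and the exact comparison are assumptions for illustration, not OpenNLP's indexing code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration of frequency-based feature pruning. */
final class CutoffSketch {

    /** Returns the features seen at least `cutoff` times (assumed rule). */
    static Set<String> keptFeatures(List<String> observedFeatures, int cutoff) {
        Map<String, Integer> counts = new HashMap<>();
        for (String feature : observedFeatures) {
            counts.merge(feature, 1, Integer::sum);
        }
        Set<String> kept = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= cutoff) {
                kept.add(entry.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up feature occurrences.
        List<String> observed = List.of(
                "w=bank", "w=bank", "w=acme",
                "suffix=nk", "suffix=nk", "suffix=nk");
        System.out.println("cutoff 2: " + keptFeatures(observed, 2));
        System.out.println("cutoff 0: " + keptFeatures(observed, 0));
    }
}
```

With a cutoff of 0 nothing is pruned, so rare features remain available to the learner. That comes at the cost of a larger model.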
Re: OpenNLP: Named Entity Recognition ( Token Name Finder )
Hello,

Did anyone get a chance to look at my message of Tue, Jun 16, 2015 at 4:36 PM? Please provide some feedback.

Thanks
Nikhil Jain
Re: OpenNLP: Named Entity Recognition ( Token Name Finder )
In reply to William Colen (Fri, May 29, 2015 at 5:47 PM):

Hi William,

Thanks for the link.

I have tried both the Maxent and Perceptron models on my problem, and Perceptron is working much better than Maxent.

I have one question. When I create a Perceptron model using cutoff 5 and iterations 100, training stops after the 5th iteration and does not go on to further iterations. Is this the correct behaviour, or am I doing something wrong?

Code and logs for reference:

    ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);

    TokenNameFinderModel model = null;
    TrainingParameters tp = new TrainingParameters();
    // Switch between the two algorithms by changing this value.
    //tp.put(TrainingParameters.ALGORITHM_PARAM, "MAXENT");
    tp.put(TrainingParameters.ALGORITHM_PARAM, "PERCEPTRON");
    System.out.println("244:Hybrid parser:PERCEPTRON");
    tp.put(TrainingParameters.ITERATIONS_PARAM, Integer.toString(100));
    tp.put(TrainingParameters.CUTOFF_PARAM, Integer.toString(5));
    tp.put("Threads", "3");

    // No custom feature generator or resources.
    opennlp.tools.util.featuregen.AdaptiveFeatureGenerator generator = null;

    try {
        Map<String, Object> resources = null;
        model = NameFinderME.train("en", "security", sampleStream, tp, generator, resources);
    } catch (IOException e) {

Training log:

    Indexing events using cutoff of 5

    Computing event counts... done. 8209384 events
    Indexing... done.
    Collecting events... Done indexing.
    Incorporating indexed data for training...
    done.
    Number of Event Tokens: 8209384
    Number of Outcomes: 34
    Number of Predicates: 325780
    Computing model parameters...
    Performing 100 iterations.
    1: . (8209184/8209384) 0.75637636149
    2: . (8209291/8209384) 0.886715008093
    3: . (8209340/8209384) 0.946402799528
    4: . (8209356/8209384) 0.965892690609
    5: . (8209357/8209384) 0.967110808802
    Stopping: change in training set accuracy less than 1.0E-5
    Stats: (8104703/8209384) 0.9872486169486042
    ...done.
    Compressed 325780 parameters to 3957
    532 outcome patterns

Thanks
Nikhil
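The "Stopping: change in training set accuracy less than 1.0E-5" line in the log points to a tolerance-based stopping check: training ends before the requested number of iterations once accuracy stops changing. Below is a minimal sketch of such a check. It assumes the simplest possible rule, comparing each iteration's accuracy with the previous one. The trainer's real rule may differ, and the class name EarlyStopSketch and the accuracy values are hypothetical.

```java
/** Hypothetical illustration of tolerance-based early stopping; not OpenNLP code. */
final class EarlyStopSketch {

    /**
     * Returns the 1-based iteration at which training would stop, given the
     * training accuracy after each iteration. Stops at the first iteration
     * whose accuracy differs from the previous one by less than `tolerance`
     * (assumed rule); otherwise runs through all iterations.
     */
    static int stoppingIteration(double[] accuracies, double tolerance) {
        for (int i = 1; i < accuracies.length; i++) {
            if (Math.abs(accuracies[i] - accuracies[i - 1]) < tolerance) {
                return i + 1;
            }
        }
        return accuracies.length;
    }

    public static void main(String[] args) {
        // Made-up accuracies: the change between iterations 4 and 5 is about 1e-6.
        double[] accuracies = {0.75, 0.88, 0.94, 0.96, 0.960001, 0.97};
        System.out.println("stops at iteration "
                + stoppingIteration(accuracies, 1.0e-5));
    }
}
```

Under a rule like this, the iterations setting acts as an upper bound rather than a guaranteed count, so an early stop by itself does not indicate a mistake in the setup.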
Re: OpenNLP: Named Entity Recognition ( Token Name Finder )
In reply to Nikhil Jain (2015-05-29 7:07 GMT-03:00):

The answer about the differences would be quite long. You can learn about the theory by researching online. Try some of the papers listed here:
https://cwiki.apache.org/confluence/display/OPENNLP/NLP+Papers

Which algorithm is better for you depends on your task and your data. You can start developing with the standard Maxent, and once your environment is ready you can try the other ML implementations.

Regards,
William
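As a rough orientation before reading the papers, the sketch below contrasts the two model families in textbook terms. Both score each outcome with a linear function of the features. A maxent model normalizes the scores into probabilities, while a perceptron picks the highest score and changes its weights only when that pick is wrong. The class and method names are hypothetical, and this toy code is not how OpenNLP implements either trainer.

```java
/** Hypothetical toy contrast of perceptron and maxent decisions; not OpenNLP code. */
final class PerceptronVsMaxentSketch {

    /** Linear score per outcome: dot product of that outcome's weights with the features. */
    static double[] scores(double[][] weights, double[] features) {
        double[] result = new double[weights.length];
        for (int o = 0; o < weights.length; o++) {
            for (int f = 0; f < features.length; f++) {
                result[o] += weights[o][f] * features[f];
            }
        }
        return result;
    }

    /** Index of the largest value (first one wins on ties). */
    static int argmax(double[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) best = i;
        }
        return best;
    }

    /** Maxent-style output: scores normalized into a probability distribution. */
    static double[] softmax(double[] values) {
        double max = values[argmax(values)];
        double[] probs = new double[values.length];
        double sum = 0.0;
        for (int i = 0; i < values.length; i++) {
            probs[i] = Math.exp(values[i] - max); // subtract max for numerical stability
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) probs[i] /= sum;
        return probs;
    }

    /** Perceptron-style training step: update weights only on a mistake. */
    static boolean perceptronUpdate(double[][] weights, double[] features, int gold) {
        int predicted = argmax(scores(weights, features));
        if (predicted == gold) return false;
        for (int f = 0; f < features.length; f++) {
            weights[gold][f] += features[f];
            weights[predicted][f] -= features[f];
        }
        return true;
    }

    public static void main(String[] args) {
        double[][] weights = new double[2][2]; // 2 outcomes, 2 features, all zero
        double[] features = {1.0, 0.0};
        boolean changed = perceptronUpdate(weights, features, 1);
        double[] probs = softmax(scores(weights, features));
        System.out.println("updated=" + changed + " p(outcome 1)=" + probs[1]);
    }
}
```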
Re: OpenNLP: Named Entity Recognition ( Token Name Finder )
Hello,

Did anyone get a chance to look at my email of Tue, May 19, 2015 at 11:51 PM? I know I am asking a very basic question, but being new to this subject, I find it very difficult to understand the terms.

I tried to understand by reading wiki pages but could not fully understand them, which is why I raised the question.

Thanks
Nikhil
OpenNLP: Named Entity Recognition ( Token Name Finder )
Hello Everyone,

I was reading the OpenNLP documentation and found that OpenNLP supports Maxent, Perceptron and Perceptron sequence type models.

Could someone please explain the difference between them?

I am trying to understand which one would be good for tagging sequences of data.

BTW, I am new to NLP and machine learning, so please help me understand this.

Thanks
Nikhil Jain