Re: Sentiment Analysis Parser updates
Great work!

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++

On 7/6/16, 2:06 PM, "Anastasija Mensikova" wrote:
>Hi everyone,
>
>Here are some updates on the Sentiment Analysis Parser.
>
>As you might know, during the last week I created a pull request to OpenNLP,
>and I still have a few things to fix there, but the work is happening right
>now for it behind the scenes. I have also used the Stanford Sentiment
>Treebank to create a new, categorical, labeled dataset to train on to have
>more than two categories (or categories similar to Facebook's) for sentiment
>analysis. Of course, this categorical sentiment analysis is not perfect yet,
>but it is up and working. I have also started working on our own
>SentimentEvaluator and SentimentCrossValidator, which will hopefully be done
>soon.
>My next goal is, of course, to finish the Evaluator and CrossValidator and
>use my new categorical output to create more D3 graphs on our GitHub page.
>
>Have a great day/night!
>
>Thank you,
>Anastasija.
Sentiment Analysis Parser updates
Hi everyone,

Here are some updates on the Sentiment Analysis Parser.

As you might know, during the last week I created a pull request to OpenNLP, and I still have a few things to fix there, but the work is happening right now for it behind the scenes. I have also used the Stanford Sentiment Treebank to create a new, categorical, labeled dataset to train on to have more than two categories (or categories similar to Facebook's) for sentiment analysis. Of course, this categorical sentiment analysis is not perfect yet, but it is up and working. I have also started working on our own SentimentEvaluator and SentimentCrossValidator, which will hopefully be done soon.

My next goal is, of course, to finish the Evaluator and CrossValidator and use my new categorical output to create more D3 graphs on our GitHub page.

Have a great day/night!

Thank you,
Anastasija.
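One way a categorical dataset could be derived from the Stanford Sentiment Treebank is sketched below. It is only an illustration, not the code used for the parser: the treebank assigns each phrase a score in [0, 1], and its documentation describes a five-way split at 0.2, 0.4, 0.6 and 0.8. The function name `score_to_label` and the label strings are assumptions made for this example; the categories actually used in the parser are not stated in the message.

```python
# Hypothetical helper: bucket a Stanford Sentiment Treebank score in [0, 1]
# into one of five labels. The cut-offs follow the treebank's five-class
# split; the label names are assumptions chosen for illustration.
from bisect import bisect_left

CUTOFFS = [0.2, 0.4, 0.6, 0.8]  # inclusive upper bounds of the first four bins
LABELS = ["very negative", "negative", "neutral", "positive", "very positive"]


def score_to_label(score: float) -> str:
    """Map a sentiment score to its category: [0, 0.2], (0.2, 0.4], ..."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    # bisect_left keeps a score equal to a cut-off in the lower bin.
    return LABELS[bisect_left(CUTOFFS, score)]


if __name__ == "__main__":
    for s in (0.05, 0.2, 0.5, 0.75, 0.95):
        print(s, score_to_label(s))
```

Keeping the cut-offs in a list makes it easy to experiment with a coarser or finer split without touching the lookup logic.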
Re: Sentiment Analysis Parser updates
Thanks William, this is a great idea. I will discuss it with Anastasija tomorrow.

Cheers,
Chris

On 6/28/16, 12:01 PM, "William Colen" wrote:
>Hi,
>
>I tried your code. Very good work so far! Congratulations.
>
>Is the examples/result file corrupted? It has only one line.
>
>Do you plan to implement a simple CLI to use it interactively from command
>line, similar to
>
>bin/opennlp Doccat
>bin/opennlp TokenNameFinder
>
>?
>
>Also, do you plan to add evaluation tools by extending
>AbstractEvaluatorTool and AbstractCrossValidatorTool, as well as the
>listener EvaluationErrorPrinter? I found these tools very useful while I am
>developing new models and features, maybe you would find it useful as well.
>
>You could also check the DoccatFineGrainedReportListener as a start point
>to create a confusion matrix (I think it would be easy because Doccat data
>structures are similar to yours).
Re: Sentiment Analysis Parser updates
Hi,

I tried your code. Very good work so far! Congratulations.

Is the examples/result file corrupted? It has only one line.

Do you plan to implement a simple CLI to use it interactively from the command line, similar to

bin/opennlp Doccat
bin/opennlp TokenNameFinder

?

Also, do you plan to add evaluation tools by extending AbstractEvaluatorTool and AbstractCrossValidatorTool, as well as the listener EvaluationErrorPrinter? I have found these tools very useful while developing new models and features; maybe you would find them useful as well.

You could also check the DoccatFineGrainedReportListener as a starting point to create a confusion matrix (I think it would be easy because the Doccat data structures are similar to yours).

The result would look like the following (this is a 300-entry Portuguese corpus I am building from Facebook messages):

=== Evaluation summary ===
  Number of documents:    298
    Min sentence size:      1
    Max sentence size:    463
Average sentence size:  18,01
     Categories count:      4
             Accuracy: 61,41%

=== Detailed Accuracy By Tag ===

----------------------------------------------------------------------
| Tag      | Errors | Count | % Err | Precision | Recall | F-Measure |
----------------------------------------------------------------------
| neutral  |     46 |    56 | 0,821 |     0,588 |  0,179 |     0,274 |
| positive |     46 |    70 | 0,657 |      0,48 |  0,343 |       0,4 |
| negative |     18 |   167 | 0,108 |     0,651 |  0,892 |     0,753 |
| spam     |      5 |     5 |     1 |         0 |      0 |         0 |
----------------------------------------------------------------------

=== Confusion matrix ===

    a     b     c     d  | Accuracy | <-- classified as
 <149>   13     4     1  |   89,22% | a = negative
   42   <24>    3     1  |   34,29% | b = positive
   35    11   <10>    .  |   17,86% | c = neutral
    3     2     .    <.> |       0% | d = spam

Regards,
William

2016-06-23 2:11 GMT-03:00 Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov>:

> Thank you Jason!
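The per-tag figures in William's sample report can be recomputed from its confusion matrix. The sketch below is a plain-Python illustration and does not use the OpenNLP evaluation classes. It assumes that rows are the reference tags and columns the predicted tags, that "." stands for zero, and that the flattened cell "3511" in the archived copy reads as 35 and 11 (which matches the neutral row total of 56). The function names are hypothetical.

```python
# Recompute per-tag precision, recall and F-measure from the confusion
# matrix in the sample report. Rows = reference tag, columns = predicted tag
# (an assumption based on the "<-- classified as" header).
LABELS = ["negative", "positive", "neutral", "spam"]
MATRIX = [
    [149, 13, 4, 1],   # a = negative
    [42, 24, 3, 1],    # b = positive
    [35, 11, 10, 0],   # c = neutral
    [3, 2, 0, 0],      # d = spam
]


def per_tag_metrics(labels, matrix):
    """Return {tag: {errors, count, precision, recall, f_measure}}."""
    metrics = {}
    for i, label in enumerate(labels):
        correct = matrix[i][i]
        count = sum(matrix[i])                     # documents with this tag
        predicted = sum(row[i] for row in matrix)  # documents given this tag
        precision = correct / predicted if predicted else 0.0
        recall = correct / count if count else 0.0
        total = precision + recall
        f_measure = 2 * precision * recall / total if total else 0.0
        metrics[label] = {
            "errors": count - correct,
            "count": count,
            "precision": precision,
            "recall": recall,
            "f_measure": f_measure,
        }
    return metrics


def accuracy(matrix):
    """Fraction of documents on the diagonal of the matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


if __name__ == "__main__":
    for tag, m in per_tag_metrics(LABELS, MATRIX).items():
        print(f"{tag:<9} {m['errors']:>3} {m['count']:>4} "
              f"{m['precision']:.3f} {m['recall']:.3f} {m['f_measure']:.3f}")
    print(f"accuracy: {accuracy(MATRIX):.2%}")
```

The spam row shows why the zero-denominator guards matter: the tag is never predicted correctly, so its precision, recall and F-measure all fall back to 0.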
Re: Sentiment Analysis Parser updates
Thank you Jason!

Chris Mattmann, Ph.D.

On 6/22/16, 8:41 PM, "Jason Baldridge" wrote:
>Anastasija,
>
>There might be a few appropriate sentiment datasets listed in my homework
>on Twitter sentiment analysis:
>
>https://github.com/utcompling/applied-nlp/wiki/Homework5
>
>There may also be some useful data sets in the Crowdflower Open Data
>collection:
>
>https://www.crowdflower.com/data-for-everyone/
>
>Hope this helps!
>
>-Jason
Re: Sentiment Analysis Parser updates
Anastasija,

There might be a few appropriate sentiment datasets listed in my homework on Twitter sentiment analysis:

https://github.com/utcompling/applied-nlp/wiki/Homework5

There may also be some useful data sets in the Crowdflower Open Data collection:

https://www.crowdflower.com/data-for-everyone/

Hope this helps!

-Jason

On Wed, 22 Jun 2016 at 15:59 Anastasija Mensikova <mensikova.anastas...@gmail.com> wrote:

> Hi everyone,
>
> Some updates on our Sentiment Analysis Parser work.
>
> As you might have noticed, I have recently enhanced our website (the GH
> page), polished it and made it more user-friendly. My next step will be
> sending a pull request to Tika. However, my main goal until the end of
> Google Summer of Code is to enhance the parser in a way that will allow it
> to work categorically (in other words, the sentiment determined won't be
> just positive or negative, it will have a few categories). This means that
> my next step is to look for a categorical open data set (which I will
> hopefully do by the end of the weekend at the latest) and, of course,
> enhance my model and training. After that I will look into how the
> confidence levels can be increased.
>
> Have a great day/night!
>
> Thank you,
> Anastasija Mensikova.
Sentiment Analysis Parser updates
Hi everyone,

Some updates on our Sentiment Analysis Parser work.

As you might have noticed, I have recently enhanced our website (the GH page), polished it and made it more user-friendly. My next step will be sending a pull request to Tika. However, my main goal until the end of Google Summer of Code is to enhance the parser in a way that will allow it to work categorically (in other words, the sentiment determined won't be just positive or negative, it will have a few categories). This means that my next step is to look for a categorical open data set (which I will hopefully do by the end of the weekend at the latest) and, of course, enhance my model and training. After that I will look into how the confidence levels can be increased.

Have a great day/night!

Thank you,
Anastasija Mensikova.
Re: Sentiment Analysis Parser updates
Great update Anastasija!

Chris Mattmann, Ph.D.

On 6/17/16, 2:28 PM, "Anastasija Mensikova" wrote:
>Hello everyone,
>
>Some updates on my work on the Sentiment Analysis Parser.
>
>As you know, I have finished a basic version of the parser, and I'm
>currently working on going through the results of the parser run on the gun
>ads the right way so I can easily build graphs to illustrate how it all
>works.
>As you probably noticed, I changed some parts of the parser allowing it to
>output the data in JSON. I have also worked on creating scripts (not on
>GitHub) that load the 100 random gun ads, perform sentiment analysis on
>them using the parser and output the data needed for the graph. Using the
>output I received I have already managed to build two graphs using D3: one
>solely on the distribution of sentiment among the gun ads, and the other
>one on the distribution of sentiment of the gun ads in the countries (where
>the guns were made) presented, which you can all see on our GitHub page.
>
>I hope you have a great weekend!
>
>Thank you,
>Anastasija.
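The aggregation step described in Anastasija's message (turning per-ad parser output into the data behind the two D3 graphs) might look roughly like the sketch below. The scripts themselves are not on GitHub, so everything here is an assumption: the record fields `sentiment` and `country`, the function names, and the sample records, which are made-up placeholders rather than real parser output.

```python
# Hypothetical aggregation of per-ad sentiment records into two JSON-ready
# structures: overall sentiment counts and sentiment counts per country.
import json
from collections import Counter, defaultdict

# Placeholder records; the real parser's JSON schema is not shown in the thread.
SAMPLE_RECORDS = [
    {"id": 1, "sentiment": "positive", "country": "USA"},
    {"id": 2, "sentiment": "negative", "country": "USA"},
    {"id": 3, "sentiment": "positive", "country": "Germany"},
    {"id": 4, "sentiment": "positive", "country": "USA"},
    {"id": 5, "sentiment": "negative", "country": "Italy"},
]


def sentiment_distribution(records):
    """Count ads per sentiment label, sorted by label."""
    counts = Counter(r["sentiment"] for r in records)
    return [{"sentiment": s, "count": n} for s, n in sorted(counts.items())]


def sentiment_by_country(records):
    """Count ads per sentiment label within each country, sorted by country."""
    grouped = defaultdict(Counter)
    for r in records:
        grouped[r["country"]][r["sentiment"]] += 1
    return [
        {"country": country, **dict(sorted(counts.items()))}
        for country, counts in sorted(grouped.items())
    ]


if __name__ == "__main__":
    print(json.dumps(sentiment_distribution(SAMPLE_RECORDS), indent=2))
    print(json.dumps(sentiment_by_country(SAMPLE_RECORDS), indent=2))
```

Emitting a list of flat objects keeps the output easy to bind to a chart, since each object can map directly to one bar or one group of bars.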