Re: Sentiment Analysis Parser updates

2016-07-06 Thread Mattmann, Chris A (3980)
Great work!

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++



Sentiment Analysis Parser updates

2016-07-06 Thread Anastasija Mensikova
Hi everyone,

Here are some updates on Sentiment Analysis Parser.

As you might know, last week I opened a pull request against OpenNLP. I
still have a few things to fix there, but that work is in progress behind
the scenes. I have also used the Stanford Sentiment Treebank to create a
new labeled, categorical dataset to train on, so that sentiment analysis
can produce more than two categories (or Facebook-like categories). Of
course, this categorical sentiment analysis is not perfect yet, but it is
up and working. I have also started working on our own SentimentEvaluator
and SentimentCrossValidator, which will hopefully be done soon.

My next goal is, of course, to finish the Evaluator and CrossValidator and
to use my new categorical output to create more D3 graphs on our GitHub page.
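
As an illustration only, here is a minimal sketch of how a categorical
dataset could be derived from the Stanford Sentiment Treebank, which assigns
each phrase a sentiment score between 0 and 1. The five equal-width bins
mirror the five-class split commonly used with the treebank. The label names,
the helpers score_to_label and to_training_line, and the sample phrases are
assumptions made for this sketch, not the categories or code used in the
parser. The "label text" line layout is modeled on OpenNLP's Doccat training
format and is likewise an assumption about what the sentiment trainer expects.

```python
# Hypothetical label names; the thread does not say which categories were used.
LABELS = ["very_negative", "negative", "neutral", "positive", "very_positive"]
# Inclusive upper bound of each bin on the treebank's 0-1 score scale.
UPPER_BOUNDS = [0.2, 0.4, 0.6, 0.8, 1.0]


def score_to_label(score: float) -> str:
    """Map a sentiment score in [0, 1] to one of five categories."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    for bound, label in zip(UPPER_BOUNDS, LABELS):
        if score <= bound:
            return label
    return LABELS[-1]  # not reached for valid input; kept for completeness


def to_training_line(score: float, text: str) -> str:
    """Build one 'label text' training line (assumed Doccat-style layout)."""
    # Collapse internal whitespace so each sample stays on a single line.
    return f"{score_to_label(score)} {' '.join(text.split())}"


if __name__ == "__main__":
    # Made-up sample phrases and scores, for demonstration only.
    samples = [(0.93, "a gorgeous , witty film"), (0.08, "a dull mess")]
    for score, text in samples:
        print(to_training_line(score, text))
```

Boundary scores fall into the lower bin (for example, 0.2 maps to the first
category), which is a design choice of this sketch.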

Have a great day/night!

Thank you,
Anastasija.


Re: Sentiment Analysis Parser updates

2016-06-28 Thread Mattmann, Chris A (3980)
Thanks William, this is a great idea. I will discuss it with 
Anastasija tomorrow.


Cheers,
Chris



Re: Sentiment Analysis Parser updates

2016-06-28 Thread William Colen
Hi,

I tried your code. Very good work so far! Congratulations.

Is the examples/result file corrupted? It has only one line.

Do you plan to implement a simple CLI to use it interactively from the
command line, similar to

bin/opennlp Doccat
bin/opennlp TokenNameFinder

?

Also, do you plan to add evaluation tools by extending
AbstractEvaluatorTool and AbstractCrossValidatorTool, as well as the
listener EvaluationErrorPrinter? I find these tools very useful while
developing new models and features; maybe you would find them useful as well.

You could also check the DoccatFineGrainedReportListener as a starting point
for creating a confusion matrix (I think it would be easy because the Doccat
data structures are similar to yours).

The result would look like the following (this is a 300-entry Portuguese
corpus I am building from Facebook messages):


=== Evaluation summary ===
  Number of documents:    298
    Min sentence size:      1
    Max sentence size:    463
Average sentence size:  18,01
     Categories count:      4
             Accuracy: 61,41%

=== Detailed Accuracy By Tag ===

----------------------------------------------------------------------
|      Tag | Errors | Count | % Err | Precision | Recall | F-Measure |
----------------------------------------------------------------------
|  neutral |     46 |    56 | 0,821 |     0,588 |  0,179 |     0,274 |
| positive |     46 |    70 | 0,657 |      0,48 |  0,343 |       0,4 |
| negative |     18 |   167 | 0,108 |     0,651 |  0,892 |     0,753 |
|     spam |      5 |     5 |     1 |         0 |      0 |         0 |
----------------------------------------------------------------------

=== Confusion matrix ===

     a      b      c      d | Accuracy | <-- classified as
 <149>     13      4      1 |   89,22% |   a = negative
    42   <24>      3      1 |   34,29% |   b = positive
    35     11   <10>      . |   17,86% |   c = neutral
     3      2      .    <.> |       0% |   d = spam
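
For readers who want to see how the per-tag figures follow from the
confusion matrix, here is a minimal sketch in Python. It is not the OpenNLP
implementation. It assumes that rows are reference labels, that columns are
predicted labels, and that "." stands for a zero count. The function names
per_tag_metrics and accuracy are made up for this example.

```python
# Confusion matrix from the report above. Rows are reference labels and
# columns are predicted labels (assumed), in the order a, b, c, d.
LABELS = ["negative", "positive", "neutral", "spam"]
MATRIX = [
    [149, 13, 4, 1],
    [42, 24, 3, 1],
    [35, 11, 10, 0],  # "." in the report is read as 0
    [3, 2, 0, 0],
]


def per_tag_metrics(matrix, labels):
    """Return {label: (precision, recall, f_measure)} for each tag."""
    metrics = {}
    for i, label in enumerate(labels):
        true_pos = matrix[i][i]
        reference_total = sum(matrix[i])                 # row sum
        predicted_total = sum(row[i] for row in matrix)  # column sum
        precision = true_pos / predicted_total if predicted_total else 0.0
        recall = true_pos / reference_total if reference_total else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f_measure)
    return metrics


def accuracy(matrix):
    """Share of documents on the diagonal, i.e. classified correctly."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


if __name__ == "__main__":
    print(f"Accuracy: {accuracy(MATRIX):.2%}")
    for label, (p, r, f) in per_tag_metrics(MATRIX, LABELS).items():
        print(f"{label:>8}: precision={p:.3f} recall={r:.3f} f={f:.3f}")
```

Under these assumptions the computed values agree with the report, for
example 183 correct documents out of 298 for the overall accuracy.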




Regards,
William



Re: Sentiment Analysis Parser updates

2016-06-22 Thread Mattmann, Chris A (3980)
Thank you Jason!



Re: Sentiment Analysis Parser updates

2016-06-22 Thread Jason Baldridge
Anastasija,

There might be a few appropriate sentiment datasets listed in my homework
on Twitter sentiment analysis:

https://github.com/utcompling/applied-nlp/wiki/Homework5

There may also be some useful data sets in the Crowdflower Open Data
collection:

https://www.crowdflower.com/data-for-everyone/

Hope this helps!

-Jason



Sentiment Analysis Parser updates

2016-06-22 Thread Anastasija Mensikova
Hi everyone,

Some updates on our Sentiment Analysis Parser work.

As you might have noticed, I recently enhanced our website (the GitHub page),
polishing it and making it more user-friendly. My next step will be sending a
pull request to Tika. However, my main goal until the end of Google Summer
of Code is to enhance the parser so that it works categorically (in other
words, the sentiment determined won't be just positive or negative; it will
have a few categories). This means that my next step is to look for a
categorical open data set (which I will hopefully do by the end of the
weekend at the latest) and, of course, to enhance my model and training.
After that I will look into how the confidence levels can be increased.

Have a great day/night!

Thank you,
Anastasija Mensikova.


Re: Sentiment Analysis Parser updates

2016-06-17 Thread Mattmann, Chris A (3980)
Great update Anastasija!

On 6/17/16, 2:28 PM, "Anastasija Mensikova" wrote:

>Hello everyone,
>
>Some updates on my work on the Sentiment Analysis Parser.
>
>As you know, I have finished a basic version of the parser, and I'm
>currently working on processing the results of running the parser on the
>gun ads properly, so that I can easily build graphs to illustrate how it
>all works.
>As you probably noticed, I changed some parts of the parser so that it can
>output the data in JSON. I have also created scripts (not on GitHub) that
>load the 100 random gun ads, perform sentiment analysis on them using the
>parser, and output the data needed for the graphs. Using that output, I
>have already built two graphs with D3: one showing the distribution of
>sentiment among the gun ads, and another showing the distribution of
>sentiment by the country where the guns were made. You can see both on
>our GitHub page.
>
>I hope you have a great weekend!
>
>Thank you,
>Anastasija.
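
The scripts mentioned in the quoted message are not on GitHub, so the
following is only a sketch of the kind of aggregation they describe: counting
sentiment labels overall and per country, then serializing the counts as JSON
for a D3 chart. The record fields "country" and "sentiment", the function
names, and the sample records are all assumptions made for this example. The
parser's actual JSON output is not shown in the thread.

```python
import json
from collections import Counter, defaultdict


def overall_distribution(records):
    """Count how many records carry each sentiment label."""
    return dict(Counter(record["sentiment"] for record in records))


def sentiment_by_country(records):
    """Count sentiment labels separately for each country."""
    counts = defaultdict(Counter)
    for record in records:
        counts[record["country"]][record["sentiment"]] += 1
    # Convert to plain dicts so the result serializes cleanly to JSON.
    return {country: dict(c) for country, c in counts.items()}


if __name__ == "__main__":
    # Made-up records standing in for parsed ads (hypothetical fields).
    records = [
        {"country": "USA", "sentiment": "positive"},
        {"country": "USA", "sentiment": "negative"},
        {"country": "Germany", "sentiment": "positive"},
        {"country": "USA", "sentiment": "positive"},
    ]
    print(json.dumps(overall_distribution(records), indent=2))
    print(json.dumps(sentiment_by_country(records), indent=2))
```

The nested dictionary keeps one entry per country, which maps directly onto a
grouped or stacked bar chart.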