Hi,

I tried your code. Very good work so far! Congratulations.

Is the examples/result file corrupted? It has only one line.

Do you plan to implement a simple CLI to use it interactively from the
command line, similar to

bin/opennlp Doccat
bin/opennlp TokenNameFinder

?
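
To make the idea concrete, here is a minimal sketch of such an interactive loop. It reads one document per line from standard input and prints the predicted category next to it. The class name SentimentCli and the Function<String, String> classifier are hypothetical placeholders, not part of OpenNLP or of your parser; a real tool would load your trained model instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical sketch: not an existing OpenNLP or Tika class.
public class SentimentCli {

    // Reads one document per line and prints "<category>\t<text>".
    // The classifier is passed in so that any model can be plugged in.
    static void run(Reader in, PrintStream out,
                    Function<String, String> classifier) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // skip blank input lines
            }
            out.println(classifier.apply(line) + "\t" + line);
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder rule standing in for a trained sentiment model (assumption).
        Function<String, String> placeholder =
                text -> text.contains("!") ? "positive" : "neutral";
        run(new InputStreamReader(System.in, StandardCharsets.UTF_8),
            System.out, placeholder);
    }
}
```

Keeping the classifier behind a plain function means the same loop could later be wrapped as a proper command line tool without changing the model code.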

Also, do you plan to add evaluation tools by extending
AbstractEvaluatorTool and AbstractCrossValidatorTool, as well as the
listener EvaluationErrorPrinter? I have found these tools very useful while
developing new models and features; maybe you would find them useful as well.

You could also check the DoccatFineGrainedReportListener as a starting point
to create a confusion matrix (I think it would be easy because Doccat data
structures are similar to yours).

The result would look like the following (this is a 300-entry Portuguese
corpus I am building from Facebook messages):


=== Evaluation summary ===
  Number of documents:    298
    Min sentence size:      1
    Max sentence size:    463
Average sentence size:  18,01
     Categories count:      4
             Accuracy: 61,41%

=== Detailed Accuracy By Tag ===

-------------------------------------------------------------------------
|      Tag | Errors |  Count |   % Err | Precision | Recall | F-Measure |
-------------------------------------------------------------------------
|  neutral |     46 |     56 | 0,821   | 0,588     | 0,179  | 0,274     |
| positive |     46 |     70 | 0,657   | 0,48      | 0,343  | 0,4       |
| negative |     18 |    167 | 0,108   | 0,651     | 0,892  | 0,753     |
|     spam |      5 |      5 | 1       | 0         | 0      | 0         |
-------------------------------------------------------------------------

=== Confusion matrix ===


    a     b     c     d | Accuracy | <-- classified as
 <149>   13     4     1 |   89,22% |   a = negative
   42   <24>    3     1 |   34,29% |   b = positive
   35    11   <10>    . |   17,86% |   c = neutral
    3     2     .    <.>|   0%     |   d = spam
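
For reference, the per-tag figures in the report can be recomputed directly from the confusion matrix. The sketch below is a standalone illustration, not the DoccatFineGrainedReportListener code: it assumes rows are the reference categories and columns are the predicted ones, and the class name ConfusionMatrixMetrics is made up for this example.

```java
import java.util.Locale;

// Hypothetical sketch: recomputes precision, recall and F-measure
// from the confusion matrix shown above.
public class ConfusionMatrixMetrics {

    static final String[] TAGS = {"negative", "positive", "neutral", "spam"};

    // Rows: reference category; columns: predicted category.
    static final int[][] MATRIX = {
        {149, 13,  4, 1},
        { 42, 24,  3, 1},
        { 35, 11, 10, 0},
        {  3,  2,  0, 0},
    };

    static int rowSum(int[][] m, int i) {
        int sum = 0;
        for (int v : m[i]) sum += v;
        return sum;
    }

    static int colSum(int[][] m, int j) {
        int sum = 0;
        for (int[] row : m) sum += row[j];
        return sum;
    }

    // Correct predictions for tag i divided by all predictions of tag i.
    static double precision(int[][] m, int i) {
        int predicted = colSum(m, i);
        return predicted == 0 ? 0.0 : (double) m[i][i] / predicted;
    }

    // Correct predictions for tag i divided by all reference items of tag i.
    static double recall(int[][] m, int i) {
        int actual = rowSum(m, i);
        return actual == 0 ? 0.0 : (double) m[i][i] / actual;
    }

    // Harmonic mean of precision and recall; 0 when both are 0.
    static double fMeasure(int[][] m, int i) {
        double p = precision(m, i);
        double r = recall(m, i);
        return (p + r) == 0.0 ? 0.0 : 2 * p * r / (p + r);
    }

    // Sum of the diagonal divided by the total number of documents.
    static double accuracy(int[][] m) {
        int correct = 0;
        int total = 0;
        for (int i = 0; i < m.length; i++) {
            correct += m[i][i];
            total += rowSum(m, i);
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "accuracy: %.4f%n", accuracy(MATRIX));
        for (int i = 0; i < TAGS.length; i++) {
            System.out.printf(Locale.ROOT, "%-8s P=%.3f R=%.3f F=%.3f%n",
                    TAGS[i], precision(MATRIX, i), recall(MATRIX, i),
                    fMeasure(MATRIX, i));
        }
    }
}
```

Under that row/column assumption the values agree with the "Detailed Accuracy By Tag" table, for example recall for negative is 149/167 and precision for positive is 24/50.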




Regards,
William

2016-06-23 2:11 GMT-03:00 Mattmann, Chris A (3980) <
chris.a.mattm...@jpl.nasa.gov>:

> Thank you Jason!
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Director, Information Retrieval and Data Science Group (IRDS)
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> WWW: http://irds.usc.edu/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> On 6/22/16, 8:41 PM, "Jason Baldridge" <jasonbaldri...@gmail.com> wrote:
>
> >Anastasija,
> >
> >There might be a few appropriate sentiment datasets listed in my homework
> >on Twitter sentiment analysis:
> >
> >https://github.com/utcompling/applied-nlp/wiki/Homework5
> >
> >There may also be some useful data sets in the Crowdflower Open Data
> >collection:
> >
> >https://www.crowdflower.com/data-for-everyone/
> >
> >Hope this helps!
> >
> >-Jason
> >
> >On Wed, 22 Jun 2016 at 15:59 Anastasija Mensikova <
> >mensikova.anastas...@gmail.com> wrote:
> >
> >> Hi everyone,
> >>
> >> Some updates on our Sentiment Analysis Parser work.
> >>
> >> You might have noticed, I have enhanced our website (the GH page)
> >> recently, polished it and made it more user-friendly. My next step will
> >> be sending a pull request to Tika. However, my main goal until the end of
> >> Google Summer of Code is to enhance the parser in a way that will allow it
> >> to work categorically (in other words, the sentiment determined won't be
> >> just positive or negative, it will have a few categories). This means that
> >> my next step is to look for a categorical open data set (which I will
> >> hopefully do by the end of the weekend the latest) and, of course, enhance
> >> my model and training. After that I will look into how the confidence
> >> levels can be increased.
> >>
> >> Have a great day/night!
> >>
> >> Thank you,
> >> Anastasija Mensikova.
> >>
>
