Re: [GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

Chris Mattmann Thu, 29 Jun 2017 10:53:35 -0700

Thanks bro



On 6/29/17, 10:52 AM, "Suneel Marthi" <[email protected]> wrote:

    http://opennlp.apache.org/docs/1.8.0/manual/opennlp.html#tools.doccat
    
    On Thu, Jun 29, 2017 at 1:51 PM, Chris Mattmann <[email protected]> wrote:
    
    > Thanks I will investigate the below thanks Joern. Can someone send me some
    > pointers
    > to the Doc Cat API that I can find? Thanks.
    >
    >
    >
    >
    > On 6/29/17, 10:18 AM, "Joern Kottmann" <[email protected]> wrote:
    >
    >     For 2. I would like to suggest that we implement doccat format support
    >     to train on that data.
    >
    >     3. it would be best so think about how we want to test the doccat
    >     component, today we don't have any tests which use lots of data to
    >     evaluate it.
    >     Probably the sentitment data could solve this for us and a train and
    >     evaluate test could be included in the eval tests.
    >
    >     +1 to revert and then do these steps after the 1.8.1 release.
    >
    >     I can apply my PR myself if nobody objects.
    >
    >     Jörn
    >
    >     On Thu, Jun 29, 2017 at 7:10 PM, Chris Mattmann <[email protected]>
    > wrote:
    >     > Hi Rodrigo,
    >     >
    >     > This is very useful feedback that I wish we would have had a long
    > time ago.
    >     >
    >     > I will look into it and see if I can reproduce the CLI error. I did
    > a full build and mvn
    >     > install (which I though would run tests?) before commiting and as I
    > posted in JIRA
    >     > the tests passed for me? So I will have to look into that.
    >     >
    >     > That said, given your feedback that SentimentME and the Sentiment
    > Component
    >     > doesn’t offer much over Document Classifier I agree with you, but
    > wasn’t super
    >     > familiar with the Document Classifier API. That said, if we can get
    > the same functionality
    >     > by just using Document Classifier why don’t we:
    >     >
    >     > 1. Remove the SentimentME and associated code (except for the unit
    > tests)
    >     > 2. Use the sample datasets from NetFlix & Stanford Treebank
    > sentiment and
    >     > build models using Document Classifier API.
    >     > 3. Rename and keep the unit tests that test against Netflix and
    > Stanford tree bank.
    >     >
    >     > That way we get basic sentiment analysis (that is working for us
    > internally at JPL decently),
    >     > for Apache OpenNLP, and then if we want to build something better
    > than a Document
    >     > Classification approach to sentiment we can do so.
    >     >
    >     > Thoughts?
    >     >
    >     > Thanks for your useful feedback. If everyone agrees this is a plan I
    > can back out the code
    >     > using Joern’s revert, and then try and execute 1-3 above in a branch
    > first. Thanks.
    >     >
    >     > Cheers,
    >     > Chris
    >     >
    >     >
    >     >
    >     > On 6/29/17, 10:03 AM, "Rodrigo Agerri" <[email protected]> wrote:
    >     >
    >     >     Hi Chris,
    >     >
    >     >     I have been interested in the new sentiment component for a
    > while,
    >     >     although truth to be told, I did not follow that closely. I have
    > today
    >     >     looked at it and test it with some of the corpora you have
    > mentioned.
    >     >     In order to do that, I checkout master to work with from this
    > commit
    >     >     onwards
    >     >
    >     >     https://github.com/apache/opennlp/commit/
    > 56321aab51a470cd2004b76fb1f5330881b943c1
    >     >
    >     >     1. I tried to run it from the CLI. The Sentiment component did
    > not
    >     >     appear to be available.
    >     >     2. I added the SentimentTrainer and Evaluator to the cmdline.CLI
    > (no
    >     >     SentimentTool is implemented to tag with a trained model).
    >     >     3. After that, the CLI tests did not pass. So, the CLI is
    > currently
    >     >     non functional, unless I did something wrong, always possible, 
of
    >     >     course. See if you can reproduce that error.
    >     >
    >     >     I therefore did the tests via API. I implemented a little test
    > for
    >     >     training, evaluating and tagging here:
    >     >
    >     >     https://github.com/ixa-ehu/ixa-pipe-doc/tree/test
    >     >
    >     >     I run the training on the large movies review from Stanford for
    > binary
    >     >     polarity classification
    >     >
    >     >     http://ai.stanford.edu/~amaas/data/sentiment/
    >     >
    >     >     and on the two little samples multiclass files added in
    > resources and
    >     >     mentioned in the previous email, using the first one for
    > training and
    >     >     the second one for testing (maxent 100 iterations, cutoff 5).
    >     >
    >     >     2. Stanford results: 0.84264
    >     >     3. sample multiclass: 0.73
    >     >
    >     >     Given that this is a standard document classification task, I
    > decided
    >     >     to train the doccat component from the CLI:
    >     >
    >     >     1. Stanford results: 0.84264 (BOW features by default).
    >     >     2. sample multiclass: 0.73
    >     >
    >     >     I then looked at the code of the sentiment component and saw
    > that it
    >     >     is basically a document classifier working with bag of words
    > features.
    >     >     No added functionality. So, my conclusions are:
    >     >
    >     >     1. The CLI needs to be fixed.
    >     >     2. The Sentiment component, as it is, provides the same
    > functionality
    >     >     as the document classifier.
    >     >
    >     >     I would therefore reconsider this commit until those two issues
    > are
    >     >     addressed. Just my opinion.
    >     >
    >     >     Best regards,
    >     >
    >     >     Rodrigo
    >     >
    >     >     On Thu, Jun 29, 2017 at 5:30 PM, Chris Mattmann <
    > [email protected]> wrote:
    >     >     >
    >     >     > Hey Joern,
    >     >     >
    >     >     > Sure, you can find the model data links here, along with our
    > evaluation of them.
    >     >     >
    >     >     > http://irds.usc.edu/SentimentAnalysisParser/datasets.html
    >     >     >
    >     >     > There are other evaluations here:
    >     >     >
    >     >     > http://irds.usc.edu/SentimentAnalysisParser/models.html
    >     >     >
    >     >     > The HT provider review I cannot contribute at this time and I
    > question its broad
    >     >     > applicability since it’s related to human trafficking. In
    > addition we are still working
    >     >     > on publishing our analysis & evaluation of it which is why I
    > removed it from the
    >     >     > commit.
    >     >     >
    >     >     > Cheers,
    >     >     > Chris
    >     >     >
    >     >     >
    >     >     >
    >     >     >
    >     >     >
    >     >     > On 6/29/17, 7:36 AM, "Joern Kottmann" <[email protected]>
    > wrote:
    >     >     >
    >     >     >     Which data sets did you use to evaluate this?
    >     >     >     I was looking for a bit more than a sample file to train
    > it.
    >     >     >
    >     >     >     I noticed that you checked in stanford and netflix models.
    >     >     >
    >     >     >     The stanford data set is probably this one:
    >     >     >     http://ai.stanford.edu/~amaas/data/sentiment/
    >     >     >
    >     >     >     Do you have a link for the netflix data?
    >     >     >
    >     >     >     Jörn
    >     >     >
    >     >     >     On Thu, Jun 29, 2017 at 4:00 PM, Chris Mattmann <
    > [email protected]> wrote:
    >     >     >     > Absolutely you can find it here:
    >     >     >     >
    >     >     >     > 
opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ
    > (for categorical /multi-class)
    >     >     >     > 
opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ2
    > (for categorical/multi-class)
    >     >     >     >
    >     >     >     > We can also do similar files where instead of
    > multi-class, we just use pos/neg as the label.
    >     >     >     >
    >     >     >     > Cheers,
    >     >     >     > Chris
    >     >     >     >
    >     >     >     >
    >     >     >     >
    >     >     >     >
    >     >     >     >
    >     >     >     > On 6/29/17, 2:35 AM, "Joern Kottmann" <
    > [email protected]> wrote:
    >     >     >     >
    >     >     >     >     Hello Chris,
    >     >     >     >
    >     >     >     >     could you please point me to files I can use to
    > train the sentiment
    >     >     >     >     component? I am currently looking again through the
    > code and would
    >     >     >     >     like to train it myself.
    >     >     >     >
    >     >     >     >     Jörn
    >     >     >     >
    >     >     >     >     On Tue, Jun 27, 2017 at 4:59 PM, Dan Russ <
    > [email protected]> wrote:
    >     >     >     >     > Hi All,
    >     >     >     >     >    First, let me take a share of blame for the
    > comment Chris mentioned.  I believe I said something like the pull request
    > was X revision behind and Y revisions ahead.  It was not meant to be rude,
    > it was meant to say it is hard to review code when it is so different from
    > the current code base. I am very excited that sentiment analysis is going
    > to be added to OpenNLP, but I have not had time to play with it. If I were
    > to say “great job” before I have add a chance to look at it, it would be
    > flattery not honest praise.
    >     >     >     >     >
    >     >     >     >     >   Let’s clean up the merge.  I agree with Chris
    > that scalability and perfection should not be our initial goal.  Let’s get
    > something, and we can decide how to optimize later (even if it require a
    > complete rewrite).  Perfection is the enemy of the good.
    >     >     >     >     >
    >     >     >     >     >   Finally, because of Chris’ comments it is hard
    > to thank Ana and Chris without sounding insincere.  But I’ll try, thank 
you
    > Chris and Ana.  I hope we can get beyond this and that Chris and Ana will
    > continue to improve the performance of the sentiment analysis tool and
    > happily remain part of the OpenNLP family.  It is also a good time to toss
    > a big thank you to all of the committers, users, and PMC member.  I use
    > OpenNLP almost everyday.  Your work is extremely valuable to me.
    >     >     >     >     >
    >     >     >     >     > Thank you,
    >     >     >     >     > Daniel
    >     >     >     >     >
    >     >     >     >     >> On Jun 27, 2017, at 10:25 AM, Chris Mattmann <
    > [email protected]> wrote:
    >     >     >     >     >>
    >     >     >     >     >> Hi everyone,
    >     >     >     >     >>
    >     >     >     >     >> I spoke with Joern in Slack. Some of his concerns
    > are:
    >     >     >     >     >>
    >     >     >     >     >> 1. This was done with a Merge commit and
    > apparently they squash and rebase.
    >     >     >     >     >> [would be helpful to see some pointer on this for
    > documentation, thus far I
    >     >     >     >     >> haven’t found any]
    >     >     >     >     >> 2. Apparently we literally need to ask others for
    > +1 votes and record them
    >     >     >     >     >> before committing? I thought since Ana and I are
    > committers aren were +1,
    >     >     >     >     >> and since Joern had been providing feedback (the
    > last of which was to add
    >     >     >     >     >> tests, which we did) that he would be +1 as well
    > (I guess he is not, and I guess
    >     >     >     >     >> formally we need to do a +1 vote even still)
    >     >     >     >     >> 3. There was concern about scalability of the
    > code.
    >     >     >     >     >> 4. There are thoughts that the code was not
    > perfect yet (even though it works
    >     >     >     >     >> fine in the MEMEX project for Ana and I)
    >     >     >     >     >>
    >     >     >     >     >> So, Joern has opened up a revert PR.
    >     >     >     >     >>
    >     >     >     >     >> I suppose I should state I find this process
    > extremely heavyweight and unwelcoming.
    >     >     >     >     >> To me, there should be a modicum of trust for
    > committers, but I feel like even as a
    >     >     >     >     >> committer, I am operating as a “contributor” to
    > the project. Committer means that
    >     >     >     >     >> there is trust to modify the source code base. Of
    > the issues above, the only one I see
    >     >     >     >     >> as a moderate snafu was #1, and frankly if there
    > are some instructions that show me
    >     >     >     >     >> how to do squashing and rebasing *first* I will
    > try to do that in the future since I am
    >     >     >     >     >> not a GIt expert.
    >     >     >     >     >>
    >     >     >     >     >> That said, I must state I feel pretty put off by
    > Apache OpenNLP. This originated as a GSoC
    >     >     >     >     >> effort, and we have worked pretty consistently on
    > this over the last year. We used a
    >     >     >     >     >> separate GitHub project to get started, kept
    > Joern involved as another mentor, even
    >     >     >     >     >> provided access and commit writes to that GitHub
    > repository for a long time, so this
    >     >     >     >     >> code was developed in the open. Joern even
    > created a branch in ApacheOpenNLP in the code and I suppose
    >     >     >     >     >> I should have gone and worked on that branch
    > first since master is apparently so
    >     >     >     >     >> pristine that even an Apache veteran like me
    > can’t get something in to it without
    >     >     >     >     >> making a whole bunch of (what are IMO minor
    > issues, and what are IMO heavyweight
    >     >     >     >     >> “community” issues).
    >     >     >     >     >>
    >     >     >     >     >> I am concerned from a community point of view
    > that the first comment wasn’t “Great
    >     >     >     >     >> job Chris, you got Sentiment Analysis into
    > Apache, *but* I have these concerns 1-4 above”.
    >     >     >     >     >> It was “The PR was merged wrong in ways 1-4 and
    > I’m going to revert it.”
    >     >     >     >     >>
    >     >     >     >     >> That’s pretty off-putting to someone who is
    > semi-new like me and like Ana.
    >     >     >     >     >>
    >     >     >     >     >> Anyways, go ahead and revert it. Sorry to have
    > caused any issues.
    >     >     >     >     >>
    >     >     >     >     >> Chris
    >     >     >     >     >>
    >     >     >     >     >>
    >     >     >     >     >>
    >     >     >     >     >> On 6/27/17, 7:06 AM, "Chris Mattmann" <
    > [email protected]> wrote:
    >     >     >     >     >>
    >     >     >     >     >>    Hi Joern,
    >     >     >     >     >>
    >     >     >     >     >>    I’m confused. Why did you revert my commit?
    >     >     >     >     >>
    >     >     >     >     >>    Every one of those check points you put on the
    > PR was checked?
    >     >     >     >     >>    We have been discussing this for months, you
    > have seen the
    >     >     >     >     >>    code for months, Ana and I have worked
    > diligently on the code
    >     >     >     >     >>    in plain view of everyone.
    >     >     >     >     >>
    >     >     >     >     >>    Please explain.
    >     >     >     >     >>
    >     >     >     >     >>    Chris
    >     >     >     >     >>
    >     >     >     >     >>
    >     >     >     >     >>
    >     >     >     >     >>
    >     >     >     >     >>    On 6/27/17, 1:23 AM, "kottmann" <
    > [email protected]> wrote:
    >     >     >     >     >>
    >     >     >     >     >>        GitHub user kottmann opened a pull 
request:
    >     >     >     >     >>
    >     >     >     >     >>            https://github.com/apache/
    > opennlp/pull/238
    >     >     >     >     >>
    >     >     >     >     >>            Revert merging of sentiment work, no
    > consent to merge it
    >     >     >     >     >>
    >     >     >     >     >>            Thank you for contributing to Apache
    > OpenNLP.
    >     >     >     >     >>
    >     >     >     >     >>            In order to streamline the review of
    > the contribution we ask you
    >     >     >     >     >>            to ensure the following steps have
    > been taken:
    >     >     >     >     >>
    >     >     >     >     >>            ### For all changes:
    >     >     >     >     >>            - [ ] Is there a JIRA ticket
    > associated with this PR? Is it referenced
    >     >     >     >     >>                 in the commit message?
    >     >     >     >     >>
    >     >     >     >     >>            - [ ] Does your PR title start with
    > OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay
    > particular attention to the hyphen "-" character.
    >     >     >     >     >>
    >     >     >     >     >>            - [ ] Has your PR been rebased against
    > the latest commit within the target branch (typically master)?
    >     >     >     >     >>
    >     >     >     >     >>            - [ ] Is your initial contribution a
    > single, squashed commit?
    >     >     >     >     >>
    >     >     >     >     >>            ### For code changes:
    >     >     >     >     >>            - [ ] Have you ensured that the full
    > suite of tests is executed via mvn clean install at the root opennlp 
folder?
    >     >     >     >     >>            - [ ] Have you written or updated unit
    > tests to verify your changes?
    >     >     >     >     >>            - [ ] If adding new dependencies to
    > the code, are these dependencies licensed in a way that is compatible for
    > inclusion under [ASF 2.0](http://www.apache.org/
    > legal/resolved.html#category-a)?
    >     >     >     >     >>            - [ ] If applicable, have you updated
    > the LICENSE file, including the main LICENSE file in opennlp folder?
    >     >     >     >     >>            - [ ] If applicable, have you updated
    > the NOTICE file, including the main NOTICE file found in opennlp folder?
    >     >     >     >     >>
    >     >     >     >     >>            ### For documentation related changes:
    >     >     >     >     >>            - [ ] Have you ensured that format
    > looks appropriate for the output in which it is rendered?
    >     >     >     >     >>
    >     >     >     >     >>            ### Note:
    >     >     >     >     >>            Please ensure that once the PR is
    > submitted, you check travis-ci for build issues and submit an update to
    > your PR as soon as possible.
    >     >     >     >     >>
    >     >     >     >     >>
    >     >     >     >     >>        You can merge this pull request into a Git
    > repository by running:
    >     >     >     >     >>
    >     >     >     >     >>            $ git pull
    > https://github.com/kottmann/opennlp revert_sentiment
    >     >     >     >     >>
    >     >     >     >     >>        Alternatively you can review and apply
    > these changes as the patch at:
    >     >     >     >     >>
    >     >     >     >     >>            https://github.com/apache/
    > opennlp/pull/238.patch
    >     >     >     >     >>
    >     >     >     >     >>        To close this pull request, make a commit
    > to your master/trunk branch
    >     >     >     >     >>        with (at least) the following in the
    > commit message:
    >     >     >     >     >>
    >     >     >     >     >>            This closes #238
    >     >     >     >     >>
    >     >     >     >     >>        ----
    >     >     >     >     >>        commit 123222eb34724bae793e9d6d22e202
    > c0aee0aa45
    >     >     >     >     >>        Author: JÃ¶rn Kottmann <[email protected]>
    >     >     >     >     >>        Date:   2017-06-27T08:19:19Z
    >     >     >     >     >>
    >     >     >     >     >>            Revert merging of sentiment work, no
    > consent to merge it
    >     >     >     >     >>
    >     >     >     >     >>        ----
    >     >     >     >     >>
    >     >     >     >     >>
    >     >     >     >     >>        ---
    >     >     >     >     >>        If your project is set up for it, you can
    > reply to this email and have your
    >     >     >     >     >>        reply appear on GitHub as well. If your
    > project does not have this feature
    >     >     >     >     >>        enabled and wishes so, or if the feature
    > is enabled but not working, please
    >     >     >     >     >>        contact infrastructure at
    > [email protected] or file a JIRA ticket
    >     >     >     >     >>        with INFRA.
    >     >     >     >     >>        ---
    >     >     >     >     >>
    >     >     >     >     >>
    >     >     >     >     >>
    >     >     >     >     >>
    >     >     >     >     >>
    >     >     >     >     >>
    >     >     >     >     >
    >     >     >     >
    >     >     >     >
    >     >     >     >
    >     >     >
    >     >     >
    >     >     >
    >     >
    >     >
    >     >
    >
    >
    >
    >

Re: [GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

Reply via email to