Thanks bro
On 6/29/17, 10:52 AM, "Suneel Marthi" <smar...@apache.org> wrote: http://opennlp.apache.org/docs/1.8.0/manual/opennlp.html#tools.doccat On Thu, Jun 29, 2017 at 1:51 PM, Chris Mattmann <mattm...@apache.org> wrote: > Thanks I will investigate the below thanks Joern. Can someone send me some > pointers > to the Doc Cat API that I can find? Thanks. > > > > > On 6/29/17, 10:18 AM, "Joern Kottmann" <kottm...@gmail.com> wrote: > > For 2. I would like to suggest that we implement doccat format support > to train on that data. > > 3. it would be best so think about how we want to test the doccat > component, today we don't have any tests which use lots of data to > evaluate it. > Probably the sentitment data could solve this for us and a train and > evaluate test could be included in the eval tests. > > +1 to revert and then do these steps after the 1.8.1 release. > > I can apply my PR myself if nobody objects. > > Jörn > > On Thu, Jun 29, 2017 at 7:10 PM, Chris Mattmann <mattm...@apache.org> > wrote: > > Hi Rodrigo, > > > > This is very useful feedback that I wish we would have had a long > time ago. > > > > I will look into it and see if I can reproduce the CLI error. I did > a full build and mvn > > install (which I though would run tests?) before commiting and as I > posted in JIRA > > the tests passed for me? So I will have to look into that. > > > > That said, given your feedback that SentimentME and the Sentiment > Component > > doesn’t offer much over Document Classifier I agree with you, but > wasn’t super > > familiar with the Document Classifier API. That said, if we can get > the same functionality > > by just using Document Classifier why don’t we: > > > > 1. Remove the SentimentME and associated code (except for the unit > tests) > > 2. Use the sample datasets from NetFlix & Stanford Treebank > sentiment and > > build models using Document Classifier API. > > 3. Rename and keep the unit tests that test against Netflix and > Stanford tree bank. > > > > That way we get basic sentiment analysis (that is working for us > internally at JPL decently), > > for Apache OpenNLP, and then if we want to build something better > than a Document > > Classification approach to sentiment we can do so. > > > > Thoughts? > > > > Thanks for your useful feedback. If everyone agrees this is a plan I > can back out the code > > using Joern’s revert, and then try and execute 1-3 above in a branch > first. Thanks. > > > > Cheers, > > Chris > > > > > > > > On 6/29/17, 10:03 AM, "Rodrigo Agerri" <rage...@apache.org> wrote: > > > > Hi Chris, > > > > I have been interested in the new sentiment component for a > while, > > although truth to be told, I did not follow that closely. I have > today > > looked at it and test it with some of the corpora you have > mentioned. > > In order to do that, I checkout master to work with from this > commit > > onwards > > > > https://github.com/apache/opennlp/commit/ > 56321aab51a470cd2004b76fb1f5330881b943c1 > > > > 1. I tried to run it from the CLI. The Sentiment component did > not > > appear to be available. > > 2. I added the SentimentTrainer and Evaluator to the cmdline.CLI > (no > > SentimentTool is implemented to tag with a trained model). > > 3. After that, the CLI tests did not pass. So, the CLI is > currently > > non functional, unless I did something wrong, always possible, of > > course. See if you can reproduce that error. > > > > I therefore did the tests via API. I implemented a little test > for > > training, evaluating and tagging here: > > > > https://github.com/ixa-ehu/ixa-pipe-doc/tree/test > > > > I run the training on the large movies review from Stanford for > binary > > polarity classification > > > > http://ai.stanford.edu/~amaas/data/sentiment/ > > > > and on the two little samples multiclass files added in > resources and > > mentioned in the previous email, using the first one for > training and > > the second one for testing (maxent 100 iterations, cutoff 5). > > > > 2. Stanford results: 0.84264 > > 3. sample multiclass: 0.73 > > > > Given that this is a standard document classification task, I > decided > > to train the doccat component from the CLI: > > > > 1. Stanford results: 0.84264 (BOW features by default). > > 2. sample multiclass: 0.73 > > > > I then looked at the code of the sentiment component and saw > that it > > is basically a document classifier working with bag of words > features. > > No added functionality. So, my conclusions are: > > > > 1. The CLI needs to be fixed. > > 2. The Sentiment component, as it is, provides the same > functionality > > as the document classifier. > > > > I would therefore reconsider this commit until those two issues > are > > addressed. Just my opinion. > > > > Best regards, > > > > Rodrigo > > > > On Thu, Jun 29, 2017 at 5:30 PM, Chris Mattmann < > mattm...@apache.org> wrote: > > > > > > Hey Joern, > > > > > > Sure, you can find the model data links here, along with our > evaluation of them. > > > > > > http://irds.usc.edu/SentimentAnalysisParser/datasets.html > > > > > > There are other evaluations here: > > > > > > http://irds.usc.edu/SentimentAnalysisParser/models.html > > > > > > The HT provider review I cannot contribute at this time and I > question its broad > > > applicability since it’s related to human trafficking. In > addition we are still working > > > on publishing our analysis & evaluation of it which is why I > removed it from the > > > commit. > > > > > > Cheers, > > > Chris > > > > > > > > > > > > > > > > > > On 6/29/17, 7:36 AM, "Joern Kottmann" <kottm...@gmail.com> > wrote: > > > > > > Which data sets did you use to evaluate this? > > > I was looking for a bit more than a sample file to train > it. > > > > > > I noticed that you checked in stanford and netflix models. > > > > > > The stanford data set is probably this one: > > > http://ai.stanford.edu/~amaas/data/sentiment/ > > > > > > Do you have a link for the netflix data? > > > > > > Jörn > > > > > > On Thu, Jun 29, 2017 at 4:00 PM, Chris Mattmann < > mattm...@apache.org> wrote: > > > > Absolutely you can find it here: > > > > > > > > opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ > (for categorical /multi-class) > > > > opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ2 > (for categorical/multi-class) > > > > > > > > We can also do similar files where instead of > multi-class, we just use pos/neg as the label. > > > > > > > > Cheers, > > > > Chris > > > > > > > > > > > > > > > > > > > > > > > > On 6/29/17, 2:35 AM, "Joern Kottmann" < > kottm...@gmail.com> wrote: > > > > > > > > Hello Chris, > > > > > > > > could you please point me to files I can use to > train the sentiment > > > > component? I am currently looking again through the > code and would > > > > like to train it myself. > > > > > > > > Jörn > > > > > > > > On Tue, Jun 27, 2017 at 4:59 PM, Dan Russ < > danrus...@gmail.com> wrote: > > > > > Hi All, > > > > > First, let me take a share of blame for the > comment Chris mentioned. I believe I said something like the pull request > was X revision behind and Y revisions ahead. It was not meant to be rude, > it was meant to say it is hard to review code when it is so different from > the current code base. I am very excited that sentiment analysis is going > to be added to OpenNLP, but I have not had time to play with it. If I were > to say “great job” before I have add a chance to look at it, it would be > flattery not honest praise. > > > > > > > > > > Let’s clean up the merge. I agree with Chris > that scalability and perfection should not be our initial goal. Let’s get > something, and we can decide how to optimize later (even if it require a > complete rewrite). Perfection is the enemy of the good. > > > > > > > > > > Finally, because of Chris’ comments it is hard > to thank Ana and Chris without sounding insincere. But I’ll try, thank you > Chris and Ana. I hope we can get beyond this and that Chris and Ana will > continue to improve the performance of the sentiment analysis tool and > happily remain part of the OpenNLP family. It is also a good time to toss > a big thank you to all of the committers, users, and PMC member. I use > OpenNLP almost everyday. Your work is extremely valuable to me. > > > > > > > > > > Thank you, > > > > > Daniel > > > > > > > > > >> On Jun 27, 2017, at 10:25 AM, Chris Mattmann < > mattm...@apache.org> wrote: > > > > >> > > > > >> Hi everyone, > > > > >> > > > > >> I spoke with Joern in Slack. Some of his concerns > are: > > > > >> > > > > >> 1. This was done with a Merge commit and > apparently they squash and rebase. > > > > >> [would be helpful to see some pointer on this for > documentation, thus far I > > > > >> haven’t found any] > > > > >> 2. Apparently we literally need to ask others for > +1 votes and record them > > > > >> before committing? I thought since Ana and I are > committers aren were +1, > > > > >> and since Joern had been providing feedback (the > last of which was to add > > > > >> tests, which we did) that he would be +1 as well > (I guess he is not, and I guess > > > > >> formally we need to do a +1 vote even still) > > > > >> 3. There was concern about scalability of the > code. > > > > >> 4. There are thoughts that the code was not > perfect yet (even though it works > > > > >> fine in the MEMEX project for Ana and I) > > > > >> > > > > >> So, Joern has opened up a revert PR. > > > > >> > > > > >> I suppose I should state I find this process > extremely heavyweight and unwelcoming. > > > > >> To me, there should be a modicum of trust for > committers, but I feel like even as a > > > > >> committer, I am operating as a “contributor” to > the project. Committer means that > > > > >> there is trust to modify the source code base. Of > the issues above, the only one I see > > > > >> as a moderate snafu was #1, and frankly if there > are some instructions that show me > > > > >> how to do squashing and rebasing *first* I will > try to do that in the future since I am > > > > >> not a GIt expert. > > > > >> > > > > >> That said, I must state I feel pretty put off by > Apache OpenNLP. This originated as a GSoC > > > > >> effort, and we have worked pretty consistently on > this over the last year. We used a > > > > >> separate GitHub project to get started, kept > Joern involved as another mentor, even > > > > >> provided access and commit writes to that GitHub > repository for a long time, so this > > > > >> code was developed in the open. Joern even > created a branch in ApacheOpenNLP in the code and I suppose > > > > >> I should have gone and worked on that branch > first since master is apparently so > > > > >> pristine that even an Apache veteran like me > can’t get something in to it without > > > > >> making a whole bunch of (what are IMO minor > issues, and what are IMO heavyweight > > > > >> “community” issues). > > > > >> > > > > >> I am concerned from a community point of view > that the first comment wasn’t “Great > > > > >> job Chris, you got Sentiment Analysis into > Apache, *but* I have these concerns 1-4 above”. > > > > >> It was “The PR was merged wrong in ways 1-4 and > I’m going to revert it.” > > > > >> > > > > >> That’s pretty off-putting to someone who is > semi-new like me and like Ana. > > > > >> > > > > >> Anyways, go ahead and revert it. Sorry to have > caused any issues. > > > > >> > > > > >> Chris > > > > >> > > > > >> > > > > >> > > > > >> On 6/27/17, 7:06 AM, "Chris Mattmann" < > mattm...@apache.org> wrote: > > > > >> > > > > >> Hi Joern, > > > > >> > > > > >> I’m confused. Why did you revert my commit? > > > > >> > > > > >> Every one of those check points you put on the > PR was checked? > > > > >> We have been discussing this for months, you > have seen the > > > > >> code for months, Ana and I have worked > diligently on the code > > > > >> in plain view of everyone. > > > > >> > > > > >> Please explain. > > > > >> > > > > >> Chris > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> On 6/27/17, 1:23 AM, "kottmann" < > g...@git.apache.org> wrote: > > > > >> > > > > >> GitHub user kottmann opened a pull request: > > > > >> > > > > >> https://github.com/apache/ > opennlp/pull/238 > > > > >> > > > > >> Revert merging of sentiment work, no > consent to merge it > > > > >> > > > > >> Thank you for contributing to Apache > OpenNLP. > > > > >> > > > > >> In order to streamline the review of > the contribution we ask you > > > > >> to ensure the following steps have > been taken: > > > > >> > > > > >> ### For all changes: > > > > >> - [ ] Is there a JIRA ticket > associated with this PR? Is it referenced > > > > >> in the commit message? > > > > >> > > > > >> - [ ] Does your PR title start with > OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay > particular attention to the hyphen "-" character. > > > > >> > > > > >> - [ ] Has your PR been rebased against > the latest commit within the target branch (typically master)? > > > > >> > > > > >> - [ ] Is your initial contribution a > single, squashed commit? > > > > >> > > > > >> ### For code changes: > > > > >> - [ ] Have you ensured that the full > suite of tests is executed via mvn clean install at the root opennlp folder? > > > > >> - [ ] Have you written or updated unit > tests to verify your changes? > > > > >> - [ ] If adding new dependencies to > the code, are these dependencies licensed in a way that is compatible for > inclusion under [ASF 2.0](http://www.apache.org/ > legal/resolved.html#category-a)? > > > > >> - [ ] If applicable, have you updated > the LICENSE file, including the main LICENSE file in opennlp folder? > > > > >> - [ ] If applicable, have you updated > the NOTICE file, including the main NOTICE file found in opennlp folder? > > > > >> > > > > >> ### For documentation related changes: > > > > >> - [ ] Have you ensured that format > looks appropriate for the output in which it is rendered? > > > > >> > > > > >> ### Note: > > > > >> Please ensure that once the PR is > submitted, you check travis-ci for build issues and submit an update to > your PR as soon as possible. > > > > >> > > > > >> > > > > >> You can merge this pull request into a Git > repository by running: > > > > >> > > > > >> $ git pull > https://github.com/kottmann/opennlp revert_sentiment > > > > >> > > > > >> Alternatively you can review and apply > these changes as the patch at: > > > > >> > > > > >> https://github.com/apache/ > opennlp/pull/238.patch > > > > >> > > > > >> To close this pull request, make a commit > to your master/trunk branch > > > > >> with (at least) the following in the > commit message: > > > > >> > > > > >> This closes #238 > > > > >> > > > > >> ---- > > > > >> commit 123222eb34724bae793e9d6d22e202 > c0aee0aa45 > > > > >> Author: Jörn Kottmann <jo...@apache.org> > > > > >> Date: 2017-06-27T08:19:19Z > > > > >> > > > > >> Revert merging of sentiment work, no > consent to merge it > > > > >> > > > > >> ---- > > > > >> > > > > >> > > > > >> --- > > > > >> If your project is set up for it, you can > reply to this email and have your > > > > >> reply appear on GitHub as well. If your > project does not have this feature > > > > >> enabled and wishes so, or if the feature > is enabled but not working, please > > > > >> contact infrastructure at > infrastruct...@apache.org or file a JIRA ticket > > > > >> with INFRA. > > > > >> --- > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >