Which data sets did you use to evaluate this? I was looking for a bit more than a sample file to train it.
I noticed that you checked in stanford and netflix models. The stanford data set is probably this one: http://ai.stanford.edu/~amaas/data/sentiment/ Do you have a link for the netflix data? Jörn On Thu, Jun 29, 2017 at 4:00 PM, Chris Mattmann <[email protected]> wrote: > Absolutely you can find it here: > > opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ > (for categorical /multi-class) > opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ2 > (for categorical/multi-class) > > We can also do similar files where instead of multi-class, we just use > pos/neg as the label. > > Cheers, > Chris > > > > > > On 6/29/17, 2:35 AM, "Joern Kottmann" <[email protected]> wrote: > > Hello Chris, > > could you please point me to files I can use to train the sentiment > component? I am currently looking again through the code and would > like to train it myself. > > Jörn > > On Tue, Jun 27, 2017 at 4:59 PM, Dan Russ <[email protected]> wrote: > > Hi All, > > First, let me take a share of blame for the comment Chris mentioned. > I believe I said something like the pull request was X revision behind and Y > revisions ahead. It was not meant to be rude, it was meant to say it is hard > to review code when it is so different from the current code base. I am very > excited that sentiment analysis is going to be added to OpenNLP, but I have > not had time to play with it. If I were to say “great job” before I have add > a chance to look at it, it would be flattery not honest praise. > > > > Let’s clean up the merge. I agree with Chris that scalability and > perfection should not be our initial goal. Let’s get something, and we can > decide how to optimize later (even if it require a complete rewrite). > Perfection is the enemy of the good. > > > > Finally, because of Chris’ comments it is hard to thank Ana and Chris > without sounding insincere. But I’ll try, thank you Chris and Ana. I hope > we can get beyond this and that Chris and Ana will continue to improve the > performance of the sentiment analysis tool and happily remain part of the > OpenNLP family. It is also a good time to toss a big thank you to all of the > committers, users, and PMC member. I use OpenNLP almost everyday. Your work > is extremely valuable to me. > > > > Thank you, > > Daniel > > > >> On Jun 27, 2017, at 10:25 AM, Chris Mattmann <[email protected]> > wrote: > >> > >> Hi everyone, > >> > >> I spoke with Joern in Slack. Some of his concerns are: > >> > >> 1. This was done with a Merge commit and apparently they squash and > rebase. > >> [would be helpful to see some pointer on this for documentation, thus > far I > >> haven’t found any] > >> 2. Apparently we literally need to ask others for +1 votes and record > them > >> before committing? I thought since Ana and I are committers aren were > +1, > >> and since Joern had been providing feedback (the last of which was to > add > >> tests, which we did) that he would be +1 as well (I guess he is not, > and I guess > >> formally we need to do a +1 vote even still) > >> 3. There was concern about scalability of the code. > >> 4. There are thoughts that the code was not perfect yet (even though > it works > >> fine in the MEMEX project for Ana and I) > >> > >> So, Joern has opened up a revert PR. > >> > >> I suppose I should state I find this process extremely heavyweight and > unwelcoming. > >> To me, there should be a modicum of trust for committers, but I feel > like even as a > >> committer, I am operating as a “contributor” to the project. Committer > means that > >> there is trust to modify the source code base. Of the issues above, > the only one I see > >> as a moderate snafu was #1, and frankly if there are some instructions > that show me > >> how to do squashing and rebasing *first* I will try to do that in the > future since I am > >> not a GIt expert. > >> > >> That said, I must state I feel pretty put off by Apache OpenNLP. This > originated as a GSoC > >> effort, and we have worked pretty consistently on this over the last > year. We used a > >> separate GitHub project to get started, kept Joern involved as another > mentor, even > >> provided access and commit writes to that GitHub repository for a long > time, so this > >> code was developed in the open. Joern even created a branch in > ApacheOpenNLP in the code and I suppose > >> I should have gone and worked on that branch first since master is > apparently so > >> pristine that even an Apache veteran like me can’t get something in to > it without > >> making a whole bunch of (what are IMO minor issues, and what are IMO > heavyweight > >> “community” issues). > >> > >> I am concerned from a community point of view that the first comment > wasn’t “Great > >> job Chris, you got Sentiment Analysis into Apache, *but* I have these > concerns 1-4 above”. > >> It was “The PR was merged wrong in ways 1-4 and I’m going to revert > it.” > >> > >> That’s pretty off-putting to someone who is semi-new like me and like > Ana. > >> > >> Anyways, go ahead and revert it. Sorry to have caused any issues. > >> > >> Chris > >> > >> > >> > >> On 6/27/17, 7:06 AM, "Chris Mattmann" <[email protected]> wrote: > >> > >> Hi Joern, > >> > >> I’m confused. Why did you revert my commit? > >> > >> Every one of those check points you put on the PR was checked? > >> We have been discussing this for months, you have seen the > >> code for months, Ana and I have worked diligently on the code > >> in plain view of everyone. > >> > >> Please explain. > >> > >> Chris > >> > >> > >> > >> > >> On 6/27/17, 1:23 AM, "kottmann" <[email protected]> wrote: > >> > >> GitHub user kottmann opened a pull request: > >> > >> https://github.com/apache/opennlp/pull/238 > >> > >> Revert merging of sentiment work, no consent to merge it > >> > >> Thank you for contributing to Apache OpenNLP. > >> > >> In order to streamline the review of the contribution we > ask you > >> to ensure the following steps have been taken: > >> > >> ### For all changes: > >> - [ ] Is there a JIRA ticket associated with this PR? Is it > referenced > >> in the commit message? > >> > >> - [ ] Does your PR title start with OPENNLP-XXXX where XXXX > is the JIRA number you are trying to resolve? Pay particular attention to the > hyphen "-" character. > >> > >> - [ ] Has your PR been rebased against the latest commit > within the target branch (typically master)? > >> > >> - [ ] Is your initial contribution a single, squashed > commit? > >> > >> ### For code changes: > >> - [ ] Have you ensured that the full suite of tests is > executed via mvn clean install at the root opennlp folder? > >> - [ ] Have you written or updated unit tests to verify your > changes? > >> - [ ] If adding new dependencies to the code, are these > dependencies licensed in a way that is compatible for inclusion under [ASF > 2.0](http://www.apache.org/legal/resolved.html#category-a)? > >> - [ ] If applicable, have you updated the LICENSE file, > including the main LICENSE file in opennlp folder? > >> - [ ] If applicable, have you updated the NOTICE file, > including the main NOTICE file found in opennlp folder? > >> > >> ### For documentation related changes: > >> - [ ] Have you ensured that format looks appropriate for > the output in which it is rendered? > >> > >> ### Note: > >> Please ensure that once the PR is submitted, you check > travis-ci for build issues and submit an update to your PR as soon as > possible. > >> > >> > >> You can merge this pull request into a Git repository by > running: > >> > >> $ git pull https://github.com/kottmann/opennlp > revert_sentiment > >> > >> Alternatively you can review and apply these changes as the > patch at: > >> > >> https://github.com/apache/opennlp/pull/238.patch > >> > >> To close this pull request, make a commit to your master/trunk > branch > >> with (at least) the following in the commit message: > >> > >> This closes #238 > >> > >> ---- > >> commit 123222eb34724bae793e9d6d22e202c0aee0aa45 > >> Author: Jörn Kottmann <[email protected]> > >> Date: 2017-06-27T08:19:19Z > >> > >> Revert merging of sentiment work, no consent to merge it > >> > >> ---- > >> > >> > >> --- > >> If your project is set up for it, you can reply to this email > and have your > >> reply appear on GitHub as well. If your project does not have > this feature > >> enabled and wishes so, or if the feature is enabled but not > working, please > >> contact infrastructure at [email protected] or file a > JIRA ticket > >> with INFRA. > >> --- > >> > >> > >> > >> > >> > >> > > > > >
