[ https://issues.apache.org/jira/browse/MAHOUT-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169223#comment-13169223 ]
jirapos...@reviews.apache.org commented on MAHOUT-918:
------------------------------------------------------

bq.  On 2011-12-13 13:24:28, Ted Dunning wrote:
bq.  > trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/AdaptiveLogisticRegressionDriver.java, lines 36-41
bq.  > <https://reviews.apache.org/r/3072/diff/4/?file=64283#file64283line36>
bq.  >
bq.  > Direct and exact quotes from the paper should be either avoided or acknowledged. Better here to rephrase the language.

Rephrased the language at revision 5.

bq.  On 2011-12-13 13:24:28, Ted Dunning wrote:
bq.  > trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/AdaptiveLogisticRegressionDriver.java, lines 60-63
bq.  > <https://reviews.apache.org/r/3072/diff/4/?file=64283#file64283line60>
bq.  >
bq.  > Again, just quoting the paper is not a good idea. This isn't adding any information in any case since the exact same language was used in the class level java doc.
bq.  >
bq.  > It would be nice here to note that the average is an *unweighted* average.

Rephrased the language at revision 5.

bq.  On 2011-12-13 13:24:28, Ted Dunning wrote:
bq.  > trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/AdaptiveLogisticRegressionMapper.java, lines 87-88
bq.  > <https://reviews.apache.org/r/3072/diff/4/?file=64284#file64284line87>
bq.  >
bq.  > This looks like a bad key to use here.

This key should be the average of log-likelihood of the best OnlineLogisticRegression in AdaptiveLogisticRegression.

bq.  On 2011-12-13 13:24:28, Ted Dunning wrote:
bq.  > trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/AdaptiveLogisticRegressionMapper.java, line 40
bq.  > <https://reviews.apache.org/r/3072/diff/4/?file=64284#file64284line40>
bq.  >
bq.  > I don't think that this is correct. Is this really what the output is? Why are you dividing by a weight vector? How do you compute this score?
bq.  >
bq.  > Or do you mean to not divide here?
bq.  >
bq.  > If so, why do you use a score as the key?
My comment there was poorly worded. It means that the map output key is the score and the map output value is the new weight vector. I rewrote the comment in revision 5.

bq.  On 2011-12-13 13:24:28, Ted Dunning wrote:
bq.  > trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/AdaptiveLogisticRegressionReducer.java, lines 34-35
bq.  > <https://reviews.apache.org/r/3072/diff/4/?file=64285#file64285line34>
bq.  >
bq.  > I don't think that this is correct. In the google paper, the average was unweighted. In any case how do you compute this score for weighting?
bq.  >
bq.  > Also, if the key is the score, how does the reducer work since each reduce function will only see one score? Are you assuming that there is exactly one reducer?

The original paper (http://aclweb.org/anthology-new/N/N10/N10-1069.pdf) says it is a weighted average, but in my simple experiment the unweighted average was better than the weighted average, so I changed the code to use the unweighted average in revision 5.
The number of reducers should be set to one. I added a comment to that effect in revision 5. The number of reducers is set in the runIteration function of the Driver class.

- issei

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3072/#review3875
-----------------------------------------------------------

On 2011-12-14 08:59:29, issei yoshida wrote:
bq.
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3072/
bq.  -----------------------------------------------------------
bq.
bq.  (Updated 2011-12-14 08:59:29)
bq.
bq.
bq.  Review request for mahout.
bq.
bq.
bq.  Summary
bq.  -------
bq.
bq.  MAHOUT-918 Parallelized SGD in MapReduce
bq.
bq.
bq.  This addresses bug MAHOUT-918.
bq.      https://issues.apache.org/jira/browse/MAHOUT-918
bq.
bq.
bq.  Diffs
bq.  -----
bq.    trunk/core/src/main/java/org/apache/mahout/classifier/sgd/PassiveAggressive.java 1214116
bq.    trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/AdaptiveLogisticRegressionDriver.java PRE-CREATION
bq.    trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/AdaptiveLogisticRegressionMapper.java PRE-CREATION
bq.    trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/AdaptiveLogisticRegressionReducer.java PRE-CREATION
bq.    trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/LogisticRegressionDriver.java PRE-CREATION
bq.    trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/LogisticRegressionMapper.java PRE-CREATION
bq.    trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/LogisticRegressionReducer.java PRE-CREATION
bq.    trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/PassiveAggressiveDriver.java PRE-CREATION
bq.    trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/PassiveAggressiveMapper.java PRE-CREATION
bq.    trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/PassiveAggressiveReducer.java PRE-CREATION
bq.    trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/SGDDriver.java PRE-CREATION
bq.    trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/SGDMapper.java PRE-CREATION
bq.    trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/SGDReducer.java PRE-CREATION
bq.    trunk/core/src/test/java/org/apache/mahout/classifier/sgd/mapreduce/AdaptiveLogisticRegressionMapReduceTest.java PRE-CREATION
bq.    trunk/core/src/test/java/org/apache/mahout/classifier/sgd/mapreduce/LogisticRegressionMapReduceTest.java PRE-CREATION
bq.    trunk/core/src/test/java/org/apache/mahout/classifier/sgd/mapreduce/PassiveAggressiveMapReduceTest.java PRE-CREATION
bq.    trunk/core/src/test/java/org/apache/mahout/classifier/sgd/mapreduce/SGDMapReduceTest.java PRE-CREATION
bq.
bq.  Diff: https://reviews.apache.org/r/3072/diff
bq.
bq.
bq.  Testing
bq.  -------
bq.
bq.
bq.
bq.  Thanks,
bq.
bq.  issei
bq.
bq.

> Implement SGD based classifiers using MapReduce
> -----------------------------------------------
>
>                 Key: MAHOUT-918
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-918
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Classification
>    Affects Versions: 0.6
>            Reporter: issei yoshida
>         Attachments: MAHOUT-918.patch, design.pdf
>
>
> Implement SGD based classifiers (Logistic Regression, Adaptive Logistic Regression and Passive-Aggressive) using MapReduce.
> They are implemented using the Iterative Parameter Mixtures algorithm, which is described in the following papers.
> http://research.google.com/pubs/pub36948.html
> http://aclweb.org/anthology-new/N/N10/N10-1069.pdf
> http://books.nips.cc/papers/files/nips22/NIPS2009_0345.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
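
For readers following the review thread above, the mixing step discussed in the replies (a single reducer taking the unweighted average of the weight vectors emitted by the mappers) can be sketched roughly as follows. This is only an illustration and is not the code from the patch: the class name ParameterMixSketch and the method average are hypothetical, and plain double[] arrays stand in for whatever vector and Writable types the patch actually uses. The per-mapper score that the patch uses as the map output key is deliberately left out, since an unweighted average does not need it.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not taken from the MAHOUT-918 patch.
class ParameterMixSketch {

    // Element-wise unweighted mean of the weight vectors produced by the mappers.
    // Assumes every vector has the same length and that a single reducer sees all of them.
    static double[] average(List<double[]> weightVectors) {
        if (weightVectors.isEmpty()) {
            throw new IllegalArgumentException("need at least one weight vector");
        }
        int n = weightVectors.get(0).length;
        double[] mean = new double[n];
        for (double[] w : weightVectors) {
            if (w.length != n) {
                throw new IllegalArgumentException("weight vectors differ in length");
            }
            for (int i = 0; i < n; i++) {
                mean[i] += w[i];
            }
        }
        // Every mapper contributes equally, regardless of its score.
        for (int i = 0; i < n; i++) {
            mean[i] /= weightVectors.size();
        }
        return mean;
    }

    public static void main(String[] args) {
        List<double[]> fromMappers = Arrays.asList(
                new double[] {1.0, 2.0},
                new double[] {3.0, 6.0});
        System.out.println(Arrays.toString(average(fromMappers))); // prints [2.0, 4.0]
    }
}
```

With more than one reducer, each reducer would only see a subset of the mapper outputs, so the average would no longer be taken over all of them; that is presumably why the reply says the number of reducers should be set to one.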