[ https://issues.apache.org/jira/browse/MAHOUT-228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802032#action_12802032 ]
Olivier Grisel commented on MAHOUT-228:
---------------------------------------

For the record: I am adding more tests and debugging in the following branch (kept in sync with the trunk), hosted on GitHub:

http://github.com/ogrisel/mahout/commits/MAHOUT-228

Fixed so far:

- convergence issues (inconsistency in the index of the 'missing' beta row)
- make sure that L1 is sparsity-inducing by applying eager post-update regularization

Still TODO (independently of Ted's TODOs) - might be split into specific JIRA issues:

- test that a highly redundant dataset can lead to very sparse models with an L1 prior
- a Hadoop driver to do parallel extraction of vector features from documents using the Randomizer classes
- a Hadoop driver to do parallel cross-validation and confusion matrix evaluation (along with confidence intervals)
- a Hadoop driver to perform hyperparameter grid search (lambda, priorfunc, learning rate, ...)
- a sample Hadoop driver to categorize Wikipedia articles by country
- profile it a bit

> Need sequential logistic regression implementation using SGD techniques
> -----------------------------------------------------------------------
>
> Key: MAHOUT-228
> URL: https://issues.apache.org/jira/browse/MAHOUT-228
> Project: Mahout
> Issue Type: New Feature
> Components: Classification
> Reporter: Ted Dunning
> Fix For: 0.3
>
> Attachments: logP.csv, MAHOUT-228-3.patch, r.csv, sgd-derivation.pdf, sgd-derivation.tex, sgd.csv
>
>
> Stochastic gradient descent (SGD) is often fast enough for highly scalable learning (see Vowpal Wabbit, http://hunch.net/~vw/).
> I often need to have a logistic regression in Java as well, so that is a reasonable place to start.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.