[ https://issues.apache.org/jira/browse/MAHOUT-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1273:
----------------------------------
    Resolution: Won't Fix
        Status: Resolved  (was: Patch Available)

No activity on this in a long while now. Marking this as 'Won't Fix'.

> Single Pass Algorithm for Penalized Linear Regression with Cross Validation on MapReduce
> ----------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1273
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1273
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.9
>            Reporter: Kun Yang
>              Labels: documentation, features, patch, test
>             Fix For: 1.0
>
>         Attachments: Algorithm and Numeric Stability.pdf, Examples.pdf, Manual and Example.pdf, Manual and Example.pdf, Notes.pdf, PenalizedLinear.pdf, PenalizedLinearRegression.patch, java files.pdf
>
>   Original Estimate: 720h
>  Remaining Estimate: 720h
>
> Penalized linear regression methods such as Lasso and Elastic-net are widely
> used in machine learning, but there are no very efficient, scalable
> implementations on MapReduce.
> The published distributed algorithms for solving this problem are either
> iterative (which is not good for MapReduce, see Stephen Boyd's paper) or
> approximate (what if we need exact solutions? See Parallelized Stochastic
> Gradient Descent). Another disadvantage of these algorithms is that they
> cannot do cross validation in the training phase, so the user has to specify
> the penalty parameter in advance.
> My ideas can train the model with cross validation in a single pass. They are
> based on some simple observations.
> The core algorithm is a modified version of coordinate descent (see J.
> Friedman's paper). Friedman and his co-authors implemented a very efficient
> R package, "glmnet", which is the de facto standard for penalized regression.
> I have implemented a primitive version of this algorithm at Alpine Data
> Labs.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
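
The issue description above does not spell out the "simple observations" behind the single-pass method, and the attached patch is not reproduced in this message. The following is only a minimal sketch of one plausible reading, not the reporter's implementation: a single scan over the data accumulates per-fold sufficient statistics (X^T X, X^T y, y^T y), and a Lasso coordinate descent then runs on those statistics alone, so every penalty value and every cross-validation fold can be evaluated without rereading the rows. All function names (fold_stats, merge, lasso_cd, cross_validate) are hypothetical, folds are assigned round robin, features and response are assumed to be already centered (no intercept), and only the Lasso penalty is covered, not Elastic-net.

```python
import random


def fold_stats(rows, n_folds):
    """Single pass over (x, y) rows: per-fold sums of x x^T, x y, y^2 and counts."""
    p = len(rows[0][0])
    stats = [{"n": 0, "yy": 0.0, "xy": [0.0] * p,
              "xx": [[0.0] * p for _ in range(p)]} for _ in range(n_folds)]
    for i, (x, y) in enumerate(rows):
        s = stats[i % n_folds]  # assumed fold assignment: round robin
        s["n"] += 1
        s["yy"] += y * y
        for j in range(p):
            s["xy"][j] += x[j] * y
            for k in range(p):
                s["xx"][j][k] += x[j] * x[k]
    return stats


def merge(parts):
    """Add up sufficient statistics (the role a reducer would play)."""
    p = len(parts[0]["xy"])
    out = {"n": 0, "yy": 0.0, "xy": [0.0] * p,
           "xx": [[0.0] * p for _ in range(p)]}
    for s in parts:
        out["n"] += s["n"]
        out["yy"] += s["yy"]
        for j in range(p):
            out["xy"][j] += s["xy"][j]
            for k in range(p):
                out["xx"][j][k] += s["xx"][j][k]
    return out


def soft_threshold(z, t):
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(xx, xy, n, lam, iters=1000, tol=1e-10):
    """Minimise (1/(2n))*||y - Xb||^2 + lam*||b||_1 using only X^T X and X^T y."""
    p = len(xy)
    b = [0.0] * p
    for _ in range(iters):
        max_delta = 0.0
        for j in range(p):
            if xx[j][j] == 0.0:
                continue  # constant-zero feature: leave its coefficient at 0
            # x_j^T (y - X b) with coordinate j's own contribution added back
            rho = xy[j] - sum(xx[j][k] * b[k] for k in range(p) if k != j)
            new = soft_threshold(rho / n, lam) / (xx[j][j] / n)
            max_delta = max(max_delta, abs(new - b[j]))
            b[j] = new
        if max_delta < tol:
            break
    return b


def sse(s, b):
    """Sum of squared errors of b on a fold, from its statistics alone."""
    p = len(b)
    quad = sum(b[j] * s["xx"][j][k] * b[k] for j in range(p) for k in range(p))
    return s["yy"] - 2.0 * sum(b[j] * s["xy"][j] for j in range(p)) + quad


def cross_validate(stats, lambdas):
    """Return (best_lambda, {lambda: held-out mean squared error})."""
    total_n = sum(s["n"] for s in stats)
    scores = {}
    for lam in lambdas:
        err = 0.0
        for f, held_out in enumerate(stats):
            train = merge([s for g, s in enumerate(stats) if g != f])
            b = lasso_cd(train["xx"], train["xy"], train["n"], lam)
            err += sse(held_out, b)
        scores[lam] = err / total_n
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    random.seed(1)
    data = []
    for _ in range(200):
        x = [random.gauss(0, 1) for _ in range(4)]
        data.append((x, 3.0 * x[0] - 2.0 * x[3] + random.gauss(0, 0.5)))
    best, scores = cross_validate(fold_stats(data, 5), [0.0, 0.01, 0.1, 1.0])
    print("best lambda:", best)
```

In a MapReduce setting, fold_stats would correspond to the map phase and merge to the reduce phase; the coordinate descent and cross validation then run on p-by-p matrices whose size is independent of the number of rows. This only pays off when the number of features p is small enough for the p-by-p matrices to fit in memory.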