[ https://issues.apache.org/jira/browse/MAHOUT-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13714803#comment-13714803 ]
Ted Dunning commented on MAHOUT-1273:
-------------------------------------

Should the document be updated to describe what you intend to do?

> Single Pass Algorithm for Penalized Linear Regression on MapReduce
> ------------------------------------------------------------------
>
>                 Key: MAHOUT-1273
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1273
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Kun Yang
>         Attachments: PenalizedLinear.pdf
>
>   Original Estimate: 720h
>  Remaining Estimate: 720h
>
> Penalized linear regression methods such as the Lasso and the elastic net are
> widely used in machine learning, but there are no efficient, scalable
> implementations on MapReduce.
> The published distributed algorithms for this problem are either iterative
> (which suits MapReduce poorly; see Stephen Boyd's paper) or approximate (a
> problem when exact solutions are needed; see Parallelized Stochastic Gradient
> Descent). Another disadvantage of these algorithms is that they cannot perform
> cross-validation during training, so the user must specify the penalty
> parameter in advance.
> My approach trains the model, including cross-validation, in a single pass.
> It is based on a few simple observations.
> I have implemented a primitive version of this algorithm at Alpine Data Labs.
> Advanced features such as an in-mapper combiner are used to reduce network
> traffic in the shuffle phase.
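The attached PenalizedLinear.pdf is not reproduced here, so the "simple observations" are not spelled out in this thread. One well-known observation that makes an exact single pass possible is that a least-squares fit, penalized or not, depends on the data only through additive sufficient statistics (n, sum of x, sum of y, XᵀX, Xᵀy, yᵀy). The Python sketch below assumes that idea; it is an illustration, not necessarily the author's method. Each mapper keeps one set of statistics per cross-validation fold (in-mapper combining), the reducer adds them, and every penalty value can then be fit and scored on every held-out fold without rereading the data. The names `Stats`, `map_split`, `reduce_folds`, `cross_validate`, the `row_id % n_folds` fold assignment, and the glmnet-style coordinate-descent solver are hypothetical choices; features are not standardized.

```python
class Stats:
    """Additive sufficient statistics for least squares on p features."""

    def __init__(self, p):
        self.p, self.n, self.sum_y, self.yty = p, 0, 0.0, 0.0
        self.sum_x = [0.0] * p
        self.xty = [0.0] * p
        self.xtx = [[0.0] * p for _ in range(p)]

    def add_row(self, x, y):
        self.n += 1
        self.sum_y += y
        self.yty += y * y
        for j in range(self.p):
            self.sum_x[j] += x[j]
            self.xty[j] += x[j] * y
            for k in range(self.p):
                self.xtx[j][k] += x[j] * x[k]

    def merged(self, other, sign=1):
        """Return self + other (sign=1) or self - other (sign=-1) as a new Stats."""
        out = Stats(self.p)
        out.n = self.n + sign * other.n
        out.sum_y = self.sum_y + sign * other.sum_y
        out.yty = self.yty + sign * other.yty
        out.sum_x = [a + sign * b for a, b in zip(self.sum_x, other.sum_x)]
        out.xty = [a + sign * b for a, b in zip(self.xty, other.xty)]
        out.xtx = [[a + sign * b for a, b in zip(r, s)]
                   for r, s in zip(self.xtx, other.xtx)]
        return out


def map_split(rows, p, n_folds):
    """Hypothetical mapper: in-mapper combining, one Stats per fold per split."""
    partial = {}
    for row_id, x, y in rows:
        fold = row_id % n_folds  # hypothetical fold assignment
        partial.setdefault(fold, Stats(p)).add_row(x, y)
    return list(partial.items())  # (fold, Stats) pairs sent to the shuffle


def reduce_folds(pairs, p):
    """Hypothetical reducer: add the partial statistics for each fold."""
    totals = {}
    for fold, s in pairs:
        totals[fold] = totals.get(fold, Stats(p)).merged(s)
    return totals


def soft(z, t):
    return z - t if z > t else z + t if z < -t else 0.0


def elastic_net(s, lam, alpha=1.0, sweeps=200):
    """Coordinate descent on the centred Gram matrix (glmnet-style objective)."""
    p, n = s.p, s.n
    mx = [v / n for v in s.sum_x]
    my = s.sum_y / n
    cxx = [[s.xtx[j][k] / n - mx[j] * mx[k] for k in range(p)] for j in range(p)]
    cxy = [s.xty[j] / n - mx[j] * my for j in range(p)]
    b = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            r = cxy[j] - sum(cxx[j][k] * b[k] for k in range(p) if k != j)
            b[j] = soft(r, lam * alpha) / (cxx[j][j] + lam * (1 - alpha))
    b0 = my - sum(m * bj for m, bj in zip(mx, b))
    return b0, b


def sse(s, b0, b):
    """Held-out squared error computed from the fold's statistics alone."""
    p = s.p
    quad = sum(b[j] * s.xtx[j][k] * b[k] for j in range(p) for k in range(p))
    return (s.yty - 2 * b0 * s.sum_y
            - 2 * sum(bj * v for bj, v in zip(b, s.xty))
            + s.n * b0 * b0
            + 2 * b0 * sum(bj * v for bj, v in zip(b, s.sum_x)) + quad)


def cross_validate(folds, lams, alpha=1.0):
    """Train on total minus fold, score on fold, for every penalty value."""
    total = None
    for s in folds.values():
        total = s if total is None else total.merged(s)
    scores = {}
    for lam in lams:
        err = 0.0
        for s in folds.values():
            b0, b = elastic_net(total.merged(s, sign=-1), lam, alpha)
            err += sse(s, b0, b)
        scores[lam] = err / total.n
    return min(scores, key=scores.get), scores
```

Because the statistics are O(p²) per fold, this design trades a single pass over the rows for memory and shuffle volume that grow quadratically with the number of features, which suits tall, narrow data.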