[ https://issues.apache.org/jira/browse/MAHOUT-227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832089#action_12832089 ]
Ted Dunning commented on MAHOUT-227:
------------------------------------

Zhao,

My thought is that a good sequential SVM that learns very fast would be almost as scalable as a parallel implementation, especially if it sits right next to a good SGD logistic regression implementation.

My guess is that speedup by randomized variable subsets is likely to be the most effective strategy if we absolutely need a speedup.

It is also possible that just speeding up the parameter sweeps that are normal practice for any serious data mining would be about as useful as making learning fast for a single parameter setting. That would require giving different mappers different parameter settings and having each of them read the entire data set. Each mapper should probably run multiple settings at once so that the data is reused relatively efficiently.

> Parallel SVM
> ------------
>
>                 Key: MAHOUT-227
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-227
>             Project: Mahout
>          Issue Type: Task
>          Components: Classification
>    Affects Versions: 0.2
>            Reporter: zhao zhendong
>             Fix For: 0.4
>
>         Attachments: ParallelPegasos.doc, ParallelPegasos.pdf
>
>
> I wrote a proposal for a parallel algorithm for SVM training. Any comments are welcome.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
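
The parameter-sweep idea in the comment above (each mapper handles several parameter settings and reads the whole data set once for all of them) could look roughly like the sketch below. It is not Mahout code. The learner is assumed to be a Pegasos-style SGD linear SVM, chosen only because the attachments are named ParallelPegasos, and every name in it (`pegasos_step`, `run_mapper`, `sweep`, `make_toy_data`, the lambda grid) is hypothetical. The "mappers" are simulated by a plain loop rather than a Hadoop job.

```python
import random


def pegasos_step(w, x, y, lam, t):
    """One Pegasos-style SGD step for a linear SVM (hinge loss, L2 penalty lam)."""
    eta = 1.0 / (lam * t)                       # step size 1 / (lambda * t)
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    scale = 1.0 - eta * lam                     # shrinkage from the L2 penalty
    w = [scale * wi for wi in w]
    if margin < 1.0:                            # example violates the margin
        w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def run_mapper(data, lambdas, epochs=5, seed=0):
    """Hypothetical mapper: one model per lambda, all updated from the same
    pass over the data, so each example is read once per epoch."""
    rng = random.Random(seed)
    dim = len(data[0][0])
    models = {lam: [0.0] * dim for lam in lambdas}
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            t += 1
            for lam in lambdas:                 # reuse the example for every setting
                models[lam] = pegasos_step(models[lam], x, y, lam, t)
    return models


def accuracy(w, data):
    """Fraction of examples whose predicted sign matches the label."""
    hits = sum(1 for x, y in data
               if y * sum(wi * xi for wi, xi in zip(w, x)) > 0)
    return hits / len(data)


def sweep(data, lambda_grid, settings_per_mapper=2):
    """Split the grid into chunks; in a real job each chunk would be the
    configuration of a separate map task reading the full data set."""
    results = {}
    for i in range(0, len(lambda_grid), settings_per_mapper):
        chunk = lambda_grid[i:i + settings_per_mapper]
        for lam, w in run_mapper(data, chunk).items():
            # Training accuracy, for illustration only; a real sweep
            # would score each setting on held-out data.
            results[lam] = accuracy(w, data)
    return results


def make_toy_data(n=40, seed=1):
    """Toy 2-D data whose label is the sign of the first feature."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.choice([-1, 1])
        data.append(([y * rng.uniform(1.0, 2.0), rng.uniform(-1.0, 1.0)], y))
    return data


if __name__ == "__main__":
    grid = [0.001, 0.01, 0.1, 1.0]
    for lam, acc in sorted(sweep(make_toy_data(), grid).items()):
        print(lam, acc)
```

The inner loop over `lambdas` is where the data reuse happens: the cost of reading and parsing an example is paid once, and only the cheap weight update is repeated per setting. How many settings to pack into one mapper is a tuning choice that the comment leaves open.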