[ 
https://issues.apache.org/jira/browse/MAHOUT-227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832089#action_12832089
 ] 

Ted Dunning commented on MAHOUT-227:
------------------------------------


Zhao,

My thought is that a good sequential SVM that learns very fast would be almost 
as scalable as a parallel implementation, especially if it sits right next to 
a good SGD logistic regression implementation.

My guess is that speeding up training with a randomized subset of the 
variables is likely to be the most effective strategy if we absolutely need a 
speedup.  It is also possible that just speeding up the parameter sweeps that 
are normal practice for any serious data mining would be about as useful as 
making learning fast for a single parameter setting.  That would require 
giving different mappers different parameter settings and having each of them 
read the entire data set.  Each mapper should probably run multiple settings 
at once so that the data is reused relatively efficiently.
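
To make the sweep idea concrete, here is a minimal single-process sketch, 
not a Mahout or Hadoop implementation.  It assumes a Pegasos-style SGD update 
for a linear SVM without a bias term, and the names assign_settings, 
sweep_one_pass, and the lambda grid are all hypothetical.  The point is only 
that each example is read once per pass and then reused for every parameter 
setting that a given mapper owns.

```python
import random


def assign_settings(lambdas, num_mappers):
    """Round-robin split of a (hypothetical) parameter grid across mappers."""
    return [lambdas[m::num_mappers] for m in range(num_mappers)]


def sweep_one_pass(data, lambdas, epochs=5, seed=0):
    """Train one linear SVM per lambda while sharing every read of the data.

    data    : list of (features, label) pairs, label in {-1, +1}
    lambdas : regularization strengths owned by this mapper (assumed grid)
    Returns a dict mapping each lambda to its weight vector.
    """
    rng = random.Random(seed)
    dim = len(data[0][0])
    weights = {lam: [0.0] * dim for lam in lambdas}
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x, y = data[i]  # the example is read once here ...
            t += 1
            for lam, w in weights.items():  # ... and reused for every setting
                eta = 1.0 / (lam * t)  # Pegasos-style step size (assumption)
                margin = y * sum(wj * xj for wj, xj in zip(w, x))
                for j in range(dim):
                    w[j] *= 1.0 - eta * lam  # shrink from the L2 penalty
                    if margin < 1.0:  # hinge loss is active for this example
                        w[j] += eta * y * x[j]
    return weights


def predict(w, x):
    """Sign of the linear score; ties go to +1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1
```

In a real job, each mapper would call something like sweep_one_pass with its 
own slice from assign_settings while streaming the full data set, and a 
reducer would then compare held-out error across settings.  How the data 
reaches each mapper is left out here on purpose.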
 


> Parallel SVM
> ------------
>
>                 Key: MAHOUT-227
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-227
>             Project: Mahout
>          Issue Type: Task
>          Components: Classification
>    Affects Versions: 0.2
>            Reporter: zhao zhendong
>             Fix For: 0.4
>
>         Attachments: ParallelPegasos.doc, ParallelPegasos.pdf
>
>
> I wrote a proposal for a parallel algorithm for SVM training. Any comments 
> are welcome.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
