[ https://issues.apache.org/jira/browse/MAHOUT-227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793088#action_12793088 ]

Ted Dunning commented on MAHOUT-227:
------------------------------------

Here are a few formatting suggestions:

a) When cutting and pasting from somebody else's work, it is good to point this
out.  You should explicitly credit Figure 3 and the algorithm pseudo-code, both
of which are copied verbatim from the original paper.

b) The text in your diagram got resized and is now only partially readable.
This makes it harder to follow exactly what you intend.


More importantly, the parameter k in the original paper is a batch size.  You
propose to parallelize the computation of each batch but otherwise leave the
main structure of the computation in place.  If we assume a small cluster with,
say, 100 cores (12 machines or so) and set k to 1000, each core will get to do
only about a dozen vector operations.  That is likely to be no more than a
microsecond of computation per core per iteration.  My guess is that this will
result in very, very poor CPU utilization, since you will require one
map-reduce invocation per iteration.  Concretely, you will get about a
millisecond of useful computation for every 10 seconds or so of job overhead.
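
To make the arithmetic above concrete, here is a small back-of-envelope sketch.
The 1 ms of useful work and 10 s of per-job overhead are the rough figures from
this comment, not measurements, and the function names are hypothetical.

```python
# Back-of-envelope estimate of CPU utilization when each Pegasos mini-batch
# is run as one map-reduce job. All figures are illustrative assumptions
# taken from the comment, not measurements.

def per_core_ops(batch_size_k: int, cores: int) -> float:
    """Vector operations (one per example) each core performs per iteration."""
    return batch_size_k / cores

def utilization(useful_seconds: float, job_overhead_seconds: float) -> float:
    """Fraction of wall-clock time per iteration spent on useful work."""
    return useful_seconds / (useful_seconds + job_overhead_seconds)

k = 1000       # mini-batch size (the paper's k)
cores = 100    # roughly 12 machines
print(per_core_ops(k, cores))   # 10.0 vector operations per core

# About a millisecond of useful work versus ~10 s of map-reduce job overhead.
print(f"{utilization(1e-3, 10.0):.4%}")
```

Spreading k = 1000 over 100 cores gives roughly ten vector operations per core,
and the useful fraction of each iteration comes out at around 0.01%.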

Your approach would probably work much better on a single multi-core machine,
where the very high rendezvous rate would be more achievable.  I don't expect
the proposed approach to work with map-reduce.
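
On a single multi-core machine, the per-batch rendezvous can be a cheap
in-memory join instead of a map-reduce job.  The sketch below assumes dense
Python lists for vectors and uses hypothetical function names; it splits one
mini-batch's sum over margin violators across worker threads and adds up the
partial results.  With CPython's global interpreter lock, pure-Python threads
will not actually run this loop in parallel, so treat it as an illustration of
the structure rather than of the speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_subgradient(w, batch):
    """Sum of y*x over examples in `batch` that violate the margin (y<w,x> < 1)."""
    acc = [0.0] * len(w)
    for x, y in batch:
        if y * sum(wi * xi for wi, xi in zip(w, x)) < 1.0:
            for i, xi in enumerate(x):
                acc[i] += y * xi
    return acc

def batch_subgradient(w, batch, workers=4):
    """Split one mini-batch across worker threads and add their partial sums.

    The rendezvous at the end of each batch is an in-memory join, which is
    far cheaper than launching a map-reduce job per iteration.
    """
    # Round-robin split; some chunks may be empty if the batch is small.
    chunks = [batch[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda c: partial_subgradient(w, c), chunks))
    # Element-wise sum of the per-worker partial vectors.
    return [sum(vals) for vals in zip(*parts)]
```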

On the other hand, Pegasos is a pretty scalable algorithm even on a single 
machine.  If you were able to produce a high quality sequential implementation, 
that would be a substantial contribution to Mahout.
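
As a possible starting point for a sequential implementation, here is a minimal
sketch of mini-batch Pegasos for a linear SVM without a bias term.  It assumes
dense list vectors, labels in {-1, +1}, the usual step size 1/(lambda * t), and
the optional projection onto the ball of radius 1/sqrt(lambda); the function
and parameter names are hypothetical, and a high quality implementation would
also need sparse vectors and a proper convergence check.

```python
import math
import random

def pegasos(data, lam, iterations, k, seed=0):
    """Mini-batch Pegasos for a linear SVM without a bias term (sketch).

    data: list of (x, y) pairs, x a list of floats, y in {-1, +1}.
    lam: regularization parameter lambda; k: mini-batch size (k >= 1).
    """
    rng = random.Random(seed)
    dim = len(data[0][0])
    w = [0.0] * dim
    for t in range(1, iterations + 1):
        batch = rng.sample(data, min(k, len(data)))
        eta = 1.0 / (lam * t)
        # Sum y*x over the examples in the batch that violate the margin.
        grad = [0.0] * dim
        for x, y in batch:
            if y * sum(wi * xi for wi, xi in zip(w, x)) < 1.0:
                for i, xi in enumerate(x):
                    grad[i] += y * xi
        # Sub-gradient step: shrink w, then move toward the violators.
        w = [(1.0 - eta * lam) * wi + (eta / len(batch)) * gi
             for wi, gi in zip(w, grad)]
        # Optional projection onto the ball of radius 1/sqrt(lambda).
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > 0:
            scale = min(1.0, (1.0 / math.sqrt(lam)) / norm)
            w = [scale * wi for wi in w]
    return w
```

Predictions are then sign(<w, x>); each iteration touches only k examples,
which is why the algorithm scales well even on one machine.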


> Parallel SVM
> ------------
>
>                 Key: MAHOUT-227
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-227
>             Project: Mahout
>          Issue Type: Task
>          Components: Classification
>            Reporter: zhao zhendong
>         Attachments: ParallelPegasos.doc, ParallelPegasos.pdf
>
>
> I wrote a proposal for a parallel algorithm for SVM training. Any comments are
> welcome.

-- 
This message is automatically generated by JIRA.