[ https://issues.apache.org/jira/browse/MAHOUT-227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793088#action_12793088 ]
Ted Dunning commented on MAHOUT-227:
------------------------------------

Here are a few formatting suggestions:

a) When cutting and pasting from somebody else's work, it is good to point this out. You should directly credit figure 3 and the algorithm pseudo-code, which are cut-and-pasted directly from the original paper.

b) The text in your diagram got resized and is now only partially readable. This makes it a bit harder to follow exactly what you intend.

More importantly, the parameter k in the original paper is a batch size. You propose to parallelize the computation of each batch, but otherwise leave the main structure of the computation in place. If we assume a small cluster with, say, 100 cores (12 machines or so), then if you set k to 1000, each core will get to do about a dozen vector operations. This is likely to be no more than a microsecond of computation per core per iteration. My guess is that this will result in very, very poor CPU utilization, since you will require one map-reduce invocation per iteration. Concretely put, you will have about a millisecond of useful computation every 10 seconds or so.

Your approach would probably work much better if applied to a single multi-core machine, where the very high rendezvous rate would be more achievable. I don't expect that this proposed approach will work with map-reduce. On the other hand, Pegasos is a pretty scalable algorithm even on a single machine. If you were able to produce a high-quality sequential implementation, that would be a substantial contribution to Mahout.

> Parallel SVM
> ------------
>
>                 Key: MAHOUT-227
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-227
>             Project: Mahout
>          Issue Type: Task
>          Components: Classification
>            Reporter: zhao zhendong
>         Attachments: ParallelPegasos.doc, ParallelPegasos.pdf
>
>
> I wrote a proposal of a parallel algorithm for SVM training. Any comment is welcome.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
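For reference, the sequential Pegasos training loop the comment recommends can be sketched roughly as follows. This is a minimal illustrative Python sketch of the stochastic sub-gradient update from the Pegasos paper (step size 1/(lambda*t), hinge-loss check, optional projection onto the ball of radius 1/sqrt(lambda)); it is not Mahout code, and the function name and parameters are placeholders chosen for this example.

```python
# Minimal sequential Pegasos sketch (illustrative only, not Mahout code).
import math
import random

def pegasos_train(examples, lam=0.01, iterations=10000, seed=42):
    """Train a linear SVM with the Pegasos stochastic sub-gradient method.

    examples: list of (x, y) pairs, x a list of floats, y in {-1, +1}.
    lam: regularization parameter lambda.
    Returns the learned weight vector w.
    """
    rng = random.Random(seed)
    dim = len(examples[0][0])
    w = [0.0] * dim
    for t in range(1, iterations + 1):
        x, y = rng.choice(examples)           # pick one random example
        eta = 1.0 / (lam * t)                 # step size eta_t = 1/(lambda * t)
        margin = y * sum(wj * xj for wj, xj in zip(w, x))
        # Always shrink w by (1 - eta*lambda); add eta*y*x only on a
        # margin violation (hinge loss is active when y <w, x> < 1).
        scale = 1.0 - eta * lam
        if margin < 1.0:
            w = [scale * wj + eta * y * xj for wj, xj in zip(w, x)]
        else:
            w = [scale * wj for wj in w]
        # Optional projection step onto the ball of radius 1/sqrt(lambda).
        norm = math.sqrt(sum(wj * wj for wj in w))
        radius = 1.0 / math.sqrt(lam)
        if norm > radius:
            w = [wj * (radius / norm) for wj in w]
    return w
```

Note that each iteration touches a single example, which is why the algorithm scales well sequentially: there is no per-iteration rendezvous cost of the kind the map-reduce formulation above would incur.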