[ https://issues.apache.org/jira/browse/MAHOUT-227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793487#action_12793487 ]
Ted Dunning commented on MAHOUT-227:
------------------------------------

{quote}
I understand this concern. Actually, if we set the parameter k to 1,000,000 or higher, do you think it is reasonable to take advantage of the Map-reduce framework? I mean, from the system implementation's view.
{quote}

If you increase k to very large values, you will be able to get a bit more computation done, but if you follow my small cluster example, I think that increasing k from 1000 to 1,000,000 will likely increase efficiency from 0.1% to less than 50% and will drive the algorithm well beyond the region where kT is constant. You will still have quite a lot of I/O per cycle, which may prevent you from achieving even 10% efficiency. For larger clusters, the problem will be much worse.

Go ahead and try it, though. Your real results count for more than my estimates. And as I said before, getting a good sequential implementation is of real value as well.

> Parallel SVM
> ------------
>
> Key: MAHOUT-227
> URL: https://issues.apache.org/jira/browse/MAHOUT-227
> Project: Mahout
> Issue Type: Task
> Components: Classification
> Reporter: zhao zhendong
> Attachments: ParallelPegasos.doc, ParallelPegasos.pdf
>
>
> I wrote a proposal for a parallel algorithm for SVM training. Any comments are
> welcome.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
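The efficiency argument in the comment can be illustrated with a back-of-envelope model. This is a minimal sketch, not the model behind the "small cluster example" (which is not reproduced here): it assumes, hypothetically, that each MapReduce cycle pays a fixed overhead that does not shrink as k grows, plus a per-example compute cost and an optional per-example I/O cost. The numbers are made up and calibrated only so that k = 1000 gives the 0.1% efficiency quoted above.

```python
def efficiency(k, compute_per_example, io_per_example, overhead_per_cycle):
    """Fraction of one cycle spent on useful per-example computation.

    Hypothetical cost model: a cycle costs k * (compute + io) plus a fixed
    overhead (job startup, scheduling) that is independent of k.
    """
    useful = k * compute_per_example
    total = k * (compute_per_example + io_per_example) + overhead_per_cycle
    return useful / total


# Hypothetical calibration: with no per-example I/O, k = 1000 should give 0.1%,
# i.e. 1000 / (1000 + overhead) = 0.001, so overhead = 999 * 1000.
COMPUTE = 1.0
OVERHEAD = 999 * 1000 * COMPUTE

for io in (0.0, 9.0):  # io = 9.0 is an arbitrary illustrative I/O-heavy case
    for k in (1_000, 1_000_000):
        e = efficiency(k, COMPUTE, io, OVERHEAD)
        print(f"io={io:>4}  k={k:>9,}  efficiency={e:.2%}")
```

Under these assumptions, raising k by a factor of 1000 only lifts efficiency to roughly 50% when I/O is free, and with heavy per-example I/O it stays below 10%, which matches the shape of the estimates in the comment without claiming to reproduce them.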