[ https://issues.apache.org/jira/browse/MAHOUT-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303257#comment-14303257 ]
ASF GitHub Bot commented on MAHOUT-1626:
----------------------------------------

Github user gcapan commented on the pull request:

    https://github.com/apache/mahout/pull/62#issuecomment-72650471

    The status is that I need to revise the code based on the reviews. But I have some concerns, summarized below.

    Here is the story. I'm going to contribute my recent work on a distributed implementation of stochastic optimization to some open source library, and for me, the only reason that accumulating blocks matters is that I require it for averaging-based distributed stochastic gradient descent (DSGD).

    I was an advocate of having Mahout as the ML and Matrix Computations core for distributed processing engines, and was thinking that the Matrix DSL would be sufficient for implementing such algorithms (such as DSGD) in an engine-agnostic way.

    It seems that implementing most optimization algorithms and ML models requires operations beyond the DSL, and those operations are highly engine-specific.

    Repeating the aggregating operation in Mahout is duplicate work, just like MLlib's having some of Mahout's Matrix DSL capabilities duplicated in uglier ways. Plus, having an algorithm in Mahout but not in MLlib (or vice versa) really bothers me, because the other library's users cannot benefit from it.

    Considering your recent codebase refactoring effort, @dlyubimov, I imagine the best way to use the DSL is by utilizing it inside MLlib (or whatever your favorite ML library is). That is, MLlib depends on the Mahout Matrix-DSL implementation; Matrix I/O and computations are handled in Mahout; ML algorithms are handled in MLlib and/or other libraries.

    Can we just slow this down, think about what should be contributed where, and reconsider the ideal Mahout-Spark integration?

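To make concrete why an aggregate over rows/blocks is needed for averaging-based DSGD, here is a minimal single-process sketch in plain Python. It is not Mahout or MLlib code and uses neither library's API: the function names (`local_sgd`, `aggregate_blocks`, `averaged_sgd`), the least-squares objective, and the learning-rate and epoch defaults are all assumptions made for illustration. Each block runs SGD locally from the same starting weights, and the per-block results are then combined by an element-wise average, which is the aggregating step the comment refers to.

```python
import random


def local_sgd(block, w0, lr=0.05, epochs=50):
    """Run plain SGD for least squares on one block of (x, y) rows.

    Hypothetical helper: stands in for the work done on a single partition.
    """
    w = list(w0)
    for _ in range(epochs):
        for x, y in block:
            # Prediction error for this row under the current weights.
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            # Gradient step for the squared-error loss on this row.
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def aggregate_blocks(block_results):
    """Element-wise average of the per-block weight vectors.

    This is the 'accumulate over blocks' operation the algorithm depends on.
    """
    n = len(block_results)
    return [sum(column) / n for column in zip(*block_results)]


def averaged_sgd(blocks, dim, rounds=5):
    """Alternate local SGD on every block with averaging of the results."""
    w = [0.0] * dim
    for _ in range(rounds):
        w = aggregate_blocks([local_sgd(block, w) for block in blocks])
    return w
```

In a distributed engine, the list comprehension inside `averaged_sgd` would correspond to a per-partition map and `aggregate_blocks` to the engine-specific reduce, which is the part that falls outside a purely algebraic DSL.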
> Support for required quasi-algebraic operations and starting with aggregating rows/blocks
> -----------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1626
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1626
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Math
>    Affects Versions: 1.0
>            Reporter: Gokhan Capan
>             Fix For: 1.0
>
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)