[
https://issues.apache.org/jira/browse/MAHOUT-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303257#comment-14303257
]
ASF GitHub Bot commented on MAHOUT-1626:
----------------------------------------
GitHub user gcapan commented on the pull request:
https://github.com/apache/mahout/pull/62#issuecomment-72650471
The status is that I need to revise the code based on the reviews,
but I have some concerns, summarized below.
Here is the story.
I'm going to contribute my recent work on a distributed implementation of
stochastic optimization to some open source library, and for me, the only
reason that accumulating blocks matters is that I require it for
averaging-based distributed stochastic gradient descent (DSGD).
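For readers unfamiliar with the term, here is a minimal sketch of what
averaging-based DSGD can look like, and of why a block-aggregation step is
needed. It is plain single-process Python, not the Mahout DSL and not the code
in the pull request. The function names (local_sgd, average, averaged_dsgd,
make_blocks), the least-squares objective, and the step size are all
assumptions made for illustration.

```python
import random


def local_sgd(block, w, lr=0.05, epochs=50, seed=0):
    """Plain SGD for least squares on one block of (x, y) rows.

    Stands in for the work a single worker would do on its own block.
    """
    rng = random.Random(seed)
    w = list(w)
    rows = list(block)
    for _ in range(epochs):
        rng.shuffle(rows)
        for x, y in rows:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def average(weights):
    """Aggregation step: element-wise mean of the per-block weight vectors."""
    n = len(weights)
    return [sum(col) / n for col in zip(*weights)]


def averaged_dsgd(blocks, dim, rounds=5):
    """Each round: run SGD independently per block, then average the results."""
    w = [0.0] * dim
    for r in range(rounds):
        local = [
            local_sgd(b, w, seed=r * len(blocks) + i)
            for i, b in enumerate(blocks)
        ]
        w = average(local)  # the only step that needs cross-block aggregation
    return w


def make_blocks(true_w, n_blocks=4, rows_per_block=40, seed=42):
    """Hypothetical noise-free synthetic data, split into row blocks."""
    rng = random.Random(seed)
    blocks = []
    for _ in range(n_blocks):
        block = []
        for _ in range(rows_per_block):
            x = [rng.uniform(-1.0, 1.0) for _ in true_w]
            y = sum(wi * xi for wi, xi in zip(true_w, x))
            block.append((x, y))
        blocks.append(block)
    return blocks


if __name__ == "__main__":
    blocks = make_blocks([2.0, -3.0])
    print(averaged_dsgd(blocks, dim=2))
```

In a real distributed setting the list comprehension in averaged_dsgd would
be a per-block map on the engine, and average would be the engine's
aggregate or reduce over blocks, which is the kind of operation this issue
asks for.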
I was an advocate of having Mahout as the ML and Matrix Computations core
for distributed processing engines, and was thinking that the Matrix DSL would
be sufficient for implementing such algorithms (such as DSGD) in an
engine-agnostic way.
It seems that implementing most optimization algorithms and ML models
requires operations beyond what the DSL offers, and those operations are
highly engine-specific.
Reimplementing the aggregation operation in Mahout is duplicate work, just
like MLlib duplicating some of Mahout's Matrix DSL capabilities in uglier
ways. Plus, having an algorithm in Mahout but not in MLlib (or vice versa)
really bothers me, because the other library's users cannot benefit from it.
Considering your recent codebase refactoring effort, @dlyubimov, I imagine
the best way to use the DSL is by utilizing it inside MLlib (or whatever your
favorite ML library is). That is, MLlib would depend on Mahout's Matrix DSL
implementation: matrix I/O and computations would be handled in Mahout, and
ML algorithms would be handled in MLlib and/or other libraries.
Can we just slow this down, think about what should be contributed where,
and reconsider the ideal Mahout-Spark integration?
> Support for required quasi-algebraic operations and starting with aggregating
> rows/blocks
> -----------------------------------------------------------------------------------------
>
> Key: MAHOUT-1626
> URL: https://issues.apache.org/jira/browse/MAHOUT-1626
> Project: Mahout
> Issue Type: New Feature
> Components: Math
> Affects Versions: 1.0
> Reporter: Gokhan Capan
> Fix For: 1.0
>
>