[ https://issues.apache.org/jira/browse/MAHOUT-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303257#comment-14303257 ]

ASF GitHub Bot commented on MAHOUT-1626:
----------------------------------------

Github user gcapan commented on the pull request:

    https://github.com/apache/mahout/pull/62#issuecomment-72650471
  
    The status is that I need to revise the code based on the reviews.
    
    But I have some concerns, summarized below:
    
    Here is the story.
    
    I'm going to contribute my recent work on distributed implementations of
    stochastic optimization to an open-source library. For me, the only reason
    that accumulating blocks matters is that I need it for averaging-based
    distributed stochastic gradient descent (DSGD).
    
    I was an advocate of having Mahout as the ML and matrix computation core
    for distributed processing engines, and I thought the Matrix DSL would be
    sufficient for implementing algorithms like DSGD in an engine-agnostic way.
    
    It seems, however, that implementing most optimization algorithms and ML
    models requires operations beyond the DSL, and those operations are highly
    engine-specific.
    
    Reimplementing the aggregation operation in Mahout is duplicated work, just
    as MLlib duplicates some of Mahout's Matrix DSL capabilities in uglier ways.
    Also, having an algorithm in Mahout but not in MLlib (or vice versa) really
    bothers me, because the other library's users cannot benefit from it.
    
    Considering your recent codebase refactoring effort, @dlyubimov, I imagine
    the best way to use the DSL is inside MLlib (or whatever your favorite ML
    library is). That is, MLlib would depend on Mahout's Matrix DSL
    implementation: matrix I/O and computations would be handled in Mahout, and
    ML algorithms would be handled in MLlib and/or other libraries.
    
    Can we slow this down and think about what should be contributed where,
    and reconsider what the ideal Mahout-Spark integration looks like?
    



> Support for required quasi-algebraic operations and starting with aggregating rows/blocks
> -----------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1626
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1626
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Math
>    Affects Versions: 1.0
>            Reporter: Gokhan Capan
>             Fix For: 1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)