[ 
https://issues.apache.org/jira/browse/MAHOUT-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520216#comment-14520216
 ] 

ASF GitHub Bot commented on MAHOUT-1626:
----------------------------------------

Github user gcapan commented on the pull request:

    https://github.com/apache/mahout/pull/62#issuecomment-97582966
  
    Under some conditions, which are satisfied by linear and logistic 
regression, the solution of a statistical optimization problem (parameter 
estimation) over i.i.d. data, when computed in a distributed fashion as either:
    
    1. the average of the local estimates, or
    2. a combination of the average of the local estimates and the average of 
the estimates on subsamples of the local sample sets,
    
    converges in mean to the optimal risk minimizer, as described in [1]. 
Given that, these methods are not only a way to distribute machine learning; 
they also provide a _justification for machine learning on Big Data_ (that is, 
these algorithms converge to the true risk minimizer as if the whole data set 
were processed on a single computer).
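
    As a rough illustration of the two schemes listed above, here is a minimal 
Python sketch. All names (`fit_slope`, `average_estimate`, 
`subsample_average_estimate`) are hypothetical and not part of Mahout. The 
combination rule `(avg1 - r * avg2) / (1 - r)` is my reading of the subsample 
averaging estimator in [1] and should be checked against the paper; the toy 
estimator is a one-parameter least-squares fit chosen only to keep the example 
short.

```python
import math
import random


def fit_slope(points):
    """Toy in-core estimator: least-squares slope of y = w * x (no intercept)."""
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return sxy / sxx


def average_estimate(partitions, fit):
    """Scheme 1: fit each partition locally, then average the local estimates."""
    local = [fit(p) for p in partitions]
    return sum(local) / len(local)


def subsample_average_estimate(partitions, fit, r, rng):
    """Scheme 2: combine the average of the local estimates with the average
    of estimates fitted on a subsample (fraction r) of each partition.

    Assumption: the combination (avg1 - r * avg2) / (1 - r) follows [1].
    """
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie strictly between 0 and 1")
    avg1 = average_estimate(partitions, fit)
    # Subsample each partition without replacement, then fit on the subsample.
    sub = [fit(rng.sample(p, math.ceil(r * len(p)))) for p in partitions]
    avg2 = sum(sub) / len(sub)
    return (avg1 - r * avg2) / (1.0 - r)
```

    Here `partitions` is a list of lists of `(x, y)` pairs, standing in for the 
local sample sets held by each worker, and `rng` is a `random.Random` instance.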
    
    With this motivation, I propose adding two distribution schemes for 
machine learning: averaging and bootstrap-averaging. These would be abstracted 
away from the actual loss minimization algorithms, and the backend engines 
would only need to provide these two simple functions. Users can plug in their 
favourite (in-core) optimization algorithm, and of course we would want to 
provide some of them out of the box.
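
    To make the proposed separation concrete, the following Python sketch 
shows one way it could look: an engine-side interface exposing only the two 
schemes, and a user-supplied in-core optimizer passed in as a plain function. 
Everything here is an assumption for illustration (`Backend`, 
`InMemoryBackend`, `logistic_gd` and their signatures are hypothetical, not the 
API of the upcoming patch), and the combination rule is again my reading of [1].

```python
import math
import random
from abc import ABC, abstractmethod
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, int]                  # (feature, label in {0, 1})
Fit = Callable[[Sequence[Sample]], float]   # in-core optimizer: data -> parameter


class Backend(ABC):
    """Hypothetical engine interface: only the two distribution schemes."""

    @abstractmethod
    def average(self, fit: Fit) -> float: ...

    @abstractmethod
    def bootstrap_average(self, fit: Fit, r: float) -> float: ...


class InMemoryBackend(Backend):
    """Stand-in for a real engine; partitions are plain Python lists."""

    def __init__(self, partitions: List[List[Sample]], seed: int = 0):
        self.partitions = partitions
        self.rng = random.Random(seed)

    def average(self, fit: Fit) -> float:
        local = [fit(p) for p in self.partitions]
        return sum(local) / len(local)

    def bootstrap_average(self, fit: Fit, r: float) -> float:
        if not 0.0 < r < 1.0:
            raise ValueError("r must lie strictly between 0 and 1")
        # Fit on a subsample (fraction r, without replacement) of each partition.
        sub = [fit(self.rng.sample(p, math.ceil(r * len(p))))
               for p in self.partitions]
        avg_sub = sum(sub) / len(sub)
        return (self.average(fit) - r * avg_sub) / (1.0 - r)


def logistic_gd(data: Sequence[Sample], steps: int = 200, lr: float = 0.5) -> float:
    """User-supplied in-core optimizer: 1-D logistic regression fitted by
    plain gradient descent on the mean log-loss."""
    w = 0.0
    for _ in range(steps):
        grad = sum((1.0 / (1.0 + math.exp(-w * x)) - y) * x
                   for x, y in data) / len(data)
        w -= lr * grad
    return w
```

    With this shape, `InMemoryBackend(partitions).average(logistic_gd)` runs 
the user's optimizer on each partition and averages the results, and swapping 
in a different optimizer requires no change on the engine side.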
    
    Very soon, I am hoping to submit a patch for that. The current patch would 
be obsolete then, so there is no need to replicate this. Once I submit it, I'll 
close the current PR.
    
    [1] http://arxiv.org/abs/1209.4129 
    (The short version in NIPS: 
http://stanford.edu/~jduchi/projects/ZhangDuWa12_nips.pdf)



> Support for required quasi-algebraic operations and starting with aggregating 
> rows/blocks
> -----------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1626
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1626
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Math
>    Affects Versions: 1.0.0
>            Reporter: Gokhan Capan
>            Assignee: Gokhan Capan
>              Labels: DSL, scala, spark
>             Fix For: 0.11.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
