[ 
https://issues.apache.org/jira/browse/SPARK-6682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485554#comment-14485554
 ] 

Alexander Ulanov commented on SPARK-6682:
-----------------------------------------

[~yuu.ishik...@gmail.com] 
They reside in package org.apache.spark.mllib.optimization: class LBFGS(private 
var gradient: Gradient, private var updater: Updater) and class GradientDescent 
private[mllib] (private var gradient: Gradient, private var updater: Updater). 
They extend Optimizer trait that has only one function: def optimize(data: 
RDD[(Double, Vector)], initialWeights: Vector): Vector. This function is 
limited to only one type of input: vectors and their labels. I have submitted a 
separate issue regarding this https://issues.apache.org/jira/browse/SPARK-5362. 

1. Right now static methods work with hard-coded optimizers, such as 
LogisticRegressionWithSGD. This is not very convenient. I think moving away 
from static methods and use builders implies that optimizers also could be set 
by users. It will be a problem because current optimizers require Updater and 
Gradient at the creation time. 
2. The workaround I suggested in the previous post addresses this.


> Deprecate static train and use builder instead for Scala/Java
> -------------------------------------------------------------
>
>                 Key: SPARK-6682
>                 URL: https://issues.apache.org/jira/browse/SPARK-6682
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.3.0
>            Reporter: Joseph K. Bradley
>
> In MLlib, we have for some time been unofficially moving away from the old 
> static train() methods and moving towards builder patterns.  This JIRA is to 
> discuss this move and (hopefully) make it official.
> "Old static train()" API:
> {code}
> val myModel = NaiveBayes.train(myData, ...)
> {code}
> "New builder pattern" API:
> {code}
> val nb = new NaiveBayes().setLambda(0.1)
> val myModel = nb.train(myData)
> {code}
> Pros of the builder pattern:
> * Much less code when algorithms have many parameters.  Since Java does not 
> support default arguments, we required *many* duplicated static train() 
> methods (for each prefix set of arguments).
> * Helps to enforce default parameters.  Users should ideally not have to even 
> think about setting parameters if they just want to try an algorithm quickly.
> * Matches spark.ml API
> Cons of the builder pattern:
> * In Python APIs, static train methods are more "Pythonic."
> Proposal:
> * Scala/Java: We should start deprecating the old static train() methods.  We 
> must keep them for API stability, but deprecating will help with API 
> consistency, making it clear that everyone should use the builder pattern.  
> As we deprecate them, we should make sure that the builder pattern supports 
> all parameters.
> * Python: Keep static train methods.
> CC: [~mengxr]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to