Joseph K. Bradley created SPARK-6682:
----------------------------------------

             Summary: Deprecate static train and use builder instead for Scala/Java
                 Key: SPARK-6682
                 URL: https://issues.apache.org/jira/browse/SPARK-6682
             Project: Spark
          Issue Type: Improvement
          Components: MLlib
    Affects Versions: 1.3.0
            Reporter: Joseph K. Bradley


In MLlib, we have for some time been unofficially moving away from the old 
static train() methods and towards the builder pattern.  This JIRA is to 
discuss this move and (hopefully) make it official.

"Old static train()" API:
{code}
val myModel = NaiveBayes.train(myData, ...)
{code}

"New builder pattern" API:
{code}
val nb = new NaiveBayes().setLambda(0.1)
val myModel = nb.train(myData)
{code}

Pros of the builder pattern:
* Much less code when algorithms have many parameters.  Since Java does not 
support default arguments, we have needed *many* duplicated static train() 
overloads (one for each prefix of the argument list).
* Helps to enforce default parameters.  Users should ideally not have to even 
think about setting parameters if they just want to try an algorithm quickly.
* Matches spark.ml API
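
To make the first two points concrete, here is a rough sketch in plain Scala. 
All names in it (ToyModel, StaticToyLearner, ToyLearner, and the parameters 
lambda, maxIter, seed with their default values) are hypothetical and are not 
MLlib code; the "learner" only records its parameters.
{code}
// Hypothetical toy learner, not MLlib code: it only records its parameters.
case class ToyModel(lambda: Double, maxIter: Int, seed: Long)

// Static style: Java callers cannot use default arguments, so each prefix
// of the parameter list needs its own overload.
object StaticToyLearner {
  def train(data: Seq[Double]): ToyModel = train(data, 1.0)
  def train(data: Seq[Double], lambda: Double): ToyModel =
    train(data, lambda, 100)
  def train(data: Seq[Double], lambda: Double, maxIter: Int): ToyModel =
    train(data, lambda, maxIter, 42L)
  def train(data: Seq[Double], lambda: Double, maxIter: Int,
            seed: Long): ToyModel =
    ToyModel(lambda, maxIter, seed)
}

// Builder style: defaults live in one place, with one setter per parameter.
class ToyLearner {
  private var lambda: Double = 1.0
  private var maxIter: Int = 100
  private var seed: Long = 42L

  def setLambda(value: Double): this.type = { lambda = value; this }
  def setMaxIter(value: Int): this.type = { maxIter = value; this }
  def setSeed(value: Long): this.type = { seed = value; this }

  // A real learner would fit a model here; this one just echoes its settings.
  def train(data: Seq[Double]): ToyModel = ToyModel(lambda, maxIter, seed)
}
{code}
With the static version, a caller who only wants a different seed must also 
spell out lambda and maxIter.  With the builder, 
new ToyLearner().setSeed(7L).train(data) leaves the other defaults untouched.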

Cons:
* In Python APIs, static train methods are more "Pythonic."

Proposal:
* Scala/Java: We should start deprecating the old static train() methods.  We 
must keep them for API stability, but deprecating will help with API 
consistency, making it clear that everyone should use the builder pattern.  As 
we deprecate them, we should make sure that the builder pattern supports all 
parameters.
* Python: Keep static train methods.
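
One possible shape for the Scala/Java deprecation, as a sketch only: the old 
static method stays in place, is marked with Scala's @deprecated annotation, 
and delegates to the builder.  SketchLearner and SketchModel are hypothetical 
names, and the "since" string is a placeholder, not a decided version.
{code}
// Hypothetical names for illustration; not the actual MLlib source.
case class SketchModel(lambda: Double)

class SketchLearner {
  private var lambda: Double = 1.0
  def setLambda(value: Double): this.type = { lambda = value; this }
  // A real learner would fit a model here; this one just echoes lambda.
  def train(data: Seq[Double]): SketchModel = SketchModel(lambda)
}

object SketchLearner {
  // Kept for API stability; the compiler warns at each call site.
  @deprecated("Use new SketchLearner().setLambda(...).train(data)",
    "placeholder-version")
  def train(data: Seq[Double], lambda: Double): SketchModel =
    new SketchLearner().setLambda(lambda).train(data)
}
{code}
Existing callers keep compiling and get the same result, but see a 
deprecation warning that points them at the builder.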

CC: [~mengxr]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
