GitHub user jkbradley opened a pull request:

    https://github.com/apache/spark/pull/3637

    [SPARK-4789] [mllib] Standardize ML Prediction APIs

    This is part (1) of the updates from the WIP PR in
    [https://github.com/apache/spark/pull/3427]
    
    Abstract classes for learning algorithms:
    * Classifier
    * Regressor
    * Predictor
    
    Traits for learning algorithms:
    * ProbabilisticClassificationModel
    
    Concrete classes: learning algorithms
    * LinearRegression
    * LogisticRegression (updated to use new abstract classes)
    
    Concrete classes: other
    * LabeledPoint (adding weight to the old LabeledPoint)
    
    Other updates:
    * Modified ParamMap to sort parameters in toString
    
    Test Suites:
    * LabeledPointSuite
    * LinearRegressionSuite
    * LogisticRegressionSuite
    * Java versions of the above suites
    
    CC: @mengxr  @etrain  @shivaram 


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkbradley/spark ml-api-part1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3637.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3637
    
----
commit de1e3b4c39b42757e56345a6bab2bdeefaa3ca25
Author: Joseph K. Bradley <jos...@databricks.com>
Date:   2014-11-24T07:18:52Z

    Added lots of classes for new ML API:
    
    Abstract classes for learning algorithms:
    * Classifier
    * Regressor
    * Predictor
    
    Traits for learning algorithms:
    * HasDefaultEstimator
    * IterativeEstimator
    * IterativeSolver
    * ProbabilisticClassificationModel
    * WeakLearner
    
    Concrete classes: learning algorithms
    * AdaBoost (partly implemented)
    * NaiveBayes (rough implementation)
    * LinearRegression
    * LogisticRegression (updated to use new abstract classes)
    
    Concrete classes: evaluation
    * ClassificationEvaluator
    * RegressionEvaluator
    * PredictionEvaluator
    
    Concrete classes: other
    * LabeledPoint (adding weight to the old LabeledPoint)

commit 6551244b96d8f70f1daacd0415318cf81fd5111a
Author: Joseph K. Bradley <jos...@databricks.com>
Date:   2014-11-24T07:30:31Z

    fixed compilation issues, but have not added tests yet

commit 25b643d4b367fea5a3ba1b91564374c2b1b7a0f1
Author: Joseph K. Bradley <jos...@databricks.com>
Date:   2014-12-01T18:31:41Z

    removing everything except for simple class hierarchy for classification

commit e61e2738dcb2494be25cec2bd798c3e6e5156b73
Author: Joseph K. Bradley <jos...@databricks.com>
Date:   2014-12-04T21:37:29Z

    Added LinearRegression and Regressor back from ml-api branch

commit 272e62fb41fc8778f3a13f812d4262d9558a772b
Author: Joseph K. Bradley <jos...@databricks.com>
Date:   2014-12-05T00:11:02Z

    Modified ParamMap to sort parameters in toString.  Cleaned up classes in
    class hierarchy, before implementing tests and examples.
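
To illustrate why sorting parameters in toString is useful, here is a minimal
sketch in plain Python. It is not Spark's ParamMap: the class name, the put()
method, and the output format are all hypothetical stand-ins. The point is only
that sorting by parameter name makes the string form independent of insertion
order, which helps when comparing logs or test output.

```python
class ParamMap:
    """Hypothetical stand-in for a parameter map (not Spark's ParamMap)."""

    def __init__(self, params=None):
        self._params = dict(params or {})

    def put(self, name, value):
        """Set one parameter and return self so calls can be chained."""
        self._params[name] = value
        return self

    def __str__(self):
        # Sort by parameter name so the rendering is deterministic,
        # regardless of the order in which parameters were added.
        items = sorted(self._params.items(), key=lambda kv: kv[0])
        return "{" + ", ".join(f"{name}: {value}" for name, value in items) + "}"


a = ParamMap().put("regParam", 0.1).put("maxIter", 10)
b = ParamMap().put("maxIter", 10).put("regParam", 0.1)
print(a)
print(b)
```

Both maps hold the same parameters, so they render identically even though
the parameters were added in a different order.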

commit cc13d61f2a277b101f7422af240afa64dfb10236
Author: Joseph K. Bradley <jos...@databricks.com>
Date:   2014-12-05T01:11:22Z

    Fixed bug from last commit (sorting paramMap by parameter names in
    toString).  Fixed bug in persisting logreg data.  Added threshold_internal
    to logreg for faster test-time prediction (avoiding map lookup).

commit 09fb85fb7502a64a661c5f8ae4c941971ff861c8
Author: Joseph K. Bradley <jos...@databricks.com>
Date:   2014-12-05T18:22:10Z

    Fixed issue with logreg threshold not being set correctly

commit a0faf022792524c5a33a20d7cb591a91a7ac160b
Author: Joseph K. Bradley <jos...@databricks.com>
Date:   2014-12-05T18:43:14Z

    Updated docs.  Added LabeledPointSuite to spark.ml

commit 3e961cb6616906940fd646639f818c58d29c04f6
Author: Joseph K. Bradley <jos...@databricks.com>
Date:   2014-12-05T23:15:48Z

    * Changed semantics of Predictor.train() to merge the given paramMap with
      the embedded paramMap.
    * Removed threshold_internal from logreg
    * Added Predictor.copy()
    * Extended LogisticRegressionSuite
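
The merge semantics described for Predictor.train() can be sketched roughly as
follows, in plain Python rather than Spark's actual code. The class layout,
the resolve_params() helper, and the rule that the given map takes precedence
over the embedded one are assumptions made for illustration; the commit
message only says that the two maps are merged.

```python
class Predictor:
    """Hypothetical predictor that carries an embedded parameter map."""

    def __init__(self, **embedded):
        self.param_map = dict(embedded)

    def resolve_params(self, extra=None):
        # Start from the embedded parameters, then overlay the given ones.
        # Assumption: on conflict, the given map wins.
        merged = dict(self.param_map)
        merged.update(extra or {})
        return merged

    def copy(self):
        # Return an independent predictor with the same embedded parameters.
        return Predictor(**self.param_map)

    def train(self, dataset, extra=None):
        # A real implementation would fit a model here; this sketch only
        # returns the parameters that training would have used.
        return self.resolve_params(extra)


lr = Predictor(maxIter=10, regParam=0.1)
used = lr.train(dataset=[], extra={"regParam": 0.5})
print(used)
```

Calling train() without extra parameters falls back to the embedded map alone,
and the merge never modifies the embedded map itself.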

commit 8922966757e7b5d7588613f5dfc11cee267de1b4
Author: Joseph K. Bradley <jos...@databricks.com>
Date:   2014-12-06T01:32:14Z

    added train() to Predictor subclasses which does not take a ParamMap.

commit 0c45756e3614c027d662d70dfa11d736690dc837
Author: Joseph K. Bradley <jos...@databricks.com>
Date:   2014-12-06T03:57:12Z

    * fixed LinearRegression train() to use embedded paramMap
    * added Predictor.predict(RDD[Vector]) method
    * updated Linear/LogisticRegressionSuites

commit 6be36c16484478bdb9d847fd343d6b7319759b21
Author: Joseph K. Bradley <jos...@databricks.com>
Date:   2014-12-06T06:18:30Z

    Added JavaLabeledPointSuite.java for spark.ml, and added constructor to
    LabeledPoint which defaults weight to 1.0
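
As a rough illustration of a labeled point whose weight defaults to 1.0, here
is a plain Python sketch. The field names and types are assumptions chosen for
the example; this is not the spark.ml LabeledPoint class.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LabeledPoint:
    """Hypothetical weighted labeled point (not the spark.ml class)."""

    label: float
    features: Tuple[float, ...]
    # Instances created without an explicit weight count as one example each.
    weight: float = 1.0


unweighted = LabeledPoint(1.0, (0.5, 2.0))
weighted = LabeledPoint(0.0, (1.0, 3.0), weight=2.5)
print(unweighted)
print(weighted)
```

With the default in place, code that builds unweighted points needs no
changes, while weighted algorithms can read the weight uniformly.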

commit d8eaf7099a9be6157f90b11f82917ca5b604e1bd
Author: Joseph K. Bradley <jos...@databricks.com>
Date:   2014-12-08T19:09:03Z

    Added methods:
    * Classifier: batch predictRaw()
    * Predictor: train() without paramMap
    * ProbabilisticClassificationModel.predictProbabilities()
    * Java versions of all above batch methods + others
    
    Updated LogisticRegressionSuite.
    Updated JavaLogisticRegressionSuite to match LogisticRegressionSuite.
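
One way to picture how raw predictions, probabilities, and thresholded labels
can relate in a binary logistic model is the plain Python sketch below. It is
not the code from this pull request: the class, the method names, the use of a
linear margin as the raw prediction, and the "probability > threshold" rule
are all assumptions made for illustration.

```python
import math


class BinaryLogisticModel:
    """Hypothetical binary logistic model (not Spark's implementation)."""

    def __init__(self, weights, intercept=0.0, threshold=0.5):
        self.weights = list(weights)
        self.intercept = intercept
        self.threshold = threshold

    def predict_raw(self, features):
        # Raw prediction: the linear margin w . x + b.
        return sum(w * x for w, x in zip(self.weights, features)) + self.intercept

    def predict_probability(self, features):
        # Probability of the positive class via the logistic function.
        return 1.0 / (1.0 + math.exp(-self.predict_raw(features)))

    def predict(self, features):
        # Assumption: predict 1.0 when the probability exceeds the threshold.
        return 1.0 if self.predict_probability(features) > self.threshold else 0.0

    def predict_raw_batch(self, rows):
        # Batch variant: apply the single-instance method to every row.
        return [self.predict_raw(row) for row in rows]

    def predict_probabilities_batch(self, rows):
        return [self.predict_probability(row) for row in rows]


model = BinaryLogisticModel(weights=[1.0, -1.0])
rows = [[2.0, 2.0], [3.0, 1.0]]
print(model.predict_raw_batch(rows))
print([model.predict(row) for row in rows])
```

Raising the threshold makes the model more conservative about predicting the
positive class without retraining, which is why it is convenient to keep the
threshold as a parameter separate from the weights.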

commit 1e46094fbf2534ff022cb843a811b3fbd7fb9d64
Author: Joseph K. Bradley <jos...@databricks.com>
Date:   2014-12-08T19:51:55Z

    Added spark.ml LinearRegressionSuite

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
