GitHub user jkbradley opened a pull request: https://github.com/apache/spark/pull/3637
    [SPARK-4789] [mllib] Standardize ML Prediction APIs

    This is part (1) of the updates from the WIP PR in [https://github.com/apache/spark/pull/3427]

    Abstract classes for learning algorithms:
    * Classifier
    * Regressor
    * Predictor

    Traits for learning algorithms
    * ProbabilisticClassificationModel

    Concrete classes: learning algorithms
    * LinearRegression
    * LogisticRegression (updated to use new abstract classes)

    Concrete classes: other
    * LabeledPoint (adding weight to the old LabeledPoint)

    Other updates:
    * Modified ParamMap to sort parameters in toString

    Test Suites:
    * LabeledPointSuite
    * LinearRegressionSuite
    * LogisticRegressionSuite
    * + Java versions of above suites

    CC: @mengxr @etrain @shivaram

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkbradley/spark ml-api-part1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3637.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3637

----
commit de1e3b4c39b42757e56345a6bab2bdeefaa3ca25
Author: Joseph K. Bradley <jos...@databricks.com>
Date:   2014-11-24T07:18:52Z

    Added lots of classes for new ML API:

    Abstract classes for learning algorithms:
    * Classifier
    * Regressor
    * Predictor

    Traits for learning algorithms
    * HasDefaultEstimator
    * IterativeEstimator
    * IterativeSolver
    * ProbabilisticClassificationModel
    * WeakLearner

    Concrete classes: learning algorithms
    * AdaBoost (partly implemented)
    * NaiveBayes (rough implementation)
    * LinearRegression
    * LogisticRegression (updated to use new abstract classes)

    Concrete classes: evaluation
    * ClassificationEvaluator
    * RegressionEvaluator
    * PredictionEvaluator

    Concrete classes: other
    * LabeledPoint (adding weight to the old LabeledPoint)

commit 6551244b96d8f70f1daacd0415318cf81fd5111a
Author: Joseph K. Bradley <jos...@databricks.com>
Date:   2014-11-24T07:30:31Z

    fixed compilation issues, but have not added tests yet

commit 25b643d4b367fea5a3ba1b91564374c2b1b7a0f1
Author: Joseph K. Bradley <jos...@databricks.com>
Date:   2014-12-01T18:31:41Z

    removing everything except for simple class hierarchy for classification

commit e61e2738dcb2494be25cec2bd798c3e6e5156b73
Author: Joseph K. Bradley <jos...@databricks.com>
Date:   2014-12-04T21:37:29Z

    Added LinearRegression and Regressor back from ml-api branch

commit 272e62fb41fc8778f3a13f812d4262d9558a772b
Author: Joseph K. Bradley <jos...@databricks.com>
Date:   2014-12-05T00:11:02Z

    Modified ParamMap to sort parameters in toString.
    Cleaned up classes in class hierarchy, before implementing tests and examples.

commit cc13d61f2a277b101f7422af240afa64dfb10236
Author: Joseph K. Bradley <jos...@databricks.com>
Date:   2014-12-05T01:11:22Z

    Fixed bug from last commit (sorting paramMap by parameter names in toString).
    Fixed bug in persisting logreg data.
    Added threshold_internal to logreg for faster test-time prediction (avoiding map lookup).

commit 09fb85fb7502a64a661c5f8ae4c941971ff861c8
Author: Joseph K. Bradley <jos...@databricks.com>
Date:   2014-12-05T18:22:10Z

    Fixed issue with logreg threshold being set correctly

commit a0faf022792524c5a33a20d7cb591a91a7ac160b
Author: Joseph K. Bradley <jos...@databricks.com>
Date:   2014-12-05T18:43:14Z

    Updated docs. Added LabeledPointSuite to spark.ml

commit 3e961cb6616906940fd646639f818c58d29c04f6
Author: Joseph K. Bradley <jos...@databricks.com>
Date:   2014-12-05T23:15:48Z

    * Changed semantics of Predictor.train() to merge the given paramMap with the embedded paramMap.
    * remove threshold_internal from logreg
    * Added Predictor.copy()
    * Extended LogisticRegressionSuite

commit 8922966757e7b5d7588613f5dfc11cee267de1b4
Author: Joseph K. Bradley <jos...@databricks.com>
Date:   2014-12-06T01:32:14Z

    added train() to Predictor subclasses which does not take a ParamMap.
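
Editor's illustration, not part of the patch: commits 3e961cb and 8922966 describe Predictor.train() merging a given paramMap with the embedded one, a train() overload that takes no ParamMap, and Predictor.copy(). The Python sketch below is only a toy model of that behaviour. The class bodies, the MeanRegressor learner, the "scale" and "maxIter" parameter names, and the rule that explicitly passed values override embedded ones are all assumptions made for illustration; none of it is the Scala code in this pull request.

```python
class Predictor:
    """Toy stand-in for the Predictor abstraction; not Spark's class."""

    def __init__(self, **params):
        # Parameters embedded in the estimator itself.
        self.param_map = dict(params)

    def train(self, dataset, param_map=None):
        # With no argument, only the embedded parameters are used.
        # With an argument, the two maps are merged; passed values win
        # (assumed precedence, not confirmed by the commit message).
        merged = {**self.param_map, **(param_map or {})}
        return self._train(dataset, merged)

    def copy(self):
        # Independent estimator carrying the same embedded parameters.
        return type(self)(**self.param_map)

    def _train(self, dataset, params):
        raise NotImplementedError


class MeanRegressor(Predictor):
    """Hypothetical learner: predicts scale * mean(label)."""

    def _train(self, dataset, params):
        labels = [label for label, _features in dataset]
        mean = sum(labels) / len(labels)
        return {"prediction": params.get("scale", 1.0) * mean, "params": params}


data = [(1.0, [0.0]), (3.0, [1.0])]  # (label, features) pairs
estimator = MeanRegressor(scale=1.0, maxIter=10)
default_model = estimator.train(data)                   # embedded params only
override_model = estimator.train(data, {"scale": 2.0})  # merged params
print(default_model["prediction"], override_model["prediction"])
```

In this sketch the merge builds a new dictionary on every call, so a per-call override never leaks back into the estimator's embedded parameters.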

commit 0c45756e3614c027d662d70dfa11d736690dc837
Author: Joseph K. Bradley <jos...@databricks.com>
Date:   2014-12-06T03:57:12Z

    * fixed LinearRegression train() to use embedded paramMap
    * added Predictor.predict(RDD[Vector]) method
    * updated Linear/LogisticRegressionSuites

commit 6be36c16484478bdb9d847fd343d6b7319759b21
Author: Joseph K. Bradley <jos...@databricks.com>
Date:   2014-12-06T06:18:30Z

    Added JavaLabeledPointSuite.java for spark.ml, and added constructor to LabeledPoint which defaults weight to 1.0

commit d8eaf7099a9be6157f90b11f82917ca5b604e1bd
Author: Joseph K. Bradley <jos...@databricks.com>
Date:   2014-12-08T19:09:03Z

    Added methods:
    * Classifier: batch predictRaw()
    * Predictor: train() without paramMap
    * ProbabilisticClassificationModel.predictProbabilities()
    * Java versions of all above batch methods + others

    Updated LogisticRegressionSuite.
    Updated JavaLogisticRegressionSuite to match LogisticRegressionSuite.

commit 1e46094fbf2534ff022cb843a811b3fbd7fb9d64
Author: Joseph K. Bradley <jos...@databricks.com>
Date:   2014-12-08T19:51:55Z

    Added spark.ml LinearRegressionSuite

----
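
Editor's illustration, not part of the patch: the description adds a weight to LabeledPoint, commit 6be36c1 adds a constructor that defaults the weight to 1.0, and commits 272e62f and cc13d61 make ParamMap.toString sort parameters by name. The Python sketch below models both ideas with toy code. The field names, the weighted_mean_label helper, and the output format of param_map_to_string are assumptions for illustration and are not taken from the Scala sources.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class LabeledPoint:
    """Toy weighted labeled point; not Spark's class."""
    label: float
    features: Sequence[float]
    weight: float = 1.0  # omitted weight defaults to 1.0, as in commit 6be36c1


def weighted_mean_label(points: Sequence[LabeledPoint]) -> float:
    # Hypothetical helper: shows how an instance weight changes a statistic.
    total_weight = sum(p.weight for p in points)
    return sum(p.weight * p.label for p in points) / total_weight


def param_map_to_string(params: Dict[str, object]) -> str:
    # Sort by parameter name so the rendering is stable across runs.
    # The brace-and-colon layout here is an assumed format.
    body = ", ".join(f"{name}: {params[name]}" for name in sorted(params))
    return "{" + body + "}"


points = [
    LabeledPoint(1.0, (0.0,)),              # weight defaults to 1.0
    LabeledPoint(0.0, (1.0,), weight=3.0),  # explicit weight
]
print(weighted_mean_label(points))
print(param_map_to_string({"regParam": 0.1, "maxIter": 10}))
```

Defaulting the weight keeps unweighted call sites unchanged, and a name-sorted rendering makes string comparisons in tests deterministic.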