spark git commit: [SPARK-11608][MLLIB][DOC] Added migration guide for MLlib 1.6

jkbradley Wed, 16 Dec 2015 11:53:22 -0800

Repository: spark
Updated Branches:
  refs/heads/master 6a880afa8 -> 8148cc7a5



[SPARK-11608][MLLIB][DOC] Added migration guide for MLlib 1.6

No known breaking changes, but some deprecations and changes of behavior.

CC: mengxr

Author: Joseph K. Bradley <jos...@databricks.com>

Closes #10235 from jkbradley/mllib-guide-update-1.6.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8148cc7a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8148cc7a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8148cc7a

Branch: refs/heads/master
Commit: 8148cc7a5c9f52c82c2eb7652d9aeba85e72d406
Parents: 6a880af
Author: Joseph K. Bradley <jos...@databricks.com>
Authored: Wed Dec 16 11:53:04 2015 -0800
Committer: Joseph K. Bradley <jos...@databricks.com>
Committed: Wed Dec 16 11:53:04 2015 -0800

----------------------------------------------------------------------
 docs/mllib-guide.md            | 38 ++++++++++++++++++++++---------------
 docs/mllib-migration-guides.md | 19 +++++++++++++++++++
 2 files changed, 42 insertions(+), 15 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/8148cc7a/docs/mllib-guide.md
----------------------------------------------------------------------
diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md
index 680ed48..7ef91a1 100644
--- a/docs/mllib-guide.md
+++ b/docs/mllib-guide.md
@@ -74,7 +74,7 @@ We list major functionality from both below, with links to detailed guides.
 * [Advanced topics](ml-advanced.html)
 
 Some techniques are not available yet in spark.ml, most notably dimensionality reduction
-Users can seemlessly combine the implementation of these techniques found in `spark.mllib` with the rest of the algorithms found in `spark.ml`.
+Users can seamlessly combine the implementation of these techniques found in `spark.mllib` with the rest of the algorithms found in `spark.ml`.
 
 # Dependencies
 
@@ -101,24 +101,32 @@ MLlib is under active development.
 The APIs marked `Experimental`/`DeveloperApi` may change in future releases,
 and the migration guide below will explain all changes between releases.
 
-## From 1.4 to 1.5
+## From 1.5 to 1.6
 
-In the `spark.mllib` package, there are no break API changes but several behavior changes:
+There are no breaking API changes in the `spark.mllib` or `spark.ml` packages, but there are
+deprecations and changes of behavior.
 
-* [SPARK-9005](https://issues.apache.org/jira/browse/SPARK-9005):
-  `RegressionMetrics.explainedVariance` returns the average regression sum of squares.
-* [SPARK-8600](https://issues.apache.org/jira/browse/SPARK-8600): `NaiveBayesModel.labels` become
-  sorted.
-* [SPARK-3382](https://issues.apache.org/jira/browse/SPARK-3382): `GradientDescent` has a default
-  convergence tolerance `1e-3`, and hence iterations might end earlier than 1.4.
+Deprecations:
 
-In the `spark.ml` package, there exists one break API change and one behavior change:
+* [SPARK-11358](https://issues.apache.org/jira/browse/SPARK-11358):
+ In `spark.mllib.clustering.KMeans`, the `runs` parameter has been deprecated.
+* [SPARK-10592](https://issues.apache.org/jira/browse/SPARK-10592):
+ In `spark.ml.classification.LogisticRegressionModel` and
+ `spark.ml.regression.LinearRegressionModel`, the `weights` field has been deprecated in favor of
+ the new name `coefficients`.  This helps disambiguate from instance (row) "weights" given to
+ algorithms.
 
-* [SPARK-9268](https://issues.apache.org/jira/browse/SPARK-9268): Java's varargs support is removed
-  from `Params.setDefault` due to a
-  [Scala compiler bug](https://issues.scala-lang.org/browse/SI-9013).
-* [SPARK-10097](https://issues.apache.org/jira/browse/SPARK-10097): `Evaluator.isLargerBetter` is
-  added to indicate metric ordering. Metrics like RMSE no longer flip signs as in 1.4.
+Changes of behavior:
+
+* [SPARK-7770](https://issues.apache.org/jira/browse/SPARK-7770):
+ `spark.mllib.tree.GradientBoostedTrees`: `validationTol` has changed semantics in 1.6.
+ Previously, it was a threshold for absolute change in error. Now, it resembles the behavior of
+ `GradientDescent`'s `convergenceTol`: For large errors, it uses relative error (relative to the
+ previous error); for small errors (`< 0.01`), it uses absolute error.
+* [SPARK-11069](https://issues.apache.org/jira/browse/SPARK-11069):
+ `spark.ml.feature.RegexTokenizer`: Previously, it did not convert strings to lowercase before
+ tokenizing. Now, it converts to lowercase by default, with an option not to. This matches the
+ behavior of the simpler `Tokenizer` transformer.
 
 ## Previous Spark versions
 
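The SPARK-7770 entry above describes the new `validationTol` rule only in words. Below is a minimal Python sketch of one reading of it, not Spark's source. It assumes the stopping test is `improvement < validationTol * max(previousError, 0.01)`, which is a relative test for errors of at least 0.01 and an absolute one below that. The wording above says "relative to the previous error", so the sketch scales by the previous error. The helpers `old_rule` and `new_rule` are hypothetical names.

```python
# Sketch of the GradientBoostedTrees early-stopping test described for
# SPARK-7770. Assumption: this follows the documented wording only; it is
# not copied from Spark, and old_rule/new_rule are hypothetical helpers.

SMALL_ERROR = 0.01  # below this, the 1.6 rule behaves as an absolute test


def old_rule(prev_error: float, curr_error: float, tol: float) -> bool:
    """Pre-1.6 reading: stop when the absolute improvement is below tol."""
    return prev_error - curr_error < tol


def new_rule(prev_error: float, curr_error: float, tol: float) -> bool:
    """1.6 reading: scale tol by the previous error, floored at SMALL_ERROR.

    For prev_error >= 0.01 this is (prev - curr) / prev < tol (relative);
    for prev_error < 0.01 it is (prev - curr) < tol * 0.01 (absolute).
    """
    improvement = prev_error - curr_error
    return improvement < tol * max(prev_error, SMALL_ERROR)


if __name__ == "__main__":
    tol = 1e-3
    # Large error: improving 100.0 -> 99.95 is only a 0.05% relative gain.
    print(old_rule(100.0, 99.95, tol))  # False: 0.05 is not below 0.001
    print(new_rule(100.0, 99.95, tol))  # True: 0.05 is below 0.1
```

The `max(..., 0.01)` floor keeps the relative test from becoming arbitrarily strict as the validation error approaches zero.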

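The following plain-Python sketch illustrates what the SPARK-11069 entry says `RegexTokenizer` now does: it lowercases by default, then splits on a regex. It assumes a gap-style split with the pattern `\s+` and a minimum token length of 1. The function `regex_tokenize` and its parameters are hypothetical stand-ins, not the Spark API.

```python
import re
from typing import List


def regex_tokenize(text: str, pattern: str = r"\s+", to_lowercase: bool = True,
                   min_token_length: int = 1) -> List[str]:
    """Hypothetical stand-in for a gap-based regex tokenizer.

    to_lowercase=True models the 1.6 default; pass False for the 1.5 behavior.
    """
    if to_lowercase:
        text = text.lower()
    # Treat the pattern as the separator ("gaps"), then drop short tokens,
    # which also removes the empty string produced by leading whitespace.
    tokens = re.split(pattern, text)
    return [t for t in tokens if len(t) >= min_token_length]


if __name__ == "__main__":
    print(regex_tokenize("Hi I heard about Spark"))
    # ['hi', 'i', 'heard', 'about', 'spark']
    print(regex_tokenize("Hi I heard about Spark", to_lowercase=False))
    # ['Hi', 'I', 'heard', 'about', 'Spark']
```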
http://git-wip-us.apache.org/repos/asf/spark/blob/8148cc7a/docs/mllib-migration-guides.md
----------------------------------------------------------------------
diff --git a/docs/mllib-migration-guides.md b/docs/mllib-migration-guides.md
index 73e4fdd..f3daef2 100644
--- a/docs/mllib-migration-guides.md
+++ b/docs/mllib-migration-guides.md
@@ -7,6 +7,25 @@ description: MLlib migration guides from before Spark SPARK_VERSION_SHORT
 
 The migration guide for the current Spark version is kept on the [MLlib Programming Guide main page](mllib-guide.html#migration-guide).
 
+## From 1.4 to 1.5
+
+In the `spark.mllib` package, there are no breaking API changes but several behavior changes:
+
+* [SPARK-9005](https://issues.apache.org/jira/browse/SPARK-9005):
+  `RegressionMetrics.explainedVariance` returns the average regression sum of squares.
+* [SPARK-8600](https://issues.apache.org/jira/browse/SPARK-8600): `NaiveBayesModel.labels` become
+  sorted.
+* [SPARK-3382](https://issues.apache.org/jira/browse/SPARK-3382): `GradientDescent` has a default
+  convergence tolerance `1e-3`, and hence iterations might end earlier than 1.4.
+
+In the `spark.ml` package, there exists one breaking API change and one behavior change:
+
+* [SPARK-9268](https://issues.apache.org/jira/browse/SPARK-9268): Java's varargs support is removed
+  from `Params.setDefault` due to a
+  [Scala compiler bug](https://issues.scala-lang.org/browse/SI-9013).
+* [SPARK-10097](https://issues.apache.org/jira/browse/SPARK-10097): `Evaluator.isLargerBetter` is
+  added to indicate metric ordering. Metrics like RMSE no longer flip signs as in 1.4.
+
 ## From 1.3 to 1.4
 
 In the `spark.mllib` package, there were several breaking changes, but all in `DeveloperApi` or `Experimental` APIs:
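The SPARK-9005 entry above says `explainedVariance` now returns the average regression sum of squares, that is, the mean of (prediction - mean(label))^2 over all points. The following is a minimal Python sketch of that quantity. It assumes the deviations are measured from the label mean and divided by the number of points. `explained_variance` is a hypothetical helper, not `RegressionMetrics` itself.

```python
from typing import Sequence


def explained_variance(predictions: Sequence[float], labels: Sequence[float]) -> float:
    """Average regression sum of squares: mean of (prediction - mean(label))^2."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need equally sized, non-empty inputs")
    label_mean = sum(labels) / len(labels)
    ss_reg = sum((p - label_mean) ** 2 for p in predictions)
    return ss_reg / len(labels)


if __name__ == "__main__":
    labels = [1.0, 2.0, 3.0]         # toy values for illustration
    predictions = [1.5, 2.0, 2.5]
    print(explained_variance(predictions, labels))  # about 0.1667
```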

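SPARK-3382 gives `GradientDescent` a default convergence tolerance of `1e-3`, which is why iterations can stop early. The sketch below assumes the test compares the change in the weight vector against the tolerance, scaled by the larger of the current weights' norm and 1.0. That rule, and the helpers `l2_norm` and `is_converged`, are assumptions for illustration, not a transcription of `spark.mllib`.

```python
import math
from typing import Sequence


def l2_norm(v: Sequence[float]) -> float:
    """Euclidean norm of a dense vector."""
    return math.sqrt(sum(x * x for x in v))


def is_converged(prev_weights: Sequence[float], curr_weights: Sequence[float],
                 convergence_tol: float = 1e-3) -> bool:
    """Assumed rule: stop when ||w_prev - w_curr|| < tol * max(||w_curr||, 1.0).

    The 1.0 floor makes the test absolute when the weights are small.
    """
    diff = [a - b for a, b in zip(prev_weights, curr_weights)]
    return l2_norm(diff) < convergence_tol * max(l2_norm(curr_weights), 1.0)


if __name__ == "__main__":
    print(is_converged([1.0, 2.0], [1.0, 2.001]))  # True: small relative step
    print(is_converged([0.0, 0.0], [0.01, 0.0]))   # False: step exceeds 1e-3
```

In this sketch, passing `convergence_tol=0.0` disables early stopping, because a norm can never be strictly less than zero.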

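SPARK-10097 adds `Evaluator.isLargerBetter` so that model selection can rank metrics such as RMSE without negating them. Below is a minimal Python sketch of a selection step that consults such a flag. `pick_best` and its arguments are hypothetical, and this is not `CrossValidator`'s code.

```python
from typing import Sequence, Tuple


def pick_best(metrics: Sequence[float], is_larger_better: bool) -> Tuple[int, float]:
    """Return (index, metric) of the best candidate model.

    The flag picks the direction, so the metric keeps its natural sign
    (for example a positive RMSE) instead of being negated.
    """
    if not metrics:
        raise ValueError("no metrics to compare")
    chooser = max if is_larger_better else min
    best_index = chooser(range(len(metrics)), key=lambda i: metrics[i])
    return best_index, metrics[best_index]


if __name__ == "__main__":
    rmse = [0.92, 0.75, 0.81]  # toy values; smaller is better for RMSE
    r2 = [0.61, 0.74, 0.70]    # toy values; larger is better for R^2
    print(pick_best(rmse, is_larger_better=False))  # (1, 0.75)
    print(pick_best(r2, is_larger_better=True))     # (1, 0.74)
```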