Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16076#discussion_r90347291 --- Diff: docs/ml-guide.md --- @@ -60,152 +60,37 @@ MLlib is under active development. The APIs marked `Experimental`/`DeveloperApi` may change in future releases, and the migration guide below will explain all changes between releases. -## From 1.6 to 2.0 +## From 2.0 to 2.1 ### Breaking changes -There were several breaking changes in Spark 2.0, which are outlined below. - -**Linear algebra classes for DataFrame-based APIs** - -Spark's linear algebra dependencies were moved to a new project, `mllib-local` -(see [SPARK-13944](https://issues.apache.org/jira/browse/SPARK-13944)). -As part of this change, the linear algebra classes were copied to a new package, `spark.ml.linalg`. -The DataFrame-based APIs in `spark.ml` now depend on the `spark.ml.linalg` classes, -leading to a few breaking changes, predominantly in various model classes -(see [SPARK-14810](https://issues.apache.org/jira/browse/SPARK-14810) for a full list). - -**Note:** the RDD-based APIs in `spark.mllib` continue to depend on the previous package `spark.mllib.linalg`. - -_Converting vectors and matrices_ - -While most pipeline components support backward compatibility for loading, -some existing `DataFrames` and pipelines in Spark versions prior to 2.0, that contain vector or matrix -columns, may need to be migrated to the new `spark.ml` vector and matrix types. -Utilities for converting `DataFrame` columns from `spark.mllib.linalg` to `spark.ml.linalg` types -(and vice versa) can be found in `spark.mllib.util.MLUtils`. - -There are also utility methods available for converting single instances of -vectors and matrices. Use the `asML` method on a `mllib.linalg.Vector` / `mllib.linalg.Matrix` -for converting to `ml.linalg` types, and -`mllib.linalg.Vectors.fromML` / `mllib.linalg.Matrices.fromML` -for converting to `mllib.linalg` types. - -<div class="codetabs"> -<div data-lang="scala" markdown="1"> - -{% highlight scala %} -import org.apache.spark.mllib.util.MLUtils - -// convert DataFrame columns -val convertedVecDF = MLUtils.convertVectorColumnsToML(vecDF) -val convertedMatrixDF = MLUtils.convertMatrixColumnsToML(matrixDF) -// convert a single vector or matrix -val mlVec: org.apache.spark.ml.linalg.Vector = mllibVec.asML -val mlMat: org.apache.spark.ml.linalg.Matrix = mllibMat.asML -{% endhighlight %} - -Refer to the [`MLUtils` Scala docs](api/scala/index.html#org.apache.spark.mllib.util.MLUtils$) for further detail. -</div> - -<div data-lang="java" markdown="1"> - -{% highlight java %} -import org.apache.spark.mllib.util.MLUtils; -import org.apache.spark.sql.Dataset; - -// convert DataFrame columns -Dataset<Row> convertedVecDF = MLUtils.convertVectorColumnsToML(vecDF); -Dataset<Row> convertedMatrixDF = MLUtils.convertMatrixColumnsToML(matrixDF); -// convert a single vector or matrix -org.apache.spark.ml.linalg.Vector mlVec = mllibVec.asML(); -org.apache.spark.ml.linalg.Matrix mlMat = mllibMat.asML(); -{% endhighlight %} - -Refer to the [`MLUtils` Java docs](api/java/org/apache/spark/mllib/util/MLUtils.html) for further detail. -</div> - -<div data-lang="python" markdown="1"> - -{% highlight python %} -from pyspark.mllib.util import MLUtils - -# convert DataFrame columns -convertedVecDF = MLUtils.convertVectorColumnsToML(vecDF) -convertedMatrixDF = MLUtils.convertMatrixColumnsToML(matrixDF) -# convert a single vector or matrix -mlVec = mllibVec.asML() -mlMat = mllibMat.asML() -{% endhighlight %} - -Refer to the [`MLUtils` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.util.MLUtils) for further detail. -</div> -</div> - **Deprecated methods removed** -Several deprecated methods were removed in the `spark.mllib` and `spark.ml` packages: - -* `setScoreCol` in `ml.evaluation.BinaryClassificationEvaluator` -* `weights` in `LinearRegression` and `LogisticRegression` in `spark.ml` -* `setMaxNumIterations` in `mllib.optimization.LBFGS` (marked as `DeveloperApi`) -* `treeReduce` and `treeAggregate` in `mllib.rdd.RDDFunctions` (these functions are available on `RDD`s directly, and were marked as `DeveloperApi`) -* `defaultStategy` in `mllib.tree.configuration.Strategy` -* `build` in `mllib.tree.Node` -* libsvm loaders for multiclass and load/save labeledData methods in `mllib.util.MLUtils` - -A full list of breaking changes can be found at [SPARK-14810](https://issues.apache.org/jira/browse/SPARK-14810). +* `setLabelCol` in `feature.ChiSqSelectorModel` +* `numTrees` in `classification.RandomForestClassificationModel` (This now refers to the Param called `numTrees`) +* `numTrees` in `regression.RandomForestRegressionModel` (This now refers to the Param called `numTrees`) +* `model` in `regression.LinearRegressionSummary` +* `validateParams` in `PipelineStage` --- End diff -- Also: validateParams in Evaluator
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org