[GitHub] spark pull request #16076: [SPARK-18324][ML][DOC] Update ML programming and ...

jkbradley Wed, 30 Nov 2016 15:24:58 -0800

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16076#discussion_r90347291
  
    --- Diff: docs/ml-guide.md ---
    @@ -60,152 +60,37 @@ MLlib is under active development.
     The APIs marked `Experimental`/`DeveloperApi` may change in future releases,
     and the migration guide below will explain all changes between releases.
     
    -## From 1.6 to 2.0
    +## From 2.0 to 2.1
     
     ### Breaking changes
     
    -There were several breaking changes in Spark 2.0, which are outlined below.
    -
    -**Linear algebra classes for DataFrame-based APIs**
    -
    -Spark's linear algebra dependencies were moved to a new project, `mllib-local`
    -(see [SPARK-13944](https://issues.apache.org/jira/browse/SPARK-13944)).
    -As part of this change, the linear algebra classes were copied to a new package, `spark.ml.linalg`.
    -The DataFrame-based APIs in `spark.ml` now depend on the `spark.ml.linalg` classes,
    -leading to a few breaking changes, predominantly in various model classes
    -(see [SPARK-14810](https://issues.apache.org/jira/browse/SPARK-14810) for a full list).
    -
    -**Note:** the RDD-based APIs in `spark.mllib` continue to depend on the previous package `spark.mllib.linalg`.
    -
    -_Converting vectors and matrices_
    -
    -While most pipeline components support backward compatibility for loading,
    -some existing `DataFrames` and pipelines in Spark versions prior to 2.0, that contain vector or matrix
    -columns, may need to be migrated to the new `spark.ml` vector and matrix types.
    -Utilities for converting `DataFrame` columns from `spark.mllib.linalg` to `spark.ml.linalg` types
    -(and vice versa) can be found in `spark.mllib.util.MLUtils`.
    -
    -There are also utility methods available for converting single instances of
    -vectors and matrices. Use the `asML` method on a `mllib.linalg.Vector` / `mllib.linalg.Matrix`
    -for converting to `ml.linalg` types, and
    -`mllib.linalg.Vectors.fromML` / `mllib.linalg.Matrices.fromML`
    -for converting to `mllib.linalg` types.
    -
    -<div class="codetabs">
    -<div data-lang="scala"  markdown="1">
    -
    -{% highlight scala %}
    -import org.apache.spark.mllib.util.MLUtils
    -
    -// convert DataFrame columns
    -val convertedVecDF = MLUtils.convertVectorColumnsToML(vecDF)
    -val convertedMatrixDF = MLUtils.convertMatrixColumnsToML(matrixDF)
    -// convert a single vector or matrix
    -val mlVec: org.apache.spark.ml.linalg.Vector = mllibVec.asML
    -val mlMat: org.apache.spark.ml.linalg.Matrix = mllibMat.asML
    -{% endhighlight %}
    -
    -Refer to the [`MLUtils` Scala docs](api/scala/index.html#org.apache.spark.mllib.util.MLUtils$) for further detail.
    -</div>
    -
    -<div data-lang="java" markdown="1">
    -
    -{% highlight java %}
    -import org.apache.spark.mllib.util.MLUtils;
    -import org.apache.spark.sql.Dataset;
    -
    -// convert DataFrame columns
    -Dataset<Row> convertedVecDF = MLUtils.convertVectorColumnsToML(vecDF);
    -Dataset<Row> convertedMatrixDF = MLUtils.convertMatrixColumnsToML(matrixDF);
    -// convert a single vector or matrix
    -org.apache.spark.ml.linalg.Vector mlVec = mllibVec.asML();
    -org.apache.spark.ml.linalg.Matrix mlMat = mllibMat.asML();
    -{% endhighlight %}
    -
    -Refer to the [`MLUtils` Java docs](api/java/org/apache/spark/mllib/util/MLUtils.html) for further detail.
    -</div>
    -
    -<div data-lang="python"  markdown="1">
    -
    -{% highlight python %}
    -from pyspark.mllib.util import MLUtils
    -
    -# convert DataFrame columns
    -convertedVecDF = MLUtils.convertVectorColumnsToML(vecDF)
    -convertedMatrixDF = MLUtils.convertMatrixColumnsToML(matrixDF)
    -# convert a single vector or matrix
    -mlVec = mllibVec.asML()
    -mlMat = mllibMat.asML()
    -{% endhighlight %}
    -
    -Refer to the [`MLUtils` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.util.MLUtils) for further detail.
    -</div>
    -</div>
    -
     **Deprecated methods removed**
     
    -Several deprecated methods were removed in the `spark.mllib` and `spark.ml` packages:
    -
    -* `setScoreCol` in `ml.evaluation.BinaryClassificationEvaluator`
    -* `weights` in `LinearRegression` and `LogisticRegression` in `spark.ml`
    -* `setMaxNumIterations` in `mllib.optimization.LBFGS` (marked as `DeveloperApi`)
    -* `treeReduce` and `treeAggregate` in `mllib.rdd.RDDFunctions` (these functions are available on `RDD`s directly, and were marked as `DeveloperApi`)
    -* `defaultStategy` in `mllib.tree.configuration.Strategy`
    -* `build` in `mllib.tree.Node`
    -* libsvm loaders for multiclass and load/save labeledData methods in `mllib.util.MLUtils`
    -
    -A full list of breaking changes can be found at [SPARK-14810](https://issues.apache.org/jira/browse/SPARK-14810).
    +* `setLabelCol` in `feature.ChiSqSelectorModel`
    +* `numTrees` in `classification.RandomForestClassificationModel` (This now refers to the Param called `numTrees`)
    +* `numTrees` in `regression.RandomForestRegressionModel` (This now refers to the Param called `numTrees`)
    +* `model` in `regression.LinearRegressionSummary`
    +* `validateParams` in `PipelineStage`
    --- End diff --
    
    Also list `validateParams` in `Evaluator` among the removed deprecated methods.



