spark git commit: [SPARK-15031][EXAMPLES][FOLLOW-UP] Make Python param example working with SparkSession
Repository: spark
Updated Branches: refs/heads/branch-2.0 34c743c4b -> b2a4dac2d

[SPARK-15031][EXAMPLES][FOLLOW-UP] Make Python param example working with SparkSession

## What changes were proposed in this pull request?

It seems most of the Python examples were changed to use SparkSession by https://github.com/apache/spark/pull/12809. That PR noted that both examples below:

- `simple_params_example.py`
- `aft_survival_regression.py`

were not changed because they did not work. `aft_survival_regression.py` was then fixed by https://github.com/apache/spark/pull/13050, but `simple_params_example.py` was not. This PR corrects the example and makes it use SparkSession.

In more detail, `threshold` was replaced with `thresholds` here and there by https://github.com/apache/spark/commit/5a23213c148bfe362514f9c71f5273ebda0a848a. However, when `lr.fit(training, paramMap)` is called, the param map overwrites the values: `threshold` was 0.5, while the value implied by `thresholds` becomes 0.55 (by `1 / (1 + thresholds(0) / thresholds(1))`). According to the comment below, this is not allowed:
https://github.com/apache/spark/blob/354f8f11bd4b20fa99bd67a98da3525fd3d75c81/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L58-L61.
So, this PR sets an equivalent value so that the example does not throw an exception.

## How was this patch tested?

Manually (`mvn package -DskipTests && spark-submit simple_params_example.py`)

Author: hyukjinkwon

Closes #13135 from HyukjinKwon/SPARK-15031.
(cherry picked from commit e2ec32dab8530aa21ec95a27d60b1c22f3d1a18c)
Signed-off-by: Nick Pentreath

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b2a4dac2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b2a4dac2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b2a4dac2

Branch: refs/heads/branch-2.0
Commit: b2a4dac2d92e906460fe3ca0a38fc672a82eb6cb
Parents: 34c743c
Author: hyukjinkwon
Authored: Thu May 19 08:52:41 2016 +0200
Committer: Nick Pentreath
Committed: Thu May 19 08:53:35 2016 +0200

----
 .../examples/ml/JavaSimpleParamsExample.java    |  2 +-
 .../src/main/python/ml/simple_params_example.py | 24 +---
 .../spark/examples/ml/SimpleParamsExample.scala |  2 +-
 3 files changed, 13 insertions(+), 15 deletions(-)
----

http://git-wip-us.apache.org/repos/asf/spark/blob/b2a4dac2/examples/src/main/java/org/apache/spark/examples/ml/JavaSimpleParamsExample.java

diff --git a/examples/src/main/java/org/apache/spark/examples/ml/JavaSimpleParamsExample.java b/examples/src/main/java/org/apache/spark/examples/ml/JavaSimpleParamsExample.java
index ff1eb07..ca80d0d 100644
--- a/examples/src/main/java/org/apache/spark/examples/ml/JavaSimpleParamsExample.java
+++ b/examples/src/main/java/org/apache/spark/examples/ml/JavaSimpleParamsExample.java
@@ -77,7 +77,7 @@ public class JavaSimpleParamsExample {
     ParamMap paramMap = new ParamMap();
     paramMap.put(lr.maxIter().w(20));  // Specify 1 Param.
     paramMap.put(lr.maxIter(), 30);  // This overwrites the original maxIter.
-    double[] thresholds = {0.45, 0.55};
+    double[] thresholds = {0.5, 0.5};
     paramMap.put(lr.regParam().w(0.1), lr.thresholds().w(thresholds));  // Specify multiple Params.

     // One can also combine ParamMaps.
http://git-wip-us.apache.org/repos/asf/spark/blob/b2a4dac2/examples/src/main/python/ml/simple_params_example.py

diff --git a/examples/src/main/python/ml/simple_params_example.py b/examples/src/main/python/ml/simple_params_example.py
index 2d6d115..c57e59d 100644
--- a/examples/src/main/python/ml/simple_params_example.py
+++ b/examples/src/main/python/ml/simple_params_example.py
@@ -20,11 +20,10 @@ from __future__ import print_function
 import pprint
 import sys

-from pyspark import SparkContext
 from pyspark.ml.classification import LogisticRegression
 from pyspark.mllib.linalg import DenseVector
 from pyspark.mllib.regression import LabeledPoint
-from pyspark.sql import SQLContext
+from pyspark.sql import SparkSession

 """
 A simple example demonstrating ways to specify parameters for Estimators and Transformers.
@@ -33,21 +32,20 @@ Run with:
 """

 if __name__ == "__main__":
-    if len(sys.argv) > 1:
-        print("Usage: simple_params_example", file=sys.stderr)
-        exit(1)
-    sc = SparkContext(appName="PythonSimpleParamsExample")
-    sqlContext = SQLContext(sc)
+    spark = SparkSession \
+        .builder \
+        .appName("SimpleTextClassificationPipeline") \
+        .getOrCreate()

     # prepare training data.
     # We create an RDD of LabeledPoints and convert them into a DataFrame.
     # A LabeledPoint is an Object with two fi
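The overwrite behavior the commit message describes can be illustrated with a plain-dict stand-in (this is a toy model, not the real pyspark `ParamMap` API): a later put for the same param wins, and combining two maps lets the second map's entries override the first's.

```python
# Toy dict model of ParamMap semantics used by lr.fit(training, paramMap).
# Keys are hypothetical stand-ins for the Param objects in the example.
param_map = {"maxIter": 20}           # specify 1 param
param_map["maxIter"] = 30             # this overwrites the original maxIter
param_map.update({"regParam": 0.1, "thresholds": [0.5, 0.5]})  # multiple params

extra = {"probabilityCol": "myProbability"}  # a second, combinable map
combined = {**param_map, **extra}     # later entries win on key clashes
print(combined["maxIter"])            # 30
```

This also shows why the inconsistent `thresholds` broke the example: whatever `fit()` receives in the map replaces the estimator's current value, so the map itself must carry mutually consistent params.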
spark git commit: [SPARK-15031][EXAMPLES][FOLLOW-UP] Make Python param example working with SparkSession
Repository: spark
Updated Branches: refs/heads/master 661c21049 -> e2ec32dab

[SPARK-15031][EXAMPLES][FOLLOW-UP] Make Python param example working with SparkSession

## What changes were proposed in this pull request?

It seems most of the Python examples were changed to use SparkSession by https://github.com/apache/spark/pull/12809. That PR noted that both examples below:

- `simple_params_example.py`
- `aft_survival_regression.py`

were not changed because they did not work. `aft_survival_regression.py` was then fixed by https://github.com/apache/spark/pull/13050, but `simple_params_example.py` was not. This PR corrects the example and makes it use SparkSession.

In more detail, `threshold` was replaced with `thresholds` here and there by https://github.com/apache/spark/commit/5a23213c148bfe362514f9c71f5273ebda0a848a. However, when `lr.fit(training, paramMap)` is called, the param map overwrites the values: `threshold` was 0.5, while the value implied by `thresholds` becomes 0.55 (by `1 / (1 + thresholds(0) / thresholds(1))`). According to the comment below, this is not allowed:
https://github.com/apache/spark/blob/354f8f11bd4b20fa99bd67a98da3525fd3d75c81/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L58-L61.
So, this PR sets an equivalent value so that the example does not throw an exception.

## How was this patch tested?

Manually (`mvn package -DskipTests && spark-submit simple_params_example.py`)

Author: hyukjinkwon

Closes #13135 from HyukjinKwon/SPARK-15031.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e2ec32da
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e2ec32da
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e2ec32da

Branch: refs/heads/master
Commit: e2ec32dab8530aa21ec95a27d60b1c22f3d1a18c
Parents: 661c210
Author: hyukjinkwon
Authored: Thu May 19 08:52:41 2016 +0200
Committer: Nick Pentreath
Committed: Thu May 19 08:52:41 2016 +0200

----
 .../examples/ml/JavaSimpleParamsExample.java    |  2 +-
 .../src/main/python/ml/simple_params_example.py | 24 +---
 .../spark/examples/ml/SimpleParamsExample.scala |  2 +-
 3 files changed, 13 insertions(+), 15 deletions(-)
----

http://git-wip-us.apache.org/repos/asf/spark/blob/e2ec32da/examples/src/main/java/org/apache/spark/examples/ml/JavaSimpleParamsExample.java

diff --git a/examples/src/main/java/org/apache/spark/examples/ml/JavaSimpleParamsExample.java b/examples/src/main/java/org/apache/spark/examples/ml/JavaSimpleParamsExample.java
index ff1eb07..ca80d0d 100644
--- a/examples/src/main/java/org/apache/spark/examples/ml/JavaSimpleParamsExample.java
+++ b/examples/src/main/java/org/apache/spark/examples/ml/JavaSimpleParamsExample.java
@@ -77,7 +77,7 @@ public class JavaSimpleParamsExample {
     ParamMap paramMap = new ParamMap();
     paramMap.put(lr.maxIter().w(20));  // Specify 1 Param.
     paramMap.put(lr.maxIter(), 30);  // This overwrites the original maxIter.
-    double[] thresholds = {0.45, 0.55};
+    double[] thresholds = {0.5, 0.5};
     paramMap.put(lr.regParam().w(0.1), lr.thresholds().w(thresholds));  // Specify multiple Params.

     // One can also combine ParamMaps.
http://git-wip-us.apache.org/repos/asf/spark/blob/e2ec32da/examples/src/main/python/ml/simple_params_example.py

diff --git a/examples/src/main/python/ml/simple_params_example.py b/examples/src/main/python/ml/simple_params_example.py
index 2d6d115..c57e59d 100644
--- a/examples/src/main/python/ml/simple_params_example.py
+++ b/examples/src/main/python/ml/simple_params_example.py
@@ -20,11 +20,10 @@ from __future__ import print_function
 import pprint
 import sys

-from pyspark import SparkContext
 from pyspark.ml.classification import LogisticRegression
 from pyspark.mllib.linalg import DenseVector
 from pyspark.mllib.regression import LabeledPoint
-from pyspark.sql import SQLContext
+from pyspark.sql import SparkSession

 """
 A simple example demonstrating ways to specify parameters for Estimators and Transformers.
@@ -33,21 +32,20 @@ Run with:
 """

 if __name__ == "__main__":
-    if len(sys.argv) > 1:
-        print("Usage: simple_params_example", file=sys.stderr)
-        exit(1)
-    sc = SparkContext(appName="PythonSimpleParamsExample")
-    sqlContext = SQLContext(sc)
+    spark = SparkSession \
+        .builder \
+        .appName("SimpleTextClassificationPipeline") \
+        .getOrCreate()

     # prepare training data.
     # We create an RDD of LabeledPoints and convert them into a DataFrame.
     # A LabeledPoint is an Object with two fields named label and features
     # and Spark SQL identifies these fields and creates the schema appropriat