spark git commit: [SPARK-15031][EXAMPLES][FOLLOW-UP] Make Python param example working with SparkSession

2016-05-18 Thread mlnick
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 34c743c4b -> b2a4dac2d


[SPARK-15031][EXAMPLES][FOLLOW-UP] Make Python param example working with 
SparkSession

## What changes were proposed in this pull request?

It seems most of the Python examples were changed to use SparkSession by 
https://github.com/apache/spark/pull/12809. That PR noted that both examples below:

- `simple_params_example.py`
- `aft_survival_regression.py`

were left unchanged because they did not work. It seems `aft_survival_regression.py` 
was later fixed by https://github.com/apache/spark/pull/13050, but 
`simple_params_example.py` was not yet.

This PR corrects the example and makes it use SparkSession.
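
For reference, the migration replaces the old `SparkContext`/`SQLContext` setup with 
the unified `SparkSession` entry point. A minimal sketch of the new pattern (the app 
name here is illustrative; the exact one used is in the diff below):

```python
from pyspark.sql import SparkSession

# SparkSession subsumes SparkContext and SQLContext in Spark 2.0.
spark = SparkSession \
    .builder \
    .appName("SimpleParamsExample") \
    .getOrCreate()

# ... create DataFrames and fit ML estimators via `spark` ...

spark.stop()
```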

In more detail, it seems `threshold` was replaced with `thresholds` here and there 
by 
https://github.com/apache/spark/commit/5a23213c148bfe362514f9c71f5273ebda0a848a.
However, when `lr.fit(training, paramMap)` is called, the param map overwrites the 
values. So, `threshold` was 0.5 while the threshold implied by `thresholds` becomes 
0.55 (by `1 / (1 + thresholds(0) / thresholds(1))`).

According to the comment linked below, this is not allowed:
https://github.com/apache/spark/blob/354f8f11bd4b20fa99bd67a98da3525fd3d75c81/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L58-L61

So, in this PR, `thresholds` is set to the equivalent value (`{0.5, 0.5}`) so that 
this does not throw an exception.
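
To illustrate the consistency rule in plain Python (`implied_threshold` is just an 
illustrative helper for this message, not a Spark API):

```python
def implied_threshold(thresholds):
    # Threshold implied by a two-element `thresholds` array,
    # following `1 / (1 + thresholds(0) / thresholds(1))` above.
    return 1.0 / (1.0 + thresholds[0] / thresholds[1])

print(implied_threshold([0.45, 0.55]))  # 0.55 -> conflicts with `threshold` = 0.5
print(implied_threshold([0.5, 0.5]))    # 0.5  -> consistent, so no exception
```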

## How was this patch tested?

Manually (`mvn package -DskipTests && spark-submit simple_params_example.py`)

Author: hyukjinkwon 

Closes #13135 from HyukjinKwon/SPARK-15031.

(cherry picked from commit e2ec32dab8530aa21ec95a27d60b1c22f3d1a18c)
Signed-off-by: Nick Pentreath 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b2a4dac2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b2a4dac2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b2a4dac2

Branch: refs/heads/branch-2.0
Commit: b2a4dac2d92e906460fe3ca0a38fc672a82eb6cb
Parents: 34c743c
Author: hyukjinkwon 
Authored: Thu May 19 08:52:41 2016 +0200
Committer: Nick Pentreath 
Committed: Thu May 19 08:53:35 2016 +0200

--
 .../examples/ml/JavaSimpleParamsExample.java|  2 +-
 .../src/main/python/ml/simple_params_example.py | 24 +---
 .../spark/examples/ml/SimpleParamsExample.scala |  2 +-
 3 files changed, 13 insertions(+), 15 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/b2a4dac2/examples/src/main/java/org/apache/spark/examples/ml/JavaSimpleParamsExample.java
--
diff --git 
a/examples/src/main/java/org/apache/spark/examples/ml/JavaSimpleParamsExample.java
 
b/examples/src/main/java/org/apache/spark/examples/ml/JavaSimpleParamsExample.java
index ff1eb07..ca80d0d 100644
--- 
a/examples/src/main/java/org/apache/spark/examples/ml/JavaSimpleParamsExample.java
+++ 
b/examples/src/main/java/org/apache/spark/examples/ml/JavaSimpleParamsExample.java
@@ -77,7 +77,7 @@ public class JavaSimpleParamsExample {
     ParamMap paramMap = new ParamMap();
     paramMap.put(lr.maxIter().w(20)); // Specify 1 Param.
     paramMap.put(lr.maxIter(), 30); // This overwrites the original maxIter.
-    double[] thresholds = {0.45, 0.55};
+    double[] thresholds = {0.5, 0.5};
     paramMap.put(lr.regParam().w(0.1), lr.thresholds().w(thresholds)); // Specify multiple Params.
 
     // One can also combine ParamMaps.

http://git-wip-us.apache.org/repos/asf/spark/blob/b2a4dac2/examples/src/main/python/ml/simple_params_example.py
--
diff --git a/examples/src/main/python/ml/simple_params_example.py 
b/examples/src/main/python/ml/simple_params_example.py
index 2d6d115..c57e59d 100644
--- a/examples/src/main/python/ml/simple_params_example.py
+++ b/examples/src/main/python/ml/simple_params_example.py
@@ -20,11 +20,10 @@ from __future__ import print_function
 import pprint
 import sys
 
-from pyspark import SparkContext
 from pyspark.ml.classification import LogisticRegression
 from pyspark.mllib.linalg import DenseVector
 from pyspark.mllib.regression import LabeledPoint
-from pyspark.sql import SQLContext
+from pyspark.sql import SparkSession
 
 """
 A simple example demonstrating ways to specify parameters for Estimators and Transformers.
@@ -33,21 +32,20 @@ Run with:
 """
 
 if __name__ == "__main__":
-    if len(sys.argv) > 1:
-        print("Usage: simple_params_example", file=sys.stderr)
-        exit(1)
-    sc = SparkContext(appName="PythonSimpleParamsExample")
-    sqlContext = SQLContext(sc)
+    spark = SparkSession \
+        .builder \
+        .appName("SimpleTextClassificationPipeline") \
+        .getOrCreate()
 
     # prepare training data.
     # We create an RDD of LabeledPoints and convert them into a DataFrame.
     # A LabeledPoint is an Object with two fields named label and features
     # and Spark SQL identifies these fields and creates the schema appropriately.

spark git commit: [SPARK-15031][EXAMPLES][FOLLOW-UP] Make Python param example working with SparkSession

2016-05-18 Thread mlnick
Repository: spark
Updated Branches:
  refs/heads/master 661c21049 -> e2ec32dab


[SPARK-15031][EXAMPLES][FOLLOW-UP] Make Python param example working with 
SparkSession

## What changes were proposed in this pull request?

It seems most of the Python examples were changed to use SparkSession by 
https://github.com/apache/spark/pull/12809. That PR noted that both examples below:

- `simple_params_example.py`
- `aft_survival_regression.py`

were left unchanged because they did not work. It seems `aft_survival_regression.py` 
was later fixed by https://github.com/apache/spark/pull/13050, but 
`simple_params_example.py` was not yet.

This PR corrects the example and makes it use SparkSession.

In more detail, it seems `threshold` was replaced with `thresholds` here and there 
by 
https://github.com/apache/spark/commit/5a23213c148bfe362514f9c71f5273ebda0a848a.
However, when `lr.fit(training, paramMap)` is called, the param map overwrites the 
values. So, `threshold` was 0.5 while the threshold implied by `thresholds` becomes 
0.55 (by `1 / (1 + thresholds(0) / thresholds(1))`).

According to the comment linked below, this is not allowed:
https://github.com/apache/spark/blob/354f8f11bd4b20fa99bd67a98da3525fd3d75c81/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L58-L61

So, in this PR, `thresholds` is set to the equivalent value (`{0.5, 0.5}`) so that 
this does not throw an exception.
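
In PySpark, a param map is simply a dict passed to `fit`, and its entries override 
whatever is already set on the estimator. A minimal sketch (assuming a `training` 
DataFrame of labeled data is already defined):

```python
from pyspark.ml.classification import LogisticRegression

lr = LogisticRegression(maxIter=10, regParam=0.01)

# Entries in the param map override the estimator's settings for this fit only.
# `[0.5, 0.5]` implies a threshold of 0.5, consistent with the default
# `threshold`, so no exception is thrown.
param_map = {lr.maxIter: 20, lr.thresholds: [0.5, 0.5]}
model = lr.fit(training, param_map)  # `training` is an existing DataFrame
```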

## How was this patch tested?

Manually (`mvn package -DskipTests && spark-submit simple_params_example.py`)

Author: hyukjinkwon 

Closes #13135 from HyukjinKwon/SPARK-15031.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e2ec32da
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e2ec32da
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e2ec32da

Branch: refs/heads/master
Commit: e2ec32dab8530aa21ec95a27d60b1c22f3d1a18c
Parents: 661c210
Author: hyukjinkwon 
Authored: Thu May 19 08:52:41 2016 +0200
Committer: Nick Pentreath 
Committed: Thu May 19 08:52:41 2016 +0200

--
 .../examples/ml/JavaSimpleParamsExample.java|  2 +-
 .../src/main/python/ml/simple_params_example.py | 24 +---
 .../spark/examples/ml/SimpleParamsExample.scala |  2 +-
 3 files changed, 13 insertions(+), 15 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/e2ec32da/examples/src/main/java/org/apache/spark/examples/ml/JavaSimpleParamsExample.java
--
diff --git 
a/examples/src/main/java/org/apache/spark/examples/ml/JavaSimpleParamsExample.java
 
b/examples/src/main/java/org/apache/spark/examples/ml/JavaSimpleParamsExample.java
index ff1eb07..ca80d0d 100644
--- 
a/examples/src/main/java/org/apache/spark/examples/ml/JavaSimpleParamsExample.java
+++ 
b/examples/src/main/java/org/apache/spark/examples/ml/JavaSimpleParamsExample.java
@@ -77,7 +77,7 @@ public class JavaSimpleParamsExample {
     ParamMap paramMap = new ParamMap();
     paramMap.put(lr.maxIter().w(20)); // Specify 1 Param.
     paramMap.put(lr.maxIter(), 30); // This overwrites the original maxIter.
-    double[] thresholds = {0.45, 0.55};
+    double[] thresholds = {0.5, 0.5};
     paramMap.put(lr.regParam().w(0.1), lr.thresholds().w(thresholds)); // Specify multiple Params.
 
     // One can also combine ParamMaps.

http://git-wip-us.apache.org/repos/asf/spark/blob/e2ec32da/examples/src/main/python/ml/simple_params_example.py
--
diff --git a/examples/src/main/python/ml/simple_params_example.py 
b/examples/src/main/python/ml/simple_params_example.py
index 2d6d115..c57e59d 100644
--- a/examples/src/main/python/ml/simple_params_example.py
+++ b/examples/src/main/python/ml/simple_params_example.py
@@ -20,11 +20,10 @@ from __future__ import print_function
 import pprint
 import sys
 
-from pyspark import SparkContext
 from pyspark.ml.classification import LogisticRegression
 from pyspark.mllib.linalg import DenseVector
 from pyspark.mllib.regression import LabeledPoint
-from pyspark.sql import SQLContext
+from pyspark.sql import SparkSession
 
 """
 A simple example demonstrating ways to specify parameters for Estimators and Transformers.
@@ -33,21 +32,20 @@ Run with:
 """
 
 if __name__ == "__main__":
-    if len(sys.argv) > 1:
-        print("Usage: simple_params_example", file=sys.stderr)
-        exit(1)
-    sc = SparkContext(appName="PythonSimpleParamsExample")
-    sqlContext = SQLContext(sc)
+    spark = SparkSession \
+        .builder \
+        .appName("SimpleTextClassificationPipeline") \
+        .getOrCreate()
 
     # prepare training data.
     # We create an RDD of LabeledPoints and convert them into a DataFrame.
     # A LabeledPoint is an Object with two fields named label and features
     # and Spark SQL identifies these fields and creates the schema appropriately.