[GitHub] spark issue #20896: [SPARK-23788][SS] Fix race in StreamingQuerySuite

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20896
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88555/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20896: [SPARK-23788][SS] Fix race in StreamingQuerySuite

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20896
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20896: [SPARK-23788][SS] Fix race in StreamingQuerySuite

2018-03-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20896
  
**[Test build #88555 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88555/testReport)**
 for PR 20896 at commit 
[`e257b69`](https://github.com/apache/spark/commit/e257b69044966a9b797886f367b3cd1792c2d687).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20897: [MINOR][DOC] Fix a few markdown typos

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20897
  
Can one of the admins verify this patch?


---







[GitHub] spark issue #20889: [MINOR][DOC] Fix ml-guide markdown typos

2018-03-23 Thread Lemonjing
Github user Lemonjing commented on the issue:

https://github.com/apache/spark/pull/20889
  
Can someone help me close this PR? I have merged these commits into a new PR:
https://github.com/apache/spark/pull/20897


---




[GitHub] spark pull request #20897: [MINOR][DOC] Fix a few markdown typos

2018-03-23 Thread Lemonjing
GitHub user Lemonjing opened a pull request:

https://github.com/apache/spark/pull/20897

[MINOR][DOC] Fix a few markdown typos

## What changes were proposed in this pull request?

Easy fix in the markdown.

## How was this patch tested?

Jekyll build tested manually.

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Lemonjing/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20897.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20897


commit 937bbef522eedddbcb502f7f9692564040a63cd7
Author: lemonjing <932191671@...>
Date:   2018-03-24T04:45:43Z

[MINOR][DOC] Fix a few markdown typos




---




[GitHub] spark pull request #20842: [SPARK-23162][PySpark][ML] Add r2adj into Python ...

2018-03-23 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20842#discussion_r176899545
  
--- Diff: python/pyspark/ml/regression.py ---
@@ -347,6 +347,20 @@ def r2(self):
 """
 return self._call_java("r2")
 
+@property
+@since("2.4.0")
+def r2adj(self):
+"""
+Returns Adjusted R^2^, the adjusted coefficient of determination.
+
+.. seealso:: `Wikipedia coefficient of determination \
+`
--- End diff --

ok, done.
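For reference, the adjusted coefficient of determination that `r2adj` exposes can be sketched in plain Python; the helper name and argument layout below are ours for illustration, not Spark's API:

```python
def adjusted_r2(r2, n_obs, n_features):
    """Adjusted R^2: penalizes R^2 for the number of predictors.

    n_obs is the number of observations, n_features the number of
    regressors (excluding the intercept).
    """
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_features - 1)

# e.g. r2 = 0.9 over 100 rows with 3 features -> 0.896875
```

Unlike plain R^2, this value can decrease when an extra predictor adds no explanatory power.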


---




[GitHub] spark pull request #20842: [SPARK-23162][PySpark][ML] Add r2adj into Python ...

2018-03-23 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20842#discussion_r176899541
  
--- Diff: python/pyspark/ml/regression.py ---
@@ -347,6 +347,20 @@ def r2(self):
 """
 return self._call_java("r2")
 
+@property
+@since("2.4.0")
+def r2adj(self):
+"""
+Returns Adjusted R^2^, the adjusted coefficient of determination.
--- End diff --

sure, I will change.


---




[GitHub] spark issue #20842: [SPARK-23162][PySpark][ML] Add r2adj into Python API in ...

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20842
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88556/
Test FAILed.


---




[GitHub] spark issue #20842: [SPARK-23162][PySpark][ML] Add r2adj into Python API in ...

2018-03-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20842
  
**[Test build #88556 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88556/testReport)**
 for PR 20842 at commit 
[`ab0b04d`](https://github.com/apache/spark/commit/ab0b04da689f2723e2d928c6f178cbb203194a99).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20842: [SPARK-23162][PySpark][ML] Add r2adj into Python API in ...

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20842
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20842: [SPARK-23162][PySpark][ML] Add r2adj into Python API in ...

2018-03-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20842
  
**[Test build #88556 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88556/testReport)**
 for PR 20842 at commit 
[`ab0b04d`](https://github.com/apache/spark/commit/ab0b04da689f2723e2d928c6f178cbb203194a99).


---




[GitHub] spark issue #20003: [SPARK-22817][R] Use fixed testthat version for SparkR t...

2018-03-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20003
  
@shaneknapp, the current failure of 
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.6-ubuntu-test/509/console
 can be fixed by lowering the version of `testthat`, just as a gentle reminder.

@felixcheung opened a JIRA for it here - 
https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-23435

cc @JoshRosen, @shivaram and @falaki too, just FYI.


---




[GitHub] spark pull request #20892: [SPARK-23700][PYTHON] Cleanup imports in pyspark....

2018-03-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20892#discussion_r176897639
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -28,10 +27,10 @@
 
 from pyspark import since, SparkContext
 from pyspark.rdd import ignore_unicode_prefix, PythonEvalType
-from pyspark.serializers import PickleSerializer, AutoBatchedSerializer
 from pyspark.sql.column import Column, _to_java_column, _to_seq
 from pyspark.sql.dataframe import DataFrame
 from pyspark.sql.types import StringType, DataType
+# Keep UserDefinedFunction import for backwards compatible import; moved 
in SPARK-22409
 from pyspark.sql.udf import UserDefinedFunction, _create_udf
--- End diff --

yea, I think we should better keep this import and the comment looks good.


---




[GitHub] spark pull request #20884: [SPARK-23773][SQL] JacksonGenerator does not incl...

2018-03-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20884#discussion_r176897510
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
 ---
@@ -1229,7 +1229,7 @@ class JsonSuite extends QueryTest with 
SharedSQLContext with TestJsonData {
 val df2 = df1.toDF
 val result = df2.toJSON.collect()
 // scalastyle:off
-assert(result(0) === 
"{\"f1\":1,\"f2\":\"A1\",\"f3\":true,\"f4\":[\"1\",\" A1\",\" true\",\" 
null\"]}")
+assert(result(0) === 
"{\"f1\":1,\"f2\":\"A1\",\"f3\":true,\"f4\":[\"1\",\" A1\",\" true\",\" 
null\"],\"f5\":null}")
--- End diff --

If we go the current way, it'd write out every `null` with every field:

```json
{"a":null,"b":null,"c":null}
{"a":null,"b":null,"c":1}
{"a":1,"b":null,"c":1}
{"a":1,"b":2,"c":3}
```

which I think is quite inefficient. To be clear, does that fix address an actual use 
case?
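The size concern can be seen with a tiny stdlib sketch (plain `json`, standing in for JacksonGenerator): writing every null field versus dropping them.

```python
import json

row = {"a": 1, "b": None, "c": None}

# Behavior under discussion: every field is written, nulls included.
with_nulls = json.dumps(row, separators=(",", ":"))

# Dropping null fields keeps sparse records much smaller.
without_nulls = json.dumps(
    {k: v for k, v in row.items() if v is not None},
    separators=(",", ":"))

print(with_nulls)     # {"a":1,"b":null,"c":null}
print(without_nulls)  # {"a":1}
```

For mostly-null wide schemas, the per-record overhead of explicit nulls grows with the number of fields.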


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r176897485
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -353,6 +353,13 @@ object SQLConf {
 .booleanConf
 .createWithDefault(true)
 
+  val PARQUET_FILTER_PUSHDOWN_DATE_ENABLED = 
buildConf("spark.sql.parquet.filterPushdown.date")
+.doc("If true, enables Parquet filter push-down optimization for Date. 
" +
+  "This configuration only has an effect when 
'spark.sql.parquet.filterPushdown' is enabled.")
+.internal()
+.booleanConf
+.createWithDefault(false)
--- End diff --

I think it's common that we turn on a new feature by default if there is no 
known regression, and turn it off if we find a regression later.


---




[GitHub] spark issue #20895: [SPARK-23787][tests] Fix file download test in SparkSubm...

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20895
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88553/
Test PASSed.


---




[GitHub] spark issue #20895: [SPARK-23787][tests] Fix file download test in SparkSubm...

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20895
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20895: [SPARK-23787][tests] Fix file download test in SparkSubm...

2018-03-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20895
  
**[Test build #88553 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88553/testReport)**
 for PR 20895 at commit 
[`80b84ac`](https://github.com/apache/spark/commit/80b84ac66b1647f2b2b2191a8cf516574c519474).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20884: [SPARK-23773][SQL] JacksonGenerator does not incl...

2018-03-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20884#discussion_r176897268
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala
 ---
@@ -56,7 +56,7 @@ class JacksonGeneratorSuite extends SparkFunSuite {
 val gen = new JacksonGenerator(dataType, writer, option)
 gen.write(input)
 gen.flush()
-assert(writer.toString === """[{}]""")
+assert(writer.toString === """[{"a":null}]""")
--- End diff --

I think you should compare this:

```scala
scala> sql(""" select array(cast(null as struct)) as 
my_array""").toJSON.collect().foreach(println)
{"my_array":[null]}

scala> sql(""" select array(struct(cast(null as string))) as 
my_array""").toJSON.collect().foreach(println)
{"my_array":[{}]}
```


---




[GitHub] spark issue #20896: [SPARK-23788][SS] Fix race in StreamingQuerySuite

2018-03-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20896
  
**[Test build #88555 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88555/testReport)**
 for PR 20896 at commit 
[`e257b69`](https://github.com/apache/spark/commit/e257b69044966a9b797886f367b3cd1792c2d687).


---




[GitHub] spark issue #20896: [SPARK-23788][SS] Fix race in StreamingQuerySuite

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20896
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #20896: [SPARK-23788][SS] Fix race in StreamingQuerySuite

2018-03-23 Thread jose-torres
Github user jose-torres commented on the issue:

https://github.com/apache/spark/pull/20896
  
@zsxwing 


---




[GitHub] spark pull request #20896: [SPARK-23788][SS] Fix race in StreamingQuerySuite

2018-03-23 Thread jose-torres
GitHub user jose-torres opened a pull request:

https://github.com/apache/spark/pull/20896

[SPARK-23788][SS] Fix race in StreamingQuerySuite

## What changes were proposed in this pull request?

The serializability test uses the same MemoryStream instance for 3 
different queries. If any of those queries asks it to commit before the others 
have run, the rest will see empty dataframes. This can fail the test if q3 is 
affected.

We should use one instance per query instead.

## How was this patch tested?

Existing unit test. If I move q2.processAllAvailable() before starting q3, 
the test always fails without the fix.
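The race can be modeled with a toy stdlib sketch (plain Python queues standing in for MemoryStream; not Spark code): the first consumer to drain a shared source starves the others, while per-query sources do not.

```python
from queue import Empty, Queue

def make_source():
    # Stand-in for one MemoryStream with three rows added.
    q = Queue()
    for x in (1, 2, 3):
        q.put(x)
    return q

def drain(q):
    # Stand-in for a query committing all available data.
    out = []
    while True:
        try:
            out.append(q.get_nowait())
        except Empty:
            return out

# Buggy setup: three "queries" share one source.
shared = make_source()
q1, q2, q3 = drain(shared), drain(shared), drain(shared)
# q1 gets all the data; q2 and q3 see empty results -> flaky assertion.

# Fix: one source instance per query.
r1, r2, r3 = (drain(make_source()) for _ in range(3))
# each of r1, r2, r3 sees the full data.
```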

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jose-torres/spark fixrace

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20896.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20896






---




[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20894
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20894
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88551/
Test PASSed.


---




[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-03-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20894
  
**[Test build #88551 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88551/testReport)**
 for PR 20894 at commit 
[`811df6f`](https://github.com/apache/spark/commit/811df6fa7b17ff12bdd70318cf330a0f54815397).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20777: [SPARK-23615][ML][PYSPARK]Add maxDF Parameter to Python ...

2018-03-23 Thread huaxingao
Github user huaxingao commented on the issue:

https://github.com/apache/spark/pull/20777
  
Thank you very much for your help! @BryanCutler 


---




[GitHub] spark issue #19041: [SPARK-21097][CORE] Add option to recover cached data

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19041
  
Build finished. Test FAILed.


---




[GitHub] spark issue #19041: [SPARK-21097][CORE] Add option to recover cached data

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19041
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88550/
Test FAILed.


---




[GitHub] spark issue #19041: [SPARK-21097][CORE] Add option to recover cached data

2018-03-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19041
  
**[Test build #88550 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88550/testReport)**
 for PR 19041 at commit 
[`c79b68f`](https://github.com/apache/spark/commit/c79b68f8b22e5f0137f5c3431dfc1b124bad3d77).
 * This patch **fails PySpark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---




[GitHub] spark issue #20839: [SPARK-23699][PYTHON][SQL] Raise same type of error caug...

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20839
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20839: [SPARK-23699][PYTHON][SQL] Raise same type of error caug...

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20839
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88554/
Test PASSed.


---




[GitHub] spark issue #20839: [SPARK-23699][PYTHON][SQL] Raise same type of error caug...

2018-03-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20839
  
**[Test build #88554 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88554/testReport)**
 for PR 20839 at commit 
[`5a43edf`](https://github.com/apache/spark/commit/5a43edf6c2ad0b6dda155a90e8831181376502e7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...

2018-03-23 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20858
  
ok, I'll check later!


---




[GitHub] spark issue #20839: [SPARK-23699][PYTHON][SQL] Raise same type of error caug...

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20839
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1733/
Test PASSed.


---




[GitHub] spark issue #20839: [SPARK-23699][PYTHON][SQL] Raise same type of error caug...

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20839
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20839: [SPARK-23699][PYTHON][SQL] Raise same type of error caug...

2018-03-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20839
  
**[Test build #88554 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88554/testReport)**
 for PR 20839 at commit 
[`5a43edf`](https://github.com/apache/spark/commit/5a43edf6c2ad0b6dda155a90e8831181376502e7).


---




[GitHub] spark issue #20777: [SPARK-23615][ML][PYSPARK]Add maxDF Parameter to Python ...

2018-03-23 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/20777
  
merged to master! thanks @huaxingao 


---




[GitHub] spark pull request #20777: [SPARK-23615][ML][PYSPARK]Add maxDF Parameter to ...

2018-03-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20777


---




[GitHub] spark issue #20839: [SPARK-23699][PYTHON][SQL] Raise same type of error caug...

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20839
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20839: [SPARK-23699][PYTHON][SQL] Raise same type of error caug...

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20839
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88552/
Test FAILed.


---




[GitHub] spark issue #20839: [SPARK-23699][PYTHON][SQL] Raise same type of error caug...

2018-03-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20839
  
**[Test build #88552 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88552/testReport)**
 for PR 20839 at commit 
[`1a6be1d`](https://github.com/apache/spark/commit/1a6be1df25a41b5bdcfc0e47378a757be384efab).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20895: [SPARK-23787][tests] Fix file download test in SparkSubm...

2018-03-23 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/20895
  
@jerryshao 


---




[GitHub] spark issue #20895: [SPARK-23787][tests] Fix file download test in SparkSubm...

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20895
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1732/
Test PASSed.


---




[GitHub] spark issue #20895: [SPARK-23787][tests] Fix file download test in SparkSubm...

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20895
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20895: [SPARK-23787][tests] Fix file download test in SparkSubm...

2018-03-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20895
  
**[Test build #88553 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88553/testReport)**
 for PR 20895 at commit 
[`80b84ac`](https://github.com/apache/spark/commit/80b84ac66b1647f2b2b2191a8cf516574c519474).


---




[GitHub] spark pull request #20895: [SPARK-23787][tests] Fix file download test in Sp...

2018-03-23 Thread vanzin
GitHub user vanzin opened a pull request:

https://github.com/apache/spark/pull/20895

[SPARK-23787][tests] Fix file download test in SparkSubmitSuite for Hadoop 
2.9.

This particular test assumed that Hadoop libraries did not support
http as a file system. Hadoop 2.9 does, so the test failed. The test
now forces a non-existent implementation for the http fs, which
forces the expected error.

There were also a couple of other issues in the same test: SparkSubmit
arguments in the wrong order, and the wrong check later when asserting,
which was being masked by the previous issues.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vanzin/spark SPARK-23787

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20895.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20895


commit 80b84ac66b1647f2b2b2191a8cf516574c519474
Author: Marcelo Vanzin 
Date:   2018-03-23T22:24:39Z

[SPARK-23787][tests] Fix file download test in SparkSubmitSuite for Hadoop 
2.9.

This particular test assumed that Hadoop libraries did not support
http as a file system. Hadoop 2.9 does, so the test failed. The test
now forces a non-existent implementation for the http fs, which
forces the expected error.

There were also a couple of other issues in the same test: SparkSubmit
arguments in the wrong order, and the wrong check later when asserting,
which was being masked by the previous issues.




---




[GitHub] spark issue #20888: [SPARK-23775][TEST] DataFrameRangeSuite should wait for ...

2018-03-23 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue:

https://github.com/apache/spark/pull/20888
  
@vanzin @squito yeah, there is an issue with threading as well. I'm just 
taking a look at it because it's not obvious.


---




[GitHub] spark issue #20839: [SPARK-23699][PYTHON][SQL] Raise same type of error caug...

2018-03-23 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/20839
  
### After Reworded Warnings

```
In [2]: spark.createDataFrame(pd.DataFrame([[{u'a': 1}]]), "a: map")
/home/bryan/git/spark/python/pyspark/sql/session.py:688: UserWarning: 
createDataFrame attempted Arrow optimization because 
'spark.sql.execution.arrow.enabled' is set to true, but has reached the error 
below and will not continue because automatic fallback with 
'spark.sql.execution.arrow.fallback.enabled' has been set to false.
  PyArrow >= 0.8.0 must be installed; however, it was not found.
  warnings.warn(msg)
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
 in ()
----> 1 spark.createDataFrame(pd.DataFrame([[{u'a': 1}]]), "a: map")

~/git/spark/python/pyspark/sql/session.py in createDataFrame(self, data, schema, samplingRatio, verifySchema)
    665                     and len(data) > 0:
    666                 try:
--> 667                     return self._create_from_pandas_with_arrow(data, schema, timezone)
    668                 except Exception as e:
    669                     from pyspark.util import _exception_message

~/git/spark/python/pyspark/sql/session.py in _create_from_pandas_with_arrow(self, pdf, schema, timezone)
    508 
    509         require_minimum_pandas_version()
--> 510         require_minimum_pyarrow_version()
    511 
    512         from pandas.api.types import is_datetime64_dtype, is_datetime64tz_dtype

~/git/spark/python/pyspark/sql/utils.py in require_minimum_pyarrow_version()
    147     if not have_arrow:
    148         raise ImportError("PyArrow >= %s must be installed; however, "
--> 149                           "it was not found." % minimum_pyarrow_version)
    150     if LooseVersion(pyarrow.__version__) < LooseVersion(minimum_pyarrow_version):
    151         raise ImportError("PyArrow >= %s must be installed; however, "

ImportError: PyArrow >= 0.8.0 must be installed; however, it was not found.
```
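The behavior in the output above follows a simple decision rule: try the Arrow path, and on failure either fall back or re-raise the same error depending on the fallback flag. Below is a minimal, library-free sketch of that rule; the function names (`create_with_arrow`, `create_without_arrow`) are hypothetical stand-ins — only the two config keys come from the warning text above:

```python
def create_dataframe(data, conf, create_with_arrow, create_without_arrow):
    """Sketch of the Arrow fallback rule described in the warning above."""
    if conf.get("spark.sql.execution.arrow.enabled") == "true":
        try:
            return create_with_arrow(data)
        except Exception:
            if conf.get("spark.sql.execution.arrow.fallback.enabled") == "true":
                # Automatic fallback: continue without Arrow.
                return create_without_arrow(data)
            # Fallback disabled: re-raise the same type of error that was
            # caught, which is what SPARK-23699 proposes.
            raise
    return create_without_arrow(data)
```

With fallback disabled, the original `ImportError` propagates unchanged, which is what the traceback above demonstrates.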


---




[GitHub] spark issue #20888: [SPARK-23775][TEST] DataFrameRangeSuite should wait for ...

2018-03-23 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/20888
  
I had sort of the same doubt as Imran, because I thought ScalaTest ran 
tests in order... but I just ran a suite where the tests were not run in the 
order declared in the source file. So it sounds possible that this bug would be 
hit even when not running that test in isolation.


---




[GitHub] spark issue #20839: [SPARK-23699][PYTHON][SQL] Raise same type of error caug...

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20839
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20839: [SPARK-23699][PYTHON][SQL] Raise same type of error caug...

2018-03-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20839
  
**[Test build #88552 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88552/testReport)**
 for PR 20839 at commit 
[`1a6be1d`](https://github.com/apache/spark/commit/1a6be1df25a41b5bdcfc0e47378a757be384efab).


---




[GitHub] spark issue #20839: [SPARK-23699][PYTHON][SQL] Raise same type of error caug...

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20839
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1731/
Test PASSed.


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20208
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88548/
Test PASSed.


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20208
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-03-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20208
  
**[Test build #88548 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88548/testReport)**
 for PR 20208 at commit 
[`6085986`](https://github.com/apache/spark/commit/6085986a3d0c5b00c281b2543f3bfe6ed4e1813c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20885: [SPARK-23724][SPARK-23765][SQL] Line separator fo...

2018-03-23 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20885#discussion_r176867866
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -770,12 +773,15 @@ def json(self, path, mode=None, compression=None, 
dateFormat=None, timestampForm
 formats follow the formats at 
``java.text.SimpleDateFormat``.
 This applies to timestamp type. If None is 
set, it uses the
 default value, 
``yyyy-MM-dd'T'HH:mm:ss.SSSXXX``.
+:param lineSep: defines the line separator that should be used for 
writing. If None is
+set, it uses the default value, ``\\n``.
--- End diff --

It is a method of DataFrameWriter. It writes exactly `'\n'` 


---




[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-03-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20894
  
**[Test build #88551 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88551/testReport)**
 for PR 20894 at commit 
[`811df6f`](https://github.com/apache/spark/commit/811df6fa7b17ff12bdd70318cf330a0f54815397).


---




[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20894
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-03-23 Thread MaxGekk
GitHub user MaxGekk opened a pull request:

https://github.com/apache/spark/pull/20894

[SPARK-23786][SQL] Checking column names of csv headers

## What changes were proposed in this pull request?

Currently, the column names in CSV file headers are not checked against the 
provided schema of the CSV data. This can cause errors like those shown in 
[SPARK-23786](https://issues.apache.org/jira/browse/SPARK-23786). I introduced a 
new CSV option, `checkHeader` (`true` by default), which enables checking of 
column names against the schema's fields. The check is performed while 
processing the first partition of the CSV files. If the names do not match, the 
following exception is thrown:

```
java.lang.IllegalArgumentException: Fields in the header of csv file are 
not matched to field names of the schema:
 Header: depth, temperature
 Schema: temperature, depth
``` 
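The check itself amounts to comparing two name lists. A minimal Python sketch of that logic (not the Scala implementation in this PR; the function name and signature are illustrative):

```python
def check_csv_header(header_columns, schema_fields, check_header=True):
    """Raise if CSV header names do not match the schema's field names."""
    if not check_header:
        return  # the checkHeader option is disabled
    if list(header_columns) != list(schema_fields):
        raise ValueError(
            "Fields in the header of csv file are not matched to field names "
            "of the schema:\n Header: %s\n Schema: %s"
            % (", ".join(header_columns), ", ".join(schema_fields)))
```

A header of `depth, temperature` against a schema of `temperature, depth` would raise the error quoted above; disabling the check skips the comparison entirely.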

## How was this patch tested?

The changes were tested by the existing tests in CSVSuite and by two new tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MaxGekk/spark-1 check-column-names

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20894.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20894


commit 112ce2d34d0d039711777351c1ab8e74629fc8e6
Author: Maxim Gekk 
Date:   2018-03-20T15:30:44Z

Checks column names are compatible to provided schema

commit a85ccce23c3c5ee69ff321303ad830c71dd05931
Author: Maxim Gekk 
Date:   2018-03-20T20:51:03Z

Checking header is matched to schema in per-line mode

commit 75e15345b6a5a9e807375fdf465dccfce4ea62c7
Author: Maxim Gekk 
Date:   2018-03-20T21:36:56Z

Extract header and check that it is matched to schema

commit 8eb45b8b634ba2c9b641de12e09f17c63240ccc4
Author: Maxim Gekk 
Date:   2018-03-21T10:57:30Z

Checking column names in header in multiLine mode

commit 9b1a9862531b8d3fb3cffce75126413ca9a844b9
Author: Maxim Gekk 
Date:   2018-03-21T11:13:17Z

Adding the checkHeader option with true by default

commit 64426332b2ab42a1cd9c5a05a77e90332572bbec
Author: Maxim Gekk 
Date:   2018-03-21T11:25:31Z

Fix csv test by changing headers or disabling header checking

commit 9440d8a5c097a1d8e111b397fbda9e54751b7a84
Author: Maxim Gekk 
Date:   2018-03-21T11:36:21Z

Adding comment for the checkHeader option

commit 9f91ce73c5c313a9c51067a81e395e9385016ec5
Author: Maxim Gekk 
Date:   2018-03-21T11:42:48Z

Added comments

commit 0878f7aad3c074e63ac3ab1d6e471ce8b988f278
Author: Maxim Gekk 
Date:   2018-03-21T12:09:20Z

Adding a space between column names

commit a341dd79c976df59fc8bffb272449973a09b86fe
Author: Maxim Gekk 
Date:   2018-03-21T15:15:14Z

Fix a test: checking name duplication in schemas

commit 98c27eaa80cf3fae11092d78f22122688e4041a4
Author: Maxim Gekk 
Date:   2018-03-23T21:04:57Z

Fixing the test and adding ticket number to test's title

commit 811df6fa7b17ff12bdd70318cf330a0f54815397
Author: Maxim Gekk 
Date:   2018-03-23T21:10:20Z

Refactoring - removing unneeded parameter




---




[GitHub] spark issue #20893: [SPARK-23785][LAUNCHER] LauncherBackend doesn't check st...

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20893
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #20893: [SPARK-23785][LAUNCHER] LauncherBackend doesn't check st...

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20893
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #20893: [SPARK-23785][LAUNCHER] LauncherBackend doesn't c...

2018-03-23 Thread sahilTakiar
GitHub user sahilTakiar opened a pull request:

https://github.com/apache/spark/pull/20893

[SPARK-23785][LAUNCHER] LauncherBackend doesn't check state of connection 
before setting state

## What changes were proposed in this pull request?

Changed the `LauncherBackend` `set` method so that it checks whether the 
connection is open before writing to it (using `isConnected`).
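The guard is a small check-state-before-write pattern. A hedged sketch in Python (the class and method names mirror the description above, not the actual Java source):

```python
class LauncherBackendSketch:
    """Illustration of checking connection state before sending an update."""

    def __init__(self):
        self._connected = True
        self.sent = []  # stands in for writes to the launcher socket

    def is_connected(self):
        return self._connected

    def close(self):
        self._connected = False

    def set_state(self, state):
        # Only write if the connection is still open, mirroring the
        # isConnected check this PR adds.
        if self.is_connected():
            self.sent.append(state)
```

Without the guard, a state update arriving after the connection closed would attempt a write on a dead socket; with it, the update is simply dropped.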

## How was this patch tested?

None

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sahilTakiar/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20893.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20893


commit 1896236c9b79975e8add4017eb7fafc1cec59d70
Author: Sahil Takiar 
Date:   2018-03-23T20:24:21Z

[SPARK-23785][LAUNCHER] LauncherBackend doesn't check state of connection 
before setting state




---




[GitHub] spark pull request #19041: [SPARK-21097][CORE] Add option to recover cached ...

2018-03-23 Thread brad-kaiser
Github user brad-kaiser commented on a diff in the pull request:

https://github.com/apache/spark/pull/19041#discussion_r176854060
  
--- Diff: 
core/src/test/scala/org/apache/spark/CacheRecoveryManagerSuite.scala ---
@@ -0,0 +1,200 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import java.util.concurrent.{ConcurrentHashMap, TimeUnit}
+import java.util.concurrent.atomic.AtomicInteger
+
+import scala.concurrent.{Future, Promise}
+import scala.concurrent.ExecutionContext.Implicits.global
+import scala.concurrent.duration.Duration
+import scala.reflect.ClassTag
+
+import org.mockito.Mockito._
+import org.scalatest.Matchers
+import org.scalatest.concurrent.Eventually
+import org.scalatest.mockito.MockitoSugar
+import org.scalatest.time.{Millis, Span}
+
+import 
org.apache.spark.internal.config.DYN_ALLOCATION_CACHE_RECOVERY_TIMEOUT
+import org.apache.spark.rpc._
+import org.apache.spark.storage.{BlockId, BlockManagerId, RDDBlockId}
+import org.apache.spark.storage.BlockManagerMessages._
+import org.apache.spark.util.ThreadUtils
+
+class CacheRecoveryManagerSuite
+  extends SparkFunSuite with MockitoSugar with Matchers with Eventually {
+
+  val oneGB: Long = 1024L * 1024L * 1024L
+  val plentyOfMem = Map(
+BlockManagerId("1", "host", 12, None) -> ((oneGB, oneGB)),
+BlockManagerId("2", "host", 12, None) -> ((oneGB, oneGB)),
+BlockManagerId("3", "host", 12, None) -> ((oneGB, oneGB)))
+
+  test("CacheRecoveryManager will replicate blocks until empty and then 
kill executor") {
+val conf = new SparkConf()
+val eam = mock[ExecutorAllocationManager]
+val blocks = Seq(RDDBlockId(1, 1), RDDBlockId(2, 1))
+val bmme = FakeBMM(1, blocks.iterator, plentyOfMem)
+val bmmeRef = DummyRef(bmme)
+val cacheRecoveryManager = new CacheRecoveryManager(bmmeRef, eam, conf)
+when(eam.killExecutors(Seq("1"))).thenReturn(Seq("1"))
+val result = cacheRecoveryManager.startCacheRecovery(Seq("1"))
+
+eventually {
+  verify(eam).killExecutors(Seq("1"))
+  bmme.replicated.get("1").get shouldBe 2
+}
+
+cleanup(result, cacheRecoveryManager)
+  }
+
+  test("CacheRecoveryManager will kill executor if it takes too long to 
replicate") {
+val conf = new 
SparkConf().set(DYN_ALLOCATION_CACHE_RECOVERY_TIMEOUT.key, "1s")
+val eam = mock[ExecutorAllocationManager]
+val blocks = Set(RDDBlockId(1, 1), RDDBlockId(2, 1), RDDBlockId(3, 1), 
RDDBlockId(4, 1))
+val bmme = FakeBMM(600, blocks.iterator, plentyOfMem)
+val bmmeRef = DummyRef(bmme)
+val cacheRecoveryManager = new CacheRecoveryManager(bmmeRef, eam, conf)
+val result = cacheRecoveryManager.startCacheRecovery(Seq("1"))
+
+eventually(timeout(Span(1010, Millis)), interval(Span(500, Millis))) {
+  verify(eam, times(1)).killExecutors(Seq("1"))
+  bmme.replicated.get("1").get shouldBe 1
+}
+
+cleanup(result, cacheRecoveryManager)
+  }
+
+  test("shutdown timer will get cancelled if replication finishes") {
+val conf = new 
SparkConf().set(DYN_ALLOCATION_CACHE_RECOVERY_TIMEOUT.key, "1s")
+val eam = mock[ExecutorAllocationManager]
+val blocks = Set(RDDBlockId(1, 1))
+val bmme = FakeBMM(1, blocks.iterator, plentyOfMem)
+val bmmeRef = DummyRef(bmme)
+val cacheRecoveryManager = new CacheRecoveryManager(bmmeRef, eam, conf)
+
+val result = cacheRecoveryManager.startCacheRecovery(Seq("1"))
+val minimumTime = System.currentTimeMillis() + 1000
+
+eventually(timeout(Span(1500, Millis)), interval(Span(500, Millis))) {
+  // should be killed once not twice
+  verify(eam, times(1)).killExecutors(Seq("1"))
+  // wait at least a second to be sure we don't kill executor twice
+  System.currentTimeMillis() should be > minimumTime
+}
+
+

[GitHub] spark pull request #19041: [SPARK-21097][CORE] Add option to recover cached ...

2018-03-23 Thread brad-kaiser
Github user brad-kaiser commented on a diff in the pull request:

https://github.com/apache/spark/pull/19041#discussion_r176854000
  
--- Diff: 
core/src/test/scala/org/apache/spark/CacheRecoveryManagerSuite.scala ---
@@ -0,0 +1,200 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import java.util.concurrent.{ConcurrentHashMap, TimeUnit}
+import java.util.concurrent.atomic.AtomicInteger
+
+import scala.concurrent.{Future, Promise}
+import scala.concurrent.ExecutionContext.Implicits.global
+import scala.concurrent.duration.Duration
+import scala.reflect.ClassTag
+
+import org.mockito.Mockito._
+import org.scalatest.Matchers
+import org.scalatest.concurrent.Eventually
+import org.scalatest.mockito.MockitoSugar
+import org.scalatest.time.{Millis, Span}
+
+import 
org.apache.spark.internal.config.DYN_ALLOCATION_CACHE_RECOVERY_TIMEOUT
+import org.apache.spark.rpc._
+import org.apache.spark.storage.{BlockId, BlockManagerId, RDDBlockId}
+import org.apache.spark.storage.BlockManagerMessages._
+import org.apache.spark.util.ThreadUtils
+
+class CacheRecoveryManagerSuite
+  extends SparkFunSuite with MockitoSugar with Matchers with Eventually {
+
+  val oneGB: Long = 1024L * 1024L * 1024L
+  val plentyOfMem = Map(
+BlockManagerId("1", "host", 12, None) -> ((oneGB, oneGB)),
+BlockManagerId("2", "host", 12, None) -> ((oneGB, oneGB)),
+BlockManagerId("3", "host", 12, None) -> ((oneGB, oneGB)))
+
+  test("CacheRecoveryManager will replicate blocks until empty and then 
kill executor") {
+val conf = new SparkConf()
+val eam = mock[ExecutorAllocationManager]
+val blocks = Seq(RDDBlockId(1, 1), RDDBlockId(2, 1))
+val bmme = FakeBMM(1, blocks.iterator, plentyOfMem)
+val bmmeRef = DummyRef(bmme)
+val cacheRecoveryManager = new CacheRecoveryManager(bmmeRef, eam, conf)
+when(eam.killExecutors(Seq("1"))).thenReturn(Seq("1"))
+val result = cacheRecoveryManager.startCacheRecovery(Seq("1"))
+
+eventually {
+  verify(eam).killExecutors(Seq("1"))
+  bmme.replicated.get("1").get shouldBe 2
+}
+
+cleanup(result, cacheRecoveryManager)
+  }
+
+  test("CacheRecoveryManager will kill executor if it takes too long to 
replicate") {
+val conf = new 
SparkConf().set(DYN_ALLOCATION_CACHE_RECOVERY_TIMEOUT.key, "1s")
+val eam = mock[ExecutorAllocationManager]
+val blocks = Set(RDDBlockId(1, 1), RDDBlockId(2, 1), RDDBlockId(3, 1), 
RDDBlockId(4, 1))
+val bmme = FakeBMM(600, blocks.iterator, plentyOfMem)
+val bmmeRef = DummyRef(bmme)
+val cacheRecoveryManager = new CacheRecoveryManager(bmmeRef, eam, conf)
+val result = cacheRecoveryManager.startCacheRecovery(Seq("1"))
+
+eventually(timeout(Span(1010, Millis)), interval(Span(500, Millis))) {
+  verify(eam, times(1)).killExecutors(Seq("1"))
+  bmme.replicated.get("1").get shouldBe 1
+}
+
+cleanup(result, cacheRecoveryManager)
+  }
+
+  test("shutdown timer will get cancelled if replication finishes") {
+val conf = new 
SparkConf().set(DYN_ALLOCATION_CACHE_RECOVERY_TIMEOUT.key, "1s")
+val eam = mock[ExecutorAllocationManager]
+val blocks = Set(RDDBlockId(1, 1))
+val bmme = FakeBMM(1, blocks.iterator, plentyOfMem)
+val bmmeRef = DummyRef(bmme)
+val cacheRecoveryManager = new CacheRecoveryManager(bmmeRef, eam, conf)
+
+val result = cacheRecoveryManager.startCacheRecovery(Seq("1"))
--- End diff --

Updated how the futures returned from `.startCacheRecovery` work, so now I 
can just check the future return values. This is fixed.


---




[GitHub] spark pull request #19041: [SPARK-21097][CORE] Add option to recover cached ...

2018-03-23 Thread brad-kaiser
Github user brad-kaiser commented on a diff in the pull request:

https://github.com/apache/spark/pull/19041#discussion_r176854025
  
--- Diff: 
core/src/test/scala/org/apache/spark/CacheRecoveryManagerSuite.scala ---
@@ -0,0 +1,200 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import java.util.concurrent.{ConcurrentHashMap, TimeUnit}
+import java.util.concurrent.atomic.AtomicInteger
+
+import scala.concurrent.{Future, Promise}
+import scala.concurrent.ExecutionContext.Implicits.global
+import scala.concurrent.duration.Duration
+import scala.reflect.ClassTag
+
+import org.mockito.Mockito._
+import org.scalatest.Matchers
+import org.scalatest.concurrent.Eventually
+import org.scalatest.mockito.MockitoSugar
+import org.scalatest.time.{Millis, Span}
+
+import 
org.apache.spark.internal.config.DYN_ALLOCATION_CACHE_RECOVERY_TIMEOUT
+import org.apache.spark.rpc._
+import org.apache.spark.storage.{BlockId, BlockManagerId, RDDBlockId}
+import org.apache.spark.storage.BlockManagerMessages._
+import org.apache.spark.util.ThreadUtils
+
+class CacheRecoveryManagerSuite
+  extends SparkFunSuite with MockitoSugar with Matchers with Eventually {
+
+  val oneGB: Long = 1024L * 1024L * 1024L
+  val plentyOfMem = Map(
+BlockManagerId("1", "host", 12, None) -> ((oneGB, oneGB)),
+BlockManagerId("2", "host", 12, None) -> ((oneGB, oneGB)),
+BlockManagerId("3", "host", 12, None) -> ((oneGB, oneGB)))
+
+  test("CacheRecoveryManager will replicate blocks until empty and then 
kill executor") {
+val conf = new SparkConf()
+val eam = mock[ExecutorAllocationManager]
+val blocks = Seq(RDDBlockId(1, 1), RDDBlockId(2, 1))
+val bmme = FakeBMM(1, blocks.iterator, plentyOfMem)
+val bmmeRef = DummyRef(bmme)
+val cacheRecoveryManager = new CacheRecoveryManager(bmmeRef, eam, conf)
+when(eam.killExecutors(Seq("1"))).thenReturn(Seq("1"))
+val result = cacheRecoveryManager.startCacheRecovery(Seq("1"))
+
+eventually {
+  verify(eam).killExecutors(Seq("1"))
+  bmme.replicated.get("1").get shouldBe 2
+}
+
+cleanup(result, cacheRecoveryManager)
+  }
+
+  test("CacheRecoveryManager will kill executor if it takes too long to 
replicate") {
+val conf = new 
SparkConf().set(DYN_ALLOCATION_CACHE_RECOVERY_TIMEOUT.key, "1s")
+val eam = mock[ExecutorAllocationManager]
+val blocks = Set(RDDBlockId(1, 1), RDDBlockId(2, 1), RDDBlockId(3, 1), 
RDDBlockId(4, 1))
+val bmme = FakeBMM(600, blocks.iterator, plentyOfMem)
+val bmmeRef = DummyRef(bmme)
+val cacheRecoveryManager = new CacheRecoveryManager(bmmeRef, eam, conf)
+val result = cacheRecoveryManager.startCacheRecovery(Seq("1"))
+
+eventually(timeout(Span(1010, Millis)), interval(Span(500, Millis))) {
+  verify(eam, times(1)).killExecutors(Seq("1"))
+  bmme.replicated.get("1").get shouldBe 1
+}
+
+cleanup(result, cacheRecoveryManager)
+  }
+
+  test("shutdown timer will get cancelled if replication finishes") {
+val conf = new 
SparkConf().set(DYN_ALLOCATION_CACHE_RECOVERY_TIMEOUT.key, "1s")
+val eam = mock[ExecutorAllocationManager]
+val blocks = Set(RDDBlockId(1, 1))
+val bmme = FakeBMM(1, blocks.iterator, plentyOfMem)
+val bmmeRef = DummyRef(bmme)
+val cacheRecoveryManager = new CacheRecoveryManager(bmmeRef, eam, conf)
+
+val result = cacheRecoveryManager.startCacheRecovery(Seq("1"))
+val minimumTime = System.currentTimeMillis() + 1000
+
+eventually(timeout(Span(1500, Millis)), interval(Span(500, Millis))) {
+  // should be killed once not twice
+  verify(eam, times(1)).killExecutors(Seq("1"))
+  // wait at least a second to be sure we don't kill executor twice
+  System.currentTimeMillis() should be > minimumTime
+}
+
+

[GitHub] spark issue #19041: [SPARK-21097][CORE] Add option to recover cached data

2018-03-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19041
  
**[Test build #88550 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88550/testReport)**
 for PR 19041 at commit 
[`c79b68f`](https://github.com/apache/spark/commit/c79b68f8b22e5f0137f5c3431dfc1b124bad3d77).


---




[GitHub] spark issue #20892: [SPARK-23700][PYTHON] Cleanup imports in pyspark.sql

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20892
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20892: [SPARK-23700][PYTHON] Cleanup imports in pyspark.sql

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20892
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88549/
Test PASSed.


---




[GitHub] spark issue #20892: [SPARK-23700][PYTHON] Cleanup imports in pyspark.sql

2018-03-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20892
  
**[Test build #88549 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88549/testReport)**
 for PR 20892 at commit 
[`5214f41`](https://github.com/apache/spark/commit/5214f411d28a19b244a97ffe25f8be5852e273c1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20858: [SPARK-23736][SQL] Implementation of the concat_a...

2018-03-23 Thread mn-mikke
Github user mn-mikke commented on a diff in the pull request:

https://github.com/apache/spark/pull/20858#discussion_r176847009
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
 ---
@@ -699,3 +699,88 @@ abstract class TernaryExpression extends Expression {
  * and Hive function wrappers.
  */
 trait UserDefinedExpression
+
+/**
+ * The trait covers logic for performing null safe evaluation and code generation.
+ */
+trait NullSafeEvaluation extends Expression
+{
+  override def foldable: Boolean = children.forall(_.foldable)
+
+  override def nullable: Boolean = children.exists(_.nullable)
+
+  /**
+   * Default behavior of evaluation according to the default nullability 
of NullSafeEvaluation.
+   * If a class utilizing NullSafeEvaluation overrides [[nullable]], it probably should also
+   * override this.
+   */
+  override def eval(input: InternalRow): Any =
+  {
+val values = children.map(_.eval(input))
+if (values.contains(null)) null
+else nullSafeEval(values)
+  }
+
+  /**
+   * Called by the default [[eval]] implementation. If a class utilizing NullSafeEvaluation keeps
+   * the default nullability, it can override this method to save null-check code. If we need
+   * full control of the evaluation process, we should override [[eval]].
+   */
+  protected def nullSafeEval(inputs: Seq[Any]): Any =
+    sys.error("The class utilizing NullSafeEvaluation must override either eval or nullSafeEval")
+
+  /**
+   * Shorthand for generating null safe evaluation code.
computation
+   * is assumed to be null.
+   *
+   * @param f accepts a sequence of variable names and returns Java code 
to compute the output.
+   */
+  protected def defineCodeGen(
+ctx: CodegenContext,
+ev: ExprCode,
+f: Seq[String] => String): ExprCode = {
+nullSafeCodeGen(ctx, ev, values => {
+  s"${ev.value} = ${f(values)};"
+})
+  }
+
+  /**
+   * Called by expressions to generate null safe evaluation code.
+   * If either of the sub-expressions is null, the result of this 
computation
+   * is assumed to be null.
+   *
+   * @param f a function that accepts a sequence of non-null evaluation 
result names of children
+   *  and returns Java code to compute the output.
+   */
+  protected def nullSafeCodeGen(
--- End diff --

@WeichenXu123 I do agree that there are strong similarities in the code.

If you take a look at `UnaryExpression`, `BinaryExpression`, and 
`TernaryExpression`, you will see that the methods responsible for null safe 
evaluation and code generation are identical except for the number of parameters. My 
intention has been to generalize those methods into the `NullSafeEvaluation` 
trait and remove the original methods in a separate PR once the trait is in. I 
didn't want to create a big-bang PR because of one additional function in the API.
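The arity generalization described here — one null check that works for any number of children — can be sketched outside Scala in a few lines (Python used purely for illustration; the real trait operates on `InternalRow` and also generates Java code):

```python
def null_safe_eval(children, f, row):
    """N-ary null-safe evaluation: return None if any child evaluates to
    None, otherwise apply f to the list of evaluated children."""
    values = [child(row) for child in children]
    if any(v is None for v in values):
        return None
    return f(values)
```

The same function covers the unary, binary, and ternary cases that are currently implemented three separate times.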


---




[GitHub] spark issue #19876: [ML][SPARK-23783][SPARK-11239] Add PMML export to Spark ...

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19876
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88546/
Test PASSed.


---




[GitHub] spark issue #19876: [ML][SPARK-23783][SPARK-11239] Add PMML export to Spark ...

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19876
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19876: [ML][SPARK-23783][SPARK-11239] Add PMML export to Spark ...

2018-03-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19876
  
**[Test build #88546 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88546/testReport)**
 for PR 19876 at commit 
[`cb6fd70`](https://github.com/apache/spark/commit/cb6fd70d0c61b6477f7514431ee2e1c097ec0aff).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20884: [SPARK-23773][SQL] JacksonGenerator does not incl...

2018-03-23 Thread makagonov
Github user makagonov commented on a diff in the pull request:

https://github.com/apache/spark/pull/20884#discussion_r176842732
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala
 ---
@@ -56,7 +56,7 @@ class JacksonGeneratorSuite extends SparkFunSuite {
 val gen = new JacksonGenerator(dataType, writer, option)
 gen.write(input)
 gen.flush()
-assert(writer.toString === """[{}]""")
+assert(writer.toString === """[{"a":null}]""")
--- End diff --

@HyukjinKwon actually, it looks like the result should be `[null]` rather 
than `[{}]`.
Look at the following repro from spark-shell (downloaded binaries):
```scala
scala> val df = sqlContext.sql("""select array(cast(null as 
struct<k:string>)) as my_array""")
df: org.apache.spark.sql.DataFrame = [my_array: array<struct<k:string>>]

scala> df.printSchema
root
 |-- my_array: array (nullable = false)
 |    |-- element: struct (containsNull = true)
 |    |    |-- k: string (nullable = true)
scala> df.toJSON.collect().foreach(println)
{"my_array":[null]}
scala> df.select(to_json($"my_array")).collect().foreach(x => println(x(0)))
[null]
```

In older versions of `JacksonGenerator`, there was a filter on the element 
value: if it was `null`, `gen.writeNull()` was called regardless of the type 
([old 
implementation](https://github.com/apache/spark/blob/3258f27a881dfeb5ab8bae90c338603fa4b6f9d8/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JacksonGenerator.scala#L41)).
 Currently, we call `gen.writeStartObject()...gen.writeEndObject()` 
regardless of whether the value is null.

I couldn't repro this with a query, but when `StructsToJson` is called from 
this unit test, it goes through `JacksonGenerator.arrElementWriter`, which has 
these lines:
```scala
case st: StructType =>
  (arr: SpecializedGetters, i: Int) => {
    writeObject(writeFields(arr.getStruct(i, st.length), st, rootFieldWriters))
  }
```
which print a JSON object even when the element is `null`.

I'll look into this later and try to find an easy workaround.
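The missing null check described above can be sketched outside Spark. The 
following is a hypothetical, simplified Python model (the names 
`make_struct_writer`, `null_aware`, and `write_array` are illustrative, not 
Spark APIs): wrapping the element writer with a null check restores the old 
behaviour, where null elements render as JSON `null` instead of an object.

```python
# Hypothetical, simplified model of the writer chain (illustrative names,
# not actual Spark APIs): each element writer renders one array element.
import json

def make_struct_writer(field_names):
    # Renders a struct value (a list of field values) as a JSON object.
    def write_struct(values):
        fields = ('"%s":%s' % (k, json.dumps(v))
                  for k, v in zip(field_names, values))
        return "{" + ",".join(fields) + "}"
    return write_struct

def null_aware(writer):
    # The old JacksonGenerator-style null filter: emit JSON null for null
    # elements instead of delegating to the struct writer (which would
    # always produce an object).
    return lambda value: "null" if value is None else writer(value)

def write_array(elements, element_writer):
    return "[" + ",".join(element_writer(e) for e in elements) + "]"
```

With this wrapper, `write_array([None], null_aware(make_struct_writer(["k"])))` 
yields `[null]`, matching the `df.toJSON` output in the repro above.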


---




[GitHub] spark pull request #20858: [SPARK-23736][SQL] Implementation of the concat_a...

2018-03-23 Thread mn-mikke
Github user mn-mikke commented on a diff in the pull request:

https://github.com/apache/spark/pull/20858#discussion_r176841337
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
 ---
@@ -408,6 +408,7 @@ object FunctionRegistry {
 expression[MapValues]("map_values"),
 expression[Size]("size"),
 expression[SortArray]("sort_array"),
+expression[ConcatArrays]("concat_arrays"),
--- End diff --

OK, I will merge the functions into one. Do you find having one expression 
class per concatenation type OK?

I'm afraid that if I incorporate all the logic into one expression class, the 
code will become messy, since each codegen and evaluation has a different 
nature.


---




[GitHub] spark issue #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet state sho...

2018-03-23 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/18982
  
Thanks @holdenk @HyukjinKwon and @viirya !


---




[GitHub] spark issue #20892: [SPARK-23700][PYTHON] Cleanup imports in pyspark.sql

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20892
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20890: [WIP][SPARK-23779][SQL] TaskMemoryManager and UnsafeSort...

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20890
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88544/
Test PASSed.


---




[GitHub] spark issue #20890: [WIP][SPARK-23779][SQL] TaskMemoryManager and UnsafeSort...

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20890
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20892: [SPARK-23700][PYTHON] Cleanup imports in pyspark.sql

2018-03-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20892
  
**[Test build #88549 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88549/testReport)**
 for PR 20892 at commit 
[`5214f41`](https://github.com/apache/spark/commit/5214f411d28a19b244a97ffe25f8be5852e273c1).


---




[GitHub] spark issue #20892: [SPARK-23700][PYTHON] Cleanup imports in pyspark.sql

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20892
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1730/
Test PASSed.


---




[GitHub] spark issue #20890: [WIP][SPARK-23779][SQL] TaskMemoryManager and UnsafeSort...

2018-03-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20890
  
**[Test build #88544 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88544/testReport)**
 for PR 20890 at commit 
[`4a0a5e3`](https://github.com/apache/spark/commit/4a0a5e34d5efebcdf9f58d70bff8d9e46d953099).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20892: [SPARK-23700][PYTHON] Cleanup imports in pyspark.sql

2018-03-23 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/20892
  
Jenkins retest this please


---




[GitHub] spark issue #20892: [SPARK-23700][PYTHON] Cleanup imports in pyspark.sql

2018-03-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20892
  
**[Test build #88547 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88547/testReport)**
 for PR 20892 at commit 
[`5214f41`](https://github.com/apache/spark/commit/5214f411d28a19b244a97ffe25f8be5852e273c1).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20892: [SPARK-23700][PYTHON] Cleanup imports in pyspark.sql

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20892
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20892: [SPARK-23700][PYTHON] Cleanup imports in pyspark.sql

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20892
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88547/
Test FAILed.


---




[GitHub] spark issue #20839: [SPARK-23699][PYTHON][SQL] Raise same type of error caug...

2018-03-23 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/20839
  
Thanks @HyukjinKwon for reviewing! I agree with what you said about 
rephrasing the warning message; I'll try to make that sound better.


---




[GitHub] spark pull request #19876: [ML][SPARK-23783][SPARK-11239] Add PMML export to...

2018-03-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19876


---




[GitHub] spark pull request #20839: [SPARK-23699][PYTHON][SQL] Raise same type of err...

2018-03-23 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/20839#discussion_r176835709
  
--- Diff: python/pyspark/sql/utils.py ---
@@ -121,7 +121,10 @@ def require_minimum_pandas_version():
 from distutils.version import LooseVersion
 try:
 import pandas
+have_pandas = True
 except ImportError:
+have_pandas = False
+if not have_pandas:
--- End diff --

I think having the traceback point to the `raise ImportError` below provides 
all the information needed. If that happens, the only possible cause is that 
the import failed here. The problem with how it was before is that, for 
Python 3, it would print out `During handling of the above exception, another 
exception occurred:`, which makes it seem like the error is not being handled 
correctly, when it's really just a failed import.
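The flag pattern under discussion can be illustrated with a minimal 
standalone sketch (a hypothetical, simplified version, not the actual 
`pyspark.sql.utils` code; the message text is illustrative): because the 
second `ImportError` is raised outside the `except` block, Python 3 does not 
chain the original import failure onto it, so the "During handling of the 
above exception" message disappears.

```python
# Simplified sketch of the flag pattern (not the actual pyspark code):
# raising outside the except block avoids Python 3's implicit exception
# chaining, so the traceback shows a single clean ImportError.
def require_minimum_pandas_version(minimum_version="0.19.2"):
    try:
        import pandas  # noqa: F401
        have_pandas = True
    except ImportError:
        have_pandas = False
    if not have_pandas:
        # Raised after the except block has exited, so the original
        # failed import is not attached as __context__.
        raise ImportError("Pandas >= %s must be installed" % minimum_version)
```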


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-03-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20208
  
**[Test build #88548 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88548/testReport)**
 for PR 20208 at commit 
[`6085986`](https://github.com/apache/spark/commit/6085986a3d0c5b00c281b2543f3bfe6ed4e1813c).


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20208
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20208
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1729/
Test FAILed.


---




[GitHub] spark pull request #20208: [SPARK-23007][SQL][TEST] Add schema evolution tes...

2018-03-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20208#discussion_r176834608
  
--- Diff: docs/sql-programming-guide.md ---
@@ -815,6 +815,54 @@ should start with, they can set `basePath` in the data 
source options. For examp
 when `path/to/table/gender=male` is the path of the data and
 users set `basePath` to `path/to/table/`, `gender` will be a partitioning 
column.
 
+### Schema Evolution
--- End diff --

@gatorsmile . I rebased to the master and added this.


---




[GitHub] spark pull request #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet st...

2018-03-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18982


---




[GitHub] spark issue #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet state sho...

2018-03-23 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/18982
  
Merged to master.


---




[GitHub] spark issue #20892: [SPARK-23700][PYTHON] Cleanup imports in pyspark.sql

2018-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20892
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1728/
Test PASSed.


---




[GitHub] spark issue #20892: [SPARK-23700][PYTHON] Cleanup imports in pyspark.sql

2018-03-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20892
  
**[Test build #88547 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88547/testReport)**
 for PR 20892 at commit 
[`5214f41`](https://github.com/apache/spark/commit/5214f411d28a19b244a97ffe25f8be5852e273c1).


---



