[GitHub] spark issue #22479: [MINOR][PYTHON][TEST] Use collect() instead of show() to...

2018-09-19 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/22479
  
Thanks @HyukjinKwon. LGTM


---




[GitHub] spark pull request #22479: [MINOR][PYTHON][TEST] Use collect() instead of sh...

2018-09-19 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22479#discussion_r219053623
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -1168,7 +1168,7 @@ def test_simple_udt_in_df(self):
 df = self.spark.createDataFrame(
 [(i % 3, PythonOnlyPoint(float(i), float(i))) for i in 
range(10)],
 schema=schema)
-df.show()
+df.collect()
--- End diff --

LGTM


---




[GitHub] spark pull request #22483: [MINOR][PYTHON] Use a helper in `PythonUtils` ins...

2018-09-19 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/22483

[MINOR][PYTHON] Use a helper in `PythonUtils` instead of direct accessing 
Scala package

## What changes were proposed in this pull request?

This PR proposes to add a helper in `PythonUtils` instead of directly accessing the Scala package.
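
As a hedged sketch of the pattern only (the object, method, and wrapped setting below are hypothetical and not the actual change in this PR): the idea is to expose one small Scala-side entry point so that PySpark calls it through Py4J instead of navigating a Scala package object directly.

```scala
package org.apache.spark.api.python

import org.apache.spark.SparkConf

// Hypothetical helper, for illustration only. PySpark could call it via Py4J as
//   sc._jvm.org.apache.spark.api.python.ExamplePythonHelper.bufferSize(conf)
// instead of reaching into a Scala package object from the Python side.
private[spark] object ExamplePythonHelper {
  def bufferSize(conf: SparkConf): Int = conf.getInt("spark.buffer.size", 65536)
}
```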

## How was this patch tested?

Jenkins tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark minor-refactoring

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22483.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22483


commit cce9d4d1bb6c297e15dbec5b53f8ed3163e88d9c
Author: hyukjinkwon 
Date:   2018-09-20T06:34:54Z

Minor refactoring




---




[GitHub] spark issue #21596: [SPARK-24601] Update Jackson to 2.9.6

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21596
  
**[Test build #96330 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96330/testReport)**
 for PR 21596 at commit 
[`44b8d1b`](https://github.com/apache/spark/commit/44b8d1b73cf2cc83b4ebfcc11ccf12951878f2d6).


---




[GitHub] spark issue #22460: DO NOT MERGE

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22460
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22460: DO NOT MERGE

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22460
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96327/
Test FAILed.


---




[GitHub] spark pull request #22479: [MINOR][PYTHON][TEST] Use collect() instead of sh...

2018-09-19 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22479#discussion_r219052270
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -1168,7 +1168,7 @@ def test_simple_udt_in_df(self):
 df = self.spark.createDataFrame(
 [(i % 3, PythonOnlyPoint(float(i), float(i))) for i in 
range(10)],
 schema=schema)
-df.show()
+df.collect()
--- End diff --

cc @viirya since this is added 
[here](https://github.com/apache/spark/commit/146001a9ffefc7aaedd3d888d68c7a9b80bca545#diff-7c2fe8530271c0635fb99f7b49e0c4a4R583).


---




[GitHub] spark issue #22399: [SPARK-25408] Move to mode ideomatic Java8

2018-09-19 Thread Fokko
Github user Fokko commented on the issue:

https://github.com/apache/spark/pull/22399
  
@srowen Any incentive to move this forward? Or are PRs like these not appreciated? Let me know.

Most of the changes are cosmetic, but having https://github.com/apache/spark/pull/22399/files#diff-6c2c45f79666e2e52eb9f9411fa8b4baR49 makes the codebase a bit nicer in my opinion: since the class already has a `.close()` method, it makes sense to also implement the `Closeable` interface.
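
As a small illustration of that point (a Scala sketch with made-up names, not code from this PR): once a type that already has `close()` also declares `java.io.Closeable`, generic resource-handling code can manage it like any other closeable resource.

```scala
import java.io.Closeable

// Hypothetical resource for illustration: it already had a close() method,
// so also declaring Closeable costs nothing and lets generic cleanup code use it.
class ExampleResource extends Closeable {
  private var open = true
  override def close(): Unit = { open = false }
  def isOpen: Boolean = open
}

object ExampleResourceDemo {
  // Works for any Closeable, including ExampleResource.
  def withResource[R <: Closeable, T](resource: R)(body: R => T): T =
    try body(resource) finally resource.close()

  def main(args: Array[String]): Unit = {
    val usedWhileOpen = withResource(new ExampleResource)(_.isOpen)
    println(s"used while open: $usedWhileOpen") // prints "used while open: true"
  }
}
```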


---




[GitHub] spark issue #22443: [SPARK-25339][TEST] Refactor FilterPushdownBenchmark

2018-09-19 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22443
  
Thank you, @gengliangwang !


---




[GitHub] spark issue #22408: [SPARK-25417][SQL] ArrayContains function may return inc...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22408
  
LGTM


---




[GitHub] spark issue #22443: [SPARK-25339][TEST] Refactor FilterPushdownBenchmark

2018-09-19 Thread gengliangwang
Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/22443
  
@dongjoon-hyun No problem. I was waiting for this PR to be merged.


---




[GitHub] spark issue #22460: DO NOT MERGE

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22460
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96310/
Test FAILed.


---




[GitHub] spark issue #22460: DO NOT MERGE

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22460
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22460: DO NOT MERGE

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22460
  
**[Test build #96310 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96310/testReport)**
 for PR 22460 at commit 
[`09baf06`](https://github.com/apache/spark/commit/09baf06505f9da34cdcccdffcc1a4061ed825f44).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22460: DO NOT MERGE

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22460
  
**[Test build #4344 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4344/testReport)**
 for PR 22460 at commit 
[`4106040`](https://github.com/apache/spark/commit/410604012cbd1c9e7c284a1e05f95b3827c728a5).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22467: [SPARK-25465][TEST] Refactor Parquet test suites in proj...

2018-09-19 Thread gengliangwang
Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/22467
  
@xuanyuanking Thanks for the review!


---




[GitHub] spark issue #22408: [SPARK-25417][SQL] ArrayContains function may return inc...

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22408
  
**[Test build #96329 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96329/testReport)**
 for PR 22408 at commit 
[`d79e9d4`](https://github.com/apache/spark/commit/d79e9d46bca28c721887625b89814e91e923e7ca).


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22482
  
**[Test build #96328 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96328/testReport)**
 for PR 22482 at commit 
[`ad0b746`](https://github.com/apache/spark/commit/ad0b7466ef3f79354a99bd1b95c23e4c308502d5).


---




[GitHub] spark issue #22460: DO NOT MERGE

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22460
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3283/
Test PASSed.


---




[GitHub] spark issue #22408: [SPARK-25417][SQL] ArrayContains function may return inc...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22408
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22408: [SPARK-25417][SQL] ArrayContains function may return inc...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22408
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3284/
Test PASSed.


---




[GitHub] spark issue #22460: DO NOT MERGE

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22460
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22460: DO NOT MERGE

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22460
  
**[Test build #96327 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96327/testReport)**
 for PR 22460 at commit 
[`3bdb38a`](https://github.com/apache/spark/commit/3bdb38aec74b08b135aa5976982c20f74aae9736).


---




[GitHub] spark issue #22467: [SPARK-25465][TEST] Refactor Parquet test suites in proj...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22467
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22467: [SPARK-25465][TEST] Refactor Parquet test suites in proj...

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22467
  
**[Test build #96326 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96326/testReport)**
 for PR 22467 at commit 
[`11d61a4`](https://github.com/apache/spark/commit/11d61a414ee41449feb2db744657696d79db5560).


---




[GitHub] spark issue #22467: [SPARK-25465][TEST] Refactor Parquet test suites in proj...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22467
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3282/
Test PASSed.


---




[GitHub] spark issue #22462: [SPARK-25460][SS] DataSourceV2: SS sources do not respec...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22462
  
LGTM


---




[GitHub] spark pull request #22408: [SPARK-25417][SQL] ArrayContains function may ret...

2018-09-19 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22408#discussion_r219039607
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -735,6 +735,60 @@ class DataFrameFunctionsSuite extends QueryTest with 
SharedSQLContext {
   df.selectExpr("array_contains(array(1, null), array(1, null)[0])"),
   Seq(Row(true), Row(true))
 )
+
+checkAnswer(
+  df.selectExpr("array_contains(array(1), 1.23D)"),
+  Seq(Row(false), Row(false))
+)
+
+checkAnswer(
+  df.selectExpr("array_contains(array(1), 1.0D)"),
+  Seq(Row(true), Row(true))
+)
+
+checkAnswer(
+  df.selectExpr("array_contains(array(1.0D), 1)"),
+  Seq(Row(true), Row(true))
+)
+
+checkAnswer(
+  df.selectExpr("array_contains(array(1.23D), 1)"),
+  Seq(Row(false), Row(false))
+)
+
+checkAnswer(
+  df.selectExpr("array_contains(array(array(1)), array(1.0D))"),
+  Seq(Row(true), Row(true))
+)
+
+checkAnswer(
+  df.selectExpr("array_contains(array(array(1)), array(1.23D))"),
+  Seq(Row(false), Row(false))
+)
+
+checkAnswer(
+  df.selectExpr("array_contains(array(array(1)), array(1.23))"),
--- End diff --

@cloud-fan Yes, it should :-) I think I had changed this test case to verify the fix to tightestCommonType and pushed it by mistake. Sorry about that.


---




[GitHub] spark issue #22462: [SPARK-25460][SS] DataSourceV2: SS sources do not respec...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22462
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3281/
Test PASSed.


---




[GitHub] spark issue #22462: [SPARK-25460][SS] DataSourceV2: SS sources do not respec...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22462
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22462: [SPARK-25460][SS] DataSourceV2: SS sources do not respec...

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22462
  
**[Test build #96325 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96325/testReport)**
 for PR 22462 at commit 
[`897cf69`](https://github.com/apache/spark/commit/897cf69a4b3c6eb07eb321c23644167c1bed211b).


---




[GitHub] spark pull request #22408: [SPARK-25417][SQL] ArrayContains function may ret...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22408#discussion_r219039187
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1879,6 +1879,66 @@ working with timestamps in `pandas_udf`s to get the 
best performance, see
 
 ## Upgrading From Spark SQL 2.3 to 2.4
 
+  - In Spark version 2.3 and earlier, the second parameter to 
array_contains function is implicitly promoted to the element type of first 
array type parameter. This type promotion can be lossy and may cause 
`array_contains` function to return wrong result. This problem has been 
addressed in 2.4 by employing a safer type promotion mechanism. This can cause 
some change in behavior and are illustrated in the table below.
+  Query: SELECT array_contains(array(1), 1.34D);
+    Result Spark 2.3 or Prior: true
+    Result Spark 2.4: false
+    Remarks: In Spark 2.4, both left and right parameters are promoted to array(double) and double type respectively.
+
+  Query: SELECT array_contains(array(1), '1');
+    Result Spark 2.3 or Prior: true
+    Result Spark 2.4: AnalysisException is thrown since integer type can not be promoted to string type in a loss-less manner.
--- End diff --

Ah then it's fine, we don't need to change anything here.


---




[GitHub] spark pull request #22408: [SPARK-25417][SQL] ArrayContains function may ret...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22408#discussion_r219038798
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -735,6 +735,60 @@ class DataFrameFunctionsSuite extends QueryTest with 
SharedSQLContext {
   df.selectExpr("array_contains(array(1, null), array(1, null)[0])"),
   Seq(Row(true), Row(true))
 )
+
+checkAnswer(
+  df.selectExpr("array_contains(array(1), 1.23D)"),
+  Seq(Row(false), Row(false))
+)
+
+checkAnswer(
+  df.selectExpr("array_contains(array(1), 1.0D)"),
+  Seq(Row(true), Row(true))
+)
+
+checkAnswer(
+  df.selectExpr("array_contains(array(1.0D), 1)"),
+  Seq(Row(true), Row(true))
+)
+
+checkAnswer(
+  df.selectExpr("array_contains(array(1.23D), 1)"),
+  Seq(Row(false), Row(false))
+)
+
+checkAnswer(
+  df.selectExpr("array_contains(array(array(1)), array(1.0D))"),
+  Seq(Row(true), Row(true))
+)
+
+checkAnswer(
+  df.selectExpr("array_contains(array(array(1)), array(1.23D))"),
+  Seq(Row(false), Row(false))
+)
+
+checkAnswer(
+  df.selectExpr("array_contains(array(array(1)), array(1.23))"),
--- End diff --

hmm? shouldn't this fail because of the bug?


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22482
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22482
  
**[Test build #96324 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96324/testReport)**
 for PR 22482 at commit 
[`7d8371c`](https://github.com/apache/spark/commit/7d8371c34fe275ba3186dc97d0844cfd90ba06ed).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22482
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96324/
Test FAILed.


---




[GitHub] spark issue #22443: [SPARK-25339][TEST] Refactor FilterPushdownBenchmark

2018-09-19 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22443
  
@gengliangwang, SPARK-25475 has been created as described above. Could you revise https://github.com/apache/spark/pull/22451 in order to print the output as a separate file, like this PR does?


---




[GitHub] spark issue #22475: [SPARK-4502][SQL] Rename to spark.sql.optimizer.nestedSc...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22475
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96309/
Test PASSed.


---




[GitHub] spark issue #22475: [SPARK-4502][SQL] Rename to spark.sql.optimizer.nestedSc...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22475
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #22408: [SPARK-25417][SQL] ArrayContains function may ret...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22408#discussion_r219038350
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -735,6 +735,60 @@ class DataFrameFunctionsSuite extends QueryTest with 
SharedSQLContext {
   df.selectExpr("array_contains(array(1, null), array(1, null)[0])"),
   Seq(Row(true), Row(true))
 )
+
+checkAnswer(
+  df.selectExpr("array_contains(array(1), 1.23D)"),
--- End diff --

this query doesn't read any data from `df`, so the 2 result rows are always the same. Can we use `OneRowRelation` here?
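
For context, a minimal standalone sketch of what the suggestion points at (using a plain local SparkSession instead of the suite's checkAnswer helper): a literal-only SELECT has no FROM clause, so it is planned over OneRowRelation and returns exactly one row regardless of `df`.

```scala
import org.apache.spark.sql.{Row, SparkSession}

object OneRowRelationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("one-row-relation-sketch").getOrCreate()
    // No FROM clause: the query is planned over OneRowRelation and yields a single row.
    val result = spark.sql("SELECT array_contains(array(1), 1.23D)").collect()
    assert(result.sameElements(Array(Row(false))))
    spark.stop()
  }
}
```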


---




[GitHub] spark issue #22475: [SPARK-4502][SQL] Rename to spark.sql.optimizer.nestedSc...

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22475
  
**[Test build #96309 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96309/testReport)**
 for PR 22475 at commit 
[`5159883`](https://github.com/apache/spark/commit/5159883f5b4a65ac8ecec8b0368e172680aa6897).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22482
  
**[Test build #96324 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96324/testReport)**
 for PR 22482 at commit 
[`7d8371c`](https://github.com/apache/spark/commit/7d8371c34fe275ba3186dc97d0844cfd90ba06ed).


---




[GitHub] spark issue #22443: [SPARK-25339][TEST] Refactor FilterPushdownBenchmark

2018-09-19 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22443
  
https://issues.apache.org/jira/browse/SPARK-25475 is created.


---




[GitHub] spark pull request #22408: [SPARK-25417][SQL] ArrayContains function may ret...

2018-09-19 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22408#discussion_r219037732
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1879,6 +1879,66 @@ working with timestamps in `pandas_udf`s to get the 
best performance, see
 
 ## Upgrading From Spark SQL 2.3 to 2.4
 
+  - In Spark version 2.3 and earlier, the second parameter to 
array_contains function is implicitly promoted to the element type of first 
array type parameter. This type promotion can be lossy and may cause 
`array_contains` function to return wrong result. This problem has been 
addressed in 2.4 by employing a safer type promotion mechanism. This can cause 
some change in behavior and are illustrated in the table below.
+  Query: SELECT array_contains(array(1), 1.34D);
+    Result Spark 2.3 or Prior: true
+    Result Spark 2.4: false
+    Remarks: In Spark 2.4, both left and right parameters are promoted to array(double) and double type respectively.
+
+  Query: SELECT array_contains(array(1), '1');
+    Result Spark 2.3 or Prior: true
+    Result Spark 2.4: AnalysisException is thrown since integer type can not be promoted to string type in a loss-less manner.
--- End diff --

@cloud-fan Yeah, Presto gives an error. Please refer to my earlier comment showing the Presto output. Do you want anything to change in the description?


---




[GitHub] spark pull request #22408: [SPARK-25417][SQL] ArrayContains function may ret...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22408#discussion_r219037301
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1879,6 +1879,66 @@ working with timestamps in `pandas_udf`s to get the 
best performance, see
 
 ## Upgrading From Spark SQL 2.3 to 2.4
 
+  - In Spark version 2.3 and earlier, the second parameter to 
array_contains function is implicitly promoted to the element type of first 
array type parameter. This type promotion can be lossy and may cause 
`array_contains` function to return wrong result. This problem has been 
addressed in 2.4 by employing a safer type promotion mechanism. This can cause 
some change in behavior and are illustrated in the table below.
+  Query: SELECT array_contains(array(1), 1.34D);
+    Result Spark 2.3 or Prior: true
+    Result Spark 2.4: false
+    Remarks: In Spark 2.4, both left and right parameters are promoted to array(double) and double type respectively.
+
+  Query: SELECT array_contains(array(1), '1');
+    Result Spark 2.3 or Prior: true
+    Result Spark 2.4: AnalysisException is thrown since integer type can not be promoted to string type in a loss-less manner.
--- End diff --

If Presto doesn't do it, we should follow it.


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22482
  
**[Test build #96323 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96323/testReport)**
 for PR 22482 at commit 
[`0072ebe`](https://github.com/apache/spark/commit/0072ebe1a46ff9d1230e18b33ca22c2f32cfb958).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22482
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22482
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96323/
Test FAILed.


---




[GitHub] spark pull request #22408: [SPARK-25417][SQL] ArrayContains function may ret...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22408#discussion_r219037194
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1879,6 +1879,66 @@ working with timestamps in `pandas_udf`s to get the 
best performance, see
 
 ## Upgrading From Spark SQL 2.3 to 2.4
 
+  - In Spark version 2.3 and earlier, the second parameter to 
array_contains function is implicitly promoted to the element type of first 
array type parameter. This type promotion can be lossy and may cause 
`array_contains` function to return wrong result. This problem has been 
addressed in 2.4 by employing a safer type promotion mechanism. This can cause 
some change in behavior and are illustrated in the table below.
+  Query: SELECT array_contains(array(1), 1.34D);
+    Result Spark 2.3 or Prior: true
+    Result Spark 2.4: false
+    Remarks: In Spark 2.4, both left and right parameters are promoted to array(double) and double type respectively.
+
+  Query: SELECT array_contains(array(1), '1');
+    Result Spark 2.3 or Prior: true
+    Result Spark 2.4: AnalysisException is thrown since integer type can not be promoted to string type in a loss-less manner.
--- End diff --

We can promote `int` to `string`, but I'm not sure that's common behavior in other databases.


---




[GitHub] spark pull request #22408: [SPARK-25417][SQL] ArrayContains function may ret...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22408#discussion_r219037086
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1879,6 +1879,66 @@ working with timestamps in `pandas_udf`s to get the 
best performance, see
 
 ## Upgrading From Spark SQL 2.3 to 2.4
 
+  - In Spark version 2.3 and earlier, the second parameter to 
array_contains function is implicitly promoted to the element type of first 
array type parameter. This type promotion can be lossy and may cause 
`array_contains` function to return wrong result. This problem has been 
addressed in 2.4 by employing a safer type promotion mechanism. This can cause 
some change in behavior and are illustrated in the table below.
+  Query: SELECT array_contains(array(1), 1.34D);
+    Result Spark 2.3 or Prior: true
+    Result Spark 2.4: false
+    Remarks: In Spark 2.4, both left and right parameters are promoted to array(double) and double type respectively.
--- End diff --

remove `both`.


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22482
  
**[Test build #96323 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96323/testReport)**
 for PR 22482 at commit 
[`0072ebe`](https://github.com/apache/spark/commit/0072ebe1a46ff9d1230e18b33ca22c2f32cfb958).


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-09-19 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/22482
  
The patch is a bit huge, so I'm not sure whether it would be better to squash the commits into one before reviewing.

Two TODOs are left, hence marking the patch as WIP, but it is close to being a complete patch:

1. Optimal implementation of state for session window.

It borrows the state implementation from streaming join since it fits the concept of state needed for session window, but it may not be the optimal one, so I'm going to see whether we can have a better implementation.

2. Javadoc (maybe the Structured Streaming guide doc too?)

I didn't add javadoc yet in order to speed up the POC and actual development, but to complete the patch I guess I need to write javadoc for the new classes as well as their methods (maybe).


---




[GitHub] spark issue #22408: [SPARK-25417][SQL] ArrayContains function may return inc...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22408
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3280/
Test PASSed.


---




[GitHub] spark issue #22408: [SPARK-25417][SQL] ArrayContains function may return inc...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22408
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22482
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22482
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96321/
Test FAILed.


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22482
  
**[Test build #96321 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96321/testReport)**
 for PR 22482 at commit 
[`fb19879`](https://github.com/apache/spark/commit/fb19879ff2bbafdf7c844d1a8da9d30c07aefd76).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22482
  
**[Test build #96321 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96321/testReport)**
 for PR 22482 at commit 
[`fb19879`](https://github.com/apache/spark/commit/fb19879ff2bbafdf7c844d1a8da9d30c07aefd76).


---




[GitHub] spark issue #22408: [SPARK-25417][SQL] ArrayContains function may return inc...

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22408
  
**[Test build #96322 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96322/testReport)**
 for PR 22408 at commit 
[`df5ea47`](https://github.com/apache/spark/commit/df5ea4768781ac82927128b8dfeefb5ab421ee14).


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22482
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22482
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #22365: [SPARK-25381][SQL] Stratified sampling by Column ...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22365#discussion_r219034294
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala ---
@@ -370,29 +370,76 @@ final class DataFrameStatFunctions private[sql](df: 
DataFrame) {
* @since 1.5.0
*/
   def sampleBy[T](col: String, fractions: Map[T, Double], seed: Long): 
DataFrame = {
+sampleBy(Column(col), fractions, seed)
+  }
+
+  /**
+   * Returns a stratified sample without replacement based on the fraction 
given on each stratum.
+   * @param col column that defines strata
+   * @param fractions sampling fraction for each stratum. If a stratum is 
not specified, we treat
+   *  its fraction as zero.
+   * @param seed random seed
+   * @tparam T stratum type
+   * @return a new `DataFrame` that represents the stratified sample
+   *
+   * @since 1.5.0
+   */
+  def sampleBy[T](col: String, fractions: ju.Map[T, jl.Double], seed: 
Long): DataFrame = {
+sampleBy(col, fractions.asScala.toMap.asInstanceOf[Map[T, Double]], 
seed)
+  }
+
+  /**
+   * Returns a stratified sample without replacement based on the fraction 
given on each stratum.
+   * @param col column that defines strata
+   * @param fractions sampling fraction for each stratum. If a stratum is 
not specified, we treat
+   *  its fraction as zero.
+   * @param seed random seed
+   * @tparam T stratum type
+   * @return a new `DataFrame` that represents the stratified sample
+   *
+   * The stratified sample can be performed over multiple columns:
+   * {{{
+   *import org.apache.spark.sql.Row
+   *import org.apache.spark.sql.functions.struct
+   *
+   *val df = spark.createDataFrame(Seq(("Bob", 17), ("Alice", 10), 
("Nico", 8), ("Bob", 17),
+   *  ("Alice", 10))).toDF("name", "age")
+   *val fractions = Map(Row("Alice", 10) -> 0.3, Row("Nico", 8) -> 1.0)
+   *df.stat.sampleBy(struct($"name", $"age"), fractions, 36L).show()
+   *+-----+---+
+   *| name|age|
+   *+-----+---+
+   *| Nico|  8|
+   *|Alice| 10|
+   *+-----+---+
+   * }}}
+   *
+   * @since 3.0.0
--- End diff --

the next release is 2.5.0


---




[GitHub] spark issue #22365: [SPARK-25381][SQL] Stratified sampling by Column argumen...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22365
  
LGTM


---




[GitHub] spark pull request #22482: WIP - [SPARK-10816][SS] Support session window na...

2018-09-19 Thread HeartSaVioR
GitHub user HeartSaVioR opened a pull request:

https://github.com/apache/spark/pull/22482

WIP - [SPARK-10816][SS] Support session window natively

## What changes were proposed in this pull request?

This patch proposes native support for session windows, similar to what Spark already provides for time windows.

Please refer to the doc attached to [SPARK-10816](https://issues.apache.org/jira/browse/SPARK-10816) for more details on the rationale, concepts, limitations, etc.

From the end users' point of view, the only change is the addition of a "session" SQL function. End users can define a query with a session window by replacing the "window" function with the "session" function, and the "window" column with the "session" column. The patch then provides the same experience as the time window.
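
For comparison, a short runnable sketch against today's time-window API (the rate source and column names are illustrative), with the substitution this patch proposes noted in comments:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.window

object SessionWindowSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("session-window-sketch").getOrCreate()
    import spark.implicits._

    // Illustrative streaming source and column names only.
    val events = spark.readStream.format("rate").load()
      .selectExpr("value % 10 AS userId", "timestamp AS eventTime")

    // Today: fixed time windows via the existing window() function.
    val counts = events
      .withWatermark("eventTime", "10 minutes")
      .groupBy($"userId", window($"eventTime", "5 minutes"))
      .count()

    // With this patch (as described above), the proposed change for end users is to
    // replace window(...) with session(...) and read the "session" column instead
    // of the "window" column; the rest of the query stays the same.

    val query = counts.writeStream.format("console").outputMode("update").start()
    query.awaitTermination()
  }
}
```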

Internally, this patch changes the physical plan of aggregation a bit: if the session function is used in a query, it sorts the input rows by "grouping keys" + "session" and merges overlapping sessions into one while applying aggregations, so it is like a sort-based aggregation, but the unit of grouping is grouping keys + session.

To handle late events, there are cases where multiple session windows that have not yet been evicted co-exist for a key. This patch handles that case by borrowing the state implementation from streaming join, which can handle multiple values for a given key.

## How was this patch tested?

Many UTs are added to verify session window queries for both batch and 
streaming.

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HeartSaVioR/spark SPARK-10816

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22482.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22482


commit a1af74611df7dd5b979fc1a288de96e0b3d415da
Author: Jungtaek Lim 
Date:   2018-09-04T23:10:47Z

WIP nothing worked, just recording the progress

commit be502485047283e203933a4d78e3b580a0c567df
Author: Jungtaek Lim 
Date:   2018-09-06T04:36:11Z

WIP not working yet... lots of implementations needed

commit 7c60c0ad922ddacf025ad4762b85d06ab7cb258f
Author: Jungtaek Lim 
Date:   2018-09-06T13:31:08Z

WIP Finished implementing UpdatingSessionIterator

commit 4e8c260a6e6b73b9bcd347ca242b8e77aedf8d1e
Author: Jungtaek Lim 
Date:   2018-09-07T08:35:32Z

WIP add verification on precondition "rows in iterator are sorted by key"

commit 39069ded62dc5836b0b0f7c8ec7fb8ce869e5292
Author: Jungtaek Lim 
Date:   2018-09-08T04:36:46Z

Rename SymmetricHashJoinStateManager to MultiValuesStateManager

* This will be also used from session window state as well

commit c2716340e008000e1fcc5e4d3fcf9befa419ff77
Author: Jungtaek Lim 
Date:   2018-09-08T04:41:37Z

Move package of UpdatingSessionIterator

commit df4cffd5fd1ea82be509f1cd97e5fc3a7ef8acb6
Author: Jungtaek Lim 
Date:   2018-09-10T05:52:28Z

WIP add MergingSortWithMultiValuesStateIterator, now integrating with 
stateful operators (WIP...)

commit 79e32b918c3db41c7d6c1c1d55276d3f696746d5
Author: Jungtaek Lim 
Date:   2018-09-13T06:54:37Z

WIP the first version of working one! Still have lots of TODOs and FIXMEs 
to go

commit fb7aa17488e5753c5460f383e1b0f4bedca6dee8
Author: Jungtaek Lim 
Date:   2018-09-13T08:13:45Z

Add more explanations

commit 9f41b9d6e7960031c52603bd1da9aeca747e1dfb
Author: Jungtaek Lim 
Date:   2018-09-13T08:49:01Z

Silly bugfix & block session window for batch query as of now

We can enable it but there're lots of approaches on aggregations in batch 
side...

* AggUtils.planAggregateWithoutDistinct
* AggUtils.planAggregateWithOneDistinct
* RewriteDistinctAggregates
* AggregateInPandasExec

So unless we are sure which things to support, just block them for now...

commit 0a62b1f0c274859061c0f3ab2c63450052985ac7
Author: Jungtaek Lim 
Date:   2018-09-13T09:28:34Z

More works: majorly split out updating session to individual physical node

* we will leverage such node for batch case if we want

commit acb5a0c42641041ca3adae2c9f2293b4dfa837cf
Author: Jungtaek Lim 
Date:   2018-09-13T09:38:00Z

Fix a silly bug and also add check for session window against batch query

commit 1b6502c92231b7aaa9d0d6f620a5bcc624b862ec
Author: Jungtaek Lim 
Date:   2018-09-13T11:30:15Z

WIP Fixed eviction on update mode

commit fec9a8ae5c1d421322738bd474fcb5508421f51a
Author: Jungtaek Lim 
Date:   2018-09-13T12:48:07Z

WIP found root reason of broken UT... fixed it

commit c87e4eebcc53c81328d52e4d4ea270bcede8b26e
Author: Jungtaek Lim 
Date:   2018-09-13T12:50:31Z

WIP remove printing "explain" on UTs

commit c0726d7447ce84440e46013d1cc392f1e397

[GitHub] spark issue #22480: [SPARK-25473][PYTHON][SS][TEST] ForeachWriter tests fail...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22480
  
+1 for adding the note


---




[GitHub] spark issue #22443: [SPARK-25339][TEST] Refactor FilterPushdownBenchmark

2018-09-19 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22443
  
I see, @cloud-fan .


---




[GitHub] spark pull request #22443: [SPARK-25339][TEST] Refactor FilterPushdownBenchm...

2018-09-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22443


---




[GitHub] spark issue #22443: [SPARK-25339][TEST] Refactor FilterPushdownBenchmark

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22443
  
thanks, merging to master!

@dongjoon-hyun Can you create an umbrella JIRA for updating all the benchmarks and take care of it? Thanks!


---




[GitHub] spark pull request #22462: [SPARK-25460][SS] DataSourceV2: SS sources do not...

2018-09-19 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22462#discussion_r219032068
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/sources/StreamingDataSourceV2Suite.scala
 ---
@@ -143,15 +185,18 @@ class StreamingDataSourceV2Suite extends StreamTest {
 Trigger.ProcessingTime(1000),
 Trigger.Continuous(1000))
 
-  private def testPositiveCase(readFormat: String, writeFormat: String, 
trigger: Trigger) = {
--- End diff --

Yup


---




[GitHub] spark issue #21445: [SPARK-24404][SS] Increase currentEpoch when meet a Epoc...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21445
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #22481: Revert [SPARK-19355][SPARK-25352]

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22481
  
LGTM, thanks!


---




[GitHub] spark pull request #21649: [SPARK-23648][R][SQL]Adds more types for hint in ...

2018-09-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21649


---




[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...

2018-09-19 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22227
  
long thread, are we all good with this?


---




[GitHub] spark issue #21649: [SPARK-23648][R][SQL]Adds more types for hint in SparkR

2018-09-19 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/21649
  
merged to master, thx


---




[GitHub] spark pull request #22475: [SPARK-4502][SQL] Rename to spark.sql.optimizer.n...

2018-09-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22475


---




[GitHub] spark issue #22460: DO NOT MERGE

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22460
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96313/
Test PASSed.


---




[GitHub] spark issue #22460: DO NOT MERGE

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22460
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22460: DO NOT MERGE

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22460
  
**[Test build #96313 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96313/testReport)**
 for PR 22460 at commit 
[`d930dc7`](https://github.com/apache/spark/commit/d930dc73a7c73d7ce6cab96025c30993af4ea8e7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22173: [SPARK-24355] Spark external shuffle server improvement ...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22173
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22475: [SPARK-4502][SQL] Rename to spark.sql.optimizer.nestedSc...

2018-09-19 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22475
  
Thanks! Merged to master/2.4


---




[GitHub] spark issue #22173: [SPARK-24355] Spark external shuffle server improvement ...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22173
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96307/
Test PASSed.


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219030775
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +244,15 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-cols <- lapply(dtypes(object), function(l) {
-  paste(l, collapse = ":")
-})
-s <- paste(cols, collapse = ", ")
-cat(paste(class(object), "[", s, "]\n", sep = ""))
+if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", 
"false")[[1]], "true")) {
--- End diff --

Also, I'm not sure if it's done for Python; consider adding to the doc above (L229) how it behaves with eagerEval.


---




[GitHub] spark issue #22173: [SPARK-24355] Spark external shuffle server improvement ...

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22173
  
**[Test build #96307 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96307/testReport)**
 for PR 22173 at commit 
[`0348ec8`](https://github.com/apache/spark/commit/0348ec8d5570aab9d744043a3d6a88950f4aeb5c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219030350
  
--- Diff: docs/sparkr.md ---
@@ -450,6 +450,42 @@ print(model.summaries)
 {% endhighlight %}
 
 
+### Eager execution
+
+If the eager execution is enabled, the data will be returned to R client 
immediately when the `SparkDataFrame` is created. Eager execution can be 
enabled by setting the configuration property 
`spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started 
up.
+
+
+{% highlight r %}
+
+# Start up spark session with eager execution enabled
+sparkR.session(master = "local[*]", sparkConfig = 
list(spark.sql.repl.eagerEval.enabled = "true"))
+
+df <- createDataFrame(faithful)
+
+# Instead of displaying the SparkDataFrame class, displays the data 
returned
+df
+
+##+---------+-------+
+##|eruptions|waiting|
+##+---------+-------+
+##|      3.6|   79.0|
+##|      1.8|   54.0|
+##|    3.333|   74.0|
+##|    2.283|   62.0|
+##|    4.533|   85.0|
+##|    2.883|   55.0|
+##|      4.7|   88.0|
+##|      3.6|   85.0|
+##|     1.95|   51.0|
+##|     4.35|   85.0|
+##+---------+-------+
+##only showing top 10 rows
+
+{% endhighlight %} 
+
+
+Note that the `SparkSession` created by `sparkR` shell does not have eager 
execution enabled. You can stop the current session and start up a new session 
like above to enable.
--- End diff --

actually I think the suggestion should be to set that in the `sparkR` 
command line as spark conf?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219030512
  
--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -0,0 +1,58 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(testthat)
+
+context("Show SparkDataFrame when eager execution is enabled.")
+
+test_that("eager execution is not enabled", {
+  # Start Spark session without eager execution enabled
+  sparkSession <- if (windows_with_hadoop()) {
+sparkR.session(master = sparkRTestMaster)
+  } else {
+sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
+  }
+  
+  df <- suppressWarnings(createDataFrame(iris))
--- End diff --

use a different dataset that does not require `suppressWarnings`
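
For example, a minimal sketch meant to drop into the quoted test (faithful is my 
suggestion; any dataset whose column names contain no dots avoids the column 
rename warning):

  df <- createDataFrame(faithful)
  expect_is(df, "SparkDataFrame")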


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219030211
  
--- Diff: docs/sparkr.md ---
@@ -450,6 +450,42 @@ print(model.summaries)
 {% endhighlight %}
 
 
+### Eager execution
+
+If eager execution is enabled, the data will be returned to the R client 
immediately when the `SparkDataFrame` is created. Eager execution can be 
enabled by setting the configuration property 
`spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started 
up.
+
+
+{% highlight r %}
+
+# Start up spark session with eager execution enabled
+sparkR.session(master = "local[*]", sparkConfig = 
list(spark.sql.repl.eagerEval.enabled = "true"))
+
+df <- createDataFrame(faithful)
+
+# Instead of displaying the SparkDataFrame class, displays the data 
returned
--- End diff --

we could also start here by saying "similar to an R `data.frame`" ...
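
For illustration, a minimal sketch of that comparison (my wording, reusing the 
faithful data from the example above):

library(SparkR)
sparkR.session(sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true"))

# a local R data.frame prints its rows when evaluated
head(faithful)

# with eager execution enabled, a SparkDataFrame prints its first rows in a
# similar way instead of only its class and schema
df <- createDataFrame(faithful)
df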


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219030277
  
--- Diff: docs/sparkr.md ---
@@ -450,6 +450,42 @@ print(model.summaries)
 {% endhighlight %}
 
 
+### Eager execution
+
+If eager execution is enabled, the data will be returned to the R client 
immediately when the `SparkDataFrame` is created. Eager execution can be 
enabled by setting the configuration property 
`spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started 
up.
+
+
+{% highlight r %}
+
+# Start up spark session with eager execution enabled
+sparkR.session(master = "local[*]", sparkConfig = 
list(spark.sql.repl.eagerEval.enabled = "true"))
+
+df <- createDataFrame(faithful)
+
+# Instead of displaying the SparkDataFrame class, displays the data 
returned
+df
+
+##+---------+-------+
+##|eruptions|waiting|
+##+---------+-------+
+##|      3.6|   79.0|
+##|      1.8|   54.0|
+##|    3.333|   74.0|
+##|    2.283|   62.0|
+##|    4.533|   85.0|
+##|    2.883|   55.0|
+##|      4.7|   88.0|
+##|      3.6|   85.0|
+##|     1.95|   51.0|
+##|     4.35|   85.0|
+##+---------+-------+
+##only showing top 10 rows
+
+{% endhighlight %} 
+
+
+Note that the `SparkSession` created by `sparkR` shell does not have eager 
execution enabled. You can stop the current session and start up a new session 
like above to enable.
--- End diff --

change to `Note that the `SparkSession` created by `sparkR` shell by 
default does not `


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219029847
  
--- Diff: docs/sparkr.md ---
@@ -450,6 +450,42 @@ print(model.summaries)
 {% endhighlight %}
 
 
+### Eager execution
--- End diff --

should be `` I think?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219030474
  
--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -0,0 +1,58 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(testthat)
+
+context("Show SparkDataFrame when eager execution is enabled.")
+
+test_that("eager execution is not enabled", {
--- End diff --

I'm neutral; should these tests be in test_sparkSQL.R? The suite takes longer to 
run with many test files


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219030085
  
--- Diff: docs/sparkr.md ---
@@ -450,6 +450,42 @@ print(model.summaries)
 {% endhighlight %}
 
 
+### Eager execution
+
+If eager execution is enabled, the data will be returned to the R client 
immediately when the `SparkDataFrame` is created. Eager execution can be 
enabled by setting the configuration property 
`spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started 
up.
+
+
+{% highlight r %}
+
+# Start up spark session with eager execution enabled
+sparkR.session(master = "local[*]", sparkConfig = 
list(spark.sql.repl.eagerEval.enabled = "true"))
+
+df <- createDataFrame(faithful)
--- End diff --

perhaps a more complete example - like `summarize(groupBy(df, df$waiting), 
count = n(df$waiting))`
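
A sketch of how that fuller example might read in the doc (my wording; assumes a 
session started with eager execution enabled, as in the example above):

sparkR.session(sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true"))

df <- createDataFrame(faithful)

# aggregate and then evaluate; with eager execution the resulting rows are
# printed immediately instead of the SparkDataFrame class
waiting_counts <- summarize(groupBy(df, df$waiting), count = n(df$waiting))
waiting_counts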


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219030537
  
--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -0,0 +1,58 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(testthat)
+
+context("Show SparkDataFrame when eager execution is enabled.")
+
+test_that("eager execution is not enabled", {
+  # Start Spark session without eager execution enabled
+  sparkSession <- if (windows_with_hadoop()) {
+sparkR.session(master = sparkRTestMaster)
+  } else {
+sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
+  }
+  
+  df <- suppressWarnings(createDataFrame(iris))
+  expect_is(df, "SparkDataFrame")
+  expected <- "Sepal_Length:double, Sepal_Width:double, 
Petal_Length:double, Petal_Width:double, Species:string"
+  expect_output(show(df), expected)
+  
+  # Stop Spark session
+  sparkR.session.stop()
+})
+
+test_that("eager execution is enabled", {
+  # Start Spark session with eager execution enabled
+  sparkSession <- if (windows_with_hadoop()) {
+sparkR.session(master = sparkRTestMaster,
+   sparkConfig = list(spark.sql.repl.eagerEval.enabled = 
"true"))
+  } else {
+sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE, 
+   sparkConfig = list(spark.sql.repl.eagerEval.enabled = 
"true"))
+  }
+  
+  df <- suppressWarnings(createDataFrame(iris))
--- End diff --

ditto


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-09-19 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22379
  
I think maybe someone should review the SQL stuff more?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22481: Revert [SPARK-19355][SPARK-25352]

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22481
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3279/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22481: Revert [SPARK-19355][SPARK-25352]

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22481
  
**[Test build #96320 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96320/testReport)**
 for PR 22481 at commit 
[`4532aaa`](https://github.com/apache/spark/commit/4532aaa2471c04c57f3b59bdcec26ad83627df68).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22481: Revert [SPARK-19355][SPARK-25352]

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22481
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22464: Revert [SPARK-19355][SPARK-25352]

2018-09-19 Thread viirya
Github user viirya closed the pull request at:

https://github.com/apache/spark/pull/22464


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22481: Revert [SPARK-19355][SPARK-25352]

2018-09-19 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/22481

Revert [SPARK-19355][SPARK-25352]

## What changes were proposed in this pull request?

This reverts a sequence of PRs, based on the discussion and comments at 
https://github.com/apache/spark/pull/16677#issuecomment-422650759.

#22344
#22330
#22239
#16677

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 revert-SPARK-19355-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22481.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22481


commit 7b81a95e9496ba953648880a70896085e2bfd043
Author: Liang-Chi Hsieh 
Date:   2018-09-20T03:14:50Z

Revert "[SPARK-25352][SQL] Perform ordered global limit when limit number 
is bigger than topKSortFallbackThreshold"

This reverts commit 2f422398b524eacc89ab58e423bb134ae3ca3941.

commit cea66899194f19812ac217cde2d8b6fe1fbe1328
Author: Liang-Chi Hsieh 
Date:   2018-09-20T03:57:38Z

Revert "[SPARK-19355][SQL][FOLLOWUP][TEST] Properly recycle SparkSession on 
TakeOrderedAndProjectSuite finishes"

This reverts commit 3aa60282cc84d471ea32ef240ec84e5b6e3e231b.

commit 2dae33e5b897c0ec05f675ec565abee5f2c4ea34
Author: Liang-Chi Hsieh 
Date:   2018-09-20T03:58:11Z

Revert "[SPARK-19355][SQL][FOLLOWUP] Remove the child.outputOrdering check 
in global limit"

This reverts commit 5c27b0d4f8d378bd7889d26fb358f478479b9996.

commit 4532aaa2471c04c57f3b59bdcec26ad83627df68
Author: Liang-Chi Hsieh 
Date:   2018-09-20T04:00:46Z

Revert "[SPARK-19355][SQL] Use map output statistics to improve global 
limit's parallelism"

This reverts commit 4f175850985cfc4c64afb90d784bb292e81dc0b7.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22481: Revert [SPARK-19355][SPARK-25352]

2018-09-19 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/22481
  
cc @cloud-fan 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22480: [SPARK-25473][PYTHON][SS][TEST] ForeachWriter tests fail...

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22480
  
**[Test build #96319 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96319/testReport)**
 for PR 22480 at commit 
[`97e95af`](https://github.com/apache/spark/commit/97e95afeba368dd06f747665c41f96a50141305a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22408: [SPARK-25417][SQL] ArrayContains function may ret...

2018-09-19 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22408#discussion_r219028470
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1879,6 +1879,80 @@ working with timestamps in `pandas_udf`s to get the 
best performance, see
 
 ## Upgrading From Spark SQL 2.3 to 2.4
 
  - In Spark version 2.3 and earlier, the second parameter to the array_contains 
function is implicitly promoted to the element type of the first, array-type 
parameter. This type promotion can be lossy and may cause the `array_contains` 
function to return a wrong result. This problem has been addressed in 2.4 by 
employing a safer type promotion mechanism. This can cause some changes in 
behavior, which are illustrated in the table below.
+
+| Query | Result Spark 2.3 or Prior | Result Spark 2.4 | Remarks |
+| --- | --- | --- | --- |
+| SELECT array_contains(array(1), 1.34D); | true | false | In Spark 2.4, both left and right parameters are promoted to array(double) and double type respectively. |
+| SELECT array_contains(array(1), 1.34); |
--- End diff --

@cloud-fan OK.
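
For context, a minimal SparkR sketch of checking the behavior change described 
in the quoted doc text (the expected 2.4 result is taken from the table above):

library(SparkR)
sparkR.session()

# In Spark 2.3 and earlier this returned true (1.34D was cast down to the
# array's int element type); per the quoted doc, Spark 2.4 promotes both sides
# to double and returns false.
collect(sql("SELECT array_contains(array(1), 1.34D)"))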


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


