[GitHub] spark issue #22503: [SPARK-25493] [SQL] Fix multiline crlf

2018-09-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22503
  
Also, please fix the PR title to be more descriptive. For instance, 
`[SPARK-25493][SQL] Use auto-detection for CRLF in CSV datasource multiline mode`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22503: [SPARK-25493] [SQL] Fix multiline crlf

2018-09-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22503
  
**[Test build #96485 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96485/testReport)**
 for PR 22503 at commit 
[`2f349d7`](https://github.com/apache/spark/commit/2f349d7a779cd8f347b73ec59e2f4216450075f1).


---




[GitHub] spark issue #22326: [SPARK-25314][SQL] Fix Python UDF accessing attributes f...

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22326
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #22503: [SPARK-25493] [SQL] Fix multiline crlf

2018-09-22 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22503#discussion_r219688971
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala ---
@@ -212,6 +212,7 @@ class CSVOptions(
 settings.setEmptyValue(emptyValueInRead)
 settings.setMaxCharsPerColumn(maxCharsPerColumn)
 settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_DELIMITER)
+settings.setLineSeparatorDetectionEnabled(true)
--- End diff --

Yup, I would rather enable this only for multiline mode. Also, please 
describe what this configuration does in the PR description.
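
As a rough illustration of what line-separator auto-detection means, here is a 
hypothetical Python sketch. It is not the parser's implementation behind 
`setLineSeparatorDetectionEnabled`; the function names are made up for this 
example, and it ignores quoted fields, which a real CSV parser must handle.

```python
from typing import List


def detect_line_separator(sample: str, default: str = "\n") -> str:
    """Return the first line separator found in `sample` (CRLF, LF, or CR)."""
    for i, ch in enumerate(sample):
        if ch == "\r":
            # A CR immediately followed by LF is a Windows-style CRLF.
            return "\r\n" if sample[i + 1 : i + 2] == "\n" else "\r"
        if ch == "\n":
            return "\n"
    # No separator in the sample: fall back to the configured default.
    return default


def split_records(text: str) -> List[str]:
    """Split `text` into non-empty records using the detected separator."""
    separator = detect_line_separator(text)
    return [line for line in text.split(separator) if line]
```

With a fixed `\n` separator, a CRLF file would leave a trailing `\r` on every 
record; detecting the separator from the input avoids that.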


---




[GitHub] spark issue #22326: [SPARK-25314][SQL] Fix Python UDF accessing attributes f...

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22326
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96482/
Test FAILed.


---




[GitHub] spark issue #22326: [SPARK-25314][SQL] Fix Python UDF accessing attributes f...

2018-09-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22326
  
**[Test build #96482 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96482/testReport)**
 for PR 22326 at commit 
[`caf6f94`](https://github.com/apache/spark/commit/caf6f94b980e877f02c57b9647bae7df5d4e16ae).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22503: [SPARK-25493] [SQL] Fix multiline crlf

2018-09-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22503
  
ok to test


---




[GitHub] spark issue #22529: [SPARK-25460][BRANCH-2.4][SS] DataSourceV2: SS sources d...

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22529
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22529: [SPARK-25460][BRANCH-2.4][SS] DataSourceV2: SS sources d...

2018-09-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22529
  
**[Test build #96484 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96484/testReport)**
 for PR 22529 at commit 
[`b080b0d`](https://github.com/apache/spark/commit/b080b0d7cb018f739afd578c7952d5f23d3375e2).


---




[GitHub] spark issue #22529: [SPARK-25460][BRANCH-2.4][SS] DataSourceV2: SS sources d...

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22529
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3386/
Test PASSed.


---




[GitHub] spark issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision ...

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21632
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96483/
Test FAILed.


---




[GitHub] spark issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision ...

2018-09-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21632
  
**[Test build #96483 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96483/testReport)**
 for PR 21632 at commit 
[`f0cb95f`](https://github.com/apache/spark/commit/f0cb95f6fd95b1819a028bdd674ea5f7c3a2e754).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision ...

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21632
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision ...

2018-09-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21632
  
**[Test build #96483 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96483/testReport)**
 for PR 21632 at commit 
[`f0cb95f`](https://github.com/apache/spark/commit/f0cb95f6fd95b1819a028bdd674ea5f7c3a2e754).


---




[GitHub] spark issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision ...

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21632
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3385/
Test PASSed.


---




[GitHub] spark issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision ...

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21632
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22326: [SPARK-25314][SQL] Fix Python UDF accessing attributes f...

2018-09-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22326
  
**[Test build #96482 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96482/testReport)**
 for PR 22326 at commit 
[`caf6f94`](https://github.com/apache/spark/commit/caf6f94b980e877f02c57b9647bae7df5d4e16ae).


---




[GitHub] spark issue #22326: [SPARK-25314][SQL] Fix Python UDF accessing attributes f...

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22326
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22326: [SPARK-25314][SQL] Fix Python UDF accessing attributes f...

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22326
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3384/
Test PASSed.


---




[GitHub] spark pull request #22480: [SPARK-25473][PYTHON][SS][TEST] ForeachWriter tes...

2018-09-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22480


---




[GitHub] spark issue #22480: [SPARK-25473][PYTHON][SS][TEST] ForeachWriter tests fail...

2018-09-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22480
  
Thanks, @cloud-fan, @BryanCutler and @holdenk 


---




[GitHub] spark issue #22480: [SPARK-25473][PYTHON][SS][TEST] ForeachWriter tests fail...

2018-09-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22480
  
Merged only to master, since I assume we are more likely to hit these test 
failures on the master branch specifically.


---




[GitHub] spark issue #22529: [SPARK-25460][BRANCH-2.4][SS] DataSourceV2: SS sources d...

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22529
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96480/
Test FAILed.


---




[GitHub] spark issue #22529: [SPARK-25460][BRANCH-2.4][SS] DataSourceV2: SS sources d...

2018-09-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22529
  
**[Test build #96480 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96480/testReport)**
 for PR 22529 at commit 
[`b6f8880`](https://github.com/apache/spark/commit/b6f8880ad6bdbcb721ca0863502ec4b6c85b162c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22529: [SPARK-25460][BRANCH-2.4][SS] DataSourceV2: SS sources d...

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22529
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

2018-09-22 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22316#discussion_r219686833
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -416,7 +426,7 @@ class RelationalGroupedDataset protected[sql](
 new RelationalGroupedDataset(
   df,
   groupingExprs,
-  RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(Literal.apply)))
+  RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(lit(_).expr)))
--- End diff --

That's true in general, but is the decimal precision specifically more correct here?


---




[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...

2018-09-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22227
  
**[Test build #96481 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96481/testReport)**
 for PR 22227 at commit 
[`5c8f487`](https://github.com/apache/spark/commit/5c8f48715748bdeda703761fba6a4d1828a19985).


---




[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...

2018-09-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22227
  
ok to test


---




[GitHub] spark issue #22529: [SPARK-25460][BRANCH-2.4][SS] DataSourceV2: SS sources d...

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22529
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22529: [SPARK-25460][BRANCH-2.4][SS] DataSourceV2: SS sources d...

2018-09-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22529
  
**[Test build #96480 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96480/testReport)**
 for PR 22529 at commit 
[`b6f8880`](https://github.com/apache/spark/commit/b6f8880ad6bdbcb721ca0863502ec4b6c85b162c).


---




[GitHub] spark issue #22529: [SPARK-25460][BRANCH-2.4][SS] DataSourceV2: SS sources d...

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22529
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3383/
Test PASSed.


---




[GitHub] spark issue #22462: [SPARK-25460][SS] DataSourceV2: SS sources do not respec...

2018-09-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22462
  
The conflicts look like they mainly come from renaming. I opened a backport: 
https://github.com/apache/spark/pull/22529


---




[GitHub] spark pull request #22529: [SPARK-25460][BRANCH-2.4][SS] DataSourceV2: SS so...

2018-09-22 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/22529

[SPARK-25460][BRANCH-2.4][SS] DataSourceV2: SS sources do not respect 
SessionConfigSupport

## What changes were proposed in this pull request?

This PR proposes to backport SPARK-25460 to branch-2.4:

This PR proposes to respect `SessionConfigSupport` in SS datasources as 
well. Currently these are only respected in batch sources:


https://github.com/apache/spark/blob/e06da95cd9423f55cdb154a2778b0bddf7be984c/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala#L198-L203


https://github.com/apache/spark/blob/e06da95cd9423f55cdb154a2778b0bddf7be984c/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L244-L249

If a developer writes a DataSource V2 source that supports both structured 
streaming and batch jobs, batch jobs respect a session-level configuration, 
say, the URL to connect to and fetch data from (which end users might not be 
aware of); structured streaming, however, ends up not supporting it, so the 
value has to be set explicitly in the options.
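
The behaviour described above can be sketched in Python. This is a hypothetical 
illustration, not Spark's code: the `spark.datasource.<keyPrefix>.` key naming, 
the function names, and the rule that explicit options win over session 
configs are all assumptions made for this example.

```python
import re
from typing import Dict


def extract_session_configs(key_prefix: str, session_conf: Dict[str, str]) -> Dict[str, str]:
    """Collect session configs under the source's prefix, with the prefix stripped."""
    # Assumed naming convention: spark.datasource.<key_prefix>.<option>
    pattern = re.compile(r"^spark\.datasource\." + re.escape(key_prefix) + r"\.(.+)")
    extracted = {}
    for key, value in session_conf.items():
        match = pattern.match(key)
        if match:
            extracted[match.group(1)] = value
    return extracted


def effective_options(
    key_prefix: str, session_conf: Dict[str, str], explicit_options: Dict[str, str]
) -> Dict[str, str]:
    """Merge session-level defaults with explicitly passed options."""
    merged = extract_session_configs(key_prefix, session_conf)
    # Assumption: options set explicitly on the reader or writer take precedence.
    merged.update(explicit_options)
    return merged
```

If the same merge runs for both batch and streaming reads, a URL set once in 
the session reaches the source either way.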

## How was this patch tested?

Unit tests were added.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-25460-backport

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22529.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22529


commit b6f8880ad6bdbcb721ca0863502ec4b6c85b162c
Author: hyukjinkwon 
Date:   2018-09-20T12:22:55Z

[SPARK-25460][SS] DataSourceV2: SS sources do not respect 
SessionConfigSupport

This PR proposes to respect `SessionConfigSupport` in SS datasources as 
well. Currently these are only respected in batch sources:


https://github.com/apache/spark/blob/e06da95cd9423f55cdb154a2778b0bddf7be984c/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala#L198-L203


https://github.com/apache/spark/blob/e06da95cd9423f55cdb154a2778b0bddf7be984c/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L244-L249

If a developer writes a DataSource V2 source that supports both structured 
streaming and batch jobs, batch jobs respect a session-level configuration, 
say, the URL to connect to and fetch data from (which end users might not be 
aware of); structured streaming, however, ends up not supporting it, so the 
value has to be set explicitly in the options.

Unit tests were added.

Closes #22462 from HyukjinKwon/SPARK-25460.

Authored-by: hyukjinkwon 
Signed-off-by: Wenchen Fan 




---




[GitHub] spark issue #18544: [SPARK-21318][SQL]Improve exception message thrown by `l...

2018-09-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18544
  
**[Test build #96479 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96479/testReport)**
 for PR 18544 at commit 
[`623b282`](https://github.com/apache/spark/commit/623b282b2edf872cb4e4bd93e27837ac567854e1).


---




[GitHub] spark issue #21747: [SPARK-24165][SQL][branch-2.3] Fixing conditional expres...

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21747
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #22523: [MINOR][PYSPARK] Always Close the tempFile in _se...

2018-09-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22523


---




[GitHub] spark issue #22523: [MINOR][PYSPARK] Always Close the tempFile in _serialize...

2018-09-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22523
  
and branch-2.4.


---




[GitHub] spark pull request #22523: [MINOR][PYSPARK] Always Close the tempFile in _se...

2018-09-22 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22523#discussion_r219686544
  
--- Diff: python/pyspark/context.py ---
@@ -537,8 +537,10 @@ def _serialize_to_jvm(self, data, serializer, reader_func, createRDDServer):
 # parallelize from there.
 tempFile = NamedTemporaryFile(delete=False, dir=self._temp_dir)
--- End diff --

Actually, we'd better use a context manager:

```python
with NamedTemporaryFile(delete=False, dir=self._temp_dir) as tempFile:
    ...
```

but it's not a big deal. LGTM
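
A minimal, self-contained sketch of the context-manager pattern suggested above. 
The helper name `write_to_temp_file` and the payload are hypothetical and are 
not taken from `pyspark/context.py`. Because of `delete=False`, the `with` 
block only closes the file; removing it is still the caller's job.

```python
import os
from tempfile import NamedTemporaryFile


def write_to_temp_file(payload: bytes, temp_dir=None) -> str:
    """Write `payload` to a named temp file and return its path."""
    # The file is closed when the block exits, even if write() raises.
    with NamedTemporaryFile(delete=False, dir=temp_dir) as temp_file:
        temp_file.write(payload)
    return temp_file.name


path = write_to_temp_file(b"hello")
try:
    with open(path, "rb") as handle:
        data = handle.read()
finally:
    # delete=False means the file must be removed explicitly.
    os.unlink(path)
```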


---




[GitHub] spark issue #18544: [SPARK-21318][SQL]Improve exception message thrown by `l...

2018-09-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18544
  
**[Test build #96478 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96478/testReport)**
 for PR 18544 at commit 
[`53dc155`](https://github.com/apache/spark/commit/53dc1558ecdb64623d004e615a6000745989ceed).


---




[GitHub] spark pull request #22527: [SPARK-17952][SQL] Nested Java beans support in c...

2018-09-22 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22527#discussion_r219686433
  
--- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaDataFrameSuite.java ---
@@ -171,7 +184,12 @@ void validateDataFrameWithBeans(Bean bean, Dataset<Row> df) {
   schema.apply("d"));
 Assert.assertEquals(new StructField("e", DataTypes.createDecimalType(38,0), true,
   Metadata.empty()), schema.apply("e"));
-Row first = df.select("a", "b", "c", "d", "e").first();
+Assert.assertEquals(new StructField("f",
+DataTypes.createStructType(Collections.singletonList(new StructField(
+"a", IntegerType$.MODULE$, false, Metadata.empty()))),
+true, Metadata.empty()),
+schema.apply("f"));
--- End diff --

This should be indented with two spaces.


---




[GitHub] spark pull request #22527: [SPARK-17952][SQL] Nested Java beans support in c...

2018-09-22 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22527#discussion_r219686429
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
@@ -1100,13 +1101,24 @@ object SQLContext {
   attrs: Seq[AttributeReference]): Iterator[InternalRow] = {
 val extractors =
   JavaTypeInference.getJavaBeanReadableProperties(beanClass).map(_.getReadMethod)
-val methodsToConverts = extractors.zip(attrs).map { case (e, attr) =>
-  (e, CatalystTypeConverters.createToCatalystConverter(attr.dataType))
+val methodsToTypes = extractors.zip(attrs).map { case (e, attr) =>
+  (e, attr.dataType)
+}
+def invoke(element: Any)(tuple: (Method, DataType)): Any = tuple match {
+  case (e, structType: StructType) =>
+val value = e.invoke(element)
+val nestedExtractors = JavaTypeInference.getJavaBeanReadableProperties(value.getClass)
+.map(desc => desc.getName -> desc.getReadMethod)
+.toMap
+new GenericInternalRow(structType.map(nestedProperty =>
+  invoke(value)(nestedExtractors(nestedProperty.name) -> nestedProperty.dataType)
+).toArray)
--- End diff --

Why should we use a map here while we don't need it for the root bean?


---




[GitHub] spark issue #22517: Branch 2.3 how can i fix error use Pyspark

2018-09-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22517
  
@lovezeropython please close this.


---




[GitHub] spark pull request #22528: [SPARK-25513][SQL] Read zipped CSV and JSON

2018-09-22 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22528#discussion_r219686326
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/CodecStreams.scala ---
@@ -41,7 +42,12 @@ object CodecStreams {
 
 getDecompressionCodec(config, file)
   .map(codec => codec.createInputStream(inputStream))
-  .getOrElse(inputStream)
+  .orElse {
+if (file.getName.toLowerCase.endsWith(".zip")) {
+  val zip = new ZipArchiveInputStream(inputStream)
+  if (zip.getNextEntry != null) Some(zip) else None
+} else None
+  }.getOrElse(inputStream)
--- End diff --

@MaxGekk, I get that we can support zipped files, but isn't it difficult to 
extend this support to non-multiline modes as well? Deflate is basically the 
same codec, and I wonder whether we should really allow zip only in multiline 
mode for CSV / JSON, with a clear restriction (single file). Please correct 
me if I misunderstood.
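
For reference, the single-entry behaviour in the diff above (open the archive, 
take the first entry if there is one, otherwise fall back) can be sketched with 
Python's standard `zipfile` module. This is a hypothetical analogue, not the 
Scala code under review; the helper name and the sample data are made up.

```python
import io
import zipfile


def open_first_zip_entry(raw: bytes):
    """Return a readable stream for the first archive entry, or None if empty."""
    archive = zipfile.ZipFile(io.BytesIO(raw))
    names = archive.namelist()
    if not names:
        return None
    # Only the first entry is read; any further entries are ignored.
    return archive.open(names[0])


# Build a small archive in memory whose single entry is deflate-compressed.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("data.csv", "a,b\n1,2\n")

stream = open_first_zip_entry(buffer.getvalue())
text = stream.read().decode("utf-8")
```

The sketch needs the whole archive available as one unit, which is in line 
with the single-file restriction raised in the comment.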


---




[GitHub] spark issue #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options should ov...

2018-09-22 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22489
  
I've considered this for 2.3.3, since the 2.3.2 RC6 vote had already started. 
For now, I'm waiting for the result of the vote.


---




[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark

2018-09-22 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22513
  
+1, late LGTM.


---




[GitHub] spark issue #22509: [SPARK-25384][SQL] Clarify fromJsonForceNullableSchema w...

2018-09-22 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22509
  
Sorry for missing this deprecation.


---




[GitHub] spark pull request #22499: [SPARK-25489][ML][TEST] Refactor UDTSerialization...

2018-09-22 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22499#discussion_r219685155
  
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/UDTSerializationBenchmark.scala ---
@@ -18,52 +18,52 @@
 package org.apache.spark.mllib.linalg
 
 import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
-import org.apache.spark.util.Benchmark
+import org.apache.spark.util.{Benchmark, BenchmarkBase => FileBenchmarkBase}
 
 /**
  * Serialization benchmark for VectorUDT.
+ * To run this benchmark:
+ * 1. without sbt: bin/spark-submit --class  
--- End diff --

+1 for fixing the docs to pass Jenkins.
Also, could you rebase this PR to resolve the conflicts, @seancxmao?


---




[GitHub] spark issue #22528: [SPARK-25513][SQL] Read zipped CSV and JSON

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22528
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96477/
Test PASSed.


---




[GitHub] spark issue #22528: [SPARK-25513][SQL] Read zipped CSV and JSON

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22528
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22528: [SPARK-25513][SQL] Read zipped CSV and JSON

2018-09-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22528
  
**[Test build #96477 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96477/testReport)**
 for PR 22528 at commit 
[`ec8ba0d`](https://github.com/apache/spark/commit/ec8ba0da6a29efb7f4dfeccb7cb68c2085c6890f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22407: [SPARK-25416][SQL] ArrayPosition function may return inc...

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22407
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22407: [SPARK-25416][SQL] ArrayPosition function may return inc...

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22407
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96475/
Test PASSed.


---




[GitHub] spark issue #22407: [SPARK-25416][SQL] ArrayPosition function may return inc...

2018-09-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22407
  
**[Test build #96475 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96475/testReport)**
 for PR 22407 at commit 
[`55d4b95`](https://github.com/apache/spark/commit/55d4b950951892f3a239f960feadbe1a25198659).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #22522: [SPARK-25510][TEST] Create new trait replace Benc...

2018-09-22 Thread wangyum
Github user wangyum closed the pull request at:

https://github.com/apache/spark/pull/22522


---




[GitHub] spark pull request #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark t...

2018-09-22 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22484#discussion_r219683925
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala
 ---
@@ -34,621 +34,508 @@ import org.apache.spark.unsafe.map.BytesToBytesMap
 
 /**
  * Benchmark to measure performance for aggregate primitives.
- * To run this:
- *  build/sbt "sql/test-only *benchmark.AggregateBenchmark"
- *
- * Benchmarks in this file are skipped in normal builds.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt: bin/spark-submit --class  
+ *   2. build/sbt "sql/test:runMain "
+ *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt 
"sql/test:runMain "
+ *  Results will be written to 
"benchmarks/AggregateBenchmark-results.txt".
+ * }}}
  */
-class AggregateBenchmark extends BenchmarkWithCodegen {
+object AggregateBenchmark extends RunBenchmarkWithCodegen {
 
-  ignore("aggregate without grouping") {
-val N = 500L << 22
-val benchmark = new Benchmark("agg without grouping", N)
-runBenchmark("agg w/o group", N) {
-  sparkSession.range(N).selectExpr("sum(id)").collect()
+  override def benchmark(): Unit = {
+runBenchmark("aggregate without grouping") {
+  val N = 500L << 22
+  runBenchmark("agg w/o group", N) {
--- End diff --

Yes. Do you have a suggested name?


---




[GitHub] spark issue #13440: [SPARK-15699] [ML] Implement a Chi-Squared test statisti...

2018-09-22 Thread erikerlandson
Github user erikerlandson commented on the issue:

https://github.com/apache/spark/pull/13440
  
I think targeting 3.0 with a refactor makes the most sense.  There's no way 
to do this without making small breaking changes, but slightly larger changes 
could clean up the design.  `ImpurityCalculator` can subsume `Impurity`, and a 
more general rethinking of gain and impurity can be accommodated too.


---




[GitHub] spark issue #13440: [SPARK-15699] [ML] Implement a Chi-Squared test statisti...

2018-09-22 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/13440
  
Yeah I take your point that the trait Impurity already defines two methods, 
only one of which is implemented for each of the subclasses. It's already a 
funky design that probably should have been generalized differently. I think a 
rewrite for Spark 3 would be worthwhile, personally. I'm also not quite sure of 
the difference between the Impurity and ImpurityCalculator class; it seems like 
Impurity should fold into ImpurityCalculator. 

Is the single method we really want to define something like 
`computeInformationGain(ImpurityCalculator, ImpurityCalculator)`? Even the new 
method you've added does not directly compute info gain, nor did the existing 
ones in Impurity. But that's the thing we need an abstraction for over several 
implementations, it seems.

Well, I think either this gets a bigger redesign in 3.0, or we try to get 
it into 2.5 and accept some API changes. I think I lean towards a bolder 
breaking change to fix it up in 3.0, unless there's a pressing need for this 
metric.
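
For illustration only, here is a minimal sketch of what a single 
`computeInformationGain` entry point could look like. The names 
`GiniCalculatorSketch`, `merge`, and `GainSketch` are hypothetical and are not 
part of Spark's API; Gini is used only as an example impurity measure.

```scala
// Hypothetical sketch; not Spark MLlib's actual Impurity/ImpurityCalculator API.
final case class GiniCalculatorSketch(classCounts: Vector[Double]) {
  // Total weight of the samples summarized by this calculator.
  val count: Double = classCounts.sum

  // Gini impurity: 1 - sum(p_i^2); an empty node is treated as pure.
  def impurity: Double =
    if (count == 0.0) 0.0
    else 1.0 - classCounts.map(c => (c / count) * (c / count)).sum

  // Combine two children into the statistics of their parent node.
  def merge(other: GiniCalculatorSketch): GiniCalculatorSketch =
    GiniCalculatorSketch(classCounts.zip(other.classCounts).map { case (a, b) => a + b })
}

object GainSketch {
  // Gain of a split: parent impurity minus the weighted impurities of the children.
  def computeInformationGain(left: GiniCalculatorSketch, right: GiniCalculatorSketch): Double = {
    val parent = left.merge(right)
    if (parent.count == 0.0) 0.0
    else {
      parent.impurity -
        (left.count / parent.count) * left.impurity -
        (right.count / parent.count) * right.impurity
    }
  }
}
```

With this shape, each impurity measure would only need to supply its own 
statistics and `impurity`, while the gain computation stays in one place.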


---




[GitHub] spark issue #22528: [SPARK-25513][SQL] Read zipped CSV and JSON

2018-09-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22528
  
**[Test build #96477 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96477/testReport)**
 for PR 22528 at commit 
[`ec8ba0d`](https://github.com/apache/spark/commit/ec8ba0da6a29efb7f4dfeccb7cb68c2085c6890f).


---




[GitHub] spark issue #22528: [SPARK-25513][SQL] Read zipped CSV and JSON

2018-09-22 Thread MaxGekk
Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/22528
  
jenkins, retest this, please


---




[GitHub] spark issue #22528: [SPARK-25513][SQL] Read zipped CSV and JSON

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22528
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22528: [SPARK-25513][SQL] Read zipped CSV and JSON

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22528
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96476/
Test FAILed.


---




[GitHub] spark issue #22528: [SPARK-25513][SQL] Read zipped CSV and JSON

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22528
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #22528: [SPARK-25513][SQL] Read zipped CSV and JSON

2018-09-22 Thread MaxGekk
GitHub user MaxGekk opened a pull request:

https://github.com/apache/spark/pull/22528

[SPARK-25513][SQL] Read zipped CSV and JSON

## What changes were proposed in this pull request?

In this PR, I propose to support reading zip archives containing **one** 
CSV or JSON file in multi-line mode. 

## How was this patch tested?

Added tests for CSV and JSON where zip archives are created by Java library.
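
As a rough illustration of the idea (not the code in this PR), the sketch 
below builds a zip archive with a single entry using `java.util.zip` and reads 
that entry back as text. The object and method names are hypothetical.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.nio.charset.StandardCharsets
import java.util.zip.{ZipEntry, ZipInputStream, ZipOutputStream}
import scala.io.Source

// Hypothetical helper names; this only demonstrates the java.util.zip round trip.
object ZipSingleEntrySketch {
  // Build an in-memory zip archive holding exactly one entry.
  def zipOne(entryName: String, content: String): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val zip = new ZipOutputStream(bytes)
    try {
      zip.putNextEntry(new ZipEntry(entryName))
      zip.write(content.getBytes(StandardCharsets.UTF_8))
      zip.closeEntry()
    } finally zip.close()
    bytes.toByteArray
  }

  // Read the text of the first entry; the stream reports end-of-input at the entry boundary.
  def readFirstEntry(archive: Array[Byte]): String = {
    val zip = new ZipInputStream(new ByteArrayInputStream(archive))
    try {
      val entry = zip.getNextEntry
      require(entry != null, "archive contains no entries")
      Source.fromInputStream(zip, "UTF-8").mkString
    } finally zip.close()
  }
}
```

Because the whole entry is consumed as one stream, a quoted field that spans 
several lines stays intact, which is why the multi-line mode is a natural fit.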


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MaxGekk/spark-1 read-zipped-csv-json

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22528.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22528


commit a926d277e0cecb4d2d66e6500a68e656da6e1d2f
Author: Maxim Gekk 
Date:   2018-09-22T19:49:44Z

Support zip archives

commit 29716248b1ef504ab828c6b8af8ac78f1013923a
Author: Maxim Gekk 
Date:   2018-09-22T19:49:59Z

Add test for zipped CSV files

commit 149e452d17cffecb024c29771dc05322295ba437
Author: Maxim Gekk 
Date:   2018-09-22T19:52:18Z

Fix imports

commit 1dff39eb7e06435551ab7ba0d0443b106e60e4b6
Author: Maxim Gekk 
Date:   2018-09-22T19:57:10Z

Added a test for zipped JSON

commit 09dff81b34600c05a3b30a135c32e9dcd40e5bae
Author: Maxim Gekk 
Date:   2018-09-22T19:58:56Z

Refactoring of the CSV test

commit 5fda51a3505437c4a32f146940a908cd1557bbf5
Author: Maxim Gekk 
Date:   2018-09-22T20:02:37Z

Make extension checking case agnostic




---




[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22484
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96473/
Test PASSed.


---




[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22484
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

2018-09-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22484
  
**[Test build #96473 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96473/testReport)**
 for PR 22484 at commit 
[`42230b6`](https://github.com/apache/spark/commit/42230b6e3edb731eb69b3b8800805805e2234d10).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait RunBenchmarkWithCodegen extends BenchmarkBase `


---




[GitHub] spark issue #22326: [SPARK-25314][SQL] Fix Python UDF accessing attributes f...

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22326
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96474/
Test FAILed.


---




[GitHub] spark issue #22326: [SPARK-25314][SQL] Fix Python UDF accessing attributes f...

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22326
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22326: [SPARK-25314][SQL] Fix Python UDF accessing attributes f...

2018-09-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22326
  
**[Test build #96474 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96474/testReport)**
 for PR 22326 at commit 
[`e7c1aee`](https://github.com/apache/spark/commit/e7c1aeecff433ecdd272a9e2a85567d438152722).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  case class HandlePythonUDFInJoinCondition(conf: SQLConf) extends 
Rule[LogicalPlan] `


---




[GitHub] spark issue #22407: [SPARK-25416][SQL] ArrayPosition function may return inc...

2018-09-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22407
  
**[Test build #96475 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96475/testReport)**
 for PR 22407 at commit 
[`55d4b95`](https://github.com/apache/spark/commit/55d4b950951892f3a239f960feadbe1a25198659).


---




[GitHub] spark issue #22407: [SPARK-25416][SQL] ArrayPosition function may return inc...

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22407
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3382/
Test PASSed.


---




[GitHub] spark issue #22527: [SPARK-17952][SQL] Nested Java beans support in createDa...

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22527
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #22407: [SPARK-25416][SQL] ArrayPosition function may return inc...

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22407
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #22527: [SPARK-17952][SQL] Nested Java beans support in c...

2018-09-22 Thread michalsenkyr
GitHub user michalsenkyr opened a pull request:

https://github.com/apache/spark/pull/22527

[SPARK-17952][SQL] Nested Java beans support in createDataFrame

## What changes were proposed in this pull request?

When constructing a DataFrame from a Java bean, using nested beans throws 
an error despite 
[documentation](http://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection)
 stating otherwise. This PR aims to add that support.

This PR does not yet add nested beans support in array or List fields. This 
can be added later or in another PR.

## How was this patch tested?

Nested bean was added to the appropriate unit test.

Also manually tested in Spark shell on code emulating the referenced JIRA:

```
scala> import scala.beans.BeanProperty
import scala.beans.BeanProperty

scala> class SubCategory(@BeanProperty var id: String, @BeanProperty var 
name: String) extends Serializable
defined class SubCategory

scala> class Category(@BeanProperty var id: String, @BeanProperty var 
subCategory: SubCategory) extends Serializable
defined class Category

scala> import scala.collection.JavaConverters._
import scala.collection.JavaConverters._

scala> spark.createDataFrame(Seq(new Category("s-111", new 
SubCategory("sc-111", "Sub-1"))).asJava, classOf[Category])
java.lang.IllegalArgumentException: The value (SubCategory@65130cf2) of the 
type (SubCategory) cannot be converted to struct
  at 
org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:262)
  at 
org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:238)
  at 
org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:103)
  at 
org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:396)
  at 
org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:1108)
  at 
org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:1108)
  at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
  at 
org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1.apply(SQLContext.scala:1108)
  at 
org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1.apply(SQLContext.scala:1106)
  at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
  at scala.collection.Iterator$class.toStream(Iterator.scala:1320)
  at scala.collection.AbstractIterator.toStream(Iterator.scala:1334)
  at scala.collection.TraversableOnce$class.toSeq(TraversableOnce.scala:298)
  at scala.collection.AbstractIterator.toSeq(Iterator.scala:1334)
  at 
org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:423)
  ... 51 elided
```

New behavior:

```
scala> spark.createDataFrame(Seq(new Category("s-111", new 
SubCategory("sc-111", "Sub-1"))).asJava, classOf[Category])
res0: org.apache.spark.sql.DataFrame = [id: string, subCategory: struct]

scala> res0.show()
+-----+---------------+
|   id|    subCategory|
+-----+---------------+
|s-111|[sc-111, Sub-1]|
+-----+---------------+
```



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/michalsenkyr/spark SPARK-17952

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22527.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22527


commit ccea758b069c4622e9b1f71b92167c81cfcd81b8
Author: Michal Senkyr 
Date:   2018-09-22T18:25:36Z

Add nested Java beans support to SQLContext.beansToRow




---




[GitHub] spark issue #21816: [SPARK-24794][CORE] Driver launched through rest should ...

2018-09-22 Thread bsikander
Github user bsikander commented on the issue:

https://github.com/apache/spark/pull/21816
  
Could someone please have a look at this?


---




[GitHub] spark pull request #22467: [SPARK-25465][TEST] Refactor Parquet test suites ...

2018-09-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22467


---




[GitHub] spark issue #22522: [SPARK-25510][TEST] Create new trait replace BenchmarkWi...

2018-09-22 Thread gengliangwang
Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/22522
  
@wangyum I have left my comment in 
https://github.com/apache/spark/pull/22484 .
Also, should we close this one and move to 
https://github.com/apache/spark/pull/22484 ?


---




[GitHub] spark pull request #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark t...

2018-09-22 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/22484#discussion_r219676161
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala
 ---
@@ -34,621 +34,508 @@ import org.apache.spark.unsafe.map.BytesToBytesMap
 
 /**
  * Benchmark to measure performance for aggregate primitives.
- * To run this:
- *  build/sbt "sql/test-only *benchmark.AggregateBenchmark"
- *
- * Benchmarks in this file are skipped in normal builds.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt: bin/spark-submit --class  
+ *   2. build/sbt "sql/test:runMain "
+ *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt 
"sql/test:runMain "
+ *  Results will be written to 
"benchmarks/AggregateBenchmark-results.txt".
+ * }}}
  */
-class AggregateBenchmark extends BenchmarkWithCodegen {
+object AggregateBenchmark extends RunBenchmarkWithCodegen {
 
-  ignore("aggregate without grouping") {
-val N = 500L << 22
-val benchmark = new Benchmark("agg without grouping", N)
-runBenchmark("agg w/o group", N) {
-  sparkSession.range(N).selectExpr("sum(id)").collect()
+  override def benchmark(): Unit = {
+runBenchmark("aggregate without grouping") {
+  val N = 500L << 22
+  runBenchmark("agg w/o group", N) {
--- End diff --

The `runBenchmark` here is different from the one in line 48, but they have 
the same name. We should use a different name.


---




[GitHub] spark issue #22326: [SPARK-25314][SQL] Fix Python UDF accessing attributes f...

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22326
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22326: [SPARK-25314][SQL] Fix Python UDF accessing attributes f...

2018-09-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22326
  
**[Test build #96474 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96474/testReport)**
 for PR 22326 at commit 
[`e7c1aee`](https://github.com/apache/spark/commit/e7c1aeecff433ecdd272a9e2a85567d438152722).


---




[GitHub] spark issue #22326: [SPARK-25314][SQL] Fix Python UDF accessing attributes f...

2018-09-22 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/22326
  
@cloud-fan Many thanks for your offline guidance. As we discussed, I 
reimplemented this by adding a new rule `HandlePythonUDFInJoinCondition` in the 
Analyzer and reverted all the earlier changes in `Optimizer`. 


---




[GitHub] spark issue #22326: [SPARK-25314][SQL] Fix Python UDF accessing attributes f...

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22326
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3381/
Test PASSed.


---




[GitHub] spark pull request #22326: [SPARK-25314][SQL] Fix Python UDF accessing attri...

2018-09-22 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/22326#discussion_r219675105
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -995,7 +995,8 @@ class Dataset[T] private[sql](
 // After the cloning, left and right side will have distinct 
expression ids.
 val plan = withPlan(
   Join(logicalPlan, right.logicalPlan, JoinType(joinType), 
Some(joinExprs.expr)))
-  .queryExecution.analyzed.asInstanceOf[Join]
+  .queryExecution.analyzed
+val joinPlan = plan.collectFirst { case j: Join => j }.get
--- End diff --

For reviewers: we need this change because the rule 
`HandlePythonUDFInJoinCondition` breaks the assumption that analyzing the join 
plan always returns a `Join`. After we add the rule for handling Python 
UDFs, we'll add a Filter or Project node on top of the Join.
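
To make the point concrete, here is a small self-contained sketch with a toy 
plan tree. The `*Sketch` classes are hypothetical stand-ins, not Spark's 
`LogicalPlan` nodes; the `collectFirst` here is a pre-order search written to 
resemble the one used in the diff above.

```scala
// Toy plan tree; hypothetical stand-ins for Spark's logical plan nodes.
sealed trait PlanSketch {
  def children: Seq[PlanSketch]

  // Pre-order search: return the first node the partial function accepts.
  def collectFirst[B](pf: PartialFunction[PlanSketch, B]): Option[B] =
    pf.lift(this).orElse(
      children.foldLeft(Option.empty[B])((found, child) => found.orElse(child.collectFirst(pf))))
}

final case class RelationSketch(name: String) extends PlanSketch {
  def children: Seq[PlanSketch] = Seq.empty
}

final case class JoinSketch(left: PlanSketch, right: PlanSketch) extends PlanSketch {
  def children: Seq[PlanSketch] = Seq(left, right)
}

final case class FilterSketch(condition: String, child: PlanSketch) extends PlanSketch {
  def children: Seq[PlanSketch] = Seq(child)
}

object CollectFirstDemo {
  val join: JoinSketch = JoinSketch(RelationSketch("a"), RelationSketch("b"))
  // Assumed shape after the new rule: a Filter sits on top of the Join.
  val analyzed: PlanSketch = FilterSketch("udf(a.x, b.y)", join)

  // A cast of the root to JoinSketch would fail here; searching the tree still finds the join.
  val found: Option[JoinSketch] = analyzed.collectFirst { case j: JoinSketch => j }
}
```

In this toy setup the root is no longer the join, so a direct cast of the 
analyzed plan would throw, while the search still returns the join node.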


---




[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22484
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3380/
Test PASSed.


---




[GitHub] spark pull request #22522: [SPARK-25510][TEST] Create new trait replace Benc...

2018-09-22 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22522#discussion_r219674900
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/RunBenchmarkWithCodegen.scala
 ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.SparkSession
+
+/**
+ * Common base trait for micro benchmarks that are supposed to run 
standalone (i.e. not together
+ * with other benchmarks).
+ */
+private[benchmark] trait RunBenchmarkWithCodegen {
--- End diff --

Done


---




[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22484
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22522: [SPARK-25510][TEST] Create new trait replace BenchmarkWi...

2018-09-22 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/22522
  
Thanks @cloud-fan, I have migrated 
[`AggregateBenchmark`](https://github.com/apache/spark/pull/22484/files) to use 
the new trait.
 


---




[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

2018-09-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22484
  
**[Test build #96473 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96473/testReport)**
 for PR 22484 at commit 
[`42230b6`](https://github.com/apache/spark/commit/42230b6e3edb731eb69b3b8800805805e2234d10).


---




[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-09-22 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/19868
  
Sure, updated.


---




[GitHub] spark issue #22525: [SPARK-25503][WEBUI] Total task message in stage page is...

2018-09-22 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/22525
  
cc @vanzin 


---




[GitHub] spark issue #22526: [SPARK-25502][WEBUI]Empty Page when page number exceeds ...

2018-09-22 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/22526
  
cc @vanzin 


---




[GitHub] spark issue #22526: [SPARK-25502]Empty Page when page number exceeds the rea...

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22526
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #22526: [SPARK-25502]Empty Page when page number exceeds ...

2018-09-22 Thread shahidki31
GitHub user shahidki31 opened a pull request:

https://github.com/apache/spark/pull/22526

[SPARK-25502]Empty Page when page number exceeds the reatinedTask size.

## What changes were proposed in this pull request?
Test steps :
1)  bin/spark-shell --conf spark.ui.retainedTasks=200
2) val rdd = sc.parallelize(1 to 1000, 1000)
3) rdd.count

The Stage tab in the UI will display 10 pages with 100 tasks per page. But 
the number of retained tasks is only 200, so pages from the 3rd onwards 
display nothing. 
We have to calculate the total number of pages based on the number of tasks 
that need to be displayed in the UI. 
**Before the change:**

![empty_4](https://user-images.githubusercontent.com/23054875/45918251-b1650580-bea1-11e8-90d3-7e0d491981a2.jpg)

**After the change:**

![empty_3](https://user-images.githubusercontent.com/23054875/45918257-c2ae1200-bea1-11e8-960f-dfbdb4a90ae7.jpg)



## How was this patch tested?

Manually tested

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shahidki31/spark SPARK-25502

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22526.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22526


commit 6204cbe46b99cc6d897dbcebec81e89b369d58d2
Author: Shahid 
Date:   2018-09-22T14:07:22Z

[SPARK-25502]Empty Page when page number exceeds the reatinedTask size.




---




[GitHub] spark issue #22518: [SPARK-25482][SQL] ReuseSubquery can be useless when the...

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22518
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] ReuseSubquery can be useless when the...

2018-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22518
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96472/
Test PASSed.


---



