date:20180629

[GitHub] spark issue #21623: [SPARK-24638][SQL] StringStartsWith support push down

2018-06-29 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21623
  
thanks, merging to master!

@dongjoon-hyun is there something similar in ORC?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21675: [SPARK-24698][PYTHON]: Fixed typo in pyspark.ml's Identi...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21675
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92495/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21675: [SPARK-24698][PYTHON]: Fixed typo in pyspark.ml's Identi...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21675
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21675: [SPARK-24698][PYTHON]: Fixed typo in pyspark.ml's Identi...

2018-06-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21675
  
**[Test build #92495 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92495/testReport)**
 for PR 21675 at commit 
[`d1e44b7`](https://github.com/apache/spark/commit/d1e44b7392f3c4523a12748db10acc310676ceb6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21675: [SPARK-24698][PYTHON]: Fixed typo in pyspark.ml's Identi...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21675
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92494/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21675: [SPARK-24698][PYTHON]: Fixed typo in pyspark.ml's Identi...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21675
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21675: [SPARK-24698][PYTHON]: Fixed typo in pyspark.ml's Identi...

2018-06-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21675
  
**[Test build #92494 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92494/testReport)**
 for PR 21675 at commit 
[`d1e44b7`](https://github.com/apache/spark/commit/d1e44b7392f3c4523a12748db10acc310676ceb6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21675: [SPARK-24698][PYTHON]: Fixed typo in pyspark.ml's Identi...

2018-06-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21675
  
**[Test build #92495 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92495/testReport)**
 for PR 21675 at commit 
[`d1e44b7`](https://github.com/apache/spark/commit/d1e44b7392f3c4523a12748db10acc310676ceb6).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spark on K8...

2018-06-29 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/21669
  
btw, have you sent out this + doc to d...@spark.apache.org?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21675: [SPARK-24698][PYTHON]: Fixed typo in pyspark.ml's Identi...

2018-06-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21675
  
**[Test build #92494 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92494/testReport)**
 for PR 21675 at commit 
[`d1e44b7`](https://github.com/apache/spark/commit/d1e44b7392f3c4523a12748db10acc310676ceb6).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21675: [SPARK-24698][PYTHON]: Fixed typo in pyspark.ml's Identi...

2018-06-29 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/21675
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21666: [SPARK-24535][SPARKR] fix tests on java check err...

2018-06-29 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21666#discussion_r199313370
  
--- Diff: R/pkg/R/client.R ---
@@ -61,6 +61,11 @@ generateSparkSubmitArgs <- function(args, sparkHome, 
jars, sparkSubmitOpts, pack
 }
 
 checkJavaVersion <- function() {
+  if (is_windows()) {
+# See SPARK-24535
--- End diff --

it would - I guess we need to root cause the reason why checkJavaVersion() 
fails on Windows/jdk 8 - otherwise sparkR.session() would just fail and no way 
to continue.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21649: [SPARK-23648][R][SQL]Adds more types for hint in ...

2018-06-29 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21649#discussion_r199313334
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -3905,6 +3905,18 @@ setMethod("rollup",
 groupedData(sgd)
   })
 
+isTypeAllowed <- function(x) {
+  if (is.character(x)) {
+TRUE
+  } else if (is.list(x)) {
--- End diff --

I'd actually prefer to be more restrictive.. maybe just one level, unless 
we know of a real need for nested list?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21221
  
Build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21221
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92490/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-06-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21221
  
**[Test build #92490 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92490/testReport)**
 for PR 21221 at commit 
[`8d9acdf`](https://github.com/apache/spark/commit/8d9acdf32984c0c9c621a058b45805872bb9e4c5).
 * This patch passes all tests.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21677: [SPARK-24692][TESTS] Improvement FilterPushdownBenchmark

2018-06-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21677
  
**[Test build #92493 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92493/testReport)**
 for PR 21677 at commit 
[`616933e`](https://github.com/apache/spark/commit/616933e3759739dcdae2140f5c58b659c943ab7f).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21677: [SPARK-24692][TESTS] Improvement FilterPushdownBenchmark

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21677
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/597/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21677: [SPARK-24692][TESTS] Improvement FilterPushdownBenchmark

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21677
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21677: [SPARK-24692][TESTS] Improvement FilterPushdownBenchmark

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21677
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21677: [SPARK-24692][TESTS] Improvement FilterPushdownBenchmark

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21677
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92491/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21677: [SPARK-24692][TESTS] Improvement FilterPushdownBenchmark

2018-06-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21677
  
**[Test build #92491 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92491/testReport)**
 for PR 21677 at commit 
[`ccdd21c`](https://github.com/apache/spark/commit/ccdd21cfa75f8577b5f8093c8e0b1eba6aa2e055).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class FilterPushdownBenchmark extends SparkFunSuite with 
BenchmarkBeforeAndAfterEachTest `
  * `trait BenchmarkBeforeAndAfterEachTest extends 
BeforeAndAfterEachTestData `


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21603: [SPARK-17091][SQL] Add rule to convert IN predicate to e...

2018-06-29 Thread wangyum

Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/21603
  
Benchmark result:
```
##[ Pushdown benchmark for InSet -> InFilters 
]##
Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

InSet -> InFilters (threshold: 10, values count: 5, distribution: 10): 
Best/Avg Time(ms)Rate(M/s)   Per Row(ns)   Relative


Parquet Vectorized7649 / 7678  2.1  
   486.3   1.0X
Parquet Vectorized (Pushdown)  316 /  325 49.8  
20.1  24.2X
Native ORC Vectorized 6787 / 7353  2.3  
   431.5   1.1X
Native ORC Vectorized (Pushdown)  1010 / 1020 15.6  
64.2   7.6X

InSet -> InFilters (threshold: 10, values count: 5, distribution: 50): 
Best/Avg Time(ms)Rate(M/s)   Per Row(ns)   Relative


Parquet Vectorized7537 / 7944  2.1  
   479.2   1.0X
Parquet Vectorized (Pushdown)  297 /  306 52.9  
18.9  25.3X
Native ORC Vectorized 6768 / 6779  2.3  
   430.3   1.1X
Native ORC Vectorized (Pushdown)   998 / 1017 15.8  
63.4   7.6X

InSet -> InFilters (threshold: 10, values count: 5, distribution: 90): 
Best/Avg Time(ms)Rate(M/s)   Per Row(ns)   Relative


Parquet Vectorized7500 / 7592  2.1  
   476.8   1.0X
Parquet Vectorized (Pushdown)  299 /  306 52.5  
19.0  25.1X
Native ORC Vectorized 6758 / 6797  2.3  
   429.7   1.1X
Native ORC Vectorized (Pushdown)   982 /  993 16.0  
62.4   7.6X

InSet -> InFilters (threshold: 10, values count: 10, distribution: 10): 
Best/Avg Time(ms)Rate(M/s)   Per Row(ns)   Relative


Parquet Vectorized7566 / 8153  2.1  
   481.1   1.0X
Parquet Vectorized (Pushdown)  319 /  328 49.3  
20.3  23.7X
Native ORC Vectorized 6761 / 6812  2.3  
   429.8   1.1X
Native ORC Vectorized (Pushdown)   995 / 1013 15.8  
63.3   7.6X

InSet -> InFilters (threshold: 10, values count: 10, distribution: 50): 
Best/Avg Time(ms)Rate(M/s)   Per Row(ns)   Relative


Parquet Vectorized7512 / 7581  2.1  
   477.6   1.0X
Parquet Vectorized (Pushdown)  315 /  322 50.0  
20.0  23.9X
Native ORC Vectorized 6712 / 6774  2.3  
   426.8   1.1X
Native ORC Vectorized (Pushdown)  1001 / 1032 15.7  
63.6   7.5X

InSet -> InFilters (threshold: 10, values count: 10, distribution: 90): 
Best/Avg Time(ms)Rate(M/s)   Per Row(ns)   Relative


Parquet Vectorized7603 / 7689  2.1  
   483.4   1.0X
Parquet Vectorized (Pushdown)  308 /  317 51.0  
19.6  24.7X
Native ORC Vectorized 7011 / 7605  2.2  
   445.7   1.1X
Native ORC Vectorized (Pushdown)  1038 / 1067 15.2  
66.0   7.3X

InSet -> InFilters (threshold: 10, values count: 50, distribution: 10): 
Best/Avg Time(ms)Rate(M/s)   Per Row(ns)   Relative


Parquet Vectorized7750 / 7796  2.0  
   492.7   1.0X
Parquet Vectorized (Pushdown) 7855 / 7961  2.0  
   499.4   1.0X
Native ORC Vectorized 7120 / 7820  2.2  
   452.7   1.1X
Native ORC Vectorized (Pushdown)  1085 / 1122 14.5  
69.0   7.1X

InSet -> InFilters (threshold: 10, values count: 50, distribution: 50): 
Best/Avg Time(ms)Rate(M/s)   Per Row(ns)   Relative

[GitHub] spark issue #21678: [SPARK-23461][R]vignettes should include model predictio...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21678
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21678: [SPARK-23461][R]vignettes should include model predictio...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21678
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92492/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21678: [SPARK-23461][R]vignettes should include model predictio...

2018-06-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21678
  
**[Test build #92492 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92492/testReport)**
 for PR 21678 at commit 
[`85ad256`](https://github.com/apache/spark/commit/85ad2563a9a84aacd80e2fe1833042e50c33f08c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21678: [SPARK-23461][R]vignettes should include model predictio...

2018-06-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21678
  
**[Test build #92492 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92492/testReport)**
 for PR 21678 at commit 
[`85ad256`](https://github.com/apache/spark/commit/85ad2563a9a84aacd80e2fe1833042e50c33f08c).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21678: [SPARK-23461][R]vignettes should include model predictio...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21678
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21678: [SPARK-23461][R]vignettes should include model predictio...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21678
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/596/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21678: [SPARK-23461][R]vignettes should include model pr...

2018-06-29 Thread huaxingao

GitHub user huaxingao opened a pull request:

https://github.com/apache/spark/pull/21678

[SPARK-23461][R]vignettes should include model predictions for some ML 
models

## What changes were proposed in this pull request?

Add model predictions for Linear Support Vector Machine (SVM) Classifier, 
Logistic Regression, GBT, RF and DecisionTree in vignettes.

## How was this patch tested?

Manually ran the test and checked the result.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/huaxingao/spark spark-23461

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21678.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21678


commit 85ad2563a9a84aacd80e2fe1833042e50c33f08c
Author: Huaxin Gao 
Date:   2018-06-30T02:28:27Z

[SPARK-23461][R]vignettes should include model predictions for some ML 
models




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21674: [SPARK-24696][SQL] ColumnPruning rule fails to re...

2018-06-29 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21674#discussion_r199310542
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala ---
@@ -149,6 +149,7 @@ package object dsl {
   }
 }
 
+def rand(e: Long): Expression = Rand(Literal.create(e, LongType))
--- End diff --

I mean: `def rand(e: Long): Expression = Rand(e)`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21674: [SPARK-24696][SQL] ColumnPruning rule fails to re...

2018-06-29 Thread maryannxue

Github user maryannxue commented on a diff in the pull request:

https://github.com/apache/spark/pull/21674#discussion_r199310515
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala ---
@@ -149,6 +149,7 @@ package object dsl {
   }
 }
 
+def rand(e: Long): Expression = Rand(Literal.create(e, LongType))
--- End diff --

Since we already have a bunch of expressions here, I don't think it would 
hurt to add this one?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21546
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21546
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92487/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21546
  
**[Test build #92487 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92487/testReport)**
 for PR 21546 at commit 
[`8b814b7`](https://github.com/apache/spark/commit/8b814b76d48e9727fafcd85723949142a3658617).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21674: [SPARK-24696][SQL] ColumnPruning rule fails to re...

2018-06-29 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21674#discussion_r199309806
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
---
@@ -2792,4 +2792,25 @@ class SQLQuerySuite extends QueryTest with 
SharedSQLContext {
   }
 }
   }
+
+  test("SPARK-24696 ColumnPruning rule fails to remove extra Project") {
--- End diff --

The test in Jira is simpler than this. Do we need to have two tables and a 
join?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21674: [SPARK-24696][SQL] ColumnPruning rule fails to re...

2018-06-29 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21674#discussion_r199309687
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala ---
@@ -149,6 +149,7 @@ package object dsl {
   }
 }
 
+def rand(e: Long): Expression = Rand(Literal.create(e, LongType))
--- End diff --

We can just use `Rand(seed: Long)`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21674: [SPARK-24696][SQL] ColumnPruning rule fails to remove ex...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21674
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92486/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21674: [SPARK-24696][SQL] ColumnPruning rule fails to remove ex...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21674
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21674: [SPARK-24696][SQL] ColumnPruning rule fails to remove ex...

2018-06-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21674
  
**[Test build #92486 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92486/testReport)**
 for PR 21674 at commit 
[`f45a8b8`](https://github.com/apache/spark/commit/f45a8b8705eb6f7b40141ff528ce9c70a74da136).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21674: [SPARK-24696][SQL] ColumnPruning rule fails to remove ex...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21674
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92485/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21674: [SPARK-24696][SQL] ColumnPruning rule fails to remove ex...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21674
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21674: [SPARK-24696][SQL] ColumnPruning rule fails to remove ex...

2018-06-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21674
  
**[Test build #92485 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92485/testReport)**
 for PR 21674 at commit 
[`11fde8b`](https://github.com/apache/spark/commit/11fde8ba4b64416d863a69c5587c0db67ea61d0a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21623: [SPARK-24638][SQL] StringStartsWith support push down

2018-06-29 Thread wangyum

Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/21623
  
Benchmark result:
```
###[ Pushdown benchmark for StringStartsWith 
]###
Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

StringStartsWith filter: (value like '10%'): Best/Avg Time(ms)Rate(M/s) 
  Per Row(ns)   Relative


Parquet Vectorized  10104 / 11125  1.6  
   642.4   1.0X
Parquet Vectorized (Pushdown) 3002 / 3608  5.2  
   190.8   3.4X
Native ORC Vectorized9589 / 10454  1.6  
   609.7   1.1X
Native ORC Vectorized (Pushdown) 9798 / 10509  1.6  
   622.9   1.0X

StringStartsWith filter: (value like '1000%'): Best/Avg Time(ms)
Rate(M/s)   Per Row(ns)   Relative


Parquet Vectorized8437 / 8563  1.9  
   536.4   1.0X
Parquet Vectorized (Pushdown)  279 /  289 56.3  
17.8  30.2X
Native ORC Vectorized 7354 / 7568  2.1  
   467.5   1.1X
Native ORC Vectorized (Pushdown)  7730 / 7972  2.0  
   491.4   1.1X

StringStartsWith filter: (value like '786432%'): Best/Avg Time(ms)
Rate(M/s)   Per Row(ns)   Relative


Parquet Vectorized8290 / 8510  1.9  
   527.0   1.0X
Parquet Vectorized (Pushdown)  260 /  272 60.5  
16.5  31.9X
Native ORC Vectorized 7361 / 7395  2.1  
   468.0   1.1X
Native ORC Vectorized (Pushdown)  7694 / 7811  2.0  
   489.2   1.1X
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21673: SPARK-24697: Fix the reported start offsets in streaming...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21673
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92484/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21673: SPARK-24697: Fix the reported start offsets in streaming...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21673
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21673: SPARK-24697: Fix the reported start offsets in streaming...

2018-06-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21673
  
**[Test build #92484 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92484/testReport)**
 for PR 21673 at commit 
[`24d75ea`](https://github.com/apache/spark/commit/24d75eaadfafdd668cd30d2de3bf463ce775cd69).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21462: [SPARK-24428][K8S] Fix unused code

2018-06-29 Thread skonto

Github user skonto commented on a diff in the pull request:

https://github.com/apache/spark/pull/21462#discussion_r199307562
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterManager.scala
 ---
@@ -46,8 +46,6 @@ private[spark] class KubernetesClusterManager extends 
ExternalClusterManager wit
   sc: SparkContext,
   masterURL: String,
   scheduler: TaskScheduler): SchedulerBackend = {
-val executorSecretNamesToMountPaths = 
KubernetesUtils.parsePrefixedKeyValuePairs(
-  sc.conf, KUBERNETES_EXECUTOR_SECRETS_PREFIX)
--- End diff --

@mccheah I think the logic has changed otherwise the tests I have in this 
PR(https://github.com/apache/spark/pull/21652)  would have failed when removed 
that part and re-run them.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21677: [SPARK-24692][TESTS] Improvement FilterPushdownBenchmark

2018-06-29 Thread wangyum

Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/21677
  
cc @maropu 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21556: [SPARK-24549][SQL] Support Decimal type push down to the...

2018-06-29 Thread wangyum

Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/21556
  
Benchmark results:
```
###[ Pushdown benchmark for Decimal 
]
Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

Select 1 decimal(9, 2) row (value = 7864320): Best/Avg Time(ms)
Rate(M/s)   Per Row(ns)   Relative


Parquet Vectorized4004 / 5309  3.9  
   254.5   1.0X
Parquet Vectorized (Pushdown) 1401 / 1431 11.2  
89.1   2.9X
Native ORC Vectorized 4499 / 4567  3.5  
   286.0   0.9X
Native ORC Vectorized (Pushdown)   899 /  961 17.5  
57.2   4.5X

Select 10% decimal(9, 2) rows (value < 1572864): Best/Avg Time(ms)
Rate(M/s)   Per Row(ns)   Relative


Parquet Vectorized5376 / 6437  2.9  
   341.8   1.0X
Parquet Vectorized (Pushdown) 2696 / 2754  5.8  
   171.4   2.0X
Native ORC Vectorized 5458 / 5623  2.9  
   347.0   1.0X
Native ORC Vectorized (Pushdown)  2230 / 2255  7.1  
   141.8   2.4X

Select 50% decimal(9, 2) rows (value < 7864320): Best/Avg Time(ms)
Rate(M/s)   Per Row(ns)   Relative


Parquet Vectorized8280 / 8487  1.9  
   526.4   1.0X
Parquet Vectorized (Pushdown) 7716 / 7757  2.0  
   490.6   1.1X
Native ORC Vectorized 9144 / 9495  1.7  
   581.4   0.9X
Native ORC Vectorized (Pushdown)  7918 / 8118  2.0  
   503.4   1.0X

Select 90% decimal(9, 2) rows (value < 14155776): Best/Avg Time(ms)
Rate(M/s)   Per Row(ns)   Relative


Parquet Vectorized9648 / 9676  1.6  
   613.4   1.0X
Parquet Vectorized (Pushdown) 9647 / 9778  1.6  
   613.3   1.0X
Native ORC Vectorized   10782 / 10867  1.5  
   685.5   0.9X
Native ORC Vectorized (Pushdown)10108 / 10269  1.6  
   642.6   1.0X

Select 1 decimal(18, 2) row (value = 7864320): Best/Avg Time(ms)
Rate(M/s)   Per Row(ns)   Relative


Parquet Vectorized4066 / 4147  3.9  
   258.5   1.0X
Parquet Vectorized (Pushdown)   84 /   89188.0  
 5.3  48.6X
Native ORC Vectorized 5430 / 5512  2.9  
   345.3   0.7X
Native ORC Vectorized (Pushdown)  1054 / 1076 14.9  
67.0   3.9X

Select 10% decimal(18, 2) rows (value < 1572864): Best/Avg Time(ms)
Rate(M/s)   Per Row(ns)   Relative


Parquet Vectorized5028 / 5154  3.1  
   319.7   1.0X
Parquet Vectorized (Pushdown) 1360 / 1421 11.6  
86.5   3.7X
Native ORC Vectorized 6266 / 6360  2.5  
   398.4   0.8X
Native ORC Vectorized (Pushdown)  2513 / 2550  6.3  
   159.8   2.0X

Select 50% decimal(18, 2) rows (value < 7864320): Best/Avg Time(ms)
Rate(M/s)   Per Row(ns)   Relative


Parquet Vectorized8571 / 8600  1.8  
   544.9   1.0X
Parquet Vectorized (Pushdown) 6455 / 6713  2.4  
   410.4   1.3X
Native ORC Vectorized   10138 / 10353  1.6  
   644.5   0.8X
Native ORC Vectorized (Pushdown)  8166 / 8418  1.9  
   519.2   1.0X

Select 90% decimal(18, 2) rows (value < 14155776): Best/Avg Time(ms)
Rate(M/s)   Per Row(ns)   Relative


Parquet Vectorized  12184 / 12253  1.3

[GitHub] spark issue #21677: [SPARK-24692][TESTS] Improvement FilterPushdownBenchmark

2018-06-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21677
  
**[Test build #92491 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92491/testReport)**
 for PR 21677 at commit 
[`ccdd21c`](https://github.com/apache/spark/commit/ccdd21cfa75f8577b5f8093c8e0b1eba6aa2e055).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21677: [SPARK-24692][TESTS] Improvement FilterPushdownBenchmark

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21677
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21677: [SPARK-24692][TESTS] Improvement FilterPushdownBenchmark

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21677
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/595/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21677: [SPARK-24692][TESTS] Improvement FilterPushdownBe...

2018-06-29 Thread wangyum

GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/21677

[SPARK-24692][TESTS] Improvement FilterPushdownBenchmark

## What changes were proposed in this pull request?

1. Write the result to `benchmarks/FilterPushdownBenchmark-results.txt` for 
easy maintenance.
2. Add more benchmark case: `StringStartsWith`, `Decimal` and `InSet -> 
InFilters`.

## How was this patch tested?

manual tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-24692

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21677.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21677


commit ccdd21cfa75f8577b5f8093c8e0b1eba6aa2e055
Author: Yuming Wang 
Date:   2018-06-30T00:22:16Z

Improvement FilterPushdownBenchmark




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-06-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21221
  
**[Test build #92490 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92490/testReport)**
 for PR 21221 at commit 
[`8d9acdf`](https://github.com/apache/spark/commit/8d9acdf32984c0c9c621a058b45805872bb9e4c5).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21676: [SPARK-24699][SS][WIP] Watermark / Append mode should wo...

2018-06-29 Thread c-horn

Github user c-horn commented on the issue:

https://github.com/apache/spark/pull/21676
  
@tdas @marmbrus


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21672: [SPARK-24694][K8S] Pass all app args to integration test...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21672
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21672: [SPARK-24694][K8S] Pass all app args to integration test...

2018-06-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21672
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/594/



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21672: [SPARK-24694][K8S] Pass all app args to integration test...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21672
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/594/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21672: [SPARK-24694][K8S] Pass all app args to integration test...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21672
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92489/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21672: [SPARK-24694][K8S] Pass all app args to integration test...

2018-06-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21672
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/594/



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21672: [SPARK-24694][K8S] Pass all app args to integration test...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21672
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21672: [SPARK-24694][K8S] Pass all app args to integration test...

2018-06-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21672
  
**[Test build #92489 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92489/testReport)**
 for PR 21672 at commit 
[`92d8292`](https://github.com/apache/spark/commit/92d8292deed6de8e160fdb82adbcd4e9d5d00a48).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21672: [SPARK-24694][K8S] Pass all app args to integration test...

2018-06-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21672
  
**[Test build #92489 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92489/testReport)**
 for PR 21672 at commit 
[`92d8292`](https://github.com/apache/spark/commit/92d8292deed6de8e160fdb82adbcd4e9d5d00a48).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21672: [SPARK-24694][K8S] Pass all app args to integration test...

2018-06-29 Thread skonto

Github user skonto commented on the issue:

https://github.com/apache/spark/pull/21672
  
@vanzin @ssuchter removed the condition, I think its ok now.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21674: [SPARK-24696][SQL] ColumnPruning rule fails to remove ex...

2018-06-29 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21674
  
LGTM, too.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21662: [SPARK-24662][SQL][SS] Support limit in structure...

2018-06-29 Thread mukulmurthy

Github user mukulmurthy commented on a diff in the pull request:

https://github.com/apache/spark/pull/21662#discussion_r199300670
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/MemorySinkSuite.scala
 ---
@@ -70,35 +68,9 @@ class MemorySinkSuite extends StreamTest with 
BeforeAndAfter {
 checkAnswer(sink.allData, 1 to 9)
   }
 
-  test("directly add data in Append output mode with row limit") {
--- End diff --

I thought about it, but I didn't want to have the feature not working (even 
though no one is probably using it), and figured it would be easier this way 
since there's basically no file overlap between the two (except for a test 
class).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21676: [SPARK-24699][SQL][WIP] Watermark / Append mode should w...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21676
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21676: [SPARK-24699][SQL][WIP] Watermark / Append mode should w...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21676
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21676: [SPARK-24699][SQL][WIP] Watermark / Append mode should w...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21676
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21676: [SPARK-24699][SQL][WIP] Watermark / Append mode s...

2018-06-29 Thread c-horn

GitHub user c-horn opened a pull request:

https://github.com/apache/spark/pull/21676

[SPARK-24699][SQL][WIP] Watermark / Append mode should work with 
Trigger.Once

## What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/SPARK-24699

Structured streaming using `Trigger.Once` does not persist watermark state 
between batches, causing streams to never yield output. I will attach some 
scripts that reproduce the issue in the Jira issue.

It seems like the microbatcher only calculates the watermark off of the 
previous batch's input and emits new aggs based off of that timestamp. I 
believe the issue here is that the previous batch state is not persisted to the 
checkpoint, and therefore cannot be used when the stream is started again with 
`Trigger.Once`.

I will investigate ways of fixing this but I am definitely interested in 
input from anyone who worked on SS.

## How was this patch tested?

Failing unit test provided.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/c-horn/spark SPARK-24699

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21676.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21676


commit 1b42cc4a449248da65402a6ea2112c55a3bb8501
Author: Chris Horn 
Date:   2018-06-29T22:54:45Z

a failing test case




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21672: [SPARK-24694][K8S] Pass all app args to integrati...

2018-06-29 Thread skonto

Github user skonto commented on a diff in the pull request:

https://github.com/apache/spark/pull/21672#discussion_r199297784
  
--- Diff: 
resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesTestComponents.scala
 ---
@@ -106,7 +106,7 @@ private[spark] object SparkAppLauncher extends Logging {
 val sparkSubmitExecutable = sparkHomeDir.resolve(Paths.get("bin", 
"spark-submit"))
 logInfo(s"Launching a spark app with arguments $appArguments and conf 
$appConf")
 val appArgsArray =
-  if (appArguments.appArgs.length > 0) 
Array(appArguments.appArgs.mkString(" "))
--- End diff --

Yes unless you pass a null... I updated my comment above ^^.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21672: [SPARK-24694][K8S] Pass all app args to integrati...

2018-06-29 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21672#discussion_r199297547
  
--- Diff: 
resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesTestComponents.scala
 ---
@@ -106,7 +106,7 @@ private[spark] object SparkAppLauncher extends Logging {
 val sparkSubmitExecutable = sparkHomeDir.resolve(Paths.get("bin", 
"spark-submit"))
 logInfo(s"Launching a spark app with arguments $appArguments and conf 
$appConf")
 val appArgsArray =
-  if (appArguments.appArgs.length > 0) 
Array(appArguments.appArgs.mkString(" "))
--- End diff --

Doesn't this make `appArgsArray` basically redundant?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21221
  
Build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-06-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21221
  
**[Test build #92488 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92488/testReport)**
 for PR 21221 at commit 
[`7ed42a5`](https://github.com/apache/spark/commit/7ed42a5d0eb0b93bb9ddecf14d9461c80dfe1ea0).
 * This patch **fails to build**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21221
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92488/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21672: [SPARK-24694][K8S] Pass all app args to integration test...

2018-06-29 Thread skonto

Github user skonto commented on the issue:

https://github.com/apache/spark/pull/21672
  
@ssuchter yes if you check the jira, the old behavior does not work with 
more than one args. In the future might be a problem.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-06-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21221
  
**[Test build #92488 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92488/testReport)**
 for PR 21221 at commit 
[`7ed42a5`](https://github.com/apache/spark/commit/7ed42a5d0eb0b93bb9ddecf14d9461c80dfe1ea0).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21672: [SPARK-24694][K8S] Pass all app args to integration test...

2018-06-29 Thread ssuchter

Github user ssuchter commented on the issue:

https://github.com/apache/spark/pull/21672
  
BTW, for committers - I think this patch is good to merge.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21672: [SPARK-24694][K8S] Pass all app args to integration test...

2018-06-29 Thread ssuchter

Github user ssuchter commented on the issue:

https://github.com/apache/spark/pull/21672
  
I see why the old behavior was there. I made a minimal change to some 
existing code to fix a bug:


https://github.com/ssuchter/spark/commit/1d8a265d13b65dcec8db11a5be09d4a029037d2c

but this new way is better.

Question: In this new way, do we even need the test for 
appArguments.appArgs.length > 0? Could we just use appArguments.appArgs?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21675: [SPARK-24698][PYTHON]: Fixed typo in pyspark.ml's Identi...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21675
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21675: [SPARK-24698][PYTHON]: Fixed typo in pyspark.ml's Identi...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21675
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21675: [SPARK-24698][PYTHON]: Fixed typo in pyspark.ml's Identi...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21675
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21675: [SPARK-24698][PYTHON]: Fixed typo in pyspark.ml's...

2018-06-29 Thread mcteo

GitHub user mcteo opened a pull request:

https://github.com/apache/spark/pull/21675

[SPARK-24698][PYTHON]: Fixed typo in pyspark.ml's Identifiable class.

## What changes were proposed in this pull request?

Fixed a small typo in the code that caused 20 random characters to be added 
to the UID, rather than 12.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mcteo/spark SPARK-24698-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21675.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21675


commit d1e44b7392f3c4523a12748db10acc310676ceb6
Author: mcteo 
Date:   2018-06-29T22:09:07Z

SPARK-24698: Fixed typo in pyspark.ml's Identifiable class.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21440: [SPARK-24307][CORE] Support reading remote cached...

2018-06-29 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21440#discussion_r199291252
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/io/ChunkedByteBufferFileRegion.scala 
---
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util.io
+
+import java.nio.channels.WritableByteChannel
+
+import io.netty.channel.FileRegion
+import io.netty.util.AbstractReferenceCounted
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.network.util.AbstractFileRegion
+
+
+/**
+ * This exposes a ChunkedByteBuffer as a netty FileRegion, just to allow 
sending > 2gb in one netty
+ * message.  This is because netty cannot send a ByteBuf > 2g, but it can 
send a large FileRegion,
+ * even though the data is not backed by a file.
+ */
+private[io] class ChunkedByteBufferFileRegion(
+private val chunkedByteBuffer: ChunkedByteBuffer,
+private val ioChunkSize: Int) extends AbstractFileRegion {
+
+  private var _transferred: Long = 0
+  // this duplicates the original chunks, so we're free to modify the 
position, limit, etc.
+  private val chunks = chunkedByteBuffer.getChunks()
+  private val size = chunks.foldLeft(0) { _ + _.remaining() }
--- End diff --

`0L`? Otherwise this will overflow for > 2G right?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21672: [SPARK-24694][K8S] Pass all app args to integration test...

2018-06-29 Thread ssuchter

Github user ssuchter commented on the issue:

https://github.com/apache/spark/pull/21672
  
So this changes behavior, I think. In the old behavior, if the args were 
['a', 'b'] then you'd get a single arg of ['a b'] passed through, and with this 
you'd get ['a', 'b'].

This new behavior seems better, I'm just trying a bit to remember why we 
had the old behavior.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21662: [SPARK-24662][SQL][SS] Support limit in structure...

2018-06-29 Thread jose-torres

Github user jose-torres commented on a diff in the pull request:

https://github.com/apache/spark/pull/21662#discussion_r199288285
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingLimitExec.scala
 ---
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.streaming
+
+import java.util.concurrent.TimeUnit.NANOSECONDS
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
+import org.apache.spark.sql.catalyst.expressions.UnsafeProjection
+import org.apache.spark.sql.catalyst.expressions.UnsafeRow
+import org.apache.spark.sql.catalyst.plans.physical.{AllTuples, 
Distribution, Partitioning}
+import org.apache.spark.sql.execution.{SparkPlan, UnaryExecNode}
+import org.apache.spark.sql.execution.streaming.state.StateStoreOps
+import org.apache.spark.sql.types.{LongType, NullType, StructField, 
StructType}
+import org.apache.spark.util.CompletionIterator
+
+/**
+ * A physical operator for executing a streaming limit, which makes sure 
no more than streamLimit
+ * rows are returned.
+ */
+case class StreamingLimitExec(
+streamLimit: Long,
+child: SparkPlan,
+stateInfo: Option[StatefulOperatorStateInfo] = None)
+  extends UnaryExecNode with StateStoreWriter {
+
+  private val keySchema = StructType(Array(StructField("key", NullType)))
+  private val valueSchema = StructType(Array(StructField("value", 
LongType)))
+
+  override protected def doExecute(): RDD[InternalRow] = {
+metrics // force lazy init at driver
+
+child.execute().mapPartitionsWithStateStore(
+  getStateInfo,
+  keySchema,
+  valueSchema,
+  indexOrdinal = None,
+  sqlContext.sessionState,
+  Some(sqlContext.streams.stateStoreCoordinator)) { (store, iter) =>
+  val key = UnsafeProjection.create(keySchema)(new 
GenericInternalRow(Array[Any](null)))
+  val numOutputRows = longMetric("numOutputRows")
+  val numUpdatedStateRows = longMetric("numUpdatedStateRows")
+  val allUpdatesTimeMs = longMetric("allUpdatesTimeMs")
+  val commitTimeMs = longMetric("commitTimeMs")
+  val updatesStartTimeNs = System.nanoTime
+
+  val startCount: Long = 
Option(store.get(key)).map(_.getLong(0)).getOrElse(0L)
+  var rowCount = startCount
+
+  val result = iter.filter { r =>
+val x = rowCount < streamLimit
--- End diff --

Oh, I missed that distribution. Makes sense then.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21674: [SPARK-24696][SQL] ColumnPruning rule fails to remove ex...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21674
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21546
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21546
  
**[Test build #92487 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92487/testReport)**
 for PR 21546 at commit 
[`8b814b7`](https://github.com/apache/spark/commit/8b814b76d48e9727fafcd85723949142a3658617).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21674: [SPARK-24696][SQL] ColumnPruning rule fails to remove ex...

2018-06-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21674
  
**[Test build #92486 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92486/testReport)**
 for PR 21674 at commit 
[`f45a8b8`](https://github.com/apache/spark/commit/f45a8b8705eb6f7b40141ff528ce9c70a74da136).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21674: [SPARK-24696][SQL] ColumnPruning rule fails to remove ex...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21674
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/592/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21546
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/593/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream ...

2018-06-29 Thread BryanCutler

Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/21546#discussion_r199287248
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -3236,13 +3237,50 @@ class Dataset[T] private[sql](
   }
 
   /**
-   * Collect a Dataset as ArrowPayload byte arrays and serve to PySpark.
+   * Collect a Dataset as Arrow batches and serve stream to PySpark.
*/
   private[sql] def collectAsArrowToPython(): Array[Any] = {
+val timeZoneId = sparkSession.sessionState.conf.sessionLocalTimeZone
+
 withAction("collectAsArrowToPython", queryExecution) { plan =>
-  val iter: Iterator[Array[Byte]] =
-toArrowPayload(plan).collect().iterator.map(_.asPythonSerializable)
-  PythonRDD.serveIterator(iter, "serve-Arrow")
+  PythonRDD.serveToStream("serve-Arrow") { outputStream =>
+val out = new DataOutputStream(outputStream)
+val batchWriter = new ArrowBatchStreamWriter(schema, out, 
timeZoneId)
+val arrowBatchRdd = getArrowBatchRdd(plan)
+val numPartitions = arrowBatchRdd.partitions.length
+
+// Batches ordered by index of partition + batch number for that 
partition
+val batchOrder = new ArrayBuffer[Int]()
+var partitionCount = 0
+
+// Handler to eagerly write batches to Python out of order
+def handlePartitionBatches(index: Int, arrowBatches: 
Array[Array[Byte]]): Unit = {
+  if (arrowBatches.nonEmpty) {
+batchWriter.writeBatches(arrowBatches.iterator)
+(0 until arrowBatches.length).foreach { i =>
+  batchOrder.append(index + i)
+}
+  }
+  partitionCount += 1
+
+  // After last batch, end the stream and write batch order
+  if (partitionCount == numPartitions) {
+batchWriter.end()
+out.writeInt(batchOrder.length)
+// Batch order indices are from 0 to N-1 batches, sorted by 
order they arrived
+batchOrder.zipWithIndex.sortBy(_._1).foreach { case (_, i) =>
--- End diff --

Yeah, looks like something wasn't quite right with the batch indexing... I 
fixed it and added your test.  Thanks @sethah !


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21662: [SPARK-24662][SQL][SS] Support limit in structure...

2018-06-29 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/21662#discussion_r199286570
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingLimitExec.scala
 ---
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.streaming
+
+import java.util.concurrent.TimeUnit.NANOSECONDS
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
+import org.apache.spark.sql.catalyst.expressions.UnsafeProjection
+import org.apache.spark.sql.catalyst.expressions.UnsafeRow
+import org.apache.spark.sql.catalyst.plans.physical.{AllTuples, 
Distribution, Partitioning}
+import org.apache.spark.sql.execution.{SparkPlan, UnaryExecNode}
+import org.apache.spark.sql.execution.streaming.state.StateStoreOps
+import org.apache.spark.sql.types.{LongType, NullType, StructField, 
StructType}
+import org.apache.spark.util.CompletionIterator
+
+/**
+ * A physical operator for executing a streaming limit, which makes sure 
no more than streamLimit
+ * rows are returned.
+ */
+case class StreamingLimitExec(
+streamLimit: Long,
+child: SparkPlan,
+stateInfo: Option[StatefulOperatorStateInfo] = None)
+  extends UnaryExecNode with StateStoreWriter {
+
+  private val keySchema = StructType(Array(StructField("key", NullType)))
+  private val valueSchema = StructType(Array(StructField("value", 
LongType)))
+
+  override protected def doExecute(): RDD[InternalRow] = {
+metrics // force lazy init at driver
+
+child.execute().mapPartitionsWithStateStore(
+  getStateInfo,
+  keySchema,
+  valueSchema,
+  indexOrdinal = None,
+  sqlContext.sessionState,
+  Some(sqlContext.streams.stateStoreCoordinator)) { (store, iter) =>
+  val key = UnsafeProjection.create(keySchema)(new 
GenericInternalRow(Array[Any](null)))
+  val numOutputRows = longMetric("numOutputRows")
+  val numUpdatedStateRows = longMetric("numUpdatedStateRows")
+  val allUpdatesTimeMs = longMetric("allUpdatesTimeMs")
+  val commitTimeMs = longMetric("commitTimeMs")
+  val updatesStartTimeNs = System.nanoTime
+
+  val startCount: Long = 
Option(store.get(key)).map(_.getLong(0)).getOrElse(0L)
+  var rowCount = startCount
+
+  val result = iter.filter { r =>
+val x = rowCount < streamLimit
--- End diff --

Oh and we should be planning a `LocalLimit` before this and perhaps 
`GlobalStreamingLimitExec` would be a better name to make the functionality 
obvious.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21662: [SPARK-24662][SQL][SS] Support limit in structure...

2018-06-29 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/21662#discussion_r199286349
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingLimitExec.scala
 ---
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.streaming
+
+import java.util.concurrent.TimeUnit.NANOSECONDS
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
+import org.apache.spark.sql.catalyst.expressions.UnsafeProjection
+import org.apache.spark.sql.catalyst.expressions.UnsafeRow
+import org.apache.spark.sql.catalyst.plans.physical.{AllTuples, 
Distribution, Partitioning}
+import org.apache.spark.sql.execution.{SparkPlan, UnaryExecNode}
+import org.apache.spark.sql.execution.streaming.state.StateStoreOps
+import org.apache.spark.sql.types.{LongType, NullType, StructField, 
StructType}
+import org.apache.spark.util.CompletionIterator
+
+/**
+ * A physical operator for executing a streaming limit, which makes sure 
no more than streamLimit
+ * rows are returned.
+ */
+case class StreamingLimitExec(
+streamLimit: Long,
+child: SparkPlan,
+stateInfo: Option[StatefulOperatorStateInfo] = None)
+  extends UnaryExecNode with StateStoreWriter {
+
+  private val keySchema = StructType(Array(StructField("key", NullType)))
+  private val valueSchema = StructType(Array(StructField("value", 
LongType)))
+
+  override protected def doExecute(): RDD[InternalRow] = {
+metrics // force lazy init at driver
+
+child.execute().mapPartitionsWithStateStore(
+  getStateInfo,
+  keySchema,
+  valueSchema,
+  indexOrdinal = None,
+  sqlContext.sessionState,
+  Some(sqlContext.streams.stateStoreCoordinator)) { (store, iter) =>
+  val key = UnsafeProjection.create(keySchema)(new 
GenericInternalRow(Array[Any](null)))
+  val numOutputRows = longMetric("numOutputRows")
+  val numUpdatedStateRows = longMetric("numUpdatedStateRows")
+  val allUpdatesTimeMs = longMetric("allUpdatesTimeMs")
+  val commitTimeMs = longMetric("commitTimeMs")
+  val updatesStartTimeNs = System.nanoTime
+
+  val startCount: Long = 
Option(store.get(key)).map(_.getLong(0)).getOrElse(0L)
+  var rowCount = startCount
+
+  val result = iter.filter { r =>
+val x = rowCount < streamLimit
--- End diff --

I think its okay due to `override def requiredChildDistribution: 
Seq[Distribution] = AllTuples :: Nil`.

+1 to making sure there are tests with more than one partition though.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21674: [SPARK-24696][SQL] ColumnPruning rule fails to remove ex...

2018-06-29 Thread maryannxue

Github user maryannxue commented on the issue:

https://github.com/apache/spark/pull/21674
  
cc @gatorsmile 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21674: [SPARK-24696][SQL] ColumnPruning rule fails to remove ex...

2018-06-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21674
  
**[Test build #92485 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92485/testReport)**
 for PR 21674 at commit 
[`11fde8b`](https://github.com/apache/spark/commit/11fde8ba4b64416d863a69c5587c0db67ea61d0a).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21674: [SPARK-24696][SQL] ColumnPruning rule fails to remove ex...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21674
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/591/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21674: [SPARK-24696][SQL] ColumnPruning rule fails to remove ex...

2018-06-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21674
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

1 2 3 4 >

1 - 100 of 388 matches

Mail list logo