[GitHub] spark issue #21623: [SPARK-24638][SQL] StringStartsWith support push down
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21623 thanks, merging to master! @dongjoon-hyun is there something similar in ORC? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
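The pushdown merged here relies on column statistics: a row group whose string min/max range cannot contain any value with the requested prefix can be skipped without decoding it. A minimal Python sketch of that skipping test (hypothetical names and simplified logic, not Spark's or Parquet's actual code; it ignores the corner case where the last prefix character is the maximum code point):

```python
def can_skip_row_group(prefix: str, minimum: str, maximum: str) -> bool:
    """Return True if no value in [minimum, maximum] can start with prefix.

    Values starting with `prefix` all sort in [prefix, upper_bound), where
    upper_bound is the prefix with its last character incremented.
    """
    if maximum < prefix:
        # Every value in the row group sorts before the prefix range.
        return True
    upper_bound = prefix[:-1] + chr(ord(prefix[-1]) + 1) if prefix else ""
    if prefix and minimum >= upper_bound:
        # Every value in the row group sorts after the prefix range.
        return True
    return False

print(can_skip_row_group("1000", "0000", "0999"))  # True: max sorts below "1000"
print(can_skip_row_group("1000", "1000", "1999"))  # False: range overlaps the prefix
print(can_skip_row_group("1000", "2000", "2999"))  # True: min sorts above "1001"
```

Whether ORC can do the same depends on it exposing comparable per-stripe string statistics, which is what the question above is getting at.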
[GitHub] spark issue #21675: [SPARK-24698][PYTHON]: Fixed typo in pyspark.ml's Identi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21675 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92495/ Test PASSed.
[GitHub] spark issue #21675: [SPARK-24698][PYTHON]: Fixed typo in pyspark.ml's Identi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21675 Merged build finished. Test PASSed.
[GitHub] spark issue #21675: [SPARK-24698][PYTHON]: Fixed typo in pyspark.ml's Identi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21675 **[Test build #92495 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92495/testReport)** for PR 21675 at commit [`d1e44b7`](https://github.com/apache/spark/commit/d1e44b7392f3c4523a12748db10acc310676ceb6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21675: [SPARK-24698][PYTHON]: Fixed typo in pyspark.ml's Identi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21675 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92494/ Test PASSed.
[GitHub] spark issue #21675: [SPARK-24698][PYTHON]: Fixed typo in pyspark.ml's Identi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21675 Merged build finished. Test PASSed.
[GitHub] spark issue #21675: [SPARK-24698][PYTHON]: Fixed typo in pyspark.ml's Identi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21675 **[Test build #92494 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92494/testReport)** for PR 21675 at commit [`d1e44b7`](https://github.com/apache/spark/commit/d1e44b7392f3c4523a12748db10acc310676ceb6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21675: [SPARK-24698][PYTHON]: Fixed typo in pyspark.ml's Identi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21675 **[Test build #92495 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92495/testReport)** for PR 21675 at commit [`d1e44b7`](https://github.com/apache/spark/commit/d1e44b7392f3c4523a12748db10acc310676ceb6).
[GitHub] spark issue #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spark on K8...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21669 btw, have you sent out this + doc to d...@spark.apache.org?
[GitHub] spark issue #21675: [SPARK-24698][PYTHON]: Fixed typo in pyspark.ml's Identi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21675 **[Test build #92494 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92494/testReport)** for PR 21675 at commit [`d1e44b7`](https://github.com/apache/spark/commit/d1e44b7392f3c4523a12748db10acc310676ceb6).
[GitHub] spark issue #21675: [SPARK-24698][PYTHON]: Fixed typo in pyspark.ml's Identi...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21675 ok to test
[GitHub] spark pull request #21666: [SPARK-24535][SPARKR] fix tests on java check err...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21666#discussion_r199313370

--- Diff: R/pkg/R/client.R ---
@@ -61,6 +61,11 @@ generateSparkSubmitArgs <- function(args, sparkHome, jars, sparkSubmitOpts, pack
 }

 checkJavaVersion <- function() {
+  if (is_windows()) {
+    # See SPARK-24535
--- End diff --

it would - I guess we need to root cause why checkJavaVersion() fails on Windows/JDK 8 - otherwise sparkR.session() would just fail with no way to continue.
[GitHub] spark pull request #21649: [SPARK-23648][R][SQL]Adds more types for hint in ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21649#discussion_r199313334

--- Diff: R/pkg/R/DataFrame.R ---
@@ -3905,6 +3905,18 @@ setMethod("rollup",
     groupedData(sgd)
   })

+isTypeAllowed <- function(x) {
+  if (is.character(x)) {
+    TRUE
+  } else if (is.list(x)) {
--- End diff --

I'd actually prefer to be more restrictive.. maybe just one level, unless we know of a real need for nested list?
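The restriction suggested above (scalars allowed, lists allowed only one level deep) can be expressed compactly. A hypothetical Python analogue of the R `isTypeAllowed` helper under that "flat lists only" rule, not the code ultimately merged:

```python
def is_type_allowed(x) -> bool:
    """Allow a scalar, or a flat list of scalars; reject nested lists."""
    if isinstance(x, (str, int, float)):
        return True
    if isinstance(x, list):
        # One level only: every element must itself be a scalar, not a list.
        return all(isinstance(e, (str, int, float)) for e in x)
    return False

print(is_type_allowed("broadcast"))    # True
print(is_type_allowed(["a", "b", 1]))  # True
print(is_type_allowed([["nested"]]))   # False
```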
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21221 Build finished. Test PASSed.
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21221 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92490/ Test PASSed.
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21221 **[Test build #92490 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92490/testReport)** for PR 21221 at commit [`8d9acdf`](https://github.com/apache/spark/commit/8d9acdf32984c0c9c621a058b45805872bb9e4c5).
* This patch passes all tests.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark issue #21677: [SPARK-24692][TESTS] Improvement FilterPushdownBenchmark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21677 **[Test build #92493 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92493/testReport)** for PR 21677 at commit [`616933e`](https://github.com/apache/spark/commit/616933e3759739dcdae2140f5c58b659c943ab7f).
[GitHub] spark issue #21677: [SPARK-24692][TESTS] Improvement FilterPushdownBenchmark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21677 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/597/ Test PASSed.
[GitHub] spark issue #21677: [SPARK-24692][TESTS] Improvement FilterPushdownBenchmark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21677 Merged build finished. Test PASSed.
[GitHub] spark issue #21677: [SPARK-24692][TESTS] Improvement FilterPushdownBenchmark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21677 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92491/ Test PASSed.
[GitHub] spark issue #21677: [SPARK-24692][TESTS] Improvement FilterPushdownBenchmark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21677 **[Test build #92491 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92491/testReport)** for PR 21677 at commit [`ccdd21c`](https://github.com/apache/spark/commit/ccdd21cfa75f8577b5f8093c8e0b1eba6aa2e055).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class FilterPushdownBenchmark extends SparkFunSuite with BenchmarkBeforeAndAfterEachTest`
  * `trait BenchmarkBeforeAndAfterEachTest extends BeforeAndAfterEachTestData`
[GitHub] spark issue #21603: [SPARK-17091][SQL] Add rule to convert IN predicate to e...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/21603

Benchmark result:

```
##[ Pushdown benchmark for InSet -> InFilters ]##
Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

InSet -> InFilters (threshold: 10, values count: 5, distribution: 10):  Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
Parquet Vectorized                   7649 / 7678    2.1   486.3    1.0X
Parquet Vectorized (Pushdown)         316 /  325   49.8    20.1   24.2X
Native ORC Vectorized                6787 / 7353    2.3   431.5    1.1X
Native ORC Vectorized (Pushdown)     1010 / 1020   15.6    64.2    7.6X

InSet -> InFilters (threshold: 10, values count: 5, distribution: 50):  Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
Parquet Vectorized                   7537 / 7944    2.1   479.2    1.0X
Parquet Vectorized (Pushdown)         297 /  306   52.9    18.9   25.3X
Native ORC Vectorized                6768 / 6779    2.3   430.3    1.1X
Native ORC Vectorized (Pushdown)      998 / 1017   15.8    63.4    7.6X

InSet -> InFilters (threshold: 10, values count: 5, distribution: 90):  Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
Parquet Vectorized                   7500 / 7592    2.1   476.8    1.0X
Parquet Vectorized (Pushdown)         299 /  306   52.5    19.0   25.1X
Native ORC Vectorized                6758 / 6797    2.3   429.7    1.1X
Native ORC Vectorized (Pushdown)      982 /  993   16.0    62.4    7.6X

InSet -> InFilters (threshold: 10, values count: 10, distribution: 10):  Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
Parquet Vectorized                   7566 / 8153    2.1   481.1    1.0X
Parquet Vectorized (Pushdown)         319 /  328   49.3    20.3   23.7X
Native ORC Vectorized                6761 / 6812    2.3   429.8    1.1X
Native ORC Vectorized (Pushdown)      995 / 1013   15.8    63.3    7.6X

InSet -> InFilters (threshold: 10, values count: 10, distribution: 50):  Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
Parquet Vectorized                   7512 / 7581    2.1   477.6    1.0X
Parquet Vectorized (Pushdown)         315 /  322   50.0    20.0   23.9X
Native ORC Vectorized                6712 / 6774    2.3   426.8    1.1X
Native ORC Vectorized (Pushdown)     1001 / 1032   15.7    63.6    7.5X

InSet -> InFilters (threshold: 10, values count: 10, distribution: 90):  Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
Parquet Vectorized                   7603 / 7689    2.1   483.4    1.0X
Parquet Vectorized (Pushdown)         308 /  317   51.0    19.6   24.7X
Native ORC Vectorized                7011 / 7605    2.2   445.7    1.1X
Native ORC Vectorized (Pushdown)     1038 / 1067   15.2    66.0    7.3X

InSet -> InFilters (threshold: 10, values count: 50, distribution: 10):  Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
Parquet Vectorized                   7750 / 7796    2.0   492.7    1.0X
Parquet Vectorized (Pushdown)        7855 / 7961    2.0   499.4    1.0X
Native ORC Vectorized                7120 / 7820    2.2   452.7    1.1X
Native ORC Vectorized (Pushdown)     1085 / 1122   14.5    69.0    7.1X

InSet -> InFilters (threshold: 10, values count: 50, distribution: 50):  Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
```
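The idea behind SPARK-17091, which these numbers exercise, is that an `IN` predicate over a small value set can be rewritten into a chain of equality filters that data sources already know how to push down, while large sets stay as an opaque `InSet` (which is why the "values count: 50" cases show no Parquet speedup). A hedged Python sketch of that threshold rewrite, with hypothetical helper names rather than Spark's optimizer code:

```python
def rewrite_in_predicate(column: str, values: list, threshold: int = 10):
    """Rewrite `column IN (values)` into pushable equality filters when small.

    Above the threshold the predicate stays as an InSet, which (in this
    sketch) the data source cannot push down.
    """
    if len(values) > threshold:
        return ("inset", column, set(values))
    # col IN (v1, v2, ...)  ==>  col = v1 OR col = v2 OR ...
    return ("or", [("eq", column, v) for v in values])

print(rewrite_in_predicate("value", [1, 2, 3]))
# ('or', [('eq', 'value', 1), ('eq', 'value', 2), ('eq', 'value', 3)])
print(rewrite_in_predicate("value", list(range(50)))[0])
# inset
```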
[GitHub] spark issue #21678: [SPARK-23461][R]vignettes should include model predictio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21678 Merged build finished. Test PASSed.
[GitHub] spark issue #21678: [SPARK-23461][R]vignettes should include model predictio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21678 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92492/ Test PASSed.
[GitHub] spark issue #21678: [SPARK-23461][R]vignettes should include model predictio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21678 **[Test build #92492 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92492/testReport)** for PR 21678 at commit [`85ad256`](https://github.com/apache/spark/commit/85ad2563a9a84aacd80e2fe1833042e50c33f08c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21678: [SPARK-23461][R]vignettes should include model predictio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21678 **[Test build #92492 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92492/testReport)** for PR 21678 at commit [`85ad256`](https://github.com/apache/spark/commit/85ad2563a9a84aacd80e2fe1833042e50c33f08c).
[GitHub] spark issue #21678: [SPARK-23461][R]vignettes should include model predictio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21678 Merged build finished. Test PASSed.
[GitHub] spark issue #21678: [SPARK-23461][R]vignettes should include model predictio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21678 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/596/ Test PASSed.
[GitHub] spark pull request #21678: [SPARK-23461][R]vignettes should include model pr...
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/21678 [SPARK-23461][R]vignettes should include model predictions for some ML models

## What changes were proposed in this pull request?

Add model predictions for Linear Support Vector Machine (SVM) Classifier, Logistic Regression, GBT, RF and DecisionTree in vignettes.

## How was this patch tested?

Manually ran the test and checked the result.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/huaxingao/spark spark-23461

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21678.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21678

commit 85ad2563a9a84aacd80e2fe1833042e50c33f08c
Author: Huaxin Gao
Date: 2018-06-30T02:28:27Z

    [SPARK-23461][R]vignettes should include model predictions for some ML models
[GitHub] spark pull request #21674: [SPARK-24696][SQL] ColumnPruning rule fails to re...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21674#discussion_r199310542

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala ---
@@ -149,6 +149,7 @@ package object dsl {
     }
   }

+    def rand(e: Long): Expression = Rand(Literal.create(e, LongType))
--- End diff --

I mean: `def rand(e: Long): Expression = Rand(e)`.
[GitHub] spark pull request #21674: [SPARK-24696][SQL] ColumnPruning rule fails to re...
Github user maryannxue commented on a diff in the pull request: https://github.com/apache/spark/pull/21674#discussion_r199310515

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala ---
@@ -149,6 +149,7 @@ package object dsl {
     }
   }

+    def rand(e: Long): Expression = Rand(Literal.create(e, LongType))
--- End diff --

Since we already have a bunch of expressions here, I don't think it would hurt to add this one?
[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Merged build finished. Test PASSed.
[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92487/ Test PASSed.
[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21546 **[Test build #92487 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92487/testReport)** for PR 21546 at commit [`8b814b7`](https://github.com/apache/spark/commit/8b814b76d48e9727fafcd85723949142a3658617).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #21674: [SPARK-24696][SQL] ColumnPruning rule fails to re...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21674#discussion_r199309806

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2792,4 +2792,25 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     }
   }
 }
+
+  test("SPARK-24696 ColumnPruning rule fails to remove extra Project") {
--- End diff --

The test in Jira is simpler than this. Do we need to have two tables and a join?
[GitHub] spark pull request #21674: [SPARK-24696][SQL] ColumnPruning rule fails to re...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21674#discussion_r199309687

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala ---
@@ -149,6 +149,7 @@ package object dsl {
     }
   }

+    def rand(e: Long): Expression = Rand(Literal.create(e, LongType))
--- End diff --

We can just use `Rand(seed: Long)`.
[GitHub] spark issue #21674: [SPARK-24696][SQL] ColumnPruning rule fails to remove ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21674 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92486/ Test PASSed.
[GitHub] spark issue #21674: [SPARK-24696][SQL] ColumnPruning rule fails to remove ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21674 Merged build finished. Test PASSed.
[GitHub] spark issue #21674: [SPARK-24696][SQL] ColumnPruning rule fails to remove ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21674 **[Test build #92486 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92486/testReport)** for PR 21674 at commit [`f45a8b8`](https://github.com/apache/spark/commit/f45a8b8705eb6f7b40141ff528ce9c70a74da136).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21674: [SPARK-24696][SQL] ColumnPruning rule fails to remove ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21674 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92485/ Test PASSed.
[GitHub] spark issue #21674: [SPARK-24696][SQL] ColumnPruning rule fails to remove ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21674 Merged build finished. Test PASSed.
[GitHub] spark issue #21674: [SPARK-24696][SQL] ColumnPruning rule fails to remove ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21674 **[Test build #92485 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92485/testReport)** for PR 21674 at commit [`11fde8b`](https://github.com/apache/spark/commit/11fde8ba4b64416d863a69c5587c0db67ea61d0a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21623: [SPARK-24638][SQL] StringStartsWith support push down
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/21623

Benchmark result:

```
###[ Pushdown benchmark for StringStartsWith ]###
Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

StringStartsWith filter: (value like '10%'):  Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
Parquet Vectorized                  10104 / 11125    1.6   642.4    1.0X
Parquet Vectorized (Pushdown)        3002 /  3608    5.2   190.8    3.4X
Native ORC Vectorized                9589 / 10454    1.6   609.7    1.1X
Native ORC Vectorized (Pushdown)     9798 / 10509    1.6   622.9    1.0X

StringStartsWith filter: (value like '1000%'):  Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
Parquet Vectorized                   8437 / 8563    1.9   536.4    1.0X
Parquet Vectorized (Pushdown)         279 /  289   56.3    17.8   30.2X
Native ORC Vectorized                7354 / 7568    2.1   467.5    1.1X
Native ORC Vectorized (Pushdown)     7730 / 7972    2.0   491.4    1.1X

StringStartsWith filter: (value like '786432%'):  Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
Parquet Vectorized                   8290 / 8510    1.9   527.0    1.0X
Parquet Vectorized (Pushdown)         260 /  272   60.5    16.5   31.9X
Native ORC Vectorized                7361 / 7395    2.1   468.0    1.1X
Native ORC Vectorized (Pushdown)     7694 / 7811    2.0   489.2    1.1X
```
[GitHub] spark issue #21673: SPARK-24697: Fix the reported start offsets in streaming...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21673 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92484/ Test PASSed.
[GitHub] spark issue #21673: SPARK-24697: Fix the reported start offsets in streaming...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21673 Merged build finished. Test PASSed.
[GitHub] spark issue #21673: SPARK-24697: Fix the reported start offsets in streaming...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21673 **[Test build #92484 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92484/testReport)** for PR 21673 at commit [`24d75ea`](https://github.com/apache/spark/commit/24d75eaadfafdd668cd30d2de3bf463ce775cd69).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #21462: [SPARK-24428][K8S] Fix unused code
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/21462#discussion_r199307562

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterManager.scala ---
@@ -46,8 +46,6 @@ private[spark] class KubernetesClusterManager extends ExternalClusterManager wit
     sc: SparkContext,
     masterURL: String,
     scheduler: TaskScheduler): SchedulerBackend = {
-    val executorSecretNamesToMountPaths = KubernetesUtils.parsePrefixedKeyValuePairs(
-      sc.conf, KUBERNETES_EXECUTOR_SECRETS_PREFIX)
--- End diff --

@mccheah I think the logic has changed, otherwise the tests I have in this PR (https://github.com/apache/spark/pull/21652) would have failed when I removed that part and re-ran them.
[GitHub] spark issue #21677: [SPARK-24692][TESTS] Improvement FilterPushdownBenchmark
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/21677 cc @maropu
[GitHub] spark issue #21556: [SPARK-24549][SQL] Support Decimal type push down to the...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/21556

Benchmark results:

```
###[ Pushdown benchmark for Decimal ]
Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

Select 1 decimal(9, 2) row (value = 7864320):  Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
Parquet Vectorized                   4004 / 5309    3.9   254.5    1.0X
Parquet Vectorized (Pushdown)        1401 / 1431   11.2    89.1    2.9X
Native ORC Vectorized                4499 / 4567    3.5   286.0    0.9X
Native ORC Vectorized (Pushdown)      899 /  961   17.5    57.2    4.5X

Select 10% decimal(9, 2) rows (value < 1572864):  Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
Parquet Vectorized                   5376 / 6437    2.9   341.8    1.0X
Parquet Vectorized (Pushdown)        2696 / 2754    5.8   171.4    2.0X
Native ORC Vectorized                5458 / 5623    2.9   347.0    1.0X
Native ORC Vectorized (Pushdown)     2230 / 2255    7.1   141.8    2.4X

Select 50% decimal(9, 2) rows (value < 7864320):  Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
Parquet Vectorized                   8280 / 8487    1.9   526.4    1.0X
Parquet Vectorized (Pushdown)        7716 / 7757    2.0   490.6    1.1X
Native ORC Vectorized                9144 / 9495    1.7   581.4    0.9X
Native ORC Vectorized (Pushdown)     7918 / 8118    2.0   503.4    1.0X

Select 90% decimal(9, 2) rows (value < 14155776):  Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
Parquet Vectorized                   9648 / 9676    1.6   613.4    1.0X
Parquet Vectorized (Pushdown)        9647 / 9778    1.6   613.3    1.0X
Native ORC Vectorized              10782 / 10867    1.5   685.5    0.9X
Native ORC Vectorized (Pushdown)   10108 / 10269    1.6   642.6    1.0X

Select 1 decimal(18, 2) row (value = 7864320):  Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
Parquet Vectorized                   4066 / 4147    3.9   258.5    1.0X
Parquet Vectorized (Pushdown)          84 /   89  188.0     5.3   48.6X
Native ORC Vectorized                5430 / 5512    2.9   345.3    0.7X
Native ORC Vectorized (Pushdown)     1054 / 1076   14.9    67.0    3.9X

Select 10% decimal(18, 2) rows (value < 1572864):  Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
Parquet Vectorized                   5028 / 5154    3.1   319.7    1.0X
Parquet Vectorized (Pushdown)        1360 / 1421   11.6    86.5    3.7X
Native ORC Vectorized                6266 / 6360    2.5   398.4    0.8X
Native ORC Vectorized (Pushdown)     2513 / 2550    6.3   159.8    2.0X

Select 50% decimal(18, 2) rows (value < 7864320):  Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
Parquet Vectorized                   8571 / 8600    1.8   544.9    1.0X
Parquet Vectorized (Pushdown)        6455 / 6713    2.4   410.4    1.3X
Native ORC Vectorized              10138 / 10353    1.6   644.5    0.8X
Native ORC Vectorized (Pushdown)     8166 / 8418    1.9   519.2    1.0X

Select 90% decimal(18, 2) rows (value < 14155776):  Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
Parquet Vectorized                 12184 / 12253    1.3
```
[GitHub] spark issue #21677: [SPARK-24692][TESTS] Improvement FilterPushdownBenchmark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21677 **[Test build #92491 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92491/testReport)** for PR 21677 at commit [`ccdd21c`](https://github.com/apache/spark/commit/ccdd21cfa75f8577b5f8093c8e0b1eba6aa2e055).
[GitHub] spark issue #21677: [SPARK-24692][TESTS] Improvement FilterPushdownBenchmark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21677 Merged build finished. Test PASSed.
[GitHub] spark issue #21677: [SPARK-24692][TESTS] Improvement FilterPushdownBenchmark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21677 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/595/ Test PASSed.
[GitHub] spark pull request #21677: [SPARK-24692][TESTS] Improvement FilterPushdownBe...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/21677 [SPARK-24692][TESTS] Improvement FilterPushdownBenchmark ## What changes were proposed in this pull request? 1. Write the result to `benchmarks/FilterPushdownBenchmark-results.txt` for easy maintenance. 2. Add more benchmark cases: `StringStartsWith`, `Decimal` and `InSet -> InFilters`. ## How was this patch tested? Manual tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangyum/spark SPARK-24692 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21677.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21677 commit ccdd21cfa75f8577b5f8093c8e0b1eba6aa2e055 Author: Yuming Wang Date: 2018-06-30T00:22:16Z Improvement FilterPushdownBenchmark
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21221 **[Test build #92490 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92490/testReport)** for PR 21221 at commit [`8d9acdf`](https://github.com/apache/spark/commit/8d9acdf32984c0c9c621a058b45805872bb9e4c5).
[GitHub] spark issue #21676: [SPARK-24699][SS][WIP] Watermark / Append mode should wo...
Github user c-horn commented on the issue: https://github.com/apache/spark/pull/21676 @tdas @marmbrus
[GitHub] spark issue #21672: [SPARK-24694][K8S] Pass all app args to integration test...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21672 Merged build finished. Test PASSed.
[GitHub] spark issue #21672: [SPARK-24694][K8S] Pass all app args to integration test...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21672 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/594/
[GitHub] spark issue #21672: [SPARK-24694][K8S] Pass all app args to integration test...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21672 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/594/ Test PASSed.
[GitHub] spark issue #21672: [SPARK-24694][K8S] Pass all app args to integration test...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21672 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92489/ Test PASSed.
[GitHub] spark issue #21672: [SPARK-24694][K8S] Pass all app args to integration test...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21672 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/594/
[GitHub] spark issue #21672: [SPARK-24694][K8S] Pass all app args to integration test...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21672 Merged build finished. Test PASSed.
[GitHub] spark issue #21672: [SPARK-24694][K8S] Pass all app args to integration test...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21672 **[Test build #92489 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92489/testReport)** for PR 21672 at commit [`92d8292`](https://github.com/apache/spark/commit/92d8292deed6de8e160fdb82adbcd4e9d5d00a48). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21672: [SPARK-24694][K8S] Pass all app args to integration test...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21672 **[Test build #92489 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92489/testReport)** for PR 21672 at commit [`92d8292`](https://github.com/apache/spark/commit/92d8292deed6de8e160fdb82adbcd4e9d5d00a48).
[GitHub] spark issue #21672: [SPARK-24694][K8S] Pass all app args to integration test...
Github user skonto commented on the issue: https://github.com/apache/spark/pull/21672 @vanzin @ssuchter removed the condition, I think it's OK now.
[GitHub] spark issue #21674: [SPARK-24696][SQL] ColumnPruning rule fails to remove ex...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21674 LGTM, too.
[GitHub] spark pull request #21662: [SPARK-24662][SQL][SS] Support limit in structure...
Github user mukulmurthy commented on a diff in the pull request: https://github.com/apache/spark/pull/21662#discussion_r199300670 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/MemorySinkSuite.scala --- @@ -70,35 +68,9 @@ class MemorySinkSuite extends StreamTest with BeforeAndAfter { checkAnswer(sink.allData, 1 to 9) } - test("directly add data in Append output mode with row limit") { --- End diff -- I thought about it, but I didn't want to have the feature not working (even though no one is probably using it), and figured it would be easier this way since there's basically no file overlap between the two (except for a test class).
[GitHub] spark issue #21676: [SPARK-24699][SQL][WIP] Watermark / Append mode should w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21676 Can one of the admins verify this patch?
[GitHub] spark pull request #21676: [SPARK-24699][SQL][WIP] Watermark / Append mode s...
GitHub user c-horn opened a pull request: https://github.com/apache/spark/pull/21676 [SPARK-24699][SQL][WIP] Watermark / Append mode should work with Trigger.Once ## What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-24699 Structured streaming using `Trigger.Once` does not persist watermark state between batches, causing streams to never yield output. I will attach scripts that reproduce the issue to the Jira ticket. It seems like the microbatcher only calculates the watermark off of the previous batch's input and emits new aggs based off of that timestamp. I believe the issue here is that the previous batch state is not persisted to the checkpoint, and therefore cannot be used when the stream is started again with `Trigger.Once`. I will investigate ways of fixing this but I am definitely interested in input from anyone who has worked on SS. ## How was this patch tested? Failing unit test provided. You can merge this pull request into a Git repository by running: $ git pull https://github.com/c-horn/spark SPARK-24699 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21676.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21676 commit 1b42cc4a449248da65402a6ea2112c55a3bb8501 Author: Chris Horn Date: 2018-06-29T22:54:45Z a failing test case
[GitHub] spark pull request #21672: [SPARK-24694][K8S] Pass all app args to integrati...
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/21672#discussion_r199297784 --- Diff: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesTestComponents.scala --- @@ -106,7 +106,7 @@ private[spark] object SparkAppLauncher extends Logging { val sparkSubmitExecutable = sparkHomeDir.resolve(Paths.get("bin", "spark-submit")) logInfo(s"Launching a spark app with arguments $appArguments and conf $appConf") val appArgsArray = - if (appArguments.appArgs.length > 0) Array(appArguments.appArgs.mkString(" ")) --- End diff -- Yes unless you pass a null... I updated my comment above ^^.
[GitHub] spark pull request #21672: [SPARK-24694][K8S] Pass all app args to integrati...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21672#discussion_r199297547 --- Diff: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesTestComponents.scala --- @@ -106,7 +106,7 @@ private[spark] object SparkAppLauncher extends Logging { val sparkSubmitExecutable = sparkHomeDir.resolve(Paths.get("bin", "spark-submit")) logInfo(s"Launching a spark app with arguments $appArguments and conf $appConf") val appArgsArray = - if (appArguments.appArgs.length > 0) Array(appArguments.appArgs.mkString(" ")) --- End diff -- Doesn't this make `appArgsArray` basically redundant?
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21221 Build finished. Test FAILed.
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21221 **[Test build #92488 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92488/testReport)** for PR 21221 at commit [`7ed42a5`](https://github.com/apache/spark/commit/7ed42a5d0eb0b93bb9ddecf14d9461c80dfe1ea0). * This patch **fails to build**. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21221 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92488/ Test FAILed.
[GitHub] spark issue #21672: [SPARK-24694][K8S] Pass all app args to integration test...
Github user skonto commented on the issue: https://github.com/apache/spark/pull/21672 @ssuchter yes, if you check the jira, the old behavior does not work with more than one arg. In the future it might be a problem.
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21221 **[Test build #92488 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92488/testReport)** for PR 21221 at commit [`7ed42a5`](https://github.com/apache/spark/commit/7ed42a5d0eb0b93bb9ddecf14d9461c80dfe1ea0).
[GitHub] spark issue #21672: [SPARK-24694][K8S] Pass all app args to integration test...
Github user ssuchter commented on the issue: https://github.com/apache/spark/pull/21672 BTW, for committers - I think this patch is good to merge.
[GitHub] spark issue #21672: [SPARK-24694][K8S] Pass all app args to integration test...
Github user ssuchter commented on the issue: https://github.com/apache/spark/pull/21672 I see why the old behavior was there. I made a minimal change to some existing code to fix a bug: https://github.com/ssuchter/spark/commit/1d8a265d13b65dcec8db11a5be09d4a029037d2c but this new way is better. Question: In this new way, do we even need the test for appArguments.appArgs.length > 0? Could we just use appArguments.appArgs?
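The behavior difference under discussion is easy to see in isolation. A minimal Python sketch of the two approaches (function names are illustrative, not from the Spark test code):

```python
def launch_old(app_args):
    # Old behavior: join the args into one string, so ["a", "b"]
    # reaches spark-submit as the single argument "a b".
    return [" ".join(app_args)] if app_args else []

def launch_new(app_args):
    # New behavior: pass every argument through unchanged.
    return list(app_args)

print(launch_old(["a", "b"]))  # ['a b'] -- one fused argument
print(launch_new(["a", "b"]))  # ['a', 'b'] -- two separate arguments
```

This also answers the length check question for the sketch: passing the list through makes the emptiness test unnecessary, since an empty list simply yields no arguments.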
[GitHub] spark issue #21675: [SPARK-24698][PYTHON]: Fixed typo in pyspark.ml's Identi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21675 Can one of the admins verify this patch?
[GitHub] spark pull request #21675: [SPARK-24698][PYTHON]: Fixed typo in pyspark.ml's...
GitHub user mcteo opened a pull request: https://github.com/apache/spark/pull/21675 [SPARK-24698][PYTHON]: Fixed typo in pyspark.ml's Identifiable class. ## What changes were proposed in this pull request? Fixed a small typo in the code that caused 20 random characters to be added to the UID, rather than 12. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mcteo/spark SPARK-24698-fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21675.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21675 commit d1e44b7392f3c4523a12748db10acc310676ceb6 Author: mcteo Date: 2018-06-29T22:09:07Z SPARK-24698: Fixed typo in pyspark.ml's Identifiable class.
[GitHub] spark pull request #21440: [SPARK-24307][CORE] Support reading remote cached...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21440#discussion_r199291252 --- Diff: core/src/main/scala/org/apache/spark/util/io/ChunkedByteBufferFileRegion.scala --- @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.util.io + +import java.nio.channels.WritableByteChannel + +import io.netty.channel.FileRegion +import io.netty.util.AbstractReferenceCounted + +import org.apache.spark.internal.Logging +import org.apache.spark.network.util.AbstractFileRegion + + +/** + * This exposes a ChunkedByteBuffer as a netty FileRegion, just to allow sending > 2gb in one netty + * message. This is because netty cannot send a ByteBuf > 2g, but it can send a large FileRegion, + * even though the data is not backed by a file. + */ +private[io] class ChunkedByteBufferFileRegion( +private val chunkedByteBuffer: ChunkedByteBuffer, +private val ioChunkSize: Int) extends AbstractFileRegion { + + private var _transferred: Long = 0 + // this duplicates the original chunks, so we're free to modify the position, limit, etc. 
+ private val chunks = chunkedByteBuffer.getChunks() + private val size = chunks.foldLeft(0) { _ + _.remaining() } --- End diff -- `0L`? Otherwise this will overflow for > 2G right?
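The overflow @vanzin points out comes from folding with an `Int` zero: `foldLeft(0)` makes Scala infer a 32-bit accumulator, which wraps once the summed chunk sizes pass 2 GB. A minimal Python simulation of the two's-complement wraparound (not Spark code; `add_i32` mimics Scala `Int` addition):

```python
def add_i32(a, b):
    # Simulate Scala Int addition: 32-bit two's-complement wraparound.
    s = (a + b) & 0xFFFFFFFF
    return s - 0x100000000 if s >= 0x80000000 else s

chunk_sizes = [2**30, 2**30, 2**30]  # three 1 GiB chunks, 3 GiB total

int_total = 0                        # like foldLeft(0): Int accumulator
for n in chunk_sizes:
    int_total = add_i32(int_total, n)

long_total = sum(chunk_sizes)        # like foldLeft(0L): 64-bit accumulator

print(int_total)   # -1073741824 -- wrapped past Int.MaxValue
print(long_total)  # 3221225472
```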
[GitHub] spark issue #21672: [SPARK-24694][K8S] Pass all app args to integration test...
Github user ssuchter commented on the issue: https://github.com/apache/spark/pull/21672 So this changes behavior, I think. In the old behavior, if the args were ['a', 'b'] then you'd get a single arg of ['a b'] passed through, and with this you'd get ['a', 'b']. This new behavior seems better, I'm just trying a bit to remember why we had the old behavior.
[GitHub] spark pull request #21662: [SPARK-24662][SQL][SS] Support limit in structure...
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21662#discussion_r199288285 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingLimitExec.scala --- @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql.execution.streaming + +import java.util.concurrent.TimeUnit.NANOSECONDS + +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.Attribute +import org.apache.spark.sql.catalyst.expressions.GenericInternalRow +import org.apache.spark.sql.catalyst.expressions.UnsafeProjection +import org.apache.spark.sql.catalyst.expressions.UnsafeRow +import org.apache.spark.sql.catalyst.plans.physical.{AllTuples, Distribution, Partitioning} +import org.apache.spark.sql.execution.{SparkPlan, UnaryExecNode} +import org.apache.spark.sql.execution.streaming.state.StateStoreOps +import org.apache.spark.sql.types.{LongType, NullType, StructField, StructType} +import org.apache.spark.util.CompletionIterator + +/** + * A physical operator for executing a streaming limit, which makes sure no more than streamLimit + * rows are returned. 
+ */ +case class StreamingLimitExec( +streamLimit: Long, +child: SparkPlan, +stateInfo: Option[StatefulOperatorStateInfo] = None) + extends UnaryExecNode with StateStoreWriter { + + private val keySchema = StructType(Array(StructField("key", NullType))) + private val valueSchema = StructType(Array(StructField("value", LongType))) + + override protected def doExecute(): RDD[InternalRow] = { +metrics // force lazy init at driver + +child.execute().mapPartitionsWithStateStore( + getStateInfo, + keySchema, + valueSchema, + indexOrdinal = None, + sqlContext.sessionState, + Some(sqlContext.streams.stateStoreCoordinator)) { (store, iter) => + val key = UnsafeProjection.create(keySchema)(new GenericInternalRow(Array[Any](null))) + val numOutputRows = longMetric("numOutputRows") + val numUpdatedStateRows = longMetric("numUpdatedStateRows") + val allUpdatesTimeMs = longMetric("allUpdatesTimeMs") + val commitTimeMs = longMetric("commitTimeMs") + val updatesStartTimeNs = System.nanoTime + + val startCount: Long = Option(store.get(key)).map(_.getLong(0)).getOrElse(0L) + var rowCount = startCount + + val result = iter.filter { r => +val x = rowCount < streamLimit --- End diff -- Oh, I missed that distribution. Makes sense then.
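The operator quoted above keeps a running row count in a state store so the limit is global across micro-batches: each batch starts from the persisted count and emits rows only while it stays under the limit. A minimal Python sketch of that idea (names are illustrative, not the Spark API):

```python
def global_stream_limit(batches, limit, state=None):
    # `state` plays the role of the state-store value: the row count
    # that survives across micro-batch runs.
    count = state if state is not None else 0
    out = []
    for batch in batches:
        emitted = []
        for row in batch:
            if count < limit:   # mirrors `rowCount < streamLimit`
                emitted.append(row)
                count += 1
        out.append(emitted)
    return out, count  # `count` is persisted back for the next run

first, state = global_stream_limit([[1, 2, 3], [4, 5]], limit=4)
print(first)   # [[1, 2, 3], [4]]
second, state = global_stream_limit([[6, 7]], limit=4, state=state)
print(second)  # [[]] -- the limit was already reached in the earlier run
```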
[GitHub] spark issue #21674: [SPARK-24696][SQL] ColumnPruning rule fails to remove ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21674 Merged build finished. Test PASSed.
[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Merged build finished. Test PASSed.
[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21546 **[Test build #92487 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92487/testReport)** for PR 21546 at commit [`8b814b7`](https://github.com/apache/spark/commit/8b814b76d48e9727fafcd85723949142a3658617).
[GitHub] spark issue #21674: [SPARK-24696][SQL] ColumnPruning rule fails to remove ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21674 **[Test build #92486 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92486/testReport)** for PR 21674 at commit [`f45a8b8`](https://github.com/apache/spark/commit/f45a8b8705eb6f7b40141ff528ce9c70a74da136).
[GitHub] spark issue #21674: [SPARK-24696][SQL] ColumnPruning rule fails to remove ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21674 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/592/ Test PASSed.
[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/593/ Test PASSed.
[GitHub] spark pull request #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream ...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/21546#discussion_r199287248 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -3236,13 +3237,50 @@ class Dataset[T] private[sql]( } /** - * Collect a Dataset as ArrowPayload byte arrays and serve to PySpark. + * Collect a Dataset as Arrow batches and serve stream to PySpark. */ private[sql] def collectAsArrowToPython(): Array[Any] = { +val timeZoneId = sparkSession.sessionState.conf.sessionLocalTimeZone + withAction("collectAsArrowToPython", queryExecution) { plan => - val iter: Iterator[Array[Byte]] = -toArrowPayload(plan).collect().iterator.map(_.asPythonSerializable) - PythonRDD.serveIterator(iter, "serve-Arrow") + PythonRDD.serveToStream("serve-Arrow") { outputStream => +val out = new DataOutputStream(outputStream) +val batchWriter = new ArrowBatchStreamWriter(schema, out, timeZoneId) +val arrowBatchRdd = getArrowBatchRdd(plan) +val numPartitions = arrowBatchRdd.partitions.length + +// Batches ordered by index of partition + batch number for that partition +val batchOrder = new ArrayBuffer[Int]() +var partitionCount = 0 + +// Handler to eagerly write batches to Python out of order +def handlePartitionBatches(index: Int, arrowBatches: Array[Array[Byte]]): Unit = { + if (arrowBatches.nonEmpty) { +batchWriter.writeBatches(arrowBatches.iterator) +(0 until arrowBatches.length).foreach { i => + batchOrder.append(index + i) +} + } + partitionCount += 1 + + // After last batch, end the stream and write batch order + if (partitionCount == numPartitions) { +batchWriter.end() +out.writeInt(batchOrder.length) +// Batch order indices are from 0 to N-1 batches, sorted by order they arrived +batchOrder.zipWithIndex.sortBy(_._1).foreach { case (_, i) => --- End diff -- Yeah, looks like something wasn't quite right with the batch indexing... I fixed it and added your test. Thanks @sethah ! 
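The reordering being reviewed — `batchOrder.zipWithIndex.sortBy(_._1)` — maps each logical (partition-major) batch index back to the position at which that batch arrived, so the Python side can restore partition order after batches were written eagerly out of order. A minimal sketch of the same computation in Python (a hedged illustration, not the PR's actual code):

```python
def reorder_indices(batch_order):
    # batch_order[i] is the logical index of the i-th batch in the order
    # it arrived at the driver. Returns, for each logical index in
    # ascending order, the arrival position that holds it -- the
    # permutation a reader applies to restore partition order.
    pairs = sorted((logical, arrival) for arrival, logical in enumerate(batch_order))
    return [arrival for _, arrival in pairs]

# Example: the batch with logical index 2 arrived first, then logical 0,
# then logical 1.
print(reorder_indices([2, 0, 1]))  # [1, 2, 0]
```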
[GitHub] spark pull request #21662: [SPARK-24662][SQL][SS] Support limit in structure...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/21662#discussion_r199286570 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingLimitExec.scala --- @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql.execution.streaming + +import java.util.concurrent.TimeUnit.NANOSECONDS + +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.Attribute +import org.apache.spark.sql.catalyst.expressions.GenericInternalRow +import org.apache.spark.sql.catalyst.expressions.UnsafeProjection +import org.apache.spark.sql.catalyst.expressions.UnsafeRow +import org.apache.spark.sql.catalyst.plans.physical.{AllTuples, Distribution, Partitioning} +import org.apache.spark.sql.execution.{SparkPlan, UnaryExecNode} +import org.apache.spark.sql.execution.streaming.state.StateStoreOps +import org.apache.spark.sql.types.{LongType, NullType, StructField, StructType} +import org.apache.spark.util.CompletionIterator + +/** + * A physical operator for executing a streaming limit, which makes sure no more than streamLimit + * rows are returned. 
+ */ +case class StreamingLimitExec( +streamLimit: Long, +child: SparkPlan, +stateInfo: Option[StatefulOperatorStateInfo] = None) + extends UnaryExecNode with StateStoreWriter { + + private val keySchema = StructType(Array(StructField("key", NullType))) + private val valueSchema = StructType(Array(StructField("value", LongType))) + + override protected def doExecute(): RDD[InternalRow] = { +metrics // force lazy init at driver + +child.execute().mapPartitionsWithStateStore( + getStateInfo, + keySchema, + valueSchema, + indexOrdinal = None, + sqlContext.sessionState, + Some(sqlContext.streams.stateStoreCoordinator)) { (store, iter) => + val key = UnsafeProjection.create(keySchema)(new GenericInternalRow(Array[Any](null))) + val numOutputRows = longMetric("numOutputRows") + val numUpdatedStateRows = longMetric("numUpdatedStateRows") + val allUpdatesTimeMs = longMetric("allUpdatesTimeMs") + val commitTimeMs = longMetric("commitTimeMs") + val updatesStartTimeNs = System.nanoTime + + val startCount: Long = Option(store.get(key)).map(_.getLong(0)).getOrElse(0L) + var rowCount = startCount + + val result = iter.filter { r => +val x = rowCount < streamLimit --- End diff -- Oh and we should be planning a `LocalLimit` before this and perhaps `GlobalStreamingLimitExec` would be a better name to make the functionality obvious.
[GitHub] spark pull request #21662: [SPARK-24662][SQL][SS] Support limit in structure...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/21662#discussion_r199286349

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingLimitExec.scala ---
@@ -0,0 +1,96 @@
[... same new-file hunk as quoted in the previous comment, ending at:]
+      val result = iter.filter { r =>
+        val x = rowCount < streamLimit
--- End diff --

I think it's okay due to `override def requiredChildDistribution: Seq[Distribution] = AllTuples :: Nil`. +1 to making sure there are tests with more than one partition though.
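The counting filter the discussion is anchored on can be isolated into a small pure-Scala sketch. `limitPartition` is a hypothetical helper, not Spark's API; the state-store read and commit are abstracted into the `startCount` argument and the returned count. Because `requiredChildDistribution` is `AllTuples`, the real operator sees all rows in one partition, so a single counter is sound.

```scala
// Hypothetical helper, not Spark's API: the same filter-with-counter
// logic as in the quoted snippet, with state-store I/O abstracted away.
object StreamingLimitSketch {
  def limitPartition[A](rows: Iterator[A],
                        startCount: Long,
                        streamLimit: Long): (List[A], Long) = {
    var rowCount = startCount  // resumes from the count persisted by prior batches
    val kept = rows.filter { _ =>
      val underLimit = rowCount < streamLimit
      if (underLimit) rowCount += 1  // only count rows that pass the limit
      underLimit
    }.toList  // force the lazy iterator so rowCount is final when returned
    (kept, rowCount)  // rowCount would be committed back to the state store
  }
}
```

Note the `.toList`: `Iterator.filter` is lazy, so the counter is only accurate after the iterator has been fully consumed, which mirrors why the real operator updates state in a completion callback.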
[GitHub] spark issue #21674: [SPARK-24696][SQL] ColumnPruning rule fails to remove ex...
Github user maryannxue commented on the issue: https://github.com/apache/spark/pull/21674 cc @gatorsmile
[GitHub] spark issue #21674: [SPARK-24696][SQL] ColumnPruning rule fails to remove ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21674 **[Test build #92485 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92485/testReport)** for PR 21674 at commit [`11fde8b`](https://github.com/apache/spark/commit/11fde8ba4b64416d863a69c5587c0db67ea61d0a).
[GitHub] spark issue #21674: [SPARK-24696][SQL] ColumnPruning rule fails to remove ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21674 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/591/ Test PASSed.
[GitHub] spark issue #21674: [SPARK-24696][SQL] ColumnPruning rule fails to remove ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21674 Merged build finished. Test PASSed.