[GitHub] spark issue #21866: [SPARK-24768][FollowUp][SQL]Avro migration followup: cha...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21866 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1288/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #93502 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93502/testReport)** for PR 16677 at commit [`d05c144`](https://github.com/apache/spark/commit/d05c144aecdd57f4ee3d179a240ccafa6c02bb66). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21118 Thanks for reviewing and merging @cloud-fan, @gatorsmile, @felixcheung!
[GitHub] spark issue #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` when the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21850 Merged build finished. Test PASSed.
[GitHub] spark issue #21854: [SPARK-24896][SQL] Uuid should produce different values ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21854 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1267/ Test PASSed.
[GitHub] spark issue #21854: [SPARK-24896][SQL] Uuid should produce different values ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21854 Merged build finished. Test PASSed.
[GitHub] spark issue #20699: [SPARK-23544][SQL]Remove redundancy ShuffleExchange in t...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/20699 cc @HyukjinKwon
[GitHub] spark issue #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` when the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21850 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1268/ Test PASSed.
[GitHub] spark issue #21858: [SPARK-24899][SQL][DOC] Add example of monotonically_inc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21858 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1270/ Test PASSed.
[GitHub] spark issue #21858: [SPARK-24899][SQL][DOC] Add example of monotonically_inc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21858 Merged build finished. Test PASSed.
[GitHub] spark issue #21858: [SPARK-24899][SQL][DOC] Add example of monotonically_inc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21858 **[Test build #93494 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93494/testReport)** for PR 21858 at commit [`29def00`](https://github.com/apache/spark/commit/29def0069d96ca449204ad27e8c66ca2a218ce84).
[GitHub] spark pull request #21859: [SPARK-24900][SQL]speed up sort when the dataset ...
GitHub user sddyljsx opened a pull request: https://github.com/apache/spark/pull/21859 [SPARK-24900][SQL] Speed up sort when the dataset is small ## What changes were proposed in this pull request? When running SQL like 'select * from order where order_status = 4 order by order_id', the file scan and filter will be executed twice, which may take a long time. If the final dataset is small and the sample data covers all the data, there is no need to do so. ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sddyljsx/spark order-optimization Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21859.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21859 commit dd50783d638ca5804531061c0a8aef2c8fef9dc1 Author: neal Date: 2018-07-24T07:26:58Z speed up sort when the dataset is small
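The "executed twice" the PR describes comes from sorting over range partitioning, which first samples the data to pick partition bounds. A minimal, hypothetical Python sketch (not Spark's actual implementation; the function name and shape are illustrative) of picking range bounds from a sample:

```python
def range_bounds(sample, num_partitions):
    """Illustrative sketch: pick range-partition cut points from a
    sample of the data by sorting the sample and taking evenly
    spaced elements. This sampling pass is the extra scan the PR
    tries to avoid when the dataset is small."""
    s = sorted(sample)
    step = len(s) / num_partitions
    return [s[int(step * i)] for i in range(1, num_partitions)]

print(range_bounds([5, 1, 9, 3, 7, 2, 8, 4], 4))  # [3, 5, 8]
```

If the sample already contains every row, the bounds step has effectively read the whole dataset once before the real scan, which is the redundancy the PR targets.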
[GitHub] spark issue #21859: [SPARK-24900][SQL]speed up sort when the dataset is smal...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21859 Can one of the admins verify this patch?
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21860 Can one of the admins verify this patch?
[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Merged build finished. Test FAILed.
[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93486/ Test FAILed.
[GitHub] spark pull request #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/21857#discussion_r204679789 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -222,6 +222,37 @@ case class Stack(children: Seq[Expression]) extends Generator { } } +/** + * Replicate the row N times. N is specified as the first argument to the function. + * This is a internal function solely used by optimizer to rewrite EXCEPT ALL AND + * INTERSECT ALL queries. + */ +@ExpressionDescription( --- End diff -- @HyukjinKwon OK..
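The `ReplicateRows` generator in the diff above supports a rewrite of EXCEPT ALL into a plan that computes a per-row count and replicates each surviving row that many times. A hedged Python sketch of the multiset semantics the rewrite must preserve (illustrative only, not Spark's plan-level implementation):

```python
from collections import Counter

def except_all(left, right):
    """EXCEPT ALL multiset semantics: each distinct row from the
    left side survives max(count_left - count_right, 0) times.
    The optimizer rewrite computes this count and then replicates
    each surviving row that many times."""
    right_counts = Counter(right)
    result = []
    for row, n in Counter(left).items():
        result.extend([row] * max(n - right_counts[row], 0))
    return result

print(except_all([1, 1, 2, 3], [1, 2, 2]))  # [1, 3]
```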
[GitHub] spark issue #21802: [SPARK-23928][SQL] Add shuffle collection function.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21802 **[Test build #93493 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93493/testReport)** for PR 21802 at commit [`c56ecc5`](https://github.com/apache/spark/commit/c56ecc5b727b03734a5bd7917bae14b07d09ad7d).
[GitHub] spark issue #21853: [SPARK-23957][SQL] Sorts in subqueries are redundant and...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21853 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93485/ Test PASSed.
[GitHub] spark issue #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21857 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1271/ Test PASSed.
[GitHub] spark issue #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21857 Merged build finished. Test PASSed.
[GitHub] spark issue #21802: [SPARK-23928][SQL] Add shuffle collection function.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21802 Merged build finished. Test PASSed.
[GitHub] spark issue #21802: [SPARK-23928][SQL] Add shuffle collection function.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21802 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1269/ Test PASSed.
[GitHub] spark pull request #21860: [SPARK-24901][SQL]Merge the codegen of RegularHas...
GitHub user heary-cao opened a pull request: https://github.com/apache/spark/pull/21860 [SPARK-24901][SQL]Merge the codegen of RegularHashMap and fastHashMap to reduce compiler maxCodesize when VectorizedHashMap is false. ## What changes were proposed in this pull request? Currently, the generated code that updates the UnsafeRow in hash aggregation keeps FastHashMap and RegularHashMap as two separate code paths. These two separate paths are only needed when VectorizedHashMap is true; in other cases, we can merge them together to reduce the compiler maxCodesize. thanks. case class DistinctAgg(a: Int, b: Float, c: Double, d: Int, e: String) spark.sparkContext.parallelize( DistinctAgg(8, 2, 3, 4, "a") :: DistinctAgg(9, 3, 4, 5, "b") :: Nil).toDF().createOrReplaceTempView("distinctAgg") val df = sql("select a,b,e, min(d) as mind, min(case when a > 10 then a else null end) as mincasea, min(a) as mina from distinctAgg group by a, b, e") println(org.apache.spark.sql.execution.debug.codegenString(df.queryExecution.executedPlan)) df.show() Generated code like: Before modified: Generated code: /* 001 */ public Object generate(Object[] references) { /* 002 */ return new GeneratedIteratorForCodegenStage1(references); /* 003 */ } /* 004 */ ... /* 354 */ /* 355 */ if (agg_fastAggBuffer_0 != null) { /* 356 */ // common sub-expressions /* 357 */ /* 358 */ // evaluate aggregate function /* 359 */ agg_agg_isNull_31_0 = true; /* 360 */ int agg_value_34 = -1; /* 361 */ /* 362 */ boolean agg_isNull_32 = agg_fastAggBuffer_0.isNullAt(0); /* 363 */ int agg_value_35 = agg_isNull_32 ?
/* 364 */ -1 : (agg_fastAggBuffer_0.getInt(0)); /* 365 */ /* 366 */ if (!agg_isNull_32 && (agg_agg_isNull_31_0 || /* 367 */ agg_value_34 > agg_value_35)) { /* 368 */ agg_agg_isNull_31_0 = false; /* 369 */ agg_value_34 = agg_value_35; /* 370 */ } /* 371 */ /* 372 */ if (!false && (agg_agg_isNull_31_0 || /* 373 */ agg_value_34 > agg_expr_2_0)) { /* 374 */ agg_agg_isNull_31_0 = false; /* 375 */ agg_value_34 = agg_expr_2_0; /* 376 */ } /* 377 */ agg_agg_isNull_34_0 = true; /* 378 */ int agg_value_37 = -1; /* 379 */ /* 380 */ boolean agg_isNull_35 = agg_fastAggBuffer_0.isNullAt(1); /* 381 */ int agg_value_38 = agg_isNull_35 ? /* 382 */ -1 : (agg_fastAggBuffer_0.getInt(1)); /* 383 */ /* 384 */ if (!agg_isNull_35 && (agg_agg_isNull_34_0 || /* 385 */ agg_value_37 > agg_value_38)) { /* 386 */ agg_agg_isNull_34_0 = false; /* 387 */ agg_value_37 = agg_value_38; /* 388 */ } /* 389 */ /* 390 */ byte agg_caseWhenResultState_1 = -1; /* 391 */ do { /* 392 */ boolean agg_value_40 = false; /* 393 */ agg_value_40 = agg_expr_0_0 > 10; /* 394 */ if (!false && agg_value_40) { /* 395 */ agg_caseWhenResultState_1 = (byte)(false ? 1 : 0); /* 396 */ agg_agg_value_39_0 = agg_expr_0_0; /* 397 */ continue; /* 398 */ } /* 399 */ /* 400 */ agg_caseWhenResultState_1 = (byte)(true ? 1 : 0); /* 401 */ agg_agg_value_39_0 = -1; /* 402 */ /* 403 */ } while (false); /* 404 */ // TRUE if any condition is met and the result is null, or no any condition is met. /* 405 */ final boolean agg_isNull_36 = (agg_caseWhenResultState_1 != 0); /* 406 */ /* 407 */ if (!agg_isNull_36 && (agg_agg_isNull_34_0 || /* 408 */ agg_value_37 > agg_agg_value_39_0)) { /* 409 */ agg_agg_isNull_34_0 = false; /* 410 */ agg_value_37 = agg_agg_value_39_0; /* 411 */ } /* 412 */ agg_agg_isNull_42_0 = true; /* 413 */ int agg_value_45 = -1; /* 414 */ /* 415 */ boolean agg_isNull_43 = agg_fastAggBuffer_0.isNullAt(2); /* 416 */ int agg_value_46 = agg_isNull_43 ? 
/* 417 */ -1 : (agg_fastAggBuffer_0.getInt(2)); /* 418 */ /* 419 */ if (!agg_isNull_43 && (agg_agg_isNull_42_0 || /* 420 */ agg_value_45 > agg_value_46)) { /* 421 */ agg_agg_isNull_42_0 = false; /* 422 */ agg_value_45 = agg_value_46; /* 423 */ } /* 424 */ /* 425 */ if (!false && (agg_agg_isNull_42_0 || /* 426 */ agg_value_45 > agg_expr_0_0)) { /* 427 */ agg_agg_isNull_42_0 = false; /* 428 */ agg_value_45 = agg_expr_0_0; /* 429 */ } /* 430 */ // update fast row /* 431 */ agg_fastAggBuffer_0.setInt(0, agg_value_34); /* 432 */ /* 433 */ if (!agg_agg_isNull_34_0) { /* 434 */ agg_fastAggBuffer_0.setInt(1,
[GitHub] spark issue #21854: [SPARK-24896][SQL] Uuid should produce different values ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21854 retest this please
[GitHub] spark pull request #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21857#discussion_r204677899 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -222,6 +222,37 @@ case class Stack(children: Seq[Expression]) extends Generator { } } +/** + * Replicate the row N times. N is specified as the first argument to the function. + * This is a internal function solely used by optimizer to rewrite EXCEPT ALL AND + * INTERSECT ALL queries. + */ +@ExpressionDescription( --- End diff -- If it's for an internal purpose, you can just remove this though.
[GitHub] spark issue #21772: [SPARK-24809] [SQL] Serializing LongToUnsafeRowMap in ex...
Github user liutang123 commented on the issue: https://github.com/apache/spark/pull/21772 Jenkins, test this please
[GitHub] spark pull request #21858: [SPARK-24899][SQL][DOC] Add example of monotonica...
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/21858 [SPARK-24899][SQL][DOC] Add example of monotonically_increasing_id standard function to scaladoc ## What changes were proposed in this pull request? Example of `monotonically_increasing_id` standard function (with how it works internally) in scaladoc ## How was this patch tested? Local build. Waiting for Jenkins You can merge this pull request into a Git repository by running: $ git pull https://github.com/jaceklaskowski/spark SPARK-24899-monotonically_increasing_id Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21858.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21858 commit 29def0069d96ca449204ad27e8c66ca2a218ce84 Author: Jacek Laskowski Date: 2018-07-24T09:34:49Z [SPARK-24899][SQL][DOC] Add example of monotonically_increasing_id standard function to scaladoc
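For context on the "how it works internally" part of this PR: the IDs produced by `monotonically_increasing_id` put the partition ID in the upper 31 bits and the per-partition record number in the lower 33 bits. A small Python sketch of that documented layout (the function name here is illustrative, not a Spark API):

```python
def monotonically_increasing_id_for(partition_id, row_in_partition):
    """Sketch of the documented ID layout: upper 31 bits hold the
    partition ID, lower 33 bits hold the record number within the
    partition. IDs are guaranteed monotonically increasing and
    unique, but not consecutive across partitions."""
    return (partition_id << 33) | row_in_partition

# The first row of partition 1 starts at 2**33.
print(monotonically_increasing_id_for(1, 0))  # 8589934592
```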
[GitHub] spark pull request #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR ...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/21584#discussion_r204713851 --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/R/Dockerfile --- @@ -0,0 +1,29 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +ARG base_img +FROM $base_img +WORKDIR / +RUN mkdir ${SPARK_HOME}/R +COPY R ${SPARK_HOME}/R + +RUN apk add --no-cache R R-dev + --- End diff -- This is only for Python packaging. R does not have `/root/.cache` when created by alpine.
[GitHub] spark issue #21845: [SPARK-24886][INFRA] Fix the testing script to increase ...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/21845 @HyukjinKwon I saw the following test run for 11 minutes on Jenkins for one of my PRs. Not sure if it's a transient problem; just thought I should let you know. On the nightly runs, should we have a test that runs for that long? SPARK-22499: Least and greatest should not generate codes beyond 64KB (11 minutes, 38 seconds)
[GitHub] spark issue #21853: [SPARK-23957][SQL] Sorts in subqueries are redundant and...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21853 **[Test build #93485 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93485/testReport)** for PR 21853 at commit [`a86cb9f`](https://github.com/apache/spark/commit/a86cb9f8764ac4962905ee1b8772fec5692d4342). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21853: [SPARK-23957][SQL] Sorts in subqueries are redundant and...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21853 Merged build finished. Test PASSed.
[GitHub] spark issue #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21857 **[Test build #93495 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93495/testReport)** for PR 21857 at commit [`a6fc341`](https://github.com/apache/spark/commit/a6fc34101261d4627f2c42f5aefc9d377e44e29e).
[GitHub] spark issue #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21857 **[Test build #93510 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93510/testReport)** for PR 21857 at commit [`c516f78`](https://github.com/apache/spark/commit/c516f788a21f39abc0442e64d7b54b8e76f40043). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21752: [SPARK-24788][SQL] fixed UnresolvedException when toStri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21752 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93515/ Test PASSed.
[GitHub] spark issue #21752: [SPARK-24788][SQL] fixed UnresolvedException when toStri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21752 Merged build finished. Test PASSed.
[GitHub] spark issue #21851: [SPARK-24891][SQL] Fix HandleNullInputsForUDF rule
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21851 Merged build finished. Test PASSed.
[GitHub] spark pull request #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` w...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21850#discussion_r204933531 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -414,6 +414,9 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper { // these branches can be pruned away val (h, t) = branches.span(_._1 != TrueLiteral) CaseWhen( h :+ t.head, None) + + case CaseWhen((cond, branchValue) :: Nil, elseValue) => +If(cond, branchValue, elseValue.getOrElse(Literal(null, branchValue.dataType))) --- End diff -- Looks like not much difference in terms of performance, but the `If` primitive has more opportunities for further optimization.
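The rule in the diff above rewrites a single-branch `CaseWhen` into `If`, substituting NULL when no ELSE is given. A minimal Python sketch of the equivalence it relies on (illustrative only; real Catalyst expressions evaluate lazily over rows):

```python
def case_when_single(cond, branch_value, else_value=None):
    """CASE WHEN cond THEN branch_value [ELSE else_value] END with a
    single branch is equivalent to IF(cond, branch_value, else_value),
    where a missing ELSE defaults to NULL (modeled as None here)."""
    return branch_value if cond else else_value

print(case_when_single(True, 1, 0))  # 1
print(case_when_single(False, 1))    # None
```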
[GitHub] spark issue #21775: [SPARK-24812][SQL] Last Access Time in the table descrip...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21775 **[Test build #93513 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93513/testReport)** for PR 21775 at commit [`76a34c6`](https://github.com/apache/spark/commit/76a34c6d3c05c3f729be5893210b199ebb6c093c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21865: [SPARK-24895] Remove spotbugs plugin
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21865 Merged build finished. Test PASSed.
[GitHub] spark pull request #21848: [SPARK-24890] [SQL] Short circuiting the `if` con...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21848#discussion_r204938763 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -1627,6 +1627,8 @@ case class InitializeJavaBean(beanInstance: Expression, setters: Map[String, Exp case class AssertNotNull(child: Expression, walkedTypePath: Seq[String] = Nil) extends UnaryExpression with NonSQLExpression { + override lazy val deterministic: Boolean = false --- End diff -- Fair. I'll create a followup PR for this.
[GitHub] spark issue #21854: [SPARK-24896][SQL] Uuid should produce different values ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21854 **[Test build #93517 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93517/testReport)** for PR 21854 at commit [`1d629dc`](https://github.com/apache/spark/commit/1d629dc40060578aba16cb56a6ba89f89107e74b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21864: [SPARK-24908][R][style] removing spaces to make lintr ha...
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/21864 LGTM. Merged into master.
[GitHub] spark issue #21865: [SPARK-24895] Remove spotbugs plugin
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/21865 cc @HyukjinKwon @kiszk I will merge this PR once it passes the test.
[GitHub] spark issue #21867: [SPARK-24307][CORE] Add conf to revert to old code.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21867 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1290/ Test PASSed.
[GitHub] spark pull request #21867: [SPARK-24307][CORE] Add conf to revert to old cod...
GitHub user squito opened a pull request: https://github.com/apache/spark/pull/21867 [SPARK-24307][CORE] Add conf to revert to old code. In case there are any issues in converting FileSegmentManagedBuffer to ChunkedByteBuffer, add a conf to go back to old code path. Followup to 7e847646d1f377f46dc3154dea37148d4e557a03 You can merge this pull request into a Git repository by running: $ git pull https://github.com/squito/spark SPARK-24307-p2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21867.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21867 commit bc2ea46b291fe2aea6b9d254dc0fdb4e81f90ebd Author: Imran Rashid Date: 2018-07-24T20:37:26Z [SPARK-24307][CORE] Add conf to revert to old code. In case there are any issues in converting FileSegmentManagedBuffer to ChunkedByteBuffer, add a conf to go back to old code path.
[GitHub] spark issue #21867: [SPARK-24307][CORE] Add conf to revert to old code.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21867 Merged build finished. Test PASSed.
[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r204917880 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -359,20 +366,55 @@ private[spark] class TaskSchedulerImpl( // of locality levels so that it gets a chance to launch local tasks on all of them. // NOTE: the preferredLocality order: PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY for (taskSet <- sortedTaskSets) { - var launchedAnyTask = false - var launchedTaskAtCurrentMaxLocality = false - for (currentMaxLocality <- taskSet.myLocalityLevels) { -do { - launchedTaskAtCurrentMaxLocality = resourceOfferSingleTaskSet( -taskSet, currentMaxLocality, shuffledOffers, availableCpus, tasks) - launchedAnyTask |= launchedTaskAtCurrentMaxLocality -} while (launchedTaskAtCurrentMaxLocality) - } - if (!launchedAnyTask) { -taskSet.abortIfCompletelyBlacklisted(hostToExecutors) + // Skip the barrier taskSet if the available slots are less than the number of pending tasks. + if (taskSet.isBarrier && availableSlots < taskSet.numTasks) { --- End diff -- we should probably have a hard failure if DynamicAllocation is enabled until that is properly addressed.
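The check in the diff above enforces barrier execution's all-or-nothing launch: a barrier task set is only offered resources when every one of its tasks can be launched in the same scheduling round. A tiny Python sketch of that gate (illustrative; the scheduler's real logic also handles locality levels and blacklisting):

```python
def can_launch_barrier_stage(available_slots, num_tasks):
    """A barrier task set must launch all of its tasks together, so
    it is skipped for this scheduling round unless there are at
    least as many free slots as pending tasks."""
    return available_slots >= num_tasks

print(can_launch_barrier_stage(4, 5))  # False
print(can_launch_barrier_stage(8, 5))  # True
```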
[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r204914384 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDDBarrier.scala --- @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.rdd + +import scala.reflect.ClassTag + +import org.apache.spark.BarrierTaskContext +import org.apache.spark.TaskContext +import org.apache.spark.annotation.{Experimental, Since} + +/** Represents an RDD barrier, which forces Spark to launch tasks of this stage together. */ +class RDDBarrier[T: ClassTag](rdd: RDD[T]) { + + /** + * :: Experimental :: + * Maps partitions together with a provided BarrierTaskContext. + * + * `preservesPartitioning` indicates whether the input function preserves the partitioner, which + * should be `false` unless `rdd` is a pair RDD and the input function doesn't modify the keys. + */ + @Experimental + @Since("2.4.0") + def mapPartitions[S: ClassTag]( --- End diff -- if the only thing you can do on this is `mapPartitions`, is there any particular reason it's divided into two calls `barrier().mapPartitions()`, instead of just `barrierMapPartitions()` or something? Are there more things planned here? 
I can see users expecting to be able to call other functions after `.barrier()`, e.g. `barrier().reduceByKey()` or something. The compiler will help with this, but just wondering if we can make it more obvious. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
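[Editor's note] A minimal sketch of the API shape under discussion (hypothetical usage, not code from this PR; it assumes the form where tasks obtain the context via `BarrierTaskContext.get()`, whereas this PR draft may instead pass the context into the closure):

```scala
import org.apache.spark.{BarrierTaskContext, SparkContext}

// barrier() returns an RDDBarrier, whose only operation is mapPartitions --
// the restriction squito is asking about above.
def barrierStage(sc: SparkContext): Array[Int] = {
  sc.parallelize(1 to 100, numSlices = 4)
    .barrier()                     // RDDBarrier[Int]; no reduceByKey etc. here
    .mapPartitions { iter =>
      val ctx = BarrierTaskContext.get()
      ctx.barrier()                // all 4 tasks rendezvous before proceeding
      iter.map(_ * 2)
    }
    .collect()
}
```

Because `RDDBarrier` exposes only `mapPartitions`, the compiler rejects `barrier().reduceByKey(...)` at the type level, which is the "compiler will help" point above.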
[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r204917245 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1647,6 +1647,14 @@ abstract class RDD[T: ClassTag]( } } + /** + * :: Experimental :: + * Indicates that Spark must launch the tasks together for the current stage. + */ + @Experimental + @Since("2.4.0") + def barrier(): RDDBarrier[T] = withScope(new RDDBarrier[T](this)) --- End diff -- barrier scheduling seems to have a very hard requirement that the number of partitions is no greater than the number of available task slots. It seems really hard for users to get this right. E.g., if I just do `sc.textFile(...).barrier().mapPartitions()` the number of partitions is based on the HDFS input splits. I see lots of users getting confused by this -- it'll work sometimes, won't work other times, and they won't know why. Should there be some automatic repartitioning based on cluster resources? Or at least an API which lets users do this? Even `repartition()` isn't great here, because users don't want to think about cluster resources. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
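[Editor's note] To illustrate the workaround users would currently have to write by hand (a hypothetical sketch; the `defaultParallelism`-based slot estimate is only an approximation, not an API proposed in this PR):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def barrierFriendly(sc: SparkContext, path: String): RDD[String] = {
  // textFile partitioning follows the HDFS input splits, which the user
  // does not control -- so a downstream barrier() may or may not fit the
  // cluster's task slots on any given run.
  val raw = sc.textFile(path)

  // Manual workaround: cap the partition count at an estimate of the
  // available slots. defaultParallelism is a rough proxy and can be wrong
  // with dynamic allocation or busy executors, which is squito's point.
  val slots = sc.defaultParallelism
  if (raw.getNumPartitions > slots) raw.repartition(slots) else raw
}
```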
[GitHub] spark issue #21752: [SPARK-24788][SQL] fixed UnresolvedException when toStri...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21752 **[Test build #93515 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93515/testReport)** for PR 21752 at commit [`db83c44`](https://github.com/apache/spark/commit/db83c4478cd4077526ced45559a19ba1b84414e0). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class DataFrameAggregateSuite extends QueryTest with SharedSQLContext ` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21851: [SPARK-24891][SQL] Fix HandleNullInputsForUDF rule
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21851 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93514/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21851: [SPARK-24891][SQL] Fix HandleNullInputsForUDF rule
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21851 **[Test build #93514 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93514/testReport)** for PR 21851 at commit [`b499b97`](https://github.com/apache/spark/commit/b499b9727a4cb9cc42149d05a4d54dba2de8bd9e). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class KnowNotNull(child: Expression) extends UnaryExpression ` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21775: [SPARK-24812][SQL] Last Access Time in the table descrip...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21775 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21775: [SPARK-24812][SQL] Last Access Time in the table descrip...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21775 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93513/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21772: [SPARK-24809] [SQL] Serializing LongToUnsafeRowMap in ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21772 **[Test build #93516 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93516/testReport)** for PR 21772 at commit [`c9ebfd0`](https://github.com/apache/spark/commit/c9ebfd0acdeefa1495b48df84b137ea213b2f7fc). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21865: [SPARK-24895] Remove spotbugs plugin
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21865 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21848: [SPARK-24890] [SQL] Short circuiting the `if` condition ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21848 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21848: [SPARK-24890] [SQL] Short circuiting the `if` condition ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21848 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1292/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21864: [SPARK-24908][R][style] removing spaces to make l...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21864 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r204912925 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -359,20 +368,56 @@ private[spark] class TaskSchedulerImpl( // of locality levels so that it gets a chance to launch local tasks on all of them. // NOTE: the preferredLocality order: PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY for (taskSet <- sortedTaskSets) { - var launchedAnyTask = false - var launchedTaskAtCurrentMaxLocality = false - for (currentMaxLocality <- taskSet.myLocalityLevels) { -do { - launchedTaskAtCurrentMaxLocality = resourceOfferSingleTaskSet( -taskSet, currentMaxLocality, shuffledOffers, availableCpus, tasks) - launchedAnyTask |= launchedTaskAtCurrentMaxLocality -} while (launchedTaskAtCurrentMaxLocality) - } - if (!launchedAnyTask) { -taskSet.abortIfCompletelyBlacklisted(hostToExecutors) + // Skip the barrier taskSet if the available slots are less than the number of pending tasks. + if (taskSet.isBarrier && availableSlots < taskSet.numTasks) { +// Skip the launch process. +// TODO SPARK-24819 If the job requires more slots than available (both busy and free +// slots), fail the job on submit. +logInfo(s"Skip current round of resource offers for barrier stage ${taskSet.stageId} " + + s"because the barrier taskSet requires ${taskSet.numTasks} slots, while the total " + + s"number of available slots is ${availableSlots}.") + } else { +var launchedAnyTask = false +var launchedTaskAtCurrentMaxLocality = false +// Record all the executor IDs assigned barrier tasks on. 
+val addresses = ArrayBuffer[String]() +val taskDescs = ArrayBuffer[TaskDescription]() +for (currentMaxLocality <- taskSet.myLocalityLevels) { + do { +launchedTaskAtCurrentMaxLocality = resourceOfferSingleTaskSet(taskSet, + currentMaxLocality, shuffledOffers, availableCpus, tasks, addresses, taskDescs) +launchedAnyTask |= launchedTaskAtCurrentMaxLocality + } while (launchedTaskAtCurrentMaxLocality) +} +if (!launchedAnyTask) { + taskSet.abortIfCompletelyBlacklisted(hostToExecutors) +} +if (launchedAnyTask && taskSet.isBarrier) { + // Check whether the barrier tasks are partially launched. + // TODO SPARK-24818 handle the assert failure case (that can happen when some locality + // requirements are not fulfilled, and we should revert the launched tasks). + require(taskDescs.size == taskSet.numTasks, +s"Skip current round of resource offers for barrier stage ${taskSet.stageId} " + + s"because only ${taskDescs.size} out of a total number of ${taskSet.numTasks} " + + "tasks got resource offers. The resource offers may have been blacklisted or " + + "cannot fulfill task locality requirements.") + + // Update the taskInfos into all the barrier task properties. + val addressesStr = addresses.zip(taskDescs) +// Addresses ordered by partitionId +.sortBy(_._2.partitionId) +.map(_._1) +.mkString(",") + taskDescs.foreach(_.properties.setProperty("addresses", addressesStr)) + + logInfo(s"Successfully scheduled all the ${taskDescs.size} tasks for barrier stage " + +s"${taskSet.stageId}.") +} } } +// TODO SPARK-24823 Cancel a job that contains barrier stage(s) if the barrier tasks don't get +// launched within a configured time. --- End diff -- with concurrently executing jobs, one job could easily cause starvation for the barrier job, right? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21439#discussion_r204915146 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala --- @@ -101,6 +102,17 @@ class JacksonParser( } } + private def makeArrayRootConverter(at: ArrayType): JsonParser => Seq[InternalRow] = { +val elemConverter = makeConverter(at.elementType) +(parser: JsonParser) => parseJsonToken[Seq[InternalRow]](parser, at) { + case START_ARRAY => Seq(InternalRow(convertArray(parser, elemConverter))) + case START_OBJECT if at.elementType.isInstanceOf[StructType] => --- End diff -- Shall we add a comment on top of this `case` to explain it? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
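[Editor's note] For context, a sketch of the behavior the `START_OBJECT` case above enables (hypothetical data and column names, written against the public `from_json` API rather than `JacksonParser` directly):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{ArrayType, IntegerType, StructType}

val spark = SparkSession.builder().master("local[1]").getOrCreate()
import spark.implicits._

val schema = ArrayType(new StructType().add("a", IntegerType))

// START_ARRAY case: a top-level JSON array parsed against an array schema.
Seq("""[{"a": 1}, {"a": 2}]""").toDF("json")
  .select(from_json($"json", schema))

// START_OBJECT case: a single JSON object parsed against an
// array-of-struct schema is wrapped into a one-element array
// instead of being rejected -- the branch viirya asks to document.
Seq("""{"a": 1}""").toDF("json")
  .select(from_json($"json", schema))
```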
[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21439#discussion_r204932903 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -544,34 +544,27 @@ case class JsonToStructs( timeZoneId = None) override def checkInputDataTypes(): TypeCheckResult = nullableSchema match { -case _: StructType | ArrayType(_: StructType, _) | _: MapType => +case _: StructType | _: ArrayType | _: MapType => super.checkInputDataTypes() case _ => TypeCheckResult.TypeCheckFailure( s"Input schema ${nullableSchema.catalogString} must be a struct or an array of structs.") --- End diff -- `or an array of structs.` -> `or an array.` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21865: [SPARK-24895] Remove spotbugs plugin
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21865 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93511/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21865: [SPARK-24895] Remove spotbugs plugin
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21865 **[Test build #93511 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93511/testReport)** for PR 21865 at commit [`af0ecf5`](https://github.com/apache/spark/commit/af0ecf5d39824ed2c0bb0515d9c4ff8651a58f74). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21439#discussion_r204933936 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -544,34 +544,27 @@ case class JsonToStructs( timeZoneId = None) --- End diff -- Please also update the comment of `JsonToStructs`: `Converts an json input string to a [[StructType]] or [[ArrayType]] of [[StructType]]s`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21772: [SPARK-24809] [SQL] Serializing LongToUnsafeRowMap in ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21772 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93516/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21772: [SPARK-24809] [SQL] Serializing LongToUnsafeRowMap in ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21772 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21848: [SPARK-24890] [SQL] Short circuiting the `if` condition ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21848 **[Test build #93523 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93523/testReport)** for PR 21848 at commit [`b4f1431`](https://github.com/apache/spark/commit/b4f143180adc0196aa16650efc399226b463699f). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21867: [SPARK-24307][CORE] Add conf to revert to old code.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21867 **[Test build #93520 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93520/testReport)** for PR 21867 at commit [`bc2ea46`](https://github.com/apache/spark/commit/bc2ea46b291fe2aea6b9d254dc0fdb4e81f90ebd). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21857 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21857 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93510/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21863: [SPARK-18874][SQL][FOLLOW-UP] Improvement type mismatche...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/21863 @gatorsmile Hi Sean, isn't @mgaido91 working in the same area with the IN-subquery PR? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21439#discussion_r204931175 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala --- @@ -136,12 +136,11 @@ class JsonFunctionsSuite extends QueryTest with SharedSQLContext { test("from_json invalid schema") { --- End diff -- Not an invalid schema now. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` w...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21850#discussion_r204933164 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -414,6 +414,9 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper { // these branches can be pruned away val (h, t) = branches.span(_._1 != TrueLiteral) CaseWhen( h :+ t.head, None) + + case CaseWhen((cond, branchValue) :: Nil, elseValue) => +If(cond, branchValue, elseValue.getOrElse(Literal(null, branchValue.dataType))) --- End diff -- Before: ``` == Parsed Logical Plan == 'Project [CASE WHEN isnull('a) THEN 1 END AS col1#181] +- 'UnresolvedRelation == Optimized Logical Plan == Project [CASE WHEN isnull(a#182) THEN 1 END AS col1#181] +- Relation[a#182] parquet ``` Generated Java code ```java /* 043 */ protected void processNext() throws java.io.IOException { /* 044 */ if (scan_mutableStateArray_1[0] == null) { /* 045 */ scan_nextBatch_0(); /* 046 */ } /* 047 */ while (scan_mutableStateArray_1[0] != null) { /* 048 */ int scan_numRows_0 = scan_mutableStateArray_1[0].numRows(); /* 049 */ int scan_localEnd_0 = scan_numRows_0 - scan_batchIdx_0; /* 050 */ for (int scan_localIdx_0 = 0; scan_localIdx_0 < scan_localEnd_0; scan_localIdx_0++) { /* 051 */ int scan_rowIdx_0 = scan_batchIdx_0 + scan_localIdx_0; /* 052 */ byte project_caseWhenResultState_0 = -1; /* 053 */ do { /* 054 */ boolean scan_isNull_0 = scan_mutableStateArray_2[0].isNullAt(scan_rowIdx_0); /* 055 */ int scan_value_0 = scan_isNull_0 ? -1 : (scan_mutableStateArray_2[0].getInt(scan_rowIdx_0)); /* 056 */ if (!false && scan_isNull_0) { /* 057 */ project_caseWhenResultState_0 = (byte)(false ? 1 : 0); /* 058 */ project_project_value_0_0 = 1; /* 059 */ continue; /* 060 */ } /* 061 */ /* 062 */ } while (false); /* 063 */ // TRUE if any condition is met and the result is null, or no any condition is met. 
/* 064 */ final boolean project_isNull_0 = (project_caseWhenResultState_0 != 0); /* 065 */ scan_mutableStateArray_3[1].reset(); /* 066 */ /* 067 */ scan_mutableStateArray_3[1].zeroOutNullBytes(); /* 068 */ /* 069 */ if (project_isNull_0) { /* 070 */ scan_mutableStateArray_3[1].setNullAt(0); /* 071 */ } else { /* 072 */ scan_mutableStateArray_3[1].write(0, project_project_value_0_0); /* 073 */ } /* 074 */ append((scan_mutableStateArray_3[1].getRow())); /* 075 */ if (shouldStop()) { scan_batchIdx_0 = scan_rowIdx_0 + 1; return; } /* 076 */ } /* 077 */ scan_batchIdx_0 = scan_numRows_0; /* 078 */ scan_mutableStateArray_1[0] = null; /* 079 */ scan_nextBatch_0(); /* 080 */ } /* 081 */ ((org.apache.spark.sql.execution.metric.SQLMetric) references[1] /* scanTime */).add(scan_scanTime_0 / (1000 * 1000)); /* 082 */ scan_scanTime_0 = 0; /* 083 */ } ``` After: ``` == Parsed Logical Plan == 'Project [CASE WHEN isnull('a) THEN 1 END AS b#186] +- 'UnresolvedRelation `td` == Optimized Logical Plan == Project [if (isnull(a#187)) 1 else null AS b#186] +- Relation[a#187,b#188] parquet ``` Generated Java code: ```java /* 042 */ protected void processNext() throws java.io.IOException { /* 043 */ if (scan_mutableStateArray_1[0] == null) { /* 044 */ scan_nextBatch_0(); /* 045 */ } /* 046 */ while (scan_mutableStateArray_1[0] != null) { /* 047 */ int scan_numRows_0 = scan_mutableStateArray_1[0].numRows(); /* 048 */ int scan_localEnd_0 = scan_numRows_0 - scan_batchIdx_0; /* 049 */ for (int scan_localIdx_0 = 0; scan_localIdx_0 < scan_localEnd_0; scan_localIdx_0++) { /* 050 */ int scan_rowIdx_0 = scan_batchIdx_0 + scan_localIdx_0; /* 051 */ boolean scan_isNull_0 = scan_mutableStateArray_2[0].isNullAt(scan_rowIdx_0); /* 052 */ int scan_value_0 = scan_isNull_0 ? 
-1 : (scan_mutableStateArray_2[0].getInt(scan_rowIdx_0)); /* 053 */ boolean project_isNull_0 = false; /* 054 */ int project_value_0 = -1; /* 055 */ if (!false && scan_isNull_0) { /* 056 */ project_isNull_0 = false; /* 057 */ project_value_0 = 1; /* 058 */ } else { /* 059 */ project_isNull_0 = true; /* 060 */ project_value_0 = -1; /* 061 */ } /* 062 */
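[Editor's note] The before/after codegen above can be reproduced with a small end-to-end sketch (hypothetical DataFrame; the exact optimized-plan text varies by version):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, when}

val spark = SparkSession.builder().master("local[1]").getOrCreate()
import spark.implicits._

val df = Seq(Option(1), None).toDF("a")

// A CASE WHEN with a single branch and no ELSE...
val q = df.select(when(col("a").isNull, 1).as("col1"))

// ...which the rule in this PR rewrites to If(isnull(a), 1, null),
// avoiding the caseWhenResultState bookkeeping in the generated code.
q.explain(true)
```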
[GitHub] spark issue #21865: [SPARK-24895] Remove spotbugs plugin
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/21865 lgtm. I am merging this PR to master branch. Then, I will kick off https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` when the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21850 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21803: [SPARK-24849][SPARK-24911][SQL] Converting a value of St...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21803 **[Test build #93522 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93522/testReport)** for PR 21803 at commit [`738e97c`](https://github.com/apache/spark/commit/738e97cdc1801c95b8b9d87ad00c6c8aeaf0f20b). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` when the...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21850 **[Test build #93521 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93521/testReport)** for PR 21850 at commit [`59fada7`](https://github.com/apache/spark/commit/59fada75fb59b1c3dabdac0a5d22b35c8f139a44). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` when the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21850 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1291/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21848: [SPARK-24890] [SQL] Short circuiting the `if` condition ...
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/21848 @kiszk `trait Stateful extends Nondeterministic`, and this rule will not be invoked when an expression is nondeterministic. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21854: [SPARK-24896][SQL] Uuid should produce different values ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21854 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21854: [SPARK-24896][SQL] Uuid should produce different values ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21854 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93517/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21848: [SPARK-24890] [SQL] Short circuiting the `if` condition ...
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/21848 Here is a followup PR for making `AssertTrue` and `AssertNotNull` `non-deterministic` https://issues.apache.org/jira/browse/SPARK-24913 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21866: [SPARK-24768][FollowUp][SQL]Avro migration followup: cha...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21866 **[Test build #93518 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93518/testReport)** for PR 21866 at commit [`cff6f2a`](https://github.com/apache/spark/commit/cff6f2a0459e8cc4e48f28bde8103ea44ce5a1ab). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21861: [SPARK-24907][WIP] Migrate JDBC DataSource to JDBCDataSo...
Github user tengpeng commented on the issue: https://github.com/apache/spark/pull/21861 @gatorsmile Got you. I will update the implementation after DataSourceV2 API changes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20699: [SPARK-23544][SQL]Remove redundancy ShuffleExchan...
Github user heary-cao closed the pull request at: https://github.com/apache/spark/pull/20699 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21305 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93519/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21305 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r204957474 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -751,7 +751,8 @@ object TypeCoercion { */ case class ConcatCoercion(conf: SQLConf) extends TypeCoercionRule { -override protected def coerceTypes(plan: LogicalPlan): LogicalPlan = plan transform { case p => +override protected def coerceTypes( + plan: LogicalPlan): LogicalPlan = plan resolveOperatorsDown { case p => --- End diff -- im using a weird wrapping here to minimize the diff. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21867: [SPARK-24307][CORE] Add conf to revert to old code.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21867 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93520/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21866: [SPARK-24768][FollowUp][SQL]Avro migration followup: cha...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21866 **[Test build #93528 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93528/testReport)** for PR 21866 at commit [`cff6f2a`](https://github.com/apache/spark/commit/cff6f2a0459e8cc4e48f28bde8103ea44ce5a1ab). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21867: [SPARK-24307][CORE] Add conf to revert to old cod...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21867#discussion_r204959300 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -731,7 +731,14 @@ private[spark] class BlockManager( } if (data != null) { -return Some(ChunkedByteBuffer.fromManagedBuffer(data, chunkSize)) +// SPARK-24307 undocumented "escape-hatch" in case there are any issues in converting to +// to ChunkedByteBuffer, to go back to old code-path. Can be removed post Spark 2.4 if +// new path is stable. +if (conf.getBoolean("spark.fetchToNioBuffer", false)) { --- End diff -- can we have a better prefix, rather than just spark. ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21865: [SPARK-24895] Remove spotbugs plugin
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21865 Thank you all. I couldn't foresee this problem. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21542: [SPARK-24529][Build][test-maven] Add spotbugs into maven...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21542 This was reverted in favour of https://github.com/apache/spark/pull/21865 and SPARK-24895 for now. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21851: [SPARK-24891][SQL] Fix HandleNullInputsForUDF rule
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21851 LGTM Thanks! Merged to master --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20949: [SPARK-19018][SQL] Add support for custom encoding on cs...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20949 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org