[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21758 **[Test build #93580 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93580/testReport)** for PR 21758 at commit [`c7600c2`](https://github.com/apache/spark/commit/c7600c24221d29fde31dca921d9d5863af2666e9). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/21758 > What's the failure mode if there are not enough slots for the barrier mode? We should throw an exception right? Yes, as mentioned in https://github.com/apache/spark/pull/21758/files/c16a47f0d15998133b9d61d8df5310f1f66b11b0#diff-d4000438827afe3a185ae75b24987a61R372 , we shall fail the job on submit if there is no enough slots for the barrier stage. I'll submit another PR to add this check (tracked by SPARK-24819). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21867: [SPARK-24307][CORE] Add conf to revert to old code.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21867 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93574/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21789: [SPARK-24829][STS]In Spark Thrift Server, CAST AS FLOAT ...
Github user zuotingbing commented on the issue: https://github.com/apache/spark/pull/21789 @HyukjinKwon could you help to merge this to master branch? Thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21867: [SPARK-24307][CORE] Add conf to revert to old code.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21867 **[Test build #93574 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93574/testReport)** for PR 21867 at commit [`a5b00b8`](https://github.com/apache/spark/commit/a5b00b8a05538a6adb3a4525c2fecc1e15575f7c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21867: [SPARK-24307][CORE] Add conf to revert to old code.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21867 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21857#discussion_r205325587 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -52,7 +52,7 @@ trait CheckAnalysis extends PredicateHelper { } protected def mapColumnInSetOperation(plan: LogicalPlan): Option[Attribute] = plan match { -case _: Intersect | _: Except | _: Distinct => +case _: Intersect | _: ExceptBase | _: Distinct => --- End diff -- I am fine about that. Please make a change and avoid introducing a new LogicalPlan node. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21102 I agree with @ueshin's. I wouldn't make a guarantee of returning order here in documentation yet though. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21852 **[Test build #93576 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93576/testReport)** for PR 21852 at commit [`0b67e2e`](https://github.com/apache/spark/commit/0b67e2efcb6f827248ee11fffe9eca44a86fceaa). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21878 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93577/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21878 **[Test build #93577 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93577/testReport)** for PR 21878 at commit [`d2759cc`](https://github.com/apache/spark/commit/d2759cce48eb9a85145e90d8a126fb83351d0fda). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21878 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21103 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21103 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1340/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21306 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20861: [SPARK-23599][SQL] Use RandomUUIDGenerator in Uui...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20861#discussion_r205337069 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1994,6 +1996,20 @@ class Analyzer( } } + /** + * Set the seed for random number generation in Uuid expressions. + */ + object ResolvedUuidExpressions extends Rule[LogicalPlan] { +private lazy val random = new Random() + +override def apply(plan: LogicalPlan): LogicalPlan = plan.transformUp { + case p if p.resolved => p + case p => p transformExpressionsUp { +case Uuid(None) => Uuid(Some(random.nextLong())) --- End diff -- what's the current behavior for rand in streaming? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21320 do we have comments other than code style issues? Generally we should not block a PR just for code style issues, as long as the PR passes the style check. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21821: [SPARK-24867] [SQL] Add AnalysisBarrier to DataFr...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21821 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21320 Hey @mallman, I want to thank you for your work on this so far. I've been watching this pull request hoping this would get merged into 2.4 since it would be a benefit to me, but can see how it might be frustrating. Unfortunately, I've only been following the comments and not the code/architecture itself, so I can't take over effectively, but I did try to make the minor comments as requested hopefully to help out. I've started in 7ee616076f93d6cfd55b6646314f3d4a6d1530d3. This may not be super helpful right now, but if these were the only blockers for getting this change into mainline in time for 2.4. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r205318258 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -359,20 +366,55 @@ private[spark] class TaskSchedulerImpl( // of locality levels so that it gets a chance to launch local tasks on all of them. // NOTE: the preferredLocality order: PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY for (taskSet <- sortedTaskSets) { - var launchedAnyTask = false - var launchedTaskAtCurrentMaxLocality = false - for (currentMaxLocality <- taskSet.myLocalityLevels) { -do { - launchedTaskAtCurrentMaxLocality = resourceOfferSingleTaskSet( -taskSet, currentMaxLocality, shuffledOffers, availableCpus, tasks) - launchedAnyTask |= launchedTaskAtCurrentMaxLocality -} while (launchedTaskAtCurrentMaxLocality) - } - if (!launchedAnyTask) { -taskSet.abortIfCompletelyBlacklisted(hostToExecutors) + // Skip the barrier taskSet if the available slots are less than the number of pending tasks. + if (taskSet.isBarrier && availableSlots < taskSet.numTasks) { --- End diff -- We plan to fail the job on submit if it requires more slots than available. Are there other scenarios we shall fail fast with dynamic allocation? IIUC the barrier tasks that have not get launched are still counted into the number of pending tasks, so dynamic resource allocation shall still be able to compute a correct expected number of executors. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21875: [SPARK-24288][SQL] Add a JDBC Option to enable preventin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21875 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21875: [SPARK-24288][SQL] Add a JDBC Option to enable preventin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21875 **[Test build #93579 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93579/testReport)** for PR 21875 at commit [`027b6c4`](https://github.com/apache/spark/commit/027b6c43f8c448d3231d19b21c64ab8306881fde). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21789: [SPARK-24829][STS]In Spark Thrift Server, CAST AS FLOAT ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21789 Let me leave this open for few days in case some reviewers have more comments on this. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21852 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21852 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93578/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21103 **[Test build #93582 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93582/testReport)** for PR 21103 at commit [`cf76c1f`](https://github.com/apache/spark/commit/cf76c1f4a2a41ec88fcd744470a113321e897a71). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21830: [SPARK-24878][SQL] Fix reverse function for array...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21830#discussion_r205336425 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1244,46 +1244,50 @@ case class Reverse(child: Expression) extends UnaryExpression with ImplicitCastI } private def arrayCodeGen(ctx: CodegenContext, ev: ExprCode, childName: String): String = { -val length = ctx.freshName("length") -val javaElementType = CodeGenerator.javaType(elementType) + val isPrimitiveType = CodeGenerator.isPrimitiveType(elementType) +val numElements = ctx.freshName("numElements") +val arrayData = ctx.freshName("arrayData") + val initialization = if (isPrimitiveType) { - s"$childName.copy()" + ctx.createUnsafeArray(arrayData, numElements, elementType, s" $prettyName failed.") } else { - s"new ${classOf[GenericArrayData].getName()}(new Object[$length])" -} - -val numberOfIterations = if (isPrimitiveType) s"$length / 2" else length - -val swapAssigments = if (isPrimitiveType) { - val setFunc = "set" + CodeGenerator.primitiveTypeName(elementType) - val getCall = (index: String) => CodeGenerator.getValue(ev.value, elementType, index) - s"""|boolean isNullAtK = ${ev.value}.isNullAt(k); - |boolean isNullAtL = ${ev.value}.isNullAt(l); - |if(!isNullAtK) { - | $javaElementType el = ${getCall("k")}; - | if(!isNullAtL) { - |${ev.value}.$setFunc(k, ${getCall("l")}); - | } else { - |${ev.value}.setNullAt(k); - | } - | ${ev.value}.$setFunc(l, el); - |} else if (!isNullAtL) { - | ${ev.value}.$setFunc(k, ${getCall("l")}); - | ${ev.value}.setNullAt(l); - |}""".stripMargin + val arrayDataClass = classOf[GenericArrayData].getName + s"$arrayDataClass $arrayData = new $arrayDataClass(new Object[$numElements]);" +} + +val i = ctx.freshName("i") +val j = ctx.freshName("j") + +val getValue = CodeGenerator.getValue(childName, elementType, i) + +val setFunc = if (isPrimitiveType) { + s"set${CodeGenerator.primitiveTypeName(elementType)}" +} else { + "update" +} + +val assignment = if (isPrimitiveType && dataType.asInstanceOf[ArrayType].containsNull) { --- End diff -- nit: we can simplify the code if we do `override def dataType: ArrayType = child.dataType.asInstanceOf[ArrayType]` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21830: [SPARK-24878][SQL] Fix reverse function for array type o...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21830 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21830: [SPARK-24878][SQL] Fix reverse function for array...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21830#discussion_r205338428 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1244,46 +1244,50 @@ case class Reverse(child: Expression) extends UnaryExpression with ImplicitCastI } private def arrayCodeGen(ctx: CodegenContext, ev: ExprCode, childName: String): String = { -val length = ctx.freshName("length") -val javaElementType = CodeGenerator.javaType(elementType) + val isPrimitiveType = CodeGenerator.isPrimitiveType(elementType) +val numElements = ctx.freshName("numElements") +val arrayData = ctx.freshName("arrayData") + val initialization = if (isPrimitiveType) { - s"$childName.copy()" + ctx.createUnsafeArray(arrayData, numElements, elementType, s" $prettyName failed.") } else { - s"new ${classOf[GenericArrayData].getName()}(new Object[$length])" -} - -val numberOfIterations = if (isPrimitiveType) s"$length / 2" else length - -val swapAssigments = if (isPrimitiveType) { - val setFunc = "set" + CodeGenerator.primitiveTypeName(elementType) - val getCall = (index: String) => CodeGenerator.getValue(ev.value, elementType, index) - s"""|boolean isNullAtK = ${ev.value}.isNullAt(k); - |boolean isNullAtL = ${ev.value}.isNullAt(l); - |if(!isNullAtK) { - | $javaElementType el = ${getCall("k")}; - | if(!isNullAtL) { - |${ev.value}.$setFunc(k, ${getCall("l")}); - | } else { - |${ev.value}.setNullAt(k); - | } - | ${ev.value}.$setFunc(l, el); - |} else if (!isNullAtL) { - | ${ev.value}.$setFunc(k, ${getCall("l")}); - | ${ev.value}.setNullAt(l); - |}""".stripMargin + val arrayDataClass = classOf[GenericArrayData].getName + s"$arrayDataClass $arrayData = new $arrayDataClass(new Object[$numElements]);" +} + +val i = ctx.freshName("i") +val j = ctx.freshName("j") + +val getValue = CodeGenerator.getValue(childName, elementType, i) + +val setFunc = if (isPrimitiveType) { + s"set${CodeGenerator.primitiveTypeName(elementType)}" +} else { + "update" +} + +val assignment = if (isPrimitiveType && dataType.asInstanceOf[ArrayType].containsNull) { + s""" + |if ($childName.isNullAt($i)) { + | $arrayData.setNullAt($j); + |} else { + | $arrayData.$setFunc($j, $getValue); + |} + """.stripMargin } else { - s"${ev.value}.update(k, ${CodeGenerator.getValue(childName, elementType, "l")});" + s"$arrayData.$setFunc($j, $getValue);" } s""" - |final int $length = $childName.numElements(); - |${ev.value} = $initialization; - |for(int k = 0; k < $numberOfIterations; k++) { - | int l = $length - k - 1; - | $swapAssigments + |final int $numElements = $childName.numElements(); + |$initialization + |for (int $i = 0; $i < $numElements; $i++) { + | int $j = $numElements - $i - 1; --- End diff -- We still need to calculate the index of the opposite side? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21875: [SPARK-24288][SQL] Add a JDBC Option to enable preventin...
Github user maryannxue commented on the issue: https://github.com/apache/spark/pull/21875 Programming guide updated. Thank you, @dilipbiswal and @HyukjinKwon! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21758 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1338/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21852 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21852 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93576/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21875: [SPARK-24288][SQL] Add a JDBC Option to enable preventin...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/21875 @maryannxue It looks good to me. As a minor comment, could we state the default value for this parameter as well ? For some of the other parameters, we specify the default value. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/21403 @maryannxue as I said my initial proposal was like that. I think that this has the advantage of avoiding some code duplication as the same logic which is added in ResolveInValues has to be spread over all the places where a In is build and avoiding to change the In signature, so that if a user is using In directly in his/her code we don't break it. On the other side, I agree with you that the approach having a `Seq[Expression]` is cleaner IMO (that's why it was my original proposal). @cloud-fan @gatorsmile what do you think about this? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21852 **[Test build #93578 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93578/testReport)** for PR 21852 at commit [`4acda6f`](https://github.com/apache/spark/commit/4acda6fbf4fb5b1be30a0ad213cd5369b64b02b5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21306 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1339/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21306 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21306 **[Test build #93581 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93581/testReport)** for PR 21306 at commit [`f95800c`](https://github.com/apache/spark/commit/f95800c737f160255122da6bbe336309a4e1532e). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21306 **[Test build #93581 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93581/testReport)** for PR 21306 at commit [`f95800c`](https://github.com/apache/spark/commit/f95800c737f160255122da6bbe336309a4e1532e). * This patch **fails Java style tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21306 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93581/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...
Github user ajacques commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r205329769 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/SelectedField.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.planning + +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.types._ + +/** + * A Scala extractor that builds a [[org.apache.spark.sql.types.StructField]] from a Catalyst + * complex type extractor. For example, consider a relation with the following schema: + * + * {{{ + * root + *|-- name: struct (nullable = true) + *||-- first: string (nullable = true) + *||-- last: string (nullable = true) + *}}} + * + * Further, suppose we take the select expression `name.first`. This will parse into an + * `Alias(child, "first")`. Ignoring the alias, `child` matches the following pattern: + * + * {{{ + * GetStructFieldObject( + * AttributeReference("name", StructType(_), _, _), + * StructField("first", StringType, _, _)) + * }}} + * + * [[SelectedField]] converts that expression into + * + * {{{ + * StructField("name", StructType(Array(StructField("first", StringType + * }}} + * + * by mapping each complex type extractor to a [[org.apache.spark.sql.types.StructField]] with the + * same name as its child (or "parent" going right to left in the select expression) and a data + * type appropriate to the complex type extractor. In our example, the name of the child expression + * is "name" and its data type is a [[org.apache.spark.sql.types.StructType]] with a single string + * field named "first". + * + * @param expr the top-level complex type extractor + */ +object SelectedField { + def unapply(expr: Expression): Option[StructField] = { --- End diff -- ``` Error:(61, 12) constructor cannot be instantiated to expected type; found : org.apache.spark.sql.catalyst.expressions.Alias required: org.apache.spark.sql.catalyst.expressions.ExtractValue case Alias(child, _) => child ``` Alias takes: `Alias(child: Expression, name: String)` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21320 @HyukjinKwon, I'm not totally familiar with Spark internals yet, so to be honest I don't feel confident making big changes and hopefully can keep them simple at first. I've gone through the code review comments and made as many changes as possible [here](https://github.com/apache/spark/compare/master...ajacques:spark-4502-parquet_column_pruning-foundation). If this PR is mostly feature complete and it's just small things, then I can push forward. If the feedback comments push past simple refactoring level right now I would prefer to let someone else take over, but feel free to use what I've done. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...
Github user ajacques commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r205329633 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/ProjectionOverSchema.scala --- @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.planning + +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.types._ + +/** + * A Scala extractor that projects an expression over a given schema. Data types, + * field indexes and field counts of complex type extractors and attributes + * are adjusted to fit the schema. All other expressions are left as-is. This + * class is motivated by columnar nested schema pruning. + */ +case class ProjectionOverSchema(schema: StructType) { --- End diff -- We can move this to `sql.execution` if we move all three classes: `ProjectionOverSchema`, `GetStructFieldObject`, and `SelectedField`. Is there a difference in the catalyst.planning vs the execution packages? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21758 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/21857#discussion_r205331401 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -52,7 +52,7 @@ trait CheckAnalysis extends PredicateHelper { } protected def mapColumnInSetOperation(plan: LogicalPlan): Option[Attribute] = plan match { -case _: Intersect | _: Except | _: Distinct => +case _: Intersect | _: ExceptBase | _: Distinct => --- End diff -- @gatorsmile @maropu OK --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21830: [SPARK-24878][SQL] Fix reverse function for array...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21830#discussion_r205336334 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1244,46 +1244,50 @@ case class Reverse(child: Expression) extends UnaryExpression with ImplicitCastI } private def arrayCodeGen(ctx: CodegenContext, ev: ExprCode, childName: String): String = { -val length = ctx.freshName("length") -val javaElementType = CodeGenerator.javaType(elementType) + val isPrimitiveType = CodeGenerator.isPrimitiveType(elementType) +val numElements = ctx.freshName("numElements") +val arrayData = ctx.freshName("arrayData") + val initialization = if (isPrimitiveType) { - s"$childName.copy()" + ctx.createUnsafeArray(arrayData, numElements, elementType, s" $prettyName failed.") } else { - s"new ${classOf[GenericArrayData].getName()}(new Object[$length])" -} - -val numberOfIterations = if (isPrimitiveType) s"$length / 2" else length - -val swapAssigments = if (isPrimitiveType) { - val setFunc = "set" + CodeGenerator.primitiveTypeName(elementType) - val getCall = (index: String) => CodeGenerator.getValue(ev.value, elementType, index) - s"""|boolean isNullAtK = ${ev.value}.isNullAt(k); - |boolean isNullAtL = ${ev.value}.isNullAt(l); - |if(!isNullAtK) { - | $javaElementType el = ${getCall("k")}; - | if(!isNullAtL) { - |${ev.value}.$setFunc(k, ${getCall("l")}); - | } else { - |${ev.value}.setNullAt(k); - | } - | ${ev.value}.$setFunc(l, el); - |} else if (!isNullAtL) { - | ${ev.value}.$setFunc(k, ${getCall("l")}); - | ${ev.value}.setNullAt(l); - |}""".stripMargin + val arrayDataClass = classOf[GenericArrayData].getName + s"$arrayDataClass $arrayData = new $arrayDataClass(new Object[$numElements]);" +} + +val i = ctx.freshName("i") +val j = ctx.freshName("j") + +val getValue = CodeGenerator.getValue(childName, elementType, i) + +val setFunc = if (isPrimitiveType) { + s"set${CodeGenerator.primitiveTypeName(elementType)}" +} else { + "update" +} + +val assignment = if (isPrimitiveType && dataType.asInstanceOf[ArrayType].containsNull) { + s""" + |if ($childName.isNullAt($i)) { + | $arrayData.setNullAt($j); + |} else { + | $arrayData.$setFunc($j, $getValue); + |} + """.stripMargin } else { - s"${ev.value}.update(k, ${CodeGenerator.getValue(childName, elementType, "l")});" + s"$arrayData.$setFunc($j, $getValue);" } s""" - |final int $length = $childName.numElements(); - |${ev.value} = $initialization; - |for(int k = 0; k < $numberOfIterations; k++) { - | int l = $length - k - 1; - | $swapAssigments + |final int $numElements = $childName.numElements(); + |$initialization + |for (int $i = 0; $i < $numElements; $i++) { + | int $j = $numElements - $i - 1; --- End diff -- we don't need `j` if we do ``` for (int i = numElements - 1; i >=0; i--) ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21830: [SPARK-24878][SQL] Fix reverse function for array...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21830#discussion_r205338930 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1244,46 +1244,50 @@ case class Reverse(child: Expression) extends UnaryExpression with ImplicitCastI } private def arrayCodeGen(ctx: CodegenContext, ev: ExprCode, childName: String): String = { -val length = ctx.freshName("length") -val javaElementType = CodeGenerator.javaType(elementType) + val isPrimitiveType = CodeGenerator.isPrimitiveType(elementType) +val numElements = ctx.freshName("numElements") +val arrayData = ctx.freshName("arrayData") + val initialization = if (isPrimitiveType) { - s"$childName.copy()" + ctx.createUnsafeArray(arrayData, numElements, elementType, s" $prettyName failed.") } else { - s"new ${classOf[GenericArrayData].getName()}(new Object[$length])" -} - -val numberOfIterations = if (isPrimitiveType) s"$length / 2" else length - -val swapAssigments = if (isPrimitiveType) { - val setFunc = "set" + CodeGenerator.primitiveTypeName(elementType) - val getCall = (index: String) => CodeGenerator.getValue(ev.value, elementType, index) - s"""|boolean isNullAtK = ${ev.value}.isNullAt(k); - |boolean isNullAtL = ${ev.value}.isNullAt(l); - |if(!isNullAtK) { - | $javaElementType el = ${getCall("k")}; - | if(!isNullAtL) { - |${ev.value}.$setFunc(k, ${getCall("l")}); - | } else { - |${ev.value}.setNullAt(k); - | } - | ${ev.value}.$setFunc(l, el); - |} else if (!isNullAtL) { - | ${ev.value}.$setFunc(k, ${getCall("l")}); - | ${ev.value}.setNullAt(l); - |}""".stripMargin + val arrayDataClass = classOf[GenericArrayData].getName + s"$arrayDataClass $arrayData = new $arrayDataClass(new Object[$numElements]);" +} + +val i = ctx.freshName("i") +val j = ctx.freshName("j") + +val getValue = CodeGenerator.getValue(childName, elementType, i) + +val setFunc = if (isPrimitiveType) { + s"set${CodeGenerator.primitiveTypeName(elementType)}" +} else { + "update" +} + +val assignment = if (isPrimitiveType && dataType.asInstanceOf[ArrayType].containsNull) { --- End diff -- We can't override `dataType` only for `ArrayType` because `Reverse` is also used for `StringType`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21875: [SPARK-24288][SQL] Add a JDBC Option to enable pr...
Github user TomaszGaweda commented on a diff in the pull request: https://github.com/apache/spark/pull/21875#discussion_r205268370 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala --- @@ -183,6 +183,9 @@ class JDBCOptions( } // An option to execute custom SQL before fetching data from the remote DB val sessionInitStatement = parameters.get(JDBC_SESSION_INIT_STATEMENT) + + // An option to allow/disallow pushing down predicate into JDBC data source + val pushDownPredicate = parameters.getOrElse(JDBC_PUSHDOWN_PREDICATE, "true").toBoolean --- End diff -- Yeah, consistency is a very good argument :) Indeed it will be plural or not, depending from which side we are looking at it --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Pyt...
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/21650#discussion_r205268767 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala --- @@ -94,36 +95,94 @@ object ExtractPythonUDFFromAggregate extends Rule[LogicalPlan] { */ object ExtractPythonUDFs extends Rule[SparkPlan] with PredicateHelper { - private def hasPythonUDF(e: Expression): Boolean = { + private case class LazyEvalType(var evalType: Int = -1) { + +def isSet: Boolean = evalType >= 0 + +def set(evalType: Int): Unit = { + if (isSet) { +throw new IllegalStateException("Eval type has already been set") + } else { +this.evalType = evalType + } +} + +def get(): Int = { + if (!isSet) { +throw new IllegalStateException("Eval type is not set") + } else { +evalType + } +} + } + + private def hasScalarPythonUDF(e: Expression): Boolean = { e.find(PythonUDF.isScalarPythonUDF).isDefined } - private def canEvaluateInPython(e: PythonUDF): Boolean = { -e.children match { - // single PythonUDF child could be chained and evaluated in Python - case Seq(u: PythonUDF) => canEvaluateInPython(u) - // Python UDF can't be evaluated directly in JVM - case children => !children.exists(hasPythonUDF) + /** + * Check whether a PythonUDF expression can be evaluated in Python. + * + * If the lazy eval type is not set, this method checks for either Batched Python UDF and Scalar + * Pandas UDF. If the lazy eval type is set, this method checks for the expression of the + * specified eval type. + * + * This method will also set the lazy eval type to be the type of the first evaluable expression, + * i.e., if lazy eval type is not set and we find a evaluable Python UDF expression, lazy eval + * type will be set to the eval type of the expression. + * + */ + private def canEvaluateInPython(e: PythonUDF, lazyEvalType: LazyEvalType): Boolean = { --- End diff -- Yes it's in the most recent commit. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21875: [SPARK-24288][SQL] Add a JDBC Option to enable pr...
Github user maryannxue commented on a diff in the pull request: https://github.com/apache/spark/pull/21875#discussion_r205268701 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala --- @@ -172,7 +172,11 @@ private[sql] case class JDBCRelation( // Check if JDBCRDD.compileFilter can accept input filters override def unhandledFilters(filters: Array[Filter]): Array[Filter] = { -filters.filter(JDBCRDD.compileFilter(_, JdbcDialects.get(jdbcOptions.url)).isEmpty) +if (jdbcOptions.pushDownPredicate) { --- End diff -- No. I share your opinion actually. It is confusing here... maybe we should change the parameter names at some point. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21875: [SPARK-24288][SQL] Add a JDBC Option to enable pr...
Github user TomaszGaweda commented on a diff in the pull request: https://github.com/apache/spark/pull/21875#discussion_r205269684 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala --- @@ -172,7 +172,11 @@ private[sql] case class JDBCRelation( // Check if JDBCRDD.compileFilter can accept input filters override def unhandledFilters(filters: Array[Filter]): Array[Filter] = { -filters.filter(JDBCRDD.compileFilter(_, JdbcDialects.get(jdbcOptions.url)).isEmpty) +if (jdbcOptions.pushDownPredicate) { --- End diff -- Indeed it's confusing. `buildScan` argument may be named `pushedFilters`, variables also, then code will be self-describing --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21875: [SPARK-24288][SQL] Add a JDBC Option to enable pr...
Github user TomaszGaweda commented on a diff in the pull request: https://github.com/apache/spark/pull/21875#discussion_r205269695 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala --- @@ -172,7 +172,11 @@ private[sql] case class JDBCRelation( // Check if JDBCRDD.compileFilter can accept input filters override def unhandledFilters(filters: Array[Filter]): Array[Filter] = { -filters.filter(JDBCRDD.compileFilter(_, JdbcDialects.get(jdbcOptions.url)).isEmpty) +if (jdbcOptions.pushDownPredicate) { --- End diff -- Indeed it's confusing. `buildScan` argument may be named `pushedFilters`, variables also, then code will be self-describing --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21876: [SPARK-24802][SQL][FOLLOW-UP] Add a new config fo...
GitHub user maryannxue opened a pull request: https://github.com/apache/spark/pull/21876 [SPARK-24802][SQL][FOLLOW-UP] Add a new config for Optimization Rule Exclusion ## What changes were proposed in this pull request? This is an extension to the original PR, in which rule exclusion did not work for classes derived from Optimizer, e.g., SparkOptimizer. To solve this issue, Optimizer and its derived classes will define/override `defaultBatches` and `nonExcludableRules` in order to define its default rule set as well as rules that cannot be excluded by the SQL config. In the meantime, Optimizer's `batches` method is dedicated to the rule exclusion logic and is defined "final". ## How was this patch tested? Added UT. You can merge this pull request into a Git repository by running: $ git pull https://github.com/maryannxue/spark rule-exclusion Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21876.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21876 commit eaec2f5f2b4e3193de41655b84a1dc936b0e50a3 Author: maryannxue Date: 2018-07-13T21:32:01Z [SPARK-24802] Optimization Rule Exclusion commit 84f1a6b5cba08df8684179e9d7195545be655e76 Author: maryannxue Date: 2018-07-18T06:13:50Z Address review comments commit ff23edf81a4a78d1589ed582a1802b94a8ebf4c6 Author: maryannxue Date: 2018-07-21T02:37:54Z Address review comments commit b154979236e211dc7185ca8e450493f0c6b0f469 Author: maryannxue Date: 2018-07-21T02:41:21Z change test name commit 87afe4fbcaf71d303b07612f9ceb9ad25dd3dcda Author: maryannxue Date: 2018-07-23T00:35:01Z address review comments commit 39b6ce9548c99363e81cb246b4cbe5534d710f3e Author: maryannxue Date: 2018-07-23T04:28:00Z address review comments commit a2161ef1f333f2cc039df0ecc8c96e5ec27e00ff Author: maryannxue Date: 2018-07-25T18:52:13Z Merge remote-tracking branch 'origin/master' into rule-exclusion commit 3730053d7386188042b2f2d4bd6784c3de722df6 Author: maryannxue Date: 2018-07-25T20:08:19Z Extend rule-exclusion to Optimizer sub-classes, esp. SparkOptimizer --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21876: [SPARK-24802][SQL][FOLLOW-UP] Add a new config for Optim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21876 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21876: [SPARK-24802][SQL][FOLLOW-UP] Add a new config for Optim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21876 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1329/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21876: [SPARK-24802][SQL][FOLLOW-UP] Add a new config for Optim...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21876 **[Test build #93571 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93571/testReport)** for PR 21876 at commit [`3730053`](https://github.com/apache/spark/commit/3730053d7386188042b2f2d4bd6784c3de722df6). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21451: [SPARK-24296][CORE][WIP] Replicate large blocks as a str...
Github user squito commented on the issue: https://github.com/apache/spark/pull/21451 fyi, I did finally run my scale tests again on a cluster, and shuffles, remote reads, and replication worked for blocks over 2gb (sorry got sidetracked with a few other things in the meantime) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21821: [SPARK-24867] [SQL] Add AnalysisBarrier to DataFrameWrit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21821 **[Test build #93555 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93555/testReport)** for PR 21821 at commit [`ddbd9f7`](https://github.com/apache/spark/commit/ddbd9f7c796e8bedfbae3141c9c7098370c217ce). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21821: [SPARK-24867] [SQL] Add AnalysisBarrier to DataFrameWrit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21821 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93555/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21821: [SPARK-24867] [SQL] Add AnalysisBarrier to DataFrameWrit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21821 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20414: [SPARK-23243][SQL] Shuffle+Repartition on an RDD could l...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20414 **[Test build #93558 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93558/testReport)** for PR 20414 at commit [`6910ed6`](https://github.com/apache/spark/commit/6910ed62c272bedfa251cab589bb52bed36be3ed). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20414: [SPARK-23243][SQL] Shuffle+Repartition on an RDD could l...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20414 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93558/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20414: [SPARK-23243][SQL] Shuffle+Repartition on an RDD could l...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20414 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > Few comments like #21320 (comment) or ^ are not minor or nits. I leave hard -1 if they are not addressed. I'm sorry to say I'm very close to hanging up this PR. I put a lot of care, time and effort into my work. After more than two years of off and on review, discussion/debate, nitpicking, commits, steps forward and backwards, to have someone swoop in at this time with a new raft of nitpicking and stylistic issues that set the review back again further is beyond maddening. And that's why you have not received much cooperation from me. Perhaps I should have stated that up front, but I've lost patience defending myself. Contributing to this PR is a tax on what is completely voluntary, unpaid time. I have no professional responsibility to this effort. Maybe it's better off done by someone who does. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20405: [SPARK-23229][SQL] Dataset.hint should use planWithBarri...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20405 **[Test build #93559 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93559/testReport)** for PR 20405 at commit [`47bb245`](https://github.com/apache/spark/commit/47bb245353202208f2c41634c3796c8e4d2be663). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20405: [SPARK-23229][SQL] Dataset.hint should use planWithBarri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20405 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93559/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20405: [SPARK-23229][SQL] Dataset.hint should use planWithBarri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20405 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Pyt...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/21650#discussion_r205275406 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala --- @@ -94,36 +95,94 @@ object ExtractPythonUDFFromAggregate extends Rule[LogicalPlan] { */ object ExtractPythonUDFs extends Rule[SparkPlan] with PredicateHelper { - private def hasPythonUDF(e: Expression): Boolean = { + private case class LazyEvalType(var evalType: Int = -1) { + +def isSet: Boolean = evalType >= 0 + +def set(evalType: Int): Unit = { + if (isSet) { +throw new IllegalStateException("Eval type has already been set") + } else { +this.evalType = evalType + } +} + +def get(): Int = { + if (!isSet) { +throw new IllegalStateException("Eval type is not set") + } else { +evalType + } +} + } + + private def hasScalarPythonUDF(e: Expression): Boolean = { e.find(PythonUDF.isScalarPythonUDF).isDefined } - private def canEvaluateInPython(e: PythonUDF): Boolean = { -e.children match { - // single PythonUDF child could be chained and evaluated in Python - case Seq(u: PythonUDF) => canEvaluateInPython(u) - // Python UDF can't be evaluated directly in JVM - case children => !children.exists(hasPythonUDF) + /** + * Check whether a PythonUDF expression can be evaluated in Python. + * + * If the lazy eval type is not set, this method checks for either Batched Python UDF and Scalar + * Pandas UDF. If the lazy eval type is set, this method checks for the expression of the + * specified eval type. + * + * This method will also set the lazy eval type to be the type of the first evaluable expression, + * i.e., if lazy eval type is not set and we find a evaluable Python UDF expression, lazy eval + * type will be set to the eval type of the expression. + * + */ + private def canEvaluateInPython(e: PythonUDF, lazyEvalType: LazyEvalType): Boolean = { --- End diff -- Ok, I think I see the problem. Since there was a map over `plan.expressions`, a new `FirstEvalType` object was being created for each expression. Changing this to the following corrected the failure: ``` val setEvalType = new FirstEvalType val udfs = plan.expressions.flatMap(collectEvaluableUDFs(_, setEvalType)) ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21868: [SPARK-24906][SQL] Adaptively enlarge split / par...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21868#discussion_r205278202 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -381,6 +381,26 @@ object SQLConf { .booleanConf .createWithDefault(true) + val IS_PARQUET_PARTITION_ADAPTIVE_ENABLED = buildConf("spark.sql.parquet.adaptiveFileSplit") +.doc("For columnar file format (e.g., Parquet), it's possible that only few (not all) " + + "columns are needed. So, it's better to make sure that the total size of the selected " + + "columns is about 128 MB " +) +.booleanConf +.createWithDefault(false) + + val PARQUET_STRUCT_LENGTH = buildConf("spark.sql.parquet.struct.length") +.intConf +.createWithDefault(StructType.defaultConcreteType.defaultSize) + + val PARQUET_MAP_LENGTH = buildConf("spark.sql.parquet.map.length") +.intConf +.createWithDefault(MapType.defaultConcreteType.defaultSize) + + val PARQUET_ARRAY_LENGTH = buildConf("spark.sql.parquet.array.length") +.intConf +.createWithDefault(ArrayType.defaultConcreteType.defaultSize) --- End diff -- `ArrayType.defaultConcreteType` is `ArrayType(NullType, containsNull = true)`. I think using this you won't get a reasonable number. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21877: [SPARK-24923][SQL][WIP] Add unpartitioned CTAS and RTAS ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21877 **[Test build #93572 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93572/testReport)** for PR 21877 at commit [`d308d3c`](https://github.com/apache/spark/commit/d308d3c75f78242c822eab6d11fb651d94f10aa6). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class CreateTableAsSelect(` * `case class ReplaceTableAsSelect(` * `case class TableV2Relation(` * `case class AppendDataExec(` * `case class CreateTableAsSelectExec(` * `case class ReplaceTableAsSelectExec(` * `case class WriteToDataSourceV2Exec(` * `abstract case class V2TableWriteExec(` * ` implicit class CatalogHelper(catalog: CatalogProvider) ` * ` implicit class TableHelper(table: Table) ` * ` implicit class SourceHelper(source: DataSourceV2) ` * ` implicit class OptionsHelper(options: Map[String, String]) ` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21877: [SPARK-24923][SQL][WIP] Add unpartitioned CTAS and RTAS ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21877 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93572/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21877: [SPARK-24923][SQL][WIP] Add unpartitioned CTAS and RTAS ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21877 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21867: [SPARK-24307][CORE] Add conf to revert to old code.
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21867 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21867: [SPARK-24307][CORE] Add conf to revert to old code.
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21867 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21852 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21852 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1334/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21878 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21878 **[Test build #93577 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93577/testReport)** for PR 21878 at commit [`d2759cc`](https://github.com/apache/spark/commit/d2759cce48eb9a85145e90d8a126fb83351d0fda). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21878 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1335/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21221 **[Test build #93570 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93570/testReport)** for PR 21221 at commit [`8905d23`](https://github.com/apache/spark/commit/8905d231c3a959f70266223d3546b17a655cee39). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21320 > After more than two years of off and on review, discussion/debate, nitpicking, commits, steps forward and backwards, to have someone swoop in at this time with a new raft of nitpicking and stylistic issues that set the review back again further is beyond maddening. I think that's primarily because the change looks incomplete but the feature itself sounds good to have. I think that's why people try to take a look a lot. Stepping forward and backwards is bad. That's why I am sticking with this PR to get this change in and help you address other people's comments and prevent such forward and backward. Stylistic issues are virtually based upon https://github.com/databricks/scala-style-guide . Nitpicking from me is basically from referring other codes or PRs in Spark, or other committer's preference so that we can get through this. I guess nits are still good to fix if you happen to push more changes. I guess it would take few seconds to address them. If that's not, please ignore my nit or minor comments. They don't block the PR usually. For clarification, few comments mentioned in https://github.com/apache/spark/pull/21320#issuecomment-407714036 are pretty reject comments in general in other PRs too. > Contributing to this PR is a tax on what is completely voluntary, unpaid time. FWIW, all my works have been unpaid and completely voluntary to me more than 3 years in the past except the recent half 6 months (which basically means until I became a committer). To be honest, I believe I still work on Spark like when I worked individually before. > I have no professional responsibility to this effort. Maybe it's better off done by someone who does. I completely agree. There should be no professional responsibility like a task to do in an open source in general. I think no one has that professional responsibility to take this and here we should be transparent on this. If anyone interested in this finds that you want someone else to take over, this might be taken over _voluntarily_ with a comment saying I want to take over this. I might cc some people who might be interested in this in order to inform them but it doesn't mean I hand it off to someone else. I am sorry if you felt I am pushing or rushing you - was trying to get this change in since people find it's a good feature to have. That's why I prioritized this and stick to this PR. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21103: [SPARK-23915][SQL] Add array_except function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21103#discussion_r205310335 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -3805,3 +3799,330 @@ object ArrayUnion { new GenericArrayData(arrayBuffer) } } + +/** + * Returns an array of the elements in the intersect of x and y, without duplicates + */ +@ExpressionDescription( + usage = """ + _FUNC_(array1, array2) - Returns an array of the elements in array1 but not in array2, +without duplicates. + """, + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5)); + array(2) + """, + since = "2.4.0") +case class ArrayExcept(left: Expression, right: Expression) extends ArraySetLike { + override def dataType: DataType = left.dataType + + var hsInt: OpenHashSet[Int] = _ + var hsLong: OpenHashSet[Long] = _ + + def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = { +val elem = array.getInt(idx) +if (!hsInt.contains(elem)) { + if (resultArray != null) { +resultArray.setInt(pos, elem) + } + hsInt.add(elem) + true +} else { + false +} + } + + def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = { +val elem = array.getLong(idx) +if (!hsLong.contains(elem)) { + if (resultArray != null) { +resultArray.setLong(pos, elem) + } + hsLong.add(elem) + true +} else { + false +} + } + + def evalIntLongPrimitiveType( + array1: ArrayData, + array2: ArrayData, + resultArray: ArrayData, + isLongType: Boolean): Int = { +// store elements into resultArray +var notFoundNullElement = true +var i = 0 +while (i < array2.numElements()) { + if (array2.isNullAt(i)) { +notFoundNullElement = false + } else { +val assigned = if (!isLongType) { + hsInt.add(array2.getInt(i)) +} else { + hsLong.add(array2.getLong(i)) +} + } + i += 1 +} +var pos = 0 +i = 0 +while (i < array1.numElements()) { + if (array1.isNullAt(i)) { +if (notFoundNullElement) { + if (resultArray != null) { +resultArray.setNullAt(pos) + } + pos += 1 + notFoundNullElement = false +} + } else { +val assigned = if (!isLongType) { + assignInt(array1, i, resultArray, pos) +} else { + assignLong(array1, i, resultArray, pos) +} +if (assigned) { + pos += 1 +} + } + i += 1 +} +pos + } + + override def nullSafeEval(input1: Any, input2: Any): Any = { +val array1 = input1.asInstanceOf[ArrayData] +val array2 = input2.asInstanceOf[ArrayData] + +if (elementTypeSupportEquals) { + elementType match { +case IntegerType => + // avoid boxing of primitive int array elements + // calculate result array size + hsInt = new OpenHashSet[Int] + val elements = evalIntLongPrimitiveType(array1, array2, null, false) + // allocate result array + hsInt = new OpenHashSet[Int] + val resultArray = if (UnsafeArrayData.shouldUseGenericArrayData( +IntegerType.defaultSize, elements)) { +new GenericArrayData(new Array[Any](elements)) + } else { +UnsafeArrayData.forPrimitiveArray( + Platform.INT_ARRAY_OFFSET, elements, IntegerType.defaultSize) + } + // assign elements into the result array + evalIntLongPrimitiveType(array1, array2, resultArray, false) + resultArray +case LongType => + // avoid boxing of primitive long array elements + // calculate result array size + hsLong = new OpenHashSet[Long] + val elements = evalIntLongPrimitiveType(array1, array2, null, true) + // allocate result array + hsLong = new OpenHashSet[Long] + val resultArray = if (UnsafeArrayData.shouldUseGenericArrayData( +LongType.defaultSize, elements)) { +new GenericArrayData(new Array[Any](elements)) + } else { +UnsafeArrayData.forPrimitiveArray( + Platform.LONG_ARRAY_OFFSET, elements, LongType.defaultSize) + } +
[GitHub] spark issue #21875: [SPARK-24288][SQL] Add a JDBC Option to enable preventin...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21875 which is here https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md#jdbc-to-other-databases --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21221 Build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21867: [SPARK-24307][CORE] Add conf to revert to old code.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21867 **[Test build #93566 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93566/testReport)** for PR 21867 at commit [`a5b00b8`](https://github.com/apache/spark/commit/a5b00b8a05538a6adb3a4525c2fecc1e15575f7c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21868: [SPARK-24906][SQL] Adaptively enlarge split / par...
Github user habren commented on a diff in the pull request: https://github.com/apache/spark/pull/21868#discussion_r205287123 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -381,6 +381,26 @@ object SQLConf { .booleanConf .createWithDefault(true) + val IS_PARQUET_PARTITION_ADAPTIVE_ENABLED = buildConf("spark.sql.parquet.adaptiveFileSplit") +.doc("For columnar file format (e.g., Parquet), it's possible that only few (not all) " + + "columns are needed. So, it's better to make sure that the total size of the selected " + + "columns is about 128 MB " +) +.booleanConf +.createWithDefault(false) + + val PARQUET_STRUCT_LENGTH = buildConf("spark.sql.parquet.struct.length") +.intConf +.createWithDefault(StructType.defaultConcreteType.defaultSize) + + val PARQUET_MAP_LENGTH = buildConf("spark.sql.parquet.map.length") +.intConf +.createWithDefault(MapType.defaultConcreteType.defaultSize) + + val PARQUET_ARRAY_LENGTH = buildConf("spark.sql.parquet.array.length") +.intConf +.createWithDefault(ArrayType.defaultConcreteType.defaultSize) --- End diff -- Thanks for your comments. I set the default value to StringType.defaultSize (8). It's default size, use should configure it according to the real data --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21877: [SPARK-24923][SQL][WIP] Add unpartitioned CTAS an...
GitHub user rdblue opened a pull request: https://github.com/apache/spark/pull/21877 [SPARK-24923][SQL][WIP] Add unpartitioned CTAS and RTAS support for DataSourceV2 ## What changes were proposed in this pull request? * Remove extends from `ReadSupport` and `WriteSupport` classes for use with `Table` * Add CTAS and RTAS logical plans * Refactor physical write plans so AppendData, CTAS, and RTAS use the same base class * Add support for `TableCatalog` to `DataFrameReader` and `DataFrameWriter` * Add `TableV2Relation` for tables that are loaded by `TableCatalog` and have no `DataSource` instance * Move implicit helpers into `DataSourceV2Implicits` to avoid future churn Note that this doesn't handle `partitionBy` in `DataFrameWriter`. Adding support for partitioned tables will require validation rules. This is based on unmerged work and includes the commits from #21306 and #21305. ## How was this patch tested? Adding unit tests for CTAS and RTAS. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rdblue/spark add-ctas-rtas-v2-plans Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21877.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21877 commit 8372f5bb47a0d6269bb16b3dc16f6f3278d2f5fd Author: Ryan Blue Date: 2018-05-05T01:13:01Z SPARK-24252: Add v2 data source mix-in for catalog support. commit 1238af73872b0105d0c5dfbbd8da5c8f18afe408 Author: Ryan Blue Date: 2018-05-07T15:54:37Z SPARK-24251: Add AppendData logical plan. This adds a new logical plan, AppendData, that was proposed in SPARK-23521. This also adds an analyzer rule to validate data written with AppendData against the target table. DataFrameWriter is also updated so that v2 writes use the new AppendData logical plan. commit d308d3c75f78242c822eab6d11fb651d94f10aa6 Author: Ryan Blue Date: 2018-07-25T18:11:45Z Add CTAS and RTAS support. This uses the catalog API introduced in SPARK-24252 to implement CTAS and RTAS plans. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/21857#discussion_r205289050 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1275,6 +1276,64 @@ object ReplaceExceptWithAntiJoin extends Rule[LogicalPlan] { } } +/** + * Replaces logical [[ExceptAll]] operator using a combination of Union, Aggregate + * and Generate operator. + * + * Input Query : + * {{{ + *SELECT c1 FROM ut1 EXCEPT ALL SELECT c1 FROM ut2 + * }}} + * + * Rewritten Query: + * {{{ + * SELECT c1 + * FROM ( + * SELECT replicate_rows(sum_val, c1) AS (sum_val, c1) + * FROM ( + * SELECT c1, cnt, sum_val --- End diff -- @ueshin Thanks.. will change. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21878 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21878 **[Test build #93575 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93575/testReport)** for PR 21878 at commit [`d95ba40`](https://github.com/apache/spark/commit/d95ba4081ac1188515b7e6363640700d56f2c93f). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21596 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93569/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21821: [SPARK-24867] [SQL] Add AnalysisBarrier to DataFrameWrit...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21821 Thanks! Merged to master/2.3 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21852#discussion_r205305691 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyConditionalSuite.scala --- @@ -122,4 +126,25 @@ class SimplifyConditionalSuite extends PlanTest with PredicateHelper { None), CaseWhen(normalBranch :: trueBranch :: Nil, None)) } + + test("remove entire CaseWhen if all the outputs are semantic equivalence") { --- End diff -- Yes, I plan to add couple more tests tonight. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21852 **[Test build #93578 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93578/testReport)** for PR 21852 at commit [`4acda6f`](https://github.com/apache/spark/commit/4acda6fbf4fb5b1be30a0ad213cd5369b64b02b5). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21221 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93570/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21221 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21852#discussion_r205309619 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -416,6 +416,21 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper { // these branches can be pruned away val (h, t) = branches.span(_._1 != TrueLiteral) CaseWhen( h :+ t.head, None) + + case e @ CaseWhen(branches, Some(elseValue)) if { +val list = branches.map(_._2) :+ elseValue +list.tail.forall(list.head.semanticEquals) + } => +// For non-deterministic conditions with side effect, we can not remove it. +// Since the output of all the branches are semantic equivalence, `elseValue` +// is picked for all the branches. +val newBranches = branches.map(_._1).filter(!_.deterministic).map(cond => (cond, elseValue)) --- End diff -- All conds must be deterministic, otherwise a non deterministic one not run before can be run after this rule. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21867: [SPARK-24307][CORE] Add conf to revert to old cod...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/21867#discussion_r205312971 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -731,7 +731,14 @@ private[spark] class BlockManager( } if (data != null) { -return Some(ChunkedByteBuffer.fromManagedBuffer(data, chunkSize)) +// SPARK-24307 undocumented "escape-hatch" in case there are any issues in converting to +// to ChunkedByteBuffer, to go back to old code-path. Can be removed post Spark 2.4 if +// new path is stable. +if (conf.getBoolean("spark.fetchToNioBuffer", false)) { --- End diff -- Maybe we'd better to rename that one "spark.maxRemoteBlockSizeFetchToMem" also ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21878 **[Test build #93575 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93575/testReport)** for PR 21878 at commit [`d95ba40`](https://github.com/apache/spark/commit/d95ba4081ac1188515b7e6363640700d56f2c93f). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org