[GitHub] [spark] AmplabJenkins commented on pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
AmplabJenkins commented on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-661673834
[GitHub] [spark] SparkQA removed a comment on pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
SparkQA removed a comment on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-661604422 **[Test build #126215 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126215/testReport)** for PR 29104 at commit [`0db75b3`](https://github.com/apache/spark/commit/0db75b33bbff9de6791b1c3b5107a747c732bca9).
[GitHub] [spark] AmplabJenkins commented on pull request #28911: [SPARK-32077][CORE] Support host-local shuffle data reading when external shuffle service is disabled
AmplabJenkins commented on pull request #28911: URL: https://github.com/apache/spark/pull/28911#issuecomment-661673216
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28911: [SPARK-32077][CORE] Support host-local shuffle data reading when external shuffle service is disabled
AmplabJenkins removed a comment on pull request #28911: URL: https://github.com/apache/spark/pull/28911#issuecomment-661673216
[GitHub] [spark] viirya commented on pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable
viirya commented on pull request #29079: URL: https://github.com/apache/spark/pull/29079#issuecomment-661673387 It's too late today. I will take another look tomorrow if this is not merged yet.
[GitHub] [spark] AmplabJenkins commented on pull request #26953: [SPARK-30306][CORE][PYTHON] Instrument Python UDF execution time and throughput metrics using Spark Metrics system
AmplabJenkins commented on pull request #26953: URL: https://github.com/apache/spark/pull/26953#issuecomment-661673221
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26953: [SPARK-30306][CORE][PYTHON] Instrument Python UDF execution time and throughput metrics using Spark Metrics system
AmplabJenkins removed a comment on pull request #26953: URL: https://github.com/apache/spark/pull/26953#issuecomment-661673221
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
AmplabJenkins removed a comment on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-661664316 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126214/ Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
SparkQA commented on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-661673065 **[Test build #126215 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126215/testReport)** for PR 29104 at commit [`0db75b3`](https://github.com/apache/spark/commit/0db75b33bbff9de6791b1c3b5107a747c732bca9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] leanken commented on pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
leanken commented on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-661672887

> In ExtractEquiJoinKeys (in patterns.scala) there is code like this:
>
> ```
> case EqualTo(l, r) if canEvaluate(l, left) && canEvaluate(r, right) => Some((l, r))
> case EqualTo(l, r) if canEvaluate(l, right) && canEvaluate(r, left) => Some((r, l))
> ```
>
> So I am wondering if it is possible that LeftAnti join can actually have the left side as the build and the right side as streaming ? Is it possible we won't optimize that case ? Do you think it makes sense to ensure that BNLJ is not present for any not-in query ? (ie this optimization should kick in always).
>
> I am still a bit surprised that you didn't have to modify any .sql.out files because the plan would have changed from BNLJ to BHJ.

I think when both canBroadcastBySize(left) and canBroadcastBySize(right) are false, the ExtractSingleColumnNullAwareAntiJoin pattern will not match, so it will still fall back to the original BNLJ, FYI.
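For readers following along, here is a minimal Scala sketch (assuming a running SparkSession named `spark` and two registered single-key tables `t1` and `t2`; all of these names are illustrative, not part of the PR) of the query shape this discussion is about and how to check which join the planner picked:

```
// A single-column NOT IN subquery is rewritten to a LeftAnti join whose condition is
// Or(EqualTo(t1.key, t2.key), IsNull(EqualTo(t1.key, t2.key))).
val df = spark.sql("SELECT * FROM t1 WHERE key NOT IN (SELECT key FROM t2)")

// If t2 is small enough to broadcast, the optimization discussed in this PR can turn the
// plan into a broadcast hash join; if neither side can be broadcast, the pattern does not
// match and the plan stays a BroadcastNestedLoopJoin, as noted above.
df.explain()
```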
[GitHub] [spark] SparkQA commented on pull request #26953: [SPARK-30306][CORE][PYTHON] Instrument Python UDF execution time and throughput metrics using Spark Metrics system
SparkQA commented on pull request #26953: URL: https://github.com/apache/spark/pull/26953#issuecomment-661672747 **[Test build #126228 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126228/testReport)** for PR 26953 at commit [`277245c`](https://github.com/apache/spark/commit/277245c15c9f635c0747528f32a10c6aa857707e).
[GitHub] [spark] SparkQA commented on pull request #28911: [SPARK-32077][CORE] Support host-local shuffle data reading when external shuffle service is disabled
SparkQA commented on pull request #28911: URL: https://github.com/apache/spark/pull/28911#issuecomment-661672706 **[Test build #126227 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126227/testReport)** for PR 28911 at commit [`bcb6012`](https://github.com/apache/spark/commit/bcb60123ef68feacf4d571de47cc99068b8496b8).
[GitHub] [spark] Ngone51 commented on a change in pull request #28911: [SPARK-32077][CORE] Support host-local shuffle data reading when external shuffle service is disabled
Ngone51 commented on a change in pull request #28911: URL: https://github.com/apache/spark/pull/28911#discussion_r457876717

## File path: core/src/main/scala/org/apache/spark/network/netty/NettyBlockTransferService.scala

## @@ -194,6 +196,45 @@ private[spark] class NettyBlockTransferService(
     result.future
   }

+  override def getHostLocalDirs(
+      host: String,
+      port: Int,
+      execIds: Array[String],
+      hostLocalDirsCompletable: CompletableFuture[util.Map[String, Array[String]]]): Unit = {
+    val getLocalDirsMessage = new GetLocalDirsForExecutors(appId, execIds)
+    try {
+      val client = clientFactory.createClient(host, port)
+      client.sendRpc(getLocalDirsMessage.toByteBuffer, new RpcResponseCallback() {
+        override def onSuccess(response: ByteBuffer): Unit = {
+          try {
+            val msgObj = BlockTransferMessage.Decoder.fromByteBuffer(response)
+            hostLocalDirsCompletable.complete(
+              msgObj.asInstanceOf[LocalDirsForExecutors].getLocalDirsByExec)
+          } catch {
+            case t: Throwable =>
+              logWarning(s"Error trying to get the host local dirs for executor ${execIds.head}",
+                t.getCause)
+              hostLocalDirsCompletable.completeExceptionally(t)
+          } finally {
+            client.close()
+          }
+        }
+
+        override def onFailure(t: Throwable): Unit = {
+          logWarning(s"Error trying to get the host local dirs for executor ${execIds.head}",
+            t.getCause)
+          hostLocalDirsCompletable.completeExceptionally(t)
+          client.close()
+        }
+      })
+    } catch {
+      case e: IOException =>

Review comment: Yes, good idea. I've done the refactor. Please take a look.
[GitHub] [spark] viirya commented on a change in pull request #29166: [SPARK-32372][SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict plan
viirya commented on a change in pull request #29166: URL: https://github.com/apache/spark/pull/29166#discussion_r457874203

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

## @@ -1237,20 +1249,44 @@ class Analyzer(
       if (conflictPlans.isEmpty) {
         right
       } else {
-        val attributeRewrites = AttributeMap(conflictPlans.flatMap {
-          case (oldRelation, newRelation) => oldRelation.output.zip(newRelation.output)})
-        val conflictPlanMap = conflictPlans.toMap
-        // transformDown so that we can replace all the old Relations in one turn due to
-        // the reason that `conflictPlans` are also collected in pre-order.
-        right transformDown {
-          case r => conflictPlanMap.getOrElse(r, r)
-        } transformUp {
-          case other => other transformExpressions {
+        rewritePlan(right, conflictPlans.toMap)._1
+      }
+    }
+
+    private def rewritePlan(plan: LogicalPlan, conflictPlanMap: Map[LogicalPlan, LogicalPlan])
+      : (LogicalPlan, mutable.ArrayBuffer[(Attribute, Attribute)]) = {
+      val attrMapping = new mutable.ArrayBuffer[(Attribute, Attribute)]()
+      if (conflictPlanMap.contains(plan)) {
+        // If the plan is the one that conflict the with left one, we'd
+        // just replace it with the new plan and collect the rewrite
+        // attributes for the parent node.
+        val newRelation = conflictPlanMap(plan)
+        attrMapping ++= plan.output.zip(newRelation.output)
+        newRelation -> attrMapping
+      } else {
+        var newPlan = plan.mapChildren { child =>
+          // If not, we'd rewrite child plan recursively until we find the
+          // conflict node or reach the leaf node.
+          val (newChild, childAttrMapping) = rewritePlan(child, conflictPlanMap)
+          attrMapping ++= childAttrMapping
+          newChild
+        }
+
+        if (attrMapping.isEmpty) {
+          newPlan -> attrMapping
+        } else {
+          assert(!attrMapping.groupBy(_._1.exprId)
+            .exists(_._2.map(_._2.exprId).distinct.length > 1),
+            "Found duplicate rewrite attributes")
+          val attributeRewrites = AttributeMap(attrMapping)
+          // rewrite the attributes of parent node
+          newPlan = newPlan.transformExpressions {

Review comment: Oh, I see. This looks clearer. +1 for this change.

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

## @@ -1192,11 +1192,23 @@ class Analyzer(
           if findAliases(projectList).intersect(conflictingAttributes).nonEmpty =>
           Seq((oldVersion, oldVersion.copy(projectList = newAliases(projectList))))
+
+        // We don't need to search child plan recursively if the projectList of a Project
+        // is only composed of Alias and doesn't contain any conflicting attributes.
+        // Because, even if the child plan has some conflicting attributes, the attributes
+        // will be aliased to non-conflicting attributes by the Project at the end.
+        case _ @ Project(projectList, _)
+          if findAliases(projectList).size == projectList.size =>
+          Nil

Review comment: Don't we need to put this before previous `Project` pattern?

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

## @@ -1192,11 +1192,23 @@ class Analyzer(
           if findAliases(projectList).intersect(conflictingAttributes).nonEmpty =>
           Seq((oldVersion, oldVersion.copy(projectList = newAliases(projectList))))
+
+        // We don't need to search child plan recursively if the projectList of a Project
+        // is only composed of Alias and doesn't contain any conflicting attributes.
+        // Because, even if the child plan has some conflicting attributes, the attributes
+        // will be aliased to non-conflicting attributes by the Project at the end.
+        case _ @ Project(projectList, _)
+          if findAliases(projectList).size == projectList.size =>
+          Nil
+
         case oldVersion @ Aggregate(_, aggregateExpressions, _)
           if findAliases(aggregateExpressions).intersect(conflictingAttributes).nonEmpty =>
           Seq((oldVersion, oldVersion.copy(
             aggregateExpressions = newAliases(aggregateExpressions))))
+        case _ @ Aggregate(_, aggregateExpressions, _)

Review comment: Same reason as above? Add a simple comment too?
[GitHub] [spark] Ngone51 commented on a change in pull request #28911: [SPARK-32077][CORE] Support host-local shuffle data reading when external shuffle service is disabled
Ngone51 commented on a change in pull request #28911: URL: https://github.com/apache/spark/pull/28911#discussion_r457876162

## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/BlockStoreClient.java

## @@ -61,4 +63,17 @@ public MetricSet shuffleMetrics() {
     // Return an empty MetricSet by default.
     return () -> Collections.emptyMap();
   }
+
+  /**
+   * Request the local disk directories, which are specified by DiskBlockManager, for the executors
+   * from the external shuffle service (when this is a ExternalBlockStoreClient) or BlockManager
+   * (when this is a NettyBlockTransferService). Note there's only one executor when this is a
+   * NettyBlockTransferService because we ask one specific executor at a time.

Review comment: I added the check to ensure there's only one executor id, but didn't check its equality with the BlockManager's executor id, because we only have `BlockDataManager` in `NettyBlockRpcServer`, which does not expose the executor id. I am still wondering whether it's worthwhile to expose it for sanity-check purposes.
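As a side note, here is a minimal standalone Scala sketch of the arity check described above (illustrative only; the method name is made up and this is not the actual NettyBlockRpcServer code), given that the server side cannot compare against the block manager's own executor id:

```
// Hedged sketch: validate that exactly one executor id was requested for host-local dirs.
def checkSingleExecutorId(execIds: Array[String]): Unit = {
  require(execIds.length == 1,
    s"Expected exactly one executor id for host-local dirs, got: ${execIds.mkString(", ")}")
}
```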
[GitHub] [spark] leanken commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
leanken commented on a change in pull request #29104: URL: https://github.com/apache/spark/pull/29104#discussion_r457870914

## File path: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java

## @@ -171,6 +171,23 @@
   private volatile MapIterator destructiveIterator = null;
   private LinkedList spillWriters = new LinkedList<>();

+  private boolean anyNullKeyExists = false;
+
+  public boolean inputEmpty()
+  {
+    return ((numKeys == 0) && !anyNullKeyExists);
+  }
+
+  public boolean isAnyNullKeyExists()
+  {
+    return anyNullKeyExists;
+  }
+
+  public void setAnyNullKeyExists(boolean anyNullKeyExists)
+  {
+    this.anyNullKeyExists = anyNullKeyExists;

Review comment: Yes, no extra scan is needed. I set anyNullKeyExists while going through the input iterator; if the input is empty or there are no null-key rows, it stays at the default value false.

```
while (input.hasNext) {
  val row = input.next().asInstanceOf[UnsafeRow]
  numFields = row.numFields()
  val key = keyGenerator(row)
  if (!key.anyNull) {
    val loc = binaryMap.lookup(key.getBaseObject, key.getBaseOffset, key.getSizeInBytes)
    val success = loc.append(
      key.getBaseObject, key.getBaseOffset, key.getSizeInBytes,
      row.getBaseObject, row.getBaseOffset, row.getSizeInBytes)
    if (!success) {
      binaryMap.free()
      // scalastyle:off throwerror
      throw new SparkOutOfMemoryError("There is not enough memory to build hash map")
      // scalastyle:on throwerror
    }
  } else {
    binaryMap.setAnyNullKeyExists(true) // HERE
  }
}
```
[GitHub] [spark] leanken commented on pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
leanken commented on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-661669277

> In ExtractEquiJoinKeys (in patterns.scala) there is code like this:
>
> ```
> case EqualTo(l, r) if canEvaluate(l, left) && canEvaluate(r, right) => Some((l, r))
> case EqualTo(l, r) if canEvaluate(l, right) && canEvaluate(r, left) => Some((r, l))
> ```
>
> So I am wondering if it is possible that LeftAnti join can actually have the left side as the build and the right side as streaming ? Is it possible we won't optimize that case ? Do you think it makes sense to ensure that BNLJ is not present for any not-in query ? (ie this optimization should kick in always).
>
> I am still a bit surprised that you didn't have to modify any .sql.out files because the plan would have changed from BNLJ to BHJ.

The .sql.out files only verify the output schema and the output answer; as long as the change produces the right answer, the verification will pass.
[GitHub] [spark] leanken commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
leanken commented on a change in pull request #29104: URL: https://github.com/apache/spark/pull/29104#discussion_r457873194

## File path: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java

## @@ -171,6 +171,23 @@
   private volatile MapIterator destructiveIterator = null;
   private LinkedList spillWriters = new LinkedList<>();

+  private boolean anyNullKeyExists = false;
+
+  public boolean inputEmpty()
+  {
+    return ((numKeys == 0) && !anyNullKeyExists);

Review comment:

> I am still a bit surprised that you didn't have to modify any .sql.out files because the plan would have changed from BNLJ to BHJ.

The .sql.out files only verify the output schema and the output answer; as long as the change produces the right answer, the verification will pass.
[GitHub] [spark] leanken commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
leanken commented on a change in pull request #29104: URL: https://github.com/apache/spark/pull/29104#discussion_r457871466

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala

## @@ -71,6 +71,16 @@ private[execution] sealed trait HashedRelation extends KnownSizeEstimation {
    */
   def keyIsUnique: Boolean

+  /**
+   * is input: Iterator[InternalRow] empty
+   */

Review comment: Will do.

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala

## @@ -71,6 +71,16 @@ private[execution] sealed trait HashedRelation extends KnownSizeEstimation {
    */
   def keyIsUnique: Boolean

+  /**
+   * is input: Iterator[InternalRow] empty
+   */
+  def inputEmpty: Boolean
+
+  /**
+   * anyNull key exists in input
+   */

Review comment: Will do.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28968: [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode
AmplabJenkins removed a comment on pull request #28968: URL: https://github.com/apache/spark/pull/28968#issuecomment-661666519 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126222/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #28968: [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode
SparkQA removed a comment on pull request #28968: URL: https://github.com/apache/spark/pull/28968#issuecomment-661658276 **[Test build #126222 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126222/testReport)** for PR 28968 at commit [`a78fd43`](https://github.com/apache/spark/commit/a78fd4314ba39d1feb63ba1539ac9a2acf40de77).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [SPARK-32363][PYTHON][BUILD] Fix flakiness in pip package testing in Jenkins
AmplabJenkins removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-661665966 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126216/ Test FAILed.
[GitHub] [spark] sekingme commented on pull request #29173: [SPARK-32378][YARN] Fix permission problem while prepareLocalResources
sekingme commented on pull request #29173: URL: https://github.com/apache/spark/pull/29173#issuecomment-661666591

> @sekingme, please file a JIRA and format the PR title correctly. See also http://spark.apache.org/contributing.html.

@HyukjinKwon done~
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28968: [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode
AmplabJenkins removed a comment on pull request #28968: URL: https://github.com/apache/spark/pull/28968#issuecomment-661666508 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #28968: [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode
AmplabJenkins commented on pull request #28968: URL: https://github.com/apache/spark/pull/28968#issuecomment-661666508
[GitHub] [spark] sekingme removed a comment on pull request #29173: [SPARK-32378][YARN] Fix permission problem while prepareLocalResources
sekingme removed a comment on pull request #29173: URL: https://github.com/apache/spark/pull/29173#issuecomment-661666414

> @sekingme, please file a JIRA and format the PR title correctly. See also http://spark.apache.org/contributing.html.

Done~
[GitHub] [spark] SparkQA commented on pull request #28968: [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode
SparkQA commented on pull request #28968: URL: https://github.com/apache/spark/pull/28968#issuecomment-661666372 **[Test build #126222 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126222/testReport)** for PR 28968 at commit [`a78fd43`](https://github.com/apache/spark/commit/a78fd4314ba39d1feb63ba1539ac9a2acf40de77). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class InheritableThread(threading.Thread):`
[GitHub] [spark] sekingme commented on pull request #29173: [SPARK-32378][YARN] Fix permission problem while prepareLocalResources
sekingme commented on pull request #29173: URL: https://github.com/apache/spark/pull/29173#issuecomment-661666414

> @sekingme, please file a JIRA and format the PR title correctly. See also http://spark.apache.org/contributing.html.

Done~
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [SPARK-32363][PYTHON][BUILD] Fix flakiness in pip package testing in Jenkins
AmplabJenkins removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-661665961 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29117: [SPARK-32363][PYTHON][BUILD] Fix flakiness in pip package testing in Jenkins
AmplabJenkins commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-661665961
[GitHub] [spark] leanken commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
leanken commented on a change in pull request #29104: URL: https://github.com/apache/spark/pull/29104#discussion_r457869773

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala

## @@ -388,3 +390,36 @@ object PhysicalWindow {
     case _ => None
   }
 }
+
+object ExtractSingleColumnNullAwareAntiJoin extends JoinSelectionHelper {
+
+  // SingleColumn NullAwareAntiJoin
+  // streamedSideKeys, buildSideKeys
+  // currently these two return Seq[Expression] should have only one element
+  private type ReturnType = (Seq[Expression], Seq[Expression])
+
+  /**
+   * See. [SPARK-32290]
+   * LeftAnti(condition: Or(EqualTo(a=b), IsNull(EqualTo(a=b)))
+   * will almost certainly be planned as a Broadcast Nested Loop join,
+   * which is very time consuming because it's an O(M*N) calculation.
+   * But if it's a single column case, and buildSide data is small enough,
+   * O(M*N) calculation could be optimized into O(M) using hash lookup instead of loop lookup.
+   */
+  def unapply(join: Join): Option[ReturnType] = join match {
+    case Join(left, right, LeftAnti,
+        Some(Or(EqualTo(leftAttr: AttributeReference, rightAttr: AttributeReference),
+          IsNull(EqualTo(tmpLeft: AttributeReference, tmpRight: AttributeReference)))), _)
+      if SQLConf.get.nullAwareAntiJoinOptimizeEnabled &&
+        leftAttr.semanticEquals(tmpLeft) && rightAttr.semanticEquals(tmpRight) &&

Review comment:

> Should this also refer to `canEvaluate` as done in `ExtractEquiJoinKeys`

I think semanticEquals is necessary. Besides the NOT IN subquery that translates into LeftAnti, I can always write a SQL query like `select * from a left anti join b on a.key = b.key OR isnull(a.key = b.keyB)`, which would pass a canEvaluate check on the left or right plan but is not the case we want to optimize. I think the pattern rule here should be as strict as we can make it.
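To make the strictness argument concrete, here is a hedged pair of Scala examples (table and column names are illustrative and assume a SparkSession `spark` with registered tables `a` and `b`) contrasting the query the pattern is meant to match with a hand-written join it must not match:

```
// The NOT IN subquery form produces a LeftAnti condition whose two EqualTo expressions
// compare the same pair of attributes, so the semanticEquals guard holds and the pattern matches.
spark.sql("SELECT * FROM a WHERE key NOT IN (SELECT key FROM b)")

// A hand-written condition that wraps a different comparison inside isnull() could still pass a
// canEvaluate-style check, but its two EqualTo expressions are not semantically equal, so the
// pattern must not match it.
spark.sql("SELECT * FROM a LEFT ANTI JOIN b ON a.key = b.key OR isnull(a.key = b.keyB)")
```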
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29153: [SPARK-32310][ML][PySpark][WIP] ML params default value parity in feature and tuning
AmplabJenkins removed a comment on pull request #29153: URL: https://github.com/apache/spark/pull/29153#issuecomment-661665075 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126220/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #29117: [SPARK-32363][PYTHON][BUILD] Fix flakiness in pip package testing in Jenkins
SparkQA removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-661611654 **[Test build #126216 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126216/testReport)** for PR 29117 at commit [`51187d1`](https://github.com/apache/spark/commit/51187d1e012ea8e6259492d125037a32ea75c1f1).
[GitHub] [spark] SparkQA removed a comment on pull request #29153: [SPARK-32310][ML][PySpark][WIP] ML params default value parity in feature and tuning
SparkQA removed a comment on pull request #29153: URL: https://github.com/apache/spark/pull/29153#issuecomment-661643229 **[Test build #126220 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126220/testReport)** for PR 29153 at commit [`1586e30`](https://github.com/apache/spark/commit/1586e3079b442daf5ab5332a3d690f218df423cc).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29153: [SPARK-32310][ML][PySpark][WIP] ML params default value parity in feature and tuning
AmplabJenkins removed a comment on pull request #29153: URL: https://github.com/apache/spark/pull/29153#issuecomment-661665066 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #29117: [SPARK-32363][PYTHON][BUILD] Fix flakiness in pip package testing in Jenkins
SparkQA commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-661665187 **[Test build #126216 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126216/testReport)** for PR 29117 at commit [`51187d1`](https://github.com/apache/spark/commit/51187d1e012ea8e6259492d125037a32ea75c1f1). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #29153: [SPARK-32310][ML][PySpark][WIP] ML params default value parity in feature and tuning
SparkQA commented on pull request #29153: URL: https://github.com/apache/spark/pull/29153#issuecomment-661664851 **[Test build #126220 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126220/testReport)** for PR 29153 at commit [`1586e30`](https://github.com/apache/spark/commit/1586e3079b442daf5ab5332a3d690f218df423cc). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #29153: [SPARK-32310][ML][PySpark][WIP] ML params default value parity in feature and tuning
AmplabJenkins commented on pull request #29153: URL: https://github.com/apache/spark/pull/29153#issuecomment-661665066
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning
AmplabJenkins removed a comment on pull request #29014: URL: https://github.com/apache/spark/pull/29014#issuecomment-661664391
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AmplabJenkins removed a comment on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-661664389
[GitHub] [spark] AmplabJenkins commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning
AmplabJenkins commented on pull request #29014: URL: https://github.com/apache/spark/pull/29014#issuecomment-661664391
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
AmplabJenkins removed a comment on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-661664297 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
AmplabJenkins commented on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-661664297
[GitHub] [spark] AmplabJenkins commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AmplabJenkins commented on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-661664389
[GitHub] [spark] leanken commented on pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
leanken commented on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-661664217

> In ExtractEquiJoinKeys (in patterns.scala) there is code like this:
>
> ```
> case EqualTo(l, r) if canEvaluate(l, left) && canEvaluate(r, right) => Some((l, r))
> case EqualTo(l, r) if canEvaluate(l, right) && canEvaluate(r, left) => Some((r, l))
> ```
>
> So I am wondering if it is possible that LeftAnti join can actually have the left side as the build and the right side as streaming ? Is it possible we won't optimize that case ? Do you think it makes sense to ensure that BNLJ is not present for any not-in query ? (ie this optimization should kick in always).
>
> I am still a bit surprised that you didn't have to modify any .sql.out files because the plan would have changed from BNLJ to BHJ.

join.scala has restricted rules about the left anti join build side; it seems that only InnerLike supports both buildLeft and buildRight:

```
def canBuildLeft(joinType: JoinType): Boolean = {
  joinType match {
    case _: InnerLike | RightOuter => true
    case _ => false
  }
}

def canBuildRight(joinType: JoinType): Boolean = {
  joinType match {
    case _: InnerLike | LeftOuter | LeftSemi | LeftAnti | _: ExistenceJoin => true
    case _ => false
  }
}
```
[GitHub] [spark] AngersZhuuuu commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AngersZhuuuu commented on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-661663602 @maropu Updated so that the Spark default serde supports ArrayType/MapType/StructType, and updated the UT accordingly.
[GitHub] [spark] dongjoon-hyun commented on pull request #29175: [SPARK-32377][SQL][2.4] CaseInsensitiveMap should be deterministic for addition
dongjoon-hyun commented on pull request #29175: URL: https://github.com/apache/spark/pull/29175#issuecomment-661663853 Thank you, @HyukjinKwon.
[GitHub] [spark] SparkQA commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning
SparkQA commented on pull request #29014: URL: https://github.com/apache/spark/pull/29014#issuecomment-661663875 **[Test build #126226 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126226/testReport)** for PR 29014 at commit [`05c871f`](https://github.com/apache/spark/commit/05c871f2672db9c7fb814441c3d201c4aea654c3).
[GitHub] [spark] agrawaldevesh commented on a change in pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning
agrawaldevesh commented on a change in pull request #29014: URL: https://github.com/apache/spark/pull/29014#discussion_r457867091

## File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala

## @@ -1767,10 +1767,18 @@ private[spark] class DAGScheduler(
         // TODO: mark the executor as failed only if there were lots of fetch failures on it
         if (bmAddress != null) {
-          val hostToUnregisterOutputs = if (env.blockManager.externalShuffleServiceEnabled &&
-            unRegisterOutputOnHostOnFetchFailure) {
-            // We had a fetch failure with the external shuffle service, so we
-            // assume all shuffle data on the node is bad.
+          val externalShuffleServiceEnabled = env.blockManager.externalShuffleServiceEnabled
+          val isHostDecommissioned = taskScheduler
+            .getExecutorDecommissionInfo(bmAddress.executorId)
+            .exists(_.isHostDecommissioned)
+          // Host shuffle data is considered lost if:
+          // - If we know that the host was decommissioned
+          // - Or when `unRegisterOutputOnHostOnFetchFailure` is enabled and we had
+          //   a fetch failure with the external shuffle service, so we assume all
+          //   shuffle data on the node is bad.
+          val hostLost = isHostDecommissioned || (externalShuffleServiceEnabled &&

Review comment: Edit: @attilapiros thanks for pushing me to rework this. I think I get the main intention of your suggestion now: unRegisterOutputOnHostOnFetchFailure now applies uniformly to both normal fetch failures and decommissioning, so the description does not need to be reworked. Please take a look at the new version.
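For what it's worth, here is a hedged Scala sketch of the "applies uniformly" reading (an illustration using the variable names from the quoted snippet only, not the actual patch):

```
// Both causes (a decommissioned host, or a fetch failure served by the external shuffle
// service) are treated the same way, gated by the existing flag.
val shuffleOutputOnHostAffected = isHostDecommissioned || externalShuffleServiceEnabled
val hostToUnregisterOutputs =
  if (unRegisterOutputOnHostOnFetchFailure && shuffleOutputOnHostAffected) {
    Some(bmAddress.host)
  } else {
    None
  }
```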
[GitHub] [spark] SparkQA commented on pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
SparkQA commented on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-661663518 **[Test build #126214 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126214/testReport)** for PR 29104 at commit [`65c51bb`](https://github.com/apache/spark/commit/65c51bb21a58a5b0d3977674517c6b78d55524d7). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
SparkQA removed a comment on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-661601896 **[Test build #126214 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126214/testReport)** for PR 29104 at commit [`65c51bb`](https://github.com/apache/spark/commit/65c51bb21a58a5b0d3977674517c6b78d55524d7). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
SparkQA commented on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-661663833 **[Test build #126225 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126225/testReport)** for PR 29085 at commit [`cfecc90`](https://github.com/apache/spark/commit/cfecc90861ecae94a90e37654412fb31e934d14e). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] leanken commented on pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
leanken commented on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-661663720

> canEvaluate

join.scala has restricted rules about the left anti join build side:
```
def canBuildLeft(joinType: JoinType): Boolean = {
  joinType match {
    case _: InnerLike | RightOuter => true
    case _ => false
  }
}

def canBuildRight(joinType: JoinType): Boolean = {
  joinType match {
    case _: InnerLike | LeftOuter | LeftSemi | LeftAnti | _: ExistenceJoin => true
    case _ => false
  }
}
```
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
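As an aside, a self-contained sketch of how such build-side predicates gate planning decisions; the `JoinType` hierarchy below is a simplified stand-in (it omits `InnerLike` and `ExistenceJoin`), not Spark's real one:

```
object BuildSideRules {
  // Simplified stand-in for the join-type hierarchy, only to exercise the rules.
  sealed trait JoinType
  case object Inner extends JoinType
  case object LeftOuter extends JoinType
  case object RightOuter extends JoinType
  case object LeftSemi extends JoinType
  case object LeftAnti extends JoinType

  def canBuildLeft(joinType: JoinType): Boolean = joinType match {
    case Inner | RightOuter => true
    case _ => false
  }

  def canBuildRight(joinType: JoinType): Boolean = joinType match {
    case Inner | LeftOuter | LeftSemi | LeftAnti => true
    case _ => false
  }

  def main(args: Array[String]): Unit = {
    // A left anti join can therefore only build (broadcast) its right side.
    assert(!canBuildLeft(LeftAnti) && canBuildRight(LeftAnti))
  }
}
```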
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command
AmplabJenkins removed a comment on pull request #28840: URL: https://github.com/apache/spark/pull/28840#issuecomment-661661384 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command
AmplabJenkins commented on pull request #28840: URL: https://github.com/apache/spark/pull/28840#issuecomment-661661384 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29170: [SPARK-30876][SQL]: Optimizer fails to infer constraints within join
AmplabJenkins commented on pull request #29170: URL: https://github.com/apache/spark/pull/29170#issuecomment-661661397 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29170: [SPARK-30876][SQL]: Optimizer fails to infer constraints within join
AmplabJenkins removed a comment on pull request #29170: URL: https://github.com/apache/spark/pull/29170#issuecomment-661661397 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command
cloud-fan commented on a change in pull request #28840: URL: https://github.com/apache/spark/pull/28840#discussion_r457864581

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala

```
@@ -236,6 +236,44 @@ case class ShowFunctionsCommand(
   }
 }
 
+
+/**
+ * A command for users to refresh the persistent function.
+ * The syntax of using this command in SQL is:
+ * {{{
+ *    REFRESH FUNCTION functionName
+ * }}}
+ */
+case class RefreshFunctionCommand(
+    databaseName: Option[String],
+    functionName: String)
+  extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val catalog = sparkSession.sessionState.catalog
+    if (FunctionRegistry.builtin.functionExists(FunctionIdentifier(functionName))) {
+      throw new AnalysisException(s"Cannot refresh builtin function $functionName")
+    }
+    if (catalog.isTemporaryFunction(FunctionIdentifier(functionName, databaseName))) {
+      throw new AnalysisException(s"Cannot refresh temporary function $functionName")
+    }
+
+    val identifier = FunctionIdentifier(
+      functionName, Some(databaseName.getOrElse(catalog.getCurrentDatabase)))
+    // we only refresh the permanent function.
+    if (catalog.isPersistentFunction(identifier)) {
+      // register overwrite function.
+      val func = catalog.getFunctionMetadata(identifier)
+      catalog.registerFunction(func, true)
+    } else {
+      // clear cached function.
+      catalog.unregisterFunction(identifier)
```

Review comment: can you change it? I'd expect something like
```
catalog.unregisterFunction(identifier)
throw new NoSuchFunction...
```
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
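To make the suggested control flow concrete, a hedged, self-contained sketch; `FunctionCatalog`, `FunctionId`, and `NoSuchFunctionError` are hypothetical stand-ins rather than Spark's SessionCatalog API, and the exact exception class to throw is deliberately left elided as `NoSuchFunction...` in the comment above:

```
// Toy model of the suggested flow: refresh a persistent function, otherwise drop
// any stale registration and fail loudly instead of returning silently.
final case class FunctionId(db: String, name: String)
final class NoSuchFunctionError(id: FunctionId)
  extends RuntimeException(s"Undefined function: ${id.db}.${id.name}")

trait FunctionCatalog {
  def isPersistentFunction(id: FunctionId): Boolean
  def getFunctionMetadata(id: FunctionId): AnyRef
  def registerFunction(metadata: AnyRef, overrideIfExists: Boolean): Unit
  def unregisterFunction(id: FunctionId): Unit
}

def refreshFunction(catalog: FunctionCatalog, id: FunctionId): Unit = {
  if (catalog.isPersistentFunction(id)) {
    // Re-register from the metastore, overwriting any cached definition.
    catalog.registerFunction(catalog.getFunctionMetadata(id), overrideIfExists = true)
  } else {
    // Drop any stale registration, then surface the missing function as an error.
    catalog.unregisterFunction(id)
    throw new NoSuchFunctionError(id)
  }
}
```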
[GitHub] [spark] SparkQA commented on pull request #29170: [SPARK-30876][SQL]: Optimizer fails to infer constraints within join
SparkQA commented on pull request #29170: URL: https://github.com/apache/spark/pull/29170#issuecomment-661660909 **[Test build #126223 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126223/testReport)** for PR 29170 at commit [`6f01ac9`](https://github.com/apache/spark/commit/6f01ac9c5b85109a194d31c8afd25e00fda77f0a). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command
SparkQA commented on pull request #28840: URL: https://github.com/apache/spark/pull/28840#issuecomment-661660951 **[Test build #126224 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126224/testReport)** for PR 28840 at commit [`fc4789f`](https://github.com/apache/spark/commit/fc4789fcb5357bd1a7cfc88b76c7d76822457db7). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command
cloud-fan commented on pull request #28840: URL: https://github.com/apache/spark/pull/28840#issuecomment-661660176 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on pull request #29170: [SPARK-30876][SQL]: Optimizer fails to infer constraints within join
maropu commented on pull request #29170: URL: https://github.com/apache/spark/pull/29170#issuecomment-661659346 Thanks for the contribution, @navinvishy This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29170: [SPARK-30876][SQL]: Optimizer fails to infer constraints within join
AmplabJenkins removed a comment on pull request #29170: URL: https://github.com/apache/spark/pull/29170#issuecomment-661458294 Can one of the admins verify this patch? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #29170: [SPARK-30876][SQL]: Optimizer fails to infer constraints within join
maropu commented on a change in pull request #29170: URL: https://github.com/apache/spark/pull/29170#discussion_r457862395

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala

```
@@ -116,7 +116,8 @@ abstract class Optimizer(catalogManager: CatalogManager)
       operatorOptimizationRuleSet.filterNot(_ == InferFiltersFromConstraints)
     Batch("Operator Optimization before Inferring Filters", fixedPoint,
       rulesWithoutInferFiltersFromConstraints: _*) ::
-    Batch("Infer Filters", Once,
+    Batch("Infer Filters", fixedPoint,
+      PushDownPredicates,
       InferFiltersFromConstraints) ::
```

Review comment: Note: This rule was separated into its own batch because of https://github.com/apache/spark/pull/19149 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
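As background on the `Once` versus `fixedPoint` distinction this diff touches, a toy rule executor is sketched below; it is not Spark's `RuleExecutor`, the plan is just a `String`, and the iteration cap is arbitrary:

```
object FixedPointDemo {
  // A "rule" rewrites a plan; here the plan is a String purely for illustration.
  type Rule = String => String

  // Run a batch of rules either once, or repeatedly until the plan stops
  // changing (a fixed point), capped at maxIterations.
  def runBatch(plan: String, rules: Seq[Rule], once: Boolean, maxIterations: Int = 100): String = {
    if (once) {
      rules.foldLeft(plan)((p, rule) => rule(p))
    } else {
      var current = plan
      var changed = true
      var i = 0
      while (changed && i < maxIterations) {
        val next = rules.foldLeft(current)((p, rule) => rule(p))
        changed = next != current
        current = next
        i += 1
      }
      current
    }
  }
}
```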
[GitHub] [spark] maropu commented on pull request #29170: [SPARK-30876][SQL]: Optimizer fails to infer constraints within join
maropu commented on pull request #29170: URL: https://github.com/apache/spark/pull/29170#issuecomment-661659200 ok to test This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] agrawaldevesh commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
agrawaldevesh commented on a change in pull request #29104: URL: https://github.com/apache/spark/pull/29104#discussion_r457857768

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala

```
@@ -498,6 +547,8 @@ case class BroadcastHashJoinExec(
        |}
      | }
      |}
+     |// special case for NullAwareAntiJoin, if anyNull in streamedRow, row should be dropped.
```

Review comment: Wow! Did you take a look at that tiny code diff? Great job!!

## File path: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java

```
@@ -171,6 +171,23 @@
   private volatile MapIterator destructiveIterator = null;
   private LinkedList spillWriters = new LinkedList<>();
 
+  private boolean anyNullKeyExists = false;
+
+  public boolean inputEmpty()
+  {
+    return ((numKeys == 0) && !anyNullKeyExists);
```

Review comment: nit: The outer parens can be dropped?

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala

```
@@ -71,6 +71,16 @@ private[execution] sealed trait HashedRelation extends KnownSizeEstimation {
    */
   def keyIsUnique: Boolean
 
+  /**
+   * is input: Iterator[InternalRow] empty
+   */
+  def inputEmpty: Boolean
+
+  /**
+   * anyNull key exists in input
```

Review comment: Need more context on why this is worthwhile to consider: perhaps to the effect that it is only used in the null aware anti join.

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala

```
@@ -71,6 +71,16 @@ private[execution] sealed trait HashedRelation extends KnownSizeEstimation {
    */
   def keyIsUnique: Boolean
 
+  /**
+   * is input: Iterator[InternalRow] empty
```

Review comment: Can you expand this comment please? You can add a "Note that, the hashed relation can be empty despite the Iterator[InternalRow] being not empty since the hashed relation skips over null keys"

## File path: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java

```
@@ -171,6 +171,23 @@
   private volatile MapIterator destructiveIterator = null;
   private LinkedList spillWriters = new LinkedList<>();
 
+  private boolean anyNullKeyExists = false;
+
+  public boolean inputEmpty()
+  {
+    return ((numKeys == 0) && !anyNullKeyExists);
+  }
+
+  public boolean isAnyNullKeyExists()
+  {
+    return anyNullKeyExists;
+  }
+
+  public void setAnyNullKeyExists(boolean anyNullKeyExists)
+  {
+    this.anyNullKeyExists = anyNullKeyExists;
```

Review comment: So just making sure I am reading this code right: There is no extra scan of the rows done to know if there are no null keys.

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala

```
@@ -388,3 +390,36 @@ object PhysicalWindow {
     case _ => None
   }
 }
+
+object ExtractSingleColumnNullAwareAntiJoin extends JoinSelectionHelper {
+
+  // SingleColumn NullAwareAntiJoin
+  // streamedSideKeys, buildSideKeys
+  // currently these two return Seq[Expression] should have only one element
+  private type ReturnType = (Seq[Expression], Seq[Expression])
+
+  /**
+   * See. [SPARK-32290]
+   * LeftAnti(condition: Or(EqualTo(a=b), IsNull(EqualTo(a=b)))
+   * will almost certainly be planned as a Broadcast Nested Loop join,
+   * which is very time consuming because it's an O(M*N) calculation.
+   * But if it's a single column case, and buildSide data is small enough,
+   * O(M*N) calculation could be optimized into O(M) using hash lookup instead of loop lookup.
+   */
+  def unapply(join: Join): Option[ReturnType] = join match {
+    case Join(left, right, LeftAnti,
+        Some(Or(EqualTo(leftAttr: AttributeReference, rightAttr: AttributeReference),
+          IsNull(EqualTo(tmpLeft: AttributeReference, tmpRight: AttributeReference)))), _)
+        if SQLConf.get.nullAwareAntiJoinOptimizeEnabled &&
+          leftAttr.semanticEquals(tmpLeft) && rightAttr.semanticEquals(tmpRight) &&
```

Review comment: Should this also refer to `canEvaluate` as done in `ExtractEquiJoinKeys`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
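For intuition about the optimization under review, a self-contained sketch of single-column null-aware anti-join semantics over plain Scala collections (not Spark's `HashedRelation`); the O(M) claim corresponds to the per-row hash-set lookup:

```
// Null-aware anti join for a single key column, matching the semantics of
//   LeftAnti ON (a = b OR isnull(a = b))
// i.e. the plan shape produced by `a NOT IN (SELECT b ...)`.
// The build side is hashed once; each streamed row is then a constant-time lookup.
def nullAwareAntiJoin(streamed: Seq[Option[Int]], build: Seq[Option[Int]]): Seq[Option[Int]] = {
  if (build.isEmpty) {
    // Empty build side: nothing can match, so every streamed row is kept.
    streamed
  } else if (build.contains(None)) {
    // A null key on the build side matches every streamed row: empty result.
    Seq.empty
  } else {
    val keys: Set[Int] = build.flatten.toSet
    streamed.filter {
      // A null streamed key always matches via the IsNull branch, so it is dropped.
      case None => false
      // A non-null key is kept only if it is absent from the build-side hash set.
      case Some(k) => !keys.contains(k)
    }
  }
}
```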
[GitHub] [spark] SparkQA commented on pull request #28968: [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode
SparkQA commented on pull request #28968: URL: https://github.com/apache/spark/pull/28968#issuecomment-661658276 **[Test build #126222 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126222/testReport)** for PR 28968 at commit [`a78fd43`](https://github.com/apache/spark/commit/a78fd4314ba39d1feb63ba1539ac9a2acf40de77). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28968: [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode
AmplabJenkins removed a comment on pull request #28968: URL: https://github.com/apache/spark/pull/28968#issuecomment-661656050 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #28968: [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode
AmplabJenkins commented on pull request #28968: URL: https://github.com/apache/spark/pull/28968#issuecomment-661656050 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #28968: [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode
HyukjinKwon commented on pull request #28968: URL: https://github.com/apache/spark/pull/28968#issuecomment-661655744 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition
AmplabJenkins removed a comment on pull request #29172: URL: https://github.com/apache/spark/pull/29172#issuecomment-661654861 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126219/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition
AmplabJenkins removed a comment on pull request #29172: URL: https://github.com/apache/spark/pull/29172#issuecomment-661654854 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition
SparkQA removed a comment on pull request #29172: URL: https://github.com/apache/spark/pull/29172#issuecomment-661632576 **[Test build #126219 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126219/testReport)** for PR 29172 at commit [`a6dc25d`](https://github.com/apache/spark/commit/a6dc25dcacd34b315bf67d8bb1d28e52f2dec3bb). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition
AmplabJenkins commented on pull request #29172: URL: https://github.com/apache/spark/pull/29172#issuecomment-661654854 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition
SparkQA commented on pull request #29172: URL: https://github.com/apache/spark/pull/29172#issuecomment-661654645 **[Test build #126219 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126219/testReport)** for PR 29172 at commit [`a6dc25d`](https://github.com/apache/spark/commit/a6dc25dcacd34b315bf67d8bb1d28e52f2dec3bb). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #29166: [SPARK-32372][SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict plan
maropu commented on a change in pull request #29166: URL: https://github.com/apache/spark/pull/29166#discussion_r457855360

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

```
@@ -1237,20 +1249,44 @@ class Analyzer(
       if (conflictPlans.isEmpty) {
         right
       } else {
-        val attributeRewrites = AttributeMap(conflictPlans.flatMap {
-          case (oldRelation, newRelation) => oldRelation.output.zip(newRelation.output)})
-        val conflictPlanMap = conflictPlans.toMap
-        // transformDown so that we can replace all the old Relations in one turn due to
-        // the reason that `conflictPlans` are also collected in pre-order.
-        right transformDown {
-          case r => conflictPlanMap.getOrElse(r, r)
-        } transformUp {
-          case other => other transformExpressions {
+        rewritePlan(right, conflictPlans.toMap)._1
+      }
+    }
+
+    private def rewritePlan(plan: LogicalPlan, conflictPlanMap: Map[LogicalPlan, LogicalPlan])
+      : (LogicalPlan, mutable.ArrayBuffer[(Attribute, Attribute)]) = {
```

Review comment: nit: In most cases of recursive calls, the return type would be `Seq` instead of `ArrayBuffer`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning
AmplabJenkins removed a comment on pull request #29014: URL: https://github.com/apache/spark/pull/29014#issuecomment-661650757 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126213/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #29166: [SPARK-32372][SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict plan
maropu commented on a change in pull request #29166: URL: https://github.com/apache/spark/pull/29166#discussion_r457854335

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

```
@@ -1237,20 +1249,44 @@ class Analyzer(
       if (conflictPlans.isEmpty) {
         right
       } else {
-        val attributeRewrites = AttributeMap(conflictPlans.flatMap {
-          case (oldRelation, newRelation) => oldRelation.output.zip(newRelation.output)})
-        val conflictPlanMap = conflictPlans.toMap
-        // transformDown so that we can replace all the old Relations in one turn due to
-        // the reason that `conflictPlans` are also collected in pre-order.
-        right transformDown {
-          case r => conflictPlanMap.getOrElse(r, r)
-        } transformUp {
-          case other => other transformExpressions {
+        rewritePlan(right, conflictPlans.toMap)._1
+      }
+    }
+
+    private def rewritePlan(plan: LogicalPlan, conflictPlanMap: Map[LogicalPlan, LogicalPlan])
+      : (LogicalPlan, mutable.ArrayBuffer[(Attribute, Attribute)]) = {
+      val attrMapping = new mutable.ArrayBuffer[(Attribute, Attribute)]()
+      if (conflictPlanMap.contains(plan)) {
+        // If the plan is the one that conflict the with left one, we'd
+        // just replace it with the new plan and collect the rewrite
+        // attributes for the parent node.
+        val newRelation = conflictPlanMap(plan)
+        attrMapping ++= plan.output.zip(newRelation.output)
+        newRelation -> attrMapping
+      } else {
+        var newPlan = plan.mapChildren { child =>
```

Review comment: nit: `val` is better. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
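A toy illustration of the nit above: the same recursion can thread an immutable `Seq` accumulator through the return value instead of exposing `mutable.ArrayBuffer`; `Attr` and `Node` are simplified stand-ins, not the Analyzer's types:

```
// Simplified stand-ins for a plan tree and its attributes, only to show the
// recursion shape; none of this is Spark's real Analyzer code.
final case class Attr(name: String, id: Int)
final case class Node(label: String, output: Seq[Attr], children: Seq[Node])

// Rewrite any node found in `conflicts`, returning the new tree together with
// the (old attribute -> new attribute) pairs collected along the way as a Seq.
def rewritePlan(plan: Node, conflicts: Map[Node, Node]): (Node, Seq[(Attr, Attr)]) = {
  conflicts.get(plan) match {
    case Some(replacement) =>
      (replacement, plan.output.zip(replacement.output))
    case None =>
      val rewritten = plan.children.map(child => rewritePlan(child, conflicts))
      val newChildren = rewritten.map(_._1)
      val attrMapping = rewritten.flatMap(_._2)
      (plan.copy(children = newChildren), attrMapping)
  }
}
```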
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning
AmplabJenkins removed a comment on pull request #29014: URL: https://github.com/apache/spark/pull/29014#issuecomment-661650753 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning
AmplabJenkins commented on pull request #29014: URL: https://github.com/apache/spark/pull/29014#issuecomment-661650753 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor
AmplabJenkins removed a comment on pull request #29032: URL: https://github.com/apache/spark/pull/29032#issuecomment-661649619 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126212/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning
SparkQA removed a comment on pull request #29014: URL: https://github.com/apache/spark/pull/29014#issuecomment-661598793 **[Test build #126213 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126213/testReport)** for PR 29014 at commit [`54c2235`](https://github.com/apache/spark/commit/54c2235e480e6673fb8ab84341a338d68970c3f3). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor
AmplabJenkins removed a comment on pull request #29032: URL: https://github.com/apache/spark/pull/29032#issuecomment-661649608 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning
SparkQA commented on pull request #29014: URL: https://github.com/apache/spark/pull/29014#issuecomment-661649994 **[Test build #126213 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126213/testReport)** for PR 29014 at commit [`54c2235`](https://github.com/apache/spark/commit/54c2235e480e6673fb8ab84341a338d68970c3f3). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class ExecutorProcessLost(` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor
AmplabJenkins commented on pull request #29032: URL: https://github.com/apache/spark/pull/29032#issuecomment-661649608 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition
AmplabJenkins removed a comment on pull request #29172: URL: https://github.com/apache/spark/pull/29172#issuecomment-661648937 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition
AmplabJenkins commented on pull request #29172: URL: https://github.com/apache/spark/pull/29172#issuecomment-661648937 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on pull request #29152: [SPARK-32356][SQL] Forbid create view with null type
cloud-fan commented on pull request #29152: URL: https://github.com/apache/spark/pull/29152#issuecomment-661648916 LGTM if tests pass. One last question: does Hive also fail to create a view if a column is of null type? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition
SparkQA removed a comment on pull request #29172: URL: https://github.com/apache/spark/pull/29172#issuecomment-661503811 **[Test build #126209 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126209/testReport)** for PR 29172 at commit [`27bd6d5`](https://github.com/apache/spark/commit/27bd6d55ff85e4deecbb36eb04185382cca256d3). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor
SparkQA removed a comment on pull request #29032: URL: https://github.com/apache/spark/pull/29032#issuecomment-661598746 **[Test build #126212 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126212/testReport)** for PR 29032 at commit [`71633f0`](https://github.com/apache/spark/commit/71633f0e433b4c81ffd626f1522327b0d6d21759). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition
SparkQA commented on pull request #29172: URL: https://github.com/apache/spark/pull/29172#issuecomment-661648372 **[Test build #126209 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126209/testReport)** for PR 29172 at commit [`27bd6d5`](https://github.com/apache/spark/commit/27bd6d55ff85e4deecbb36eb04185382cca256d3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor
SparkQA commented on pull request #29032: URL: https://github.com/apache/spark/pull/29032#issuecomment-661648257 **[Test build #126212 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126212/testReport)** for PR 29032 at commit [`71633f0`](https://github.com/apache/spark/commit/71633f0e433b4c81ffd626f1522327b0d6d21759). * This patch **fails PySpark pip packaging tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class ExecutorDecommissionInfo(message: String, isHostDecommissioned: Boolean)` * ` case class DecommissionExecutor(executorId: String, decommissionInfo: ExecutorDecommissionInfo)` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29175: [SPARK-32377][SQL][2.4] CaseInsensitiveMap should be deterministic for addition
AmplabJenkins removed a comment on pull request #29175: URL: https://github.com/apache/spark/pull/29175#issuecomment-661645796 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29175: [SPARK-32377][SQL][2.4] CaseInsensitiveMap should be deterministic for addition
AmplabJenkins commented on pull request #29175: URL: https://github.com/apache/spark/pull/29175#issuecomment-661645796 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29175: [SPARK-32377][SQL][2.4] CaseInsensitiveMap should be deterministic for addition
SparkQA commented on pull request #29175: URL: https://github.com/apache/spark/pull/29175#issuecomment-661645518 **[Test build #126221 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126221/testReport)** for PR 29175 at commit [`91683e2`](https://github.com/apache/spark/commit/91683e2e161a150b0521c8289234a233b73cd98c). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on pull request #28422: [SPARK-17604][SS] FileStreamSource: provide a new option to have retention on input files
HeartSaVioR commented on pull request #28422: URL: https://github.com/apache/spark/pull/28422#issuecomment-661645101 FYI, I've started a discussion about this on the dev@ mailing list: how to deal with the "latestFirst" option and metadata growth. https://lists.apache.org/thread.html/r08e3a8d7df74354b38d19ffdebe1afe7fa73c2f611f0a812a867dffb%40%3Cdev.spark.apache.org%3E This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29175: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition
dongjoon-hyun commented on a change in pull request #29175: URL: https://github.com/apache/spark/pull/29175#discussion_r457847560

## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/CaseInsensitiveMapSuite.scala

```
@@ -0,0 +1,30 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.SparkFunSuite
+
+class CaseInsensitiveMapSuite extends SparkFunSuite {
```

Review comment: I want to keep this test in the `catalyst` module, but cannot find a proper test suite. I'm open to any suggestion for a better test suite if one exists. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
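For what such a suite could look like, a hedged sketch follows; the class name and assertions are hypothetical, and the expected behavior (the most recently added value wins regardless of key casing) is an assumption based on the PR title rather than the actual diff:

```
import org.apache.spark.SparkFunSuite
import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap

// Hypothetical test body; the real CaseInsensitiveMapSuite in the PR may differ.
class CaseInsensitiveMapDeterminismSuite extends SparkFunSuite {
  test("SPARK-32377: addition is deterministic regardless of key casing") {
    var m: Map[String, String] = CaseInsensitiveMap(Map.empty[String, String])
    Seq("paTh" -> "1", "PATH" -> "2", "Path" -> "3", "path" -> "4").foreach { kv =>
      m = m + kv
      // Assumption: the value added last is what a case-insensitive lookup sees.
      assert(m.get("path").contains(kv._2))
    }
  }
}
```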