[GitHub] [spark] AmplabJenkins commented on pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
AmplabJenkins commented on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-661673834
[GitHub] [spark] SparkQA removed a comment on pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
SparkQA removed a comment on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-661604422 **[Test build #126215 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126215/testReport)** for PR 29104 at commit [`0db75b3`](https://github.com/apache/spark/commit/0db75b33bbff9de6791b1c3b5107a747c732bca9).
[GitHub] [spark] AmplabJenkins commented on pull request #28911: [SPARK-32077][CORE] Support host-local shuffle data reading when external shuffle service is disabled
AmplabJenkins commented on pull request #28911: URL: https://github.com/apache/spark/pull/28911#issuecomment-661673216
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28911: [SPARK-32077][CORE] Support host-local shuffle data reading when external shuffle service is disabled
AmplabJenkins removed a comment on pull request #28911: URL: https://github.com/apache/spark/pull/28911#issuecomment-661673216
[GitHub] [spark] viirya commented on pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable
viirya commented on pull request #29079: URL: https://github.com/apache/spark/pull/29079#issuecomment-661673387 It's too late today. I will take another look tomorrow if this is not merged yet.
[GitHub] [spark] AmplabJenkins commented on pull request #26953: [SPARK-30306][CORE][PYTHON] Instrument Python UDF execution time and throughput metrics using Spark Metrics system
AmplabJenkins commented on pull request #26953: URL: https://github.com/apache/spark/pull/26953#issuecomment-661673221
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26953: [SPARK-30306][CORE][PYTHON] Instrument Python UDF execution time and throughput metrics using Spark Metrics system
AmplabJenkins removed a comment on pull request #26953: URL: https://github.com/apache/spark/pull/26953#issuecomment-661673221
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
AmplabJenkins removed a comment on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-661664316 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126214/ Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
SparkQA commented on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-661673065 **[Test build #126215 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126215/testReport)** for PR 29104 at commit [`0db75b3`](https://github.com/apache/spark/commit/0db75b33bbff9de6791b1c3b5107a747c732bca9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] leanken commented on pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
leanken commented on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-661672887

> In ExtractEquiJoinKeys (in patterns.scala) there is code like this:
>
> ```
> case EqualTo(l, r) if canEvaluate(l, left) && canEvaluate(r, right) => Some((l, r))
> case EqualTo(l, r) if canEvaluate(l, right) && canEvaluate(r, left) => Some((r, l))
> ```
>
> So I am wondering if it is possible that LeftAnti join can actually have the left side as the build and the right side as streaming ? Is it possible we won't optimize that case ? Do you think it makes sense to ensure that BNLJ is not present for any not-in query ? (ie this optimization should kick in always).
>
> I am still a bit surprised that you didn't have to modify any .sql.out files because the plan would have changed from BNLJ to BHJ.

I think when both canBroadcastBySize(left) and canBroadcastBySize(right) are false, the ExtractSingleColumnNullAwareAntiJoin pattern will not match, so it will still fall back to the original BNLJ, FYI.
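For readers following along, here is a minimal Scala sketch (assuming a running SparkSession named `spark` and two registered single-key tables `t1` and `t2`; all of these names are illustrative, not part of the PR) of the query shape this discussion is about and how to check which join the planner picked:

```
// A single-column NOT IN subquery is rewritten to a LeftAnti join whose condition is
// Or(EqualTo(t1.key, t2.key), IsNull(EqualTo(t1.key, t2.key))).
val df = spark.sql("SELECT * FROM t1 WHERE key NOT IN (SELECT key FROM t2)")

// If t2 is small enough to broadcast, the optimization discussed in this PR can turn the
// plan into a broadcast hash join; if neither side can be broadcast, the pattern does not
// match and the plan stays a BroadcastNestedLoopJoin, as noted above.
df.explain()
```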
[GitHub] [spark] SparkQA commented on pull request #26953: [SPARK-30306][CORE][PYTHON] Instrument Python UDF execution time and throughput metrics using Spark Metrics system
SparkQA commented on pull request #26953: URL: https://github.com/apache/spark/pull/26953#issuecomment-661672747 **[Test build #126228 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126228/testReport)** for PR 26953 at commit [`277245c`](https://github.com/apache/spark/commit/277245c15c9f635c0747528f32a10c6aa857707e).
[GitHub] [spark] SparkQA commented on pull request #28911: [SPARK-32077][CORE] Support host-local shuffle data reading when external shuffle service is disabled
SparkQA commented on pull request #28911: URL: https://github.com/apache/spark/pull/28911#issuecomment-661672706 **[Test build #126227 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126227/testReport)** for PR 28911 at commit [`bcb6012`](https://github.com/apache/spark/commit/bcb60123ef68feacf4d571de47cc99068b8496b8).
[GitHub] [spark] Ngone51 commented on a change in pull request #28911: [SPARK-32077][CORE] Support host-local shuffle data reading when external shuffle service is disabled
Ngone51 commented on a change in pull request #28911: URL: https://github.com/apache/spark/pull/28911#discussion_r457876717

## File path: core/src/main/scala/org/apache/spark/network/netty/NettyBlockTransferService.scala

## @@ -194,6 +196,45 @@ private[spark] class NettyBlockTransferService(
     result.future
   }

+  override def getHostLocalDirs(
+      host: String,
+      port: Int,
+      execIds: Array[String],
+      hostLocalDirsCompletable: CompletableFuture[util.Map[String, Array[String]]]): Unit = {
+    val getLocalDirsMessage = new GetLocalDirsForExecutors(appId, execIds)
+    try {
+      val client = clientFactory.createClient(host, port)
+      client.sendRpc(getLocalDirsMessage.toByteBuffer, new RpcResponseCallback() {
+        override def onSuccess(response: ByteBuffer): Unit = {
+          try {
+            val msgObj = BlockTransferMessage.Decoder.fromByteBuffer(response)
+            hostLocalDirsCompletable.complete(
+              msgObj.asInstanceOf[LocalDirsForExecutors].getLocalDirsByExec)
+          } catch {
+            case t: Throwable =>
+              logWarning(s"Error trying to get the host local dirs for executor ${execIds.head}",
+                t.getCause)
+              hostLocalDirsCompletable.completeExceptionally(t)
+          } finally {
+            client.close()
+          }
+        }
+
+        override def onFailure(t: Throwable): Unit = {
+          logWarning(s"Error trying to get the host local dirs for executor ${execIds.head}",
+            t.getCause)
+          hostLocalDirsCompletable.completeExceptionally(t)
+          client.close()
+        }
+      })
+    } catch {
+      case e: IOException =>

Review comment: Yes, good idea. I've done the refactor. Please take a look.
[GitHub] [spark] viirya commented on a change in pull request #29166: [SPARK-32372][SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict plan
viirya commented on a change in pull request #29166: URL: https://github.com/apache/spark/pull/29166#discussion_r457874203

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

## @@ -1237,20 +1249,44 @@ class Analyzer(
       if (conflictPlans.isEmpty) {
         right
       } else {
-        val attributeRewrites = AttributeMap(conflictPlans.flatMap {
-          case (oldRelation, newRelation) => oldRelation.output.zip(newRelation.output)})
-        val conflictPlanMap = conflictPlans.toMap
-        // transformDown so that we can replace all the old Relations in one turn due to
-        // the reason that `conflictPlans` are also collected in pre-order.
-        right transformDown {
-          case r => conflictPlanMap.getOrElse(r, r)
-        } transformUp {
-          case other => other transformExpressions {
+        rewritePlan(right, conflictPlans.toMap)._1
+      }
+    }
+
+    private def rewritePlan(plan: LogicalPlan, conflictPlanMap: Map[LogicalPlan, LogicalPlan])
+      : (LogicalPlan, mutable.ArrayBuffer[(Attribute, Attribute)]) = {
+      val attrMapping = new mutable.ArrayBuffer[(Attribute, Attribute)]()
+      if (conflictPlanMap.contains(plan)) {
+        // If the plan is the one that conflict the with left one, we'd
+        // just replace it with the new plan and collect the rewrite
+        // attributes for the parent node.
+        val newRelation = conflictPlanMap(plan)
+        attrMapping ++= plan.output.zip(newRelation.output)
+        newRelation -> attrMapping
+      } else {
+        var newPlan = plan.mapChildren { child =>
+          // If not, we'd rewrite child plan recursively until we find the
+          // conflict node or reach the leaf node.
+          val (newChild, childAttrMapping) = rewritePlan(child, conflictPlanMap)
+          attrMapping ++= childAttrMapping
+          newChild
+        }
+
+        if (attrMapping.isEmpty) {
+          newPlan -> attrMapping
+        } else {
+          assert(!attrMapping.groupBy(_._1.exprId)
+            .exists(_._2.map(_._2.exprId).distinct.length > 1),
+            "Found duplicate rewrite attributes")
+          val attributeRewrites = AttributeMap(attrMapping)
+          // rewrite the attributes of parent node
+          newPlan = newPlan.transformExpressions {

Review comment: Oh, I see. This looks clearer. +1 for this change.

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

## @@ -1192,11 +1192,23 @@ class Analyzer(
           if findAliases(projectList).intersect(conflictingAttributes).nonEmpty =>
           Seq((oldVersion, oldVersion.copy(projectList = newAliases(projectList))))
+
+        // We don't need to search child plan recursively if the projectList of a Project
+        // is only composed of Alias and doesn't contain any conflicting attributes.
+        // Because, even if the child plan has some conflicting attributes, the attributes
+        // will be aliased to non-conflicting attributes by the Project at the end.
+        case _ @ Project(projectList, _)
+          if findAliases(projectList).size == projectList.size =>
+          Nil

Review comment: Don't we need to put this before previous `Project` pattern?

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

## @@ -1192,11 +1192,23 @@ class Analyzer(
           if findAliases(projectList).intersect(conflictingAttributes).nonEmpty =>
           Seq((oldVersion, oldVersion.copy(projectList = newAliases(projectList))))
+
+        // We don't need to search child plan recursively if the projectList of a Project
+        // is only composed of Alias and doesn't contain any conflicting attributes.
+        // Because, even if the child plan has some conflicting attributes, the attributes
+        // will be aliased to non-conflicting attributes by the Project at the end.
+        case _ @ Project(projectList, _)
+          if findAliases(projectList).size == projectList.size =>
+          Nil
+
         case oldVersion @ Aggregate(_, aggregateExpressions, _)
           if findAliases(aggregateExpressions).intersect(conflictingAttributes).nonEmpty =>
           Seq((oldVersion, oldVersion.copy(
             aggregateExpressions = newAliases(aggregateExpressions))))
+        case _ @ Aggregate(_, aggregateExpressions, _)

Review comment: Same reason as above? Add a simple comment too?
[GitHub] [spark] Ngone51 commented on a change in pull request #28911: [SPARK-32077][CORE] Support host-local shuffle data reading when external shuffle service is disabled
Ngone51 commented on a change in pull request #28911: URL: https://github.com/apache/spark/pull/28911#discussion_r457876162

## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/BlockStoreClient.java

## @@ -61,4 +63,17 @@ public MetricSet shuffleMetrics() {
     // Return an empty MetricSet by default.
     return () -> Collections.emptyMap();
   }
+
+  /**
+   * Request the local disk directories, which are specified by DiskBlockManager, for the executors
+   * from the external shuffle service (when this is a ExternalBlockStoreClient) or BlockManager
+   * (when this is a NettyBlockTransferService). Note there's only one executor when this is a
+   * NettyBlockTransferService because we ask one specific executor at a time.

Review comment: I added the check to ensure there's only one executor id, but didn't check its equality with the BlockManager's executor id, because we only have `BlockDataManager` in `NettyBlockRpcServer`, which does not expose the executor id. I am still wondering whether it's worthwhile to expose it for sanity-check purposes.
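As a side note, here is a minimal standalone Scala sketch of the arity check described above (illustrative only; the method name is made up and this is not the actual NettyBlockRpcServer code), given that the server side cannot compare against the block manager's own executor id:

```
// Hedged sketch: validate that exactly one executor id was requested for host-local dirs.
def checkSingleExecutorId(execIds: Array[String]): Unit = {
  require(execIds.length == 1,
    s"Expected exactly one executor id for host-local dirs, got: ${execIds.mkString(", ")}")
}
```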
[GitHub] [spark] leanken commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
leanken commented on a change in pull request #29104: URL: https://github.com/apache/spark/pull/29104#discussion_r457870914

## File path: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java

## @@ -171,6 +171,23 @@
   private volatile MapIterator destructiveIterator = null;
   private LinkedList spillWriters = new LinkedList<>();

+  private boolean anyNullKeyExists = false;
+
+  public boolean inputEmpty()
+  {
+    return ((numKeys == 0) && !anyNullKeyExists);
+  }
+
+  public boolean isAnyNullKeyExists()
+  {
+    return anyNullKeyExists;
+  }
+
+  public void setAnyNullKeyExists(boolean anyNullKeyExists)
+  {
+    this.anyNullKeyExists = anyNullKeyExists;

Review comment: Yes, no extra scan is needed. I set anyNullKeyExists while going through the input iterator; if the input is empty or there are no null-key rows, it stays at the default value false.

```
while (input.hasNext) {
  val row = input.next().asInstanceOf[UnsafeRow]
  numFields = row.numFields()
  val key = keyGenerator(row)
  if (!key.anyNull) {
    val loc = binaryMap.lookup(key.getBaseObject, key.getBaseOffset, key.getSizeInBytes)
    val success = loc.append(
      key.getBaseObject, key.getBaseOffset, key.getSizeInBytes,
      row.getBaseObject, row.getBaseOffset, row.getSizeInBytes)
    if (!success) {
      binaryMap.free()
      // scalastyle:off throwerror
      throw new SparkOutOfMemoryError("There is not enough memory to build hash map")
      // scalastyle:on throwerror
    }
  } else {
    binaryMap.setAnyNullKeyExists(true) // HERE
  }
}
```
[GitHub] [spark] leanken commented on pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
leanken commented on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-661669277

> In ExtractEquiJoinKeys (in patterns.scala) there is code like this:
>
> ```
> case EqualTo(l, r) if canEvaluate(l, left) && canEvaluate(r, right) => Some((l, r))
> case EqualTo(l, r) if canEvaluate(l, right) && canEvaluate(r, left) => Some((r, l))
> ```
>
> So I am wondering if it is possible that LeftAnti join can actually have the left side as the build and the right side as streaming ? Is it possible we won't optimize that case ? Do you think it makes sense to ensure that BNLJ is not present for any not-in query ? (ie this optimization should kick in always).
>
> I am still a bit surprised that you didn't have to modify any .sql.out files because the plan would have changed from BNLJ to BHJ.

The .sql.out files only verify the output schema and the output answer; as long as the change produces the right answer, the verification will pass.
[GitHub] [spark] leanken commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
leanken commented on a change in pull request #29104: URL: https://github.com/apache/spark/pull/29104#discussion_r457873194

## File path: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java

## @@ -171,6 +171,23 @@
   private volatile MapIterator destructiveIterator = null;
   private LinkedList spillWriters = new LinkedList<>();

+  private boolean anyNullKeyExists = false;
+
+  public boolean inputEmpty()
+  {
+    return ((numKeys == 0) && !anyNullKeyExists);

Review comment:

> I am still a bit surprised that you didn't have to modify any .sql.out files because the plan would have changed from BNLJ to BHJ.

The .sql.out files only verify the output schema and the output answer; as long as the change produces the right answer, the verification will pass.
[GitHub] [spark] leanken commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
leanken commented on a change in pull request #29104: URL: https://github.com/apache/spark/pull/29104#discussion_r457871466

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala

## @@ -71,6 +71,16 @@ private[execution] sealed trait HashedRelation extends KnownSizeEstimation {
    */
   def keyIsUnique: Boolean

+  /**
+   * is input: Iterator[InternalRow] empty
+   */

Review comment: Will do.

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala

## @@ -71,6 +71,16 @@ private[execution] sealed trait HashedRelation extends KnownSizeEstimation {
    */
   def keyIsUnique: Boolean

+  /**
+   * is input: Iterator[InternalRow] empty
+   */
+  def inputEmpty: Boolean
+
+  /**
+   * anyNull key exists in input
+   */

Review comment: Will do.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28968: [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode
AmplabJenkins removed a comment on pull request #28968: URL: https://github.com/apache/spark/pull/28968#issuecomment-661666519 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126222/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #28968: [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode
SparkQA removed a comment on pull request #28968: URL: https://github.com/apache/spark/pull/28968#issuecomment-661658276 **[Test build #126222 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126222/testReport)** for PR 28968 at commit [`a78fd43`](https://github.com/apache/spark/commit/a78fd4314ba39d1feb63ba1539ac9a2acf40de77).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [SPARK-32363][PYTHON][BUILD] Fix flakiness in pip package testing in Jenkins
AmplabJenkins removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-661665966 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126216/ Test FAILed.
[GitHub] [spark] sekingme commented on pull request #29173: [SPARK-32378][YARN] Fix permission problem while prepareLocalResources
sekingme commented on pull request #29173: URL: https://github.com/apache/spark/pull/29173#issuecomment-661666591

> @sekingme, please file a JIRA and format the PR title correctly. See also http://spark.apache.org/contributing.html.

@HyukjinKwon done~
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28968: [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode
AmplabJenkins removed a comment on pull request #28968: URL: https://github.com/apache/spark/pull/28968#issuecomment-661666508 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #28968: [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode
AmplabJenkins commented on pull request #28968: URL: https://github.com/apache/spark/pull/28968#issuecomment-661666508
[GitHub] [spark] sekingme removed a comment on pull request #29173: [SPARK-32378][YARN] Fix permission problem while prepareLocalResources
sekingme removed a comment on pull request #29173: URL: https://github.com/apache/spark/pull/29173#issuecomment-661666414

> @sekingme, please file a JIRA and format the PR title correctly. See also http://spark.apache.org/contributing.html.

Done~
[GitHub] [spark] SparkQA commented on pull request #28968: [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode
SparkQA commented on pull request #28968: URL: https://github.com/apache/spark/pull/28968#issuecomment-661666372 **[Test build #126222 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126222/testReport)** for PR 28968 at commit [`a78fd43`](https://github.com/apache/spark/commit/a78fd4314ba39d1feb63ba1539ac9a2acf40de77). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class InheritableThread(threading.Thread):`
[GitHub] [spark] sekingme commented on pull request #29173: [SPARK-32378][YARN] Fix permission problem while prepareLocalResources
sekingme commented on pull request #29173: URL: https://github.com/apache/spark/pull/29173#issuecomment-661666414

> @sekingme, please file a JIRA and format the PR title correctly. See also http://spark.apache.org/contributing.html.

Done~
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [SPARK-32363][PYTHON][BUILD] Fix flakiness in pip package testing in Jenkins
AmplabJenkins removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-661665961 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29117: [SPARK-32363][PYTHON][BUILD] Fix flakiness in pip package testing in Jenkins
AmplabJenkins commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-661665961
[GitHub] [spark] leanken commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
leanken commented on a change in pull request #29104: URL: https://github.com/apache/spark/pull/29104#discussion_r457869773

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala

## @@ -388,3 +390,36 @@ object PhysicalWindow {
     case _ => None
   }
 }
+
+object ExtractSingleColumnNullAwareAntiJoin extends JoinSelectionHelper {
+
+  // SingleColumn NullAwareAntiJoin
+  // streamedSideKeys, buildSideKeys
+  // currently these two return Seq[Expression] should have only one element
+  private type ReturnType = (Seq[Expression], Seq[Expression])
+
+  /**
+   * See. [SPARK-32290]
+   * LeftAnti(condition: Or(EqualTo(a=b), IsNull(EqualTo(a=b)))
+   * will almost certainly be planned as a Broadcast Nested Loop join,
+   * which is very time consuming because it's an O(M*N) calculation.
+   * But if it's a single column case, and buildSide data is small enough,
+   * O(M*N) calculation could be optimized into O(M) using hash lookup instead of loop lookup.
+   */
+  def unapply(join: Join): Option[ReturnType] = join match {
+    case Join(left, right, LeftAnti,
+        Some(Or(EqualTo(leftAttr: AttributeReference, rightAttr: AttributeReference),
+          IsNull(EqualTo(tmpLeft: AttributeReference, tmpRight: AttributeReference)))), _)
+      if SQLConf.get.nullAwareAntiJoinOptimizeEnabled &&
+        leftAttr.semanticEquals(tmpLeft) && rightAttr.semanticEquals(tmpRight) &&

Review comment:

> Should this also refer to `canEvaluate` as done in `ExtractEquiJoinKeys`

I think semanticEquals is necessary. Besides the NOT IN subquery that translates into LeftAnti, I can always write a SQL query like `select * from a left anti join b on a.key = b.key OR isnull(a.key = b.keyB)`, which would pass a canEvaluate check on the left or right plan but is not the case we want to optimize. I think the pattern rule here should be as strict as we can make it.
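To make the strictness argument concrete, here is a hedged pair of Scala examples (table and column names are illustrative and assume a SparkSession `spark` with registered tables `a` and `b`) contrasting the query the pattern is meant to match with a hand-written join it must not match:

```
// The NOT IN subquery form produces a LeftAnti condition whose two EqualTo expressions
// compare the same pair of attributes, so the semanticEquals guard holds and the pattern matches.
spark.sql("SELECT * FROM a WHERE key NOT IN (SELECT key FROM b)")

// A hand-written condition that wraps a different comparison inside isnull() could still pass a
// canEvaluate-style check, but its two EqualTo expressions are not semantically equal, so the
// pattern must not match it.
spark.sql("SELECT * FROM a LEFT ANTI JOIN b ON a.key = b.key OR isnull(a.key = b.keyB)")
```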
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29153: [SPARK-32310][ML][PySpark][WIP] ML params default value parity in feature and tuning
AmplabJenkins removed a comment on pull request #29153: URL: https://github.com/apache/spark/pull/29153#issuecomment-661665075 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126220/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #29117: [SPARK-32363][PYTHON][BUILD] Fix flakiness in pip package testing in Jenkins
SparkQA removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-661611654 **[Test build #126216 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126216/testReport)** for PR 29117 at commit [`51187d1`](https://github.com/apache/spark/commit/51187d1e012ea8e6259492d125037a32ea75c1f1).
[GitHub] [spark] SparkQA removed a comment on pull request #29153: [SPARK-32310][ML][PySpark][WIP] ML params default value parity in feature and tuning
SparkQA removed a comment on pull request #29153: URL: https://github.com/apache/spark/pull/29153#issuecomment-661643229 **[Test build #126220 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126220/testReport)** for PR 29153 at commit [`1586e30`](https://github.com/apache/spark/commit/1586e3079b442daf5ab5332a3d690f218df423cc).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29153: [SPARK-32310][ML][PySpark][WIP] ML params default value parity in feature and tuning
AmplabJenkins removed a comment on pull request #29153: URL: https://github.com/apache/spark/pull/29153#issuecomment-661665066 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #29117: [SPARK-32363][PYTHON][BUILD] Fix flakiness in pip package testing in Jenkins
SparkQA commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-661665187 **[Test build #126216 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126216/testReport)** for PR 29117 at commit [`51187d1`](https://github.com/apache/spark/commit/51187d1e012ea8e6259492d125037a32ea75c1f1). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #29153: [SPARK-32310][ML][PySpark][WIP] ML params default value parity in feature and tuning
SparkQA commented on pull request #29153: URL: https://github.com/apache/spark/pull/29153#issuecomment-661664851 **[Test build #126220 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126220/testReport)** for PR 29153 at commit [`1586e30`](https://github.com/apache/spark/commit/1586e3079b442daf5ab5332a3d690f218df423cc). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #29153: [SPARK-32310][ML][PySpark][WIP] ML params default value parity in feature and tuning
AmplabJenkins commented on pull request #29153: URL: https://github.com/apache/spark/pull/29153#issuecomment-661665066
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning
AmplabJenkins removed a comment on pull request #29014: URL: https://github.com/apache/spark/pull/29014#issuecomment-661664391
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AmplabJenkins removed a comment on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-661664389
[GitHub] [spark] AmplabJenkins commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning
AmplabJenkins commented on pull request #29014: URL: https://github.com/apache/spark/pull/29014#issuecomment-661664391
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
AmplabJenkins removed a comment on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-661664297 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
AmplabJenkins commented on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-661664297
[GitHub] [spark] AmplabJenkins commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AmplabJenkins commented on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-661664389
[GitHub] [spark] leanken commented on pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
leanken commented on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-661664217

> In ExtractEquiJoinKeys (in patterns.scala) there is code like this:
>
> ```
> case EqualTo(l, r) if canEvaluate(l, left) && canEvaluate(r, right) => Some((l, r))
> case EqualTo(l, r) if canEvaluate(l, right) && canEvaluate(r, left) => Some((r, l))
> ```
>
> So I am wondering if it is possible that LeftAnti join can actually have the left side as the build and the right side as streaming ? Is it possible we won't optimize that case ? Do you think it makes sense to ensure that BNLJ is not present for any not-in query ? (ie this optimization should kick in always).
>
> I am still a bit surprised that you didn't have to modify any .sql.out files because the plan would have changed from BNLJ to BHJ.

join.scala has restricted rules about the left anti join build side; it seems that only InnerLike supports both buildLeft and buildRight:

```
def canBuildLeft(joinType: JoinType): Boolean = {
  joinType match {
    case _: InnerLike | RightOuter => true
    case _ => false
  }
}

def canBuildRight(joinType: JoinType): Boolean = {
  joinType match {
    case _: InnerLike | LeftOuter | LeftSemi | LeftAnti | _: ExistenceJoin => true
    case _ => false
  }
}
```
[GitHub] [spark] AngersZhuuuu commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AngersZhuuuu commented on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-661663602 @maropu Updated so that the Spark default serde supports ArrayType/MapType/StructType, and updated the UT accordingly.
[GitHub] [spark] dongjoon-hyun commented on pull request #29175: [SPARK-32377][SQL][2.4] CaseInsensitiveMap should be deterministic for addition
dongjoon-hyun commented on pull request #29175: URL: https://github.com/apache/spark/pull/29175#issuecomment-661663853 Thank you, @HyukjinKwon.
[GitHub] [spark] SparkQA commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning
SparkQA commented on pull request #29014: URL: https://github.com/apache/spark/pull/29014#issuecomment-661663875 **[Test build #126226 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126226/testReport)** for PR 29014 at commit [`05c871f`](https://github.com/apache/spark/commit/05c871f2672db9c7fb814441c3d201c4aea654c3).
[GitHub] [spark] agrawaldevesh commented on a change in pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning
agrawaldevesh commented on a change in pull request #29014: URL: https://github.com/apache/spark/pull/29014#discussion_r457867091

## File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala

## @@ -1767,10 +1767,18 @@ private[spark] class DAGScheduler(
         // TODO: mark the executor as failed only if there were lots of fetch failures on it
         if (bmAddress != null) {
-          val hostToUnregisterOutputs = if (env.blockManager.externalShuffleServiceEnabled &&
-            unRegisterOutputOnHostOnFetchFailure) {
-            // We had a fetch failure with the external shuffle service, so we
-            // assume all shuffle data on the node is bad.
+          val externalShuffleServiceEnabled = env.blockManager.externalShuffleServiceEnabled
+          val isHostDecommissioned = taskScheduler
+            .getExecutorDecommissionInfo(bmAddress.executorId)
+            .exists(_.isHostDecommissioned)
+          // Host shuffle data is considered lost if:
+          // - If we know that the host was decommissioned
+          // - Or when `unRegisterOutputOnHostOnFetchFailure` is enabled and we had
+          //   a fetch failure with the external shuffle service, so we assume all
+          //   shuffle data on the node is bad.
+          val hostLost = isHostDecommissioned || (externalShuffleServiceEnabled &&

Review comment: Edit: @attilapiros thanks for pushing me to rework this. I think I get the main intention of your suggestion now: unRegisterOutputOnHostOnFetchFailure now applies uniformly to both normal fetch failures and decommissioning, so the description does not need to be reworked. Please take a look at the new version.
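For what it's worth, here is a hedged Scala sketch of the "applies uniformly" reading (an illustration using the variable names from the quoted snippet only, not the actual patch):

```
// Both causes (a decommissioned host, or a fetch failure served by the external shuffle
// service) are treated the same way, gated by the existing flag.
val shuffleOutputOnHostAffected = isHostDecommissioned || externalShuffleServiceEnabled
val hostToUnregisterOutputs =
  if (unRegisterOutputOnHostOnFetchFailure && shuffleOutputOnHostAffected) {
    Some(bmAddress.host)
  } else {
    None
  }
```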
[GitHub] [spark] SparkQA commented on pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
SparkQA commented on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-661663518 **[Test build #126214 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126214/testReport)** for PR 29104 at commit [`65c51bb`](https://github.com/apache/spark/commit/65c51bb21a58a5b0d3977674517c6b78d55524d7). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
SparkQA removed a comment on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-661601896 **[Test build #126214 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126214/testReport)** for PR 29104 at commit [`65c51bb`](https://github.com/apache/spark/commit/65c51bb21a58a5b0d3977674517c6b78d55524d7). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
SparkQA commented on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-661663833 **[Test build #126225 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126225/testReport)** for PR 29085 at commit [`cfecc90`](https://github.com/apache/spark/commit/cfecc90861ecae94a90e37654412fb31e934d14e). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] leanken commented on pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
leanken commented on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-661663720

> canEvaluate

join.scala has restricted rules about the left anti join build side:
```
def canBuildLeft(joinType: JoinType): Boolean = {
  joinType match {
    case _: InnerLike | RightOuter => true
    case _ => false
  }
}

def canBuildRight(joinType: JoinType): Boolean = {
  joinType match {
    case _: InnerLike | LeftOuter | LeftSemi | LeftAnti | _: ExistenceJoin => true
    case _ => false
  }
}
```
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
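As an aside, a self-contained sketch of how such build-side predicates gate planning decisions; the `JoinType` hierarchy below is a simplified stand-in (it omits `InnerLike` and `ExistenceJoin`), not Spark's real one:

```
object BuildSideRules {
  // Simplified stand-in for the join-type hierarchy, only to exercise the rules.
  sealed trait JoinType
  case object Inner extends JoinType
  case object LeftOuter extends JoinType
  case object RightOuter extends JoinType
  case object LeftSemi extends JoinType
  case object LeftAnti extends JoinType

  def canBuildLeft(joinType: JoinType): Boolean = joinType match {
    case Inner | RightOuter => true
    case _ => false
  }

  def canBuildRight(joinType: JoinType): Boolean = joinType match {
    case Inner | LeftOuter | LeftSemi | LeftAnti => true
    case _ => false
  }

  def main(args: Array[String]): Unit = {
    // A left anti join can therefore only build (broadcast) its right side.
    assert(!canBuildLeft(LeftAnti) && canBuildRight(LeftAnti))
  }
}
```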
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command
AmplabJenkins removed a comment on pull request #28840: URL: https://github.com/apache/spark/pull/28840#issuecomment-661661384 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command
AmplabJenkins commented on pull request #28840: URL: https://github.com/apache/spark/pull/28840#issuecomment-661661384 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29170: [SPARK-30876][SQL]: Optimizer fails to infer constraints within join
AmplabJenkins commented on pull request #29170: URL: https://github.com/apache/spark/pull/29170#issuecomment-661661397 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29170: [SPARK-30876][SQL]: Optimizer fails to infer constraints within join
AmplabJenkins removed a comment on pull request #29170: URL: https://github.com/apache/spark/pull/29170#issuecomment-661661397 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command
cloud-fan commented on a change in pull request #28840: URL: https://github.com/apache/spark/pull/28840#discussion_r457864581

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala

```
@@ -236,6 +236,44 @@ case class ShowFunctionsCommand(
   }
 }
 
+
+/**
+ * A command for users to refresh the persistent function.
+ * The syntax of using this command in SQL is:
+ * {{{
+ *    REFRESH FUNCTION functionName
+ * }}}
+ */
+case class RefreshFunctionCommand(
+    databaseName: Option[String],
+    functionName: String)
+  extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val catalog = sparkSession.sessionState.catalog
+    if (FunctionRegistry.builtin.functionExists(FunctionIdentifier(functionName))) {
+      throw new AnalysisException(s"Cannot refresh builtin function $functionName")
+    }
+    if (catalog.isTemporaryFunction(FunctionIdentifier(functionName, databaseName))) {
+      throw new AnalysisException(s"Cannot refresh temporary function $functionName")
+    }
+
+    val identifier = FunctionIdentifier(
+      functionName, Some(databaseName.getOrElse(catalog.getCurrentDatabase)))
+    // we only refresh the permanent function.
+    if (catalog.isPersistentFunction(identifier)) {
+      // register overwrite function.
+      val func = catalog.getFunctionMetadata(identifier)
+      catalog.registerFunction(func, true)
+    } else {
+      // clear cached function.
+      catalog.unregisterFunction(identifier)
```

Review comment: can you change it? I'd expect something like
```
catalog.unregisterFunction(identifier)
throw new NoSuchFunction...
```
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
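To make the suggested control flow concrete, a hedged, self-contained sketch; `FunctionCatalog`, `FunctionId`, and `NoSuchFunctionError` are hypothetical stand-ins rather than Spark's SessionCatalog API, and the exact exception class to throw is deliberately left elided as `NoSuchFunction...` in the comment above:

```
// Toy model of the suggested flow: refresh a persistent function, otherwise drop
// any stale registration and fail loudly instead of returning silently.
final case class FunctionId(db: String, name: String)
final class NoSuchFunctionError(id: FunctionId)
  extends RuntimeException(s"Undefined function: ${id.db}.${id.name}")

trait FunctionCatalog {
  def isPersistentFunction(id: FunctionId): Boolean
  def getFunctionMetadata(id: FunctionId): AnyRef
  def registerFunction(metadata: AnyRef, overrideIfExists: Boolean): Unit
  def unregisterFunction(id: FunctionId): Unit
}

def refreshFunction(catalog: FunctionCatalog, id: FunctionId): Unit = {
  if (catalog.isPersistentFunction(id)) {
    // Re-register from the metastore, overwriting any cached definition.
    catalog.registerFunction(catalog.getFunctionMetadata(id), overrideIfExists = true)
  } else {
    // Drop any stale registration, then surface the missing function as an error.
    catalog.unregisterFunction(id)
    throw new NoSuchFunctionError(id)
  }
}
```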
[GitHub] [spark] SparkQA commented on pull request #29170: [SPARK-30876][SQL]: Optimizer fails to infer constraints within join
SparkQA commented on pull request #29170: URL: https://github.com/apache/spark/pull/29170#issuecomment-661660909 **[Test build #126223 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126223/testReport)** for PR 29170 at commit [`6f01ac9`](https://github.com/apache/spark/commit/6f01ac9c5b85109a194d31c8afd25e00fda77f0a). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command
SparkQA commented on pull request #28840: URL: https://github.com/apache/spark/pull/28840#issuecomment-661660951 **[Test build #126224 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126224/testReport)** for PR 28840 at commit [`fc4789f`](https://github.com/apache/spark/commit/fc4789fcb5357bd1a7cfc88b76c7d76822457db7). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command
cloud-fan commented on pull request #28840: URL: https://github.com/apache/spark/pull/28840#issuecomment-661660176 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on pull request #29170: [SPARK-30876][SQL]: Optimizer fails to infer constraints within join
maropu commented on pull request #29170: URL: https://github.com/apache/spark/pull/29170#issuecomment-661659346 Thanks for the contribution, @navinvishy This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29170: [SPARK-30876][SQL]: Optimizer fails to infer constraints within join
AmplabJenkins removed a comment on pull request #29170: URL: https://github.com/apache/spark/pull/29170#issuecomment-661458294 Can one of the admins verify this patch? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #29170: [SPARK-30876][SQL]: Optimizer fails to infer constraints within join
maropu commented on a change in pull request #29170: URL: https://github.com/apache/spark/pull/29170#discussion_r457862395

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala

```
@@ -116,7 +116,8 @@ abstract class Optimizer(catalogManager: CatalogManager)
       operatorOptimizationRuleSet.filterNot(_ == InferFiltersFromConstraints)
     Batch("Operator Optimization before Inferring Filters", fixedPoint,
       rulesWithoutInferFiltersFromConstraints: _*) ::
-    Batch("Infer Filters", Once,
+    Batch("Infer Filters", fixedPoint,
+      PushDownPredicates,
       InferFiltersFromConstraints) ::
```

Review comment: Note: This rule was separated into its own batch because of https://github.com/apache/spark/pull/19149 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
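As background on the `Once` versus `fixedPoint` distinction this diff touches, a toy rule executor is sketched below; it is not Spark's `RuleExecutor`, the plan is just a `String`, and the iteration cap is arbitrary:

```
object FixedPointDemo {
  // A "rule" rewrites a plan; here the plan is a String purely for illustration.
  type Rule = String => String

  // Run a batch of rules either once, or repeatedly until the plan stops
  // changing (a fixed point), capped at maxIterations.
  def runBatch(plan: String, rules: Seq[Rule], once: Boolean, maxIterations: Int = 100): String = {
    if (once) {
      rules.foldLeft(plan)((p, rule) => rule(p))
    } else {
      var current = plan
      var changed = true
      var i = 0
      while (changed && i < maxIterations) {
        val next = rules.foldLeft(current)((p, rule) => rule(p))
        changed = next != current
        current = next
        i += 1
      }
      current
    }
  }
}
```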
[GitHub] [spark] maropu commented on pull request #29170: [SPARK-30876][SQL]: Optimizer fails to infer constraints within join
maropu commented on pull request #29170: URL: https://github.com/apache/spark/pull/29170#issuecomment-661659200 ok to test This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] agrawaldevesh commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
agrawaldevesh commented on a change in pull request #29104: URL: https://github.com/apache/spark/pull/29104#discussion_r457857768

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala

```
@@ -498,6 +547,8 @@ case class BroadcastHashJoinExec(
        |}
      | }
      |}
+     |// special case for NullAwareAntiJoin, if anyNull in streamedRow, row should be dropped.
```

Review comment: Wow! Did you take a look at that tiny code diff? Great job!!

## File path: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java

```
@@ -171,6 +171,23 @@
   private volatile MapIterator destructiveIterator = null;
   private LinkedList spillWriters = new LinkedList<>();
 
+  private boolean anyNullKeyExists = false;
+
+  public boolean inputEmpty()
+  {
+    return ((numKeys == 0) && !anyNullKeyExists);
```

Review comment: nit: The outer parens can be dropped?

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala

```
@@ -71,6 +71,16 @@ private[execution] sealed trait HashedRelation extends KnownSizeEstimation {
    */
   def keyIsUnique: Boolean
 
+  /**
+   * is input: Iterator[InternalRow] empty
+   */
+  def inputEmpty: Boolean
+
+  /**
+   * anyNull key exists in input
```

Review comment: Need more context on why this is worthwhile to consider: perhaps to the effect that it is only used in the null aware anti join.

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala

```
@@ -71,6 +71,16 @@ private[execution] sealed trait HashedRelation extends KnownSizeEstimation {
    */
   def keyIsUnique: Boolean
 
+  /**
+   * is input: Iterator[InternalRow] empty
```

Review comment: Can you expand this comment please? You can add a "Note that, the hashed relation can be empty despite the Iterator[InternalRow] being not empty since the hashed relation skips over null keys"

## File path: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java

```
@@ -171,6 +171,23 @@
   private volatile MapIterator destructiveIterator = null;
   private LinkedList spillWriters = new LinkedList<>();
 
+  private boolean anyNullKeyExists = false;
+
+  public boolean inputEmpty()
+  {
+    return ((numKeys == 0) && !anyNullKeyExists);
+  }
+
+  public boolean isAnyNullKeyExists()
+  {
+    return anyNullKeyExists;
+  }
+
+  public void setAnyNullKeyExists(boolean anyNullKeyExists)
+  {
+    this.anyNullKeyExists = anyNullKeyExists;
```

Review comment: So just making sure I am reading this code right: There is no extra scan of the rows done to know if there are no null keys.

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala

```
@@ -388,3 +390,36 @@ object PhysicalWindow {
     case _ => None
   }
 }
+
+object ExtractSingleColumnNullAwareAntiJoin extends JoinSelectionHelper {
+
+  // SingleColumn NullAwareAntiJoin
+  // streamedSideKeys, buildSideKeys
+  // currently these two return Seq[Expression] should have only one element
+  private type ReturnType = (Seq[Expression], Seq[Expression])
+
+  /**
+   * See. [SPARK-32290]
+   * LeftAnti(condition: Or(EqualTo(a=b), IsNull(EqualTo(a=b)))
+   * will almost certainly be planned as a Broadcast Nested Loop join,
+   * which is very time consuming because it's an O(M*N) calculation.
+   * But if it's a single column case, and buildSide data is small enough,
+   * O(M*N) calculation could be optimized into O(M) using hash lookup instead of loop lookup.
+   */
+  def unapply(join: Join): Option[ReturnType] = join match {
+    case Join(left, right, LeftAnti,
+        Some(Or(EqualTo(leftAttr: AttributeReference, rightAttr: AttributeReference),
+          IsNull(EqualTo(tmpLeft: AttributeReference, tmpRight: AttributeReference)))), _)
+        if SQLConf.get.nullAwareAntiJoinOptimizeEnabled &&
+          leftAttr.semanticEquals(tmpLeft) && rightAttr.semanticEquals(tmpRight) &&
```

Review comment: Should this also refer to `canEvaluate` as done in `ExtractEquiJoinKeys`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
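For intuition about the optimization under review, a self-contained sketch of single-column null-aware anti-join semantics over plain Scala collections (not Spark's `HashedRelation`); the O(M) claim corresponds to the per-row hash-set lookup:

```
// Null-aware anti join for a single key column, matching the semantics of
//   LeftAnti ON (a = b OR isnull(a = b))
// i.e. the plan shape produced by `a NOT IN (SELECT b ...)`.
// The build side is hashed once; each streamed row is then a constant-time lookup.
def nullAwareAntiJoin(streamed: Seq[Option[Int]], build: Seq[Option[Int]]): Seq[Option[Int]] = {
  if (build.isEmpty) {
    // Empty build side: nothing can match, so every streamed row is kept.
    streamed
  } else if (build.contains(None)) {
    // A null key on the build side matches every streamed row: empty result.
    Seq.empty
  } else {
    val keys: Set[Int] = build.flatten.toSet
    streamed.filter {
      // A null streamed key always matches via the IsNull branch, so it is dropped.
      case None => false
      // A non-null key is kept only if it is absent from the build-side hash set.
      case Some(k) => !keys.contains(k)
    }
  }
}
```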
[GitHub] [spark] SparkQA commented on pull request #28968: [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode
SparkQA commented on pull request #28968: URL: https://github.com/apache/spark/pull/28968#issuecomment-661658276 **[Test build #126222 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126222/testReport)** for PR 28968 at commit [`a78fd43`](https://github.com/apache/spark/commit/a78fd4314ba39d1feb63ba1539ac9a2acf40de77). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28968: [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode
AmplabJenkins removed a comment on pull request #28968: URL: https://github.com/apache/spark/pull/28968#issuecomment-661656050 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #28968: [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode
AmplabJenkins commented on pull request #28968: URL: https://github.com/apache/spark/pull/28968#issuecomment-661656050 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #28968: [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode
HyukjinKwon commented on pull request #28968: URL: https://github.com/apache/spark/pull/28968#issuecomment-661655744 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition
AmplabJenkins removed a comment on pull request #29172: URL: https://github.com/apache/spark/pull/29172#issuecomment-661654861 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126219/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition
AmplabJenkins removed a comment on pull request #29172: URL: https://github.com/apache/spark/pull/29172#issuecomment-661654854 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition
SparkQA removed a comment on pull request #29172: URL: https://github.com/apache/spark/pull/29172#issuecomment-661632576 **[Test build #126219 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126219/testReport)** for PR 29172 at commit [`a6dc25d`](https://github.com/apache/spark/commit/a6dc25dcacd34b315bf67d8bb1d28e52f2dec3bb). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition
AmplabJenkins commented on pull request #29172: URL: https://github.com/apache/spark/pull/29172#issuecomment-661654854 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition
SparkQA commented on pull request #29172: URL: https://github.com/apache/spark/pull/29172#issuecomment-661654645 **[Test build #126219 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126219/testReport)** for PR 29172 at commit [`a6dc25d`](https://github.com/apache/spark/commit/a6dc25dcacd34b315bf67d8bb1d28e52f2dec3bb). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #29166: [SPARK-32372][SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict plan
maropu commented on a change in pull request #29166: URL: https://github.com/apache/spark/pull/29166#discussion_r457855360

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

```
@@ -1237,20 +1249,44 @@ class Analyzer(
       if (conflictPlans.isEmpty) {
         right
       } else {
-        val attributeRewrites = AttributeMap(conflictPlans.flatMap {
-          case (oldRelation, newRelation) => oldRelation.output.zip(newRelation.output)})
-        val conflictPlanMap = conflictPlans.toMap
-        // transformDown so that we can replace all the old Relations in one turn due to
-        // the reason that `conflictPlans` are also collected in pre-order.
-        right transformDown {
-          case r => conflictPlanMap.getOrElse(r, r)
-        } transformUp {
-          case other => other transformExpressions {
+        rewritePlan(right, conflictPlans.toMap)._1
+      }
+    }
+
+    private def rewritePlan(plan: LogicalPlan, conflictPlanMap: Map[LogicalPlan, LogicalPlan])
+      : (LogicalPlan, mutable.ArrayBuffer[(Attribute, Attribute)]) = {
```

Review comment: nit: In most cases of recursive calls, the return type would be `Seq` instead of `ArrayBuffer`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning
AmplabJenkins removed a comment on pull request #29014: URL: https://github.com/apache/spark/pull/29014#issuecomment-661650757 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126213/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #29166: [SPARK-32372][SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict plan
maropu commented on a change in pull request #29166: URL: https://github.com/apache/spark/pull/29166#discussion_r457854335

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

```
@@ -1237,20 +1249,44 @@ class Analyzer(
       if (conflictPlans.isEmpty) {
         right
       } else {
-        val attributeRewrites = AttributeMap(conflictPlans.flatMap {
-          case (oldRelation, newRelation) => oldRelation.output.zip(newRelation.output)})
-        val conflictPlanMap = conflictPlans.toMap
-        // transformDown so that we can replace all the old Relations in one turn due to
-        // the reason that `conflictPlans` are also collected in pre-order.
-        right transformDown {
-          case r => conflictPlanMap.getOrElse(r, r)
-        } transformUp {
-          case other => other transformExpressions {
+        rewritePlan(right, conflictPlans.toMap)._1
+      }
+    }
+
+    private def rewritePlan(plan: LogicalPlan, conflictPlanMap: Map[LogicalPlan, LogicalPlan])
+      : (LogicalPlan, mutable.ArrayBuffer[(Attribute, Attribute)]) = {
+      val attrMapping = new mutable.ArrayBuffer[(Attribute, Attribute)]()
+      if (conflictPlanMap.contains(plan)) {
+        // If the plan is the one that conflict the with left one, we'd
+        // just replace it with the new plan and collect the rewrite
+        // attributes for the parent node.
+        val newRelation = conflictPlanMap(plan)
+        attrMapping ++= plan.output.zip(newRelation.output)
+        newRelation -> attrMapping
+      } else {
+        var newPlan = plan.mapChildren { child =>
```

Review comment: nit: `val` is better. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
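A toy illustration of the nit above: the same recursion can thread an immutable `Seq` accumulator through the return value instead of exposing `mutable.ArrayBuffer`; `Attr` and `Node` are simplified stand-ins, not the Analyzer's types:

```
// Simplified stand-ins for a plan tree and its attributes, only to show the
// recursion shape; none of this is Spark's real Analyzer code.
final case class Attr(name: String, id: Int)
final case class Node(label: String, output: Seq[Attr], children: Seq[Node])

// Rewrite any node found in `conflicts`, returning the new tree together with
// the (old attribute -> new attribute) pairs collected along the way as a Seq.
def rewritePlan(plan: Node, conflicts: Map[Node, Node]): (Node, Seq[(Attr, Attr)]) = {
  conflicts.get(plan) match {
    case Some(replacement) =>
      (replacement, plan.output.zip(replacement.output))
    case None =>
      val rewritten = plan.children.map(child => rewritePlan(child, conflicts))
      val newChildren = rewritten.map(_._1)
      val attrMapping = rewritten.flatMap(_._2)
      (plan.copy(children = newChildren), attrMapping)
  }
}
```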
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning
AmplabJenkins removed a comment on pull request #29014: URL: https://github.com/apache/spark/pull/29014#issuecomment-661650753 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning
AmplabJenkins commented on pull request #29014: URL: https://github.com/apache/spark/pull/29014#issuecomment-661650753 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor
AmplabJenkins removed a comment on pull request #29032: URL: https://github.com/apache/spark/pull/29032#issuecomment-661649619 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126212/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning
SparkQA removed a comment on pull request #29014: URL: https://github.com/apache/spark/pull/29014#issuecomment-661598793 **[Test build #126213 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126213/testReport)** for PR 29014 at commit [`54c2235`](https://github.com/apache/spark/commit/54c2235e480e6673fb8ab84341a338d68970c3f3). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor
AmplabJenkins removed a comment on pull request #29032: URL: https://github.com/apache/spark/pull/29032#issuecomment-661649608 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning
SparkQA commented on pull request #29014: URL: https://github.com/apache/spark/pull/29014#issuecomment-661649994 **[Test build #126213 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126213/testReport)** for PR 29014 at commit [`54c2235`](https://github.com/apache/spark/commit/54c2235e480e6673fb8ab84341a338d68970c3f3). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class ExecutorProcessLost(` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor
AmplabJenkins commented on pull request #29032: URL: https://github.com/apache/spark/pull/29032#issuecomment-661649608 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition
AmplabJenkins removed a comment on pull request #29172: URL: https://github.com/apache/spark/pull/29172#issuecomment-661648937 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition
AmplabJenkins commented on pull request #29172: URL: https://github.com/apache/spark/pull/29172#issuecomment-661648937 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on pull request #29152: [SPARK-32356][SQL] Forbid create view with null type
cloud-fan commented on pull request #29152: URL: https://github.com/apache/spark/pull/29152#issuecomment-661648916 LGTM if tests pass. One last question: does Hive also fail to create a view if a column is of null type? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition
SparkQA removed a comment on pull request #29172: URL: https://github.com/apache/spark/pull/29172#issuecomment-661503811 **[Test build #126209 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126209/testReport)** for PR 29172 at commit [`27bd6d5`](https://github.com/apache/spark/commit/27bd6d55ff85e4deecbb36eb04185382cca256d3). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor
SparkQA removed a comment on pull request #29032: URL: https://github.com/apache/spark/pull/29032#issuecomment-661598746 **[Test build #126212 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126212/testReport)** for PR 29032 at commit [`71633f0`](https://github.com/apache/spark/commit/71633f0e433b4c81ffd626f1522327b0d6d21759). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition
SparkQA commented on pull request #29172: URL: https://github.com/apache/spark/pull/29172#issuecomment-661648372 **[Test build #126209 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126209/testReport)** for PR 29172 at commit [`27bd6d5`](https://github.com/apache/spark/commit/27bd6d55ff85e4deecbb36eb04185382cca256d3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor
SparkQA commented on pull request #29032: URL: https://github.com/apache/spark/pull/29032#issuecomment-661648257 **[Test build #126212 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126212/testReport)** for PR 29032 at commit [`71633f0`](https://github.com/apache/spark/commit/71633f0e433b4c81ffd626f1522327b0d6d21759). * This patch **fails PySpark pip packaging tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class ExecutorDecommissionInfo(message: String, isHostDecommissioned: Boolean)` * ` case class DecommissionExecutor(executorId: String, decommissionInfo: ExecutorDecommissionInfo)` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29175: [SPARK-32377][SQL][2.4] CaseInsensitiveMap should be deterministic for addition
AmplabJenkins removed a comment on pull request #29175: URL: https://github.com/apache/spark/pull/29175#issuecomment-661645796 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29175: [SPARK-32377][SQL][2.4] CaseInsensitiveMap should be deterministic for addition
AmplabJenkins commented on pull request #29175: URL: https://github.com/apache/spark/pull/29175#issuecomment-661645796 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29175: [SPARK-32377][SQL][2.4] CaseInsensitiveMap should be deterministic for addition
SparkQA commented on pull request #29175: URL: https://github.com/apache/spark/pull/29175#issuecomment-661645518 **[Test build #126221 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126221/testReport)** for PR 29175 at commit [`91683e2`](https://github.com/apache/spark/commit/91683e2e161a150b0521c8289234a233b73cd98c). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on pull request #28422: [SPARK-17604][SS] FileStreamSource: provide a new option to have retention on input files
HeartSaVioR commented on pull request #28422: URL: https://github.com/apache/spark/pull/28422#issuecomment-661645101 FYI, I've started a discussion about this on the dev@ mailing list: how to deal with the "latestFirst" option and metadata growth. https://lists.apache.org/thread.html/r08e3a8d7df74354b38d19ffdebe1afe7fa73c2f611f0a812a867dffb%40%3Cdev.spark.apache.org%3E This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29175: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition
dongjoon-hyun commented on a change in pull request #29175: URL: https://github.com/apache/spark/pull/29175#discussion_r457847560

## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/CaseInsensitiveMapSuite.scala

```
@@ -0,0 +1,30 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.SparkFunSuite
+
+class CaseInsensitiveMapSuite extends SparkFunSuite {
```

Review comment: I want to keep this test in the `catalyst` module, but cannot find a proper test suite. I'm open to any suggestion for a better test suite if one exists. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
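For what such a suite could look like, a hedged sketch follows; the class name and assertions are hypothetical, and the expected behavior (the most recently added value wins regardless of key casing) is an assumption based on the PR title rather than the actual diff:

```
import org.apache.spark.SparkFunSuite
import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap

// Hypothetical test body; the real CaseInsensitiveMapSuite in the PR may differ.
class CaseInsensitiveMapDeterminismSuite extends SparkFunSuite {
  test("SPARK-32377: addition is deterministic regardless of key casing") {
    var m: Map[String, String] = CaseInsensitiveMap(Map.empty[String, String])
    Seq("paTh" -> "1", "PATH" -> "2", "Path" -> "3", "path" -> "4").foreach { kv =>
      m = m + kv
      // Assumption: the value added last is what a case-insensitive lookup sees.
      assert(m.get("path").contains(kv._2))
    }
  }
}
```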