[GitHub] [spark] c21 commented on a change in pull request #32528: [SPARK-35350][SQL] Add code-gen for left semi sort merge join

2021-05-13 Thread GitBox


c21 commented on a change in pull request #32528:
URL: https://github.com/apache/spark/pull/32528#discussion_r631610336



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
##
@@ -724,9 +749,32 @@ case class SortMergeJoinExec(
""".stripMargin
 }
 
+lazy val semiJoin = {

Review comment:
   @maropu - yes I was thinking at the first place but worried about number 
of parameters to be too many. Refined the code a bit and updated now.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] c21 commented on a change in pull request #32528: [SPARK-35350][SQL] Add code-gen for left semi sort merge join

2021-05-13 Thread GitBox


c21 commented on a change in pull request #32528:
URL: https://github.com/apache/spark/pull/32528#discussion_r631610541



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
##
@@ -424,8 +424,18 @@ case class SortMergeJoinExec(
 // A list to hold all matched rows from buffered side.
 val clsName = classOf[ExternalAppendOnlyUnsafeRowArray].getName
 
+// Flag to only buffer first matched row, to avoid buffering unnecessary 
rows.
+val onlyBufferFirstMatchedRow = (joinType, condition) match {
+  case (LeftSemi, None) => true
+  case _ => false
+}
+val inMemoryThreshold =
+  if (onlyBufferFirstMatchedRow) {

Review comment:
   Good call. Actually the non-code-gen path can also depend on this, so I 
make it just a `val` now.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] c21 commented on pull request #32528: [SPARK-35350][SQL] Add code-gen for left semi sort merge join

2021-05-13 Thread GitBox


c21 commented on pull request #32528:
URL: https://github.com/apache/spark/pull/32528#issuecomment-840363206


   To ease for review, the change for all plan files is used by followed 
command:
   
   ```
   SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly *PlanStabilitySuite" 
   SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly 
*PlanStabilityWithStatsSuite"
   ```
   
   None of them are updated manually.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #32531: [SPARK-35394][K8S][BUILD] Move kubernetes-client.version to root pom file

2021-05-13 Thread GitBox


dongjoon-hyun commented on pull request #32531:
URL: https://github.com/apache/spark/pull/32531#issuecomment-840363273


   The Python module UT failure is SPARK-35392 and will be fixed in master 
branch via https://github.com/apache/spark/pull/32533 .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32531: [SPARK-35394][K8S][BUILD] Move kubernetes-client.version to root pom file

2021-05-13 Thread GitBox


SparkQA commented on pull request #32531:
URL: https://github.com/apache/spark/pull/32531#issuecomment-840365448


   **[Test build #138491 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138491/testReport)**
 for PR 32531 at commit 
[`c6ce0b7`](https://github.com/apache/spark/commit/c6ce0b720c114b962e73af1a989eb113df3ec70a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #32531: [SPARK-35394][K8S][BUILD] Move kubernetes-client.version to root pom file

2021-05-13 Thread GitBox


SparkQA removed a comment on pull request #32531:
URL: https://github.com/apache/spark/pull/32531#issuecomment-840290823


   **[Test build #138491 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138491/testReport)**
 for PR 32531 at commit 
[`c6ce0b7`](https://github.com/apache/spark/commit/c6ce0b720c114b962e73af1a989eb113df3ec70a).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #32531: [SPARK-35394][K8S][BUILD] Move kubernetes-client.version to root pom file

2021-05-13 Thread GitBox


dongjoon-hyun commented on pull request #32531:
URL: https://github.com/apache/spark/pull/32531#issuecomment-840365894


   Hi, @HyukjinKwon . Could you review this PR please?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32532: [SPARK-35384][SQL][FOLLOWUP] Move `HashMap.get` out of `InvokeLike.invoke`

2021-05-13 Thread GitBox


SparkQA commented on pull request #32532:
URL: https://github.com/apache/spark/pull/32532#issuecomment-840373603






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32533: [SPARK-35392][ML][PYTHON] Remove Flaky GMM Test in ml/clustering.py

2021-05-13 Thread GitBox


SparkQA commented on pull request #32533:
URL: https://github.com/apache/spark/pull/32533#issuecomment-840373369






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #32528: [SPARK-35350][SQL] Add code-gen for left semi sort merge join

2021-05-13 Thread GitBox


cloud-fan commented on a change in pull request #32528:
URL: https://github.com/apache/spark/pull/32528#discussion_r631619359



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
##
@@ -679,60 +704,124 @@ case class SortMergeJoinExec(
   (evaluateVariables(streamedVars), "")
 }
 
-val thisPlan = ctx.addReferenceObj("plan", this)
-val eagerCleanup = s"$thisPlan.cleanupResources();"
-
-lazy val innerJoin =
+val beforeLoop =
   s"""
- |while (findNextJoinRows($streamedInput, $bufferedInput)) {
- |  ${streamedVarDecl.mkString("\n")}
- |  ${beforeLoop.trim}
- |  scala.collection.Iterator $iterator = 
$matches.generateIterator();
- |  while ($iterator.hasNext()) {
- |InternalRow $bufferedRow = (InternalRow) $iterator.next();
- |${condCheck.trim}
- |$numOutput.add(1);
- |${consume(ctx, resultVars)}
- |  }
- |  if (shouldStop()) return;
- |}
- |$eagerCleanup
- """.stripMargin
-
-lazy val outerJoin = {
-  val hasOutputRow = ctx.freshName("hasOutputRow")
+ |${streamedVarDecl.mkString("\n")}
+ |${streamedBeforeLoop.trim}
+ |scala.collection.Iterator $iterator = 
$matches.generateIterator();
+   """.stripMargin
+val outputRow =
   s"""
- |while ($streamedInput.hasNext()) {
- |  findNextJoinRows($streamedInput, $bufferedInput);
- |  ${streamedVarDecl.mkString("\n")}
- |  ${beforeLoop.trim}
- |  scala.collection.Iterator $iterator = 
$matches.generateIterator();
- |  boolean $hasOutputRow = false;
- |
- |  // the last iteration of this loop is to emit an empty row if 
there is no matched rows.
- |  while ($iterator.hasNext() || !$hasOutputRow) {
- |InternalRow $bufferedRow = $iterator.hasNext() ?
- |  (InternalRow) $iterator.next() : null;
- |${condCheck.trim}
- |$hasOutputRow = true;
- |$numOutput.add(1);
- |${consume(ctx, resultVars)}
- |  }
- |  if (shouldStop()) return;
- |}
- |$eagerCleanup
+ |$numOutput.add(1);
+ |${consume(ctx, resultVars)}
""".stripMargin
-}
+val findNextJoinRows = s"findNextJoinRows($streamedInput, $bufferedInput)"
+val thisPlan = ctx.addReferenceObj("plan", this)
+val eagerCleanup = s"$thisPlan.cleanupResources();"
 
 joinType match {
-  case _: InnerLike => innerJoin
-  case LeftOuter | RightOuter => outerJoin
+  case _: InnerLike =>
+codegenInner(findNextJoinRows, beforeLoop, iterator, bufferedRow, 
condCheck, outputRow,

Review comment:
   shall we pass `beforeLoop.trim` so that we don't need to do it in all 
the 3 methods?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader

2021-05-13 Thread GitBox


AmplabJenkins commented on pull request #32515:
URL: https://github.com/apache/spark/pull/32515#issuecomment-840374735


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138482/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32531: [SPARK-35394][K8S][BUILD] Move kubernetes-client.version to root pom file

2021-05-13 Thread GitBox


AmplabJenkins commented on pull request #32531:
URL: https://github.com/apache/spark/pull/32531#issuecomment-840374732


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138491/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32532: [SPARK-35384][SQL][FOLLOWUP] Move `HashMap.get` out of `InvokeLike.invoke`

2021-05-13 Thread GitBox


AmplabJenkins commented on pull request #32532:
URL: https://github.com/apache/spark/pull/32532#issuecomment-840374733


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43019/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #32528: [SPARK-35350][SQL] Add code-gen for left semi sort merge join

2021-05-13 Thread GitBox


cloud-fan commented on a change in pull request #32528:
URL: https://github.com/apache/spark/pull/32528#discussion_r631620203



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
##
@@ -679,60 +704,124 @@ case class SortMergeJoinExec(
   (evaluateVariables(streamedVars), "")
 }
 
-val thisPlan = ctx.addReferenceObj("plan", this)
-val eagerCleanup = s"$thisPlan.cleanupResources();"
-
-lazy val innerJoin =
+val beforeLoop =
   s"""
- |while (findNextJoinRows($streamedInput, $bufferedInput)) {
- |  ${streamedVarDecl.mkString("\n")}
- |  ${beforeLoop.trim}
- |  scala.collection.Iterator $iterator = 
$matches.generateIterator();
- |  while ($iterator.hasNext()) {
- |InternalRow $bufferedRow = (InternalRow) $iterator.next();
- |${condCheck.trim}
- |$numOutput.add(1);
- |${consume(ctx, resultVars)}
- |  }
- |  if (shouldStop()) return;
- |}
- |$eagerCleanup
- """.stripMargin
-
-lazy val outerJoin = {
-  val hasOutputRow = ctx.freshName("hasOutputRow")
+ |${streamedVarDecl.mkString("\n")}
+ |${streamedBeforeLoop.trim}
+ |scala.collection.Iterator $iterator = 
$matches.generateIterator();
+   """.stripMargin
+val outputRow =
   s"""
- |while ($streamedInput.hasNext()) {
- |  findNextJoinRows($streamedInput, $bufferedInput);
- |  ${streamedVarDecl.mkString("\n")}
- |  ${beforeLoop.trim}
- |  scala.collection.Iterator $iterator = 
$matches.generateIterator();
- |  boolean $hasOutputRow = false;
- |
- |  // the last iteration of this loop is to emit an empty row if 
there is no matched rows.
- |  while ($iterator.hasNext() || !$hasOutputRow) {
- |InternalRow $bufferedRow = $iterator.hasNext() ?
- |  (InternalRow) $iterator.next() : null;
- |${condCheck.trim}
- |$hasOutputRow = true;
- |$numOutput.add(1);
- |${consume(ctx, resultVars)}
- |  }
- |  if (shouldStop()) return;
- |}
- |$eagerCleanup
+ |$numOutput.add(1);
+ |${consume(ctx, resultVars)}
""".stripMargin
-}
+val findNextJoinRows = s"findNextJoinRows($streamedInput, $bufferedInput)"
+val thisPlan = ctx.addReferenceObj("plan", this)
+val eagerCleanup = s"$thisPlan.cleanupResources();"
 
 joinType match {
-  case _: InnerLike => innerJoin
-  case LeftOuter | RightOuter => outerJoin
+  case _: InnerLike =>
+codegenInner(findNextJoinRows, beforeLoop, iterator, bufferedRow, 
condCheck, outputRow,
+  eagerCleanup)
+  case LeftOuter | RightOuter =>
+codegenOuter(streamedInput, findNextJoinRows, beforeLoop, iterator, 
bufferedRow, condCheck,
+  ctx.freshName("hasOutputRow"), outputRow, eagerCleanup)
+  case LeftSemi =>
+codegenSemi(findNextJoinRows, beforeLoop, iterator, bufferedRow, 
condCheck,
+  ctx.freshName("hasOutputRow"), outputRow, eagerCleanup)
   case x =>
 throw new IllegalArgumentException(
   s"SortMergeJoin.doProduce should not take $x as the JoinType")
 }
   }
 
+  /**
+   * Generates the code for Inner join.
+   */
+  private def codegenInner(
+  findNextJoinRows: String,
+  beforeLoop: String,
+  matchIterator: String,
+  bufferedRow: String,
+  conditionCheck: String,
+  outputRow: String,
+  eagerCleanup: String): String = {
+s"""
+   |while ($findNextJoinRows) {
+   |  ${beforeLoop.trim}
+   |  while ($matchIterator.hasNext()) {
+   |InternalRow $bufferedRow = (InternalRow) $matchIterator.next();
+   |${conditionCheck.trim}
+   |$outputRow
+   |  }
+   |  if (shouldStop()) return;
+   |}
+   |$eagerCleanup
+ """.stripMargin
+  }
+
+  /**
+   * Generates the code for Left or Right Outer join.
+   */
+  private def codegenOuter(
+  streamedInput: String,
+  findNextJoinRows: String,
+  beforeLoop: String,
+  matchIterator: String,
+  bufferedRow: String,
+  conditionCheck: String,
+  hasOutputRow: String,
+  outputRow: String,
+  eagerCleanup: String): String = {
+s"""
+   |while ($streamedInput.hasNext()) {
+   |  $findNextJoinRows;
+   |  ${beforeLoop.trim}
+   |  boolean $hasOutputRow = false;
+   |
+   |  // the last iteration of this loop is to emit an empty row if there 
is no matched rows.
+   |  while ($matchIterator.hasNext() || !$hasOutputRow) {
+   |InternalRow $bufferedRow = $matchIterator.hasNext() ?
+   |  (InternalRow) $matchIterator.next() : null;
+   |${conditionCheck.trim}
+   |$hasOutputRow = true;
+   |$outputRow
+   |  }
+   |  if (shouldStop()) return;
+   |}
+ 

[GitHub] [spark] AmplabJenkins commented on pull request #32533: [SPARK-35392][ML][PYTHON] Remove Flaky GMM Test in ml/clustering.py

2021-05-13 Thread GitBox


AmplabJenkins commented on pull request #32533:
URL: https://github.com/apache/spark/pull/32533#issuecomment-840374731






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32533: [SPARK-35392][ML][PYTHON] Remove Flaky GMM Test in ml/clustering.py

2021-05-13 Thread GitBox


AmplabJenkins removed a comment on pull request #32533:
URL: https://github.com/apache/spark/pull/32533#issuecomment-840374731






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32532: [SPARK-35384][SQL][FOLLOWUP] Move `HashMap.get` out of `InvokeLike.invoke`

2021-05-13 Thread GitBox


AmplabJenkins removed a comment on pull request #32532:
URL: https://github.com/apache/spark/pull/32532#issuecomment-840374733


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43019/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32531: [SPARK-35394][K8S][BUILD] Move kubernetes-client.version to root pom file

2021-05-13 Thread GitBox


AmplabJenkins removed a comment on pull request #32531:
URL: https://github.com/apache/spark/pull/32531#issuecomment-840374732


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138491/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader

2021-05-13 Thread GitBox


AmplabJenkins removed a comment on pull request #32515:
URL: https://github.com/apache/spark/pull/32515#issuecomment-840374735


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138482/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #32411: [SPARK-28551][SQL] CTAS with LOCATION should not allow to a non-empty directory.

2021-05-13 Thread GitBox


cloud-fan commented on pull request #32411:
URL: https://github.com/apache/spark/pull/32411#issuecomment-840376552


   The new behavior makes sense to me. I'm wondering what shall we do for data 
source tables. e.g.
   ```
   CREATE TABLE t USING jdbc OPTIONS (... dbtable="foo") AS SELECT ...
   ```
   The "foo" in JDBC is not empty, what shall we do?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32478: [SPARK-35063][SQL] Group exception messages in sql/catalyst

2021-05-13 Thread GitBox


SparkQA commented on pull request #32478:
URL: https://github.com/apache/spark/pull/32478#issuecomment-840376956


   **[Test build #138501 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138501/testReport)**
 for PR 32478 at commit 
[`4509cbb`](https://github.com/apache/spark/commit/4509cbbbe00debe19b670c463c85fb7fe295c8c3).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32528: [SPARK-35350][SQL] Add code-gen for left semi sort merge join

2021-05-13 Thread GitBox


SparkQA commented on pull request #32528:
URL: https://github.com/apache/spark/pull/32528#issuecomment-840376871


   **[Test build #138500 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138500/testReport)**
 for PR 32528 at commit 
[`979c759`](https://github.com/apache/spark/commit/979c75928b0c2ca95fdf4820531a1cb6461129a6).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32292: [SPARK-35162][SQL] New SQL functions: TRY_ADD/TRY_DIVIDE

2021-05-13 Thread GitBox


SparkQA commented on pull request #32292:
URL: https://github.com/apache/spark/pull/32292#issuecomment-840377216


   **[Test build #138502 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138502/testReport)**
 for PR 32292 at commit 
[`91bdc4c`](https://github.com/apache/spark/commit/91bdc4cb30d911684cc693e9a7cd8b674e53e2e1).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] vinodkc commented on a change in pull request #32411: [SPARK-28551][SQL] CTAS with LOCATION should not allow to a non-empty directory.

2021-05-13 Thread GitBox


vinodkc commented on a change in pull request #32411:
URL: https://github.com/apache/spark/pull/32411#discussion_r631622569



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala
##
@@ -96,4 +98,23 @@ object DataWritingCommand {
   sparkContext.getLocalProperty(SQLExecution.EXECUTION_ID_KEY),
   metrics.values.toSeq)
   }
+
+  def assertEmptyRootPath(tablePath: URI, saveMode: SaveMode, sparkSession: 
SparkSession) {
+if (saveMode != SaveMode.Overwrite &&
+  !sparkSession.sqlContext.conf.allowCreateTableAsSelectNonEmptyLocation) {
+  val filePath = new org.apache.hadoop.fs.Path(tablePath)
+  val fs = 
filePath.getFileSystem(sparkSession.sparkContext.hadoopConfiguration)
+  if(fs != null && fs.exists(filePath)) {
+val locStats = fs.getFileStatus(filePath)
+if(locStats != null && locStats.isDirectory) {
+  val lStats = fs.listStatus(filePath)
+  if(lStats != null && lStats.length != 0) {
+throw new AnalysisException(
+  s"CREATE-TABLE-AS-SELECT cannot create table" +

Review comment:
   In Hive , 
https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java#L408
   Error message is `CREATE-TABLE-AS-SELECT cannot create table with location 
to a non-empty directory.`
   To be consistent, shall we show this message?
"CREATE-TABLE-AS-SELECT cannot create table with location to a non-empty 
directory.  To disable the behavior, set 
'spark.sql.legacy.allowNonEmptyLocationInCTAS to true"
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #32496: [SPARK-35207][SQL] Normalize hash function behavior with negative zero

2021-05-13 Thread GitBox


cloud-fan commented on a change in pull request #32496:
URL: https://github.com/apache/spark/pull/32496#discussion_r631622732



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala
##
@@ -369,11 +369,25 @@ abstract class HashExpression[E] extends Expression {
   protected def genHashBoolean(input: String, result: String): String =
 genHashInt(s"$input ? 1 : 0", result)
 
-  protected def genHashFloat(input: String, result: String): String =
-genHashInt(s"Float.floatToIntBits($input)", result)
+  protected def genHashFloat(input: String, result: String): String = {
+s"""
+   |if(Float.floatToIntBits($input) == Float.floatToIntBits(-0.0f)) {

Review comment:
   +1




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #32496: [SPARK-35207][SQL] Normalize hash function behavior with negative zero

2021-05-13 Thread GitBox


cloud-fan commented on a change in pull request #32496:
URL: https://github.com/apache/spark/pull/32496#discussion_r631622732



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala
##
@@ -369,11 +369,25 @@ abstract class HashExpression[E] extends Expression {
   protected def genHashBoolean(input: String, result: String): String =
 genHashInt(s"$input ? 1 : 0", result)
 
-  protected def genHashFloat(input: String, result: String): String =
-genHashInt(s"Float.floatToIntBits($input)", result)
+  protected def genHashFloat(input: String, result: String): String = {
+s"""
+   |if(Float.floatToIntBits($input) == Float.floatToIntBits(-0.0f)) {

Review comment:
   +1, `$input == 0.0f` should be good enough.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #32496: [SPARK-35207][SQL] Normalize hash function behavior with negative zero

2021-05-13 Thread GitBox


cloud-fan commented on a change in pull request #32496:
URL: https://github.com/apache/spark/pull/32496#discussion_r631623223



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala
##
@@ -654,4 +654,30 @@ class WholeStageCodegenSuite extends QueryTest with 
SharedSparkSession
   }
 }
   }
+
+  test("SPARK-35207: Compute hash consistent between -0.0 and 0.0 doubles with 
Codegen") {

Review comment:
   +1, if you use `checkEvaluation`, both codegen and interpreted are 
checked.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32410: [SPARK-35286][SQL] Replace SessionState.start with SessionState.setCurrentSessionState

2021-05-13 Thread GitBox


SparkQA commented on pull request #32410:
URL: https://github.com/apache/spark/pull/32410#issuecomment-840378127


   **[Test build #138496 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138496/testReport)**
 for PR 32410 at commit 
[`4bca8ec`](https://github.com/apache/spark/commit/4bca8ecaec066ef19d04a12e134ba830320a2e0f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #32410: [SPARK-35286][SQL] Replace SessionState.start with SessionState.setCurrentSessionState

2021-05-13 Thread GitBox


SparkQA removed a comment on pull request #32410:
URL: https://github.com/apache/spark/pull/32410#issuecomment-840310594


   **[Test build #138496 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138496/testReport)**
 for PR 32410 at commit 
[`4bca8ec`](https://github.com/apache/spark/commit/4bca8ecaec066ef19d04a12e134ba830320a2e0f).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32410: [SPARK-35286][SQL] Replace SessionState.start with SessionState.setCurrentSessionState

2021-05-13 Thread GitBox


AmplabJenkins commented on pull request #32410:
URL: https://github.com/apache/spark/pull/32410#issuecomment-840378916


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138496/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #32482: [SPARK-35332][SQL] Make cache plan disable configs configurable

2021-05-13 Thread GitBox


cloud-fan commented on a change in pull request #32482:
URL: https://github.com/apache/spark/pull/32482#discussion_r631624087



##
File path: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala
##
@@ -1554,4 +1554,65 @@ class CachedTableSuite extends QueryTest with 
SQLTestUtils
   assert(!spark.catalog.isCached(viewName))
 }
   }
+
+  test("SPARK-35332: Make cache plan disable configs configurable - check 
AQE") {
+withSQLConf(SQLConf.SHUFFLE_PARTITIONS.key -> "2",
+  SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_NUM.key -> "1",
+  SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true") {
+
+  withTempView("t1", "t2", "t3") {
+withSQLConf(SQLConf.CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING.key -> 
"false") {
+  sql("CACHE TABLE t1 as SELECT /*+ REPARTITION */ * FROM values(1) as 
t(c)")
+  assert(spark.table("t1").rdd.partitions.length == 2)
+
+  withSQLConf(SQLConf.CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING.key 
-> "true") {
+assert(spark.table("t1").rdd.partitions.length == 2)
+sql("CACHE TABLE t2 as SELECT /*+ REPARTITION */ * FROM values(2) 
as t(c)")
+assert(spark.table("t2").rdd.partitions.length == 1)
+
+withSQLConf(SQLConf.CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING.key 
-> "false") {

Review comment:
   why is this one nested?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32410: [SPARK-35286][SQL] Replace SessionState.start with SessionState.setCurrentSessionState

2021-05-13 Thread GitBox


AmplabJenkins removed a comment on pull request #32410:
URL: https://github.com/apache/spark/pull/32410#issuecomment-840378916


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138496/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #32482: [SPARK-35332][SQL] Make cache plan disable configs configurable

2021-05-13 Thread GitBox


cloud-fan commented on a change in pull request #32482:
URL: https://github.com/apache/spark/pull/32482#discussion_r631624820



##
File path: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala
##
@@ -1554,4 +1554,65 @@ class CachedTableSuite extends QueryTest with 
SQLTestUtils
   assert(!spark.catalog.isCached(viewName))
 }
   }
+
+  test("SPARK-35332: Make cache plan disable configs configurable - check 
AQE") {
+withSQLConf(SQLConf.SHUFFLE_PARTITIONS.key -> "2",
+  SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_NUM.key -> "1",
+  SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true") {
+
+  withTempView("t1", "t2", "t3") {
+withSQLConf(SQLConf.CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING.key -> 
"false") {
+  sql("CACHE TABLE t1 as SELECT /*+ REPARTITION */ * FROM values(1) as 
t(c)")
+  assert(spark.table("t1").rdd.partitions.length == 2)
+
+  withSQLConf(SQLConf.CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING.key 
-> "true") {
+assert(spark.table("t1").rdd.partitions.length == 2)
+sql("CACHE TABLE t2 as SELECT /*+ REPARTITION */ * FROM values(2) 
as t(c)")
+assert(spark.table("t2").rdd.partitions.length == 1)
+
+withSQLConf(SQLConf.CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING.key 
-> "false") {

Review comment:
   I'm expecting something like
   ```
   withSQLConf(SQLConf.CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING.key -> 
"false") {
 test1
   }
   withSQLConf(SQLConf.CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING.key -> 
"true") {
 test2
   }
   withSQLConf(SQLConf.CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING.key -> 
"false") {
 test3
   }
   ```

##
File path: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala
##
@@ -1554,4 +1554,65 @@ class CachedTableSuite extends QueryTest with 
SQLTestUtils
   assert(!spark.catalog.isCached(viewName))
 }
   }
+
+  test("SPARK-35332: Make cache plan disable configs configurable - check 
AQE") {
+withSQLConf(SQLConf.SHUFFLE_PARTITIONS.key -> "2",
+  SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_NUM.key -> "1",
+  SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true") {
+
+  withTempView("t1", "t2", "t3") {
+withSQLConf(SQLConf.CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING.key -> 
"false") {
+  sql("CACHE TABLE t1 as SELECT /*+ REPARTITION */ * FROM values(1) as 
t(c)")
+  assert(spark.table("t1").rdd.partitions.length == 2)
+
+  withSQLConf(SQLConf.CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING.key 
-> "true") {
+assert(spark.table("t1").rdd.partitions.length == 2)
+sql("CACHE TABLE t2 as SELECT /*+ REPARTITION */ * FROM values(2) 
as t(c)")
+assert(spark.table("t2").rdd.partitions.length == 1)
+
+withSQLConf(SQLConf.CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING.key 
-> "false") {
+  assert(spark.table("t1").rdd.partitions.length == 2)
+  assert(spark.table("t2").rdd.partitions.length == 1)
+  sql("CACHE TABLE t3 as SELECT /*+ REPARTITION */ * FROM 
values(3) as t(c)")
+  assert(spark.table("t3").rdd.partitions.length == 2)
+}
+  }
+}
+  }
+}
+  }
+
+  test("SPARK-35332: Make cache plan disable configs configurable - check 
bucket scan") {
+withTable("t1", "t2", "t3") {
+  Seq(1, 2, 3).foreach { i =>
+spark.range(1, 2)
+  .write
+  .format("parquet")
+  .bucketBy(2, "id")
+  .saveAsTable(s"t$i")
+  }
+
+  withCache("t1", "t2", "t3") {
+withSQLConf(SQLConf.BUCKETING_ENABLED.key -> "true",
+  SQLConf.FILES_MIN_PARTITION_NUM.key -> "1",
+  SQLConf.CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING.key -> "false") {
+  sql("CACHE TABLE t1")
+  assert(spark.table("t1").rdd.partitions.length == 2)
+
+  withSQLConf(SQLConf.CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING.key 
-> "true") {
+assert(spark.table("t1").rdd.partitions.length == 2)
+sql("CACHE TABLE t2")
+assert(spark.table("t2").rdd.partitions.length == 1)
+
+withSQLConf(SQLConf.CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING.key 
-> "false") {

Review comment:
   ditto




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AngersZhuuuu commented on a change in pull request #32365: [SPARK-35228][SQL] Add expression ToPrettyString for keep consistent between hive/spark format in df.show and transform

2021-05-13 Thread GitBox


AngersZh commented on a change in pull request #32365:
URL: https://github.com/apache/spark/pull/32365#discussion_r631625554



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
##
@@ -2649,3 +2652,78 @@ case class Sentences(
 copy(str = newFirst, language = newSecond, country = newThird)
 
 }
+
+case class ToPrettyString(child: Expression, timeZoneId: Option[String] = None)
+  extends UnaryExpression with TimeZoneAwareExpression {
+  import ToPrettyString._
+
+  override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression =
+copy(timeZoneId = Option(timeZoneId))
+  override def dataType: DataType = StringType
+
+  private val timeFormatters: TimeFormatters =
+TimeFormatters(DateFormatter(zoneId), 
TimestampFormatter.getFractionFormatter(zoneId))
+
+  override def nullSafeEval(input: Any): Any = {
+UTF8String.fromString(toHiveString((input, child.dataType), false, 
timeFormatters))
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+nullSafeCodeGen(ctx, ev, eval => {
+  val toHiveString = ToPrettyString.getClass.getName.stripSuffix("$")
+  val tuple2 = Tuple2.getClass.getName.stripSuffix("$")
+  val dataType = JavaCode.global(
+ctx.addReferenceObj("dataType", child.dataType),
+child.dataType.getClass)
+  val formatter = JavaCode.global(
+ctx.addReferenceObj("dateFormatter", timeFormatters),
+timeFormatters.getClass)
+  s"""${ev.value} = UTF8String.fromString($toHiveString.toHiveString(
+ |$tuple2.apply($eval, ${dataType}), false, 
$formatter));""".stripMargin
+})
+  }
+
+  override def prettyName: String = "to_hive_string"
+
+  override protected def withNewChildInternal(newChild: Expression): 
Expression =
+ToPrettyString(newChild)
+}
+
+object ToPrettyString {
+  case class TimeFormatters(date: DateFormatter, timestamp: TimestampFormatter)
+
+  def toHiveString(
+  a: (Any, DataType),
+  nested: Boolean,
+  formatters: TimeFormatters): String = a match {

Review comment:
   > This is inefficient in codegen, as we know the data type ahead but we 
are still doing runtime pattern match.
   > 
   > Can we follow `Cast` to implement the expression?
   
   Done, can you recheck this?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #32531: [SPARK-35394][K8S][BUILD] Move kubernetes-client.version to root pom file

2021-05-13 Thread GitBox


dongjoon-hyun commented on pull request #32531:
URL: https://github.com/apache/spark/pull/32531#issuecomment-840382268


   Oh, thank you so much, @viirya ! 
   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun closed pull request #32531: [SPARK-35394][K8S][BUILD] Move kubernetes-client.version to root pom file

2021-05-13 Thread GitBox


dongjoon-hyun closed pull request #32531:
URL: https://github.com/apache/spark/pull/32531


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] xuechendi opened a new pull request #32534: [WIP][SPARK-35396]Add AutoCloseable close to BlockManager and InMemoryRelation

2021-05-13 Thread GitBox


xuechendi opened a new pull request #32534:
URL: https://github.com/apache/spark/pull/32534


   This PR is proposing a add-on to support to manual close entries in 
MemoryStore and InMemoryRelation
   
   ### What changes were proposed in this pull request?
   Currently:
   MemoryStore uses a LinkedHashMap[BlockId, MemoryEntry[_]] to store all 
OnHeap or OffHeap entries.
   And when memoryStore.remove(blockId) is called, codes will simply remove one 
entry from LinkedHashMap and leverage Java GC to do release work.
   
   This PR:
   We are proposing a add-on to manually close any object stored in 
MemoryStore and InMemoryRelation if this object is extended from AutoCloseable.
   
   Veifiication:
   In our own use case, we implemented a user-defined off-heap-hashRelation 
for BHJ, and we verified that by adding this manual close, we can make sure our 
defined off-heap-hashRelation can be released when evict is called.
   Also, we implemented user-defined cachedBatch and will be release when 
InMemoryRelation.clearCache() is called by this PR 
   
   
   ### Why are the changes needed?
   This changes can help to clean some off-heap user-defined object may be 
cached in InMemoryRelation or MemoryStore
   
   
   ### Does this PR introduce _any_ user-facing change?
   NO
   
   
   ### How was this patch tested?
   WIP
   
   Signed-off-by: Chendi Xue 
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32534: [WIP][SPARK-35396]Add AutoCloseable close to BlockManager and InMemoryRelation

2021-05-13 Thread GitBox


AmplabJenkins commented on pull request #32534:
URL: https://github.com/apache/spark/pull/32534#issuecomment-840383881


   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya opened a new pull request #32535: [SPARK-35397][SQL] Replace sys.err usage with explicit exception type

2021-05-13 Thread GitBox


viirya opened a new pull request #32535:
URL: https://github.com/apache/spark/pull/32535


   
   
   ### What changes were proposed in this pull request?
   
   
   This patch replaces `sys.err` usages with explicit exception types.
   
   ### Why are the changes needed?
   
   
   Motivated by the previous comment 
https://github.com/apache/spark/pull/32519#discussion_r630787080, it sounds 
better to replace `sys.err` usages with explicit exception type.
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No.
   
   ### How was this patch tested?
   
   
   Existing tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32532: [SPARK-35384][SQL][FOLLOWUP] Move `HashMap.get` out of `InvokeLike.invoke`

2021-05-13 Thread GitBox


dongjoon-hyun commented on a change in pull request #32532:
URL: https://github.com/apache/spark/pull/32532#discussion_r631629800



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
##
@@ -145,12 +151,7 @@ trait InvokeLike extends Expression with NonSQLExpression {
 case e: java.lang.reflect.InvocationTargetException if e.getCause != 
null =>
   throw e.getCause
   }
-  val boxedClass = ScalaReflection.typeBoxedJavaMapping.get(dataType)
-  if (boxedClass.isDefined) {
-boxedClass.get.cast(ret)
-  } else {
-ret
-  }
+  boxingFn(ret)

Review comment:
   > This could speed up the performance a bit.
   
   Are you going to attach the new benchmark result?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32535: [SPARK-35397][SQL] Replace sys.err usage with explicit exception type

2021-05-13 Thread GitBox


dongjoon-hyun commented on a change in pull request #32535:
URL: https://github.com/apache/spark/pull/32535#discussion_r631630221



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -3338,7 +3338,8 @@ class Analyzer(override val catalogManager: 
CatalogManager)
   case _ : InnerLike =>
 (leftKeys ++ lUniqueOutput ++ rUniqueOutput, rightKeys)
   case _ =>
-sys.error("Unsupported natural join type " + joinType)
+throw QueryExecutionErrors.unsupportedNaturalJoinTypeError(joinType)
+

Review comment:
   nit. extra space.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation

2021-05-13 Thread GitBox


SparkQA commented on pull request #32498:
URL: https://github.com/apache/spark/pull/32498#issuecomment-840387634


   **[Test build #138494 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138494/testReport)**
 for PR 32498 at commit 
[`b7a6cc7`](https://github.com/apache/spark/commit/b7a6cc71110fe8de45e8c74d487ebd23b7942f34).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation

2021-05-13 Thread GitBox


SparkQA removed a comment on pull request #32498:
URL: https://github.com/apache/spark/pull/32498#issuecomment-840310425


   **[Test build #138494 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138494/testReport)**
 for PR 32498 at commit 
[`b7a6cc7`](https://github.com/apache/spark/commit/b7a6cc71110fe8de45e8c74d487ebd23b7942f34).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] YuzhouSun commented on pull request #32207: [SPARK-35106] Avoid failing rename in HadoopMapReduceCommitProtocol with dynamic partition overwrite

2021-05-13 Thread GitBox


YuzhouSun commented on pull request #32207:
URL: https://github.com/apache/spark/pull/32207#issuecomment-840387892


   Created https://github.com/apache/spark/pull/32530. @cloud-fan Could you 
review it? Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation

2021-05-13 Thread GitBox


SparkQA commented on pull request #32498:
URL: https://github.com/apache/spark/pull/32498#issuecomment-840388226


   **[Test build #138485 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138485/testReport)**
 for PR 32498 at commit 
[`1cf79bf`](https://github.com/apache/spark/commit/1cf79bf05bd66446629628aac99828bc776f012c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation

2021-05-13 Thread GitBox


SparkQA removed a comment on pull request #32498:
URL: https://github.com/apache/spark/pull/32498#issuecomment-840264723


   **[Test build #138485 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138485/testReport)**
 for PR 32498 at commit 
[`1cf79bf`](https://github.com/apache/spark/commit/1cf79bf05bd66446629628aac99828bc776f012c).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation

2021-05-13 Thread GitBox


SparkQA commented on pull request #32494:
URL: https://github.com/apache/spark/pull/32494#issuecomment-840389098


   **[Test build #138486 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138486/testReport)**
 for PR 32494 at commit 
[`bd838ac`](https://github.com/apache/spark/commit/bd838ac955ec62a817360347325a5be83b49be4a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation

2021-05-13 Thread GitBox


SparkQA removed a comment on pull request #32494:
URL: https://github.com/apache/spark/pull/32494#issuecomment-840264760


   **[Test build #138486 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138486/testReport)**
 for PR 32494 at commit 
[`bd838ac`](https://github.com/apache/spark/commit/bd838ac955ec62a817360347325a5be83b49be4a).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] fhygh commented on a change in pull request #32501: [SPARK-35359][SQL] Insert data with char/varchar datatype will fail when data length exceed length limitation

2021-05-13 Thread GitBox


fhygh commented on a change in pull request #32501:
URL: https://github.com/apache/spark/pull/32501#discussion_r631638329



##
File path: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/util/CharVarcharCodegenUtils.java
##
@@ -26,7 +27,7 @@ private static UTF8String trimTrailingSpaces(
   UTF8String inputStr, int numChars, int limit) {
 int numTailSpacesToTrim = numChars - limit;
 UTF8String trimmed = inputStr.trimTrailingSpaces(numTailSpacesToTrim);
-if (trimmed.numChars() > limit) {
+if (trimmed.numChars() > limit && !SQLConf.get().charVarcharAsString()) {

Review comment:
   then do we need update PartitioningUtils.charTypeWriteSideCheck and 
PartitioningUtils.varcharTypeWriteSideCheck ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32528: [SPARK-35350][SQL] Add code-gen for left semi sort merge join

2021-05-13 Thread GitBox


SparkQA commented on pull request #32528:
URL: https://github.com/apache/spark/pull/32528#issuecomment-840401476






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] c21 commented on a change in pull request #32528: [SPARK-35350][SQL] Add code-gen for left semi sort merge join

2021-05-13 Thread GitBox


c21 commented on a change in pull request #32528:
URL: https://github.com/apache/spark/pull/32528#discussion_r631646593



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
##
@@ -679,60 +704,124 @@ case class SortMergeJoinExec(
   (evaluateVariables(streamedVars), "")
 }
 
-val thisPlan = ctx.addReferenceObj("plan", this)
-val eagerCleanup = s"$thisPlan.cleanupResources();"
-
-lazy val innerJoin =
+val beforeLoop =
   s"""
- |while (findNextJoinRows($streamedInput, $bufferedInput)) {
- |  ${streamedVarDecl.mkString("\n")}
- |  ${beforeLoop.trim}
- |  scala.collection.Iterator $iterator = 
$matches.generateIterator();
- |  while ($iterator.hasNext()) {
- |InternalRow $bufferedRow = (InternalRow) $iterator.next();
- |${condCheck.trim}
- |$numOutput.add(1);
- |${consume(ctx, resultVars)}
- |  }
- |  if (shouldStop()) return;
- |}
- |$eagerCleanup
- """.stripMargin
-
-lazy val outerJoin = {
-  val hasOutputRow = ctx.freshName("hasOutputRow")
+ |${streamedVarDecl.mkString("\n")}
+ |${streamedBeforeLoop.trim}
+ |scala.collection.Iterator $iterator = 
$matches.generateIterator();
+   """.stripMargin
+val outputRow =
   s"""
- |while ($streamedInput.hasNext()) {
- |  findNextJoinRows($streamedInput, $bufferedInput);
- |  ${streamedVarDecl.mkString("\n")}
- |  ${beforeLoop.trim}
- |  scala.collection.Iterator $iterator = 
$matches.generateIterator();
- |  boolean $hasOutputRow = false;
- |
- |  // the last iteration of this loop is to emit an empty row if 
there is no matched rows.
- |  while ($iterator.hasNext() || !$hasOutputRow) {
- |InternalRow $bufferedRow = $iterator.hasNext() ?
- |  (InternalRow) $iterator.next() : null;
- |${condCheck.trim}
- |$hasOutputRow = true;
- |$numOutput.add(1);
- |${consume(ctx, resultVars)}
- |  }
- |  if (shouldStop()) return;
- |}
- |$eagerCleanup
+ |$numOutput.add(1);
+ |${consume(ctx, resultVars)}
""".stripMargin
-}
+val findNextJoinRows = s"findNextJoinRows($streamedInput, $bufferedInput)"
+val thisPlan = ctx.addReferenceObj("plan", this)
+val eagerCleanup = s"$thisPlan.cleanupResources();"
 
 joinType match {
-  case _: InnerLike => innerJoin
-  case LeftOuter | RightOuter => outerJoin
+  case _: InnerLike =>
+codegenInner(findNextJoinRows, beforeLoop, iterator, bufferedRow, 
condCheck, outputRow,
+  eagerCleanup)
+  case LeftOuter | RightOuter =>
+codegenOuter(streamedInput, findNextJoinRows, beforeLoop, iterator, 
bufferedRow, condCheck,
+  ctx.freshName("hasOutputRow"), outputRow, eagerCleanup)
+  case LeftSemi =>
+codegenSemi(findNextJoinRows, beforeLoop, iterator, bufferedRow, 
condCheck,
+  ctx.freshName("hasOutputRow"), outputRow, eagerCleanup)
   case x =>
 throw new IllegalArgumentException(
   s"SortMergeJoin.doProduce should not take $x as the JoinType")
 }
   }
 
+  /**
+   * Generates the code for Inner join.
+   */
+  private def codegenInner(
+  findNextJoinRows: String,
+  beforeLoop: String,
+  matchIterator: String,
+  bufferedRow: String,
+  conditionCheck: String,
+  outputRow: String,
+  eagerCleanup: String): String = {
+s"""
+   |while ($findNextJoinRows) {
+   |  ${beforeLoop.trim}
+   |  while ($matchIterator.hasNext()) {
+   |InternalRow $bufferedRow = (InternalRow) $matchIterator.next();
+   |${conditionCheck.trim}
+   |$outputRow
+   |  }
+   |  if (shouldStop()) return;
+   |}
+   |$eagerCleanup
+ """.stripMargin
+  }
+
+  /**
+   * Generates the code for Left or Right Outer join.
+   */
+  private def codegenOuter(
+  streamedInput: String,
+  findNextJoinRows: String,
+  beforeLoop: String,
+  matchIterator: String,
+  bufferedRow: String,
+  conditionCheck: String,
+  hasOutputRow: String,
+  outputRow: String,
+  eagerCleanup: String): String = {
+s"""
+   |while ($streamedInput.hasNext()) {
+   |  $findNextJoinRows;
+   |  ${beforeLoop.trim}
+   |  boolean $hasOutputRow = false;
+   |
+   |  // the last iteration of this loop is to emit an empty row if there 
is no matched rows.
+   |  while ($matchIterator.hasNext() || !$hasOutputRow) {
+   |InternalRow $bufferedRow = $matchIterator.hasNext() ?
+   |  (InternalRow) $matchIterator.next() : null;
+   |${conditionCheck.trim}
+   |$hasOutputRow = true;
+   |$outputRow
+   |  }
+   |  if (shouldStop()) return;
+   |}
+   

[GitHub] [spark] HyukjinKwon commented on pull request #32531: [SPARK-35394][K8S][BUILD] Move kubernetes-client.version to root pom file

2021-05-13 Thread GitBox


HyukjinKwon commented on pull request #32531:
URL: https://github.com/apache/spark/pull/32531#issuecomment-840403834


   LGTM2


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32528: [SPARK-35350][SQL] Add code-gen for left semi sort merge join

2021-05-13 Thread GitBox


AmplabJenkins removed a comment on pull request #32528:
URL: https://github.com/apache/spark/pull/32528#issuecomment-840404809


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43020/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation

2021-05-13 Thread GitBox


AmplabJenkins commented on pull request #32494:
URL: https://github.com/apache/spark/pull/32494#issuecomment-840404805


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138486/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32528: [SPARK-35350][SQL] Add code-gen for left semi sort merge join

2021-05-13 Thread GitBox


AmplabJenkins commented on pull request #32528:
URL: https://github.com/apache/spark/pull/32528#issuecomment-840404809


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43020/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation

2021-05-13 Thread GitBox


AmplabJenkins removed a comment on pull request #32494:
URL: https://github.com/apache/spark/pull/32494#issuecomment-840404805


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138486/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation

2021-05-13 Thread GitBox


AmplabJenkins removed a comment on pull request #32498:
URL: https://github.com/apache/spark/pull/32498#issuecomment-840404804






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation

2021-05-13 Thread GitBox


AmplabJenkins commented on pull request #32498:
URL: https://github.com/apache/spark/pull/32498#issuecomment-840404804






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32292: [SPARK-35162][SQL] New SQL functions: TRY_ADD/TRY_DIVIDE

2021-05-13 Thread GitBox


AmplabJenkins commented on pull request #32292:
URL: https://github.com/apache/spark/pull/32292#issuecomment-840405169


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43022/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32292: [SPARK-35162][SQL] New SQL functions: TRY_ADD/TRY_DIVIDE

2021-05-13 Thread GitBox


SparkQA commented on pull request #32292:
URL: https://github.com/apache/spark/pull/32292#issuecomment-840405132






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32292: [SPARK-35162][SQL] New SQL functions: TRY_ADD/TRY_DIVIDE

2021-05-13 Thread GitBox


AmplabJenkins removed a comment on pull request #32292:
URL: https://github.com/apache/spark/pull/32292#issuecomment-840405169


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43022/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32535: [SPARK-35397][SQL] Replace sys.err usage with explicit exception type

2021-05-13 Thread GitBox


SparkQA commented on pull request #32535:
URL: https://github.com/apache/spark/pull/32535#issuecomment-840406769


   **[Test build #138503 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138503/testReport)**
 for PR 32535 at commit 
[`122d853`](https://github.com/apache/spark/commit/122d85371ba45e0395715793fc3ac798300768a1).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32478: [SPARK-35063][SQL] Group exception messages in sql/catalyst

2021-05-13 Thread GitBox


SparkQA commented on pull request #32478:
URL: https://github.com/apache/spark/pull/32478#issuecomment-840406845


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43021/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] yaooqinn closed pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader

2021-05-13 Thread GitBox


yaooqinn closed pull request #32515:
URL: https://github.com/apache/spark/pull/32515


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] yaooqinn commented on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader

2021-05-13 Thread GitBox


yaooqinn commented on pull request #32515:
URL: https://github.com/apache/spark/pull/32515#issuecomment-840410375


   Thanks for the help ~ @dongjoon-hyun @HyukjinKwon @cloud-fan 
   
   +1 for myself. merged to master


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32478: [SPARK-35063][SQL] Group exception messages in sql/catalyst

2021-05-13 Thread GitBox


SparkQA commented on pull request #32478:
URL: https://github.com/apache/spark/pull/32478#issuecomment-840410483


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43021/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32478: [SPARK-35063][SQL] Group exception messages in sql/catalyst

2021-05-13 Thread GitBox


AmplabJenkins commented on pull request #32478:
URL: https://github.com/apache/spark/pull/32478#issuecomment-840410504


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43021/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32478: [SPARK-35063][SQL] Group exception messages in sql/catalyst

2021-05-13 Thread GitBox


AmplabJenkins removed a comment on pull request #32478:
URL: https://github.com/apache/spark/pull/32478#issuecomment-840410504


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43021/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader

2021-05-13 Thread GitBox


SparkQA commented on pull request #32515:
URL: https://github.com/apache/spark/pull/32515#issuecomment-840414987


   **[Test build #138489 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138489/testReport)**
 for PR 32515 at commit 
[`ad18acc`](https://github.com/apache/spark/commit/ad18acca9e991251fa44d33f53e8c4648fbcdbb7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader

2021-05-13 Thread GitBox


SparkQA removed a comment on pull request #32515:
URL: https://github.com/apache/spark/pull/32515#issuecomment-840286591


   **[Test build #138489 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138489/testReport)**
 for PR 32515 at commit 
[`ad18acc`](https://github.com/apache/spark/commit/ad18acca9e991251fa44d33f53e8c4648fbcdbb7).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] ulysses-you commented on a change in pull request #32482: [SPARK-35332][SQL] Make cache plan disable configs configurable

2021-05-13 Thread GitBox


ulysses-you commented on a change in pull request #32482:
URL: https://github.com/apache/spark/pull/32482#discussion_r631671896



##
File path: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala
##
@@ -1554,4 +1554,65 @@ class CachedTableSuite extends QueryTest with 
SQLTestUtils
   assert(!spark.catalog.isCached(viewName))
 }
   }
+
+  test("SPARK-35332: Make cache plan disable configs configurable - check 
AQE") {
+withSQLConf(SQLConf.SHUFFLE_PARTITIONS.key -> "2",
+  SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_NUM.key -> "1",
+  SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true") {
+
+  withTempView("t1", "t2", "t3") {
+withSQLConf(SQLConf.CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING.key -> 
"false") {
+  sql("CACHE TABLE t1 as SELECT /*+ REPARTITION */ * FROM values(1) as 
t(c)")
+  assert(spark.table("t1").rdd.partitions.length == 2)
+
+  withSQLConf(SQLConf.CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING.key 
-> "true") {
+assert(spark.table("t1").rdd.partitions.length == 2)
+sql("CACHE TABLE t2 as SELECT /*+ REPARTITION */ * FROM values(2) 
as t(c)")
+assert(spark.table("t2").rdd.partitions.length == 1)
+
+withSQLConf(SQLConf.CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING.key 
-> "false") {

Review comment:
   Updated it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32535: [SPARK-35397][SQL] Replace sys.err usage with explicit exception type

2021-05-13 Thread GitBox


SparkQA commented on pull request #32535:
URL: https://github.com/apache/spark/pull/32535#issuecomment-840429681






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page

2021-05-13 Thread GitBox


SparkQA commented on pull request #32204:
URL: https://github.com/apache/spark/pull/32204#issuecomment-840430061


   **[Test build #138492 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138492/testReport)**
 for PR 32204 at commit 
[`a386788`](https://github.com/apache/spark/commit/a386788b44fb5255d2784ce423e3f879ba97f53c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page

2021-05-13 Thread GitBox


SparkQA removed a comment on pull request #32204:
URL: https://github.com/apache/spark/pull/32204#issuecomment-840291088


   **[Test build #138492 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138492/testReport)**
 for PR 32204 at commit 
[`a386788`](https://github.com/apache/spark/commit/a386788b44fb5255d2784ce423e3f879ba97f53c).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32478: [SPARK-35063][SQL] Group exception messages in sql/catalyst

2021-05-13 Thread GitBox


SparkQA commented on pull request #32478:
URL: https://github.com/apache/spark/pull/32478#issuecomment-840430677


   **[Test build #138501 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138501/testReport)**
 for PR 32478 at commit 
[`4509cbb`](https://github.com/apache/spark/commit/4509cbbbe00debe19b670c463c85fb7fe295c8c3).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #32478: [SPARK-35063][SQL] Group exception messages in sql/catalyst

2021-05-13 Thread GitBox


SparkQA removed a comment on pull request #32478:
URL: https://github.com/apache/spark/pull/32478#issuecomment-840376956


   **[Test build #138501 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138501/testReport)**
 for PR 32478 at commit 
[`4509cbb`](https://github.com/apache/spark/commit/4509cbbbe00debe19b670c463c85fb7fe295c8c3).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page

2021-05-13 Thread GitBox


AmplabJenkins commented on pull request #32204:
URL: https://github.com/apache/spark/pull/32204#issuecomment-840436292


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138492/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32535: [SPARK-35397][SQL] Replace sys.err usage with explicit exception type

2021-05-13 Thread GitBox


AmplabJenkins commented on pull request #32535:
URL: https://github.com/apache/spark/pull/32535#issuecomment-840436294


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43023/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32478: [SPARK-35063][SQL] Group exception messages in sql/catalyst

2021-05-13 Thread GitBox


AmplabJenkins commented on pull request #32478:
URL: https://github.com/apache/spark/pull/32478#issuecomment-840436297


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138501/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader

2021-05-13 Thread GitBox


AmplabJenkins commented on pull request #32515:
URL: https://github.com/apache/spark/pull/32515#issuecomment-840436293


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138489/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32478: [SPARK-35063][SQL] Group exception messages in sql/catalyst

2021-05-13 Thread GitBox


AmplabJenkins removed a comment on pull request #32478:
URL: https://github.com/apache/spark/pull/32478#issuecomment-840436297


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138501/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader

2021-05-13 Thread GitBox


AmplabJenkins removed a comment on pull request #32515:
URL: https://github.com/apache/spark/pull/32515#issuecomment-840436293


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138489/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32535: [SPARK-35397][SQL] Replace sys.err usage with explicit exception type

2021-05-13 Thread GitBox


AmplabJenkins removed a comment on pull request #32535:
URL: https://github.com/apache/spark/pull/32535#issuecomment-840436294


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43023/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page

2021-05-13 Thread GitBox


AmplabJenkins removed a comment on pull request #32204:
URL: https://github.com/apache/spark/pull/32204#issuecomment-840436292


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138492/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32482: [SPARK-35332][SQL] Make cache plan disable configs configurable

2021-05-13 Thread GitBox


SparkQA commented on pull request #32482:
URL: https://github.com/apache/spark/pull/32482#issuecomment-840438115


   **[Test build #138504 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138504/testReport)**
 for PR 32482 at commit 
[`2e8492b`](https://github.com/apache/spark/commit/2e8492baf2c9b6dba3bd635f2b3a25d288e7f3ab).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang opened a new pull request #32536: Spark 35253 followup

2021-05-13 Thread GitBox


LuciferYang opened a new pull request #32536:
URL: https://github.com/apache/spark/pull/32536


   ### What changes were proposed in this pull request?
   SPARK-35253 upgraded janino from 3.0.16 to 3.1.4, `ClassBodyEvaluator` 
provides the `getBytecodes` method to get 
   the mapping from `ClassFile.getThisClassName` to `ClassFile.toByteArray` 
directly in this version and we don't need to get this variable by reflection 
api anymore.
   
   So the main purpose of this pr is simplify the way to get classes from 
`ClassBodyEvaluator` in `CodeGenerator.updateAndGetCompilationStats` method.
   
   
   ### Why are the changes needed?
   Code simplification.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Pass the Jenkins or GitHub Action
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32536: [SPARK-35253][SQL][FOLLOWUP] Simplify the way to get classes from ClassBodyEvaluator in `CodeGenerator.updateAndGetCompilationStats` method

2021-05-13 Thread GitBox


SparkQA commented on pull request #32536:
URL: https://github.com/apache/spark/pull/32536#issuecomment-840440073


   **[Test build #138505 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138505/testReport)**
 for PR 32536 at commit 
[`738bdc6`](https://github.com/apache/spark/commit/738bdc6d3459c30399ac4efec6e4bef799611ddd).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang commented on pull request #32536: [SPARK-35398][SQL] Simplify the way to get classes from ClassBodyEvaluator in `CodeGenerator.updateAndGetCompilationStats` method

2021-05-13 Thread GitBox


LuciferYang commented on pull request #32536:
URL: https://github.com/apache/spark/pull/32536#issuecomment-840441694


   Waiting for CI


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang commented on pull request #32536: [SPARK-35398][SQL] Simplify the way to get classes from ClassBodyEvaluator in `CodeGenerator.updateAndGetCompilationStats` method

2021-05-13 Thread GitBox


LuciferYang commented on pull request #32536:
URL: https://github.com/apache/spark/pull/32536#issuecomment-840441827


   cc @maropu @srowen 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] xuanyuanking commented on a change in pull request #32272: [SPARK-35172][SS] The implementation of RocksDBCheckpointMetadata

2021-05-13 Thread GitBox


xuanyuanking commented on a change in pull request #32272:
URL: https://github.com/apache/spark/pull/32272#discussion_r631688986



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala
##
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming.state
+
+import java.io.File
+import java.nio.charset.StandardCharsets.UTF_8
+import java.nio.file.Files
+
+import scala.collection.Seq
+
+import com.fasterxml.jackson.annotation.JsonInclude.Include
+import com.fasterxml.jackson.databind.{DeserializationFeature, ObjectMapper}
+import com.fasterxml.jackson.module.scala.{DefaultScalaModule, 
ScalaObjectMapper}
+import org.json4s.NoTypeHints
+import org.json4s.jackson.Serialization
+
+/**
+ * Classes to represent metadata of checkpoints saved to DFS. Since this is 
converted to JSON, any
+ * changes to this MUST be backward-compatible.
+ */
+case class RocksDBCheckpointMetadata(
+sstFiles: Seq[RocksDBSstFile],
+logFiles: Seq[RocksDBLogFile],
+numKeys: Long) {
+  import RocksDBCheckpointMetadata._
+
+  def json: String = {
+// We turn this field into a null to avoid write a empty logFiles field in 
the json.
+val nullified = if (logFiles.isEmpty) this.copy(logFiles = null) else this
+mapper.writeValueAsString(nullified)
+  }
+
+  def prettyJson: String = 
Serialization.writePretty(this)(RocksDBCheckpointMetadata.format)

Review comment:
   The only difference is the `logFiles` fields. Actually the `prettyJson` 
field is for providing a readable string for log. `json` field is for files 
writing.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] xuanyuanking commented on a change in pull request #32272: [SPARK-35172][SS] The implementation of RocksDBCheckpointMetadata

2021-05-13 Thread GitBox


xuanyuanking commented on a change in pull request #32272:
URL: https://github.com/apache/spark/pull/32272#discussion_r631689200



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala
##
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming.state
+
+import java.io._
+import java.nio.charset.Charset
+
+import org.apache.commons.io.FileUtils
+
+import org.apache.spark._
+import org.apache.spark.util.Utils
+
+class RocksDBSuite extends SparkFunSuite {
+
+  test("checkpoint metadata serde roundtrip") {
+def checkJsonRoundtrip(metadata: RocksDBCheckpointMetadata, json: String): 
Unit = {
+  assert(metadata.json == json)
+  withTempDirectory { dir =>
+val file = new File(dir, "json")
+FileUtils.write(file, s"v1\n$json", Charset.defaultCharset)
+assert(metadata == RocksDBCheckpointMetadata.readFromFile(file))
+  }
+}
+val sstFiles = Seq(RocksDBSstFile("1.sst", "1-uuid.sst", 
12345678901234L))
+val logFiles = Seq(RocksDBLogFile("1.log", "1-uuid.log", 
12345678901234L))
+
+// scalastyle:off line.size.limit
+// should always include sstFiles and numKeys
+checkJsonRoundtrip(
+  RocksDBCheckpointMetadata(Seq.empty, 0L),
+  """{"sstFiles":[],"numKeys":0}"""
+)
+// shouldn't include the "logFiles" field in json when it's empty
+checkJsonRoundtrip(
+  RocksDBCheckpointMetadata(sstFiles, 12345678901234L),
+  
"""{"sstFiles":[{"localFileName":"1.sst","dfsSstFileName":"1-uuid.sst","sizeBytes":12345678901234}],"numKeys":12345678901234}"""
+)
+checkJsonRoundtrip(
+  RocksDBCheckpointMetadata(sstFiles, logFiles, 12345678901234L),
+  
"""{"sstFiles":[{"localFileName":"1.sst","dfsSstFileName":"1-uuid.sst","sizeBytes":12345678901234}],"logFiles":[{"localFileName":"1.log","dfsLogFileName":"1-uuid.log","sizeBytes":12345678901234}],"numKeys":12345678901234}""")
+// scalastyle:on line.size.limit
+  }
+
+  private def withTempDirectory(f: File => Unit): Unit = {

Review comment:
   Ah yes. Let me update.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] xuanyuanking commented on a change in pull request #32272: [SPARK-35172][SS] The implementation of RocksDBCheckpointMetadata

2021-05-13 Thread GitBox


xuanyuanking commented on a change in pull request #32272:
URL: https://github.com/apache/spark/pull/32272#discussion_r631689463



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala
##
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming.state
+
+import java.io.File
+import java.nio.charset.StandardCharsets.UTF_8
+import java.nio.file.Files
+
+import scala.collection.Seq
+
+import com.fasterxml.jackson.annotation.JsonInclude.Include
+import com.fasterxml.jackson.databind.{DeserializationFeature, ObjectMapper}
+import com.fasterxml.jackson.module.scala.{DefaultScalaModule, 
ScalaObjectMapper}
+import org.json4s.NoTypeHints
+import org.json4s.jackson.Serialization
+
+/**
+ * Classes to represent metadata of checkpoints saved to DFS. Since this is 
converted to JSON, any
+ * changes to this MUST be backward-compatible.
+ */
+case class RocksDBCheckpointMetadata(
+sstFiles: Seq[RocksDBSstFile],
+logFiles: Seq[RocksDBLogFile],
+numKeys: Long) {
+  import RocksDBCheckpointMetadata._
+
+  def json: String = {
+// We turn this field into a null to avoid write a empty logFiles field in 
the json.
+val nullified = if (logFiles.isEmpty) this.copy(logFiles = null) else this

Review comment:
   Yes, the `logFiles` field not always has value.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32528: [SPARK-35350][SQL] Add code-gen for left semi sort merge join

2021-05-13 Thread GitBox


SparkQA commented on pull request #32528:
URL: https://github.com/apache/spark/pull/32528#issuecomment-840444115


   **[Test build #138506 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138506/testReport)**
 for PR 32528 at commit 
[`6282a09`](https://github.com/apache/spark/commit/6282a09d2b78aad102926f956a10d68272e05c8d).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] c21 commented on a change in pull request #32528: [SPARK-35350][SQL] Add code-gen for left semi sort merge join

2021-05-13 Thread GitBox


c21 commented on a change in pull request #32528:
URL: https://github.com/apache/spark/pull/32528#discussion_r631690259



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
##
@@ -679,60 +704,124 @@ case class SortMergeJoinExec(
   (evaluateVariables(streamedVars), "")
 }
 
-val thisPlan = ctx.addReferenceObj("plan", this)
-val eagerCleanup = s"$thisPlan.cleanupResources();"
-
-lazy val innerJoin =
+val beforeLoop =
   s"""
- |while (findNextJoinRows($streamedInput, $bufferedInput)) {
- |  ${streamedVarDecl.mkString("\n")}
- |  ${beforeLoop.trim}
- |  scala.collection.Iterator $iterator = 
$matches.generateIterator();
- |  while ($iterator.hasNext()) {
- |InternalRow $bufferedRow = (InternalRow) $iterator.next();
- |${condCheck.trim}
- |$numOutput.add(1);
- |${consume(ctx, resultVars)}
- |  }
- |  if (shouldStop()) return;
- |}
- |$eagerCleanup
- """.stripMargin
-
-lazy val outerJoin = {
-  val hasOutputRow = ctx.freshName("hasOutputRow")
+ |${streamedVarDecl.mkString("\n")}
+ |${streamedBeforeLoop.trim}
+ |scala.collection.Iterator $iterator = 
$matches.generateIterator();
+   """.stripMargin
+val outputRow =
   s"""
- |while ($streamedInput.hasNext()) {
- |  findNextJoinRows($streamedInput, $bufferedInput);
- |  ${streamedVarDecl.mkString("\n")}
- |  ${beforeLoop.trim}
- |  scala.collection.Iterator $iterator = 
$matches.generateIterator();
- |  boolean $hasOutputRow = false;
- |
- |  // the last iteration of this loop is to emit an empty row if 
there is no matched rows.
- |  while ($iterator.hasNext() || !$hasOutputRow) {
- |InternalRow $bufferedRow = $iterator.hasNext() ?
- |  (InternalRow) $iterator.next() : null;
- |${condCheck.trim}
- |$hasOutputRow = true;
- |$numOutput.add(1);
- |${consume(ctx, resultVars)}
- |  }
- |  if (shouldStop()) return;
- |}
- |$eagerCleanup
+ |$numOutput.add(1);
+ |${consume(ctx, resultVars)}
""".stripMargin
-}
+val findNextJoinRows = s"findNextJoinRows($streamedInput, $bufferedInput)"
+val thisPlan = ctx.addReferenceObj("plan", this)
+val eagerCleanup = s"$thisPlan.cleanupResources();"
 
 joinType match {
-  case _: InnerLike => innerJoin
-  case LeftOuter | RightOuter => outerJoin
+  case _: InnerLike =>
+codegenInner(findNextJoinRows, beforeLoop, iterator, bufferedRow, 
condCheck, outputRow,

Review comment:
   Actually after double checking, we need to do `beforeLoop.trim` as 
`beforeLoop` already has `stripMargin`, and has no trailing spaces. Also 
updated to avoid repeated `conditionCheck.trim`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] c21 commented on a change in pull request #32528: [SPARK-35350][SQL] Add code-gen for left semi sort merge join

2021-05-13 Thread GitBox


c21 commented on a change in pull request #32528:
URL: https://github.com/apache/spark/pull/32528#discussion_r631690259



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
##
@@ -679,60 +704,124 @@ case class SortMergeJoinExec(
   (evaluateVariables(streamedVars), "")
 }
 
-val thisPlan = ctx.addReferenceObj("plan", this)
-val eagerCleanup = s"$thisPlan.cleanupResources();"
-
-lazy val innerJoin =
+val beforeLoop =
   s"""
- |while (findNextJoinRows($streamedInput, $bufferedInput)) {
- |  ${streamedVarDecl.mkString("\n")}
- |  ${beforeLoop.trim}
- |  scala.collection.Iterator $iterator = 
$matches.generateIterator();
- |  while ($iterator.hasNext()) {
- |InternalRow $bufferedRow = (InternalRow) $iterator.next();
- |${condCheck.trim}
- |$numOutput.add(1);
- |${consume(ctx, resultVars)}
- |  }
- |  if (shouldStop()) return;
- |}
- |$eagerCleanup
- """.stripMargin
-
-lazy val outerJoin = {
-  val hasOutputRow = ctx.freshName("hasOutputRow")
+ |${streamedVarDecl.mkString("\n")}
+ |${streamedBeforeLoop.trim}
+ |scala.collection.Iterator $iterator = 
$matches.generateIterator();
+   """.stripMargin
+val outputRow =
   s"""
- |while ($streamedInput.hasNext()) {
- |  findNextJoinRows($streamedInput, $bufferedInput);
- |  ${streamedVarDecl.mkString("\n")}
- |  ${beforeLoop.trim}
- |  scala.collection.Iterator $iterator = 
$matches.generateIterator();
- |  boolean $hasOutputRow = false;
- |
- |  // the last iteration of this loop is to emit an empty row if 
there is no matched rows.
- |  while ($iterator.hasNext() || !$hasOutputRow) {
- |InternalRow $bufferedRow = $iterator.hasNext() ?
- |  (InternalRow) $iterator.next() : null;
- |${condCheck.trim}
- |$hasOutputRow = true;
- |$numOutput.add(1);
- |${consume(ctx, resultVars)}
- |  }
- |  if (shouldStop()) return;
- |}
- |$eagerCleanup
+ |$numOutput.add(1);
+ |${consume(ctx, resultVars)}
""".stripMargin
-}
+val findNextJoinRows = s"findNextJoinRows($streamedInput, $bufferedInput)"
+val thisPlan = ctx.addReferenceObj("plan", this)
+val eagerCleanup = s"$thisPlan.cleanupResources();"
 
 joinType match {
-  case _: InnerLike => innerJoin
-  case LeftOuter | RightOuter => outerJoin
+  case _: InnerLike =>
+codegenInner(findNextJoinRows, beforeLoop, iterator, bufferedRow, 
condCheck, outputRow,

Review comment:
   Actually after double checking, we do not need to do `beforeLoop.trim` 
as `beforeLoop` already has `stripMargin`, and has no trailing spaces. Also 
updated to avoid repeated `conditionCheck.trim`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32272: [SPARK-35172][SS] The implementation of RocksDBCheckpointMetadata

2021-05-13 Thread GitBox


SparkQA commented on pull request #32272:
URL: https://github.com/apache/spark/pull/32272#issuecomment-84023


   **[Test build #138507 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138507/testReport)**
 for PR 32272 at commit 
[`3b91a26`](https://github.com/apache/spark/commit/3b91a268c356544c8b6f75e7c365e0bfa7459e67).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] xuanyuanking commented on pull request #32272: [SPARK-35172][SS] The implementation of RocksDBCheckpointMetadata

2021-05-13 Thread GitBox


xuanyuanking commented on pull request #32272:
URL: https://github.com/apache/spark/pull/32272#issuecomment-840445866


   ```
   there're some sorts of uncertainty during reviewing as there's no reference 
PR. In other words, we are reviewing methods which we don't have idea how these 
methods will be used.
   ```
   ```
   I'm also OK to review PRs one by one with uncertainty (with faith) and 
revisit all changes at the last phase.
   ```
   Yes, agree on both. I propose that we can mark down the uncertain methods or 
the ones without the caller side for now in the PR. When I submitting the 
reference PR, I can link the comment to the newly created PRs. It should help 
to our review and make sure I don't miss to explain any uncertainty during the 
review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #32161: [SPARK-35025][SQL][PYTHON][DOCS] Move Parquet data source options from Python and Scala into a single page.

2021-05-13 Thread GitBox


SparkQA removed a comment on pull request #32161:
URL: https://github.com/apache/spark/pull/32161#issuecomment-840310729


   **[Test build #138497 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138497/testReport)**
 for PR 32161 at commit 
[`bb5cd45`](https://github.com/apache/spark/commit/bb5cd4529b07b05b21cdaf878b06b61ad717be79).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32161: [SPARK-35025][SQL][PYTHON][DOCS] Move Parquet data source options from Python and Scala into a single page.

2021-05-13 Thread GitBox


SparkQA commented on pull request #32161:
URL: https://github.com/apache/spark/pull/32161#issuecomment-840451449


   **[Test build #138497 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138497/testReport)**
 for PR 32161 at commit 
[`bb5cd45`](https://github.com/apache/spark/commit/bb5cd4529b07b05b21cdaf878b06b61ad717be79).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader

2021-05-13 Thread GitBox


SparkQA removed a comment on pull request #32515:
URL: https://github.com/apache/spark/pull/32515#issuecomment-840310366


   **[Test build #138493 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138493/testReport)**
 for PR 32515 at commit 
[`b8b54ea`](https://github.com/apache/spark/commit/b8b54ea9cb3bdbb8f50bdb260567dedd2af9fe1b).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



  1   2   3   4   5   6   7   8   >