[GitHub] [spark] SparkQA commented on pull request #31785: [SPARK-33498][SQL][TESTS][FOLLOWUP] Remove SQLConf.withExistingConf in CastSuite
SparkQA commented on pull request #31785: URL: https://github.com/apache/spark/pull/31785#issuecomment-793509967 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40485/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] c21 commented on pull request #31736: [SPARK-34620][SQL] Code-gen broadcast nested loop join (inner/cross)
c21 commented on pull request #31736: URL: https://github.com/apache/spark/pull/31736#issuecomment-793509585 Close & reopen PR to trigger test rerun as some transient unit test failure happened.
[GitHub] [spark] c21 closed pull request #31736: [SPARK-34620][SQL] Code-gen broadcast nested loop join (inner/cross)
c21 closed pull request #31736: URL: https://github.com/apache/spark/pull/31736
[GitHub] [spark] SparkQA commented on pull request #31735: [WIP][SPARK-34600][Pyspark][SQL] Return User-defined types from Pandas UDF
SparkQA commented on pull request #31735: URL: https://github.com/apache/spark/pull/31735#issuecomment-793508584 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40480/
[GitHub] [spark] SparkQA commented on pull request #31758: [SPARK-34639][SQL] Always remove unnecessary Alias in Analyzer.resolveExpression
SparkQA commented on pull request #31758: URL: https://github.com/apache/spark/pull/31758#issuecomment-793507832 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40479/
[GitHub] [spark] SparkQA commented on pull request #31783: [SPARK-34152][SQL][FOLLOW-UP] Global temp view's identifier should be correctly stored
SparkQA commented on pull request #31783: URL: https://github.com/apache/spark/pull/31783#issuecomment-793506545 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40482/
[GitHub] [spark] dongjoon-hyun commented on pull request #31785: [SPARK-33498][SQL][TESTS][FOLLOWUP] Remove SQLConf.withExistingConf in CastSuite
dongjoon-hyun commented on pull request #31785: URL: https://github.com/apache/spark/pull/31785#issuecomment-793505377 Oh, BTW, @maropu. I realized that this is a follow-up of the old SPARK-33498, whose fix version is 3.1.0. It looks a little weird because this goes only to 3.2.0 and 3.1.2.
[GitHub] [spark] dongjoon-hyun commented on pull request #31785: [SPARK-33498][SQL][TESTS][FOLLOWUP] Remove SQLConf.withExistingConf in CastSuite
dongjoon-hyun commented on pull request #31785: URL: https://github.com/apache/spark/pull/31785#issuecomment-793504046 There is a conflict on branch-3.1. Could you make a backporting PR?
[GitHub] [spark] dongjoon-hyun closed pull request #31785: [SPARK-33498][SQL][TESTS][FOLLOWUP] Remove SQLConf.withExistingConf in CastSuite
dongjoon-hyun closed pull request #31785: URL: https://github.com/apache/spark/pull/31785
[GitHub] [spark] cloud-fan commented on a change in pull request #31783: [SPARK-34152][SQL][FOLLOW-UP] Global temp view's identifier should be correctly stored
cloud-fan commented on a change in pull request #31783: URL: https://github.com/apache/spark/pull/31783#discussion_r590039084
## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewTestSuite.scala ##
@@ -293,22 +295,45 @@ abstract class SQLViewTestSuite extends QueryTest with SQLTestUtils {
       }
     }
   }
+
+  test("SPARK-34152: view's identifier should be correctly stored") {
+    Seq(true, false).foreach { storeAnalyzed =>
+      withSQLConf(STORE_ANALYZED_PLAN_FOR_VIEW.key -> storeAnalyzed.toString) {
Review comment: It's OK to have a little waste in the test.
[GitHub] [spark] cloud-fan commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.
cloud-fan commented on pull request #31522: URL: https://github.com/apache/spark/pull/31522#issuecomment-793499270 How often is that? We can also improve the log to make it easier to search for a certain job.
[GitHub] [spark] SparkQA commented on pull request #31783: [SPARK-34152][SQL][FOLLOW-UP] Global temp view's identifier should be correctly stored
SparkQA commented on pull request #31783: URL: https://github.com/apache/spark/pull/31783#issuecomment-793495385 **[Test build #135900 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135900/testReport)** for PR 31783 at commit [`3d2ef7f`](https://github.com/apache/spark/commit/3d2ef7f7b424985d035a19a4ec73f8e608221191).
[GitHub] [spark] SparkQA commented on pull request #31721: [SPARK-34603][SQL] Support ADD ARCHIVE and LIST ARCHIVES command
SparkQA commented on pull request #31721: URL: https://github.com/apache/spark/pull/31721#issuecomment-793489191 **[Test build #135902 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135902/testReport)** for PR 31721 at commit [`ff708fa`](https://github.com/apache/spark/commit/ff708fa75c8a16943d0187d7854cd57480e22478).
[GitHub] [spark] SparkQA commented on pull request #31769: [SPARK-34649][SQL][DOCS] org.apache.spark.sql.DataFrameNaFunctions.replace() fails for column name having a dot
SparkQA commented on pull request #31769: URL: https://github.com/apache/spark/pull/31769#issuecomment-793489039 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40481/
[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.
AngersZhuuuu commented on pull request #31522: URL: https://github.com/apache/spark/pull/31522#issuecomment-793484750 In fact, the cluster environments of many companies are not so healthy, and there are often slow nodes that make the commit and the Hive metadata load table/partition steps very slow. We can indeed view this through the log, but for a long-running service, especially the Spark Thrift Server, where we have a lot of SQL running, we also need to go to the background log to find and confirm which SQL the log belongs to. Under normal circumstances, it is when our SQL runs for a long time or has a problem that we go to view this metrics information.
[GitHub] [spark] sarutak commented on a change in pull request #31754: [SPARK-25769][SPARK-34636][SPARK-34626][SQL] sql method in UnresolvedAttribute, AttributeReference and Alias don't quote qualifie
sarutak commented on a change in pull request #31754: URL: https://github.com/apache/spark/pull/31754#discussion_r590018226
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala ##
@@ -141,8 +140,7 @@ case class UnresolvedTableValuedFunction(
  */
 case class UnresolvedAttribute(nameParts: Seq[String]) extends Attribute with Unevaluable {

-  def name: String =
-    nameParts.map(n => if (n.contains(".")) s"`$n`" else n).mkString(".")
+  def name: String = nameParts.map(n => if (n.contains(".")) s"`$n`" else n).mkString(".")
Review comment: I think it's correct that `($"`a.b`.c".expr.sql)` returns ``` `a.b`.`c` ```. For `UnresolvedAttribute`, we can specify each name part split by `::`, so it's not necessary to quote them. For `$"name"` or `col()`, we can only pass a single string, so we need to quote manually. By the way, I've already noticed that we can't write ``` $```a.b``` or col(```a.b```) ``` because `UnresolvedAttribute.parseAttributeName` doesn't handle escaped names properly. I believe it's a bug, but it's out of scope for this PR and I'll fix it in another PR.
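As an illustration of the quoting discussed in this comment, here is a minimal standalone sketch. It is not the Spark class: `name` copies the one-line expression from the hunk above, while `quoteAll` is a hypothetical helper (not from the PR) that shows the fully quoted form the comment expects from `.sql`. It does not escape backticks inside a part, which a real implementation would presumably need to do.

```scala
object NamePartsSketch {
  // Same expression as the `name` definition in the diff:
  // only parts that contain a dot are wrapped in backticks.
  def name(nameParts: Seq[String]): String =
    nameParts.map(n => if (n.contains(".")) s"`$n`" else n).mkString(".")

  // Hypothetical helper: wrap every part in backticks.
  // Assumption: parts contain no backticks themselves (no escaping is done).
  def quoteAll(nameParts: Seq[String]): String =
    nameParts.map(n => s"`$n`").mkString(".")
}
```

With the parts `"a.b" :: "c" :: Nil`, `name` quotes only the first part, whereas `quoteAll` quotes both.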
[GitHub] [spark] SparkQA commented on pull request #31735: [WIP][SPARK-34600][Pyspark][SQL] Return User-defined types from Pandas UDF
SparkQA commented on pull request #31735: URL: https://github.com/apache/spark/pull/31735#issuecomment-793479185 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40480/
[GitHub] [spark] SparkQA commented on pull request #31758: [SPARK-34639][SQL] Always remove unnecessary Alias in Analyzer.resolveExpression
SparkQA commented on pull request #31758: URL: https://github.com/apache/spark/pull/31758#issuecomment-793478886 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40479/
[GitHub] [spark] SparkQA commented on pull request #31736: [SPARK-34620][SQL] Code-gen broadcast nested loop join (inner/cross)
SparkQA commented on pull request #31736: URL: https://github.com/apache/spark/pull/31736#issuecomment-793476281 **[Test build #135901 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135901/testReport)** for PR 31736 at commit [`6f6d511`](https://github.com/apache/spark/commit/6f6d511e571a2109cba4dcd6b31fa3def4e44e21).
[GitHub] [spark] imback82 commented on a change in pull request #31783: [SPARK-34152][SQL][FOLLOW-UP] Global temp view's identifier should be correctly stored
imback82 commented on a change in pull request #31783: URL: https://github.com/apache/spark/pull/31783#discussion_r590009335
## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewTestSuite.scala ##
@@ -293,22 +295,45 @@ abstract class SQLViewTestSuite extends QueryTest with SQLTestUtils {
       }
     }
   }
+
+  test("SPARK-34152: view's identifier should be correctly stored") {
+    Seq(true, false).foreach { storeAnalyzed =>
+      withSQLConf(STORE_ANALYZED_PLAN_FOR_VIEW.key -> storeAnalyzed.toString) {
Review comment: This has no effect for a permanent view; please let me know if this is not desirable. I can introduce `isTempView` for each suite and only apply this if the suite is for a local or global temp view.
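The pattern in the hunk above, running the same test body once per config value, can be sketched without Spark. Everything below is a stand-in written for illustration: `conf`, `withConf`, and the key name are hypothetical and are not Spark's `SQLConf` or `withSQLConf`. The sketch only shows the set-then-restore behaviour such a helper is generally expected to have.

```scala
import scala.collection.mutable

object WithConfSketch {
  // Hypothetical in-memory config store standing in for a session conf.
  val conf: mutable.Map[String, String] = mutable.Map.empty

  // Set the given pairs, run the body, then restore the previous values
  // even if the body throws.
  def withConf[T](pairs: (String, String)*)(body: => T): T = {
    val previous = pairs.map { case (k, _) => k -> conf.get(k) }
    pairs.foreach { case (k, v) => conf(k) = v }
    try body
    finally previous.foreach {
      case (k, Some(old)) => conf(k) = old
      case (k, None)      => conf.remove(k)
    }
  }

  // Run the same body once per flag value, as the test in the diff does,
  // and collect the value the body observed each time.
  def observedValues(key: String): Seq[String] =
    Seq(true, false).map { flag =>
      withConf(key -> flag.toString) { conf(key) }
    }
}
```

When the flag has no effect for a given view type, as noted for permanent views, one of the two iterations is redundant rather than wrong.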
[GitHub] [spark] maropu opened a new pull request #31785: [SPARK-33498][SQL][FOLLOWUP] Remove SQLConf.withExistingConf in CastSuite
maropu opened a new pull request #31785: URL: https://github.com/apache/spark/pull/31785
### What changes were proposed in this pull request?
This PR intends to remove the unnecessary `SQLConf.withExistingConf` in `CastSuite`; since we've removed `ParVector` in #31775, we no longer need to copy SQL configs into each thread env.
### Why are the changes needed?
Clean up the code.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Run the existing tests.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31769: [SPARK-34649][SQL][DOCS] org.apache.spark.sql.DataFrameNaFunctions.replace() fails for column name having a dot
AmplabJenkins removed a comment on pull request #31769: URL: https://github.com/apache/spark/pull/31769#issuecomment-793474381
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31758: [SPARK-34639][SQL] Always remove unnecessary Alias in Analyzer.resolveExpression
AmplabJenkins removed a comment on pull request #31758: URL: https://github.com/apache/spark/pull/31758#issuecomment-793474383 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135897/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31204: [SPARK-26399][WEBUI][CORE] Add new stage-level REST APIs and parameters
AmplabJenkins removed a comment on pull request #31204: URL: https://github.com/apache/spark/pull/31204#issuecomment-793474386 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135893/
[GitHub] [spark] AmplabJenkins commented on pull request #31758: [SPARK-34639][SQL] Always remove unnecessary Alias in Analyzer.resolveExpression
AmplabJenkins commented on pull request #31758: URL: https://github.com/apache/spark/pull/31758#issuecomment-793474383 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135897/
[GitHub] [spark] AmplabJenkins commented on pull request #31204: [SPARK-26399][WEBUI][CORE] Add new stage-level REST APIs and parameters
AmplabJenkins commented on pull request #31204: URL: https://github.com/apache/spark/pull/31204#issuecomment-793474386 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135893/
[GitHub] [spark] AmplabJenkins commented on pull request #31769: [SPARK-34649][SQL][DOCS] org.apache.spark.sql.DataFrameNaFunctions.replace() fails for column name having a dot
AmplabJenkins commented on pull request #31769: URL: https://github.com/apache/spark/pull/31769#issuecomment-793474382
[GitHub] [spark] cloud-fan commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.
cloud-fan commented on pull request #31522: URL: https://github.com/apache/spark/pull/31522#issuecomment-793469062 After a second look, I think it's rare that job committing takes a lot of time. If it happens, we can look at the logs to see the commit duration (as well as the Hive LOAD TABLE duration). Most of the time this metric won't be interesting to users. Thus, I think we don't need to add this metric.
[GitHub] [spark] SparkQA commented on pull request #31769: [SPARK-34649][SQL][DOCS] org.apache.spark.sql.DataFrameNaFunctions.replace() fails for column name having a dot
SparkQA commented on pull request #31769: URL: https://github.com/apache/spark/pull/31769#issuecomment-793467166 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40478/
[GitHub] [spark] SparkQA removed a comment on pull request #31204: [SPARK-26399][WEBUI][CORE] Add new stage-level REST APIs and parameters
SparkQA removed a comment on pull request #31204: URL: https://github.com/apache/spark/pull/31204#issuecomment-793384593 **[Test build #135893 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135893/testReport)** for PR 31204 at commit [`135291d`](https://github.com/apache/spark/commit/135291db332a86870ed9ea6af37ee6398a60f808).
[GitHub] [spark] SparkQA commented on pull request #31204: [SPARK-26399][WEBUI][CORE] Add new stage-level REST APIs and parameters
SparkQA commented on pull request #31204: URL: https://github.com/apache/spark/pull/31204#issuecomment-793467097 **[Test build #135893 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135893/testReport)** for PR 31204 at commit [`135291d`](https://github.com/apache/spark/commit/135291db332a86870ed9ea6af37ee6398a60f808).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] Ngone51 commented on a change in pull request #31763: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager
Ngone51 commented on a change in pull request #31763: URL: https://github.com/apache/spark/pull/31763#discussion_r589993348
## File path: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ##
@@ -52,6 +52,13 @@ private[spark] sealed trait MapStatus {
    * partitionId of the task or taskContext.taskAttemptId is used.
    */
   def mapId: Long
+
+  /**
+   * Extra metadata for map status. This could be used by different ShuffleManager implementation
+   * to store information they need. For example, a Remote Shuffle Service ShuffleManager could
+   * store shuffle server information and let reducer task know where to fetch shuffle data.
+   */
+  def metadata: Option[Serializable]
Review comment: I can see how this PR would work. But even if we merged this PR, the ongoing SPARK-31801 may override your change before the release. `getAllMapOutputStatusMetadata` isn't an API, so you may suffer a breaking change. We're discussing a general way in SPARK-31801 to provide the API for users. IIUC, it'd also benefit your case. WDYT?
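To make the proposed `metadata` hook concrete, here is a minimal sketch of how a custom shuffle manager might use it, following the example given in the doc comment of the hunk above. None of these types are Spark's: `MapStatusLike`, `ShuffleServerInfo`, `RemoteMapStatus`, and `fetchLocation` are hypothetical names introduced only for illustration, and the real `MapStatus` trait is `private[spark]` and sealed.

```scala
object MapStatusMetadataSketch {
  // Hypothetical stand-in for the trait in the diff, reduced to the
  // two members that matter for this sketch.
  trait MapStatusLike {
    def mapId: Long
    def metadata: Option[java.io.Serializable]
  }

  // Hypothetical payload: where a remote shuffle service stored the map output.
  final case class ShuffleServerInfo(host: String, port: Int)

  // A map status that carries the shuffle server location as metadata.
  final case class RemoteMapStatus(mapId: Long, server: ShuffleServerInfo)
      extends MapStatusLike {
    def metadata: Option[java.io.Serializable] = Some(server)
  }

  // Reducer side: recover the fetch location if the metadata has the expected type.
  def fetchLocation(status: MapStatusLike): Option[String] =
    status.metadata.collect { case ShuffleServerInfo(h, p) => s"$h:$p" }
}
```

Because the payload is typed only as `Serializable`, the reader has to pattern match on the concrete type, which is one reason a more general API, as discussed in the review comment, could be preferable.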
[GitHub] [spark] cloud-fan closed pull request #31779: [SPARK-34663][SQL][TESTS] Test year-month and day-time intervals in UDF
cloud-fan closed pull request #31779: URL: https://github.com/apache/spark/pull/31779
[GitHub] [spark] cloud-fan commented on pull request #31779: [SPARK-34663][SQL][TESTS] Test year-month and day-time intervals in UDF
cloud-fan commented on pull request #31779: URL: https://github.com/apache/spark/pull/31779#issuecomment-793466305 thanks, merging to master!
[GitHub] [spark] SparkQA commented on pull request #31769: [SPARK-34649][SQL][DOCS] org.apache.spark.sql.DataFrameNaFunctions.replace() fails for column name having a dot
SparkQA commented on pull request #31769: URL: https://github.com/apache/spark/pull/31769#issuecomment-793465124 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40478/
[GitHub] [spark] yaooqinn commented on pull request #31782: [SPARK-34666][SQL][TESTS] Test DayTimeIntervalType and YearMonthIntervalType as ordered and atomic types
yaooqinn commented on pull request #31782: URL: https://github.com/apache/spark/pull/31782#issuecomment-793464997 +1
[GitHub] [spark] dongjoon-hyun commented on pull request #31774: [SPARK-34659] Fix that Web UI always correctly get appId
dongjoon-hyun commented on pull request #31774: URL: https://github.com/apache/spark/pull/31774#issuecomment-793459875 Thank you for making a PR, @ornew . cc @gengliangwang
[GitHub] [spark] cloud-fan commented on pull request #31782: [SPARK-34666][SQL][TESTS] Test DayTimeIntervalType and YearMonthIntervalType as ordered and atomic types
cloud-fan commented on pull request #31782: URL: https://github.com/apache/spark/pull/31782#issuecomment-793459044 thanks, merging to master!
[GitHub] [spark] cloud-fan closed pull request #31782: [SPARK-34666][SQL][TESTS] Test DayTimeIntervalType and YearMonthIntervalType as ordered and atomic types
cloud-fan closed pull request #31782: URL: https://github.com/apache/spark/pull/31782
[GitHub] [spark] dongjoon-hyun commented on pull request #31784: [SPARK-34670][BUILD] Upgrade ZSTD-JNI to 1.4.9-1
dongjoon-hyun commented on pull request #31784: URL: https://github.com/apache/spark/pull/31784#issuecomment-793458106 Merged to master.
[GitHub] [spark] dongjoon-hyun closed pull request #31784: [SPARK-34670][BUILD] Upgrade ZSTD-JNI to 1.4.9-1
dongjoon-hyun closed pull request #31784: URL: https://github.com/apache/spark/pull/31784
[GitHub] [spark] sarutak commented on a change in pull request #31721: [SPARK-34603][SQL] Support ADD ARCHIVE and LIST ARCHIVES command
sarutak commented on a change in pull request #31721: URL: https://github.com/apache/spark/pull/31721#discussion_r589978096 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala ## @@ -857,6 +857,47 @@ class HiveQuerySuite extends HiveComparisonTest with SQLTestUtils with BeforeAnd assert(sql(s"list file $testFile").count() == 1) } + test("ADD ARCHIVE command") { Review comment: Done.
[GitHub] [spark] sarutak commented on a change in pull request #31721: [SPARK-34603][SQL] Support ADD ARCHIVE and LIST ARCHIVES command
sarutak commented on a change in pull request #31721: URL: https://github.com/apache/spark/pull/31721#discussion_r589978011 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala ## @@ -857,6 +857,47 @@ class HiveQuerySuite extends HiveComparisonTest with SQLTestUtils with BeforeAnd assert(sql(s"list file $testFile").count() == 1) } + test("ADD ARCHIVE command") { +withTempDir { dir => + val file1 = File.createTempFile("someprefix1", "somesuffix1", dir) + val zipFile = new File(dir, "test.zip") + val file2 = File.createTempFile("someprefix2", "somesuffix2", dir) + // Emulate unsupported archive format with .bz2 suffix. + val unsupportedArchive = new File(dir, "test.bz2") Review comment: Thanks for the advice. I've split it.
[GitHub] [spark] c21 commented on pull request #31736: [SPARK-34620][SQL] Code-gen broadcast nested loop join (inner/cross)
c21 commented on pull request #31736: URL: https://github.com/apache/spark/pull/31736#issuecomment-793454012 Addressed all comments for now, and the PR is ready for check again, thanks.
[GitHub] [spark] c21 commented on a change in pull request #31736: [SPARK-34620][SQL] Code-gen broadcast nested loop join (inner/cross)
c21 commented on a change in pull request #31736: URL: https://github.com/apache/spark/pull/31736#discussion_r589976077 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/JoinCodegenSupport.scala ## @@ -0,0 +1,93 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.joins + +import org.apache.spark.sql.catalyst.expressions.{BindReferences, BoundReference} +import org.apache.spark.sql.catalyst.expressions.codegen._ +import org.apache.spark.sql.catalyst.expressions.codegen.Block._ +import org.apache.spark.sql.catalyst.plans.InnerLike +import org.apache.spark.sql.execution.{CodegenSupport, SparkPlan} + +/** + * An interface for those join physical operators that support codegen. + */ +trait JoinCodegenSupport extends CodegenSupport with BaseJoinExec { + + /** + * Generate the (non-equi) condition used to filter joined rows. This is used in Inner, Left Semi + * and Left Anti joins. 
+ */ + protected def getJoinCondition( + ctx: CodegenContext, + input: Seq[ExprCode], + streamedPlan: SparkPlan, + buildPlan: SparkPlan): (String, String, Seq[ExprCode]) = { +val buildRow = ctx.freshName("buildRow") +val buildVars = genBuildSideVars(ctx, buildRow, buildPlan) +val checkCondition = if (condition.isDefined) { + val expr = condition.get + // evaluate the variables from build side that used by condition Review comment: Updated to evaluate build side only. Same as before in `HashJoin.scala`. ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/JoinCodegenSupport.scala ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.joins + +import org.apache.spark.sql.catalyst.expressions.{BindReferences, BoundReference} +import org.apache.spark.sql.catalyst.expressions.codegen._ +import org.apache.spark.sql.catalyst.expressions.codegen.Block._ +import org.apache.spark.sql.catalyst.plans.InnerLike +import org.apache.spark.sql.execution.{CodegenSupport, SparkPlan} + +/** + * An interface for those join physical operators that support codegen. 
+ */ +trait JoinCodegenSupport extends CodegenSupport with BaseJoinExec { + + /** + * Generate the (non-equi) condition used to filter joined rows. + * This is used in Inner, Left Semi and Left Anti joins. + * + * @return Tuple of variable name for row of build side, generated code for condition, + * and generated code for variables of build side. + */ + protected def getJoinCondition( + ctx: CodegenContext, + input: Seq[ExprCode], Review comment: @cloud-fan - updated.
[GitHub] [spark] amandeep-sharma commented on a change in pull request #31769: [SPARK-34649][SQL][DOCS] org.apache.spark.sql.DataFrameNaFunctions.replace() fails for column name having a dot
amandeep-sharma commented on a change in pull request #31769: URL: https://github.com/apache/spark/pull/31769#discussion_r589975656 ## File path: docs/sql-migration-guide.md ## @@ -66,6 +66,8 @@ license: | - In Spark 3.2, the output schema of `SHOW TBLPROPERTIES` becomes `key: string, value: string` whether you specify the table property key or not. In Spark 3.1 and earlier, the output schema of `SHOW TBLPROPERTIES` is `value: string` when you specify the table property key. To restore the old schema with the builtin catalog, you can set `spark.sql.legacy.keepCommandOutputSchema` to `true`. - In Spark 3.2, we support typed literals in the partition spec of INSERT and ADD/DROP/RENAME PARTITION. For example, `ADD PARTITION(dt = date'2020-01-01')` adds a partition with date value `2020-01-01`. In Spark 3.1 and earlier, the partition value will be parsed as string value `date '2020-01-01'`, which is an illegal date value, and we add a partition with null value at the end. + + - In Spark 3.2, `DataFrameNaFunctions.replace()` no longer uses exact string match for the input column names. Input column name having a dot in the name (not nested) needs to be escaped with backtick \`. Now, it throws `AnalysisException` if the column is not found in the data frame schema. It also throws `IllegalArgumentException` if the input column name is a nested column. In Spark 3.1 and earlier, it used to ignore invalid input column name and nested column name. Review comment: Done. thanks!
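The migration note quoted in the message above can be illustrated with a minimal sketch. It assumes a local Spark 3.2+ session; the object name, the column names `a.b` and `c`, and the replacement values are made up for illustration and do not come from the PR.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative only: object, column names and values are assumptions.
object NaReplaceDottedColumn {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("na-replace-dotted-column")
      .getOrCreate()
    import spark.implicits._

    // "a.b" is a top-level column whose name contains a dot; it is not a nested field.
    val df = Seq(("x", 1), ("y", 2)).toDF("a.b", "c")

    // Per the migration note, the dotted name is escaped with backticks so that
    // it is resolved as a single column rather than as field `b` of column `a`.
    val replaced = df.na.replace("`a.b`", Map("x" -> "z"))
    replaced.show()

    spark.stop()
  }
}
```

According to the same note, passing a name that is not in the schema is expected to raise `AnalysisException` in Spark 3.2, whereas Spark 3.1 and earlier ignored it.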
[GitHub] [spark] SparkQA commented on pull request #31769: [SPARK-34649][SQL][DOCS] org.apache.spark.sql.DataFrameNaFunctions.replace() fails for column name having a dot
SparkQA commented on pull request #31769: URL: https://github.com/apache/spark/pull/31769#issuecomment-793452495 **[Test build #135898 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135898/testReport)** for PR 31769 at commit [`58679bd`](https://github.com/apache/spark/commit/58679bd9e27bb376fd9e6ea2baa780fe3454f23a).
[GitHub] [spark] cloud-fan commented on pull request #31769: [SPARK-34649][SQL][DOCS] org.apache.spark.sql.DataFrameNaFunctions.replace() fails for column name having a dot
cloud-fan commented on pull request #31769: URL: https://github.com/apache/spark/pull/31769#issuecomment-793451199 retest this please
[GitHub] [spark] dongjoon-hyun commented on pull request #31784: [SPARK-34670][BUILD] Upgrade ZSTD-JNI to 1.4.9-1
dongjoon-hyun commented on pull request #31784: URL: https://github.com/apache/spark/pull/31784#issuecomment-793447703 Thank you, @HyukjinKwon !
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31749: [SPARK-34627][SQL] Use FunctionIdentifier in UnresolvedTableValuedFunction
AmplabJenkins removed a comment on pull request #31749: URL: https://github.com/apache/spark/pull/31749#issuecomment-793445250 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135886/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31735: [WIP][SPARK-34600][Pyspark][SQL] Return User-defined types from Pandas UDF
AmplabJenkins removed a comment on pull request #31735: URL: https://github.com/apache/spark/pull/31735#issuecomment-793445252 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135895/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31204: [SPARK-26399][WEBUI][CORE] Add new stage-level REST APIs and parameters
AmplabJenkins removed a comment on pull request #31204: URL: https://github.com/apache/spark/pull/31204#issuecomment-793445253 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40476/
[GitHub] [spark] AmplabJenkins commented on pull request #31749: [SPARK-34627][SQL] Use FunctionIdentifier in UnresolvedTableValuedFunction
AmplabJenkins commented on pull request #31749: URL: https://github.com/apache/spark/pull/31749#issuecomment-793445250 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135886/
[GitHub] [spark] AmplabJenkins commented on pull request #31735: [WIP][SPARK-34600][Pyspark][SQL] Return User-defined types from Pandas UDF
AmplabJenkins commented on pull request #31735: URL: https://github.com/apache/spark/pull/31735#issuecomment-793445252 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135895/
[GitHub] [spark] AmplabJenkins commented on pull request #31204: [SPARK-26399][WEBUI][CORE] Add new stage-level REST APIs and parameters
AmplabJenkins commented on pull request #31204: URL: https://github.com/apache/spark/pull/31204#issuecomment-793445253 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40476/
[GitHub] [spark] AmplabJenkins commented on pull request #31784: [SPARK-34670][BUILD] Upgrade ZSTD-JNI to 1.4.9-1
AmplabJenkins commented on pull request #31784: URL: https://github.com/apache/spark/pull/31784#issuecomment-793445256
[GitHub] [spark] SparkQA commented on pull request #31204: [SPARK-26399][WEBUI][CORE] Add new stage-level REST APIs and parameters
SparkQA commented on pull request #31204: URL: https://github.com/apache/spark/pull/31204#issuecomment-793438522 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40476/
[GitHub] [spark] hiboyang commented on a change in pull request #31763: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager
hiboyang commented on a change in pull request #31763: URL: https://github.com/apache/spark/pull/31763#discussion_r589964037 ## File path: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ## @@ -52,6 +52,13 @@ private[spark] sealed trait MapStatus { * partitionId of the task or taskContext.taskAttemptId is used. */ def mapId: Long + + /** + * Extra metadata for map status. This could be used by different ShuffleManager implementation + * to store information they need. For example, a Remote Shuffle Service ShuffleManager could + * store shuffle server information and let reducer task know where to fetch shuffle data. + */ + def metadata: Option[Serializable] Review comment: Yes, in this case I agree with you that this PR is a different topic from [SPARK-25299](https://issues.apache.org/jira/browse/SPARK-25299), and that the community does not have enough bandwidth to work on these two significant projects concurrently. This PR adds a simple change that lets different custom shuffle managers add their own metadata to the MapStatus object. Could we proceed with reviewing this PR then?
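The `metadata` hook proposed in the diff above can be sketched in plain Scala. This is a simplified stand-in, not Spark's actual `MapStatus` (which is `private[spark]` and sealed); `MapStatusSketch`, `ShuffleServerInfo`, `RemoteMapStatus` and `fetchLocation` are hypothetical names introduced only for illustration.

```scala
// Simplified stand-in for the trait in the diff; only the two members
// relevant to the discussion are modeled.
trait MapStatusSketch {
  def mapId: Long
  def metadata: Option[Serializable]
}

// Hypothetical metadata a remote shuffle service could attach.
final case class ShuffleServerInfo(host: String, port: Int)

// Hypothetical map status carrying optional, manager-specific metadata.
final case class RemoteMapStatus(mapId: Long, metadata: Option[Serializable])
  extends MapStatusSketch

object MapStatusMetadataDemo {
  // Reducer-side helper: recover the shuffle server location if the
  // metadata is of the type this (hypothetical) shuffle manager wrote.
  def fetchLocation(status: MapStatusSketch): Option[String] =
    status.metadata.collect { case ShuffleServerInfo(host, port) => s"$host:$port" }

  def main(args: Array[String]): Unit = {
    val withServer = RemoteMapStatus(1L, Some(ShuffleServerInfo("shuffle-1.example", 9099)))
    val plain = RemoteMapStatus(2L, None)
    println(fetchLocation(withServer))
    println(fetchLocation(plain))
  }
}
```

Typing the field as `Option[Serializable]` keeps the trait agnostic of any particular shuffle manager; each implementation pattern-matches on the metadata type it wrote itself and ignores anything else.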
[GitHub] [spark] viirya commented on pull request #28715: [SPARK-31897][SQL] Enable codegen for GenerateExec
viirya commented on pull request #28715: URL: https://github.com/apache/spark/pull/28715#issuecomment-793435653 BTW, although there is a performance benchmark in the PR description, it will not be tracked later. I think it may be good to have a benchmark suite for Generate codegen on/off too. We can follow the other benchmark suites.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31783: [SPARK-34152][SQL][FOLLOW-UP] Global temp view's identifier should be correctly stored
dongjoon-hyun commented on a change in pull request #31783: URL: https://github.com/apache/spark/pull/31783#discussion_r589960978 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite.scala ## @@ -909,4 +910,21 @@ abstract class SQLViewSuite extends QueryTest with SQLTestUtils { } } } + + test("SPARK-34152: global temp view's identifier should be correctly stored") { Review comment: +1 for @cloud-fan 's suggestion.
[GitHub] [spark] SparkQA removed a comment on pull request #31749: [SPARK-34627][SQL] Use FunctionIdentifier in UnresolvedTableValuedFunction
SparkQA removed a comment on pull request #31749: URL: https://github.com/apache/spark/pull/31749#issuecomment-793206590 **[Test build #135886 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135886/testReport)** for PR 31749 at commit [`d12e984`](https://github.com/apache/spark/commit/d12e9847915d1ed26eb0324eb65fd875f3f9910e).
[GitHub] [spark] MaxGekk commented on pull request #31782: [SPARK-34666][SQL][TESTS] Test DayTimeIntervalType and YearMonthIntervalType as ordered and atomic types
MaxGekk commented on pull request #31782: URL: https://github.com/apache/spark/pull/31782#issuecomment-793428096 @cloud-fan @yaooqinn @HyukjinKwon Could you review this PR, please?
[GitHub] [spark] SparkQA commented on pull request #31749: [SPARK-34627][SQL] Use FunctionIdentifier in UnresolvedTableValuedFunction
SparkQA commented on pull request #31749: URL: https://github.com/apache/spark/pull/31749#issuecomment-793427797 **[Test build #135886 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135886/testReport)** for PR 31749 at commit [`d12e984`](https://github.com/apache/spark/commit/d12e9847915d1ed26eb0324eb65fd875f3f9910e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] cloud-fan commented on a change in pull request #31758: [SPARK-34639][SQL] Always remove unnecessary Alias in Analyzer.resolveExpression
cloud-fan commented on a change in pull request #31758: URL: https://github.com/apache/spark/pull/31758#discussion_r589955898 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -1905,11 +1904,13 @@ class Analyzer(override val catalogManager: CatalogManager) .getOrElse(u) } val result = resolved match { -// When trimAlias = true, we will trim unnecessary alias of `GetStructField` and -// we won't trim the alias of top-level `GetStructField`. Since we will call -// CleanupAliases later in Analyzer, trim non top-level unnecessary alias of -// `GetStructField` here is safe. -case Alias(s: GetStructField, _) if trimAlias && !isTopLevel => s +// We trim unnecessary alias of `Get[Array]StructField` here. Note that, we cannot trim +// the alias of top-level `Get[Array]StructField`, as we should resolve +// `UnresolvedAttribute` to a named expression. The caller side can trim the alias of +// top-level `GetStructField` if it's safe to do so. Since we will call CleanupAliases +// later in Analyzer, trim non top-level unnecessary alias here is safe. +case Alias(s: GetStructField, _) if !isTopLevel => s +case Alias(s: GetArrayStructFields, _) if !isTopLevel => s Review comment: good catch!
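The trimming rule described in the diff comment above (drop the alias around a non-top-level `GetStructField` or `GetArrayStructFields`, keep it at the top level) can be sketched on a toy expression tree. This is not Catalyst: the case classes below only borrow the names from the diff, and `Attr`, `Add` and `trim` are assumptions made up for the illustration.

```scala
// Toy expression ADT; these are NOT Spark's Catalyst classes.
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class GetStructField(child: Expr, field: String) extends Expr
final case class GetArrayStructFields(child: Expr, field: String) extends Expr
final case class Alias(child: Expr, name: String) extends Expr
final case class Add(left: Expr, right: Expr) extends Expr

object TrimAliasSketch {
  // Remove aliases that wrap a struct-field extraction unless the alias is
  // the root of the expression, where the name must be preserved.
  def trim(e: Expr, isTopLevel: Boolean = true): Expr = e match {
    case Alias(s: GetStructField, _) if !isTopLevel => trim(s, isTopLevel = false)
    case Alias(s: GetArrayStructFields, _) if !isTopLevel => trim(s, isTopLevel = false)
    case Alias(c, n) => Alias(trim(c, isTopLevel = false), n)
    case Add(l, r) => Add(trim(l, isTopLevel = false), trim(r, isTopLevel = false))
    case GetStructField(c, f) => GetStructField(trim(c, isTopLevel = false), f)
    case GetArrayStructFields(c, f) => GetArrayStructFields(trim(c, isTopLevel = false), f)
    case a: Attr => a
  }

  def main(args: Array[String]): Unit = {
    val nested = Add(Alias(GetStructField(Attr("s"), "a"), "a"), Attr("x"))
    val top = Alias(GetStructField(Attr("s"), "a"), "a")
    println(trim(nested)) // alias inside Add is removed
    println(trim(top))    // top-level alias is kept
  }
}
```

The `isTopLevel` flag mirrors the guard in the diff: only the root of the resolved expression has to remain a named expression, so aliases anywhere below it can be dropped in this sketch.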
[GitHub] [spark] cloud-fan edited a comment on pull request #31771: [SPARK-34652][AVRO] Support SchemaRegistry in from_avro method
cloud-fan edited a comment on pull request #31771: URL: https://github.com/apache/spark/pull/31771#issuecomment-792717942 And are you sure that's the common way to use schema registry? I'm not a streaming expert but AFAIK the key advantage of schema registry is to support schema evolution. It's not like the Hive catalog that just stores the metadata. The data should also contain the schema to properly support schema evolution. It's an integrated solution (Kafka + Avro + schema registry).
[GitHub] [spark] cloud-fan commented on a change in pull request #31736: [SPARK-34620][SQL] Code-gen broadcast nested loop join (inner/cross)
cloud-fan commented on a change in pull request #31736: URL: https://github.com/apache/spark/pull/31736#discussion_r589954425 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/JoinCodegenSupport.scala ## @@ -0,0 +1,93 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.joins + +import org.apache.spark.sql.catalyst.expressions.{BindReferences, BoundReference} +import org.apache.spark.sql.catalyst.expressions.codegen._ +import org.apache.spark.sql.catalyst.expressions.codegen.Block._ +import org.apache.spark.sql.catalyst.plans.InnerLike +import org.apache.spark.sql.execution.{CodegenSupport, SparkPlan} + +/** + * An interface for those join physical operators that support codegen. + */ +trait JoinCodegenSupport extends CodegenSupport with BaseJoinExec { + + /** + * Generate the (non-equi) condition used to filter joined rows. This is used in Inner, Left Semi + * and Left Anti joins. 
+ */ + protected def getJoinCondition( + ctx: CodegenContext, + input: Seq[ExprCode], + streamedPlan: SparkPlan, + buildPlan: SparkPlan): (String, String, Seq[ExprCode]) = { +val buildRow = ctx.freshName("buildRow") +val buildVars = genBuildSideVars(ctx, buildRow, buildPlan) +val checkCondition = if (condition.isDefined) { + val expr = condition.get + // evaluate the variables from build side that used by condition Review comment: ah, good catch! @viirya is right, we don't need to do that.
[GitHub] [spark] cloud-fan closed pull request #31749: [SPARK-34627][SQL] Use FunctionIdentifier in UnresolvedTableValuedFunction
cloud-fan closed pull request #31749: URL: https://github.com/apache/spark/pull/31749
[GitHub] [spark] cloud-fan commented on pull request #31749: [SPARK-34627][SQL] Use FunctionIdentifier in UnresolvedTableValuedFunction
cloud-fan commented on pull request #31749: URL: https://github.com/apache/spark/pull/31749#issuecomment-793420963 thanks, merging to master!
[GitHub] [spark] SparkQA commented on pull request #31204: [SPARK-26399][WEBUI][CORE] Add new stage-level REST APIs and parameters
SparkQA commented on pull request #31204: URL: https://github.com/apache/spark/pull/31204#issuecomment-793416850 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40476/
[GitHub] [spark] cloud-fan commented on a change in pull request #31769: [SPARK-34649][SQL][DOCS] org.apache.spark.sql.DataFrameNaFunctions.replace() fails for column name having a dot
cloud-fan commented on a change in pull request #31769: URL: https://github.com/apache/spark/pull/31769#discussion_r589949658 ## File path: docs/sql-migration-guide.md ## @@ -66,6 +66,8 @@ license: | - In Spark 3.2, the output schema of `SHOW TBLPROPERTIES` becomes `key: string, value: string` whether you specify the table property key or not. In Spark 3.1 and earlier, the output schema of `SHOW TBLPROPERTIES` is `value: string` when you specify the table property key. To restore the old schema with the builtin catalog, you can set `spark.sql.legacy.keepCommandOutputSchema` to `true`. - In Spark 3.2, we support typed literals in the partition spec of INSERT and ADD/DROP/RENAME PARTITION. For example, `ADD PARTITION(dt = date'2020-01-01')` adds a partition with date value `2020-01-01`. In Spark 3.1 and earlier, the partition value will be parsed as string value `date '2020-01-01'`, which is an illegal date value, and we add a partition with null value at the end. + + - In Spark 3.2, `DataFrameNaFunctions.replace()` no longer uses exact string match for the input column names. Input column name having a dot in the name (not nested) needs to be escaped with backtick \`. Now, it throws `AnalysisException` if the column is not found in the data frame schema. It also throws `IllegalArgumentException` if the input column name is a nested column. In Spark 3.1 and earlier, it used to ignore invalid input column name and nested column name. Review comment: looks good, maybe also explain a little bit why we need to make this change: ``` ... no longer uses exact string match for the input column names, to match the SQL syntax and support qualified column names. ... ```
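The name-resolution rule described in the migration note above can be illustrated with a small, Spark-free sketch. Everything here is hypothetical and only meant to mirror the documented behavior: `ColumnNameSketch`, `parseColumnName`, and `resolve` are illustrative names, not Spark APIs; the schema is modeled as a map from top-level column name to its nested field names; and `Either` stands in for the `AnalysisException` and `IllegalArgumentException` that the note says Spark 3.2 throws.

```scala
object ColumnNameSketch {
  /** Splits a column name on dots, treating backtick-quoted segments as literal text. */
  def parseColumnName(name: String): Seq[String] = {
    val parts = scala.collection.mutable.ArrayBuffer.empty[String]
    val current = new StringBuilder
    var inBackticks = false
    name.foreach {
      case '`' => inBackticks = !inBackticks // toggle quoting, drop the backtick itself
      case '.' if !inBackticks =>
        parts += current.toString // an unquoted dot ends the current name part
        current.clear()
      case c => current.append(c)
    }
    parts += current.toString
    parts.toSeq
  }

  /**
   * Resolves a user-supplied name against a schema modeled as
   * top-level column name -> nested field names (empty for atomic columns).
   * Right(column) on success; Left(reason) stands in for the exceptions
   * mentioned in the migration note.
   */
  def resolve(name: String, schema: Map[String, Set[String]]): Either[String, String] =
    parseColumnName(name) match {
      case Seq(single) if schema.contains(single) => Right(single)
      case Seq(head, next, _*) if schema.get(head).exists(_.contains(next)) =>
        Left(s"nested column not supported: $name")
      case _ => Left(s"column not found: $name")
    }
}
```

With a schema holding a top-level column literally named `a.b` and a struct column `s` with field `x`, the escaped name `` `a.b` `` resolves, the unescaped `a.b` is reported as not found (it is read as field `b` of a missing column `a`), and `s.x` is rejected as nested.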
[GitHub] [spark] cloud-fan commented on a change in pull request #31783: [SPARK-34152][SQL][FOLLOW-UP] Global temp view's identifier should be correctly stored
cloud-fan commented on a change in pull request #31783: URL: https://github.com/apache/spark/pull/31783#discussion_r589948567 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite.scala ## @@ -909,4 +910,21 @@ abstract class SQLViewSuite extends QueryTest with SQLTestUtils { } } } + + test("SPARK-34152: global temp view's identifier should be correctly stored") { Review comment: can we move it to the new `GlobalTempViewTestSuite`? we should slowly move all the view tests to `SQLViewTestSuite` framework.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31781: [SPARK-34665][SQL][DOCS] Revise the type coercion section of ANSI Compliance
AmplabJenkins removed a comment on pull request #31781: URL: https://github.com/apache/spark/pull/31781#issuecomment-793408468
[GitHub] [spark] AmplabJenkins commented on pull request #31781: [SPARK-34665][SQL][DOCS] Revise the type coercion section of ANSI Compliance
AmplabJenkins commented on pull request #31781: URL: https://github.com/apache/spark/pull/31781#issuecomment-793408468
[GitHub] [spark] cloud-fan commented on pull request #31781: [SPARK-34665][SQL][DOCS] Revise the type coercion section of ANSI Compliance
cloud-fan commented on pull request #31781: URL: https://github.com/apache/spark/pull/31781#issuecomment-793406886 late LGTM
[GitHub] [spark] dongjoon-hyun opened a new pull request #31784: [SPARK-34670][BUILD] Upgrade ZSTD-JNI to 1.4.9-1
dongjoon-hyun opened a new pull request #31784: URL: https://github.com/apache/spark/pull/31784 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested?
[GitHub] [spark] SparkQA commented on pull request #31204: [SPARK-26399][WEBUI][CORE] Add new stage-level REST APIs and parameters
SparkQA commented on pull request #31204: URL: https://github.com/apache/spark/pull/31204#issuecomment-793384593 **[Test build #135893 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135893/testReport)** for PR 31204 at commit [`135291d`](https://github.com/apache/spark/commit/135291db332a86870ed9ea6af37ee6398a60f808).
[GitHub] [spark] maropu closed pull request #31781: [SPARK-34665][SQL][DOCS] Revise the type coercion section of ANSI Compliance
maropu closed pull request #31781: URL: https://github.com/apache/spark/pull/31781
[GitHub] [spark] maropu commented on pull request #31781: [SPARK-34665][SQL][DOCS] Revise the type coercion section of ANSI Compliance
maropu commented on pull request #31781: URL: https://github.com/apache/spark/pull/31781#issuecomment-793360706 Thanks for the update, @gengliangwang ! LGTM and I'll merge this.
[GitHub] [spark] imback82 commented on pull request #31783: [SPARK-34152][SQL][FOLLOW-UP] Global temp view's identifier should be correctly stored
imback82 commented on pull request #31783: URL: https://github.com/apache/spark/pull/31783#issuecomment-793343978 cc @cloud-fan TIA!
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31778: [SPARK-34545][SQL][3.0] Fix issues with valueCompare feature of pyrolite
AmplabJenkins removed a comment on pull request #31778: URL: https://github.com/apache/spark/pull/31778#issuecomment-793316732 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135885/
[GitHub] [spark] AmplabJenkins commented on pull request #31778: [SPARK-34545][SQL][3.0] Fix issues with valueCompare feature of pyrolite
AmplabJenkins commented on pull request #31778: URL: https://github.com/apache/spark/pull/31778#issuecomment-793316732 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135885/
[GitHub] [spark] SparkQA removed a comment on pull request #31778: [SPARK-34545][SQL][3.0] Fix issues with valueCompare feature of pyrolite
SparkQA removed a comment on pull request #31778: URL: https://github.com/apache/spark/pull/31778#issuecomment-793209612 **[Test build #135885 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135885/testReport)** for PR 31778 at commit [`ec857d2`](https://github.com/apache/spark/commit/ec857d26139d1256267423a2472d3055bd822ef2).
[GitHub] [spark] SparkQA commented on pull request #31778: [SPARK-34545][SQL][3.0] Fix issues with valueCompare feature of pyrolite
SparkQA commented on pull request #31778: URL: https://github.com/apache/spark/pull/31778#issuecomment-793315043 **[Test build #135885 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135885/testReport)** for PR 31778 at commit [`ec857d2`](https://github.com/apache/spark/commit/ec857d26139d1256267423a2472d3055bd822ef2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31782: [SPARK-34666][SQL][TESTS] Test DayTimeIntervalType and YearMonthIntervalType as ordered and atomic types
AmplabJenkins removed a comment on pull request #31782: URL: https://github.com/apache/spark/pull/31782#issuecomment-793313342 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135883/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31693: [SPARK-34448][ML] Binary logistic regression incorrectly computes the intercept and coefficients with small var features
AmplabJenkins removed a comment on pull request #31693: URL: https://github.com/apache/spark/pull/31693#issuecomment-793313120 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40471/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31769: [SPARK-34649][SQL][DOCS] org.apache.spark.sql.DataFrameNaFunctions.replace() fails for column name having a dot
AmplabJenkins removed a comment on pull request #31769: URL: https://github.com/apache/spark/pull/31769#issuecomment-793313121 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40474/
[GitHub] [spark] AmplabJenkins commented on pull request #31782: [SPARK-34666][SQL][TESTS] Test DayTimeIntervalType and YearMonthIntervalType as ordered and atomic types
AmplabJenkins commented on pull request #31782: URL: https://github.com/apache/spark/pull/31782#issuecomment-793313342 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135883/
[GitHub] [spark] AmplabJenkins commented on pull request #31783: [SPARK-34152][SQL][FOLLOW-UP] Global temp view's identifier should be correctly stored
AmplabJenkins commented on pull request #31783: URL: https://github.com/apache/spark/pull/31783#issuecomment-793313125 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40473/
[GitHub] [spark] AmplabJenkins commented on pull request #31693: [SPARK-34448][ML] Binary logistic regression incorrectly computes the intercept and coefficients with small var features
AmplabJenkins commented on pull request #31693: URL: https://github.com/apache/spark/pull/31693#issuecomment-793313120 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40471/
[GitHub] [spark] AmplabJenkins commented on pull request #31769: [SPARK-34649][SQL][DOCS] org.apache.spark.sql.DataFrameNaFunctions.replace() fails for column name having a dot
AmplabJenkins commented on pull request #31769: URL: https://github.com/apache/spark/pull/31769#issuecomment-793313121 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40474/
[GitHub] [spark] SparkQA removed a comment on pull request #31782: [SPARK-34666][SQL][TESTS] Test DayTimeIntervalType and YearMonthIntervalType as ordered and atomic types
SparkQA removed a comment on pull request #31782: URL: https://github.com/apache/spark/pull/31782#issuecomment-793141449 **[Test build #135883 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135883/testReport)** for PR 31782 at commit [`049edf6`](https://github.com/apache/spark/commit/049edf612566b4efa3cf4464d9fdf0c54347).
[GitHub] [spark] SparkQA commented on pull request #31782: [SPARK-34666][SQL][TESTS] Test DayTimeIntervalType and YearMonthIntervalType as ordered and atomic types
SparkQA commented on pull request #31782: URL: https://github.com/apache/spark/pull/31782#issuecomment-793312224 **[Test build #135883 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135883/testReport)** for PR 31782 at commit [`049edf6`](https://github.com/apache/spark/commit/049edf612566b4efa3cf4464d9fdf0c54347). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #31693: [SPARK-34448][ML] Binary logistic regression incorrectly computes the intercept and coefficients with small var features
SparkQA commented on pull request #31693: URL: https://github.com/apache/spark/pull/31693#issuecomment-793295064 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40471/
[GitHub] [spark] Ngone51 commented on a change in pull request #31763: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager
Ngone51 commented on a change in pull request #31763: URL: https://github.com/apache/spark/pull/31763#discussion_r589899892 ## File path: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ## @@ -52,6 +52,13 @@ private[spark] sealed trait MapStatus { * partitionId of the task or taskContext.taskAttemptId is used. */ def mapId: Long + + /** + * Extra metadata for map status. This could be used by different ShuffleManager implementation + * to store information they need. For example, a Remote Shuffle Service ShuffleManager could + * store shuffle server information and let reducer task know where to fetch shuffle data. + */ + def metadata: Option[Serializable] Review comment: Then, it'd be a totally different topic, right? IIUC, SPARK-25299 could also benefit custom shuffle managers if SPARK-25299 (custom storage) is pluggable with a custom shuffle manager. Ideally, a custom shuffle manager should be able to plug in different storages. We might not have thought deeply about how to support custom shuffle managers when working on SPARK-25299, but we need to keep the "pluggable" aspect in mind. After SPARK-25299 is completed, we can start to enhance the support for custom shuffle managers. That being said, I think you are still free to raise a separate discussion on supporting the custom shuffle manager. For example, what are the shortcomings of the current framework, and what should be improved... I just wonder whether the community has enough bandwidth to work on these two significant projects concurrently.
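To make the proposed `metadata` field in the diff above more concrete, here is a minimal sketch of how a remote-shuffle implementation might use it. It is an illustration under stated assumptions, not code from the PR: `MapStatusSketch` is a simplified stand-in for Spark's private `MapStatus` trait, and `ShuffleServerInfo`, `RemoteMapStatus`, and `fetchLocation` are hypothetical names. The sketch uses `java.io.Serializable` explicitly for the field type.

```scala
// Simplified stand-in for the trait in the diff; only the two members discussed here.
trait MapStatusSketch {
  def mapId: Long
  def metadata: Option[java.io.Serializable]
}

// Hypothetical payload a remote shuffle service could attach to a map status.
// Case classes are serializable, so instances fit the metadata field.
final case class ShuffleServerInfo(host: String, port: Int)

// Hypothetical concrete status carrying the optional metadata.
final case class RemoteMapStatus(mapId: Long, metadata: Option[java.io.Serializable])
  extends MapStatusSketch

object MetadataSketch {
  /** Reducer-side lookup: where to fetch shuffle data, if the writer recorded it. */
  def fetchLocation(status: MapStatusSketch): Option[String] =
    status.metadata.collect { case ShuffleServerInfo(host, port) => s"$host:$port" }
}
```

Because the field is an opaque `Option[Serializable]`, the reducer side has to pattern-match on the concrete payload type, and statuses written without metadata simply yield `None`.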
[GitHub] [spark] AngersZhuuuu commented on pull request #31598: [SPARK-34478][SQL] When build SparkSession, we should check config keys
AngersZhuuuu commented on pull request #31598: URL: https://github.com/apache/spark/pull/31598#issuecomment-793286274 > This might get a little hard to remember to maintain, but pretty OK. @cloud-fan WDYT? After https://github.com/apache/spark/pull/31598#discussion_r586110532, I think it will be easier.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31769: [SPARK-34649][SQL][DOCS] org.apache.spark.sql.DataFrameNaFunctions.replace() fails for column name having a dot
AmplabJenkins removed a comment on pull request #31769: URL: https://github.com/apache/spark/pull/31769#issuecomment-793282868 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135891/
[GitHub] [spark] AmplabJenkins commented on pull request #31769: [SPARK-34649][SQL][DOCS] org.apache.spark.sql.DataFrameNaFunctions.replace() fails for column name having a dot
AmplabJenkins commented on pull request #31769: URL: https://github.com/apache/spark/pull/31769#issuecomment-793282868 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135891/
[GitHub] [spark] AmplabJenkins commented on pull request #31783: [SPARK-34152][SQL][FOLLOW-UP] Global temp view's identifier should be correctly stored
AmplabJenkins commented on pull request #31783: URL: https://github.com/apache/spark/pull/31783#issuecomment-793282804 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135890/
[GitHub] [spark] sunchao commented on a change in pull request #31761: [SPARK-34295][CORE] Exclude filesystems from token renewal
sunchao commented on a change in pull request #31761: URL: https://github.com/apache/spark/pull/31761#discussion_r589882783 ## File path: core/src/main/scala/org/apache/spark/deploy/security/HadoopFSDelegationTokenProvider.scala ## @@ -99,11 +100,24 @@ private[deploy] class HadoopFSDelegationTokenProvider private def fetchDelegationTokens( renewer: String, filesystems: Set[FileSystem], - creds: Credentials): Credentials = { + creds: Credentials, + hadoopConf: Configuration, + sparkConf: SparkConf): Credentials = { + +// The hosts on which the file systems to be excluded from token renewal +val fsToExclude = sparkConf.get(KERBEROS_FILESYSTEM_RENEWAL_EXCLUDE) + .map(new Path(_).getFileSystem(hadoopConf).getUri.getHost) + .toSet filesystems.foreach { fs => - logInfo(s"getting token for: $fs with renewer $renewer") - fs.addDelegationTokens(renewer, creds) + if (fsToExclude.contains(fs.getUri.getHost)) { +// RM skips renewing token with empty renewer Review comment: Hmm, does it only apply to YARN though? It seems Spark has its own `HadoopDelegationTokenManager`, which is separate from YARN. Also, it seems Spark has its own renewal logic, and I'm not sure how the empty-string renewer approach works in this case. See `HadoopDelegationTokenManager.scheduleRenewal`.
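The host-based exclusion in the diff above can be sketched without Hadoop on the classpath. This is a hedged illustration only: it uses `java.net.URI` in place of Hadoop's `Path`/`FileSystem`, the names `TokenRenewalExclusionSketch`, `hostsToExclude`, and `renewerFor` are hypothetical, and the choice of an empty renewer for excluded hosts is inferred from the diff's truncated comment and the review discussion, not from the final patch.

```scala
import java.net.URI

object TokenRenewalExclusionSketch {
  /** Hosts of the file systems that should be excluded from token renewal. */
  def hostsToExclude(excludedUris: Seq[String]): Set[String] =
    // getHost can be null (for example for "file:///"), so wrap it in Option.
    excludedUris.flatMap(u => Option(new URI(u).getHost)).toSet

  /**
   * Picks the renewer to request a token with: an empty string for excluded
   * hosts (assumption based on the diff's comment), the configured renewer otherwise.
   */
  def renewerFor(fsUri: URI, renewer: String, excluded: Set[String]): String =
    if (Option(fsUri.getHost).exists(excluded.contains)) "" else renewer
}
```

Note that this only models which renewer string is passed when the token is obtained. Whether Spark's own renewal scheduling honors an empty renewer is exactly the open question raised in the review comment.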