[GitHub] [spark] SparkQA commented on pull request #30819: [SPARK-33819][CORE][3.1] SingleFileEventLogFileReader/RollingEventLogFilesFileReader should be `package private`
SparkQA commented on pull request #30819: URL: https://github.com/apache/spark/pull/30819#issuecomment-747277614 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37539/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #30818: [SPARK-33822][SQL] Use the `CastSupport.cast` method in HashJoin
dongjoon-hyun commented on pull request #30818: URL: https://github.com/apache/spark/pull/30818#issuecomment-747277478 Thanks, I merged two issues by copying all description into one JIRA (SPARK-33822 Bug).
[GitHub] [spark] SparkQA commented on pull request #30818: [SPARK-33822][SQL] Use the `CastSupport.cast` method in HashJoin
SparkQA commented on pull request #30818: URL: https://github.com/apache/spark/pull/30818#issuecomment-747277413 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37540/
[GitHub] [spark] HyukjinKwon commented on pull request #30818: [SPARK-33823][SQL] Use the `CastSupport.cast` method in HashJoin
HyukjinKwon commented on pull request #30818: URL: https://github.com/apache/spark/pull/30818#issuecomment-747275647 Yeah let's port it back.
[GitHub] [spark] MaxGekk commented on a change in pull request #30776: [SPARK-33787][SQL] Add the `purge` parameter to `dropPartition()` of `SupportsPartitionManagement`
MaxGekk commented on a change in pull request #30776: URL: https://github.com/apache/spark/pull/30776#discussion_r544878422
## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java ##
@@ -82,4 +82,27 @@ void createPartitions(
    * @return true if partitions were deleted, false if any partition not exists
    */
   boolean dropPartitions(InternalRow[] idents);
+
+  /**
+   * Drop an array of partitions atomically from table.
+   *
+   * If any partition doesn't exists,
+   * the operation of dropPartitions need to be safely rolled back.
+   *
+   * If the catalog supports the option to purge a table, this method must be overridden. The
+   * default implementation falls back to {@link #dropPartitions(InternalRow[])} dropPartitions} if
+   * the purge option is set to false. Otherwise, it throws {@link UnsupportedOperationException}.
+   *
+   * @param idents an array of partition identifiers
+   * @param purge whether a partition should be purged
+   * @return true if partitions were deleted, false if any partition not exists
+   *
+   * @since 3.2.0
+   */
+  default boolean dropPartitions(InternalRow[] idents, boolean purge) {
Review comment: Here are changes for 3.1: https://github.com/apache/spark/pull/30821
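The fallback behaviour described in the Javadoc above can be sketched in plain Java. This is only an illustration under stated assumptions, not the code from the pull request: `PartitionDropper` and `InMemoryPartitions` are hypothetical names, partition identifiers are plain strings rather than `InternalRow`, and the exception message is invented.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the interface under review; identifiers are
// strings here instead of InternalRow.
interface PartitionDropper {
  boolean dropPartitions(String[] idents);

  // Default overload as the Javadoc describes it: delegate to the
  // single-argument method when purge is false, otherwise refuse.
  default boolean dropPartitions(String[] idents, boolean purge) {
    if (purge) {
      throw new UnsupportedOperationException("Partition purge is not supported");
    }
    return dropPartitions(idents);
  }
}

// Hypothetical in-memory catalog that does not override the purge overload.
class InMemoryPartitions implements PartitionDropper {
  private final Set<String> partitions = new HashSet<>();

  InMemoryPartitions(String... initial) {
    partitions.addAll(Arrays.asList(initial));
  }

  @Override
  public boolean dropPartitions(String[] idents) {
    // All-or-nothing: nothing is removed unless every partition exists.
    if (!partitions.containsAll(Arrays.asList(idents))) {
      return false;
    }
    partitions.removeAll(Arrays.asList(idents));
    return true;
  }
}
```

A catalog that can actually purge would override the two-argument overload; one that cannot inherits the default and rejects `purge = true`.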
[GitHub] [spark] MaxGekk opened a new pull request #30821: [SPARK-33787][SQL][3.1] Add the `purge` parameter to `dropPartition()` of `SupportsPartitionManagement`
MaxGekk opened a new pull request #30821: URL: https://github.com/apache/spark/pull/30821
### What changes were proposed in this pull request?
Add default methods `dropPartition()` in `SupportsPartitionManagement` and `dropPartitions()` in `SupportsAtomicPartitionManagement` with the `purge` parameter. Also, propagate the parameter from the logical `command` to the V2 exec node and further to catalog implementations.
### Why are the changes needed?
The SQL statement `ALTER TABLE .. DROP PARTITION` allows specifying the `PURGE` flag, but the v2 implementation ignores it. We should propagate it at least to `SupportsPartitionManagement` and to `SupportsAtomicPartitionManagement`, and let implementations of those interfaces decide how to support it.
### Does this PR introduce _any_ user-facing change?
Should not.
### How was this patch tested?
By running new UT:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTablePartitionV2SQLSuite"
```
[GitHub] [spark] maropu commented on a change in pull request #30790: [SPARK-33798][SQL] Add new rule to push down the foldable expressions through CaseWhen/If
maropu commented on a change in pull request #30790: URL: https://github.com/apache/spark/pull/30790#discussion_r544876405
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ##
@@ -528,6 +528,28 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
   }
 
+/**
+ * Push the foldable expression into (if / case) branches.
+ */
+object PushFoldableIntoBranches extends Rule[LogicalPlan] with PredicateHelper {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
Review comment: nit: we cannot use `transformAllExpressions` here?
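To illustrate what "push the foldable expression into (if / case) branches" could mean, here is a minimal Java sketch on a toy expression tree. Everything in it is an assumption made for illustration: `Expr`, `Lit`, `Col`, `If`, `Add` and `PushFoldableSketch` are hypothetical and are not Catalyst classes, only a single `Add` shape is handled, and the conditions under which the actual rule fires are not shown in this thread.

```java
// Hypothetical mini expression tree; none of these types are Catalyst classes.
interface Expr {}

final class Lit implements Expr {
  final int value;
  Lit(int value) { this.value = value; }
}

final class Col implements Expr {
  final String name;
  Col(String name) { this.name = name; }
}

final class If implements Expr {
  final Expr cond;
  final Expr whenTrue;
  final Expr whenFalse;
  If(Expr cond, Expr whenTrue, Expr whenFalse) {
    this.cond = cond;
    this.whenTrue = whenTrue;
    this.whenFalse = whenFalse;
  }
}

final class Add implements Expr {
  final Expr left;
  final Expr right;
  Add(Expr left, Expr right) {
    this.left = left;
    this.right = right;
  }
}

final class PushFoldableSketch {
  // Rewrites Add(If(c, a, b), lit) into If(c, Add(a, lit), Add(b, lit)).
  // Any other shape is returned unchanged.
  static Expr rewrite(Expr e) {
    if (e instanceof Add) {
      Add add = (Add) e;
      if (add.left instanceof If && add.right instanceof Lit) {
        If branch = (If) add.left;
        return new If(
            branch.cond,
            new Add(branch.whenTrue, add.right),
            new Add(branch.whenFalse, add.right));
      }
    }
    return e;
  }
}
```

In this toy version, once the literal sits inside each branch, a branch whose other operand is also a literal becomes an addition of two constants that a later constant-folding pass could evaluate.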
[GitHub] [spark] SparkQA removed a comment on pull request #30717: [SPARK-33599][SQL] Group exception messages in catalyst/analysis
SparkQA removed a comment on pull request #30717: URL: https://github.com/apache/spark/pull/30717#issuecomment-747175935 **[Test build #132917 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132917/testReport)** for PR 30717 at commit [`9a06fc9`](https://github.com/apache/spark/commit/9a06fc9f38168a2f4cbae44be234ad2b7dc49ff8).
[GitHub] [spark] SparkQA removed a comment on pull request #30816: [SPARK-33818][DOC] Doc `spark.sql.parser.quotedRegexColumnNames`
SparkQA removed a comment on pull request #30816: URL: https://github.com/apache/spark/pull/30816#issuecomment-747259249 **[Test build #132938 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132938/testReport)** for PR 30816 at commit [`9df29ca`](https://github.com/apache/spark/commit/9df29cac8cf2de3c8216068263132f5950924b83).
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #30818: [SPARK-33823][SQL] Use the `CastSupport.cast` method in HashJoin
dongjoon-hyun edited a comment on pull request #30818: URL: https://github.com/apache/spark/pull/30818#issuecomment-747267974 BTW, @maropu . Why do you make another JIRA for this? Also, the new JIRA seems to be an `Improvement`? Do we need more patch to handle Q5? If you don't mind, shall we use the original issue, SPARK-33822 (Bug) instead of SPARK-33823 (Improvement)?
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #30818: [SPARK-33823][SQL] Use the `CastSupport.cast` method in HashJoin
dongjoon-hyun edited a comment on pull request #30818: URL: https://github.com/apache/spark/pull/30818#issuecomment-747267974 BTW, @maropu . Why do you make another JIRA for this? Also, the new JIRA seems to be an `Improvement`? If you don't mind, shall we use the original issue, SPARK-33822 (Bug) instead of SPARK-33823 (Improvement)?
[GitHub] [spark] SparkQA commented on pull request #30817: [SPARK-33821][BUILD] Upgrade SBT to 1.4.5
SparkQA commented on pull request #30817: URL: https://github.com/apache/spark/pull/30817#issuecomment-747268531 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37536/
[GitHub] [spark] SparkQA commented on pull request #30815: [SPARK-33817][SQL] CACHE TABLE uses a logical plan when caching a query to avoid creating a dataframe
SparkQA commented on pull request #30815: URL: https://github.com/apache/spark/pull/30815#issuecomment-747268149 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37537/
[GitHub] [spark] dongjoon-hyun commented on pull request #30818: [SPARK-33823][SQL] Use the `CastSupport.cast` method in HashJoin
dongjoon-hyun commented on pull request #30818: URL: https://github.com/apache/spark/pull/30818#issuecomment-747267974 BTW, @maropu . Why do you make another JIRA for this? Also, the new JIRA seems to be an `Improvement`?
[GitHub] [spark] kiszk commented on a change in pull request #30810: [WIP][ML] Add a vectorized BLAS implementation
kiszk commented on a change in pull request #30810: URL: https://github.com/apache/spark/pull/30810#discussion_r544868623
## File path: mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala ##
@@ -18,28 +18,46 @@
 package org.apache.spark.ml.linalg
 
 import com.github.fommil.netlib.{BLAS => NetlibBLAS, F2jBLAS}
-import com.github.fommil.netlib.BLAS.{getInstance => NativeBLAS}
+import scala.util.Try
+
+import org.apache.spark.util.Utils
 
 /**
  * BLAS routines for MLlib's vectors and matrices.
  */
 private[spark] object BLAS extends Serializable {
 
-  @transient private var _f2jBLAS: NetlibBLAS = _
+  @transient private var _javaBLAS: NetlibBLAS = _
   @transient private var _nativeBLAS: NetlibBLAS = _
   private val nativeL1Threshold: Int = 256
Review comment: Will we need to change this value depending on which javaBLAS is selected? vectorized version or F2JBLAS.
[GitHub] [spark] sarutak commented on a change in pull request #30573: [SPARK-26341][WEBUI]Expose executor memory metrics at the stage level, in the Stages tab
sarutak commented on a change in pull request #30573: URL: https://github.com/apache/spark/pull/30573#discussion_r544868604
## File path: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ##
@@ -687,6 +687,9 @@ private[spark] class AppStatusListener(
         stage.killedSummary = killedTasksSummary(event.reason, stage.killedSummary)
       }
       stage.activeTasksPerExecutor(event.taskInfo.executorId) -= 1
+
+      stage.executorSummary(event.taskInfo.executorId).peakExecutorMetrics
+        .compareAndUpdatePeakValues(event.taskExecutorMetrics)
Review comment: O.K, I'll merge later if there are no objections. Thanks for the response. @gengliangwang
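The added lines fold each task's executor metrics into a per-stage, per-executor peak. The idea behind a compare-and-update of peak values can be sketched as follows. This is a simplified illustration, not Spark's `ExecutorMetrics` class: `PeakMetrics` is a hypothetical name, the metrics are assumed to be a fixed-length array of `long` values, and the `peak` accessor is invented for the example.

```java
// Hypothetical, simplified stand-in for a peak-metrics holder.
class PeakMetrics {
  private final long[] peaks;

  PeakMetrics(int numMetrics) {
    this.peaks = new long[numMetrics];
  }

  // Keeps the element-wise maximum of the stored peaks and the latest sample.
  // Returns true if at least one peak was raised.
  // Assumes latest has at least as many entries as the stored peaks.
  boolean compareAndUpdatePeakValues(long[] latest) {
    boolean updated = false;
    for (int i = 0; i < peaks.length; i++) {
      if (latest[i] > peaks[i]) {
        peaks[i] = latest[i];
        updated = true;
      }
    }
    return updated;
  }

  long peak(int index) {
    return peaks[index];
  }
}
```

Calling the update once per finished task, as the diff does on task end, leaves each entry holding the largest value reported by any task seen so far.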
[GitHub] [spark] AmplabJenkins commented on pull request #30717: [SPARK-33599][SQL] Group exception messages in catalyst/analysis
AmplabJenkins commented on pull request #30717: URL: https://github.com/apache/spark/pull/30717#issuecomment-747264692 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/132917/
[GitHub] [spark] AmplabJenkins commented on pull request #30816: [SPARK-33818][DOC] Doc `spark.sql.parser.quotedRegexColumnNames`
AmplabJenkins commented on pull request #30816: URL: https://github.com/apache/spark/pull/30816#issuecomment-747264601 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/132938/
[GitHub] [spark] SparkQA commented on pull request #30816: [SPARK-33818][DOC] Doc `spark.sql.parser.quotedRegexColumnNames`
SparkQA commented on pull request #30816: URL: https://github.com/apache/spark/pull/30816#issuecomment-747264450 **[Test build #132938 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132938/testReport)** for PR 30816 at commit [`9df29ca`](https://github.com/apache/spark/commit/9df29cac8cf2de3c8216068263132f5950924b83).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #30717: [SPARK-33599][SQL] Group exception messages in catalyst/analysis
SparkQA commented on pull request #30717: URL: https://github.com/apache/spark/pull/30717#issuecomment-747263881 **[Test build #132917 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132917/testReport)** for PR 30717 at commit [`9a06fc9`](https://github.com/apache/spark/commit/9a06fc9f38168a2f4cbae44be234ad2b7dc49ff8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] gengliangwang commented on a change in pull request #30573: [SPARK-26341][WEBUI]Expose executor memory metrics at the stage level, in the Stages tab
gengliangwang commented on a change in pull request #30573: URL: https://github.com/apache/spark/pull/30573#discussion_r544864373
## File path: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ##
@@ -687,6 +687,9 @@ private[spark] class AppStatusListener(
         stage.killedSummary = killedTasksSummary(event.reason, stage.killedSummary)
       }
       stage.activeTasksPerExecutor(event.taskInfo.executorId) -= 1
+
+      stage.executorSummary(event.taskInfo.executorId).peakExecutorMetrics
+        .compareAndUpdatePeakValues(event.taskExecutorMetrics)
Review comment: @sarutak No, I am ok with it :)
[GitHub] [spark] SparkQA commented on pull request #30820: [SPARK-33819][CORE][3.0] SingleFileEventLogFileReader/RollingEventLogFilesFileReader should be `package private`
SparkQA commented on pull request #30820: URL: https://github.com/apache/spark/pull/30820#issuecomment-747262059 **[Test build #132941 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132941/testReport)** for PR 30820 at commit [`b9b2388`](https://github.com/apache/spark/commit/b9b23885bdb687468aaced31f3d9465dc49f0a81).
[GitHub] [spark] SparkQA commented on pull request #29695: [SPARK-22390][SPARK-32833][SQL] [WIP]JDBC V2 Datasource aggregate push down
SparkQA commented on pull request #29695: URL: https://github.com/apache/spark/pull/29695#issuecomment-747260714 **[Test build #132940 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132940/testReport)** for PR 29695 at commit [`710ff80`](https://github.com/apache/spark/commit/710ff80aaad361bead755d78049cd56431cd65be).
[GitHub] [spark] dongjoon-hyun opened a new pull request #30820: [SPARK-33819][CORE][3.0] SingleFileEventLogFileReader/RollingEventLogFilesFileReader should be `package private`
dongjoon-hyun opened a new pull request #30820: URL: https://github.com/apache/spark/pull/30820
### What changes were proposed in this pull request?
This PR aims to convert `EventLogFileReader`'s derived classes into `package private`.
- SingleFileEventLogFileReader
- RollingEventLogFilesFileReader

`EventLogFileReader` itself is used in the `scheduler` module during tests.
### Why are the changes needed?
These classes were designed to be internal. This PR hides them explicitly to reduce the maintenance burden.
### Does this PR introduce _any_ user-facing change?
Yes, but these were exposed accidentally.
### How was this patch tested?
Pass CIs.
[GitHub] [spark] SparkQA commented on pull request #30818: [SPARK-33823][SQL] Use the `CastSupport.cast` method in HashJoin
SparkQA commented on pull request #30818: URL: https://github.com/apache/spark/pull/30818#issuecomment-747259797 **[Test build #132937 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132937/testReport)** for PR 30818 at commit [`7e33df3`](https://github.com/apache/spark/commit/7e33df3d1a6d7c5b33c446ea106877e5de0cb643).
[GitHub] [spark] SparkQA commented on pull request #30816: [SPARK-33818][DOC] Doc `spark.sql.parser.quotedRegexColumnNames`
SparkQA commented on pull request #30816: URL: https://github.com/apache/spark/pull/30816#issuecomment-747259249 **[Test build #132938 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132938/testReport)** for PR 30816 at commit [`9df29ca`](https://github.com/apache/spark/commit/9df29cac8cf2de3c8216068263132f5950924b83).
[GitHub] [spark] SparkQA commented on pull request #30790: [SPARK-33798][SQL] Add new rule to push down the foldable expressions through CaseWhen/If
SparkQA commented on pull request #30790: URL: https://github.com/apache/spark/pull/30790#issuecomment-747259282 **[Test build #132939 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132939/testReport)** for PR 30790 at commit [`8ccc3c1`](https://github.com/apache/spark/commit/8ccc3c1f8804e34b46fd9bec19a017db61b37f54).
[GitHub] [spark] SparkQA commented on pull request #30819: [SPARK-33819][CORE][3.1] SingleFileEventLogFileReader/RollingEventLogFilesFileReader should be `package private`
SparkQA commented on pull request #30819: URL: https://github.com/apache/spark/pull/30819#issuecomment-747259209 **[Test build #132936 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132936/testReport)** for PR 30819 at commit [`132d50a`](https://github.com/apache/spark/commit/132d50aecd86c7fdf9743272c95df310efebb492).
[GitHub] [spark] huaxingao commented on a change in pull request #29695: [SPARK-22390][SPARK-32833][SQL] [WIP]JDBC V2 Datasource aggregate push down
huaxingao commented on a change in pull request #29695: URL: https://github.com/apache/spark/pull/29695#discussion_r544860215
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala ##
@@ -21,6 +21,7 @@ import scala.collection.mutable.ArrayBuilder
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression
+import org.apache.spark.sql.catalyst.optimizer.PushPredicateThroughNonJoin.{getAliasMap, replaceAlias}
Review comment: Fixed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30814: [SPARK-33819][CORE] SingleFileEventLogFileReader/RollingEventLogFilesFileReader should be `package private`
AmplabJenkins removed a comment on pull request #30814: URL: https://github.com/apache/spark/pull/30814#issuecomment-747258018
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30543: [SPARK-33597][SQL] Support REGEXP_LIKE for consistent with mainstream databases
AmplabJenkins removed a comment on pull request #30543: URL: https://github.com/apache/spark/pull/30543#issuecomment-747258020 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/37535/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30312: [SPARK-32917][SHUFFLE][CORE] Adds support for executors to push shuffle blocks after successful map task completion
AmplabJenkins removed a comment on pull request #30312: URL: https://github.com/apache/spark/pull/30312#issuecomment-747258016 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/132924/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30790: [SPARK-33798][SQL] Add new rule to push down the foldable expressions through CaseWhen/If
AmplabJenkins removed a comment on pull request #30790: URL: https://github.com/apache/spark/pull/30790#issuecomment-747258015 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/37538/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #24990: [WIP][SPARK-28191][SS] New data source - state - reader part
AmplabJenkins removed a comment on pull request #24990: URL: https://github.com/apache/spark/pull/24990#issuecomment-747258021 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/132916/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30817: [SPARK-33821][BUILD] Upgrade SBT to 1.4.5
AmplabJenkins removed a comment on pull request #30817: URL: https://github.com/apache/spark/pull/30817#issuecomment-747258019 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/37531/
[GitHub] [spark] AmplabJenkins commented on pull request #29695: [SPARK-22390][SPARK-32833][SQL] [WIP]JDBC V2 Datasource aggregate push down
AmplabJenkins commented on pull request #29695: URL: https://github.com/apache/spark/pull/29695#issuecomment-747258014 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/132914/
[GitHub] [spark] AmplabJenkins commented on pull request #30543: [SPARK-33597][SQL] Support REGEXP_LIKE for consistent with mainstream databases
AmplabJenkins commented on pull request #30543: URL: https://github.com/apache/spark/pull/30543#issuecomment-747258020 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/37535/
[GitHub] [spark] AmplabJenkins commented on pull request #24990: [WIP][SPARK-28191][SS] New data source - state - reader part
AmplabJenkins commented on pull request #24990: URL: https://github.com/apache/spark/pull/24990#issuecomment-747258021 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/132916/
[GitHub] [spark] AmplabJenkins commented on pull request #30312: [SPARK-32917][SHUFFLE][CORE] Adds support for executors to push shuffle blocks after successful map task completion
AmplabJenkins commented on pull request #30312: URL: https://github.com/apache/spark/pull/30312#issuecomment-747258016 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/132924/
[GitHub] [spark] AmplabJenkins commented on pull request #30790: [SPARK-33798][SQL] Add new rule to push down the foldable expressions through CaseWhen/If
AmplabJenkins commented on pull request #30790: URL: https://github.com/apache/spark/pull/30790#issuecomment-747258015 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/37538/
[GitHub] [spark] AmplabJenkins commented on pull request #30814: [SPARK-33819][CORE] SingleFileEventLogFileReader/RollingEventLogFilesFileReader should be `package private`
AmplabJenkins commented on pull request #30814: URL: https://github.com/apache/spark/pull/30814#issuecomment-747258018
[GitHub] [spark] AmplabJenkins commented on pull request #30817: [SPARK-33821][BUILD] Upgrade SBT to 1.4.5
AmplabJenkins commented on pull request #30817: URL: https://github.com/apache/spark/pull/30817#issuecomment-747258019 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/37531/
[GitHub] [spark] dongjoon-hyun opened a new pull request #30819: [SPARK-33819][CORE] SingleFileEventLogFileReader/RollingEventLogFilesFileReader should be `package private`
dongjoon-hyun opened a new pull request #30819: URL: https://github.com/apache/spark/pull/30819

### What changes were proposed in this pull request?

This PR aims to make the classes derived from `EventLogFileReader` `package private`:
- SingleFileEventLogFileReader
- RollingEventLogFilesFileReader

`EventLogFileReader` itself is used in the `scheduler` module during tests.

### Why are the changes needed?

These classes were designed to be internal. This PR hides them explicitly to reduce the maintenance burden.

### Does this PR introduce _any_ user-facing change?

Yes, but these classes were exposed accidentally.

### How was this patch tested?

Pass CIs.
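As a rough illustration of the visibility change described above, the following Scala sketch uses the qualified `private[...]` modifier to hide subclasses while keeping the base type public. All names here (`history`, `LogReader`, `SingleFileLogReader`, `RollingLogReader`, `open`) are hypothetical, and an enclosing object stands in for a package so that the snippet is self-contained; it is not the code from the PR.

```scala
// Hypothetical names throughout; an enclosing object stands in for a package.
object history {
  // The base type stays visible to callers outside `history`.
  abstract class LogReader {
    def describe: String
  }

  // `private[history]` limits these subclasses to code inside `history`.
  private[history] class SingleFileLogReader extends LogReader {
    def describe: String = "single file"
  }

  private[history] class RollingLogReader extends LogReader {
    def describe: String = "rolling files"
  }

  object LogReader {
    // Callers go through the factory and only ever see the base type.
    def open(rolling: Boolean): LogReader =
      if (rolling) new RollingLogReader else new SingleFileLogReader
  }
}

object PackagePrivateSketch {
  def main(args: Array[String]): Unit = {
    val reader: history.LogReader = history.LogReader.open(rolling = true)
    println(reader.describe)
    // Writing `new history.RollingLogReader` here would be rejected by the compiler.
  }
}
```

Code outside the enclosing scope can still use the readers through the base type, which matches the PR's note that the base class remains in use elsewhere.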
[GitHub] [spark] maropu commented on pull request #30818: [SPARK-33823][SQL] Use the `CastSupport.cast` method in HashJoin
maropu commented on pull request #30818: URL: https://github.com/apache/spark/pull/30818#issuecomment-747256274 cc: @dongjoon-hyun @cloud-fan @HyukjinKwon
[GitHub] [spark] SparkQA commented on pull request #30817: [SPARK-33821][BUILD] Upgrade SBT to 1.4.5
SparkQA commented on pull request #30817: URL: https://github.com/apache/spark/pull/30817#issuecomment-747255650 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37536/
[GitHub] [spark] SparkQA commented on pull request #30815: [SPARK-33817][SQL] CACHE TABLE uses a logical plan when caching a query to avoid creating a dataframe
SparkQA commented on pull request #30815: URL: https://github.com/apache/spark/pull/30815#issuecomment-747255085 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37537/
[GitHub] [spark] SparkQA removed a comment on pull request #30814: [SPARK-33819][CORE] SingleFileEventLogFileReader/RollingEventLogFilesFileReader should be `package private`
SparkQA removed a comment on pull request #30814: URL: https://github.com/apache/spark/pull/30814#issuecomment-747196136 **[Test build #132926 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132926/testReport)** for PR 30814 at commit [`dbd7444`](https://github.com/apache/spark/commit/dbd7444a7c10e1331e467e9530695099ca0e4668).
[GitHub] [spark] SparkQA commented on pull request #30814: [SPARK-33819][CORE] SingleFileEventLogFileReader/RollingEventLogFilesFileReader should be `package private`
SparkQA commented on pull request #30814: URL: https://github.com/apache/spark/pull/30814#issuecomment-747254566 **[Test build #132926 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132926/testReport)** for PR 30814 at commit [`dbd7444`](https://github.com/apache/spark/commit/dbd7444a7c10e1331e467e9530695099ca0e4668). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `abstract class EventLogFileReader(`
[GitHub] [spark] maropu commented on pull request #30818: [SPARK-33823][SQL] Use the `CastSupport.cast` method in HashJoin
maropu commented on pull request #30818: URL: https://github.com/apache/spark/pull/30818#issuecomment-747253568 I'm looking for a simpler query that reproduces this so that I can add unit tests, but I haven't found one yet.
[GitHub] [spark] dongjoon-hyun commented on pull request #30817: [SPARK-33821][BUILD] Upgrade SBT to 1.4.5
dongjoon-hyun commented on pull request #30817: URL: https://github.com/apache/spark/pull/30817#issuecomment-747253610 Thank you, @HyukjinKwon !
[GitHub] [spark] maropu opened a new pull request #30818: [SPARK-33823][SQL] Use the `CastSupport.cast` method in HashJoin
maropu opened a new pull request #30818: URL: https://github.com/apache/spark/pull/30818

### What changes were proposed in this pull request?

This PR intends to fix a bug that throws an `UnsupportedOperationException` when running [the TPCDS q5](https://github.com/apache/spark/blob/master/sql/core/src/test/resources/tpcds/q5.sql) with AQE enabled ([this option is enabled by default now](https://github.com/apache/spark/commit/031c5ef280e0cba8c4718a6457a44b6cccb17f46)):

```
java.lang.UnsupportedOperationException: BroadcastExchange does not support the execute() code path.
  at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecute(BroadcastExchangeExec.scala:189)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
  at org.apache.spark.sql.execution.exchange.ReusedExchangeExec.doExecute(Exchange.scala:60)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
  at org.apache.spark.sql.execution.adaptive.QueryStageExec.doExecute(QueryStageExec.scala:115)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
  at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:321)
  at org.apache.spark.sql.execution.SparkPlan.executeCollectIterator(SparkPlan.scala:397)
  at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:118)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:185)
  at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
  ...
```

I've checked the AQE code and found that `EnsureRequirements` wrongly puts a `BroadcastExchange` on top of a `BroadcastQueryStage` in the `reOptimize` phase, as follows:

```
+- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)),false), [id=#2183]
   +- BroadcastQueryStage 2
      +- ReusedExchange [d_date_sk#1086], BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)),false), [id=#1963]
```

The root cause is that the `Cast` expression in the required child distribution has no `timeZoneId` set (`timeZoneId=None`), whereas the `Cast` expression in `child.outputPartitioning` has one. This difference can make the distribution requirement check fail in `EnsureRequirements`: https://github.com/apache/spark/blob/1e85707738a830d33598ca267a6740b3f06b1861/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala#L47-L50

To fix this issue, this PR proposes to use the `CastSupport.cast` method in `HashJoin`.

### Why are the changes needed?

Bugfix.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually checked that q5 passed.
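The root cause described in the PR above can be pictured with a minimal Scala sketch. The types below (`Col`, `CastTo`) and the helpers `satisfies` and `castWithZone` are hypothetical stand-ins, not Catalyst classes, and the sketch assumes that the requirement check boils down to structural equality of the key expressions and that a shared cast helper always fills in the time zone.

```scala
object CastMismatchSketch {
  // Hypothetical stand-ins for expression nodes; not Spark's classes.
  sealed trait Expr
  final case class Col(name: String) extends Expr
  final case class CastTo(
      child: Expr,
      dataType: String,
      timeZoneId: Option[String] = None) extends Expr

  // Simplified requirement check: the keys must be structurally equal.
  def satisfies(required: Seq[Expr], actual: Seq[Expr]): Boolean =
    required == actual

  // Assumed behavior of a shared helper: it always sets the time zone.
  def castWithZone(child: Expr, dataType: String, zone: String): CastTo =
    CastTo(child, dataType, Some(zone))

  def main(args: Array[String]): Unit = {
    val key = Col("d_date_sk")
    val actual = Seq(castWithZone(key, "bigint", "UTC"))

    // Built by hand without a time zone: differs only in `timeZoneId`.
    val requiredWithoutZone = Seq(CastTo(key, "bigint"))
    // Built through the same helper: structurally identical.
    val requiredWithZone = Seq(castWithZone(key, "bigint", "UTC"))

    println(satisfies(requiredWithoutZone, actual)) // false
    println(satisfies(requiredWithZone, actual))    // true
  }
}
```

When the check fails for an otherwise identical key, a planner would conclude that the requirement is unmet and add another exchange, which is the shape of the extra `BroadcastExchange` shown in the plan fragment.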
[GitHub] [spark] MaxGekk commented on a change in pull request #30799: [SPARK-33803][SQL] Sort table properties by key in DESCRIBE TABLE command
MaxGekk commented on a change in pull request #30799: URL: https://github.com/apache/spark/pull/30799#discussion_r544854547

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
## @@ -388,7 +388,8 @@ case class CatalogTable(
   def toLinkedHashMap: mutable.LinkedHashMap[String, String] = {
     val map = new mutable.LinkedHashMap[String, String]()
-    val tableProperties = properties.map(p => p._1 + "=" + p._2).mkString("[", ", ", "]")
+    val tableProperties = properties.toSeq.sortBy(_._1)
+      .map(p => p._1 + "=" + p._2).mkString("[", ", ", "]")

Review comment: Ah, sorry, we can guarantee uniqueness of keys.
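The point about key uniqueness can be checked with a short Scala sketch. It assumes that `properties` is a `Map[String, String]`; the object name `SortedPropertiesSketch` and the helper `render` are made up for illustration, while the body of `render` copies the expression from the diff.

```scala
object SortedPropertiesSketch {
  // Same expression as in the diff: sort by key, then render "key=value" pairs.
  def render(properties: Map[String, String]): String =
    properties.toSeq.sortBy(_._1)
      .map(p => p._1 + "=" + p._2).mkString("[", ", ", "]")

  def main(args: Array[String]): Unit = {
    // A Map holds one value per key, so adding "key0" again replaces the old value.
    val props = Map("key1" -> "x", "key0" -> "b") + ("key0" -> "a")
    println(render(props)) // [key0=a, key1=x]
  }
}
```

Because two entries can never share a key, sorting by key alone already yields a total order, so there is no need to break ties on the value.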
[GitHub] [spark] dongjoon-hyun commented on pull request #30814: [SPARK-33819][CORE] SingleFileEventLogFileReader/RollingEventLogFilesFileReader should be `package private`
dongjoon-hyun commented on pull request #30814: URL: https://github.com/apache/spark/pull/30814#issuecomment-747250785 Thank you! I'll make backporting PRs.
[GitHub] [spark] SparkQA removed a comment on pull request #30312: [SPARK-32917][SHUFFLE][CORE] Adds support for executors to push shuffle blocks after successful map task completion
SparkQA removed a comment on pull request #30312: URL: https://github.com/apache/spark/pull/30312#issuecomment-747193023 **[Test build #132924 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132924/testReport)** for PR 30312 at commit [`bcebb13`](https://github.com/apache/spark/commit/bcebb13eaa1da1d6970c6f99b2c6996e1f6fbd4d).
[GitHub] [spark] SparkQA commented on pull request #30312: [SPARK-32917][SHUFFLE][CORE] Adds support for executors to push shuffle blocks after successful map task completion
SparkQA commented on pull request #30312: URL: https://github.com/apache/spark/pull/30312#issuecomment-747250596 **[Test build #132924 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132924/testReport)** for PR 30312 at commit [`bcebb13`](https://github.com/apache/spark/commit/bcebb13eaa1da1d6970c6f99b2c6996e1f6fbd4d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] MaxGekk commented on a change in pull request #30799: [SPARK-33803][SQL] Sort table properties by key in DESCRIBE TABLE command
MaxGekk commented on a change in pull request #30799: URL: https://github.com/apache/spark/pull/30799#discussion_r544852303

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
## @@ -388,7 +388,8 @@ case class CatalogTable(
   def toLinkedHashMap: mutable.LinkedHashMap[String, String] = {
     val map = new mutable.LinkedHashMap[String, String]()
-    val tableProperties = properties.map(p => p._1 + "=" + p._2).mkString("[", ", ", "]")
+    val tableProperties = properties.toSeq.sortBy(_._1)
+      .map(p => p._1 + "=" + p._2).mkString("[", ", ", "]")

Review comment: Should we sort by the entire strings (`._1` and `._2`)? For example, if you have `key0 = b` and `key0 = a`, the output may be unstable: either `[key0 = b, key0 = a]` or `[key0 = a, key0 = b]`.
[GitHub] [spark] MaxGekk commented on a change in pull request #30799: [SPARK-33803][SQL] Sort table properties by key in DESCRIBE TABLE command
MaxGekk commented on a change in pull request #30799: URL: https://github.com/apache/spark/pull/30799#discussion_r544852303

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
## @@ -388,7 +388,8 @@ case class CatalogTable(
   def toLinkedHashMap: mutable.LinkedHashMap[String, String] = {
     val map = new mutable.LinkedHashMap[String, String]()
-    val tableProperties = properties.map(p => p._1 + "=" + p._2).mkString("[", ", ", "]")
+    val tableProperties = properties.toSeq.sortBy(_._1)
+      .map(p => p._1 + "=" + p._2).mkString("[", ", ", "]")

Review comment: Should we sort by the entire strings? For example, if you have `key0 = b` and `key0 = a`, the output may be unstable: either `[key0 = b, key0 = a]` or `[key0 = a, key0 = b]`.
[GitHub] [spark] HyukjinKwon closed pull request #30814: [SPARK-33819][CORE] SingleFileEventLogFileReader/RollingEventLogFilesFileReader should be `package private`
HyukjinKwon closed pull request #30814: URL: https://github.com/apache/spark/pull/30814
[GitHub] [spark] HyukjinKwon commented on pull request #30814: [SPARK-33819][CORE] SingleFileEventLogFileReader/RollingEventLogFilesFileReader should be `package private`
HyukjinKwon commented on pull request #30814: URL: https://github.com/apache/spark/pull/30814#issuecomment-747248600 Merged to master.
[GitHub] [spark] HyukjinKwon closed pull request #30817: [SPARK-33821][BUILD] Upgrade SBT to 1.4.5
HyukjinKwon closed pull request #30817: URL: https://github.com/apache/spark/pull/30817
[GitHub] [spark] HyukjinKwon commented on pull request #30817: [SPARK-33821][BUILD] Upgrade SBT to 1.4.5
HyukjinKwon commented on pull request #30817: URL: https://github.com/apache/spark/pull/30817#issuecomment-747247902 Merged to master and branch-3.1.
[GitHub] [spark] SparkQA commented on pull request #30817: [SPARK-33821][BUILD] Upgrade SBT to 1.4.5
SparkQA commented on pull request #30817: URL: https://github.com/apache/spark/pull/30817#issuecomment-747247389 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37531/
[GitHub] [spark] sarutak commented on a change in pull request #30573: [SPARK-26341][WEBUI]Expose executor memory metrics at the stage level, in the Stages tab
sarutak commented on a change in pull request #30573: URL: https://github.com/apache/spark/pull/30573#discussion_r544849995

## File path: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
## @@ -687,6 +687,9 @@ private[spark] class AppStatusListener(
         stage.killedSummary = killedTasksSummary(event.reason, stage.killedSummary)
       }
       stage.activeTasksPerExecutor(event.taskInfo.executorId) -= 1
+
+      stage.executorSummary(event.taskInfo.executorId).peakExecutorMetrics
+        .compareAndUpdatePeakValues(event.taskExecutorMetrics)

Review comment:
> It seems that the first metrics of peakExecutorMetrics become 0 instead of -1 after this. @AngersZh Do you know the reason?

@gengliangwang With this change, `peakExecutorMetrics` is updated not only in `onExecutorMetricsUpdate` but also in `onTaskEnd`. So, if the peak value carried by `SparkListenerTaskEnd` is `0`, the corresponding peak value in `peakExecutorMetrics` is set to `0`. Do you have any concerns?
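The behavior discussed in this review comment can be reproduced with a simplified Scala sketch. `PeakTracker` and its methods are hypothetical and only mimic the idea of an element-wise peak update over slots that start at `-1`; the sketch makes no claim about how the real metrics class is implemented.

```scala
object PeakMetricsSketch {
  // Hypothetical tracker: every slot starts at -1, meaning "no sample yet".
  final class PeakTracker(size: Int) {
    private val peaks: Array[Long] = Array.fill(size)(-1L)

    // Keeps the element-wise maximum; returns true if any slot changed.
    def compareAndUpdate(sample: Array[Long]): Boolean = {
      var updated = false
      var i = 0
      while (i < peaks.length && i < sample.length) {
        if (sample(i) > peaks(i)) {
          peaks(i) = sample(i)
          updated = true
        }
        i += 1
      }
      updated
    }

    // Immutable copy of the current peak values.
    def snapshot: List[Long] = peaks.toList
  }

  def main(args: Array[String]): Unit = {
    val tracker = new PeakTracker(2)
    println(tracker.snapshot) // List(-1, -1)

    // A task-end event that carries all-zero metrics still beats the -1 default.
    tracker.compareAndUpdate(Array(0L, 0L))
    println(tracker.snapshot) // List(0, 0)
  }
}
```

Under these assumptions, any additional call site that feeds zero-valued samples into the tracker is enough to move a slot from `-1` to `0`, which is the effect the reviewer observed.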
[GitHub] [spark] SparkQA removed a comment on pull request #29695: [SPARK-22390][SPARK-32833][SQL] [WIP]JDBC V2 Datasource aggregate push down
SparkQA removed a comment on pull request #29695: URL: https://github.com/apache/spark/pull/29695#issuecomment-747162438 **[Test build #132914 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132914/testReport)** for PR 29695 at commit [`18b9455`](https://github.com/apache/spark/commit/18b94552d70bd4a6d370e8e6b92a6f0f07b4649e).
[GitHub] [spark] SparkQA removed a comment on pull request #24990: [WIP][SPARK-28191][SS] New data source - state - reader part
SparkQA removed a comment on pull request #24990: URL: https://github.com/apache/spark/pull/24990#issuecomment-747163826 **[Test build #132916 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132916/testReport)** for PR 24990 at commit [`a495f6d`](https://github.com/apache/spark/commit/a495f6d56411f2f3bb1e271babe9efad008b3959).
[GitHub] [spark] SparkQA commented on pull request #29695: [SPARK-22390][SPARK-32833][SQL] [WIP]JDBC V2 Datasource aggregate push down
SparkQA commented on pull request #29695: URL: https://github.com/apache/spark/pull/29695#issuecomment-747246451 **[Test build #132914 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132914/testReport)** for PR 29695 at commit [`18b9455`](https://github.com/apache/spark/commit/18b94552d70bd4a6d370e8e6b92a6f0f07b4649e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #24990: [WIP][SPARK-28191][SS] New data source - state - reader part
SparkQA commented on pull request #24990: URL: https://github.com/apache/spark/pull/24990#issuecomment-747246397 **[Test build #132916 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132916/testReport)** for PR 24990 at commit [`a495f6d`](https://github.com/apache/spark/commit/a495f6d56411f2f3bb1e271babe9efad008b3959). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #30814: [SPARK-33819][CORE] SingleFileEventLogFileReader/RollingEventLogFilesFileReader should be `package private`
SparkQA removed a comment on pull request #30814: URL: https://github.com/apache/spark/pull/30814#issuecomment-747193231 **[Test build #132922 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132922/testReport)** for PR 30814 at commit [`a32cd47`](https://github.com/apache/spark/commit/a32cd479ae00490a941ebe0115d16700308e06be).
[GitHub] [spark] SparkQA commented on pull request #30814: [SPARK-33819][CORE] SingleFileEventLogFileReader/RollingEventLogFilesFileReader should be `package private`
SparkQA commented on pull request #30814: URL: https://github.com/apache/spark/pull/30814#issuecomment-747244715 **[Test build #132922 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132922/testReport)** for PR 30814 at commit [`a32cd47`](https://github.com/apache/spark/commit/a32cd479ae00490a941ebe0115d16700308e06be). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30790: [SPARK-33798][SQL] Simplify EqualTo(CaseWhen/If, Literal) always false
AmplabJenkins removed a comment on pull request #30790: URL: https://github.com/apache/spark/pull/30790#issuecomment-747161688 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30790: [SPARK-33798][SQL] Simplify EqualTo(CaseWhen/If, Literal) always false
SparkQA commented on pull request #30790: URL: https://github.com/apache/spark/pull/30790#issuecomment-747233718 **[Test build #132935 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132935/testReport)** for PR 30790 at commit [`f9f622f`](https://github.com/apache/spark/commit/f9f622f96c20d1787488c9ea392f0b857be532b5). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30817: [SPARK-33821][BUILD] Upgrade SBT to 1.4.5
SparkQA commented on pull request #30817: URL: https://github.com/apache/spark/pull/30817#issuecomment-747233514 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37531/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30815: [SPARK-33817][SQL] CACHE TABLE uses a logical plan when caching a query to avoid creating a dataframe
SparkQA commented on pull request #30815: URL: https://github.com/apache/spark/pull/30815#issuecomment-747233417 **[Test build #132934 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132934/testReport)** for PR 30815 at commit [`52de954`](https://github.com/apache/spark/commit/52de9548ebf713a83f649cb5dbb86c1b9e7a1750). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30770: [SPARK-33783][SS] Unload State Store Provider after configured keep alive time
AmplabJenkins removed a comment on pull request #30770: URL: https://github.com/apache/spark/pull/30770#issuecomment-747214936 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/132910/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] viirya edited a comment on pull request #30770: [SPARK-33783][SS] Unload State Store Provider after configured keep alive time
viirya edited a comment on pull request #30770: URL: https://github.com/apache/spark/pull/30770#issuecomment-747232971
> So IMHO the right direction would be either trying our best to unload inactive state ASAP, or considering the replication as the further improvement. Not somewhere in between. Even the latter wouldn't be an improvement if we could enforce Spark to respect the active executor of the state.
Regarding "the replication", I read through the previous comments. I'm not quite sure I understand your point correctly. Is it any different from reusing the stores of the previous batch? It seems to me that you are against keeping the previous stores within a TTL and reusing them, or did I misread your comments? The replication sounds similar to me. Can you elaborate on it?
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30817: [SPARK-33821][BUILD] Upgrade SBT to 1.4.5
AmplabJenkins removed a comment on pull request #30817: URL: https://github.com/apache/spark/pull/30817#issuecomment-747214209 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/132928/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] viirya commented on pull request #30770: [SPARK-33783][SS] Unload State Store Provider after configured keep alive time
viirya commented on pull request #30770: URL: https://github.com/apache/spark/pull/30770#issuecomment-747232971
> So IMHO the right direction would be either trying our best to unload inactive state ASAP, or considering the replication as the further improvement. Not somewhere in between. Even the latter wouldn't be an improvement if we could enforce Spark to respect the active executor of the state.
Regarding "the replication", I read through the previous comments. I'm not quite sure I understand your point correctly. Is it any different from reusing the stores of the previous batch? It seems to me that you are against keeping the previous stores and reusing them, or did I misread your comments? The replication sounds similar to me. Can you elaborate on it?
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30817: [SPARK-33821][BUILD] Upgrade SBT to 1.4.5
SparkQA commented on pull request #30817: URL: https://github.com/apache/spark/pull/30817#issuecomment-747232716 **[Test build #132933 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132933/testReport)** for PR 30817 at commit [`f50a5f5`](https://github.com/apache/spark/commit/f50a5f5eda00158f0395a7de4e8b21b332917b79). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate
AmplabJenkins removed a comment on pull request #28386: URL: https://github.com/apache/spark/pull/28386#issuecomment-747231846 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/37528/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30809: [SPARK-33812][SQL] Split the histogram column stats when saving to hive metastore as table property
AmplabJenkins removed a comment on pull request #30809: URL: https://github.com/apache/spark/pull/30809#issuecomment-747231834 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30813: [SPARK-33815][SQL] Migrate ALTER TABLE ... SET [SERDE|SERDEPROPERTIES] to use UnresolvedTable to resolve the identifier
AmplabJenkins removed a comment on pull request #30813: URL: https://github.com/apache/spark/pull/30813#issuecomment-747175650 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30814: [SPARK-33819][CORE] SingleFileEventLogFileReader/RollingEventLogFilesFileReader should be `package private`
AmplabJenkins removed a comment on pull request #30814: URL: https://github.com/apache/spark/pull/30814#issuecomment-747231841 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30312: [SPARK-32917][SHUFFLE][CORE] Adds support for executors to push shuffle blocks after successful map task completion
AmplabJenkins removed a comment on pull request #30312: URL: https://github.com/apache/spark/pull/30312#issuecomment-747231842 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/37527/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30772: [SPARK-33733][SQL][2.4] PullOutNondeterministic should check and collect deterministic field
AmplabJenkins removed a comment on pull request #30772: URL: https://github.com/apache/spark/pull/30772#issuecomment-747231845 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/37534/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30816: [SPARK-33818][DOC] Doc `spark.sql.parser.quotedRegexColumnNames`
AmplabJenkins removed a comment on pull request #30816: URL: https://github.com/apache/spark/pull/30816#issuecomment-747213459 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30813: [SPARK-33815][SQL] Migrate ALTER TABLE ... SET [SERDE|SERDEPROPERTIES] to use UnresolvedTable to resolve the identifier
AmplabJenkins commented on pull request #30813: URL: https://github.com/apache/spark/pull/30813#issuecomment-747231839 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/132911/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30816: [SPARK-33818][DOC] Doc `spark.sql.parser.quotedRegexColumnNames`
AmplabJenkins commented on pull request #30816: URL: https://github.com/apache/spark/pull/30816#issuecomment-747231832 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/37532/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30772: [SPARK-33733][SQL][2.4] PullOutNondeterministic should check and collect deterministic field
AmplabJenkins commented on pull request #30772: URL: https://github.com/apache/spark/pull/30772#issuecomment-747231845 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/37534/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30312: [SPARK-32917][SHUFFLE][CORE] Adds support for executors to push shuffle blocks after successful map task completion
AmplabJenkins commented on pull request #30312: URL: https://github.com/apache/spark/pull/30312#issuecomment-747231842 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/37527/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate
AmplabJenkins commented on pull request #28386: URL: https://github.com/apache/spark/pull/28386#issuecomment-747231846 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/37528/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30814: [SPARK-33819][CORE] SingleFileEventLogFileReader/RollingEventLogFilesFileReader should be `package private`
AmplabJenkins commented on pull request #30814: URL: https://github.com/apache/spark/pull/30814#issuecomment-747231843 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30809: [SPARK-33812][SQL] Split the histogram column stats when saving to hive metastore as table property
AmplabJenkins commented on pull request #30809: URL: https://github.com/apache/spark/pull/30809#issuecomment-747231836 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30816: [SPARK-33818][DOC] Doc `spark.sql.parser.quotedRegexColumnNames`
SparkQA commented on pull request #30816: URL: https://github.com/apache/spark/pull/30816#issuecomment-747229732 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37532/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] jzhuge commented on a change in pull request #30806: [SPARK-33808][SQL] DataSource V2: Build logical writes in the optimizer
jzhuge commented on a change in pull request #30806: URL: https://github.com/apache/spark/pull/30806#discussion_r544832653

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala ##
@@ -188,15 +189,20 @@ class DataSourceV2Strategy(session: SparkSession) extends Strategy with Predicat
         orCreate = orCreate) :: Nil
       }
-    case AppendData(r: DataSourceV2Relation, query, writeOptions, _) =>
+    case AppendData(r: DataSourceV2Relation, query, writeOptions, _, write) =>
       r.table.asWritable match {
         case v1 if v1.supports(TableCapability.V1_BATCH_WRITE) =>
-          AppendDataExecV1(v1, writeOptions.asOptions, query, refreshCache(r)) :: Nil
+          AppendDataExecV1(
+            v1, writeOptions.asOptions, query,
+            refreshCache(r), write.get.asInstanceOf[V1Write]) :: Nil

Review comment: Possible to avoid instance cast? See my suggestion above.

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala ##
@@ -188,15 +189,20 @@ class DataSourceV2Strategy(session: SparkSession) extends Strategy with Predicat
         orCreate = orCreate) :: Nil
       }
-    case AppendData(r: DataSourceV2Relation, query, writeOptions, _) =>
+    case AppendData(r: DataSourceV2Relation, query, writeOptions, _, write) =>

Review comment: Is `write` guaranteed not to be `None`? How about rewriting this case as follows?
```
case AppendData(r @ DataSourceV2Relation(v1: SupportsWrite, _, _, _, _), query, writeOptions, _, Some(v1Write: V1Write))
    if v1.supports(TableCapability.V1_BATCH_WRITE) =>
  AppendDataExecV1(v1, writeOptions.asOptions, query, refreshCache(r), v1Write) :: Nil
case AppendData(r @ DataSourceV2Relation(v2: SupportsWrite, _, _, _, _), query, writeOptions, _, Some(write)) =>
  AppendDataExec(v2, writeOptions.asOptions, planLater(query), refreshCache(r), write) :: Nil
```
It is not exactly the same as the existing code. Some unmatched cases (not sure how many, or if any) will fall through. An exception will be thrown later, instead of right here upon the instance cast or `Option.get`.

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala ##
@@ -477,9 +450,10 @@ private[v2] trait TableWriteExecHelper extends V2TableWriteExec with SupportsV1W
         writeOptions)
       val writeBuilder = table.newWriteBuilder(info)
-      val writtenRows = writeBuilder match {
-        case v1: V1WriteBuilder => writeWithV1(v1.buildForV1Write())
-        case v2 => writeWithV2(v2.buildForBatch())
+      val write = writeBuilder.build()
+      val writtenRows = write match {

Review comment: Nit: merge line 451-454 into:
```
val writtenRows = table.newWriteBuilder(info).build() match {
```

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V1FallbackWriters.scala ##
@@ -115,14 +80,10 @@ sealed trait V1FallbackWriters extends V2CommandExec with SupportsV1Write {
 trait SupportsV1Write extends SparkPlan {
   def plan: LogicalPlan
-  protected def writeWithV1(
-      relation: InsertableRelation,
-      refreshCache: () => Unit = () => ()): Seq[InternalRow] = {
+  protected def writeWithV1(relation: InsertableRelation): Seq[InternalRow] = {

Review comment: Nicely simplified

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala ##
@@ -188,15 +189,20 @@ class DataSourceV2Strategy(session: SparkSession) extends Strategy with Predicat
         orCreate = orCreate) :: Nil
       }
-    case AppendData(r: DataSourceV2Relation, query, writeOptions, _) =>
+    case AppendData(r: DataSourceV2Relation, query, writeOptions, _, write) =>
       r.table.asWritable match {
         case v1 if v1.supports(TableCapability.V1_BATCH_WRITE) =>
-          AppendDataExecV1(v1, writeOptions.asOptions, query, refreshCache(r)) :: Nil
+          AppendDataExecV1(
+            v1, writeOptions.asOptions, query,
+            refreshCache(r), write.get.asInstanceOf[V1Write]) :: Nil
         case v2 =>
-          AppendDataExec(v2, writeOptions.asOptions, planLater(query), refreshCache(r)) :: Nil
+          AppendDataExec(
+            v2, writeOptions.asOptions, planLater(query),
+            refreshCache(r), write.get) :: Nil

Review comment: Possible to avoid Option.get? See my suggestion above.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.
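Editor's illustration of the review suggestion in the message above (not part of the original thread): the point about replacing `Option.get` plus `asInstanceOf` with a pattern match can be sketched in isolation. The types `Write`, `V1Write`, `BatchWrite` and the object `OptionMatchSketch` below are hypothetical stand-ins invented for this sketch; they are not Spark's classes, and the strings returned merely mark which branch was taken.

```scala
// Hypothetical stand-ins for illustration only; these are NOT Spark's classes.
sealed trait Write { def name: String }
final case class V1Write(name: String) extends Write
final case class BatchWrite(name: String) extends Write

object OptionMatchSketch {
  // Style questioned in the review: throws NoSuchElementException on None
  // and ClassCastException when the value is not a V1Write.
  def planUnsafe(write: Option[Write]): String =
    "v1:" + write.get.asInstanceOf[V1Write].name

  // Pattern-matching style: the Option and the runtime type are checked in
  // the case itself, so no get or cast is needed. Input that matches no
  // "Some" case falls through and must be handled by the caller later.
  def plan(supportsV1: Boolean, write: Option[Write]): Option[String] =
    write match {
      case Some(v1: V1Write) if supportsV1 => Some("v1:" + v1.name)
      case Some(w)                         => Some("v2:" + w.name)
      case None                            => None
    }
}
```

As the reviewer notes, the trade-off is where a failure surfaces: the unsafe form fails immediately at the cast or `get`, while the matching form lets unmatched input fall through to whatever handles it afterwards.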
[GitHub] [spark] SparkQA commented on pull request #30816: [SPARK-33818][DOC] Doc `spark.sql.parser.quotedRegexColumnNames`
SparkQA commented on pull request #30816: URL: https://github.com/apache/spark/pull/30816#issuecomment-747227772 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37532/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] wangyum commented on a change in pull request #30790: [SPARK-33798][SQL] Simplify EqualTo(CaseWhen/If, Literal) always false
wangyum commented on a change in pull request #30790: URL: https://github.com/apache/spark/pull/30790#discussion_r544829536

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ##
@@ -523,6 +532,14 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
       } else {
         e.copy(branches = branches.take(i).map(branch => (branch._1, elseValue)))
       }
+
+    case EqualTo(i @ If(_, trueValue, falseValue), right: Literal)
+        if i.deterministic && isAlwaysFalse(trueValue :: falseValue :: Nil, right) =>
+      FalseLiteral
+
+    case EqualTo(c @ CaseWhen(branches, elseValue), right: Literal)

Review comment: OK.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
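Editor's illustration of the rule in the diff above (not part of the original thread): the idea behind "EqualTo(If, Literal) always false" can be shown on a toy expression tree. Everything below (`Expr`, `Lit`, `Attr`, `SimplifySketch`, and this body of `isAlwaysFalse`) is a hypothetical simplification written for this sketch. The real helper's implementation is not shown in the message, and this version assumes away null semantics and the determinism check that the actual rule performs.

```scala
// Toy expression tree for illustration; these are NOT Catalyst's classes.
sealed trait Expr
final case class Lit(value: Any) extends Expr
final case class Attr(name: String) extends Expr
final case class If(cond: Expr, trueValue: Expr, falseValue: Expr) extends Expr
final case class EqualTo(left: Expr, right: Expr) extends Expr

object SimplifySketch {
  val FalseLit: Expr = Lit(false)

  // Hypothetical helper: true only when every candidate is a literal that
  // differs from `right`, so the comparison can never succeed.
  def isAlwaysFalse(candidates: Seq[Expr], right: Lit): Boolean =
    candidates.forall {
      case Lit(v) => v != right.value
      case _      => false // non-literal branch: outcome unknown, do not rewrite
    }

  // Rewrites If(...) = literal to false when no branch value can equal it;
  // every other expression is returned unchanged.
  def simplify(e: Expr): Expr = e match {
    case EqualTo(If(_, t, f), right: Lit) if isAlwaysFalse(Seq(t, f), right) => FalseLit
    case other => other
  }
}
```

For example, `IF(c, 1, 2) = 3` can be replaced by `false` because neither branch value equals 3, whereas `IF(c, 1, 3) = 3` must be left alone.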
[GitHub] [spark] SparkQA commented on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate
SparkQA commented on pull request #28386: URL: https://github.com/apache/spark/pull/28386#issuecomment-747221956 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37528/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on pull request #30659: [SPARK-33697][SQL] RemoveRedundantProjects should require column ordering by default
cloud-fan commented on pull request #30659: URL: https://github.com/apache/spark/pull/30659#issuecomment-747221963 @allisonwang-db can you open backport PRs for other branches? Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan closed pull request #30659: [SPARK-33697][SQL] RemoveRedundantProjects should require column ordering by default
cloud-fan closed pull request #30659: URL: https://github.com/apache/spark/pull/30659 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on pull request #30659: [SPARK-33697][SQL] RemoveRedundantProjects should require column ordering by default
cloud-fan commented on pull request #30659: URL: https://github.com/apache/spark/pull/30659#issuecomment-747221660 thanks, merging to master/3.1! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org