[GitHub] [spark] SparkQA commented on pull request #32899: [SPARK-35652][SQL][3.0] joinWith on two table generated from same one
SparkQA commented on pull request #32899: URL: https://github.com/apache/spark/pull/32899#issuecomment-862066764 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44369/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32907: [SPARK-35757][CORE] Add bitwise AND operation and functionality for intersecting bloom filters
SparkQA commented on pull request #32907: URL: https://github.com/apache/spark/pull/32907#issuecomment-862066215 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44368/
[GitHub] [spark] SparkQA removed a comment on pull request #32921: [WIP][SPARK-35779][SQL] Dynamic filtering for Data Source V2
SparkQA removed a comment on pull request #32921: URL: https://github.com/apache/spark/pull/32921#issuecomment-862005335 **[Test build #139834 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139834/testReport)** for PR 32921 at commit [`04ae0e3`](https://github.com/apache/spark/commit/04ae0e363e98cc0a8af1100ef11f08d9f2e47d1a).
[GitHub] [spark] SparkQA commented on pull request #32921: [WIP][SPARK-35779][SQL] Dynamic filtering for Data Source V2
SparkQA commented on pull request #32921: URL: https://github.com/apache/spark/pull/32921#issuecomment-862057877 **[Test build #139834 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139834/testReport)** for PR 32921 at commit [`04ae0e3`](https://github.com/apache/spark/commit/04ae0e363e98cc0a8af1100ef11f08d9f2e47d1a). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] StefanXiepj opened a new pull request #32925: SPARK-35622: DataFrame's count function do not need groupBy and avoid shuffle
StefanXiepj opened a new pull request #32925: URL: https://github.com/apache/spark/pull/32925 ### What changes were proposed in this pull request? Use `df.rdd.count()` instead of `df.count()`. ### Why are the changes needed? DataFrame's count function does not need a groupBy; using `df.rdd.count()` instead of `df.count()` avoids a shuffle. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added a unit test.
[GitHub] [spark] allisonwang-db commented on a change in pull request #32787: [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans
allisonwang-db commented on a change in pull request #32787: URL: https://github.com/apache/spark/pull/32787#discussion_r652363922 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolveSubquerySuite.scala ## @@ -177,4 +178,61 @@ class ResolveSubquerySuite extends AnalysisTest { condition = Some(sum('a) === sum('c))) assertAnalysisError(plan, Seq("Invalid expressions: [sum(a), sum(c)]")) } + + test("SPARK-35618: lateral join with star expansion") { Review comment: @maropu I looked into how regex expressions are resolved and the logic is actually different from star expressions. It won't throw exceptions when there is no match. Instead, it returns an empty sequence. So we can't tell if the regex expression is resolved by the current plan with an empty output, or it can't be resolved.
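To make the ambiguity in this review comment concrete, here is a minimal Scala sketch. It is a toy model, not Spark's analyzer: `Plan`, `resolveStar` and `resolveRegex` are hypothetical names, and the star path reports failure with a `Left` instead of throwing. A star with an unknown qualifier fails explicitly, whereas a regex that matches nothing yields an empty sequence whether the plan's output is empty or the pattern simply matched no column.

```scala
// Toy model only: it mirrors the behaviour described in the review comment,
// not Spark's real analyzer classes.
object StarVsRegexResolution {
  // A plan is reduced to a table alias plus its output column names.
  final case class Plan(alias: String, output: Seq[String])

  // Star-like resolution: an unknown qualifier is an explicit failure, so a
  // caller could fall back to an outer query plan.
  def resolveStar(qualifier: String, plan: Plan): Either[String, Seq[String]] =
    if (qualifier == plan.alias) Right(plan.output)
    else Left(s"cannot resolve '$qualifier.*'")

  // Regex-like resolution: no match simply yields an empty sequence.
  def resolveRegex(pattern: String, plan: Plan): Seq[String] =
    plan.output.filter(_.matches(pattern))
}
```

Here `resolveRegex("a.*", Plan("t", Nil))` and `resolveRegex("z.*", Plan("t", Seq("a", "b")))` both return an empty sequence, so the result alone cannot tell the caller whether to try an outer plan.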
[GitHub] [spark] sarutak commented on pull request #32924: [SPARK-35771][SQL] Format year-month intervals using type fields
sarutak commented on pull request #32924: URL: https://github.com/apache/spark/pull/32924#issuecomment-862052884 cc: @MaxGekk
[GitHub] [spark] sarutak opened a new pull request #32924: Format year-month intervals using type fields.
sarutak opened a new pull request #32924: URL: https://github.com/apache/spark/pull/32924 ### What changes were proposed in this pull request? This PR proposes to format year-month intervals as strings using the start and end fields of `YearMonthIntervalType`. ### Why are the changes needed? Currently, these fields are ignored, and every `YearMonthIntervalType` is formatted as `INTERVAL YEAR TO MONTH`. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? New test.
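A minimal Scala sketch of what formatting by type fields could look like. It assumes the value is stored as a total number of months and that `YEAR = 0` and `MONTH = 1` are the field codes; `YearMonthIntervalFormat` and `format` are hypothetical names, and this is not Spark's actual formatter.

```scala
// Hypothetical sketch, not Spark's formatter. Field codes are assumed.
object YearMonthIntervalFormat {
  val YEAR: Byte = 0
  val MONTH: Byte = 1

  private def fieldName(field: Byte): String = field match {
    case YEAR  => "YEAR"
    case MONTH => "MONTH"
    case other => throw new IllegalArgumentException(s"Invalid field: $other")
  }

  // Formats a value stored as a total number of months, honouring the
  // start and end fields instead of always printing YEAR TO MONTH.
  def format(totalMonths: Int, startField: Byte, endField: Byte): String = {
    val sign = if (totalMonths < 0) "-" else ""
    // Widen to Long so that negating Int.MinValue cannot overflow.
    val abs = math.abs(totalMonths.toLong)
    val value = (startField, endField) match {
      case (YEAR, YEAR)   => s"$sign${abs / 12}"
      case (YEAR, MONTH)  => s"$sign${abs / 12}-${abs % 12}"
      case (MONTH, MONTH) => s"$sign$abs"
      case _ =>
        throw new IllegalArgumentException(s"Invalid fields: $startField, $endField")
    }
    val fields =
      if (startField == endField) fieldName(startField)
      else s"${fieldName(startField)} TO ${fieldName(endField)}"
    s"INTERVAL '$value' $fields"
  }
}
```

With this sketch, 14 months renders as `INTERVAL '1-2' YEAR TO MONTH` for a YEAR TO MONTH type but as `INTERVAL '14' MONTH` for a MONTH type.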
[GitHub] [spark] sumeetgajjar commented on pull request #32912: [SPARK-35429][CORE] Remove commons-httpclient from Hadoop-3.2 profile due to EOL and CVEs
sumeetgajjar commented on pull request #32912: URL: https://github.com/apache/spark/pull/32912#issuecomment-862050777 Thank you @dongjoon-hyun @wangyum and @sunchao for the quick review and comments.
[GitHub] [spark] weixiuli opened a new pull request #32923: [SPARK-35783][SQL] Set the list of read columns in the task configuration to reduce reading of ORC data.
weixiuli opened a new pull request #32923: URL: https://github.com/apache/spark/pull/32923 ### What changes were proposed in this pull request? Set the list of read columns in the task configuration to reduce the amount of ORC data read. ### Why are the changes needed? Currently, if the list of read columns is not set in the task configuration, all columns of the ORC table are read. Therefore, we should set the list of read columns in the task configuration to reduce the amount of ORC data read. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing unit tests.
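A minimal Scala sketch of the idea, assuming the Hive column-projection keys `hive.io.file.readcolumn.ids` and `hive.io.file.readcolumn.names` and returning a plain `Map` instead of mutating a Hadoop `Configuration`. `OrcReadColumns` and `readColumnConf` are hypothetical names; this is not the code in this PR.

```scala
// Hypothetical sketch; the two keys are assumed Hive column-projection keys.
object OrcReadColumns {
  val ReadColumnIdsKey = "hive.io.file.readcolumn.ids"
  val ReadColumnNamesKey = "hive.io.file.readcolumn.names"

  // Returns the entries to put into the task configuration so that the ORC
  // reader only needs to materialise the required columns.
  def readColumnConf(tableColumns: Seq[String], requiredColumns: Seq[String]): Map[String, String] = {
    val ids = requiredColumns.map { name =>
      val idx = tableColumns.indexOf(name)
      require(idx >= 0, s"Unknown column: $name")
      idx
    }
    Map(
      ReadColumnIdsKey -> ids.mkString(","),
      ReadColumnNamesKey -> requiredColumns.mkString(","))
  }
}
```

For a table `(id, name, age, city)` where only `name` and `city` are needed, the sketch yields the ids `1,3` rather than leaving the list unset, which is the "read all columns" case described above.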
[GitHub] [spark] wangyum commented on pull request #32907: [SPARK-35757][CORE] Add bitwise AND operation and functionality for intersecting bloom filters
wangyum commented on pull request #32907: URL: https://github.com/apache/spark/pull/32907#issuecomment-862050091 @kudhru Please fix the code style first: ``` Scalastyle checks failed at following occurrences: [error] /home/jenkins/workspace/SparkPullRequestBuilder/common/sketch/src/test/scala/org/apache/spark/util/sketch/BloomFilterSuite.scala:102: File line length exceeds 100 characters [error] Total time: 46 s, completed Jun 15, 2021 10:21:32 PM ```
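The failing rule flags any source line longer than 100 characters. A rough Scala stand-in for checking a file locally before pushing, assuming a plain character count; the real scalastyle checker has its own options (such as tab handling), and `LineLengthCheck` is a hypothetical name.

```scala
// Rough stand-in for scalastyle's line-length rule, not its implementation.
object LineLengthCheck {
  val MaxLineLength = 100

  // Returns (1-based line number, length) for every line longer than `max`.
  def violations(source: String, max: Int = MaxLineLength): List[(Int, Int)] =
    source
      .split("\n", -1)
      .toList
      .map(_.stripSuffix("\r")) // tolerate Windows line endings
      .zipWithIndex
      .collect { case (line, idx) if line.length > max => (idx + 1, line.length) }
}
```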
[GitHub] [spark] SparkQA removed a comment on pull request #32914: [SPARK-35763][SS] Add a new copy method to StateStoreCustomMetric
SparkQA removed a comment on pull request #32914: URL: https://github.com/apache/spark/pull/32914#issuecomment-861936755 **[Test build #139829 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139829/testReport)** for PR 32914 at commit [`1031593`](https://github.com/apache/spark/commit/10315931fff2ad06030cbc8e017f01e0be8593bc).
[GitHub] [spark] SparkQA commented on pull request #32914: [SPARK-35763][SS] Add a new copy method to StateStoreCustomMetric
SparkQA commented on pull request #32914: URL: https://github.com/apache/spark/pull/32914#issuecomment-862048837 **[Test build #139829 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139829/testReport)** for PR 32914 at commit [`1031593`](https://github.com/apache/spark/commit/10315931fff2ad06030cbc8e017f01e0be8593bc). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] kudhru commented on pull request #32907: [SPARK-35757][CORE] Add bitwise AND operation and functionality for intersecting bloom filters
kudhru commented on pull request #32907: URL: https://github.com/apache/spark/pull/32907#issuecomment-862048748 I am a bit confused about how and when this PR will be merged into the master branch. Could someone please clarify?
[GitHub] [spark] SparkQA removed a comment on pull request #32907: [SPARK-35757][CORE] Add bitwise AND operation and functionality for intersecting bloom filters
SparkQA removed a comment on pull request #32907: URL: https://github.com/apache/spark/pull/32907#issuecomment-862047283 **[Test build #139844 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139844/testReport)** for PR 32907 at commit [`5cb4bd0`](https://github.com/apache/spark/commit/5cb4bd0966e25e0c2a374a43729ef3732027a23b).
[GitHub] [spark] SparkQA commented on pull request #32907: [SPARK-35757][CORE] Add bitwise AND operation and functionality for intersecting bloom filters
SparkQA commented on pull request #32907: URL: https://github.com/apache/spark/pull/32907#issuecomment-862048308 **[Test build #139844 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139844/testReport)** for PR 32907 at commit [`5cb4bd0`](https://github.com/apache/spark/commit/5cb4bd0966e25e0c2a374a43729ef3732027a23b). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class MakeYMInterval(years: Expression, months: Expression)` * `case class YearMonthIntervalType(startField: Byte, endField: Byte) extends AtomicType `
[GitHub] [spark] AmplabJenkins commented on pull request #31992: [SPARK-34898][CORE] We should log SparkListenerExecutorMetricsUpdateEvent of `driver` appropriately when `spark.eventLog.logStageExe
AmplabJenkins commented on pull request #31992: URL: https://github.com/apache/spark/pull/31992#issuecomment-862048157 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44365/
[GitHub] [spark] SparkQA commented on pull request #31992: [SPARK-34898][CORE] We should log SparkListenerExecutorMetricsUpdateEvent of `driver` appropriately when `spark.eventLog.logStageExecutorM
SparkQA commented on pull request #31992: URL: https://github.com/apache/spark/pull/31992#issuecomment-862048138 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44365/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32801: [SPARK-12567][SQL] Add aes_encrypt and aes_decrypt builtin functions
AmplabJenkins removed a comment on pull request #32801: URL: https://github.com/apache/spark/pull/32801#issuecomment-862047380 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44364/
[GitHub] [spark] sarutak commented on pull request #32922: [SPARK-35774][SQL] Parse any year-month interval types in SQL
sarutak commented on pull request #32922: URL: https://github.com/apache/spark/pull/32922#issuecomment-862047487 cc: @MaxGekk
[GitHub] [spark] SparkQA commented on pull request #32801: [SPARK-12567][SQL] Add aes_encrypt and aes_decrypt builtin functions
SparkQA commented on pull request #32801: URL: https://github.com/apache/spark/pull/32801#issuecomment-862047355 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44364/
[GitHub] [spark] AmplabJenkins commented on pull request #32801: [SPARK-12567][SQL] Add aes_encrypt and aes_decrypt builtin functions
AmplabJenkins commented on pull request #32801: URL: https://github.com/apache/spark/pull/32801#issuecomment-862047380 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44364/
[GitHub] [spark] SparkQA commented on pull request #32907: [SPARK-35757][CORE] Add bitwise AND operation and functionality for intersecting bloom filters
SparkQA commented on pull request #32907: URL: https://github.com/apache/spark/pull/32907#issuecomment-862047283 **[Test build #139844 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139844/testReport)** for PR 32907 at commit [`5cb4bd0`](https://github.com/apache/spark/commit/5cb4bd0966e25e0c2a374a43729ef3732027a23b).
[GitHub] [spark] sarutak opened a new pull request #32922: [SPARK-35774][SQL] Parse any year-month interval types in SQL
sarutak opened a new pull request #32922: URL: https://github.com/apache/spark/pull/32922 ### What changes were proposed in this pull request? This PR extends the parser rules to be able to parse the following types: * INTERVAL YEAR * INTERVAL YEAR TO MONTH * INTERVAL MONTH ### Why are the changes needed? For ANSI compliance. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? New assertion.
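A hypothetical Scala sketch of which type strings the extended rules accept, using a regular expression rather than Spark's ANTLR grammar. The field codes `YEAR = 0` and `MONTH = 1` are assumed, the result is a `(startField, endField)` pair, and `MONTH TO YEAR` is rejected on the assumption that the start field must precede the end field.

```scala
// Hypothetical sketch of the accepted forms, not Spark's SQL parser.
object YearMonthIntervalTypeParser {
  val YEAR: Byte = 0
  val MONTH: Byte = 1

  // Case-insensitive; the optional group captures the TO <field> part.
  private val IntervalType =
    """(?i)\s*INTERVAL\s+(YEAR|MONTH)(?:\s+TO\s+(YEAR|MONTH))?\s*""".r

  private def field(name: String): Byte =
    if (name.equalsIgnoreCase("YEAR")) YEAR else MONTH

  // Returns (startField, endField), or None for unsupported forms.
  def parse(sql: String): Option[(Byte, Byte)] = sql match {
    case IntervalType(start, null) => Some((field(start), field(start)))
    case IntervalType(start, end) =>
      val (s, e) = (field(start), field(end))
      if (s < e) Some((s, e)) else None
    case _ => None
  }
}
```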
[GitHub] [spark] SparkQA removed a comment on pull request #31992: [SPARK-34898][CORE] We should log SparkListenerExecutorMetricsUpdateEvent of `driver` appropriately when `spark.eventLog.logStageE
SparkQA removed a comment on pull request #31992: URL: https://github.com/apache/spark/pull/31992#issuecomment-862044801 **[Test build #139843 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139843/testReport)** for PR 31992 at commit [`87a079e`](https://github.com/apache/spark/commit/87a079e5fa159dad343adcf8ac3f158ff4870f6b).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31992: [SPARK-34898][CORE] We should log SparkListenerExecutorMetricsUpdateEvent of `driver` appropriately when `spark.eventLog.log
AmplabJenkins removed a comment on pull request #31992: URL: https://github.com/apache/spark/pull/31992#issuecomment-862046824 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139843/
[GitHub] [spark] AmplabJenkins commented on pull request #31992: [SPARK-34898][CORE] We should log SparkListenerExecutorMetricsUpdateEvent of `driver` appropriately when `spark.eventLog.logStageExe
AmplabJenkins commented on pull request #31992: URL: https://github.com/apache/spark/pull/31992#issuecomment-862046824 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139843/
[GitHub] [spark] SparkQA commented on pull request #31992: [SPARK-34898][CORE] We should log SparkListenerExecutorMetricsUpdateEvent of `driver` appropriately when `spark.eventLog.logStageExecutorM
SparkQA commented on pull request #31992: URL: https://github.com/apache/spark/pull/31992#issuecomment-862046804 **[Test build #139843 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139843/testReport)** for PR 31992 at commit [`87a079e`](https://github.com/apache/spark/commit/87a079e5fa159dad343adcf8ac3f158ff4870f6b). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32907: [SPARK-35757][CORE] Add bitwise AND operation and functionality for intersecting bloom filters
AmplabJenkins removed a comment on pull request #32907: URL: https://github.com/apache/spark/pull/32907#issuecomment-860798654
[GitHub] [spark] SparkQA removed a comment on pull request #32907: [SPARK-35757][CORE] Add bitwise AND operation and functionality for intersecting bloom filters
SparkQA removed a comment on pull request #32907: URL: https://github.com/apache/spark/pull/32907#issuecomment-862041054 **[Test build #139840 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139840/testReport)** for PR 32907 at commit [`f1aec5e`](https://github.com/apache/spark/commit/f1aec5e39ef135edc39724efb3802eea8053ea37).
[GitHub] [spark] SparkQA commented on pull request #31992: [SPARK-34898][CORE] We should log SparkListenerExecutorMetricsUpdateEvent of `driver` appropriately when `spark.eventLog.logStageExecutorM
SparkQA commented on pull request #31992: URL: https://github.com/apache/spark/pull/31992#issuecomment-862044801 **[Test build #139843 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139843/testReport)** for PR 31992 at commit [`87a079e`](https://github.com/apache/spark/commit/87a079e5fa159dad343adcf8ac3f158ff4870f6b).
[GitHub] [spark] AngersZhuuuu commented on pull request #31992: [SPARK-34898][CORE] We should log SparkListenerExecutorMetricsUpdateEvent of `driver` appropriately when `spark.eventLog.logStageExec
AngersZhuuuu commented on pull request #31992: URL: https://github.com/apache/spark/pull/31992#issuecomment-862044197 > @AngersZhuuuu BTW did you disable GA in your fork repo? It should be enabled so PR leverage the GA resources in your forked repo. No, but this PR is a little long.
[GitHub] [spark] AngersZhuuuu commented on pull request #31992: [SPARK-34898][CORE] We should log SparkListenerExecutorMetricsUpdateEvent of `driver` appropriately when `spark.eventLog.logStageExec
AngersZhuuuu commented on pull request #31992: URL: https://github.com/apache/spark/pull/31992#issuecomment-862043865 > @AngersZhuuuu, mind making the PR description disambiguous? what's "driver executor peakMemoryMetrics"? How about current
[GitHub] [spark] AmplabJenkins commented on pull request #32907: [SPARK-35757][CORE] Add bitwise AND operation and functionality for intersecting bloom filters
AmplabJenkins commented on pull request #32907: URL: https://github.com/apache/spark/pull/32907#issuecomment-862042035 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139840/
[GitHub] [spark] SparkQA commented on pull request #32907: [SPARK-35757][CORE] Add bitwise AND operation and functionality for intersecting bloom filters
SparkQA commented on pull request #32907: URL: https://github.com/apache/spark/pull/32907#issuecomment-862042021 **[Test build #139840 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139840/testReport)** for PR 32907 at commit [`f1aec5e`](https://github.com/apache/spark/commit/f1aec5e39ef135edc39724efb3802eea8053ea37). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #31992: [SPARK-34898][CORE] We should log SparkListenerExecutorMetricsUpdateEvent of `driver` appropriately when `spark.eventLog.logStageExecutorM
SparkQA commented on pull request #31992: URL: https://github.com/apache/spark/pull/31992#issuecomment-862041476 **[Test build #139842 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139842/testReport)** for PR 31992 at commit [`6c81e2d`](https://github.com/apache/spark/commit/6c81e2dd74b7f49b3ba30b8618d1a502db1246dc).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32914: [SPARK-35763][SS] Add a new copy method to StateStoreCustomMetric
AmplabJenkins removed a comment on pull request #32914: URL: https://github.com/apache/spark/pull/32914#issuecomment-862040635
[GitHub] [spark] SparkQA commented on pull request #32919: [SPARK-35378][SQL][FOLLOWUP] Restore the command execution name for DataFrameWriterV2
SparkQA commented on pull request #32919: URL: https://github.com/apache/spark/pull/32919#issuecomment-862041045 **[Test build #139838 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139838/testReport)** for PR 32919 at commit [`297c43d`](https://github.com/apache/spark/commit/297c43d5e9ed8586820f228a6d1693309a1a0b4d).
[GitHub] [spark] SparkQA commented on pull request #32899: [SPARK-35652][SQL][3.0] joinWith on two table generated from same one
SparkQA commented on pull request #32899: URL: https://github.com/apache/spark/pull/32899#issuecomment-862041113 **[Test build #139841 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139841/testReport)** for PR 32899 at commit [`b5c7706`](https://github.com/apache/spark/commit/b5c77069001fc64cb01628554ab2ac7e4bc42c7c).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32921: [WIP][SPARK-35779][SQL] Dynamic filtering for Data Source V2
AmplabJenkins removed a comment on pull request #32921: URL: https://github.com/apache/spark/pull/32921#issuecomment-862040632
[GitHub] [spark] SparkQA commented on pull request #32907: [SPARK-35757][CORE] Add bitwise AND operation and functionality for intersecting bloom filters
SparkQA commented on pull request #32907: URL: https://github.com/apache/spark/pull/32907#issuecomment-862041054 **[Test build #139840 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139840/testReport)** for PR 32907 at commit [`f1aec5e`](https://github.com/apache/spark/commit/f1aec5e39ef135edc39724efb3802eea8053ea37).
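PR #32907's title describes intersecting Bloom filters with a bitwise AND. The Java sketch below illustrates that idea under stated assumptions: `SimpleBloomFilter`, its double-hashing scheme, and `intersectInPlace` are hypothetical names, not the API added by the PR and not Spark's `org.apache.spark.util.sketch.BloomFilter`. Both filters must share the same bit-array size and number of hash functions for the AND to be meaningful.

```java
/**
 * Hypothetical minimal Bloom filter, used only to illustrate intersection by bitwise AND.
 * Not Spark's BloomFilter; the hashing scheme and method names are assumptions.
 */
final class SimpleBloomFilter {
  private final long[] words;        // bit array packed into 64-bit words
  private final int numBits;
  private final int numHashFunctions;

  SimpleBloomFilter(int numBits, int numHashFunctions) {
    this.numBits = numBits;
    this.numHashFunctions = numHashFunctions;
    this.words = new long[(numBits + 63) / 64];
  }

  // Derives the i-th bit index from two base hashes (double hashing).
  private int bitIndex(long item, int i) {
    int h1 = Long.hashCode(item);
    int h2 = Long.hashCode(item * 0x9E3779B97F4A7C15L);
    return Math.floorMod(h1 + i * h2, numBits);
  }

  void put(long item) {
    for (int i = 0; i < numHashFunctions; i++) {
      int bit = bitIndex(item, i);
      words[bit >>> 6] |= 1L << (bit & 63);
    }
  }

  boolean mightContain(long item) {
    for (int i = 0; i < numHashFunctions; i++) {
      int bit = bitIndex(item, i);
      if ((words[bit >>> 6] & (1L << (bit & 63))) == 0) {
        return false;
      }
    }
    return true;
  }

  /** Keeps only the bits set in both filters; rejects filters with different parameters. */
  SimpleBloomFilter intersectInPlace(SimpleBloomFilter other) {
    if (other.numBits != numBits || other.numHashFunctions != numHashFunctions) {
      throw new IllegalArgumentException("Cannot intersect incompatible Bloom filters");
    }
    for (int w = 0; w < words.length; w++) {
      words[w] &= other.words[w];
    }
    return this;
  }
}
```

An item added to both inputs keeps all of its bits after the AND, so the intersected filter never reports a false negative for it. The result can, however, keep bits that different items set in each input, so its false-positive rate may be higher than that of a filter built directly from the intersected set.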
[GitHub] [spark] SparkQA commented on pull request #32914: [SPARK-35763][SS] Add a new copy method to StateStoreCustomMetric
SparkQA commented on pull request #32914: URL: https://github.com/apache/spark/pull/32914#issuecomment-862041004 **[Test build #139839 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139839/testReport)** for PR 32914 at commit [`7211c4b`](https://github.com/apache/spark/commit/7211c4b407e5bf8353af3d2cbf6f072c79d5a175).
[GitHub] [spark] AmplabJenkins commented on pull request #32921: [WIP][SPARK-35779][SQL] Dynamic filtering for Data Source V2
AmplabJenkins commented on pull request #32921: URL: https://github.com/apache/spark/pull/32921#issuecomment-862040632
[GitHub] [spark] AmplabJenkins commented on pull request #32914: [SPARK-35763][SS] Add a new copy method to StateStoreCustomMetric
AmplabJenkins commented on pull request #32914: URL: https://github.com/apache/spark/pull/32914#issuecomment-862040639
[GitHub] [spark] SparkQA removed a comment on pull request #32921: [WIP][SPARK-35779][SQL] Dynamic filtering for Data Source V2
SparkQA removed a comment on pull request #32921: URL: https://github.com/apache/spark/pull/32921#issuecomment-862006513 **[Test build #139835 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139835/testReport)** for PR 32921 at commit [`202be14`](https://github.com/apache/spark/commit/202be14f09cbacd94de9d0bc3b518c87292f3878).
[GitHub] [spark] SparkQA commented on pull request #32921: [WIP][SPARK-35779][SQL] Dynamic filtering for Data Source V2
SparkQA commented on pull request #32921: URL: https://github.com/apache/spark/pull/32921#issuecomment-862039808 **[Test build #139835 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139835/testReport)** for PR 32921 at commit [`202be14`](https://github.com/apache/spark/commit/202be14f09cbacd94de9d0bc3b518c87292f3878).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] HyukjinKwon commented on pull request #31992: [SPARK-34898][CORE] We should log SparkListenerExecutorMetricsUpdateEvent of `driver` appropriately when `spark.eventLog.logStageExecu
HyukjinKwon commented on pull request #31992: URL: https://github.com/apache/spark/pull/31992#issuecomment-862038185 cc @HeartSaVioR too FYI
[GitHub] [spark] HyukjinKwon commented on pull request #31992: [SPARK-34898][CORE] We should log SparkListenerExecutorMetricsUpdateEvent of `driver` appropriately when `spark.eventLog.logStageExecu
HyukjinKwon commented on pull request #31992: URL: https://github.com/apache/spark/pull/31992#issuecomment-862038027 @AngersZh BTW did you disable GA in your fork repo? It should be enabled so PR leverage the GA resources in your forked repo.
[GitHub] [spark] mridulm edited a comment on pull request #32385: [WIP][SPARK-35275][CORE] Add checksum for shuffle blocks and diagnose corruption
mridulm edited a comment on pull request #32385: URL: https://github.com/apache/spark/pull/32385#issuecomment-862024269 lol, thanks for the links @Ngone51 :-) Glad I went through this once more anyway - will help me with better understanding of the sub-pr's ! Will wait for the update before taking a look at #32401.
[GitHub] [spark] mridulm edited a comment on pull request #32385: [WIP][SPARK-35275][CORE] Add checksum for shuffle blocks and diagnose corruption
mridulm edited a comment on pull request #32385: URL: https://github.com/apache/spark/pull/32385#issuecomment-862024269 lol, thanks for the links @Ngone51 :-) Glad I went through this once more anyway - will help me with better understanding of the sub-pr's ! Will wait for the update before taking a look.
[GitHub] [spark] HyukjinKwon commented on pull request #31992: [SPARK-34898][CORE] We should send SparkListenerExecutorMetricsUpdateEventLog of `driver` appropriately
HyukjinKwon commented on pull request #31992: URL: https://github.com/apache/spark/pull/31992#issuecomment-862036980 Also do you mean `SparkListenerExecutorMetricsUpdateEvent` by `SparkListenerExecutorMetricsUpdateEventLog`?
[GitHub] [spark] HyukjinKwon commented on pull request #31992: [SPARK-34898][CORE] We should send SparkListenerExecutorMetricsUpdateEventLog of `driver` appropriately
HyukjinKwon commented on pull request #31992: URL: https://github.com/apache/spark/pull/31992#issuecomment-862036119 @AngersZh, mind making the PR description disambiguous? what's "driver executor peakMemoryMetrics"?
[GitHub] [spark] SparkQA commented on pull request #32801: [SPARK-12567][SQL] Add aes_encrypt and aes_decrypt builtin functions
SparkQA commented on pull request #32801: URL: https://github.com/apache/spark/pull/32801#issuecomment-862035784 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44364/
[GitHub] [spark] cloud-fan commented on pull request #32907: [SPARK-35757][CORE] Add bitwise AND operation and functionality for intersecting bloom filters
cloud-fan commented on pull request #32907: URL: https://github.com/apache/spark/pull/32907#issuecomment-862035609 @kudhru I think you need to rebase this PR with the latest master branch, and also update the master branch of your spark fork to sync with the latest upstream master branch. Otherwise github action won't work.
[GitHub] [spark] SparkQA commented on pull request #31992: [SPARK-34898][CORE] We should send SparkListenerExecutorMetricsUpdateEventLog of `driver` appropriately
SparkQA commented on pull request #31992: URL: https://github.com/apache/spark/pull/31992#issuecomment-862035537 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44365/
[GitHub] [spark] HyukjinKwon commented on a change in pull request #31992: [SPARK-34898][CORE] We should send SparkListenerExecutorMetricsUpdateEventLog of `driver` appropriately
HyukjinKwon commented on a change in pull request #31992: URL: https://github.com/apache/spark/pull/31992#discussion_r652346351

## File path: core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala ##
@@ -618,6 +619,10 @@ class EventLoggingListenerSuite extends SparkFunSuite with LocalSparkContext wit
 assert(expected.stageInfo.stageId === actual.stageInfo.stageId)
 case (expected: SparkListenerTaskEnd, actual: SparkListenerTaskEnd) =>
 assert(expected.stageId === actual.stageId)
+ case (expected: SparkListenerExecutorMetricsUpdate,
+ actual: SparkListenerExecutorMetricsUpdate) =>

Review comment:
```suggestion
actual: SparkListenerExecutorMetricsUpdate) =>
```
[GitHub] [spark] cloud-fan commented on pull request #32907: [SPARK-35757][CORE] Add bitwise AND operation and functionality for intersecting bloom filters
cloud-fan commented on pull request #32907: URL: https://github.com/apache/spark/pull/32907#issuecomment-862035082 ok to test
[GitHub] [spark] cloud-fan commented on a change in pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file
cloud-fan commented on a change in pull request #32401: URL: https://github.com/apache/spark/pull/32401#discussion_r652345736

## File path: core/src/main/java/org/apache/spark/shuffle/api/ShuffleMapOutputWriter.java ##
@@ -68,8 +72,11 @@
 *for that partition id.
 *
 * 2) An optional metadata blob that can be used by shuffle readers.
+ *
+ * @param checksums The checksum values for each partition if shuffle checksum enabled.
+ * Otherwise, it's empty.
 */
- MapOutputCommitMessage commitAllPartitions() throws IOException;
+ MapOutputCommitMessage commitAllPartitions(long[] checksums) throws IOException;

Review comment:
TBH I don't think the current shuffle API provides enough abstraction to do checksum. I'm OK with this change as the shuffle API is still private, but we should revisit the shuffle API later, so that checksum can be done at the shuffle implementation side. The current issue I see is, Spark writes local spill files and then asks the shuffle implementation to "transfer" the spill files. Then Spark has to do checksum by itself during spill file writing, to reduce the perf overhead. We can discuss it later.
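The review comment describes Spark computing checksums itself while it writes local spill files, then handing per-partition values to `commitAllPartitions(long[] checksums)`. A minimal Java sketch of computing one checksum per partition in the same pass as the write follows; Adler32 as the algorithm and the `PartitionChecksumSketch` class are assumptions for illustration, not code from the PR.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.Adler32;
import java.util.zip.CheckedOutputStream;
import java.util.zip.Checksum;

/**
 * Hypothetical helper (not Spark's API): computes one checksum per reduce partition while
 * the partition bytes are written, so the array could be passed to something like
 * commitAllPartitions(long[] checksums). Returns an empty array when checksums are disabled.
 */
final class PartitionChecksumSketch {
  private final boolean checksumEnabled;
  private final long[] checksums;

  PartitionChecksumSketch(int numPartitions, boolean checksumEnabled) {
    this.checksumEnabled = checksumEnabled;
    this.checksums = new long[checksumEnabled ? numPartitions : 0];
  }

  /** Writes one partition's bytes to `out`, updating that partition's checksum in the same pass. */
  void writePartition(int partitionId, byte[] data, OutputStream out) throws IOException {
    if (!checksumEnabled) {
      out.write(data);
      return;
    }
    Checksum checksum = new Adler32();
    // CheckedOutputStream updates the checksum as bytes flow through, so no second read is needed.
    CheckedOutputStream checked = new CheckedOutputStream(out, checksum);
    checked.write(data);
    checked.flush(); // flush rather than close, so the underlying stream stays open
    checksums[partitionId] = checksum.getValue();
  }

  long[] checksums() {
    return checksums.clone();
  }
}
```

Computing the value while the bytes pass through, rather than re-reading the spill file afterwards, is the kind of overhead reduction the comment refers to; the empty array in the disabled case mirrors the `@param checksums` Javadoc in the diff.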
[GitHub] [spark] SparkQA removed a comment on pull request #32914: [SPARK-35763][SS] Add a new copy method to StateStoreCustomMetric
SparkQA removed a comment on pull request #32914: URL: https://github.com/apache/spark/pull/32914#issuecomment-862002836 **[Test build #139833 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139833/testReport)** for PR 32914 at commit [`5dddc66`](https://github.com/apache/spark/commit/5dddc6644c9e76f78ebd0d44a4c23ee80b0fd55c).
[GitHub] [spark] SparkQA commented on pull request #32914: [SPARK-35763][SS] Add a new copy method to StateStoreCustomMetric
SparkQA commented on pull request #32914: URL: https://github.com/apache/spark/pull/32914#issuecomment-862033749 **[Test build #139833 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139833/testReport)** for PR 32914 at commit [`5dddc66`](https://github.com/apache/spark/commit/5dddc6644c9e76f78ebd0d44a4c23ee80b0fd55c).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #32921: [WIP][SPARK-35779][SQL] Dynamic filtering for Data Source V2
SparkQA commented on pull request #32921: URL: https://github.com/apache/spark/pull/32921#issuecomment-862032913 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44363/
[GitHub] [spark] kudhru closed pull request #32907: [SPARK-35757][CORE] Add bitwise AND operation and functionality for intersecting bloom filters
kudhru closed pull request #32907: URL: https://github.com/apache/spark/pull/32907
[GitHub] [spark] cloud-fan commented on pull request #32899: [SPARK-35652][SQL][3.0] joinWith on two table generated from same one
cloud-fan commented on pull request #32899: URL: https://github.com/apache/spark/pull/32899#issuecomment-862031371 retest this please
[GitHub] [spark] cloud-fan commented on a change in pull request #31905: [SPARK-34806][SQL] Add Observation helper for Dataset.observe
cloud-fan commented on a change in pull request #31905: URL: https://github.com/apache/spark/pull/31905#discussion_r652340898

## File path: sql/core/src/main/scala/org/apache/spark/sql/Observation.scala ##
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.util.UUID
+import java.util.concurrent.TimeUnit
+import java.util.concurrent.locks.{Condition, Lock, ReentrantLock}
+
+import org.apache.spark.sql.execution.QueryExecution
+import org.apache.spark.sql.util.QueryExecutionListener
+
+/**
+ * Not thread-safe.
+ * @param name
+ * @param sparkSession
+ */
+class Observation(name: String) {

Review comment:
then shall we have a variant of `waitCompleted` that w/o a timeout?
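The review asks for a `waitCompleted` variant without a timeout. One way to offer both a timed and an untimed wait, sketched in Java with the same `Lock`/`Condition`/`ReentrantLock` classes the diff imports, is shown below; `ObservationSketch`, `complete`, and the null-on-timeout convention are assumptions for illustration, not the API of Spark's `Observation` class.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical result holder (not Spark's Observation) offering both a timed and an
 * untimed wait for a value produced by another thread, e.g. a query listener.
 */
final class ObservationSketch<T> {
  private final Lock lock = new ReentrantLock();
  private final Condition completed = lock.newCondition();
  private T result; // null until the observed query finishes

  /** Called by the producer (for example a listener) when the observed value arrives. */
  void complete(T value) {
    lock.lock();
    try {
      result = value;
      completed.signalAll();
    } finally {
      lock.unlock();
    }
  }

  /** Blocks until the result is available, with no timeout. */
  T waitCompleted() throws InterruptedException {
    lock.lock();
    try {
      while (result == null) { // loop guards against spurious wakeups
        completed.await();
      }
      return result;
    } finally {
      lock.unlock();
    }
  }

  /** Blocks for at most the given time; returns null if the result did not arrive in time. */
  T waitCompleted(long time, TimeUnit unit) throws InterruptedException {
    long remainingNanos = unit.toNanos(time);
    lock.lock();
    try {
      while (result == null) {
        if (remainingNanos <= 0L) {
          return null;
        }
        remainingNanos = completed.awaitNanos(remainingNanos);
      }
      return result;
    } finally {
      lock.unlock();
    }
  }
}
```

Both variants re-check the condition in a loop because `Condition.await` may wake spuriously; the timed variant relies on `awaitNanos` returning an estimate of the remaining wait time.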
[GitHub] [spark] SparkQA commented on pull request #32914: [SPARK-35763][SS] Add a new copy method to StateStoreCustomMetric
SparkQA commented on pull request #32914: URL: https://github.com/apache/spark/pull/32914#issuecomment-862029828 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44361/
[GitHub] [spark] cloud-fan commented on pull request #32470: [SPARK-35712][SQL] Simplify ResolveAggregateFunctions
cloud-fan commented on pull request #32470: URL: https://github.com/apache/spark/pull/32470#issuecomment-862028396 @viirya @maropu Can you take one more look?
[GitHub] [spark] mridulm commented on pull request #32385: [WIP][SPARK-35275][CORE] Add checksum for shuffle blocks and diagnose corruption
mridulm commented on pull request #32385: URL: https://github.com/apache/spark/pull/32385#issuecomment-862024269 lol, thanks for the links @Ngone51 :-) Glad I went through this once more anyway - will help me with better understanding of the sub-pr's !
[GitHub] [spark] mridulm edited a comment on pull request #31992: [SPARK-34898][CORE] We should send SparkListenerExecutorMetricsUpdateEventLog of `driver` appropriately
mridulm edited a comment on pull request #31992: URL: https://github.com/apache/spark/pull/31992#issuecomment-862023314 Thanks for the update @AngersZh, I am fine with merging this ... will keep it around for a couple of days in case there are other comments. +CC @zhouyejoe, @thejdeep PTAL
[GitHub] [spark] mridulm commented on pull request #31992: [SPARK-34898][CORE] We should send SparkListenerExecutorMetricsUpdateEventLog of `driver` appropriately
mridulm commented on pull request #31992: URL: https://github.com/apache/spark/pull/31992#issuecomment-862023314 Thanks for the update @AngersZh +CC @zhouyejoe, @thejdeep PTAL
[GitHub] [spark] SparkQA commented on pull request #31992: [SPARK-34898][CORE] We should send SparkListenerExecutorMetricsUpdateEventLog of `driver` appropriately
SparkQA commented on pull request #31992: URL: https://github.com/apache/spark/pull/31992#issuecomment-862021515 **[Test build #139837 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139837/testReport)** for PR 31992 at commit [`c9a4a67`](https://github.com/apache/spark/commit/c9a4a6789586771720739cd71354029b64e5b358).
[GitHub] [spark] SparkQA commented on pull request #32801: [SPARK-12567][SQL] Add aes_encrypt and aes_decrypt builtin functions
SparkQA commented on pull request #32801: URL: https://github.com/apache/spark/pull/32801#issuecomment-862021191 **[Test build #139836 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139836/testReport)** for PR 32801 at commit [`996f787`](https://github.com/apache/spark/commit/996f787921860d002e275fe530af6b2ab429cf7e).
[GitHub] [spark] SparkQA commented on pull request #32921: [WIP][SPARK-35779][SQL] Dynamic filtering for Data Source V2
SparkQA commented on pull request #32921: URL: https://github.com/apache/spark/pull/32921#issuecomment-862021114 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44363/
[GitHub] [spark] mridulm commented on a change in pull request #32811: [SPARK-35671][SHUFFLE][CORE] Add support in the ESS to serve merged shuffle block meta and data to executors
mridulm commented on a change in pull request #32811: URL: https://github.com/apache/spark/pull/32811#discussion_r651973359

## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java ##
@@ -294,18 +336,30 @@ public ShuffleMetrics() {
 private int index = 0;
 private final Function blockDataForIndexFn;
 private final int size;
+private boolean requestForMergedBlockChunks;
 ManagedBufferIterator(OpenBlocks msg) {
 String appId = msg.appId;
 String execId = msg.execId;
 String[] blockIds = msg.blockIds;
 String[] blockId0Parts = blockIds[0].split("_");
- if (blockId0Parts.length == 4 && blockId0Parts[0].equals("shuffle")) {
+ if (blockId0Parts.length == 4 && (blockId0Parts[0].equals(SHUFFLE_BLOCK_ID) ||
+blockId0Parts[0].equals(SHUFFLE_CHUNK_ID))) {
 final int shuffleId = Integer.parseInt(blockId0Parts[1]);
-final int[] mapIdAndReduceIds = shuffleMapIdAndReduceIds(blockIds, shuffleId);
-size = mapIdAndReduceIds.length;
-blockDataForIndexFn = index -> blockManager.getBlockData(appId, execId, shuffleId,
- mapIdAndReduceIds[index], mapIdAndReduceIds[index + 1]);
+requestForMergedBlockChunks = blockId0Parts[0].equals(SHUFFLE_CHUNK_ID);
+// For regular shuffle blocks, primaryId is mapId and secondaryIds are reduceIds.
+// For shuffle chunks, primaryIds is reduceId and secondaryIds are chunkIds.
+final int[] primaryIdAndSecondaryIds = shuffleMapIdAndReduceIds(blockIds, shuffleId);
+size = primaryIdAndSecondaryIds.length;
+blockDataForIndexFn = index -> {
+ if (requestForMergedBlockChunks) {
+return mergeManager.getMergedBlockData(msg.appId, shuffleId,
+ primaryIdAndSecondaryIds[index], primaryIdAndSecondaryIds[index + 1]);
+ } else {
+return blockManager.getBlockData(msg.appId, msg.execId, shuffleId,
+ primaryIdAndSecondaryIds[index], primaryIdAndSecondaryIds[index + 1]);
+ }
+};

Review comment:
nit: Wondering if this is cleaner if we simply split this out into its own else block for block chunk ?
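The nit suggests giving merged block chunks their own branch instead of checking `requestForMergedBlockChunks` inside a shared lambda. A simplified, hypothetical Java sketch of that split for a single block ID follows; the `"shuffleChunk"` prefix string, the `BlockIdRouterSketch` class, and its returned descriptions are assumptions for illustration, and the real handler works on a batch of IDs flattened into `primaryIdAndSecondaryIds`.

```java
/**
 * Hypothetical sketch (not Spark's ExternalBlockHandler) of routing a block ID through
 * separate branches for regular shuffle blocks and merged shuffle chunks.
 */
final class BlockIdRouterSketch {
  static final String SHUFFLE_BLOCK_ID = "shuffle";      // assumed: shuffle_<shuffleId>_<mapId>_<reduceId>
  static final String SHUFFLE_CHUNK_ID = "shuffleChunk"; // assumed: shuffleChunk_<shuffleId>_<reduceId>_<chunkId>

  /** Describes which store would serve the given block ID. */
  static String route(String blockId) {
    String[] parts = blockId.split("_");
    if (parts.length != 4) {
      throw new IllegalArgumentException("Unexpected block ID: " + blockId);
    }
    int shuffleId = Integer.parseInt(parts[1]);
    int primaryId = Integer.parseInt(parts[2]);
    int secondaryId = Integer.parseInt(parts[3]);
    if (parts[0].equals(SHUFFLE_BLOCK_ID)) {
      // Regular block: primaryId is the mapId, secondaryId is the reduceId.
      return "blockManager: shuffle=" + shuffleId + " map=" + primaryId + " reduce=" + secondaryId;
    } else if (parts[0].equals(SHUFFLE_CHUNK_ID)) {
      // Merged chunk: primaryId is the reduceId, secondaryId is the chunkId.
      return "mergeManager: shuffle=" + shuffleId + " reduce=" + primaryId + " chunk=" + secondaryId;
    } else {
      throw new IllegalArgumentException("Not a shuffle block or chunk: " + blockId);
    }
  }
}
```

With separate branches, each path states its own ID semantics (mapId/reduceId versus reduceId/chunkId) directly, rather than one lambda re-testing the block type on every index.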
## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java ## @@ -88,82 +94,125 @@ public OneForOneBlockFetcher( if (blockIds.length == 0) { throw new IllegalArgumentException("Zero-sized blockIds array"); } -if (!transportConf.useOldFetchProtocol() && isShuffleBlocks(blockIds)) { +if (!transportConf.useOldFetchProtocol() && areShuffleBlocksOrChunks(blockIds)) { this.blockIds = new String[blockIds.length]; - this.message = createFetchShuffleBlocksMsgAndBuildBlockIds(appId, execId, blockIds); + this.message = createFetchShuffleBlocksOrChunksMsg(appId, execId, blockIds); } else { this.blockIds = blockIds; this.message = new OpenBlocks(appId, execId, blockIds); } } - private boolean isShuffleBlocks(String[] blockIds) { -for (String blockId : blockIds) { - if (!blockId.startsWith("shuffle_")) { -return false; - } + /** + * Check if the array of block IDs are all shuffle block IDs. With push based shuffle, + * the shuffle block ID could be either unmerged shuffle block IDs or merged shuffle chunk + * IDs. For a given stream of shuffle blocks to be fetched in one request, they would be either + * all unmerged shuffle blocks or all merged shuffle chunks. + * @param blockIds block ID array + * @return whether the array contains only shuffle block IDs + */ + private boolean areShuffleBlocksOrChunks(String[] blockIds) { +if (Arrays.stream(blockIds).anyMatch(blockId -> !blockId.startsWith(SHUFFLE_BLOCK_PREFIX))) { + // It comes here because there is a blockId which doesn't have "shuffle_" prefix so we + // check if all the block ids are shuffle chunk Ids. + return Arrays.stream(blockIds).allMatch(blockId -> blockId.startsWith(SHUFFLE_CHUNK_PREFIX)); } return true; } + /** Creates either a {@link FetchShuffleBlocks} or {@link FetchShuffleBlockChunks} message. 
*/ + private AbstractFetchShuffleBlocks createFetchShuffleBlocksOrChunksMsg( + String appId, + String execId, + String[] blockIds) { +if (blockIds[0].startsWith(SHUFFLE_CHUNK_PREFIX)) { + return createFetchShuffleMsgAndBuildBlockIds(appId, execId, blockIds, true); +} else { + return createFetchShuffleMsgAndBuildBlockIds(appId, execId, blockIds, false); +} + } + /** - * Create FetchShuffleBlocks message and rebuild internal blockIds by + * Create FetchShuffleBlocks/FetchShuffleBlockChunks message and rebuild internal blockIds by * analyzing the pass in blockIds. */ - private FetchShuffleBlocks createFetchShuffleBlocksMsgAndBuildBlockIds( - String appId, String execId,
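The new `areShuffleBlocksOrChunks` check accepts a request only when every ID is an unmerged shuffle block or every ID is a merged chunk. The first ID then selects the message type. Below is a small self-contained sketch of that decision. It assumes the prefixes are "shuffle_" and "shuffleChunk_", because the constant values are not shown in the diff. It returns message names as strings in place of the real `FetchShuffleBlocks`, `FetchShuffleBlockChunks`, and `OpenBlocks` objects, and it omits the `useOldFetchProtocol()` check.

```java
import java.util.Arrays;

final class FetchKindSketch {
  // Assumed values for SHUFFLE_BLOCK_PREFIX and SHUFFLE_CHUNK_PREFIX.
  static final String SHUFFLE_BLOCK_PREFIX = "shuffle_";
  static final String SHUFFLE_CHUNK_PREFIX = "shuffleChunk_";

  // True if all ids are unmerged shuffle blocks, or all ids are merged shuffle chunks.
  static boolean areShuffleBlocksOrChunks(String[] blockIds) {
    if (Arrays.stream(blockIds).anyMatch(id -> !id.startsWith(SHUFFLE_BLOCK_PREFIX))) {
      // At least one id lacks the "shuffle_" prefix, so every id must be a chunk id.
      return Arrays.stream(blockIds).allMatch(id -> id.startsWith(SHUFFLE_CHUNK_PREFIX));
    }
    return true;
  }

  // Names the message the fetcher would build; the first id decides blocks vs. chunks.
  static String messageKind(String[] blockIds) {
    if (blockIds.length == 0) {
      throw new IllegalArgumentException("Zero-sized blockIds array");
    }
    if (!areShuffleBlocksOrChunks(blockIds)) {
      return "OpenBlocks";
    }
    return blockIds[0].startsWith(SHUFFLE_CHUNK_PREFIX)
        ? "FetchShuffleBlockChunks"
        : "FetchShuffleBlocks";
  }
}
```

Because "shuffleChunk_" does not start with "shuffle_", a mixed array fails both conditions and falls back to OpenBlocks in this sketch.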
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32049: [SPARK-34952][SQL] Aggregate (Min/Max/Count) push down for Parquet
AmplabJenkins removed a comment on pull request #32049: URL: https://github.com/apache/spark/pull/32049#issuecomment-862019804 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44362/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32914: [SPARK-35763][SS] Add a new copy method to StateStoreCustomMetric
AmplabJenkins removed a comment on pull request #32914: URL: https://github.com/apache/spark/pull/32914#issuecomment-862019805 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44359/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32801: [SPARK-12567][SQL] Add aes_encrypt and aes_decrypt builtin functions
AmplabJenkins removed a comment on pull request #32801: URL: https://github.com/apache/spark/pull/32801#issuecomment-862019802 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44360/
[GitHub] [spark] AmplabJenkins commented on pull request #32049: [SPARK-34952][SQL] Aggregate (Min/Max/Count) push down for Parquet
AmplabJenkins commented on pull request #32049: URL: https://github.com/apache/spark/pull/32049#issuecomment-862019804 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44362/
[GitHub] [spark] AmplabJenkins commented on pull request #32801: [SPARK-12567][SQL] Add aes_encrypt and aes_decrypt builtin functions
AmplabJenkins commented on pull request #32801: URL: https://github.com/apache/spark/pull/32801#issuecomment-862019802 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44360/
[GitHub] [spark] AmplabJenkins commented on pull request #32914: [SPARK-35763][SS] Add a new copy method to StateStoreCustomMetric
AmplabJenkins commented on pull request #32914: URL: https://github.com/apache/spark/pull/32914#issuecomment-862019805 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44359/
[GitHub] [spark] SparkQA commented on pull request #32049: [SPARK-34952][SQL] Aggregate (Min/Max/Count) push down for Parquet
SparkQA commented on pull request #32049: URL: https://github.com/apache/spark/pull/32049#issuecomment-862018031 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44362/
[GitHub] [spark] SparkQA commented on pull request #32914: [SPARK-35763][SS] Add a new copy method to StateStoreCustomMetric
SparkQA commented on pull request #32914: URL: https://github.com/apache/spark/pull/32914#issuecomment-862017695 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44361/
[GitHub] [spark] c21 commented on pull request #32911: [SPARK-35760][SQL] Fix the max rows check for broadcast exchange
c21 commented on pull request #32911: URL: https://github.com/apache/spark/pull/32911#issuecomment-862016454 Thank you all for review!
[GitHub] [spark] AngersZhuuuu commented on pull request #31992: [SPARK-34898][CORE] We should send SparkListenerExecutorMetricsUpdateEventLog of `driver` appropriately
AngersZh commented on pull request #31992: URL: https://github.com/apache/spark/pull/31992#issuecomment-862010798 > I don't know enough to have an opinion on this. I think the key questions are - what is the most consistent thing to do, and, are there any performance problems with adding this information to events? As Spark admins, we always want to build a system that tracks each app's running status in our cluster and lets users change their memory configuration if it isn't set reasonably. So we need to know the peak memory usage. Although we can get this information from the metrics system, we would then need to integrate the REST API's metrics data with the metrics system's information. This PR lets us get the driver's memory usage from the history server's REST API.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31992: [SPARK-34898][CORE] We should send SparkListenerExecutorMetricsUpdateEventLog of `driver` appropriately
AngersZh commented on a change in pull request #31992: URL: https://github.com/apache/spark/pull/31992#discussion_r652323059 ## File path: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ## @@ -249,6 +249,9 @@ private[spark] class EventLoggingListener( } override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = { +if (event.execId == SparkContext.DRIVER_IDENTIFIER) { + logEvent(event) +} Review comment: > Currently, we have a single event for both driver and executor metrics update - differentiated by exec id. > I dont have strong opinions on this, but if we have a flag (`shouldLogStageExecutorMetrics`) controlling whether metrics are to be updated, we should consistently apply it IMO. @mridulm I followed this comment; how about the current version?
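One way to read the consistency suggestion quoted above is to log the driver's metrics update only when the `shouldLogStageExecutorMetrics` flag is on. The following is a hypothetical sketch of that gating in Java. The class name, the `logged` list standing in for the event log, and passing the flag through the constructor are illustrative assumptions. The driver ID value "driver" matches `SparkContext.DRIVER_IDENTIFIER`.

```java
import java.util.ArrayList;
import java.util.List;

final class MetricsUpdateLoggerSketch {
  // Same value as SparkContext.DRIVER_IDENTIFIER.
  static final String DRIVER_IDENTIFIER = "driver";

  // Hypothetical: mirrors the listener's shouldLogStageExecutorMetrics flag.
  private final boolean shouldLogStageExecutorMetrics;
  // Stands in for the event log; records the execId of each logged update.
  final List<String> logged = new ArrayList<>();

  MetricsUpdateLoggerSketch(boolean shouldLogStageExecutorMetrics) {
    this.shouldLogStageExecutorMetrics = shouldLogStageExecutorMetrics;
  }

  void onExecutorMetricsUpdate(String execId) {
    // Log the driver's update only when the same flag that governs
    // stage executor metrics is enabled; executor updates are not logged here.
    if (shouldLogStageExecutorMetrics && DRIVER_IDENTIFIER.equals(execId)) {
      logged.add(execId);
    }
  }
}
```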
[GitHub] [spark] SparkQA commented on pull request #32801: [SPARK-12567][SQL] Add aes_encrypt and aes_decrypt builtin functions
SparkQA commented on pull request #32801: URL: https://github.com/apache/spark/pull/32801#issuecomment-862007710 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44360/
[GitHub] [spark] SparkQA commented on pull request #32914: [SPARK-35763][SS] Add a new copy method to StateStoreCustomMetric
SparkQA commented on pull request #32914: URL: https://github.com/apache/spark/pull/32914#issuecomment-862007179 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44359/
[GitHub] [spark] SparkQA commented on pull request #32921: [WIP][SPARK-35779][SQL] Dynamic filtering for Data Source V2
SparkQA commented on pull request #32921: URL: https://github.com/apache/spark/pull/32921#issuecomment-862006513 **[Test build #139835 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139835/testReport)** for PR 32921 at commit [`202be14`](https://github.com/apache/spark/commit/202be14f09cbacd94de9d0bc3b518c87292f3878).
[GitHub] [spark] yaooqinn commented on pull request #32610: [SPARK-35460][K8S] verify the content of`spark.kubernetes.executor.podNamePrefix` before post it to k8s api-server
yaooqinn commented on pull request #32610: URL: https://github.com/apache/spark/pull/32610#issuecomment-862006483 kindly ping @dongjoon-hyun
[GitHub] [spark] aokolnychyi commented on a change in pull request #32921: [WIP][SPARK-35779][SQL] Dynamic filtering for Data Source V2
aokolnychyi commented on a change in pull request #32921: URL: https://github.com/apache/spark/pull/32921#discussion_r652208722 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/subquery.scala ## @@ -227,3 +228,14 @@ object ReuseSubquery extends Rule[SparkPlan] { } } } + +object PrepareScans extends Rule[SparkPlan] { + def apply(plan: SparkPlan): SparkPlan = { +val scans = plan.collect { + case scan: BatchScanExec => scan +} +scans.foreach(_.prepare()) Review comment: I mention in the doc why I am using `prepare` but we can make this more specific to dynamic filters if needed.
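The `PrepareScans` rule in the diff collects every `BatchScanExec` in the plan and calls `prepare()` on each one. The sketch below reproduces that traversal on a hypothetical two-class plan tree (`PlanNode`, `ScanNode`) rather than on Spark's `SparkPlan`. The diff is cut off after `scans.foreach(_.prepare())`, so returning the plan unchanged is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal plan tree, not Spark's SparkPlan API.
class PlanNode {
  final List<PlanNode> children = new ArrayList<>();

  PlanNode add(PlanNode child) {
    children.add(child);
    return this;
  }
}

// Stand-in for BatchScanExec; counts how often prepare() is called.
final class ScanNode extends PlanNode {
  int prepareCalls = 0;

  void prepare() {
    prepareCalls++;
  }
}

final class PrepareScansSketch {
  // Pre-order walk collecting scan nodes, analogous to
  // plan.collect { case scan: BatchScanExec => scan }.
  static List<ScanNode> collectScans(PlanNode plan) {
    List<ScanNode> scans = new ArrayList<>();
    collect(plan, scans);
    return scans;
  }

  private static void collect(PlanNode node, List<ScanNode> scans) {
    if (node instanceof ScanNode) {
      scans.add((ScanNode) node);
    }
    for (PlanNode child : node.children) {
      collect(child, scans);
    }
  }

  // Prepares every scan and (by assumption) returns the plan unchanged.
  static PlanNode apply(PlanNode plan) {
    collectScans(plan).forEach(ScanNode::prepare);
    return plan;
  }
}
```

The rule is used only for its side effect on the scan nodes. This matches the reviewer's point that it could later be narrowed to dynamic filtering specifically.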
[GitHub] [spark] SparkQA commented on pull request #32921: [WIP][SPARK-35779][SQL] Dynamic filtering for Data Source V2
SparkQA commented on pull request #32921: URL: https://github.com/apache/spark/pull/32921#issuecomment-862005335 **[Test build #139834 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139834/testReport)** for PR 32921 at commit [`04ae0e3`](https://github.com/apache/spark/commit/04ae0e363e98cc0a8af1100ef11f08d9f2e47d1a).
[GitHub] [spark] SparkQA commented on pull request #32914: [SPARK-35763][SS] Add a new copy method to StateStoreCustomMetric
SparkQA commented on pull request #32914: URL: https://github.com/apache/spark/pull/32914#issuecomment-862002836 **[Test build #139833 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139833/testReport)** for PR 32914 at commit [`5dddc66`](https://github.com/apache/spark/commit/5dddc6644c9e76f78ebd0d44a4c23ee80b0fd55c).
[GitHub] [spark] SparkQA commented on pull request #32049: [SPARK-34952][SQL] Aggregate (Min/Max/Count) push down for Parquet
SparkQA commented on pull request #32049: URL: https://github.com/apache/spark/pull/32049#issuecomment-862001794 **[Test build #139832 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139832/testReport)** for PR 32049 at commit [`a5833ef`](https://github.com/apache/spark/commit/a5833ef7f551980ef48229932d9427a9e00af444).
[GitHub] [spark] HeartSaVioR commented on pull request #32914: [SPARK-35763][SS] Add a new copy method to StateStoreCustomMetric
HeartSaVioR commented on pull request #32914: URL: https://github.com/apache/spark/pull/32914#issuecomment-862001604 retest this, please
[GitHub] [spark] HeartSaVioR commented on pull request #32914: [SPARK-35763][SS] Add a new copy method to StateStoreCustomMetric
HeartSaVioR commented on pull request #32914: URL: https://github.com/apache/spark/pull/32914#issuecomment-862001559 > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139830 StackOverflowError happened while compiling... I'll retrigger again to see whether it's intermittent or not.
[GitHub] [spark] aokolnychyi commented on a change in pull request #32921: [WIP][SPARK-35779][SQL] Dynamic filtering for Data Source V2
aokolnychyi commented on a change in pull request #32921: URL: https://github.com/apache/spark/pull/32921#discussion_r652315411 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala ## @@ -96,6 +96,7 @@ case class AdaptiveSparkPlanExec( @transient private val queryStageOptimizerRules: Seq[Rule[SparkPlan]] = Seq( PlanAdaptiveDynamicPruningFilters(this), ReuseAdaptiveSubquery(context.subqueryCache), +PrepareScans, Review comment: @sunchao, per our design doc discussion, removing this explicit call causes test failures. I'll check what is going on tomorrow.
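The hunk above places `PrepareScans` after `ReuseAdaptiveSubquery` in `queryStageOptimizerRules`. Assume the rules are applied one after another, with each rule receiving the previous rule's output. Under that assumption, the ordering behaves like the left fold sketched here. `RuleSequenceSketch` and the use of strings as "plans" are illustrative only.

```java
import java.util.List;
import java.util.function.UnaryOperator;

final class RuleSequenceSketch {
  // Feeds each rule the output of the previous one, in list order,
  // like a left fold over the rule sequence.
  static <T> T applyInOrder(T plan, List<UnaryOperator<T>> rules) {
    T latest = plan;
    for (UnaryOperator<T> rule : rules) {
      latest = rule.apply(latest);
    }
    return latest;
  }
}
```

In this model, a rule only sees the changes made by the rules listed before it.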
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32914: [SPARK-35763][SS] Add a new copy method to StateStoreCustomMetric
AmplabJenkins removed a comment on pull request #32914: URL: https://github.com/apache/spark/pull/32914#issuecomment-861999568 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44358/
[GitHub] [spark] asfgit closed pull request #32754: [SPARK-35613][CORE][SQL] Cache commonly occurring strings in SQLMetrics, JSONProtocol and AccumulatorV2 classes
asfgit closed pull request #32754: URL: https://github.com/apache/spark/pull/32754
[GitHub] [spark] AmplabJenkins commented on pull request #32914: [SPARK-35763][SS] Add a new copy method to StateStoreCustomMetric
AmplabJenkins commented on pull request #32914: URL: https://github.com/apache/spark/pull/32914#issuecomment-861999568 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44358/
[GitHub] [spark] mridulm commented on pull request #32754: [SPARK-35613][CORE][SQL] Cache commonly occurring strings in SQLMetrics, JSONProtocol and AccumulatorV2 classes
mridulm commented on pull request #32754: URL: https://github.com/apache/spark/pull/32754#issuecomment-861999217 Merging to master
[GitHub] [spark] SparkQA commented on pull request #32801: [SPARK-12567][SQL] Add aes_encrypt and aes_decrypt builtin functions
SparkQA commented on pull request #32801: URL: https://github.com/apache/spark/pull/32801#issuecomment-861996098 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44360/