[GitHub] [spark] HyukjinKwon commented on pull request #30478: [SPARK-33525][SQL] Update hive-service-rpc to 3.1.2
HyukjinKwon commented on pull request #30478: URL: https://github.com/apache/spark/pull/30478#issuecomment-734052269 Nice! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] gatorsmile commented on pull request #30494: [SPARK-33551][SQL] Do not use custom shuffle reader for repartition
gatorsmile commented on pull request #30494: URL: https://github.com/apache/spark/pull/30494#issuecomment-734053423 Thanks! Merged to master
[GitHub] [spark] gatorsmile closed pull request #30494: [SPARK-33551][SQL] Do not use custom shuffle reader for repartition
gatorsmile closed pull request #30494: URL: https://github.com/apache/spark/pull/30494
[GitHub] [spark] SparkQA removed a comment on pull request #30475: [SPARK-33522][SQL] Improve exception messages while handling UnresolvedTableOrView
SparkQA removed a comment on pull request #30475: URL: https://github.com/apache/spark/pull/30475#issuecomment-733922687 **[Test build #131808 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131808/testReport)** for PR 30475 at commit [`147c654`](https://github.com/apache/spark/commit/147c654408749db637825462f7324ce4629f93b6).
[GitHub] [spark] SparkQA commented on pull request #30312: [SPARK-32917][SHUFFLE][CORE] Adds support for executors to push shuffle blocks after successful map task completion
SparkQA commented on pull request #30312: URL: https://github.com/apache/spark/pull/30312#issuecomment-733968379 **[Test build #131809 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131809/testReport)** for PR 30312 at commit [`40fbafc`](https://github.com/apache/spark/commit/40fbafc0c1378842fb370a7b49ec65de48d0222f).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #30475: [SPARK-33522][SQL] Improve exception messages while handling UnresolvedTableOrView
AmplabJenkins commented on pull request #30475: URL: https://github.com/apache/spark/pull/30475#issuecomment-733972117
[GitHub] [spark] SparkQA removed a comment on pull request #30506: [SPARK-33565][BUILD][PYTHON] remove python3.8 and fix breakage
SparkQA removed a comment on pull request #30506: URL: https://github.com/apache/spark/pull/30506#issuecomment-733966884 **[Test build #131812 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131812/testReport)** for PR 30506 at commit [`986feb6`](https://github.com/apache/spark/commit/986feb682396c2a44e8846f4099b5ec396afbcf8).
[GitHub] [spark] rdblue commented on pull request #30421: [SPARK-33474][SQL] Support TypeConstructed partition spec value
rdblue commented on pull request #30421: URL: https://github.com/apache/spark/pull/30421#issuecomment-733979208 In general, I support the idea of moving away from tracking partition values using String. I'm not convinced that using a Literal would work much more easily, because literal values would need to be cast to the correct type when constructing an `InternalRow` as well. But the idea seems like it's going in the right direction.
[GitHub] [spark] dongjoon-hyun opened a new pull request #30508: Revert "[SPARK-33212][BUILD] Move to shaded clients for Hadoop 3.x profile"
dongjoon-hyun opened a new pull request #30508: URL: https://github.com/apache/spark/pull/30508

### What changes were proposed in this pull request?
This reverts commit cb3fa6c9368e64184a5f7b19688181d11de9511c.

### Why are the changes needed?
According to [HADOOP-16080](https://issues.apache.org/jira/browse/HADOOP-16080), since Apache Hadoop 3.1.1, `hadoop-aws` doesn't work with `hadoop-client-api`.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass the CI.
[GitHub] [spark] SparkQA commented on pull request #30412: [SPARK-33480][SQL] Support char/varchar type
SparkQA commented on pull request #30412: URL: https://github.com/apache/spark/pull/30412#issuecomment-733985079 **[Test build #131798 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131798/testReport)** for PR 30412 at commit [`8f05eee`](https://github.com/apache/spark/commit/8f05eeeb36b3953a1f0500c7eb0bd664fa4cf70b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] dongjoon-hyun commented on pull request #30508: Revert "[SPARK-33212][BUILD] Move to shaded clients for Hadoop 3.x profile"
dongjoon-hyun commented on pull request #30508: URL: https://github.com/apache/spark/pull/30508#issuecomment-733985043 cc @sunchao
[GitHub] [spark] AmplabJenkins commented on pull request #30412: [SPARK-33480][SQL] Support char/varchar type
AmplabJenkins commented on pull request #30412: URL: https://github.com/apache/spark/pull/30412#issuecomment-733985770
[GitHub] [spark] dongjoon-hyun commented on pull request #30509: [SPARK-33565][PYTHON][BUILD][3.0] Remove py38 spark3
dongjoon-hyun commented on pull request #30509: URL: https://github.com/apache/spark/pull/30509#issuecomment-733987709 Got it. No problem~
[GitHub] [spark] SparkQA commented on pull request #30403: [SPARK-33448][SQL] Support CACHE/UNCACHE TABLE commands for v2 tables
SparkQA commented on pull request #30403: URL: https://github.com/apache/spark/pull/30403#issuecomment-734000115 **[Test build #131818 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131818/testReport)** for PR 30403 at commit [`d0f49ef`](https://github.com/apache/spark/commit/d0f49eff7db0e5775e6ed769fb23e6a1f7cf203a).
[GitHub] [spark] sunchao commented on pull request #30508: Revert "[SPARK-33212][BUILD] Move to shaded clients for Hadoop 3.x profile"
sunchao commented on pull request #30508: URL: https://github.com/apache/spark/pull/30508#issuecomment-734011469 Yes, I'm fine with reverting this first while we search for other solutions. Let's hope we can still ship this in the Spark 3.1 release.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #30131: [SPARK-33220][CORE]Use `scheduleWithFixedDelay` to avoid repeated unnecessary scheduling for a short time
AngersZhuuuu commented on a change in pull request #30131: URL: https://github.com/apache/spark/pull/30131#discussion_r530711869

## File path: core/src/main/scala/org/apache/spark/Heartbeater.scala

@@ -45,7 +45,8 @@ private[spark] class Heartbeater(
     val heartbeatTask = new Runnable() {
       override def run(): Unit = Utils.logUncaughtExceptions(reportHeartbeat())
     }
-    heartbeater.scheduleAtFixedRate(heartbeatTask, initialDelay, intervalMs, TimeUnit.MILLISECONDS)
+    heartbeater.scheduleWithFixedDelay(
+      heartbeatTask, initialDelay, intervalMs, TimeUnit.MILLISECONDS)

Review comment: Done

## File path: core/src/main/scala/org/apache/spark/executor/ExecutorMetricsPoller.scala

@@ -99,7 +99,7 @@ private[spark] class ExecutorMetricsPoller(
   def start(): Unit = {
     poller.foreach { exec =>
       val pollingTask: Runnable = () => Utils.logUncaughtExceptions(poll())
-      exec.scheduleAtFixedRate(pollingTask, 0L, pollingInterval, TimeUnit.MILLISECONDS)
+      exec.scheduleWithFixedDelay(pollingTask, 0L, pollingInterval, TimeUnit.MILLISECONDS)
     }
   }

Review comment: revert this
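The behavioral difference this PR relies on: `scheduleAtFixedRate` targets start times on a fixed grid, so a run that overruns its period is immediately followed by the next one, while `scheduleWithFixedDelay` always waits a full delay after each completion. A minimal simulation of the two policies (a hypothetical helper in plain Python, not Spark code):

```python
def start_times(task_duration, period, runs, fixed_delay):
    """Simulate the start times produced by the two scheduling policies."""
    times, t = [], 0.0
    for _ in range(runs):
        times.append(t)
        end = t + task_duration
        if fixed_delay:
            # scheduleWithFixedDelay: wait a full period after the task finishes
            t = end + period
        else:
            # scheduleAtFixedRate: aim for the next grid slot, but a run never
            # starts before the previous one ends (no concurrent executions)
            t = max(t + period, end)
    return times

# A 3s task on a 2s schedule: fixed-rate runs back-to-back to catch up,
# fixed-delay keeps a full 2s gap between runs.
print(start_times(3, 2, 4, fixed_delay=False))  # [0.0, 3.0, 6.0, 9.0]
print(start_times(3, 2, 4, fixed_delay=True))   # [0.0, 5.0, 10.0, 15.0]
```

This is why switching to `scheduleWithFixedDelay` avoids the burst of repeated, closely spaced runs after a slow iteration.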
[GitHub] [spark] dongjoon-hyun commented on pull request #30508: Revert "[SPARK-33212][BUILD] Move to shaded clients for Hadoop 3.x profile"
dongjoon-hyun commented on pull request #30508: URL: https://github.com/apache/spark/pull/30508#issuecomment-734011862 Thank you, @HyukjinKwon and @sunchao. This is still being tested to check the feasibility of the revert~ This PR will wait until next Monday. :) BTW, I'll update the PR title and description.
[GitHub] [spark] HyukjinKwon opened a new pull request #30510: [SPARK-33565][INFRA][FOLLOW-UP] Keep the test coverage with Python 3.8 in GitHub Actions
HyukjinKwon opened a new pull request #30510: URL: https://github.com/apache/spark/pull/30510

### What changes were proposed in this pull request?
This PR proposes to keep the test coverage with Python 3.8 in GitHub Actions. It is not tested for now in Jenkins due to an env issue.

### Why are the changes needed?
To keep the test coverage with Python 3.8 in GitHub Actions.

### Does this PR introduce _any_ user-facing change?
No, dev-only.

### How was this patch tested?
GitHub Actions in this build will test.
[GitHub] [spark] SparkQA commented on pull request #30131: [SPARK-33220][CORE]Use `scheduleWithFixedDelay` to avoid repeated unnecessary scheduling for a short time
SparkQA commented on pull request #30131: URL: https://github.com/apache/spark/pull/30131#issuecomment-734016301 **[Test build #131820 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131820/testReport)** for PR 30131 at commit [`5643ec2`](https://github.com/apache/spark/commit/5643ec267a339877c1004aae9e1ed8ae1f8f6cca).
[GitHub] [spark] HyukjinKwon commented on a change in pull request #30501: [SPARK-33563][PYTHON][R][SQL] Expose inverse hyperbolic trig functions in PySpark and SparkR
HyukjinKwon commented on a change in pull request #30501: URL: https://github.com/apache/spark/pull/30501#discussion_r530719331

## File path: R/pkg/R/functions.R

@@ -455,6 +455,19 @@ setMethod("acos",
             column(jc)
           })
+#' @details

Review comment: @zero323, should we add them into `generic.R` too?
[GitHub] [spark] HyukjinKwon commented on a change in pull request #30501: [SPARK-33563][PYTHON][R][SQL] Expose inverse hyperbolic trig functions in PySpark and SparkR
HyukjinKwon commented on a change in pull request #30501: URL: https://github.com/apache/spark/pull/30501#discussion_r530719552

## File path: python/pyspark/sql/functions.py

@@ -220,6 +220,19 @@ def acos(col):
     return _invoke_function_over_column("acos", col)

+def acosh(col):

Review comment: Let's also update `spark/python/docs/source/reference/pyspark.sql.rst`
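For reference, the inverse hyperbolic functions being exposed here follow the standard math definitions, e.g. acosh(x) = ln(x + sqrt(x^2 - 1)). A quick sanity check of those identities with Python's `math` module (plain Python, not the PySpark API under review):

```python
import math

# Each inverse undoes its hyperbolic counterpart on the appropriate domain:
# acosh on [1, inf), asinh on all reals, atanh on (-1, 1).
x = 2.0
assert math.isclose(math.acosh(math.cosh(x)), x)
assert math.isclose(math.asinh(math.sinh(x)), x)
assert math.isclose(math.atanh(math.tanh(0.5)), 0.5)

# Closed form for acosh: ln(y + sqrt(y^2 - 1))
y = 3.0
print(math.isclose(math.acosh(y), math.log(y + math.sqrt(y * y - 1))))  # True
```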
[GitHub] [spark] HyukjinKwon commented on pull request #30501: [SPARK-33563][PYTHON][R][SQL] Expose inverse hyperbolic trig functions in PySpark and SparkR
HyukjinKwon commented on pull request #30501: URL: https://github.com/apache/spark/pull/30501#issuecomment-734018562 Looks fine otherwise.
[GitHub] [spark] rdblue commented on pull request #30452: [SPARK-33509][SQL] List partition by names from a V2 table which supports partition management
rdblue commented on pull request #30452: URL: https://github.com/apache/spark/pull/30452#issuecomment-734021147

> The partition can be a transform like year(ts_col), shall we just use the partition index in the API instead?

If I remember correctly, there should be a schema exposed by the table that describes these. We should get the name from that schema.
[GitHub] [spark] stczwd commented on pull request #30452: [SPARK-33509][SQL] List partition by names from a V2 table which supports partition management
stczwd commented on pull request #30452: URL: https://github.com/apache/spark/pull/30452#issuecomment-734020787

> Since `SupportsPartitionManagement` already has the API `partitionSchema`, which means that the implementations will pick a name for partition transforms, I think it's OK to use `String[]` in the `listPartitionIdentifiers` API parameter.

Yeah, I prefer extending `listPartitionIdentifiers` instead of adding a new API `listPartitionsByNames`.
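The point being debated above: because `partitionSchema` already fixes a name for every partition transform (including derived ones like `year(ts_col)`), a partial filter can be passed by name and resolved to positions internally, so no separate index-based API is needed. A toy sketch of that name resolution (hypothetical names and data, not the actual DSv2 API):

```python
# Names the implementation picked for its partition transforms, per partitionSchema.
partition_schema = ["ts_year", "region"]

# Existing partition identifiers as tuples aligned with partition_schema.
partitions = [("2020", "us"), ("2020", "eu"), ("2021", "us")]

def list_partition_identifiers(names, values):
    """Return partitions matching a partial spec given as parallel name/value lists."""
    idx = [partition_schema.index(n) for n in names]  # resolve names to positions
    return [p for p in partitions
            if all(p[i] == v for i, v in zip(idx, values))]

print(list_partition_identifiers(["region"], ["us"]))  # [('2020', 'us'), ('2021', 'us')]
print(list_partition_identifiers([], []))              # all three partitions
```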
[GitHub] [spark] SparkQA commented on pull request #30403: [SPARK-33448][SQL] Support CACHE/UNCACHE TABLE commands for v2 tables
SparkQA commented on pull request #30403: URL: https://github.com/apache/spark/pull/30403#issuecomment-734025753 **[Test build #131818 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131818/testReport)** for PR 30403 at commit [`d0f49ef`](https://github.com/apache/spark/commit/d0f49eff7db0e5775e6ed769fb23e6a1f7cf203a).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #30403: [SPARK-33448][SQL] Support CACHE/UNCACHE TABLE commands for v2 tables
SparkQA removed a comment on pull request #30403: URL: https://github.com/apache/spark/pull/30403#issuecomment-734000115 **[Test build #131818 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131818/testReport)** for PR 30403 at commit [`d0f49ef`](https://github.com/apache/spark/commit/d0f49eff7db0e5775e6ed769fb23e6a1f7cf203a).
[GitHub] [spark] JQ-Cao commented on pull request #30495: [SPARK-33548][WEBUI] display the jvm peak memory usage on the executor ui
JQ-Cao commented on pull request #30495: URL: https://github.com/apache/spark/pull/30495#issuecomment-734033935

> @JQ-Cao These metrics are already on the executor page. They will show up after selecting the additional metrics checkbox:
> ![image](https://user-images.githubusercontent.com/1097932/100242069-3bc5ab80-2ee9-11eb-8c7d-96c221398fee.png)

Yes, I found it, thanks.
[GitHub] [spark] JQ-Cao closed pull request #30495: [SPARK-33548][WEBUI] display the jvm peak memory usage on the executor ui
JQ-Cao closed pull request #30495: URL: https://github.com/apache/spark/pull/30495
[GitHub] [spark] HyukjinKwon edited a comment on pull request #30486: [SPARK-33530][CORE] Support --archives and spark.archives option natively
HyukjinKwon edited a comment on pull request #30486: URL: https://github.com/apache/spark/pull/30486#issuecomment-734037334 It will be exactly the same as `spark.files` and `spark.yarn.dist.files`. To be honest, I am not exactly sure how they will conflict with each other, but both work together as far as I know.
[GitHub] [spark] HyukjinKwon commented on pull request #30486: [SPARK-33530][CORE] Support --archives and spark.archives option natively
HyukjinKwon commented on pull request #30486: URL: https://github.com/apache/spark/pull/30486#issuecomment-734037334 It will be exactly the same as `spark.files` and `spark.yarn.dist.files`. To be honest, I am not exactly sure how they will conflict with each other, but both work together as far as I know.
[GitHub] [spark] SparkQA commented on pull request #29490: [SPARK-32668][SQL] HiveGenericUDTF initialize UDTF should use StructObjectInspector method
SparkQA commented on pull request #29490: URL: https://github.com/apache/spark/pull/29490#issuecomment-734041140 **[Test build #131822 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131822/testReport)** for PR 29490 at commit [`c014b75`](https://github.com/apache/spark/commit/c014b759228bf9a0bb6edd2d01f4b53b3c88a92d).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #29490: [SPARK-32668][SQL] HiveGenericUDTF initialize UDTF should use StructObjectInspector method
SparkQA removed a comment on pull request #29490: URL: https://github.com/apache/spark/pull/29490#issuecomment-734020610 **[Test build #131822 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131822/testReport)** for PR 29490 at commit [`c014b75`](https://github.com/apache/spark/commit/c014b759228bf9a0bb6edd2d01f4b53b3c88a92d).
[GitHub] [spark] cloud-fan commented on pull request #30421: [SPARK-33474][SQL] Support TypeConstructed partition spec value
cloud-fan commented on pull request #30421: URL: https://github.com/apache/spark/pull/30421#issuecomment-73404 Let's do it step by step, and support typed literals first. We can figure out how to eliminate the string <-> actual value roundtrip in v2 commands later. Let's make sure this feature works correctly:
1. All the literals are supported. Non-literals are forbidden, e.g. `part_col=array(1)` does not create a string value "array(1)".
2. Null literal is supported. We should use null instead of "null" to represent it.
3. If the literal data type doesn't match the partition column data type, we should do type checking and casting like normal table insertion.
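A toy illustration of rules 2 and 3 above: the partition spec carries typed values (including a real null, not the string "null"), and a mismatched literal is either cast to the partition column's type or rejected, as in normal table insertion. This is a hypothetical helper, not Spark's implementation:

```python
from datetime import date

# Hypothetical partition columns and their target Python types.
partition_columns = {"p_id": int, "p_date": date}

def check_and_cast(spec):
    """Validate a partition spec, casting values to the column types or failing."""
    out = {}
    for col, value in spec.items():
        target = partition_columns[col]
        if value is None:
            out[col] = None  # rule 2: a real null, never the string "null"
        elif isinstance(value, target):
            out[col] = value
        elif target is date and isinstance(value, str):
            out[col] = date.fromisoformat(value)  # e.g. '2020-01-01' -> date
        elif target is int and isinstance(value, str) and value.lstrip("-").isdigit():
            out[col] = int(value)
        else:
            # rule 3: no silent stringification; incompatible literals fail
            raise TypeError(f"cannot cast {value!r} to {target.__name__} for {col}")
    return out

print(check_and_cast({"p_id": "42", "p_date": None}))
# {'p_id': 42, 'p_date': None}
```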
[GitHub] [spark] SparkQA commented on pull request #30131: [SPARK-33220][CORE]Use `scheduleWithFixedDelay` to avoid repeated unnecessary scheduling for a short time
SparkQA commented on pull request #30131: URL: https://github.com/apache/spark/pull/30131#issuecomment-734051712 **[Test build #131820 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131820/testReport)** for PR 30131 at commit [`5643ec2`](https://github.com/apache/spark/commit/5643ec267a339877c1004aae9e1ed8ae1f8f6cca).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #30510: [SPARK-33565][INFRA][FOLLOW-UP] Keep the test coverage with Python 3.8 in GitHub Actions
SparkQA commented on pull request #30510: URL: https://github.com/apache/spark/pull/30510#issuecomment-734052772 **[Test build #131821 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131821/testReport)** for PR 30510 at commit [`31755d9`](https://github.com/apache/spark/commit/31755d91b3a8dd7e975a03b5fe01972b5d11fcb8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] HyukjinKwon commented on pull request #30511: [SPARK-33565][INFRA][FOLLOW-UP][3.0] Keep the test coverage with Python 3.8 in GitHub Actions
HyukjinKwon commented on pull request #30511: URL: https://github.com/apache/spark/pull/30511#issuecomment-734052805 Merged to branch-3.0.
[GitHub] [spark] HyukjinKwon closed pull request #30511: [SPARK-33565][INFRA][FOLLOW-UP][3.0] Keep the test coverage with Python 3.8 in GitHub Actions
HyukjinKwon closed pull request #30511: URL: https://github.com/apache/spark/pull/30511
[GitHub] [spark] SparkQA removed a comment on pull request #30510: [SPARK-33565][INFRA][FOLLOW-UP] Keep the test coverage with Python 3.8 in GitHub Actions
SparkQA removed a comment on pull request #30510: URL: https://github.com/apache/spark/pull/30510#issuecomment-734017295 **[Test build #131821 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131821/testReport)** for PR 30510 at commit [`31755d9`](https://github.com/apache/spark/commit/31755d91b3a8dd7e975a03b5fe01972b5d11fcb8).
[GitHub] [spark] cloud-fan commented on a change in pull request #30403: [SPARK-33448][SQL] Support CACHE/UNCACHE TABLE commands for v2 tables
cloud-fan commented on a change in pull request #30403: URL: https://github.com/apache/spark/pull/30403#discussion_r530756887

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DropTableExec.scala

## @@ -27,7 +26,6 @@
 import org.apache.spark.sql.connector.catalog.{Identifier, Table, TableCatalog}
 * Physical plan node for dropping a table.
 */
 case class DropTableExec(

Review comment: Can we avoid changing it? It's also being changed in https://github.com/apache/spark/pull/30491
[GitHub] [spark] SparkQA removed a comment on pull request #30398: [SPARK-33452][SQL] Support v2 SHOW PARTITIONS
SparkQA removed a comment on pull request #30398: URL: https://github.com/apache/spark/pull/30398#issuecomment-733763807 **[Test build #131783 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131783/testReport)** for PR 30398 at commit [`a4acf40`](https://github.com/apache/spark/commit/a4acf4060c02cfc865a8e560d8cef898353085ed).
[GitHub] [spark] SparkQA removed a comment on pull request #30475: [SPARK-33522][SQL] Improve exception messages while handling UnresolvedTableOrView
SparkQA removed a comment on pull request #30475: URL: https://github.com/apache/spark/pull/30475#issuecomment-733899176 **[Test build #131802 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131802/testReport)** for PR 30475 at commit [`68ee277`](https://github.com/apache/spark/commit/68ee277cbde9ecb466b1480af676a2f831e11236).
[GitHub] [spark] SparkQA removed a comment on pull request #29893: [SPARK-32976][SQL]Support column list in INSERT statement
SparkQA removed a comment on pull request #29893: URL: https://github.com/apache/spark/pull/29893#issuecomment-733868465 **[Test build #131796 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131796/testReport)** for PR 29893 at commit [`86d0032`](https://github.com/apache/spark/commit/86d0032ad00f7a1f10e1963070e39a24e640998d).
[GitHub] [spark] SparkQA removed a comment on pull request #30403: [SPARK-33448][SQL] Support CACHE/UNCACHE TABLE commands for v2 tables
SparkQA removed a comment on pull request #30403: URL: https://github.com/apache/spark/pull/30403#issuecomment-733873132 **[Test build #131799 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131799/testReport)** for PR 30403 at commit [`3c4a0cf`](https://github.com/apache/spark/commit/3c4a0cf4394823800e50d5dbeb0ebef2a1c09e49).
[GitHub] [spark] cloud-fan commented on a change in pull request #30504: [SPARK-33544][SQL] Optimizer should not insert filter when explode with CreateArray/CreateMap
cloud-fan commented on a change in pull request #30504: URL: https://github.com/apache/spark/pull/30504#discussion_r530602009

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala

## @@ -873,24 +873,30 @@ object InferFiltersFromGenerate extends Rule[LogicalPlan] {
       if !e.deterministic || e.children.forall(_.foldable) => generate
     case generate @ Generate(g, _, false, _, _, _) if canInferFilters(g) =>
-      // Exclude child's constraints to guarantee idempotency
-      val inferredFilters = ExpressionSet(
-        Seq(
-          GreaterThan(Size(g.children.head), Literal(0)),
-          IsNotNull(g.children.head)
-        )
-      ) -- generate.child.constraints
-
-      if (inferredFilters.nonEmpty) {
-        generate.copy(child = Filter(inferredFilters.reduce(And), generate.child))
-      } else {
-        generate
+      g.children.head match {
+        case _: CreateNonEmptyNonNullCollection =>

Review comment: In general, optimizer rules should be orthogonal. For this case, I think it's better to add a new optimizer rule that removes the `IsNotNull` and size check predicates above `CreateArray` and `CreateMap`.
[GitHub] [spark] cloud-fan commented on a change in pull request #30504: [SPARK-33544][SQL] Optimizer should not insert filter when explode with CreateArray/CreateMap
cloud-fan commented on a change in pull request #30504: URL: https://github.com/apache/spark/pull/30504#discussion_r530603409

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala

## @@ -873,24 +873,30 @@ object InferFiltersFromGenerate extends Rule[LogicalPlan] {
       if !e.deterministic || e.children.forall(_.foldable) => generate
     case generate @ Generate(g, _, false, _, _, _) if canInferFilters(g) =>
-      // Exclude child's constraints to guarantee idempotency
-      val inferredFilters = ExpressionSet(
-        Seq(
-          GreaterThan(Size(g.children.head), Literal(0)),
-          IsNotNull(g.children.head)
-        )
-      ) -- generate.child.constraints
-
-      if (inferredFilters.nonEmpty) {
-        generate.copy(child = Filter(inferredFilters.reduce(And), generate.child))
-      } else {
-        generate
+      g.children.head match {

Review comment: For safety we can add an `assert(g.children.length == 1)`.
[GitHub] [spark] Victsm commented on a change in pull request #30312: [SPARK-32917][SHUFFLE][CORE] Adds support for executors to push shuffle blocks after successful map task completion
Victsm commented on a change in pull request #30312: URL: https://github.com/apache/spark/pull/30312#discussion_r530605991

## File path: core/src/main/scala/org/apache/spark/internal/config/package.scala

## @@ -1992,4 +1992,32 @@ package object config {
       .version("3.1.0")
       .doubleConf
       .createWithDefault(5)
+
+  private[spark] val SHUFFLE_NUM_PUSH_THREADS =
+    ConfigBuilder("spark.shuffle.push.numPushThreads")
+      .doc("Specify the number of threads in the block pusher pool. These threads assist " +
+        "in creating connections and pushing blocks to remote shuffle services when push based " +
+        "shuffle is enabled. By default, the threadpool size is equal to the number of cores.")

Review comment: The number of cores here is referring to Spark executor cores, not the underlying CPU vcores. Might need better clarification here.
[GitHub] [spark] imback82 commented on a change in pull request #30403: [SPARK-33448][SQL] Support CACHE/UNCACHE TABLE commands for v2 tables
imback82 commented on a change in pull request #30403: URL: https://github.com/apache/spark/pull/30403#discussion_r530606256

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/CachedTableSuite.scala

## @@ -421,7 +421,7 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with TestHiveSingleto
     activateDatabase(db) {
       sql("REFRESH TABLE default.cachedTable")
       assertCached(
-        sql("SELECT * FROM default.cachedTable"), "`default`.`cachedTable`", DISK_ONLY)
+        sql("SELECT * FROM default.cachedTable"), "cachedTable", DISK_ONLY)

Review comment: I guess this is OK as long as `spark.catalog.isCached` works fine since it's just a name of cache builder?
[GitHub] [spark] SparkQA commented on pull request #30500: [SPARK-33562][UI] Improve the style of the checkbox in executor page
SparkQA commented on pull request #30500: URL: https://github.com/apache/spark/pull/30500#issuecomment-733913452 **[Test build #131788 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131788/testReport)** for PR 30500 at commit [`8f4700e`](https://github.com/apache/spark/commit/8f4700e5caf9f2e82315ea616842c0dd8f9fe711).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #30500: [SPARK-33562][UI] Improve the style of the checkbox in executor page
SparkQA removed a comment on pull request #30500: URL: https://github.com/apache/spark/pull/30500#issuecomment-733835284 **[Test build #131788 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131788/testReport)** for PR 30500 at commit [`8f4700e`](https://github.com/apache/spark/commit/8f4700e5caf9f2e82315ea616842c0dd8f9fe711).
[GitHub] [spark] Victsm commented on a change in pull request #30312: [SPARK-32917][SHUFFLE][CORE] Adds support for executors to push shuffle blocks after successful map task completion
Victsm commented on a change in pull request #30312: URL: https://github.com/apache/spark/pull/30312#discussion_r530605991

## File path: core/src/main/scala/org/apache/spark/internal/config/package.scala

## @@ -1992,4 +1992,32 @@ package object config {
       .version("3.1.0")
       .doubleConf
       .createWithDefault(5)
+
+  private[spark] val SHUFFLE_NUM_PUSH_THREADS =
+    ConfigBuilder("spark.shuffle.push.numPushThreads")
+      .doc("Specify the number of threads in the block pusher pool. These threads assist " +
+        "in creating connections and pushing blocks to remote shuffle services when push based " +
+        "shuffle is enabled. By default, the threadpool size is equal to the number of cores.")

Review comment: The number of cores here is referring to Spark executor cores, not the underlying CPU vcores. Might need better clarification here. @otterc
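Victsm's distinction can be made concrete. The following is a hedged sketch only: `spark.shuffle.push.numPushThreads` is the key proposed in this PR, still under review and not a released Spark option at this point.

```scala
import org.apache.spark.SparkConf

// "Number of cores" in the .doc() text above means the executor's task
// slots (spark.executor.cores), not the host machine's physical vcores.
val conf = new SparkConf()
  .set("spark.executor.cores", "4")              // cores granted to each executor
  .set("spark.shuffle.push.numPushThreads", "4") // pusher pool sized to match the default behavior
```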
[GitHub] [spark] cloud-fan commented on a change in pull request #30504: [SPARK-33544][SQL] Optimizer should not insert filter when explode with CreateArray/CreateMap
cloud-fan commented on a change in pull request #30504: URL: https://github.com/apache/spark/pull/30504#discussion_r530606734

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala

## @@ -873,24 +873,30 @@ object InferFiltersFromGenerate extends Rule[LogicalPlan] {
       if !e.deterministic || e.children.forall(_.foldable) => generate
     case generate @ Generate(g, _, false, _, _, _) if canInferFilters(g) =>
-      // Exclude child's constraints to guarantee idempotency
-      val inferredFilters = ExpressionSet(
-        Seq(
-          GreaterThan(Size(g.children.head), Literal(0)),
-          IsNotNull(g.children.head)
-        )
-      ) -- generate.child.constraints
-
-      if (inferredFilters.nonEmpty) {
-        generate.copy(child = Filter(inferredFilters.reduce(And), generate.child))
-      } else {
-        generate
+      g.children.head match {
+        case _: CreateNonEmptyNonNullCollection =>
+          // we don't need to add filters when creating an array because we know its size
+          // is > 0 and its not null
+          generate
+        case _ =>
+          // Exclude child's constraints to guarantee idempotency
+          val inferredFilters = ExpressionSet(
+            Seq(
+              GreaterThan(Size(g.children.head), Literal(0)),
+              IsNotNull(g.children.head)

Review comment: In general, optimizer rules should be orthogonal. For this case, I think a better idea is to add a new optimizer rule, which optimizes `IsNotNull` and size check expressions above `CreateArray/Map` into true literal.
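The orthogonal rule cloud-fan suggests could look roughly like the sketch below. This is illustrative only: the rule name is invented, the patterns are simplified relative to Catalyst's real expression classes, and it is not code from the PR.

```scala
// Hypothetical standalone rule: rewrite predicates that are trivially true
// over CreateArray/CreateMap into a true literal, leaving
// InferFiltersFromGenerate untouched; later rules then prune the filter.
object SimplifyPredicatesOverCreatedCollections extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    // array(...)/map(...) never evaluate to null
    case IsNotNull(_: CreateArray | _: CreateMap) => Literal.TrueLiteral
    // size(array(e1, ..., en)) > 0 is statically known once n >= 1
    // (pattern simplified; the real Size expression carries extra fields)
    case GreaterThan(size: Size, Literal(0, IntegerType))
        if size.child.isInstanceOf[CreateArray] &&
           size.child.children.nonEmpty => Literal.TrueLiteral
  }
}
```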
[GitHub] [spark] imback82 commented on a change in pull request #30403: [SPARK-33448][SQL] Support CACHE/UNCACHE TABLE commands for v2 tables
imback82 commented on a change in pull request #30403: URL: https://github.com/apache/spark/pull/30403#discussion_r530606256

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/CachedTableSuite.scala

## @@ -421,7 +421,7 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with TestHiveSingleto
     activateDatabase(db) {
       sql("REFRESH TABLE default.cachedTable")
       assertCached(
-        sql("SELECT * FROM default.cachedTable"), "`default`.`cachedTable`", DISK_ONLY)
+        sql("SELECT * FROM default.cachedTable"), "cachedTable", DISK_ONLY)

Review comment: I guess this is OK as long as `spark.catalog.isCached` works fine since it's just a name of cache builder.
[GitHub] [spark] imback82 commented on a change in pull request #30403: [SPARK-33448][SQL] Support CACHE/UNCACHE TABLE commands for v2 tables
imback82 commented on a change in pull request #30403: URL: https://github.com/apache/spark/pull/30403#discussion_r530571035

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/CachedTableSuite.scala

## @@ -421,7 +421,7 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with TestHiveSingleto
     activateDatabase(db) {
       sql("REFRESH TABLE default.cachedTable")
       assertCached(
-        sql("SELECT * FROM default.cachedTable"), "`default`.`cachedTable`", DISK_ONLY)
+        sql("SELECT * FROM default.cachedTable"), "cachedTable", DISK_ONLY)

Review comment: ~~@cloud-fan Looks like we need to resolve catalog/current namespace for this scenario?~~
[GitHub] [spark] cloud-fan commented on a change in pull request #30403: [SPARK-33448][SQL] Support CACHE/UNCACHE TABLE commands for v2 tables
cloud-fan commented on a change in pull request #30403: URL: https://github.com/apache/spark/pull/30403#discussion_r530608911

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/CachedTableSuite.scala

## @@ -421,7 +421,7 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with TestHiveSingleto
     activateDatabase(db) {
       sql("REFRESH TABLE default.cachedTable")
       assertCached(
-        sql("SELECT * FROM default.cachedTable"), "`default`.`cachedTable`", DISK_ONLY)
+        sql("SELECT * FROM default.cachedTable"), "cachedTable", DISK_ONLY)

Review comment: Why is `default.` gone here?
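The concern in this thread (whether the shortened cache-builder name still resolves correctly) can be probed through the public API. A minimal sketch, assuming a running `SparkSession` named `spark` and that the table already exists:

```scala
// The cache-builder name shown in the plan may render as "cachedTable",
// but cache lookup goes through the analyzed plan, so the qualified name
// should still resolve via the public Catalog API.
spark.sql("CACHE TABLE default.cachedTable")
assert(spark.catalog.isCached("default.cachedTable"))
spark.sql("UNCACHE TABLE default.cachedTable")
```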
[GitHub] [spark] Victsm removed a comment on pull request #30480: [SPARK-32921][SHUFFLE][test-maven][test-hadoop2.7] MapOutputTracker extensions to support push-based shuffle
Victsm removed a comment on pull request #30480: URL: https://github.com/apache/spark/pull/30480#issuecomment-733866872 The javadoc issues leading to the build failures are not related to files changed in this patch.
```
[error] /home/runner/work/spark/spark/mllib/target/java/org/apache/spark/mllib/util/MLlibTestSparkContext.java:10:1: error: illegal combination of modifiers: public and protected
[error] protected class testImplicits {
[error]           ^
[error] /home/runner/work/spark/spark/sql/core/target/java/org/apache/spark/sql/UDFSuite.java:4:1: error: modifier static not allowed here
[error] static public class MalformedNonPrimitiveFunction implements scala.Function1, scala.Serializable {
[error]        ^
[error] /home/runner/work/spark/spark/sql/core/target/java/org/apache/spark/sql/test/SQLTestUtilsBase.java:19:1: error: illegal combination of modifiers: public and protected
[error] protected class testImplicits {
[error]           ^
[error] /home/runner/work/spark/spark/core/target/java/org/apache/spark/scheduler/SparkListenerStageCompleted.java:3:1: error: illegal combination of modifiers: abstract and static
[error] static public abstract R apply (T1 v1) ;
[error]        ^
[error] /home/runner/work/spark/spark/core/target/java/org/apache/spark/scheduler/SparkListenerStageCompleted.java:3:1: error: cannot find symbol
[error] static public abstract R apply (T1 v1) ;
[error]                                 ^
  symbol: class T1
  location: class SparkListenerStageCompleted
```
[GitHub] [spark] SparkQA commented on pull request #30430: [SPARK-33503][SQL] Refactor SortOrder class to allow multiple childrens
SparkQA commented on pull request #30430: URL: https://github.com/apache/spark/pull/30430#issuecomment-733916550 **[Test build #131785 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131785/testReport)** for PR 30430 at commit [`1c38e97`](https://github.com/apache/spark/commit/1c38e979a35dbaafbac013254e3bc35befde7cc8).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #30430: [SPARK-33503][SQL] Refactor SortOrder class to allow multiple childrens
SparkQA removed a comment on pull request #30430: URL: https://github.com/apache/spark/pull/30430#issuecomment-733801134 **[Test build #131785 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131785/testReport)** for PR 30430 at commit [`1c38e97`](https://github.com/apache/spark/commit/1c38e979a35dbaafbac013254e3bc35befde7cc8).
[GitHub] [spark] AmplabJenkins commented on pull request #29066: [SPARK-23889][SQL] DataSourceV2: required sorting and clustering for writes
AmplabJenkins commented on pull request #29066: URL: https://github.com/apache/spark/pull/29066#issuecomment-733924096
[GitHub] [spark] SparkQA commented on pull request #30494: [SPARK-33551][SQL] Do not use custom shuffle reader for repartition
SparkQA commented on pull request #30494: URL: https://github.com/apache/spark/pull/30494#issuecomment-733927636 **[Test build #131810 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131810/testReport)** for PR 30494 at commit [`97f58f5`](https://github.com/apache/spark/commit/97f58f5744ace0d869f3660eec4f05a48e710e97).
[GitHub] [spark] viirya commented on a change in pull request #30504: [SPARK-33544][SQL] Optimizer should not insert filter when explode with CreateArray/CreateMap
viirya commented on a change in pull request #30504: URL: https://github.com/apache/spark/pull/30504#discussion_r530608975

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala

## @@ -30,6 +30,13 @@
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types._
 import org.apache.spark.unsafe.types.UTF8String
+/**
+ * This trait is to indicate that this is an expression that creates a collection
+ * that will not be null and will not be empty when it contains children.
+ * Note that it will be foldable if it doesn't container children.
+ */

Review comment: "container" -> "contain"?

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala

## @@ -30,6 +30,13 @@
+trait CreateNonEmptyNonNullCollection

Review comment: Maybe `CreateNonNullCollection`? The complex type creators `CreateArray` and `CreateMap` create non-null collections, but I think we cannot infer that they create non-empty collections. Although the comment says it is only non-empty if the creator contains children, that cannot be inferred from the trait itself.
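For context, the marker trait under discussion would look roughly as follows, with the typo viirya flags corrected. The trait name matches the PR as written, while viirya's comment argues for `CreateNonNullCollection` instead; the mix-in line is a sketch, not the PR's actual class declarations.

```scala
/**
 * Marker trait for expressions that create a collection that is never null,
 * and that is non-empty whenever the expression has children.
 * Note that it will be foldable if it doesn't contain children.
 */
trait CreateNonEmptyNonNullCollection extends Expression

// CreateArray and CreateMap would then mix it in, e.g.:
//   case class CreateArray(children: Seq[Expression], ...)
//     extends Expression with CreateNonEmptyNonNullCollection
```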
[GitHub] [spark] AmplabJenkins commented on pull request #30488: [SPARK-33071][SPARK-33536][SQL] Avoid changing dataset_id of LogicalPlan in join() to not break DetectAmbiguousSelfJoin
AmplabJenkins commented on pull request #30488: URL: https://github.com/apache/spark/pull/30488#issuecomment-733536612
[GitHub] [spark] SparkQA commented on pull request #30470: [SPARK-33495][BUILD] Remove commons-logging.jar's dependency
SparkQA commented on pull request #30470: URL: https://github.com/apache/spark/pull/30470#issuecomment-733536564 **[Test build #131737 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131737/testReport)** for PR 30470 at commit [`bc3cb8b`](https://github.com/apache/spark/commit/bc3cb8b419bb985cdf98aaf172b20c900d40e806).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] AngersZhuuuu commented on pull request #30496: [SPARK-33547][SQL] Add usage of typed literal in doc
AngersZhuuuu commented on pull request #30496: URL: https://github.com/apache/spark/pull/30496#issuecomment-733536446 FYI @maropu: there are many duplicated examples between the typed literals and their corresponding plain literals; any suggestions?
[GitHub] [spark] AmplabJenkins commented on pull request #30492: [SPARK-33545][CORE] Support Fallback Storage during Worker decommission
AmplabJenkins commented on pull request #30492: URL: https://github.com/apache/spark/pull/30492#issuecomment-733536613
[GitHub] [spark] AmplabJenkins commented on pull request #30421: [SPARK-33474][SQL] Support TypeConstructed partition spec value
AmplabJenkins commented on pull request #30421: URL: https://github.com/apache/spark/pull/30421#issuecomment-733536615
[GitHub] [spark] AmplabJenkins commented on pull request #30478: [SPARK-33525][SQL] Update hive-service-rpc to 3.1.2
AmplabJenkins commented on pull request #30478: URL: https://github.com/apache/spark/pull/30478#issuecomment-733536614
[GitHub] [spark] SparkQA commented on pull request #30472: [WIP][SPARK-32221] Avoid possible errors due to incorrect file size or type supplied in spark conf.
SparkQA commented on pull request #30472: URL: https://github.com/apache/spark/pull/30472#issuecomment-733543655 **[Test build #131757 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131757/testReport)** for PR 30472 at commit [`534f2ff`](https://github.com/apache/spark/commit/534f2ffc3aa6f019e9c3f85b5f7d35be92c0c379).
[GitHub] [spark] AmplabJenkins commented on pull request #30483: [WIP][SPARK-33449][SQL] Add File Metadata cache support for Parquet and Orc
AmplabJenkins commented on pull request #30483: URL: https://github.com/apache/spark/pull/30483#issuecomment-733544301
[GitHub] [spark] SparkQA commented on pull request #30483: [WIP][SPARK-33449][SQL] Add File Metadata cache support for Parquet and Orc
SparkQA commented on pull request #30483: URL: https://github.com/apache/spark/pull/30483#issuecomment-733544254 **[Test build #131754 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131754/testReport)** for PR 30483 at commit [`8bba51a`](https://github.com/apache/spark/commit/8bba51a2c65393e92a494a9539064d94ad24ec50). * This patch **fails Java style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class TruncateTable(`
[GitHub] [spark] SparkQA commented on pull request #28026: [SPARK-31257][SQL] Unify create table syntax
SparkQA commented on pull request #28026: URL: https://github.com/apache/spark/pull/28026#issuecomment-733547825 **[Test build #131736 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131736/testReport)** for PR 28026 at commit [`25ec746`](https://github.com/apache/spark/commit/25ec746753f29acd5e248d03db48211a3876a7c1). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.
SparkQA removed a comment on pull request #30472: URL: https://github.com/apache/spark/pull/30472#issuecomment-733543655 **[Test build #131757 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131757/testReport)** for PR 30472 at commit [`534f2ff`](https://github.com/apache/spark/commit/534f2ffc3aa6f019e9c3f85b5f7d35be92c0c379).
[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.
SparkQA commented on pull request #30472: URL: https://github.com/apache/spark/pull/30472#issuecomment-733550292 **[Test build #131757 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131757/testReport)** for PR 30472 at commit [`534f2ff`](https://github.com/apache/spark/commit/534f2ffc3aa6f019e9c3f85b5f7d35be92c0c379). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #30480: [SPARK-32921][SHUFFLE][test-maven][test-hadoop2.7] MapOutputTracker extensions to support push-based shuffle
SparkQA commented on pull request #30480: URL: https://github.com/apache/spark/pull/30480#issuecomment-733551692 **[Test build #131710 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131710/testReport)** for PR 30480 at commit [`cc1c077`](https://github.com/apache/spark/commit/cc1c077cdd3e808f97d2025ffab7545fce58c067). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] luluorta commented on a change in pull request #30289: [SPARK-33141][SQL] Capture SQL configs when creating permanent views
luluorta commented on a change in pull request #30289: URL: https://github.com/apache/spark/pull/30289#discussion_r530134316 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ## @@ -361,11 +379,38 @@ object ViewHelper { } } + /** + * Convert the view query SQL configs in `properties`. + */ + private def generateQuerySQLConfigs(conf: SQLConf): Map[String, String] = { +val modifiedConfs = conf.getAllConfs.filter { case (k, _) => + conf.isModifiable(k) && !isConfigBlacklisted(k) +} +val props = new mutable.HashMap[String, String] +if (modifiedConfs.nonEmpty) { + val confJson = compact(render(JsonProtocol.mapToJson(modifiedConfs))) + props.put(VIEW_QUERY_SQL_CONFIGS, confJson) Review comment: Thanks for pointing this out. Splitting a large value string into small chunks seems to be a Hive-specific solution, so I changed to store one config per table property entry, each with a "view.sqlConfig." prefix.
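The storage scheme luluorta describes, one captured SQL config per table property entry under a `view.sqlConfig.` prefix rather than a single large JSON blob, can be sketched language-neutrally. A minimal round-trip illustration (Python for brevity; the actual patch is Scala, the prefix comes from the comment, and the function names here are assumptions):

```python
SQL_CONFIG_PREFIX = "view.sqlConfig."

def configs_to_properties(modified_confs: dict) -> dict:
    # Store each modified SQL config as its own table property entry,
    # avoiding Hive's limit on individual property value length.
    return {SQL_CONFIG_PREFIX + k: v for k, v in modified_confs.items()}

def properties_to_configs(props: dict) -> dict:
    # Recover the captured configs by filtering on the prefix.
    return {k[len(SQL_CONFIG_PREFIX):]: v
            for k, v in props.items()
            if k.startswith(SQL_CONFIG_PREFIX)}

confs = {"spark.sql.ansi.enabled": "true"}
props = configs_to_properties(confs)
assert properties_to_configs(props) == confs
```

The round trip is lossless, and unrelated table properties are ignored on read because they lack the prefix.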
[GitHub] [spark] wangyum commented on pull request #30483: [WIP][SPARK-33449][SQL] Add File Metadata cache support for Parquet and Orc
wangyum commented on pull request #30483: URL: https://github.com/apache/spark/pull/30483#issuecomment-733556681 @LuciferYang It would be great if we had some benchmark numbers.
[GitHub] [spark] aminh73 commented on pull request #27380: [SPARK-30669][SS] Introduce AdmissionControl APIs for StructuredStreaming
aminh73 commented on pull request #27380: URL: https://github.com/apache/spark/pull/27380#issuecomment-733557893 We need to use `maxOffsetsPerTrigger` in the Kafka source with `Trigger.Once()`, but it seems to read `allAvailable` in Spark 3. Is there a way to achieve rate limiting in this situation?
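The rate limit aminh73 asks about caps how many records a single micro-batch reads by choosing end offsets below what is available. The capping logic itself, independent of how any given Spark version applies it per trigger type, can be sketched as below; a hedged Python illustration where offsets are per-partition longs, the budget is split proportionally to each partition's lag, and all names are illustrative:

```python
def cap_offsets(start: dict, available: dict, max_offsets) -> dict:
    # Limit the batch's end offsets so at most max_offsets records are read,
    # splitting the budget across partitions in proportion to their lag.
    lags = {p: available[p] - start[p] for p in start}
    total = sum(lags.values())
    if max_offsets is None or total <= max_offsets:
        return dict(available)  # no cap needed: read everything available
    return {p: start[p] + int(lags[p] * max_offsets / total) for p in start}

start = {"t-0": 0, "t-1": 0}
available = {"t-0": 100, "t-1": 300}
end = cap_offsets(start, available, max_offsets=100)
# → {"t-0": 25, "t-1": 75}: each partition gets a share proportional to its lag
```

With no limit configured (`None`), the batch reads all available offsets, which matches the `allAvailable` behavior the comment describes.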
[GitHub] [spark] Ngone51 commented on a change in pull request #30312: [SPARK-32917][SHUFFLE][CORE] Adds support for executors to push shuffle blocks after successful map task completion
Ngone51 commented on a change in pull request #30312: URL: https://github.com/apache/spark/pull/30312#discussion_r530141471 ## File path: core/src/main/scala/org/apache/spark/shuffle/ShuffleBlockPusher.scala ## @@ -0,0 +1,462 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.shuffle + +import java.io.File +import java.net.ConnectException +import java.nio.ByteBuffer +import java.util.concurrent.ExecutorService + +import scala.collection.mutable.{ArrayBuffer, HashMap, HashSet, Queue} + +import com.google.common.base.Throwables + +import org.apache.spark.{ShuffleDependency, SparkConf, SparkEnv} +import org.apache.spark.annotation.Since +import org.apache.spark.internal.Logging +import org.apache.spark.internal.config._ +import org.apache.spark.launcher.SparkLauncher +import org.apache.spark.network.buffer.{FileSegmentManagedBuffer, ManagedBuffer, NioManagedBuffer} +import org.apache.spark.network.netty.SparkTransportConf +import org.apache.spark.network.shuffle.BlockFetchingListener +import org.apache.spark.network.shuffle.ErrorHandler.BlockPushErrorHandler +import org.apache.spark.network.util.TransportConf +import org.apache.spark.shuffle.ShuffleBlockPusher._ +import org.apache.spark.storage.{BlockId, BlockManagerId, ShufflePushBlockId} +import org.apache.spark.util.{ThreadUtils, Utils} + +/** + * Used for pushing shuffle blocks to remote shuffle services when push shuffle is enabled. + * When push shuffle is enabled, it is created after the shuffle writer finishes writing the shuffle + * file and initiates the block push process. + * + * @param dataFile mapper generated shuffle data file + * @param partitionLengths array of shuffle block size so we can tell shuffle block + * boundaries within the shuffle file + * @param dep shuffle dependency to get shuffle ID and the location of remote shuffle + * services to push local shuffle blocks + * @param partitionId map index of the shuffle map task + * @param conf spark configuration + */ +@Since("3.1.0") +private[spark] class ShuffleBlockPusher( +dataFile: File, +partitionLengths: Array[Long], +dep: ShuffleDependency[_, _, _], +partitionId: Int, +conf: SparkConf) extends Logging { Review comment: Pass these fields to `initiateBlockPush()` should be enough? 
## File path: core/src/test/scala/org/apache/spark/shuffle/ShuffleBlockPusherSuite.scala ## @@ -0,0 +1,248 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.shuffle + +import java.io.File +import java.net.ConnectException +import java.nio.ByteBuffer + +import scala.collection.mutable.ArrayBuffer + +import org.mockito.{Mock, MockitoAnnotations} +import org.mockito.Answers.RETURNS_SMART_NULLS +import org.mockito.ArgumentMatchers.any +import org.mockito.Mockito._ +import org.mockito.invocation.InvocationOnMock +import org.scalatest.BeforeAndAfterEach + +import org.apache.spark._ +import org.apache.spark.network.buffer.ManagedBuffer +import org.apache.spark.network.shuffle.{BlockFetchingListener, BlockStoreClient} +import org.apache.spark.network.shuffle.ErrorHandler.BlockPushErrorHandler +import org.apache.spark.network.util.TransportConf +import org.apache.spark.serializer.JavaSerializer +import org.apache.spark.storage._ + +class ShuffleBlockPusherSuite extends SparkFunSuite with
[GitHub] [spark] zero323 commented on pull request #30382: [SPARK-33457][PYTHON] Adjust mypy configuration
zero323 commented on pull request #30382: URL: https://github.com/apache/spark/pull/30382#issuecomment-733566086 Thanks @Fokko and @HyukjinKwon!
[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.
SparkQA commented on pull request #30472: URL: https://github.com/apache/spark/pull/30472#issuecomment-733568741 **[Test build #131761 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131761/testReport)** for PR 30472 at commit [`2d676cd`](https://github.com/apache/spark/commit/2d676cdeaed89ffe89f00de41350e51122b559d1).
[GitHub] [spark] SparkQA commented on pull request #30483: [WIP][SPARK-33449][SQL] Add File Metadata cache support for Parquet and Orc
SparkQA commented on pull request #30483: URL: https://github.com/apache/spark/pull/30483#issuecomment-733568636 **[Test build #131760 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131760/testReport)** for PR 30483 at commit [`3e2db1a`](https://github.com/apache/spark/commit/3e2db1a1cdec3df84e9ceb9cc64860b7f88c6720).
[GitHub] [spark] SparkQA commented on pull request #30486: [SPARK-33530][CORE] Support --archives and spark.archives option natively
SparkQA commented on pull request #30486: URL: https://github.com/apache/spark/pull/30486#issuecomment-733568538 **[Test build #131759 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131759/testReport)** for PR 30486 at commit [`15d8ed5`](https://github.com/apache/spark/commit/15d8ed51d99e57403aae3272d9975b96e7735ee2).
[GitHub] [spark] AmplabJenkins commented on pull request #30493: [SPARK-33549][SQL] Remove configuration spark.sql.legacy.allowCastNumericToTimestamp
AmplabJenkins commented on pull request #30493: URL: https://github.com/apache/spark/pull/30493#issuecomment-733568871
[GitHub] [spark] AmplabJenkins commented on pull request #30484: [SPARK-33532][SQL] Remove unreachable branch in SpecificParquetRecordReaderBase.initialize method
AmplabJenkins commented on pull request #30484: URL: https://github.com/apache/spark/pull/30484#issuecomment-733572799
[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.
SparkQA commented on pull request #30472: URL: https://github.com/apache/spark/pull/30472#issuecomment-733577201 **[Test build #131761 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131761/testReport)** for PR 30472 at commit [`2d676cd`](https://github.com/apache/spark/commit/2d676cdeaed89ffe89f00de41350e51122b559d1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #30440: [SPARK-33496][SQL]Improve error message of ANSI explicit cast
SparkQA commented on pull request #30440: URL: https://github.com/apache/spark/pull/30440#issuecomment-733576716 **[Test build #131764 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131764/testReport)** for PR 30440 at commit [`bb5b219`](https://github.com/apache/spark/commit/bb5b219e3a337a4dbdf6923c19734c2643acd1fa).
[GitHub] [spark] SparkQA commented on pull request #30212: [SPARK-33308][SQL] Refract current grouping analytics
SparkQA commented on pull request #30212: URL: https://github.com/apache/spark/pull/30212#issuecomment-733576983 **[Test build #131730 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131730/testReport)** for PR 30212 at commit [`516afc5`](https://github.com/apache/spark/commit/516afc56aa0c818ffb2d5d0915025a3a6387c8ff). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #30212: [SPARK-33308][SQL] Refract current grouping analytics
AmplabJenkins commented on pull request #30212: URL: https://github.com/apache/spark/pull/30212#issuecomment-733578781
[GitHub] [spark] kiszk commented on pull request #30243: [SPARK-33335][SQL] Support `array_contains_array` func
kiszk commented on pull request #30243: URL: https://github.com/apache/spark/pull/30243#issuecomment-733582819 LGTM from the code generation perspective. @maropu @cloud-fan @HyukjinKwon @ueshin Any other comments, in particular regarding the specification of the function?
[GitHub] [spark] AngersZhuuuu opened a new pull request #30496: [SPARK-33547][SQL] Add usage of typed literal in doc
AngersZh opened a new pull request #30496: URL: https://github.com/apache/spark/pull/30496 ### What changes were proposed in this pull request? According to https://github.com/apache/spark/pull/30421#discussion_r530024114, add the usage of typed literals to the doc. ### Why are the changes needed? Make the usage of typed literals clear to users. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Not needed; documentation-only change.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #30496: [SPARK-33547][SQL] Add usage of typed literal in doc
AngersZh commented on a change in pull request #30496: URL: https://github.com/apache/spark/pull/30496#discussion_r530172194 ## File path: docs/sql-ref-literals.md ## @@ -21,14 +21,74 @@ license: | A literal (also known as a constant) represents a fixed data value. Spark SQL supports the following literals: + * [Typed Literal](#typed-literal) * [String Literal](#string-literal) * [Binary Literal](#binary-literal) * [Null Literal](#null-literal) * [Boolean Literal](#boolean-literal) * [Numeric Literal](#numeric-literal) + * [Timestamp Literal](#timestamp-literal) Review comment: Missed in current doc
[GitHub] [spark] SparkQA removed a comment on pull request #30470: [SPARK-33495][BUILD] Remove commons-logging.jar's dependency
SparkQA removed a comment on pull request #30470: URL: https://github.com/apache/spark/pull/30470#issuecomment-733468984 **[Test build #131737 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131737/testReport)** for PR 30470 at commit [`bc3cb8b`](https://github.com/apache/spark/commit/bc3cb8b419bb985cdf98aaf172b20c900d40e806).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30421: [SPARK-33474][SQL] Support TypeConstructed partition spec value
AmplabJenkins removed a comment on pull request #30421: URL: https://github.com/apache/spark/pull/30421#issuecomment-733536615
[GitHub] [spark] SparkQA commented on pull request #30492: [SPARK-33545][CORE] Support Fallback Storage during Worker decommission
SparkQA commented on pull request #30492: URL: https://github.com/apache/spark/pull/30492#issuecomment-733536865 **[Test build #131753 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131753/testReport)** for PR 30492 at commit [`025d9aa`](https://github.com/apache/spark/commit/025d9aadc49663521cb558237ee6b33a8f21f1e6).
[GitHub] [spark] SparkQA commented on pull request #30440: [SPARK-33496][SQL]Improve error message of ANSI explicit cast
SparkQA commented on pull request #30440: URL: https://github.com/apache/spark/pull/30440#issuecomment-733537202 **[Test build #131756 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131756/testReport)** for PR 30440 at commit [`e762162`](https://github.com/apache/spark/commit/e762162311e04c20bb06f9a4735514547050b832).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30478: [SPARK-33525][SQL] Update hive-service-rpc to 3.1.2
AmplabJenkins removed a comment on pull request #30478: URL: https://github.com/apache/spark/pull/30478#issuecomment-733513051
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30440: [SPARK-33496][SQL]Improve error message of ANSI explicit cast
AmplabJenkins removed a comment on pull request #30440: URL: https://github.com/apache/spark/pull/30440#issuecomment-733510495
[GitHub] [spark] SparkQA commented on pull request #30478: [SPARK-33525][SQL] Update hive-service-rpc to 3.1.2
SparkQA commented on pull request #30478: URL: https://github.com/apache/spark/pull/30478#issuecomment-733537096 **[Test build #131755 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131755/testReport)** for PR 30478 at commit [`43d90ca`](https://github.com/apache/spark/commit/43d90cafaf0aa4c8c4a355070d5c71008f6f3ea9).
[GitHub] [spark] AmplabJenkins commented on pull request #30470: [SPARK-33495][BUILD] Remove commons-logging.jar's dependency
AmplabJenkins commented on pull request #30470: URL: https://github.com/apache/spark/pull/30470#issuecomment-733537538
[GitHub] [spark] SparkQA commented on pull request #30496: [SPARK-33547][SQL] Add usage of typed literal in doc
SparkQA commented on pull request #30496: URL: https://github.com/apache/spark/pull/30496#issuecomment-733537090 **[Test build #131752 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131752/testReport)** for PR 30496 at commit [`e3de389`](https://github.com/apache/spark/commit/e3de389280d41c1f8c30569c512a71abff90799b).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30488: [SPARK-33071][SPARK-33536][SQL] Avoid changing dataset_id of LogicalPlan in join() to not break DetectAmbiguousSelfJoin
AmplabJenkins removed a comment on pull request #30488: URL: https://github.com/apache/spark/pull/30488#issuecomment-733536612