[GitHub] [spark] SparkQA commented on pull request #32722: [SPARK-35586][K8S][TESTS] Set a default value for spark.kubernetes.test.sparkTgz in pom.xml for Kubernetes integration tests
SparkQA commented on pull request #32722: URL: https://github.com/apache/spark/pull/32722#issuecomment-851803720 **[Test build #139132 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139132/testReport)** for PR 32722 at commit [`f2d9f30`](https://github.com/apache/spark/commit/f2d9f30e2f84fcc3fd692daf31934b568134a56c). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32718: [SPARK-21957][SQL] Support current_user and session_user functions
AmplabJenkins removed a comment on pull request #32718: URL: https://github.com/apache/spark/pull/32718#issuecomment-851802395 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43647/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
AmplabJenkins removed a comment on pull request #32686: URL: https://github.com/apache/spark/pull/32686#issuecomment-851802400 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139125/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32720: [SPARK-35576][SQL][3.1] Redact the sensitive info in the result of Set command
AmplabJenkins removed a comment on pull request #32720: URL: https://github.com/apache/spark/pull/32720#issuecomment-851802399 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43651/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32693: [SPARK-35556][SQL][TESTS] Avoid log NoSuchMethodError when HiveClientImpl.state close
AmplabJenkins removed a comment on pull request #32693: URL: https://github.com/apache/spark/pull/32693#issuecomment-851802397 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43650/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32506: [SPARK-35374][SQL] Add string-to-number conversion support to JacksonParser
AmplabJenkins removed a comment on pull request #32506: URL: https://github.com/apache/spark/pull/32506#issuecomment-851802398 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43648/
[GitHub] [spark] AmplabJenkins commented on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
AmplabJenkins commented on pull request #32686: URL: https://github.com/apache/spark/pull/32686#issuecomment-851802400 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139125/
[GitHub] [spark] AmplabJenkins commented on pull request #32720: [SPARK-35576][SQL][3.1] Redact the sensitive info in the result of Set command
AmplabJenkins commented on pull request #32720: URL: https://github.com/apache/spark/pull/32720#issuecomment-851802399 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43651/
[GitHub] [spark] AmplabJenkins commented on pull request #32718: [SPARK-21957][SQL] Support current_user and session_user functions
AmplabJenkins commented on pull request #32718: URL: https://github.com/apache/spark/pull/32718#issuecomment-851802395 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43647/
[GitHub] [spark] AmplabJenkins commented on pull request #32693: [SPARK-35556][SQL][TESTS] Avoid log NoSuchMethodError when HiveClientImpl.state close
AmplabJenkins commented on pull request #32693: URL: https://github.com/apache/spark/pull/32693#issuecomment-851802397 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43650/
[GitHub] [spark] AmplabJenkins commented on pull request #32506: [SPARK-35374][SQL] Add string-to-number conversion support to JacksonParser
AmplabJenkins commented on pull request #32506: URL: https://github.com/apache/spark/pull/32506#issuecomment-851802398 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43648/
[GitHub] [spark] viirya commented on pull request #32702: [SPARK-35565][SS] Add config for ignoring metadata directory of FileStreamSink
viirya commented on pull request #32702: URL: https://github.com/apache/spark/pull/32702#issuecomment-851802006 Okay, sounds good. Let me change to using a source option.
[GitHub] [spark] SparkQA commented on pull request #32720: [SPARK-35576][SQL][3.1] Redact the sensitive info in the result of Set command
SparkQA commented on pull request #32720: URL: https://github.com/apache/spark/pull/32720#issuecomment-851801961 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43651/
[GitHub] [spark] sarutak opened a new pull request #32722: [SPARK-35586][K8S][TESTS] Set a default value for spark.kubernetes.test.sparkTgz in pom.xml for Kubernetes integration tests
sarutak opened a new pull request #32722:
URL: https://github.com/apache/spark/pull/32722

### What changes were proposed in this pull request?

This PR sets a default value for `spark.kubernetes.test.sparkTgz` in `kubernetes/integration-tests/pom.xml` for Kubernetes integration tests.

### Why are the changes needed?

In the current master, running the integration tests with the following command fails because no default value is set for the property.

```
build/mvn -Dspark.kubernetes.test.namespace=default -Pkubernetes -Pkubernetes-integration-tests -Psparkr -pl resource-managers/kubernetes/integration-tests integration-test
```

```
+ mkdir -p /home/kou/work/oss/spark/resource-managers/kubernetes/integration-tests/target/spark-dist-unpacked
+ tar -xzvf --test-exclude-tags --strip-components=1 -C /home/kou/work/oss/spark/resource-managers/kubernetes/integration-tests/target/spark-dist-unpacked
tar (child): --test-exclude-tags: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
[ERROR] Command execution failed.
```

According to `setup-integration-test-env.sh`, `N/A` is intended as the default value, so this PR chooses it.

```
SPARK_TGZ="N/A"
MVN="$TEST_ROOT_DIR/build/mvn"
EXCLUDE_TAGS=""
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The build and tests finish successfully with the command shown above.
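The `tar` log above is consistent with an empty property value being dropped before a "flag, then value" parser runs, so that the next flag is consumed as the tarball path. The following is only a minimal sketch of that effect, not the logic of `setup-integration-test-env.sh` or of Maven; `parseFlags`, `buildArgs`, and the dropping of empty arguments are hypothetical assumptions made for illustration.

```scala
object ArgShiftSketch {
  // Hypothetical "flag value" parser: every flag consumes the token that follows it.
  def parseFlags(args: List[String]): Map[String, String] = args match {
    case flag :: value :: rest if flag.startsWith("--") =>
      parseFlags(rest) + (flag -> value)
    case _ => Map.empty
  }

  // Builds the argument list and drops empty values, the way an intermediate
  // launcher might (an assumption, not confirmed by the PR description).
  def buildArgs(sparkTgz: String, excludeTags: String): List[String] =
    List("--spark-tgz", sparkTgz, "--test-exclude-tags", excludeTags).filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    // Empty value: the parser sees "--test-exclude-tags" as the tarball path.
    println(parseFlags(buildArgs("", "none")).get("--spark-tgz"))
    // Non-empty placeholder: the arguments stay aligned.
    println(parseFlags(buildArgs("N/A", "none")).get("--spark-tgz"))
  }
}
```

Under these assumptions, a non-empty placeholder such as `N/A` keeps each flag paired with its own value, which matches the default the PR description says the script already expects.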
[GitHub] [spark] SparkQA commented on pull request #32693: [SPARK-35556][SQL][TESTS] Avoid log NoSuchMethodError when HiveClientImpl.state close
SparkQA commented on pull request #32693: URL: https://github.com/apache/spark/pull/32693#issuecomment-851797476 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43650/
[GitHub] [spark] sigmod opened a new pull request #32721: [WIP][SPARK-35077][SQL] Migrate to transformWithPruning for leftover optimizer rules
sigmod opened a new pull request #32721:
URL: https://github.com/apache/spark/pull/32721

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
[GitHub] [spark] SparkQA commented on pull request #32506: [SPARK-35374][SQL] Add string-to-number conversion support to JacksonParser
SparkQA commented on pull request #32506: URL: https://github.com/apache/spark/pull/32506#issuecomment-851795183 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43648/
[GitHub] [spark] SparkQA removed a comment on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
SparkQA removed a comment on pull request #32686: URL: https://github.com/apache/spark/pull/32686#issuecomment-851734216 **[Test build #139125 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139125/testReport)** for PR 32686 at commit [`8252a6a`](https://github.com/apache/spark/commit/8252a6a93a05c97ed47e3174be76fe1aeb3f6567).
[GitHub] [spark] SparkQA commented on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
SparkQA commented on pull request #32686: URL: https://github.com/apache/spark/pull/32686#issuecomment-851794843 **[Test build #139125 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139125/testReport)** for PR 32686 at commit [`8252a6a`](https://github.com/apache/spark/commit/8252a6a93a05c97ed47e3174be76fe1aeb3f6567).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #32718: [SPARK-21957][SQL] Support current_user and session_user functions
SparkQA commented on pull request #32718: URL: https://github.com/apache/spark/pull/32718#issuecomment-851792608 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43647/
[GitHub] [spark] SparkQA commented on pull request #32720: [SPARK-35576][SQL][3.1] Redact the sensitive info in the result of Set command
SparkQA commented on pull request #32720: URL: https://github.com/apache/spark/pull/32720#issuecomment-851790068 **[Test build #139131 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139131/testReport)** for PR 32720 at commit [`66536fb`](https://github.com/apache/spark/commit/66536fb5b2d8f1499bd4bdb5a9a31435f637bab8).
[GitHub] [spark] gengliangwang commented on pull request #32712: [SPARK-35576][SQL] Redact the sensitive info in the result of Set command
gengliangwang commented on pull request #32712: URL: https://github.com/apache/spark/pull/32712#issuecomment-851789021 @dongjoon-hyun Thanks for merging. I have opened a cherry-pick PR in https://github.com/apache/spark/pull/32720
[GitHub] [spark] gengliangwang opened a new pull request #32720: [SPARK-35576][SQL][3.1] Redact the sensitive info in the result of Set command
gengliangwang opened a new pull request #32720:
URL: https://github.com/apache/spark/pull/32720

### What changes were proposed in this pull request?

Currently, the results of the following SQL queries are not redacted:

```
SET [KEY];
SET;
```

For example:

```
scala> spark.sql("set javax.jdo.option.ConnectionPassword=123456").show()
+----------------+------+
|             key| value|
+----------------+------+
|javax.jdo.option|123456|
+----------------+------+

scala> spark.sql("set javax.jdo.option.ConnectionPassword").show()
+----------------+------+
|             key| value|
+----------------+------+
|javax.jdo.option|123456|
+----------------+------+

scala> spark.sql("set").show()
+----------------+-------+
|             key|  value|
+----------------+-------+
|javax.jdo.option| 123456|
```

We should hide the sensitive information and redact the query output.

### Why are the changes needed?

Security.

### Does this PR introduce _any_ user-facing change?

Yes, the sensitive information in the output of Set commands is redacted.

### How was this patch tested?

Unit test.
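The redaction described above can be pictured as a key-based filter over the key/value pairs a `SET` command returns. The snippet below is a minimal sketch under stated assumptions, not the change made in the PR: `RedactSketch`, its `sensitiveKey` pattern, and the `placeholder` text are hypothetical names and values chosen for illustration.

```scala
import scala.util.matching.Regex

object RedactSketch {
  // Assumed pattern for "sensitive" keys and assumed placeholder text;
  // neither is taken from the PR.
  val sensitiveKey: Regex = "(?i)secret|password|token".r
  val placeholder: String = "*********(redacted)"

  // Replaces the value of every pair whose key matches the sensitive pattern.
  def redact(kvs: Seq[(String, String)]): Seq[(String, String)] =
    kvs.map { case (k, v) =>
      if (sensitiveKey.findFirstIn(k).isDefined) (k, placeholder) else (k, v)
    }

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      "javax.jdo.option.ConnectionPassword" -> "123456",
      "spark.app.name" -> "demo")
    redact(rows).foreach { case (k, v) => println(s"$k = $v") }
  }
}
```

Matching on the key rather than the value means the filter does not need to know what a secret looks like, only which configuration names are expected to hold one.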
[GitHub] [spark] viirya commented on pull request #32709: [SPARK-35573][R][TESTS] Make SparkR tests pass with R 4.1+
viirya commented on pull request #32709: URL: https://github.com/apache/spark/pull/32709#issuecomment-851788514 Cool! Thanks @HyukjinKwon!
[GitHub] [spark] HeartSaVioR commented on a change in pull request #32653: [SPARK-35312][SS] Introduce new Option in Kafka source to specify minimum number of records to read per trigger
HeartSaVioR commented on a change in pull request #32653:
URL: https://github.com/apache/spark/pull/32653#discussion_r642765673

## File path: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
## @@ -139,26 +156,78 @@ private[kafka010] class KafkaSource(
   override def latestOffset(startOffset: streaming.Offset, limit: ReadLimit): streaming.Offset = {
     // Make sure initialPartitionOffsets is initialized
     initialPartitionOffsets
-
-    val latest = kafkaReader.fetchLatestOffsets(
-      currentPartitionOffsets.orElse(Some(initialPartitionOffsets)))
+    val currentOffsets = currentPartitionOffsets.orElse(Some(initialPartitionOffsets))
+    val latest = kafkaReader.fetchLatestOffsets(currentOffsets)
+    var skipBatch = false

Review comment: Same here as well.
[GitHub] [spark] HeartSaVioR commented on a change in pull request #32653: [SPARK-35312][SS] Introduce new Option in Kafka source to specify minimum number of records to read per trigger
HeartSaVioR commented on a change in pull request #32653:
URL: https://github.com/apache/spark/pull/32653#discussion_r642765440

## File path: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala
## @@ -95,15 +114,62 @@ private[kafka010] class KafkaMicroBatchStream(
   override def latestOffset(start: Offset, readLimit: ReadLimit): Offset = {
     val startPartitionOffsets = start.asInstanceOf[KafkaSourceOffset].partitionToOffsets
     latestPartitionOffsets = kafkaOffsetReader.fetchLatestOffsets(Some(startPartitionOffsets))
+    var skipBatch = false

Review comment: Now I see duplicated code caused by the branches that handle each type, including CompositeReadLimit, which handles both the lower and upper limits and therefore repeats the same code. How about changing it as below:

```
val limits: Seq[ReadLimit] = readLimit match {
  case rows: CompositeReadLimit => rows.getReadLimits
  case rows => Seq(rows)
}

val offsets = if (limits.exists(_.isInstanceOf[ReadAllAvailable])) {
  // ReadAllAvailable has the highest priority
  latestPartitionOffsets
} else {
  val lowerLimit = limits.find(_.isInstanceOf[ReadMinRows]).map(_.asInstanceOf[ReadMinRows])
  val upperLimit = limits.find(_.isInstanceOf[ReadMaxRows]).map(_.asInstanceOf[ReadMaxRows])

  lowerLimit.flatMap { limit =>
    // checking if we need to skip batch based on minOffsetPerTrigger criteria
    val skipBatch = delayBatch(
      limit.minRows, latestPartitionOffsets, startPartitionOffsets, limit.maxTriggerDelayMs)
    if (skipBatch) {
      logDebug(
        s"Delaying batch as number of records available is less than minOffsetsPerTrigger")
      Some(startPartitionOffsets)
    } else {
      None
    }
  }.orElse {
    // checking if we need to adjust a range of offsets based on maxOffsetPerTrigger criteria
    upperLimit.map { limit =>
      rateLimit(limit.maxRows(), startPartitionOffsets, latestPartitionOffsets)
    }
  }.getOrElse(latestPartitionOffsets)
}

endPartitionOffsets = KafkaSourceOffset(offsets)
endPartitionOffsets
```

This would require fewer changes when we want to add more read limits in the future.
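The priority order in the suggestion above (read everything if allowed, otherwise delay on the lower limit, otherwise cap on the upper limit) can be tried out in isolation. The following is a simplified, self-contained sketch: the `Limit` types are hypothetical stand-ins for Spark's `ReadLimit` classes, a single `Long` offset replaces the per-partition offset maps, and the maximum trigger delay handled by `delayBatch` is left out.

```scala
object ReadLimitSketch {
  // Simplified stand-ins for the ReadLimit hierarchy; not the real Spark classes.
  sealed trait Limit
  case object AllAvailable extends Limit
  final case class MinRows(rows: Long) extends Limit
  final case class MaxRows(rows: Long) extends Limit
  final case class Composite(limits: Seq[Limit]) extends Limit

  // Returns the end offset of the next batch for one partition.
  def endOffset(start: Long, latest: Long, limit: Limit): Long = {
    // Flatten a composite limit so every case goes through one code path.
    val limits: Seq[Limit] = limit match {
      case Composite(ls) => ls
      case single => Seq(single)
    }
    if (limits.contains(AllAvailable)) {
      latest // highest priority: read everything available
    } else {
      val lower = limits.collectFirst { case m: MinRows => m }
      val upper = limits.collectFirst { case m: MaxRows => m }
      lower
        .filter(l => latest - start < l.rows) // too few rows: delay the batch
        .map(_ => start)
        .orElse(upper.map(u => math.min(latest, start + u.rows))) // cap the batch size
        .getOrElse(latest)
    }
  }

  def main(args: Array[String]): Unit = {
    println(endOffset(0L, 100L, Composite(Seq(MinRows(10L), MaxRows(30L)))))
    println(endOffset(0L, 5L, Composite(Seq(MinRows(10L), MaxRows(30L)))))
  }
}
```

Because a composite limit is flattened into a sequence first, adding a new limit type in this sketch would mean one more lookup and one more link in the `orElse` chain rather than another branch per combination.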
[GitHub] [spark] SparkQA commented on pull request #32693: [SPARK-35556][SQL][TESTS] Avoid log NoSuchMethodError when HiveClientImpl.state close
SparkQA commented on pull request #32693: URL: https://github.com/apache/spark/pull/32693#issuecomment-851785773 **[Test build #139130 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139130/testReport)** for PR 32693 at commit [`698bea5`](https://github.com/apache/spark/commit/698bea5d49986f955c0736bff59ceb0c7c6051e8).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
AmplabJenkins removed a comment on pull request #32658: URL: https://github.com/apache/spark/pull/32658#issuecomment-851784991 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43646/
[GitHub] [spark] AmplabJenkins commented on pull request #32719: [SPARK-35584][TESTS] Increase the timeout in FallbackStorageSuite
AmplabJenkins commented on pull request #32719: URL: https://github.com/apache/spark/pull/32719#issuecomment-851784992 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43649/
[GitHub] [spark] AmplabJenkins commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
AmplabJenkins commented on pull request #32658: URL: https://github.com/apache/spark/pull/32658#issuecomment-851784991 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43646/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32719: [SPARK-35584][TESTS] Increase the timeout in FallbackStorageSuite
AmplabJenkins removed a comment on pull request #32719: URL: https://github.com/apache/spark/pull/32719#issuecomment-851784992 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43649/
[GitHub] [spark] SparkQA commented on pull request #32506: [SPARK-35374][SQL] Add string-to-number conversion support to JacksonParser
SparkQA commented on pull request #32506: URL: https://github.com/apache/spark/pull/32506#issuecomment-851784737 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43648/
[GitHub] [spark] SparkQA commented on pull request #32719: [SPARK-35584][TESTS] Increase the timeout in FallbackStorageSuite
SparkQA commented on pull request #32719: URL: https://github.com/apache/spark/pull/32719#issuecomment-851784608 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43649/
[GitHub] [spark] SparkQA commented on pull request #32718: [SPARK-21957][SQL] Support current_user and session_user functions
SparkQA commented on pull request #32718: URL: https://github.com/apache/spark/pull/32718#issuecomment-851782661 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43647/
[GitHub] [spark] gengliangwang closed pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
gengliangwang closed pull request #32686: URL: https://github.com/apache/spark/pull/32686
[GitHub] [spark] gengliangwang commented on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
gengliangwang commented on pull request #32686: URL: https://github.com/apache/spark/pull/32686#issuecomment-851781327 Thanks, merging to master
[GitHub] [spark] HyukjinKwon commented on a change in pull request #32718: [SPARK-21957][SQL] Support current_user and session_user functions
HyukjinKwon commented on a change in pull request #32718: URL: https://github.com/apache/spark/pull/32718#discussion_r642758972 ## File path: sql/core/src/test/scala/org/apache/spark/sql/MiscFunctionsSuite.scala ## @@ -40,6 +40,12 @@ class MiscFunctionsSuite extends QueryTest with SharedSparkSession { Row(SPARK_VERSION_SHORT + " " + SPARK_REVISION)) assert(df.schema.fieldNames === Seq("version()")) } + + test("get current_user and session_user in normal spark apps") { Review comment: shall we add the JIRA prefix?
[GitHub] [spark] HyukjinKwon commented on pull request #32709: [SPARK-35573][R][TESTS] Make SparkR tests pass with R 4.1+
HyukjinKwon commented on pull request #32709: URL: https://github.com/apache/spark/pull/32709#issuecomment-851778790 CRAN was my env issue. Now the tests and CRAN check should work with R 4.1+ too.
[GitHub] [spark] yaooqinn commented on a change in pull request #32714: [SPARK-35581][SQL] Support special datetime values in typed literals only
yaooqinn commented on a change in pull request #32714: URL: https://github.com/apache/spark/pull/32714#discussion_r642757369 ## File path: docs/sql-migration-guide.md ## @@ -91,6 +91,8 @@ license: | - In Spark 3.2, `CREATE TABLE AS SELECT` with non-empty `LOCATION` will throw `AnalysisException`. To restore the behavior before Spark 3.2, you can set `spark.sql.legacy.allowNonEmptyLocationInCTAS` to `true`. + - In Spark 3.2, the special datetime values such as `epoch`, `today`, `yesterday`, `tomorrow` and `now` are supported in typed literals only, for instance `select timestamp'now'`. In Spark 3.1 and earlier, such special values are supported in any casts of strings to dates/timestamps. To restore the behavior before Spark 3.2, you should preprocess string columns and convert the strings to desired timestamps explicitly using UDF for instance. Review comment: Suggested edits: drop "the" in "In Spark 3.2, the special datetime values"; add a comma after "for instance", before `select timestamp'now'`; and in "In Spark 3.1 and earlier", should "earlier" be "3.0"?
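The migration note above suggests preprocessing string columns and converting the special values explicitly. A minimal sketch of such a conversion follows, in plain Python. The helper name `resolve_special_datetime` and the exact meaning given to each keyword (midnight for `today`, and so on) are assumptions for illustration, not Spark's implementation; the function body is the kind of logic one might wrap in a UDF.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def resolve_special_datetime(value: str, now: datetime) -> Optional[datetime]:
    """Map a special datetime keyword to a concrete timestamp.

    Hypothetical helper: returns None for any string that is not one of
    the five keywords named in the migration note, so the caller can fall
    back to ordinary timestamp parsing.
    """
    key = value.strip().lower()
    # Assumption: "today" means midnight at the start of the current day.
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    special = {
        "epoch": datetime(1970, 1, 1, tzinfo=now.tzinfo),
        "now": now,
        "today": midnight,
        "yesterday": midnight - timedelta(days=1),
        "tomorrow": midnight + timedelta(days=1),
    }
    return special.get(key)


# Usage: pass the reference time explicitly so results are reproducible.
reference = datetime(2021, 6, 1, 12, 30, tzinfo=timezone.utc)
print(resolve_special_datetime("yesterday", reference))
```

Passing `now` as a parameter rather than reading the clock inside the function keeps the conversion deterministic within one query, which is one reasonable design choice among several.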
[GitHub] [spark] ulysses-you commented on a change in pull request #32602: [SPARK-35455][SQL] Unify empty relation optimization between normal and AQE optimizer
ulysses-you commented on a change in pull request #32602: URL: https://github.com/apache/spark/pull/32602#discussion_r642757227 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala ## @@ -27,7 +28,9 @@ import org.apache.spark.util.Utils */ class AQEOptimizer(conf: SQLConf) extends RuleExecutor[LogicalPlan] { private val defaultBatches = Seq( -Batch("Eliminate Unnecessary Join", Once, EliminateUnnecessaryJoin), +Batch("Propagate Empty Relations", Once, + AQEPropagateEmptyRelation, + UpdateAttributeNullability), Review comment: ah I see, will do this.
[GitHub] [spark] SparkQA commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
SparkQA commented on pull request #32658: URL: https://github.com/apache/spark/pull/32658#issuecomment-851775047 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43646/
[GitHub] [spark] HyukjinKwon commented on pull request #32719: [SPARK-34059][TESTS]Increase the timeout in FallbackStorageSuite
HyukjinKwon commented on pull request #32719: URL: https://github.com/apache/spark/pull/32719#issuecomment-851771579 seems like the JIRA number is wrong in the title
[GitHub] [spark] SparkQA commented on pull request #32719: [SPARK-34059][TESTS]Increase the timeout in FallbackStorageSuite
SparkQA commented on pull request #32719: URL: https://github.com/apache/spark/pull/32719#issuecomment-851769807 **[Test build #139129 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139129/testReport)** for PR 32719 at commit [`941ee9c`](https://github.com/apache/spark/commit/941ee9c1d04f9951598ed8bfb93b5bdaa2819e18).
[GitHub] [spark] Yikun opened a new pull request #32719: [SPARK-34059][TESTS]Increase the timeout in FallbackStorageSuite
Yikun opened a new pull request #32719: URL: https://github.com/apache/spark/pull/32719 ### What changes were proposed in this pull request? ``` - Upload multi stages *** FAILED *** {{ The code passed to eventually never returned normally. Attempted 20 times over 10.011176743 seconds. Last failure message: fallbackStorage.exists(0, file) was false. (FallbackStorageSuite.scala:243)}} ``` Errors like the one above were raised randomly on aarch64 and also in GitHub Actions tests [1][2]. [1] https://github.com/apache/spark/actions/runs/489319612 [2] https://github.com/apache/spark/actions/runs/479317320 ### Why are the changes needed? The timeout is too short and needs to be increased to let the test case complete. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? build/mvn test -Dtest=none -DwildcardSuites=org.apache.spark.storage.FallbackStorageSuite -pl :spark-core_2.12
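The failure quoted above comes from an `eventually` block that retries a check until a timeout elapses. As a rough illustration of why a short timeout makes a slow-but-correct test flaky, here is a small Python analogue. The function `eventually` below is a hypothetical stand-in written for this note, not ScalaTest's implementation, and the timeout values are arbitrary.

```python
import time


def eventually(check, timeout, interval):
    """Retry `check` until it stops raising AssertionError or time runs out.

    Hypothetical analogue of a retry-until-timeout test helper: `timeout`
    and `interval` are in seconds.
    """
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        attempts += 1
        try:
            return check()
        except AssertionError as exc:
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f"check never passed: attempted {attempts} times "
                    f"over {timeout} seconds"
                ) from exc
            time.sleep(interval)


# Usage: a condition that only becomes true on the third attempt passes
# as long as the timeout leaves room for enough retries.
state = {"calls": 0}


def file_exists_eventually():
    state["calls"] += 1
    assert state["calls"] >= 3, "file not uploaded yet"


eventually(file_exists_eventually, timeout=1.0, interval=0.01)
```

On a slower machine the condition simply takes longer to become true, so raising the timeout trades a longer worst-case test run for fewer spurious failures.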
[GitHub] [spark] cloud-fan commented on a change in pull request #32602: [SPARK-35455][SQL] Unify empty relation optimization between normal and AQE optimizer
cloud-fan commented on a change in pull request #32602: URL: https://github.com/apache/spark/pull/32602#discussion_r642749109 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala ## @@ -27,7 +28,9 @@ import org.apache.spark.util.Utils */ class AQEOptimizer(conf: SQLConf) extends RuleExecutor[LogicalPlan] { private val defaultBatches = Seq( -Batch("Eliminate Unnecessary Join", Once, EliminateUnnecessaryJoin), +Batch("Propagate Empty Relations", Once, + AQEPropagateEmptyRelation, + UpdateAttributeNullability), Review comment: It's a bit different: ``` Project Shuffle Stage ``` For the above case, we don't want to optimize it as the benefit is too small (removing a shuffle stage may cause regression). ``` Project Sort Shuffle Stage ``` For the above case, we will optimize Sort -> Shuffle Stage to empty relation first. Then it makes sense to optimize further and optimize out the project, as the shuffle stage is already gone. So adding `ConvertToLocalRelation` looks like the best solution here.
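The two cases in the comment above can be modelled with a toy plan tree. The sketch below is only one reading of the comment: the node classes and the `propagate_empty` function are hypothetical, written in Python for illustration, and do not reflect how `AQEPropagateEmptyRelation` or `ConvertToLocalRelation` are implemented in Spark.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class EmptyRelation:
    pass


@dataclass(frozen=True)
class ShuffleStage:
    row_count: int  # rows produced by the materialized stage


@dataclass(frozen=True)
class Sort:
    child: "Plan"


@dataclass(frozen=True)
class Project:
    child: "Plan"


Plan = Union[EmptyRelation, ShuffleStage, Sort, Project]


def propagate_empty(plan):
    """Toy bottom-up rewrite for the two cases discussed in the review."""
    if isinstance(plan, Sort):
        child = propagate_empty(plan.child)
        # Assumption: a Sort over an empty stage collapses to an empty relation.
        stage_is_empty = isinstance(child, ShuffleStage) and child.row_count == 0
        if isinstance(child, EmptyRelation) or stage_is_empty:
            return EmptyRelation()
        return Sort(child)
    if isinstance(plan, Project):
        child = propagate_empty(plan.child)
        # A Project is removed only when its child is already an empty
        # relation; a Project directly over a shuffle stage is left alone.
        if isinstance(child, EmptyRelation):
            return EmptyRelation()
        return Project(child)
    return plan


# Usage: the first plan is kept as-is, the second collapses entirely.
print(propagate_empty(Project(ShuffleStage(0))))
print(propagate_empty(Project(Sort(ShuffleStage(0)))))
```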
[GitHub] [spark] SparkQA commented on pull request #32506: [SPARK-35374][SQL] Add string-to-number conversion support to JacksonParser
SparkQA commented on pull request #32506: URL: https://github.com/apache/spark/pull/32506#issuecomment-851767556 **[Test build #139128 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139128/testReport)** for PR 32506 at commit [`a361275`](https://github.com/apache/spark/commit/a36127512f4f5eadd9f0b9c9f9b0c3ef90b155e3).
[GitHub] [spark] SparkQA commented on pull request #32718: [SPARK-21957][SQL] Support current_user and session_user functions
SparkQA commented on pull request #32718: URL: https://github.com/apache/spark/pull/32718#issuecomment-851767500 **[Test build #139127 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139127/testReport)** for PR 32718 at commit [`ae337c1`](https://github.com/apache/spark/commit/ae337c13b7648c2011976eb8bef4fd8e67fcf44d).
[GitHub] [spark] cloud-fan commented on a change in pull request #32602: [SPARK-35455][SQL] Unify empty relation optimization between normal and AQE optimizer
cloud-fan commented on a change in pull request #32602: URL: https://github.com/apache/spark/pull/32602#discussion_r642749109 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala ## @@ -27,7 +28,9 @@ import org.apache.spark.util.Utils */ class AQEOptimizer(conf: SQLConf) extends RuleExecutor[LogicalPlan] { private val defaultBatches = Seq( -Batch("Eliminate Unnecessary Join", Once, EliminateUnnecessaryJoin), +Batch("Propagate Empty Relations", Once, + AQEPropagateEmptyRelation, + UpdateAttributeNullability), Review comment: It's a bit different: ``` Project Shuffle Stage ``` For the above case, we don't want to optimize it as the benefit is too small. ``` Project Sort Shuffle Stage ``` For the above case, we will optimize Sort -> Shuffle Stage to empty relation first. Then it makes sense to optimize further and optimize out the project, as the shuffle stage is already gone. So adding `ConvertToLocalRelation` looks like the best solution here.
[GitHub] [spark] yaooqinn commented on pull request #32718: [SPARK-21957][SQL] Support current_user and session_user functions
yaooqinn commented on pull request #32718: URL: https://github.com/apache/spark/pull/32718#issuecomment-851766836 cc @cloud-fan @wangyum @maropu thanks very much
[GitHub] [spark] yaooqinn opened a new pull request #32718: [SPARK-21957][SQL] Support current_user and session_user functions
yaooqinn opened a new pull request #32718: URL: https://github.com/apache/spark/pull/32718 ### What changes were proposed in this pull request? Currently, we do not have a suitable definition of the `user` concept in Spark. We only have an application-wide `sparkUser` and do not support identifying or retrieving the user information from a session in STS or during runtime query execution. `current_user()` and `session_user()` are very popular SQL functions, supported by plenty of other databases, both modern and old-school, and they also matter for compliance. This PR adds `current_user()` and `session_user()` as SQL functions; the two are equivalent. In this PR, we add these functions without ambiguity. 1. For a normal single-threaded Spark application, clearly the `sparkUser` is always equivalent to `current_user()` and `session_user()`. 2. For a multi-threaded Spark application, e.g. Spark thrift server, we use a `ThreadLocal` variable to store the client-side user (after authentication) before running the query and retrieve it in the parser. ### Why are the changes needed? These SQL functions are very popular, supported by plenty of other databases, both modern and old-school, and they also matter for compliance. ### Does this PR introduce _any_ user-facing change? Yes, this PR adds `current_user()` and `session_user()` as SQL functions. ### How was this patch tested? New tests in thrift server and sql/catalyst.
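The thread-local approach described in point 2 above can be sketched outside Spark. The example below uses Python's `threading.local` as an analogue of a JVM `ThreadLocal`; the names `APP_USER`, `set_session_user`, `current_user` and `run_query_as` are hypothetical and chosen only for this illustration, not taken from the PR.

```python
import threading

# Assumption: a single application-wide user, playing the role of sparkUser.
APP_USER = "spark_app_user"

# Per-thread storage for the authenticated client-side user.
_session = threading.local()


def set_session_user(user):
    """Record the authenticated user for the current thread only."""
    _session.user = user


def current_user():
    """Return the thread's session user, or the app-wide user if unset."""
    return getattr(_session, "user", APP_USER)


def run_query_as(user, results):
    # Store the user before "running the query", then read it back the way
    # a parser running on the same thread would.
    set_session_user(user)
    results[user] = current_user()


# Usage: two client sessions on separate threads do not see each other's user.
results = {}
threads = [
    threading.Thread(target=run_query_as, args=(name, results))
    for name in ("alice", "bob")
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The main thread never calls `set_session_user`, so it falls back to the application-wide user, which mirrors the single-threaded case in point 1.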
[GitHub] [spark] ulysses-you commented on a change in pull request #32602: [SPARK-35455][SQL] Unify empty relation optimization between normal and AQE optimizer
ulysses-you commented on a change in pull request #32602: URL: https://github.com/apache/spark/pull/32602#discussion_r642747242 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala ## @@ -27,7 +28,9 @@ import org.apache.spark.util.Utils */ class AQEOptimizer(conf: SQLConf) extends RuleExecutor[LogicalPlan] { private val defaultBatches = Seq( -Batch("Eliminate Unnecessary Join", Once, EliminateUnnecessaryJoin), +Batch("Propagate Empty Relations", Once, + AQEPropagateEmptyRelation, + UpdateAttributeNullability), Review comment: Yeah, I noticed it. We can add it so that we can propagate empty relations through `project/filter`, as in this case: ``` Aggregate Project Join Shuffle ``` But it needs to isolate normal and AQE due to `transformWithPruning`. On the other hand, I feel that it's similar if we just let `AQEPropagateEmptyRelation` support propagating through `project/filter`, and the latter is simpler.
[GitHub] [spark] SparkQA commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
SparkQA commented on pull request #32658: URL: https://github.com/apache/spark/pull/32658#issuecomment-851763525 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43646/
[GitHub] [spark] HyukjinKwon commented on pull request #32715: [SPARK-35577][TESTS] Allow to log container output for docker integration tests
HyukjinKwon commented on pull request #32715: URL: https://github.com/apache/spark/pull/32715#issuecomment-851751136 Looks fine. cc @maropu
[GitHub] [spark] HyukjinKwon closed pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
HyukjinKwon closed pull request #32658: URL: https://github.com/apache/spark/pull/32658
[GitHub] [spark] HyukjinKwon commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
HyukjinKwon commented on pull request #32658: URL: https://github.com/apache/spark/pull/32658#issuecomment-851750789 Merged to master.
[GitHub] [spark] HyukjinKwon commented on pull request #32558: [SPARK-34953][CORE][SQL] Add the code change for adding the DateType in the infer schema while reading in CSV and JSON
HyukjinKwon commented on pull request #32558: URL: https://github.com/apache/spark/pull/32558#issuecomment-851750660 Oh I meant this: https://github.com/apache/spark/blob/master/python/pyspark/sql/readwriter.py#L342-L350 These options are listed as parameters on the Python side specifically. The CSV documentation was merged in https://github.com/apache/spark/pull/32658, so you could add the option to that page.
[GitHub] [spark] SparkQA commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
SparkQA commented on pull request #32658: URL: https://github.com/apache/spark/pull/32658#issuecomment-851749314 **[Test build #139126 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139126/testReport)** for PR 32658 at commit [`f55a2fa`](https://github.com/apache/spark/commit/f55a2fa22efd4ac7611d0483b82dd73596bccce7).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
AmplabJenkins removed a comment on pull request #32686: URL: https://github.com/apache/spark/pull/32686#issuecomment-851748863 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43645/
[GitHub] [spark] AmplabJenkins commented on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
AmplabJenkins commented on pull request #32686: URL: https://github.com/apache/spark/pull/32686#issuecomment-851748863 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43645/
[GitHub] [spark] HyukjinKwon closed pull request #32716: [SPARK-35578][SQL][TEST] Add a test case for a bug in janino
HyukjinKwon closed pull request #32716: URL: https://github.com/apache/spark/pull/32716
[GitHub] [spark] HyukjinKwon commented on pull request #32716: [SPARK-35578][SQL][TEST] Add a test case for a bug in janino
HyukjinKwon commented on pull request #32716: URL: https://github.com/apache/spark/pull/32716#issuecomment-851748664 Merged to master.
[GitHub] [spark] HyukjinKwon commented on pull request #32709: [SPARK-35573][R][TESTS] Make SparkR tests pass with R 4.1+
HyukjinKwon commented on pull request #32709: URL: https://github.com/apache/spark/pull/32709#issuecomment-851745373 I have backported it to branch-3.1 and branch-3.0 too because this is a test-only change, and in case other people run the tests with newer R versions.
[GitHub] [spark] HyukjinKwon closed pull request #32709: [SPARK-35573][R][TESTS] Make SparkR tests pass with R 4.1+
HyukjinKwon closed pull request #32709: URL: https://github.com/apache/spark/pull/32709
[GitHub] [spark] HyukjinKwon commented on pull request #32709: [SPARK-35573][R][TESTS] Make SparkR tests pass with R 4.1+
HyukjinKwon commented on pull request #32709: URL: https://github.com/apache/spark/pull/32709#issuecomment-851744847
[GitHub] [spark] HyukjinKwon closed pull request #32674: [SPARK-35453][PYTHON] Move Koalas accessor to pandas_on_spark accessor
HyukjinKwon closed pull request #32674: URL: https://github.com/apache/spark/pull/32674
[GitHub] [spark] HyukjinKwon commented on pull request #32674: [SPARK-35453][PYTHON] Move Koalas accessor to pandas_on_spark accessor
HyukjinKwon commented on pull request #32674: URL: https://github.com/apache/spark/pull/32674#issuecomment-851744212 Merged to master.
[GitHub] [spark] SparkQA commented on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
SparkQA commented on pull request #32686: URL: https://github.com/apache/spark/pull/32686#issuecomment-851743806 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43645/
[GitHub] [spark] HyukjinKwon commented on a change in pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
HyukjinKwon commented on a change in pull request #32658: URL: https://github.com/apache/spark/pull/32658#discussion_r642727555 ## File path: docs/sql-data-sources-csv.md ## @@ -195,7 +195,7 @@ Data source options of CSV can be set via: multiLine false -Parse one record, which may span multiple lines, per file. Review comment: let's also keep `, per file`
[GitHub] [spark] HyukjinKwon commented on a change in pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
HyukjinKwon commented on a change in pull request #32658: URL: https://github.com/apache/spark/pull/32658#discussion_r642727463 ## File path: docs/sql-data-sources-csv.md ## @@ -92,14 +92,14 @@ Data source options of CSV can be set via: comment -empty string + Sets a single character used for skipping lines beginning with this character. By default, it is disabled. read header false -For reading, uses the first line as names of columns. For writing, writes the names of columns as the first line. Note that if the given path is a RDD of Strings, this header option will remove all lines same with the header if exists. Review comment: Let's keep this note: Note that if the given path is a RDD of Strings, this header option will remove all lines same with the header if exists.
[GitHub] [spark] kiszk commented on pull request #32716: [SPARK-35578][SQL][TEST] Add a test case for a bug in janino
kiszk commented on pull request #32716: URL: https://github.com/apache/spark/pull/32716#issuecomment-851743316 Good catch and good reproducible test.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32714: [SPARK-35581][SQL] Support special datetime values in typed literals only
AmplabJenkins removed a comment on pull request #32714: URL: https://github.com/apache/spark/pull/32714#issuecomment-851738273 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139124/
[GitHub] [spark] AmplabJenkins commented on pull request #32714: [SPARK-35581][SQL] Support special datetime values in typed literals only
AmplabJenkins commented on pull request #32714: URL: https://github.com/apache/spark/pull/32714#issuecomment-851738273 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139124/
[GitHub] [spark] SparkQA removed a comment on pull request #32714: [SPARK-35581][SQL] Support special datetime values in typed literals only
SparkQA removed a comment on pull request #32714: URL: https://github.com/apache/spark/pull/32714#issuecomment-851678599 **[Test build #139124 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139124/testReport)** for PR 32714 at commit [`33b5ce3`](https://github.com/apache/spark/commit/33b5ce30b2d94455ae027e725e28c5c1101b42ec).
[GitHub] [spark] SparkQA commented on pull request #32714: [SPARK-35581][SQL] Support special datetime values in typed literals only
SparkQA commented on pull request #32714: URL: https://github.com/apache/spark/pull/32714#issuecomment-851737748 **[Test build #139124 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139124/testReport)** for PR 32714 at commit [`33b5ce3`](https://github.com/apache/spark/commit/33b5ce30b2d94455ae027e725e28c5c1101b42ec). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] HeartSaVioR commented on pull request #32702: [SPARK-35565][SS] Add config for ignoring metadata directory of FileStreamSink
HeartSaVioR commented on pull request #32702: URL: https://github.com/apache/spark/pull/32702#issuecomment-851736620 Now I think it should be a source option. Given the impact, users should know what they are doing in their code, rather than relying on a configuration that can come from multiple places, even from cluster-level config.
[GitHub] [spark] srowen commented on pull request #32700: [SPARK-35558] Optimizes for multi-quantile retrieval
srowen commented on pull request #32700: URL: https://github.com/apache/spark/pull/32700#issuecomment-851736329 Not sure if it's definitely related, but it looks like this results in tests that hang forever: `[info] *** Test still running after 16 minutes, 2 seconds: suite name: AdaptiveQueryExecSuite, test name: SPARK-33933: Materialize BroadcastQueryStage first in AQE. ` Not 100% sure how it's connected, but, doesn't seem to be happening on other PRs?
[GitHub] [spark] maropu commented on pull request #32716: [SPARK-35578][SQL][TEST] Add a test case for a bug in janino
maropu commented on pull request #32716: URL: https://github.com/apache/spark/pull/32716#issuecomment-851734876 @cloud-fan Thanks for sharing this test case! Okay, I'll look into the janino code to check if we could fix the bug there. Anyway, adding this test case into master looks fine to me.
[GitHub] [spark] SparkQA commented on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
SparkQA commented on pull request #32686: URL: https://github.com/apache/spark/pull/32686#issuecomment-851734216 **[Test build #139125 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139125/testReport)** for PR 32686 at commit [`8252a6a`](https://github.com/apache/spark/commit/8252a6a93a05c97ed47e3174be76fe1aeb3f6567).
[GitHub] [spark] AmplabJenkins commented on pull request #32717: [SPARK-35396]Manual close for CachedBatch in InMemoryRelation
AmplabJenkins commented on pull request #32717: URL: https://github.com/apache/spark/pull/32717#issuecomment-851734113 Can one of the admins verify this patch?
[GitHub] [spark] xuechendi opened a new pull request #32717: [SPARK-35396]Manual close for CachedBatch in InMemoryRelation
xuechendi opened a new pull request #32717: URL: https://github.com/apache/spark/pull/32717 Fixed: https://issues.apache.org/jira/browse/SPARK-35396 Signed-off-by: Chendi Xue ### What changes were proposed in this pull request? This PR manually closes objects that may not be released by GC, for example Arrow-allocated memory or other native objects. ### Why are the changes needed? Added a case match in the InMemoryRelation `clearCache` function: if an object extends AutoCloseable, its close function is called manually in case there is additional memory that should be released manually. So one can implement a CachedBatch that extends AutoCloseable to indicate that the object requires extra release. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? UT is added, org.apache.spark.sql.execution.columnar.RefCountedTestCachedBatchSerializerSuite
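The case match described in this PR can be sketched roughly as follows. This is a standalone illustration, not the patch itself: `Batch`, `NativeBatch`, `HeapBatch`, and `clearBatches` are hypothetical stand-ins for Spark's `CachedBatch` and the `clearCache` logic, and only `AutoCloseable` is a real JDK interface.

```scala
object CloseSketch {
  // Hypothetical stand-in for a cached batch; not Spark's CachedBatch.
  trait Batch { def numRows: Int }

  // A batch that holds a native resource and must be released explicitly.
  final class NativeBatch(val numRows: Int) extends Batch with AutoCloseable {
    var closed: Boolean = false
    override def close(): Unit = { closed = true }
  }

  // A plain on-heap batch that the GC can reclaim on its own.
  final case class HeapBatch(numRows: Int) extends Batch

  // Close every batch that opts in via AutoCloseable; leave the rest to GC.
  def clearBatches(batches: Seq[Batch]): Unit = batches.foreach {
    case c: AutoCloseable => c.close()
    case _                => ()
  }
}
```

Matching on the interface rather than on a concrete class means a batch implementation opts in simply by extending `AutoCloseable`, which is the design the PR description points at.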
[GitHub] [spark] sigmod commented on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
sigmod commented on pull request #32686: URL: https://github.com/apache/spark/pull/32686#issuecomment-851729720 > One small ergonomic comment. It would be great if we can create some shorthand for the function closures. I would probably make the individual value be a matcher for itself (if Enumeration allows subclassing of the Value class), and create a bunch of functions that allow you to compose them, e.g.: `any`, `all`, ... I'm not sure what the transformWithPruning interface exactly looks like. IIUC, transformWithPruning may still not be able to just take a `composed pattern` instead of a lambda, because we also have `and`, `or`, `not` over `all`, `any` -- even though they're not frequent. If we'd like to put `and`, `or`, `not` into patterns, it sounds a bit complex, as we need to be able to process a tree of such compositions. Anyway, thanks for the suggestion. I'll think about whether there's a simpler approach and may address it in subsequent PRs.
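To make the "tree of such compositions" concrete, here is a minimal sketch of what composable matchers could look like. It is an assumption for illustration only: `Matcher`, `Has`, `AllOf`, `AnyOf`, and `Not` are hypothetical names, patterns are modeled as plain strings, and none of this is the interface proposed or merged in the PR.

```scala
object PatternSketch {
  // A matcher decides whether a set of pattern tags satisfies it.
  sealed trait Matcher { def matches(present: Set[String]): Boolean }

  // Leaf matcher: a single pattern acts as a matcher for itself.
  final case class Has(pattern: String) extends Matcher {
    def matches(present: Set[String]): Boolean = present.contains(pattern)
  }

  // Combinators; evaluating them walks the composition tree recursively.
  final case class AllOf(ms: Matcher*) extends Matcher {
    def matches(present: Set[String]): Boolean = ms.forall(_.matches(present))
  }
  final case class AnyOf(ms: Matcher*) extends Matcher {
    def matches(present: Set[String]): Boolean = ms.exists(_.matches(present))
  }
  final case class Not(m: Matcher) extends Matcher {
    def matches(present: Set[String]): Boolean = !m.matches(present)
  }
}
```

A lambda such as `_.containsAllPatterns(AGGREGATE, UNRESOLVED_ATTRIBUTE)` would correspond to `AllOf(Has("AGGREGATE"), Has("UNRESOLVED_ATTRIBUTE"))` in this model; the extra cost is the recursive evaluation that the comment above calls out.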
[GitHub] [spark] sigmod commented on a change in pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
sigmod commented on a change in pull request #32686: URL: https://github.com/apache/spark/pull/32686#discussion_r642711834 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala ## @@ -117,6 +120,7 @@ case class AggregateExpression( UnresolvedAttribute(aggregateFunction.toString) } + Review comment: Done.
[GitHub] [spark] sigmod commented on a change in pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
sigmod commented on a change in pull request #32686: URL: https://github.com/apache/spark/pull/32686#discussion_r642711661 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -3736,7 +3744,8 @@ object EliminateUnions extends Rule[LogicalPlan] { * rule can't work for those parameters. */ object CleanupAliases extends Rule[LogicalPlan] with AliasHelper { - override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp { + override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUpWithPruning( +_.containsPattern(ALIAS)) { Review comment: Done. Thanks for the catch!
[GitHub] [spark] sigmod commented on a change in pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
sigmod commented on a change in pull request #32686: URL: https://github.com/apache/spark/pull/32686#discussion_r642711608 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -423,7 +424,9 @@ class Analyzer(override val catalogManager: CatalogManager) */ object ResolveAliases extends Rule[LogicalPlan] { private def assignAliases(exprs: Seq[NamedExpression]) = { - exprs.map(_.transformUp { case u @ UnresolvedAlias(child, optGenAliasFunc) => + exprs.map(_.transformUpWithPruning(_.containsPattern(UNRESOLVED_ALIAS)) +{ Review comment: Done. ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -1876,7 +1879,7 @@ class Analyzer(override val catalogManager: CatalogManager) private def allowGroupByAlias: Boolean = conf.groupByAliases && !conf.ansiEnabled override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUpWithPruning( - AlwaysProcess.fn, ruleId) { + _.containsAllPatterns(AGGREGATE, UNRESOLVED_ATTRIBUTE), ruleId) { Review comment: Done.
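The idea behind `transformUpWithPruning(_.containsPattern(...))` in these hunks can be modeled on a toy tree. The sketch below is an assumption-laden illustration, not Catalyst's `TreeNode`: `Node`, `Leaf`, `Branch`, string-valued tags, and this standalone `transformUpWithPruning` are all hypothetical.

```scala
object PruningSketch {
  // Hypothetical mini tree; each node carries one tag and knows which tags
  // occur anywhere in its subtree.
  sealed trait Node {
    def tag: String
    def children: Seq[Node]
    def withChildren(cs: Seq[Node]): Node
    // Computed lazily, once per node.
    lazy val patterns: Set[String] = {
      val fromChildren: Set[String] = children.flatMap(_.patterns).toSet
      fromChildren + tag
    }
    def containsPattern(p: String): Boolean = patterns.contains(p)
  }
  final case class Leaf(tag: String) extends Node {
    val children: Seq[Node] = Nil
    def withChildren(cs: Seq[Node]): Node = this
  }
  final case class Branch(tag: String, children: Seq[Node]) extends Node {
    def withChildren(cs: Seq[Node]): Node = copy(children = cs)
  }

  // Bottom-up rewrite that skips any subtree for which `cond` is false.
  def transformUpWithPruning(n: Node, cond: Node => Boolean)(
      rule: PartialFunction[Node, Node]): Node =
    if (!cond(n)) n // pruned: the subtree is returned untouched
    else {
      val rewritten =
        n.withChildren(n.children.map(transformUpWithPruning(_, cond)(rule)))
      if (rule.isDefinedAt(rewritten)) rule(rewritten) else rewritten
    }
}
```

In this model a subtree that contains no `UNRESOLVED_ALIAS` tag is returned as the same object without being traversed, which is the saving the pruning condition is meant to buy.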
[GitHub] [spark] HyukjinKwon commented on a change in pull request #32649: [SPARK-35497][PYTHON] Enable plotly tests in pandas-on-Spark
HyukjinKwon commented on a change in pull request #32649: URL: https://github.com/apache/spark/pull/32649#discussion_r642711442 ## File path: .github/workflows/build_and_test.yml ## @@ -215,7 +215,7 @@ jobs: # Ubuntu 20.04. See also SPARK-33162. - name: Install Python packages (Python 3.6) run: | -python3.6 -m pip install numpy 'pyarrow<3.0.0' pandas scipy xmlrunner +python3.6 -m pip install numpy 'pyarrow<3.0.0' pandas scipy xmlrunner plotly>=4.8 Review comment: Oh, I should have clarified it. Now Python 3.9 has to have this, since we don't run the PySpark tests with Python 3.8 anymore in the master branch, and pandas-on-Spark is only in the master branch.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32298: [WIP][SPARK-34079][SQL] Merge non-correlated scalar subqueries to multi-column scalar subqueries for better reuse
AmplabJenkins removed a comment on pull request #32298: URL: https://github.com/apache/spark/pull/32298#issuecomment-851723175 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139122/
[GitHub] [spark] AmplabJenkins commented on pull request #32298: [WIP][SPARK-34079][SQL] Merge non-correlated scalar subqueries to multi-column scalar subqueries for better reuse
AmplabJenkins commented on pull request #32298: URL: https://github.com/apache/spark/pull/32298#issuecomment-851723175 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139122/
[GitHub] [spark] SparkQA removed a comment on pull request #32298: [WIP][SPARK-34079][SQL] Merge non-correlated scalar subqueries to multi-column scalar subqueries for better reuse
SparkQA removed a comment on pull request #32298: URL: https://github.com/apache/spark/pull/32298#issuecomment-851654123 **[Test build #139122 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139122/testReport)** for PR 32298 at commit [`9d8dd6b`](https://github.com/apache/spark/commit/9d8dd6bc7bca56a11878dcccb5a5186d09e9f67b).
[GitHub] [spark] SparkQA commented on pull request #32298: [WIP][SPARK-34079][SQL] Merge non-correlated scalar subqueries to multi-column scalar subqueries for better reuse
SparkQA commented on pull request #32298: URL: https://github.com/apache/spark/pull/32298#issuecomment-851722508 **[Test build #139122 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139122/testReport)** for PR 32298 at commit [`9d8dd6b`](https://github.com/apache/spark/commit/9d8dd6bc7bca56a11878dcccb5a5186d09e9f67b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32649: [SPARK-35497][PYTHON] Enable plotly tests in pandas-on-Spark
dongjoon-hyun commented on a change in pull request #32649: URL: https://github.com/apache/spark/pull/32649#discussion_r642703204 ## File path: .github/workflows/build_and_test.yml ## @@ -215,7 +215,7 @@ jobs: # Ubuntu 20.04. See also SPARK-33162. - name: Install Python packages (Python 3.6) run: | -python3.6 -m pip install numpy 'pyarrow<3.0.0' pandas scipy xmlrunner +python3.6 -m pip install numpy 'pyarrow<3.0.0' pandas scipy xmlrunner plotly>=4.8 Review comment: Oh, I missed this comment last week. I only added plotly to Python 3.9 for now. I will add it to Python 3.8 soon.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32114: [SPARK-35011][CORE] Avoid Block Manager registrations when StopExecutor msg is in-flight
AmplabJenkins removed a comment on pull request #32114: URL: https://github.com/apache/spark/pull/32114#issuecomment-851683239
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32714: [SPARK-35581][SQL] Support special datetime values in typed literals only
AmplabJenkins removed a comment on pull request #32714: URL: https://github.com/apache/spark/pull/32714#issuecomment-851708124 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43644/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32700: [SPARK-35558] Optimizes for multi-quantile retrieval
AmplabJenkins removed a comment on pull request #32700: URL: https://github.com/apache/spark/pull/32700#issuecomment-851708118 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139118/
[GitHub] [spark] AmplabJenkins commented on pull request #32714: [SPARK-35581][SQL] Support special datetime values in typed literals only
AmplabJenkins commented on pull request #32714: URL: https://github.com/apache/spark/pull/32714#issuecomment-851708124 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43644/
[GitHub] [spark] AmplabJenkins commented on pull request #32700: [SPARK-35558] Optimizes for multi-quantile retrieval
AmplabJenkins commented on pull request #32700: URL: https://github.com/apache/spark/pull/32700#issuecomment-851708118 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139118/
[GitHub] [spark] AmplabJenkins commented on pull request #32114: [SPARK-35011][CORE] Avoid Block Manager registrations when StopExecutor msg is in-flight
AmplabJenkins commented on pull request #32114: URL: https://github.com/apache/spark/pull/32114#issuecomment-851708120 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139123/
[GitHub] [spark] SparkQA removed a comment on pull request #32700: [SPARK-35558] Optimizes for multi-quantile retrieval
SparkQA removed a comment on pull request #32700: URL: https://github.com/apache/spark/pull/32700#issuecomment-851513136 **[Test build #139118 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139118/testReport)** for PR 32700 at commit [`8d33ba9`](https://github.com/apache/spark/commit/8d33ba9cfbf6645a60419aed11c5b309434e994e).
[GitHub] [spark] SparkQA commented on pull request #32700: [SPARK-35558] Optimizes for multi-quantile retrieval
SparkQA commented on pull request #32700: URL: https://github.com/apache/spark/pull/32700#issuecomment-851703086 **[Test build #139118 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139118/testReport)** for PR 32700 at commit [`8d33ba9`](https://github.com/apache/spark/commit/8d33ba9cfbf6645a60419aed11c5b309434e994e). * This patch **fails from timeout after a configured wait of `500m`**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #32114: [SPARK-35011][CORE] Avoid Block Manager registrations when StopExecutor msg is in-flight
SparkQA removed a comment on pull request #32114: URL: https://github.com/apache/spark/pull/32114#issuecomment-851655230 **[Test build #139123 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139123/testReport)** for PR 32114 at commit [`2c7a439`](https://github.com/apache/spark/commit/2c7a4395c3dc75ff803b37a29541292104c53cb7).
[GitHub] [spark] SparkQA commented on pull request #32114: [SPARK-35011][CORE] Avoid Block Manager registrations when StopExecutor msg is in-flight
SparkQA commented on pull request #32114: URL: https://github.com/apache/spark/pull/32114#issuecomment-851701808 **[Test build #139123 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139123/testReport)** for PR 32114 at commit [`2c7a439`](https://github.com/apache/spark/commit/2c7a4395c3dc75ff803b37a29541292104c53cb7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.