[GitHub] [spark] SparkQA commented on pull request #32722: [SPARK-35586][[K8S][TESTS] Set a default value for spark.kubernetes.test.sparkTgz in pom.xml for Kubernetes integration tests

2021-05-31 Thread GitBox


SparkQA commented on pull request #32722:
URL: https://github.com/apache/spark/pull/32722#issuecomment-851803720


   **[Test build #139132 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139132/testReport)** for PR 32722 at commit [`f2d9f30`](https://github.com/apache/spark/commit/f2d9f30e2f84fcc3fd692daf31934b568134a56c).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32718: [SPARK-21957][SQL] Support current_user and session_user functions

2021-05-31 Thread GitBox


AmplabJenkins removed a comment on pull request #32718:
URL: https://github.com/apache/spark/pull/32718#issuecomment-851802395


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43647/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules

2021-05-31 Thread GitBox


AmplabJenkins removed a comment on pull request #32686:
URL: https://github.com/apache/spark/pull/32686#issuecomment-851802400


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139125/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32720: [SPARK-35576][SQL][3.1] Redact the sensitive info in the result of Set command

2021-05-31 Thread GitBox


AmplabJenkins removed a comment on pull request #32720:
URL: https://github.com/apache/spark/pull/32720#issuecomment-851802399


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43651/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32693: [SPARK-35556][SQL][TESTS] Avoid log NoSuchMethodError when HiveClientImpl.state close

2021-05-31 Thread GitBox


AmplabJenkins removed a comment on pull request #32693:
URL: https://github.com/apache/spark/pull/32693#issuecomment-851802397


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43650/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32506: [SPARK-35374][SQL] Add string-to-number conversion support to JacksonParser

2021-05-31 Thread GitBox


AmplabJenkins removed a comment on pull request #32506:
URL: https://github.com/apache/spark/pull/32506#issuecomment-851802398


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43648/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules

2021-05-31 Thread GitBox


AmplabJenkins commented on pull request #32686:
URL: https://github.com/apache/spark/pull/32686#issuecomment-851802400


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139125/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32720: [SPARK-35576][SQL][3.1] Redact the sensitive info in the result of Set command

2021-05-31 Thread GitBox


AmplabJenkins commented on pull request #32720:
URL: https://github.com/apache/spark/pull/32720#issuecomment-851802399


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43651/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32718: [SPARK-21957][SQL] Support current_user and session_user functions

2021-05-31 Thread GitBox


AmplabJenkins commented on pull request #32718:
URL: https://github.com/apache/spark/pull/32718#issuecomment-851802395


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43647/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32693: [SPARK-35556][SQL][TESTS] Avoid log NoSuchMethodError when HiveClientImpl.state close

2021-05-31 Thread GitBox


AmplabJenkins commented on pull request #32693:
URL: https://github.com/apache/spark/pull/32693#issuecomment-851802397


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43650/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32506: [SPARK-35374][SQL] Add string-to-number conversion support to JacksonParser

2021-05-31 Thread GitBox


AmplabJenkins commented on pull request #32506:
URL: https://github.com/apache/spark/pull/32506#issuecomment-851802398


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43648/
   





[GitHub] [spark] viirya commented on pull request #32702: [SPARK-35565][SS] Add config for ignoring metadata directory of FileStreamSink

2021-05-31 Thread GitBox


viirya commented on pull request #32702:
URL: https://github.com/apache/spark/pull/32702#issuecomment-851802006


   Okay, sounds good. Let me change to using a source option.





[GitHub] [spark] SparkQA commented on pull request #32720: [SPARK-35576][SQL][3.1] Redact the sensitive info in the result of Set command

2021-05-31 Thread GitBox


SparkQA commented on pull request #32720:
URL: https://github.com/apache/spark/pull/32720#issuecomment-851801961


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43651/
   





[GitHub] [spark] sarutak opened a new pull request #32722: [SPARK-35586][[K8S][TESTS] Set a default value for spark.kubernetes.test.sparkTgz in pom.xml for Kubernetes integration tests

2021-05-31 Thread GitBox


sarutak opened a new pull request #32722:
URL: https://github.com/apache/spark/pull/32722


   ### What changes were proposed in this pull request?
   
   This PR sets a default value for `spark.kubernetes.test.sparkTgz` in `kubernetes/integration-tests/pom.xml` for the Kubernetes integration tests.
   
   ### Why are the changes needed?
   
   In the current master, running the integration tests with the following 
command will fail because there is no default value set for the property.
   ```
   build/mvn -Dspark.kubernetes.test.namespace=default -Pkubernetes -Pkubernetes-integration-tests -Psparkr -pl resource-managers/kubernetes/integration-tests integration-test
   ```
   ```
   + mkdir -p /home/kou/work/oss/spark/resource-managers/kubernetes/integration-tests/target/spark-dist-unpacked
   + tar -xzvf --test-exclude-tags --strip-components=1 -C /home/kou/work/oss/spark/resource-managers/kubernetes/integration-tests/target/spark-dist-unpacked
   tar (child): --test-exclude-tags: Cannot open: No such file or directory
   tar (child): Error is not recoverable: exiting now
   tar: Child returned status 2
   tar: Error is not recoverable: exiting now
   [ERROR] Command execution failed.
   ```
   
   According to `setup-integration-test-env.sh`, `N/A` is intended as the default value, so this PR chooses it.
   ```
   SPARK_TGZ="N/A"
   MVN="$TEST_ROOT_DIR/build/mvn"
   EXCLUDE_TAGS=""
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   The build and tests finish successfully with the command shown above.





[GitHub] [spark] SparkQA commented on pull request #32693: [SPARK-35556][SQL][TESTS] Avoid log NoSuchMethodError when HiveClientImpl.state close

2021-05-31 Thread GitBox


SparkQA commented on pull request #32693:
URL: https://github.com/apache/spark/pull/32693#issuecomment-851797476


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43650/
   





[GitHub] [spark] sigmod opened a new pull request #32721: [WIP][SPARK-35077][SQL] Migrate to transformWithPruning for leftover optimizer rules

2021-05-31 Thread GitBox


sigmod opened a new pull request #32721:
URL: https://github.com/apache/spark/pull/32721


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   





[GitHub] [spark] SparkQA commented on pull request #32506: [SPARK-35374][SQL] Add string-to-number conversion support to JacksonParser

2021-05-31 Thread GitBox


SparkQA commented on pull request #32506:
URL: https://github.com/apache/spark/pull/32506#issuecomment-851795183


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43648/
   





[GitHub] [spark] SparkQA removed a comment on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules

2021-05-31 Thread GitBox


SparkQA removed a comment on pull request #32686:
URL: https://github.com/apache/spark/pull/32686#issuecomment-851734216


   **[Test build #139125 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139125/testReport)** for PR 32686 at commit [`8252a6a`](https://github.com/apache/spark/commit/8252a6a93a05c97ed47e3174be76fe1aeb3f6567).





[GitHub] [spark] SparkQA commented on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules

2021-05-31 Thread GitBox


SparkQA commented on pull request #32686:
URL: https://github.com/apache/spark/pull/32686#issuecomment-851794843


   **[Test build #139125 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139125/testReport)** for PR 32686 at commit [`8252a6a`](https://github.com/apache/spark/commit/8252a6a93a05c97ed47e3174be76fe1aeb3f6567).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on pull request #32718: [SPARK-21957][SQL] Support current_user and session_user functions

2021-05-31 Thread GitBox


SparkQA commented on pull request #32718:
URL: https://github.com/apache/spark/pull/32718#issuecomment-851792608


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43647/
   





[GitHub] [spark] SparkQA commented on pull request #32720: [SPARK-35576][SQL][3.1] Redact the sensitive info in the result of Set command

2021-05-31 Thread GitBox


SparkQA commented on pull request #32720:
URL: https://github.com/apache/spark/pull/32720#issuecomment-851790068


   **[Test build #139131 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139131/testReport)** for PR 32720 at commit [`66536fb`](https://github.com/apache/spark/commit/66536fb5b2d8f1499bd4bdb5a9a31435f637bab8).





[GitHub] [spark] gengliangwang commented on pull request #32712: [SPARK-35576][SQL] Redact the sensitive info in the result of Set command

2021-05-31 Thread GitBox


gengliangwang commented on pull request #32712:
URL: https://github.com/apache/spark/pull/32712#issuecomment-851789021


   @dongjoon-hyun Thanks for merging. I have opened a cherry-pick PR in 
https://github.com/apache/spark/pull/32720





[GitHub] [spark] gengliangwang opened a new pull request #32720: [SPARK-35576][SQL][3.1] Redact the sensitive info in the result of Set command

2021-05-31 Thread GitBox


gengliangwang opened a new pull request #32720:
URL: https://github.com/apache/spark/pull/32720


   
   
   ### What changes were proposed in this pull request?
   
   Currently, the results of the following SQL queries are not redacted:
   ```
   SET [KEY];
   SET;
   ```
   For example:
   
   ```
   scala> spark.sql("set javax.jdo.option.ConnectionPassword=123456").show()
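   +----------------+------+
   |             key| value|
   +----------------+------+
   |javax.jdo.option|123456|
   +----------------+------+
   
   scala> spark.sql("set javax.jdo.option.ConnectionPassword").show()
   +----------------+------+
   |             key| value|
   +----------------+------+
   |javax.jdo.option|123456|
   +----------------+------+
   
   scala> spark.sql("set").show()
   +----------------+--------+
   |             key|   value|
   +----------------+--------+
   |javax.jdo.option|  123456|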
   
   ```
   
   We should hide the sensitive information and redact the query output.
   
   ### Why are the changes needed?
   
   Security.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, the sensitive information in the output of Set commands is redacted.
   
   
   ### How was this patch tested?
   
   Unit test
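
The redaction idea described above can be sketched without a Spark session. This is only an illustration under stated assumptions, not the code in this patch: the `RedactSketch` object, its `redact` helper, the key pattern, and the placeholder text are all hypothetical names and values chosen for the example.

```scala
import scala.util.matching.Regex

object RedactSketch {
  // Hypothetical pattern for keys whose values should be hidden (an assumption).
  val sensitiveKey: Regex = "(?i)secret|password|token".r

  // Hypothetical placeholder shown instead of the real value (an assumption).
  val placeholder: String = "*********(redacted)"

  // Replace the value of every (key, value) pair whose key matches the pattern;
  // all other pairs are returned unchanged and in the same order.
  def redact(kvs: Seq[(String, String)]): Seq[(String, String)] =
    kvs.map { case (key, value) =>
      if (sensitiveKey.findFirstIn(key).isDefined) (key, placeholder) else (key, value)
    }
}
```

Applied to the pair `javax.jdo.option.ConnectionPassword -> 123456`, this sketch keeps the key and swaps the value for the placeholder, while a key such as `spark.app.name` passes through untouched.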





[GitHub] [spark] viirya commented on pull request #32709: [SPARK-35573][R][TESTS] Make SparkR tests pass with R 4.1+

2021-05-31 Thread GitBox


viirya commented on pull request #32709:
URL: https://github.com/apache/spark/pull/32709#issuecomment-851788514


   Cool! Thanks @HyukjinKwon!





[GitHub] [spark] HeartSaVioR commented on a change in pull request #32653: [SPARK-35312][SS] Introduce new Option in Kafka source to specify minimum number of records to read per trigger

2021-05-31 Thread GitBox


HeartSaVioR commented on a change in pull request #32653:
URL: https://github.com/apache/spark/pull/32653#discussion_r642765673



##
File path: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
##
@@ -139,26 +156,78 @@ private[kafka010] class KafkaSource(
   override def latestOffset(startOffset: streaming.Offset, limit: ReadLimit): streaming.Offset = {
     // Make sure initialPartitionOffsets is initialized
     initialPartitionOffsets
-
-    val latest = kafkaReader.fetchLatestOffsets(
-      currentPartitionOffsets.orElse(Some(initialPartitionOffsets)))
+    val currentOffsets = currentPartitionOffsets.orElse(Some(initialPartitionOffsets))
+    val latest = kafkaReader.fetchLatestOffsets(currentOffsets)
+    var skipBatch = false

Review comment:
   Same here as well.







[GitHub] [spark] HeartSaVioR commented on a change in pull request #32653: [SPARK-35312][SS] Introduce new Option in Kafka source to specify minimum number of records to read per trigger

2021-05-31 Thread GitBox


HeartSaVioR commented on a change in pull request #32653:
URL: https://github.com/apache/spark/pull/32653#discussion_r642765440



##
File path: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala
##
@@ -95,15 +114,62 @@ private[kafka010] class KafkaMicroBatchStream(
   override def latestOffset(start: Offset, readLimit: ReadLimit): Offset = {
     val startPartitionOffsets = start.asInstanceOf[KafkaSourceOffset].partitionToOffsets
     latestPartitionOffsets = kafkaOffsetReader.fetchLatestOffsets(Some(startPartitionOffsets))
+    var skipBatch = false

Review comment:
   I now see duplicated code across the branches that handle each limit type, including CompositeReadLimit, which handles both the lower and upper limits and so repeats the same code.
   
   How about changing it as below:
   
   ```
   val limits: Seq[ReadLimit] = readLimit match {
     case rows: CompositeReadLimit => rows.getReadLimits
     case rows => Seq(rows)
   }
   
   val offsets = if (limits.exists(_.isInstanceOf[ReadAllAvailable])) {
     // ReadAllAvailable has the highest priority
     latestPartitionOffsets
   } else {
     val lowerLimit = limits.find(_.isInstanceOf[ReadMinRows]).map(_.asInstanceOf[ReadMinRows])
     val upperLimit = limits.find(_.isInstanceOf[ReadMaxRows]).map(_.asInstanceOf[ReadMaxRows])
   
     lowerLimit.flatMap { limit =>
       // checking if we need to skip batch based on minOffsetsPerTrigger criteria
       val skipBatch = delayBatch(
         limit.minRows, latestPartitionOffsets, startPartitionOffsets, limit.maxTriggerDelayMs)
       if (skipBatch) {
         logDebug(
           s"Delaying batch as number of records available is less than minOffsetsPerTrigger")
         Some(startPartitionOffsets)
       } else {
         None
       }
     }.orElse {
       // checking if we need to adjust a range of offsets based on maxOffsetsPerTrigger criteria
       upperLimit.map { limit =>
         rateLimit(limit.maxRows(), startPartitionOffsets, latestPartitionOffsets)
       }
     }.getOrElse(latestPartitionOffsets)
   }
   
   endPartitionOffsets = KafkaSourceOffset(offsets)
   endPartitionOffsets
   ```
   
   This would require fewer changes when we want to add more read limits in the future.
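
To make the suggested priority order concrete, here is a self-contained sketch that can be run outside Spark. It is not the Spark code: `Limit`, `AllAvailable`, `MinRows`, `MaxRows`, `Composite`, and `chooseEnd` are hypothetical stand-ins for the `ReadLimit` types, offsets are collapsed to a single `Long` instead of a per-partition map, and the trigger-delay check is left out.

```scala
object ReadLimitSketch {
  // Hypothetical stand-ins for the ReadLimit hierarchy (assumptions, not Spark's API).
  sealed trait Limit
  case object AllAvailable extends Limit
  final case class MinRows(rows: Long) extends Limit
  final case class MaxRows(rows: Long) extends Limit
  final case class Composite(limits: Seq[Limit]) extends Limit

  // Pick the end offset of the next batch from the start and latest offsets.
  def chooseEnd(limit: Limit, start: Long, latest: Long): Long = {
    // Flatten a composite limit so every case goes through one code path.
    val limits: Seq[Limit] = limit match {
      case Composite(ls) => ls
      case single => Seq(single)
    }
    if (limits.contains(AllAvailable)) {
      latest // reading everything available has the highest priority
    } else {
      val lower = limits.collectFirst { case m: MinRows => m }
      val upper = limits.collectFirst { case m: MaxRows => m }
      lower
        .filter(l => latest - start < l.rows) // too few records: delay the batch
        .map(_ => start)
        .orElse(upper.map(u => math.min(latest, start + u.rows))) // cap the batch size
        .getOrElse(latest)
    }
  }
}
```

The sketch uses `collectFirst` with a typed pattern instead of `find` plus `asInstanceOf`; that is a stylistic choice for the example, and either form gives the same single code path for simple and composite limits.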







[GitHub] [spark] SparkQA commented on pull request #32693: [SPARK-35556][SQL][TESTS] Avoid log NoSuchMethodError when HiveClientImpl.state close

2021-05-31 Thread GitBox


SparkQA commented on pull request #32693:
URL: https://github.com/apache/spark/pull/32693#issuecomment-851785773


   **[Test build #139130 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139130/testReport)** for PR 32693 at commit [`698bea5`](https://github.com/apache/spark/commit/698bea5d49986f955c0736bff59ceb0c7c6051e8).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page

2021-05-31 Thread GitBox


AmplabJenkins removed a comment on pull request #32658:
URL: https://github.com/apache/spark/pull/32658#issuecomment-851784991


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43646/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32719: [SPARK-35584][TESTS] Increase the timeout in FallbackStorageSuite

2021-05-31 Thread GitBox


AmplabJenkins commented on pull request #32719:
URL: https://github.com/apache/spark/pull/32719#issuecomment-851784992


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43649/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page

2021-05-31 Thread GitBox


AmplabJenkins commented on pull request #32658:
URL: https://github.com/apache/spark/pull/32658#issuecomment-851784991


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43646/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32719: [SPARK-35584][TESTS] Increase the timeout in FallbackStorageSuite

2021-05-31 Thread GitBox


AmplabJenkins removed a comment on pull request #32719:
URL: https://github.com/apache/spark/pull/32719#issuecomment-851784992


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43649/
   





[GitHub] [spark] SparkQA commented on pull request #32506: [SPARK-35374][SQL] Add string-to-number conversion support to JacksonParser

2021-05-31 Thread GitBox


SparkQA commented on pull request #32506:
URL: https://github.com/apache/spark/pull/32506#issuecomment-851784737


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43648/
   





[GitHub] [spark] SparkQA commented on pull request #32719: [SPARK-35584][TESTS] Increase the timeout in FallbackStorageSuite

2021-05-31 Thread GitBox


SparkQA commented on pull request #32719:
URL: https://github.com/apache/spark/pull/32719#issuecomment-851784608


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43649/
   





[GitHub] [spark] SparkQA commented on pull request #32718: [SPARK-21957][SQL] Support current_user and session_user functions

2021-05-31 Thread GitBox


SparkQA commented on pull request #32718:
URL: https://github.com/apache/spark/pull/32718#issuecomment-851782661


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43647/
   





[GitHub] [spark] gengliangwang closed pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules

2021-05-31 Thread GitBox


gengliangwang closed pull request #32686:
URL: https://github.com/apache/spark/pull/32686


   





[GitHub] [spark] gengliangwang commented on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules

2021-05-31 Thread GitBox


gengliangwang commented on pull request #32686:
URL: https://github.com/apache/spark/pull/32686#issuecomment-851781327


   Thanks, merging to master





[GitHub] [spark] HyukjinKwon commented on a change in pull request #32718: [SPARK-21957][SQL] Support current_user and session_user functions

2021-05-31 Thread GitBox


HyukjinKwon commented on a change in pull request #32718:
URL: https://github.com/apache/spark/pull/32718#discussion_r642758972



##
File path: sql/core/src/test/scala/org/apache/spark/sql/MiscFunctionsSuite.scala
##
@@ -40,6 +40,12 @@ class MiscFunctionsSuite extends QueryTest with 
SharedSparkSession {
   Row(SPARK_VERSION_SHORT + " " + SPARK_REVISION))
 assert(df.schema.fieldNames === Seq("version()"))
   }
+
+  test("get current_user and session_user in normal spark apps") {

Review comment:
   shall we add the JIRA prefix?







[GitHub] [spark] HyukjinKwon commented on pull request #32709: [SPARK-35573][R][TESTS] Make SparkR tests pass with R 4.1+

2021-05-31 Thread GitBox


HyukjinKwon commented on pull request #32709:
URL: https://github.com/apache/spark/pull/32709#issuecomment-851778790


   The CRAN failure was an issue in my environment. The tests and the CRAN check 
should now work with R 4.1+ too.





[GitHub] [spark] yaooqinn commented on a change in pull request #32714: [SPARK-35581][SQL] Support special datetime values in typed literals only

2021-05-31 Thread GitBox


yaooqinn commented on a change in pull request #32714:
URL: https://github.com/apache/spark/pull/32714#discussion_r642757369



##
File path: docs/sql-migration-guide.md
##
@@ -91,6 +91,8 @@ license: |
 
   - In Spark 3.2, `CREATE TABLE AS SELECT` with non-empty `LOCATION` will 
throw `AnalysisException`. To restore the behavior before Spark 3.2, you can 
set `spark.sql.legacy.allowNonEmptyLocationInCTAS` to `true`.
 
+  - In Spark 3.2, the special datetime values such as `epoch`, `today`, 
`yesterday`, `tomorrow` and `now` are supported in typed literals only, for 
instance `select timestamp'now'`. In Spark 3.1 and earlier, such special values 
are supported in any casts of strings to dates/timestamps. To restore the 
behavior before Spark 3.2, you should preprocess string columns and convert the 
strings to desired timestamps explicitly using UDF for instance.

Review comment:
   In Spark 3.2, ~the~ special datetime values. in typed literals only, 
for instance **(add',')** `select timestamp'now'`. In Spark 3.1 and ~earlier~ 
(3.0?)
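
The migration note quoted above suggests preprocessing string columns explicitly. As a rough illustration only, the plain-Scala sketch below shows the kind of mapping such a UDF could apply to date strings. `SpecialDateSketch` and `resolve` are hypothetical names, not Spark APIs; `now` and timestamp handling are deliberately left out, and "today" is passed in so the function stays deterministic.

```scala
import java.time.LocalDate
import scala.util.Try

// Hypothetical helper, not part of Spark: maps special date strings to
// concrete dates relative to a caller-supplied "today".
object SpecialDateSketch {
  def resolve(raw: String, today: LocalDate): Option[LocalDate] =
    raw.trim.toLowerCase match {
      case "epoch"     => Some(LocalDate.ofEpochDay(0L))
      case "today"     => Some(today)
      case "yesterday" => Some(today.minusDays(1L))
      case "tomorrow"  => Some(today.plusDays(1L))
      // Anything else is treated as an ISO-8601 date (yyyy-MM-dd);
      // unparseable input yields None instead of throwing.
      case other       => Try(LocalDate.parse(other)).toOption
    }
}
```

A function like this could be wrapped in a UDF and applied to the string column before casting, assuming the caller decides what "today" means for the job.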







[GitHub] [spark] ulysses-you commented on a change in pull request #32602: [SPARK-35455][SQL] Unify empty relation optimization between normal and AQE optimizer

2021-05-31 Thread GitBox


ulysses-you commented on a change in pull request #32602:
URL: https://github.com/apache/spark/pull/32602#discussion_r642757227



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala
##
@@ -27,7 +28,9 @@ import org.apache.spark.util.Utils
  */
 class AQEOptimizer(conf: SQLConf) extends RuleExecutor[LogicalPlan] {
   private val defaultBatches = Seq(
-Batch("Eliminate Unnecessary Join", Once, EliminateUnnecessaryJoin),
+Batch("Propagate Empty Relations", Once,
+  AQEPropagateEmptyRelation,
+  UpdateAttributeNullability),

Review comment:
   ah I see, will do this.







[GitHub] [spark] SparkQA commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page

2021-05-31 Thread GitBox


SparkQA commented on pull request #32658:
URL: https://github.com/apache/spark/pull/32658#issuecomment-851775047


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43646/
   





[GitHub] [spark] HyukjinKwon commented on pull request #32719: [SPARK-34059][TESTS]Increase the timeout in FallbackStorageSuite

2021-05-31 Thread GitBox


HyukjinKwon commented on pull request #32719:
URL: https://github.com/apache/spark/pull/32719#issuecomment-851771579


   seems like the JIRA number is wrong in the title





[GitHub] [spark] SparkQA commented on pull request #32719: [SPARK-34059][TESTS]Increase the timeout in FallbackStorageSuite

2021-05-31 Thread GitBox


SparkQA commented on pull request #32719:
URL: https://github.com/apache/spark/pull/32719#issuecomment-851769807


   **[Test build #139129 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139129/testReport)**
 for PR 32719 at commit 
[`941ee9c`](https://github.com/apache/spark/commit/941ee9c1d04f9951598ed8bfb93b5bdaa2819e18).





[GitHub] [spark] Yikun opened a new pull request #32719: [SPARK-34059][TESTS]Increase the timeout in FallbackStorageSuite

2021-05-31 Thread GitBox


Yikun opened a new pull request #32719:
URL: https://github.com/apache/spark/pull/32719


   ### What changes were proposed in this pull request?
   ```
   - Upload multi stages *** FAILED ***
   {{ The code passed to eventually never returned normally. Attempted 20 times 
over 10.011176743 seconds. Last failure message: fallbackStorage.exists(0, 
file) was false. (FallbackStorageSuite.scala:243)}}
   ```
   The error above was raised intermittently on aarch64 and also in GitHub 
Actions tests [1][2].
   
   [1] https://github.com/apache/spark/actions/runs/489319612
   [2] https://github.com/apache/spark/actions/runs/479317320
   
   ### Why are the changes needed?
   The timeout is too short; it needs to be increased so the test case can complete.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   build/mvn test -Dtest=none 
-DwildcardSuites=org.apache.spark.storage.FallbackStorageSuite -pl 
:spark-core_2.12
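
The failure message above comes from a retry-until-timeout check. As a minimal sketch of that pattern (not the ScalaTest `eventually` implementation; `EventuallySketch` and `retryUntil` are hypothetical names), the loop below polls a condition until it holds or a deadline passes, which shows why a short timeout makes a slow but correct test fail:

```scala
import scala.annotation.tailrec
import scala.concurrent.duration._

// Hypothetical helper illustrating the polling pattern only.
object EventuallySketch {
  /** Polls `condition` every `interval` until it returns true or `timeout` elapses. */
  def retryUntil(timeout: FiniteDuration, interval: FiniteDuration)(
      condition: () => Boolean): Boolean = {
    val deadline = timeout.fromNow

    @tailrec
    def loop(): Boolean =
      if (condition()) true               // success: stop polling
      else if (deadline.isOverdue()) false // timed out: give up
      else {
        Thread.sleep(interval.toMillis)
        loop()
      }

    loop()
  }
}
```

With a larger `timeout`, the same condition simply gets more attempts before the check gives up.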





[GitHub] [spark] cloud-fan commented on a change in pull request #32602: [SPARK-35455][SQL] Unify empty relation optimization between normal and AQE optimizer

2021-05-31 Thread GitBox


cloud-fan commented on a change in pull request #32602:
URL: https://github.com/apache/spark/pull/32602#discussion_r642749109



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala
##
@@ -27,7 +28,9 @@ import org.apache.spark.util.Utils
  */
 class AQEOptimizer(conf: SQLConf) extends RuleExecutor[LogicalPlan] {
   private val defaultBatches = Seq(
-Batch("Eliminate Unnecessary Join", Once, EliminateUnnecessaryJoin),
+Batch("Propagate Empty Relations", Once,
+  AQEPropagateEmptyRelation,
+  UpdateAttributeNullability),

Review comment:
   It's a bit different:
   ```
   Project
     Shuffle Stage
   ```
   For the above case, we don't want to optimize it, as the benefit is too small 
(removing a shuffle stage may cause a regression).
   
   ```
   Project
     Sort
       Shuffle Stage
   ```
   For the above case, we will optimize Sort -> Shuffle Stage to an empty relation 
first. Then it makes sense to optimize further and optimize out the project, as the 
shuffle stage is already gone.
   
   So adding `ConvertToLocalRelation` looks like the best solution here.
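
The two cases above can be illustrated with a toy model. This is a sketch under stated assumptions, not Spark's optimizer: the `Plan` types and `optimize` below are hypothetical stand-ins, an empty shuffle stage is modelled as `ShuffleStage(0)`, and only the behavior described in the comment is encoded (a `Sort` over an empty input collapses, a `Project` collapses only once its child is already an empty relation).

```scala
// Toy plan tree; every name here is hypothetical and unrelated to Catalyst classes.
object EmptyPropagationSketch {
  sealed trait Plan
  case object EmptyRelation extends Plan
  final case class ShuffleStage(rowCount: Long) extends Plan
  final case class Sort(child: Plan) extends Plan
  final case class Project(child: Plan) extends Plan

  // An input is known to be empty if it is already an empty relation
  // or a materialized shuffle stage that produced zero rows.
  private def producesNoRows(p: Plan): Boolean = p match {
    case EmptyRelation    => true
    case ShuffleStage(0L) => true
    case _                => false
  }

  // Bottom-up rewrite: children are optimized before their parent.
  def optimize(plan: Plan): Plan = plan match {
    case Sort(child) =>
      val c = optimize(child)
      if (producesNoRows(c)) EmptyRelation else Sort(c)
    case Project(child) =>
      optimize(child) match {
        case EmptyRelation => EmptyRelation   // the shuffle stage is already gone
        case other         => Project(other)  // keep Project directly over a shuffle stage
      }
    case leaf => leaf
  }
}
```

In this model `Project(ShuffleStage(0))` is left untouched, while `Project(Sort(ShuffleStage(0)))` reduces to `EmptyRelation`.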







[GitHub] [spark] SparkQA commented on pull request #32506: [SPARK-35374][SQL] Add string-to-number conversion support to JacksonParser

2021-05-31 Thread GitBox


SparkQA commented on pull request #32506:
URL: https://github.com/apache/spark/pull/32506#issuecomment-851767556


   **[Test build #139128 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139128/testReport)**
 for PR 32506 at commit 
[`a361275`](https://github.com/apache/spark/commit/a36127512f4f5eadd9f0b9c9f9b0c3ef90b155e3).





[GitHub] [spark] SparkQA commented on pull request #32718: [SPARK-21957][SQL] Support current_user and session_user functions

2021-05-31 Thread GitBox


SparkQA commented on pull request #32718:
URL: https://github.com/apache/spark/pull/32718#issuecomment-851767500


   **[Test build #139127 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139127/testReport)**
 for PR 32718 at commit 
[`ae337c1`](https://github.com/apache/spark/commit/ae337c13b7648c2011976eb8bef4fd8e67fcf44d).








[GitHub] [spark] yaooqinn commented on pull request #32718: [SPARK-21957][SQL] Support current_user and session_user functions

2021-05-31 Thread GitBox


yaooqinn commented on pull request #32718:
URL: https://github.com/apache/spark/pull/32718#issuecomment-851766836


   cc @cloud-fan @wangyum @maropu thanks very much





[GitHub] [spark] yaooqinn opened a new pull request #32718: [SPARK-21957][SQL] Support current_user and session_user functions

2021-05-31 Thread GitBox


yaooqinn opened a new pull request #32718:
URL: https://github.com/apache/spark/pull/32718


   ### What changes were proposed in this pull request?
   
   Currently, we do not have a suitable definition of the `user` concept in 
Spark. We only have an application-wide `sparkUser`, and do not support identifying or 
retrieving the user information from a session in STS or during runtime query 
execution.
   
   These SQL functions are very popular and supported by plenty of other modern 
or old-school databases, and they also matter for compliance.
   
   This PR adds `current_user()` and `session_user()` as SQL functions. The two 
are equivalent. In this PR, we add these functions without ambiguity:
   1. For a normal single-threaded Spark application, the `sparkUser` 
is always equivalent to `current_user()` and `session_user()`. 
   2. For a multi-threaded Spark application, e.g. the Spark thrift server, we use 
a `ThreadLocal` variable to store the client-side user (after authentication) 
before running the query and retrieve it in the parser.
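
The second case can be sketched in plain Scala. This is an illustration of the `ThreadLocal` pattern only, not the code in this PR: `CurrentUserSketch`, `withSessionUser` and `currentUser` are hypothetical names, and the JVM `user.name` property stands in for `sparkUser`.

```scala
// Hypothetical sketch: a per-thread session user with an application-wide fallback.
object CurrentUserSketch {
  // Stand-in for the application-wide user (assumption: JVM user.name).
  private val appUser: String = sys.props.getOrElse("user.name", "unknown")

  // Each server thread sees its own value; None means "no session user set".
  private val sessionUser = new ThreadLocal[Option[String]] {
    override def initialValue(): Option[String] = None
  }

  /** Runs `body` with `user` as the session user, restoring the previous value afterwards. */
  def withSessionUser[T](user: String)(body: => T): T = {
    val previous = sessionUser.get()
    sessionUser.set(Some(user))
    try body
    finally sessionUser.set(previous)
  }

  /** The session user of the calling thread, or the application user if none is set. */
  def currentUser: String = sessionUser.get().getOrElse(appUser)
}
```

Restoring the previous value in `finally` matters when server threads are pooled, since a stale user would otherwise leak into the next query handled by the same thread.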
   
   ### Why are the changes needed?
   
   These SQL functions are very popular and supported by plenty of other modern 
or old-school databases, and they also matter for compliance.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, this adds `current_user()` and `session_user()` as SQL functions.
   
   ### How was this patch tested?
   
   
   New tests in the thrift server and sql/catalyst.





[GitHub] [spark] ulysses-you commented on a change in pull request #32602: [SPARK-35455][SQL] Unify empty relation optimization between normal and AQE optimizer

2021-05-31 Thread GitBox


ulysses-you commented on a change in pull request #32602:
URL: https://github.com/apache/spark/pull/32602#discussion_r642747242



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala
##
@@ -27,7 +28,9 @@ import org.apache.spark.util.Utils
  */
 class AQEOptimizer(conf: SQLConf) extends RuleExecutor[LogicalPlan] {
   private val defaultBatches = Seq(
-Batch("Eliminate Unnecessary Join", Once, EliminateUnnecessaryJoin),
+Batch("Propagate Empty Relations", Once,
+  AQEPropagateEmptyRelation,
+  UpdateAttributeNullability),

Review comment:
   Yeah, I noticed it. We can add it so that we can propagate the empty relation through 
`project/filter`, as in a case like this:
   ```
   Aggregate
     Project
       Join
         Shuffle
   ```
   But it needs to isolate the normal and AQE optimizers due to `transformWithPruning`.
   
   On the other hand, I feel that it's similar if we just let 
`AQEPropagateEmptyRelation` support propagating through `project/filter`, and the latter 
is simpler. 







[GitHub] [spark] SparkQA commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page

2021-05-31 Thread GitBox


SparkQA commented on pull request #32658:
URL: https://github.com/apache/spark/pull/32658#issuecomment-851763525


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43646/
   





[GitHub] [spark] HyukjinKwon commented on pull request #32715: [SPARK-35577][TESTS] Allow to log container output for docker integration tests

2021-05-31 Thread GitBox


HyukjinKwon commented on pull request #32715:
URL: https://github.com/apache/spark/pull/32715#issuecomment-851751136


   Looks fine. cc @maropu





[GitHub] [spark] HyukjinKwon closed pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page

2021-05-31 Thread GitBox


HyukjinKwon closed pull request #32658:
URL: https://github.com/apache/spark/pull/32658


   





[GitHub] [spark] HyukjinKwon commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page

2021-05-31 Thread GitBox


HyukjinKwon commented on pull request #32658:
URL: https://github.com/apache/spark/pull/32658#issuecomment-851750789


   Merged to master.





[GitHub] [spark] HyukjinKwon commented on pull request #32558: [SPARK-34953][CORE][SQL] Add the code change for adding the DateType in the infer schema while reading in CSV and JSON

2021-05-31 Thread GitBox


HyukjinKwon commented on pull request #32558:
URL: https://github.com/apache/spark/pull/32558#issuecomment-851750660


   Oh I meant this: 
https://github.com/apache/spark/blob/master/python/pyspark/sql/readwriter.py#L342-L350
   These options are listed as parameters on the Python side specifically. The 
CSV documentation was merged in https://github.com/apache/spark/pull/32658, so 
you could add the option on that page.
   





[GitHub] [spark] SparkQA commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page

2021-05-31 Thread GitBox


SparkQA commented on pull request #32658:
URL: https://github.com/apache/spark/pull/32658#issuecomment-851749314


   **[Test build #139126 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139126/testReport)**
 for PR 32658 at commit 
[`f55a2fa`](https://github.com/apache/spark/commit/f55a2fa22efd4ac7611d0483b82dd73596bccce7).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules

2021-05-31 Thread GitBox


AmplabJenkins removed a comment on pull request #32686:
URL: https://github.com/apache/spark/pull/32686#issuecomment-851748863


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43645/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules

2021-05-31 Thread GitBox


AmplabJenkins commented on pull request #32686:
URL: https://github.com/apache/spark/pull/32686#issuecomment-851748863


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43645/
   





[GitHub] [spark] HyukjinKwon closed pull request #32716: [SPARK-35578][SQL][TEST] Add a test case for a bug in janino

2021-05-31 Thread GitBox


HyukjinKwon closed pull request #32716:
URL: https://github.com/apache/spark/pull/32716


   





[GitHub] [spark] HyukjinKwon commented on pull request #32716: [SPARK-35578][SQL][TEST] Add a test case for a bug in janino

2021-05-31 Thread GitBox


HyukjinKwon commented on pull request #32716:
URL: https://github.com/apache/spark/pull/32716#issuecomment-851748664


   Merged to master.





[GitHub] [spark] HyukjinKwon commented on pull request #32709: [SPARK-35573][R][TESTS] Make SparkR tests pass with R 4.1+

2021-05-31 Thread GitBox


HyukjinKwon commented on pull request #32709:
URL: https://github.com/apache/spark/pull/32709#issuecomment-851745373


   I have backported it to branch-3.1 and branch-3.0 too because this is a 
test-only change, and in case other people run the tests with newer R versions.





[GitHub] [spark] HyukjinKwon closed pull request #32709: [SPARK-35573][R][TESTS] Make SparkR tests pass with R 4.1+

2021-05-31 Thread GitBox


HyukjinKwon closed pull request #32709:
URL: https://github.com/apache/spark/pull/32709


   





[GitHub] [spark] HyukjinKwon commented on pull request #32709: [SPARK-35573][R][TESTS] Make SparkR tests pass with R 4.1+

2021-05-31 Thread GitBox


HyukjinKwon commented on pull request #32709:
URL: https://github.com/apache/spark/pull/32709#issuecomment-851744847









[GitHub] [spark] HyukjinKwon closed pull request #32674: [SPARK-35453][PYTHON] Move Koalas accessor to pandas_on_spark accessor

2021-05-31 Thread GitBox


HyukjinKwon closed pull request #32674:
URL: https://github.com/apache/spark/pull/32674


   





[GitHub] [spark] HyukjinKwon commented on pull request #32674: [SPARK-35453][PYTHON] Move Koalas accessor to pandas_on_spark accessor

2021-05-31 Thread GitBox


HyukjinKwon commented on pull request #32674:
URL: https://github.com/apache/spark/pull/32674#issuecomment-851744212


   Merged to master.





[GitHub] [spark] SparkQA commented on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules

2021-05-31 Thread GitBox


SparkQA commented on pull request #32686:
URL: https://github.com/apache/spark/pull/32686#issuecomment-851743806


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43645/
   





[GitHub] [spark] HyukjinKwon commented on a change in pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page

2021-05-31 Thread GitBox


HyukjinKwon commented on a change in pull request #32658:
URL: https://github.com/apache/spark/pull/32658#discussion_r642727555



##
File path: docs/sql-data-sources-csv.md
##
@@ -195,7 +195,7 @@ Data source options of CSV can be set via:
   
 multiLine
 false
-Parse one record, which may span multiple lines, per file.

Review comment:
   let's also keep `, per file` 







[GitHub] [spark] HyukjinKwon commented on a change in pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page

2021-05-31 Thread GitBox


HyukjinKwon commented on a change in pull request #32658:
URL: https://github.com/apache/spark/pull/32658#discussion_r642727463



##
File path: docs/sql-data-sources-csv.md
##
@@ -92,14 +92,14 @@ Data source options of CSV can be set via:
   
   
 comment
-empty string
+
 Sets a single character used for skipping lines beginning with this 
character. By default, it is disabled.
 read
   
   
 header
 false
-For reading, uses the first line as names of columns. For writing, 
writes the names of columns as the first line. Note that if the given path is a 
RDD of Strings, this header option will remove all lines same with the header 
if exists.

Review comment:
   Let's keep this note:
   
   Note that if the given path is a RDD of Strings, this header option will 
remove all lines same with the header if exists.







[GitHub] [spark] kiszk commented on pull request #32716: [SPARK-35578][SQL][TEST] Add a test case for a bug in janino

2021-05-31 Thread GitBox


kiszk commented on pull request #32716:
URL: https://github.com/apache/spark/pull/32716#issuecomment-851743316


   Good catch, and a good reproducible test.





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32714: [SPARK-35581][SQL] Support special datetime values in typed literals only

2021-05-31 Thread GitBox


AmplabJenkins removed a comment on pull request #32714:
URL: https://github.com/apache/spark/pull/32714#issuecomment-851738273


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139124/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32714: [SPARK-35581][SQL] Support special datetime values in typed literals only

2021-05-31 Thread GitBox


AmplabJenkins commented on pull request #32714:
URL: https://github.com/apache/spark/pull/32714#issuecomment-851738273


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139124/
   





[GitHub] [spark] SparkQA removed a comment on pull request #32714: [SPARK-35581][SQL] Support special datetime values in typed literals only

2021-05-31 Thread GitBox


SparkQA removed a comment on pull request #32714:
URL: https://github.com/apache/spark/pull/32714#issuecomment-851678599


   **[Test build #139124 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139124/testReport)**
 for PR 32714 at commit 
[`33b5ce3`](https://github.com/apache/spark/commit/33b5ce30b2d94455ae027e725e28c5c1101b42ec).





[GitHub] [spark] SparkQA commented on pull request #32714: [SPARK-35581][SQL] Support special datetime values in typed literals only

2021-05-31 Thread GitBox


SparkQA commented on pull request #32714:
URL: https://github.com/apache/spark/pull/32714#issuecomment-851737748


   **[Test build #139124 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139124/testReport)**
 for PR 32714 at commit 
[`33b5ce3`](https://github.com/apache/spark/commit/33b5ce30b2d94455ae027e725e28c5c1101b42ec).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] HeartSaVioR commented on pull request #32702: [SPARK-35565][SS] Add config for ignoring metadata directory of FileStreamSink

2021-05-31 Thread GitBox


HeartSaVioR commented on pull request #32702:
URL: https://github.com/apache/spark/pull/32702#issuecomment-851736620


   Now I think it should be a source option. Given the impact, users should know 
what they are doing in their code, rather than relying on a configuration that 
can come from multiple places, even from cluster-level config.





[GitHub] [spark] srowen commented on pull request #32700: [SPARK-35558] Optimizes for multi-quantile retrieval

2021-05-31 Thread GitBox


srowen commented on pull request #32700:
URL: https://github.com/apache/spark/pull/32700#issuecomment-851736329


   Not sure if it's definitely related, but it looks like this results in tests 
that hang forever:
   `[info] *** Test still running after 16 minutes, 2 seconds: suite name: 
AdaptiveQueryExecSuite, test name: SPARK-33933: Materialize BroadcastQueryStage 
first in AQE. `
   
   Not 100% sure how it's connected, but, doesn't seem to be happening on other 
PRs?





[GitHub] [spark] maropu commented on pull request #32716: [SPARK-35578][SQL][TEST] Add a test case for a bug in janino

2021-05-31 Thread GitBox


maropu commented on pull request #32716:
URL: https://github.com/apache/spark/pull/32716#issuecomment-851734876


   @cloud-fan Thanks for sharing this test case! Okay, I'll look into the 
janino code to check if we could fix the bug there. Anyway, adding this test 
case into master looks fine to me.





[GitHub] [spark] SparkQA commented on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules

2021-05-31 Thread GitBox


SparkQA commented on pull request #32686:
URL: https://github.com/apache/spark/pull/32686#issuecomment-851734216


   **[Test build #139125 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139125/testReport)**
 for PR 32686 at commit 
[`8252a6a`](https://github.com/apache/spark/commit/8252a6a93a05c97ed47e3174be76fe1aeb3f6567).





[GitHub] [spark] AmplabJenkins commented on pull request #32717: [SPARK-35396]Manual close for CachedBatch in InMemoryRelation

2021-05-31 Thread GitBox


AmplabJenkins commented on pull request #32717:
URL: https://github.com/apache/spark/pull/32717#issuecomment-851734113


   Can one of the admins verify this patch?





[GitHub] [spark] xuechendi opened a new pull request #32717: [SPARK-35396]Manual close for CachedBatch in InMemoryRelation

2021-05-31 Thread GitBox


xuechendi opened a new pull request #32717:
URL: https://github.com/apache/spark/pull/32717


   Fixed: https://issues.apache.org/jira/browse/SPARK-35396
   Signed-off-by: Chendi Xue 
   
   ### What changes were proposed in this pull request?
   This PR manually closes objects that may not be released by GC, for example 
Arrow-allocated memory or other native objects.
   
   ### Why are the changes needed?
   Adds a case match in the InMemoryRelation 'clearCache' function: if an object 
extends AutoCloseable, its close function is called manually in case additional 
memory must be released explicitly.
   So one can implement a CachedBatch that extends AutoCloseable to indicate that 
the object requires an extra release.
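
   The close-on-clear idea described here can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `CachedBatch`, `NativeBatch`, `HeapBatch` and `CacheSketch.clearCache` are hypothetical stand-ins and do not mirror Spark's real classes or signatures.

```scala
// Hypothetical stand-ins for illustration only; these are NOT Spark's classes.
trait CachedBatch { def numRows: Int }

// A batch holding memory that the GC cannot reclaim on its own (assumed).
final class NativeBatch(val numRows: Int) extends CachedBatch with AutoCloseable {
  var closed = false
  override def close(): Unit = { closed = true }
}

// An ordinary batch that needs no explicit release.
final class HeapBatch(val numRows: Int) extends CachedBatch

object CacheSketch {
  // Close every batch that opts in via AutoCloseable and leave the rest alone.
  // Returns how many batches were closed.
  def clearCache(batches: Seq[CachedBatch]): Int =
    batches.count {
      case c: AutoCloseable =>
        c.close()
        true
      case _ => false
    }
}
```

   With this shape, a batch type opts into explicit release simply by mixing in `AutoCloseable`; batches that do not are untouched.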
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   UT is added, 
org.apache.spark.sql.execution.columnar.RefCountedTestCachedBatchSerializerSuite
   





[GitHub] [spark] sigmod commented on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules

2021-05-31 Thread GitBox


sigmod commented on pull request #32686:
URL: https://github.com/apache/spark/pull/32686#issuecomment-851729720


   > One small ergonomic comment. It would be great if we could create some 
shorthand for the function closures. I would probably make each individual 
value be a matcher for itself (if Enumeration allows subclassing of the Value 
class), and create a bunch of functions that allow you to compose them, e.g.: 
`any`, `all`, ...
   
   I'm not sure what the transformWithPruning interface exactly looks like.  
IIUC,  transformWithPruning may still not be able to just take a `composed 
pattern` instead of a lambda, because we also have `and`,  `or`, `not` over 
`all`, `any` -- even though they're not frequent. If we'd like to put `and`, 
`or`, `not` into patterns, it sounds a bit complex, as we need to be able to 
process a tree of such compositions.  
   
   Anyway, thanks for the suggestion. I'll think about whether there's a 
simpler approach and may address it in subsequent PRs. 
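
   To make the "tree of such compositions" concrete, here is a small sketch of what composable patterns with `and`, `or`, `not` over `all`, `any` could look like. It is only an assumption about one possible design: `Cond`, `Has`, `all`, `any`, `not` and `eval` are hypothetical names, patterns are plain strings rather than an Enumeration, and none of this is Spark's API.

```scala
// Hypothetical sketch; the names below are illustrative, not Spark's API.
object PatternSketch {
  sealed trait Cond {
    def and(other: Cond): Cond = And(this, other)
    def or(other: Cond): Cond = Or(this, other)
  }
  final case class Has(pattern: String) extends Cond
  final case class And(l: Cond, r: Cond) extends Cond
  final case class Or(l: Cond, r: Cond) extends Cond
  final case class Not(c: Cond) extends Cond

  // Both helpers require at least one pattern (reduce fails on empty input).
  def all(ps: String*): Cond = ps.map(p => Has(p): Cond).reduce(_ and _)
  def any(ps: String*): Cond = ps.map(p => Has(p): Cond).reduce(_ or _)
  def not(c: Cond): Cond = Not(c)

  // Evaluating a composed condition means walking the composition tree.
  def eval(c: Cond, present: Set[String]): Boolean = c match {
    case Has(p)    => present(p)
    case And(l, r) => eval(l, present) && eval(r, present)
    case Or(l, r)  => eval(l, present) || eval(r, present)
    case Not(x)    => !eval(x, present)
  }
}
```

   The recursive `eval` is the extra machinery mentioned above; a plain lambda avoids it at the cost of a slightly more verbose call site.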
   





[GitHub] [spark] sigmod commented on a change in pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules

2021-05-31 Thread GitBox


sigmod commented on a change in pull request #32686:
URL: https://github.com/apache/spark/pull/32686#discussion_r642711834



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala
##
@@ -117,6 +120,7 @@ case class AggregateExpression(
 UnresolvedAttribute(aggregateFunction.toString)
   }
 
+

Review comment:
   Done.







[GitHub] [spark] sigmod commented on a change in pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules

2021-05-31 Thread GitBox


sigmod commented on a change in pull request #32686:
URL: https://github.com/apache/spark/pull/32686#discussion_r642711661



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -3736,7 +3744,8 @@ object EliminateUnions extends Rule[LogicalPlan] {
  * rule can't work for those parameters.
  */
 object CleanupAliases extends Rule[LogicalPlan] with AliasHelper {
-  override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp 
{
+  override def apply(plan: LogicalPlan): LogicalPlan = 
plan.resolveOperatorsUpWithPruning(
+_.containsPattern(ALIAS)) {

Review comment:
   Done. Thanks for the catch!







[GitHub] [spark] sigmod commented on a change in pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules

2021-05-31 Thread GitBox


sigmod commented on a change in pull request #32686:
URL: https://github.com/apache/spark/pull/32686#discussion_r642711608



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -423,7 +424,9 @@ class Analyzer(override val catalogManager: CatalogManager)
*/
   object ResolveAliases extends Rule[LogicalPlan] {
 private def assignAliases(exprs: Seq[NamedExpression]) = {
-  exprs.map(_.transformUp { case u @ UnresolvedAlias(child, 
optGenAliasFunc) =>
+  exprs.map(_.transformUpWithPruning(_.containsPattern(UNRESOLVED_ALIAS))
+{

Review comment:
   Done.

##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -1876,7 +1879,7 @@ class Analyzer(override val catalogManager: 
CatalogManager)
 private def allowGroupByAlias: Boolean = conf.groupByAliases && 
!conf.ansiEnabled
 
 override def apply(plan: LogicalPlan): LogicalPlan = 
plan.resolveOperatorsUpWithPruning(
-  AlwaysProcess.fn, ruleId) {
+  _.containsAllPatterns(AGGREGATE, UNRESOLVED_ATTRIBUTE), ruleId) {

Review comment:
   Done.
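
   For readers following the hunks above, the general idea of pruning a bottom-up transform by pattern can be sketched on a toy tree. This is an illustration under assumptions, not Catalyst's implementation: `Node`, the string patterns, the `visited` counter and this `transformUpWithPruning` signature are all hypothetical.

```scala
// Toy tree, not Catalyst's TreeNode; all names here are illustrative.
object PruningSketch {
  final case class Node(pattern: String, children: List[Node] = Nil) {
    // Patterns present anywhere in this subtree, computed once per node.
    lazy val patterns: Set[String] = children.flatMap(_.patterns).toSet + pattern
    def containsPattern(p: String): Boolean = patterns(p)
  }

  // Counts the nodes the traversal actually entered (for demonstration only).
  var visited = 0

  // Bottom-up transform that skips any subtree for which `cond` is false.
  def transformUpWithPruning(n: Node, cond: Node => Boolean)(
      rule: PartialFunction[Node, Node]): Node =
    if (!cond(n)) n
    else {
      visited += 1
      val rebuilt =
        n.copy(children = n.children.map(transformUpWithPruning(_, cond)(rule)))
      rule.applyOrElse(rebuilt, identity[Node])
    }
}
```

   In this sketch a subtree that cannot contain the pattern a rule cares about is returned unchanged without being walked, which is where the saving comes from.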







[GitHub] [spark] HyukjinKwon commented on a change in pull request #32649: [SPARK-35497][PYTHON] Enable plotly tests in pandas-on-Spark

2021-05-31 Thread GitBox


HyukjinKwon commented on a change in pull request #32649:
URL: https://github.com/apache/spark/pull/32649#discussion_r642711442



##
File path: .github/workflows/build_and_test.yml
##
@@ -215,7 +215,7 @@ jobs:
 # Ubuntu 20.04. See also SPARK-33162.
 - name: Install Python packages (Python 3.6)
   run: |
-python3.6 -m pip install numpy 'pyarrow<3.0.0' pandas scipy xmlrunner
+python3.6 -m pip install numpy 'pyarrow<3.0.0' pandas scipy xmlrunner 
plotly>=4.8

Review comment:
   Oh, I should have clarified it. Now Python 3.9 has to have this, since we 
don't run the PySpark tests with Python 3.8 anymore in the master branch, and 
pandas on Spark is only in the master branch. 







[GitHub] [spark] AmplabJenkins removed a comment on pull request #32298: [WIP][SPARK-34079][SQL] Merge non-correlated scalar subqueries to multi-column scalar subqueries for better reuse

2021-05-31 Thread GitBox


AmplabJenkins removed a comment on pull request #32298:
URL: https://github.com/apache/spark/pull/32298#issuecomment-851723175


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139122/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32298: [WIP][SPARK-34079][SQL] Merge non-correlated scalar subqueries to multi-column scalar subqueries for better reuse

2021-05-31 Thread GitBox


AmplabJenkins commented on pull request #32298:
URL: https://github.com/apache/spark/pull/32298#issuecomment-851723175


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139122/
   





[GitHub] [spark] SparkQA removed a comment on pull request #32298: [WIP][SPARK-34079][SQL] Merge non-correlated scalar subqueries to multi-column scalar subqueries for better reuse

2021-05-31 Thread GitBox


SparkQA removed a comment on pull request #32298:
URL: https://github.com/apache/spark/pull/32298#issuecomment-851654123


   **[Test build #139122 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139122/testReport)**
 for PR 32298 at commit 
[`9d8dd6b`](https://github.com/apache/spark/commit/9d8dd6bc7bca56a11878dcccb5a5186d09e9f67b).





[GitHub] [spark] SparkQA commented on pull request #32298: [WIP][SPARK-34079][SQL] Merge non-correlated scalar subqueries to multi-column scalar subqueries for better reuse

2021-05-31 Thread GitBox


SparkQA commented on pull request #32298:
URL: https://github.com/apache/spark/pull/32298#issuecomment-851722508


   **[Test build #139122 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139122/testReport)**
 for PR 32298 at commit 
[`9d8dd6b`](https://github.com/apache/spark/commit/9d8dd6bc7bca56a11878dcccb5a5186d09e9f67b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32649: [SPARK-35497][PYTHON] Enable plotly tests in pandas-on-Spark

2021-05-31 Thread GitBox


dongjoon-hyun commented on a change in pull request #32649:
URL: https://github.com/apache/spark/pull/32649#discussion_r642703204



##
File path: .github/workflows/build_and_test.yml
##
@@ -215,7 +215,7 @@ jobs:
 # Ubuntu 20.04. See also SPARK-33162.
 - name: Install Python packages (Python 3.6)
   run: |
-python3.6 -m pip install numpy 'pyarrow<3.0.0' pandas scipy xmlrunner
+python3.6 -m pip install numpy 'pyarrow<3.0.0' pandas scipy xmlrunner 
plotly>=4.8

Review comment:
   Oh, I missed this comment last week. I only added plotly to Python 3.9 
for now. I will add it to Python 3.8 soon.







[GitHub] [spark] AmplabJenkins removed a comment on pull request #32114: [SPARK-35011][CORE] Avoid Block Manager registrations when StopExecutor msg is in-flight

2021-05-31 Thread GitBox


AmplabJenkins removed a comment on pull request #32114:
URL: https://github.com/apache/spark/pull/32114#issuecomment-851683239









[GitHub] [spark] AmplabJenkins removed a comment on pull request #32714: [SPARK-35581][SQL] Support special datetime values in typed literals only

2021-05-31 Thread GitBox


AmplabJenkins removed a comment on pull request #32714:
URL: https://github.com/apache/spark/pull/32714#issuecomment-851708124


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43644/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32700: [SPARK-35558] Optimizes for multi-quantile retrieval

2021-05-31 Thread GitBox


AmplabJenkins removed a comment on pull request #32700:
URL: https://github.com/apache/spark/pull/32700#issuecomment-851708118


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139118/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32714: [SPARK-35581][SQL] Support special datetime values in typed literals only

2021-05-31 Thread GitBox


AmplabJenkins commented on pull request #32714:
URL: https://github.com/apache/spark/pull/32714#issuecomment-851708124


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43644/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32700: [SPARK-35558] Optimizes for multi-quantile retrieval

2021-05-31 Thread GitBox


AmplabJenkins commented on pull request #32700:
URL: https://github.com/apache/spark/pull/32700#issuecomment-851708118


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139118/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32114: [SPARK-35011][CORE] Avoid Block Manager registrations when StopExecutor msg is in-flight

2021-05-31 Thread GitBox


AmplabJenkins commented on pull request #32114:
URL: https://github.com/apache/spark/pull/32114#issuecomment-851708120


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139123/
   





[GitHub] [spark] SparkQA removed a comment on pull request #32700: [SPARK-35558] Optimizes for multi-quantile retrieval

2021-05-31 Thread GitBox


SparkQA removed a comment on pull request #32700:
URL: https://github.com/apache/spark/pull/32700#issuecomment-851513136


   **[Test build #139118 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139118/testReport)**
 for PR 32700 at commit 
[`8d33ba9`](https://github.com/apache/spark/commit/8d33ba9cfbf6645a60419aed11c5b309434e994e).





[GitHub] [spark] SparkQA commented on pull request #32700: [SPARK-35558] Optimizes for multi-quantile retrieval

2021-05-31 Thread GitBox


SparkQA commented on pull request #32700:
URL: https://github.com/apache/spark/pull/32700#issuecomment-851703086


   **[Test build #139118 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139118/testReport)**
 for PR 32700 at commit 
[`8d33ba9`](https://github.com/apache/spark/commit/8d33ba9cfbf6645a60419aed11c5b309434e994e).
* This patch **fails from timeout after a configured wait of `500m`**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA removed a comment on pull request #32114: [SPARK-35011][CORE] Avoid Block Manager registrations when StopExecutor msg is in-flight

2021-05-31 Thread GitBox


SparkQA removed a comment on pull request #32114:
URL: https://github.com/apache/spark/pull/32114#issuecomment-851655230


   **[Test build #139123 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139123/testReport)**
 for PR 32114 at commit 
[`2c7a439`](https://github.com/apache/spark/commit/2c7a4395c3dc75ff803b37a29541292104c53cb7).





[GitHub] [spark] SparkQA commented on pull request #32114: [SPARK-35011][CORE] Avoid Block Manager registrations when StopExecutor msg is in-flight

2021-05-31 Thread GitBox


SparkQA commented on pull request #32114:
URL: https://github.com/apache/spark/pull/32114#issuecomment-851701808


   **[Test build #139123 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139123/testReport)**
 for PR 32114 at commit 
[`2c7a439`](https://github.com/apache/spark/commit/2c7a4395c3dc75ff803b37a29541292104c53cb7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.




