[GitHub] [spark] ReachInfi commented on pull request #33314: Add bitmap functions in Spark SQL
ReachInfi commented on pull request #33314: URL: https://github.com/apache/spark/pull/33314#issuecomment-878802086 https://issues.apache.org/jira/browse/SPARK-36118 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] wangyum commented on pull request #33286: [SPARK-36079][SQL] Null-based filter estimate should always be in the range [0, 1]
wangyum commented on pull request #33286: URL: https://github.com/apache/spark/pull/33286#issuecomment-878801153 retest this please.
[GitHub] [spark] SparkQA removed a comment on pull request #33311: [SPARK-36109][SS][TEST] Check data after adding data to topic in KafkaSourceStressSuite
SparkQA removed a comment on pull request #33311: URL: https://github.com/apache/spark/pull/33311#issuecomment-878708461 **[Test build #140946 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140946/testReport)** for PR 33311 at commit [`f33cc23`](https://github.com/apache/spark/commit/f33cc23b5e7391dc3aa68c494f4db0a0ad9a8c09).
[GitHub] [spark] SparkQA commented on pull request #33311: [SPARK-36109][SS][TEST] Check data after adding data to topic in KafkaSourceStressSuite
SparkQA commented on pull request #33311: URL: https://github.com/apache/spark/pull/33311#issuecomment-878797982 **[Test build #140946 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140946/testReport)** for PR 33311 at commit [`f33cc23`](https://github.com/apache/spark/commit/f33cc23b5e7391dc3aa68c494f4db0a0ad9a8c09).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] ReachInfi commented on pull request #33314: Add bitmap functions in Spark SQL
ReachInfi commented on pull request #33314: URL: https://github.com/apache/spark/pull/33314#issuecomment-878797178 Ok,tks
[GitHub] [spark] viirya commented on a change in pull request #33311: [SPARK-36109][SS][TEST] Check data after adding data to topic in KafkaSourceStressSuite
viirya commented on a change in pull request #33311: URL: https://github.com/apache/spark/pull/33311#discussion_r668442575
## File path: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala ##
@@ -2429,8 +2429,9 @@ class KafkaSourceStressSuite extends KafkaSourceTest {
         (d, running) => {
           Random.nextInt(5) match {
             case 0 => // Add a new topic
-              topics = topics ++ Seq(newStressTopic)
-              AddKafkaData(topics.toSet, d: _*)(message = s"Add topic $newStressTopic",
Review comment: Yea, that is why the error message looks like:
```
AddKafkaData(topics = Set(stress4, stress6, stress2, stress1, stress5, stress3), data = Range 15 until 20, message = Add topic stress7)
```
`stress7` should be `stress6`.
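The mismatch between the added topic (`stress6`) and the reported one (`stress7`) is what one would expect if `newStressTopic` is a side-effecting `def` that is evaluated once for the topic list and again inside the message string. A minimal sketch of that failure mode, assuming a counter-backed generator; `TopicNames`, `addTopicBuggy`, and `addTopicFixed` are hypothetical names for illustration and are not code from the PR:

```scala
import java.util.concurrent.atomic.AtomicInteger

object StressTopicSketch {
  // Hypothetical stand-in for the suite's generator (assumption):
  // every call advances a counter, so two calls yield two different names.
  final class TopicNames {
    private val nextId = new AtomicInteger(1)
    def newStressTopic: String = s"stress${nextId.getAndIncrement()}"
  }

  /** Buggy shape: the generator runs twice, so the message names a topic that was never added. */
  def addTopicBuggy(names: TopicNames, topics: Seq[String]): (Seq[String], String) = {
    val updated = topics ++ Seq(names.newStressTopic) // first evaluation
    val message = s"Add topic ${names.newStressTopic}" // second evaluation
    (updated, message)
  }

  /** Fixed shape: evaluate once, bind the result, and reuse it. */
  def addTopicFixed(names: TopicNames, topics: Seq[String]): (Seq[String], String) = {
    val topic = names.newStressTopic
    (topics ++ Seq(topic), s"Add topic $topic")
  }
}
```

Binding the generated name to a `val` before using it keeps the topic set and the diagnostic message in agreement, which is the behaviour the review comments ask for.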
[GitHub] [spark] viirya commented on a change in pull request #33311: [SPARK-36109][SS][TEST] Check data after adding data to topic in KafkaSourceStressSuite
viirya commented on a change in pull request #33311: URL: https://github.com/apache/spark/pull/33311#discussion_r668442575
## File path: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala ##
@@ -2429,8 +2429,9 @@ class KafkaSourceStressSuite extends KafkaSourceTest {
         (d, running) => {
           Random.nextInt(5) match {
             case 0 => // Add a new topic
-              topics = topics ++ Seq(newStressTopic)
-              AddKafkaData(topics.toSet, d: _*)(message = s"Add topic $newStressTopic",
Review comment: Yea, that is why the error message looks like:
```
AddKafkaData(topics = Set(stress4, stress6, stress2, stress1, stress5, stress3), data = Range 15 until 20, message = Add topic stress7)
```
[GitHub] [spark] HeartSaVioR commented on a change in pull request #33311: [SPARK-36109][SS][TEST] Check data after adding data to topic in KafkaSourceStressSuite
HeartSaVioR commented on a change in pull request #33311: URL: https://github.com/apache/spark/pull/33311#discussion_r668441780
## File path: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala ##
@@ -2429,8 +2429,9 @@ class KafkaSourceStressSuite extends KafkaSourceTest {
         (d, running) => {
           Random.nextInt(5) match {
             case 0 => // Add a new topic
-              topics = topics ++ Seq(newStressTopic)
-              AddKafkaData(topics.toSet, d: _*)(message = s"Add topic $newStressTopic",
Review comment: Looks like. Nice finding @viirya !
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function
AmplabJenkins removed a comment on pull request #33258: URL: https://github.com/apache/spark/pull/33258#issuecomment-878794471 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140953/
[GitHub] [spark] HeartSaVioR commented on a change in pull request #33311: [SPARK-36109][SS][TEST] Check data after adding data to topic in KafkaSourceStressSuite
HeartSaVioR commented on a change in pull request #33311: URL: https://github.com/apache/spark/pull/33311#discussion_r668441137
## File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala ##
@@ -871,6 +871,9 @@ trait StreamTest extends QueryTest with SharedSparkSession with TimeLimits with
           case r if r < 0.7 => // AddData
             addRandomData()
+            // Must check data after adding data in case we delete the topic with added data
Review comment: It seems odd to explain a Kafka-specific case here, as the logic is generic. Readers would even have to infer which module/suite needs this from the single hint "topic". If we'd like to explain a Kafka-specific issue here, let's make clear which suite requires it.
[GitHub] [spark] AmplabJenkins commented on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function
AmplabJenkins commented on pull request #33258: URL: https://github.com/apache/spark/pull/33258#issuecomment-878794471 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140953/
[GitHub] [spark] SparkQA removed a comment on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function
SparkQA removed a comment on pull request #33258: URL: https://github.com/apache/spark/pull/33258#issuecomment-878749337 **[Test build #140953 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140953/testReport)** for PR 33258 at commit [`8a4d40d`](https://github.com/apache/spark/commit/8a4d40d76912f3fedc5e282b719b4a61e5908a27).
[GitHub] [spark] SparkQA commented on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function
SparkQA commented on pull request #33258: URL: https://github.com/apache/spark/pull/33258#issuecomment-878794174 **[Test build #140953 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140953/testReport)** for PR 33258 at commit [`8a4d40d`](https://github.com/apache/spark/commit/8a4d40d76912f3fedc5e282b719b4a61e5908a27).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class GetTimestamp(`
  * `case class ParseToTimestamp(`
  * `case class MakeTimestampNTZ(`
  * `case class MakeTimestampLTZ(`
  * ` static class IntegerUpdater implements ParquetVectorUpdater `
  * `trait HDFSBackedStateStoreMap `
  * `class NoPrefixHDFSBackedStateStoreMap extends HDFSBackedStateStoreMap `
  * `class PrefixScannableHDFSBackedStateStoreMap(`
  * ` class HDFSBackedReadStateStore(val version: Long, map: HDFSBackedStateStoreMap)`
  * ` class HDFSBackedStateStore(val version: Long, mapToUpdate: HDFSBackedStateStoreMap)`
  * `sealed trait RocksDBStateEncoder `
  * `class PrefixKeyScanStateEncoder(`
  * `class NoPrefixKeyStateEncoder(keySchema: StructType, valueSchema: StructType)`
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33310: [WIP][SPARK-36105][SQL] OptimizeLocalShuffleReader support reading data of multiple mappers in one task
AmplabJenkins removed a comment on pull request #33310: URL: https://github.com/apache/spark/pull/33310#issuecomment-878789756 Can one of the admins verify this patch?
[GitHub] [spark] SparkQA commented on pull request #33078: [SPARK-35546][Shuffle] Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the state in a better
SparkQA commented on pull request #33078: URL: https://github.com/apache/spark/pull/33078#issuecomment-878790692 **[Test build #140956 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140956/testReport)** for PR 33078 at commit [`3eb4cbb`](https://github.com/apache/spark/commit/3eb4cbb448ea5359a7d5f9e8966d7c62ce3ffb54).
[GitHub] [spark] SparkQA commented on pull request #33310: [WIP][SPARK-36105][SQL] OptimizeLocalShuffleReader support reading data of multiple mappers in one task
SparkQA commented on pull request #33310: URL: https://github.com/apache/spark/pull/33310#issuecomment-878790553 **[Test build #140955 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140955/testReport)** for PR 33310 at commit [`e9255b9`](https://github.com/apache/spark/commit/e9255b90cc2ce49a72f8cb147eb99de8f9988f01).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function
AmplabJenkins removed a comment on pull request #33258: URL: https://github.com/apache/spark/pull/33258#issuecomment-878789751 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45467/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33297: [SPARK-36069] from_json's exception should contain field name, type and value
AmplabJenkins removed a comment on pull request #33297: URL: https://github.com/apache/spark/pull/33297#issuecomment-878789749 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45466/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30869: [SPARK-33865][SQL] When HiveDDL, we need check avro schema too
AmplabJenkins removed a comment on pull request #30869: URL: https://github.com/apache/spark/pull/30869#issuecomment-878789750 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45468/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33312: [SPARK-36110][BUILD] Upgrade SBT to 1.5.5
AmplabJenkins removed a comment on pull request #33312: URL: https://github.com/apache/spark/pull/33312#issuecomment-878789748 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140947/
[GitHub] [spark] AmplabJenkins commented on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function
AmplabJenkins commented on pull request #33258: URL: https://github.com/apache/spark/pull/33258#issuecomment-878789751 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45467/
[GitHub] [spark] AmplabJenkins commented on pull request #33297: [SPARK-36069] from_json's exception should contain field name, type and value
AmplabJenkins commented on pull request #33297: URL: https://github.com/apache/spark/pull/33297#issuecomment-878789749 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45466/
[GitHub] [spark] AmplabJenkins commented on pull request #33312: [SPARK-36110][BUILD] Upgrade SBT to 1.5.5
AmplabJenkins commented on pull request #33312: URL: https://github.com/apache/spark/pull/33312#issuecomment-878789748 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140947/
[GitHub] [spark] AmplabJenkins commented on pull request #33310: [WIP][SPARK-36105][SQL] OptimizeLocalShuffleReader support reading data of multiple mappers in one task
AmplabJenkins commented on pull request #33310: URL: https://github.com/apache/spark/pull/33310#issuecomment-878789756 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #30869: [SPARK-33865][SQL] When HiveDDL, we need check avro schema too
AmplabJenkins commented on pull request #30869: URL: https://github.com/apache/spark/pull/30869#issuecomment-878789750 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45468/
[GitHub] [spark] HyukjinKwon commented on pull request #33314: Add bitmap functions in Spark SQL
HyukjinKwon commented on pull request #33314: URL: https://github.com/apache/spark/pull/33314#issuecomment-878784758 @ReachInfi can you file a JIRA and link it to the PR title please? See also https://spark.apache.org/contributing.html
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33311: [SPARK-36109][SS][TEST] Check data after adding data to topic in KafkaSourceStressSuite
dongjoon-hyun commented on a change in pull request #33311: URL: https://github.com/apache/spark/pull/33311#discussion_r668430347
## File path: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala ##
@@ -2429,8 +2429,9 @@ class KafkaSourceStressSuite extends KafkaSourceTest {
         (d, running) => {
           Random.nextInt(5) match {
             case 0 => // Add a new topic
-              topics = topics ++ Seq(newStressTopic)
-              AddKafkaData(topics.toSet, d: _*)(message = s"Add topic $newStressTopic",
Review comment: So, previously, was the `newStressTopic` function invoked twice?
[GitHub] [spark] dongjoon-hyun commented on pull request #33311: [SPARK-36109][SS][TEST] Check data after adding data to topic in KafkaSourceStressSuite
dongjoon-hyun commented on pull request #33311: URL: https://github.com/apache/spark/pull/33311#issuecomment-878782539 Thank you, @viirya .
[GitHub] [spark] dongjoon-hyun commented on pull request #33312: [SPARK-36110][BUILD] Upgrade SBT to 1.5.5
dongjoon-hyun commented on pull request #33312: URL: https://github.com/apache/spark/pull/33312#issuecomment-878781192 +1, LGTM.
[GitHub] [spark] allisonwang-db commented on a change in pull request #33070: [SPARK-35551][SQL] Handle the COUNT bug for lateral subqueries
allisonwang-db commented on a change in pull request #33070: URL: https://github.com/apache/spark/pull/33070#discussion_r668428086
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/DecorrelateInnerQuery.scala ##
@@ -428,7 +451,132 @@ object DecorrelateInnerQuery extends PredicateHelper {
           groupingExpressions = newGroupingExpr ++ referencesToAdd,
           aggregateExpressions = newAggExpr ++ referencesToAdd,
           child = newChild)
-        (newAggregate, joinCond, outerReferenceMap)
+
+        // Preserving domain attributes over an Aggregate with an empty grouping expression
+        // is subject to the "COUNT bug" that can lead to wrong answer:
+        //
+        // Suppose the original query is:
+        //   SELECT a, (SELECT COUNT(*) cnt FROM t2 WHERE t1.a = t2.c) FROM t1
+        //
+        // Decorrelated plan:
+        //   Project [a, scalar-subquery [a = c]]
+        //   :  +- Aggregate [c] [count(*) AS cnt, c]
+        //   :     +- Relation [c, d]
+        //   +- Relation [a, b]
+        //
+        // After rewrite:
+        //   Project [a, cnt]
+        //   +- Join LeftOuter (a = c)
+        //      :- Relation [a, b]
+        //      +- Aggregate [c] [count(*) AS cnt, c]
+        //         +- Relation [c, d]
+        //
+        //     T1          T2          T2' (GROUP BY c)
+        // +---+---+   +---+---+   +---+-----+
+        // | a | b |   | c | d |   | c | cnt |
+        // +---+---+   +---+---+   +---+-----+
+        // | 0 | 1 |   | 0 | 2 |   | 0 | 2   |
+        // | 1 | 2 |   | 0 | 3 |   +---+-----+
+        // +---+---+   +---+---+
+        //
+        // T1 nested loop join T2     T1 left outer join T2'
+        // on (a = c):                on (a = c):
+        // +---+-----+                +---+------+
+        // | a | cnt |                | a | cnt  |
+        // +---+-----+                +---+------+
+        // | 0 | 2   |                | 0 | 2    |
+        // | 1 | 0   | <--- correct   | 1 | null | <--- wrong result
+        // +---+-----+                +---+------+
+        //
+        // If an aggregate is subject to the COUNT bug:
+        // 1) add a column `true AS alwaysTrue` to the result of the aggregate
+        // 2) insert a left outer domain join between the outer query and this aggregate
+        // 3) rewrite the original aggregate's output column using the default value of the
+        //    aggregate function and the alwaysTrue column.
+        //
+        // For example, T1 left outer join T2' with `alwaysTrue` marker:
+        // +---+------+------------+--------------------------------+
+        // | c | cnt  | alwaysTrue | if(isnull(alwaysTrue), 0, cnt) |
+        // +---+------+------------+--------------------------------+
+        // | 0 | 2    | true       | 2                              |
+        // | 0 | null | null       | 0                              | <--- correct result
+        // +---+------+------------+--------------------------------+
+        if (groupingExpressions.isEmpty && handleCountBug) {
+          // Evaluate the aggregate expressions with zero tuples.
+          val resultMap = RewriteCorrelatedScalarSubquery.evalAggregateOnZeroTups(newAggregate)
+          val alwaysTrue = Alias(Literal.TrueLiteral, "alwaysTrue")()
+          val alwaysTrueRef = alwaysTrue.toAttribute.withNullability(true)
+          val expressions = ArrayBuffer.empty[NamedExpression]
+          // Create new aliases for aggregate expressions that have non-null default
+          // values and reconstruct the output with the `alwaysTrue` marker.
+          val projectList = newAggregate.aggregateExpressions.map { a =>
+            resultMap.get(a.exprId) match {
+              // Aggregate expression is not subject to the count bug.
+              case Some(Literal(null, _)) | None =>
+                expressions += a
+                // The attribute is nullable since it is from the right-hand side of a
+                // left outer join.
+                a.toAttribute.withNullability(true)
+              case Some(default) =>
+                val newAttr = a.newInstance()
Review comment: Yes, this holds because if `a` is an attribute, evaluating it with zero tuples will yield null: https://github.com/apache/spark/blob/c46342e3d057bcc949b4caf016514ff05e0a1ebd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala#L407 Another possibility is OuterReference. Let me verify whether an outer reference works in this case.
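The COUNT bug walked through in the quoted code comment can be reproduced with plain Scala collections. The following is only a sketch that mirrors the T1/T2 tables from that comment; it does not use Catalyst, and every name in it (`CountBugSketch`, `correlated`, `naiveLeftOuter`, `withAlwaysTrue`) is hypothetical. `Option` stands in for SQL null, and the `alwaysTrue` marker is modelled as a Boolean that is present only when the left outer join finds a match:

```scala
object CountBugSketch {
  // T1(a, b) and T2(c, d), copied from the tables in the quoted comment.
  val t1: Seq[(Int, Int)] = Seq((0, 1), (1, 2))
  val t2: Seq[(Int, Int)] = Seq((0, 2), (0, 3))

  /** Reference semantics: evaluate the correlated COUNT(*) once per outer row. */
  def correlated: Seq[(Int, Int)] =
    t1.map { case (a, _) => (a, t2.count { case (c, _) => c == a }) }

  // T2' = SELECT c, COUNT(*) AS cnt FROM t2 GROUP BY c
  val t2Grouped: Map[Int, Int] =
    t2.groupBy(_._1).map { case (c, rows) => c -> rows.size }

  /** Naive rewrite: T1 left outer join T2'. Unmatched outer rows get null (None). */
  def naiveLeftOuter: Seq[(Int, Option[Int])] =
    t1.map { case (a, _) => (a, t2Grouped.get(a)) }

  /** Marker-based rewrite: if(isnull(alwaysTrue), 0, cnt). */
  def withAlwaysTrue: Seq[(Int, Int)] =
    t1.map { case (a, _) =>
      // The join yields (cnt, alwaysTrue = true) on a match and null otherwise.
      val joined: Option[(Int, Boolean)] = t2Grouped.get(a).map(cnt => (cnt, true))
      val cnt = joined match {
        case Some((value, _)) => value // marker is non-null: keep the aggregate value
        case None             => 0     // marker is null: COUNT's value on zero rows
      }
      (a, cnt)
    }
}
```

For the outer row with `a = 1`, `correlated` and `withAlwaysTrue` both produce a count of 0, while `naiveLeftOuter` produces `None`, which corresponds to the "wrong result" row in the comment's table.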
[GitHub] [spark] michaelzhang-db removed a comment on pull request #33310: [WIP][SPARK-36105][SQL] OptimizeLocalShuffleReader support reading data of multiple mappers in one task
michaelzhang-db removed a comment on pull request #33310: URL: https://github.com/apache/spark/pull/33310#issuecomment-878780180 ok to test
[GitHub] [spark] SparkQA removed a comment on pull request #33312: [SPARK-36110][BUILD] Upgrade SBT to 1.5.5
SparkQA removed a comment on pull request #33312: URL: https://github.com/apache/spark/pull/33312#issuecomment-878728629 **[Test build #140947 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140947/testReport)** for PR 33312 at commit [`3e00efa`](https://github.com/apache/spark/commit/3e00efaef8c06e5ee15fb1a3bf071aabcf94b8e7).
[GitHub] [spark] michaelzhang-db commented on pull request #33310: [WIP][SPARK-36105][SQL] OptimizeLocalShuffleReader support reading data of multiple mappers in one task
michaelzhang-db commented on pull request #33310: URL: https://github.com/apache/spark/pull/33310#issuecomment-878780180 ok to test
[GitHub] [spark] SparkQA commented on pull request #33312: [SPARK-36110][BUILD] Upgrade SBT to 1.5.5
SparkQA commented on pull request #33312: URL: https://github.com/apache/spark/pull/33312#issuecomment-878779979 **[Test build #140947 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140947/testReport)** for PR 33312 at commit [`3e00efa`](https://github.com/apache/spark/commit/3e00efaef8c06e5ee15fb1a3bf071aabcf94b8e7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function
SparkQA commented on pull request #33258: URL: https://github.com/apache/spark/pull/33258#issuecomment-87895 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45467/
[GitHub] [spark] cloud-fan commented on a change in pull request #33299: [SPARK-36046][SQL] Support new functions make_timestamp_ntz and make_timestamp_ltz
cloud-fan commented on a change in pull request #33299:
URL: https://github.com/apache/spark/pull/33299#discussion_r668423810

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ##
@@ -552,6 +552,8 @@ object FunctionRegistry {
     expression[TimeWindow]("window"),
     expression[MakeDate]("make_date"),
     expression[MakeTimestamp]("make_timestamp"),
+    expression[MakeTimestampNTZ]("make_timestamp_ntz", true),

Review comment:
It's better to implement `prettyName`. The alias here is implemented via `TreeNodeTag`, which is quite unreliable.
[GitHub] [spark] allisonwang-db commented on pull request #33070: [SPARK-35551][SQL] Handle the COUNT bug for lateral subqueries
allisonwang-db commented on pull request #33070: URL: https://github.com/apache/spark/pull/33070#issuecomment-878776521
> Next we should use this fix to solve the count bug for all correlated subqueries.
Created two follow-up issues: SPARK-36113 and SPARK-36115.
[GitHub] [spark] SparkQA commented on pull request #33297: [SPARK-36069] from_json's exception should contain field name, type and value
SparkQA commented on pull request #33297: URL: https://github.com/apache/spark/pull/33297#issuecomment-878775838 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45466/
[GitHub] [spark] SparkQA commented on pull request #30869: [SPARK-33865][SQL] When HiveDDL, we need check avro schema too
SparkQA commented on pull request #30869: URL: https://github.com/apache/spark/pull/30869#issuecomment-878775341 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45468/
[GitHub] [spark] AmplabJenkins commented on pull request #33314: Add bitmap functions in Spark SQL
AmplabJenkins commented on pull request #33314: URL: https://github.com/apache/spark/pull/33314#issuecomment-878767184 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33310: [WIP][SPARK-36105][SQL] OptimizeLocalShuffleReader support reading data of multiple mappers in one task
AmplabJenkins removed a comment on pull request #33310: URL: https://github.com/apache/spark/pull/33310#issuecomment-878767004 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45465/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33293: [SPARK-36076][SQL][3.1] ArrayIndexOutOfBounds in Cast string to times…
AmplabJenkins removed a comment on pull request #33293: URL: https://github.com/apache/spark/pull/33293#issuecomment-878767002 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45464/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33297: [SPARK-36069] from_json's exception should contain field name, type and value
AmplabJenkins removed a comment on pull request #33297: URL: https://github.com/apache/spark/pull/33297#issuecomment-878767000 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45463/
[GitHub] [spark] AmplabJenkins commented on pull request #33297: [SPARK-36069] from_json's exception should contain field name, type and value
AmplabJenkins commented on pull request #33297: URL: https://github.com/apache/spark/pull/33297#issuecomment-878767000 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45463/
[GitHub] [spark] AmplabJenkins commented on pull request #33293: [SPARK-36076][SQL][3.1] ArrayIndexOutOfBounds in Cast string to times…
AmplabJenkins commented on pull request #33293: URL: https://github.com/apache/spark/pull/33293#issuecomment-878767002 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45464/
[GitHub] [spark] AmplabJenkins commented on pull request #33310: [WIP][SPARK-36105][SQL] OptimizeLocalShuffleReader support reading data of multiple mappers in one task
AmplabJenkins commented on pull request #33310: URL: https://github.com/apache/spark/pull/33310#issuecomment-878767004 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45465/
[GitHub] [spark] SparkQA commented on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function
SparkQA commented on pull request #33258: URL: https://github.com/apache/spark/pull/33258#issuecomment-878764568 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45467/
[GitHub] [spark] HyukjinKwon closed pull request #33312: [SPARK-36110][BUILD] Upgrade SBT to 1.5.5
HyukjinKwon closed pull request #33312: URL: https://github.com/apache/spark/pull/33312
[GitHub] [spark] HyukjinKwon closed pull request #33315: [SPARK-36104][PYTHON][FOLLOWUP] Remove unused import "typing.cast"
HyukjinKwon closed pull request #33315: URL: https://github.com/apache/spark/pull/33315
[GitHub] [spark] HyukjinKwon commented on pull request #33312: [SPARK-36110][BUILD] Upgrade SBT to 1.5.5
HyukjinKwon commented on pull request #33312: URL: https://github.com/apache/spark/pull/33312#issuecomment-878763360 Merged to master.
[GitHub] [spark] HyukjinKwon commented on pull request #33315: [SPARK-36104][PYTHON][FOLLOWUP] Remove unused import "typing.cast"
HyukjinKwon commented on pull request #33315: URL: https://github.com/apache/spark/pull/33315#issuecomment-878763205
[GitHub] [spark] SparkQA commented on pull request #33297: [SPARK-36069] from_json's exception should contain field name, type and value
SparkQA commented on pull request #33297: URL: https://github.com/apache/spark/pull/33297#issuecomment-878762536 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45466/
[GitHub] [spark] SparkQA commented on pull request #30869: [SPARK-33865][SQL] When HiveDDL, we need check avro schema too
SparkQA commented on pull request #30869: URL: https://github.com/apache/spark/pull/30869#issuecomment-878762434 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45468/
[GitHub] [spark] sarutak commented on pull request #33315: [SPARK-36104][PYTHON][FOLLOWUP] Remove unused import "typing.cast"
sarutak commented on pull request #33315: URL: https://github.com/apache/spark/pull/33315#issuecomment-878762244 cc: @HyukjinKwon @xinrong-databricks
[GitHub] [spark] sarutak opened a new pull request #33315: [SPARK-36104][PYTHON][FOLLOWUP] Remove unused import "typing.cast"
sarutak opened a new pull request #33315:
URL: https://github.com/apache/spark/pull/33315

### What changes were proposed in this pull request?
This is a followup PR for SPARK-36104 (#33307) and removes unused import `typing.cast`.

### Why are the changes needed?
To recover CI.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI.
[GitHub] [spark] SparkQA commented on pull request #33310: [WIP][SPARK-36105][SQL] OptimizeLocalShuffleReader support reading data of multiple mappers in one task
SparkQA commented on pull request #33310: URL: https://github.com/apache/spark/pull/33310#issuecomment-878761359 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45465/
[GitHub] [spark] Ngone51 commented on a change in pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file
Ngone51 commented on a change in pull request #32401:
URL: https://github.com/apache/spark/pull/32401#discussion_r668409304

## File path: core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala ##
@@ -360,13 +389,41 @@ private[spark] class IndexShuffleBlockResolver(
           if (dataTmp != null && dataTmp.exists() && !dataTmp.renameTo(dataFile)) {
             throw new IOException("fail to rename file " + dataTmp + " to " + dataFile)
           }
+
+          // write the checksum file
+          checksumTmpOpt.zip(checksumFileOpt).foreach { case (checksumTmp, checksumFile) =>
+            val out = new DataOutputStream(
+              new BufferedOutputStream(
+                new FileOutputStream(checksumTmp)
+              )
+            )
+            Utils.tryWithSafeFinally {
+              checksums.foreach(out.writeLong)
+            } {
+              out.close()
+            }
+
+            if (checksumFile.exists()) {
+              checksumFile.delete()
+            }
+            if (!checksumTmp.renameTo(checksumFile)) {
+              // It's not worthwhile to fail here after index file and data file are already
+              // successfully stored due to checksum is only used for the corner error case.
+              logWarning("fail to rename file " + checksumTmp + " to " + checksumFile)
+            }
+          }
         }
       }
     } finally {
       logDebug(s"Shuffle index for mapId $mapId: ${lengths.mkString("[", ",", "]")}")
       if (indexTmp.exists() && !indexTmp.delete()) {
         logError(s"Failed to delete temporary index file at ${indexTmp.getAbsolutePath}")
       }
+      checksumTmpOpt.foreach { checksumTmp =>
+        if (checksumTmp.exists() && !checksumTmp.delete()) {

Review comment:
Good point! We won't propagate the error. I'll handle it.
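The write-then-rename pattern in the diff above can be sketched on its own. This is a simplified illustration, not the PR's code: `ChecksumFileSketch` and `writeChecksums` are hypothetical names, a plain `try`/`finally` replaces Spark's `Utils.tryWithSafeFinally`, and a message on standard error stands in for `logWarning`. As in the diff, a failed rename is reported rather than thrown, since the index and data files are already in place at that point.

```scala
import java.io.{BufferedOutputStream, DataOutputStream, File, FileOutputStream}

object ChecksumFileSketch {
  /** Writes one 8-byte long per partition to `tmp`, then renames `tmp` to `dest`.
    * Returns false, instead of throwing, when the rename fails. */
  def writeChecksums(checksums: Array[Long], tmp: File, dest: File): Boolean = {
    val out = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(tmp)))
    try {
      checksums.foreach(out.writeLong(_))
    } finally {
      out.close() // also flushes the buffered bytes to disk
    }

    // Replace any stale checksum file left by a previous attempt.
    if (dest.exists()) {
      dest.delete()
    }
    val renamed = tmp.renameTo(dest)
    if (!renamed) {
      // Stand-in for logWarning: report the failure without failing the task.
      System.err.println(s"fail to rename file $tmp to $dest")
    }
    renamed
  }
}
```

With three partitions the resulting file is 24 bytes long, one `writeLong` per partition.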
[GitHub] [spark] Ngone51 commented on a change in pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file
Ngone51 commented on a change in pull request #32401:
URL: https://github.com/apache/spark/pull/32401#discussion_r668408590

## File path: core/src/main/java/org/apache/spark/shuffle/checksum/ShuffleChecksumHelper.java ##
@@ -0,0 +1,81 @@
+package org.apache.spark.shuffle.checksum;
+
+import java.util.zip.Adler32;
+import java.util.zip.CRC32;
+import java.util.zip.Checksum;
+
+import org.apache.spark.SparkConf;
+import org.apache.spark.SparkException;
+import org.apache.spark.internal.config.package$;
+import org.apache.spark.storage.ShuffleChecksumBlockId;
+
+public class ShuffleChecksumHelper {

Review comment:
Added the doc. And marked it as private.
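The body of `ShuffleChecksumHelper` is not shown in this message; only its imports of `Adler32`, `CRC32`, and `Checksum` are. As a rough sketch of what selecting a checksum by algorithm name could look like with those JDK classes, assuming hypothetical names `ChecksumFactorySketch`, `create`, and `createPartitionChecksums` (none of which come from the PR):

```scala
import java.util.Locale
import java.util.zip.{Adler32, CRC32, Checksum}

object ChecksumFactorySketch {
  // Maps a case-insensitive algorithm name to a fresh JDK Checksum instance.
  def create(algorithm: String): Checksum =
    algorithm.toUpperCase(Locale.ROOT) match {
      case "ADLER32" => new Adler32()
      case "CRC32"   => new CRC32()
      case other =>
        throw new IllegalArgumentException(s"Unsupported checksum algorithm: $other")
    }

  // One independent Checksum per partition, so values never mix across partitions.
  def createPartitionChecksums(numPartitions: Int, algorithm: String): Array[Checksum] =
    Array.fill(numPartitions)(create(algorithm))
}
```

`Array.fill` evaluates its argument once per element, so each partition gets its own instance rather than a shared one.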
[GitHub] [spark] Ngone51 commented on a change in pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file
Ngone51 commented on a change in pull request #32401:
URL: https://github.com/apache/spark/pull/32401#discussion_r668408329

## File path: core/src/main/scala/org/apache/spark/storage/DiskBlockObjectWriter.scala ##
@@ -101,12 +108,30 @@ private[spark] class DiskBlockObjectWriter(
    */
   private var numRecordsWritten = 0
 
+  /**
+   * Set the checksum that the checksumOutputStream should use
+   */
+  def setChecksum(checksum: Checksum): Unit = {
+    if (checksumOutputStream == null) {
+      this.checksumEnabled = true
+      this.checksum = checksum
+    } else {
+      checksumOutputStream.setChecksum(checksum)

Review comment:
Yes, it's intentional. In the case of `ShuffleExternalSorter` spill, one `DiskBlockObjectWriter` would serve multiple partitions and different partitions should use different checksums.
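The point about one writer serving several partitions can be sketched with a stream whose checksum is swappable. This is an illustration only: `SwappableChecksumOutputStream` is a hypothetical stand-in for the PR's `checksumOutputStream`, whose actual class is not shown here, and `Adler32` is chosen arbitrarily.

```scala
import java.io.{ByteArrayOutputStream, OutputStream}
import java.util.zip.{Adler32, Checksum}

// Hypothetical stand-in: forwards bytes to `out` and feeds them to whichever
// Checksum is currently set.
final class SwappableChecksumOutputStream(out: OutputStream) extends OutputStream {
  private var checksum: Checksum = null

  def setChecksum(c: Checksum): Unit = { checksum = c }

  override def write(b: Int): Unit = {
    out.write(b)
    if (checksum != null) checksum.update(b)
  }

  override def write(b: Array[Byte], off: Int, len: Int): Unit = {
    out.write(b, off, len)
    if (checksum != null) checksum.update(b, off, len)
  }

  override def flush(): Unit = out.flush()
  override def close(): Unit = out.close()
}

object PerPartitionChecksumSketch {
  // Writes every partition through ONE stream, swapping in a fresh checksum
  // before each partition so the values do not accumulate across partitions.
  def checksumPartitions(partitions: Seq[Array[Byte]]): Seq[Long] = {
    val stream = new SwappableChecksumOutputStream(new ByteArrayOutputStream())
    partitions.map { bytes =>
      val checksum = new Adler32()
      stream.setChecksum(checksum)
      stream.write(bytes, 0, bytes.length)
      checksum.getValue
    }
  }
}
```

Two partitions with identical bytes produce identical checksum values here, which would not be the case if a single checksum were shared across the whole stream.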
[GitHub] [spark] ReachInfi opened a new pull request #33314: Add bitmap functions in Spark SQL
ReachInfi opened a new pull request #33314:
URL: https://github.com/apache/spark/pull/33314

### What changes were proposed in this pull request?
Add functions for building bitmaps and computing their cardinality to Spark SQL. If this is OK, I will update function.scala and FunctionRegistry.scala.

### Why are the changes needed?
Bitmaps are used more and more widely, and many frameworks, such as ClickHouse, support them natively.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI. It performs well on billions of rows in our real workloads.
[GitHub] [spark] dgd-contributor commented on a change in pull request #33293: [SPARK-36076][SQL][3.1] ArrayIndexOutOfBounds in Cast string to times…
dgd-contributor commented on a change in pull request #33293:
URL: https://github.com/apache/spark/pull/33293#discussion_r668407288

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ##
@@ -344,29 +344,28 @@ object DateTimeUtils {
         segments(6) /= 10
         digitsMilli -= 1
       }
-      try {
-        val zoneId = tz match {
-          case None => timeZoneId
-          case Some("+") => ZoneOffset.ofHoursMinutes(segments(7), segments(8))
-          case Some("-") => ZoneOffset.ofHoursMinutes(-segments(7), -segments(8))
-          case Some(zoneName: String) => getZoneId(zoneName.trim)
-        }
-        val nanoseconds = MICROSECONDS.toNanos(segments(6))
-        val localTime = LocalTime.of(segments(3), segments(4), segments(5), nanoseconds.toInt)
-        val localDate = if (justTime) {
-          LocalDate.now(zoneId)
-        } else {
-          LocalDate.of(segments(0), segments(1), segments(2))
-        }
-        val localDateTime = LocalDateTime.of(localDate, localTime)
-        val zonedDateTime = ZonedDateTime.of(localDateTime, zoneId)
-        val instant = Instant.from(zonedDateTime)
-        Some(instantToMicros(instant))
-      } catch {
-        case NonFatal(_) => None
+      val zoneId = tz match {
+        case None => timeZoneId
+        case Some("+") => ZoneOffset.ofHoursMinutes(segments(7), segments(8))
+        case Some("-") => ZoneOffset.ofHoursMinutes(-segments(7), -segments(8))
+        case Some(zoneName: String) => getZoneId(zoneName.trim)
+      }
+      val nanoseconds = MICROSECONDS.toNanos(segments(6))
+      val localTime = LocalTime.of(segments(3), segments(4), segments(5), nanoseconds.toInt)
+      val localDate = if (justTime) {
+        LocalDate.now(zoneId)
+      } else {
+        LocalDate.of(segments(0), segments(1), segments(2))
       }
+      val localDateTime = LocalDateTime.of(localDate, localTime)
+      val zonedDateTime = ZonedDateTime.of(localDateTime, zoneId)
+      val instant = Instant.from(zonedDateTime)
+      Some(instantToMicros(instant))
+    } catch {
+      case NonFatal(_) => None
     }
+

Review comment:
Thanks, done!
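The diff above moves the `catch` so that it closes a wider `try`, which fits the PR title's `ArrayIndexOutOfBounds` in the string-to-timestamp cast: index errors raised before the old `try` would have escaped. A minimal sketch of the same idea on a much smaller parser, assuming a hypothetical `parseOffset` helper that is not part of `DateTimeUtils`:

```scala
import java.time.ZoneOffset
import scala.util.control.NonFatal

object SafeParseSketch {
  // Parses an offset such as "+05:30" or "-08:00". Every step, including the
  // indexing into `parts`, sits inside the try so malformed input yields None.
  def parseOffset(s: String): Option[ZoneOffset] =
    try {
      val sign = s.charAt(0) match {
        case '+' => 1
        case '-' => -1
      }
      val parts = s.substring(1).split(":")
      // parts(1) throws ArrayIndexOutOfBoundsException for input like "+05".
      Some(ZoneOffset.ofHoursMinutes(sign * parts(0).toInt, sign * parts(1).toInt))
    } catch {
      case NonFatal(_) => None
    }
}
```

Here `parseOffset("+05")` and `parseOffset("")` both return `None` instead of throwing, because the index and match failures are caught alongside the `java.time` validation errors.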
[GitHub] [spark] SparkQA commented on pull request #33297: [SPARK-36069] from_json's exception should contain field name, type and value
SparkQA commented on pull request #33297: URL: https://github.com/apache/spark/pull/33297#issuecomment-878756758 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45463/
[GitHub] [spark] SparkQA commented on pull request #33293: [SPARK-36076][SQL][3.1] ArrayIndexOutOfBounds in Cast string to times…
SparkQA commented on pull request #33293: URL: https://github.com/apache/spark/pull/33293#issuecomment-878754781 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45464/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33293: [SPARK-36076][SQL][3.1] ArrayIndexOutOfBounds in Cast string to times…
AmplabJenkins removed a comment on pull request #33293: URL: https://github.com/apache/spark/pull/33293#issuecomment-878753027 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45460/
[GitHub] [spark] AmplabJenkins commented on pull request #33293: [SPARK-36076][SQL][3.1] ArrayIndexOutOfBounds in Cast string to times…
AmplabJenkins commented on pull request #33293: URL: https://github.com/apache/spark/pull/33293#issuecomment-878753027 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45460/
[GitHub] [spark] SparkQA commented on pull request #33293: [SPARK-36076][SQL][3.1] ArrayIndexOutOfBounds in Cast string to times…
SparkQA commented on pull request #33293: URL: https://github.com/apache/spark/pull/33293#issuecomment-878753012 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45460/
[GitHub] [spark] beliefer commented on a change in pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function
beliefer commented on a change in pull request #33258:
URL: https://github.com/apache/spark/pull/33258#discussion_r668401909

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ##
@@ -236,6 +274,8 @@ case class CurrentBatchTimestamp(
     val timestampUs = millisToMicros(timestampMs)
     dataType match {
       case _: TimestampType => Literal(timestampUs, TimestampType)
+      case _: TimestampNTZType =>
+        Literal(convertTz(timestampUs, ZoneOffset.UTC, zoneId), TimestampNTZType)

Review comment:
I will add a new test case in another PR.
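The added `TimestampNTZType` branch converts the batch timestamp from UTC into the session zone before building the literal. A minimal sketch of that kind of conversion follows, assuming the intent is to store the wall-clock time seen in `zone` as microseconds. `toLocalMicros` is a hypothetical helper written with `java.time` only; it is not Spark's `convertTz` and may differ from it in edge cases.

```scala
import java.time.{Instant, LocalDateTime, ZoneId, ZoneOffset}
import java.time.temporal.ChronoUnit

object LocalTimestampSketch {
  // Hypothetical helper (not Spark's convertTz): take an instant given as
  // microseconds since the epoch, read the wall clock in `zone`, and encode
  // that local date-time as microseconds counted from 1970-01-01T00:00.
  def toLocalMicros(epochMicros: Long, zone: ZoneId): Long = {
    val instant = Instant.EPOCH.plus(epochMicros, ChronoUnit.MICROS)
    val local = LocalDateTime.ofInstant(instant, zone)
    val localEpoch = LocalDateTime.ofEpochSecond(0L, 0, ZoneOffset.UTC)
    ChronoUnit.MICROS.between(localEpoch, local)
  }
}
```

With a fixed offset of +08:00, the epoch instant maps to eight hours' worth of microseconds, because the local wall clock is eight hours ahead of UTC.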
[GitHub] [spark] SparkQA removed a comment on pull request #30869: [SPARK-33865][SQL] When HiveDDL, we need check avro schema too
SparkQA removed a comment on pull request #30869: URL: https://github.com/apache/spark/pull/30869#issuecomment-878749770 **[Test build #140954 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140954/testReport)** for PR 30869 at commit [`9654220`](https://github.com/apache/spark/commit/9654220aa5230171e06ac377a24acbc247bb66c7).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33310: [WIP][SPARK-36105][SQL] OptimizeLocalShuffleReader support reading data of multiple mappers in one task
AmplabJenkins removed a comment on pull request #33310: URL: https://github.com/apache/spark/pull/33310#issuecomment-878685125 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30869: [SPARK-33865][SQL] When HiveDDL, we need check avro schema too
AmplabJenkins removed a comment on pull request #30869: URL: https://github.com/apache/spark/pull/30869#issuecomment-878750245 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140954/
[GitHub] [spark] Ngone51 commented on a change in pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file
Ngone51 commented on a change in pull request #32401:
URL: https://github.com/apache/spark/pull/32401#discussion_r668400943

## File path: core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala ##
@@ -360,13 +389,41 @@ private[spark] class IndexShuffleBlockResolver(
         if (dataTmp != null && dataTmp.exists() && !dataTmp.renameTo(dataFile)) {
           throw new IOException("fail to rename file " + dataTmp + " to " + dataFile)
         }
+
+        // write the checksum file
+        checksumTmpOpt.zip(checksumFileOpt).foreach { case (checksumTmp, checksumFile) =>
+          val out = new DataOutputStream(
+            new BufferedOutputStream(
+              new FileOutputStream(checksumTmp)
+            )
+          )
+          Utils.tryWithSafeFinally {
+            checksums.foreach(out.writeLong)
+          } {
+            out.close()
+          }
+
+          if (checksumFile.exists()) {
+            checksumFile.delete()
+          }
+          if (!checksumTmp.renameTo(checksumFile)) {
+            // It's not worthwhile to fail here after index file and data file are already
+            // successfully stored due to checksum is only used for the corner error case.
+            logWarning("fail to rename file " + checksumTmp + " to " + checksumFile)

Review comment:
I see. I got your point. I'd prefer to go back to `if (existingLengths != null) {`.
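The hunk under review writes the checksums to a temporary file and then renames it over the final checksum file, treating a failed rename as a warning rather than an error. A standalone sketch of that write-then-rename pattern follows, using only `java.io`. `ChecksumFileSketch` and `writeChecksums` are hypothetical names, and a plain `try`/`finally` stands in for Spark's `Utils.tryWithSafeFinally`.

```scala
import java.io.{BufferedOutputStream, DataOutputStream, File, FileOutputStream}

object ChecksumFileSketch {
  // Hypothetical helper: write each checksum as an 8-byte long to `tmp`,
  // then move `tmp` over `target`. Returns false if the rename fails so the
  // caller can decide whether to log a warning or raise an error.
  def writeChecksums(checksums: Array[Long], tmp: File, target: File): Boolean = {
    val out = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(tmp)))
    try {
      checksums.foreach(c => out.writeLong(c))
    } finally {
      out.close()
    }
    // Remove a stale target first, mirroring the exists()/delete() step in the diff.
    if (target.exists()) {
      target.delete()
    }
    tmp.renameTo(target)
  }
}
```

Returning the rename result instead of throwing leaves the policy to the caller, which is the point being discussed in the review thread.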
[GitHub] [spark] AmplabJenkins commented on pull request #30869: [SPARK-33865][SQL] When HiveDDL, we need check avro schema too
AmplabJenkins commented on pull request #30869: URL: https://github.com/apache/spark/pull/30869#issuecomment-878750245 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140954/
[GitHub] [spark] SparkQA commented on pull request #30869: [SPARK-33865][SQL] When HiveDDL, we need check avro schema too
SparkQA commented on pull request #30869: URL: https://github.com/apache/spark/pull/30869#issuecomment-878750232 **[Test build #140954 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140954/testReport)** for PR 30869 at commit [`9654220`](https://github.com/apache/spark/commit/9654220aa5230171e06ac377a24acbc247bb66c7).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #30869: [SPARK-33865][SQL] When HiveDDL, we need check avro schema too
SparkQA commented on pull request #30869: URL: https://github.com/apache/spark/pull/30869#issuecomment-878749770 **[Test build #140954 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140954/testReport)** for PR 30869 at commit [`9654220`](https://github.com/apache/spark/commit/9654220aa5230171e06ac377a24acbc247bb66c7).
[GitHub] [spark] SparkQA commented on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function
SparkQA commented on pull request #33258: URL: https://github.com/apache/spark/pull/33258#issuecomment-878749337 **[Test build #140953 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140953/testReport)** for PR 33258 at commit [`8a4d40d`](https://github.com/apache/spark/commit/8a4d40d76912f3fedc5e282b719b4a61e5908a27).
[GitHub] [spark] SparkQA commented on pull request #33310: [WIP][SPARK-36105][SQL] OptimizeLocalShuffleReader support reading data of multiple mappers in one task
SparkQA commented on pull request #33310: URL: https://github.com/apache/spark/pull/33310#issuecomment-878749275 **[Test build #140951 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140951/testReport)** for PR 33310 at commit [`f004267`](https://github.com/apache/spark/commit/f004267282c600e719d4a67a79618c525e27ec6c).
[GitHub] [spark] SparkQA commented on pull request #33297: [SPARK-36069] from_json's exception should contain field name, type and value
SparkQA commented on pull request #33297: URL: https://github.com/apache/spark/pull/33297#issuecomment-878749259 **[Test build #140952 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140952/testReport)** for PR 33297 at commit [`e0700c6`](https://github.com/apache/spark/commit/e0700c692e0fe62a2a0cd9b7f3fde01e2ce50603).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33311: [SPARK-36109][SS][TEST] Check data after adding data to topic in KafkaSourceStressSuite
AmplabJenkins removed a comment on pull request #33311: URL: https://github.com/apache/spark/pull/33311#issuecomment-878748105 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45459/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33287: [SPARK-36081][SPARK-36066][SQL] Update the document about the behavior change of trimming characters for cast
AmplabJenkins removed a comment on pull request #33287: URL: https://github.com/apache/spark/pull/33287#issuecomment-878748106 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140945/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32949: [SPARK-35749][SPARK-35773][SQL] Parse unit list interval literals as tightest year-month/day-time interval types
AmplabJenkins removed a comment on pull request #32949: URL: https://github.com/apache/spark/pull/32949#issuecomment-878748107 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140939/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33312: [SPARK-36110][BUILD] Upgrade SBT to 1.5.5
AmplabJenkins removed a comment on pull request #33312: URL: https://github.com/apache/spark/pull/33312#issuecomment-878748108 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45462/
[GitHub] [spark] AmplabJenkins commented on pull request #32949: [SPARK-35749][SPARK-35773][SQL] Parse unit list interval literals as tightest year-month/day-time interval types
AmplabJenkins commented on pull request #32949: URL: https://github.com/apache/spark/pull/32949#issuecomment-878748107 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140939/
[GitHub] [spark] AmplabJenkins commented on pull request #33287: [SPARK-36081][SPARK-36066][SQL] Update the document about the behavior change of trimming characters for cast
AmplabJenkins commented on pull request #33287: URL: https://github.com/apache/spark/pull/33287#issuecomment-878748106 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140945/
[GitHub] [spark] AmplabJenkins commented on pull request #33312: [SPARK-36110][BUILD] Upgrade SBT to 1.5.5
AmplabJenkins commented on pull request #33312: URL: https://github.com/apache/spark/pull/33312#issuecomment-878748108 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45462/
[GitHub] [spark] AmplabJenkins commented on pull request #33311: [SPARK-36109][SS][TEST] Check data after adding data to topic in KafkaSourceStressSuite
AmplabJenkins commented on pull request #33311: URL: https://github.com/apache/spark/pull/33311#issuecomment-878748105 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45459/
[GitHub] [spark] SparkQA commented on pull request #33312: [SPARK-36110][BUILD] Upgrade SBT to 1.5.5
SparkQA commented on pull request #33312: URL: https://github.com/apache/spark/pull/33312#issuecomment-878747011 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45462/
[GitHub] [spark] SparkQA commented on pull request #33297: [SPARK-36069] from_json's exception should contain field name, type and value
SparkQA commented on pull request #33297: URL: https://github.com/apache/spark/pull/33297#issuecomment-878746639 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45463/
[GitHub] [spark] SparkQA commented on pull request #33293: [SPARK-36076][SQL][3.1] ArrayIndexOutOfBounds in Cast string to times…
SparkQA commented on pull request #33293: URL: https://github.com/apache/spark/pull/33293#issuecomment-878744867 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45464/
[GitHub] [spark] HyukjinKwon closed pull request #33302: Revert "[SPARK-35253][SPARK-35398][SQL][BUILD] Bump up the janino version to v3.1.4"
HyukjinKwon closed pull request #33302: URL: https://github.com/apache/spark/pull/33302
[GitHub] [spark] HyukjinKwon commented on pull request #33302: Revert "[SPARK-35253][SPARK-35398][SQL][BUILD] Bump up the janino version to v3.1.4"
HyukjinKwon commented on pull request #33302: URL: https://github.com/apache/spark/pull/33302#issuecomment-878744589 Merged to master and branch-3.2.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #33310: [WIP][SPARK-36105][SQL] OptimizeLocalShuffleReader support reading data of multiple mappers in one task
HyukjinKwon commented on a change in pull request #33310:
URL: https://github.com/apache/spark/pull/33310#discussion_r668394926

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ShuffledRowRDD.scala ##
@@ -58,6 +58,11 @@ case class PartialMapperPartitionSpec(
     startReducerIndex: Int,
     endReducerIndex: Int) extends ShufflePartitionSpec
 
+case class CoalescedMapperPartitionSpec(
+  startMapIndex: Int,
+  endMapIndex: Int,
+  numReducers: Int) extends ShufflePartitionSpec

Review comment:
```suggestion
    startMapIndex: Int,
    endMapIndex: Int,
    numReducers: Int) extends ShufflePartitionSpec
```
[GitHub] [spark] HyukjinKwon commented on pull request #33310: [WIP][SPARK-36105][SQL] OptimizeLocalShuffleReader support reading data of multiple mappers in one task
HyukjinKwon commented on pull request #33310: URL: https://github.com/apache/spark/pull/33310#issuecomment-878743794 ok to test
[GitHub] [spark] SparkQA removed a comment on pull request #33287: [SPARK-36081][SPARK-36066][SQL] Update the document about the behavior change of trimming characters for cast
SparkQA removed a comment on pull request #33287: URL: https://github.com/apache/spark/pull/33287#issuecomment-878687971 **[Test build #140945 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140945/testReport)** for PR 33287 at commit [`168f3c8`](https://github.com/apache/spark/commit/168f3c8cce5a8a4bca4c4603f7a4dc3d7683c50b).
[GitHub] [spark] SparkQA commented on pull request #33287: [SPARK-36081][SPARK-36066][SQL] Update the document about the behavior change of trimming characters for cast
SparkQA commented on pull request #33287: URL: https://github.com/apache/spark/pull/33287#issuecomment-878743114 **[Test build #140945 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140945/testReport)** for PR 33287 at commit [`168f3c8`](https://github.com/apache/spark/commit/168f3c8cce5a8a4bca4c4603f7a4dc3d7683c50b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] HyukjinKwon commented on pull request #33293: [SPARK-36076][SQL][3.1] ArrayIndexOutOfBounds in Cast string to times…
HyukjinKwon commented on pull request #33293: URL: https://github.com/apache/spark/pull/33293#issuecomment-878742789 cc @gengliangwang
[GitHub] [spark] HyukjinKwon commented on a change in pull request #33293: [SPARK-36076][SQL][3.1] ArrayIndexOutOfBounds in Cast string to times…
HyukjinKwon commented on a change in pull request #33293:
URL: https://github.com/apache/spark/pull/33293#discussion_r668394064

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ##
@@ -344,29 +344,28 @@ object DateTimeUtils {
         segments(6) /= 10
         digitsMilli -= 1
       }
-      try {
-        val zoneId = tz match {
-          case None => timeZoneId
-          case Some("+") => ZoneOffset.ofHoursMinutes(segments(7), segments(8))
-          case Some("-") => ZoneOffset.ofHoursMinutes(-segments(7), -segments(8))
-          case Some(zoneName: String) => getZoneId(zoneName.trim)
-        }
-        val nanoseconds = MICROSECONDS.toNanos(segments(6))
-        val localTime = LocalTime.of(segments(3), segments(4), segments(5), nanoseconds.toInt)
-        val localDate = if (justTime) {
-          LocalDate.now(zoneId)
-        } else {
-          LocalDate.of(segments(0), segments(1), segments(2))
-        }
-        val localDateTime = LocalDateTime.of(localDate, localTime)
-        val zonedDateTime = ZonedDateTime.of(localDateTime, zoneId)
-        val instant = Instant.from(zonedDateTime)
-        Some(instantToMicros(instant))
-      } catch {
-        case NonFatal(_) => None
+      val zoneId = tz match {
+        case None => timeZoneId
+        case Some("+") => ZoneOffset.ofHoursMinutes(segments(7), segments(8))
+        case Some("-") => ZoneOffset.ofHoursMinutes(-segments(7), -segments(8))
+        case Some(zoneName: String) => getZoneId(zoneName.trim)
+      }
+      val nanoseconds = MICROSECONDS.toNanos(segments(6))
+      val localTime = LocalTime.of(segments(3), segments(4), segments(5), nanoseconds.toInt)
+      val localDate = if (justTime) {
+        LocalDate.now(zoneId)
+      } else {
+        LocalDate.of(segments(0), segments(1), segments(2))
       }
+      val localDateTime = LocalDateTime.of(localDate, localTime)
+      val zonedDateTime = ZonedDateTime.of(localDateTime, zoneId)
+      val instant = Instant.from(zonedDateTime)
+      Some(instantToMicros(instant))
+    } catch {
+      case NonFatal(_) => None
     }
+

Review comment:
can you remove this?
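The diff above moves the closing `catch` outward so that a larger part of the parsing is covered by `case NonFatal(_) => None`. The sketch below illustrates the same idea on a much smaller problem, assuming a simplified `+HH:MM` / `-HH:MM` input. `SafeParseSketch` and `parseOffset` are hypothetical and are not part of `DateTimeUtils`.

```scala
import java.time.ZoneOffset
import scala.util.control.NonFatal

object SafeParseSketch {
  // Hypothetical parser for offsets such as "+08:30". Every step that can
  // throw, including the array indexing, sits inside the try block, so
  // malformed input yields None instead of an exception.
  def parseOffset(s: String): Option[ZoneOffset] = {
    try {
      val sign = if (s.charAt(0) == '-') -1 else 1
      val parts = s.substring(1).split(":")
      Some(ZoneOffset.ofHoursMinutes(sign * parts(0).toInt, sign * parts(1).toInt))
    } catch {
      case NonFatal(_) => None
    }
  }
}
```

An input like `"+08"` has no minutes segment, so `parts(1)` throws `ArrayIndexOutOfBoundsException`; because the indexing is inside the `try`, the caller sees `None`.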
[GitHub] [spark] HyukjinKwon closed pull request #33307: [SPARK-36104][PYTHON] Manage InternalField in DataTypeOps.neg/abs
HyukjinKwon closed pull request #33307: URL: https://github.com/apache/spark/pull/33307
[GitHub] [spark] HyukjinKwon commented on pull request #33307: [SPARK-36104][PYTHON] Manage InternalField in DataTypeOps.neg/abs
HyukjinKwon commented on pull request #33307: URL: https://github.com/apache/spark/pull/33307#issuecomment-878741636 Merged to master and branch-3.2.
[GitHub] [spark] SparkQA commented on pull request #33293: [SPARK-36076][SQL][3.1] ArrayIndexOutOfBounds in Cast string to times…
SparkQA commented on pull request #33293: URL: https://github.com/apache/spark/pull/33293#issuecomment-878741530 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45460/
[GitHub] [spark] dgd-contributor commented on a change in pull request #33291: [SPARK-35561][SQL] Remove leading zeros from empty static number type partition
dgd-contributor commented on a change in pull request #33291:
URL: https://github.com/apache/spark/pull/33291#discussion_r668391894

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ##
@@ -351,10 +351,24 @@ object PartitioningUtils {
    */
   def getPathFragment(spec: TablePartitionSpec, partitionSchema: StructType): String = {

Review comment:
`PartitioningUtils.parsePartitions` uses `castPartValueToDesiredType` to cast several types of partition values. I think that for this issue we only need to cast the number types (or remove leading zeros); the other types will be handled by `org.apache.spark.sql.catalyst.expressions.CastBase`.
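To make the "remove leading zeros" idea concrete, here is a minimal sketch that normalises an integral partition value supplied as a string. `PartitionValueSketch` and `normalizeIntegral` are hypothetical names used only for illustration; the PR's actual change in `getPathFragment` may work differently.

```scala
object PartitionValueSketch {
  // Hypothetical helper: parse the value as an arbitrary-precision integer
  // and print it back, which drops leading zeros ("001" becomes "1").
  // Input that is not an integer is returned unchanged.
  def normalizeIntegral(value: String): String = {
    try {
      BigInt(value).toString
    } catch {
      case _: NumberFormatException => value
    }
  }
}
```

Going through `BigInt` rather than stripping `'0'` characters keeps `"0"` and `"000"` as `"0"` and leaves non-numeric partition values alone.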