[GitHub] [spark] ReachInfi commented on pull request #33314: Add bitmap functions in Spark SQL

2021-07-12 Thread GitBox


ReachInfi commented on pull request #33314:
URL: https://github.com/apache/spark/pull/33314#issuecomment-878802086


   https://issues.apache.org/jira/browse/SPARK-36118


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum commented on pull request #33286: [SPARK-36079][SQL] Null-based filter estimate should always be in the range [0, 1]

2021-07-12 Thread GitBox


wangyum commented on pull request #33286:
URL: https://github.com/apache/spark/pull/33286#issuecomment-878801153


   retest this please.





[GitHub] [spark] SparkQA removed a comment on pull request #33311: [SPARK-36109][SS][TEST] Check data after adding data to topic in KafkaSourceStressSuite

2021-07-12 Thread GitBox


SparkQA removed a comment on pull request #33311:
URL: https://github.com/apache/spark/pull/33311#issuecomment-878708461


   **[Test build #140946 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140946/testReport)** for PR 33311 at commit [`f33cc23`](https://github.com/apache/spark/commit/f33cc23b5e7391dc3aa68c494f4db0a0ad9a8c09).





[GitHub] [spark] SparkQA commented on pull request #33311: [SPARK-36109][SS][TEST] Check data after adding data to topic in KafkaSourceStressSuite

2021-07-12 Thread GitBox


SparkQA commented on pull request #33311:
URL: https://github.com/apache/spark/pull/33311#issuecomment-878797982


   **[Test build #140946 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140946/testReport)** for PR 33311 at commit [`f33cc23`](https://github.com/apache/spark/commit/f33cc23b5e7391dc3aa68c494f4db0a0ad9a8c09).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] ReachInfi commented on pull request #33314: Add bitmap functions in Spark SQL

2021-07-12 Thread GitBox


ReachInfi commented on pull request #33314:
URL: https://github.com/apache/spark/pull/33314#issuecomment-878797178


   OK, thanks.





[GitHub] [spark] viirya commented on a change in pull request #33311: [SPARK-36109][SS][TEST] Check data after adding data to topic in KafkaSourceStressSuite

2021-07-12 Thread GitBox


viirya commented on a change in pull request #33311:
URL: https://github.com/apache/spark/pull/33311#discussion_r668442575



##
File path: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala
##
@@ -2429,8 +2429,9 @@ class KafkaSourceStressSuite extends KafkaSourceTest {
   (d, running) => {
 Random.nextInt(5) match {
   case 0 => // Add a new topic
-topics = topics ++ Seq(newStressTopic)
-AddKafkaData(topics.toSet, d: _*)(message = s"Add topic $newStressTopic",

Review comment:
   Yeah, that is why the error message looks like this:
   
   
   ```
   AddKafkaData(topics = Set(stress4, stress6, stress2, stress1, stress5, stress3), data = Range 15 until 20, message = Add topic stress7)
   ```
   
   `stress7` should be `stress6`.
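   A minimal Scala sketch of the double evaluation discussed above, assuming (as the thread suggests) that `newStressTopic` is a parameterless `def` that advances a counter on every reference. The counter, its starting value, and the topic list here are hypothetical and not copied from `KafkaSourceStressSuite`. Referencing the `def` once for the topic set and again inside the interpolated message yields two different names, which matches the `stress6`/`stress7` mismatch in the error message; binding the result to a `val` once avoids it.

```scala
// Hypothetical stand-in for the suite's topic-name generator: a parameterless
// `def` that bumps a counter every time it is referenced.
var topicId = 5
def newStressTopic: String = {
  topicId += 1
  s"stress$topicId"
}

val initial = Seq("stress1", "stress2", "stress3", "stress4", "stress5")

// Buggy pattern: the generator is evaluated twice, once for the topic set
// and once inside the interpolated message, so the two names disagree.
val buggyTopics = initial ++ Seq(newStressTopic) // adds "stress6"
val buggyMessage = s"Add topic $newStressTopic"  // reports "stress7"

// Fixed pattern: evaluate the generator once and reuse the value.
topicId = 5
val newTopic = newStressTopic
val fixedTopics = initial ++ Seq(newTopic)
val fixedMessage = s"Add topic $newTopic"
```

   The same reasoning applies to any side-effecting parameterless method used inside `s"..."`: the interpolator simply evaluates the expression again.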
   
   







[GitHub] [spark] viirya commented on a change in pull request #33311: [SPARK-36109][SS][TEST] Check data after adding data to topic in KafkaSourceStressSuite

2021-07-12 Thread GitBox


viirya commented on a change in pull request #33311:
URL: https://github.com/apache/spark/pull/33311#discussion_r668442575



##
File path: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala
##
@@ -2429,8 +2429,9 @@ class KafkaSourceStressSuite extends KafkaSourceTest {
   (d, running) => {
 Random.nextInt(5) match {
   case 0 => // Add a new topic
-topics = topics ++ Seq(newStressTopic)
-AddKafkaData(topics.toSet, d: _*)(message = s"Add topic $newStressTopic",

Review comment:
   Yeah, that is why the error message looks like this:
   
   
   ```
   AddKafkaData(topics = Set(stress4, stress6, stress2, stress1, stress5, stress3), data = Range 15 until 20, message = Add topic stress7)
   ```
   
   







[GitHub] [spark] HeartSaVioR commented on a change in pull request #33311: [SPARK-36109][SS][TEST] Check data after adding data to topic in KafkaSourceStressSuite

2021-07-12 Thread GitBox


HeartSaVioR commented on a change in pull request #33311:
URL: https://github.com/apache/spark/pull/33311#discussion_r668441780



##
File path: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala
##
@@ -2429,8 +2429,9 @@ class KafkaSourceStressSuite extends KafkaSourceTest {
   (d, running) => {
 Random.nextInt(5) match {
   case 0 => // Add a new topic
-topics = topics ++ Seq(newStressTopic)
-AddKafkaData(topics.toSet, d: _*)(message = s"Add topic $newStressTopic",

Review comment:
   Looks like it. Nice finding, @viirya!







[GitHub] [spark] AmplabJenkins removed a comment on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function

2021-07-12 Thread GitBox


AmplabJenkins removed a comment on pull request #33258:
URL: https://github.com/apache/spark/pull/33258#issuecomment-878794471


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140953/
   





[GitHub] [spark] HeartSaVioR commented on a change in pull request #33311: [SPARK-36109][SS][TEST] Check data after adding data to topic in KafkaSourceStressSuite

2021-07-12 Thread GitBox


HeartSaVioR commented on a change in pull request #33311:
URL: https://github.com/apache/spark/pull/33311#discussion_r668441137



##
File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala
##
@@ -871,6 +871,9 @@ trait StreamTest extends QueryTest with SharedSparkSession with TimeLimits with
 
   case r if r < 0.7 => // AddData
 addRandomData()
+// Must check data after adding data in case we delete the topic with added data

Review comment:
   It seems odd to explain a Kafka-specific issue here, as the logic is generic. We even have to infer which module/suite needs this from the single hint "topic".
   
   If we'd like to explain the Kafka-specific issue here, let's make clear which suite requires it.
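   To make the ordering in the new comment concrete, here is a minimal Scala sketch of a randomized action generator in which every `AddData` is immediately followed by a `CheckAnswer`, so the expected rows are verified before a later action (for example, deleting a topic in the Kafka stress suite) could discard them. The `Action` types and `DeleteTopic` are hypothetical simplifications that only borrow the action names; they are not Spark's `StreamTest` API.

```scala
import scala.util.Random

// Hypothetical, simplified model of a randomized streaming test; these types
// only mimic the names of StreamTest actions and are not Spark's API.
sealed trait Action
final case class AddData(values: Seq[Int]) extends Action
final case class CheckAnswer(expected: Seq[Int]) extends Action
case object DeleteTopic extends Action

def randomActions(n: Int, rng: Random): Seq[Action] = {
  val actions = Seq.newBuilder[Action]
  var expected = Vector.empty[Int]
  for (_ <- 0 until n) {
    if (rng.nextDouble() < 0.7) {
      val values = Seq.fill(3)(rng.nextInt(100))
      expected = expected ++ values
      actions += AddData(values)
      // Verify right away, before a later action can remove the data source.
      actions += CheckAnswer(expected)
    } else {
      actions += DeleteTopic
    }
  }
  actions.result()
}
```

   Because the check lives in the generic generator, a comment there naming the suite that needs it (as suggested above) keeps the Kafka-specific motivation discoverable.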







[GitHub] [spark] AmplabJenkins commented on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function

2021-07-12 Thread GitBox


AmplabJenkins commented on pull request #33258:
URL: https://github.com/apache/spark/pull/33258#issuecomment-878794471


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140953/
   





[GitHub] [spark] SparkQA removed a comment on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function

2021-07-12 Thread GitBox


SparkQA removed a comment on pull request #33258:
URL: https://github.com/apache/spark/pull/33258#issuecomment-878749337


   **[Test build #140953 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140953/testReport)** for PR 33258 at commit [`8a4d40d`](https://github.com/apache/spark/commit/8a4d40d76912f3fedc5e282b719b4a61e5908a27).





[GitHub] [spark] SparkQA commented on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function

2021-07-12 Thread GitBox


SparkQA commented on pull request #33258:
URL: https://github.com/apache/spark/pull/33258#issuecomment-878794174


   **[Test build #140953 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140953/testReport)** for PR 33258 at commit [`8a4d40d`](https://github.com/apache/spark/commit/8a4d40d76912f3fedc5e282b719b4a61e5908a27).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `case class GetTimestamp(`
 * `case class ParseToTimestamp(`
 * `case class MakeTimestampNTZ(`
 * `case class MakeTimestampLTZ(`
 * `  static class IntegerUpdater implements ParquetVectorUpdater `
 * `trait HDFSBackedStateStoreMap `
 * `class NoPrefixHDFSBackedStateStoreMap extends HDFSBackedStateStoreMap `
 * `class PrefixScannableHDFSBackedStateStoreMap(`
 * `  class HDFSBackedReadStateStore(val version: Long, map: HDFSBackedStateStoreMap)`
 * `  class HDFSBackedStateStore(val version: Long, mapToUpdate: HDFSBackedStateStoreMap)`
 * `sealed trait RocksDBStateEncoder `
 * `class PrefixKeyScanStateEncoder(`
 * `class NoPrefixKeyStateEncoder(keySchema: StructType, valueSchema: StructType)`





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33310: [WIP][SPARK-36105][SQL] OptimizeLocalShuffleReader support reading data of multiple mappers in one task

2021-07-12 Thread GitBox


AmplabJenkins removed a comment on pull request #33310:
URL: https://github.com/apache/spark/pull/33310#issuecomment-878789756


   Can one of the admins verify this patch?





[GitHub] [spark] SparkQA commented on pull request #33078: [SPARK-35546][Shuffle] Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the state in a better

2021-07-12 Thread GitBox


SparkQA commented on pull request #33078:
URL: https://github.com/apache/spark/pull/33078#issuecomment-878790692


   **[Test build #140956 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140956/testReport)** for PR 33078 at commit [`3eb4cbb`](https://github.com/apache/spark/commit/3eb4cbb448ea5359a7d5f9e8966d7c62ce3ffb54).





[GitHub] [spark] SparkQA commented on pull request #33310: [WIP][SPARK-36105][SQL] OptimizeLocalShuffleReader support reading data of multiple mappers in one task

2021-07-12 Thread GitBox


SparkQA commented on pull request #33310:
URL: https://github.com/apache/spark/pull/33310#issuecomment-878790553


   **[Test build #140955 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140955/testReport)** for PR 33310 at commit [`e9255b9`](https://github.com/apache/spark/commit/e9255b90cc2ce49a72f8cb147eb99de8f9988f01).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function

2021-07-12 Thread GitBox


AmplabJenkins removed a comment on pull request #33258:
URL: https://github.com/apache/spark/pull/33258#issuecomment-878789751


   
   Refer to this link for build results (access rights to CI server needed):
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45467/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33297: [SPARK-36069] from_json's exception should contain field name, type and value

2021-07-12 Thread GitBox


AmplabJenkins removed a comment on pull request #33297:
URL: https://github.com/apache/spark/pull/33297#issuecomment-878789749


   
   Refer to this link for build results (access rights to CI server needed):
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45466/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #30869: [SPARK-33865][SQL] When HiveDDL, we need check avro schema too

2021-07-12 Thread GitBox


AmplabJenkins removed a comment on pull request #30869:
URL: https://github.com/apache/spark/pull/30869#issuecomment-878789750


   
   Refer to this link for build results (access rights to CI server needed):
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45468/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33312: [SPARK-36110][BUILD] Upgrade SBT to 1.5.5

2021-07-12 Thread GitBox


AmplabJenkins removed a comment on pull request #33312:
URL: https://github.com/apache/spark/pull/33312#issuecomment-878789748


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140947/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function

2021-07-12 Thread GitBox


AmplabJenkins commented on pull request #33258:
URL: https://github.com/apache/spark/pull/33258#issuecomment-878789751


   
   Refer to this link for build results (access rights to CI server needed):
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45467/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33297: [SPARK-36069] from_json's exception should contain field name, type and value

2021-07-12 Thread GitBox


AmplabJenkins commented on pull request #33297:
URL: https://github.com/apache/spark/pull/33297#issuecomment-878789749


   
   Refer to this link for build results (access rights to CI server needed):
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45466/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33312: [SPARK-36110][BUILD] Upgrade SBT to 1.5.5

2021-07-12 Thread GitBox


AmplabJenkins commented on pull request #33312:
URL: https://github.com/apache/spark/pull/33312#issuecomment-878789748


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140947/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33310: [WIP][SPARK-36105][SQL] OptimizeLocalShuffleReader support reading data of multiple mappers in one task

2021-07-12 Thread GitBox


AmplabJenkins commented on pull request #33310:
URL: https://github.com/apache/spark/pull/33310#issuecomment-878789756


   Can one of the admins verify this patch?





[GitHub] [spark] AmplabJenkins commented on pull request #30869: [SPARK-33865][SQL] When HiveDDL, we need check avro schema too

2021-07-12 Thread GitBox


AmplabJenkins commented on pull request #30869:
URL: https://github.com/apache/spark/pull/30869#issuecomment-878789750


   
   Refer to this link for build results (access rights to CI server needed):
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45468/
   





[GitHub] [spark] HyukjinKwon commented on pull request #33314: Add bitmap functions in Spark SQL

2021-07-12 Thread GitBox


HyukjinKwon commented on pull request #33314:
URL: https://github.com/apache/spark/pull/33314#issuecomment-878784758


   @ReachInfi, can you file a JIRA and reference it in the PR title, please? See also https://spark.apache.org/contributing.html





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33311: [SPARK-36109][SS][TEST] Check data after adding data to topic in KafkaSourceStressSuite

2021-07-12 Thread GitBox


dongjoon-hyun commented on a change in pull request #33311:
URL: https://github.com/apache/spark/pull/33311#discussion_r668430347



##
File path: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala
##
@@ -2429,8 +2429,9 @@ class KafkaSourceStressSuite extends KafkaSourceTest {
   (d, running) => {
 Random.nextInt(5) match {
   case 0 => // Add a new topic
-topics = topics ++ Seq(newStressTopic)
-AddKafkaData(topics.toSet, d: _*)(message = s"Add topic $newStressTopic",

Review comment:
   So previously, was the `newStressTopic` function invoked twice?







[GitHub] [spark] dongjoon-hyun commented on pull request #33311: [SPARK-36109][SS][TEST] Check data after adding data to topic in KafkaSourceStressSuite

2021-07-12 Thread GitBox


dongjoon-hyun commented on pull request #33311:
URL: https://github.com/apache/spark/pull/33311#issuecomment-878782539


   Thank you, @viirya.





[GitHub] [spark] dongjoon-hyun commented on pull request #33312: [SPARK-36110][BUILD] Upgrade SBT to 1.5.5

2021-07-12 Thread GitBox


dongjoon-hyun commented on pull request #33312:
URL: https://github.com/apache/spark/pull/33312#issuecomment-878781192


   +1, LGTM.





[GitHub] [spark] allisonwang-db commented on a change in pull request #33070: [SPARK-35551][SQL] Handle the COUNT bug for lateral subqueries

2021-07-12 Thread GitBox


allisonwang-db commented on a change in pull request #33070:
URL: https://github.com/apache/spark/pull/33070#discussion_r668428086



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/DecorrelateInnerQuery.scala
##
@@ -428,7 +451,132 @@ object DecorrelateInnerQuery extends PredicateHelper {
   groupingExpressions = newGroupingExpr ++ referencesToAdd,
   aggregateExpressions = newAggExpr ++ referencesToAdd,
   child = newChild)
-(newAggregate, joinCond, outerReferenceMap)
+
+// Preserving domain attributes over an Aggregate with an empty grouping expression
+// is subject to the "COUNT bug" that can lead to wrong answer:
+//
+// Suppose the original query is:
+//   SELECT a, (SELECT COUNT(*) cnt FROM t2 WHERE t1.a = t2.c) FROM t1
+//
+// Decorrelated plan:
+//   Project [a, scalar-subquery [a = c]]
+//   :  +- Aggregate [c] [count(*) AS cnt, c]
+//   :     +- Relation [c, d]
+//   +- Relation [a, b]
+//
+// After rewrite:
+//   Project [a, cnt]
+//   +- Join LeftOuter (a = c)
+//      :- Relation [a, b]
+//      +- Aggregate [c] [count(*) AS cnt, c]
+//         +- Relation [c, d]
+//
+//    T1            T2         T2' (GROUP BY c)
+// +---+---+     +---+---+     +---+-----+
+// | a | b |     | c | d |     | c | cnt |
+// +---+---+     +---+---+     +---+-----+
+// | 0 | 1 |     | 0 | 2 |     | 0 | 2   |
+// | 1 | 2 |     | 0 | 3 |     +---+-----+
+// +---+---+     +---+---+
+//
+// T1 nested loop join T2     T1 left outer join T2'
+// on (a = c):                on (a = c):
+// +---+-----+                +---+------+
+// | a | cnt |                | a | cnt  |
+// +---+-----+                +---+------+
+// | 0 | 2   |                | 0 | 2    |
+// | 1 | 0   | <--- correct   | 1 | null | <--- wrong result
+// +---+-----+                +---+------+
+//
+// If an aggregate is subject to the COUNT bug:
+// 1) add a column `true AS alwaysTrue` to the result of the aggregate
+// 2) insert a left outer domain join between the outer query and this aggregate
+// 3) rewrite the original aggregate's output column using the default value of the
+//    aggregate function and the alwaysTrue column.
+//
+// For example, T1 left outer join T2' with `alwaysTrue` marker:
+// +---+------+------------+--------------------------------+
+// | c | cnt  | alwaysTrue | if(isnull(alwaysTrue), 0, cnt) |
+// +---+------+------------+--------------------------------+
+// | 0 | 2    | true       | 2                              |
+// | 0 | null | null       | 0                              |  <--- correct result
+// +---+------+------------+--------------------------------+
+if (groupingExpressions.isEmpty && handleCountBug) {
+  // Evaluate the aggregate expressions with zero tuples.
+  val resultMap = RewriteCorrelatedScalarSubquery.evalAggregateOnZeroTups(newAggregate)
+  val alwaysTrue = Alias(Literal.TrueLiteral, "alwaysTrue")()
+  val alwaysTrueRef = alwaysTrue.toAttribute.withNullability(true)
+  val expressions = ArrayBuffer.empty[NamedExpression]
+  // Create new aliases for aggregate expressions that have non-null default
+  // values and reconstruct the output with the `alwaysTrue` marker.
+  val projectList = newAggregate.aggregateExpressions.map { a =>
+resultMap.get(a.exprId) match {
+  // Aggregate expression is not subject to the count bug.
+  case Some(Literal(null, _)) | None =>
+expressions += a
+// The attribute is nullable since it is from the 
right-hand side of a
+// left outer join.
+a.toAttribute.withNullability(true)
+  case Some(default) =>
+val newAttr = a.newInstance()

Review comment:
   Yes, this holds because if `a` is an attribute, evaluating it with zero tuples yields null:
   https://github.com/apache/spark/blob/c46342e3d057bcc949b4caf016514ff05e0a1ebd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala#L407
   Another possibility is `OuterReference`. Let me verify whether outer references work in this case.
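   The three tables in the diff comment can be reproduced with a small plain-Python sketch. It is not Spark code: rows are dicts, `None` stands in for SQL NULL, and the function names (`correlated_count`, `naive_rewrite`, `fixed_rewrite`) are hypothetical. `count_on_empty` plays the role of the zero-tuple value that `evalAggregateOnZeroTups` computes in the diff (0 for `COUNT(*)`).

```python
# Toy rows mirroring the T1 / T2 example in the diff comment.
T1 = [{"a": 0, "b": 1}, {"a": 1, "b": 2}]
T2 = [{"c": 0, "d": 2}, {"c": 0, "d": 3}]


def correlated_count(t1, t2):
    """Reference semantics: per outer row, COUNT(*) of inner rows with c = a."""
    return [(r["a"], sum(1 for s in t2 if s["c"] == r["a"])) for r in t1]


def grouped_inner(t2):
    """T2' = SELECT c, COUNT(*) AS cnt, true AS alwaysTrue FROM T2 GROUP BY c."""
    counts = {}
    for s in t2:
        counts[s["c"]] = counts.get(s["c"], 0) + 1
    return {c: {"cnt": n, "alwaysTrue": True} for c, n in counts.items()}


def naive_rewrite(t1, t2):
    """T1 LEFT OUTER JOIN T2' without the fix: unmatched rows get cnt = None."""
    agg = grouped_inner(t2)
    return [(r["a"], agg.get(r["a"], {}).get("cnt")) for r in t1]


def fixed_rewrite(t1, t2, count_on_empty=0):
    """Same join, but if(isnull(alwaysTrue), <zero-tuple value>, cnt)."""
    agg = grouped_inner(t2)
    out = []
    for r in t1:
        match = agg.get(r["a"], {})
        always_true = match.get("alwaysTrue")  # None means no matching group
        out.append((r["a"], count_on_empty if always_true is None else match.get("cnt")))
    return out


print(correlated_count(T1, T2))  # [(0, 2), (1, 0)]
print(naive_rewrite(T1, T2))     # [(0, 2), (1, None)]
print(fixed_rewrite(T1, T2))     # [(0, 2), (1, 0)]
```

   Testing the `alwaysTrue` marker rather than `cnt` itself presumably matters for aggregates that can legitimately be null for a matching group: the marker is null only when the outer join found no match at all.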





[GitHub] [spark] michaelzhang-db removed a comment on pull request #33310: [WIP][SPARK-36105][SQL] OptimizeLocalShuffleReader support reading data of multiple mappers in one task

2021-07-12 Thread GitBox


michaelzhang-db removed a comment on pull request #33310:
URL: https://github.com/apache/spark/pull/33310#issuecomment-878780180


   ok to test



[GitHub] [spark] SparkQA removed a comment on pull request #33312: [SPARK-36110][BUILD] Upgrade SBT to 1.5.5

2021-07-12 Thread GitBox


SparkQA removed a comment on pull request #33312:
URL: https://github.com/apache/spark/pull/33312#issuecomment-878728629


   **[Test build #140947 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140947/testReport)**
 for PR 33312 at commit 
[`3e00efa`](https://github.com/apache/spark/commit/3e00efaef8c06e5ee15fb1a3bf071aabcf94b8e7).



[GitHub] [spark] michaelzhang-db commented on pull request #33310: [WIP][SPARK-36105][SQL] OptimizeLocalShuffleReader support reading data of multiple mappers in one task

2021-07-12 Thread GitBox


michaelzhang-db commented on pull request #33310:
URL: https://github.com/apache/spark/pull/33310#issuecomment-878780180


   ok to test



[GitHub] [spark] SparkQA commented on pull request #33312: [SPARK-36110][BUILD] Upgrade SBT to 1.5.5

2021-07-12 Thread GitBox


SparkQA commented on pull request #33312:
URL: https://github.com/apache/spark/pull/33312#issuecomment-878779979


   **[Test build #140947 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140947/testReport)**
 for PR 33312 at commit 
[`3e00efa`](https://github.com/apache/spark/commit/3e00efaef8c06e5ee15fb1a3bf071aabcf94b8e7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] SparkQA commented on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function

2021-07-12 Thread GitBox


SparkQA commented on pull request #33258:
URL: https://github.com/apache/spark/pull/33258#issuecomment-87895


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45467/
   



[GitHub] [spark] cloud-fan commented on a change in pull request #33299: [SPARK-36046][SQL] Support new functions make_timestamp_ntz and make_timestamp_ltz

2021-07-12 Thread GitBox


cloud-fan commented on a change in pull request #33299:
URL: https://github.com/apache/spark/pull/33299#discussion_r668423810



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
##
@@ -552,6 +552,8 @@ object FunctionRegistry {
 expression[TimeWindow]("window"),
 expression[MakeDate]("make_date"),
 expression[MakeTimestamp]("make_timestamp"),
+expression[MakeTimestampNTZ]("make_timestamp_ntz", true),

Review comment:
   It's better to implement `prettyName`. The alias here is implemented via `TreeNodeTag`, which is quite unreliable.





[GitHub] [spark] allisonwang-db commented on pull request #33070: [SPARK-35551][SQL] Handle the COUNT bug for lateral subqueries

2021-07-12 Thread GitBox


allisonwang-db commented on pull request #33070:
URL: https://github.com/apache/spark/pull/33070#issuecomment-878776521


   > Next we should use this fix to solve the count bug for all correlated subqueries.
   
   Created two follow-up issues: SPARK-36113 and SPARK-36115.



[GitHub] [spark] SparkQA commented on pull request #33297: [SPARK-36069] from_json's exception should contain field name, type and value

2021-07-12 Thread GitBox


SparkQA commented on pull request #33297:
URL: https://github.com/apache/spark/pull/33297#issuecomment-878775838


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45466/
   



[GitHub] [spark] SparkQA commented on pull request #30869: [SPARK-33865][SQL] When HiveDDL, we need check avro schema too

2021-07-12 Thread GitBox


SparkQA commented on pull request #30869:
URL: https://github.com/apache/spark/pull/30869#issuecomment-878775341


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45468/
   



[GitHub] [spark] AmplabJenkins commented on pull request #33314: Add bitmap functions in Spark SQL

2021-07-12 Thread GitBox


AmplabJenkins commented on pull request #33314:
URL: https://github.com/apache/spark/pull/33314#issuecomment-878767184


   Can one of the admins verify this patch?



[GitHub] [spark] AmplabJenkins removed a comment on pull request #33310: [WIP][SPARK-36105][SQL] OptimizeLocalShuffleReader support reading data of multiple mappers in one task

2021-07-12 Thread GitBox


AmplabJenkins removed a comment on pull request #33310:
URL: https://github.com/apache/spark/pull/33310#issuecomment-878767004


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45465/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #33293: [SPARK-36076][SQL][3.1] ArrayIndexOutOfBounds in Cast string to times…

2021-07-12 Thread GitBox


AmplabJenkins removed a comment on pull request #33293:
URL: https://github.com/apache/spark/pull/33293#issuecomment-878767002


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45464/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #33297: [SPARK-36069] from_json's exception should contain field name, type and value

2021-07-12 Thread GitBox


AmplabJenkins removed a comment on pull request #33297:
URL: https://github.com/apache/spark/pull/33297#issuecomment-878767000


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45463/
   



[GitHub] [spark] AmplabJenkins commented on pull request #33297: [SPARK-36069] from_json's exception should contain field name, type and value

2021-07-12 Thread GitBox


AmplabJenkins commented on pull request #33297:
URL: https://github.com/apache/spark/pull/33297#issuecomment-878767000


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45463/
   



[GitHub] [spark] AmplabJenkins commented on pull request #33293: [SPARK-36076][SQL][3.1] ArrayIndexOutOfBounds in Cast string to times…

2021-07-12 Thread GitBox


AmplabJenkins commented on pull request #33293:
URL: https://github.com/apache/spark/pull/33293#issuecomment-878767002


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45464/
   



[GitHub] [spark] AmplabJenkins commented on pull request #33310: [WIP][SPARK-36105][SQL] OptimizeLocalShuffleReader support reading data of multiple mappers in one task

2021-07-12 Thread GitBox


AmplabJenkins commented on pull request #33310:
URL: https://github.com/apache/spark/pull/33310#issuecomment-878767004


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45465/
   



[GitHub] [spark] SparkQA commented on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function

2021-07-12 Thread GitBox


SparkQA commented on pull request #33258:
URL: https://github.com/apache/spark/pull/33258#issuecomment-878764568


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45467/
   



[GitHub] [spark] HyukjinKwon closed pull request #33312: [SPARK-36110][BUILD] Upgrade SBT to 1.5.5

2021-07-12 Thread GitBox


HyukjinKwon closed pull request #33312:
URL: https://github.com/apache/spark/pull/33312


   



[GitHub] [spark] HyukjinKwon closed pull request #33315: [SPARK-36104][PYTHON][FOLLOWUP] Remove unused import "typing.cast"

2021-07-12 Thread GitBox


HyukjinKwon closed pull request #33315:
URL: https://github.com/apache/spark/pull/33315


   



[GitHub] [spark] HyukjinKwon commented on pull request #33312: [SPARK-36110][BUILD] Upgrade SBT to 1.5.5

2021-07-12 Thread GitBox


HyukjinKwon commented on pull request #33312:
URL: https://github.com/apache/spark/pull/33312#issuecomment-878763360


   Merged to master.



[GitHub] [spark] HyukjinKwon commented on pull request #33315: [SPARK-36104][PYTHON][FOLLOWUP] Remove unused import "typing.cast"

2021-07-12 Thread GitBox


HyukjinKwon commented on pull request #33315:
URL: https://github.com/apache/spark/pull/33315#issuecomment-878763205







[GitHub] [spark] SparkQA commented on pull request #33297: [SPARK-36069] from_json's exception should contain field name, type and value

2021-07-12 Thread GitBox


SparkQA commented on pull request #33297:
URL: https://github.com/apache/spark/pull/33297#issuecomment-878762536


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45466/
   



[GitHub] [spark] SparkQA commented on pull request #30869: [SPARK-33865][SQL] When HiveDDL, we need check avro schema too

2021-07-12 Thread GitBox


SparkQA commented on pull request #30869:
URL: https://github.com/apache/spark/pull/30869#issuecomment-878762434


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45468/
   



[GitHub] [spark] sarutak commented on pull request #33315: [SPARK-36104][PYTHON][FOLLOWUP] Remove unused import "typing.cast"

2021-07-12 Thread GitBox


sarutak commented on pull request #33315:
URL: https://github.com/apache/spark/pull/33315#issuecomment-878762244


   cc: @HyukjinKwon @xinrong-databricks



[GitHub] [spark] sarutak opened a new pull request #33315: [SPARK-36104][PYTHON][FOLLOWUP] Remove unused import "typing.cast"

2021-07-12 Thread GitBox


sarutak opened a new pull request #33315:
URL: https://github.com/apache/spark/pull/33315


   ### What changes were proposed in this pull request?
   
   This is a follow-up PR for SPARK-36104 (#33307) that removes the unused import `typing.cast`.
   
   ### Why are the changes needed?
   
   To recover CI.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   CI.



[GitHub] [spark] SparkQA commented on pull request #33310: [WIP][SPARK-36105][SQL] OptimizeLocalShuffleReader support reading data of multiple mappers in one task

2021-07-12 Thread GitBox


SparkQA commented on pull request #33310:
URL: https://github.com/apache/spark/pull/33310#issuecomment-878761359


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45465/
   



[GitHub] [spark] Ngone51 commented on a change in pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file

2021-07-12 Thread GitBox


Ngone51 commented on a change in pull request #32401:
URL: https://github.com/apache/spark/pull/32401#discussion_r668409304



##
File path: 
core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala
##
@@ -360,13 +389,41 @@ private[spark] class IndexShuffleBlockResolver(
           if (dataTmp != null && dataTmp.exists() && !dataTmp.renameTo(dataFile)) {
             throw new IOException("fail to rename file " + dataTmp + " to " + dataFile)
           }
+
+          // write the checksum file
+          checksumTmpOpt.zip(checksumFileOpt).foreach { case (checksumTmp, checksumFile) =>
+            val out = new DataOutputStream(
+              new BufferedOutputStream(
+                new FileOutputStream(checksumTmp)
+              )
+            )
+            Utils.tryWithSafeFinally {
+              checksums.foreach(out.writeLong)
+            } {
+              out.close()
+            }
+
+            if (checksumFile.exists()) {
+              checksumFile.delete()
+            }
+            if (!checksumTmp.renameTo(checksumFile)) {
+              // It's not worthwhile to fail here after index file and data file are already
+              // successfully stored because the checksum is only used for the corner error case.
+              logWarning("fail to rename file " + checksumTmp + " to " + checksumFile)
+            }
+          }
         }
       }
     } finally {
       logDebug(s"Shuffle index for mapId $mapId: ${lengths.mkString("[", ",", "]")}")
       if (indexTmp.exists() && !indexTmp.delete()) {
         logError(s"Failed to delete temporary index file at ${indexTmp.getAbsolutePath}")
       }
+      checksumTmpOpt.foreach { checksumTmp =>
+        if (checksumTmp.exists() && !checksumTmp.delete()) {

Review comment:
   Good point! We won't propagate the error. I'll handle it.
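   The write-to-temp-then-rename pattern in this diff can be sketched in plain Python rather than Spark's Scala. Assumptions: one signed 64-bit big-endian value per partition (the byte layout `DataOutputStream.writeLong` produces), `os.replace` standing in for the delete-then-`renameTo` sequence, and `print` standing in for `logWarning`. The function names and the file name are hypothetical.

```python
import os
import struct
import tempfile
import zlib


def write_checksum_file(checksums, checksum_file):
    """Write one big-endian signed 64-bit value per partition to a temp file
    in the same directory, then move it into place."""
    directory = os.path.dirname(checksum_file) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as out:
            for value in checksums:
                out.write(struct.pack(">q", value))  # same 8 bytes as writeLong
        try:
            # os.replace overwrites an existing target, covering the
            # "delete the old file, then rename" steps of the Scala code.
            os.replace(tmp_path, checksum_file)
        except OSError as err:
            # The checksum is only a diagnostic aid, so do not fail the write.
            print(f"fail to rename file {tmp_path} to {checksum_file}: {err}")
    finally:
        # Mirrors the `finally` cleanup: remove a leftover temp file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)


def read_checksum_file(checksum_file):
    """Read the values back, one 8-byte big-endian long at a time."""
    with open(checksum_file, "rb") as f:
        return [value for (value,) in struct.iter_unpack(">q", f.read())]


# Usage with a hypothetical file name.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "shuffle_0_0_0.checksum")
    sums = [zlib.adler32(b"partition-0"), zlib.adler32(b"partition-1")]
    write_checksum_file(sums, path)
    assert read_checksum_file(path) == sums
```

   Writing to a sibling temp file first means a reader never sees a half-written checksum file; a failed rename only loses the diagnostic data, which matches the reasoning in the diff's comment.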





[GitHub] [spark] Ngone51 commented on a change in pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file

2021-07-12 Thread GitBox


Ngone51 commented on a change in pull request #32401:
URL: https://github.com/apache/spark/pull/32401#discussion_r668408590



##
File path: 
core/src/main/java/org/apache/spark/shuffle/checksum/ShuffleChecksumHelper.java
##
@@ -0,0 +1,81 @@
+package org.apache.spark.shuffle.checksum;
+
+import java.util.zip.Adler32;
+import java.util.zip.CRC32;
+import java.util.zip.Checksum;
+
+import org.apache.spark.SparkConf;
+import org.apache.spark.SparkException;
+import org.apache.spark.internal.config.package$;
+import org.apache.spark.storage.ShuffleChecksumBlockId;
+
+public class ShuffleChecksumHelper {

Review comment:
   Added the doc and marked it as private.
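   `ShuffleChecksumHelper` imports both `Adler32` and `CRC32`, which suggests the checksum algorithm is selectable. A hypothetical Python sketch of that idea using `zlib.adler32` and `zlib.crc32`: the name-to-function table and `partition_checksums` are illustrative assumptions, not the helper's actual API or Spark's accepted config values. Each partition gets its own checksum, fed incrementally chunk by chunk.

```python
import zlib

# Hypothetical algorithm table: name -> (checksum function, initial value).
# zlib.adler32 starts from 1 and zlib.crc32 from 0, matching their defaults.
ALGORITHMS = {
    "ADLER32": (zlib.adler32, 1),
    "CRC32": (zlib.crc32, 0),
}


def partition_checksums(algorithm, partitions):
    """Return one checksum per partition; each partition is a list of byte
    chunks fed incrementally, and every partition starts from a fresh value."""
    try:
        update, initial = ALGORITHMS[algorithm.upper()]
    except KeyError:
        raise ValueError(f"unsupported checksum algorithm: {algorithm}") from None
    result = []
    for chunks in partitions:
        value = initial
        for chunk in chunks:
            value = update(chunk, value)
        result.append(value)
    return result


# Incremental updates give the same result as checksumming the joined bytes.
assert partition_checksums("crc32", [[b"ab", b"c"]]) == [zlib.crc32(b"abc")]
```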





[GitHub] [spark] Ngone51 commented on a change in pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file

2021-07-12 Thread GitBox


Ngone51 commented on a change in pull request #32401:
URL: https://github.com/apache/spark/pull/32401#discussion_r668408329



##
File path: 
core/src/main/scala/org/apache/spark/storage/DiskBlockObjectWriter.scala
##
@@ -101,12 +108,30 @@ private[spark] class DiskBlockObjectWriter(
*/
   private var numRecordsWritten = 0
 
+  /**
+   * Set the checksum that the checksumOutputStream should use
+   */
+  def setChecksum(checksum: Checksum): Unit = {
+    if (checksumOutputStream == null) {
+      this.checksumEnabled = true
+      this.checksum = checksum
+    } else {
+      checksumOutputStream.setChecksum(checksum)

Review comment:
   Yes, it's intentional. In the case of a `ShuffleExternalSorter` spill, one `DiskBlockObjectWriter` serves multiple partitions, and different partitions should use different checksums.
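   The point about one writer serving several partitions can be sketched as follows: a hypothetical Python writer whose installed checksum object is swapped per partition, so bytes written for partition 0 never feed partition 1's checksum. `Crc32` and `ChecksummingWriter` are illustrative stand-ins, loosely modeled on `java.util.zip.Checksum`, not on `DiskBlockObjectWriter`'s real internals.

```python
import io
import zlib


class Crc32:
    """Minimal running CRC-32, loosely modeled on java.util.zip.Checksum."""

    def __init__(self):
        self.value = 0

    def update(self, data):
        self.value = zlib.crc32(data, self.value)


class ChecksummingWriter:
    """Hypothetical writer: every byte written also updates whichever
    checksum object is currently installed."""

    def __init__(self, out):
        self._out = out
        self._checksum = None

    def set_checksum(self, checksum):
        # Swap in another checksum object; later writes update only this one.
        self._checksum = checksum

    def write(self, data):
        self._out.write(data)
        if self._checksum is not None:
            self._checksum.update(data)


# One writer (one spill file) serving two partitions, each with its own checksum.
out = io.BytesIO()
writer = ChecksummingWriter(out)
checksums = [Crc32(), Crc32()]
for pid, records in enumerate([[b"k1", b"v1"], [b"k2"]]):
    writer.set_checksum(checksums[pid])
    for rec in records:
        writer.write(rec)

print([c.value for c in checksums] == [zlib.crc32(b"k1v1"), zlib.crc32(b"k2")])  # True
print(out.getvalue())  # b'k1v1k2'
```

   The spill file holds all partitions back to back, while each checksum covers only its own partition's bytes, which is what a per-partition checksum file needs.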





[GitHub] [spark] ReachInfi opened a new pull request #33314: Add bitmap functions in Spark SQL

2021-07-12 Thread GitBox


ReachInfi opened a new pull request #33314:
URL: https://github.com/apache/spark/pull/33314


   
   
   ### What changes were proposed in this pull request?
   
   Add functions for building bitmaps and computing their cardinality in Spark SQL. If this approach is acceptable, I will update functions.scala and FunctionRegistry.scala.
   
   
   ### Why are the changes needed?
   
   Bitmaps are increasingly widely used, and many frameworks, such as ClickHouse, support them natively.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   CI. In addition, based on our real workloads, it performs well on billions of rows.
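   A toy illustration of the two operations the PR describes, building a bitmap and computing its cardinality, using a Python `int` as an uncompressed bitset. This is not the PR's implementation (which is not shown here); many systems use compressed formats such as Roaring bitmaps instead. The `Bitmap` class and the user-ID data are hypothetical.

```python
class Bitmap:
    """Toy bitmap of non-negative integers backed by a Python int."""

    def __init__(self, values=()):
        self._bits = 0
        for v in values:
            self.add(v)

    def add(self, v):
        if v < 0:
            raise ValueError("bitmap values must be non-negative")
        self._bits |= 1 << v  # set bit v; duplicates are absorbed

    def union(self, other):
        result = Bitmap()
        result._bits = self._bits | other._bits
        return result

    def cardinality(self):
        # Number of set bits = number of distinct values.
        return bin(self._bits).count("1")


users_day1 = Bitmap([3, 5, 5, 42])
users_day2 = Bitmap([5, 7])
print(users_day1.cardinality())                    # 3
print(users_day1.union(users_day2).cardinality())  # 4
```

   Because bitmaps merge with a bitwise OR, distinct counts can be combined across partitions or days without re-reading the raw IDs, which is the usual motivation for bitmap aggregate functions.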



[GitHub] [spark] dgd-contributor commented on a change in pull request #33293: [SPARK-36076][SQL][3.1] ArrayIndexOutOfBounds in Cast string to times…

2021-07-12 Thread GitBox


dgd-contributor commented on a change in pull request #33293:
URL: https://github.com/apache/spark/pull/33293#discussion_r668407288



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##
@@ -344,29 +344,28 @@ object DateTimeUtils {
         segments(6) /= 10
         digitsMilli -= 1
       }
-      try {
-        val zoneId = tz match {
-          case None => timeZoneId
-          case Some("+") => ZoneOffset.ofHoursMinutes(segments(7), segments(8))
-          case Some("-") => ZoneOffset.ofHoursMinutes(-segments(7), -segments(8))
-          case Some(zoneName: String) => getZoneId(zoneName.trim)
-        }
-        val nanoseconds = MICROSECONDS.toNanos(segments(6))
-        val localTime = LocalTime.of(segments(3), segments(4), segments(5), nanoseconds.toInt)
-        val localDate = if (justTime) {
-          LocalDate.now(zoneId)
-        } else {
-          LocalDate.of(segments(0), segments(1), segments(2))
-        }
-        val localDateTime = LocalDateTime.of(localDate, localTime)
-        val zonedDateTime = ZonedDateTime.of(localDateTime, zoneId)
-        val instant = Instant.from(zonedDateTime)
-        Some(instantToMicros(instant))
-      } catch {
-        case NonFatal(_) => None
+      val zoneId = tz match {
+        case None => timeZoneId
+        case Some("+") => ZoneOffset.ofHoursMinutes(segments(7), segments(8))
+        case Some("-") => ZoneOffset.ofHoursMinutes(-segments(7), -segments(8))
+        case Some(zoneName: String) => getZoneId(zoneName.trim)
+      }
+      val nanoseconds = MICROSECONDS.toNanos(segments(6))
+      val localTime = LocalTime.of(segments(3), segments(4), segments(5), nanoseconds.toInt)
+      val localDate = if (justTime) {
+        LocalDate.now(zoneId)
+      } else {
+        LocalDate.of(segments(0), segments(1), segments(2))
       }
+      val localDateTime = LocalDateTime.of(localDate, localTime)
+      val zonedDateTime = ZonedDateTime.of(localDateTime, zoneId)
+      val instant = Instant.from(zonedDateTime)
+      Some(instantToMicros(instant))
+    } catch {
+      case NonFatal(_) => None
     }
 
+

Review comment:
   Thanks, done!
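   The diff removes the inner `try` so that the enclosing `try`/`catch NonFatal` covers the whole conversion, turning any non-fatal exception into `None`. A rough Python analogue, assuming the same segment layout as the Scala code (`[year, month, day, hour, minute, second, microsecond, tzHour, tzMinute]`) and handling only `+`/`-` offsets, not named zones. `segments_to_micros` and `default_tz` are hypothetical names; a missing offset segment shows how an index error also becomes `None`.

```python
from datetime import date, datetime, time, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def segments_to_micros(segments, tz_sign=None, just_time=False,
                       default_tz=timezone.utc):
    """Hypothetical analogue: turn parsed segments into microseconds since
    the epoch, returning None instead of raising on malformed input."""
    try:
        if tz_sign is None:
            tz = default_tz  # stands in for the session time zone
        else:
            # "+" / "-" select the sign of the explicit offset hh:mm.
            sign = 1 if tz_sign == "+" else -1
            tz = timezone(sign * timedelta(hours=segments[7], minutes=segments[8]))
        local_time = time(segments[3], segments[4], segments[5], segments[6])
        if just_time:
            local_date = datetime.now(tz).date()
        else:
            local_date = date(segments[0], segments[1], segments[2])
        instant = datetime.combine(local_date, local_time, tzinfo=tz)
        return (instant - EPOCH) // timedelta(microseconds=1)
    except (ValueError, IndexError, OverflowError):
        # Like `case NonFatal(_) => None`: a missing segment (IndexError) or
        # an out-of-range field (ValueError) becomes None.
        return None


print(segments_to_micros([1970, 1, 1, 0, 0, 1, 0, 0, 0]))  # 1000000
print(segments_to_micros([2021, 2, 30, 0, 0, 0, 0, 0, 0]))  # None (Feb 30)
```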
   





[GitHub] [spark] SparkQA commented on pull request #33297: [SPARK-36069] from_json's exception should contain field name, type and value

2021-07-12 Thread GitBox


SparkQA commented on pull request #33297:
URL: https://github.com/apache/spark/pull/33297#issuecomment-878756758


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45463/
   





[GitHub] [spark] SparkQA commented on pull request #33293: [SPARK-36076][SQL][3.1] ArrayIndexOutOfBounds in Cast string to times…

2021-07-12 Thread GitBox


SparkQA commented on pull request #33293:
URL: https://github.com/apache/spark/pull/33293#issuecomment-878754781


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45464/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33293: [SPARK-36076][SQL][3.1] ArrayIndexOutOfBounds in Cast string to times…

2021-07-12 Thread GitBox


AmplabJenkins removed a comment on pull request #33293:
URL: https://github.com/apache/spark/pull/33293#issuecomment-878753027


   
   Refer to this link for build results (access rights to CI server needed):
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45460/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33293: [SPARK-36076][SQL][3.1] ArrayIndexOutOfBounds in Cast string to times…

2021-07-12 Thread GitBox


AmplabJenkins commented on pull request #33293:
URL: https://github.com/apache/spark/pull/33293#issuecomment-878753027


   
   Refer to this link for build results (access rights to CI server needed):
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45460/
   





[GitHub] [spark] SparkQA commented on pull request #33293: [SPARK-36076][SQL][3.1] ArrayIndexOutOfBounds in Cast string to times…

2021-07-12 Thread GitBox


SparkQA commented on pull request #33293:
URL: https://github.com/apache/spark/pull/33293#issuecomment-878753012


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45460/
   





[GitHub] [spark] beliefer commented on a change in pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function

2021-07-12 Thread GitBox


beliefer commented on a change in pull request #33258:
URL: https://github.com/apache/spark/pull/33258#discussion_r668401909



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##
@@ -236,6 +274,8 @@ case class CurrentBatchTimestamp(
 val timestampUs = millisToMicros(timestampMs)
 dataType match {
   case _: TimestampType => Literal(timestampUs, TimestampType)
+  case _: TimestampNTZType =>
+Literal(convertTz(timestampUs, ZoneOffset.UTC, zoneId), TimestampNTZType)

Review comment:
   I will add a new test case in another PR.




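The `TimestampNTZType` branch converts the batch timestamp with `convertTz(timestampUs, ZoneOffset.UTC, zoneId)`. As we understand it, this reads the wall-clock time of the instant in `zoneId` and re-encodes that wall-clock reading as if it were UTC, which is how a timestamp without time zone is stored. The sketch below reproduces that shift using only `java.time`. `WallClockShift.shift` is a hypothetical name, and Spark's own helper may differ in detail (for example, around DST gaps).

```scala
import java.time.{Instant, ZoneId, ZoneOffset}
import java.time.temporal.ChronoUnit

// Hypothetical re-creation of a "shift wall clock between zones" helper.
object WallClockShift {
  private def microsToInstant(us: Long): Instant =
    Instant.EPOCH.plus(us, ChronoUnit.MICROS)

  private def instantToMicros(i: Instant): Long =
    ChronoUnit.MICROS.between(Instant.EPOCH, i)

  // Take the wall-clock time that `micros` shows in `toZone`, then
  // reinterpret that same wall-clock reading as a time in `fromZone`.
  def shift(micros: Long, fromZone: ZoneId, toZone: ZoneId): Long = {
    val local = microsToInstant(micros).atZone(toZone).toLocalDateTime
    instantToMicros(local.atZone(fromZone).toInstant)
  }
}
```

For example, the epoch seen from `+08:00` reads `1970-01-01T08:00`. Stored as UTC, that is 8 hours (28,800,000,000 microseconds) after the epoch.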



[GitHub] [spark] SparkQA removed a comment on pull request #30869: [SPARK-33865][SQL] When HiveDDL, we need check avro schema too

2021-07-12 Thread GitBox


SparkQA removed a comment on pull request #30869:
URL: https://github.com/apache/spark/pull/30869#issuecomment-878749770


   **[Test build #140954 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140954/testReport)**
 for PR 30869 at commit 
[`9654220`](https://github.com/apache/spark/commit/9654220aa5230171e06ac377a24acbc247bb66c7).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33310: [WIP][SPARK-36105][SQL] OptimizeLocalShuffleReader support reading data of multiple mappers in one task

2021-07-12 Thread GitBox


AmplabJenkins removed a comment on pull request #33310:
URL: https://github.com/apache/spark/pull/33310#issuecomment-878685125


   Can one of the admins verify this patch?





[GitHub] [spark] AmplabJenkins removed a comment on pull request #30869: [SPARK-33865][SQL] When HiveDDL, we need check avro schema too

2021-07-12 Thread GitBox


AmplabJenkins removed a comment on pull request #30869:
URL: https://github.com/apache/spark/pull/30869#issuecomment-878750245


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140954/
   





[GitHub] [spark] Ngone51 commented on a change in pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file

2021-07-12 Thread GitBox


Ngone51 commented on a change in pull request #32401:
URL: https://github.com/apache/spark/pull/32401#discussion_r668400943



##
File path: 
core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala
##
@@ -360,13 +389,41 @@ private[spark] class IndexShuffleBlockResolver(
   if (dataTmp != null && dataTmp.exists() && !dataTmp.renameTo(dataFile)) {
 throw new IOException("fail to rename file " + dataTmp + " to " + dataFile)
   }
+
+  // write the checksum file
+  checksumTmpOpt.zip(checksumFileOpt).foreach { case (checksumTmp, 
checksumFile) =>
+val out = new DataOutputStream(
+  new BufferedOutputStream(
+new FileOutputStream(checksumTmp)
+  )
+)
+Utils.tryWithSafeFinally {
+  checksums.foreach(out.writeLong)
+} {
+  out.close()
+}
+
+if (checksumFile.exists()) {
+  checksumFile.delete()
+}
+if (!checksumTmp.renameTo(checksumFile)) {
+  // It's not worthwhile to fail here after index file and data file are already
+  // successfully stored due to checksum is only used for the corner error case.
+  logWarning("fail to rename file " + checksumTmp + " to " + checksumFile)

Review comment:
   I see. I got your point. I'd prefer to go back to `if (existingLengths != null) {`.




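The checksum hunk writes one `Long` per partition through a `DataOutputStream` into a temporary file. It then replaces the final file by rename and only warns if that rename fails. Below is a minimal round-trip sketch of that file layout. It assumes the file holds nothing but consecutive 8-byte big-endian longs; `ChecksumFileSketch` and its methods are hypothetical, not Spark APIs.

```scala
import java.io.{BufferedInputStream, BufferedOutputStream, DataInputStream, DataOutputStream, File, FileInputStream, FileOutputStream}

// Hypothetical helper; not IndexShuffleBlockResolver's actual API.
object ChecksumFileSketch {
  // Write one 8-byte big-endian long per partition to `tmp`, then move `tmp`
  // over `target`. Returns false instead of throwing if the rename fails.
  def writeChecksums(checksums: Array[Long], tmp: File, target: File): Boolean = {
    val out = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(tmp)))
    try { checksums.foreach(c => out.writeLong(c)) } finally { out.close() }
    if (target.exists()) target.delete()
    tmp.renameTo(target)
  }

  // Read the checksums back; the file length determines how many longs it holds.
  def readChecksums(file: File): Array[Long] = {
    val in = new DataInputStream(new BufferedInputStream(new FileInputStream(file)))
    try { Array.fill((file.length() / 8).toInt)(in.readLong()) } finally { in.close() }
  }
}
```

The sketch returns the `renameTo` result instead of throwing. This mirrors the hunk's reasoning: a missing checksum file should not fail a write whose index and data files have already been stored.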



[GitHub] [spark] AmplabJenkins commented on pull request #30869: [SPARK-33865][SQL] When HiveDDL, we need check avro schema too

2021-07-12 Thread GitBox


AmplabJenkins commented on pull request #30869:
URL: https://github.com/apache/spark/pull/30869#issuecomment-878750245


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140954/
   





[GitHub] [spark] SparkQA commented on pull request #30869: [SPARK-33865][SQL] When HiveDDL, we need check avro schema too

2021-07-12 Thread GitBox


SparkQA commented on pull request #30869:
URL: https://github.com/apache/spark/pull/30869#issuecomment-878750232


   **[Test build #140954 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140954/testReport)**
 for PR 30869 at commit 
[`9654220`](https://github.com/apache/spark/commit/9654220aa5230171e06ac377a24acbc247bb66c7).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on pull request #30869: [SPARK-33865][SQL] When HiveDDL, we need check avro schema too

2021-07-12 Thread GitBox


SparkQA commented on pull request #30869:
URL: https://github.com/apache/spark/pull/30869#issuecomment-878749770


   **[Test build #140954 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140954/testReport)**
 for PR 30869 at commit 
[`9654220`](https://github.com/apache/spark/commit/9654220aa5230171e06ac377a24acbc247bb66c7).





[GitHub] [spark] SparkQA commented on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function

2021-07-12 Thread GitBox


SparkQA commented on pull request #33258:
URL: https://github.com/apache/spark/pull/33258#issuecomment-878749337


   **[Test build #140953 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140953/testReport)**
 for PR 33258 at commit 
[`8a4d40d`](https://github.com/apache/spark/commit/8a4d40d76912f3fedc5e282b719b4a61e5908a27).





[GitHub] [spark] SparkQA commented on pull request #33310: [WIP][SPARK-36105][SQL] OptimizeLocalShuffleReader support reading data of multiple mappers in one task

2021-07-12 Thread GitBox


SparkQA commented on pull request #33310:
URL: https://github.com/apache/spark/pull/33310#issuecomment-878749275


   **[Test build #140951 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140951/testReport)**
 for PR 33310 at commit 
[`f004267`](https://github.com/apache/spark/commit/f004267282c600e719d4a67a79618c525e27ec6c).





[GitHub] [spark] SparkQA commented on pull request #33297: [SPARK-36069] from_json's exception should contain field name, type and value

2021-07-12 Thread GitBox


SparkQA commented on pull request #33297:
URL: https://github.com/apache/spark/pull/33297#issuecomment-878749259


   **[Test build #140952 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140952/testReport)**
 for PR 33297 at commit 
[`e0700c6`](https://github.com/apache/spark/commit/e0700c692e0fe62a2a0cd9b7f3fde01e2ce50603).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33311: [SPARK-36109][SS][TEST] Check data after adding data to topic in KafkaSourceStressSuite

2021-07-12 Thread GitBox


AmplabJenkins removed a comment on pull request #33311:
URL: https://github.com/apache/spark/pull/33311#issuecomment-878748105


   
   Refer to this link for build results (access rights to CI server needed):
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45459/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33287: [SPARK-36081][SPARK-36066][SQL] Update the document about the behavior change of trimming characters for cast

2021-07-12 Thread GitBox


AmplabJenkins removed a comment on pull request #33287:
URL: https://github.com/apache/spark/pull/33287#issuecomment-878748106


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140945/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32949: [SPARK-35749][SPARK-35773][SQL] Parse unit list interval literals as tightest year-month/day-time interval types

2021-07-12 Thread GitBox


AmplabJenkins removed a comment on pull request #32949:
URL: https://github.com/apache/spark/pull/32949#issuecomment-878748107


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140939/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33312: [SPARK-36110][BUILD] Upgrade SBT to 1.5.5

2021-07-12 Thread GitBox


AmplabJenkins removed a comment on pull request #33312:
URL: https://github.com/apache/spark/pull/33312#issuecomment-878748108


   
   Refer to this link for build results (access rights to CI server needed):
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45462/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32949: [SPARK-35749][SPARK-35773][SQL] Parse unit list interval literals as tightest year-month/day-time interval types

2021-07-12 Thread GitBox


AmplabJenkins commented on pull request #32949:
URL: https://github.com/apache/spark/pull/32949#issuecomment-878748107


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140939/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33287: [SPARK-36081][SPARK-36066][SQL] Update the document about the behavior change of trimming characters for cast

2021-07-12 Thread GitBox


AmplabJenkins commented on pull request #33287:
URL: https://github.com/apache/spark/pull/33287#issuecomment-878748106


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140945/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33312: [SPARK-36110][BUILD] Upgrade SBT to 1.5.5

2021-07-12 Thread GitBox


AmplabJenkins commented on pull request #33312:
URL: https://github.com/apache/spark/pull/33312#issuecomment-878748108


   
   Refer to this link for build results (access rights to CI server needed):
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45462/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33311: [SPARK-36109][SS][TEST] Check data after adding data to topic in KafkaSourceStressSuite

2021-07-12 Thread GitBox


AmplabJenkins commented on pull request #33311:
URL: https://github.com/apache/spark/pull/33311#issuecomment-878748105


   
   Refer to this link for build results (access rights to CI server needed):
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45459/
   





[GitHub] [spark] SparkQA commented on pull request #33312: [SPARK-36110][BUILD] Upgrade SBT to 1.5.5

2021-07-12 Thread GitBox


SparkQA commented on pull request #33312:
URL: https://github.com/apache/spark/pull/33312#issuecomment-878747011


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45462/
   





[GitHub] [spark] SparkQA commented on pull request #33297: [SPARK-36069] from_json's exception should contain field name, type and value

2021-07-12 Thread GitBox


SparkQA commented on pull request #33297:
URL: https://github.com/apache/spark/pull/33297#issuecomment-878746639


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45463/
   





[GitHub] [spark] SparkQA commented on pull request #33293: [SPARK-36076][SQL][3.1] ArrayIndexOutOfBounds in Cast string to times…

2021-07-12 Thread GitBox


SparkQA commented on pull request #33293:
URL: https://github.com/apache/spark/pull/33293#issuecomment-878744867


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45464/
   





[GitHub] [spark] HyukjinKwon closed pull request #33302: Revert "[SPARK-35253][SPARK-35398][SQL][BUILD] Bump up the janino version to v3.1.4"

2021-07-12 Thread GitBox


HyukjinKwon closed pull request #33302:
URL: https://github.com/apache/spark/pull/33302


   





[GitHub] [spark] HyukjinKwon commented on pull request #33302: Revert "[SPARK-35253][SPARK-35398][SQL][BUILD] Bump up the janino version to v3.1.4"

2021-07-12 Thread GitBox


HyukjinKwon commented on pull request #33302:
URL: https://github.com/apache/spark/pull/33302#issuecomment-878744589


   Merged to master and branch-3.2.





[GitHub] [spark] HyukjinKwon commented on a change in pull request #33310: [WIP][SPARK-36105][SQL] OptimizeLocalShuffleReader support reading data of multiple mappers in one task

2021-07-12 Thread GitBox


HyukjinKwon commented on a change in pull request #33310:
URL: https://github.com/apache/spark/pull/33310#discussion_r668394926



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ShuffledRowRDD.scala
##
@@ -58,6 +58,11 @@ case class PartialMapperPartitionSpec(
 startReducerIndex: Int,
 endReducerIndex: Int) extends ShufflePartitionSpec
 
+case class CoalescedMapperPartitionSpec(
+ startMapIndex: Int,
+ endMapIndex: Int,
+ numReducers: Int) extends ShufflePartitionSpec

Review comment:
   ```suggestion
   startMapIndex: Int,
   endMapIndex: Int,
   numReducers: Int) extends ShufflePartitionSpec
   ```




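`CoalescedMapperPartitionSpec` lets one task read the output of a contiguous range of mappers for all `numReducers` reducers. The sketch below shows one way such ranges might be carved out. It assumes `endMapIndex` is exclusive and uses a hypothetical `MapperRange` stand-in rather than Spark's class. How the PR actually chooses the ranges is not shown in this hunk.

```scala
// Hypothetical stand-in for CoalescedMapperPartitionSpec; not Spark's class.
final case class MapperRange(startMapIndex: Int, endMapIndex: Int, numReducers: Int)

object CoalesceMappers {
  // Split map indices [0, numMappers) into at most `numTasks` contiguous,
  // non-empty ranges whose sizes differ by at most one.
  def split(numMappers: Int, numTasks: Int, numReducers: Int): Seq[MapperRange] = {
    require(numMappers > 0 && numTasks > 0, "need at least one mapper and one task")
    val n = math.min(numTasks, numMappers)
    (0 until n).map { i =>
      MapperRange(i * numMappers / n, (i + 1) * numMappers / n, numReducers)
    }
  }
}
```

For example, 10 mappers over 3 tasks gives the ranges `[0, 3)`, `[3, 6)` and `[6, 10)`, and each task reads all reducers for its mappers.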



[GitHub] [spark] HyukjinKwon commented on pull request #33310: [WIP][SPARK-36105][SQL] OptimizeLocalShuffleReader support reading data of multiple mappers in one task

2021-07-12 Thread GitBox


HyukjinKwon commented on pull request #33310:
URL: https://github.com/apache/spark/pull/33310#issuecomment-878743794


   ok to test





[GitHub] [spark] SparkQA removed a comment on pull request #33287: [SPARK-36081][SPARK-36066][SQL] Update the document about the behavior change of trimming characters for cast

2021-07-12 Thread GitBox


SparkQA removed a comment on pull request #33287:
URL: https://github.com/apache/spark/pull/33287#issuecomment-878687971


   **[Test build #140945 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140945/testReport)**
 for PR 33287 at commit 
[`168f3c8`](https://github.com/apache/spark/commit/168f3c8cce5a8a4bca4c4603f7a4dc3d7683c50b).





[GitHub] [spark] SparkQA commented on pull request #33287: [SPARK-36081][SPARK-36066][SQL] Update the document about the behavior change of trimming characters for cast

2021-07-12 Thread GitBox


SparkQA commented on pull request #33287:
URL: https://github.com/apache/spark/pull/33287#issuecomment-878743114


   **[Test build #140945 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140945/testReport)**
 for PR 33287 at commit 
[`168f3c8`](https://github.com/apache/spark/commit/168f3c8cce5a8a4bca4c4603f7a4dc3d7683c50b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] HyukjinKwon commented on pull request #33293: [SPARK-36076][SQL][3.1] ArrayIndexOutOfBounds in Cast string to times…

2021-07-12 Thread GitBox


HyukjinKwon commented on pull request #33293:
URL: https://github.com/apache/spark/pull/33293#issuecomment-878742789


   cc @gengliangwang 





[GitHub] [spark] HyukjinKwon commented on a change in pull request #33293: [SPARK-36076][SQL][3.1] ArrayIndexOutOfBounds in Cast string to times…

2021-07-12 Thread GitBox


HyukjinKwon commented on a change in pull request #33293:
URL: https://github.com/apache/spark/pull/33293#discussion_r668394064



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##
@@ -344,29 +344,28 @@ object DateTimeUtils {
   segments(6) /= 10
   digitsMilli -= 1
 }
-try {
-  val zoneId = tz match {
-case None => timeZoneId
-case Some("+") => ZoneOffset.ofHoursMinutes(segments(7), segments(8))
-case Some("-") => ZoneOffset.ofHoursMinutes(-segments(7), -segments(8))
-case Some(zoneName: String) => getZoneId(zoneName.trim)
-  }
-  val nanoseconds = MICROSECONDS.toNanos(segments(6))
-  val localTime = LocalTime.of(segments(3), segments(4), segments(5), nanoseconds.toInt)
-  val localDate = if (justTime) {
-LocalDate.now(zoneId)
-  } else {
-LocalDate.of(segments(0), segments(1), segments(2))
-  }
-  val localDateTime = LocalDateTime.of(localDate, localTime)
-  val zonedDateTime = ZonedDateTime.of(localDateTime, zoneId)
-  val instant = Instant.from(zonedDateTime)
-  Some(instantToMicros(instant))
-} catch {
-  case NonFatal(_) => None
+val zoneId = tz match {
+  case None => timeZoneId
+  case Some("+") => ZoneOffset.ofHoursMinutes(segments(7), segments(8))
+  case Some("-") => ZoneOffset.ofHoursMinutes(-segments(7), -segments(8))
+  case Some(zoneName: String) => getZoneId(zoneName.trim)
+}
+val nanoseconds = MICROSECONDS.toNanos(segments(6))
+val localTime = LocalTime.of(segments(3), segments(4), segments(5), nanoseconds.toInt)
+val localDate = if (justTime) {
+  LocalDate.now(zoneId)
+} else {
+  LocalDate.of(segments(0), segments(1), segments(2))
 }
+val localDateTime = LocalDateTime.of(localDate, localTime)
+val zonedDateTime = ZonedDateTime.of(localDateTime, zoneId)
+val instant = Instant.from(zonedDateTime)
+Some(instantToMicros(instant))
+  } catch {
+case NonFatal(_) => None
   }
 
+

Review comment:
   can you remove this?
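   For illustration, the pattern in this hunk (building a timestamp from parsed segments and mapping any non-fatal failure to `None`) can be sketched as a standalone Scala function. This is a minimal sketch, not Spark's `DateTimeUtils` code: the object `TimestampSketch`, the method `toMicros`, and the segment layout (year, month, day, hour, minute, second, microseconds, offset hours, offset minutes) are hypothetical, inferred from the indices used in the diff. `ZoneId.of` stands in for Spark's `getZoneId`, and the time-only (`justTime`) branch is omitted. Because a single `try` wraps every step, both an out-of-range field and a too-short segment array end up as `None`.

```scala
import java.time.{LocalDate, LocalDateTime, LocalTime, ZoneId, ZoneOffset, ZonedDateTime}
import java.util.concurrent.TimeUnit.MICROSECONDS
import scala.util.control.NonFatal

object TimestampSketch {
  // Hypothetical helper. Assumed segment layout: year, month, day, hour,
  // minute, second, microseconds, offset hours, offset minutes.
  def toMicros(segments: Array[Int], tz: Option[String], defaultZone: ZoneId): Option[Long] = {
    try {
      // Resolve the zone; ofHoursMinutes throws if hours/minutes have mismatched signs.
      val zoneId: ZoneId = tz match {
        case None => defaultZone
        case Some("+") => ZoneOffset.ofHoursMinutes(segments(7), segments(8))
        case Some("-") => ZoneOffset.ofHoursMinutes(-segments(7), -segments(8))
        case Some(zoneName) => ZoneId.of(zoneName.trim)
      }
      val nanos = MICROSECONDS.toNanos(segments(6))
      // LocalTime.of / LocalDate.of throw DateTimeException on out-of-range fields.
      val localTime = LocalTime.of(segments(3), segments(4), segments(5), nanos.toInt)
      val localDate = LocalDate.of(segments(0), segments(1), segments(2))
      val instant = ZonedDateTime.of(LocalDateTime.of(localDate, localTime), zoneId).toInstant
      // Microseconds since the epoch, with overflow checks.
      Some(Math.addExact(Math.multiplyExact(instant.getEpochSecond, 1000000L), instant.getNano / 1000L))
    } catch {
      // DateTimeException, ArrayIndexOutOfBoundsException, ArithmeticException, etc.
      case NonFatal(_) => None
    }
  }
}
```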







[GitHub] [spark] HyukjinKwon closed pull request #33307: [SPARK-36104][PYTHON] Manage InternalField in DataTypeOps.neg/abs

2021-07-12 Thread GitBox


HyukjinKwon closed pull request #33307:
URL: https://github.com/apache/spark/pull/33307


   





[GitHub] [spark] HyukjinKwon commented on pull request #33307: [SPARK-36104][PYTHON] Manage InternalField in DataTypeOps.neg/abs

2021-07-12 Thread GitBox


HyukjinKwon commented on pull request #33307:
URL: https://github.com/apache/spark/pull/33307#issuecomment-878741636


   Merged to master and branch-3.2.





[GitHub] [spark] SparkQA commented on pull request #33293: [SPARK-36076][SQL][3.1] ArrayIndexOutOfBounds in Cast string to times…

2021-07-12 Thread GitBox


SparkQA commented on pull request #33293:
URL: https://github.com/apache/spark/pull/33293#issuecomment-878741530


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45460/
   





[GitHub] [spark] dgd-contributor commented on a change in pull request #33291: [SPARK-35561][SQL] Remove leading zeros from empty static number type partition

2021-07-12 Thread GitBox


dgd-contributor commented on a change in pull request #33291:
URL: https://github.com/apache/spark/pull/33291#discussion_r668391894



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
##
@@ -351,10 +351,24 @@ object PartitioningUtils {
*/
   def getPathFragment(spec: TablePartitionSpec, partitionSchema: StructType): String = {

Review comment:
   `PartitioningUtils.parsePartitions` uses `castPartValueToDesiredType` to cast 
multiple partition types. I think for this issue we only need to cast number 
types (or remove leading zeros); the other types will be handled by 
`org.apache.spark.sql.catalyst.expressions.CastBase`.
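
   The normalization discussed in this comment (stripping leading zeros from numeric static partition values while leaving other types alone) might look like the following minimal sketch. It does not use Spark's API: `PartitionValueSketch`, `PartType`, and its cases are hypothetical stand-ins for Spark's numeric `DataType`s, and returning unparseable values unchanged is an assumption, not necessarily what the PR does.

```scala
import scala.util.Try

object PartitionValueSketch {
  // Hypothetical, simplified stand-ins for Spark's partition column types.
  sealed trait PartType
  case object IntPart extends PartType
  case object LongPart extends PartType
  case object DecimalPart extends PartType
  case object StringPart extends PartType

  // Parses numeric values and prints them back, which drops leading zeros
  // (decimals keep their scale). Non-numeric types pass through unchanged,
  // and values that fail to parse are returned as-is (an assumption).
  def normalize(value: String, dataType: PartType): String = dataType match {
    case IntPart     => Try(value.trim.toInt.toString).getOrElse(value)
    case LongPart    => Try(value.trim.toLong.toString).getOrElse(value)
    case DecimalPart => Try(new java.math.BigDecimal(value.trim).toPlainString).getOrElse(value)
    case StringPart  => value
  }
}
```

   With this approach, `0001` and `1` would produce the same path fragment for an integer column, while a string column keeps `0001` verbatim.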






