[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST
TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r450020829

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala

```diff
@@ -2623,8 +2625,16 @@ object Sequence {
       // about a month length in days and a day length in microseconds
       val intervalStepInMicros =
         stepMicros + stepMonths * microsPerMonth + stepDays * microsPerDay
-      val startMicros: Long = num.toLong(start) * scale
-      val stopMicros: Long = num.toLong(stop) * scale
+
+      // Date to timestamp is not equal from GMT and Chicago timezones
```

Review comment: The date can differ from west to east. When the input is a date, we need to take the zone info into account to convert it to a timestamp; if it is already a timestamp rather than a date, we can ignore the zone because the timestamp already accounts for it.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
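The DST issue this comment describes can be illustrated outside Spark with plain `java.time`. This is a sketch only: it uses none of Spark's internal helpers, and `dayLengthMicros` is a hypothetical name introduced here for illustration.

```scala
import java.time.{LocalDate, ZoneId}
import java.time.temporal.ChronoUnit

// Length of a calendar day in microseconds depends on the zone when a
// DST transition falls inside that day.
def dayLengthMicros(date: LocalDate, zone: ZoneId): Long =
  ChronoUnit.MICROS.between(
    date.atStartOfDay(zone).toInstant,
    date.plusDays(1).atStartOfDay(zone).toInstant)

// 2011-03-13 is the US spring-forward date: 23 hours in Chicago, 24 in GMT.
println(dayLengthMicros(LocalDate.of(2011, 3, 13), ZoneId.of("America/Chicago"))) // 82800000000
println(dayLengthMicros(LocalDate.of(2011, 3, 13), ZoneId.of("GMT")))             // 86400000000
```

This is why `num.toLong(start) * scale` (a fixed day length) and a zone-aware date-to-timestamp conversion disagree for ranges that cross DST.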
[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST
TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r444969655

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala

```diff
@@ -2589,6 +2589,8 @@ object Sequence {
     }
   }
 
+  // To generate time sequences, we use scale 1 in TemporalSequenceImpl
+  // for `TimestampType`, while MICROS_PER_DAY for `DateType`
```

Review comment: Ok, I will do it tomorrow.
[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST
TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r444858262

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala

```diff
@@ -2589,6 +2589,8 @@ object Sequence {
     }
   }
 
+  // To generate time sequences, we use scale 1 in TemporalSequenceImpl
+  // for `TimestampType`, while MICROS_PER_DAY for `DateType`
```

Review comment: Based on `presto-server-0.236`:

```
presto> select sequence(date('2011-03-01'),date('2011-03-02'),interval '1' hour);
Query 20200624_122744_2_pehix failed: sequence step must be a day interval if start and end values are dates

presto> select sequence(date('2011-03-01'),date('2011-03-02'),interval '1' day);
 _col0
 [2011-03-01, 2011-03-02]
(1 row)

Query 20200624_122757_3_pehix, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]

presto> select sequence(date('2011-03-01'),date('2011-03-02'),interval '1' month);
 _col0
 [2011-03-01]
(1 row)

Query 20200624_122806_4_pehix, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]

presto> select sequence(date('2011-03-01'),date('2011-03-02'),interval '1' year);
 _col0
 [2011-03-01]
(1 row)

Query 20200624_122810_5_pehix, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]
```
[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST
TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r444859851

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala

```diff
@@ -2589,6 +2589,8 @@ object Sequence {
     }
   }
 
+  // To generate time sequences, we use scale 1 in TemporalSequenceImpl
+  // for `TimestampType`, while MICROS_PER_DAY for `DateType`
```

Review comment: @cloud-fan Done. It seems Presto can generate sequences with `day`, `month`, and `year` steps when start and end are `DateType`.
[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST
TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r444297027

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala

```diff
@@ -2589,6 +2589,8 @@ object Sequence {
     }
   }
 
+  // To generate time sequences, we use scale 1 in TemporalSequenceImpl
+  // for `TimestampType`, while MICROS_PER_DAY for `DateType`
```

Review comment: Seems PostgreSQL sequences can only support integers, as follows:

```
postgres=# create sequence seq_test;
CREATE SEQUENCE
postgres=# select nextval('seq_test');
 1
(1 row)

postgres=# select nextval('seq_test');
 2
(1 row)
```
[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST
TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r444045788

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala

```diff
@@ -2589,6 +2589,8 @@ object Sequence {
     }
   }
 
+  // To generate time sequences, we use scale 1 in TemporalSequenceImpl
+  // for `TimestampType`, while MICROS_PER_DAY for `DateType`
```

Review comment: Yes, it seems we can; the results are as follows:

```scala
scala> sql("select explode(sequence(cast('2011-03-01' as date), cast('2011-05-01' as date), interval 1 hour))").count
res19: Long = 1465

scala> sql("select explode(sequence(cast('2011-03-01' as date), cast('2011-05-01' as date), interval 1 minute))").count
res20: Long = 87841

scala> sql("select explode(sequence(cast('2011-03-01' as date), cast('2011-05-01' as date), interval 1 second))").count
res21: Long = 5270401

scala> sql("select explode(sequence(cast('2011-03-01' as date), cast('2011-05-01' as date), interval 1 minute))").head(3)
res25: Array[org.apache.spark.sql.Row] = Array([2011-03-01], [2011-03-01], [2011-03-01])

scala> sql("select explode(sequence(cast('2011-03-01' as date), cast('2011-05-01' as date), interval 1 second))").head(3)
res26: Array[org.apache.spark.sql.Row] = Array([2011-03-01], [2011-03-01], [2011-03-01])

scala> sql("select explode(sequence(cast('2011-03-01' as date), cast('2011-05-01' as date), interval 1 minute))").head(3)
res27: Array[org.apache.spark.sql.Row] = Array([2011-03-01], [2011-03-01], [2011-03-01])

scala> sql("select explode(sequence(cast('2011-03-01' as date), cast('2011-05-01' as date), interval 1 hour))").head(3)
res28: Array[org.apache.spark.sql.Row] = Array([2011-03-01], [2011-03-01], [2011-03-01])
```
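A possible explanation for why sub-day steps keep emitting the same date, sketched as a standalone snippet (not Spark's actual code; the constants mirror Spark's `MICROS_PER_DAY`): for `DateType` the sequence advances in microseconds but divides back by the day-sized scale, so any step smaller than a day floors to the same date.

```scala
// Illustrative constants mirroring Spark's values; this is not Spark code.
val microsPerDay = 86400000000L
val hourMicros = 3600000000L

// Four hourly steps starting at day 0, each mapped back to a day number:
val days = (0L until 4L).map(i => (i * hourMicros) / microsPerDay)
println(days) // all four elements are day 0, i.e. the same date
```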
[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST
TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r443643942

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala

```diff
@@ -2623,8 +2623,16 @@ object Sequence {
       // about a month length in days and a day length in microseconds
       val intervalStepInMicros =
         stepMicros + stepMonths * microsPerMonth + stepDays * microsPerDay
-      val startMicros: Long = num.toLong(start) * scale
-      val stopMicros: Long = num.toLong(stop) * scale
+
+      // Date to timestamp is not equal from GMT and Chicago timezones
+      val (startMicros, stopMicros) = if (scale == 1) {
+        (num.toLong(start), num.toLong(stop))
+      }
+      else {
+        (daysToMicros(num.toInt(start), zoneId),
+          daysToMicros(num.toInt(stop), zoneId))
```

Review comment: Done. Maybe we can pass the scale through the constructor (note the added `scale: Long` parameter):

```scala
private class TemporalSequenceImpl[T: ClassTag](
    dt: IntegralType, scale: Long, fromLong: Long => T, zoneId: ZoneId)
```
[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST
TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r443610816

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala

```diff
@@ -2623,8 +2623,16 @@ object Sequence {
       // about a month length in days and a day length in microseconds
       val intervalStepInMicros =
```

Review comment: Hi @cloud-fan, as @MaxGekk explained here, I am not sure whether this patch looks OK. I am willing to add more documentation to `TemporalSequenceImpl`, but I am not sure whether we should follow this approach or refactor a little.
[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST
TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r443560745

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala

```diff
@@ -2623,8 +2623,16 @@ object Sequence {
       // about a month length in days and a day length in microseconds
       val intervalStepInMicros =
         stepMicros + stepMonths * microsPerMonth + stepDays * microsPerDay
-      val startMicros: Long = num.toLong(start) * scale
-      val stopMicros: Long = num.toLong(stop) * scale
+
+      // Date to timestamp is not equal from GMT and Chicago timezones
+      val (startMicros, stopMicros) = if (scale == 1) {
+        (num.toLong(start), num.toLong(stop))
+      }
+      else {
+        (daysToMicros(num.toInt(start), zoneId),
+          daysToMicros(num.toInt(stop), zoneId))
```

Review comment: Because when `scale != 1` the value is a day count, we need to use the zone info to translate it into microseconds to get a correct result, rather than just multiplying by `MICROS_PER_DAY`, which ignores the timezone.
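The difference between multiplying by `MICROS_PER_DAY` and a zone-aware conversion can be sketched with plain `java.time`. This is illustrative only: Spark's `daysToMicros` is approximated here with `ChronoUnit.MICROS.between` over local midnights.

```scala
import java.time.{LocalDate, ZoneId}
import java.time.temporal.ChronoUnit

val zone = ZoneId.of("America/Chicago")
val date = LocalDate.of(2011, 4, 1) // after the 2011 spring-forward

// Naive conversion: day count times a fixed day length (ignores the zone).
val naive = date.toEpochDay * 86400000000L

// Zone-aware conversion: microseconds between local midnights in the zone.
val zoneAware = ChronoUnit.MICROS.between(
  LocalDate.ofEpochDay(0).atStartOfDay(zone).toInstant,
  date.atStartOfDay(zone).toInstant)

// The two disagree by exactly the DST hour lost on 2011-03-13.
println(naive - zoneAware) // 3600000000
```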
[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST
TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r443556895

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala

```diff
@@ -2623,8 +2623,16 @@ object Sequence {
       // about a month length in days and a day length in microseconds
       val intervalStepInMicros =
         stepMicros + stepMonths * microsPerMonth + stepDays * microsPerDay
-      val startMicros: Long = num.toLong(start) * scale
-      val stopMicros: Long = num.toLong(stop) * scale
+
+      // Date to timestamp is not equal from GMT and Chicago timezones
+      val (startMicros, stopMicros) = if (scale == 1) {
+        (num.toLong(start), num.toLong(stop))
```

Review comment: Maybe we could separate this into different methods?
[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST
TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r442621499

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala

```diff
@@ -2698,7 +2721,11 @@ object Sequence {
        | int $i = 0;
        |
        | while ($t < $exclusiveItem ^ $stepSign < 0) {
-       |   $arr[$i] = ($elemType) ($t / ${scale}L);
+       |   if (${scale}L == 1L) {
+       |     $arr[$i] = ($elemType) ($t / ${scale}L);
```

Review comment: It may be better to change `$arr[$i] = ($elemType) ($t / ${scale}L)` to `$arr[$i] = ($elemType) $t`.
[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST
TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r442621330

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala

```diff
@@ -2635,7 +2643,11 @@ object Sequence {
       var i = 0
 
       while (t < exclusiveItem ^ stepSign < 0) {
-        arr(i) = fromLong(t / scale)
+        arr(i) = if (scale == 1) {
+          fromLong(t / scale)
```

Review comment: I will change it to `fromLong(t)` because it is the same as `fromLong(t / scale)` in this `if` branch.
[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST
TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r442613764

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala

```diff
@@ -2635,7 +2643,11 @@ object Sequence {
       var i = 0
 
       while (t < exclusiveItem ^ stepSign < 0) {
-        arr(i) = fromLong(t / scale)
+        arr(i) = if (scale == 1) {
+          fromLong(t / scale)
+        } else {
+          fromLong(Math.round(t / scale.toFloat))
```

Review comment: We can get the date from `Math.round(t / scale.toFloat)`; if the target is a timestamp, the former `fromLong(t / scale)` can be used.
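Why `Math.round` rather than plain integer division matters for the date branch can be seen with a small numeric sketch (illustrative values; `microsPerDay` mirrors Spark's `MICROS_PER_DAY` constant):

```scala
val microsPerDay = 86400000000L

// Stepping one month across the US spring-forward loses an hour, so the
// accumulated microseconds land one hour short of a whole-day multiple:
val t = 31 * microsPerDay - 3600000000L

println(t / microsPerDay)                     // 30 -- integer division is a day short
println(Math.round(t / microsPerDay.toFloat)) // 31 -- rounding recovers the right day
```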
[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST
TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r442560213

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala

```diff
@@ -2623,8 +2623,16 @@ object Sequence {
       // about a month length in days and a day length in microseconds
       val intervalStepInMicros =
```

Review comment: The suggestion `start is DateType, just add number of days. The same as for timestamps, time zone is not involved here.` seems to give strange results in different `tz` settings.
[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST
TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r442557807

## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala

```diff
@@ -1836,4 +1836,16 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
     checkEvaluation(ArrayIntersect(empty, oneNull), Seq.empty)
     checkEvaluation(ArrayIntersect(oneNull, empty), Seq.empty)
   }
+
+  test("SPARK-31982: sequence doesn't handle date increments that cross DST") {
+    Array("America/Chicago", "GMT", "Asia/Shanghai").foreach(tz => {
+      checkEvaluation(Sequence(
+        Cast(Literal("2011-03-01"), DateType),
+        Cast(Literal("2011-04-01"), DateType),
+        Option(Literal(stringToInterval("interval 1 month"))),
+        Option(tz)),
+        Seq(
+          Date.valueOf("2011-03-01"), Date.valueOf("2011-04-01")))
```

Review comment: Sure, the result should be independent of `tz`.
[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST
TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r442305304

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala

```diff
@@ -2698,7 +2717,7 @@ object Sequence {
        | int $i = 0;
        |
        | while ($t < $exclusiveItem ^ $stepSign < 0) {
-       |   $arr[$i] = ($elemType) ($t / ${scale}L);
+       |   $arr[$i] = ($elemType) (Math.round($t / (float)${scale}L));
```

Review comment: @MaxGekk Because we may need the `Math.round`: without this float operation, it seems hard to avoid a gap of about one day between the output and the expected result.
[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST
TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r442306713

## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala

```diff
@@ -1836,4 +1836,16 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
     checkEvaluation(ArrayIntersect(empty, oneNull), Seq.empty)
     checkEvaluation(ArrayIntersect(oneNull, empty), Seq.empty)
   }
+
+  test("SPARK-31982: sequence doesn't handle date increments that cross DST") {
+    Array("America/Chicago", "GMT", "Asia/Shanghai").foreach(tz => {
+      checkEvaluation(Sequence(
+        Cast(Literal("2011-03-01"), DateType),
+        Cast(Literal("2011-04-01"), DateType),
+        Option(Literal(stringToInterval("interval 1 month"))),
+        Option(tz)),
+        Seq(
+          Date.valueOf("2011-03-01"), Date.valueOf("2011-04-01")))
```

Review comment: Yes, `America/Los_Angeles` can pass the test.
[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST
TJX2014 commented on a change in pull request #28856: URL: https://github.com/apache/spark/pull/28856#discussion_r442154371 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ## @@ -2623,8 +2623,16 @@ object Sequence { // about a month length in days and a day length in microseconds val intervalStepInMicros = stepMicros + stepMonths * microsPerMonth + stepDays * microsPerDay -val startMicros: Long = num.toLong(start) * scale -val stopMicros: Long = num.toLong(stop) * scale + +// Date to timestamp is not equal from GMT and Chicago timezones +val (startMicros, stopMicros) = if (scale == 1) { + (num.toLong(start), num.toLong(stop)) +} +else { + (epochDaysToMicros(num.toInt(start), zoneId), Review comment: Use the timezone to convert days to micros.
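A hedged sketch of the idea behind that branch: the name `epochDaysToMicros` mirrors a helper in Spark's `DateTimeUtils`, but the standalone version below is an assumption about its behavior, not the actual implementation. It interprets the epoch day as local midnight in the session zone, instead of multiplying the day count by `MICROS_PER_DAY`:

```scala
import java.time.{LocalDate, ZoneId}

object DaysToMicros {
  // Interpret an epoch day as midnight in the given zone and return epoch micros.
  // (Illustrative stand-in for Spark's DateTimeUtils.daysToMicros logic.)
  def epochDaysToMicros(days: Int, zoneId: ZoneId): Long = {
    val instant = LocalDate.ofEpochDay(days.toLong).atStartOfDay(zoneId).toInstant
    instant.getEpochSecond * 1000000L + instant.getNano / 1000L
  }

  def main(args: Array[String]): Unit = {
    // Epoch day 0 is 1970-01-01. Midnight in GMT is epoch 0;
    // midnight in Chicago (CST, UTC-6) falls 6 hours later.
    println(epochDaysToMicros(0, ZoneId.of("GMT")))             // 0
    println(epochDaysToMicros(0, ZoneId.of("America/Chicago"))) // 21600000000
  }
}
```

This is why "date to timestamp is not equal from GMT and Chicago timezones": the same epoch day maps to different instants in different zones.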
[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST
TJX2014 commented on a change in pull request #28856: URL: https://github.com/apache/spark/pull/28856#discussion_r442148485 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ## @@ -2635,7 +2643,7 @@ object Sequence { var i = 0 while (t < exclusiveItem ^ stepSign < 0) { - arr(i) = fromLong(t / scale) Review comment: This fixes the incorrect results for Asia timezones (e.g. `Asia/Shanghai`).