[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST

2020-07-06 Thread GitBox


TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r450020829



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##
@@ -2623,8 +2625,16 @@ object Sequence {
 // about a month length in days and a day length in microseconds
 val intervalStepInMicros =
   stepMicros + stepMonths * microsPerMonth + stepDays * microsPerDay
-val startMicros: Long = num.toLong(start) * scale
-val stopMicros: Long = num.toLong(stop) * scale
+
+// Date to timestamp is not equal from GMT and Chicago timezones

Review comment:
   The resulting date seems to differ between western and eastern timezones. When the
   value is a date, we might need to take the zone info into account when converting it
   to a timestamp; if it is already a timestamp rather than a date, we can ignore the
   zone because the timestamp already accounts for it. A small sketch follows.
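
   A minimal, self-contained illustration of that point (plain java.time, not the PR
   code; the date and zones are just examples):

   import java.time.{LocalDate, ZoneId, ZoneOffset}

   // The same calendar date maps to different instants depending on the zone, which is
   // why a DateType value needs the session zone when it is widened to a timestamp.
   val date = LocalDate.of(2011, 3, 1)
   val gmtMicros = date.atStartOfDay(ZoneOffset.UTC).toInstant.toEpochMilli * 1000L
   val chicagoMicros = date.atStartOfDay(ZoneId.of("America/Chicago")).toInstant.toEpochMilli * 1000L
   println(chicagoMicros - gmtMicros)  // 21600000000 microseconds, i.e. 6 hours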








[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST

2020-06-24 Thread GitBox


TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r444969655



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##
@@ -2589,6 +2589,8 @@ object Sequence {
 }
   }
 
+  // To generate time sequences, we use scale 1 in TemporalSequenceImpl
+  // for `TimestampType`, while MICROS_PER_DAY for `DateType`

Review comment:
   Ok, I will do it tomorrow.








[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST

2020-06-24 Thread GitBox


TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r444858262



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##
@@ -2589,6 +2589,8 @@ object Sequence {
 }
   }
 
+  // To generate time sequences, we use scale 1 in TemporalSequenceImpl
+  // for `TimestampType`, while MICROS_PER_DAY for `DateType`

Review comment:
   Based on `presto-server-0.236`:

   presto> select sequence(date('2011-03-01'),date('2011-03-02'),interval '1' hour);
   Query 20200624_122744_2_pehix failed: sequence step must be a day interval if start and end values are dates

   presto> select sequence(date('2011-03-01'),date('2011-03-02'),interval '1' day);
            _col0
    [2011-03-01, 2011-03-02]
   (1 row)
   Query 20200624_122757_3_pehix, FINISHED, 1 node
   Splits: 17 total, 17 done (100.00%)
   0:00 [0 rows, 0B] [0 rows/s, 0B/s]

   presto> select sequence(date('2011-03-01'),date('2011-03-02'),interval '1' month);
       _col0
    [2011-03-01]
   (1 row)
   Query 20200624_122806_4_pehix, FINISHED, 1 node
   Splits: 17 total, 17 done (100.00%)
   0:00 [0 rows, 0B] [0 rows/s, 0B/s]

   presto> select sequence(date('2011-03-01'),date('2011-03-02'),interval '1' year);
       _col0
    [2011-03-01]
   (1 row)
   Query 20200624_122810_5_pehix, FINISHED, 1 node
   Splits: 17 total, 17 done (100.00%)
   0:00 [0 rows, 0B] [0 rows/s, 0B/s]








[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST

2020-06-24 Thread GitBox


TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r444859851



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##
@@ -2589,6 +2589,8 @@ object Sequence {
 }
   }
 
+  // To generate time sequences, we use scale 1 in TemporalSequenceImpl
+  // for `TimestampType`, while MICROS_PER_DAY for `DateType`

Review comment:
   @cloud-fan Done. In Presto, it seems a sequence can step by `day`, `month`, or `year`
   when start and end are `DateType`.








[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST

2020-06-23 Thread GitBox


TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r444297027



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##
@@ -2589,6 +2589,8 @@ object Sequence {
 }
   }
 
+  // To generate time sequences, we use scale 1 in TemporalSequenceImpl
+  // for `TimestampType`, while MICROS_PER_DAY for `DateType`

Review comment:
   It seems PostgreSQL sequences only support integers, as follows:

   postgres= create sequence seq_test;
   CREATE SEQUENCE
   postgres= select nextval('seq_test');
    1
   (1 row)
   postgres= select nextval('seq_test');
    2
   (1 row)
   








[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST

2020-06-23 Thread GitBox


TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r444045788



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##
@@ -2589,6 +2589,8 @@ object Sequence {
 }
   }
 
+  // To generate time sequences, we use scale 1 in TemporalSequenceImpl
+  // for `TimestampType`, while MICROS_PER_DAY for `DateType`

Review comment:
   Yes, it seems we can; the results are as follows:

   scala> sql("select explode(sequence(cast('2011-03-01' as date), cast('2011-05-01' as date), interval 1 hour))").count
   res19: Long = 1465

   scala> sql("select explode(sequence(cast('2011-03-01' as date), cast('2011-05-01' as date), interval 1 minute))").count
   res20: Long = 87841

   scala> sql("select explode(sequence(cast('2011-03-01' as date), cast('2011-05-01' as date), interval 1 second))").count
   res21: Long = 5270401

   scala> sql("select explode(sequence(cast('2011-03-01' as date), cast('2011-05-01' as date), interval 1 minute))").head(3)
   res25: Array[org.apache.spark.sql.Row] = Array([2011-03-01], [2011-03-01], [2011-03-01])

   scala> sql("select explode(sequence(cast('2011-03-01' as date), cast('2011-05-01' as date), interval 1 second))").head(3)
   res26: Array[org.apache.spark.sql.Row] = Array([2011-03-01], [2011-03-01], [2011-03-01])

   scala> sql("select explode(sequence(cast('2011-03-01' as date), cast('2011-05-01' as date), interval 1 minute))").head(3)
   res27: Array[org.apache.spark.sql.Row] = Array([2011-03-01], [2011-03-01], [2011-03-01])

   scala> sql("select explode(sequence(cast('2011-03-01' as date), cast('2011-05-01' as date), interval 1 hour))").head(3)
   res28: Array[org.apache.spark.sql.Row] = Array([2011-03-01], [2011-03-01], [2011-03-01])








[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST

2020-06-23 Thread GitBox


TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r444045788



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##
@@ -2589,6 +2589,8 @@ object Sequence {
 }
   }
 
+  // To generate time sequences, we use scale 1 in TemporalSequenceImpl
+  // for `TimestampType`, while MICROS_PER_DAY for `DateType`

Review comment:
   Yes, we can, but the result seems a little strange:

   scala> sql("select explode(sequence(cast('2011-03-01' as date), cast('2011-05-01' as date), interval 1 hour))").count
   res19: Long = 1465

   scala> sql("select explode(sequence(cast('2011-03-01' as date), cast('2011-05-01' as date), interval 1 minute))").count
   res20: Long = 87841

   scala> sql("select explode(sequence(cast('2011-03-01' as date), cast('2011-05-01' as date), interval 1 second))").count
   res21: Long = 5270401








[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST

2020-06-22 Thread GitBox


TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r443643942



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##
@@ -2623,8 +2623,16 @@ object Sequence {
 // about a month length in days and a day length in microseconds
 val intervalStepInMicros =
   stepMicros + stepMonths * microsPerMonth + stepDays * microsPerDay
-val startMicros: Long = num.toLong(start) * scale
-val stopMicros: Long = num.toLong(stop) * scale
+
+// Date to timestamp is not equal from GMT and Chicago timezones
+val (startMicros, stopMicros) = if (scale == 1) {
+  (num.toLong(start), num.toLong(stop))
+}
+else {
+  (daysToMicros(num.toInt(start), zoneId),
+daysToMicros(num.toInt(stop), zoneId))

Review comment:
   Done. Maybe we can pass the scale through the constructor:

   private class TemporalSequenceImpl[T: ClassTag]
       (dt: IntegralType, **scale: Long**, fromLong: Long => T, zoneId: ZoneId)
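
   A rough, runnable sketch of that idea (simplified stand-in names, not the actual
   Spark class; only the scale/zone handling is shown):

   import java.time.{LocalDate, ZoneId}
   import scala.reflect.ClassTag

   // The scale travels with the instance: 1 for TimestampType (values already in
   // microseconds), MICROS_PER_DAY for DateType (values are day counts).
   class TemporalSequenceSketch[T: ClassTag](
       scale: Long, fromLong: Long => T, zoneId: ZoneId) {  // fromLong mirrors the quoted signature
     def boundToMicros(value: Long): Long =
       if (scale == 1L) value
       else LocalDate.ofEpochDay(value).atStartOfDay(zoneId).toInstant.toEpochMilli * 1000L
   }

   val microsPerDay = 86400000000L
   val dateSeq = new TemporalSequenceSketch[Int](
     microsPerDay, l => (l / microsPerDay).toInt, ZoneId.of("America/Chicago"))
   println(dateSeq.boundToMicros(LocalDate.of(2011, 3, 1).toEpochDay))  // zone-aware micros for 2011-03-01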








[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST

2020-06-22 Thread GitBox


TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r443610816



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##
@@ -2623,8 +2623,16 @@ object Sequence {
 // about a month length in days and a day length in microseconds
 val intervalStepInMicros =

Review comment:
   Hi @cloud-fan, as @MaxGekk explained here, I am not sure whether this patch looks OK.
   I am willing to `add more documents to TemporalSequenceImpl`, but I am not sure
   whether we should follow this approach or refactor a little.









[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST

2020-06-22 Thread GitBox


TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r443560745



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##
@@ -2623,8 +2623,16 @@ object Sequence {
 // about a month length in days and a day length in microseconds
 val intervalStepInMicros =
   stepMicros + stepMonths * microsPerMonth + stepDays * microsPerDay
-val startMicros: Long = num.toLong(start) * scale
-val stopMicros: Long = num.toLong(stop) * scale
+
+// Date to timestamp is not equal from GMT and Chicago timezones
+val (startMicros, stopMicros) = if (scale == 1) {
+  (num.toLong(start), num.toLong(stop))
+}
+else {
+  (daysToMicros(num.toInt(start), zoneId),
+daysToMicros(num.toInt(stop), zoneId))

Review comment:
   Because when `scale` != 1 the value is a day count, we may need to use the zone info
   to translate it into microseconds and get a correct result, rather than just
   multiplying by `MICROS_PER_DAY`, which ignores the timezone.
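
   Assuming the `DateTimeUtils.daysToMicros` helper referenced in the diff above, a
   minimal comparison of the two conversions (illustrative only, not part of the patch):

   import java.time.ZoneId
   import org.apache.spark.sql.catalyst.util.DateTimeUtils.daysToMicros

   val days = java.time.LocalDate.of(2011, 3, 1).toEpochDay.toInt  // days since the epoch
   val microsPerDay = 86400000000L

   val ignoringZone = days.toLong * microsPerDay                    // old code: assumes UTC midnight
   val zoneAware = daysToMicros(days, ZoneId.of("America/Chicago")) // patch: local midnight in the zone
   println(zoneAware - ignoringZone)  // 21600000000 micros (6 hours) before the DST switch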








[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST

2020-06-22 Thread GitBox


TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r443556895



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##
@@ -2623,8 +2623,16 @@ object Sequence {
 // about a month length in days and a day length in microseconds
 val intervalStepInMicros =
   stepMicros + stepMonths * microsPerMonth + stepDays * microsPerDay
-val startMicros: Long = num.toLong(start) * scale
-val stopMicros: Long = num.toLong(stop) * scale
+
+// Date to timestamp is not equal from GMT and Chicago timezones
+val (startMicros, stopMicros) = if (scale == 1) {
+  (num.toLong(start), num.toLong(stop))

Review comment:
   Maybe we could separate this into different methods?








[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST

2020-06-18 Thread GitBox


TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r442621499



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##
@@ -2698,7 +2721,11 @@ object Sequence {
  |  int $i = 0;
  |
  |  while ($t < $exclusiveItem ^ $stepSign < 0) {
- |$arr[$i] = ($elemType) ($t / ${scale}L);
+ |if (${scale}L == 1L) {
+ |  $arr[$i] = ($elemType) ($t / ${scale}L);

Review comment:
   Maybe it is better to change `$arr[$i] = ($elemType) ($t / ${scale}L)` to
   `$arr[$i] = ($elemType) $t`.









[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST

2020-06-18 Thread GitBox


TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r442621330



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##
@@ -2635,7 +2643,11 @@ object Sequence {
 var i = 0
 
 while (t < exclusiveItem ^ stepSign < 0) {
-  arr(i) = fromLong(t / scale)
+  arr(i) = if (scale == 1) {
+fromLong(t / scale)

Review comment:
   I will change it to `fromLong(t)` because it is the same as `fromLong(t / scale)` in
   the `if` branch, where `scale == 1`.








[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST

2020-06-18 Thread GitBox


TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r442613764



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##
@@ -2635,7 +2643,11 @@ object Sequence {
 var i = 0
 
 while (t < exclusiveItem ^ stepSign < 0) {
-  arr(i) = fromLong(t / scale)
+  arr(i) = if (scale == 1) {
+fromLong(t / scale)
+  } else {
+fromLong(Math.round(t / scale.toFloat))

Review comment:
   We can get the date from `Math.round(t / scale.toFloat)`; if the target is a
   timestamp, the former `fromLong(t / scale)` can still be used.
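
   A minimal sketch (hypothetical values, plain Scala) of why the rounding matters once
   `t` holds zone-aware microseconds, e.g. in a UTC+8 zone such as Asia/Shanghai:

   val microsPerDay = 86400000000L
   // Local midnight 2011-03-01 in UTC+8 is 8 hours *before* the UTC day boundary.
   val epochDay = java.time.LocalDate.of(2011, 3, 1).toEpochDay
   val t = epochDay * microsPerDay - 8L * 3600L * 1000000L
   println(t / microsPerDay)                      // 15033 -- integer division lands on the previous day
   println(Math.round(t / microsPerDay.toFloat))  // 15034 -- rounding recovers 2011-03-01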








[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST

2020-06-18 Thread GitBox


TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r442560213



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##
@@ -2623,8 +2623,16 @@ object Sequence {
 // about a month length in days and a day length in microseconds
 val intervalStepInMicros =

Review comment:
   The suggestion `start is DateType, just add number of days. The same as for
   timestamps, time zone is not involved here.` seems strange when checked under
   different `tz` values.








[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST

2020-06-18 Thread GitBox


TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r442557807



##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
##
@@ -1836,4 +1836,16 @@ class CollectionExpressionsSuite extends SparkFunSuite 
with ExpressionEvalHelper
 checkEvaluation(ArrayIntersect(empty, oneNull), Seq.empty)
 checkEvaluation(ArrayIntersect(oneNull, empty), Seq.empty)
   }
+
+  test("SPARK-31982: sequence doesn't handle date increments that cross DST") {
+Array("America/Chicago", "GMT", "Asia/Shanghai").foreach(tz => {
+  checkEvaluation(Sequence(
+Cast(Literal("2011-03-01"), DateType),
+Cast(Literal("2011-04-01"), DateType),
+Option(Literal(stringToInterval("interval 1 month"))),
+Option(tz)),
+Seq(
+  Date.valueOf("2011-03-01"), Date.valueOf("2011-04-01")))

Review comment:
   Sure, the result should be independent of `tz`.








[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST

2020-06-18 Thread GitBox


TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r442305304



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##
@@ -2698,7 +2717,7 @@ object Sequence {
  |  int $i = 0;
  |
  |  while ($t < $exclusiveItem ^ $stepSign < 0) {
- |$arr[$i] = ($elemType) ($t / ${scale}L);
+ |$arr[$i] = ($elemType) (Math.round($t / (float)${scale}L));

Review comment:
   @MaxGekk Because we may need the `Math.round`: without this float operation, it seems
   hard to avoid a gap of about one day between the output and the expected result.
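
   For instance (an illustrative calculation, not the PR code), one month from
   2011-03-01 in America/Chicago crosses the spring-forward switch and is an hour short
   of 31 full days, so truncating division drops a day while rounding does not:

   import java.time.{LocalDate, ZoneId}
   import java.time.temporal.ChronoUnit

   val zone  = ZoneId.of("America/Chicago")
   val start = LocalDate.of(2011, 3, 1).atStartOfDay(zone)
   val stop  = LocalDate.of(2011, 4, 1).atStartOfDay(zone)
   val hours = ChronoUnit.HOURS.between(start, stop)
   println(hours)                     // 743, i.e. 31 * 24 - 1 because of the DST switch on 2011-03-13
   println(hours / 24)                // 30  -- plain truncation is a day short
   println(Math.round(hours / 24.0))  // 31  -- rounding gives the expected 2011-04-01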








[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST

2020-06-18 Thread GitBox


TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r442306713



##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
##
@@ -1836,4 +1836,16 @@ class CollectionExpressionsSuite extends SparkFunSuite 
with ExpressionEvalHelper
 checkEvaluation(ArrayIntersect(empty, oneNull), Seq.empty)
 checkEvaluation(ArrayIntersect(oneNull, empty), Seq.empty)
   }
+
+  test("SPARK-31982: sequence doesn't handle date increments that cross DST") {
+Array("America/Chicago", "GMT", "Asia/Shanghai").foreach(tz => {
+  checkEvaluation(Sequence(
+Cast(Literal("2011-03-01"), DateType),
+Cast(Literal("2011-04-01"), DateType),
+Option(Literal(stringToInterval("interval 1 month"))),
+Option(tz)),
+Seq(
+  Date.valueOf("2011-03-01"), Date.valueOf("2011-04-01")))

Review comment:
   Yes, `America/Los_Angeles` can pass the test.









[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST

2020-06-18 Thread GitBox


TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r442154371



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##
@@ -2623,8 +2623,16 @@ object Sequence {
 // about a month length in days and a day length in microseconds
 val intervalStepInMicros =
   stepMicros + stepMonths * microsPerMonth + stepDays * microsPerDay
-val startMicros: Long = num.toLong(start) * scale
-val stopMicros: Long = num.toLong(stop) * scale
+
+// Date to timestamp is not equal from GMT and Chicago timezones
+val (startMicros, stopMicros) = if (scale == 1) {
+  (num.toLong(start), num.toLong(stop))
+}
+else {
+  (epochDaysToMicros(num.toInt(start), zoneId),

Review comment:
   Use the timezone to convert days to micros.








[GitHub] [spark] TJX2014 commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST

2020-06-18 Thread GitBox


TJX2014 commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r442148485



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##
@@ -2635,7 +2643,7 @@ object Sequence {
 var i = 0
 
 while (t < exclusiveItem ^ stepSign < 0) {
-  arr(i) = fromLong(t / scale)

Review comment:
   This resolves incorrect results for Asian timezones.




