[ https://issues.apache.org/jira/browse/SPARK-39184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537252#comment-17537252 ]
Bruce Robbins commented on SPARK-39184:
---------------------------------------

Some notes on what is happening. Take this small reproduction example (it fails in the {{America/Los_Angeles}} time-zone):
{noformat}
select sequence(
  timestamp'2022-03-13 00:00:00',
  timestamp'2022-03-14 01:00:00',
  interval 1 day 1 hour) as x;
{noformat}
This produces the error:
{noformat}
java.lang.ArrayIndexOutOfBoundsException: 1
	at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:77) ~[scala-library.jar:?]
{noformat}
The following Scala code, which can be pasted into the REPL, shows (essentially) what is happening:
{noformat}
import java.time._
import java.time.temporal.ChronoUnit._

val zid = ZoneId.of("America/Los_Angeles")
val startZdt = ZonedDateTime.of(LocalDateTime.of(2022, 3, 13, 0, 0, 0, 0), zid)
val stopZdt = ZonedDateTime.of(LocalDateTime.of(2022, 3, 14, 1, 0, 0, 0), zid)

val interval = Duration.ofDays(1).plusHours(1)
println(interval.toHours) // prints 25

// The diff between the start and stop is 24 hours, because 2022-03-13 has
// only 23 hours in the America/Los_Angeles time-zone due to "spring forward".
val hours = startZdt.until(stopZdt, HOURS)
println(hours) // prints 24

// This is how InternalSequenceBase estimates the size of the result array
// (it actually uses micros, but we're scaling to hours for simplicity here).
// This estimate yields a single element:
println((hours / interval.toHours) + 1) // prints 1

// InternalSequenceBase thinks it needs only one array element because if you add
// 25 hours to '2022-03-13 00:00:00', you get '2022-03-14 02:00:00', which is greater
// than the stop value. To show that, we add 25 hours to the start value:
println(startZdt.plusHours(25)) // prints 2022-03-14T02:00-07:00[America/Los_Angeles]

// However, when calculating the value to put in each element, InternalSequenceBase
// doesn't add 25 hours to the previous value (or start value). It instead adds
// 1 day and 1 hour. That gives you '2022-03-14 01:00:00', which is equal to the
// stop value (and thus should be included in the result).
// As a result, we blow past the end of the pre-allocated array.
println(startZdt.plusDays(1).plusHours(1)) // prints 2022-03-14T01:00-07:00[America/Los_Angeles]
{noformat}

> ArrayIndexOutOfBoundsException for some date/time sequences in some time-zones
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-39184
>                 URL: https://issues.apache.org/jira/browse/SPARK-39184
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.1, 3.3.0, 3.4.0
>            Reporter: Bruce Robbins
>            Priority: Major
>
> The following query gets an {{ArrayIndexOutOfBoundsException}} when run from
> the {{America/Los_Angeles}} time-zone:
> {noformat}
> spark-sql> select sequence(timestamp'2022-03-13 00:00:00',
>   timestamp'2022-03-16 03:00:00', interval 1 day 1 hour) as x;
> 22/05/13 14:47:27 ERROR SparkSQLDriver: Failed in [select
> sequence(timestamp'2022-03-13 00:00:00', timestamp'2022-03-16 03:00:00',
> interval 1 day 1 hour) as x]
> java.lang.ArrayIndexOutOfBoundsException: 3
> {noformat}
> In fact, any such query gets an {{ArrayIndexOutOfBoundsException}} if the
> start-stop period in your time-zone includes more instances of "spring
> forward" than instances of "fall back", and the start-stop period is evenly
> divisible by the interval.
> In the {{America/Los_Angeles}} time-zone, examples include:
> {noformat}
> -- This query encompasses 2 instances of "spring forward" but only one
> -- instance of "fall back".
> select sequence(
>   timestamp'2022-03-13',
>   timestamp'2022-03-13' + (interval '42' hours * 209),
>   interval '42' hours) as x;
> {noformat}
> {noformat}
> select sequence(
>   timestamp'2022-03-13',
>   timestamp'2022-03-13' + (interval '31' hours * 11),
>   interval '31' hours) as x;
> {noformat}

--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
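The mismatch described in the comment above can be condensed into one standalone Scala sketch. This is a simplified model, not Spark's actual code: the real {{InternalSequenceBase}} works in microseconds inside generated code, but the shape of the bug is the same, a duration-based size estimate versus calendar-based element generation:

```scala
import java.time._
import java.time.temporal.ChronoUnit.HOURS

object SequenceSizeMismatch extends App {
  val zid = ZoneId.of("America/Los_Angeles")
  val startZdt = ZonedDateTime.of(LocalDateTime.of(2022, 3, 13, 0, 0, 0, 0), zid)
  val stopZdt  = ZonedDateTime.of(LocalDateTime.of(2022, 3, 14, 1, 0, 0, 0), zid)

  // "1 day 1 hour" treated as a fixed 25-hour duration, as the size estimate does.
  val interval = Duration.ofDays(1).plusHours(1)

  // Step 1: duration-based size estimate (scaled to hours, as in the comment).
  // 24 elapsed wall-clock hours / 25-hour interval, plus 1 => room for 1 element.
  val estimatedLen = (startZdt.until(stopZdt, HOURS) / interval.toHours) + 1

  // Step 2: calendar-based element generation: keep adding "1 day and 1 hour"
  // (not a fixed 25 hours) until the value passes the stop timestamp.
  val elements = Iterator
    .iterate(startZdt)(_.plusDays(1).plusHours(1))
    .takeWhile(!_.isAfter(stopZdt))
    .toList

  println(s"estimated length: $estimatedLen") // prints: estimated length: 1
  println(s"actual elements: ${elements.size}") // prints: actual elements: 2
}
```

Writing the second element, at index 1, into an array allocated with length 1 is exactly the {{ArrayIndexOutOfBoundsException: 1}} shown in the small reproduction above.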