This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push: new 0024173 [SPARK-27405][SQL][TEST] Restrict the range of generated random timestamps 0024173 is described below commit 00241733a6d7761a4f6e64442319b11dd9d2b57b Author: Maxim Gekk <max.g...@gmail.com> AuthorDate: Mon Apr 8 09:53:00 2019 -0700 [SPARK-27405][SQL][TEST] Restrict the range of generated random timestamps ## What changes were proposed in this pull request? In the PR, I propose to restrict the range of random timestamp literals generated in `LiteralGenerator. timestampLiteralGen`. The generator creates instances of `java.sql.Timestamp` by passing milliseconds since epoch as `Long` type. Converting the milliseconds to microseconds can cause arithmetic overflow of Long type because Catalyst's Timestamp type stores microseconds since epoch in `Long` type internally as well. Proposed interval of random milliseconds is `[Long.MinValue / 1000, [...] For example, generated timestamp `new java.sql.Timestamp(-3948373668011580000)` causes `Long` overflow at the method: ```scala def fromJavaTimestamp(t: Timestamp): SQLTimestamp = { ... MILLISECONDS.toMicros(t.getTime()) + NANOSECONDS.toMicros(t.getNanos()) % NANOS_PER_MICROS ... } ``` because `t.getTime()` returns `-3948373668011580000` which is multiplied by `1000` at `MILLISECONDS.toMicros`, and the result `-3948373668011580000000` is less than `Long.MinValue`. ## How was this patch tested? By `DateExpressionsSuite` in the PR https://github.com/apache/spark/pull/24311 Closes #24316 from MaxGekk/random-timestamps-gen. Authored-by: Maxim Gekk <max.g...@gmail.com> Signed-off-by: Dongjoon Hyun <dh...@apple.com> --- .../spark/sql/catalyst/expressions/LiteralGenerator.scala | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala index 032aec0..be5fdb5 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala @@ -21,6 +21,7 @@ import java.sql.{Date, Timestamp} import org.scalacheck.{Arbitrary, Gen} +import org.apache.spark.sql.catalyst.util.DateTimeUtils import org.apache.spark.sql.types._ import org.apache.spark.unsafe.types.CalendarInterval @@ -102,8 +103,16 @@ object LiteralGenerator { lazy val dateLiteralGen: Gen[Literal] = for { d <- Arbitrary.arbInt.arbitrary } yield Literal.create(new Date(d), DateType) - lazy val timestampLiteralGen: Gen[Literal] = - for { t <- Arbitrary.arbLong.arbitrary } yield Literal.create(new Timestamp(t), TimestampType) + lazy val timestampLiteralGen: Gen[Literal] = { + // Catalyst's Timestamp type stores number of microseconds since epoch in + // a variable of Long type. To prevent arithmetic overflow of Long on + // conversion from milliseconds to microseconds, the range of random milliseconds + // since epoch is restricted here. + val maxMillis = Long.MaxValue / DateTimeUtils.MICROS_PER_MILLIS + val minMillis = Long.MinValue / DateTimeUtils.MICROS_PER_MILLIS + for { millis <- Gen.choose(minMillis, maxMillis) } + yield Literal.create(new Timestamp(millis), TimestampType) + } lazy val calendarIntervalLiterGen: Gen[Literal] = for { m <- Arbitrary.arbInt.arbitrary; s <- Arbitrary.arbLong.arbitrary} --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org