[GitHub] spark pull request #16308: [SPARK-18350][SQL][WIP] Support session local tim...
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16308#discussion_r92982685

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
    @@ -111,7 +112,8 @@ case class CatalogTablePartition(
        */
       def toRow(partitionSchema: StructType): InternalRow = {
         InternalRow.fromSeq(partitionSchema.map { field =>
    -      Cast(Literal(spec(field.name)), field.dataType).eval()
    +      Cast(Literal(spec(field.name)), field.dataType,
    +        DateTimeUtils.defaultTimeZone().getID).eval()
    --- End diff --

    Hmm, now I think we should use the timezone setting for partition values, because the values are also part of the data, so they should be affected by the setting.

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
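The concern in this exchange can be sketched outside of Spark: a partition value is just a string, and interpreting it as a timestamp requires picking a timezone, so the session timezone setting changes which instant the same value denotes. A minimal `java.time` sketch (not Spark's actual `Cast`; the helper name is illustrative):

```scala
import java.time.{LocalDateTime, ZoneId}
import java.time.format.DateTimeFormatter

val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

// Interpret a partition value string as epoch microseconds in a given zone.
def toEpochMicros(value: String, zone: String): Long = {
  val ldt = LocalDateTime.parse(value, fmt)
  val instant = ldt.atZone(ZoneId.of(zone)).toInstant
  instant.getEpochSecond * 1000000L + instant.getNano / 1000L
}

val v = "2016-12-15 00:00:00"
val utc = toEpochMicros(v, "UTC")
val jst = toEpochMicros(v, "Asia/Tokyo")
// JST is UTC+9, so the same string read in JST is an instant 9 hours
// earlier than when read in UTC.
assert(utc - jst == 9L * 3600 * 1000000L)
```

This is exactly why the choice between the session timezone setting and the system timezone is a behavior change for partition values, not just an internal refactoring.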
[GitHub] spark pull request #16308: [SPARK-18350][SQL][WIP] Support session local tim...
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16308#discussion_r92973560

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
    @@ -111,7 +112,8 @@ case class CatalogTablePartition(
        */
       def toRow(partitionSchema: StructType): InternalRow = {
         InternalRow.fromSeq(partitionSchema.map { field =>
    -      Cast(Literal(spec(field.name)), field.dataType).eval()
    +      Cast(Literal(spec(field.name)), field.dataType,
    +        DateTimeUtils.defaultTimeZone().getID).eval()
    --- End diff --

    Currently the behavior doesn't change with the timezone setting, i.e. it always uses the system timezone. This is a part I was not sure about: should partition values be handled with the timezone setting or with the system timezone? Should we use the timezone setting?
[GitHub] spark pull request #16308: [SPARK-18350][SQL][WIP] Support session local tim...
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16308#discussion_r92961639

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
    @@ -135,6 +146,14 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
       override def nullable: Boolean = Cast.forceNullable(child.dataType, dataType) || child.nullable

    +  override def timeZoneResolved: Boolean =
    +    (!(childrenResolved && Cast.needTimeZone(child.dataType, dataType))) || super.timeZoneResolved
    +
    +  override lazy val resolved: Boolean =
    +    childrenResolved && checkInputDataTypes().isSuccess && timeZoneResolved
    +
    +  override def withTimeZone(zoneId: String): TimeZoneAwareExpression = copy(zoneId = zoneId)
    --- End diff --

    Yes, this is a copy ctor, but the analyzer rule `ResolveTimeZone` can't call the copy ctor because it doesn't know the actual expression class.
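The design point here can be shown with a small self-contained sketch (illustrative names, not Spark's real classes): a generic analyzer rule only sees the trait, so it cannot call a class-specific `copy`; instead each concrete expression implements `withTimeZone` in terms of its own `copy`.

```scala
// Minimal sketch of the trait-plus-copy design under discussion.
trait TimeZoneAwareExpr {
  def zoneId: String
  def timeZoneResolved: Boolean = zoneId != null
  def withTimeZone(zoneId: String): TimeZoneAwareExpr
}

// A concrete expression implements withTimeZone via its own copy ctor.
case class MyCast(value: String, zoneId: String = null) extends TimeZoneAwareExpr {
  override def withTimeZone(zoneId: String): TimeZoneAwareExpr = copy(zoneId = zoneId)
}

// A rule that only knows the trait can still inject the session timezone.
def resolveTimeZone(e: TimeZoneAwareExpr, sessionTz: String): TimeZoneAwareExpr =
  if (e.timeZoneResolved) e else e.withTimeZone(sessionTz)

val resolved = resolveTimeZone(MyCast("10"), "UTC")
assert(resolved.zoneId == "UTC")
// An already-resolved expression is left untouched.
assert(resolveTimeZone(MyCast("10", "Asia/Tokyo"), "UTC").zoneId == "Asia/Tokyo")
```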
[GitHub] spark pull request #16308: [SPARK-18350][SQL][WIP] Support session local tim...
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16308#discussion_r92962066

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
    @@ -120,7 +128,10 @@ object Cast {
           > SELECT _FUNC_('10' as int);
            10
      """)
    -case class Cast(child: Expression, dataType: DataType) extends UnaryExpression with NullIntolerant {
    +case class Cast(child: Expression, dataType: DataType, zoneId: String = null)
    --- End diff --

    I agree that an extra unapply is a bad idea. I'll leave it as it is for now.
[GitHub] spark pull request #16308: [SPARK-18350][SQL][WIP] Support session local tim...
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16308#discussion_r92961818

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
    @@ -110,6 +110,14 @@ object Cast {
        case (_: FractionalType, _: IntegralType) => true // NaN, infinity
        case _ => false
      }
    +
    +  def needTimeZone(from: DataType, to: DataType): Boolean = (from, to) match {
    --- End diff --

    I see, I'll add documentation.
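To illustrate what a documented predicate of this shape might look like, here is a hypothetical, self-contained sketch (the type names and the exact list of timezone-dependent conversions are illustrative, not Spark's actual `Cast.needTimeZone`):

```scala
// Toy stand-ins for Catalyst data types, for illustration only.
sealed trait DT
case object StringT extends DT
case object DateT extends DT
case object TimestampT extends DT
case object IntT extends DT

/**
 * Returns whether casting `from` to `to` depends on the timezone.
 * Converting between a zone-less wall-clock reading (string, date) and a
 * timestamp (an instant) requires a zone to anchor the wall-clock value,
 * so those conversions are timezone-dependent; numeric casts are not.
 */
def needTimeZone(from: DT, to: DT): Boolean = (from, to) match {
  case (StringT, TimestampT) | (TimestampT, StringT) => true
  case (DateT, TimestampT) | (TimestampT, DateT)     => true
  case _                                             => false
}

assert(needTimeZone(StringT, TimestampT))
assert(!needTimeZone(StringT, IntT))
```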
[GitHub] spark pull request #16308: [SPARK-18350][SQL][WIP] Support session local tim...
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16308#discussion_r92962327

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
    @@ -30,21 +30,44 @@ import org.apache.spark.sql.types._
     import org.apache.spark.unsafe.types.{CalendarInterval, UTF8String}

     /**
    + * Common base class for time zone aware expressions.
    + */
    +trait TimeZoneAwareExpression extends Expression {
    +
    +  def zoneId: String
    +
    +  def timeZoneResolved: Boolean = zoneId != null
    +
    +  def withTimeZone(zoneId: String): TimeZoneAwareExpression
    +
    +  def timeZone: TimeZone = TimeZone.getTimeZone(zoneId)
    --- End diff --

    You are right. I'll use `lazy val`.
[GitHub] spark pull request #16308: [SPARK-18350][SQL][WIP] Support session local tim...
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16308#discussion_r92962286

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
    @@ -30,21 +30,44 @@ import org.apache.spark.sql.types._
     import org.apache.spark.unsafe.types.{CalendarInterval, UTF8String}

     /**
    + * Common base class for time zone aware expressions.
    + */
    +trait TimeZoneAwareExpression extends Expression {
    +
    +  def zoneId: String
    --- End diff --

    Yes, I wanted to avoid a bunch of gets.
[GitHub] spark pull request #16308: [SPARK-18350][SQL][WIP] Support session local tim...
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16308#discussion_r92962120

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala ---
    @@ -41,13 +43,16 @@ object ReplaceExpressions extends Rule[LogicalPlan] {
      */
     object ComputeCurrentTime extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = {
    -    val dateExpr = CurrentDate()
    +    val currentDates = mutable.Map.empty[String, Literal]
         val timeExpr = CurrentTimestamp()
    -    val currentDate = Literal.create(dateExpr.eval(EmptyRow), dateExpr.dataType)
         val currentTime = Literal.create(timeExpr.eval(EmptyRow), timeExpr.dataType)

         plan transformAllExpressions {
    -      case CurrentDate() => currentDate
    +      case CurrentDate(tz) =>
    +        currentDates.getOrElseUpdate(tz, {
    +          val dateExpr = CurrentDate(tz)
    +          Literal.create(dateExpr.eval(EmptyRow), dateExpr.dataType)
    --- End diff --

    Good catch, I'll modify this.
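The memoization pattern in this hunk — evaluate the current date once per distinct timezone and reuse the literal, so every `CurrentDate` in a plan sees a consistent value — can be sketched with `mutable.Map.getOrElseUpdate` alone (names are illustrative, not Spark's rule):

```scala
import scala.collection.mutable

// Cache one evaluated value per distinct timezone id.
val currentDates = mutable.Map.empty[String, Long]
var evaluations = 0

def currentDateFor(tz: String): Long =
  currentDates.getOrElseUpdate(tz, {
    evaluations += 1  // the expensive evaluation runs at most once per zone
    java.time.LocalDate.now(java.time.ZoneId.of(tz)).toEpochDay
  })

val a = currentDateFor("UTC")
val b = currentDateFor("UTC")        // second lookup hits the cache
currentDateFor("Asia/Tokyo")         // a new zone triggers one evaluation
assert(a == b)
assert(evaluations == 2)
```

The point rxin caught is the same one this pattern guards against: evaluating the expression fresh at each occurrence could yield inconsistent values within a single query.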
[GitHub] spark pull request #16308: [SPARK-18350][SQL][WIP] Support session local tim...
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16308#discussion_r92916070

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
    @@ -30,21 +30,44 @@ import org.apache.spark.sql.types._
     import org.apache.spark.unsafe.types.{CalendarInterval, UTF8String}

     /**
    + * Common base class for time zone aware expressions.
    + */
    +trait TimeZoneAwareExpression extends Expression {
    +
    +  def zoneId: String
    +
    +  def timeZoneResolved: Boolean = zoneId != null
    +
    +  def withTimeZone(zoneId: String): TimeZoneAwareExpression
    +
    +  def timeZone: TimeZone = TimeZone.getTimeZone(zoneId)
    --- End diff --

    Should this be a lazy val? Otherwise it is pretty expensive to keep creating a new timezone object per row in the interpreted path.
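The difference rxin points at can be shown directly (illustrative classes, not the actual trait): with `def`, `TimeZone.getTimeZone` runs on every access, which in the interpreted path means once per row; with `lazy val`, it runs once and the same instance is reused.

```scala
import java.util.TimeZone

// Re-resolves the zone on every call.
class WithDef(zoneId: String) {
  def timeZone: TimeZone = TimeZone.getTimeZone(zoneId)
}

// Resolves the zone once, on first access.
class WithLazy(zoneId: String) {
  lazy val timeZone: TimeZone = TimeZone.getTimeZone(zoneId)
}

val l = new WithLazy("UTC")
// lazy val: the very same object is returned on every access.
assert(l.timeZone eq l.timeZone)

val d = new WithDef("UTC")
// def: each call performs the lookup again; the results are equal,
// but the per-call work is what makes this expensive per row.
assert(d.timeZone == d.timeZone)
```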
[GitHub] spark pull request #16308: [SPARK-18350][SQL][WIP] Support session local tim...
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16308#discussion_r92915894

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
    @@ -120,7 +128,10 @@ object Cast {
           > SELECT _FUNC_('10' as int);
            10
      """)
    -case class Cast(child: Expression, dataType: DataType) extends UnaryExpression with NullIntolerant {
    +case class Cast(child: Expression, dataType: DataType, zoneId: String = null)
    --- End diff --

    On second thought, an extra unapply is probably a bad idea, since then we can miss a pattern match.
[GitHub] spark pull request #16308: [SPARK-18350][SQL][WIP] Support session local tim...
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16308#discussion_r92915851

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
    @@ -111,7 +112,8 @@ case class CatalogTablePartition(
        */
       def toRow(partitionSchema: StructType): InternalRow = {
         InternalRow.fromSeq(partitionSchema.map { field =>
    -      Cast(Literal(spec(field.name)), field.dataType).eval()
    +      Cast(Literal(spec(field.name)), field.dataType,
    +        DateTimeUtils.defaultTimeZone().getID).eval()
    --- End diff --

    Could this change how we interpret partition values when the timezone setting changes?
[GitHub] spark pull request #16308: [SPARK-18350][SQL][WIP] Support session local tim...
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16308#discussion_r92915771

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
    @@ -120,7 +128,10 @@ object Cast {
           > SELECT _FUNC_('10' as int);
            10
      """)
    -case class Cast(child: Expression, dataType: DataType) extends UnaryExpression with NullIntolerant {
    +case class Cast(child: Expression, dataType: DataType, zoneId: String = null)
    --- End diff --

    Not 100% sure whether this is a good idea, but should we consider adding a `Cast.unapply` that does not match on zoneId?
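For context, the idea being floated (and argued against later in the thread) is a hand-written two-field extractor so that existing `case Cast(child, dataType)` patterns keep working after the third field is added. A hypothetical sketch — in reality the extractor would live on the companion object, here it is a separate object to keep the example self-contained:

```scala
// Stand-in for Cast after gaining a third field.
case class MyCastExpr(child: String, dataType: String, zoneId: String = null)

// A two-field extractor that deliberately ignores zoneId.
object MyCastNoZone {
  def unapply(c: MyCastExpr): Option[(String, String)] = Some((c.child, c.dataType))
}

val e = MyCastExpr("col", "int", "UTC")
val matched = e match {
  case MyCastNoZone(child, dt) => s"$child:$dt"
}
assert(matched == "col:int")
```

The downside rxin later names is visible here: a pattern written against the two-field extractor silently discards `zoneId`, so a rule that should have cared about the timezone can miss it without any compiler warning.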
[GitHub] spark pull request #16308: [SPARK-18350][SQL][WIP] Support session local tim...
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16308#discussion_r92915757

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
    @@ -110,6 +110,14 @@ object Cast {
        case (_: FractionalType, _: IntegralType) => true // NaN, infinity
        case _ => false
      }
    +
    +  def needTimeZone(from: DataType, to: DataType): Boolean = (from, to) match {
    --- End diff --

    I think it's important to document this...
[GitHub] spark pull request #16308: [SPARK-18350][SQL][WIP] Support session local tim...
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16308#discussion_r92915708

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
    @@ -30,21 +30,44 @@ import org.apache.spark.sql.types._
     import org.apache.spark.unsafe.types.{CalendarInterval, UTF8String}

     /**
    + * Common base class for time zone aware expressions.
    + */
    +trait TimeZoneAwareExpression extends Expression {
    +
    +  def zoneId: String
    --- End diff --

    Is the reason you are using null rather than Option to avoid a bunch of gets?
[GitHub] spark pull request #16308: [SPARK-18350][SQL][WIP] Support session local tim...
GitHub user ueshin opened a pull request:

    https://github.com/apache/spark/pull/16308

    [SPARK-18350][SQL][WIP] Support session local timezone.

## What changes were proposed in this pull request?

As of Spark 2.1, Spark SQL assumes the machine timezone for datetime manipulation, which is bad if users are not in the same timezones as the machines, or if different users have different timezones. We should introduce a session local timezone setting that is used for execution. An explicit non-goal is locale handling.

### Design of the fix

I introduced an analyzer rule to pass the session local timezone to timezone-aware expressions, and modified DateTimeUtils to take a timezone argument.

## How was this patch tested?

Existing tests and added tests for timezone-aware expressions.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-18350

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16308.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16308

commit c48a70da0f06ebde001091fe765f787382c7409e
Author: Takuya UESHIN
Date:   2016-12-06T06:41:39Z
    Prepare for session local timezone support.

commit 1d21fec497147faa1eabcfc31e524abc102c67c5
Author: Takuya UESHIN
Date:   2016-12-06T13:43:26Z
    Make Cast TimeZoneAwareExpression.

commit 0763c8f67e5e13045d6958b45ffe3d4e0efb0a63
Author: Takuya UESHIN
Date:   2016-12-06T14:37:47Z
    Fix DateTimeUtilsSuite to follow changes.

commit 449d93d416d225a323a18dbffc965dca11c3b55d
Author: Takuya UESHIN
Date:   2016-12-08T15:20:05Z
    Make some datetime expressions TimeZoneAwareExpression.

commit b59d902d80a2d33d7778903b45c334333bd00833
Author: Takuya UESHIN
Date:   2016-12-08T17:48:20Z
    Fix compiler error in sql/core.
commit 3ddfae41af5a3046ba966088cdc552d65574d02b
Author: Takuya UESHIN
Date:   2016-12-09T07:38:17Z
    Add constructors without zoneId to TimeZoneAwareExpressions for FunctionRepository.

commit f58f00df14a60afccf84c10c94c96d64bf6c4724
Author: Takuya UESHIN
Date:   2016-12-13T04:01:01Z
    Add DateTimeUtils.threadLocalLocalTimeZone to partition-reltated Cast.

commit 8f2040b1be461d71f919d53462e70af86d6e6ada
Author: Takuya UESHIN
Date:   2016-12-13T07:32:43Z
    Fix timezone for Hive timestamp string.

commit 63c103c2e94f45e0a9af74c48c1af57311ef1586
Author: Takuya UESHIN
Date:   2016-12-13T09:53:30Z
    Use defaultTimeZone instead of threadLocalLocalTimeZone.

commit 7066850286fbd5cd845b38d8b3498531420c8a89
Author: Takuya UESHIN
Date:   2016-12-13T10:08:10Z
    Add TimeZone to DateFormats.

commit 1aaca29122d9996da4a2d60a3e9aae01b89e685e
Author: Takuya UESHIN
Date:   2016-12-14T05:36:33Z
    Make `CurrentBatchTimestamp` `TimeZoneAwareExpression`.

commit e5bb246f8b4cbda13d248a0b916c55817d78a926
Author: Takuya UESHIN
Date:   2016-12-14T10:00:27Z
    Add tests for date functions with session local timezone.