[GitHub] spark pull request #16308: [SPARK-18350][SQL][WIP] Support session local tim...

2016-12-18 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16308#discussion_r92982685
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
@@ -111,7 +112,8 @@ case class CatalogTablePartition(
*/
   def toRow(partitionSchema: StructType): InternalRow = {
 InternalRow.fromSeq(partitionSchema.map { field =>
-  Cast(Literal(spec(field.name)), field.dataType).eval()
+  Cast(Literal(spec(field.name)), field.dataType,
+DateTimeUtils.defaultTimeZone().getID).eval()
--- End diff --

Hmm, now I think we should use the timezone setting for partition values, because the values are also part of the data, so they should be affected by the setting.
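
For context, a minimal JVM-level sketch (illustrative only, not the PR's code path) of why the choice matters: parsing the same partition value string under two different time zones yields two different instants.

```scala
import java.text.SimpleDateFormat
import java.util.TimeZone

object PartitionValueZoneDemo {
  def main(args: Array[String]): Unit = {
    // A hypothetical partition value as it would appear in the partition spec.
    val value = "2016-12-18 00:00:00"

    def parseIn(zone: String): Long = {
      val fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
      fmt.setTimeZone(TimeZone.getTimeZone(zone))
      fmt.parse(value).getTime
    }

    // Same string, two zones, two different epoch values (9 hours apart).
    println(parseIn("UTC"))         // 1482019200000
    println(parseIn("Asia/Tokyo"))  // 1481986800000
  }
}
```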





[GitHub] spark pull request #16308: [SPARK-18350][SQL][WIP] Support session local tim...

2016-12-18 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16308#discussion_r92973560
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
@@ -111,7 +112,8 @@ case class CatalogTablePartition(
*/
   def toRow(partitionSchema: StructType): InternalRow = {
 InternalRow.fromSeq(partitionSchema.map { field =>
-  Cast(Literal(spec(field.name)), field.dataType).eval()
+  Cast(Literal(spec(field.name)), field.dataType,
+DateTimeUtils.defaultTimeZone().getID).eval()
--- End diff --

Currently the behavior doesn't change with the timezone setting, i.e. the system timezone is always used.

This is the part I wasn't sure how to handle: should partition values use the timezone setting or the system timezone? Should we use the timezone setting?





[GitHub] spark pull request #16308: [SPARK-18350][SQL][WIP] Support session local tim...

2016-12-18 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16308#discussion_r92961639
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -135,6 +146,14 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
 
   override def nullable: Boolean = Cast.forceNullable(child.dataType, dataType) || child.nullable
 
+  override def timeZoneResolved: Boolean =
+(!(childrenResolved && Cast.needTimeZone(child.dataType, dataType))) || super.timeZoneResolved
+
+  override lazy val resolved: Boolean =
+childrenResolved && checkInputDataTypes().isSuccess && timeZoneResolved
+
+  override def withTimeZone(zoneId: String): TimeZoneAwareExpression = copy(zoneId = zoneId)
--- End diff --

Yes, this is essentially a copy constructor, but the analyzer rule `ResolveTimeZone` can't call `copy` directly because it doesn't know the concrete expression class.
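
To illustrate the point, here is a minimal sketch (assumptions: the trait lives in `org.apache.spark.sql.catalyst.expressions` as in the quoted diff, and the session zone is handed in as a plain string; this is not the PR's exact `ResolveTimeZone`) of a rule that matches on the trait and lets `withTimeZone` do the class-specific rebuild:

```scala
import org.apache.spark.sql.catalyst.expressions.TimeZoneAwareExpression
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Sketch only: `sessionZoneId` stands in for however the session config exposes the zone.
class ResolveTimeZoneSketch(sessionZoneId: String) extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    // Match on the trait rather than a concrete class; each case class implements
    // withTimeZone as `copy(zoneId = ...)`, so the rule never needs to know the class.
    case e: TimeZoneAwareExpression if !e.timeZoneResolved =>
      e.withTimeZone(sessionZoneId)
  }
}
```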





[GitHub] spark pull request #16308: [SPARK-18350][SQL][WIP] Support session local tim...

2016-12-18 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16308#discussion_r92962066
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -120,7 +128,10 @@ object Cast {
   > SELECT _FUNC_('10' as int);
10
   """)
-case class Cast(child: Expression, dataType: DataType) extends UnaryExpression with NullIntolerant {
+case class Cast(child: Expression, dataType: DataType, zoneId: String = null)
--- End diff --

I agree that an extra unapply is a bad idea. I'll leave it as it is for now.
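
For reference, the kind of extra extractor being discussed would look roughly like the hypothetical sketch below (not something the PR adds); the concern is that a rewrite matching through it and rebuilding the `Cast` would silently reset `zoneId`:

```scala
import org.apache.spark.sql.catalyst.expressions.{Cast, Expression}
import org.apache.spark.sql.types.DataType

// Hypothetical extractor that deliberately ignores zoneId.
object CastIgnoringZone {
  def unapply(c: Cast): Option[(Expression, DataType)] = Some((c.child, c.dataType))
}

// A rule written against it could then lose the zone without any compiler warning:
//   case CastIgnoringZone(child, dt) => Cast(child, dt)   // zoneId falls back to null
```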





[GitHub] spark pull request #16308: [SPARK-18350][SQL][WIP] Support session local tim...

2016-12-18 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16308#discussion_r92961818
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -110,6 +110,14 @@ object Cast {
 case (_: FractionalType, _: IntegralType) => true  // NaN, infinity
 case _ => false
   }
+
+  def needTimeZone(from: DataType, to: DataType): Boolean = (from, to) match {
--- End diff --

I see, I'll add documentation.
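
For example, the documentation could spell out when a cast is zone-dependent. The case list below is only an illustration of the idea, not the actual rules added in `Cast.scala`:

```scala
import org.apache.spark.sql.types._

object NeedTimeZoneSketch {
  /**
   * Returns true iff casting `from` to `to` needs a time zone, i.e. the result depends
   * on the zone used to parse or render date/time values. (Illustrative sketch only.)
   */
  def needTimeZone(from: DataType, to: DataType): Boolean = (from, to) match {
    case (StringType, TimestampType) | (TimestampType, StringType) => true // parse/format uses a zone
    case (DateType, TimestampType) | (TimestampType, DateType) => true     // day boundaries are zone-dependent
    case _ => false
  }
}
```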





[GitHub] spark pull request #16308: [SPARK-18350][SQL][WIP] Support session local tim...

2016-12-18 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16308#discussion_r92962327
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
@@ -30,21 +30,44 @@ import org.apache.spark.sql.types._
 import org.apache.spark.unsafe.types.{CalendarInterval, UTF8String}
 
 /**
+ * Common base class for time zone aware expressions.
+ */
+trait TimeZoneAwareExpression extends Expression {
+
+  def zoneId: String
+
+  def timeZoneResolved: Boolean = zoneId != null
+
+  def withTimeZone(zoneId: String): TimeZoneAwareExpression
+
+  def timeZone: TimeZone = TimeZone.getTimeZone(zoneId)
--- End diff --

You are right. I'll use `lazy val`.
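
The agreed change amounts to something like the sketch below (field set taken from the quoted diff; the `lazy val` defers and caches the lookup until `zoneId` has been filled in by the analyzer):

```scala
import java.util.TimeZone
import org.apache.spark.sql.catalyst.expressions.Expression

// Sketch of the trait with the TimeZone lookup memoized per expression instance.
trait TimeZoneAwareExpressionSketch extends Expression {
  def zoneId: String
  def timeZoneResolved: Boolean = zoneId != null
  def withTimeZone(zoneId: String): TimeZoneAwareExpressionSketch

  // Computed once on first access, after the analyzer has set zoneId,
  // instead of calling TimeZone.getTimeZone on every row in the interpreted path.
  lazy val timeZone: TimeZone = TimeZone.getTimeZone(zoneId)
}
```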





[GitHub] spark pull request #16308: [SPARK-18350][SQL][WIP] Support session local tim...

2016-12-18 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16308#discussion_r92962286
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
@@ -30,21 +30,44 @@ import org.apache.spark.sql.types._
 import org.apache.spark.unsafe.types.{CalendarInterval, UTF8String}
 
 /**
+ * Common base class for time zone aware expressions.
+ */
+trait TimeZoneAwareExpression extends Expression {
+
+  def zoneId: String
--- End diff --

Yes, I wanted to avoid a bunch of gets.
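
The trade-off being referred to, as a quick standalone sketch (names are illustrative): with `Option[String]` every consumer has to unwrap, while `String = null` keeps call sites short at the cost of a nullable field.

```scala
import java.util.TimeZone

// Option-based shape: explicit about absence, but every use site unwraps.
case class WithOption(zoneId: Option[String] = None) {
  def timeZone: TimeZone = TimeZone.getTimeZone(zoneId.get)
}

// null-based shape (what the PR's zoneId field uses): no unwrapping at call sites.
case class WithNull(zoneId: String = null) {
  def timeZone: TimeZone = TimeZone.getTimeZone(zoneId)
}
```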





[GitHub] spark pull request #16308: [SPARK-18350][SQL][WIP] Support session local tim...

2016-12-18 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16308#discussion_r92962120
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala ---
@@ -41,13 +43,16 @@ object ReplaceExpressions extends Rule[LogicalPlan] {
  */
 object ComputeCurrentTime extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = {
-val dateExpr = CurrentDate()
+val currentDates = mutable.Map.empty[String, Literal]
 val timeExpr = CurrentTimestamp()
-val currentDate = Literal.create(dateExpr.eval(EmptyRow), dateExpr.dataType)
 val currentTime = Literal.create(timeExpr.eval(EmptyRow), timeExpr.dataType)
 
 plan transformAllExpressions {
-  case CurrentDate() => currentDate
+  case CurrentDate(tz) =>
+currentDates.getOrElseUpdate(tz, {
+  val dateExpr = CurrentDate(tz)
+  Literal.create(dateExpr.eval(EmptyRow), dateExpr.dataType)
--- End diff --

Good catch, I'll modify this.
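
For readers following along, the per-zone memoization under discussion looks roughly like the sketch below (it mirrors the quoted diff but reuses the matched expression instead of constructing a new `CurrentDate`; the actual follow-up fix may differ):

```scala
import scala.collection.mutable

import org.apache.spark.sql.catalyst.expressions.{CurrentDate, CurrentTimestamp, EmptyRow, Literal}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

object ComputeCurrentTimeSketch extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = {
    // One Literal per distinct time zone, so every CurrentDate that shares a zone
    // within the query also shares exactly the same evaluated value.
    val currentDates = mutable.Map.empty[String, Literal]
    val timeExpr = CurrentTimestamp()
    val currentTime = Literal.create(timeExpr.eval(EmptyRow), timeExpr.dataType)

    plan transformAllExpressions {
      case d @ CurrentDate(tz) =>
        currentDates.getOrElseUpdate(tz, Literal.create(d.eval(EmptyRow), d.dataType))
      case CurrentTimestamp() => currentTime
    }
  }
}
```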





[GitHub] spark pull request #16308: [SPARK-18350][SQL][WIP] Support session local tim...

2016-12-16 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16308#discussion_r92916070
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
@@ -30,21 +30,44 @@ import org.apache.spark.sql.types._
 import org.apache.spark.unsafe.types.{CalendarInterval, UTF8String}
 
 /**
+ * Common base class for time zone aware expressions.
+ */
+trait TimeZoneAwareExpression extends Expression {
+
+  def zoneId: String
+
+  def timeZoneResolved: Boolean = zoneId != null
+
+  def withTimeZone(zoneId: String): TimeZoneAwareExpression
+
+  def timeZone: TimeZone = TimeZone.getTimeZone(zoneId)
--- End diff --

Should this be a `lazy val`? Otherwise it is pretty expensive to keep creating a new `TimeZone` object per row in the interpreted path.






[GitHub] spark pull request #16308: [SPARK-18350][SQL][WIP] Support session local tim...

2016-12-16 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16308#discussion_r92915894
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -120,7 +128,10 @@ object Cast {
   > SELECT _FUNC_('10' as int);
10
   """)
-case class Cast(child: Expression, dataType: DataType) extends UnaryExpression with NullIntolerant {
+case class Cast(child: Expression, dataType: DataType, zoneId: String = null)
--- End diff --

On second thought, an extra unapply is probably a bad idea, since we could then miss a pattern match.






[GitHub] spark pull request #16308: [SPARK-18350][SQL][WIP] Support session local tim...

2016-12-16 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16308#discussion_r92915851
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
@@ -111,7 +112,8 @@ case class CatalogTablePartition(
*/
   def toRow(partitionSchema: StructType): InternalRow = {
 InternalRow.fromSeq(partitionSchema.map { field =>
-  Cast(Literal(spec(field.name)), field.dataType).eval()
+  Cast(Literal(spec(field.name)), field.dataType,
+DateTimeUtils.defaultTimeZone().getID).eval()
--- End diff --

Could this change how we interpret partition values when the timezone setting changes?






[GitHub] spark pull request #16308: [SPARK-18350][SQL][WIP] Support session local tim...

2016-12-16 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16308#discussion_r92915771
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -120,7 +128,10 @@ object Cast {
   > SELECT _FUNC_('10' as int);
10
   """)
-case class Cast(child: Expression, dataType: DataType) extends UnaryExpression with NullIntolerant {
+case class Cast(child: Expression, dataType: DataType, zoneId: String = null)
--- End diff --

Not 100% sure whether this is a good idea, but should we consider adding a `Cast.unapply` that does not match on `zoneId`?





[GitHub] spark pull request #16308: [SPARK-18350][SQL][WIP] Support session local tim...

2016-12-16 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16308#discussion_r92915757
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -110,6 +110,14 @@ object Cast {
 case (_: FractionalType, _: IntegralType) => true  // NaN, infinity
 case _ => false
   }
+
+  def needTimeZone(from: DataType, to: DataType): Boolean = (from, to) match {
--- End diff --

I think it's important to document this...







[GitHub] spark pull request #16308: [SPARK-18350][SQL][WIP] Support session local tim...

2016-12-16 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16308#discussion_r92915708
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
@@ -30,21 +30,44 @@ import org.apache.spark.sql.types._
 import org.apache.spark.unsafe.types.{CalendarInterval, UTF8String}
 
 /**
+ * Common base class for time zone aware expressions.
+ */
+trait TimeZoneAwareExpression extends Expression {
+
+  def zoneId: String
--- End diff --

Is the reason you are using `null` rather than `Option` to avoid a bunch of `.get` calls?






[GitHub] spark pull request #16308: [SPARK-18350][SQL][WIP] Support session local tim...

2016-12-16 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/16308

[SPARK-18350][SQL][WIP] Support session local timezone.

## What changes were proposed in this pull request?

As of Spark 2.1, Spark SQL assumes the machine timezone for datetime 
manipulation, which is bad if users are not in the same timezones as the 
machines, or if different users have different timezones.

We should introduce a session local timezone setting that is used for 
execution.

An explicit non-goal is locale handling.
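
A hypothetical end-user view of the setting (the conf key and the way it is exposed are assumptions for illustration; this WIP PR may surface it differently):

```scala
import org.apache.spark.sql.SparkSession

object SessionTimeZoneDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("tz-demo").getOrCreate()

    // Assumed conf key for the session-local time zone; not confirmed by this PR.
    spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")

    // Timestamp parsing/rendering would then follow the session zone
    // rather than the JVM default zone.
    spark.sql("SELECT cast('2016-12-18 10:00:00' as timestamp) AS ts").show(truncate = false)

    spark.stop()
  }
}
```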

### Design of the fix

I introduced an analyzer rule to pass the session local timezone to timezone-aware expressions, and modified `DateTimeUtils` to take a timezone argument.

## How was this patch tested?

Existing tests, plus new tests for timezone-aware expressions.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-18350

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16308.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16308


commit c48a70da0f06ebde001091fe765f787382c7409e
Author: Takuya UESHIN 
Date:   2016-12-06T06:41:39Z

Prepare for session local timezone support.

commit 1d21fec497147faa1eabcfc31e524abc102c67c5
Author: Takuya UESHIN 
Date:   2016-12-06T13:43:26Z

Make Cast TimeZoneAwareExpression.

commit 0763c8f67e5e13045d6958b45ffe3d4e0efb0a63
Author: Takuya UESHIN 
Date:   2016-12-06T14:37:47Z

Fix DateTimeUtilsSuite to follow changes.

commit 449d93d416d225a323a18dbffc965dca11c3b55d
Author: Takuya UESHIN 
Date:   2016-12-08T15:20:05Z

Make some datetime expressions TimeZoneAwareExpression.

commit b59d902d80a2d33d7778903b45c334333bd00833
Author: Takuya UESHIN 
Date:   2016-12-08T17:48:20Z

Fix compiler error in sql/core.

commit 3ddfae41af5a3046ba966088cdc552d65574d02b
Author: Takuya UESHIN 
Date:   2016-12-09T07:38:17Z

Add constructors without zoneId to TimeZoneAwareExpressions for 
FunctionRepository.

commit f58f00df14a60afccf84c10c94c96d64bf6c4724
Author: Takuya UESHIN 
Date:   2016-12-13T04:01:01Z

Add DateTimeUtils.threadLocalLocalTimeZone to partition-related Cast.

commit 8f2040b1be461d71f919d53462e70af86d6e6ada
Author: Takuya UESHIN 
Date:   2016-12-13T07:32:43Z

Fix timezone for Hive timestamp string.

commit 63c103c2e94f45e0a9af74c48c1af57311ef1586
Author: Takuya UESHIN 
Date:   2016-12-13T09:53:30Z

Use defaultTimeZone instead of threadLocalLocalTimeZone.

commit 7066850286fbd5cd845b38d8b3498531420c8a89
Author: Takuya UESHIN 
Date:   2016-12-13T10:08:10Z

Add TimeZone to DateFormats.

commit 1aaca29122d9996da4a2d60a3e9aae01b89e685e
Author: Takuya UESHIN 
Date:   2016-12-14T05:36:33Z

Make `CurrentBatchTimestamp` `TimeZoneAwareExpression`.

commit e5bb246f8b4cbda13d248a0b916c55817d78a926
Author: Takuya UESHIN 
Date:   2016-12-14T10:00:27Z

Add tests for date functions with session local timezone.



