[GitHub] spark pull request #19939: [SPARK-20557] [SQL] Only support TIMESTAMP WITH T...

2017-12-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19939


---




[GitHub] spark pull request #19939: [SPARK-20557] [SQL] Only support TIMESTAMP WITH T...

2017-12-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19939#discussion_r156218641
  
--- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala ---
@@ -151,6 +151,8 @@ class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite {
 
   test("SPARK-20557: column type TIMESTAMP with TIME ZONE and TIME with TIME ZONE " +
     "should be recognized") {
+    // When using JDBC to read the columns of TIMESTAMP with TIME ZONE and TIME with TIME ZONE
--- End diff --

That is for the support of `ArrayType`. We can revisit that one in the next PR.
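
For context, a simplified sketch of the `PostgresDialect` mapping being discussed: the `toCatalystType` helper is only reached from the `Types.ARRAY` branch (the Postgres JDBC driver prefixes array element type names with `_`), which is why the scalar code path does not rely on it. This abbreviates the dialect at the time and elides most type names:

    import java.sql.Types
    import org.apache.spark.sql.types._

    // Abbreviated: the real dialect maps many more Postgres type names.
    private def toCatalystType(typeName: String): Option[DataType] = typeName match {
      case "bool" => Some(BooleanType)
      case "timestamp" | "timestamptz" | "time" | "timetz" => Some(TimestampType)
      case _ => None
    }

    override def getCatalystType(
        sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = {
      if (sqlType == Types.ARRAY) {
        // "_timestamptz" -> element type name "timestamptz"
        toCatalystType(typeName.drop(1)).map(ArrayType(_))
      } else {
        None
      }
    }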


---




[GitHub] spark pull request #19939: [SPARK-20557] [SQL] Only support TIMESTAMP WITH T...

2017-12-11 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request:

https://github.com/apache/spark/pull/19939#discussion_r156205533
  
--- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala ---
@@ -151,6 +151,8 @@ class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite {
 
   test("SPARK-20557: column type TIMESTAMP with TIME ZONE and TIME with TIME ZONE " +
     "should be recognized") {
+    // When using JDBC to read the columns of TIMESTAMP with TIME ZONE and TIME with TIME ZONE
--- End diff --

Out of curiosity, why do we not need something similar for postgres: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala#L61?


---




[GitHub] spark pull request #19939: [SPARK-20557] [SQL] Only support TIMESTAMP WITH T...

2017-12-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19939#discussion_r156002148
  
--- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala ---
@@ -235,6 +239,61 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo
     assert(types(1).equals("class java.sql.Timestamp"))
   }
 
+  test("Column type TIMESTAMP with SESSION_LOCAL_TIMEZONE is different from default") {
+    val defaultJVMTimeZone = TimeZone.getDefault
+    // Pick a time zone different from the current default time zone of the JVM
+    val sofiaTimeZone = TimeZone.getTimeZone("Europe/Sofia")
+    val shanghaiTimeZone = TimeZone.getTimeZone("Asia/Shanghai")
+    val localSessionTimeZone =
+      if (defaultJVMTimeZone == shanghaiTimeZone) sofiaTimeZone else shanghaiTimeZone
+
+    withSQLConf(SQLConf.SESSION_LOCAL_TIMEZONE.key -> localSessionTimeZone.getID) {
+      val e = intercept[java.sql.SQLException] {
+        val dfRead = sqlContext.read.jdbc(jdbcUrl, "ts_with_timezone", new Properties)
+        dfRead.collect()
+      }.getMessage
+      assert(e.contains("Unrecognized SQL type -101"))
+    }
+  }
+
+  /**
+   * Changes the JVM time zone to `timeZoneId` before executing `f`, then switches back to the
+   * original after `f` returns.
+   * @param timeZoneId the ID for a TimeZone, either an abbreviation such as "PST", a full name
+   *                   such as "America/Los_Angeles", or a custom ID such as "GMT-8:00".
+   */
+  private def withTimeZone(timeZoneId: String)(f: => Unit): Unit = {
+    val originalTimeZone = TimeZone.getDefault
+    try {
+      // Temporarily override the JVM default time zone
+      TimeZone.setDefault(TimeZone.getTimeZone(timeZoneId))
+      f
+    } finally {
+      TimeZone.setDefault(originalTimeZone)
+    }
+  }
+
+  test("Column TIMESTAMP with TIME ZONE(JVM timezone)") {
+    def checkRow(row: Row, ts: String): Unit = {
+      assert(row.getTimestamp(1).equals(Timestamp.valueOf(ts)))
+    }
+
+    val dfRead = sqlContext.read.jdbc(jdbcUrl, "ts_with_timezone", new Properties)
+    withTimeZone("PST") {
--- End diff --

I feel it's safer if we also set the session local time zone in `withTimeZone`.
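
A minimal sketch of that suggestion, assuming the suite's existing `withSQLConf` helper and imports; this illustrates the idea rather than quoting code from the PR:

    // Pin both the JVM default time zone and the session-local time zone to the
    // same ID; withSQLConf restores the SQL conf on exit, and the finally block
    // restores the JVM default.
    private def withTimeZone(timeZoneId: String)(f: => Unit): Unit = {
      val originalTimeZone = TimeZone.getDefault
      try {
        TimeZone.setDefault(TimeZone.getTimeZone(timeZoneId))
        withSQLConf(SQLConf.SESSION_LOCAL_TIMEZONE.key -> timeZoneId)(f)
      } finally {
        TimeZone.setDefault(originalTimeZone)
      }
    }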


---




[GitHub] spark pull request #19939: [SPARK-20557] [SQL] Only support TIMESTAMP WITH T...

2017-12-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19939#discussion_r156001886
  
--- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala ---
@@ -235,6 +239,61 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo
     assert(types(1).equals("class java.sql.Timestamp"))
   }
 
+  test("Column type TIMESTAMP with SESSION_LOCAL_TIMEZONE is different from default") {
+    val defaultJVMTimeZone = TimeZone.getDefault
+    // Pick a time zone different from the current default time zone of the JVM
+    val sofiaTimeZone = TimeZone.getTimeZone("Europe/Sofia")
+    val shanghaiTimeZone = TimeZone.getTimeZone("Asia/Shanghai")
+    val localSessionTimeZone =
+      if (defaultJVMTimeZone == shanghaiTimeZone) sofiaTimeZone else shanghaiTimeZone
+
+    withSQLConf(SQLConf.SESSION_LOCAL_TIMEZONE.key -> localSessionTimeZone.getID) {
+      val e = intercept[java.sql.SQLException] {
+        val dfRead = sqlContext.read.jdbc(jdbcUrl, "ts_with_timezone", new Properties)
+        dfRead.collect()
+      }.getMessage
+      assert(e.contains("Unrecognized SQL type -101"))
+    }
+  }
+
+  /**
+   * Changes the JVM time zone to `timeZoneId` before executing `f`, then switches back to the
+   * original after `f` returns.
+   * @param timeZoneId the ID for a TimeZone, either an abbreviation such as "PST", a full name
+   *                   such as "America/Los_Angeles", or a custom ID such as "GMT-8:00".
+   */
+  private def withTimeZone(timeZoneId: String)(f: => Unit): Unit = {
+    val originalTimeZone = TimeZone.getDefault
+    try {
+      // Temporarily override the JVM default time zone
+      TimeZone.setDefault(TimeZone.getTimeZone(timeZoneId))
+      f
+    } finally {
+      TimeZone.setDefault(originalTimeZone)
+    }
+  }
+
+  test("Column TIMESTAMP with TIME ZONE(JVM timezone)") {
+    def checkRow(row: Row, ts: String): Unit = {
+      assert(row.getTimestamp(1).equals(Timestamp.valueOf(ts)))
+    }
+
+    val dfRead = sqlContext.read.jdbc(jdbcUrl, "ts_with_timezone", new Properties)
+    withTimeZone("PST") {
--- End diff --

oh i see, session time zone is dynamically evaluated with the JVM timezone if not set, so we need to guarantee that session time zone is not set here.
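
For reference, that fallback comes from the conf's default being declared as a function rather than a captured value; roughly (simplified from `SQLConf`, doc string omitted):

    // Because the default is a function, the JVM default time zone is re-read
    // every time the conf is resolved, instead of being frozen at startup.
    val SESSION_LOCAL_TIMEZONE = buildConf("spark.sql.session.timeZone")
      .stringConf
      .createWithDefaultFunction(() => java.util.TimeZone.getDefault.getID)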


---




[GitHub] spark pull request #19939: [SPARK-20557] [SQL] Only support TIMESTAMP WITH T...

2017-12-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19939#discussion_r156001560
  
--- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala ---
@@ -235,6 +239,61 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo
     assert(types(1).equals("class java.sql.Timestamp"))
   }
 
+  test("Column type TIMESTAMP with SESSION_LOCAL_TIMEZONE is different from default") {
+    val defaultJVMTimeZone = TimeZone.getDefault
+    // Pick a time zone different from the current default time zone of the JVM
+    val sofiaTimeZone = TimeZone.getTimeZone("Europe/Sofia")
+    val shanghaiTimeZone = TimeZone.getTimeZone("Asia/Shanghai")
+    val localSessionTimeZone =
+      if (defaultJVMTimeZone == shanghaiTimeZone) sofiaTimeZone else shanghaiTimeZone
+
+    withSQLConf(SQLConf.SESSION_LOCAL_TIMEZONE.key -> localSessionTimeZone.getID) {
+      val e = intercept[java.sql.SQLException] {
+        val dfRead = sqlContext.read.jdbc(jdbcUrl, "ts_with_timezone", new Properties)
+        dfRead.collect()
+      }.getMessage
+      assert(e.contains("Unrecognized SQL type -101"))
+    }
+  }
+
+  /**
+   * Changes the JVM time zone to `timeZoneId` before executing `f`, then switches back to the
+   * original after `f` returns.
+   * @param timeZoneId the ID for a TimeZone, either an abbreviation such as "PST", a full name
+   *                   such as "America/Los_Angeles", or a custom ID such as "GMT-8:00".
+   */
+  private def withTimeZone(timeZoneId: String)(f: => Unit): Unit = {
+    val originalTimeZone = TimeZone.getDefault
+    try {
+      // Temporarily override the JVM default time zone
+      TimeZone.setDefault(TimeZone.getTimeZone(timeZoneId))
+      f
+    } finally {
+      TimeZone.setDefault(originalTimeZone)
+    }
+  }
+
+  test("Column TIMESTAMP with TIME ZONE(JVM timezone)") {
+    def checkRow(row: Row, ts: String): Unit = {
+      assert(row.getTimestamp(1).equals(Timestamp.valueOf(ts)))
+    }
+
+    val dfRead = sqlContext.read.jdbc(jdbcUrl, "ts_with_timezone", new Properties)
+    withTimeZone("PST") {
--- End diff --

How do you make sure the session time zone is the same as the JVM time zone?


---




[GitHub] spark pull request #19939: [SPARK-20557] [SQL] Only support TIMESTAMP WITH T...

2017-12-09 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/19939

[SPARK-20557] [SQL] Only support TIMESTAMP WITH TIME ZONE for Oracle Dialect

## What changes were proposed in this pull request?
In the previous PRs, https://github.com/apache/spark/pull/17832 and https://github.com/apache/spark/pull/17835, we converted `TIMESTAMP WITH TIME ZONE` and `TIME WITH TIME ZONE` to `TIMESTAMP` for all the JDBC sources. However, this conversion could be risky, since it does not respect our SQL configuration `spark.sql.session.timeZone`.

In addition, each vendor might have different semantics for these two types. For example, Postgres simply reports the `TIMESTAMP` type for `TIMESTAMP WITH TIME ZONE`. Such support should therefore be added case by case. This PR reverts the general support of `TIMESTAMP WITH TIME ZONE` and `TIME WITH TIME ZONE` for JDBC sources, except for the Oracle dialect.

For Oracle's `TIMESTAMP WITH TIME ZONE`, we only support it when the JVM default time zone is the same as the user-specified configuration `spark.sql.session.timeZone`. We still treat `TIMESTAMP WITH TIME ZONE` as `TIMESTAMP` when fetching values via the Oracle JDBC connector, whose client converts timestamp-with-time-zone values to plain timestamps using the JVM default time zone (a test case is added to `OracleIntegrationSuite.scala` in this PR to show this behavior). Thus, to avoid any future behavior change, we do not support it when the JVM default time zone differs from `spark.sql.session.timeZone`.
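
Roughly, the guard described above looks like the following in the Oracle dialect; the helper name and the way the session time zone is looked up are illustrative here, not necessarily the PR's exact code:

    // Oracle reports TIMESTAMP WITH TIME ZONE with the vendor-specific type
    // code -101, which is also where the "Unrecognized SQL type -101" error
    // comes from when the type is not mapped.
    private[jdbc] val TIMESTAMP_TZ = -101

    override def getCatalystType(
        sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] =
      sqlType match {
        // Map the type only when the session time zone matches the JVM default,
        // because the Oracle JDBC client renders the value in the JVM default zone.
        case TIMESTAMP_TZ if sessionTimeZoneMatchesJvmDefault => Some(TimestampType)
        case _ => None
      }

    // Illustrative helper: compare the configured session time zone with the JVM default.
    private def sessionTimeZoneMatchesJvmDefault: Boolean = {
      val sessionTz = java.util.TimeZone.getTimeZone(SQLConf.get.sessionLocalTimeZone)
      sessionTz == java.util.TimeZone.getDefault
    }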

There is no regression, because the previous two PRs were only merged into the unreleased master branch.

## How was this patch tested?
Added test cases.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark timezoneUpdate

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19939.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19939


commit 69c6e3ef97b6450da5250171d47cf6c7f0faa9ff
Author: gatorsmile 
Date:   2017-12-10T07:06:37Z

fix




---
