[ https://issues.apache.org/jira/browse/SPARK-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15938096#comment-15938096 ]
Giorgio Massignani edited comment on SPARK-18350 at 3/23/17 11:29 AM:
----------------------------------------------------------------------

I'd like to share what we did to handle Oracle's _TIMESTAMP WITH TIME ZONE_. We would like to upgrade to the latest Spark version, but since nothing has changed in this area, we implemented it on _Spark 1.6.1_ with Scala. In our case we build _StructType_ and _StructField_ objects programmatically, creating DataFrames from RDDs.

The first problem with time zones: how do you send a time zone embedded in a Timestamp column? Our workaround was to create a new type, _TimestampTz_, with a _UserDefinedType_ and a _Kryo_ serialiser:

{code:java}
@SQLUserDefinedType(udt = classOf[TimestampTzUdt])
@DefaultSerializer(classOf[TimestampTzKryo])
class TimestampTz(val time: Long, val timeZoneId: String)
{code}

The second problem: how do you customise what Spark does when it calls _PreparedStatement.setXXX_? This forced us to write a new _DataFrameWriter_, duplicating its code, because it is a _final class_. The resulting _CustomDataFrameWriter_ has to call _JdbcUtils_, which is where the customisation really belongs, so we created a _CustomJdbcUtils_ that proxies _JdbcUtils_ and changes only the place where it calls _PreparedStatement.setTimestamp_:

{code:java}
case TimestampTzUdt =>
  val timestampTz = row.getAs[TimestampTz](i)
  val cal = timestampTz.getCalendar
  stmt.setTimestamp(i + 1, new java.sql.Timestamp(timestampTz.time), cal)
{code}

It would be perfect if the Oracle driver worked as we expected and sent the time zone to the column. However, to make it work we had to call an Oracle-specific class:

{code:java}
case TimestampTzUdt =>
  val timestampTz = row.getAs[TimestampTz](i)
  val cal = timestampTz.getCalendar
  if (isOracle)
    stmt.setObject(i + 1,
      new oracle.sql.TIMESTAMPTZ(conn, new java.sql.Timestamp(timestampTz.time), cal))
  else
    stmt.setTimestamp(i + 1, new java.sql.Timestamp(timestampTz.time), cal)
{code}

In summary, what would we expect from Spark? Some hooks that make it easier to customise Spark SQL for cases like these.
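For reference, here are minimal sketches of the pieces the snippets above leave out. The code calls _timestampTz.getCalendar_, which is never shown; assuming _timeZoneId_ holds a standard Java time-zone ID, the helper could be as simple as this (hypothetical body, not the author's original code):

{code:java}
import java.util.{Calendar, TimeZone}

@SQLUserDefinedType(udt = classOf[TimestampTzUdt])
@DefaultSerializer(classOf[TimestampTzKryo])
class TimestampTz(val time: Long, val timeZoneId: String) {
  // Hypothetical sketch: a Calendar in the column's own zone, passed to
  // setTimestamp/TIMESTAMPTZ so the driver binds the instant in that zone.
  def getCalendar: Calendar =
    Calendar.getInstance(TimeZone.getTimeZone(timeZoneId))
}
{code}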
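The _TimestampTzUdt_ named in _@SQLUserDefinedType_ is not shown either. Against the Spark 1.6 _UserDefinedType_ API, one plausible shape stores the instant and the zone ID side by side (the struct layout and internal-row calls are assumptions, not the author's code):

{code:java}
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
import org.apache.spark.sql.types._
import org.apache.spark.unsafe.types.UTF8String

// Hypothetical sketch of the UDT, assuming the Spark 1.6.x internal-row API.
class TimestampTzUdt extends UserDefinedType[TimestampTz] {
  // Catalyst representation: epoch millis plus the zone ID string.
  override def sqlType: DataType = StructType(Seq(
    StructField("time", LongType, nullable = false),
    StructField("timeZoneId", StringType, nullable = false)))

  override def serialize(obj: Any): Any = obj match {
    case t: TimestampTz =>
      new GenericInternalRow(Array[Any](t.time, UTF8String.fromString(t.timeZoneId)))
  }

  override def deserialize(datum: Any): TimestampTz = datum match {
    case row: InternalRow => new TimestampTz(row.getLong(0), row.getString(1))
  }

  override def userClass: Class[TimestampTz] = classOf[TimestampTz]
}
{code}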
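Similarly for the _TimestampTzKryo_ serialiser named in _@DefaultSerializer_; with the two-field class above, a straightforward Kryo serialiser (again an assumed implementation) only needs to round-trip the two fields:

{code:java}
import com.esotericsoftware.kryo.{Kryo, Serializer}
import com.esotericsoftware.kryo.io.{Input, Output}

// Hypothetical sketch; any serialiser that round-trips both fields works.
class TimestampTzKryo extends Serializer[TimestampTz] {
  // Write the two fields in a fixed order...
  override def write(kryo: Kryo, output: Output, t: TimestampTz): Unit = {
    output.writeLong(t.time)
    output.writeString(t.timeZoneId)
  }

  // ...and read them back in the same order.
  override def read(kryo: Kryo, input: Input, clazz: Class[TimestampTz]): TimestampTz =
    new TimestampTz(input.readLong(), input.readString())
}
{code}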
> Support session local timezone
> ------------------------------
>
>                 Key: SPARK-18350
>                 URL: https://issues.apache.org/jira/browse/SPARK-18350
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Takuya Ueshin
>              Labels: releasenotes
>             Fix For: 2.2.0
>
> As of Spark 2.1, Spark SQL assumes the machine timezone for datetime manipulation, which is bad if users are not in the same timezones as the machines, or if different users have different timezones.
> We should introduce a session local timezone setting that is used for execution.
> An explicit non-goal is locale handling.