[ https://issues.apache.org/jira/browse/SPARK-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15938096#comment-15938096 ]
Giorgio Massignani edited comment on SPARK-18350 at 3/23/17 11:29 AM:
----------------------------------------------------------------------

I'd like to share what we did to handle Oracle's _TIMESTAMP WITH TIME ZONE_. We would like to upgrade to the latest Spark version, but since nothing has changed in this area, we implemented it on _Spark 1.6.1_ with Scala. In our case we build _StructType_ and _StructField_ objects programmatically, creating DataFrames from RDDs.

The first problem with time zones: how do you send a time zone embedded in a Timestamp column? Our workaround was to create a new type, _TimestampTz_, with a _UserDefinedType_ and a _Kryo_ serialiser:

{code:java}
@SQLUserDefinedType(udt = classOf[TimestampTzUdt])
@DefaultSerializer(classOf[TimestampTzKryo])
class TimestampTz(val time: Long, val timeZoneId: String)
{code}

The second problem: how do you customise what Spark does when it calls _PreparedStatement.setXXX_? This forced us to write a new _DataFrameWriter_, duplicating its code, because it is a _final class_. The resulting _CustomDataFrameWriter_ has to call _JdbcUtils_, which is where the customisation really belongs, so we created a _CustomJdbcUtils_ that proxies _JdbcUtils_ and changes only the place where it calls _PreparedStatement.setTimestamp_:

{code:java}
case TimestampTzUdt =>
  val timestampTz = row.getAs[TimestampTz](i)
  val cal = timestampTz.getCalendar
  stmt.setTimestamp(i + 1, new java.sql.Timestamp(timestampTz.time), cal)
{code}

It would be perfect if the Oracle driver worked as we expected and sent the time zone to the column. However, to make it work we had to call an Oracle-specific class:

{code:java}
case TimestampTzUdt =>
  val timestampTz = row.getAs[TimestampTz](i)
  val cal = timestampTz.getCalendar
  if (isOracle)
    stmt.setObject(i + 1,
      new oracle.sql.TIMESTAMPTZ(conn, new java.sql.Timestamp(timestampTz.time), cal))
  else
    stmt.setTimestamp(i + 1, new java.sql.Timestamp(timestampTz.time), cal)
{code}

In summary, what would we expect from Spark? Some hooks that make it easier to customise Spark SQL for cases like these.
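For reference, here are minimal sketches of the pieces the snippets above leave out. The code calls _timestampTz.getCalendar_, which is never shown; assuming _timeZoneId_ holds a standard Java time-zone ID, the helper could be as simple as this (hypothetical body, not the author's original code):

{code:java}
import java.util.{Calendar, TimeZone}

@SQLUserDefinedType(udt = classOf[TimestampTzUdt])
@DefaultSerializer(classOf[TimestampTzKryo])
class TimestampTz(val time: Long, val timeZoneId: String) {
  // Hypothetical sketch: a Calendar in the column's own zone, passed to
  // setTimestamp/TIMESTAMPTZ so the driver binds the instant in that zone.
  def getCalendar: Calendar =
    Calendar.getInstance(TimeZone.getTimeZone(timeZoneId))
}
{code}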
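The _TimestampTzUdt_ named in _@SQLUserDefinedType_ is not shown either. Against the Spark 1.6 _UserDefinedType_ API, one plausible shape stores the instant and the zone ID side by side (the struct layout and internal-row calls are assumptions, not the author's code):

{code:java}
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
import org.apache.spark.sql.types._
import org.apache.spark.unsafe.types.UTF8String

// Hypothetical sketch of the UDT, assuming the Spark 1.6.x internal-row API.
class TimestampTzUdt extends UserDefinedType[TimestampTz] {
  // Catalyst representation: epoch millis plus the zone ID string.
  override def sqlType: DataType = StructType(Seq(
    StructField("time", LongType, nullable = false),
    StructField("timeZoneId", StringType, nullable = false)))

  override def serialize(obj: Any): Any = obj match {
    case t: TimestampTz =>
      new GenericInternalRow(Array[Any](t.time, UTF8String.fromString(t.timeZoneId)))
  }

  override def deserialize(datum: Any): TimestampTz = datum match {
    case row: InternalRow => new TimestampTz(row.getLong(0), row.getString(1))
  }

  override def userClass: Class[TimestampTz] = classOf[TimestampTz]
}
{code}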
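Similarly for the _TimestampTzKryo_ serialiser named in _@DefaultSerializer_; with the two-field class above, a straightforward Kryo serialiser (again an assumed implementation) only needs to round-trip the two fields:

{code:java}
import com.esotericsoftware.kryo.{Kryo, Serializer}
import com.esotericsoftware.kryo.io.{Input, Output}

// Hypothetical sketch; any serialiser that round-trips both fields works.
class TimestampTzKryo extends Serializer[TimestampTz] {
  // Write the two fields in a fixed order...
  override def write(kryo: Kryo, output: Output, t: TimestampTz): Unit = {
    output.writeLong(t.time)
    output.writeString(t.timeZoneId)
  }

  // ...and read them back in the same order.
  override def read(kryo: Kryo, input: Input, clazz: Class[TimestampTz]): TimestampTz =
    new TimestampTz(input.readLong(), input.readString())
}
{code}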
> Support session local timezone
> ------------------------------
>
>                 Key: SPARK-18350
>                 URL: https://issues.apache.org/jira/browse/SPARK-18350
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Takuya Ueshin
>              Labels: releasenotes
>             Fix For: 2.2.0
>
> As of Spark 2.1, Spark SQL assumes the machine timezone for datetime manipulation, which is bad if users are not in the same timezones as the machines, or if different users have different timezones.
> We should introduce a session local timezone setting that is used for execution.
> An explicit non-goal is locale handling.