[jira] [Comment Edited] (SPARK-18350) Support session local timezone

2017-10-12 Thread Alexandre Dupriez (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16202322#comment-16202322
 ] 

Alexandre Dupriez edited comment on SPARK-18350 at 10/12/17 5:30 PM:
-

Hello all,

I have a use case where a {{Dataset}} contains a column of type 
{{java.sql.Timestamp}} (let's call it {{_time}}) which I am using to derive new 
columns with the year, month, day and hour specified by the {{_time}} column, 
with something like:
{code:java}
session.read.schema(mySchema)
  .json(path)
  .withColumn("year", year($"_time"))
  .withColumn("month", month($"_time"))
  .withColumn("day", dayofmonth($"_time"))
  .withColumn("hour", hour($"_time"))
{code}
using the standard {{year}}, {{month}}, {{dayofmonth}} and {{hour}} functions 
defined in {{org.apache.spark.sql.functions}}.

Now let's assume the timezone is row-dependent - and let's call {{_tz}} the column which contains it. Because the timezone varies at the row level, I cannot configure the {{DataFrameReader}} with a {{timeZone}} option.
I wondered if something like this would be advisable:
{code:java}
session.read.schema(mySchema)
  .json(path)
  .withColumn("year", year($"_time"))
  .withColumn("month", month($"_time"))
  .withColumn("day", dayofmonth($"_time"))
  .withColumn("hour", hour($"_time", $"_tz"))
{code}
Having a look at the definition of the {{hour}} function, it uses an {{Hour}} expression which can be constructed with an optional {{timeZoneId}}.
I have been trying to create an {{Hour}} expression directly, but it is a Spark-internal construct and the API forbids using it.
I guess providing a function {{hour(t: Column, tz: Column)}} alongside the existing {{hour(t: Column)}} would not be a satisfying design.

Do you think an elegant solution exists for this use case? Or is the methodology I use flawed - i.e. should I not derive the hour from a timestamp column when it relies on a non-predefined, row-dependent time zone like this?
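
As a possible workaround - a minimal sketch, not part of the Spark API, assuming {{df}} is the {{Dataset}} read above and {{_tz}} holds IANA zone ids such as "Europe/Paris" - the per-row shift can be done with a UDF:
{code:java}
import java.time.{Instant, ZoneId}
import org.apache.spark.sql.functions.udf

// Hypothetical helper: applies the row's zone id to the instant and
// extracts the local hour, sidestepping the internal Hour expression.
val hourInTz = udf { (ts: java.sql.Timestamp, tz: String) =>
  Instant.ofEpochMilli(ts.getTime).atZone(ZoneId.of(tz)).getHour
}

df.withColumn("hour", hourInTz($"_time", $"_tz"))
{code}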


> Support session local timezone
> --
>
> Key: SPARK-18350
> URL: https://issues.apache.org/jira/browse/SPARK-18350
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Takuya Ueshin
>  Labels: releasenotes
> Fix For: 2.2.0
>
> Attachments: sample.csv
>
>
> As of Spark 2.1, Spark SQL assumes the machine timezone for datetime 
> manipulation, which is bad if users are not in the same timezones as the 
> machines, or if different users have different timezones.
> We should introduce a session local timezone setting that is used for 
> execution.
> An explicit non-goal is locale handling.



[jira] [Comment Edited] (SPARK-18350) Support session local timezone

2017-08-30 Thread Vinayak (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136567#comment-16136567
 ] 

Vinayak edited comment on SPARK-18350 at 8/30/17 12:56 PM:
---

[~ueshin]  
I have set the value below to set the timezone to UTC, but Spark still adds the current local timezone offset even though the input is already in UTC:

{code:java}
spark.conf.set("spark.sql.session.timeZone", "UTC")
{code}

Please find the attached CSV data for reference.

Expected: the time should remain the same as the input, since it is already in UTC.

{code:java}
var df1 = spark.read
  .option("delimiter", ",")
  .option("qualifier", "\"")
  .option("inferSchema", "true")
  .option("header", "true")
  .option("mode", "PERMISSIVE")
  .option("timestampFormat", "MM/dd/yyyy'T'HH:mm:ss.SSS")
  .option("dateFormat", "MM/dd/yyyy'T'HH:mm:ss")
  .csv("DateSpark.csv")
// df1: org.apache.spark.sql.DataFrame = [Name: string, Age: int ... 5 more fields]

df1.show(false)
{code}
{noformat}
+----+---+----+-------------------+-------------------+----------------------+-------------------+
|Name|Age|Add |Date               |SparkDate          |SparkDate1            |SparkDate2         |
+----+---+----+-------------------+-------------------+----------------------+-------------------+
|abc |21 |bvxc|04/22/2017T03:30:02|2017-03-21 03:30:02|2017-03-21 09:00:02.02|2017-03-21 05:30:00|
+----+---+----+-------------------+-------------------+----------------------+-------------------+
{noformat}
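
To localise where the shift happens - a hedged debugging sketch, not part of the original report: epoch seconds are timezone-independent, so comparing them with the rendered strings separates a parsing shift from a display shift.
{code:java}
// If the epoch seconds are already wrong, the value was shifted at
// parse time; if they are right but the string is off, it is rendering.
df1.selectExpr("SparkDate", "cast(SparkDate as long) AS epochSeconds").show(false)
{code}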





[jira] [Comment Edited] (SPARK-18350) Support session local timezone

2017-03-23 Thread Giorgio Massignani (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15938096#comment-15938096
 ] 

Giorgio Massignani edited comment on SPARK-18350 at 3/23/17 11:29 AM:
--

I'd like to share what we did to handle the Oracle _TIMESTAMP WITH TIME ZONE_ type.
We are looking to upgrade to the latest Spark version, but since nothing has changed in this area, we did it in _Spark 1.6.1_ with Scala.

In our case, we are creating _StructType_ and _StructField_ programmatically, building DataFrames from RDDs.

The first problem with time zones: how do you send a timezone embedded in a timestamp column?

Our workaround was to create a new type _TimestampTz_ with a _UserDefinedType_ and a _Kryo_ serialiser.
{code:java}
@SQLUserDefinedType(udt = classOf[TimestampTzUdt])
@DefaultSerializer(classOf[TimestampTzKryo])
class TimestampTz(val time: Long, val timeZoneId: String)
{code}
The second problem: how do you customise Spark where it calls _PreparedStatement.setXXX_?

This forced us to create a new _DataFrameWriter_, duplicating its code, because it is a _final class_.

The _CustomDataFrameWriter_ in turn calls _JdbcUtils_, which is where the customisation should be done.

We created a _CustomJdbcUtils_, a proxy of _JdbcUtils_, changed only where it calls _PreparedStatement.setTimestamp_:
{code:java}
case TimestampTzUdt =>
  val timestampTz = row.getAs[TimestampTz](i)
  val cal = timestampTz.getCalendar
  stmt.setTimestamp(i + 1, new java.sql.Timestamp(timestampTz.time), cal)
{code}
It would be perfect if the Oracle driver worked as we expected and sent the timezone to the column.

However, to make it work, we need to call an Oracle-specific class:
{code:java}
case TimestampTzUdt =>
  val timestampTz = row.getAs[TimestampTz](i)
  val cal = timestampTz.getCalendar
  if (isOracle)
    stmt.setObject(i + 1, new oracle.sql.TIMESTAMPTZ(conn,
      new java.sql.Timestamp(timestampTz.time), cal))
  else
    stmt.setTimestamp(i + 1, new java.sql.Timestamp(timestampTz.time), cal)
{code}


To summarise, what do we expect from Spark? Some extension points that make it easier to customise Spark SQL for cases like these.
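
For context, the {{getCalendar}} helper used above is not shown in this thread; a minimal sketch of what it plausibly looks like as a method on {{TimestampTz}} (an assumption, not the author's actual code):
{code:java}
import java.util.{Calendar, TimeZone}

// Hypothetical method inside TimestampTz: builds a Calendar carrying
// the stored zone so the JDBC driver applies it to the timestamp.
def getCalendar: Calendar =
  Calendar.getInstance(TimeZone.getTimeZone(timeZoneId))
{code}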


[jira] [Comment Edited] (SPARK-18350) Support session local timezone

2016-11-08 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15649858#comment-15649858
 ] 

Xiao Li edited comment on SPARK-18350 at 11/9/16 5:26 AM:
--

The items below might be needed if we want to support a session timezone:

- Add a SQL statement and API to set the current session timezone?
Link: 
https://docs.oracle.com/cd/B19306_01/server.102/b14225/ch4datetime.htm#i1006728
- Add a SQL statement and API to get the current session timezone? 
Link: 
https://www.ibm.com/support/knowledgecenter/SSEPEK_10.0.0/sqlref/src/tpc/db2z_currenttz.html
- Add time zone specific expressions? 
Link: 
http://www.ibm.com/support/knowledgecenter/SSEPEK_10.0.0/sqlref/src/tpc/db2z_tzspecificexpression.html


More work is needed if we want to add a new data type {{TIMESTAMP WITH TIME ZONE}}.
Link: 
https://docs.oracle.com/cd/B19306_01/server.102/b14225/ch4datetime.htm#i1005946
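
For reference - a minimal sketch of the set/get interaction using the {{spark.sql.session.timeZone}} configuration that this ticket eventually introduced (the SET command is standard Spark SQL; the zone value is just an example):
{code:java}
// Set the session-local timezone, then read it back.
spark.sql("SET spark.sql.session.timeZone=America/Los_Angeles")
val tz = spark.conf.get("spark.sql.session.timeZone")  // America/Los_Angeles
{code}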

