[jira] [Comment Edited] (SPARK-16216) CSV data source does not write date and timestamp correctly

2016-10-21 Thread Barry Becker (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15595619#comment-15595619
 ] 

Barry Becker edited comment on SPARK-16216 at 10/21/16 4:41 PM:


If no time zone is specified, the date should be interpreted as being in "local
time". Adding a time zone when none was specified is not the right thing to do,
since it makes an assumption that is not necessarily true. I think JSON is doing
the right thing above by leaving off the time zone. I just updated to 2.0.1, and
one of my tests broke because of this.
Here is my test case. I create a DataFrame containing this data:
{code}
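import java.sql.Timestamp
import org.joda.time.format.DateTimeFormat

// No zone in the pattern: values are parsed in the JVM's default (local) time zone
val ISO_DATE_FORMAT = DateTimeFormat.forPattern("yyyy-MM-dd'T'HH:mm:ss")
val columnData = List(
  new Timestamp(ISO_DATE_FORMAT.parseDateTime("2012-01-03T09:12:00").getMillis),
  null,
  new Timestamp(ISO_DATE_FORMAT.parseDateTime("2015-02-23T18:00:00").getMillis))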
{code}
Then I write it to a file using:
{code}
// NULL_VALUE and tempFileName are defined elsewhere in the test
dataframe.write.format("csv")
  .option("delimiter", "\t")
  .option("header", "false")
  .option("nullValue", NULL_VALUE)
  .option("dateFormat", "yyyy-MM-dd'T'HH:mm:ss")
  .option("escape", "\\")
  .save(tempFileName)
{code}
Note that I specifically do not want a time zone when I write my date-times to
the file. They are in local time, not UTC or GMT, so no time zone should be added.

With Spark 1.6.2, the data file contained:
{code}
2012-01-03T09:12:00
?
2015-02-23T18:00:00
{code}
That is correct. Now, with 2.0.1, it contains:
{code}
2012-01-03T09:12:00.000-08:00
?
2015-02-23T18:00:00.000-08:00
{code}
This is not correct. I think the previous behavior was right. Can we reopen this issue?
If I actually wanted the time to be interpreted as UTC, I could add an
explicit Z at the end.
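The distinction drawn here can be sketched with `java.time` (a stand-in for illustration, not the Joda-Time or Spark code path): the same local value formatted without zone fields, pinned to a zone, and pinned explicitly to UTC. The zone `America/Los_Angeles` is an assumption, picked only because the `-08:00` offset above suggests Pacific time, and the object name `LocalVsZoned` is hypothetical. As far as we understand the 2.0.1 change, `TimestampType` columns are formatted by a separate `timestampFormat` option rather than `dateFormat`, which would explain why the `dateFormat` set above did not affect them; treat that as an assumption to verify against the CSV writer documentation for your version.

{code:scala}
import java.time.{LocalDateTime, ZoneId, ZoneOffset}
import java.time.format.DateTimeFormatter

// Hypothetical illustration object, not part of Spark.
object LocalVsZoned {
  // Pattern with no zone fields: the value is written as plain local time.
  val localFmt: DateTimeFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss")

  // Milliseconds plus an ISO offset; "XXX" prints "-08:00", or "Z" for a zero offset.
  val offsetFmt: DateTimeFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXXX")

  val value: LocalDateTime = LocalDateTime.parse("2012-01-03T09:12:00")

  // What the reporter wants: no time zone added.
  def asLocal: String = value.format(localFmt)

  // The 2.0.1-style output if the writer assumes a zone
  // (America/Los_Angeles is an assumption for this sketch).
  def asZoned: String = value.atZone(ZoneId.of("America/Los_Angeles")).format(offsetFmt)

  // Explicitly marking the value as UTC produces the trailing "Z".
  def asUtc: String = value.atOffset(ZoneOffset.UTC).format(offsetFmt)

  def main(args: Array[String]): Unit = {
    println(asLocal) // 2012-01-03T09:12:00
    println(asZoned) // 2012-01-03T09:12:00.000-08:00
    println(asUtc)   // 2012-01-03T09:12:00.000Z
  }
}
{code}

Only the first form leaves the value as plain local time; the other two commit to a specific offset.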



was (Author: barrybecker4):
If no time zone is specified, the date should be interpreted as being in "local
time". Adding a time zone when none was specified is not the right thing to do,
since it makes an assumption that is not necessarily true. I think JSON is doing
the right thing above by leaving off the time zone. I just updated to 2.0.1, and
one of my tests broke because of this.
Here is my test case. I create a DataFrame containing this data:
{code}
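import java.sql.Timestamp
import org.joda.time.format.DateTimeFormat

// No zone in the pattern: values are parsed in the JVM's default (local) time zone
val ISO_DATE_FORMAT = DateTimeFormat.forPattern("yyyy-MM-dd'T'HH:mm:ss")
val columnData = List(
  new Timestamp(ISO_DATE_FORMAT.parseDateTime("2012-01-03T09:12:00").getMillis),
  null,
  new Timestamp(ISO_DATE_FORMAT.parseDateTime("2015-02-23T18:00:00").getMillis))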
{code}
Then I write it to a file using:
{code}
// NULL_VALUE and tempFileName are defined elsewhere in the test
dataframe.write.format("csv")
  .option("delimiter", "\t")
  .option("header", "false")
  .option("nullValue", NULL_VALUE)
  .option("dateFormat", "yyyy-MM-dd'T'HH:mm:ss")
  .option("escape", "\\")
  .save(tempFileName)
{code}
Note that I specifically do not want a time zone when I write my date-times to
the file. They are in local time, not UTC or GMT, so no time zone should be added.

With Spark 1.6.2, the data file contained:
{code}
2012-01-03T09:12:00
?
2015-02-23T18:00:00
{code}
That is correct. Now, with 2.0.1, it contains:
{code}
2012-01-03T09:12:00.000-08:00
?
2015-02-23T18:00:00.000-08:00
{code}
This is not correct. I think the previous behavior was right. Can we reopen this issue?


> CSV data source does not write date and timestamp correctly
> ---
>
> Key: SPARK-16216
> URL: https://issues.apache.org/jira/browse/SPARK-16216
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Blocker
>  Labels: releasenotes
> Fix For: 2.0.1, 2.1.0
>
>
> Currently, the CSV data source writes {{DateType}} and {{TimestampType}} values as below:
> {code}
> +--------+
> |    date|
> +--------+
> |14406372|
> |14144598|
> |14540400|
> +--------+
> {code}
> It would be nicer if it wrote dates and timestamps as formatted strings, as the
> JSON data source does.
> Also, the CSV data source currently supports a {{dateFormat}} option for reading
> dates and timestamps in a custom format. It would be better if this option could
> be applied when writing as well.
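A minimal sketch of what the description asks for, writing dates and timestamps as formatted strings and applying a user-supplied `dateFormat` on write, might look like the per-cell formatter below. It uses `java.time.format.DateTimeFormatter` rather than Spark's internals, and `CsvCellFormat`, `format`, and its `nullValue` parameter are hypothetical names. Dates are widened to midnight so that a pattern containing time fields still resolves for them.

{code:scala}
import java.sql.{Date, Timestamp}
import java.time.format.DateTimeFormatter

// Hypothetical helper (not part of Spark): renders one CSV cell,
// applying a user-supplied dateFormat to dates and timestamps.
object CsvCellFormat {
  def format(value: Any, dateFormat: String, nullValue: String = ""): String = {
    val fmt = DateTimeFormatter.ofPattern(dateFormat)
    value match {
      case null => nullValue
      // DateType: widen to midnight so time fields in the pattern still resolve.
      case d: Date => d.toLocalDate.atStartOfDay.format(fmt)
      // TimestampType: a LocalDateTime carries no zone, so none is written.
      case ts: Timestamp => ts.toLocalDateTime.format(fmt)
      // Any other value falls back to its plain string form.
      case other => other.toString
    }
  }

  def main(args: Array[String]): Unit = {
    println(format(Date.valueOf("2012-01-03"), "yyyy-MM-dd"))                          // 2012-01-03
    println(format(Timestamp.valueOf("2012-01-03 09:12:00"), "yyyy-MM-dd'T'HH:mm:ss")) // 2012-01-03T09:12:00
    println(format(null, "yyyy-MM-dd", nullValue = "?"))                                // ?
  }
}
{code}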



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16216) CSV data source does not write date and timestamp correctly

2016-07-19 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383680#comment-15383680
 ] 

Hyukjin Kwon edited comment on SPARK-16216 at 7/19/16 6:56 AM:
---

For simplicity, I can close the existing PR and open another one that makes both
match (without supporting {{dateFormat}} for writing).

Actually, the initial proposal was just to make CSV match JSON and to add support
for {{dateFormat}} when writing, but it became somewhat bigger.

Just FYI, I discussed this with him here:
https://github.com/apache/spark/pull/13912#issuecomment-228586981

I don't have a strong preference. I will follow your decision!


was (Author: hyukjin.kwon):
For simplicity, I can close the existing PR and open another one that makes both
match (without supporting {{dateFormat}} for writing).

Actually, the initial proposal was just to add support for {{dateFormat}} when
writing, but it became somewhat bigger.

Just FYI, I discussed this with him here:
https://github.com/apache/spark/pull/13912#issuecomment-228586981

I don't have a strong preference. I will follow your decision!

> CSV data source does not write date and timestamp correctly
> ---
>
> Key: SPARK-16216
> URL: https://issues.apache.org/jira/browse/SPARK-16216
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> Currently, the CSV data source writes {{DateType}} and {{TimestampType}} values as below:
> {code}
> +--------+
> |    date|
> +--------+
> |14406372|
> |14144598|
> |14540400|
> +--------+
> {code}
> It would be nicer if it wrote dates and timestamps as formatted strings, as the
> JSON data source does.
> Also, the CSV data source currently supports a {{dateFormat}} option for reading
> dates and timestamps in a custom format. It would be better if this option could
> be applied when writing as well.






[jira] [Comment Edited] (SPARK-16216) CSV data source does not write date and timestamp correctly

2016-07-19 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383664#comment-15383664
 ] 

Hyukjin Kwon edited comment on SPARK-16216 at 7/19/16 6:44 AM:
---

JSON writes dates and timestamps as strings, as below:

{code}
// TimestampType
1970-01-01 11:46:40.0

// DateType
1970-01-01
{code}

So, as [~srowen] suggested for CSV, these might have to be written as timestamps
(long values) by default, with a configurable option to change this.
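The two representations above, and the idea of a long default with a configurable alternative, can be sketched as follows. `TimestampWriteMode`, its cases, and `TimestampWriter.render` are hypothetical names for illustration, not Spark options. The JSON-style case relies on `java.sql.Timestamp.toString`, which yields the `1970-01-01 11:46:40.0` shape for a value at that local time.

{code:scala}
import java.sql.Timestamp
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Hypothetical write modes for a TimestampType column (not a Spark API).
sealed trait TimestampWriteMode
case object AsLong extends TimestampWriteMode                          // proposed default
case object AsJsonString extends TimestampWriteMode                    // JSON-style toString
final case class AsPattern(pattern: String) extends TimestampWriteMode // user-configured

object TimestampWriter {
  def render(ts: Timestamp, mode: TimestampWriteMode = AsLong): String = mode match {
    // Milliseconds since the epoch: the same instant gives the same number in any zone.
    case AsLong => ts.getTime.toString
    // "yyyy-mm-dd hh:mm:ss.f...", rendered in the JVM's default time zone.
    case AsJsonString => ts.toString
    // Custom pattern applied to the local date-time (default zone, no offset written).
    case AsPattern(p) => ts.toLocalDateTime.format(DateTimeFormatter.ofPattern(p))
  }

  def main(args: Array[String]): Unit = {
    val ts = Timestamp.valueOf(LocalDateTime.of(1970, 1, 1, 11, 46, 40))
    println(render(ts))                          // long value; depends on the default zone
    println(render(ts, AsJsonString))            // 1970-01-01 11:46:40.0
    println(render(ts, AsPattern("yyyy-MM-dd"))) // 1970-01-01
  }
}
{code}

A long value identifies the same instant regardless of the reader's time zone, while both string forms depend on the zone used to render them.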


was (Author: hyukjin.kwon):
JSON writes dates and timestamps as strings, as below:

{code}
// TimestampType
1970-01-01 11:46:40.0

// DateType
1970-01-01
{code}

So, as [~srowen] suggested, these might have to be written as timestamps
(long values) by default.







[jira] [Comment Edited] (SPARK-16216) CSV data source does not write date and timestamp correctly

2016-07-19 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383664#comment-15383664
 ] 

Hyukjin Kwon edited comment on SPARK-16216 at 7/19/16 6:43 AM:
---

JSON writes dates and timestamps as strings, as below:

{code}
// TimestampType
1970-01-01 11:46:40.0

// DateType
1970-01-01
{code}

So, as [~srowen] suggested, these might have to be written as timestamps
(long values) by default.


was (Author: hyukjin.kwon):
JSON writes dates and timestamps as strings, as below:

{code}
// TimestampType
1970-01-01 11:46:40.0

// DateType
1970-01-01
{code}

So, as [~srowen] suggested, these might have to be printed as timestamps
(long values) by default.



