[ https://issues.apache.org/jira/browse/SPARK-16216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15595619#comment-15595619 ]
Barry Becker edited comment on SPARK-16216 at 10/21/16 4:41 PM:
----------------------------------------------------------------

If no time zone is specified, the timestamp should be interpreted as local time. Adding a time zone when none was specified is not the right thing to do, because it makes an assumption that is not necessarily true. I think the JSON data source does the right thing above by leaving the time zone off.

I just updated to 2.0.1, and one of my tests broke because of this. Here is my test case. I create a DataFrame containing this data:
{code}
import java.sql.Timestamp
import org.joda.time.format.DateTimeFormat

// No zone in the pattern, so Joda parses these in the JVM's default (local) time zone
val ISO_DATE_FORMAT = DateTimeFormat.forPattern("yyyy-MM-dd'T'HH:mm:ss")
val columnData = List(
  new Timestamp(ISO_DATE_FORMAT.parseDateTime("2012-01-03T09:12:00").getMillis),
  null,
  new Timestamp(ISO_DATE_FORMAT.parseDateTime("2015-02-23T18:00:00").getMillis))
{code}
Then I write it to a file using:
{code}
// NULL_VALUE and tempFileName are constants defined elsewhere in the test
dataframe.write.format("csv")
  .option("delimiter", "\t")
  .option("header", "false")
  .option("nullValue", NULL_VALUE)
  .option("dateFormat", "yyyy-MM-dd'T'HH:mm:ss")
  .option("escape", "\\")
  .save(tempFileName)
{code}
Note that I specifically do not want a time zone when I write my date-times to the file. They are in local time, not UTC or GMT, and I do not want a time zone added.

With Spark 1.6.2, the data file contained:
{code}
2012-01-03T09:12:00
?
2015-02-23T18:00:00
{code}
This is correct. With 2.0.1, it now contains:
{code}
2012-01-03T09:12:00.000-08:00
?
2015-02-23T18:00:00.000-08:00
{code}
This is not correct. I think the previous behavior is the right one. Can we reopen? If I actually wanted the time zone to be treated as UTC, I could add an explicit Z at the end.

> CSV data source does not write date and timestamp correctly
> -----------------------------------------------------------
>
>                 Key: SPARK-16216
>                 URL: https://issues.apache.org/jira/browse/SPARK-16216
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Blocker
>              Labels: releasenotes
>             Fix For: 2.0.1, 2.1.0
>
>
> Currently, the CSV data source writes {{DateType}} and {{TimestampType}} values as below:
> {code}
> +----------------+
> |            date|
> +----------------+
> |1440637200000000|
> |1414459800000000|
> |1454040000000000|
> +----------------+
> {code}
> It would be nicer if it wrote dates and timestamps as formatted strings,
> just like the JSON data source does.
> Also, the CSV data source currently supports a {{dateFormat}} option for reading dates
> and timestamps in a custom format. It might be better if this option could be
> applied when writing as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
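The three renderings discussed in this thread can be reproduced outside Spark with plain JDK formatting. These are the raw numbers from the 2.0.0 table, the zone-less 1.6.2 output, and the offset-bearing 2.0.1 output. The Scala sketch below is not Spark code, and it rests on several assumptions:

* The raw numbers in the quoted table are assumed to be {{TimestampType}} values stored as microseconds since the Unix epoch. They are rendered here in UTC purely as a choice.
* The reporter's JVM is assumed to have run in US Pacific time (America/Los_Angeles, -08:00 in winter).
* The {{SimpleDateFormat}} pattern {{yyyy-MM-dd'T'HH:mm:ss.SSSXXX}} is only a stand-in for whatever default timestamp pattern 2.0.1 applies.
* The names {{CsvTimestampRendering}}, {{microsToUtc}}, {{pacificTimestamp}} and {{render}} are hypothetical.

```scala
import java.sql.Timestamp
import java.text.SimpleDateFormat
import java.time.{Instant, LocalDateTime, ZoneId, ZoneOffset}
import java.time.format.DateTimeFormatter
import java.util.TimeZone

object CsvTimestampRendering {
  // Assumption: the reporter's JVM ran in US Pacific time.
  val Pacific: ZoneId = ZoneId.of("America/Los_Angeles")

  // Assumption: the 2.0.0 output is the internal TimestampType value,
  // i.e. microseconds since the Unix epoch. Render it as UTC wall-clock text.
  def microsToUtc(micros: Long): String = {
    val seconds = java.lang.Math.floorDiv(micros, 1000000L)
    val nanos = java.lang.Math.floorMod(micros, 1000000L) * 1000L
    DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
      .withZone(ZoneOffset.UTC)
      .format(Instant.ofEpochSecond(seconds, nanos))
  }

  // A java.sql.Timestamp for a Pacific wall-clock time, standing in for the test data.
  def pacificTimestamp(local: LocalDateTime): Timestamp =
    Timestamp.from(local.atZone(Pacific).toInstant)

  // Format with a SimpleDateFormat pattern pinned to the Pacific zone,
  // so the result does not depend on the machine's default time zone.
  def render(ts: Timestamp, pattern: String): String = {
    val fmt = new SimpleDateFormat(pattern)
    fmt.setTimeZone(TimeZone.getTimeZone(Pacific))
    fmt.format(ts)
  }

  def main(args: Array[String]): Unit = {
    // Raw 2.0.0-style value rendered as text
    println(microsToUtc(1440637200000000L))

    val ts = pacificTimestamp(LocalDateTime.of(2012, 1, 3, 9, 12, 0))
    // Zone-less pattern, matching the 1.6.2 output
    println(render(ts, "yyyy-MM-dd'T'HH:mm:ss"))
    // Millis plus ISO 8601 offset, matching the 2.0.1 output
    println(render(ts, "yyyy-MM-dd'T'HH:mm:ss.SSSXXX"))
  }
}
```

Under these assumptions, the zone-less pattern yields {{2012-01-03T09:12:00}} and the offset pattern yields {{2012-01-03T09:12:00.000-08:00}}. This suggests the difference comes down to which pattern is applied to timestamp columns on write. It appears that 2.0.1 reads that pattern from a separate {{timestampFormat}} option, while {{dateFormat}} covers only {{DateType}}. If so, setting {{timestampFormat}} to {{yyyy-MM-dd'T'HH:mm:ss}} may restore the zone-less output.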