[ https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15950149#comment-15950149 ]
Hyukjin Kwon edited comment on SPARK-20152 at 3/31/17 1:38 AM:
---------------------------------------------------------------

I think the correct usage is as below, per {{SimpleDateFormat}}:

{code}
scala> new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSXXX").parse("2017-03-21T00:00:00.000Z")
res15: java.util.Date = Tue Mar 21 09:00:00 KST 2017
{code}

I should probably have left some comments there. When I introduced this in SPARK-16216, I used {{ZZ}} as specified in {{FastDateFormat}} to support "ISO 8601 extended format time zones" (see https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/time/FastDateFormat.html). I am sorry; I think I misunderstood the documentation (I thought {{ZZ}} was not {{FastDateFormat}}-specific), and maybe I should have used {{SimpleDateFormat}} with a thread-local instead.

After this was merged, I realised that {{FastDateFormat}} seems to have a bug in supporting the {{XXX}} format specified in https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html (see https://issues.apache.org/jira/browse/LANG-1101), which seems to be fixed in 3.4. IIRC, I left this format for that reason, as the commons-lang3 version was 3.3.2 at that time.
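For reference, a minimal sketch of the {{SimpleDateFormat}}-with-thread-local alternative mentioned above, in Java using only the JDK. The class name {{IsoTimestampParser}} is hypothetical and this is not Spark's code. It also shows that {{XXX}} accepts both the {{Z}} designator and a numeric offset such as {{+09:00}}. Note that the REPL output {{Tue Mar 21 09:00:00 KST 2017}} is only {{Date.toString}} rendered in the JVM's default zone (KST is UTC+9), so it is the same instant as 2017-03-21T00:00:00Z.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper class: one SimpleDateFormat per thread, using the ISO 8601 "XXX" zone pattern.
final class IsoTimestampParser {
    // SimpleDateFormat keeps mutable state, so it must not be shared across threads;
    // ThreadLocal gives each thread its own lazily created instance.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSXXX"));

    private IsoTimestampParser() {}

    // XXX accepts the "Z" designator as well as offsets like "+09:00";
    // the offset in the text, not the JVM default zone, fixes the instant.
    static Date parse(String text) throws ParseException {
        return FORMAT.get().parse(text);
    }

    public static void main(String[] args) throws ParseException {
        Date utc = parse("2017-03-21T00:00:00.000Z");
        Date kst = parse("2017-03-21T09:00:00.000+09:00");
        System.out.println(utc.getTime() == kst.getTime()); // true
        System.out.println(utc.toInstant());                // 2017-03-21T00:00:00Z
    }
}
```

{{FastDateFormat}} is documented as thread-safe, which is why it can be shared directly; with {{SimpleDateFormat}}, the per-thread instance is what makes sharing safe.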
A few months later, commons-lang3 was bumped as part of SPARK-17985, so this should now be fixed, and I think you can use {{XXX}} as below:

{code}
scala> import org.apache.commons.lang3.time.FastDateFormat
import org.apache.commons.lang3.time.FastDateFormat

scala> FastDateFormat.getInstance("yyyy-MM-dd'T'HH:mm:ss.SSSXXX").parse("2017-03-21T00:00:00.000Z")
res0: java.util.Date = Tue Mar 21 09:00:00 KST 2017
{code}

and also:

{code}
scala> import org.apache.commons.lang3.time.FastDateFormat
import org.apache.commons.lang3.time.FastDateFormat

scala> FastDateFormat.getInstance("yyyy-MM-dd'T'HH:mm:ss.SSSZZ").parse("2017-03-21T00:00:00.000Z")
res2: java.util.Date = Tue Mar 21 09:00:00 KST 2017
{code}

The related test was added in commons-lang here: https://github.com/apache/commons-lang/commit/bdb074610c87a210ea4c0d91d579cb4558f4b19f

To cut this short, I think this issue is resolvable, and I think we can now change the default format to use {{XXX}} instead of {{ZZ}}, which, to my knowledge, is {{FastDateFormat}}-specific.

BTW, I think

> Time zone is not respected while parsing csv for timeStampFormat
> "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"
> ----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-20152
>                 URL: https://issues.apache.org/jira/browse/SPARK-20152
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Navya Krishnappa
>
> When reading the below mentioned time value by specifying the
> "timestampFormat": "MM-dd-yyyy'T'HH:mm:ss.SSSZZ", time zone is ignored.
> Source File:
> TimeColumn
> 03-21-2017T03:30:02Z
>
> Source code1:
> Dataset dataset = getSqlContext().read()
>     .option(PARSER_LIB, "commons")
>     .option(INFER_SCHEMA, "true")
>     .option(DELIMITER, ",")
>     .option(QUOTE, "\"")
>     .option(ESCAPE, "\\")
>     .option("timestampFormat", "MM-dd-yyyy'T'HH:mm:ss.SSSZZ")
>     .option(MODE, Mode.PERMISSIVE)
>     .csv(sourceFile);
> Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but the
> expected result is that TimeColumn should be of "TimestampType" and should
> take the time zone into account.
>
> Source code2:
> Dataset dataset = getSqlContext().read()
>     .option(PARSER_LIB, "commons")
>     .option(INFER_SCHEMA, "true")
>     .option(DELIMITER, ",")
>     .option(QUOTE, "\"")
>     .option(ESCAPE, "\\")
>     .option("timestampFormat", "MM-dd-yyyy'T'HH:mm:ss")
>     .option(MODE, Mode.PERMISSIVE)
>     .csv(sourceFile);
> Result: TimeColumn [ TimestampType] and value is "2017-03-21 03:30:02.0", but the
> expected result is that TimeColumn should take the time zone into account.
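On the report itself, one plausible reading of the first result is that the pattern {{MM-dd-yyyy'T'HH:mm:ss.SSSZZ}} requires a {{.SSS}} millisecond field that the sample value {{03-21-2017T03:30:02Z}} does not contain. This assumes Spark's parser treats these pattern letters the way {{java.text.SimpleDateFormat}} does; the sketch below uses only the JDK class and does not exercise {{FastDateFormat}} itself. The class {{ReporterPatternCheck}} and its method are hypothetical, and {{XXX}} stands in for {{ZZ}} because {{ZZ}} means something different in {{SimpleDateFormat}}.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper: tries one pattern against one value with the JDK's SimpleDateFormat.
final class ReporterPatternCheck {
    private ReporterPatternCheck() {}

    // Returns the parsed Date, or null when the pattern does not match the value.
    static Date tryParse(String pattern, String value) {
        try {
            return new SimpleDateFormat(pattern).parse(value);
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String value = "03-21-2017T03:30:02Z"; // sample value from the report
        // A pattern with ".SSS" expects a literal "." after the seconds, but the value has "Z" there.
        System.out.println(tryParse("MM-dd-yyyy'T'HH:mm:ss.SSSXXX", value)); // null
        // Without ".SSS", XXX reads the trailing "Z" as UTC.
        Date parsed = tryParse("MM-dd-yyyy'T'HH:mm:ssXXX", value);
        System.out.println(parsed.toInstant()); // 2017-03-21T03:30:02Z
    }
}
```

If that reading holds, a format along the lines of {{MM-dd-yyyy'T'HH:mm:ssXXX}} may be the one to try for this data.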