[ https://issues.apache.org/jira/browse/SPARK-17545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495408#comment-15495408 ]
Hyukjin Kwon commented on SPARK-17545:
--------------------------------------

Hi [~nbeyer], the ISO format currently supported follows https://www.w3.org/TR/NOTE-datetime which says

{quote}
1997-07-16T19:20:30.45+01:00
{quote}

is the correct ISO format, where the time zone is

{quote}
TZD = time zone designator (Z or +hh:mm or -hh:mm)
{quote}

To make sure, I double-checked the full ISO 8601:2004 specification at http://www.uai.cl/images/sitio/biblioteca/citas/ISO_8601_2004en.pdf which says

{quote}
... the expression shall either be completely in basic format, in which case the minimum number of separators necessary for the required expression is used, or completely in extended format, in which case additional separators shall be used ...
{quote}

Here the basic format is {{20160707T211822+0300}}, whereas the extended format is {{2016-07-07T21:18:22+03:00}}. In addition, the basic format seems to be discouraged in plain text:

{quote}
NOTE: The basic format should be avoided in plain text.
{quote}

Therefore, {{2016-07-07T21:18:22+03:00}} is valid ISO 8601:2004, whereas {{2016-07-07T21:18:22+0300}} is not, because the zone designator may not be in the basic format when the date and time of day is in the extended format.

> Spark SQL Catalyst doesn't handle ISO 8601 date without colon in offset
> -----------------------------------------------------------------------
>
>                 Key: SPARK-17545
>                 URL: https://issues.apache.org/jira/browse/SPARK-17545
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Nathan Beyer
>
> When parsing a CSV with a date/time column that contains a variant of ISO 8601 that doesn't include a colon in the offset, casting to Timestamp fails.
> Here's a simple example of the CSV content:
> {quote}
> time
> "2015-07-20T15:09:23.736-0500"
> "2015-07-20T15:10:51.687-0500"
> "2015-11-21T23:15:01.499-0600"
> {quote}
> Here's the stack trace that results from processing this data.
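As a side note on where such colon-less offsets come from: the issue description below mentions that the data was produced by Python, and CPython's {{strftime}} does emit the basic-format offset via {{%z}}, while {{isoformat()}} emits the extended format. A small sketch illustrating the difference (not Spark code; microseconds print as six digits here, unlike the three in the reported CSV):

```python
from datetime import datetime, timezone, timedelta

# A timestamp in UTC-05:00, like the values in the reported CSV
tz = timezone(timedelta(hours=-5))
dt = datetime(2015, 7, 20, 15, 9, 23, 736000, tzinfo=tz)

# strftime's %z emits the basic-format offset, with no colon
s = dt.strftime("%Y-%m-%dT%H:%M:%S.%f%z")
# -> '2015-07-20T15:09:23.736000-0500'

# isoformat() emits the extended format, with a colon
iso = dt.isoformat()
# -> '2015-07-20T15:09:23.736000-05:00'

# %z also accepts the colon-less form when parsing
parsed = datetime.strptime(s, "%Y-%m-%dT%H:%M:%S.%f%z")
assert parsed == dt
```

So a Python system writing timestamps with {{strftime}} and {{%z}} naturally produces the offset form that the strict extended-format parser rejects.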
> {quote}
> 16/09/14 15:22:59 ERROR Utils: Aborting task
> java.lang.IllegalArgumentException: 2015-11-21T23:15:01.499-0600
>         at org.apache.xerces.jaxp.datatype.XMLGregorianCalendarImpl$Parser.skip(Unknown Source)
>         at org.apache.xerces.jaxp.datatype.XMLGregorianCalendarImpl$Parser.parse(Unknown Source)
>         at org.apache.xerces.jaxp.datatype.XMLGregorianCalendarImpl.<init>(Unknown Source)
>         at org.apache.xerces.jaxp.datatype.DatatypeFactoryImpl.newXMLGregorianCalendar(Unknown Source)
>         at javax.xml.bind.DatatypeConverterImpl._parseDateTime(DatatypeConverterImpl.java:422)
>         at javax.xml.bind.DatatypeConverterImpl.parseDateTime(DatatypeConverterImpl.java:417)
>         at javax.xml.bind.DatatypeConverter.parseDateTime(DatatypeConverter.java:327)
>         at org.apache.spark.sql.catalyst.util.DateTimeUtils$.stringToTime(DateTimeUtils.scala:140)
>         at org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:287)
> {quote}
> Somewhat related, I believe the Python standard library can produce this form of zone offset. The system I got the data from is written in Python.
> https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
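Until the parser accepts the basic-format offset, one workaround is to normalize the column before casting. A minimal sketch in Python (a hypothetical pre-processing helper, not part of Spark), which inserts the colon the strict extended-format parser expects:

```python
import re

def normalize_offset(ts: str) -> str:
    """Rewrite a trailing basic-format zone offset (e.g. -0600) into
    the extended format (-06:00). Strings that already use a colon,
    or that end in 'Z', contain no match and are returned unchanged."""
    return re.sub(r"([+-]\d{2})(\d{2})$", r"\1:\2", ts)

print(normalize_offset("2015-11-21T23:15:01.499-0600"))
# -> 2015-11-21T23:15:01.499-06:00
```

Because it only fires on a four-digit trailing offset, the helper is safe to apply to a mixed column: extended-format and 'Z'-suffixed values pass through untouched.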