[ https://issues.apache.org/jira/browse/SPARK-17545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15502064#comment-15502064 ]
Nathan Beyer commented on SPARK-17545:
--------------------------------------

This is a PR based on my suggestion above. It would be nicer to replace things with commons-lang 3.6 or Java 8 SimpleDateFormat, but those would take quite a bit of time and can always be layered on top of this approach later.

> Spark SQL Catalyst doesn't handle ISO 8601 date without colon in offset
> -----------------------------------------------------------------------
>
>                 Key: SPARK-17545
>                 URL: https://issues.apache.org/jira/browse/SPARK-17545
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Nathan Beyer
>
> When parsing a CSV with a date/time column that contains an ISO 8601 variant
> that doesn't include a colon in the offset, casting to Timestamp fails.
> Here's a simple example of such CSV content.
> {quote}
> time
> "2015-07-20T15:09:23.736-0500"
> "2015-07-20T15:10:51.687-0500"
> "2015-11-21T23:15:01.499-0600"
> {quote}
> Here's the stack trace that results from processing this data.
> {quote}
> 16/09/14 15:22:59 ERROR Utils: Aborting task
> java.lang.IllegalArgumentException: 2015-11-21T23:15:01.499-0600
>       at org.apache.xerces.jaxp.datatype.XMLGregorianCalendarImpl$Parser.skip(Unknown Source)
>       at org.apache.xerces.jaxp.datatype.XMLGregorianCalendarImpl$Parser.parse(Unknown Source)
>       at org.apache.xerces.jaxp.datatype.XMLGregorianCalendarImpl.<init>(Unknown Source)
>       at org.apache.xerces.jaxp.datatype.DatatypeFactoryImpl.newXMLGregorianCalendar(Unknown Source)
>       at javax.xml.bind.DatatypeConverterImpl._parseDateTime(DatatypeConverterImpl.java:422)
>       at javax.xml.bind.DatatypeConverterImpl.parseDateTime(DatatypeConverterImpl.java:417)
>       at javax.xml.bind.DatatypeConverter.parseDateTime(DatatypeConverter.java:327)
>       at org.apache.spark.sql.catalyst.util.DateTimeUtils$.stringToTime(DateTimeUtils.scala:140)
>       at org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:287)
> {quote}
> Somewhat related, I believe Python standard libraries can produce this form
> of zone offset. The system I got the data from is written in Python.
> https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior
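
For illustration only, here is a small Python 3 sketch of how a timestamp with a colon-less offset like the ones in the report can come out of the standard library, and of one possible way to normalize it before parsing. This is not the change made in the PR; the helper name add_offset_colon and the regular expression are hypothetical, and the sketch assumes the offset, when present, is the last thing in the string.

```python
import re
from datetime import datetime, timedelta, timezone

# Fixed UTC-05:00 zone, matching the first sample row in the report.
tz = timezone(timedelta(hours=-5))
stamp = datetime(2015, 7, 20, 15, 9, 23, 736000, tzinfo=tz)

# strftime's %z directive emits the offset as +HHMM / -HHMM, without a colon.
millis = "%03d" % (stamp.microsecond // 1000)
no_colon = stamp.strftime("%Y-%m-%dT%H:%M:%S.") + millis + stamp.strftime("%z")
print(no_colon)  # 2015-07-20T15:09:23.736-0500

# Hypothetical helper: rewrite a trailing +HHMM / -HHMM offset as +HH:MM / -HH:MM.
# Strings that already carry a colon (or no offset at all) are left unchanged.
_OFFSET_NO_COLON = re.compile(r"([+-]\d{2})(\d{2})$")

def add_offset_colon(value):
    return _OFFSET_NO_COLON.sub(r"\1:\2", value)

print(add_offset_colon(no_colon))  # 2015-07-20T15:09:23.736-05:00
```

The normalized string matches what datetime.isoformat() produces for the same value, which is the colon-separated form the parser in the stack trace accepts.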