[jira] [Comment Edited] (SPARK-17545) Spark SQL Catalyst doesn't handle ISO 8601 date without colon in offset

Hyukjin Kwon (JIRA) Sat, 17 Sep 2016 08:07:46 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-17545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15499175#comment-15499175
 ]


Hyukjin Kwon edited comment on SPARK-17545 at 9/17/16 3:06 PM:
---------------------------------------------------------------

I took another look. This might be solved by changing the default time pattern 
from

{code}
yyyy-MM-dd'T'HH:mm:ss.SSSZZ
{code}

to

{code}
yyyy-MM-dd'T'HH:mm:ss.SSSXXX
{code}

in 
[CSVOptions.scala#L111|https://github.com/apache/spark/blob/29952ed096fd2a0a19079933ff691671d6f00835/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala#L111]

which (in the case of {{SimpleDateFormat}}) will read {{+0800}}, {{+08}} and 
{{+08:00}} permissively but write {{+08:00}}, so it does not break current 
behaviour.
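
As a rough illustration of the write side only, here is a minimal Java sketch using the proposed pattern with {{SimpleDateFormat}}. The class and method names are hypothetical and are not taken from Spark. It only round-trips the colon-separated form; how leniently {{XXX}} reads offsets written without a colon is not exercised here and is worth checking against the JDK in use.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, for illustration only (not Spark code).
final class XxxPatternSketch {
    // Proposed default pattern from the comment above.
    static final String PATTERN = "yyyy-MM-dd'T'HH:mm:ss.SSSXXX";

    // Formats the instant in a fixed zone so the output is deterministic.
    static String write(Date instant, String zoneId) {
        SimpleDateFormat format = new SimpleDateFormat(PATTERN, Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone(zoneId));
        return format.format(instant);
    }

    // Parses text that carries a colon-separated offset such as +08:00.
    static Date read(String text) {
        try {
            return new SimpleDateFormat(PATTERN, Locale.ROOT).parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException(text, e);
        }
    }
}
```

A new {{SimpleDateFormat}} is created per call because the class is not thread-safe; that is the reason a {{ThreadLocal}} comes up in the options below.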

However, it seems {{FastDateFormat}} does not support the {{X}}/{{XX}}/{{XXX}} 
pattern letters; see https://issues.apache.org/jira/browse/LANG-1267

We might be able to do one of the following:

 - Use {{SimpleDateFormat}} with {{ThreadLocal}} and change the default pattern.
 - Wait for the release and upgrade commons-lang to 3.6 once this is fixed for 
{{FastDateFormat}}
 - Use {{DateTimeFormatter}} after we drop Java 7 - 
https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html
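
The first and third options could look roughly like the Java sketch below. This is only a sketch under assumptions: the class name, field names and the optional-section pattern {{[XXX][XX][X]}} are illustrative choices, not anything that exists in Spark. The {{ThreadLocal}} part avoids Java 8 APIs; the {{DateTimeFormatter}} part requires Java 8.

```java
import java.text.SimpleDateFormat;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical holder class, for illustration only (not Spark code).
final class FormatterOptionsSketch {
    // Option 1: SimpleDateFormat is not thread-safe, so each thread
    // lazily creates and then reuses its own instance.
    static final ThreadLocal<SimpleDateFormat> PER_THREAD_FORMAT =
        new ThreadLocal<SimpleDateFormat>() {
            @Override
            protected SimpleDateFormat initialValue() {
                return new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSXXX", Locale.ROOT);
            }
        };

    // Option 3 (Java 8+): DateTimeFormatter is immutable and thread-safe.
    // The optional sections try +HH:MM first, then +HHMM, then +HH.
    static final DateTimeFormatter LENIENT_OFFSET =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS[XXX][XX][X]", Locale.ROOT);

    // Parses a timestamp whose offset may or may not contain a colon.
    static OffsetDateTime parseLenient(String text) {
        return OffsetDateTime.parse(text, LENIENT_OFFSET);
    }
}
```

With this sketch, {{parseLenient}} accepts the reporter's sample value {{2015-07-20T15:09:23.736-0500}} as well as the {{-05:00}} and {{-05}} spellings of the same offset.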




> Spark SQL Catalyst doesn't handle ISO 8601 date without colon in offset
> -----------------------------------------------------------------------
>
>                 Key: SPARK-17545
>                 URL: https://issues.apache.org/jira/browse/SPARK-17545
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Nathan Beyer
>
> When parsing a CSV with a date/time column that contains an ISO 8601 variant 
> that doesn't include a colon in the offset, casting to Timestamp fails.
> Here's a simple example of such CSV content.
> {quote}
> time
> "2015-07-20T15:09:23.736-0500"
> "2015-07-20T15:10:51.687-0500"
> "2015-11-21T23:15:01.499-0600"
> {quote}
> Here's the stack trace that results from processing this data.
> {quote}
> 16/09/14 15:22:59 ERROR Utils: Aborting task
> java.lang.IllegalArgumentException: 2015-11-21T23:15:01.499-0600
>       at org.apache.xerces.jaxp.datatype.XMLGregorianCalendarImpl$Parser.skip(Unknown Source)
>       at org.apache.xerces.jaxp.datatype.XMLGregorianCalendarImpl$Parser.parse(Unknown Source)
>       at org.apache.xerces.jaxp.datatype.XMLGregorianCalendarImpl.<init>(Unknown Source)
>       at org.apache.xerces.jaxp.datatype.DatatypeFactoryImpl.newXMLGregorianCalendar(Unknown Source)
>       at javax.xml.bind.DatatypeConverterImpl._parseDateTime(DatatypeConverterImpl.java:422)
>       at javax.xml.bind.DatatypeConverterImpl.parseDateTime(DatatypeConverterImpl.java:417)
>       at javax.xml.bind.DatatypeConverter.parseDateTime(DatatypeConverter.java:327)
>       at org.apache.spark.sql.catalyst.util.DateTimeUtils$.stringToTime(DateTimeUtils.scala:140)
>       at org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:287)
> {quote}
> Somewhat related, I believe Python standard libraries can produce this form 
> of zone offset. The system I got the data from is written in Python.
> https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior


