Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23202#discussion_r238141702
  
    --- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala
 ---
    @@ -98,6 +100,7 @@ class CSVInferSchema(options: CSVOptions) extends 
Serializable {
               compatibleType(typeSoFar, 
tryParseDecimal(field)).getOrElse(StringType)
             case DoubleType => tryParseDouble(field)
             case TimestampType => tryParseTimestamp(field)
    +        case DateType => tryParseDate(field)
    --- End diff --
    
    I mean, IIRC, if the pattern is, for instance, `yyyy-MM-dd`, both 2010-10-10 and 2018-12-02T21:04:00.123567 are parsed as dates, because the current parsing library only checks that the beginning of the string matches the pattern and ignores the rest.
    
    So, if we try date first, it will work for the default pattern, but if we use some unusual patterns, it won't work.
    
    I was thinking we could fix it by using `DateTimeFormatter`.
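    
    To illustrate the difference (a minimal standalone sketch, not Spark's actual inference code path — the pattern and input string are just examples):
    
    ```scala
    import java.text.SimpleDateFormat
    import java.time.LocalDate
    import java.time.format.DateTimeFormatter
    import scala.util.Try
    
    object LenientVsStrictParse {
      def main(args: Array[String]): Unit = {
        val input = "2018-12-02T21:04:00.123567"
    
        // Legacy SimpleDateFormat consumes the leading "2018-12-02" and
        // silently ignores the trailing "T21:04:00.123567", so parsing succeeds.
        val legacy = new SimpleDateFormat("yyyy-MM-dd")
        println(Try(legacy.parse(input)).isSuccess)   // true
    
        // DateTimeFormatter (via LocalDate.parse) requires the whole input
        // to match the pattern, so the same string is rejected.
        val strict = DateTimeFormatter.ofPattern("yyyy-MM-dd")
        println(Try(LocalDate.parse(input, strict)).isSuccess)   // false
      }
    }
    ```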


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
