cloud-fan commented on a change in pull request #27710: [SPARK-30960][SQL] add 
back the legacy date/timestamp format support in CSV/JSON parser
URL: https://github.com/apache/spark/pull/27710#discussion_r384679403
 
 

 ##########
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala
 ##########
 @@ -175,10 +175,30 @@ class UnivocityParser(
       }
 
     case _: TimestampType => (d: String) =>
-      nullSafeDatum(d, name, nullable, options)(timestampFormatter.parse)
+      nullSafeDatum(d, name, nullable, options) { datum =>
+        try {
+          timestampFormatter.parse(datum)
+        } catch {
+          case NonFatal(e) =>
+            // If fails to parse, then tries the way used in 2.0 and 1.x for 
backwards
+            // compatibility.
+            val str = UTF8String.fromString(datum)
+            DateTimeUtils.stringToTimestamp(str, 
options.zoneId).getOrElse(throw e)
 
 Review comment:
   We switch the calendar from "hybrid calendar (Julian + Gregorian)" to "ISO 
chronology (Proleptic Gregorian)", which affects date/timestamp values before 
October 15, 1582. For example, some dates are invalid according to the new 
calendar but the old calendar accepts it. When we add days to a date, the 
calendar skips invalid dates.
   
   The new calendar is widely used by RDBMS and others like parquet, snowflake, 
but we should have a better rollout plan for the calendar switch, like making 
the calendar configurable for Spark. Unfortunately, we catch this too late.
   
   However, the more serious problem is the switch of the pattern string(as we 
switch to the Java 8 datetime API), and 3.0 can't parse some date/timestamp 
strings suddenly. This is the first step to fix this issue by adding back the 
legacy fallback, we are still working on fixing the 1% when users set a custom 
pattern string:
   1. try to support the previous pattern string with the new calendar. If 
can't, go 2)
   2. detect the conflicts between old and new pattern string and throw an 
exception to tell users what to do (turn on the legacy config or accept the new 
pattern string knowing what it means)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to