bersprockets commented on code in PR #36871:
URL: https://github.com/apache/spark/pull/36871#discussion_r899496784


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/csv/UnivocityParserSuite.scala:
##########
@@ -358,4 +358,19 @@ class UnivocityParserSuite extends SparkFunSuite with 
SQLHelper {
       Map("timestampFormat" -> "invalid", "dateFormat" -> "invalid"), false, 
"UTC")
     check(new UnivocityParser(StructType(Seq.empty), optionsWithPattern))
   }
+
+  test("dates should be parsed correctly in a timestamp column") {
+    def checkDate(dataType: DataType): Unit = {
+      val timestampsOptions =
+        new CSVOptions(Map("timestampFormat" -> "dd/MM/yyyy HH:mm",
+          "timestampNTZFormat" -> "dd-MM-yyyy HH:mm", "dateFormat" -> 
"dd_MM_yyyy"),

Review Comment:
   >to make sure timestamps are not parsed as date types without conflicting.
   
   That's actually what happens:
   
   Before this PR:
   ```
   scala> val csvInput = Seq("0,2012-01-01 12:00:00", "1,2021-07-01 
15:00:00").toDS()
   csvInput: org.apache.spark.sql.Dataset[String] = [value: string]
   
   scala> val df = spark.read.option("inferSchema", "true").csv(csvInput)
   df: org.apache.spark.sql.DataFrame = [_c0: int, _c1: timestamp]
   
   scala> df.printSchema
   root
    |-- _c0: integer (nullable = true)
    |-- _c1: timestamp (nullable = true)
   
   scala> 
   ```
   After this PR:
   ```
   scala> val csvInput = Seq("0,2012-01-01 12:00:00", "1,2021-07-01 
15:00:00").toDS()
   csvInput: org.apache.spark.sql.Dataset[String] = [value: string]
   
   scala> val df = spark.read.option("inferSchema", "true").csv(csvInput)
   df: org.apache.spark.sql.DataFrame = [_c0: int, _c1: date]
   
   scala> df.printSchema
   root
    |-- _c0: integer (nullable = true)
    |-- _c1: date (nullable = true)
   
   scala>
   ```
   It looks like some tests fail too, like `CSVInferSchemaSuite`,  and 
`CSVv1Suite`, possibly others (I ran these two suites on my laptop. For some 
reason, the github actions didn't run tests for this PR. Maybe @Jonathancui123 
needs to turn them on in his fork?).
   
   >We should probably 1. add either SQL configuration or an option e.g., 
infersDate
   
   I think you would need something like that: when set, the date formatter 
could use the slower, more strict method of parsing (so "2012-01-01 12:00:00" 
wouldn't parse as a date).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to