Jonathancui123 commented on code in PR #36871:
URL: https://github.com/apache/spark/pull/36871#discussion_r903186613


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala:
##########
@@ -2753,6 +2754,35 @@ abstract class CSVSuite
       }
     }
   }
+
+  test("SPARK-39469: Infer schema for date type") {
+    val options = Map(
+      "header" -> "true",
+      "inferSchema" -> "true",
+      "timestampFormat" -> "yyyy-MM-dd'T'HH:mm",
+      "dateFormat" -> "yyyy-MM-dd",
+      "inferDate" -> "true")
+
+    val results = spark.read
+      .format("csv")
+      .options(options)
+      .load(testFile(dateInferSchemaFile))
+
+    val expectedSchema = StructType(List(StructField("date", DateType),
+      StructField("timestamp-date", TimestampType), StructField("date-timestamp", TimestampType)))
+    assert(results.schema == expectedSchema)
+
+    val expected =
+      Seq(
+        Seq(Date.valueOf("2001-9-8"), Timestamp.valueOf("2014-10-27 18:30:0.0"),
+          Timestamp.valueOf("1765-03-28 00:00:0.0")),
+        Seq(Date.valueOf("1941-1-2"), Timestamp.valueOf("2000-09-14 01:01:0.0"),
+          Timestamp.valueOf("1423-11-12 23:41:0.0")),
+        Seq(Date.valueOf("0293-11-7"), Timestamp.valueOf("1995-06-25 00:00:00.0"),
+          Timestamp.valueOf("2016-01-28 20:00:00.0"))
+      )
+    assert(results.collect().toSeq.map(_.toSeq) == expected)
+  }

Review Comment:
   > One test we might need would be `"timestampFormat" -> "dd/MM/yyyy HH:mm"` and
   > `"dateFormat" -> "dd/MM/yyyy"` to make sure timestamps are not parsed as date
   > types without conflicting.
   
   This test uses: 
   ```
         "timestampFormat" -> "yyyy-MM-dd'T'HH:mm",
         "dateFormat" -> "yyyy-MM-dd",
   ```
   
   This e2e test ensures that our DateFormatter uses strict parsing: a Timestamp 
   column is not inferred as a Date column even when the `dateFormat` pattern is a 
   prefix of the `timestampFormat` pattern. 
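   
   To illustrate what "strict" means here, below is a minimal sketch using plain 
   `java.time` rather than Spark's own `DateFormatter` (the object and method names 
   are hypothetical, for illustration only). It assumes "strict" means the whole 
   input must be consumed by the pattern, and contrasts that with a prefix-only 
   parse, which would accept a timestamp string as a date:
   
   ```scala
   import java.text.ParsePosition
   import java.time.LocalDate
   import java.time.format.DateTimeFormatter
   import scala.util.Try

   // Hypothetical helper for illustration; this is not Spark's DateFormatter.
   object StrictDateParsingSketch {
     // Same pattern as the "dateFormat" option used in the test.
     private val dateFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd")

     // Whole-input parse: fails if any text remains after the pattern matches.
     def parsesStrictly(s: String): Boolean =
       Try(LocalDate.parse(s, dateFormatter)).isSuccess

     // Prefix parse: starts at index 0 and ignores any trailing text.
     def parsesPrefix(s: String): Boolean =
       Try(LocalDate.from(dateFormatter.parse(s, new ParsePosition(0)))).isSuccess

     def main(args: Array[String]): Unit = {
       // A value that matches the date pattern exactly.
       assert(parsesStrictly("2001-09-08"))
       // A timestamp whose leading part matches the date pattern.
       assert(!parsesStrictly("2014-10-27T18:30"))
       assert(parsesPrefix("2014-10-27T18:30"))
     }
   }
   ```
   
   With prefix-only parsing, a column of `yyyy-MM-dd'T'HH:mm` values would look 
   like valid dates, which is the mis-inference the test above guards against.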
   
   Thank you for the review! @HyukjinKwon @bersprockets 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

