Jonathancui123 commented on code in PR #36871: URL: https://github.com/apache/spark/pull/36871#discussion_r903186613
##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala:
##########
```
@@ -2753,6 +2754,35 @@ abstract class CSVSuite
       }
     }
   }
+
+  test("SPARK-39469: Infer schema for date type") {
+    val options = Map(
+      "header" -> "true",
+      "inferSchema" -> "true",
+      "timestampFormat" -> "yyyy-MM-dd'T'HH:mm",
+      "dateFormat" -> "yyyy-MM-dd",
+      "inferDate" -> "true")
+
+    val results = spark.read
+      .format("csv")
+      .options(options)
+      .load(testFile(dateInferSchemaFile))
+
+    val expectedSchema = StructType(List(StructField("date", DateType),
+      StructField("timestamp-date", TimestampType), StructField("date-timestamp", TimestampType)))
+    assert(results.schema == expectedSchema)
+
+    val expected =
+      Seq(
+        Seq(Date.valueOf("2001-9-8"), Timestamp.valueOf("2014-10-27 18:30:0.0"),
+          Timestamp.valueOf("1765-03-28 00:00:0.0")),
+        Seq(Date.valueOf("1941-1-2"), Timestamp.valueOf("2000-09-14 01:01:0.0"),
+          Timestamp.valueOf("1423-11-12 23:41:0.0")),
+        Seq(Date.valueOf("0293-11-7"), Timestamp.valueOf("1995-06-25 00:00:00.0"),
+          Timestamp.valueOf("2016-01-28 20:00:00.0"))
+      )
+    assert(results.collect().toSeq.map(_.toSeq) == expected)
+  }
```

Review Comment:

> One test we might need would be `"timestampFormat" -> "dd/MM/yyyy HH:mm"` and `"dateFormat" -> "dd/MM/yyyy"` to make sure timestamps are not parsed as date types without conflicting.

This test uses:

```
"timestampFormat" -> "yyyy-MM-dd'T'HH:mm",
"dateFormat" -> "yyyy-MM-dd",
```

This end-to-end test checks that our `DateFormatter` uses strict parsing: timestamp columns are not inferred as date columns, even though the `dateFormat` is a prefix of the `timestampFormat`.

Thank you for the review! @HyukjinKwon @bersprockets
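To illustrate why strict parsing matters when the date format is a prefix of the timestamp format, here is a minimal Scala sketch that uses `java.time` directly. It is not Spark's `DateFormatter`; the object `DateInferenceSketch` and its helpers (`parsesAsDateStrict`, `parsesAsDatePrefix`, `parsesAsTimestamp`, `inferType`) are hypothetical and classify one value at a time, whereas Spark's schema inference also reconciles types across rows.

```scala
import java.text.ParsePosition
import java.time.{LocalDate, LocalDateTime}
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical helpers for illustration only; this is not Spark's DateFormatter.
object DateInferenceSketch {
  // Strict (full-string) parse: LocalDate.parse throws if any text remains
  // after the date pattern has been matched.
  def parsesAsDateStrict(value: String, pattern: String): Boolean =
    Try(LocalDate.parse(value, DateTimeFormatter.ofPattern(pattern))).isSuccess

  // Prefix parse: DateTimeFormatter.parse(text, ParsePosition) does not require
  // the parse to reach the end of the string, so trailing time fields are ignored.
  def parsesAsDatePrefix(value: String, pattern: String): Boolean =
    Try {
      val pos = new ParsePosition(0)
      LocalDate.from(DateTimeFormatter.ofPattern(pattern).parse(value, pos))
    }.isSuccess

  // Full-string parse against the timestamp pattern.
  def parsesAsTimestamp(value: String, pattern: String): Boolean =
    Try(LocalDateTime.parse(value, DateTimeFormatter.ofPattern(pattern))).isSuccess

  // Per-value classification: accept the date format only when it matches the
  // whole string, otherwise fall back to the timestamp format, then to string.
  def inferType(value: String, dateFormat: String, timestampFormat: String): String =
    if (parsesAsDateStrict(value, dateFormat)) "date"
    else if (parsesAsTimestamp(value, timestampFormat)) "timestamp"
    else "string"
}
```

With the reviewer's suggested formats, `DateInferenceSketch.inferType("27/10/2014 18:30", "dd/MM/yyyy", "dd/MM/yyyy HH:mm")` returns `"timestamp"`, while `parsesAsDatePrefix("27/10/2014 18:30", "dd/MM/yyyy")` returns `true`. That prefix match is exactly the misclassification that strict parsing guards against.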