bersprockets commented on code in PR #36871: URL: https://github.com/apache/spark/pull/36871#discussion_r899496784
########## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/csv/UnivocityParserSuite.scala: ########## @@ -358,4 +358,19 @@ class UnivocityParserSuite extends SparkFunSuite with SQLHelper { Map("timestampFormat" -> "invalid", "dateFormat" -> "invalid"), false, "UTC") check(new UnivocityParser(StructType(Seq.empty), optionsWithPattern)) } + + test("dates should be parsed correctly in a timestamp column") { + def checkDate(dataType: DataType): Unit = { + val timestampsOptions = + new CSVOptions(Map("timestampFormat" -> "dd/MM/yyyy HH:mm", + "timestampNTZFormat" -> "dd-MM-yyyy HH:mm", "dateFormat" -> "dd_MM_yyyy"), Review Comment: >to make sure timestamps are not parsed as date types without conflicting. That's actually what happens: Before this PR: ``` scala> val csvInput = Seq("0,2012-01-01 12:00:00", "1,2021-07-01 15:00:00").toDS() csvInput: org.apache.spark.sql.Dataset[String] = [value: string] scala> val df = spark.read.option("inferSchema", "true").csv(csvInput) df: org.apache.spark.sql.DataFrame = [_c0: int, _c1: timestamp] scala> df.printSchema root |-- _c0: integer (nullable = true) |-- _c1: timestamp (nullable = true) scala> ``` After this PR: ``` scala> val csvInput = Seq("0,2012-01-01 12:00:00", "1,2021-07-01 15:00:00").toDS() csvInput: org.apache.spark.sql.Dataset[String] = [value: string] scala> val df = spark.read.option("inferSchema", "true").csv(csvInput) df: org.apache.spark.sql.DataFrame = [_c0: int, _c1: date] scala> df.printSchema root |-- _c0: integer (nullable = true) |-- _c1: date (nullable = true) scala> ``` It looks like some tests fail too, like `CSVInferSchemaSuite`, and `CSVv1Suite`, possibly others (I ran these two suites on my laptop. For some reason, the github actions didn't run tests for this PR. Maybe @Jonathancui123 needs to turn them on in his fork?). >We should probably 1. add either SQL configuration or an option e.g., infersDate I think you would need something like that: when set, the date formatter could use the slower, more strict method of parsing (so "2012-01-01 12:00:00" wouldn't parse as a date). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org