[ https://issues.apache.org/jira/browse/SPARK-39731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ivan Sadikov updated SPARK-39731:
---------------------------------
    Description: 
In Spark 3.x, when reading CSV data like this:
{code:java}
name,mydate
1,2020011
2,20201203{code}
and specifying the date pattern as "yyyyMMdd", dates are not parsed correctly with the CORRECTED time parser policy.

For example,
{code:java}
val df = spark.read.schema("name string, mydate date").option("dateFormat", "yyyyMMdd").option("header", "true").csv("file:/tmp/test.csv")
df.show(false){code}
returns:
{code:java}
+----+--------------+
|name|mydate        |
+----+--------------+
|1   |+2020011-01-01|
|2   |2020-12-03    |
+----+--------------+
{code}
whereas Spark 3.2 and below returned null for the malformed value.

The issue appears to be caused by this PR: [https://github.com/apache/spark/pull/32959].

  was:
In Spark 3.x, when reading CSV data like this:
{code:java}
name,mydate
1,2020011
2,20201203{code}
and specifying the date pattern as "yyyyMMdd", dates are not parsed correctly with the CORRECTED time parser policy.

For example,
{code:java}
val df = spark.read.schema("name string, mydate date").option("dateFormat", "yyyyMMdd").option("header", "true").csv("file:/tmp/test.csv")
df.show(false){code}
returns:
{code:java}
+----+--------+--------------+
|name|orig    |mydate        |
+----+--------+--------------+
|1   |2020011 |+2020011-01-01|
|2   |20201203|2020-12-03    |
+----+--------+--------------+
{code}
whereas Spark 3.2 and below returned null for the malformed value.

The issue appears to be caused by this PR: https://github.com/apache/spark/pull/32959.
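A plausible mechanism for the "+2020011-01-01" result (an assumption on my part, not something stated in this ticket) is a fallback to a more permissive parse once the user-supplied pattern fails: under the CORRECTED policy Spark parses dates with java.time, which rejects the 7-digit value "2020011" against a strict "yyyyMMdd" pattern, but a bare year field greedily consumes all seven digits, yielding year 2020011, i.e. January 1 of year 2020011. A minimal, self-contained java.time sketch of that behavior:

```java
import java.time.LocalDate;
import java.time.Year;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class DateFallbackDemo {
    public static void main(String[] args) {
        String input = "2020011"; // 7 digits: one short of yyyyMMdd

        // With "yyyyMMdd", adjacent value parsing reserves 2+2 digits for
        // MM and dd, leaving only 3 for the 4-digit-minimum year -> failure.
        try {
            LocalDate.parse(input, DateTimeFormatter.ofPattern("yyyyMMdd"));
        } catch (DateTimeParseException e) {
            System.out.println("yyyyMMdd rejected: " + input);
        }

        // A lone year field, by contrast, consumes all seven digits.
        Year y = Year.parse(input, DateTimeFormatter.ofPattern("yyyy"));
        System.out.println(y); // 2020011

        // Years beyond 9999 are printed with a leading '+' in ISO form,
        // matching the "+2020011-01-01" shown in the ticket.
        System.out.println(LocalDate.of(y.getValue(), 1, 1)); // +2020011-01-01
    }
}
```

Note the second row, "20201203", has exactly 8 digits and parses fine either way, which is why only the malformed row is affected.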
> Correctness issue when parsing dates with yyyyMMdd format in CSV
> ----------------------------------------------------------------
>
>                 Key: SPARK-39731
>                 URL: https://issues.apache.org/jira/browse/SPARK-39731
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Ivan Sadikov
>            Priority: Major
>
> In Spark 3.x, when reading CSV data like this:
> {code:java}
> name,mydate
> 1,2020011
> 2,20201203{code}
> and specifying the date pattern as "yyyyMMdd", dates are not parsed correctly
> with the CORRECTED time parser policy.
>
> For example,
> {code:java}
> val df = spark.read.schema("name string, mydate date").option("dateFormat",
> "yyyyMMdd").option("header", "true").csv("file:/tmp/test.csv")
> df.show(false){code}
> returns:
> {code:java}
> +----+--------------+
> |name|mydate        |
> +----+--------------+
> |1   |+2020011-01-01|
> |2   |2020-12-03    |
> +----+--------------+ {code}
> whereas Spark 3.2 and below returned null for the malformed value.
>
> The issue appears to be caused by this PR:
> [https://github.com/apache/spark/pull/32959].

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org