[ https://issues.apache.org/jira/browse/SPARK-39731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17564745#comment-17564745 ]
Apache Spark commented on SPARK-39731:
--------------------------------------

User 'sadikovi' has created a pull request for this issue:
https://github.com/apache/spark/pull/37147

> Correctness issue when parsing dates with yyyyMMdd format in CSV
> ----------------------------------------------------------------
>
>                 Key: SPARK-39731
>                 URL: https://issues.apache.org/jira/browse/SPARK-39731
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Ivan Sadikov
>            Priority: Major
>
> In Spark 3.x, when reading CSV data like this:
> {code:java}
> name,mydate
> 1,2020011
> 2,20201203{code}
> and specifying the date pattern "yyyyMMdd", dates are not parsed correctly
> under the CORRECTED time parser policy.
>
> For example:
> {code:java}
> val df = spark.read
>   .schema("name string, mydate date")
>   .option("dateFormat", "yyyyMMdd")
>   .option("header", "true")
>   .csv("file:/tmp/test.csv")
> df.show(false){code}
> returns:
> {code:java}
> +----+--------------+
> |name|mydate        |
> +----+--------------+
> |1   |+2020011-01-01|
> |2   |2020-12-03    |
> +----+--------------+ {code}
> whereas Spark 3.2 and below returned null for the invalid date.
>
> The issue appears to be caused by this PR:
> https://github.com/apache/spark/pull/32959

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
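To make the expected behavior concrete: the reporter's point is that a seven-digit value like "2020011" should not resolve to any date under a strict "yyyyMMdd" pattern, and Spark 3.2 and below returned null for it. The sketch below is not Spark code; it is a minimal, hypothetical Python illustration of one way to enforce that strictness, by requiring the parsed value to round-trip back to the exact input string (the function name and the round-trip approach are my own, not from the issue or the linked PR):

```python
from datetime import date, datetime


def parse_date_strict(raw: str, fmt: str = "%Y%m%d"):
    """Parse a date string, returning None unless the value
    round-trips through the format.

    A lenient parser may accept '2020011' as 2020-01-01 (4-digit
    year, then '01', then '1'); re-formatting that result yields
    '20200101', which differs from the input, so we reject it.
    """
    try:
        parsed = datetime.strptime(raw, fmt)
    except ValueError:
        return None
    # Round-trip check: formatting the result must reproduce the input.
    if parsed.strftime(fmt) != raw:
        return None
    return parsed.date()
```

Under this check, "20201203" parses to 2020-12-03, while "2020011" is rejected and maps to None, matching the null the reporter says older Spark versions produced for malformed dates.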