[jira] [Commented] (SPARK-39731) Correctness issue when parsing dates with yyyyMMdd format in CSV
[ https://issues.apache.org/jira/browse/SPARK-39731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17564745#comment-17564745 ]

Apache Spark commented on SPARK-39731:
--------------------------------------

User 'sadikovi' has created a pull request for this issue:
https://github.com/apache/spark/pull/37147

> Correctness issue when parsing dates with yyyyMMdd format in CSV
> ----------------------------------------------------------------
>
>                 Key: SPARK-39731
>                 URL: https://issues.apache.org/jira/browse/SPARK-39731
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Ivan Sadikov
>            Priority: Major
>
> In Spark 3.x, when reading CSV data like this:
> {code:java}
> name,mydate
> 1,2020011
> 2,20201203{code}
> and specifying the date pattern as "yyyyMMdd", dates are not parsed correctly
> with the CORRECTED time parser policy.
> For example,
> {code:java}
> val df = spark.read
>   .schema("name string, mydate date")
>   .option("dateFormat", "yyyyMMdd")
>   .option("header", "true")
>   .csv("file:/tmp/test.csv")
> df.show(false){code}
> Returns:
> {code:java}
> +----+--------------+
> |name|mydate        |
> +----+--------------+
> |1   |+2020011-01-01|
> |2   |2020-12-03    |
> +----+--------------+
> {code}
> and it used to return null instead of the invalid date in Spark 3.2 or below.
>
> The issue appears to be caused by this PR:
> [https://github.com/apache/spark/pull/32959].

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
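For reference, the behaviour the report expects (null for the 7-digit value) matches what a plain java.time parse with the same pattern does. The sketch below is only an illustration under that assumption: it uses java.time.format.DateTimeFormatter directly, not Spark's own formatter classes, and the names StrictDateParseSketch and parseStrict are hypothetical helpers, not part of Spark or of the linked pull requests.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical helper, not Spark code: parses the sample values from the
// report with a plain java.time formatter and the reported pattern.
object StrictDateParseSketch {
  private val formatter = DateTimeFormatter.ofPattern("yyyyMMdd")

  // Returns None when the whole input does not match the pattern,
  // which is the analogue of a null cell in the resulting DataFrame.
  def parseStrict(value: String): Option[LocalDate] =
    Try(LocalDate.parse(value, formatter)).toOption

  def main(args: Array[String]): Unit = {
    // The two "mydate" values from the CSV sample in the report.
    Seq("2020011", "20201203").foreach { v =>
      println(s"$v -> ${parseStrict(v)}")
    }
  }
}
```

Under these assumptions the 7-digit value "2020011" is rejected (None), while "20201203" parses to 2020-12-03. This says nothing about how Spark arrives at +2020011-01-01; it only shows that the pattern itself does not accept the short value.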