[jira] [Assigned] (SPARK-40474) Correct CSV schema inference and data parsing behavior on columns with mixed dates and timestamps

Wenchen Fan (Jira) Fri, 23 Sep 2022 00:59:58 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-40474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Wenchen Fan reassigned SPARK-40474:
-----------------------------------

    Assignee: Xiaonan Yang

> Correct CSV schema inference and data parsing behavior on columns with mixed 
> dates and timestamps
> -------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-40474
>                 URL: https://issues.apache.org/jira/browse/SPARK-40474
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Xiaonan Yang
>            Assignee: Xiaonan Yang
>            Priority: Major
>             Fix For: 3.4.0
>
>
> SPARK-39469 (https://issues.apache.org/jira/browse/SPARK-39469) introduced 
> support for the date type in CSV schema inference. The current schema 
> inference behavior on datetime columns is:
>  * A column containing only dates is inferred as `DateType`
>  * A column containing only timestamps is inferred as `TimestampType`
>  * A column containing a mixture of dates and timestamps is inferred as 
> `TimestampType`
> However, supporting the last scenario turned out to be too ambitious: it 
> added considerable complexity to the code and raised a number of 
> performance concerns. We therefore want to simplify and correct the 
> behavior for the last scenario as follows:
>  * For a column containing a mixture of dates and timestamps:
>  ** If the user specifies a timestamp format, the column is always inferred 
> as `StringType`
>  ** If the user does not specify a timestamp format, we try to infer the 
> column as `TimestampType` where possible; otherwise it is inferred as 
> `StringType`
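
The proposed rules can be illustrated with a small standalone model. This is only a sketch of the decision logic described in the ticket, not Spark's implementation: the function name `infer_column_type`, its parameters, the use of Python `strptime` patterns, and the choice of `datetime.fromisoformat` as a stand-in for lenient default timestamp parsing are all assumptions made for illustration.

```python
from datetime import datetime


def _matches(value, fmt):
    """Return True if value parses strictly with the given strptime format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


def _lenient_timestamp(value):
    """Stand-in (assumption) for lenient default timestamp parsing.

    datetime.fromisoformat accepts both 'YYYY-MM-DD' and
    'YYYY-MM-DD HH:MM:SS', so a date can be read as a timestamp.
    """
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False


def infer_column_type(values, timestamp_format=None, date_format="%Y-%m-%d"):
    """Toy model of the inference rules proposed in the ticket (hypothetical)."""
    # A column containing only dates is a date column.
    if all(_matches(v, date_format) for v in values):
        return "DateType"
    if timestamp_format is not None:
        # User-specified format: parsing is strict, so a date-only value
        # does not match and a mixed column falls back to StringType.
        if all(_matches(v, timestamp_format) for v in values):
            return "TimestampType"
        return "StringType"
    # No user-specified format: try TimestampType, else StringType.
    if all(_lenient_timestamp(v) for v in values):
        return "TimestampType"
    return "StringType"


mixed = ["2022-09-23", "2022-09-23 00:59:58"]
infer_column_type(mixed)                                         # "TimestampType"
infer_column_type(mixed, timestamp_format="%Y-%m-%d %H:%M:%S")   # "StringType"
```

In this model the user-specified format is what makes parsing strict: once a format is given, a date-only value can no longer be read as a timestamp, so the mixed column can only be a string column.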



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

