[ 
https://issues.apache.org/jira/browse/SPARK-40474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaonan Yang updated SPARK-40474:
---------------------------------
    Shepherd: Xiaonan Yang

> Infer columns with mixed date and timestamp as String in CSV schema inference
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-40474
>                 URL: https://issues.apache.org/jira/browse/SPARK-40474
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Xiaonan Yang
>            Priority: Major
>             Fix For: 3.4.0
>
>
> In ticket, we introduced support for the Date type in CSV schema inference.
> The current schema inference behavior for date/time columns is:
>  * Columns containing only dates are inferred as Date type
>  * Columns containing only timestamps are inferred as Timestamp type
>  * Columns containing a mixture of dates and timestamps are inferred as
> Timestamp type
> However, we found that the last scenario was too ambitious: supporting it
> introduced considerable code complexity and raised a number of performance
> concerns. We therefore want to simplify the behavior for that scenario to:
>  * Columns containing a mixture of dates and timestamps are inferred as
> String type
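
The proposed rule can be sketched in a few lines of plain Python. This is only an illustration of the behavior described above, not Spark's actual implementation: the function names (infer_field, merge_types, infer_column), the two format strings, and the choice to treat an empty column as a string are all assumptions made for this example.

```python
from datetime import datetime

# Hypothetical formats, chosen only for this illustration.
DATE_FORMAT = "%Y-%m-%d"
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"


def _matches(value, fmt):
    """Return True if the whole string parses with the given format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


def infer_field(value):
    """Classify a single CSV cell as 'date', 'timestamp', or 'string'."""
    if _matches(value, DATE_FORMAT):
        return "date"
    if _matches(value, TIMESTAMP_FORMAT):
        return "timestamp"
    return "string"


def merge_types(left, right):
    """Combine the types seen so far with the type of the next cell.

    Under the simplified rule, a date/timestamp mix is not widened to
    timestamp; any disagreement falls back to 'string'.
    """
    return left if left == right else "string"


def infer_column(values):
    """Infer one type for a whole column of string cells."""
    inferred = None
    for value in values:
        current = infer_field(value)
        inferred = current if inferred is None else merge_types(inferred, current)
    # Assumption for this sketch: an empty column is treated as a string.
    return inferred if inferred is not None else "string"
```

Because the merge step never has to reconcile dates with timestamps, each cell is classified independently and any disagreement collapses straight to string, which is the simplification this ticket asks for.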



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
