[ 
https://issues.apache.org/jira/browse/SPARK-39193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-39193:
-----------------------------------
    Summary: Fasten Timestamp type inference of default format in JSON/CSV data 
source  (was: Improve the performance of inferring Timestamp type in JSON/CSV 
data source)

> Fasten Timestamp type inference of default format in JSON/CSV data source
> -------------------------------------------------------------------------
>
>                 Key: SPARK-39193
>                 URL: https://issues.apache.org/jira/browse/SPARK-39193
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Gengliang Wang
>            Assignee: Gengliang Wang
>            Priority: Major
>             Fix For: 3.3.0
>
>
> When reading JSON/CSV files with timestamp type inference enabled 
> (`.option("inferTimestamp", true)`), the timestamp conversion throws and 
> catches exceptions. Because the exceptions carry descriptive error messages, 
> creating them is not cheap: it consumes more than 
> 90% of the type inference time. 
> We can use parsing methods that return optional results instead.
> Before the change, it takes 166 seconds to infer the schema of a 624 MB JSON 
> file with timestamp inference enabled.
> After the change, it takes only 16 seconds.
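
The idea in the description (parse without building an exception on the failure path) can be illustrated outside Spark with plain `java.time`. This is only a sketch under assumptions, not the actual patch: the class name `TimestampInferenceSketch`, both method names, and the choice of `DateTimeFormatter.ISO_LOCAL_DATE_TIME` as the format are hypothetical. It relies on `DateTimeFormatter.parseUnresolved`, which reports a syntax mismatch by returning `null` instead of throwing `DateTimeParseException`.

```java
import java.text.ParsePosition;
import java.time.DateTimeException;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoField;
import java.time.temporal.TemporalAccessor;
import java.util.Optional;

// Hypothetical illustration; none of these names come from the Spark code base.
class TimestampInferenceSketch {
    // Assumed format for the sketch: yyyy-MM-ddTHH:mm[:ss[.fraction]]
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ISO_LOCAL_DATE_TIME;

    // Exception-based check: every non-timestamp string builds an exception
    // (message and stack trace) that is then thrown away.
    static boolean isTimestampThrowing(String s) {
        try {
            LocalDateTime.parse(s, FORMAT);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    // Optional-returning parse: a syntax mismatch yields null from
    // parseUnresolved, so the common "not a timestamp" case creates no exception.
    static Optional<LocalDateTime> parseOptional(String s) {
        ParsePosition pos = new ParsePosition(0);
        TemporalAccessor fields = FORMAT.parseUnresolved(s, pos);
        // Reject mismatches and inputs with trailing characters.
        if (fields == null || pos.getIndex() != s.length()) {
            return Optional.empty();
        }
        try {
            // Seconds and fraction are optional in this format; default to 0.
            int second = fields.isSupported(ChronoField.SECOND_OF_MINUTE)
                ? fields.get(ChronoField.SECOND_OF_MINUTE) : 0;
            int nano = fields.isSupported(ChronoField.NANO_OF_SECOND)
                ? fields.get(ChronoField.NANO_OF_SECOND) : 0;
            return Optional.of(LocalDateTime.of(
                fields.get(ChronoField.YEAR),
                fields.get(ChronoField.MONTH_OF_YEAR),
                fields.get(ChronoField.DAY_OF_MONTH),
                fields.get(ChronoField.HOUR_OF_DAY),
                fields.get(ChronoField.MINUTE_OF_HOUR),
                second,
                nano));
        } catch (DateTimeException e) {
            // Rare path: well-formed text with out-of-range values, e.g. February 30.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOptional("2022-05-16T10:15:30").isPresent());
        System.out.println(parseOptional("some ordinary string").isPresent());
        System.out.println(isTimestampThrowing("some ordinary string"));
    }
}
```

During schema inference most string values are not timestamps, so the failure path dominates; keeping that path free of exception construction is where a saving of the kind reported above would come from.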



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

