[ https://issues.apache.org/jira/browse/SPARK-39193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gengliang Wang updated SPARK-39193:
-----------------------------------
    Summary: Fasten Timestamp type inference of default format in JSON/CSV data source  (was: Improve the performance of inferring Timestamp type in JSON/CSV data source)

> Fasten Timestamp type inference of default format in JSON/CSV data source
> -------------------------------------------------------------------------
>
>                 Key: SPARK-39193
>                 URL: https://issues.apache.org/jira/browse/SPARK-39193
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Gengliang Wang
>            Assignee: Gengliang Wang
>            Priority: Major
>             Fix For: 3.3.0
>
>
> When reading JSON/CSV files with timestamp inference enabled
> (`.option("inferTimestamp", true)`), the timestamp conversion throws and
> catches exceptions. Because we put detailed error messages in the
> exceptions, creating them is expensive: it consumes more than 90% of the
> type inference time.
> We can use parsing methods that return optional results instead.
> Before the change, it takes 166 seconds to infer the schema of a 624MB JSON
> file with timestamp inference enabled.
> After the change, it takes only 16 seconds.
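
The difference between the two parsing styles described in the issue can be sketched in plain Java. This is only an illustration under stated assumptions, not Spark's actual code: the class name, both method names, and the fixed `yyyy-MM-dd HH:mm:ss` layout are hypothetical and chosen for the example. The first method probes a value by catching a parse exception; the second rejects bad input by returning an empty `Optional`, so no exception, message, or stack trace is ever built for values that are not timestamps.

```java
import java.time.LocalDateTime;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.Optional;

// Hypothetical sketch; none of these names come from Spark.
final class TimestampInferenceSketch {
    // Assumed fixed layout for this example: "yyyy-MM-dd HH:mm:ss" (19 characters).
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss")
            .withResolverStyle(ResolverStyle.STRICT);

    // Exception-based probe: every value that is not a timestamp constructs
    // an exception (message and stack trace) that is immediately discarded.
    static boolean looksLikeTimestampThrowing(String s) {
        try {
            LocalDateTime.parse(s, FORMAT);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    // Optional-based parse: invalid input yields Optional.empty() without
    // throwing, because every field is validated before LocalDateTime.of.
    static Optional<LocalDateTime> parseOptional(String s) {
        if (s == null || s.length() != 19) return Optional.empty();
        // Separators must sit at fixed positions.
        if (s.charAt(4) != '-' || s.charAt(7) != '-' || s.charAt(10) != ' '
                || s.charAt(13) != ':' || s.charAt(16) != ':') {
            return Optional.empty();
        }
        int year = digits(s, 0, 4);
        int month = digits(s, 5, 7);
        int day = digits(s, 8, 10);
        int hour = digits(s, 11, 13);
        int minute = digits(s, 14, 16);
        int second = digits(s, 17, 19);
        // digits() returns -1 for non-digit input, which the checks below reject.
        if (year < 0 || month < 1 || month > 12 || day < 1
                || hour < 0 || hour > 23 || minute < 0 || minute > 59
                || second < 0 || second > 59) {
            return Optional.empty();
        }
        if (day > YearMonth.of(year, month).lengthOfMonth()) return Optional.empty();
        return Optional.of(LocalDateTime.of(year, month, day, hour, minute, second));
    }

    // Reads s[from, to) as a non-negative decimal number; -1 on any non-digit.
    private static int digits(String s, int from, int to) {
        int value = 0;
        for (int i = from; i < to; i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') return -1;
            value = value * 10 + (c - '0');
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(parseOptional("2022-05-16 10:30:00").isPresent()); // true
        System.out.println(parseOptional("not a timestamp").isPresent());     // false
    }
}
```

During type inference most string values in a typical file are not timestamps, so the failure path is the hot path. That is why avoiding exception construction on that path matters far more than the speed of the successful parse.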