[ https://issues.apache.org/jira/browse/SPARK-26222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16719122#comment-16719122 ]
ASF GitHub Bot commented on SPARK-26222:
----------------------------------------

xuanyuanking opened a new pull request #23298: [SPARK-26222][SQL] Track file listing time
URL: https://github.com/apache/spark/pull/23298

## What changes were proposed in this pull request?

Tracking file listing time in the scan node's SQL metrics has already been done and improved in SPARK-20136/SPARK-26327. In this PR we use QueryPlanningTracker to track the start and end time of file listing.

## How was this patch tested?

Added tests for DataFrameWriter and the non-physical phases below:
- DataFrameReader.load, where file listing is triggered by DataSource.resolveRelation.
- Analyzer rules such as FindDataSourceTable.
- Optimizer rules such as PruneFileSourcePartitions and OptimizeMetadataOnlyQuery.

> Scan: track file listing time
> -----------------------------
>
>                 Key: SPARK-26222
>                 URL: https://issues.apache.org/jira/browse/SPARK-26222
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Reynold Xin
>            Priority: Major
>
> We should track file listing time and add it to the scan node's SQL metric,
> so we have visibility into how much time is spent in file listing. It'd be
> useful to track not just duration, but also start and end time so we can
> construct a timeline.
> This requires a little bit of design to define what file listing time means
> when we are reading from cache vs. not from cache.
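
For readers unfamiliar with the idea, here is a minimal sketch of recording a start and end time (not just a duration) for a named phase such as file listing, so that a timeline can be built afterwards. It is written in Python for brevity and is not Spark's QueryPlanningTracker API: `PhaseTracker`, `PhaseSummary`, `measure`, and the `"fileListing"` phase name are all hypothetical, and merging repeated runs into the earliest start and latest end is an assumption of this sketch rather than something stated in the pull request.

```python
import os
import time
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseSummary:
    """Start and end wall-clock time of a phase, in milliseconds."""
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


class PhaseTracker:
    """Hypothetical tracker; NOT Spark's QueryPlanningTracker."""

    def __init__(self, clock=lambda: int(time.time() * 1000)):
        # The clock is injectable so the tracker can be tested deterministically.
        self._clock = clock
        self.phases = {}

    @contextmanager
    def measure(self, name):
        start = self._clock()
        try:
            yield
        finally:
            end = self._clock()
            prev = self.phases.get(name)
            if prev is not None:
                # Assumption of this sketch: if a phase runs more than once,
                # keep the earliest start and the latest end.
                start = min(start, prev.start_ms)
                end = max(end, prev.end_ms)
            self.phases[name] = PhaseSummary(start, end)


if __name__ == "__main__":
    tracker = PhaseTracker()
    # Listing the current directory stands in for a data source's file listing.
    with tracker.measure("fileListing"):
        files = sorted(os.listdir("."))
    summary = tracker.phases["fileListing"]
    print(len(files), summary.start_ms, summary.end_ms, summary.duration_ms)
```

Keeping both timestamps, rather than only the elapsed time, is what makes it possible to place file listing on a timeline next to other planning phases, as the issue description asks.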