[jira] [Comment Edited] (SPARK-26222) Scan: track file listing time

Yuanjian Li (JIRA) Thu, 13 Dec 2018 07:54:03 -0800


    [ https://issues.apache.org/jira/browse/SPARK-26222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720088#comment-16720088 ]


Yuanjian Li edited comment on SPARK-26222 at 12/13/18 3:43 PM:
---------------------------------------------------------------

Hi Reynold, the first PR tracks all file listing time as an independent phase, 
which occurs in analyze, optimize and DataFrameReader/Writer. Does this match 
your thinking? Or do you want to display all of the 'real' file listing time 
('real' because the scan node's SQL metrics are currently read from the cache) 
in the FileSourceScanExec node?


was (Author: xuanyuan):
Hi Reynold, the first PR tracks all file listing time as an independent phase, 
which occurs in analyze, optimize and DataFrameReader/Writer. Does this match 
your thinking?

> Scan: track file listing time
> -----------------------------
>
>                 Key: SPARK-26222
>                 URL: https://issues.apache.org/jira/browse/SPARK-26222
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Reynold Xin
>            Priority: Major
>
> We should track file listing time and add it to the scan node's SQL metric, 
> so we have visibility into how much time is spent in file listing. It'd be 
> useful to track not just duration, but also start and end time so we can 
> construct a timeline.
> This requires a little bit of design to define what file listing time means 
> when we are reading from the cache versus not.
>  
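
As a rough illustration of the description above, here is a minimal sketch of recording start time, end time and duration for each file listing, and flagging whether the result came from a cache. It is not Spark code: `TimedFileLister`, `ListingTiming` and the in-memory cache are hypothetical names invented for this example, and Python is used only for brevity.

```python
import os
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ListingTiming:
    """One timeline entry for a file listing call (hypothetical type)."""
    start_ms: int     # wall-clock start, epoch milliseconds
    end_ms: int       # start_ms plus the measured duration
    from_cache: bool  # True when no real listing was performed

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


class TimedFileLister:
    """Wraps a listing function and records a timing for every call."""

    def __init__(self, list_fn: Callable[[str], List[str]] = os.listdir):
        self._list_fn = list_fn
        self._cache: Dict[str, List[str]] = {}
        self.timings: List[ListingTiming] = []

    def list_files(self, path: str) -> List[str]:
        start_ms = int(time.time() * 1000)
        t0 = time.monotonic()  # monotonic clock, so duration is never negative
        from_cache = path in self._cache
        if not from_cache:
            # Only this branch does a "real" listing.
            self._cache[path] = sorted(self._list_fn(path))
        elapsed_ms = int((time.monotonic() - t0) * 1000)
        self.timings.append(
            ListingTiming(start_ms, start_ms + elapsed_ms, from_cache))
        return self._cache[path]
```

Keeping `from_cache` on every entry is one way to leave the open design question visible: a consumer can report only the real listings, or show cache hits on the timeline as near-zero intervals.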



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
