I think I found my answer at https://github.com/kayousterhout/trace-analysis:

"One thing to keep in mind is that Spark does not currently include
instrumentation to measure the time spent reading input data from disk or
writing job output to disk (the "Output write wait" shown in the waterfall
is time to write shuffle output to disk, which Spark does have
instrumentation for); as a result, the time shown as "Compute" may include
time using the disk. We have a custom Hadoop branch that measures the time
Hadoop spends transferring data to/from disk, and we are hopeful that
similar timing metrics will someday be included in the Hadoop FileStatistics
API. In the meantime, it is not currently possible to understand how much of
a Spark task's time is spent reading from disk via HDFS."

That said, it might be worth adding this as a footnote to the event timeline
to avoid confusion :)
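
For anyone who needs a rough workaround in the meantime, one option is to
time the consumption of the input iterator inside mapPartitions. This is
only a sketch (the HDFS path is a placeholder), and note that it conflates
HDFS read time with deserialization/CPU time, which is exactly the ambiguity
described in the quote above, so treat it as an upper bound on disk time:

import org.apache.spark.{SparkConf, SparkContext}

object ReadTimeEstimate {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ReadTimeEstimate"))

    // Rough per-partition timing of input consumption. Draining the
    // iterator measures read + deserialization together, so this is
    // only an upper bound on actual HDFS/disk time.
    val timings = sc.textFile("hdfs:///path/to/input").mapPartitions { iter =>
      val start = System.nanoTime()
      var count = 0L
      while (iter.hasNext) { iter.next(); count += 1 }  // force the read
      val elapsedMs = (System.nanoTime() - start) / 1e6
      Iterator((elapsedMs, count))
    }

    timings.collect().foreach { case (ms, n) =>
      println(f"partition: $ms%.1f ms to read $n records")
    }
    sc.stop()
  }
}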

Best regards,

Tom Hubregtsen


