On Tue, Jul 22, 2014 at 7:08 AM, Yifan LI <iamyifa...@gmail.com> wrote:
> 1) what is the difference between "Duration"(Stages -> Completed Stages) > and "Task Time"(Executors) ? Stages are composed of tasks that run on executors. Tasks within a stage may run concurrently, since there are multiple executors and each executor may run more than one task at a time. An executor's task time is the sum of the durations of all of its tasks. Because this is a simple sum, it does not take parallelism into account: if an executor runs 8 tasks concurrently and each takes a minute, it has only spent one minute of wallclock time, but the reported task time will be 8 minutes. A stage's duration is how much wallclock time elapsed between when the first task launched and when the last task finished. This does take parallelism into account, so in the above example the stage duration would be 1 minute. 2) what are the exact meanings of "Shuffle Read/Shuffle Write"? Stages communicate using shuffles. Each task may start by reading shuffle inputs across the network, and may end by writing shuffle outputs to disk locally. See page 7 of the Spark NSDI paper <https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final138.pdf> for details. Shuffle read and shuffle write refer to the total amount of data that a stage read across the network and wrote to disk. Ankur <http://www.ankurdave.com/>