Hi,

I've developed a proof-of-concept Spark Streaming application, but it seems to perform better on my development machine than on our cluster. I submit it to YARN on our Cloudera cluster.

But my first question is more specific:

In the application UI (:4040), the Streaming tab shows that the batch
processing took 6 s. When I look at the stages, I indeed see a stage with a duration of 5 s.

For example:

  Stage 1678: map at LogonAnalysis.scala:215
  Submitted:       2015/07/09 09:17:00
  Duration:        5 s
  Tasks:           50/50
  Shuffle Write:   173.5 KB


But when I look at the details of stage 1678, it tells me the task durations were around 14 ms, and the aggregated metrics by executor show 1.0 s as the task time.
What is responsible for the gap between 14 ms, 1 s, and 5 s?


Details for Stage 1678
* Total task time across all tasks: 0.8 s
* Shuffle write: 173.5 KB / 2031 records
Summary Metrics for 50 Completed Tasks

  Metric                         Min          25th pct     Median       75th pct     Max
  Duration                       14 ms        14 ms        15 ms        15 ms        24 ms
  GC Time                        0 ms         0 ms         0 ms         0 ms         0 ms
  Shuffle Write Size / Records   2.6 KB / 28  3.1 KB / 35  3.5 KB / 42  3.9 KB / 46  4.4 KB / 53

Aggregated Metrics by Executor

  Executor ID  Address     Task Time  Total Tasks  Failed Tasks  Succeeded Tasks  Shuffle Write Size / Records
  2            xxxx:44231  1.0 s      50           0             50               173.5 KB / 2031
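As a sanity check I did some back-of-the-envelope arithmetic myself (this is my own sketch, not something from the UI, and it assumes the median duration applies to all 50 tasks):

```python
# Rough reconciliation of the per-task numbers with the aggregated task time.
# Assumption: all 50 tasks take ~15 ms, the median duration reported in the UI.
task_durations_ms = [15] * 50

total_task_time_s = sum(task_durations_ms) / 1000.0
print(total_task_time_s)  # 0.75 -- close to the reported 0.8 s / 1.0 s
```

So the 14 ms vs 1 s gap looks like it is just per-task time vs summed task time across all 50 tasks. What I still don't understand is the remaining jump from ~1 s of task time to the 5 s stage duration.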





