Would you mind posting the code?
On 2 Jun 2015 00:53, "Karlson" <ksonsp...@siberie.de> wrote:

> Hi,
>
> In all (PySpark) Spark jobs that become somewhat more involved, I am
> experiencing the issue that some stages take a very long time to complete,
> and sometimes never finish at all. This clearly correlates with the size of
> my input data. Looking at the stage details for one such stage, I am
> wondering where Spark spends all this time. Take this table of the stage's
> task metrics, for example:
>
> Metric                      Min           25th percentile  Median        75th percentile  Max
> Duration                    1.4 min       1.5 min          1.7 min       1.9 min          2.3 min
> Scheduler Delay             1 ms          3 ms             4 ms          5 ms             23 ms
> Task Deserialization Time   1 ms          2 ms             3 ms          8 ms             22 ms
> GC Time                     0 ms          0 ms             0 ms          0 ms             0 ms
> Result Serialization Time   0 ms          0 ms             0 ms          0 ms             1 ms
> Getting Result Time         0 ms          0 ms             0 ms          0 ms             0 ms
> Input Size / Records        23.9 KB / 1   24.0 KB / 1      24.1 KB / 1   24.1 KB / 1      24.3 KB / 1
>
> Why is the overall duration almost 2 minutes? Where is all this time spent
> when no progress on the stage is visible? The progress bar simply displays
> 0 succeeded tasks for a very long time before, sometimes, slowly progressing.
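>
> As a rough sketch of the arithmetic (using the median column above, and
> assuming the UI only breaks out the metrics listed): the reported overheads
> total only a few milliseconds, so nearly the whole 1.7 min median duration
> must be executor compute time, which for a PySpark job is likely spent
> mostly in the Python worker processes and is not itemised in this table.
>
>     # Rough sketch: median column of the table above
>     median_duration_ms = 1.7 * 60 * 1000   # Duration: 1.7 min
>     # scheduler delay + deserialization + GC + result serialization + getting result
>     accounted_ms = 4 + 3 + 0 + 0 + 0
>     unaccounted_ms = median_duration_ms - accounted_ms
>     print(unaccounted_ms)                   # ~101993 ms of executor / Python worker time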
>
> Also, the name of the stage displayed above is `javaToPython at null:-1`,
> which I find very uninformative. I don't even know exactly which action is
> responsible for this stage. Does anyone experience similar issues or have
> any advice for me?
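>
> (A minimal sketch, assuming a SparkContext `sc` and a DataFrame `df` as
> placeholders: one way to tell which action is behind an anonymously named
> stage is to tag jobs with SparkContext.setJobGroup before each action, so
> the description appears next to the job and its stages in the web UI.)
>
>     # Hypothetical labelling of actions so they can be told apart in the UI
>     sc.setJobGroup("step-1", "count() after the join")
>     n = df.count()
>     sc.setJobGroup("step-2", "collect() of the final result")
>     rows = df.collect()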
>
> Thanks!
>
