A job can definitely have multiple stages, one for each shuffle boundary in
the lineage behind the action. One action triggers one job, and every stage
in that job is labeled with the call site of that action, which is why they
all show the same line. This looks normal.
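
For illustration, here's a minimal sketch (hypothetical StageDemo app, not
your code) of how one count() action fans out into several stages, one per
shuffle boundary:

import org.apache.spark.sql.SparkSession

object StageDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("StageDemo").getOrCreate()
    import spark.implicits._

    // Two shuffle boundaries (repartition, groupBy) sit behind the single
    // count() action below, so the UI shows one job containing several
    // stages, every one labeled "count at StageDemo.scala:<line>".
    val grouped = spark.range(1000000L).toDF("id")
      .repartition(200)
      .groupBy($"id" % 10)
      .count()

    println(grouped.count())  // one action -> one job, multiple stages
    spark.stop()
  }
}

If the lineage behind your count at line 540 contains many shuffles (joins,
unions, iterative transformations), a single job can easily accumulate a
hundred or more stages, all tagged with the same call site.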

On Thu, Apr 21, 2022, 9:10 AM Joe <j...@net2020.org> wrote:

> Hi,
> When looking at the application UI (in Amazon EMR) I'm seeing one job for
> a particular line of my code, for example:
> 64 Running count at MySparkJob.scala:540
>
> When I click into the job and go to stages I can see over 100 stages
> running the same line of code (stages are active, pending, or
> completed):
> 190 Pending count at MySparkJob.scala:540
> ...
> 162 Active count at MySparkJob.scala:540
> ...
> 108 Completed count at MySparkJob.scala:540
> ...
>
> I'm not sure what that means. I thought a stage was a logical
> operation boundary, that a job could have only one stage (unless you
> executed the same dataset+action many times on purpose), and that tasks
> were the units replicated across partitions. But here I'm seeing many
> stages running, each showing the same line of code?
>
> I don't have a situation where my code re-processes the same set of
> data many times; all intermediate sets are persisted. I'm not sure if
> the EMR UI display is wrong or if Spark stages are not what I thought
> they were.
> Thanks,
>
> Joe
>
