The line of code triggers a job, and the job triggers stages. If you look
at them, you should see they are different operations, all supporting
execution of the action on that line.
On Thu, Apr 21, 2022 at 9:24 AM Joe wrote:
> Hi Sean,
> Thanks for replying but my question was about multiple stages running
> the same
There are a few things going on here.
1. Spark is lazy, so nothing happens until a result is collected back to the
driver or data is written to a sink. So the one line you see
is most likely just the trigger. Once triggered, all of the work required to
produce that final result happens. If
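To illustrate the laziness point, here is a minimal sketch (not from the thread; the path and column names are made up, and a SparkSession named `spark` is assumed):

```scala
import org.apache.spark.sql.functions.col

// Transformations only build up a plan; no job appears in the UI yet.
val df       = spark.read.parquet("s3://my-bucket/input")   // hypothetical path
val filtered = df.filter(col("status") === "active")        // still no job
val grouped  = filtered.groupBy(col("region")).count()      // still no job

// The action below is the trigger: the UI shows one job attributed to
// this line, but that job executes all of the work planned above.
grouped.show()
```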
Hi Sean,
Thanks for replying but my question was about multiple stages running
the same line of code, not about multiple stages in general. Yes, a
single job can have multiple stages, but they should not be repeated,
as far as I know, if you're caching/persisting your intermediate
outputs.
My
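A hedged sketch of the caching Joe is describing (names and path are illustrative, and a SparkSession named `spark` is assumed):

```scala
import org.apache.spark.sql.functions.col
import org.apache.spark.storage.StorageLevel

// Persist an intermediate result so later actions reuse it
// instead of recomputing the stages that produced it.
val base = spark.read.parquet("s3://my-bucket/input")   // hypothetical path
  .filter(col("valid") === true)
  .persist(StorageLevel.MEMORY_AND_DISK)                // or .cache()

base.count()  // first action materializes the cached partitions

// Subsequent actions read from the cache; in the UI, the stages that
// built `base` show up as skipped rather than running again.
base.groupBy(col("region")).count().show()
```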
A job can have multiple stages for sure. One action triggers a job. This
seems normal.
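For example, a single action can still mean several stages, because Spark splits a job at every shuffle boundary (a sketch; `df` stands in for any DataFrame, and `spark.implicits` are assumed in scope):

```scala
import org.apache.spark.sql.functions.col

// groupBy introduces a shuffle, so the one job triggered by the
// action below is split into at least two stages.
val counts = df.groupBy(col("region")).count()  // transformation only: no job
counts.collect()  // one action -> one job with multiple stages
```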
On Thu, Apr 21, 2022, 9:10 AM Joe wrote:
> Hi,
> When looking at the application UI (in Amazon EMR) I'm seeing one job for
> my particular line of code, for example:
> 64 Running count at MySparkJob.scala:540
Hi,
When looking at the application UI (in Amazon EMR) I'm seeing one job for
my particular line of code, for example:
64 Running count at MySparkJob.scala:540
When I click into the job and go to stages I can see over 100 stages
running the same line of code (stages are active, pending or