And to be clear: yes, execution plans show exactly what it's doing. The
problem is that it's unclear how that relates to the actual Scala/Python
code.
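For what it's worth, the operator arguments in the plan (join keys,
aggregate expressions, filter conditions) are usually the most reliable
way to match a plan node back to a line of code. E.g. for something like
this (names are made up):

    val report = events.join(users, "userId")
      .groupBy("country").count()
    report.explain()

the physical plan contains nodes along the lines of
SortMergeJoin [userId] and HashAggregate(keys=[country], ...), which you
can match to the join and groupBy calls by their keys. But doing that by
hand across a big job is exactly the tedious part.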
On 7/21/20 15:45, Michal Sankot wrote:
Yes, the problem is that DAGs only refer to the code line (the action)
that invoked them. They don't provide information about how individual
transformations link to the code.
So you can have a dozen stages, each tagged with the same invoking code
line, doing different things. And then we're left guessing what each one
is actually doing.
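For illustration, a pipeline like this (names are made up) typically
produces several shuffle stages, and in the UI every one of them reports
the single line with the action that triggered the job:

    val users = spark.read.parquet("/data/users")
    val report = spark.read.parquet("/data/events")
      .filter("status = 'ok'")       // narrow, fused into the scan stage
      .join(users, "userId")         // shuffle -> new stage
      .groupBy("country").count()    // shuffle -> new stage

    report.write.parquet("/data/report")  // all stages cite this line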
On 7/21/20 15:36, Russell Spitzer wrote:
Have you looked at the DAG visualization? Each block refers to the
code line invoking it.
For DataFrames, the execution plan will tell you explicitly which
operations are in which stages.
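For example, with whatever DataFrame df you're about to run:

    // Physical plan only:
    df.explain()

    // Parsed, analyzed and optimized logical plans plus the physical plan:
    df.explain(true)

    // Spark 3.0+: formatted mode lists operators per codegen node, which
    // is easier to match against the boxes in the SQL tab.
    df.explain("formatted")

The numbered WholeStageCodegen blocks in the SQL tab correspond to the
codegen IDs in the plan output.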
On Tue, Jul 21, 2020, 8:18 AM Michal Sankot
<michal.san...@spreaker.com.invalid> wrote:
Hi,
when I analyze and debug our Spark batch job executions, it's a pain to
work out how blocks in the Spark UI Jobs/SQL tabs correspond to the
actual Scala code that we write, and how much time they take. Would
there be a way to instruct the compiler, or something similar, so that
this information ends up in the Spark UI?
At the moment, linking Spark UI elements to our code is guesswork driven
by adding and removing lines of code and rerunning the job, which is
tedious. Anything that would make our life easier, e.g. a dedicated
debug mode for Spark jobs in which this information is available, would
be greatly appreciated. (Though I don't know whether it's possible at
all.)
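The closest workaround I can think of is wrapping every action in a
small helper that sets a job description, something like this
hypothetical sketch (traced is a made-up name):

    import org.apache.spark.sql.SparkSession

    // Runs an action under a descriptive label so that the jobs it
    // triggers are identifiable in the Spark UI Jobs/SQL tabs.
    def traced[T](spark: SparkSession, label: String)(body: => T): T = {
      spark.sparkContext.setJobDescription(label)
      try body
      finally spark.sparkContext.setJobDescription(null)  // clear the label
    }

    // Usage: val n = traced(spark, "count deduped events") { events.count() }

But that only labels whole jobs, not the individual transformations
inside them.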
Thanks,
Michal
--
Michal Sankot
Big Data Engineer
E: michal.san...@voxnest.com