[ https://issues.apache.org/jira/browse/SPARK-21157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061582#comment-16061582 ]
Jose Soltren commented on SPARK-21157:
--------------------------------------

Hello Marcelo - It's true, the design doc doesn't discuss flights very well. Let me give my thoughts on them here, and I'll propagate this back to the design doc at a later time.

First off, I spent a while trying to come up with a catchy name for "a period of time during which the number of stages in execution is constant", and arrived at "flight". If you have a better term, I would love to hear it. Let's stick with "flight" for now.

After giving it some thought, I don't think the end user needs to care about flights at all. Here's what I think the user does care about: seeing some metrics (min/max/mean/stdev) for how different types of memory are consumed during a particular stage.

I haven't worked out the details of the data store yet, but what I envision is a store of key-value pairs, where the key is the start time of a particular flight, and the values are the metrics associated with that flight along with the stages that were running for the duration of that flight. Then, for a particular stage, we would be able to query all of the flights during which the stage was active, get min/max/mean/stdev metrics for each of those flights, and aggregate them to get total metrics for that stage. These total metrics for the stage would be shown in the Stages UI. Of course, with this data store, you could also directly query statistics for a particular flight.

Note that there is no precise way to determine the memory used by a particular stage at a given time unless it was the only stage active in that flight. If memory usage per stage were constant, then we could possibly impute the memory usage of a single stage from all of its flight statistics. Since that is not feasible, the UI would make clear that these are total memory metrics for the executors while the stage was running, not metrics specific to that stage.
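To make the data-store idea concrete, here is a minimal Python sketch of a flight-keyed store and a per-stage aggregation query. The `Flight` type, the `stage_metrics` helper, and the sample values are all hypothetical illustrations, not part of Spark:

```python
# Sketch of a flight-keyed store: key = flight start time, value = the
# stages active in that flight plus the executor memory samples taken
# during it. All names and values here are illustrative only.
from dataclasses import dataclass
from statistics import mean, stdev

@dataclass
class Flight:
    start_time: int        # key: timestamp at which this flight began
    active_stages: set     # stage IDs running for the flight's duration
    memory_samples: list   # executor memory readings (bytes) in this flight

def stage_metrics(store, stage_id):
    """Aggregate min/max/mean/stdev over every flight in which the
    given stage was active - the 'total metrics' shown per stage."""
    samples = [s for f in store.values()
                 if stage_id in f.active_stages
                 for s in f.memory_samples]
    return {
        "min": min(samples),
        "max": max(samples),
        "mean": mean(samples),
        "stdev": stdev(samples) if len(samples) > 1 else 0.0,
    }

# Two consecutive flights: stage 1 spans both, stage 2 only the second.
store = {
    100: Flight(100, {1}, [512, 640]),
    200: Flight(200, {1, 2}, [768, 896]),
}
```

Querying `stage_metrics(store, 1)` folds in both flights, while `store[200]` gives the statistics for a single flight directly; note that stage 1's numbers also include memory used by stage 2 during their shared flight, which is exactly the imprecision described above.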
Even this should be enough for an end user to do some detective work and determine which stage is hogging memory. I glossed over some of these details since I thought they were well covered in SPARK-9103.

I hope this clarifies things somewhat. If not, please let me know how I can clarify this further. Cheers.

> Report Total Memory Used by Spark Executors
> -------------------------------------------
>
>                 Key: SPARK-21157
>                 URL: https://issues.apache.org/jira/browse/SPARK-21157
>             Project: Spark
>          Issue Type: Improvement
>          Components: Input/Output
>    Affects Versions: 2.1.1
>            Reporter: Jose Soltren
>         Attachments: TotalMemoryReportingDesignDoc.pdf
>
> Building on some of the core ideas of SPARK-9103, this JIRA proposes tracking
> total memory used by Spark executors, and a means of broadcasting,
> aggregating, and reporting memory usage data in the Spark UI.
> Here, "total memory used" refers to memory usage that is visible outside of
> Spark, to an external observer such as YARN, Mesos, or the operating system.
> The goal of this enhancement is to give Spark users more information about
> how Spark clusters are using memory. Total memory will include non-Spark JVM
> memory and all off-heap memory.
> Please consult the attached design document for further details.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
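As a rough illustration of "memory visible to an external observer", here is a sketch using Python's standard-library `resource` module to read the process's peak resident set size, the figure the operating system accounts against the process regardless of how it was allocated. The helper name is mine, not a Spark API:

```python
# Hedged sketch: read this process's peak RSS the way the OS sees it,
# via the stdlib resource module. get_peak_rss_kib is an illustrative
# helper name, not part of Spark.
import resource
import sys

def get_peak_rss_kib():
    """Return the peak resident set size of this process in KiB.

    ru_maxrss is reported in KiB on Linux but in bytes on macOS,
    so normalize to KiB.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        peak //= 1024
    return peak
```

A per-executor reading like this, sampled once per flight and shipped back to the driver, is the kind of raw data the store sketched earlier would aggregate; it naturally covers non-Spark JVM memory and off-heap allocations, since the OS counts them all.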