[ 
https://issues.apache.org/jira/browse/SPARK-21157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16179496#comment-16179496
 ] 

Thomas Graves commented on SPARK-21157:
---------------------------------------

Just to point out that YARN/MapReduce/Tez already have this functionality. I'm
not saying we need to use it, just adding it for reference.

abstract base class:
https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorProcessTree.java
procfs based:
https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ProcfsBasedProcessTree.java
windows based:
https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/WindowsBasedProcessTree.java

This won't help for other resource managers, but YARN automatically puts the pid
in the container env for you:
    final String pid = System.getenv().get("JVM_PID");
It makes sense to do something more resource-manager generic; I'm just pointing
this out in case other resource managers do something similar.
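
For illustration, a minimal sketch of a resource-manager-generic lookup: prefer
the JVM_PID variable mentioned above and fall back to the JVM's own view of its
pid when it is absent. The class and method names (ExecutorPid, fromEnv,
resolve) are hypothetical and not part of Spark or YARN; the fallback assumes
Java 9 or later for ProcessHandle.

```java
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical helper, not an existing Spark or YARN class.
final class ExecutorPid {
    private ExecutorPid() {}

    /** Returns the pid found under JVM_PID, or empty if it is missing or not a positive number. */
    static OptionalLong fromEnv(Map<String, String> env) {
        String raw = env.get("JVM_PID");
        if (raw == null) {
            return OptionalLong.empty();
        }
        try {
            long pid = Long.parseLong(raw.trim());
            return pid > 0 ? OptionalLong.of(pid) : OptionalLong.empty();
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    /** Prefers the environment variable; otherwise asks the JVM (Java 9+). */
    static long resolve(Map<String, String> env) {
        return fromEnv(env).orElseGet(() -> ProcessHandle.current().pid());
    }

    public static void main(String[] args) {
        System.out.println("executor pid = " + resolve(System.getenv()));
    }
}
```

Passing the environment in as a map keeps the lookup testable and leaves room
for other resource managers to supply the pid under a different key.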


It would be nice to have a few more details in the design. Are you getting both
resident and virtual memory on Linux? Are you doing anything with dirty pages?
Are you walking the entire process tree or just the executor JVM?
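
To make the process-tree question concrete, here is a rough sketch of
collecting the pids of the current JVM and everything it has spawned (for
example Python workers). It uses the standard Java 9+ ProcessHandle API rather
than the YARN classes linked above; the class name ProcessTreePids is
hypothetical, and this is not necessarily what the design doc does.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration only.
final class ProcessTreePids {
    private ProcessTreePids() {}

    /** Pids of this JVM (first element) followed by all of its live descendants. */
    static List<Long> selfAndDescendants() {
        ProcessHandle self = ProcessHandle.current();
        return Stream.concat(Stream.of(self), self.descendants())
                .map(ProcessHandle::pid)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("process tree pids = " + selfAndDescendants());
    }
}
```

Measuring only the first pid corresponds to "just the executor JVM"; summing a
per-process metric over the whole list corresponds to walking the tree.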

Have you looked at the performance implications at all? Depending on the
information you are getting, how does pmap compare to cat /proc/pid/stat or
/proc/pid/smaps?
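
As a point of comparison for the pmap question, a minimal sketch of reading
both virtual and resident size with a single small file read. It parses the
VmSize and VmRSS fields of /proc/<pid>/status, which Linux reports in kB; this
is a different file from the stat and smaps files named above, chosen here only
because its fields are labeled. The class name ProcStatusMemory is
hypothetical, the code is Linux-only, and no performance claim is implied.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper for illustration only; Linux procfs is assumed.
final class ProcStatusMemory {
    final long virtualKb;   // VmSize, or -1 if the field was not present
    final long residentKb;  // VmRSS, or -1 if the field was not present

    ProcStatusMemory(long virtualKb, long residentKb) {
        this.virtualKb = virtualKb;
        this.residentKb = residentKb;
    }

    /** Extracts VmSize and VmRSS from the lines of a /proc/<pid>/status file. */
    static ProcStatusMemory parse(List<String> lines) {
        long vm = -1;
        long rss = -1;
        for (String line : lines) {
            if (line.startsWith("VmSize:")) {
                vm = parseKb(line);
            } else if (line.startsWith("VmRSS:")) {
                rss = parseKb(line);
            }
        }
        return new ProcStatusMemory(vm, rss);
    }

    /** Takes a line such as "VmRSS:    512000 kB" and returns the number. */
    private static long parseKb(String line) {
        String value = line.substring(line.indexOf(':') + 1).trim();
        return Long.parseLong(value.split("\\s+")[0]);
    }

    /** Reads and parses the status file of the given pid. */
    static ProcStatusMemory read(long pid) throws IOException {
        return parse(Files.readAllLines(Paths.get("/proc", Long.toString(pid), "status")));
    }
}
```

Keeping parse separate from read means the parsing can be exercised with
canned lines on any platform, while read only works where /proc exists.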


> Report Total Memory Used by Spark Executors
> -------------------------------------------
>
>                 Key: SPARK-21157
>                 URL: https://issues.apache.org/jira/browse/SPARK-21157
>             Project: Spark
>          Issue Type: Improvement
>          Components: Input/Output
>    Affects Versions: 2.1.1
>            Reporter: Jose Soltren
>         Attachments: TotalMemoryReportingDesignDoc.pdf
>
>
> Building on some of the core ideas of SPARK-9103, this JIRA proposes tracking 
> total memory used by Spark executors, and a means of broadcasting, 
> aggregating, and reporting memory usage data in the Spark UI.
> Here, "total memory used" refers to memory usage that is visible outside of 
> Spark, to an external observer such as YARN, Mesos, or the operating system. 
> The goal of this enhancement is to give Spark users more information about 
> how Spark clusters are using memory. Total memory will include non-Spark JVM 
> memory and all off-heap memory.
> Please consult the attached design document for further details.



