[jira] [Commented] (SPARK-21157) Report Total Memory Used by Spark Executors

2018-03-07 Thread assia ydroudj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389683#comment-16389683
 ] 

assia ydroudj commented on SPARK-21157:
---

Is there a PR for this please?

> Report Total Memory Used by Spark Executors
> ---
>
> Key: SPARK-21157
> URL: https://issues.apache.org/jira/browse/SPARK-21157
> Project: Spark
>  Issue Type: Improvement
>  Components: Input/Output
>Affects Versions: 2.1.1
>Reporter: Jose Soltren
>Priority: Major
> Attachments: TotalMemoryReportingDesignDoc.pdf
>
>
> Building on some of the core ideas of SPARK-9103, this JIRA proposes tracking 
> total memory used by Spark executors, and a means of broadcasting, 
> aggregating, and reporting memory usage data in the Spark UI.
> Here, "total memory used" refers to memory usage that is visible outside of 
> Spark, to an external observer such as YARN, Mesos, or the operating system. 
> The goal of this enhancement is to give Spark users more information about 
> how Spark clusters are using memory. Total memory will include non-Spark JVM 
> memory and all off-heap memory.
> Please consult the attached design document for further details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21157) Report Total Memory Used by Spark Executors

2018-02-22 Thread assia ydroudj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372789#comment-16372789
 ] 

assia ydroudj commented on SPARK-21157:
---

This paragraph is particularly interesting to me: "Perhaps we could
show a trend line of memory usage allowing a user to see at which stage
boundary memory usage increased." Has this been done?
Also, how can we export the gathered memory reporting values to a JSON file or
another format for further analysis?
Thanks







[jira] [Commented] (SPARK-21157) Report Total Memory Used by Spark Executors

2018-01-10 Thread assia ydroudj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320293#comment-16320293
 ] 

assia ydroudj commented on SPARK-21157:
---



I am a beginner in Apache Spark and have installed a prebuilt distribution of
Apache Spark with Hadoop. I am looking to get the memory consumption while
running the PageRank example implemented in Spark. My cluster runs in
standalone mode with 1 master and 4 workers (virtual machines).

I have tried external tools like Ganglia and Graphite, but they report memory
usage at the resource or system level (more general). What I need exactly is to
track the behavior of memory (storage, execution) while the algorithm runs,
i.e. the memory usage for a given Spark application ID. Is there any way to get
this into a text file for further analysis? Please help me with this, thanks.








[jira] [Commented] (SPARK-21157) Report Total Memory Used by Spark Executors

2017-09-25 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179496#comment-16179496
 ] 

Thomas Graves commented on SPARK-21157:
---

Just to point out that YARN/MapReduce/Tez already have this functionality. Not
saying we need to use it, but adding it for reference:

https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorProcessTree.java
procfs based:
https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ProcfsBasedProcessTree.java
windows based:
https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/WindowsBasedProcessTree.java

This won't help for other resource managers, but YARN automatically puts the pid
in the container env for you:
final String pid = System.getenv().get("JVM_PID");
It makes sense to do something more resource-manager-generic; I'm just pointing
this out in case other resource managers do the same.

It would be nice to have a few more details in the design. Are you getting both
resident and virtual memory on Linux? Are you doing anything with dirty pages?
Are you walking the entire process tree, or just the executor JVM?

Have you looked at all at the performance implications? Depending on the
information you are getting, how does pmap compare to cat /proc/<pid>/stat or
/proc/<pid>/smaps?
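To make concrete what the procfs-based approach reads, here is a small hedged sketch (a hypothetical illustration only, not Spark or Hadoop code; the class name and the hard-coded 4 KiB page size are my assumptions) that extracts the resident set size from a /proc/<pid>/stat line:

```java
// Hypothetical sketch: parse the RSS field out of a /proc/<pid>/stat line,
// which is the same data a procfs-based process-tree monitor reads.
public class ProcStatRss {

    // Assumed page size; a real implementation should query sysconf(_SC_PAGESIZE).
    static final long PAGE_SIZE = 4096;

    // Field 24 (1-based) of /proc/<pid>/stat is rss, in pages. The comm field
    // (field 2) may contain spaces, so skip past the closing ')' before splitting.
    static long rssBytes(String statLine) {
        int close = statLine.lastIndexOf(')');
        String[] rest = statLine.substring(close + 2).split(" ");
        // rest[0] is field 3 (state), so field 24 (rss) lands at rest[21].
        long rssPages = Long.parseLong(rest[21]);
        return rssPages * PAGE_SIZE;
    }

    public static void main(String[] args) {
        // Sample stat line for a fictitious process with rss = 2560 pages.
        String sample = "1234 (java) S 1 1234 1234 0 -1 4194304 "
            + "100 0 0 0 10 5 0 0 20 0 8 0 500 1048576000 2560 0 0 0 0";
        System.out.println(rssBytes(sample)); // 2560 * 4096 = 10485760
    }
}
```

As I understand it, ProcfsBasedProcessTree goes further than this sketch: it walks the whole child-process tree under /proc and sums the per-process values, which matters when an executor forks helper processes.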








[jira] [Commented] (SPARK-21157) Report Total Memory Used by Spark Executors

2017-09-23 Thread Wang Haihua (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16177814#comment-16177814
 ] 

Wang Haihua commented on SPARK-21157:
-

Does this include the RES (resident) memory of each executor?

It is useful to trace memory usage. We implemented a simple demo a year ago
(collecting the RES memory usage by simulating the YARN container memory
monitor and sending the metrics to the driver via heartbeat), since we needed
to collect real memory usage statistics for cluster and application
optimization.







[jira] [Commented] (SPARK-21157) Report Total Memory Used by Spark Executors

2017-06-23 Thread Jose Soltren (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061582#comment-16061582
 ] 

Jose Soltren commented on SPARK-21157:
--

Hello Marcelo - It's true, the design doc doesn't discuss flights very well. 
Let me give my thoughts on them here, and I'll propagate this back to the 
design doc at a later time.

So, first off, I thought for a while to come up with a catchy name for "a 
period of time during which the number of stages in execution is constant". I 
came up with "flight". If you have a better term I would love to hear your 
thoughts. Let's stick with "flight" for now.

After giving it some thought, I don't think the end user needs to care about 
flights at all. Here's what I think the user does care about: seeing some 
metrics (min/max/mean/stdev) for how different types of memory are consumed for 
a particular stage.

I haven't worked out the details of the data store yet, but, what I envision is 
a data store of key-value pairs, where the key is the start time of a 
particular flight, and the values are the metrics associated with that flight 
and the stages that were running for the duration of that flight. Then, for a 
particular stage, we would be able to query all of the flights during which 
this stage was active, get min/max/mean/stdev metrics for each of those 
flights, and aggregate them to get total metrics for that particular stage.

These total metrics for the stage would be shown in the Stages UI.

Of course, with this data store, you could directly query statistics for a 
particular flight.
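A hedged sketch of how such a store could look (every name here, FlightStore included, is hypothetical and not taken from the design doc; this only illustrates the key-value shape described above):

```java
import java.util.*;

// Hypothetical sketch of a flight-keyed metrics store: key = flight start
// time, value = the stages active during that flight plus memory samples.
public class FlightStore {

    static class Flight {
        final Set<Integer> activeStages;
        final List<Long> memSamples; // total executor memory samples, in bytes
        Flight(Set<Integer> activeStages, List<Long> memSamples) {
            this.activeStages = activeStages;
            this.memSamples = memSamples;
        }
    }

    // TreeMap keeps flights ordered by start time, so a time-range query
    // for a single flight's statistics is also straightforward.
    private final NavigableMap<Long, Flight> flights = new TreeMap<>();

    void record(long flightStartMs, Set<Integer> stages, List<Long> samples) {
        flights.put(flightStartMs, new Flight(stages, samples));
    }

    // Aggregate min/max/mean/count over every flight in which the stage was
    // active -- the per-stage rollup that would back the Stages UI.
    LongSummaryStatistics stageStats(int stageId) {
        return flights.values().stream()
                .filter(f -> f.activeStages.contains(stageId))
                .flatMap(f -> f.memSamples.stream())
                .mapToLong(Long::longValue)
                .summaryStatistics();
    }

    public static void main(String[] args) {
        FlightStore store = new FlightStore();
        store.record(1000L, new HashSet<>(Arrays.asList(1)), Arrays.asList(100L, 200L));
        store.record(2000L, new HashSet<>(Arrays.asList(1, 2)), Arrays.asList(300L));
        LongSummaryStatistics s = store.stageStats(1);
        // Stage 1 was active in both flights: min 100, max 300, mean 200.
        System.out.println(s.getMin() + " " + s.getMax() + " " + (long) s.getAverage());
    }
}
```

Note that, exactly as described above, stageStats aggregates over flights, so its numbers are "total executor memory while the stage was running", not memory attributable to that stage alone.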

Note that there is not a precise way to determine memory used for a particular 
stage at a given time unless it was the only stage active in that flight. If 
memory usage for stages were constant then we could possibly impute the memory 
usage for a single stage given all of its flight statistics. This is not 
feasible, so, the UI would be clear that these were total memory metrics for 
executors while the stage was running, and not specific to that stage. Even 
this should be enough for an end user to do some detective work and determine 
which stage is hogging memory.

I glossed over some of these details since I thought they were well covered in 
SPARK-9103. I hope this clarifies things somewhat. If not, please let me know 
how I can clarify this further. Cheers.







[jira] [Commented] (SPARK-21157) Report Total Memory Used by Spark Executors

2017-06-20 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056779#comment-16056779
 ] 

Marcelo Vanzin commented on SPARK-21157:


I'm pretty confused by the description of flights. How are they presented to
the user? I see nothing in the API that lets me get information about a
specific flight, nor anything in the mock UI that shows how different stage
boundaries affect the metrics, so it's unclear from the document how flights
would be used, or how they would affect the information that is shown.

The whole section seems to indicate that you would have some API for 
stage-specific memory usage, or some extra information about how stages affect 
memory usage in the executors, but I see nothing like that being explained. 
Instead, I see a description of flights and the rest of the document pretty 
much ignores them.



