[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics

2018-04-18 Thread assia ydroudj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442379#comment-16442379
 ] 

assia ydroudj commented on SPARK-23206:
---

[~elu], thank you for the shared dos, it works for me

Is there a final PR to get the executor metrics ?

 

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, ExecutorsTab2.png, 
> MemoryTuningMetricsDesignDoc.pdf, SPARK-23206 Design Doc.pdf, StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23429) Add executor memory metrics to heartbeat and expose in executors REST API

2018-03-21 Thread assia ydroudj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16408723#comment-16408723
 ] 

assia ydroudj commented on SPARK-23429:
---

Hi  [~elu], 

Still interesting to get this values... can you please provide how to do it?

thanks.

> Add executor memory metrics to heartbeat and expose in executors REST API
> -
>
> Key: SPARK-23429
> URL: https://issues.apache.org/jira/browse/SPARK-23429
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
>
> Add new executor level memory metrics ( jvmUsedMemory, executionMemory, 
> storageMemory, and unifiedMemory), and expose these via the executors REST 
> API. This information will help provide insight into how executor and driver 
> JVM memory is used, and for the different memory regions. It can be used to 
> help determine good values for spark.executor.memory, spark.driver.memory, 
> spark.memory.fraction, and spark.memory.storageFraction.
> Add an ExecutorMetrics class, with jvmUsedMemory, executionMemory, and 
> storageMemory. This will track the memory usage at the executor level. The 
> new ExecutorMetrics will be sent by executors to the driver as part of the 
> Heartbeat. A heartbeat will be added for the driver as well, to collect these 
> metrics for the driver.
> Modify the EventLoggingListener to log ExecutorMetricsUpdate events if there 
> is a new peak value for one of the memory metrics for an executor and stage. 
> Only the ExecutorMetrics will be logged, and not the TaskMetrics, to minimize 
> additional logging. Analysis on a set of sample applications showed an 
> increase of 0.25% in the size of the Spark history log, with this approach.
> Modify the AppStatusListener to collect snapshots of peak values for each 
> memory metric. Each snapshot has the time, jvmUsedMemory, executionMemory and 
> storageMemory, and list of active stages.
> Add the new memory metrics (snapshots of peak values for each memory metric) 
> to the executors REST API.
> This is a subtask for SPARK-23206. Please refer to the design doc for that 
> ticket for more details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23738) Memory usage for executors

2018-03-19 Thread assia ydroudj (JIRA)
assia ydroudj created SPARK-23738:
-

 Summary: Memory usage for executors
 Key: SPARK-23738
 URL: https://issues.apache.org/jira/browse/SPARK-23738
 Project: Spark
  Issue Type: Question
  Components: Spark Core, Spark Submit
Affects Versions: 2.1.0
Reporter: assia ydroudj


Hi, 

I'm running a spark cluster with 3 nodes (one as master+ worker) and 2 other 
workers. each worker has one executor. Then, I execute the pagerank example 
implemented in spark. I need to gather the memory usage by each executor while 
running the application and gather them into a file to further analysis. 

How can I do it please? 

One idea in my mind is to get the PID  of executor ans driver processes and 
then, use a linux command line to get this information..is it right?

please, guide me



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21157) Report Total Memory Used by Spark Executors

2018-03-07 Thread assia ydroudj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389683#comment-16389683
 ] 

assia ydroudj commented on SPARK-21157:
---

Is there a PR for this please?

> Report Total Memory Used by Spark Executors
> ---
>
> Key: SPARK-21157
> URL: https://issues.apache.org/jira/browse/SPARK-21157
> Project: Spark
>  Issue Type: Improvement
>  Components: Input/Output
>Affects Versions: 2.1.1
>Reporter: Jose Soltren
>Priority: Major
> Attachments: TotalMemoryReportingDesignDoc.pdf
>
>
> Building on some of the core ideas of SPARK-9103, this JIRA proposes tracking 
> total memory used by Spark executors, and a means of broadcasting, 
> aggregating, and reporting memory usage data in the Spark UI.
> Here, "total memory used" refers to memory usage that is visible outside of 
> Spark, to an external observer such as YARN, Mesos, or the operating system. 
> The goal of this enhancement is to give Spark users more information about 
> how Spark clusters are using memory. Total memory will include non-Spark JVM 
> memory and all off-heap memory.
> Please consult the attached design document for further details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics

2018-03-07 Thread assia ydroudj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389644#comment-16389644
 ] 

assia ydroudj commented on SPARK-23206:
---

[~elu] , thanks 

I ll be for wait!

I have another simple question, is how to get the PID java process lunched when 
an executor starts? I got master and workers pid but i m interesting  by pid 
executor!

 
 

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, ExecutorsTab2.png, 
> MemoryTuningMetricsDesignDoc.pdf, StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics

2018-03-04 Thread assia ydroudj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385044#comment-16385044
 ] 

assia ydroudj commented on SPARK-23206:
---

Edwina Lu, Thank you

I use actually Spark 2.1 and I need to get peak values of storage, execution, 
jvm memory.

When can you submit the PR please?

 

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, ExecutorsTab2.png, 
> MemoryTuningMetricsDesignDoc.pdf, StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics

2018-02-26 Thread assia ydroudj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16376655#comment-16376655
 ] 

assia ydroudj commented on SPARK-23206:
---

Hi. Where can I find the code of this PR to clone it on my machine please? I 
want to get this different memory metrics of my application? thanks

 

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, ExecutorsTab2.png, 
> MemoryTuningMetricsDesignDoc.pdf, StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21157) Report Total Memory Used by Spark Executors

2018-02-22 Thread assia ydroudj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372789#comment-16372789
 ] 

assia ydroudj commented on SPARK-21157:
---

this paragraph is well interesting for me: "Perhaps we could
show a trend line of memory usage allowing a user to see at which stage 
boundary memory usage
increased." Is it done please  
how can we get memory reporting values gathered in a json file or other formats 
to further exploitation?
thanks 

> Report Total Memory Used by Spark Executors
> ---
>
> Key: SPARK-21157
> URL: https://issues.apache.org/jira/browse/SPARK-21157
> Project: Spark
>  Issue Type: Improvement
>  Components: Input/Output
>Affects Versions: 2.1.1
>Reporter: Jose Soltren
>Priority: Major
> Attachments: TotalMemoryReportingDesignDoc.pdf
>
>
> Building on some of the core ideas of SPARK-9103, this JIRA proposes tracking 
> total memory used by Spark executors, and a means of broadcasting, 
> aggregating, and reporting memory usage data in the Spark UI.
> Here, "total memory used" refers to memory usage that is visible outside of 
> Spark, to an external observer such as YARN, Mesos, or the operating system. 
> The goal of this enhancement is to give Spark users more information about 
> how Spark clusters are using memory. Total memory will include non-Spark JVM 
> memory and all off-heap memory.
> Please consult the attached design document for further details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21157) Report Total Memory Used by Spark Executors

2018-01-10 Thread assia ydroudj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320293#comment-16320293
 ] 

assia ydroudj commented on SPARK-21157:
---



I m beginner in apache spark and have installed a prebuilt distribution of 
apache spark with hadoop. I look to get the consumption or the usage of memory 
while running the example PageRank implemented within spark. I have my cluster 
standalone mode with 1 maser and 4 workers (Virtual machines)

I have tried external tools like ganglia and graphite but they give the memory 
usage at resource or system level (more general) but what i need exactly is "to 
track the behavior of the memory (Storage, execution) while running the 
algorithm does it means, memory usage for a spark application-ID ". Is there 
anyway to get it into text-file for further exploitation? Please help me on 
this, Thanks


> Report Total Memory Used by Spark Executors
> ---
>
> Key: SPARK-21157
> URL: https://issues.apache.org/jira/browse/SPARK-21157
> Project: Spark
>  Issue Type: Improvement
>  Components: Input/Output
>Affects Versions: 2.1.1
>Reporter: Jose Soltren
> Attachments: TotalMemoryReportingDesignDoc.pdf
>
>
> Building on some of the core ideas of SPARK-9103, this JIRA proposes tracking 
> total memory used by Spark executors, and a means of broadcasting, 
> aggregating, and reporting memory usage data in the Spark UI.
> Here, "total memory used" refers to memory usage that is visible outside of 
> Spark, to an external observer such as YARN, Mesos, or the operating system. 
> The goal of this enhancement is to give Spark users more information about 
> how Spark clusters are using memory. Total memory will include non-Spark JVM 
> memory and all off-heap memory.
> Please consult the attached design document for further details.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org