[jira] [Commented] (SPARK-26329) ExecutorMetrics should poll faster than heartbeats

2019-01-17 Thread Edwina Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16745362#comment-16745362
 ] 

Edwina Lu commented on SPARK-26329:
---

[~wypoon], this sounds like a good solution. Sending the information at task 
end would make sure that the information (for peak values for the stage) is as 
complete as possible. This will also provide information for short stages, 
which might not have much (or any) information captured by the heartbeat. The 
number of concurrent stages should be small, so the amount of extra information 
sent for each heartbeat would also be limited.

> ExecutorMetrics should poll faster than heartbeats
> --
>
> Key: SPARK-26329
> URL: https://issues.apache.org/jira/browse/SPARK-26329
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, Web UI
>Affects Versions: 3.0.0
>Reporter: Imran Rashid
>Priority: Major
>
> We should allow faster polling of the executor memory metrics (SPARK-23429 / 
> SPARK-23206) without requiring a faster heartbeat rate.  We've seen the 
> memory usage of executors pike over 1 GB in less than a second, but 
> heartbeats are only every 10 seconds (by default).  Spark needs to enable 
> fast polling to capture these peaks, without causing too much strain on the 
> system.
> In the current implementation, the metrics are polled along with the 
> heartbeat, but this leads to a slow rate of polling metrics by default.  If 
> users were to increase the rate of the heartbeat, they risk overloading the 
> driver on a large cluster, with too many messages and too much work to 
> aggregate the metrics.  But, the executor could poll the metrics more 
> frequently, and still only send the *max* since the last heartbeat for each 
> metric.  This keeps the load on the driver the same, and only introduces a 
> small overhead on the executor to grab the metrics and keep the max.
> The downside of this approach is that we still need to wait for the next 
> heartbeat for the driver to be aware of the new peak.   If the executor dies 
> or is killed before then, then we won't find out.  A potential future 
> enhancement would be to send an update *anytime* there is an increase by some 
> percentage, but we'll leave that out for now.
> Another possibility would be to change the metrics themselves to track peaks 
> for us, so we don't have to fine-tune the polling rate.  For example, some 
> jvm metrics provide a usage threshold, and notification: 
> https://docs.oracle.com/javase/7/docs/api/java/lang/management/MemoryPoolMXBean.html#UsageThreshold
> But, that is not available on all metrics.  This proposal gives us a generic 
> way to get a more accurate peak memory usage for *all* metrics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26399) Add new stage-level REST APIs and parameters

2018-12-18 Thread Edwina Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edwina Lu updated SPARK-26399:
--
Description: 
Add the peak values for the metrics to the stages REST API. Also add a new 
executorSummary REST API, which will return executor summary metrics for a 
specified stage:
{code:java}
curl http://:18080/api/v1/applicationsexecutorSummary{code}

Add parameters to the stages REST API to specify:
*  filtering for task status, and returning tasks that match (for example, 
FAILED tasks).
* task metric quantiles, add adding the task summary if specified
* executor metric quantiles, and adding the executor summary if specified

  was:
Add the peak values for the metrics to the stages REST API. Also add a new 
executorSummary REST API, which will return executor summary metrics for a 
specified stage:
{code:java}
curl http://:18080/api/v1/applicationsexecutorSummary
{code}


> Add new stage-level REST APIs and parameters
> 
>
> Key: SPARK-26399
> URL: https://issues.apache.org/jira/browse/SPARK-26399
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Edwina Lu
>Priority: Major
>
> Add the peak values for the metrics to the stages REST API. Also add a new 
> executorSummary REST API, which will return executor summary metrics for a 
> specified stage:
> {code:java}
> curl http://:18080/api/v1/applications/ id>// attempt>/executorSummary{code}
> Add parameters to the stages REST API to specify:
> *  filtering for task status, and returning tasks that match (for example, 
> FAILED tasks).
> * task metric quantiles, add adding the task summary if specified
> * executor metric quantiles, and adding the executor summary if specified



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26399) Add new stage-level REST APIs and parameters

2018-12-18 Thread Edwina Lu (JIRA)
Edwina Lu created SPARK-26399:
-

 Summary: Add new stage-level REST APIs and parameters
 Key: SPARK-26399
 URL: https://issues.apache.org/jira/browse/SPARK-26399
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 2.4.0
Reporter: Edwina Lu


Add the peak values for the metrics to the stages REST API. Also add a new 
executorSummary REST API, which will return executor summary metrics for a 
specified stage:
{code:java}
curl http://:18080/api/v1/applicationsexecutorSummary
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23431) Expose the new executor memory metrics at the stage level

2018-12-18 Thread Edwina Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edwina Lu updated SPARK-23431:
--
Description: 
Collect and show the new executor memory metrics for each stage, to provide 
more information on how memory is used per stage.

Modify the AppStatusListener to track the peak values for JVM used memory, 
execution memory, storage memory, and unified memory for each executor for each 
stage.

This is a subtask for SPARK-23206. Please refer to the design doc for that 
ticket for more details.

  was:
Collect and show the new executor memory metrics for each stage, to provide 
more information on how memory is used per stage.

Modify the AppStatusListener to track the peak values for JVM used memory, 
execution memory, storage memory, and unified memory for each executor for each 
stage.

Add the peak values for the metrics to the stages REST API. Also add a new 
stageSummary REST API, which will return executor summary metrics for a 
specified stage:
{code:java}
curl http://:18080/api/v1/applicationsexecutorSummary{code}
This is a subtask for SPARK-23206. Please refer to the design doc for that 
ticket for more details.


> Expose the new executor memory metrics at the stage level
> -
>
> Key: SPARK-23431
> URL: https://issues.apache.org/jira/browse/SPARK-23431
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
>
> Collect and show the new executor memory metrics for each stage, to provide 
> more information on how memory is used per stage.
> Modify the AppStatusListener to track the peak values for JVM used memory, 
> execution memory, storage memory, and unified memory for each executor for 
> each stage.
> This is a subtask for SPARK-23206. Please refer to the design doc for that 
> ticket for more details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23431) Expose the new executor memory metrics at the stage level

2018-12-18 Thread Edwina Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16724472#comment-16724472
 ] 

Edwina Lu commented on SPARK-23431:
---

Splitting off the new REST APIs into a new subtask.

> Expose the new executor memory metrics at the stage level
> -
>
> Key: SPARK-23431
> URL: https://issues.apache.org/jira/browse/SPARK-23431
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
>
> Collect and show the new executor memory metrics for each stage, to provide 
> more information on how memory is used per stage.
> Modify the AppStatusListener to track the peak values for JVM used memory, 
> execution memory, storage memory, and unified memory for each executor for 
> each stage.
> This is a subtask for SPARK-23206. Please refer to the design doc for that 
> ticket for more details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23432) Expose executor memory metrics in the web UI for executors

2018-12-11 Thread Edwina Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edwina Lu updated SPARK-23432:
--
Summary: Expose executor memory metrics in the web UI for executors  (was: 
Expose executor memory metrics in the web UI for executors and stages)

> Expose executor memory metrics in the web UI for executors
> --
>
> Key: SPARK-23432
> URL: https://issues.apache.org/jira/browse/SPARK-23432
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
>
> Add the new memory metrics (jvmUsedMemory, executionMemory, storageMemory, 
> and unifiedMemory) to the executors tab, in the summary and for foreach 
> executor. 
> Also add the new memory metrics to the stages tab. Add a new Summary Metrics 
> for Executors table, which will show quantile values for the executor level 
> metrics. Also add columns for the new metrics to the Aggregated Metrics by 
> Executor table.
> This is a subtask for SPARK-23206. Please refer to the design doc for that 
> ticket for more details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics

2018-12-11 Thread Edwina Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16717988#comment-16717988
 ] 

Edwina Lu commented on SPARK-23206:
---

[~irashid], [~rezasafi], [~wypoon], as discussed, I've split SPARK-23432:
 * SPARK-23432 is now for web UI changes for the executors tab (metrics added 
in SPARK-23429 and SPARK-24958).
 * SPARK-26341 is for the web UI changes for stage level metrics, in the stages 
tab. This is dependent on SPARK-23431, which I am currently working on.

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, ExecutorsTab2.png, 
> MemoryTuningMetricsDesignDoc.pdf, SPARK-23206 Design Doc.pdf, StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23432) Expose executor memory metrics in the web UI for executors

2018-12-11 Thread Edwina Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edwina Lu updated SPARK-23432:
--
Description: 
Add the new memory metrics (jvmUsedMemory, executionMemory, storageMemory, and 
unifiedMemory, etc.) to the executors tab, in the summary and for foreach 
executor.

This is a subtask for SPARK-23206. Please refer to the design doc for that 
ticket for more details.

  was:
Add the new memory metrics (jvmUsedMemory, executionMemory, storageMemory, and 
unifiedMemory) to the executors tab, in the summary and for foreach executor. 

Also add the new memory metrics to the stages tab. Add a new Summary Metrics 
for Executors table, which will show quantile values for the executor level 
metrics. Also add columns for the new metrics to the Aggregated Metrics by 
Executor table.

This is a subtask for SPARK-23206. Please refer to the design doc for that 
ticket for more details.


> Expose executor memory metrics in the web UI for executors
> --
>
> Key: SPARK-23432
> URL: https://issues.apache.org/jira/browse/SPARK-23432
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
>
> Add the new memory metrics (jvmUsedMemory, executionMemory, storageMemory, 
> and unifiedMemory, etc.) to the executors tab, in the summary and for foreach 
> executor.
> This is a subtask for SPARK-23206. Please refer to the design doc for that 
> ticket for more details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23432) Expose executor memory metrics in the web UI for executors

2018-12-11 Thread Edwina Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edwina Lu updated SPARK-23432:
--
Description: 
Add the new memory metrics (jvmUsedMemory, executionMemory, storageMemory, and 
unifiedMemory, etc.) to the executors tab, in the summary and for each executor.

This is a subtask for SPARK-23206. Please refer to the design doc for that 
ticket for more details.

  was:
Add the new memory metrics (jvmUsedMemory, executionMemory, storageMemory, and 
unifiedMemory, etc.) to the executors tab, in the summary and for foreach 
executor.

This is a subtask for SPARK-23206. Please refer to the design doc for that 
ticket for more details.


> Expose executor memory metrics in the web UI for executors
> --
>
> Key: SPARK-23432
> URL: https://issues.apache.org/jira/browse/SPARK-23432
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
>
> Add the new memory metrics (jvmUsedMemory, executionMemory, storageMemory, 
> and unifiedMemory, etc.) to the executors tab, in the summary and for each 
> executor.
> This is a subtask for SPARK-23206. Please refer to the design doc for that 
> ticket for more details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26341) Expose executor memory metrics at the stage level, in the Stages tab

2018-12-11 Thread Edwina Lu (JIRA)
Edwina Lu created SPARK-26341:
-

 Summary: Expose executor memory metrics at the stage level, in the 
Stages tab
 Key: SPARK-26341
 URL: https://issues.apache.org/jira/browse/SPARK-26341
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core, Web UI
Affects Versions: 2.4.0
Reporter: Edwina Lu


Sub-task SPARK-23431 will add stage level executor memory metrics (peak values 
for each stage, and peak values for each executor for the stage). This 
information should also be exposed the the web UI, so that users can see which 
stages are memory intensive.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26329) ExecutorMetrics should poll faster than heartbeats

2018-12-10 Thread Edwina Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16715491#comment-16715491
 ] 

Edwina Lu commented on SPARK-26329:
---

More frequent polling (and more accurate metrics) sound good. Right now with  
SPARK-23206, it will keep track of the peak values per stage. Would the stage 
level information (peak values) still be kept with more frequent polling?

For the following example, with : event/polled metric(s)

10: start stage 1; start task 1 stage 1

11: val = 25

12:start stage 2; start task 1 stage 2

13: val = 20

14: end stage 1; end task 1 stage 1

15: val = 15

16: heartbeat 1

17: val = 17

18: end stage 2; end task 1 stage 2

 

 

Stage 1 should have peak metric 25 (from time 11), and stage 2 should have peak 
metric 20 ( from time 13, when both stage 1 and stage 2 are active). To 
maintain stage level information, heartbeat 1 would need to send the peak 
values for both stage 1 and stage 2.

> ExecutorMetrics should poll faster than heartbeats
> --
>
> Key: SPARK-26329
> URL: https://issues.apache.org/jira/browse/SPARK-26329
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, Web UI
>Affects Versions: 3.0.0
>Reporter: Imran Rashid
>Priority: Major
>
> We should allow faster polling of the executor memory metrics (SPARK-23429 / 
> SPARK-23206) without requiring a faster heartbeat rate.  We've seen the 
> memory usage of executors pike over 1 GB in less than a second, but 
> heartbeats are only every 10 seconds (by default).  Spark needs to enable 
> fast polling to capture these peaks, without causing too much strain on the 
> system.
> In the current implementation, the metrics are polled along with the 
> heartbeat, but this leads to a slow rate of polling metrics by default.  If 
> users were to increase the rate of the heartbeat, they risk overloading the 
> driver on a large cluster, with too many messages and too much work to 
> aggregate the metrics.  But, the executor could poll the metrics more 
> frequently, and still only send the *max* since the last heartbeat for each 
> metric.  This keeps the load on the driver the same, and only introduces a 
> small overhead on the executor to grab the metrics and keep the max.
> The downside of this approach is that we still need to wait for the next 
> heartbeat for the driver to be aware of the new peak.   If the executor dies 
> or is killed before then, then we won't find out.  A potential future 
> enhancement would be to send an update *anytime* there is an increase by some 
> percentage, but we'll leave that out for now.
> Another possibility would be to change the metrics themselves to track peaks 
> for us, so we don't have to fine-tune the polling rate.  For example, some 
> jvm metrics provide a usage threshold, and notification: 
> https://docs.oracle.com/javase/7/docs/api/java/lang/management/MemoryPoolMXBean.html#UsageThreshold
> But, that is not available on all metrics.  This proposal gives us a generic 
> way to get a more accurate peak memory usage for *all* metrics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics

2018-10-26 Thread Edwina Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16665628#comment-16665628
 ] 

Edwina Lu commented on SPARK-23206:
---

[~irashid], yes, I am planning to work on the other tasks for adding the 
metrics at the stage level and in the UI. I am planning to see how the final 
APIs will look with SPARK-23206, and want to include these metrics for stage 
and UI as well.

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, ExecutorsTab2.png, 
> MemoryTuningMetricsDesignDoc.pdf, SPARK-23206 Design Doc.pdf, StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23429) Add executor memory metrics to heartbeat and expose in executors REST API

2018-09-30 Thread Edwina Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edwina Lu updated SPARK-23429:
--
Fix Version/s: 3.0.0

> Add executor memory metrics to heartbeat and expose in executors REST API
> -
>
> Key: SPARK-23429
> URL: https://issues.apache.org/jira/browse/SPARK-23429
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Fix For: 3.0.0
>
>
> Add new executor level memory metrics ( jvmUsedMemory, onHeapExecutionMemory, 
> offHeapExecutionMemory, onHeapStorageMemory, offHeapStorageMemory, 
> onHeapUnifiedMemory, and offHeapUnifiedMemory), and expose these via the 
> executors REST API. This information will help provide insight into how 
> executor and driver JVM memory is used, and for the different memory regions. 
> It can be used to help determine good values for spark.executor.memory, 
> spark.driver.memory, spark.memory.fraction, and spark.memory.storageFraction.
> Add an ExecutorMetrics class, with jvmUsedMemory, onHeapExecutionMemory, 
> offHeapExecutionMemory, onHeapStorageMemory, and offHeapStorageMemory. This 
> will track the memory usage at the executor level. The new ExecutorMetrics 
> will be sent by executors to the driver as part of the Heartbeat. A heartbeat 
> will be added for the driver as well, to collect these metrics for the driver.
> Modify the EventLoggingListener to log ExecutorMetricsUpdate events if there 
> is a new peak value for one of the memory metrics for an executor and stage. 
> Only the ExecutorMetrics will be logged, and not the TaskMetrics, to minimize 
> additional logging. Analysis on a set of sample applications showed an 
> increase of 0.25% in the size of the Spark history log, with this approach.
> Modify the AppStatusListener to collect snapshots of peak values for each 
> memory metric. Each snapshot has the time, jvmUsedMemory, executionMemory and 
> storageMemory, and list of active stages.
> Add the new memory metrics (snapshots of peak values for each memory metric) 
> to the executors REST API.
> This is a subtask for SPARK-23206. Please refer to the design doc for that 
> ticket for more details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24356) Duplicate strings in File.path managed by FileSegmentManagedBuffer

2018-05-28 Thread Edwina Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16492960#comment-16492960
 ] 

Edwina Lu commented on SPARK-24356:
---

Thanks! This is interesting, and could help with some of our shuffle service 
issues – we can give this a try.

> Duplicate strings in File.path managed by FileSegmentManagedBuffer
> --
>
> Key: SPARK-24356
> URL: https://issues.apache.org/jira/browse/SPARK-24356
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 2.3.0
>Reporter: Misha Dmitriev
>Priority: Major
> Attachments: SPARK-24356.01.patch
>
>
> I recently analyzed a heap dump of Yarn Node Manager that was suffering from 
> high GC pressure due to high object churn. Analysis was done with the jxray 
> tool ([www.jxray.com)|http://www.jxray.com)/] that checks a heap dump for a 
> number of well-known memory issues. One problem that it found in this dump is 
> 19.5% of memory wasted due to duplicate strings. Of these duplicates, more 
> than a half come from {{FileInputStream.path}} and {{File.path}}. All the 
> {{FileInputStream}} objects that JXRay shows are garbage - looks like they 
> are used for a very short period and then discarded (I guess there is a 
> separate question of whether that's a good pattern). But {{File}} instances 
> are traceable to 
> {{org.apache.spark.network.buffer.FileSegmentManagedBuffer.file}} field. Here 
> is the full reference chain:
>  
> {code:java}
> ↖java.io.File.path
> ↖org.apache.spark.network.buffer.FileSegmentManagedBuffer.file
> ↖{j.u.ArrayList}
> ↖j.u.ArrayList$Itr.this$0
> ↖org.apache.spark.network.server.OneForOneStreamManager$StreamState.buffers
> ↖{java.util.concurrent.ConcurrentHashMap}.values
> ↖org.apache.spark.network.server.OneForOneStreamManager.streams
> ↖org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.streamManager
> ↖org.apache.spark.network.yarn.YarnShuffleService.blockHandler
> ↖Java Static org.apache.spark.network.yarn.YarnShuffleService.instance
> {code}
>  
> Values of these {{File.path}}'s and {{FileInputStream.path}}'s look very 
> similar, so I think {{FileInputStream}}s are generated by the 
> {{FileSegmentManagedBuffer}} code. Instances of {{File}}, in turn, likely 
> come from 
> [https://github.com/apache/spark/blob/master/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java#L258-L263]
>  
> To avoid duplicate strings in {{File.path}}'s in this case, it is suggested 
> that in the above code we create a File with a complete, normalized pathname, 
> that has been already interned. This will prevent the code inside 
> {{java.io.File}} from modifying this string, and thus it will use the 
> interned copy, and will pass it to FileInputStream. Essentially the current 
> line
> {code:java}
> return new File(new File(localDir, String.format("%02x", subDirId)), 
> filename);{code}
> should be replaced with something like
> {code:java}
> String pathname = localDir + File.separator + String.format(...) + 
> File.separator + filename;
> pathname = fileSystem.normalize(pathname).intern();
> return new File(pathname);{code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics

2018-05-11 Thread Edwina Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472405#comment-16472405
 ] 

Edwina Lu commented on SPARK-23206:
---

The design discussion for SPARK-23206 is scheduled for Monday, May 15 at 11am 
PDT (6pm UTC). 

https://linkedin.bluejeans.com/1886759322/

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, ExecutorsTab2.png, 
> MemoryTuningMetricsDesignDoc.pdf, SPARK-23206 Design Doc.pdf, StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics

2018-05-10 Thread Edwina Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16470789#comment-16470789
 ] 

Edwina Lu commented on SPARK-23206:
---

[~irashid], I do not have the rest of the changes for 2.3/master yet. The 
original changes were done for 2.1, and also somewhat different.

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, ExecutorsTab2.png, 
> MemoryTuningMetricsDesignDoc.pdf, SPARK-23206 Design Doc.pdf, StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-23206) Additional Memory Tuning Metrics

2018-04-21 Thread Edwina Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16446247#comment-16446247
 ] 

Edwina Lu edited comment on SPARK-23206 at 4/22/18 3:36 AM:


CANCELLED: -The design discussion is scheduled for Monday 4/23 PDT at 11am 
(Monday April 23, 6pm UTC).-

Sorry for the late notice. I won't be able to attend on Monday. I will post 
when the meeting is rescheduled.


was (Author: elu):
The design discussion is scheduled for Monday 4/23 PDT at 11am (Monday April 
23, 6pm UTC).

Bluejeans: https://linkedin.bluejeans.com/1886759322/

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, ExecutorsTab2.png, 
> MemoryTuningMetricsDesignDoc.pdf, SPARK-23206 Design Doc.pdf, StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics

2018-04-20 Thread Edwina Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16446247#comment-16446247
 ] 

Edwina Lu commented on SPARK-23206:
---

The design discussion is scheduled for Monday 4/23 PDT at 11am (Monday April 
23, 6pm UTC).

Bluejeans: https://linkedin.bluejeans.com/1886759322/

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, ExecutorsTab2.png, 
> MemoryTuningMetricsDesignDoc.pdf, SPARK-23206 Design Doc.pdf, StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics

2018-04-18 Thread Edwina Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443025#comment-16443025
 ] 

Edwina Lu commented on SPARK-23206:
---

We are planning a design discussion for early next week. Please let me know if 
you are interested in attending, and I will post once there is a set date/time.

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, ExecutorsTab2.png, 
> MemoryTuningMetricsDesignDoc.pdf, SPARK-23206 Design Doc.pdf, StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics

2018-04-18 Thread Edwina Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442928#comment-16442928
 ] 

Edwina Lu commented on SPARK-23206:
---

[~assia6], there is an initial PR:  [[Github] Pull Request #20940 
(edwinalu)|https://github.com/apache/spark/pull/20940], and I am working on 
making changes based on comments.

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, ExecutorsTab2.png, 
> MemoryTuningMetricsDesignDoc.pdf, SPARK-23206 Design Doc.pdf, StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics

2018-04-17 Thread Edwina Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441006#comment-16441006
 ] 

Edwina Lu commented on SPARK-23206:
---

[~assia6], could you please try the new link, 
[https://docs.google.com/document/d/1fIL2XMHPnqs6kaeHr822iTvs08uuYnjP5roSGZfejyA/edit?usp=sharing]

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, ExecutorsTab2.png, 
> MemoryTuningMetricsDesignDoc.pdf, SPARK-23206 Design Doc.pdf, StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics

2018-04-16 Thread Edwina Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16440326#comment-16440326
 ] 

Edwina Lu commented on SPARK-23206:
---

[~smilegator], please try: 
https://docs.google.com/document/d/1fIL2XMHPnqs6kaeHr822iTvs08uuYnjP5roSGZfejyA/edit?usp=sharing

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, ExecutorsTab2.png, 
> MemoryTuningMetricsDesignDoc.pdf, SPARK-23206 Design Doc.pdf, StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-23206) Additional Memory Tuning Metrics

2018-04-16 Thread Edwina Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438879#comment-16438879
 ] 

Edwina Lu edited comment on SPARK-23206 at 4/17/18 3:15 AM:


After discussion with [~irashid] on the PR, we've decided to move 
ExecutorMetricsUpdate logging to stage end, to minimize the amount of extra 
logging. The updated design doc: 
https://docs.google.com/document/d/1fIL2XMHPnqs6kaeHr822iTvs08uuYnjP5roSGZfejyA/edit?usp=sharing

[^SPARK-23206 Design Doc.pdf]


was (Author: elu):
After discussion with [~irashid] on the PR, we've decided to move 
ExecutorMetricsUpdate logging to stage end, to minimize the amount of extra 
logging. The updated design doc: 
[https://docs.google.com/document/d/1vLojop9I4WkpUdbrSnoHzJ6jkCMnH2Ot5JTSk7YEX5s/edit?usp=sharing|https://docs.google.com/document/d/1fIL2XMHPnqs6kaeHr822iTvs08uuYnjP5roSGZfejyA/edit?usp=sharing]

[^SPARK-23206 Design Doc.pdf]

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, ExecutorsTab2.png, 
> MemoryTuningMetricsDesignDoc.pdf, SPARK-23206 Design Doc.pdf, StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-23206) Additional Memory Tuning Metrics

2018-04-16 Thread Edwina Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438879#comment-16438879
 ] 

Edwina Lu edited comment on SPARK-23206 at 4/17/18 3:14 AM:


After discussion with [~irashid] on the PR, we've decided to move 
ExecutorMetricsUpdate logging to stage end, to minimize the amount of extra 
logging. The updated design doc: 
[https://docs.google.com/document/d/1vLojop9I4WkpUdbrSnoHzJ6jkCMnH2Ot5JTSk7YEX5s/edit?usp=sharing|https://docs.google.com/document/d/1fIL2XMHPnqs6kaeHr822iTvs08uuYnjP5roSGZfejyA/edit?usp=sharing]

[^SPARK-23206 Design Doc.pdf]


was (Author: elu):
After discussion with [~irashid] on the PR, we've decided to move 
ExecutorMetricsUpdate logging to stage end, to minimize the amount of extra 
logging. The updated design doc: 
[https://docs.google.com/document/d/1vLojop9I4WkpUdbrSnoHzJ6jkCMnH2Ot5JTSk7YEX5s/edit?usp=sharing]

[^SPARK-23206 Design Doc.pdf]

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, ExecutorsTab2.png, 
> MemoryTuningMetricsDesignDoc.pdf, SPARK-23206 Design Doc.pdf, StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23206) Additional Memory Tuning Metrics

2018-04-15 Thread Edwina Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edwina Lu updated SPARK-23206:
--
Attachment: (was: SPARK-23206 Design Doc.pdf)

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, ExecutorsTab2.png, 
> MemoryTuningMetricsDesignDoc.pdf, SPARK-23206 Design Doc.pdf, StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23206) Additional Memory Tuning Metrics

2018-04-15 Thread Edwina Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edwina Lu updated SPARK-23206:
--
Attachment: SPARK-23206 Design Doc.pdf

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, ExecutorsTab2.png, 
> MemoryTuningMetricsDesignDoc.pdf, SPARK-23206 Design Doc.pdf, StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23206) Additional Memory Tuning Metrics

2018-04-15 Thread Edwina Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edwina Lu updated SPARK-23206:
--
Attachment: SPARK-23206 Design Doc.pdf

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, ExecutorsTab2.png, 
> MemoryTuningMetricsDesignDoc.pdf, SPARK-23206 Design Doc.pdf, StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics

2018-04-15 Thread Edwina Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438879#comment-16438879
 ] 

Edwina Lu commented on SPARK-23206:
---

After discussion with [~irashid] on the PR, we've decided to move 
ExecutorMetricsUpdate logging to stage end, to minimize the amount of extra 
logging. The updated design doc: 
[https://docs.google.com/document/d/1vLojop9I4WkpUdbrSnoHzJ6jkCMnH2Ot5JTSk7YEX5s/edit?usp=sharing]

[^SPARK-23206 Design Doc.pdf]

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, ExecutorsTab2.png, 
> MemoryTuningMetricsDesignDoc.pdf, SPARK-23206 Design Doc.pdf, StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics

2018-04-09 Thread Edwina Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16430922#comment-16430922
 ] 

Edwina Lu commented on SPARK-23206:
---

The doc wasn't very clear what the quantile values were for. Thanks for asking.

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, ExecutorsTab2.png, 
> MemoryTuningMetricsDesignDoc.pdf, StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics

2018-04-09 Thread Edwina Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16430765#comment-16430765
 ] 

Edwina Lu commented on SPARK-23206:
---

[~irashid], this would be quantile values for peak executor memory usage (JVM 
used, execution, etc.) and other executor metrics across executors for a stage. 
It would give some idea of differences in memory usage for executors (for 
example if most executors are using 2G, but there are a couple using 10G). 
There is already information about skew at the task level with the taskSummary 
REST API (input, output and shuffle read/write), but an executor summary would 
show the effects of skew at the executor level.

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, ExecutorsTab2.png, 
> MemoryTuningMetricsDesignDoc.pdf, StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23429) Add executor memory metrics to heartbeat and expose in executors REST API

2018-03-29 Thread Edwina Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edwina Lu updated SPARK-23429:
--
Description: 
Add new executor level memory metrics ( jvmUsedMemory, onHeapExecutionMemory, 
offHeapExecutionMemory, onHeapStorageMemory, offHeapStorageMemory, 
onHeapUnifiedMemory, and offHeapUnifiedMemory), and expose these via the 
executors REST API. This information will help provide insight into how 
executor and driver JVM memory is used, and for the different memory regions. 
It can be used to help determine good values for spark.executor.memory, 
spark.driver.memory, spark.memory.fraction, and spark.memory.storageFraction.

Add an ExecutorMetrics class, with jvmUsedMemory, onHeapExecutionMemory, 
offHeapExecutionMemory, onHeapStorageMemory, and offHeapStorageMemory. This 
will track the memory usage at the executor level. The new ExecutorMetrics will 
be sent by executors to the driver as part of the Heartbeat. A heartbeat will 
be added for the driver as well, to collect these metrics for the driver.

Modify the EventLoggingListener to log ExecutorMetricsUpdate events if there is 
a new peak value for one of the memory metrics for an executor and stage. Only 
the ExecutorMetrics will be logged, and not the TaskMetrics, to minimize 
additional logging. Analysis on a set of sample applications showed an increase 
of 0.25% in the size of the Spark history log, with this approach.

Modify the AppStatusListener to collect snapshots of peak values for each 
memory metric. Each snapshot has the time, jvmUsedMemory, executionMemory and 
storageMemory, and list of active stages.

Add the new memory metrics (snapshots of peak values for each memory metric) to 
the executors REST API.

This is a subtask for SPARK-23206. Please refer to the design doc for that 
ticket for more details.

  was:
Add new executor level memory metrics ( jvmUsedMemory, executionMemory, 
storageMemory, and unifiedMemory), and expose these via the executors REST API. 
This information will help provide insight into how executor and driver JVM 
memory is used, and for the different memory regions. It can be used to help 
determine good values for spark.executor.memory, spark.driver.memory, 
spark.memory.fraction, and spark.memory.storageFraction.

Add an ExecutorMetrics class, with jvmUsedMemory, executionMemory, and 
storageMemory. This will track the memory usage at the executor level. The new 
ExecutorMetrics will be sent by executors to the driver as part of the 
Heartbeat. A heartbeat will be added for the driver as well, to collect these 
metrics for the driver.

Modify the EventLoggingListener to log ExecutorMetricsUpdate events if there is 
a new peak value for one of the memory metrics for an executor and stage. Only 
the ExecutorMetrics will be logged, and not the TaskMetrics, to minimize 
additional logging. Analysis on a set of sample applications showed an increase 
of 0.25% in the size of the Spark history log, with this approach.

Modify the AppStatusListener to collect snapshots of peak values for each 
memory metric. Each snapshot has the time, jvmUsedMemory, executionMemory and 
storageMemory, and list of active stages.

Add the new memory metrics (snapshots of peak values for each memory metric) to 
the executors REST API.

This is a subtask for SPARK-23206. Please refer to the design doc for that 
ticket for more details.


> Add executor memory metrics to heartbeat and expose in executors REST API
> -
>
> Key: SPARK-23429
> URL: https://issues.apache.org/jira/browse/SPARK-23429
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
>
> Add new executor level memory metrics ( jvmUsedMemory, onHeapExecutionMemory, 
> offHeapExecutionMemory, onHeapStorageMemory, offHeapStorageMemory, 
> onHeapUnifiedMemory, and offHeapUnifiedMemory), and expose these via the 
> executors REST API. This information will help provide insight into how 
> executor and driver JVM memory is used, and for the different memory regions. 
> It can be used to help determine good values for spark.executor.memory, 
> spark.driver.memory, spark.memory.fraction, and spark.memory.storageFraction.
> Add an ExecutorMetrics class, with jvmUsedMemory, onHeapExecutionMemory, 
> offHeapExecutionMemory, onHeapStorageMemory, and offHeapStorageMemory. This 
> will track the memory usage at the executor level. The new ExecutorMetrics 
> will be sent by executors to the driver as part of the Heartbeat. A heartbeat 
> will be added for the driver as well, to collect these metrics for the driver.
> Modify the EventLoggingListener to log ExecutorMetricsUpdate events if there 
> is a new peak value for one of the memory metrics for an execu

[jira] [Commented] (SPARK-23429) Add executor memory metrics to heartbeat and expose in executors REST API

2018-03-22 Thread Edwina Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16409775#comment-16409775
 ] 

Edwina Lu commented on SPARK-23429:
---

[~assia6], it's being reviewed internally right now. I will post the PR when 
the review and any changes are done.

> Add executor memory metrics to heartbeat and expose in executors REST API
> -
>
> Key: SPARK-23429
> URL: https://issues.apache.org/jira/browse/SPARK-23429
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
>
> Add new executor level memory metrics ( jvmUsedMemory, executionMemory, 
> storageMemory, and unifiedMemory), and expose these via the executors REST 
> API. This information will help provide insight into how executor and driver 
> JVM memory is used, and for the different memory regions. It can be used to 
> help determine good values for spark.executor.memory, spark.driver.memory, 
> spark.memory.fraction, and spark.memory.storageFraction.
> Add an ExecutorMetrics class, with jvmUsedMemory, executionMemory, and 
> storageMemory. This will track the memory usage at the executor level. The 
> new ExecutorMetrics will be sent by executors to the driver as part of the 
> Heartbeat. A heartbeat will be added for the driver as well, to collect these 
> metrics for the driver.
> Modify the EventLoggingListener to log ExecutorMetricsUpdate events if there 
> is a new peak value for one of the memory metrics for an executor and stage. 
> Only the ExecutorMetrics will be logged, and not the TaskMetrics, to minimize 
> additional logging. Analysis on a set of sample applications showed an 
> increase of 0.25% in the size of the Spark history log, with this approach.
> Modify the AppStatusListener to collect snapshots of peak values for each 
> memory metric. Each snapshot has the time, jvmUsedMemory, executionMemory and 
> storageMemory, and list of active stages.
> Add the new memory metrics (snapshots of peak values for each memory metric) 
> to the executors REST API.
> This is a subtask for SPARK-23206. Please refer to the design doc for that 
> ticket for more details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics

2018-03-05 Thread Edwina Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16387083#comment-16387083
 ] 

Edwina Lu commented on SPARK-23206:
---

[~assia6], sorry for the delay. I'm planning to submit a pull request in a 
couple of weeks.

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, ExecutorsTab2.png, 
> MemoryTuningMetricsDesignDoc.pdf, StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics

2018-02-27 Thread Edwina Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16378879#comment-16378879
 ] 

Edwina Lu commented on SPARK-23206:
---

[~assia6], I haven't submitted the PR yet – the code is currently in Spark 2.1 
and needs to be forward ported. I will update the ticket when the PR is 
available.

 

[~felixcheung], I will let you know when we schedule the next meeting. Is there 
more information on what you are looking into for shuffle, and metric 
collections?

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, ExecutorsTab2.png, 
> MemoryTuningMetricsDesignDoc.pdf, StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics

2018-02-14 Thread Edwina Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16364979#comment-16364979
 ] 

Edwina Lu commented on SPARK-23206:
---

After discussion with [~cltlfcjin] and the eBay Hadoop team, we would like to 
coordinate our efforts in adding more executor memory metrics. I've added some 
subtasks, and will follow up with pull requests. I think there are some design 
differences – looking forward to hearing more details. Comments and suggestions 
are very welcome.

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, ExecutorsTab2.png, 
> MemoryTuningMetricsDesignDoc.pdf, StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23432) Expose executor memory metrics in the web UI for executors and stages

2018-02-14 Thread Edwina Lu (JIRA)
Edwina Lu created SPARK-23432:
-

 Summary: Expose executor memory metrics in the web UI for 
executors and stages
 Key: SPARK-23432
 URL: https://issues.apache.org/jira/browse/SPARK-23432
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 2.2.1
Reporter: Edwina Lu


Add the new memory metrics (jvmUsedMemory, executionMemory, storageMemory, and 
unifiedMemory) to the executors tab, in the summary and for foreach executor. 

Also add the new memory metrics to the stages tab. Add a new Summary Metrics 
for Executors table, which will show quantile values for the executor level 
metrics. Also add columns for the new metrics to the Aggregated Metrics by 
Executor table.

This is a subtask for SPARK-23206. Please refer to the design doc for that 
ticket for more details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23431) Expose the new executor memory metrics at the stage level

2018-02-14 Thread Edwina Lu (JIRA)
Edwina Lu created SPARK-23431:
-

 Summary: Expose the new executor memory metrics at the stage level
 Key: SPARK-23431
 URL: https://issues.apache.org/jira/browse/SPARK-23431
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 2.2.1
Reporter: Edwina Lu


Collect and show the new executor memory metrics for each stage, to provide 
more information on how memory is used per stage.

Modify the AppStatusListener to track the peak values for JVM used memory, 
execution memory, storage memory, and unified memory for each executor for each 
stage.

Add the peak values for the metrics to the stages REST API. Also add a new 
stageSummary REST API, which will return executor summary metrics for a 
specified stage:
{code:java}
curl http://:18080/api/v1/applicationsexecutorSummary{code}
This is a subtask for SPARK-23206. Please refer to the design doc for that 
ticket for more details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23429) Add executor memory metrics to heartbeat and expose in executors REST API

2018-02-14 Thread Edwina Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edwina Lu updated SPARK-23429:
--
Issue Type: Sub-task  (was: Improvement)
Parent: SPARK-23206

> Add executor memory metrics to heartbeat and expose in executors REST API
> -
>
> Key: SPARK-23429
> URL: https://issues.apache.org/jira/browse/SPARK-23429
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
>
> Add new executor level memory metrics ( jvmUsedMemory, executionMemory, 
> storageMemory, and unifiedMemory), and expose these via the executors REST 
> API. This information will help provide insight into how executor and driver 
> JVM memory is used, and for the different memory regions. It can be used to 
> help determine good values for spark.executor.memory, spark.driver.memory, 
> spark.memory.fraction, and spark.memory.storageFraction.
> Add an ExecutorMetrics class, with jvmUsedMemory, executionMemory, and 
> storageMemory. This will track the memory usage at the executor level. The 
> new ExecutorMetrics will be sent by executors to the driver as part of the 
> Heartbeat. A heartbeat will be added for the driver as well, to collect these 
> metrics for the driver.
> Modify the EventLoggingListener to log ExecutorMetricsUpdate events if there 
> is a new peak value for one of the memory metrics for an executor and stage. 
> Only the ExecutorMetrics will be logged, and not the TaskMetrics, to minimize 
> additional logging. Analysis on a set of sample applications showed an 
> increase of 0.25% in the size of the Spark history log, with this approach.
> Modify the AppStatusListener to collect snapshots of peak values for each 
> memory metric. Each snapshot has the time, jvmUsedMemory, executionMemory and 
> storageMemory, and list of active stages.
> Add the new memory metrics (snapshots of peak values for each memory metric) 
> to the executors REST API.
> This is a subtask for SPARK-23206. Please refer to the design doc for that 
> ticket for more details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23429) Add executor memory metrics to heartbeat and expose in executors REST API

2018-02-14 Thread Edwina Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16364967#comment-16364967
 ] 

Edwina Lu commented on SPARK-23429:
---

Subtask of SPARK-23206

> Add executor memory metrics to heartbeat and expose in executors REST API
> -
>
> Key: SPARK-23429
> URL: https://issues.apache.org/jira/browse/SPARK-23429
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
>
> Add new executor level memory metrics ( jvmUsedMemory, executionMemory, 
> storageMemory, and unifiedMemory), and expose these via the executors REST 
> API. This information will help provide insight into how executor and driver 
> JVM memory is used, and for the different memory regions. It can be used to 
> help determine good values for spark.executor.memory, spark.driver.memory, 
> spark.memory.fraction, and spark.memory.storageFraction.
> Add an ExecutorMetrics class, with jvmUsedMemory, executionMemory, and 
> storageMemory. This will track the memory usage at the executor level. The 
> new ExecutorMetrics will be sent by executors to the driver as part of the 
> Heartbeat. A heartbeat will be added for the driver as well, to collect these 
> metrics for the driver.
> Modify the EventLoggingListener to log ExecutorMetricsUpdate events if there 
> is a new peak value for one of the memory metrics for an executor and stage. 
> Only the ExecutorMetrics will be logged, and not the TaskMetrics, to minimize 
> additional logging. Analysis on a set of sample applications showed an 
> increase of 0.25% in the size of the Spark history log, with this approach.
> Modify the AppStatusListener to collect snapshots of peak values for each 
> memory metric. Each snapshot has the time, jvmUsedMemory, executionMemory and 
> storageMemory, and list of active stages.
> Add the new memory metrics (snapshots of peak values for each memory metric) 
> to the executors REST API.
> This is a subtask for SPARK-23206. Please refer to the design doc for that 
> ticket for more details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23429) Add executor memory metrics to heartbeat and expose in executors REST API

2018-02-14 Thread Edwina Lu (JIRA)
Edwina Lu created SPARK-23429:
-

 Summary: Add executor memory metrics to heartbeat and expose in 
executors REST API
 Key: SPARK-23429
 URL: https://issues.apache.org/jira/browse/SPARK-23429
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 2.2.1
Reporter: Edwina Lu


Add new executor level memory metrics ( jvmUsedMemory, executionMemory, 
storageMemory, and unifiedMemory), and expose these via the executors REST API. 
This information will help provide insight into how executor and driver JVM 
memory is used, and for the different memory regions. It can be used to help 
determine good values for spark.executor.memory, spark.driver.memory, 
spark.memory.fraction, and spark.memory.storageFraction.

Add an ExecutorMetrics class, with jvmUsedMemory, executionMemory, and 
storageMemory. This will track the memory usage at the executor level. The new 
ExecutorMetrics will be sent by executors to the driver as part of the 
Heartbeat. A heartbeat will be added for the driver as well, to collect these 
metrics for the driver.

Modify the EventLoggingListener to log ExecutorMetricsUpdate events if there is 
a new peak value for one of the memory metrics for an executor and stage. Only 
the ExecutorMetrics will be logged, and not the TaskMetrics, to minimize 
additional logging. Analysis on a set of sample applications showed an increase 
of 0.25% in the size of the Spark history log, with this approach.

Modify the AppStatusListener to collect snapshots of peak values for each 
memory metric. Each snapshot has the time, jvmUsedMemory, executionMemory and 
storageMemory, and list of active stages.

Add the new memory metrics (snapshots of peak values for each memory metric) to 
the executors REST API.

This is a subtask for SPARK-23206. Please refer to the design doc for that 
ticket for more details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23206) Additional Memory Tuning Metrics

2018-02-14 Thread Edwina Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edwina Lu updated SPARK-23206:
--
Issue Type: Umbrella  (was: Improvement)

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, ExecutorsTab2.png, 
> MemoryTuningMetricsDesignDoc.pdf, StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics

2018-02-09 Thread Edwina Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16358866#comment-16358866
 ] 

Edwina Lu commented on SPARK-23206:
---

[~irashid], total memory and off heap memory is also very useful for us, so we 
are interested in the work being done for SPARK-21157 and SPARK-9103. The 
infrastructure (using the heartbeat and selectively logging to the history log) 
is also similar. We are planning to discuss with [~cltlfcjin] on Monday.

For stage level logging, we've modified LiveExecutorStageSummary to store peak 
values for the new memory metrics, and these are checked and updated for active 
stages in AppStatusListener.onExecutorMetricsUpdate(). For history logging, our 
design is a bit simpler: we track the peak values per executor, and immediately 
log if there is a new peak value. The peak values are reinitialized whenever a 
new stage starts, and this would provide the peak value for a memory metric for 
a stage. In the design doc for SPARK-9103, the heartbeats are combined and 
logged at each stage end – this design could work for us as well.

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, ExecutorsTab2.png, 
> MemoryTuningMetricsDesignDoc.pdf, StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-23206) Additional Memory Tuning Metrics

2018-02-08 Thread Edwina Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357846#comment-16357846
 ] 

Edwina Lu edited comment on SPARK-23206 at 2/9/18 2:30 AM:
---

[~cltlfcjin], thanks for uploading the screenshot – these would be useful 
metrics for us as well, and it would be great to coordinate our efforts in an 
umbrella ticket. Yes, let's talk – both Skype and Slack are good. What is a 
good time/day?


was (Author: elu):
[~cltlfcjin], thanks for uploading the screenshot – these would be useful 
metrics for us as well. Yes, let's talk – both Skype and Slack are good. What 
is a good time/day?

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorTab2.png, ExecutorsTab.png, 
> MemoryTuningMetricsDesignDoc.pdf, StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics

2018-02-08 Thread Edwina Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357846#comment-16357846
 ] 

Edwina Lu commented on SPARK-23206:
---

[~cltlfcjin], thanks for uploading the screenshot – these would be useful 
metrics for us as well. Yes, let's talk – both Skype and Slack are good. What 
is a good time/day?

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorTab2.png, ExecutorsTab.png, 
> MemoryTuningMetricsDesignDoc.pdf, StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics

2018-02-08 Thread Edwina Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357841#comment-16357841
 ] 

Edwina Lu commented on SPARK-23206:
---

[~jerryshao], thanks for your help and advice. [~cltlfcjin], I'll contact you 
to discuss and coordinate.

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, MemoryTuningMetricsDesignDoc.pdf, 
> StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics

2018-02-07 Thread Edwina Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355994#comment-16355994
 ] 

Edwina Lu commented on SPARK-23206:
---

We ([~jerryshao], [~zhz] and I) are planning a conference call via Skype to 
discuss this ticket tomorrow Thursday Feb 8 at 5pm PST. If anyone else is 
interested in joining the call, please let me know.

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, MemoryTuningMetricsDesignDoc.pdf, 
> StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23206) Additional Memory Tuning Metrics

2018-01-31 Thread Edwina Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edwina Lu updated SPARK-23206:
--
Attachment: StageTab.png

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, MemoryTuningMetricsDesignDoc.pdf, 
> StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23206) Additional Memory Tuning Metrics

2018-01-31 Thread Edwina Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edwina Lu updated SPARK-23206:
--
Attachment: (was: StageTab.png)

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, MemoryTuningMetricsDesignDoc.pdf
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23206) Additional Memory Tuning Metrics

2018-01-26 Thread Edwina Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edwina Lu updated SPARK-23206:
--
Attachment: (was: ExecutorsTab.png)

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, MemoryTuningMetricsDesignDoc.pdf, 
> StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23206) Additional Memory Tuning Metrics

2018-01-26 Thread Edwina Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edwina Lu updated SPARK-23206:
--
Attachment: ExecutorsTab.png

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, MemoryTuningMetricsDesignDoc.pdf, 
> StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics

2018-01-26 Thread Edwina Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16341526#comment-16341526
 ] 

Edwina Lu commented on SPARK-23206:
---

We'd like to monitor the following executor level metrics:
 * JVM used memory: the JVM heap size for the executor/driver. 
 *  ManagementFactory.getMemoryMxBean.getHeapMemoryUsage().getUsed()


 * Execution memory: memory used for computation in shuffles, joins, sorts and 
aggregations.
 * MemoryManager.executionMemoryUsed()


 * Storage memory: memory used caching and propagating internal data across the 
cluster.
 * MemoryManager.storageMemoryUsed()


 * Unified memory: sum of execution and storage memory.

We would like to expose the above metrics via the Web UI and REST API, for 
stages and executors. I've attached screenshots of how these metrics would look 
in the Stage tab (new Summary Metrics for Executor tables to give an overall 
view of executor level metrics for the stage, and new columns for each metric 
in the Aggregated Metrics by Executor page), and Executors tab (new columns for 
each metric). For the stages REST API, peak values for each metric would be 
returned for each stage, and each executor for the stage. For the executors 
REST API, snapshots of the peak values for each metric would be included. This 
makes it straightforward for users to see how much memory is used by executors, 
and which stages are using are using more (or less) memory. 

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, MemoryTuningMetricsDesignDoc.pdf, 
> StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

--

[jira] [Updated] (SPARK-23206) Additional Memory Tuning Metrics

2018-01-26 Thread Edwina Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edwina Lu updated SPARK-23206:
--
Attachment: ExecutorsTab.png

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, MemoryTuningMetricsDesignDoc.pdf, 
> StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23206) Additional Memory Tuning Metrics

2018-01-26 Thread Edwina Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edwina Lu updated SPARK-23206:
--
Attachment: (was: ExecutorsTab.png)

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, MemoryTuningMetricsDesignDoc.pdf, 
> StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23206) Additional Memory Tuning Metrics

2018-01-26 Thread Edwina Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edwina Lu updated SPARK-23206:
--
Attachment: ExecutorsTab.png

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, MemoryTuningMetricsDesignDoc.pdf, 
> StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23206) Additional Memory Tuning Metrics

2018-01-26 Thread Edwina Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edwina Lu updated SPARK-23206:
--
Attachment: StageTab.png

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, MemoryTuningMetricsDesignDoc.pdf, 
> StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics

2018-01-25 Thread Edwina Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16340474#comment-16340474
 ] 

Edwina Lu commented on SPARK-23206:
---

[~jerryshao], yes  SPARK-9103 is similar, and this proposal (SPARK-23206) seems 
complementary. We are also interested in netty and total memory metrics from  
SPARK-9103 and SPARK-21157. It would be great to share the Heartbeat and 
history logging infrastructure mentioned in both tickets. We'd also like to 
have the metrics exposed via web UI and REST API.

Adding JVM used memory metrics would give an idea about Spark's JVM process, 
and adding executor level information about execution and storage memory would 
give insight into the unified memory region. Peak execution memory is available 
at the task level, so users could calculate a peak execution memory for the 
executor by multiplying with the number of concurrent tasks. Storage memory is 
available in the storage tab while it is running, but it isn't visible once the 
application finishes, so can be hard to examine, and would be difficult to 
combine with execution memory for an overall view into the unified memory 
region, at a per executor and stage level.

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: MemoryTuningMetricsDesignDoc.pdf
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-9103) Tracking spark's memory usage

2018-01-25 Thread Edwina Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16340449#comment-16340449
 ] 

Edwina Lu edited comment on SPARK-9103 at 1/26/18 2:08 AM:
---

We (at LinkedIn) are interested in gathering more memory metrics as well. The 
pull requests for exposing netty and java.nio bufferedPool metrics via Metrics 
System have been merged. Are the changes for adding the metrics to the 
heartbeat and exposing it via the web UI still being worked on? From pull 
request 17762, it sounds like this may have been replaced by SPARK-21157. 
Having both total memory and netty memory information would be useful.

SPARK-23206 proposes adding executor level metrics for JVM used memory, storage 
memory, and execution memory. It is also using the heartbeat to send executor 
metrics, and exposing the metrics via the web UI, and could share some of the 
same infrastructure.

 


was (Author: elu):
We (at LinkedIn) are interested in gathering more memory metrics as well. The 
pull requests for exposing netty and java.nio bufferedPool metrics via Metrics 
System have been merged. Are the changes for adding the metrics to the 
heartbeat and exposing it via the web UI still being worked on? From pull 
request 17762, it sounds like this may have been replaced by SPARK-21157. 
Having both total memory and netty memory information would be useful.

 [LIHADOOP-34243|https://jira01.corp.linkedin.com:8443/browse/LIHADOOP-34243] 
proposes adding executor level metrics for JVM used memory, storage memory, and 
execution memory. It is also using the heartbeat to send executor metrics, and 
exposing the metrics via the web UI, and could share some of the same 
infrastructure.

 

> Tracking spark's memory usage
> -
>
> Key: SPARK-9103
> URL: https://issues.apache.org/jira/browse/SPARK-9103
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core, Web UI
>Reporter: Zhang, Liye
>Priority: Major
> Attachments: Tracking Spark Memory Usage - Phase 1.pdf
>
>
> Currently spark only provides little memory usage information (RDD cache on 
> webUI) for the executors. User have no idea on what is the memory consumption 
> when they are running spark applications with a lot of memory used in spark 
> executors. Especially when they encounter the OOM, it’s really hard to know 
> what is the cause of the problem. So it would be helpful to give out the 
> detail memory consumption information for each part of spark, so that user 
> can clearly have a picture of where the memory is exactly used. 
> The memory usage info to expose should include but not limited to shuffle, 
> cache, network, serializer, etc.
> User can optionally choose to open this functionality since this is mainly 
> for debugging and tuning.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9103) Tracking spark's memory usage

2018-01-25 Thread Edwina Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16340449#comment-16340449
 ] 

Edwina Lu commented on SPARK-9103:
--

We (at LinkedIn) are interested in gathering more memory metrics as well. The 
pull requests for exposing netty and java.nio bufferedPool metrics via Metrics 
System have been merged. Are the changes for adding the metrics to the 
heartbeat and exposing it via the web UI still being worked on? From pull 
request 17762, it sounds like this may have been replaced by SPARK-21157. 
Having both total memory and netty memory information would be useful.

 [LIHADOOP-34243|https://jira01.corp.linkedin.com:8443/browse/LIHADOOP-34243] 
proposes adding executor level metrics for JVM used memory, storage memory, and 
execution memory. It is also using the heartbeat to send executor metrics, and 
exposing the metrics via the web UI, and could share some of the same 
infrastructure.

 

> Tracking spark's memory usage
> -
>
> Key: SPARK-9103
> URL: https://issues.apache.org/jira/browse/SPARK-9103
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core, Web UI
>Reporter: Zhang, Liye
>Priority: Major
> Attachments: Tracking Spark Memory Usage - Phase 1.pdf
>
>
> Currently spark only provides little memory usage information (RDD cache on 
> webUI) for the executors. User have no idea on what is the memory consumption 
> when they are running spark applications with a lot of memory used in spark 
> executors. Especially when they encounter the OOM, it’s really hard to know 
> what is the cause of the problem. So it would be helpful to give out the 
> detail memory consumption information for each part of spark, so that user 
> can clearly have a picture of where the memory is exactly used. 
> The memory usage info to expose should include but not limited to shuffle, 
> cache, network, serializer, etc.
> User can optionally choose to open this functionality since this is mainly 
> for debugging and tuning.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics

2018-01-24 Thread Edwina Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16338559#comment-16338559
 ] 

Edwina Lu commented on SPARK-23206:
---

Thanks, [~zsxwing]. Making the new metrics available in the metrics system 
would be very useful. Could this be done separately?

We would also like the metrics to be available in the Spark web UI, since it is 
easy for users to see and use, and the REST API, which is used by [Dr. 
Elephant|https://github.com/linkedin/dr-elephant] (an open source tool for 
analyzing and tuning Hadoop and now Spark) and other metrics gathering and 
analysis projects which we have.

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: MemoryTuningMetricsDesignDoc.pdf
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23206) Additional Memory Tuning Metrics

2018-01-24 Thread Edwina Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edwina Lu updated SPARK-23206:
--
Description: 
At LinkedIn, we have multiple clusters, running thousands of Spark 
applications, and these numbers are growing rapidly. We need to ensure that 
these Spark applications are well tuned – cluster resources, including memory, 
should be used efficiently so that the cluster can support running more 
applications concurrently, and applications should run quickly and reliably.

Currently there is limited visibility into how much memory executors are using, 
and users are guessing numbers for executor and driver memory sizing. These 
estimates are often much larger than needed, leading to memory wastage. 
Examining the metrics for one cluster for a month, the average percentage of 
used executor memory (max JVM used memory across executors /  
spark.executor.memory) is 35%, leading to an average of 591GB unused memory per 
application (number of executors * (spark.executor.memory - max JVM used 
memory)). Spark has multiple memory regions (user memory, execution memory, 
storage memory, and overhead memory), and to understand how memory is being 
used and fine-tune allocation between regions, it would be useful to have 
information about how much memory is being used for the different regions.

To improve visibility into memory usage for the driver and executors and 
different memory regions, the following additional memory metrics can be be 
tracked for each executor and driver:
 * JVM used memory: the JVM heap size for the executor/driver.
 * Execution memory: memory used for computation in shuffles, joins, sorts and 
aggregations.
 * Storage memory: memory used caching and propagating internal data across the 
cluster.
 * Unified memory: sum of execution and storage memory.

The peak values for each memory metric can be tracked for each executor, and 
also per stage. This information can be shown in the Spark UI and the REST 
APIs. Information for peak JVM used memory can help with determining 
appropriate values for spark.executor.memory and spark.driver.memory, and 
information about the unified memory region can help with determining 
appropriate values for spark.memory.fraction and spark.memory.storageFraction. 
Stage memory information can help identify which stages are most memory 
intensive, and users can look into the relevant code to determine if it can be 
optimized.

The memory metrics can be gathered by adding the current JVM used memory, 
execution memory and storage memory to the heartbeat. SparkListeners are 
modified to collect the new metrics for the executors, stages and Spark history 
log. Only interesting values (peak values per stage per executor) are recorded 
in the Spark history log, to minimize the amount of additional logging.

We have attached our design documentation with this ticket and would like to 
receive feedback from the community for this proposal.

  was:
At LinkedIn, we have multiple clusters, running thousands of Spark 
applications, and these numbers are growing rapidly. We need to ensure that 
these Spark applications are well tuned -- cluster resources, including memory, 
should be used efficiently so that the cluster can support running more 
applications concurrently, and applications should run quickly and reliably.

Currently there is limited visibility into how much memory executors are using, 
and users are guessing numbers for executor and driver memory sizing. These 
estimates are often much larger than needed, leading to memory wastage. 
Examining the metrics for one cluster for a month, the average percentage of 
used executor memory (max JVM used memory across executors /  
spark.executor.memory) is 35%, leading to an average of 591GB unused memory per 
application (number of executors * (spark.executor.memory - max JVM used 
memory)). Spark has multiple memory regions (user memory, execution memory, 
storage memory, and overhead memory), and to understand how memory is being 
used and fine-tune allocation between regions, it would be useful to have 
information about how much memory is being used for the different regions.

To improve visibility into memory usage for the driver and executors and 
different memory regions, the following additional memory metrics can be be 
tracked for each executor and driver:
 * JVM used memory: the JVM heap size for the executor/driver. 
 * Execution memory: memory used for computation in shuffles, joins, sorts and 
aggregations.
 * Storage memory: memory used caching and propagating internal data across the 
cluster.
 * Unified memory: sum of execution and storage memory.

The peak values for each memory metric can be tracked for each executor, and 
also per stage. This information can be shown in the Spark UI and the REST 
APIs. Information for peak JVM used memory can help with determining 
appropriate values for 

[jira] [Updated] (SPARK-23206) Additional Memory Tuning Metrics

2018-01-24 Thread Edwina Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edwina Lu updated SPARK-23206:
--
Attachment: MemoryTuningMetricsDesignDoc.pdf

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: MemoryTuningMetricsDesignDoc.pdf
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned -- cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver. 
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23206) Additional Memory Tuning Metrics

2018-01-24 Thread Edwina Lu (JIRA)
Edwina Lu created SPARK-23206:
-

 Summary: Additional Memory Tuning Metrics
 Key: SPARK-23206
 URL: https://issues.apache.org/jira/browse/SPARK-23206
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 2.2.1
Reporter: Edwina Lu


At LinkedIn, we have multiple clusters, running thousands of Spark 
applications, and these numbers are growing rapidly. We need to ensure that 
these Spark applications are well tuned -- cluster resources, including memory, 
should be used efficiently so that the cluster can support running more 
applications concurrently, and applications should run quickly and reliably.

Currently there is limited visibility into how much memory executors are using, 
and users are guessing numbers for executor and driver memory sizing. These 
estimates are often much larger than needed, leading to memory wastage. 
Examining the metrics for one cluster for a month, the average percentage of 
used executor memory (max JVM used memory across executors /  
spark.executor.memory) is 35%, leading to an average of 591GB unused memory per 
application (number of executors * (spark.executor.memory - max JVM used 
memory)). Spark has multiple memory regions (user memory, execution memory, 
storage memory, and overhead memory), and to understand how memory is being 
used and fine-tune allocation between regions, it would be useful to have 
information about how much memory is being used for the different regions.

To improve visibility into memory usage for the driver and executors and 
different memory regions, the following additional memory metrics can be be 
tracked for each executor and driver:
 * JVM used memory: the JVM heap size for the executor/driver. 
 * Execution memory: memory used for computation in shuffles, joins, sorts and 
aggregations.
 * Storage memory: memory used caching and propagating internal data across the 
cluster.
 * Unified memory: sum of execution and storage memory.

The peak values for each memory metric can be tracked for each executor, and 
also per stage. This information can be shown in the Spark UI and the REST 
APIs. Information for peak JVM used memory can help with determining 
appropriate values for spark.executor.memory and spark.driver.memory, and 
information about the unified memory region can help with determining 
appropriate values for spark.memory.fraction and spark.memory.storageFraction. 
Stage memory information can help identify which stages are most memory 
intensive, and users can look into the relevant code to determine if it can be 
optimized.

The memory metrics can be gathered by adding the current JVM used memory, 
execution memory and storage memory to the heartbeat. SparkListeners are 
modified to collect the new metrics for the executors, stages and Spark history 
log. Only interesting values (peak values per stage per executor) are recorded 
in the Spark history log, to minimize the amount of additional logging.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org