[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics
[ https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442379#comment-16442379 ]

assia ydroudj commented on SPARK-23206:

[~elu], thank you for the shared doc, it works for me. Is there a final PR to get the executor metrics?

> Additional Memory Tuning Metrics
>
>                 Key: SPARK-23206
>                 URL: https://issues.apache.org/jira/browse/SPARK-23206
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Spark Core
>    Affects Versions: 2.2.1
>            Reporter: Edwina Lu
>            Priority: Major
>         Attachments: ExecutorsTab.png, ExecutorsTab2.png, MemoryTuningMetricsDesignDoc.pdf, SPARK-23206 Design Doc.pdf, StageTab.png
>
> At LinkedIn, we have multiple clusters, running thousands of Spark applications, and these numbers are growing rapidly. We need to ensure that these Spark applications are well tuned – cluster resources, including memory, should be used efficiently so that the cluster can support running more applications concurrently, and applications should run quickly and reliably.
> Currently there is limited visibility into how much memory executors are using, and users are guessing numbers for executor and driver memory sizing. These estimates are often much larger than needed, leading to memory wastage. Examining the metrics for one cluster for a month, the average percentage of used executor memory (max JVM used memory across executors / spark.executor.memory) is 35%, leading to an average of 591 GB of unused memory per application (number of executors * (spark.executor.memory - max JVM used memory)). Spark has multiple memory regions (user memory, execution memory, storage memory, and overhead memory), and to understand how memory is being used and to fine-tune allocation between regions, it would be useful to have information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and the different memory regions, the following additional memory metrics can be tracked for each executor and driver:
> * JVM used memory: the JVM heap size for the executor/driver.
> * Execution memory: memory used for computation in shuffles, joins, sorts and aggregations.
> * Storage memory: memory used for caching and propagating internal data across the cluster.
> * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and also per stage. This information can be shown in the Spark UI and the REST APIs. Information for peak JVM used memory can help with determining appropriate values for spark.executor.memory and spark.driver.memory, and information about the unified memory region can help with determining appropriate values for spark.memory.fraction and spark.memory.storageFraction. Stage memory information can help identify which stages are most memory intensive, and users can look into the relevant code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, execution memory and storage memory to the heartbeat. SparkListeners are modified to collect the new metrics for the executors, stages and Spark history log. Only interesting values (peak values per stage per executor) are recorded in the Spark history log, to minimize the amount of additional logging.
> We have attached our design documentation with this ticket and would like to receive feedback from the community for this proposal.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
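The two formulas quoted in the description (used fraction and unused memory per application) can be worked through with a small example. This is a minimal Python sketch, not code from the ticket: the function name `memory_utilization` and the input numbers are hypothetical and chosen only for illustration.

```python
def memory_utilization(peak_jvm_used_mb, executor_memory_mb):
    """Return (used_fraction, unused_mb) for one application.

    peak_jvm_used_mb: peak JVM used memory of each executor, in MB
                      (hypothetical input, one entry per executor).
    executor_memory_mb: the configured spark.executor.memory, in MB.
    """
    if not peak_jvm_used_mb:
        raise ValueError("need at least one executor")
    max_used = max(peak_jvm_used_mb)
    # used fraction = max JVM used memory across executors / spark.executor.memory
    used_fraction = max_used / executor_memory_mb
    # unused = number of executors * (spark.executor.memory - max JVM used memory)
    unused_mb = len(peak_jvm_used_mb) * (executor_memory_mb - max_used)
    return used_fraction, unused_mb


# Illustrative values only: 4 executors, each configured with 8192 MB.
fraction, unused = memory_utilization([2048, 2867, 1536, 2500], 8192)
print(f"used: {fraction:.0%}, unused: {unused} MB")
```

Note that the formula uses the maximum across executors, so the unused figure is a lower bound: executors that peaked below the maximum leave even more memory idle.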
[jira] [Commented] (SPARK-23429) Add executor memory metrics to heartbeat and expose in executors REST API
[ https://issues.apache.org/jira/browse/SPARK-23429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16408723#comment-16408723 ]

assia ydroudj commented on SPARK-23429:

Hi [~elu], I am still interested in getting these values. Could you please explain how to do it? Thanks.

> Add executor memory metrics to heartbeat and expose in executors REST API
>
>                 Key: SPARK-23429
>                 URL: https://issues.apache.org/jira/browse/SPARK-23429
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 2.2.1
>            Reporter: Edwina Lu
>            Priority: Major
>
> Add new executor level memory metrics (jvmUsedMemory, executionMemory, storageMemory, and unifiedMemory), and expose these via the executors REST API. This information will help provide insight into how executor and driver JVM memory is used, both overall and across the different memory regions. It can be used to help determine good values for spark.executor.memory, spark.driver.memory, spark.memory.fraction, and spark.memory.storageFraction.
> Add an ExecutorMetrics class, with jvmUsedMemory, executionMemory, and storageMemory. This will track the memory usage at the executor level. The new ExecutorMetrics will be sent by executors to the driver as part of the Heartbeat. A heartbeat will be added for the driver as well, to collect these metrics for the driver.
> Modify the EventLoggingListener to log ExecutorMetricsUpdate events if there is a new peak value for one of the memory metrics for an executor and stage. Only the ExecutorMetrics will be logged, and not the TaskMetrics, to minimize additional logging. Analysis on a set of sample applications showed an increase of 0.25% in the size of the Spark history log with this approach.
> Modify the AppStatusListener to collect snapshots of peak values for each memory metric. Each snapshot has the time, jvmUsedMemory, executionMemory and storageMemory, and the list of active stages.
> Add the new memory metrics (snapshots of peak values for each memory metric) to the executors REST API.
> This is a subtask for SPARK-23206. Please refer to the design doc for that ticket for more details.
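The "log only when there is a new peak" rule described in this sub-task can be illustrated with a short sketch. The Python below is not the Spark implementation: `ExecutorMetrics` here is a hypothetical stand-in for the proposed class, `PeakTracker` is an invented helper, and the heartbeat values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutorMetrics:
    """Hypothetical stand-in for the proposed ExecutorMetrics class (bytes)."""
    jvm_used_memory: int
    execution_memory: int
    storage_memory: int

    @property
    def unified_memory(self) -> int:
        # Unified memory is the sum of execution and storage memory.
        return self.execution_memory + self.storage_memory


class PeakTracker:
    """Keeps peaks per (executor, stage) and reports when a new peak appears."""

    def __init__(self):
        # (executor_id, stage_id) -> {metric name: peak value seen so far}
        self._peaks = {}

    def update(self, executor_id, stage_id, metrics):
        """Record one heartbeat; return True if any metric set a new peak."""
        current = {
            "jvmUsedMemory": metrics.jvm_used_memory,
            "executionMemory": metrics.execution_memory,
            "storageMemory": metrics.storage_memory,
            "unifiedMemory": metrics.unified_memory,
        }
        peaks = self._peaks.setdefault((executor_id, stage_id), {})
        new_peak = False
        for name, value in current.items():
            if value > peaks.get(name, -1):
                peaks[name] = value
                new_peak = True
        return new_peak

    def peaks(self, executor_id, stage_id):
        """Return a copy of the peak values for one executor and stage."""
        return dict(self._peaks.get((executor_id, stage_id), {}))


# Three made-up heartbeats from executor "1" during stage 0.
tracker = PeakTracker()
heartbeats = [
    ExecutorMetrics(500, 100, 50),
    ExecutorMetrics(450, 90, 40),   # nothing exceeds the earlier values
    ExecutorMetrics(480, 120, 20),  # execution memory sets a new peak
]
logged = [tracker.update("1", 0, m) for m in heartbeats]
print(logged)
print(tracker.peaks("1", 0))
```

Only heartbeats for which `update` returns True would be written to the history log under this scheme, which is how the proposal keeps the extra logging small. The peak unified memory is tracked separately because it need not equal the sum of the execution and storage peaks, which can occur at different times.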
[jira] [Created] (SPARK-23738) Memory usage for executors
assia ydroudj created SPARK-23738:

             Summary: Memory usage for executors
                 Key: SPARK-23738
                 URL: https://issues.apache.org/jira/browse/SPARK-23738
             Project: Spark
          Issue Type: Question
          Components: Spark Core, Spark Submit
    Affects Versions: 2.1.0
            Reporter: assia ydroudj

Hi,

I'm running a Spark cluster with 3 nodes (one as master + worker, and 2 other workers). Each worker has one executor. I then run the PageRank example that ships with Spark.

I need to collect the memory usage of each executor while the application runs and write it to a file for further analysis. How can I do this?

One idea is to get the PIDs of the executor and driver processes and then use a Linux command-line tool to read this information. Is that right? Please guide me.
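The PID-based idea in the question can be sketched as follows. This is a minimal Python sketch under stated assumptions, not an answer from the ticket: it assumes a Linux worker node with the JDK's `jps` tool on the PATH, and it assumes executor JVMs appear in `jps -l` output with the main class `org.apache.spark.executor.CoarseGrainedExecutorBackend`. The helper names are hypothetical. It reports the resident set size seen by the operating system, not Spark's internal execution or storage memory.

```python
import re
import subprocess

# Assumed main class of an executor JVM as listed by `jps -l`.
EXECUTOR_MAIN = "org.apache.spark.executor.CoarseGrainedExecutorBackend"


def executor_pids(jps_output):
    """Parse `jps -l` output and return the PIDs of executor JVMs."""
    pids = []
    for line in jps_output.splitlines():
        parts = line.split(None, 1)  # "<pid> <main class>"
        if len(parts) == 2 and parts[1].strip() == EXECUTOR_MAIN:
            pids.append(int(parts[0]))
    return pids


def rss_kb(status_text):
    """Extract VmRSS (resident set size, in kB) from /proc/<pid>/status text."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def sample_executors():
    """Run on a worker node: return {pid: rss_kb} for local executor JVMs."""
    out = subprocess.run(
        ["jps", "-l"], capture_output=True, text=True, check=True
    ).stdout
    result = {}
    for pid in executor_pids(out):
        with open(f"/proc/{pid}/status") as status_file:
            result[pid] = rss_kb(status_file.read())
    return result
```

Calling `sample_executors()` periodically on each worker and appending the result to a file would give a per-executor memory trace over the lifetime of the application.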
[jira] [Commented] (SPARK-21157) Report Total Memory Used by Spark Executors
[ https://issues.apache.org/jira/browse/SPARK-21157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389683#comment-16389683 ]

assia ydroudj commented on SPARK-21157:

Is there a PR for this, please?

> Report Total Memory Used by Spark Executors
>
>                 Key: SPARK-21157
>                 URL: https://issues.apache.org/jira/browse/SPARK-21157
>             Project: Spark
>          Issue Type: Improvement
>          Components: Input/Output
>    Affects Versions: 2.1.1
>            Reporter: Jose Soltren
>            Priority: Major
>         Attachments: TotalMemoryReportingDesignDoc.pdf
>
> Building on some of the core ideas of SPARK-9103, this JIRA proposes tracking total memory used by Spark executors, and a means of broadcasting, aggregating, and reporting memory usage data in the Spark UI.
> Here, "total memory used" refers to memory usage that is visible outside of Spark, to an external observer such as YARN, Mesos, or the operating system.
> The goal of this enhancement is to give Spark users more information about how Spark clusters are using memory. Total memory will include non-Spark JVM memory and all off-heap memory.
> Please consult the attached design document for further details.
[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics
[ https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389644#comment-16389644 ]

assia ydroudj commented on SPARK-23206:

[~elu], thanks, I'll wait for it! I have another simple question: how do I get the PID of the Java process launched when an executor starts? I have the master and worker PIDs, but I'm interested in the executor PID.
[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics
[ https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385044#comment-16385044 ]

assia ydroudj commented on SPARK-23206:

Edwina Lu, thank you. I am actually using Spark 2.1 and I need to get the peak values of storage, execution, and JVM memory. When can you submit the PR, please?
[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics
[ https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16376655#comment-16376655 ]

assia ydroudj commented on SPARK-23206:

Hi. Where can I find the code of this PR so that I can clone it on my machine, please? I want to get these different memory metrics for my application. Thanks.
[jira] [Commented] (SPARK-21157) Report Total Memory Used by Spark Executors
[ https://issues.apache.org/jira/browse/SPARK-21157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372789#comment-16372789 ]

assia ydroudj commented on SPARK-21157:

This paragraph is very interesting to me: "Perhaps we could show a trend line of memory usage allowing a user to see at which stage boundary memory usage increased." Has it been done? How can we get the memory reporting values gathered in a JSON file or another format for further analysis? Thanks.
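One way to get memory values into a file for later analysis is to poll Spark's existing monitoring REST API and flatten the response. The Python sketch below is not part of the proposal in this ticket. It assumes the executors endpoint `/api/v1/applications/[app-id]/executors` and its `id`, `memoryUsed` and `maxMemory` fields; the sample payload is invented, and the function name is hypothetical. As far as I know, `memoryUsed` there covers storage memory only, so it is not the "total memory used" this JIRA describes.

```python
import csv
import io
import json

# Illustrative payload shaped like the executors endpoint response;
# the hosts and values are made up.
SAMPLE = """[
  {"id": "driver", "hostPort": "10.0.0.1:40000", "memoryUsed": 0, "maxMemory": 384093388},
  {"id": "0", "hostPort": "10.0.0.2:40001", "memoryUsed": 1048576, "maxMemory": 384093388}
]"""


def executors_to_csv(payload, out):
    """Write one CSV row per executor in the JSON payload; return the row count."""
    rows = json.loads(payload)
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["executor_id", "memory_used_bytes", "max_memory_bytes"])
    for row in rows:
        writer.writerow([row["id"], row["memoryUsed"], row["maxMemory"]])
    return len(rows)


buffer = io.StringIO()
count = executors_to_csv(SAMPLE, buffer)
print(buffer.getvalue())
```

In practice the payload would come from an HTTP request to the driver's UI port while the application runs, repeated at a fixed interval with a timestamp column added to each row.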
[jira] [Commented] (SPARK-21157) Report Total Memory Used by Spark Executors
[ https://issues.apache.org/jira/browse/SPARK-21157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320293#comment-16320293 ]

assia ydroudj commented on SPARK-21157:

I am a beginner with Apache Spark and have installed a prebuilt distribution of Apache Spark with Hadoop. I want to measure memory consumption while running the PageRank example that ships with Spark. My cluster runs in standalone mode with 1 master and 4 workers (virtual machines).

I have tried external tools such as Ganglia and Graphite, but they report memory usage at the resource or system level (more general). What I need is to track the behavior of the memory (storage, execution) while the algorithm runs, that is, the memory usage for a Spark application ID.

Is there any way to get it into a text file for further analysis? Please help me with this. Thanks.