[jira] [Comment Edited] (SPARK-26399) Add new stage-level REST APIs and parameters
[ https://issues.apache.org/jira/browse/SPARK-26399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17261948#comment-17261948 ]

Ron Hu edited comment on SPARK-26399 at 1/23/21, 3:39 AM:
--

[~angerszhuuu] and [~ron8hu] discussed a generic and consistent way for endpoint /applications/\{app-id}/stages. It can be:

/applications/\{app-id}/stages?details=[true|false]&status=[ACTIVE|COMPLETE|FAILED|PENDING|SKIPPED]&withSummaries=[true|false]&taskStatus=[RUNNING|SUCCESS|FAILED|KILLED|PENDING]

where
* query parameter details=true shows the detailed task information within each stage. The default value is details=false;
* query parameter status selects only those stages with the specified status. When the status parameter is not specified, a list of all stages is generated;
* query parameter withSummaries=true shows both the task summary information and the executor summary information in percentile distribution. The default value is withSummaries=false;
* query parameter taskStatus shows only those tasks with the specified status within their corresponding stages. This parameter is meaningful only when details=true (i.e. it is ignored when details=false).

> Add new stage-level REST APIs and parameters
>
> Key: SPARK-26399
> URL: https://issues.apache.org/jira/browse/SPARK-26399
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 3.1.0
> Reporter: Edward Lu
> Priority: Major
> Attachments: executorMetricsSummary.json, lispark230_restapi_ex2_stages_failedTasks.json, lispark230_restapi_ex2_stages_withSummaries.json, stage_executorSummary_image1.png
>
> Add the peak values for the metrics to the stages REST API. Also add a new executorSummary REST API, which will return executor summary metrics for a specified stage:
> {code:java}
> curl http://<server>:18080/api/v1/applications/.../executorMetricsSummary{code}
> Add parameters to the stages REST API to specify:
> * filtering for task status, and returning tasks that match (for example, FAILED tasks).
> * task metric quantiles, adding the task summary if specified
> * executor metric quantiles, adding the executor summary if specified
>
> Note that the above description is too brief to be clear. [~angerszhuuu] and [~ron8hu] discussed a generic and consistent way for endpoint /applications/\{app-id}/stages. It can be:
> /applications/\{app-id}/stages?details=[true|false]&status=[ACTIVE|COMPLETE|FAILED|PENDING|SKIPPED]&withSummaries=[true|false]&taskStatus=[RUNNING|SUCCESS|FAILED|KILLED|PENDING]
> where
> * query parameter details=true shows the detailed task information within each stage. The default value is details=false;
> * query parameter status selects only those stages with the specified status. When the status parameter is not specified, a list of all stages is generated;
> * query parameter withSummaries=true shows both the task summary information and the executor summary information in percentile distribution. The default value is withSummaries=false;
> * query parameter taskStatus shows only those tasks with the specified status within their corresponding stages. This parameter is meaningful only when details=true (i.e. it is ignored when details=false).

--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
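For illustration, the parameters above can be combined into a single request URL. A minimal sketch (the History Server host and application id below are hypothetical; I standardized on the /applications prefix used by Spark's monitoring REST API):

```python
from urllib.parse import urlencode

# Hypothetical History Server host and application id, for illustration only.
base = "http://localhost:18080/api/v1/applications/app-20210123000000-0001/stages"

# Ask for COMPLETE stages with detailed task info, restricted to FAILED tasks,
# plus task/executor summaries in percentile distribution.
params = {
    "details": "true",
    "status": "COMPLETE",
    "withSummaries": "true",
    "taskStatus": "FAILED",
}
url = base + "?" + urlencode(params)
print(url)
```

Per the semantics above, the taskStatus filter only takes effect here because details=true is also set.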
[jira] [Comment Edited] (SPARK-26399) Add new stage-level REST APIs and parameters
[ https://issues.apache.org/jira/browse/SPARK-26399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260856#comment-17260856 ]

Ron Hu edited comment on SPARK-26399 at 1/8/21, 5:18 AM:
-

The initial description of this jira has this statement: "filtering for task status, and returning tasks that match (for example, FAILED tasks)". To achieve this, we need a new endpoint like:

/applications/[app-id]/stages?taskstatus=[FAILED|KILLED|SUCCESS]

If a user specifies /applications/[app-id]/stages?taskstatus=KILLED, then we generate a json file containing all the killed task information from all the stages. This can help users monitor all the killed tasks. For example, when a Spark user enables speculation, he needs the information of all the killed tasks so that he can monitor the benefit/cost brought by speculation. I attach a sample json file [^lispark230_restapi_ex2_stages_failedTasks.json] which contains the failed tasks and the corresponding stages for reference.
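The taskstatus filtering described above can be sketched client-side. A minimal sketch: the field names (stageId, tasks, status) are assumptions modeled on the shape of the stages json, and the sample data below is made up:

```python
# Client-side equivalent of ?taskstatus=KILLED: given the stages json
# (a list of stage objects whose "tasks" field maps task ids to task
# objects carrying a "status"), keep only tasks with the wanted status.
stages = [
    {"stageId": 1, "tasks": {
        "0": {"taskId": 0, "status": "SUCCESS"},
        "1": {"taskId": 1, "status": "KILLED"},
    }},
    {"stageId": 2, "tasks": {
        "2": {"taskId": 2, "status": "KILLED"},
    }},
]

def tasks_with_status(stages, wanted):
    """Return (stageId, taskId) pairs for tasks matching the wanted status."""
    return [
        (stage["stageId"], task["taskId"])
        for stage in stages
        for task in stage.get("tasks", {}).values()
        if task["status"] == wanted
    ]

print(tasks_with_status(stages, "KILLED"))  # -> [(1, 1), (2, 2)]
```

Doing this server-side, as the proposed parameter does, avoids shipping every task of every stage to the client just to find the killed ones.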
[jira] [Comment Edited] (SPARK-26399) Add new stage-level REST APIs and parameters
[ https://issues.apache.org/jira/browse/SPARK-26399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17260805#comment-17260805 ] Ron Hu edited comment on SPARK-26399 at 1/7/21, 10:23 PM: -- Dr. Elephant ([https://github.com/linkedin/dr-elephant]) is a downstream open source product that utilizes Spark monitoring information so that it can advise Spark users where to optimize their configuration parameters ranging from memory usage, number of cores, etc. Because the initial description of this ticket is too brief to be clear. Let me explain the use cases for Dr. Elephant here. REST API /applications/[app-id]/stages: This useful endpoint generate a json file containing all stages for a given application. The current Spark version already provides it. In order to debug if there exists a skew issue, a downstream product also needs: - taskMetricsSummary: It includes task metric information such as executorRunTime, inputMetrics, outputMetrics, shuffleReadMetrics, etc. All in quantile distribution (0.0, 0.25, 0.5, 0.75, 1.0) for all the tasks in a given stage. The same information shows up in Web UI for a specified stage. - executorMetricsSummary: It includes executor metrics information such as number of tasks, input bytes, peak JVM memory, peak execution memory, etc. All in quantile distribution (0.0, 0.25, 0.5, 0.75, 1.0) for all the executors used in a given stage. This information has been developed by [~angerszhuuu] in the PR he submitted. We can add the above information to the the json file generated by /applications/[app-id]/stages. It may double the size of the stages endpoints file. It should be fine because the current stages json file is not that big. Here is one sample json file for stages endpoint. [^lispark230_restapi_ex2_stages_withSummaries.json] An alternative approach is to add a new REST API such as "/applications/[app-id]/stages/withSummaries". But it may need a little bit more code for a new endpoint. was (Author: ron8hu): Dr. 
Elephant ([https://github.com/linkedin/dr-elephant]) is a downstream open source product that utilizes Spark monitoring information so that it can advise Spark users where to optimize their configuration parameters ranging from memory usage, number of cores, etc. Because the initial description of this ticket is too brief to be clear. Let me explain the use cases for Dr. Elephant here. REST API /applications/[app-id]/stages: This useful endpoint provides a list of all stages for a given application. The current Spark version already provides it. In order to debug if there exists a skew issue, a downstream product also needs: - taskMetricsSummary: It includes task metric information such as executorRunTime, inputMetrics, outputMetrics, shuffleReadMetrics, etc. All in quantile distribution (0.0, 0.25, 0.5, 0.75, 1.0) for all the tasks in a given stage. The same information shows up in Web UI for a specified stage. - executorMetricsSummary: It includes executor metrics information such as number of tasks, input bytes, peak JVM memory, peak execution memory, etc. All in quantile distribution (0.0, 0.25, 0.5, 0.75, 1.0) for all the executors used in a given stage. This information has been developed by [~angerszhuuu] in the PR he submitted. We can add the above information to the the json file generated by /applications/[app-id]/stages. It may double the size of the stages endpoints file. It should be fine because the current stages json file is not that big. Here is one sample json file for stages endpoint. [^lispark230_restapi_ex2_stages_withSummaries.json] An alternative approach is to add a new REST API such as "/applications/[app-id]/stages/withSummaries". But it may need a little bit more code for a new endpoint. 
> Add new stage-level REST APIs and parameters
>
> Key: SPARK-26399
> URL: https://issues.apache.org/jira/browse/SPARK-26399
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 3.1.0
> Reporter: Edward Lu
> Priority: Major
> Attachments: executorMetricsSummary.json, lispark230_restapi_ex2_stages_withSummaries.json, stage_executorSummary_image1.png
>
> Add the peak values for the metrics to the stages REST API. Also add a new executorSummary REST API, which will return executor summary metrics for a specified stage:
> {code:java}
> curl http://<history server>:18080/api/v1/applications/<app-id>/<stage-id>/<stage-attempt>/executorMetricsSummary{code}
> Add parameters to the stages REST API to specify:
> * filtering for task status, and returning tasks that match (for example, FAILED tasks).
> * task metric quantiles, and adding the task summary if specified
> * executor metric quantiles, and adding the executor summary if specified
--
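The quantile distribution (0.0, 0.25, 0.5, 0.75, 1.0) that taskMetricsSummary and executorMetricsSummary are meant to expose can be illustrated with a small sketch. This is not Spark code; the function name and the metric values are made up, and the linear-interpolation scheme is an assumption for illustration only:

```python
# Hypothetical sketch: reduce a per-task (or per-executor) metric to the
# five-point quantile summary (0.0, 0.25, 0.5, 0.75, 1.0) discussed above.
def quantile_summary(values, quantiles=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Linear interpolation between closest ranks over the sorted values."""
    s = sorted(values)
    n = len(s)
    out = []
    for q in quantiles:
        pos = q * (n - 1)           # fractional rank of this quantile
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(s[lo] * (1 - frac) + s[hi] * frac)
    return out

# executorRunTime (ms) for the tasks of one stage -- made-up numbers
run_times = [120, 130, 125, 118, 900, 122, 127, 119]
print(quantile_summary(run_times))  # -> [118.0, 119.75, 123.5, 127.75, 900.0]
```

The max column (quantile 1.0) standing far above the median is exactly the signature the downstream skew analysis looks for.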
[jira] [Comment Edited] (SPARK-26399) Add new stage-level REST APIs and parameters
[ https://issues.apache.org/jira/browse/SPARK-26399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257844#comment-17257844 ] Ron Hu edited comment on SPARK-26399 at 1/3/21, 9:00 PM:
--
[~angerszhuuu] found that the "executorSummary" field already exists in the stage REST API output. In the existing stage JSON file, the "executorSummary" field contains a list of executor metrics for all executors used for a given stage. In addition to the detailed metrics for each executor, we also need the percentile distribution among the executors, because the percentile information tells us how bad a skew problem is. For example, we compute the ratio of the maximal value over the median value, and of the maximal value over the 75th-percentile value. If the max-over-median ratio is equal to 5, there is a skew issue; if the max-over-75th-percentile ratio is equal to 5, there is a really bad skew issue. The attached Web UI image !stage_executorSummary_image1.png! shows a sample of the "Summary Metrics for Executors" table for a stage. Its corresponding REST API output can be something like the attached JSON file [^executorMetricsSummary.json]. Since the field name "executorSummary" already exists, we should rename this REST API endpoint; we may change it to "executorMetricsSummary". The new REST API can be: http://<history server>:18080/api/v1/applications/<app-id>/<stage-id>/<stage-attempt>/executorMetricsSummary
--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
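The max-over-median and max-over-75th-percentile heuristic described in the comment above can be sketched as follows. The function name, the threshold parameter, and the sample values are illustrative assumptions, not part of the Spark PR:

```python
import statistics

# Illustrative sketch of the skew heuristic: compare the maximum
# per-executor value against the median and the 75th percentile.
def skew_report(values, threshold=5.0):
    s = sorted(values)
    median = statistics.median(s)
    # 75th percentile via inclusive linear interpolation between ranks
    pos = 0.75 * (len(s) - 1)
    lo = int(pos)
    frac = pos - lo
    p75 = s[lo] * (1 - frac) + s[min(lo + 1, len(s) - 1)] * frac
    mx = max(s)
    if median > 0 and mx / median >= threshold:
        if p75 > 0 and mx / p75 >= threshold:
            return "really bad skew"   # max dwarfs even the 75th percentile
        return "skew"                  # max dwarfs the median only
    return "no skew"

# shuffleRead bytes per executor -- made-up numbers
print(skew_report([100, 110, 120, 130, 1000]))  # -> really bad skew
```

Feeding it the quantile columns of an executorMetricsSummary response would let a tool like Dr. Elephant flag skewed stages automatically.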
For reference, an earlier version of the comment above included the sample executor metrics summary inline; it is the content of [^executorMetricsSummary.json]:
{code:java}
{
  "quantiles" : [ 0.0, 0.25, 0.5, 0.75, 1.0 ],
  "numTasks" : [ 1.0, 1.0, 3.0, 3.0, 4.0 ],
  "inputBytes" : [ 0.0, 0.0, 0.0, 0.0, 0.0 ],
  "inputRecords" : [ 0.0, 0.0, 0.0, 0.0, 0.0 ],
  "outputBytes" : [ 0.0, 0.0, 0.0, 0.0, 0.0 ],
  "outputRecords" : [ 0.0, 0.0, 0.0, 0.0, 0.0 ],
  "shuffleRead" : [ 0.0, 2.50967876E8, 7.50516665E8, 7.51114124E8, 1.001617709E9 ],
  "shuffleReadRecords" : [ 0.0, 740880.0, 2215608.0, 2217351.0, 2957194.0 ],
  "shuffleWrite" : [ 0.0, 2.3658701E8, 7.07482405E8, 7.08012783E8, 9.44322243E8 ],
  "shuffleWriteRecords" : [ 0.0, 726968.0, 2174281.0, 2176014.0, 2902184.0 ],
  "memoryBytesSpilled" : [ 0.0, 0.0, 0.0, 0.0, 0.0 ],
  "diskBytesSpilled" : [ 0.0, 0.0, 0.0, 0.0, 0.0 ],
  "peakJVMHeapMemory" : [ 2.09883992E8, 4.6213568E8, 7.5947948E8, 9.8473656E8, 9.8473656E8 ],
  "peakJVMOffHeapMemory" : [ 6.0829472E7, 6.1343616E7, 6.271752E7, 9.1926448E7, 9.1926448E7 ],
  "peakOnHeapExecutionMemory" : [ 0.0, 0.0, 0.0, 0.0, 0.0 ],
  "peakOffHeapExecutionMemory" : [ 0.0, 0.0, 0.0, 0.0, 0.0 ],
  "peakOnHeapStorageMemory" : [ 7023.0, 12537.0, 19560.0, 19560.0, 19560.0 ],
  "peakOffHeapStorageMemory" : [ 0.0, 0.0, 0.0, 0.0, 0.0 ],
  "peakOnHeapUnifiedMemory" : [ 7023.0, 12537.0, 19560.0, 19560.0, 19560.0 ],
  "peakOffHeapUnifiedMemory" : [ 0.0, 0.0, 0.0, 0.0, 0.0 ],
  "peakDirectPoolMemory" : [ 10742.0, 10865.0, 12781.0, 157182.0, 157182.0 ],
  "peakMappedPoolMemory" : [ 0.0, 0.0, 0.0, 0.0, 0.0 ],
  "peakProcessTreeJVMVMemory" : [ 8.296026112E9, 9.678606336E9, 9.684373504E9, 9.691553792E9, 9.691553792E9 ],
  "peakProcessTreeJVMRSSMemory" : [ 5.26491648E8, 7.03639552E8, 9.64222976E8, 1.210867712E9, 1.210867712E9 ]
}
{code}