[ https://issues.apache.org/jira/browse/SPARK-26399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260856#comment-17260856 ]
Ron Hu edited comment on SPARK-26399 at 1/8/21, 5:18 AM:
---------------------------------------------------------
The initial description of this JIRA contains this statement: "filtering for task status, and returning tasks that match (for example, FAILED tasks)". To achieve this, we need a new endpoint like this:

/applications/[app-id]/stages?taskstatus=[FAILED|KILLED|SUCCESS]

If a user specifies /applications/[app-id]/stages?taskstatus=KILLED, we generate a JSON file containing the information for all killed tasks across all stages. This helps users monitor every killed task. For example, a Spark user who enables speculation needs the information for all killed tasks in order to monitor the benefit and cost of speculation.

I attach a sample JSON file [^lispark230_restapi_ex2_stages_failedTasks.json], which contains the failed tasks and their corresponding stages, for reference.
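A minimal Python sketch of the proposal, for illustration only. `stages_url` and `filter_tasks` are hypothetical helpers, not part of Spark. The query parameter name `taskstatus` is the one proposed in this comment (the released API may name it differently), and the JSON field names (`stageId`, `tasks`, `status`) are assumptions modeled on the stage-details output. `filter_tasks` shows what the server-side filter would return: only stages that contain matching tasks, each trimmed to those tasks.

```python
from __future__ import annotations

from urllib.parse import urlencode

# Statuses listed in the proposed endpoint: [FAILED|KILLED|SUCCESS].
VALID_STATUSES = {"FAILED", "KILLED", "SUCCESS"}


def stages_url(base: str, app_id: str, task_status: str) -> str:
    """Build .../applications/<app_id>/stages?taskstatus=<STATUS>.

    Hypothetical helper; "taskstatus" is the parameter name proposed
    in the JIRA comment, not a confirmed Spark API parameter.
    """
    status = task_status.upper()
    if status not in VALID_STATUSES:
        raise ValueError(f"unsupported task status: {task_status}")
    query = urlencode({"taskstatus": status})
    return f"{base.rstrip('/')}/applications/{app_id}/stages?{query}"


def filter_tasks(stages: list[dict], task_status: str) -> list[dict]:
    """Keep only stages that have tasks with the given status, and
    within each kept stage keep only those tasks.

    Assumes each stage dict may carry a "tasks" map of
    task id -> task dict with a "status" field.
    """
    status = task_status.upper()
    result = []
    for stage in stages:
        tasks = stage.get("tasks") or {}  # stages without task details are skipped
        matching = {tid: t for tid, t in tasks.items() if t.get("status") == status}
        if matching:
            # Copy the stage so the caller's data is not mutated.
            result.append({**stage, "tasks": matching})
    return result


if __name__ == "__main__":
    sample = [
        {"stageId": 0, "attemptId": 0, "tasks": {
            "0": {"taskId": 0, "status": "SUCCESS"},
            "1": {"taskId": 1, "status": "KILLED"}}},
        {"stageId": 1, "attemptId": 0, "tasks": {
            "2": {"taskId": 2, "status": "SUCCESS"}}},
    ]
    print(stages_url("http://localhost:18080/api/v1", "app-123", "killed"))
    # prints http://localhost:18080/api/v1/applications/app-123/stages?taskstatus=KILLED
    killed = filter_tasks(sample, "KILLED")
    print([(s["stageId"], list(s["tasks"])) for s in killed])
    # prints [(0, ['1'])]
```

For the speculation use case, a user could count `len(stage["tasks"])` per entry of the KILLED result to see which stages had speculative attempts killed.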
> Add new stage-level REST APIs and parameters
> --------------------------------------------
>
>                 Key: SPARK-26399
>                 URL: https://issues.apache.org/jira/browse/SPARK-26399
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Edward Lu
>            Priority: Major
>         Attachments: executorMetricsSummary.json, lispark230_restapi_ex2_stages_failedTasks.json, lispark230_restapi_ex2_stages_withSummaries.json, stage_executorSummary_image1.png
>
>
> Add the peak values for the metrics to the stages REST API. Also add a new executorSummary REST API, which will return executor summary metrics for a specified stage:
> {code:java}
> curl http://<spark history server>:18080/api/v1/applications/<application_id>/<application_attempt>/stages/<stage_id>/<stage_attempt>/executorMetricsSummary
> {code}
> Add parameters to the stages REST API to specify:
> * filtering for task status, and returning tasks that match (for example, FAILED tasks)
> * task metric quantiles, and adding the task summary if specified
> * executor metric quantiles, and adding the executor summary if specified
> Note that the above description is too brief to be clear. Ron Hu added details explaining the use cases from downstream products. See the comments dated 1/07/2021, which include a couple of sample JSON files.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)