[ https://issues.apache.org/jira/browse/SLIDER-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gour Saha updated SLIDER-1187:
------------------------------
    Fix Version/s:     (was: Slider 1.0.0)
                   Slider 0.92

> Create app diagnostics resource with placeholder for containers (live/dead)
> ---------------------------------------------------------------------------
>
>                 Key: SLIDER-1187
>                 URL: https://issues.apache.org/jira/browse/SLIDER-1187
>             Project: Slider
>          Issue Type: Sub-task
>          Components: appmaster, client
>    Affects Versions: Slider 0.91
>            Reporter: Gour Saha
>            Assignee: Gour Saha
>             Fix For: Slider 0.92
>
>         Attachments: SLIDER-1187.001.patch, SLIDER-1187.002.patch, 
> SLIDER-1187.003.patch, SLIDER-1187.004.patch
>
>
> This is a sample JSON structure of the proposed diagnostics resource -
> {code}
> {
>   "finalStatus": "SUCCEEDED", 
>   "finalMessage": "stop command issued", 
>   "containers": [
>     {
>       "containerId": "container_e3374_1485226679409_0016_01_000004", 
>       "component": "COMMAND_LOGGER", 
>       "appVersion": "1.0.0", 
>       "state": 3, 
>       "exitCode": -1000, 
>       "diagnostics": "", 
>       "createTime": 1485285533968, 
>       "startTime": 1485285533989, 
>       "host": "cn008.l42scl.hortonworks.com", 
>       "hostURL": "http://cn008.l42scl.hortonworks.com:8042",
>       "logLink": "http://cn007.l42scl.hortonworks.com:19888/jobhistory/logs/cn008.l42scl.hortonworks.com:45454/container_e3374_1485226679409_0016_01_000004/ctx/root"
>     }, 
>     {
>       "containerId": "container_e3374_1485226679409_0016_01_000003", 
>       "component": "COMMAND_LOGGER", 
>       "appVersion": "1.0.0", 
>       "state": 3, 
>       "exitCode": -1000, 
>       "diagnostics": "", 
>       "createTime": 1485285120456, 
>       "startTime": 1485285120723, 
>       "host": "cn005.l42scl.hortonworks.com", 
>       "hostURL": "http://cn005.l42scl.hortonworks.com:8042",
>       "logLink": "http://cn007.l42scl.hortonworks.com:19888/jobhistory/logs/cn005.l42scl.hortonworks.com:45454/container_e3374_1485226679409_0016_01_000003/ctx/root"
>     }, 
>     {
>       "containerId": "container_e3374_1485226679409_0016_01_000002", 
>       "component": "COMMAND_LOGGER", 
>       "appVersion": "1.0.0", 
>       "state": 4, 
>       "exitCode": -100, 
>       "diagnostics": "Container released by application", 
>       "createTime": 1485285120464, 
>       "startTime": 1485285120522, 
>       "host": "cn008.l42scl.hortonworks.com", 
>       "hostURL": "http://cn008.l42scl.hortonworks.com:8042",
>       "logLink": "http://cn007.l42scl.hortonworks.com:19888/jobhistory/logs/cn008.l42scl.hortonworks.com:45454/container_e3374_1485226679409_0016_01_000002/ctx/root"
>     }
>   ]
> }
> {code}
> API consumers need to call the _*SliderClient#actionDiagnosticContainers*_ 
> API to get the _*ApplicationDiagnostics*_ object. This object has 3 
> attributes -
> # *finalStatus* - app-level status, which is empty for a running app (of type 
> _org.apache.hadoop.yarn.api.records.FinalApplicationStatus_)
> # *finalMessage* - app-level summary message, populated after the app dies
> # *containers* - the set of all currently running and all previously failed 
> containers (of type _org.apache.slider.api.types.ContainerInformation_)
> Note that the object also provides a helper method 
> _getContainer(String containerId)_, which returns the _ContainerInformation_ 
> for a specific container when its container-id is known.
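The structure described above (two app-level fields plus a collection of per-container records with an id-based lookup) can be sketched as follows. This is a minimal, hypothetical stand-in: _AppDiagnosticsSketch_ and _ContainerInfoSketch_ are not Slider classes, _finalStatus_ is a plain String instead of _FinalApplicationStatus_, and returning null for an unknown id is an assumption of this sketch, not documented Slider behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, trimmed-down stand-in for ContainerInformation
// (only three of the documented attributes are kept).
final class ContainerInfoSketch {
  final String containerId;
  final String component;
  final int exitCode;

  ContainerInfoSketch(String containerId, String component, int exitCode) {
    this.containerId = containerId;
    this.component = component;
    this.exitCode = exitCode;
  }
}

// Hypothetical stand-in for ApplicationDiagnostics.
final class AppDiagnosticsSketch {
  String finalStatus;   // empty while the app is running
  String finalMessage;  // populated after the app dies
  final List<ContainerInfoSketch> containers = new ArrayList<>();

  // Mirrors the documented getContainer(String containerId) helper.
  // In this sketch an unknown id yields null.
  ContainerInfoSketch getContainer(String containerId) {
    for (ContainerInfoSketch c : containers) {
      if (c.containerId.equals(containerId)) {
        return c;
      }
    }
    return null;
  }
}
```

A linear scan is adequate here because an app typically has a modest number of containers; a map keyed by container-id would be the obvious change for very large apps.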
> _*ContainerInformation*_ (one per running or dead container) contains 
> several attributes that are updated as the container moves through its 
> lifecycle stages (newly created, running, dead, etc.). The attributes are -
> - containerId
> - component
> - appVersion
> - released (true/false)
> - state (of type org.apache.slider.api.StateValues)
> - exitCode (of type org.apache.hadoop.yarn.api.records.ContainerExitStatus)
> - diagnostics (container level diagnostics message)
> - createTime
> - startTime
> - host
> - hostURL
> - placement
> - output (always empty; do not use)
> - logLink (container log link for a live as well as a dead container)
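In the sample JSON above, the two containers with state 3 report exitCode -1000 and the released container reports -100. A minimal sketch of turning such values into readable names, assuming the constant values of YARN's _ContainerExitStatus_ (copied as plain ints, and only a subset of them, so the sketch compiles without Hadoop). _ExitCodes_ and _describe_ are hypothetical names, not part of Slider or YARN.

```java
// Selected constants from org.apache.hadoop.yarn.api.records.ContainerExitStatus,
// copied as plain ints so this sketch compiles without Hadoop on the classpath.
final class ExitCodes {
  static final int SUCCESS = 0;
  static final int INVALID = -1000;     // no exit status recorded yet
  static final int ABORTED = -100;      // killed by the framework, e.g. released by the app
  static final int DISKS_FAILED = -101;
  static final int PREEMPTED = -102;

  // Maps an exitCode from ContainerInformation to a readable label.
  static String describe(int exitCode) {
    switch (exitCode) {
      case SUCCESS:      return "SUCCESS";
      case INVALID:      return "INVALID";
      case ABORTED:      return "ABORTED";
      case DISKS_FAILED: return "DISKS_FAILED";
      case PREEMPTED:    return "PREEMPTED";
      // A raw process exit code, or a ContainerExitStatus constant not listed here.
      default:           return "exit code " + exitCode;
    }
  }
}
```

In real code, prefer the constants from _ContainerExitStatus_ itself over copied values.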
> h6. For an app that is still RUNNING -
> The _ApplicationDiagnostics_ object can be retrieved at any point in the app's 
> lifetime by calling the 
> _*SliderClient#actionDiagnosticContainers(ActionDiagnosticArgs diagnosticArgs)*_ 
> API with only the name field of _ActionDiagnosticArgs_ set to the application 
> name. On the command line, it can be retrieved by running the *diagnostics* 
> command with the following arguments -
> {code}
> slider diagnostics --name <app-name> --containers
> {code}
> The command prints the object in JSON format.
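For tooling that shells out instead of linking against the Slider client, the command above can be wrapped from Java. This is a minimal sketch assuming the _slider_ launcher is on the PATH and that the JSON dump is what the command writes to stdout; _SliderDiagnosticsCli_ is a hypothetical helper, not part of Slider.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper around: slider diagnostics --name <app-name> --containers
final class SliderDiagnosticsCli {

  // Argument list for the diagnostics command shown above.
  static List<String> command(String appName) {
    return Arrays.asList("slider", "diagnostics", "--name", appName, "--containers");
  }

  // Runs the command and returns everything it wrote to stdout.
  // Stderr is inherited so the child cannot block on a full, unread pipe.
  static String run(String appName) throws IOException, InterruptedException {
    Process p = new ProcessBuilder(command(appName))
        .redirectError(ProcessBuilder.Redirect.INHERIT)
        .start();
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (InputStream in = p.getInputStream()) {
      byte[] chunk = new byte[8192];
      int n;
      while ((n = in.read(chunk)) != -1) {
        buf.write(chunk, 0, n);
      }
    }
    int rc = p.waitFor();
    if (rc != 0) {
      throw new IOException("slider diagnostics exited with status " + rc);
    }
    return new String(buf.toByteArray(), StandardCharsets.UTF_8);
  }
}
```

Calling _SliderClient#actionDiagnosticContainers_ directly avoids the extra process and the JSON round trip; the CLI route mainly suits scripts and non-JVM callers.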
> h6. For an app that is FAILED/KILLED -
> The _ApplicationDiagnostics_ object is set as the YARN application diagnostics 
> and can be retrieved through the YARN API or the *application* command, for example -
> {code}
> yarn application -status <application_id>
> {code}
> Note that the _ApplicationDiagnostics_ object (in JSON format) can also be viewed 
> in the RM UI of the application, in the *Diagnostics:* field. 
> To retrieve this JSON string with the YARN client API, call 
> _*YarnClient#getApplicationReport(ApplicationId appId)*_ to get the 
> _ApplicationReport_ and then call _*ApplicationReport#getDiagnostics*_. 
> The JSON string can then be converted to the Slider _ApplicationDiagnostics_ 
> object by calling the static method _*ApplicationDiagnostics#fromJson(String json)*_.
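The retrieval path above can be sketched as follows. The Hadoop and Slider calls appear only as comments so the block compiles without those jars (_conf_ and _appId_ there stand for an existing configuration and _ApplicationId_). The shape check is an assumption of this sketch: if the app master never got far enough to publish its JSON, the diagnostics field may hold plain RM text instead, so the string is checked before parsing. _SliderDiagnosticsGuard_ is a hypothetical helper.

```java
// Sketch of the FAILED/KILLED retrieval path; the real calls, shown as comments:
//
//   YarnClient yarn = YarnClient.createYarnClient();
//   yarn.init(conf);
//   yarn.start();
//   String diag = yarn.getApplicationReport(appId).getDiagnostics();
//   if (SliderDiagnosticsGuard.looksLikeJsonObject(diag)) {
//     ApplicationDiagnostics d = ApplicationDiagnostics.fromJson(diag);
//   }
final class SliderDiagnosticsGuard {

  // Assumption: Slider's diagnostics are a single JSON object, while other
  // diagnostics (e.g. an AM launch failure reported by the RM) are plain text.
  static boolean looksLikeJsonObject(String diagnostics) {
    if (diagnostics == null) {
      return false;
    }
    String s = diagnostics.trim();
    return s.startsWith("{") && s.endsWith("}");
  }
}
```

This is only a cheap pre-check; a malformed JSON string would still fail inside _fromJson_, so callers should handle that failure as well.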



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
