[ 
https://issues.apache.org/jira/browse/SLIDER-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15894919#comment-15894919
 ] 

Gour Saha commented on SLIDER-1209:
-----------------------------------

Thanks for reviewing, [~billie.rinaldi]. I tested it manually on my cluster and 
it looks OK for a few cases. I did not get a chance to simulate all the 
scenarios covering all the enum values.

One of the apps, which was gracefully stopped via the stop command, has the 
following exitReason in its diagnostics -
{code}
{
  "finalStatus": "SUCCEEDED", 
  "finalMessage": "stop command issued", 
  "exitReason": "STOP_COMMAND_ISSUED", 
  "containers": [
    {
      "containerId": "container_e3378_1488324757330_0011_01_000002", 
      "component": "LLAP", 
      "state": 4, 
      "exitCode": 0, 
      "diagnostics": "Application stop triggered", 
      "createTime": 1488568441199, 
      "startTime": 1488568441272, 
      "completionTime": 1488568686173, 
      "host": "host5.example.com", 
      "hostURL": "http://host5.example.com:8042", 
      "logLink": "http://host7.example.com:19888/jobhistory/logs/host5.example.com:45454/container_e3378_1488324757330_0011_01_000002/ctx/root"
    },
.
.
}
{code}

In another one, I simulated a failure (by manually killing the app 
containers) so that the app ultimately died. It has the following exitReason 
in its diagnostics -
{code}
{
  "finalStatus": "FAILED", 
  "finalMessage": "Unstable Application Instance : - failed with component LLAP failed 'recently' 2 times (2 in startup); threshold is 1 - last failure: Failure container_e3378_1488324757330_0009_01_000002 on host host6.example.com (0): http://host7.example.com:19888/jobhistory/logs/host6.example.com:45454/container_e3378_1488324757330_0009_01_000002/ctx/root", 
  "exitReason": "SLIDER_AM_ERROR", 
  "containers": [
    {
      "containerId": "container_e3378_1488324757330_0009_01_000007", 
      "component": "LLAP", 
      "state": 4, 
      "exitCode": 0, 
      "createTime": 1488556767038, 
      "startTime": 1488556767113, 
      "completionTime": 1488556818069, 
      "host": "host9.example.com", 
      "hostURL": "http://host9.example.com:8042", 
      "logLink": "http://host7.example.com:19888/jobhistory/logs/host9.example.com:45454/container_e3378_1488324757330_0009_01_000007/ctx/root"
    }, 
    {
      "containerId": "container_e3378_1488324757330_0009_01_000002", 
      "component": "LLAP", 
      "state": 4, 
      "exitCode": 0, 
      "createTime": 1488556767048, 
      "startTime": 1488556767244, 
      "completionTime": 1488556819070, 
      "host": "host6.example.com", 
      "hostURL": "http://host6.example.com:8042", 
      "logLink": "http://host7.example.com:19888/jobhistory/logs/host6.example.com:45454/container_e3378_1488324757330_0009_01_000002/ctx/root"
    }
  ], 
  "recentFailedContainers": [
    "container_e3378_1488324757330_0009_01_000007", 
    "container_e3378_1488324757330_0009_01_000002"
  ]
}
{code}
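
For anyone consuming these diagnostics, here is a rough sketch of how a client might tell a requested stop from a failure using exitReason. This is illustrative only and not part of the patch: the helper name stopped_by_request and the REQUESTED_STOP_REASONS set are assumptions, and only the two exitReason values shown above are covered, since the full enum is not listed here.

```python
import json

# Assumption: only the exitReason values seen in the diagnostics above are
# known here; the SliderExitReason enum in the patch may define others.
REQUESTED_STOP_REASONS = {"STOP_COMMAND_ISSUED"}


def stopped_by_request(diagnostics_json: str) -> bool:
    """Return True if the diagnostics report a stop that was requested."""
    diagnostics = json.loads(diagnostics_json)
    # A missing exitReason is treated as "not a requested stop".
    return diagnostics.get("exitReason") in REQUESTED_STOP_REASONS


# Trimmed-down versions of the two diagnostics documents shown above.
stopped = '{"finalStatus": "SUCCEEDED", "exitReason": "STOP_COMMAND_ISSUED"}'
failed = '{"finalStatus": "FAILED", "exitReason": "SLIDER_AM_ERROR"}'

print(stopped_by_request(stopped))
print(stopped_by_request(failed))
```

Checking exitReason rather than finalStatus or finalMessage avoids parsing the free-form message text.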

I am trying to add some tests for this patch now.

> Provide information on whether a slider app was killed / stopped via a request
> ------------------------------------------------------------------------------
>
>                 Key: SLIDER-1209
>                 URL: https://issues.apache.org/jira/browse/SLIDER-1209
>             Project: Slider
>          Issue Type: Sub-task
>          Components: appmaster, client
>            Reporter: Siddharth Seth
>            Assignee: Gour Saha
>             Fix For: Slider 1.0.0
>
>         Attachments: SLIDER-1209.01.patch
>
>
> I am adding a new enum SliderExitReason with the high level reason for an 
> application failure.
> For most of the cases it is difficult to decipher if the Slider app failed 
> due to an application error. This gap can be bridged a little better when we 
> get to SLIDER-1208.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
