[ https://issues.apache.org/jira/browse/SLIDER-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15894919#comment-15894919 ]
Gour Saha commented on SLIDER-1209:
-----------------------------------

Thanks for reviewing [~billie.rinaldi]. I tested it manually in my cluster and it looks OK for a few cases. I did not get a chance to simulate all the scenarios covering all the enum values.

One of the apps, which was gracefully stopped via the stop command, has the following exitReason in its diagnostics -
{code}
{
  "finalStatus": "SUCCEEDED",
  "finalMessage": "stop command issued",
  "exitReason": "STOP_COMMAND_ISSUED",
  "containers": [
    {
      "containerId": "container_e3378_1488324757330_0011_01_000002",
      "component": "LLAP",
      "state": 4,
      "exitCode": 0,
      "diagnostics": "Application stop triggered",
      "createTime": 1488568441199,
      "startTime": 1488568441272,
      "completionTime": 1488568686173,
      "host": "host5.example.com",
      "hostURL": "http://host5.example.com:8042",
      "logLink": "http://host7.example.com:19888/jobhistory/logs/host5.example.com:45454/container_e3378_1488324757330_0011_01_000002/ctx/root"
    },
    .
    .
}
{code}

Another one, where I simulated a failure (by manually killing the app containers) and the app ultimately died, has the following exitReason in its diagnostics -
{code}
{
  "finalStatus": "FAILED",
  "finalMessage": "Unstable Application Instance : - failed with component LLAP failed 'recently' 2 times (2 in startup); threshold is 1 - last failure: Failure container_e3378_1488324757330_0009_01_000002 on host host6.example.com (0): http://host7.example.com:19888/jobhistory/logs/host6.example.com:45454/container_e3378_1488324757330_0009_01_000002/ctx/root",
  "exitReason": "SLIDER_AM_ERROR",
  "containers": [
    {
      "containerId": "container_e3378_1488324757330_0009_01_000007",
      "component": "LLAP",
      "state": 4,
      "exitCode": 0,
      "createTime": 1488556767038,
      "startTime": 1488556767113,
      "completionTime": 1488556818069,
      "host": "host9.example.com",
      "hostURL": "http://host9.example.com:8042",
      "logLink": "http://host7.example.com:19888/jobhistory/logs/host9.example.com:45454/container_e3378_1488324757330_0009_01_000007/ctx/root"
    },
    {
      "containerId": "container_e3378_1488324757330_0009_01_000002",
      "component": "LLAP",
      "state": 4,
      "exitCode": 0,
      "createTime": 1488556767048,
      "startTime": 1488556767244,
      "completionTime": 1488556819070,
      "host": "host6.example.com",
      "hostURL": "http://host6.example.com:8042",
      "logLink": "http://host7.example.com:19888/jobhistory/logs/host6.example.com:45454/container_e3378_1488324757330_0009_01_000002/ctx/root"
    }
  ],
  "recentFailedContainers": [
    "container_e3378_1488324757330_0009_01_000007",
    "container_e3378_1488324757330_0009_01_000002"
  ]
}
{code}

I am trying to add some tests for this patch now.

> Provide information on whether a slider app was killed / stopped via a request
> ------------------------------------------------------------------------------
>
>                 Key: SLIDER-1209
>                 URL: https://issues.apache.org/jira/browse/SLIDER-1209
>             Project: Slider
>          Issue Type: Sub-task
>          Components: appmaster, client
>            Reporter: Siddharth Seth
>            Assignee: Gour Saha
>             Fix For: Slider 1.0.0
>
>         Attachments: SLIDER-1209.01.patch
>
>
> I am adding a new enum SliderExitReason with the high level reason for an
> application failure.
> For most of the cases it is difficult to decipher if the Slider app failed
> due to an application error. This gap can be bridged a little better when we
> get to SLIDER-1208.