Dmitry Lysnichenko created AMBARI-15389:
-------------------------------------------

             Summary: Intermittent YARN service check failures during and post EU
                 Key: AMBARI-15389
                 URL: https://issues.apache.org/jira/browse/AMBARI-15389
             Project: Ambari
          Issue Type: Bug
            Reporter: Dmitry Lysnichenko
            Assignee: Dmitry Lysnichenko
         Attachments: AMBARI-15389.patch


Build # - Ambari 2.2.1.1 - #63

Observed this issue in a couple of recent EU runs, where the YARN service check 
reported a failure:

a. In one test, the EU ran from HDP 2.3.4.0 to 2.4.0.0 and the YARN service 
check reported a failure during the EU itself; retrying the operation led to 
the service check succeeding.

b. In another test, the YARN service check reported a failure when it was run 
after the EU; when I ran it again afterwards, it succeeded.

It looks like there is some corner-case condition that triggers this issue.
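
Since a plain retry passed in both cases, a minimal sketch of a retry wrapper around a flaky shell command is given here for illustration. It is not the attached patch and does not use Ambari's resource_management library; run_with_retries and its runner parameter are hypothetical, and only the names tries and try_sleep are borrowed from the traceback in this report.

```python
import subprocess
import time


def run_with_retries(command, tries=3, try_sleep=5, runner=subprocess.call):
    """Run a shell command until it exits with 0 or the attempts run out.

    Hypothetical helper, not part of Ambari. Returns the 1-based number of
    the attempt that succeeded and raises RuntimeError if every attempt fails.
    """
    last_code = None
    for attempt in range(1, tries + 1):
        # runner defaults to subprocess.call, which returns the exit code
        last_code = runner(command, shell=True)
        if last_code == 0:
            return attempt
        if attempt < tries:
            # wait before the next attempt, but not after the final one
            time.sleep(try_sleep)
    raise RuntimeError(
        "command failed %d times, last exit code %s" % (tries, last_code)
    )
```

A retry like this would only mask the intermittent failure; it says nothing about why the distributed shell application finished with a FAILED status.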

{code}
stderr:   /var/lib/ambari-agent/data/errors-822.txt

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/service_check.py", line 142, in <module>
    ServiceCheck().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/service_check.py", line 104, in service_check
    user=params.smokeuser,
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
    tries=tries, try_sleep=try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of '/usr/bin/kinit -kt /etc/security/keytabs/smokeuser.headless.keytab ambari...@example.com; yarn org.apache.hadoop.yarn.applications.distributedshell.Client -shell_command ls -num_containers 1 -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar' returned 2. ######## Hortonworks #############
This is MOTD message, added for testing in qe infra
16/03/03 02:33:51 INFO impl.TimelineClientImpl: Timeline service address: 
http://host:8188/ws/v1/timeline/
16/03/03 02:33:51 INFO distributedshell.Client: Initializing Client
16/03/03 02:33:51 INFO distributedshell.Client: Running Client
16/03/03 02:33:51 INFO client.RMProxy: Connecting to ResourceManager at 
host-9-5.test/127.0.0.254:8050
16/03/03 02:33:53 INFO distributedshell.Client: Got Cluster metric info from 
ASM, numNodeManagers=3
16/03/03 02:33:53 INFO distributedshell.Client: Got Cluster node info from ASM
16/03/03 02:33:53 INFO distributedshell.Client: Got node report from ASM for, 
nodeId=host:25454, nodeAddresshost:8042, nodeRackName/default-rack, 
nodeNumContainers1
16/03/03 02:33:53 INFO distributedshell.Client: Got node report from ASM for, 
nodeId=host-9-5.test:25454, nodeAddresshost-9-5.test:8042, 
nodeRackName/default-rack, nodeNumContainers0
16/03/03 02:33:53 INFO distributedshell.Client: Got node report from ASM for, 
nodeId=host-9-1.test:25454, nodeAddresshost-9-1.test:8042, 
nodeRackName/default-rack, nodeNumContainers0
16/03/03 02:33:53 INFO distributedshell.Client: Queue info, queueName=default, 
queueCurrentCapacity=0.083333336, queueMaxCapacity=1.0, 
queueApplicationCount=0, queueChildQueueCount=0
16/03/03 02:33:53 INFO distributedshell.Client: User ACL Info for Queue, 
queueName=root, userAcl=SUBMIT_APPLICATIONS
16/03/03 02:33:53 INFO distributedshell.Client: User ACL Info for Queue, 
queueName=default, userAcl=SUBMIT_APPLICATIONS
16/03/03 02:33:53 INFO distributedshell.Client: Max mem capabililty of 
resources in this cluster 10240
16/03/03 02:33:53 INFO distributedshell.Client: Max virtual cores capabililty 
of resources in this cluster 1
16/03/03 02:33:53 INFO distributedshell.Client: Copy App Master jar from local 
filesystem and add to local environment
16/03/03 02:33:53 INFO distributedshell.Client: Set the environment for the 
application master
16/03/03 02:33:53 INFO distributedshell.Client: Setting up app master command
16/03/03 02:33:53 INFO distributedshell.Client: Completed setting up app master 
command {{JAVA_HOME}}/bin/java -Xmx10m 
org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster 
--container_memory 10 --container_vcores 1 --num_containers 1 --priority 0 
1><LOG_DIR>/AppMaster.stdout 2><LOG_DIR>/AppMaster.stderr
16/03/03 02:33:53 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 290 
for ambari-qa on 127.0.0.235:8020
16/03/03 02:33:53 INFO distributedshell.Client: Got dt for 
hdfs://host-9-1.test:8020; Kind: HDFS_DELEGATION_TOKEN, Service: 
127.0.0.235:8020, Ident: (HDFS_DELEGATION_TOKEN token 290 for ambari-qa)
16/03/03 02:33:53 INFO distributedshell.Client: Submitting application to ASM
16/03/03 02:33:54 INFO impl.YarnClientImpl: Submitted application 
application_1456970141888_0011
16/03/03 02:33:55 INFO distributedshell.Client: Got application report from ASM 
for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, 
appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, 
appStartTime=1456972434150, yarnAppState=ACCEPTED, 
distributedFinalState=UNDEFINED, 
appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, 
appUser=ambari-qa
16/03/03 02:33:56 INFO distributedshell.Client: Got application report from ASM 
for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, 
appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, 
appStartTime=1456972434150, yarnAppState=ACCEPTED, 
distributedFinalState=UNDEFINED, 
appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, 
appUser=ambari-qa
16/03/03 02:33:57 INFO distributedshell.Client: Got application report from ASM 
for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, 
appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, 
appStartTime=1456972434150, yarnAppState=ACCEPTED, 
distributedFinalState=UNDEFINED, 
appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, 
appUser=ambari-qa
16/03/03 02:33:58 INFO distributedshell.Client: Got application report from ASM 
for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, 
appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, 
appStartTime=1456972434150, yarnAppState=ACCEPTED, 
distributedFinalState=UNDEFINED, 
appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, 
appUser=ambari-qa
16/03/03 02:33:59 INFO distributedshell.Client: Got application report from ASM 
for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, 
appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, 
appStartTime=1456972434150, yarnAppState=ACCEPTED, 
distributedFinalState=UNDEFINED, 
appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, 
appUser=ambari-qa
16/03/03 02:34:00 INFO distributedshell.Client: Got application report from ASM 
for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, 
appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, 
appStartTime=1456972434150, yarnAppState=ACCEPTED, 
distributedFinalState=UNDEFINED, 
appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, 
appUser=ambari-qa
16/03/03 02:34:01 INFO distributedshell.Client: Got application report from ASM 
for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, 
appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, 
appStartTime=1456972434150, yarnAppState=ACCEPTED, 
distributedFinalState=UNDEFINED, 
appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, 
appUser=ambari-qa
16/03/03 02:34:02 INFO distributedshell.Client: Got application report from ASM 
for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, 
appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, 
appStartTime=1456972434150, yarnAppState=ACCEPTED, 
distributedFinalState=UNDEFINED, 
appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, 
appUser=ambari-qa
16/03/03 02:34:03 INFO distributedshell.Client: Got application report from ASM 
for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, 
appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, 
appStartTime=1456972434150, yarnAppState=ACCEPTED, 
distributedFinalState=UNDEFINED, 
appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, 
appUser=ambari-qa
16/03/03 02:34:04 INFO distributedshell.Client: Got application report from ASM 
for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, 
appDiagnostics=, appMasterHost=host-9-1/127.0.0.235, appQueue=default, 
appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=RUNNING, 
distributedFinalState=UNDEFINED, 
appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, 
appUser=ambari-qa
16/03/03 02:34:05 INFO distributedshell.Client: Got application report from ASM 
for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, 
appDiagnostics=, appMasterHost=host-9-1/127.0.0.235, appQueue=default, 
appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=RUNNING, 
distributedFinalState=UNDEFINED, 
appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, 
appUser=ambari-qa
16/03/03 02:34:06 INFO distributedshell.Client: Got application report from ASM 
for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, 
appDiagnostics=, appMasterHost=host-9-1/127.0.0.235, appQueue=default, 
appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=RUNNING, 
distributedFinalState=UNDEFINED, 
appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, 
appUser=ambari-qa
16/03/03 02:34:07 INFO distributedshell.Client: Got application report from ASM 
for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, 
appDiagnostics=, appMasterHost=host-9-1/127.0.0.235, appQueue=default, 
appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=RUNNING, 
distributedFinalState=UNDEFINED, 
appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, 
appUser=ambari-qa
16/03/03 02:34:08 INFO distributedshell.Client: Got application report from ASM 
for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, 
appDiagnostics=, appMasterHost=host-9-1/127.0.0.235, appQueue=default, 
appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=FINISHED, 
distributedFinalState=FAILED, 
appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, 
appUser=ambari-qa
16/03/03 02:34:08 INFO distributedshell.Client: Application did finished 
unsuccessfully. YarnState=FINISHED, DSFinalStatus=FAILED. Breaking monitoring 
loop
16/03/03 02:34:08 ERROR distributedshell.Client: Application failed to complete 
successfully
stdout:   /var/lib/ambari-agent/data/output-822.txt

2016-03-03 02:33:47,974 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2016-03-03 02:33:48,013 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2016-03-03 02:33:48,018 - checked_call['/usr/bin/kinit -kt /etc/security/keytabs/smokeuser.headless.keytab ambari...@example.com; yarn org.apache.hadoop.yarn.applications.distributedshell.Client -shell_command ls -num_containers 1 -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar'] {'path': '/usr/sbin:/sbin:/usr/local/bin:/bin:/usr/bin', 'user': 'ambari-qa'}
{code}
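
The repeated application reports in the stderr above are easier to read once collapsed into state changes. The sketch below is one possible way to do that; it assumes the yarnAppState=... and distributedFinalState=... fields appear in the order shown in this log, and state_transitions is a hypothetical helper, not part of Ambari or Hadoop.

```python
import re

# Matches the two state fields of one "Got application report" entry.
# \s* tolerates the line wrap that separates the fields in mailed logs.
STATE_RE = re.compile(r"yarnAppState=(\w+),\s*distributedFinalState=(\w+)")


def state_transitions(log_text):
    """Return the (yarnAppState, distributedFinalState) pairs in order,
    dropping consecutive repeats so only the changes remain."""
    transitions = []
    for pair in STATE_RE.findall(log_text):
        if not transitions or transitions[-1] != pair:
            transitions.append(pair)
    return transitions
```

Run against the stderr in this report, it should reduce the log to three steps: ACCEPTED/UNDEFINED, RUNNING/UNDEFINED, and finally FINISHED/FAILED.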







--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
