Zack Marsh created AMBARI-12148:
-----------------------------------

             Summary: Falcon server intermittently fails to start
                 Key: AMBARI-12148
                 URL: https://issues.apache.org/jira/browse/AMBARI-12148
             Project: Ambari
          Issue Type: Bug
         Environment: ambari-2.1.0-1249, hdp-2.3.0.0-2469, sles11sp3
            Reporter: Zack Marsh
            Priority: Critical


The Falcon server intermittently fails to start when all Hadoop services are 
started together.

The Ambari operations log shows Falcon failing to start with the following 
output:
{code}
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/FALCON/0.5.0.2.1/package/scripts/falcon_server.py", line 164, in <module>
    FalconServer().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 216, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/FALCON/0.5.0.2.1/package/scripts/falcon_server.py", line 46, in start
    self.configure(env)
  File "/var/lib/ambari-agent/cache/common-services/FALCON/0.5.0.2.1/package/scripts/falcon_server.py", line 41, in configure
    falcon('server', action='config')
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/FALCON/0.5.0.2.1/package/scripts/falcon.py", line 141, in falcon
    source = params.local_data_mirroring_dir)
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 157, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 390, in action_create_on_execute
    self.action_delayed("create")
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 387, in action_delayed
    self.get_hdfs_resource_executor().action_delayed(action_name, self)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 246, in action_delayed
    self._create_resource()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 263, in _create_resource
    self._copy_from_local_directory(self.main_resource.resource.target, self.main_resource.resource.source)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 271, in _copy_from_local_directory
    self._create_directory(new_target)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 280, in _create_directory
    self.util.run_command(target, 'MKDIRS', method='PUT')
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 201, in run_command
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'curl -sS -L -w '%{http_code}' -X PUT 'http://zeus1.labs.teradata.com:50070/webhdfs/v1/apps/data-mirroring/workflows?op=MKDIRS&user.name=hdfs'' returned status_code=403.
{
  "RemoteException": {
    "exception": "RetriableException",
    "javaClassName": "org.apache.hadoop.ipc.RetriableException",
    "message": "org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /apps/data-mirroring/workflows. Name node is in safe mode.\nThe reported blocks 0 needs additional 392 blocks to reach the threshold 0.9990 of total blocks 392.\nThe number of live datanodes 3 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached."
  }
}
{code}
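For reference, the safe-mode exit condition stated in the message can be checked with a quick calculation. The 0.9990 threshold, the 392-block total, and the 0 reported blocks all come straight from the log above; the variable names below are just for illustration:

```python
import math

# The NameNode leaves safe mode once reported_blocks >= threshold * total_blocks
# (and the live-datanode minimum is met). Values taken from the log message.
threshold = 0.9990
total_blocks = 392
reported_blocks = 0

# ceil(0.9990 * 392) = 392, which matches the log's
# "needs additional 392 blocks to reach the threshold".
blocks_needed = int(math.ceil(threshold * total_blocks)) - reported_blocks
print(blocks_needed)
```

In other words, with zero blocks reported yet, the cluster still needs every one of its 392 blocks before safe mode lifts, even though the datanode minimum (0) is already satisfied.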

This seems to be a race condition in which the Falcon server attempts to start 
before the HDFS services have fully started, i.e. while the NameNode is still 
in safe mode.
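Since the failure is an HTTP 403 carrying a RetriableException from WebHDFS, one possible mitigation is to treat exactly that response as transient and retry with backoff instead of failing immediately. The sketch below is illustrative only: it is written in Python 3, and the function names and parameters are hypothetical, not Ambari's actual hdfs_resource.py code.

```python
import json
import time
import urllib.error
import urllib.request


def is_safe_mode_retriable(status_code, body):
    # WebHDFS reports a NameNode still in safe mode as HTTP 403 with a
    # RetriableException payload (as seen in the log above); treat only
    # that combination as transient.
    return status_code == 403 and "RetriableException" in body


def webhdfs_mkdirs_with_retry(namenode, path, user, attempts=5, delay=2.0):
    # Hypothetical helper: retry WebHDFS MKDIRS with linear backoff while
    # the NameNode is still reporting a retriable (safe-mode) error.
    url = "http://%s/webhdfs/v1%s?op=MKDIRS&user.name=%s" % (namenode, path, user)
    for attempt in range(attempts):
        req = urllib.request.Request(url, method="PUT")
        try:
            with urllib.request.urlopen(req) as resp:
                return json.loads(resp.read().decode("utf-8")).get("boolean", False)
        except urllib.error.HTTPError as e:
            body = e.read().decode("utf-8", "replace")
            if is_safe_mode_retriable(e.code, body):
                time.sleep(delay * (attempt + 1))
                continue
            raise
    raise RuntimeError("NameNode still in safe mode after %d attempts" % attempts)
```

Alternatively, blocking on `hdfs dfsadmin -safemode wait` before starting services that write to HDFS would avoid the race entirely.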

The same error also occurs intermittently when the HDFS Service Check is 
executed during the "Start and Test All Services" step of the Ambari Enable 
Kerberos Wizard.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)