Hadoop startup script has a race condition: this causes failures in datanode status and stop commands
-------------------------------------------------------------------------------------------------------

                 Key: HADOOP-7822
                 URL: https://issues.apache.org/jira/browse/HADOOP-7822
             Project: Hadoop Common
          Issue Type: Bug
            Reporter: Rahul Jain


The symptoms are the following:

a) start-all.sh successfully starts both the Hadoop DFS and MapReduce processes, assuming the same grid nodes are used for DFS and MapReduce.
b) stop-all.sh stops the MapReduce processes but fails to stop the DFS processes (the datanode tasks on the grid nodes). Instead, the warning message 'no datanode to stop' is printed for every datanode.
c) The 'pid' files for the datanode processes do not exist; therefore, the only way to stop the datanode processes is to kill them manually.
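The behaviour in (b) and (c) follows from the stop path relying entirely on the pid file. The Python sketch below models that logic under stated assumptions: it is loosely modelled on the stop branch of Hadoop's hadoop-daemon.sh, and the names `stop_daemon` and `pid_file_path`, the `hadoop-<ident>-<command>.pid` naming pattern, and the injectable `send_signal` parameter are hypothetical choices for illustration, not details taken from this report.

```python
import os
import signal
import tempfile


def pid_file_path(pid_dir: str, ident: str, command: str) -> str:
    # Hypothetical naming scheme: hadoop-<ident>-<command>.pid
    return os.path.join(pid_dir, f"hadoop-{ident}-{command}.pid")


def stop_daemon(pid_dir: str, ident: str, command: str, send_signal=os.kill) -> str:
    """Stop a daemon using only its pid file; return the message a script would print."""
    path = pid_file_path(pid_dir, ident, command)
    if not os.path.isfile(path):
        # Without the pid file there is no way to learn the process id,
        # so a daemon that is still running is reported as absent.
        return f"no {command} to stop"
    with open(path) as f:
        pid = int(f.read().strip())
    try:
        send_signal(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return f"no {command} to stop"
    send_signal(pid, signal.SIGTERM)
    return f"stopping {command}"


if __name__ == "__main__":
    sent = []

    def fake_kill(pid, sig):
        # Record signals instead of sending them to real processes.
        sent.append((pid, sig))

    with tempfile.TemporaryDirectory() as pid_dir:
        print(stop_daemon(pid_dir, "hadoop", "datanode", fake_kill))  # no datanode to stop
        with open(pid_file_path(pid_dir, "hadoop", "datanode"), "w") as f:
            f.write("12345\n")
        print(stop_daemon(pid_dir, "hadoop", "datanode", fake_kill))  # stopping datanode
```

Injecting `send_signal` lets the sketch be exercised without signalling real processes; by default it uses `os.kill`.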


The root cause of the issue appears to be in the Hadoop startup scripts. start-all.sh consists of two parts:

1. start-dfs.sh: starts the namenode and datanodes.

2. start-mapred.sh: starts the jobtracker and tasktrackers.

In this case, start-dfs.sh behaved as expected and created the pid files for the datanodes. However, start-mapred.sh then forced another rsync from the master to the slaves, effectively wiping out the pid files stored under the "pid" directory.
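The sequence in the previous paragraph can be simulated to see why the pid files disappear. This is a minimal Python sketch, not the actual script logic: it assumes the master-to-slave sync behaves like `rsync -a --delete` (entries on the slave that are absent on the master are removed) and that the pid directory sits inside the synced tree, which the report implies but does not state. The helpers `mirror_delete` and `run_scenario` and all directory and file names are hypothetical. Running the scenario with the pid directory outside the synced tree illustrates one plausible mitigation.

```python
import os
import shutil
import tempfile


def mirror_delete(src: str, dst: str) -> None:
    """Simplified model of `rsync -a --delete src/ dst`: make dst match src,
    deleting anything in dst that does not exist in src."""
    # Walk bottom-up so files are handled before their parent directories.
    for root, dirs, files in os.walk(dst, topdown=False):
        rel = os.path.relpath(root, dst)
        for name in files:
            if not os.path.exists(os.path.join(src, rel, name)):
                os.remove(os.path.join(root, name))
        for name in dirs:
            if not os.path.exists(os.path.join(src, rel, name)):
                shutil.rmtree(os.path.join(root, name))
    # Copy everything from the master over the slave (Python 3.8+).
    shutil.copytree(src, dst, dirs_exist_ok=True)


def run_scenario(pid_inside_tree: bool) -> bool:
    """Return True if the datanode pid file survives start-dfs then start-mapred."""
    with tempfile.TemporaryDirectory() as base:
        master = os.path.join(base, "master_home")
        slave = os.path.join(base, "slave_home")
        os.makedirs(os.path.join(master, "conf"))
        with open(os.path.join(master, "conf", "hadoop-env.sh"), "w"):
            pass
        if pid_inside_tree:
            pid_dir = os.path.join(slave, "pid")  # inside the synced tree
        else:
            pid_dir = os.path.join(base, "pids")  # outside the synced tree

        # start-dfs.sh: sync from master, then the datanode writes its pid file.
        mirror_delete(master, slave)
        os.makedirs(pid_dir, exist_ok=True)
        pid_file = os.path.join(pid_dir, "hadoop-datanode.pid")
        with open(pid_file, "w") as f:
            f.write("12345\n")

        # start-mapred.sh: another sync from master before starting the tasktracker.
        mirror_delete(master, slave)
        return os.path.exists(pid_file)


if __name__ == "__main__":
    print(run_scenario(pid_inside_tree=True))   # False: the second sync deleted the pid file
    print(run_scenario(pid_inside_tree=False))  # True: the pid file survives outside the tree
```

In Hadoop's own scripts the pid location is set by the `HADOOP_PID_DIR` environment variable (typically in hadoop-env.sh); pointing it outside the directory tree that is synced from the master would correspond to the `pid_inside_tree=False` case.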

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
