On May 28, 2015, at 9:36 AM, Sangjin Lee <sj...@apache.org> wrote:

> Hi folks,
> 
> I noticed this while setting up a cluster based on the current trunk. It
> appears that setting HADOOP_HOME is now done much later (in
> hadoop_finalize) than branch-2. Importantly this is set *after*
> hadoop-env.sh (or yarn-env.sh) is invoked.
> 
> In our version of hadoop-env.sh, we have used $HADOOP_HOME to define some
> more variables, but it appears that we can no longer rely on the
> HADOOP_HOME value in our *-env.sh customization. Is this an intended change
> in the recent shell script refactoring? What is the right thing to use in
> hadoop-env.sh for the location of hadoop?

        a) HADOOP_HOME was deprecated on Unix systems as part of (IIRC) 0.21.  
HADOOP_PREFIX was its replacement.  (No, I never understood the reasoning for 
this either.)  Since 0.21, it has never been safe to rely on HADOOP_HOME in 
*-env.sh files unless it was set before running the shell commands.

        b) That said, functionality-wise, HADOOP_HOME is being set in pretty 
much the same place in the code flow.  In both branch-2 and trunk, *-env.sh has 
already been processed by the time HADOOP_HOME is configured.  trunk only 
configures HADOOP_HOME for backward compatibility.  The rest of the code uses 
HADOOP_PREFIX, as expected, and does so very early in the lifecycle.
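In practice, this means a hadoop-env.sh customization that used to read 
$HADOOP_HOME can derive its paths from HADOOP_PREFIX instead.  A minimal 
sketch, assuming HADOOP_PREFIX has already been set by the launcher scripts 
when the file is sourced; SITE_CONF_DIR and the site-conf subdirectory are 
hypothetical names used only for illustration:

```shell
# hadoop-env.sh fragment (sketch)
# Derive site-specific paths from HADOOP_PREFIX rather than HADOOP_HOME:
# HADOOP_PREFIX is expected to be set before *-env.sh is sourced,
# whereas HADOOP_HOME may only be set afterwards (or not at all).
if [ -n "${HADOOP_PREFIX:-}" ]; then
  # Hypothetical variable; substitute whatever your site actually needs.
  export SITE_CONF_DIR="${HADOOP_PREFIX}/site-conf"
else
  echo "hadoop-env.sh: HADOOP_PREFIX is not set; SITE_CONF_DIR left unset" >&2
fi
```

The guard avoids silently producing a path like "/site-conf" when the 
variable is missing.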

        What you are likely seeing is the result of a bug fix:  trunk doesn’t 
reprocess *-env.sh files when running the shell commands, whereas branch-2 
does so several times over.  (This is also one of the reasons Java command-line 
options get duplicated.)  So it likely worked for you only because of this 
broken behavior.
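The duplication follows from how env files are typically written: they append 
to HADOOP_OPTS, so every extra pass over the file appends the same flags 
again.  The sketch below simulates that with a hypothetical function, 
fake_env_sh, standing in for one such hadoop-env.sh line.  It illustrates the 
mechanism only; it is not the actual launcher code.

```shell
# Simulation (sketch): why re-sourcing an env file duplicates Java options.
# fake_env_sh is hypothetical; it mimics an env-file line of the form
#   export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
fake_env_sh() {
  export HADOOP_OPTS="${HADOOP_OPTS:-} -Djava.net.preferIPv4Stack=true"
}

# branch-2 style: the env file is processed more than once,
# so the flag is appended twice.
HADOOP_OPTS=""
fake_env_sh
fake_env_sh
echo "processed twice:${HADOOP_OPTS}"

# trunk style: the env file is processed once, so the flag appears once.
HADOOP_OPTS=""
fake_env_sh
echo "processed once:${HADOOP_OPTS}"
```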

        In my opinion, it is better practice to configure 
HADOOP_HOME/HADOOP_PREFIX outside of the *-env.sh files (e.g., in 
/etc/profile.d on Linux) so that they can also be used for PATH, etc.  That 
should guarantee the expected behavior.
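A minimal /etc/profile.d script along those lines might look like the 
following.  It assumes a Linux system whose /etc/profile sources 
/etc/profile.d/*.sh for login shells (true of most distributions) and a 
hypothetical install location of /opt/hadoop; the file name hadoop.sh is 
likewise just an example.

```shell
# /etc/profile.d/hadoop.sh (sketch)
# Sourced by login shells, so these variables exist before any Hadoop
# command runs and therefore before hadoop-env.sh is read.
# /opt/hadoop is a hypothetical install location; adjust to your layout.
export HADOOP_PREFIX=/opt/hadoop

# Keep HADOOP_HOME for older tools and scripts that still expect it.
export HADOOP_HOME="${HADOOP_PREFIX}"

# Put the Hadoop commands on the PATH.
export PATH="${HADOOP_PREFIX}/bin:${PATH}"
```

Pointing HADOOP_HOME at the same directory keeps older scripts working, while 
HADOOP_PREFIX remains the variable the launcher scripts themselves rely on.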
