[jira] [Commented] (HADOOP-7939) Improve Hadoop subcomponent integration in Hadoop 0.23

Roman Shaposhnik (Commented) (JIRA) Tue, 27 Dec 2011 11:50:55 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176288#comment-13176288
 ]


Roman Shaposhnik commented on HADOOP-7939:
------------------------------------------

@Allen,

first of all, I think there's a chicken and egg problem here. You're [somewhat] 
correct in saying that right now the state of separation between Hadoop 
components is not ideal. That said, I think it is unfair to use it as an excuse 
for NOT working on features that would help clean separation happen down the 
road.

Personally, I'm operating under the assumption that clean separation between 
these 4 parts of Hadoop is desirable. Please let me know if you believe I'm 
mistaken.

Now, once we agree on that, the next question is implementation. You are right 
that a cornucopia of env. variables is NOT an ideal solution. The trouble is -- 
we've already got pretty much as many of them in the current scripts, with all 
sorts of differences in semantics that prevent straightforward 
configuration. Worse yet, we've got code like this (ApplicationConstants.java):

{noformat}
  public static final String[] APPLICATION_CLASSPATH =
      new String[] {
        "$HADOOP_CONF_DIR",
        "$HADOOP_COMMON_HOME/share/hadoop/common/*",
        "$HADOOP_COMMON_HOME/share/hadoop/common/lib/*",
        "$HADOOP_HDFS_HOME/share/hadoop/hdfs/*",
        "$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*",
        "$YARN_HOME/share/hadoop/mapreduce/*",
        "$YARN_HOME/share/hadoop/mapreduce/lib/*"
      };
{noformat}
which only makes sense in a tarball deployment scenario and with very limited 
packaging. The fact that 'share/hadoop' is hardcoded all over the place was one 
of the main motivations for this JIRA. If we want YARN to be framework 
agnostic, all of the class path elements from the code I quoted above need to 
be tweakable (as in -- not force me to create dummy symlinks that end with 
share/hadoop/* just so that the code is happy). I think we can agree on that 
much.
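
To illustrate what "tweakable" could look like, here is a minimal sketch, not the actual YARN code. It assumes the HADOOP_<SC>_JARS and HADOOP_<SC>_HOME variables proposed in this JIRA; the function names jars_dir and build_classpath are hypothetical. Each class path element comes from a per-subcomponent variable and only falls back to the share/hadoop layout when nothing else is configured.

```shell
# Hypothetical helpers (names are illustrative, not part of Hadoop).
# Subcomponent names must be plain identifiers such as "common" or "hdfs".

# Print the jar directory for one subcomponent.
jars_dir() {
  _sc="$1"
  _upper=$(printf '%s' "$_sc" | tr '[:lower:]' '[:upper:]')
  # Indirect lookup of HADOOP_<SC>_JARS and HADOOP_<SC>_HOME via eval,
  # since POSIX sh has no ${!var} expansion.
  eval "_jars=\${HADOOP_${_upper}_JARS:-}"
  eval "_home=\${HADOOP_${_upper}_HOME:-}"
  if [ -n "$_jars" ]; then
    # Explicit setting wins: no share/hadoop suffix is forced on it.
    printf '%s\n' "$_jars"
  else
    # Fall back to the proposal's default layout.
    printf '%s\n' "${_home}/share/hadoop/${_sc}"
  fi
}

# Join HADOOP_CONF_DIR and each subcomponent's jar wildcards with ':'.
# The "/*" and "/lib/*" entries mirror the array quoted above.
build_classpath() {
  _cp="${HADOOP_CONF_DIR:-}"
  for _name in "$@"; do
    _dir=$(jars_dir "$_name")
    _cp="${_cp}:${_dir}/*:${_dir}/lib/*"
  done
  printf '%s\n' "$_cp"
}
```

With something like this, a packager could point HADOOP_HDFS_JARS at any directory and call build_classpath common hdfs without creating dummy symlinks.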

If all of the above makes sense to you, I think the final question to be 
answered is whether to go the route of env. variables or pkg-config type of 
system. Essentially what I'm suggesting here is the first step to pkg-config 
(with proposed <component>-env.sh scripts which we can rename if we want to). 
The number of variables is somewhat large, but so is the number of deployment 
aspects that need to be configured (e.g. I can no longer assume that the log 
files for Hadoop should be in $HADOOP_HOME/logs or even /var/log/hadoop for 
that matter, etc.). If any single one of the vars jumps out at you as 
redundant, please let me know.
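
As a rough sketch of what one of the proposed <component>-env.sh scripts might contain (the file name hdfs-env.sh and the function name hdfs_env_defaults are assumptions for illustration, and only a few of the proposed variables are shown), each variable keeps whatever value the environment already provides and otherwise takes the default from the proposal:

```shell
# Hypothetical hdfs-env.sh, meant to be sourced by launcher scripts.
# Wrapping the defaults in a function is an illustrative choice only.
hdfs_env_defaults() {
  # ":=" assigns the default only when the variable is unset or empty,
  # so values already set by the packager or admin are left alone.
  # The proposal's default for HOME is "$0/..".
  : "${HADOOP_HDFS_HOME:=$(dirname -- "$0")/..}"
  : "${HADOOP_HDFS_JARS:=${HADOOP_HDFS_HOME}/share/hadoop/hdfs}"
  : "${HADOOP_HDFS_CONF:=${HADOOP_HDFS_HOME}/conf}"
  : "${HADOOP_HDFS_LOG:=${HADOOP_HDFS_HOME}/log}"
  export HADOOP_HDFS_HOME HADOOP_HDFS_JARS HADOOP_HDFS_CONF HADOOP_HDFS_LOG
}

# Apply the defaults as soon as the file is sourced.
hdfs_env_defaults
```

A package that keeps logs under /var/log would then only set HADOOP_HDFS_LOG before sourcing the file and leave everything else at its default.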

Thanks,
Roman.
                
> Improve Hadoop subcomponent integration in Hadoop 0.23
> ------------------------------------------------------
>
>                 Key: HADOOP-7939
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7939
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: build, conf, documentation, scripts
>    Affects Versions: 0.23.0
>            Reporter: Roman Shaposhnik
>            Assignee: Roman Shaposhnik
>             Fix For: 0.23.1
>
>
> h1. Introduction
> For the rest of this proposal it is assumed that the current set
> of Hadoop subcomponents is:
>  * hadoop-common
>  * hadoop-hdfs
>  * hadoop-yarn
>  * hadoop-mapreduce
> It must be noted that this is an open ended list, though. For example,
> implementations of additional frameworks on top of yarn (e.g. MPI) would
> also be considered a subcomponent.
> h1. Problem statement
> Currently there's an unfortunate coupling and hard-coding present at the
> level of launcher scripts, configuration scripts and Java implementation
> code that prevents us from treating all subcomponents of Hadoop independently
> of each other. In a lot of places it is assumed that bits and pieces
> from individual subcomponents *must* be located at predefined places
> and they cannot be dynamically registered/discovered at runtime.
> This prevents a truly flexible deployment of Hadoop 0.23. 
> h1. Proposal
> NOTE: this is NOT a proposal for redefining the layout from HADOOP-6255. 
> The goal here is to keep as much of that layout in place as possible,
> while permitting different deployment layouts.
> The aim of this proposal is to introduce the needed level of indirection and
> flexibility in order to accommodate the current assumed layout of Hadoop
> tarball deployments and all the other styles of deployments as well. To this end the
> following set of environment variables needs to be uniformly used in all of
> the subcomponent's launcher scripts, configuration scripts and Java code
> (<SC> stands for a literal name of a subcomponent). These variables are
> expected to be defined by <SC>-env.sh scripts and sourcing those files is
> expected to have the desired effect of setting the environment up correctly.
>   # HADOOP_<SC>_HOME
>    ## root of the subtree in a filesystem where a subcomponent is expected to 
> be installed 
>    ## default value: $0/..
>   # HADOOP_<SC>_JARS 
>    ## a subdirectory with all of the jar files comprising subcomponent's 
> implementation 
>    ## default value: $(HADOOP_<SC>_HOME)/share/hadoop/$(<SC>)
>   # HADOOP_<SC>_EXT_JARS
>    ## a subdirectory with all of the jar files needed for extended 
> functionality of the subcomponent (nonessential for correct work of the basic 
> functionality)
>    ## default value: $(HADOOP_<SC>_HOME)/share/hadoop/$(<SC>)/ext
>   # HADOOP_<SC>_NATIVE_LIBS
>    ## a subdirectory with all the native libraries that component requires
>    ## default value: $(HADOOP_<SC>_HOME)/share/hadoop/$(<SC>)/native
>   # HADOOP_<SC>_BIN
>    ## a subdirectory with all of the launcher scripts specific to the client 
> side of the component
>    ## default value: $(HADOOP_<SC>_HOME)/bin
>   # HADOOP_<SC>_SBIN
>    ## a subdirectory with all of the launcher scripts specific to the 
> server/system side of the component
>    ## default value: $(HADOOP_<SC>_HOME)/sbin
>   # HADOOP_<SC>_LIBEXEC
>    ## a subdirectory with all of the launcher scripts that are internal to 
> the implementation and should *not* be invoked directly
>    ## default value: $(HADOOP_<SC>_HOME)/libexec
>   # HADOOP_<SC>_CONF
>    ## a subdirectory containing configuration files for a subcomponent
>    ## default value: $(HADOOP_<SC>_HOME)/conf
>   # HADOOP_<SC>_DATA
>    ## a subtree in the local filesystem for storing component's persistent 
> state
>    ## default value: $(HADOOP_<SC>_HOME)/data
>   # HADOOP_<SC>_LOG
>    ## a subdirectory for the subcomponent's log files to be stored
>    ## default value: $(HADOOP_<SC>_HOME)/log
>   # HADOOP_<SC>_RUN
>    ## a subdirectory with runtime system specific information
>    ## default value: $(HADOOP_<SC>_HOME)/run
>   # HADOOP_<SC>_TMP
>    ## a subdirectory with temporary files
>    ## default value: $(HADOOP_<SC>_HOME)/tmp

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
