[ https://issues.apache.org/jira/browse/HADOOP-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177834#comment-13177834 ]
Todd Lipcon commented on HADOOP-7939:
-------------------------------------

People seem to have missed Bruno's point here:

bq. Perhaps, I'm expecting too much, but I grew accustomed to my software letting me tweak these types of things however I want without resorting to changing files in what might as well be a R/O file-system (/usr that is).

This is the main issue with using symlinks for configuration. E.g. if you want the mapred logs to go into {{/data/2/mapred/logs}}, you'd need to modify a symlink at /usr/share/mapred/ or /usr/local/mapred/ or wherever, which isn't generally considered kosher (e.g. rpm verification will barf). Folks expect to use a configuration file to pick these locations -- as we already support in a couple of cases, for example by allowing users to override {{HADOOP_PID_DIR}} in {{hadoop-env.sh}}.

So I think Roman's point is that we allow this "override" in some places but not in others, and the names for the overrides are inconsistent. I agree with his point that we should name them consistently across the components.

On the other hand, the "configurability" only really makes sense for directories like DATA, TMP, PIDS, LOGS, etc., where Hadoop will be writing data. For JARS and NATIVE_LIBS I'm not sure I understand the benefit. Something like EXT_JARS seems like a gray area -- we might expect users to be able to configure a list of directories here in order to "wire together" multiple packages or something.

Roman/Bruno -- can you give a concrete use case for why you'd want to override JARS, BIN, SBIN, etc.?

> Improve Hadoop subcomponent integration in Hadoop 0.23
> ------------------------------------------------------
>
>                 Key: HADOOP-7939
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7939
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: build, conf, documentation, scripts
>    Affects Versions: 0.23.0
>            Reporter: Roman Shaposhnik
>            Assignee: Roman Shaposhnik
>             Fix For: 0.23.1
>
>
> h1. Introduction
>
> For the rest of this proposal it is assumed that the current set
> of Hadoop subcomponents is:
> * hadoop-common
> * hadoop-hdfs
> * hadoop-yarn
> * hadoop-mapreduce
>
> It must be noted that this is an open-ended list, though. For example,
> implementations of additional frameworks on top of yarn (e.g. MPI) would
> also be considered a subcomponent.
>
> h1. Problem statement
>
> Currently there's an unfortunate coupling and hard-coding present at the
> level of launcher scripts, configuration scripts and Java implementation
> code that prevents us from treating all subcomponents of Hadoop independently
> of each other. In a lot of places it is assumed that bits and pieces
> from individual subcomponents *must* be located at predefined places
> and cannot be dynamically registered/discovered at runtime.
> This prevents a truly flexible deployment of Hadoop 0.23.
>
> h1. Proposal
>
> NOTE: this is NOT a proposal for redefining the layout from HADOOP-6255.
> The goal here is to keep as much of that layout in place as possible,
> while permitting different deployment layouts.
>
> The aim of this proposal is to introduce the needed level of indirection and
> flexibility in order to accommodate the currently assumed layout of Hadoop
> tarball deployments as well as all the other styles of deployment. To this
> end the following set of environment variables needs to be used uniformly in
> all of the subcomponents' launcher scripts, configuration scripts and Java
> code (<SC> stands for the literal name of a subcomponent). These variables
> are expected to be defined by <SC>-env.sh scripts, and sourcing those files
> is expected to have the desired effect of setting the environment up
> correctly.
>
> # HADOOP_<SC>_HOME
> ## root of the subtree in a filesystem where a subcomponent is expected to be installed
> ## default value: $0/..
> # HADOOP_<SC>_JARS
> ## a subdirectory with all of the jar files comprising the subcomponent's implementation
> ## default value: $(HADOOP_<SC>_HOME)/share/hadoop/$(<SC>)
> # HADOOP_<SC>_EXT_JARS
> ## a subdirectory with all of the jar files needed for extended functionality of the subcomponent (nonessential for correct work of the basic functionality)
> ## default value: $(HADOOP_<SC>_HOME)/share/hadoop/$(<SC>)/ext
> # HADOOP_<SC>_NATIVE_LIBS
> ## a subdirectory with all the native libraries that the component requires
> ## default value: $(HADOOP_<SC>_HOME)/share/hadoop/$(<SC>)/native
> # HADOOP_<SC>_BIN
> ## a subdirectory with all of the launcher scripts specific to the client side of the component
> ## default value: $(HADOOP_<SC>_HOME)/bin
> # HADOOP_<SC>_SBIN
> ## a subdirectory with all of the launcher scripts specific to the server/system side of the component
> ## default value: $(HADOOP_<SC>_HOME)/sbin
> # HADOOP_<SC>_LIBEXEC
> ## a subdirectory with all of the launcher scripts that are internal to the implementation and should *not* be invoked directly
> ## default value: $(HADOOP_<SC>_HOME)/libexec
> # HADOOP_<SC>_CONF
> ## a subdirectory containing configuration files for a subcomponent
> ## default value: $(HADOOP_<SC>_HOME)/conf
> # HADOOP_<SC>_DATA
> ## a subtree in the local filesystem for storing the component's persistent state
> ## default value: $(HADOOP_<SC>_HOME)/data
> # HADOOP_<SC>_LOG
> ## a subdirectory for the subcomponent's log files to be stored
> ## default value: $(HADOOP_<SC>_HOME)/log
> # HADOOP_<SC>_RUN
> ## a subdirectory with runtime system specific information
> ## default value: $(HADOOP_<SC>_HOME)/run
> # HADOOP_<SC>_TMP
> ## a subdirectory with temporary files
> ## default value: $(HADOOP_<SC>_HOME)/tmp
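The defaulting scheme above could be implemented in an <SC>-env.sh along these lines. This is a minimal sketch using hdfs as the example subcomponent; the {{${VAR:-default}}} expansion style is an assumption for illustration, not the committed implementation:

```shell
# Sketch of an <SC>-env.sh for <SC> = hdfs. Each variable keeps any value
# the deployer already exported, and otherwise falls back to the default
# location under HADOOP_HDFS_HOME as listed above.
HADOOP_HDFS_HOME="${HADOOP_HDFS_HOME:-$(cd "$(dirname "$0")/.." && pwd)}"
HADOOP_HDFS_JARS="${HADOOP_HDFS_JARS:-$HADOOP_HDFS_HOME/share/hadoop/hdfs}"
HADOOP_HDFS_EXT_JARS="${HADOOP_HDFS_EXT_JARS:-$HADOOP_HDFS_HOME/share/hadoop/hdfs/ext}"
HADOOP_HDFS_NATIVE_LIBS="${HADOOP_HDFS_NATIVE_LIBS:-$HADOOP_HDFS_HOME/share/hadoop/hdfs/native}"
HADOOP_HDFS_BIN="${HADOOP_HDFS_BIN:-$HADOOP_HDFS_HOME/bin}"
HADOOP_HDFS_SBIN="${HADOOP_HDFS_SBIN:-$HADOOP_HDFS_HOME/sbin}"
HADOOP_HDFS_LIBEXEC="${HADOOP_HDFS_LIBEXEC:-$HADOOP_HDFS_HOME/libexec}"
HADOOP_HDFS_CONF="${HADOOP_HDFS_CONF:-$HADOOP_HDFS_HOME/conf}"
HADOOP_HDFS_DATA="${HADOOP_HDFS_DATA:-$HADOOP_HDFS_HOME/data}"
HADOOP_HDFS_LOG="${HADOOP_HDFS_LOG:-$HADOOP_HDFS_HOME/log}"
HADOOP_HDFS_RUN="${HADOOP_HDFS_RUN:-$HADOOP_HDFS_HOME/run}"
HADOOP_HDFS_TMP="${HADOOP_HDFS_TMP:-$HADOOP_HDFS_HOME/tmp}"
export HADOOP_HDFS_HOME HADOOP_HDFS_JARS HADOOP_HDFS_EXT_JARS \
       HADOOP_HDFS_NATIVE_LIBS HADOOP_HDFS_BIN HADOOP_HDFS_SBIN \
       HADOOP_HDFS_LIBEXEC HADOOP_HDFS_CONF HADOOP_HDFS_DATA \
       HADOOP_HDFS_LOG HADOOP_HDFS_RUN HADOOP_HDFS_TMP
```

A launcher script would then only need to source this file; since every assignment respects a pre-existing value, a packager can relocate any directory by exporting the corresponding variable before the script runs.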
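With such variables in place, the symlink-free overrides discussed in the comment above reduce to plain assignments in a configuration file. A hypothetical snippet ({{HADOOP_PID_DIR}} is the existing override mentioned earlier; {{HADOOP_MAPREDUCE_LOG}} follows the proposed naming and is shown purely for illustration):

```shell
# Hypothetical hadoop-env.sh / <SC>-env.sh overrides: point the writable
# directories at /var and /data instead of touching anything under /usr.
export HADOOP_PID_DIR=/var/run/hadoop             # existing override
export HADOOP_MAPREDUCE_LOG=/data/2/mapred/logs   # proposed-style override
```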