[jira] [Commented] (HADOOP-7939) Improve Hadoop subcomponent integration in Hadoop 0.23

Arun C Murthy (Commented) (JIRA) Sun, 15 Jan 2012 01:21:31 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186458#comment-13186458
 ]


Arun C Murthy commented on HADOOP-7939:
---------------------------------------

Eric & Bruno - I urge you both to exercise restraint as I see this becoming a 
religious war of opinions (I do appreciate Bruno's apology).

----

I've been hoping we could come to a simple consensus quickly, but I'm starting 
to worry this is going to be an ongoing back-and-forth between Hadoop packaging 
and Bigtop and that both parties always leave with a bitter aftertaste. Now 
we've added YARN and I expect we'll have further issues in the future. For e.g. 
there is some chance we'll have an MPI sub-component for YARN... etc. etc.

Worse, this might not be just limited to Apache Hadoop and will repeat in lots 
of projects in the Hadoop ecosystem.

Frankly, I don't care a whole lot and I trust folks like Roman, Eric & Bruno 
are more qualified on packaging matters and I'd hate to have to mediate on an 
ongoing basis since I obviously care about having decent packaging for all the 
work I do here (and I'm sure other developers feel the same). It would mean I'd 
have to spend a lot more time caring about FHS etc. in order to argue with two 
sets of experts, something I can do without.

However, I'm coming to the conclusion, and I say this with trepidation that I 
may be interpreted wrongly, that we'll never get over these while Apache Bigtop 
keeps packaging of Hadoop projects external to the projects themselves. 

(I believe this has been brought up earlier and I confess ignorance to the 
reasons which led to marginalizing this issue. I'm happy to be educated.)

Thus, I'd strongly urge that the Apache Bigtop community to consider resolving 
these issues *once & for all* and to contribute these back to the projects for 
a couple of significant benefits:
a) The individual projects benefit from expertise & oversight of folks like 
Roman & Bruno (along with folks like Eric) so we make the right decisions 
*up-front* and not as an after-thought when we hit an issue via Bigtop.
b) We *never* have to play this go-around between Hadoop projects and Bigtop on 
an ongoing basis which invariably will lead to religious wars of opinions as 
noticed on this jira.

I say all this with utmost humility - please do not interpret this in the wrong 
manner.
                
> Improve Hadoop subcomponent integration in Hadoop 0.23
> ------------------------------------------------------
>
>                 Key: HADOOP-7939
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7939
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: build, conf, documentation, scripts
>    Affects Versions: 0.23.0
>            Reporter: Roman Shaposhnik
>            Assignee: Roman Shaposhnik
>             Fix For: 0.23.1
>
>         Attachments: HADOOP-7939.patch.txt, hadoop-layout.sh
>
>
> h1. Introduction
> For the rest of this proposal it is assumed that the current set
> of Hadoop subcomponents is:
>  * hadoop-common
>  * hadoop-hdfs
>  * hadoop-yarn
>  * hadoop-mapreduce
> It must be noted that this is an open ended list, though. For example,
> implementations of additional frameworks on top of yarn (e.g. MPI) would
> also be considered a subcomponent.
> h1. Problem statement
> Currently there's an unfortunate coupling and hard-coding present at the
> level of launcher scripts, configuration scripts and Java implementation
> code that prevents us from treating all subcomponents of Hadoop independently
> of each other. In a lot of places it is assumed that bits and pieces
> from individual subcomponents *must* be located at predefined places
> and they can not be dynamically registered/discovered during the runtime.
> This prevents a truly flexible deployment of Hadoop 0.23. 
> h1. Proposal
> NOTE: this is NOT a proposal for redefining the layout from HADOOP-6255. 
> The goal here is to keep as much of that layout in place as possible,
> while permitting different deployment layouts.
> The aim of this proposal is to introduce the needed level of indirection and
> flexibility in order to accommodate the current assumed layout of Hadoop 
> tarball
> deployments and all the other styles of deployments as well. To this end the
> following set of environment variables needs to be uniformly used in all of
> the subcomponent's launcher scripts, configuration scripts and Java code
> (<SC> stands for a literal name of a subcomponent). These variables are
> expected to be defined by <SC>-env.sh scripts and sourcing those files is
> expected to have the desired effect of setting the environment up correctly.
>   # HADOOP_<SC>_HOME
>    ## root of the subtree in a filesystem where a subcomponent is expected to 
> be installed 
>    ## default value: $0/..
>   # HADOOP_<SC>_JARS 
>    ## a subdirectory with all of the jar files comprising subcomponent's 
> implementation 
>    ## default value: $(HADOOP_<SC>_HOME)/share/hadoop/$(<SC>)
>   # HADOOP_<SC>_EXT_JARS
>    ## a subdirectory with all of the jar files needed for extended 
> functionality of the subcomponent (nonessential for correct work of the basic 
> functionality)
>    ## default value: $(HADOOP_<SC>_HOME)/share/hadoop/$(<SC>)/ext
>   # HADOOP_<SC>_NATIVE_LIBS
>    ## a subdirectory with all the native libraries that component requires
>    ## default value: $(HADOOP_<SC>_HOME)/share/hadoop/$(<SC>)/native
>   # HADOOP_<SC>_BIN
>    ## a subdirectory with all of the launcher scripts specific to the client 
> side of the component
>    ## default value: $(HADOOP_<SC>_HOME)/bin
>   # HADOOP_<SC>_SBIN
>    ## a subdirectory with all of the launcher scripts specific to the 
> server/system side of the component
>    ## default value: $(HADOOP_<SC>_HOME)/sbin
>   # HADOOP_<SC>_LIBEXEC
>    ## a subdirectory with all of the launcher scripts that are internal to 
> the implementation and should *not* be invoked directly
>    ## default value: $(HADOOP_<SC>_HOME)/libexec
>   # HADOOP_<SC>_CONF
>    ## a subdirectory containing configuration files for a subcomponent
>    ## default value: $(HADOOP_<SC>_HOME)/conf
>   # HADOOP_<SC>_DATA
>    ## a subtree in the local filesystem for storing component's persistent 
> state
>    ## default value: $(HADOOP_<SC>_HOME)/data
>   # HADOOP_<SC>_LOG
>    ## a subdirectory for subcomponents's log files to be stored
>    ## default value: $(HADOOP_<SC>_HOME)/log
>   # HADOOP_<SC>_RUN
>    ## a subdirectory with runtime system specific information
>    ## default value: $(HADOOP_<SC>_HOME)/run
>   # HADOOP_<SC>_TMP
>    ## a subdirectory with temprorary files
>    ## default value: $(HADOOP_<SC>_HOME)/tmp

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-7939) Improve Hadoop subcomponent integration in Hadoop 0.23

Reply via email to