[jira] [Commented] (HADOOP-7939) Improve Hadoop subcomponent integration in Hadoop 0.23

Arun C Murthy (Commented) (JIRA) Wed, 18 Jan 2012 08:15:07 -0800

    [ https://issues.apache.org/jira/browse/HADOOP-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188529#comment-13188529 ]


Arun C Murthy commented on HADOOP-7939:
---------------------------------------

Thanks for the discussion, Doug. I agree a long-drawn-out argument isn't 
productive. However, please indulge me a little longer (at least for my own 
education *smile*). 

----

I agree that having downstream packagers is useful and very common. However, it 
is uncommon for downstream packagers to seek changes upstream, particularly to 
start/stop scripts. They typically maintain their own, i.e. they *carry the 
burden of maintenance*. It would not be unreasonable for Bigtop to do the same, 
i.e. maintain its own bin/hadoop etc., though that is not what I would prefer.

Yes, historically packaging has been done downstream, but not in Hadoop's case. 
We have had our own scripts and packaging (tarballs, RPMs, etc.) for a long 
while, and we need to continue supporting them for compatibility. 

Also, Bigtop is an Apache project, which makes it very different from a random 
downstream packager/distro. It seems we could do better here at the ASF by 
having the two communities collaborate more closely.

OTOH, we are currently debating adding features here which Apache Hadoop itself 
will never use, and whose maintenance burden we would then assume.

If the argument comes down to 'Hadoop scripts are a mess, so there is no harm in 
adding some more', then I have very little sympathy for it, much as I agree we 
can do better.

It seems to me we could eat our own dogfood all the time by merging the 
communities for 'packaging' (alone), reducing dead code and increasing 
collaboration. Clearly Bigtop is more than just packaging, i.e. it also does 
stack validation etc., which belongs in a separate project.

My primary interest is to have as little 'dead' code in Hadoop as possible, and 
it seems to me we are adding a fair number of variables (features) we'll never 
use in Hadoop. By having Bigtop contribute the packaging back to the project, we 
could all share the burden of maintenance. Clearly, taking away features is 
always harder than adding them, so we should be careful about adding them. 

Thus, it would be useful for folks in the Apache Bigtop project to share why they 
feel they cannot collaborate with Apache Hadoop, which leads to two different 
implementations of packaging for Hadoop within the ASF. 

----

Again, I appreciate this healthy discussion.
                
> Improve Hadoop subcomponent integration in Hadoop 0.23
> ------------------------------------------------------
>
>                 Key: HADOOP-7939
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7939
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: build, conf, documentation, scripts
>    Affects Versions: 0.23.0
>            Reporter: Roman Shaposhnik
>            Assignee: Roman Shaposhnik
>             Fix For: 0.23.1
>
>         Attachments: HADOOP-7939.patch.txt, hadoop-layout.sh
>
>
> h1. Introduction
> For the rest of this proposal it is assumed that the current set
> of Hadoop subcomponents is:
>  * hadoop-common
>  * hadoop-hdfs
>  * hadoop-yarn
>  * hadoop-mapreduce
> It must be noted that this is an open-ended list, though. For example,
> implementations of additional frameworks on top of YARN (e.g. MPI) would
> also be considered subcomponents.
> h1. Problem statement
> Currently there is unfortunate coupling and hard-coding present at the
> level of launcher scripts, configuration scripts and Java implementation
> code that prevents us from treating all subcomponents of Hadoop independently
> of each other. In a lot of places it is assumed that bits and pieces
> from individual subcomponents *must* be located at predefined places
> and cannot be dynamically registered/discovered at runtime.
> This prevents a truly flexible deployment of Hadoop 0.23. 
> h1. Proposal
> NOTE: this is NOT a proposal for redefining the layout from HADOOP-6255. 
> The goal here is to keep as much of that layout in place as possible,
> while permitting different deployment layouts.
> The aim of this proposal is to introduce the needed level of indirection and
> flexibility in order to accommodate the currently assumed layout of Hadoop
> tarball deployments as well as all other styles of deployment. To this end the
> following set of environment variables needs to be used uniformly in all of
> the subcomponents' launcher scripts, configuration scripts and Java code
> (<SC> stands for the literal name of a subcomponent). These variables are
> expected to be defined by <SC>-env.sh scripts, and sourcing those files is
> expected to have the desired effect of setting up the environment correctly.
>   # HADOOP_<SC>_HOME
>    ## root of the subtree in a filesystem where a subcomponent is expected to be installed
>    ## default value: $0/..
>   # HADOOP_<SC>_JARS
>    ## a subdirectory with all of the jar files comprising the subcomponent's implementation
>    ## default value: $(HADOOP_<SC>_HOME)/share/hadoop/$(<SC>)
>   # HADOOP_<SC>_EXT_JARS
>    ## a subdirectory with all of the jar files needed for extended functionality of the subcomponent (not essential for the basic functionality to work correctly)
>    ## default value: $(HADOOP_<SC>_HOME)/share/hadoop/$(<SC>)/ext
>   # HADOOP_<SC>_NATIVE_LIBS
>    ## a subdirectory with all of the native libraries that the component requires
>    ## default value: $(HADOOP_<SC>_HOME)/share/hadoop/$(<SC>)/native
>   # HADOOP_<SC>_BIN
>    ## a subdirectory with all of the launcher scripts specific to the client side of the component
>    ## default value: $(HADOOP_<SC>_HOME)/bin
>   # HADOOP_<SC>_SBIN
>    ## a subdirectory with all of the launcher scripts specific to the server/system side of the component
>    ## default value: $(HADOOP_<SC>_HOME)/sbin
>   # HADOOP_<SC>_LIBEXEC
>    ## a subdirectory with all of the launcher scripts that are internal to the implementation and should *not* be invoked directly
>    ## default value: $(HADOOP_<SC>_HOME)/libexec
>   # HADOOP_<SC>_CONF
>    ## a subdirectory containing configuration files for a subcomponent
>    ## default value: $(HADOOP_<SC>_HOME)/conf
>   # HADOOP_<SC>_DATA
>    ## a subtree in the local filesystem for storing component's persistent state
>    ## default value: $(HADOOP_<SC>_HOME)/data
>   # HADOOP_<SC>_LOG
>    ## a subdirectory where the subcomponent's log files are stored
>    ## default value: $(HADOOP_<SC>_HOME)/log
>   # HADOOP_<SC>_RUN
>    ## a subdirectory with runtime, system-specific information
>    ## default value: $(HADOOP_<SC>_HOME)/run
>   # HADOOP_<SC>_TMP
>    ## a subdirectory with temporary files
>    ## default value: $(HADOOP_<SC>_HOME)/tmp
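
To make the default scheme in the list concrete, here is a minimal Python sketch of how a launcher might resolve the HADOOP_<SC>_* variables for one subcomponent. It is illustrative only and rests on assumptions the proposal does not state: `$0/..` is read as the parent of the directory containing the launcher script, `$(<SC>)` is read as the lowercase subcomponent name, and a variable that is unset or empty falls back to its default (mirroring shell `${VAR:-default}`). The helper names `resolve_layout` and `discover_subcomponents` are hypothetical, and treating every HADOOP_<SC>_HOME in the environment as a registered subcomponent is just one possible way to get the runtime discovery the problem statement asks for.

```python
import posixpath
import re
from typing import Dict, List, Mapping

# Defaults from the proposal, relative to HADOOP_<SC>_HOME.
# ASSUMPTION: "{sc}" stands for $(<SC>), taken here as the lowercase
# subcomponent name (e.g. "hdfs").
RELATIVE_DEFAULTS = {
    "JARS": "share/hadoop/{sc}",
    "EXT_JARS": "share/hadoop/{sc}/ext",
    "NATIVE_LIBS": "share/hadoop/{sc}/native",
    "BIN": "bin",
    "SBIN": "sbin",
    "LIBEXEC": "libexec",
    "CONF": "conf",
    "DATA": "data",
    "LOG": "log",
    "RUN": "run",
    "TMP": "tmp",
}

# Hypothetical discovery rule: any HADOOP_<SC>_HOME variable registers <SC>.
HOME_VAR = re.compile(r"^HADOOP_([A-Z0-9]+)_HOME$")


def resolve_layout(sc: str, env: Mapping[str, str], script_path: str) -> Dict[str, str]:
    """Return every HADOOP_<SC>_* setting for one subcomponent (hypothetical helper).

    A variable already set to a non-empty value in `env` wins; otherwise
    the proposal's default is used, like ${VAR:-default} in sh.
    """
    prefix = f"HADOOP_{sc.upper()}_"
    # ASSUMPTION: "$0/.." means the parent of the directory holding the
    # launcher script, e.g. /opt/hadoop/bin/hdfs -> /opt/hadoop.
    default_home = posixpath.dirname(posixpath.dirname(script_path))
    home = env.get(prefix + "HOME") or default_home

    layout = {prefix + "HOME": home}
    for suffix, relative in RELATIVE_DEFAULTS.items():
        name = prefix + suffix
        layout[name] = env.get(name) or posixpath.join(home, relative.format(sc=sc.lower()))
    return layout


def discover_subcomponents(env: Mapping[str, str]) -> List[str]:
    """List subcomponents registered via HADOOP_<SC>_HOME (hypothetical helper)."""
    found = []
    for key in env:
        match = HOME_VAR.match(key)
        if match:
            found.append(match.group(1).lower())
    return sorted(found)


if __name__ == "__main__":
    env = {"HADOOP_HDFS_HOME": "/opt/hadoop", "HADOOP_HDFS_CONF": "/etc/hadoop/conf"}
    layout = resolve_layout("hdfs", env, "/opt/hadoop/bin/hdfs")
    print(layout["HADOOP_HDFS_JARS"])   # /opt/hadoop/share/hadoop/hdfs
    print(layout["HADOOP_HDFS_CONF"])   # /etc/hadoop/conf (explicit override kept)
    print(discover_subcomponents(env))  # ['hdfs']
```

Because each variable is looked up before its default is applied, a packager could relocate only, say, HADOOP_HDFS_CONF to /etc/hadoop/conf while every other directory keeps the tarball layout.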

--
This message is automatically generated by JIRA.
