Encapsulating startup scripts and jars in a single Jar file.
------------------------------------------------------------

                 Key: HADOOP-435
                 URL: http://issues.apache.org/jira/browse/HADOOP-435
             Project: Hadoop
          Issue Type: New Feature
    Affects Versions: 0.5.0
            Reporter: Benjamin Reed
         Attachments: hadoopit.patch

Currently, hadoop is a set of scripts, configurations, and jar files, which 
makes it a pain to install on compute nodes and datanodes. It also makes it a 
pain to set up clients so that they can use hadoop. Every time things are 
updated, the pain begins again.

I suggest that we should be able to build a single Jar file that has a 
Main-Class defined with the configuration built in so that we can distribute 
that one file to nodes and clients on updates. One nice thing that I haven't 
done would be to make the jarfile downloadable from the JobTracker webpage so 
that clients can easily submit the jobs.

I currently use such a setup on my small cluster. To start the job tracker I 
use "java -jar hadoop.jar -l /tmp/log jobtracker"; to submit a job I use "java 
-jar hadoop.jar jar wordcount.jar". I have used the client on my Linux and 
Mac OS X machines, and all I need installed is Java and the hadoop.jar file.

hadoop.jar helps with logfiles and configurations. The default of pulling the 
config files from the jar file can be overridden by specifying a config 
directory so that you can easily have machine specific configs and still have 
the same hadoop.jar on all machines.
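
The override behaviour described above could be sketched roughly as follows. 
This is only an illustration, not the code in hadoopit.patch: the class name 
ConfigLocator and the method open are hypothetical, and the sketch assumes the 
bundled config files sit at the root of the jar's classpath.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; names are illustrative, not taken from the patch.
final class ConfigLocator {
    private ConfigLocator() {}

    /**
     * Opens the named config file. If confDir is non-null and contains the
     * file, that machine-specific copy wins. Otherwise the lookup falls back
     * to the classpath, i.e. the copy bundled inside the jar.
     * Returns null if neither location has the file.
     */
    static InputStream open(Path confDir, String name) throws IOException {
        if (confDir != null) {
            Path candidate = confDir.resolve(name);
            if (Files.isRegularFile(candidate)) {
                return Files.newInputStream(candidate);
            }
        }
        return ConfigLocator.class.getClassLoader().getResourceAsStream(name);
    }
}
```

With something like this, every machine can carry the same hadoop.jar, and 
only the machines that need different settings get a config directory.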

Here are the available commands from hadoop.jar:
USAGE: hadoop [-l logdir] command
  User commands:
    dfs          run a DFS admin client
    jar          run a JAR file
    job          manipulate MapReduce jobs
    fsck         run a DFS filesystem check utility
  Runtime startup commands:
    datanode     run a DFS datanode
    jobtracker   run the MapReduce job Tracker node
    namenode     run the DFS namenode (namenode -format formats the FS)
    tasktracker  run a MapReduce task Tracker node
  HadoopLoader commands:
    buildJar     builds the HadoopLoader jar file
    conf         dump hadoop configuration
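
For illustration, the "[-l logdir] command" form shown in the usage text could 
be parsed along these lines. This is a minimal sketch under my own 
assumptions; LoaderArgs and its fields are hypothetical names and do not come 
from the attached patch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical holder for the parsed command line; not from the patch.
final class LoaderArgs {
    final String logDir;      // null when -l was not given
    final String command;     // e.g. "jobtracker" or "jar"
    final List<String> rest;  // arguments passed on to the command

    private LoaderArgs(String logDir, String command, List<String> rest) {
        this.logDir = logDir;
        this.command = command;
        this.rest = rest;
    }

    /** Parses "[-l logdir] command [args...]". */
    static LoaderArgs parse(String[] argv) {
        int i = 0;
        String logDir = null;
        if (argv.length > 0 && argv[0].equals("-l")) {
            if (argv.length < 2) {
                throw new IllegalArgumentException("-l requires a log directory");
            }
            logDir = argv[1];
            i = 2;
        }
        if (i >= argv.length) {
            throw new IllegalArgumentException("USAGE: hadoop [-l logdir] command");
        }
        String command = argv[i];
        List<String> rest = Arrays.asList(Arrays.copyOfRange(argv, i + 1, argv.length));
        return new LoaderArgs(logDir, command, rest);
    }
}
```

Under this sketch, "-l /tmp/log jobtracker" yields logDir "/tmp/log" and 
command "jobtracker", while "jar wordcount.jar" yields command "jar" with 
"wordcount.jar" left over for the command itself.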

Note, I don't have the classes for hadoop streaming built into this Jar file, 
but if I did, streaming would also appear as an option (hadoop.jar checks for 
the needed classes before displaying an option). This makes it very easy for 
users who just write scripts to use hadoop straight from their machines.
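
One way to check for needed classes before displaying an option is to probe 
the classpath without initializing the class. The sketch below is an 
assumption about how such a check might look, not the patch's actual code; 
CommandAvailability is a hypothetical name.

```java
// Hypothetical helper; illustrates the idea only.
final class CommandAvailability {
    private CommandAvailability() {}

    /**
     * Returns true if the named class can be loaded from this jar's
     * classpath. The class is loaded but not initialized, so static
     * initializers do not run during the check.
     */
    static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, CommandAvailability.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }
}
```

The usage printer would then list a command only when isAvailable returns 
true for the class that implements it, so a jar built without the streaming 
classes simply would not show a streaming command.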

I'm also attaching the start.sh and stop.sh scripts that I use. These are the 
only scripts I use to start up the daemons. They are very simple, and the 
start.sh script uses the config file to figure out whether or not to start the 
jobtracker and the namenode.

The attached patch adds the HadoopIt patch, modifies the Configuration class to 
find the config files correctly, and modifies the build to make a fully 
contained hadoop.jar. To update the configuration in a hadoop.jar you simply 
use "zip hadoop.jar hadoop-site.xml".
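
Where the zip tool is not available, the same in-place update can be 
approximated from Java using the JDK's zip file system provider. This is a 
sketch of an alternative to the zip command above, not part of the attached 
patch; JarConfigUpdater and replaceEntry are hypothetical names, and it 
assumes the config file lives at the root of the jar (Java 9 or later for 
Map.of).

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Map;

// Hypothetical helper; mirrors "zip hadoop.jar hadoop-site.xml".
final class JarConfigUpdater {
    private JarConfigUpdater() {}

    /**
     * Copies newFile into the root of the given jar under its own file
     * name, replacing any existing entry with that name.
     */
    static void replaceEntry(Path jar, Path newFile) throws IOException {
        URI uri = URI.create("jar:" + jar.toAbsolutePath().toUri());
        // "create=false": the jar must already exist.
        try (FileSystem zipFs = FileSystems.newFileSystem(uri, Map.of("create", "false"))) {
            Path entry = zipFs.getPath("/" + newFile.getFileName());
            Files.copy(newFile, entry, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```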


        
