[
https://issues.apache.org/jira/browse/HADOOP-435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491448
]
Doug Cutting commented on HADOOP-435:
-------------------------------------
bq. Once we have everything in a single jar we can deal with incorporating in
the scripts as a separate issue.
I'll repeat my question, stated more precisely: is there a reason why we can't
or shouldn't switch bin/hadoop to invoke Java on the HadoopExe class,
eliminating the big if/else there that's duplicated by the logic in
HadoopExe.java? There's a strong reason to do this: replicated logic makes
code hard to maintain. Is there a strong reason not to?
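
For illustration only, here is a minimal sketch of what a single Java entry point with one command table could look like, so that the command-to-class mapping lives in exactly one place. This is not the code in the attached patch: the class name CommandDispatch, its methods, and the listed class names are assumptions made for the example. Each target class is looked up reflectively, so a command is only offered when its class is actually on the classpath.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical single entry point; not the HadoopExe class from the patch. */
final class CommandDispatch {
    /** Command name to main class. The class names are illustrative assumptions. */
    static final Map<String, String> COMMANDS = new LinkedHashMap<>();
    static {
        COMMANDS.put("namenode", "org.apache.hadoop.dfs.NameNode");
        COMMANDS.put("datanode", "org.apache.hadoop.dfs.DataNode");
        COMMANDS.put("jobtracker", "org.apache.hadoop.mapred.JobTracker");
        COMMANDS.put("tasktracker", "org.apache.hadoop.mapred.TaskTracker");
    }

    private CommandDispatch() {}

    /** Returns true if the named class can be found without initializing it. */
    static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, CommandDispatch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Returns the class name for a command, or null if unknown or unavailable. */
    static String resolve(Map<String, String> table, String command) {
        String className = table.get(command);
        return (className != null && isAvailable(className)) ? className : null;
    }

    public static void main(String[] args) throws Exception {
        String className = args.length > 0 ? resolve(COMMANDS, args[0]) : null;
        if (className == null) {
            // List only the commands whose classes are present.
            System.err.println("USAGE: hadoop command");
            for (Map.Entry<String, String> e : COMMANDS.entrySet()) {
                if (isAvailable(e.getValue())) {
                    System.err.println("  " + e.getKey());
                }
            }
            return;
        }
        // Hand the remaining arguments to the target class's main method.
        String[] rest = Arrays.copyOfRange(args, 1, args.length);
        Class.forName(className)
             .getMethod("main", String[].class)
             .invoke(null, (Object) rest);
    }
}
```

With something along these lines, bin/hadoop would only need to set up the classpath and JVM options and pass its arguments through, rather than repeating the mapping in shell.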
> Encapsulating startup scripts and jars in a single Jar file.
> ------------------------------------------------------------
>
> Key: HADOOP-435
> URL: https://issues.apache.org/jira/browse/HADOOP-435
> Project: Hadoop
> Issue Type: New Feature
> Affects Versions: 0.12.1
> Reporter: Benjamin Reed
> Fix For: 0.13.0
>
> Attachments: hadoop-exe.patch, hadoop-exe.patch, hadoop-exe.patch,
> hadoop-exe.patch, hadoopit.patch, hadoopit.patch, hadoopit.patch, start.sh,
> stop.sh
>
>
> Currently, hadoop is a set of scripts, configurations, and jar files. This
> makes it a pain to install on compute nodes and datanodes. It also makes it a
> pain to set up clients so that they can use hadoop. Every time things are
> updated the pain begins again.
> I suggest that we should be able to build a single Jar file that has a
> Main-Class defined with the configuration built in so that we can distribute
> that one file to nodes and clients on updates. One nice thing that I haven't
> done would be to make the jarfile downloadable from the JobTracker webpage so
> that clients can easily submit the jobs.
> I currently use such a setup on my small cluster. To start the job tracker I
> use "java -jar hadoop.jar -l /tmp/log jobtracker"; to submit a job I use
> "java -jar hadoop.jar jar wordcount.jar". I use the client on my Linux and
> Mac OS X machines, and all I need installed is Java and the hadoop.jar file.
> hadoop.jar helps with logfiles and configurations. The default of pulling the
> config files from the jar file can be overridden by specifying a config
> directory so that you can easily have machine specific configs and still have
> the same hadoop.jar on all machines.
> Here are the available commands from hadoop.jar:
> USAGE: hadoop [-l logdir] command
>   User commands:
>     dfs          run a DFS admin client
>     jar          run a JAR file
>     job          manipulate MapReduce jobs
>     fsck         run a DFS filesystem check utility
>   Runtime startup commands:
>     datanode     run a DFS datanode
>     jobtracker   run the MapReduce job Tracker node
>     namenode     run the DFS namenode (namenode -format formats the FS)
>     tasktracker  run a MapReduce task Tracker node
>   HadoopLoader commands:
>     buildJar     builds the HadoopLoader jar file
>     conf         dump hadoop configuration
> Note, I don't have the classes for hadoop streaming built into this Jar file,
> but if I did, that would also be an option (it checks for needed classes
> before displaying an option). It makes it very easy for users that just write
> scripts to use hadoop straight from their machines.
> I'm also attaching the start.sh and stop.sh scripts that I use. These are the
> only scripts I use to start up the daemons. They are very simple, and the
> start.sh script uses the config file to figure out whether or not to start
> the jobtracker and the nameserver.
> The attached patch adds the HadoopIt patch, modifies the Configuration class
> to find the config files correctly, and modifies the build to make a fully
> contained hadoop.jar. To update the configuration in a hadoop.jar you simply
> use "zip hadoop.jar hadoop-site.xml".
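
The description above says that configuration is read from inside the jar by default and that a config directory can override it. A minimal sketch of that lookup order follows. It is not the change the patch makes to the Configuration class: ConfigLocator and its open method are hypothetical names, and falling back to the bundled copy when the directory lacks the file is an assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper; illustrates the lookup order only. */
final class ConfigLocator {
    private ConfigLocator() {}

    /**
     * Opens a config file by name. A file in confDir wins if it exists;
     * otherwise the copy bundled on the classpath (for example inside the
     * jar) is used. Returns null if neither is present.
     */
    static InputStream open(Path confDir, String name) throws IOException {
        if (confDir != null) {
            Path candidate = confDir.resolve(name);
            if (Files.isRegularFile(candidate)) {
                return Files.newInputStream(candidate);
            }
        }
        // Assumption: fall back to the bundled resource when no override exists.
        return ConfigLocator.class.getClassLoader().getResourceAsStream(name);
    }

    public static void main(String[] args) throws IOException {
        // An optional first argument names the override directory.
        Path confDir = args.length > 0 ? Paths.get(args[0]) : null;
        try (InputStream in = open(confDir, "hadoop-site.xml")) {
            System.out.println(in == null
                    ? "hadoop-site.xml not found"
                    : "hadoop-site.xml found");
        }
    }
}
```

Under this ordering the same hadoop.jar can be shipped to every machine, and only machines that need different settings carry a config directory of their own.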