[
https://issues.apache.org/jira/browse/HADOOP-435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doug Cutting updated HADOOP-435:
--------------------------------
Fix Version/s: (was: 0.12.0)
Status: Open (was: Patch Available)
I think it could be useful to have an executable-jar-based version of Hadoop,
and Java-based top-level command dispatch. However I have a few concerns with
this patch as it stands.
1. I don't like the name 'HadoopIt'. I'd prefer something descriptive, like
"hadoop-exe".
2. It would be best if this used ToolBase and CLI, as recommended for all
Hadoop command-line programs, for command parsing, error reporting, etc.
3. There are some unexplained changes in StatusHttpServer.
4. The formatting is non-standard for Hadoop, using tabs instead of 2 spaces
for indentation.
5. This doesn't completely replace the shell scripts. For example, when
starting daemons, it doesn't check whether such a daemon is already running.
It doesn't look for appropriate native libraries. Etc. Perhaps some of the
script logic can be moved into Java, but this may be difficult or impossible in
some cases. This suggests that instead we might consider moving as much as
possible from scripts into Java (e.g., command dispatch) but not attempt to
move everything. Other successful Java systems (e.g., Tomcat & Ant) use
scripts to bootstrap, so I think that's an acceptable solution.
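To make the "command dispatch in Java" idea concrete, here is a minimal sketch. It is not taken from the patch and does not use ToolBase or any Hadoop class. The class name CommandDispatch, the Command interface, and the placeholder handlers are all hypothetical. It only shows a name-to-handler table that a small bootstrap script could invoke after it has set up the classpath and native libraries.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from the patch or from Hadoop.
public class CommandDispatch {

    /** A handler gets the arguments after the command name and returns an exit code. */
    public interface Command {
        int run(String[] args);
    }

    // LinkedHashMap keeps registration order, so the usage output is stable.
    private final Map<String, Command> commands = new LinkedHashMap<>();

    public void register(String name, Command command) {
        commands.put(name, command);
    }

    /** Looks up args[0] and runs it with the remaining arguments. */
    public int dispatch(String[] args) {
        if (args.length == 0) {
            System.err.println("USAGE: hadoop command [args], one of " + commands.keySet());
            return 1;
        }
        Command command = commands.get(args[0]);
        if (command == null) {
            System.err.println("Unknown command: " + args[0]);
            return 1;
        }
        return command.run(Arrays.copyOfRange(args, 1, args.length));
    }

    public static void main(String[] args) {
        CommandDispatch dispatcher = new CommandDispatch();
        // Placeholder handlers; a real one would start a daemon or a client.
        dispatcher.register("conf", rest -> {
            System.out.println("would dump configuration");
            return 0;
        });
        dispatcher.register("jar", rest -> {
            System.out.println("would run " + Arrays.toString(rest));
            return 0;
        });
        System.out.println("exit code: " + dispatcher.dispatch(args));
    }
}
```

With this split, checks such as "is the daemon already running" can stay in the script, and only the choice of which class to run moves into Java.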
> Encapsulating startup scripts and jars in a single Jar file.
> ------------------------------------------------------------
>
> Key: HADOOP-435
> URL: https://issues.apache.org/jira/browse/HADOOP-435
> Project: Hadoop
> Issue Type: New Feature
> Affects Versions: 0.12.0
> Reporter: Benjamin Reed
> Attachments: hadoopit.patch, hadoopit.patch, hadoopit.patch,
> start.sh, stop.sh
>
>
> Currently, hadoop is a set of scripts, configurations, and jar files. This
> makes it a pain to install on compute nodes and datanodes. It also makes it a
> pain to set up clients so that they can use hadoop. Every time things are
> updated the pain begins again.
> I suggest that we should be able to build a single Jar file that has a
> Main-Class defined with the configuration built in so that we can distribute
> that one file to nodes and clients on updates. One nice thing that I haven't
> done would be to make the jarfile downloadable from the JobTracker webpage so
> that clients can easily submit the jobs.
> I currently use such a setup on my small cluster. To start the job tracker I
> use "java -jar hadoop.jar -l /tmp/log jobtracker"; to submit a job I use
> "java -jar hadoop.jar jar wordcount.jar". I use the client on my Linux and
> Mac OS X machines, and all I need installed is Java and the hadoop.jar file.
> hadoop.jar helps with logfiles and configurations. The default of pulling the
> config files from the jar file can be overridden by specifying a config
> directory so that you can easily have machine specific configs and still have
> the same hadoop.jar on all machines.
> Here are the available commands from hadoop.jar:
>   USAGE: hadoop [-l logdir] command
>   User commands:
>     dfs          run a DFS admin client
>     jar          run a JAR file
>     job          manipulate MapReduce jobs
>     fsck         run a DFS filesystem check utility
>   Runtime startup commands:
>     datanode     run a DFS datanode
>     jobtracker   run the MapReduce job Tracker node
>     namenode     run the DFS namenode (namenode -format formats the FS)
>     tasktracker  run a MapReduce task Tracker node
>   HadoopLoader commands:
>     buildJar     builds the HadoopLoader jar file
>     conf         dump hadoop configuration
> Note that I don't have the classes for hadoop streaming built into this Jar
> file, but if I did, streaming would also be offered as a command (it checks
> for the needed classes before displaying a command). This makes it very easy
> for users who just write scripts to use hadoop straight from their machines.
> I'm also attaching the start.sh and stop.sh scripts that I use. These are the
> only scripts I use to start up the daemons. They are very simple, and the
> start.sh script uses the config file to figure out whether or not to start
> the jobtracker and the nameserver.
> The attached patch adds the HadoopIt patch, modifies the Configuration class
> to find the config files correctly, and modifies the build to make a fully
> contained hadoop.jar. To update the configuration in a hadoop.jar you simply
> use "zip hadoop.jar hadoop-site.xml".
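The config lookup described in the issue (use the files bundled in the jar unless a config directory is given) could look roughly like the sketch below. This is not the patch's change to the Configuration class. The class name ConfigLocator and the method open are hypothetical, and the sketch assumes the bundled files sit at the root of the jar so that they are visible as classpath resources.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: ConfigLocator is not part of Hadoop or of the patch.
public class ConfigLocator {

    /**
     * Opens the named config file from overrideDir if it exists there,
     * otherwise from the classpath (inside the jar when run with java -jar).
     * Returns null if the file is found in neither place.
     */
    public static InputStream open(String name, Path overrideDir) throws IOException {
        if (overrideDir != null) {
            Path candidate = overrideDir.resolve(name);
            if (Files.isRegularFile(candidate)) {
                // Machine-specific file wins over the one bundled in the jar.
                return Files.newInputStream(candidate);
            }
        }
        return ConfigLocator.class.getClassLoader().getResourceAsStream(name);
    }

    public static void main(String[] args) throws IOException {
        // Optional first argument: a config directory that overrides the jar.
        Path dir = args.length > 0 ? Paths.get(args[0]) : null;
        try (InputStream in = open("hadoop-site.xml", dir)) {
            System.out.println(in == null ? "hadoop-site.xml not found" : "hadoop-site.xml found");
        }
    }
}
```

Because the fallback is an ordinary classpath lookup, replacing hadoop-site.xml inside the jar with zip changes the default without any code change.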