Narayanan,

On Fri, Jul 1, 2011 at 11:28 AM, Narayanan K <knarayana...@gmail.com> wrote:
> Hi all,
>
> We are basically working on a research project and I require some help
> regarding this.

Always glad to see research work being done! What're you working on? :)

> How do I submit a mapreduce job from outside the cluster i.e from a
> different machine outside the Hadoop cluster?

If you use the Java APIs, call the Job#submit(…) method and/or the
JobClient.runJob(…) method.
Basically, Hadoop will take the jar holding your job's classes and
push it out to the JobTracker's filesystem (HDFS, if you run HDFS).
From there on, it's like a regular operation.

This is what happens on the Hadoop nodes themselves as well, so doing
it from an external machine should be no different at all, as long as
that machine has access to Hadoop's JT and HDFS.
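
If you want to confirm that access before submitting anything, a
quick TCP check from the external machine can help. Below is a minimal
sketch in plain Java (no Hadoop classes involved). The class name, the
host names and the ports (8020 for the NameNode, 8021 for the
JobTracker) are placeholders assumed for illustration; substitute
whatever your fs.default.name and mapred.job.tracker settings point at.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Checks whether the Hadoop master daemons accept TCP connections from this machine. */
final class ClusterReachability {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable host: all count as unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical defaults; pass your own hosts and ports as arguments.
        String nameNodeHost = args.length > 0 ? args[0] : "namenode.example.com";
        int nameNodePort = args.length > 1 ? Integer.parseInt(args[1]) : 8020;
        String jobTrackerHost = args.length > 2 ? args[2] : "jobtracker.example.com";
        int jobTrackerPort = args.length > 3 ? Integer.parseInt(args[3]) : 8021;

        boolean nameNodeOk = isReachable(nameNodeHost, nameNodePort, 3000);
        boolean jobTrackerOk = isReachable(jobTrackerHost, jobTrackerPort, 3000);

        System.out.println("NameNode " + nameNodeHost + ":" + nameNodePort
                + (nameNodeOk ? " reachable" : " NOT reachable"));
        System.out.println("JobTracker " + jobTrackerHost + ":" + jobTrackerPort
                + (jobTrackerOk ? " reachable" : " NOT reachable"));
    }
}
```

A successful connect only shows that the port is open from where you
sit. HDFS clients also talk to the DataNodes directly, so those need
to be reachable from the external machine too.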

If you are packing custom libraries along, don't forget to use
DistributedCache. If you are packing custom MR Java code, don't forget
to use Job#setJarByClass/JobConf#setJarByClass and other appropriate
API methods.

> If the above can be done, How can I schedule map reduce jobs to run in
> hadoop like crontab from a different machine?
> Are there any webservice APIs that I can leverage to access a hadoop cluster
> from outside and submit jobs or read/write data from HDFS.

For scheduling jobs, have a look at Oozie: http://yahoo.github.com/oozie/
It is well supported and is very useful for writing MR workflows
(which is a common requirement). You also get coordinator features,
which let you schedule jobs in a crontab-like fashion.

For HDFS r/w over the web, I'm not sure of an existing web app built
specifically for this purpose without limitations, but there is
contrib/thriftfs, which you can leverage (unless you write your own
web server in Java, in which case it's as simple as using the HDFS
APIs).

Also have a look at the pretty mature Hue project which aims to
provide a great frontend that lets you design jobs, submit jobs,
monitor jobs and upload files or browse the filesystem (among several
other things): http://cloudera.github.com/hue/

-- 
Harsh J
