Dhruba-

Just did. Thanks!

-Amin
This is cool work! A convenient place to document this information is in the
Hadoop wiki:

http://wiki.apache.org/hadoop/

At the bottom of this page, there is a section titled "Related Projects".
You might want to insert a link in that section.

thanks,
dhruba


On Wed, Feb 18, 2009 at 10:37 AM, Amin Astaneh <[email protected]> wrote:

Lukáš-

Well, we have a graduate student who is using our facilities for a
Master's thesis in Map/Reduce. You guys are generating topics in computer
science research.

What do we need to do in order to get our documentation on the Hadoop
pages?

-Amin

Thanks guys, it is good to hear that Hadoop is spreading... :-)
Regards,
Lukas

On Wed, Feb 18, 2009 at 5:24 PM, Steve Loughran <[email protected]>
wrote:



Amin Astaneh wrote:

Lukáš-

Hi Amin,
I am not familiar with SGE. Could you tell me what you got out of this
combination? What is the benefit of running Hadoop on SGE?

Sun Grid Engine is a distributed resource management platform for
supercomputing centers. We use it to allocate resources to a supercomputing
task, such as requesting 32 processors to run a particular simulation. This
mechanism is analogous to the scheduler on a multi-user OS. What I was able
to accomplish was to turn Hadoop into an as-needed service. When you submit
a job request to run Hadoop as the documentation describes, a cluster
configuration specific to that job request is generated and a Hadoop cluster
of the requested size is instantiated. This allows the Hadoop cluster to be
deployed within the context of Gridengine and to coexist with other
simulations running on the cluster.
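
To give a rough idea of the mechanism, the job script ends up looking
something like the sketch below (simplified and from memory, not our exact
scripts; the "hadoop" parallel environment name, paths, and file layout are
just placeholders):

    #!/bin/bash
    #$ -N hadoop-cluster
    #$ -cwd
    # The parallel environment and slot count come from the qsub command
    # line (-pe hadoop N), so the same script works for any cluster size.

    # Build a per-job Hadoop configuration from the hosts SGE granted us.
    CONF=$TMPDIR/hadoop-conf-$JOB_ID
    mkdir -p $CONF
    MASTER=`hostname`             # SGE runs this script on one of the granted hosts
    echo $MASTER > $CONF/masters  # secondary namenode placement
    awk -v m="$MASTER" '$1 != m {print $1}' $PE_HOSTFILE > $CONF/slaves

    # ...also write $CONF/hadoop-site.xml pointing fs.default.name and
    # mapred.job.tracker at $MASTER, plus per-job dfs.data.dir and the like.

    # Bring the cluster up, run the user's job, tear it all down again.
    $HADOOP_HOME/bin/start-all.sh --config $CONF
    $HADOOP_HOME/bin/hadoop --config $CONF jar "$@"
    $HADOOP_HOME/bin/stop-all.sh --config $CONF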

To the researcher or user who needs to run MapReduce code, all they have to
worry about is telling Hadoop what to execute and how many machines should
be dedicated to the task. This makes Hadoop very accessible, since people
don't need to worry about configuring a cluster; SGE and its helper scripts
do it for them.
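
So the user's whole interaction can be a single submission along these
lines (the jar and class names are just an example):

    # ask SGE for a 16-node Hadoop cluster and run one MapReduce job on it
    qsub -pe hadoop 16 run-hadoop.sh wordcount.jar org.myorg.WordCount input output

where run-hadoop.sh stands for the job script sketched above; SGE queues the
request until 16 machines are free, just like any other simulation.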

As Steve Loughran accurately commented, as of now we can only run one set of
Hadoop slave processes per machine, due to the network binding issue. That
problem is mitigated by configuring SGE to automatically spread the slaves
one per machine to avoid failures.
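
That behaviour lives in the parallel environment definition itself; the
relevant bit is allocation_rule 1, which tells SGE to grant exactly one slot
per host for a given job. Roughly, as qconf -sp would show it (illustrative
values only):

    pe_name            hadoop
    slots              999
    user_lists         NONE
    xuser_lists        NONE
    start_proc_args    /bin/true
    stop_proc_args     /bin/true
    allocation_rule    1
    control_slaves     FALSE
    job_is_first_task  TRUE
    urgency_slots      min

With that rule, two slave daemons never end up competing for the same ports
on one machine.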



Only the Namenode and JobTracker need hard-coded/well-known port numbers;
the rest could all be done dynamically.
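
e.g. in the generated hadoop-site.xml you could leave the slave-side daemons
to pick their own ports; if I remember correctly, port 0 means "grab any
free port" (untested sketch, so check the property names against your
version):

    <property><name>dfs.datanode.address</name><value>0.0.0.0:0</value></property>
    <property><name>dfs.datanode.ipc.address</name><value>0.0.0.0:0</value></property>
    <property><name>dfs.datanode.http.address</name><value>0.0.0.0:0</value></property>
    <property><name>mapred.task.tracker.http.address</name><value>0.0.0.0:0</value></property>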

One thing SGE does offer over Xen-hosted images is better performance than
virtual machines, for both CPU and storage: virtualised disk performance can
be awful, and even on the latest x86 parts there is a measurable hit from VM
overheads.