Re: Integration with SGE

Steve Loughran Wed, 18 Feb 2009 08:25:16 -0800

Amin Astaneh wrote:

Lukáš-
Hi Amin,
I am not familiar with SGE, do you think you could tell me what did you get
from this combination? What is the benefit of running Hadoop on SGE?
Sun Grid Engine is a distributed resource management platform for supercomputing centers. We use it to allocate resources to a supercomputing task, such as requesting 32 processors to run a particular simulation. This mechanism is analogous to the scheduler on a multi-user OS. What I was able to accomplish was to turn Hadoop into an as-needed service. When you submit a job request to run Hadoop as the documentation describes, a Hadoop cluster of arbitrary size is instantiated depending on how many nodes were requested by generating a cluster configuration specific to that job request. This allows the Hadoop cluster to be deployed within the context of Gridengine, as well as being able to coexist with other running simulations on the cluster.
To the researcher or user needing to run a mapreduce code, all they need to worry about is telling Hadoop to execute it as well as determining how many machines should be dedicated to the task. This benefit makes Hadoop very accessible to people since they don't need to worry about configuring a cluster; SGE and its helper scripts do it for them.
As Steve Loughran accurately commented, as of now we can only run one set of Hadoop slave processes per machine, due to the network binding issue. That problem is mitigated by configuring SGE to spread the slaves one per machine automatically to avoid failures.
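For readers who want a concrete picture of the per-job configuration step described above, here is a rough sketch in Python. It is not Amin's actual helper script. It assumes the job runs inside an SGE parallel environment, so that $PE_HOSTFILE names a file with one "hostname slots queue processors" line per allocated host, and that the cluster uses the single hadoop-site.xml layout. The function names, the choice of the first host as master, and ports 9000/9001 are illustrative assumptions.

```python
import os
from pathlib import Path


def parse_pe_hosts(text):
    """Return the unique hostnames, in order, from SGE PE host file contents."""
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        # The first column is the hostname; keep one entry per machine,
        # since only one set of slave processes can run on each host.
        if fields and fields[0] not in hosts:
            hosts.append(fields[0])
    return hosts


def choose_roles(hosts):
    """Assumption: first host is the master, the remaining hosts are slaves."""
    master = hosts[0]
    slaves = hosts[1:] or [master]  # single-node request: master is also the slave
    return master, slaves


def site_xml(master, fs_port=9000, jt_port=9001):
    """Build a minimal hadoop-site.xml; the port numbers are assumed, not prescribed."""
    props = {
        "fs.default.name": f"hdfs://{master}:{fs_port}",
        "mapred.job.tracker": f"{master}:{jt_port}",
    }
    body = "".join(
        f"  <property><name>{k}</name><value>{v}</value></property>\n"
        for k, v in props.items()
    )
    return f'<?xml version="1.0"?>\n<configuration>\n{body}</configuration>\n'


def write_cluster_conf(hosts, conf_dir):
    """Write masters, slaves and hadoop-site.xml for this job into conf_dir."""
    master, slaves = choose_roles(hosts)
    conf = Path(conf_dir)
    conf.mkdir(parents=True, exist_ok=True)
    (conf / "masters").write_text(master + "\n")
    (conf / "slaves").write_text("\n".join(slaves) + "\n")
    (conf / "hadoop-site.xml").write_text(site_xml(master))
    return master, slaves


if __name__ == "__main__":
    hostfile = os.environ.get("PE_HOSTFILE")
    if hostfile:
        # JOB_ID is set by SGE; the directory name here is hypothetical.
        job_conf = Path.cwd() / f"hadoop-conf-{os.environ.get('JOB_ID', 'local')}"
        print(write_cluster_conf(parse_pe_hosts(Path(hostfile).read_text()), job_conf))
```

The job script would then start the Hadoop daemons against that generated directory, so every job request gets its own cluster definition sized to the nodes SGE granted.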

Only the Namenode and JobTracker need hard-coded/well-known port numbers, the rest could all be done dynamically.
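As a generic illustration of what "dynamically" could mean here (this is plain socket code, not Hadoop's own), a worker process can ask the OS for any free port by binding to port 0 and then report the port it was given to a master at a well-known address. The helper name below is made up for the example.

```python
import socket


def bind_ephemeral(host="127.0.0.1"):
    """Bind a listening TCP socket to an OS-assigned free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))          # port 0 means "pick any free port"
    sock.listen()
    port = sock.getsockname()[1]  # the port the OS actually assigned
    return sock, port


if __name__ == "__main__":
    # Two workers on the same machine get different ports, so they do not collide.
    first, first_port = bind_ephemeral()
    second, second_port = bind_ephemeral()
    print(first_port, second_port)
    first.close()
    second.close()
```

With that approach, several sets of worker processes could share a machine without colliding, since only the master addresses have to be known in advance.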

One thing SGE does offer over Xen-hosted images is better performance than virtual machines, for both CPU and storage, as virtualised disk performance can be awful, and even on the latest x86 parts, there is a measurable hit from VM overheads.
