Re: Deficiency in Hadoop

Daniel Templeton Thu, 11 Nov 2010 07:50:47 -0800

If the only thing that you're running is Hadoop, it's probably not worth it today. The big win for using Grid Engine with Hadoop is that it lets you consolidate your Hadoop cluster onto the same resources as your other workloads, like MPI, batch, whatever. If all you're doing is Hadoop, then that isn't an issue. The accounting piece is nice, but there are probably other ways to solve that problem that would be less invasive.

The other piece where Grid Engine brings value is the scheduler. Hadoop has a decent scheduler these days, but the scheduler in Grid Engine has had two decades of improvements and tuning put into it. (And I mean actual decades, not man-decades.) If you want things like advance reservation, starvation prevention, fine-grained resource quotas, fine-grained preemption, complex fair-share, deep awareness of heterogeneous resource pools, etc., then Grid Engine might be worth considering, even if all you do is Hadoop.

Incidentally, Grid Engine also helps with the problem of not having redundant JobTrackers. Every Hadoop job running under Grid Engine gets its own JobTracker.

Going forward, Grid Engine will continue to expand its Hadoop support. There are a couple of other "big ticket" issues with running Hadoop in an enterprise IT environment that Grid Engine seems to be well suited to solving.

Just FYI, I am the product manager at Oracle for the Grid Engine product, and I wrote the Hadoop integration as the last thing I did before leaving engineering. What I've written above is my (obviously biased) view of things. I would love to hear feedback from anyone who has either looked at the integration or simply takes issue with anything I said above. I know there are at least two customers out there with Grid Engine managing their Hadoop workloads. I'd love to find others.


Daniel

-------- Original Message --------
Subject: Deficiency in Hadoop
Date: Thu, 11 Nov 2010 16:32:30 +0530
From: Adarsh Sharma <adarsh.sha...@orkash.com>
Reply-To: common-user@hadoop.apache.org
To: common-user@hadoop.apache.org

Dear all,

Does anyone have experience working with the Hadoop integration with SGE
(Sun Grid Engine)?
It is open source too (sge-6.2u5).
Does SGE really overcome some of the deficiencies of Hadoop?
According to an article:

Instead, to set the stage, let's talk about what Hadoop doesn't do so
well. I currently see two important deficiencies in Hadoop: it doesn't
play well with others, and it has no real accounting framework. Pretty
much every customer I've seen running Hadoop does it on a dedicated
cluster. Why? Because the tasktrackers assume they own the machines on
which they run. If there's anything on the cluster other than Hadoop,
it's in direct competition with Hadoop. That wouldn't be such a big deal
if Hadoop clusters didn't tend to be so huge. Folks are dedicating
hundreds, thousands, or even tens of thousands of machines to their
Hadoop applications. That's a lot of hardware to be walled off for a
single purpose. Are those machines really being used? You may not be
able to tell. You can monitor state in the moment, and you can grep
through log files to find out about past usage (Gah!), but there's no
historical accounting capability there.

So I want to know whether it is worthwhile to use SGE with Hadoop in a
production cluster or not.
Please share your views.

Thanks in Advance
Adarsh Sharma
