Ted brought up some superficial differences, but if you want to understand the
technical differences, there are a bunch of those as well. Mesos and Hadoop
next-gen (HNG) have similar goals (more efficient resource sharing for data
centers), but they are coming at it from different angles. HNG currently
focuses mainly on MapReduce and aims to support other types of applications
too, while Mesos was meant to support a very diverse set of applications,
including long-running services and batch jobs (rather than only multiple
instances of MapReduce), and is in fact being used for that already. More
importantly, HNG is really two pieces: a refactoring of MapReduce to allow one
instance of MR per application, and a resource manager called YARN that lets
these instances coordinate. We are going to support running the new MR2
application masters on top of Mesos instead of YARN too (and indeed the
refactoring is nice because it will enable Hadoop MapReduce to run on other
cluster scheduling systems in the future).

Here are some of the main technical differences at the moment:

- Mesos is implemented in C++ rather than Java, and has APIs in C++ and Python 
in addition to Java.

- The resource allocation models are different: HNG has a central scheduler 
that supports data locality constraints, while Mesos provides "resource offers" 
that let applications pick the resources they like according to other criteria 
(not just data locality), plus requests/filters that let an application 
describe which resources it wants to be offered. Our belief is that resource 
offers will allow Mesos to support a wider range of application scheduling 
needs, while simultaneously making the system more scalable and highly 
available (by minimizing the state and work required of the master).

- Mesos can enforce resource isolation through Linux Containers to guard 
against misbehaving / greedy tasks.

- HNG supports Kerberos authentication for users.

- HNG can run the MR2 version of Hadoop, while Mesos can run Hadoop 0.20, Spark 
and MPI.

- There are some smaller architectural differences that may matter for some 
applications, such as communication being based on message-passing in Mesos vs 
periodic heartbeats in HNG, which allows Mesos to provide lower scheduling 
latencies (e.g. to still be efficient if your tasks take 100ms each).
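
To make the resource offer idea above a bit more concrete, here is a minimal 
toy sketch in Python. It is not the Mesos API: the names Offer, 
LocalityFramework and run_offer_round are made up for illustration, and it 
assumes that accepting an offer takes the whole bundle. The point is only that 
the master hands out free resources and each application decides for itself 
whether to take them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """A bundle of free resources on one node (hypothetical toy type)."""
    node: str
    cpus: int
    mem_gb: int


class LocalityFramework:
    """Toy application scheduler: accepts an offer only if it is on a node
    the application prefers and is large enough for one of its tasks."""

    def __init__(self, name, preferred_nodes, cpus_per_task, mem_per_task):
        self.name = name
        self.preferred_nodes = set(preferred_nodes)
        self.cpus_per_task = cpus_per_task
        self.mem_per_task = mem_per_task

    def wants(self, offer):
        # The placement policy lives in the application, not in the master.
        return (offer.node in self.preferred_nodes
                and offer.cpus >= self.cpus_per_task
                and offer.mem_gb >= self.mem_per_task)


def run_offer_round(free_offers, frameworks):
    """Toy master: offer each bundle to the frameworks in order and give it
    to the first one that accepts. Returns {node: framework name or None}."""
    placement = {}
    for offer in free_offers:
        placement[offer.node] = None
        for framework in frameworks:
            if framework.wants(offer):
                placement[offer.node] = framework.name
                break
    return placement


if __name__ == "__main__":
    offers = [Offer("node1", 4, 8), Offer("node2", 4, 8)]
    mapreduce = LocalityFramework("mapreduce", {"node2"}, 2, 4)
    service = LocalityFramework("service", {"node1", "node2"}, 1, 1)
    print(run_offer_round(offers, [mapreduce, service]))
```

In this sketch the master keeps no knowledge of why an application declined an 
offer, which is the sense in which offers keep the master's state and work 
small. A central scheduler would instead need every application to describe 
its constraints up front.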

However, overall, as Ted said, many of these differences will likely go away as 
both projects add features. What will be interesting is whether some 
fundamental differences in the target workloads remain, which I think is likely 
to happen. For example, the main deployment of Mesos is currently to run 
long-running stream processing services at Twitter, which is something that 
typical Hadoop environments just don't do and that requires different things 
from the cluster scheduler. I also believe we're going to see a lot of other 
cluster scheduling systems besides Mesos and HNG in the future, as people's 
requirements for these systems grow. There are some very challenging problems 
in designing a general cluster scheduling system that even the Google folks are 
still working hard on.

Matei



On Jun 30, 2011, at 6:26 PM, Edward J. Yoon wrote:

> Thanks for your nice and quick explanation!
> 
> On Fri, Jul 1, 2011 at 10:21 AM, Ted Dunning <[email protected]> wrote:
>> Technically speaking, Mesos has a less expressive model for expressing
>> resource requirements.  The thesis of Mesos is that the negotiation between
>> application and scheduler can make up for this missing information.  Mesos
>> was also first to "market", but Hadoop nextGen is catching up fast.  The
>> MR-279 has code that works, albeit with some issues in production use.  From
>> all reports, these issues are being resolved quickly as Yahoo's considerable
>> QA resources come to bear.
>> 
>> Politically speaking, Mesos has a nearly inactive mailing list which, to
>> outward appearances, indicates a nearly inactive project.  There is some
>> evidence that considerable activity is occurring off-list, but this is a
>> process bug in the Apache model since "if it doesn't happen on the list, it
>> doesn't happen".
>> 
>> On the other side, Hadoop nextGen has the Hadoop community pretty much
>> behind it.  Since HNG has the potential to break down some of the deadlocks
>> that have plagued the Hadoop community release process, there is
>> considerable enthusiasm for it.
>> 
>> Combined, these factors make it much more likely that HNG will be the
>> dominant force in the Hadoop world.  That is, more likely in my own
>> estimation.  Others may differ.
>> 
>> 
>> On Thu, Jun 30, 2011 at 5:16 PM, Edward J. Yoon <[email protected]> wrote:
>> 
>>> Hi,
>>> 
>>> I'm a newbie, and wonder what the main differences are between Hadoop
>>> nextGen and Mesos.
>>> 
>>> Thanks.
>>> --
>>> Best Regards, Edward J. Yoon
>>> @eddieyoon
>>> 
>> 
> 
> 
> 
> -- 
> Best Regards, Edward J. Yoon
> @eddieyoon
