I wouldn't say it's designed for Yahoo! only, but it's definitely meant to solve issues they saw with large Hadoop clusters (and provides a lot of value for that).

Matei

On Jul 1, 2011, at 12:51 AM, Edward J. Yoon wrote:

> Hmm, HNG seems designed for their (Y!) own circumstance.
>
> On Fri, Jul 1, 2011 at 12:47 PM, Matei Zaharia <[email protected]>
> wrote:
>> Ted brought up some superficial differences, but if you want to understand
>> technical differences, there are a bunch of those as well. Mesos and Hadoop
>> next-gen have similar goals (more efficient resource sharing for data
>> centers), but they are coming at it from different angles -- HNG is
>> currently mainly focusing on MapReduce and aims to support other types of
>> applications too, while Mesos was meant to support a very diverse set of
>> applications, including long-running services and batch jobs (rather than
>> only multiple instances of MapReduce), and is in fact being used for that
>> already. More importantly, HNG is really two pieces -- a refactoring of
>> MapReduce to allow one instance of MR per application, and a resource
>> manager called YARN that lets these instances coordinate. We are going to
>> support having the new MR2 application masters run on top of Mesos instead
>> of YARN too (and indeed the refactoring is nice because it will enable
>> Hadoop MapReduce to run on other cluster scheduling systems in the future).
>>
>> In terms of the technical differences, here are some of the main ones
>> currently:
>>
>> - Mesos is implemented in C++ rather than Java, and has APIs in C++ and
>> Python in addition to Java.
>>
>> - The resource allocation models are different: HNG has a central scheduler
>> that supports data locality constraints, while Mesos provides "resource
>> offers" to let applications pick the resources they like according to other
>> criteria in addition to requests/filters to describe which resources you
>> want to be offered. Our belief is that resource offers will allow Mesos to
>> support a wider range of application scheduling needs, while simultaneously
>> making the system more scalable and highly available (minimizing the state
>> and work required of the master).
>>
>> - Mesos can enforce resource isolation through Linux Containers to guard
>> against misbehaving / greedy tasks.
>>
>> - HNG supports Kerberos authentication for users.
>>
>> - HNG can run the MR2 version of Hadoop, while Mesos can run Hadoop 0.20,
>> Spark and MPI.
>>
>> - There are some smaller architectural differences that may matter for some
>> applications, such as communication being based on message-passing in Mesos
>> vs periodic heartbeats in HNG, which allows Mesos to provide lower
>> scheduling latencies (e.g. to still be efficient if your tasks take 100ms
>> each).
>>
>> However, overall, as Ted said, many of these differences will likely go away
>> as both projects add features. What will be interesting is whether some
>> fundamental differences in the target workloads remain, which I think is
>> likely to happen. For example, the main deployment of Mesos is currently to
>> run long-running stream processing services at Twitter, which is something
>> that typical Hadoop environments just don't do and that requires different
>> things from the cluster scheduler. I also believe we're going to see a lot
>> of other cluster scheduling systems besides Mesos and HNG in the future, as
>> people's requirements for these systems grow. There are some very
>> challenging problems in designing a general cluster scheduling system that
>> even the Google folks are still working hard on.
>>
>> Matei
>>
>>
>>
>> On Jun 30, 2011, at 6:26 PM, Edward J. Yoon wrote:
>>
>>> Thanks for your nice and quick explanation!
>>>
>>> On Fri, Jul 1, 2011 at 10:21 AM, Ted Dunning <[email protected]> wrote:
>>>> Technically speaking, Mesos has a less expressive model for expressing
>>>> resource requirements. The thesis of Mesos is that the negotiation between
>>>> application and scheduler can make up for this missing information. Mesos
>>>> was also first to "market", but Hadoop nextGen is catching up fast. The
>>>> MR-279 has code that works, albeit with some issues in production use. From
>>>> all reports, these issues are being resolved quickly as Yahoo's considerable
>>>> QA resources come to bear.
>>>>
>>>> Politically speaking, Mesos has a nearly inactive mailing list which, to
>>>> outward appearances, indicate a nearly inactive project. There is some
>>>> evidence that considerable activity is occurring off-list, but this is a
>>>> process bug in the Apache model since "if it doesn't happen on the list, it
>>>> doesn't happen".
>>>>
>>>> On the other side, Hadoop nextGen has the Hadoop community pretty much
>>>> behind it. Since HNG has the potential to breakdown some of the deadlocks
>>>> that have plagued the Hadoop community release process, there is
>>>> considerable enthusiasm for it.
>>>>
>>>> Combined, these factors make it much more likely that HNG will be the
>>>> dominant force in the Hadoop world. That is, more likely in my own
>>>> estimation. Others may differ.
>>>>
>>>>
>>>> On Thu, Jun 30, 2011 at 5:16 PM, Edward J. Yoon
>>>> <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm newbie, and wonder what's the main differences between Hadoop
>>>>> nextGen and Mesos.
>>>>>
>>>>> Thanks.
>>>>> --
>>>>> Best Regards, Edward J. Yoon
>>>>> @eddieyoon
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Best Regards, Edward J. Yoon
>>> @eddieyoon
>>
>>
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
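
P.S. For anyone reading this thread later, here is a rough sketch of the "resource offer" idea from the quoted message: the master only tracks free resources and offers them out, and each framework decides for itself whether to accept (here, based on data locality). This is only a toy model under my own assumptions. The names `Offer`, `LocalityFramework`, `resource_offer` and `offer_round` are hypothetical and are not the actual Mesos API, and real offers carry more than CPU and memory.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Offer:
    """A bundle of free resources on one node (hypothetical structure)."""
    node: str
    cpus: int
    mem_mb: int


@dataclass
class LocalityFramework:
    """Toy framework that only accepts offers on nodes holding its data."""
    name: str
    preferred_nodes: set
    cpus_per_task: int = 1
    mem_per_task_mb: int = 512
    pending_tasks: int = 0
    launched: list = field(default_factory=list)

    def resource_offer(self, offer):
        """Return how many tasks to launch on this offer (0 means decline)."""
        if self.pending_tasks == 0 or offer.node not in self.preferred_nodes:
            return 0
        fit = min(offer.cpus // self.cpus_per_task,
                  offer.mem_mb // self.mem_per_task_mb,
                  self.pending_tasks)
        self.pending_tasks -= fit
        self.launched.extend([offer.node] * fit)
        return fit


def offer_round(free, frameworks):
    """Offer each node's free resources to the frameworks in turn.

    The master keeps no per-application placement logic: it only
    subtracts whatever each framework chose to use.
    """
    for node, (cpus, mem_mb) in list(free.items()):
        for fw in frameworks:
            if cpus == 0:
                break
            n = fw.resource_offer(Offer(node, cpus, mem_mb))
            cpus -= n * fw.cpus_per_task
            mem_mb -= n * fw.mem_per_task_mb
        free[node] = (cpus, mem_mb)
    return free


if __name__ == "__main__":
    free = {"node1": (4, 4096), "node2": (2, 2048)}
    a = LocalityFramework("a", {"node1"}, pending_tasks=3)
    b = LocalityFramework("b", {"node1", "node2"}, pending_tasks=5)
    print(offer_round(free, [a, b]))
    print(a.launched, b.launched)
```

A central scheduler in the HNG style would instead take each application's stated requirements (including locality constraints) and compute the placement itself; the sketch above only shows the offer side of that comparison.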
