Hmm, HNG seems designed for their (Y!'s) own circumstances.

On Fri, Jul 1, 2011 at 12:47 PM, Matei Zaharia <[email protected]> wrote:
> Ted brought up some superficial differences, but if you want to understand
> technical differences, there are a bunch of those as well. Mesos and Hadoop
> next-gen have similar goals (more efficient resource sharing for data
> centers), but they are coming at it from different angles -- HNG is currently
> mainly focusing on MapReduce and aims to support other types of applications
> too, while Mesos was meant to support a very diverse set of applications,
> including long-running services and batch jobs (rather than only multiple
> instances of MapReduce), and is in fact being used for that already. More
> importantly, HNG is really two pieces -- a refactoring of MapReduce to allow
> one instance of MR per application, and a resource manager called YARN that
> lets these instances coordinate. We are going to support having the new MR2
> application masters run on top of Mesos instead of YARN too (and indeed the
> refactoring is nice because it will enable Hadoop MapReduce to run on other
> cluster scheduling systems in the future).
>
> In terms of the technical differences, here are some of the main ones
> currently:
>
> - Mesos is implemented in C++ rather than Java, and has APIs in C++ and
> Python in addition to Java.
>
> - The resource allocation models are different: HNG has a central scheduler
> that supports data locality constraints, while Mesos provides "resource
> offers" to let applications pick the resources they like according to other
> criteria in addition to requests/filters to describe which resources you want
> to be offered. Our belief is that resource offers will allow Mesos to support
> a wider range of application scheduling needs, while simultaneously making
> the system more scalable and highly available (minimizing the state and work
> required of the master).
>
> - Mesos can enforce resource isolation through Linux Containers to guard
> against misbehaving / greedy tasks.
>
> - HNG supports Kerberos authentication for users.
>
> - HNG can run the MR2 version of Hadoop, while Mesos can run Hadoop 0.20,
> Spark and MPI.
>
> - There are some smaller architectural differences that may matter for some
> applications, such as communication being based on message-passing in Mesos
> vs periodic heartbeats in HNG, which allows Mesos to provide lower scheduling
> latencies (e.g. to still be efficient if your tasks take 100ms each).
>
> However, overall, as Ted said, many of these differences will likely go away
> as both projects add features. What will be interesting is whether some
> fundamental differences in the target workloads remain, which I think is
> likely to happen. For example, the main deployment of Mesos is currently to
> run long-running stream processing services at Twitter, which is something
> that typical Hadoop environments just don't do and that requires different
> things from the cluster scheduler. I also believe we're going to see a lot of
> other cluster scheduling systems besides Mesos and HNG in the future, as
> people's requirements for these systems grow. There are some very challenging
> problems in designing a general cluster scheduling system that even the
> Google folks are still working hard on.
>
> Matei
>
>
>
> On Jun 30, 2011, at 6:26 PM, Edward J. Yoon wrote:
>
>> Thanks for your nice and quick explanation!
>>
>> On Fri, Jul 1, 2011 at 10:21 AM, Ted Dunning <[email protected]> wrote:
>>> Technically speaking, Mesos has a less expressive model for expressing
>>> resource requirements. The thesis of Mesos is that the negotiation between
>>> application and scheduler can make up for this missing information. Mesos
>>> was also first to "market", but Hadoop nextGen is catching up fast. The
>>> MR-279 has code that works, albeit with some issues in production use.
>>> From all reports, these issues are being resolved quickly as Yahoo's
>>> considerable QA resources come to bear.
>>>
>>> Politically speaking, Mesos has a nearly inactive mailing list which, to
>>> outward appearances, indicates a nearly inactive project. There is some
>>> evidence that considerable activity is occurring off-list, but this is a
>>> process bug in the Apache model since "if it doesn't happen on the list, it
>>> doesn't happen".
>>>
>>> On the other side, Hadoop nextGen has the Hadoop community pretty much
>>> behind it. Since HNG has the potential to break down some of the deadlocks
>>> that have plagued the Hadoop community release process, there is
>>> considerable enthusiasm for it.
>>>
>>> Combined, these factors make it much more likely that HNG will be the
>>> dominant force in the Hadoop world. That is, more likely in my own
>>> estimation. Others may differ.
>>>
>>>
>>> On Thu, Jun 30, 2011 at 5:16 PM, Edward J. Yoon
>>> <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm a newbie, and wonder what the main differences are between Hadoop
>>>> nextGen and Mesos.
>>>>
>>>> Thanks.
>>>> --
>>>> Best Regards, Edward J. Yoon
>>>> @eddieyoon
>>>>
>>>
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>
>
--
Best Regards, Edward J. Yoon
@eddieyoon
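For readers new to the "resource offers" model Matei contrasts with HNG's central scheduler, here is a toy Python sketch of the offer side only. It is not the Mesos API: `Offer`, `Task`, `LocalityFramework` and `offer_round` are hypothetical names, and the sketch assumes an offer is just one node's free CPUs and memory. The master only advertises what is free; the framework applies its own placement criteria (here, a data-locality preference) and launches tasks on the offers it likes.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    # Hypothetical: the free resources on one node, as a master might advertise them.
    node: str
    cpus: int
    mem_mb: int


@dataclass
class Task:
    name: str
    cpus: int
    mem_mb: int
    preferred_nodes: set  # e.g. nodes that hold this task's input data


class LocalityFramework:
    """Hypothetical framework scheduler that applies its own placement criteria."""

    def __init__(self, tasks):
        self.pending = list(tasks)

    def resource_offers(self, offers):
        launched = []
        for offer in offers:
            cpus, mem = offer.cpus, offer.mem_mb
            for task in list(self.pending):  # iterate over a copy while removing
                fits = task.cpus <= cpus and task.mem_mb <= mem
                # The framework, not the master, decides whether this node is acceptable.
                if fits and offer.node in task.preferred_nodes:
                    launched.append((task.name, offer.node))
                    cpus -= task.cpus
                    mem -= task.mem_mb
                    self.pending.remove(task)
        # Whatever is left of each offer is simply not used (implicitly declined).
        return launched


def offer_round(free, framework):
    """Master side: it tracks only free resources and hands them out as offers."""
    offers = [Offer(node, cpus, mem) for node, (cpus, mem) in free.items()]
    return framework.resource_offers(offers)


if __name__ == "__main__":
    free = {"node1": (4, 8192), "node2": (2, 4096)}
    fw = LocalityFramework([
        Task("map-0", 2, 2048, {"node2"}),
        Task("map-1", 2, 2048, {"node1"}),
        Task("map-2", 4, 4096, {"node3"}),  # no offered node is preferred; stays pending
    ])
    print(offer_round(free, fw))  # [('map-1', 'node1'), ('map-0', 'node2')]
```

Note that the master never looks at `preferred_nodes`; that logic lives entirely in the framework, which is the point Matei makes about minimizing the state and work required of the master. The requests/filters he mentions (describing which resources a framework wants to be offered at all) are left out of this sketch.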
