Here's another silly question: does Mesos plan to add support for HNG, or will only pure Map/Reduce be supported?

On Fri, Jul 1, 2011 at 2:15 PM, Ted Dunning <[email protected]> wrote:
> Also, both projects are changing in terms of what they do and what they intend to do.
>
> For instance, support for long running processes and alternative execution models other than map-reduce is an explicit goal for Yarn.
>
> This illustrates how hard it is for anybody to compare systems. Typically, any given person knows much more about one system than the other, leading to many comparison points that are only half true (that half being the one with better information). This isn't remediable without collaborative discussion between (differently) informed speakers.
>
> On Thu, Jun 30, 2011 at 10:10 PM, Edward J. Yoon <[email protected]> wrote:
>
>> Understood.
>>
>> On Fri, Jul 1, 2011 at 1:59 PM, Matei Zaharia <[email protected]> wrote:
>> > I wouldn't say it's designed for Yahoo! only, but it's definitely meant to solve issues they saw with large Hadoop clusters (and provides a lot of value for that).
>> >
>> > Matei
>> >
>> > On Jul 1, 2011, at 12:51 AM, Edward J. Yoon wrote:
>> >
>> >> Hmm, HNG seems designed for their (Y!) own circumstance.
>> >>
>> >> On Fri, Jul 1, 2011 at 12:47 PM, Matei Zaharia <[email protected]> wrote:
>> >>> Ted brought up some superficial differences, but if you want to understand technical differences, there are a bunch of those as well. Mesos and Hadoop next-gen have similar goals (more efficient resource sharing for data centers), but they are coming at it from different angles -- HNG is currently mainly focusing on MapReduce and aims to support other types of applications too, while Mesos was meant to support a very diverse set of applications, including long-running services and batch jobs (rather than only multiple instances of MapReduce), and is in fact being used for that already. More importantly, HNG is really two pieces -- a refactoring of MapReduce to allow one instance of MR per application, and a resource manager called YARN that lets these instances coordinate. We are going to support having the new MR2 application masters run on top of Mesos instead of YARN too (and indeed the refactoring is nice because it will enable Hadoop MapReduce to run on other cluster scheduling systems in the future).
>> >>>
>> >>> In terms of the technical differences, here are some of the main ones currently:
>> >>>
>> >>> - Mesos is implemented in C++ rather than Java, and has APIs in C++ and Python in addition to Java.
>> >>>
>> >>> - The resource allocation models are different: HNG has a central scheduler that supports data locality constraints, while Mesos provides "resource offers" to let applications pick the resources they like according to other criteria in addition to requests/filters to describe which resources you want to be offered. Our belief is that resource offers will allow Mesos to support a wider range of application scheduling needs, while simultaneously making the system more scalable and highly available (minimizing the state and work required of the master).
>> >>>
>> >>> - Mesos can enforce resource isolation through Linux Containers to guard against misbehaving / greedy tasks.
>> >>>
>> >>> - HNG supports Kerberos authentication for users.
>> >>>
>> >>> - HNG can run the MR2 version of Hadoop, while Mesos can run Hadoop 0.20, Spark and MPI.
>> >>>
>> >>> - There are some smaller architectural differences that may matter for some applications, such as communication being based on message-passing in Mesos vs periodic heartbeats in HNG, which allows Mesos to provide lower scheduling latencies (e.g. to still be efficient if your tasks take 100ms each).
>> >>>
>> >>> However, overall, as Ted said, many of these differences will likely go away as both projects add features. What will be interesting is whether some fundamental differences in the target workloads remain, which I think is likely to happen. For example, the main deployment of Mesos is currently to run long-running stream processing services at Twitter, which is something that typical Hadoop environments just don't do and that requires different things from the cluster scheduler. I also believe we're going to see a lot of other cluster scheduling systems besides Mesos and HNG in the future, as people's requirements for these systems grow. There are some very challenging problems in designing a general cluster scheduling system that even the Google folks are still working hard on.
>> >>>
>> >>> Matei
>> >>>
>> >>> On Jun 30, 2011, at 6:26 PM, Edward J. Yoon wrote:
>> >>>
>> >>>> Thanks for your nice and quick explanation!
>> >>>>
>> >>>> On Fri, Jul 1, 2011 at 10:21 AM, Ted Dunning <[email protected]> wrote:
>> >>>>> Technically speaking, Mesos has a less expressive model for expressing resource requirements. The thesis of Mesos is that the negotiation between application and scheduler can make up for this missing information. Mesos was also first to "market", but Hadoop nextGen is catching up fast. The MR-279 has code that works, albeit with some issues in production use. From all reports, these issues are being resolved quickly as Yahoo's considerable QA resources come to bear.
>> >>>>>
>> >>>>> Politically speaking, Mesos has a nearly inactive mailing list which, to outward appearances, indicates a nearly inactive project. There is some evidence that considerable activity is occurring off-list, but this is a process bug in the Apache model since "if it doesn't happen on the list, it doesn't happen".
>> >>>>>
>> >>>>> On the other side, Hadoop nextGen has the Hadoop community pretty much behind it. Since HNG has the potential to break down some of the deadlocks that have plagued the Hadoop community release process, there is considerable enthusiasm for it.
>> >>>>>
>> >>>>> Combined, these factors make it much more likely that HNG will be the dominant force in the Hadoop world. That is, more likely in my own estimation. Others may differ.
>> >>>>>
>> >>>>> On Thu, Jun 30, 2011 at 5:16 PM, Edward J. Yoon <[email protected]> wrote:
>> >>>>>
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> I'm a newbie, and wonder what the main differences are between Hadoop nextGen and Mesos.
>> >>>>>>
>> >>>>>> Thanks.
>> >>>>>> --
>> >>>>>> Best Regards, Edward J. Yoon
>> >>>>>> @eddieyoon
>> >>>>>>
>> >>>>
>> >>>> --
>> >>>> Best Regards, Edward J. Yoon
>> >>>> @eddieyoon
>> >>>
>> >>
>> >> --
>> >> Best Regards, Edward J. Yoon
>> >> @eddieyoon
>> >
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>

--
Best Regards, Edward J. Yoon
@eddieyoon
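
For readers following the thread, the two allocation models contrasted in the quoted discussion (a central scheduler that places requests carrying locality hints, versus resource offers that each application accepts or declines) can be sketched in a few lines of Python. This is a toy illustration only: `Node`, `Request`, `central_schedule`, and `offer_schedule` are hypothetical names invented here and are not part of the Mesos or YARN APIs. Real schedulers also deal with fairness, failures, and multiple resource types, none of which is modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    name: str
    cpus: int


@dataclass
class Request:
    app: str
    cpus: int
    preferred_node: Optional[str] = None  # data-locality hint (assumed form)


def central_schedule(nodes: List[Node], requests: List[Request]) -> Dict[str, str]:
    """Central model: the scheduler sees every request and chooses the node."""
    free = {n.name: n.cpus for n in nodes}
    placements: Dict[str, str] = {}
    for req in requests:
        # Try the preferred node first, then fall back to first fit.
        candidates = [req.preferred_node] if req.preferred_node in free else []
        candidates += [name for name in free if name != req.preferred_node]
        for name in candidates:
            if free[name] >= req.cpus:
                free[name] -= req.cpus
                placements[req.app] = name
                break
    return placements


def offer_schedule(
    nodes: List[Node],
    frameworks: Dict[str, Callable[[str, int], int]],
) -> Dict[str, str]:
    """Offer model: free resources are offered; each framework decides itself.

    Each framework callback receives (node_name, free_cpus) and returns the
    number of CPUs it wants from that offer, or 0 to decline.
    """
    placements: Dict[str, str] = {}
    for node in nodes:
        free = node.cpus
        for name, accept in frameworks.items():
            if name in placements:
                continue  # this toy model places each framework once
            wanted = accept(node.name, free)
            if 0 < wanted <= free:
                free -= wanted
                placements[name] = node.name
    return placements


if __name__ == "__main__":
    cluster = [Node("a", 4), Node("b", 8)]

    print(central_schedule(
        cluster,
        [Request("mr-job", 2, preferred_node="b"), Request("mpi-job", 4)],
    ))

    print(offer_schedule(
        cluster,
        {
            # Declines anything smaller than a whole 8-CPU node.
            "stream-service": lambda node, cpus: 4 if cpus >= 8 else 0,
            # Takes 2 CPUs from whatever is offered first.
            "batch-job": lambda node, cpus: 2,
        },
    ))
```

In the central version the placement policy lives in the scheduler and the application can only describe what it wants. In the offer version the policy lives in the application's callback, which is the point made in the thread about keeping the master's state and work small.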
