Does anyone know whether the Hadoop NextGen source code is available now?

Thanks!
Yizheng

2011/7/1 Matei Zaharia <[email protected]>
> That's a good question. Right now we were planning to provide a wrapper
> that has the same API as the resource manager in HNG, so that not only
> MapReduce but other apps written against that API will work. We'll see if
> we run into any unforeseen problems with that.
>
> Matei
>
> On Jul 1, 2011, at 3:37 AM, Edward J. Yoon wrote:
>
> > Here's another silly question.
> >
> > Does Mesos plan to add HNG support, or will only pure Map/Reduce be
> > supported?
> >
> > On Fri, Jul 1, 2011 at 2:15 PM, Ted Dunning <[email protected]> wrote:
> >> Also, both projects are changing in terms of what they do and what they
> >> intend to do.
> >>
> >> For instance, support for long running processes and alternative
> >> execution models other than map-reduce is an explicit goal for Yarn.
> >>
> >> This illustrates how hard it is for anybody to compare systems.
> >> Typically, any given person knows much more about one system than the
> >> other, leading to many comparison points that are only half true (that
> >> half being the one with better information). This isn't remediable
> >> without collaborative discussion between (differently) informed
> >> speakers.
> >>
> >>
> >> On Thu, Jun 30, 2011 at 10:10 PM, Edward J. Yoon <[email protected]> wrote:
> >>
> >>> Understood.
> >>>
> >>> On Fri, Jul 1, 2011 at 1:59 PM, Matei Zaharia <[email protected]> wrote:
> >>>> I wouldn't say it's designed for Yahoo! only, but it's definitely
> >>>> meant to solve issues they saw with large Hadoop clusters (and
> >>>> provides a lot of value for that).
> >>>>
> >>>> Matei
> >>>>
> >>>> On Jul 1, 2011, at 12:51 AM, Edward J. Yoon wrote:
> >>>>
> >>>>> Hmm, HNG seems designed for their (Y!) own circumstance.
> >>>>>
> >>>>> On Fri, Jul 1, 2011 at 12:47 PM, Matei Zaharia <[email protected]> wrote:
> >>>>>> Ted brought up some superficial differences, but if you want to
> >>>>>> understand technical differences, there are a bunch of those as
> >>>>>> well. Mesos and Hadoop next-gen have similar goals (more efficient
> >>>>>> resource sharing for data centers), but they are coming at it from
> >>>>>> different angles -- HNG is currently mainly focusing on MapReduce
> >>>>>> and aims to support other types of applications too, while Mesos
> >>>>>> was meant to support a very diverse set of applications, including
> >>>>>> long-running services and batch jobs (rather than only multiple
> >>>>>> instances of MapReduce), and is in fact being used for that
> >>>>>> already. More importantly, HNG is really two pieces -- a
> >>>>>> refactoring of MapReduce to allow one instance of MR per
> >>>>>> application, and a resource manager called YARN that lets these
> >>>>>> instances coordinate. We are going to support having the new MR2
> >>>>>> application masters run on top of Mesos instead of YARN too (and
> >>>>>> indeed the refactoring is nice because it will enable Hadoop
> >>>>>> MapReduce to run on other cluster scheduling systems in the
> >>>>>> future).
> >>>>>>
> >>>>>> In terms of the technical differences, here are some of the main
> >>>>>> ones currently:
> >>>>>>
> >>>>>> - Mesos is implemented in C++ rather than Java, and has APIs in C++
> >>>>>> and Python in addition to Java.
> >>>>>>
> >>>>>> - The resource allocation models are different: HNG has a central
> >>>>>> scheduler that supports data locality constraints, while Mesos
> >>>>>> provides "resource offers" to let applications pick the resources
> >>>>>> they like according to other criteria in addition to
> >>>>>> requests/filters to describe which resources you want to be
> >>>>>> offered. Our belief is that resource offers will allow Mesos to
> >>>>>> support a wider range of application scheduling needs, while
> >>>>>> simultaneously making the system more scalable and highly available
> >>>>>> (minimizing the state and work required of the master).
> >>>>>>
> >>>>>> - Mesos can enforce resource isolation through Linux Containers to
> >>>>>> guard against misbehaving / greedy tasks.
> >>>>>>
> >>>>>> - HNG supports Kerberos authentication for users.
> >>>>>>
> >>>>>> - HNG can run the MR2 version of Hadoop, while Mesos can run Hadoop
> >>>>>> 0.20, Spark and MPI.
> >>>>>>
> >>>>>> - There are some smaller architectural differences that may matter
> >>>>>> for some applications, such as communication being based on
> >>>>>> message-passing in Mesos vs periodic heartbeats in HNG, which
> >>>>>> allows Mesos to provide lower scheduling latencies (e.g. to still
> >>>>>> be efficient if your tasks take 100ms each).
> >>>>>>
> >>>>>> However, overall, as Ted said, many of these differences will
> >>>>>> likely go away as both projects add features. What will be
> >>>>>> interesting is whether some fundamental differences in the target
> >>>>>> workloads remain, which I think is likely to happen. For example,
> >>>>>> the main deployment of Mesos is currently to run long-running
> >>>>>> stream processing services at Twitter, which is something that
> >>>>>> typical Hadoop environments just don't do and that requires
> >>>>>> different things from the cluster scheduler. I also believe we're
> >>>>>> going to see a lot of other cluster scheduling systems besides
> >>>>>> Mesos and HNG in the future, as people's requirements for these
> >>>>>> systems grow. There are some very challenging problems in designing
> >>>>>> a general cluster scheduling system that even the Google folks are
> >>>>>> still working hard on.
> >>>>>>
> >>>>>> Matei
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Jun 30, 2011, at 6:26 PM, Edward J. Yoon wrote:
> >>>>>>
> >>>>>>> Thanks for your nice and quick explanation!
> >>>>>>>
> >>>>>>> On Fri, Jul 1, 2011 at 10:21 AM, Ted Dunning <[email protected]> wrote:
> >>>>>>>> Technically speaking, Mesos has a less expressive model for
> >>>>>>>> expressing resource requirements. The thesis of Mesos is that
> >>>>>>>> the negotiation between application and scheduler can make up
> >>>>>>>> for this missing information. Mesos was also first to "market",
> >>>>>>>> but Hadoop nextGen is catching up fast. The MR-279 has code that
> >>>>>>>> works, albeit with some issues in production use. From all
> >>>>>>>> reports, these issues are being resolved quickly as Yahoo's
> >>>>>>>> considerable QA resources come to bear.
> >>>>>>>>
> >>>>>>>> Politically speaking, Mesos has a nearly inactive mailing list
> >>>>>>>> which, to outward appearances, indicates a nearly inactive
> >>>>>>>> project. There is some evidence that considerable activity is
> >>>>>>>> occurring off-list, but this is a process bug in the Apache
> >>>>>>>> model since "if it doesn't happen on the list, it doesn't
> >>>>>>>> happen".
> >>>>>>>>
> >>>>>>>> On the other side, Hadoop nextGen has the Hadoop community
> >>>>>>>> pretty much behind it. Since HNG has the potential to break down
> >>>>>>>> some of the deadlocks that have plagued the Hadoop community
> >>>>>>>> release process, there is considerable enthusiasm for it.
> >>>>>>>>
> >>>>>>>> Combined, these factors make it much more likely that HNG will
> >>>>>>>> be the dominant force in the Hadoop world. That is, more likely
> >>>>>>>> in my own estimation. Others may differ.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Thu, Jun 30, 2011 at 5:16 PM, Edward J. Yoon <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I'm a newbie, and wonder what the main differences are between
> >>>>>>>>> Hadoop nextGen and Mesos.
> >>>>>>>>>
> >>>>>>>>> Thanks.
> >>>>>>>>> --
> >>>>>>>>> Best Regards, Edward J. Yoon
> >>>>>>>>> @eddieyoon
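
For readers following the thread, the contrast Matei draws between a central scheduler with locality constraints and "resource offers" can be sketched roughly as below. This is only a toy illustration under stated assumptions: it is not the Mesos or YARN API, and the names `Node`, `central_schedule`, `offer_based_schedule` and the `has_local_data` flag are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical free slot on a cluster node (not a real Mesos/YARN type)."""
    name: str
    cpus: int
    has_local_data: bool  # assumed stand-in for "holds data this job reads"


def central_schedule(nodes, cpus_needed, prefer_local=True):
    """Request-based model: the application states what it needs (including a
    locality preference) and the central scheduler chooses the node."""
    candidates = [n for n in nodes if n.cpus >= cpus_needed]
    if prefer_local:
        local = [n for n in candidates if n.has_local_data]
        candidates = local or candidates  # fall back if nothing is local
    return candidates[0].name if candidates else None


def offer_based_schedule(nodes, accept):
    """Offer-based model: the master only advertises free resources; the
    application's own `accept` callback decides which offer to take."""
    for offer in nodes:
        if accept(offer):
            return offer.name
    return None  # every offer declined; the application waits for new offers


nodes = [Node("a", 4, False), Node("b", 2, True), Node("c", 8, True)]

# The central scheduler applies the locality rule it was built to understand.
picked_central = central_schedule(nodes, cpus_needed=2)

# The application applies its own criterion, which the master never sees.
picked_offer = offer_based_schedule(nodes, lambda o: o.cpus >= 8)
```

In the second function the selection logic lives entirely in the application's callback, which is one way to read the claim that offers keep the master's state and work small while leaving room for criteria beyond data locality.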
