I have the same question. Which version and which vendor should we choose?
--
hadoop

On Friday, September 21, 2012 at 2:22 AM, Aaron Eng wrote:

> > I'm tasked with creating a guide that instructs on how to choose a Hadoop
> > distribution from the handful of common options.
> > Does anyone have any thoughts on what criteria might govern such a
> > decision?
>
> What problem(s) are you trying to solve with Hadoop (and related projects)?
> What are your expectations of the technology?
>
> The details beyond that level could take many, many pages to cover.
>
> Not all Hadoop distributions are tested the same way, packaged with the
> same components, etc. Not all components of a given Hadoop distribution
> work with other Hadoop distributions. The distributions have a lot in
> common, which is probably why it's difficult to articulate how to choose
> one over another. So when you look at the problem you are trying to solve
> and your expectations of the technology, many things may seem relatively
> equal, and hence you may need to get into some significant level of detail
> to pick something that best solves your problem. In some cases it may be
> very straightforward as to whether a distribution will meet your
> requirements. In other cases, things may look relatively equal across the
> board until you drill down to a point where you find differentiation (or
> maybe you don't find it). But those would be my criteria: articulate the
> problem and expectations, and compare functionality until you find
> differentiation.
>
> On Thu, Sep 20, 2012 at 11:06 AM, Keith Wiley <kwi...@keithwiley.com> wrote:
>
> > I'm tasked with creating a guide that instructs on how to choose a Hadoop
> > distribution from the handful of common options. I'm finding this rather
> > perplexing. While some of the vendors offer additional management software
> > (Cloudera Manager is an example), I'm unclear whether those packages could
> > be installed and run regardless of the underlying Hadoop distribution or
> > if they are exclusively compatible with their vendor's distribution (or if
> > there's some crossover). I'm also unclear on any other basis for
> > comparison. For example, HortonWorks originated HCatalog (to the best of my
> > understanding), but that doesn't necessarily mean one needs to use the HW
> > Hadoop dist. to use HCatalog, since it's just a public Apache project anyway
> > at this point. I'm sure similar statements could be made about MapR or
> > Greenplum (although I think Greenplum's Hadoop uses MapR's M5 anyway, so
> > again, the decision-making process in such a case seems baffling).
> >
> > And then there's the option of installing the Apache version directly,
> > always on the table I suppose.
> >
> > Does anyone have any thoughts on what criteria might govern such a
> > decision? I'm not trying to get into an argument about which distribution
> > is best, and I'm not even looking for defenses or arguments for one
> > distribution or another, but rather a notion of what the criteria for
> > basing such a decision might be.
> >
> > Thanks.
> >
> > Cheers!
> >
> > ________________________________________________________________________________
> > Keith Wiley     kwi...@keithwiley.com     keithwiley.com     music.keithwiley.com
> >
> > "It's a fine line between meticulous and obsessive-compulsive and a slippery
> > rope between obsessive-compulsive and debilitatingly slow."
> >                                            -- Keith Wiley
> > ________________________________________________________________________________