On Mon, Jul 5, 2010 at 1:12 AM, Evert Lammerts <evert.lamme...@sara.nl> wrote:
> There are a number of different versions and distributions of Hadoop
> which, as far as I understand, all differ from each other. I know that in
> the 0.20-append branch, files in HDFS can be appended, and that the Y!
> distribution (0.20.S) implements security features through Kerberos. And
> then there are the 0.20.3 and 0.22.0 branches. And trunk of course, which I
> guess is 0.20.2 nowadays? In addition to that there are distributions by
> Cloudera (CDH2 / 3beta) and IBM (IDAH).
>
> From my perspective, setting up a pilot cluster for a small number of users
> from different institutes, security (0.20.S) is very attractive – scientists
> like the idea of shielding their data and logic from other users. But what
> will I miss if I choose Y!’s distribution over all of these other options?
>

Hi Evert,

Y!'s distribution does contain a good set of patches, and we at Cloudera are always keeping track of the ydist git repository to incorporate those changes into CDH. Currently, ydist contains the security patch series but doesn't include the recent append work. CDH3b2 includes the append work but not security as of yet -- we are currently integrating security, and it should be available in the next beta.

Aside from the specific patches included, it's worth noting that the Y! dist is a git repository rather than a full binary-and-source distribution of Hadoop and related tools. CDH includes not just the core Hadoop components but also integrates many other important ecosystem components, including Pig, Hive, Oozie, HBase, ZooKeeper, Flume, etc.

Thanks
-Todd

--
Todd Lipcon
Software Engineer, Cloudera