If you are using one of the supported platforms, it is easy to get up and running fairly quickly as well.
...advice from another seigel/segel

Cheers,
James

On 2011-03-23, at 9:32 AM, Michael Segel wrote:

> Rita,
>
> It sounds like you're only using Hadoop and have no intention of really
> getting into the internals.
>
> I'm like most admins/developers/IT guys and I'm pretty lazy.
> I find it easier to set up the yum repository and then issue the
> yum install hadoop command.
>
> The thing about Cloudera is that they do back-port patches, so while their
> release is 'heavily patched', they are usually in some sort of sync with
> the Apache release. Since you're only working with HDFS and it's pretty
> stable, I'd say go with the Cloudera release.
>
> HTH
>
> -Mike
>
> ----------------------------------------
>> Date: Wed, 23 Mar 2011 11:12:30 -0400
>> Subject: Re: CDH and Hadoop
>> From: rmorgan...@gmail.com
>> To: common-user@hadoop.apache.org
>> CC: michael_se...@hotmail.com
>>
>> Mike,
>>
>> Thanks. This helps a lot.
>>
>> At our lab we have close to 60 servers which only run HDFS. I don't need
>> MapReduce and other bells and whistles. We just use HDFS for storing
>> dataset results ranging from 3 GB to 90 GB.
>>
>> So, what is the best practice for HDFS? Should I always deploy one
>> version behind? I understand that Cloudera's version is heavily patched
>> (similar to the Red Hat Linux kernel versus the standard Linux kernel).
>>
>> On Wed, Mar 23, 2011 at 10:44 AM, Michael Segel wrote:
>>
>>> Rita,
>>>
>>> Short answer...
>>>
>>> Cloudera's release is free, and they also offer a support contract if
>>> you want support from them.
>>> Cloudera provides sources, but most use yum (Red Hat/CentOS) to download
>>> an already-built release.
>>>
>>> Should you use it?
>>> Depends on what you want to do.
>>>
>>> If your goal is to get up and running with Hadoop and then focus on
>>> *using* Hadoop/HBase/Hive/Pig/etc., then it makes sense.
>>>
>>> If your goal is to do a deep dive into Hadoop and get your hands dirty
>>> mucking around with the latest and greatest in trunk? Then no. You're
>>> better off building your own off the official Apache release.
>>>
>>> Many companies choose Cloudera's release for the following reasons:
>>> * Paid support is available.
>>> * Companies focus on using a tech, not developing the tech, so Cloudera
>>> does the heavy lifting while client companies focus on 'USING' Hadoop.
>>> * Cloudera's release makes sure that the versions in the release work
>>> together. That is, when you download CDH3B4, you get a version of
>>> Hadoop that will work with the included versions of HBase, Hive, etc.
>>>
>>> And no, it's never a good idea to try to mix and match Hadoop from
>>> different environments and versions in a cluster.
>>> (I think it will barf on you.)
>>>
>>> Does that help?
>>>
>>> -Mike
>>>
>>> ----------------------------------------
>>>> Date: Wed, 23 Mar 2011 10:29:16 -0400
>>>> Subject: CDH and Hadoop
>>>> From: rmorgan...@gmail.com
>>>> To: common-user@hadoop.apache.org
>>>>
>>>> I have been wondering if I should use CDH
>>>> (http://www.cloudera.com/hadoop/) instead of the standard Hadoop
>>>> distribution.
>>>>
>>>> What do most people use? Is CDH free? Do they provide the tars, or do
>>>> they provide source code that I simply compile? Can I have some data
>>>> nodes as CDH and the rest as regular Hadoop?
>>>>
>>>> I am asking this because so far I have noticed a serious bug (IMO) in
>>>> the decommissioning process (
>>>> http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201103.mbox/%3cAANLkTikPKGt5zw1QGLse+LPzUDP7Mom=ty_mxfcuo...@mail.gmail.com%3e
>>>> )
>>>>
>>>> --
>>>> --- Get your facts first, then you can distort them as you please.--
>>
>> --
>> --- Get your facts first, then you can distort them as you please.--
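[Editor's note] As background on the decommissioning process Rita mentions: in Hadoop of that era (0.20.x), a datanode was decommissioned by listing it in an exclude file referenced from hdfs-site.xml and then refreshing the NameNode. A minimal sketch, assuming the file paths and hostname below are placeholders:

```
<!-- hdfs-site.xml excerpt: point the NameNode at an exclude file -->
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/conf/dfs.exclude</value>
</property>
```

After adding the node's hostname (one per line) to /etc/hadoop/conf/dfs.exclude, running `hadoop dfsadmin -refreshNodes` on the NameNode moves the node to "Decommission in progress" while its blocks are re-replicated elsewhere.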
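[Editor's note] Mike's "set up the yum repository" step above amounted to dropping a .repo file under /etc/yum.repos.d/ on Red Hat/CentOS and installing the packages from it. A sketch for the CDH3 era; the baseurl, GPG key URL, and package names here are assumptions from that period and should be verified against Cloudera's installation docs:

```
# /etc/yum.repos.d/cloudera-cdh3.repo  (repo details are assumptions -- verify)
[cloudera-cdh3]
name=Cloudera CDH3
baseurl=http://archive.cloudera.com/redhat/cdh/3/
gpgkey=http://archive.cloudera.com/redhat/cdh/RPM-GPG-KEY-cloudera
gpgcheck=1
```

With the repo in place, `yum install hadoop-0.20` (plus the role-specific service packages, e.g. hadoop-0.20-namenode or hadoop-0.20-datanode, on the appropriate machines) pulls a pre-built release whose component versions are tested together.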