If you are using one of the supported platforms, it's easy to get up and 
running fairly quickly as well.

...advice from another seigel/segel

Cheers
james.


On 2011-03-23, at 9:32 AM, Michael Segel wrote:

> 
> Rita,
> 
> It sounds like you're only using Hadoop and have no intentions to really get 
> into the internals.
> 
> I'm like most admins/developers/IT guys and I'm pretty lazy.
> I find it easier to set up the yum repository and then issue the yum install 
> hadoop command. 
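> For reference, a minimal sketch of that yum-based setup. The repo URL and
> package name below are my recollection of CDH3-era packaging, not something
> stated in this thread; check Cloudera's site for the current repo definition.

```shell
# Sketch: add Cloudera's yum repository, then install Hadoop from it.
# The baseurl and package name are assumptions based on CDH3-era packaging.
cat > /etc/yum.repos.d/cloudera-cdh3.repo <<'EOF'
[cloudera-cdh3]
name=Cloudera's Distribution for Hadoop, Version 3
baseurl=http://archive.cloudera.com/redhat/cdh/3/
gpgcheck=0
EOF

# Requires root; pulls the pre-built release rather than building from source.
yum install hadoop-0.20
```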
> 
> The thing about Cloudera is that they do back-port patches, so while their 
> release is 'heavily patched', it usually stays in some sort of sync with 
> the Apache release. Since you're only working with HDFS and it's pretty 
> stable, I'd say go with the Cloudera release.
> 
> HTH
> 
> -Mike
> 
> 
> ----------------------------------------
>> Date: Wed, 23 Mar 2011 11:12:30 -0400
>> Subject: Re: CDH and Hadoop
>> From: rmorgan...@gmail.com
>> To: common-user@hadoop.apache.org
>> CC: michael_se...@hotmail.com
>> 
>> Mike,
>> 
>> Thanks. This helps a lot.
>> 
>> At our lab we have close to 60 servers that only run HDFS. I don't need
>> MapReduce and other bells and whistles. We just use HDFS for storing dataset
>> results ranging from 3 GB to 90 GB.
>> 
>> So, what is the best practice for HDFS? Should I always deploy one version
>> behind? I understand that Cloudera's version is heavily patched (similar to
>> the Red Hat Linux kernel versus the standard Linux kernel).
>> 
>> 
>> 
>> 
>> 
>> 
>> On Wed, Mar 23, 2011 at 10:44 AM, Michael Segel
>> wrote:
>> 
>>> 
>>> Rita,
>>> 
>>> Short answer...
>>> 
>>> Cloudera's release is free, and they do also offer a support contract if
>>> you want support from them.
>>> Cloudera has sources, but most use yum (redhat/centos) to download an
>>> already built release.
>>> 
>>> Should you use it?
>>> Depends on what you want to do.
>>> 
>>> If your goal is to get up and running with Hadoop and then focus on *using*
>>> Hadoop/HBase/Hive/Pig/etc... then it makes sense.
>>> 
>>> If your goal is to do a deep dive in to Hadoop and get your hands dirty
>>> mucking around with the latest and greatest in trunk? Then no. You're better
>>> off building your own off the official Apache release.
>>> 
>>> Many companies choose Cloudera's release for the following reasons:
>>> * Paid support is available.
>>> * Companies focus on using a tech not developing the tech, so Cloudera does
>>> the heavy lifting while Client Companies focus on 'USING' Hadoop.
>>> * Cloudera's release makes sure that the versions in the release work
>>> together. That is, when you download CDH3B4, you get a version of
>>> Hadoop that will work with the included versions of HBase, Hive, etc.
>>> 
>>> And no, it's never a good idea to try to mix and match Hadoop from
>>> different environments and versions in a cluster.
>>> (I think it will barf on you.)
>>> 
>>> Does that help?
>>> 
>>> -Mike
>>> 
>>> 
>>> ----------------------------------------
>>>> Date: Wed, 23 Mar 2011 10:29:16 -0400
>>>> Subject: CDH and Hadoop
>>>> From: rmorgan...@gmail.com
>>>> To: common-user@hadoop.apache.org
>>>> 
>>>> I have been wondering if I should use CDH (http://www.cloudera.com/hadoop/)
>>>> instead of the standard Hadoop distribution.
>>>> 
>>>> What do most people use? Is CDH free? Do they provide tars, or do they
>>>> provide source code that I simply compile? Can I have some data nodes
>>>> running CDH and the rest running regular Hadoop?
>>>> 
>>>> 
>>>> I am asking this because so far I noticed a serious bug (IMO) in the
>>>> decommissioning process (
>>>> http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201103.mbox/%3cAANLkTikPKGt5zw1QGLse+LPzUDP7Mom=ty_mxfcuo...@mail.gmail.com%3e
>>>> )
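>>>> For context, the decommissioning process referred to here is the standard
>>>> exclude-file mechanism. A sketch of the usual steps (the hostname and file
>>>> paths are illustrative, and dfs.hosts.exclude must already point at the
>>>> exclude file in hdfs-site.xml):

```shell
# Sketch of the standard HDFS decommissioning flow (Hadoop 0.20-era CLI).
# Add the node to the exclude file the NameNode is configured to read...
echo "datanode42.example.com" >> /etc/hadoop/conf/dfs.exclude

# ...then tell the NameNode to re-read its host lists and start draining
# blocks off that node.
hadoop dfsadmin -refreshNodes

# The node should appear as "Decommission in progress" until its blocks
# are re-replicated elsewhere.
hadoop dfsadmin -report
```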
>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> --- Get your facts first, then you can distort them as you please.--
>>> 
>>> 
>> 
>> 
>> 
>> --
>> --- Get your facts first, then you can distort them as you please.--
