I'm a big fan of Whirr, though I don't think it supports EBS persistence.
My Hadoop deployment strategy has always been: store the input data on S3,
spin up a Hadoop cluster with either Whirr or Elastic MapReduce, run the
job, write the output back to S3, and kill the cluster.
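
For concreteness, here is a minimal sketch of that workflow using the
Whirr CLI (a sketch only: the cluster layout, bucket, jar, and class
names are hypothetical, and it assumes bin/whirr is on your PATH, your
AWS keys are exported, and the job's Hadoop config carries S3
credentials):

    # hadoop.properties - all values hypothetical
    whirr.cluster-name=myhadoopcluster
    whirr.instance-templates=1 hadoop-jobtracker+hadoop-namenode,2 hadoop-datanode+hadoop-tasktracker
    whirr.provider=aws-ec2
    whirr.identity=${env:AWS_ACCESS_KEY_ID}
    whirr.credential=${env:AWS_SECRET_ACCESS_KEY}

    # Bring the cluster up from the definition above.
    whirr launch-cluster --config hadoop.properties

    # Run the job reading from and writing to S3 (s3n:// URIs), so the
    # cluster's local HDFS never holds the only copy of anything. Run
    # this on the master node, or point a local client at the config
    # Whirr writes under ~/.whirr/myhadoopcluster/.
    hadoop jar my-job.jar com.example.MyJob \
        s3n://my-bucket/input s3n://my-bucket/output

    # Once the output is safely on S3, throw the cluster away.
    whirr destroy-cluster --config hadoop.properties

The Elastic MapReduce route is the same idea in one shot: a job flow
created with the elastic-mapreduce CLI without the --alive flag tears
itself down automatically once its steps finish.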


On Tue, Nov 29, 2011 at 12:28 PM, Periya.Data <periya.d...@gmail.com> wrote:

> Hi All,
>        I am just beginning to learn how to deploy a small cluster (a 3
> node cluster) on EC2. After some quick Googling, I see the following
> approaches:
>
>   1. Use Whirr for quick deployment and teardown. Uses CDH3. Does it
>   have features for persisting data (EBS)?
>   2. CDH Cloud Scripts - has an EC2 AMI, again for temporary Hadoop
>   clusters, POCs, etc. Good stuff - I can persist using EBS snapshots -
>   but this uses CDH2.
>   3. Install Hadoop and related tools like Hive manually on each
>   cluster node on EC2 (or use an automation tool like Chef). I would
>   prefer not to do this.
>   4. The Hadoop distribution ships with EC2 scripts (under src/contrib),
>   and there are several Hadoop EC2 AMIs available. I have not studied
>   enough to know whether that is easy for a beginner like me.
>   5. Anything else?
>
> 1 and 2 look promising for a beginner. If any of you have thoughts about
> this, I would like to hear them (what to keep in mind, what to take care
> of, caveats, etc.). I want my data/config to persist (using EBS) so I can
> continue from where I left off after a few days. Also, I want to have
> Hive and Sqoop installed. Can that be done using 1 or 2, or will I have
> to install them manually after I set up the cluster?
>
> Thanks very much,
>
> PD.
>



-- 

Thanks,
John C
