Hi,

I use EC2 to run my Hadoop jobs using Cloudera's 0.18.3 AMI. It works great
but I want to automate it a bit more.

I want to be able to:
- start cluster
- copy data from S3 to the DFS
- run the job
- copy result data from DFS to S3
- verify it all copied ok
- shutdown the cluster.


I guess the hardest part is reliably detecting when a job is complete. I've
seen solutions that provide a time based shutdown but they are not suitable
as our jobs vary in time.

Has anyone made a script that does this already? I'm using the Cloudera
python scripts to start/terminate my cluster.

Thanks,
John

Reply via email to