Hi, I use EC2 to run my Hadoop jobs using Cloudera's 0.18.3 AMI. It works great but I want to automate it a bit more.
I want to be able to: - start cluster - copy data from S3 to the DFS - run the job - copy result data from DFS to S3 - verify it all copied ok - shutdown the cluster. I guess the hardest part is reliably detecting when a job is complete. I've seen solutions that provide a time based shutdown but they are not suitable as our jobs vary in time. Has anyone made a script that does this already? I'm using the Cloudera python scripts to start/terminate my cluster. Thanks, John
