Re: newbie best practices: is spark-ec2 intended to be used to manage long-lasting infrastructure?

2015-12-04 Thread Sean Owen
There is no way to upgrade a running cluster here. You can stop a cluster and simply start a new cluster the same way you started the original one. That ought to be simple; the only issue, I suppose, is the downtime, since you have to shut the whole thing down, but maybe that's
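
The relaunch approach described above can be sketched as a small Python helper that builds the two spark-ec2 invocations: stop the old cluster, then launch a new one with the same options and only the Spark version changed. This is a hedged sketch, not a tested procedure. It assumes the spark-ec2 form `./spark-ec2 [options] <action> <cluster_name>` with the `--key-pair`, `--identity-file`, `--region`, `--slaves` and `--spark-version` options; the key pair, identity file, region, slave count, cluster names and version string are all hypothetical placeholders. It only prints the commands rather than running them.

```python
import shlex
from typing import Dict, List

# Hypothetical launch settings (placeholders, not values from this thread).
BASE_OPTS: Dict[str, str] = {
    "--key-pair": "my-keypair",
    "--identity-file": "my-keypair.pem",
    "--region": "us-east-1",
    "--slaves": "2",
}


def spark_ec2_cmd(action: str, cluster: str, opts: Dict[str, str]) -> List[str]:
    """Build ./spark-ec2 [options] <action> <cluster_name> as an argv list."""
    args = ["./spark-ec2"]
    # Sort the options so the generated command is deterministic.
    args += [f"{flag}={value}" for flag, value in sorted(opts.items())]
    return args + [action, cluster]


def replace_cluster_cmds(old: str, new: str, spark_version: str) -> List[List[str]]:
    """Stop the old cluster, then launch a fresh one with identical options
    except for the Spark version. Expect downtime between the two steps."""
    launch_opts = {**BASE_OPTS, "--spark-version": spark_version}
    return [
        spark_ec2_cmd("stop", old, {"--region": BASE_OPTS["--region"]}),
        spark_ec2_cmd("launch", new, launch_opts),
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them (a dry run).
    for cmd in replace_cluster_cmds("streaming-old", "streaming-new", "1.5.2"):
        print(shlex.join(cmd))
```

Keeping every option except the version in one place is what makes "start a new cluster the same way you started the original" repeatable. Anything stored only on the old cluster's HDFS does not carry over to the new cluster automatically.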

Re: newbie best practices: is spark-ec2 intended to be used to manage long-lasting infrastructure?

2015-12-04 Thread Michal Klos
If you are running on AWS, I would recommend using S3 instead of HDFS as a general practice if you are maintaining state or data there. That way you can treat your Spark clusters as ephemeral compute resources that you can swap out easily; e.g. if something breaks, just spin up a fresh cluster

Re: newbie best practices: is spark-ec2 intended to be used to manage long-lasting infrastructure?

2015-12-04 Thread Sabarish Sasidharan
#2: If you're using HDFS, the data is on the nodes' disks. You can use the HDFS command line to browse your data, and then use s3distcp or simply distcp to copy the data from HDFS to S3. Or you can even use the HDFS get commands to copy it to local disk and then use the S3 CLI to copy it to S3.
#3: Cost of accessing data in S3 from EC2 nodes,
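
The copy options from #2 can be sketched as Python helpers that build the shell commands: browse with `hdfs dfs -ls`, copy directly with `hadoop distcp`, or take two hops with `hdfs dfs -get` followed by `aws s3 cp --recursive`. This is a hedged sketch. The HDFS path, local directory and bucket name are hypothetical. It also assumes that Hadoop's S3 filesystem (`s3a://`, or `s3n://` on older Hadoop builds) and its credentials are already configured. The AWS CLI uses `s3://` URIs instead. EMR's s3distcp is not modeled here. The helpers only print the commands.

```python
import shlex
from typing import List


def browse_cmd(hdfs_path: str) -> List[str]:
    """List an HDFS directory to see what is stored there."""
    return ["hdfs", "dfs", "-ls", hdfs_path]


def distcp_cmd(hdfs_src: str, s3_dest: str) -> List[str]:
    """Copy HDFS -> S3 directly; distcp runs the copy as a distributed job."""
    return ["hadoop", "distcp", hdfs_src, s3_dest]


def via_local_disk_cmds(hdfs_src: str, local_dir: str, s3_dest: str) -> List[List[str]]:
    """Two-hop copy: pull the data to local disk, then push it with the AWS CLI.
    local_dir is assumed not to exist yet, so -get creates it as the copy target."""
    return [
        ["hdfs", "dfs", "-get", hdfs_src, local_dir],
        ["aws", "s3", "cp", local_dir, s3_dest, "--recursive"],
    ]


if __name__ == "__main__":
    # Hypothetical paths; print the commands rather than running them.
    cmds = [
        browse_cmd("/data/events"),
        distcp_cmd("/data/events", "s3a://example-bucket/events"),
        *via_local_disk_cmds("/data/events", "/tmp/events-export",
                             "s3://example-bucket/events"),
    ]
    for cmd in cmds:
        print(shlex.join(cmd))
```

distcp is usually the better fit for large data sets because the copy is spread across the cluster. The two-hop route goes through a single machine's disk, so it suits smaller exports.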

Re: newbie best practices: is spark-ec2 intended to be used to manage long-lasting infrastructure?

2015-12-03 Thread Divya Gehlot
Hello, I have the same questions in mind. What are the upgrade options, and in which cases should we use EC2 rather than ordinary servers for Spark and other big data product development? I hope to get input from the community. Thanks, Divya On Dec 4, 2015 6:05 AM, "Andy Davidson"

newbie best practices: is spark-ec2 intended to be used to manage long-lasting infrastructure?

2015-12-03 Thread Andy Davidson
About two months ago I used spark-ec2 to set up a small cluster. The cluster runs a Spark Streaming app 24/7 and stores the data in HDFS. I also need to run some batch analytics on the data. Now that I have a little more experience, I wonder if this was a good way to set up the cluster; the following