Hello,

I have the same questions in mind. What advantages does EC2 offer compared with regular servers for Spark and other big data product development? I hope to get input from the community.
Thanks,
Divya

On Dec 4, 2015 6:05 AM, "Andy Davidson" <a...@santacruzintegration.com> wrote:

> About 2 months ago I used spark-ec2 to set up a small cluster. The cluster
> runs a Spark Streaming app 24x7 and stores the data to HDFS. I also need to
> run some batch analytics on the data.
>
> Now that I have a little more experience, I wonder if this was a good way
> to set up the cluster, given the following issues:
>
> 1. I have not been able to find explicit directions for upgrading the
>    Spark version.
>    http://search-hadoop.com/m/q3RTt7E0f92v0tKh2&subj=Re+Upgrading+Spark+in+EC2+clusters
> 2. I am not sure where the data is physically stored. I think I may
>    accidentally lose all my data.
> 3. spark-ec2 makes it easy to launch a cluster with as many machines
>    as you like; however, it's not clear how I would add slaves to an
>    existing installation.
>
> In our Java streaming app we call rdd.saveAsTextFile("hdfs://path");
>
> ephemeral-hdfs/conf/hdfs-site.xml:
>
> <property>
>   <name>dfs.data.dir</name>
>   <value>/mnt/ephemeral-hdfs/data,/mnt2/ephemeral-hdfs/data</value>
> </property>
>
> persistent-hdfs/conf/hdfs-site.xml
>
> $ mount
> /dev/xvdb on /mnt type ext3 (rw,nodiratime)
> /dev/xvdf on /mnt2 type ext3 (rw,nodiratime)
>
> http://spark.apache.org/docs/latest/ec2-scripts.html
>
> "The spark-ec2 script also supports pausing a cluster. In this case,
> the VMs are stopped but not terminated, so they *lose all data on
> ephemeral disks* but keep the data in their root partitions and their
> persistent-hdfs."
>
> Initially I thought using HDFS was a good idea. spark-ec2 makes HDFS easy
> to use. I incorrectly thought Spark somehow knew how HDFS partitioned my
> data.
>
> I think many people are using Amazon S3. I do not have any direct
> experience with S3. My concern would be that the data is not physically
> stored close to my slaves, i.e. high communication costs.
>
> Any suggestions would be greatly appreciated.
>
> Andy
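
On the question of where the data is physically stored, one way to check is to line up the dfs.data.dir entries from hdfs-site.xml with the output of mount. Below is a minimal Python sketch of that idea, using the config fragment and mount lines quoted above as sample input. The function names (data_dirs, parse_mounts, device_for) are made up for illustration, and the <configuration> wrapper around the property is an assumption about the rest of the file; on a real node you would read the actual file and the real mount output instead.

```python
import xml.etree.ElementTree as ET

# Sample inputs copied from the thread. The <configuration> wrapper is an
# assumption; only the <property> element appears in the original message.
HDFS_SITE_XML = """<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/mnt/ephemeral-hdfs/data,/mnt2/ephemeral-hdfs/data</value>
  </property>
</configuration>"""

MOUNT_OUTPUT = """/dev/xvdb on /mnt type ext3 (rw,nodiratime)
/dev/xvdf on /mnt2 type ext3 (rw,nodiratime)"""


def data_dirs(hdfs_site_xml, key="dfs.data.dir"):
    """Return the directories listed under `key` in an hdfs-site.xml string."""
    root = ET.fromstring(hdfs_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == key:
            value = prop.findtext("value", "")
            return [d.strip() for d in value.split(",") if d.strip()]
    return []


def parse_mounts(mount_output):
    """Map mount point -> device from `mount`-style lines."""
    mounts = {}
    for line in mount_output.splitlines():
        parts = line.split()
        # Expected shape: <device> on <mount point> type <fs> (<options>)
        if len(parts) >= 3 and parts[1] == "on":
            mounts[parts[2]] = parts[0]
    return mounts


def device_for(path, mounts):
    """Return (mount point, device) for the longest mount point containing path."""
    best = None
    for mount_point in mounts:
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    return (best, mounts[best]) if best else (None, None)


if __name__ == "__main__":
    mounts = parse_mounts(MOUNT_OUTPUT)
    for d in data_dirs(HDFS_SITE_XML):
        mount_point, device = device_for(d, mounts)
        print(f"{d} -> {mount_point} ({device})")
```

This only tells you which device backs each HDFS data directory. Whether that device is an instance-store (ephemeral) disk or an EBS volume is not visible from mount alone, so you would still need to check the instance's block device mapping before relying on the data surviving a stop or terminate.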