Hello,
I have the same questions.
In what ways is EC2 an improvement over regular servers for
Spark and other big data product development?
I hope to get input from the community.

Thanks,
Divya
On Dec 4, 2015 6:05 AM, "Andy Davidson" <a...@santacruzintegration.com>
wrote:

> About 2 months ago I used spark-ec2 to set up a small cluster. The cluster
> runs a Spark Streaming app 24x7 and stores the data in HDFS. I also need to
> run some batch analytics on the data.
>
> Now that I have a little more experience, I wonder if this was a good way
> to set up the cluster. I have run into the following issues:
>
>    1. I have not been able to find explicit directions for upgrading the
>    Spark version
>       1. http://search-hadoop.com/m/q3RTt7E0f92v0tKh2&subj=Re+Upgrading+Spark+in+EC2+clusters
>    2. I am not sure where the data is physically stored. I think I may
>    accidentally lose all my data
>    3. spark-ec2 makes it easy to launch a cluster with as many machines
>    as you like; however, it's not clear how I would add slaves to an
>    existing installation
>
>
> In our Java streaming app we call rdd.saveAsTextFile("hdfs://path");
>
> ephemeral-hdfs/conf/hdfs-site.xml:
>
>   <property>
>     <name>dfs.data.dir</name>
>     <value>/mnt/ephemeral-hdfs/data,/mnt2/ephemeral-hdfs/data</value>
>   </property>
>
>
> persistent-hdfs/conf/hdfs-site.xml
>
>
> $ mount
> /dev/xvdb on /mnt type ext3 (rw,nodiratime)
> /dev/xvdf on /mnt2 type ext3 (rw,nodiratime)
>
>
> http://spark.apache.org/docs/latest/ec2-scripts.html
>
> "The spark-ec2 script also supports pausing a cluster. In this case,
> the VMs are stopped but not terminated, so they *lose all data on
> ephemeral disks* but keep the data in their root partitions and their
> persistent-hdfs."
>
>
> Initially I thought using HDFS was a good idea. spark-ec2 makes HDFS easy
> to use. I incorrectly thought Spark somehow knew how HDFS partitioned my
> data.
>
> I think many people are using Amazon S3. I do not have any direct
> experience with S3. My concern would be that the data is not physically
> stored close to my slaves, i.e. high communication costs.
>
> Any suggestions would be greatly appreciated.
>
> Andy
>
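
On issue 2 above (where the data physically lives), one way to check is to match the dfs.data.dir entries against the mount output. Below is a minimal sketch in Python, not a definitive tool. The sample inputs are copied from the message above; the <configuration> wrapper is added only so the fragment parses as a complete document, and the function names (data_dirs, mount_table, device_for, locate) are hypothetical names chosen for illustration. It only maps each data directory to a mount point and device; whether that device is an ephemeral disk still has to be checked separately.

```python
import xml.etree.ElementTree as ET

# Sample inputs copied from the message above. On a real cluster you would
# read hdfs-site.xml from disk and capture the output of `mount` instead.
# The <configuration> root element is an assumption added for parsing.
HDFS_SITE = """<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/mnt/ephemeral-hdfs/data,/mnt2/ephemeral-hdfs/data</value>
  </property>
</configuration>"""

MOUNT_OUTPUT = """/dev/xvdb on /mnt type ext3 (rw,nodiratime)
/dev/xvdf on /mnt2 type ext3 (rw,nodiratime)"""


def data_dirs(hdfs_site_xml):
    """Return the directories listed under the dfs.data.dir property."""
    root = ET.fromstring(hdfs_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "dfs.data.dir":
            value = prop.findtext("value", "")
            return [d.strip() for d in value.split(",") if d.strip()]
    return []


def mount_table(mount_output):
    """Parse lines of the form '<device> on <mount point> type ...'."""
    table = {}
    for line in mount_output.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[1] == "on":
            table[parts[2]] = parts[0]
    return table


def device_for(path, table):
    """Return (mount point, device) for the longest mount point containing path."""
    best = None
    for mount_point in table:
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    return (best, table[best]) if best else (None, None)


def locate(hdfs_site_xml, mount_output):
    """Map every HDFS data directory to the mount point and device backing it."""
    table = mount_table(mount_output)
    return {d: device_for(d, table) for d in data_dirs(hdfs_site_xml)}


if __name__ == "__main__":
    for directory, (mount_point, device) in locate(HDFS_SITE, MOUNT_OUTPUT).items():
        print(directory, "->", mount_point, device)
```

With the values quoted above, both data directories resolve to the /mnt and /mnt2 mounts, which matches the "ephemeral-hdfs" naming and the warning in the spark-ec2 documentation about data on ephemeral disks being lost when the cluster is stopped.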
