I have set up a small development cluster using t2.micro machines and an Amazon Linux AMI (CentOS 6.x). The whole setup was done manually, without using the provided scripts, and consists of 5 instances in total: the first machine has an Elastic IP and is used as a bridge to access the other 4 (which have no Elastic IPs). The second machine runs a standalone single-node Spark cluster (1 master, 1 worker). The remaining 3 machines are configured as an Apache Cassandra cluster. I have tuned the JVM and many other parameters.

I use neither S3 nor HDFS: I write data with Spark Streaming (from an Apache Flume sink) to the 3 Cassandra nodes, which I then use for data retrieval. The data is processed through regular Spark jobs, submitted to the cluster at scheduled intervals by LinkedIn Azkaban, which executes custom shell scripts I wrote to wrap the submission process and handle any command-line arguments. Results are written either to other Cassandra tables or to a specific folder on the filesystem in plain CSV format. The system is completely autonomous and requires little to no manual administration.
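For reference, the ingest path looks roughly like the sketch below. This is a minimal Scala example, assuming Spark 1.x with the spark-streaming-flume module and the DataStax spark-cassandra-connector on the classpath; the IP addresses, port, keyspace, and table names are hypothetical placeholders, not my actual configuration.

import java.util.UUID

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

import com.datastax.spark.connector.SomeColumns
import com.datastax.spark.connector.streaming._

object FlumeToCassandra {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("FlumeToCassandra")
      // Hypothetical private IP of one of the Cassandra nodes
      .set("spark.cassandra.connection.host", "10.0.0.21")

    val ssc = new StreamingContext(conf, Seconds(10))

    // Pull events from a Flume sink exposed on a hypothetical host/port
    val events = FlumeUtils.createPollingStream(ssc, "10.0.0.20", 9988)

    events
      .map { e =>
        // Copy the Avro event body out of its ByteBuffer and decode it
        val buf = e.event.getBody
        val bytes = new Array[Byte](buf.remaining())
        buf.get(bytes)
        (UUID.randomUUID(), new String(bytes, "UTF-8"))
      }
      // Hypothetical schema: raw_events(id uuid PRIMARY KEY, body text)
      .saveToCassandra("demo_ks", "raw_events", SomeColumns("id", "body"))

    ssc.start()
    ssc.awaitTermination()
  }
}

A jar built from something like this is what the Azkaban-driven shell wrappers hand to spark-submit.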

I'm quite satisfied with it, considering how small and limited the machines involved are. But it required a lot of tuning work, because we are clearly below the recommended requirements. 4 of the 5 machines are switched off during the night; only the bridge machine stays up 24/7.

$12 per month in total.

Renato Perini.


On 28/04/2016 at 23:39, Fatma Ozcan wrote:
What is your experience using Spark on AWS? Are you setting up your own Spark cluster, and using HDFS? Or are you using Spark as a service from AWS? In the latter case, what is your experience of using S3 directly, without having HDFS in between?

Thanks,
Fatma

