Re: Need clarification on spark on cluster set up instruction

2015-07-01 Thread Alex Gittens
I have a similar use case, so I wrote a Python script to fix the cluster configuration that spark-ec2 uses when you use Hadoop 2. Start a cluster with enough machines that HDFS can hold 1 TB (so use instance types that have SSDs), then follow the instructions at
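Working out how many machines "enough" means is simple arithmetic, but HDFS replication multiplies the footprint of the data. The following is a rough sketch that assumes Hadoop's default replication factor of 3 (your spark-ec2 setup may configure a different value) and a hypothetical figure for the usable local SSD space per worker. The function `nodes_needed` and the 25% headroom kept free for temporary and shuffle files are illustrative assumptions, not part of spark-ec2 or the script mentioned above.

```python
import math


def nodes_needed(dataset_tb, replication=3, usable_tb_per_node=0.32, headroom=0.25):
    """Estimate how many worker nodes HDFS needs to hold a dataset.

    dataset_tb: raw data size in TB
    replication: HDFS replication factor (each block is stored this many times);
                 3 is Hadoop's default, assumed here
    usable_tb_per_node: local disk per worker that HDFS can use, in TB
                        (hypothetical value; check your instance type)
    headroom: fraction of total capacity left free for temp/shuffle files
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    # Total raw capacity required once every block is replicated,
    # inflated so that `headroom` of the disk stays free.
    required_tb = dataset_tb * replication / (1.0 - headroom)
    # Round up: a partial node still has to be a whole machine.
    return math.ceil(required_tb / usable_tb_per_node)


if __name__ == "__main__":
    # 1 TB, replication 3, 0.32 TB usable per node, 25% headroom
    print(nodes_needed(1))  # prints 13
```

With these assumed numbers, 1 TB needs about 4 TB of raw HDFS capacity, so 13 workers; lowering the replication factor or picking instances with more local SSD shrinks the cluster accordingly.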

Need clarification on spark on cluster set up instruction

2015-06-29 Thread manish ranjan
Hi all, here goes my first question. Here is my use case: I have 1 TB of data that I want to process on EC2 using Spark, and I have uploaded the data to an EBS volume. The instructions for the Amazon EC2 setup explain: *If your application needs to access large datasets, the fastest way to do that is to load them from