I have a similar use case, so I wrote a Python script to fix the cluster
configuration that spark-ec2 generates when you use Hadoop 2. Start a
cluster with enough machines that HDFS can hold 1 TB (so use instance
types that have SSDs), then follow the instructions at
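The script itself isn't reproduced in this thread. As a rough sketch of the
kind of fix involved, assuming the problem is that the ephemeral HDFS data
directories don't point at the instance SSD mounts (the config path,
property name, and mount points below are my assumptions, not the actual
script; spark-ec2's --hadoop-major-version=2 and --instance-type flags are
what select Hadoop 2 and an SSD-backed type at launch):

```python
# Hypothetical sketch: point the ephemeral HDFS datanode storage at the
# instance SSD mounts. The file path, property name, and mount points
# are assumptions, not the actual script from this thread.
import xml.etree.ElementTree as ET

CONF = "/root/ephemeral-hdfs/conf/hdfs-site.xml"           # assumed layout
SSD_DIRS = "/mnt/ephemeral-hdfs/data,/mnt2/ephemeral-hdfs/data"

tree = ET.parse(CONF)
root = tree.getroot()
for prop in root.findall("property"):
    if prop.findtext("name") == "dfs.datanode.data.dir":
        prop.find("value").text = SSD_DIRS                 # rewrite in place
        break
else:
    # property not present at all: append it
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = "dfs.datanode.data.dir"
    ET.SubElement(prop, "value").text = SSD_DIRS
tree.write(CONF)
```

You would run something like this on each node (and restart HDFS) so the
datanodes store blocks on the SSDs rather than the small root volume.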
Hi All,

Here goes my first question. Here is my use case: I have 1 TB of data that
I want to process on EC2 using Spark. I have uploaded the data to an EBS
volume. The Amazon EC2 setup instructions explain:

*If your application needs to access large datasets, the fastest way to do
that is to load them from
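As a minimal sketch of that workflow, assuming the EBS volume is mounted at
/vol and the spark-ec2 ephemeral HDFS layout under /root/ephemeral-hdfs
(both paths are my assumptions), you would copy the data into HDFS once and
then point Spark at it:

```python
# Sketch only: copy the EBS-hosted data into the cluster's HDFS, then
# read it back with PySpark. /vol/mydata and /mydata are assumed paths.
import subprocess

from pyspark import SparkContext

subprocess.check_call([
    "/root/ephemeral-hdfs/bin/hadoop", "fs", "-put",
    "/vol/mydata",   # hypothetical EBS mount point holding the 1 TB
    "/mydata",       # destination directory in HDFS
])

sc = SparkContext(appName="process-1tb")
lines = sc.textFile("/mydata")  # resolves against the cluster's default FS
print(lines.count())            # cheap smoke test that the data is readable
```

Loading into HDFS first means each worker reads mostly local blocks,
instead of every executor pulling the full dataset over the network from a
single EBS volume.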