Hi all,

Here goes my first question; here's my use case.

I have 1 TB of data that I want to process on EC2 using Spark, and I have uploaded the data to an EBS volume.
The Amazon EC2 setup instructions say:

"If your application needs to access large datasets, the fastest way to do
that is to load them from Amazon S3 or an Amazon EBS device into an
instance of the Hadoop Distributed File System (HDFS) on your nodes"
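
If I understand that correctly, the idea is to pull the dataset out of S3 (or off the EBS volume) into the cluster's HDFS once, and have Spark jobs read it from there afterwards. Here is a rough sketch of what I imagine that looks like from the Spark shell (the bucket name, HDFS host, and port below are placeholders, not my actual setup, and it assumes AWS credentials are already configured on the cluster):

    // Read the raw dataset straight out of S3
    // (placeholder bucket/path; s3n:// is the scheme Spark/Hadoop use here)
    val raw = sc.textFile("s3n://my-bucket/my-dataset/")

    // Write a copy into the HDFS instance running on the cluster nodes
    // (placeholder master hostname and port)
    raw.saveAsTextFile("hdfs://ec2-master-host:9000/data/my-dataset/")

    // Subsequent jobs can then read from cluster-local HDFS instead of S3
    val data = sc.textFile("hdfs://ec2-master-host:9000/data/my-dataset/")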

However, the newer Amazon instance types don't come with any physical (instance-store) volumes:
http://aws.amazon.com/ec2/instance-types/

So do I need to set up HDFS separately on EC2? (The instructions also say
"The spark-ec2 script already sets up a HDFS instance for you.") Is there
any blog post or write-up that could help me understand this better?

~Manish
