Poor HDFS Data Locality on Spark-EC2

Jerry Lam Tue, 04 Aug 2015 15:44:01 -0700

Hi Spark users and developers,

I have been trying to use spark-ec2. After launching a Spark 1.4.1 cluster
with ephemeral HDFS (Hadoop 2.4.0), I ran a job whose input data is stored
in the ephemeral HDFS. No matter what I try, there is no data locality at
all. For instance, filtering the data and counting the filtered records
always runs at locality level ANY. I tweaked the spark.locality.wait.*
settings, but they seem to have no effect. My guess is that the hostnames
are not being resolved properly. Has anyone experienced this problem before?


Best Regards,

Jerry
