Hi Spark users and developers,

I have been trying to use spark-ec2. After launching a Spark 1.4.1 cluster with ephemeral HDFS (Hadoop 2.4.0), I ran a job whose input data is stored in the ephemeral HDFS. No matter what I try, there is no data locality at all. For instance, filtering the data and counting the filtered records always runs with locality level "ANY". I tweaked the spark.locality.wait.* settings, but they do not seem to have any effect.

My guess is that the hostnames cannot be resolved properly. Has anyone experienced this problem before?
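To make the hostname guess concrete, here is a minimal sketch of the check I have in mind. It assumes that Spark matches a block's preferred host against the executor's registered host as plain strings (that is my understanding, not something I have confirmed), and that both lists are collected by hand, for example from the HDFS block report and the Executors tab of the Spark UI. The function names and the host values are hypothetical placeholders, not output from my cluster.

```python
# Sketch only: compares host names as plain strings, which is an assumption
# about how locality is matched. All host values below are made up.

def normalize(host):
    """Lower-case a host string and drop a trailing ':port' if present."""
    host = host.strip().lower()
    name, sep, port = host.rpartition(":")
    return name if sep and port.isdigit() else host


def unmatched_block_hosts(block_hosts, executor_hosts):
    """Return the sorted block hosts that no executor is registered under."""
    executors = {normalize(h) for h in executor_hosts}
    blocks = {normalize(h) for h in block_hosts}
    return sorted(blocks - executors)


# Hypothetical case: HDFS reports internal host names, executors show IPs.
block_hosts = [
    "ip-10-0-0-11.ec2.internal:50010",
    "ip-10-0-0-12.ec2.internal:50010",
]
executor_hosts = ["10.0.0.11:41234", "10.0.0.12:41235"]

missing = unmatched_block_hosts(block_hosts, executor_hosts)
print(missing)
```

If the list is non-empty for every block host, that would be consistent with tasks never getting a node-local slot regardless of the spark.locality.wait.* values, since waiting longer cannot help when no executor name ever matches.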
Best Regards, Jerry