How important is data locality to Hadoop? If we separate the HDFS cluster
from the MR cluster, we lose data locality, but my question is how bad that
is, assuming we provide a reasonable network connection between the two
clusters. EMR gives up data locality when using S3 as storage, yet we do not
see a significant difference in job time compared with running the same job
on an HDFS cluster with the same setup. So I am wondering how important
data locality is to Hadoop in practice.

Thanks,
Mike
