It appears that support for this kind of control over block placement is
shipping in the next version of HDFS:

On Tue, Aug 26, 2014 at 7:43 AM, Gary Malouf <> wrote:

> One of my colleagues has been asking me why Spark/HDFS makes no attempt to
> co-locate related data blocks.  He pointed to this
> paper: from 2011 on the
> CoHadoop research and the performance improvements it yielded for
> Map/Reduce jobs.
> Would leveraging these ideas for writing data from Spark make sense/be
> worthwhile?
