I came across this interesting JIRA in Hadoop: https://issues.apache.org/jira/browse/HDFS-385
In essence, it gives us more control over where blocks are placed. A BlockPlacementPolicy that optimistically writes all blocks of a given index file to the same set of datanodes could be very helpful. If we co-locate shard-servers with datanodes and use short-circuit reads, read performance should improve greatly. For files served locally, we can always rely on the file-system cache. When a given file is not served locally, the shard-server can use the block-cache instead.

Do you see some positives in such an approach?

--
Ravi
