Shivram,
Many thanks for confirming the behavior. I will also turn on short-circuit
reads as you suggested. Appreciate the help.
Demai
On Mon, Oct 13, 2014 at 3:42 PM, Shivram Mani sm...@pivotal.io wrote:
Demai, you are right. HDFS's default block placement policy
(BlockPlacementPolicyDefault) puts one replica of each block on the writer's
datanode when the writer runs on a datanode. Replica selection for reads is
likewise aimed at minimizing bandwidth and latency, so the block will be
served from the reader's local node when a replica is stored there.
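The read-side behavior described above can be pictured as "closest replica first": the client gets the block's locations ordered by network distance from the reader and tries the nearest one. Below is a minimal Python sketch of that idea, not HDFS code. It assumes a simplified two-level (rack, host) topology using the usual HDFS distance convention (0 for the same node, 2 for the same rack, 4 for a different rack); the rack names and the hosts host3/host7 are hypothetical.

```python
# Simplified model of "closest replica first" ordering for HDFS reads.
# Assumption: a two-level topology where each node is a (rack, host) tuple;
# the real HDFS topology code handles deeper hierarchies and more cases.

def network_distance(a, b):
    """Distance between two (rack, host) nodes: 0 same host, 2 same rack, 4 otherwise."""
    if a == b:
        return 0
    if a[0] == b[0]:
        return 2
    return 4


def order_replicas(reader, replicas):
    """Return replica locations sorted from closest to farthest from the reader."""
    return sorted(replicas, key=lambda node: network_distance(reader, node))


if __name__ == "__main__":
    reader = ("/rack1", "host1.hdfs.com")
    replicas = [
        ("/rack2", "host7.hdfs.com"),  # hypothetical off-rack replica
        ("/rack1", "host3.hdfs.com"),  # hypothetical same-rack replica
        ("/rack1", "host1.hdfs.com"),  # local replica, as in the question
    ]
    for node in order_replicas(reader, replicas):
        print(node, network_distance(reader, node))
```

With a replica on host1.hdfs.com itself, the local copy sorts first (distance 0), so a reader on that host has no need to pull the block over the network.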
If you want to optimize this further, you can set
'dfs.client.read.shortcircuit' to true. This allows the client to bypass
the datanode and read the block file directly from local disk.
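For reference, here is a small Python helper that renders the hdfs-site.xml entries for short-circuit reads. It assumes Hadoop 2.x-style short-circuit reads, which in addition to 'dfs.client.read.shortcircuit' need 'dfs.domain.socket.path' (a UNIX domain socket shared by the client and the DataNode) and the native libhadoop library; the settings must be visible to both. The socket path is only an example value, and the helper itself is a hypothetical convenience, not part of Hadoop.

```python
import xml.etree.ElementTree as ET


def render_hdfs_site(props):
    """Render a dict of Hadoop properties as an hdfs-site.xml <configuration> element."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


SHORT_CIRCUIT_PROPS = {
    # Let the client read local block files directly instead of streaming via the DataNode.
    "dfs.client.read.shortcircuit": "true",
    # Domain socket shared by DataNode and client; example path, adjust for your install.
    "dfs.domain.socket.path": "/var/lib/hadoop-hdfs/dn_socket",
}

if __name__ == "__main__":
    print(render_hdfs_site(SHORT_CIRCUIT_PROPS))
```

Short-circuit only helps when the reader runs on a node that holds a replica, which is exactly the host1.hdfs.com case in the question.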
On Mon, Oct 13, 2014 at 11:58 AM, Demai Ni nid...@gmail.com wrote:
Hi folks,
A very simple question; I'm looking forward to a couple of pointers.
Let's say I have an HDFS file, testfile, which has only one block (256MB),
and the block has a replica on datanode host1.hdfs.com (the whole HDFS
cluster may have 100 nodes, and the other 2 replicas are on other
datanodes).
If I run hadoop fs -cat testfile on host1.hdfs.com, or read the file with
a Java client there, should I assume there won't be any significant data
movement through the network? That is, is the namenode smart enough to
give me the data on host1.hdfs.com directly?
Thanks
Demai
--
Thanks
Shivram