Hi, I tried to use short-circuit read to improve MR scan performance on my HBase cluster.
I have the following settings in hdfs-site.xml:

dfs.client.read.shortcircuit set to true
dfs.block.local-path-access.user set to the user that runs the MR jobs

The cluster is 1+4 nodes, and each data node has 16 CPUs and 4 HDDs. All HBase tables have been major compacted, so all data is local.

I had hoped that short-circuit read would improve performance. However, the test results show that with short-circuit read enabled, performance actually dropped by 10-15%. For example, scanning a 50G table takes around 100s instead of 90s.

My Hadoop version is 1.1.1. Any ideas on this? Thanks!

Best Regards,
Raymond Liu
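
For reference, the two settings described above would look roughly like this in hdfs-site.xml. This is only a sketch of the configuration as described, not a copy of the actual file; the user name `mruser` is a placeholder, since the post does not give the real account name.

```xml
<!-- hdfs-site.xml (fragment) -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.block.local-path-access.user</name>
  <!-- placeholder: replace with the account that actually runs the MR jobs -->
  <value>mruser</value>
</property>
```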