I have tried tuning io.file.buffer.size to 128K instead of 4K, but short circuit read performance is still worse than reading through the datanode.
I am starting to wonder: does short circuit read really help under Hadoop 1.1.1? Googling, I found a few people mentioning that they got a 2x gain or so on CDH etc. But I really can't find out what else I can do to make it even just catch up with the normal read path....

> It seems to me that, with short circuit read enabled, BlockReaderLocal
> reads data in 512/4096-byte units (checksum check enabled/skipped),
>
> while when it goes through the datanode, BlockSender.sendChunks reads and
> sends data in 64K-byte units?
>
> Is that true? And if so, wouldn't it explain why reading through the
> datanode is faster, since it reads data in bigger blocks?
>
> Best Regards,
> Raymond Liu
>
>
> > -----Original Message-----
> > From: Liu, Raymond [mailto:raymond....@intel.com]
> > Sent: Saturday, February 16, 2013 2:23 PM
> > To: user@hadoop.apache.org
> > Subject: RE: why my test result on dfs short circuit read is slower?
> >
> > Hi Arpit Gupta
> >
> > Yes, this way also confirms that short circuit read is enabled on my
> > cluster.
> >
> > 13/02/16 14:07:34 DEBUG hdfs.DFSClient: Short circuit read is true
> >
> > 13/02/16 14:07:34 DEBUG hdfs.DFSClient: New BlockReaderLocal for file /mnt/DP_disk4/raymond/hdfs/data/current/subdir63/blk_-2736548898990727638 of size 134217728 startOffset 0 length 134217728 short circuit checksum false
> >
> > So, is there any possibility that some other setting might cause short
> > circuit read to perform worse than reading through the datanode?
> >
> > Raymond
> >
> >
> > >Another way to check if short circuit read is configured correctly.
> >
> > >As the user who is configured for short circuit read, issue the
> > >following command on a node where you expect the data to be read
> > >locally.
> >
> > >export HADOOP_ROOT_LOGGER=debug,console; hadoop dfs -cat /path/to/file_on_hdfs
> >
> > >On the console you should see something like "hdfs.DFSClient: New
> > >BlockReaderLocal for file...."
> >
> > >This would confirm that short circuit read is happening.
> >
> > --
> > >Arpit Gupta
> > >Hortonworks Inc.
> > >http://hortonworks.com/
> >
> > On Feb 15, 2013, at 9:53 PM, "Liu, Raymond" <raymond....@intel.com> wrote:
> >
> >
> > Hi Harsh
> >
> > Yes, I did set both of these, though not in hbase-site.xml but in
> > hdfs-site.xml.
> >
> > And I have double-checked that local reads are performed, since
> > there are no errors in the datanode logs, and by watching lo network IO.
> >
> >
> >
> > If you want HBase to leverage the shortcircuit, the DN config
> > "dfs.block.local-path-access.user" should be set to the user running
> > HBase (i.e. hbase, for example), and the hbase-site.xml should have
> > "dfs.client.read.shortcircuit" defined in all its RegionServers. Doing
> > this wrong could result in a performance penalty and some warn-logging,
> > as local reads will be attempted but will begin to fail.
> >
> > On Sat, Feb 16, 2013 at 8:40 AM, Liu, Raymond <raymond....@intel.com> wrote:
> >
> > Hi
> >
> > I tried to use short circuit read to improve my HBase cluster's
> > MR scan performance.
> >
> >
> > I have the following settings in hdfs-site.xml:
> >
> > dfs.client.read.shortcircuit set to true
> > dfs.block.local-path-access.user set to the MR job runner.
> >
> > The cluster is 1+4 nodes and each data node has 16cpu/4HDD,
> > with all HBase tables major compacted, thus all data is local.
> >
> > I had hoped that short circuit read would improve the
> > performance.
> >
> >
> > But the test result is that with short circuit read enabled,
> > the performance actually dropped 10-15%. For example, scanning a 50G
> > table costs around 100s instead of 90s.
> >
> >
> > My Hadoop version is 1.1.1, any idea on this? Thx!
> >
> > Best Regards,
> > Raymond Liu
> >
> >
> >
> >
> > --
> > Harsh J
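
The chunk-size question raised at the top of this thread (512/4096-byte reads in BlockReaderLocal versus 64K units in BlockSender.sendChunks) can be explored outside Hadoop with a plain local-file read. The following is only a rough sketch, not Hadoop code: the class name ChunkReadDemo and the method readInChunks are made up for illustration, the file is a temporary file of zeros, and because that file will most likely sit in the OS page cache, the timings mostly reflect per-call overhead rather than disk behaviour. It does not prove or disprove the hypothesis about Hadoop 1.1.1; it only shows how the number of read calls scales with the chunk size.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of Hadoop.
public class ChunkReadDemo {

    /**
     * Reads the whole file sequentially with a buffer of chunkSize bytes.
     * Returns {totalBytesRead, readCallsThatReturnedData, elapsedNanos}.
     */
    public static long[] readInChunks(Path file, int chunkSize) throws IOException {
        byte[] buf = new byte[chunkSize];
        long total = 0;
        long calls = 0;
        long start = System.nanoTime();
        try (InputStream in = new FileInputStream(file.toFile())) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;   // bytes returned by this call
                calls++;      // count only calls that returned data
            }
        }
        return new long[] {total, calls, System.nanoTime() - start};
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("chunk-read-demo", ".bin");
        try {
            // 32 MB of zeros; the size is arbitrary and chosen for illustration.
            Files.write(file, new byte[32 * 1024 * 1024]);
            // 512 and 4096 mirror the units mentioned for BlockReaderLocal,
            // 64K the unit mentioned for BlockSender.sendChunks.
            for (int size : new int[] {512, 4096, 64 * 1024, 128 * 1024}) {
                long[] r = readInChunks(file, size);
                System.out.printf("chunk=%7d bytes  reads=%8d  time=%6d ms%n",
                        size, r[1], r[2] / 1_000_000);
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Running it with `java ChunkReadDemo.java` prints one line per chunk size. If the small-chunk runs are clearly slower on your nodes, that would be consistent with the read-unit explanation, though other factors (checksum handling, readahead, disk contention) could matter just as much.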