RE: why my test result on dfs short circuit read is slower?

Liu, Raymond Sun, 17 Feb 2013 17:09:52 -0800

I have tried tuning io.file.buffer.size to 128K instead of 4K, but
short circuit read performance is still worse than reading through the datanode.


I am starting to wonder: does short circuit read really help under Hadoop 1.1.1?
I googled and found a few people mentioning they got a 2x gain or so on CDH etc. But
I really can't figure out what else I can do to make it even catch up with the normal
read path.

> 
> It seems to me that, with short circuit read enabled, BlockReaderLocal
> reads data in units of 512 or 4096 bytes (checksum check enabled or skipped,
> respectively).
> 
> Whereas when it goes through the datanode, BlockSender.sendChunks reads and
> sends data in 64K-byte units?
> 
> Is that true? And if so, wouldn't it explain why reading through the datanode
> is faster, since it reads data in larger units?
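
To put rough numbers on the unit sizes mentioned above, here is a minimal
sketch. It assumes each unit corresponds to exactly one read operation, which
is a simplification and does not model the actual BlockReaderLocal or
BlockSender code paths. The 134217728-byte block size comes from the debug log
quoted in this thread; the function name reads_per_block is hypothetical.

```python
# Rough count of read operations needed to consume one HDFS block,
# assuming one read operation per unit (a simplification).
BLOCK_SIZE = 134217728  # bytes; block size from the debug log in this thread


def reads_per_block(unit_bytes, block_size=BLOCK_SIZE):
    """Number of unit-sized reads needed to cover block_size bytes."""
    return -(-block_size // unit_bytes)  # ceiling division


if __name__ == "__main__":
    for label, unit in [("512 B", 512), ("4 KB", 4096), ("64 KB", 64 * 1024)]:
        print(label, reads_per_block(unit))
```

Under that assumption, the 64K path needs 16 to 128 times fewer reads per
block. Whether this accounts for the measured difference is not established
here.
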
> 
> Best Regards,
> Raymond Liu
> 
> 
> > -----Original Message-----
> > From: Liu, Raymond [mailto:raymond....@intel.com]
> > Sent: Saturday, February 16, 2013 2:23 PM
> > To: user@hadoop.apache.org
> > Subject: RE: why my test result on dfs short circuit read is slower?
> >
> > Hi Arpit Gupta
> >
> > Yes, this also confirms that short circuit read is enabled on my
> > cluster.
> >
> > 13/02/16 14:07:34 DEBUG hdfs.DFSClient: Short circuit read is true
> >
> > 13/02/16 14:07:34 DEBUG hdfs.DFSClient: New BlockReaderLocal for file
> > /mnt/DP_disk4/raymond/hdfs/data/current/subdir63/blk_-2736548898990727638
> > of size 134217728 startOffset 0 length 134217728 short circuit
> > checksum false
> >
> > So, is there any possibility that some other setting might cause short
> > circuit read to perform worse than reading through the datanode?
> >
> > Raymond
> >
> >
> >
> > >Another way to check if short circuit read is configured correctly.
> >
> > >As the user who is configured for short circuit read, issue the
> > >following command on a node where you expect the data to be read locally.
> >
> > >export HADOOP_ROOT_LOGGER=debug,console; hadoop dfs -cat
> > >/path/to/file_on_hdfs
> >
> > >On the console you should see something like "hdfs.DFSClient: New
> > >BlockReaderLocal for file...."
> >
> > >This would confirm that short circuit read is happening.
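
The check described above can also be scripted. This is a minimal sketch,
assuming the console output of the command has already been captured as a list
of lines. The function name uses_short_circuit is hypothetical, and the file
path in the sample is a placeholder rather than a real block path.

```python
# Marker text taken from the debug output quoted in this thread.
MARKER = "hdfs.DFSClient: New BlockReaderLocal"


def uses_short_circuit(log_lines):
    """Return True if any log line shows a BlockReaderLocal being created."""
    return any(MARKER in line for line in log_lines)


if __name__ == "__main__":
    # Hypothetical sample; the file path is a placeholder.
    sample = [
        "13/02/16 14:07:34 DEBUG hdfs.DFSClient: Short circuit read is true",
        "13/02/16 14:07:34 DEBUG hdfs.DFSClient: New BlockReaderLocal for file /path/to/block",
    ]
    print(uses_short_circuit(sample))
```

Note that the "Short circuit read is true" line alone does not count as a
match here; only the BlockReaderLocal line does.
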
> >
> > --
> > >Arpit Gupta
> > >Hortonworks Inc.
> > >http://hortonworks.com/
> >
> > On Feb 15, 2013, at 9:53 PM, "Liu, Raymond" <raymond....@intel.com> wrote:
> >
> >
> > Hi Harsh
> >
> > Yes, I did set both of these, though in hdfs-site.xml rather than
> > hbase-site.xml.
> >
> > And I have double-checked that local reads are performed, since
> > there are no errors in the datanode logs, and by watching network IO on
> > the lo interface.
> >
> >
> >
> > If you want HBase to leverage the shortcircuit, the DN config
> > "dfs.block.local-path-access.user" should be set to the user running HBase
> > (i.e. hbase, for example), and the hbase-site.xml should have
> > "dfs.client.read.shortcircuit" defined in all its RegionServers. Doing
> > this wrong could result in performance penalty and some warn-logging,
> > as local reads will be attempted but will begin to fail.
> >
> > On Sat, Feb 16, 2013 at 8:40 AM, Liu, Raymond <raymond....@intel.com>
> > wrote:
> >
> > Hi
> >
> >        I tried to use short circuit read to improve the MR scan
> > performance of my HBase cluster.
> >
> >
> >        I have the following settings in hdfs-site.xml:
> >
> >        dfs.client.read.shortcircuit set to true
> >        dfs.block.local-path-access.user set to the MR job runner.
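
As a way to double-check that both settings are present, here is a minimal
sketch that parses an hdfs-site.xml style document with Python's standard
library. The two property names are the ones listed above. The user name
"mruser", the SAMPLE text, and the helper names are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the usual Hadoop configuration layout;
# "mruser" is a placeholder for the MR job runner account.
SAMPLE = """<configuration>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.block.local-path-access.user</name>
    <value>mruser</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Return a dict of property name -> value from Hadoop-style config XML."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def short_circuit_configured(props):
    """True if short circuit read is enabled and an access user is set."""
    enabled = props.get("dfs.client.read.shortcircuit", "").lower() == "true"
    return enabled and bool(props.get("dfs.block.local-path-access.user"))


if __name__ == "__main__":
    print(short_circuit_configured(read_properties(SAMPLE)))
```

This only checks that the properties exist in the file; it does not verify
that the running daemons and clients actually loaded them.
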
> >
> >        The cluster has 1+4 nodes, and each data node has 16 CPUs and
> > 4 HDDs. All HBase tables have been major compacted, so all data is local.
> >
> >        I had hoped that short circuit read would improve
> > performance.
> >
> >
> >        However, the test result is that with short circuit read enabled,
> > performance actually dropped by 10-15%. For example, scanning a 50G table
> > takes around 100s instead of 90s.
> >
> >
> >        My Hadoop version is 1.1.1. Any ideas on this? Thx!
> >
> > Best Regards,
> > Raymond Liu
> >
> >
> >
> >
> > --
> > Harsh J
