[ https://issues.apache.org/jira/browse/HBASE-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910343#action_12910343 ]

ryan rawson commented on HBASE-3006:
------------------------------------

If the objects are short-lived, the object allocation size doesn't matter.  
If they are used only for the lifecycle of reading a single block, I'd say size 
them to the compressed size.
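
For illustration, a rough sketch of that approach (the method and variable 
names here are made up, not actual HFile code): the buffer is allocated per 
block read at exactly the on-disk size, filled in one contiguous read, and 
handed to the decompressor as an in-memory stream.

{code}
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Rough sketch only; readBlockIntoBuffer and fsIn are made-up names,
// not actual HFile code.
static InputStream readBlockIntoBuffer(DataInputStream fsIn, int compressedSize)
    throws IOException {
  // The buffer lives only while this one block is decoded, so allocating
  // it per call at exactly the compressed (on-disk) size is cheap.
  byte[] buf = new byte[compressedSize];
  fsIn.readFully(buf, 0, compressedSize);   // one contiguous read from the filesystem
  return new ByteArrayInputStream(buf);     // decompressor reads from memory, not DFS
}
{code}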



> Reading compressed HFile blocks causes way too many DFS RPC calls severely 
> impacting performance
> -----------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3006
>                 URL: https://issues.apache.org/jira/browse/HBASE-3006
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.89.20100621
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Kannan Muthukkaruppan
>            Priority: Critical
>         Attachments: HBASE-3006.txt
>
>
> In some read perf tests, we noticed several perf outliers (in the 10+ second 
> range). The rows were large (spanning multiple blocks), but the numbers still 
> didn't add up. We had compression turned on.
> We enabled DN clienttrace logging:
> log4j.logger.org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace=DEBUG
> and noticed lots of 516-byte reads at the DN level, several of them at the 
> same offset in the block.
> {code}
> 2010-09-16 09:28:32,335 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 
> /10.30.251.189:50010, dest: /10.30.251.189:38713, bytes: 516, op: HDFS_READ, 
> cliID: DFSClient_-436329957, offset: 39884800, srvID: DS-1757894045-10.3\
> 0.251.189-50010-1283993662994, blockid: blk_-4686540439725119008_1985, 
> duration: 203000
> 2010-09-16 09:28:32,336 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 
> /10.30.251.189:50010, dest: /10.30.251.189:40547, bytes: 516, op: HDFS_READ, 
> cliID: DFSClient_-436329957, offset: 39884800, srvID: DS-1757894045-10.3\
> 0.251.189-50010-1283993662994, blockid: blk_-4686540439725119008_1985, 
> duration: 119000
> 2010-09-16 09:28:32,337 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 
> /10.30.251.189:50010, dest: /10.30.251.189:40650, bytes: 516, op: HDFS_READ, 
> cliID: DFSClient_-436329957, offset: 39884800, srvID: DS-1757894045-10.3\
> 0.251.189-50010-1283993662994, blockid: blk_-4686540439725119008_1985, 
> duration: 149000
> 2010-09-16 09:28:32,337 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 
> /10.30.251.189:50010, dest: /10.30.251.189:40861, bytes: 516, op: HDFS_READ, 
> cliID: DFSClient_-436329957, offset: 39884800, srvID: DS-1757894045-10.3\
> 0.251.189-50010-1283993662994, blockid: blk_-4686540439725119008_1985, 
> duration: 135000
> 2010-09-16 09:28:32,338 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 
> /10.30.251.189:50010, dest: /10.30.251.189:41129, bytes: 516, op: HDFS_READ, 
> cliID: DFSClient_-436329957, offset: 39884800, srvID: DS-1757894045-10.3\
> 0.251.189-50010-1283993662994, blockid: blk_-4686540439725119008_1985, 
> duration: 117000
> 2010-09-16 09:28:32,339 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 
> /10.30.251.189:50010, dest: /10.30.251.189:41691, bytes: 516, op: HDFS_READ, 
> cliID: DFSClient_-436329957, offset: 39884800, srvID: DS-1757894045-10.3\
> 0.251.189-50010-1283993662994, blockid: blk_-4686540439725119008_1985, 
> duration: 148000
> 2010-09-16 09:28:32,339 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 
> /10.30.251.189:50010, dest: /10.30.251.189:42881, bytes: 516, op: HDFS_READ, 
> cliID: DFSClient_-436329957, offset: 39884800, srvID: DS-1757894045-10.3\
> 0.251.189-50010-1283993662994, blockid: blk_-4686540439725119008_1985, 
> duration: 114000
> 2010-09-16 09:28:32,341 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 
> /10.30.251.189:50010, dest: /10.30.251.189:49511, bytes: 516, op: HDFS_READ, 
> cliID: DFSClient_-436329957, offset: 39884800, srvID: DS-1757894045-10.3\
> 0.251.189-50010-1283993662994, blockid: blk_-4686540439725119008_1985, 
> duration: 153000
> 2010-09-16 09:28:32,342 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 
> /10.30.251.189:50010, dest: /10.30.251.189:51158, bytes: 3096, op: HDFS_READ, 
> cliID: DFSClient_-436329957, offset: 39884800, srvID: DS-1757894045-10.\
> 30.251.189-50010-1283993662994, blockid: blk_-4686540439725119008_1985, 
> duration: 139000
> {code}
> This was strange because our block size was 64k, and the on-disk block size 
> after compression should generally have been around 6k.
> Some print debugging at the HFile and BoundedRangeFileInputStream layers (the 
> latter is wrapped by createDecompressionStream) revealed the following:
> We are trying to read 20k from DFS at the HFile layer. The 
> BoundedRangeFileInputStream instead reads several header bytes 1 byte at a 
> time, then reads an 11k chunk and later a 9k chunk.
> {code}
> 2010-09-16 09:21:27,912 INFO org.apache.hadoop.hbase.io.hfile.HFile: ### fs 
> read @ offset = 34386760 compressedSize = 20711 decompressedSize = 92324
> 2010-09-16 09:21:27,912 INFO 
> org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to 
> reading @ 34386760; bytes: 1
> 2010-09-16 09:21:27,912 INFO 
> org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to 
> reading @ 34386761; bytes: 1
> 2010-09-16 09:21:27,912 INFO 
> org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to 
> reading @ 34386762; bytes: 1
> 2010-09-16 09:21:27,912 INFO 
> org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to 
> reading @ 34386763; bytes: 1
> 2010-09-16 09:21:27,912 INFO 
> org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to 
> reading @ 34386764; bytes: 1
> 2010-09-16 09:21:27,912 INFO 
> org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to 
> reading @ 34386765; bytes: 1
> 2010-09-16 09:21:27,912 INFO 
> org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to 
> reading @ 34386766; bytes: 1
> 2010-09-16 09:21:27,912 INFO 
> org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to 
> reading @ 34386767; bytes: 1
> 2010-09-16 09:21:27,912 INFO 
> org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to 
> reading @ 34386768; bytes: 11005
> 2010-09-16 09:21:27,912 INFO 
> org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to 
> reading @ 34397773; bytes: 1
> 2010-09-16 09:21:27,912 INFO 
> org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to 
> reading @ 34397774; bytes: 1
> 2010-09-16 09:21:27,912 INFO 
> org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to 
> reading @ 34397775; bytes: 1
> 2010-09-16 09:21:27,912 INFO 
> org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to 
> reading @ 34397776; bytes: 1
> 2010-09-16 09:21:27,912 INFO 
> org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to 
> reading @ 34397777; bytes: 1
> 2010-09-16 09:21:27,912 INFO 
> org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to 
> reading @ 34397778; bytes: 1
> 2010-09-16 09:21:27,912 INFO 
> org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to 
> reading @ 34397779; bytes: 1
> 2010-09-16 09:21:27,913 INFO 
> org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to 
> reading @ 34397780; bytes: 1
> 2010-09-16 09:21:27,913 INFO 
> org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to 
> reading @ 34397781; bytes: 9690
> {code}
> Seems like it should be an easy fix to prefetch the compressed size in a 
> single read rather than doing incremental fetches.
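
For reference, a minimal sketch of one way to realize that prefetch (not the 
committed patch; GZIP stands in here for whatever codec the column family is 
configured with): buffer the bounded range stream at the compressed block size 
so the decompressor's tiny header reads never turn into DFS RPCs.

{code}
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Sketch of the buffering variant of the idea (not the committed patch):
// size the buffer to the on-disk block so the decompressor's 1-byte header
// reads are served from memory instead of becoming DataNode reads.
// GZIP stands in for the table's configured compression codec.
static InputStream decompressBlock(InputStream boundedRangeStream, int compressedSize)
    throws IOException {
  InputStream buffered = new BufferedInputStream(boundedRangeStream, compressedSize);
  return new GZIPInputStream(buffered);
}
{code}

Either way, the decompressor's per-byte reads are served from memory, and each 
block costs roughly one DFS read of the compressed size instead of the many 
516-byte reads seen in the clienttrace above.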

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
