[ https://issues.apache.org/jira/browse/HBASE-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910343#action_12910343 ]
ryan rawson commented on HBASE-3006:
------------------------------------

If the objects are short-lived, the object allocation size doesn't matter. If they are used only for the lifecycle of reading a single block, I'd say size it to the compressed size.
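To make "size it to the compressed size" concrete, here is a minimal sketch, assuming a hypothetical readCompressedBlock helper (this is not the actual HBase code): the short-lived buffer is allocated at the on-disk (compressed) size and filled with a single positioned read.

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;

// Sketch only: the buffer lives just for the lifecycle of reading one block,
// so it is sized to the on-disk (compressed) size, not the decompressed size.
final class BlockBufferSketch {
  static byte[] readCompressedBlock(FSDataInputStream in, long offset,
      int compressedSize) throws IOException {
    byte[] buf = new byte[compressedSize];        // short-lived, compressed-size allocation
    in.readFully(offset, buf, 0, compressedSize); // one positioned DFS read
    return buf;                                   // handed to the decompressor, then GC'd
  }
}
{code}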
Reading compressed HFile blocks causes way too many DFS RPC calls severely impacting performance
------------------------------------------------------------------------------------------------

                 Key: HBASE-3006
                 URL: https://issues.apache.org/jira/browse/HBASE-3006
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.89.20100621
            Reporter: Kannan Muthukkaruppan
            Assignee: Kannan Muthukkaruppan
            Priority: Critical
         Attachments: HBASE-3006.txt

On some read perf tests, we noticed several perf outliers (in the 10-second-plus range). The rows were large (spanning multiple blocks), but the numbers still didn't add up. We had compression turned on.

We enabled DN clienttrace logging:

{code}
log4j.logger.org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace=DEBUG
{code}

and noticed lots of 516-byte reads at the DN level, several of them at the same offset in the block:

{code}
2010-09-16 09:28:32,335 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.30.251.189:50010, dest: /10.30.251.189:38713, bytes: 516, op: HDFS_READ, cliID: DFSClient_-436329957, offset: 39884800, srvID: DS-1757894045-10.30.251.189-50010-1283993662994, blockid: blk_-4686540439725119008_1985, duration: 203000
2010-09-16 09:28:32,336 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.30.251.189:50010, dest: /10.30.251.189:40547, bytes: 516, op: HDFS_READ, cliID: DFSClient_-436329957, offset: 39884800, srvID: DS-1757894045-10.30.251.189-50010-1283993662994, blockid: blk_-4686540439725119008_1985, duration: 119000
2010-09-16 09:28:32,337 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.30.251.189:50010, dest: /10.30.251.189:40650, bytes: 516, op: HDFS_READ, cliID: DFSClient_-436329957, offset: 39884800, srvID: DS-1757894045-10.30.251.189-50010-1283993662994, blockid: blk_-4686540439725119008_1985, duration: 149000
2010-09-16 09:28:32,337 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.30.251.189:50010, dest: /10.30.251.189:40861, bytes: 516, op: HDFS_READ, cliID: DFSClient_-436329957, offset: 39884800, srvID: DS-1757894045-10.30.251.189-50010-1283993662994, blockid: blk_-4686540439725119008_1985, duration: 135000
2010-09-16 09:28:32,338 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.30.251.189:50010, dest: /10.30.251.189:41129, bytes: 516, op: HDFS_READ, cliID: DFSClient_-436329957, offset: 39884800, srvID: DS-1757894045-10.30.251.189-50010-1283993662994, blockid: blk_-4686540439725119008_1985, duration: 117000
2010-09-16 09:28:32,339 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.30.251.189:50010, dest: /10.30.251.189:41691, bytes: 516, op: HDFS_READ, cliID: DFSClient_-436329957, offset: 39884800, srvID: DS-1757894045-10.30.251.189-50010-1283993662994, blockid: blk_-4686540439725119008_1985, duration: 148000
2010-09-16 09:28:32,339 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.30.251.189:50010, dest: /10.30.251.189:42881, bytes: 516, op: HDFS_READ, cliID: DFSClient_-436329957, offset: 39884800, srvID: DS-1757894045-10.30.251.189-50010-1283993662994, blockid: blk_-4686540439725119008_1985, duration: 114000
2010-09-16 09:28:32,341 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.30.251.189:50010, dest: /10.30.251.189:49511, bytes: 516, op: HDFS_READ, cliID: DFSClient_-436329957, offset: 39884800, srvID: DS-1757894045-10.30.251.189-50010-1283993662994, blockid: blk_-4686540439725119008_1985, duration: 153000
2010-09-16 09:28:32,342 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.30.251.189:50010, dest: /10.30.251.189:51158, bytes: 3096, op: HDFS_READ, cliID: DFSClient_-436329957, offset: 39884800, srvID: DS-1757894045-10.30.251.189-50010-1283993662994, blockid: blk_-4686540439725119008_1985, duration: 139000
{code}

This was strange because our block size was 64k, and the on-disk block size after compression should generally have been around 6k. (The 516-byte reads are presumably one 512-byte checksum chunk plus its 4-byte CRC, i.e., each tiny client read pulls a full checksum chunk from the DN.)

Some print debugging at the HFile and BoundedRangeFileInputStream layers (the latter is wrapped by createDecompressionStream) revealed the following: we are trying to read ~20k from DFS at the HFile layer. The BoundedRangeFileInputStream instead reads several header bytes 1 byte at a time, then reads an 11k chunk and later a 9k chunk.

{code}
2010-09-16 09:21:27,912 INFO org.apache.hadoop.hbase.io.hfile.HFile: ### fs read @ offset = 34386760 compressedSize = 20711 decompressedSize = 92324
2010-09-16 09:21:27,912 INFO org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to reading @ 34386760; bytes: 1
2010-09-16 09:21:27,912 INFO org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to reading @ 34386761; bytes: 1
2010-09-16 09:21:27,912 INFO org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to reading @ 34386762; bytes: 1
2010-09-16 09:21:27,912 INFO org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to reading @ 34386763; bytes: 1
2010-09-16 09:21:27,912 INFO org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to reading @ 34386764; bytes: 1
2010-09-16 09:21:27,912 INFO org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to reading @ 34386765; bytes: 1
2010-09-16 09:21:27,912 INFO org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to reading @ 34386766; bytes: 1
2010-09-16 09:21:27,912 INFO org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to reading @ 34386767; bytes: 1
2010-09-16 09:21:27,912 INFO org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to reading @ 34386768; bytes: 11005
2010-09-16 09:21:27,912 INFO org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to reading @ 34397773; bytes: 1
2010-09-16 09:21:27,912 INFO org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to reading @ 34397774; bytes: 1
2010-09-16 09:21:27,912 INFO org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to reading @ 34397775; bytes: 1
2010-09-16 09:21:27,912 INFO org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to reading @ 34397776; bytes: 1
2010-09-16 09:21:27,912 INFO org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to reading @ 34397777; bytes: 1
2010-09-16 09:21:27,912 INFO org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to reading @ 34397778; bytes: 1
2010-09-16 09:21:27,912 INFO org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to reading @ 34397779; bytes: 1
2010-09-16 09:21:27,913 INFO org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to reading @ 34397780; bytes: 1
2010-09-16 09:21:27,913 INFO org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream: ### seeking to reading @ 34397781; bytes: 9690
{code}
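A minimal sketch of the failure mode (simplified, not the real BoundedRangeFileInputStream source): if the bounded stream's single-byte read() forwards directly to a positioned read on the underlying FSDataInputStream, then every 1-byte header read the decompressor issues becomes its own DFS RPC, matching the trace above.

{code}
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.fs.FSDataInputStream;

// Simplified sketch of a bounded-range stream; illustrative, not the real class.
class BoundedRangeSketch extends InputStream {
  private final FSDataInputStream in;
  private long pos;
  private final long end;

  BoundedRangeSketch(FSDataInputStream in, long start, long length) {
    this.in = in;
    this.pos = start;
    this.end = start + length;
  }

  @Override
  public int read() throws IOException {
    if (pos >= end) {
      return -1;
    }
    byte[] one = new byte[1];
    // Every single-byte read() turns into its own positioned DFS read --
    // the "bytes: 1" lines in the debug output above.
    int n = in.read(pos, one, 0, 1);
    if (n <= 0) {
      return -1;
    }
    pos += n;
    return one[0] & 0xFF;
  }
}
{code}

Wrapping such a stream in a java.io.BufferedInputStream, or prefetching the whole compressed extent up front, would coalesce those single-byte calls.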
Seems like it should be an easy fix to prefetch the compressed size rather than doing these incremental fetches.
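A minimal sketch of that direction (the actual change is in the attached HBASE-3006.txt; readAndDecompress and its parameters here are hypothetical): fetch the whole compressed extent with one positioned read, then let the codec decompress from memory so none of its reads touch DFS.

{code}
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.io.compress.CompressionCodec;

// Sketch only: prefetch the compressed block in one RPC, decompress in memory.
final class PrefetchSketch {
  static byte[] readAndDecompress(FSDataInputStream in, long offset,
      int compressedSize, int decompressedSize, CompressionCodec codec)
      throws IOException {
    byte[] compressed = new byte[compressedSize];
    in.readFully(offset, compressed, 0, compressedSize); // single DFS round trip

    byte[] out = new byte[decompressedSize];
    InputStream is = codec.createInputStream(new ByteArrayInputStream(compressed));
    try {
      int off = 0;
      while (off < decompressedSize) {
        int n = is.read(out, off, decompressedSize - off);
        if (n < 0) {
          throw new IOException("premature end of compressed block");
        }
        off += n;
      }
    } finally {
      is.close();
    }
    return out;
  }
}
{code}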