HFile V2 does not honor setCacheBlocks when scanning.
-----------------------------------------------------
Key: HBASE-4496
URL: https://issues.apache.org/jira/browse/HBASE-4496
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Lars Hofhansl
Fix For: 0.92.0, 0.94.0

While testing the LRU cache during scanning, I noticed considerable churn in the cache even when Scan.cacheBlocks is set to false. After debugging this, I found that HFile V2 always caches blocks in the LRU cache, regardless of the cacheBlocks setting.

Here's a trace (from Eclipse) showing the problem:

    HFileReaderV2.readBlock(long, int, boolean, boolean, boolean) line: 279
    HFileReaderV2.readBlockData(long, long, int, boolean) line: 219
    HFileBlockIndex$BlockIndexReader.seekToDataBlock(byte[], int, int, HFileBlock) line: 191
    HFileReaderV2$ScannerV2.seekTo(byte[], int, int, boolean) line: 502
    HFileReaderV2$ScannerV2.reseekTo(byte[], int, int) line: 539
    StoreFileScanner.reseekAtOrAfter(HFileScanner, KeyValue) line: 151
    StoreFileScanner.reseek(KeyValue) line: 110
    KeyValueHeap.reseek(KeyValue) line: 255
    StoreScanner.reseek(KeyValue) line: 409
    StoreScanner.next(List<KeyValue>, int) line: 304
    KeyValueHeap.next(List<KeyValue>, int) line: 114
    KeyValueHeap.next(List<KeyValue>) line: 143
    HRegion$RegionScannerImpl.nextRow(byte[]) line: 2774
    HRegion$RegionScannerImpl.nextInternal(int) line: 2722
    HRegion$RegionScannerImpl.next(List<KeyValue>, int) line: 2682
    HRegion$RegionScannerImpl.next(List<KeyValue>) line: 2699
    HRegionServer.next(long, int) line: 2092

Every scanner.next causes a reseek, which eventually results in a call to HFileBlockIndex$BlockIndexReader.seekToDataBlock(...), at which point the cacheBlocks information is lost: HFileReaderV2.readBlockData calls HFileReaderV2.readBlock with cacheBlocks set unconditionally to true.
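
To make the lost flag concrete, here is a minimal toy model of the call chain in the trace. It is not HBase code: the class LostCacheFlagDemo, the HashMap standing in for the LRU cache, and the simplified method signatures are all assumptions made for illustration. Only the method names mirror the ones in the trace.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the call chain in the trace; hypothetical, not HBase code. */
public class LostCacheFlagDemo {
    /** Stand-in for the LRU block cache: block offset -> block bytes. */
    static final Map<Long, byte[]> blockCache = new HashMap<>();

    /** Models readBlock: caches the block only if cacheBlock is true. */
    static byte[] readBlock(long offset, boolean cacheBlock) {
        byte[] cached = blockCache.get(offset);
        if (cached != null) {
            return cached;
        }
        byte[] block = new byte[16]; // pretend this came from disk
        if (cacheBlock) {
            blockCache.put(offset, block);
        }
        return block;
    }

    /** Models readBlockData: it has no cacheBlocks parameter, so it passes true. */
    static byte[] readBlockData(long offset) {
        return readBlock(offset, true);
    }

    /** Models seekToDataBlock: it has no way to learn the scan's preference. */
    static byte[] seekToDataBlock(long offset) {
        return readBlockData(offset);
    }

    /** Scans numBlocks blocks and returns how many ended up in the cache. */
    static int scan(int numBlocks, boolean cacheBlocks) {
        blockCache.clear();
        for (long offset = 0; offset < numBlocks; offset++) {
            seekToDataBlock(offset); // cacheBlocks is dropped at this call
        }
        return blockCache.size();
    }

    public static void main(String[] args) {
        // Prints 5: every block is cached although the scan asked for none.
        System.out.println("cached with cacheBlocks=false: " + scan(5, false));
    }
}
```

In this model the scan's cacheBlocks argument is never read, which is the behavior described above: the cache fills no matter what the scan requested.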
The fix is not immediately clear. One option is to pass cacheBlocks to HFileBlockIndex$BlockIndexReader.seekToDataBlock and then on to HFileBlock.BasicReader.readBlockData and all its implementers, but that is ugly, since readBlockData should not care about caching.

Avoiding caching during scans is somewhat important for us.
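
For reference, the option described above (threading the flag through every call) can be sketched in a toy model. This is a hypothetical illustration, not the fix that was committed for this issue: the class ThreadedCacheFlagDemo, the HashMap standing in for the LRU cache, and the simplified signatures are assumptions. Only the method names echo the ones mentioned in the report.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of passing cacheBlocks down the chain; hypothetical, not HBase code. */
public class ThreadedCacheFlagDemo {
    /** Stand-in for the LRU block cache: block offset -> block bytes. */
    static final Map<Long, byte[]> blockCache = new HashMap<>();

    /** Models readBlock: caches the block only if cacheBlock is true. */
    static byte[] readBlock(long offset, boolean cacheBlock) {
        byte[] cached = blockCache.get(offset);
        if (cached != null) {
            return cached;
        }
        byte[] block = new byte[16]; // pretend this came from disk
        if (cacheBlock) {
            blockCache.put(offset, block);
        }
        return block;
    }

    /** readBlockData now carries the flag, although it does not use it itself. */
    static byte[] readBlockData(long offset, boolean cacheBlock) {
        return readBlock(offset, cacheBlock);
    }

    /** seekToDataBlock also carries the flag purely to forward it. */
    static byte[] seekToDataBlock(long offset, boolean cacheBlock) {
        return readBlockData(offset, cacheBlock);
    }

    /** Scans numBlocks blocks and returns how many ended up in the cache. */
    static int scan(int numBlocks, boolean cacheBlocks) {
        blockCache.clear();
        for (long offset = 0; offset < numBlocks; offset++) {
            seekToDataBlock(offset, cacheBlocks);
        }
        return blockCache.size();
    }

    public static void main(String[] args) {
        System.out.println("cached with cacheBlocks=false: " + scan(4, false)); // prints 0
        System.out.println("cached with cacheBlocks=true: " + scan(4, true));   // prints 4
    }
}
```

The sketch also shows the cost noted in the report: two intermediate methods gain a parameter they only forward, and every implementer of the real interface would need the same change.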