Arjen Roodselaar created HBASE-10270:
----------------------------------------

             Summary: Remove DataBlockEncoding from BlockCacheKey
                 Key: HBASE-10270
                 URL: https://issues.apache.org/jira/browse/HBASE-10270
             Project: HBase
          Issue Type: Improvement
          Components: regionserver
    Affects Versions: 0.89-fb
            Reporter: Arjen Roodselaar
            Assignee: Arjen Roodselaar
            Priority: Minor
             Fix For: 0.89-fb


When a block is added to the BlockCache, its DataBlockEncoding is stored on the
BlockCacheKey. The encoding is included in the hashCode calculation and
therefore affects cache lookups. Because the keys differ for encoded and
unencoded (data) blocks, the same block can be cached twice, or a lookup can
miss a block that is already cached. This happens, for example, with Scan
preloading, because AbstractScannerV2.readNextDataBlock() does a read without
knowing the block type or the encoding.
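
The effect of putting the encoding on the key can be sketched in isolation.
The classes below are simplified, hypothetical stand-ins (not the actual
BlockCacheKey or DataBlockEncoding), and the cache is a plain HashMap; the
sketch only illustrates how key identity drives double caching and misses.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Illustrative stand-ins only; these are not the HBase classes.
final class CacheKeySketch {
    enum Encoding { NONE, PREFIX }

    // Key whose identity includes the encoding, as described above.
    static final class KeyWithEncoding {
        final String hfileName;
        final long offset;
        final Encoding encoding;

        KeyWithEncoding(String hfileName, long offset, Encoding encoding) {
            this.hfileName = hfileName;
            this.offset = offset;
            this.encoding = encoding;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof KeyWithEncoding)) return false;
            KeyWithEncoding k = (KeyWithEncoding) o;
            return hfileName.equals(k.hfileName) && offset == k.offset
                && encoding == k.encoding;
        }

        @Override
        public int hashCode() {
            return Objects.hash(hfileName, offset, encoding);
        }
    }

    // Key identified by file name and offset only.
    static final class KeyWithoutEncoding {
        final String hfileName;
        final long offset;

        KeyWithoutEncoding(String hfileName, long offset) {
            this.hfileName = hfileName;
            this.offset = offset;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof KeyWithoutEncoding)) return false;
            KeyWithoutEncoding k = (KeyWithoutEncoding) o;
            return hfileName.equals(k.hfileName) && offset == k.offset;
        }

        @Override
        public int hashCode() {
            return Objects.hash(hfileName, offset);
        }
    }

    // Cache the same file offset under two encodings; return the entry count.
    static int entriesWithEncodingInKey() {
        Map<KeyWithEncoding, String> cache = new HashMap<>();
        cache.put(new KeyWithEncoding("hfile-a", 0L, Encoding.NONE), "block@0");
        cache.put(new KeyWithEncoding("hfile-a", 0L, Encoding.PREFIX), "block@0");
        return cache.size();
    }

    // Same two insertions, but the key ignores the encoding.
    static int entriesWithoutEncodingInKey() {
        Map<KeyWithoutEncoding, String> cache = new HashMap<>();
        cache.put(new KeyWithoutEncoding("hfile-a", 0L), "block@0");
        cache.put(new KeyWithoutEncoding("hfile-a", 0L), "block@0");
        return cache.size();
    }

    // A block cached under one encoding is invisible to a lookup with another.
    static boolean lookupWithOtherEncodingHits() {
        Map<KeyWithEncoding, String> cache = new HashMap<>();
        cache.put(new KeyWithEncoding("hfile-a", 0L, Encoding.NONE), "block@0");
        return cache.containsKey(
            new KeyWithEncoding("hfile-a", 0L, Encoding.PREFIX));
    }
}
```

With the encoding on the key, the two insertions occupy two entries and the
cross-encoding lookup misses; with the encoding removed, one entry serves both.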

This patch removes the block encoding from the key and requires the caller of
HFileReaderV2.readBlock() to specify the expected BlockType and the expected
DataBlockEncoding when these matter. The type and encoding are then decided at
read time instead of cache time, which puts the responsibility with the caller,
fixes some cache misses when using scan preloading (which does a read without
knowing the type or encoding), allows the BlockCacheKey to be re-used by the L2
BucketCache, and sets us up for a future CompoundScannerV2 that can read both
unencoded and encoded data blocks.

A gotcha here: ScannerV2 and EncodedScannerV2 expect BlockType.DATA and
BlockType.ENCODED_DATA respectively, and will throw when given a block of the
wrong type. Having the DataBlockEncoding on the cache key previously caused a
cache miss if the block was cached with a different encoding, which implicitly
determined the BlockType and kept a scanner from receiving the wrong type. It
is now the scanner's responsibility to specify both the expected type and
encoding (which is more appropriate).
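
A minimal sketch of such a read-time check follows. Everything in it is
hypothetical: the types are simplified stand-ins, the signature is not that of
HFileReaderV2.readBlock(), treating null as "don't care" is an assumption, and
throwing on a mismatch is only one possible policy (the patch may handle a
mismatch differently).

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-ins only; these are not the HBase classes.
final class ReadTimeCheckSketch {
    enum BlockType { DATA, ENCODED_DATA }
    enum Encoding { NONE, PREFIX }

    static final class Block {
        final BlockType type;
        final Encoding encoding;

        Block(BlockType type, Encoding encoding) {
            this.type = type;
            this.encoding = encoding;
        }
    }

    // Hypothetical read path: the cache key carries no encoding, and the
    // caller states what it expects. A null expectation means "don't care"
    // (an assumption of this sketch), as a preloading read might pass.
    static Block readBlock(Map<String, Block> cache, String key,
                           BlockType expectedType, Encoding expectedEncoding) {
        Block block = cache.get(key);
        if (block == null) {
            return null; // a real reader would load the block from disk here
        }
        if (expectedType != null && block.type != expectedType) {
            throw new IllegalStateException(
                "Expected block type " + expectedType + " but got " + block.type);
        }
        if (expectedEncoding != null && block.encoding != expectedEncoding) {
            throw new IllegalStateException(
                "Expected encoding " + expectedEncoding
                    + " but got " + block.encoding);
        }
        return block;
    }

    // One encoded block cached under a key made of file name and offset.
    static Map<String, Block> sampleCache() {
        Map<String, Block> cache = new HashMap<>();
        cache.put("hfile-a@0", new Block(BlockType.ENCODED_DATA, Encoding.PREFIX));
        return cache;
    }
}
```

In this sketch a read that passes null for both expectations always hits the
cached block, while a caller that asks for BlockType.DATA is told explicitly
that the cached block is of the wrong type instead of silently missing.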



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)