[ 
https://issues.apache.org/jira/browse/HBASE-15554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15411754#comment-15411754
 ] 

Anoop Sam John commented on HBASE-15554:
----------------------------------------

Yes, right now the logic in the Hash impls gets bytes at different offsets and 
does the calculation. Maybe that can be refactored so that we can work on bytes 
coming in an iterative manner. Anyway, that is OK. If that is not possible, it 
can work with the HashKey giving a byte at a given offset. The key difference 
in what I was suggesting is that, instead of having duplicated methods in Hash, 
we have one which works on a HashKey (I just call it that way), and we have 
different impls of the HashKey depending on the type of bloom and so the bytes 
it uses.
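
A rough sketch of that shape, only to illustrate the idea. Every name here 
(HashKeySketch, HashKey, ArrayHashKey, hash) is hypothetical and not taken from 
the patch, and the FNV-1a loop is just a stand-in for the real Hash impls:

```java
/** Hypothetical sketch; none of these names come from the HBase code base. */
final class HashKeySketch {

  /** Hides where the bytes to be hashed come from. */
  interface HashKey {
    /** Byte at a logical offset, 0 <= offset < length(). */
    byte get(int offset);

    /** Total number of bytes that make up the key. */
    int length();
  }

  /** ROW bloom case: the key is simply a slice of a backing array. */
  static final class ArrayHashKey implements HashKey {
    private final byte[] buf;
    private final int start;
    private final int len;

    ArrayHashKey(byte[] buf, int start, int len) {
      if (start < 0 || len < 0 || start + len > buf.length) {
        throw new IllegalArgumentException("slice out of range");
      }
      this.buf = buf;
      this.start = start;
      this.len = len;
    }

    @Override
    public byte get(int offset) {
      if (offset < 0 || offset >= len) {
        throw new IndexOutOfBoundsException("offset " + offset);
      }
      return buf[start + offset];
    }

    @Override
    public int length() {
      return len;
    }
  }

  /**
   * The single hash method that works on any HashKey.
   * 32-bit FNV-1a is used only as a placeholder algorithm.
   */
  static int hash(HashKey key, int seed) {
    int h = 0x811c9dc5 ^ seed;      // FNV offset basis, mixed with the seed
    for (int i = 0; i < key.length(); i++) {
      h ^= key.get(i) & 0xff;       // fold in the next byte
      h *= 0x01000193;              // FNV prime
    }
    return h;
  }
}
```

The hash method never sees a byte[] directly, so a second HashKey impl for 
another bloom type can reuse it without a duplicated method.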

{code}
 public byte extractByte(int offset) {
    // Always assume that this cell has keyvalue serialized key structure.
    // TODO : Add Key interface to always assert the above key structure
{code}
Now, with a proper Cell-based abstraction in place, I believe we can remove 
this assumption, and then there may be no need for a Key interface.
The ROW bloom impl of HashKey returns the rk (row key) bytes.
The ROW_COL impl needs to return rkLen (2 bytes), rk bytes, 0 (a single byte to 
say the cf len is 0), qual bytes, a fixed timestamp (8 bytes), and one type 
byte (fixed).
So we will need some logic in the impl method to map the incoming offset to the 
correct area: offsets 0 and 1 return the rk len, offsets 2 up to (but not 
including) <rkLen>+2 return the rk bytes, and so on. It needs some logic, but I 
don't think that is going to be a very heavy op. Maybe the iterator way would 
have had less overhead in that respect. But I agree you will need some 
refactoring in the Hash impls. That should be possible, I believe. I did not 
check in detail.
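
The offset mapping for the ROW_COL case could look roughly like the sketch 
below. The class name and constructor are hypothetical, the fixed timestamp and 
type values are left to the caller, and big-endian encoding of the length and 
timestamp is an assumption based on the KeyValue key layout:

```java
/** Hypothetical sketch of a ROW_COL key view; not code from the patch. */
final class RowColHashKeySketch {
  private final byte[] row;
  private final byte[] qualifier;
  private final long timestamp; // fixed value chosen by the caller
  private final byte type;      // fixed value chosen by the caller

  RowColHashKeySketch(byte[] row, byte[] qualifier, long timestamp, byte type) {
    if (row.length > Short.MAX_VALUE) {
      throw new IllegalArgumentException("row too long for a 2-byte length");
    }
    this.row = row;
    this.qualifier = qualifier;
    this.timestamp = timestamp;
    this.type = type;
  }

  /** rkLen (2) + rk + cf len (1) + qual + timestamp (8) + type (1). */
  int length() {
    return 2 + row.length + 1 + qualifier.length + 8 + 1;
  }

  /** Maps a logical offset to the area of the virtual key it falls in. */
  byte get(int offset) {
    if (offset < 0 || offset >= length()) {
      throw new IndexOutOfBoundsException("offset " + offset);
    }
    // Offsets 0 and 1: row length as a big-endian short.
    if (offset == 0) {
      return (byte) (row.length >> 8);
    }
    if (offset == 1) {
      return (byte) row.length;
    }
    int pos = offset - 2;
    if (pos < row.length) {
      return row[pos];              // row key bytes
    }
    pos -= row.length;
    if (pos == 0) {
      return 0;                     // column family length, always 0 here
    }
    pos -= 1;
    if (pos < qualifier.length) {
      return qualifier[pos];        // qualifier bytes
    }
    pos -= qualifier.length;
    if (pos < 8) {
      // Timestamp as a big-endian long, most significant byte first.
      return (byte) (timestamp >>> (8 * (7 - pos)));
    }
    return type;                    // last byte: the type
  }
}
```

Each call is a handful of comparisons and subtractions with no allocation. An 
iterator over the same areas would avoid redoing the range checks for every 
byte.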

> StoreFile$Writer.appendGeneralBloomFilter generates extra KV
> ------------------------------------------------------------
>
>                 Key: HBASE-15554
>                 URL: https://issues.apache.org/jira/browse/HBASE-15554
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Performance
>            Reporter: Vladimir Rodionov
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 2.0.0
>
>         Attachments: HBASE-15554.patch, HBASE-15554_3.patch, 
> HBASE-15554_4.patch, HBASE-15554_6.patch, HBASE-15554_7.patch, 
> HBASE-15554_9.patch
>
>
> Accounts for 10% memory allocation in compaction thread when BloomFilterType 
> is ROWCOL.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
