[ 
https://issues.apache.org/jira/browse/HBASE-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16724637#comment-16724637
 ] 

Zheng Hu commented on HBASE-21401:
----------------------------------

bq.  I looked at the patch and I still see double-parse, no? (Once to check that the
byte array contains a wholesome KV, and then the usual parse that happens as
part of KV usage?) Was thinking we could check wholesomeness inline with use?

Yes, it's a double-parse now: once to check that the KV is whole, then again to
parse the specific fields such as row/family/qualifier/ts/type and so on. I did
not move the wholeness check inline with use, because I found that in the upper
layers cell.getRowOffset() and cell.getRowLength() are called many times. Take
scan processing as an example:
step.1  Load the block from the HFile, and let the cell reference the block;
step.2  Compare the row part with the given startRow or stopRow of the scan,
calling cell.getRowOffset() and cell.getRowLength();
step.3  Merge with other HFiles, which still needs to compare the row part,
calling cell.getRowOffset() and cell.getRowLength();
step.4  Filters ... compare the row/family/qualifier/value;
step.5  Merge with other stores, comparing the row part ...
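To make the double-parse concrete, here is a minimal sketch of a construction-time wholeness check over a byte array, followed by the ordinary accessors that decode the row fields again. It assumes the standard KeyValue serialization (key length, value length, row length, row, family length, family, qualifier, timestamp, type, value, optional tags length and tags). The class and method names (`KeyValueSanityCheck`, `checkKeyValueBytes`, `encode`) are hypothetical; this is not the code in the attached patches.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch, not the HBASE-21401 patch. Assumed KeyValue layout:
 * <4B keyLen><4B valueLen><2B rowLen><row><1B famLen><family><qualifier>
 * <8B timestamp><1B type><value>[<2B tagsLen><tags>]
 */
public final class KeyValueSanityCheck {
  // Fixed-size key parts: rowLen(2) + famLen(1) + timestamp(8) + type(1).
  static final int FIXED_KEY_BYTES = 2 + 1 + 8 + 1;

  static int readInt(byte[] b, int i) {
    return ((b[i] & 0xFF) << 24) | ((b[i + 1] & 0xFF) << 16)
        | ((b[i + 2] & 0xFF) << 8) | (b[i + 3] & 0xFF);
  }

  static short readShort(byte[] b, int i) {
    return (short) (((b[i] & 0xFF) << 8) | (b[i + 1] & 0xFF));
  }

  /** First parse: walk every field once and fail fast if any field overruns the range. */
  public static void checkKeyValueBytes(byte[] buf, int offset, int length) {
    if (offset < 0 || length < 8 || offset > buf.length - length) {
      throw new IllegalArgumentException("range cannot hold the key/value lengths");
    }
    long end = (long) offset + length;
    int keyLen = readInt(buf, offset);
    int valueLen = readInt(buf, offset + 4);
    long keyStart = offset + 8L;
    long keyEnd = keyStart + keyLen;
    long valueEnd = keyEnd + valueLen;
    if (keyLen < FIXED_KEY_BYTES || valueLen < 0 || valueEnd > end) {
      throw new IllegalArgumentException(
          "keyLen=" + keyLen + ", valueLen=" + valueLen + " overrun " + length + " bytes");
    }
    short rowLen = readShort(buf, (int) keyStart);
    if (rowLen < 0 || keyStart + rowLen + FIXED_KEY_BYTES > keyEnd) {
      throw new IllegalArgumentException("rowLen=" + rowLen + " overruns the key");
    }
    byte famLen = buf[(int) (keyStart + 2 + rowLen)];
    if (famLen < 0 || keyStart + rowLen + famLen + FIXED_KEY_BYTES > keyEnd) {
      throw new IllegalArgumentException("famLen=" + famLen + " overruns the key");
    }
    // Anything after the value must be exactly a 2-byte tags length plus the tags.
    long rest = end - valueEnd;
    if (rest != 0) {
      if (rest < 2) {
        throw new IllegalArgumentException("truncated tags length");
      }
      int tagsLen = readShort(buf, (int) valueEnd) & 0xFFFF;
      if (rest != 2 + tagsLen) {
        throw new IllegalArgumentException("tagsLen=" + tagsLen + " != remaining " + (rest - 2));
      }
    }
  }

  /** Second parse: ordinary accessors decode the same fields again on every call. */
  public static int getRowLength(byte[] buf, int offset) {
    return readShort(buf, offset + 8);
  }

  public static int getRowOffset(byte[] buf, int offset) {
    return offset + 8 + 2;
  }

  /** Hypothetical helper that serializes a tag-less KeyValue for the example. */
  public static byte[] encode(byte[] row, byte[] family, byte[] qualifier,
                              long ts, byte type, byte[] value) {
    int keyLen = FIXED_KEY_BYTES + row.length + family.length + qualifier.length;
    return ByteBuffer.allocate(8 + keyLen + value.length)
        .putInt(keyLen).putInt(value.length)
        .putShort((short) row.length).put(row)
        .put((byte) family.length).put(family).put(qualifier)
        .putLong(ts).put(type).put(value)
        .array();
  }
}
```

For a cell encoded from row "r1", family "f", qualifier "q" and value "v", `checkKeyValueBytes(kv, 0, kv.length)` passes, while the same call on a copy truncated to 20 bytes throws an IllegalArgumentException naming the offending lengths, rather than an ArrayIndexOutOfBoundsException surfacing later in some accessor.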

My point is that getRowOffset()/getRowLength() (or
getFamilyOffset()/getFamilyLength() ...) are called many times in the upper
layers. If we moved the row sanity check into getRowOffset() and
getRowLength(), and the family sanity check into getFamilyOffset() and
getFamilyLength() ..., wouldn't the sanity check then re-parse the relevant
fields just as many times? That cost would be even larger than the
double-parse, so I think the double-parse is better in our case.
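The cost argument can be illustrated with a toy model. In the sketch below (hypothetical names, not HBase classes), an "eager" cell checks its row bounds once in its constructor, while a "lazy" cell re-checks them on every getRowOffset()/getRowLength() call. A counter stands in for parsing cost, which is only a rough proxy for real CPU time, and each simulated step reads the row offset and length once, like the scan steps listed above.

```java
/** Hypothetical toy model of check-at-construction vs. check-inline-with-use. */
public final class CheckCostDemo {
  static int checks = 0; // number of bounds checks performed so far

  static void checkRange(int offset, int length, int limit) {
    checks++;
    if (offset < 0 || length < 0 || offset > limit - length) {
      throw new IllegalArgumentException(
          "offset=" + offset + " length=" + length + " limit=" + limit);
    }
  }

  interface RowCell {
    int getRowOffset();
    int getRowLength();
  }

  /** Validates the row bounds once, when the cell is constructed. */
  static final class EagerCell implements RowCell {
    private final int rowOffset, rowLength;

    EagerCell(int rowOffset, int rowLength, int bufLen) {
      checkRange(rowOffset, rowLength, bufLen);
      this.rowOffset = rowOffset;
      this.rowLength = rowLength;
    }

    public int getRowOffset() { return rowOffset; }
    public int getRowLength() { return rowLength; }
  }

  /** Re-validates the row bounds on every accessor call. */
  static final class LazyCell implements RowCell {
    private final int rowOffset, rowLength, bufLen;

    LazyCell(int rowOffset, int rowLength, int bufLen) {
      this.rowOffset = rowOffset;
      this.rowLength = rowLength;
      this.bufLen = bufLen;
    }

    public int getRowOffset() { checkRange(rowOffset, rowLength, bufLen); return rowOffset; }
    public int getRowLength() { checkRange(rowOffset, rowLength, bufLen); return rowLength; }
  }

  /** Each simulated scan step reads the row offset and row length once. */
  static void simulateScan(RowCell cell, int steps) {
    for (int i = 0; i < steps; i++) {
      cell.getRowOffset();
      cell.getRowLength();
    }
  }

  /** Returns how many bounds checks one cell costs over the given number of steps. */
  public static int countChecks(boolean eager, int steps) {
    checks = 0;
    RowCell cell = eager ? new EagerCell(10, 2, 25) : new LazyCell(10, 2, 25);
    simulateScan(cell, steps);
    return checks;
  }
}
```

With five steps, `countChecks(true, 5)` returns 1 and `countChecks(false, 5)` returns 10, and the gap grows with every extra comparison or filter. Caching a "validated" flag in the lazy cell would bring it back to one check per cell, which is effectively check-at-construction again.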

Please correct me if I misunderstood or missed something.

> Sanity check when constructing the KeyValue
> -------------------------------------------
>
>                 Key: HBASE-21401
>                 URL: https://issues.apache.org/jira/browse/HBASE-21401
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver
>            Reporter: Zheng Hu
>            Assignee: Zheng Hu
>            Priority: Critical
>             Fix For: 3.0.0, 2.2.0, 2.1.3, 2.0.5
>
>         Attachments: HBASE-21401.v1.patch, HBASE-21401.v2.patch, 
> HBASE-21401.v3.patch, HBASE-21401.v4.patch, HBASE-21401.v4.patch, 
> HBASE-21401.v5.patch, HBASE-21401.v6.patch, HBASE-21401.v7.patch
>
>
> In KeyValueDecoder & ByteBuffKeyValueDecoder, we pass a byte buffer to
> initialize the Cell without a sanity check (i.e. checking whether each field's
> offset & length exceeds the byte buffer), so an ArrayIndexOutOfBoundsException
> may happen when reading the cell's fields, as in HBASE-21379, and this kind of
> bug is hard to debug.
> An earlier check will help find such bugs.
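The failure mode in the description can be reproduced in miniature. The sketch below uses hypothetical names (`FailFastDemo`, `checkField`, `readFieldChecked`) that are not part of HBase, and contrasts reading a field whose recorded offset/length overruns the buffer with and without an up-front bounds check. The check is written as `offset > limit - length` rather than `offset + length > limit`, so a corrupt, very large offset cannot overflow `int` and slip through.

```java
/** Hypothetical sketch: fail fast with field context vs. a bare index error later. */
public final class FailFastDemo {
  /** Overflow-safe check that [offset, offset + length) lies inside a buffer of size limit. */
  static void checkField(String field, int offset, int length, int limit) {
    if (offset < 0 || length < 0 || offset > limit - length) {
      throw new IllegalArgumentException("Invalid " + field + ": offset=" + offset
          + ", length=" + length + ", buffer length=" + limit);
    }
  }

  /** Naive field read: a bad length surfaces only as an index, with no field name. */
  static byte[] readFieldUnchecked(byte[] buf, int offset, int length) {
    byte[] out = new byte[length];
    for (int i = 0; i < length; i++) {
      out[i] = buf[offset + i]; // throws ArrayIndexOutOfBoundsException past the end
    }
    return out;
  }

  /** Same read, but validated first so the error names the corrupt field. */
  static byte[] readFieldChecked(String field, byte[] buf, int offset, int length) {
    checkField(field, offset, length, buf.length);
    return readFieldUnchecked(buf, offset, length);
  }
}
```

On a 10-byte buffer, `readFieldUnchecked(buf, 8, 4)` fails with an ArrayIndexOutOfBoundsException that only reports index 10, whereas `readFieldChecked("row", buf, 8, 4)` fails immediately with a message naming the row field and its recorded offset and length.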



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
