[ https://issues.apache.org/jira/browse/HBASE-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16724637#comment-16724637 ]
Zheng Hu commented on HBASE-21401:
----------------------------------

bq. I looked at the patch and I still see double-parse, no? (Once to check byte array contains a wholesome KV and then the usual parse that happens as part of KV usage?). Was thinking we could check wholesomeness inline with use?

Yes, it is a double-parse now: once to check that the KV is wholesome, and then again to parse the specific fields such as row/family/qualifier/ts/type and so on.

I did not move the wholesomeness check inline with use, because I found that in the upper layers cell.getRowOffset() and cell.getRowLength() are called many times. Take scan processing as an example:

step.1 Load the block from the HFile, and let the cell reference the block;
step.2 Compare the row part with the given startRow or stopRow of the scan, calling cell.getRowOffset() and cell.getRowLength();
step.3 Merge with other HFiles, which again compares the row part, calling cell.getRowOffset() and cell.getRowLength();
step.4 Filters ... compare the row/family/qualifier/value;
step.5 Merge with other stores, compare the row part ...

I mean that getRowOffset() and getRowLength() (or getFamilyOffset()/getFamilyLength() ...) are used in the upper layers many times. If we move the row sanity check into getRowOffset() and getRowLength(), and the family sanity check into getFamilyOffset() and getFamilyLength() ..., won't the sanity check parse the relevant fields just as many times? That cost would be even larger than the double-parse, so I think the double-parse is better in our case.

Please correct me if I misunderstood or missed something.
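To illustrate the trade-off argued above, here is a minimal sketch of the "validate once at construction, then read without checks" approach. It is not the patch itself: the class name `UpFrontCheckedCell`, the `sanityChecks` counter, the `encode` helper, and the simplified two-field layout are all hypothetical and only loosely modeled on a KeyValue.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical, simplified cell layout (NOT HBase's real KeyValue format):
//   [2-byte row length][row bytes][1-byte family length][family bytes]
final class UpFrontCheckedCell {
    // Counts how often the bounds validation ran; for illustration only.
    static int sanityChecks = 0;

    private final byte[] buf;
    private final int offset;
    private final int length;

    UpFrontCheckedCell(byte[] buf, int offset, int length) {
        this.buf = buf;
        this.offset = offset;
        this.length = length;
        checkWholesome(); // the single up-front parse
    }

    // Walks every field once and verifies that it stays inside the cell.
    private void checkWholesome() {
        sanityChecks++;
        if (offset < 0 || length < 0 || length > buf.length - offset) {
            throw new IllegalArgumentException("cell exceeds backing array");
        }
        int end = offset + length;
        int pos = offset;
        if (end - pos < 2) {
            throw new IllegalArgumentException("truncated row length");
        }
        int rowLen = ((buf[pos] & 0xff) << 8) | (buf[pos + 1] & 0xff);
        pos += 2;
        if (rowLen > end - pos) {
            throw new IllegalArgumentException("row overruns cell");
        }
        pos += rowLen;
        if (end - pos < 1) {
            throw new IllegalArgumentException("truncated family length");
        }
        int famLen = buf[pos] & 0xff;
        pos += 1;
        if (famLen != end - pos) {
            throw new IllegalArgumentException("family length does not match remaining bytes");
        }
    }

    // Accessors do no bounds checks; they rely on the constructor's validation.
    int getRowOffset() { return offset + 2; }
    int getRowLength() { return ((buf[offset] & 0xff) << 8) | (buf[offset + 1] & 0xff); }
    int getFamilyOffset() { return getRowOffset() + getRowLength() + 1; }
    int getFamilyLength() { return buf[getRowOffset() + getRowLength()] & 0xff; }

    // Test helper: serializes a row and family into the layout above.
    static byte[] encode(String row, String family) {
        byte[] r = row.getBytes(StandardCharsets.UTF_8);
        byte[] f = family.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[2 + r.length + 1 + f.length];
        out[0] = (byte) (r.length >> 8);
        out[1] = (byte) r.length;
        System.arraycopy(r, 0, out, 2, r.length);
        out[2 + r.length] = (byte) f.length;
        System.arraycopy(f, 0, out, 3 + r.length, f.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] bytes = encode("row1", "cf");
        UpFrontCheckedCell cell = new UpFrontCheckedCell(bytes, 0, bytes.length);
        // Simulate the upper layers calling the accessors many times.
        for (int i = 0; i < 1000; i++) {
            cell.getRowOffset();
            cell.getRowLength();
        }
        System.out.println("sanity checks run: " + sanityChecks);
        try {
            // A truncated buffer is rejected at construction, not at first read.
            new UpFrontCheckedCell(Arrays.copyOf(bytes, 4), 0, 4);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected early: " + e.getMessage());
        }
    }
}
```

In this sketch the validation cost is paid once per cell regardless of how often the accessors are called afterwards. An inline variant would have to repeat the relevant bounds check inside each accessor call, which is the overhead the comment above is concerned about.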
> Sanity check when constructing the KeyValue
> -------------------------------------------
>
>                 Key: HBASE-21401
>                 URL: https://issues.apache.org/jira/browse/HBASE-21401
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver
>            Reporter: Zheng Hu
>            Assignee: Zheng Hu
>            Priority: Critical
>             Fix For: 3.0.0, 2.2.0, 2.1.3, 2.0.5
>
>         Attachments: HBASE-21401.v1.patch, HBASE-21401.v2.patch,
> HBASE-21401.v3.patch, HBASE-21401.v4.patch, HBASE-21401.v4.patch,
> HBASE-21401.v5.patch, HBASE-21401.v6.patch, HBASE-21401.v7.patch
>
>
> In KeyValueDecoder & ByteBuffKeyValueDecoder, we pass a byte buffer to
> initialize the Cell without a sanity check (checking whether each field's
> offset & length exceed the byte buffer or not), so an
> ArrayIndexOutOfBoundsException may happen when reading the cell's fields,
> as in HBASE-21379. This kind of bug is hard to debug.
> An earlier check will help to find such bugs.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)