[ 
https://issues.apache.org/jira/browse/HBASE-16624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15486459#comment-15486459
 ] 

deepankar commented on HBASE-16624:
-----------------------------------

Function is nested deep inside the HFileScannerImpl and it was hard to come up 
with a UT. But if you can cross the sequenceId above the Integer.MAX_VALUE you 
would immediately see the issue, to confirm the issue I have changed the 
function 

{code}
    /**
     * Actually do the mvcc read. Does no checks.
     * @param offsetFromPos
     */
    private long _readMvccVersion(ByteBuff blockBuffer) {
        long currMemstoreTS = 0;
        int offsetFromPos = 0;
        // This is Bytes#bytesToVint inlined so can save a few instructions in 
this hot method; i.e.
        // previous if one-byte vint, we'd redo the vint call to find int size.
        // Also the method is kept small so can be inlined.
        byte firstByte = blockBuffer.getByteAfterPosition(offsetFromPos);
        int len = WritableUtils.decodeVIntSize(firstByte);
        if (len == 1) {
            currMemstoreTS = firstByte;
        } else {
            int remaining = len -1;
            long i = 0;
            offsetFromPos++;
            if (remaining >= Bytes.SIZEOF_INT) {
                // The int read has to be converted to unsigned long so the & op
                i = (blockBuffer.getIntAfterPosition(offsetFromPos) & 
0x00000000ffffffffL);
                remaining -= Bytes.SIZEOF_INT;
                offsetFromPos += Bytes.SIZEOF_INT;
            }
            if (remaining >= Bytes.SIZEOF_SHORT) {
                short s = blockBuffer.getShortAfterPosition(offsetFromPos);
                i = i << 16;
                i = i | (s & 0xFFFF);
                remaining -= Bytes.SIZEOF_SHORT;
                offsetFromPos += Bytes.SIZEOF_SHORT;
            }
            for (int idx = 0; idx < remaining; idx++) {
                byte b = blockBuffer.getByteAfterPosition(offsetFromPos + idx);
                i = i << 8;
                i = i | (b & 0xFF);
            }
            currMemstoreTS = (WritableUtils.isNegativeVInt(firstByte) ? ~i : i);
        }
        return currMemstoreTS;
    }
{code}

And I passed the byteBuff I got from

{code}
        ByteArrayDataOutput out = ByteStreams.newDataOutput();
        WritableUtils.writeVLong(out, 3085788160L);
        new SingleByteBuff(ByteBuffer.wrap(out.toByteArray()))
{code}

> MVCC DeSerialization bug in the HFileScannerImpl
> ------------------------------------------------
>
>                 Key: HBASE-16624
>                 URL: https://issues.apache.org/jira/browse/HBASE-16624
>             Project: HBase
>          Issue Type: Bug
>          Components: HFile
>    Affects Versions: 2.0.0
>            Reporter: deepankar
>            Assignee: deepankar
>            Priority: Blocker
>         Attachments: HBASE-16624.patch
>
>
> My colleague [~naggarwal] found a bug in the deserialization of mvcc from 
> HFile, As a part of the optimization of deserialization of VLong, we read a 
> int at once but we forgot to convert it to unsigned one. 
> This would cause issues because once we cross the integer threshold in 
> sequenceId and a compaction happens we would write MAX_MEMSTORE_TS in the 
> trailer as 0 (because we will be reading negative values from the file that 
> got flushed with sequenceId > Integer.MAX_VALUE). And once we have 
> MAX_MEMSTORE_TS as 0, and there are sequenceId values present alongside with 
> KeyValues the regionserver will now start failing to read the compacted file 
> and thus corruption. 
> Interestingly this would happen only on the tables that don't have  
> DataBlockEncoding enabled and unfortunately in our case that turned out to be 
> META and a another small table.
> Fix is small (~20 chars) and attached



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to