[ 
https://issues.apache.org/jira/browse/HBASE-14782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003873#comment-15003873
 ] 

Heng Chen commented on HBASE-14782:
-----------------------------------

I found something more.
Both StoreScanner.seekAsDirection and StoreScanner.seekToNextRow call 
StoreScanner.reseek(Cell kv) internally.
The difference is the Cell that is passed in.

In StoreScanner.seekToNextRow, the Cell passed to reseek is generated 
by CellUtil.createLastOnRow.
But in StoreScanner.seekAsDirection, it is generated by matcher.getNextKeyHint, 
which calls FuzzyRowFilter.getNextCellHint internally.

CellUtil.createLastOnRow(Cell kv) creates a cell in the same row as kv, 
but with Long.MIN_VALUE as its timestamp.
FuzzyRowFilter.getNextCellHint(Cell kv) creates a cell in the next row, 
with Long.MAX_VALUE as its timestamp.
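
To make the ordering consequence concrete, here is a minimal sketch of how the two kinds of seek key sort. It is a simplified model, not HBase's actual comparator: the class and method names are hypothetical, and only the row and timestamp are modeled (rows ascending as unsigned bytes, then timestamps descending, so newer cells sort first). Family, qualifier and type are ignored.

```java
import java.util.Arrays;

// Simplified, hypothetical model of cell ordering: row first, then newest timestamp first.
public class CellOrderSketch {

    // Returns <0 if key A sorts before key B, >0 if after, 0 if equal.
    static int compareKeys(byte[] rowA, long tsA, byte[] rowB, long tsB) {
        int byRow = Arrays.compareUnsigned(rowA, rowB);
        if (byRow != 0) {
            return byRow;
        }
        // Timestamps are compared in descending order: larger timestamp sorts first.
        return Long.compare(tsB, tsA);
    }

    public static void main(String[] args) {
        byte[] row = {0x01};
        byte[] nextRow = {0x02};

        // A "last on row" key (Long.MIN_VALUE) sorts after any real cell in the same row.
        System.out.println(compareKeys(row, 1000L, row, Long.MIN_VALUE) < 0);

        // A next-row hint (Long.MAX_VALUE) sorts before any real cell in that next row.
        System.out.println(compareKeys(nextRow, Long.MAX_VALUE, nextRow, 1000L) < 0);

        // The "last on row" key still sorts before the next-row hint.
        System.out.println(compareKeys(row, Long.MIN_VALUE, nextRow, Long.MAX_VALUE) < 0);
    }
}
```

Under this model, both keys fall in the gap between the current row and the next one; they differ only in which row they name.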


The relevant logic is shown below (in {{KeyValueHeap.generalizedSeek}}):


{code:title=KeyValueHeap.java} 
    
    if (current == null) {
      return false;
    }
    heap.add(current);
    current = null;

    KeyValueScanner scanner;
    while ((scanner = heap.poll()) != null) {
      Cell topKey = scanner.peek();
      if (comparator.getComparator().compare(seekKey, topKey) <= 0) {
        heap.add(scanner);
        current = pollRealKV();
        return current != null;
      }

      boolean seekResult;
      if (isLazy && heap.size() > 0) {
        seekResult = scanner.requestSeek(seekKey, forward, useBloom);
      } else {
        seekResult = NonLazyKeyValueScanner.doRealSeek(
            scanner, seekKey, forward);
      }

      if (!seekResult) {
        this.scannersForDelayedClose.add(scanner);
      } else {
        heap.add(scanner);
      }
    }

    // Heap is returning empty, scanner is done
    return false;
{code}

{code}
For example, suppose we put only "\x9C\x00\x044\x00\x00\x00\x00" 
and "\x9C\x00\x03\xE9e\xBB{X\x1Fwts\x1F\x15vRX" into the table.

With the original logic, we take the StoreScanner.seekAsDirection path, 
so seekKey in KeyValueHeap.generalizedSeek will be 
'\x9C\x00\x044\x00\x00\x00\x00' with Long.MAX_VALUE as its timestamp.

In the first iteration of the while loop, topKey is 
"\x9C\x00\x03\xE9e\xBB{X\x1Fwts\x1F\x15vRX", 
so "if (comparator.getComparator().compare(seekKey, topKey) <= 0)" is 
false, and NonLazyKeyValueScanner.doRealSeek cannot find seekKey.

In the end, KeyValueHeap.heap will be empty and KeyValueHeap.current will be null.

{code}
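
The comparison in the first loop iteration can be checked with a small standalone sketch. It compares only the row bytes of the two keys from the example, as unsigned bytes; because the rows differ, the timestamp does not affect the result. The class and method names are hypothetical, and Arrays.compareUnsigned stands in for the real comparator.

```java
import java.util.Arrays;

// Hypothetical sketch: unsigned byte-wise comparison of the two row keys from the example.
public class SeekKeyCompareSketch {

    // Row of seekKey: \x9C\x00\x044\x00\x00\x00\x00
    static final byte[] SEEK_ROW = {
        (byte) 0x9C, 0x00, 0x04, '4', 0x00, 0x00, 0x00, 0x00
    };

    // Row of topKey: \x9C\x00\x03\xE9e\xBB{X\x1Fwts\x1F\x15vRX
    static final byte[] TOP_ROW = {
        (byte) 0x9C, 0x00, 0x03, (byte) 0xE9, 'e', (byte) 0xBB, '{', 'X',
        0x1F, 'w', 't', 's', 0x1F, 0x15, 'v', 'R', 'X'
    };

    static int compareRows(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        // The rows first differ at the third byte (0x04 vs 0x03), so seekKey sorts after topKey.
        boolean seekIsAtOrBeforeTop = compareRows(SEEK_ROW, TOP_ROW) <= 0;
        System.out.println(seekIsAtOrBeforeTop); // prints "false"
    }
}
```

Because seekKey sorts after topKey, the early-return branch is skipped and the scanner has to perform a real seek, which is the step described above.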

> FuzzyRowFilter skips valid rows
> -------------------------------
>
>                 Key: HBASE-14782
>                 URL: https://issues.apache.org/jira/browse/HBASE-14782
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Vladimir Rodionov
>            Assignee: Heng Chen
>         Attachments: HBASE-14782.patch
>
>
> The issue may affect not only master branch, but previous releases as well.
> This is from one of our customers:
> {quote}
> We are experiencing a problem with the FuzzyRowFilter for HBase scan. We 
> think that it is a bug. 
> Fuzzy filter should pick a row if it matches filter criteria irrespective of 
> other rows present in table but filter is dropping a row depending on some 
> other row present in table. 
> Details/Step to reproduce/Sample outputs below: 
> Missing row key: \x9C\x00\x044\x00\x00\x00\x00 
> Causing row key: \x9C\x00\x03\xE9e\xBB{X\x1Fwts\x1F\x15vRX 
> Prerequisites 
> 1. Create a test table. HBase shell command -- create 'fuzzytest','d' 
> 2. Insert some test data. HBase shell commands: 
> • put 'fuzzytest',"\x9C\x00\x044\x00\x00\x00\x00",'d:a','junk' 
> • put 'fuzzytest',"\x9C\x00\x044\x01\x00\x00\x00",'d:a','junk' 
> • put 'fuzzytest',"\x9C\x00\x044\x00\x01\x00\x00",'d:a','junk' 
> • put 'fuzzytest',"\x9C\x00\x044\x00\x00\x01\x00",'d:a','junk' 
> • put 'fuzzytest',"\x9C\x00\x044\x00\x01\x00\x01",'d:a','junk' 
> • put 'fuzzytest',"\x9B\x00\x044e\xBB\xB2\xBB",'d:a','junk' 
> • put 'fuzzytest',"\x9D\x00\x044e\xBB\xB2\xBB",'d:a','junk' 
> Now when you run the code, you will find \x9C\x00\x044\x00\x00\x00\x00 in 
> output because it matches filter criteria. (Refer how to run code below) 
> Insert the row key causing bug: 
> HBase shell command: put 
> 'fuzzytest',"\x9C\x00\x03\xE9e\xBB{X\x1Fwts\x1F\x15vRX",'d:a','junk' 
> Now when you run the code, you will not find \x9C\x00\x044\x00\x00\x00\x00 in 
> output even though it still matches filter criteria. 
> {quote}
> Verified the issue on master.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
