Eungsop Yoo created HBASE-30226:
-----------------------------------

             Summary: Reverse FuzzyRowFilter can stop making progress when the 
reverse seek hint is equal to the current row
                 Key: HBASE-30226
                 URL: https://issues.apache.org/jira/browse/HBASE-30226
             Project: HBase
          Issue Type: Bug
    Affects Versions: 2.5.12
            Reporter: Eungsop Yoo


Observed on HBase 2.5.12. This likely affects all versions after 2.5.11 that 
include the reverse FuzzyRowFilter hint adjustment from HBASE-28634. A reverse 
Scan with FuzzyRowFilter can leave a RegionServer scan handler thread spinning 
in the RUNNABLE state inside FuzzyRowFilter / RowTracker. In the observed case, 
the scan queue was empty, but the active scan handlers remained CPU-bound.

Hot thread stacks repeatedly showed:
{noformat}
  FuzzyRowFilter.getNextForFuzzyRule
  FuzzyRowFilter$RowTracker.updateWith
  FuzzyRowFilter$RowTracker.updateTracker
  FuzzyRowFilter.getNextCellHint
  UserScanQueryMatcher.getNextKeyHint
  StoreScanner.next
  RSRpcServices.scan
  {noformat}
The issue appears when a reverse seek hint is not strictly before the current 
row. The scanner then makes no progress, and RowTracker can keep revisiting the 
same row candidate.

Consecutive non-matching rows alone do not trigger this. The problematic case 
is when the hint from one non-matching row lands on the next row that exists 
in the table, that row is also non-matching, and evaluating it produces a 
hint equal to that same row.
h3. Case explanation

The important part is where the reverse hint sends the scanner.

Does not reproduce with the original HBASE-28634 example:
 * Filter: 1114??
 * Table order in reverse: 111777 non-match, 111611 non-match, 111511 
non-match, 111446 match
 * Actual scan flow: 111777 -> hint 1115 -> seek -> 111446 match
 * The scanner skips the intermediate non-matching rows, so RowTracker does not 
enter the bad poll/add state.

Reproduces:
 * Filter: a?a
 * Table order in reverse: abc non-match, abb non-match, aaa match
 * Actual scan flow: abc -> hint abb -> seek -> abb
 * Then abb -> hint abb again. RowTracker polls abb and adds abb again, so 
updateTracker can loop.

Does not reproduce with only two rows:
 * Filter: a?a
 * Table order in reverse: abb non-match, aaa match
 * Actual scan flow: abb -> aaa match
 * The scanner reaches a matching row before the bad RowTracker state is 
triggered.
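
The progress condition behind the three cases above can be sketched in plain 
Ruby. This is a toy model, not HBase code: compare_rows and 
reverse_hint_makes_progress? are hypothetical helper names, and the 
(row, hint) pairs are copied from the flows listed above rather than computed 
by FuzzyRowFilter. It only illustrates that a reverse hint is useful when it 
sorts strictly before the current row.

```ruby
# Toy model of the progress condition; this is not HBase code.
# Rows are compared lexicographically as unsigned bytes.
def compare_rows(a, b)
  a.bytes <=> b.bytes
end

# In a reverse scan, a seek hint only moves the scanner forward in scan
# order if it sorts strictly before the row currently being evaluated.
def reverse_hint_makes_progress?(current_row, hint)
  compare_rows(hint, current_row) < 0
end

# (current row, hint) pairs taken from the flows described in this issue.
steps = [
  ['111777', '1115'], # HBASE-28634 example
  ['abc',    'abb'],  # first step of the reproducing case
  ['abb',    'abb']   # second step: hint equals the current row
]

steps.each do |current, hint|
  status = reverse_hint_makes_progress?(current, hint) ? 'progress' : 'no progress'
  puts "#{current} -> hint #{hint}: #{status}"
end
```

Only the last pair, where the hint equals the current row, is reported as 
making no progress. That is the state in which RowTracker can loop.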

h3. Reproduction with hbase shell
{noformat}
  import java.util.Arrays
  import org.apache.hadoop.hbase.filter.FuzzyRowFilter
  import org.apache.hadoop.hbase.util.Bytes
  import org.apache.hadoop.hbase.util.Pair

  create 'FUZZY_REVERSE_REPRO', 'f'

  put 'FUZZY_REVERSE_REPRO', 'aaa', 'f:q1', 'v'
  put 'FUZZY_REVERSE_REPRO', 'abb', 'f:q1', 'v'
  put 'FUZZY_REVERSE_REPRO', 'abc', 'f:q1', 'v'

  scan 'FUZZY_REVERSE_REPRO', {
    REVERSED => true,
    FILTER => FuzzyRowFilter.new(Arrays.asList(
      Pair.new(Bytes.toBytesBinary('aaa'), Bytes.toBytesBinary('\x00\x01\x00'))
    ))
  }
  {noformat}
The fuzzy rule is a?a. Row aaa matches. Rows abc and abb do not match. In 
reverse order, abc is seen before abb. The hint for abc points to abb. When
abb is evaluated next, the reverse hint is again abb, so RowTracker can enter 
the poll/add loop even though each row has only one cell.
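
To make the match / non-match classification above concrete, here is a 
minimal sketch of the fuzzy rule in plain Ruby. It is a simplified model and 
not the FuzzyRowFilter implementation: fuzzy_match? is a hypothetical helper, 
and it assumes rows are at least as long as the fuzzy key, which holds for the 
three rows in this reproduction. A mask value of 0 means the byte must equal 
the key byte, and 1 means any byte is accepted.

```ruby
# Simplified model of the fuzzy rule used in this reproduction;
# not the HBase implementation.
# mask[i] == 0: row byte i must equal key byte i.
# mask[i] == 1: any byte is accepted at position i.
def fuzzy_match?(row, key, mask)
  # Assumption of this sketch: shorter rows are treated as non-matching.
  return false if row.bytesize < key.bytesize

  key.bytes.each_with_index.all? do |key_byte, i|
    mask[i] == 1 || row.getbyte(i) == key_byte
  end
end

KEY  = 'aaa'
MASK = [0, 1, 0] # corresponds to '\x00\x01\x00' in the shell command

# Rows in reverse scan order.
%w[abc abb aaa].each do |row|
  puts "#{row}: #{fuzzy_match?(row, KEY, MASK) ? 'match' : 'non-match'}"
end
```

Under this model abc and abb are non-matching and aaa matches, which is the 
classification the reproduction relies on.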
h3. Expected

The reverse scan skips abb, returns aaa, and finishes.
h3. Actual

On a vulnerable version, the scan does not return and the client eventually 
times out. For example, with the default 60-second RPC timeout, hbase shell 
reports:
{noformat}
  java.net.SocketTimeoutException: callTimeout=60000, callDuration=60142: Call 
to address=<regionserver>:16020 failed on local exception:
  org.apache.hadoop.hbase.ipc.CallTimeoutException: 
Call[id=33,methodName=Scan], waitTime=60010ms, rpcTimeout=60000ms
  ERROR: Call[id=33,methodName=Scan], waitTime=60010ms, rpcTimeout=60000ms
  {noformat}
After the client timeout or disconnect, the RegionServer scan handler can 
remain hot in the FuzzyRowFilter / RowTracker stack.
h3. Cleanup
{noformat}
  disable 'FUZZY_REVERSE_REPRO'
  drop 'FUZZY_REVERSE_REPRO'
  {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
