[
https://issues.apache.org/jira/browse/HBASE-30226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HBASE-30226:
-----------------------------------
Labels: pull-request-available (was: )
> Reverse FuzzyRowFilter can stop making progress when the reverse seek hint is
> equal to the current row
> ------------------------------------------------------------------------------------------------------
>
> Key: HBASE-30226
> URL: https://issues.apache.org/jira/browse/HBASE-30226
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.5.12
> Reporter: Eungsop Yoo
> Priority: Major
> Labels: pull-request-available
>
> Observed on HBase 2.5.12. This likely affects all versions after 2.5.11 that
> include the reverse FuzzyRowFilter hint adjustment from HBASE-28634. A
> reverse Scan with FuzzyRowFilter can keep a RegionServer scan handler
> RUNNABLE in FuzzyRowFilter / RowTracker. In the observed case, the scan queue
> was empty, but active scan handlers remained CPU-bound.
> Hot thread stacks repeatedly showed:
> {noformat}
> FuzzyRowFilter.getNextForFuzzyRule
> FuzzyRowFilter$RowTracker.updateWith
> FuzzyRowFilter$RowTracker.updateTracker
> FuzzyRowFilter.getNextCellHint
> UserScanQueryMatcher.getNextKeyHint
> StoreScanner.next
> RSRpcServices.scan
> {noformat}
> The issue appears when a reverse seek hint does not sort strictly before the
> current row, so the seek makes no progress. RowTracker can then keep
> revisiting the same row candidate.
> Consecutive non-matching rows alone do not cause this. The problematic case
> is when the hint from one non-matching row points to the next non-matching
> row that exists in the table, and evaluating that row produces a hint equal
> to the row itself.
> h3. Case explanation
> The important part is where the reverse hint sends the scanner.
> Does not reproduce with the original HBASE-28634 example:
> * Filter: 1114??
> * Table order in reverse: 111777 non-match, 111611 non-match, 111511
> non-match, 111446 match
> * Actual scan flow: 111777 -> hint 1115 -> seek -> 111446 match
> * The scanner skips the intermediate non-matching rows, so RowTracker does
> not enter the bad poll/add state.
> Reproduces:
> * Filter: a?a
> * Table order in reverse: abc non-match, abb non-match, aaa match
> * Actual scan flow: abc -> hint abb -> seek -> abb
> * Then abb -> hint abb again. RowTracker polls abb and adds abb again, so
> updateTracker can loop.
> Does not reproduce with only two rows:
> * Filter: a?a
> * Table order in reverse: abb non-match, aaa match
> * Actual scan flow: abb -> aaa match
> * The scanner reaches a matching row before the bad RowTracker state is
> triggered.
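
The first two flows above can be replayed with a small toy model. This is only a sketch, not HBase code: `reverse_scan` is a hypothetical helper, the hints are copied from the flows in this report rather than computed by FuzzyRowFilter, and the seek is assumed to land on the largest existing row that sorts at or before the hint.

```ruby
# Toy model of a reverse scan driven by seek hints. Not HBase code.
# rows:    row keys present in the table
# matches: rows the filter accepts
# hints:   non-matching row => reverse seek hint (copied from the report)
def reverse_scan(rows, matches, hints)
  sorted = rows.sort
  current = sorted.last # a reverse scan starts at the largest row
  trace = []
  until current.nil?
    if matches.include?(current)
      trace << "#{current} match"
      return trace
    end
    hint = hints.fetch(current)
    # Assumed seek semantics: land on the largest row at or before the hint.
    landed = sorted.select { |row| row <= hint }.last
    if !landed.nil? && landed >= current
      # The hint did not move before the current row, so the scan is stuck.
      trace << "#{current} -> hint #{hint} -> no progress"
      return trace
    end
    trace << "#{current} -> hint #{hint}"
    current = landed
  end
  trace
end

# Original HBASE-28634 example: the hint skips the intermediate rows.
puts reverse_scan(%w[111446 111511 111611 111777], ['111446'],
                  { '111777' => '1115' }).inspect

# Reproducing case: the hint for abb is abb itself.
puts reverse_scan(%w[aaa abb abc], ['aaa'],
                  { 'abc' => 'abb', 'abb' => 'abb' }).inspect
```

The toy model stops as soon as a hint fails to move before the current row. The report describes the real scanner as spinning in RowTracker at that point instead.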
> h3. Reproduction with hbase shell
> {noformat}
> import java.util.Arrays
> import org.apache.hadoop.hbase.filter.FuzzyRowFilter
> import org.apache.hadoop.hbase.util.Bytes
> import org.apache.hadoop.hbase.util.Pair
> create 'FUZZY_REVERSE_REPRO', 'f'
> put 'FUZZY_REVERSE_REPRO', 'aaa', 'f:q1', 'v'
> put 'FUZZY_REVERSE_REPRO', 'abb', 'f:q1', 'v'
> put 'FUZZY_REVERSE_REPRO', 'abc', 'f:q1', 'v'
> scan 'FUZZY_REVERSE_REPRO', {
>   REVERSED => true,
>   FILTER => FuzzyRowFilter.new(Arrays.asList(
>     Pair.new(Bytes.toBytesBinary('aaa'),
>              Bytes.toBytesBinary('\x00\x01\x00'))
>   ))
> }
> {noformat}
> The fuzzy rule is a?a. Row aaa matches. Rows abc and abb do not match. In
> reverse order, abc is seen before abb. The hint for abc points to abb. When
> abb is evaluated next, the reverse hint is again abb, so RowTracker can enter
> the poll/add loop even though each row has only one cell.
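
To check how the key and mask in the repro classify the three rows, here is a minimal pure-Ruby sketch of the fuzzy rule. `fuzzy_match?` is a hypothetical helper, not part of HBase. It assumes the documented mask convention (0 means the byte is fixed, 1 means any byte is accepted) and compares only the leading bytes covered by the key, which simplifies what the real filter does.

```ruby
# Simplified model of one fuzzy rule. Not the HBase implementation.
# key:  fuzzy key bytes, as passed to Pair.new in the repro
# mask: 0 = byte must equal the key byte, 1 = any byte is accepted
def fuzzy_match?(row, key, mask)
  # Rows shorter than the key cannot satisfy every fixed position.
  return false if row.bytesize < key.bytesize

  key.bytes.each_with_index.all? do |key_byte, i|
    mask[i] == 1 || row.getbyte(i) == key_byte
  end
end

key  = 'aaa'
mask = [0, 1, 0] # same rule as '\x00\x01\x00', i.e. a?a

# Rows in reverse scan order.
%w[abc abb aaa].each do |row|
  verdict = fuzzy_match?(row, key, mask) ? 'match' : 'non-match'
  puts "#{row}: #{verdict}"
end
```

Under these assumptions only aaa matches, so the reverse scan must get past two non-matching rows before it reaches the first match.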
> h3. Expected
> The reverse scan skips abb, returns aaa, and finishes.
> h3. Actual
> On a vulnerable version, the scan does not return and the client eventually
> times out. For example, with the default 60-second RPC timeout, hbase shell
> reports:
> {noformat}
> java.net.SocketTimeoutException: callTimeout=60000, callDuration=60142:
> Call to address=<regionserver>:16020 failed on local exception:
> org.apache.hadoop.hbase.ipc.CallTimeoutException:
> Call[id=33,methodName=Scan], waitTime=60010ms, rpcTimeout=60000ms
> ERROR: Call[id=33,methodName=Scan], waitTime=60010ms, rpcTimeout=60000ms
> {noformat}
> After the client timeout or disconnect, the RegionServer scan handler can
> remain hot in the FuzzyRowFilter / RowTracker stack.
> h3. Cleanup
> {noformat}
> disable 'FUZZY_REVERSE_REPRO'
> drop 'FUZZY_REVERSE_REPRO'
> {noformat}