Eungsop Yoo created HBASE-30226:
-----------------------------------
Summary: Reverse FuzzyRowFilter can stop making progress when the
reverse seek hint is equal to the current row
Key: HBASE-30226
URL: https://issues.apache.org/jira/browse/HBASE-30226
Project: HBase
Issue Type: Bug
Affects Versions: 2.5.12
Reporter: Eungsop Yoo
Observed on HBase 2.5.12. This likely affects all versions after 2.5.11 that
include the reverse FuzzyRowFilter hint adjustment from HBASE-28634. A reverse
Scan with FuzzyRowFilter can leave a RegionServer scan handler thread in the
RUNNABLE state, spinning inside FuzzyRowFilter / RowTracker. In the observed
case, the scan queue was empty, but the active scan handlers remained CPU-bound.
Hot thread stacks repeatedly showed:
{noformat}
FuzzyRowFilter.getNextForFuzzyRule
FuzzyRowFilter$RowTracker.updateWith
FuzzyRowFilter$RowTracker.updateTracker
FuzzyRowFilter.getNextCellHint
UserScanQueryMatcher.getNextKeyHint
StoreScanner.next
RSRpcServices.scan
{noformat}
The issue appears when the reverse seek hint is not strictly before the current
row, for example when it equals the current row. RowTracker can then keep
revisiting the same row candidate.
Consecutive non-matching rows alone do not cause this. The problematic case is
when the hint from one non-matching row points to the next existing row, that
row is also non-matching, and evaluating it produces a hint equal to the row
itself.
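The progress condition described above can be sketched as a small check. This is a minimal illustration in plain Ruby, not HBase code: the helper name reverse_hint_makes_progress? is hypothetical, and it assumes row keys are ordered by byte-wise comparison, which holds for the ASCII rows used in this report.

```ruby
# Hypothetical helper, not part of HBase: returns true only when a reverse
# seek hint is strictly before the current row in byte-wise order.
def reverse_hint_makes_progress?(current_row, hint_row)
  # String#b gives a binary copy, so the comparison is byte by byte.
  (hint_row.b <=> current_row.b) < 0
end

# [current row, reverse hint] pairs taken from the flows in this report.
flows = [
  ['111777', '1115'], # HBASE-28634 example: hint moves before the row
  ['abc', 'abb'],     # hint moves before the current row
  ['abb', 'abb']      # hint equals the current row: no progress
]

flows.each do |current, hint|
  ok = reverse_hint_makes_progress?(current, hint)
  puts "#{current} -> hint #{hint}: progress=#{ok}"
end
```

Only the last pair fails the check, which is the same-row hint this issue is about.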
h3. Case explanation
The important part is where the reverse hint sends the scanner.
Does not reproduce with the original HBASE-28634 example:
* Filter: 1114??
* Table order in reverse: 111777 non-match, 111611 non-match, 111511
non-match, 111446 match
* Actual scan flow: 111777 -> hint 1115 -> seek -> 111446 match
* The scanner skips the intermediate non-matching rows, so RowTracker does not
enter the bad poll/add state.
Reproduces:
* Filter: a?a
* Table order in reverse: abc non-match, abb non-match, aaa match
* Actual scan flow: abc -> hint abb -> seek -> abb
* Then abb -> hint abb again. RowTracker polls abb and adds abb again, so
updateTracker can loop.
Does not reproduce with only two rows:
* Filter: a?a
* Table order in reverse: abb non-match, aaa match
* Actual scan flow: abb -> aaa match
* The scanner reaches a matching row before the bad RowTracker state is
triggered.
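To make the poll/add description concrete, here is a toy model in plain Ruby. It does not reproduce HBase's actual RowTracker code: the method drain_candidates, the next_hint lambda, and the max_steps cap are all hypothetical and exist only to show why re-adding a candidate equal to the current row never terminates without a guard.

```ruby
# Toy model only; names and structure are hypothetical, not HBase internals.
# next_hint stands in for recomputing the reverse hint for a candidate row.
def drain_candidates(current_row, first_candidate, next_hint, max_steps: 5)
  queue = [first_candidate]
  steps = 0
  until queue.empty?
    candidate = queue.shift # "poll" the next candidate
    # A usable reverse hint must be strictly before the current row.
    return [:progress, candidate, steps] if candidate < current_row

    steps += 1
    # Without this cap the loop below would spin forever.
    return [:stuck, candidate, steps] if steps >= max_steps

    queue.push(next_hint.call(candidate)) # "add" the recomputed candidate
  end
  [:exhausted, nil, steps]
end

# In the reproducing case the hint for abb is abb again.
same_row_hint = ->(row) { row }

p drain_candidates('abc', 'abb', same_row_hint) # hint is before abc
p drain_candidates('abb', 'abb', same_row_hint) # hint equals current row
```

In this model the first call returns immediately with a usable hint, while the second only stops because of the artificial step cap.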
h3. Reproduction with hbase shell
{noformat}
import java.util.Arrays
import org.apache.hadoop.hbase.filter.FuzzyRowFilter
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.util.Pair
create 'FUZZY_REVERSE_REPRO', 'f'
put 'FUZZY_REVERSE_REPRO', 'aaa', 'f:q1', 'v'
put 'FUZZY_REVERSE_REPRO', 'abb', 'f:q1', 'v'
put 'FUZZY_REVERSE_REPRO', 'abc', 'f:q1', 'v'
scan 'FUZZY_REVERSE_REPRO', {
REVERSED => true,
FILTER => FuzzyRowFilter.new(Arrays.asList(
Pair.new(Bytes.toBytesBinary('aaa'), Bytes.toBytesBinary('\x00\x01\x00'))
))
}
{noformat}
The fuzzy rule is a?a. Row aaa matches. Rows abc and abb do not match. In
reverse order, abc is seen before abb. The hint for abc points to abb. When
abb is evaluated next, the reverse hint is again abb, so RowTracker can enter
the poll/add loop even though each row has only one cell.
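The match classification above can be checked without a cluster. The sketch below is a plain Ruby re-implementation of the matching rule for illustration only, not the FuzzyRowFilter code; fuzzy_match? is a hypothetical name, and treating rows shorter than the key as non-matching is a simplifying assumption. It uses the same key and mask as the shell command, where a mask byte of 0 means the key byte is fixed and 1 means any byte is accepted.

```ruby
# Hypothetical helper for illustration; not the HBase implementation.
# mask byte 0: row byte must equal the key byte; mask byte 1: any byte.
def fuzzy_match?(row, key, mask)
  # Simplifying assumption: rows shorter than the key never match.
  return false if row.bytesize < key.bytesize

  key.bytes.each_with_index.all? do |key_byte, i|
    mask.getbyte(i) == 1 || row.getbyte(i) == key_byte
  end
end

key  = 'aaa'.b
mask = "\x00\x01\x00".b # fuzzy rule a?a

# Rows in reverse scan order, as in the reproduction.
%w[abc abb aaa].each do |row|
  label = fuzzy_match?(row, key, mask) ? 'match' : 'non-match'
  puts "#{row}: #{label}"
end
```

Under these assumptions abc and abb are non-matching and aaa matches, in line with the description above.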
h3. Expected
The reverse scan skips abb, returns aaa, and finishes.
h3. Actual
On a vulnerable version, the scan does not return and the client eventually
times out. For example, with the default 60 second RPC timeout, hbase shell
reports:
{noformat}
java.net.SocketTimeoutException: callTimeout=60000, callDuration=60142: Call
to address=<regionserver>:16020 failed on local exception:
org.apache.hadoop.hbase.ipc.CallTimeoutException:
Call[id=33,methodName=Scan], waitTime=60010ms, rpcTimeout=60000ms
ERROR: Call[id=33,methodName=Scan], waitTime=60010ms, rpcTimeout=60000ms
{noformat}
After the client timeout or disconnect, the RegionServer scan handler can
remain hot in the FuzzyRowFilter / RowTracker stack.
h3. Cleanup
{noformat}
disable 'FUZZY_REVERSE_REPRO'
drop 'FUZZY_REVERSE_REPRO'
{noformat}