Jize Ning created HBASE-29907:
---------------------------------
Summary: ROWCOL bloom + StoreScanner.trySkipToNextColumn can
surface out-of-order cells, causing read failure “isDelete failed”
Key: HBASE-29907
URL: https://issues.apache.org/jira/browse/HBASE-29907
Project: HBase
Issue Type: Bug
Components: Filters, Scanners
Affects Versions: 2.5.13, 2.6.4
Reporter: Jize Ning
h3. Summary
We see intermittent read failures (Multi-column GET) when a column family uses
ROWCOL bloom filters. Clients fail with an exception chain that includes:
{code:java}
2026-02-17T07:49:24.041Z, RpcRetryingCaller{globalStartTime=2026-02-17T07:49:23.547Z, pause=250, maxAttempts=3}, java.io.IOException: java.io.IOException: isDelete failed: deleteBuffer=q15, qualifier=q09, timestamp=2010, comparison result: 1
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:479)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102)
at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82)
Caused by: java.lang.IllegalStateException: isDelete failed: deleteBuffer=q15, qualifier=q09, timestamp=2010, comparison result: 1
at org.apache.hadoop.hbase.regionserver.querymatcher.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:132)
at org.apache.hadoop.hbase.regionserver.querymatcher.ScanQueryMatcher.checkDeleted(ScanQueryMatcher.java:204)
at org.apache.hadoop.hbase.regionserver.querymatcher.NormalUserScanQueryMatcher.match(NormalUserScanQueryMatcher.java:76)
at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:624)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:145)
at org.apache.hadoop.hbase.regionserver.RegionScannerImpl.populateResult(RegionScannerImpl.java:342)
at org.apache.hadoop.hbase.regionserver.RegionScannerImpl.nextInternal(RegionScannerImpl.java:513)
at org.apache.hadoop.hbase.regionserver.RegionScannerImpl.nextRaw(RegionScannerImpl.java:278)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3402)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3668)
at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45006)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:415)
{code}
h3. Suspected root cause
This appears to be the same class of issue described in
[HBASE-19863|https://issues.apache.org/jira/browse/HBASE-19863], and it appears
to have regressed with the narrower guard introduced by
[HBASE-28055|https://issues.apache.org/jira/browse/HBASE-28055]. When a ROWCOL
bloom filter indicates that a row+qualifier is absent from a StoreFile, the
scanner may use a bloom-optimized “fake key”. If such a fake key is consumed
during the trySkipToNextColumn skip loop, a subsequent next() can advance from a
stale physical HFile position and return a cell that sorts before the column
being skipped. When that cell reaches delete tracking
({{ScanDeleteTracker.isDeleted}}), the read fails with the
{{IllegalStateException}} shown in the trace above and surfaces to the client as
“isDelete failed”.
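To make the failure mode concrete, here is a minimal sketch of a delete tracker that assumes qualifiers arrive in non-decreasing order. It is a simplified, hypothetical model and not the HBase {{ScanDeleteTracker}} class: {{DeleteTrackerSketch}}, {{addDelete}}, and the string-based qualifiers are illustrative, and delete types and version timestamps are ignored. The message format is copied from the trace above.

```java
/** Simplified, hypothetical model of a per-column delete tracker; not the HBase class. */
final class DeleteTrackerSketch {
    // Qualifier of the most recently seen delete marker, or null if none is tracked.
    private String deleteBuffer;

    /** Records a delete marker for the given qualifier. */
    void addDelete(String qualifier) {
        deleteBuffer = qualifier;
    }

    /**
     * Returns true if the cell's column matches the tracked delete marker.
     * Assumes cells arrive in non-decreasing qualifier order and throws otherwise.
     */
    boolean isDeleted(String qualifier, long timestamp) {
        if (deleteBuffer == null) {
            return false;
        }
        int cmp = deleteBuffer.compareTo(qualifier);
        if (cmp == 0) {
            return true; // same column as the delete marker
        }
        if (cmp < 0) {
            deleteBuffer = null; // scan moved to a later column; marker no longer applies
            return false;
        }
        // The delete marker sorts after the incoming cell: the scan went backward.
        throw new IllegalStateException("isDelete failed: deleteBuffer=" + deleteBuffer
            + ", qualifier=" + qualifier + ", timestamp=" + timestamp
            + ", comparison result: " + cmp);
    }

    public static void main(String[] args) {
        DeleteTrackerSketch tracker = new DeleteTrackerSketch();
        tracker.addDelete("q15");
        try {
            tracker.isDeleted("q09", 2010L); // out-of-order cell, as in the reported trace
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the tracker cannot recover once a smaller qualifier follows a larger one, which is why an ordering violation upstream shows up as a hard read failure rather than a silently wrong result.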
The HBASE-28055 change *narrowed the safety check* in a way that is
{*}semantically wrong{*}:
* {*}HBASE-19863's check ({{{}compareKeyForNextColumn < 0{}}}){*}: Catches
_any_ backward ordering of the next cell relative to the current cell's
expected next column. This is a *general* guard — it handles any case where
consuming cells in the loop caused the heap to surface a cell that violates
ordering, regardless of whether it was a bloom fake key or something else.
* {*}HBASE-28055's check ({{{}timestamp == OLDEST_TIMESTAMP{}}}){*}: Only
catches the case where {{cell}} itself is a bloom filter fake key (since fake
keys are created with {{{}OLDEST_TIMESTAMP{}}}). But the problem scenario
described in HBASE-19863 is that the bloom fake key gets *consumed inside the
loop* by {{{}heap.next(){}}}, and the _next_ real cell that surfaces is now out
of order. In that case, {{cell}} (the trigger cell passed into
{{{}trySkipToNextColumn{}}}) is a *real cell* with a real timestamp — _not_
{{{}OLDEST_TIMESTAMP{}}}. The check misses it entirely.
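The difference between the two guards can be sketched as two predicates evaluated after the skip loop. This is a hypothetical model, not the {{StoreScanner}} code: {{GuardSketch}}, its {{Cell}} class, and both method names are illustrative. The general guard is simplified to a qualifier comparison within one row, {{Long.MIN_VALUE}} is an assumed stand-in for the fake-key sentinel timestamp, and the trigger cell's timestamp of 2020 is made up for the example.

```java
/** Hypothetical model of the two post-loop guards; names are illustrative only. */
final class GuardSketch {
    // Assumed stand-in for the sentinel timestamp that bloom fake keys carry.
    static final long OLDEST_TIMESTAMP = Long.MIN_VALUE;

    /** Minimal cell: one row and family assumed, so only qualifier and timestamp matter. */
    static final class Cell {
        final String qualifier;
        final long timestamp;

        Cell(String qualifier, long timestamp) {
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }
    }

    /**
     * General guard (HBASE-19863 style, simplified): request a reseek whenever the
     * cell now at the top of the heap does not sort after the trigger's column.
     */
    static boolean generalGuardNeedsReseek(Cell trigger, Cell next) {
        return next != null && next.qualifier.compareTo(trigger.qualifier) <= 0;
    }

    /**
     * Narrowed guard (HBASE-28055 style): request a reseek only when the trigger
     * cell itself carries the fake-key timestamp.
     */
    static boolean narrowGuardNeedsReseek(Cell trigger, Cell next) {
        return next != null && trigger.timestamp == OLDEST_TIMESTAMP;
    }

    public static void main(String[] args) {
        Cell trigger = new Cell("q15", 2020L);  // real cell; the timestamp is invented
        Cell surfaced = new Cell("q09", 2010L); // cell that surfaces after the loop
        System.out.println("general guard reseeks: " + generalGuardNeedsReseek(trigger, surfaced));
        System.out.println("narrow guard reseeks: " + narrowGuardNeedsReseek(trigger, surfaced));
    }
}
```

Under these assumptions the general guard flags the q15 to q09 backward step, while the narrowed guard only inspects the trigger cell and lets it through.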
h3. Impact / correctness concerns
Beyond the immediate isDelete failed read failures, this indicates a deeper
correctness issue: the scan pipeline relies on the invariant that KeyValueHeap
delivers Cells in non-decreasing key order across all participating scanners.
When ROWCOL bloom + trySkipToNextColumn can result in a “smaller key” being
surfaced after a larger key (i.e., a backward jump), the heap’s ordering
guarantee is effectively violated from the perspective of consumers (e.g.,
delete tracking and matchers).
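The ordering invariant can be expressed as a small check over the sequence of cells a consumer sees. The sketch below is an illustration only: {{OrderCheckSketch}} and {{firstViolation}} are hypothetical names, and the comparison is restricted to a single row and family, where cells sort by qualifier ascending and then by timestamp descending.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical checker for the "non-decreasing key order" invariant within one row. */
final class OrderCheckSketch {
    static final class Cell {
        final String qualifier;
        final long timestamp;

        Cell(String qualifier, long timestamp) {
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }
    }

    /** Qualifier ascending, then timestamp descending (newer versions sort first). */
    static int compare(Cell a, Cell b) {
        int byQualifier = a.qualifier.compareTo(b.qualifier);
        if (byQualifier != 0) {
            return byQualifier;
        }
        return Long.compare(b.timestamp, a.timestamp);
    }

    /** Returns the index of the first cell that sorts before its predecessor, or -1. */
    static int firstViolation(List<Cell> cells) {
        for (int i = 1; i < cells.size(); i++) {
            if (compare(cells.get(i - 1), cells.get(i)) > 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // A backward jump like the one in the report: q15 followed by q09.
        List<Cell> seen = Arrays.asList(new Cell("q15", 2020L), new Cell("q09", 2010L));
        System.out.println("first violation at index " + firstViolation(seen));
    }
}
```

A check of this shape could serve as a test assertion around the heap output when reproducing the issue, since it reports the backward jump directly instead of relying on the delete tracker to throw.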
We were able to reproduce the bug with both a mini-cluster stress test and a
small deterministic unit test. The reproduction hits the exception with the
HBASE-28055 change applied but not with the HBASE-19863 guard in place.
h3. Proposed fix
We should revert the change in HBASE-28055. The claimed "performance
improvement" comes from skipping the reseek that should be used to fix the heap
ordering.