Jize Ning created HBASE-29907:
---------------------------------

             Summary: ROWCOL bloom + StoreScanner.trySkipToNextColumn can surface out-of-order cells, causing read failure “isDelete failed”
                 Key: HBASE-29907
                 URL: https://issues.apache.org/jira/browse/HBASE-29907
             Project: HBase
          Issue Type: Bug
          Components: Filters, Scanners
    Affects Versions: 2.5.13, 2.6.4
            Reporter: Jize Ning


h3. Summary

We see intermittent read failures on multi-column GETs when a column family uses ROWCOL bloom filters. Clients fail with an exception chain that includes:
{code:java}
2026-02-17T07:49:24.041Z, RpcRetryingCaller{globalStartTime=2026-02-17T07:49:23.547Z, pause=250, maxAttempts=3}, java.io.IOException: java.io.IOException: isDelete failed: deleteBuffer=q15, qualifier=q09, timestamp=2010, comparison result: 1
 at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:479)
 at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
 at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102)
 at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82)
Caused by: java.lang.IllegalStateException: isDelete failed: deleteBuffer=q15, qualifier=q09, timestamp=2010, comparison result: 1
 at org.apache.hadoop.hbase.regionserver.querymatcher.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:132)
 at org.apache.hadoop.hbase.regionserver.querymatcher.ScanQueryMatcher.checkDeleted(ScanQueryMatcher.java:204)
 at org.apache.hadoop.hbase.regionserver.querymatcher.NormalUserScanQueryMatcher.match(NormalUserScanQueryMatcher.java:76)
 at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:624)
 at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:145)
 at org.apache.hadoop.hbase.regionserver.RegionScannerImpl.populateResult(RegionScannerImpl.java:342)
 at org.apache.hadoop.hbase.regionserver.RegionScannerImpl.nextInternal(RegionScannerImpl.java:513)
 at org.apache.hadoop.hbase.regionserver.RegionScannerImpl.nextRaw(RegionScannerImpl.java:278)
 at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3402)
 at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3668)
 at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45006)
 at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:415)
{code}
 
h3. Suspected root cause

This appears to be the same class of issue described in [HBASE-19863|https://issues.apache.org/jira/browse/HBASE-19863], and it seems to have regressed with the narrower guard introduced by [HBASE-28055|https://issues.apache.org/jira/browse/HBASE-28055]. When a ROWCOL bloom indicates that a row+qualifier is absent from a StoreFile, the scanner may use a bloom-optimized “fake key” for that file instead of performing a real seek. If such a fake key is consumed during the {{trySkipToNextColumn}} skip loop, a subsequent {{next()}} can advance from a stale physical HFile position and return a cell that sorts before the column being skipped. When that cell reaches delete tracking ({{ScanDeleteTracker.isDeleted}}), the tracker throws an {{IllegalStateException}} and the read fails with “isDelete failed”.
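A minimal Java sketch of the ordering check that fails, for illustration only. {{ToyDeleteTracker}} is a hypothetical class, not HBase's {{ScanDeleteTracker}}: qualifiers are plain {{String}}s, only one buffered delete is tracked, and delete types are ignored. In this model the comparison is negated so that a positive result means the incoming qualifier sorts before the buffered delete, which lines up with the trace: a delete for {{q15}} was buffered, then {{q09}} arrived, and the check reported “comparison result: 1”.

{code:java}
// Hypothetical, simplified model of a delete tracker's ordering check.
// Not HBase code: qualifiers are Strings, one buffered delete, no delete types.
final class ToyDeleteTracker {
    private String deleteQualifier; // qualifier of the buffered delete marker, or null

    void addDelete(String qualifier) {
        deleteQualifier = qualifier;
    }

    /** Returns true if the cell's column is covered by the buffered delete. */
    boolean isDeleted(String qualifier, long timestamp) {
        if (deleteQualifier == null) {
            return false;
        }
        // Negated: a positive result means the incoming qualifier sorts BEFORE the delete.
        int ret = -Integer.signum(qualifier.compareTo(deleteQualifier));
        if (ret == 0) {
            return true; // same column as the buffered delete marker
        } else if (ret < 0) {
            deleteQualifier = null; // moved on to a later column; the delete no longer applies
            return false;
        }
        // A qualifier sorting before the buffered delete means the input went backward.
        throw new IllegalStateException("isDelete failed: deleteBuffer=" + deleteQualifier
            + ", qualifier=" + qualifier + ", timestamp=" + timestamp
            + ", comparison result: " + ret);
    }

    public static void main(String[] args) {
        ToyDeleteTracker tracker = new ToyDeleteTracker();
        tracker.addDelete("q15");
        try {
            tracker.isDeleted("q09", 2010L); // out-of-order cell, qualifiers taken from the trace
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
{code}

Running {{main}} prints the same message text as the {{Caused by}} line in the trace. The model only works if qualifiers within a row arrive in non-decreasing order; a backward jump can be reported but not repaired at this stage.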

 

The HBASE-28055 change *narrowed the safety check* in a way that is {*}semantically wrong{*}:
 * {*}HBASE-19863's check ({{{}compareKeyForNextColumn < 0{}}}){*}: Catches _any_ backward ordering of the next cell relative to the current cell's expected next column. This is a *general* guard: it handles any case where consuming cells in the loop caused the heap to surface a cell that violates ordering, whether or not the cause was a bloom fake key.

 * {*}HBASE-28055's check ({{{}timestamp == OLDEST_TIMESTAMP{}}}){*}: Only catches the case where {{cell}} itself is a bloom filter fake key (fake keys are created with {{{}OLDEST_TIMESTAMP{}}}). The problem scenario described in HBASE-19863, however, is that the bloom fake key gets *consumed inside the loop* by {{{}heap.next(){}}}, and the _next_ real cell that surfaces is out of order. In that case {{cell}} (the trigger cell passed into {{{}trySkipToNextColumn{}}}) is a *real cell* with a real timestamp, _not_ {{{}OLDEST_TIMESTAMP{}}}, so the check misses it entirely.
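The difference between the two checks can be sketched with a simplified Java model, under several assumptions. {{GuardSketch}}, {{ToyCell}}, {{generalGuard}} and {{narrowGuard}} are hypothetical names; cells carry only row, qualifier and timestamp; {{OLDEST}} stands in for {{OLDEST_TIMESTAMP}} and is assumed to be {{Long.MIN_VALUE}}; and the HBASE-19863 condition is approximated as “the surfaced cell sorts before the trigger cell's row/column”, whereas the real {{compareKeyForNextColumn}} compares against the matcher's next-column key. A guard returning {{true}} here means “do not trust the skip; fall back to a reseek”.

{code:java}
// Hypothetical, simplified model contrasting the two guards. Not HBase code.
final class GuardSketch {
    // Stand-in for OLDEST_TIMESTAMP (assumed Long.MIN_VALUE), the bloom fake-key timestamp.
    static final long OLDEST = Long.MIN_VALUE;

    record ToyCell(String row, String qualifier, long timestamp) {}

    // Compare by row, then qualifier (the "row/column" part of the key).
    static int compareRowColumn(ToyCell a, ToyCell b) {
        int c = a.row().compareTo(b.row());
        return c != 0 ? c : a.qualifier().compareTo(b.qualifier());
    }

    // HBASE-19863-style guard (approximation): reseek if the cell that surfaced
    // after the skip loop sorts before the trigger cell's column.
    static boolean generalGuard(ToyCell trigger, ToyCell surfaced) {
        return surfaced != null && compareRowColumn(surfaced, trigger) < 0;
    }

    // HBASE-28055-style guard: reseek only if the trigger cell itself is a fake key.
    static boolean narrowGuard(ToyCell trigger, ToyCell surfaced) {
        return surfaced != null && trigger.timestamp() == OLDEST;
    }

    public static void main(String[] args) {
        // The trigger is a real cell; a fake key was consumed inside the loop and
        // q09 surfaced from a stale position, behind the column being skipped.
        ToyCell trigger = new ToyCell("r1", "q15", 2010L);
        ToyCell surfaced = new ToyCell("r1", "q09", 2010L);
        System.out.println("general guard reseeks: " + generalGuard(trigger, surfaced)); // true
        System.out.println("narrow guard reseeks:  " + narrowGuard(trigger, surfaced));  // false
    }
}
{code}

For this scenario (a real trigger cell, with {{q09}} surfacing behind it), {{generalGuard}} returns {{true}} and {{narrowGuard}} returns {{false}}; in the model, the narrow guard fires only when the trigger itself carries the fake-key timestamp.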




h3. Impact / correctness concerns



Beyond the immediate “isDelete failed” read failures, this points to a deeper correctness issue. The scan pipeline relies on the invariant that {{KeyValueHeap}} delivers cells in non-decreasing key order across all participating scanners. Because ROWCOL bloom + {{trySkipToNextColumn}} can surface a “smaller” key after a larger one (a backward jump), that ordering guarantee is effectively violated from the perspective of the heap's consumers, such as delete tracking and the query matchers.
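The ordering invariant, and how a stale physical position breaks it, can be illustrated with a toy k-way merge in Java. This is a hypothetical model, not {{KeyValueHeap}}: {{HeapOrderSketch}}, {{ToyScanner}}, {{demo}} and {{firstBackwardJump}} are illustrative names, keys have no timestamps, and a scanner's pending fake key stands for a seek target ({{q15}}) that was never physically performed. The heap orders scanners correctly by their current key; the problem is what a scanner returns after its fake key is consumed.

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical toy model of a merging heap; not HBase's KeyValueHeap.
// Keys are row/qualifier pairs only (no timestamps or types).
final class HeapOrderSketch {
    record Key(String row, String qualifier) implements Comparable<Key> {
        @Override public int compareTo(Key o) {
            int c = row.compareTo(o.row);
            return c != 0 ? c : qualifier.compareTo(o.qualifier);
        }
        @Override public String toString() { return row + "/" + qualifier; }
    }

    // One sorted "file". While a fake key is pending, the heap orders this scanner by
    // the fake key, but the physical cursor may still sit at an earlier key.
    static final class ToyScanner {
        private final List<Key> keys;
        private final boolean reseekAfterFake;
        private Key fake;
        private int pos;

        ToyScanner(List<Key> keys, Key fake, boolean reseekAfterFake) {
            this.keys = keys;
            this.fake = fake;
            this.reseekAfterFake = reseekAfterFake;
        }

        Key peek() {
            if (fake != null) return fake;
            return pos < keys.size() ? keys.get(pos) : null;
        }

        boolean hasFake() { return fake != null; }

        Key next() { return keys.get(pos++); }

        // Consume the pending fake key without emitting it.
        void consumeFake() {
            Key target = fake;
            fake = null;
            if (reseekAfterFake) {
                // Reseek: move the cursor to the first key >= the fake key.
                while (pos < keys.size() && keys.get(pos).compareTo(target) < 0) pos++;
            }
            // Otherwise the cursor stays at its stale physical position.
        }
    }

    // k-way merge: always take from the scanner whose current key is smallest.
    static List<Key> merge(List<ToyScanner> scanners) {
        PriorityQueue<ToyScanner> heap =
            new PriorityQueue<ToyScanner>(Comparator.comparing(ToyScanner::peek));
        for (ToyScanner s : scanners) if (s.peek() != null) heap.add(s);
        List<Key> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            ToyScanner s = heap.poll();
            if (s.hasFake()) s.consumeFake(); else out.add(s.next());
            if (s.peek() != null) heap.add(s); // re-insert with its new current key
        }
        return out;
    }

    // Index of the first key that sorts before its predecessor, or -1 if ordered.
    static int firstBackwardJump(List<Key> stream) {
        for (int i = 1; i < stream.size(); i++) {
            if (stream.get(i).compareTo(stream.get(i - 1)) < 0) return i;
        }
        return -1;
    }

    static List<Key> demo(boolean reseekAfterFake) {
        // File A physically holds q09 and q20, but its pending fake key sits at q15.
        ToyScanner a = new ToyScanner(List.of(new Key("r1", "q09"), new Key("r1", "q20")),
            new Key("r1", "q15"), reseekAfterFake);
        // File B has no fake key.
        ToyScanner b = new ToyScanner(List.of(new Key("r1", "q10"), new Key("r1", "q12")),
            null, reseekAfterFake);
        return merge(List.of(a, b));
    }

    public static void main(String[] args) {
        for (boolean reseek : new boolean[] {false, true}) {
            List<Key> out = demo(reseek);
            System.out.println("reseek=" + reseek + " -> " + out
                + ", first backward jump at " + firstBackwardJump(out));
        }
    }
}
{code}

With {{reseekAfterFake = false}}, {{demo}} emits r1/q10, r1/q12, r1/q09, r1/q20 in that order, and the first backward jump is at index 2; this is the kind of input that trips the delete tracker. With {{reseekAfterFake = true}}, the cursor is moved to the first key at or after the fake key, the output is r1/q10, r1/q12, r1/q20, and there is no backward jump ({{-1}}).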

 

We reproduced the bug with both a mini-cluster stress test and a small deterministic unit test. The failure occurs with the HBASE-28055 check in place and does not occur with the original HBASE-19863 check.


h3. Proposed fix

We should revert the HBASE-28055 change. Its claimed “performance improvement” comes from skipping the very reseek that is needed to restore heap ordering.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
