Xiao Liu created HBASE-30256:
--------------------------------

             Summary: FuzzyRowFilter mishandles the fuzzy-mask encoding on the no-unsafe path
                 Key: HBASE-30256
                 URL: https://issues.apache.org/jira/browse/HBASE-30256
             Project: HBase
          Issue Type: Bug
          Components: Filters
            Reporter: Xiao Liu
            Assignee: Xiao Liu
             Fix For: 2.7.0, 3.0.0-beta-2, 2.5.16, 2.6.7


h3. Problem
On the "no-unsafe" code path (HBasePlatformDependent.unaligned() == false), FuzzyRowFilter
silently returns incorrect/incomplete scan results: rows matching only via a wildcard position
are dropped, and next-cell hints are wrong (both forward and reverse scans). The same root cause
also corrupts cross-platform filter serialization and makes equals()/hashCode() inconsistent.
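
To make "matching only via a wildcard position" concrete, here is a minimal sketch of a fuzzy match in the public {0,1} mask form (0 = fixed, 1 = non-fixed). The class and method names are hypothetical and this is not the HBase implementation; as a simplifying assumption, rows shorter than the key are treated as non-matching.

```java
/** Hypothetical illustration class; not HBase code. */
final class FuzzyMatchSketch {
  /**
   * Returns true when every fixed position (mask byte 0) of the fuzzy key
   * equals the corresponding row byte. Positions with mask byte 1 are wildcards.
   * Simplifying assumption: rows shorter than the key never match.
   */
  static boolean matches(byte[] row, byte[] fuzzyKey, byte[] mask) {
    if (fuzzyKey.length != mask.length) {
      throw new IllegalArgumentException("key and mask must have equal length");
    }
    if (row.length < fuzzyKey.length) {
      return false;
    }
    for (int i = 0; i < fuzzyKey.length; i++) {
      boolean fixed = mask[i] == 0;
      // Only fixed positions are compared; wildcard positions accept any byte.
      if (fixed && row[i] != fuzzyKey[i]) {
        return false;
      }
    }
    return true;
  }
}
```

With key {1, 0, 3} and mask {0, 1, 0}, the row {1, 9, 3} matches through the wildcard at position 1. If the mask is mistakenly treated as all-fixed {0, 0, 0}, the same row is rejected, which is the kind of dropped row described above.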

h3. When it triggers
unaligned() returns false when sun.misc.Unsafe is unavailable (e.g. JDK 17+ strong encapsulation,
future Unsafe removal) or on non-x86 architectures. Platforms such as x86_64/aarch64 with Unsafe
accessible are unaffected, which is why this has stayed latent.

h3. Root cause
The fuzzy mask has two internal encodings, unsafe {fixed:-1, non-fixed:0} and no-unsafe
{fixed:0, non-fixed:1}, while the public/wire form is {0,1}. Multiple paths assumed the
unsafe encoding:
* filterCell's unconditional "mask >>= 2" turned a no-unsafe {0,1} into all-fixed {0,0};
* getNextForFuzzyRule received the wrong encoding and non-fixed key bytes were never zeroed;
* toByteArray()/getFuzzyKeys() leaked the internal form onto the wire;
* equals()/hashCode() compared the internal form (and hashCode() was identity-based, violating
  the equals/hashCode contract).
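
The effect of the unconditional shift can be reproduced in isolation. The sketch below uses a hypothetical demo class, not HBase code, and assumes the preprocessed unsafe mask starts out as {-1, 2} (the form named in the Fix section) before the shift is applied.

```java
import java.util.Arrays;

/** Hypothetical demo class; not part of HBase. */
final class MaskShiftDemo {
  /** Applies the "mask >>= 2" step to a copy of the given mask. */
  static byte[] shiftRightByTwo(byte[] mask) {
    byte[] out = mask.clone();
    for (int i = 0; i < out.length; i++) {
      // Arithmetic (sign-preserving) shift on the int-promoted value,
      // implicitly narrowed back to byte by the compound assignment.
      out[i] >>= 2;
    }
    return out;
  }

  /** Convenience for printing the shifted mask. */
  static String describe(byte[] mask) {
    return Arrays.toString(shiftRightByTwo(mask));
  }
}
```

Under that assumption, shifting {-1, 2} gives {-1, 0}, the unsafe encoding, because -1 stays -1 under an arithmetic shift and 2 becomes 0. Shifting a no-unsafe {0, 1} gives {0, 0}, so every position looks fixed and the wildcard is lost.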

h3. Fix
* Make mask handling path-aware: satisfiesNoUnsafe keeps {0,1}; the hint path converts
  {0,1} -> {-1,0}; non-fixed search-key bytes are zeroed on both paths. HBASE-30226's reverse
  same-row hint logic is preserved.
* preprocessMask normalizes a deserialized {-1,2} mask back to {0,1} on the no-unsafe path, so a
  filter serialized on an unsafe peer deserializes correctly on a no-unsafe one (and vice versa).
* toByteArray()/getFuzzyKeys() emit the canonical {0,1} form; equals()/hashCode() compare/hash that
  normalized form (content-based), so they are consistent with each other and with the wire bytes.
* Normalize the unsafe mask once in the constructor instead of mutating it per cell, so the stored
  mask is immutable during scanning.
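
The conversions in the list above can be sketched as small pure functions. This is an illustration under stated assumptions, not the actual patch: every class and method name below is hypothetical, and the real FuzzyRowFilter code is organized differently.

```java
import java.util.Arrays;

/** Hypothetical helper names throughout; a sketch, not the HBASE-30256 patch. */
final class MaskNormalizationSketch {
  /** Maps a deserialized {-1,2} mask back to canonical {0,1}; canonical bytes pass through. */
  static byte[] normalizeForNoUnsafe(byte[] mask) {
    byte[] out = mask.clone();
    for (int i = 0; i < out.length; i++) {
      if (out[i] == -1) {
        out[i] = 0; // fixed
      } else if (out[i] == 2) {
        out[i] = 1; // non-fixed
      }
    }
    return out;
  }

  /** Converts canonical {0 fixed, 1 non-fixed} into the hint encoding {-1 fixed, 0 non-fixed}. */
  static byte[] toHintEncoding(byte[] canonicalMask) {
    byte[] out = new byte[canonicalMask.length];
    for (int i = 0; i < out.length; i++) {
      out[i] = (byte) (canonicalMask[i] == 0 ? -1 : 0);
    }
    return out;
  }

  /** Returns a copy of the key with every non-fixed (mask byte 1) position set to zero. */
  static byte[] zeroNonFixed(byte[] key, byte[] canonicalMask) {
    byte[] out = key.clone();
    for (int i = 0; i < out.length && i < canonicalMask.length; i++) {
      if (canonicalMask[i] == 1) {
        out[i] = 0;
      }
    }
    return out;
  }

  /** Key/mask pair whose equals() and hashCode() both depend only on array contents. */
  static final class CanonicalPair {
    private final byte[] key;
    private final byte[] mask;

    CanonicalPair(byte[] key, byte[] canonicalMask) {
      this.key = key.clone();
      this.mask = canonicalMask.clone();
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof CanonicalPair)) {
        return false;
      }
      CanonicalPair other = (CanonicalPair) o;
      return Arrays.equals(key, other.key) && Arrays.equals(mask, other.mask);
    }

    @Override
    public int hashCode() {
      return 31 * Arrays.hashCode(key) + Arrays.hashCode(mask);
    }
  }
}
```

Each helper returns a new array rather than mutating its input, in the spirit of the last bullet: the stored mask stays unchanged while scanning. Using Arrays.equals and Arrays.hashCode together keeps the two methods consistent, since equal contents always produce equal hashes.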

h3. Tests
New TestFuzzyRowFilterWoUnsafe forces the no-unsafe path (mockStatic on HBasePlatformDependent) and
covers match/hint forward+reverse, multiple keys, cross-platform parseFrom, and serialization
round-trip; TestFuzzyRowFilter adds serialize-after-use and equals/hashCode regression tests. The
unsafe (production) path behavior is unchanged.


