Xiao Liu created HBASE-30256:
--------------------------------
Summary: FuzzyRowFilter mishandles the fuzzy-mask encoding on the no-unsafe path
Key: HBASE-30256
URL: https://issues.apache.org/jira/browse/HBASE-30256
Project: HBase
Issue Type: Bug
Components: Filters
Reporter: Xiao Liu
Assignee: Xiao Liu
Fix For: 2.7.0, 3.0.0-beta-2, 2.5.16, 2.6.7
h3. Problem
On the "no-unsafe" code path (HBasePlatformDependent.unaligned() == false),
FuzzyRowFilter silently returns incorrect/incomplete scan results: rows matching
only via a wildcard position are dropped, and next-cell hints are wrong (both
forward and reverse scans). The same root cause also corrupts cross-platform
filter serialization and makes equals()/hashCode() inconsistent.
h3. When it triggers
unaligned() returns false when sun.misc.Unsafe is unavailable (e.g. JDK 17+
strong encapsulation, future Unsafe removal) or on architectures without
unaligned-access support. x86_64/aarch64 with Unsafe accessible are unaffected,
which is why this has stayed latent.
h3. Root cause
The fuzzy mask has two internal encodings, unsafe {fixed:-1, non-fixed:0} and
no-unsafe {fixed:0, non-fixed:1}, while the public/wire form is {0,1}
(fixed:0, non-fixed:1). Multiple paths assumed the unsafe encoding:
* filterCell's unconditional "mask >>= 2" turned a no-unsafe {0,1} into
all-fixed {0,0};
* getNextForFuzzyRule received the wrong encoding, and non-fixed key bytes were
never zeroed;
* toByteArray()/getFuzzyKeys() leaked the internal form onto the wire;
* equals()/hashCode() compared the internal form (and hashCode() was
identity-based, violating the equals/hashCode contract).
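
To make the shift problem concrete, here is a minimal standalone sketch. It is not the HBase source: the class and method names are hypothetical, and the intermediate {-1,2} unsafe form is an assumption inferred from the mask values named in this issue. It shows that an arithmetic ">> 2" is harmless for an unsafe-style mask but erases every wildcard in a {0,1} mask, so a row that differs only at a wildcard position stops matching.

```java
import java.util.Arrays;

/** Illustrative model of the mask encodings described in this issue (hypothetical names). */
final class MaskEncodingSketch {

  /** Assumed unsafe-path preprocessing: wire {0,1} becomes {-1,2}. */
  static byte[] preprocessUnsafe(byte[] wireMask) {
    byte[] out = new byte[wireMask.length];
    for (int i = 0; i < wireMask.length; i++) {
      out[i] = (byte) (wireMask[i] == 0 ? -1 : 2);
    }
    return out;
  }

  /** Per-byte arithmetic shift: -1 stays -1, while 0, 1 and 2 all become 0. */
  static byte[] shiftRightByTwo(byte[] mask) {
    byte[] out = new byte[mask.length];
    for (int i = 0; i < mask.length; i++) {
      out[i] = (byte) (mask[i] >> 2);
    }
    return out;
  }

  /** No-unsafe style match: mask 0 means the byte must equal the key, 1 means wildcard. */
  static boolean matches(byte[] row, byte[] key, byte[] mask) {
    if (row.length < key.length) {
      return false;
    }
    for (int i = 0; i < key.length; i++) {
      if (mask[i] == 0 && row[i] != key[i]) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    byte[] wire = {0, 0, 1, 0}; // position 2 is a wildcard
    byte[] key = {1, 2, 0, 4};
    byte[] row = {1, 2, 9, 4};  // differs from the key only at the wildcard

    // Unsafe-style mask survives the shift with the wildcard intact.
    System.out.println(Arrays.toString(shiftRightByTwo(preprocessUnsafe(wire))));
    // The same shift applied to a {0,1} mask marks every position as fixed.
    System.out.println(Arrays.toString(shiftRightByTwo(wire)));

    System.out.println(matches(row, key, wire));                  // mask kept as {0,1}
    System.out.println(matches(row, key, shiftRightByTwo(wire))); // mask after the shift
  }
}
```

With the mask left as {0,1} the row matches; after the shift it does not, which is the "rows matching only via a wildcard position are dropped" symptom.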
h3. Fix
* Make mask handling path-aware: satisfiesNoUnsafe keeps {0,1}; the hint path
converts {0,1} -> {-1,0}; non-fixed search-key bytes are zeroed on both paths.
HBASE-30226's reverse same-row hint logic is preserved.
* preprocessMask normalizes a deserialized {-1,2} mask back to {0,1} on the
no-unsafe path, so a filter serialized on an unsafe peer deserializes correctly
on a no-unsafe one (and vice versa).
* toByteArray()/getFuzzyKeys() emit the canonical {0,1} form;
equals()/hashCode() compare/hash that normalized form (content-based), so they
are consistent with each other and with the wire bytes.
* Normalize the unsafe mask once in the constructor instead of mutating it per
cell, so the stored mask is immutable during scanning.
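
The canonical-form idea in the bullets above could look roughly like the sketch below. It is a simplified model, not the patch itself: CanonicalMaskSketch, fromWire and toWire are hypothetical names, and the boolean flag stands in for the result of HBasePlatformDependent.unaligned(). The mask is converted once in the constructor, the wire form is always {0,1}, and equals()/hashCode() work on that wire form.

```java
import java.util.Arrays;

/** Simplified model of a key/mask pair with a canonical wire form (hypothetical names). */
final class CanonicalMaskSketch {
  private final byte[] key;
  private final byte[] internalMask;    // never mutated after construction
  private final boolean unsafeEncoding; // stand-in for the platform check

  CanonicalMaskSketch(byte[] key, byte[] wireMask, boolean unsafeEncoding) {
    this.key = key.clone();
    this.unsafeEncoding = unsafeEncoding;
    this.internalMask = fromWire(wireMask, unsafeEncoding);
  }

  /** Accepts canonical {0,1} and also a leaked {-1,2}; 0 and -1 both mean "fixed". */
  static byte[] fromWire(byte[] wire, boolean unsafeEncoding) {
    byte[] out = new byte[wire.length];
    for (int i = 0; i < wire.length; i++) {
      boolean fixed = wire[i] == 0 || wire[i] == -1;
      if (unsafeEncoding) {
        out[i] = (byte) (fixed ? -1 : 0); // unsafe: fixed = -1, non-fixed = 0
      } else {
        out[i] = (byte) (fixed ? 0 : 1);  // no-unsafe: fixed = 0, non-fixed = 1
      }
    }
    return out;
  }

  /** Always emits the canonical form: 0 = fixed, 1 = non-fixed. */
  byte[] toWire() {
    byte[] out = new byte[internalMask.length];
    for (int i = 0; i < internalMask.length; i++) {
      boolean fixed = unsafeEncoding ? internalMask[i] == -1 : internalMask[i] == 0;
      out[i] = (byte) (fixed ? 0 : 1);
    }
    return out;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof CanonicalMaskSketch)) {
      return false;
    }
    CanonicalMaskSketch other = (CanonicalMaskSketch) o;
    // Compare content in canonical form, independent of the internal encoding.
    return Arrays.equals(key, other.key) && Arrays.equals(toWire(), other.toWire());
  }

  @Override
  public int hashCode() {
    // Hash the same canonical content that equals() compares.
    return 31 * Arrays.hashCode(key) + Arrays.hashCode(toWire());
  }
}
```

Under these assumptions, two instances built from the same {0,1} mask compare equal and hash identically whichever internal encoding they use, and an instance built from a leaked {-1,2} mask on the no-unsafe side equals one built from {0,1}.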
h3. Tests
New TestFuzzyRowFilterWoUnsafe forces the no-unsafe path (mockStatic on
HBasePlatformDependent) and covers match/hint forward+reverse, multiple keys,
cross-platform parseFrom, and serialization round-trip; TestFuzzyRowFilter adds
serialize-after-use and equals/hashCode regression tests. The unsafe
(production) path behavior is unchanged.