[ https://issues.apache.org/jira/browse/HBASE-30256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dávid Paksy updated HBASE-30256:
--------------------------------
Status: Patch Available (was: Open)
> FuzzyRowFilter mishandles the fuzzy-mask encoding on the no-unsafe path
> -----------------------------------------------------------------------
>
> Key: HBASE-30256
> URL: https://issues.apache.org/jira/browse/HBASE-30256
> Project: HBase
> Issue Type: Bug
> Components: Filters
> Reporter: Xiao Liu
> Assignee: Xiao Liu
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.7.0, 3.0.0-beta-2, 2.5.16, 2.6.7
>
>
> h3. Problem
> On the "no-unsafe" code path (HBasePlatformDependent.unaligned() == false), FuzzyRowFilter silently returns incorrect or incomplete scan results: rows that differ from the fuzzy key only at wildcard (non-fixed) positions are dropped, and next-cell hints are wrong for both forward and reverse scans. The same root cause also corrupts cross-platform filter serialization and makes equals()/hashCode() inconsistent.
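A minimal, simplified model of the dropped-row symptom, assuming only the public mask semantics (0 = fixed, 1 = non-fixed). A row that differs from the fuzzy key only at wildcard positions matches under the correct mask but is rejected once every position is treated as fixed. FuzzyMatchSketch and matches are hypothetical names; the sketch ignores next-cell hints and the real filter's SatisfiesCode results.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical, simplified model of fuzzy row matching with the public {0,1} mask. */
final class FuzzyMatchSketch {
  // Public/wire encoding assumed here: 0 = fixed position, 1 = non-fixed (wildcard) position.
  static boolean matches(byte[] row, byte[] fuzzyKey, byte[] mask) {
    if (row.length < fuzzyKey.length) {
      return false; // simplification: a row shorter than the fuzzy key never matches
    }
    for (int i = 0; i < fuzzyKey.length; i++) {
      boolean fixed = mask[i] == 0;
      if (fixed && row[i] != fuzzyKey[i]) {
        return false; // a fixed position must match the fuzzy key exactly
      }
    }
    return true; // every fixed position matched; wildcard positions were ignored
  }

  public static void main(String[] args) {
    byte[] fuzzyKey = "??_2024".getBytes(StandardCharsets.US_ASCII);
    byte[] correctMask = {1, 1, 0, 0, 0, 0, 0}; // first two bytes are wildcards
    byte[] allFixedMask = new byte[correctMask.length]; // zero-filled: every position fixed
    byte[] row = "ab_2024".getBytes(StandardCharsets.US_ASCII);

    System.out.println("correct mask:   " + matches(row, fuzzyKey, correctMask));  // true
    System.out.println("all-fixed mask: " + matches(row, fuzzyKey, allFixedMask)); // false
  }
}
```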
> h3. When it triggers
> unaligned() returns false when sun.misc.Unsafe is unavailable (e.g. under JDK 17+ strong encapsulation, or after Unsafe's future removal) or on architectures not recognized as supporting unaligned access. x86_64 and aarch64 with Unsafe accessible are unaffected, which is why the bug has stayed latent.
> h3. Root cause
> The fuzzy mask has two internal encodings: unsafe {fixed:-1, non-fixed:0} and no-unsafe {fixed:0, non-fixed:1}, while the public/wire form is {0,1}. Multiple code paths assumed the unsafe encoding:
> * filterCell's unconditional "mask >>= 2" turned a no-unsafe {0,1} mask into an all-fixed {0,0} mask;
> * getNextForFuzzyRule received the wrong encoding, and non-fixed key bytes were never zeroed on the no-unsafe path;
> * toByteArray()/getFuzzyKeys() leaked the internal form onto the wire;
> * equals()/hashCode() compared the internal form, and hashCode() was identity-based, violating the equals/hashCode contract.
> h3. Fix
> * Make mask handling path-aware: satisfiesNoUnsafe keeps {0,1}; the hint path converts {0,1} -> {-1,0}; non-fixed search-key bytes are zeroed on both paths. HBASE-30226's reverse same-row hint logic is preserved.
> * preprocessMask normalizes a deserialized {-1,2} mask back to {0,1} on the no-unsafe path, so a filter serialized on an unsafe peer deserializes correctly on a no-unsafe one (and vice versa).
> * toByteArray()/getFuzzyKeys() emit the canonical {0,1} form; equals()/hashCode() compare and hash that normalized form (content-based), so they are consistent with each other and with the wire bytes.
> * Normalize the unsafe mask once in the constructor instead of mutating it per cell, so the stored mask is immutable during scanning.
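A sketch of the path-aware conversions described in the fix list, written as hypothetical standalone helpers rather than HBase's actual methods (normalizeDeserialized, toHintEncoding, and zeroNonFixed are illustrative names). It assumes the canonical {0,1} form is the source of truth and that a mask consisting only of -1 and 2 came from an unsafe peer.

```java
import java.util.Arrays;

/** Hypothetical helpers sketching path-aware fuzzy-mask handling. Not HBase's actual code. */
final class FuzzyMaskSketch {
  // Canonical/public form: 0 = fixed, 1 = non-fixed.

  /** No-unsafe path: a mask that arrived in the unsafe {-1,2} form is mapped back to {0,1}. */
  static byte[] normalizeDeserialized(byte[] mask) {
    for (byte b : mask) {
      if (b != -1 && b != 2) {
        return mask.clone(); // not the unsafe form: assume it is already canonical
      }
    }
    byte[] out = new byte[mask.length];
    for (int i = 0; i < mask.length; i++) {
      out[i] = (byte) (mask[i] == -1 ? 0 : 1); // -1 -> 0 (fixed), 2 -> 1 (non-fixed)
    }
    return out;
  }

  /** Hint path: convert canonical {0,1} to {-1,0} (fixed = -1, non-fixed = 0). */
  static byte[] toHintEncoding(byte[] canonicalMask) {
    byte[] out = new byte[canonicalMask.length];
    for (int i = 0; i < canonicalMask.length; i++) {
      out[i] = (byte) (canonicalMask[i] == 0 ? -1 : 0);
    }
    return out;
  }

  /** Returns a copy of the search key with bytes at non-fixed positions set to 0. */
  static byte[] zeroNonFixed(byte[] fuzzyKey, byte[] canonicalMask) {
    byte[] out = fuzzyKey.clone();
    for (int i = 0; i < out.length; i++) {
      if (canonicalMask[i] == 1) {
        out[i] = 0;
      }
    }
    return out;
  }

  public static void main(String[] args) {
    byte[] fromUnsafePeer = {-1, 2, 2, -1};
    byte[] canonical = normalizeDeserialized(fromUnsafePeer);
    System.out.println(Arrays.toString(canonical));                    // [0, 1, 1, 0]
    System.out.println(Arrays.toString(toHintEncoding(canonical)));    // [-1, 0, 0, -1]
    byte[] key = {'a', '?', '?', 'z'};
    System.out.println(Arrays.toString(zeroNonFixed(key, canonical))); // [97, 0, 0, 122]
  }
}
```

Each helper returns a new array instead of mutating its input, mirroring the fix's goal that the stored mask is not modified while scanning.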
> h3. Tests
> The new TestFuzzyRowFilterWoUnsafe forces the no-unsafe path (mockStatic on HBasePlatformDependent) and covers matching and hints for forward and reverse scans, multiple fuzzy keys, cross-platform parseFrom, and serialization round-trips. TestFuzzyRowFilter adds serialize-after-use and equals/hashCode regression tests. Behavior on the unsafe (production) path is unchanged.
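The equals()/hashCode() and canonical-wire-form properties that these regression tests target can be illustrated without Mockito or any HBase classes. CanonicalFuzzyRule is a hypothetical value class, not FuzzyRowFilter; it assumes incoming masks are either the public {0,1} form or the unsafe pre-shift {-1,2} form.

```java
import java.util.Arrays;

/**
 * Hypothetical value class: stores a fuzzy key and its mask in canonical {0,1} form,
 * with content-based equals()/hashCode(). Not HBase's FuzzyRowFilter.
 */
final class CanonicalFuzzyRule {
  private final byte[] key;
  private final byte[] mask; // canonical: 0 = fixed, 1 = non-fixed

  CanonicalFuzzyRule(byte[] key, byte[] anyFormMask) {
    this.key = key.clone();
    this.mask = toCanonical(anyFormMask); // normalized once, never mutated afterwards
  }

  // Maps the unsafe pre-shift form {-1,2} to {0,1}; assumes input is that form or already canonical.
  private static byte[] toCanonical(byte[] m) {
    byte[] out = m.clone();
    for (int i = 0; i < out.length; i++) {
      if (out[i] == -1) {
        out[i] = 0;
      } else if (out[i] == 2) {
        out[i] = 1;
      }
    }
    return out;
  }

  /** Mask bytes that would be serialized: always the canonical form. */
  byte[] maskForWire() {
    return mask.clone();
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof CanonicalFuzzyRule)) {
      return false;
    }
    CanonicalFuzzyRule other = (CanonicalFuzzyRule) o;
    return Arrays.equals(key, other.key) && Arrays.equals(mask, other.mask);
  }

  @Override
  public int hashCode() {
    return 31 * Arrays.hashCode(key) + Arrays.hashCode(mask); // content-based, consistent with equals
  }

  public static void main(String[] args) {
    byte[] key = {'a', 0, 'z'};
    CanonicalFuzzyRule fromUnsafe = new CanonicalFuzzyRule(key, new byte[] {-1, 2, -1});
    CanonicalFuzzyRule fromNoUnsafe = new CanonicalFuzzyRule(key, new byte[] {0, 1, 0});
    System.out.println(fromUnsafe.equals(fromNoUnsafe));                                      // true
    System.out.println(fromUnsafe.hashCode() == fromNoUnsafe.hashCode());                     // true
    System.out.println(Arrays.equals(fromUnsafe.maskForWire(), fromNoUnsafe.maskForWire())); // true
  }
}
```

Deriving the wire bytes, equals(), and hashCode() from the same normalized fields keeps them consistent with each other; relying on Object.hashCode() or comparing byte[] fields by reference would instead make two content-identical rules compare unequal.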
--
This message was sent by Atlassian Jira
(v8.20.10#820010)