[ 
https://issues.apache.org/jira/browse/HBASE-30256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dávid Paksy updated HBASE-30256:
--------------------------------
    Status: Patch Available  (was: Open)

> FuzzyRowFilter mishandles the fuzzy-mask encoding on the no-unsafe path
> -----------------------------------------------------------------------
>
>                 Key: HBASE-30256
>                 URL: https://issues.apache.org/jira/browse/HBASE-30256
>             Project: HBase
>          Issue Type: Bug
>          Components: Filters
>            Reporter: Xiao Liu
>            Assignee: Xiao Liu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.7.0, 3.0.0-beta-2, 2.5.16, 2.6.7
>
>
> h3. Problem
> On the "no-unsafe" code path (HBasePlatformDependent.unaligned() == false),
> FuzzyRowFilter silently returns incorrect/incomplete scan results: rows
> matching only via a wildcard position are dropped, and next-cell hints are
> wrong (both forward and reverse scans). The same root cause also corrupts
> cross-platform filter serialization and makes equals()/hashCode()
> inconsistent.
> h3. When it triggers
> unaligned() returns false when sun.misc.Unsafe is unavailable (e.g. JDK 17+
> strong encapsulation, future Unsafe removal) or on architectures that do not
> support unaligned access. x86_64/aarch64 with Unsafe accessible are
> unaffected, which is why this has stayed latent.
> h3. Root cause
> The fuzzy mask has two internal encodings, unsafe {fixed:-1, non-fixed:0} and
> no-unsafe {fixed:0, non-fixed:1}, while the public/wire form is {0,1}.
> Multiple paths assumed the unsafe encoding:
> * filterCell's unconditional "mask >>= 2" turned a no-unsafe {0,1} into
>   all-fixed {0,0};
> * getNextForFuzzyRule received the wrong encoding and non-fixed key bytes
>   were never zeroed;
> * toByteArray()/getFuzzyKeys() leaked the internal form onto the wire;
> * equals()/hashCode() compared the internal form (and hashCode() was
>   identity-based, violating the equals/hashCode contract).
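A minimal sketch of the shift problem described above, not the HBase source. MaskShiftSketch, shiftRightTwo and matchesNoUnsafe are hypothetical names, and the sketch assumes the unsafe path holds the mask as {-1,2} before the shift (the form the Fix section mentions for a mask serialized on an unsafe peer). It shows that the same signed shift yields the intended {-1,0} for that form but collapses a no-unsafe {0,1} mask to all-fixed {0,0}, after which a row that matches only through a wildcard position is rejected.

```java
import java.util.Arrays;

/** Illustrative only; names and structure are assumptions, not HBase code. */
final class MaskShiftSketch {

  /** Applies a per-byte signed "mask >>= 2" to a copy of the mask. */
  static byte[] shiftRightTwo(byte[] mask) {
    byte[] out = mask.clone();
    for (int i = 0; i < out.length; i++) {
      out[i] >>= 2; // arithmetic shift: -1 stays -1; 0, 1 and 2 all become 0
    }
    return out;
  }

  /** Matches a row against a key using the {fixed: 0, non-fixed: 1} encoding. */
  static boolean matchesNoUnsafe(byte[] row, byte[] key, byte[] mask) {
    if (row.length < key.length) {
      return false;
    }
    for (int i = 0; i < key.length; i++) {
      // Only fixed positions (mask byte 0) must equal the key byte.
      if (mask[i] == 0 && row[i] != key[i]) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    byte[] unsafeForm = {-1, 2, -1}; // assumed pre-shift unsafe form
    byte[] noUnsafeForm = {0, 1, 0}; // canonical / no-unsafe form

    System.out.println(Arrays.toString(shiftRightTwo(unsafeForm)));   // [-1, 0, -1]
    System.out.println(Arrays.toString(shiftRightTwo(noUnsafeForm))); // [0, 0, 0]

    byte[] key = {'a', 0, 'c'};
    byte[] row = {'a', 'x', 'c'}; // differs from the key only at the wildcard
    System.out.println(matchesNoUnsafe(row, key, noUnsafeForm));                // true
    System.out.println(matchesNoUnsafe(row, key, shiftRightTwo(noUnsafeForm))); // false
  }
}
```

The last line is the symptom from the Problem section in miniature: once the wildcard byte has been shifted to 0, position 1 is treated as fixed and the row is dropped.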
> h3. Fix
> * Make mask handling path-aware: satisfiesNoUnsafe keeps {0,1}; the hint path
>   converts {0,1} -> {-1,0}; non-fixed search-key bytes are zeroed on both
>   paths. HBASE-30226's reverse same-row hint logic is preserved.
> * preprocessMask normalizes a deserialized {-1,2} mask back to {0,1} on the
>   no-unsafe path, so a filter serialized on an unsafe peer deserializes
>   correctly on a no-unsafe one (and vice versa).
> * toByteArray()/getFuzzyKeys() emit the canonical {0,1} form;
>   equals()/hashCode() compare/hash that normalized form (content-based), so
>   they are consistent with each other and with the wire bytes.
> * Normalize the unsafe mask once in the constructor instead of mutating it
>   per cell, so the stored mask is immutable during scanning.
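The wire-form normalization, key zeroing, and content-based equality in the list above can be sketched as follows. This is an illustration under assumptions, not the patch: CanonicalMaskSketch, normalizeWireMask and zeroNonFixed are hypothetical names, and rejecting unknown mask bytes with an exception is a choice made here for clarity, not documented HBase behavior.

```java
import java.util.Arrays;

/** Illustrative only; names and structure are assumptions, not HBase code. */
final class CanonicalMaskSketch {

  /** Maps a {-1,2} or {0,1} mask to the canonical {fixed: 0, non-fixed: 1} form. */
  static byte[] normalizeWireMask(byte[] mask) {
    byte[] out = new byte[mask.length];
    for (int i = 0; i < mask.length; i++) {
      switch (mask[i]) {
        case -1: // fixed, as written by an unsafe peer
        case 0:  // fixed, canonical
          out[i] = 0;
          break;
        case 2:  // non-fixed, as written by an unsafe peer
        case 1:  // non-fixed, canonical
          out[i] = 1;
          break;
        default:
          throw new IllegalArgumentException("unexpected mask byte: " + mask[i]);
      }
    }
    return out;
  }

  /** Returns a copy of the key with every non-fixed position set to zero. */
  static byte[] zeroNonFixed(byte[] key, byte[] canonicalMask) {
    byte[] out = key.clone();
    for (int i = 0; i < out.length && i < canonicalMask.length; i++) {
      if (canonicalMask[i] == 1) {
        out[i] = 0;
      }
    }
    return out;
  }

  public static void main(String[] args) {
    byte[] fromUnsafePeer = normalizeWireMask(new byte[] {-1, 2, -1});
    byte[] fromNoUnsafePeer = normalizeWireMask(new byte[] {0, 1, 0});

    // Content-based comparison and hashing over the canonical form agree.
    System.out.println(Arrays.equals(fromUnsafePeer, fromNoUnsafePeer)); // true
    System.out.println(
        Arrays.hashCode(fromUnsafePeer) == Arrays.hashCode(fromNoUnsafePeer)); // true

    byte[] key = {'a', 'x', 'c'};
    System.out.println(Arrays.toString(zeroNonFixed(key, fromUnsafePeer))); // [97, 0, 99]
  }
}
```

Comparing and hashing the canonical bytes with Arrays.equals and Arrays.hashCode keeps the two methods consistent with each other by construction, which is the property the third bullet asks for.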
> h3. Tests
> New TestFuzzyRowFilterWoUnsafe forces the no-unsafe path (mockStatic on
> HBasePlatformDependent) and covers match/hint forward+reverse, multiple keys,
> cross-platform parseFrom, and serialization round-trip; TestFuzzyRowFilter
> adds serialize-after-use and equals/hashCode regression tests. The unsafe
> (production) path behavior is unchanged.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)