[
https://issues.apache.org/jira/browse/HBASE-29889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18059931#comment-18059931
]
JinHyuk Kim edited comment on HBASE-29889 at 2/20/26 4:03 PM:
--------------------------------------------------------------
I also reviewed two external libraries,
[Zero-Allocation-Hashing|https://github.com/OpenHFT/Zero-Allocation-Hashing#]
(ZAH) and [hash4j|https://github.com/dynatrace-oss/hash4j], to see whether they
could be good alternatives.
!549837052-2f3a2c79-23a9-416d-abc6-72c4a77c19e2.png|width=772,height=951!
hash4j performs very well when the input is given as a *raw byte array*.
In our case, however, I provide data through the
[HashKey|https://github.com/apache/hbase/blob/master/hbase-common/src/main/java/org/apache/hadoop/hbase/util/HashKey.java]
interface using a streaming approach.
When I tested hash4j with this streaming method, the performance dropped
significantly, so it was not a good fit.
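To illustrate the two input styles being compared, here is a minimal sketch. It is not the code from this PR and does not use the hash4j or HBase APIs: {{ByteSource}} is a hypothetical stand-in for a positional accessor in the spirit of HashKey, and FNV-1a is used only as a small placeholder hash instead of XXH3. Both paths produce the same value; the difference is that the streaming path pays for an accessor call per byte, which is one plausible reason a library tuned for bulk array reads loses its advantage.

```java
import java.nio.charset.StandardCharsets;

public class StreamingHashSketch {
    /** Hypothetical positional accessor (assumption; not the real HashKey class). */
    interface ByteSource {
        byte get(int pos);
        int length();
    }

    // Standard 64-bit FNV-1a constants; FNV-1a is a placeholder for XXH3 here.
    private static final long FNV_OFFSET = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    /** Raw-array path: the hasher reads the backing array directly. */
    static long hashArray(byte[] data, int off, int len) {
        long h = FNV_OFFSET;
        for (int i = off; i < off + len; i++) {
            h ^= (data[i] & 0xff);
            h *= FNV_PRIME;
        }
        return h;
    }

    /** Streaming path: every byte is fetched through the accessor. */
    static long hashStreaming(ByteSource src) {
        long h = FNV_OFFSET;
        for (int i = 0; i < src.length(); i++) {
            h ^= (src.get(i) & 0xff);
            h *= FNV_PRIME;
        }
        return h;
    }

    /** Wraps a slice of a byte array as a ByteSource. */
    static ByteSource wrap(byte[] data, int off, int len) {
        return new ByteSource() {
            @Override public byte get(int pos) { return data[off + pos]; }
            @Override public int length() { return len; }
        };
    }

    public static void main(String[] args) {
        byte[] row = "row-0001".getBytes(StandardCharsets.UTF_8);
        long fromArray = hashArray(row, 0, row.length);
        long fromStream = hashStreaming(wrap(row, 0, row.length));
        System.out.println("hashes match: " + (fromArray == fromStream));
    }
}
```

A benchmark that only feeds {{byte[]}} inputs exercises the first method's access pattern, so it can overstate how a library will behave once it is driven through an accessor like the second.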
!552761600-0407cb2c-445b-4e08-83d0-9b4dde7692f6.png|width=778,height=973!
ZAH was also slower than the implementation in this PR,
especially for small and medium input sizes.
Still, it has some advantages in terms of maintainability, so I created a
separate PR with a ZAH-based version as well in case anyone prefers that
direction.
The benchmark code for these comparisons is here:
[https://github.com/jinhyukify/xxh3-benchmark/tree/hash4j]
> Add XXH3 Hash Support to Bloom Filter
> -------------------------------------
>
> Key: HBASE-29889
> URL: https://issues.apache.org/jira/browse/HBASE-29889
> Project: HBase
> Issue Type: New Feature
> Components: regionserver
> Reporter: JinHyuk Kim
> Assignee: JinHyuk Kim
> Priority: Major
> Labels: pull-request-available
> Attachments: 549837052-2f3a2c79-23a9-416d-abc6-72c4a77c19e2.png,
> 552761600-0407cb2c-445b-4e08-83d0-9b4dde7692f6.png
>
>
> h2. Summary
> Added *XXH3* as a new hashing option for the HBase Bloom Filter.
> h2. Background
> The hash functions currently used in HBase Bloom Filters (Jenkins, Murmur, and
> Murmur3) were designed years ago and do not fully exploit modern CPU
> architectures.
> [*XXH3*|https://github.com/Cyan4973/xxHash], on the other hand, is optimized
> for today’s CPUs with wide execution units and fast unaligned memory access,
> resulting in significantly faster hashing performance.
> h2. What Was Done
> * Implemented XXH3 Hashing and integrated it as an available hash type for
> Bloom Filters.
> * Conducted benchmark tests comparing XXH3 with existing hash algorithms.
> ** Benchmark test code is available in
> [jinhyukify/xxh3-benchmark|https://github.com/jinhyukify/xxh3-benchmark].
> * *Benchmark Results:*
> ** https://docs.google.com/document/d/1KcCLz3nnkDNgUUMpTIWOwvrY8kgpOJFKZHboRNt2mx0/edit?usp=sharing
> h2. Expected Impact
> * *Faster Bloom filter lookups* across all Bloom types during client-side
> read paths.
> * *Slight improvement in Bloom filter write performance* during HFile
> creation and compaction, thanks to the lower hashing overhead of XXH3.