[ 
https://issues.apache.org/jira/browse/HBASE-29889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18059931#comment-18059931
 ] 

JinHyuk Kim edited comment on HBASE-29889 at 2/20/26 4:03 PM:
--------------------------------------------------------------

I also reviewed two external libraries, 
[Zero-Allocation-Hashing|https://github.com/OpenHFT/Zero-Allocation-Hashing#] 
(ZAH) and [hash4j|https://github.com/dynatrace-oss/hash4j], to see whether they 
could be good alternatives.

!549837052-2f3a2c79-23a9-416d-abc6-72c4a77c19e2.png|width=772,height=951!

Hash4j performs very well when the input is given as a {*}raw byte array{*}.

In our case, however, I provide data through the 
[HashKey|https://github.com/apache/hbase/blob/master/hbase-common/src/main/java/org/apache/hadoop/hbase/util/HashKey.java]
 interface using a streaming approach.

When I tested hash4j with this streaming method, the performance dropped 
significantly, so it was not a good fit.
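
To make the two input styles concrete, here is a minimal sketch. It is not the benchmark code and it does not use the hash4j or ZAH APIs. {{ByteAccessor}} is a hypothetical stand-in for a HashKey-style accessor (one byte per call), and FNV-1a is only a placeholder hash chosen because it is short. Both paths return the same value; the difference is that the array path sees contiguous memory, which a real implementation could read several bytes at a time, while the accessor path pays one interface call per byte. That is one plausible reason for the slowdown, not a measured conclusion.

```java
import java.nio.charset.StandardCharsets;

final class HashInputStyles {

  /** Hypothetical stand-in for a HashKey-style accessor: one byte per call. */
  interface ByteAccessor {
    byte get(int pos);

    int length();
  }

  // Standard 64-bit FNV-1a constants (placeholder hash, not XXH3).
  private static final long FNV_OFFSET = 0xcbf29ce484222325L;
  private static final long FNV_PRIME = 0x100000001b3L;

  /** Raw byte array input: the hash function sees contiguous memory. */
  static long hashArray(byte[] data, int off, int len) {
    long h = FNV_OFFSET;
    for (int i = off; i < off + len; i++) {
      h ^= (data[i] & 0xff);
      h *= FNV_PRIME;
    }
    return h;
  }

  /** Accessor input: every byte is fetched through an interface call. */
  static long hashAccessor(ByteAccessor key) {
    long h = FNV_OFFSET;
    for (int i = 0; i < key.length(); i++) {
      h ^= (key.get(i) & 0xff);
      h *= FNV_PRIME;
    }
    return h;
  }

  /** Exposes a slice of a byte array through the accessor interface. */
  static ByteAccessor wrap(final byte[] data, final int off, final int len) {
    return new ByteAccessor() {
      @Override
      public byte get(int pos) {
        return data[off + pos];
      }

      @Override
      public int length() {
        return len;
      }
    };
  }

  public static void main(String[] args) {
    byte[] row = "row-0001".getBytes(StandardCharsets.UTF_8);
    long fromArray = hashArray(row, 0, row.length);
    long fromAccessor = hashAccessor(wrap(row, 0, row.length));
    System.out.println("same hash: " + (fromArray == fromAccessor));
  }
}
```

A fair comparison between libraries has to feed each one through the same path that the Bloom filter code would actually use, which in this case is the accessor path.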

!552761600-0407cb2c-445b-4e08-83d0-9b4dde7692f6.png|width=778,height=973!

ZAH was also slower than the implementation in this PR, especially for small 
and medium input sizes.

Still, it has some advantages in terms of maintainability, so I created a 
separate PR with a ZAH-based version as well in case anyone prefers that 
direction.

The benchmark code for these tests is available here: 
[https://github.com/jinhyukify/xxh3-benchmark/tree/hash4j]



> Add XXH3 Hash Support to Bloom Filter
> -------------------------------------
>
>                 Key: HBASE-29889
>                 URL: https://issues.apache.org/jira/browse/HBASE-29889
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: JinHyuk Kim
>            Assignee: JinHyuk Kim
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: 549837052-2f3a2c79-23a9-416d-abc6-72c4a77c19e2.png, 
> 552761600-0407cb2c-445b-4e08-83d0-9b4dde7692f6.png
>
>
> h2. Summary
> Added *XXH3* as a new hashing option for the HBase Bloom Filter.
> h2. Background
> Existing hash functions used in HBase Bloom Filters (Jenkins, Murmur and 
> Murmur3) were designed years ago and do not fully leverage modern CPU 
> architectures.
> [*XXH3*|https://github.com/Cyan4973/xxHash], on the other hand, is optimized 
> for today’s CPUs with wide execution units and fast unaligned memory access, 
> resulting in significantly faster hashing performance.
> h2. What Was Done
>  * Implemented XXH3 Hashing and integrated it as an available hash type for 
> Bloom Filters.
>  * Conducted benchmark tests comparing XXH3 with existing hash algorithms.
>  ** Benchmark test code is available in 
> [jinhyukify/xxh3-benchmark|https://github.com/jinhyukify/xxh3-benchmark].
>  * *Benchmark Results:*
>  ** 
> https://docs.google.com/document/d/1KcCLz3nnkDNgUUMpTIWOwvrY8kgpOJFKZHboRNt2mx0/edit?usp=sharing
> h2. Expected Impact
>  * *Faster Bloom filter lookups* across all Bloom types during client-side 
> read paths.
>  * *Slight improvement in Bloom filter write performance* during HFile 
> creation and compaction, thanks to the lower hashing overhead of XXH3.


