[
https://issues.apache.org/jira/browse/FLINK-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414110#comment-15414110
]
Stefan Richter commented on FLINK-3623:
---------------------------------------
In my opinion, our implementation of Murmur might be too complicated and
computationally expensive for simply hashing an int. Most parts of the used
algorithm are only required for arbitrary byte[] keys. What matters for mixing
the bits of an int is the finalizer part of Murmur3, which is:
h ^= h >> 16;
h *= 0x85ebca6b;
h ^= h >> 13;
h *= 0xc2b2ae35;
h ^= h >> 16;
If I remember correctly, the purpose of the remaining code is to generate the
intermediate hash from blocks over variable size byte[], whereas the purpose of
the finalizer is to actually increase entropy and reduce collisions of the
intermediate hash. Restricting the code to these lines could speed up hashing
significantly, which might be even more relevant in case hash calculations are
also used to determine keygroups.
> Adjust MurmurHash algorithm
> ---------------------------
>
> Key: FLINK-3623
> URL: https://issues.apache.org/jira/browse/FLINK-3623
> Project: Flink
> Issue Type: Improvement
> Components: Distributed Coordination
> Affects Versions: 1.1.0
> Reporter: Greg Hogan
> Assignee: Greg Hogan
> Priority: Trivial
> Fix For: 1.1.0
>
>
> Flink's MurmurHash implementation differs from the published algorithm.
> From Flink's MathUtils.java:
> {code}
> code *= 0xe6546b64;
> {code}
> The Murmur3_32 algorithm as described by
> [Wikipedia|https://en.wikipedia.org/wiki/MurmurHash]:
> {code}
> m ← 5
> n ← 0xe6546b64
> hash ← hash × m + n
> {code}
> and in Guava's Murmur3_32HashFunction.java:
> {code}
> h1 = h1 * 5 + 0xe6546b64;
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)