[Collections] Implementation of EnhancedDoubleHasher

Juan Manuel Gimeno Illa Fri, 17 May 2024 23:23:14 -0700

The code in the implementation of the EnhancedDoubleHasher is based on the
wikipedia article https://en.wikipedia.org/wiki/Double_hashing.


The code that appears in the article is:

struct key; /// Opaque
/// Use other data types when needed. (Must be unsigned for guaranteed 
wrapping.)
extern unsigned int h1(struct key const *), h2(struct key const *);

/// Calculate k hash values from two underlying hash functions
/// h1() and h2() using enhanced double hashing.  On return,
///     hashes[i] = h1(x) + i*h2(x) + (i*i*i - i)/6.
/// Takes advantage of automatic wrapping (modular reduction)
/// of unsigned types in C.
void ext_dbl_hash(struct key const *x, unsigned int hashes[], unsigned int n)
{
    unsigned int a = h1(x), b = h2(x), i;

    for (i = 0; i < n; i++) {
        hashes[i] = a;
        a += b; // Add quadratic difference to get cubic
        b += i; // Add linear difference to get quadratic
                // i++ adds constant difference to get linear
    }
}

And it is based on code from the paper "Bloom Filters in Probabilistic 
Verification" by Peter C.
Dillinger and Panagiotis Manolios 
(https://www.khoury.northeastern.edu/~pete/pub/bloom-filters-verification.pdf)

But the code that appears on that paper is:

x, y := a(δ) MOD m, b(δ) MOD m
f[0] := x
for i := 1 .. k-1
    x := (x + y) MOD m
    y := (y + i) MOD m
    f[i] := x

which does the assignments to x, y (a, b in the wikipedia article) in a 
different order.

The same algorithm appears the thesis "Adaptive Approximate State Storage"
(http://peterd.org/pcd-diss.pdf)

algorithm " Enhanced DoubleHashing"

input x : "the element being added or queried"
input m : "number of bits in bit vector"
input k : "number of indices to compute"
output g[0 .. k - 1] : "array of indices"
uses h1, h2 : "hash functions"

begin
    a := h1(x)
    b := h2(x)
    g[0] := a
    for i := 1 .. k - 1
    begin
        a := (a + b) MOD m
        b := (b + i) MOD m
        g[i] := a
    end
end

It's easy to see that the computation the enhanced double hash computes

    hash[i] = h1(x) + i * h2(x) + (i^3 - i) / 6

if we use a=h1(x) and b=h2(x), generates the sequence:

    a, a + b, a + 2 * b + 1, a + 3 * b + 4, a + 4 * b + 10, ...

And both your implementation and the wikipedia article generates

    a, a + b, a + 2 * b, a + 3 * b + 1, a + 4 * b + 4, ...

So the last term in the sum lags one step behind.

The solution, I think, is as simple as swapping the assignments of index and 
inc in your implementation.

I suppose the impact of this is nil, but I'm not an expert for judging that.

Thanks,

Juan Manuel Gimeno

[Collections] Implementation of EnhancedDoubleHasher

Reply via email to