I am quite happy to read a discussion about hashing in Guile as this
hash been one of the few real pain points in my experience: the hash is
not behaving as one would expect from a generic hash function. What
follows is not directly related to the question asked, so I hope this
message is not misplaced.

>  Two different lists, same hash values.  Weird.  But it's true:
>
>  scheme@(guile-user)> (= (hash '(1 2 3 (4)) most-positive-fixnum)
>                          (hash '(1 2 3 (4 5)) most-positive-fixnum))
>    => #t
>
> and it gets worse than that:
>
>  scheme@(guile-user)> (= (hash '(1 2 3 4 5) most-positive-fixnum)
>                          (hash '(1 2 3 4 5 6) most-positive-fixnum))
>    => #t

As noted before, there is the issue with the recursion being limited to
a rather low level, but hashing is also invariant across permutation of
its inputs:

```
scheme@(guile-user)> (= (hash '(1 2) most-positive-fixnum)
                        (hash '(2 1) most-positive-fixnum))
$7 = #t
```

And for some data types, it is simply broken:

```
scheme@(guile-user)> (= (hash (make-u8vector 3 0) most-positive-fixnum)
                        (hash (make-u8vector 3 1) most-positive-fixnum))
$6 = #t
```

The same holds for `f32vector` and `f64vector` which are implemented on
top of `u8vector`. Those two specific cases caused very-hard-to-debug
issues in two of my projects...

> One of the design goals for ‘hash’ is that it must be constant-time.
> Thus, by definition, it cannot traverse entire lists or vectors.

I don't know what is the story behind this function. I don't know why it
is required to be constant-time. What I do know is that nowadays,
non-cryptographic hash functions are so fast that hashing is rarely an
issue [1], and that many languages have converged to using SipHash for
hash-tables since it is fast and protects against hash flooding (if
hash-table based on a poorly designed hash is exposed through a public
API, an attacker may be able to forge malicious inputs that all hash to
the same target bucket).

> I think what we probably want is a strong collision resistant hash.  A
> strong crypto hash would do the job for this. So the current core hash
> would not be impacted.

Cryptographic hash functions have properties that are not required for
generic hash functions (like pre-image resistance) which makes them much
slower. SipHash is formally a MAC since it requires a key to be secure,
but it is really fast (see [1]).

I do not think going for a cryptographic hash function as default hash
is a good idea. I would use SipHash for hash-tables and maybe another
one like CityHash or XXH for internal use like generating identifiers.

> I had some discussions related to this, i.e. crypto hash, in core guile,
> with Ludovic and Rob.  I think that having core support for some
> the crypto hashing makes sense, especially if the compiler needs it.

That said, I think having curated cryptographic primitives in the core
language like Go does is a great step toward a more secure ecosystem.
Even if the compiler does not need it in the end.

[1] https://xxhash.com/

Reply via email to