On Tue, 26 May 2026, Théophile BRÉZOT <[email protected]> wrote: > I am quite happy to read a discussion about hashing in Guile as this > hash been one of the few real pain points in my experience: the hash is > not behaving as one would expect from a generic hash function. What > follows is not directly related to the question asked, so I hope this > message is not misplaced. > >> Two different lists, same hash values. Weird. But it's true: >> >> scheme@(guile-user)> (= (hash '(1 2 3 (4)) most-positive-fixnum) >> (hash '(1 2 3 (4 5)) most-positive-fixnum)) >> => #t >> >> and it gets worse than that: >> >> scheme@(guile-user)> (= (hash '(1 2 3 4 5) most-positive-fixnum) >> (hash '(1 2 3 4 5 6) most-positive-fixnum)) >> => #t > > As noted before, there is the issue with the recursion being limited to > a rather low level, but hashing is also invariant across permutation of > its inputs: > > ``` > scheme@(guile-user)> (= (hash '(1 2) most-positive-fixnum) > (hash '(2 1) most-positive-fixnum)) > $7 = #t > ``` > > And for some data types, it is simply broken: > > ``` > scheme@(guile-user)> (= (hash (make-u8vector 3 0) most-positive-fixnum) > (hash (make-u8vector 3 1) most-positive-fixnum)) > $6 = #t > ``` > > The same holds for `f32vector` and `f64vector` which are implemented on > top of `u8vector`. Those two specific cases caused very-hard-to-debug > issues in two of my projects... > >> One of the design goals for ‘hash’ is that it must be constant-time. >> Thus, by definition, it cannot traverse entire lists or vectors. > > I don't know what is the story behind this function. I don't know why it > is required to be constant-time. What I do know is that nowadays, > non-cryptographic hash functions are so fast that hashing is rarely an > issue [1], and that many languages have converged to using SipHash for > hash-tables since it is fast and protects against hash flooding (if > hash-table based on a poorly designed hash is exposed through a public > API, an attacker may be able to forge malicious inputs that all hash to > the same target bucket). > >> I think what we probably want is a strong collision resistant hash. A >> strong crypto hash would do the job for this. So the current core hash >> would not be impacted. > > Cryptographic hash functions have properties that are not required for > generic hash functions (like pre-image resistance) which makes them much > slower. SipHash is formally a MAC since it requires a key to be secure, > but it is really fast (see [1]).
In my experience, if written in C with hardware acceleration, a crypto hash is very fast. But I guess Siphash could be even faster. However, I don't think we could implement it in pure Scheme for performance reasons but also because fixnum might not be able to encode the hash and so this would generate a lot of allocation with bignum. > I do not think going for a cryptographic hash function as default hash > is a good idea. I would use SipHash for hash-tables and maybe another > one like CityHash or XXH for internal use like generating identifiers. Interesting. I will have a look at those. >> I had some discussions related to this, i.e. crypto hash, in core guile, >> with Ludovic and Rob. I think that having core support for some >> the crypto hashing makes sense, especially if the compiler needs it. > > That said, I think having curated cryptographic primitives in the core > language like Go does is a great step toward a more secure ecosystem. > Even if the compiler does not need it in the end. Well we can already offload that to guile-gcrypt. So I wonder, what would be the advantage of having the crypto hashing in core Guile vs relying on an external dependency. Thanks, Olivier -- Olivier Dion
