On Fri, 2009-04-03 at 13:34 -0700, Brian Harvey wrote: > Okay, I'm confused,
Yay! Something useful for me to do! > A hash value is an index into an array. The hell it is. That's BS. It's just a string of bits that hopefully has some useful statistical relationships to the stuff being hashed. Each bit of a hash function *ideally* (never quite, in practice) splits the input - the domain - into two halves. Is the thing I'm hashing A or is it B? That's bit 0. Is the thing I'm hashing C or D? That's bit 1. Etc. We then demand a lot of useful statistical properties of the choices of values for variables A, B, C, D, etc. Ideally, half of lists hashed have a 0 at bit 0 (the others having 1 at bit 0). We ideally hope that all the naturally occurring hashed object we encounter have distinct hashes. We hope that the system of hashing generates as few bits of hash value as possible, though this ideal is subordinate to the uniqueness ideal. The "positive" restriction of R6 has a practical implication. Namely, for many plausible implementations of Scheme, it essentially says "give up one bit of hash values or else take a huge performance hit". > If the > hash value doesn't fit in a fixnum, then the array doesn't fit in memory,\ As Per pointed out in an indirect reply, modulo is your friend. > or even in the virtual address space (by a factor of 4 on a 32-bit machine). > What am I missing here? The "perfect" way to hash is to compute a hash value with enough bits (well enough computed) that all objects hashed in a given run of the program get a unique hash value. Hashing algorithms don't, except in special cases, promise to be perfect. Instead, common algorithms (typical of a Scheme implementation) try to be approximately perfect. Not everything gets a unique hash value but collisions are rare. A given hash algorithm has a "natural" performance characteristic on real hardware. The relevant example: it is easy to give a hashing algorithm that can give 20 bits or 21 bits or 22 bits of hash value, depending on how some constant is defined. In real life, the 20 bit version and 21 bit versions will run equally fast on a 32 bit CPU. Moreover, let's (reasonably) assume that the 21 bit version returns "close to ideal" hash values - that's normally how these things go. So, we should set that constant - the number of bits returned - to the highest number that runs "just as fast as 20 bits" on our 32 bit CPU. The r6 sign limitation steals one bit from any implementation that wants to limit hash values to signed fixnums. -t _______________________________________________ r6rs-discuss mailing list [email protected] http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss
