On Thu, Nov 08, 2001 at 10:54:11AM -0800, Andrew Ho wrote:
> Let me point out that if you are using MD5 hashes for directory spreading
> (i.e. to spread a large number of files across a tree of directories so
> that no one directory is filled with too many files for efficient
> filesystem access), the easy solution is to just append the unique key
> that you are hashing to the files involved.
Neat. That adds directory levels if you do URI->filesystem_path mapping for the key, or runs into name-length limits if you fold "/" and, say, "%" and "." into %FF-style escape codes or base64 it. And name-length limitations are even more severe on many non-Unixish filesystems :).

I prefer the technique of storing the full text of the hash key in the stored object's metadata (i.e. in the cache) and comparing it to the requested key on retrieval. When a collision is detected, you can then do overflow processing (like most hashed dictionary data structures from CS-land) and seek the cached copy somewhere else in the cache, or just treat it like a cache miss.

MD5 is a good thing, but relying on its uniqueness for a site that needs to be reliable is a bit risky in my mind. YMMV, I just want folks to be aware of the issues.

- Barrie
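P.S. A minimal sketch of the compare-on-retrieval idea in Python, in case it helps. The class name, two-level directory spread, and JSON metadata layout are all mine for illustration, not from any particular cache implementation:

```python
# Sketch: MD5 directory spreading plus collision checking. The full
# cache key is stored in the entry's metadata; on retrieval we compare
# it against the requested key instead of trusting the digest alone.
import hashlib
import json
import os


class HashedCache:
    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key):
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        # Two-level spread: root/ab/cd/abcdef... keeps any one
        # directory from filling up with too many files.
        return os.path.join(self.root, digest[:2], digest[2:4], digest)

    def put(self, key, value):
        path = self._path(key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            # Full key stored alongside the value, so retrieval can
            # verify we got the entry we asked for.
            json.dump({"key": key, "value": value}, f)

    def get(self, key):
        try:
            with open(self._path(key)) as f:
                entry = json.load(f)
        except FileNotFoundError:
            return None
        if entry["key"] != key:
            # Digest collision: different key hashed to the same path.
            # Overflow probing could go here; simplest is a cache miss.
            return None
        return entry["value"]
```

The overflow processing on collision is left as a miss here; a real cache could probe an alternate path the way chained hash tables do.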