On Thu, Nov 08, 2001 at 10:54:11AM -0800, Andrew Ho wrote:
> Let me point out that if you are using MD5 hashes for directory spreading
> (i.e. to spread a large number of files across a tree of directories so
> that no one directory is filled with too many files for efficient
> filesystem access), the easy solution is to just append the unique key
> that you are hashing to the files involved.

Neat.

That adds directory levels if you do URI->filesystem_path mapping for
the key, or runs into name-length limits if you fold "/" and, say, "%"
and "." into %FF-style escape codes or base64 it.  And name-length
limits are even more severe on many non-Unixish filesystems :).

I prefer the technique of storing the full text of the hash key in the
stored object's metadata (i.e. in the cache) and comparing it to the
requested key on retrieval.  When a collision is detected, you can then
do overflow processing (like most hashed dictionary data structures from
CS-land) and seek the cached copy somewhere else in the cache, or just
treat it as a cache miss.
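A minimal sketch of that retrieval check (again Python, and the file
layout and names are hypothetical): store the full key alongside the
object, compare on lookup, and fall back to a miss on a mismatch.

    import hashlib
    import json
    import os

    def cache_get(cache_root, key):
        """Return the cached body for `key`, or None on a miss.
        The full key is stored in a metadata file next to the object,
        so an MD5 collision is detected and treated as a cache miss."""
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        meta_path = os.path.join(cache_root, digest + ".meta")
        data_path = os.path.join(cache_root, digest + ".data")
        try:
            with open(meta_path) as f:
                meta = json.load(f)
        except FileNotFoundError:
            return None          # nothing cached under this digest
        if meta.get("key") != key:
            return None          # collision: different key, same MD5 -> miss
        with open(data_path, "rb") as f:
            return f.read()

    def cache_put(cache_root, key, body):
        """Store `body` under the MD5 of `key`, recording the full key."""
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        with open(os.path.join(cache_root, digest + ".meta"), "w") as f:
            json.dump({"key": key}, f)
        with open(os.path.join(cache_root, digest + ".data"), "wb") as f:
            f.write(body)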

MD5 is a good thing, but relying on its uniqueness for a site that needs
to be reliable is a bit risky to my mind.  YMMV, I just want folks to be
aware of the issues.

- Barrie
