Hi there

I usually follow the lists and development of DFBSD quietly, not having enough time to bring in real effort. But this caught my interest :-)

Matthew Dillon wrote:

    aaaaa       name[0] & 0x1F
    bbbbb       name[1] & 0x1F
    ccccc       name[2] & 0x1F
    mmmmmm      crc32(name + 3, len - 5) and some xor magic -> 6 bits
    yyyyy       name[len-2] & 0x1F
    zzzzz       name[len-1] & 0x1F
    h[31]       crc32(name, len) (entire filename)

    0aaaaabbbbbccccc mmmmmmyyyyyzzzzz hhhhhhhhhhhhhhhh hhhhhhhhhhhhhhh0


Don't know if you already thought of this, but you probably want to use different CRCs for m and h, e.g. crc16 for m and crc32 for h. If h is merely a function of (a, b, c, m, y, z), the fields are correlated and you unnecessarily lose hash space. Ok, I'm not a specialist in CRC, so I could be wrong.

Also, I wouldn't consider the zone idea (from a later mail) with a variable-length zone B, since different name lengths would completely scramble the hash locality.

As said before, whether this sort of semi-sorted hash provides locality depends strongly on the particular file-name usage. Obviously the generic, optimal solution would be to compress the leading bits via a directory-wide patricia trie, but that's not feasible here.

If you can find any real-world use cases, that would greatly simplify the argument about the particular benefits and weaknesses of these hashes.

In my opinion, the realm of oversized directories is ad-hoc, prototype, or research software. Every "mature" application I've seen so far uses subdirectories to organize such huge numbers of files. Your best bet for the "immature" kind of applications is probably simple guidelines on how to do efficient file naming. Possibly simpler than applying the patricia trie on the application side ;-)

Regards

FloW
