Re: Directory hashing question

Karl Vogel Fri, 17 Jan 2003 09:56:52 -0800

>> On Thu, 16 Jan 2003 15:13:26 +0000, 
>> John Ekins <[EMAIL PROTECTED]> said:


J> I have a question about "best" practices for directory hashing.
J> I have about 80,000 zone files which are named after the domain which I
J> generate using a Perl script.  I'm looking for the best hashing to
J> reduce the start up time for bind.

J> I've tried different hashings.  Using example.foo as an example (:-)),
J> if I take the first and second letters of the domain and hash it like
J> /var/named/e/x/example.foo, I still end up with (in a few cases)
J> more than 3000 zones in one directory.

   Yup, lots of domain names start with an English word, and the first two
   letters are not evenly distributed.
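
   A quick way to see the skew is to count names per directory under
   the first-letter/second-letter scheme. This is only a sketch: the
   helper names (prefix_dir, count_by_dir) and the four sample domains
   are made up for illustration, and a real run would feed in the
   actual zone list.

```perl
#!/usr/bin/perl
# Sketch: count how many domains land in each e/x-style directory.
# Helper names and sample data are hypothetical.
use strict;
use warnings;

# Directory used by the first+second letter scheme: "e/x" for example.foo.
sub prefix_dir {
    my ($domain) = @_;
    return join '/', substr($domain, 0, 1), substr($domain, 1, 1);
}

# Count domains per directory, given a code ref that picks the directory.
sub count_by_dir {
    my ($dir_of, @domains) = @_;
    my %count;
    $count{ $dir_of->($_) }++ for @domains;
    return \%count;
}

my @sample = qw(example.foo exchange.foo express.foo zebra.foo);
my $count  = count_by_dir(\&prefix_dir, @sample);

# Largest directories first, ties broken by name.
for my $dir (sort { $count->{$b} <=> $count->{$a} || $a cmp $b } keys %$count) {
    printf "%6d %s\n", $count->{$dir}, $dir;
}
```

   With the sample list, three of the four names pile up in e/x, which
   is the same effect as the 3000-zone directories on a small scale.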

J> If I hash using the first+second and third+fourth like
J> /var/named/ex/am/example.com, I end up with a lot fewer zones in the
J> individual directories, but bind's start up time is much longer.

   Probably because it has lots more directories to look through.

J> I'm seeking some advice and suggestions on what others think (or know)
J> would be better.

   The hash function from SDBM has behaved pretty well for me in the past.
   It's simple and quick.  I'd create 256 directories (named 00 - ff),
   and use the low-order byte of each domain name's hash value to pick
   the directory.  Examples below.

-- 
Karl Vogel                      I don't speak for the USAF or my company
[EMAIL PROTECTED]                          http://www.pobox.com/~vogelke

EXCUSE FOR GETTING TO WORK LATE #3:
I am stuck in the blood pressure machine down at the Food Giant.

---------------------------------------------------------------------------
Create the directories:

    me% cd /var/named
    me% perl -e 'for (0x00 .. 0xff) { printf "%2.2x\n", $_; }' | xargs mkdir

Hash routine:

    me% cat shash
    #!/usr/local/bin/perl
    # sdbm hashing routine
    #
    # Original C code from SDBM library:
    #
    #   long sdbm_hash(register char *str, register int len)
    #   {
    #       register unsigned long n = 0;
    #       while (len--)
    #           n = *str++ + 65587 * n;
    #       return n;
    #   }

    use integer;
    use strict;
    use warnings;

    my $hval;   # hash value.

    while (<>) {
        chomp;
        $hval = sdbmhash($_);
        print "$hval $_\n";
    }

    exit (0);

    sub sdbmhash {
        my ($str) = @_;     # work on a copy; leave the caller's $_ alone
        my $n = 0;

        # Walk the string one character at a time.
        # Keep the lowest 31 bits (avoids the sign bit), then return
        # the lowest byte as two hex digits.

        for my $ch (split //, $str) {
            $n = (ord($ch) + 65587 * $n) & 0x7fffffff;
        }

        return sprintf("%2.2x", $n & 0xff);
    }

    me% echo example.foo | ./shash
    7a example.foo
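
To go from the hash to an actual file location, something like the
following could work. It is a sketch, not a tested setup: sdbm_dir repeats
the arithmetic of sdbmhash above, while zone_path, zone_stanza, the
"type master" stanza, and the relative "file" path (which assumes
named.conf has directory "/var/named" in its options) are my assumptions.

```perl
#!/usr/bin/perl
# Sketch: map a domain to its hashed zone-file path and a named.conf stanza.
# zone_path/zone_stanza and the stanza contents are illustrative assumptions.
use strict;
use warnings;

my $ZONE_ROOT = '/var/named';    # same directory as the mkdir example

# Same arithmetic as sdbmhash in shash: multiplier 65587, low byte kept.
sub sdbm_dir {
    use integer;                 # integer wraparound, as in shash
    my ($name) = @_;
    my $n = 0;
    for my $ch (split //, $name) {
        $n = (ord($ch) + 65587 * $n) & 0x7fffffff;
    }
    return sprintf '%02x', $n & 0xff;
}

# Full path of the zone file, e.g. /var/named/7a/example.foo
sub zone_path {
    my ($domain) = @_;
    return join '/', $ZONE_ROOT, sdbm_dir($domain), $domain;
}

# One named.conf zone statement with a path relative to $ZONE_ROOT.
sub zone_stanza {
    my ($domain) = @_;
    my $rel = sdbm_dir($domain) . "/$domain";
    return qq(zone "$domain" { type master; file "$rel"; };);
}

print zone_path('example.foo'), "\n";
print zone_stanza('example.foo'), "\n";
```

For example.foo this prints /var/named/7a/example.foo and a stanza
pointing at 7a/example.foo. Domain names are case-insensitive, so it is
probably worth lowercasing the name before hashing so that Example.foo
and example.foo end up in the same directory.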

Here's a short, not-terribly-scientific test.  The hash distributes English
words pretty well:

    me% cd /usr/share/lib/dict

    me% wc -l words
       25143 words

    me% ./shash < words | awk '{print $1}' | sort | uniq -c | pr -5t
     105 00        120 34         93 67        101 9a        101 cd
      96 01        110 35        108 68         96 9b        107 ce
     108 02         95 36        105 69        106 9c         83 cf
     102 03         83 37         95 6a        107 9d         99 d0
      96 04        101 38         96 6b        106 9e         93 d1
      88 05         95 39        101 6c         97 9f         90 d2
      96 06         99 3a        104 6d         77 a0         96 d3
      80 07        102 3b        114 6e         98 a1         82 d4
      86 08         89 3c         88 6f        103 a2         97 d5
     106 09         92 3d         76 70        111 a3         89 d6
     106 0a        109 3e        111 71         97 a4         82 d7
      87 0b         87 3f        108 72         98 a5         93 d8
      97 0c        102 40        111 73        102 a6         84 d9
     103 0d         94 41        106 74        109 a7         96 da
      93 0e        103 42        107 75         99 a8         85 db
     107 0f        114 43        110 76         98 a9        105 dc
      85 10         90 44        104 77         91 aa        111 dd
      95 11        128 45         80 78         97 ab         95 de
     102 12         85 46        108 79        100 ac        109 df
      97 13        100 47         98 7a        101 ad        104 e0
     110 14        122 48         93 7b         86 ae        116 e1
     101 15         94 49        118 7c        106 af         81 e2
      86 16        118 4a        104 7d         94 b0         93 e3
     115 17        108 4b         96 7e        104 b1         83 e4
      82 18         98 4c         98 7f         89 b2         85 e5
     102 19         98 4d        105 80        104 b3        103 e6
      90 1a        103 4e         96 81        100 b4         85 e7
     113 1b         95 4f         98 82         97 b5         99 e8
     105 1c         84 50        111 83        108 b6         83 e9
     101 1d         93 51         86 84        112 b7         94 ea
      96 1e         90 52        104 85        103 b8         92 eb
     136 1f        109 53         90 86        108 b9         99 ec
     100 20         98 54        106 87         78 ba         85 ed
      93 21         88 55         99 88        112 bb        103 ee
     102 22         88 56         91 89         91 bc         78 ef
     106 23         87 57         86 8a         95 bd         87 f0
     106 24         90 58         99 8b        103 be         87 f1
     105 25         83 59         83 8c        101 bf        103 f2
      88 26        105 5a         90 8d         98 c0         99 f3
      96 27        117 5b        101 8e        109 c1        103 f4
      88 28         95 5c         94 8f         92 c2        104 f5
     104 29         91 5d        103 90         95 c3         96 f6
     100 2a        113 5e        116 91        100 c4        100 f7
      94 2b        102 5f        111 92         92 c5         86 f8
      92 2c        100 60        107 93        102 c6        113 f9
     100 2d        112 61         90 94         85 c7        116 fa
     104 2e         86 62         91 95         84 c8        105 fb
      94 2f         98 63         89 96        105 c9         76 fc
     110 30         95 64         85 97         97 ca        102 fd
     103 31         89 65         84 98         99 cb        124 fe
      97 32         85 66         92 99        112 cc         91 ff
      90 33
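
In the table, the per-directory counts run from 76 to 136, and 25143
words over 256 directories averages about 98. To get the same summary for
a real zone list without eyeballing a table, a sketch along these lines
might help. The names bucket_counts and bucket_stats and the synthetic
host names are made up for illustration; sdbm_dir repeats the arithmetic
of sdbmhash.

```perl
#!/usr/bin/perl
# Sketch: summarize how evenly names spread over the 256 hash directories.
# bucket_counts/bucket_stats and the sample names are hypothetical.
use strict;
use warnings;

# Same arithmetic as sdbmhash in shash: multiplier 65587, low byte kept.
sub sdbm_dir {
    use integer;                 # integer wraparound, as in shash
    my ($name) = @_;
    my $n = 0;
    for my $ch (split //, $name) {
        $n = (ord($ch) + 65587 * $n) & 0x7fffffff;
    }
    return sprintf '%02x', $n & 0xff;
}

# Count names per directory; empty directories are kept with a count of 0.
sub bucket_counts {
    my @names = @_;
    my %count;
    $count{ sprintf '%02x', $_ } = 0 for 0x00 .. 0xff;
    $count{ sdbm_dir($_) }++ for @names;
    return \%count;
}

# Min, max, mean and (population) standard deviation of the counts.
sub bucket_stats {
    my ($count) = @_;
    my @sizes = sort { $a <=> $b } values %$count;
    my $total = 0;
    $total += $_ for @sizes;
    my $mean = $total / @sizes;
    my $var  = 0;
    $var += ($_ - $mean) ** 2 for @sizes;
    return {
        min    => $sizes[0],
        max    => $sizes[-1],
        mean   => $mean,
        stddev => sqrt($var / @sizes),
    };
}

# Synthetic names stand in for the real zone list.
my @names = map { "host$_.example" } 1 .. 1000;
my $stats = bucket_stats(bucket_counts(@names));
printf "%d names: min %d, max %d, mean %.1f, stddev %.1f\n",
    scalar(@names), @{$stats}{qw(min max mean stddev)};
```

A small spread between min and max relative to the mean suggests the
directories are filling evenly; a few very large buckets would be the
sign to try a different hash or more directories.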
    
