>> On Thu, 16 Jan 2003 15:13:26 +0000,
>> John Ekins <[EMAIL PROTECTED]> said:

J> I have a question about "best" practices for directory hashing.
J> I have about 80,000 zone files which are named after the domain which I
J> generate using a Perl script. I'm looking for the best hashing to
J> reduce the start up time for bind.

J> I've tried different hashings. Using example.foo as an example (:-)),
J> if I take the first and second letters of the domain and hash it like
J> /var/named/e/x/example.foo, I still end up with (in a few cases)
J> more than 3000 zones in one directory.

Yup, lots of domain names start with an English word, and the first
two letters are not evenly distributed.

J> If I hash using the first+second and third+fourth like
J> /var/named/ex/am/example.com, I end up with a lot fewer zones in the
J> individual directories, but bind's start up time is much longer.

Probably because bind now has a lot more directories to look through.

J> I'm seeking some advice and suggestions on what others think (or know)
J> would be better.

The hash function from the SDBM library has behaved pretty well for me
in the past; it's simple and quick. I'd create 256 directories (0x00
through 0xff) and use the lowest byte of each domain name's hash value
to pick its directory. With 80,000 zones, that averages out to a little
over 300 files per directory. Examples below.

-- 
Karl Vogel                      I don't speak for the USAF or my company
[EMAIL PROTECTED]                   http://www.pobox.com/~vogelke

EXCUSE FOR GETTING TO WORK LATE #3:
I am stuck in the blood pressure machine down at the Food Giant.

---------------------------------------------------------------------------
Create the directories:

me% cd /var/named
me% perl -e 'for (0x00 .. 0xff) { printf "%2.2x\n", $_; }' | xargs mkdir

Hash routine:

me% cat shash
#!/usr/local/bin/perl
# sdbm hashing routine
#
# Original C code from SDBM library:
#
#   long sdbm_hash(register char *str, register int len)
#   {
#       register unsigned long n = 0;
#       while (len--)
#           n = *str++ + 65587 * n;
#       return n;
#   }

use integer;
use strict;
use warnings;

my $hval;       # hash value.

while (<>) {
    chomp;
    $hval = sdbmhash($_);
    print "$hval $_\n";
}

exit (0);

sub sdbmhash {
    my ($str) = @_;     # copy the argument instead of overwriting the caller's $_
    my $n = 0;

    # Walk the string one character at a time.
    # Use the lowest 31 bits (avoid sign-bit), and keep
    # the lowest byte.

    while ($str =~ /(.)/g) {
        $n = (ord($1) + 65587 * $n) & 0x7fffffff;
    }

    return sprintf("%2.2x", $n & 0xff);
}

me% echo example.foo | ./shash
7a example.foo

Here's a short, not-terribly-scientific test. The hash distributes
English words pretty well:

me% cd /usr/share/lib/dict
me% wc -l words
25143 words
me% ./shash < words | awk '{print $1}' | sort | uniq -c | pr -5t
 105 00     120 34      93 67     101 9a     101 cd
  96 01     110 35     108 68      96 9b     107 ce
 108 02      95 36     105 69     106 9c      83 cf
 102 03      83 37      95 6a     107 9d      99 d0
  96 04     101 38      96 6b     106 9e      93 d1
  88 05      95 39     101 6c      97 9f      90 d2
  96 06      99 3a     104 6d      77 a0      96 d3
  80 07     102 3b     114 6e      98 a1      82 d4
  86 08      89 3c      88 6f     103 a2      97 d5
 106 09      92 3d      76 70     111 a3      89 d6
 106 0a     109 3e     111 71      97 a4      82 d7
  87 0b      87 3f     108 72      98 a5      93 d8
  97 0c     102 40     111 73     102 a6      84 d9
 103 0d      94 41     106 74     109 a7      96 da
  93 0e     103 42     107 75      99 a8      85 db
 107 0f     114 43     110 76      98 a9     105 dc
  85 10      90 44     104 77      91 aa     111 dd
  95 11     128 45      80 78      97 ab      95 de
 102 12      85 46     108 79     100 ac     109 df
  97 13     100 47      98 7a     101 ad     104 e0
 110 14     122 48      93 7b      86 ae     116 e1
 101 15      94 49     118 7c     106 af      81 e2
  86 16     118 4a     104 7d      94 b0      93 e3
 115 17     108 4b      96 7e     104 b1      83 e4
  82 18      98 4c      98 7f      89 b2      85 e5
 102 19      98 4d     105 80     104 b3     103 e6
  90 1a     103 4e      96 81     100 b4      85 e7
 113 1b      95 4f      98 82      97 b5      99 e8
 105 1c      84 50     111 83     108 b6      83 e9
 101 1d      93 51      86 84     112 b7      94 ea
  96 1e      90 52     104 85     103 b8      92 eb
 136 1f     109 53      90 86     108 b9      99 ec
 100 20      98 54     106 87      78 ba      85 ed
  93 21      88 55      99 88     112 bb     103 ee
 102 22      88 56      91 89      91 bc      78 ef
 106 23      87 57      86 8a      95 bd      87 f0
 106 24      90 58      99 8b     103 be      87 f1
 105 25      83 59      83 8c     101 bf     103 f2
  88 26     105 5a      90 8d      98 c0      99 f3
  96 27     117 5b     101 8e     109 c1     103 f4
  88 28      95 5c      94 8f      92 c2     104 f5
 104 29      91 5d     103 90      95 c3      96 f6
 100 2a     113 5e     116 91     100 c4     100 f7
  94 2b     102 5f     111 92      92 c5      86 f8
  92 2c     100 60     107 93     102 c6     113 f9
 100 2d     112 61      90 94      85 c7     116 fa
 104 2e      86 62      91 95      84 c8     105 fb
  94 2f      98 63      89 96     105 c9      76 fc
 110 30      95 64      85 97      97 ca     102 fd
 103 31      89 65      84 98      99 cb     124 fe
  97 32      85 66      92 99     112 cc      91 ff
  90 33

That's about 98 words per directory on average (25143 / 256); the
fullest bucket, 1f, got 136, and the emptiest, 70 and fc, got 76 each.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-questions" in the body of the message