Paul Gilmartin wrote:
On Tue, 27 Jul 2010 22:06:18 +0000, john gilmore wrote:
Yes, 61, which is prime, is better than 64 = 2^6, which is composite.
...
If division-method hashing is used a prime divisor/modulus is highly desirable.
Clustering at the prime divisors of a composite modulus does occur. I dislike
arguments from authority, but 1) this is not the place for a bibliography and
2) RKFATWTF.
For pure division-method, surely. Modulo 64 merely extracts the
rightmost 6 bits of the original string. But you had suggested
CKSUM, then modulo. And if CKSUM is of high quality, I'd expect
any modulus to give good results.
Unfortunately CKSUM is not high quality. It's pathologically pathetic.
It's nothing more than an additive hash. You will be forced to use a
hash table
sized by a prime number. The best hash table sizes are powers of two so
you can use bitwise ANDs instead of mod division, much
faster for dynamically resizing the table.
A good hash function must achieve avalanche which requires a mixing step
using magic ratios (usually a special prime number).
I've timed just about every known hashing algorithm for strings and the
best one by some distance on z is the murmur hash.
http://sites.google.com/site/murmurhash/
Here's the murmur 2 hash assembler produced by the Metal/C compiler.
HASHMUR#C CSECT
HASHMUR#C AMODE 31
HASHMUR#C RMODE ANY
SYSSTATE ARCHLVL=2
* unsigned int MurmurHash2(
* const void * key,
* int len,
* unsigned int seed
* )
.* The HLASM GOFF option is needed to assemble this code
@@c...@1 ALIAS C'MurmurHash2'
ENTRY @@c...@1
@@c...@1 AMODE 31
@@c...@1 RMODE ANY
@@c...@1 DS 0F
STM 14,6,12(13)
LR 15,13
L 13,8(,13)
ST 15,4(,13)
@@b...@1 DS 0H
USING @@a...@1,13
LARL 3,@@l...@1
USING @@l...@1,3
STMH 14,6,80(13)
* {
* // 'm' and 'r' are mixing constants generated offline.
* // They're not really 'magic', they just happen to work well.
* const unsigned int m = 0x5BD1E995;
* const int r = 24;
*
* // Initialize the hash to a 'random' value
* unsigned int h = seed ^ len;
USING @@pa...@1,1
L 15,@4seed
L 0,@3len
X 15,@3len
CHI 0,4
*
* // Mix 4 bytes at a time into the hash
* const unsigned char * data = (const unsigned char *)key;
L 14,@2key
*
* while(len >= 4)
BRL @1L4
LR 2,0
AHI 2,-4
SRL 2,2
AHI 2,1
SLLG 4,2,2
LARL 1,@@co...@area@@
LR 6,2
SLR 0,4
@1L3 DS 0H
* {
* unsigned int k = *(unsigned int *)data;
L 4,0(,14) (*)uint
*
* k *= m;
MS 4,0(,1)
* k ^= k >> r;
LR 5,4
SRL 5,24
XR 4,5
MS 15,=F'1540483477'
* k *= m;
MS 4,=F'1540483477'
*
* h *= m;
* h ^= k;
*
* data += 4;
LA 14,4(,14) #AddressShadow
XR 15,4
BRCT 6,@1L3
@1L4 DS 0H
* len -= 4;
* }
*
* // Handle the last few bytes of the input array
* switch(len)
CHI 0,1
BRE @1L7
CHI 0,2
BRE @1L8
CHI 0,3
BRNE @1L5
* {
* case 3: h ^= data[2] << 16;
LLGC 0,2(0,14) (*)Cuchar
SLL 0,16
XR 15,0
@1L8 DS 0H
* case 2: h ^= data[1] << 8;
LLGC 0,1(0,14) (*)Cuchar
SLL 0,8
XR 15,0
@1L7 DS 0H
* case 1: h ^= data[0];
LLGC 14,0(0,14) (*)Cuchar
XR 15,14
* h *= m;
MS 15,=F'1540483477'
@1L5 DS 0H
* };
*
* // Do a few final mixes of the hash to ensure the last few
* // bytes are well-incorporated.
* h ^= h >> 13;
LR 14,15
SRL 14,13
XR 15,14
* h *= m;
MS 15,=F'1540483477'
* h ^= h >> 15;
LR 14,15
SRL 14,15
XR 15,14
*
* return h;
* }
@1L6 DS 0H
LMH 14,6,80(13)
DROP
L 13,4(,13)
L 14,12(,13)
LM 1,6,24(13)
BR 14
DS 0F
@@l...@1 LTORG
EJECT
@@a...@1 DSECT
DS XL144
ORG @@a...@1
#GPR_SA_1 DS 18F
DS F
@@pa...@1 DSECT
DS XL12
ORG @@pa...@1+0
@2key DS F
ORG @@pa...@1+4
@3len DS F
ORG @@pa...@1+8
@4seed DS F
EJECT
HASHMUR#C CSECT ,
@@co...@area@@ DS 0D
DC XL4'5BD1E995'
END ,(5694A01 ,1B00,10209)
RKFATWTF!? Not in my lexicon. Nor in Google's, apparently. But
the last three characters are familiar and hauntingly apt.
-- gil
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@bama.ua.edu with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@bama.ua.edu with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html