Hannu Krosing <[EMAIL PROTECTED]> writes: > Ãhel kenal päeval, T, 2006-08-01 kell 10:54, kirjutas Andrew Dunstan: > > Gregory Stark wrote: > > > > > > I looked a while back and was suspicious about the actual hash functions > > > too. > > > It seemed like a lot of them were vastly suboptimal. That would mean we're > > > often dealing with mostly empty and mostly full buckets instead of well > > > distributed hash tables. > > > > > > > > > > > > > This is now sounding like a lot of low hanging fruit ... highly > > performant hash indexed tables could possibly be a very big win. > > > > Are you sure about the badness of our hash functions ? > > I just tested and hashtext(text) has about 1.4% of collisions on about > 120M distinct texts, which is not bad considering thet total space for > hashes is 4G, meaning that 120M covers itself already 3% of possible > hash space.
I don't recall exactly what triggered my suspicions, but I think it had more to do with things like how int4 and int8 were mapped to hash codes and what the hash function looks like that compresses the hash codes into the actual bucket. IIRC it's just the low order bits for int8 and it's the actual int4 which then only takes the low order bits. I seem to recall wondering what would happen if you had, say, a list of /24 network addresses for example. I didn't actually do any tests though so it's quite possible I missed something that mitigated the issues I feared. -- greg ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match