Hannu Krosing <[EMAIL PROTECTED]> writes:

> Ühel kenal päeval, T, 2006-08-01 kell 10:54, kirjutas Andrew Dunstan:
> > Gregory Stark wrote:
> > > 
> > > I looked a while back and was suspicious about the actual hash functions 
> > > too.
> > > It seemed like a lot of them were vastly suboptimal. That would mean we're
> > > often dealing with mostly empty and mostly full buckets instead of well
> > > distributed hash tables.
> > >
> > >
> > >   
> > 
> > This is now sounding like a lot of low hanging fruit ... highly 
> > performant hash indexed tables could possibly be a very big win.
> > 
> 
> Are you sure about the badness of our hash functions ?
> 
> I just tested and hashtext(text) has about 1.4% of collisions on about
> 120M distinct texts, which is not bad considering thet total space for
> hashes is 4G, meaning that 120M covers itself already 3% of possible
> hash space.

I don't recall exactly what triggered my suspicions, but I think it had more
to do with things like how int4 and int8 were mapped to hash codes and what
the hash function looks like that compresses the hash codes into the actual
bucket. IIRC it's just the low order bits for int8 and it's the actual int4
which then only takes the low order bits. I seem to recall wondering what
would happen if you had, say, a list of /24 network addresses for example.

I didn't actually do any tests though so it's quite possible I missed
something that mitigated the issues I feared.


-- 
greg


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match

Reply via email to