Kenneth Marshall <[EMAIL PROTECTED]> writes:
> On Thu, Sep 04, 2008 at 02:01:18PM -0400, Tom Lane wrote:
>> I think what the hash index patch really needs is some performance
>> testing.  I'm willing to take responsibility for the code being okay
>> or not, but I haven't got any production-grade hardware to do realistic
>> performance tests on.  I'd like to see a few more scenarios tested than
>> the one provided so far: in particular, wide vs narrow index keys and
>> good vs bad key distributions.

> Do you mean good vs. bad key distributions with respect to hash value
> collisions? 

Right, I'm just concerned about how badly it degrades with hash value
collisions.  Those will result in wasted trips to the heap if we omit
the original key from the index.  AFAICS that's the only downside of
doing so; but we should have an idea of how bad it could get before
committing to doing this.

> Currently, since a trip to the heap is required to validate any tuple
> even if the exact value is contained in the index, it does not seem
> like it would be a win to store the value in both places.

The point is that currently you *don't* need a trip to the heap if
the key doesn't match, even if it has the same hash code.
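
To make that difference concrete, here is a toy model in Python.  It is
not PostgreSQL code: the dict-based "index", the function names, and the
deliberately weak hash are all assumptions invented for illustration.  It
counts how many heap fetches an equality lookup needs when the index entry
keeps the key versus when it keeps only the hash code.

```python
from collections import defaultdict

def weak_hash(key, buckets=8):
    """Deliberately poor hash (hypothetical) so collisions are easy to see."""
    return sum(key.encode()) % buckets

def build_index(keys, hash_fn):
    """Map hash code -> list of (key, heap_position) entries."""
    index = defaultdict(list)
    for pos, key in enumerate(keys):
        index[hash_fn(key)].append((key, pos))
    return index

def heap_fetches(index, hash_fn, probe, key_in_index):
    """Count heap visits needed to answer an equality lookup for `probe`."""
    entries = index.get(hash_fn(probe), [])
    if key_in_index:
        # Key is stored: non-matching entries are rejected inside the index.
        return sum(1 for key, _ in entries if key == probe)
    # Only the hash code is stored: every entry with that code has to be
    # checked against the heap tuple.
    return len(entries)

keys = ["ab", "ba", "cd", "dc", "xy"]   # "ab"/"ba" and "cd"/"dc" collide
index = build_index(keys, weak_hash)
with_key = heap_fetches(index, weak_hash, "ab", key_in_index=True)
hash_only = heap_fetches(index, weak_hash, "ab", key_in_index=False)
```

In this sketch the probe for "ab" costs one heap fetch when the key is in
the index and two when only the hash code is kept; the extra fetch is the
wasted trip caused by the colliding "ba" entry.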

> I think that increasing the number of
> hash bits stored would provide more bang-for-the-buck than storing
> the exact value.

Maybe, but that would require an extremely invasive patch that breaks
existing user-defined datatypes.  You can't just magically get more hash
bits from someplace; you need datatype-specific hash functions that will
provide more than 32 bits.  There's going to have to be a LOT of
evidence in support of the value of doing that before I'll buy in.
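
For a sense of scale, the usual birthday-style estimate can be sketched in
Python.  This assumes the hash function spreads distinct keys uniformly
over all possible codes, which is the good-distribution case; a bad key
distribution could be far worse.  The function name is hypothetical.

```python
def expected_colliding_pairs(n_keys, hash_bits):
    """Expected number of distinct-key pairs that share a hash code,
    assuming codes are uniform over 2**hash_bits values."""
    pairs = n_keys * (n_keys - 1) // 2      # number of unordered key pairs
    return pairs / 2 ** hash_bits           # each pair collides w.p. 2**-bits

at_32_bits = expected_colliding_pairs(1_000_000, 32)
at_64_bits = expected_colliding_pairs(1_000_000, 64)
```

Under that assumption, a million distinct keys give on the order of a
hundred colliding pairs with 32-bit codes and essentially none with 64-bit
codes, so the open question is whether that gap shows up as a measurable
number of extra heap fetches.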

                        regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)