Currently, hash indexes always store the hash code in the index, but not the actual Datum. It's recently been noted that this can make a hash index smaller than the corresponding btree index would be if the column is wide. However, if the index is being built on a fixed-width column with a typlen <= sizeof(Datum), we could store the original value in the hash index rather than the hash code without using any more space. That would complicate the code, but I bet it would be faster: we wouldn't need to set xs_recheck, we could rule out hash collisions without visiting the heap, and we could support index-only scans in such cases.
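To make the first idea concrete, here is a rough, standalone sketch of the decision and of the match check it would enable. Nothing in it is actual PostgreSQL code: SketchDatum, SketchHashItem, sketch_can_store_value and sketch_item_matches are hypothetical names invented for illustration. It assumes a 64-bit Datum, and it assumes that equality for the type is plain bitwise equality, which would not hold for every fixed-width type.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Datum; assumed to be 64 bits wide here. */
typedef uint64_t SketchDatum;

/* Hypothetical index payload: either a 32-bit hash code or the value itself. */
typedef struct SketchHashItem
{
    SketchDatum payload;
    bool        is_value;   /* true if payload holds the original value */
} SketchHashItem;

/*
 * A positive typlen means a fixed-width type; variable-width types use
 * negative typlen values. The value fits only if it is no wider than a Datum.
 */
static bool
sketch_can_store_value(int typlen)
{
    return typlen > 0 && (size_t) typlen <= sizeof(SketchDatum);
}

/*
 * Returns true if the item matches the scan key. *recheck is set to true
 * only when a hash code was compared, since a hash match may be a collision
 * that has to be ruled out by looking at the heap tuple.
 * Assumption: bitwise comparison is a valid equality test for the type.
 */
static bool
sketch_item_matches(const SketchHashItem *item, SketchDatum key,
                    uint32_t key_hash, bool *recheck)
{
    if (item->is_value)
    {
        *recheck = false;
        return item->payload == key;
    }
    *recheck = true;
    return (uint32_t) item->payload == key_hash;
}
```

In the stored-value case the comparison is exact, so no recheck is requested; that is the property that would let us skip the heap visit and, in principle, return the value for an index-only scan.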
Another thought is that hash codes are 32 bits, but a Datum is 64 bits wide on most current platforms. So we're wasting 4 bytes per index tuple storing nothing. If we generated 64-bit hash codes we could store as many bits of it as a Datum will hold and reduce hash collisions. Alternatively, we could try to stick some other useful information in those bytes, like an abbreviated abbreviated key. Not sure if these are good ideas. They're just ideas.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company