Re: [PATCHES] hash index work

Bruce Momjian Fri, 27 May 2005 19:46:24 -0700

Neil, I have added these item to the TODO list.  Do you plan on applying
this?


---------------------------------------------------------------------------

Neil Conway wrote:
> This patch implements two changes to hash indexes:
> 
> - rather than storing the values of the index keys, we just store their 
> hash value instead. The hash opclasses have been marked "lossy" so that 
> the executor will recheck the scan qualifier to catch hash collisions.
> 
> - within a page of a hash bucket, the entries are stored sorted by hash 
> value. When doing a hash index scan, we can then binary search within a 
> page rather than using a linear search.
> 
> Unfortunately, I haven't been able to measure any performance 
> improvement from either of these changes :-\
> 
> I've run tests on two types of tables: int4 keys and relatively short 
> textual keys (I don't think there's much point testing longer keys: the 
> patches should obviously be a win there, so I'm concerned about speeding 
> up the common case). In both cases, the difference in index scan 
> performance isn't significant: it's about the same with and without the 
> patches. The indexes have been on tables with 1 million rows (of int4) 
> and 400,000 rows (of text). I would test with larger tables, but 
> creating hash indexes is so slow that even these sizes are painful 
> enough. Since indexes of this size should be cached in memory the 
> reduction in I/O for the text case isn't being measured (the index is 
> about 30% smaller than it is when we store the full text value), but 
> even so I would have expected the binary search to produce noticeable 
> gains...
> 
> Perhaps I've made a coding error that has pessimized the performance 
> somehow (although nothing obvious shows up in profiles), or perhaps the 
> linear scan of the pages of the hash bucket isn't a performance problem 
> in the first place, at least for types with a cheap equality operator. 
> Some oprofile data seems to support this -- for example, in the int4 
> case, we spend less than 0.5% of the time in _hash_next and children, 
> and only 1.8% of the runtime in hashgetmulti and children (with the 
> vanilla CVS HEAD code).
> 
> -Neil


> 
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
>       subscribe-nomail command to [EMAIL PROTECTED] so that your
>       message can get through to the mailing list cleanly

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq

Re: [PATCHES] hash index work

Reply via email to