Neil, I have added these item to the TODO list. Do you plan on applying this?
--------------------------------------------------------------------------- Neil Conway wrote: > This patch implements two changes to hash indexes: > > - rather than storing the values of the index keys, we just store their > hash value instead. The hash opclasses have been marked "lossy" so that > the executor will recheck the scan qualifier to catch hash collisions. > > - within a page of a hash bucket, the entries are stored sorted by hash > value. When doing a hash index scan, we can then binary search within a > page rather than using a linear search. > > Unfortunately, I haven't been able to measure any performance > improvement from either of these changes :-\ > > I've run tests on two types of tables: int4 keys and relatively short > textual keys (I don't think there's much point testing longer keys: the > patches should obviously be a win there, so I'm concerned about speeding > up the common case). In both cases, the difference in index scan > performance isn't significant: it's about the same with and without the > patches. The indexes have been on tables with 1 million rows (of int4) > and 400,000 rows (of text). I would test with larger tables, but > creating hash indexes is so slow that even these sizes are painful > enough. Since indexes of this size should be cached in memory the > reduction in I/O for the text case isn't being measured (the index is > about 30% smaller than it is when we store the full text value), but > even so I would have expected the binary search to produce noticeable > gains... > > Perhaps I've made a coding error that has pessimized the performance > somehow (although nothing obvious shows up in profiles), or perhaps the > linear scan of the pages of the hash bucket isn't a performance problem > in the first place, at least for types with a cheap equality operator. > Some oprofile data seems to support this -- for example, in the int4 > case, we spend less than 0.5% of the time in _hash_next and children, > and only 1.8% of the runtime in hashgetmulti and children (with the > vanilla CVS HEAD code). > > -Neil > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to [EMAIL PROTECTED] so that your > message can get through to the mailing list cleanly -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq