On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote:
> Kenneth Marshall <[EMAIL PROTECTED]> writes:
> > ... This is the rough plan. Does anyone see anything critical that
> > is missing at this point?
>
> Sounds pretty good. Let me brain-dump one item on you: one thing that
> hash currently has over btree is the ability to handle index items up
> to a full page. Now, if you go with a scheme that only stores hash
> codes and not the underlying data, you can not only handle that but
> improve on it; but if you reduce the bucket size and don't remove the
> data, it'd be a step backward. The idea I had about dealing with that
> was to only reduce the size of primary buckets --- if it's necessary to
> add overflow space to a bucket, the overflow units are still full pages.
> So an index tuple up to a page in size can always be accommodated by
> adding an overflow page to the bucket.
>
> Just a thought, but AFAIR it's not in the archives anywhere.
>
> 			regards, tom lane
>

I was thinking about this some more, and it strikes me that we can keep
page size = bucket size = overflow size in the new scheme of storing the
hash value in the index and still reduce the effective bucket size. Let's
make the new effective bucket size (page size / 2^n), where n is chosen
appropriately. Then, when we store a hash value in the page, we simply
use n bits of the hash to determine which sub-piece of the page actually
holds the value. We may need to add a multiplier so that the decision to
split a bucket is driven by how full the mini-pages are rather than the
page as a whole. This should allow us to pack the pages much more densely
and approach 1 item per bucket, which could easily shrink the index by a
factor of 2^n. Let me know what you think.
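To make the sub-page addressing concrete, here is a rough C sketch of
what I mean. The names (MINI_BUCKET_BITS, MINI_BUCKETS_PER_PAGE,
mini_bucket_for_hash, bucket_bits) are made up for illustration and are
not anything in the current hash AM:

    #include <stdint.h>

    /*
     * Sketch only -- these names are invented for illustration.
     * Assume 2^n mini-buckets (sub-pieces) per page, n = MINI_BUCKET_BITS.
     */
    #define MINI_BUCKET_BITS       3                        /* n */
    #define MINI_BUCKETS_PER_PAGE  (1 << MINI_BUCKET_BITS)  /* 2^n */

    /*
     * The low-order bits of the hash pick the bucket page, as they do
     * today; the next n bits pick the mini-page within that page where
     * the hash value itself is stored.  bucket_bits stands in for
     * however many low-order bits the current split level consumes.
     */
    static uint32_t
    mini_bucket_for_hash(uint32_t hashval, uint32_t bucket_bits)
    {
        return (hashval >> bucket_bits) & (MINI_BUCKETS_PER_PAGE - 1);
    }

The bucket-selection logic itself would be unchanged; only the placement
of the value within the page uses the extra n bits.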
Regards,
Ken