On Fri, Jan 09, 2009 at 12:04:15PM -0800, Jeff Davis wrote: > On Mon, 2008-12-22 at 13:47 -0600, Kenneth Marshall wrote: > > Dear PostgreSQL developers, > > > > I am re-sending this to keep this last change to the > > internal hash function on the radar. > > > > Hi Ken, > > A few comments: > > 1. New patch with very minor changes attached. > > 2. I reverted the change you made to indices.sgml. We still don't use > WAL for hash indexes, and in my opinion we should continue to discourage > their use until we do use WAL. We can add back in the comment that hash > indexes are suitable for large keys if we have some results to show > that. > > 3. There was a regression test failure in union.sql because the ordering > of the results was different. I updated the regression test. > > 4. Hash functions affect a lot more than hash indexes, so I ran through > a variety of tests that use a HashAggregate plan. Test setup and results > are attached. These results show no difference between the old and the > new code (about 0.1% better). > > 5. The hash index build time shows some improvement. The new code won in > every instance in which a there were a lot of duplicates in the table > (100 distinct values, 50K of each) by around 5%. > > The new code appeared to be the same or slightly worse in the case of > hash index builds with few duplicates (1000000 distinct values, 5 of > each). The difference was about 1% worse, which is probably just noise. > > Note: I'm no expert on hash functions. Take all of my tests with a grain > of salt. > > I would feel a little better if I saw at least one test that showed > better performance of the new code on a reasonable-looking distribution > of data. The hash index build that you showed only took a second or two > -- it would be nice to see a test that lasted at least a minute. > > Regards, > Jeff Davis > >
Jeff, Thanks for the review. I would not really expect any differences in hash index build times other than normal noise variances. The most definitive benchmark that I have seen was done with my original patch submission in March posted by Luke of Greenplum: "We just applied this and saw a 5 percent speedup on a hash aggregation query with four columns in a 'group by' clause run against a single TPC-H table." I wonder if they would be willing to re-run their test? Thanks again. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers