Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

Kenneth Marshall Fri, 09 Jan 2009 12:29:54 -0800

On Fri, Jan 09, 2009 at 12:04:15PM -0800, Jeff Davis wrote:
> On Mon, 2008-12-22 at 13:47 -0600, Kenneth Marshall wrote: 
> > Dear PostgreSQL developers,
> > 
> > I am re-sending this to keep this last change to the
> > internal hash function on the radar.
> > 
> 
> Hi Ken,
> 
> A few comments:
> 
> 1. New patch with very minor changes attached.
> 
> 2. I reverted the change you made to indices.sgml. We still don't use
> WAL for hash indexes, and in my opinion we should continue to discourage
> their use until we do use WAL. We can add back in the comment that hash
> indexes are suitable for large keys if we have some results to show
> that.
> 
> 3. There was a regression test failure in union.sql because the ordering
> of the results was different. I updated the regression test.
> 
> 4. Hash functions affect a lot more than hash indexes, so I ran through
> a variety of tests that use a HashAggregate plan. Test setup and results
> are attached. These results show no difference between the old and the
> new code (about 0.1% better).
> 
> 5. The hash index build time shows some improvement. The new code won in
> every instance in which a there were a lot of duplicates in the table
> (100 distinct values, 50K of each) by around 5%.
> 
> The new code appeared to be the same or slightly worse in the case of
> hash index builds with few duplicates (1000000 distinct values, 5 of
> each). The difference was about 1% worse, which is probably just noise.
> 
> Note: I'm no expert on hash functions. Take all of my tests with a grain
> of salt.
> 
> I would feel a little better if I saw at least one test that showed
> better performance of the new code on a reasonable-looking distribution
> of data. The hash index build that you showed only took a second or two
> -- it would be nice to see a test that lasted at least a minute.
> 
> Regards,
>       Jeff Davis
> 
>


Jeff,

Thanks for the review. I would not really expect any differences in hash
index build times other than normal noise variances. The most definitive
benchmark that I have seen was done with my original patch submission
in March posted by Luke of Greenplum:

  "We just applied this and saw a 5 percent speedup on a hash aggregation
   query with four columns in a 'group by' clause run against a single
   TPC-H table."

I wonder if they would be willing to re-run their test? Thanks again.

Regards,
Ken


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

Reply via email to