On Wed, Jun 26, 2013 at 9:20 PM, Stephen Frost <sfr...@snowman.net> wrote:
> * Atri Sharma (atri.j...@gmail.com) wrote:
>> My point is that I would like to help in the implementation, if possible. :)
>
> Feel free to go ahead and implement it.. I'm not sure when I'll have a
> chance to (probably not in the next week or two anyway). Unfortunately,
> the bigger issue here is really about testing the results and
> determining if it's actually faster/better with various data sets
> (including ones which have duplicates). I've got one test data set
> which has some interesting characteristics (for one thing, hashing the
> "large" side and then seq-scanning the "small" side is actually faster
> than going the other way, which is quite 'odd' imv for a hashing
> system): http://snowman.net/~sfrost/test_case2.sql
>
> You might also look at the other emails that I sent regarding this
> subject and NTUP_PER_BUCKET. Having someone confirm what I saw wrt
> changing that parameter would be nice, and it would be a good comparison
> point against any kind of pre-filtering that we're doing.
>
> One thing that re-reading the bloom filter description reminded me of is
> that it's at least conceivable that we could take the existing hash
> functions for each data type and do double-hashing, or perhaps seed the
> value to be hashed with additional data, to produce an "independent" hash
> result to use. Again, a lot of things need to be tested and
> measured to see if they improve overall performance.
Right, let me look. Although, I am pretty busy at the moment with ordered set functions, so I will probably get it done in the last week of this month.

Another thing I believe is that we should have multiple hash functions for the bloom filter, ones that produce well-distributed, independent values, so that the coverage is good and the false-positive rate stays low.

Regards,

Atri

--
Regards,

Atri
l'apprenant

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
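(The double-hashing idea discussed above can be sketched roughly as below. This is only an illustration of the Kirsch-Mitzenmacher technique, where k "independent" probe positions are derived from just two base hashes; it is not PostgreSQL code, and the hash mixers stand in for whatever the type's existing hash support function would provide, perhaps seeded differently.)

```c
#include <stdint.h>
#include <string.h>

#define BLOOM_BITS 8192          /* filter size in bits (illustrative) */
#define BLOOM_K    4             /* number of probes per key */

typedef struct {
    uint8_t bits[BLOOM_BITS / 8];
} BloomFilter;

/* Two cheap integer mixers standing in for per-type hash functions.
 * In a real patch these would come from the datatype's hash opclass,
 * possibly seeded with extra data as suggested in the thread. */
static uint32_t hash1(uint32_t key)
{
    key ^= key >> 16;  key *= 0x85ebca6bu;
    key ^= key >> 13;  key *= 0xc2b2ae35u;
    key ^= key >> 16;
    return key;
}

static uint32_t hash2(uint32_t key)
{
    key = (key ^ 61u) ^ (key >> 16);
    key *= 9u;         key ^= key >> 4;
    key *= 0x27d4eb2du; key ^= key >> 15;
    return key | 1u;   /* force odd so successive probes always differ */
}

/* Kirsch-Mitzenmacher double hashing: probe i uses h1 + i*h2,
 * simulating k independent hash functions from only two. */
static void bloom_add(BloomFilter *bf, uint32_t key)
{
    uint32_t h1 = hash1(key), h2 = hash2(key);
    for (int i = 0; i < BLOOM_K; i++) {
        uint32_t bit = (h1 + (uint32_t) i * h2) % BLOOM_BITS;
        bf->bits[bit / 8] |= (uint8_t) (1u << (bit % 8));
    }
}

static int bloom_maybe_contains(const BloomFilter *bf, uint32_t key)
{
    uint32_t h1 = hash1(key), h2 = hash2(key);
    for (int i = 0; i < BLOOM_K; i++) {
        uint32_t bit = (h1 + (uint32_t) i * h2) % BLOOM_BITS;
        if (!(bf->bits[bit / 8] & (1u << (bit % 8))))
            return 0;  /* definitely not in the build side */
    }
    return 1;          /* possibly present; must still probe the hash table */
}
```

The point for the hash-join case is that the probe side can call bloom_maybe_contains() before touching the hash table at all, skipping bucket lookups for keys the filter proves absent; only the "possibly present" answers pay the full probe cost.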