Hi,

On 12/17/2015 10:50 AM, Shulgin, Oleksandr wrote:
> On Tue, Dec 15, 2015 at 11:30 PM, Tomas Vondra
> <tomas.von...@2ndquadrant.com> wrote:
>
>> Attached is a spreadsheet with results for various work_mem
>> values, and also with a smaller data set (just 30M rows in the fact
>> table), which easily fits into memory. Yet it shows similar gains,
>> shaving off ~40% in the best case, suggesting that this is not just
>> thanks to reduced I/O when forcing the temp files to disk.
>
> A neat idea! Have you also tried collecting statistics on the
> actual false-positive rates and filter allocation sizes at each of
> the collected data points?

The patch counts and prints the total number of lookups and the number of eliminated rows. The bloom filter is currently sized for a 5% false-positive rate, and the numbers I've seen match that.

I think ultimately we'll need to measure the actual false-positive rate, so that we can dynamically disable the bloom filter when it becomes inefficient. Some of that might also be worth exposing in EXPLAIN ANALYZE.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)