On Tue, 2012-01-24 at 16:07 +0400, Alexander Korotkov wrote:
> Hi!
> 
> 
> New version of patch is attached.

Thank you for the updates. I have a small patch attached.

The only code change I made was very minor: I changed the constants used
in the penalty function because your version used INFINITE_BOUND_PENALTY
when adding an empty range, and that didn't quite make sense to me. If
I'm mistaken you can leave it as-is.

I also attached range-gist-test.sql, which I used for a performance
test. I mix various types of ranges together in a larger table of 1.1M
tuples. And then I create a smaller table that only contains normal
ranges and empty ranges. There are two tests:
  1. Create an index on the big table
  2. Do a "range join" (using "overlaps" rather than "equals") where the
smaller table is on the outer side of a nested loop join and an index
scan over the larger table on the inner.

The index creation time reduces by a small amount with the patch, from
around 16s without the patch to around 13s with the patch. The query
time, however, dropped from around 26s to around 14s! Almost 2x speedup
with the patch!

Moreover, looking at the loop timing in the explain analyze output, it
goes from about "7..24" ms per loop down to about "1.5..13" ms per loop.
That seems to indicate that the index distribution is better, with more
queries returning quickly.

So, great work Alexander! Very convincing results.

Marking "ready for committer", but please apply my comment fixes at your
discretion.

Regards,
        Jeff Davis

PS: the test was run on my workstation (Intel(R) Core(TM) i7-2600 CPU @
3.40GHz) with work_mem=512MB, shared_buffers=512MB, and
checkpoint_segments=32. The rest of the settings were default.


Attachment: range-gist-comments.patch.gz
Description: GNU Zip compressed data

\timing on

drop table big;
drop table small;

create temp table tmp_foo(i int, ir int8range);
insert into tmp_foo select g % 100, 'empty'::int8range from generate_series(1,50000) g;
insert into tmp_foo select g % 100, int8range(NULL,NULL) from generate_series(1,10000) g;
insert into tmp_foo select g % 100, int8range(NULL,((random()-0.5)*g*10)::int8) from generate_series(1,20000) g;
insert into tmp_foo select g % 100, int8range(((random()-0.5)*g*10)::int8,NULL) from generate_series(1,20000) g;
insert into tmp_foo select g % 100,
  int8range(
    (g*10 + 10*(random()-0.5))::int8,
    (g*10 + 10 + 10*(random()-0.5))::int8 )
  from generate_series(1,1000000) g;

create table big as select * from tmp_foo order by random();
drop table tmp_foo;

create table tmp_foo(i int, ir int8range);
insert into tmp_foo select g*1000 % 100, 'empty'::int8range from generate_series(1,50) g;
insert into tmp_foo select g*1000 % 100,
  int8range(
    (g*10*1000 + 10*(random()-0.5))::int8,
    (g*10*1000 + 10 + 10*(random()-0.5))::int8 )
  from generate_series(1,1000) g;

create table small as select * from tmp_foo order by random();
drop table tmp_foo;

vacuum;
vacuum;
vacuum;

create index big_idx on big using gist (ir);

analyze;

set enable_bitmapscan=false;

explain analyze select sum(upper(intersection) - lower(intersection))
from (
  select small.ir * big.ir as intersection from small,big
  where small.ir && big.ir and not lower_inf(big.ir) and not upper_inf(big.ir)
) s;

set enable_bitmapscan to default;
set enable_indexscan=false;

explain analyze select sum(upper(intersection) - lower(intersection))
from (
  select small.ir * big.ir as intersection from small,big
  where small.ir && big.ir and not lower_inf(big.ir) and not upper_inf(big.ir)
) s;

set enable_indexscan to default;
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to