We just run into a very simple query that the planner does much worse on
than we thought it would (in production the table in question is ~ 100
GB).  It surprised us given the planner is generally quite good, so I
thought I share our surprise

Setup:

postgres_prod@proddb_testing=# select version();[1]
                                                    version

────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 PostgreSQL 9.2.16 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7
20120313 (Red Hat 4.4.7-16), 64-bit
(1 row)

Time: 69.246 ms

postgres_prod@proddb_testing=# create table toy_data3 (the_date date, i
int);
CREATE TABLE
Time: 67.096 ms
postgres_prod@proddb_testing=# insert into toy_data3

  (select current_date-(s.idx/1000), s.idx from generate_series(1,1000000)
as s(idx));
INSERT 0 1000000
Time: 1617.483 ms
postgres_prod@proddb_testing=# create index toy_data_date3 on
toy_data3(the_date);
CREATE INDEX
Time: 660.166 ms
postgres_prod@proddb_testing=# analyze toy_data3;
ANALYZE
Time: 294.984 ms

The bad behavior:

postgres_prod@proddb_testing=# explain analyze
  select * from (
   select td1.the_date, td1.i
    from toy_data3 td1, toy_data3 td2  where td1.the_date = td2.the_date
and td1.i = td2.i
  ) foo
  where the_date between current_date and current_date;
                                                                   QUERY
PLAN
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Hash Join  (cost=55.49..21980.50 rows=1 width=8) (actual
time=0.336..179.374 rows=999 loops=1)
   Hash Cond: ((td2.the_date = td1.the_date) AND (td2.i = td1.i))
   ->  Seq Scan on toy_data3 td2  (cost=0.00..14425.00 rows=1000000
width=8) (actual time=0.007..72.510 rows=1000000 lo
   ->  Hash  (cost=40.44..40.44 rows=1003 width=8) (actual
time=0.321..0.321 rows=999 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 40kB
         ->  Index Scan using toy_data_date3 on toy_data3 td1
 (cost=0.01..40.44 rows=1003 width=8) (actual time=0.018.
               Index Cond: ((the_date >= ('now'::cstring)::date) AND
(the_date <= ('now'::cstring)::date))
 Total runtime: 179.440 ms
(8 rows)

Time: 246.094 ms

Notice the red.  Which is sad because one would like it to realize that it
could propagate the index constraint onto td2.  That is on both sides of
the join do the green.

As it does correctly when one explicitly uses equality (bold below) (but of
course we sometimes have multiple day ranges in production and we only used
a single date range above to make it extra interesting for the planner to
NOT do a seqscan):

postgres_prod@proddb_testing=# explain analyze
  select * from (
   select td1.the_date, td1.i
    from toy_data3 td1, toy_data3 td2  where td1.the_date = td2.the_date
and td1.i = td2.i        ) foo
  where *the_date = current_date*;
                                                                   QUERY
PLAN
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Hash Join  (cost=50.47..92.17 rows=1 width=8) (actual time=0.300..0.652
rows=999 loops=1)
   Hash Cond: (td1.i = td2.i)
   ->  Index Scan using toy_data_date3 on toy_data3 td1  (cost=0.00..37.93
rows=1003 width=8) (actual time=0.023..0.169
         Index Cond: (the_date = ('now'::cstring)::date)
   ->  Hash  (cost=37.93..37.93 rows=1003 width=8) (actual
time=0.270..0.270 rows=999 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 40kB
         ->  Index Scan using toy_data_date3 on toy_data3 td2
 (cost=0.00..37.93 rows=1003 width=8) (actual time=0.007.
               Index Cond: (the_date = ('now'::cstring)::date)
 Total runtime: 0.713 ms
(9 rows)

Time: 66.904 ms

Cheers,

Bene

Reply via email to