I would be appreciative if somebody could help explain why we have two nearly
identical queries taking two different planner routes; one a nested index loop
that takes about 5s to complete, and the other a hash join & heap scan that
takes about 2hr. This is using Postgres 9.3.3 on OS X 10.9.4.
These two queries are different only in the value of an hstore entry and a date
entry:
SELECT *, testruns.id FROM testruns JOIN test_types ON
testruns.test_type_id=test_types.id WHERE testruns.custom_data->'SpawnID' =
'SpawnID-428842195.338828' AND testruns.started_at > '2014-08-03
10:49:55.338828';
SELECT *, testruns.id FROM testruns JOIN test_types ON
testruns.test_type_id=test_types.id WHERE testruns.custom_data->'SpawnID' =
'SpawnID-428870395.258592' AND testruns.started_at > '2014-08-03
18:39:55.258592';
We have built an index for the hstore entry for SpawnID and most queries use it
correctly.
Column | Type |
Modifiers
------------------+-----------------------------+-------------------------------------------------------
id | integer | not null default
nextval('testruns_id_seq'::regclass)
started_at | timestamp without time zone | not null
...
Indexes:
...
"index_testruns_on_custom_spawnid" btree ((custom_data -> 'SpawnID'::text))
However after realizing that these two queries differed in execution time so
drastically, an explain showed that they are taking two very different paths:
db=# explain SELECT *, testruns.id FROM testruns JOIN test_types ON
testruns.test_type_id=test_types.id WHERE testruns.custom_data->'SpawnID' =
'SpawnID-428842195.338828' AND testruns.started_at > '2014-08-03
10:49:55.338828';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.85..2696.12 rows=88 width=1466)
-> Index Scan using index_testruns_on_custom_spawnid on testruns
(cost=0.57..2378.40 rows=88 width=1399)
Index Cond: ((custom_data -> 'SpawnID'::text) =
'SpawnID-428842195.338828'::text)
Filter: (started_at > '2014-08-03 10:49:55.338828'::timestamp without
time zone)
-> Index Scan using test_types_pkey on test_types (cost=0.29..3.60 rows=1
width=67)
Index Cond: (id = testruns.test_type_id)
(6 rows)
Time: 22.363 ms
db=# explain SELECT *, testruns.id FROM testruns JOIN test_types ON
testruns.test_type_id=test_types.id WHERE testruns.custom_data->'SpawnID' =
'SpawnID-428870395.258592' AND testruns.started_at > '2014-08-03
18:39:55.258592';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=292249.24..348608.93 rows=28273 width=1466)
Hash Cond: (testruns.test_type_id = test_types.id)
-> Bitmap Heap Scan on testruns (cost=291510.31..347269.21 rows=28273
width=1399)
Recheck Cond: (((custom_data -> 'SpawnID'::text) =
'SpawnID-428870395.258592'::text) AND (started_at > '2014-08-03
18:39:55.258592'::timestamp without time zone))
-> BitmapAnd (cost=291510.31..291510.31 rows=28273 width=0)
-> Bitmap Index Scan on index_testruns_on_custom_spawnid
(cost=0.00..41383.84 rows=500170 width=0)
Index Cond: ((custom_data -> 'SpawnID'::text) =
'SpawnID-428870395.258592'::text)
-> Bitmap Index Scan on index_testruns_on_started_at_2
(cost=0.00..250112.08 rows=3188736 width=0)
Index Cond: (started_at > '2014-08-03
18:39:55.258592'::timestamp without time zone)
-> Hash (cost=563.41..563.41 rows=14041 width=67)
-> Seq Scan on test_types (cost=0.00..563.41 rows=14041 width=67)
(11 rows)
Time: 1.231 ms
Mainly, I don't understand why two queries constructed the same but with subtle
value differences would take two different paths through the analyzer. We
haven't fiddled with the default planner options, aside from random_page_cost
and effective_cache_size to match our hardware. Re-analyzing the table does
not seem to change the cost. The index size is somewhat large (16G for a 114G
table), and I haven't tried rebuilding it yet.
thanks,
~ john
--
Sent via pgsql-general mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general