Hello, > "index_testruns_on_custom_spawnid" btree ((custom_data -> 'SpawnID'::text)) > .. > WHERE testruns.custom_data->'SpawnID' = 'SpawnID-428842195.338828' > ...
If all your SpawnID have this prefix, you may consider remove it from your index to reduce its size: => "index_testruns_on_custom_spawnid" btree ((substring(custom_data -> 'SpawnID'::text) , 9)) you must of course then use exactly this clause within your query: => WHERE substring(testruns.custom_data->'SpawnID' , 9) = '428842195.338828' And as it now looks like a number, maybe go a step further depending on which patterns these SpawnID can have. This given value can for example be stored as 2 int4 or one int8: ('428842195.338828'::numeric * 1000000)::int8 On the other hand this will only address a limited part of your performance issue. regards, Marc Mamin > -----Original Message----- > From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general- > ow...@postgresql.org] On Behalf Of john gale > Sent: Dienstag, 5. August 2014 21:17 > To: pgsql-general@postgresql.org general > Subject: [GENERAL] understanding why two nearly identical queries take two > different planner routes, one 5s and one 2hr > > > I would be appreciative if somebody could help explain why we have two nearly > identical queries taking two different planner routes; one a nested index > loop that takes about 5s to complete, and the other a hash join & heap scan > that takes about 2hr. This is using Postgres 9.3.3 on OS X 10.9.4. > > These two queries are different only in the value of an hstore entry and a > date entry: > > SELECT *, testruns.id FROM testruns JOIN test_types ON > testruns.test_type_id=test_types.id WHERE testruns.custom_data->'SpawnID' = > 'SpawnID-428842195.338828' AND testruns.started_at > '2014-08-03 > 10:49:55.338828'; SELECT *, testruns.id FROM testruns JOIN test_types ON > testruns.test_type_id=test_types.id WHERE testruns.custom_data->'SpawnID' = > 'SpawnID-428870395.258592' AND testruns.started_at > '2014-08-03 > 18:39:55.258592'; > > We have built an index for the hstore entry for SpawnID and most queries use > it correctly. > > Column | Type | > Modifiers > ------------------+-----------------------------+----------------------- > ------------------+-----------------------------+----------------------- > ------------------+-----------------------------+--------- > id | integer | not null default > nextval('testruns_id_seq'::regclass) > started_at | timestamp without time zone | not null > ... > Indexes: > ... > "index_testruns_on_custom_spawnid" btree ((custom_data -> > 'SpawnID'::text)) > > However after realizing that these two queries differed in execution time so > drastically, an explain showed that they are taking two very different paths: > > db=# explain SELECT *, testruns.id FROM testruns JOIN test_types ON > testruns.test_type_id=test_types.id WHERE testruns.custom_data->'SpawnID' = > 'SpawnID-428842195.338828' AND testruns.started_at > '2014-08-03 > 10:49:55.338828'; > QUERY PLAN > ------------------------------------------------------------------------------ > -------------------------------- > Nested Loop (cost=0.85..2696.12 rows=88 width=1466) > -> Index Scan using index_testruns_on_custom_spawnid on testruns > (cost=0.57..2378.40 rows=88 width=1399) > Index Cond: ((custom_data -> 'SpawnID'::text) = 'SpawnID- > 428842195.338828'::text) > Filter: (started_at > '2014-08-03 10:49:55.338828'::timestamp without > time zone) > -> Index Scan using test_types_pkey on test_types (cost=0.29..3.60 rows=1 > width=67) > Index Cond: (id = testruns.test_type_id) > (6 rows) > > Time: 22.363 ms > db=# explain SELECT *, testruns.id FROM testruns JOIN test_types ON > testruns.test_type_id=test_types.id WHERE testruns.custom_data->'SpawnID' = > 'SpawnID-428870395.258592' AND testruns.started_at > '2014-08-03 > 18:39:55.258592'; > > QUERY PLAN > ------------------------------------------------------------------------------ > ------------------------------------------------------------------------------ > ---------------- > Hash Join (cost=292249.24..348608.93 rows=28273 width=1466) > Hash Cond: (testruns.test_type_id = test_types.id) > -> Bitmap Heap Scan on testruns (cost=291510.31..347269.21 rows=28273 > width=1399) > Recheck Cond: (((custom_data -> 'SpawnID'::text) = 'SpawnID- > 428870395.258592'::text) AND (started_at > '2014-08-03 > 18:39:55.258592'::timestamp without time zone)) > -> BitmapAnd (cost=291510.31..291510.31 rows=28273 width=0) > -> Bitmap Index Scan on index_testruns_on_custom_spawnid > (cost=0.00..41383.84 rows=500170 width=0) > Index Cond: ((custom_data -> 'SpawnID'::text) = 'SpawnID- > 428870395.258592'::text) > -> Bitmap Index Scan on index_testruns_on_started_at_2 > (cost=0.00..250112.08 rows=3188736 width=0) > Index Cond: (started_at > '2014-08-03 > 18:39:55.258592'::timestamp without time zone) > -> Hash (cost=563.41..563.41 rows=14041 width=67) > -> Seq Scan on test_types (cost=0.00..563.41 rows=14041 width=67) > (11 rows) > > Time: 1.231 ms > > > Mainly, I don't understand why two queries constructed the same but with > subtle value differences would take two different paths through the analyzer. > We haven't fiddled with the default planner options, aside from > random_page_cost and effective_cache_size to match our hardware. Re-analyzing > the table does not seem to change the cost. The index size is somewhat large > (16G for a 114G table), and I haven't tried rebuilding it yet. > > thanks, > > ~ john > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make > changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general