Hello, we have some confusion over the planner's use of an index.

Suppose we have a table "parades" with columns:

    "city_id" of type integer
    "description" of type text
    "start_time" of type timestamp without time zone

Suppose also we have indexes:

    "parades_city_id_description_tsv_index" gin (city_id,
to_tsvector('simple'::regconfig, description)) WHERE description IS NOT NULL
    "parades_city_id_start_time_index" btree (city_id, start_time)

When we EXPLAIN the query

    SELECT * FROM "parades" WHERE ((description IS NOT NULL) AND
(to_tsvector('simple', description) @@ to_tsquery('simple', 'fun')) AND
("city_id" IN (<roughly 50 ids>)));

We get

    Bitmap Heap Scan on parades  (cost=12691.97..18831.21 rows=2559
width=886)
      Recheck Cond: ((to_tsvector('simple'::regconfig, description) @@
'''fun'''::tsquery) AND (description IS NOT NULL) AND (city_id = ANY
('{<roughly 50 ids>}'::integer[])))
      ->  BitmapAnd  (cost=12691.97..12691.97 rows=2559 width=0)
            ->  Bitmap Index Scan on parades_city_id_description_tsv_index
 (cost=0.00..2902.97 rows=229463 width=0)
                  Index Cond: (to_tsvector('simple'::regconfig, title) @@
'''fun'''::tsquery)
            ->  Bitmap Index Scan on parades_city_id_start_time_index
 (cost=0.00..9787.47 rows=565483 width=0)
                  Index Cond: (city_id = ANY ('{<roughly 50
ids>}'::integer[]))

When we EXPLAIN the same query but with one city_id

    SELECT * FROM "parades" WHERE ((description IS NOT NULL) AND
(to_tsvector('simple', description) @@ to_tsquery('simple', 'fun')) AND
("city_id" IN (1)));

We get

     Bitmap Heap Scan on parades  (cost=36.20..81.45 rows=20 width=886)
       Recheck Cond: ((city_id = 1) AND (to_tsvector('simple'::regconfig,
description) @@ '''fun'''::tsquery) AND (description IS NOT NULL))
       ->  Bitmap Index Scan on parades_city_id_description_tsv_index
(cost=0.00..36.20
rows=20 width=0)
             Index Cond: ((city_id = 1) AND
(to_tsvector('simple'::regconfig, description) @@ '''fun'''::tsquery))

This leaves us with two questions:

1. How is postgres able to use parades_city_id_description_tsv_index in the
first explain result without any filter on "city_id"?
2. Why does the planner in the first query decide not to simply use
parades_city_id_description_tsv_index (as in the second explain result)
when the cardinality of the set of "city_id"s is high?

Thanks,
Jared

Reply via email to