Dmitri Bichko wrote:
Hello,

I have two tables, one has a foreing key from the other (only showing
the relevant columns and indices here):

             Table "expresso.probes"
   Column    |          Type          | Modifiers
-------------+------------------------+-----------
 platform_id | integer                | not null
 probe_num   | integer                | not null
 mrna_acc    | character varying(50)  |
Indexes:
    "idx_probes_mrna_acc" btree (mrna_acc, platform_id) WHERE mrna_acc
IS NOT NULL

           Table "expresso.mrna_info"
   Column    |         Type          | Modifiers
-------------+-----------------------+-----------
 mrna_acc    | character varying(25) | not null
 symbol      | character varying(50) | not null
Indexes:
    "idx_mrna_info_symbol" btree (upper(symbol::text)) WHERE symbol IS
NOT NULL

1. The two types of mrna_acc don't match - one has a max length of 25, one 50. Why? 2. With idx_probes_mrna_acc, why WHERE mrna_acc IS NOT NULL? NULLs aren't indexed anyway. 3. You say there is a foreign key, but I don't even see a primary key anywhere. I'm guessing mrna_info.mrna_acc is the primary key for that table.

Here are the explains for the two step process:

expression=> EXPLAIN ANALYZE SELECT mrna_acc FROM mrna_info WHERE
upper(symbol) = upper('pparg') AND symbol IS NOT NULL;

As someone else mentions, the IS NOT NULL is redundant.

 Index Scan using idx_mrna_info_symbol on mrna_info  (cost=0.00..2934.78
rows=930 width=12) (actual time=0.038..0.089 rows=12 loops=1)

Note that the estimated number of rows is wrong though (930 rather than the actual 12).

EXPLAIN ANALYZE SELECT platform_id, probe_num FROM probes WHERE mrna_acc
IN
('U10374','U09138','U01841','U01664','NM_015869','NM_013124','NM_011146'
,'NM_005037','D83233','BC021798','BC006811','AB011365') AND mrna_acc IS
NOT NULL;

Again, a redundant IS NOT NULL, which presumably you're putting in to use the index.

 Index Scan using idx_probes_mrna_acc, idx_probes_mrna_acc,
idx_probes_mrna_acc, idx_probes_mrna_acc, idx_probes_mrna_acc,
idx_probes_mrna_acc, idx_probes_mrna_acc, idx_probes_mrna_acc,
idx_probes_mrna_acc, idx_probes_mrna_acc, idx_probes_mrna_acc,
idx_probes_mrna_acc on probes  (cost=0.00..14710.63 rows=4151 width=8)
(actual time=0.040..0.719 rows=142 loops=1)

Again, it's getting the row estimate badly wrong (4151 vs 142).

And here is the explain of the join (it's essentially the same plan as
the subselect and all the other ways I've tried):

expression=> explain SELECT platform_id, probe_num FROM mrna_info m,
probes p WHERE m.mrna_acc = p.mrna_acc and p.mrna_acc is not null and
UPPER(symbol) = UPPER('pparg') AND m.symbol IS NOT NULL;

Well, firstly get rid of the redundant "IS NOT NULL"s in the query and the indexes, then vacuum analyse the tables and post an EXPLAIN ANALYSE.

The problem will probably turn out to be poor row estimates (you can increase the statistics gathered on the mrna_acc values) or poor configuration settings (making indexes look expensive compared to sequential scans).

--
  Richard Huxton
  Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Reply via email to