Hi all,

I've been staring at this for hours (if not days). Any hints are appreciated! 
At first I thought "there must be a way to make postgresql perform on this 
thing", but i've lost hope that pgsql actually can deal with it..

The query is:

SELECT DISTINCT
  posrel.pos,
  questions.number,
  questions.title,
  answers.number, 
  answers.title,
  answers.body 
FROM 
  bar_portals portals, 
  ONLY bar_insrel insrel,
  bar_faq faq, 
  ONLY bar_posrel posrel, 
  bar_questions questions,
  bar_insrel insrel0,
  bar_answers answers 
WHERE 
    portals.number=2534202 
  AND 
    (portals.number=insrel.snumber AND faq.number=insrel.dnumber) 
  AND 
    (faq.number=posrel.snumber AND questions.number=posrel.dnumber AND 
posrel.rnumber=780) 
  AND 
    (
        (questions.number=insrel0.dnumber AND answers.number=insrel0.snumber 
AND insrel0.dir<>1) 
      OR 
        (questions.number=insrel0.snumber AND answers.number=insrel0.dnumber)
    ) 
  AND 
    (
          portals.owner NOT IN ('notinuse','productie') 
      AND insrel.owner NOT IN ('notinuse','productie') 
      AND faq.owner NOT IN ('notinuse','productie') 
      AND posrel.owner NOT IN ('notinuse','productie') 
      AND questions.owner NOT IN ('notinuse','productie') 
      AND insrel0.owner NOT IN ('notinuse','productie') 
      AND answers.owner NOT IN ('notinuse','productie')
    ) 
ORDER BY posrel.pos ASC;

A whole mouth full. Basically, we have portals, which are linked through insrel 
with faq's, which are linked through posrel to questions and finally these 
questions are linked to answers by insrel again.

Insrel is a horrible table that links together every object in this system, not 
just faq's and such. Posrel is similar (actually contains a subset of insrel).

Now, the explain analyze output:

                                                                                
                     QUERY PLAN                                                 
                                                    
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=784671.21..784674.86 rows=209 width=252) (actual 
time=46493.975..46494.052 rows=19 loops=1)
   ->  Sort  (cost=784671.21..784671.73 rows=209 width=252) (actual 
time=46493.969..46493.987 rows=19 loops=1)
         Sort Key: posrel.pos, questions.number, questions.title, 
answers.number, answers.title, answers.body
         ->  Nested Loop  (cost=1.74..784663.15 rows=209 width=252) (actual 
time=3467.099..46493.621 rows=19 loops=1)
               Join Filter: (((questions.number = insrel0.dnumber) AND 
(answers.number = insrel0.snumber) AND (insrel0.dir <> 1)) OR 
((questions.number = insrel0.snumber) AND (answers.number = insrel0.dnumber)))
               ->  Nested Loop  (cost=0.00..6769.08 rows=181978 width=252) 
(actual time=1.962..6548.477 rows=1732040 loops=1)
                     ->  Nested Loop  (cost=0.00..282.30 rows=2 width=33) 
(actual time=1.287..7.785 rows=19 loops=1)
                           ->  Nested Loop  (cost=0.00..277.16 rows=18 width=8) 
(actual time=1.030..5.504 rows=19 loops=1)
                                 ->  Nested Loop  (cost=0.00..182.63 rows=1 
width=8) (actual time=0.921..4.445 rows=1 loops=1)
                                       ->  Nested Loop  (cost=0.00..147.05 
rows=64 width=4) (actual time=0.060..3.222 rows=269 loops=1)
                                             ->  Seq Scan on foo_portals 
portals  (cost=0.00..1.27 rows=1 width=4) (actual time=0.017..0.027 rows=1 
loops=1)
                                                   Filter: ((number = 2534202) 
AND ("owner" <> ALL ('{notinuse,productie}'::text[])))
                                             ->  Index Scan using 
foo_insrel_snumber_idx on foo_insrel insrel  (cost=0.00..145.14 rows=64 
width=8) (actual time=0.038..2.673 rows=269 loops=1)
                                                   Index Cond: (snumber = 
2534202)
                                                   Filter: ("owner" <> ALL 
('{notinuse,productie}'::text[]))
                                       ->  Index Scan using foo_faq_main_idx on 
foo_faq faq  (cost=0.00..0.54 rows=1 width=4) (actual time=0.003..0.003 rows=0 
loops=269)
                                             Index Cond: (faq.number = 
insrel.dnumber)
                                             Filter: ("owner" <> ALL 
('{notinuse,productie}'::text[]))
                                 ->  Index Scan using 
foo_posrel_snumber_owner_idx on foo_posrel posrel  (cost=0.00..93.97 rows=45 
width=12) (actual time=0.105..0.984 rows=19 loops=1)
                                       Index Cond: (faq.number = posrel.snumber)
                                       Filter: (rnumber = 780)
                           ->  Index Scan using foo_questions_main_idx on 
foo_questions questions  (cost=0.00..0.27 rows=1 width=29) (actual 
time=0.106..0.112 rows=1 loops=19)
                                 Index Cond: (questions.number = posrel.dnumber)
                                 Filter: ("owner" <> ALL 
('{notinuse,productie}'::text[]))
                     ->  Seq Scan on foo_answers answers  (cost=0.00..2333.50 
rows=90989 width=219) (actual time=0.061..162.481 rows=91160 loops=19)
                           Filter: ("owner" <> ALL 
('{notinuse,productie}'::text[]))
               ->  Bitmap Heap Scan on foo_insrel insrel0  (cost=1.74..4.25 
rows=1 width=12) (actual time=0.020..0.020 rows=0 loops=1732040)
                     Recheck Cond: (((questions.number = insrel0.dnumber) AND 
(answers.number = insrel0.snumber)) OR ((answers.number = insrel0.dnumber) AND 
(questions.number = insrel0.snumber)))
                     Filter: ("owner" <> ALL ('{notinuse,productie}'::text[]))
                     ->  BitmapOr  (cost=1.74..1.74 rows=1 width=0) (actual 
time=0.018..0.018 rows=0 loops=1732040)
                           ->  Bitmap Index Scan on 
foo_inresl_dnumber_snumber_idx  (cost=0.00..0.87 rows=1 width=0) (actual 
time=0.007..0.007 rows=0 loops=1732040)
                                 Index Cond: ((questions.number = 
insrel0.dnumber) AND (answers.number = insrel0.snumber))
                           ->  Bitmap Index Scan on 
foo_inresl_dnumber_snumber_idx  (cost=0.00..0.87 rows=1 width=0) (actual 
time=0.008..0.008 rows=0 loops=1732040)
                                 Index Cond: ((answers.number = 
insrel0.dnumber) AND (questions.number = insrel0.snumber))
 Total runtime: 46494.271 ms
(35 rows)

As you can see, 46 seconds. Not quite impressive.

Now, when I split up the OR in two distinct queries, everything is nice and 
fast. Both queries run in sub-second time.

Anyway, back to the query plan. The plan nicely starts with portals, insrel, 
faq, posrel and questions. But then it proposes something totally stupid, it 
seems it does a cartesian product of those 19-odd questions it got so far and 
the 91000 answers (the seq scan on foo_answers), and then goes to check for 
each of the pairs in that big set whether the disjunction holds.

Why would it do this? And more importantly: how do i get it to do it in a 
smarter way; eg. by retrieving for each of the questions the corresponding 
answers by traversing the insrel disjunction using indices etc..

My guess (and i'd like to know whether i'm 'warm' or 'cold' here) is that 
insrel is such a messed up table, that postgres fails to extract representative 
statistics from a small sample. The most important correlation is probably in 
snumber/dnumber, but the characteristics of this correlation vary wildly: there 
might be one snumber with a million dnumbers and a million snumbers with only 
one dnumber.

I've tried bigger statistics targets on the insrel dnumber/snumber columns, 
even maxed it out to 1000, but that did not help.

Anyway, any hints on getting this beast perform (without rewriting the query, 
that's not possible in this case due to the query being generated by an ORM) 
are welcome. I'm starting to think it is impossible, and that postgresql just 
doesn't work for this particular query+dataset.


Thanks,

Koen

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Reply via email to