All right, after some discussion on StackOverflow we found out that the
incorrect query plan is generated because of the LIMIT keyword in the query.
I guess PostgreSQL expects to find the matching rows quickly and that's why
it generates a seq scan on the table. If I remove LIMIT 1000, everything
executes in several milliseconds and the query plan looks like this:
Hash Join  (cost=2662004.85..14948213.44 rows=22661658 width=138) (actual time=0.105..0.105 rows=0 loops=1)
  Hash Cond: (p."PaymentOrderId" = po."Id")
  ->  Seq Scan on "Payments" p  (cost=0.00..5724570.00 rows=350000000 width=18) (actual time=0.018..0.018 rows=1 loops=1)
  ->  Hash  (cost=2583365.85..2583365.85 rows=2614480 width=120) (actual time=0.046..0.046 rows=0 loops=1)
        Buckets: 8192  Batches: 64  Memory Usage: 0kB
        ->  Hash Join  (cost=904687.05..2583365.85 rows=2614480 width=120) (actual time=0.046..0.046 rows=0 loops=1)
              Hash Cond: (po."UserId" = u."Id")
              ->  Seq Scan on "PaymentOrders" po  (cost=0.00..654767.00 rows=40000000 width=24) (actual time=0.003..0.003 rows=1 loops=1)
              ->  Hash  (cost=850909.04..850909.04 rows=1980881 width=96) (actual time=0.016..0.016 rows=0 loops=1)
                    Buckets: 8192  Batches: 32  Memory Usage: 0kB
                    ->  Hash Join  (cost=1.20..850909.04 rows=1980881 width=96) (actual time=0.016..0.016 rows=0 loops=1)
                          Hash Cond: (u."RoleId" = r."Id")
                          ->  Seq Scan on "Users" u  (cost=0.00..718598.20 rows=30000220 width=80) (actual time=0.002..0.002 rows=1 loops=1)
                          ->  Hash  (cost=1.19..1.19 rows=1 width=16) (actual time=0.009..0.009 rows=0 loops=1)
                                Buckets: 1024  Batches: 1  Memory Usage: 0kB
                                ->  Seq Scan on "Roles" r  (cost=0.00..1.19 rows=1 width=16) (actual time=0.009..0.009 rows=0 loops=1)
                                      Filter: (("Name")::text = 'Moses2333'::text)
                                      Rows Removed by Filter: 15
Total runtime: 0.209 ms
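As a side note, a way to confirm that the planner genuinely prefers the seq
scan under LIMIT (rather than missing an index) would be to disable
sequential scans for the session and compare the plans. This is only a
diagnostic sketch, not a fix:

SET enable_seqscan = off;  -- discourage seq scans so the planner shows its alternative

EXPLAIN ANALYZE
SELECT * FROM "Payments" AS p
INNER JOIN "PaymentOrders" AS po ON po."Id" = p."PaymentOrderId"
INNER JOIN "Users" AS u ON u."Id" = po."UserId"
INNER JOIN "Roles" AS r ON u."RoleId" = r."Id"
WHERE r."Name" = 'Moses'
LIMIT 1000;

RESET enable_seqscan;      -- restore the default planner behavior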
Following Erwin Brandstetter's suggestion, I also tried pushing the query
into a subquery and applying LIMIT there:
SELECT *
FROM  (
   SELECT *
   FROM   "Roles"         AS r
   JOIN   "Users"         AS u  ON u."RoleId" = r."Id"
   JOIN   "PaymentOrders" AS po ON po."UserId" = u."Id"
   JOIN   "Payments"      AS p  ON p."PaymentOrderId" = po."Id"
   WHERE  r."Name" = 'Moses'
   ) x
LIMIT  1000;
but this solution also generates an incorrect query plan. Any idea how to
solve this query without omitting the LIMIT keyword?
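One more idea (just a sketch, untested here): since in 9.2 a WITH query is
always an optimization fence, moving the join into a CTE should let it be
planned without any knowledge of the outer LIMIT:

WITH x AS (
   SELECT *
   FROM   "Roles"         AS r
   JOIN   "Users"         AS u  ON u."RoleId" = r."Id"
   JOIN   "PaymentOrders" AS po ON po."UserId" = u."Id"
   JOIN   "Payments"      AS p  ON p."PaymentOrderId" = po."Id"
   WHERE  r."Name" = 'Moses'
)
SELECT * FROM x   -- the CTE scan feeds the LIMIT, but its plan is fixed
LIMIT  1000;

Thanks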
> Subject: RE: [PERFORM] PostgreSQL strange query plan for my query
> Date: Fri, 16 Nov 2012 13:55:41 +0100
> From: laurenz.a...@wien.gv.at
> To: dato0...@hotmail.com; pgsql-performance@postgresql.org
> 
> David Popiashvili wrote:
> > I have a database with a few hundred million rows. I'm running the
> > following query:
> > 
> > SELECT * FROM "Payments" AS p
> > INNER JOIN "PaymentOrders" AS po
> >   ON po."Id" = p."PaymentOrderId"
> > INNER JOIN "Users" AS u
> >   ON u."Id" = po."UserId"
> > INNER JOIN "Roles" AS r
> >   ON u."RoleId" = r."Id"
> > WHERE r."Name" = 'Moses'
> > LIMIT 1000
> > When the where clause finds a match in the database, I get the result in
> > several milliseconds, but if I modify the query and specify a non-existent
> > r."Name" in the where clause, it takes too much time to complete. I guess
> > that PostgreSQL is doing a sequential scan on the Payments table (which
> > contains the most rows), comparing each row one by one.
> > Isn't PostgreSQL smart enough to check first whether the Roles table
> > contains any row with Name 'Moses'?
> > 
> > The Roles table contains only 15 rows, while Payments contains ~350
> > million.
> > 
> > I'm running PostgreSQL 9.2.1.
> 
> > Here are the explain analyze results: http://explain.depesz.com/s/7e7
> 
> Can you also show the plan for the good case?
> 
> Yours,
> Laurenz Albe
                                          
