Re: [PERFORM] Two "equivalent" WITH RECURSIVE queries, one of them slow.

2010-07-07 Thread Merlin Moncure
On Mon, Jul 5, 2010 at 2:07 AM, Octavio Alvarez
 wrote:
> Hello.
>
> I have a tree-like table with a three-field PK (name, date, id) and one
> parent field.
> It has 5k to 6k records as of now, but it will hold about 1 million records.
>
> I am trying the following WITH RECURSIVE query:
>
> WITH RECURSIVE t AS (
>                 SELECT par.id AS tid, par.name, par.date, par.id, par.text,
> par.h_title, par.h_name, par.parent
>                   FROM _books.par
>        UNION
>                 SELECT t.tid AS pid, p.name, p.date, p.id, p.text,
> p.h_title, p.h_name, p.parent
>                   FROM t, _books.par p
>                  WHERE p.name = t.name AND p.date = t.date AND t.id =
> p.parent
>        )
>  SELECT t.tid, t.name, t.date, t.id, t.text, t.h_title, t.h_name, t.parent
>   FROM t WHERE name = 'cfx' AND date = '2009-08-19' AND tid = '28340';
>
> ... which takes 2547.503 ms
>
> However, if I try the same query but adding the same WHERE clause to the
> non-recursive term, I get much better results.
>
>
> WITH RECURSIVE t AS (
>                 SELECT par.id AS tid, par.name, par.date, par.id, par.text,
> par.h_title, par.h_name, par.parent
>                   FROM _books.par WHERE name = 'cfx' AND date = '2009-08-19'
> AND par.id = '28340'
>        UNION
>                 SELECT t.tid AS pid, p.name, p.date, p.id, p.text,
> p.h_title, p.h_name, p.parent
>                   FROM t, _books.par p
>                  WHERE p.name = t.name AND p.date = t.date AND t.id =
> p.parent
>        )
>  SELECT t.tid, t.name, t.date, t.id, t.text, t.h_title, t.h_name, t.parent
>   FROM t WHERE name = 'cfx' AND date = '2009-08-19' AND tid = '28340';
>
> ... which takes 0.221 ms

If you want the fast plan, you might want to consider reworking your
query into a set returning function.  It's pretty easy to do:


create or replace function f(arg int) returns setof something as
$$
  with recursive foo as
  (
select * from bar where id = $1
  union all
[...]
  )
  select * from foo
$$ language sql;

Obviously, a pure view approach would be nicer but it just isn't going
to hapen at present.  CTE are currently problematic generally when you
need quals in the 'with' term, especially in the case of recursive
CTE.

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Two "equivalent" WITH RECURSIVE queries, one of them slow.

2010-07-06 Thread Robert Haas
On Mon, Jul 5, 2010 at 2:07 AM, Octavio Alvarez
 wrote:
> Hello.
>
> I have a tree-like table with a three-field PK (name, date, id) and one
> parent field.
> It has 5k to 6k records as of now, but it will hold about 1 million records.
>
> I am trying the following WITH RECURSIVE query:
>
> WITH RECURSIVE t AS (
>                 SELECT par.id AS tid, par.name, par.date, par.id, par.text,
> par.h_title, par.h_name, par.parent
>                   FROM _books.par
>        UNION
>                 SELECT t.tid AS pid, p.name, p.date, p.id, p.text,
> p.h_title, p.h_name, p.parent
>                   FROM t, _books.par p
>                  WHERE p.name = t.name AND p.date = t.date AND t.id =
> p.parent
>        )
>  SELECT t.tid, t.name, t.date, t.id, t.text, t.h_title, t.h_name, t.parent
>   FROM t WHERE name = 'cfx' AND date = '2009-08-19' AND tid = '28340';
>
> ... which takes 2547.503 ms
>
> However, if I try the same query but adding the same WHERE clause to the
> non-recursive term, I get much better results.
>
>
> WITH RECURSIVE t AS (
>                 SELECT par.id AS tid, par.name, par.date, par.id, par.text,
> par.h_title, par.h_name, par.parent
>                   FROM _books.par WHERE name = 'cfx' AND date = '2009-08-19'
> AND par.id = '28340'
>        UNION
>                 SELECT t.tid AS pid, p.name, p.date, p.id, p.text,
> p.h_title, p.h_name, p.parent
>                   FROM t, _books.par p
>                  WHERE p.name = t.name AND p.date = t.date AND t.id =
> p.parent
>        )
>  SELECT t.tid, t.name, t.date, t.id, t.text, t.h_title, t.h_name, t.parent
>   FROM t WHERE name = 'cfx' AND date = '2009-08-19' AND tid = '28340';
>
> ... which takes 0.221 ms
>
> I am being forced to use the slow query because I want to define it as a
> view, leaving the WHERE clause to the application.

I think this is just a limitation of the optimizer.  Recursive queries
are a relatively new feature and the optimizer doesn't know a whole
lot about how to deal with them.  That may get improved at some point,
but the optimization you're hoping for here is pretty tricky.  In
order to push the WHERE clauses down into the non-recursive term, the
optimizer would need to prove that this doesn't change the final
results.  I think that's possible here because it so happens that your
recursive term only generates results that have the same name, date,
and tid as some existing result, but with a slightly different
recursive query that wouldn't be true, so you'd need to make the code
pretty smart to work this one out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Two "equivalent" WITH RECURSIVE queries, one of them slow.

2010-07-05 Thread Octavio Alvarez

Hello.

I have a tree-like table with a three-field PK (name, date, id) and one  
parent field.
It has 5k to 6k records as of now, but it will hold about 1 million  
records.


I am trying the following WITH RECURSIVE query:

WITH RECURSIVE t AS (
 SELECT par.id AS tid, par.name, par.date, par.id,  
par.text, par.h_title, par.h_name, par.parent

   FROM _books.par
UNION
 SELECT t.tid AS pid, p.name, p.date, p.id, p.text,  
p.h_title, p.h_name, p.parent

   FROM t, _books.par p
  WHERE p.name = t.name AND p.date = t.date AND t.id =  
p.parent

)
 SELECT t.tid, t.name, t.date, t.id, t.text, t.h_title, t.h_name, t.parent
   FROM t WHERE name = 'cfx' AND date = '2009-08-19' AND tid = '28340';

... which takes 2547.503 ms

However, if I try the same query but adding the same WHERE clause to the
non-recursive term, I get much better results.


WITH RECURSIVE t AS (
 SELECT par.id AS tid, par.name, par.date, par.id,  
par.text, par.h_title, par.h_name, par.parent
   FROM _books.par WHERE name = 'cfx' AND date =  
'2009-08-19' AND par.id = '28340'

UNION
 SELECT t.tid AS pid, p.name, p.date, p.id, p.text,  
p.h_title, p.h_name, p.parent

   FROM t, _books.par p
  WHERE p.name = t.name AND p.date = t.date AND t.id =  
p.parent

)
 SELECT t.tid, t.name, t.date, t.id, t.text, t.h_title, t.h_name, t.parent
   FROM t WHERE name = 'cfx' AND date = '2009-08-19' AND tid = '28340';

... which takes 0.221 ms

I am being forced to use the slow query because I want to define it as a
view, leaving the WHERE clause to the application.

I fail to see where the two queries might be different, or, what cases the
slow one considers that the fast one doesn't, as to get a clue on how to
workaround this.

I have taken the EXPLAIN ANALYZE output for both queries. It looks like
the slow one is processing all records (read: not adding the WHERE clause
to the non-recursive term).


  QUERY  
PLAN

--
 CTE Scan on t  (cost=96653.20..96820.57 rows=1 width=144) (actual  
time=32.931..2541.792 rows=1 loops=1)
   Filter: (((name)::text = 'cfx'::text) AND (date = '2009-08-19'::date)  
AND (tid = 28340))

   CTE t
 ->  Recursive Union  (cost=0.00..96653.20 rows=6086 width=212)  
(actual time=0.017..2442.655 rows=33191 loops=1)
   ->  Seq Scan on par  (cost=0.00..237.96 rows=5996 width=208)  
(actual time=0.011..5.591 rows=5996 loops=1)
   ->  Merge Join  (cost=8909.74..9629.35 rows=9 width=212)  
(actual time=225.979..254.727 rows=3022 loops=9)
 Merge Cond: (((t.name)::text = (p.name)::text) AND  
(t.date = p.date) AND (t.id = p.parent))
 ->  Sort  (cost=7700.54..7850.44 rows=59960 width=44)  
(actual time=58.163..59.596 rows=3685 loops=9)

   Sort name: t.name, t.date, t.id
   Sort Method:  quicksort  Memory: 17kB
   ->  WorkTable Scan on t  (cost=0.00..1199.20  
rows=59960 width=44) (actual time=0.027..3.486 rows=3688 loops=9)
 ->  Materialize  (cost=1209.20..1284.15 rows=5996  
width=208) (actual time=163.062..177.415 rows=5810 loops=9)
   ->  Sort  (cost=1209.20..1224.19 rows=5996  
width=208) (actual time=163.054..172.543 rows=5810 loops=9)

 Sort name: p.name, p.date, p.parent
 Sort Method:  external merge  Disk: 1304kB
 ->  Seq Scan on par p  (cost=0.00..237.96  
rows=5996 width=208) (actual time=0.015..3.330 rows=5996 loops=9)

 Total runtime: 2547.503 ms
(17 rows)



QUERY  
PLAN

--
 CTE Scan on t  (cost=927.80..928.10 rows=1 width=144) (actual  
time=0.036..0.132 rows=1 loops=1)
   Filter: (((name)::text = 'cfx'::text) AND (date = '2009-08-19'::date)  
AND (tid = 28340))

   CTE t
 ->  Recursive Union  (cost=0.00..927.80 rows=11 width=212) (actual  
time=0.030..0.124 rows=1 loops=1)
   ->  Index Scan using par_id on par  (cost=0.00..8.27 rows=1  
width=208) (actual time=0.024..0.026 rows=1 loops=1)

 Index Cond: (id = 28340)
 Filter: (((name)::text = 'cfx'::text) AND (date =  
'2009-08-19'::date))
   ->  Nested Loop  (cost=0.00..91.93 rows=1 width=212) (actual  
time=0.091..0.091 rows=0 loops=1)
 Join Filter: (((t.name)::text = (p.name)::text) AND  
(t.date = p.date))
 ->  WorkTable Scan on t  (cost=0.00..0.20 rows=10  
width=4