Re: [GENERAL] CTE materializing sets?

2012-10-09 Thread Craig Ringer

On 10/06/2012 08:45 AM, Liam Caffrey wrote:

Hi,

If I run a CTE does that materialize the resulting data in the same (or
a similar) way as if I created a temp table and referred to that
instead? Or does the CTE keep the set in memory?


Really good question, I too would be interested in this.

I'd expect it'd materialize to RAM if the result is within `work_mem` 
but I'd love to know for sure.


--
Craig Ringer



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] CTE materializing sets?

2012-10-09 Thread Serge Fonville
This indeed is a very interesting question.

At http://wiki.postgresql.org/wiki/CTEReadme it seems to suggest that a CTE
is just rewritten and the resulting query is executed.

Kind regards/met vriendelijke groet,

Serge Fonville

http://www.sergefonville.nl

Convince Microsoft!
They need to add TRUNCATE PARTITION in SQL Server
https://connect.microsoft.com/SQLServer/feedback/details/417926/truncate-partition-of-partitioned-table



2012/10/9 Craig Ringer ring...@ringerc.id.au

 On 10/06/2012 08:45 AM, Liam Caffrey wrote:

 Hi,

 If I run a CTE does that materialize the resulting data in the same (or
 a similar) way as if I created a temp table and referred to that
 instead? Or does the CTE keep the set in memory?


 Really good question, I too would be interested in this.

 I'd expect it'd materialize to RAM if the result is within `work_mem` but
 I'd love to know for sure.

 --
 Craig Ringer



 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/**mailpref/pgsql-generalhttp://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] CTE materializing sets?

2012-10-09 Thread Tomas Vondra

Dne 09.10.2012 11:48, Craig Ringer napsal:

On 10/06/2012 08:45 AM, Liam Caffrey wrote:

Hi,

If I run a CTE does that materialize the resulting data in the same 
(or

a similar) way as if I created a temp table and referred to that
instead? Or does the CTE keep the set in memory?


Really good question, I too would be interested in this.

I'd expect it'd materialize to RAM if the result is within `work_mem`
but I'd love to know for sure.


Yes - it's using a tuplestore, and that's the default behavior. It 
stores the
tuples in RAM until it reaches work_mem and then starts spilling the 
data

to disk. It's entirely transparent behavior.

Tomas


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] CTE materializing sets?

2012-10-09 Thread Craig Ringer

On 10/09/2012 05:53 PM, Serge Fonville wrote:

This indeed is a very interesting question.

At http://wiki.postgresql.org/wiki/CTEReadme it seems to suggest that a
CTE is just rewritten and the resulting query is executed.


CTEs are an optimisation fence, so there's something more than a simple 
rewrite.


--
Craig Ringer



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] CTE materializing sets?

2012-10-09 Thread Виктор Егоров
2012/10/9 Serge Fonville serge.fonvi...@gmail.com:
 This indeed is a very interesting question.

 At http://wiki.postgresql.org/wiki/CTEReadme it seems to suggest that a CTE
 is just rewritten and the resulting query is executed.

As was mentioned a couple of times in this list, CTE do have
optimization fence feature (per SQL Standard).
I asked on the #postgresql channel and was pointed, that typically you
get benefits of this feature
when you have to join grouping subquery to itself.

I went and did some tests. Table attempt contains e-mail delivery
attempts for the postfix:

# select 
relname,relpages,reltuples::numeric,pg_size_pretty(pg_relation_size(oid))
from pg_class where relname='attempt';
 relname | relpages | reltuples | pg_size_pretty
-+--+---+
 attempt |   145117 |   4252530 | 1134 MB


My default work_mem is 1MB on this instance.

First, plain query with 2 subqueries:

# explain (analyze, buffers)
select a.eid, b.eid from
  (select recipient_email_id eid, count(*) cnt, min(tstamp) as minmsg,
max(tstamp) as maxmsg from attempt group by recipient_email_id) a,
  (select recipient_email_id eid, count(*) cnt, min(tstamp) as minmsg,
max(tstamp) as maxmsg from attempt group by recipient_email_id) b
where a.minmsg = b.maxmsg;

 QUERY PLAN
---
 Merge Join  (cost=1861911.11..1953183.16 rows=6067386 width=16)
(actual time=65758.378..66115.400 rows=59845 loops=1)
   Merge Cond: (a.minmsg = b.maxmsg)
   Buffers: shared hit=1590 read=288644, temp read=103129 written=103134
   -  Sort  (cost=930955.56..931042.64 rows=34835 width=16) (actual
time=30242.503..30370.379 rows=212434 loops=1)
 Sort Key: a.minmsg
 Sort Method: external merge  Disk: 5400kB
 Buffers: shared hit=779 read=144338, temp read=51481 written=51481
 -  Subquery Scan on a  (cost=873875.76..927729.06 rows=34835
width=16) (actual time=26744.434..30008.996 rows=212434 loops=1)
   Buffers: shared hit=779 read=144338, temp read=50561
written=50561
   -  GroupAggregate  (cost=873875.76..927380.71
rows=34835 width=16) (actual time=26744.433..29951.390 rows=212434
loops=1)
 Buffers: shared hit=779 read=144338, temp
read=50561 written=50561
 -  Sort  (cost=873875.76..884507.08 rows=4252528
width=16) (actual time=26744.273..28296.850 rows=4255749 loops=1)
   Sort Key: public.attempt.recipient_email_id
   Sort Method: external merge  Disk: 108168kB
   Buffers: shared hit=779 read=144338, temp
read=50561 written=50561
   -  Seq Scan on attempt
(cost=0.00..187642.28 rows=4252528 width=16) (actual
time=0.010..13618.612 rows=4255749 loops=1)
 Buffers: shared hit=779 read=144338
   -  Materialize  (cost=930955.56..931129.73 rows=34835 width=16)
(actual time=35515.860..35640.974 rows=214271 loops=1)
 Buffers: shared hit=811 read=144306, temp read=51648 written=51653
 -  Sort  (cost=930955.56..931042.64 rows=34835 width=16)
(actual time=35515.853..35586.598 rows=210800 loops=1)
   Sort Key: b.maxmsg
   Sort Method: external merge  Disk: 5384kB
   Buffers: shared hit=811 read=144306, temp read=51648
written=51653
   -  Subquery Scan on b  (cost=873875.76..927729.06
rows=34835 width=16) (actual time=31879.743..35251.218 rows=212434
loops=1)
 Buffers: shared hit=811 read=144306, temp
read=50561 written=50561
 -  GroupAggregate  (cost=873875.76..927380.71
rows=34835 width=16) (actual time=31879.741..35184.965 rows=212434
loops=1)
   Buffers: shared hit=811 read=144306, temp
read=50561 written=50561
   -  Sort  (cost=873875.76..884507.08
rows=4252528 width=16) (actual time=31879.577..33460.975 rows=4255749
loops=1)
 Sort Key: public.attempt.recipient_email_id
 Sort Method: external merge  Disk: 108168kB
 Buffers: shared hit=811 read=144306,
temp read=50561 written=50561
 -  Seq Scan on attempt
(cost=0.00..187642.28 rows=4252528 width=16) (actual
time=0.012..17637.516 rows=4255749 loops=1)
   Buffers: shared hit=811 read=144306
 Total runtime: 67611.657 ms
(34 rows)

The source relation is scanned twice. Now, using CTE and it's
materialization feature:

# explain (analyze, buffers)
with msgs as (select recipient_email_id eid, count(*) cnt, min(tstamp)
as minmsg, max(tstamp) as maxmsg from attempt group by
recipient_email_id)
select a.eid, b.eid from msgs a, msgs b where a.minmsg=b.maxmsg;

[GENERAL] CTE materializing sets?

2012-10-05 Thread Liam Caffrey
Hi,

If I run a CTE does that materialize the resulting data in the same (or a
similar) way as if I created a temp table and referred to that instead? Or
does the CTE keep the set in memory?

Regards

Liam