Hi,
Attached is a query and its corresponding plan, where sorting of the CTE
acts seems to be the bottle neck. It is a real execution plan captured
with the auto_explain module.
The query is recursive. In each iteration CTE acts is sorted again,
which is obviously quite expensive for about 24000 rows and the same
number of iterations.
So I tried to put the ordering over the keys (d_id, activation_count)
into the CTE definition itself. This is honoured, when evaluating the
CTE but not for the iteration, where the CTE acts is still sorted again.
I cannot see a reason for this. A simple CTE scan with filter condition
should be enough.
Removing the order by from the definition of the CTE has absolutely no
impact on the performance, which is quite obvious regarding the number
of iterations. Further it has no impact on the query plan at all. It
only removes the sort node from the CTE acts node.
Do I miss something which would make the plan incorrect or is the
planner just not intelligent enough to recognize that a table is sorted
by the desired keys?
I hope the attachments prevent outlook from destroying any text
formatting.
Thanks in advance
--
Regards,
Robert
CTE Scan on he_per_dispenser (cost=75008.64..75720.96 rows=35616
width=16) (actual time=141.806..205579.272 rows=53152 loops=1)
Output: he_per_dispenser.he_id, he_per_dispenser.activation_id
CTE acts
-> Sort (cost=8298.16..8431.04 rows=53152 width=24) (actual
time=113.128..117.709 rows=53152 loops=1)
Output: a.id, e.dispenser_id, a.activation_count,
e."timestamp"
Sort Key: e.dispenser_id, a.activation_count
Sort Method: quicksort Memory: 5689kB
-> Hash Join (cost=1531.92..4126.30 rows=53152 width=24)
(actual time=25.653..66.205 rows=53152 loops=1)
Output: a.id, e.dispenser_id, a.activation_count,
e."timestamp"
Hash Cond: (e.id = a.id)
-> Seq Scan on events e (cost=0.00..1280.93
rows=78193 width=20) (actual time=0.006..8.615 rows=78193 loops=1)
Output: e.id, e."timestamp", e.dispenser_id,
e.relocation_id, e.synthetic
-> Hash (cost=867.52..867.52 rows=53152 width=12)
(actual time=18.737..18.737 rows=53152 loops=1)
Output: a.id, a.activation_count
-> Seq Scan on activations a (cost=0.00..867.52
rows=53152 width=12) (actual time=0.005..8.938 rows=53152 loops=1)
Output: a.id, a.activation_count
CTE he_per_dispenser
-> Recursive Union (cost=1369.30..66577.60 rows=35616 width=40)
(actual time=141.803..205530.157 rows=53152 loops=1)
-> Hash Join (cost=1369.30..5359.69 rows=266 width=40)
(actual time=141.803..155.424 rows=11 loops=1)
Output: a.activation_id, acts.dispenser_id,
(min(acts.activation_count)), a."timestamp", CASE WHEN (a."timestamp" <=
(p."timestamp" + $1)) THEN p.last_hygiene_event ELSE a.activation_id END
Hash Cond: ((a.dispenser_id = acts.dispenser_id) AND
(a.activation_count = (min(acts.activation_count))))
-> CTE Scan on acts a (cost=0.00..1063.04 rows=53152
width=24) (actual time=113.130..118.725 rows=53152 loops=1)
Output: a.activation_id, a.dispenser_id,
a.activation_count, a."timestamp"
-> Hash (cost=1366.30..1366.30 rows=200 width=24)
(actual time=28.644..28.644 rows=11 loops=1)
Output: acts.dispenser_id,
(min(acts.activation_count)), p."timestamp", p.last_hygiene_event
-> Hash Left Join (cost=1359.05..1366.30
rows=200 width=24) (actual time=28.634..28.639 rows=11 loops=1)
Output: acts.dispenser_id,
(min(acts.activation_count)), p."timestamp", p.last_hygiene_event
Hash Cond: (acts.dispenser_id =
p.dispenser_id)
-> HashAggregate (cost=1328.80..1331.30
rows=200 width=8) (actual time=28.617..28.620 rows=11 loops=1)
Output: acts.dispenser_id,
min(acts.activation_count)
-> CTE Scan on acts
(cost=0.00..1063.04 rows=53152 width=8) (actual time=0.001..15.560 rows=53152
loops=1)
Output: acts.activation_id,
acts.dispenser_id, acts.activation_count, acts."timestamp"
-> Hash (cost=19.00..19.00 rows=900
width=20) (actual time=0.002..0.002 rows=0 loops=1)
Output: p."timestamp",
p.last_hygiene_event, p.dispenser_id
-> Seq Scan on
processed_activations_cache p (cost=0.00..19.00 rows=900 width=20) (actual
time=0.002..0.002 rows=0 loops=1)
Output: p."timestamp",
p.last_hygiene_event, p.dispenser_id
-> Merge Join (cost=5439.41..6050.56 rows=3535 width=40)
(actual time=3.172..8.274 rows=2 loops=24813)
Output: a.activation_id, a.dispenser_id,
a.activation_count, a."timestamp", CASE WHEN ((a."timestamp" - s."timestamp")
<= $1) THEN s.he_id ELSE a.activation_id END
Merge Cond: ((s.dispenser_id = a.dispenser_id) AND
(((s.activation_count + 1)) = a.activation_count))
-> Sort (cost=204.52..211.17 rows=2660 width=24)
(actual time=0.005..0.006 rows=2 loops=24813)
Output: s."timestamp", s.he_id, s.dispenser_id,
s.activation_count
Sort Key: s.dispenser_id, ((s.activation_count +
1))
Sort Method: quicksort Memory: 25kB
-> WorkTable Scan on he_per_dispenser s
(cost=0.00..53.20 rows=2660 width=24) (actual time=0.001..0.001 rows=2
loops=24813)
Output: s."timestamp", s.he_id,
s.dispenser_id, s.activation_count
-> Sort (cost=5234.90..5367.78 rows=53152 width=24)
(actual time=0.001..2.540 rows=40748 loops=24813)
Output: a.activation_id, a.dispenser_id,
a.activation_count, a."timestamp"
Sort Key: a.dispenser_id, a.activation_count
Sort Method: quicksort Memory: 5689kB
-> CTE Scan on acts a (cost=0.00..1063.04
rows=53152 width=24) (actual time=0.000..5.727 rows=53152 loops=1)
Output: a.activation_id, a.dispenser_id,
a.activation_count, a."timestamp"
-- $1 = INTERVAL'1.5s' in the execution plan
INSERT INTO transformation.high_level_events(
id,
activation_id)
WITH RECURSIVE
acts AS (
/* This CTE should prevent duplicate physical tablescans.
* An order by as the plan suggests, seems worthless. The
* executed plan will will sort this CTE again in each
iteration
* of he_per_day.
*/
SELECT a.id AS activation_id,
e.day_id,
a.activation_count,
e.timestamp
FROM transformation.activations a JOIN transformation.events e
USING(id)
ORDER BY day_id, activatoin_count),
he_per_day AS (
SELECT a.activation_id,
min_acts.day_id,
min_acts.activation_count,
a.timestamp,
CASE
WHEN a.timestamp <= p.timestamp + $1 THEN
p.last_high_level_event
ELSE a.activation_id
END AS he_id -- identifies a specific high level event
FROM (SELECT day_id,
MIN(activation_count) AS
activation_count
FROM acts
GROUP BY day_id) min_acts
JOIN acts a
USING (day_id, activation_count)
-- check whether it belongs to the last high_level
event processed
LEFT OUTER JOIN
transformation.processed_activations_cache p
USING (day_id)
UNION ALL
SELECT a.activation_id,
a.day_id,
a.activation_count,
a.timestamp,
CASE
WHEN a.timestamp - s.timestamp <= $1 THEN
s.he_id
ELSE a.activation_id
END AS he_id
FROM acts a JOIN he_per_day s
ON a.day_id = s.day_id
AND a.activation_count =
s.activation_count + 1)
SELECT he_id AS id, activation_id
FROM he_per_day;
--
Sent via pgsql-general mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general