Hi everyone,
Il 14/01/2010 19:36, David Fetter ha scritto:
On Thu, Jan 14, 2010 at 03:08:22PM +0100, Matteo Beccati wrote:
Il 14/01/2010 14:39, Dimitri Fontaine ha scritto:
Matteo Beccati<p...@beccati.com> writes:>
Any improvements to sorting are welcome :)
...
ARRAY[uid]
...
Thanks David, using an array rather than text concatenation is slightly
slower and uses a bit more memory, but you've been able to convince me
that it's The Right Way(TM) ;)
Anyway, I've made further changes and I would say that at this point the
PoC is feature complete. There surely are still some rough edges and a
few things to clean up, but I'd like to get your feedback once again:
http://archives.beccati.org
You will find that pgsql-general and -hackers are subscribed and getting
messages live, wihle -hackers-history and -www have been imported from
the archives (about 200k and 1.5k messages respectively at 50 messages/s).
Also, I'd need some help with the CTE query that was picking a wrong
plan and led me to forcibly disable merge joins inside the application
when executing it. Plans are attached.
Cheers
--
Matteo Beccati
Development & Consulting - http://www.beccati.com/
archiveopteryx=# EXPLAIN ANALYZE WITH RECURSIVE t (mailbox, uid, date, subject,
sender, has_attachments, parent_uid, idx, depth) AS (
SELECT mailbox, uid, date, subject, sender, has_attachments, parent_uid,
uid::text, 1
FROM arc_messages
WHERE parent_uid IS NULL AND mailbox = 17 AND date >= '2007-11-01' AND date <
'2007-12-01'
UNION ALL
SELECT a.mailbox, a.uid, a.date, a.subject, a.sender, a.has_attachments,
a.parent_uid, t.idx || '.' || a.uid::text, t.depth + 1
FROM t JOIN arc_messages a USING (mailbox)
WHERE t.uid = a.parent_uid
) SELECT * FROM t ORDER BY idx ;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=92761.67..92769.91 rows=1647 width=121) (actual
time=4183.736..4185.762 rows=1428 loops=1)
Sort Key: t.idx
Sort Method: quicksort Memory: 366kB
CTE t
-> Recursive Union (cost=0.00..92579.09 rows=1647 width=130) (actual
time=0.030..4173.724 rows=1428 loops=1)
-> Index Scan using arc_messages_mailbox_parent_id_date_key on
arc_messages (cost=0.00..486.42 rows=567 width=94) (actual time=0.025..1.432
rows=482 loops=1)
Index Cond: ((mailbox = 17) AND (parent_uid IS NULL) AND (date
>= '2007-11-01 00:00:00+01'::timestamp with time zone) AND (date < '2007-12-01
00:00:00+01'::timestamp with time zone))
-> Merge Join (cost=729.68..9208.61 rows=108 width=130) (actual
time=262.120..277.819 rows=63 loops=15)
Merge Cond: ((a.mailbox = t.mailbox) AND (a.parent_uid =
t.uid))
-> Index Scan using arc_messages_mailbox_parent_id_key on
arc_messages a (cost=0.00..6452.25 rows=193871 width=94) (actual
time=0.018..147.782 rows=85101 loops=15)
-> Sort (cost=729.68..758.03 rows=5670 width=44) (actual
time=0.403..0.559 rows=109 loops=15)
Sort Key: t.mailbox, t.uid
Sort Method: quicksort Memory: 25kB
-> WorkTable Scan on t (cost=0.00..22.68 rows=5670
width=44) (actual time=0.003..0.145 rows=95 loops=15)
-> CTE Scan on t (cost=0.00..6.59 rows=1647 width=121) (actual
time=0.035..4179.686 rows=1428 loops=1)
Total runtime: 4188.187 ms
(16 rows)
archiveopteryx=# SET enable_mergejoin = false;
SET
archiveopteryx=# EXPLAIN ANALYZE WITH RECURSIVE t (mailbox, uid, date, subject,
sender, has_attachments, parent_uid, idx, depth) AS (
SELECT mailbox, uid, date, subject, sender, has_attachments, parent_uid,
uid::text, 1
FROM arc_messages
WHERE parent_uid IS NULL AND mailbox = 17 AND date >= '2007-11-01' AND date <
'2007-12-01'
UNION ALL
SELECT a.mailbox, a.uid, a.date, a.subject, a.sender, a.has_attachments,
a.parent_uid, t.idx || '.' || a.uid::text, t.depth + 1
FROM t JOIN arc_messages a USING (mailbox)
WHERE t.uid = a.parent_uid
) SELECT * FROM t ORDER BY idx ;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=104762.75..104770.98 rows=1647 width=121) (actual
time=34.315..36.331 rows=1428 loops=1)
Sort Key: t.idx
Sort Method: quicksort Memory: 366kB
CTE t
-> Recursive Union (cost=0.00..104580.17 rows=1647 width=130) (actual
time=0.040..24.851 rows=1428 loops=1)
-> Index Scan using arc_messages_mailbox_parent_id_date_key on
arc_messages (cost=0.00..486.42 rows=567 width=94) (actual time=0.034..1.467
rows=482 loops=1)
Index Cond: ((mailbox = 17) AND (parent_uid IS NULL) AND (date
>= '2007-11-01 00:00:00+01'::timestamp with time zone) AND (date < '2007-12-01
00:00:00+01'::timestamp with time zone))
-> Nested Loop (cost=0.00..10408.72 rows=108 width=130) (actual
time=0.018..1.235 rows=63 loops=15)
-> WorkTable Scan on t (cost=0.00..22.68 rows=5670 width=44)
(actual time=0.001..0.138 rows=95 loops=15)
-> Index Scan using arc_messages_mailbox_parent_id_key on
arc_messages a (cost=0.00..1.82 rows=1 width=94) (actual time=0.005..0.006
rows=1 loops=1428)
Index Cond: ((a.mailbox = t.mailbox) AND (a.parent_uid =
t.uid))
-> CTE Scan on t (cost=0.00..6.59 rows=1647 width=121) (actual
time=0.045..30.543 rows=1428 loops=1)
Total runtime: 38.559 ms
(13 rows)
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers