Hi everyone,

Il 14/01/2010 19:36, David Fetter ha scritto:
On Thu, Jan 14, 2010 at 03:08:22PM +0100, Matteo Beccati wrote:
Il 14/01/2010 14:39, Dimitri Fontaine ha scritto:
Matteo Beccati<p...@beccati.com>   writes:>
Any improvements to sorting are welcome :)

...
         ARRAY[uid]
...

Thanks David, using an array rather than text concatenation is slightly slower and uses a bit more memory, but you've been able to convince me that it's The Right Way(TM) ;)

Anyway, I've made further changes and I would say that at this point the PoC is feature complete. There surely are still some rough edges and a few things to clean up, but I'd like to get your feedback once again:

http://archives.beccati.org

You will find that pgsql-general and -hackers are subscribed and getting messages live, wihle -hackers-history and -www have been imported from the archives (about 200k and 1.5k messages respectively at 50 messages/s).

Also, I'd need some help with the CTE query that was picking a wrong plan and led me to forcibly disable merge joins inside the application when executing it. Plans are attached.


Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/
archiveopteryx=# EXPLAIN ANALYZE WITH RECURSIVE t (mailbox, uid, date, subject, 
sender, has_attachments, parent_uid, idx, depth) AS (
  SELECT mailbox, uid, date, subject, sender, has_attachments, parent_uid, 
uid::text, 1
  FROM arc_messages
  WHERE parent_uid IS NULL AND mailbox = 17 AND date >= '2007-11-01' AND date < 
'2007-12-01'
  UNION ALL
  SELECT a.mailbox, a.uid, a.date, a.subject, a.sender, a.has_attachments, 
a.parent_uid, t.idx || '.' || a.uid::text, t.depth + 1
  FROM t JOIN arc_messages a USING (mailbox)
  WHERE t.uid = a.parent_uid
) SELECT * FROM t ORDER BY idx ;
                                                                                
               QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=92761.67..92769.91 rows=1647 width=121) (actual 
time=4183.736..4185.762 rows=1428 loops=1)
   Sort Key: t.idx
   Sort Method:  quicksort  Memory: 366kB
   CTE t
     ->  Recursive Union  (cost=0.00..92579.09 rows=1647 width=130) (actual 
time=0.030..4173.724 rows=1428 loops=1)
           ->  Index Scan using arc_messages_mailbox_parent_id_date_key on 
arc_messages  (cost=0.00..486.42 rows=567 width=94) (actual time=0.025..1.432 
rows=482 loops=1)
                 Index Cond: ((mailbox = 17) AND (parent_uid IS NULL) AND (date 
>= '2007-11-01 00:00:00+01'::timestamp with time zone) AND (date < '2007-12-01 
00:00:00+01'::timestamp with time zone))
           ->  Merge Join  (cost=729.68..9208.61 rows=108 width=130) (actual 
time=262.120..277.819 rows=63 loops=15)
                 Merge Cond: ((a.mailbox = t.mailbox) AND (a.parent_uid = 
t.uid))
                 ->  Index Scan using arc_messages_mailbox_parent_id_key on 
arc_messages a  (cost=0.00..6452.25 rows=193871 width=94) (actual 
time=0.018..147.782 rows=85101 loops=15)
                 ->  Sort  (cost=729.68..758.03 rows=5670 width=44) (actual 
time=0.403..0.559 rows=109 loops=15)
                       Sort Key: t.mailbox, t.uid
                       Sort Method:  quicksort  Memory: 25kB
                       ->  WorkTable Scan on t  (cost=0.00..22.68 rows=5670 
width=44) (actual time=0.003..0.145 rows=95 loops=15)
   ->  CTE Scan on t  (cost=0.00..6.59 rows=1647 width=121) (actual 
time=0.035..4179.686 rows=1428 loops=1)
 Total runtime: 4188.187 ms
(16 rows)
archiveopteryx=# SET enable_mergejoin = false;
SET
archiveopteryx=# EXPLAIN ANALYZE WITH RECURSIVE t (mailbox, uid, date, subject, 
sender, has_attachments, parent_uid, idx, depth) AS (
  SELECT mailbox, uid, date, subject, sender, has_attachments, parent_uid, 
uid::text, 1
  FROM arc_messages
  WHERE parent_uid IS NULL AND mailbox = 17 AND date >= '2007-11-01' AND date < 
'2007-12-01'
  UNION ALL
  SELECT a.mailbox, a.uid, a.date, a.subject, a.sender, a.has_attachments, 
a.parent_uid, t.idx || '.' || a.uid::text, t.depth + 1
  FROM t JOIN arc_messages a USING (mailbox)
  WHERE t.uid = a.parent_uid
) SELECT * FROM t ORDER BY idx ;
                                                                                
               QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=104762.75..104770.98 rows=1647 width=121) (actual 
time=34.315..36.331 rows=1428 loops=1)
   Sort Key: t.idx
   Sort Method:  quicksort  Memory: 366kB
   CTE t
     ->  Recursive Union  (cost=0.00..104580.17 rows=1647 width=130) (actual 
time=0.040..24.851 rows=1428 loops=1)
           ->  Index Scan using arc_messages_mailbox_parent_id_date_key on 
arc_messages  (cost=0.00..486.42 rows=567 width=94) (actual time=0.034..1.467 
rows=482 loops=1)
                 Index Cond: ((mailbox = 17) AND (parent_uid IS NULL) AND (date 
>= '2007-11-01 00:00:00+01'::timestamp with time zone) AND (date < '2007-12-01 
00:00:00+01'::timestamp with time zone))
           ->  Nested Loop  (cost=0.00..10408.72 rows=108 width=130) (actual 
time=0.018..1.235 rows=63 loops=15)
                 ->  WorkTable Scan on t  (cost=0.00..22.68 rows=5670 width=44) 
(actual time=0.001..0.138 rows=95 loops=15)
                 ->  Index Scan using arc_messages_mailbox_parent_id_key on 
arc_messages a  (cost=0.00..1.82 rows=1 width=94) (actual time=0.005..0.006 
rows=1 loops=1428)
                       Index Cond: ((a.mailbox = t.mailbox) AND (a.parent_uid = 
t.uid))
   ->  CTE Scan on t  (cost=0.00..6.59 rows=1647 width=121) (actual 
time=0.045..30.543 rows=1428 loops=1)
 Total runtime: 38.559 ms
(13 rows)
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to