I saw a post that looked similar to a PostgreSQL IMAP performance issue I have 
experienced.  I have a relatively small system (< 15 GB of mail) and there is 
one query that makes moving messages between IMAP take several seconds each.  
I'm in the process of moving several thousand messages so I'm not sure it'll 
really finish.

The original query looks like this:

SELECT message_idnr FROM dbmail_messages m JOIN dbmail_physmessage p ON 
m.physmessage_id=p.id JOIN dbmail_headervalue v ON v.physmessage_id=p.id JOIN 
dbmail_headername n ON v.headername_id=n.id WHERE mailbox_idnr = 1329 AND 
status IN (0,1) AND headername ILIKE 'MESSAGE-ID' AND 
SUBSTRING(headervalue,0,255) ILIKE '%<secretstuff>%' ORDER BY message_idnr;

Even though there are indexes on header value and header name they cannot be 
used due to the ILIKE clause.  The query plan looks like this:

                                                                        QUERY 
PLAN                                                                        
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=35461.02..35461.02 rows=1 width=8) (actual time=1494.757..1494.757 
rows=0 loops=1)
   Sort Key: m.message_idnr
   ->  Nested Loop  (cost=4.02..35461.01 rows=1 width=8) (actual 
time=1494.741..1494.741 rows=0 loops=1)
         ->  Nested Loop  (cost=4.02..35454.99 rows=1 width=24) (actual 
time=1494.738..1494.738 rows=0 loops=1)
               ->  Nested Loop  (cost=4.02..35449.05 rows=1 width=32) (actual 
time=1494.736..1494.736 rows=0 loops=1)
                     Join Filter: ("outer".physmessage_id = 
"inner".physmessage_id)
                     ->  Seq Scan on dbmail_headervalue v  (cost=0.00..35429.65 
rows=1 width=16) (actual time=1415.541..1493.264 rows=1 loops=1)
                           Filter: ("substring"(headervalue, 0, 255) ~~* 
'%<secretstuff>%'::text)
                     ->  Bitmap Heap Scan on dbmail_messages m  
(cost=4.02..19.35 rows=4 width=16) (actual time=0.264..1.169 rows=501 loops=1)
                           Recheck Cond: (((mailbox_idnr = 1329) AND (status = 
0)) OR ((mailbox_idnr = 1329) AND (status = 1)))
                           ->  BitmapOr  (cost=4.02..4.02 rows=4 width=0) 
(actual time=0.229..0.229 rows=0 loops=1)
                                 ->  Bitmap Index Scan on dbmail_messages_8  
(cost=0.00..2.01 rows=2 width=0) (actual time=0.217..0.217 rows=501 loops=1)
                                       Index Cond: ((mailbox_idnr = 1329) AND 
(status = 0))
                                 ->  Bitmap Index Scan on dbmail_messages_8  
(cost=0.00..2.01 rows=2 width=0) (actual time=0.009..0.009 rows=0 loops=1)
                                       Index Cond: ((mailbox_idnr = 1329) AND 
(status = 1))
               ->  Index Scan using dbmail_headername_pkey on dbmail_headername 
n  (cost=0.00..5.92 rows=1 width=8) (never executed)
                     Index Cond: ("outer".headername_id = n.id)
                     Filter: ((headername)::text ~~* 'MESSAGE-ID'::text)
         ->  Index Scan using dbmail_physmessage_pkey on dbmail_physmessage p  
(cost=0.00..6.01 rows=1 width=8) (never executed)
               Index Cond: ("outer".physmessage_id = p.id)
 Total runtime: 1494.879 ms
(21 rows)

I modified the query and performance is stellar but I'm unsure of any 
unforeseen consequences.  Here's what I came up with:

SELECT message_idnr FROM dbmail_messages m JOIN dbmail_physmessage p ON 
m.physmessage_id=p.id JOIN dbmail_headervalue v ON v.physmessage_id=p.id JOIN 
dbmail_headername n ON v.headername_id=n.id WHERE mailbox_idnr = 1329 AND 
status IN (0,1) AND lower(headername) = 'message-id' AND lower(headervalue) = 
lower('<secretstuff>') ORDER BY message_idnr;

                                                                                
      QUERY PLAN                                                                
                      
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1173.20..1173.20 rows=1 width=8) (actual time=15.222..15.223 
rows=1 loops=1)
   Sort Key: m.message_idnr
   ->  Nested Loop  (cost=197.55..1173.19 rows=1 width=8) (actual 
time=4.852..15.213 rows=1 loops=1)
         ->  Hash Join  (cost=197.55..1167.16 rows=1 width=24) (actual 
time=4.784..15.143 rows=1 loops=1)
               Hash Cond: ("outer".headername_id = "inner".id)
               ->  Nested Loop  (cost=176.04..1145.08 rows=114 width=32) 
(actual time=4.667..15.024 rows=1 loops=1)
                     ->  Bitmap Heap Scan on dbmail_messages m  
(cost=4.02..19.35 rows=4 width=16) (actual time=0.219..1.283 rows=501 loops=1)
                           Recheck Cond: (((mailbox_idnr = 1329) AND (status = 
0)) OR ((mailbox_idnr = 1329) AND (status = 1)))
                           ->  BitmapOr  (cost=4.02..4.02 rows=4 width=0) 
(actual time=0.192..0.192 rows=0 loops=1)
                                 ->  Bitmap Index Scan on dbmail_messages_8  
(cost=0.00..2.01 rows=2 width=0) (actual time=0.183..0.183 rows=501 loops=1)
                                       Index Cond: ((mailbox_idnr = 1329) AND 
(status = 0))
                                 ->  Bitmap Index Scan on dbmail_messages_8  
(cost=0.00..2.01 rows=2 width=0) (actual time=0.008..0.008 rows=0 loops=1)
                                       Index Cond: ((mailbox_idnr = 1329) AND 
(status = 1))
                     ->  Bitmap Heap Scan on dbmail_headervalue v  
(cost=172.02..281.08 rows=28 width=16) (actual time=0.025..0.025 rows=0 
loops=501)
                           Recheck Cond: ((v.physmessage_id = 
"outer".physmessage_id) AND (lower(v.headervalue) = '<secretstuff>'::text))
                           ->  BitmapAnd  (cost=172.02..172.02 rows=28 width=0) 
(actual time=0.024..0.024 rows=0 loops=501)
                                 ->  Bitmap Index Scan on dbmail_headervalue_2  
(cost=0.00..48.88 rows=5681 width=0) (actual time=0.013..0.013 rows=36 
loops=501)
                                       Index Cond: (v.physmessage_id = 
"outer".physmessage_id)
                                 ->  Bitmap Index Scan on dbmail_headervalue_3  
(cost=0.00..122.89 rows=5681 width=0) (actual time=0.009..0.009 rows=1 
loops=501)
                                       Index Cond: (lower(headervalue) = 
'<secretstuff>'::text)
               ->  Hash  (cost=21.47..21.47 rows=13 width=8) (actual 
time=0.106..0.106 rows=4 loops=1)
                     ->  Bitmap Heap Scan on dbmail_headername n  
(cost=2.05..21.47 rows=13 width=8) (actual time=0.083..0.098 rows=4 loops=1)
                           Recheck Cond: (lower((headername)::text) = 
'message-id'::text)
                           ->  Bitmap Index Scan on dbmail_headername_2  
(cost=0.00..2.05 rows=13 width=0) (actual time=0.065..0.065 rows=4 loops=1)
                                 Index Cond: (lower((headername)::text) = 
'message-id'::text)
         ->  Index Scan using dbmail_physmessage_pkey on dbmail_physmessage p  
(cost=0.00..6.01 rows=1 width=8) (actual time=0.065..0.067 rows=1 loops=1)
               Index Cond: ("outer".physmessage_id = p.id)
 Total runtime: 15.337 ms
(28 rows)

My primary concerns:
  - Are there any reasons why we need to only look at the first 255 characters 
of the header name?
  - Are there any reasons why we would be looking for the <secretstuff> string 
in the middle of a header value?

If those are unfounded then I think this update makes a lot of sense.  Any 
thoughts on whether I can make this change safely?

Thanks,
Tim Mattison
[email protected]
_______________________________________________
Dbmail-dev mailing list
[email protected]
http://mailman.fastxs.nl/cgi-bin/mailman/listinfo/dbmail-dev

Reply via email to