A NOTE has been added to this issue. 
====================================================================== 
http://dbmail.org/mantis/view.php?id=880 
====================================================================== 
Reported By:                xentac
Assigned To:                
====================================================================== 
Project:                    DBMail
Issue ID:                   880
Category:                   Database layer
Reproducibility:            always
Severity:                   major
Priority:                   normal
Status:                     confirmed
target:                      
====================================================================== 
Date Submitted:             31-Jan-11 23:48 CET
Last Modified:              11-Mar-11 19:26 CET
====================================================================== 
Summary:                    Stored Procedure to improve header search speed
Description: 
Given the dbmail_messages, dbmail_physmessages, dbmail_headervalue, and
dbmail_headername schema, searching for all messages within a certain
folder that have a certain header value ends up with a query like this (in
dbmail-mailbox.c):

SELECT message_idnr FROM dbmail_messages m JOIN dbmail_physmessage p ON
m.physmessage_id=p.id JOIN dbmail_headervalue v ON v.physmessage_id=p.id
JOIN dbmail_headername n ON v.headername_id=n.id WHERE mailbox_idnr = 1 AND
status IN (0,1) AND headername ILIKE 'Message-ID' AND
SUBSTRING(headervalue,0,255) ILIKE '%something%' ORDER BY message_idnr

Once dbmail_headervalue gets sufficiently large, Postgresql decides it's
more efficient to seq scan dbmail_headervalue for the ILIKE match instead
of first looping over dbmail_messages, dbmail_physmessages, and
dbmail_headername.

Because we know more than 99% of the rows in dbmail_headervalue will never
show up in the result set, we can tune the query better by using a stored
procedure.  One such stored procedure is shown in the procedures branch of
paul's dbmail repo (http://git.dbmail.eu/paul/dbmail/log/?h=procedures).

It first finds all messages within the folder and all the headername ids
that match the header name.  It then searches through dbmail_headervalue
using an index on physmessage_id and headername_id (might not have been
created in the original patch), only ILIKE'ing messages that even show up
in the folder.  From there it takes those rows and selects the original
dbmail_messages data for returning back to the process.  The output of the
stored procedure is exactly the same as the above query but it runs
hundreds to thousands of times faster, even for folders with 20k messages
in them.

A similar stored proc would work fine in 3.0, because the header values
are stored the same way.
====================================================================== 

---------------------------------------------------------------------- 
 (0003157) paul (administrator) - 01-Feb-11 09:07
 http://dbmail.org/mantis/view.php?id=880#c3157 
---------------------------------------------------------------------- 
Just for tracking purposes: what's missing in the procedures branch at the
moment is procedures for mysql and oracle, and a work-around or fallback
solution for sqlite. 

Other than that this looks like a very interesting avenue for some pretty
amazing speedups. 

---------------------------------------------------------------------- 
 (0003170) xentac (reporter) - 11-Mar-11 19:26
 http://dbmail.org/mantis/view.php?id=880#c3170 
---------------------------------------------------------------------- 
I wrote another stored procedure to improve header fetching as well.  This
was the second most inefficient query we've run into.

The code is on github: https://github.com/xentac/dbmail/tree/procedures. 

Issue History 
Date Modified    Username       Field                    Change               
====================================================================== 
31-Jan-11 23:48  xentac         New Issue                                    
01-Feb-11 09:07  paul           Note Added: 0003157                          
01-Feb-11 09:07  paul           Status                   new => confirmed    
01-Feb-11 09:07  paul           Projection               none => major rework
11-Mar-11 19:26  xentac         Note Added: 0003170                          
======================================================================

_______________________________________________
Dbmail-dev mailing list
Dbmail-dev@dbmail.org
http://mailman.fastxs.nl/cgi-bin/mailman/listinfo/dbmail-dev

Reply via email to