A NOTE has been added to this issue. ====================================================================== http://dbmail.org/mantis/view.php?id=880 ====================================================================== Reported By: xentac Assigned To: ====================================================================== Project: DBMail Issue ID: 880 Category: Database layer Reproducibility: always Severity: major Priority: normal Status: confirmed target: ====================================================================== Date Submitted: 31-Jan-11 23:48 CET Last Modified: 11-Mar-11 19:26 CET ====================================================================== Summary: Stored Procedure to improve header search speed Description: Given the dbmail_messages, dbmail_physmessages, dbmail_headervalue, and dbmail_headername schema, searching for all messages within a certain folder that have a certain header value ends up with a query like this (in dbmail-mailbox.c):
SELECT message_idnr FROM dbmail_messages m JOIN dbmail_physmessage p ON m.physmessage_id=p.id JOIN dbmail_headervalue v ON v.physmessage_id=p.id JOIN dbmail_headername n ON v.headername_id=n.id WHERE mailbox_idnr = 1 AND status IN (0,1) AND headername ILIKE 'Message-ID' AND SUBSTRING(headervalue,0,255) ILIKE '%something%' ORDER BY message_idnr Once dbmail_headervalue gets sufficiently large, Postgresql decides it's more efficient to seq scan dbmail_headervalue for the ILIKE match instead of first looping over dbmail_messages, dbmail_physmessages, and dbmail_headername. Because we know more than 99% of the rows in dbmail_headervalue will never show up in the result set, we can tune the query better by using a stored procedure. One such stored procedure is shown in the procedures branch of paul's dbmail repo (http://git.dbmail.eu/paul/dbmail/log/?h=procedures). It first finds all messages within the folder and all the headername ids that match the header name. It then searches through dbmail_headervalue using an index on physmessage_id and headername_id (might not have been created in the original patch), only ILIKE'ing messages that even show up in the folder. From there it takes those rows and selects the original dbmail_messages data for returning back to the process. The output of the stored procedure is exactly the same as the above query but it runs hundreds to thousands of times faster, even for folders with 20k messages in them. A similar stored proc would work fine in 3.0, because the header values are stored the same way. ====================================================================== ---------------------------------------------------------------------- (0003157) paul (administrator) - 01-Feb-11 09:07 http://dbmail.org/mantis/view.php?id=880#c3157 ---------------------------------------------------------------------- Just for tracking purposes: what's missing in the procedures branch at the moment is procedures for mysql and oracle, and a work-around or fallback solution for sqlite. Other than that this looks like a very interesting avenue for some pretty amazing speedups. ---------------------------------------------------------------------- (0003170) xentac (reporter) - 11-Mar-11 19:26 http://dbmail.org/mantis/view.php?id=880#c3170 ---------------------------------------------------------------------- I wrote another stored procedure to improve header fetching as well. This was the second most inefficient query we've run into. The code is on github: https://github.com/xentac/dbmail/tree/procedures. Issue History Date Modified Username Field Change ====================================================================== 31-Jan-11 23:48 xentac New Issue 01-Feb-11 09:07 paul Note Added: 0003157 01-Feb-11 09:07 paul Status new => confirmed 01-Feb-11 09:07 paul Projection none => major rework 11-Mar-11 19:26 xentac Note Added: 0003170 ====================================================================== _______________________________________________ Dbmail-dev mailing list Dbmail-dev@dbmail.org http://mailman.fastxs.nl/cgi-bin/mailman/listinfo/dbmail-dev