The following issue has been SUBMITTED. ====================================================================== http://dbmail.org/mantis/view.php?id=880 ====================================================================== Reported By: xentac Assigned To: ====================================================================== Project: DBMail Issue ID: 880 Category: Database layer Reproducibility: always Severity: major Priority: normal Status: new target: ====================================================================== Date Submitted: 31-Jan-11 23:48 CET Last Modified: 31-Jan-11 23:48 CET ====================================================================== Summary: Stored Procedure to improve header search speed Description: Given the dbmail_messages, dbmail_physmessages, dbmail_headervalue, and dbmail_headername schema, searching for all messages within a certain folder that have a certain header value ends up with a query like this (in dbmail-mailbox.c):
SELECT message_idnr FROM dbmail_messages m JOIN dbmail_physmessage p ON m.physmessage_id=p.id JOIN dbmail_headervalue v ON v.physmessage_id=p.id JOIN dbmail_headername n ON v.headername_id=n.id WHERE mailbox_idnr = 1 AND status IN (0,1) AND headername ILIKE 'Message-ID' AND SUBSTRING(headervalue,0,255) ILIKE '%something%' ORDER BY message_idnr Once dbmail_headervalue gets sufficiently large, Postgresql decides it's more efficient to seq scan dbmail_headervalue for the ILIKE match instead of first looping over dbmail_messages, dbmail_physmessages, and dbmail_headername. Because we know more than 99% of the rows in dbmail_headervalue will never show up in the result set, we can tune the query better by using a stored procedure. One such stored procedure is shown in the procedures branch of paul's dbmail repo (http://git.dbmail.eu/paul/dbmail/log/?h=procedures). It first finds all messages within the folder and all the headername ids that match the header name. It then searches through dbmail_headervalue using an index on physmessage_id and headername_id (might not have been created in the original patch), only ILIKE'ing messages that even show up in the folder. From there it takes those rows and selects the original dbmail_messages data for returning back to the process. The output of the stored procedure is exactly the same as the above query but it runs hundreds to thousands of times faster, even for folders with 20k messages in them. A similar stored proc would work fine in 3.0, because the header values are stored the same way. ====================================================================== Issue History Date Modified Username Field Change ====================================================================== 31-Jan-11 23:48 xentac New Issue ====================================================================== _______________________________________________ Dbmail-dev mailing list [email protected] http://mailman.fastxs.nl/cgi-bin/mailman/listinfo/dbmail-dev
