Hi all,

We are struggling to find a way to rework this query in a way
that performs efficiently for large data sets.

The goal is to find the top x most active senders of e-mails
within a date range.

The killer of this query is the ORDER BY clause.  Without it
the results are quick and snappy.  Obviously to get the LIMIT
to provide useful results we have to use the ORDER.  It seems
so close to being able to work efficiently, if only it could be
performing the LIMIT/sort while it's collating the groups.

The magnitude of data we are working with is in the millions
for both message and sender records and this can take hours to
complete the query.

Any ideas would be greatly appreciated.

Thanks, Josh


SELECT
    EMailAddress, COUNT(*) AS Total
FROM
   senders
INNER JOIN
    messages ON messages.message_ID = senders.message_ID
INNER JOIN
email_addresses ON senders.email_address_ID = email_addresses.email_address_ID
WHERE
    CAST (messages.TimeStamp AS INTEGER) BETWEEN 0 AND 9999999999
GROUP BY
    senders.email_address_ID
ORDER BY Total DESC
LIMIT 50

Table create statements:
CREATE TABLE messages (message_ID INTEGER PRIMARY KEY AUTOINCREMENT, TimeStamp INTEGER);
CREATE INDEX messages_timestamp_index ON messages (TimeStamp);

CREATE TABLE email_addresses (email_address_ID INTEGER PRIMARY KEY AUTOINCREMENT, EMailAddress TEXT UNIQUE);

CREATE TABLE senders (message_ID INTEGER, email_address_ID INTEGER DEFAULT NULL, FOREIGN KEY(message_ID) REFERENCES messages(message_ID) ON DELETE CASCADE);
CREATE INDEX senders_emailAddressID_index ON senders (email_address_ID);
CREATE UNIQUE INDEX senders_constraint_index ON senders (message_ID, email_address_ID);



_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to