I've got a production amavisd-new array of servers handling roughly
16,000,000 messages per day.  Users can toggle quarantining.  With the
quarantines that are enabled and a 2 week retention we have roughly 12
million message in quarantine at any given time.  We send a summary
email to subscribers based on their preference (daily, weekly,
monthly) containing the list of messages they have in quarantine.  My
problem comes into play generating these summary emails.   The SQL
server just seems to hack and cough and take too long on the
individual queries to keep up any longer.  Notably, this is a fairly
new server.  Data and comments follow:

Server Stats
AMD Quad Core Opteron 2.3Ghz
8GB ECC RAM
4 x 300GB WD 10krpm Raptors in RAID-10 array

Vital my.conf configuration
# Innodb settings
innodb_file_per_table
innodb_buffer_pool_size  = 5120M
innodb_log_buffer_size = 4M
innodb_flush_log_at_trx_commit = 0
innodb_flush_method = O_DIRECT
thread_concurrency = 8

When running the list of status reports, CPU usage hovers around an
aggregate 2-3%, and that's with the CPUs clocked down by cpuspeed to
1100mhz.  CPU doesn't seem to be an issue.

Disk I/O does not seem to be maxxed.  Disk I/O seems to hover around
30-50%.  Output from iostat -kx 60 for relevant disks:
Device:         rrqm/s   wrqm/s   r/s   w/s    rkB/s    wkB/s avgrq-sz
avgqu-sz   await  svctm  %util
sda               0.25     8.23 45.09 12.93   617.70   272.89    30.70
    0.41    7.09   6.49  37.64
sdb               0.25     8.23 44.98 12.93   617.03   272.89    30.74
    0.39    6.79   6.32  36.59
sdc               0.43     8.10 46.46 13.61   641.56   280.95    30.71
    0.37    6.17   5.71  34.31
sdd               0.63     8.10 46.46 13.61   635.83   280.95    30.52
    0.38    6.40   5.84  35.08



RAM get's pretty well used up by InnoDB, here's details of RAM after
the server's been online for about 45 minutes.
Mem:   7986040k total,  7636416k used,   349624k free,    73244k buffers
Swap:  3951952k total,      164k used,  3951788k free,  1352148k cached

That tables I'm querying on look as follows (not exact because innodb
doesn't report exact numbers in summary):
maddr  9045756 rows, 0.8GB data, 1.8GB index
msgrcpt  26639273 rows, 3.6GB data, 2.2GB index
msgs  24832832 rows, 10.7GB data, 6.1GB index
quarantine 11776200 rows, 93.8GB data, 302MB index

The query.....
SELECT maddr.email AS email, bspam_level, time_num, from_addr,
subject, msgs.mail_id, secret_id
FROM amavisd_stats.msgrcpt, amavisd_stats.msgs, amavisd_stats.maddr
WHERE (maddr.email = 'j...@smith.com')
AND msgrcpt.rid = maddr.id
AND msgrcpt.rs != 'R'
AND (msgrcpt.ds = 'D' OR msgrcpt.ds = 'R' OR msgrcpt.ds = 'B')
AND quar_type = 'Q'
AND time_num >= '1278027742'
AND msgrcpt.mail_id = msgs.mail_id
ORDER BY time_num DESC

EXPLAIN QUERY output (I hope this formats OK via email), if not, it
can be found at http://locu.net/temp/amavis_explain.txt.  I've also
appended the output of SHOW ENGINE INNODB STATUS to the URL.
+----+-------------+---------+------+-------------------------------------+-----------------+---------+-------------------------------+------+-----------------------------------------------------------+
| id | select_type | table   | type | possible_keys
   | key             | key_len | ref                           | rows
| Extra                                                     |
+----+-------------+---------+------+-------------------------------------+-----------------+---------+-------------------------------+------+-----------------------------------------------------------+
|  1 | SIMPLE      | maddr   | ref  | PRIMARY,email
   | email           | 257     | const                         |   55
| Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | msgrcpt | ref  |
msgrcpt_idx_mail_id,msgrcpt_idx_rid | msgrcpt_idx_rid | 4       |
amavisd_stats.maddr.id        |   14 | Using where
                          |
|  1 | SIMPLE      | msgs    | ref  | PRIMARY,msgs_idx_time_num
   | PRIMARY         | 14      | amavisd_stats.msgrcpt.mail_id |    1
| Using where                                               |
+----+-------------+---------+------+-------------------------------------+-----------------+---------+-------------------------------+------+-----------------------------------------------------------+

Provided I'm reading the EXPLAIN output correctly it shows it's
successfully using indexes, and searching a limited number of rows.

More data from the slow query log on one of the same queries.
# Time: 100702 17:11:39
# Query_time: 264.682337  Lock_time: 0.000319 Rows_sent: 217
Rows_examined: 16933
SET timestamp=1278115899;
SELECT maddr.email AS email, bspam_level, time_num, from_addr,
subject, msgs.mail_id, secret_id
                                                                 FROM
amavisd_stats.msgrcpt, amavisd_stats.msgs, amavisd_stats.maddr
                                                                WHERE
(maddr.email = 'j...@smith.com')
                                                                  AND
msgrcpt.rid = maddr.id
                                                                  AND
msgrcpt.rs != 'R'
                                                                  AND
(msgrcpt.ds = 'D' OR msgrcpt.ds = 'R' OR msgrcpt.ds = 'B')
                                                                  AND
quar_type = 'Q'
                                                                  AND
time_num >= '1278029235'
                                                                  AND
msgrcpt.mail_id = msgs.mail_id
                                                                ORDER
BY time_num DESC;

The thing that gets me is the Query time being 264 seconds that when
watching status sits in status 'Copying to tmp table'.  I know InnoDB
doesn't report lock time accurately in the slow query log; however,
there should be no locks affecting a SELECT query anyways.  If the
same query is run using a fake user that doesn't exist, it returns
swiftly with zero results.  Users with more results take longer to
query.  Also running the same query a 2nd time produces results almost
immediately.   RAM would be my first guess; however, if I recall
correctly, these queries were just as slow when the combined total of
indexes were less than the available RAM dedicated to innodb buffer
pool.

I'm hoping there's somebody on the list running a high capacity setup
with some advice.

Thanks,
N

------------------------------------------------------------------------------
This SF.net email is sponsored by Sprint
What will you do first with EVO, the first 4G phone?
Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first
_______________________________________________
AMaViS-user mailing list
AMaViS-user@lists.sourceforge.net 
https://lists.sourceforge.net/lists/listinfo/amavis-user 
 Please visit http://www.ijs.si/software/amavisd/ regularly
 For administrativa requests please send email to rainer at openantivirus dot 
org

Reply via email to