Re: [AMaViS-user] Wow! CPU high load... [was Wow! CPU overload...]
Matt Juszczak wrote:

>> lookup_ldap: 3861 (51%) (4 seconds) I personally don't use LDAP so I have no idea how to improve this, or if this is as good as it gets. But this is obviously where amavisd-new spends half its time.
>
> Ahhh ... so we need a better pool of LDAP servers. Yeah, our LDAP servers are overloaded.

You might try to make better indexes on the LDAP server before upgrading the hardware. Unindexed LDAP queries are very expensive! If you use a SunONE-based LDAP server, its log shows unindexed queries with a U at the end of the line. I don't know about OpenLDAP, and there are too many versions around...

I also hacked into the amavisd script and commented out the LDAP queries that we will not need. I am now only checking for [EMAIL PROTECTED] and [EMAIL PROTECTED]. Lines involved are 2207 to 2228 in my 2.3.1.

Mark, would it be possible to add options in amavisd.conf to configure which queries should be done, so that inapplicable ones are skipped?

The output of a script that breaks down TIMING lines tells me that:

lookup_ldap: 962893 ms total; 5888 times; 163.534816576087 ms average; 38345 ms max

amavisd, the LDAP server and the MySQL quarantine are on three different machines. I still have to figure out why there is a peak of 38 seconds!

Paolo

---
SF.Net email is Sponsored by the Better Software Conference EXPO September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices Agile Plan-Driven Development * Managing Projects Teams * Testing QA Security * Process Improvement Measurement * http://www.sqe.com/bsce5sf
___
AMaViS-user mailing list
AMaViS-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/amavis-user
AMaViS-FAQ:http://www.amavis.org/amavis-faq.php3
AMaViS-HowTos:http://www.amavis.org/howto/
Re: [AMaViS-user] Wow! CPU high load... [was Wow! CPU overload...]
On Thu, Aug 11, 2005 at 09:41:02AM +0200, Paolo Cravero as2594 wrote:

[snip]

> I also hacked into the amavisd script and commented out the LDAP queries that we will not need. I am now only checking for [EMAIL PROTECTED] and [EMAIL PROTECTED]. Lines involved are 2207 to 2228 in my 2.3.1.
>
> Mark, would it be possible to add options in amavisd.conf to configure which queries should be done, so that inapplicable ones are skipped?

Some fairly major changes were made to the LDAP code for 2.3.2. '(mail=%m)' is now expanded into '(|(mail=...)(mail=...)(mail=...))', so only one query is done for all the keys instead of one query for each key. There were also changes in the error reporting, reattempting of failed queries, addition of '%d' (domain token) in the search base, etc.

--
He was so crooked you could use him to pull corks with.

Mike Hall, System Admin - Rock Island Communications [EMAIL PROTECTED]
System Admin - riverside.org, ssdd.org [EMAIL PROTECTED]
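The filter expansion Mike describes can be illustrated with a small sketch. This is not the amavisd-new code (which is Perl); the function names and the example lookup keys are hypothetical, and the only assumption taken from the message is that each '%m' in the template is replaced by one key and the resulting clauses are joined with the LDAP OR operator.

```python
def escape_filter_value(value):
    """Escape characters that are special inside an LDAP search filter (RFC 4515)."""
    replacements = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}
    return "".join(replacements.get(ch, ch) for ch in value)


def expand_filter(template, keys):
    """Expand a template such as '(mail=%m)' into a single OR filter over all keys."""
    clauses = [template.replace("%m", escape_filter_value(key)) for key in keys]
    if len(clauses) == 1:
        # A single key needs no OR wrapper.
        return clauses[0]
    return "(|" + "".join(clauses) + ")"


if __name__ == "__main__":
    # Hypothetical lookup keys for one recipient (illustration only).
    keys = ["user@example.com", "@example.com", "@.example.com"]
    print(expand_filter("(mail=%m)", keys))
```

With one combined filter the directory server is asked once per recipient instead of once per key, which is where the saving comes from.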
Re: [AMaViS-user] Wow! CPU high load... [was Wow! CPU overload...]
Paolo Cravero as2594 wrote:

> Matt Juszczak wrote:
>
>>> lookup_ldap: 3861 (51%) (4 seconds) I personally don't use LDAP so I have no idea how to improve this, or if this is as good as it gets. But this is obviously where amavisd-new spends half its time.
>>
>> Ahhh ... so we need a better pool of LDAP servers. Yeah, our LDAP servers are overloaded.
>
> You might try to make better indexes on the LDAP server before upgrading the hardware.

Paul,

Yep, tried this yesterday :) That was the problem. I added another index for mailRoutingAddress (I already had one for mailLocalAddress, but I guess mailRoutingAddress needed one too), and now we're experiencing instant mail delivery with no queues :)

-Matt
Re: [AMaViS-user] Wow! CPU high load... [was Wow! CPU overload...]
On Thu, Aug 11, 2005 at 12:12:44PM -0400, Matt Juszczak wrote:

>> You might try to make better indexes on the LDAP server before upgrading the hardware.
>
> Paul,
>
> Yep, tried this yesterday :) that was the problem. Added another index for mailRoutingAddress (I already had mailLocalAddress created but I guess the mailRoutingAddress needed one too) and now we're experiencing instant mail delivery with no queues :)

You definitely want to make sure you have indexes on any attribute used in searches; it can make a huge difference, as you found out. If you're using OpenLDAP and 'bdb' databases, you also want to make sure to configure the Berkeley DB environment with a DB_CONFIG file, and use db_stat to check things. Below is an excerpt from one of our mail servers at work:

$ db_stat-4.2 -h /var/db/openldap-data -m
31MB 257KB 604B  Total cache size.
1  Number of caches.
31MB 264KB  Pool individual cache size.
0  Requested pages mapped into the process' address space.
4253M  Requested pages found in the cache (100%).
40  Requested pages not found in the cache.
4374  Pages created in the cache.
40  Pages read into the cache.
405215  Pages written from the cache to the backing file.
0  Clean pages forced from the cache.
0  Dirty pages forced from the cache.
0  Dirty pages written by trickle-sync thread.
4414  Current total page count.
4228  Current clean page count.
186  Current dirty page count.
4099  Number of hash buckets used for page location.
4253M  Total number of times hash chains searched for a page.
4  The longest hash chain searched for a page.
1617M  Total number of hash buckets examined for page location.
4222M  The number of hash bucket locks granted without waiting.
0  The number of hash bucket locks granted after waiting.
0  The maximum number of times any hash bucket lock was waited for.
16125  The number of region locks granted without waiting.
0  The number of region locks granted after waiting.
4514  The number of page allocations.
0  The number of hash buckets examined during allocations
0  The max number of hash buckets examined for an allocation
0  The number of pages examined during allocations
0  The max number of pages examined for an allocation
...
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Pool File: mailAlternateAddress.bdb
4096  Page size.
0  Requested pages mapped into the process' address space.
44M  Requested pages found in the cache (100%).
2  Requested pages not found in the cache.
656  Pages created in the cache.
2  Pages read into the cache.
94273  Pages written from the cache to the backing file.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Pool File: mailAccountStatus.bdb
4096  Page size.
0  Requested pages mapped into the process' address space.
1035M  Requested pages found in the cache (100%).
2  Requested pages not found in the cache.
34  Pages created in the cache.
2  Pages read into the cache.
7041  Pages written from the cache to the backing file.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Pool File: mail.bdb
4096  Page size.
0  Requested pages mapped into the process' address space.
346M  Requested pages found in the cache (100%).
2  Requested pages not found in the cache.
635  Pages created in the cache.
2  Pages read into the cache.
83314  Pages written from the cache to the backing file.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
...

Correctly sizing the cache can make a big difference, as answers can be pulled from it instead of accessing the disks.

--
Patience HELL! Let's kill something!

Mike Hall, System Admin - Rock Island Communications [EMAIL PROTECTED]
System Admin - riverside.org, ssdd.org [EMAIL PROTECTED]
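As a rough aid for reading db_stat -m output like the excerpt in this message, the cache hit ratio can be computed from the "found" and "not found" counters. The sketch below is an illustration, not part of Berkeley DB or OpenLDAP; the function name is hypothetical, and treating the K/M/G suffixes as decimal thousands, millions and billions is an assumption.

```python
import re

# Suffix multipliers for counts such as "4253M"; decimal scaling is an assumption.
SUFFIXES = {"": 1, "K": 10**3, "M": 10**6, "G": 10**9}

FOUND = re.compile(r"^\s*(\d+)([KMG]?)\s*Requested pages found in the cache")
MISSED = re.compile(r"^\s*(\d+)([KMG]?)\s*Requested pages not found in the cache")


def cache_hit_ratio(db_stat_output):
    """Return hits / (hits + misses) from the first pair of counters, or None."""
    hits = misses = None
    for line in db_stat_output.splitlines():
        found = FOUND.match(line)
        if found and hits is None:
            hits = int(found.group(1)) * SUFFIXES[found.group(2)]
        missed = MISSED.match(line)
        if missed and misses is None:
            misses = int(missed.group(1)) * SUFFIXES[missed.group(2)]
    if hits is None or misses is None or hits + misses == 0:
        return None
    return hits / (hits + misses)


if __name__ == "__main__":
    sample = (
        "4253M Requested pages found in the cache (100%).\n"
        "40 Requested pages not found in the cache.\n"
    )
    print(cache_hit_ratio(sample))
```

A ratio that stays well below 1 under steady load would suggest the cache is too small for the working set, which is the situation the DB_CONFIG advice is meant to avoid.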
Re: [AMaViS-user] Wow! CPU high load... [was Wow! CPU overload...]
Matt wrote:

> Hi all,
>
> We've got two relay servers set up (relay1 and relay2) and it's working fine, but the mail coming in is amazing. I'm glad we went with the two relay server solution instead of everything on one box. Each of these relay servers is a 3 GIG processor with 1 GB ram. Today, the servers each have about 500 messages in the mail spool. The load averages are still under 2.0, not bad (1.6 - 1.8 hovering), and CPU/Mem is OK ... it's just not processing them fast enough.
>
> Any other ideas?

One thought is to thwart the common practice of spammers that target only the secondary MX. You may have something like:

10 MX server1.example.com
20 MX server2.example.com

for domains using server1 as primary, and

10 MX server2.example.com
20 MX server1.example.com

for domains using server2 as primary. Have you considered doing something like this:

10 MX server1.example.com
15 MX dummy.example.com
20 MX server2.example.com

and

10 MX server2.example.com
15 MX dummy.example.com
20 MX server1.example.com

where dummy can be either a real host (with valid A and rDNS records) that simply does not listen on port 25 (or the device at that IP address is dead or nonexistent) so the sending server times out, or possibly a record that points to a nonexistent host with no A record. I'm not sure which is more appropriate to do. I use the first method, and I am noticing a 25% reduction in traffic on both my primary and (now) tertiary servers since I implemented this last Sunday.

I would hope that a legitimate server will try the tertiary server if the primary and secondary are both unresponsive, but even if they don't, they should hopefully queue the mail long enough to get the primary back up and running.

There has been talk on the Postfix list about having a dummy primary MX, as opposed to a dummy secondary, but I personally have to believe this may punish legitimate servers somewhat, so I chose not to go that route.

I realize this has only been in place for a few days, but the results have been consistent so far.

Thoughts?

Gary V
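To make the reasoning behind the dummy MX concrete, here is a minimal simulation of how a well-behaved sender walks the MX list, under simplifying assumptions: hosts are tried strictly in ascending preference, a host that is not listening is simply skipped after a timeout, and randomization among equal preferences is ignored. The function names are hypothetical; the host names are the ones used in this message.

```python
def delivery_order(mx_records):
    """Return MX hosts in the order a standards-following sender tries them.

    mx_records is a list of (preference, host) pairs; lower preference goes first.
    """
    return [host for _, host in sorted(mx_records)]


def first_reachable(mx_records, listening):
    """Return the first host in delivery order that accepts connections, or None."""
    for host in delivery_order(mx_records):
        if host in listening:
            return host
    return None


if __name__ == "__main__":
    records = [
        (10, "server1.example.com"),
        (15, "dummy.example.com"),  # never listens on port 25
        (20, "server2.example.com"),
    ]
    # Normal case: the primary answers.
    print(first_reachable(records, {"server1.example.com", "server2.example.com"}))
    # Primary down: a legitimate sender falls through the dummy to the tertiary.
    print(first_reachable(records, {"server2.example.com"}))
    # A client that only ever tries the second-listed MX reaches nothing.
    second_only = [sorted(records)[1]]
    print(first_reachable(second_only, {"server1.example.com", "server2.example.com"}))
```

The third case is the one the trick relies on: a sender that never retries other MX hosts gives up, while a compliant sender only pays one connection timeout.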
Re: [AMaViS-user] Wow! CPU high load... [was Wow! CPU overload...]
>> Any other ideas?
>
> One thought is to thwart the common practice of spammers that target only the secondary MX.

That's actually a good idea :) I'll consider that! All of our MX records are the same priority anyway, but adding a fake secondary might actually be pretty neat (since spammers might skip relay1 and relay2 and go right for the priority 20 server, which won't exist).

In a previous email, you guys asked me to send a debug from a message. Here it is. It seems the timing you were looking for is 7566 ms. Our queue on these servers gets really backed up during business hours. Most of the time it is taking about 5 minutes now for a message to be processed from the queue. We do have IDE drives, but the only I/O I notice is the writing to the log file (/var/log/maillog). I do not have /var/amavis/tmp in a RAM drive, because I monitored that directory and there didn't seem to be too much traffic. Am I missing something? Would moving that dir to memory really help improve I/O?

Anyway, here is the output:

---snip---
Aug 10 13:57:27 relay1 amavis[29802]: (29802-01) lookup_ldap_attr(amavisspamkilllevel) (WARN: no such attribute in LDAP entry), [EMAIL PROTECTED] result=undef
Aug 10 13:57:27 relay1 amavis[29802]: (29802-01) lookup_ldap_attr(amavisspamtaglevel) (WARN: no such attribute in LDAP entry), [EMAIL PROTECTED] result=undef
Aug 10 13:57:27 relay1 postfix/smtpd[29864]: connect from localhost[127.0.0.1]
Aug 10 13:57:27 relay1 amavis[29802]: (29802-01) lookup_ldap_attr(amavisspamtag2level) (WARN: no such attribute in LDAP entry), [EMAIL PROTECTED] result=undef
Aug 10 13:57:27 relay1 amavis[29802]: (29802-01) SPAM-TAG, [EMAIL PROTECTED] - [EMAIL PROTECTED], Yes, hits=11.178 tagged_above=0.01 required=$ tests=[AWL=-1.428, BAYES_50=0.001, FORGED_MUA_MOZILLA=2.303, HTML_80_90=0.146, HTML_IMAGE_RATIO_04=0.105, HTML_MESSAGE=0.001, HTML_TEXT_AFTER_HTML=0.031, HTML_WEB_BUGS=0.035, MIME_HTML_ONLY=0.177, RAZOR2_CF_RANGE_51_100=0.056, RAZOR2_CHECK=1.511, RCVD_IN_SBL=0.107, URIBL_JP_SURBL=2.462, URIBL_OB_SURBL=3.213, URIBL_SBL=0.996, URIBL_WS_SURBL=1.462]
Aug 10 13:57:27 relay1 postfix/smtpd[29864]: E6144BA2995: client=localhost[127.0.0.1]
Aug 10 13:57:28 relay1 postfix/cleanup[29878]: E6144BA2995: message-id=[EMAIL PROTECTED]
Aug 10 13:57:28 relay1 postfix/qmgr[29828]: E6144BA2995: from=[EMAIL PROTECTED], size=5797, nrcpt=1 (queue active)
Aug 10 13:57:28 relay1 postfix/smtpd[29864]: disconnect from localhost[127.0.0.1]
Aug 10 13:57:28 relay1 amavis[29802]: (29802-01) FWD via SMTP: [EMAIL PROTECTED] - [EMAIL PROTECTED], 250 2.6.0 Ok, id=29802-01, from MTA([127.0.0.1]:10025): 250 Ok: queued as E6144BA2995
Aug 10 13:57:28 relay1 amavis[29802]: (29802-01) lookup_ldap_attr(amavisspamtag2level) (WARN: no such attribute in LDAP entry), [EMAIL PROTECTED] result=undef
Aug 10 13:57:28 relay1 amavis[29802]: (29802-01) Passed SPAM, [123.31.192.80] [123.31.192.80] [EMAIL PROTECTED] - [EMAIL PROTECTED], Message-I$ [EMAIL PROTECTED], mail_id: NM+xH6hBsS6u, Hits: 11.178, 7566 ms
Aug 10 13:57:28 relay1 amavis[29802]: (29802-01) TIMING [total 7572 ms] - ldap-prepare: 5 (0%)0, SMTP EHLO: 5 (0%)0, SMTP pre-MAIL: 5 (0%)0, mkdir tempdir: 1 (0%)0, create email.txt: 1 (0%)0, ldap-connect: 17 (0%)0, lookup_ldap: 3861 (51%)51, SMTP pre-DATA-flush: 4 (0%)51, SMTP DATA: 189 (2%)54, body_hash: 2 (0%)54, gen_mail_id: 1 (0%)54, mkdir parts: 1 (0%)54, mime_decode: 14 (0%)54, get-file-type1: 21 (0%)54, parts_decode: 1 (0%)54, AV-scan-1: 15 (0%)55, spam-wb-list: 42 (1%)55, SA msg read: 1 (0%)55, SA parse: 3 (0%)55, SA check: 3215 (42%)98, update_cache: 2 (0%)98, deal_with_mail_size: 1 (0%)98, fwd-connect: 13 (0%)98, fwd-mail-from: 6 (0%)98, fwd-rcpt-to: 14 (0%)98, write-header: 4 (0%)98, fwd-data: 0 (0%)98, fwd-data-end: 100 (1%)100, fwd-rundown: 2 (0%)100, main_log_entry: 24 (0%)100, update_snmp: 1 (0%)100, unlink-1-files: 2 (0%)100, rundown: 0 (0%)100
Aug 10 13:57:28 relay1 postfix/smtp[29850]: 5D0EABA2997: to=[EMAIL PROTECTED], orig_to=[EMAIL PROTECTED], relay=127.0.0.1[127.0.0.1], delay=8, status=sent (250 2.6.0 Ok, id=29802-01, from MTA([127.0.0.1]:10025): 250 Ok: queued as E6144BA2995)
---snip---

Thanks!

-Matt
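A TIMING line like the one in this log can be broken down per section and aggregated over many messages, which gives per-section totals, counts, averages and maxima. The sketch below is one way to do that, not the script used by anyone in this thread; the function names are hypothetical, and the entry layout "name: ms (pct%)cumulative" is inferred from the log line above.

```python
import re
from collections import defaultdict

# Everything after "TIMING [total N ms] - " is a comma-separated list of entries.
TIMING = re.compile(r"TIMING \[total \d+ ms\] - (.*)")
# One entry looks like "lookup_ldap: 3861 (51%)51".
ENTRY = re.compile(r"([^,:]+):\s*(\d+)\s*\(\d+%\)\d+")


def parse_timing(line):
    """Return {section: milliseconds} for one TIMING log line, or {} if none."""
    match = TIMING.search(line)
    if not match:
        return {}
    return {m.group(1).strip(): int(m.group(2)) for m in ENTRY.finditer(match.group(1))}


def summarize(lines):
    """Aggregate total, count, max and average milliseconds per section."""
    stats = defaultdict(lambda: {"total": 0, "count": 0, "max": 0})
    for line in lines:
        for name, ms in parse_timing(line).items():
            entry = stats[name]
            entry["total"] += ms
            entry["count"] += 1
            entry["max"] = max(entry["max"], ms)
    return {
        name: dict(entry, average=entry["total"] / entry["count"])
        for name, entry in stats.items()
    }


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python timing_summary.py < /var/log/maillog
    for name, entry in sorted(summarize(sys.stdin).items(), key=lambda kv: -kv[1]["total"]):
        print(f"{name}: {entry['total']} ms total; {entry['count']} times; "
              f"{entry['average']} ms average; {entry['max']} ms max")
```

Sorting by total time puts the sections worth investigating first; in the log above that would be lookup_ldap and SA check.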
Re: [AMaViS-user] Wow! CPU high load... [was Wow! CPU overload...]
Matt wrote:

>>> Any other ideas?
>>
>> One thought is to thwart the common practice of spammers that target only the secondary MX.
>
> That's actually a good idea :) I'll consider that! All of our MX records are the same priority anyway, but adding a fake secondary might actually be pretty neat (since spammers might skip relay1 and relay2 and go right for the priority 20 server, which won't exist).

I'm not sure if it would be effective or not if relay1 and relay2 do round robin. I just don't know.

> In a previous email, you guys asked me to send a debug from a message. Here it is. It seems the timing you were looking for is 7566 ms. Our queue on these servers gets really backed up during business hours. Most of the time it is taking about 5 minutes now for a message to be processed from the queue. We do have IDE drives, but the only I/O I notice is the writing to the log file (/var/log/maillog). I do not have /var/amavis/tmp in a RAM drive, because I monitored that directory and there didn't seem to be too much traffic. Am I missing something? Would moving that dir to memory really help improve I/O?

You have to be very careful: if tmpfs fills up, amavisd-new processes croak. I'm not sure it's worth it, primarily due to this fact. Do your homework if you try this.

> Wow, I was wrong. /var/amavis/tmp does get a crap load of traffic :) I'll look into putting this into memory.

Then you may need a crap load of memory! How many amavis* directories are there? A large number may indicate a problem.

> Anyway, here is the output:

lookup_ldap: 3861 (51%) (4 seconds)

I personally don't use LDAP so I have no idea how to improve this, or if this is as good as it gets. But this is obviously where amavisd-new spends half its time.

SA check: 3215 (42%) (3 seconds - looks OK)

Just my 0.02

Gary V
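Before deciding on tmpfs, it may help to measure what Gary asks about: how many amavis* work directories exist and how much space the temp directory holds. The sketch below is a hedged illustration; the function name is hypothetical, and /var/amavis/tmp is simply the path mentioned in this thread.

```python
import os


def tempdir_usage(base):
    """Count 'amavis*' subdirectories of base and the total bytes of files under base."""
    work_dirs = 0
    for entry in os.scandir(base):
        if entry.is_dir(follow_symlinks=False) and entry.name.startswith("amavis"):
            work_dirs += 1

    total_bytes = 0
    for root, _dirs, files in os.walk(base):
        for name in files:
            try:
                total_bytes += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                # Files can disappear while amavisd is working; skip them.
                pass
    return work_dirs, total_bytes


if __name__ == "__main__":
    dirs, size = tempdir_usage("/var/amavis/tmp")
    print(f"{dirs} amavis* directories, {size / 1024 / 1024:.1f} MB in use")
```

Sampling this periodically during business hours gives a peak figure to compare against available RAM, instead of guessing how large a tmpfs would need to be.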
Re: [AMaViS-user] Wow! CPU high load... [was Wow! CPU overload...]
> Then you may need a crap load of memory! How many amavis* directories are there? A large number may indicate a problem.

About 20? The directory never gets above 2 or 3 megs though, as long as the razor-agent.log is wiped. That's why I want to know how to turn it off :)

> lookup_ldap: 3861 (51%) (4 seconds)
>
> I personally don't use LDAP so I have no idea how to improve this, or if this is as good as it gets. But this is obviously where amavisd-new spends half its time.

Ahhh ... so we need a better pool of LDAP servers. Yeah, our LDAP servers are overloaded.

> SA check: 3215 (42%) (3 seconds - looks OK)

Thanks :)

> Just my 0.02
>
> Gary V
Re: [AMaViS-user] Wow! CPU high load... [was Wow! CPU overload...]
Matt wrote:

>> Then you may need a crap load of memory! How many amavis* directories are there? A large number may indicate a problem.
>
> About 20? The directory never gets above 2 or 3 megs though, as long as the razor-agent.log is wiped. Thats why I want to know how to turn it off :)

This looks good. Usually one directory for each child process. A few left over is both typical and insignificant.

>> lookup_ldap: 3861 (51%) (4 seconds)
>>
>> I personally don't use LDAP so I have no idea how to improve this, or if this is as good as it gets. But this is obviously where amavisd-new spends half its time.
>
> Ahhh ... so we need a better pool of LDAP servers. Yeah, our LDAP servers are overloaded.
>
>> SA check: 3215 (42%) (3 seconds - looks OK)
>
> Thanks :)

> ldap-prepare: 5 (0%)0, SMTP EHLO: 5 (0%)0, SMTP pre-MAIL: 5 (0%)0, mkdir tempdir: 1 (0%)0, create email.txt: 1 (0%)0, ldap-connect: 17 (0%)0, lookup_ldap: 3861 (51%)51, SMTP pre-DATA-flush: 4 (0%)51, SMTP DATA: 189 (2%)54, body_hash: 2 (0%)54, gen_mail_id: 1 (0%)54, mkdir parts: 1 (0%)54, mime_decode: 14 (0%)54, get-file-type1: 21 (0%)54, parts_decode: 1 (0%)54, AV-scan-1: 15 (0%)55, spam-wb-list: 42 (1%)55, SA msg read: 1 (0%)55, SA parse: 3 (0%)55, SA check: 3215 (42%)98, update_cache: 2 (0%)98, deal_with_mail_size: 1 (0%)98, fwd-connect: 13 (0%)98, fwd-mail-from: 6 (0%)98, fwd-rcpt-to: 14 (0%)98, write-header: 4 (0%)98, fwd-data: 0 (0%)98, fwd-data-end: 100 (1%)100, fwd-rundown: 2 (0%)100, main_log_entry: 24 (0%)100, update_snmp: 1 (0%)100, unlink-1-files: 2 (0%)100, rundown: 0 (0%)100

Your timing results show that amavisd-new really does not spend much time at all decoding messages. I'm sure others will disagree, but considering this, I personally would avoid the additional risk that comes with using tmpfs.

Like I said: Just my 0.02

Gary V