interesting problem with SQL backend
Today I had an interesting situation. This is more of an FYI in case anyone else has run into similar problems. (Cross-posted to the MIMEDefang list as well.)

I use SpamAssassin with MIMEDefang. One of my users notified me that they were suddenly unable to send mail. After checking the logs, I determined that MIMEDefang was timing out and returning errors. The cause was very unclear, which is why I'm sharing my findings with all of you.

After digging around (with some assistance from David Skoll on the MIMEDefang list), I determined that the problem was caused by SpamAssassin being unable to connect to the database server where the Bayes database is stored (MySQL on a remote host). This caused all sorts of weirdness for no apparent reason and was initially very confusing to diagnose.

The symptoms were:

* MIMEDefang started to return busy timeout errors.
* When restarting MIMEDefang (with embedded Perl enabled), the multiplexor wouldn't finish loading and MIMEDefang wouldn't create the socket, causing sendmail to spit out "file /path/to/mimedefang/socket/file unsafe" errors.
* Turning off embedded Perl allowed MIMEDefang to start and create the socket, but it would then spawn multiple instances of mimedefang.pl, which just hung.
* mimedefang.pl -test and/or mimedefang.pl -features would hang indefinitely with no output.

The workaround: after determining that the problem was the connection to the SQL server, simply setting use_bayes 0 in sa-mimedefang.cf and restarting MIMEDefang resolved the problem. However, this obviously meant the Bayes facilities weren't being used.

The questions: I understand that the SQL code for SA is still 'experimental'. Is there currently any way to force a timeout when connecting to the SQL server? Is this something I should open a BZ ticket about?
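A minimal sketch of the "forced timeout" idea, written in Python purely for illustration (it is not SpamAssassin or MIMEDefang code): probe the database host's TCP port with a hard deadline before anything relies on it. Port 3306 is MySQL's default; the function name db_port_reachable and the host in the usage note are hypothetical. On the Perl side, DBD::mysql documents a mysql_connect_timeout DSN option; whether it can be passed through SpamAssassin's SQL DSN setting is an assumption worth checking in the documentation.

```python
import socket


def db_port_reachable(host: str, port: int = 3306, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper: it only checks TCP reachability and never speaks the
    MySQL protocol, so it cannot detect a server that accepts the connection
    and then stalls.
    """
    try:
        # The timeout applies to each connect attempt; name resolution itself
        # is not bounded by it.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, DNS failure and socket.timeout are all OSError subclasses.
        return False
```

For example, a filter could call db_port_reachable("bayes-db.example.com") and, on False, scan that message without Bayes instead of waiting. A server that accepts the connection and then stalls during authentication would still pass this check.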
Since I'm definitely not an SQL guru, does anyone have suggestions for setting up a high-availability MySQL configuration that could fail over to a backup server should the primary become incapacitated by a low-level hard drive failure? Currently I have one MySQL database server holding the Bayes databases (among other databases), and my primary and secondary mail servers both connect to it to check the Bayes database.

This may be somewhat specific to the MIMEDefang implementation, but I suspect this type of behavior could have a negative impact on other types of SA implementations as well.

Again, this is mostly an FYI, but any suggestions are welcome.

Thanks,
Alan
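One client-side piece of the failover question can be sketched as "try the primary, then the backup, each with a short timeout". This is a hedged Python illustration, not a MySQL high-availability setup: it only chooses which server to connect to, and keeping the backup's Bayes data in sync (for example with MySQL replication) is a separate problem not shown here. The names pick_db_host and Probe are hypothetical; the probe can be any check with a timeout, such as a TCP connect with a deadline.

```python
from typing import Callable, Optional, Sequence, Tuple

# (host, port) pairs, listed in priority order: primary first, then backups.
Host = Tuple[str, int]
# A probe takes (host, port, timeout_seconds) and reports whether the server answered.
Probe = Callable[[str, int, float], bool]


def pick_db_host(candidates: Sequence[Host], probe: Probe, timeout: float = 2.0) -> Optional[Host]:
    """Return the first candidate that answers the probe, or None if none do.

    Returning None (rather than raising or retrying forever) lets the caller
    decide to skip the database, e.g. scan without Bayes, instead of blocking.
    """
    for host, port in candidates:
        if probe(host, port, timeout):
            return (host, port)
    return None
```

The worst-case delay is roughly the number of candidates times the per-host timeout, so the timeout should stay short when this sits in the mail path.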
Re: interesting problem with SQL backend
On Fri, Mar 25, 2005 at 01:43:49AM +0900, alan premselaar wrote:

> I understand that the SQL code for SA is still 'experimental'. Is there currently any way to force a timeout when connecting to the SQL server?

I would certainly call it less experimental these days, as it has turned out to be quite stable.

I'm very confused by your message and would really like to see more debug output. If the BayesSQL code is unable to connect to your database, it should just return and move on. There is nothing that would cause a delay, at least at the SA code level. I suppose it is possible that the underlying DBD::mysql module could hang while trying to connect to a misbehaving server.

Can you recreate this hang at will? If so, please run SA in debug mode and send the output so we can review it. This is especially useful if you can remove MIMEDefang from the picture, because it's not clear that its use of SA wasn't the cause of the problem. I've run thousands of tests, and literally billions of queries, through the SA BayesSQL code and have never seen an error like this.

> Is this something I should open a BZ ticket about?

No, not unless we find something in the debug output that indicates a failure on the SA code's part.

Michael
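Michael's point that a failed connection "should just return and move on" can also be enforced from the calling side, in case the driver itself hangs. As a hedged Python sketch (not SpamAssassin's or MIMEDefang's actual mechanism; call_with_deadline and the lookup it wraps are hypothetical), the check runs in a worker thread and the caller stops waiting after a deadline, treating a timeout or an error as "no Bayes result" rather than blocking the mail.

```python
import concurrent.futures
from typing import Callable, TypeVar

T = TypeVar("T")

# Shared worker pool; each call that hangs ties up one of these threads.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_deadline(fn: Callable[[], T], deadline: float, fallback: T) -> T:
    """Run fn() in a worker thread; return `fallback` if it fails or takes too long.

    Python cannot kill the worker, so a hung call keeps running in the
    background; the caller just stops waiting for it after `deadline` seconds.
    """
    future = _POOL.submit(fn)
    try:
        return future.result(timeout=deadline)
    except concurrent.futures.TimeoutError:
        # Deadline passed: fail open instead of blocking the caller.
        return fallback
    except Exception:
        # fn() raised (e.g. connection refused): also fail open.
        return fallback
```

The trade-off is that a genuinely hung call keeps occupying a worker thread. With a small pool, repeated hangs can exhaust it, after which new lookups queue behind the hung ones and simply time out.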
Re: interesting problem with SQL backend
Check to make sure you have PTR records in your DNS for the SA machine. If not, MySQL has to fall through the name-resolution chain to determine the name of the computer that's connecting. I've seen delays of 20 seconds to a minute while MySQL tries to determine the host name during the authentication stage of connecting.

Thanks,
JamesDR
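To check for the problem JamesDR describes, one can time a reverse lookup of the mail server's address. A minimal Python sketch, assuming it is run on the MySQL server (where the lookup happens) against the SpamAssassin host's IP; check_ptr is a hypothetical name, and the resolver parameter exists only so the function can be tested without real DNS. socket.gethostbyaddr has no timeout argument, so the elapsed time reflects the system resolver's own timeouts. If fixing DNS isn't possible, MySQL's skip-name-resolve server option disables these lookups, at the cost that grants must then use IP addresses rather than host names.

```python
import socket
import time
from typing import Callable, Optional, Tuple


def check_ptr(
    ip: str,
    resolver: Callable[[str], Tuple[str, list, list]] = socket.gethostbyaddr,
) -> Tuple[Optional[str], float]:
    """Look up the PTR (reverse DNS) name for `ip` and time the lookup.

    Returns (hostname or None, elapsed seconds). A None result or a long
    elapsed time suggests MySQL may stall while resolving this client.
    """
    start = time.monotonic()
    try:
        name, _aliases, _addresses = resolver(ip)
    except (socket.herror, socket.gaierror):
        # No PTR record (herror) or a resolver failure (gaierror).
        name = None
    return name, time.monotonic() - start
```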
Re: interesting problem with SQL backend
--On 03/25/05 01:43:49 +0900 alan premselaar wrote:

> I got notified by one of my users that they were unable to send mail suddenly. After checking the logs I determined that MIMEDefang was timing out and returning errors. The cause for this was very unclear (which is why I'm sharing my findings with all of you)... After digging around (and some assistance from David Skoll on the MIMEDefang list) I was able to determine that the problem was caused by SpamAssassin not being able to connect to the database server where the Bayes database is stored. (using MySQL on a remote host)

I've modified my mimedefang-filter file so that mail from internal IP addresses and mail sent via SMTP AUTH (trusted mail) is not passed through SpamAssassin.

In general, SMTP is very tolerant of delays (it just stores the message and tries again later). The sole exception is the connection from the MUA to the initial MTA: the MUA can't hang around retrying forever, so the user gets an error message. The more subsystems we add to our mail servers (spam and virus checking, for example), the more random delays we'll get. If you bypass the checking for mail sent from trusted machines, you reduce the chance of delays.

Now, in my case, I can trust my users not to send spam and viruses. If you can't, you can set up a different machine for outgoing mail than for incoming mail. Have the outgoing mailer *always* accept mail from internal or authenticated hosts, queue it, then scan the queue. This way mail is still always scanned, but if SpamAssassin causes delays your users probably won't notice.

Of course, the ideal solution is for SpamAssassin never to cause delays. Unfortunately this isn't realistic, so the next best solution is to have mail keep working even when SpamAssassin isn't.

-Kevin
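Kevin's rule lives in his Perl mimedefang-filter; purely as an illustration of the decision itself, here is a Python sketch. The function name should_scan_with_spamassassin and the example internal networks are hypothetical, and how the relay IP and SMTP AUTH status are obtained depends on the MTA and filter and is not shown.

```python
import ipaddress
from typing import Iterable


def should_scan_with_spamassassin(
    relay_ip: str,
    smtp_authenticated: bool,
    internal_networks: Iterable[str],
) -> bool:
    """Return False for 'trusted' mail that can skip SpamAssassin.

    Trusted here means the client authenticated via SMTP AUTH or connected
    from one of the internal networks; everything else is scanned.
    """
    if smtp_authenticated:
        return False
    addr = ipaddress.ip_address(relay_ip)
    for net in internal_networks:
        # Membership is simply False when IP versions differ (IPv4 vs IPv6).
        if addr in ipaddress.ip_network(net):
            return False
    return True
```

Mail that fails both tests is scanned as usual. The split-machine setup Kevin describes would instead accept and queue trusted mail first and scan it afterwards, so scanning delays never reach the MUA.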