interesting problem with SQL backend

2005-03-24 Thread alan premselaar
Today I had an interesting situation.
This is more of an FYI in case anyone else has run into similar 
problems. (cross-posted to MIMEDefang list as well)

I use SpamAssassin with MIMEDefang.
I got notified by one of my users that they were unable to send mail 
suddenly.  after checking the logs I determined that MIMEDefang was 
timing out and returning errors.  the cause for this was very unclear 
(which is why i'm sharing my findings with all of you)...

After digging around (and some assistance from David Skoll on the 
MIMEDefang list) I was able to determine that the problem was caused by 
SpamAssassin not being able to connect to the database server where the 
bayes database is stored. (using MySQL on a remote host)

This caused all sorts of weirdness for no apparent reason and 
was initially very confusing to diagnose.

The symptoms were:
* mimedefang started to return busy timeout errors.
* When restarting MIMEDefang (with embedded Perl enabled), the 
multiplexor wouldn't finish loading and mimedefang wouldn't create the 
socket, causing sendmail to report "file 
/path/to/mimedefang/socket/file unsafe" errors.
* Turning off embedded Perl allowed mimedefang to start and create 
the socket, but it then spawned multiple instances of mimedefang.pl, 
which just hung.
* mimedefang.pl -test and/or mimedefang.pl -features would hang 
indefinitely with no output.

The workaround:
  After determining that the problem was the connection to the SQL server, 
simply setting use_bayes 0 in sa-mimedefang.cf and restarting 
mimedefang resolved the problem. However, this obviously means the 
Bayes facilities go unused.

the questions:
 I understand that the SQL code for SA is still 'experimental'.  is 
there any way currently to set a forced timeout to connect to the SQL 
server?

is this something I should open a BZ ticket about?
Since I'm definitely not an SQL guru, does anyone have any 
suggestions for a high-availability MySQL configuration that could 
fail over to a backup server should the primary one become 
incapacitated by a low-level hard drive failure?

Currently I have 1 MySQL database server with the bayes databases on it 
(among other databases) and my primary and secondary mail servers both 
make connections to it to check the bayes database.

This may be somewhat specific to the MIMEDefang implementation, but I 
suspect that this type of behavior could have a negative impact in 
other types of SA implementations as well.
Again, this is mostly an FYI, but any suggestions are welcome.

Thanks,
Alan


Re: interesting problem with SQL backend

2005-03-24 Thread Michael Parker
On Fri, Mar 25, 2005 at 01:43:49AM +0900, alan premselaar wrote:
> I understand that the SQL code for SA is still 'experimental'.  is
> there any way currently to set a forced timeout to connect to the SQL
> server?

I would certainly call it less experimental these days, as it's turned
out to be quite stable.  I'm very confused by your message and really
want to see some more debug.  If the BayesSQL code is unable to
connect to your database then it should just return and move on.
There is nothing that would cause a delay.  At least at the SA code
level.  I suppose it is possible that the underlying DBD::mysql module
could be hanging trying to connect to a misbehaving server.
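
One way to rule the network in or out before digging into SA itself is a plain TCP connect with an explicit timeout. The following is only a sketch in Python, not anything SA or DBD::mysql does internally; the function name `db_reachable` and the 5-second default are assumptions, and 3306 is simply the usual MySQL port. It checks only that the TCP connection opens, so a server that accepts the connection and then stalls during the MySQL handshake would still pass.

```python
import socket

def db_reachable(host, port=3306, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection raises OSError (including timeouts) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run from the mail server against the database host, a False result (or a result that takes the full timeout to come back) would point at the network or the server rather than at the SA code.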

Can you recreate this hang at will?  If so, can you please run SA in
debug mode and send the output so we can review it?  It would help
especially if you can remove mimedefang from the picture, because it's
not clear that its use of SA wasn't the cause of the problem.

I've run thousands of tests, and literally billions of queries through
the SA BayesSQL code and never seen an error such as this.

 
> is this something I should open a BZ ticket about?
 

No, unless we can find something in the debug output that indicates a
failure on the SA code's part.

Michael




Re: interesting problem with SQL backend

2005-03-24 Thread JamesDR
Check to make sure you have PTR records in your DNS for the SA computer. 
If not, MySQL has to fall down the name resolution chain to get the name 
of the computer that's connecting. I've seen delays from 20 seconds to a 
minute while MySQL tries to determine the host name during the auth 
stage of connecting.
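
A quick way to see whether reverse lookups are slow is to time one. This is a rough Python sketch, with `time_reverse_lookup` being a made-up helper name; the `resolver` parameter exists only so a different lookup function can be swapped in, and it defaults to the standard `socket.gethostbyaddr`.

```python
import socket
import time

def time_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (hostname or None, seconds taken) for a reverse lookup of `ip`."""
    start = time.monotonic()
    try:
        hostname = resolver(ip)[0]   # first element is the primary host name
    except OSError:                  # herror/gaierror: no PTR record or lookup failure
        hostname = None
    return hostname, time.monotonic() - start
```

Running it on the database host with the IP address of the SA machine should show roughly the delay MySQL would see; a result of None after many seconds would be consistent with a missing PTR record.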

Thanks,
JamesDR



Re: interesting problem with SQL backend

2005-03-24 Thread Kevin Sullivan
--On 03/25/05 01:43:49 +0900 alan premselaar wrote:
> I got notified by one of my users that they were unable to send mail
> suddenly.  after checking the logs I determined that MIMEDefang was
> timing out and returning errors.  the cause for this was very unclear
> (which is why i'm sharing my findings with all of you)...
>
> After digging around (and some assistance from David Skoll on the
> MIMEDefang list) I was able to determine that the problem was caused by
> SpamAssassin not being able to connect to the database server where the
> bayes database is stored. (using MySQL on a remote host)

I've modified my mimedefang-filter file so that mail from internal IP 
addresses and mail sent via SMTP-AUTH (trusted mail) is not sent through 
SpamAssassin.

In general, SMTP is very resistant to delays (it just stores the message 
and tries again later).  The sole exception is the connection from the MUA 
to the initial MTA; the MUA can't hang around forever retrying so the user 
gets an error message.  The more subsystems we add to our mail servers (spam 
and virus checking, for example) the more random delays we'll get.  If 
you bypass the checking for mail sent from trusted machines then you 
reduce the chance of delays.
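
The bypass decision itself is simple. A real mimedefang-filter is written in Perl, so the following Python is only an illustration of the logic, not the actual filter; `should_scan` and the networks in `INTERNAL_NETS` are made-up placeholders.

```python
import ipaddress

# Hypothetical internal ranges; substitute your own networks.
INTERNAL_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def should_scan(relay_ip, smtp_authenticated):
    """Return False for trusted mail (SMTP-AUTH or internal relay), True otherwise."""
    if smtp_authenticated:
        return False
    addr = ipaddress.ip_address(relay_ip)
    # Scan only when the relay is outside every internal network
    return not any(addr in net for net in INTERNAL_NETS)
```

With these placeholder ranges, `should_scan("10.1.2.3", False)` is False and `should_scan("203.0.113.5", False)` is True.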

Now, in my case, I can trust my users not to send spam and viruses.  If you 
can't, then you can set up a different machine for outgoing mail than for 
incoming mail.  Have the outgoing mailer *always* accept mail from internal 
or authenticated hosts, queue it, then scan the queue.  This way mail is 
still always scanned but if you have SpamAssassin-caused delays your users 
probably won't notice.
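
As a toy illustration of "accept first, scan later" (Python, in-memory only, and not how any real MTA queue works): `accept`, `drain`, and the `scan`/`deliver` callables are all hypothetical names, and `scan` is assumed to return True for a clean message.

```python
import queue

outgoing = queue.Queue()

def accept(message):
    """Accept mail from a trusted client immediately; scanning happens later."""
    outgoing.put(message)
    return "250 OK"

def drain(scan, deliver):
    """Scan and deliver everything currently queued; return the number handled."""
    handled = 0
    while True:
        try:
            message = outgoing.get_nowait()
        except queue.Empty:
            return handled
        if scan(message):      # assumed: scan() returns True when the message is clean
            deliver(message)
        handled += 1
```

The point is just that the client gets its answer from `accept` before any scanning starts, so a slow scanner delays delivery rather than submission.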

Of course, the ideal solution is for SpamAssassin to never cause delays. 
Unfortunately this isn't realistic, so the next best solution is to have 
mail continue to work even when SpamAssassin isn't working.
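
One way to get that "keep working" behavior is to put a hard time limit around the scan and let the message through if the limit is hit. This is a sketch in Python under stated assumptions, not something MIMEDefang or SpamAssassin provides in this form: `scan_or_skip`, the pool size, and the 30-second default are all invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as ScanTimeout

# Shared worker pool; a hung scan keeps occupying one worker until it returns.
_pool = ThreadPoolExecutor(max_workers=4)

def scan_or_skip(scan, message, timeout=30.0):
    """Return scan(message), or None if the scan fails or exceeds `timeout` seconds."""
    future = _pool.submit(scan, message)
    try:
        return future.result(timeout=timeout)
    except ScanTimeout:
        return None   # scanner is hung: let the mail through unscanned
    except Exception:
        return None   # scanner raised an error: also fail open
```

A None result means "no verdict", which the caller can treat as "deliver without a spam score". Note that the timed-out thread is not killed, so enough hung scans would still exhaust the pool.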

	-Kevin

