Problem synchronizing database of two spamassassins

2006-11-07 Thread Angel L. Mateo
Hello,

We have two incoming email servers for our organization. We are running
spamassassin on these servers (debian sarge + postfix 2.1.5 +
spamassassin 3.1.0a). To synchronize spamassassin's database and journal,
we copy /var/lib/amavis/.spamassassin from one server (let's call it
the master server) to the other (the slave server) and then run
sa-learn --sync there. We also do all the learn operations on the master
server.

With this I expected the two servers to behave the same way,
but I am observing that they score the same messages differently. For
example, for one message the master server returns the following for the
command spamc -d master:

X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on
    xenon1.telemat.um.es
X-Spam-Level: ***
X-Spam-Status: No, score=3.1 required=5.0
    tests=BAYES_60,EXTRA_MPART_TYPE,
    HTML_00_10,HTML_MESSAGE,HTML_TAG_BALANCE_BODY,UPPERCASE_25_50
    autolearn=disabled version=3.1.0

and the slave:

X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on
    xenon2.telemat.um.es
X-Spam-Level: *
X-Spam-Status: Yes, score=5.1 required=5.0
    tests=BAYES_80,EXTRA_MPART_TYPE,
    HTML_00_10,HTML_MESSAGE,HTML_TAG_BALANCE_BODY,UPPERCASE_25_50
    autolearn=disabled version=3.1.0
X-Spam-Report:
    *  1.1 EXTRA_MPART_TYPE Header has extraneous Content-type:...type= entry
    *  0.2 HTML_TAG_BALANCE_BODY BODY: HTML has unbalanced body tags
    *  3.0 BAYES_80 BODY: Bayesian spam probability is 80 to 95%
    *      [score: 0.9259]
    *  0.8 HTML_00_10 BODY: Message is 0% to 10% HTML
    *  0.0 HTML_MESSAGE BODY: HTML included in message
    *  0.0 UPPERCASE_25_50 message body is 25-50% uppercase

so one of them classified it as spam and the other not. The only
difference I've found is that the master hit the BAYES_60 and the slave
the BAYES_80.
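
As a rough check of that observation, the rule scores listed in the slave's X-Spam-Report can be added up and compared with the master's total. The sketch below assumes that both servers assign the same scores to the non-Bayes rules (the master's report is not shown, so this is an assumption); the BAYES_60 value it prints is only inferred from the two totals, not read from any configuration.

```shell
#!/bin/sh
# Rule scores copied from the slave's X-Spam-Report; BAYES_80 is kept separate.
# Order: EXTRA_MPART_TYPE, HTML_TAG_BALANCE_BODY, HTML_00_10, HTML_MESSAGE, UPPERCASE_25_50
other_rules="1.1 0.2 0.8 0.0 0.0"
bayes_80="3.0"
master_total="3.1"   # from the master's X-Spam-Status header

# Add up whitespace-separated numbers and print the sum with one decimal.
sum() {
    echo "$@" | LC_ALL=C awk '{ s = 0; for (i = 1; i <= NF; i++) s += $i; printf "%.1f\n", s }'
}

base=$(sum $other_rules)
slave_total=$(sum "$base" "$bayes_80")
# Assumption: the master uses the same scores for the non-Bayes rules.
implied_bayes_60=$(echo "$master_total $base" | LC_ALL=C awk '{ printf "%.1f\n", $1 - $2 }')

echo "non-Bayes rules:        $base"
echo "slave total (BAYES_80): $slave_total"
echo "implied BAYES_60 score: $implied_bayes_60"
```

Under that assumption the non-Bayes rules contribute 2.1 on both machines, so the whole 2.0 point gap between 3.1 and 5.1 would come from the Bayes rule alone.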

Why are the scores different? Am I synchronizing my servers the right way?

Thanks in advance.

-- 
Angel L. Mateo Martínez
Sección de Telemática
Área de Tecnologías de la Información   _o)
y las Comunicaciones Aplicadas (ATICA)  / \\
http://www.um.es/atica                _(___V
Tfo: 968367590
Fax: 968398337




Re: Problem synchronizing database of two spamassassins

2006-11-07 Thread John Andersen
On Tuesday 07 November 2006 00:33, Angel L. Mateo wrote:
> so one of them classified it as spam and the other not. The only
> difference I've found is that the master hit the BAYES_60 and the slave
> the BAYES_80.
>
> Why this different score? am I synchronizing my servers the right
> way?

So then, you answered your own question.  ;-)

More seriously, are you also copying the bayes database from one
to the other?  

Are you running one site-wide bayes, or individual Bayes databases
in user accounts?

Were the files synced BEFORE or AFTER the test message was
scored by the first server?

-- 
John Andersen




Re: Problem synchronizing database of two spamassassins

2006-11-07 Thread Angel L. Mateo
On Tue, 07-11-2006 at 00:58 -0900, John Andersen wrote:
> On Tuesday 07 November 2006 00:33, Angel L. Mateo wrote:
> > so one of them classified it as spam and the other not. The only
> > difference I've found is that the master hit the BAYES_60 and the slave
> > the BAYES_80.
> >
> > Why this different score? am I synchronizing my servers the right
> > way?
>
> So then, you answered your own question.  ;-)

I guess I am doing something wrong, but I don't know what it is, nor what
the correct way to synchronize them is.

> More seriously, are you also copying the bayes database from one
> to the other?

Yes, I am copying all the files in /var/lib/amavis/.spamassassin. The
files copied are:

* bayes_journal
* bayes_seen
* bayes_toks
* user_prefs

> Are you running one site-wide bayes, or individual Bayes databases
> in user accounts?

I am running site-wide bayes, not individual bayes databases.

> Were the files synced BEFORE or AFTER the test message was
> scored by the first server?

The files on both servers were synced before I ran this test, so both
servers are supposed to be using the same bayes database.
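
One way to test whether both servers really hold the same Bayes data is to compare the counters that sa-learn --dump magic prints on each machine. The sketch below is only an illustration: the helper names are made up, and it assumes the usual dump layout in which each line ends in "non-token data: <name>" with the value in the third column. Timestamp lines (the various atime entries) are ignored because they can be expected to differ once sa-learn --sync has run on the slave.

```shell
#!/bin/sh
# Hypothetical helpers for comparing two saved "sa-learn --dump magic" listings.

# Print only the message and token counters, as name=value lines.
# Assumes the value is the third whitespace-separated column of each line.
bayes_counters() {
    awk '/non-token data: (nspam|nham|ntokens)$/ { print $NF "=" $3 }' "$1"
}

# Succeed (exit status 0) only if both listings report the same counters.
same_bayes_counters() {
    [ "$(bayes_counters "$1")" = "$(bayes_counters "$2")" ]
}
```

To use it, run sa-learn --dump magic on each server as the user that owns the database, save the output to a file, bring both files to one machine, and call same_bayes_counters on the pair. Matching counters do not prove that the token data is identical, but differing counters would show that the two databases have diverged.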





Re: Problem synchronizing database of two spamassassins

2006-11-07 Thread Johann Spies
On Tue, Nov 07, 2006 at 11:22:31AM +0100, Angel L. Mateo wrote:
> I am running site-wide bayes, not individual bayes databases.

I am also interested in the answer to your question.  Do you stop spamd
when copying the files or restart it after you have done so?

We have three mail servers and they started out with the same Bayesian
database, and we use the same feedback to feed sa-learn on all three of
them.  Other than that I do not sync them. I also see differences in the
scores from the different machines on the same message.

Would it be possible to rsync the databases while spamd is running?

Regards
Johann
-- 
Johann Spies  Telefoon: 021-808 4036
Informasietegnologie, Universiteit van Stellenbosch

 Jesus said unto her, I am the resurrection, and  
  the life; he that believeth in me, though he were 
  dead, yet shall he live.  John 11:25 


Re: Problem synchronizing database of two spamassassins

2006-11-07 Thread Angel L. Mateo
On Tue, 07-11-2006 at 14:28 +0200, Johann Spies wrote:
> On Tue, Nov 07, 2006 at 11:22:31AM +0100, Angel L. Mateo wrote:
> > I am running site-wide bayes, not individual bayes databases.
>
> I am also interested in the answer to your question.  Do you stop spamd
> when copying the files or restart it after you have done so?

I copy the files while spamd is running and restart it after the copy.
I also run sa-learn --sync on the slave server.
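
A possible alternative to copying database files that may be in use is to move the Bayes data as a text dump with sa-learn --backup and sa-learn --restore. This is only a sketch of that idea and not something proposed in the thread: it assumes the sa-learn on both hosts supports these two options, and the host name "slave", the dump path, and the function name are placeholders. The function only prints the commands so that they can be reviewed before anything is run.

```shell
#!/bin/sh
# Sketch: transfer the Bayes data as a text dump instead of copying db files.
# Placeholders (not from the thread): host name and dump location.
SLAVE_HOST="slave"
DUMP="/tmp/bayes-backup.txt"

# Print the planned commands, one per line, without executing them.
plan_bayes_transfer() {
    cat <<EOF
sa-learn --sync
sa-learn --backup > $DUMP
scp $DUMP $SLAVE_HOST:$DUMP
ssh $SLAVE_HOST sa-learn --restore $DUMP
EOF
}

plan_bayes_transfer
```

The first two commands would run on the master, and the last one loads the dump on the slave, which should replace the Bayes data already there. As with the file copy, each sa-learn call would need to run as the user that owns the database.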





Re: Problem synchronizing database of two spamassassins

2006-11-07 Thread Mike Kenny
> I copy the files while spamd is running and restart it after the copy.
> I run also sa-learn --sync in the slave server.

Do you run sa-learn --sync on the master?

I ask because I was under the impression that this just synchronizes
the journal with the database. As you have copied everything across to
the slave from the master, it should be in an identical state until
you run the sync, at which stage the DBs are slightly out of sync. I
am not sure, but I suspect that the problem may lie in this area.

mike


Re: Problem synchronizing database of two spamassassins

2006-11-07 Thread Angel L. Mateo
On Tue, 07-11-2006 at 15:37 +0200, Mike Kenny wrote:
> > I copy the files while spamd is running and restart it
> > after the copy.
> > I run also sa-learn --sync in the slave server.
>
> Do you run sa-learn --sync on the master?

On the master and on the slave. I run:

* sa-learn --ham --nosync --showdots ... (master)
* sa-learn --spam --nosync --showdots ... (master)
* sa-learn --sync (master)
* copy files from master to slave
* sa-learn --sync (slave)
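
The five steps above could be scripted roughly as follows. This is a sketch under stated assumptions, not the script actually used: the thread does not say how the files are copied, so rsync over ssh is an assumption, and the host name "slave" and the corpus directories are placeholders. DRY_RUN is set to 1 so that the commands are only printed.

```shell
#!/bin/sh
# Sketch of the learn-and-copy cycle from the list above.
BAYES_DIR="/var/lib/amavis/.spamassassin"   # directory named in the thread
SLAVE_HOST="slave"                          # placeholder host name
HAM_DIR="/path/to/ham"                      # placeholder corpus location
SPAM_DIR="/path/to/spam"                    # placeholder corpus location
DRY_RUN=1                                   # set to 0 to really execute

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run the steps in order and stop at the first one that fails.
sync_bayes() {
    run sa-learn --ham --nosync --showdots "$HAM_DIR" &&
    run sa-learn --spam --nosync --showdots "$SPAM_DIR" &&
    run sa-learn --sync &&
    run rsync -a "$BAYES_DIR/" "$SLAVE_HOST:$BAYES_DIR/" &&
    run ssh "$SLAVE_HOST" sa-learn --sync
}
```

Calling sync_bayes with DRY_RUN=1 prints the five commands in the order of the list. For a real run the sa-learn commands would have to execute as the user that owns the database on each host.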

> I ask because I was under the impression that this just synchronized
> the journal with the database. As you have copied everything across to
> the slave from the master, it should be in an identical state, until
> you run the sync, at which stage the DBs are slightly out of sync. I
> am not sure but suspect that the problem may lie in this area.