Problem synchronizing database of two spamassassins
Hello,

We have two incoming email servers for our organization. Both run spamassassin (debian sarge + postfix 2.1.5 + spamassassin 3.1.0a). To synchronize spamassassin's database and journal, we copy /var/lib/amavis/.spamassassin from one server (let's call it the master) to the other (the slave), and then run sa-learn --sync on the slave. All learn operations are done on the master.

With this setup I expected the two servers to behave the same way, but I am observing that they score the same messages differently. For example, for one message the master server returns, for the command spamc -d master:

X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on xenon1.telemat.um.es
X-Spam-Level: ***
X-Spam-Status: No, score=3.1 required=5.0 tests=BAYES_60,EXTRA_MPART_TYPE,
	HTML_00_10,HTML_MESSAGE,HTML_TAG_BALANCE_BODY,UPPERCASE_25_50
	autolearn=disabled version=3.1.0

and the slave:

X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on xenon2.telemat.um.es
X-Spam-Level: *
X-Spam-Status: Yes, score=5.1 required=5.0 tests=BAYES_80,EXTRA_MPART_TYPE,
	HTML_00_10,HTML_MESSAGE,HTML_TAG_BALANCE_BODY,UPPERCASE_25_50
	autolearn=disabled version=3.1.0
X-Spam-Report:
	*  1.1 EXTRA_MPART_TYPE Header has extraneous Content-type:...type= entry
	*  0.2 HTML_TAG_BALANCE_BODY BODY: HTML has unbalanced body tags
	*  3.0 BAYES_80 BODY: Bayesian spam probability is 80 to 95%
	*      [score: 0.9259]
	*  0.8 HTML_00_10 BODY: Message is 0% to 10% HTML
	*  0.0 HTML_MESSAGE BODY: HTML included in message
	*  0.0 UPPERCASE_25_50 message body is 25-50% uppercase

so one of them classified it as spam and the other not. The only difference I've found is that the master hit the BAYES_60 and the slave the BAYES_80.

Why this different score? Am I synchronizing my servers the right way?

Thanks in advance.

-- 
Angel L. Mateo Martínez
Sección de Telemática
Área de Tecnologías de la Información y las Comunicaciones Aplicadas (ATICA)
http://www.um.es/atica
Tel: 968367590
Fax: 968398337
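
Editor's sketch: the comparison Angel did by eye can be automated. The Python below is only an illustration, not part of the original post. It parses the two X-Spam-Status values quoted above and reports which tests fired on only one server. The helper name parse_status is made up for this example, and the parsing assumes the "tests=... autolearn=..." layout shown in these particular headers.

```python
import re

def parse_status(header):
    """Return (score, set of test names) from an X-Spam-Status value.

    Assumes the layout seen in the thread: 'score=N ... tests=A,B,C autolearn=...'.
    """
    score = float(re.search(r"score=(-?\d+(?:\.\d+)?)", header).group(1))
    tests_field = re.search(r"tests=(.*?)\s+autolearn=", header, re.S).group(1)
    # Test names are comma separated; folded header lines add whitespace.
    tests = {t for t in re.split(r"[,\s]+", tests_field) if t}
    return score, tests

# Header values copied from the message above.
master = ("No, score=3.1 required=5.0 tests=BAYES_60,EXTRA_MPART_TYPE, "
          "HTML_00_10,HTML_MESSAGE,HTML_TAG_BALANCE_BODY,UPPERCASE_25_50 "
          "autolearn=disabled version=3.1.0")
slave = ("Yes, score=5.1 required=5.0 tests=BAYES_80,EXTRA_MPART_TYPE, "
         "HTML_00_10,HTML_MESSAGE,HTML_TAG_BALANCE_BODY,UPPERCASE_25_50 "
         "autolearn=disabled version=3.1.0")

m_score, m_tests = parse_status(master)
s_score, s_tests = parse_status(slave)

only_master = sorted(m_tests - s_tests)
only_slave = sorted(s_tests - m_tests)
score_gap = round(s_score - m_score, 1)

print("only on master:", only_master)   # ['BAYES_60']
print("only on slave: ", only_slave)    # ['BAYES_80']
print("score gap:     ", score_gap)     # 2.0
```

Every non-Bayes test is shared, so the whole 2.0-point gap comes from the Bayes rule that fired, which points at the Bayes data rather than at the rule set.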
Re: Problem synchronizing database of two spamassassins
On Tuesday 07 November 2006 00:33, Angel L. Mateo wrote:
> so one of them classified it as spam and the other not. The only difference I've found is that the master hit the BAYES_60 and the slave the BAYES_80.
>
> Why this different score? Am I synchronizing my servers the right way?

So then, you answered your own question. ;-)

More seriously, are you also copying the bayes database from one to the other?

Are you running one site-wide bayes, or individual bayes databases in user accounts?

Were the files synced BEFORE or AFTER the test message was scored by the first server?

-- 
John Andersen
Re: Problem synchronizing database of two spamassassins
On Tue, 07-11-2006 at 00:58 -0900, John Andersen wrote:
> On Tuesday 07 November 2006 00:33, Angel L. Mateo wrote:
> > so one of them classified it as spam and the other not. The only difference I've found is that the master hit the BAYES_60 and the slave the BAYES_80.
> >
> > Why this different score? Am I synchronizing my servers the right way?
>
> So then, you answered your own question. ;-)

I guess I am doing something wrong, but I don't know what it is, nor what the correct way to synchronize them would be.

> More seriously, are you also copying the bayes database from one to the other?

Yes, I am copying all the files in /var/lib/amavis/.spamassassin. The files copied are:

* bayes_journal
* bayes_seen
* bayes_toks
* user_prefs

> Are you running one site-wide bayes, or individual bayes databases in user accounts?

I am running site-wide bayes, not individual bayes databases.

> Were the files synced BEFORE or AFTER the test message was scored by the first server?

The files on both servers were synced before I ran this test, so both servers are supposed to be using the same bayes database.

-- 
Angel L. Mateo Martínez
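
Editor's sketch: one way to test the claim that both servers hold the same files is to checksum the four files listed above on each side, right after the copy and before anything else touches them. The Python below is a minimal illustration, not something from the thread. It assumes both copies of the .spamassassin directory are reachable as local paths (for example, the slave's copy fetched into a scratch directory), and the function names digest and differing_files are invented for this example.

```python
import hashlib
from pathlib import Path

# File names taken from the list in the message above.
BAYES_FILES = ("bayes_journal", "bayes_seen", "bayes_toks", "user_prefs")

def digest(path):
    """SHA-256 hex digest of a file, or None if the file does not exist."""
    path = Path(path)
    if not path.is_file():
        return None
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so a large bayes_toks is not loaded into memory at once.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def differing_files(master_dir, slave_dir, names=BAYES_FILES):
    """Names whose contents differ, or that exist on only one side."""
    return [
        name for name in names
        if digest(Path(master_dir) / name) != digest(Path(slave_dir) / name)
    ]
```

Calling differing_files("/var/lib/amavis/.spamassassin", "/tmp/slave-copy") would return an empty list when the copies are byte-identical; the second path is a placeholder. A file that is missing on both sides (bayes_journal may not exist at a given moment) counts as equal. A mismatch right after the copy would suggest that a file changed on the master while it was being copied.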
Re: Problem synchronizing database of two spamassassins
On Tue, Nov 07, 2006 at 11:22:31AM +0100, Angel L. Mateo wrote:
> I am running site-wide bayes, not individual bayes databases.

I am also interested in the answer to your question. Do you stop spamd when copying the files, or restart it after you have done so?

We have three mail servers, and they started out with the same Bayesian database. We use the same feedback to feed sa-learn on all three of them. Other than that I do not sync them. I also see differences in the scores from the different machines on the same message.

Would it be possible to rsync the databases while spamd is running?

Regards
Johann
-- 
Johann Spies          Telephone: 021-808 4036
Informasietegnologie, Universiteit van Stellenbosch

"Jesus said unto her, I am the resurrection, and the life; he that believeth in me, though he were dead, yet shall he live." John 11:25
Re: Problem synchronizing database of two spamassassins
On Tue, 07-11-2006 at 14:28 +0200, Johann Spies wrote:
> On Tue, Nov 07, 2006 at 11:22:31AM +0100, Angel L. Mateo wrote:
> > I am running site-wide bayes, not individual bayes databases.
>
> I am also interested in the answer to your question. Do you stop spamd when copying the files, or restart it after you have done so?

I copy the files while spamd is running and restart it after the copy. I also run sa-learn --sync on the slave server.

-- 
Angel L. Mateo Martínez
Re: Problem synchronizing database of two spamassassins
> I copy the files while spamd is running and restart it after the copy. I also run sa-learn --sync on the slave server.

Do you run sa-learn --sync on the master?

I ask because I was under the impression that this just synchronizes the journal with the database. As you have copied everything across to the slave from the master, it should be in an identical state until you run the sync, at which stage the DBs are slightly out of sync. I am not sure, but I suspect that the problem may lie in this area.

mike
Re: Problem synchronizing database of two spamassassins
On Tue, 07-11-2006 at 15:37 +0200, Mike Kenny wrote:
> > I copy the files while spamd is running and restart it after the copy. I also run sa-learn --sync on the slave server.
>
> Do you run sa-learn --sync on the master?

On both the master and the slave. I run:

* sa-learn --ham --nosync --showdots ... (master)
* sa-learn --spam --nosync --showdots ... (master)
* sa-learn --sync (master)
* copy files from master to slave
* sa-learn --sync (slave)

> I ask because I was under the impression that this just synchronizes the journal with the database. As you have copied everything across to the slave from the master, it should be in an identical state until you run the sync, at which stage the DBs are slightly out of sync. I am not sure, but I suspect that the problem may lie in this area.

-- 
Angel L. Mateo Martínez
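
Editor's sketch: the five steps listed above can be written down as an ordered plan. The Python below is a hedged illustration, not Angel's actual script. The sa-learn options are the ones quoted in the thread. The thread elides the arguments to the learn commands and does not say how the copy is done, so ham_dir, spam_dir and slave_host are hypothetical parameters, and rsync over ssh (rsync is the tool Johann asks about) is an assumption. The function names sync_plan and run_plan are invented for this example.

```python
import shlex
import subprocess

DB_DIR = "/var/lib/amavis/.spamassassin"  # directory named in the thread

def sync_plan(ham_dir, spam_dir, slave_host, db_dir=DB_DIR):
    """Return the steps from the thread, in order, as argv lists.

    ham_dir, spam_dir and slave_host are placeholders; the copy and
    remote-execution mechanisms (rsync, ssh) are assumptions.
    """
    return [
        ["sa-learn", "--ham", "--nosync", "--showdots", ham_dir],    # master
        ["sa-learn", "--spam", "--nosync", "--showdots", spam_dir],  # master
        ["sa-learn", "--sync"],                                      # master
        ["rsync", "-a", f"{db_dir}/", f"{slave_host}:{db_dir}/"],    # copy (assumed tool)
        ["ssh", slave_host, "sa-learn", "--sync"],                   # slave
    ]

def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        if dry_run:
            print(shlex.join(argv))
        else:
            # check=True stops at the first failing step.
            subprocess.run(argv, check=True)

# Dry run with placeholder arguments: prints the five commands, runs nothing.
run_plan(sync_plan("/path/to/ham", "/path/to/spam", "slave.example.org"))
```

Writing the plan out makes the order explicit: by the time the files are copied, the master has already merged its journal, so the final sa-learn --sync on the slave is the one step that modifies the slave's copy after it arrives, which is the step Mike is asking about. The spamd restart that Angel mentions is left out because the thread does not say how it is done.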