Re: why does SA without autolearn need bayes read-write?
On Wed, 28 Jan 2015 15:58:56 +0100 Reindl Harald wrote:
> * first: it is a bug to write/lock when auto_expire / auto_learn is off

As I said, it's not a bug. The updates are done in case you want to expire later with sa-learn --force-expire.

Auto-expiry means performing the expiry automatically when the database goes over its configured token limit. Most people don't do this because the expiry is then done during a classification, which can cause a timeout. Setting auto_expire 0 is not a way of telling SA that you aren't going to expire the database.

On Wed, 28 Jan 2015 01:03:37 +0100 Reindl Harald wrote:
> ... even if we decide to kill spam samples older than x months it needs to be done properly to the 50% spam / 50% ham ratio which is the reason the bayes works that good

The ratio doesn't matter; it's a myth that it should be 50:50 or match the ratio in your mail. What's important is that you learn enough ham and enough spam, and that the training is correct and sufficiently representative.

It is preferable that there isn't a big mismatch between the ham/spam ratio in the corpus as a whole and in recently added mail, as that can skew the probabilities of new tokens.

> compared with autolearning setups where everyone i have seen in the past 8 years became worse each month until classify most ham as spam and let through the real crap

It works for some, but when it fails it's not because the ratio of spam to ham is wrong; it's because of a combination of mistraining, inadequate ham and poor choices in what's learned.
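To make the point about the ratio concrete, here is a minimal sketch of a per-token spam probability that normalizes each count by the number of messages learned in that class. This is a simplified textbook-style formula, not SpamAssassin's actual computation, and the function name and all counts are made up for illustration.

```python
def token_spam_probability(spam_hits, ham_hits, n_spam, n_ham):
    """Simplified per-token spam probability (assumption: NOT SA's exact formula).

    Each hit count is divided by the number of messages learned in its class,
    so the overall spam:ham ratio of the corpus cancels out.
    """
    spam_freq = spam_hits / n_spam
    ham_freq = ham_hits / n_ham
    return spam_freq / (spam_freq + ham_freq)


# Same token, seen in 10% of spam and 1% of ham, in two hypothetical corpora.
balanced = token_spam_probability(100, 10, n_spam=1000, n_ham=1000)  # 50:50 corpus
skewed = token_spam_probability(900, 10, n_spam=9000, n_ham=1000)    # 90:10 corpus

# A brand-new token seen 5 times in recent spam and 5 times in recent ham,
# where recent mail is roughly 50:50 but the corpus as a whole is 90:10.
new_token = token_spam_probability(5, 5, n_spam=9000, n_ham=1000)

print(balanced, skewed, new_token)
```

Under these assumptions the established token gets the same probability in both corpora, while the new token looks much more ham-like than its even split in recent mail would suggest, which is the mismatch effect described above.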
Re: why does SA without autolearn need bayes read-write?
On 28.01.15 01:03, Reindl Harald wrote:
> if understand you correctly we agree that there is no reason /var can't be mounted read-only?

I do not agree. The whole point of /var is to contain varying data, and mounting it read-only defeats the whole purpose of /var.

I see the following possibilities for you:

- move BAYES to a database of any kind
- set up SA to learn to journal, and use overlayfs for the journal (remember to set bayes_journal_max_size big enough), dropping it or syncing periodically

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Re: why does SA without autolearn need bayes read-write?
On 27.01.15 18:49, Reindl Harald wrote:
> the intention of this *global bayes* is *not* to learn or expire anything - the implemented remove from bayes method is just remove the message from the corpus folder and type sa-learn.sh rebuild

I believe it's much more effective to expire old tokens that are not appearing in mail than to purge old mail from the DB, when you don't know if the tokens are still used or not.

I'm afraid you got the expire issue wrong...

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
duplicate key value violates unique constraint bayes_seen_pkey
Hi,

I am using SA on Debian 7 with all Debian standard packages as well as a PostgreSQL database to store all Bayes data. From time to time I see the following error in the PostgreSQL log file:

2015-01-29 11:56:26 CET ERROR: duplicate key value violates unique constraint bayes_seen_pkey
2015-01-29 11:56:26 CET DETAIL: Key (id, msgid)=(1, 3698286ebfb6cf3d3a265947504be472593011dc@sa_generated) already exists.
2015-01-29 11:56:26 CET STATEMENT: INSERT INTO bayes_seen (id, msgid, flag) VALUES ($1,$2,$3)

Is this normal behavior? Or is this a sign that something might be wrong in my SA config?

Regards
ML
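Taken literally, the log says that an INSERT arrived for an (id, msgid) pair that already exists in bayes_seen. A minimal sketch reproducing that class of error follows. It uses an in-memory SQLite database in place of PostgreSQL, and the table definition, the msgid value and the flag value are assumptions for illustration, not the real SpamAssassin schema. It shows what the database is reporting, not why a second insert was issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the bayes_seen table; only the composite
# primary key on (id, msgid) is taken from the error message above.
conn.execute(
    "CREATE TABLE bayes_seen ("
    " id INTEGER NOT NULL,"
    " msgid TEXT NOT NULL,"
    " flag TEXT NOT NULL,"
    " PRIMARY KEY (id, msgid))"
)


def record_seen(conn, row):
    """Try to insert one row; report whether the key was already present."""
    try:
        conn.execute(
            "INSERT INTO bayes_seen (id, msgid, flag) VALUES (?, ?, ?)", row
        )
        return "inserted"
    except sqlite3.IntegrityError:
        # SQLite's counterpart to PostgreSQL's "duplicate key value
        # violates unique constraint" error.
        return "duplicate"


row = (1, "example@sa_generated", "h")  # made-up msgid and flag
first = record_seen(conn, row)
second = record_seen(conn, row)
print(first, second)
```

The second call fails only because the first one already stored the same key; the existing row is left untouched.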
Re: Genuine mail from Hotmail hitting MALFORMED_FREEMAIL
Hello RW,

Sunday, January 25, 2015, 10:55:59 PM, you wrote:

> There's not much that can be done about this other than rescore or
> remove it entirely.

But this rule and MISSING_HEADERS combine to score 3.8 just because the sender put all the recipients in BCC. That seems a touch high.

-- 
Best regards,
Niamh mailto:ni...@fullbore.co.uk
Re: why does SA without autolearn need bayes read-write?
On Thu, 29 Jan 2015, Reindl Harald wrote:
> On 29.01.2015 at 10:18, Matus UHLAR - fantomas wrote:
>> On 28.01.15 01:03, Reindl Harald wrote:
>>> if understand you correctly we agree that there is no reason /var can't be mounted read-only?
>>
>> I do not agree. The whole point of /var is to contain varying data and mounting it read-only defeats the whole purpose of /var.
>
> i am not talking about a own partition i am talking about a *systemd namespace* and the intention *not* have anything below /var writeable for a network facing service

"no reason /var can't be mounted read-only" does *not* suggest that.

-- 
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
Re: why does SA without autolearn need bayes read-write?
On 29.01.2015 at 16:23, John Hardin wrote:
> On Thu, 29 Jan 2015, Reindl Harald wrote:
>> On 29.01.2015 at 10:18, Matus UHLAR - fantomas wrote:
>>> On 28.01.15 01:03, Reindl Harald wrote:
>>>> if understand you correctly we agree that there is no reason /var can't be mounted read-only?
>>>
>>> I do not agree. The whole point of /var is to contain varying data and mounting it read-only defeats the whole purpose of /var.
>>
>> i am not talking about a own partition i am talking about a *systemd namespace* and the intention *not* have anything below /var writeable for a network facing service
>
> "no reason /var can't be mounted read-only" does *not* suggest that

* the initial post makes it pretty clear
* it was even quoted by fantomas first reply on this thread
* i made that clear multiple times

-------- Forwarded Message --------
Subject: Re: why does SA without autolearn need bayes read-write?
Date: Wed, 28 Jan 2015 15:04:26 +0100
From: Reindl Harald h.rei...@thelounge.net
To: users@spamassassin.apache.org

no need for mount own partitions on recent linux systems
that's what namespaces are for and systemd has easy interfaces

-------- Forwarded Message --------
Subject: Re: why does SA without autolearn need bayes read-write?
Date: Tue, 27 Jan 2015 13:44:33 +0100
From: Matus UHLAR - fantomas uh...@fantomas.sk
To: users@spamassassin.apache.org

On 27.01.15 03:01, Reindl Harald wrote:
> with bayes_auto_learn 0 there is no reason to lock the bayes database and the spamd-service should be happy with ReadOnlyDirectories=/var/lib

the bayes database contains not only tokens, but also timestamps used for expiration. That's why you need to write to them.

-------- Forwarded Message --------
Subject: why does SA without autolearn need bayes read-write?
Date: Tue, 27 Jan 2015 03:01:10 +0100
From: Reindl Harald h.rei...@thelounge.net
To: Mailing-List spamassassin users@spamassassin.apache.org

IMHO that is a bug

with bayes_auto_learn 0 there is no reason to lock the bayes database and the spamd-service should be happy with ReadOnlyDirectories=/var/lib

training and sa-update is done on a shell independent of network aware services

Jan 27 02:52:58 testserver spamd[2794]: bayes: cannot write to /var/lib/spamass-milter/.spamassassin/bayes_journal, bayes db update ignored: Read-only file system
Jan 27 02:52:58 testserver spamd[2794]: spamd: clean message (0.5/5.5) for sa-milt:189 in 0.5 seconds, 804 bytes.
Jan 27 02:52:58 testserver spamd[2794]: spamd: result: . 0 - ALL_TRUSTED,BAYES_50,T_RP_MATCHES_RCVD scantime=0.5,size=804,user=sa-milt,uid=189,required_score=5.5,rhost=localhost,raddr=127.0.0.1,rport=20782,mid=54c6ef78.8090...@testserver.rhsoft.net,bayes=0.40,autolearn=disabled
Re: why does SA without autolearn need bayes read-write?
On 29.01.2015 at 10:18, Matus UHLAR - fantomas wrote:
> On 28.01.15 01:03, Reindl Harald wrote:
>> if understand you correctly we agree that there is no reason /var can't be mounted read-only?
>
> I do not agree. The whole point of /var is to contain varying data and mounting it read-only defeats the whole purpose of /var.

i am not talking about a own partition i am talking about a *systemd namespace* and the intention *not* have anything below /var writeable for a network facing service

frankly - can we stop to discuss left and right?

i asked for not touch bayes from the spamd service for good reasons, know the setup and there are well considered reasons why every piece is like it is - if it's not possible - can it made possible and is someone willing to implement it for money and how much money - that's it

> I see following possibilities for you:
>
> - move BAYES to a database of any kind

for sure not, the bayes is built with a script and rsynced to other machines which have to work *independent* from each other and so there is no point in setup a database with replication, failovers and a lot of time-invest when things can be simple

> - set up SA to learn to journal, and use overlayfs for the journal (remember to set bayes_journal_max_size big enough), dropping it or syncing periodically

it is big enough

use_learner 1
use_bayes 1
use_bayes_rules 1
bayes_use_hapaxes 1
bayes_expiry_max_db_size 250
bayes_auto_expire 0
bayes_auto_learn 0
bayes_learn_during_report 0
bayes_learn_to_journal 1

>> the intention of this *global bayes* is *not* to learn or expire anything - the implemented remove from bayes method is just remove the message from the corpus folder and type sa-learn.sh rebuild
>
> I believe it's much more effective to expire old tokens that are not appearing in mail than to purge old mail from DB, when you don't know if the tokens are still used or not.
>
> I'm afraid you got the expire issue wrong...

i got nothing wrong

it doesn't matter if tokens are not used for two months, 10 years experience shows they re-appear sooner or later and i don't invest hundreds of work-hours to collect thousands of mail samples to have token expire automatically

the bayes works *perfectly* and frankly as started with SA a large part of the spam bayes was built by years old archive data
how to change GTUBE default score
Hi,

SpamAssassin is blocking email from my own domain because the GTUBE score is too high. I tried changing it in ~/.spamassassin/user_prefs to

score GTUBE 3.0

but it made no difference.

X-Spam-Flag: YES
X-Spam-Score: 847.462
X-Spam-Level:
X-Spam-Status: Yes, score=847.462 tag=-999 tag2=5.5 kill=5.6 tests=[ALL_TRUSTED=-1, AWL=0.370, BAYES_00=-1.9, GTUBE=1000, HTML_FONT_SIZE_LARGE=0.001, HTML_MESSAGE=0.001, LOCAL_RCVD=-50, T_RP_MATCHES_RCVD=-0.01, USER_IN_WHITELIST=-100] autolearn=no autolearn_force=no

Any ideas?

Thanks,
Motty
Re: how to change GTUBE default score
On 29.01.2015 at 16:39, Motty Cruz wrote:
> spamassassin is blocking email from my own domain because GTUBE score is too high

that's the whole purpose of the GTUBE - testing the contentfilter in general even if sender or rcpt are whitelisted

> I tried changing in ~/.spamassassin/user_prefs to score GTUBE 3.0 but no difference.

depending on how SA is running it needs to be reloaded and make sure you are working in the correct user_prefs, i prefer local.cf for such changes to make them global

> X-Spam-Flag: YES
> X-Spam-Score: 847.462
> X-Spam-Level:
> X-Spam-Status: Yes, score=847.462 tag=-999 tag2=5.5 kill=5.6 tests=[ALL_TRUSTED=-1, AWL=0.370, BAYES_00=-1.9, GTUBE=1000, HTML_FONT_SIZE_LARGE=0.001, HTML_MESSAGE=0.001, LOCAL_RCVD=-50, T_RP_MATCHES_RCVD=-0.01, USER_IN_WHITELIST=-100] autolearn=no autolearn_force=no
>
> any ideas?

change the score of USER_IN_WHITELIST to -1500
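The arithmetic behind that header can be checked with a short sketch. It parses the tests=[...] list from the quoted X-Spam-Status value, sums the per-rule scores, and recomputes the total with USER_IN_WHITELIST set to -1500. The helper name parse_tests is made up for this illustration, and the sketch only reproduces the sum; it does not model how SpamAssassin loads or applies score overrides.

```python
import re

# X-Spam-Status value copied from the message quoted above.
status = (
    "Yes, score=847.462 tag=-999 tag2=5.5 kill=5.6 "
    "tests=[ALL_TRUSTED=-1, AWL=0.370, BAYES_00=-1.9, GTUBE=1000, "
    "HTML_FONT_SIZE_LARGE=0.001, HTML_MESSAGE=0.001, LOCAL_RCVD=-50, "
    "T_RP_MATCHES_RCVD=-0.01, USER_IN_WHITELIST=-100] "
    "autolearn=no autolearn_force=no"
)


def parse_tests(status):
    """Return {rule_name: score} from the tests=[...] part of the header."""
    inner = re.search(r"tests=\[(.*?)\]", status).group(1)
    pairs = (item.split("=") for item in inner.split(", "))
    return {name: float(value) for name, value in pairs}


tests = parse_tests(status)
kill = float(re.search(r"\bkill=([-\d.]+)", status).group(1))

# Sum of all rule scores as reported in the header.
total = round(sum(tests.values()), 3)

# Same message with the whitelist rule rescored as suggested above.
rescored = dict(tests, USER_IN_WHITELIST=-1500.0)
total_rescored = round(sum(rescored.values()), 3)

print(total, total_rescored, kill)
```

The summed rule scores match the reported score=847.462, so GTUBE was still counted at 1000 and the user_prefs override was not in effect. With the whitelist rule at -1500 the same message would total about -552.5, well under the kill level of 5.6.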
Re: how to change GTUBE default score
On Thu, 29 Jan 2015, Motty Cruz wrote:
> spamassassin is blocking email from my own domain because GTUBE score is too high, I tried changing in ~/.spamassassin/user_prefs to score GTUBE 3.0 but no difference.
>
> X-Spam-Flag: YES
> X-Spam-Score: 847.462
> X-Spam-Level:
> X-Spam-Status: Yes, score=847.462 tag=-999 tag2=5.5 kill=5.6 tests=[ALL_TRUSTED=-1, AWL=0.370, BAYES_00=-1.9, GTUBE=1000, HTML_FONT_SIZE_LARGE=0.001, HTML_MESSAGE=0.001, LOCAL_RCVD=-50, T_RP_MATCHES_RCVD=-0.01, USER_IN_WHITELIST=-100] autolearn=no autolearn_force=no
>
> any ideas?

GTUBE is a test; that is its *intended* behavior.

Why does mail from your domain include that test string?

-- 
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79