Re: why does SA without autolearn need bayes read-write?

2015-01-29 Thread RW
On Wed, 28 Jan 2015 15:58:56 +0100
Reindl Harald wrote:


 * first:  it is a bug to write/lock when auto_expire / auto_learn is
 off

As I said, it's not a bug. The updates are done in case you want to
expire later with sa-learn --force-expire.

Auto-expiry means performing the expiry automatically when the database
goes over its configured token limit. Most people don't do this because
the expiry then runs during a classification, which can cause a
timeout.

Setting auto_expire 0 is not a way of telling SA that you aren't going
to expire the database.
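
To make the distinction concrete, here is a toy Python model. It is not
SpamAssassin's code: the class, its methods and the "keep the most
recently touched tokens" rule are all invented for illustration, and the
real expiry logic is more involved. It only assumes what is said in this
thread: tokens carry an access time, scanning refreshes it, and expiry
can be run later by hand.

```python
class ToyBayesStore:
    """Toy token store: each token maps to its last-access time (atime)."""

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens  # hypothetical stand-in for the token limit
        self.atime = {}               # token -> last time it was seen

    def learn(self, tokens, now):
        # Learning adds tokens (or refreshes them) with the current time.
        for tok in tokens:
            self.atime[tok] = now

    def classify(self, tokens, now, auto_expire=False):
        # Scanning "touches" every known token, which is a write even
        # though nothing is being learned.
        for tok in tokens:
            if tok in self.atime:
                self.atime[tok] = now
        # With auto-expiry on, the expensive pass runs inside the scan.
        if auto_expire and len(self.atime) > self.max_tokens:
            self.force_expire()

    def force_expire(self):
        # Keep only the most recently touched tokens, up to the limit.
        newest_first = sorted(self.atime, key=self.atime.get, reverse=True)
        keep = newest_first[: self.max_tokens]
        self.atime = {tok: self.atime[tok] for tok in keep}


store = ToyBayesStore(max_tokens=2)
store.learn(["pills", "invoice", "meeting"], now=1)
store.classify(["invoice", "meeting"], now=5)  # auto_expire off: only atimes change
print(len(store.atime))                        # still over the limit
store.force_expire()                           # manual expiry, run later
print(sorted(store.atime))
```

In this model the manual expiry can only pick sensible victims because
the earlier scan recorded which tokens were still in use.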



On Wed, 28 Jan 2015 01:03:37 +0100
Reindl Harald wrote:

 ...   even if we decide to kill spam samples older than x
 months it needs to be done properly according to the 50% spam / 50% ham
 ratio which is the reason the bayes works that well

The ratio doesn't matter; it's a myth that it should be 50:50 or match
the ratio in your mail. 


What's important is that you learn enough ham and enough spam, and that
the training is correct and sufficiently representative. It is
preferable that there isn't a big mismatch between the ham/spam ratio
in the corpus as a whole and in recently added mail as that can skew
the probabilities of new tokens.
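
One way to see why the overall ratio is not the issue: Bayes-style
filters commonly normalise each token's count by the number of messages
learned in that class before comparing the two. The following is a
minimal Python sketch of that generic calculation, not SpamAssassin's
actual implementation; the function name and the sample counts are made
up.

```python
def token_spam_probability(spam_hits, ham_hits, n_spam, n_ham):
    """Per-token spam probability from class-normalised frequencies."""
    spam_freq = spam_hits / n_spam  # fraction of learned spam containing the token
    ham_freq = ham_hits / n_ham     # fraction of learned ham containing the token
    return spam_freq / (spam_freq + ham_freq)


# Token seen in 10% of spam and 1% of ham, corpus trained 50:50.
balanced = token_spam_probability(100, 10, n_spam=1000, n_ham=1000)

# Same per-class frequencies, but ten times more spam than ham learned.
skewed = token_spam_probability(1000, 10, n_spam=10000, n_ham=1000)

print(round(balanced, 4), round(skewed, 4))
```

Both calls give the same probability because only the per-class
frequencies enter the formula. What does hurt is having too few samples
in one class, since those frequencies then become noisy.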


 compared with
 autolearning setups where every one i have seen in the past 8 years
 became worse each month until it classified most ham as spam and let
 through the real crap

It works for some, but when it fails it's not because the ratio of
spam to ham is wrong, it's because of a combination of mistraining,
inadequate ham and poor choices in what's learned. 


Re: why does SA without autolearn need bayes read-write?

2015-01-29 Thread Matus UHLAR - fantomas

On 28.01.15 01:03, Reindl Harald wrote:
if i understand you correctly we agree that there is no reason /var
can't be mounted read-only?


I do not agree. The whole point of /var is to contain varying data and
mounting it read-only defeats the whole purpose of /var.

I see the following possibilities for you:
- move BAYES to a database of any kind
- set up SA to learn to journal, and use overlayfs for the journal
  (remember to set bayes_journal_max_size big enough),
  dropping it or syncing periodically

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Your mouse has moved. Windows NT will now restart for changes to take
effect. [OK]


Re: why does SA without autolearn need bayes read-write?

2015-01-29 Thread Matus UHLAR - fantomas

On 27.01.15 18:49, Reindl Harald wrote:
the intention of this *global bayes* is *not* to learn or expire 
anything - the implemented remove from bayes method is just remove 
the message from the corpus folder and type sa-learn.sh rebuild


I believe it's much more effective to expire old tokens that are not appearing
in mail than to purge old mail from DB, when you don't know if the tokens
are still used or not.

I'm afraid you got the expire issue wrong...


duplicate key value violates unique constraint bayes_seen_pkey

2015-01-29 Thread ML mail
Hi,

I am using SA on Debian 7 with all Debian standard packages as well as a 
PostgreSQL database to store all Bayes data. 


From time to time I see the following error in the PostgreSQL log file:


2015-01-29 11:56:26 CET ERROR:  duplicate key value violates unique constraint bayes_seen_pkey
2015-01-29 11:56:26 CET DETAIL:  Key (id, msgid)=(1, 3698286ebfb6cf3d3a265947504be472593011dc@sa_generated) already exists.
2015-01-29 11:56:26 CET STATEMENT:  INSERT INTO bayes_seen (id, msgid, flag) VALUES ($1,$2,$3)

Is this normal behavior? Or is it a sign that something might be wrong in
my SA config?

Regards
ML
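
For readers unfamiliar with the message, the sketch below reproduces the
same class of error in Python, using an in-memory SQLite database as a
stand-in for PostgreSQL. The table layout is an assumption inferred from
the log above (a primary key over id and msgid) and is not copied from
the SpamAssassin schema; the flag value is a placeholder. The sketch
says nothing about why SA issued the second INSERT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed layout: composite primary key over (id, msgid), as the log suggests.
conn.execute(
    "CREATE TABLE bayes_seen ("
    " id INTEGER NOT NULL,"
    " msgid TEXT NOT NULL,"
    " flag TEXT NOT NULL,"
    " PRIMARY KEY (id, msgid))"
)

# The flag value "s" is a placeholder, not taken from the log.
row = (1, "3698286ebfb6cf3d3a265947504be472593011dc@sa_generated", "s")
insert = "INSERT INTO bayes_seen (id, msgid, flag) VALUES (?, ?, ?)"

conn.execute(insert, row)      # first insert succeeds
try:
    conn.execute(insert, row)  # same (id, msgid) pair again
    duplicate_rejected = False
except sqlite3.IntegrityError as exc:
    duplicate_rejected = True
    print("rejected:", exc)

row_count = conn.execute("SELECT COUNT(*) FROM bayes_seen").fetchone()[0]
print("rows stored:", row_count)
```

In this sketch the rejected statement leaves the existing row untouched;
the database simply refuses to record the same message ID twice for the
same user id.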


Re: Genuine mail from Hotmail hitting MALFORMED_FREEMAIL

2015-01-29 Thread Niamh Holding

Hello RW,

Sunday, January 25, 2015, 10:55:59 PM, you wrote:

R> There's not much that can be done about this other than rescore or
R> remove it entirely.

But this rule and MISSING_HEADERS combine to score 3.8 just because the
sender put all the recipients in BCC.

Seems a touch high for this.

-- 
Best regards,
 Niamh                          mailto:ni...@fullbore.co.uk



Re: why does SA without autolearn need bayes read-write?

2015-01-29 Thread John Hardin

On Thu, 29 Jan 2015, Reindl Harald wrote:



On 29.01.2015 at 10:18, Matus UHLAR - fantomas wrote:

 On 28.01.15 01:03, Reindl Harald wrote:
  if i understand you correctly we agree that there is no reason /var
  can't be mounted read-only?

 I do not agree. The whole point of /var is to contain varying data and
 mounting it read-only defeats the whole purpose of /var.


i am not talking about a separate partition

i am talking about a *systemd namespace* and the intention *not* to have
anything below /var writable for a network-facing service


"no reason /var can't be mounted read-only" does *not* suggest that.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Political Correctness is a doctrine which is based on the premise
  that it is possible, through nothing more than a suitable choice
  of words, to pick up a turd by the clean end.
---
 3 days until the 12th anniversary of the loss of STS-107 Columbia


Re: why does SA without autolearn need bayes read-write?

2015-01-29 Thread Reindl Harald

On 29.01.2015 at 16:23, John Hardin wrote:

On Thu, 29 Jan 2015, Reindl Harald wrote:

On 29.01.2015 at 10:18, Matus UHLAR - fantomas wrote:

 On 28.01.15 01:03, Reindl Harald wrote:
  if i understand you correctly we agree that there is no reason /var
  can't be mounted read-only?

 I do not agree. The whole point of /var is to contain varying data and
 mounting it read-only defeats the whole purpose of /var.


i am not talking about a separate partition

i am talking about a *systemd namespace* and the intention *not* to have
anything below /var writable for a network-facing service


"no reason /var can't be mounted read-only" does *not* suggest that


* the initial post makes it pretty clear
* it was even quoted in fantomas' first reply on this thread
* i made that clear multiple times

-------- Forwarded Message --------
Subject: Re: why does SA without autolearn need bayes read-write?
Date: Wed, 28 Jan 2015 15:04:26 +0100
From: Reindl Harald h.rei...@thelounge.net
To: users@spamassassin.apache.org

no need to mount separate partitions on recent linux systems
that's what namespaces are for and systemd has easy interfaces

-------- Forwarded Message --------
Subject: Re: why does SA without autolearn need bayes read-write?
Date: Tue, 27 Jan 2015 13:44:33 +0100
From: Matus UHLAR - fantomas uh...@fantomas.sk
To: users@spamassassin.apache.org

On 27.01.15 03:01, Reindl Harald wrote:
 with bayes_auto_learn 0 there is no reason to lock the bayes
 database and the spamd-service should be happy with
 ReadOnlyDirectories=/var/lib

the bayes database contains not only tokens, but also timestamps used
for expiration. That's why you need to write to it.


-------- Forwarded Message --------
Subject: why does SA without autolearn need bayes read-write?
Date: Tue, 27 Jan 2015 03:01:10 +0100
From: Reindl Harald h.rei...@thelounge.net
To: Mailing-List spamassassin users@spamassassin.apache.org

IMHO that is a bug

with bayes_auto_learn 0 there is no reason to lock the bayes database
and the spamd-service should be happy with ReadOnlyDirectories=/var/lib

training and sa-update is done on a shell independent of network aware
services

Jan 27 02:52:58 testserver spamd[2794]: bayes: cannot write to
/var/lib/spamass-milter/.spamassassin/bayes_journal, bayes db update
ignored: Read-only file system
Jan 27 02:52:58 testserver spamd[2794]: spamd: clean message (0.5/5.5)
for sa-milt:189 in 0.5 seconds, 804 bytes.
Jan 27 02:52:58 testserver spamd[2794]: spamd: result: . 0 -
ALL_TRUSTED,BAYES_50,T_RP_MATCHES_RCVD
scantime=0.5,size=804,user=sa-milt,uid=189,required_score=5.5,rhost=localhost,raddr=127.0.0.1,rport=20782,mid=54c6ef78.8090...@testserver.rhsoft.net,bayes=0.40,autolearn=disabled





Re: why does SA without autolearn need bayes read-write?

2015-01-29 Thread Reindl Harald


On 29.01.2015 at 10:18, Matus UHLAR - fantomas wrote:

On 28.01.15 01:03, Reindl Harald wrote:

if i understand you correctly we agree that there is no reason /var
can't be mounted read-only?


I do not agree. The whole point of /var is to contain varying data and
mounting it read-only defeats the whole purpose of /var.


i am not talking about a separate partition

i am talking about a *systemd namespace* and the intention *not* to have
anything below /var writable for a network-facing service


frankly - can we stop discussing left and right?

i asked for the spamd service not to touch bayes for good reasons; i
know the setup and there are well-considered reasons why every piece is
the way it is - if it's not possible, can it be made possible, is someone
willing to implement it for money, and how much money - that's it

__


I see the following possibilities for you:
- move BAYES to a database of any kind


for sure not, the bayes is built with a script and rsynced to other
machines which have to work *independently* of each other, so there
is no point in setting up a database with replication, failovers and a
lot of time investment when things can be simple



- set up SA to learn to journal, and use overlayfs for the journal
   (remember to set bayes_journal_max_size big enough),
   dropping it or syncing periodically


it is big enough

use_learner 1
use_bayes 1
use_bayes_rules 1
bayes_use_hapaxes 1
bayes_expiry_max_db_size 250
bayes_auto_expire 0
bayes_auto_learn 0
bayes_learn_during_report 0
bayes_learn_to_journal 1

 the intention of this *global bayes* is *not* to learn or expire
 anything - the implemented remove from bayes method is just remove
 the message from the corpus folder and type sa-learn.sh rebuild

 I believe it's much more effective to expire old tokens
 that are not appearing in mail than to purge old mail
 from DB, when you don't know if the tokens
 are still used or not.

 I'm afraid you got the expire issue wrong...

i got nothing wrong

it doesn't matter if tokens are not used for two months; 10 years of
experience show they re-appear sooner or later, and i don't invest
hundreds of work-hours in collecting thousands of mail samples to have
tokens expire automatically


the bayes works *perfectly* and frankly, when i started with SA, a large
part of the spam bayes was built from years-old archive data






how to change GTUBE default score

2015-01-29 Thread Motty Cruz

Hi,
SpamAssassin is blocking email from my own domain because the GTUBE score
is too high.
I tried setting score GTUBE 3.0 in ~/.spamassassin/user_prefs but it made
no difference.


X-Spam-Flag: YES
X-Spam-Score: 847.462
X-Spam-Level: 


X-Spam-Status: Yes, score=847.462 tag=-999 tag2=5.5 kill=5.6
tests=[ALL_TRUSTED=-1, AWL=0.370, BAYES_00=-1.9, GTUBE=1000,
HTML_FONT_SIZE_LARGE=0.001, HTML_MESSAGE=0.001, LOCAL_RCVD=-50,
T_RP_MATCHES_RCVD=-0.01, USER_IN_WHITELIST=-100]
autolearn=no autolearn_force=no

any ideas?

Thanks,
Motty


Re: how to change GTUBE default score

2015-01-29 Thread Reindl Harald


On 29.01.2015 at 16:39, Motty Cruz wrote:

SpamAssassin is blocking email from my own domain because the GTUBE score
is too high


that's the whole purpose of GTUBE - testing the content filter in
general even if sender or rcpt are whitelisted



I tried setting score GTUBE 3.0 in ~/.spamassassin/user_prefs but it made
no difference.


depending on how SA is running it needs to be reloaded, and make sure you
are editing the correct user_prefs; i prefer local.cf for such
changes to make them global



X-Spam-Flag: YES
X-Spam-Score: 847.462
X-Spam-Level:

X-Spam-Status: Yes, score=847.462 tag=-999 tag2=5.5 kill=5.6
 tests=[ALL_TRUSTED=-1, AWL=0.370, BAYES_00=-1.9, GTUBE=1000,
 HTML_FONT_SIZE_LARGE=0.001, HTML_MESSAGE=0.001, LOCAL_RCVD=-50,
 T_RP_MATCHES_RCVD=-0.01, USER_IN_WHITELIST=-100]
 autolearn=no autolearn_force=no

any ideas?


change the score of USER_IN_WHITELIST to -1500
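
The arithmetic behind both suggestions can be checked by hand. This
short Python sketch just sums the per-rule scores from the X-Spam-Status
header quoted above and recomputes the total under each proposed change.
It does not call SpamAssassin, it ignores any effect a change might have
on AWL, and treating kill=5.6 as the blocking threshold is an assumption
about the setup.

```python
# Rule scores copied from the X-Spam-Status header in this thread.
scores = {
    "ALL_TRUSTED": -1,
    "AWL": 0.370,
    "BAYES_00": -1.9,
    "GTUBE": 1000,
    "HTML_FONT_SIZE_LARGE": 0.001,
    "HTML_MESSAGE": 0.001,
    "LOCAL_RCVD": -50,
    "T_RP_MATCHES_RCVD": -0.01,
    "USER_IN_WHITELIST": -100,
}
KILL_LEVEL = 5.6  # "kill=5.6" from the same header (assumed blocking threshold)


def total(rule_scores, **overrides):
    """Sum rule scores, optionally replacing some of them."""
    merged = {**rule_scores, **overrides}
    return round(sum(merged.values()), 3)


as_reported = total(scores)
gtube_at_3 = total(scores, GTUBE=3.0)
whitelist_at_1500 = total(scores, USER_IN_WHITELIST=-1500)

for label, value in [
    ("as reported", as_reported),
    ("GTUBE scored 3.0", gtube_at_3),
    ("USER_IN_WHITELIST scored -1500", whitelist_at_1500),
]:
    verdict = "blocked" if value >= KILL_LEVEL else "passed"
    print(f"{label}: {value} -> {verdict}")
```

The first total matches the reported 847.462, which confirms that the
GTUBE hit alone outweighs every negative rule combined.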





Re: how to change GTUBE default score

2015-01-29 Thread John Hardin

On Thu, 29 Jan 2015, Motty Cruz wrote:

SpamAssassin is blocking email from my own domain because the GTUBE score is
too high.
I tried setting score GTUBE 3.0 in ~/.spamassassin/user_prefs but it made no
difference.


X-Spam-Flag: YES
X-Spam-Score: 847.462
X-Spam-Level: 


X-Spam-Status: Yes, score=847.462 tag=-999 tag2=5.5 kill=5.6
tests=[ALL_TRUSTED=-1, AWL=0.370, BAYES_00=-1.9, GTUBE=1000,
HTML_FONT_SIZE_LARGE=0.001, HTML_MESSAGE=0.001, LOCAL_RCVD=-50,
T_RP_MATCHES_RCVD=-0.01, USER_IN_WHITELIST=-100]
autolearn=no autolearn_force=no

any ideas?


GTUBE is a test; that is its *intended* behavior.

Why does mail from your domain include that test string?
