Re: pyzor: check failed: internal error, python traceback seen in response

2014-07-01 Thread Matus UHLAR - fantomas

On 06/30/2014 08:58 PM, Steve Bergman wrote:

I'm getting:

pyzor: check failed: internal error, python traceback seen in response

I'm running Ubuntu 10.04 on the server, with the Ubuntu provided packages.


On 30.06.14 21:15, Axb wrote:

time to update...



pyzor 1:0.5.0-0ubuntu2



ancient, buggy, EOL  version


For both issues, you should ask for help on the Ubuntu lists. I have no idea
whether 10.04 is still supported (is that an LTS version?), but Ubuntu should
take care of such issues if it is (well, that's what support means).
--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
REALITY.SYS corrupted. Press any key to reboot Universe.


Re: Bayer Filter Not Working

2014-07-01 Thread Herbert J. Skuhra

On 25.06.2014 00:42, Bruce Sackett wrote:

I apologize, I’m sure it’s been covered, but I have not been
successful finding results in searches on the web or through the
history of the list.  I get no BAYES results in the headers, so I
don’t see any working.  The part that gets me is below:

Jun 24 13:47:53.165 [3245] dbg: bayes: tie-ing to DB file R/O
/var/lib/amavis/.spamassassin/bayes_toks
Jun 24 13:47:53.166 [3245] dbg: bayes: tie-ing to DB file R/O
/var/lib/amavis/.spamassassin/bayes_seen
Jun 24 13:47:53.167 [3245] dbg: bayes: found bayes db version 3
Jun 24 13:47:53.167 [3245] warn: plugin: eval failed: Insecure
dependency in sprintf while running with -T switch at
/usr/local/share/perl/5.14.2/Mail/SpamAssassin/Logger.pm line 241.
Jun 24 13:47:53.168 [3245] dbg: config: score set 0 chosen.

That seems to be the last time Bayes is referenced in the output of
spamassassin -D --lint


Has anyone else run into this?  I am using an Ubuntu 12.04 server, if
that makes any difference.


I have the same problem on FreeBSD:

Jul  1 05:33:51.765 [43144] dbg: bayes: learner_new 
self=Mail::SpamAssassin::Plugin::Bayes=HASH(0x805b09f78), 
bayes_store_module=Mail::SpamAssassin::BayesStore::DBM
Jul  1 05:33:51.778 [43144] dbg: bayes: learner_new: got 
store=Mail::SpamAssassin::BayesStore::DBM=HASH(0x806108798)
Jul  1 05:33:51.779 [43144] dbg: bayes: tie-ing to DB file R/O 
/var/amavis/.spamassassin/bayes_toks
Jul  1 05:33:51.779 [43144] dbg: bayes: tie-ing to DB file R/O 
/var/amavis/.spamassassin/bayes_seen

Jul  1 05:33:51.779 [43144] dbg: bayes: found bayes db version 3
Jul  1 05:33:51.779 [43144] warn: plugin: eval failed: Insecure 
dependency in sprintf while running with -T switch at 
/usr/local/lib/perl5/site_perl/5.16/Mail/SpamAssassin/Logger.pm line 
241.
Jul  1 05:33:51.799 [43144] warn: plugin: eval failed: Insecure 
dependency in sprintf while running with -T switch at 
/usr/local/lib/perl5/site_perl/5.16/Mail/SpamAssassin/Logger.pm line 
241.


Running 'sa-learn --force-expire' seems to resolve the issue temporarily.

Jul  1 09:35:06.084 [49647] dbg: bayes: learner_new 
self=Mail::SpamAssassin::Plugin::Bayes=HASH(0x805b09f78), 
bayes_store_module=Mail::SpamAssassin::BayesStore::DBM
Jul  1 09:35:06.097 [49647] dbg: bayes: learner_new: got 
store=Mail::SpamAssassin::BayesStore::DBM=HASH(0x806108798)
Jul  1 09:35:06.098 [49647] dbg: bayes: tie-ing to DB file R/O 
/var/amavis/.spamassassin/bayes_toks
Jul  1 09:35:06.098 [49647] dbg: bayes: tie-ing to DB file R/O 
/var/amavis/.spamassassin/bayes_seen

Jul  1 09:35:06.098 [49647] dbg: bayes: found bayes db version 3
Jul  1 09:35:06.099 [49647] dbg: bayes: DB journal sync: last sync: 0
Jul  1 09:35:06.570 [49647] dbg: bayes: DB journal sync: last sync: 0
Jul  1 09:35:06.570 [49647] dbg: bayes: corpus size: nspam = 120857, 
nham = 664988


After a while the error returns. Do I have to wipe my bayes DB?

--
Herbert


Changes in Spamhaus DBL DNSBL return codes

2014-07-01 Thread Axb

As per:

http://www.spamhaus.org/news/article/713/

Return Code   Type                                       Note
127.0.1.2     spam domain
127.0.1.3     spammed redirector / url shortener         (Phased out on January 7th, 2015)
127.0.1.4     phish domain
127.0.1.5     malware domain
127.0.1.6     botnet C&C domain
127.0.1.102   abused legit spam
127.0.1.103   abused legit redirector / url shortener
127.0.1.104   abused legit phish
127.0.1.105   abused legit malware
127.0.1.106   abused legit botnet C&C
127.0.1.255   IP queries prohibited!


Rules have been updated (SA Bug 7056 - 2014-06-17) to reflect this.
Please run sa-update to get the updated rules and scores.
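
For anyone scripting against the DBL outside of SA, the return codes listed
above can be kept in a small lookup table. This is only a sketch: the names
DBL_CODES and describe_dbl_code are made up for illustration and are not part
of SpamAssassin or any Spamhaus tooling.

```python
# Lookup table built from the return codes listed in the announcement.
DBL_CODES = {
    "127.0.1.2": "spam domain",
    "127.0.1.3": "spammed redirector / url shortener",
    "127.0.1.4": "phish domain",
    "127.0.1.5": "malware domain",
    "127.0.1.6": "botnet C&C domain",
    "127.0.1.102": "abused legit spam",
    "127.0.1.103": "abused legit redirector / url shortener",
    "127.0.1.104": "abused legit phish",
    "127.0.1.105": "abused legit malware",
    "127.0.1.106": "abused legit botnet C&C",
    "127.0.1.255": "IP queries prohibited",
}

# Codes in the 127.0.1.10x range mark legitimate domains that were abused.
ABUSED_LEGIT = {"127.0.1.%d" % n for n in range(102, 107)}


def describe_dbl_code(answer):
    """Return (description, is_abused_legit) for a DBL A-record answer."""
    description = DBL_CODES.get(answer, "unknown return code")
    return description, answer in ABUSED_LEGIT


if __name__ == "__main__":
    for code in ("127.0.1.2", "127.0.1.104", "127.0.1.255"):
        print(code, describe_dbl_code(code))
```

Note that 127.0.1.255 is an error indication (the DBL was queried with an IP
address), not a listing, so a caller would normally treat it separately.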

Axb


Re: pyzor: check failed: internal error, python traceback seen in response

2014-07-01 Thread Steve Bergman

Hmmm...

My original question was where the traceback is, not whether this or 
that project chooses to abandon its stable releases. Ubuntu 10.04 LTS 
Server is supported until May 2015. And similar time-frame releases of 
SA and Pyzor are supported until 2020 in RHEL/Scientific Linux/CentOS.


I'm sure that bugs have been fixed, and new ones introduced, in later 
versions of both packages.


All I really want is to find some diagnostic output. When I run Pyzor 
from the command line on the same emails it returns without an error.


-Steve Bergman


Re: pyzor: check failed: internal error, python traceback seen in response

2014-07-01 Thread Steve Bergman

pyzor 1:0.5.0-0ubuntu2



ancient, buggy, EOL  version


Interestingly, pyzor 0.7.0 (the latest stable version) gives the same 
error. And SA is not preserving the diagnostic output from it for the 
admin to view, even with debugging turned on in both packages. Looks like 
the bugs are in SpamAssassin. I guess I'm not sure why such buggy 
software would ever have been released as gold in the first place.


-Steve


Re: pyzor: check failed: internal error, python traceback seen in response

2014-07-01 Thread Steve Bergman

On 06/30/2014 02:15 PM, Axb wrote:

As you don't mention what gue you use with SA it's hard to guess where
your Pyzor config files should be.


I guess I'm not quite sure what gue I am using with SA. Where would I 
find that?


Re: pyzor: check failed: internal error, python traceback seen in response

2014-07-01 Thread Axb

On 07/01/2014 02:57 PM, Steve Bergman wrote:

On 06/30/2014 02:15 PM, Axb wrote:

As you don't mention what gue you use with SA it's hard to guess where
your Pyzor config files should be.


I guess I'm not quite sure what gue I am using with SA. Where would I
find that?


phatfingers meant glue - how do you interface spamassassin with your 
MTA/MUA


amavisd, procmail, some milter, etc..
running under what user etc



Re: getting tons of SPAM

2014-07-01 Thread motty cruz
Hello, I am trying to manipulate spamassassin scores, I am getting lots of
SPAM with very low score.

X-Virus-Scanned: amavisd-new at fqdn.com
X-Spam-Flag: NO
X-Spam-Score: 0.003
X-Spam-Level:
X-Spam-Status: No, score=0.003 tagged_above=-999 required=5.3
tests=[DKIM_SIGNED=0.001, HTML_IMAGE_RATIO_06=0.001,
HTML_MESSAGE=0.001, T_DKIM_INVALID=0.01, T_RP_MATCHES_RCVD=-0.01]
autolearn=unavailable
Authentication-Results: maria.fqdn.com (amavisd-new);
dkim=fail (1024-bit key) reason=fail (body has been altered)
header.d=dttusa.com


Please help,
Thanks
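
When comparing many such headers, it can help to break the X-Spam-Status
value into its parts. The following is a minimal sketch that assumes the
amavisd-style layout shown above (score=, required=, tests=[NAME=value, ...]);
the function name parse_spam_status is hypothetical.

```python
import re


def parse_spam_status(header_value):
    """Split an X-Spam-Status value into (score, required, per-test scores)."""
    # Undo header folding so the regular expressions see a single line.
    flat = " ".join(header_value.split())
    score = float(re.search(r"score=(-?[\d.]+)", flat).group(1))
    required = float(re.search(r"required=(-?[\d.]+)", flat).group(1))
    tests = {}
    match = re.search(r"tests=\[(.*?)\]", flat)
    if match:
        for item in match.group(1).split(","):
            item = item.strip()
            if not item:
                continue
            name, _, value = item.partition("=")
            tests[name] = float(value)
    return score, required, tests


if __name__ == "__main__":
    sample = """No, score=0.003 tagged_above=-999 required=5.3
     tests=[DKIM_SIGNED=0.001, HTML_IMAGE_RATIO_06=0.001,
     HTML_MESSAGE=0.001, T_DKIM_INVALID=0.01, T_RP_MATCHES_RCVD=-0.01]
     autolearn=unavailable"""
    score, required, tests = parse_spam_status(sample)
    print(score, required)
    for name, value in sorted(tests.items()):
        print("  %-22s %6.3f" % (name, value))
```

For the header above, this makes it obvious that no BAYES_* test and no
network test (RBL, Pyzor, DCC) fired at all; only near-zero informational
rules contributed to the score.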


On Fri, Jun 27, 2014 at 8:16 AM, Matus UHLAR - fantomas uh...@fantomas.sk
wrote:

 On 27.06.14 07:50, motty cruz wrote:

 I can't figure out why spammy emails get such a low score,


  X-Quarantine-ID: 4QFxoaNchYOk
 X-Virus-Scanned: amavisd-new at fqdn.com
 X-Amavis-Alert: BAD HEADER SECTION, MIME error: error: unexpected end of
header


 This might explain much. It seems that the mail was broken somehow.
 Did you use default configs for spamassassin and amavis?


  X-Spam-Flag: NO
 X-Spam-Score: 0.102
 X-Spam-Level:
 X-Spam-Status: No, score=0.102 tagged_above=-999 required=5.3
tests=[AWL=0.311, DKIM_SIGNED=0.001, DKIM_VALID=-0.1,
DKIM_VALID_AU=-0.1, DKIM_VERIFIED=-0.001, HTML_MESSAGE=0.001,
T_RP_MATCHES_RCVD=-0.01]
 ---
 Received: by bell.cuxrrb.com id hllmas0e97ct for mo...@fdqn.com;
  Fri, 27 Jun 2014 08:58:12 -0400
  (envelope-from life-motty+5F=f...@cuxrrb.com)
 From: Pimsleur Approach l...@cuxrrb.com
 Date: Fri, 27 Jun 2014 08:58:12 -0400
 Subject: Want to speak a foreign language but don't have a lot of time?

 Reply-To: reply-b89161365ddc621bf5b4340f26597...@cuxrrb.com
 Message-ID: b89161365ddc621bf5b4340f2659783e095437-2598-hINbimNU@cuxrrb.com


  MIME-Version: 1.0
 Content-Type: multipart/alternative;
 boundary=b89161365ddc621bf5b4340f2659783e69.692014062755451





Re: getting tons of SPAM

2014-07-01 Thread Matus UHLAR - fantomas

On 27.06.14 07:50, motty cruz wrote:

 X-Quarantine-ID: 4QFxoaNchYOk
X-Virus-Scanned: amavisd-new at fqdn.com
X-Amavis-Alert: BAD HEADER SECTION, MIME error: error: unexpected end of
   header



On Fri, Jun 27, 2014 at 8:16 AM, Matus UHLAR - fantomas uh...@fantomas.sk
wrote:

This might explain much. It seems that the mail was broken somehow.
Did you use default configs for spamassassin and amavis?


On 01.07.14 07:48, motty cruz wrote:

Hello, I am trying to manipulate spamassassin scores, I am getting lots of
SPAM with very low score.



you haven't answered my question, have you?

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Chernobyl was an Windows 95 beta test site.


Re: getting tons of SPAM

2014-07-01 Thread motty cruz
Maybe I missed your question.
Was this it: "Did you use default configs for spamassassin and amavis?"

If so, I replied immediately. Here is my response again:

Yes, I was using the default configuration, except for language scores I added
some time ago.

Thanks,





Re: pyzor: check failed: internal error (strace to the rescue)

2014-07-01 Thread Steve Bergman
OK. So I replaced pyzor with a dash script to run it under strace and 
log the output to a file. What it was complaining about was (drum 
roll, please) the permissions on /home/pyzor/servers. Which is odd, 
because I'm pretty sure I set that file to be world readable and world 
writable for testing purposes. But when I checked again it was owned by 
root with 600 permissions.


If we assume that I had a temporary brain aneurysm or mini-stroke or 
something when I thought I was doing that, it explains both why it 
wasn't working then, and why it wasn't working with aliases earlier, 
since aliases don't have home directories to even have servers files 
with permissions on them.


Or perhaps I didn't have a brain stroke, and those permissions changed. 
I'll be monitoring that. But at least I'm past the "it doesn't work and 
I have no idea why" stage. And that's very nice, indeed.
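
For monitoring whether those permissions change again, a small check along
the following lines could be run from cron. This is a sketch only: the
function name describe_access is made up, and the path is simply the one
named in this message.

```python
import os
import stat


def describe_access(path):
    """Report the owner and permission bits of a file such as the servers file."""
    st = os.stat(path)
    mode = stat.S_IMODE(st.st_mode)  # keep only the permission bits
    return {
        "owner_uid": st.st_uid,
        "mode": oct(mode),
        "world_readable": bool(mode & stat.S_IROTH),
        "world_writable": bool(mode & stat.S_IWOTH),
    }


if __name__ == "__main__":
    # Path taken from the message above; adjust for the local setup.
    path = "/home/pyzor/servers"
    try:
        print(path, describe_access(path))
    except OSError as exc:
        print("cannot stat %s: %s" % (path, exc))
```

A file owned by root (uid 0) with mode 0o600 is not readable by whatever
unprivileged user runs pyzor on behalf of SA, which matches what strace
reported.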


-Steve Bergman


Re: getting tons of SPAM

2014-07-01 Thread Steve Bergman

Hey motty cruz,

I just moved our 100 users over from our ISP's mail servers to our own. 
Apparently, the ISP's mail servers were doing remarkably well. Because 
it turns out that we get some 5000 spams a day, and users were getting 
essentially no spam.


Then I upgraded us to a new OS on our Debian/X2Go/MATE desktop server, 
and moved us to our own mail server, and the spam was coming through like 
water through the sluice gates of a dam.


It didn't help that I'd moved everyone from Evolution to Thunderbird. So 
the client bayesian spam filters were completely untrained.


So I installed SA on the server. That helped. But it wasn't enough. I 
compiled up DCC and installed Pyzor, and that helped some. (Though 
SA's Pyzor support had some teething problems, as you can see from my 
recent posts, which I think may be now resolved.)


What SA really needs is for its own Bayesian filter to kick in. But to 
be used at all, it needs at least 200 ham and 200 spam messages 
registered with it.
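
Whether that threshold has been reached can be read from SA's debug output,
which prints a "corpus size" line like the one quoted in the Bayes thread
earlier in this digest. A minimal sketch follows, assuming that log format;
the function name bayes_ready and the constants are hypothetical, and the
200/200 figures are the ones given in this message.

```python
import re

MIN_SPAM = 200  # threshold quoted in the message above
MIN_HAM = 200


def bayes_ready(debug_output):
    """Return True/False if a corpus size line is found, or None if it is not."""
    # Debug lines are often wrapped by mail clients, so rejoin them first.
    text = " ".join(debug_output.split())
    match = re.search(r"corpus size: nspam = (\d+), nham = (\d+)", text)
    if match is None:
        return None
    nspam, nham = int(match.group(1)), int(match.group(2))
    return nspam >= MIN_SPAM and nham >= MIN_HAM


if __name__ == "__main__":
    log = """Jul  1 09:35:06.570 [49647] dbg: bayes: corpus size: nspam = 120857,
    nham = 664988"""
    print(bayes_ready(log))
```

A return value of None means the line never appeared, which is itself worth
investigating because it suggests the Bayes plugin did not get that far.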


I.e. you have to have a way to train the filter. I don't really have 
much confidence in autolearn. And I'm a little scared of it. So I 
turned it off. We use Dovecot. So I used the dovecot-antispam plugin to 
automatically train SA when mail gets moved in or out of the junk 
folder. (It handles the moving of mail from Junk into Trash or regular 
folders intelligently and appropriately.)


But that only solved half the problem. You need 200 hams and 200 spams. 
Mail was not getting marked as ham when it went into the Inboxes. So I 
wrote a script that could be called from the users' .forward files to 
mark messages as ham. Then if the user, or Thunderbird's own spam filter 
chooses to move it to Junk, it gets relearned as spam.


Finally, to deal with many of the false positives I was getting with SA, 
I wrote a script, executed from cron, which takes new mail in the users' 
Sent folders, and whitelists them with spamassassin in the users' own 
individual user_prefs files.
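
The script itself was not posted. As a rough illustration of the idea only,
the sketch below pulls the recipients out of one sent message and produces
whitelist_from lines suitable for appending to a user_prefs file. The
function name whitelist_lines and the already_listed parameter are made up
for this example; reading the Sent folder and writing user_prefs are left out.

```python
from email import message_from_string
from email.utils import getaddresses


def whitelist_lines(raw_message, already_listed=()):
    """Build whitelist_from lines for the To/Cc recipients of a sent message."""
    msg = message_from_string(raw_message)
    fields = msg.get_all("To", []) + msg.get_all("Cc", [])
    seen = {addr.lower() for addr in already_listed}
    lines = []
    for _name, addr in getaddresses(fields):
        addr = addr.lower()
        # Skip empty or malformed entries and anything already whitelisted.
        if "@" in addr and addr not in seen:
            seen.add(addr)
            lines.append("whitelist_from " + addr)
    return lines


if __name__ == "__main__":
    raw = (
        "From: me@example.org\n"
        "To: Alice <alice@example.com>, bob@example.net\n"
        "Cc: ALICE@example.com\n"
        "Subject: quote\n"
        "\n"
        "body\n"
    )
    print("\n".join(whitelist_lines(raw)))
```

Since whitelist_from matches on easily forged sender addresses, a real script
might prefer one of SA's authenticated whitelist variants; that choice is
outside what this message describes.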


This is what it took before I was really happy with the performance of 
SA. Well... that and adding a 1 second sleep after connection in the 
Postfix configuration. That made a huge difference. But our mail volume 
is small enough that the 1 second sleep doesn't cause any problems as it 
would on a really high volume server.


I hope that rough outline is helpful to you in some way.

However, having come through all that, I find myself wondering if we 
should simply impose capital punishment for the crime of spamming, or if 
more drastic action is indicated. ;-)




Re: getting tons of SPAM

2014-07-01 Thread Jeremy McSpadden
No mention of RBLs or greylisting ...

--
Jeremy McSpadden
Flux Labs | http://www.fluxlabs.net/ | Endless Solutions
Office : 850-250-5590x501 | Cell : 850-890-2543 | Fax : 850-254-2955




Re: getting tons of SPAM

2014-07-01 Thread Axb

nor, if using Postfix, postscreen

On 07/01/2014 09:17 PM, Jeremy McSpadden wrote:

No mention of RBLs or greylisting ...







Re: getting tons of SPAM

2014-07-01 Thread motty cruz
Hello Jeremy,

I have the following RBL settings in Postfix's main.cf:
 reject_rbl_client b.barracudacentral.org,
 reject_rbl_client zen.spamhaus.org,
 reject_rbl_client bl.spamcop.net,
 reject_rbl_client all.spamrats.com

The RBLs are very nice and help me block lots of spam, but a lot of spam is
still making it through with a very low score. I trained SA with about 700
spam emails and about 258 ham emails.
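
As background on what each reject_rbl_client entry does: the client's IPv4
address is reversed octet by octet and looked up as a hostname under the
list's zone; an answer means the address is listed. The sketch below only
builds that query name and does not perform any DNS lookup. The function name
dnsbl_query_name is made up for illustration.

```python
import ipaddress


def dnsbl_query_name(client_ip, zone):
    """Build the DNS name an MTA looks up to test an IPv4 client against a DNSBL."""
    ip = ipaddress.IPv4Address(client_ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(ip).split(".")))
    return "%s.%s" % (reversed_octets, zone)


if __name__ == "__main__":
    zones = (
        "b.barracudacentral.org",
        "zen.spamhaus.org",
        "bl.spamcop.net",
        "all.spamrats.com",
    )
    for zone in zones:
        print(dnsbl_query_name("192.0.2.99", zone))
```

Querying these names by hand for the connecting IP of a spam that got through
is a quick way to see whether the lists simply had not caught up with that
sender yet.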

X-Virus-Scanned: amavisd-new at fqdn.com
X-Spam-Flag: NO
X-Spam-Score: 0.003
X-Spam-Level:
X-Spam-Status: No, score=0.003 tagged_above=-999 required=5.3
tests=[DKIM_SIGNED=0.001, HTML_IMAGE_RATIO_06=0.001,
HTML_MESSAGE=0.001, T_DKIM_INVALID=0.01, T_RP_MATCHES_RCVD=-0.01]
autolearn=no

The email header is very spammy.
I need help stopping this attack.

Thanks for your support,




On Tue, Jul 1, 2014 at 12:17 PM, Jeremy McSpadden jer...@fluxlabs.net
wrote:

  No mention of RBLs or greylisting ...






Re: getting tons of SPAM

2014-07-01 Thread Steve Bergman

On 07/01/2014 02:23 PM, Axb wrote:


nor, if using Postfix, postscreen



Indeed. I've looked at that. It's probably better than the sleep. But 
it's not yet an option for us. And at 7000 emails per day or whatever we 
get, I'm not sure there's that much difference. (There may be. I haven't 
looked at postscreen all that closely.)


We'll be doing an OS upgrade on our server to Ubuntu 14.04 LTS within 
the next year. Possibly even within the next few weeks. I'd actually 
kind of like to move to Debian 7. But I really can't justify all the 
extra complication when I can do an in-place upgrade of the Ubuntu 10.04.


-Steve





Re: getting tons of SPAM

2014-07-01 Thread Steve Bergman

On 07/01/2014 02:33 PM, motty cruz wrote:

I trained SA with about 700 SPAM emails and with about 258 HAM emails.


In case I missed this, are you the single user, or does this server 
handle many mail accounts? I have many, and took the conservative 
approach of giving each user their own filedb database of tokens, and 
training them with the stream of emails which are actually coming into 
the users' Inboxes. Doing it that way, it takes a while for the training 
to mature. But my thinking is that they will mature into more accurate 
bayesian classifiers that way.


-Steve


Re: getting tons of SPAM

2014-07-01 Thread Martin Gregorie
On Tue, 2014-07-01 at 19:17 +, Jeremy McSpadden wrote:
 No mention of RBLs or greylisting ...
 
Quite.

When my ISP switched on greylisting my mail immediately went from a
spam:ham ratio of 80:20 to one of 20:80, which is pretty much where it has
stayed ever since.

The spam:ham ratio is reported on a daily basis by my spamkiller filter
that I wrote and installed immediately downstream of my local copy of
SA. The filter quarantines spam for a week before deleting it. I don't
use my ISP's copy of SA because it didn't do a good job on some of the
mailing lists I get: my local SA does, but has a ruleset that is highly
customised for my mail stream.


Martin





Re: getting tons of SPAM

2014-07-01 Thread Steve Bergman



On 07/01/2014 03:29 PM, Martin Gregorie wrote:

On Tue, 2014-07-01 at 19:17 +, Jeremy McSpadden wrote:

No mention of RBLs or greylisting ...


Quite.

When my ISP switched on greylisting my mail immediately went from a
spam:ham ratio of 80:20 to one of 20:80


But what about the variable delay, which is not under your control? My users 
complained loudly about that minority of mails which took an hour to 
arrive. I had to turn it off. Yes, I'm sure the autowhitelist features 
help with time. But we're always receiving mail from new customers whom 
our mail server has never heard from before. And you really don't want 
to not receive a mail from a new customer for an hour or more when you 
are a service company advertising fast and efficient service of your 
customers' restaurant kitchen equipment during the lunch hours.


I did not find greylisting viable for our use case. And I suspect many 
businesses would have similar incompatibilities with the strategy.


-Steve


Re: getting tons of SPAM

2014-07-01 Thread motty cruz
Yes, I guess I could change the variable delay. I will do a quick search to
see how it would affect users. Some users are very sensitive to these issues.

Thanks a bunch,






Re: Bayer Filter Not Working

2014-07-01 Thread Herbert J. Skuhra
On Tue, 01 Jul 2014 09:37:17 +0200
Herbert J. Skuhra wrote:

 After a while the error returns. Do I have to wipe my bayes DB?

I wiped my bayes DB and learned more than 200 spam and ham messages each.
While nham and nspam were below 200 messages, the error was gone. But now it
is back.

% spamassassin -t < OvwTlDIfJxAe
 [...]
 3.5 BAYES_99   BODY: Bayes spam probability is 99 to 100%
[score: 1.]
 [...]  
 0.2 BAYES_999  BODY: Bayes spam probability is 99.9 to 100%
[score: 1.]
 [...]

Running the same command with the -D switch the error appears and I
don't see the BAYES score. There is also no BAYES score in the amavisd
log. :-(

Any ideas?

Thanks.

--
Herbert


Re: getting tons of SPAM

2014-07-01 Thread Steve Bergman

On 07/01/2014 04:00 PM, motty cruz wrote:

Yes, I guess I could change the variable delay. I will do a quick search
to see how it would affect users. Some users are very sensitive to these
issues.



What mail server and version of it are you using?

There was a good suggestion made about postscreen, earlier, if you are 
using version 2.8 or later of postfix, IIRC.


I was a bit confused about what greylisting means, until a few days ago. 
Basically, your server maintains a database of servers that it has 
previously talked to and received legitimate emails from.


If it has not talked to a server at this IP before (generally a block of 
256 or so addresses) then it says "Hey, I'd love to accept your incoming 
mail, but I'm *really* busy right now. Could you come back in 5 minutes 
and I'll be able to take it then?"


Normal mail servers will come back. Spam servers do a sort of wham, bam, 
thank you ma'am version of sending mail. Except that they don't even 
bother to say thank you, and they certainly don't come back in 5 
minutes. They just move on to their next victim. They don't waste time 
with slow receiving servers.
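
The behaviour described above can be sketched in a few lines of Python. This is only an illustration, not how postgrey or any particular greylisting daemon is written: the class name, the "accept"/"tempfail" return values, and keying on sender and recipient in addition to the /24 network are all assumptions made for the example.

```python
import time


class Greylist:
    """Toy greylist: defer unknown clients, accept them once they retry."""

    def __init__(self, delay_seconds=300):
        self.delay_seconds = delay_seconds  # the "come back in 5 minutes"
        self.first_seen = {}                # (network, sender, rcpt) -> first attempt
        self.known = set()                  # networks that have retried successfully

    @staticmethod
    def network(ip):
        # Collapse an IPv4 address to its /24 block (256 addresses).
        return ".".join(ip.split(".")[:3])

    def check(self, ip, sender, recipient, now=None):
        now = time.time() if now is None else now
        net = self.network(ip)
        if net in self.known:
            return "accept"
        key = (net, sender, recipient)
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.delay_seconds:
            # The client came back after the delay: remember it.
            self.known.add(net)
            return "accept"
        return "tempfail"  # would be a temporary (4xx) SMTP reply
```

Note that the server only chooses the minimum delay. When the sender actually retries is decided entirely by the sending side.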


The problem is that saying "come back in 5 minutes" does not mean that 
even legitimate servers are going to come back in 5 minutes. They might 
wait 10 minutes. Or 15. Or 20. Or 30. Or an hour.


Some of the sites listed in postgrey's default whitelist file delay as 
much as 12 hours between retries. And you cannot necessarily trust 
whitelists to cover all the important senders on an ongoing basis.


-Steve



Re: getting tons of SPAM

2014-07-01 Thread Martin Gregorie
On Tue, 2014-07-01 at 15:37 -0500, Steve Bergman wrote:
 
 On 07/01/2014 03:29 PM, Martin Gregorie wrote:
  On Tue, 2014-07-01 at 19:17 +, Jeremy McSpadden wrote:
  No mention of RBLs or greylisting ...
 
  Quite.
 
  When my ISP switched on greylisting my mail immediately went from a
  spam:ham ratio of 80:20 to one of 20:80
 
 But the variable delay, which is not under your control?

You're right: it's not.

 My users 
 complained loudly about that minority of mails which took an hour to 
 arrive. I had to turn it off.

I know what can happen, and also that those complaints can arise from a
total misunderstanding of what e-mail is designed to do: that it is
*not* an instant messaging medium but it is a reliable one despite
delivering over sometimes flaky networks. IOW demanding instant e-mail
delivery is quite unreasonable.

 Yes, I'm sure the autowhitelist features 
 help with time. But we're always receiving mail from new customers
 whom our mail server has never heard from before. And you really don't
 want to not receive a mail from a new customer for an hour or more
 when you are a service company advertising fast and efficient service
 of your customers' restaurant kitchen equipment during the lunch
 hours.
 
I think that specific whitelisting could help here: I run a mail archive
that takes an automatic BCC feed of both incoming and outgoing mail from
Postfix. This just works and has an important secondary use. SA uses a
special-purpose plugin to query the mail archive: any incoming mail
received from an e-mail address I've previously sent mail to gets
whitelisted. It was simple to do because the archive is held in a
PostgreSQL database and has almost zero maintenance costs. As I run it,
the whitelist is assembled automatically from outgoing mail, but it
would not be hard to accept an address feed from, say, a sales system or
a guarantee registration database which would allow customer addresses
to be whitelisted as their orders are confirmed. For that matter, if a
new customer is always sent an e-mail to ensure they have your address
and to confirm that theirs is correctly entered, then they'd be
automatically whitelisted by that e-mail.
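
A rough sketch of the idea, in Python. It is not the actual plugin: an in-memory sqlite3 table stands in for the PostgreSQL archive, and the table name `sent_to` and the three function names are made up for the illustration.

```python
import sqlite3
from email.utils import parseaddr


def open_archive():
    """Create a stand-in for the mail archive: one table of addresses mailed to."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sent_to (address TEXT PRIMARY KEY)")
    return db


def record_outgoing(db, to_header):
    """Called for each outgoing mail (e.g. from the BCC feed)."""
    _, addr = parseaddr(to_header)
    if addr:
        db.execute("INSERT OR IGNORE INTO sent_to VALUES (?)", (addr.lower(),))


def is_whitelisted(db, from_header):
    """True if we have previously sent mail to the address in this From header."""
    _, addr = parseaddr(from_header)
    row = db.execute(
        "SELECT 1 FROM sent_to WHERE address = ?", (addr.lower(),)
    ).fetchone()
    return row is not None
```

An address feed from a sales or registration system would simply be another caller of `record_outgoing`.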


Martin


 I did not find greylisting viable for our use case. And I suspect many 
 businesses would have similar incompatibilities with the strategy.
 
 -Steve
 





Re: getting tons of SPAM

2014-07-01 Thread John Hardin

On Tue, 1 Jul 2014, Martin Gregorie wrote:


On Tue, 2014-07-01 at 15:37 -0500, Steve Bergman wrote:


On 07/01/2014 03:29 PM, Martin Gregorie wrote:

On Tue, 2014-07-01 at 19:17 +, Jeremy McSpadden wrote:

No mention of RBLs or greylisting ...


Quite.

When my ISP switched on greylisting my mail immediately went from a
spam:ham ratio of 80:20 to one of 20:80


But the variable delay, which is not under your control?


You're right: it's not.

My users complained loudly about that minority of mails which took an 
hour to arrive. I had to turn it off.


I know what can happen, and also that those complaints can arise from a
total misunderstanding of what e-mail is designed to do: that it is
*not* an instant messaging medium but it is a reliable one despite
delivering over sometimes flaky networks. IOW demanding instant e-mail
delivery is quite unreasonable.


+1

And if your business is predicated on instant e-mail you are setting 
yourself up for pain.


If it needs to be *instant*, have them visit a web page to enter service 
requests.


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  It is criminal to teach a man not to defend himself when he is the
  constant victim of brutal attacks.  -- Malcolm X (1964)
---
 3 days until the 238th anniversary of the Declaration of Independence


Re: getting tons of SPAM

2014-07-01 Thread motty cruz
Today I built a new spam filter with the latest release. I left all the
default configuration in place except for a few changes. For now it seems
to be doing better at blocking really spammy emails.

Thanks for all your help,



On Tue, Jul 1, 2014 at 2:39 PM, John Hardin jhar...@impsec.org wrote:

 On Tue, 1 Jul 2014, Martin Gregorie wrote:

  On Tue, 2014-07-01 at 15:37 -0500, Steve Bergman wrote:


 On 07/01/2014 03:29 PM, Martin Gregorie wrote:

 On Tue, 2014-07-01 at 19:17 +, Jeremy McSpadden wrote:

 No mention of RBLs or greylisting ...

  Quite.

 When my ISP switched on greylisting my mail immediately went from a
 spam:ham ratio of 80:20 to one of 20:80


 But the variable delay, which is not under your control?


 You're right: it's not.

  My users complained loudly about that minority of mails which took an
 hour to arrive. I had to turn it off.


 I know what can happen, and also that those complaints can arise from a
 total misunderstanding of what e-mail is designed to do: that it is
 *not* an instant messaging medium but it is a reliable one despite
 delivering over sometimes flaky networks. IOW demanding instant e-mail
 delivery is quite unreasonable.


 +1

 And if your business is predicated on instant e-mail you are setting
 yourself up for pain.

 If it needs to be *instant*, have them visit a web page to enter service
 requests.


 --
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
 ---
   It is criminal to teach a man not to defend himself when he is the
   constant victim of brutal attacks.  -- Malcolm X (1964)

 ---
  3 days until the 238th anniversary of the Declaration of Independence



Re: getting tons of SPAM

2014-07-01 Thread Steve Bergman



On 07/01/2014 04:31 PM, Martin Gregorie wrote:


I know what can happen, and also that those complaints can arise from a
total misunderstanding of what e-mail is designed to do: that it is
*not* an instant messaging medium but it is a reliable one despite
delivering over sometimes flaky networks. IOW demanding instant e-mail
delivery is quite unreasonable.


I disagree. Email is what it is. If the sender and the receiver both 
agree that they should be able to expect messages between them to arrive 
in a short time, then that is the implied personal contract.


The real issue, here, is that we're applying a technological kluge to 
try to combat the social problem of massive abuse of the email system. 
If the goal of spammers is to wreck the (admittedly rather naive) email 
system, they've won. Of course, that was never their goal. But they've 
still wrecked email. And we admins trying to stop spam are also damaging 
the email system. We just hope we're doing more good than harm.


In short... try to explain that email isn't an instant messaging system 
to a customer with a dead fryer at 11AM emailing for a tech to help 
before the lunch crowd arrives. That's how email is used in the real 
world. And no amount of our saying you shouldn't do that is going to 
change the fact.


People do expect all sorts of things that email was never designed to 
handle. Protection from abuse by spammers is one. The sending of DVD 
attachments is another. And our own abuse of the system to try to 
prevent others' abuse of the system results in a certain collateral 
damage which is quite real.




I think that specific whitelisting could help here


It can help. But I cannot think of a whitelisting system, in tandem with 
a kluge like greylisting, which would not do more harm than good. At 
least not for a service organization like ours.


That said, I have plenty of kluges in place myself. I'm far from being 
authorized to speak from a holier than thou position. ;-)


-Steve


Re: getting tons of SPAM

2014-07-01 Thread Antony Stone
On Wednesday 02 July 2014 at 00:12:07, Steve Bergman wrote:

 In short... try to explain that email isn't an instant messaging system
 to a customer with a dead fryer at 11AM emailing for a tech to help
 before the lunch crowd arrives. That's how email is used in the real
 world. And no amount of our saying you shouldn't do that is going to
 change the fact.

This may be true, but in the example that you give, tech support should really 
have provided a better (ie: more reliable) mechanism for contact than email if 
the customer is entitled to (expect) a prompt response.

I don't agree with blaming users of email for expecting it to work the way it 
used to some years ago (remember the days of open relays, without problems, 
and delivery notifications?) when it's the technology, and the security systems 
which have been imposed on the system, which have changed, without the users 
necessarily realising or being told, and which have made it work differently.

 People do expect all sorts of things that email was never designed to
 handle. Protection from abuse by spammers is one. The sending of DVD
 attachments is another. And our own abuse of the system to try to
 prevent others' abuse of the system results in a certain collateral
 damage which is quite real.

People with email accounts should be (continually) informed of what service 
they're being offered and can thereby reasonably expect to receive.

 I cannot think of a whitelisting system, in tandem with a kluge like
 greylisting, which would not do more harm than good. At least not for a
 service organization like ours.

Horses for courses - what works well for others may not work well for you - 
but that's no reason to dismiss it outright.  (Yes, I agree that you didn't, 
but the point remains that for some people both whitelisting and greylisting 
are very effective.)

 That said, I have plenty of kluges in place myself. I'm far from being
 authorized to speak from a holier than thou position. ;-)

Me too :)


Antony.

-- 
There is no reason for any individual to have a computer in their home.

 - Ken Olsen, President of Digital Equipment Corporation (DEC, later consumed 
by Compaq, later merged with HP)

 Please reply to the list;
   please don't CC me.


Re: getting tons of SPAM

2014-07-01 Thread RW
On Tue, 01 Jul 2014 14:06:14 -0500
Steve Bergman wrote:


 What SA really needs is for its own Bayesian filter to kick in. But
 to be used at all, you need at least 200 ham and 200 spam messages 
 registered with it.
 
 i.e. you have to have a way to train the filter. I don't really
 have much confidence in autolearn. And I'm a little scared of it.
 So I turned it off. We use Dovecot. So I used the dovecot-antispam
 plugin to automatically train SA when mail gets moved in or out of
 the junk folder. (It handles the moving of mail from Junk into Trash
 or regular folders intelligently and appropriately.)

I'm sceptical about the use of Dovecot-Antispam with Spamassassin.
The problem is that it trains on SpamAssassin errors rather than Bayes
errors. It may be possible to get sufficient spam this way, but ham
is learned very slowly through avoidable FPs.


 But that only solved half the problem. You need 200 hams and 200
 spams. 

You need several thousand hams and spams for it to work optimally.


Re: getting tons of SPAM

2014-07-01 Thread Steve Bergman

On 07/01/2014 05:35 PM, Antony Stone wrote:


This may be true, but in the example that you give, tech support should really
have provided a better (ie: more reliable) mechanism for contact than email if
the customer is entitled to (expect) a prompt response.


There are multiple methods. But customers (new, as well as previously 
existing) do expect that their emails will be received promptly. This is 
the way people use email. We have no way of controlling what new 
customers contacting us expect.


Our minority opinions, as admins, of what email should be today don't 
count for that much. We as admins, our predecessors, and their 
predecessors, are responsible for the way that we have failed our users 
today. The email system we use is incredibly naive.


There's got to be a way to do this right. SPF is a big step forward. But 
we don't trust it. An SPF fail gets, what? 2.0 points in spamassassin? 
An SPF fail should be end of game for spam. But no one trusts SPF well 
enough to do that. Because sending email server admins don't take SPF 
seriously enough for receiving servers to take *them* seriously. (And 
which of us does not handle sending and receiving?)


SPF could go a long way toward forming a basis for fixing email. It 
can't do it by itself. All it does is give basic assurance that "My 
sending domain is what I claim it is." But that is exactly the rock 
solid basis that a real solution could be based upon.


The email system is and always has been, from a security standpoint, a 
joke. And SA is an amazing and wonderful kluge that tries to sweep the 
fact under the rug as best it can. No disrespect intended to SA. But if 
it could absolutely identify the sending domain, with confidence, that 
would be a big step forward.


Spammers could still abuse it. But then reputation would really mean something.

I know that it's a complicated problem. And it's a social problem. We 
admins can't agree on a solution. A month ago I was pretty ignorant, 
didn't know about sender protection schemes, etc. So I still kind of 
have a foot in both the camps of enlightenment and ignorance. But I can 
report that there are still a lot of people in that ignorance camp, over 
there. And I certainly cannot claim to be entirely in the enlightenment 
camp.


But such is life. ;-)

That said, I *do* think that it is possible for email to work just the 
way users expect it to work, based upon their experience with it an 
arbitrary number of years ago. It all depends upon admins as a 
community. Or we might fail. I dunno.


I'm just glad SA is here to fill in the gap, and perhaps to herd in a 
better future.


-Steve



Re: getting tons of SPAM

2014-07-01 Thread Steve Bergman

On 07/01/2014 06:09 PM, RW wrote:
 I'm sceptical about the use of Dovecot-Antispam with Spamassassin.
 The problem is that it trains on SpamAssassin errors rather than Bayes
 errors. It may be possible to get sufficient spam this way, but ham
 is learned very slowly through avoidable FPs.


We currently (early days for this installation) get plenty of spam for 
the users to train by moving it to the junk folder. Ham was the problem. 
Dovecot does nothing about training ham. That's why I have a line in the 
users' default .forward file to train incoming mail as ham. Then if they 
or Thunderbird decide to move the mail to Junk, it gets re-trained as spam.


dovecot-antispam is *not* a complete solution, so far as I can see.

At this early stage, it *is* painful to watch all that spam coming in 
over the weekend getting trained as ham. I tell my users to mark it as 
spam on Monday morning. And if they don't, I just figure it's not my fault.


Once the token databases get larger there won't be so much potential 
flux back and forth, I guess.


-Steve


Re: getting tons of SPAM

2014-07-01 Thread Karsten Bräckelmann
On Tue, 2014-07-01 at 12:33 -0700, motty cruz wrote:
 I trained SA with about 700 SPAM emails and with about 258 HAM
 emails.  

 X-Spam-Status: No, score=0.003 tagged_above=-999 required=5.3
   tests=[DKIM_SIGNED=0.001, HTML_IMAGE_RATIO_06=0.001,
   HTML_MESSAGE=0.001, T_DKIM_INVALID=0.01,
   T_RP_MATCHES_RCVD=-0.01] autolearn=no

There's no BAYES_* rule hit. That means your manual training of ham and
spam has been done as the wrong user. You need to do the training as the
same user Amavis / SA runs as.


Earlier header pastes suggest you are using catch-all. Just, don't.

Not using catch-all will *significantly* reduce the amount of spam,
simply by completely eliminating the bulk of spam to otherwise false
addresses.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Bayes, Manual and Auto Learning Strategies (was: Re: getting tons of SPAM)

2014-07-01 Thread Karsten Bräckelmann
On Tue, 2014-07-01 at 18:43 -0500, Steve Bergman wrote:
 On 07/01/2014 06:09 PM, RW wrote:
  I'm sceptical about the use of Dovecot-Antispam with Spamassassin.
  The problem is that it trains on SpamAssassin errors rather than Bayes
  errors. It may be possible to get sufficient spam this way, but ham
  is learned very slowly through avoidable FPs.
 
 We currently (early days for this installation) get plenty of spam for 
 the users to train by moving it to the junk folder. Ham was the problem. 
 Dovecot does nothing about training ham.

Dovecot (and its antispam plugin) does nothing about training ham,
either. It offers target folders and triggers, for easy manual (re-)
classification -- and thus training -- of ham and spam.

 That's why I have a line in the users' default .forward file to train
 incoming mail as ham.

That's pretty bad practice. Fundamentally, you are implementing a custom
auto-learn flavor, overruling the SA configurable auto-learn behavior
and ignoring all safety concepts implemented by SA. There's a reason for
the ham and spam learning thresholds, and the ham threshold to be 0.1 by
default, *not* equaling required_score's default of 5.0.

 Then if they or Thunderbird decide to move the mail to Junk, it gets
 re-trained as spam.

So if a user in a hurry simply deletes some spam, it will remain ham, as
far as Bayes is concerned.


 dovecot-antispam is *not* a complete solution, so far as I can see.
 
 At this early stage, it *is* painful to watch all that spam coming in 
 over the weekend getting trained as ham. I tell my users to mark it as 
 spam on Monday morning. And if they don't, I just figure it's not my fault.

It is your fault to implement a broken training strategy.

 Once the token databases get larger there won't be so much potential 
 flux back and forth, I guess.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: getting tons of SPAM

2014-07-01 Thread Steve Bergman

On 07/01/2014 05:07 PM, motty cruz wrote:

If it needs to be *instant*, have them visit a web page to enter service
requests.



Because there's no way that web-based email forms can be abused.

Please. The whole delay thing is about the ridiculous greylisting kluge. 
There are plenty of other spam avoidance kluges which don't involve 
significant delay. I really can't believe what I'm hearing here. It has 
little to nothing to do with reality. Spam is a problem. But you don't 
have to make your users wait hours for important emails by making your 
mail servers play hard to get games with each other.


This is just silly.

If I forwarded this conversation to my email users, they'd be ROTFL over 
what the experts are saying about the tool they use daily.


It has problems. But long delays would be unacceptable. And http can't 
really replace all its functionality. Web email forms are slow, 
limiting, and annoying.


No offense intended. But that's honestly the way I see it.

-Steve


Re: Funky HARP Spam

2014-07-01 Thread Philip Prindeville

On Jun 27, 2014, at 12:34 PM, Philip Prindeville 
philipp_s...@redfish-solutions.com wrote:

 
 On Jun 27, 2014, at 7:30 AM, RW rwmailli...@googlemail.com wrote:
 
 
 As I mentioned before, the real violation is in the previous mime
 section, which claims 7bit, but contains octets with the high-bit set. 
 
 
 Yup.  Just submitted a patch for this:
 
 https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063
 

Loving this filter!  It’s catching 50% or more of our SPAM



Re: Bayes, Manual and Auto Learning Strategies

2014-07-01 Thread Steve Bergman



On 07/01/2014 07:32 PM, Karsten Bräckelmann wrote:


That's pretty bad practice. Fundamentally, you are implementing a custom
auto-learn flavor, overruling the SA configurable auto-learn behavior


SA's autolearn behavior doesn't make much sense. I have no confidence in it.

This method shields the user from the worst of the spam, while giving 
them full control of what gets relearned as spam.



and ignoring all safety concepts implemented by SA.


What safety concepts? autolearn is a complete joke. Even the docs 
explain that it's only there as a last resort method of kinda sorta 
training the spam filter.




So if a user in a hurry simply deletes some spam, it will remain ham, as
far as Bayes is concerned.


Same as with Thunderbird, I think. And it's working very well for them. 
If they act irresponsibly, they'll get more spam. It takes no longer to 
highlight the spam and click Junk than it does to highlight the spam 
and click Delete.


I've pretty much decided at this point that if the users don't do what I 
tell them to do, repeatedly, then what results is not my responsibility.


And it's not.

The alternative is to not mark incoming mail as ham, and allow the SA 
Bayesian filter to remain inactive forever.


I opted to give the users the choice of being responsible for sorting, 
and reaping the benefits of that if they do. And yes, I know that some 
are not going to.


I'd be interested if you have a better solution in mind.

-Steve


Re: Bayes, Manual and Auto Learning Strategies

2014-07-01 Thread Steve Bergman



On 07/01/2014 07:32 PM, Karsten Bräckelmann wrote:


That's pretty bad practice. Fundamentally, you are implementing a custom
auto-learn flavor, overruling the SA configurable auto-learn behavior


BTW, that reminds me of a question I had been meaning to ask on the 
list. Autolearn. There's very little written about it, so far as I am 
aware. But what I have gleaned from old posts is that it is 
system-wide and in-memory. Now, I have Spamass-milter set to run SA 3.3 
as the recipient user, using the filedb backend. So in 3.3, is autolearn 
system wide and in memory, or per user and on disk?


This makes a difference regarding what Karsten and I are discussing. I 
don't suppose I would object to being wrong. But I have a feeling that 
I'm right.


-Steve


Re: getting tons of SPAM

2014-07-01 Thread Daniel Staal
--As of July 1, 2014 7:39:43 PM -0500, Steve Bergman is alleged to have 
said:



On 07/01/2014 05:07 PM, motty cruz wrote:

If it needs to be *instant*, have them visit a web page to enter service
requests.



Because there's no way that web-based email forms can be abused.

Please. The whole delay thing is about the ridiculous greylisting kluge.
There are plenty of other spam avoidance kluges which don't involve
significant delay. I really can't believe what I'm hearing here. It has
little to nothing to do with reality. Spam is a problem. But you don't
have to make your users wait hours for important emails by making your
mail servers play hard to get games with each other.

This is just silly.

If I forwarded this conversation to my email users, they'd be ROTFL over
what the experts are saying about the tool they use daily.

It has problems. But long delays would be unacceptable. And http can't
really replace all its functionality. Web email forms are slow,
limiting, and annoying.


--As for the rest, it is mine.

95+% of the time, email is immediate, true.  But it is not uncommon for 
mail to be delayed for hours or days either, even without greylisting.  It 
happens in the wild all the time, even (especially...) with the big 
providers.  Email is also not 100% reliable: It is a best-effort service 
and can and does drop messages on occasion.  (With varying degrees of 
notification: By the spec, notification should always happen, but 
experience says that causes backscatter, so it's not always by the spec.)


If you need an immediate, reliable communication method email will appear 
to work - but will randomly fail, and there will be *nothing you can do 
about it.*  If that's what your users are expecting you are doing a 
*disservice* to your users, because it *won't work.*


There are solutions that will, which have higher overhead costs than email. 
A password-protected web form is better - it won't fail silently.  Or there 
are specialist messaging protocols.  But if your users are expecting email 
to be that solution you are going to give yourself headaches.


Now, if 'most of the time' immediate communication is enough, that's fine. 
It may not be worth it for you to implement a higher reliability protocol - 
they cost time and money.  (I used to work for a company whose sole product 
was a 100% reliable communication protocol.)  But don't complain when it 
fails, because it will, and both you and the users need to expect that.


Daniel T. Staal

---
This email copyright the author.  Unless otherwise noted, you
are expressly allowed to retransmit, quote, or otherwise use
the contents for non-commercial purposes.  This copyright will
expire 5 years after the author's death, or in 30 years,
whichever is longer, unless such a period is in excess of
local copyright law.
---


Re: getting tons of SPAM

2014-07-01 Thread Steve Bergman

95+% of the time, email is immediate, true.


More like 99%+ of the time. When it's not, I hear about it.


But it is not uncommon for
mail to be delayed for hours or days either,


It's uncommon enough that when it does happen I get a phone call about a 
user not being able to receive email.




even without greylisting.


Greylisting is an ugly hack that I'm hesitant to even dignify by making 
it the topic of serious conversation.


I'm not at all sure what you're talking about regarding email vs web 
form reliability. What are the links in that chain?


The email client can malfunction in some way. But then again, so can a 
browser. The sending server can malfunction in some way. But so can the 
web proxy. The WAN link can go down on the sending side. But then, that 
can happen with both web and email. The receiving side's WAN can go down 
too. But in the case of a mail server it tries and tries and tries to 
get the message through as quickly as possible. The browser and proxy 
server certainly don't. They just drop it if anything goes wrong.


You tell me that email is unreliable. And yet anyone can see that it 
*is* quite reliable, until you, as a mail admin, foolishly introduce the 
self-DOSing technique of greylisting, and fall on your own sword.


You can go on about how it makes sense to fall on your sword. But I'm a 
realist, and not buying it.


Have fun in your ivory tower.

I'll also be typing this post up, putting a stamp on it, and mailing it. 
It might reach you there faster. ;-)


How many people here actually use greylisting and don't get complaints?

Our ISP, who previously handled our email certainly didn't introduce any 
noticeable delays. And nobody ever got a noticeable amount of spam, or 
reported to me a missed or late email.


Amazing, IMO. But it was obviously done without the ridiculous and 
unacceptable practice of greylisting.


I want to achieve the results that Windstream does.

-Steve


Re: getting tons of SPAM

2014-07-01 Thread Steve Bergman

I said:

 Have fun in your ivory tower.


Please permit me to retroactively back this line out of my previous 
post. The smiley on the next line was intended to cover it. But it just 
came out sounding nasty.


My amygdala's been acting up lately. ;-)

-Steve


Re: Bayes, Manual and Auto Learning Strategies

2014-07-01 Thread Karsten Bräckelmann
On Tue, 2014-07-01 at 20:36 -0500, Steve Bergman wrote:
 On 07/01/2014 07:32 PM, Karsten Bräckelmann wrote:
 
  That's pretty bad practice. Fundamentally, you are implementing a custom
  auto-learn flavor, overruling the SA configurable auto-learn behavior
 
 SA's autolearn behavior doesn't make much sense. I have no confidence in it.

The auto-learning feature is NOT meant to be a fully automated training
system. It's an aid for the user to eliminate the need to care about the
extremes, while focusing on the close-calls. There are options to tweak
to your specific needs, and there even is no single SA autolearn
behavior as you stated, but different flavors. And an option to turn it
off.

Frankly, it appears you don't understand what auto-learning is.

 This method shields the user from the worst of the spam, while giving 
 them full control of what gets relearned as spam.

Wrong. It is not this (your) method, that shields the user from the
worst of the spam. That's SA. Not your style of auto-training.

And unless you disabled Bayes auto-learning in SA (dunno, might have
been mentioned deep in the thread), the user does not have full control
of what gets relearned as spam.


  and ignoring all safety concepts implemented by SA.
 
 What safety concepts? autolearn is a complete joke. Even the docs 
 explain that it's only there as a last resort method of kinda sorta 
 training the spam filter.

You are doing (custom) auto-learning as ham of any message with a score
less than required_score of 5.0. *That* is a joke.

(Besides, you *are* doing auto-learning, which you just claimed to be a
complete joke.)

At this point I won't get into details. It should suffice to highlight
that a default ham auto-learning threshold of 0.1 is part of the safety
concepts. (See the M::SA::Plugin::AutoLearnThreshold man-page for more.)


  So if a user in a hurry simply deletes some spam, it will remain ham, as
  far as Bayes is concerned.
 
 Same as with Thunderbird, I think.

I never checked the TB internal Bayes implementation and auto-learn
strategy, but I'd be surprised if they do train on black/white, without
any gray area in between.

You stated it. Please back up your claim.


 And it's working very well for them. 
 If they act irresponsibly, they'll get more spam. It takes no longer to 
 highlight the spam and click Junk than it does to highlight the spam 
 and click Delete.

While I am aware I'm not the average user -- there's a delete action
key on my keyboard. There's no junk equivalent. Yes, I avoid using the
mouse if keyboard interaction is more productive...


 I've pretty much decided at this point that if the users don't do what I 
 tell them to do, repeatedly, then what results is not my responsibility.
 
 And it's not.

Do you hate your users or your job? (Sorry, snide-remark I couldn't
resist. Feel free to ignore.)

 The alternative is to not mark incoming mail as ham, and allow the SA 
 Bayesian filter to remain inactive forever.

No. I can only guess, but it appears there are some mis-interpretations
in that conclusion.

The "SA Bayesian classifier to remain inactive forever" can only refer
to insufficient initial training. Manual training. Of at least 200 ham
and spam each (by default, you can lower that to 0). You will easily get
that by manual training of existing messages. And even default auto-
learning would eventually cross the ham number. Less than forever.

More importantly, SA still marks (classifies) incoming mail as ham. Just
because its overall score is less than 5.0. It just does not *learn* all
of them as ham. Because there's a chance it might not actually be ham,
but a FN.

That area, between (default) auto-learning as ham and classifying as
spam is the gray area, where actual user input is of much value. For
both, learning spam AND ham, for that matter. In particular, because
generally (and as SA principle), a FP is *much* worse than a FN.
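
The three score regions described above can be sketched as follows. This is only a simplified illustration, assuming the documented defaults (required_score 5.0, bayes_auto_learn_threshold_nonspam 0.1, bayes_auto_learn_threshold_spam 12.0); the function names are made up, and real auto-learning applies further conditions that are not modelled here.

```python
# Simplified sketch of the score regions: auto-learn ham, gray area,
# auto-learn spam. Thresholds are SpamAssassin's documented defaults;
# the extra conditions real auto-learning checks are NOT modelled.
REQUIRED_SCORE = 5.0        # required_score
AUTOLEARN_HAM_MAX = 0.1     # bayes_auto_learn_threshold_nonspam
AUTOLEARN_SPAM_MIN = 12.0   # bayes_auto_learn_threshold_spam


def classify(score):
    """Delivery verdict: what the message is tagged as."""
    return "spam" if score >= REQUIRED_SCORE else "ham"


def autolearn_action(score):
    """What default auto-learning would do (hypothetical helper)."""
    if score < AUTOLEARN_HAM_MAX:
        return "learn-ham"
    if score >= AUTOLEARN_SPAM_MIN:
        return "learn-spam"
    return "no-learning"    # gray area: left to manual training


if __name__ == "__main__":
    for s in (-1.0, 3.2, 7.5, 15.0):
        print(s, classify(s), autolearn_action(s))
```

A message scoring 3.2 is classified as ham but not learned as ham; force-learning it is exactly the step that removes the gray area.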


Your approach of force learning those as ham, is biasing your Bayes DB.
At the very least temporarily (unless a fresh spam campaign has been
re-trained by your users on Monday). At worst, until you clear it.

Btw, is that per-user, or are you gambling a site-wide Bayes DB?


 I opted to give the users the choice of being responsible for sorting, 
 and reaping the benefits of that if they do. And yes, I know that some 
 are not going to.
 
 I'd be interested if you have a better solution in mind.

Do not auto-learn ham every message that scores below required_score.

Introduce train-on-error for your users, with an extended manual
training option. Specific ham and spam folders, where moving or copying
mail into trains the Bayes classifier. Kind of optional for the user,
unless they feel there's too much mis-classification.
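
A train-on-error setup along these lines could be driven by a small periodic job. The sketch below only builds the sa-learn command lines and does not run them; the folder names .LearnHam and .LearnSpam and the helper names are assumptions for illustration, not something SA or Dovecot prescribes.

```python
from pathlib import PurePosixPath

# Hypothetical Maildir++ folder names; pick whatever your users see.
TRAINING_FOLDERS = {"ham": ".LearnHam", "spam": ".LearnSpam"}


def sa_learn_command(folder, kind):
    """Build (not run) an sa-learn call for one Maildir folder."""
    if kind not in TRAINING_FOLDERS:
        raise ValueError("kind must be 'ham' or 'spam'")
    # sa-learn accepts a directory of messages after --ham / --spam.
    return ["sa-learn", f"--{kind}", str(PurePosixPath(folder) / "cur")]


def training_commands(maildir):
    """Commands to train from a user's dedicated ham and spam folders."""
    base = PurePosixPath(maildir)
    return [
        sa_learn_command(base / TRAINING_FOLDERS[kind], kind)
        for kind in ("ham", "spam")
    ]


if __name__ == "__main__":
    for cmd in training_commands("/home/alice/Maildir"):
        print(" ".join(cmd))
```

Run as the mailbox owner, so that the training lands in that user's Bayes database. Mail left in the Inbox is not trained at all.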


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: getting tons of SPAM

2014-07-01 Thread Daniel Reynolds
It seems to me that grey listing could be useful for small non time
critical email servers, such as hobbyist setups, but for business, grey
listing is not the way to go.
On Jul 1, 2014 10:48 PM, Steve Bergman sbergma...@gmail.com wrote:

 I said:

  Have fun in your ivory tower.


 Please permit me to retroactively back this line out of my previous post.
 The smiley on the next line was intended to cover it. But it just came out
 sounding nasty.

 My amygdala's been acting up lately. ;-)

 -Steve



Re: Bayes, Manual and Auto Learning Strategies

2014-07-01 Thread Steve Bergman



On 07/01/2014 09:53 PM, Karsten Bräckelmann wrote:


Frankly, it appears you don't understand what auto-learning is.


So please specify, explicitly, what it is. I asked some specific 
questions about it. And I'm very interested in the answers.


Is auto-learn still system-wide? I'd need it to apply to individual 
users. Is it in-memory only? Or can I have it update the users' filedb 
token databases?


If it's now per user and uses the user databases, then I am more than 
ready to reconsider my opinion. But I've not been able to get a clear 
answer to this. I haven't had an opportunity to test. And I'd want 
confirmation from someone in the know anyway, before I changed strategies.





This method shields the user from the worst of the spam, while giving
them full control of what gets relearned as spam.


Wrong. It is not this (your) method, that shields the user from the
worst of the spam. That's SA. Not your style of auto-training.



Mine is not autotraining at all. It's giving the user a way of 
explicitly training the backend spam filter.



And unless you disabled Bayes auto-learning in SA (dunno, might have
been mentioned deep in the thread), the user does not have full control
of what gets relearned as spam.



I have disabled autolearning. I thought I mentioned that to you.



(Besides, you *are* doing auto-learning, which you just claimed to be a
complete joke.)


No. The messages are assumed ham until the user classifies it as spam. 
It is explicit learning. Under user control.




At this point I won't get into details. It should suffice to highlight
that a default ham auto-learning threshold of 0.1 is part of the safety
concepts. (See the M::SA::Plugin::AutoLearnThreshold man-page for more.)



I really don't think you understand what it is I'm doing. Anything below 
a score of 5.0 goes into their mailbox and learned as ham. If it's ham, 
that's great. If it's spam, they move it to Junk and it gets learned as 
spam. auto-learn is as brain dead as the defunct AWL.




I never checked the TB internal Bayes implementation and auto-learn
strategy, but I'd be surprised if they do train on black/white, without
any gray area in between.


Optimally, I would have an incoming folder and then the user could 
manually move the messages from there to spam or ham. But considering 
that this was not even remotely necessary with our old email provider, I 
don't feel that I can put my users to that level of extra trouble that 
they never even thought about having to deal with before, just because 
SA is not performing as well as the spam filter they are used to. The 
mail needs to go into the inbox directly. And for SA's bayesian to work, 
it needs to be assumed as ham initially.


The only thing I see which might change my view would be explicit 
details about where autolearn stores its data and how it is used on a 
per user basis.


-Steve



Re: Bayes, Manual and Auto Learning Strategies

2014-07-01 Thread Karsten Bräckelmann
On Tue, 2014-07-01 at 20:53 -0500, Steve Bergman wrote:
 On 07/01/2014 07:32 PM, Karsten Bräckelmann wrote:
 
  That's pretty bad practice. Fundamentally, you are implementing a custom
  auto-learn flavor, overruling the SA configurable auto-learn behavior
 
 BTW, that reminds me of a question I had been meaning to ask on the 
 list. Autolearn. There's very little written about it, so far as I am 

http://spamassassin.apache.org/doc/Mail_SpamAssassin_Conf.html
http://spamassassin.apache.org/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html

 aware. But from what I have gleaned, from old posts, is that it is 
 system-wide and in-memory.

It depends on how you call SA (SMTP or MDA level). SA itself is a
filter, called by your mail-processing chain. Thus, there is no SA
default context of system-wide or per-user. It depends on how you call
it.


 Now, I have Spamass-milter set to run SA 3.3 
 as the recipient user, using the filedb backend. So in 3.3, is autolearn 
 system wide and in memory, or per user and on disk?

Milter usually means system-wide. (But since you just asked, it is.)

Which, referring to my previous post, also means, a single sloppy user
deleting your custom-auto-learned FN ham messages affects all your other
users. Or a non-sloppy, but on-vacation-mode user.

Moreover, there is no in-memory only, not on-disk mode. Unless you don't
have to ask about it.


 This makes a difference regarding what Karsten and I are discussing. I 
 don't suppose I would object to being wrong. But I have a feeling that 
 I'm right.

Irrespective of your feeling -- cheers!  /me having a beer


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: getting tons of SPAM

2014-07-01 Thread Steve Bergman

On 07/01/2014 10:11 PM, Daniel Reynolds wrote:

It seems to me that grey listing could be useful for small non time
critical email servers, such as hobbyist setups, but for business, grey
listing is not the way to go.


Indeed. We should always remember that our workloads are *not* the only 
ones out there. There are many different kinds.


Greylisting with postgrey might even work for us, after a teething 
period of building up the necessary (and rather large) whitelist of 
sending servers.


I might even do it if I didn't feel I was being compared side-by-side 
with WindstreamHosting, who delivered neither spam nor delays nor 
noticeable false positives. The gods only know how they manage that.


But I'm learning. And I've gotten some very helpful posts from members 
of this list today, both on the list and privately. I should be able to 
do this without ugly hacks like greylisting.


That said, for my own home use, I'm perfectly fine with ugly hacks. I do 
them all the time.


I've had the whole place done over with Ugly Hack wallpaper. It's 
great. :-)


-Steve


Re: Bayes, Manual and Auto Learning Strategies

2014-07-01 Thread Steve Bergman



On 07/01/2014 10:21 PM, Karsten Bräckelmann wrote:


http://spamassassin.apache.org/doc/Mail_SpamAssassin_Conf.html
http://spamassassin.apache.org/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html


I've read those over and over. It never says anything about where the 
data is maintained, or whether it's per-user or not. The *only* solid 
claim I have is a ten year old (yes, at the dawn of SA Bayes) post which 
specifically says it's in memory, system-wide, and lost upon SA restart.



Milter usually means system-wide. (But since you just asked, it is.)


I'm using spamass-milter. It suid's to the recipient user for most 
mails. For aliases it defaults to a particular user who gets an 
unbelievable amount of spam at the gate, and whom I know sorts his 
ham/spam religiously.




Which, referring to my previous post, also means, a single sloppy user
deleting your custom-auto-learned FN ham messages affects all your other
users.


No. I make sure to keep each user solely responsible for their own email 
welfare.



Irrespective of your feeling -- cheers!  /me having a beer


Whew! After the conversations I've had here, today, I need one, too! ;-)


-Steve



Re: getting tons of SPAM

2014-07-01 Thread John Hardin

On Tue, 1 Jul 2014, Steve Bergman wrote:

I'm not at all sure what you're talking about regarding email vs web form 
reliability. What are the links in that chain?


The email client can malfunction in some way. But then again, so can a 
browser. The sending server can malfunction in some way. But so can the web 
proxy. The WAN link can go down on the sending side. But then, that can 
happen with both web and email. The receiving side's WAN can go down too. But 
in the case of a mail server it tries and tries and tries to get the message 
through as quickly as possible. The browser and proxy server certainly don't. 
They just drop it if anything goes wrong.


But the user *sees* that failure *immediately* and can fall back to an 
alternate method of communication, say, a telephone call, if the situation 
is as urgent as you portray.


Email is store-and-forward best-effort with *no guarantees* of timely 
delivery, no matter how well it performs 99% of the time. An email message 
can get stuck for a day or more at any (or even all) of the intermediate 
hops, and the system is *working properly* if it is ultimately delivered, 
or a notification is eventually sent back to the user that it cannot be 
delivered.


And greylisting is a perfectly valid way to behave within the defined 
communications protocol. It fails because poor admins set the delivery 
retry time to an absurdly-long period, or poor programmers write MTAs that 
don't even know *how* to retry.
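
For readers unfamiliar with the mechanism, the core of greylisting can be sketched in a few lines. This is a minimal illustration, not postgrey's implementation: the class and method names are invented, the 300 second delay is just an example value, and real implementations add whitelists and expiry of old entries.

```python
class Greylist:
    """Toy greylist keyed on the (client IP, sender, recipient) triplet."""

    def __init__(self, delay=300):
        self.delay = delay          # seconds a new triplet must wait
        self.first_seen = {}        # triplet -> time of first attempt

    def check(self, client_ip, sender, recipient, now):
        """Return an SMTP-style reply for a delivery attempt at time `now`."""
        key = (client_ip, sender.lower(), recipient.lower())
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.delay:
            return "250 OK"
        # Temporary failure: a standards-following MTA queues and retries.
        return "450 Greylisted, please try again later"


if __name__ == "__main__":
    g = Greylist(delay=300)
    args = ("192.0.2.1", "alice@example.org", "bob@example.net")
    print(g.check(*args, now=0))      # first attempt is deferred
    print(g.check(*args, now=600))    # retry after the delay is accepted
```

The delay the recipient actually experiences is set by the sending MTA's retry schedule, not by the greylist delay itself, which is the failure mode described above.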


FWIW, I did not say, and did not have in mind a web-email form when I made 
my suggestion. I had in mind a more-direct interface to the trouble ticket 
management system. Of course, I may be assuming a more-sophisticated 
operation than is the case.


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  News flash: Lowest Common Denominator down 50 points
---
 3 days until the 238th anniversary of the Declaration of Independence


Re: Bayes, Manual and Auto Learning Strategies

2014-07-01 Thread John Hardin

On Tue, 1 Jul 2014, Steve Bergman wrote:




On 07/01/2014 10:21 PM, Karsten Bräckelmann wrote:


http://spamassassin.apache.org/doc/Mail_SpamAssassin_Conf.html
http://spamassassin.apache.org/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html


I've read those over and over. It never says anything about where the data is 
maintained, or whether it's per-user or not. The *only* solid claim I have is 
a ten year old (yes, at the dawn of SA Bayes) post which specifically says 
it's in memory, system-wide, and lost upon SA restart.


Autolearn trains the bayes database. The bayes data is stored wherever you 
configured it to be stored, in a DBM database or SQL or redis, and it's 
per-user if you configure per-user Bayes databases and scan emails using 
different usernames (vs. a global user like root or amavis).
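
For the DBM file backend, this can be pictured as follows. A minimal sketch, assuming the default bayes_path of ~/.spamassassin/bayes, which yields files named bayes_toks and bayes_seen under the home directory of whichever user the scan runs as; the helper name bayes_files is made up for illustration.

```python
from pathlib import PurePosixPath


def bayes_files(home, bayes_path=".spamassassin/bayes"):
    """Expected DBM file names for the user whose home directory is `home`.

    `bayes_path` mirrors the SA option of the same name, given here
    relative to the home directory (assumption: the default is in use).
    """
    base = PurePosixPath(home) / bayes_path
    return [f"{base}_toks", f"{base}_seen"]


if __name__ == "__main__":
    # A global scanning user and an individual user get separate databases.
    print(bayes_files("/var/lib/amavis"))
    print(bayes_files("/home/alice"))
```

Auto-learning and manual sa-learn runs both write to these same files, so checking which files change after a scan shows whose database is being trained.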


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  News flash: Lowest Common Denominator down 50 points
---
 3 days until the 238th anniversary of the Declaration of Independence

Re: Bayes, Manual and Auto Learning Strategies

2014-07-01 Thread Karsten Bräckelmann
On Tue, 2014-07-01 at 22:18 -0500, Steve Bergman wrote:
 On 07/01/2014 09:53 PM, Karsten Bräckelmann wrote:
 
  Frankly, it appears you don't understand what auto-learning is.
 
 So please specify, explicitly, what it is. I asked some specific 
 questions about it. And I'm very interested in the answers.

If you want my opinion, please re-phrase your questions. I locally
deleted most of this previous (originally unrelated) thread.

 Is auto-learn still system-wide? I'd need it to apply to individual 
 users. Is it in-memory only? Or can I have it update the users' filedb 
 token databases?

SA itself never was system-wide, nor user-specific. It is both, can
be either. It depends on the context of calling SA.


 If it's now per user and uses the user databases, then I am more than 
 ready to reconsider my opinion. But I've not been able to get a clear 
 answer to this. I haven't had an opportunity to test. And I'd want 
 confirmation from someone in the know anyway, before I changed strategies.

It does not depend on SA, but on how you invoke SA. We cannot give you a
clear answer. It depends on your system, your SMTP, glue, system wide
calling of SA, and possibly per-user invocations even after system-wide.

To be clear: SA is a filter. It does nothing itself, other than
classification. Being called, and at which point, is outside the scope
of SA. Rejecting, deleting, delivering or any other kind of action is
outside the scope of SA. That's actions performed by the calling layer,
based on the result of SA evaluation.
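
The split between classification and action can be sketched like this. It is an illustration only, assuming the usual X-Spam-Status header layout ("Yes, score=... required=... tests=..."); the function names and the folder names INBOX and Junk are assumptions standing in for whatever the calling layer does.

```python
import re

# Assumed layout of the X-Spam-Status header value added by SA.
STATUS_RE = re.compile(
    r"^(Yes|No),\s*score=(-?\d+(?:\.\d+)?)\s+required=(-?\d+(?:\.\d+)?)"
)


def parse_spam_status(value):
    """Return (is_spam, score, required), or None if unparseable."""
    m = STATUS_RE.match(value.strip())
    if not m:
        return None
    return m.group(1) == "Yes", float(m.group(2)), float(m.group(3))


def choose_folder(value):
    """The calling layer's decision; SA itself only supplied the verdict."""
    parsed = parse_spam_status(value)
    if parsed is None:
        return "INBOX"      # unscanned or odd header: deliver normally
    is_spam, _score, _required = parsed
    return "Junk" if is_spam else "INBOX"


if __name__ == "__main__":
    print(choose_folder("Yes, score=7.3 required=5.0 tests=BAYES_99"))
    print(choose_folder("No, score=-0.4 required=5.0 tests=BAYES_00"))
```

Whether the equivalent of choose_folder runs in a milter, an MDA or a sieve script, and as which user, is what determines the per-user or site-wide behaviour.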


  This method shields the user from the worst of the spam, while giving
  them full control of what gets relearned as spam.
 
  Wrong. It is not this (your) method, that shields the user from the
  worst of the spam. That's SA. Not your style of auto-training.
 
 Mine is not autotraining at all. It's giving the user a way of 
 explicitly training the backend spam filter.

Quoting your previous post, you have a line in the users' default
.forward file to train incoming mail as ham. That is auto-training.

  (Besides, you *are* doing auto-learning, which you just claimed to be a
  complete joke.)
 
 No. The messages are assumed ham until the user classifies it as spam. 
 It is explicit learning. Under user control.

Being assumed is not the same as being treated and automatically
reinforced. The latter is what you do. (And btw, Yes. You are
auto-learning.)


  At this point I won't get into details. It should suffice to highlight
  that a default ham auto-learning threshold of 0.1 is part of the safety
  concepts. (See the M::SA::Plugin::AutoLearnThreshold man-page for more.)
 
 I really don't think you understand what it is I'm doing. Anything below 
 a score of 5.0 goes into their mailbox and learned as ham. If it's ham, 
 that's great. If it's spam, they move it to Junk and it gets learned as 
 spam. auto-learn is as brain dead as the defunct AWL.

I perfectly understood what you are doing.

You didn't understand why that is bad. Failing to explain might be my
bad, though I'll leave re-explaining for tomorrow my timezone. Or you
carefully re-reading my posts.


  I never checked the TB internal Bayes implementation and auto-learn
  strategy, but I'd be surprised if they do train on black/white, without
  any gray area in between.
 
 Optimally, I would have an incoming folder and then the user could 
 manually move the messages from there to spam or ham. But considering 

Which is basically what you came from, using Dovecot antispam plugin
with SA, and dedicated folders where the user could manually move the
messages to. Why didn't you just set that up?

(Hint: That's your set-up without auto-learning ham Inbox deliveries.)

 that this was not even remotely necessary with our old email provider, I 
 don't feel that I can put my users to that level of extra trouble that 
 they never even thought about having to deal with before, just because 
 SA is not performing as well as the spam filter they are used to. The 

Do initial manual training. Then get back to us.

 mail needs to go into the inbox directly. And for SA's bayesian to work, 
 it needs to be assumed as ham initially.

No.

It seems your previous email provider, whatever that might be, had
some sort of spam filtering service. Now you're on your own.

Which you are, unless you decide to ask for free (as in beer) support by
the community providing the software for free (as in speech) to help you
weed out the spam. You did ask, which is just fine, but your assumptions
are kind of hostile. Like your previous email provider would not use
SA internally. He most likely does.


 The only thing I see which might change my view would be explicit 
 details about where autolearn stores its data and how it is used on a 
 per user basis.

So the only thing that might change your view would be reading the docs.
Go read them.

Auto-learn stores its data exactly where Bayes generally stores its
data. In fact, it is the same. Just being triggered 

Re: getting tons of SPAM

2014-07-01 Thread Daniel Staal
--As of July 1, 2014 9:40:05 PM -0500, Steve Bergman is alleged to have 
said:



95+% of the time, email is immediate, true.


More like 99%+ of the time. When it's not, I hear about it.


But it is not uncommon for
mail to be delayed for hours or days either,


It's uncommon enough that when it does happen I get a phone call about a
user not being able to receive email.


It's common enough that I saw it every day in my last job.  99.9% of the 
time the users didn't notice, or care.  On the other hand there were the 
times I had to show them the log files showing exactly when we got and sent 
the message, and had to have a talk about expectations.  (Nearly always the 
message had gone through our system in seconds.)



even without greylisting.


Greylisting is an ugly hack that I'm hesitant to even dignify by having
the topic of serious conversation.


I won't defend it.  I've never used it.  ;)


I'm not at all sure what you're talking about regarding email vs web form
reliability. What are the links in that chain?

The email client can malfunction in some way. But then again, so can a
browser. The sending server can malfunction in some way. But so can the
web proxy. The WAN link can go down on the sending side. But then, that
can happen with both web and email. The receiving side's WAN can go down
too. But in the case of a mail server it tries and tries and tries to get
the message through as quickly as possible. The browser and proxy server
certainly don't. They just drop it if anything goes wrong.


I only said that it won't fail silently: If you are depending on it for 
immediate communications, you'll know when you didn't get that, while with 
email it'll be hidden.


Maybe 'better' wasn't the right word: It's a trade off.  If you want the 
message to go through, email is set up to keep trying.  If you want the 
message to go *now*, the web form will tell you if it did (making the 
assumption that the form returns a 'message delivered' screen once it has 
delivered the message), and the user can try for another form of 
communication if it fails.



You tell me that email is unreliable. And yet anyone can see that it *is*
quite reliable, until you, as a mail admin, foolishly introduce the
self-DOSing technique of greylisting, and fall on your own sword.

You can go on about how it makes sense to fall on your sword. But I'm a
realist, and not buying it.


As I said: I've never used greylisting.  I have seen mail queues regularly 
holding messages for hours or days.  Email is fairly reliable - but I 
wouldn't let a user treat it as 100% reliable and immediate, because I know 
it isn't.  Better a few hard conversations about expectations and options 
than lost business due to using the wrong tool for the job.



I'll also be typing this post up, putting a stamp on it, and mailing it.
It might reach you there faster. ;-)


Not faster, but probably more reliable.  ;)


How many people here actually use greylisting and don't get complaints?

Our ISP, who previously handled our email certainly didn't introduce any
noticeable delays. And nobody ever got a noticeable amount of spam, or
reported to me a missed or late email.


Then they didn't notice them.  In the normal course of things, most mail 
gets through in seconds, and most of the delays are in the range of minutes 
to hours - short enough that people don't see them unless they are paying 
close attention.  (And they may not be checking mail that often anyway.)



Amazing, IMO. But it was obviously done without the ridiculous and
unacceptable practice of greylisting.

I want to achieve the results that Windstream does.


You probably can.  ;)  But I'm sure Windstream didn't get you every piece 
of mail immediately after it was sent - just as soon as they could after 
they got it.  I'm not even saying I like greylisting - I'm just saying you 
should work to set user expectations to reality, which is that email 
sometimes takes time to get delivered and (rarely) gets lost.  If something 
is absolutely time-critical, they should treat email as a backup, not the 
primary form of communication.  If it can spare an hour or two on occasion, 
email's fine.


Daniel T. Staal

---
This email copyright the author.  Unless otherwise noted, you
are expressly allowed to retransmit, quote, or otherwise use
the contents for non-commercial purposes.  This copyright will
expire 5 years after the author's death, or in 30 years,
whichever is longer, unless such a period is in excess of
local copyright law.
---


Re: getting tons of SPAM

2014-07-01 Thread Steve Bergman



On 07/01/2014 11:09 PM, John Hardin wrote:


FWIW, I did not say, and did not have in mind a web-email form when I
made my suggestion. I had in mind a more-direct interface to the trouble
ticket management system. Of course, I may be assuming a
more-sophisticated operation than is the case.


John,

What my users expect is the level of speed and reliability of email that 
they have always had over the years with our ISP, up until 2 months ago 
when I took over with our new server. It was fast, reliable, mostly 
spam-free, and free of false positives (that they ever noticed, anyway).


I can't go in and try to convince them that in the last 2 months that 
I've been in charge of the mail server that the world's email has become 
slow, unreliable, and spammy.


I've got to come up with a solution that is as good as what our ISP 
provided.


The good news is that by conservatively (OK, maybe not always so 
conservatively. I was a little desperate at first) adding strategies in 
Postfix and SA, I guess I'm nearly at parity with our old ISP. Allowing 
for a bit of sugar-coating of their descriptions of the good old days, 
maybe I'm even already there.


Until we do our server OS upgrade, I don't have postscreen. But the 1 
second sleep after smtpd connection seems to have been the finishing 
touch on our spam control. It seems about as effective as postgrey.


Personally, I detest those web mail forms. I, too, expect to be able to 
compose an email, send it, and have it received within a minute. And I 
do not think that to be an unreasonable expectation at all, as long as 
we administrators keep our feet on the ground and don't start doing 
stupid stuff like greylisting.


Though, as has been pointed out by Daniel, greylisting may be appropriate in 
certain contexts.


-Steve




Re: Bayes, Manual and Auto Learning Strategies

2014-07-01 Thread Karsten Bräckelmann
On Tue, 2014-07-01 at 22:40 -0500, Steve Bergman wrote:
 On 07/01/2014 10:21 PM, Karsten Bräckelmann wrote:
 
  http://spamassassin.apache.org/doc/Mail_SpamAssassin_Conf.html
  http://spamassassin.apache.org/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html
 
 I've read those over and over. It never says anything about where the 
 data is maintained, or whether it's per-user or not. The *only* solid 
 claim I have is a ten year old (yes, at the dawn of SA Bayes) post which 
 specifically says it's in memory, system-wide, and lost upon SA restart.

Those do not tell you about using file or SQL based databases? You never
thought about googling for "spamassassin per user" and friends? You
never checked the SA wiki?

FWIW, the links given do NOT refer to in-memory only at all.

An in-memory only Bayes database definitely is much more than ten years
ago. If it ever existed. No need for me to even check.

  Milter usually means system-wide. (But since you just asked, it is.)
 
 I'm using spamass-milter. It suid's to the recipient user for most 
 mails. For aliases it defaults to a particular user who gets an 
 unbelievable amount of spam at the gate, and whom I know sorts his 
 ham/spam religiously.

So you want to check back with your specific setup and its docs.
Suid'ing is pretty likely to be per-user, though the definition of user
is not specifically clear in the context of a milter (and the final
recipient).

In either case, that is not SA specific. (SA happily uses both, per-user
or site-wide config AND bayes database, depending on context.) Refer to
your milter's docs.


  Irrespective of your feeling -- cheers!  /me having a beer
 
 Whew! After the conversations I've had here, today, I need one, too! ;-)

Don't see this as an attack on you. It isn't. Just pointers on helping
your understanding of the situation and your issues. Not always gentle,
but that also reflects the initial stance.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: getting tons of SPAM

2014-07-01 Thread Steve Bergman

On 07/01/2014 11:15 PM, Daniel Staal wrote:


You probably can.  ;)  But I'm sure Windstream didn't get you every
piece of mail immediately after it was sent - just as soon as they could
after they got it.


Yeah. I'm conservatively holding myself to higher standards than is 
perhaps warranted. But I think that those standards are along the lines 
of what my long-time customer thought they were getting from Windstream. 
And if Windstream had too many issues, I think I would have heard about it.


And their servers *did* become unavailable for short periods from time 
to time.


But once I'm satisfied that I've reached parity, the real fun starts. We 
were on POP3. Now we're on our own IMAP. And there is Dovecot full text 
search in our near future. It will be fun to be able to go beyond and 
show off a little. My client company's CEO does a lot of full text 
searching over his email history.


  I'm not even saying I like greylisting - I'm just

saying you should work to set user expectations to reality,


When trust died on the Internet, telnet died, but somehow the 
unbelievably naive email system did not. It was never prepared for 
spammer abuse. And we're still accommodating to 7 bit systems for crying 
out loud. If it were material I suppose it would make a fine antique in 
someone's collection. Right alongside the PDP-11.


 which is

that email sometimes takes time to get delivered and (rarely) gets
lost.  If something is absolutely time-critical, they should treat email
as a backup,


I think that it's largely a matter of *people's* expectations and 
understanding. If a mail gets missed, folks can understand an occasional 
"I never got your email, we'll send someone over right away."


What I object to is the idea of regular and unpredictable delays as 
introduced by greylisting. And it's just plain ugly from an aesthetic 
standpoint. But then so are our current email protocols. But I do think 
that can be fixed.


Never did like texting. And that's the alternative.

-Steve



Re: Bayes, Manual and Auto Learning Strategies

2014-07-01 Thread Steve Bergman



On 07/01/2014 11:49 PM, Karsten Bräckelmann wrote:


Those do not tell you about using file or SQL based databases?


They do. But not specifically with respect to autolearn.

You never

thought about googling for spamassassin per user and friends? You
never checked the SA wiki?


I have, indeed. No reference to autolearn and persistent storage. The 
lack of mention is notable.


I'd expect people to be lining up to tell me I'm mistaken if I 
absolutely were.


Can you point me to a change log somewhere documenting autolearn moving 
from in-memory and system-wide to per user and persistent?


I don't hold a strong opinion on this. It would be nice if I were wrong. 
It would open more options.


I'm just waiting for evidence that it's the case. My perception is that 
it's not.


-Steve


Re: Bayes, Manual and Auto Learning Strategies

2014-07-01 Thread Steve Bergman



On 07/01/2014 11:14 PM, John Hardin wrote:


Autolearn trains the bayes database. The bayes data is stored wherever
you configured it to be stored, in a DBM database or SQL or redis, and
it's per-user if you configure per-user Bayes databases and scan emails
using different usernames (vs. a global user like root or amavis).



That is interesting. How sure are you of this? Because if you're pretty 
sure, it's a piece of information I've been keen to confirm for a while.


Odd, though, that before I set up .forward to train incoming mails as 
ham and disabled autolearn, no nhams were showing up in sa-learn --dump 
magic for the individual users. Just nspams.
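
Checking those counters per user can be scripted. The sketch below parses the text printed by sa-learn --dump magic, assuming the usual layout in which each line carries four numeric columns followed by "non-token data: <name>" with the value in the third column; the sample numbers are invented for illustration, and the 200 minimums are the documented defaults of bayes_min_ham_num and bayes_min_spam_num.

```python
def parse_dump_magic(text):
    """Map names such as 'nham' and 'nspam' to their integer values."""
    counts = {}
    for line in text.splitlines():
        if "non-token data:" not in line:
            continue
        fields, name = line.split("non-token data:", 1)
        cols = fields.split()
        # Assumed layout: prob, spam count, ham count/value, atime.
        if len(cols) >= 3 and cols[2].isdigit():
            counts[name.strip()] = int(cols[2])
    return counts


def bayes_is_active(counts, min_ham=200, min_spam=200):
    """True once both counters reach the configured minimums."""
    return (counts.get("nham", 0) >= min_ham
            and counts.get("nspam", 0) >= min_spam)


# Invented sample in the assumed layout, for illustration only.
SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        412          0  non-token data: nspam
0.000          0          0          0  non-token data: nham
"""

if __name__ == "__main__":
    counts = parse_dump_magic(SAMPLE)
    print(counts, bayes_is_active(counts))
```

An nham of 0 next to a growing nspam, as in the invented sample, would mean nothing is being learned as ham for that user, by auto-learning or otherwise.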


-Steve


Re: Bayes, Manual and Auto Learning Strategies

2014-07-01 Thread Axb

On 07/02/2014 07:19 AM, Steve Bergman wrote:



On 07/01/2014 11:49 PM, Karsten Bräckelmann wrote:


Those do not tell you about using file or SQL based databases?


They do. But not specifically with respect to autolearn.

You never

thought about googling for spamassassin per user and friends? You
never checked the SA wiki?


I have, indeed. No reference to autolearn and persistent storage. The
lack of mention is notable.

I'd expect people to be lining up to tell me I'm mistaken if I
absolutely were.

Can you point me to a change log somewhere documenting autolearn moving
from in-memory and system-wide to per user and persistent?

I don't hold a strong opinion on this. It would be nice if I were wrong.
It would open more options.

I'm just waiting for evidence that it's the case. My perception is that
it's not.


Let's turn this around. Can you prove autolearn was ever done to memory?

If you mean autolearn to journal, this is also file based.

I've been using SA since before it was an Apache project, when it was 
developed by McAfee and the sources were on Sourceforge and back then it 
was already file based.






Re: Bayes, Manual and Auto Learning Strategies

2014-07-01 Thread Steve Bergman



Let's turn this around. Can you prove autolearn was ever done to memory?


I'm not really interested in proving anything. I'm interested in being 
convinced that autolearn is individual file-based when spamc is run as 
the individual user.


I'm not quite sure how that would affect my strategy. But it might (or 
might not) make autolearn useful.


-Steve


Re: Bayes, Manual and Auto Learning Strategies

2014-07-01 Thread Axb

On 07/02/2014 07:37 AM, Steve Bergman wrote:



Let's turn this around. Can you prove autolearn was ever done to memory?


I'm not really interested in proving anything. I'm interested in being
convinced that autolearn is individual file-based when spamc is run as
the individual user.


It's in the code... but yes, autolearn is always file based and respects 
the per user settings unless you run spamd with -x



I'm not quite sure how that would affect my strategy. But it might (or
might not) make autolearn useful.


More importantly, you may need to reconsider whether per-user Bayes will 
give you the level of quality you're aiming for, and from experience I 
can tell you: it won't.


Site wide bayes works VERY well even under such ugly conditions as 
traffic with multiple languages, for ham as well as spam.