Re: Bayes db and token expiry questions

2010-03-29 Thread Alex
Hi,

>> Well, what's the missing 120 MB? The journal? Do a complete sync and
>> then delete it.
>
> Probably the signatures in bayes_seen - there's no mechanism for ageing
> them out.

And I assume that isn't a problem then?

>> "too big" is not an absolute figure. If you store 1-occurrence tokens
>> you will obviously have more tokens than without them.
>
> There's not really a choice since all tokens start that way.

Maybe a better estimate would be in terms of time. For how long should
the unseen tokens (only occurred once, I guess) remain in the
database? Perhaps that's a good metric. For me it's about a week now.

>> You should use autolearn if you don't do yet.
>
> Autolearning can make things worse by dropping the retention period.

Yes, I'm using autolearn, but how does that affect the retention
period? What do the two have to do with each other? Do you mean
auto-expire, not auto-learn?

My database seems to have improved slightly over the past few days
after increasing the max db size to 1.6M. I guess there is a lot
of expiry pending as well, because the database is currently much larger
than that today:

0.000          0    2050481          0  non-token data: ntokens

Looks like about 345k to be purged, if I understand correctly?

Thanks,
Alex







RE: ATTN DEVELOPERS: Mega-Spam

2010-03-29 Thread John Hardin

On Mon, 29 Mar 2010, Brent Kennedy wrote:


> Ya know, this got me thinking.  Wonder if I could create a VM with all the
> settings and a script to customize the setup.  Then organizations could just
> deploy the VM.  Sort of an all in one deployment.  Just update the VM
> template every now and then.  Ahh, but the learning db might be an issue.
> Oh well, just a thought.


A second VM hosting the bayes DB on MySQL or Postgres. That way you can 
drop-in upgrade the SA VM without destabilizing the bayes DB VM.


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Our government wants to do everything it can "for the children,"
  except sparing them crushing tax burdens.
---
 3 days until April Fools' day


Re: trusted_networks

2010-03-29 Thread Matt Kettler
On 3/29/2010 11:40 AM, Kaleb Hosie wrote:
> I'm having a problem with the trusted_networks option. Right now I have it 
> set to:
>
> trusted_networks 10.0.1/24
>
> In postfix, I need to have spamassassin listed under 
> "smtpd_recipient_restrictions" so that it will only scan incoming emails 
> however it would be handy to get this option working if at all possible so it 
> won't scan outgoing emails.
>
> When I try to use this option; I login through telnet port 25, and send the 
> test spam string (from the 10.0.1.0 subnet) it still gets caught in spam. Am 
> I doing something wrong or is there another option I need to choose?
>
> Thanks!
> Kaleb
>
>   

Trusted in this case means "trusted to not forge headers, and while
unlikely to originate spam, this host might relay it." For example, your
front-end MX would be trusted if your SA runs on an internal server the
MX relays to. It will definitely forward whatever spam it gets, because
it forwards all mail.

trusted_networks is not a whitelisting mechanism.

You can check if your trust is working by seeing if messages that are
only handled by trusted hosts match the ALL_TRUSTED rule. This rule
carries a small negative score, but cannot outweigh the GTUBE sample.

In fact, even our whitelist mechanisms won't outweigh a GTUBE. GTUBE is
meant to *ALWAYS* be marked as spam if SA scans it, regardless of
whitelist settings.






RE: ATTN DEVELOPERS: Mega-Spam

2010-03-29 Thread Brent Kennedy
Graylisting does work.  We have been using SQLGrey
(http://sqlgrey.sourceforge.net/) for three years now.  The minute I turned
it on, spam to my junk email folder (what SA used to catch) dropped by 90%.
SQLGrey sits at the MTA level, so it hits the sender when they connect and
before they actually submit email.

Obviously, it does allow them through if they come back, but most botnet
senders do not retry messages or never have the chance.
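
The defer-then-accept behaviour described above can be sketched roughly as follows. This is only a toy illustration of the general greylisting idea, not SQLGrey's actual implementation; the class name, the triplet key and the 300-second delay are all assumptions made for the example.

```python
import time


class Greylist:
    """Toy greylist keyed on (client IP, sender, recipient). Hypothetical."""

    def __init__(self, min_delay=300):
        self.min_delay = min_delay  # assumed: seconds a sender must wait before a retry counts
        self.first_seen = {}        # triplet -> time of the first delivery attempt

    def check(self, client_ip, sender, recipient, now=None):
        """Return 'defer' (temporary SMTP failure) or 'accept' for this attempt."""
        now = time.time() if now is None else now
        triplet = (client_ip, sender, recipient)
        # Remember the first attempt; later attempts reuse the stored timestamp.
        first = self.first_seen.setdefault(triplet, now)
        if now - first >= self.min_delay:
            return "accept"
        return "defer"
```

A sender that never retries (typical of the botnet senders mentioned above) only ever sees "defer"; a real MTA that retries after the delay gets "accept".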

I think after I turned it on, the botnet plug-in got bored.  My stats for it
dropped significantly.  So that's my proof it does adversely affect botnets.
I wish I still had the stats graphs for when I turned it on.  However, you
can see its effect on my graph here: http://brain.chcfl.com/postfix/ (noted
as rejections).  I also have Active Directory set up with the MTA, so no
messages for nonexistent recipients ever hit the server and no NDRs are
generated.  If they try a dictionary attack, they will be on tarpit duty for
a long time.

To see the relief on someone's face after they realize they only have 10 junk
emails to glance at rather than 100, you see the value of graylisting.  I
have put my setup in a few other locations and they also report back to me
that their users are now getting work done rather than parsing emails.  

Ya know, this got me thinking.  Wonder if I could create a VM with all the
settings and a script to customize the setup.  Then organizations could just
deploy the VM.  Sort of an all in one deployment.  Just update the VM
template every now and then.  Ahh, but the learning db might be an issue.
Oh well, just a thought.

-Brent

-Original Message-
From: Jonas Eckerman [mailto:jonas_li...@frukt.org] 
Sent: Monday, March 29, 2010 6:41 PM
To: John Hardin
Subject: Re: ATTN DEVELOPERS: Mega-Spam

On 2010-03-30 00:12, John Hardin wrote:

> While greylisting will help, it won't spank the offender in that manner.
> It will postpone the message very early in the SMTP exchange, not after
> the body has been received.

Unless the greylisting is done *after* receiving the body. Of course, 
this will spank innocent senders as well.

(My selective greylisting implementation for MIMEDefang does this, 
originally because some stupid MTAs didn't handle tempfails correctly at 
earlier stages... The "selective" stuff keeping delays and spanking of 
innocents down.)

BTW: While I like greylisting because it stops a lot of spam, I've never 
seen any data substantiating claims that it has a measurable negative 
impact on botnets. So I'm not convinced it really does a lot of spanking 
of offenders...

Regards
/Jonas
-- 
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/



Re: ATTN DEVELOPERS: Mega-Spam

2010-03-29 Thread Jonas Eckerman

On 2010-03-30 00:12, John Hardin wrote:


> While greylisting will help, it won't spank the offender in that manner.
> It will postpone the message very early in the SMTP exchange, not after
> the body has been received.


Unless the greylisting is done *after* receiving the body. Of course, 
this will spank innocent senders as well.


(My selective greylisting implementation for MIMEDefang does this, 
originally because some stupid MTAs didn't handle tempfails correctly at 
earlier stages... The "selective" stuff keeping delays and spanking of 
innocents down.)


BTW: While I like greylisting because it stops a lot of spam, I've never 
seen any data substantiating claims that it has a measurable negative 
impact on botnets. So I'm not convinced it really does a lot of spanking 
of offenders...


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: ATTN DEVELOPERS: Mega-Spam

2010-03-29 Thread Mark Martinec
> We've got plenty of time, but I suggest not waiting until it becomes a
> big problem before desperately rushing to fix it :)

Depends on how one defines where a problem starts to become 'big'.

For me the problem of large messages was big enough early last year
that I had to implement a solution for it in amavisd-new 2.6.3, with
corresponding support in the SpamAssassin 3.3.0 library (most of it
in its DKIM plugin, as mentioned by Michael).

From 2.6.3 release notes (April 22, 2009):

- large messages beyond $sa_mail_body_size_limit are now partially passed
  to SpamAssassin and other spam scanners for checking: a copy passed to
  a spam scanner is truncated near or slightly past the indicated limit.
  Large messages are no longer given an almost free passage through spam
  checks.

  Note that message truncation can invalidate a DKIM or DK signature.
  If using (non-default) SpamAssassin rules to assign score points to mail
  with no valid signatures from authors which are expected to always provide
  a valid signature, the message truncation can cause false positives on
  these rules. As a workaround, to a truncated message passed to spam
  scanners, amavisd inserts a header field:
      X-Amavis-MessageSize: m, TRUNCATED to n
  which can be captured by SpamAssassin rules, e.g.:
      header __TRUNCATED X-Amavis-MessageSize =~ m{\A[^\n]*TRUNCATED}m
  and used in rules like NOTVALID_EBAY to prevent them from triggering.

  Starting with version 3.3.0 of SpamAssassin, its DKIM plugin understands
  the issue and receives undamaged DKIM signature objects directly from
  amavisd, so the above workaround is not needed. Also, a hit on a __TRUNCATED
  rule is automatically generated (explicit header rule is not necessary),
  just in case it might be useful for some purpose.
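
For anyone post-processing such mail outside SpamAssassin, the same test can be sketched in Python. The pattern is copied from the header rule quoted in the release notes; the function name and the sample header values below are made up for illustration.

```python
import re

# Same pattern as the quoted SpamAssassin header rule, in Python syntax.
# \A anchors at the very start of the value, even with MULTILINE set.
TRUNCATED_RE = re.compile(r"\A[^\n]*TRUNCATED", re.MULTILINE)


def is_truncated(header_value):
    """True if an X-Amavis-MessageSize value says the scanned copy was truncated."""
    return TRUNCATED_RE.search(header_value) is not None


# Hypothetical header values, following the "m, TRUNCATED to n" shape above.
print(is_truncated("1159168, TRUNCATED to 409600"))
print(is_truncated("20480"))
```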


Just did a grep on our log, seeking out large spam messages (beyond 420 kB)
and printing their sizes. Below is the complete list for March 2010. Seems we
are getting about 4.5 large spam messages daily on average (out of
about 75k messages daily). A lot? Depends on one's point of view.

Mar  1  score: 12.7,  size:   533 kB
Mar  1  score: 23.9,  size:   435 kB
Mar  1  score: 16.4,  size:   533 kB
Mar  1  score:  7.4,  size:   490 kB
Mar  1  score:  9.1,  size:   490 kB
Mar  1  score: 19.6,  size:   721 kB
Mar  2  score: 15.1,  size:  1132 kB
Mar  2  score: 16.9,  size:   643 kB
Mar  2  score: 16.9,  size:   643 kB
Mar  2  score:  7.3,  size:   587 kB
Mar  2  score: 21.9,  size:   721 kB
Mar  2  score: 21.6,  size:   527 kB
Mar  2  score: 24.8,  size:   436 kB
Mar  2  score: 20.5,  size:   527 kB
Mar  2  score: 20.6,  size:   528 kB
Mar  3  score: 23.6,  size:   435 kB
Mar  3  score: 30.4,  size:   543 kB
Mar  3  score: 30.2,  size:   543 kB
Mar  3  score: 30.2,  size:   543 kB
Mar  3  score: 31.5,  size:   543 kB
Mar  3  score: 18.3,  size:  1132 kB
Mar  3  score: 31.5,  size:   543 kB
Mar  4  score: 31.5,  size:   543 kB
Mar  4  score: 32.9,  size:   543 kB
Mar  4  score: 10.3,  size:   719 kB
Mar  4  score: 10.3,  size:   719 kB
Mar  4  score: 10.1,  size:   719 kB
Mar  4  score: 10.0,  size:   720 kB
Mar  4  score: 10.2,  size:   719 kB
Mar  5  score: 16.0,  size:  1132 kB
Mar  5  score: 24.2,  size:   513 kB
Mar  5  score: 25.9,  size:   513 kB
Mar  6  score:  9.9,  size:   719 kB
Mar  7  score: 29.6,  size:   699 kB
Mar  7  score: 12.3,  size:   682 kB
Mar  7  score: 10.2,  size:   433 kB
Mar  7  score: 10.2,  size:   433 kB
Mar  7  score: 38.0,  size:   543 kB
Mar  7  score: 38.0,  size:   543 kB
Mar  7  score: 38.0,  size:   543 kB
Mar  7  score:  7.5,  size:  1787 kB
Mar  7  score: 18.1,  size:   643 kB
Mar  7  score: 38.0,  size:   543 kB
Mar  7  score: 38.0,  size:   543 kB
Mar  8  score: 38.0,  size:   543 kB
Mar  8  score: 18.2,  size:   643 kB
Mar  8  score:  7.1,  size:  1050 kB
Mar  8  score: 18.0,  size:  1132 kB
Mar  8  score: 25.7,  size:   501 kB
Mar  9  score: 30.6,  size:   813 kB
Mar  9  score: 13.7,  size:   779 kB
Mar 10  score: 36.9,  size:   470 kB
Mar 10  score: 29.3,  size:  1407 kB
Mar 10  score: 13.5,  size:   910 kB
Mar 11  score:  8.0,  size:   812 kB
Mar 11  score:  8.4,  size:   821 kB
Mar 11  score: 25.5,  size:   435 kB
Mar 12  score:  8.8,  size:   857 kB
Mar 12  score: 32.5,  size:   543 kB
Mar 12  score: 32.5,  size:   543 kB
Mar 12  score: 32.5,  size:   543 kB
Mar 12  score: 32.5,  size:   543 kB
Mar 12  score: 32.5,  size:   543 kB
Mar 12  score: 32.5,  size:   543 kB
Mar 12  score: 32.5,  size:   543 kB
Mar 12  score: 32.5,  size:   543 kB
Mar 12  score: 32.5,  size:   543 kB
Mar 13  score: 32.5,  size:   543 kB
Mar 13  score:  7.8,  size:   732 kB
Mar 13  score:  7.9,  size:   926 kB
Mar 13  score:  7.7,  size:   732 kB
Mar 13  score: 19.9,  size:   513 kB
Mar 14  score:  8.7,  size:   821 kB
Mar 14  score:  8.8,  size:   821 kB
Mar 14  score:  8.8,  size:   821 kB
Mar 14  score:  8.7,  size:   821 kB
Mar 14  score:  8.8,  size:   821 kB
Mar 14  score:  8.7,  size:   821 kB
Mar 14 
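
A list in this shape is easy to summarize with a few lines of Python. The sketch below assumes each line looks exactly like the entries above (this is the poster's condensed output, not a raw amavisd log format), and it averages over days that have at least one entry. The function name and the small sample are illustrative only.

```python
import re
from collections import Counter

# Matches lines such as "Mar  1  score: 12.7,  size:   533 kB".
LINE_RE = re.compile(r"^(\w{3})\s+(\d+)\s+score:\s*([\d.]+),\s+size:\s*(\d+) kB$")


def summarize(lines):
    """Count large-spam entries per day and report simple totals."""
    per_day = Counter()
    sizes = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip incomplete or unrelated lines
        month, day, _score, size = m.groups()
        per_day[(month, int(day))] += 1
        sizes.append(int(size))
    days = len(per_day)
    return {
        "messages": len(sizes),
        "days": days,
        "per_day": len(sizes) / days if days else 0.0,
        "max_kb": max(sizes, default=0),
    }


# A few entries taken from the list above, plus one incomplete line.
sample = [
    "Mar  1  score: 12.7,  size:   533 kB",
    "Mar  1  score: 23.9,  size:   435 kB",
    "Mar  2  score: 15.1,  size:  1132 kB",
    "Mar 14 ",
]
print(summarize(sample))
```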

RE: ATTN DEVELOPERS: Mega-Spam

2010-03-29 Thread John Hardin

On Mon, 29 Mar 2010, Brent Kennedy wrote:


> My suggestion would be to use graylisting, force them to send that 1MB
> message twice.


While greylisting will help, it won't spank the offender in that manner. 
It will postpone the message very early in the SMTP exchange, not after 
the body has been received.


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Windows Vista: Windows ME for the XP generation.
---
 3 days until April Fools' day


Re: ATTN DEVELOPERS: Mega-Spam

2010-03-29 Thread Martin Gregorie
On Mon, 2010-03-29 at 23:01 +0200, Mathias Homann wrote:
> I think it has; I get about 2-5 mega spams per day by now,
> and I can't do greylisting because I have to fetchmail from a central 
> mail server at my hoster that is not under my direct control.
> And no, moving from a vhost to a root server just to be able to 
> greylist is not an option. 5 euro per month versus 50 euro per 
> month...
> 
Can you persuade your hosting site to implement grey-listing? 

My ISP implemented grey-listing over a year ago. When they did, my
overall spam rate immediately dropped from 80% of my mail stream to
under 10%. Currently spam is running at less than 5%. As a result my SA
subsystem is mostly trapping spam sent over less-common channels, e.g.
mailing lists and an ISP-provided address I no longer use.

Having to handle a stream of large spam messages can't be improving the
throughput and disk usage of your hosting site's mail server either, so
it's worth pointing that out to them. They may be more amenable to
introducing grey-listing than you realise.
 

Martin




Re: ATTN DEVELOPERS: Mega-Spam

2010-03-29 Thread Karsten Bräckelmann
On Mon, 2010-03-29 at 16:57 -0400, Charles Gregory wrote:
> The spams I've seen so far look more 'amateur' than 'pro'. Easily traceable 
> IPs. Blacklistable domains. I'm just throwing my idea into the queue now 
> so that it can be smoothly integrated with a future release. We've got 
> plenty of time, but I suggest not waiting until it becomes a big problem 
> before desperately rushing to fix it :)

Agreed on the latter part. But then again, this is a topic for the dev
list [1] to start a discussion, not here.

  guenther


[1] Also note your very own Subject.

-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: ATTN DEVELOPERS: Mega-Spam

2010-03-29 Thread Mathias Homann
Am Montag 29 März 2010 schrieb Karsten Bräckelmann:
> On Mon, 2010-03-29 at 16:23 -0400, Brent Kennedy wrote:
> > Wow, I knew this was coming at some point.  I just figured it was
> > too expensive.
> 
> You did read the entire thread, right? :)  There's nothing new
> about this. Moreover, this still is a rare occurrence. Note even
> Charles, who started this thread, claims to have received *one*
> such spam. And it appears to be his first. ;)
> 
> Now, if this starts to become a more general pattern...


I think it has; I get about 2-5 mega spams per day by now,
and I can't do greylisting because I have to fetchmail from a central 
mail server at my hoster that is not under my direct control.
And no, moving from a vhost to a root server just to be able to 
greylist is not an option. 5 euro per month versus 50 euro per 
month...


bye,
MH



Re: ATTN DEVELOPERS: Mega-Spam

2010-03-29 Thread Charles Gregory

On Mon, 29 Mar 2010, Karsten Bräckelmann wrote:

> You did read the entire thread, right? :)  There's nothing new about
> this. Moreover, this still is a rare occurrence. Note even Charles, who
> started this thread, claims to have received *one* such spam. And it
> appears to be his first. ;)


Last September the number of spams exceeding 256KB became frequent enough 
that I bumped up my limit. Now I'm starting to see spams past the new 
limit (400KB). But when they jump up to 1MB, maybe it's time for a 
different solution, and maybe regain some system efficiency by adding 
the suggested mechanism to SA and only doing significant body scans on 
messages less than 256KB again :)



> Now, if this starts to become a more general pattern...


The spams I've seen so far look more 'amateur' than 'pro'. Easily traceable 
IPs. Blacklistable domains. I'm just throwing my idea into the queue now 
so that it can be smoothly integrated with a future release. We've got 
plenty of time, but I suggest not waiting until it becomes a big problem 
before desperately rushing to fix it :)


My 0.02 dollars

- C

Re: Sought Rules Back?

2010-03-29 Thread Karsten Bräckelmann
On Mon, 2010-03-29 at 16:05 -0400, Jason Bertoch wrote:
> > Btw, the three rules JM_SOUGHT_FRAUD_{1,2,3} have a score of zero
> > as per Justin's request (Bug 6155 c 38, c72, c89, c124).
> > Not sure if people using the channel realize that scores
> > need to be bumped up.  Btw, I prefer to avoid them monopolizing
> > the score when more than one hits:
> >
> > score JM_SOUGHT_FRAUD_1 0.1
> > score JM_SOUGHT_FRAUD_2 0.1
> > score JM_SOUGHT_FRAUD_3 0.1
> > meta  JM_SOUGHT_FRAUD_ANY JM_SOUGHT_FRAUD_1 || JM_SOUGHT_FRAUD_2 || 
> > JM_SOUGHT_FRAUD_3
> > score JM_SOUGHT_FRAUD_ANY 3.0

> Bug 6155 is now closed, but the SOUGHT rules still have a score of 0. 
> Anyone have an idea on when these rules will be activated again?

The zero score request applies *only* to the SOUGHT_FRAUD sub-set. It
does *not* affect SOUGHT. Those do have scores according to the GA run.

Also, this applies *only* to 3.3, where this moved into stock. Again,
the dedicated sa-update channel (also suitable for 3.2) is *not*
affected and still has the same scores it used to.


Now, regarding activating again -- just do. They are merely disabled by
default (in 3.3 stock). You can "activate" them on your site, simply by
dropping score lines into your local config.

  guenther





RE: ATTN DEVELOPERS: Mega-Spam

2010-03-29 Thread Karsten Bräckelmann
On Mon, 2010-03-29 at 16:23 -0400, Brent Kennedy wrote:
> Wow, I knew this was coming at some point.  I just figured it was too
> expensive.  

You did read the entire thread, right? :)  There's nothing new about
this. Moreover, this still is a rare occurrence. Note even Charles, who
started this thread, claims to have received *one* such spam. And it
appears to be his first. ;)

Now, if this starts to become a more general pattern...

  guenther


> -Original Message-
> From: Charles Gregory [mailto:cgreg...@hwcn.org] 
> 
> Literally, Mega-Spam. I just got a spam with 1MB of images.




RE: ATTN DEVELOPERS: Mega-Spam

2010-03-29 Thread Brent Kennedy
Wow, I knew this was coming at some point.  I just figured it was too
expensive.  

My suggestion would be to use graylisting, force them to send that 1MB
message twice.  Course zombie bots don't do that generally, so you would
never even have to deal with it.  You could also use the botnet plug-in.

It would be good if SA could handle this though.  The above are only
temporary solutions to a bigger problem.

-Brent

-Original Message-
From: Charles Gregory [mailto:cgreg...@hwcn.org] 
Sent: Monday, March 29, 2010 1:09 PM
To: users@spamassassin.apache.org
Subject: ATTN DEVELOPERS: Mega-Spam


Literally, Mega-Spam. I just got a spam with 1MB of images.

My suggestion has been made before, but I would like to ask that it now 
be taken a bit more seriously. SA needs an option to allow efficient
'partial' scanning of large e-mails, so that, for example, we can 
perform all the valuable header checks, and maybe even scan for URIBL hits 
within the first few hundred K of the body?

Is it possible (and easy!) to set a flag that tells SA to stop testing 
against the body when it reaches a certain byte count? Or perhaps, if 
I understand the docs correctly, most rules only trigger on textual 
message parts anyway, so by simply disabling 'full' rules and possibly
'rawbody', we could get the desired result without too much of a 
processing hit?

- C



Re: Sought Rules Back?

2010-03-29 Thread Jason Bertoch

On 2010/02/01 10:30 AM, Mark Martinec wrote:

Update returned sought rules 1/31/2010.


Actually back since Jan 6. :)  Re-viewed about 1k fraud spam the
following days, for the Sought Fraud sub-set.


Btw, the three rules JM_SOUGHT_FRAUD_{1,2,3} have a score of zero
as per Justin's request (Bug 6155 c 38, c72, c89, c124).
Not sure if people using the channel realize that scores
need to be bumped up.  Btw, I prefer to avoid them monopolizing
the score when more than one hits:

score JM_SOUGHT_FRAUD_1 0.1
score JM_SOUGHT_FRAUD_2 0.1
score JM_SOUGHT_FRAUD_3 0.1
meta  JM_SOUGHT_FRAUD_ANY JM_SOUGHT_FRAUD_1 || JM_SOUGHT_FRAUD_2 || JM_SOUGHT_FRAUD_3
score JM_SOUGHT_FRAUD_ANY 3.0
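
To see why this layout keeps the sub-rules from monopolizing the score, here is a small arithmetic sketch. It only adds up the numbers from the config lines above; it is not how SpamAssassin evaluates meta rules internally, and the function name is made up.

```python
# Scores copied from the config lines above.
SCORES = {
    "JM_SOUGHT_FRAUD_1": 0.1,
    "JM_SOUGHT_FRAUD_2": 0.1,
    "JM_SOUGHT_FRAUD_3": 0.1,
    "JM_SOUGHT_FRAUD_ANY": 3.0,
}
SUB_RULES = ("JM_SOUGHT_FRAUD_1", "JM_SOUGHT_FRAUD_2", "JM_SOUGHT_FRAUD_3")


def total_score(hits):
    """Sum the contribution of the SOUGHT_FRAUD rules for a set of sub-rule hits."""
    hits = set(hits) & set(SUB_RULES)
    if hits:
        hits.add("JM_SOUGHT_FRAUD_ANY")  # the meta fires when any sub-rule hits
    return round(sum(SCORES[h] for h in hits), 2)


for n in range(4):
    print(n, total_score(SUB_RULES[:n]))
```

With these scores one hit contributes 3.1 and three hits contribute 3.3, whereas giving each sub-rule 3.0 directly would let three hits add 9.0.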


   Mark


Bug 6155 is now closed, but the SOUGHT rules still have a score of 0. 
Anyone have an idea on when these rules will be activated again?


--
/Jason





Re: FREEMAIL_ENVFROM_END_DIGIT score

2010-03-29 Thread Karsten Bräckelmann
On Mon, 2010-03-29 at 13:52 -0400, Jason Bertoch wrote:
> I recently received a FP report on an e-mail that hit on, among other 
> things, FREEMAIL_ENVFROM_END_DIGIT.  This rule has a score of 1.6, which 
> seems maybe a little high.  Henrik mentioned the same thing in comment 
> 185 [1] of Bug 6155 which is closed as resolved/fixed.  The assumption 
> was that there probably isn't much ham in the corpora that matches 
> addresses like these and therefore the score may be unfairly high.
> 
> The closed bug was addressing overall score generation and not directly 
> related to this rule.  Have any of the devs already looked at this 
> particular issue, or should this be opened as a new bug for further 
> investigation?

Please do. We might want to lock down the score -- given there's no way
yet to do "minimum of score X and GA result", which would be even
better.

  guenther


> [1] https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6155#c185




Re: FREEMAIL_ENVFROM_END_DIGIT score

2010-03-29 Thread Michael Scheidell

On 3/29/10 1:52 PM, Jason Bertoch wrote:
> I recently received a FP report on an e-mail that hit on, among other 
> things, FREEMAIL_ENVFROM_END_DIGIT.  This rule has a score of 1.6, 
> which seems maybe a little high.  Henrik mentioned the same thing in 
> comment 185 [1] of Bug 6155 which is closed as resolved/fixed.  The 
> assumption was that there probably isn't much ham in the corpora that 
> matches addresses like these and therefore the score may be unfairly 
> high.
>
> The closed bug was addressing overall score generation and not 
> directly related to this rule.  Have any of the devs already looked at 
> this particular issue, or should this be opened as a new bug for 
> further investigation?


WAY too many gmail and hotmail and yahoo accounts out there, and they 
HAVE TO END IN DIGITS. So FREEMAIL_ENVFROM_END_DIGIT is redundant with 
FREEMAIL.


Oh, and I have clients who claim their lawyer uses AOL for his corporate 
email address. And guess what? Yes, it ends in a digit, since his 
last name, first/last and last/first were already taken.




--
Michael Scheidell, CTO
Phone: 561-999-5000, x 1259
SECNAP Network Security Corporation

   * Certified SNORT Integrator
   * 2008-9 Hot Company Award Winner, World Executive Alliance
   * Five-Star Partner Program 2009, VARBusiness
   * Best Anti-Spam Product 2008, Network Products Guide
   * King of Spam Filters, SC Magazine 2008

__
This email has been scanned and certified safe by SpammerTrap(r). 
For Information please see http://www.secnap.com/products/spammertrap/
__  


Re: spamc syslog loglevel for "skipped message, greater than max message size"

2010-03-29 Thread Mark Martinec
Philipp,

> why does
> 
> spamc[28825]: [ID 702911 mail.error] skipped message, greater than max
> message size (512000 bytes)
> 
> have to be log level error?
> 
> Instead of error would "warn" not be enough?

That was fixed in 3.3.0:

  https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5325


Mark


spamc syslog loglevel for "skipped message, greater than max message size"

2010-03-29 Thread mailinglists
Hi

why does

spamc[28825]: [ID 702911 mail.error] skipped message, greater than max
message size (512000 bytes)

have to be log level error?

Instead of error would "warn" not be enough?

thanks,
Philipp



FREEMAIL_ENVFROM_END_DIGIT score

2010-03-29 Thread Jason Bertoch
I recently received a FP report on an e-mail that hit on, among other 
things, FREEMAIL_ENVFROM_END_DIGIT.  This rule has a score of 1.6, which 
seems maybe a little high.  Henrik mentioned the same thing in comment 
185 [1] of Bug 6155 which is closed as resolved/fixed.  The assumption 
was that there probably isn't much ham in the corpora that matches 
addresses like these and therefore the score may be unfairly high.


The closed bug was addressing overall score generation and not directly 
related to this rule.  Have any of the devs already looked at this 
particular issue, or should this be opened as a new bug for further 
investigation?



[1] https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6155#c185


--
/Jason






Re: ATTN DEVELOPERS: Mega-Spam

2010-03-29 Thread Karsten Bräckelmann
Aw, is that shouting really necessary?  Oh, yes, it is indeed -- you are
trying to get heard over on the dev list, so you need to be quite loud
from here... ;)

The dev list is what you want.


On Mon, 2010-03-29 at 13:09 -0400, Charles Gregory wrote:
> Literally, Mega-Spam. I just got a spam with 1MB of images.

The largest one I've seen included about 4.5 MByte worth of 7 jpeg
images, the largest of which was 1.2 MByte. And that doesn't even
include the considerable base64 overhead for the mail...

On the other hand: Guess what, I get about one spam per year exceeding
the default size threshold of 500 kByte.

  guenther





Re: ATTN DEVELOPERS: Mega-Spam

2010-03-29 Thread Michael Scheidell



On 3/29/10 1:09 PM, Charles Gregory wrote:


> Literally, Mega-Spam. I just got a spam with 1MB of images.
>
> My suggestion has been made before, but I would like to ask that it 
> now be taken a bit more seriously. SA needs an option to allow efficient
> 'partial' scanning of large e-mails, so that, for example, we can 
> perform all the valuable header checks, and maybe even scan for URIBL 
> hits within the first few hundred K of the body?

That could, will, and does mess up DKIM checks and language checks.

That said, amavisd-new has a switch to do this already, and for the very 
same reason.
(yes, it costs the scumbags nothing to have aunt martha and her zombot 
send out 600MM 1MB spams)




ATTN DEVELOPERS: Mega-Spam

2010-03-29 Thread Charles Gregory


Literally, Mega-Spam. I just got a spam with 1MB of images.

My suggestion has been made before, but I would like to ask that it now 
be taken a bit more seriously. SA needs an option to allow efficient
'partial' scanning of large e-mails, so that, for example, we can 
perform all the valuable header checks, and maybe even scan for URIBL hits 
within the first few hundred K of the body?


Is it possible (and easy!) to set a flag that tells SA to stop testing 
against the body when it reaches a certain byte count? Or perhaps, if 
I understand the docs correctly, most rules only trigger on textual 
message parts anyway, so by simply disabling 'full' rules and possibly
'rawbody', we could get the desired result without too much of a 
processing hit?
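
As a rough illustration of the "stop at a certain byte count" idea, the truncation could be done by whatever hands the message to SA. The sketch below is not an existing SA option; the function name and the 256 KB limit are assumptions for the example, and cutting a message mid-part can leave an incomplete MIME or base64 section behind.

```python
def truncate_for_scan(raw_message, body_limit):
    """Keep all headers, but at most body_limit bytes of the body (hypothetical helper)."""
    # The header block ends at the first blank line; accept CRLF or bare LF endings.
    candidates = []
    for sep in (b"\r\n\r\n", b"\n\n"):
        pos = raw_message.find(sep)
        if pos != -1:
            candidates.append((pos, sep))
    if not candidates:
        return raw_message  # no blank line found: headers only, nothing to cut
    pos, sep = min(candidates)  # earliest separator wins
    body_start = pos + len(sep)
    return raw_message[:body_start + body_limit]


message = b"Subject: test\r\n\r\n" + b"A" * (1024 * 1024)
scanned_copy = truncate_for_scan(message, 256 * 1024)
print(len(message), len(scanned_copy))
```

Header checks still see every header, while body rules only see the first part of the body.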


- C


RE: trusted_networks

2010-03-29 Thread Kaleb Hosie
> On 29.3.2010 18:40, Kaleb Hosie wrote:
> > I'm having a problem with the trusted_networks option.
> Right now I have it set to:
> >
> > trusted_networks 10.0.1/24
> >
> > In postfix, I need to have spamassassin listed under
> "smtpd_recipient_restrictions" so that it will only scan
> incoming emails however it would be handy to get this option
> working if at all possible so it won't scan outgoing emails.
> >
> > When I try to use this option; I login through telnet port
> 25, and send the test spam string (from the 10.0.1.0 subnet)
> it still gets caught in spam. Am I doing something wrong or
> is there another option I need to choose?
> >
>
> What is your glue to SpamAssassin? How is it called?
>
> I call SA from maildrop or procmail, which automatically
> makes it for incoming only. There are so many ways to do it.
>
> --
> http://www.iki.fi/jarif/
>
> You've been leading a dog's life.  Stay off the furniture.
>

I thought that with that option, SA is able to decide itself as to whether to 
scan it or not.

The program that I use to interface with SA is a rather unknown program called 
SpamAssassin Quarantine (SAQ). If SA isn't able to decide for itself to not 
scan particular emails depending upon whether it's from the internal network or 
not then I'll have to see about reprogramming SAQ to work.

Kaleb


Re: trusted_networks

2010-03-29 Thread Karsten Bräckelmann
On Mon, 2010-03-29 at 11:40 -0400, Kaleb Hosie wrote:
> I'm having a problem with the trusted_networks option. Right now I have
> it set to:
> 
> trusted_networks 10.0.1/24

> When I try to use this option; I login through telnet port 25, and send
> the test spam string (from the 10.0.1.0 subnet) it still gets caught
> in spam. Am I doing something wrong or is there another option I need
> to choose?

Please re-read the documentation about trusted_networks. It is
*not* an option for bypassing SA.

In fact, there is no such option. SA will scan whatever it gets fed. So
if you want to bypass SA for mail generated from a particular network,
you need to adjust the glue that calls SA to just not do that.
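
What "adjusting the glue" might look like, as a minimal sketch: the calling program decides from the connecting client's address whether to hand the message to SA at all. The function and variable names are hypothetical, and the network is the poster's 10.0.1/24 written out in full as 10.0.1.0/24.

```python
import ipaddress

# Assumed: mail from these networks is locally generated and should skip SA.
INTERNAL_NETWORKS = [ipaddress.ip_network("10.0.1.0/24")]


def should_scan(client_ip):
    """Return False for mail submitted from an internal network, True otherwise."""
    addr = ipaddress.ip_address(client_ip)
    return not any(addr in net for net in INTERNAL_NETWORKS)


print(should_scan("10.0.1.25"))    # internal client
print(should_scan("203.0.113.7"))  # outside client
```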





Re: trusted_networks

2010-03-29 Thread Jari Fredriksson
On 29.3.2010 18:40, Kaleb Hosie wrote:
> I'm having a problem with the trusted_networks option. Right now I have it 
> set to:
> 
> trusted_networks 10.0.1/24
> 
> In postfix, I need to have spamassassin listed under 
> "smtpd_recipient_restrictions" so that it will only scan incoming emails 
> however it would be handy to get this option working if at all possible so it 
> won't scan outgoing emails.
> 
> When I try to use this option; I login through telnet port 25, and send the 
> test spam string (from the 10.0.1.0 subnet) it still gets caught in spam. Am 
> I doing something wrong or is there another option I need to choose?
> 

What is your glue to SpamAssassin? How is it called?

I call SA from maildrop or procmail, which automatically limits scanning
to incoming mail. There are so many ways to do it.

-- 
http://www.iki.fi/jarif/

You've been leading a dog's life.  Stay off the furniture.





trusted_networks

2010-03-29 Thread Kaleb Hosie
I'm having a problem with the trusted_networks option. Right now I have it set 
to:

trusted_networks 10.0.1/24

In postfix, I need to have spamassassin listed under 
"smtpd_recipient_restrictions" so that it will only scan incoming emails. 
However, it would be handy to get this option working, if at all possible, so 
that it won't scan outgoing emails.

When I try this option (I log in via telnet on port 25 and send the test 
spam string from the 10.0.1.0 subnet), the message still gets caught as spam. 
Am I doing something wrong, or is there another option I need to choose?

Thanks!
Kaleb


Re: Bayes db and token expiry questions

2010-03-29 Thread RW
On Mon, 29 Mar 2010 13:03:59 +0200
Kai Schaetzl  wrote:

> Alex wrote on Sun, 28 Mar 2010 13:38:25 -0400:
> 
> > I have a bayes db that's about 160MB with a 40MB token db on a
> > system with about 100k messages per day.
> 
> Well, what's the missing 120 MB? The journal? Do a complete sync and
> then delete it.
 
Probably the signatures in bayes_seen - there's no mechanism for ageing
them out.

> You should be
> aware that the expiry kicks in at 75%, not at 100% of max_db_size.

And it may reduce the tokens to 37.5% of nominal.

> I suggest you change to SQL. This eliminates the journal.

Isn't that slower than a journalled db?


> > database was too big, so I lowered it back down, but I think that
> > was a mistake.
> 
> "too big" is not an absolute figure. If you store 1-occurence tokens
> you will obviously have more tokens than without them.

There's not really a choice since all tokens start that way.

> You should use autolearn if you don't do yet. 

Autolearning can make things worse by dropping the retention period.



Re: Bayes db and token expiry questions

2010-03-29 Thread Kai Schaetzl
Alex wrote on Sun, 28 Mar 2010 13:38:25 -0400:

> I have a bayes db that's about 160MB with a 40MB token db on a system
> with about 100k messages per day.

Well, what's the missing 120 MB? The journal? Do a complete sync and then 
delete it.

> I've just raised the max_db_size set
> to 1.1M tokens (there are currently 1.06M tokens in there).

That's not much for a system with 100,000 messages a day. I don't mean 
it's not sufficient, it is just not "too much". You should be aware that 
the expiry kicks in at 75%, not at 100% of max_db_size.
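
As a rough worked example, taking that 75% figure at face value (it is not 
checked against the SA source here) and using the numbers from your mail. 
The function name is made up for illustration:

```python
# Token counts quoted in the original post.
MAX_DB_SIZE = 1_100_000      # the newly raised max_db_size, in tokens
CURRENT_TOKENS = 1_060_000   # tokens currently in the database


def expiry_threshold(max_db_size: int) -> int:
    """75% of max_db_size, using integer math to avoid float rounding."""
    return max_db_size * 3 // 4


threshold = expiry_threshold(MAX_DB_SIZE)
print("threshold:", threshold)
print("over threshold:", CURRENT_TOKENS > threshold)
```

So under that assumption a db with 1.06M tokens is already well past the 
point where expiry would start, even though it is below max_db_size.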

> I've also
> changed bayes to write to the journal instead of directly to the
> database and just checking it periodically to see if the journal needs
> to be synced.

I suggest you change to SQL. This eliminates the journal.

> 
> Can someone explain to me the relationship between the frequency of
> "1-occurrence tokens" and the size of the database? Here is the output
> from a recent manual sync:
> 
> token frequency: 1-occurrence tokens: 72.60%
> token frequency: less than 8 occurrences: 18.11%
> 
> I was thinking that the because the tokens are seen only once,

it probably means you get a lot of fresh tokens in. Do you autolearn?

> the
> database was too big, so I lowered it back down, but I think that was
> a mistake.

"too big" is not an absolute figure. If you store 1-occurrence tokens you 
will obviously have more tokens than without them. If you slash the db 
(which slashes from all tokens, not just the 1-occurrence ones) and the 
performance goes down afterwards, that was obviously a wrong decision ;-) I 
don't know if and how this is reflected in the size of the database itself. 
This is a DBM database which will have certain sizes by design no matter 
how many tokens are in it. If the token database is only 40 MB that is not 
overly large, it's normal.

> Now some of the same emails are continually hitting only
> BAYES_50 while others seemingly the same hit BAYES_99. I've now raised
> the number of tokens available and continue to manually train the
> database with spam and ham (there are about 1.1M spam and 500k ham
> currently).

You should use autolearn if you aren't already. If you want to be safe you 
can change the learning thresholds to safer values. (I think I use 8 for 
spam and keep the default for ham.)

> Have I configured something wrong, or am I misunderstanding how this
> works? Is there something else I should read?

I think your db was ok as it was. You should read how to change to SQL 
;-) Run the expiry once a night from cron.
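
A minimal sketch of a nightly expiry job, assuming sa-learn is on the PATH 
of the user that owns the Bayes db. The function names are made up; 
--force-expire is the sa-learn option that runs an expiry pass. With a job 
like this you would normally also set "bayes_auto_expire 0" in local.cf so 
that expiry does not run opportunistically during scanning as well:

```python
import subprocess


def expire_command(sa_learn: str = "sa-learn") -> list[str]:
    """Build the command line for a forced Bayes expiry run."""
    return [sa_learn, "--force-expire"]


def run_expiry() -> int:
    """Run the expiry and return sa-learn's exit status."""
    return subprocess.run(expire_command(), check=False).returncode


# A script started from cron would end with:
#     raise SystemExit(run_expiry())
```

Cron then only has to start that script once a night, at whatever quiet 
hour suits the site.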

Kai

-- 
Get your web at Conactive Internet Services: http://www.conactive.com