Re: recent update to __STYLE_GIBBERISH_1 leads to 100% CPU usage

2019-05-29 Thread Karsten Bräckelmann
On Wed, 2019-05-29 at 12:47 +0200, Stoiko Ivanov wrote:
> On Wed, 29 May 2019 11:31:42 +0200 Matthias Egger  wrote:
> > On 28.05.19 10:31, Stoiko Ivanov wrote:
> > > with a recent update to the ruleset, we're encountering certain
> > > mails, which cause the rule-evaluation to use 100% cpu.

Thanks for the report, Stoiko.


> > Your sample just triggered the error and therefore the system started 
> > blowing off partially :-) So next time, please paste that example to 
> > e.g. pastebin or github or some website and link to it ;-)
> 
> Aye - sorry for that! I first wanted to open a bug-report at bugzilla,
> but since the one which dealt with a similar issue contained the
> suggestion to contact the user-list with problems for single rules - I
> did just that - without considering those implications!
> 
> Next time I'll definitely take the pastebin-option!

Both are good advice: filing a bug report, as well as generally using
pastebin or a similar external method to provide samples...

I see this has been filed in bugzilla by now.


> > But anyway, can you tell me how you found out __STYLE_GIBBERISH_1 is
> > the culprit? I have no clue how to isolate that, since a strace does
> > not really help... Or is there some strace for perl which i do not
> > know?
> 
> hmm - in that case the way to go was to enable a commented out
> debug-statement in the spamassassin source, which lists which rule is
> evaluated. (on 3.4.2 installed on a Debian this is
> in /usr/share/perl5/Mail/Spamassassin/Plugin/Check.pm - in
> do_rawbody_tests - just comment out the if-condition for would_log
> 
> Then you see it in the debug-output

Hmm, curious why that would be commented out.

It's the rules-all debug area feature that should generally be
available since the 3.4 branch, IIRC.

  spamassassin -D rules-all

will then announce regex rules *before* evaluating them, so even
long-running regex rules that do not match are easy to identify.
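
For the hung-message case, a rough way to use this (a sketch; the exact
debug line format varies by version):

  # keep only the debug stream, feed in the problematic sample
  spamassassin -D rules-all < sample.eml > /dev/null 2> rules.log
  # once it starts spinning at 100% CPU, the last rule announced at the
  # end of rules.log is the prime suspect
  tail rules.log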


-- 
Karsten Bräckelmann  -- open source. hacker. assassin.



Re: recent update to __STYLE_GIBBERISH_1 leads to 100% CPU usage

2019-05-29 Thread Karsten Bräckelmann
On Wed, 2019-05-29 at 08:27 +0200, Markus Benning wrote:
> Hi,
> 
> seems to work.
> 
> Had to add
> 
> score __STYLE_GIBBERISH_1 0

That's a non-scoring sub-rule, setting its score to 0 has no effect.
Redefining the rule to disable it is the way to go:

  meta __STYLE_GIBBERISH_1  0

> to my SA config to make your mail pass.


-- 
Karsten Bräckelmann  -- open source. hacker. assassin.



Re: Can't Get Removed From List

2018-02-27 Thread Karsten Bräckelmann
On Mon, 2018-02-26 at 10:13 -0700, Kevin Viner wrote:
> Hi everybody, I have an opt-in mailing list through MailChimp, and follow all
> best practices for my monthly emails. Unfortunately, every time I send out a
> list, I'm getting my fingerprint marked by Razor as spammy. SpamAssassin
> advice is: 

The following text is neither SA "advice" nor an SA report.

You should start by asking whoever (or whatever system) gave you that
text in response, to get more details.


> "You're sending messages that people don't want to receive, for example
> "=?utf-8?Q?=E2=9D=A4=C2=A0Valentine=27s=20Day=20Mind=20Reading?=".  You need
> to audit your mailing lists."
> 
> The problem I'm having is that I'm not receiving any abuse reports through
> MailChimp, I follow all best practice sending guidelines, and am not sending
> out spammy emails. I'm a professional entertainer with a fairly large list.
> 
> Cloudmark has been helpful in resetting my fingerprint upon request, but
> this has become an ongoing monthly problem that they don't seem to be
> interested in resolving with me. Please advise, as nobody seems to be able
> to tell me what is happening. I have a monthly email database of 10,000+, so
> if there are 1 or 2 complaints happening (which MailChimp isn't even
> seeing), it seems like a 0.1% or less rate of complaints isn't anything I
> can really do something about. And every time I'm flagged, I start having
> issues sending out emails in my day to day work.

-- 
Karsten Bräckelmann  -- open source. hacker. assassin.


Re: FROM header with two email addresses

2017-10-24 Thread Karsten Bräckelmann
On Tue, 2017-10-24 at 13:22 +0200, Merijn van den Kroonenberg wrote:
> > Hello all, I was the original poster of this topic but was away for a
> > couple of days.
> > I find it amazing to see the number of suggestions and ideas that have
> > come up here.
> > 
> > However none of the constuctions matched "my" From: lines of the form
> > 
> > From: "Firstname Lastname@" <sendern...@real-senders-domain.com>

> My comments in this mail are only about the
> "us...@companya.com" 
> situation, not about actual double from addresses.

Indeed, multiple different forms of "email-address lookalike in the
From: sender real-name part" have surfaced in this thread. This type is
occasionally used to try to look legit by putting a real, valid address
of the recipient's domain (a colleague) where the real name belongs --
a real name is harder to get right, and irregularities in it are easier
for humans to spot.

The OP's form looks like a broken From header plus an intermediate SMTP
server choking on and rewriting it.


-- 
Karsten Bräckelmann  -- open source. hacker. assassin.


Re: Sender needs help with false positive

2017-08-07 Thread Karsten Bräckelmann
On Mon, 2017-08-07 at 19:15 -0400, Alex wrote:
> > version=3.4.0
> 
> Version 3.4.0 is like ten years old. I also don't recall BAYES_999
> being available in that version, so one thing or the other is not
> correct.

Minor nitpick: 3.4.0 was released in Feb 2014, about three and a half
years ago. ;)  But that's the code only anyway; with sa-update, the
rules are kept up-to-date independently of the code's version and age.

Similarly the BAYES_999 test indeed is not part of the original 3.4.0
release. It has been published via sa-update though, and even older
3.3.x installations with sa-update have that rule today.

The check_bayes() eval rule always supported the 99.9% variant, it's
just a float number less than 1.0...
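
For reference, the sa-update published definition is roughly the
following (paraphrased; check 23_bayes.cf in your rules directory or
update channel for the exact text and score):

  body     BAYES_999   eval:check_bayes('0.999', '1.00')
  describe BAYES_999   Bayes spam probability is 99.9 to 100%
  tflags   BAYES_999   learn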





Re: Results of Individual Tests on spamd "CHECK"

2017-08-07 Thread Karsten Bräckelmann
On Mon, 2017-08-07 at 14:17 -0500, Jerry Malcolm wrote:
> I tried SYMBOLS.  You are correct that it lists the tests, but not the 
> results:
> 
> BAYES_95,HTML_IMAGE_ONLY_32,HTML_MESSAGE,JAM_DO_STH_HERE,LOTS_OF_MONEY,MIME_HTML_ONLY,
>  [...]
> 
> But I saw this line in a forum discussion... So I'm sure there is some 
> way to generate it.
> 
>  >>> tests=[AWL=-1.103, BAYES_00=-2.599, 
> HTML_MESSAGE=0.001,URIBL_BLACK=1.955, URIBL_GREY=0.25]
> 
> Any ideas?

That particular one appears to be part of the Amavisd-new generated
headers. You can get the same rules with individual scores in stock SA
using the _TESTSSCORES(,)_ Template Tag with the add_header config
option. See M::SA::Conf docs [1].

For ad-hoc testing without adding this to your general SA / spamd
configuration, feed the sample message to the plain spamassassin script
with additional --cf configuration:

  spamassassin --cf="add_header all TestsScores tests=_TESTSSCORES(,)_" < message

Also see 10_default_prefs.cf for more informational detail in the stock
Status header.


> On 8/7/2017 1:13 PM, Daniel J. Luke wrote:
> > On Aug 7, 2017, at 2:00 PM, Jerry Malcolm  wrote:
> > > I'm invoking spamd using:
> > >
> > > CHECK SPAMC/1.2\r\n
> > > 

Not your best option for ad-hoc tests... ;)

> > > Can someone tell me what I need to add to the spamd call (and the
> > > syntax) in order to get the results of the individual tests
> > > returned as part of the status?

You will need SA configuration. The spamd protocol itself does not allow
such fine-grained configuration.
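
If you want this permanently rather than ad-hoc, the same line simply
goes into the configuration spamd reads, e.g. local.cf (a sketch; the
header name is your choice):

  add_header all TestsScores tests=_TESTSSCORES(,)_

which shows up as an X-Spam-TestsScores header in the processed message.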


[1] http://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.html




Re: Is this really the SpamAssassin list? (was Re: unsubscribe)

2014-10-28 Thread Karsten Bräckelmann
On Tue, 2014-10-28 at 19:56 -0700, jdebert wrote:
> On Wed, 29 Oct 2014 00:33:04 +0100
> Karsten Bräckelmann  wrote:

> > > > > Redirecting them makes people lazy. Better than annoying but
> > > > > they don't learn anything except to repeat their mistakes.
> > > > 
> > > > Your assumption, the list moderators (aka owner, me being one of
> > > > them) would simply and silently obey and dutifully do the
> > > > un-subscription for them, is flawed. ;)
> > > 
> > > This assumption is unwarranted. I did not say that.
> > 
> > You said that the unsubscribe-to-list posting user would not learn and
> > get lazy, when those posts get redirected to the owner rather than
> > hitting the list.
> 
> Not exactly what I said. 

In the part you snipped of my previous post, I asked you to explain what
you did mean, if not what I discussed in detail.

This response is neither helpful nor constructive.


> > Not learning: False. As I said, moderators would respond with
> > explanation and instructions. In particular learning about his mistake
> > and how to properly (and in future) unsubscribe, does make him learn.
> > Since we'd not just unsub him, the user will even have to prove that
> > he learned, by following the procedure and unsubscribing himself.
> 
> False as evidenced by how the same people repeat the same thing on
> the same list and on other lists. Got it.

Show me an example of one subscriber repeating this mistake on this
list.

Show me an example of one subscriber repeating this mistake on this
list, after the proposed and discussed "redirect to owner" procedure is
in effect, which is meant to help with the issue.

You cannot possibly show the latter, since it is not yet in effect. So
there is no "evidence" as you just claimed. Moreover, there is
absolutely no basis to your "evidence" claim, that directly approaching
those subscribers by moderators would not make them learn.

You'll have a really hard time showing the first, too.

Got it. (Not a native English speaker, what's that supposed to mean in
the context of your quote? Equivalent of a foot-stomp?)


> > Getting lazy: People are lazy. But since there's absolutely nothing we
> > would simply do for them, there's no potential in the process to get
> > lazy over. They will have to read and understand how to do it. And
> > they will have to follow every step of the unsub procedure themselves.
> 
> The long form of saying we're agreed. And one of the reasons to
> automate the process.

Fun research project for you in strong favor of automation: How many
such posts did this list get in the last month? Statistically irrelevant
spike. Last 6 months? Last year? Two years?

I am a moderator of this list. I do know that handling those bad unsub
requests manually would be barely noticeable compared to the general
moderation load. Which isn't high either.


> > > Did you read the rest of the message?
> > 
> > Yes. And quite frankly, "catching unsub messages and bouncing them
> > with a note" as you mentioned is almost identical to the proposed
> > "redirect them to owner" to handle it. With the latter involving
> > moderators, having the advantage, that we can and will offer
> > additional help if need be.
> 
> Having the listserver catch the messages and handle them is
> "almost identical" to redirecting them to the owner for manual
> handling? I could see that if list owners still managed lists
> manually. But there's this nifty new software that manages lists
> automatically, freeing the list owners from all that drudge work.

I am very sorry, but it appears you have absolutely no clue what nursing
mailing lists today means.

Yes, all subscription (and un-subscription) is handled automatically. No
owner intervention, not even notices. Automation.

What we mostly do face is posts by non-subscribers. Mostly spam (just
ignore), but also a non-negligible amount of valid posts by
non-subscribers, or list-replies by subscribers using a wrong address.
The latter by far outweigh actual non-subscribers.

Unsub posts to the list? About the same as non-subscriber posts. Very
limited. Almost negligible, provided the occasional sample doesn't
trigger an on-list shitstorm.


With the proposed process in place, I would have spent less time
managing and resolving the last 12 months' bad unsub requests than it
took me arguing with you about something that really does not concern
you.


> Your assumption is that I am telling you to do all this manually. You
> seemed to be ambivalent about this, not preferring to do it manually but
> seeming to prefer to do it manually. 

No. I know from experience that 

Re: procmail

2014-10-28 Thread Karsten Bräckelmann
On Tue, 2014-10-28 at 22:10 -0400, David F. Skoll wrote:
> > frankly in times of LMTP and Sieve there is hardly a need to use 
> > procmail - it is used because "i know it and it just works" - so why 
> > should somebody step in and maintain it while nobody is forced to use
> > it
> 
> I use Email::Filter, not procmail, but tell me: Can LMTP and Sieve do
> the following?

Dammit, this is just too teasing... Sorry. ;)

procmail can do all of those. (Yeah, not your question, but still...)


> 1) Cc: mail containing a specific header to a certain address, but only
> between 08:00-09:00 or 17:00-21:00.

Sure. Limiting to specific days or hours can be achieved without an
external process, using recipe conditions based on our own SMTP server's
Received header, which we can trust to be correct.

> 2) Archive mail in a folder called Received-Archive/-MM.

Trivial. See man procmailex.

> 3) Take mail to a specific address, shorten it by replacing things
> like "four" with "4", "this" with "dis", etc. and send as much of the
> result as possible as a 140-character SMS message?  Oh, and only do
> this if the support calendar says that I am on the support pager that
> week.

Yep. Completely internal, given there's an email-to-SMS gateway
(flashback to 15 years ago); otherwise, calling an external process for
SMS delivery.

> 4) Take the voicemail notifications produced by our Asterisk
> software and replace the giant .WAV attachment with a much
> smaller .MP3 equivalent.

Check. Calling an external process, but I doubt procmail plus ffmpeg /
avconv is worse than Perl and the modules required for that audio
conversion.

Granted, in this case I'd need some rather skillful sed-fu in the pipe,
or a little help of an external Perl script using MIME-tools... ;)


> These are all real-world requirements that my filter fulfills.  And it
> does most of them without forking external processes.  (Item 3 actually 
> consults
> a calendar program to see who's on support, but the rest are all handled
> in-process.)

That said, and all joking apart:

Do you guys even remember when this got completely off topic?





Re: Is this really the SpamAssassin list? (was Re: unsubscribe)

2014-10-28 Thread Karsten Bräckelmann
On Tue, 2014-10-28 at 11:19 -0700, jdebert wrote:
> On Tue, 28 Oct 2014 04:27:14 +0100
> Karsten Bräckelmann  wrote:
> > On Mon, 2014-10-27 at 19:44 -0700, jdebert wrote:

> > > Redirecting them makes people lazy. Better than annoying but they
> > > don't learn anything except to repeat their mistakes.
> > 
> > Your assumption, the list moderators (aka owner, me being one of them)
> > would simply and silently obey and dutifully do the un-subscription
> > for them, is flawed. ;)
> 
> This assumption is unwarranted. I did not say that.

You said that the unsubscribe-to-list posting user would not learn and
get lazy, when those posts get redirected to the owner rather than
hitting the list.

Not learning: False. As I said, moderators would respond with
explanation and instructions. In particular learning about his mistake
and how to properly (and in future) unsubscribe, does make him learn.
Since we'd not just unsub him, the user will even have to prove that he
learned, by following the procedure and unsubscribing himself.

Getting lazy: People are lazy. But since there's absolutely nothing we
would simply do for them, there's no potential in the process to get
lazy over. They will have to read and understand how to do it. And they
will have to follow every step of the unsub procedure themselves.

So if my assumption was really that unwarranted, please explain what
else you did mean with those two sentences.


> Did you read the rest of the message?

Yes. And quite frankly, "catching unsub messages and bouncing them with
a note" as you mentioned is almost identical to the proposed "redirect
them to owner" way of handling it. The latter involves moderators, with
the advantage that we can and will offer additional help if need be.





Re: How is it that my X-Spam-Status is no, but my header gets marked with

2014-10-27 Thread Karsten Bräckelmann
On Mon, 2014-10-27 at 20:19 -0700, jdebert wrote:
> On Mon, 27 Oct 2014 15:45:03 -0700 (PDT)
> John Hardin  wrote:

> > The apparent culprit is a procmail rule that explicitly passes a
> > message through the mail system again. The message is being scanned
> > twice. If she can either deliver to a local mailbox rather than
> > forwarding to an email address, or modify the procmail rule that
> > calls SA to ignore messages that have already passed through the
> > server once, I think the problem would go away.
> 
> It looks as if it's the global procmailrc that always puts all mail,
> even mail between local users through spamassassin. However, I don't
> see how going through spamassassin again will modify the header. It's

It is not the second run that modifies the header. It's the first one.
With the second run classifying the mail as not-spam.

> already modified before the user procmail rule sees it. Something
> appears to be causing the first run of sa to modify the header
> unconditionally. If global procmail actually does the first run.

A system-wide procmail recipe feeds mail to SA.

Then there's a user procmail recipe that forwards mail with a Subject
matching /SPAM/ to another "dedicated spam dump" address with the same
domain, which ends up being delivered to that domain's MX. The same SMTP
server. Now re-processing the original mail (possibly wrapped in an
RFC822 attachment by SA), feeding it to SA due to the system-wide
procmail recipe...

On that second run, the message previously classified spam does not
exceed the threshold. Thus the X-Spam-Status of no, overriding the
previous Status header which is being ignored by SA anyway.

Result: Subject header rewritten by SA, despite final (delivery time)
spam status of no. This thread's Subject.





Re: Is this really the SpamAssassin list? (was Re: unsubscribe)

2014-10-27 Thread Karsten Bräckelmann
On Mon, 2014-10-27 at 19:44 -0700, jdebert wrote:
> On Mon, 27 Oct 2014 17:00:11 -0400
> "Kevin A. McGrail"  wrote:

> > I've emailed infra with the following request:
> > 
> > ...we have been getting consistent unsubscribe messages posted to
> > the entire users list which begs the question if there is a way to
> > redirect those to the mailing list owner instead of just posting
> > them?
> 
> Redirecting them makes people lazy. Better than annoying but they
> don't learn anything except to repeat their mistakes.

Your assumption, the list moderators (aka owner, me being one of them)
would simply and silently obey and dutifully do the un-subscription for
them, is flawed. ;)

Just as with regular moderation, we'd respond with a template explaining
things, offering instructions -- and additional information on a
case-by-case basis.





Re: Is this really the SpamAssassin list? (was Re: unsubscribe)

2014-10-27 Thread Karsten Bräckelmann
On Mon, 2014-10-27 at 17:00 -0400, Kevin A. McGrail wrote:
> On 10/27/2014 4:48 PM, Kevin A. McGrail wrote:
> > On 10/27/2014 4:45 PM, David F. Skoll wrote:

> > > How hard would it be to have the mailing list quarantine a message 
> > > whose subject consists solely of the word "unsubscribe" ? 

> > Heh... Apparently more needed than I hoped.  I'll have to ask the
> > foundation if they can implement something to achieve this. 
> I've emailed infra with the following request:

Might help, but not worth much effort if infra cannot set it up easily.
While we've seen a few recently, the usual overall frequency is *much*
lower.


> header  __KAM_SA_BLOCK_UNSUB1  Subject =~ /unsubscribe/i

Ouch. Would you please /^anchor$/ that beast? Unless you actually intend
this sub-thread to be swept off the list, too. ;)
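
Something along these lines, for illustration (untested sketch):

  header __KAM_SA_BLOCK_UNSUB1  Subject =~ /^\s*unsubscribe\s*$/i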





Re: How is it that my X-Spam-Status is no, but my header gets marked with

2014-10-25 Thread Karsten Bräckelmann
On Sat, 2014-10-25 at 20:06 -0700, Cathryn Mataga wrote:
> 
> Okay, here's another header.Shows X-Xpam-Status as no.
> 
> In local.cf I changed to this, just to be sure.
> 
> rewrite_header Subject [SPAM][JUNGLEVISION SPAM CHECK]

> X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on 
> ecuador.junglevision.com
> X-Spam-Level: *
> X-Spam-Status: No, score=1.5 required=3.5 tests=BAYES_50,HTML_MESSAGE, 
> MIME_HTML_ONLY,MIME_QP_LONG_LINE autolearn=disabled version=3.3.2

> Subject: [SPAM][JUNGLEVISION SPAM CHECK] Confirmation of Order Number 
> 684588 * Please Do Not Reply To This Email *

Somehow, you are passing messages to SA twice.

The first run classifies it as spam and rewrites the Subject. The second
run doesn't. Added headers, content wrapping, or most likely
re-transmission from trusted networks makes the second run score below
the threshold.
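
A quick way to see the effect yourself (a sketch; -L restricts SA to
local tests only, adjust to taste):

  # first pass rewrites the Subject if the sample scores as spam
  spamassassin -L < sample.eml > first-pass.eml
  # second pass re-scans the already marked-up message; its Subject keeps
  # the rewrite, while its X-Spam-Status may well come out as "No"
  spamassassin -L < first-pass.eml | grep -E '^(Subject|X-Spam-Status):'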





Re: .link TLD spammer haven?

2014-10-25 Thread Karsten Bräckelmann
On Fri, 2014-10-24 at 19:05 -0700, John Hardin wrote:
> On Fri, 24 Oct 2014, John Hardin wrote:
> 
> > On Sat, 25 Oct 2014, Martin Gregorie wrote:
> >
> > >  Less obviously, it doesn't seem to matter whether you write the rule
> > >  as /\.link\b/  or /\.link$/ - both give identical matches. Both match
> > >  the following regexes just as you'd expect:
> > >http://www.linkedin.com/home/user/data.link
> > >http://www.example.link
> > >
> > >  but, less obviously, both also match this:
> > >http://www.example.link/path/to/file.txt
> >
> > {boggle}
> >
> > >  ...but
> > >"grep -P '\.link\b'" matches it, but
> > >"grep -P '\.link$'"  does not.
> > >
> > >  I presume that this means that the uri rule tests against two strings:
> > >  one being just the domain name and the other being the whole URI and
> > >  declares a rule hit if either string matches.

Basically correct. SA uri rules are not only tested against the raw URI
as extracted from the message, but also against some normalized
variations. Without going into details, off the top of my head this
includes un-escaping, adding a protocol prefix (if missing) and path
stripping.

  $ echo -e "\n apache.org/path/" |
  ./spamassassin -D -L --cf="uri URI_DOMAIN /^http:\/\/[^\/]+$/"

  dbg: rules: ran uri rule URI_DOMAIN ==> got hit: "http://apache.org"

Note the regex matching a "domain only" anything-but-slash [^/]+
substring anchored at the end of the string. Also note the input
message's URI lacking a protocol, but the rule hit showing the (default)
protocol added by SA in one variation.


> > I don't think so, but I'm not positive.
> >
> > If you have a testing environment set up, try adding this and see what you 
> > get in the log:
> >
> >uri__ALL_URI  /.*/
> 
> oops. This too:
> 
>   tflags __ALL_URI  multiple
> 
> Sorry for forgetting that bit, it's rather important. :)

That seemingly straight-forward approach does not work in this case. The
tflags multiple option does not make uri rules match multiple times on a
single URI extracted from the message. It still generates a single hit
per extracted URI only, not including multiple hits on its normalized
variations.

The tflags multiple option on a uri rule enables it to match multiple
times on different URIs extracted from the message.





Re: URIBL_RHS_DOB high hits

2014-10-11 Thread Karsten Bräckelmann
On Sun, 2014-10-12 at 02:58 +0200, Reindl Harald wrote:
> Am 12.10.2014 um 02:20 schrieb Karsten Bräckelmann:

> > You have exactly one false positive listing. That is not even close to
> > "hit randomly".
> 
> well, i can't verify the other hits because don't have access to other 
> users email - the follwoing is another one and that *is* the definition 
> of randomly - in doubt such a list must not answer when there is not 
> verified data instead hit a FP
> 
> URIBL_RHS_DOB Contains an URI of a new domain (Day Old Bread)
> [URIs: goo.gl]

Another false positive DOB listing. Not good. Thanks for taking some
time to actually provide detail.

As for your personal definition of randomness, please see what others
have to say about it. Multiple bad listings still is not random.

  http://en.wikipedia.org/wiki/Randomness


> > Please stop the repeated, false accusations on this list.
> 
> point out that it is not trustable currently is not a accusation and 

You claimed DOB listed sourceforge.net, which it didn't. You repeatedly
claimed their listing to be random, which it isn't. That is what I
referred to as "false accusations".


> frankly http://support-intelligence.com/dob/ itself states "The list is 
> currently in BETA and should be used accordingly. We still have some 
> kinks in it and occasionally domains older than five days, or other 
> important domains end up in the list"

Yes. So what?

You are free to disable DOB on your server. You are free and in fact
welcome to report any issue with stock SA included DNSBLs, on-list or in
bugzilla, with founded evidence.

You are not free to claim $list responses to be random without proof.


> >>> Obviously, you did not check facts or investigate the issue at all.
> >>
> >> don't get me wrong, there ist not much to investigate if it hits legit
> >> mailing-list messages
> >
> > Correct, there is not much to investigate. The *only* thing would be to
> > verify *which* domain hit the DOB listing, and whether it actually is a
> > bad or warranted listing. Besides, that one is absolutely crucial to
> > check before claiming a false positive.
> >
> > A single thing to verify. You did not
> 
> if it hits a regular mailing list thread it is problematic and as said

No. It depends on the content. See this list for a prime example.

> if there are no data for whatever reason the answer should be NXDOMAIN 
> and not 127.0.0.1 in doubt because FP does more harm than FN

False accusation, again. You just claimed $list would return anything
other than NXDOMAIN in case of not-being-listed.

  $ host not-registered-domain.com.dob.sibl.support-intelligence.net
  Host not-registered-domain.com.dob.sibl.support-intelligence.net not found: 
3(NXDOMAIN)

We're talking false positive listings. Not random responses, neither
positive listing if "in doubt".

Again, stop unfounded false accusations on this list.





Re: URIBL_RHS_DOB high hits

2014-10-11 Thread Karsten Bräckelmann
On Sun, 2014-10-12 at 01:28 +0200, Reindl Harald wrote:
> Am 12.10.2014 um 01:09 schrieb Karsten Bräckelmann:

> >>>>> it hits again and i doubt that sourceforge is a new domain

> > However, what I am much more annoyed about is your rambling, claiming
> > DOB would list sourceforge.net -- and by that, particularly with this
> > thread's topic, giving the impression of DOB again listing the world.
> > Which it doesn't.
> 
> it seems to hit randomly which is even more worse because listing the 
> world is more obvious - i claim that it is not trustable currently, not 
> more and not less, may anybody make his own decision, i told mine and 
> there is nothing worng with that

You have exactly one false positive listing. That is not even close to
"hit randomly".

Please stop the repeated, false accusations on this list.


> > Obviously, you did not check facts or investigate the issue at all.
> 
> don't get me wrong, there ist not much to investigate if it hits legit 
> mailing-list messages

Correct, there is not much to investigate. The *only* thing would be to
verify *which* domain hit the DOB listing, and whether it actually is a
bad or warranted listing. Besides, that one is absolutely crucial to
check before claiming a false positive.

A single thing to verify. You did not.

Besides, it is just a coincidence that another domain in your log paste
actually was listed when I checked. Any other domain from the body could
have been the culprit. And still potentially can, since you only posted
logs -- no SA headers, body, or list of URIs.


> > With a configuration of "add_header all Report _REPORT_", the listed
> > domain even is included in the report, without any need for manual
> > post-processing.
> >
> >*  0.3 URIBL_RHS_DOB Contains an URI of a new domain (Day Old Bread)
> >*  [URIs: tieman.se]
> 
> which is just not true - the domain is way older

Yes, that seems to be a DOB false positive listing (and the only one
known right now, see above). Get over it.

And BTW, that was meant as a helpful hint for you and anyone else
reading this thread, about getting crucial details while investigating
(or reporting) issues. No need to bark at me, and repeat yet again
that's the one bad listing you encountered. The above is "how to do it"
and "what you get".


> and the SBL hit because 
> "support-intelligence.net" makes things not better
> 
> URIBL_SBL Contains an URL's NS IP listed in the SBL blocklist * 
> [URIs: tieman.se.dob.sibl.support-intelligence.net]

That is a SpamHaus listing. Support Intelligence is not responsible for
it, but the "victim".

This is entirely unrelated to URIBL_RHS_DOB and this thread's topic.





Re: URIBL_RHS_DOB high hits

2014-10-11 Thread Karsten Bräckelmann
On Sun, 2014-10-12 at 00:29 +0200, Reindl Harald wrote:
> Am 12.10.2014 um 00:23 schrieb Reindl Harald:
> > Am 12.10.2014 um 00:18 schrieb Karsten Bräckelmann:
> > > On Sat, 2014-10-11 at 23:40 +0200, Reindl Harald wrote:

> > > > it hits again and i doubt that sourceforge is a new domain
> > > > whatever the reason is - for me enough to disable it forever
> > >
> > > Jumping to conclusions, aren't you?
> >
> > yes - the conclusion is that it had way too much FP's recently

Arguably, tieman.se should be sufficiently old to not be listed.

However, what I am much more annoyed about is your rambling, claiming
DOB would list sourceforge.net -- and by that, particularly with this
thread's topic, giving the impression of DOB again listing the world.
Which it doesn't.

Obviously, you did not check facts or investigate the issue at all.


> frankly it hitted even my own message you replied to
> see at bottom

Yes, so will this one. DOB does NOT operate on sender or From header.
See for yourself:

  echo -e "\n tieman.se" | ./spamassassin

So yes, it hit on your mail. But no, it does not list your domain.


> >>> Oct 11 23:34:43 mail-gw spamd[28079]: spamd: result: . 0 -

FWIW, you can investigate and check any detail you want, because the
mail has been accepted by your SMTP server.

With a configuration of "add_header all Report _REPORT_", the listed
domain even is included in the report, without any need for manual
post-processing.

  *  0.3 URIBL_RHS_DOB Contains an URI of a new domain (Day Old Bread)
  *  [URIs: tieman.se]





Re: URIBL_RHS_DOB high hits

2014-10-11 Thread Karsten Bräckelmann
On Sat, 2014-10-11 at 23:40 +0200, Reindl Harald wrote:
> it hits again and i doubt that sourceforge is a new domain
> whatever the reason is - for me enough to disable it forever

Jumping to conclusions, aren't you?


> Oct 11 23:34:43 mail-gw spamd[28079]: spamd: result: . 0 - 
> BAYES_50,CUST_DNSWL_7,CUST_DNSWL_9,DKIM_ADSP_ALL,HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_PASS,SPF_PASS,T_RP_MATCHES_RCVD,URIBL_RHS_DOB,USER_IN_MORE_SPAM_TO
>  
> scantime=0.9,size=8902,user=sa-milt,uid=189,required_score=4.5,rhost=localhost,raddr=127.0.0.1,rport=39381,mid=<7655276d-92b5-4dbd-8041-6db5c4fb8...@tieman.se>,bayes=0.499983,autolearn=disabled
> Oct 11 23:34:43 mail-gw postfix/qmgr[28308]: 3jFfYt4WVTz1l: 
> from=, size=8829, nrcpt=1 
> (queue active)

$ host sourceforge.net.dob.sibl.support-intelligence.net
Host sourceforge.net.dob.sibl.support-intelligence.net not found: 3(NXDOMAIN)

$ host tieman.se.dob.sibl.support-intelligence.net
tieman.se.dob.sibl.support-intelligence.net has address 127.0.0.2

$ whois tieman.se | grep 2014
created:  2014-01-11
modified: 2014-09-20





Re: Score Ignored

2014-10-08 Thread Karsten Bräckelmann
On Wed, 2014-10-08 at 15:48 -0500, Robert A. Ober wrote:
> > On Mon, 22 Sep 2014 15:11:44 -0500 Robert A. Ober wrote:

> > > *Yes,  my test messages and SPAM hit the rules but ignore the score.*

> What is the easiest way to know what score is applied per rule? Neither 
> the server log nor the header breaks it down.

Wait. If there's no Report, if you do not have the list of rules hit and
their respective scores, how do you tell that your custom rule's score
is ignored by SA?


Besides the Report as mentioned by Axb already, you also can modify the
default Status header to include per-rule scores.

  add_header all Status "_YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTSSCORES(,)_ autolearn=_AUTOLEARN_ version=_VERSION_"





Re: recent channel update woes

2014-10-07 Thread Karsten Bräckelmann
On Tue, 2014-10-07 at 16:37 -0700, Dave Warren wrote:
> If you're paranoid, you can monitor the DNSBLs that you use via script 
> (externally from SpamAssassin) and generate something that reports to 
> you when there's a possible issue. If you're really paranoid, you can 
> have it write a .cf that would 0 out the scores, but I assure you that 
> you'll spend more time building, testing and maintaining such a system 
> than it's worth in the long run, in my experience it's better to just 
> page an admin.
> 
> I monitor positive and negative responses, for IP based DNS BLs, I use 
> the following by default:
> 
> 127.0.0.1 should not be listed.
> 127.0.0.2 should be listed.

Depending on how the DNSBL implements such static test-points, they
might not be affected by the issue causing the false listings.
Similarly, domains likely to appear on exonerate lists (compare
uridnsbl_skip_domain e.g.) might also not be affected.

For paranoid monitoring, low-profile domains that definitely do not and
will not match the listing criteria might be better suited for the task.


> $MYIP should not be listed.
> 
> Obviously these need to be tweaked and configured per-list, not all 
> lists list 127.0.0.2, and some lists use status codes, so "should not be 
> listed" and "should be listed" are really "match/do-not-match some 
> condition"
> 
> In the case of DNSWL, $MYIP should be listed, if I get de-listed, I want 
> to know about that too.




Re: recent channel update woes

2014-10-07 Thread Karsten Bräckelmann
On Wed, 2014-10-08 at 01:18 +0200, Reindl Harald wrote:
> Am 08.10.2014 um 00:49 schrieb Eric Cunningham:

> > Is there a way to configure URIBL_RHS_DOB conditionally such that if
> > there are issues with dob.sibl.support-intelligence.net like we're
> > seeing, that associated scoring remains neutral rather than increasing
> > (or decreasing)?
> 
> not really - if you get the response from the DNS - well, you are done
> 
> the only exception are dnslists which stop to answer if you excedd the 
> free limit but in that case they answer with a different response what 
> is caught by the rules

Exceeding free usage limit is totally different from the recent DOB
"listing the world" issue.

Also, exceeding the limit is handled in lots of different ways. It
ranges from specific "limit exceeded" results, up to "listing the world"
at the hostile end, or in extreme situations to finally get the admin's
attention. It also includes simply returning nothing but NXDOMAIN, which
is hard to distinguish from proper operation under certain low-listing
conditions.





Re: recent channel update woes

2014-10-07 Thread Karsten Bräckelmann
On Tue, 2014-10-07 at 18:49 -0400, Eric Cunningham wrote:
> Is there a way to configure URIBL_RHS_DOB conditionally such that if 
> there are issues with dob.sibl.support-intelligence.net like we're 
> seeing, that associated scoring remains neutral rather than increasing 
> (or decreasing)?

No. As-is, a correct DNSxL listing is indistinguishable from a false
positive listing.


One possible strategy to detect FP listings would be an additional DNSxL
query of a test-point or known-to-be not listed value. This comes at the
cost of increased load both for the DNSxL as well as SA instance, and
will lag behind due to TTL and DNS caching. The lower the lag, the lower
the caching, the higher the additional load.

By doing such tests not on a per-message basis but per spamd child, or
even having the parent process monitor for possible world-listed
situations, the additional overhead and load could be massively reduced.

Simply monitoring real results (without test queries) likely would not
work. It is entirely possible that really large chunks of the mail
stream continuously result in positive DNSxL listings. Prime candidates
would be PBL hitting botnet spew, or exclusively DNSWL trusted messages
during otherwise low traffic conditions. Distinguishing lots of
consecutive correct listings from false positives would be really hard
and prone to errors.





Re: spamd does not start

2014-10-07 Thread Karsten Bräckelmann
On Tue, 2014-10-07 at 18:55 +0300, Jari Fredrisson wrote:
> I built SA 3.4 using cpan to my old Debian Squeeze-lts.
> 
> root@hurricane:~# time service spamassassin start
> Starting SpamAssassin Mail Filter Daemon: child process [4868] exited or
> timed out without signaling production of a PID file: exit 255 at
> /usr/local/bin/spamd line 2960.
> 
> real0m1.230s

> I read that line in spamd and it talks about two bugs. And a long
> timeout needed. But this dies at once, hardly a timeout?

It states the "child process exited or timed out". Indeed, obviously not
a timeout, so the child process simply exited.

Anything in syslog left by the child?
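
If syslog shows nothing, running spamd in the foreground with debugging
usually reveals the actual reason (a rough sketch; adjust options and
paths to your installation):

  /usr/local/bin/spamd -D -s stderr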





Re: rejected Null-Senders

2014-10-07 Thread Karsten Bräckelmann
On Tue, 2014-10-07 at 17:46 +0200, Reindl Harald wrote:
> can somebody comment in what context null-senders and
> so bounces and probably autorepsonders are blocked
> by "DKIM_ADSP_NXDOMAIN,USER_IN_BLACKLIST"

SA does not block. *sigh*

In this context, the DKIM_ADSP_NXDOMAIN hit is irrelevant, given its low
score. The USER_IN_BLACKLIST hit is what's pushing the score beyond your
SMTP reject threshold.


> DKIM_ADSP_NXDOMAIN,USER_IN_BLACKLIST
> from=<> to=
> 3jC2XD1j8Cz1y: milter-reject: END-OF-MESSAGE

See the whitelist_from documentation for the from / sender type mail
headers SA uses for black- and whitelisting.

The above seems to show the SMTP stage MAIL FROM, which ends up in only
one of those possible headers, and which one depends on your SMTP server
(and milter in your case).
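
For illustration, the kind of entry that triggers USER_IN_BLACKLIST looks
like this (made-up domain; the real entry is somewhere in your own
configuration):

  blacklist_from  *@some-offending-domain.example

and it is matched against those header-level sender addresses, not the
SMTP envelope alone.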


> a customer sends out his yearly members-invitation nad i see some 
> bounces / autrorepsonders pass through and some are blocked with the 
> above tags, at least one from his own outgoing mainserver
> 
> what i don't completly understand is the "DKIM_ADSP_NXDOMAIN" since in 
> case of NXDOMAIN the message trigger the response could not have been 
> delivered and how the "USER_IN_BLACKLIST" comes with a empty sender
> 
> not that i am against block some amount of backscatters, i just want to 
> understand the conditions




Re: SpamAssassin false positive bayes with attachments

2014-10-06 Thread Karsten Bräckelmann
On Mon, 2014-10-06 at 09:03 -0400, jdime abuse wrote:
> I have been seeing some issues with bayes detection from base64
> strings within attachments causing false positives.
> 
> Example:
> Oct  6 09:02:14.374 [15869] dbg: bayes: token 'H4f' => 0.71186828264
> Oct  6 09:02:14.374 [15869] dbg: bayes: token 'wx2' => 0.68644662127
> Oct  6 09:02:14.374 [15869] dbg: bayes: token 'z4f' => 0.68502147581
> Oct  6 09:02:14.378 [15869] dbg: bayes: token '0vf' => 0.66604823748
> 
> Is there a solution to prevent triggering bayes from the base64 data
> in an attachment? It was my impression that attachments should not
> trigger bayes data, but it seems that it is parsing it as text rather
> than an attachment.

Bayes tokens are basically taken from rendered, textual body parts (and
mail headers). Attachments are not tokenized.

Unless the message's MIME-structure is severely broken, these tokens
appear somewhere other than a base64 encoded attachment. Can you provide
a sample uploaded to a pastebin?





Administrivia (was: Re: recent channel update woes)

2014-10-06 Thread Karsten Bräckelmann
On Mon, 2014-10-06 at 13:36 -0400, Kevin A. McGrail wrote:
> On 10/6/2014 1:23 PM, Kevin A. McGrail wrote:
> > On 10/6/2014 1:11 PM, Jason Goldberg wrote:

> > > How to i get removed from this stupid list.
> > >
> > > I love begin spammed by a list about spam which i did not signup for.
> >
> > Email users-h...@spamassassin.apache.org and the system will mail you 
> > instructions.
> >
> > If you did not sign up for the list, that is very troublesome and we 
> > can ask infrastructure to research but I believe we have a 
> > confirmation email requirement to get on the list. 

First of all: Jason's posts are stuck in moderation. The sender address
he uses is not the one he subscribed with.

Sidney and I (both list moderators) have been contacting Jason off-list
with detailed instructions how to find the subscribed address and
offering further help.


> Obviously we take this very seriously as anti-spammers because the 
> definition I follow for spam is it's about consent not content.  If you 
> didn't consent to receive these emails, we have a major issue.

The list server requires clear and active confirmation of the
subscription request by mail, validating both the address as well as
consent.


> I've confirmed we have a confirmation email process in place that 
> requires the subscribee to confirm the subscription request.  And I 
> believe this has been in place for many years.  So if you did not 
> subscribe to the list or confirm the subscription, you may need to check 
> if your email address credentials have been compromised as that's the 
> second most likely scenario for the cause beyond an administrator adding 
> you directly.
> 
> Karsten, any thoughts other than if a list administrator added them 
> directly?   Have infrastructure check the records for when and how the 
> subscriber was added?  Open a ticket with Google?

He has not been added by a list administrator.

Without the subscribed address, there is absolutely nothing we can do. I
grepped the subscription list and transaction logs for parts of Jason's
name and company. The address in question is entirely different.


Just to give some answers. This issue should further be handled
off-list.





Re: running own updateserver

2014-10-05 Thread Karsten Bräckelmann
On Wed, 2014-10-01 at 13:19 +0200, A. Schulze wrote:
> Hello,
> 
> I had the idea to run my own updateserver for two purposes:
>   1. distribute own rules
>   2. override existing rules
> 
> But somehow I fail on #2.
> 
> 
> SA rules normally reside in /var/.../spamassassin/$SA-VERSION/channelname/*.cf
> Also the are files /var/.../spamassassin/$SA-VERSION/channelname.cf  
> including the real files in channelname/
> 
> Now I had some rules overriding existing SA rules in  
> /etc/mail/spamassassin/local.cf
> These rules I moved to my own channelname and now the defaults from  
> updates_spamassassin_org
> are active again.
> 
> My guess: rules are included in lexical order from  

Correct.

> /var/.../spamassassin/$SA-VERSION/channelname.cf
> and my new channel spamassassin_example_org is *not after*  
> updates_spamassassin_org
> 
> I proved my guess by "renaming" the channelfiles to z_spamassassin_example_org
> ( adjusted the .cf + include also )
> 
> Immediately the intended override was active again.
> 
> Is my guess right?

Yes.

> If so, any (other then renaming the channel) chance to modify the order?

No. The directory name and accompanying cf file are generated by
sa-update based on the channel name. There is no way for the channel to
enforce order.

Besides picking a channel name that lexicographically comes after the
to-be-overridden target channel, you're limited to local post sa-update
rename or symlink hacks with additional maintenance cost.
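
To illustrate the lexical order sa-update ends up with (paths and
version directory are examples and differ per installation):

  /var/lib/spamassassin/3.004000/updates_spamassassin_org.cf       # read first
  /var/lib/spamassassin/3.004000/z_spamassassin_example_org.cf     # read later, overrides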





Re: bad local parts (thisisjusttestletter)

2014-10-04 Thread Karsten Bräckelmann
On Sun, 2014-10-05 at 02:43 +0200, Reindl Harald wrote:
> Am 05.10.2014 um 02:27 schrieb Karsten Bräckelmann:
> > On Sun, 2014-10-05 at 01:53 +0200, Reindl Harald wrote:
> >> Am 05.10.2014 um 01:41 schrieb Karsten Bräckelmann:
> >>> On Sat, 2014-10-04 at 22:15 +0200, Reindl Harald wrote:

> >>>> i recently found "thisisjusttestletter@random-domain" as sender as well
> >>>> as "thisisjusttestletter@random-of-our-domains" as RCPT in my logs and
> >>>> remember that crap for many years now
> >>>
> >>> Surely, SA would never see that message, since that's not an actual,
> >>> valid address at your domain. And you're not using catch-all, do you?
> >>>
> >>> (Yes, that question is somewhere between rhetoric and sarcastic.)
> >>
> >> but "thisisjusttestletter@random-domain" is a valid address in his
> >> domain until you prove the opposite with sender-verification and it's
> >> drawbacks
> >
> > Correct. And it is unsafe to assume any given address local part could
> > not possibly be valid and used as sender address in ham.
> 
> most - any excludes that one honestly

I would agree, gladly. If only I would not have these pictures in my
head of an admin creating that as a deliverability testing address. Same
ball park as a Subject of "test". I almost can hear his accent...


> > If at all, such tests should be assigned a low-ish score, not used in
> > SMTP access map blacklisting. However, I seriously doubt it's actually
> > worthwhile to maintain such rules.
> 
> agreed - i only asked if there are known other local parts
> of that sort because i noticed that one at least 5 years
> ago as annoying

Annoying? That was before using SA and with using catch-all, right?

So it was annoying back then. Doesn't explain why you're chasing it
today. How many of them can you find in your logs? Even including its
variants (e.g. "atall" appended), I assume the total number to be really
low. And, frankly, exclusively existent in SMTP logs rejecting the
message.

Unless there still is catch-all in effect, that should have been axed
some 10 years ago.





Re: bad local parts (thisisjusttestletter)

2014-10-04 Thread Karsten Bräckelmann
On Sun, 2014-10-05 at 01:53 +0200, Reindl Harald wrote:
> Am 05.10.2014 um 01:41 schrieb Karsten Bräckelmann:
> > On Sat, 2014-10-04 at 22:15 +0200, Reindl Harald wrote:

> > > i recently found "thisisjusttestletter@random-domain" as sender as well
> > > as "thisisjusttestletter@random-of-our-domains" as RCPT in my logs and
> > > remember that crap for many years now
> > 
> > Surely, SA would never see that message, since that's not an actual,
> > valid address at your domain. And you're not using catch-all, do you?
> >
> > (Yes, that question is somewhere between rhetoric and sarcastic.)
> 
> but "thisisjusttestletter@random-domain" is a valid address in his 
> domain until you prove the opposite with sender-verification and it's 
> drawbacks

Correct. And it is unsafe to assume any given address local part could
not possibly be valid and used as sender address in ham.

If at all, such tests should be assigned a low-ish score, not used in
SMTP access map blacklisting. However, I seriously doubt it's actually
worthwhile to maintain such rules.
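
If one really wanted the low-ish scored SA variant, it would look roughly
like this (rule name and score made up for illustration):

  header   LOCAL_TESTLETTER_FROM  From:addr =~ /^thisisjusttestletter\@/i
  score    LOCAL_TESTLETTER_FROM  1.0
  describe LOCAL_TESTLETTER_FROM  Sender local part seen in delivery probe mail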


> > > well, postfix access maps after switch away from commercial
> > > appliances - are there other well nown local-parts to add
> > > to this list?
> > 
> > What would you need a blacklist of spammy address local parts for? Do
> > not accept messages to SMTP RCPT addresses that don't exist. Do not use
> > catch-all. Problem solved...
> 
> don't get me wrong but you missed the 'i recently found 
> "thisisjusttestletter@random-domain' as sender" at the start of my post

As sender, continued by "as well as [...] as RCPT" using the exact same
local part.

So you just found one such instance in your logs. And yes, I have seen
that very address local part, too, occasionally. Although only in SMTP
logs and AFAIR never ever in SMTP accepted spam, let alone FNs, because
just like your sample, they always sported a similarly invalid RCPT
address.

Did you ever see this in MAIL FROM with a *valid* RCPT TO address?

And did it end up scored low-ish? Below 15? Otherwise, it's just not
worth it.





Re: bad local parts (thisisjusttestletter)

2014-10-04 Thread Karsten Bräckelmann
On Sat, 2014-10-04 at 22:15 +0200, Reindl Harald wrote:
> i recently found "thisisjusttestletter@random-domain" as sender as well 
> as "thisisjusttestletter@random-of-our-domains" as RCPT in my logs and 
> remember that crap for many years now

Surely, SA would never see that message, since that's not an actual,
valid address at your domain. And you're not using catch-all, do you?

(Yes, that question is somewhere between rhetoric and sarcastic.)

> well, postfix access maps after switch away from commercial
> appliances - are there other well nown local-parts to add
> to this list?

What would you need a blacklist of spammy address local parts for? Do
not accept messages to SMTP RCPT addresses that don't exist. Do not use
catch-all. Problem solved...

Other than that, this is an OT postfix question.





Re: Valid TLDs (was: Re: Custom rule not hitting suddenly?)

2014-09-08 Thread Karsten Bräckelmann
On Mon, 2014-09-08 at 21:45 -0500, Dave Pooser wrote:
> On 9/8/14 8:45 PM, "Karsten Bräckelmann"  wrote:
> 
> >There is one down side: A new dependency on Regexp::List [1]. The RE
> >pre-compile one-time upstart penalty should be negligible.
> >
> >[1] Well, or a really, really f*cking ugly option that takes a
> >pre-optimzed qr// blob containing the VALID_TLDS_RE.
> 
> I may be biased as I've been dealing with a different CPAN dependency
> flustercluck recently (love maintainers who can't be bothered to update
> the version info so CPAN doesn't realize there's an update and I have to
> manually un/re install), but I'm a vote for the hideously ugly
> preoptimized blob over adding a new dependency.
> 
> That said, I'd rather have the new dependency than keep the configuration
> embedded in the rules.
  ^
Code, not rules. Which basically is the issue here...

> So, in order of preference:
> 1) Pre-optimized blob
> 2) Regexp::List dependency
> 3) Current method

Got ya. Both (1) and (2) would require code changes, so it's 3.4.1+ only
anyway.

Thanks.





Re: Valid TLDs (was: Re: Custom rule not hitting suddenly?)

2014-09-08 Thread Karsten Bräckelmann
On Mon, 2014-09-08 at 22:37 -0400, listsb-spamassas...@bitrate.net wrote:
> On Sep 8, 2014, at 21.45, Karsten Bräckelmann  wrote:
> 
> > Some discussion of the underlying issue.
> > 
> > On Tue, 2014-09-09 at 02:59 +0200, Karsten Bräckelmann wrote:
> >> At the time of the 3.3.2 release, the .club TLD simply didn't exist. It
> >> has been accepted by IANA just recently. Of course I was conveniently
> >> using a trunk checkout for testing and kind of shrugged off that TLD in
> >> question.
> >> 
> >> FWIW, this is not actually a 3.3.x issue. It's the same with 3.4.0. Yes,
> >> that is a *recent* TLD addition... *sigh*
> > 
> > Unlike the util_rb_[23]tld options, the set of valid TLDs is actually
> > hard-coded. It would not be a problem to make that an option, too.
> > Which, on the plus side, would make it possible to propagate new TLDs
> > via sa-update. Not only 3.3.x would benefit from that, but also 3.4.0
> > instances. Plus, it would be generally faster anyway.
> > 
> > There is one down side: A new dependency on Regexp::List [1]. The RE
> > pre-compile one-time upstart penalty should be negligible.
> > 
> > The question is: Is it worth it?  WILL it be worth it?
> 
> pardon my possible technical ignorance here - could this potentially be
> a network test, rather than a list propagated by sa-update?  e.g.
> query dns for existence of delegation?

This cannot be queried for, because the Valid TLDs (code|option) is what
is used to identify URIs in the first place -- even plain text links
that any normal MUA would linkify.

Apart from that, the list of generic TLDs is not going to change *that*
frequently; the few days sa-update takes between IANA acceptance, SA
incorporating a TLD, and its first occurrence in mail would hardly make
a difference.

And as I hinted at before, (new) generic TLD owners have a vital
interest in their TLD not being mostly abused. If it is, it's not worth the
investment.





Re: Valid TLDs (was: Re: Custom rule not hitting suddenly?)

2014-09-08 Thread Karsten Bräckelmann
On Mon, 2014-09-08 at 22:15 -0400, Daniel Staal wrote:
> --As of September 9, 2014 3:45:33 AM +0200, Karsten Bräckelmann is alleged 
> to have said:
> 
> > This incidence is part of the initial round of IANA accepting generic
> > TLDs. There's hundreds in this wave, and some are abused early. This is
> > moonshine registration, nothing like new TLDs being accepted in the
> > coming years.
> >
> > Or is it? Will new generic TLDs in the future be abused like that, too?
> > How frequently will that happen? Is it worth being able to react to it
> > quickly? How long will URIBLs take to list them? How long will it take
> > for the average MUA to even linki-fy them?
> >
> > Opinions? Discussion in here, or should I move this to dev?
> 
> --As for the rest, it is mine.
> 
> New TLDs will always be abused...

And old ones. "TK, re-naming the web." Yes, sometimes it is valid to add
a point or two for the mere occurrence of a TLD in a URI.

For how long? Whoever applied for a new generic $tld put about 180 grand
on the table. How much is it worth to them to prevent spammers from
tasting domains, and to actually turn their investment into serious,
paying customers?


> Anyway, personal opinion: Spamassassin is currently structured to have code 
> and rules as separate things.  Putting this in the code blurs that - it's a 
> rule.  Unless there is a major performance penalty, I would move it to be 
> with the rest of the rules.  It should make maintenance easier and clearer.

It is not, and would not be, "a rule" as you stated, but configuration.

Apart from that nitpick, I understand you would be in favor of a Valid
TLD option, rather than hard-coded. Noted.





Valid TLDs (was: Re: Custom rule not hitting suddenly?)

2014-09-08 Thread Karsten Bräckelmann
Some discussion of the underlying issue.

On Tue, 2014-09-09 at 02:59 +0200, Karsten Bräckelmann wrote:
> At the time of the 3.3.2 release, the .club TLD simply didn't exist. It
> has been accepted by IANA just recently. Of course I was conveniently
> using a trunk checkout for testing and kind of shrugged off that TLD in
> question.
> 
> FWIW, this is not actually a 3.3.x issue. It's the same with 3.4.0. Yes,
> that is a *recent* TLD addition... *sigh*

Unlike the util_rb_[23]tld options, the set of valid TLDs is actually
hard-coded. It would not be a problem to make that an option, too.
Which, on the plus side, would make it possible to propagate new TLDs
via sa-update. Not only 3.3.x would benefit from that, but also 3.4.0
instances. Plus, it would be generally faster anyway.

There is one down side: a new dependency on Regexp::List [1]. The
one-time startup penalty for pre-compiling the RE should be negligible.

The question is: Is it worth it?  WILL it be worth it?

This incidence is part of the initial round of IANA accepting generic
TLDs. There's hundreds in this wave, and some are abused early. This is
moonshine registration, nothing like new TLDs being accepted in the
coming years.

Or is it? Will new generic TLDs in the future be abused like that, too?
How frequently will that happen? Is it worth being able to react to it
quickly? How long will URIBLs take to list them? How long will it take
for the average MUA to even link-ify them?

Opinions? Discussion in here, or should I move this to dev?

I guess I'd be happy to introduce to you... util_rb_tld.
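Purely hypothetical at this point, but usage presumably would mirror the
existing util_rb_2tld option, something like

  # sketch only, the util_rb_tld option does not exist yet
  util_rb_tld  club website xyz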


[1] Well, or a really, really f*cking ugly option that takes a
pre-optimized qr// blob containing the VALID_TLDS_RE.




Re: Custom rule not hitting suddenly?

2014-09-08 Thread Karsten Bräckelmann
On Mon, 2014-09-08 at 18:08 -0600, Amir Caspi wrote:
> On Sep 8, 2014, at 4:09 PM, Karsten Bräckelmann  
> wrote:
> 
> > Pulled the sample from pastebin and fed to spamassassin -D with your
> > custom rule added as additional configuration. That rule hits.
> 
> It does not hit on mine, and I think I've figured out why.  I'm using
> SA 3.3.2 with perl 5.8.8 on CentOS 5.10.  Yes, I know I should be
> using 3.4, but I haven't yet had a chance to try the RPM that a couple
> of people have built.  Nonetheless, with SA 3.3.2, it appears that the
> URI engine doesn't like the .club TLD.  See below.

Good one. Yes, it's the TLD.


> Sep  8 20:02:58.897 [9267] dbg: rules: ran uri rule AC_ALL_URI ==> got 
> hit: "negative match"
> 
> So, for some reason, the URI engine is not picking out these .club
> URIs, it's getting "negative match."  Is it because the engine in
> 3.3.2 doesn't like that TLD?  To test this, I manually changed the TLD
> of the second spam URI (out.blah) to .us or .org, and then the engine
> picked it out just fine:

At the time of the 3.3.2 release, the .club TLD simply didn't exist. It
has been accepted by IANA just recently. Of course I was conveniently
using a trunk checkout for testing and kind of shrugged off that TLD in
question.

FWIW, this is not actually a 3.3.x issue. It's the same with 3.4.0. Yes,
that is a *recent* TLD addition... *sigh*


> Sep  8 20:03:43.151 [9197] dbg: rules: ran uri rule AC_ALL_URI ==> got 
> hit: "http://out.dosearchcarsonsale.us";
> Sep  8 20:04:35.578 [9227] dbg: rules: ran uri rule AC_ALL_URI ==> got 
> hit: "http://out.dosearchcarsonsale.org";
> 
> So, it seems to me that the URI engine is barfing on the TLD, and
> that's the problem...

> Is there a patch I can apply that would fix this, until I can upgrade to 3.4?

SVN revision 1615088. The "text changed" link shows the diff and
has a link to the plain patch.

  http://svn.apache.org/viewvc?view=revision&revision=1615088

Dunno if the patch applies cleanly to 3.3.2, though.

You also can change M::SA::Util::RegistrarBoundaries manually. As per
the svn diff above, two blobs are involved:  (a) the VALID_TLDS hash
foreach() definition and  (b) the VALID_TLDS_RE.

So you could get those out of trunk and edit RegistrarBoundaries.pm
locally. It also should be possible to simply replace that Perl module
with the current trunk version.
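A rough sketch of the latter (the svn URL is an assumption, adjust to the
actual repository layout):

  # grab the trunk version of the module
  svn cat http://svn.apache.org/repos/asf/spamassassin/trunk/lib/Mail/SpamAssassin/Util/RegistrarBoundaries.pm > RegistrarBoundaries.pm
  # then back up and replace the installed copy (path depends on the distro)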

And last but not least, generation of both these TLD blobs is documented
in the code right before their definition. You can always generate it
fresh.





Re: Custom rule not hitting suddenly?

2014-09-08 Thread Karsten Bräckelmann
On Mon, 2014-09-08 at 11:35 -0600, Amir Caspi wrote:
> One of my spammy URI template rules is, for some reason, not hitting
> any more.  Spample here:
> 
> http://pastebin.com/jy6WZhWW
> 
> In my local.cf sandbox I have the following:
> 
> uri __AC_STOPRANDDOM_URI1 
> /(?:stop|halt|quit|leave|leavehere|out|exit|disallow|discontinue|end)\.[a-z0-9-]{10,}\.(?:us|me|com|club|org|net)\b/
> 
> This is part of my AC_SPAMMY_URI_PATTERNS meta rule, which hits just
> fine on other emails (including others of this particular format).
> 
> Debug output shows this subrule didn't hit anything (that is, the rule
> isn't mentioned at all in the debug output), but regexpal.com says it
> should have hit just fine.

Works for me.

Pulled the sample from pastebin and fed to spamassassin -D with your
custom rule added as additional configuration. That rule hits.


> Could the problem be with the \b delimiter at the end?

No. The word-boundary \b matches not only between a word \w and a
non-word \W char, but also at the beginning or end of the string, if the
adjacent char is a word char.
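Quick check from the shell:

  $ perl -e 'print "match\n" if "example.club" =~ /\bclub\b/'
  match

The trailing \b matches between the final word char and the end of the
string just fine.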

> I've noticed that sometimes can cause issues in failing to hit, but
> usually only when a URI ends with a slash...

That, too, would be unrelated to the \b word-boundary.

What bothers me is that "sometimes" qualification. Either it matches or
it doesn't. If it matches sometimes, something yet unnoticed has a
severe impact.


Did you grep the -D debug output for the hostname? Also try grepping for
URIHOSTS (SA 3.4, without -L local only mode), which lists all hostnames
found in the message.
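Something along the lines of

  spamassassin -D < sample.eml 2>&1 | grep -i urihosts

since the debug output goes to stderr.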


> and this same rule hits other matching URIs in other spams.  However,
> this isn't the first time I've noticed a failure to match... so any
> idea why it's not hitting?  Per the regex rules, it SHOULD be hitting
> fine unless it's the \b...
> 
> Any ideas?

The URI is at the very end of a line with a CRLF delimiter following and
the next line beginning with a word character. If you inject a space
after the URI, does that make the rule match? (That should not be the
issue, just trying to rule out conversion problems.)

Also I noticed the headers are CRLF delimited, too. How did you get that
sample? Any chance it has been modified or re-formatted by a text editor
and does not equal the raw, original message?

Does the pastebin uploaded file still not trigger the rule for you?





Re: Bayes autolearn questions

2014-09-06 Thread Karsten Bräckelmann
Please use plain-text rather than HTML. In particular with that really
bad indentation format of quoting.


On Sat, 2014-09-06 at 17:22 -0400, Alex wrote:
> On Thu, Sep 4, 2014 at 1:44 PM, Karsten Bräckelmann wrote:
> > On Wed, 2014-09-03 at 23:50 -0400, Alex wrote:
> >
> > > > > I looked in the quarantined message, and according to the _TOKEN_
> > > > > header I've added:
> > > > >
> > > > > X-Spam-MyReport: Tokens: new, 47; hammy, 7; neutral, 54; spammy, 16.
> > > > >
> > > > > Isn't that sufficient for auto-learning this message as spam?
> > 
> > That's clearly referring to the _TOKEN_ data in the custom header, is it
> > not?
> 
> Yes. Burning the candle at both ends. Really overworked.

Sorry to hear. Nonetheless, did you take the time to really understand
my explanations? It seems you sometimes didn't in the past, and I am not
happy to waste my time on other people's problems if they aren't
following thoroughly.


> > > > That has absolutely nothing to do with auto-learning. Where did you get
> > > > the impression it might?
> > >
> > > If the conditions for autolearning had been met, I understood that it
> > > would be those new tokens that would be learned.
> >
> > Learning is not limited to new tokens. All tokens are learned,
> > regardless their current (h|sp)ammyness.
> >
> > Still, the number of (new) tokens is not a condition for auto-learning.
> > That header shows some more or less nice information, but in this
> > context absolutely irrelevant information.
> 
> I understood "new" to mean the tokens that have not been seen before, and
> would be learned if the other conditions were met.

Well, yes. So what?

Did you understand that the number of previously not seen tokens has
absolutely nothing to do with auto-learning? Did you understand that all
tokens are learned, regardless whether they have been seen before?

This whole part is entirely unrelated to auto-learning and your original
question.


> > Auto-learning in a nutshell: Take all tests hit. Drop some of them with
> > certain tflags, like the BAYES_xx rules. For the remaining rules, look
> > up their scores in the non-Bayes scoreset 0 or 1. Sum up those scores to
> > a total, and compare with the auto-learn threshold values. For spam,
> > also check there are at least 3 points each by header and body rules.
> > Finally, if all that matches, learn.
> 
> Is it important to understand how those three points are achieved or
> calculated?

In most cases, no, I guess. Though that is really just a distinction
usually easy to do based on the rule's type: header vs body-ish rule
definitions.

If the re-calculated total score in scoreset 0 or 1 exceeds the
auto-learn threshold but the message still is not learned -- then it is
important. Unless you trust the auto-learn discriminator not to cheat on
you.


> > > Okay, of course I understood the difference between points and tokens.
> > > Since the points were over the specified threshold, I thought those
> > > new tokens would have been added.
> >
> > As I have mentioned before in this thread: It is NOT the message's
> > reported total score that must exceed the threshold. The auto-learning
> > discriminator uses an internally calculated score using the respective
> > non-Bayes scoreset.
> 
> Very helpful, thanks. Is there a way to see more about how it makes that
> decision on a particular message?

  spamassassin -D learn

Unsurprisingly, the -D debug option shows information on that decision.
In this case limiting debug output to the 'learn' area comes in handy,
eliminating the noise.

The output includes the important details like the auto-learn decision
with a human-readable explanation, the score computed for auto-learning,
as well as header and body points.





Re: shouldn't "spamc -L spam" always create BAYES_99?

2014-09-06 Thread Karsten Bräckelmann
On Sun, 2014-09-07 at 09:09 +1200, Jason Haar wrote:
> We've got a problem with a tonne of spam getting BAYES_50 or even
> BAYES_00. We're re-training SA using "spamc -L spam" but it doesn't seem
> to do as much as we'd like. Sometimes it doesn't change the BAYES_
> score, and other times it might go from BAYES_50 to BAYES_80
> 
> I think bayes is working (there's also a tonne of mail getting BAYES_99)
> but I'm guessing there's some "learning logic" I'm not aware of to
> explain why me telling SA "this is spam" doesn't seem to be entirely
> listened to?

The Bayesian classifier operates on tokens, not messages. So while
training a message as spam is like "this is spam" as you put it,
according to Bayes it's "these tokens appear in spam".

For each token (think of them as words), the number of ham and spam
messages it appeared in and has been learned from is counted. The higher
that ratio, the higher the probability that a later message containing
the token will be classified the same way.
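Those per-token counts can be inspected directly if needed, e.g.

  sa-learn --dump data | less

which lists each token in the database along with its spam and ham counts.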


> So my question is: shouldn't "-L spam"/"-L ham" always make SA re-train
> the bayes more explicitly? Or is that really not possible with a single
> email message? (ie it's a statistics thing). Just trying to understand
> the backend :-)

It's statistics. Learning (increasing the number of ham or spam a token
has been seen in) has less effect for tokens seen about equally
frequently in both ham and spam than for tokens where there already is a
bias. Similarly, tokens with high counts need more training to change
their overall probability than tokens less common in mail. IOW, words
like "and" will never be a strong spammyness indicator.


For more details on that entire topic of Bayes and training, I suggest
the sa-learn man page / documentation. For a closer look at the tokens
used for classification see the hammy/spammytokens Template Tags in the
M::SA::Conf docs. Both available here:

  http://spamassassin.apache.org/doc/

For ad-hoc debugging after training see the spamassassin --cf option to
add_header the token details without a need to actually add them to
every mail.
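For example, a one-off token summary without touching the site config:

  spamassassin --cf='add_header all MyReport Tokens: _TOKENSUMMARY_' \
    < sample.eml | grep -i x-spam-myreport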





Re: Large commented out body HTML causing SA to timeout/give up/allow spam

2014-09-05 Thread Karsten Bräckelmann
On Fri, 2014-09-05 at 11:55 -0400, Justin Edmands wrote:
> We are seeing a few emails that are about a 1MB and [...]

> dbg: timing: total 46640 ms

> BUT, because the live test likely took 46 seconds, I think SA is
> giving up or something similar. The actual email run through the live
> SA instance shows no score at all.

If SA timed out, this would be reflected in your logs. Your guessing
suggests you did not check logs.

How are you passing messages to SA? Using spamc/d? With spamc the size
limit of messages it will process is 500 kByte by default. Other methods
and glue are likely to have a size limit, too.

Odds are, that message simply has not been passed to SA.
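If spamc is the culprit, its limit can be raised, e.g.

  spamc -s 2097152 < message.eml

with the size given in bytes. Any glue in front of spamc may well enforce
its own limit on top of that.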





Re: correct AWL on training

2014-09-04 Thread Karsten Bräckelmann
On Fri, 2014-09-05 at 01:05 +0200, Karsten Bräckelmann wrote:
> The AWL manipulating options are rather limited, offering addition of a
> high scoring positive or negative entry, or plain removal of an address.
> In particular unlike Bayes, AWL doesn't work on a per-message basis.
> Forgetting a single message's history entry is not supported.

In related news: The AWL plugin was enabled by default in 3.1 and 3.2,
disabled by default again since 3.3.

TxRep is a proposed replacement (see bugzilla). It might be worth
evaluating whether it better addresses the features you'd benefit from
in this case, including forgetting or correcting per-message entries.
Since it still is under development, even feature requests or discussing
these issues for TxRep might be worth it.





Re: correct AWL on training

2014-09-04 Thread Karsten Bräckelmann
On Thu, 2014-09-04 at 09:11 -0600, Jesse Norell wrote:
> On Thu, 2014-09-04 at 13:04 +0200, Matus UHLAR - fantomas wrote:
> > On 03.09.14 15:13, Jesse Norell wrote:

> > >   Both today and in the past I've looked at some FP's that scored very
> > > high on AWL.  At least today I dug up the old messages that caused AWL
> > > to get out of line, and trained them as ham.  AWL's scores still show
> > > the high scores on those (in this case I manually corrected AWL).  It
> > > sure seems like manual training should at minimum remove the incorrect
> > > score from AWL, if not actually make an adjustment in the opposite
> > > direction.

I can see how one could wish for this.

However, keep in mind those are entirely unrelated sub-systems. The AWL
really only is a rather simple historic score-averager.

In this context it is also important to note that sa-learn is Bayes
only. Any other type of reporting goes through spamc or spamassassin,
including AWL manipulation. The spamassassin executable notably is the
only one that can actually handle both.

The AWL manipulating options are rather limited, offering addition of a
high scoring positive or negative entry, or plain removal of an address.
In particular unlike Bayes, AWL doesn't work on a per-message basis.
Forgetting a single message's history entry is not supported.
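For completeness, invocation of the per-address variants looks like this
(address made up for the example):

  spamassassin --add-addr-to-whitelist=sender@example.com
  spamassassin --remove-addr-from-whitelist=sender@example.com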


> > spamassassin has options for manipulating adress list:
> > --add-to-whitelist --add-to-blacklist --remove-from-whitelist
> > --add-addr-to-whitelist --add-addr-to-blacklist --remove-addr-from-whitelist
> > 
> > and you can clean up AWL by using sa-awl.
> 
>   I can as an admin, but pop/imap users can't.  They can access the
> spam/ham training, it just doesn't correct the AWL data any.  In this

So you implemented a feedback / training mechanism for Bayes for your
POP or IMAP users. SA doesn't provide it.

> case I'm looking at, a few messages came in first that got AWL way off,
> and now training it as ham (which is hard enough to get users to do)
> doesn't help the situation.  (Some of our systems allow the user access
> to whitelist, but unfortunately this one doesn't - they can't "fix it".)

Bayes training will have an effect of ~5.5 at max, which is the spread
between BAYES_00 and BAYES_999. The real-life effect of training is
commonly about half of that max. This is unlikely to suffice against
"way off" AWL scores. Besides, you're trying to correct AWL by Bayes
training.


The question is: Why was the AWL score way off in the first place?

In your FP case, why have messages (more than one?) from that sender
address, originating from a given net-block, been classified as spam
before? Even worse, given the AWL now was "way off" and pulled the score
above the threshold, the previous messages recorded in the AWL are not
just spam, but spam with a high score. Again, why?


> > >   Ie. after training, AWL had score of ~47 from 7 messages.  Seems like
> > > those FP scores should be subtracted, and even another -5 per message
> > > trained wouldn't hurt.  Likewise, FN should adjust AWL upwards on manual
> > > training, no?
> > 
> > I am not sure how should the manual training be done when talking about AWL.
> > The only way I think is to remove the address from AWL.
> 
>   Just adjust the score would be another option.  "AWL, you got it
> wrong, lets take the score the other direction."  (or at least undue the
> mistake/damage it just did)  You could have a config option for how much
> adjustment to make in the other direction (maybe 3 to 5ish?).




Re: A rule for Phil

2014-09-04 Thread Karsten Bräckelmann
On Thu, 2014-09-04 at 13:54 -0600, Philip Prindeville wrote:
> On Sep 3, 2014, at 7:36 PM, Karsten Bräckelmann  
> wrote:

> >> header __KAM_PHIL1    To =~ /phil\@example\.com/i
> >> header __KAM_PHIL2    Subject =~ /(?:CV|Curriculum)/i
> > 
> > Bonus points for using non-matching grouping. But major deduction of
> > points for that entirely un-anchored case insensitive 'cv' substring
> > match.
> 
> I’d anchor both matches,

Generally correct, of course. For anchoring the To header regex, I
suggest using the To:addr variant I used in my rules. That way the
address easily can be anchored at the beginning /^ and end $/ of the
whole string, which equals the address. Without the :addr option, proper
anchoring is a real mess.
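That is, something like

  header __PHIL_TO  To:addr =~ /^phil\@example\.com$/i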

> or else  will fire.

Granted, the To header is cosmetic and does not necessarily hold the
actual recipient address. However, since example.com is the OP's domain
(so to speak), it is unlikely he'll receive mail with addresses like
that. ;)





Re: Bayes autolearn questions

2014-09-04 Thread Karsten Bräckelmann
On Wed, 2014-09-03 at 23:50 -0400, Alex wrote:

> > > I looked in the quarantined message, and according to the _TOKEN_
> > > header I've added:
> > > 
> > > X-Spam-MyReport: Tokens: new, 47; hammy, 7; neutral, 54; spammy, 16.
> > > 
> > > Isn't that sufficient for auto-learning this message as spam?

That's clearly referring to the _TOKEN_ data in the custom header, is it
not?

> > That has absolutely nothing to do with auto-learning. Where did you get
> > the impression it might?
> 
> If the conditions for autolearning had been met, I understood that it
> would be those new tokens that would be learned.

Learning is not limited to new tokens. All tokens are learned,
regardless of their current (h|sp)ammyness.

Still, the number of (new) tokens is not a condition for auto-learning.
That header shows some more or less nice information, but in this
context absolutely irrelevant information.


Auto-learning in a nutshell: Take all tests hit. Drop some of them with
certain tflags, like the BAYES_xx rules. For the remaining rules, look
up their scores in the non-Bayes scoreset 0 or 1. Sum up those scores to
a total, and compare with the auto-learn threshold values. For spam,
also check there are at least 3 points each by header and body rules.
Finally, if all that matches, learn.


> Okay, of course I understood the difference between points and tokens.
> Since the points were over the specified threshold, I thought those
> new tokens would have been added.

As I have mentioned before in this thread: It is NOT the message's
reported total score that must exceed the threshold. The auto-learning
discriminator uses an internally calculated score using the respective
non-Bayes scoreset.





Re: A rule for Phil

2014-09-03 Thread Karsten Bräckelmann
On Wed, 2014-09-03 at 17:18 -0400, Kevin A. McGrail wrote:
> On 9/3/2014 5:14 PM, Karsten Bräckelmann wrote:
> > > > The specified criteria are trivial, and can be easily translated into
> > > > rules. [...]

> > header __PHIL_TO    To:addr =~ /phil\@example.com/i
> > header __PHIL_SUBJ  Subject =~ /\b(cv|curriculum)\b/i
> >
> > meta PHIL_CURRICULUM  __PHIL_TO && __PHIL_SUBJ
> > describe PHIL_CURRICULUM  CV for Phil
> > score    PHIL_CURRICULUM  -2
> >
> > meta PHIL_NOT_CURRICULUM  __PHIL_TO && !__PHIL_SUBJ
> > describe PHIL_NOT_CURRICULUM  Not a CV for Phil
> > score    PHIL_NOT_CURRICULUM  1

> It appears I did not email the list my response but should provide an 
> interesting exercise if only to see how similar our approach was:

Which isn't much of a surprise. It's practically the very translation of
the stated requirements into simple logic and regex header rules. ;)


> header __KAM_PHIL1    To =~ /phil\@example\.com/i
> header __KAM_PHIL2    Subject =~ /(?:CV|Curriculum)/i

Bonus points for using non-matching grouping. But major deduction of
points for that entirely un-anchored case insensitive 'cv' substring
match.

(As a matter of principle, since that's a seriously short substring
match. Granted, that char combination is pretty rare in dict/words.)





Re: A rule for Phil

2014-09-03 Thread Karsten Bräckelmann
On Wed, 2014-09-03 at 12:30 +0200, Luciano Rinetti wrote:
> Thank You for the answer Karsten,
> you have right, Phil doesn't exists, (as example.com) but i hide the
> real address for obvious reasons, and it is a "role" email that i want
> will receive only mail with subject "CV" or "Curriculum" and all the
> general mail will be treated and scored as spam.
> My intention are not "top secret", i will be glad even only if you
> address me to the "SA conf docs" or "the rule-writing wiki".

Let me google that for you. The first result should be the SA wiki
WritingRules page as a starter.

  http://lmgtfy.com/?q=spamassassin+rule+writing


> Il 03/09/2014 05:21, Karsten Bräckelmann ha scritto:
> > On Mon, 2014-09-01 at 07:36 +0200, Luciano Rinetti wrote:

> > > I need a rule that, when a message is sento to p...@example.com
> > > and the Subject contains "CV" or "Curriculum", scores the message with -9
> > > and a rule that, when a message is sent to to p...@example.com
> > > and the Subject doesn't contains CV or Curriculum, scores the message 
> > > with 7

> > The specified criteria are trivial, and can be easily translated into
> > rules. Reading the SA conf docs and maybe some of the rule-writing wiki
> > docs should enable the reader to do exactly that. (Hint: meta rules)

Oh well, here goes. Untested.

header __PHIL_TO    To:addr =~ /phil\@example.com/i
header __PHIL_SUBJ  Subject =~ /\b(cv|curriculum)\b/i

meta PHIL_CURRICULUM  __PHIL_TO && __PHIL_SUBJ
describe PHIL_CURRICULUM  CV for Phil
score    PHIL_CURRICULUM  -2

meta PHIL_NOT_CURRICULUM  __PHIL_TO && !__PHIL_SUBJ
describe PHIL_NOT_CURRICULUM  Not a CV for Phil
score    PHIL_NOT_CURRICULUM  1

Do note though, that this approach is NOT fool-proof. Messages
containing a CV still can end up classified spam for various reasons.





Re: Bayes autolearn questions

2014-09-02 Thread Karsten Bräckelmann
On Tue, 2014-09-02 at 21:16 -0600, LuKreme wrote:
> On 02 Sep 2014, at 20:50 , Karsten Bräckelmann  wrote:
> > On Tue, 2014-09-02 at 20:22 -0600, LuKreme wrote:

> >> I believe the score threshold is the base score WITHOUT bayes.
> >> 
> >> Try running the email through with a -D flag and see what you get.
> >> 
> >> (And that is only a partial answer, the threshold number ignores
> >> certain classes of tests beyond bayes,but I don't remember which ones.
> >> It's unfortunate that the learn_threshold_spam uses a number that
> >> appears to be related to the spam score, because it isn't.
> > 
> > It is. Using the accompanying, non-Bayes score-set. To avoid direct
> > Bayes self-feeding, and other rules indirect self-feeding due to Bayes-
> > enabled scores.
> > 
> > BTW, if one knows of that mysterious (bayes_auto_) learn_threshold_spam
> > you mentioned, one found the AutoLearnThreshold doc mentioning exactly
> > that: Bayes auto-learning is based on non-Bayes scores.
> 
> But that is not the case, You can have a score without bayes that
> exceeds the threshold and still have the message not auto learned.

True.

I chose to not repeat myself highlighting the details and mentioning the
constraint of header and body rules' points. See my other post half an
hour earlier to this thread. And the docs.





Re: A rule for Phil

2014-09-02 Thread Karsten Bräckelmann
On Mon, 2014-09-01 at 07:36 +0200, Luciano Rinetti wrote:
> I need a rule that, when a message is sento to p...@example.com
> and the Subject contains "CV" or "Curriculum", scores the message with -9

Scoring the message with $number is impossible and not how SA works.
Triggering a rule with a negative score (e.g. -9) is possible.

> and a rule that, when a message is sent to to p...@example.com
> and the Subject doesn't contains CV or Curriculum, scores the message 
> with 7

Same. Won't "score the message with 7", but can trigger a rule worth
some points.


The specified criteria are trivial, and can be easily translated into
rules. Reading the SA conf docs and maybe some of the rule-writing wiki
docs should enable the reader to do exactly that. (Hint: meta rules)

However, since this request is just too simple, and way too easy to
shoot oneself in the foot with, I'll spend more time on this explanation
than simply dumping the requested flawed rules would take.

What are you actually after? What is your problem?

And why would Phil distinguish that strongly between Subject-tagged mail
and general mail to him? Sure, because it's not Phil but a role account.
But you chose to disguise the purpose, so it's harder for us to help
you.

It's easier if you don't try to hide your actual question.





Re: Bayes autolearn questions

2014-09-02 Thread Karsten Bräckelmann
On Tue, 2014-09-02 at 20:22 -0600, LuKreme wrote:
> On 02 Sep 2014, at 19:11 , Alex  wrote:
> 
> > However, spam with scores greater than 9.0 aren't being autolearned:
> 
> I believe the score threshold is the base score WITHOUT bayes.
> 
> Try running the email through with a -D flag and see what you get.
> 
> (And that is only a partial answer, the threshold number ignores
> certain classes of tests beyond bayes,but I don't remember which ones.
> It's unfortunate that the learn_threshold_spam uses a number that
> appears to be related to the spam score, because it isn't.

It is. Using the accompanying, non-Bayes score-set. This avoids direct
Bayes self-feeding, and indirect self-feeding of other rules due to
Bayes-enabled scores.

BTW, if one knows of that mysterious (bayes_auto_) learn_threshold_spam
you mentioned, one has found the AutoLearnThreshold doc mentioning
exactly that: Bayes auto-learning is based on non-Bayes scores.





Re: Bayes autolearn questions

2014-09-02 Thread Karsten Bräckelmann
On Tue, 2014-09-02 at 21:11 -0400, Alex wrote:
> I have a spamassassin-3.4 system with the following bayes config:
> 
> required_hits 5.0
> rbl_timeout 8
> use_bayes 1
> bayes_auto_learn 1
> bayes_auto_learn_on_error 1
> bayes_auto_learn_threshold_spam 9.0
> bayes_expiry_max_db_size 950
> bayes_auto_expire 0
> 
> However, spam with scores greater than 9.0 aren't being autolearned:

http://spamassassin.apache.org/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html


> Sep  2 21:01:51 mail01 amavis[25938]: (25938-10)
> header_edits_for_quar:  ->
> , Yes, score=16.519 tag=-200 tag2=5 kill=5
> tests=[BAYES_50=0.8, KAM_LAZY_DOMAIN_SECURITY=1, KAM_LINKBAIT=5,
> LOC_DOT_SUBJ=0.1, LOC_SHORT=3.1, RCVD_IN_BL_SPAMCOP_NET=1.347,
> RCVD_IN_BRBL_LASTEXT=1.449, RCVD_IN_PSBL=2.3,
> RCVD_IN_UCEPROTECT1=0.01, RCVD_IN_UCEPROTECT2=0.01, RDNS_NONE=0.793,
> RELAYCOUNTRY_CN=0.1, RELAYCOUNTRY_HIGH=0.5, SAGREY=0.01] autolearn=no
> autolearn_force=no
> 
> I've re-read the autolearn section of the docs,

The one I linked to above?

> and don't see any reason why this 16-point email wouldn't have any new
> tokens to be learned?

Rules with certain tflags are ignored when determining whether a message
should be trained upon. Most notably here BAYES_xx.

Moreover, the auto-learning decision occurs using scores from either
scoreset 0 or 1, that is using scores of a non-Bayes scoreset. IOW the
message's score of 16 is irrelevant, since the auto-learn algorithm uses
different scores per rule.

Next safety net is requiring at least 3 points each from header and body
rules, unless autolearn_force is enabled. Which it is not in your
sample.

Either of those could have prevented auto-learning.


Also, according to your wording, you seem to think in terms of (number
of) "new tokens to be learned". Which has nothing in common with
auto-learning.

(Even worse, "new tokens" would strongly apply to random gibberish
strings, hapaxes in Bayes context. Which are commonly ignored in Bayes
classification.)


> I looked in the quarantined message, and according to the _TOKEN_
> header I've added:
> 
> X-Spam-MyReport: Tokens: new, 47; hammy, 7; neutral, 54; spammy, 16.
> 
> Isn't that sufficient for auto-learning this message as spam?

That has absolutely nothing to do with auto-learning. Where did you get
the impression it might?


> I just wanted to be sure this is just a case of not enough new points
> (tokens?) for the message to be learned, and that I I wasn't doing
> something wrong.

Points: aka score, used in the context of per-rule (per-test) and
overall score classifying a message based on the required_score setting.

Token: think of it as a "word" used by the Bayesian classifier sub-system.
In practice, it is more complicated than simply space-separated words.
Context (e.g. headers) and case might be taken into account, too.





Re: no subject tagging in case of "X-Spam-Status: Yes"

2014-08-29 Thread Karsten Bräckelmann
On Fri, 2014-08-29 at 12:02 +0200, Reindl Harald wrote:
> Am 29.08.2014 um 04:03 schrieb Karsten Bräckelmann:

> > Now, moving forward: I've had a look at the message diffs. Quite
> > interesting, and I honestly want to figure out what's happening.
> 
> it looks really like spamass-milter is responsible
> 
> in the second version below it whines it can't extract
> the score to decide if it's above reject and so it
> really looks like the milter heavily relies on headers

Yay for case in-sensitive parsing...

> found that out much later last night by plaing with headers in general
> 
> spamass-milter[14891]: Could not extract score from  Tag-Level=5.0, Block-Level=10>
> 
> add_header all Status _YESNO_, score=_SCORE_, tag-level=_REQD_, block-level=10
> add_header all Status _YESNO_, Score=_SCORE_, Tag-Level=_REQD_, Block-Level=10

If you use the SA default Status header, or at least the prefix
containing score and required, is header rewriting retained by the
milter without the Flag header?

  add_header all Status "_YESNO_, score=_SCORE_ required=_REQD_ ..."

Given that log line, a likely explanation simply is that the milter
needs to determine the spam status, to decide which SA-generated headers
to apply to the message. Your choice of custom Status header is not what
the milter expects, so it has to resort to the simple Flag header.

(Note the comma after yes/no, but no comma between score and required.)


> > First of all, minus all those different datetime strings, IDs and
> > ordering, the real differences are
> > 
> >   -Subject: [SPAM] Test^M
> >   -X-Spam-Flag: Yes^M
> > 
> >   +Subject: Test^M
> > 
> > So it appears that only the sample with add_header spam Flag has the
> > Subject re-written.
> 
> correct
> 
> > However, there's something else going on. When re-writing the Subject
> > header, SA adds an X-Spam-Prev-Subject header with the original. Which
> > is clearly missing.
> 
> the version is killed in smtp_header_checks which is also
> the reason that i started to play around with headers
> 
> nobody but me has a reason to know exact versions of running software

Previous-Subject, not Version.

I mentioned this specifically, because the absence of the Previous
Subject header despite the Subject rewrite clearly shows that SA-generated
headers are not unconditionally added to the message, but single headers
are cherry-picked.

IOW, header rewriting does work without the Flag header. It is the glue
that decides whether to inherit the rewritten header, and outright
ignores the Previous Subject header.


> > Thus, something else has a severe impact on which headers are added or
> > modified. In *both* cases, there is at least one SA generated header
> > missing and/or SA modified header not preserved.




Re: Spam info headers

2014-08-29 Thread Karsten Bräckelmann
On Fri, 2014-08-29 at 00:30 -0400, Alex wrote:
> Regarding report_safe, the docs say it can only be applied to spam. Is
> that correct?

Yes, it only applies to spam. It defines whether classified spam will be
attached to a newly generated reporting message, or only modified by
adding some X-Spam headers.

Ham will never get wrapped in another message by SA...
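For reference, the three possible values:

  report_safe 0   # spam only gets X-Spam-* headers added
  report_safe 1   # spam gets attached as message/rfc822 to a new report message (default)
  report_safe 2   # same, but attached as text/plain, i.e. defanged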





Re: remove_header not working?

2014-08-29 Thread Karsten Bräckelmann
On Fri, 2014-08-29 at 11:46 +0200, Axb wrote:
> Those reports are "added" by Exim's interface which does not seem to 
> respect the local.cf directives.

Exim accessing SA template tags?


> On 08/29/2014 11:29 AM, Fürtbauer Wolfgang wrote:
> > unfortunatelly not, X-Spam-Reports are still there

If the option report_safe 0 is set, SA automatically adds a Report
header, though only to spam. Equivalent to

  add_header spam  Report _REPORT_


The following is not only added to ham, but its contents are not the
_REPORT_ template tag but resemble the default "report" template, the
body text used for spam with report_safe 1.

There is no template tag to access the "report" template. Thus, this
header must be defined somewhere in the configuration, complete with all
that text, embedded \n newlines and _PREVIEW_ and _SUMMARY_ template
tags.
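Grepping the site config, and whatever Exim's interface feeds to SA,
should turn up the culprit. For example (directories are a guess, adjust
to the local layout):

  grep -rn add_header /etc/mail/spamassassin/ /etc/spamassassin/ 2>/dev/null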

> > X-Spam-Report: Spam detection software, running on the system
> >   "hausmeister.intern.luisesteiner.at",
> >   has NOT identified this incoming email as spam.  The original
> >   message has been attached to this so you can view it or label
> >   similar future email.  If you have any questions, see
> >   postmaster for details.
> >
> >   Content preview:  [...]

> >   Content analysis details:   (-221.0 points, 5.0 required)
> >
> >pts rule name  description
> >    -- 
> > --
> >   -100 USER_IN_WHITELIST  From: address is in the user's white-list


> > X-Spam-Report: Software zur Erkennung von "Spam" auf dem Rechner
> >   aohsupport02.asamer.holding.ah

Are there really *two* X-Spam-Report headers?

Also, why is this one in German? SA doesn't mix languages during a
single run.

Why do the hostnames differ?

And, well, which hostmaster fat-fingered that ccTLD?





Re: Advice on how to block via a mail domain in maillog

2014-08-29 Thread Karsten Bräckelmann
On Fri, 2014-08-29 at 12:43 -0600, Philip Prindeville wrote:
> On Aug 29, 2014, at 6:45 AM, Kevin A. McGrail  wrote:
> > On 8/29/2014 5:48 AM, emailitis.com wrote:

> > > I have a lot of Spam getting into our mail servers where the common
> > > thread is cloudapp

You guys realize cloudapp.net is Microsoft Azure, don't you?


> > > And the hyperlinks in the emails are http://expert.cloudapp.net/.
> > > 
> > > Please could you advise on how I can block by the information on
> > > the maillog on that, or using a rule which checks the URL to include
> > > the above thread?

SA does not block.


> > There is a new feature in trunk that I believe will help you easily
> > called URILocalBL.pm

> That should do it.
> 
> There’s a configuration example in the bug, and POD documentation in
> the plugin, but in this particular case you’d do something like:
> 
> uri_block_cidr L_BLOCK_CLOUDAPP   191.237.208.246
> body L_BLOCK_CLOUDAPP eval:check_uri_local_bl()

That seems an overly complicated variant of a simple uri regex rule. And
it really depends on the IP to match a URI? And manually looking it up?

  uri URI_EXPERT_CLOUDAPP  m~^https?://expert\.cloudapp\.net$~


> describe L_BLOCK_CLOUDAPP Block URI’s pointing to expert.cloudapp.net
> score L_BLOCK_CLOUDAPP5.0

SA does not block. *sigh*





Re: Add spamassassin triggered rules in logs when email is blocked

2014-08-29 Thread Karsten Bräckelmann
On Fri, 2014-08-29 at 11:27 -0400, Karl Johnson wrote:
> I'm using amavisd-new-2.9.1 and SpamAssassin v3.3.1. I would like to
> know if it's possible to add Spamassassin triggered rules when an
> email is blocked because I discard the email when it's spam and I want
> to know why it's blocked (which rules).

Wrong place, that is an Amavis question. SA does not reject, discard or
otherwise block mail. Amavis does, based on the SA score.


> For now I only have the score (hits) in maillog:
> 
> Aug 24 04:04:36 relais amavis[3475]: (03475-08) Blocked SPAM
> {DiscardedInternal}, MYNETS LOCAL [205.0.0.0]:54459 [205.0.0.0]
>  -> , Message-ID:
> , mail_id: 4RZ-Vm0_iZmi, Hits: 13.573,
> size: 4269, 10089 ms

That log line is generated by Amavis. SA has no control of its contents.


> I would like to add in logs for example:
> 
> DATE_IN_FUTURE_06_12=0.001, DCC_CHECK=4,
> SPF_PASS=-0.001,TVD_SPACE_RATIO=0.001
> 
> Is that possible?




Re: no subject tagging in case of "X-Spam-Status: Yes"

2014-08-28 Thread Karsten Bräckelmann
On Fri, 2014-08-29 at 02:15 +0200, Reindl Harald wrote:
> look at the attached zp-archive [...]

Since I already had a closer look at the contents, including your local
cf, and I am here to offer help and mean no harm, here are some comments
regarding the SA config.


> # resolves a bug with milter always triggering a wrong informational header
> score UNPARSEABLE_RELAY 0

See the RH bug you filed and its upstream report. Do you still need
that? This would be the first instance of continued triggering of that
test I ever encountered.


> # disable most builtin DNSBL/DNSWL to not collide with webinterface settings
> score __RCVD_IN_SORBS 0
> score __RCVD_IN_ZEN 0
> score __RCVD_IN_DNSWL 0

Rules starting with double-underline are non-scoring sub-rules.
Assigning a zero score doesn't disable them like it does with regular
rules. In the case of RBL sub-rules like the above, it does not prevent
DNS queries. It is better to

  meta __FOO 0

overwrite the sub-rule, rather than setting a score where none exists in
the first place.
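Applied to the sub-rules above:

  meta __RCVD_IN_SORBS  0
  meta __RCVD_IN_ZEN    0
  meta __RCVD_IN_DNSWL  0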


> # unconditional sender whitelists
> whitelist_from *@apache.org
> whitelist_from *@bipa.co.at
> whitelist_from *@centos.org
> whitelist_from *@dovecot.org
  [...]

Unconditional whitelisting generally is a bad idea, since the whitelisted
addresses might well appear forged in spam.

If possible, it is strongly suggested to use whitelist_from_auth, or at
least whitelist_from_rcvd (which requires *_networks be set correctly).
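For example (the rDNS domain in the second line is an assumption, adjust
it to the real sending hosts):

  whitelist_from_auth  *@apache.org
  whitelist_from_rcvd  *@dovecot.org  dovecot.org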





Re: no subject tagging in case of "X-Spam-Status: Yes"

2014-08-28 Thread Karsten Bräckelmann
On Fri, 2014-08-29 at 02:15 +0200, Reindl Harald wrote:
> look at the attached zp-archive and both messages
> produced with the same content before you pretend
> others lying damned - to make it easier i even
> added a config-diff

But no message diff. ;)

> and now what?
> 
> maybe you should accept that even new users are
> no idiots and know what they are talking about

Please accept my apologies. It appears something else is going on here,
and you in fact did not lie.

I'd like to add, though, that I do *not* assume new users to be idiots.
Plus, I generally spend quite some time on helping others fix their
problems, including new users, as you certainly have noticed.


Now, moving forward: I've had a look at the message diffs. Quite
interesting, and I honestly want to figure out what's happening.

First of all, minus all those different datetime strings, IDs and
ordering, the real differences are

  -Subject: [SPAM] Test^M
  -X-Spam-Flag: Yes^M

  +Subject: Test^M

So it appears that only the sample with add_header spam Flag has the
Subject re-written.

However, there's something else going on. When re-writing the Subject
header, SA adds an X-Spam-Prev-Subject header with the original. Which
is clearly missing.

Thus, something else has a severe impact on which headers are added or
modified. In *both* cases, there is at least one SA generated header
missing and/or SA modified header not preserved.

Definitely involved: Postfix, spamass-milter, SA. And probably some
other tool rewriting the message / reflowing headers, as per some
previous posts (and the X-Spam-Report header majorly inconvenienced by
re-flowing headers).

Regarding SA and the features in question: There is no different
behavior between calling the plain spamassassin script and using
spamc/d. There is absolutely nothing in SA itself that could explain the
discrepancy in Subject rewriting, nor the missing X-Spam-Prev-Subject
header.

My best bet would be on the SA invoking glue, not accepting or
overwriting headers as received by SA. Which tool that actually is, I
don't know. But I'd be interested to hear about it, if you find out.


(The additional empty line between message headers and body in the case
without X-Spam-Flag header most likely is just copy-n-paste body. Or
possibly another artifact of some tool munging messages.)





Re: writing own rbl rules

2014-08-28 Thread Karsten Bräckelmann
On Fri, 2014-08-29 at 01:59 +0200, Reindl Harald wrote:
> Am 29.08.2014 um 01:51 schrieb Karsten Bräckelmann:
> > On Fri, 2014-08-29 at 01:06 +0200, Reindl Harald wrote:

> > > the question was just "how can i enforce RBL tests inside the own LAN"
> > 
> > RBL tests cannot be enforced. Internal and trusted networks settings
> > need to be configured correctly to match the RBL test's scope, in your
> > case last-external.
> > 
> > If there are trusted relays found in the Received headers, and the first
> > trusted one's connecting relay is external (not in the internal_networks
> > set), then an RBL test for last-external will be run.
> > 
> > This is entirely unrelated to "own LAN" or "network range"
> 
> that may all be true for blacklists and default RBL rules
> 
> it is no longer true in case of 4 internal WHITELISTS which you
> want to use to LOWER scores to reduce false positives while
> otherwise bayes may hit - such traffic can also come from
> the internal network

There is absolutely no difference between black and whitelists. With the
only, obvious exception of the rule's score.

So, yes, it still is true in the case of (internal) whitelists.


Besides that, you are (still) confusing SA *_networks settings with the
local network topology. They are loosely related, but don't have to
match.

You can easily run RBL tests against IPs from within the local network
and treat them like any other sending SMTP client, by  (a) excluding
them from the appropriate *_networks settings, and  (b) defining the RBL
test accordingly. If you want to query for the last-external, it has to
be the last external relay according to the configuration.

BTW, unless the set of IPs to whitelist is permanently changing, it is
much easier to write a negative score rule based on the X-Spam-Relays-*
pseudo-headers. This also has the benefit of being highly flexible, not
depending on trust borders, and allowing internal_networks to keep
matching the LAN topology.
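A sketch of such a rule, with the rule name and IP made up for the
example:

  header   LOCAL_RELAY_KNOWN  X-Spam-Relays-Untrusted =~ /^\[ ip=192\.168\.10\.25 /
  describe LOCAL_RELAY_KNOWN  Handed in by the known internal relay
  score    LOCAL_RELAY_KNOWN  -2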





Re: no subject tagging in case of "X-Spam-Status: Yes"

2014-08-28 Thread Karsten Bräckelmann
On Fri, 2014-08-29 at 01:23 +0200, Reindl Harald wrote:
> Am 29.08.2014 um 01:20 schrieb Karsten Bräckelmann:
> > On Fri, 2014-08-29 at 00:30 +0200, Reindl Harald wrote:
> > > besides the permissions problem after the nightly "sa-update" the reason
> > > was simply "clear_headers" without "add_header spam Flag _YESNO" which
> > > is entirely unexpected behavior
> > 
> > No, that is not the cause.
> > 
> > $ echo -e "Subject: Foo\n" | ./spamassassin | grep Subject
> > Subject: [SPAM] Foo
> > X-Spam-Prev-Subject: Foo
> > 
> > $ cat rules/99_DEVEL.cf
> > required_score -999    # regardless of score, classify spam
> >                        # to enforce header rewriting
> > clear_headers
> > rewrite_header Subject [SPAM]
> > 
> > Besides, your own reply to my first post to this thread on Mon also
> > shows this claim to be false. The output of the command I asked you to
> > run clearly shows clear_headers in your config being in effect and a
> > rewritten Subject
> 
> i verfied that 20 times in my environment
> 
> removing the line "add_header spam Flag _YESNO_" and no tagging
> maybe the combination of spamass-milter and SA but it's fact

So far I attributed most of your arguing to being stubborn and
opinionated. Not any longer.

Now you're outright lying.





Re: writing own rbl rules

2014-08-28 Thread Karsten Bräckelmann
On Fri, 2014-08-29 at 01:06 +0200, Reindl Harald wrote:
> the question was just "how can i enforce RBL tests inside the own LAN"

> the question was just "how can i enforce RBL tests inside the own LAN"

> the question was just "how can i enforce RBL tests inside the own LAN"

RBL tests cannot be enforced. Internal and trusted networks settings
need to be configured correctly to match the RBL test's scope, in your
case last-external.

If there are trusted relays found in the Received headers, and the first
trusted one's connecting relay is external (not in the internal_networks
set), then an RBL test for last-external will be run.

This is entirely unrelated to "own LAN" or "network range".
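For reference, a custom last-external DNSWL rule would look roughly like
this (list name and zone are placeholders):

  header   CUST_DNSWL_X  eval:check_rbl('custwl-lastexternal', 'dnswl.example.invalid.')
  describe CUST_DNSWL_X  Last external relay listed on the internal DNSWL
  tflags   CUST_DNSWL_X  nice net
  score    CUST_DNSWL_X  -3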


> >>> Received headers before that simply CANNOT be trusted. There is no way
> >>> to guarantee the host they claim to have received the message from is
> >>> legit
> >>
> >> in case running postfix with SA as milter *there are no* Received
> >> headers *before* because there is nobody before
> > 
> > There almost always is at least one Received header before, the sender's
> > outgoing SMTP server
> 
> *no no no and no again*
> 
> there is no Received header before because a botnet zombie don't use
> a outgoing SMTP server

I said "almost always", with direct-to-MX delivery being the obvious
exception. Possible with botnet spam, yes, but too easy to detect. Thus,
botnet zombies frequently forge Received headers.

(Besides, in your environment SA won't see much botnet spam anyway.
Spamhaus PBL as first level of defense in your Postfix configuration
will reject most of them. But that's not the point here.)





Re: no subject tagging in case of "X-Spam-Status: Yes"

2014-08-28 Thread Karsten Bräckelmann
On Fri, 2014-08-29 at 00:30 +0200, Reindl Harald wrote:
> besides the permissions problem after the nightly "sa-update" the reason
> was simply "clear_headers" without "add_header spam Flag _YESNO" which
> is entirely unexpected behavior

No, that is not the cause.

$ echo -e "Subject: Foo\n" | ./spamassassin | grep Subject
Subject: [SPAM] Foo
X-Spam-Prev-Subject: Foo

$ cat rules/99_DEVEL.cf
required_score -999    # regardless of score, classify spam
                       # to enforce header rewriting
clear_headers
rewrite_header Subject [SPAM]


Besides, your own reply to my first post to this thread on Mon also
shows this claim to be false. The output of the command I asked you to
run clearly shows clear_headers in your config being in effect and a
rewritten Subject.





Re: writing own rbl rules

2014-08-28 Thread Karsten Bräckelmann
On Fri, 2014-08-29 at 00:22 +0200, Reindl Harald wrote:
> the simple answer to my question would have been "no, in no case SA does
> any RBL check if the client is from the same network range and there is
> no way to change that temporary even for development" [...]

That would have been simpler indeed, but that also would have been
wrong.


> if there is no hop before and hence no received headers before
> there is still a known IP - the one and only and in that case
> the currently connection client - there is no reason not fire
> a DNSBL/DNSWL against that IP

SA is not the SMTP server, it has no knowledge of the connection's
remote IP. SA depends on the Received headers added by the internal
network's SMTP server (or its milter) to get that information.


> > Besides: SA is not an SMTP. It does not add the Received header. And it
> > absolutely has to inspect headers, whether you like that or not. That is
> > how SA determines exactly that last, trustworthy, "physical" IP. And for
> > that, trusted and internal networks need be correct, so by extension
> > external networks also are correct.
> 
> and the machine SA is running on receiving the message adds that
> header which is in case of direct testing the one and only and
> so trustable

Your configuration stated that machine is not trustable.

> > In particular, your MX, your first internal relay, absolutely MUST be
> > trusted by SA. That is the SMTP relay identifying the sending host,
> > complete with IP and rDNS.
> 
> again: the machine running SA *is the MX*

Correct (even though it is irrelevant whether it is or not). So don't
configure SA to not trust that machine, and include at the very least
that IP in your trusted_networks.
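
A minimal local.cf sketch, assuming the 10.* LAN seen in your debug
output is entirely yours:

  trusted_networks  10.0.0.0/8
  internal_networks 10.0.0.0/8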

Your configuration stated that machine is not trustable.


> > Received headers before that simply CANNOT be trusted. There is no way
> > to guarantee the host they claim to have received the message from is
> > legit
> 
> in case running postfix with SA as milter *there are no* Received
> headers *before* because there is nobody before

There almost always is at least one Received header before, the sender's
outgoing SMTP server.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Reporting to SpamCop

2014-08-28 Thread Karsten Bräckelmann
On Thu, 2014-08-28 at 16:14 -0500, Chris wrote:
> I'm having an issue with getting SA 3.4.0 when run as spamassassin -D -r
> to report spam to SpamCop. The errors I'm seeing are:

Ignoring the Perl warnings for now.

> In my v310.pre file I have:
> 
> loadplugin Mail::SpamAssassin::Plugin::SpamCop 
> /usr/local/share/perl/5.18.2/Mail/SpamAssassin/Plugin/SpamCop.pm

It should never be necessary to provide the (optional) filename argument
with stock SA plugins. Even worse, absolute paths will eventually be
harmful.
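
The stock v310.pre line (commented out by default, IIRC) is all that is
needed:

  loadplugin Mail::SpamAssassin::Plugin::SpamCop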

> I have set the SpamCop from and to addresses in the SpamCop.pm file:

The Perl modules are no user-serviceable parts. Do not edit them.

Moreover, the SpamCop plugin provides the spamcop_(from|to)_address
options to set these in your configuration. See

  http://spamassassin.apache.org/doc/Mail_SpamAssassin_Plugin_SpamCop.html
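
Something along these lines in local.cf -- both addresses are
placeholders for your registered and personal SpamCop submit addresses:

  spamcop_from_address you@example.com
  spamcop_to_address   submit.YOURCODE@spam.spamcop.net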

> setting => 'cpoll...@example.com',
> setting => 'submit.exam...@spam.spamcop.net',

Wait... What exactly did you edit?

The only instances of 'setting' in SpamCop.pm are the ones used to
register SA options. Did you replace the string spamcop_from_address
with your email address?

I have a gut feeling the Perl warnings will disappear, if you revert any
modifications to the SpamCop.pm Perl module and set the options in your
configuration instead...


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: formatting of report headers

2014-08-28 Thread Karsten Bräckelmann
On Thu, 2014-08-28 at 21:43 +0200, Reindl Harald wrote:
> Am 28.08.2014 um 19:11 schrieb Karsten Bräckelmann:

> > FWIW, SA even generates the Report header by default with your setting
> > of report_safe 0. Not in your case, because you chose to clear_headers
> > and manually define almost identical versions to the default headers.

More detail, in addition to my other reply.

> # header configuration
> fold_headers 1
> report_safe 0

 "If this option is set to 0, [...]. In addition, a header named
  X-Spam-Report will be added to spam."  -- M::SA::Conf docs

> X-Spam-Status: No, score=0.3 required=5.0 tests=BAYES_50,CUST_DNSBL_2,
>   
> CUST_DNSBL_5,CUST_DNSWL_7,DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,SPF_SOFTFAIL
>   autolearn=disabled version=3.4.0

Not spam, no X-Spam-Report header.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: formatting of report headers

2014-08-28 Thread Karsten Bräckelmann
On Thu, 2014-08-28 at 21:43 +0200, Reindl Harald wrote:
> Am 28.08.2014 um 19:11 schrieb Karsten Bräckelmann:

> > FWIW, SA even generates the Report header by default with your setting
> > of report_safe 0. Not in your case, because you chose to clear_headers
> > and manually define almost identical versions to the default headers.
> 
> no, it don't

Yes, it does.

Read my comment again, carefully. And see the docs, option report_safe
in the section Basic Message Tagging Options.

  http://spamassassin.apache.org/doc/Mail_SpamAssassin_Conf.html


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Certain types of spam seem to get through SA

2014-08-28 Thread Karsten Bräckelmann
On Thu, 2014-08-28 at 09:15 -0600, LuKreme wrote:
> X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on mail.covisp.net
> X-Spam-Level: *
> X-Spam-Status: No, score=1.7 required=5.0 tests=URIBL_BLACK autolearn=no
>   version=3.3.2

> X-Spam-Status: No, score=-0.0 required=5.0 tests=SPF_HELO_PASS,SPF_PASS
>   autolearn=ham version=3.3.2

Bayes and auto-learning are enabled, yet there are no BAYES_XX rules hit
in either sample. Something seems broken.

(Not a first time poster, so I just assume the Bayes DB isn't fresh.)


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: formatting of report headers

2014-08-28 Thread Karsten Bräckelmann
On Thu, 2014-08-28 at 11:08 +0200, Reindl Harald wrote:
> is it somehow possible to get line-breaks in the
> report headers to have them better readable?

SA inserts line-breaks by default, to keep headers below 80 chars wide.


> report_safe 0
> clear_headers
> add_header spam Flag _YESNO_
> add_header all Status _YESNO_, score=_SCORE_/_REQD_, tests=_TESTS_, 
> report=_REPORT_

> on the shell it looks like this

What you get in the shell is precisely what SA returns -- to the shell
or any other calling process. Any reformatting or re-flow of multiline
headers has been done by other tools.


> X-Spam-Status: No, score=4.3/5.0,
> tests=ADVANCE_FEE_4_NEW,ADVANCE_FEE_4_NEW_MONEY,ADVANCE_FEE_5_NEW,ADVANCE_FEE_5_NEW_MONEY,ALL_TRUSTED,BAYES_99,BAYES_999,DEAR_SOMETHING,DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,LOTS_OF_MONEY,T_MONEY_PERCENT,URG_BIZ,
> report=
> * -2.0 ALL_TRUSTED Passed through trusted hosts only via SMTP
> *  3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100%
> *  [score: 1.]

That long _TESTS_ string without line-breaks is due to the very long
_REPORT_ in that header. If you add a dedicated Report header, the
Status header and its list of tests will be wrapped appropriately, too.
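
For example, with report_safe 0 something like this keeps the Status
header short and puts the verbose part into its own header (a sketch,
adjust the template tags to taste):

  add_header all  Status _YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTS_
  add_header spam Report _REPORT_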

FWIW, SA even generates the Report header by default with your setting
of report_safe 0. Not in your case, because you chose to clear_headers
and manually define almost identical versions to the default headers.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Spam info headers

2014-08-27 Thread Karsten Bräckelmann
On Wed, 2014-08-27 at 21:37 -0400, Alex wrote:
> On Wed, Aug 27, 2014 at 6:18 PM, Karsten Bräckelmann  
> wrote:

> > The URIs [1] are automatically added to the uridnsbl rule's description
> > for _REPORT_ and _SUMMARY_ template tags. The latter is identical to the
> > additional summary at the end with the -t option, the first one is
> > suitable for headers.
> >
> >   add_header spam  Report _REPORT_
> >
> > That Report header is set by default with report_safe 0 (stock SA, not
> > Amavis).
> 
> I now recall having added a few custom headers in the past, and it was
> indeed necessary to instruct amavis to display them. I did a little
> more digging around, and learned how I was doing it previously was
> replaced with the following, in amavisd.conf:
> 
>   $allowed_added_header_fields{lc('X-Spam-Report')} = 1;
> 
> So I've modified my local.cf with the following:
> 
> report_safe 0
> clear_report_template

That's actually a historic, unfortunate naming.

Despite its name, the report option (see 10_default_prefs.cf) sets the
template used with report_safe 1 or 2, which by default shows a brief
description, (attached spam) content preview and _SUMMARY_.

It does not have any impact on the X-Spam-Report header added with
report_safe 0 by default or the _REPORT_ template tag.

In the case of report_safe 0, the clear_report_template option actually
has no effective impact at all. That "report" will just not be added
anyway.

> add_header all Report _REPORT_
> 
> Despite specifying "all", it's only displayed in quarantined messages.
> I need it to be displayed on non-spam messages, and "all" messages
> would be most desirable.

That'd be an Amavis specific issue. Using add_header all, SA does add
that header to both ham and spam no matter what. In particular,
quarantining is outside the scope of SA, and if that makes a difference
whether a certain header appears or not, that's also outside the scope
of SA.


> There's also this in the SA conf docs:
> 
>report ...some text for a report...
>  Set the report template which is attached to spam mail
>  messages. See the 10_misc.cf configuration file in  
>  /usr/share/spamassassin for an example.

> Is this still valid? 10_misc.cf apparently no longer exists, so I
> wasn't able to follow through there.

Wow. 10_misc.cf last appeared in 3.1.x, and is otherwise identical to
10_default_prefs.cf since 3.2. In particular with respect to that very
doc snippet -- nothing at all changed in that paragraph, except that
file name.

You want to update your docs bookmarks.


It's times like these I wonder whether I am the only one left grepping
his way through files and directories, searching for $option. Or
remembering the ancient magic of a <Tab>, when looking for possibly
matching (numbered!) files...


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Spam info headers

2014-08-27 Thread Karsten Bräckelmann
On Wed, 2014-08-27 at 17:07 -0400, Alex wrote:
> I've set up a local URI DNSBL and I believe there are some FPs that
> I'd like to identify. I've currently set up amavisd to set
> $sa_tag_level_deflt at a value low enough that it always produces the
> X-Spam-Status header on every email.
> 
> It will show "LOC_URIBL=1" in the status, but is it possible to have
> it somehow report/show the domain that caused the rule to fire, in the
> same way that it can be done with spamassassin directly on the
> command-line using -t?

The URIs [1] are automatically added to the uridnsbl rule's description
for _REPORT_ and _SUMMARY_ template tags. The latter is identical to the
additional summary at the end with the -t option, the first one is
suitable for headers.

  add_header spam  Report _REPORT_

That Report header is set by default with report_safe 0 (stock SA, not
Amavis).


[1] Actually lists a single one only, if multiple URIs are hit. That's a
comment documented TODO item.

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: writing own rbl rules

2014-08-26 Thread Karsten Bräckelmann
On Wed, 2014-08-27 at 03:01 +0200, Reindl Harald wrote:
> > If it's internal, it's internal. There is a reason you are setting up
> > lastexternal DNSxL rules.
> 
> the intention is to handle the internal IP like it would be external

Again: Craft your samples to match real-life (production) environment.
Do not configure or try to fake an environment that will not match
production later. It won't work.

You want to configure SA. So configure SA. Correctly.

If you insist on not following that advice, please refrain from further
postings to this list.


> >> Aug 27 00:59:29.249 [30833] dbg: metadata: X-Spam-Relays-Untrusted: [ 
> >> ip=10.0.0.19 rdns=mail-gw.thelounge.net
> >> helo=mail-gw.thelounge.net by=mail.thelounge.net ident= envfrom= intl=0 
> >> id=3hjPzJ6TWVz23 auth= msa=0 ] [
> >> ip=10.0.0.6 rdns=arrakis.thelounge.net helo=arrakis.thelounge.net 
> >> by=mail-gw.thelounge.net ident= envfrom= intl=0
> >> id=3hjPzJ2tkPz1w auth= msa=0 ]
> > 
> > There is no X-Spam-Relays-Trusted metadata in your grep for "dns", which
> > means there is absolutely no trusted relay. Given those relays are in
> > the 10/8 class A network and you deliberately breaking trusted_networks
> > in a previous post, that seems about right...
> 
> the intention to berak it was to behave like it is external
> and just check the RBL behavior

Read my previous post again, carefully. If you define everything to be
external, there is no *last* external SA can trust.


> > Anyway, there are no "dbg: dns: IPs found:" and "dbg: dns: launching"
> > lines, so this clearly shows the RBLs are NOT queried.
> 
> that's my problem :-)

So you know how to fix it. Configure *_networks in SA correctly, and
send a message from an external host.


> > No activity with your custom RBL either. But well, how would you expect
> > SA to query *last* external, given you deliberately told SA there are no
> > internal relays...
> 
> well, there will never be internal relays, just a inbound-only MX

That IS an internal relay. Your MX must be in your internal_networks,
and it is by the very definition of MX an SMTP relay.


> > All external. No internal, no last external aka "hop before first
> > internal" either.
> 
> i want that RBL checks in general only for the *phyiscal* IP
> with no header inspections - 90% of inflow will be finally
> filtered out by postcsreen anyways

You need an internal, trusted relay to get that IP you desire. That
relay is what generates the Received header with precisely that IP.

Besides: SA is not an SMTP. It does not add the Received header. And it
absolutely has to inspect headers, whether you like that or not. That is
how SA determines exactly that last, trustworthy, "physical" IP. And for
that, trusted and internal networks need be correct, so by extension
external networks also are correct.


> > First of all, do read and understand the (trusted|internal)_networks
> > options in the M::SA::Conf [1] docs, section Network Test Options.
> > 
> > Then remove the current bad *_networks options in your conf. If you
> > don't fully understand those docs, keep it at that, default. If you do
> > understand and see an actual need to manually set them, do so, but do so
> > *correctly*.
> 
> the intention is no trust / untrust at all and handle any IP
> with it's phyiscal connection

Do read the docs I linked to.

You are totally misunderstanding trust. It is not about what you trust,
or don't. It is about which Received headers SA can trust to be correct.

In particular, your MX, your first internal relay, absolutely MUST be
trusted by SA. That is the SMTP relay identifying the sending host,
complete with IP and rDNS.

Received headers before that simply CANNOT be trusted. There is no way
to guarantee the host they claim to have received the message from is
legit.


> > [1] http://spamassassin.apache.org/doc/Mail_SpamAssassin_Conf.html
> 
> thanks!

In general, I stand by what I wrote in the previous post. And I strongly
suggest you follow that advice.

The approach you tried and defended with claws in this already lengthy
thread will not work and is bound to fail. Stop arguing, and start
setting up a serious test environment and correct SA options.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: writing own rbl rules

2014-08-26 Thread Karsten Bräckelmann
On Wed, 2014-08-27 at 01:08 +0200, Reindl Harald wrote:
> below the stdout/sterr of following script filtered for "dns"
> so the lists are asked, but the question remains why that
> don't happen from a IP in the same network

Nope, no RBL queries. See below.

> in the meantime there are a lot of "cust-lastexternal"
> generated from a web-interface including the 4 below and
> the local network range is listed on them, hence why i
> want them used unconidtionally and not only with foreign IP's

If it's internal, it's internal. There is a reason you are setting up
lastexternal DNSxL rules.

Do not invalidate the SA *_networks configuration in an attempt to adjust
it to poor samples that were not generated under real-life conditions.
Generate a proper sample
instead, either by actually sending mail from external IPs, or if need
be by manually editing the MX Received header, forging an external
source (do pay attention to detail).

Besides, there is no point in whitelisting your own LAN IPs. Those
should simply hit ALL_TRUSTED, or just not be filtered in the first
place.


> /usr/bin/spamassassin -D  < /var/lib/spamass-milter/spam-example.eml

> [sa-milt@mail-gw:~]$ cat debug.txt | grep -i dns

> Aug 27 00:59:29.249 [30833] dbg: metadata: X-Spam-Relays-Untrusted: [ 
> ip=10.0.0.19 rdns=mail-gw.thelounge.net
> helo=mail-gw.thelounge.net by=mail.thelounge.net ident= envfrom= intl=0 
> id=3hjPzJ6TWVz23 auth= msa=0 ] [
> ip=10.0.0.6 rdns=arrakis.thelounge.net helo=arrakis.thelounge.net 
> by=mail-gw.thelounge.net ident= envfrom= intl=0
> id=3hjPzJ2tkPz1w auth= msa=0 ]

There is no X-Spam-Relays-Trusted metadata in your grep for "dns", which
means there is absolutely no trusted relay. Given that those relays are in
the 10/8 class A network and that you deliberately broke trusted_networks
in a previous post, that seems about right...

> Aug 27 00:59:29.249 [30833] dbg: metadata: X-Spam-Relays-External: [ 
> ip=10.0.0.19 rdns=mail-gw.thelounge.net
> helo=mail-gw.thelounge.net by=mail.thelounge.net ident= envfrom= intl=0 
> id=3hjPzJ6TWVz23 auth= msa=0 ] [
> ip=10.0.0.6 rdns=arrakis.thelounge.net helo=arrakis.thelounge.net 
> by=mail-gw.thelounge.net ident= envfrom= intl=0
> id=3hjPzJ2tkPz1w auth= msa=0 ]

Same issue with X-Spam-Relays-Internal not showing up in the grep, thus
being completely empty. Unless you specified internal_networks manually,
it is set to trusted_networks. Thus equally invalid.


> Aug 27 00:59:29.254 [30833] dbg: dns: checking RBL bl.spameatingmonkey.net., 
> set cust12-lastexternal
> Aug 27 00:59:29.254 [30833] dbg: dns: checking RBL spam.dnsbl.sorbs.net., set 
> cust15-lastexternal
> Aug 27 00:59:29.254 [30833] dbg: dns: checking RBL psbl.surriel.com., set 
> cust14-lastexternal

All those third-party RBLs with your "cust" sets are extremely fishy.

Anyway, there are no "dbg: dns: IPs found:" and "dbg: dns: launching"
lines, so this clearly shows the RBLs are NOT queried.

> Aug 27 00:59:29.254 [30833] dbg: dns: checking RBL dnswl-low.thelounge.net., 
> set cust16-lastexternal

No activity with your custom RBL either. But well, how would you expect
SA to query *last* external, given you deliberately told SA there are no
internal relays...

All external. No internal, no last external aka "hop before first
internal" either.


First of all, do read and understand the (trusted|internal)_networks
options in the M::SA::Conf [1] docs, section Network Test Options.

Then remove the current bad *_networks options in your conf. If you
don't fully understand those docs, keep it at that, default. If you do
understand and see an actual need to manually set them, do so, but do so
*correctly*.

Hints on gathering relevant information from the debug output:

Don't just grep for generic "dns", but check specifics by grepping for
X-Spam-Relays and (trusted|internal)_networks. Better yet, don't grep
but search the debug output interactively, and read nearby / related
info.
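
Something along these lines, as a sketch:

  spamassassin -D < sample.eml 2>&1 | egrep 'X-Spam-Relays|_networks'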

While debugging, actually reading, searching for terms or at least
glimpsing the entire debug output is good advice anyway.


[1] http://spamassassin.apache.org/doc/Mail_SpamAssassin_Conf.html

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Prevent DNSBL URI matches, without affecting regex URI rules?

2014-08-26 Thread Karsten Bräckelmann
On Tue, 2014-08-26 at 11:22 -0400, Kris Deugau wrote:
> Is there a way to prevent a URI from being looked up in DNSBLs, without
> *also* preventing that URI from matching on uri regex rules?
> 
> I would like to add quite a few popular URL shorteners to
> uridnsbl_skip_domain, but then I can't match those domains in uri regex
> rules for feeding "x and URL shortener" meta rules.

Works for me.

$ echo -e "\n example.com" | ./spamassassin -D --cf="uri HAS_URI /.+/"
dbg: rules: ran uri rule HAS_URI ==> got hit: "http://example.com";

$ ./spamassassin --version
SpamAssassin version 3.3.3-r1136734
  running on Perl version 5.14.2

$ grep example.com rules/25_uribl.cf
uridnsbl_skip_domain example.com example.net example.org


> Still using SA 3.3.2;  if the behaviour of uridnsbl_skip_domain has been
> narrowed down in 3.4 to only skipping the listed domains on DNSBL
> lookups (as per its name) that may prod me to get 3.4 running.

Oh, 3.3.2...

Also verified the 3.3.2 (and 3.3.0 for that matter) svn tag version, in
addition to my local 3.3 branch above. Same result, works for me.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: drop of score after update tonight

2014-08-25 Thread Karsten Bräckelmann
On Tue, 2014-08-26 at 00:08 +0200, Reindl Harald wrote:
> the "bayes=1.00" below makes me wonder because around 1000 careful
> selected ham/spam messages for training - IMHO that should be more in
> such clear cases

Please do read the docs or at least the rule's description (hint, see
the BAYES_99 one) before venting such an opinion.

The Bayesian Classifier returns a probability of the mail being ham or
spam, in a range between 0 and 1. Zero being ham, 1 spam, and a value of
0.5 being neutral, kind of undecided.

A bayes value of 1.0 is as high as it gets, and the rules'
descriptions also clearly state the spam probability being 99.9 to 100%.
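
From memory, the stock definitions look roughly like this (see
23_bayes.cf for the authoritative version):

  body BAYES_99   eval:check_bayes('0.99', '1.00')
  body BAYES_999  eval:check_bayes('0.999', '1.00')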


> however, i admit that i am a beginner with SA!
> 
> Aug 26 00:01:32 mail-gw spamd[6836]: spamd: result: Y 5 -
> ADVANCE_FEE_4_NEW,ADVANCE_FEE_4_NEW_MONEY,ADVANCE_FEE_5_NEW,ADVANCE_FEE_5_NEW_MONEY,ALL_TRUSTED,BAYES_99,BAYES_999,DEAR_SOMETHING,DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,LOTS_OF_MONEY,T_MONEY_PERCENT,URG_BIZ
> scantime=0.3,size=4760,user=sa-milt,uid=189,required_score=1.0,rhost=localhost,raddr=127.0.0.1,rport=29317,mid=<*>,bayes=1.00,autolearn=disabled

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: no subject tagging in case of "X-Spam-Status: Yes"

2014-08-25 Thread Karsten Bräckelmann
On Mon, 2014-08-25 at 19:43 +0200, Reindl Harald wrote:
> Am 25.08.2014 um 19:13 schrieb Karsten Bräckelmann:

> > No tests at all. I doubt the milter generated all those missing headers
> > including From and Date, instead of a Received one only. So it seems the
> > restricted sa-milt user has no read permissions on the SA config.
> > 
> > As that user, have a close look at the -D debug output.
> > 
> > spamassassin -D --lint
> 
> bingo - only a snippet below
> thank you so much for setp in that thread


> the files inside exept one have correct permissions (0644)
> but "/var/lib/spamassassin/3.004000/updates_spamassassin_org" not

> i guess i will setup a cronjob to make sure the permissions
> below "/var/lib/spamassassin/" are 755 and 644 for any item

A dedicated cron job doesn't make sense. You should add that to the
existing cron job that runs sa-update and conditionally restarts spamd.
Changing permissions has to be done before restarting spamd.
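
A rough sketch of such a combined job; the paths and the service name
are assumptions, adjust them to your system:

  #!/bin/sh
  sa-update
  if [ $? -eq 0 ]; then              # 0 means an update was applied
      chmod -R u+rwX,go+rX /var/lib/spamassassin
      systemctl restart spamassassin
  fi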

Alternatively, ensure the respective users for spamd, sa-update and the
milter are identical, or at least share a common group.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: no subject tagging in case of "X-Spam-Status: Yes"

2014-08-25 Thread Karsten Bräckelmann
On Mon, 2014-08-25 at 18:55 +0200, Reindl Harald wrote:
> Am 25.08.2014 um 18:00 schrieb Karsten Bräckelmann:

> > What does this command return?
> > 
> >   echo -e "Subject: Foo\n" | spamassassin --cf="required_score 1"
> 
> as root as expected the modified subject
> as the milter user the unmodified

> [root@mail-gw:~]$ echo -e "Subject: Foo\n" | spamassassin 
> --cf="required_score 1"

> X-Spam-Status: Yes, score=3.7 required=1.0 tests=MISSING_DATE,MISSING_FROM,
> MISSING_HEADERS,MISSING_MID,NO_HEADERS_MESSAGE,NO_RECEIVED,NO_RELAYS
> Subject: [SPAM] Foo
> X-Spam-Prev-Subject: Foo

Exactly as expected. Subject tagging works.


> [root@mail-gw:~]$ su - sa-milt
> [sa-milt@mail-gw:~]$ echo -e "Subject: Foo\n" | spamassassin 
> --cf="required_score 1"

> X-Spam-Status: No, score=0.0 required=1.0 tests=none
> Subject: Foo

No tests at all. I doubt the milter generated all those missing headers
including From and Date, instead of a Received one only. So it seems the
restricted sa-milt user has no read permissions on the SA config.

As that user, have a close look at the -D debug output.

  spamassassin -D --lint


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: drop of score after update tonight

2014-08-25 Thread Karsten Bräckelmann
On Mon, 2014-08-25 at 17:47 +0200, Reindl Harald wrote:

> yes and that is one which the currently existing
> Barracuda Spamfirewall scored with around 20 and
> grabbed from the backend there for testings

> the plain content i attached as ZIP (what made it to the listg)
> is used for testing by just copy the content to a formmailer or
> in a new plaintext message in TB point directly to the test MX

Given  (a) you disabled RBL checks in SA,  (b) that sample is a plain
body without any headers, and  (c) your method of sending the sample
even hits ALL_TRUSTED,  SA still does a pretty decent job in comparison.

The Barracuda appliance you're comparing results to did not have those
disadvantages.


Anyway, changed scores after a successful sa-update are to be expected.
The re-scoring algorithm only uses the default threshold of 5.0; it does
not know the concept of a second "reject" score.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: no subject tagging in case of "X-Spam-Status: Yes"

2014-08-25 Thread Karsten Bräckelmann
On Mon, 2014-08-25 at 11:37 +0200, Reindl Harald wrote:
> header contains "X-Spam-Status: Yes, score=7.5 required=5.0"
> but the subject does not get [SPAM] tagging with the config
> below - not sure what i am missing

What does this command return?

  echo -e "Subject: Foo\n" | spamassassin --cf="required_score 1"


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Rule to check return-path for To address

2014-08-23 Thread Karsten Bräckelmann
On Sat, 2014-08-23 at 14:59 -0400, Jeff wrote:
> I recently started getting hammered by spam and nearly all of the spam
> emails have one thing in common. The return-path header contains the
> email address that the spam is being sent to.
> 
> Below is a sample header:
> ...
> Return-Path: amazon-voucher-myname=mydomain@indiarti.com
> ...
> 
> The green text above is the email address that the spam is being sent
> to (i.e., myn...@mydomain.com).

That's common practice with legitimate mail, too, in particular mailing
lists. Have a look at this mail's Return-Path header.


> Is there a way to write a custom SpamAssassin rule that will mark any
> message as spam if the return-path contains the 'To' address,
> regardless of what it may be, and the equal sign (i.e.,
> user=domain.tld)?

See the TO_EQ_FROM stock rule.

A similar rule for the Return-Path should actually be simpler, though.
The Return-Path header (or similar envelope from type headers) is
generated by the MTA, so the order of Return-Path and To headers should
be static -- unlike To and From, which are set by the sending MUA.
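
An untested sketch of such a rule, matching against the ALL
pseudo-header so both headers can be compared with a back-reference; it
relies on Return-Path appearing before To, and the rule name and score
are made up:

  header   RP_EMBEDS_TO  ALL =~ /^Return-Path:\s*<[^>]*?\b([\w.+-]+)=([\w-]+(?:\.[\w-]+)+)\@[^>]*>(?:.*\n)*?^To:[^\n]*\b\1\@\2\b/mi
  describe RP_EMBEDS_TO  Return-Path contains the To address as user=domain
  score    RP_EMBEDS_TO  2.0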


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Learning both spam and ham, edge case

2014-08-22 Thread Karsten Bräckelmann
On Fri, 2014-08-22 at 17:44 -0700, Ian Zimmerman wrote:
> I know that if you misclassify a mail as spam with
> 
>  sa-learn --spam /path/to/ham
> 
> you can later run
> 
>  sa-learn --ham /path/to/ham
> 
> to correct the mistake, and SA will do the right thing (ie. forget the
> wrong classification).  And conversely, with ham <-> spam.

Correct. SA will recognize it has been learned before, and automatically
forget the previous training before re-training.
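
For completeness: if you only want to undo a training without
re-classifying the message, there is also (per the sa-learn man page)

  sa-learn --forget /path/to/message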


> My question is, what happens if you run
> 
>  sa-learn --spam /path/to/spam --ham /path/to/ham
> 
> and the same message is in both mailboxes?  Is the behavior even
> well-defined (ie. not random)?  And if so, can it be relied on in new
> versions?

Interesting...

First of all, see the man-page.  --ham and --spam are options, they
don't take arguments.

   sa-learn [options] [file]...

So your example is flawed by the assumption that --ham or --spam would
affect its file/path arguments, or possibly any following file/paths.
Which they don't.

Experimenting with --ham and --spam options, and two (identical) file
arguments yields:

Learning as ham or spam is not based on command-line option order, but
sa-learn code: --ham file --spam file results in learning spam, then
ham.

If you want to know more about sa-learn innards, I recommend looking at
its source code, or at least investigating

  sa-learn -D [...] 2>&1 | egrep '(learn|archive-iterator)'


In short: It is not random, but well-defined (see the source code). In
particular, there is no order of options. It is not guaranteed to be the
same in future (major|minor) versions, since your invocation sample is
not even documented.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Bayes training via inotify (incron)

2014-08-22 Thread Karsten Bräckelmann
On Fri, 2014-08-22 at 17:32 -0700, Ian Zimmerman wrote:
> Isn't inotify a bit of overkill for this?  If you have a dedicated
> maildir for training, you know that anything in maildir/new is, uh,
> new.  So you process it and move it to maildir/cur.  What am I missing?

The new/ directory is for delivery; messages, once moved, end up in cur/.

Training on messages in new/ means training solely on classification.
These messages have not been seen by a human, and he's most likely not
even aware there's new mail at all.

Messages moved (copied) into dedicated (ham|spam) learning folders will
be placed in cur/.

Thus, training on content in dedicated learning folders' new/ dirs won't
work, because human reviewed mail does not go there. And training on
new/ dirs in general is like overriding all of the precaution measures
of SA auto-learning, and blindly train anything and everything above or
below the required_score threshold.
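
A sketch of how a periodic training job over such dedicated folders
might look; the Maildir paths are placeholders:

  sa-learn --spam /home/user/Maildir/.Learn.Spam/cur/
  sa-learn --ham  /home/user/Maildir/.Learn.Ham/cur/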


Besides, moving messages from new/ to cur/ is the IMAP server's duty. No
third-party script should ever mess with that.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Delays with Check_Bayes

2014-08-21 Thread Karsten Bräckelmann
On Thu, 2014-08-21 at 13:13 -0700, redtailjason wrote:

> Are you open to the possibility of upgrading to 3.4.0 and using the Redis 
> backend for Bayes? (Just offering an alternative.)
> 
> We have been developing and upgrade plan to 3.4. Based on this, we are
> prioritize this upgrade and will be expediting it. Thanks. 

Thanks for including the part you're directly referring to, as I
requested. However, please do distinguish the quoted part from your
comments. The first paragraph actually was written by John, but your
post lacks any hint of the author and, even worse, renders the quote and
your own text identically.

See the difference between your latest two posts and any other post in
this thread?


I blame Nabble for even making this possible. In a reply, the quoted
text must be visually distinctive. More reason to avoid Nabble.

> View this message in context: 
> http://spamassassin.1065346.n5.nabble.com/Delays-with-Check-Bayes-tp111067p18.html
> Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
 
Sic. This is a mailing list. And Nabble a third-party list archive
service and poor forum-style web frontend to the mailing list.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Delays with Check_Bayes

2014-08-20 Thread Karsten Bräckelmann
On Wed, 2014-08-20 at 13:38 -0700, redtailjason wrote:
> We are seeing about 4000-7000 delayed messages per day. We do utilize a
> dedicated MySQL Server for the Bayes and all 8 scanners share it. Please let
> me know if this does not fully clarify our setup for you. 

So we're talking about 1% of the messages.

Does this happen with all scanner machines, or is this isolated to a
single one? If not all scanners are affected, any differences in network
connection?

When did this start? Any relevant changes roughly about that time?

What's your DB server load? Any noticeable load spikes, like 5k times a
day? In particular, while a message is taking 2 minutes wall-clock time
for Bayes, does either the scanner or database server have an unusual
high load? Do you have MySQL logs which might show issues?

Can you reproduce the Bayes lags? That is, can you identify a sample
message, and re-process manually?


When replying, please include the relevant quoted parts you're directly
referring to. With some context it is easier to follow the thread.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Delays with Check_Bayes

2014-08-20 Thread Karsten Bräckelmann
On Wed, 2014-08-20 at 06:15 -0700, redtailjason wrote:
> Hello and good morning. We are running into some delays that we are trying to
> pin down a root cause for. 
> 
> Below are some examples. Within the examples, you can see that the
> check_bayes: scan is consuming most of the timing. Does anyone have any
> suggests on what to look at? We use 3.3.2. We have eight scanners setup to
> handle the scanning with 5GB RAM and 4 CPUs each. Volume is 250K - 500K per
> day. 

That volume means throughput of about 350 messages per minute, 5.8 per
second. Sounds reasonable for 8 dedicated scanners.

Your samples are showing overall timings between about 90 seconds and
more than 2 minutes. Which means processing commonly takes less time,
and these are some extreme cases -- unless you really do have 50-100
busy processes per machine.

How many such long-running processes do you see, how frequent are they?

Also, you mentioned you are using the MySQL backend for Bayes. You did
not add any further detail, though.

Do you have dedicated MySQL servers for Bayes? Or does each scanner
machine run a local MySQL server? Do they share / sync databases
somehow?

Please elaborate on your environment, in particular everything
concerning Bayes.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Delays with Check_Bayes

2014-08-20 Thread Karsten Bräckelmann
On Wed, 2014-08-20 at 08:51 -0700, redtailjason wrote:
> The initial post was data extracted from mail.log on the scanner using cat
> /var/log/mail.log | grep check_bayes while logged as administrator. 

It doesn't matter what user greps the logs.

It was Amavis generating the logs. Thus, for debugging, all execution of
Amavis or SA commands must be done as the user Amavis runs as.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Delays with Check_Bayes

2014-08-20 Thread Karsten Bräckelmann
On Wed, 2014-08-20 at 07:35 -0700, redtailjason wrote:
> Here is the dump from one of the scanners:
> 
> netset: cannot include 127.0.0.1/32 as it has already been included
> 0.000  0  3  0  non-token data: bayes db version
> 0.000  0613  0  non-token data: nspam
> 0.000  0  0  0  non-token data: nham
> 0.000  0  50382  0  non-token data: ntokens
> 0.000  0 1362372138  0  non-token data: oldest atime
> 0.000  0 1396547409  0  non-token data: newest atime

That's back in April -- and obviously not a production database.

You need to run sa-learn as the user SA uses during scan. In your case
that's the user Amavis uses.
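
For example (assuming the Amavis user is literally called amavis on
your system):

  su -s /bin/bash -c 'sa-learn --dump magic' amavis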


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Advice sought on how to convince irresponsible Megapath ISP.

2014-08-17 Thread Karsten Bräckelmann
On Sun, 2014-08-17 at 07:37 -0700, Linda Walsh wrote:
> Karsten Bräckelmann wrote:

> > Be liberal in what you accept, strict in what you send. In particular,
> > later stages simply must not be less liberal than early stages.

> > Your MX has accepted the message.
> 
> My ISP's MX has accepted it, because it doesn't do domain checking.  My 
> machine's MX rejects it so fetchmail keeps trying to deliver it. 

There is only one MX, run by your ISP. You are running an SMTP relay,
not an MX.

> While I *could* figure out how to hack sendmail to not reject the message,

You don't have a choice. That sendmail is an *internal* SMTP relay after
the MX border. While you certainly are not looking at it this way, your
own services *together* with the SMTP run by your ISP form your internal
network.

The internal relay you run must not be stricter than the MX. In fact, it
simply cannot be stricter, without mail ending up in limbo. Exactly what
you have...


> > There is no forwarding.
> 
> It comes in their MX, and is forwarded to their users.

Again, that is not forwarding. (Hint: You are using fetchmail, not
being-forwarded-to-me-mail.)


> > > Any ideas on how to get a cheapo-doesn't want to support anything ISP to 
> > > start blocking all the garbage the pass on?
> >
> > Change ISP. You decided for them to run your MX.
> 
> I didn't decide for them, I inherited them when they bought out the 
> competition to supply lower quality service for the same price.

We're about to split hairs, but it is your decision to try to get your ISP
to behave as you want, instead of taking your business elsewhere. So,
yes, it is your decision to let them run your MX.

> > It is your choice to aim for a cheapo service (your words).
> 
> It wasn't when I signed up.   Cost $100 extra/month.  Now only $30
> extra/month that I don't host the domain with them.

But it is now, and all you're doing is complaining about it.

Expenses dropped to a fraction of what they used to be, yet you expect the
same service as before?

> > If you're unhappy with the service, take your business elsewhere.
> > Better service doesn't necessarily mean more expensive, but you
> > might need to shell out a few bucks for the service you want.
> 
> I already am... my ISP (cable company) doesn't have the services I want 
> for mail hosting.  I went to another company for that,

It is irrelevant whether your mail service provider happens to also be
your cable provider. You are paying for mail services. And if you want
better service, you might need to pay more -- which is what I said.

Besides, your wording is almost ironic. Your ISP didn't offer the email
service you want, so you went for another company. Now your current
(mail) service provider doesn't offer the service you want...


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



RE: Hotfix/phishing spam

2014-08-16 Thread Karsten Bräckelmann
On Thu, 2014-08-14 at 19:37 -0500, John Traweek CCNA, Sec+ wrote:
> Usually an end user has to request the hotfix and fill out a form on
> the MS site and then MS will send out an email with the URI.

Pardon my ignorance, but... WHY!?

Why would anyone require filling out a web form, to send an automated
email with a link as response? Why not simply, you know, put the link in
the page the user gets in return after sending that completed form
anyway?

Using an email message as response to an HTTP GET or POST request to
transfer a http(s) URI is beyond clusterfuck.


(Yes, I do realize you merely described what MS does, and you're not
responsible for their lame process.)


> So to answer your question, yes, MS does send out emails with
> hotfixes, but only when an end user requests it, at least in my
> experience… 
> 
> If the end user did not specifically fill out a form/request the hot
> fix, then I would be very suspicious…


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Advice sought on how to convince irresponsible Megapath ISP.

2014-08-16 Thread Karsten Bräckelmann
On Fri, 2014-08-15 at 19:06 -0700, Linda A. Walsh wrote:
> My old email service was bought out by Megapath who is letting alot of 
> services slide.
> 
> My main issue is that my incoming email scripts follow the SMTP RFC's and if
> the sender address isn't valid, then it's not a valid email that should be
> forwarded. 
> 
> My script simply check for the domain existing or not - if it doesn't exist,
> then it rejects it.  This causes about 100-200 messages a month that get
> stuck in an IMAP queue waiting for download -- only to be downloaded and 
> rejected due to the sender domain not existing.

Linda, you are rather vague on details, and definitely confusing terms
and terminology.

You state your ISP would forward mail to you, while on the other hand a
sub-set of the mail is not accepted by your scripts and is thus stuck in
an IMAP account "waiting for download". Both the use of IMAP and the
mention of downloading show that your ISP is not forwarding mail; you
are fetching it.

Similarly, your scripts do not reject messages, but choose not to fetch
them.


Pragmatic solution: If you insist on your scripts to not fetch those
spam messages (which have been accepted by the MX, mind you), automate
the "manual download and delete stage", which frankly only exists due to
your choice of not downloading them in the first place. Make your
scripts delete, instead of skipping over them.

Be liberal in what you accept, strict in what you send. In particular,
later stages simply must not be less liberal than early stages.

Your MX has accepted the message. At that point, there is absolutely no
way to un-accept or reject it later. You can classify, which is what you
use SA for (I guess, given you are posting here). You can filter or even
delete based on classification, or other criteria.


> The only response my ISP will give is to turn on their spam filtering. 
> I tried that. In about a 2 hour time frame, over 400 messages were
> blocked as spam.  Of those less than 10 were actually spam, the rest
> were from various lists.
> 
> So having them censoring my incoming mail isn't gonna work, but neither will
> the reject the obvious invalid domain email.
> 
> I can't believe that they insist on forwarding SPAM to their users even 
> though they know it is invalid and is spam. 

There is no censoring. There is no forwarding.

> Any ideas on how to get a cheapo-doesn't want to support anything ISP to 
> start blocking all the garbage the pass on?

Change ISP. You decided for them to run your MX.

It is your choice to aim for a cheapo service (your words). If you're
unhappy with the service, take your business elsewhere. Better service
doesn't necessarily mean more expensive, but you might need to shell out
a few bucks for the service you want.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Second step with SA

2014-08-15 Thread Karsten Bräckelmann
On Fri, 2014-08-15 at 12:21 -0400, Daniel Staal wrote:
> --As of August 15, 2014 1:23:37 PM +0200, Antony Stone is alleged to have 
> said:

> > http://spamassassin.apache.org/full/3.0.x/dist/doc/Mail_SpamAssassin_Conf
> > .html#language_options

> Both of these links are out of date.  The whitelist/blacklist it probably 
> doesn't matter to much, but the language option in the first has been 
> discontinued entirely.

Nope. The ok_languages option has not been discontinued. It has been
plugin-ized since 3.1, still lives to this date in the TextCat language
guesser plugin.


I do however agree, that those 3.0 links are way too old. I guess Antony
should clean up some bookmarks. ;)

Regarding white- and blacklist options, there have been some significant
changes since. Most notably, in addition to the whitelist_from_rcvd,
today there's the most convenient whitelist_auth and its piece-meal
whitelist_from_(spf|dk|dkim) counterparts.
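
For example, a single line like this whitelists a sender domain
whenever the message passes SPF or DKIM verification (example.org being
a placeholder):

  whitelist_auth *@example.org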


> The correct links for the current version of Spamassassin are:
> 
> 

Latest stable version documentation, always:

  http://spamassassin.apache.org/doc/


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: spamassassin at 100 percent CPU

2014-08-13 Thread Karsten Bräckelmann
On Wed, 2014-08-13 at 11:20 -0700, Noah wrote:
> This is a new machine with rules copied over from another machine.  How 
> about this?  I just start new.  Is there a good page out that explains 
> setting up spamassassin from scratch and getting the sa rules set up 
> well and cleaned up nicely?  I am happy to start from the beginning with 
> best practices.

If you cannot answer our rather specific questions, you're in for a much
steeper learning curve than you seem to expect...


What the best way of setting up SA on a new machine is? Just install the
distro provided SA packages.

Getting the SA rules set up well? Same. Cleaned up? Do not copy over
configuration and rules from $ome other system, unless you know what you
are copying. IOW, don't. That's clean by definition.

What I really don't get from your reply is this, though:

A new machine, with "rules copied over". Yet, you seem to be unable to
answer our questions regarding custom rules and configuration you put
there. Which equals everything you "copied over" to begin with. If you
did, why can't you answer our question?

Or revert that "copying over", which results in the "cleaned up" state
you asked for.


Regardless of continuing with the current system, or setting up the
whole system from scratch again -- there are important questions raised,
you just didn't answer. Which, frankly, are likely to have a *much* more
severe impact than removing bad, copied rules.

What mail is that system handling, if it is not an MX? How large are
those messages, and what's your size limit? How is SA integrated, what
software is passing mail to SA?

What is the actual process's name, and for how long does it run at CPU
max?


Without answering these (basically, get back to my previous post and
actually answer all my very specific questions), there is absolutely no
point in you posing more or other questions. It won't help.


Reference:

> On 8/11/14 4:31 PM, Karsten Bräckelmann wrote:
> > On Mon, 2014-08-11 at 09:18 -0400, Joe Quinn wrote:
> >> Keep replies on list.
> >>
> >> Do you remember making any changes, or are you using spamassassin as it
> >> comes? What kind of email is going through your server? Very large
> >> emails can cause trouble with poorly written rules. If you can, perhaps
> >> systematically turn off things that are pushing email to that server
> >> could narrow it down to a particular type of email.
> >>
> >> On 8/9/2014 4:41 PM, Noah wrote:
> >>> thanks for your response.  I am not handling much email its a new
> >>> server and currently the MX points to another server.
> >
> > What mail is it handling?
> >
> > Not MX, so I assume it does not receive externally generated mail at
> > all. Which pretty much leaves us with locally generated -- cron noise
> > and other report types.
> >
> > How is SA integrated? What's your message size limit (see config of the
> > service passing mail to SA)? Are you per chance scanning multi MB text
> > reports?
> >
> > A sane size limit is about 500 kB. Besides, local generated mail isn't
> > worth processing with SA, and in the case of cron mail often harmful
> > (think virus scanner report).
> >
> >
> >>> How do I check the SA configuration?  How do I check if I am using
> >>> additional rules?
> >
> > By additional rules, we mean any rules or configuration that is not
> > stock SA. Anything other than the debian package or running sa-update.
> > Generally, anything *you* added.
> >
> >
> >>>> On 7/31/2014 3:19 PM, Noah wrote:
> >>>>> what are some things to check with spamassassin commonly running at
> >>>>> 100 percent?
> >
> > For how long does it run at CPU max? What is the actual process name?
> >
> > It would be rather common for the plain 'spamassassin' script to consume
> > a couple wall-clock seconds of CPU, since it has to read and compile the
> > full rule-set at each invocation.
> >
> > Unlike the 'spamd' daemon, which has that considerable overhead only
> > once during service start. In both cases may the actual scan time with
> > high CPU load be lower than the start-up overhead.
> >
> >
-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Rule for single URL in body with very few text

2014-08-12 Thread Karsten Bräckelmann
On Tue, 2014-08-12 at 11:42 -0400, Karl Johnson wrote:
> Thanks for the rule Karsten. I've already searched the archive to find
> this kind of rule and found few topic but I haven't been able to make
> it works yet. I will try this one and see how it goes.

Searching is much easier, if you know some unique pointers like the
sub-rule's name in question. Which is what I used to dig up the
rules. ;)

I didn't mean to RTFM you, just didn't feel like discussing yet again
what should be possible to deduce from the rules themselves, or from the
archived threads. Hence me pointing at the archives with info on how to
find what you need, just in case you do need or want more details.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: spamassassin at 100 percent CPU

2014-08-11 Thread Karsten Bräckelmann
On Mon, 2014-08-11 at 09:18 -0400, Joe Quinn wrote:
> Keep replies on list.
> 
> Do you remember making any changes, or are you using spamassassin as it 
> comes? What kind of email is going through your server? Very large 
> emails can cause trouble with poorly written rules. If you can, perhaps 
> systematically turn off things that are pushing email to that server 
> could narrow it down to a particular type of email.
> 
> On 8/9/2014 4:41 PM, Noah wrote:
> > thanks for your response.  I am not handling much email its a new 
> > server and currently the MX points to another server.

What mail is it handling?

Not MX, so I assume it does not receive externally generated mail at
all. Which pretty much leaves us with locally generated -- cron noise
and other report types.

How is SA integrated? What's your message size limit (see config of the
service passing mail to SA)? Are you per chance scanning multi MB text
reports?

A sane size limit is about 500 kB. Besides, locally generated mail isn't
worth processing with SA, and in the case of cron mail often harmful
(think virus scanner report).
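
Where exactly that limit is set depends on the glue. Two common
examples, meant as sketches to adapt rather than drop-in values:

  # amavisd.conf
  $sa_mail_body_size_limit = 400*1024;

  # spamc invocation
  spamc -s 512000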


> > How do I check the SA configuration?  How do I check if I am using 
> > additional rules?

By additional rules, we mean any rules or configuration that is not
stock SA. Anything other than the debian package or running sa-update.
Generally, anything *you* added.


> > > On 7/31/2014 3:19 PM, Noah wrote:
> > > > what are some things to check with spamassassin commonly running at
> > > > 100 percent?

For how long does it run at CPU max? What is the actual process name?

It would be rather common for the plain 'spamassassin' script to consume
a couple wall-clock seconds of CPU, since it has to read and compile the
full rule-set at each invocation.

Unlike the 'spamd' daemon, which has that considerable overhead only
once during service start. In both cases, the actual scan time with
high CPU load may be lower than the start-up overhead.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Rule for single URL in body with very few text

2014-08-11 Thread Karsten Bräckelmann
On Mon, 2014-08-11 at 22:57 +0300, Jari Fredriksson wrote:

> *  1.8 DKIM_ADSP_DISCARD No valid author signature, domain signs all mail
> *  and suggests discarding the rest

> This is a corner case. I got it tagged, but probably just because I
> tested it later and URIBL has it now.

Minus the 1.8 score for DKIM_ADSP_DISCARD, it wouldn't have crossed the
5.0 threshold for you either.

Seeing all those x instead of (real|user|host) names and domains, it
seems safe to assume the unredacted message does not claim to be sent
from an x.com address... ;)


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Rule for single URL in body with very few text

2014-08-11 Thread Karsten Bräckelmann
On Mon, 2014-08-11 at 15:48 -0400, Karl Johnson wrote:
> Is there any rule to score an email with only 1 URL and very few text?
> It could trigger only text formatted email because they usually aren't
> in HTML.

Identify very short (raw)bodies.

  rawbody __RB_GT_200  /^.{201}/s
  meta__RB_LE_200  !__RB_GT_200

Chain together with the stock __HAS_URI sub-test.

  metaSHORT_BODY_WITH_URI  __RB_LE_200 && __HAS_URI


I have discussed and explained the rule to identify short messages a few
times already. Please search your preferred archive [1] for the rule's
name, to find the complete threads.


[1] List of archives: http://wiki.apache.org/spamassassin/MailingLists

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Running SA without the bayesian classifier

2014-08-11 Thread Karsten Bräckelmann
On Mon, 2014-08-11 at 16:38 +0200, Matteo Dessalvi wrote:
> I am planning to install SA on our SMTP MTAs, which deals only with
> outgoing traffic generated in the internal network.

Outgoing traffic. That means, most DNSBLs are either completely useless
or effectively disabled. You'll also need to zero out the ALL_TRUSTED
rule for the same reason.
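
For instance, in local.cf:

  score ALL_TRUSTED 0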


> I am making the assumption that our clients are mostly sending 'clean'
> email (I know, I am trusting *a lot* my users but nevertheless).
> 
> So the question is: how efficient will be SA without using the bayesian
> classifier? Are all the remaining rulesets (apart from BAYES_*)
> sufficient to shave off spam email?

Define spam.

Running SA on your outgoing SMTP will not catch botnet generated junk,
neither spam nor malware. This would require sniffing raw traffic. Or
completely firewalling off outgoing port 25 connections.

You explicitly mention your users (corporate or home?) "sending mail".
Are you talking about them possibly running bulk sending services, or
sending hand-crafted unsolicited mail to individual recipients?

Unless there's a 419 gang operating from your internal network, there
might not be much left for SA with stock rules to classify spam...


That said, it is entirely possible to run SA without the Bayesian
classifier. There's an option to disable it, and a different score set,
generated specifically for this case, is used.
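
Disabling it is a one-liner in local.cf:

  use_bayes 0

SA then uses the score set that was generated without the BAYES_* rules
in the mix.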





Re: Similar pattern of emails Comparing Prices

2014-08-07 Thread Karsten Bräckelmann
On Thu, 2014-08-07 at 17:14 +0100, emailitis.com wrote:
> I have had a fair number of VERY similar Spam emails that are all
> about comparing prices.  I have put a number in a pastebin below.

We need full, raw samples. Those are mostly just headers with the raw
body missing (multipart/alternative, thus most likely HTML and plain
text versions).

The blobs including a body-ish part appear to be copied from your MUA's
rendered display.


> They all seem to be originating from Fasthosts in UK which I cannot
> really blacklist in entirety.
> 
> Can anyone suggest how to block it with a Spamassassin rule?

My first thought was to match on that List-Unsubscribe header's
domain. On second thought, bad idea, since cloudapp.net is MS Azure, not
the spammer's domain.

Still, that might make for an easy rule. That unsub link includes some
campaign, recipient, etc identifying numbers. And one that most likely
identifies the sender, identical in all 7 samples.

  header AZURE_BAD_CUSTOMER  List-Unsubscribe =~ /email-delivery\.cloudapp\.net\/sender\/box\.php\?.*s=bfa2e2429e7a4f0b0993c32a75aebc0e/

Note: This is only assuming the s value identifies the campaign's sender
and misbehaving Azure customer.
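
If that guess holds, give it a describe line and a modest starting score
(the value here is picked out of thin air, watch the hits for a while):

  describe AZURE_BAD_CUSTOMER  List-Unsubscribe points at suspect Azure sender
  score    AZURE_BAD_CUSTOMER  2.0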

The body most certainly contains links with very similar structure.


> http://pastebin.com/B9YqTsvZ
> 
> I had tried to create something from a meta rule, but that has not
> worked so far: 
> 
> body __CGK_CLOUDAPP_1 /cloudapp/i
> body __CGK_CLOUDAPP_2 /\bCompare\b/i
> meta CGK_CLOUDAPP (( __CGK_CLOUDAPP_1 +  __CGK_CLOUDAPP_2) > 1)

No surprise. There is no "cloudapp" string in the body at all, according
to your two formatted samples.
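
Should the raw bodies turn out to contain those email-delivery.cloudapp.net
links after all (an assumption on my part, judging from the unsub URL
structure), a uri rule is the better tool anyway, since it matches the
extracted link targets rather than the rendered text:

  uri  CGK_CLOUDAPP_URI  /email-delivery\.cloudapp\.net\/sender\//i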





Re: unsubscribe

2014-08-05 Thread Karsten Bräckelmann
Wrong address. To unsubscribe, send a mail to the appropriate
list-command address, not the mailing list itself.

See the headers of each and every post on this list:

  list-help: 
  list-unsubscribe: 





Re: stable branch vs trunk (was: Re: "colors" TLDs in spam)

2014-08-04 Thread Karsten Bräckelmann
On Sun, 2014-08-03 at 09:22 -0400, Kevin A. McGrail wrote:
> Hi Karsten, I did bring this up a few months ago discussing releases.

I'm currently catching up on list mail, and figured recent threads might
be more important than revising old-ish, finished threads, in particular
about releases already published. *sigh*

> Right now trunk is effectively 3.4.1 and there is no reason to
> maintain a branch. When 3.4.1 is released, I would make sure this was
> the case and recopy from trunk but do not stress as I will confirm
> this. We should aim for a sept 30 3.4.1 release.
> 
> But until we have a need for the branch, to me it is a waste of time
> to sync both.

Fair enough.

> And the plugin system let's new, experimental code go into trunk
> without risking stability.

That holds true only for new plugins, like TxRep (trunk) or the Redis
BayesStore during 3.4 development. It does not prevent potential major
issues in cases like e.g. new URIDNSBL features, general DNS system
rewrite or tflags changes, which happened in trunk with the (then)
stable 3.3 branch being unaffected.

Not opposing in general. Just pointing out that this argument is only
valid, as long as substantial changes are in fact isolated in new
plugins.


> So right now, I do not really envision a need for a branch and I run
> trunk. My $0.02.

Hey, I didn't say trunk is unsafe either! Even while Mark happens to
rewrite large parts of DNS handling or DNSBL return value masks. ;)


As long as there is no real need for separating stable and development
branches, I'm fine with this. Given branching will happen prior to
disruptive commits.

I guess my concerns also can be outlined by anecdotal evidence: I
recently asked for RTC votes, to commit a patch not only to trunk, but
the 3.4 branch also. You told me we're not in RTC mode and to go ahead,
so I committed to the stable branch and closed the bug report. You did
not tell me committing to the branch would be needless...





Re: moving from "fetched" mail to "direct deliver" mail

2014-08-04 Thread Karsten Bräckelmann
On Mon, 2014-08-04 at 18:16 -0400, Joe Acquisto-j4 wrote:
> On 8/4/2014 at 5:03 PM, RW  wrote:

> > > Do I gotta start fresh?  or will the config changes to SA for direct
> > > drop allow magic to happen?

There's magic. And there are probably no SA conf changes needed. ;)


> > I'm not sure whether you are referring to the Bayes database or a
> > collection of email, but either way I'd keep it - at least until I
> > had a few thousand new hams and spams to reset it. 

> Well, either or both, I guess.   I guess my question really is, is
> Bayes OK as is, or will the changes that will exist in the headers
> make it useless.I think I hear, "it should be ok, for now". ?

Bayes is entirely fine with that. For now, and later.

Your change in environment only effects a very few headers added by the
relays, like Received ones. Bayes tokens taken from headers do include
header specifics. With a change like this, you will only lose a *very*
few indicators for spam vs ham. There's hardly any potential for damage
at all regarding your Bayes training.

You'll probably not even notice.


> > If you are going to learn from older mail you should ideally keep the
> > old internal and trusted network settings. You can comment them out in
> > normal use, but they should be present for sa-learn.
> 
> Umm.  ?.   So,  I  can keep the existing Bayes, but if I should have to
> re-learn,   I should revert to my old settings for learning.

Yes. The only settings you'd want to keep in case of re-training from a
corpus including that old mail are internal_ and trusted_networks,
though.
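
For reference, in local.cf those would look roughly like this, with the
address range being purely an example (substitute whatever network your
old fetching host sat in):

  trusted_networks  192.168.1.0/24
  internal_networks 192.168.1.0/24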

If at all. SA does detect certain mail fetching and does the magic for
you. E.g. in a rather straightforward setup of 'fetchmail' with SA run
locally afterward (postfix, and possibly procmail), the internal and
trusted networks do not need to be set.

So in that case, there is no config that needs to be retained, because
there was no config you had to set due to your mail fetching environment
in the first place.

The point being: retain the configuration you did need in the previous
setup, even though it becomes obsolete with your new environment.


> I guess I should also, once I change,  start a second "corpus" with the
> new settings and, at least until I amass a sufficient store of new
> mail, relearn from both, adjusting SA config as appropriate?

As I hopefully made clear above, there's no need for starting a new
corpus. There's probably no need for new settings, if at all very
limited.

Your text sounds like major conf changes to me. Go through 'em, which
changes do you think you'll need? My guess is little to none.


> Make sense?  Am I way off base and/or making this too complicated?

Too complicated. ;)





Re: New at SpamAssassin - how to not get headers

2014-08-04 Thread Karsten Bräckelmann
On Mon, 2014-08-04 at 13:02 -0700, Robert Grimes wrote:
> Robert Grimes wrote

> > I have changed the user that runs the spamd service to be the same as when
> > I ran from command line. I will see what, if any changes occur. I will
> > leave Bayes alone for the moment; just try one thing at a time to keep the
> > confusion down.

By that change of the user your spamd service runs as, you lost your
previous Bayes training (which seems to be linked to the service user).
Unless you deliberately nuked the Bayes DB to start fresh.


Ignoring DNSBL blocking and broken format, which has been covered
already.

> X-Spam-Status: No, score=0.0 required=5.0 tests=HTML_MESSAGE,SPF_HELO_PASS,
>   URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0

There is no BAYES_xx rule hit. If Bayes is enabled and has been trained
sufficiently, there will *always* be a BAYES_xx rule indicating the
Bayesian probability of being spam.

The absence of any such rule since you changed the spamd service user
means, that user has no access to the previously trained Bayes DB.

> I saved the messaged from outlook and ran spamc [...]

> X-Spam-Status: Yes, score=7.3 required=5.0 tests=MISSING_DATE,MISSING_FROM,   
> 
>   MISSING_HEADERS,MISSING_MID,MISSING_SUBJECT,NO_HEADERS_MESSAGE,NO_RECEIVED,
>   NO_RELAYS,NULL_IN_BODY,URIBL_BLOCKED,URI_HEX autolearn=no autolearn_force=no
>   version=3.4.0

No BAYES_xx rule either, same problem as above.

However, do note the autolearn=no part. Bayes is enabled (just not
sufficiently trained yet). In a follow-up to this thread, you pasted
headers of spam manually scanned with spamc, showing autolearn=ham.

A spam message has incorrectly been learned as ham. You want to correct
that by re-training (simply learn it as spam). And keep an eye on that
part in the future.
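
Run as the very same user your spamd service scans with, roughly (the
path is just a placeholder for wherever you saved that message):

  sa-learn --spam /path/to/that/message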


> both should be running under the same administrator account.

It is important to use the same user for  (a) scanning incoming mail,
(b) training, and  (c) manually running messages through spamc later.

Unless spamd changes user on a per-recipient basis (which it seems is
not the case in your setup), that's a single user. Changing that user,
as you just did, requires moving $HOME data or changing ownership of the
Bayes DB.
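
If shuffling $HOME data around gets tedious, a site-wide Bayes DB is
another option. A sketch only, the path is an arbitrary example and must
be writable by the scanning user:

  bayes_path      /var/spamassassin/bayes/bayes
  bayes_file_mode 0770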





Re: New at SpamAssassin - how to not get headers

2014-08-04 Thread Karsten Bräckelmann
On Mon, 2014-08-04 at 14:11 -0700, Robert Grimes wrote:
> Both spamc and hMailServer SA service are running in the same directory
> where the binaries for SA are. I am not sure the significance of the
> directory name. As I stated both use the same parameters which is only -l
> therefore SA uses default config file locations for both.

Earlier in this thread you mentioned using the -l option with spamd. Now
you mention using that option with "both". So, by "hMailServer SA
service", are you referring to spamd?

In either case, your assumption that using identical command line
options results in spamd and spamc using the same configuration is
false.

* For spamc, the -l option sends log messages to stderr instead of
syslog. Given you're running Windows, I don't even know if that option
has any effect at all.

* For spamd, the -l option enables telling (--allow-tell), that is, it
allows learning (Bayes) and reporting spam to external services via
spamc.

The latter is a rather uncommon option, and even less likely to be used
deliberately in the environment of a new SA user.


For spamc/d options and a lot more details, see the documentation. In
particular the docs named after their respective programs and the Conf
one.

  http://spamassassin.apache.org/doc/


> I have had serveral hundred hams. Wouldn't that be enough?

Yes, as Martin mentioned, learning 200 spam and ham each is sufficient
for Bayes to start working.
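
If in doubt whether that training ended up where spamd actually looks
for it, the counters will tell; run this as the same user the service
runs as:

  sa-learn --dump magic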

But see my other reply to this thread in a few.




