Re: Why is RP_MATCHES_RCVD so "heavy"?

2016-11-23 Thread @lbutlr
On Nov 22, 2016, at 3:54 PM, Eric Abrahamsen wrote:
> I get a lot of spam that passes the RP_MATCHES_RCVD test; it wouldn't
> make it into my inbox otherwise. I see the scoring recently got bumped
> to -3.0, which makes false negatives even more likely.

I do see this in spam, but I see it so much more in ham that I've not changed
the score. The spam that does hit it seems to score very highly in other areas
(BAYES_99 and BAYES_999 especially). I see it in a lot of mail that users often
tag as spam but that is not actually spam: for example, emails from Macy's or
Target that the user did sign up for but is too lazy to unsubscribe from.

But run it against your corpus and adjust the score as needed.
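
One rough way to "run it against your corpus" is to count how often the rule
shows up in mail you have already sorted into ham and spam. The sketch below
is only an illustration: it assumes each stored message still carries an
X-Spam-Status header with a tests= list (amavisd-new and other glue may write
different headers), and the helper names rule_hit and hit_rate are made up
for this example.

```python
import re
from email import message_from_string

RULE = "RP_MATCHES_RCVD"


def rule_hit(status_header: str, rule: str = RULE) -> bool:
    """True if `rule` appears as a whole rule name in an X-Spam-Status value."""
    # The lookarounds keep the match from firing on longer names such as
    # __RP_MATCHES_RCVD or RP_MATCHES_RCVD_FOO (hypothetical).
    pattern = r"(?<![A-Za-z0-9_])%s(?![A-Za-z0-9_])" % re.escape(rule)
    return re.search(pattern, status_header) is not None


def hit_rate(messages, rule: str = RULE):
    """Return (hits, total) for an iterable of email.message.Message objects."""
    hits = total = 0
    for msg in messages:
        total += 1
        if rule_hit(str(msg.get("X-Spam-Status", "")), rule):
            hits += 1
    return hits, total
```

With the standard-library mailbox module, something like
hit_rate(mailbox.mbox("ham.mbox")) and hit_rate(mailbox.mbox("spam.mbox"))
(both paths are placeholders) gives two ratios to compare before touching the
score.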




Re: Is SA scoring affected by envelope field changes?

2016-11-23 Thread Matus UHLAR - fantomas

On 22.11.16 13:11, MRob wrote:
I'd like to ask if SA scoring would be affected by potential changes 
in the envelope fields (which presumably depends on when in the mail 
flow SA is used, Postfix in this case):


* content_filter when receive_override_options=no_address_mappings 
(sent to filter via SMTP)


* content_filter when address mappings have occurred (sent to filter 
via SMTP)


* in the delivery agent (given to delivery agent via LMTP)

Are there impacts to scoring to consider? Are they small or large? 


There are no impacts: envelope data often does not show up in the message headers.

Can adding a X-Original-To header help the delivery agent show SA how 
the message was originally received?


Only if a different recipient causes SA to treat the mail differently, which
should not usually happen.
--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
You have the right to remain silent. Anything you say will be misquoted,
then used against you. 


Re: Why is RP_MATCHES_RCVD so "heavy"?

2016-11-23 Thread Bill Cole

On 22 Nov 2016, at 17:54, Eric Abrahamsen wrote:


I get a lot of spam that passes the RP_MATCHES_RCVD test; it wouldn't
make it into my inbox otherwise. I see the scoring recently got bumped
to -3.0, which makes false negatives even more likely.

I'm not expert enough in the nature of spam to really understand why
this test is so strong, nor to feel confident in simply whacking a few
points off it without knowing more.

In the year or so that I've been running my own mail server, I don't
think I've seen a *single* false positive (at least not one that I
noticed), but get maybe an average of two spam mails into my inbox every
day. I've beefed up the BAYES scores, and that helped, but haven't
tweaked anything else.

Can anyone tell me why it's scored so heavily?


Probably someone more intimate with the RuleQA process can explain it. 
To me it looks too noisy to be scored so strongly, and for years I've 
had it pegged for my systems at -0.3. I suspect that much of the 
non-matching spam is stuff that many sites exclude well ahead of SA, so 
it is not as indicative in production systems as it is in RuleQA.



Would it be a bad idea to
just drop it down to -1.5 or something?


In the past 2 years on multiple mail systems I have had no indication of 
any false positives which would have been cured by a stronger ham score 
for RP_MATCHES_RCVD. My reduction to -0.3 was based on the rule 
chronically redeeming a stream of snowshoe spam that was otherwise 
scoring in the ~6 range. Whether and how far you reduce its power should 
be based on your local circumstances, but -1.5 strikes me as probably a 
reasonable & prudent guess in the absence of careful analysis.
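
To make the arithmetic concrete, here is a small sketch of how the rule's
score moves a message across the threshold. It assumes the stock
required_score of 5.0 and takes 6.0 as a stand-in for the "~6 range" above;
the function name is_spam is only for this illustration and is not anything
from SpamAssassin itself.

```python
REQUIRED = 5.0  # SpamAssassin's default required_score; yours may differ


def is_spam(other_rules_total: float, rp_score: float,
            required: float = REQUIRED) -> bool:
    """A message is tagged as spam when its total reaches the threshold."""
    return other_rules_total + rp_score >= required


if __name__ == "__main__":
    # 6.0 is an assumed total from all the other rules that fired.
    for rp in (-3.0, -1.5, -0.3):
        total = 6.0 + rp
        print(f"RP_MATCHES_RCVD {rp:+.1f}: total {total:.1f}, "
              f"spam={is_spam(6.0, rp)}")
```

Under these assumed numbers a message at 6.0 is redeemed at -3.0 and also at
-1.5, and is only caught at -0.3, which is why the right value depends on how
far above the threshold your own missed spam scores without the rule.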


Re: Why is RP_MATCHES_RCVD so "heavy"?

2016-11-23 Thread Kris Deugau
Eric Abrahamsen wrote:
> I get a lot of spam that passes the RP_MATCHES_RCVD test; it wouldn't
> make it into my inbox otherwise. I see the scoring recently got bumped
> to -3.0, which makes false negatives even more likely.
> 
> I'm not expert enough in the nature of spam to really understand why
> this test is so strong, nor to feel confident in simply whacking a few
> points off it without knowing more.
> 
> In the year or so that I've been running my own mail server, I don't
> think I've seen a *single* false positive (at least not one that I
> noticed), but get maybe an average of two spam mails into my inbox every
> day. I've beefed up the BAYES scores, and that helped, but haven't
> tweaked anything else.
> 
> Can anyone tell me why it's scored so heavily? Would it be a bad idea to
> just drop it down to -1.5 or something?

This is a rule whose usefulness is likely to vary a lot more for your
mail stream.

Locally, I found it was firing on enough of the reported false-negatives
that I squashed it down to a purely advisory -0.001 quite a while ago,
and I haven't seen any issues with doing so.

I didn't disable it outright as some others do, since it's used in
several meta rules.

-kgd


Re: Why is RP_MATCHES_RCVD so "heavy"?

2016-11-23 Thread Matus UHLAR - fantomas

Eric Abrahamsen wrote:

I get a lot of spam that passes the RP_MATCHES_RCVD test; it wouldn't
make it into my inbox otherwise. I see the scoring recently got bumped
to -3.0, which makes false negatives even more likely.

I'm not expert enough in the nature of spam to really understand why
this test is so strong, nor to feel confident in simply whacking a few
points off it without knowing more.

In the year or so that I've been running my own mail server, I don't
think I've seen a *single* false positive (at least not one that I
noticed), but get maybe an average of two spam mails into my inbox every
day. I've beefed up the BAYES scores, and that helped, but haven't
tweaked anything else.

Can anyone tell me why it's scored so heavily? Would it be a bad idea to
just drop it down to -1.5 or something?


On 23.11.16 10:29, Kris Deugau wrote:

This is a rule whose usefulness is likely to vary a lot more for your
mail stream.

Locally, I found it was firing on enough of the reported false-negatives
that I squashed it down to a purely advisory -0.001 quite a while ago,
and I haven't seen any issues with doing so.

I didn't disable it outright as some others do, since it's used in
several meta rules.


meta rules should match __RP_MATCHES_RCVD which is exactly the same rule
- blanking RP_MATCHES_RCVD should make no difference

Thus I (again) recommend blanking it...

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Despite the cost of living, have you noticed how popular it remains? 


Re: Why is RP_MATCHES_RCVD so "heavy"?

2016-11-23 Thread Eric Abrahamsen
Matus UHLAR - fantomas writes:

>>Eric Abrahamsen wrote:
>>> I get a lot of spam that passes the RP_MATCHES_RCVD test; it wouldn't
>>> make it into my inbox otherwise. I see the scoring recently got bumped
>>> to -3.0, which makes false negatives even more likely.
>>>
>>> I'm not expert enough in the nature of spam to really understand why
>>> this test is so strong, nor to feel confident in simply whacking a few
>>> points off it without knowing more.
>>>
>>> In the year or so that I've been running my own mail server, I don't
>>> think I've seen a *single* false positive (at least not one that I
>>> noticed), but get maybe an average of two spam mails into my inbox every
>>> day. I've beefed up the BAYES scores, and that helped, but haven't
>>> tweaked anything else.
>>>
>>> Can anyone tell me why it's scored so heavily? Would it be a bad idea to
>>> just drop it down to -1.5 or something?
>
> On 23.11.16 10:29, Kris Deugau wrote:
>>This is a rule whose usefulness is likely to vary a lot more for your
>>mail stream.
>>
>>Locally, I found it was firing on enough of the reported false-negatives
>>that I squashed it down to a purely advisory -0.001 quite a while ago,
>>and I haven't seen any issues with doing so.
>>
>>I didn't disable it outright as some others do, since it's used in
>>several meta rules.
>
> meta rules should match __RP_MATCHES_RCVD which is exactly the same rule
> - blanking RP_MATCHES_RCVD should make no difference
>
> Thus I (again) recommend blanking it...

Thanks to all of you for the responses! I'll weaken the rule a bit and
see how it goes -- looking at total scores for the spam that makes it
past SA, just a point or two should do it.

It was helpful seeing everyone's thought-process here, thanks again.

E



"Complex regular subexpression recursion limit exceeded" error from sa-learn

2016-11-23 Thread Rich Wales
I'm running Postfix, Spamassassin, amavisd-new, and Dovecot on an Ubuntu
16.04 LTS server.

For some time now, I've been running my inbox and junk folder through
*sa-learn* every night, in order to educate my e-mail server about spam
messages that make it past Spamassassin but which I subsequently mark as
spam manually.

Lately, I've been seeing a fair number of messages from *sa-learn* like
the following:

/Complex regular subexpression recursion limit (32766) exceeded at
/usr/share/perl5/Mail/SpamAssassin/HTML.pm line 745./

I've managed to identify specific individual e-mails that generate this
diagnostic, but I've looked at them and can't see anything obviously
strange.

Any thoughts?
-- 
*Rich Wales*
ri...@richw.org


Re: "Complex regular subexpression recursion limit exceeded" error from sa-learn

2016-11-23 Thread John Hardin

On Wed, 23 Nov 2016, Rich Wales wrote:


I'm running Postfix, Spamassassin, amavisd-new, and Dovecot on an Ubuntu
16.04 LTS server.

For some time now, I've been running my inbox and junk folder through
*sa-learn* every night, in order to educate my e-mail server about spam
messages that make it past Spamassassin but which I subsequently mark as
spam manually.

Lately, I've been seeing a fair number of messages from *sa-learn* like
the following:

/Complex regular subexpression recursion limit (32766) exceeded at
/usr/share/perl5/Mail/SpamAssassin/HTML.pm line 745./

I've managed to identify specific individual e-mails that generate this
diagnostic, but I've looked at them and can't see anything obviously
strange.

Any thoughts?


The RE at that line looks pretty firmly anchored...

Can you gzip up a sample that fails for you and send it to me?


--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  You do not examine legislation in the light of the benefits it
  will convey if properly administered, but in the light of the
  wrongs it would do and the harms it would cause if improperly
  administered.  -- Lyndon B. Johnson
---
 338 days since the first successful real return to launch site (SpaceX)


Re: "Complex regular subexpression recursion limit exceeded" error from sa-learn

2016-11-23 Thread John Hardin

On Wed, 23 Nov 2016, Rich Wales wrote:


/The RE at that line looks pretty firmly anchored... Can you gzip up a
sample that fails for you and send it to me?/


Sure.  See the attachment.


OK, I can repro on trunk:

Nov 23 19:17:00.141 [18349] dbg: message: HTML::Parser utf8_mode on (assumed 
UTF-8 octets)
Nov 23 19:17:00.187 [18349] warn: Complex regular subexpression recursion limit 
(32766) exceeded at lib/Mail/SpamAssassin/HTML.pm line 745.
Nov 23 19:17:00.193 [18349] dbg: message: spaces (octets) in HTML: 952 out of 
3954

It's that very long block of QP blanks right at the end. If you edit out 
all those =20s after the  it stops emitting that warning.


That would be a workaround for you to make sa-learn shut up about your 
corpus until the problem is fixed. Blanks don't affect Bayes (at least, 
not until we implement multi-word tokens) so it shouldn't affect what gets 
learned.
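
If editing each message by hand is impractical, the same workaround could be
scripted. The following is a minimal sketch, not a tested fix for the bug: it
collapses any long run of quoted-printable blanks in the raw message to a
single "=20". The threshold MIN_RUN and the function name collapse_qp_blanks
are assumptions made for this example, and it should be run on a copy of the
corpus rather than on the live mailbox.

```python
import re

# Hypothetical threshold: collapse only runs of at least this many units,
# so ordinary encoded spaces are left alone.
MIN_RUN = 100

# One unit is an encoded space ("=20") or a quoted-printable soft line
# break ("=" followed by a newline), which is how a long run gets wrapped.
_BLANK_RUN = re.compile(rb"(?:=20|=\r?\n){%d,}" % MIN_RUN)


def collapse_qp_blanks(raw: bytes) -> bytes:
    """Replace each long run of QP-encoded blanks with a single "=20"."""
    return _BLANK_RUN.sub(b"=20", raw)
```

A typical use would be to read a message file as bytes, pass it through
collapse_qp_blanks, write the result to a scratch directory, and point
sa-learn at that directory instead of the original.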


Please open a bug and attach that spample as a repro test case. I'm not 
too familiar with that bit of the code so I don't have a fast fix.



--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
 338 days since the first successful real return to launch site (SpaceX)


Re: "Complex regular subexpression recursion limit exceeded" error from sa-learn

2016-11-23 Thread Rich Wales

> /OK, I can repro on trunk: . . .  It's that very long block of QP
> blanks right at the end. . . .  Please open a bug and attach that
> spample as a repro test case./

Done.  (https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7374)
-- 
*Rich Wales*
ri...@richw.org


Re: "Complex regular subexpression recursion limit exceeded" error from sa-learn

2016-11-23 Thread Benny Pedersen

Rich Wales skrev den 2016-11-24 06:01:

_OK, I can repro on trunk: . . .  It's that very long block of QP
blanks right at the end. . . .  Please open a bug and attach that
spample as a repro test case._


Done.  (https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7374)


disable html postings on maillist still left to do ? :=)

and output to this ticket of "spamassassin --lint -D 2>&1 >/tmp.txt"

so all installed plugins versions are known, in case its already fixed


Re: "Complex regular subexpression recursion limit exceeded" error from sa-learn

2016-11-23 Thread Rich Wales
On 11/23/16 21:13, Benny Pedersen wrote:

> and output to this ticket of "spamassassin --lint -D 2>&1 >/tmp.txt"
> so all installed plugins versions are known, in case its already fixed

Done.

Rich Wales
ri...@richw.org