Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-12 Thread Henrik K
On Fri, Nov 12, 2021 at 07:49:00PM -0800, John Hardin wrote:
> 
> What would be helpful here would be logging of when a rule *starts*
> evaluation. Normally that would be painful, but for tracking a runaway it
> would be useful. Perhaps I can code up something to capture that and log it
> on a timeout...

It already exists

spamassassin -D all,rules-all < msg



Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-12 Thread John Hardin

On Fri, 12 Nov 2021, Philip Prindeville wrote:


> I got the message, saved it to a flat file, and ran "spamassassin -t -D rules <
> netdev.eml" and saw:
>
> ...
> Nov 12 11:45:38.048 [36367] dbg: rules: ran eval rule __ANY_TEXT_ATTACH_DOC ==> got hit (1)
> ...
> Nov 12 11:45:38.063 [36367] dbg: rules: ran eval rule __ANY_TEXT_ATTACH ==> got hit (1)
> Nov 12 11:49:58.565 [36367] info: check: exceeded time limit in Mail::SpamAssassin::Plugin::Check::_eval_tests_type11_pri0_set1, skipping further tests
> ...
>
> Am I correct that __ANY_TEXT_ATTACH alone took 4:30s?


"ran ... got hit" is past tense. And it needs to complete the rule to know 
whether it got a hit.


11:45:38.048 -> 11:45:38.063 = less than 20 msec.

The next rule, whatever that was, is the one that timed out after 4m20s.
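The gap arithmetic above can be done mechanically on a long debug log. A minimal Python sketch, assuming the "Mon DD HH:MM:SS.mmm [pid] message" line format shown in the excerpt, an English/C locale for month names, and that each log line has been unwrapped onto a single line (mail clients tend to fold them). The function names are illustrative, not part of SpamAssassin.

```python
import re
from datetime import datetime

# Timestamp prefix of the -D debug lines, e.g.
# "Nov 12 11:45:38.063 [36367] dbg: rules: ran eval rule ..."
LINE_RE = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2} \d\d:\d\d:\d\d\.\d+) \[\d+\] (.*)$")


def parse_ts(stamp):
    # The log carries no year; a placeholder leap year is prepended only so
    # strptime gets a complete date. Only differences between stamps matter.
    return datetime.strptime("2000 " + " ".join(stamp.split()), "%Y %b %d %H:%M:%S.%f")


def largest_gap(lines):
    """Return (seconds, line_before_gap, line_after_gap) for the biggest jump."""
    best = (0.0, None, None)
    prev_time = prev_text = None
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip "..." elisions and other noise
        when, text = parse_ts(m.group(1)), m.group(2)
        if prev_time is not None:
            gap = (when - prev_time).total_seconds()
            if gap > best[0]:
                best = (gap, prev_text, text)
        prev_time, prev_text = when, text
    return best
```

Fed the four timestamped lines quoted above, it reports a gap of about 260.5 seconds that opens right after the __ANY_TEXT_ATTACH line, so the 4m20s belongs to whichever rule ran next.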


> Could there be rules that *aren't* matching but are taking a while?


It's timing out on a rule that's running away. The timeout triggers before 
"hit/no hit" is known.


What would be helpful here would be logging of when a rule *starts* 
evaluation. Normally that would be painful, but for tracking a runaway it 
would be useful. Perhaps I can code up something to capture that and log 
it on a timeout...
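The idea of logging a rule *before* it is evaluated can be sketched as a toy model in Python. This is not SpamAssassin's Check plugin: `run_rules`, the rule names, and the log wording are hypothetical, and it relies on Unix SIGALRM via `signal.setitimer`, so it must run in the main thread on a POSIX system. Because the name is recorded before each rule starts, a timeout that fires mid-rule can name the runaway.

```python
import signal
import time


class RuleTimeout(Exception):
    """Raised from the SIGALRM handler when the overall budget runs out."""


def run_rules(rules, budget_seconds):
    """Evaluate (name, predicate) pairs in order under one overall time budget.

    Returns (hits, culprit): the rules that hit, and the name of the rule that
    was running when the budget expired (None if everything finished).
    """
    hits = []
    current = None  # name of the rule being evaluated right now

    def on_alarm(signum, frame):
        # `current` is whatever rule started last.
        raise RuleTimeout(current)

    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, budget_seconds)
    try:
        for name, predicate in rules:
            current = name
            print(f"dbg: rules: starting {name}")  # logged *before* evaluation
            if predicate():
                hits.append(name)
                print(f"dbg: rules: ran {name} ==> got hit")
        return hits, None
    except RuleTimeout as exc:
        print(f"info: check: time limit exceeded while running {exc.args[0]}, skipping further tests")
        return hits, exc.args[0]
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)


if __name__ == "__main__":
    demo = [
        ("__FAST_HIT", lambda: True),
        ("__RUNAWAY", lambda: time.sleep(5) or True),  # stands in for a runaway regex
        ("__NEVER_REACHED", lambda: True),
    ]
    print(run_rules(demo, 0.5))
```

With a 0.5 s budget the demo's last "starting" line is for __RUNAWAY, and that name is returned as the culprit. One caveat specific to Python: a signal handler runs between bytecodes, so a single long call into C code (such as one `re` match) is not interrupted until it returns; `time.sleep` is used here because it is interruptible.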


If you want to send me that message zipped up I can try it here with those 
changes and see if it's a base rule running away.




--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79


Re: Fw: spam from gmail.com

2021-11-12 Thread Łukasz Michalski

On 11/12/21 00:43, Loren Wilton wrote:
> I have to admit I'd never paid much attention to the RCVD_IN_DNSWL_*
> scores on spam before.
>
> Looking at spam for last month, I don't have a single RCVD_IN_DNSWL_MED.
>
> But I do have 12 pretty blatant spams that hit RCVD_IN_DNSWL_HI.
> It makes me wonder just how useful a rule it is.
>
> Especially when it includes sendgrid as part of the "HI" reputation
> senders.


When I was using my provider's DNS server, I started to receive a lot of
spam, and those mails were scored with RCVD_IN_DNSWL_HI=-5.
It turned out that most queries were resolved as 127.0.0.255 (BLOCKED),
but some of them as 127.0.10.3 (listed as HI in the "some special cases" category).
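The two answers above differ only in how the returned address is encoded. A small Python decoder, assuming DNSWL's published 127.0.<category>.<trust> layout (last octet: 0 = none, 1 = low, 2 = med, 3 = hi) and treating 127.0.0.255 as the "blocked" answer described here; `decode_dnswl` is an illustrative name, not a SpamAssassin function.

```python
import ipaddress

# Last octet of a DNSWL answer = trust level (assumed from DNSWL's docs).
TRUST = {0: "none", 1: "low", 2: "med", 3: "hi"}
# Answer returned when the query is refused (e.g. a shared resolver over quota).
BLOCKED = ipaddress.IPv4Address("127.0.0.255")


def decode_dnswl(answer):
    """Decode an A-record answer from the DNSWL zone into (status, category, trust)."""
    addr = ipaddress.IPv4Address(answer)
    if addr == BLOCKED:
        return ("blocked", None, None)
    first, _, category, level = addr.packed  # four octets as ints
    if first != 127:
        raise ValueError(f"not a DNSWL return code: {answer}")
    return ("listed", category, TRUST.get(level, "unknown"))
```

For example, `decode_dnswl("127.0.10.3")` gives `("listed", 10, "hi")`, category 10 being the "some special cases" category mentioned above, while `127.0.0.255` carries no trust level at all.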


So you need to use your own DNS server and make sure you are below 100k 
queries/day, or get a subscription. Otherwise spam occasionally starts 
to get in.


Regards,
Łukasz



Re: spam from gmail.com

2021-11-12 Thread Philip Prindeville



> On Nov 9, 2021, at 6:49 AM, Jared Hall  wrote:
> 
> On 11/8/2021 11:36 PM, Peter wrote:
>> It seems that people aren't taking google as seriously any more.
> First came Freemail.  Then came SpamAssassin.  I DO think that people take 
> Google seriously.  There are just so many ways to deal with this problem - 
> none of which is better than any other.
> 
> Google touts their AI capabilities with Spam.  Too bad they don't scan their 
> outbound email.  Instead, they seem to have adopted a cowardly philosophy 
> that an old C Telephone tech conveyed to me decades ago: "Problem's leaving 
> here fine!"
> 
> Google should practice what they preach:  SANITIZE USER INPUT. Instead, their 
> careless attitude presents a security threat to us all.
> 
> -- Jared Hall
> 


What... you mean "do no evil" is just lip-service?  I'm so... so... 
disillusioned!

-Philip



Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-12 Thread Philip Prindeville
Hi,

I got an email from net...@vger.kernel.org that was a lengthy (422K) regression 
test report from a patch someone had submitted.

I got the message, saved it to a flat file, and ran "spamassassin -t -D rules <
netdev.eml" and saw:

...
Nov 12 11:45:38.048 [36367] dbg: rules: ran eval rule __ANY_TEXT_ATTACH_DOC ==> got hit (1)
...
Nov 12 11:45:38.063 [36367] dbg: rules: ran eval rule __ANY_TEXT_ATTACH ==> got hit (1)
Nov 12 11:49:58.565 [36367] info: check: exceeded time limit in Mail::SpamAssassin::Plugin::Check::_eval_tests_type11_pri0_set1, skipping further tests
...

Am I correct that __ANY_TEXT_ATTACH alone took 4:30s? Looking at the rule, I
don't understand why it's taking so long... unless that's not the smoking gun.
Could there be rules that *aren't* matching but are taking a while?

72_active.cf:  mimeheader  __ANY_TEXT_ATTACH Content-Type =~ /text\/\w+/i
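As I understand `mimeheader`, the rule above only tests the Content-Type header of each MIME part, not the 422K of body text, and the pattern is a single unanchored match with nothing to backtrack over. A rough Python analogue, for illustration only; `any_text_part` is a hypothetical name, and whether SpamAssassin also tests the top-level header block exactly as `walk()` does here is an assumption.

```python
import re
from email import message_from_string

# The pattern from the rule above: /text\/\w+/i
ANY_TEXT = re.compile(r"text/\w+", re.IGNORECASE)


def any_text_part(raw):
    """Approximate a mimeheader test: check the Content-Type of every MIME part."""
    msg = message_from_string(raw)
    return any(ANY_TEXT.search(part.get("Content-Type", "")) for part in msg.walk())
```

A multipart message with an application/pdf part and a text/plain part matches; a lone application/octet-stream part does not.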

And how do I dig into why I'm getting that last message?

I can't even find type11_pri0_set1 as a string in 
/usr/share/perl5/vendor_perl/Mail/SpamAssassin/

Also, why are there multiple runs of:

Nov 12 15:05:37.368 [38290] dbg: rules: ran body rule __LOWER_E ==> got hit: "e"
Nov 12 15:05:37.368 [38290] dbg: rules: ran body rule __LOWER_E ==> got hit: "e"
Nov 12 15:05:37.368 [38290] dbg: rules: ran body rule __LOWER_E ==> got hit: "e"
Nov 12 15:05:37.368 [38290] dbg: rules: ran body rule __LOWER_E ==> got hit: "e"
Nov 12 15:05:37.368 [38290] dbg: rules: ran body rule __LOWER_E ==> got hit: "e"
Nov 12 15:05:37.368 [38290] dbg: rules: ran body rule __LOWER_E ==> got hit: "e"
Nov 12 15:05:37.369 [38290] dbg: rules: ran body rule __LOWER_E ==> got hit: "e"
Nov 12 15:05:37.369 [38290] dbg: rules: ran body rule __LOWER_E ==> got hit: "e"
Nov 12 15:05:37.369 [38290] dbg: rules: ran body rule __LOWER_E ==> got hit: "e"
Nov 12 15:05:37.369 [38290] dbg: rules: ran body rule __LOWER_E ==> got hit: "e"


Should this be capped to a maximum number of matches the way __HIGHBITS is?
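Each repeated __LOWER_E line is presumably one match of a rule that counts multiple hits. A cap changes "count every match" into "stop after N". The Python below only illustrates that difference and is not SpamAssassin code; `count_hits` and `max_hits` are hypothetical names. In SpamAssassin rule files the corresponding knob is, as far as I know, `tflags RULE multiple maxhits=N`; check the Mail::SpamAssassin::Conf documentation for your version.

```python
import re


def count_hits(pattern, text, max_hits=None):
    """Count non-overlapping matches of pattern, stopping once max_hits is reached.

    With max_hits=None every match is counted (in a debug log, each one would
    produce its own "got hit" line); with a cap the scan ends early.
    """
    hits = 0
    for _ in re.finditer(pattern, text):
        hits += 1
        if max_hits is not None and hits >= max_hits:
            break  # the caller only cares about "at least max_hits"
    return hits
```

On a body of 1000 "e" characters, the uncapped count is 1000 while `max_hits=10` returns 10.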

And I'm not sure I want messages that haven't been fully scanned being 
delivered.  Should I crank TIME_LIMIT_EXCEEDED to 20.0?

Thanks,

-Philip



Re: MIME_BASE64_TEXT only on us-ascii

2021-11-12 Thread Bill Cole

On 2021-11-12 at 04:33:34 UTC-0500 (Fri, 12 Nov 2021 10:33:34 +0100)
Philipp Ewald is rumored to have said:

> Hi folks,
>
> it's seems to be that spamassins dont check non ASCII Base64 decodes
> Mails.


I cannot make that line of text into a coherent English sentence.



> Content-Type: text/html; charset="utf-8"
> Content-Transfer-Encoding: base64
>
> [BAYES_99=3.5, BAYES_999=5, HTML_FONT_LOW_CONTRAST=0.001,
> HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.723,
> RCVD_IN_BL_SPAMCOP_NET=1.347, RCVD_IN_RP_RNBL=1.31]
>
> Mails with:
> Content-Type: text/html;
> charset="us-ascii"
>
> would get "MIME_BASE64_TEXT"
>
> [BAYES_99=3.5, BAYES_999=5, CK_HELO_GENERIC=0.001,
> HELO_DYNAMIC_DHCP=0.206, HTML_IMAGE_ONLY_28=1.404, HTML_MESSAGE=0.001,
> HTTP_EXCESSIVE_ESCAPES=1.572, KHOP_DYNAMIC=0.001,
> MIME_BASE64_TEXT=1.741, MIME_HTML_ONLY=0.723,
> RAZOR2_CF_RANGE_51_100=1.886, RAZOR2_CHECK=0.922,
> RCVD_IN_RP_RNBL=1.31, T_REMOTE_IMAGE=0.01]
>
> Is this a Bug?


Not until it's reproducible and described in a coherent manner.

If you can provide valid email messages (perhaps artificially 
constructed) that do (or don't) hit the rules that you believe they 
should (or should not,) please do so.


The purpose of MIME_BASE64_TEXT is to identify messages where a text 
part (or the whole message) with pure US-ASCII content has been 
Base64-encoded instead of being sent unencoded (or just QP-encoded to 
protect overlong lines.)
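Following that description, what matters is the decoded content, not the charset label: a UTF-8 part that really contains non-ASCII characters has a legitimate reason to be Base64-encoded. A minimal Python sketch of the check as described; the function names are hypothetical, and SpamAssassin's own implementation may differ in detail.

```python
import base64
from email import message_from_bytes


def base64_ascii_text_parts(raw):
    """Return content types of base64-encoded text parts whose decoded bytes are pure US-ASCII."""
    flagged = []
    for part in message_from_bytes(raw).walk():
        if part.get_content_maintype() != "text":
            continue
        if part.get("Content-Transfer-Encoding", "").strip().lower() != "base64":
            continue
        decoded = part.get_payload(decode=True) or b""
        if decoded.isascii():  # every byte < 0x80: no need for Base64
            flagged.append(part.get_content_type())
    return flagged


def make_message(charset, text):
    """Build a single-part text/html message with a base64 body (for testing)."""
    body = base64.b64encode(text.encode(charset)).decode("ascii")
    return (
        f'Content-Type: text/html; charset="{charset}"\n'
        "Content-Transfer-Encoding: base64\n\n"
        f"{body}\n"
    ).encode("ascii")
```

In this sketch a us-ascii "<p>Hello</p>" part is flagged, a utf-8 "<p>Grüße</p>" part is not, and a part labelled utf-8 whose content is plain ASCII is flagged too.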



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Fw: spam from gmail.com

2021-11-12 Thread Greg Troxel

Arne Jensen  writes:

> On 11-11-2021 at 20:21, Greg Troxel wrote:
>> It's a really interesting question what DNSWL_MED ought to be for score.
>> Given what MED is supposed to be:
>>
>>   Medium: Rare spam occurrences, corrected promptly.
>>
>> -2.3 points seems entirely reasonable.
>>
>> But I don't see how gmail makes sense being medium, as spam from gmail
>> is not rare.  Probably it happens to me every day.  NONE seems more
>> appropriate, especially since I have no perception of google making a
>> serious attempt to avoid emanating spam.  (I realize this comment
>> belongs on the DNSWL list, but for now I'm not bothered personally
>> because the v6 addrs aren't listed.)
>
> Google (Gmail) is not, and has never been, on medium.
>
> The last score change on Google's addresses was in June 2018, demoting
> the last remaining ones from "low" to "none".
>
> Are you by any chance forwarding traffic from one server to another,
> and/or potentially missing something in your trusted_networks and/or
> internal_networks? This one is *very* common.

Sorry for being fuzzy. What I meant, and didn't say clearly, is:

  I get a lot of spam from gmail (that is properly DKIM signed and
  passes SPF).  I'm not seeing any of it get tagged as coming from
  DNSWL_MED.

  Having seen other people claim that google servers are on MED, I was
  opining that this didn't make sense.  (It seems that everybody agrees
  that it doesn't make sense and also that it has never been true.)

> The DNSWL lookup is done on the first server, walking back from your
> own, that your server does not trust. So if the inbound message was
> sent from Gmail, relayed through your friend's server (which is/was at
> medium), and finally reached yours, and you have not set your friend's
> server as one of your trusted ones, the DNSWL check will be done on
> your friend's server, and the message ends up flagged as medium.
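That lookup rule can be modeled as a walk over the relay chain. The sketch below is a simplified model, not SpamAssassin's Received-header parser (which also distinguishes internal_networks from trusted_networks); the function name and the documentation-range addresses are illustrative.

```python
import ipaddress


def first_untrusted_relay(relay_ips, trusted_networks):
    """Walk relays from our own server outward; return the first one not trusted.

    relay_ips: sending IPs taken from Received headers, nearest hop first.
    trusted_networks: CIDR strings, standing in for a trusted_networks setting.
    """
    nets = [ipaddress.ip_network(n) for n in trusted_networks]
    for ip in relay_ips:
        addr = ipaddress.ip_address(ip)
        if not any(addr in net for net in nets):
            return ip  # the address that would be looked up in DNSWL
    return None  # every hop was trusted
```

With hops `["192.0.2.10", "198.51.100.7", "203.0.113.5"]` (own MX, friend's relay, stand-in for Gmail) and only `192.0.2.0/24` trusted, the friend's relay is the one looked up; adding `198.51.100.0/24` moves the lookup out to the Gmail hop, which is also the effect of trusting a mailing-list server.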

For me, the trickiness is in mailing lists, especially when they are set
up without restrict-to-list-member and without good filtering.   So I
have put their addresses into trusted_networks.   This isn't quite the
same as someone MX-catching for me, but I think it works out the same.

Greg




Re: Fw: spam from gmail.com

2021-11-12 Thread Greg Troxel

Arne Jensen  writes:

> On 12-11-2021 at 00:43, Loren Wilton wrote:
>> I have to admit I'd never paid much attention to the RCVD_IN_DNSWL_*
>> scores on spam before.
> [...]
>> Looking at spam for last month, [...]
>>
>> But I do have 12 pretty blatant spams that hit RCVD_IN_DNSWL_HI.
>> It makes me wonder just how useful a rule it is.
> A pretty blatant misconfiguration of a mail server (and/or the system
> running same), can unfortunately lead to various negative side
> effects.

Loren might want to check for spam received via mailing lists.   I have
seen spam sent to lists and then delivered to me, so that it arrives
from the MTA of the org running the list.   Adding that to
trusted_networks moves the check points earlier and avoids treating
the mail as good because it came from the list.

Of course, it would be better if the list were set up for both spam
filtering and rejecting non-member posts, and machines that host lists
that send spam probably aren't in DNSWL anyway.


Thanks for all the confirmations for what isn't listed.  I have always
had the view that DNSWL runs a tight ship (and fairly too), and I
continue to feel that -2.3 for MED is a reasonable score.




MIME_BASE64_TEXT only on us-ascii

2021-11-12 Thread Philipp Ewald

Hi folks,

it's seems to be that spamassins dont check non ASCII Base64 decodes Mails.

Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: base64

[BAYES_99=3.5, BAYES_999=5, HTML_FONT_LOW_CONTRAST=0.001,
HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.723,
RCVD_IN_BL_SPAMCOP_NET=1.347, RCVD_IN_RP_RNBL=1.31]


Mails with:
Content-Type: text/html;
charset="us-ascii"


would get "MIME_BASE64_TEXT"

[BAYES_99=3.5, BAYES_999=5, CK_HELO_GENERIC=0.001,
HELO_DYNAMIC_DHCP=0.206, HTML_IMAGE_ONLY_28=1.404, HTML_MESSAGE=0.001,
HTTP_EXCESSIVE_ESCAPES=1.572, KHOP_DYNAMIC=0.001,
MIME_BASE64_TEXT=1.741, MIME_HTML_ONLY=0.723,
RAZOR2_CF_RANGE_51_100=1.886, RAZOR2_CHECK=0.922,
RCVD_IN_RP_RNBL=1.31, T_REMOTE_IMAGE=0.01]


Is this a Bug?


Kind regards
Philipp

--
Philipp Ewald
Administrator

DigiOnline GmbH, Probsteigasse 15 - 19, 50670 Köln
Fax: +49 221 6500-690, E-Mail: philipp.ew...@digionline.de

Local court (AG) Köln, HRB 27711, tax no. 5215 5811 0640
Managing Director: Werner Grafenhain

Privacy information: www.digionline.de/ds