Re: Help matching a spam (regex)

2019-06-04 Thread John Hardin

On Tue, 4 Jun 2019, Marcio Vogel Merlone dos Santos wrote:


> Hi all,
>
> Trying to match a message using uri_detail with no luck. On body I have
> something like this:
>
> Something &rarr;
>
> That "something" is changed on a daily basis, so I am trying to match the
> &rarr; which is common to all variations, and failing miserably. I have tried
> the obvious and some (desperate) variations:
>
> uri_detail  A1_URI_FAKE_LINK    text =~ /&rarr;/i
> uri_detail  A1_URI_FAKE_LINK    text =~ /.rarr;/i
> uri_detail  A1_URI_FAKE_LINK    text =~ /.rarr./i
> uri_detail  A1_URI_FAKE_LINK    text =~ /rarr/i
>
> What have I missed? Thanks for any enlightenment, RTFM.


This may help to figure it out in debug mode:

   uri_detail  __ALL_URI_DTL_TXT    text =~ /.*/
   tflags      __ALL_URI_DTL_TXT    multiple

You *should* be able to see exactly what is there - the HTML token or a 
UTF-8 byte sequence.


--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
 2 days until the 75th anniversary of D-Day

Re: Help matching a spam (regex)

2019-06-04 Thread Amir Caspi
On Jun 4, 2019, at 4:05 PM, RW  wrote:
> 
> On Tue, 4 Jun 2019 16:06:10 -0300 Marcio Vogel Merlone dos Santos wrote:
> 
>> Trying to match a message using uri_detail with no luck. On body I
>> have something like this:
>> 
>> Something &rarr;

> &rarr represents a '→' (right arrow) character. IIWY, I'd try its
> UTF-8 byte sequence:
> 
> \xe2\x86\x92 

Correct me if I'm wrong, but aren't the HTML entities converted to Unicode as
part of normalize_charset?  In that case, the uri_detail pattern would have to
match the decoded Unicode character, as RW suggests, and matching the literal
&rarr would require a rawbody rule, right?

--- Amir



Re: Help matching a spam (regex)

2019-06-04 Thread RW
On Tue, 4 Jun 2019 16:06:10 -0300
Marcio Vogel Merlone dos Santos wrote:

> Hi all,
> 
> Trying to match a message using uri_detail with no luck. On body I
> have something like this:
> 
> Something &rarr;
>
> That "something" is changed on a daily basis, so I am trying to match
> the &rarr; which is common to all variations, and failing miserably.
> I have tried the obvious and some (desperate) variations:
>
> uri_detail  A1_URI_FAKE_LINK    text =~ /&rarr;/i
> 
> uri_detail  A1_URI_FAKE_LINK    text =~ /.rarr;/i
> 
> uri_detail  A1_URI_FAKE_LINK    text =~ /.rarr./i
> 
> uri_detail  A1_URI_FAKE_LINK    text =~ /rarr/i
> 
> What have I missed? Thanks for any enlightenment, RTFM.
> 


&rarr represents a '→' (right arrow) character. IIWY, I'd try its
UTF-8 byte sequence:

\xe2\x86\x92 
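
For reference, the byte sequence above can be checked outside SpamAssassin.
The sketch below uses only Python's standard library to decode the entity and
show its UTF-8 bytes. It says nothing about what uri_detail actually puts in
the "text" key, which still has to be confirmed with a debug run; the sample
anchor text is made up.

```python
import html
import re

# Decode the HTML entity to the character it represents.
arrow = html.unescape("&rarr;")

# UTF-8 encoding of U+2192 RIGHTWARDS ARROW.
arrow_bytes = arrow.encode("utf-8")

# Hypothetical anchor text as it might look after HTML decoding.
anchor_text = "Something \u2192".encode("utf-8")

# Byte-level pattern equivalent to \xe2\x86\x92 in a rule regex.
pattern = re.compile(rb"\xe2\x86\x92")

print(hex(ord(arrow)))                    # 0x2192
print(arrow_bytes)                        # b'\xe2\x86\x92'
print(bool(pattern.search(anchor_text)))  # True
```

If the text seen by the rule is decoded rather than raw HTML, a pattern on the
literal string "rarr" would have nothing to match, which would be consistent
with the failures reported.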



Re: Meta for bogus MIME with DKIM valid?

2019-06-04 Thread Amir Caspi
On Jun 4, 2019, at 1:24 PM, Paul Stead  wrote:
> 
> Certainly worth letting QA do its thing and autoscore?

My worry about autoscore is that if it looks at network tests, particularly 
RBLs, then it may reduce the value of the rule.  The primary value of this rule 
is for early botnet runs before the relays and/or URIs are caught by the RBLs, 
and for content that doesn't hit any/many other rules (such as all of the 
spamples I posted).  After only a few minutes, the RBLs pick up these runs and 
the rule becomes relatively less important when considering the network 
tests... but it's a REALLY good spamminess indicator in isolation.  (The same 
argument applies with/without Bayes.)

So, if autoscore gives it a high value without network/bayes tests but a low 
value with network/bayes tests, then my strong recommendation would be to give 
it a single atomic score rather than network/non-network scoreset.

Locally, I've got the score at 4.0, and will be increasing it to 4.5 shortly.  
At least with my spamset (per the spamples I posted), a score of 4.5 seems to 
be the "magic" value that should catch almost all the FNs (at least the ones 
that hit BAYES_50 ... the ones that hit BAYES_00 might require more aggression).
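
The arithmetic behind that 4.5 figure can be sketched as follows. The inputs
are the numbers reported for the spamples in this thread (a typical total of
4.6, of which 4.0 comes from BOGUS_MIME_VERSION); the 5.0 threshold is
SpamAssassin's default required_score and is assumed to be unchanged on the
server in question.

```python
from decimal import Decimal

REQUIRED_SCORE = Decimal("5.0")      # default threshold, assumed unchanged
observed_total = Decimal("4.6")      # typical total reported for the spamples
current_rule_score = Decimal("4.0")  # local BOGUS_MIME_VERSION score

# Score contributed by everything other than BOGUS_MIME_VERSION.
other_rules = observed_total - current_rule_score


def total_with(rule_score: Decimal) -> Decimal:
    """Total message score if BOGUS_MIME_VERSION were worth rule_score."""
    return other_rules + rule_score


for candidate in (Decimal("4.0"), Decimal("4.5"), Decimal("4.9")):
    total = total_with(candidate)
    print(candidate, total, total >= REQUIRED_SCORE)
```

With these inputs, only the 4.0 case stays under the threshold; 4.5 and 4.9
both push the same message over it.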

Cheers.

--- Amir



Re: Meta for bogus MIME with DKIM valid?

2019-06-04 Thread Paul Stead
The rules look to be performing better in masscheck after the updates to
the corpus checking:

https://ruleqa.spamassassin.org/20190604-r1860591-n/__BOGUS_MIME_VER_01/detail
https://ruleqa.spamassassin.org/20190604-r1860591-n/__BOGUS_MIME_VER_02/detail

Certainly worth letting QA do its thing and autoscore?

On Tue, 4 Jun 2019 at 02:10, Amir Caspi  wrote:

> Hi Kevin,
>
> Here are some spamples -- I've specifically chosen the ones that did NOT
> score enough through other means to get tagged, i.e., these are false
> negatives.  Note that many of them have valid DKIM and hit no other
> markers.  (The spample will NOT pass DKIM because headers have been
> modified for anonymity.)  If you run them through NOW you'll probably find
> they hit Razor and Pyzor and various other things... but they clearly
> didn't at the time of receipt.  Most of them score 4.6 unless they manage
> to have enough Bayes "poison" to score lower.  (And I STILL don't know how
> they keep hitting only BAYES_50...)
>
> https://pastebin.com/BQH3JgWD
> https://pastebin.com/nXtZtUdm
> https://pastebin.com/tBQt1Raw
> https://pastebin.com/wEGvcs73
> https://pastebin.com/nuFJ48k0
> https://pastebin.com/ykCuEPNQ
> ** This last one I received from two different servers within a minute of
> each other.  The first one got nailed by SPFBL so it got marked as spam,
> but only because the combo of SPFBL (2.2) and local BOGUS_MIME_VERSION
> (4.0) pushed it over threshold.  This spample, the second of the two,
> didn't get nailed because the relay wasn't in SPFBL, so BOGUS_MIME_VERSION
> wasn't enough by itself at a score of 4.0, although it WOULD have been
> enough at a score of 4.5.
>
> I should also mention I've seen at least a few recent ones that hit
> MailScanner's "Eudora long-MIME-boundary attack" rule.  I'm not including
> those as spamples since they got sanitized by MailScanner so aren't useful,
> but I figured it was worth mentioning.
>
> My feeling is that BOGUS_MIME_VERSION is incredibly useful during the
> early hits of snowshoers, before the RBLs, URIBLs, and content hash DBs can
> catch up.  Since it would seem to be 100% spam and 0% ham, I think scoring
> it very highly (4+ points) would be both safe and useful -- it will help
> nix these early hits but won't hinder anything else.
>
> From my experience and these spamples, where most of them are scoring 4.6
> (with 4.0 of that from BOGUS_MIME_VERSION), an optimal score would be in
> the range of 4.5 to 4.9 ... that would push these 4.6s to 5.1 or higher.
>
> I've got MANY other examples in the Junk folders on my server, and I would
> be happy to send them to you privately if needed.
>
> Cheers.
>
> --- Amir
>
> On May 30, 2019, at 9:24 AM, Kevin A. McGrail wrote:
>
>> Fair enough.  Happy to look at spamples but I've seen virtually nothing in
>> the wild for this.


Help matching a spam (regex)

2019-06-04 Thread Marcio Vogel Merlone dos Santos

Hi all,

I am trying to match a message using uri_detail, with no luck. In the body I
have something like this:

Something &rarr;

That "something" changes on a daily basis, so I am trying to match
the &rarr; which is common to all variations, and failing miserably. I
have tried the obvious and some (desperate) variations:

uri_detail  A1_URI_FAKE_LINK    text =~ /&rarr;/i

uri_detail  A1_URI_FAKE_LINK    text =~ /.rarr;/i

uri_detail  A1_URI_FAKE_LINK    text =~ /.rarr./i

uri_detail  A1_URI_FAKE_LINK    text =~ /rarr/i

What have I missed? Thanks for any enlightenment, RTFM.


Best regards.


--
*Marcio Merlone*


Re: MISSING_SUBJECT rule on email with subject

2019-06-04 Thread RW
On Tue, 4 Jun 2019 18:10:51 +0300
Savvas Karagiannidis wrote:

> Hi,
> 
> my guess is that for some reason an empty line is inserted in the
> email somewhere above the headers and before the message is processed
> by spamassassin. This will cause all headers below this empty line to
> be treated as the actual body of the message, so all missing header
> tests will hit and will result in what you actually see. 

But as has already been pointed out, it has the combination of
MISSING_FROM and HK_RANDOM_FROM, and the latter is based on a From:addr
test, so at least that rule did see a From header.


Re: MISSING_SUBJECT rule on email with subject

2019-06-04 Thread Savvas Karagiannidis

Hi,

My guess is that, for some reason, an empty line is being inserted in the
email somewhere above the headers, before the message is processed by
SpamAssassin. This causes all headers below the empty line to be treated
as the body of the message, so all the missing-header tests hit, which is
what you are seeing. This could be a bug in the software you use for email
content filtering...
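
The effect described above is easy to reproduce with any RFC 5322-style
parser. Here is a minimal sketch using Python's standard email package, not
SpamAssassin's own parser; the addresses and subject are made up.

```python
from email import message_from_string

# Made-up message; address and subject are placeholders.
good = (
    "From: sender@example.com\n"
    "Subject: Hello\n"
    "\n"
    "Body text\n"
)

# Same message with a stray empty line ahead of the headers.
broken = "\n" + good

ok_msg = message_from_string(good)
bad_msg = message_from_string(broken)

print(ok_msg["Subject"])                          # Hello
print(bad_msg["Subject"])                         # None
print("Subject: Hello" in bad_msg.get_payload())  # True
```

In the broken case the parser finds no headers at all, and the original
header lines end up inside the body.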


Regards,

Savvas Karagiannidis


On 04/06/2019 17:29, Stephan Fourie wrote:

> Hi,
>
> My apologies, seems something went wrong with the formatting when it
> was pasted to the pastebin. Here's a new example with spacing intact:
> https://pastebin.com/raw/tQtSMQPs
>
> In this example some of the other headers were also not 'seen'.
>
> Thanks!
> Stephan



Re: MISSING_SUBJECT rule on email with subject

2019-06-04 Thread Matus UHLAR - fantomas

On 04.06.19 16:29, Stephan Fourie wrote:
> My apologies, seems something went wrong with the formatting when it
> was pasted to the pastebin. Here's a new example with spacing intact:
> https://pastebin.com/raw/tQtSMQPs
>
> In this example some of the other headers were also not 'seen'.


There's something strange:

 1.0 HK_RANDOM_FROM    From username looks random
 0.5 FREEMAIL_FROM     Sender email is commonly abused enduser mail
                       provider (x[at]gmail.com)

 1.0 MISSING_FROM      Missing From: header
 1.8 MISSING_SUBJECT   Missing Subject: header

so the spam scanner both did and did not see the From: header.

What do you use for mail scanning? 




--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Microsoft dick is soft to do no harm


Re: MISSING_SUBJECT rule on email with subject

2019-06-04 Thread Stephan Fourie

Hi,

My apologies, seems something went wrong with the formatting when it was 
pasted to the pastebin. Here's a new example with spacing intact: 
https://pastebin.com/raw/tQtSMQPs


In this example some of the other headers were also not 'seen'.

Thanks!
Stephan

On 2019/06/04 10:55, Matus UHLAR - fantomas wrote:
> On 3 Jun 2019, at 2:20, Stephan Fourie wrote:
>> We're currently seeing the rule MISSING_SUBJECT sporadically
>> hitting on emails that have a subject. This issue seems to have
>> started during last week, which is when clients started complaining
>> about false positive detections. Please see example headers at the
>> following link:
>>
>> https://pastebin.com/raw/GtnV67Hj
>
> On Mon, 03 Jun 2019 11:43:44 -0400 Bill Cole wrote:
>> The headers are all missing the traditional space between the colon
>> and the header content.
>
> On 03.06.19 19:11, RW wrote:
>> And this includes Google headers, so presumably the spaces have been
>> stripped locally.
>
> now one question is,
> if the spaces have been stripped prior to spam checking,
> another is,
> if SA does/should expect whitespace after header fields.
>
> if the first answer is true, then SA can't do much about misformatted
> e-mail.
>
> But since FROM_AND_TO_IS_SAME_DOMAIN was hit, I don't think the spaces
> were stripped, so
>
> - we need to see the original message as it was scanned. Anything else,
>   reformatted by anyone (e.g. Outlook or Exchange tend to reformat mail),
>   can't help us much in finding the issue.





Re: A new url shortener not in __URL_SHORTENER?

2019-06-04 Thread Kevin A. McGrail
+1. If it is a shortener, we should add it.

On Tue, Jun 4, 2019, 03:57 hg user  wrote:

> Hi,
> I noticed the ccuz URL shortener being used in an Italian spam
> advertising a sex site.
> I was wondering if it would be good to add it to __URL_SHORTENER or not.
> In this specific case it won't help to score higher, but who knows in the
> future?


Re: MISSING_SUBJECT rule on email with subject

2019-06-04 Thread Matus UHLAR - fantomas

On 3 Jun 2019, at 2:20, Stephan Fourie wrote:
> We're currently seeing the rule MISSING_SUBJECT sporadically
> hitting on emails that have a subject. This issue seems to have
> started during last week, which is when clients started complaining
> about false positive detections. Please see example headers at the
> following link:
>
> https://pastebin.com/raw/GtnV67Hj



On Mon, 03 Jun 2019 11:43:44 -0400 Bill Cole wrote:
> The headers are all missing the traditional space between the colon
> and the header content.

On 03.06.19 19:11, RW wrote:
> And this includes Google headers, so presumably the spaces have been
> stripped locally.

Now, one question is whether the spaces were stripped prior to spam
checking; another is whether SA does/should expect whitespace after
header field names.

If the answer to the first is yes, then SA can't do much about misformatted
e-mail.

But since FROM_AND_TO_IS_SAME_DOMAIN was hit, I don't think the spaces were
stripped, so we need to see the original message as it was scanned. Anything
else, reformatted by anyone (e.g. Outlook or Exchange tend to reformat mail),
can't help us much in finding the issue.
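
On the whitespace question: RFC 5322 itself does not require a space after
the colon, so a tolerant parser should accept both forms. The sketch below
only shows how Python's standard email package behaves; it is an illustration
with made-up headers, not a statement about SpamAssassin's own header
parsing.

```python
from email import message_from_string

# Made-up headers; the second message has no space after any colon.
with_space = "From: sender@example.com\nSubject: Hello\n\nBody\n"
without_space = "From:sender@example.com\nSubject:Hello\n\nBody\n"

a = message_from_string(with_space)
b = message_from_string(without_space)

# Both forms yield the same header values in this parser.
print(a["Subject"] == b["Subject"])  # True
print(b["From"])                     # sender@example.com
```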

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Linux IS user friendly, it's just selective who its friends are...


A new url shortener not in __URL_SHORTENER?

2019-06-04 Thread hg user
Hi,
I noticed the ccuz URL shortener being used in an Italian spam advertising
a sex site.
I was wondering if it would be good to add it to __URL_SHORTENER or not.
In this specific case it won't help to score higher, but who knows in the
future?