Re: Rules for a recent flood of BTC/webcam spam

2021-02-25 Thread John Hardin

On Fri, 26 Feb 2021, RW wrote:


It's also possible to tighten the range down to {32,33} or even
{33} without losing many matches:


$ for n in `jot 12 25` ; do printf "$n" ; < bitcoinlist egrep "^[13].{${n}}$" | wc -l ; done
25   0
26   0
27   0
28   0
29   3
30   1
31   4
32   1659
33   50290
34   8


Interesting analysis, thanks. I'll tighten it up a bit based on that.

--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  USMC Rules of Gunfighting #20: The faster you finish the fight,
  the less shot you will get.
---
 271 days since the first private commercial manned orbital mission (SpaceX)


Re: Rules for a recent flood of BTC/webcam spam

2021-02-25 Thread RW
On Thu, 25 Feb 2021 12:13:59 -0500
Alan wrote:


> Bitcoin addresses start with either 1 or 3. 

Most do, but around 13% of those reported to the bitcoin abuse database
are in the format starting with "bc".

> It's less general specifically to avoid FPs. Personally I'm weighting
> this pretty high so I don't want to trigger on non-obfuscated BTC
> addresses. 

Now I come to think of it, I think we've been here before, and allowing
arbitrary spaces led to a reported FP on ordinary text.

If you still meta with A4A_PORNSCAM_WORD you can afford to take some
risks with the address match though.

Before __BITCOIN_ID was in the core rules I had my own version for the
^[13] format that checked for mixed case and an additional digit. If
those conditions are not met it's most likely an FP. 
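
The mixed-case and extra-digit check described above can be sketched in Python. This is only an illustration, not RW's actual rule: the function name and the exact regex are assumptions, and "an additional digit" is read here as "at least one digit after the leading 1 or 3".

```python
import re

# Assumed shape of a legacy address: 1 or 3, then 25-34 base58 characters
# (base58 omits 0, O, I and l).
LEGACY = re.compile(r"^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$")

def looks_like_legacy_address(candidate: str) -> bool:
    """Return True if candidate has the ^[13] format and passes the extra checks."""
    if not LEGACY.match(candidate):
        return False
    tail = candidate[1:]  # everything after the leading 1 or 3
    has_lower = any(c.islower() for c in tail)
    has_upper = any(c.isupper() for c in tail)
    has_digit = any(c.isdigit() for c in tail)
    # Mixed case plus a digit; anything else is most likely a false positive.
    return has_lower and has_upper and has_digit
```

A string of upper-case hex digits can pass the bare format test but fails the mixed-case check, which is the kind of FP this filter is meant to drop.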

It's also possible to tighten the range down to {32,33} or even
{33} without losing many matches:


$ for n in `jot 12 25` ; do printf "$n" ; < bitcoinlist egrep "^[13].{${n}}$" | wc -l ; done
25   0
26   0
27   0
28   0
29   3
30   1
31   4
32   1659
33   50290
34   8
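
For anyone without jot(1), the same length count can be sketched in Python. This is a rough equivalent of the shell loop above, not a replacement for it; the function name is made up, and the input is assumed to be one address per line, as in the bitcoinlist file.

```python
from collections import Counter

def length_histogram(addresses):
    """Count entries starting with 1 or 3 by the number of characters
    that follow the first one (the n in ^[13].{n}$)."""
    counts = Counter()
    for line in addresses:
        addr = line.strip()
        if addr and addr[0] in "13":
            counts[len(addr) - 1] += 1
    return counts

# Tiny made-up sample; in practice pass an open file handle instead.
sample = ["1abc\n", "3abcd\n", "bc1xyz\n", "1xyz\n", "\n"]
for n, count in sorted(length_histogram(sample).items()):
    print(n, count)
```

Entries in the "bc" format are skipped, just as the ^[13] anchor skips them in the egrep version.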



Re: Mal formed urls

2021-02-25 Thread Bill Cole

On 25 Feb 2021, at 17:14, Rick Cooper wrote:

As far as I can tell the authority/path-abempty portion of a uri is optional
and must begin with // but can be empty


No, https://tools.ietf.org/html/rfc7230#section-2.7.1 shows the 
definition in ABNF, a strictly-defined syntax for strictly defining 
other syntaxes.  The "//" part denotes a mandatory literal string, in 
the same way that the "http:" part is a mandatory literal string. The 
'authority' and 'path-abempty' parts are distinct mandatory named 
components which are defined in RFC3986, the text of which states that 
an authority is *preceded by* '//' (as it is in the spec of the http: 
URI) while the ABNF definition of authority (which is usually just a 
'host' component) does not include '//' at all, i.e. an authority 
component itself does not include the preceding '//'.
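
The distinction can be seen with a generic URI splitter. The sketch below uses Python's urllib.parse.urlsplit, which follows the generic RFC 3986 layout rather than the http-specific ABNF; the helper name has_authority is made up for the illustration.

```python
from urllib.parse import urlsplit

def has_authority(text: str) -> bool:
    """True if a generic URI split finds an authority, i.e. the text after
    the scheme and colon starts with the literal '//'."""
    return bool(urlsplit(text).netloc)

for text in ("https://www.google.com/mail",
             "https:www.google.com/mail",
             "https:\\/www.google.com/mail"):
    parts = urlsplit(text)
    print(repr(text), "authority:", repr(parts.netloc), "path:", repr(parts.path))
```

Without the '//' the host name ends up in the path component and there is no authority at all, so such a string cannot satisfy the http-URI definition, whatever a browser chooses to do with it.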


Yeah, I know: pedantry. RFCs are intrinsically pedantic.

Incidentally, earlier this week there was a blog post by a security firm 
decrying such obfuscation of URIs in phishing email as if it were a 
cutting edge new tactic for bypassing filters. It is neither new nor 
does it fool any decent filters.



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


RE: Mal formed urls

2021-02-25 Thread Rick Cooper


Bill Cole wrote:
> On 25 Feb 2021, at 13:37, Rick Cooper wrote:
> 
>> I was just working on some rules to catch the current crop of mal
>> formed urls used to escape detection by solutions that extract urls
>> from emails and compare them to known bad urls and I am wondering if
>> spamassassin's patterns for extraction take this into account?
>> 
>> For instance:
>> 
>> https:www.google.com/mail
>> https:\/www.google.com/mail
>> https:\\www.google.com/mail
>> 
>> Will all work at getting you to gmail because the technical spec
>> doesn't actually require \\ after the colon.
> 
> Of course not: A http: URI must NOT contain '\\' after the colon, it
> MUST contain '//' after the colon. See

Sorry, the \\ is a typo, since that would be the beginning of a UNC path for
a Windows box.

As far as I can tell the authority/path-abempty portion of a uri is optional
and must begin with // but can be empty.
Hence https:www.google.com or https:\/www.google.com/. I have noticed every
browser I tested it with normalizes it back to the conventional //. But my
question was, given this is apparently an issue with some solutions' parsing
of uris, does SA extract them? As both you and John pointed out, it does, so
I am happy.


> https://tools.ietf.org/html/rfc7230#section-2.7.1 which is the
> technical spec for the formal syntax of a http URI. OTOH, there are
> URI schemes which do not include '//' (e.g. mailto:) so any tool that
> is doing broad URI detection can't be too picky.
> 
> What flavors of garbage almost-URIs will work in a browser very much
> depends on the whims of browser developers, and whether those are
> 'clickable' in your preferred MUA is dependent on the gullibility of
> your MUA author.
> 
> SpamAssassin traditionally has assumed that there will always be some
> MUA and browser authors who lack any sense of caution or prudence, so
> SA is VERY loose with what it will consider as maybe being a hostname
> in something that could be a URI in some obscure or novel scheme.
> 
>> Will spamassassin still extract and normalize the urls above?
> 
> Yes, it will see all 3 as the same canonicalized URI.
> 
>> I was hoping
>> to avoid digging through the source to find out.
> 
> No need to dig through the source, you can see what URIs SpamAssassin
> detects (trimmed of the parts after the hostname) in a message by
> manually testing it with 'spamassassin -D uri'. Note that SA will only
> show one instance of otherwise identical URIs after trimming and
> canonicalization.



Re: Mal formed urls

2021-02-25 Thread Bill Cole

On 25 Feb 2021, at 13:37, Rick Cooper wrote:

I was just working on some rules to catch the current crop of mal formed
urls used to escape detection by solutions that extract urls from emails and
compare them to known bad urls and I am wondering if spamassassin's patterns
for extraction take this into account?

For instance:

https:www.google.com/mail
https:\/www.google.com/mail
https:\\www.google.com/mail

Will all work at getting you to gmail because the technical spec doesn't
actually require \\ after the colon.


Of course not: A http: URI must NOT contain '\\' after the colon, it 
MUST contain '//' after the colon. See 
https://tools.ietf.org/html/rfc7230#section-2.7.1 which is the technical 
spec for the formal syntax of a http URI. OTOH, there are URI schemes 
which do not include '//' (e.g. mailto:) so any tool that is doing broad 
URI detection can't be too picky.


What flavors of garbage almost-URIs will work in a browser very much 
depends on the whims of browser developers, and whether those are 
'clickable' in your preferred MUA is dependent on the gullibility of 
your MUA author.


SpamAssassin traditionally has assumed that there will always be some 
MUA and browser authors who lack any sense of caution or prudence, so SA 
is VERY loose with what it will consider as maybe being a hostname in 
something that could be a URI in some obscure or novel scheme.



Will spamassassin still extract and normalize the urls above?


Yes, it will see all 3 as the same canonicalized URI.


I was hoping
to avoid digging through the source to find out.


No need to dig through the source, you can see what URIs SpamAssassin 
detects (trimmed of the parts after the hostname) in a message by 
manually testing it with 'spamassassin -D uri'. Note that SA will only 
show one instance of otherwise identical URIs after trimming and 
canonicalization.
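
For filters that lack this kind of leniency, the canonicalization step can be approximated in a few lines of Python. This is a rough sketch and not SpamAssassin's implementation (which is Perl and far more thorough); the function name and the regex are assumptions made for the illustration, and it only handles http and https.

```python
import re

# After "http:" or "https:", accept zero, one or two slashes or backslashes
# in any mix, as long as a non-slash character follows.
_SLOPPY_PREFIX = re.compile(r"^(https?):[/\\]{0,2}(?=[^/\\])", re.IGNORECASE)

def canonicalize(text: str) -> str:
    """Rewrite a sloppy scheme prefix to the conventional 'scheme://' form."""
    return _SLOPPY_PREFIX.sub(r"\1://", text, count=1)

for sample in ("https:www.google.com/mail",
               "https:\\/www.google.com/mail",
               "https:\\\\www.google.com/mail"):
    print(canonicalize(sample))
```

All three of the examples from the original question come out as the same string, so a lookup against a list of known bad URLs sees a single canonical form.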


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Mal formed urls

2021-02-25 Thread John Hardin

On Thu, 25 Feb 2021, Rick Cooper wrote:


I was just working on some rules to catch the current crop of mal formed
urls used to escape detection by solutions that extract urls from emails and
compare them to known bad urls and I am wondering if spamassassin's patterns
for extraction take this into account?

For instance:

https:www.google.com/mail
https:\/www.google.com/mail
https:\\www.google.com/mail

Will all work at getting you to gmail because the technical spec doesn't
actually require \\ after the colon.
Will spamassassin still extract and normalize the urls above? I was hoping
to avoid digging through the source to find out.


Yes, all of those do get detected and normalized.

http:fnord01.com/blah
http:\/fnord02.com/blah
http:/\fnord03.com/blah
http:\\fnord04.com/blah

Feb 25 13:24:03.445 [13854] dbg: rules: ran uri rule __ALL_URI ==> got hit: "http://fnord03.com/blah"
Feb 25 13:24:03.446 [13854] dbg: rules: ran uri rule __ALL_URI ==> got hit: "http://fnord02.com/blah"
Feb 25 13:24:03.447 [13854] dbg: rules: ran uri rule __ALL_URI ==> got hit: "http://fnord01.com/blah"
Feb 25 13:24:03.447 [13854] dbg: rules: ran uri rule __ALL_URI ==> got hit: "http://fnord04.com/blah"


--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Are you a mildly tech-literate politico horrified by the level of
  ignorance demonstrated by lawmakers gearing up to regulate online
  technology they don't even begin to grasp? Cool. Now you have a
  tiny glimpse into a day in the life of a gun owner.   -- Sean Davis
---
 271 days since the first private commercial manned orbital mission (SpaceX)


Mal formed urls

2021-02-25 Thread Rick Cooper
I was just working on some rules to catch the current crop of mal formed
urls used to escape detection by solutions that extract urls from emails and
compare them to known bad urls and I am wondering if spamassassin's patterns
for extraction take this into account?

For instance:

https:www.google.com/mail
https:\/www.google.com/mail
https:\\www.google.com/mail

Will all work at getting you to gmail because the technical spec doesn't
actually require \\ after the colon.
Will spamassassin still extract and normalize the urls above? I was hoping
to avoid digging through the source to find out.

Rick 


Re: Rules for a recent flood of BTC/webcam spam

2021-02-25 Thread Alan



On 2021-02-25 10:54, John Hardin wrote:

On Thu, 25 Feb 2021, RW wrote:


On Wed, 24 Feb 2021 18:37:42 -0800 (PST)
John Hardin wrote:


On Wed, 24 Feb 2021, Alan wrote:


After a little more research, a better regex for an obfuscated BTC
address is

/[13][ \-]([a-km-zA-HJ-NP-Z0-9][ \-]){25,32}[a-km-zA-HJ-NP-Z0-9]/

It might be worth adding = and _ to the obfuscating delimiters.
YMMV.


I've updated __BITCOIN_ID with -, = and _ obfuscations, which I
haven't seen myself yet.

Thanks!



Possibly

 (?:[-_=\s][a-km-zA-HJ-NP-Z1-9]){25,34}|[a-km-zA-HJ-NP-Z1-9]{25,34})

should be

 (?:[-_=\s]*[a-km-zA-HJ-NP-Z1-9]){25,34}

It's shorter and more general.


I'd prefer:

 (?:[-_=\s]?[a-km-zA-HJ-NP-Z1-9]){25,34}

The reason I haven't is I have not seen a mixture yet - it's either 
all spaced or not at all.


I'll take a look at that tonight when I have some time.


The more loose you get with matching obfuscation the greater the 
chance of false positives. Consider, for example, the PGP key in my 
.sig (which has a zero, but I'd wager there are PGP key signatures 
that look like obfuscated bitcoin wallet addresses...)


Also, there's a limit to how complex the obfuscation can get before 
the recipient can't (or won't) follow the instructions.



Bitcoin addresses start with either 1 or 3. It's less general 
specifically to avoid FPs. Personally I'm weighting this pretty high so 
I don't want to trigger on non-obfuscated BTC addresses. So far, all of 
my targets send a plain text version so "just a space" has been working.


All that said, another potential obfuscation would be a period. I'm 
going to add that.


--
For SpamAssassin Users List



Re: Rules for a recent flood of BTC/webcam spam

2021-02-25 Thread John Hardin

On Thu, 25 Feb 2021, RW wrote:


On Wed, 24 Feb 2021 18:37:42 -0800 (PST)
John Hardin wrote:


On Wed, 24 Feb 2021, Alan wrote:


After a little more research, a better regex for an obfuscated BTC
address is

/[13][ \-]([a-km-zA-HJ-NP-Z0-9][ \-]){25,32}[a-km-zA-HJ-NP-Z0-9]/

It might be worth adding = and _ to the obfuscating delimiters.
YMMV.


I've updated __BITCOIN_ID with -, = and _ obfuscations, which I
haven't seen myself yet.

Thanks!



Possibly

 (?:[-_=\s][a-km-zA-HJ-NP-Z1-9]){25,34}|[a-km-zA-HJ-NP-Z1-9]{25,34})

should be

 (?:[-_=\s]*[a-km-zA-HJ-NP-Z1-9]){25,34}

It's shorter and more general.


I'd prefer:

 (?:[-_=\s]?[a-km-zA-HJ-NP-Z1-9]){25,34}

The reason I haven't is I have not seen a mixture yet - it's either all 
spaced or not at all.


I'll take a look at that tonight when I have some time.


The more loose you get with matching obfuscation the greater the chance of 
false positives. Consider, for example, the PGP key in my .sig (which has 
a zero, but I'd wager there are PGP key signatures that look like 
obfuscated bitcoin wallet addresses...)


Also, there's a limit to how complex the obfuscation can get before the 
recipient can't (or won't) follow the instructions.
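
The false-positive concern can be tried out directly. The sketch below assumes a pattern of the shape discussed in this thread (a leading 1 or 3 followed by the optional-separator group); it is not the actual __BITCOIN_ID rule, and the second fingerprint is made up by swapping the zero for a one.

```python
import re

# Assumed pattern: leading 1 or 3, then 25-34 base58 characters, each
# optionally preceded by a single obfuscating separator.
OBFUSCATED = re.compile(r"[13](?:[-_=\s]?[a-km-zA-HJ-NP-Z1-9]){25,34}")

def looks_obfuscated(text: str) -> bool:
    return OBFUSCATED.search(text) is not None

# Fingerprint from the .sig in this thread: the zero in F507 breaks every run.
SIG_FINGERPRINT = "2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79"
# Hypothetical fingerprint with no zero and single spaces throughout.
NO_ZERO_FINGERPRINT = "2D8C 34F4 6411 F517 136C AF76 D822 E6E6 B873 2E79"

print(looks_obfuscated(SIG_FINGERPRINT))
print(looks_obfuscated(NO_ZERO_FINGERPRINT))
```

A fingerprint like the second one is all upper case, so a mixed-case requirement on the match would be one way to keep it from hitting.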



--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Where are my space habitats? Where is my flying car?
  It's 2010 and all I got from the SF books of my youth
  is the lousy dystopian government.  -- perlhaqr
---
 271 days since the first private commercial manned orbital mission (SpaceX)


Re: Trouble with XM_RANDOM rule

2021-02-25 Thread John Hardin

On Thu, 25 Feb 2021, Jared Hall wrote:


On 2/24/2021 9:43 PM, John Hardin wrote:

The __XM_RANDOM header rule is intended to catch the specific condition of 
the email, the scored XM_RANDOM meta is intended to add points for when 
that condition indicates spam.


Ouch, I figured as much.  With a name like XM_RANDOM, it's gotta be good :)

I recall about 10 years ago getting floods with (pseudo)random (eg: 
qxvfdgeexcfffdf, etc) type mailers.  I was just wondering if this was 
artifactual.


It's current. Somebody decided to send a large spam campaign using forged 
sender addresses in my wife's domain, so I got a lot of NDA bounces with 
spam content I don't usually see. There were a lot of random gibberish 
mailers, as well as some that look plausible at a glance but suspicious 
upon further consideration.


I got a bunch of new rules off that so I'm not complaining too hard.

  I don't know if you Guys (pc: and Gals)  keep notes when each 
rule gets developed and what not.  But that's not really a question for 
this list, so No Big Deal.


For myself, not beyond the SVN history.

I've been scanning all outbound Email for 3-1/2 years now.  I scan at the 
SMTP level, with no discernible performance hit.  It certainly has saved my 
butt on a few occasions.  Now I *opine* this:  There is something to the  
ZERO-TRUST security model.


Hm, yeah.


--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Where are my space habitats? Where is my flying car?
  It's 2010 and all I got from the SF books of my youth
  is the lousy dystopian government.  -- perlhaqr
---
 271 days since the first private commercial manned orbital mission (SpaceX)

Re: Rules for a recent flood of BTC/webcam spam

2021-02-25 Thread RW
On Wed, 24 Feb 2021 18:37:42 -0800 (PST)
John Hardin wrote:

> On Wed, 24 Feb 2021, Alan wrote:
> 
> > After a little more research, a better regex for an obfuscated BTC
> > address is
> >
> > /[13][ \-]([a-km-zA-HJ-NP-Z0-9][ \-]){25,32}[a-km-zA-HJ-NP-Z0-9]/
> >
> > It might be worth adding = and _ to the obfuscating delimiters.
> > YMMV.  
> 
> I've updated __BITCOIN_ID with -, = and _ obfuscations, which I
> haven't seen myself yet.
> 
> Thanks!
> 

Possibly

  (?:[-_=\s][a-km-zA-HJ-NP-Z1-9]){25,34}|[a-km-zA-HJ-NP-Z1-9]{25,34})

should be 

  (?:[-_=\s]*[a-km-zA-HJ-NP-Z1-9]){25,34}

It's shorter and more general.
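
To make "more general" concrete, here is a small Python comparison of the two-branch form, the * form, and a ? variant that allows at most one separator per character. It is a sketch only: the two-branch excerpt is wrapped in an extra (?: to balance its parentheses, which is an assumption about the surrounding rule, and the sample strings are made up.

```python
import re

B58 = "[a-km-zA-HJ-NP-Z1-9]"
SEP = r"[-_=\s]"

# Two-branch form: every character separated, or none at all.
two_branch = re.compile(rf"(?:(?:{SEP}{B58}){{25,34}}|{B58}{{25,34}})")
# Any number of separators before each character.
star = re.compile(rf"(?:{SEP}*{B58}){{25,34}}")
# At most one separator before each character.
optional = re.compile(rf"(?:{SEP}?{B58}){{25,34}}")

body = "aB3" * 9                            # 27 base58 characters
samples = {
    "plain": body,
    "spaced": "".join(" " + c for c in body),
    "mixed": " ".join(["aB3"] * 9),         # a space before every third character
    "double": "  ".join(body),              # two spaces between characters
}

for name, text in samples.items():
    print(name,
          bool(two_branch.fullmatch(text)),
          bool(star.fullmatch(text)),
          bool(optional.fullmatch(text)))
```

The * form accepts everything the two-branch form does plus mixed and multi-separator spacing; the ? variant stops at single separators, which trades some reach for fewer chances of an FP.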