Re: Stealth HREF= (missed by SA)

2023-09-20 Thread Joe Wein via users

On Friday, September 15, 2023 15:34, Giovanni wrote:

> On 9/14/23 17:01, Pedro David Marco wrote:
>
>> The same happens with other HTML tags...
>
> do you have a spample to share (public or privately) ?

I am happy to confirm that revision 1912414 is working great and fixes the
problem.

Grazie mille!

Joe
SURBL

>  Thanks
>    Giovanni




Re: Stealth HREF= (missed by SA)

2023-09-17 Thread John Hardin

On Fri, 15 Sep 2023, Bill Cole wrote:

> On 2023-09-14 at 11:01:37 UTC-0400 (Thu, 14 Sep 2023 15:01:37 +0000 (UTC))
> Pedro David Marco via users
> is rumored to have said:
>
>>  The same happens with other HTML tags...
>>
>>  <img src=  can be replaced with  <img xyz/src=
>>
>>  virtually any char but >
>>
>>  so, with Giovanni permission, i  tighten the nut 1 more turn   (limiting
>>  to 100 chars to prevent Regex Self-DOS)
>>  rawbody BADHREF /<(a|img|video)[^>]{0,100}\/(src|href)\=/
>>
>>  Pete.
>
> I've tweaked this a bit and added it to my ruleQA sandbox:
>
> describe HTML_BADATTR Illegal char in HTML attribute name
> rawbody  HTML_BADATTR /<[a-z]{1,10}[^>]{1,80}\/(src|href)\=/


Probably should loosen that a tiny bit to allow for whitespace between the 
attr and the equals sign, and a whitespace after the tag name will keep 
the two variable-length REs from competing:


/<[a-z]{1,10}\s[^>]{1,80}\/(src|href)\s*\=/
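The effect of the two changes can be checked outside SpamAssassin. What follows is only a minimal Python sketch: it assumes Python's re module treats these particular constructs (bounded quantifiers, character classes, \s) the same way Perl does, and the sample tags are made up for illustration, not taken from real spam.

```python
import re

# The two expressions from this message, minus the Perl /.../ delimiters.
STRICT = re.compile(r"<[a-z]{1,10}[^>]{1,80}/(src|href)=")
LOOSE = re.compile(r"<[a-z]{1,10}\s[^>]{1,80}/(src|href)\s*=")

# Hypothetical sample tags, invented for this sketch.
SAMPLES = {
    "stealth": '<a h/href="https://some.phishing.site">',
    "stealth_spaced": '<a h/href ="https://some.phishing.site">',
    "clean": '<a href="https://example.com/">',
}


def hits(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None


# Map each sample name to (strict_hit, loose_hit).
results = {name: (hits(STRICT, tag), hits(LOOSE, tag))
           for name, tag in SAMPLES.items()}

for name, (strict, loose) in results.items():
    print(f"{name:15s} strict={strict} loose={loose}")
```

Only the loosened form catches the variant with a space before the equals sign, and neither form fires on the ordinary link. This says nothing about false positives on real mail, which is what the ruleQA run is for.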



--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Microsoft is not a standards body.
---
 Today: the 236th anniversary of the signing of the U.S. Constitution

Re: Stealth HREF= (missed by SA)

2023-09-15 Thread Bill Cole
On 2023-09-14 at 11:01:37 UTC-0400 (Thu, 14 Sep 2023 15:01:37 +0000 (UTC))
Pedro David Marco via users
is rumored to have said:

> The same happens with other HTML tags...
>
> so, with Giovanni permission, i  tighten the nut 1 more turn
> (limiting to 100 chars to prevent Regex Self-DOS)
>
> rawbody BADHREF /<(a|img|video)[^>]{0,100}\/(src|href)\=/
>
> Pete.

I've tweaked this a bit and added it to my ruleQA sandbox:

describe HTML_BADATTR Illegal char in HTML attribute name
rawbody  HTML_BADATTR /<[a-z]{1,10}[^>]{1,80}\/(src|href)\=/
score    HTML_BADATTR 1
tflags   HTML_BADATTR publish









--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Stealth HREF= (missed by SA)

2023-09-15 Thread giova...@paclan.it

On 9/14/23 17:01, Pedro David Marco wrote:

> The same happens with other HTML tags...

do you have a spample to share (public or privately) ?
 Thanks
   Giovanni

> so, with Giovanni permission, i  tighten the nut 1 more turn   (limiting to 100
> chars to prevent Regex Self-DOS)
>
> rawbody BADHREF /<(a|img|video)[^>]{0,100}\/(src|href)\=/
>
> Pete.











Re: Stealth HREF= (missed by SA)

2023-09-14 Thread Pedro David Marco via users
 The same happens with other HTML tags...

 <img src=  can be replaced with  <img xyz/src=

 virtually any char but >


so, with Giovanni permission, i  tighten the nut 1 more turn   (limiting to 100 
chars to prevent Regex Self-DOS)
rawbody BADHREF /<(a|img|video)[^>]{0,100}\/(src|href)\=/


Pete.
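
For anyone who wants to try the tightened expression before loading it into SpamAssassin, here is a rough Python sketch. It assumes Python's re module is close enough to Perl for this pattern; the sample tags and the helper name is_stealth are hypothetical, made up for the illustration.

```python
import re

# The tightened expression from this message, minus the Perl /.../ delimiters.
BADHREF = re.compile(r"<(a|img|video)[^>]{0,100}/(src|href)=")


def is_stealth(tag: str) -> bool:
    """Return True if the tag carries a slash-prefixed src= or href=."""
    return BADHREF.search(tag) is not None


# Hypothetical sample tags, invented for this sketch.
link = '<a h/href="https://some.phishing.site">'
image = '<img xyz/src="https://some.phishing.site/x.png">'
clean = '<img src="https://example.com/x.png">'
# More than 100 characters between the tag name and "/href=" is outside the bound.
padded = "<a " + "x" * 120 + '/href="https://some.phishing.site">'

for name, tag in [("link", link), ("image", image),
                  ("clean", clean), ("padded", padded)]:
    print(f"{name:7s} {is_stealth(tag)}")
```

The {0,100} bound keeps the scan short on long runs of text, at the price of missing a tag that is padded past 100 characters, as the last sample shows.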


On Thursday, September 14, 2023 at 04:37:15 PM GMT+2,  
wrote:  
 
 On 9/14/23 16:24, Bill Cole wrote:
> On 2023-09-14 at 04:37:03 UTC-0400 (Thu, 14 Sep 2023 17:37:03 +0900)
> Joe Wein via users 
> is rumored to have said:
> 
>> I filed a bug for this issue on Bugzilla (#8186) but so far no response from 
>> developers.
>> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8186
> 
> FWIW, I've thought about it a bit...
> 
>> We're seeing literally millions of phishing spams from Tencent VMs in 
>> Singapore targeting mostly Amazon Japan that are getting around SA checks 
>> because of this issue.
> 
> Wow. I didn't expect that this was that big of a tactic.
> 
>> I am wondering how many other users are seeing this problem which allows 
>> spammers to circumvent URI checks in links in spam (i.e. hide the payload 
>> sites).
> 
> I don't see it, but the systems I manage have no reason to expect anything 
> but criminal-grade spam from anything on a Tencent network in Singapore. 
> Everyone gets their own bespoke spamstream I guess.
> 
>> They do it by prefixing the href= attribute in an HTML <a href="..."> tag
>> with letters and a slash, for example:
>>
>> <a h/href="https://some.phishing.site">https://amazon.co.jp</a>
>>
>> Both Chrome and mail clients like Mozilla Thunderbird discard that "h/"
>> prefix (perhaps treating it as a separate unrecognizable attribute, like
>> "<a h href="...") and display a clickable link to the payload site while
>> SpamAssassin will not see the URI and therefore not run it through any of the
>> rules for URIs.
>>
>> This means even if the bad site is listed on domain RBLs (SURBL, Spamhaus or 
>> URIBL), the mail is not tagged for that.
>>
>> Joe Wein
>> SURBL
> 
> I'm thinking that the best approach may not be in trying to parse the bogus 
> tag to glean a domain that may or may not be known to be bad, but rather to 
> detect the general pattern, which is itself a direct indicator of bad intent.
> 
rawbody BADHREF /\s+.\/href\=/

should be a start to write a rule to catch those spam messages.
  Giovanni

  

Re: Stealth HREF= (missed by SA)

2023-09-14 Thread giovanni

On 9/14/23 16:24, Bill Cole wrote:

> On 2023-09-14 at 04:37:03 UTC-0400 (Thu, 14 Sep 2023 17:37:03 +0900)
> Joe Wein via users
> is rumored to have said:
>
>> I filed a bug for this issue on Bugzilla (#8186) but so far no response from
>> developers.
>> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8186
>
> FWIW, I've thought about it a bit...
>
>> We're seeing literally millions of phishing spams from Tencent VMs in Singapore
>> targeting mostly Amazon Japan that are getting around SA checks because of this
>> issue.
>
> Wow. I didn't expect that this was that big of a tactic.
>
>> I am wondering how many other users are seeing this problem which allows
>> spammers to circumvent URI checks in links in spam (i.e. hide the payload
>> sites).
>
> I don't see it, but the systems I manage have no reason to expect anything but
> criminal-grade spam from anything on a Tencent network in Singapore. Everyone
> gets their own bespoke spamstream I guess.
>
>> They do it by prefixing the href= attribute in an HTML <a href="..."> tag with
>> letters and a slash, for example:
>>
>> <a h/href="https://some.phishing.site">https://amazon.co.jp</a>
>>
>> Both Chrome and mail clients like Mozilla Thunderbird discard that "h/" prefix
>> (perhaps treating it as a separate unrecognizable attribute, like
>> "<a h href="...") and display a clickable link to the payload site while
>> SpamAssassin will not see the URI and therefore not run it through any of the
>> rules for URIs.
>>
>> This means even if the bad site is listed on domain RBLs (SURBL, Spamhaus or
>> URIBL), the mail is not tagged for that.
>>
>> Joe Wein
>> SURBL
>
> I'm thinking that the best approach may not be in trying to parse the bogus tag
> to glean a domain that may or may not be known to be bad, but rather to detect
> the general pattern, which is itself a direct indicator of bad intent.


rawbody BADHREF /\s+.\/href\=/

should be a start to write a rule to catch those spam messages.
 Giovanni
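
As a quick check of what this first expression does and does not cover, here is a small Python sketch. It assumes Python's re module matches Perl's behaviour for this simple pattern, and the sample tags are hypothetical ones written for the illustration.

```python
import re

# The starting-point expression from this message, minus the /.../ delimiters.
# It requires whitespace, exactly one arbitrary character, then "/href=".
FIRST_TRY = re.compile(r"\s+./href=")

# Hypothetical sample tags, invented for this sketch.
one_letter = '<a h/href="https://some.phishing.site">'
three_letters = '<a xyz/href="https://some.phishing.site">'
image = '<img h/src="https://some.phishing.site/x.png">'
clean = '<a href="https://example.com/">'

matched = {
    "one_letter": FIRST_TRY.search(one_letter) is not None,
    "three_letters": FIRST_TRY.search(three_letters) is not None,
    "image": FIRST_TRY.search(image) is not None,
    "clean": FIRST_TRY.search(clean) is not None,
}
print(matched)
```

It catches the single-letter "h/" prefix from the report but not a longer prefix or a src= attribute, so it really is a start rather than a finished rule.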





Re: Stealth HREF= (missed by SA)

2023-09-14 Thread Bill Cole

On 2023-09-14 at 04:37:03 UTC-0400 (Thu, 14 Sep 2023 17:37:03 +0900)
Joe Wein via users
is rumored to have said:

> I filed a bug for this issue on Bugzilla (#8186) but so far no
> response from developers.
>
> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8186

FWIW, I've thought about it a bit...

> We're seeing literally millions of phishing spams from Tencent VMs in
> Singapore targeting mostly Amazon Japan that are getting around SA
> checks because of this issue.

Wow. I didn't expect that this was that big of a tactic.

> I am wondering how many other users are seeing this problem which
> allows spammers to circumvent URI checks in links in spam (i.e. hide
> the payload sites).

I don't see it, but the systems I manage have no reason to expect
anything but criminal-grade spam from anything on a Tencent network in
Singapore. Everyone gets their own bespoke spamstream I guess.

> They do it by prefixing the href= attribute in an HTML <a href="...">
> tag with letters and a slash, for example:
>
> <a h/href="https://some.phishing.site">https://amazon.co.jp</a>
>
> Both Chrome and mail clients like Mozilla Thunderbird discard that
> "h/" prefix (perhaps treating it as a separate unrecognizable
> attribute, like "<a h href="...") and display a clickable link to the
> payload site while SpamAssassin will not see the URI and therefore not
> run it through any of the rules for URIs.
>
> This means even if the bad site is listed on domain RBLs (SURBL,
> Spamhaus or URIBL), the mail is not tagged for that.
>
> Joe Wein
> SURBL

I'm thinking that the best approach may not be in trying to parse the
bogus tag to glean a domain that may or may not be known to be bad, but
rather to detect the general pattern, which is itself a direct indicator
of bad intent.





--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Stealth HREF= (missed by SA)

2023-09-14 Thread Benny Pedersen

Joe Wein via users wrote on 2023-09-14 10:37:

> This means even if the bad site is listed on domain RBLs (SURBL,
> Spamhaus or URIBL), the mail is not tagged for that.


should sa maybe begin using HtmlTidi 
https://metacpan.org/dist/Perl-Tidy/view/lib/Perl/Tidy.pod


i have samples with src="" and href="", is this same error you see ?

cli tools https://www.html-tidy.org/ can this be added to sa, so sa 
always see valid html, or possible check if there is lots of invalid 
code or just non-sense html refs


thanks for the bugzilla, it hopefully can be fixed




Stealth HREF= (missed by SA)

2023-09-14 Thread Joe Wein via users
I filed a bug for this issue on Bugzilla (#8186) but so far no response from 
developers.

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8186

We're seeing literally millions of phishing spams from Tencent VMs in 
Singapore targeting mostly Amazon Japan that are getting around SA checks 
because of this issue.


I am wondering how many other users are seeing this problem which allows 
spammers to circumvent URI checks in links in spam (i.e. hide the payload 
sites).


They do it by prefixing the href= attribute in an HTML <a href="..."> tag
with letters and a slash, for example:


<a h/href="https://some.phishing.site">https://amazon.co.jp</a>

Both Chrome and mail clients like Mozilla Thunderbird discard that "h/"
prefix (perhaps treating it as a separate unrecognizable attribute, like
"<a h href="...") and display a clickable link to the payload site while
SpamAssassin will not see the URI and therefore not run it through any of the
rules for URIs.


This means even if the bad site is listed on domain RBLs (SURBL, Spamhaus or 
URIBL), the mail is not tagged for that.
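
To make the mechanism concrete, here is a small Python sketch of how an extractor that only accepts whitespace before an attribute name misses the link, while one that also treats "/" as a separator finds it. Both extractors are hypothetical and written for this illustration; neither is SpamAssassin's actual HTML parsing code, and the tag is a made-up instance of the slash-prefix trick described above.

```python
import re

# A made-up tag using the slash-prefix trick.
TAG = '<a h/href="https://some.phishing.site">https://amazon.co.jp</a>'

# Hypothetical naive extractor: href must be preceded by whitespace.
NAIVE = re.compile(r'<a\b[^>]*?\shref\s*=\s*"([^"]*)"', re.I)
# Hypothetical tolerant extractor: a "/" also ends the previous attribute name.
TOLERANT = re.compile(r'<a\b[^>]*?[\s/]href\s*=\s*"([^"]*)"', re.I)


def extract(pattern, html):
    """Return the first captured href value, or None if nothing matched."""
    m = pattern.search(html)
    return m.group(1) if m else None


naive_uri = extract(NAIVE, TAG)
tolerant_uri = extract(TOLERANT, TAG)
print("naive:   ", naive_uri)
print("tolerant:", tolerant_uri)
```

With no URI extracted, there is nothing to look up against the domain lists, which is the gap reported in the bug.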


Joe Wein
SURBL