Re: Rules for a recent flood of BTC/webcam spam
On Fri, 26 Feb 2021, RW wrote:

> It's also possible to tighten the range down to {32,33} or even {33}
> without losing many matches:
>
> $ for n in `jot 12 25` ; do printf "$n" ; < bitcoinlist egrep "^[13].{${n}}$" | wc -l ; done
> 25        0
> 26        0
> 27        0
> 28        0
> 29        3
> 30        1
> 31        4
> 32     1659
> 33    50290
> 34        8

Interesting analysis, thanks. I'll tighten it up a bit based on that.

-- 
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
 USMC Rules of Gunfighting #20: The faster you finish the fight,
 the less shot you will get.
---
 271 days since the first private commercial manned orbital mission (SpaceX)
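The tally quoted above depends on `jot`, which is not available everywhere. A rough portable equivalent in Python might look like the sketch below. It is not RW's tooling: the function name `tail_length_histogram` is made up for illustration, and the input is assumed to be one address per line, as `bitcoinlist` appears to be. As in the shell loop, n counts the characters after the leading 1 or 3, so the total address length is n + 1.

```python
def tail_length_histogram(lines, lo=25, hi=36):
    """Count lines matching ^[13].{n}$ for every n in lo..hi.

    The defaults mirror `jot 12 25`, which yields 25 through 36.
    """
    counts = {n: 0 for n in range(lo, hi + 1)}
    for line in lines:
        line = line.rstrip("\n")
        # Only the legacy formats starting with 1 or 3 are counted.
        if not line or line[0] not in "13":
            continue
        n = len(line) - 1  # characters after the leading 1 or 3
        if n in counts:
            counts[n] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample strings, not real addresses.
    sample = ["1" + "a" * 32, "3" + "b" * 33, "bc1" + "q" * 30]
    for n, count in tail_length_histogram(sample).items():
        print(n, count)
```

To run it over a real list, pass an open file object, for example `tail_length_histogram(open("bitcoinlist"))`.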
Re: Rules for a recent flood of BTC/webcam spam
On Thu, 25 Feb 2021 12:13:59 -0500 Alan wrote:

> Bitcoin addresses start with either 1 or 3.

Most do, but around 13% of those reported to the bitcoin abuse
database are in the format starting with "bc".

> It's less general specifically to avoid FPs. Personally I'm weighting
> this pretty high so I don't want to trigger on non-obfuscated BTC
> addresses.

Now I come to think of it, I think we've been here before, and
allowing arbitrary spaces led to a reported FP on ordinary text. If
you still meta with A4A_PORNSCAM_WORD you can afford to take some
risks with the address match though.

Before __BITCOIN_ID was in the core rules I had my own version for the
^[13] format that checked for mixed case and an additional digit. If
those conditions are not met it's most likely an FP.

It's also possible to tighten the range down to {32,33} or even {33}
without losing many matches:

$ for n in `jot 12 25` ; do printf "$n" ; < bitcoinlist egrep "^[13].{${n}}$" | wc -l ; done
25        0
26        0
27        0
28        0
29        3
30        1
31        4
32     1659
33    50290
34        8
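The mixed-case and extra-digit check described above can be sketched in Python as follows. This is only an illustration of the idea, not RW's actual rule: the name `plausible_legacy_address` and the 25 to 34 character range are assumptions borrowed from the regexes elsewhere in this thread.

```python
import re

# Shape of a legacy address: leading 1 or 3, then base58 characters
# (no 0, O, I or l). The {25,34} range is an assumption taken from the
# patterns discussed in this thread.
LEGACY_SHAPE = re.compile(r"[13][a-km-zA-HJ-NP-Z1-9]{25,34}")


def plausible_legacy_address(candidate):
    """Return True if candidate has the legacy shape and, after the
    leading character, contains lower case, upper case and a digit."""
    if LEGACY_SHAPE.fullmatch(candidate) is None:
        return False
    tail = candidate[1:]
    return (
        any(c.islower() for c in tail)
        and any(c.isupper() for c in tail)
        and any(c.isdigit() for c in tail)
    )


if __name__ == "__main__":
    # Made-up strings, not real addresses.
    print(plausible_legacy_address("1" + "aB3" * 11))  # mixed case plus digit
    print(plausible_legacy_address("1" + "a" * 33))    # single case, no digit
```

A run of ordinary text that happens to fit the base58 character class tends to fail at least one of the three conditions, which is presumably why the check cuts down on FPs.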
Re: Mal formed urls
On 25 Feb 2021, at 17:14, Rick Cooper wrote:

> As far as I can tell the authority/path-abempty portion of a uri is
> optional and must begin with // but can be empty

No, https://tools.ietf.org/html/rfc7230#section-2.7.1 shows the
definition in ABNF, a strictly-defined syntax for strictly defining
other syntaxes. The "//" part denotes a mandatory literal string, in
the same way that the "http:" part is a mandatory literal string. The
'authority' and 'path-abempty' parts are distinct mandatory named
components which are defined in RFC3986, the text of which states that
an authority is *preceded by* '//' (as it is in the spec of the http:
URI) while the ABNF definition of authority (which is usually just a
'host' component) does not include '//' at all, i.e. an authority
component itself does not include the preceding '//'.

Yeah, I know: pedantry. RFCs are intrinsically pedantic.

Incidentally, earlier this week there was a blog post by a security
firm decrying such obfuscation of URIs in phishing email as if it were
a cutting edge new tactic for bypassing filters. It is neither new nor
does it fool any decent filters.

-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
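As a small illustration of the point about the mandatory literal "//", the sketch below tests only whether an http or https URI has the literal "//" immediately after the scheme, followed by at least one character of authority. It is not a full RFC 3986 parser, and the name `has_literal_double_slash` is made up for this example.

```python
import re

# "http:" or "https:", then the literal "//", then at least one character
# that could begin an authority (not a slash, backslash, "?", "#" or space).
STRICT_HTTP_PREFIX = re.compile(r"https?://[^/\\?#\s]", re.IGNORECASE)


def has_literal_double_slash(uri):
    """True if uri starts the way the http-URI ABNF requires."""
    return STRICT_HTTP_PREFIX.match(uri) is not None


if __name__ == "__main__":
    for uri in (
        "https://www.google.com/mail",
        "https:www.google.com/mail",
        r"https:\/www.google.com/mail",
        r"https:\\www.google.com/mail",
    ):
        print(uri, has_literal_double_slash(uri))
```

Only the first of the four sample URIs passes. The other three are the kind of almost-URI that some browsers will still repair and follow.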
RE: Mal formed urls
Bill Cole wrote:
> On 25 Feb 2021, at 13:37, Rick Cooper wrote:
>
>> I was just working on some rules to catch the current crop of mal
>> formed urls used to escape detection by solutions that extract urls
>> from emails and compare them to known bad urls and I am wondering if
>> spamassassin's patterns for extraction take this into account?
>>
>> For instance:
>>
>> https:www.google.com/mail
>> https:\/www.google.com/mail
>> https:\\www.google.com/mail
>>
>> Will all work at getting you to gmail because the technical spec
>> doesn't actually require \\ after the colon.
>
> Of course not: A http: URI must NOT contain '\\' after the colon, it
> MUST contain '//' after the colon. See

Sorry, the \\ is a typo, since that would be the beginning of a UNC
path for a Windows box. As far as I can tell the authority/path-abempty
portion of a URI is optional and must begin with // but can be empty.
Hence https:www.google.com or https:\/www.google.com/. I have noticed
that every browser I tested it with normalizes it back to the
conventional //.

But my question was, given this is apparently an issue with some
solutions' parsing of URIs, does SA extract them? As both you and John
pointed out, it does, so I am happy.

> https://tools.ietf.org/html/rfc7230#section-2.7.1 which is the
> technical spec for the formal syntax of a http URI. OTOH, there are
> URI schemes which do not include '//' (e.g. mailto:) so any tool that
> is doing broad URI detection can't be too picky.
>
> What flavors of garbage almost-URIs will work in a browser very much
> depends on the whims of browser developers, and whether those are
> 'clickable' in your preferred MUA is dependent on the gullibility of
> your MUA author.
>
> SpamAssassin traditionally has assumed that there will always be some
> MUA and browser authors who lack any sense of caution or prudence, so
> SA is VERY loose with what it will consider as maybe being a hostname
> in something that could be a URI in some obscure or novel scheme.
>
>> Will spamassassin still extract and normalize the urls above?
>
> Yes, it will see all 3 as the same canonicalized URI.
>
>> I was hoping
>> to avoid digging through the source to find out.
>
> No need to dig through the source, you can see what URIs SpamAssassin
> detects (trimmed of the parts after the hostname) in a message by
> manually testing it with 'spamassassin -D uri'. Note that SA will only
> show one instance of otherwise identical URIs after trimming and
> canonicalization.
Re: Mal formed urls
On 25 Feb 2021, at 13:37, Rick Cooper wrote:

> I was just working on some rules to catch the current crop of mal
> formed urls used to escape detection by solutions that extract urls
> from emails and compare them to known bad urls and I am wondering if
> spamassassin's patterns for extraction take this into account?
>
> For instance:
>
> https:www.google.com/mail
> https:\/www.google.com/mail
> https:\\www.google.com/mail
>
> Will all work at getting you to gmail because the technical spec
> doesn't actually require \\ after the colon.

Of course not: A http: URI must NOT contain '\\' after the colon, it
MUST contain '//' after the colon. See
https://tools.ietf.org/html/rfc7230#section-2.7.1 which is the
technical spec for the formal syntax of a http URI. OTOH, there are
URI schemes which do not include '//' (e.g. mailto:) so any tool that
is doing broad URI detection can't be too picky.

What flavors of garbage almost-URIs will work in a browser very much
depends on the whims of browser developers, and whether those are
'clickable' in your preferred MUA is dependent on the gullibility of
your MUA author.

SpamAssassin traditionally has assumed that there will always be some
MUA and browser authors who lack any sense of caution or prudence, so
SA is VERY loose with what it will consider as maybe being a hostname
in something that could be a URI in some obscure or novel scheme.

> Will spamassassin still extract and normalize the urls above?

Yes, it will see all 3 as the same canonicalized URI.

> I was hoping to avoid digging through the source to find out.

No need to dig through the source, you can see what URIs SpamAssassin
detects (trimmed of the parts after the hostname) in a message by
manually testing it with 'spamassassin -D uri'. Note that SA will only
show one instance of otherwise identical URIs after trimming and
canonicalization.

-- 
Bill Cole
Re: Mal formed urls
On Thu, 25 Feb 2021, Rick Cooper wrote:

> I was just working on some rules to catch the current crop of mal
> formed urls used to escape detection by solutions that extract urls
> from emails and compare them to known bad urls and I am wondering if
> spamassassin's patterns for extraction take this into account?
>
> For instance:
>
> https:www.google.com/mail
> https:\/www.google.com/mail
> https:\\www.google.com/mail
>
> Will all work at getting you to gmail because the technical spec
> doesn't actually require \\ after the colon.
>
> Will spamassassin still extract and normalize the urls above? I was
> hoping to avoid digging through the source to find out.

Yes, all of those do get detected and normalized.

http:fnord01.com/blah
http:\/fnord02.com/blah
http:/\fnord03.com/blah
http:\\fnord04.com/blah

Feb 25 13:24:03.445 [13854] dbg: rules: ran uri rule __ALL_URI ==> got hit: "http://fnord03.com/blah"
Feb 25 13:24:03.446 [13854] dbg: rules: ran uri rule __ALL_URI ==> got hit: "http://fnord02.com/blah"
Feb 25 13:24:03.447 [13854] dbg: rules: ran uri rule __ALL_URI ==> got hit: "http://fnord01.com/blah"
Feb 25 13:24:03.447 [13854] dbg: rules: ran uri rule __ALL_URI ==> got hit: "http://fnord04.com/blah"

-- 
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
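For anyone doing similar extraction outside SpamAssassin, the normalization shown in the debug output above can be approximated with a few lines of Python. This is a sketch of the general idea only, not SpamAssassin's implementation: the function name `canonicalize_http_prefix` is hypothetical, and it handles nothing beyond the slash and backslash run after an http or https scheme.

```python
import re

# An http/https scheme followed by any run (possibly empty) of forward
# slashes and backslashes.
_SCHEME_AND_SLASHES = re.compile(r"^(https?):[/\\]*", re.IGNORECASE)


def canonicalize_http_prefix(uri):
    """Rewrite 'http:host', 'http:\\/host' and similar as 'http://host'.

    URIs in other schemes (e.g. mailto:) are returned unchanged.
    """
    return _SCHEME_AND_SLASHES.sub(
        lambda m: m.group(1).lower() + "://", uri, count=1
    )


if __name__ == "__main__":
    for raw in (
        "http:fnord01.com/blah",
        r"http:\/fnord02.com/blah",
        r"http:/\fnord03.com/blah",
        r"http:\\fnord04.com/blah",
    ):
        print(canonicalize_http_prefix(raw))
```

Because the pattern accepts any run of slashes, it also collapses oddities such as `http:///host`, which may or may not be desirable depending on what the canonical form is compared against.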
Mal formed urls
I was just working on some rules to catch the current crop of mal
formed urls used to escape detection by solutions that extract urls
from emails and compare them to known bad urls and I am wondering if
spamassassin's patterns for extraction take this into account?

For instance:

https:www.google.com/mail
https:\/www.google.com/mail
https:\\www.google.com/mail

Will all work at getting you to gmail because the technical spec
doesn't actually require \\ after the colon.

Will spamassassin still extract and normalize the urls above? I was
hoping to avoid digging through the source to find out.

Rick
Re: Rules for a recent flood of BTC/webcam spam
On 2021-02-25 10:54, John Hardin wrote:
> On Thu, 25 Feb 2021, RW wrote:
>
>> On Wed, 24 Feb 2021 18:37:42 -0800 (PST) John Hardin wrote:
>>
>>> On Wed, 24 Feb 2021, Alan wrote:
>>>
>>>> After a little more research, a better regex for an obfuscated BTC
>>>> address is
>>>>
>>>> /[13][ \-]([a-km-zA-HJ-NP-Z0-9][ \-]){25,32}[a-km-zA-HJ-NP-Z0-9]/
>>>>
>>>> It might be worth adding = and _ to the obfuscating delimiters.
>>>> YMMV.
>>>
>>> I've updated __BITCOIN_ID with -, = and _ obfuscations, which I
>>> haven't seen myself yet.
>>>
>>> Thanks!
>>
>> Possibly
>>
>> (?:[-_=\s][a-km-zA-HJ-NP-Z1-9]){25,34}|[a-km-zA-HJ-NP-Z1-9]{25,34})
>>
>> should be
>>
>> (?:[-_=\s]*[a-km-zA-HJ-NP-Z1-9]){25,34}
>>
>> It's shorter and more general.
>
> I'd prefer:
>
> (?:[-_=\s]?[a-km-zA-HJ-NP-Z1-9]){25,34}
>
> The reason I haven't is I have not seen a mixture yet - it's either
> all spaced or not at all. I'll take a look at that tonight when I
> have some time.
>
> The more loose you get with matching obfuscation the greater the
> chance of false positives. Consider, for example, the PGP key in my
> .sig (which has a zero, but I'd wager there are PGP key signatures
> that look like obfuscated bitcoin wallet addresses...)
>
> Also, there's a limit to how complex the obfuscation can get before
> the recipient can't (or won't) follow the instructions.

Bitcoin addresses start with either 1 or 3.

It's less general specifically to avoid FPs. Personally I'm weighting
this pretty high so I don't want to trigger on non-obfuscated BTC
addresses. So far, all of my targets send a plain text version so
"just a space" has been working.

All that said, another potential obfuscation would be a period. I'm
going to add that.

-- 
For SpamAsassin Users List
Re: Rules for a recent flood of BTC/webcam spam
On Thu, 25 Feb 2021, RW wrote:

> On Wed, 24 Feb 2021 18:37:42 -0800 (PST) John Hardin wrote:
>
>> On Wed, 24 Feb 2021, Alan wrote:
>>
>>> After a little more research, a better regex for an obfuscated BTC
>>> address is
>>>
>>> /[13][ \-]([a-km-zA-HJ-NP-Z0-9][ \-]){25,32}[a-km-zA-HJ-NP-Z0-9]/
>>>
>>> It might be worth adding = and _ to the obfuscating delimiters.
>>> YMMV.
>>
>> I've updated __BITCOIN_ID with -, = and _ obfuscations, which I
>> haven't seen myself yet.
>>
>> Thanks!
>
> Possibly
>
> (?:[-_=\s][a-km-zA-HJ-NP-Z1-9]){25,34}|[a-km-zA-HJ-NP-Z1-9]{25,34})
>
> should be
>
> (?:[-_=\s]*[a-km-zA-HJ-NP-Z1-9]){25,34}
>
> It's shorter and more general.

I'd prefer:

(?:[-_=\s]?[a-km-zA-HJ-NP-Z1-9]){25,34}

The reason I haven't is I have not seen a mixture yet - it's either
all spaced or not at all. I'll take a look at that tonight when I have
some time.

The more loose you get with matching obfuscation the greater the
chance of false positives. Consider, for example, the PGP key in my
.sig (which has a zero, but I'd wager there are PGP key signatures
that look like obfuscated bitcoin wallet addresses...)

Also, there's a limit to how complex the obfuscation can get before
the recipient can't (or won't) follow the instructions.

-- 
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
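The practical difference between the `*` and `?` forms discussed above can be seen with a short Python comparison. This is a sketch under stated assumptions: the leading `[13]` is added here so the fragments can be tested on their own, the names `STAR_FORM` and `OPTIONAL_FORM` are made up, and the sample string is synthetic rather than a real address. The real __BITCOIN_ID rule has more context around these fragments.

```python
import re

BASE58 = "[a-km-zA-HJ-NP-Z1-9]"
SEP = r"[-_=\s]"

# Any number of delimiters before each character (the "*" suggestion).
STAR_FORM = re.compile(rf"[13](?:{SEP}*{BASE58}){{25,34}}")
# At most one delimiter before each character (the "?" preference).
OPTIONAL_FORM = re.compile(rf"[13](?:{SEP}?{BASE58}){{25,34}}")

# Synthetic 34-character string in the legacy shape, not a real address.
plain = "1" + "aB3" * 11
single_spaced = " ".join(plain)
double_spaced = "  ".join(plain)

if __name__ == "__main__":
    for label, text in (
        ("plain", plain),
        ("single-spaced", single_spaced),
        ("double-spaced", double_spaced),
    ):
        star = STAR_FORM.fullmatch(text) is not None
        optional = OPTIONAL_FORM.fullmatch(text) is not None
        print(f"{label}: star={star} optional={optional}")
```

Both forms accept the plain and single-spaced strings. Only the `*` form accepts the double-spaced one, which is the extra looseness, and the extra FP exposure, being weighed above.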
Re: Trouble with XM_RANDOM rule
On Thu, 25 Feb 2021, Jared Hall wrote:

> On 2/24/2021 9:43 PM, John Hardin wrote:
>
>> The __XM_RANDOM header rule is intended to catch the specific
>> condition of the email, the scored XM_RANDOM meta is intended to add
>> points for when that condition indicates spam.
>
> Ouch, I figured as much. With a name like XM_RANDOM, it's gotta be
> good :) I recall about 10 years ago getting floods with
> (pseudo)random (eg: qxvfdgeexcfffdf, etc) type mailers. I was just
> wondering if this was artifactual.

It's current. Somebody decided to send a large spam campaign using
forged sender addresses in my wife's domain, so I got a lot of NDA
bounces with spam content I don't usually see. There were a lot of
random gibberish mailers, as well as some that look plausible at a
glance but suspicious upon further consideration.

I got a bunch of new rules off that so I'm not complaining too hard.

> I don't know if you Guys (pc: and Gals) keep notes when each rule
> gets developed and what not. But that's not really a question for
> this list, so No Big Deal.

For myself, not beyond the SVN history.

> I've been scanning all outbound Email for 3-1/2 years now. I scan at
> the SMTP level, with no discernible performance hit. It certainly has
> saved my butt on a few occasions. Now I *opine* this: There is
> something to the ZERO-TRUST security model.

Hm, yeah.

-- 
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
Re: Rules for a recent flood of BTC/webcam spam
On Wed, 24 Feb 2021 18:37:42 -0800 (PST) John Hardin wrote:

> On Wed, 24 Feb 2021, Alan wrote:
>
> > After a little more research, a better regex for an obfuscated BTC
> > address is
> >
> > /[13][ \-]([a-km-zA-HJ-NP-Z0-9][ \-]){25,32}[a-km-zA-HJ-NP-Z0-9]/
> >
> > It might be worth adding = and _ to the obfuscating delimiters.
> > YMMV.
>
> I've updated __BITCOIN_ID with -, = and _ obfuscations, which I
> haven't seen myself yet.
>
> Thanks!

Possibly

(?:[-_=\s][a-km-zA-HJ-NP-Z1-9]){25,34}|[a-km-zA-HJ-NP-Z1-9]{25,34})

should be

(?:[-_=\s]*[a-km-zA-HJ-NP-Z1-9]){25,34}

It's shorter and more general.