Re: Help matching a spam (regex)
On Tue, 4 Jun 2019, Marcio Vogel Merlone dos Santos wrote:

> Hi all,
>
> Trying to match a message using uri_detail with no luck. On body I
> have something like this:
>
> Something &rarr;
>
> That "something" is changed on a daily basis, so I am trying to match
> the &rarr; which is common to all variations, and failing miserably.
> I have tried the obvious and some (desperate) variations:
>
> uri_detail A1_URI_FAKE_LINK text =~ /&rarr;/i
> uri_detail A1_URI_FAKE_LINK text =~ /.rarr;/i
> uri_detail A1_URI_FAKE_LINK text =~ /.rarr./i
> uri_detail A1_URI_FAKE_LINK text =~ /rarr/i
>
> What have I missed? Thanks for any enlightenment, RTFM.

This may help to figure it out in debug mode:

  uri_detail __ALL_URI_DTL_TXT text =~ /.*/
  tflags __ALL_URI_DTL_TXT multiple

You *should* be able to see exactly what is there - the HTML token or a UTF-8 byte sequence.

-- 
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79

--- 2 days until the 75th anniversary of D-Day
Re: Help matching a spam (regex)
On Jun 4, 2019, at 4:05 PM, RW wrote:
>
> On Tue, 4 Jun 2019 16:06:10 -0300 Marcio Vogel Merlone dos Santos wrote:
>
>> Trying to match a message using uri_detail with no luck. On body I
>> have something like this:
>>
>> Something &rarr;
>
> &rarr represents a '→' (right arrow) character, IIWY I'd try its
> UTF-8 byte sequence:
>
> \xe2\x86\x92

Correct me if I'm wrong, but aren't the HTML entities converted to Unicode as part of localize_charset? In that case, uri_detail would have to be done in Unicode as RW suggests... matching &rarr would require a rawbody rule, right?

--- Amir
Re: Help matching a spam (regex)
On Tue, 4 Jun 2019 16:06:10 -0300 Marcio Vogel Merlone dos Santos wrote:

> Hi all,
>
> Trying to match a message using uri_detail with no luck. On body I
> have something like this:
>
> Something &rarr;
>
> That "something" is changed on a daily basis, so I am trying to match
> the &rarr; which is common to all variations, and failing miserably.
> I have tried the obvious and some (desperate) variations:
>
> uri_detail A1_URI_FAKE_LINK text =~ /&rarr;/i
>
> uri_detail A1_URI_FAKE_LINK text =~ /.rarr;/i
>
> uri_detail A1_URI_FAKE_LINK text =~ /.rarr./i
>
> uri_detail A1_URI_FAKE_LINK text =~ /rarr/i
>
> What have I missed? Thanks for any enlightenment, RTFM.

&rarr represents a '→' (right arrow) character, IIWY I'd try its UTF-8 byte sequence:

\xe2\x86\x92
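
The byte sequence suggested above can be checked outside SpamAssassin. The following is a minimal Python sketch, not SpamAssassin code. It assumes the spam body carries the literal entity `&rarr;` and that the text a rule sees has already been decoded and re-encoded as UTF-8; that is the premise of this reply, not something verified here.

```python
import html
import re

# Assumption: the raw HTML anchor text carries the named entity.
raw_anchor_text = "Something &rarr;"

# Decode the entity the way an HTML renderer would (&rarr; is U+2192).
decoded = html.unescape(raw_anchor_text)

# UTF-8 bytes of the decoded text and of the arrow on its own.
utf8_bytes = decoded.encode("utf-8")
arrow_bytes = "\u2192".encode("utf-8")

# Byte-level pattern equivalent to the suggested /\xe2\x86\x92/.
arrow_re = re.compile(rb"\xe2\x86\x92")

print(arrow_bytes)                            # b'\xe2\x86\x92'
print(bool(arrow_re.search(utf8_bytes)))      # True
print(bool(re.search(rb"rarr", utf8_bytes)))  # False: the entity name is gone after decoding
```

If the decoded text is what the rule sees, this would also explain why none of the `rarr` variations matched.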
Re: Meta for bogus MIME with DKIM valid?
On Jun 4, 2019, at 1:24 PM, Paul Stead wrote:
>
> Certainly worth letting QA do its thing and autoscore?

My worry about autoscore is that if it looks at network tests, particularly RBLs, then it may reduce the value of the rule. The primary value of this rule is for early botnet runs before the relays and/or URIs are caught by the RBLs, and for content that doesn't hit any/many other rules (such as all of the spamples I posted). After only a few minutes, the RBLs pick up these runs and the rule becomes relatively less important when considering the network tests... but it's a REALLY good spamminess indicator in isolation. (The same argument applies with/without Bayes.)

So, if autoscore gives it a high value without network/Bayes tests but a low value with network/Bayes tests, then my strong recommendation would be to give it a single atomic score rather than separate network/non-network scoresets.

Locally, I've got the score at 4.0, and will be increasing it to 4.5 shortly. At least with my spamset (per the spamples I posted), a score of 4.5 seems to be the "magic" value that should catch almost all the FNs (at least the ones that hit BAYES_50 ... the ones that hit BAYES_00 might require more aggression).

Cheers.

--- Amir
Re: Meta for bogus MIME with DKIM valid?
The rules look to be performing better in masscheck after the updates to the corpus checking:

https://ruleqa.spamassassin.org/20190604-r1860591-n/__BOGUS_MIME_VER_01/detail
https://ruleqa.spamassassin.org/20190604-r1860591-n/__BOGUS_MIME_VER_02/detail

Certainly worth letting QA do its thing and autoscore?

On Tue, 4 Jun 2019 at 02:10, Amir Caspi wrote:

> Hi Kevin,
>
> Here are some spamples -- I've specifically chosen the ones that did NOT
> score enough through other means to get tagged, i.e., these are false
> negatives. Note that many of them have valid DKIM and hit no other
> markers. (The spample will NOT pass DKIM because headers have been
> modified for anonymity.) If you run them through NOW you'll probably find
> they hit Razor and Pyzor and various other things... but they clearly
> didn't at the time of receipt. Most of them score 4.6 unless they manage
> to have enough Bayes "poison" to score lower. (And I STILL don't know how
> they keep hitting only BAYES_50...)
>
> https://pastebin.com/BQH3JgWD
> https://pastebin.com/nXtZtUdm
> https://pastebin.com/tBQt1Raw
> https://pastebin.com/wEGvcs73
> https://pastebin.com/nuFJ48k0
> https://pastebin.com/ykCuEPNQ
>
> ** This last one I received from two different servers within a minute of
> each other. The first one got nailed by SPFBL so it got marked as spam,
> but only because the combo of SPFBL (2.2) and local BOGUS_MIME_VERSION
> (4.0) pushed it over threshold. This spample, the second of the two,
> didn't get nailed because the relay wasn't in SPFBL, so BOGUS_MIME_VERSION
> wasn't enough by itself at a score of 4.0, although it WOULD have been
> enough at a score of 4.5.
>
> I should also mention I've seen at least a few recent ones that hit
> MailScanner's "Eudora long-MIME-boundary attack" rule. I'm not including
> those as spamples since they got sanitized by MailScanner so aren't useful,
> but I figured it was worth mentioning.
>
> My feeling is that BOGUS_MIME_VERSION is incredibly useful during the
> early hits of snowshoers, before the RBLs, URIBLs, and content hash DBs can
> catch up. Since it would seem to be 100% spam and 0% ham, I think scoring
> it very highly (4+ points) would be both safe and useful -- it will help
> nix these early hits but won't hinder anything else.
>
> From my experience and these spamples, where most of them are scoring 4.6
> (with 4.0 of that from BOGUS_MIME_VERSION), an optimal score would be in
> the range of 4.5 to 4.9 ... that would push these 4.6s to 5.1 or higher.
>
> I've got MANY other examples in the Junk folders on my server, and I would
> be happy to send them to you privately if needed.
>
> Cheers.
>
> --- Amir
>
> On May 30, 2019, at 9:24 AM, Kevin A. McGrail wrote:
>
>> Fair enough. Happy to look at spamples but I've seen virtually nothing in
>> the wild for this.
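
The score arithmetic in the quoted message can be written out explicitly. This is a small Python sketch using only the figures quoted above (4.6 total, 4.0 of it from BOGUS_MIME_VERSION). The 5.0 threshold is an assumption, taken to be the stock `required_score`, and `rescored_total` is a hypothetical helper name.

```python
from decimal import Decimal

# Assumption: the stock required_score threshold of 5.0 is in use.
REQUIRED_SCORE = Decimal("5.0")

# Figures quoted in the thread.
observed_total = Decimal("4.6")      # typical false-negative total
current_rule_score = Decimal("4.0")  # local BOGUS_MIME_VERSION score


def rescored_total(new_rule_score: Decimal) -> Decimal:
    """Total the same message would get if only this rule's score changed."""
    return observed_total - current_rule_score + new_rule_score


for candidate in ("4.0", "4.5", "4.9"):
    total = rescored_total(Decimal(candidate))
    verdict = "tagged" if total >= REQUIRED_SCORE else "missed"
    print(f"rule score {candidate}: total {total} -> {verdict}")
```

Under that assumed threshold, 4.5 is the smallest of the three candidates that lifts a 4.6 message to 5.1, which matches the range suggested in the message.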
Help matching a spam (regex)
Hi all,

Trying to match a message using uri_detail with no luck. On body I have something like this:

Something &rarr;

That "something" is changed on a daily basis, so I am trying to match the &rarr; which is common to all variations, and failing miserably. I have tried the obvious and some (desperate) variations:

uri_detail A1_URI_FAKE_LINK text =~ /&rarr;/i
uri_detail A1_URI_FAKE_LINK text =~ /.rarr;/i
uri_detail A1_URI_FAKE_LINK text =~ /.rarr./i
uri_detail A1_URI_FAKE_LINK text =~ /rarr/i

What have I missed? Thanks for any enlightenment, RTFM.

Best regards.

-- 
*Marcio Merlone*
Re: MISSING_SUBJECT rule on email with subject
On Tue, 4 Jun 2019 18:10:51 +0300 Savvas Karagiannidis wrote:

> Hi,
>
> my guess is that for some reason an empty line is inserted in the
> email somewhere above the headers and before the message is processed
> by spamassassin. This will cause all headers below this empty line to
> be treated as the actual body of the message, so all missing header
> tests will hit and will result in what you actually see.

But as has already been pointed out, it has the combination of MISSING_FROM and HK_RANDOM_FROM, and the latter is based on a From:addr test.
Re: MISSING_SUBJECT rule on email with subject
Hi,

my guess is that for some reason an empty line is inserted in the email somewhere above the headers and before the message is processed by spamassassin. This will cause all headers below this empty line to be treated as the actual body of the message, so all missing header tests will hit and will result in what you actually see. This could be a bug in the software you use for email content filtering...

Regards,
Savvas Karagiannidis

On 04/06/2019 17:29, Stephan Fourie wrote:

> Hi,
>
> My apologies, seems something went wrong with the formatting when it
> was pasted to the pastebin. Here's a new example with spacing intact:
>
> https://pastebin.com/raw/tQtSMQPs
>
> In this example some of the other headers were also not 'seen'.
>
> Thanks!
> Stephan
>
> On 2019/06/04 10:55, Matus UHLAR - fantomas wrote:
>> On 3 Jun 2019, at 2:20, Stephan Fourie wrote:
>>> We're currently seeing the rule MISSING_SUBJECT sporadically
>>> hitting on emails that have a subject. This issue seems to have
>>> started during last week, which is when clients started complaining
>>> about false positive detections. Please see example headers at the
>>> following link:
>>>
>>> https://pastebin.com/raw/GtnV67Hj
>>
>> On Mon, 03 Jun 2019 11:43:44 -0400 Bill Cole wrote:
>>> The headers are all missing the traditional space between the colon
>>> and the header content.
>>
>> On 03.06.19 19:11, RW wrote:
>>> And this includes Google headers, so presumably the spaces have been
>>> stripped locally.
>>
>> Now one question is whether the spaces were stripped prior to spam
>> checking; another is whether SA does/should expect whitespace after
>> header field names.
>>
>> If the first answer is yes, then SA can't do much about misformatted
>> e-mail. But since FROM_AND_TO_IS_SAME_DOMAIN was hit, I don't think
>> the spaces were stripped, so we need to see the original message as
>> it was scanned. Anything else, reformatted by anyone (e.g. Outlook or
>> Exchange tend to reformat mail), can't help us much in finding the
>> issue.
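
The effect described in this guess can be reproduced with a generic RFC 5322 style parser. The sketch below uses Python's standard `email` module as a stand-in. It is not SpamAssassin's own parser, and the sample message and addresses are made up for illustration.

```python
from email import message_from_string

# Hypothetical sample message: two headers, a blank separator line, a body.
headers_and_body = (
    "From: someone@example.com\n"
    "Subject: Hello\n"
    "\n"
    "Body text\n"
)

# Parsed as-is, the headers are found.
intact = message_from_string(headers_and_body)

# With a stray empty line in front, the header block ends immediately,
# so everything after it is treated as body.
broken = message_from_string("\n" + headers_and_body)

print(intact["Subject"])                         # Hello
print(broken["Subject"])                         # None
print("Subject: Hello" in broken.get_payload())  # True
```

A filter that checks for missing From: and Subject: headers would flag the second message even though both headers are visibly present in the text.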
Re: MISSING_SUBJECT rule on email with subject
On 04.06.19 16:29, Stephan Fourie wrote:
> My apologies, seems something went wrong with the formatting when it
> was pasted to the pastebin. Here's a new example with spacing intact:
>
> https://pastebin.com/raw/tQtSMQPs
>
> In this example some of the other headers were also not 'seen'.

There's something strange:

  1.0 HK_RANDOM_FROM   From username looks random
  0.5 FREEMAIL_FROM    Sender email is commonly abused enduser mail provider (x[at]gmail.com)
  1.0 MISSING_FROM     Missing From: header
  1.8 MISSING_SUBJECT  Missing Subject: header

So the spam scanner both did and did not see the From: header.

What do you use for mail scanning?

> On 2019/06/04 10:55, Matus UHLAR - fantomas wrote:
>> On 3 Jun 2019, at 2:20, Stephan Fourie wrote:
>>> We're currently seeing the rule MISSING_SUBJECT sporadically
>>> hitting on emails that have a subject. This issue seems to have
>>> started during last week, which is when clients started complaining
>>> about false positive detections. Please see example headers at the
>>> following link:
>>>
>>> https://pastebin.com/raw/GtnV67Hj
>>
>> On Mon, 03 Jun 2019 11:43:44 -0400 Bill Cole wrote:
>>> The headers are all missing the traditional space between the colon
>>> and the header content.
>>
>> On 03.06.19 19:11, RW wrote:
>>> And this includes Google headers, so presumably the spaces have been
>>> stripped locally.
>>
>> Now one question is whether the spaces were stripped prior to spam
>> checking; another is whether SA does/should expect whitespace after
>> header field names.
>>
>> If the first answer is yes, then SA can't do much about misformatted
>> e-mail. But since FROM_AND_TO_IS_SAME_DOMAIN was hit, I don't think
>> the spaces were stripped, so we need to see the original message as
>> it was scanned. Anything else, reformatted by anyone (e.g. Outlook or
>> Exchange tend to reformat mail), can't help us much in finding the
>> issue.

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Re: MISSING_SUBJECT rule on email with subject
Hi,

My apologies, seems something went wrong with the formatting when it was pasted to the pastebin. Here's a new example with spacing intact:

https://pastebin.com/raw/tQtSMQPs

In this example some of the other headers were also not 'seen'.

Thanks!
Stephan

On 2019/06/04 10:55, Matus UHLAR - fantomas wrote:
> On 3 Jun 2019, at 2:20, Stephan Fourie wrote:
>> We're currently seeing the rule MISSING_SUBJECT sporadically
>> hitting on emails that have a subject. This issue seems to have
>> started during last week, which is when clients started complaining
>> about false positive detections. Please see example headers at the
>> following link:
>>
>> https://pastebin.com/raw/GtnV67Hj
>
> On Mon, 03 Jun 2019 11:43:44 -0400 Bill Cole wrote:
>> The headers are all missing the traditional space between the colon
>> and the header content.
>
> On 03.06.19 19:11, RW wrote:
>> And this includes Google headers, so presumably the spaces have been
>> stripped locally.
>
> Now one question is whether the spaces were stripped prior to spam
> checking; another is whether SA does/should expect whitespace after
> header field names.
>
> If the first answer is yes, then SA can't do much about misformatted
> e-mail. But since FROM_AND_TO_IS_SAME_DOMAIN was hit, I don't think
> the spaces were stripped, so we need to see the original message as
> it was scanned. Anything else, reformatted by anyone (e.g. Outlook or
> Exchange tend to reformat mail), can't help us much in finding the
> issue.
Re: A new url shortener not in __URL_SHORTENER?
+1. If it is a shortener, we should add it.

On Tue, Jun 4, 2019, 03:57 hg user wrote:

> Hi,
> I noticed spam using ccuz url shortener in an Italian spam
> advertising a sex site.
> I was wondering if it would be good to be added to __URL_SHORTENER or not.
> In this specific case it won't help to score higher but who knows in the
> future?
Re: MISSING_SUBJECT rule on email with subject
On 3 Jun 2019, at 2:20, Stephan Fourie wrote:
> We're currently seeing the rule MISSING_SUBJECT sporadically
> hitting on emails that have a subject. This issue seems to have
> started during last week, which is when clients started complaining
> about false positive detections. Please see example headers at the
> following link:
>
> https://pastebin.com/raw/GtnV67Hj

On Mon, 03 Jun 2019 11:43:44 -0400 Bill Cole wrote:
> The headers are all missing the traditional space between the colon
> and the header content.

On 03.06.19 19:11, RW wrote:
> And this includes Google headers, so presumably the spaces have been
> stripped locally.

Now one question is whether the spaces were stripped prior to spam checking; another is whether SA does/should expect whitespace after header field names.

If the first answer is yes, then SA can't do much about misformatted e-mail. But since FROM_AND_TO_IS_SAME_DOMAIN was hit, I don't think the spaces were stripped, so we need to see the original message as it was scanned. Anything else, reformatted by anyone (e.g. Outlook or Exchange tend to reformat mail), can't help us much in finding the issue.

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Linux IS user friendly, it's just selective who its friends are...
A new url shortener not in __URL_SHORTENER?
Hi,

I noticed an Italian spam advertising a sex site that uses the ccuz URL shortener. I was wondering whether it should be added to __URL_SHORTENER. In this specific case it won't help to score higher, but who knows in the future?