Re: check utf-8 subjects/from?
On Wed, 13 Dec 2017, Alex wrote:

> We've been seeing a number of emails with subjects using UTF-8 in an
> attempt to obscure the sender by using some form of 8-bit characters.
> For example, this spells dropbox:
>
> From: "=?utf-8?B?xJByb3Bib8+X?="
>
> How would we write a header rule against that? Just use From:raw?
>
> Is it possible to write a rule using the decoded characters, like
> "dróp-bóx" or "Dṙopḇoẋ"?
>
> I've also tried variations of "dropbox" such as "dr?pb?x" etc...

There are already obfuscated-text rules, and the subject is incorporated in the body text so they would scan that. Take a look at the existing FUZZY_* rules.

Possibly (untested):

  body FUZZY_DROPBOX /(?!ropbox)/i
  replace_rules FUZZY_DROPBOX
  describe FUZZY_DROPBOX Obfuscated "dropbox"

--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
 Activist: Someone who gets involved.
 Unregistered Lobbyist: Someone who gets involved with something the MSM doesn't approve of. -- WizardPC
---
 Tomorrow: Bill of Rights day
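The idea of matching lookalike spellings on the decoded text can be illustrated outside SpamAssassin. The following is only a sketch in Python, not how the FUZZY_* rules are implemented: the `LOOKALIKES` table, `fuzzy_pattern` and `is_obfuscated` are hypothetical names made up for this example, and the table covers just the letters of "dropbox".

```python
import re
from email.header import decode_header, make_header

# Hypothetical lookalike table (an assumption for this sketch); a real
# table would cover the whole alphabet and many more code points.
LOOKALIKES = {
    "d": "dD\u0110\u00d0\u1e0b\u1e0a",
    "r": "rR\u1e59\u1e58",
    "o": "oO\u00f3\u00d3\u00f2\u00d20",
    "p": "pP",
    "b": "bB\u1e07\u1e06",
    "x": "xX\u1e8b\u1e8a\u03d7",
}


def fuzzy_pattern(word):
    """Build a regex accepting any listed lookalike for each letter of `word`."""
    return re.compile("".join("[" + LOOKALIKES[ch] + "]" for ch in word))


def is_obfuscated(header_value, word="dropbox"):
    """True if the decoded header spells `word` with lookalikes but not literally."""
    decoded = str(make_header(decode_header(header_value)))
    if word in decoded.lower():
        return False  # the plain spelling is not an obfuscation
    return fuzzy_pattern(word).search(decoded) is not None


print(is_obfuscated("=?utf-8?B?xJByb3Bib8+X?="))
print(is_obfuscated("dropbox"))
```

Excluding the literal spelling first plays the same role as the negative lookahead in the rule quoted above: only the disguised forms should hit.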
Re: check utf-8 subjects/from?
On 12/13/2017 6:58 PM, Reindl Harald wrote:

>> There seems to be a large disparity between your (10%) result and my
>> (2%) result. Can you explain how that could be?
>
> surely, from the moment you have not only English messages it looks
> completely different and don't forget that the corpus where i run the
> quick grep is only a very small subset of real mailflow for training
> as ham when needed

I'm not sure I understand what you are saying now.

Are you saying you ran a flawed/inaccurate test but sent the results anyway in order to make a point that no one asked you about?

Or are you saying that every mail environment is (necessarily) different, and whatever your opinion and results in your local environment are, they may not be applicable to another environment in another country, so you probably should not make your assumptions and opinions sound like facts?

In my OPINION, the aforementioned rule that I will test is likely NOT a good candidate for many environments - but I never promoted it as such in the first place.

Apologies to all whose inboxes were cluttered with this tangent.
Re: check utf-8 subjects/from?
>> Hi,
>>
>> On Wed, Dec 13, 2017 at 9:08 PM, David B Funk wrote:
>> > On Wed, 13 Dec 2017, AJ Weber wrote:
>> >
>> >> Is there an easy way to check if the Subject or From is UTF-8 -- or
>> >> non-ASCII -- char set?
>> >>
>> >> I see in some of my recent spam, either the Subject or the From
>> >> (sometimes both) starts with "=?UTF-8?" (in these cases the rest is
>> >> Base64 encoded, but I don't want to qualify on that).
>> >>
>> >> If I check a header with a "header ... =~" regex rule, is it the raw
>> >> text that I will check, or is it the decoded characters I will be
>> >> checking against?
>> >>
>> >> If it's the raw text, I can probably just look for that prefix to
>> >> indicate the UTF-8 encoding.
>> >>
>> >> I do get some legitimate emails with encoded chars and emojis,
>> >> etc...but I think I'd like a rule to support it being SPAM in general.
>> >
>> > As other people have said, the header ":raw" rule form will let you
>> > match on that.
>> > There are two commonly used encoding methods for UTF-8:
>> >   Base64            "=?utf-8?B?"
>> >   Quoted-Printable  "=?utf-8?Q?"
>> >
>> > There's nothing that prevents a mailer from using either for purely
>> > 7-bit ASCII, even though it isn't necessary. You are more likely to
>> > see that used by international clients. They may just utf-8 encode by
>> > default so not to have to do special processing for non 7-bit ASCII
>> > headers.
>>
>> We've been seeing a number of emails with subjects using UTF-8 in an
>> attempt to obscure the sender by using some form of 8-bit characters.
>> For example, this spells dropbox:
>>
>> From: "=?utf-8?B?xJByb3Bib8+X?="
>>
>> How would we write a header rule against that? Just use From:raw?
>>
>> Is it possible to write a rule using the decoded characters, like
>> "dróp-bóx" or "Dṙopḇoẋ"?
>>
>> I've also tried variations of "dropbox" such as "dr?pb?x" etc...

Hi Alex,

as I live in Germany, I also see nothing special in encoded utf-8 ...
Just use the decoded From line rather than the raw version.

One thing that certainly is worth detecting is a plain name part containing a different email. (I am not sure if such a rule already exists)

Now for your example, you would probably have to write rules with the purported sender's spelling variations and a meta in case the _real_ name and a valid email is detected.

Regards
Wolfgang
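The check suggested here, a display name that itself contains a different email address, could look roughly like the following Python sketch. It is not an existing SpamAssassin rule; `ADDR_RE` and `name_spoofs_address` are hypothetical names, the address regex is deliberately loose, and the sample addresses are invented.

```python
import re
from email.header import decode_header, make_header
from email.utils import parseaddr

# Loose address matcher (an assumption for this sketch, not RFC 5322 exact).
ADDR_RE = re.compile(r"[\w.+-]+@([\w-]+(?:\.[\w-]+)+)")


def name_spoofs_address(from_value):
    """True if the display name holds an address on a different domain
    than the actual sender address."""
    decoded = str(make_header(decode_header(from_value)))
    display_name, real_addr = parseaddr(decoded)
    found = ADDR_RE.search(display_name)
    if not found or "@" not in real_addr:
        return False
    real_domain = real_addr.rsplit("@", 1)[1].lower()
    return found.group(1).lower() != real_domain


print(name_spoofs_address('"support@dropbox.com" <mailer@example.net>'))
print(name_spoofs_address("Alice <alice@example.org>"))
```

Comparing only the domains keeps the sketch from flagging senders who simply repeat their own address in the name part.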
Re: check utf-8 subjects/from?
Hi,

On Wed, Dec 13, 2017 at 9:08 PM, David B Funk wrote:
> On Wed, 13 Dec 2017, AJ Weber wrote:
>
>> Is there an easy way to check if the Subject or From is UTF-8 -- or
>> non-ASCII -- char set?
>>
>> I see in some of my recent spam, either the Subject or the From
>> (sometimes both) starts with "=?UTF-8?" (in these cases the rest is
>> Base64 encoded, but I don't want to qualify on that).
>>
>> If I check a header with a "header ... =~" regex rule, is it the raw
>> text that I will check, or is it the decoded characters I will be
>> checking against?
>>
>> If it's the raw text, I can probably just look for that prefix to
>> indicate the UTF-8 encoding.
>>
>> I do get some legitimate emails with encoded chars and emojis,
>> etc...but I think I'd like a rule to support it being SPAM in general.
>
> As other people have said, the header ":raw" rule form will let you
> match on that.
> There are two commonly used encoding methods for UTF-8:
>   Base64            "=?utf-8?B?"
>   Quoted-Printable  "=?utf-8?Q?"
>
> There's nothing that prevents a mailer from using either for purely
> 7-bit ASCII, even though it isn't necessary. You are more likely to
> see that used by international clients. They may just utf-8 encode by
> default so not to have to do special processing for non 7-bit ASCII
> headers.

We've been seeing a number of emails with subjects using UTF-8 in an attempt to obscure the sender by using some form of 8-bit characters. For example, this spells dropbox:

From: "=?utf-8?B?xJByb3Bib8+X?="

How would we write a header rule against that? Just use From:raw?

Is it possible to write a rule using the decoded characters, like "dróp-bóx" or "Dṙopḇoẋ"?

I've also tried variations of "dropbox" such as "dr?pb?x" etc...
Re: check utf-8 subjects/from?
On 13 Dec 2017, at 21:08 (-0500), David B Funk wrote:

> [...]
> There's nothing that prevents a mailer from using either for purely
> 7-bit ASCII, even though it isn't necessary. You are more likely to
> see that used by international clients. They may just utf-8 encode by
> default so not to have to do special processing for non 7-bit ASCII
> headers.

There's even a SA rule for that: FROM_EXCESS_BASE64

--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole
Re: check utf-8 subjects/from?
On Wed, 13 Dec 2017, AJ Weber wrote:

> Is there an easy way to check if the Subject or From is UTF-8 -- or
> non-ASCII -- char set?
>
> I see in some of my recent spam, either the Subject or the From
> (sometimes both) starts with "=?UTF-8?" (in these cases the rest is
> Base64 encoded, but I don't want to qualify on that).
>
> If I check a header with a "header ... =~" regex rule, is it the raw
> text that I will check, or is it the decoded characters I will be
> checking against?
>
> If it's the raw text, I can probably just look for that prefix to
> indicate the UTF-8 encoding.
>
> I do get some legitimate emails with encoded chars and emojis,
> etc...but I think I'd like a rule to support it being SPAM in general.

As other people have said, the header ":raw" rule form will let you match on that.
There are two commonly used encoding methods for UTF-8:
  Base64            "=?utf-8?B?"
  Quoted-Printable  "=?utf-8?Q?"

There's nothing that prevents a mailer from using either for purely 7-bit ASCII, even though it isn't necessary. You are more likely to see that used by international clients. They may just utf-8 encode by default so not to have to do special processing for non 7-bit ASCII headers.

--
Dave Funk    University of Iowa
College of Engineering
319/335-5751    FAX: 319/384-0549    1256 Seamans Center
Sys_admin/Postmaster/cell_admin    Iowa City, IA 52242-1527
#include
Better is not better, 'standard' is better. B{
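To make the two prefixes concrete, here is a small Python sketch that looks for encoded-words in a raw header value and reports which encoding each one uses. The `ENCODED_WORD` pattern and the `encodings_used` helper are hypothetical names for this example, and the pattern is a simplification of the RFC 2047 grammar. The last lines show that a pure-ASCII word can be wrapped the same way.

```python
import base64
import re

# Simplified encoded-word shape: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=")


def encodings_used(raw_header):
    """Return (charset, encoding name) for each encoded-word found."""
    names = {"b": "base64", "q": "quoted-printable"}
    return [
        (m.group(1).lower(), names[m.group(2).lower()])
        for m in ENCODED_WORD.finditer(raw_header)
    ]


print(encodings_used('"=?utf-8?B?xJByb3Bib8+X?="'))
print(encodings_used("=?UTF-8?Q?dropbox?="))

# Nothing stops a mailer from wrapping plain 7-bit ASCII as well.
wrapped = "=?utf-8?B?" + base64.b64encode(b"dropbox").decode("ascii") + "?="
print(wrapped, encodings_used(wrapped))
```

Because the charset and the encoding letter are case-insensitive, the sketch lowercases both before reporting them.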
Re: check utf-8 subjects/from?
On Wed, 13 Dec 2017 18:37:59 -0500 AJ Weber wrote:

> >>> that tells me that roughly 10% of all ham mails would hit
>
> There seems to be a large disparity between your (10%) result and my
> (2%) result. Can you explain how that could be?

He's Austrian, so it's probably mainly due to umlauts.
Re: check utf-8 subjects/from?
On 12/13/2017 5:18 PM, Reindl Harald wrote:

> my statements are based on a decade of experience with a lot of users
> from all over the world, on your personal server you can even reject
> anything not whitelisted, from the moment on when other people's
> mailflow is affected it's no longer that easy

It's true. At first I noticed a pattern and decided to look into how I could write a rule, probably starting with a low score, to test its effectiveness.

However, I ran your test to determine how many emails it would actually affect. In a folder of just over 5100 emails, there would be < 2% false-positives. That's actually better than I expected! If you offered me a rule that only anticipated 2% false positives to try, I would say it was worth it for sure!

> this would be a rule with a majority of false positives
> you really should also look at your HAM

I didn't see the basis for your "majority" of false positives. Did you run your test against a spam folder as well? What were the results there?

> cat *.eml | grep UTF-8 | grep -i subject | wc -l
> 2150
>
> that tells me that roughly 10% of all ham mails would hit

There seems to be a large disparity between your (10%) result and my (2%) result. Can you explain how that could be?

Thank you again!
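Counts like these are easier to compare when they are taken per message and only from the Subject header. The quoted grep pipeline counts lines rather than messages, is case-sensitive for "UTF-8", and also matches body lines that happen to contain both words. Below is a minimal Python sketch of a per-message count; `utf8_subject_rate` and the four sample messages are invented for illustration, and nothing here claims to explain the actual 10% and 2% figures.

```python
import re
from email import message_from_string

# Case-insensitive match for a UTF-8 encoded-word prefix.
UTF8_WORD = re.compile(r"=\?utf-8\?[bq]\?", re.IGNORECASE)


def utf8_subject_rate(raw_messages):
    """Fraction of messages whose raw Subject holds a UTF-8 encoded-word."""
    hits = 0
    for raw in raw_messages:
        # The default (compat32) policy returns the header text undecoded.
        subject = message_from_string(raw)["Subject"] or ""
        if UTF8_WORD.search(str(subject)):
            hits += 1
    return hits / len(raw_messages) if raw_messages else 0.0


sample = [
    "Subject: =?UTF-8?B?8J+OgQ==?=\n\nbody\n",
    "Subject: =?utf-8?Q?Gr=C3=BC=C3=9Fe?=\n\nbody\n",
    "Subject: plain\n\nthe subject of UTF-8 comes up in this body line\n",
    "Subject: hello\n\nbody\n",
]
print(utf8_subject_rate(sample))
```

In this sample the grep pipeline would miss the lowercase "utf-8" subject and would count the body line of the third message, so the two methods disagree even on four messages.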
Re: check utf-8 subjects/from?
Would you be so kind as to tell me how you hacked into my mail server to determine the basis for your statements?

On 12/13/2017 4:52 PM, Reindl Harald wrote:

> On 13.12.2017 at 19:44, AJ Weber wrote:
>> Is there an easy way to check if the Subject or From is UTF-8 -- or
>> non-ASCII -- char set?
>>
>> I see in some of my recent spam, either the Subject or the From
>> (sometimes both) starts with "=?UTF-8?" (in these cases the rest is
>> Base64 encoded, but I don't want to qualify on that).
>>
>> If I check a header with a "header ... =~" regex rule, is it the raw
>> text that I will check, or is it the decoded characters I will be
>> checking against?
>>
>> If it's the raw text, I can probably just look for that prefix to
>> indicate the UTF-8 encoding.
>>
>> I do get some legitimate emails with encoded chars and emojis,
>> etc...but I think I'd like a rule to support it being SPAM in general
>
> based on what?
>
> this would be a rule with a majority of false positives
> you really should also look at your HAM
>
> cat *.eml | grep UTF-8 | grep -i subject | wc -l
> 2150
>
> that tells me that roughly 10% of all ham mails would hit
Re: check utf-8 subjects/from?
On Wed, 13 Dec 2017 13:44:49 -0500 AJ Weber wrote:

> If I check a header with a "header ... =~" regex rule, is it the raw
> text that I will check, or is it the decoded characters I will be
> checking against?

You can use From:raw to get the raw From header.

BTW if you want to ask a new question, please just send an email to the list address rather than reply to an existing thread.
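The difference between the raw and the decoded view of a header can be seen with Python's standard email package. This is only an analogy to the From versus From:raw distinction, not SpamAssassin code, and the message with its sender@example.com address is invented for the example.

```python
from email import message_from_string, policy

raw_message = "From: =?utf-8?B?xJByb3Bib8+X?= <sender@example.com>\n\nbody\n"

# The default compat32 policy hands back the header text as transmitted.
raw_view = message_from_string(raw_message)

# policy.default parses the header and decodes encoded-words to Unicode.
decoded_view = message_from_string(raw_message, policy=policy.default)

print(raw_view["From"])
print(decoded_view["From"])
```

A pattern written against the first form has to match the "=?utf-8?B?" wrapper itself, while a pattern written against the second form sees the characters the recipient's mail client would display.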
check utf-8 subjects/from?
Is there an easy way to check if the Subject or From is UTF-8 -- or non-ASCII -- char set?

I see in some of my recent spam, either the Subject or the From (sometimes both) starts with "=?UTF-8?" (in these cases the rest is Base64 encoded, but I don't want to qualify on that).

If I check a header with a "header ... =~" regex rule, is it the raw text that I will check, or is it the decoded characters I will be checking against?

If it's the raw text, I can probably just look for that prefix to indicate the UTF-8 encoding.

I do get some legitimate emails with encoded chars and emojis, etc...but I think I'd like a rule to support it being SPAM in general.

Thanks again,
AJ