Re: check utf-8 subjects/from?

2017-12-14 Thread John Hardin

On Wed, 13 Dec 2017, Alex wrote:


We've been seeing a number of emails with subjects using UTF-8 in an
attempt to obscure the sender by using some form of 8-bit characters.
For example, this spells dropbox:

 From: "=?utf-8?B?xJByb3Bib8+X?=" 

How would we write a header rule against that? Just use From:raw?

Is it possible to write a rule using the decoded characters, like
"dróp-bóx" or "Dṙopḇoẋ"?

I've also tried variations of "dropbox" such as "dr?pb?x" etc...


There are already obfuscated-text rules, and the subject is incorporated 
in the body text so they would scan that.


Take a look at the existing FUZZY_* rules.

Possibly (untested):

body  FUZZY_DROPBOX  /<D>(?!ropbox)<R><O><P><B><O><X>/i
replace_rules FUZZY_DROPBOX
describe  FUZZY_DROPBOX  Obfuscated "dropbox"
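
The idea behind the FUZZY_* rules (one character class per letter, plus a negative lookahead so the plain spelling does not hit) can be tried outside SpamAssassin. The Python below is only a sketch of that idea: the VARIANTS table and the helper names are made up for this illustration and are not the tag definitions SpamAssassin ships.

```python
import re
import unicodedata

# Hypothetical look-alike sets, for illustration only. These are NOT the
# definitions SpamAssassin uses for its replace tags.
VARIANTS = {
    "d": "dĐđḋ",
    "r": "rṙ",
    "o": "oóò0",
    "p": "pṗ",
    "b": "bḇ",
    "x": "xẋϗ",
}


def fuzzy_pattern(word):
    """Build a regex matching look-alike spellings of `word`, but not the plain word."""
    classes = "".join("[" + re.escape(VARIANTS.get(ch, ch)) + "]" for ch in word)
    # The negative lookahead rejects the exact (case-insensitive) spelling.
    return re.compile("(?!" + re.escape(word) + ")" + classes, re.IGNORECASE)


FUZZY_DROPBOX = fuzzy_pattern("dropbox")


def is_obfuscated(text):
    """True if `text` contains an obfuscated spelling of "dropbox"."""
    # Normalise to NFC so letter + combining mark compares as one character.
    text = unicodedata.normalize("NFC", text)
    return FUZZY_DROPBOX.search(text) is not None
```

A spelling with inserted punctuation, such as the hyphenated example from the question, would need extra handling that this sketch leaves out.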



--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Activist: Someone who gets involved.
  Unregistered Lobbyist: Someone who gets involved
   with something the MSM doesn't approve of. -- WizardPC
---
 Tomorrow: Bill of Rights day

Re: check utf-8 subjects/from?

2017-12-14 Thread AJ Weber

On 12/13/2017 6:58 PM, Reindl Harald wrote:

> There seems to be a large disparity between your (10%) result and my
> (2%) result.  Can you explain how that could be?

surely, as soon as your mail is not only English it looks completely 
different, and don't forget that the corpus on which I ran the quick 
grep is only a very small subset of the real mail flow, kept for 
training as ham when needed



I'm not sure I understand what you are saying now.

Are you saying you ran a flawed/inaccurate test but sent the results 
anyway in order to make a point that no one asked you about?


Or are you saying that every mail environment is (necessarily) 
different, and whatever your opinion and results in your local 
environment are, they may not be applicable to another environment in 
another country, so you probably should not make your assumptions and 
opinions sound like facts?


In my OPINION, the aforementioned rule that I will test is likely NOT a 
good candidate for many environments - but I never promoted it as such 
in the first place.


Apologies to all whose inboxes were cluttered with this tangent.


Re: check utf-8 subjects/from?

2017-12-14 Thread hamann . w
>> Hi,
>> 
>> On Wed, Dec 13, 2017 at 9:08 PM, David B Funk
>>  wrote:
>> > On Wed, 13 Dec 2017, AJ Weber wrote:
>> >
>> >> Is there an easy way to check if the Subject or From is UTF-8 -- or
>> >> non-ASCII -- char set?
>> >>
>> >> I see in some of my recent spam, either the Subject or the From (sometimes
>> >> both) starts with "=?UTF-8?" (in these cases the rest is Base64 encoded, but
>> >> I don't want to qualify on that).
>> >>
>> >> If I check a header with a "header ... =~" regex rule, is it the raw text
>> >> that I will check, or is it the decoded characters I will be checking
>> >> against?
>> >>
>> >> If it's the raw text, I can probably just look for that prefix to indicate
>> >> the UTF-8 encoding.
>> >>
>> >> I do get some legitimate emails with encoded chars and emojis, etc...but I
>> >> think I'd like a rule to support it being SPAM in general.
>> >
>> >
>> > As other people have said, the header ":raw" rule form will let you match on
>> > that.
>> > There are two commonly used encoding methods for UTF-8:
>> >  Base64 "=?utf-8?B?"
>> >  Quoted-Printable "=?utf-8?Q?"
>> >
>> > There's nothing that prevents a mailer from using either for purely
>> > 7-bit ASCII, even though it isn't necessary. You are more likely to see
>> > that used by international clients. They may just UTF-8 encode by default
>> > so as not to have to do special processing for headers that are not
>> > 7-bit ASCII.
>> 
>> We've been seeing a number of emails with subjects using UTF-8 in an
>> attempt to obscure the sender by using some form of 8-bit characters.
>> For example, this spells dropbox:
>> 
>>   From: "=?utf-8?B?xJByb3Bib8+X?=" 
>> 
>> How would we write a header rule against that? Just use From:raw?
>> 
>> Is it possible to write a rule using the decoded characters, like
>> "dróp-bóx" or "Dṙopḇoẋ"?
>> 
>> I've also tried variations of "dropbox" such as "dr?pb?x" etc...

Hi Alex,

as I live in Germany, I also see nothing special in encoded UTF-8 ... 
Just use the decoded From line rather than the raw version.

One thing that is certainly worth detecting is a plain name part that 
contains a different email address. (I am not sure whether such a rule 
already exists.)

Now for your example, you would probably have to write rules for the 
spelling variations of the purported sender, plus a meta rule for the 
case where the _real_ name and a valid email address are detected.
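
The "name part containing a different email" check can be prototyped with Python's standard email helpers. This is a sketch under assumptions: the function name and the deliberately loose EMAIL_RE pattern are invented for this example, and it is not an existing SpamAssassin rule.

```python
import re
from email.header import decode_header, make_header
from email.utils import parseaddr

# Deliberately loose address pattern; an assumption for this sketch,
# not a full RFC 5322 parser.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def display_name_spoofs_address(from_header):
    """True if the display name holds an address other than the real one."""
    name, addr = parseaddr(from_header)
    # Decode any RFC 2047 encoded-words in the display name first.
    name = str(make_header(decode_header(name)))
    found = EMAIL_RE.findall(name)
    return any(candidate.lower() != addr.lower() for candidate in found)
```

For instance, a From line whose display name is "support@dropbox.com" but whose actual address sits at an unrelated domain would be flagged, while a display name that simply repeats the real address would not.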

Regards
Wolfgang




Re: check utf-8 subjects/from?

2017-12-13 Thread Alex
Hi,

On Wed, Dec 13, 2017 at 9:08 PM, David B Funk
 wrote:
> On Wed, 13 Dec 2017, AJ Weber wrote:
>
>> Is there an easy way to check if the Subject or From is UTF-8 -- or
>> non-ASCII -- char set?
>>
>> I see in some of my recent spam, either the Subject or the From (sometimes
>> both) starts with "=?UTF-8?" (in these cases the rest is Base64 encoded, but
>> I don't want to qualify on that).
>>
>> If I check a header with a "header ... =~" regex rule, is it the raw text
>> that I will check, or is it the decoded characters I will be checking
>> against?
>>
>> If it's the raw text, I can probably just look for that prefix to indicate
>> the UTF-8 encoding.
>>
>> I do get some legitimate emails with encoded chars and emojis, etc...but I
>> think I'd like a rule to support it being SPAM in general.
>
>
> As other people have said, the header ":raw" rule form will let you match on
> that.
> There are two commonly used encoding methods for UTF-8:
>  Base64 "=?utf-8?B?"
>  Quoted-Printable "=?utf-8?Q?"
>
> There's nothing that prevents a mailer from using either for purely
> 7-bit ASCII, even though it isn't necessary. You are more likely to see
> that used by international clients. They may just UTF-8 encode by default
> so as not to have to do special processing for headers that are not
> 7-bit ASCII.

We've been seeing a number of emails with subjects using UTF-8 in an
attempt to obscure the sender by using some form of 8-bit characters.
For example, this spells dropbox:

  From: "=?utf-8?B?xJByb3Bib8+X?=" 

How would we write a header rule against that? Just use From:raw?

Is it possible to write a rule using the decoded characters, like
"dróp-bóx" or "Dṙopḇoẋ"?

I've also tried variations of "dropbox" such as "dr?pb?x" etc...


Re: check utf-8 subjects/from?

2017-12-13 Thread Bill Cole

On 13 Dec 2017, at 21:08 (-0500), David B Funk wrote:

[...]
There's nothing that prevents a mailer from using either for purely 
7-bit ASCII, even though it isn't necessary. You are more likely to see 
that used by international clients. They may just UTF-8 encode by 
default so as not to have to do special processing for headers that 
are not 7-bit ASCII.


There's even a SA rule for that: FROM_EXCESS_BASE64
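
To show the kind of condition involved, here is a small Python sketch that flags a raw header whose Base64 encoded-words all decode to plain 7-bit ASCII. It is not a reimplementation of FROM_EXCESS_BASE64, whose exact definition is not given in this thread; the function name and the regex are assumptions made for the example.

```python
import base64
import binascii
import re

# Matches RFC 2047 Base64 encoded-words: =?charset?B?payload?=
B64_WORD = re.compile(r"=\?([^?]+)\?[Bb]\?([A-Za-z0-9+/=]+)\?=")


def needlessly_base64(raw_header):
    """True if every Base64 encoded-word in the raw header is pure 7-bit ASCII."""
    words = B64_WORD.findall(raw_header)
    if not words:
        return False  # nothing was Base64 encoded at all
    for _charset, payload in words:
        try:
            data = base64.b64decode(payload, validate=True)
        except (binascii.Error, ValueError):
            return False  # malformed payload: do not guess
        if any(byte > 0x7F for byte in data):
            return False  # genuine 8-bit content, encoding was needed
    return True
```

The check looks only at the decoded bytes, so it assumes an ASCII-compatible charset such as UTF-8.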

--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: check utf-8 subjects/from?

2017-12-13 Thread David B Funk

On Wed, 13 Dec 2017, AJ Weber wrote:

Is there an easy way to check if the Subject or From is UTF-8 -- or non-ASCII 
-- char set?


I see in some of my recent spam, either the Subject or the From (sometimes 
both) starts with "=?UTF-8?" (in these cases the rest is Base64 encoded, but 
I don't want to qualify on that).


If I check a header with a "header ... =~" regex rule, is it the raw text 
that I will check, or is it the decoded characters I will be checking 
against?


If it's the raw text, I can probably just look for that prefix to indicate 
the UTF-8 encoding.


I do get some legitimate emails with encoded chars and emojis, etc...but I 
think I'd like a rule to support it being SPAM in general.


As other people have said, the header ":raw" rule form will let you match on 
that.
There are two commonly used encoding methods for UTF-8:
 Base64 "=?utf-8?B?"
 Quoted-Printable "=?utf-8?Q?"

There's nothing that prevents a mailer from using either for purely 7-bit ASCII,
even though it isn't necessary. You are more likely to see that used by 
international clients. They may just UTF-8 encode by default so as not to have 
to do special processing for headers that are not 7-bit ASCII.
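
The difference between the raw and the decoded view of such a header can be seen with Python's standard library. This is a minimal sketch for experimenting, not SpamAssassin code; the function names are made up for the example.

```python
import re
from email.header import decode_header

# Prefix of a UTF-8 encoded-word, capturing the encoding letter (B or Q).
ENCODED_WORD = re.compile(r"=\?utf-8\?([BbQq])\?", re.IGNORECASE)


def utf8_encodings(raw_header):
    """Return the set of encodings ('B', 'Q') used by UTF-8 encoded-words."""
    return {letter.upper() for letter in ENCODED_WORD.findall(raw_header)}


def decoded_text(raw_header):
    """Decode RFC 2047 encoded-words into the text a reader would see."""
    parts = []
    for chunk, charset in decode_header(raw_header):
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", errors="replace")
        parts.append(chunk)
    return "".join(parts)
```

The first function works on the raw header, as a ":raw" rule would; the second shows the characters hidden behind the encoding, including the case where the payload is nothing but ASCII.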



--
Dave Funk                               University of Iowa
                                        College of Engineering
319/335-5751   FAX: 319/384-0549        1256 Seamans Center
Sys_admin/Postmaster/cell_admin         Iowa City, IA 52242-1527
#include 
Better is not better, 'standard' is better. B{


Re: check utf-8 subjects/from?

2017-12-13 Thread RW
On Wed, 13 Dec 2017 18:37:59 -0500
AJ Weber wrote:


> >>>
> >>> that tells me that roughly 10% of all ham mails would hit
> There seems to be a large disparity between your (10%) result and my 
> (2%) result.  Can you explain how that could be?

He's Austrian, so it's probably mainly due to umlauts.


Re: check utf-8 subjects/from?

2017-12-13 Thread AJ Weber

On 12/13/2017 5:18 PM, Reindl Harald wrote:


my statements are based on a decade of experience with a lot of users 
from all over the world. On your personal server you can even reject 
anything not whitelisted, but from the moment other people's 
mail flow is affected it's no longer that easy

It's true.  At first I noticed a pattern and decided to look into how I 
could write a rule, probably starting with a low score, to test its 
effectiveness.


However, I ran your test to determine how many emails it would actually 
affect.  In a folder of just over 5100 emails, there would be < 2% 
false-positives.  That's actually better than I expected!  If you 
offered me a rule that only anticipated 2% false positives to try, I 
would say it was worth it for sure!





this would be a rule with a majority of false positives
you really should also look at your HAM

I didn't see the basis for your "majority" of false positives.  Did you 
run your test against a spam folder as well?  What were the results there?


cat *.eml | grep UTF-8 | grep -i subject | wc -l
2150

that tells me that roughly 10% of all ham mails would hit

There seems to be a large disparity between your (10%) result and my 
(2%) result.  Can you explain how that could be?
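
Part of such a disparity can come from how the count is taken: the grep pipeline counts every line that mentions both "UTF-8" and "subject", wherever it occurs, rather than counting messages. A per-message count could look like the Python sketch below. The function names and the folder layout (one message per *.eml file) are assumptions for this example.

```python
import re
from email import message_from_bytes
from pathlib import Path

# Prefix of a UTF-8 encoded-word, Base64 or Quoted-Printable.
UTF8_WORD = re.compile(r"=\?utf-8\?[bq]\?", re.IGNORECASE)


def has_utf8_subject(raw_message):
    """True if the message's own Subject header contains a UTF-8 encoded-word."""
    msg = message_from_bytes(raw_message)
    subject = msg.get("Subject", "")
    return UTF8_WORD.search(str(subject)) is not None


def utf8_subject_rate(folder):
    """Fraction of *.eml files in `folder` whose Subject is UTF-8 encoded."""
    files = sorted(Path(folder).glob("*.eml"))
    if not files:
        return 0.0
    hits = sum(has_utf8_subject(path.read_bytes()) for path in files)
    return hits / len(files)
```

Running utf8_subject_rate on a ham folder and on a spam folder gives two fractions that can be compared directly, which a single line count cannot.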


Thank you again!


Re: check utf-8 subjects/from?

2017-12-13 Thread AJ Weber
Would you be so kind as to tell me how you hacked into my mail server to 
determine the basis for your statements?




On 12/13/2017 4:52 PM, Reindl Harald wrote:



Am 13.12.2017 um 19:44 schrieb AJ Weber:
Is there an easy way to check if the Subject or From is UTF-8 -- or 
non-ASCII -- char set?


I see in some of my recent spam, either the Subject or the From 
(sometimes both) starts with "=?UTF-8?" (in these cases the rest is 
Base64 encoded, but I don't want to qualify on that).


If I check a header with a "header ... =~" regex rule, is it the raw 
text that I will check, or is it the decoded characters I will be 
checking against?


If it's the raw text, I can probably just look for that prefix to 
indicate the UTF-8 encoding.


I do get some legitimate emails with encoded chars and emojis, 
etc...but I think I'd like a rule to support it being SPAM in general


based on what?

this would be a rule with a majority of false positives
you really should also look at your HAM

cat *.eml | grep UTF-8 | grep -i subject | wc -l
2150

that tells me that roughly 10% of all ham mails would hit




Re: check utf-8 subjects/from?

2017-12-13 Thread RW
On Wed, 13 Dec 2017 13:44:49 -0500
AJ Weber wrote:

> If I check a header with a "header ... =~" regex rule, is it the raw 
> text that I will check, or is it the decoded characters I will be 
> checking against?

You can use  From:raw to get the raw From header.


BTW if you want to ask a new question, please just send an email to the
list address rather than reply to an existing thread. 


check utf-8 subjects/from?

2017-12-13 Thread AJ Weber
Is there an easy way to check if the Subject or From is UTF-8 -- or 
non-ASCII -- char set?


I see in some of my recent spam, either the Subject or the From 
(sometimes both) starts with "=?UTF-8?" (in these cases the rest is 
Base64 encoded, but I don't want to qualify on that).


If I check a header with a "header ... =~" regex rule, is it the raw 
text that I will check, or is it the decoded characters I will be 
checking against?


If it's the raw text, I can probably just look for that prefix to 
indicate the UTF-8 encoding.


I do get some legitimate emails with encoded chars and emojis, etc...but 
I think I'd like a rule to support it being SPAM in general.


Thanks again,
AJ