Re: FW: Rule for Russian character sets (=?koi8-r? not quite a charset)

2008-02-18 Thread Karsten Bräckelmann
On Mon, 2008-02-18 at 09:36 +1300, Michael Hutchinson wrote:
   We don't want to only allow the English locale, because we (here at
   my work) do not want our international clients (non-Russian) to be
   denied email service.
  
  ok_locales  en ja ko th zh
  
  This will allow anything but Cyrillic char sets. Please note that en
  does *not* mean English locale despite its name. It applies to all
  Western charsets, including German Umlauts, Swedish, French, Turkish,
  etc. Basically everything that uses the characters in this post, plus
  language specific chars.
  
 OK, now we're talking turkey. Thanks for providing the much-needed
 clarity on ok_locales. I may just employ that technique yet, pending
 whether we get any more Russian spam through the gates.
 
  Sorry, I did not mean to troll or cause any kind of offense.
 
 You have my apologies; it being a Friday afternoon, I was pretty sick of
 work and shouldn't have taken it out on you or the list. Sorry.

  Hope this clarifies my previous posts and is appreciated again...
 
 Your posts are appreciated, and sorry for the mean comment.

Thanks.  No offense taken, no harm done, don't worry. :)

  guenther





FW: Rule for Russian character sets (=?koi8-r? not quite a charset)

2008-02-17 Thread Michael Hutchinson
-----Original Message-----
[snip]
  We don't want to only allow the English locale, because we (here at
  my work) do not want our international clients (non-Russian) to be
  denied email service.
 
 ok_locales  en ja ko th zh
 
 This will allow anything but Cyrillic char sets. Please note that en
 does *not* mean English locale despite its name. It applies to all
 Western charsets, including German Umlauts, Swedish, French, Turkish,
 etc. Basically everything that uses the characters in this post, plus
 language specific chars.
 
OK, now we're talking turkey. Thanks for providing the much-needed
clarity on ok_locales. I may just employ that technique yet, pending
whether we get any more Russian spam through the gates.

 Sorry, I did not mean to troll or cause any kind of offense.

You have my apologies; it being a Friday afternoon, I was pretty sick of
work and shouldn't have taken it out on you or the list. Sorry.
 
 However, you missed my point. Getting detailed with REs is a good thing,
 sure. That is not what I was getting at, though: the RE in question does
 not properly handle charset encoding. See the Subject for an example
 which is not an encoded word, but will be matched by your rule.
 
 My point was that the rule discussed aims to be something it
 unfortunately is not, because charset encoding (RFC 2047 encoded-words)
 is slightly more complex and definitely requires a closing part. A
 Regular Expression that does this can be found in
 check_for_faraway_charset_in_headers() in HeaderEval.pm:
   $hdr =~ /=\?(.+?)\?.\?.*?\?=/g
 
 Hence my re-inventing-the-wheel analogy. And these wheels are quite
 flexible, too. ;-)
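To make the "closing part" point concrete, here is a minimal Python sketch (not SpamAssassin code) that runs both patterns against this thread's own Subject. ENCODED_WORD is the HeaderEval.pm expression quoted above, translated to Python's re; NAIVE is the opening-only koi8-r pattern under discussion. The encoded Subject and its Base64 payload are illustrative assumptions.

```python
import re

# Opening-only pattern discussed in the thread: no closing "?=" required.
NAIVE = re.compile(r"=\?koi8-r\?", re.IGNORECASE)

# The HeaderEval.pm expression quoted above:
# "=?" charset "?" one-char encoding "?" encoded text "?=".
ENCODED_WORD = re.compile(r"=\?(.+?)\?.\?.*?\?=")


def encoded_charsets(header: str) -> list:
    """Return the (lowercased) charset of every encoded-word in a header."""
    return [m.group(1).lower() for m in ENCODED_WORD.finditer(header)]


# This thread's Subject mentions the string but contains no encoded-word.
thread_subject = "Rule for Russian character sets (=?koi8-r? not quite a charset)"
# A genuinely encoded Subject; the Base64 payload is illustrative only.
encoded_subject = "=?koi8-r?B?0NLJ18XU?="

print(bool(NAIVE.search(thread_subject)))  # True: a false positive
print(encoded_charsets(thread_subject))    # []
print(encoded_charsets(encoded_subject))   # ['koi8-r']
```

The full pattern only fires when the whole `=?charset?X?text?=` structure is present, which is why the thread's Subject slips past it but not past the naive rule.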
 
 Also, your rule applies to the Subject only, whereas ok_locales does
 check all MIME parts and will trigger on Russian spam with a western
 Subject.
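The point about MIME parts can be illustrated with Python's standard email package. This is only a sketch of the idea, not how SpamAssassin implements ok_locales: it collects the charset declared by every MIME part plus any encoded-word charsets in the Subject, then compares them against CYRILLIC_CHARSETS, a hypothetical list made up for this example.

```python
from email import message_from_string
from email.header import decode_header

# Hypothetical charset list for this sketch only; it is NOT
# SpamAssassin's internal ok_locales table.
CYRILLIC_CHARSETS = {"koi8-r", "koi8-u", "windows-1251", "iso-8859-5"}


def declared_charsets(raw: str) -> set:
    """Collect charsets declared by every MIME part and by the Subject."""
    msg = message_from_string(raw)
    found = set()
    for part in msg.walk():  # the container itself, then each sub-part
        charset = part.get_content_charset()  # lowercased, or None
        if charset:
            found.add(charset)
    for _, charset in decode_header(msg.get("Subject", "")):
        if charset:
            found.add(charset.lower())
    return found


def looks_cyrillic(raw: str) -> bool:
    return bool(declared_charsets(raw) & CYRILLIC_CHARSETS)


# A Western-looking Subject with a koi8-r body part.
SAMPLE = """\
From: sender@example.com
Subject: Special offer
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset="koi8-r"

body
--XYZ--
"""

print(declared_charsets(SAMPLE))  # {'koi8-r'}
print(looks_cyrillic(SAMPLE))     # True
```

A Subject-only rule would miss SAMPLE entirely, since its Subject is plain ASCII.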

The RE in question (mine) was not written just for the Subject; a
separate rule was written for the raw From: line as well. As we only
score spam here and leave filing it to the MUA (unless a score of 25 is
reached, at which point SA bins it), scoring against the Subject and From
lines makes sense, because if you simply matched (=?koi8-r?) in the
Subject, it would not score high enough on its own to be filtered or
blocked. (I'm trying to apply what I've learned from the SA web page about
writing multiple low-scoring rules instead of a few high-scoring ones.)
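The "many low-scoring rules" approach comes down to simple addition. SpamAssassin does this itself with header and body rules plus score lines in its configuration; the Python sketch below only models the arithmetic. The rule names, their scores and the Russian stand-in phrase are hypothetical. The 5.0 threshold is SpamAssassin's default required_score, and 25 is the bin threshold described above.

```python
import re

# Hypothetical rules: (name, field, pattern, score). Scores are
# illustrative; no single rule reaches the spam threshold by itself.
RULES = [
    ("LOCAL_KOI8R_SUBJ", "subject", re.compile(r"=\?koi8-r\?", re.I), 1.0),
    ("LOCAL_KOI8R_FROM", "from", re.compile(r"=\?koi8-r\?", re.I), 1.0),
    # Stand-in for a phrase found by translating real Russian spam.
    ("LOCAL_RU_PHRASE", "body", re.compile("бесплатно", re.I), 3.0),
]

SPAM_THRESHOLD = 5.0  # SpamAssassin's default required_score
BIN_THRESHOLD = 25.0  # the "SA bins it" score described in the thread


def score(fields):
    """Sum the scores of every rule whose pattern matches its raw field."""
    total, hits = 0.0, []
    for name, field, pattern, points in RULES:
        if pattern.search(fields.get(field, "")):
            total += points
            hits.append(name)
    return total, hits


def verdict(total):
    """Map a total score onto the delivery policy described above."""
    if total >= BIN_THRESHOLD:
        return "discard"
    if total >= SPAM_THRESHOLD:
        return "tag as spam (MUA files it)"
    return "deliver"


message = {
    "subject": "=?koi8-r?B?0NLJ18XU?=",
    "from": "=?koi8-r?B?0NLJ18XU?= <sender@example.com>",
    "body": "... бесплатно ...",
}
total, hits = score(message)
print(total, verdict(total))  # 5.0 tag as spam (MUA files it)
```

Either koi8-r hit alone leaves the message at 1.0 ("deliver"); only the combination crosses the threshold.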

I can see it is flawed, but I have to admit that it is working rather
well at the moment. Mind you, I have taken the time to translate some of
the Russian spam, work out spammy phrases, and then write rules so that SA
scores against those phrases.

 Hope this clarifies my previous posts and is appreciated again...

Your posts are appreciated, and sorry for the mean comment.

Cheers,
Mike



FW: Rule for Russian character sets

2008-02-14 Thread Michael Hutchinson

-----Original Message-----
From: John Hardin [mailto:[EMAIL PROTECTED] 
Sent: Friday, 15 February 2008 3:07 p.m.
To: Michael Hutchinson
Subject: RE: Rule for Russian character sets

On Fri, 15 Feb 2008, Michael Hutchinson wrote:

 Now what about matching a question mark and an equals sign?

An equals sign isn't special but a question mark is.

 Other than escaping it with a backslash, I've heard no testimony that
 would suggest this line will work with SpamAssassin, and as before, the
 SARE Regular Expressions Expander tool doesn't like it (which may have
 put undue doubt in my head):

 /\=\?koi8\-r\?/

Try /=\?koi8-r\?/i

NB: You can also use [?] (a character class consisting of a single
question mark), but that's a little clumsy.
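The escaping advice is easy to check. The sketch below uses Python's re, which, like Perl, treats a backslash before punctuation such as = or - as a literal character, so the extra escapes are harmless. The sample strings are made up for illustration.

```python
import re

subject = "=?KOI8-R?B?0NLJ18XU?="

# Heavily escaped version: \= and \- are unnecessary but harmless.
over_escaped = re.compile(r"\=\?koi8\-r\?")
# Suggested version: only ? needs escaping; IGNORECASE mirrors Perl's /i.
suggested = re.compile(r"=\?koi8-r\?", re.IGNORECASE)
# Alternative: a character class holding one literal question mark.
bracketed = re.compile(r"=[?]koi8-r[?]", re.IGNORECASE)
# Unescaped: each ? becomes a quantifier ("optional =", "optional r").
unescaped = re.compile(r"=?koi8-r?", re.IGNORECASE)

print(bool(over_escaped.search(subject)))         # False: no /i, so case matters
print(bool(suggested.search(subject)))            # True
print(bool(bracketed.search(subject)))            # True
print(bool(unescaped.search("charset koi8-u")))   # True: matches far too much
print(bool(suggested.search("charset koi8-u")))   # False
```

The over-escaped pattern is functionally identical to the suggested one except for the missing case-insensitive flag.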

-- 
  John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
  [EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79