
FW: Rule for Russian character sets

2008-02-14 Thread Michael Hutchinson


-Original Message-
From: John Hardin [mailto:[EMAIL PROTECTED] 
Sent: Friday, 15 February 2008 3:07 p.m.
To: Michael Hutchinson
Subject: RE: Rule for Russian character sets

On Fri, 15 Feb 2008, Michael Hutchinson wrote:

> Now what about matching a question mark and an equals sign?

An equals sign isn't special but a question mark is.

> Except for a backslash, but I've heard no testimony that would suggest
> this line will work with SpamAssassin, and like before, the SARE Regular
> Expressions Expander tool doesn't like it (and may have put undue doubt
> in my head):
>
> /\=\?koi8\-r\?/

Try /=\?koi8-r\?/i

NB: You can also use [?] (a character set consisting of a single
question mark) but that's a little clumsy.
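
A minimal sketch of the escaping advice above, written in Python because its `re` module treats these particular patterns the same way Perl does. The sample header is hypothetical and is not taken from real mail.

```python
import re

# Pattern suggested above: only the question marks need a backslash.
escaped = re.compile(r"=\?koi8-r\?", re.IGNORECASE)

# The "[?]" alternative: a character class holding a single question mark.
char_class = re.compile(r"=[?]koi8-r[?]", re.IGNORECASE)

# The over-escaped original. The extra backslashes before "=" and "-" are
# harmless, but without the /i flag it is case-sensitive.
over_escaped = re.compile(r"\=\?koi8\-r\?")

# What happens if "?" is left unescaped: it becomes a quantifier, so this
# means "optional '=', then 'koi8-', then optional 'r'".
unescaped = re.compile(r"=?koi8-r?", re.IGNORECASE)

# Hypothetical sample header (assumption, for illustration only).
subject = "Subject: =?KOI8-R?B?8NLJ18XU?="
plain = "plain text mentioning koi8-r"

print(bool(escaped.search(subject)))       # escaped form, case-insensitive
print(bool(char_class.search(subject)))    # character-class form
print(bool(over_escaped.search(subject)))  # case-sensitive, upper-case input
print(bool(unescaped.search(plain)))       # unescaped form on plain text
```

The unescaped variant also fires on plain text that merely mentions the charset name, which is why the question marks have to be escaped (or wrapped in a class).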

-- 
  John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
  [EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  It may be possible to start a programme of weapon registration as a
  first step towards the physical collection phase. ... Assurances
  must be provided, and met, that the process of registration will
  not lead to immediate weapons seizures by security forces.
   -- the UN, who "doesn't want to confiscate guns"
---
  8 days until George Washington's 276th Birthday

FW: Rule for Russian character sets (=?koi8-r? not quite a charset)

2008-02-17 Thread Michael Hutchinson

-Original Message-
> > We don't want to "only allow" the English locale, because we (here at
> > my work) do not want our international clients (non Russian) to be
> > denied email service.
> 
> ok_locales  en ja ko th zh
> 
> This will allow anything but Cyrillic char sets. Please note that en
> does *not* mean "English locale" despite its name. It applies to all
> Western charsets, including German Umlauts, Swedish, French, Turkish,
> etc. Basically everything that uses the characters in this post, plus
> language specific chars.
 
Ok now we're talking turkey. Thanks for providing the much needed
clarity on ok_locales. I may just employ that technique yet, pending
whether we get any more Russian spam through the gates.

> Sorry, I did not mean to troll nor any kind of offense.

You have my apologies, as being a Friday afternoon, I was pretty sick of
work and shouldn't have taken it out on you or the list. Sorry.
 
> However, you missed my point. Getting detailed with REs is a good thing,
> sure. I was not arguing against that -- but the RE in question does not
> properly handle charset encoding. See the Subject for an example which
> is not encoding, but will be matched by your rule.
> 
> My point was that the rule discussed aims at being something that it
> unfortunately is not, because charset encoding is slightly more complex
> and definitely requires a closing part. A Regular Expression that does
> this can be found in check_for_faraway_charset_in_headers() in
> HeaderEval.pm:
>   $hdr =~ /=\?(.+?)\?.\?.*?\?=/g
> 
> Hence my re-inventing-the-wheel analogy. And these wheels are quite
> flexible, too. ;-)
> 
> Also, your rule applies to the Subject only, whereas ok_locales does
> check all MIME parts and will trigger on Russian spam with a "western"
> Subject.
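
The point about the closing part can be sketched by running both patterns against this thread's own Subject. This is a Python stand-in for the Perl regexes, not SpamAssassin code. The helper name `charsets` and the second sample header are assumptions made for illustration.

```python
import re

# Loose rule discussed in the thread: only looks for the opening "=?koi8-r?".
loose = re.compile(r"=\?koi8-r\?", re.IGNORECASE)

# Pattern quoted from HeaderEval.pm: =?charset?encoding?text?=
# It requires the encoding letter and the closing "?=".
encoded_word = re.compile(r"=\?(.+?)\?.\?.*?\?=")


def charsets(header):
    """Return the charset names of all complete encoded words in a header."""
    return [m.group(1).lower() for m in encoded_word.finditer(header)]


# This thread's Subject: contains "=?koi8-r?" but is not an encoded word.
thread_subject = (
    "Subject: Rule for Russian character sets (=?koi8-r? not quite a charset)"
)

# Hypothetical header with a complete encoded word (assumption).
real_subject = "Subject: =?koi8-r?B?8NLJ18XU?="

print(bool(loose.search(thread_subject)))  # loose rule on the thread Subject
print(charsets(thread_subject))            # complete encoded words found
print(charsets(real_subject))
```

The loose rule fires on both headers, while the full pattern only reports a charset for the header that has the encoding letter and the closing `?=`.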

The RE in question (mine) was not written just for the Subject; a
separate rule was written for the raw From: line as well. As we only
score spam here and leave filing it to the MUA (unless a score of 25 is
reached, where SA bins it), scoring against the Subject and From lines
makes sense, because if you simply matched (=?koi8-r?) in the Subject it
would not score high enough on its own to be filtered or blocked. (I'm
trying to employ what I've learned from the SA webpage about writing
multiple low-scoring rules, instead of a few big-scoring ones).
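
A rough Python sketch of that additive approach, not SpamAssassin itself. The rule names and the 1.5 scores are hypothetical, since the actual values are not given here. The 5.0 tagging threshold assumes SpamAssassin's default required_score, and 25 is the discard level mentioned above.

```python
import re

KOI8R = re.compile(r"=\?koi8-r\?", re.IGNORECASE)

# Hypothetical rule names and scores (assumptions, not the real config).
RULES = {
    "KOI8R_SUBJECT": ("Subject", KOI8R, 1.5),
    "KOI8R_FROM": ("From", KOI8R, 1.5),
}

TAG_AT = 5.0    # assumption: SpamAssassin's default required_score
BIN_AT = 25.0   # from the message above: mail scoring 25 is binned


def score(headers):
    """Sum the scores of all rules whose pattern hits its header."""
    total = 0.0
    for header, pattern, points in RULES.values():
        if pattern.search(headers.get(header, "")):
            total += points
    return total


def verdict(total):
    """Map a total score to what happens to the message."""
    if total >= BIN_AT:
        return "discard"
    if total >= TAG_AT:
        return "tag"
    return "deliver"


# Hypothetical message headers (assumption).
msg = {
    "Subject": "=?koi8-r?B?8NLJ18XU?=",
    "From": "=?koi8-r?B?8NLJ18XU?= <sender@example.invalid>",
}
total = score(msg)
print(total, verdict(total))
```

With these made-up numbers, both charset rules together still leave the message below the tagging threshold, so other rules (such as the phrase rules) have to contribute before anything is filtered.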

I can see it is flawed, but have to also admit that it is working rather
well at the moment. Mind you, I have taken the time to translate some of
the Russian spam, work out spammy phrases, and then write rules so that
SA scores against those phrases.

> Hope this clarifies my previous posts and is appreciated again...

Your posts are appreciated, and sorry for the mean comment.

Cheers,
Mike

Re: FW: Rule for Russian character sets (=?koi8-r? not quite a charset)

2008-02-18 Thread Karsten Bräckelmann

On Mon, 2008-02-18 at 09:36 +1300, Michael Hutchinson wrote:
> > > We don't want to "only allow" the English locale, because we (here at
> > > my work) do not want our international clients (non Russian) to be
> > > denied email service.
> > 
> > ok_locales  en ja ko th zh
> > 
> > This will allow anything but Cyrillic char sets. Please note that en
> > does *not* mean "English locale" despite its name. It applies to all
> > > Western charsets, including German Umlauts, Swedish, French, Turkish,
> > etc. Basically everything that uses the characters in this post, plus
> > language specific chars.
>  
> Ok now we're talking turkey. Thanks for providing the much needed
> clarity on ok_locales. I may just employ that technique yet, pending
> whether we get any more Russian spam through the gates.
> 
> > Sorry, I did not mean to troll nor any kind of offense.
> 
> You have my apologies, as being a Friday afternoon, I was pretty sick of
> work and shouldn't have taken it out on you or the list. Sorry.

> > Hope this clarifies my previous posts and is appreciated again...
> 
> Your posts are appreciated, and sorry for the mean comment.

Thanks.  No offense taken, no harm done, don't worry. :)

  guenther


-- 
char *t="[EMAIL PROTECTED]";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
