Re: FW: Rule for Russian character sets (=?koi8-r? not quite a charset)
On Mon, 2008-02-18 at 09:36 +1300, Michael Hutchinson wrote:

We don't want to only allow the English locale, because we (here at my work) do not want our international clients (non-Russian) to be denied email service.

ok_locales en ja ko th zh

This will allow anything but Cyrillic charsets. Please note that en does *not* mean English locale despite its name. It applies to all Western charsets, including German umlauts, Swedish, French, Turkish, etc. Basically everything that uses the characters in this post, plus language-specific characters.

Ok, now we're talking turkey. Thanks for providing the much-needed clarity on ok_locales. I may just employ that technique yet, pending whether we get any more Russian spam through the gates.

Sorry, I did not mean to troll or cause any kind of offense.

You have my apologies; being a Friday afternoon, I was pretty sick of work and shouldn't have taken it out on you or the list. Sorry.

Hope this clarifies my previous posts and is appreciated again...

Your posts are appreciated, and sorry for the mean comment.

Thanks. No offense taken, no harm done, don't worry. :)

guenther
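To make the ok_locales idea concrete, here is a rough Python model: map each charset declared in a message to a locale and flag any charset whose locale is not in the allowed list. The charset table is a hypothetical, abbreviated stand-in, not SpamAssassin's actual mapping, and the function name faraway_charsets is made up. The locale codes follow the ok_locales documentation (en, ja, ko, ru, th, zh), so leaving out ru is what excludes Cyrillic. The sketch walks every MIME part, which is why a koi8-r body part is caught even behind a Western Subject.

```python
from email import message_from_string

# HYPOTHETICAL, abbreviated charset -> locale table for illustration only.
# SpamAssassin keeps its own, more complete mapping; do not rely on this one.
CHARSET_LOCALE = {
    "us-ascii": "en", "iso-8859-1": "en", "iso-8859-15": "en", "windows-1252": "en",
    "iso-2022-jp": "ja", "shift_jis": "ja", "euc-kr": "ko", "tis-620": "th",
    "gb2312": "zh", "big5": "zh",
    "koi8-r": "ru", "windows-1251": "ru", "iso-8859-5": "ru",
}

def faraway_charsets(raw_message, ok_locales):
    """Return declared charsets (from any MIME part) whose locale is not allowed."""
    found = []
    for part in message_from_string(raw_message).walk():
        charset = part.get_content_charset()  # lower-cased, or None if absent
        if charset is None:
            continue
        locale = CHARSET_LOCALE.get(charset)  # unknown charsets are ignored here
        if locale is not None and locale not in ok_locales:
            found.append(charset)
    return found

# Illustrative message: Western Subject, but one body part declares koi8-r.
SAMPLE = """\
Subject: Great offer
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset="iso-8859-1"

Hello
--XYZ
Content-Type: text/plain; charset="koi8-r"

(Cyrillic text would go here)
--XYZ--
"""

print(faraway_charsets(SAMPLE, {"en", "ja", "ko", "th", "zh"}))  # ['koi8-r']
```

Adding "ru" to the allowed set makes the same message come back clean, which mirrors why the suggested line simply omits ru.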
FW: Rule for Russian character sets (=?koi8-r? not quite a charset)
-----Original Message-----
[snip]

We don't want to only allow the English locale, because we (here at my work) do not want our international clients (non-Russian) to be denied email service.

ok_locales en ja ko th zh

This will allow anything but Cyrillic charsets. Please note that en does *not* mean English locale despite its name. It applies to all Western charsets, including German umlauts, Swedish, French, Turkish, etc. Basically everything that uses the characters in this post, plus language-specific characters.

Ok, now we're talking turkey. Thanks for providing the much-needed clarity on ok_locales. I may just employ that technique yet, pending whether we get any more Russian spam through the gates.

Sorry, I did not mean to troll or cause any kind of offense.

You have my apologies; being a Friday afternoon, I was pretty sick of work and shouldn't have taken it out on you or the list. Sorry.

However, you missed my point. Getting detailed with REs is a good thing, sure. That was not what I was getting at, though: the RE in question does not properly handle charset encoding. See the Subject for an example which is not an encoding, but will be matched by your rule.

My point was that the rule discussed aims at being something that it unfortunately is not, because charset encoding is slightly more complex and definitely requires a closing part. A regular expression that does this can be found in check_for_faraway_charset_in_headers() in HeaderEval.pm:

  $hdr =~ /=\?(.+?)\?.\?.*?\?=/g

Hence my re-inventing-the-wheel analogy. And these wheels are quite flexible, too. ;-)

Also, your rule applies to the Subject only, whereas ok_locales does check all MIME parts and will trigger on Russian spam with a Western Subject.

The RE in question (mine) was not just written for the Subject; a separate rule was written for the raw From: line as well.
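The difference guenther describes can be checked quickly in Python, whose re module treats these particular constructs the way Perl does. The first pattern below is the narrow one from the thread; the second is the encoded-word pattern quoted above from HeaderEval.pm, which requires the closing ?= and captures the charset. The helper name encoded_charsets and the sample strings are illustrative assumptions, not SpamAssassin code.

```python
import re

# Narrow pattern from the thread: a literal "=?koi8-r?" anywhere in the text.
NAIVE = re.compile(r"=\?koi8-r\?", re.IGNORECASE)

# Encoded-word pattern as quoted from check_for_faraway_charset_in_headers()
# in HeaderEval.pm: "=?" charset "?" one-char encoding "?" text "?=".
# Group 1 captures the charset.
ENCODED_WORD = re.compile(r"=\?(.+?)\?.\?.*?\?=")

def encoded_charsets(header):
    """Return the (lower-cased) charset of every encoded-word in a header."""
    return [charset.lower() for charset in ENCODED_WORD.findall(header)]

# The thread's own Subject: contains "=?koi8-r?" but is not an encoded-word.
plain = "Re: FW: Rule for Russian character sets (=?koi8-r? not quite a charset)"
# A genuine encoded-word: "Привет" in KOI8-R, base64 ("B") encoded.
encoded = "=?koi8-r?B?8NLJ18XU?="

print(bool(NAIVE.search(plain)), encoded_charsets(plain))      # True []
print(bool(NAIVE.search(encoded)), encoded_charsets(encoded))  # True ['koi8-r']
```

The narrow rule fires on both strings; only the pattern with the closing part tells the plain Subject apart from a real encoded header.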
As we only score spam here and leave filing it to the MUA (unless a score of 25 is reached, at which point SA bins it), scoring against the Subject and From lines makes reasonable sense, because if you simply used (=?koi8-r?) in the Subject, it would not score high enough on its own to be filtered or blocked. (I'm trying to employ what I've learned from the SA web page about writing multiple low-scoring rules instead of a few high-scoring ones.)

I can see it is flawed, but I also have to admit that it is working rather well at the moment. Mind you, I have taken the time to translate some of the Russian spam, work out spammy phrases, and then quote those phrases in rules for SA to score against.

Hope this clarifies my previous posts and is appreciated again...

Your posts are appreciated, and sorry for the mean comment.

Cheers,
Mike
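The "several low-scoring rules" approach Mike describes can be modelled as a plain sum of per-rule scores compared against thresholds. Everything in this sketch is hypothetical: the rule names, patterns and scores are invented, the Cyrillic word stands in for a translated spam phrase, and matching is done against raw, undecoded header strings (roughly what SpamAssassin's :raw header modifier provides). 5.0 is SpamAssassin's default required_score; 25 is the site-specific discard threshold from the post.

```python
import re

# HYPOTHETICAL rules for illustration: (name, field, pattern, score).
# Names, patterns and scores are invented, not taken from any real rule set.
RULES = [
    ("LOCAL_SUBJ_KOI8R", "Subject", re.compile(r"=\?koi8-r\?", re.I), 1.5),
    ("LOCAL_FROM_KOI8R", "From", re.compile(r"=\?koi8-r\?", re.I), 1.5),
    ("LOCAL_RU_PHRASE", "body", re.compile(r"скидка", re.I), 2.5),  # stand-in phrase
]

REQUIRED_SCORE = 5.0   # SpamAssassin's default required_score
DISCARD_SCORE = 25.0   # the site's own "SA bins it" threshold from the post

def score(fields):
    """Add up the scores of every rule whose pattern matches its field."""
    total, hits = 0.0, []
    for name, field, pattern, points in RULES:
        if pattern.search(fields.get(field, "")):
            hits.append(name)
            total += points
    return total, hits

def verdict(total):
    """Map a total score to what happens to the message at this site."""
    if total >= DISCARD_SCORE:
        return "discard"
    if total >= REQUIRED_SCORE:
        return "tag"      # tagged as spam, filing left to the MUA
    return "deliver"      # delivered untagged

subject_only = {"Subject": "=?koi8-r?B?8NLJ18XU?="}
everything = dict(subject_only,
                  From="=?koi8-r?B?8NLJ18XU?= <sender@example.com>",
                  body="скидка 50%")

for fields in (subject_only, everything):
    total, hits = score(fields)
    print(total, verdict(total), hits)
```

No single rule crosses the threshold on its own; only the combination of several weak signals does, which is the design point of spreading the score across rules.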
FW: Rule for Russian character sets
-----Original Message-----
From: John Hardin [mailto:[EMAIL PROTECTED]]
Sent: Friday, 15 February 2008 3:07 p.m.
To: Michael Hutchinson
Subject: RE: Rule for Russian character sets

On Fri, 15 Feb 2008, Michael Hutchinson wrote:

Now what about matching a question mark and an equals sign?

An equals sign isn't special, but a question mark is.

Unless escaped with a backslash. But I've heard no testimony that would suggest this line will work with SpamAssassin, and as before, the SARE Regular Expressions Expander tool doesn't like it (and may have put undue doubt in my head):

  /\=\?koi8\-r\?/

Try

  /=\?koi8-r\?/i

NB: You can also use [?] (a character set consisting of a single question mark), but that's a little clumsy.

--
John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
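John's answer can be checked with Python's re module, which treats these particular constructs (escaped punctuation, a one-character class, the ? quantifier, case-insensitive matching) the way Perl does. This is only a stand-alone check, not how SpamAssassin loads rules, and both sample header strings are made up for illustration. It shows that the over-escaped form from the question, the suggested form and the [?] form behave the same, while leaving the question marks unescaped changes the meaning of the pattern.

```python
import re

sample = "Subject: =?KOI8-R?B?8NLJ18XU?="
unrelated = "Content-Type: text/plain; charset=koi8-u"

# Form from the question: "\=" and "\-" are harmless, unnecessary escapes.
over_escaped = re.compile(r"\=\?koi8\-r\?", re.IGNORECASE)
# John Hardin's suggestion: escape only the "?"; Perl's /i maps to IGNORECASE.
suggested = re.compile(r"=\?koi8-r\?", re.IGNORECASE)
# The alternative he mentions: "[?]" is a class holding one literal "?".
char_class = re.compile(r"=[?]koi8-r[?]", re.IGNORECASE)
# Left unescaped, "?" is a quantifier: "=?" is an optional "=", "r?" an
# optional "r", so this also matches the unrelated koi8-u header.
unescaped = re.compile(r"=?koi8-r?", re.IGNORECASE)

print([bool(p.search(sample)) for p in (over_escaped, suggested, char_class)])
# [True, True, True]
print(bool(suggested.search(unrelated)), bool(unescaped.search(unrelated)))
# False True
```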