Re: [Mailman-Users] Chinese characters spam filter?

Yasuhito FUTATSUKI Sat, 09 Jul 2016 08:53:02 -0700

Hi,

On 07/07/16 04:41, Mark Sapiro wrote:
> That should be
> 
> ^Subject:.*[list of all Chinese characters here]
> 
> except that if your list's preferred language is English and you haven't
> changed Mailman's character set for English from ASCII to UTF-8, the
> text you are matching against won't contain any Chinese characters
> because the decoded headers are converted to the character set of the
> list's preferred language and all the Chinese characters will be
> converted to '?'.
> 
> You might try something like
> 
> ^Subject:.*\?{4,}
> 
> This will match any subject that contains 4 or more non-ascii characters
> in a row. Unfortunately, it will also match
> 
> Subject: WTF is happening here????
> 
> but you could try some number other than 4 but greater than 1


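How about using 'backslashreplace' instead of 'replace' when encoding the
headers to the character set of the list's preferred language in
Mailman/Handlers/SpamDetect.py? Characters that cannot be encoded would then
appear as \uXXXX escapes instead of '?'.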

With that change, a suitable pattern for this case would be

^Subject:.*(\\u[0-9a-f]{4}){4}

It also matches strings like 
'What does the string "\\u6709\\u9650\\u516c\\u53f8" mean?', though.
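
A minimal sketch of what 'backslashreplace' produces and how the pattern behaves, under the same assumptions as before: us-ascii as the list charset, a made-up sample subject, and Python 3 rather than the Python 2 that Mailman 2.1 runs on. For characters in the Basic Multilingual Plane the escape format is \uXXXX with lowercase hex digits.

```python
import re

# Hypothetical sample subject containing four Chinese characters.
subject = "Subject: \u6709\u9650\u516c\u53f8 special offer"

# With 'backslashreplace', each unencodable character becomes a literal
# backslash-u escape (six ASCII characters) instead of a single '?'.
escaped = subject.encode("us-ascii", "backslashreplace").decode("us-ascii")

# Four escapes in a row, written here as a raw string so that \\ in the
# regex stands for one literal backslash.
pattern = re.compile(r"^Subject:.*(\\u[0-9a-f]{4}){4}")

spam_hit = pattern.search(escaped) is not None

# Plain question marks no longer trigger the rule ...
qmarks_hit = pattern.search("Subject: WTF is happening here????") is not None

# ... but a subject that literally spells out the escapes still does.
literal = r'Subject: What does the string "\u6709\u9650\u516c\u53f8" mean?'
literal_hit = pattern.search(literal) is not None
```

Note that characters below U+0100, such as accented Latin letters, are escaped as \xNN rather than \uXXXX, so this pattern ignores them.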

=== modified file 'Mailman/Handlers/SpamDetect.py'
--- Mailman/Handlers/SpamDetect.py      2016-01-18 23:56:58 +0000
+++ Mailman/Handlers/SpamDetect.py      2016-07-09 00:47:33 +0000
@@ -86,7 +86,7 @@
                 # unicode it as iso-8859-1 which may result in a garbled
                 # mess, but we have to do something.
                 uvalue += unicode(frag, 'iso-8859-1', 'replace')
-        headers += '%s: %s\n' % (h, uvalue.encode(cset, 'replace'))
+        headers += '%s: %s\n' % (h, uvalue.encode(cset, 'backslashreplace'))
     return headers

-- 
Yasuhito FUTATSUKI <futat...@poem.co.jp>
------------------------------------------------------
Mailman-Users mailing list Mailman-Users@python.org
https://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://wiki.list.org/x/AgA3
Security Policy: http://wiki.list.org/x/QIA9
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: 
https://mail.python.org/mailman/options/mailman-users/archive%40jab.org
