On 07/13/16 03:47, Mark Sapiro wrote:
On 07/12/2016 12:03 AM, Stephen J. Turnbull wrote:
Mark Sapiro writes:
> On 7/8/16 6:04 PM, Yasuhito FUTATSUKI wrote:
> >
> > How about using 'backslashreplace' instead of 'replace' to encode to
> > list's preferred language in Mailman/Handlers/SpamDetect.py ?
I see you've already done this, but ...
I would consider xmlcharrefreplace as well. XML character references are
something most people (users/moderators) have seen; backslash escapes
they're not going to recognize unless they're programmers.
I have now switched to xmlcharrefreplace instead of backslashreplace as
I agree this will be easier to explain and understand. I was uncertain
about this at first because I didn't know that xmlcharrefreplace
wouldn't use entity names in some cases, but it appears that it only
uses numeric references.
I have no strong objection to switching to xmlcharrefreplace, because my
main concern is to distinguish a literal '?' from replaced characters.
Personally, though, I prefer backslashreplace for looking characters up
in a Unicode table: the numeric references from xmlcharrefreplace are
decimal, while backslashreplace uses hexadecimal, and most Unicode
tables express code points in hexadecimal, like U+4E8C.
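
To make the comparison concrete, here is a small sketch of what each
error handler produces for U+4E8C. It is written for Python 3 and uses
ASCII as a stand-in for the list's charset; the helper name encode_with
is only for illustration and is not part of Mailman.

```python
# Compare how each codec error handler renders a character that the
# target charset cannot represent.
text = "two: \u4e8c"  # U+4E8C, a CJK character outside ASCII


def encode_with(handler, s=text, charset="ascii"):
    """Encode s to charset using the named error handler."""
    return s.encode(charset, handler)


for handler in ("replace", "xmlcharrefreplace", "backslashreplace"):
    print(handler, encode_with(handler))

# replace            b'two: ?'          (same as a literal '?')
# xmlcharrefreplace  b'two: &#20108;'   (decimal: 20108 == 0x4E8C)
# backslashreplace   b'two: \\u4e8c'    (hexadecimal, matches U+4E8C)
```

Note that xmlcharrefreplace emits a numeric reference even for
characters that have a named HTML entity: "\u00e9" becomes &#233;, not
&eacute;.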
At an earlier stage, you could also just do a trial re-encoding with
the list preferred codec, set errors = 'strict', catch the Exception,
and re-raise as a Hold (or Discard, according to per-list policy).
(Then discard the output.) I would prefer this solution, I think, as
creating regexps turns out to be an issue for many list owners.
People would have to learn not to use emoji in headers, of course, or
suffer moderation delays or even discards.
I think this will have too many undesired effects. Not just emoji, but
accented latin or CJK characters, etc. in display names would I think be
real problems, even on English language lists.
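
For reference, the trial re-encoding idea discussed above could look
roughly like the following Python 3 sketch. The function name
representable is hypothetical and this is not code from Mailman; a real
handler would hold or discard the message, according to list policy,
where this sketch returns False.

```python
def representable(header_value, charset):
    """Return True if header_value encodes cleanly in charset."""
    try:
        # errors='strict' raises on the first unencodable character;
        # the encoded output itself is thrown away.
        header_value.encode(charset, "strict")
    except UnicodeEncodeError:
        return False
    return True


# Plain ASCII passes on an ASCII list.
print(representable("Jane Doe", "ascii"))              # True
# An accented display name already fails on the same list,
# which is the side effect raised in the objection above.
print(representable("Jos\u00e9 Garc\u00eda", "ascii"))  # False
# The same name is fine if the list charset is Latin-1 ...
print(representable("Jos\u00e9 Garc\u00eda", "iso-8859-1"))  # True
# ... but CJK text is not.
print(representable("\u4e8c", "iso-8859-1"))            # False
```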
I suggest adding a variable in mm_cfg.py to select the handler: 'replace'
(for backward compatibility), 'xmlcharrefreplace', or 'backslashreplace'.
I also think it would be better to hold the string attributes of mm_cfg
and the mlist class as Unicode rather than encoded in the charset of the
site language or the list's preferred language (though I know that would
be a lot of trouble to do).
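
As a rough sketch of the configurable handler (Python 3 syntax; the
setting name SPAMDETECT_ENCODE_ERRORS and the helper
encode_for_matching are made up for illustration and do not exist in
Mailman):

```python
# Hypothetical site-wide setting, as it might appear in mm_cfg.py.
SPAMDETECT_ENCODE_ERRORS = "xmlcharrefreplace"

# The three handlers proposed in this thread.
ALLOWED_HANDLERS = ("replace", "xmlcharrefreplace", "backslashreplace")


def encode_for_matching(value, charset, handler=SPAMDETECT_ENCODE_ERRORS):
    """Encode a header value for regexp matching with the chosen handler.

    An unrecognized handler name falls back to 'replace', which is the
    old behavior.
    """
    if handler not in ALLOWED_HANDLERS:
        handler = "replace"
    return value.encode(charset, handler)


print(encode_for_matching("\u4e8c", "ascii"))
print(encode_for_matching("\u4e8c", "ascii", "backslashreplace"))
print(encode_for_matching("\u4e8c", "ascii", "no-such-handler"))
```

Falling back to 'replace' keeps existing header filter rules working on
sites that never set the variable or that mistype its value.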
--
Yasuhito FUTATSUKI <futat...@poem.co.jp>