Wolfgang Rohdewald added the comment:

If you cannot offer a solution for arbitrary Unicode, you have no solution at 
all. After all, that is what Unicode is about: support ALL languages, not only 
your own.

I do not quite understand why you think this is not a bug.

If cgitb encodes non-ASCII characters as numeric character references like 
& # x e 4 ; (remove spaces), the browser does not have to guess the encoding: 
it will always show the correct character. This works for all of Unicode. See 
https://en.wikipedia.org/wiki/Unicode_and_HTML#Numeric_character_references
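
To illustrate the point about numeric character references, here is a minimal 
sketch. The helper name to_hex_ncr and the sample string are made up for this 
example and are not part of cgitb. html.unescape stands in for what a browser 
does when it resolves the references.

```python
import html


def to_hex_ncr(text):
    """Replace each non-ASCII character with a hexadecimal numeric
    character reference; ASCII characters pass through unchanged.
    (Hypothetical helper, for illustration only.)"""
    return "".join(
        ch if ord(ch) < 128 else "&#x{:x};".format(ord(ch))
        for ch in text
    )


# Sample text with a Latin-1 character and two CJK characters.
sample = "K\u00e4se \u4e2d\u6587"
encoded = to_hex_ncr(sample)

# The encoded form is pure ASCII, so no charset guessing is involved.
print(encoded)

# Resolving the references gives back the original text.
print(html.unescape(encoded) == sample)
```

The output is plain ASCII no matter which characters the input contains, which 
is why this approach does not depend on the page encoding.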

So this bug is fixable, I am reopening it.

For Python3, the fix is actually very simple: Do not write doc but 
str(doc.encode('ascii', 'xmlcharrefreplace')), like in the attached patch. This 
patch works for me, but there may be code paths it does not yet cover. My 
source file is encoded in UTF-8; other source file encodings should be tested 
too. I do not know whether cgitb correctly honors a source file header like 
# -*- coding: utf-8 -*-
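
A minimal sketch of the encoding step in Python 3, not the patch itself. The 
helper name ascii_safe and the sample markup are assumptions for illustration. 
One detail to watch: in Python 3, calling str() on a bytes object without an 
encoding returns its repr (b'...'), so this sketch decodes the bytes back to 
text explicitly.

```python
def ascii_safe(doc):
    """Return doc with every non-ASCII character replaced by a decimal
    numeric character reference. (Hypothetical helper, not cgitb API.)"""
    # encode() with the 'xmlcharrefreplace' error handler yields ASCII bytes;
    # decode('ascii') turns them back into a str without changing content.
    return doc.encode("ascii", "xmlcharrefreplace").decode("ascii")


doc = "<p>Wert: \u00e4</p>"
safe = ascii_safe(doc)
print(safe)

# For comparison: wrapping the bytes in str() keeps the b'...' repr.
wrapped = str(doc.encode("ascii", "xmlcharrefreplace"))
print(wrapped)
```

Existing markup such as <p> is left untouched, because the error handler only 
fires for characters that cannot be represented in ASCII.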

Fixing this for Python2 is certainly doable too but perhaps more difficult 
because a Python2 str() may have an unknown encoding.

----------
keywords: +patch
resolution: not a bug -> 
status: closed -> open
Added file: http://bugs.python.org/file37047/22746.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue22746>
_______________________________________