On 11:59 am, [EMAIL PROTECTED] wrote:
Sorry, I wasn't clear enough. I'll try to explain further...
Let's assume we have a filename like this:
0xc2 0xa9 0x2f 0x7f
The first two bytes are the copyright sign encoded in UTF-8, followed by a
slash (0x2f, path separator) and a character encoded in an unknown codepage
(0x7f is not ASCII!).
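
For readers who want to poke at the bytes, here is a minimal sketch of what
a strict UTF-8 decode does with that filename. One caveat: 0x7f on its own
decodes cleanly as U+007F (the DEL control character), so the sketch also
tries 0xff as a stand-in for a byte from an unknown codepage. The 0xff
byte and all variable names are assumptions for illustration and do not
come from the message.

```python
# The filename from the message, as raw bytes.
name = bytes([0xC2, 0xA9, 0x2F, 0x7F])

# Strict UTF-8 decoding succeeds here: 0xc2 0xa9 is the copyright sign,
# 0x2f is "/", and 0x7f is the DEL control character (U+007F).
decoded = name.decode("utf-8")

# Hypothetical variant: replace the last byte with 0xff, which can never
# occur in well-formed UTF-8, to stand in for a truly undecodable byte.
stand_in = bytes([0xC2, 0xA9, 0x2F, 0xFF])

try:
    stand_in.decode("utf-8")
    failed_at = None
except UnicodeDecodeError as exc:
    # exc.start is the index of the first offending byte.
    failed_at = exc.start

print(repr(decoded), failed_at)
```

The point is only that a byte string used as a filename may decode partly
or not at all, so any str representation has to decide what to do with
the leftover bytes.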
Originally I thought that this was a valid idea, but then it became
clear that this could be a problem. Consider a filename which includes
a UTF-8 encoding of a PUA code point.
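
The collision can be sketched as follows. This assumes a scheme in which
each undecodable byte b is mapped to the PUA code point U+E000 + b. The
base value, the handler name "pua_escape_sketch" and the helper
decode_filename are all hypothetical, since the exact mapping is not
spelled out in this message.

```python
import codecs

PUA_BASE = 0xE000  # hypothetical base of the escape range


def _to_pua(exc):
    """Error handler: map each undecodable byte to PUA_BASE + byte."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join(chr(PUA_BASE + b) for b in bad), exc.end


codecs.register_error("pua_escape_sketch", _to_pua)


def decode_filename(raw):
    """Decode as UTF-8, escaping undecodable bytes into the PUA."""
    return raw.decode("utf-8", "pua_escape_sketch")


# A filename consisting of one byte that is not valid UTF-8 ...
undecodable = b"\xff"
# ... and a different filename that is valid UTF-8 for a PUA code point.
genuine_pua = chr(PUA_BASE + 0xFF).encode("utf-8")

# Two distinct byte strings end up as the same str, so the mapping
# cannot be reversed unambiguously.
print(undecodable != genuine_pua)
print(decode_filename(undecodable) == decode_filename(genuine_pua))
```

Under these assumptions both comparisons are true, which is exactly the
ambiguity: once decoded, there is no way to tell which of the two files
was meant.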
I'm not sure if the use I proposed is correct according to the intended
use of the PUA. I know that ideally no such string would escape from
Python, i.e. it should only be visible internally. I would guess that
that is something the PUA was intended for.
Viewing the PUA with GNOME charmap, I can see that many code points
there have character renderings on my Ubuntu system. I have to assume,
therefore, that there are other (and potentially conflicting) uses for
this Unicode feature.