Greg Ewing writes:

> Stephen J. Turnbull wrote:
> > What should happen internally is that all undecodable characters
> > (which PUA characters are by definition for standard codecs) are
> > mapped to unused codepoints in the PUA, chosen by Python.
>
> You mean chosen dynamically?

Yes.

> What happens if these PUA characters get encoded some other way,

You can't win that, because Unicode is the only encoding that attempts
to guarantee even the possibility of round-tripping. The only thing
you can win is if it's the *same* character set (which might be used
by multiple encodings), and then we record the character set and the
code point. That's the best we can do in theory.

The main problem with this scheme that I know of is that if you have a
Python string that contains such a code point, you'll need to somehow
include the information about the original encoding when pickling and
the like.

_______________________________________________
Python-3000 mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-3000
Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
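
For readers who want to see the scheme in concrete terms, here is a rough sketch of one way it might look. It is not code from the thread and not anything Python actually does: the class name PuaRegistry and its methods decode, encode and origin are hypothetical, made up for illustration. It assumes a stateless codec (ASCII, UTF-8 and the like), records one (character set, byte value) pair per allocated code point, and for brevity does not check whether the input already contains PUA characters of its own.

```python
import codecs


class PuaRegistry:
    """Hypothetical table mapping undecodable bytes to PUA code points."""

    PUA_START = 0xE000  # first code point of the BMP Private Use Area
    PUA_END = 0xF8FF    # last code point of the BMP Private Use Area

    def __init__(self):
        self._forward = {}  # (charset, byte value) -> PUA character
        self._reverse = {}  # PUA character -> (charset, byte value)
        self._next = self.PUA_START

    def _allocate(self, charset, byte):
        # Choose a PUA code point dynamically the first time a pair is seen.
        key = (charset, byte)
        if key not in self._forward:
            if self._next > self.PUA_END:
                raise RuntimeError("no unused PUA code points left")
            ch = chr(self._next)
            self._next += 1
            self._forward[key] = ch
            self._reverse[ch] = key
        return self._forward[key]

    def decode(self, data, charset):
        charset = codecs.lookup(charset).name  # normalize aliases
        out = []
        pos = 0
        while pos < len(data):
            try:
                out.append(data[pos:].decode(charset))
                break
            except UnicodeDecodeError as exc:
                # exc.start and exc.end are relative to the slice data[pos:].
                out.append(data[pos:pos + exc.start].decode(charset))
                for byte in data[pos + exc.start:pos + exc.end]:
                    out.append(self._allocate(charset, byte))
                pos += exc.end
        return "".join(out)

    def origin(self, ch):
        # Return the recorded (charset, byte value), or None if not ours.
        return self._reverse.get(ch)

    def encode(self, text, charset):
        charset = codecs.lookup(charset).name
        out = bytearray()
        for ch in text:
            recorded = self._reverse.get(ch)
            if recorded is None:
                out += ch.encode(charset)
            elif recorded[0] == charset:
                out.append(recorded[1])  # same character set: round-trips
            else:
                raise ValueError(
                    f"{ch!r} came from {recorded[0]}, cannot encode as {charset}"
                )
        return bytes(out)


registry = PuaRegistry()
raw = b"caf\xe9 ok"                   # 0xE9 is not valid ASCII
text = registry.decode(raw, "ascii")  # 0xE9 becomes a PUA character
roundtrip = registry.encode(text, "ascii")
```

In this sketch the string alone does not carry enough information to be encoded again. Only the registry knows where each PUA character came from, which is why the string would have to be stored together with that table (or an equivalent record of the original encoding) when pickling. Encoding to a different character set raises an error, matching the point that only the same character set can round-trip.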
