Thanks for the clarity, Steve. A couple of questions/thoughts:

> The choices are:
>
> * don't represent them at all (remove bytes API)
>

Would the bytes API be removed on *nix also?


> * convert and drop characters not in the (legacy) active code page
> * convert and fail on characters not in the (legacy) active code page
>

"Failure is not an option" -- These two seem like a plain old bad idea.

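To make the difference between those two concrete, here is a minimal sketch. It assumes cp1252 as a stand-in for the (legacy) active code page, and the helper names are made up for illustration; this is not what any actual os module code does.

```python
# Assumption: cp1252 stands in for the (legacy) active code page.
CODE_PAGE = "cp1252"


def to_bytes_drop(name: str) -> bytes:
    """'Convert and drop': silently discard unrepresentable characters."""
    return name.encode(CODE_PAGE, errors="ignore")


def to_bytes_fail(name: str) -> bytes:
    """'Convert and fail': raise UnicodeEncodeError instead."""
    return name.encode(CODE_PAGE, errors="strict")


name = "data_\u03a9.txt"  # U+03A9 (Greek capital omega) is not in cp1252

# Dropping yields a *different* name, which may not exist or may
# collide with another file.
print(to_bytes_drop(name))

try:
    to_bytes_fail(name)
except UnicodeEncodeError as exc:
    print("encode failed:", exc.reason)
```

Either way, the bytes you get back cannot be relied on to name the original file.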
> * convert and fail on invalid surrogate pairs
>

Where would an invalid surrogate pair come from? Never from a file system
API call, yes?
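For reference, this is roughly what "fail on invalid surrogate pairs" looks like at the codec level. It is only a sketch: the string with a lone surrogate is constructed by hand, and the function names are hypothetical.

```python
# A str containing a lone high surrogate (no low surrogate follows it).
lone = "bad_\ud800"


def encode_strict(name: str) -> bytes:
    """Strict UTF-16-LE: refuses lone surrogates."""
    return name.encode("utf-16-le")


def encode_pass(name: str) -> bytes:
    """UTF-16-LE with 'surrogatepass': lone surrogates go through as-is."""
    return name.encode("utf-16-le", errors="surrogatepass")


try:
    encode_strict(lone)
except UnicodeEncodeError as exc:
    print("strict encode failed:", exc.reason)

raw = encode_pass(lone)
# Decoding with the same error handler recovers the original str.
assert raw.decode("utf-16-le", errors="surrogatepass") == lone
```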

> * represent them as UTF-16-LE in bytes (with embedded '\0' everywhere)
>

Would this be doing anything, or just keeping whatever the Windows API
takes/returns? I.e., exactly what is done on *nix?
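As a rough illustration of the comparison I have in mind (a sketch only: it assumes UTF-8 as the *nix filesystem encoding and uses plain codec calls rather than any real os API):

```python
# UTF-16-LE as bytes: every ASCII character is followed by a NUL byte.
win_name = "abc"
win_raw = win_name.encode("utf-16-le")
print(win_raw)

# *nix style: the bytes are whatever the OS returned. With the
# 'surrogateescape' handler (PEP 383), undecodable bytes survive a
# bytes -> str -> bytes round trip unchanged.
posix_raw = b"caf\xff"  # \xff is not valid UTF-8
as_str = posix_raw.decode("utf-8", errors="surrogateescape")
round_tripped = as_str.encode("utf-8", errors="surrogateescape")
assert round_tripped == posix_raw
```

In both cases the bytes are passed through rather than reinterpreted, which is the property I am asking about.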


> The fifth option is the best for round-tripping within Windows APIs.
>

How is it better? Only performance (i.e. no encoding/decoding required),
or would it be more reliable as well?

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
