Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

MRAB Thu, 30 Apr 2009 13:08:35 -0700

Barry Scott wrote:


On 30 Apr 2009, at 05:52, Martin v. Löwis wrote:

How do I get a printable Unicode version of these path strings if they
contain non-Unicode data?


Define "printable". One way would be to use a regular expression,
replacing all codes in a certain range with a question mark.
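
A minimal sketch of that regular-expression approach. It assumes the
"certain range" is the lone-surrogate block U+DC80 to U+DCFF, which is
what the "surrogateescape" error handler from the final version of the
PEP produces for undecodable bytes. The helper name printable() is
hypothetical.

```python
import re

# Assumption: undecodable bytes 0x80..0xFF were escaped to the lone
# surrogates U+DC80..U+DCFF (the "surrogateescape" error handler).
_ESCAPED = re.compile("[\udc80-\udcff]")


def printable(name):
    """Return a copy of name with every escaped byte shown as '?'."""
    return _ESCAPED.sub("?", name)


raw = b"caf\xe9.txt"  # Latin-1 bytes, not valid UTF-8
name = raw.decode("utf-8", "surrogateescape")
print(printable(name))
```

The result is lossy: it is safe to print or put in a web page, but it
can no longer be used to open the file.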


What I mean by printable is that the string must be valid Unicode
that I can print to a UTF-8 console or place as text in a UTF-8
web page.

I think your PEP gives me a string that will not encode to
valid UTF-8 of the kind the world outside Python likes. Did I get this
point wrong?

I'm guessing that an app has to understand that filenames come in two
forms, Unicode and bytes, if it's not UTF-8 data. Why not simply return a
string if it's valid UTF-8, and otherwise return bytes?


That would have been an alternative solution, and the one that 2.x uses
for listdir. People didn't like it.
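
For illustration, the "string if valid, otherwise bytes" behaviour could
be sketched as below. The function decode_or_bytes() is hypothetical and
is not part of the PEP or of any os API; it only shows why callers end
up handling two types.

```python
def decode_or_bytes(raw, encoding="utf-8"):
    """Return str when raw decodes cleanly, else the original bytes."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return raw


# A mixed listing: one valid UTF-8 name, one Latin-1 name.
names = [decode_or_bytes(n) for n in (b"ok.txt", b"caf\xe9.txt")]
print(names)
```

Every consumer of such a list has to check the type of each entry
before using it.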


In our application we are running Fedora with the assumption that the
filenames are UTF-8. When Windows systems FTP files to our system,
the filenames are in CP-1251(?) and not valid UTF-8.

What we have to do is detect these non-UTF-8 filenames and get the
users to rename them.

Having an algorithm that says "if it's a string, no problem; if it's
bytes, deal with the exceptions" seems simple.

How do I do this detection with the PEP proposal?
Do I end up using the byte interface and doing the utf-8 decode
myself?

What do you do currently?

The PEP just offers a way of reading all filenames as Unicode, if that's
what you want. So what if the strings can't be encoded to normal UTF-8!
The filenames aren't valid UTF-8 anyway! :-)
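
One possible way to do the detection with all-Unicode filenames,
sketched under two assumptions: the filesystem encoding is UTF-8, and
undecodable bytes arrive as lone surrogates (the "surrogateescape"
handler of the final PEP). A strict UTF-8 encode then fails exactly for
the names that were not valid UTF-8 on disk. The helper names are
hypothetical.

```python
import os


def has_escaped_bytes(name):
    """True if name cannot be encoded to strict UTF-8."""
    try:
        name.encode("utf-8")  # strict: lone surrogates are rejected
    except UnicodeEncodeError:
        return True
    return False


def names_to_rename(directory):
    """Return the original bytes of every non-UTF-8 name in directory."""
    return [
        os.fsencode(name)  # recover the bytes as stored on disk
        for name in os.listdir(directory)
        if has_escaped_bytes(name)
    ]
```

With this, an application can keep using str everywhere and only touch
bytes when it reports the offending names to the user.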