Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

Barry Scott Thu, 30 Apr 2009 12:44:07 -0700


On 30 Apr 2009, at 05:52, Martin v. Löwis wrote:

>> How do I get a printable Unicode version of these path strings if they
>> contain non-Unicode data?


> Define "printable". One way would be to use a regular expression,
> replacing all codes in a certain range with a question mark.
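Martin's regular-expression suggestion can be sketched in Python 3 as follows. This is not code from the thread: it assumes PEP 383 as it was later implemented (the "surrogateescape" error handler in Python 3.1), and the helper name `printable` is hypothetical. It replaces the lone surrogates U+DC80 to U+DCFF, where the PEP places undecodable bytes, with "?" so the result encodes cleanly as UTF-8.

```python
import re

# Assumption: names were decoded with the "surrogateescape" error handler
# (PEP 383 as shipped in Python 3.1), so each undecodable byte 0x80-0xFF
# appears as the lone surrogate U+DC80-U+DCFF.
_ESCAPED_BYTE = re.compile("[\udc80-\udcff]")


def printable(name: str) -> str:
    """Hypothetical helper: replace every escaped byte with a question mark."""
    return _ESCAPED_BYTE.sub("?", name)


# b"\xe9" is "é" in Latin-1 but is not valid UTF-8 on its own.
raw = b"caf\xe9.txt"
name = raw.decode("utf-8", "surrogateescape")   # contains U+DCE9
safe = printable(name)

print(safe)                   # caf?.txt
print(safe.encode("utf-8"))   # b'caf?.txt'
```

A regex-free alternative is to go back to the original bytes and decode lossily, e.g. `name.encode("utf-8", "surrogateescape").decode("utf-8", "replace")`, which shows U+FFFD instead of "?".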


What I mean by printable is that the string must be valid Unicode
that I can print to a UTF-8 console or place as text in a UTF-8
web page.

I think your PEP gives me a string that will not encode to
valid UTF-8, which is what the world outside Python expects. Did I
get this point wrong?
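The concern can be checked with a short sketch, assuming Python 3.1+ where PEP 383 is available as the "surrogateescape" handler: a name decoded from non-UTF-8 bytes fails a strict UTF-8 encode, although it still round-trips to its original bytes. The sample filename is made up for illustration.

```python
# Sketch, assuming Python 3.1+ where PEP 383 is implemented as the
# "surrogateescape" error handler.
raw = b"caf\xe9.txt"                             # not valid UTF-8
name = raw.decode("utf-8", "surrogateescape")    # contains U+DCE9

try:
    name.encode("utf-8")                         # default "strict" handler
    strict_ok = True
except UnicodeEncodeError:                       # lone surrogates are rejected
    strict_ok = False

# Encoding with the same handler gives back the original bytes exactly.
round_trip = name.encode("utf-8", "surrogateescape")

print(strict_ok)            # False
print(round_trip == raw)    # True
```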

>> I'm guessing that an app has to understand that filenames come in
>> two forms: unicode, and bytes if the name is not UTF-8 data. Why not
>> simply return a string if it's valid UTF-8, and bytes otherwise?
>
> That would have been an alternative solution, and the one that 2.x uses
> for listdir. People didn't like it.
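For comparison, the str-or-bytes alternative can be sketched in Python 3 by listing with a bytes path and attempting a UTF-8 decode per name. `classify` and `listdir_mixed` are hypothetical helpers, and they hard-code UTF-8 rather than consulting the filesystem encoding as 2.x's listdir does.

```python
import os
from typing import List, Union


def classify(raw: bytes) -> Union[str, bytes]:
    """Hypothetical helper: str if the name is valid UTF-8, else raw bytes.

    Hard-codes UTF-8 (matching the setup described in this thread)
    instead of using the platform's filesystem encoding."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw


def listdir_mixed(path: bytes = b".") -> List[Union[str, bytes]]:
    """Hypothetical 2.x-style listing: str for decodable names, bytes otherwise."""
    # Passing a bytes path makes os.listdir return the names as raw bytes.
    return [classify(raw) for raw in os.listdir(path)]


print(classify(b"report.txt"))    # report.txt
print(classify(b"caf\xe9.txt"))   # b'caf\xe9.txt'
```

Every caller then needs an `isinstance` check per entry, which is the mixed-type handling this approach imposes.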


In our application we are running Fedora with the assumption that
filenames are UTF-8. When Windows systems FTP files to our system,
the filenames are in CP-1251(?) and are not valid UTF-8.

What we have to do is detect these non-UTF-8 filenames and get the
users to rename them.

An algorithm that says "if it's a string, no problem; if it's
bytes, deal with the exception" seems simple.

How do I do this detection with the PEP's proposal?
Do I end up using the bytes interface and doing the UTF-8 decode
myself?
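One possible approach, sketched under the assumption of Python 3.2+ (for `os.fsencode`) on a POSIX system such as the Fedora setup described: keep the str interface and test each name by turning it back into its on-disk bytes and attempting a strict UTF-8 decode. `is_valid_utf8_name` and `names_to_rename` are hypothetical helpers. Listing with a bytes path and decoding manually would also work; this version simply avoids mixing str and bytes.

```python
import os
from typing import List


def is_valid_utf8_name(name: str) -> bool:
    """Hypothetical helper: True if the on-disk bytes of name are valid UTF-8."""
    # os.fsencode (Python 3.2+) reverses the surrogateescape decoding and
    # returns the exact bytes stored in the directory entry.
    try:
        os.fsencode(name).decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def names_to_rename(directory: str) -> List[str]:
    """Hypothetical helper: list entries whose names are not valid UTF-8."""
    # A str path makes os.listdir return str names, with undecodable
    # bytes carried through as lone surrogates (PEP 383).
    return [name for name in os.listdir(directory)
            if not is_valid_utf8_name(name)]
```

Names returned by `names_to_rename` still contain lone surrogates, so showing them to users needs a lossy step such as `os.fsencode(name).decode("utf-8", "replace")`.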

Barry

_______________________________________________
Python-Dev mailing list
http://mail.python.org/mailman/listinfo/python-dev