On 07/06/2013 12:53, Νικόλαος Κούρας wrote:
[snip]

#========================================================
# Collect filenames of the path dir as bytes
greek_filenames = os.listdir( b'/home/nikos/public_html/data/apps/' )

for filename in greek_filenames:
        # Compute 'path/to/filename' in bytes
        greek_path = b'/home/nikos/public_html/data/apps/' + filename
        try:

This is a worse way of doing it because the ISO-8859-7 encoding has 1
byte per codepoint, meaning that it's more 'tolerant' (if that's the
word) of errors. A sequence of bytes that is actually UTF-8 can be
decoded as ISO-8859-7, giving gibberish.
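
To make that concrete, here's a small sketch. The sample name 'αβγ' is
made up for illustration and is not one of the real filenames:

```python
# Hypothetical sample name, chosen only for illustration.
name = 'αβγ'

utf8_bytes = name.encode('utf-8')
greek_bytes = name.encode('iso-8859-7')

# Bytes that are really UTF-8 decode "successfully" as ISO-8859-7,
# but the result is not the original name.
gibberish = utf8_bytes.decode('iso-8859-7')
print(repr(gibberish))

# Going the other way, these ISO-8859-7 bytes are not valid UTF-8,
# so the UTF-8 decoder raises instead of silently returning rubbish.
try:
    greek_bytes.decode('utf-8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
print(strict_failed)
```

So a successful decode as ISO-8859-7 tells you very little, whereas a
successful decode as UTF-8 is good evidence that the name really is UTF-8.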

UTF-8 is less tolerant, and it's the encoding that ideally you should
be using everywhere, so it's better to assume UTF-8 and, if it fails,
try ISO-8859-7 and then rename so that any names that were ISO-8859-7
will be converted to UTF-8.

That's the reason I did it that way in the code I posted, but, yet
again, you've changed it without understanding why!
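
A minimal sketch of that order follows. It is not the exact code posted
earlier in the thread; the names decode_filename and rename_to_utf8 are
made up for illustration, and it assumes every name is either UTF-8 or
ISO-8859-7:

```python
import os

def decode_filename(raw: bytes) -> str:
    """Decode a filename, trying strict UTF-8 before ISO-8859-7."""
    try:
        # UTF-8 is strict, so success here is strong evidence the
        # name is already UTF-8.
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        # Assumption: anything that isn't UTF-8 is ISO-8859-7.
        return raw.decode('iso-8859-7')

def rename_to_utf8(directory: bytes) -> None:
    """Rename every non-UTF-8 name in `directory` to its UTF-8 form."""
    # Passing bytes to os.listdir gives the names back as bytes.
    for raw in os.listdir(directory):
        new_raw = decode_filename(raw).encode('utf-8')
        if new_raw != raw:
            os.rename(os.path.join(directory, raw),
                      os.path.join(directory, new_raw))
```

Names that are already UTF-8 (including plain ASCII) come back
unchanged, so they are never renamed. It would be called as, for example,
rename_to_utf8(b'/home/nikos/public_html/data/apps/').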

                filepath = greek_path.decode('iso-8859-7')
                
                # Rename current filename from greek bytes --> utf-8 bytes
                os.rename( greek_path, filepath.encode('utf-8') )
        except UnicodeDecodeError:
                # Since it's not a Greek bytestring, it's a proper UTF-8 bytestring
                filepath = greek_path.decode('utf-8')

[snip]

--
http://mail.python.org/mailman/listinfo/python-list
