On 07/06/2013 12:53, Νικόλαος Κούρας wrote:
[snip]
#========================================================
# Collect filenames of the path dir as bytes
greek_filenames = os.listdir( b'/home/nikos/public_html/data/apps/' )
for filename in greek_filenames:
    # Compute 'path/to/filename' in bytes
    greek_path = b'/home/nikos/public_html/data/apps/' + b'filename'
    try:

This is a worse way of doing it. ISO-8859-7 uses 1 byte per codepoint
and assigns a character to almost every byte value, so it's more
'tolerant' (if that's the word) of errors: a sequence of bytes that is
actually UTF-8 can usually be decoded as ISO-8859-7 without raising
UnicodeDecodeError, giving gibberish.
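
To make that concrete, here is a small sketch of the asymmetry. The
sample name is made up purely for illustration; it is not one of the
real filenames.

```python
# Hypothetical sample name (alpha, beta, gamma), for illustration only.
name = '\u03b1\u03b2\u03b3'

utf8_bytes = name.encode('utf-8')        # 6 bytes
greek_bytes = name.encode('iso-8859-7')  # 3 bytes

# The UTF-8 bytes decode as ISO-8859-7 without any exception,
# but the result is not the original name.
mojibake = utf8_bytes.decode('iso-8859-7')
print(repr(mojibake), mojibake == name)

# The ISO-8859-7 bytes, on the other hand, are rejected by the
# UTF-8 decoder, so the mistake is detected.
try:
    greek_bytes.decode('utf-8')
    utf8_accepted = True
except UnicodeDecodeError:
    utf8_accepted = False
print(utf8_accepted)
```

So trying ISO-8859-7 first means the except branch is rarely reached,
even for names that are already UTF-8.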

UTF-8 is less tolerant: bytes that aren't valid UTF-8 will normally
raise UnicodeDecodeError. It's also the encoding that ideally you
should be using everywhere, so it's better to assume UTF-8 first and,
only if that fails, decode as ISO-8859-7 and then rename the file, so
that any names that were ISO-8859-7 are converted to UTF-8.
That's the reason I did it that way in the code I posted, but, yet
again, you've changed it without understanding why!
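
Roughly, the order I mean looks like the sketch below. It is only an
outline under assumptions, not the exact code from the earlier post:
the names decode_name and normalise_to_utf8 are invented here, and the
directory is passed in as bytes.

```python
import os

def decode_name(raw):
    """Return (text, needs_rename) for a filename given as bytes.

    UTF-8 is tried first because it rejects most non-UTF-8 input.
    ISO-8859-7 is only the fallback; it can still raise
    UnicodeDecodeError for the few byte values it leaves undefined.
    """
    try:
        return raw.decode('utf-8'), False
    except UnicodeDecodeError:
        return raw.decode('iso-8859-7'), True

def normalise_to_utf8(directory):
    """Rename any ISO-8859-7 filenames in `directory` (bytes) to UTF-8.

    Returns the decoded filenames as a list of str.
    """
    names = []
    for raw in os.listdir(directory):   # bytes in, bytes out
        text, needs_rename = decode_name(raw)
        if needs_rename:
            os.rename(os.path.join(directory, raw),
                      os.path.join(directory, text.encode('utf-8')))
        names.append(text)
    return names
```

Run a second time, it should rename nothing, because every name will
already decode as UTF-8.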
        filepath = greek_path.decode('iso-8859-7')
        # Rename current filename from greek bytes --> utf-8 bytes
        os.rename( greek_path, filepath.encode('utf-8') )
    except UnicodeDecodeError:
        # Since its not a greek bytestring then its a proper utf8 bytestring
        filepath = greek_path.decode('utf-8')
[snip]