On 05/06/2013 06:40, Michael Torrie wrote:
On 06/04/2013 10:15 PM, Νικόλαος Κούρας wrote:
One of my Greek filenames is "Ευχή του Ιησού.mp3". Just a Greek
filename with spaces. Is there a problem when a filename contains both
English and Greek letters? Isn't it still a unicode string?

All I did in my CentOS was 'mv "Euxi tou Ihsou.mp3" "Ευχή του
Ιησού.mp3"'

and the displayed filename after 'ls -l' returned was:

-rw-r--r-- 1 nikos nikos 3511233 Jun 4 14:11 \305\365\367\336\ \364\357\365\ \311\347\363\357\375.mp3

Is there no way at all to check the charset used to store it on the
hard disk? It should be UTF-8, but it doesn't look like it. Is there
some Linux command or some Python command that will print out the
actual encoding of '\305\365\367\336\ \364\357\365\
\311\347\363\357\375.mp3' ?

I can see that you are starting to understand things. I can't answer
your question (don't know the answer), but you're correct about one
thing.  A filename is just a sequence of bytes.  We'd hope it would be
utf-8, but it could be anything.  Even worse, it's not possible to tell
from a byte stream what encoding it is unless we just try one and see
what happens.  Text editors, for example, have to either make a guess
(utf-8 is a good one these days), or ask, or try to read from the first
line of the file using ascii and see if there's a source code character
set command to give it an idea.
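
The "just try one and see what happens" approach can be sketched in
Python as below. This is only an illustration: the helper name
try_decodings and the list of candidate encodings are assumptions made
for the example, and the bytes are the ones from the 'ls -l' output
with the shell's backslash-escaped spaces written as plain spaces.

```python
# Raw filename bytes as shown by ls (octal escapes), with plain spaces.
raw = b"\305\365\367\336 \364\357\365 \311\347\363\357\375.mp3"


def try_decodings(data, candidates=("utf-8", "iso-8859-7", "cp1253")):
    """Try each candidate encoding; map it to the decoded text or None."""
    results = {}
    for enc in candidates:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding.
            results[enc] = None
    return results


for enc, text in try_decodings(raw).items():
    print(enc, "->", text if text is not None else "not valid in this encoding")
```

Note that a successful decode only shows the bytes are valid in that
encoding, not that it is the one originally used. Here both iso-8859-7
and cp1253 decode without error, so a human still has to look at the
result and judge which one makes sense.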

From the previous posts I guessed that the filename might be encoded
using ISO-8859-7:

>>> s = b"\305\365\367\336\ \364\357\365\ \311\347\363\357\375.mp3"
>>> s.decode("iso-8859-7")
'Ευχή\\ του\\ Ιησού.mp3'

Yes, that looks the same.
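
If the name really is ISO-8859-7 (still a guess), one way to get a
UTF-8 filename would be to re-encode it and rename the file. A minimal
sketch, assuming that source encoding; utf8_name and rename_to_utf8 are
hypothetical helper names, not anything from the standard library:

```python
import os


def utf8_name(raw_name: bytes, source_encoding: str = "iso-8859-7") -> bytes:
    """Re-encode filename bytes from the assumed source encoding to UTF-8."""
    return raw_name.decode(source_encoding).encode("utf-8")


def rename_to_utf8(directory: bytes, source_encoding: str = "iso-8859-7") -> None:
    """Rename every entry in directory whose name is not valid UTF-8."""
    # Passing bytes to os.listdir() returns the names as raw, undecoded bytes.
    for raw_name in os.listdir(directory):
        try:
            raw_name.decode("utf-8")
            continue  # already valid UTF-8, leave it alone
        except UnicodeDecodeError:
            pass
        new_name = utf8_name(raw_name, source_encoding)
        os.rename(os.path.join(directory, raw_name),
                  os.path.join(directory, new_name))
```

It would be called as rename_to_utf8(b"/path/to/dir"), ideally on a
copy of the directory first, since a wrong guess about the source
encoding produces wrong names.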
--
http://mail.python.org/mailman/listinfo/python-list
