On 18/07/2007 4:11 AM, ddtl wrote:
> Hello everybody,
>
> I want to create a script which reads files in a
> current directory and renames them according to some
> scheme. The file names are in Russian - sometimes
> the names encoded as win-1251, sometimes as koi8-r etc.

You have a file system with 8-bit file names with no indication of
'codepage' or 'encoding', either globally or per file? Which operating
system are you using?

> I want to read in file name and convert it to list for
> further processing.

Read file name from a text file? Or do you mean using e.g. glob.glob()
or os.listdir()?

What do you mean by "convert it to list"? Do you mean
'foo.txt' -> ['f', 'o', ....etc]??? Why?

> The problem is that Python treats
> non-ascii characters as multibyte characters - for
> example, hex code for "Small Character A" in koi8-r is
> 0xc1, but Python interprets it as a sequence of
> \xd0, \xb1 bytes.

Python is very unlikely to do that all by itself. Please show us the
script or whatever evidence you have. I strongly suggest that
immediately after "reading" a file name, you do
    print repr(file_name)
NOT
    print file_name
so that you can see *exactly* what you've got.

Are you sure about the \xb1??? Consider this:

>>> '\xc1'.decode('koi8-r')
u'\u0430'
>>> '\xc1'.decode('koi8-r').encode('utf8')
'\xd0\xb0'
>>>

Also:

>>> import sys; sys.stdout.encoding
'cp850' # Win XP Pro, command prompt
>>>

What do you get when you do that?

>
> What can I do so that Python interprets non-ascii
> characters correctly?

Know how your non-ascii characters are encoded. Tell Python what to do
with them. Read this:
http://www.amk.ca/python/howto/unicode

Hope this helps,
John
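
P.S. A minimal sketch of the "look at the repr, then decode explicitly" advice above. It assumes Python 3 (byte strings written as b'...') rather than the Python 2 used in the sessions in this thread. The helper name decode_candidates and the list of candidate encodings are hypothetical, chosen only for illustration; they are not from your script.

```python
# Hypothetical helper: try a few candidate encodings on raw file-name bytes
# and show what each one yields. Assumes Python 3.
CANDIDATES = ("utf-8", "koi8-r", "cp1251")  # illustrative list, adjust as needed


def decode_candidates(raw, encodings=CANDIDATES):
    """Return {encoding: decoded text, or None if the bytes are invalid}."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


raw = b"\xc1"                      # the single byte from the question
print(repr(raw))                   # see *exactly* what you've got
for enc, text in decode_candidates(raw).items():
    # ascii() shows the code point without depending on the console encoding
    print(enc, ascii(text))

# Decoding as koi8-r and re-encoding as UTF-8 gives two bytes.
print(raw.decode("koi8-r").encode("utf-8"))
```

The same byte decodes to a different letter under koi8-r and cp1251, so the bytes alone cannot tell you which encoding was meant. On Python 3, passing a bytes path such as os.listdir(b'.') returns the names as raw bytes, which could be fed to a helper like this.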