I have two files:

test.py:
--------------------------------------------------
# -*- encoding : utf8 -*-
print 'in this file', repr('中文')

# tt.txt is saved as utf8 encoding
f = file('tt.txt')
line1 = f.readline().strip()
print 'another file', repr(line1)
-------------------------------------------------------

tt.txt:
----------------------------------------------------
中文
test
-------------------------------------------------------

When I run test.py, I get the following output:

in this file '\xe4\xb8\xad\xe6\x96\x87'
another file '\xef\xbb\xbf\xe4\xb8\xad\xe6\x96\x87'

and I can't re-encode line1 like this:

line1.decode('utf8').encode('gbk')

because I get this error:

UnicodeEncodeError: 'gbk' codec can't encode character u'\ufeff' in position 0: illegal multibyte sequence

Why do I get different repr values for the same text?
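A minimal sketch of what the extra bytes appear to be, written in Python 3 rather than the Python 2 used above. It assumes tt.txt was saved by an editor that writes a UTF-8 byte order mark (BOM): the prefix '\xef\xbb\xbf' in the second repr() is exactly the UTF-8 encoding of U+FEFF, the character GBK rejects. The names TEXT, raw, with_bom and clean are illustrative only, and a temporary file stands in for tt.txt. The 'utf-8-sig' codec strips a leading BOM:

```python
import codecs
import os
import tempfile

TEXT = '\u4e2d\u6587'  # "中文", the first line of tt.txt

# Bytes of a UTF-8 file saved *with* a BOM, matching the second repr().
raw = codecs.BOM_UTF8 + TEXT.encode('utf-8')
print(raw[:3] == b'\xef\xbb\xbf')        # True: the extra bytes are the BOM

# Decoding with plain 'utf-8' keeps the BOM as the character U+FEFF ...
with_bom = raw.decode('utf-8')
print(with_bom[0] == '\ufeff')           # True

# ... and GBK has no mapping for U+FEFF, hence the UnicodeEncodeError.
try:
    with_bom.encode('gbk')
except UnicodeEncodeError as exc:
    print('gbk failed at position', exc.start)

# 'utf-8-sig' drops a leading BOM and otherwise decodes like 'utf-8'.
clean = raw.decode('utf-8-sig')
print(clean == TEXT, clean.encode('gbk'))

# Same idea when reading a file: pass the codec to open().
# (A temporary file plays the role of tt.txt here.)
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(raw + b'\ntest\n')
with open(path, encoding='utf-8-sig') as f:
    line1 = f.readline().strip()
os.remove(path)
print(line1 == TEXT)                     # True
```

In Python 2.5 and later the equivalent change to the original line would be line1.decode('utf-8-sig').encode('gbk'). The first repr() has no such prefix because the literal sits in the middle of test.py, not at the start of a file.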
-- http://mail.python.org/mailman/listinfo/python-list