stijn added the comment:

New here, but I think this is the right issue for getting information about this Unicode problem. On the Windows console:
    > chcp
    Active code page: 437
    > type utf.txt
    ╨ƒ╤Ç╨╕╨▓╨╡╤é
    > chcp 65001
    Active code page: 65001
    > type utf.txt
    Привет
    > python --version
    Python 3.5.0a0
    > cat utf.py
    f = open('utf.txt')
    l = f.readline()
    print(l)
    print(len(l))
    > python utf.py
    Привет
    �²ÐµÑ‚
    �‚
    13
    > cat utf_explicit.py
    import codecs
    f = codecs.open('utf.txt', encoding='utf-8', mode='r')
    l = f.readline()
    print(l)
    print(len(l))
    > python utf_explicit.py
    Привет
    ет
    7

I have only partly read through this page, and some of it is a bit over my head. Could anyone explain:

- how to find out which codec is used for files returned by open()?
- is there a way to change it globally to UTF-8?
- the last case is almost correct: it gets the right number of characters, but print() still does something wrong.

I got this working by using the stream patch, but found another example where it is still not correct (see below). Is there any way around this?

    > type utf2.txt
    aαbβcγdδ
    > cat utf2.py
    import streams
    import codecs
    streams.enable()
    f = codecs.open('utf2.txt', encoding='utf-8', mode='r')
    print(f.read(1))
    print(f.read(1))
    print(f.read(2))
    print(f.read(4))
    > python utf2.py
    a
    α
    bβc
    γdδ

----------
nosy: +stijn

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue1602>
_______________________________________
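A minimal sketch (Python 3, standard library only) covering the first two questions above. It assumes that the reporter's default code page for files was cp1252. The utf.py output hints at this, because `ÐµÑ‚` is how the UTF-8 bytes of `ет` appear when decoded as cp1252. The sketch also uses a throwaway temporary file in place of utf2.txt. It only looks at decoding the file and does not touch how print() writes to a cp65001 console.

```python
import locale
import os
import tempfile

# 1. Why utf.py reported len 13 but utf_explicit.py reported 7:
#    "Привет\n" is 13 bytes in UTF-8 (2 bytes per Cyrillic letter + newline).
raw = "Привет\n".encode("utf-8")
as_cp1252 = raw.decode("cp1252")  # assumed locale codec: one character per byte
as_utf8 = raw.decode("utf-8")     # one character per code point
print(len(raw), len(as_cp1252), len(as_utf8))

# 2. Which codec open() picks, and how to override it for a single file.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "utf2.txt")  # scratch copy, not the original file
    with open(path, "w", encoding="utf-8") as f:
        f.write("aαbβcγdδ")

    # Without encoding=..., open() falls back to the locale's preferred
    # encoding; the codec it actually chose is exposed as f.encoding.
    with open(path) as f:
        default_encoding = f.encoding
    print("open() default:", default_encoding)
    print("locale preferred:", locale.getpreferredencoding(False))

    # With an explicit encoding, text-mode read(n) counts decoded
    # characters, so a multi-byte character is never split across reads.
    with open(path, encoding="utf-8") as f:
        chunks = [f.read(n) for n in (1, 1, 2, 4)]

# ascii() escapes non-ASCII characters, so this line prints the same way
# whatever code page the console happens to be using.
print(ascii(chunks))
```

Run on its own, it should print `13 13 7`, followed by the chunks `a`, `α`, `bβ`, `cγdδ` written as escaped code points. The encoding names it prints depend on the machine. Passing `encoding='utf-8'` to the built-in open() is the usual Python 3 alternative to the codecs.open() call used above. Python 3.5 has no documented global switch for this. Python 3.7 later added UTF-8 mode (`-X utf8` or `PYTHONUTF8=1`, PEP 540), which makes open() default to UTF-8.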