stijn added the comment:

New here, but I think this is the correct issue to get info about this Unicode 
problem. On the Windows console:

> chcp
Active code page: 437

> type utf.txt
Привет

> chcp 65001
Active code page: 65001

> type utf.txt
Привет

> python --version
Python 3.5.0a0

> cat utf.py
f = open('utf.txt')
l = f.readline()
print(l)
print(len(l))

> python utf.py
Привет
�²ÐµÑ‚
�‚


13

> cat utf_explicit.py
import codecs
f = codecs.open('utf.txt', encoding='utf-8', mode='r')
l = f.readline()
print(l)
print(len(l))

> python utf_explicit.py
Привет
ет


7

I partly read through the page, but these things are a bit above my head. Could 
anyone explain:
- how to figure out which codec is used by files returned by open()?
- is there a way to change it globally to utf-8?
- the last case is almost correct: it has the correct number of characters, but 
the print() still does something wrong. I got this working by using the stream 
patch, but found another example for which the output is not correct, see below. 
Any way around this?
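
On the first two questions, here is a minimal sketch of how the codec could be inspected, assuming Python 3. The temporary file and the helper name describe_encoding are made up for illustration and stand in for utf.txt. A text file returned by open() reports its codec in its encoding attribute. When no encoding argument is passed, the default normally follows locale.getpreferredencoding(False), which on Windows is the ANSI code page and not the console code page selected with chcp.

```python
import locale
import os
import tempfile


def describe_encoding(path):
    """Return (default codec, explicit codec, first line decoded as UTF-8)."""
    # No encoding argument: Python picks a platform-dependent default.
    with open(path) as f:
        default_codec = f.encoding
    # Explicit argument: the file is decoded as UTF-8 regardless of locale.
    with open(path, encoding='utf-8') as f:
        explicit_codec = f.encoding
        line = f.readline()
    return default_codec, explicit_codec, line


# Hypothetical stand-in for utf.txt: six Cyrillic letters plus a newline.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as raw:
    raw.write('Привет\n'.encode('utf-8'))

default_codec, explicit_codec, line = describe_encoding(path)
os.remove(path)

# Only ASCII is printed, so the output does not depend on the console codec.
print('locale preferred encoding:', locale.getpreferredencoding(False))
print('open() default codec:', default_codec)
print('open() explicit codec:', explicit_codec)
print('characters in first line:', len(line))
```

Passing encoding='utf-8' at each open() call is the portable way to avoid the locale default. Later Python versions (3.7 and newer) also offer a UTF-8 mode, enabled with -X utf8 or the PYTHONUTF8=1 environment variable, which changes the default globally. That option is not available in the 3.5 build used here.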

> type utf2.txt
aαbβcγdδ

> cat utf2.py
import streams
import codecs
streams.enable()
f = codecs.open('utf2.txt', encoding='utf-8', mode='r')
print(f.read(1))
print(f.read(1))
print(f.read(2))
print(f.read(4))

> python utf2.py
a
α
bβc
γdδ

----------
nosy: +stijn

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue1602>
_______________________________________