New submission from Ben Finney:
In `tokenize.detect_encoding` is the following code::
    first = read_or_stop()
    if first.startswith(BOM_UTF8):
        # …
The `read_or_stop` function is defined as::
    def read_or_stop():
        try:
            return readline()
        except StopIteration:
            return b''
So, on catching ``StopIteration``, the return value will be a byte string. The
`detect_encoding` code then immediately calls `startswith`, which fails::
      File "/usr/lib/python3.4/tokenize.py", line 409, in detect_encoding
        if first.startswith(BOM_UTF8):
    TypeError: startswith first arg must be str or a tuple of str, not bytes
One or both of those locations in the code is wrong. Either `read_or_stop`
should never return a byte string; or `detect_encoding` should not assume it
can call `startswith` on the result.
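
A minimal sketch that may help narrow this down. It is not taken from the
failing program; the `source` bytes and the variable names are made up for
illustration. It compares a `readline` that yields bytes with one that yields
text, and also exercises the ``StopIteration`` fallback in `read_or_stop`. One
possible reading of the traceback is that the `readline` passed in returned
`str`, since the message complains about a bytes argument given to a `str`
method:

```python
import io
import tokenize

# Hypothetical two-line source file, held in memory as bytes.
source = b"# -*- coding: utf-8 -*-\nx = 1\n"

# readline yields bytes: the BOM check compares bytes with bytes.
encoding, consumed = tokenize.detect_encoding(io.BytesIO(source).readline)

# readline raises StopIteration at once: read_or_stop falls back to b''.
empty_encoding, empty_consumed = tokenize.detect_encoding(iter([]).__next__)

# readline yields str: str.startswith() is handed the bytes BOM and raises.
try:
    tokenize.detect_encoding(io.StringIO(source.decode("utf-8")).readline)
except TypeError as exc:
    text_mode_error = exc
else:
    text_mode_error = None

print(encoding, consumed)
print(empty_encoding, empty_consumed)
print(repr(text_mode_error))
```

If the third call is the only one that raises, the ``b''`` fallback itself is
consistent with the rest of `detect_encoding`, and the mismatch would come from
the kind of `readline` callable supplied by the caller.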
----------
components: Library (Lib)
messages: 234470
nosy: bignose
priority: normal
severity: normal
status: open
title: ‘tokenize.detect_encoding’ is confused between text and bytes: no
‘startswith’ method on a byte string
versions: Python 3.4
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue23296>
_______________________________________