[issue23296] ‘tokenize.detect_encoding’ is confused between text and bytes: no ‘startswith’ method on a byte string

Ben Finney Wed, 21 Jan 2015 20:45:07 -0800

New submission from Ben Finney:

In `tokenize.detect_encoding` is the following code::


    first = read_or_stop()
    if first.startswith(BOM_UTF8):
        # …

The `read_or_stop` function is defined as::

    def read_or_stop():
        try:
            return readline()
        except StopIteration:
            return b''

So, on catching ``StopIteration``, the return value will be a byte string. The 
`detect_encoding` code then immediately calls `sartswith`, which fails::

    File "/usr/lib/python3.4/tokenize.py", line 409, in detect_encoding
      if first.startswith(BOM_UTF8):
  TypeError: startswith first arg must be str or a tuple of str, not bytes

One or both of those locations in the code is wrong. Either `read_or_stop` 
should never return a byte string; or `detect_encoding` should not assume it 
can call `startswith` on the result.

----------
components: Library (Lib)
messages: 234470
nosy: bignose
priority: normal
severity: normal
status: open
title: ‘tokenize.detect_encoding’ is confused between text and bytes: no 
‘startswith’ method on a byte string
versions: Python 3.4

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue23296>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue23296] ‘tokenize.detect_encoding’ is confused between text and bytes: no ‘startswith’ method on a byte string

Reply via email to