[issue3347] urllib.robotparser doesn't work in Py3k

Matt Giuca Sat, 12 Jul 2008 06:46:31 -0700

New submission from Matt Giuca <[EMAIL PROTECTED]>:

urllib.robotparser is broken in Python 3.0, due to a bytes object
appearing where a str is expected.


Example:

>>> import urllib.robotparser
>>> r = urllib.robotparser.RobotFileParser('http://www.python.org/robots.txt')
>>> r.read()
TypeError: expected an object with the buffer interface

This is because the variable f in RobotFileParser.read is opened by
urlopen as a binary file, so f.read() returns a bytes object.
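
The type mismatch can be reproduced without network access. This is a
minimal sketch, assuming the parser ends up calling a bytes method with a
str argument (the exact call inside robotparser is not quoted in this
report, and the sample data is made up for illustration):

```python
# Hypothetical robots.txt payload, as a binary file's read() would return it.
data = b"User-agent: *  # all robots\nDisallow: /private/\n"

error = None
try:
    # Passing a str argument to a bytes method raises TypeError in Python 3.
    data.find("#")
except TypeError as exc:
    error = exc

# Decoding first yields a str, so str arguments work as expected.
text = data.decode("utf-8")
comment_start = text.find("#")
lines = text.splitlines()
```

The wording of the TypeError message differs between Python 3 releases, so
the sketch only relies on the exception type.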

I've included a patch that checks whether the result of f.read() is a
bytes object and, if so, decodes it as UTF-8. A more thorough fix might
determine the document's charset from f.headers['Content-Type'], but at
least this works, and it will be sufficient in almost all cases.
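
For illustration, the more thorough variant could look roughly like the
sketch below. It is not the attached patch: decode_robots is a hypothetical
helper name, and taking the Content-Type value as a plain string argument
is an assumption made to keep the example self-contained.

```python
from email.message import Message


def decode_robots(raw, content_type=None, default="utf-8"):
    """Return robots.txt data as str (hypothetical helper, not the patch).

    raw is the result of f.read(); content_type is the value of the
    Content-Type header, if any.
    """
    if isinstance(raw, str):
        return raw  # already text, nothing to decode

    charset = None
    if content_type:
        # Let the stdlib header parser extract the charset parameter.
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()

    try:
        return raw.decode(charset or default)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset: fall back to the default, replacing
        # any bytes that cannot be decoded.
        return raw.decode(default, errors="replace")
```

With this, decode_robots(b"Disallow: /caf\xe9", "text/plain; charset=ISO-8859-1")
gives a str, and data without a Content-Type header is decoded as UTF-8,
which matches the behaviour of the simpler approach.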

Also, there are no test cases for urllib.robotparser.
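
A test along the following lines could cover the decoding path without
touching the network. This is only a sketch: the class name, the sample
rules and the example.com URLs are made up for illustration, and the data
is fed to RobotFileParser.parse() directly rather than through read().

```python
import unittest
import urllib.robotparser

# Hypothetical robots.txt content, as bytes like urlopen would deliver it.
ROBOTS_TXT = b"User-agent: *\nDisallow: /private/\n"


class RobotParserDecodeTest(unittest.TestCase):
    def setUp(self):
        self.parser = urllib.robotparser.RobotFileParser()
        # Decode before parsing, as the proposed fix does.
        self.parser.parse(ROBOTS_TXT.decode("utf-8").splitlines())

    def test_disallowed_path(self):
        self.assertFalse(
            self.parser.can_fetch("anybot", "http://example.com/private/page"))

    def test_allowed_path(self):
        self.assertTrue(
            self.parser.can_fetch("anybot", "http://example.com/public/page"))
```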

Patch (robotparser.py.patch) is for branch /branches/py3k, revision 64891.

Commit log:

Lib/urllib/robotparser.py: Fixed robotparser for Python 3.0. urlopen
returns bytes objects where str expected. Decode the bytes using UTF-8.

----------
components: Library (Lib)
files: robotparser.py.patch
keywords: patch
messages: 69586
nosy: mgiuca
severity: normal
status: open
title: urllib.robotparser doesn't work in Py3k
type: behavior
versions: Python 3.0
Added file: http://bugs.python.org/file10885/robotparser.py.patch

_______________________________________
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue3347>
_______________________________________