New submission from Matt Giuca <[EMAIL PROTECTED]>:

urllib.robotparser is broken in Python 3.0, due to a bytes object appearing where a str is expected.
Example:

    >>> import urllib.robotparser
    >>> r = urllib.robotparser.RobotFileParser('http://www.python.org/robots.txt')
    >>> r.read()
    TypeError: expected an object with the buffer interface

This is because the variable f in RobotFileParser.read is opened by urlopen as a binary file, so f.read() returns a bytes object.

I've included a patch, which checks if it's a bytes, and if so, decodes it with 'utf-8'. A more thorough fix might figure out what the charset of the document is (in f.headers['Content-Type']), but at least this works, and will be sufficient in almost all cases.

Also there are no test cases for urllib.robotparser.

Patch (robotparser.py.patch) is for branch /branches/py3k, revision 64891.

Commit log:

Lib/urllib/robotparser.py:
    Fixed robotparser for Python 3.0. urlopen returns bytes objects where str expected. Decode the bytes using UTF-8.

----------
components: Library (Lib)
files: robotparser.py.patch
keywords: patch
messages: 69586
nosy: mgiuca
severity: normal
status: open
title: urllib.robotparser doesn't work in Py3k
type: behavior
versions: Python 3.0

Added file: http://bugs.python.org/file10885/robotparser.py.patch

_______________________________________
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue3347>
_______________________________________
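
For illustration only, the "more thorough fix" mentioned in the report (take the charset from the Content-Type header, fall back to UTF-8) might look roughly like the sketch below. This is not the attached patch; the helper names decode_robots_body and parse_robots are hypothetical, and the header value is passed in as a plain string so the sketch runs without network access.

```python
from email.message import Message
import urllib.robotparser


def decode_robots_body(raw, content_type=None, default="utf-8"):
    """Return robots.txt content as str (hypothetical helper, not the patch).

    raw          -- bytes (as returned by f.read()) or an already-decoded str
    content_type -- value of the Content-Type header, if one is available
    default      -- charset assumed when the header names none
    """
    if isinstance(raw, str):
        return raw

    charset = default
    if content_type:
        # Let the stdlib header parser extract the charset parameter.
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset() or default

    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or undecodable bytes: fall back to the default
        # and replace bad bytes rather than raising.
        return raw.decode(default, errors="replace")


def parse_robots(raw, content_type=None):
    """Build a RobotFileParser from raw robots.txt data (hypothetical helper)."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(decode_robots_body(raw, content_type).splitlines())
    return parser


if __name__ == "__main__":
    body = b"User-agent: *\nDisallow: /private/\n"
    rp = parse_robots(body, "text/plain; charset=ISO-8859-1")
    print(rp.can_fetch("*", "http://example.com/private/page"))
    print(rp.can_fetch("*", "http://example.com/index.html"))
```

With a real response object, the header value would presumably come from f.headers['Content-Type'] as described above; falling back to UTF-8 with replacement keeps a malformed or mislabelled robots.txt from raising inside read().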