New submission from Michael Lazar <lazar.michae...@gmail.com>:

Hello,

I have stumbled upon a couple of inconsistencies in urllib.robotparser's 
__str__ methods.

These appear to be unintentional omissions: the code was modified, but the 
__str__ methods were never updated to match.

1. The RobotFileParser.__str__ method doesn't include the default (*) 
User-agent entry.

    >>> from urllib.robotparser import RobotFileParser
    >>> parser = RobotFileParser()
    >>> text = """
    ... User-agent: *
    ... Allow: /some/path
    ... Disallow: /another/path
    ...
    ... User-agent: Googlebot
    ... Allow: /folder1/myfile.html
    ... """
    >>> parser.parse(text.splitlines())
    >>> print(parser)
    User-agent: Googlebot
    Allow: /folder1/myfile.html
    
    
    >>>

This is *especially* awkward when parsing a valid robots.txt that only contains 
a wildcard User-agent.

    >>> from urllib.robotparser import RobotFileParser
    >>> parser = RobotFileParser()
    >>> text = """
    ... User-agent: *
    ... Allow: /some/path
    ... Disallow: /another/path
    ... """
    >>> parser.parse(text.splitlines())
    >>> print(parser)
    
    
    >>>
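
The rules themselves are still parsed and enforced in this case; only the 
string output drops them. A minimal sketch, assuming the CPython implementation 
detail that the first `*` entry is stored on the undocumented `default_entry` 
attribute rather than in the `entries` list (neither attribute is part of the 
documented API):

```python
from urllib.robotparser import RobotFileParser

text = """
User-agent: *
Allow: /some/path
Disallow: /another/path
"""

parser = RobotFileParser()
parser.parse(text.splitlines())

# Assumed implementation details (undocumented attributes): the wildcard
# entry is kept separately from the per-agent entries.
has_default = parser.default_entry is not None
named_entries = len(parser.entries)

# The documented can_fetch() API still honours the wildcard rules.
some_allowed = parser.can_fetch("AnyBot", "/some/path")
another_allowed = parser.can_fetch("AnyBot", "/another/path")

print(has_default, named_entries, some_allowed, another_allowed)
```

If that assumption holds, a __str__ that only walks `entries` would explain the 
empty output above.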


2. Support was recently added for `Crawl-delay` and `Request-rate` lines, but 
__str__ does not include these.

    >>> from urllib.robotparser import RobotFileParser
    >>> parser = RobotFileParser()
    >>> text = """
    ... User-agent: figtree
    ... Crawl-delay: 3
    ... Request-rate: 9/30
    ... Disallow: /tmp
    ... """
    >>> parser.parse(text.splitlines())
    >>> print(parser)
    User-agent: figtree
    Disallow: /tmp


    >>>
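
The parsed values are still reachable through the documented `crawl_delay()` 
and `request_rate()` methods, which suggests that only the string 
representation is missing them. A small sketch reusing the robots.txt from the 
example above:

```python
from urllib.robotparser import RobotFileParser

text = """
User-agent: figtree
Crawl-delay: 3
Request-rate: 9/30
Disallow: /tmp
"""

parser = RobotFileParser()
parser.parse(text.splitlines())

# Both values were parsed, even though print(parser) does not show them.
delay = parser.crawl_delay("figtree")
rate = parser.request_rate("figtree")  # named tuple: (requests, seconds)

print(delay, rate.requests, rate.seconds)
```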

3. Two unnecessary trailing newlines are appended to the string output 
(one after the last RuleLine and one after the last Entry).

    (see above examples)
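
One way the extra newlines could be avoided is to join the pieces with 
separators instead of appending a newline after each one. The sketch below is a 
standalone illustration, not the library's code: `render_appending` and 
`render_joining` are hypothetical helpers, and each entry is modelled as a 
plain list of lines.

```python
def render_appending(entries):
    # Append a newline after every line and after every entry;
    # the result always ends with two newlines.
    return "".join(
        "".join(line + "\n" for line in entry) + "\n" for entry in entries
    )


def render_joining(entries):
    # Join lines with a newline and entries with a blank line;
    # nothing is emitted after the last line of the last entry.
    return "\n\n".join("\n".join(entry) for entry in entries)


entries = [["User-agent: Googlebot", "Allow: /folder1/myfile.html"]]

appended = render_appending(entries)
joined = render_joining(entries)

print(repr(appended))
print(repr(joined))
```

With the joining approach, print() adds the only trailing newline, so the REPL 
output is not followed by blank lines.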


Taken on their own, these are all minor issues, but they do make things quite 
confusing when using robotparser from the REPL!

----------
components: Library (Lib)
messages: 312259
nosy: michael-lazar
priority: normal
severity: normal
status: open
title: urllib.robotparser: incomplete __str__ methods
type: behavior
versions: Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue32861>
_______________________________________