[issue25400] robotparser doesn't return crawl delay for default entry

2017-03-31 Thread Donald Stufft

Changes by Donald Stufft:


--
pull_requests: +1034

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25400] robotparser doesn't return crawl delay for default entry

2016-09-18 Thread Berker Peksag

Changes by Berker Peksag:


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed




[issue25400] robotparser doesn't return crawl delay for default entry

2016-09-18 Thread Roundup Robot

Roundup Robot added the comment:

New changeset d5d910cfd288 by Berker Peksag in branch '3.6':
Issue #25400: RobotFileParser now correctly returns default values for 
crawl_delay and request_rate
https://hg.python.org/cpython/rev/d5d910cfd288

New changeset 911070065e38 by Berker Peksag in branch 'default':
Issue #25400: Merge from 3.6
https://hg.python.org/cpython/rev/911070065e38

--
nosy: +python-dev




[issue25400] robotparser doesn't return crawl delay for default entry

2016-09-18 Thread Berker Peksag

Changes by Berker Peksag:


Added file: http://bugs.python.org/file44740/issue25400_v3.diff




[issue25400] robotparser doesn't return crawl delay for default entry

2016-09-18 Thread Berker Peksag

Berker Peksag added the comment:

Here's an updated patch.

--
versions: +Python 3.7
Added file: http://bugs.python.org/file44739/issue25400_v2.diff




[issue25400] robotparser doesn't return crawl delay for default entry

2016-09-11 Thread Berker Peksag

Berker Peksag added the comment:

I've now updated Lib/test/test_robotparser.py (issue 25497). Peter, do you have 
time to update your patch? Thanks!

--




[issue25400] robotparser doesn't return crawl delay for default entry

2015-10-14 Thread Peter Wirtz

Peter Wirtz added the comment:

On further inspection, the tests are written in such a way that a test case can 
only be checked against one useragent at a time. I will attempt to rework the 
tests so they work correctly. Any advice would be much appreciated.
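
One way to check several useragents against a single robots.txt in one test is unittest's subTest. This is only a sketch and not how Lib/test/test_robotparser.py is organized: the robots.txt content, the "slowbot" agent, and the class name are hypothetical, and the expected values assume an interpreter where crawl_delay falls back to the default entry.

```python
import unittest
import urllib.robotparser

# Hypothetical robots.txt content, made up for this sketch.
ROBOTS_TXT = """\
User-agent: slowbot
Crawl-delay: 10
Disallow: /private

User-agent: *
Crawl-delay: 120
Disallow: /cgi-bin
"""

# (useragent, expected crawl delay) pairs, all checked against the same file.
EXPECTED_DELAYS = [
    ("slowbot", 10),             # matched by its own entry
    ("test_robotparser", 120),   # no specific entry, uses the "*" entry
    ("*", 120),
]


class CrawlDelayTest(unittest.TestCase):
    def setUp(self):
        # parse() takes an iterable of lines, so no network access is needed.
        self.parser = urllib.robotparser.RobotFileParser()
        self.parser.parse(ROBOTS_TXT.splitlines())

    def test_crawl_delay_per_agent(self):
        for agent, expected in EXPECTED_DELAYS:
            # Each useragent is reported separately if its check fails.
            with self.subTest(agent=agent):
                self.assertEqual(self.parser.crawl_delay(agent), expected)


# Run the case programmatically instead of calling unittest.main().
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CrawlDelayTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

With subTest, a failure for one useragent does not stop the remaining useragents from being checked.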

--




[issue25400] robotparser doesn't return crawl delay for default entry

2015-10-14 Thread Berker Peksag

Berker Peksag added the comment:

Thanks for the patch, Peter (and welcome to Python and open source development). 
I have a WIP patch to rewrite test_robotparser in a less magic way, so we can 
ignore test failures for now. I'll take a closer look at your patch.

--
stage:  -> patch review




[issue25400] robotparser doesn't return crawl delay for default entry

2015-10-14 Thread Berker Peksag

Changes by Berker Peksag:


--
nosy: +berker.peksag




[issue25400] robotparser doesn't return crawl delay for default entry

2015-10-14 Thread Peter Wirtz

Peter Wirtz added the comment:

OK, for the meantime, I reworked the test so that it appears to test correctly 
and the tests pass. There does seem to be some magic, so I hope I did not 
overlook anything. Here is the new patch.

--
Added file: http://bugs.python.org/file40784/robotparser_crawl_delay_v2.patch




[issue25400] robotparser doesn't return crawl delay for default entry

2015-10-13 Thread Peter Wirtz

New submission from Peter Wirtz:

After changeset http://hg.python.org/lookup/dbed7cacfb7e, calling the 
crawl_delay method for a robots.txt file that has a crawl-delay for * 
useragents always returns None.

Ex:

Python 3.6.0a0 (default:1aae9b6a6929+, Oct  9 2015, 22:08:05)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib.robotparser
>>> parser = urllib.robotparser.RobotFileParser()
>>> parser.set_url('https://www.carthage.edu/robots.txt')
>>> parser.read()
>>> parser.crawl_delay('test_robotparser')
>>> parser.crawl_delay('*')
>>> print(parser.default_entry.delay)
120
>>>

Excerpt from https://www.carthage.edu/robots.txt:

User-agent: *
Crawl-Delay: 120
Disallow: /cgi-bin

I have written a patch that solves this. With the patch, the output is:

Python 3.6.0a0 (default:1aae9b6a6929+, Oct  9 2015, 22:08:05)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib.robotparser
>>> parser = urllib.robotparser.RobotFileParser()
>>> parser.set_url('https://www.carthage.edu/robots.txt')
>>> parser.read()
>>> parser.crawl_delay('test_robotparser')
120
>>> parser.crawl_delay('*')
120
>>> print(parser.default_entry.delay)
120
>>>

This also applies to the request_rate method.
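
The same behavior can be checked without network access by passing the robots.txt lines to parse() directly. This is a minimal sketch: the Request-rate line is an assumption added for illustration and is not in the carthage.edu excerpt above, and the values in the comments assume an interpreter that includes the fix.

```python
import urllib.robotparser

# The excerpt from above, plus a hypothetical Request-rate line
# (not present in the real file) to exercise request_rate as well.
ROBOTS_TXT = """\
User-agent: *
Crawl-Delay: 120
Request-rate: 3/15
Disallow: /cgi-bin
"""

parser = urllib.robotparser.RobotFileParser()
# parse() accepts an iterable of lines, so read() and set_url() are not needed.
parser.parse(ROBOTS_TXT.splitlines())

# No entry names this useragent, so both calls rely on the "*" entry.
delay = parser.crawl_delay("test_robotparser")
rate = parser.request_rate("test_robotparser")

print(delay)                        # 120 with the fix, None without it
print(rate.requests, rate.seconds)  # 3 15 with the fix
```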

--
components: Library (Lib)
files: robotparser_crawl_delay.patch
keywords: patch
messages: 252971
nosy: pwirtz
priority: normal
severity: normal
status: open
title: robotparser doesn't return crawl delay for default entry
type: behavior
versions: Python 3.6
Added file: http://bugs.python.org/file40777/robotparser_crawl_delay.patch




[issue25400] robotparser doesn't return crawl delay for default entry

2015-10-13 Thread Peter Wirtz

Peter Wirtz added the comment:

This fix breaks the unit tests, though. I am not sure how to go about checking 
those, as this would be my first contribution to Python and to an open source 
project in general.

--
