[jira] [Assigned] (NUTCH-1042) Fetcher.max.crawl.delay property not taken into account correctly when set to -1

2013-01-20 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney reassigned NUTCH-1042:
---

Assignee: Lewis John McGibbney  (was: Tejas Patil)

> Fetcher.max.crawl.delay property not taken into account correctly when set to -1
> 
>
>                 Key: NUTCH-1042
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1042
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.3
>            Reporter: Nutch User - 1
>            Assignee: Lewis John McGibbney
>             Fix For: 1.7, 2.2
>
>
> [Originally: 
> (http://lucene.472066.n3.nabble.com/A-possible-bug-or-misleading-documentation-td3162397.html).]
> From nutch-default.xml:
> "
> <property>
>  <name>fetcher.max.crawl.delay</name>
>  <value>30</value>
>  <description>
>  If the Crawl-Delay in robots.txt is set to greater than this value (in
>  seconds) then the fetcher will skip this page, generating an error report.
>  If set to -1 the fetcher will never skip such pages and will wait the
>  amount of time retrieved from robots.txt Crawl-Delay, however long that
>  might be.
>  </description>
> </property>
> "
> Fetcher.java:
> (http://svn.apache.org/viewvc/nutch/branches/branch-1.3/src/java/org/apache/nutch/fetcher/Fetcher.java?view=markup).
> Line 554 of Fetcher.java:
> "this.maxCrawlDelay = conf.getInt("fetcher.max.crawl.delay", 30) * 1000;"
> Lines 615-616 of Fetcher.java:
> "
> if (rules.getCrawlDelay() > 0) {
>   if (rules.getCrawlDelay() > maxCrawlDelay) {
> "
> Now, the documentation states that if fetcher.max.crawl.delay is set to
> -1, the crawler will always wait for as long as the Crawl-Delay
> parameter specifies. However, with that setting maxCrawlDelay is negative
> (-1 * 1000 = -1000), so the condition on line 616 is true for every
> positive Crawl-Delay, and any page whose Crawl-Delay is set is skipped.
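
The behaviour described in the report can be reproduced outside Nutch with a small standalone sketch. The class and method names below (CrawlDelayCheck, skipsAsReported, skipsWithGuard) are hypothetical and do not appear in Fetcher.java; the first check mirrors the two quoted conditions, and the second adds one possible guard that treats a negative limit as "no limit". This is an illustration under those assumptions, not the fix committed for NUTCH-1042.

```java
public class CrawlDelayCheck {

    // Mirrors line 554: the configured value is in seconds, the field is in milliseconds.
    public static long toMillis(int maxCrawlDelaySeconds) {
        return maxCrawlDelaySeconds * 1000L;
    }

    // The two conditions quoted from lines 615-616, combined into one test.
    // Returns true when the page would be skipped.
    public static boolean skipsAsReported(long crawlDelayMs, long maxCrawlDelayMs) {
        return crawlDelayMs > 0 && crawlDelayMs > maxCrawlDelayMs;
    }

    // Hypothetical guard (an assumption, not the project's code):
    // a negative limit disables skipping, as the documentation describes.
    public static boolean skipsWithGuard(long crawlDelayMs, long maxCrawlDelayMs) {
        return crawlDelayMs > 0 && maxCrawlDelayMs >= 0 && crawlDelayMs > maxCrawlDelayMs;
    }

    public static void main(String[] args) {
        long unlimited = toMillis(-1);   // -1000
        long crawlDelay = 5000;          // robots.txt Crawl-Delay of 5 seconds, in ms

        System.out.println(skipsAsReported(crawlDelay, unlimited)); // true: page is skipped
        System.out.println(skipsWithGuard(crawlDelay, unlimited));  // false: page is fetched
    }
}
```

With the default limit of 30 seconds both checks agree: a 5 second Crawl-Delay is fetched and a 60 second Crawl-Delay is skipped. They differ only when the limit is negative.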

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (NUTCH-1042) Fetcher.max.crawl.delay property not taken into account correctly when set to -1

2013-01-20 Thread Tejas Patil (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tejas Patil reassigned NUTCH-1042:
--

Assignee: Tejas Patil

> Fetcher.max.crawl.delay property not taken into account correctly when set to -1
> 
>
>                 Key: NUTCH-1042
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1042
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.3
>            Reporter: Nutch User - 1
>            Assignee: Tejas Patil
>             Fix For: 1.7, 2.2
>