[ https://issues.apache.org/jira/browse/NUTCH-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185908#comment-13185908 ]

Andrzej Bialecki commented on NUTCH-1247:
------------------------------------------

Originally the reason for using a byte was compactness, but we can get the same 
effect with a vint.
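
For illustration only (not the actual patch), a minimal sketch of what that could 
look like: keep retries as an int in memory, but serialize it with Hadoop's 
variable-length encoding, which still costs a single byte for small counts.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableUtils;

// Hypothetical excerpt, not the real CrawlDatum: retries kept as an int,
// written as a vint so typical small values still take one byte on disk.
public class RetriesAsVInt {
  private int retries;

  public void write(DataOutput out) throws IOException {
    WritableUtils.writeVInt(out, retries);   // 1 byte for values in [-112, 127]
  }

  public void readFields(DataInput in) throws IOException {
    retries = WritableUtils.readVInt(in);
  }
}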

Markus, something seems off in your setup if you get such high values of 
retries ... usually CrawlDbReducer will set STATUS_DB_GONE if the number of 
retries reaches db.fetch.retry.max, so the page will not be tried again until 
FetchSchedule.forceRefetch resets its status (and the number of retries).
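
Roughly, that behaviour looks like the following sketch (hedged, not the exact 
CrawlDbReducer source; the property name and CrawlDatum accessors are the 
standard Nutch ones):

import org.apache.hadoop.conf.Configuration;
import org.apache.nutch.crawl.CrawlDatum;

public class RetryCapSketch {
  // Sketch of the retry cap described above: once retries reach
  // db.fetch.retry.max the page is marked DB_GONE, so in a healthy crawldb
  // the counter should stay well below the range of a byte.
  public static void capRetries(CrawlDatum datum, Configuration conf) {
    int retryMax = conf.getInt("db.fetch.retry.max", 3);
    if (datum.getStatus() == CrawlDatum.STATUS_FETCH_RETRY) {
      datum.setRetriesSinceFetch(datum.getRetriesSinceFetch() + 1);
      if (datum.getRetriesSinceFetch() >= retryMax) {
        // give up until FetchSchedule.forceRefetch resets status and retries
        datum.setStatus(CrawlDatum.STATUS_DB_GONE);
      }
    }
  }
}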
                
> CrawlDatum.retries should be int
> --------------------------------
>
>                 Key: NUTCH-1247
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1247
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.4
>            Reporter: Markus Jelsma
>             Fix For: 1.5
>
>
> CrawlDatum.retries is a byte, so larger values overflow and wrap to negative counts:
> 12/01/12 18:35:22 INFO crawl.CrawlDbReader: retry -127: 1
> 12/01/12 18:35:22 INFO crawl.CrawlDbReader: retry -128: 1
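
A standalone illustration (not Nutch code) of the wrap-around behind those 
negative buckets: incrementing a Java byte past 127 overflows to -128 and 
counts back up from there.

public class ByteOverflowDemo {
  public static void main(String[] args) {
    byte retries = 126;
    for (int i = 0; i < 4; i++) {
      System.out.println("retries = " + retries);
      retries++;   // prints 126, 127, -128, -127
    }
  }
}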


        
