[ https://issues.apache.org/jira/browse/NUTCH-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14943609#comment-14943609 ]
Sebastian Nagel commented on NUTCH-2124:
----------------------------------------

I've tested the patch with the mentioned URL as the only seed URL and http.redirect.max == 5:

{noformat}
...
2015-09-28 19:46:16,183 INFO crawl.Injector - Injector: Total new urls injected: 1
...
2015-09-28 19:46:23,342 INFO fetcher.FetcherThread - fetching http://www.wikipedia.com/wiki/URL_redirection (queue crawl delay=1000ms)
2015-09-28 19:46:23,342 INFO fetcher.FetcherThread - Using queue mode : byHost
2015-09-28 19:46:23,343 DEBUG fetcher.FetcherThread - redirectCount=0
...
2015-09-28 19:46:24,096 DEBUG fetcher.FetcherThread - - protocol redirect to http://www.wikipedia.org/wiki/URL_redirection (fetching now)
2015-09-28 19:46:24,097 INFO fetcher.FetcherThread - fetching http://www.wikipedia.org/wiki/URL_redirection (queue crawl delay=1000ms)
2015-09-28 19:46:24,097 DEBUG fetcher.FetcherThread - redirectCount=1
2015-09-28 19:46:24,179 DEBUG fetcher.FetcherThread - - protocol redirect to https://www.wikipedia.org/wiki/URL_redirection (fetching now)
2015-09-28 19:46:24,180 INFO fetcher.FetcherThread - fetching https://www.wikipedia.org/wiki/URL_redirection (queue crawl delay=1000ms)
2015-09-28 19:46:24,180 DEBUG fetcher.FetcherThread - redirectCount=2
...
2015-09-28 19:46:25,460 DEBUG fetcher.FetcherThread - - protocol redirect to https://en.wikipedia.org/wiki/URL_redirection (fetching now)
2015-09-28 19:46:25,461 INFO fetcher.FetcherThread - fetching https://en.wikipedia.org/wiki/URL_redirection (queue crawl delay=1000ms)
2015-09-28 19:46:25,461 DEBUG fetcher.FetcherThread - redirectCount=3
...
2015-09-28 19:46:36,441 INFO crawl.CrawlDbReader - status 1 (db_unfetched): 58
2015-09-28 19:46:36,441 INFO crawl.CrawlDbReader - status 2 (db_fetched): 1
2015-09-28 19:46:36,441 INFO crawl.CrawlDbReader - status 5 (db_redir_perm): 3
...
{noformat}

Can you verify the solution again with the given URL and http.redirect.max large enough to follow all redirects?
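
For the re-test, the redirect limit can be overridden in the site configuration. A minimal sketch, assuming the override goes into conf/nutch-site.xml (the file location is an assumption about the local setup) and reusing the value 5 from the test above; raise it if the redirect chain turns out to be longer:

{code:xml}
<!-- Assumed location: conf/nutch-site.xml. The value 5 is taken from the test run above. -->
<property>
  <name>http.redirect.max</name>
  <value>5</value>
</property>
{code}

As far as the property description in nutch-default.xml goes, a value of 0 or below should make the fetcher record redirect targets for a later fetch cycle instead of following them immediately, so the value needs to be positive to reproduce the behavior in the log.
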
Let's track further problems as separate issues to get this problem fixed.

> redirect following same link again and again , max redirect exceed and went db_gone
> -----------------------------------------------------------------------------------
>
>                 Key: NUTCH-2124
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2124
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.11
>            Reporter: Yogendra Kumar Soni
>            Priority: Blocker
>              Labels: db_gone, fetcher, redirect
>             Fix For: 1.11
>
>         Attachments: NUTCH-2124.patch
>
>
> Hello, followredirect is not working in trunk. please see the below log.
> Fetcher: throughput threshold retries: 5
> fetcher.maxNum.threads can't be < than 50 : using 50 instead
> -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
> -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
> {color:red}
> fetching http://www.wikipedia.com/wiki/URL_redirection (queue crawl delay=5000ms)
> fetching http://www.wikipedia.com/wiki/URL_redirection (queue crawl delay=5000ms)
> -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=2
> fetching http://www.wikipedia.com/wiki/URL_redirection (queue crawl delay=5000ms)
> -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=2
> fetching http://www.wikipedia.com/wiki/URL_redirection (queue crawl delay=5000ms)
> fetching http://www.wikipedia.com/wiki/URL_redirection (queue crawl delay=5000ms)
> -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=2
> -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=2
> - redirect count exceeded http://www.wikipedia.com/wiki/URL_redirection
> {color}
> Thread FetcherThread has no more work available
> -finishing thread FetcherThread, activeThreads=0
> -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=2
> -activeThreads=0
> Fetcher: finished at 2015-09-28 19:32:05, elapsed: 00:00:09
> Parsing : 20150928193153

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)