I am trying to index a site whose URLs are 325 characters long, and it is failing.
I started with 2 URLs in the urls/seed.txt file, both 325 characters long; the only difference between them is the last 3 characters. I ran the following 2 commands:

    $ bin/nutch inject crawl/crawldb urls
    Injector: starting
    Injector: crawlDb: crawl/crawldb
    Injector: urlDir: urls
    Injector: Converting injected urls to crawl db entries.
    Injector: Merging injected urls into crawl db.
    Injector: done

    $ bin/nutch readdb crawl/crawldb -dump dump
    CrawlDb dump: starting
    CrawlDb db: crawl/crawldb
    CrawlDb dump: done

I opened the part-00000 file in the dump folder, and there is only ONE URL in it, and it has been truncated to 318 characters.

How can I make Nutch accept URLs longer than 318 characters?

----
Thanks/Regards,
Parvez
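PS: a minimal sketch of why only ONE entry shows up in the dump, assuming the truncation happens at 318 characters as observed. The example URL is a hypothetical placeholder, not my real seed URL: since my two seeds differ only in the last 3 of their 325 characters, cutting both down to 318 characters makes them identical, so they merge into a single CrawlDb entry.

    # Two hypothetical 325-char seed URLs differing only in the last 3 chars
    base = "http://example.com/" + "a" * 303   # 322 chars of shared prefix
    url1 = base + "001"                        # 325 chars
    url2 = base + "002"                        # 325 chars

    # Truncating to 318 chars drops everything past the shared prefix,
    # so both seeds collapse to the same key -> one CrawlDb entry
    truncated = {u[:318] for u in (url1, url2)}
    print(len(truncated))  # prints 1
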
