It could be tricky, from what I've seen. There are limits on how many times you can hit one host/IP.
Also, the depth you are crawling at may come into play in your case (which is probably what you want to look at here).

> Any hint on how to increase the session time of the Nutch crawl thread?
> I tried crawling with one thread, still no luck.
>
> ----
> Thanks/Regards,
> Parvez
>
> On Tue, Sep 8, 2009 at 4:02 PM, Mohamed Parvez <[email protected]> wrote:
>
>> I have paginated pages, which will only work if they are crawled in a given
>> sequence, and in the same session.
>>
>> For example, the first URLs are:
>>
>> http://www.myhost.com/?page_number=1
>> http://www.myhost.com/?page_number=2
>> http://www.myhost.com/?page_number=3
>>
>> The first page has a link to the second page.
>> The second page has links to the first and second pages.
>> The third page has links to the third and second pages.
>> And so on...
>>
>> Nutch is able to crawl the first 6 pages, but beyond that it is either not
>> able to crawl or is getting empty results.
>>
>> If I manually click through the pagination in a browser, I can reach the
>> end with no problem.
>>
>> Is the Nutch crawl session timing out? How do we increase it?
>>
>> I tried crawling with one thread but still got the same result.
>>
>> Any suggestions?
>>
>> ---
>> Thanks/Regards,
>> Parvez
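To see why depth matters for pagination, note that each crawl round (generate, fetch, parse, update) can only follow links discovered in the previous round, so on a chain of pages you reach roughly one new page per round. Here is a minimal Java simulation of that, assuming a hypothetical 20-page site where page n links to n-1 and n+1. The class and method names are made up for illustration; this is not Nutch's API, and it ignores per-host limits, politeness delays and sessions entirely:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class PaginationCrawlSim {

    // Hypothetical site: page n links to n-1 and n+1, up to lastPage.
    static List<Integer> outlinks(int page, int lastPage) {
        List<Integer> links = new ArrayList<>();
        if (page > 1) links.add(page - 1);
        if (page < lastPage) links.add(page + 1);
        return links;
    }

    // Simulates 'depth' breadth-first crawl rounds starting from page 1
    // and returns the page numbers fetched, in fetch order.
    static Set<Integer> crawl(int lastPage, int depth) {
        Set<Integer> fetched = new LinkedHashSet<>();
        Set<Integer> frontier = new LinkedHashSet<>();
        frontier.add(1); // seed: ?page_number=1
        for (int round = 0; round < depth && !frontier.isEmpty(); round++) {
            // Fetch everything queued by the previous round.
            List<Integer> fetchList = new ArrayList<>(frontier);
            frontier.clear();
            fetched.addAll(fetchList);
            // Parse: queue unfetched outlinks for the next round only.
            for (int p : fetchList) {
                for (int link : outlinks(p, lastPage)) {
                    if (!fetched.contains(link)) frontier.add(link);
                }
            }
        }
        return fetched;
    }

    public static void main(String[] args) {
        int lastPage = 20;
        for (int depth : new int[] {3, 6, 25}) {
            System.out.println("depth=" + depth + " -> pages fetched: "
                    + crawl(lastPage, depth).size());
        }
        // Prints 3, 6 and 20 pages respectively.
    }
}
```

If something like this is what you're hitting, rerunning the crawl with a larger depth (the `-depth` argument to `bin/nutch crawl`) would be the first thing I'd try before looking at session handling.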
