How many threads are you running? Nutch doesn't know about sessions;
you might have to do something like fetching with a single thread, one request at a time, but that's slow. Or maybe make Nutch aware of session cookies.

> I am crawling at depth 40 as there are 40 pages in the pagination.
>
> It works fine till the first 6 pages and after that it goes to the 7th page,
> but looks like it's a different session and hence the pagination won't work.
>
> I mean if you directly hit page 7, using the URL, the pagination won't
> work and will return an empty set.
>
> But if you go in sequence in the same session the pagination works.
>
> ---
> Thanks/Regards,
> Parvez
>
> On Wed, Sep 9, 2009 at 12:15 AM, <[email protected]> wrote:
>
>> could be tricky from what i've seen;
>>
>> there's limits on how many times you can hit one host/ip;
>>
>> also what depth you are crawling at may come into play in your case (which
>> is probably what you want to look at in this case).
>>
>> > Any hint to increase the session time of the Nutch crawl thread?
>> > I tried crawling with one thread, still no luck.
>> >
>> > ----
>> > Thanks/Regards,
>> > Parvez
>> >
>> > On Tue, Sep 8, 2009 at 4:02 PM, Mohamed Parvez <[email protected]> wrote:
>> >
>> >> I have paginated pages, which will only work if crawled in a given
>> >> sequence, and in the same session.
>> >>
>> >> For example, the first URLs are
>> >>
>> >> http://www.myhost.com/?page_number=1
>> >> http://www.myhost.com/?page_number=2
>> >> http://www.myhost.com/?page_number=3
>> >>
>> >> The first page has link to second page.
>> >> Second page has link to first and second page.
>> >> Third page has link to third and second page.
>> >> So On...
>> >>
>> >> Nutch is able to crawl the first 6 pages, but beyond that it is not
>> >> able to crawl or is getting an empty result.
>> >>
>> >> If I manually click through the pagination, in a browser, I can reach
>> >> the end with no problem.
>> >>
>> >> Is the Nutch Crawl Session timing out? How do we increase it?
>> >>
>> >> I tried crawling with one thread but still same result.
>> >>
>> >> Any suggestion?
>> >>
>> >> ---
>> >> Thanks/Regards,
>> >> Parvez
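
To illustrate the "one request at a time, with session cookies" idea from the reply at the top of this thread, here is a rough sketch in Python. It is not Nutch code and does not use any Nutch API. It only shows the behaviour the site seems to expect: every page is requested in order through one shared cookie jar. The function names `make_session_fetcher` and `crawl_in_sequence` are made up for this example. The sketch also assumes that the site tracks its session with cookies and that an empty body means the pagination broke; neither is confirmed in the thread. The URL pattern is the one from the original message.

```python
import http.cookiejar
import urllib.request
from typing import Callable, List


def make_session_fetcher(timeout: float = 30.0) -> Callable[[str], str]:
    """Return a fetch function whose requests all share one cookie jar.

    Assumption: the server identifies the session by a cookie it sets on
    the first response. If it uses URL rewriting instead, this will not help.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )

    def fetch(url: str) -> str:
        # Cookies set by earlier responses are sent back automatically.
        with opener.open(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")

    return fetch


def crawl_in_sequence(
    base_url: str, last_page: int, fetch: Callable[[str], str]
) -> List[str]:
    """Fetch page_number=1..last_page strictly in order, one at a time.

    Stops at the first empty body, which this sketch treats as a sign
    that the server no longer recognises the session.
    """
    pages: List[str] = []
    for n in range(1, last_page + 1):
        body = fetch(f"{base_url}?page_number={n}")
        if not body.strip():
            break
        pages.append(body)
    return pages
```

It would be called along the lines of `crawl_in_sequence("http://www.myhost.com/", 40, make_session_fetcher())`. Passing the fetch function in as an argument keeps the ordering logic separate from the HTTP details, so the loop can be tried against a stub before pointing it at the real host.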
