Strange, strange :). Maybe you got a timeout error? Have you changed this property in nutch-site.xml or nutch-default.xml?
<property>
  <name>http.timeout</name>
  <value>10000</value>
  <description>The default network timeout, in milliseconds.</description>
</property>

2009/4/1 陈琛 <[email protected]>

> Thanks very much ;)
>
> Here are the log from Cygwin (out.txt)
> and the Nutch log (hahoop.log).
>
> I cannot find any clues.
>
> 2009/4/1 Alejandro Gonzalez <[email protected]>
>
>> Send me the log of the crawl if possible. For sure there are some clues
>> in it.
>>
>> 2009/4/1 陈琛 <[email protected]>
>>
>> > Yes, the depth is 10 and topN is 2000...
>> >
>> > So strange... the other URLs are normal, but these 4 URLs...
>> >
>> > 2009/4/1 Alejandro Gonzalez <[email protected]>
>> >
>> > > Seems strange. Have you tried to start a crawl with just these 4 seed
>> > > pages?
>> > >
>> > > Are you setting the topN parameter?
>> > >
>> > > 2009/4/1 陈琛 <[email protected]>
>> > >
>> > > > Thanks. I have a collection of URLs; only these four do not get
>> > > > their sub-pages crawled.
>> > > >
>> > > > The URLs and crawl-urlfilter are in the attachment.
>> > > >
>> > > > 2009/4/1 Alejandro Gonzalez <[email protected]>
>> > > >
>> > > >> Is your crawl-urlfilter OK? Are you sure it's fetching them properly?
>> > > >> Maybe it's not getting the content of the pages, so it cannot
>> > > >> extract links to fetch in the next level (I suppose you have set
>> > > >> the crawl depth just for the seeds level).
>> > > >>
>> > > >> So either your filters are skipping the seeds (I suppose that's not
>> > > >> the case, since you say the URLs reach the Fetcher), or the fetching
>> > > >> is not going OK (network issues?). Take a look at that.
>> > > >>
>> > > >> 2009/4/1 陈琛 <[email protected]>
>> > > >>
>> > > >> > Hi all,
>> > > >> > I have four URLs, like this:
>> > > >> > http://www.lao-indochina.com
>> > > >> > http://www.nuol.edu.la
>> > > >> > http://www.corninc.com.la
>> > > >> > http://www.vientianecollege.laopdr.com
>> > > >> >
>> > > >> > Only the home page is fetched. Why? The sub-pages are not fetched...
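
For reference, a minimal sketch of how the http.timeout override asked about at the top of this reply could look in conf/nutch-site.xml. The 30000 value is only an assumed example to illustrate raising the timeout, not a value taken from this thread. Overrides normally go in nutch-site.xml, leaving nutch-default.xml untouched.

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: properties here override nutch-default.xml -->
<configuration>
  <property>
    <name>http.timeout</name>
    <!-- assumed example value: 30 seconds instead of the 10000 ms default -->
    <value>30000</value>
    <description>The default network timeout, in milliseconds.</description>
  </property>
</configuration>
```

If the four sites are simply slow to respond, a larger timeout like this might let the fetcher get the page content; if the home pages are already fetched with content, the timeout is probably not the cause.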
