Strange indeed :). Maybe you got a timeout error? Have you changed this
property in nutch-site.xml or nutch-default.xml?

<property>
  <name>http.timeout</name>
  <value>10000</value>
  <description>The default network timeout, in milliseconds.</description>
</property>
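
If you want to try a longer timeout, here is a minimal sketch of how the override could look in conf/nutch-site.xml (values set there take precedence over nutch-default.xml). The 30000 ms value is only an assumed example for slow servers, not a recommended setting:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Overrides the http.timeout default from nutch-default.xml.
       30000 is an illustrative value (assumption); tune it for the
       sites being crawled. -->
  <property>
    <name>http.timeout</name>
    <value>30000</value>
    <description>The default network timeout, in milliseconds.</description>
  </property>
</configuration>
```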



2009/4/1 陈琛 <[email protected]>

>
> thanks very much ;)
>
> the log from Cygwin (out.txt)
> and the Nutch log (hadoop.log)
>
>
> I cannot find any clues in them
>
> 2009/4/1 Alejandro Gonzalez <[email protected]>
>
>> Send me the crawl log if possible. There are surely some clues
>> in it.
>>
>> 2009/4/1 陈琛 <[email protected]>
>>
>> > Yes, the depth is 10 and topN is 2000...
>> >
>> > So strange... the other URLs crawl normally, but not these 4 URLs.
>> >
>> >
>> >
>> > 2009/4/1 Alejandro Gonzalez <[email protected]>
>> >
>> > > Seems strange. Have you tried starting a crawl with just these 4 seed
>> > > pages?
>> > >
>> > > Are you setting the topN parameter?
>> > >
>> > >
>> > > 2009/4/1 陈琛 <[email protected]>
>> > >
>> > > >
>> > > > Thanks. I have a collection of URLs; only these four fail to have
>> > > > their sub-pages fetched.
>> > > >
>> > > > The URLs and the crawl-urlfilter are in the attachment.
>> > > >
>> > > >
>> > > > 2009/4/1 Alejandro Gonzalez <[email protected]>
>> > > >
>> > > >> Is your crawl-urlfilter OK? Are you sure it's fetching them properly?
>> > > >> Maybe it's not getting the content of the pages, so it cannot extract
>> > > >> links to fetch at the next level (I suppose you have set the crawl
>> > > >> depth just for the seeds level).
>> > > >>
>> > > >> So either your filters are skipping the seeds (I suppose that's not the
>> > > >> case, since you say the URLs reach the Fetcher), or the fetching is not
>> > > >> going OK (network issues?). Take a look at that.
>> > > >>
>> > > >> 2009/4/1 陈琛 <[email protected]>
>> > > >>
>> > > >> > Hi all,
>> > > >> >       I have four urls, like this:
>> > > >> >       http://www.lao-indochina.com
>> > > >> >       http://www.nuol.edu.la
>> > > >> >       http://www.corninc.com.la
>> > > >> >       http://www.vientianecollege.laopdr.com
>> > > >> >
>> > > >> > Only the home page is fetched. Why? The sub-pages are not fetched...
>> > > >> >
>> > > >>
>> > > >
>> > > >
>> > >
>> >
>>
>
>
