Thanks, but I do not think this is a timeout problem.

I think these are special websites. Perhaps the links come from other
sources, like some JavaScript?
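One quick way to check that hypothesis (a sketch, not from the thread): look at the raw HTML of one of the homepages and see whether any `<a href>` links appear in it at all. If the sub-page links only exist inside JavaScript calls, a plain HTML link extractor will find nothing, which is roughly what Nutch's default HTML parser would see.

```python
import re

def extract_links(html):
    """Return the href targets of all <a> tags found in raw HTML text."""
    return re.findall(r'<a[^>]+href=["\']([^"\']+)["\']', html, re.IGNORECASE)

# Static snippet for illustration; for a real check you would download the
# homepage source (e.g. with urllib) and pass it in here. The link emitted
# only through JavaScript is invisible to the HTML link extractor.
sample = '<a href="/about.html">About</a><script>addLink("/js-only.html")</script>'
print(extract_links(sample))  # only the plain HTML link is found
```

If the list comes back empty (or missing the sub-pages you expect), the links are most likely generated client-side and never reach Nutch's outlink extraction.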

So I do not know which URLs can actually be fetched by Nutch.
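If the links really are emitted by JavaScript, one thing worth trying (a sketch, not a confirmed fix for these sites) is enabling Nutch's parse-js plugin, which tries to extract link-like strings from JavaScript. In nutch-site.xml the plugin.includes override might look like:

```xml
<property>
  <name>plugin.includes</name>
  <!-- default plugin set, with parse-js added so JavaScript is scanned
       for outlinks; adjust to match your existing plugin list -->
  <value>protocol-http|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)</value>
</property>
```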

2009/4/1 Alejandro Gonzalez <[email protected]>

> Strange, strange :). Maybe you got a timeout error? Have you changed this
> property in nutch-site or nutch-default?
>
> <property>
>  <name>http.timeout</name>
>  <value>10000</value>
>  <description>The default network timeout, in milliseconds.</description>
> </property>
>
>
>
> 2009/4/1 陈琛 <[email protected]>
>
> >
> > Thanks very much ;)
> >
> > Attached are the Cygwin log (out.txt) and the Nutch log (hadoop.log).
> >
> > I cannot find any clues in them.
> >
> > 2009/4/1 Alejandro Gonzalez <[email protected]>
> >
> >> Send me the log of the crawl if possible; for sure there are some
> >> clues in it.
> >>
> >> 2009/4/1 陈琛 <[email protected]>
> >>
> >> > Yes, the depth is 10 and topN is 2000...
> >> >
> >> > So strange... the other URLs are normal, but not these 4.
> >> >
> >> >
> >> >
> >> > 2009/4/1 Alejandro Gonzalez <[email protected]>
> >> >
> >> > > Seems strange. Have you tried to start a crawl with just these 4
> >> > > seed pages?
> >> > >
> >> > > Are you setting the topN parameter?
> >> > >
> >> > >
> >> > > 2009/4/1 陈琛 <[email protected]>
> >> > >
> >> > > >
> >> > > > Thanks. I have a collection of URLs; only these four cannot
> >> > > > fetch a subset of their pages.
> >> > > >
> >> > > > The URLs and crawl-urlfilter are in the attachment.
> >> > > >
> >> > > >
> >> > > > 2009/4/1 Alejandro Gonzalez <[email protected]>
> >> > > >
> >> > > >> Is your crawl-urlfilter OK? Are you sure it's fetching them
> >> > > >> properly? Maybe it's not getting the content of the pages, so it
> >> > > >> cannot extract links to fetch at the next level (I suppose you
> >> > > >> have set the crawl depth beyond just the seed level).
> >> > > >>
> >> > > >> So either your filters are skipping the seeds (I suppose that's
> >> > > >> not the case, since you say the URLs arrive at the Fetcher), or
> >> > > >> the fetching is not going OK (network issues?). Take a look at
> >> > > >> that.
> >> > > >>
> >> > > >> 2009/4/1 陈琛 <[email protected]>
> >> > > >>
> >> > > >> > Hi all,
> >> > > >> > I have four URLs, like these:
> >> > > >> >       http://www.lao-indochina.com
> >> > > >> >       http://www.nuol.edu.la
> >> > > >> >       http://www.corninc.com.la
> >> > > >> >       http://www.vientianecollege.laopdr.com
> >> > > >> >
> >> > > >> > Only the homepages are fetched. Why are the sub-pages not
> >> > > >> > fetched?
> >> > > >> >
> >> > > >>
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>
