Hi:

Thanks. I didn't disable the loop regexp in the url-filter file. I found an
issue report about this: https://issues.apache.org/jira/browse/NUTCH-620. I
just applied the patch and it seems to work fine. If the problem recurs, I
will try your advice.
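
For reference, here is a minimal sketch of the kind of normalization that
avoids the repeated-slash URLs, assuming the fix amounts to collapsing runs of
slashes in the URL path before a URL is queued. It is Python for illustration
only, not Nutch's own code, and `collapse_slashes` is a hypothetical helper
name:

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_slashes(url: str) -> str:
    """Collapse runs of '/' in the path of a URL (hypothetical helper).

    Only the path is touched, so the '//' after the scheme is preserved.
    """
    parts = urlsplit(url)
    # Replace two or more consecutive slashes in the path with a single one.
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    for u in (
        "http://www.me.washington.edu//people/faculty/wang/",
        "http://www.me.washington.edu////people/faculty/wang/",
    ):
        print(collapse_slashes(u))
```

With this, all the variants from my original mail map to the same URL, so
they would be treated as one page instead of an endless series of new ones.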

Thanks for your help.

yanky


2009/4/8 Stevan Kovacevic <[email protected]>

> You can disable this in the url-filter file; it is disabled by default. You
> ran into a loop on that site.
>
> On Wed, Apr 8, 2009 at 7:32 AM, yanky young <[email protected]> wrote:
>
> > Hi guys:
> >
> > I am using Nutch in a project, but I found that Nutch repeatedly fetches
> > some pages. For example:
> >
> > http://www.me.washington.edu//people/faculty/wang/
> >
> > This page is fetched. But there are also URLs like these in the
> > command-line output:
> >
> > http://www.me.washington.edu//people/faculty/wang/
> > http://www.me.washington.edu///people/faculty/wang/
> > http://www.me.washington.edu////people/faculty/wang/
> > ......
> > http://www.me.washington.edu////////////people/faculty/wang/
> >
> > It seems Nutch will repeat this process forever. Why is that?
> >
> > Any help is appreciated!
> >
> > yanky
> >
>
