Hi, thanks. I have not disabled that loop regexp in url-filter. I found an issue report about this: https://issues.apache.org/jira/browse/NUTCH-620. I just applied the patch and it seems to work fine. If the problem recurs, I will try your advice.
thanks for your help.

yanky

2009/4/8 Stevan Kovacevic <[email protected]>

> you can disable this in url-filter file, it is disabled by default. you ran
> into a loop on that site
>
> On Wed, Apr 8, 2009 at 7:32 AM, yanky young <[email protected]> wrote:
>
> > Hi guys:
> >
> > I am using nutch in a project. But I found that nutch repeat fetching some
> > pages. For example:
> >
> > http://www.me.washington.edu//people/faculty/wang/
> >
> > this is a page fetched. But also, there are some urls like this in
> > commandline output:
> >
> > http://www.me.washington.edu//people/faculty/wang/
> > http://www.me.washington.edu///people/faculty/wang/
> > http://www.me.washington.edu////people/faculty/wang/
> > ......
> > http://www.me.washington.edu////////////people/faculty/wang/
> >
> > it seems nutch will repeat this process for ever. Why is that?
> >
> > any help is appreciated!
> >
> > yanky
> >
>
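
For anyone who hits the same symptom: the URLs in the thread differ only in the number of slashes at the start of the path, so one way to stop them from being treated as distinct pages is to collapse runs of slashes before the URL is queued. The snippet below is only a rough sketch of that idea in Python. It is not Nutch code and it is not the content of the NUTCH-620 patch; the function name collapse_slashes is made up for illustration.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_slashes(url: str) -> str:
    """Collapse runs of '/' in the path component of a URL (hypothetical helper)."""
    parts = urlsplit(url)
    # Only the path is rewritten; the "//" after the scheme is not part of it.
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


# URLs copied from the crawl output quoted in the thread.
urls = [
    "http://www.me.washington.edu//people/faculty/wang/",
    "http://www.me.washington.edu///people/faculty/wang/",
    "http://www.me.washington.edu////people/faculty/wang/",
    "http://www.me.washington.edu////////////people/faculty/wang/",
]

# After normalization every variant maps to the same URL.
unique = {collapse_slashes(u) for u in urls}
print(unique)  # {'http://www.me.washington.edu/people/faculty/wang/'}
```

This assumes the server treats repeated slashes as equivalent to a single slash, which appears to be the case for the site above but is not guaranteed in general.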
