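Are you commenting out or adapting this line in crawl-urlfilter.txt?
-^(file|ftp|mailto):
If it is still active and sits above your + rule, it rejects every file: URL before your rule is ever checked.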
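To illustrate why that line matters, here is a minimal Python sketch of first-match-wins filtering, which is how I understand the regex URL filter to evaluate its rules: they are tried top to bottom, the first pattern that matches decides, and a URL that matches nothing is rejected. The function names (parse_rules, accepts) and the file name a.txt are made up for this example; this is not Nutch code, only a model of the behaviour under those assumptions.

```python
import re


def parse_rules(text):
    """Parse urlfilter-style lines into (accept, compiled_regex) pairs.

    Hypothetical helper: blank lines and lines starting with '#' are skipped,
    a leading '+' means accept and a leading '-' means reject.
    """
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return the verdict of the first matching rule; reject if none match."""
    for accept, regex in rules:
        if regex.search(url):
            return accept
    return False


# Default skip rule still active above the custom rule (assumed ordering).
DEFAULT_STILL_ACTIVE = """
-^(file|ftp|mailto):
+^file:///c:/test/
-.
"""

# Same rules with the default skip rule commented out.
DEFAULT_COMMENTED_OUT = """
# -^(file|ftp|mailto):
+^file:///c:/test/
-.
"""

blocked = parse_rules(DEFAULT_STILL_ACTIVE)
allowed = parse_rules(DEFAULT_COMMENTED_OUT)

# The '-' rule matches first, so the '+' rule is never reached.
print(accepts(blocked, "file:///c:/test/a.txt"))
# With the skip rule commented out, the '+' rule decides.
print(accepts(allowed, "file:///c:/test/a.txt"))
# The parent directory falls through to '-.' and is rejected.
print(accepts(allowed, "file:///c:/"))
# Patterns are case-sensitive here: 'C:' does not match a rule written 'c:'.
print(accepts(allowed, "file:///C:/test/"))
```

Note the last case: in this sketch the patterns are case-sensitive, as regular expressions are by default, so a seed URL written file:///C:/test/ is not accepted by a rule written +^file:///c:/test/. It may be worth making the drive letter in the rule match the seed URL exactly.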
On Thu, Apr 2, 2009 at 5:23 PM, Wolf Fischer <[email protected]> wrote:

> Hi there,
>
> I am currently trying to use Nutch for a local file directory. I have the URL to
> the directory, which looks like the following:
> file:///C:/test/
> In crawl-urlfilter.txt I added +.* for testing purposes; however, this
> resulted in the famous "bug" of also looking through the parent directories.
> So I looked into the FAQ as well as the mailing list archive and found the
> solution: I should simply add something like
> +^file:///c:/top/directory/^
> -.
> to the urlfilter.txt. So I did:
> +^file:///c:/test/
> -.
> However, if I do this the fetcher does not get any URL at all and
> immediately exits because of "no more URLs to fetch."
> I have no idea why this is not working. I tried several other solutions and
> simply can't get it to work the way I want it to work. Can somebody please
> give me a hint on what I am doing wrong?
>
> Thanks in advance!
>
> Wolf
>
> --
> Dipl.-Inf. Wolf Fischer
>
> Programming Distributed Systems Lab
> Institute of Computer Science
> University of Augsburg
> Universitätsstr. 14
> 86135 Augsburg, Germany
>
> Tel: +49 821 598-3102
> Fax: +49 821 598-2175
