Hi Wolf,

I had the same problem and solved it by adding another line to the urlfilter:

+^file:/c:/test/
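Just to sketch what the relevant part of my crawl-urlfilter.txt ended up looking like (c:/test is simply taken from your example, and the [cC] character class and exact slash counts are only my guess at how the URLs show up in your crawldb, so adjust them to what Nutch actually records):

  # if the stock rule "-^(file|ftp|mailto):" is still near the top of the
  # file, remove or comment it out, otherwise file: URLs are rejected
  # before the + rules below are ever reached

  # accept the seed URL as injected (three slashes)
  +^file:///[cC]:/test/
  # accept the outlinks the file protocol produces (in my case only one slash)
  +^file:/[cC]:/test/
  # reject everything else, including the parent directories
  -.

Remember the rules are tried top to bottom and the first match wins, so the catch-all "-." has to come last.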
Also verify whether the drive letter in your path should be upper-case (as written in your example) or lower-case (as written in your urlfilter).

Kind regards,
Martina

-----Original Message-----
From: Wolf Fischer [mailto:[email protected]]
Sent: Thursday, April 2, 2009 17:23
To: [email protected]
Subject: Problem with Crawler and Parent Directories

Hi there,

I am currently trying to use Nutch on a local file directory. The URL to the directory looks like this:

file:///C:/test/

In crawl-urlfilter.txt I added +.* for testing purposes; however, this resulted in the famous "bug" of also crawling the parent directories. So I looked into the FAQ as well as the mailing list archive and found the solution: I should simply add something like

+^file:///c:/top/directory/
-.

to the urlfilter. So I did:

+^file:///c:/test/
-.

However, if I do this the fetcher does not get any URL at all and immediately exits with "no more URLs to fetch." I have no idea why this is not working. I have tried several other solutions and simply can't get it to work the way I want it to. Can somebody please give me a hint on what I am doing wrong?

Thanks in advance!
Wolf

--
Dipl.-Inf. Wolf Fischer
Programming Distributed Systems Lab
Institute of Computer Science
University of Augsburg
Universitätsstr. 14
86135 Augsburg, Germany
Tel: +49 821 598-3102
Fax: +49 821 598-2175
