Hi there,

I am currently trying to use Nutch to crawl a local file directory. The URL of the directory looks like this:
file:///C:/test/
In crawl-urlfilter.txt I added +.* for testing purposes; however, this resulted in the famous "bug" of the crawler also walking through the parent directories. So I looked into the FAQ as well as the mailing list archive and found the solution: I should simply add something like
+^file:///c:/top/directory/^
-.
to the urlfilter.txt. So I did:
+^file:///c:/test/
-.
However, if I do this, the fetcher does not get any URL at all and exits immediately with "no more URLs to fetch." I have no idea why this is not working. I tried several other solutions and simply can't get it to work the way I want. Can somebody please give me a hint about what I am doing wrong?
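
To make clear how I understand the two rules above, here is a minimal sketch in Python. It is not Nutch's actual code. It assumes that the rules are tried from top to bottom, that the first rule whose regular expression is found in the URL decides (+ accepts, - rejects), that matching is case-sensitive, and that a URL matching no rule is dropped. The names RULES and accepts are hypothetical and exist only for this illustration.

```python
import re

# Hypothetical stand-in for the two rules I put into the filter file.
# Each entry is (sign, regular expression), in file order.
RULES = [
    ("+", r"^file:///c:/test/"),
    ("-", r"."),
]

def accepts(url, rules=RULES):
    """Return True if the URL passes the filter under the assumed semantics."""
    for sign, pattern in rules:
        # Assumption: a rule applies if its regex is found anywhere in the
        # URL, and the comparison is case-sensitive.
        if re.search(pattern, url):
            return sign == "+"
    # Assumption: a URL that matches no rule at all is rejected.
    return False

if __name__ == "__main__":
    for url in ("file:///c:/test/doc.txt", "file:///C:/test/", "file:///c:/"):
        print(url, accepts(url))
```

If these assumptions hold, a URL is only kept when it starts with exactly the string in the + rule, so any difference in spelling between the seed URL and the rule (for example the case of the drive letter) would send it to the -. rule.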

Thanks in advance!

Wolf

--
Dipl.-Inf. Wolf Fischer

Programming Distributed Systems Lab
Institute of Computer Science
University of Augsburg
Universitätsstr. 14
86135 Augsburg, Germany

Tel:    +49 821 598-3102
Fax:    +49 821 598-2175
