During the crawl I see many reports of org.apache.nutch.protocol.file.FileError: File Error: 404, all of them for locations whose paths contain spaces. I'm using Nutch 0.9. Is this really a bug? Is there a patch for it?
Here is part of the error log:

    /usr/local/apache2/resumes_txt/50/Summit Point/Marissafolli/Receptionist/Administrative Assistant /Marissa
    org.apache.nutch.protocol.file.FileError: File Error: 404
        at org.apache.nutch.protocol.file.File.getProtocolOutput(File.java:100)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:145)
    org.apache.nutch.protocol.file.FileError: File Error: 404
        at org.apache.nutch.protocol.file.File.getProtocolOutput(File.java:100)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:145)

The file actually exists:

    [r...@file ~]# ls /usr/local/apache2/resumes_txt/50/Summit\ Point/Marissafolli/Receptionist/Administrative\ Assistant\ /Marissa\'s\ Resume.txt.txt
    /usr/local/apache2/resumes_txt/50/Summit Point/Marissafolli/Receptionist/Administrative Assistant /Marissa's Resume.txt.txt

The path in the log stops at "Marissa", while the real file name continues as "Marissa's Resume.txt.txt". It looks as if Nutch fails to parse the URL. I'm using the file protocol; here is a sample URL being fetched:

    fetching file:////usr/local/apache2/resumes_txt/50/Ronceverte/tonyobrien/Owner/Operator/Anthony O
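If the raw spaces in the URLs are the problem (an assumption I have not confirmed against the Nutch 0.9 source), one thing to try is percent-encoding the paths before they go into the seed list. The sketch below uses only the JDK: java.net.URI rejects a file: URL containing raw spaces, while java.io.File.toURI() escapes each space as %20. The class and method names (FileUrlSketch, toFileUrl, isValidUri) are hypothetical and not part of Nutch. Note that an apostrophe is a legal URI character, so toURI() leaves it unescaped; whether that is what cuts the logged path off at "Marissa" is not something this sketch tests.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Nutch: builds percent-encoded file: URLs
// for a seed list, on the untested assumption that raw spaces cause the 404s.
public class FileUrlSketch {

    // File.toURI() quotes characters that are illegal in a URI path,
    // so ' ' becomes %20. An apostrophe is legal and is left as-is.
    static String toFileUrl(String path) {
        return new File(path).toURI().toString();
    }

    // Returns true if java.net.URI accepts the string without further escaping.
    static boolean isValidUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) throws URISyntaxException {
        String path = "/usr/local/apache2/resumes_txt/50/Summit Point/Marissafolli/"
                + "Receptionist/Administrative Assistant /Marissa's Resume.txt.txt";

        String raw = "file://" + path;    // spaces left unescaped
        String encoded = toFileUrl(path); // spaces escaped as %20

        System.out.println("raw valid?     " + isValidUri(raw));
        System.out.println("encoded:       " + encoded);
        System.out.println("encoded valid? " + isValidUri(encoded));

        // URI.getPath() decodes %20 back to spaces, recovering the original path.
        System.out.println("round trip ok? " + new URI(encoded).getPath().equals(path));
    }
}
```

On a Unix-like system this should report the raw form as invalid and print an encoded URL of the form file:/usr/local/.../Summit%20Point/..., which URI accepts and decodes back to the original path. Whether Nutch's file protocol then maps such a URL back to the right file is the part that still needs checking.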