Hi Nutch developers,

I may be wrong, but I think there is a bug in the fetcher and in the tutorial. I spent around 5 hours today searching for a solution.

If I run the intranet tutorial with a depth of 10, the crawl includes this URL:
http://www.nutch.org/conf/nutch-default.xml
Since this is not a known content type, an IOException is thrown. (I was wondering why an IOException is used here, since there is an UnknownContentTypeException in the OutPutThread.)

I do not understand the following lines of code in Fetcher.java:

    if (LogFormatter.hasLoggedSevere())
      throw new RuntimeException("SEVERE error logged. Exiting fetcher.");


Can someone please tell me what this does? Why does a LogFormatter handle any logic of the Fetcher?

I tried to catch the IOException, but the RuntimeException is still thrown and the crawl stops.
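
A minimal sketch of what may be happening, written against plain java.util.logging rather than the real Nutch classes. It assumes that something logs the failure at SEVERE level while the IOException is being handled, and that LogFormatter.hasLoggedSevere() simply reports a flag set by that log call. The names SevereFlagSketch, SevereTrackingHandler and fetcherAbortsDespiteCatch are hypothetical and used only for illustration.

```java
import java.io.IOException;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class SevereFlagSketch {

    // Hypothetical stand-in for LogFormatter: remembers whether any
    // SEVERE record has passed through the logger.
    static class SevereTrackingHandler extends Handler {
        private volatile boolean loggedSevere = false;

        @Override
        public void publish(LogRecord record) {
            if (record.getLevel().intValue() >= Level.SEVERE.intValue()) {
                loggedSevere = true;
            }
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }

        boolean hasLoggedSevere() {
            return loggedSevere;
        }
    }

    // Returns true if the simulated fetcher aborts with a RuntimeException
    // even though the IOException itself was caught.
    static boolean fetcherAbortsDespiteCatch() {
        Logger log = Logger.getLogger("fetcher.sketch");
        log.setUseParentHandlers(false); // keep the demo output quiet
        SevereTrackingHandler handler = new SevereTrackingHandler();
        log.addHandler(handler);
        try {
            try {
                // Assumption: the unknown content type surfaces as an IOException.
                throw new IOException("unknown content type");
            } catch (IOException e) {
                // Assumption: the failure is logged at SEVERE level here.
                log.severe("fetch failed: " + e.getMessage());
            }
            // Same shape as the check quoted from Fetcher.java.
            if (handler.hasLoggedSevere()) {
                throw new RuntimeException("SEVERE error logged. Exiting fetcher.");
            }
            return false;
        } catch (RuntimeException e) {
            return true;
        } finally {
            log.removeHandler(handler);
        }
    }

    public static void main(String[] args) {
        System.out.println(fetcherAbortsDespiteCatch());
    }
}
```

If these assumptions hold, catching the IOException cannot help: the flag is set by the log call, not by the exception, so the later check still throws.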

Thanks for any hints,
Stefan


---------------------------------------------------------------
open technology: http://www.media-style.com
open source: http://www.weta-group.net
open discussion: http://www.text-mining.org
