I logged in to check on my crawl this morning and noticed that fetching seemed
frozen.  The console output was showing an exception; I had seen the same thing
yesterday, but today I decided to let it run.  When I logged back in a while
later, it had recovered.  My timing was lucky, because an inspection of the
whole hadoop.log file revealed only this one gap:

2007-07-31 07:47:10,003 WARN  regex.RegexURLNormalizer - can't find rules for scope 'outlink', using default
2007-07-31 08:47:59,528 INFO  fetcher.Fetcher - Fetcher: done

(A bigger chunk of the log surrounding this gap follows.)

That's a gap of one hour and 49 seconds (07:47:10 to 08:47:59).  What would
cause Nutch to freeze up like that for a whole hour?
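The check described above (scanning the whole hadoop.log for stretches with no output) can be automated. Below is a minimal Python sketch. It assumes every log entry starts with a `yyyy-MM-dd HH:mm:ss,SSS` timestamp, as in the excerpts here, and that continuation lines (wrapped messages, stack-trace frames) do not. The function name `find_gaps` and its five-minute default threshold are hypothetical choices for illustration, not part of Nutch or Hadoop.

```python
import re
from datetime import datetime, timedelta

# A hadoop.log entry starts with a timestamp such as "2007-07-31 07:47:10,003 ".
# Continuation lines (wrapped messages, stack-trace frames) do not match.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ")


def find_gaps(lines, threshold=timedelta(minutes=5)):
    """Return (line_before, line_after, gap) for every pair of consecutive
    timestamped log entries that are more than `threshold` apart."""
    gaps = []
    previous = None  # (timestamp, line) of the most recent timestamped entry
    for line in lines:
        match = TIMESTAMP.match(line)
        if match is None:
            continue  # skip lines without a leading timestamp
        # %f accepts the 3-digit millisecond field and scales it to microseconds
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if previous is not None:
            gap = stamp - previous[0]
            if gap > threshold:
                gaps.append((previous[1].rstrip("\n"), line.rstrip("\n"), gap))
        previous = (stamp, line)
    return gaps
```

Passing it an open file, e.g. `with open("path/to/hadoop.log") as log: print(find_gaps(log))`, should report the stretch between the 07:47:10,003 and 08:47:59,528 entries as a gap of `1:00:49.525000`. Any other silence longer than the threshold would be listed the same way.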

Here's the command I used to run the crawl:

$ nohup time nutch crawl /usr/tmp/urls.txt -dir /usr/tmp/85sites \
    -threads 20 -depth 10 -topN 103103
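For a sense of scale, assuming the usual behavior of the one-step `crawl` tool (`-depth` is the number of generate/fetch/update rounds and `-topN` caps the size of each round's fetch list), this command can fetch at most roughly

```latex
N_{\text{fetch}} \le \text{depth} \times \text{topN} = 10 \times 103\,103 = 1\,031\,030
```

URLs in total, and fewer if a round runs out of URLs to fetch.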

I'm using the nightly build nutch-2007-06-27_06-52-44.

On the recommendation of LE QuocAnh regarding my problem from yesterday, I
decreased the number of threads from 200 to 20:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg08985.html

Here is a larger chunk of hadoop.log:

2007-07-31 07:43:01,332 INFO  fetcher.Fetcher - fetching http://blogs.ign.com/sng-ign/
2007-07-31 07:43:01,459 INFO  fetcher.Fetcher - fetching http://www.mediarights.org/film/the_rules_of_the_game.php
2007-07-31 07:45:07,869 WARN  parse.ParseUtil - No suitable parser found when trying to parse content http://www.fest21.com/_textimage/image/1185865273 of type image/png
2007-07-31 07:45:07,869 WARN  fetcher.Fetcher - Error parsing: http://www.fest21.com/_textimage/image/1185865273: org.apache.nutch.parse.ParseException: parser not found for contentType=image/png url=http://www.fest21.com/_textimage/image/1185865273
        at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:76)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:309)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:154)

2007-07-31 07:46:28,395 WARN  parse.ParseUtil - No suitable parser found when trying to parse content http://www.fest21.com/_textimage/image/1185864507 of type image/png
2007-07-31 07:46:28,395 WARN  fetcher.Fetcher - Error parsing: http://www.fest21.com/_textimage/image/1185864507: org.apache.nutch.parse.ParseException: parser not found for contentType=image/png url=http://www.fest21.com/_textimage/image/1185864507
        at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:76)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:309)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:154)

2007-07-31 07:47:08,977 INFO  plugin.PluginRepository - Plugins: looking in: /usr/local/nutch-2007-06-27_06-52-44/plugins
2007-07-31 07:47:09,274 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
2007-07-31 07:47:09,274 INFO  plugin.PluginRepository - Registered Plugins:
2007-07-31 07:47:09,274 INFO  plugin.PluginRepository -         CyberNeko HTML Parser (lib-nekohtml)
2007-07-31 07:47:09,274 INFO  plugin.PluginRepository -         Site Query Filter (query-site)
2007-07-31 07:47:09,274 INFO  plugin.PluginRepository -         Basic URL Normalizer (urlnormalizer-basic)
2007-07-31 07:47:09,274 INFO  plugin.PluginRepository -         Html Parse Plug-in (parse-html)
2007-07-31 07:47:09,274 INFO  plugin.PluginRepository -         Pass-through URL Normalizer (urlnormalizer-pass)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository -         Regex URL Filter Framework (lib-regex-filter)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository -         Feed Parse/Index/Query Plug-in (feed)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository -         Basic Indexing Filter (index-basic)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository -         Basic Summarizer Plug-in (summary-basic)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository -         Text Parse Plug-in (parse-text)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository -         JavaScript Parser (parse-js)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository -         Basic Query Filter (query-basic)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository -         Regex URL Filter (urlfilter-regex)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository -         HTTP Framework (lib-http)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository -         XML Libraries (lib-xml)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository -         URL Query Filter (query-url)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository -         Regex URL Normalizer (urlnormalizer-regex)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository -         Http Protocol Plug-in (protocol-http)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository -         the nutch core extension points (nutch-extensionpoints)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository -         OPIC Scoring Plug-in (scoring-opic)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository - Registered Extension-Points:
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository -         Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository -         Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository -         Nutch Protocol (org.apache.nutch.protocol.Protocol)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository -         Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository -         Nutch URL Filter (org.apache.nutch.net.URLFilter)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository -         HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository -         Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository -         Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository -         Nutch Content Parser (org.apache.nutch.parse.Parser)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository -         Ontology Model Loader (org.apache.nutch.ontology.Ontology)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository -         Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
2007-07-31 07:47:09,276 INFO  plugin.PluginRepository -         Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
2007-07-31 07:47:10,003 WARN  regex.RegexURLNormalizer - can't find rules for scope 'outlink', using default
2007-07-31 08:47:59,528 INFO  fetcher.Fetcher - Fetcher: done
2007-07-31 08:47:59,530 INFO  crawl.CrawlDb - CrawlDb update: starting
2007-07-31 08:47:59,530 INFO  crawl.CrawlDb - CrawlDb update: db: /usr/tmp/85sites/crawldb
2007-07-31 08:47:59,530 INFO  crawl.CrawlDb - CrawlDb update: segments: [/usr/tmp/85sites/segments/20070731002418]
2007-07-31 08:47:59,530 INFO  crawl.CrawlDb - CrawlDb update: additions allowed: true
2007-07-31 08:47:59,530 INFO  crawl.CrawlDb - CrawlDb update: URL normalizing: true
2007-07-31 08:47:59,530 INFO  crawl.CrawlDb - CrawlDb update: URL filtering: true
2007-07-31 08:47:59,559 INFO  crawl.CrawlDb - CrawlDb update: Merging segment data into db.

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
