I logged in to check my crawl this morning and noticed that fetching seemed frozen. The console output was showing an exception, the same one I had gotten yesterday, but today I decided to let it run. When I logged back in a while later, it had recovered. The timing of my first login was fortuitous, because an inspection of the whole hadoop.log file revealed only that one gap:

2007-07-31 07:47:10,003 WARN regex.RegexURLNormalizer - can't find rules for scope 'outlink', using default
2007-07-31 08:47:59,528 INFO fetcher.Fetcher - Fetcher: done

(A bigger chunk of the log surrounding this gap follows.)

That's a gap of one hour and 49 seconds. What would cause nutch to freeze up like that for a whole hour?

Here's the command I used to run the crawl:

$ nohup time nutch crawl /usr/tmp/urls.txt -dir /usr/tmp/85sites -threads 20 -depth 10 -topN 103103

I'm using the nightly build nutch-2007-06-27_06-52-44. At the recommendation of LE QuocAnh in regard to my problem from yesterday I decreased the number of threads from 200 to 20:

http://www.mail-archive.com/[EMAIL PROTECTED]/msg08985.html

Here is a larger chunk of hadoop.log:

2007-07-31 07:43:01,332 INFO fetcher.Fetcher - fetching http://blogs.ign.com/sng-ign/
2007-07-31 07:43:01,459 INFO fetcher.Fetcher - fetching http://www.mediarights.org/film/the_rules_of_the_game.php
2007-07-31 07:45:07,869 WARN parse.ParseUtil - No suitable parser found when trying to parse content http://www.fest21.com/_textimage/image/1185865273 of type image/png
2007-07-31 07:45:07,869 WARN fetcher.Fetcher - Error parsing: http://www.fest21.com/_textimage/image/1185865273: org.apache.nutch.parse.ParseException: parser not found for contentType=image/png url=http://www.fest21.com/_textimage/image/1185865273
    at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:76)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:309)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:154)
2007-07-31 07:46:28,395 WARN parse.ParseUtil - No suitable parser found when trying to parse content http://www.fest21.com/_textimage/image/1185864507 of type image/png
2007-07-31 07:46:28,395 WARN fetcher.Fetcher - Error parsing: http://www.fest21.com/_textimage/image/1185864507: org.apache.nutch.parse.ParseException: parser not found for contentType=image/png url=http://www.fest21.com/_textimage/image/1185864507
at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:76) at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:309) at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:154) 2007-07-31 07:47:08,977 INFO plugin.PluginRepository - Plugins: looking in: /usr/local/nutch-2007-06-27_06-52-44/plugins 2007-07-31 07:47:09,274 INFO plugin.PluginRepository - Plugin Auto-activation mode: [true] 2007-07-31 07:47:09,274 INFO plugin.PluginRepository - Registered Plugins: 2007-07-31 07:47:09,274 INFO plugin.PluginRepository - CyberNeko HTML Parser (lib-nekohtml) 2007-07-31 07:47:09,274 INFO plugin.PluginRepository - Site Query Filter (query-site) 2007-07-31 07:47:09,274 INFO plugin.PluginRepository - Basic URL Normalizer (urlnormalizer-basic) 2007-07-31 07:47:09,274 INFO plugin.PluginRepository - Html Parse Plug-in (parse-html) 2007-07-31 07:47:09,274 INFO plugin.PluginRepository - Pass-through URL Normalizer (urlnormalizer-pass) 2007-07-31 07:47:09,275 INFO plugin.PluginRepository - Regex URL Filter Framework (lib-regex-filter) 2007-07-31 07:47:09,275 INFO plugin.PluginRepository - Feed Parse/Index/Query Plug-in (feed) 2007-07-31 07:47:09,275 INFO plugin.PluginRepository - Basic Indexing Filter (index-basic) 2007-07-31 07:47:09,275 INFO plugin.PluginRepository - Basic Summarizer Plug-in (summary-basic) 2007-07-31 07:47:09,275 INFO plugin.PluginRepository - Text Parse Plug-in (parse-text) 2007-07-31 07:47:09,275 INFO plugin.PluginRepository - JavaScript Parser (parse-js) 2007-07-31 07:47:09,275 INFO plugin.PluginRepository - Basic Query Filter (query-basic) 2007-07-31 07:47:09,275 INFO plugin.PluginRepository - Regex URL Filter (urlfilter-regex) 2007-07-31 07:47:09,275 INFO plugin.PluginRepository - HTTP Framework (lib-http) 2007-07-31 07:47:09,275 INFO plugin.PluginRepository - XML Libraries (lib-xml) 2007-07-31 07:47:09,275 INFO plugin.PluginRepository - URL Query Filter (query-url) 2007-07-31 07:47:09,275 INFO plugin.PluginRepository - Regex 
URL Normalizer (urlnormalizer-regex) 2007-07-31 07:47:09,275 INFO plugin.PluginRepository - Http Protocol Plug-in (protocol-http) 2007-07-31 07:47:09,275 INFO plugin.PluginRepository - the nutch core extension points (nutch-extensionpoints) 2007-07-31 07:47:09,275 INFO plugin.PluginRepository - OPIC Scoring Plug-in (scoring-opic) 2007-07-31 07:47:09,275 INFO plugin.PluginRepository - Registered Extension-Points: 2007-07-31 07:47:09,275 INFO plugin.PluginRepository - Nutch Summarizer (org.apache.nutch.searcher.Summarizer) 2007-07-31 07:47:09,275 INFO plugin.PluginRepository - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter) 2007-07-31 07:47:09,275 INFO plugin.PluginRepository - Nutch Protocol (org.apache.nutch.protocol.Protocol) 2007-07-31 07:47:09,275 INFO plugin.PluginRepository - Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer) 2007-07-31 07:47:09,275 INFO plugin.PluginRepository - Nutch URL Filter (org.apache.nutch.net.URLFilter) 2007-07-31 07:47:09,275 INFO plugin.PluginRepository - HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter) 2007-07-31 07:47:09,275 INFO plugin.PluginRepository - Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer) 2007-07-31 07:47:09,275 INFO plugin.PluginRepository - Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter) 2007-07-31 07:47:09,275 INFO plugin.PluginRepository - Nutch Content Parser (org.apache.nutch.parse.Parser) 2007-07-31 07:47:09,275 INFO plugin.PluginRepository - Ontology Model Loader (org.apache.nutch.ontology.Ontology) 2007-07-31 07:47:09,275 INFO plugin.PluginRepository - Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer) 2007-07-31 07:47:09,276 INFO plugin.PluginRepository - Nutch Query Filter (org.apache.nutch.searcher.QueryFilter) 2007-07-31 07:47:10,003 WARN regex.RegexURLNormalizer - can't find rules for scope 'outlink', using default 2007-07-31 08:47:59,528 INFO fetcher.Fetcher - Fetcher: done 2007-07-31 08:47:59,530 INFO 
crawl.CrawlDb - CrawlDb update: starting 2007-07-31 08:47:59,530 INFO crawl.CrawlDb - CrawlDb update: db: /usr/tmp/85sites/crawldb 2007-07-31 08:47:59,530 INFO crawl.CrawlDb - CrawlDb update: segments: [/usr/tmp/85sites/segments/20070731002418] 2007-07-31 08:47:59,530 INFO crawl.CrawlDb - CrawlDb update: additions allowed: true 2007-07-31 08:47:59,530 INFO crawl.CrawlDb - CrawlDb update: URL normalizing: true 2007-07-31 08:47:59,530 INFO crawl.CrawlDb - CrawlDb update: URL filtering: true 2007-07-31 08:47:59,559 INFO crawl.CrawlDb - CrawlDb update: Merging segment data into db. 2007-07-31 08:48:08,626 INFO plugin.PluginRepository - Plugins: looking in: /usr/local/nutch-2007-06-27_06-52-44/plugins 2007-07-31 08:48:08,713 INFO plugin.PluginRepository - Plugin Auto-activation mode: [true] 2007-07-31 08:48:08,713 INFO plugin.PluginRepository - Registered Plugins: 2007-07-31 08:48:08,713 INFO plugin.PluginRepository - CyberNeko HTML Parser (lib-nekohtml) 2007-07-31 08:48:08,713 INFO plugin.PluginRepository - Site Query Filter (query-site) 2007-07-31 08:48:08,713 INFO plugin.PluginRepository - Basic URL Normalizer (urlnormalizer-basic) 2007-07-31 08:48:08,713 INFO plugin.PluginRepository - Html Parse Plug-in (parse-html) 2007-07-31 08:48:08,713 INFO plugin.PluginRepository - Pass-through URL Normalizer (urlnormalizer-pass) 2007-07-31 08:48:08,713 INFO plugin.PluginRepository - Regex URL Filter Framework (lib-regex-filter) 2007-07-31 08:48:08,713 INFO plugin.PluginRepository - Feed Parse/Index/Query Plug-in (feed) 2007-07-31 08:48:08,713 INFO plugin.PluginRepository - Basic Indexing Filter (index-basic) 2007-07-31 08:48:08,713 INFO plugin.PluginRepository - Basic Summarizer Plug-in (summary-basic) 2007-07-31 08:48:08,713 INFO plugin.PluginRepository - Text Parse Plug-in (parse-text) 2007-07-31 08:48:08,713 INFO plugin.PluginRepository - JavaScript Parser (parse-js) 2007-07-31 08:48:08,714 INFO plugin.PluginRepository - Basic Query Filter (query-basic) 2007-07-31 
08:48:08,714 INFO plugin.PluginRepository - Regex URL Filter (urlfilter-regex) 2007-07-31 08:48:08,714 INFO plugin.PluginRepository - HTTP Framework (lib-http) 2007-07-31 08:48:08,714 INFO plugin.PluginRepository - XML Libraries (lib-xml) 2007-07-31 08:48:08,714 INFO plugin.PluginRepository - URL Query Filter (query-url) 2007-07-31 08:48:08,714 INFO plugin.PluginRepository - Regex URL Normalizer (urlnormalizer-regex) 2007-07-31 08:48:08,714 INFO plugin.PluginRepository - Http Protocol Plug-in (protocol-http) 2007-07-31 08:48:08,714 INFO plugin.PluginRepository - the nutch core extension points (nutch-extensionpoints) 2007-07-31 08:48:08,714 INFO plugin.PluginRepository - OPIC Scoring Plug-in (scoring-opic) 2007-07-31 08:48:08,714 INFO plugin.PluginRepository - Registered Extension-Points: 2007-07-31 08:48:08,714 INFO plugin.PluginRepository - Nutch Summarizer (org.apache.nutch.searcher.Summarizer) 2007-07-31 08:48:08,714 INFO plugin.PluginRepository - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter) 2007-07-31 08:48:08,714 INFO plugin.PluginRepository - Nutch Protocol (org.apache.nutch.protocol.Protocol) 2007-07-31 08:48:08,714 INFO plugin.PluginRepository - Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer) 2007-07-31 08:48:08,714 INFO plugin.PluginRepository - Nutch URL Filter (org.apache.nutch.net.URLFilter) 2007-07-31 08:48:08,714 INFO plugin.PluginRepository - HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter) 2007-07-31 08:48:08,714 INFO plugin.PluginRepository - Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer) 2007-07-31 08:48:08,714 INFO plugin.PluginRepository - Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter) 2007-07-31 08:48:08,714 INFO plugin.PluginRepository - Nutch Content Parser (org.apache.nutch.parse.Parser) 2007-07-31 08:48:08,714 INFO plugin.PluginRepository - Ontology Model Loader (org.apache.nutch.ontology.Ontology) 2007-07-31 08:48:08,714 INFO 
plugin.PluginRepository - Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer) 2007-07-31 08:48:08,714 INFO plugin.PluginRepository - Nutch Query Filter (org.apache.nutch.searcher.QueryFilter) 2007-07-31 08:48:08,737 WARN regex.RegexURLNormalizer - can't find rules for scope 'crawldb', using default 2007-07-31 08:48:21,597 INFO plugin.PluginRepository - Plugins: looking in: /usr/local/nutch-2007-06-27_06-52-44/plugins 2007-07-31 08:48:21,682 INFO plugin.PluginRepository - Plugin Auto-activation mode: [true] 2007-07-31 08:48:21,682 INFO plugin.PluginRepository - Registered Plugins: 2007-07-31 08:48:21,682 INFO plugin.PluginRepository - CyberNeko HTML Parser (lib-nekohtml) 2007-07-31 08:48:21,682 INFO plugin.PluginRepository - Site Query Filter (query-site) 2007-07-31 08:48:21,682 INFO plugin.PluginRepository - Basic URL Normalizer (urlnormalizer-basic) 2007-07-31 08:48:21,682 INFO plugin.PluginRepository - Html Parse Plug-in (parse-html) 2007-07-31 08:48:21,682 INFO plugin.PluginRepository - Pass-through URL Normalizer (urlnormalizer-pass) 2007-07-31 08:48:21,682 INFO plugin.PluginRepository - Regex URL Filter Framework (lib-regex-filter) 2007-07-31 08:48:21,682 INFO plugin.PluginRepository - Feed Parse/Index/Query Plug-in (feed) 2007-07-31 08:48:21,682 INFO plugin.PluginRepository - Basic Indexing Filter (index-basic) 2007-07-31 08:48:21,683 INFO plugin.PluginRepository - Basic Summarizer Plug-in (summary-basic) 2007-07-31 08:48:21,683 INFO plugin.PluginRepository - Text Parse Plug-in (parse-text) 2007-07-31 08:48:21,683 INFO plugin.PluginRepository - JavaScript Parser (parse-js) 2007-07-31 08:48:21,683 INFO plugin.PluginRepository - Basic Query Filter (query-basic) 2007-07-31 08:48:21,683 INFO plugin.PluginRepository - Regex URL Filter (urlfilter-regex) 2007-07-31 08:48:21,683 INFO plugin.PluginRepository - HTTP Framework (lib-http) 2007-07-31 08:48:21,683 INFO plugin.PluginRepository - XML Libraries (lib-xml) 2007-07-31 08:48:21,683 INFO 
plugin.PluginRepository - URL Query Filter (query-url) 2007-07-31 08:48:21,683 INFO plugin.PluginRepository - Regex URL Normalizer (urlnormalizer-regex) 2007-07-31 08:48:21,683 INFO plugin.PluginRepository - Http Protocol Plug-in (protocol-http) 2007-07-31 08:48:21,683 INFO plugin.PluginRepository - the nutch core extension points (nutch-extensionpoints) 2007-07-31 08:48:21,683 INFO plugin.PluginRepository - OPIC Scoring Plug-in (scoring-opic) 2007-07-31 08:48:21,683 INFO plugin.PluginRepository - Registered Extension-Points: 2007-07-31 08:48:21,683 INFO plugin.PluginRepository - Nutch Summarizer (org.apache.nutch.searcher.Summarizer) 2007-07-31 08:48:21,683 INFO plugin.PluginRepository - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter) 2007-07-31 08:48:21,683 INFO plugin.PluginRepository - Nutch Protocol (org.apache.nutch.protocol.Protocol) 2007-07-31 08:48:21,683 INFO plugin.PluginRepository - Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer) 2007-07-31 08:48:21,683 INFO plugin.PluginRepository - Nutch URL Filter (org.apache.nutch.net.URLFilter) 2007-07-31 08:48:21,683 INFO plugin.PluginRepository - HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter) 2007-07-31 08:48:21,683 INFO plugin.PluginRepository - Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer) 2007-07-31 08:48:21,683 INFO plugin.PluginRepository - Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter) 2007-07-31 08:48:21,683 INFO plugin.PluginRepository - Nutch Content Parser (org.apache.nutch.parse.Parser) 2007-07-31 08:48:21,683 INFO plugin.PluginRepository - Ontology Model Loader (org.apache.nutch.ontology.Ontology) 2007-07-31 08:48:21,683 INFO plugin.PluginRepository - Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer) 2007-07-31 08:48:21,683 INFO plugin.PluginRepository - Nutch Query Filter (org.apache.nutch.searcher.QueryFilter) 2007-07-31 08:48:21,706 WARN regex.RegexURLNormalizer - can't find rules for scope 'crawldb', 
using default 2007-07-31 08:48:34,141 INFO plugin.PluginRepository - Plugins: looking in: /usr/local/nutch-2007-06-27_06-52-44/plugins 2007-07-31 08:48:34,239 INFO plugin.PluginRepository - Plugin Auto-activation mode: [true] 2007-07-31 08:48:34,239 INFO plugin.PluginRepository - Registered Plugins: 2007-07-31 08:48:34,239 INFO plugin.PluginRepository - CyberNeko HTML Parser (lib-nekohtml) 2007-07-31 08:48:34,239 INFO plugin.PluginRepository - Site Query Filter (query-site) 2007-07-31 08:48:34,239 INFO plugin.PluginRepository - Basic URL Normalizer (urlnormalizer-basic) 2007-07-31 08:48:34,239 INFO plugin.PluginRepository - Html Parse Plug-in (parse-html) 2007-07-31 08:48:34,240 INFO plugin.PluginRepository - Pass-through URL Normalizer (urlnormalizer-pass) 2007-07-31 08:48:34,240 INFO plugin.PluginRepository - Regex URL Filter Framework (lib-regex-filter) 2007-07-31 08:48:34,240 INFO plugin.PluginRepository - Feed Parse/Index/Query Plug-in (feed) 2007-07-31 08:48:34,240 INFO plugin.PluginRepository - Basic Indexing Filter (index-basic) 2007-07-31 08:48:34,240 INFO plugin.PluginRepository - Basic Summarizer Plug-in (summary-basic) 2007-07-31 08:48:34,240 INFO plugin.PluginRepository - Text Parse Plug-in (parse-text) 2007-07-31 08:48:34,240 INFO plugin.PluginRepository - JavaScript Parser (parse-js) 2007-07-31 08:48:34,240 INFO plugin.PluginRepository - Basic Query Filter (query-basic) 2007-07-31 08:48:34,240 INFO plugin.PluginRepository - Regex URL Filter (urlfilter-regex) 2007-07-31 08:48:34,240 INFO plugin.PluginRepository - HTTP Framework (lib-http) 2007-07-31 08:48:34,240 INFO plugin.PluginRepository - XML Libraries (lib-xml) 2007-07-31 08:48:34,240 INFO plugin.PluginRepository - URL Query Filter (query-url) 2007-07-31 08:48:34,240 INFO plugin.PluginRepository - Regex URL Normalizer (urlnormalizer-regex) 2007-07-31 08:48:34,240 INFO plugin.PluginRepository - Http Protocol Plug-in (protocol-http) 2007-07-31 08:48:34,240 INFO plugin.PluginRepository - the nutch 
core extension points (nutch-extensionpoints) 2007-07-31 08:48:34,240 INFO plugin.PluginRepository - OPIC Scoring Plug-in (scoring-opic) 2007-07-31 08:48:34,240 INFO plugin.PluginRepository - Registered Extension-Points: 2007-07-31 08:48:34,240 INFO plugin.PluginRepository - Nutch Summarizer (org.apache.nutch.searcher.Summarizer) 2007-07-31 08:48:34,240 INFO plugin.PluginRepository - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter) 2007-07-31 08:48:34,240 INFO plugin.PluginRepository - Nutch Protocol (org.apache.nutch.protocol.Protocol) 2007-07-31 08:48:34,240 INFO plugin.PluginRepository - Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer) 2007-07-31 08:48:34,240 INFO plugin.PluginRepository - Nutch URL Filter (org.apache.nutch.net.URLFilter) 2007-07-31 08:48:34,240 INFO plugin.PluginRepository - HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter) 2007-07-31 08:48:34,240 INFO plugin.PluginRepository - Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer) 2007-07-31 08:48:34,240 INFO plugin.PluginRepository - Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter) 2007-07-31 08:48:34,240 INFO plugin.PluginRepository - Nutch Content Parser (org.apache.nutch.parse.Parser) 2007-07-31 08:48:34,240 INFO plugin.PluginRepository - Ontology Model Loader (org.apache.nutch.ontology.Ontology) 2007-07-31 08:48:34,240 INFO plugin.PluginRepository - Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer) 2007-07-31 08:48:34,240 INFO plugin.PluginRepository - Nutch Query Filter (org.apache.nutch.searcher.QueryFilter) 2007-07-31 08:48:34,264 WARN regex.RegexURLNormalizer - can't find rules for scope 'crawldb', using default 2007-07-31 08:50:03,544 INFO plugin.PluginRepository - Plugins: looking in: /usr/local/nutch-2007-06-27_06-52-44/plugins 2007-07-31 08:50:03,632 INFO plugin.PluginRepository - Plugin Auto-activation mode: [true] 2007-07-31 08:50:03,632 INFO plugin.PluginRepository - Registered Plugins: 
2007-07-31 08:50:03,632 INFO plugin.PluginRepository - CyberNeko HTML Parser (lib-nekohtml) 2007-07-31 08:50:03,632 INFO plugin.PluginRepository - Site Query Filter (query-site) 2007-07-31 08:50:03,632 INFO plugin.PluginRepository - Basic URL Normalizer (urlnormalizer-basic) 2007-07-31 08:50:03,632 INFO plugin.PluginRepository - Html Parse Plug-in (parse-html) 2007-07-31 08:50:03,632 INFO plugin.PluginRepository - Pass-through URL Normalizer (urlnormalizer-pass) 2007-07-31 08:50:03,632 INFO plugin.PluginRepository - Regex URL Filter Framework (lib-regex-filter) 2007-07-31 08:50:03,632 INFO plugin.PluginRepository - Feed Parse/Index/Query Plug-in (feed) 2007-07-31 08:50:03,632 INFO plugin.PluginRepository - Basic Indexing Filter (index-basic) 2007-07-31 08:50:03,632 INFO plugin.PluginRepository - Basic Summarizer Plug-in (summary-basic) 2007-07-31 08:50:03,632 INFO plugin.PluginRepository - Text Parse Plug-in (parse-text) 2007-07-31 08:50:03,632 INFO plugin.PluginRepository - JavaScript Parser (parse-js) 2007-07-31 08:50:03,632 INFO plugin.PluginRepository - Basic Query Filter (query-basic) 2007-07-31 08:50:03,632 INFO plugin.PluginRepository - Regex URL Filter (urlfilter-regex) 2007-07-31 08:50:03,632 INFO plugin.PluginRepository - HTTP Framework (lib-http) 2007-07-31 08:50:03,632 INFO plugin.PluginRepository - XML Libraries (lib-xml) 2007-07-31 08:50:03,632 INFO plugin.PluginRepository - URL Query Filter (query-url) 2007-07-31 08:50:03,633 INFO plugin.PluginRepository - Regex URL Normalizer (urlnormalizer-regex) 2007-07-31 08:50:03,633 INFO plugin.PluginRepository - Http Protocol Plug-in (protocol-http) 2007-07-31 08:50:03,633 INFO plugin.PluginRepository - the nutch core extension points (nutch-extensionpoints) 2007-07-31 08:50:03,633 INFO plugin.PluginRepository - OPIC Scoring Plug-in (scoring-opic) 2007-07-31 08:50:03,633 INFO plugin.PluginRepository - Registered Extension-Points: 2007-07-31 08:50:03,633 INFO plugin.PluginRepository - Nutch Summarizer 
(org.apache.nutch.searcher.Summarizer) 2007-07-31 08:50:03,633 INFO plugin.PluginRepository - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter) 2007-07-31 08:50:03,633 INFO plugin.PluginRepository - Nutch Protocol (org.apache.nutch.protocol.Protocol) 2007-07-31 08:50:03,633 INFO plugin.PluginRepository - Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer) 2007-07-31 08:50:03,633 INFO plugin.PluginRepository - Nutch URL Filter (org.apache.nutch.net.URLFilter) 2007-07-31 08:50:03,633 INFO plugin.PluginRepository - HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter) 2007-07-31 08:50:03,633 INFO plugin.PluginRepository - Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer) 2007-07-31 08:50:03,633 INFO plugin.PluginRepository - Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter) 2007-07-31 08:50:03,633 INFO plugin.PluginRepository - Nutch Content Parser (org.apache.nutch.parse.Parser) 2007-07-31 08:50:03,633 INFO plugin.PluginRepository - Ontology Model Loader (org.apache.nutch.ontology.Ontology) 2007-07-31 08:50:03,633 INFO plugin.PluginRepository - Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer) 2007-07-31 08:50:03,633 INFO plugin.PluginRepository - Nutch Query Filter (org.apache.nutch.searcher.QueryFilter) 2007-07-31 08:50:03,656 WARN regex.RegexURLNormalizer - can't find rules for scope 'crawldb', using default 2007-07-31 08:51:07,967 INFO plugin.PluginRepository - Plugins: looking in: /usr/local/nutch-2007-06-27_06-52-44/plugins 2007-07-31 08:51:08,063 INFO plugin.PluginRepository - Plugin Auto-activation mode: [true] 2007-07-31 08:51:08,063 INFO plugin.PluginRepository - Registered Plugins: 2007-07-31 08:51:08,063 INFO plugin.PluginRepository - CyberNeko HTML Parser (lib-nekohtml) 2007-07-31 08:51:08,063 INFO plugin.PluginRepository - Site Query Filter (query-site) 2007-07-31 08:51:08,063 INFO plugin.PluginRepository - Basic URL Normalizer (urlnormalizer-basic) 2007-07-31 08:51:08,063 
INFO plugin.PluginRepository - Html Parse Plug-in (parse-html) 2007-07-31 08:51:08,063 INFO plugin.PluginRepository - Pass-through URL Normalizer (urlnormalizer-pass) 2007-07-31 08:51:08,063 INFO plugin.PluginRepository - Regex URL Filter Framework (lib-regex-filter) 2007-07-31 08:51:08,063 INFO plugin.PluginRepository - Feed Parse/Index/Query Plug-in (feed) 2007-07-31 08:51:08,063 INFO plugin.PluginRepository - Basic Indexing Filter (index-basic) 2007-07-31 08:51:08,063 INFO plugin.PluginRepository - Basic Summarizer Plug-in (summary-basic) 2007-07-31 08:51:08,063 INFO plugin.PluginRepository - Text Parse Plug-in (parse-text) 2007-07-31 08:51:08,063 INFO plugin.PluginRepository - JavaScript Parser (parse-js) 2007-07-31 08:51:08,063 INFO plugin.PluginRepository - Basic Query Filter (query-basic) 2007-07-31 08:51:08,063 INFO plugin.PluginRepository - Regex URL Filter (urlfilter-regex) 2007-07-31 08:51:08,063 INFO plugin.PluginRepository - HTTP Framework (lib-http) 2007-07-31 08:51:08,063 INFO plugin.PluginRepository - XML Libraries (lib-xml) 2007-07-31 08:51:08,063 INFO plugin.PluginRepository - URL Query Filter (query-url) 2007-07-31 08:51:08,063 INFO plugin.PluginRepository - Regex URL Normalizer (urlnormalizer-regex) 2007-07-31 08:51:08,063 INFO plugin.PluginRepository - Http Protocol Plug-in (protocol-http) 2007-07-31 08:51:08,064 INFO plugin.PluginRepository - the nutch core extension points (nutch-extensionpoints) 2007-07-31 08:51:08,064 INFO plugin.PluginRepository - OPIC Scoring Plug-in (scoring-opic) 2007-07-31 08:51:08,064 INFO plugin.PluginRepository - Registered Extension-Points: 2007-07-31 08:51:08,064 INFO plugin.PluginRepository - Nutch Summarizer (org.apache.nutch.searcher.Summarizer) 2007-07-31 08:51:08,064 INFO plugin.PluginRepository - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter) 2007-07-31 08:51:08,064 INFO plugin.PluginRepository - Nutch Protocol (org.apache.nutch.protocol.Protocol) 2007-07-31 08:51:08,064 INFO plugin.PluginRepository 
- Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer) 2007-07-31 08:51:08,064 INFO plugin.PluginRepository - Nutch URL Filter (org.apache.nutch.net.URLFilter) 2007-07-31 08:51:08,064 INFO plugin.PluginRepository - HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter) 2007-07-31 08:51:08,064 INFO plugin.PluginRepository - Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer) 2007-07-31 08:51:08,064 INFO plugin.PluginRepository - Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter) 2007-07-31 08:51:08,064 INFO plugin.PluginRepository - Nutch Content Parser (org.apache.nutch.parse.Parser) 2007-07-31 08:51:08,064 INFO plugin.PluginRepository - Ontology Model Loader (org.apache.nutch.ontology.Ontology) 2007-07-31 08:51:08,064 INFO plugin.PluginRepository - Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer) 2007-07-31 08:51:08,064 INFO plugin.PluginRepository - Nutch Query Filter (org.apache.nutch.searcher.QueryFilter) 2007-07-31 08:51:08,086 WARN regex.RegexURLNormalizer - can't find rules for scope 'crawldb', using default 2007-07-31 08:52:07,809 INFO plugin.PluginRepository - Plugins: looking in: /usr/local/nutch-2007-06-27_06-52-44/plugins 2007-07-31 08:52:07,899 INFO plugin.PluginRepository - Plugin Auto-activation mode: [true] 2007-07-31 08:52:07,899 INFO plugin.PluginRepository - Registered Plugins: 2007-07-31 08:52:07,899 INFO plugin.PluginRepository - CyberNeko HTML Parser (lib-nekohtml) 2007-07-31 08:52:07,899 INFO plugin.PluginRepository - Site Query Filter (query-site) 2007-07-31 08:52:07,899 INFO plugin.PluginRepository - Basic URL Normalizer (urlnormalizer-basic) 2007-07-31 08:52:07,899 INFO plugin.PluginRepository - Html Parse Plug-in (parse-html) 2007-07-31 08:52:07,899 INFO plugin.PluginRepository - Pass-through URL Normalizer (urlnormalizer-pass) 2007-07-31 08:52:07,900 INFO plugin.PluginRepository - Regex URL Filter Framework (lib-regex-filter) 2007-07-31 08:52:07,900 INFO 
plugin.PluginRepository - Feed Parse/Index/Query Plug-in (feed) 2007-07-31 08:52:07,900 INFO plugin.PluginRepository - Basic Indexing Filter (index-basic) 2007-07-31 08:52:07,900 INFO plugin.PluginRepository - Basic Summarizer Plug-in (summary-basic) 2007-07-31 08:52:07,900 INFO plugin.PluginRepository - Text Parse Plug-in (parse-text) 2007-07-31 08:52:07,900 INFO plugin.PluginRepository - JavaScript Parser (parse-js) 2007-07-31 08:52:07,900 INFO plugin.PluginRepository - Basic Query Filter (query-basic) 2007-07-31 08:52:07,900 INFO plugin.PluginRepository - Regex URL Filter (urlfilter-regex) 2007-07-31 08:52:07,900 INFO plugin.PluginRepository - HTTP Framework (lib-http) 2007-07-31 08:52:07,900 INFO plugin.PluginRepository - XML Libraries (lib-xml) 2007-07-31 08:52:07,900 INFO plugin.PluginRepository - URL Query Filter (query-url) 2007-07-31 08:52:07,900 INFO plugin.PluginRepository - Regex URL Normalizer (urlnormalizer-regex) 2007-07-31 08:52:07,900 INFO plugin.PluginRepository - Http Protocol Plug-in (protocol-http) 2007-07-31 08:52:07,900 INFO plugin.PluginRepository - the nutch core extension points (nutch-extensionpoints) 2007-07-31 08:52:07,900 INFO plugin.PluginRepository - OPIC Scoring Plug-in (scoring-opic) 2007-07-31 08:52:07,900 INFO plugin.PluginRepository - Registered Extension-Points: 2007-07-31 08:52:07,900 INFO plugin.PluginRepository - Nutch Summarizer (org.apache.nutch.searcher.Summarizer) 2007-07-31 08:52:07,900 INFO plugin.PluginRepository - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter) 2007-07-31 08:52:07,900 INFO plugin.PluginRepository - Nutch Protocol (org.apache.nutch.protocol.Protocol) 2007-07-31 08:52:07,900 INFO plugin.PluginRepository - Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer) 2007-07-31 08:52:07,900 INFO plugin.PluginRepository - Nutch URL Filter (org.apache.nutch.net.URLFilter) 2007-07-31 08:52:07,900 INFO plugin.PluginRepository - HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter) 2007-07-31 
08:52:07,900 INFO plugin.PluginRepository - Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer) 2007-07-31 08:52:07,900 INFO plugin.PluginRepository - Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter) 2007-07-31 08:52:07,900 INFO plugin.PluginRepository - Nutch Content Parser (org.apache.nutch.parse.Parser) 2007-07-31 08:52:07,900 INFO plugin.PluginRepository - Ontology Model Loader (org.apache.nutch.ontology.Ontology) 2007-07-31 08:52:07,900 INFO plugin.PluginRepository - Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer) 2007-07-31 08:52:07,900 INFO plugin.PluginRepository - Nutch Query Filter (org.apache.nutch.searcher.QueryFilter) 2007-07-31 08:52:07,923 WARN regex.RegexURLNormalizer - can't find rules for scope 'crawldb', using default 2007-07-31 08:53:25,575 INFO plugin.PluginRepository - Plugins: looking in: /usr/local/nutch-2007-06-27_06-52-44/plugins 2007-07-31 08:53:25,670 INFO plugin.PluginRepository - Plugin Auto-activation mode: [true] 2007-07-31 08:53:25,670 INFO plugin.PluginRepository - Registered Plugins: 2007-07-31 08:53:25,670 INFO plugin.PluginRepository - CyberNeko HTML Parser (lib-nekohtml) 2007-07-31 08:53:25,670 INFO plugin.PluginRepository - Site Query Filter (query-site) 2007-07-31 08:53:25,670 INFO plugin.PluginRepository - Basic URL Normalizer (urlnormalizer-basic) 2007-07-31 08:53:25,670 INFO plugin.PluginRepository - Html Parse Plug-in (parse-html) 2007-07-31 08:53:25,670 INFO plugin.PluginRepository - Pass-through URL Normalizer (urlnormalizer-pass) 2007-07-31 08:53:25,670 INFO plugin.PluginRepository - Regex URL Filter Framework (lib-regex-filter) 2007-07-31 08:53:25,670 INFO plugin.PluginRepository - Feed Parse/Index/Query Plug-in (feed) 2007-07-31 08:53:25,670 INFO plugin.PluginRepository - Basic Indexing Filter (index-basic) 2007-07-31 08:53:25,670 INFO plugin.PluginRepository - Basic Summarizer Plug-in (summary-basic) 2007-07-31 08:53:25,670 INFO plugin.PluginRepository 
- Text Parse Plug-in (parse-text) 2007-07-31 08:53:25,670 INFO plugin.PluginRepository - JavaScript Parser (parse-js) 2007-07-31 08:53:25,670 INFO plugin.PluginRepository - Basic Query Filter (query-basic) 2007-07-31 08:53:25,670 INFO plugin.PluginRepository - Regex URL Filter (urlfilter-regex) 2007-07-31 08:53:25,670 INFO plugin.PluginRepository - HTTP Framework (lib-http) 2007-07-31 08:53:25,670 INFO plugin.PluginRepository - XML Libraries (lib-xml) 2007-07-31 08:53:25,670 INFO plugin.PluginRepository - URL Query Filter (query-url) 2007-07-31 08:53:25,670 INFO plugin.PluginRepository - Regex URL Normalizer (urlnormalizer-regex) 2007-07-31 08:53:25,670 INFO plugin.PluginRepository - Http Protocol Plug-in (protocol-http)
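
In case it helps anyone check their own hadoop.log for similar stalls, here is a rough Python sketch of how the gap could be found automatically rather than by eye. It is not part of Nutch; the names `parse_ts` and `find_gaps` and the 5-minute threshold are my own choices for illustration, and it assumes every log entry starts with a timestamp in the `YYYY-MM-DD HH:MM:SS,mmm` form shown above (lines without one, such as stack-trace frames, are skipped).

```python
import re
from datetime import datetime, timedelta

# Leading timestamp of a log entry, e.g. "2007-07-31 07:47:10,003".
# Assumption: the log uses this format at the start of every entry.
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")


def parse_ts(line):
    """Return the entry's timestamp, or None for continuation lines."""
    m = TS.match(line)
    if m is None:
        return None  # e.g. a stack-trace frame beginning with "at ..."
    base = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    return base + timedelta(milliseconds=int(m.group(2)))


def find_gaps(lines, threshold=timedelta(minutes=5)):
    """Return (gap, line_before, line_after) for each silence >= threshold."""
    gaps = []
    prev_ts = None
    prev_line = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue
        if prev_ts is not None and ts - prev_ts >= threshold:
            gaps.append((ts - prev_ts, prev_line.rstrip(), line.rstrip()))
        prev_ts, prev_line = ts, line
    return gaps


# The two entries on either side of the stall, copied from the log above.
sample = [
    "2007-07-31 07:47:10,003 WARN regex.RegexURLNormalizer - can't find rules for scope 'outlink', using default",
    "2007-07-31 08:47:59,528 INFO fetcher.Fetcher - Fetcher: done",
]
for gap, before, after in find_gaps(sample):
    print(gap, "between:", before, "| and:", after)
```

To scan a whole file, pass the open file object instead of the sample list, for example `find_gaps(open("hadoop.log"))`. This only locates the silence; it says nothing about what the fetcher was doing during it.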
