2007-01-29 15:48:49,844 INFO conf.Configuration (Configuration.java:loadResource(397)) - parsing jar:file:/E:/work/digibot_news/lib/hadoop-0.4.0-patched.jar!/hadoop-default.xml
2007-01-29 15:48:50,079 INFO conf.Configuration (Configuration.java:loadResource(397)) - parsing file:/E:/work/digibot_news/build_tmp/nutch-default.xml
2007-01-29 15:48:50,173 INFO conf.Configuration (Configuration.java:loadResource(397)) - parsing file:/E:/work/digibot_news/build_tmp/nutch-site.xml
2007-01-29 15:48:50,204 INFO conf.Configuration (Configuration.java:loadResource(397)) - parsing file:/E:/work/digibot_news/build_tmp/hadoop-site.xml
2007-01-29 15:48:50,219 INFO plugin.PluginRepository (PluginManifestParser.java:parsePluginFolder(81)) - Plugins: looking in: E:\work\digibot_news\plugins
2007-01-29 15:48:50,641 WARN plugin.PluginRepository (PluginManifestParser.java:parsePluginFolder(102)) - java.io.FileNotFoundException: E:\work\digibot_news\plugins\parse-xml\plugin.xml (The system cannot find the file specified.)
2007-01-29 15:48:50,907 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(333)) - Plugin Auto-activation mode: [true]
2007-01-29 15:48:50,907 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(334)) - Registered Plugins:
2007-01-29 15:48:50,907 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(341)) - CyberNeko HTML Parser (lib-nekohtml)
2007-01-29 15:48:50,907 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(341)) - Site Query Filter (query-site)
2007-01-29 15:48:50,907 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(341)) - Html Parse Plug-in (parse-html)
2007-01-29 15:48:50,907 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(341)) - Jakarta Commons HTTP Client (lib-commons-httpclient)
2007-01-29 15:48:50,907 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(341)) - Regex URL Filter Framework (lib-regex-filter)
2007-01-29 15:48:50,923 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(341)) - Basic Indexing Filter (index-basic)
2007-01-29 15:48:50,923 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(341)) - Basic Summarizer Plug-in (summary-basic)
2007-01-29 15:48:50,923 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(341)) - File Protocol Plug-in (protocol-file)
2007-01-29 15:48:50,923 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(341)) - Text Parse Plug-in (parse-text)
2007-01-29 15:48:50,923 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(341)) - JavaScript Parser (parse-js)
2007-01-29 15:48:50,923 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(341)) - Regex URL Filter (urlfilter-regex)
2007-01-29 15:48:50,938 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(341)) - Basic Query Filter (query-basic)
2007-01-29 15:48:50,938 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(341)) - XML Libraries (lib-xml)
2007-01-29 15:48:50,938 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(341)) - HTTP Framework (lib-http)
2007-01-29 15:48:50,938 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(341)) - URL Query Filter (query-url)
2007-01-29 15:48:50,938 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(341)) - Log4j (lib-log4j)
2007-01-29 15:48:50,938 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(341)) - Zip Parse Plug-in (parse-zip)
2007-01-29 15:48:50,938 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(341)) - Http Protocol Plug-in (protocol-http)
2007-01-29 15:48:50,938 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(341)) - RSS Parse Plug-in (parse-rss)
2007-01-29 15:48:50,938 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(341)) - the nutch core extension points (nutch-extensionpoints)
2007-01-29 15:48:50,938 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(341)) - OPIC Scoring Plug-in (scoring-opic)
2007-01-29 15:48:50,954 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(345)) - Registered Extension-Points:
2007-01-29 15:48:50,954 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(352)) - Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
2007-01-29 15:48:50,954 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(352)) - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
2007-01-29 15:48:50,954 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(352)) - Nutch Protocol (org.apache.nutch.protocol.Protocol)
2007-01-29 15:48:50,954 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(352)) - Nutch URL Filter (org.apache.nutch.net.URLFilter)
2007-01-29 15:48:51,032 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(352)) - HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
2007-01-29 15:48:51,032 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(352)) - Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2007-01-29 15:48:51,094 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(352)) - Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2007-01-29 15:48:51,094 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(352)) - Nutch Content Parser (org.apache.nutch.parse.Parser)
2007-01-29 15:48:51,094 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(352)) - Ontology Model Loader (org.apache.nutch.ontology.Ontology)
2007-01-29 15:48:51,094 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(352)) - Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
2007-01-29 15:48:51,110 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(352)) - Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
2007-01-29 15:48:51,516 INFO conf.Configuration (Configuration.java:getConfResourceAsInputStream(340)) - found resource parse-plugins.xml at file:/E:/work/digibot_news/build_tmp/parse-plugins.xml
2007-01-29 15:48:51,751 WARN parse.ParseUtil (ParseUtil.java:parseByExtensionId(126)) - No suitable parser found when trying to parse content
url: file:/E:/work/digibot_news/xmltest.xml
base: file:/E:/work/digibot_news/xmltest.xml
contentType: text/xml
metadata: Content-Length=168 Content-Type= Last-Modified=Mon, 29 Jan 2007 05:47:16 GMT
Content:
<?xml version="1.0" encoding="UTF-8"?>
<dc xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:title>XMLParser parse XML data using namespace and XPath</dc:title>
</dc>