I can explain why the "Input path doesn't exist" error disappeared
when you used Windows-like paths.

I have not used Cygwin in a long time, but I guess your Cygwin
distribution does not have its own Java. So when you execute
bin/nutch, it must be using the java.exe found in one of the folders
on your Windows PATH. That Java doesn't know what /cygdrive/c/nutch
is, because it is not running as part of Cygwin and doesn't use the
Cygwin emulation layer. It needs the Windows-like path C:/nutch,
since it is running as an ordinary Windows program.
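
To illustrate the idea, here is a rough sketch of the path translation
that a Windows java.exe will not do for you. The function name
to_windows_path is made up for this example, and it only handles the
default /cygdrive/<letter> prefix. On a real Cygwin install, the
bundled cygpath tool ("cygpath -m <path>") should be the more reliable
way to get the same forward-slash Windows form.

```shell
# Hypothetical helper (not part of Nutch or Cygwin): rewrite a
# /cygdrive/<letter>/... path as <letter>:/... so that a native
# Windows java.exe can resolve it. Other paths pass through unchanged.
to_windows_path() {
    local path="$1"
    case "$path" in
        /cygdrive/[A-Za-z]/*)
            local rest="${path#/cygdrive/}"   # e.g. c/nutch/content
            local drive="${rest%%/*}"         # e.g. c
            printf '%s:/%s\n' "$drive" "${rest#*/}"
            ;;
        /cygdrive/[A-Za-z])
            # Bare drive root, e.g. /cygdrive/c
            printf '%s:/\n' "${path#/cygdrive/}"
            ;;
        *)
            # Already a Windows-like or relative path: leave it alone
            printf '%s\n' "$path"
            ;;
    esac
}

# Assumed usage, with an illustrative path:
#   nutch crawl "$(to_windows_path /cygdrive/c/nutch/urls.txt)" -depth 3
```

Converting the arguments before they reach bin/nutch keeps the shell
session in Cygwin paths while handing Java only paths it understands.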

My description might be technically a little inaccurate but I hope I
have conveyed the basic idea properly.

Regards,
Susam Pal
http://susam.in/

On 7/27/07, Kai_testing Middleton <[EMAIL PROTECTED]> wrote:
> I really wonder if this is some kind of nutch + cygwin error.  Check this 
> out.  I change the paths to windows-like paths (not the cygwin mounted paths 
> -- maybe the /cygwin/c mount point is the problem).  Note that I use forward 
> slashes in the windows-like paths:  I no longer get the "Input path doesnt 
> exist" error, though I still get a failure.
>
> [EMAIL PROTECTED] /cygdrive/c/nutch-2007-07-26_04-01-20/logs
> $ nutch crawl C:/nutch-2007-07-26_04-01-20/content/urls.txt -dir c:/nutch-2007-07-26_04-01-20/content/sf911truth -depth 3 -topN 200
> crawl started in: c:/nutch-2007-07-26_04-01-20/content/sf911truth
> rootUrlDir = C:/nutch-2007-07-26_04-01-20/content/urls.txt
> threads = 10
> depth = 3
> topN = 200
> Injector: starting
> Injector: crawlDb: c:/nutch-2007-07-26_04-01-20/content/sf911truth/crawldb
> Injector: urlDir: C:/nutch-2007-07-26_04-01-20/content/urls.txt
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: done
> Generator: Selecting best-scoring urls due for fetch.
> Generator: starting
> Generator: segment: 
> c:/nutch-2007-07-26_04-01-20/content/sf911truth/segments/20070727003008
> Generator: filtering: false
> Generator: topN: 200
> Generator: jobtracker is 'local', generating exactly one partition.
> Generator: Partitioning selected urls by host, for politeness.
> Generator: done.
> Fetcher: starting
> Fetcher: segment: 
> c:/nutch-2007-07-26_04-01-20/content/sf911truth/segments/20070727003008
> Fetcher: threads: 10
> fetching http://www.sf911truth.org/
> Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
>         at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:499)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)
>
> [EMAIL PROTECTED] /cygdrive/c/nutch-2007-07-26_04-01-20/logs
> $ cat hadoop.log
> 2007-07-27 00:30:03,171 INFO  crawl.Crawl - crawl started in: 
> c:/nutch-2007-07-26_04-01-20/content/sf911truth
> 2007-07-27 00:30:03,187 INFO  crawl.Crawl - rootUrlDir = 
> C:/nutch-2007-07-26_04-01-20/content/urls.txt
> 2007-07-27 00:30:03,187 INFO  crawl.Crawl - threads = 10
> 2007-07-27 00:30:03,187 INFO  crawl.Crawl - depth = 3
> 2007-07-27 00:30:03,187 INFO  crawl.Crawl - topN = 200
> 2007-07-27 00:30:03,281 INFO  crawl.Injector - Injector: starting
> 2007-07-27 00:30:03,281 INFO  crawl.Injector - Injector: crawlDb: 
> c:/nutch-2007-07-26_04-01-20/content/sf911truth/crawld
> b
> 2007-07-27 00:30:03,281 INFO  crawl.Injector - Injector: urlDir: 
> C:/nutch-2007-07-26_04-01-20/content/urls.txt
> 2007-07-27 00:30:03,296 INFO  crawl.Injector - Injector: Converting injected 
> urls to crawl db entries.
> 2007-07-27 00:30:04,031 INFO  plugin.PluginRepository - Plugins: looking in: 
> C:\nutch-2007-07-26_04-01-20\plugins
> 2007-07-27 00:30:04,296 INFO  plugin.PluginRepository - Plugin 
> Auto-activation mode: [true]
> 2007-07-27 00:30:04,296 INFO  plugin.PluginRepository - Registered Plugins:
> 2007-07-27 00:30:04,296 INFO  plugin.PluginRepository -         CyberNeko 
> HTML Parser (lib-nekohtml)
> 2007-07-27 00:30:04,296 INFO  plugin.PluginRepository -         Site Query 
> Filter (query-site)
> 2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Basic URL 
> Normalizer (urlnormalizer-basic)
> 2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Html Parse 
> Plug-in (parse-html)
> 2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Pass-through 
> URL Normalizer (urlnormalizer-pass)
> 2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Regex URL 
> Filter Framework (lib-regex-filter)
> 2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Feed 
> Parse/Index/Query Plug-in (feed)
> 2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Basic 
> Indexing Filter (index-basic)
> 2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Basic 
> Summarizer Plug-in (summary-basic)
> 2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Text Parse 
> Plug-in (parse-text)
> 2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         JavaScript 
> Parser (parse-js)
> 2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Basic Query 
> Filter (query-basic)
> 2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Regex URL 
> Filter (urlfilter-regex)
> 2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         HTTP 
> Framework (lib-http)
> 2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         XML Libraries 
> (lib-xml)
> 2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         URL Query 
> Filter (query-url)
> 2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Regex URL 
> Normalizer (urlnormalizer-regex)
> 2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Http Protocol 
> Plug-in (protocol-http)
> 2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         the nutch 
> core extension points (nutch-extensionpoints)
> 2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         OPIC Scoring 
> Plug-in (scoring-opic)
> 2007-07-27 00:30:04,312 INFO  plugin.PluginRepository - Registered 
> Extension-Points:
> 2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Nutch 
> Summarizer (org.apache.nutch.searcher.Summarizer)
> 2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Nutch Scoring 
> (org.apache.nutch.scoring.ScoringFilter)
> 2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Nutch 
> Protocol (org.apache.nutch.protocol.Protocol)
> 2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Nutch URL 
> Normalizer (org.apache.nutch.net.URLNormalizer
> )
> 2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Nutch URL 
> Filter (org.apache.nutch.net.URLFilter)
> 2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         HTML Parse 
> Filter (org.apache.nutch.parse.HtmlParseFilte
> r)
> 2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Nutch Online 
> Search Results Clustering Plugin (org.apach
> e.nutch.clustering.OnlineClusterer)
> 2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Nutch 
> Indexing Filter (org.apache.nutch.indexer.Indexing
> Filter)
> 2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Nutch Content 
> Parser (org.apache.nutch.parse.Parser)
> 2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Ontology 
> Model Loader (org.apache.nutch.ontology.Ontolog
> y)
> 2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Nutch 
> Analysis (org.apache.nutch.analysis.NutchAnalyzer)
>
> 2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Nutch Query 
> Filter (org.apache.nutch.searcher.QueryFilte
> r)
> 2007-07-27 00:30:04,375 WARN  regex.RegexURLNormalizer - can't find rules for 
> scope 'inject', using default
> 2007-07-27 00:30:06,046 INFO  crawl.Injector - Injector: Merging injected 
> urls into crawl db.
> 2007-07-27 00:30:06,640 WARN  util.NativeCodeLoader - Unable to load 
> native-hadoop library for your platform... using bu
> iltin-java classes where applicable
> 2007-07-27 00:30:07,500 INFO  crawl.Injector - Injector: done
> 2007-07-27 00:30:08,500 INFO  crawl.Generator - Generator: Selecting 
> best-scoring urls due for fetch.
> 2007-07-27 00:30:08,500 INFO  crawl.Generator - Generator: starting
> 2007-07-27 00:30:08,500 INFO  crawl.Generator - Generator: segment: 
> c:/nutch-2007-07-26_04-01-20/content/sf911truth/segm
> ents/20070727003008
> 2007-07-27 00:30:08,500 INFO  crawl.Generator - Generator: filtering: false
> 2007-07-27 00:30:08,500 INFO  crawl.Generator - Generator: topN: 200
> 2007-07-27 00:30:08,531 INFO  crawl.Generator - Generator: jobtracker is 
> 'local', generating exactly one partition.
> 2007-07-27 00:30:08,984 INFO  plugin.PluginRepository - Plugins: looking in: 
> C:\nutch-2007-07-26_04-01-20\plugins
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository - Plugin 
> Auto-activation mode: [true]
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository - Registered Plugins:
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         CyberNeko 
> HTML Parser (lib-nekohtml)
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Site Query 
> Filter (query-site)
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Basic URL 
> Normalizer (urlnormalizer-basic)
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Html Parse 
> Plug-in (parse-html)
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Pass-through 
> URL Normalizer (urlnormalizer-pass)
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Regex URL 
> Filter Framework (lib-regex-filter)
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Feed 
> Parse/Index/Query Plug-in (feed)
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Basic 
> Indexing Filter (index-basic)
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Basic 
> Summarizer Plug-in (summary-basic)
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Text Parse 
> Plug-in (parse-text)
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         JavaScript 
> Parser (parse-js)
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Basic Query 
> Filter (query-basic)
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Regex URL 
> Filter (urlfilter-regex)
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         HTTP 
> Framework (lib-http)
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         XML Libraries 
> (lib-xml)
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         URL Query 
> Filter (query-url)
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Regex URL 
> Normalizer (urlnormalizer-regex)
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Http Protocol 
> Plug-in (protocol-http)
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         the nutch 
> core extension points (nutch-extensionpoints)
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         OPIC Scoring 
> Plug-in (scoring-opic)
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository - Registered 
> Extension-Points:
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Nutch 
> Summarizer (org.apache.nutch.searcher.Summarizer)
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Nutch Scoring 
> (org.apache.nutch.scoring.ScoringFilter)
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Nutch 
> Protocol (org.apache.nutch.protocol.Protocol)
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Nutch URL 
> Normalizer (org.apache.nutch.net.URLNormalizer
> )
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Nutch URL 
> Filter (org.apache.nutch.net.URLFilter)
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         HTML Parse 
> Filter (org.apache.nutch.parse.HtmlParseFilte
> r)
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Nutch Online 
> Search Results Clustering Plugin (org.apach
> e.nutch.clustering.OnlineClusterer)
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Nutch 
> Indexing Filter (org.apache.nutch.indexer.Indexing
> Filter)
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Nutch Content 
> Parser (org.apache.nutch.parse.Parser)
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Ontology 
> Model Loader (org.apache.nutch.ontology.Ontolog
> y)
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Nutch 
> Analysis (org.apache.nutch.analysis.NutchAnalyzer)
>
> 2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Nutch Query 
> Filter (org.apache.nutch.searcher.QueryFilte
> r)
> 2007-07-27 00:30:09,218 INFO  crawl.FetchScheduleFactory - Using 
> FetchSchedule impl: org.apache.nutch.crawl.DefaultFetch
> Schedule
> 2007-07-27 00:30:09,218 INFO  crawl.AbstractFetchSchedule - 
> defaultInterval=2592000.0
> 2007-07-27 00:30:09,218 INFO  crawl.AbstractFetchSchedule - 
> maxInterval=7776000.0
> 2007-07-27 00:30:09,234 WARN  regex.RegexURLNormalizer - can't find rules for 
> scope 'partition', using default
> 2007-07-27 00:30:09,296 INFO  plugin.PluginRepository - Plugins: looking in: 
> C:\nutch-2007-07-26_04-01-20\plugins
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository - Plugin 
> Auto-activation mode: [true]
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository - Registered Plugins:
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         CyberNeko 
> HTML Parser (lib-nekohtml)
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Site Query 
> Filter (query-site)
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Basic URL 
> Normalizer (urlnormalizer-basic)
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Html Parse 
> Plug-in (parse-html)
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Pass-through 
> URL Normalizer (urlnormalizer-pass)
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Regex URL 
> Filter Framework (lib-regex-filter)
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Feed 
> Parse/Index/Query Plug-in (feed)
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Basic 
> Indexing Filter (index-basic)
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Basic 
> Summarizer Plug-in (summary-basic)
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Text Parse 
> Plug-in (parse-text)
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         JavaScript 
> Parser (parse-js)
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Basic Query 
> Filter (query-basic)
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Regex URL 
> Filter (urlfilter-regex)
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         HTTP 
> Framework (lib-http)
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         XML Libraries 
> (lib-xml)
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         URL Query 
> Filter (query-url)
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Regex URL 
> Normalizer (urlnormalizer-regex)
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Http Protocol 
> Plug-in (protocol-http)
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         the nutch 
> core extension points (nutch-extensionpoints)
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         OPIC Scoring 
> Plug-in (scoring-opic)
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository - Registered 
> Extension-Points:
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Nutch 
> Summarizer (org.apache.nutch.searcher.Summarizer)
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Nutch Scoring 
> (org.apache.nutch.scoring.ScoringFilter)
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Nutch 
> Protocol (org.apache.nutch.protocol.Protocol)
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Nutch URL 
> Normalizer (org.apache.nutch.net.URLNormalizer
> )
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Nutch URL 
> Filter (org.apache.nutch.net.URLFilter)
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         HTML Parse 
> Filter (org.apache.nutch.parse.HtmlParseFilte
> r)
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Nutch Online 
> Search Results Clustering Plugin (org.apach
> e.nutch.clustering.OnlineClusterer)
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Nutch 
> Indexing Filter (org.apache.nutch.indexer.Indexing
> Filter)
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Nutch Content 
> Parser (org.apache.nutch.parse.Parser)
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Ontology 
> Model Loader (org.apache.nutch.ontology.Ontolog
> y)
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Nutch 
> Analysis (org.apache.nutch.analysis.NutchAnalyzer)
>
> 2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Nutch Query 
> Filter (org.apache.nutch.searcher.QueryFilte
> r)
> 2007-07-27 00:30:09,500 INFO  crawl.FetchScheduleFactory - Using 
> FetchSchedule impl: org.apache.nutch.crawl.DefaultFetch
> Schedule
> 2007-07-27 00:30:09,500 INFO  crawl.AbstractFetchSchedule - 
> defaultInterval=2592000.0
> 2007-07-27 00:30:09,500 INFO  crawl.AbstractFetchSchedule - 
> maxInterval=7776000.0
> 2007-07-27 00:30:10,187 INFO  crawl.Generator - Generator: Partitioning 
> selected urls by host, for politeness.
> 2007-07-27 00:30:10,687 INFO  plugin.PluginRepository - Plugins: looking in: 
> C:\nutch-2007-07-26_04-01-20\plugins
> 2007-07-27 00:30:10,859 INFO  plugin.PluginRepository - Plugin 
> Auto-activation mode: [true]
> 2007-07-27 00:30:10,859 INFO  plugin.PluginRepository - Registered Plugins:
> 2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         CyberNeko 
> HTML Parser (lib-nekohtml)
> 2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Site Query 
> Filter (query-site)
> 2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Basic URL 
> Normalizer (urlnormalizer-basic)
> 2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Html Parse 
> Plug-in (parse-html)
> 2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Pass-through 
> URL Normalizer (urlnormalizer-pass)
> 2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Regex URL 
> Filter Framework (lib-regex-filter)
> 2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Feed 
> Parse/Index/Query Plug-in (feed)
> 2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Basic 
> Indexing Filter (index-basic)
> 2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Basic 
> Summarizer Plug-in (summary-basic)
> 2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Text Parse 
> Plug-in (parse-text)
> 2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         JavaScript 
> Parser (parse-js)
> 2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Basic Query 
> Filter (query-basic)
> 2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Regex URL 
> Filter (urlfilter-regex)
> 2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         HTTP 
> Framework (lib-http)
> 2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         XML Libraries 
> (lib-xml)
> 2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         URL Query 
> Filter (query-url)
> 2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Regex URL 
> Normalizer (urlnormalizer-regex)
> 2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Http Protocol 
> Plug-in (protocol-http)
> 2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         the nutch 
> core extension points (nutch-extensionpoints)
> 2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         OPIC Scoring 
> Plug-in (scoring-opic)
> 2007-07-27 00:30:10,859 INFO  plugin.PluginRepository - Registered 
> Extension-Points:
> 2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Nutch 
> Summarizer (org.apache.nutch.searcher.Summarizer)
> 2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Nutch Scoring 
> (org.apache.nutch.scoring.ScoringFilter)
> 2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Nutch 
> Protocol (org.apache.nutch.protocol.Protocol)
> 2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Nutch URL 
> Normalizer (org.apache.nutch.net.URLNormalizer
> )
> 2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Nutch URL 
> Filter (org.apache.nutch.net.URLFilter)
> 2007-07-27 00:30:10,875 INFO  plugin.PluginRepository -         HTML Parse 
> Filter (org.apache.nutch.parse.HtmlParseFilte
> r)
> 2007-07-27 00:30:10,875 INFO  plugin.PluginRepository -         Nutch Online 
> Search Results Clustering Plugin (org.apach
> e.nutch.clustering.OnlineClusterer)
> 2007-07-27 00:30:10,875 INFO  plugin.PluginRepository -         Nutch 
> Indexing Filter (org.apache.nutch.indexer.Indexing
> Filter)
> 2007-07-27 00:30:10,875 INFO  plugin.PluginRepository -         Nutch Content 
> Parser (org.apache.nutch.parse.Parser)
> 2007-07-27 00:30:10,875 INFO  plugin.PluginRepository -         Ontology 
> Model Loader (org.apache.nutch.ontology.Ontolog
> y)
> 2007-07-27 00:30:10,875 INFO  plugin.PluginRepository -         Nutch 
> Analysis (org.apache.nutch.analysis.NutchAnalyzer)
>
> 2007-07-27 00:30:10,875 INFO  plugin.PluginRepository -         Nutch Query 
> Filter (org.apache.nutch.searcher.QueryFilte
> r)
> 2007-07-27 00:30:10,890 WARN  regex.RegexURLNormalizer - can't find rules for 
> scope 'partition', using default
> 2007-07-27 00:30:11,625 INFO  crawl.Generator - Generator: done.
> 2007-07-27 00:30:11,625 INFO  fetcher.Fetcher - Fetcher: starting
> 2007-07-27 00:30:11,625 INFO  fetcher.Fetcher - Fetcher: segment: 
> c:/nutch-2007-07-26_04-01-20/content/sf911truth/segmen
> ts/20070727003008
> 2007-07-27 00:30:12,078 INFO  fetcher.Fetcher - Fetcher: threads: 10
> 2007-07-27 00:30:12,093 INFO  plugin.PluginRepository - Plugins: looking in: 
> C:\nutch-2007-07-26_04-01-20\plugins
> 2007-07-27 00:30:12,218 INFO  plugin.PluginRepository - Plugin 
> Auto-activation mode: [true]
> 2007-07-27 00:30:12,218 INFO  plugin.PluginRepository - Registered Plugins:
> 2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         CyberNeko 
> HTML Parser (lib-nekohtml)
> 2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Site Query 
> Filter (query-site)
> 2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Basic URL 
> Normalizer (urlnormalizer-basic)
> 2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Html Parse 
> Plug-in (parse-html)
> 2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Pass-through 
> URL Normalizer (urlnormalizer-pass)
> 2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Regex URL 
> Filter Framework (lib-regex-filter)
> 2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Feed 
> Parse/Index/Query Plug-in (feed)
> 2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Basic 
> Indexing Filter (index-basic)
> 2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Basic 
> Summarizer Plug-in (summary-basic)
> 2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Text Parse 
> Plug-in (parse-text)
> 2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         JavaScript 
> Parser (parse-js)
> 2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Basic Query 
> Filter (query-basic)
> 2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Regex URL 
> Filter (urlfilter-regex)
> 2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         HTTP 
> Framework (lib-http)
> 2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         XML Libraries 
> (lib-xml)
> 2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         URL Query 
> Filter (query-url)
> 2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Regex URL 
> Normalizer (urlnormalizer-regex)
> 2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Http Protocol 
> Plug-in (protocol-http)
> 2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         the nutch 
> core extension points (nutch-extensionpoints)
> 2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         OPIC Scoring 
> Plug-in (scoring-opic)
> 2007-07-27 00:30:12,218 INFO  plugin.PluginRepository - Registered 
> Extension-Points:
> 2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Nutch 
> Summarizer (org.apache.nutch.searcher.Summarizer)
> 2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Nutch Scoring 
> (org.apache.nutch.scoring.ScoringFilter)
> 2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Nutch 
> Protocol (org.apache.nutch.protocol.Protocol)
> 2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Nutch URL 
> Normalizer (org.apache.nutch.net.URLNormalizer
> )
> 2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Nutch URL 
> Filter (org.apache.nutch.net.URLFilter)
> 2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         HTML Parse 
> Filter (org.apache.nutch.parse.HtmlParseFilte
> r)
> 2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Nutch Online 
> Search Results Clustering Plugin (org.apach
> e.nutch.clustering.OnlineClusterer)
> 2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Nutch 
> Indexing Filter (org.apache.nutch.indexer.Indexing
> Filter)
> 2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Nutch Content 
> Parser (org.apache.nutch.parse.Parser)
> 2007-07-27 00:30:12,234 INFO  plugin.PluginRepository -         Ontology 
> Model Loader (org.apache.nutch.ontology.Ontolog
> y)
> 2007-07-27 00:30:12,234 INFO  plugin.PluginRepository -         Nutch 
> Analysis (org.apache.nutch.analysis.NutchAnalyzer)
>
> 2007-07-27 00:30:12,234 INFO  plugin.PluginRepository -         Nutch Query 
> Filter (org.apache.nutch.searcher.QueryFilte
> r)
> 2007-07-27 00:30:12,265 INFO  fetcher.Fetcher - fetching 
> http://www.sf911truth.org/
> 2007-07-27 00:30:12,312 FATAL api.RobotRulesParser - Agent we advertise 
> (microlith-nutch) not listed first in 'http.robo
> ts.agents' property!
> 2007-07-27 00:30:12,312 INFO  http.Http - http.proxy.host = null
> 2007-07-27 00:30:12,312 INFO  http.Http - http.proxy.port = 8080
> 2007-07-27 00:30:12,312 INFO  http.Http - http.timeout = 10000
> 2007-07-27 00:30:12,312 INFO  http.Http - http.content.limit = 65536
> 2007-07-27 00:30:12,312 INFO  http.Http - http.agent = 
> microlith-nutch/Nutch-1.0-dev (crawler nutch-2007-07-26_04-01-20;
>  http://hopoo.dyndns.org; kai(underscore)testing(att)yahoo(dotcom))
> 2007-07-27 00:30:12,312 INFO  http.Http - protocol.plugin.check.blocking = 
> true
> 2007-07-27 00:30:12,312 INFO  http.Http - protocol.plugin.check.robots = true
> 2007-07-27 00:30:12,312 INFO  http.Http - fetcher.server.delay = 3000
> 2007-07-27 00:30:12,312 INFO  http.Http - http.max.delays = 100
> 2007-07-27 00:30:13,578 WARN  regex.RegexURLNormalizer - can't find rules for 
> scope 'outlink', using default
> 2007-07-27 00:30:13,640 INFO  crawl.SignatureFactory - Using Signature impl: 
> org.apache.nutch.crawl.MD5Signature
> 2007-07-27 00:30:14,406 INFO  plugin.PluginRepository - Plugins: looking in: 
> C:\nutch-2007-07-26_04-01-20\plugins
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository - Plugin 
> Auto-activation mode: [true]
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository - Registered Plugins:
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         CyberNeko 
> HTML Parser (lib-nekohtml)
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Site Query 
> Filter (query-site)
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Basic URL 
> Normalizer (urlnormalizer-basic)
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Html Parse 
> Plug-in (parse-html)
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Pass-through 
> URL Normalizer (urlnormalizer-pass)
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Regex URL 
> Filter Framework (lib-regex-filter)
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Feed 
> Parse/Index/Query Plug-in (feed)
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Basic 
> Indexing Filter (index-basic)
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Basic 
> Summarizer Plug-in (summary-basic)
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Text Parse 
> Plug-in (parse-text)
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         JavaScript 
> Parser (parse-js)
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Basic Query 
> Filter (query-basic)
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Regex URL 
> Filter (urlfilter-regex)
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         HTTP 
> Framework (lib-http)
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         XML Libraries 
> (lib-xml)
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         URL Query 
> Filter (query-url)
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Regex URL 
> Normalizer (urlnormalizer-regex)
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Http Protocol 
> Plug-in (protocol-http)
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         the nutch 
> core extension points (nutch-extensionpoints)
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         OPIC Scoring 
> Plug-in (scoring-opic)
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository - Registered 
> Extension-Points:
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Nutch 
> Summarizer (org.apache.nutch.searcher.Summarizer)
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Nutch Scoring 
> (org.apache.nutch.scoring.ScoringFilter)
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Nutch 
> Protocol (org.apache.nutch.protocol.Protocol)
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Nutch URL 
> Normalizer (org.apache.nutch.net.URLNormalizer
> )
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Nutch URL 
> Filter (org.apache.nutch.net.URLFilter)
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         HTML Parse 
> Filter (org.apache.nutch.parse.HtmlParseFilte
> r)
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Nutch Online 
> Search Results Clustering Plugin (org.apach
> e.nutch.clustering.OnlineClusterer)
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Nutch 
> Indexing Filter (org.apache.nutch.indexer.Indexing
> Filter)
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Nutch Content 
> Parser (org.apache.nutch.parse.Parser)
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Ontology 
> Model Loader (org.apache.nutch.ontology.Ontolog
> y)
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Nutch 
> Analysis (org.apache.nutch.analysis.NutchAnalyzer)
>
> 2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Nutch Query 
> Filter (org.apache.nutch.searcher.QueryFilte
> r)
> 2007-07-27 00:30:14,718 WARN  mapred.LocalJobRunner - job_8r2j8
> java.lang.IllegalArgumentException: Illegal Capacity: -1
>         at java.util.ArrayList.<init>(ArrayList.java:111)
>         at 
> org.apache.nutch.parse.ParseOutputFormat$1.write(ParseOutputFormat.java:149)
>         at 
> org.apache.nutch.fetcher.FetcherOutputFormat$1.write(FetcherOutputFormat.java:94)
>         at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:311)
>         at 
> org.apache.hadoop.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:41)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:326)
>         at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:155)
>
