I really wonder whether this is some kind of Nutch + Cygwin problem.  Check this
out: I changed the paths to Windows-style paths instead of the Cygwin-mounted
paths (maybe the /cygdrive/c mount point is the problem).  Note that I use
forward slashes in the Windows-style paths.  I no longer get the "Input path
doesnt exist" error, though the crawl still fails.
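
For anyone who wants to try the same path change, here is a minimal sketch of
the conversion I did by hand. It assumes the default /cygdrive/<letter> mount
prefix, and the function name cygdrive_to_mixed is made up for illustration
(Cygwin's own `cygpath -m` should give the same forward-slash form):

```python
import re

# Matches Cygwin's default drive mount prefix, e.g. "/cygdrive/c" or "/cygdrive/c/...".
# Assumption: the mount prefix has not been changed from the default "/cygdrive".
_CYGDRIVE = re.compile(r"^/cygdrive/([A-Za-z])(/.*)?$")


def cygdrive_to_mixed(path: str) -> str:
    """Return a Windows path with forward slashes for a /cygdrive path.

    Paths that do not start with /cygdrive/<letter> are returned unchanged.
    """
    match = _CYGDRIVE.match(path)
    if match is None:
        return path
    drive = match.group(1)
    rest = match.group(2) or "/"  # a bare "/cygdrive/c" maps to the drive root
    return f"{drive.upper()}:{rest}"


print(cygdrive_to_mixed("/cygdrive/c/nutch-2007-07-26_04-01-20/content/urls.txt"))
```

This only rewrites the string; it does not check that the path exists.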

[EMAIL PROTECTED] /cygdrive/c/nutch-2007-07-26_04-01-20/logs
$ nutch crawl C:/nutch-2007-07-26_04-01-20/content/urls.txt -dir c:/nutch-2007-07-26_04-01-20/content/sf911truth -depth 3 -topN 200
crawl started in: c:/nutch-2007-07-26_04-01-20/content/sf911truth
rootUrlDir = C:/nutch-2007-07-26_04-01-20/content/urls.txt
threads = 10
depth = 3
topN = 200
Injector: starting
Injector: crawlDb: c:/nutch-2007-07-26_04-01-20/content/sf911truth/crawldb
Injector: urlDir: C:/nutch-2007-07-26_04-01-20/content/urls.txt
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: done
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: c:/nutch-2007-07-26_04-01-20/content/sf911truth/segments/20070727003008
Generator: filtering: false
Generator: topN: 200
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: c:/nutch-2007-07-26_04-01-20/content/sf911truth/segments/20070727003008
Fetcher: threads: 10
fetching http://www.sf911truth.org/
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:499)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)

[EMAIL PROTECTED] /cygdrive/c/nutch-2007-07-26_04-01-20/logs
$ cat hadoop.log
2007-07-27 00:30:03,171 INFO  crawl.Crawl - crawl started in: c:/nutch-2007-07-26_04-01-20/content/sf911truth
2007-07-27 00:30:03,187 INFO  crawl.Crawl - rootUrlDir = C:/nutch-2007-07-26_04-01-20/content/urls.txt
2007-07-27 00:30:03,187 INFO  crawl.Crawl - threads = 10
2007-07-27 00:30:03,187 INFO  crawl.Crawl - depth = 3
2007-07-27 00:30:03,187 INFO  crawl.Crawl - topN = 200
2007-07-27 00:30:03,281 INFO  crawl.Injector - Injector: starting
2007-07-27 00:30:03,281 INFO  crawl.Injector - Injector: crawlDb: c:/nutch-2007-07-26_04-01-20/content/sf911truth/crawldb
2007-07-27 00:30:03,281 INFO  crawl.Injector - Injector: urlDir: C:/nutch-2007-07-26_04-01-20/content/urls.txt
2007-07-27 00:30:03,296 INFO  crawl.Injector - Injector: Converting injected urls to crawl db entries.
2007-07-27 00:30:04,031 INFO  plugin.PluginRepository - Plugins: looking in: C:\nutch-2007-07-26_04-01-20\plugins
2007-07-27 00:30:04,296 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
2007-07-27 00:30:04,296 INFO  plugin.PluginRepository - Registered Plugins:
2007-07-27 00:30:04,296 INFO  plugin.PluginRepository -         CyberNeko HTML Parser (lib-nekohtml)
2007-07-27 00:30:04,296 INFO  plugin.PluginRepository -         Site Query Filter (query-site)
2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Basic URL Normalizer (urlnormalizer-basic)
2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Html Parse Plug-in (parse-html)
2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Pass-through URL Normalizer (urlnormalizer-pass)
2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Regex URL Filter Framework (lib-regex-filter)
2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Feed Parse/Index/Query Plug-in (feed)
2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Basic Indexing Filter (index-basic)
2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Basic Summarizer Plug-in (summary-basic)
2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Text Parse Plug-in (parse-text)
2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         JavaScript Parser (parse-js)
2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Basic Query Filter (query-basic)
2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Regex URL Filter (urlfilter-regex)
2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         HTTP Framework (lib-http)
2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         XML Libraries (lib-xml)
2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         URL Query Filter (query-url)
2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Regex URL Normalizer (urlnormalizer-regex)
2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Http Protocol Plug-in (protocol-http)
2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         the nutch core extension points (nutch-extensionpoints)
2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         OPIC Scoring Plug-in (scoring-opic)
2007-07-27 00:30:04,312 INFO  plugin.PluginRepository - Registered Extension-Points:
2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Nutch Protocol (org.apache.nutch.protocol.Protocol)
2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Nutch URL Filter (org.apache.nutch.net.URLFilter)
2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Nutch Content Parser (org.apache.nutch.parse.Parser)
2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Ontology Model Loader (org.apache.nutch.ontology.Ontology)
2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
2007-07-27 00:30:04,312 INFO  plugin.PluginRepository -         Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
2007-07-27 00:30:04,375 WARN  regex.RegexURLNormalizer - can't find rules for scope 'inject', using default
2007-07-27 00:30:06,046 INFO  crawl.Injector - Injector: Merging injected urls into crawl db.
2007-07-27 00:30:06,640 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2007-07-27 00:30:07,500 INFO  crawl.Injector - Injector: done
2007-07-27 00:30:08,500 INFO  crawl.Generator - Generator: Selecting best-scoring urls due for fetch.
2007-07-27 00:30:08,500 INFO  crawl.Generator - Generator: starting
2007-07-27 00:30:08,500 INFO  crawl.Generator - Generator: segment: c:/nutch-2007-07-26_04-01-20/content/sf911truth/segments/20070727003008
2007-07-27 00:30:08,500 INFO  crawl.Generator - Generator: filtering: false
2007-07-27 00:30:08,500 INFO  crawl.Generator - Generator: topN: 200
2007-07-27 00:30:08,531 INFO  crawl.Generator - Generator: jobtracker is 'local', generating exactly one partition.
2007-07-27 00:30:08,984 INFO  plugin.PluginRepository - Plugins: looking in: C:\nutch-2007-07-26_04-01-20\plugins
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository - Registered Plugins:
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         CyberNeko HTML Parser (lib-nekohtml)
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Site Query Filter (query-site)
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Basic URL Normalizer (urlnormalizer-basic)
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Html Parse Plug-in (parse-html)
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Pass-through URL Normalizer (urlnormalizer-pass)
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Regex URL Filter Framework (lib-regex-filter)
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Feed Parse/Index/Query Plug-in (feed)
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Basic Indexing Filter (index-basic)
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Basic Summarizer Plug-in (summary-basic)
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Text Parse Plug-in (parse-text)
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         JavaScript Parser (parse-js)
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Basic Query Filter (query-basic)
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Regex URL Filter (urlfilter-regex)
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         HTTP Framework (lib-http)
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         XML Libraries (lib-xml)
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         URL Query Filter (query-url)
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Regex URL Normalizer (urlnormalizer-regex)
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Http Protocol Plug-in (protocol-http)
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         the nutch core extension points (nutch-extensionpoints)
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         OPIC Scoring Plug-in (scoring-opic)
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository - Registered Extension-Points:
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Nutch Protocol (org.apache.nutch.protocol.Protocol)
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Nutch URL Filter (org.apache.nutch.net.URLFilter)
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Nutch Content Parser (org.apache.nutch.parse.Parser)
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Ontology Model Loader (org.apache.nutch.ontology.Ontology)
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
2007-07-27 00:30:09,187 INFO  plugin.PluginRepository -         Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
2007-07-27 00:30:09,218 INFO  crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2007-07-27 00:30:09,218 INFO  crawl.AbstractFetchSchedule - defaultInterval=2592000.0
2007-07-27 00:30:09,218 INFO  crawl.AbstractFetchSchedule - maxInterval=7776000.0
2007-07-27 00:30:09,234 WARN  regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2007-07-27 00:30:09,296 INFO  plugin.PluginRepository - Plugins: looking in: C:\nutch-2007-07-26_04-01-20\plugins
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository - Registered Plugins:
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         CyberNeko HTML Parser (lib-nekohtml)
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Site Query Filter (query-site)
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Basic URL Normalizer (urlnormalizer-basic)
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Html Parse Plug-in (parse-html)
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Pass-through URL Normalizer (urlnormalizer-pass)
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Regex URL Filter Framework (lib-regex-filter)
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Feed Parse/Index/Query Plug-in (feed)
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Basic Indexing Filter (index-basic)
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Basic Summarizer Plug-in (summary-basic)
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Text Parse Plug-in (parse-text)
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         JavaScript Parser (parse-js)
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Basic Query Filter (query-basic)
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Regex URL Filter (urlfilter-regex)
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         HTTP Framework (lib-http)
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         XML Libraries (lib-xml)
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         URL Query Filter (query-url)
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Regex URL Normalizer (urlnormalizer-regex)
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Http Protocol Plug-in (protocol-http)
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         the nutch core extension points (nutch-extensionpoints)
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         OPIC Scoring Plug-in (scoring-opic)
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository - Registered Extension-Points:
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Nutch Protocol (org.apache.nutch.protocol.Protocol)
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Nutch URL Filter (org.apache.nutch.net.URLFilter)
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Nutch Content Parser (org.apache.nutch.parse.Parser)
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Ontology Model Loader (org.apache.nutch.ontology.Ontology)
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
2007-07-27 00:30:09,468 INFO  plugin.PluginRepository -         Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
2007-07-27 00:30:09,500 INFO  crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2007-07-27 00:30:09,500 INFO  crawl.AbstractFetchSchedule - defaultInterval=2592000.0
2007-07-27 00:30:09,500 INFO  crawl.AbstractFetchSchedule - maxInterval=7776000.0
2007-07-27 00:30:10,187 INFO  crawl.Generator - Generator: Partitioning selected urls by host, for politeness.
2007-07-27 00:30:10,687 INFO  plugin.PluginRepository - Plugins: looking in: C:\nutch-2007-07-26_04-01-20\plugins
2007-07-27 00:30:10,859 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
2007-07-27 00:30:10,859 INFO  plugin.PluginRepository - Registered Plugins:
2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         CyberNeko HTML Parser (lib-nekohtml)
2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Site Query Filter (query-site)
2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Basic URL Normalizer (urlnormalizer-basic)
2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Html Parse Plug-in (parse-html)
2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Pass-through URL Normalizer (urlnormalizer-pass)
2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Regex URL Filter Framework (lib-regex-filter)
2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Feed Parse/Index/Query Plug-in (feed)
2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Basic Indexing Filter (index-basic)
2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Basic Summarizer Plug-in (summary-basic)
2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Text Parse Plug-in (parse-text)
2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         JavaScript Parser (parse-js)
2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Basic Query Filter (query-basic)
2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Regex URL Filter (urlfilter-regex)
2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         HTTP Framework (lib-http)
2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         XML Libraries (lib-xml)
2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         URL Query Filter (query-url)
2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Regex URL Normalizer (urlnormalizer-regex)
2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Http Protocol Plug-in (protocol-http)
2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         the nutch core extension points (nutch-extensionpoints)
2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         OPIC Scoring Plug-in (scoring-opic)
2007-07-27 00:30:10,859 INFO  plugin.PluginRepository - Registered Extension-Points:
2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Nutch Protocol (org.apache.nutch.protocol.Protocol)
2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
2007-07-27 00:30:10,859 INFO  plugin.PluginRepository -         Nutch URL Filter (org.apache.nutch.net.URLFilter)
2007-07-27 00:30:10,875 INFO  plugin.PluginRepository -         HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
2007-07-27 00:30:10,875 INFO  plugin.PluginRepository -         Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2007-07-27 00:30:10,875 INFO  plugin.PluginRepository -         Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2007-07-27 00:30:10,875 INFO  plugin.PluginRepository -         Nutch Content Parser (org.apache.nutch.parse.Parser)
2007-07-27 00:30:10,875 INFO  plugin.PluginRepository -         Ontology Model Loader (org.apache.nutch.ontology.Ontology)
2007-07-27 00:30:10,875 INFO  plugin.PluginRepository -         Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
2007-07-27 00:30:10,875 INFO  plugin.PluginRepository -         Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
2007-07-27 00:30:10,890 WARN  regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2007-07-27 00:30:11,625 INFO  crawl.Generator - Generator: done.
2007-07-27 00:30:11,625 INFO  fetcher.Fetcher - Fetcher: starting
2007-07-27 00:30:11,625 INFO  fetcher.Fetcher - Fetcher: segment: c:/nutch-2007-07-26_04-01-20/content/sf911truth/segments/20070727003008
2007-07-27 00:30:12,078 INFO  fetcher.Fetcher - Fetcher: threads: 10
2007-07-27 00:30:12,093 INFO  plugin.PluginRepository - Plugins: looking in: C:\nutch-2007-07-26_04-01-20\plugins
2007-07-27 00:30:12,218 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
2007-07-27 00:30:12,218 INFO  plugin.PluginRepository - Registered Plugins:
2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         CyberNeko HTML Parser (lib-nekohtml)
2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Site Query Filter (query-site)
2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Basic URL Normalizer (urlnormalizer-basic)
2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Html Parse Plug-in (parse-html)
2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Pass-through URL Normalizer (urlnormalizer-pass)
2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Regex URL Filter Framework (lib-regex-filter)
2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Feed Parse/Index/Query Plug-in (feed)
2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Basic Indexing Filter (index-basic)
2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Basic Summarizer Plug-in (summary-basic)
2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Text Parse Plug-in (parse-text)
2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         JavaScript Parser (parse-js)
2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Basic Query Filter (query-basic)
2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Regex URL Filter (urlfilter-regex)
2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         HTTP Framework (lib-http)
2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         XML Libraries (lib-xml)
2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         URL Query Filter (query-url)
2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Regex URL Normalizer (urlnormalizer-regex)
2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Http Protocol Plug-in (protocol-http)
2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         the nutch core extension points (nutch-extensionpoints)
2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         OPIC Scoring Plug-in (scoring-opic)
2007-07-27 00:30:12,218 INFO  plugin.PluginRepository - Registered Extension-Points:
2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Nutch Protocol (org.apache.nutch.protocol.Protocol)
2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Nutch URL Filter (org.apache.nutch.net.URLFilter)
2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2007-07-27 00:30:12,218 INFO  plugin.PluginRepository -         Nutch Content Parser (org.apache.nutch.parse.Parser)
2007-07-27 00:30:12,234 INFO  plugin.PluginRepository -         Ontology Model Loader (org.apache.nutch.ontology.Ontology)
2007-07-27 00:30:12,234 INFO  plugin.PluginRepository -         Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
2007-07-27 00:30:12,234 INFO  plugin.PluginRepository -         Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
2007-07-27 00:30:12,265 INFO  fetcher.Fetcher - fetching http://www.sf911truth.org/
2007-07-27 00:30:12,312 FATAL api.RobotRulesParser - Agent we advertise (microlith-nutch) not listed first in 'http.robots.agents' property!
2007-07-27 00:30:12,312 INFO  http.Http - http.proxy.host = null
2007-07-27 00:30:12,312 INFO  http.Http - http.proxy.port = 8080
2007-07-27 00:30:12,312 INFO  http.Http - http.timeout = 10000
2007-07-27 00:30:12,312 INFO  http.Http - http.content.limit = 65536
2007-07-27 00:30:12,312 INFO  http.Http - http.agent = microlith-nutch/Nutch-1.0-dev (crawler nutch-2007-07-26_04-01-20; http://hopoo.dyndns.org; kai(underscore)testing(att)yahoo(dotcom))
2007-07-27 00:30:12,312 INFO  http.Http - protocol.plugin.check.blocking = true
2007-07-27 00:30:12,312 INFO  http.Http - protocol.plugin.check.robots = true
2007-07-27 00:30:12,312 INFO  http.Http - fetcher.server.delay = 3000
2007-07-27 00:30:12,312 INFO  http.Http - http.max.delays = 100
2007-07-27 00:30:13,578 WARN  regex.RegexURLNormalizer - can't find rules for scope 'outlink', using default
2007-07-27 00:30:13,640 INFO  crawl.SignatureFactory - Using Signature impl: org.apache.nutch.crawl.MD5Signature
2007-07-27 00:30:14,406 INFO  plugin.PluginRepository - Plugins: looking in: C:\nutch-2007-07-26_04-01-20\plugins
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository - Registered Plugins:
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         CyberNeko HTML Parser (lib-nekohtml)
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Site Query Filter (query-site)
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Basic URL Normalizer (urlnormalizer-basic)
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Html Parse Plug-in (parse-html)
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Pass-through URL Normalizer (urlnormalizer-pass)
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Regex URL Filter Framework (lib-regex-filter)
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Feed Parse/Index/Query Plug-in (feed)
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Basic Indexing Filter (index-basic)
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Basic Summarizer Plug-in (summary-basic)
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Text Parse Plug-in (parse-text)
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         JavaScript Parser (parse-js)
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Basic Query Filter (query-basic)
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Regex URL Filter (urlfilter-regex)
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         HTTP Framework (lib-http)
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         XML Libraries (lib-xml)
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         URL Query Filter (query-url)
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Regex URL Normalizer (urlnormalizer-regex)
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Http Protocol Plug-in (protocol-http)
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         the nutch core extension points (nutch-extensionpoints)
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         OPIC Scoring Plug-in (scoring-opic)
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository - Registered Extension-Points:
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Nutch Protocol (org.apache.nutch.protocol.Protocol)
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Nutch URL Filter (org.apache.nutch.net.URLFilter)
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Nutch Content Parser (org.apache.nutch.parse.Parser)
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Ontology Model Loader (org.apache.nutch.ontology.Ontology)
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
2007-07-27 00:30:14,609 INFO  plugin.PluginRepository -         Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
2007-07-27 00:30:14,718 WARN  mapred.LocalJobRunner - job_8r2j8
java.lang.IllegalArgumentException: Illegal Capacity: -1
        at java.util.ArrayList.<init>(ArrayList.java:111)
        at org.apache.nutch.parse.ParseOutputFormat$1.write(ParseOutputFormat.java:149)
        at org.apache.nutch.fetcher.FetcherOutputFormat$1.write(FetcherOutputFormat.java:94)
        at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:311)
        at org.apache.hadoop.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:41)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:326)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:155)
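
Since most of the log above is the plugin list repeated for every job, here is
a rough sketch of how the interesting records could be pulled out of
hadoop.log. It assumes each record starts with a timestamp and a level, as in
the lines above, and that stack-trace lines follow their record without a
timestamp. The helper name interesting_lines and the choice of levels are my
own, not anything Nutch or Hadoop provides:

```python
import re
from typing import Iterable, Iterator

# Start of a log record as it appears in hadoop.log above,
# e.g. "2007-07-27 00:30:12,312 FATAL api.RobotRulesParser - ..."
_RECORD = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} (\w+)\s")

# Assumption: these are the levels worth reading first.
_INTERESTING = {"WARN", "ERROR", "FATAL"}


def interesting_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield WARN/ERROR/FATAL records plus the untimestamped lines after them."""
    keep = False
    for raw in lines:
        line = raw.rstrip("\n")
        match = _RECORD.match(line)
        if match:
            # A new record starts here; decide whether to keep it.
            keep = match.group(1) in _INTERESTING
        # Lines without a timestamp (stack traces) inherit the decision
        # made for the record they belong to. Blank lines are skipped.
        if keep and line.strip():
            yield line


sample = [
    "2007-07-27 00:30:12,265 INFO  fetcher.Fetcher - fetching http://www.sf911truth.org/",
    "2007-07-27 00:30:14,718 WARN  mapred.LocalJobRunner - job_8r2j8",
    "java.lang.IllegalArgumentException: Illegal Capacity: -1",
    "        at java.util.ArrayList.<init>(ArrayList.java:111)",
]
for kept in interesting_lines(sample):
    print(kept)
```

To run it on the real file, pass an open file object instead of the sample
list, for example interesting_lines(open("hadoop.log")).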




       
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
