I assume you have already checked the rules in crawl-urlfilter.txt. Beware of one nasty thing that can happen: make sure each rule is followed directly by the CR/LF, including the last one. I recently had a problem because some "invisible" spaces were following one rule, and therefore this rule never matched... it took me a while to figure out.
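In case it helps, here is a rough Python sketch of the check I mean. It is not part of Nutch; the function names are my own, and it assumes the filter takes everything after the leading + or - as the regex, which is what the behaviour I saw suggests.

```python
import re


def find_trailing_whitespace(lines):
    """Return (line_number, rule) for rule lines that end in spaces or tabs."""
    problems = []
    for number, line in enumerate(lines, start=1):
        body = line.rstrip("\r\n")  # drop only the line terminator
        if not body.strip() or body.startswith("#"):
            continue  # skip blank lines and comments
        if body != body.rstrip(" \t"):
            problems.append((number, body))
    return problems


def ends_with_newline(text):
    """True if the file content finishes with a line terminator."""
    return text.endswith("\n")


def rule_matches(rule_line, url):
    """Illustration only: treat the text after the +/- sign as the regex."""
    pattern = rule_line.rstrip("\r\n")[1:]
    return re.search(pattern, url) is not None


def check_file(path):
    """Read a rules file and report both kinds of problem."""
    with open(path, newline="") as handle:
        text = handle.read()
    return find_trailing_whitespace(text.splitlines(True)), ends_with_newline(text)
```

You would call check_file() with the path to your own rules file (for example conf/crawl-urlfilter.txt, if that is where yours lives). rule_matches() shows why the stray space matters: with a trailing space in the rule, the pattern requires a space after the URL's final slash, so a seed such as http://lucene.apache.org/ is not accepted.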

2009/9/7 zo tiger <[email protected]>

> This is my hadoop.log file's contents
>
> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - HTTP Framework (lib-http)
> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Text Parse Plug-in (parse-text)
> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Pass-through URL Normalizer (urlnormalizer-pass)
> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Regex URL Filter (urlfilter-regex)
> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Http Protocol Plug-in (protocol-http)
> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - XML Response Writer Plug-in (response-xml)
> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Regex URL Normalizer (urlnormalizer-regex)
> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - OPIC Scoring Plug-in (scoring-opic)
> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - CyberNeko HTML Parser (lib-nekohtml)
> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Anchor Indexing Filter (index-anchor)
> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - JavaScript Parser (parse-js)
> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - URL Query Filter (query-url)
> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Regex URL Filter Framework (lib-regex-filter)
> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - JSON Response Writer Plug-in (response-json)
> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Registered Extension-Points:
> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Nutch Protocol (org.apache.nutch.protocol.Protocol)
> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Nutch Field Filter (org.apache.nutch.indexer.field.FieldFilter)
> 2009-09-07 03:32:58,138 INFO plugin.PluginRepository - HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
> 2009-09-07 03:32:58,138 INFO plugin.PluginRepository - Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
> 2009-09-07 03:32:58,138 INFO plugin.PluginRepository - Nutch Search Results Response Writer (org.apache.nutch.searcher.response.ResponseWriter)
>
>
> MilleBii wrote:
> >
> > Is there more information in logs/hadoop file ?
> >
> > What is your plug-in list ?
> >
> > 2009/9/2 zo tiger <[email protected]>
> >
> >>
> >> Thank you for your reply.
> >>
> >> In urls directory(exactly /nutch/search/urls) , there is a file urllist.txt.
> >>
> >> content is as following.
> >>
> >> http://lucene.apache.org
> >>
> >> I don't understand why nutch can not fetch any url.
> >>
> >>
> >> Paul Tomblin wrote:
> >> >
> >> > On Wed, Sep 2, 2009 at 6:36 AM, zo tiger<[email protected]> wrote:
> >> >>
> >> >> At last i ran bin/nutch crawl command but it gives
> >> >>
> >> >> No urls to fetch check your filter and seed list error
> >> >>
> >> >> I am sure there is no problem in crawl-url filter and other configuration xml files
> >> >>
> >> >> İs anyone know any possible problem????
> >> >>
> >> >
> >> > What's in your url directory?
> >> >
> >> >
> >> > --
> >> > http://www.linkedin.com/in/paultomblin
> >> >
> >>
> >> --
> >> View this message in context: http://www.nabble.com/Help-me%2C-No-urls-to-fetch.-tp25255142p25255944.html
> >> Sent from the Nutch - User mailing list archive at Nabble.com.
> >>
> >
> > --
> > -MilleBii-
> >
>
> --
> View this message in context: http://www.nabble.com/Help-me%2C-No-urls-to-fetch.-tp25255142p25324884.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

--
-MilleBii-
