unsubscribe

--- On Mon, 9/7/09, zo tiger <[email protected]> wrote:
From: zo tiger <[email protected]>
Subject: Re: Help me, No urls to fetch.
To: [email protected]
Date: Monday, September 7, 2009, 3:31 AM

Oh, I resolved it. Nutch is running. Great.

I forgot to copy all the conf files to the slave nodes. I had only set up the config files on the master node, not on the slave nodes.

Thanks for the help, Paul Tomblin, MilleBii and 皮皮. Thank you very much.

MilleBii wrote:
>
> Obviously you've checked the crawl-urlfilter.txt rules.
> Beware, there is a nasty thing that can happen: make sure there is a
> CR/LF directly at the end of the rules. I recently had a problem because
> some "invisible" spaces were following one rule, and therefore this rule
> was never matching... took me a while to figure out.
>
> 2009/9/7 zo tiger <[email protected]>
>
>> This is my hadoop.log file's contents
>>
>> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - HTTP Framework (lib-http)
>> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Text Parse Plug-in (parse-text)
>> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Pass-through URL Normalizer (urlnormalizer-pass)
>> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Regex URL Filter (urlfilter-regex)
>> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Http Protocol Plug-in (protocol-http)
>> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - XML Response Writer Plug-in (response-xml)
>> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Regex URL Normalizer (urlnormalizer-regex)
>> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - OPIC Scoring Plug-in (scoring-opic)
>> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - CyberNeko HTML Parser (lib-nekohtml)
>> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Anchor Indexing Filter (index-anchor)
>> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - JavaScript Parser (parse-js)
>> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - URL Query Filter (query-url)
>> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Regex URL Filter Framework (lib-regex-filter)
>> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - JSON Response Writer Plug-in (response-json)
>> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Registered Extension-Points:
>> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
>> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Nutch Protocol (org.apache.nutch.protocol.Protocol)
>> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
>> 2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Nutch Field Filter (org.apache.nutch.indexer.field.FieldFilter)
>> 2009-09-07 03:32:58,138 INFO plugin.PluginRepository - HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
>> 2009-09-07 03:32:58,138 INFO plugin.PluginRepository - Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
>> 2009-09-07 03:32:58,138 INFO plugin.PluginRepository - Nutch Search Results Response Writer (org.apache.nutch.searcher.response.ResponseWriter)
>>
>> MilleBii wrote:
>> >
>> > Is there more information in the logs/hadoop file?
>> >
>> > What is your plug-in list?
>> >
>> > 2009/9/2 zo tiger <[email protected]>
>> >
>> >> Thank you for your reply.
>> >>
>> >> In the urls directory (exactly /nutch/search/urls), there is a file
>> >> urllist.txt.
>> >>
>> >> Its content is as follows:
>> >>
>> >> http://lucene.apache.org
>> >>
>> >> I don't understand why Nutch cannot fetch any URL.
>> >>
>> >> Paul Tomblin wrote:
>> >> >
>> >> > On Wed, Sep 2, 2009 at 6:36 AM, zo tiger <[email protected]> wrote:
>> >> >>
>> >> >> At last I ran the bin/nutch crawl command, but it gives a
>> >> >>
>> >> >> No urls to fetch check your filter and seed list error
>> >> >>
>> >> >> I am sure there is no problem in the crawl-url filter and other
>> >> >> configuration xml files.
>> >> >>
>> >> >> Does anyone know of any possible problem?
>> >> >>
>> >> >
>> >> > What's in your url directory?
>> >> >
>> >> > --
>> >> > http://www.linkedin.com/in/paultomblin
>> >> >
>> >>
>> >> --
>> >> View this message in context:
>> >> http://www.nabble.com/Help-me%2C-No-urls-to-fetch.-tp25255142p25255944.html
>> >> Sent from the Nutch - User mailing list archive at Nabble.com.
>> >>
>> >
>> > --
>> > -MilleBii-
>> >
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Help-me%2C-No-urls-to-fetch.-tp25255142p25324884.html
>> Sent from the Nutch - User mailing list archive at Nabble.com.
>>
>
> --
> -MilleBii-
>

--
View this message in context: http://www.nabble.com/Help-me%2C-No-urls-to-fetch.-tp25255142p25328368.html
Sent from the Nutch - User mailing list archive at Nabble.com.
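
MilleBii's tip about "invisible" trailing spaces in the filter rules can be checked mechanically. Below is a minimal Python sketch, not part of Nutch: the function name find_trailing_whitespace is hypothetical, and it assumes the rules file is plain text with one rule per line and comment lines starting with #. It only reports rule lines that end in spaces or tabs; it does not evaluate the rules themselves.

```python
def find_trailing_whitespace(path):
    """Return (line_number, line) pairs for rule lines ending in spaces or tabs.

    Hypothetical helper for illustration; assumes one rule per line and
    '#' comment lines, as in a regex URL filter rules file.
    """
    problems = []
    # newline="" keeps the original line endings so CR/LF files are handled too
    with open(path, encoding="utf-8", newline="") as handle:
        for number, raw in enumerate(handle, start=1):
            line = raw.rstrip("\r\n")  # drop only the line terminator
            if not line.strip() or line.startswith("#"):
                continue  # blank lines and comments cannot break matching
            if line != line.rstrip(" \t"):
                problems.append((number, line))
    return problems
```

For example, find_trailing_whitespace("conf/crawl-urlfilter.txt") (the path is an assumption about where the rules file lives) returns an empty list when no rule carries trailing whitespace. Since the fix in this thread was copying the conf files to every slave node, it may be worth running the same check against each node's copy.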
