Hi Berlin,

   Nutch needs a file called urls.txt inside the directory that you are
passing to the inject command. Try renaming the urls file to urls.txt.

  Also, are you using the local FS or Hadoop DFS? If it's the latter, you'll
have to put your dmoz directory on the Hadoop FS.
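
A minimal sketch of those two steps, assuming the seed file is dmoz/urls as in
the quoted message below. The helper name prepare_seed_dir is hypothetical, and
the bin/hadoop line is an assumed invocation that only applies if you are
running on DFS rather than the local FS:

```shell
# Hypothetical helper: give the seed list the urls.txt name suggested above.
# It does nothing if the file is missing or urls.txt already exists.
prepare_seed_dir() {
    seed_dir="$1"
    if [ -f "$seed_dir/urls" ] && [ ! -e "$seed_dir/urls.txt" ]; then
        mv "$seed_dir/urls" "$seed_dir/urls.txt"
    fi
}

# Directory name taken from the thread.
prepare_seed_dir dmoz

# Only when running on Hadoop DFS (assumed invocation, adjust paths as needed):
#   bin/hadoop dfs -put dmoz dmoz
# Then re-run the inject exactly as in the original message:
#   bin/nutch inject crawl/crawldb dmoz
```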

-vishal.

-----Original Message-----
From: Berlin Brown [mailto:[EMAIL PROTECTED] 
Sent: Sunday, June 03, 2007 5:41 AM
To: [EMAIL PROTECTED]
Subject: Re: Error with the inject command

Anybody?  Still can't figure it out.  I even created the crawl/crawldb
directory.  Nothing.  The URLs are just a set of URLs.  I am using
0.9.1.  Is it a bug maybe?

On 6/2/07, Berlin Brown <[EMAIL PROTECTED]> wrote:
> I am getting this error when I am trying to run the inject:
> I have done this:
>
> mkdir dmoz
> bin/nutch org.apache.nutch.tools.DmozParser content.rdf.u8 -subset 5000 > dmoz/urls
>
> And an error here:
>
> bin/nutch inject crawl/crawldb dmoz
>
>
> 2007-06-02 02:37:19,796 WARN  plugin.PluginRepository - Plugins: not a file: url. Can't load plugins from: jar:file:/C:/Berlin/Downloads4/workspaceTrunk/BotListProjects/botcrawl/nutch/nutch-0.9.job!/plugins
> 2007-06-02 02:37:19,812 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
> 2007-06-02 02:37:19,812 INFO  plugin.PluginRepository - Registered Plugins:
> 2007-06-02 02:37:19,812 INFO  plugin.PluginRepository -         NONE
> 2007-06-02 02:37:19,812 INFO  plugin.PluginRepository - Registered Extension-Points:
> 2007-06-02 02:37:19,812 INFO  plugin.PluginRepository -         NONE
> 2007-06-02 02:37:19,812 WARN  mapred.LocalJobRunner - job_5ysi6h
> java.lang.RuntimeException: x point org.apache.nutch.net.URLNormalizer not found.
>         at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:120)
>         at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injecto
>
> --
> Berlin Brown
> http://www.newspiritcompany.com - newspirit technologies
>


-- 
Berlin Brown
http://www.newspiritcompany.com - newspirit technologies

