I was able to follow the Nutch 0.8.* tutorial to run the whole-web
crawl.
I ran the inject and generate commands successfully in my Windows Eclipse
environment.
But when I ran the fetch command, I got the following error message:

2007-07-03 14:28:18,890 ERROR mapred.JobClient (JobClient.java:submitJob(273)) - Input directory C:/JavaSearchEngine/nutch-0.8.1/crawl-epwl/segments/20070703140147/crawl_generate in local is invalid.
Exception in thread "main" java.io.IOException: Input directory C:/JavaSearchEngine/nutch-0.8.1/crawl-epwl/segments/20070703140147/crawl_generate in local is invalid.
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:274)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:327)
        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:443)
        at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:477)

Can anyone help me to solve this problem?
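
Since the error names the crawl_generate directory inside the segment, one way to narrow this down is to check whether that directory actually exists at the path given to the fetcher. The following is only a minimal sketch using plain java.io.File; the class name SegmentCheck and its describe method are hypothetical helpers for illustration and are not part of Nutch or Hadoop.

```java
import java.io.File;

// Hypothetical helper, not part of Nutch: reports whether a segment
// directory contains the crawl_generate subdirectory named in the error.
class SegmentCheck {

    // Returns a one-line description of the state of <segment>/crawl_generate.
    static String describe(File segment) {
        if (!segment.isDirectory()) {
            return "segment directory missing: " + segment.getPath();
        }
        File generated = new File(segment, "crawl_generate");
        if (!generated.isDirectory()) {
            return "crawl_generate missing in: " + segment.getPath();
        }
        String[] entries = generated.list();
        int count = (entries == null) ? 0 : entries.length;
        return "crawl_generate found with " + count + " entries";
    }

    public static void main(String[] args) {
        // Pass the same segment path that is given to the fetch command.
        File segment = new File(args.length > 0 ? args[0] : ".");
        System.out.println(describe(segment));
    }
}
```

Running it with the segment path from the log (the directory ending in 20070703140147) would show whether the generate step wrote its output where the fetch step is looking.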

Adam Shuy, President
ePacific Web Design & Hosting
Professional Web/Software developer
TEL: 408-272-6946
www.epacificweb.com
-----Original Message-----
From: Tsengtan A Shuy [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, July 03, 2007 12:27 PM
To: [EMAIL PROTECTED]
Subject: RE: multiple sites run

I ran these 1002 websites in my Cygwin environment.
I got the following error in the hadoop.log file:
java.lang.ClassNotFoundException: org.apache.nutch.urlfilter.regex.RegexURLFilter

How can I include this class in my Cygwin environment?
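
As a first diagnostic, it may help to test whether the class can be resolved from the plain JVM classpath at all. The sketch below is an assumption-laden illustration: ClasspathCheck and isLoadable are hypothetical names, and the check only covers the classpath of the JVM that runs it. Nutch normally loads URL filters through its plugin mechanism, so a negative result here does not by itself prove the plugin is missing.

```java
// Hypothetical helper, not part of Nutch: checks whether a class name
// can be resolved by the class loader of the current JVM.
class ClasspathCheck {

    // Returns true if the named class can be found, without initializing it.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the hadoop.log error.
        String name = args.length > 0
                ? args[0]
                : "org.apache.nutch.urlfilter.regex.RegexURLFilter";
        System.out.println(name + " loadable: " + isLoadable(name));
        // The classpath the JVM was started with, for comparison.
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

Comparing the printed classpath between the Eclipse run and the Cygwin run would show whether the two environments see the same jars and directories.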

-----Original Message-----
From: Tsengtan A Shuy [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, July 03, 2007 11:31 AM
To: [EMAIL PROTECTED]
Subject: RE: multiple sites run

I followed your advice and changed the JDK Compliance setting to include 1.4
compatibility while running Java 5.0.
But the resulting crawl folder is still smaller than the folder produced when
crawling only my own website.
What is wrong with my 1002-website run?
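
To make the size comparison concrete, the per-folder sizes of the two crawl directories can be listed side by side. This is just a sketch with plain java.io.File; DirSize and sizeOf are hypothetical names, and nothing here is specific to Nutch.

```java
import java.io.File;

// Hypothetical helper, not part of Nutch: sums file sizes under a directory
// so the output folders of two crawls can be compared folder by folder.
class DirSize {

    // Returns the total size in bytes of all regular files under f.
    // A path that does not exist counts as 0.
    static long sizeOf(File f) {
        if (f.isFile()) {
            return f.length();
        }
        long total = 0;
        File[] children = f.listFiles();
        if (children != null) {
            for (File child : children) {
                total += sizeOf(child);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Pass one or more crawl output directories as arguments.
        for (String arg : args) {
            File crawlDir = new File(arg);
            System.out.println(crawlDir.getPath() + ": " + sizeOf(crawlDir) + " bytes");
            File[] children = crawlDir.listFiles();
            if (children == null) {
                continue;
            }
            for (File child : children) {
                System.out.println("  " + child.getName() + ": " + sizeOf(child) + " bytes");
            }
        }
    }
}
```

Passing both crawl directories as arguments prints the size of each top-level folder, which shows where the 1002-site run falls short of the single-site run.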

-----Original Message-----
From: Kai_testing Middleton [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, July 03, 2007 10:38 AM
To: [EMAIL PROTECTED]
Subject: Re: multiple sites run

Re Eclipse:

Navigate to Project, then Properties, then Java Compiler.  There's a place
to specify "JDK Compliance" in the right-hand pane.

----- Original Message ----
From: Tsengtan A Shuy <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Tuesday, July 3, 2007 9:59:39 AM
Subject: multiple sites run

I followed the RunNutchInEclipse wiki article to run 1002 websites.
I got all five folders, but the size of these folders is smaller
than the one from running only my own website.

What went wrong with this 1002-website run?

How do you run Java 1.4 and 1.5 at the same time in the Eclipse environment?

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
