In the config file nutch-site.xml, under the root directory of Tomcat (tomcat/webapps/root/web-inf/classes), go to the searcher properties and, for searcher.dir, just type "crawl" or, if you have another name for this directory, just that name. I hope this works for you, as I had the same problem the first time I used the 0.8 version.
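As a rough sketch of what I mean, the property could look like the fragment below. The value "crawl" is taken from the -dir argument of the crawl command in the original message. As far as I know, a relative value is resolved against the directory Tomcat is started from; that is my assumption, so use an absolute path to the crawl directory if you are unsure.

```xml
<configuration>
  <property>
    <name>searcher.dir</name>
    <!-- Assumption: "crawl" matches the -dir name used when crawling,
         and Tomcat is started from the directory that contains it. -->
    <value>crawl</value>
  </property>
</configuration>
```

After changing the file, stop Tomcat and start it again so the new value is read.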
----- Original Message -----
From: "rashmin babaria" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, May 07, 2007 2:35 PM
Subject: Re: Why nutch return 0 results?

> Hi,
>
> start tomcat after crawl is completed. so if crawl is completed by now stop
> the tomcat and start it again. It might solve your problem.
>
> -Rashmin.
>
> On 5/7/07, openxu <[EMAIL PROTECTED]> wrote:
>>
>>
>> Hi ,all!
>> I install nutch0.9.
>> After starting tomcat, I crawl website as follows:
>> ./nutch crawl urls -dir crawl -depth 2 -threads 2 -topN 4
>> But when I search in the http://localhost:8080/, it returns 0 results.
>> Below is my configuration files.
>> Will you give me any hints?
>> Thanks in advance!
>> crawl-urlfilter.txt:
>> ----------------------------------------------------------
>> +^http://([a-z0-9]*\.)*apache.org/
>> ------------------------------------------------------------//end
>> urls:
>> ------------------------------------------------------------
>> http://www.apache.org/
>> ------------------------------------------------------------//end
>>
>> /apache-tomcat-5.5.23/webapps/root/web-inf/classes/nutch-site.xml:
>> ------------------------------------------------------------
>> <configuration>
>> <property>
>> <name>searcher.dir</name>
>> <value>/mnt/hdb7/search/nutch-0.9/nutch-0.9/bin/crawl</value>
>> </property>
>> </configuration>
>> ------------------------------------------------------------//end
>>
>> /nutch-0.9/conf/nutch-site.xml:
>> ------------------------------------------------------------
>> <configuration>
>> <property>
>> <name>http.agent.name</name>
>> <value>nutch</value>
>> <description>HTTP 'User-Agent' request header. MUST NOT be empty -
>> please set this to a single word uniquely related to your organization.
>>
>> NOTE: You should also check other related properties:
>>
>> http.robots.agents
>> http.agent.description
>> http.agent.url
>> http.agent.email
>> http.agent.version
>>
>> and set their values appropriately.
>>
>> </description>
>> </property>
>>
>> <property>
>> <name>http.agent.description</name>
>> <value>hello</value>
>> <description>Further description of our bot- this text is used in
>> the User-Agent header. It appears in parenthesis after the agent name.
>> </description>
>> </property>
>>
>> <property>
>> <name>http.agent.url</name>
>> <value>hello.com</value>
>> <description>A URL to advertise in the User-Agent header. This will
>> appear in parenthesis after the agent name. Custom dictates that this
>> should be a URL of a page explaining the purpose and behavior of this
>> crawler.
>> </description>
>> </property>
>>
>> <property>
>> <name>http.agent.email</name>
>> <value>[EMAIL PROTECTED]</value>
>> <description>An email address to advertise in the HTTP 'From' request
>> header and User-Agent header. A good practice is to mangle this
>> address (e.g. 'info at example dot com') to avoid spamming.
>> </description>
>> </property>
>> </configuration>
>> ------------------------------------------------------------//end
>> --
>> View this message in context:
>> http://www.nabble.com/Why-nutch-return-0-results--tf3703924.html#a10357955
>> Sent from the Nutch - User mailing list archive at Nabble.com.
>>
>>
>

_______________________________________________
Nutch-general mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-general
