I had the same "little big" problem. Everything seemed OK:

- bin/nutch org.apache.nutch.searcher.NutchBean <search query> (in my case the search query was "apache") run in cygwin returns 62 total hits for my crawl of "+^http://([a-z0-9]*\.)*apache.org/"
- Nutch in the Tomcat webapp seemed fine after deployment (no errors)
- I had NOT created a new xml file named nutch-0.9.xml containing <Context path="/nutch-0.9/" debug="5" privileged="true" docBase="C:\nutch-0.9"/> and had NOT put it in C:\Tomcat6.0\conf\Catalina\localhost as Ramadhany had

but I still got "Hits 0-0 (out of about 0 total matching pages):" in the Tomcat-Nutch web interface.

... but I have solved it in my case:

- I forgot to configure searcher.dir in nutch-site.xml at C:\Tomcat6.0\webapps\nutch-0.9\WEB-INF\classes, as described in http://wiki.apache.org/nutch/GettingNutchRunningWithWindows under "Set Your Searcher Directory"
- and now it works fine: the Tomcat-Nutch interface returns 62 matching pages :)

Imam Nur Ramadhany wrote:
>
> Hello again everyone,
>
> My detail configuration is just like what
> http://wiki.apache.org/nutch/GettingNutchRunningWithWindows said. I'm new
> to Tomcat and Java, so I just followed the instruction.
>
> I extracted the release at C:\nutch-0.9, made a directory named urls with
> a file also named urls (without extension), then added the URLs to the
> crawl-urlfilter.txt (C:\nutch-0.9\conf\crawl-urlfilter.txt). I also have
> crawled a site (http://localhost/). For web interface search I uploaded
> the nutch WAR file. And created a new xml file named nutch-0.9.xml which
> contains <Context path="/nutch-0.9/" debug="5" privileged="true"
> docBase="C:\nutch-0.9" /> and put it in
> C:\Tomcat6.0\conf\Catalina\localhost, I think that is where my problems
> are. Is it the correct path and docbase? When I enter
> http://localhost:8080/nutch-0.9/ there is a welcome page but when I put a
> query and click the search it didn't return any hit (Hits 0-0 (out of
> about 0 total matching pages):). I also have configured the searcher.dir
> in nutch-site.xml at C:\Tomcat6.0\webapps\nutch-0.9\WEB-INF\classes
> anyway.
>
> Then like Koch Martina's suggestion I tried to search directly from the
> command line in cygwin by the command:
> bin/nutch org.apache.nutch.searcher.NutchBean <search query>.
> It works.
> I'm still working on the nutch-0.9.xml to make the webapp works, trying
> some path and docbase. But it would be helpful if you have any other
> suggestions.
>
> Thanks in advance,
> Ramadhany
>
> ________________________________
> From: Imam Nur Ramadhany <ramadhanyov...@yahoo.com>
> To: nutch-user@lucene.apache.org
> Sent: Tuesday, January 13, 2009 7:27:21 AM
> Subject: Re: AW: Null Indexing
>
> Thanks for your info Martina,
> it works with the command line but it doesn't when using the webapp
> (localhost:8080/nutch-0.9)
> is it enough with only deploy the war file using Tomcat manager?
> or should we include some other file to the catalina_home?
>
> ________________________________
> From: Koch Martina <k...@huberverlag.de>
> To: "nutch-user@lucene.apache.org" <nutch-user@lucene.apache.org>
> Sent: Friday, January 9, 2009 2:57:24 PM
> Subject: AW: Null Indexing
>
> Hi Ramadhany,
>
> the mentioned warnings and fatals you see in the log have nothing to do
> with getting 0 results at searching.
> The fatal message can be eliminated by setting the property
> "http.robots.agents" in the nutch-site.xml to "Imam Spider,*".
> The urlnormalizer warn messages just inform you that you have not
> specified a dedicated urlnormalizer for a certain scope so that the
> default urlnormalizer is used. If you need more information on this, look
> at URLNormalizers.java (package org.apache.nutch.net).
>
> To narrow down your searching problems, please provide some more details
> on your configuration.
> Did you check the content of your index using Luke
> (http://www.getopt.org/luke/) to make sure that the pages and content you
> are expecting in the index are really in there?
> Did you try a search directly from the command line in cygwin by the
> command:
> bin/nutch org.apache.nutch.searcher.NutchBean <search query>
>
> Kind regards,
> Martina
>
> -----Original Message-----
> From: Imam Nur Ramadhany [mailto:ramadhanyov...@yahoo.com]
> Sent: 09 January 2009 01:39
> To: nutch-user@lucene.apache.org
> Subject: Null Indexing
>
> I'm new to Nutch, I try to deploy nutch-0.9 but still having some problem.
> when I try to search it returns 0 hits, I have configured the crawl
> folder in the webapp and crawled my localhost (could it be done?)
> I use Windows with cygwin, Tomcat6.0, and jdk1.6.0_10. There is no error
> problem occurs when crawl based on the crawl.log. but it's return null
> when indexing.
>
> on the hadoop.log there are some fatal and warn status like these:
>
> FATAL api.RobotRulesParser - Agent we advertise ('Imam Spider') not listed first in 'http.robots.agents' property!
> .
> WARN regex.RegexURLNormalizer - can't find rules for scope 'inject', using default
> .
> WARN regex.RegexURLNormalizer - can't find rules for scope 'crawldb', using default
> .
> WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>
> is it related with this problem
>
> Regards,
>
> Ramadhany
>

--
View this message in context: http://www.nabble.com/Null-Indexing-tp21364166p25531221.html
Sent from the Nutch - User mailing list archive at Nabble.com.
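
PS: for anyone finding this thread later, here is a minimal sketch of the searcher.dir fix described at the top of this message. It goes in the webapp's copy of nutch-site.xml (C:\Tomcat6.0\webapps\nutch-0.9\WEB-INF\classes). The value C:\nutch-0.9\crawl is only an assumption for illustration: the thread never names the crawl output directory, so substitute whatever directory your own crawl command wrote to.

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>searcher.dir</name>
    <!-- Assumption: hypothetical crawl output directory, not stated in this thread.
         Point this at the directory your crawl actually produced. -->
    <value>C:\nutch-0.9\crawl</value>
  </property>
</configuration>
```

If the property is left unset, the webapp falls back to a default that is most likely a relative path resolved against wherever Tomcat was started, which would explain the command line finding 62 hits while the web interface finds 0. Restarting Tomcat after editing the file is probably needed before the change takes effect.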