I had the same little big problem - everything seemed OK:

- bin/nutch org.apache.nutch.searcher.NutchBean <search query> (in my
case the search query was "apache") in cygwin returned 62 total hits on
a crawl filtered with "+^http://([a-z0-9]*\.)*apache.org/"

- Nutch in Tomcat webapp after deploy seemed fine (no errors)

- I had NOT created a new xml file named nutch-0.9.xml which contains
<Context path="/nutch-0.9/" debug="5" privileged="true"
docBase="C:\nutch-0.9"/> and NOT put it in
C:\Tomcat6.0\conf\Catalina\localhost like Ramadhany had

- but I still got "Hits 0-0 (out of about 0 total matching pages):" in
the Tomcat-Nutch web interface.

... but I have solved it in my case:

- I forgot to configure searcher.dir in nutch-site.xml at
C:\Tomcat6.0\webapps\nutch-0.9\WEB-INF\classes, as described in
http://wiki.apache.org/nutch/GettingNutchRunningWithWindows under "Set Your
Searcher Directory"

- and now it works fine - Tomcat-Nutch interface returns 62 matching pages
:)
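
For anyone following along, the property I mean looks roughly like the sketch below. The value C:\nutch-0.9\crawl is only an assumption for illustration; point it at whatever folder your own crawl wrote (the one containing the index and segments). As far as I can tell, Tomcat has to be restarted before the webapp picks up the change.

```xml
<?xml version="1.0"?>
<!-- nutch-site.xml in C:\Tomcat6.0\webapps\nutch-0.9\WEB-INF\classes -->
<configuration>
  <property>
    <name>searcher.dir</name>
    <!-- assumed path: replace with your own crawl output folder -->
    <value>C:\nutch-0.9\crawl</value>
  </property>
</configuration>
```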


Imam Nur Ramadhany wrote:
> 
> Hello again everyone,
> 
> My detailed configuration is just what
> http://wiki.apache.org/nutch/GettingNutchRunningWithWindows describes. I'm
> new to Tomcat and Java, so I just followed the instructions.
> 
> I extracted the release to C:\nutch-0.9, made a directory named urls
> with a file also named urls (without an extension), then added the URLs to
> crawl-urlfilter.txt (C:\nutch-0.9\conf\crawl-urlfilter.txt). I have also
> crawled a site (http://localhost/). For the web interface search I
> uploaded the Nutch WAR file. I also created a new XML file named
> nutch-0.9.xml which contains <Context path="/nutch-0.9/" debug="5"
> privileged="true" docBase="C:\nutch-0.9" /> and put it in
> C:\Tomcat6.0\conf\Catalina\localhost. I think that is where my problem
> is. Are the path and docBase correct? When I open
> http://localhost:8080/nutch-0.9/ there is a welcome page, but when I enter
> a query and click search it returns no hits (Hits 0-0 (out of about 0
> total matching pages):). I have also configured searcher.dir in
> nutch-site.xml at C:\Tomcat6.0\webapps\nutch-0.9\WEB-INF\classes.
>  
> Then, following Koch Martina's suggestion, I tried searching directly
> from the command line in cygwin with the command:
> bin/nutch org.apache.nutch.searcher.NutchBean <search query>
> It works.
> I'm still working on nutch-0.9.xml to make the webapp work, trying
> different path and docBase values. It would be helpful if you have any
> other suggestions.
>  
> Thanks in advance,
> Ramadhany
> 
> 
> 
> ________________________________
> From: Imam Nur Ramadhany <ramadhanyov...@yahoo.com>
> To: nutch-user@lucene.apache.org
> Sent: Tuesday, January 13, 2009 7:27:21 AM
> Subject: Re: AW: Null Indexing
> 
> Thanks for your info, Martina.
> It works from the command line, but not when using the webapp
> (localhost:8080/nutch-0.9).
> Is it enough to just deploy the WAR file using the Tomcat manager,
> or should we add some other files to catalina_home?
> 
> 
> 
> 
> 
> ________________________________
> From: Koch Martina <k...@huberverlag.de>
> To: "nutch-user@lucene.apache.org" <nutch-user@lucene.apache.org>
> Sent: Friday, January 9, 2009 2:57:24 PM
> Subject: AW: Null Indexing
> 
> Hi Ramadhany,
> 
> the mentioned warnings and fatals you see in the log have nothing to do
> with getting 0 results at searching.
> The fatal message can be eliminated by setting the property
> "http.robots.agents" in the nutch-site.xml to "Imam Spider,*".
> The urlnormalizer warn messages just inform you that you have not
> specified a dedicated urlnormalizer for a certain scope so that the
> default urlnormalizer is used. If you need more information on this, look
> at URLNormalizers.java (package org.apache.nutch.net).
> 
> To narrow down your searching problems, please provide some more details
> on your configuration.
> Did you check the content of your index using Luke
> (http://www.getopt.org/luke/) to make sure that the pages and content you
> are expecting in the index are really in there?
> Did you try a search directly from the command line in cygwin by the
> command:
> bin/nutch org.apache.nutch.searcher.NutchBean <search query>
> 
> Kind regards,
> Martina
> 
> -----Original Message-----
> From: Imam Nur Ramadhany [mailto:ramadhanyov...@yahoo.com] 
> Sent: 09 January 2009 01:39
> To: nutch-user@lucene.apache.org
> Subject: Null Indexing
> 
> I'm new to Nutch. I am trying to deploy nutch-0.9 but am still having
> some problems. When I try to search, it returns 0 hits. I have configured
> the crawl folder in the webapp and crawled my localhost (can that be
> done?). I use Windows with cygwin, Tomcat6.0, and jdk1.6.0_10. According
> to crawl.log, no errors occur during the crawl, but it returns null
> when indexing.
> 
> In hadoop.log there are some FATAL and WARN entries like these:
> 
> FATAL api.RobotRulesParser - Agent we advertise ('Imam Spider') not listed
> first in 'http.robots.agents' property!
> .
> WARN  regex.RegexURLNormalizer - can't find rules for scope 'inject',
> using default 
> .
> WARN  regex.RegexURLNormalizer - can't find rules for scope 'crawldb',
> using default
> .
> WARN  util.NativeCodeLoader - Unable to load native-hadoop library for
> your platform... using builtin-java classes where applicable
> 
> Are these related to this problem?
> 
> Regards,
> 
> Ramadhany
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Null-Indexing-tp21364166p25531221.html
Sent from the Nutch - User mailing list archive at Nabble.com.
