[Nutch-general] Web interface problems

Robin Haswell Wed, 20 Dec 2006 03:04:27 -0800

Hey there

I'm having issues searching with my newly (vastly) expanded database.
Could anyone shed any light on this? Basically, on a newly started
server, I search for "test", and this appears in catalina.out:


2006-12-20 10:51:40,710 INFO  NutchBean - creating new bean
2006-12-20 10:51:40,725 INFO  NutchBean - opening merged index in crawl/index
2006-12-20 10:51:40,871 INFO  Configuration - found resource common-terms.utf8 at file:/nutch/apache-tomcat-5.5/webapps/ROOT/WEB-INF/classes/common-terms.utf8
2006-12-20 10:51:40,880 INFO  NutchBean - opening segments in crawl/segments
2006-12-20 10:51:40,898 INFO  SummarizerFactory - Using the first summarizer extension found: Basic Summarizer
2006-12-20 10:51:40,901 INFO  NutchBean - opening linkdb in crawl/linkdb
2006-12-20 10:51:40,907 INFO  NutchBean - query request from 195.166.60.2
2006-12-20 10:51:40,925 INFO  NutchBean - query: test
2006-12-20 10:51:40,925 INFO  NutchBean - lang: en
2006-12-20 10:51:40,974 INFO  NutchBean - searching for 20 raw hits
2006-12-20 10:52:13,306 ERROR [jsp] - Servlet.service() for servlet jsp threw exception
java.lang.OutOfMemoryError: Java heap space

If I then refresh the page (which is blank, by the way), I get this:

2006-12-20 10:53:23,729 INFO  NutchBean - query request from 195.166.60.2
2006-12-20 10:53:23,730 INFO  NutchBean - query: test
2006-12-20 10:53:23,730 INFO  NutchBean - lang: en
2006-12-20 10:53:23,735 INFO  NutchBean - searching for 20 raw hits
2006-12-20 10:54:04,685 ERROR [jsp] - Servlet.service() for servlet jsp threw exception
java.lang.RuntimeException: java.lang.NoClassDefFoundError

...plus a lot of stack trace. The odd thing, though, is what happens if I do this:

[EMAIL PROTECTED]:/nutch$ bin/nutch org.apache.nutch.searcher.NutchBean test
Total hits: 64106
 0 20061215102534/http://www.dyslexia-test.co.uk/
 ... About us About dyslexia Dyslexia Test 7-16 Dyslexia Test for Adults
Frequently ... results in the test ... 
 1 20061215102534/http://www.dsa.gov.uk/
[etc]

It works absolutely fine. Does anyone have any idea what might be
preventing the web interface from working properly? I have seen this
Tomcat installation work with exactly the same webapp before - that is,
before I expanded the index.

[EMAIL PROTECTED]:/nutch$ bin/nutch readdb crawl/crawldb/ -stats
CrawlDb statistics start: crawl/crawldb/
Statistics for CrawlDb: crawl/crawldb/
TOTAL urls:     11502550
retry 0:        11429183
retry 1:        61224
retry 2:        10594
retry 3:        1549
min score:      0.0
avg score:      0.05785237
max score:      1309.991
status 1 (DB_unfetched):        9067758
status 2 (DB_fetched):  2221161
status 3 (DB_gone):     213631
CrawlDb statistics: done


Any help would be great.

Thanks

-Rob

