Bill, I don't think you have to refetch if you fix your segments. I believe it just chops off what it can't read and dumps a clean segment (missing some data, of course).

Have you tried merging your indexes into a single index and searching from that? Try cutting the Java memory arguments listed below in half. Is it possible for you to try kernel 2.6 as well? Also, are you on IDE or SCSI drives? Have you run hdparm to benchmark your drives to make sure they're up to snuff? I personally use Resin, but even Tomcat shouldn't be this slow. You say 300k sites are indexed. Was this done with crawl and a depth, and how many URLs make up your db?

-byron

--- Bill Goffe <[EMAIL PROTECTED]> wrote:
> Byron said:
>
> > Since you're running Debian, can you confirm your java_home points to 1.4.2 and not Kaffe for both Nutch & Tomcat?
>
> Yes, sure of this (neither locate nor which turns up kaffe, and the environment variable seems correct).
>
> > If you have corruption, you may want to start over. My laptop runs queries on 300k pages faster than this server returns results.
>
> Wow. FWIW I first tried bin/nutch segread -fix, but that didn't fix the corrupted segment (there has been another report of that here, even when I deleted all index files, as was also reported here). I then tried bin/nutch segslice -fix and that indeed worked (it created two segments, one of which had zero size and the other was fine). (Oh -- I figured out that it was corrupted with bin/nutch segread -list.)
>
> But even with bin/nutch segslice -fix, it would seem that I would at least need to refetch -- is this correct?
>
> > Was your crawl/fetch performing terribly as well, or just queries?
>
> Hm. Not sure how the crawl/fetch should go; it took about 24 hours for 300,000 sites (doubtless that depends on the files in conf -- I can live with that speed). But queries take up to 10-15 seconds.
>
> - Bill
>
> > -byron
> >
> > --- Bill Goffe <[EMAIL PROTECTED]> wrote:
> >
> > > Hello -
> > >
> > > I'm experiencing slow searches.
> > > Here are the specifics:
> > > - Search example: http://rfe.org/search.jsp?query=wealth+of+nations reliably takes 11 seconds
> > > - ~300K pages in the database (used mergesegs w/ indexing on my three segments; one was found partially corrupted)
> > > - Dual 2.80GHz Xeon machine with 3 gig RAM and SCSI disks (hardware RAID?)
> > > - Nutch 0.7.1
> > > - JAVA_OPTS="-Xmx1024m -Xms512m" (doesn't seem to matter)
> > > - Tomcat 5.5.9 (minProcessors="5" maxProcessors="75" in my connector for proxying in server.xml)
> > > - Java(TM) 2 SDK, Standard Edition Version 1.4.2
> > > - Linux (Debian) with 2.4.27-2-686-smp kernel
> > >
> > > When I monitor the search with htop (a _nice_ replacement for top -- it is much easier to kill or renice jobs in it than in top, and it can easily show parent and child processes and sort views in different ways), I see 41 processes (seems like a lot?) started by Tomcat. Memory usage for each goes from about 64K at Tomcat startup to ~200M after the search above (even a single-word search takes it to ~150M).
> > >
> > > I didn't see anything obvious to fiddle with in nutch-default.xml, nor anything that really seemed apropos in the list archive (other than that others seem to get much faster searches). Any suggestions?
> > >
> > > - Bill
> > >
> > > --
> > >          *------------------------------------------------------*
> > >          | Bill Goffe                         [EMAIL PROTECTED] |
> > >          | Department of Economics        voice: (315) 312-3444 |
> > >          | SUNY Oswego                      fax: (315) 312-5444 |
> > >          | 416 Mahar Hall                 <http://cook.rfe.org> |
> > >          | Oswego, NY 13126                                     |
> > > *--------*------------------------------------------------------*-----------*
> > > | "Some predicted the disclosure would set off strong reactions from        |
> > > | governments of the target countries."                                     |
> > > | -- A description of how China, Russia, Iraq, North Korea, Iran, Libya     |
> > > | and Syria might feel about the revelation that the U.S. has               |
> > > | contingency plans to use nuclear weapons against them. "U.S. Works        |
> > > | Up Plan for Using Nuclear Arms," Paul Richter, LA Times,                  |
> > > | March 9, 2002.                                                            |
> > > *---------------------------------------------------------------------------*

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
