Byron said:
> Since you're running Debian, can you confirm your
> java_home points to 1.4.2 and not Kaffe for both Nutch
> & Tomcat?
Yes, I'm sure of this (neither locate nor which turns up kaffe,
and the environment variable seems correct).
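
For anyone who wants to repeat the check, here is a minimal sketch. The
function names are made up for illustration, and it assumes that a Kaffe
`java -version` banner contains the word "kaffe". Tomcat may be started
with a different environment than your login shell, so it is worth
running this with whatever JAVA_HOME its startup script actually sees.

```shell
# Succeeds if the `java -version` text on stdin mentions Kaffe.
# Assumption: Kaffe identifies itself by name in its version banner.
looks_like_kaffe() {
    grep -qi 'kaffe'
}

# Report what the JVM under a given JAVA_HOME looks like.
# Usage: check_java_home [dir]   (defaults to $JAVA_HOME)
check_java_home() {
    home="${1:-${JAVA_HOME:-}}"
    if [ -z "$home" ] || [ ! -x "$home/bin/java" ]; then
        echo "no usable java under: ${home:-<unset>}"
        return 2
    fi
    # java -version writes to stderr, so fold it into stdout first
    if "$home/bin/java" -version 2>&1 | looks_like_kaffe; then
        echo "kaffe"
        return 1
    fi
    echo "ok"
}
```

Running `check_java_home` with no argument checks the current
$JAVA_HOME; pass a directory to check some other installation.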
> If you have corruption, you may want to start over.
> My laptop runs quicker queries on 300k pages than this
> server yields results.
Wow. FWIW I first tried bin/nutch segread -fix but that didn't
fix the corrupted segment (there has been another report of that
here), even after I deleted all index files, as was also reported
here. I then tried bin/nutch segslice -fix and that did work: it
created two segments, one of which had zero size while the other
was fine. (Oh, and I found out that the segment was corrupted with
bin/nutch segread -list.)
But even after bin/nutch segslice -fix, it seems that I would
at least need to refetch. Is this correct?
> Was your crawl/fetch performing terribly as well or
> just queries?
Hm. I'm not sure how fast the crawl/fetch should go; it took about
24 hours for 300,000 sites (doubtless this depends on the files in
conf, and I can live with that speed). But queries take up to
10-15 seconds.
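
In case it helps to compare numbers, here is a rough sketch of how the
query time could be measured from the command line. The helper names
are hypothetical, and it assumes curl is installed; the timing comes
from curl's `-w '%{time_total}'` write-out variable, so it covers the
whole request, not just the time spent inside Nutch.

```shell
# Build a search URL the way the example in this thread is formed:
# spaces in the query become '+'. No other escaping is attempted.
search_url() {
    base="$1"; shift
    printf '%s?query=%s\n' "$base" "$(printf '%s' "$*" | tr ' ' '+')"
}

# Print the total seconds taken by one request to the given URL.
time_query() {
    curl -s -o /dev/null -w '%{time_total}\n' "$1"
}

# Example (hits the live server, so run it by hand):
#   url=$(search_url http://rfe.org/search.jsp wealth of nations)
#   for i in 1 2 3; do time_query "$url"; done
```

Repeating the same query a few times should show whether the delay is
a one-time warm-up cost or is paid on every search.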
- Bill
>
> -byron
>
> --- Bill Goffe <[EMAIL PROTECTED]> wrote:
>
> > Hello -
> >
> > I'm experiencing slow searches. Here are the
> > specifics:
> > - Search example:
> >   http://rfe.org/search.jsp?query=wealth+of+nations
> >   reliably takes 11 seconds
> > - ~300K pages in the database (used mergesegs w/
> >   indexing on my three segments; one was found
> >   partially corrupted)
> > - Dual 2.80GHz Xeon machine with 3 gig RAM and
> >   SCSI disks (hardware RAID?)
> > - Nutch 0.7.1
> > - JAVA_OPTS="-Xmx1024m -Xms512m" (doesn't seem to
> >   matter)
> > - Tomcat 5.5.9 (minProcessors="5" maxProcessors="75"
> >   in my connector for proxying in server.xml)
> > - Java(TM) 2 SDK, Standard Edition Version 1.4.2
> > - Linux (Debian) with 2.4.27-2-686-smp kernel
> >
> > When I monitor the search with htop (a _nice_ replacement
> > for top -- much easier to kill or renice jobs in it than top,
> > and can easily view parent and child processes and sort
> > views different ways) I see 41 processes (seems like a lot?)
> > started by Tomcat. Memory usage for each goes to ~200M after
> > a search of the above from about 64K at Tomcat startup (even
> > on a single word search it goes to ~150M).
> >
> > I didn't see anything obvious in nutch-default.xml to fiddle
> > with nor anything that really seemed apropos in the list
> > archive (other than others seem to get much faster searches).
> > Any suggestions?
> >
> > - Bill
--
*------------------------------------------------------*
| Bill Goffe [EMAIL PROTECTED] |
| Department of Economics voice: (315) 312-3444 |
| SUNY Oswego fax: (315) 312-5444 |
| 416 Mahar Hall <http://cook.rfe.org> |
| Oswego, NY 13126 |
*--------*------------------------------------------------------*-----------*
| "Students without a bedroom television scored an average of about 63 on |
| the mathematics section of the test, while students with a bedroom TV |
| scored an average of about 53 (P<0.001)." |
| -- A study on the impact of TVs in the bedrooms of third graders. |
| "Bedroom TV Associated With Lower Achievement Scores," Jeff Minerd, |
| http://www.medpagetoday.com/Pediatrics/Parenting/tb/1303>. |
*---------------------------------------------------------------------------*
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general