If I recall correctly, we just checked the size of each segment directory on disk. The bad ones had files of only 32K or something like that.
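If it helps, here's a rough sketch of the kind of check we did: sum up the files under each segment directory and flag the tiny ones. The crawl/segments path and the 64K threshold below are only placeholders, so adjust them for your own layout:

import java.io.File;

// Rough sketch: flag Nutch segment directories whose total on-disk size
// is suspiciously small (usually empty/bad segments left over from tests).
public class FindSmallSegments {

    // Sum the sizes of all regular files under a directory, recursively.
    static long sizeOf(File dir) {
        long total = 0;
        File[] entries = dir.listFiles();
        if (entries == null) {
            return 0;
        }
        for (File f : entries) {
            total += f.isDirectory() ? sizeOf(f) : f.length();
        }
        return total;
    }

    public static void main(String[] args) {
        // Placeholder path; pass your own segments directory as the first argument.
        File segmentsDir = new File(args.length > 0 ? args[0] : "crawl/segments");
        long threshold = 64 * 1024; // anything this small is probably a bad segment

        File[] segments = segmentsDir.listFiles();
        if (segments == null) {
            System.err.println("No such directory: " + segmentsDir);
            return;
        }
        for (File seg : segments) {
            if (seg.isDirectory() && sizeOf(seg) < threshold) {
                System.out.println("Suspiciously small segment: " + seg
                        + " (" + sizeOf(seg) + " bytes)");
            }
        }
    }
}

In practice a quick look at the directory sizes is enough; the point is just that the bad segments stand out by being tiny compared to the rest.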
g.

Michael Wechner wrote:
> Insurance Squared Inc. wrote:
>
>> Make sure you don't have any empty or bad segments. We had some
>> serious speed issues for a long time until we realized we had some
>> empty segments that had been generated as we tested. Nutch would
>> then sit and spin on these bad segments for a few seconds on every
>> search. Simply deleting the bad segments took search times from >10
>> seconds to fractions of a second.
>
> how does one recognize bad (or empty) segments?
>
> Thanks
>
> Michael
>
>> g.
>>
>> RP wrote:
>>
>>> I've got 500k urls indexed on an old 700mhz P3 clunker with only
>>> 384MB of RAM, and my searches take sub-seconds.... Something is
>>> funny here. I've got my JVM at 64MB for this as well, so be careful,
>>> as it sounds like you just caused the box to thrash a bit with
>>> swapping. Set the JVM down to 128MB and see what happens....
>>>
>>> rp
>>>
>>> Sean Dean wrote:
>>>
>>>> It looks like you don't have enough RAM to maintain the quick
>>>> speeds you were seeing when the index was only around 3000 pages.
>>>>
>>>> Nutch scales very well, but the hardware behind it must also. Using
>>>> quick calculations and common sense, if your total system RAM is
>>>> only 512MB and all of it is given to Tomcat alone, you're looking
>>>> at a situation where other system applications and/or parts of
>>>> Tomcat are being executed out of swap memory. This will kill search
>>>> speed.
>>>>
>>>> My recommendation would be to get more RAM; another 512MB should
>>>> support a 1.5 million page index running at the speeds you
>>>> experienced during your 3000-page trials. If you can get even more,
>>>> then you're only helping system (search) performance.
>>>>
>>>> Here are a few other tips, just in case you can't get any more RAM
>>>> at this time:
>>>>
>>>> 1. Make sure you're passing "-server" via JAVA_OPTS.
>>>> 2. Disable all non-required system and user applications.
>>>> 3. Download or install the newest stable kernel and recompile
>>>>    without all the junk.
>>>> 4. Reduce the size of your index.
>>>>
>>>> ----- Original Message ----
>>>> From: shrinivas patwardhan <[EMAIL PROTECTED]>
>>>> To: [email protected]
>>>> Sent: Friday, December 29, 2006 4:45:41 AM
>>>> Subject: Re: search performance
>>>>
>>>> Thank you, Sean Dean, for your quick reply.
>>>> Well, I am running Nutch on Ubuntu 5.01 and JDK 1.5. There are some
>>>> apps running in the background, but they don't take up that much
>>>> memory.
>>>> Secondly, I can understand about the first search, but the searches
>>>> following it also take time; even getting the next 10 pages takes
>>>> some time. So, looking at all the issues, does this relate to my
>>>> system as a whole, or have I got something wrong somewhere in the
>>>> indexing process?
>>>> I just followed the tutorial for Nutch 0.7.2 under the whole-web
>>>> crawling section. When I indexed just about 3000 pages (a subset of
>>>> that DMOZ index), the search results were quick, but now, after
>>>> loading the index for almost 1.5 million pages, it really dies.
>>>> I used to get a Java heap space error in Tomcat, so I fixed it by
>>>> setting JAVA_OPTS to -Xmx512m.
>>>> I guess I have made myself clear now. So what do you guys think
>>>> must be wrong?
>>>>
>>>> Thanks
>>>> Shrinivas
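PS: since the -Xmx512m setting came up above, one quick sanity check is to print the heap limits the JVM actually sees. Running something like this inside the same JVM (or logging the same values from the webapp) tells you whether the setting is being picked up; this is just a generic JVM check, nothing Nutch-specific:

// Quick sanity check: print the heap limits the running JVM actually sees,
// e.g. to confirm that a "-server -Xmx512m" setting really reached Tomcat.
public class HeapCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        System.out.println("max heap:   " + rt.maxMemory() / mb + " MB");
        System.out.println("total heap: " + rt.totalMemory() / mb + " MB");
        System.out.println("free heap:  " + rt.freeMemory() / mb + " MB");
    }
}

If the max heap comes back much lower than expected, the JAVA_OPTS value probably isn't reaching the Tomcat startup scripts.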
