Yeah, I think it happened when we restarted either Tomcat or Apache in the middle of crawling or indexing (crawling, if I had to guess). Now we're careful to let our crawls and indexing finish before we restart anything. We haven't had any problems since.
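For anyone who wants to automate the size check described in the quoted thread below, here is a minimal Python sketch. It assumes that each segment is a subdirectory of a single segments directory and that a suspiciously small total size is a reasonable hint of an empty or bad segment. The function names and the 64 KB default threshold are assumptions made for illustration (our bad ones were around 32K), not anything Nutch provides. Treat the output as a list of candidates to inspect, not as proof that a segment is bad.

```python
import os


def dir_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def find_suspect_segments(segments_dir, threshold_bytes=64 * 1024):
    """Return (name, size) pairs for segment subdirectories whose total
    size is below threshold_bytes. The threshold is an assumed value."""
    suspects = []
    for name in sorted(os.listdir(segments_dir)):
        path = os.path.join(segments_dir, name)
        if os.path.isdir(path):
            size = dir_size(path)
            if size < threshold_bytes:
                suspects.append((name, size))
    return suspects
```

Calling find_suspect_segments() with the path to your own segments directory returns the small ones sorted by name. The sketch only reports them; whether to delete anything is still a manual decision, and it is safer to run it while no crawl or index job is active.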

Michael Wechner wrote:

> Insurance Squared Inc. wrote:
>
>> If I recall correctly, we just checked the segment directories for
>> space size. The bad ones had files of only 32K or something like that.
>
> thanks. Any idea why these are being created in the first place resp.
> why these are not being created anymore?
>
> Thanks
>
> Michael
>
>>
>> g.
>>
>>
>> Michael Wechner wrote:
>>
>>> Insurance Squared Inc. wrote:
>>>
>>>> Make sure you don't have any empty or bad segments. We had some
>>>> serious speed issues for a long time until we realized we had some
>>>> empty segments that had been generated as we tested. Nutch would
>>>> then sit and spin on these bad segments for a few seconds on every
>>>> search. Simply deleting the bad segments took search times from
>>>> >10 seconds to fractions of a second.
>>>
>>> how does one recognize bad (or empty) segments?
>>>
>>> Thanks
>>>
>>> Michael
>>>
>>>>
>>>> g.
>>>>
>>>>
>>>> RP wrote:
>>>>
>>>>> I've got 500k urls indexed on an old 700mhz P3 clunker with only
>>>>> 384MB of RAM and my searches take sub-seconds.... Something is
>>>>> funny here. I've got my JVM at 64MB for this as well, so be
>>>>> careful as it sounds like you just caused the box to thrash a bit
>>>>> with swapping. Set the JVM down to 128MB and see what happens....
>>>>>
>>>>> rp
>>>>>
>>>>> Sean Dean wrote:
>>>>>
>>>>>> It looks like you don't have enough RAM to maintain the quick
>>>>>> speeds you were seeing when the index was only around 3000 pages.
>>>>>>
>>>>>> Nutch scales very well, but the hardware behind it must also.
>>>>>> Using quick calculations and common sense, if your total system
>>>>>> RAM is only 512MB and all of that is given to tomcat alone you're
>>>>>> looking at a situation where other system applications and/or
>>>>>> parts of Tomcat are being executed out of swap memory. This will
>>>>>> kill search speed.
>>>>>>
>>>>>> My recommendation would be to get more RAM, another 512MB should
>>>>>> support a 1.5 million page index running at the speeds you
>>>>>> experienced during your 3000 page trials. If you can get even
>>>>>> more, then you're only helping system (search) performance.
>>>>>>
>>>>>> Here are a few other tips, just in case you can't get any more RAM
>>>>>> at this time:
>>>>>>
>>>>>> 1. Make sure you're passing "-server" via JAVA_OPTS.
>>>>>> 2. Disable all non-required system and user applications.
>>>>>> 3. Download or install the newest stable kernel and recompile
>>>>>>    without all the junk.
>>>>>> 4. Reduce the size of your index.
>>>>>>
>>>>>>
>>>>>> ----- Original Message ----
>>>>>> From: shrinivas patwardhan <[EMAIL PROTECTED]>
>>>>>> To: [email protected]
>>>>>> Sent: Friday, December 29, 2006 4:45:41 AM
>>>>>> Subject: Re: search performance
>>>>>>
>>>>>>
>>>>>> thank you Sean Dean for your quick reply ...
>>>>>> well i am running nutch on ubuntu 5.01 and jdk1.5
>>>>>> there are some apps running in the background but they don't take
>>>>>> up that much of memory .
>>>>>> secondly i can understand about the first search .. but the other
>>>>>> searches following it also take time even getting the next 10
>>>>>> pages also takes some time ..
>>>>>> so looking at all the issues does it relate to my system on the
>>>>>> whole .. or have i got wrong some where in the indexing process ?
>>>>>> i just followed the tutorial for nutch -0.7.2 under the section
>>>>>> whole web crawling .
>>>>>> when i indexed just about 3000 pages (subset of that dmoz index)
>>>>>> the search results were quick ) but now after loading the index
>>>>>> file for almost 1.5million pages it really dies up
>>>>>> i used to get a java heap space error in tomcat ,so i fixed it by
>>>>>> setting the
>>>>>>
>>>>>> JAVA_OPTS to Xmx512m
>>>>>> i guess i have made myself very clear now . so what do guys think
>>>>>> must be wrong ?
>>>>>>
>>>>>> Thanks
>>>>>> Shrinivas

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
