If I recall correctly, we just checked the sizes of the segment 
directories.  The bad ones had files of only 32K or something like that.
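
Something along these lines will show it at a glance (the crawl/segments 
path is an assumption; point it at wherever your segment directories 
actually live):

  # total size of each segment directory
  du -sh crawl/segments/*

  # flag segments under ~1MB; the threshold is a guess, our bad ones were ~32K
  du -sk crawl/segments/* | awk '$1 < 1024 {print "suspect segment: " $2}'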

g.


Michael Wechner wrote:
> Insurance Squared Inc. wrote:
>
>> Make sure you don't have any empty or bad segments.   We had some 
>> serious speed issues for a long time until we realized we had some 
>> empty segments that had been generated as we tested.  Nutch would 
>> then sit and spin on these bad segments for a few seconds on every 
>> search.  Simply deleting the bad segments took search times from >10 
>> seconds to fractions of a second.
>
>
> how does one recognize bad (or empty) segments?
>
> Thanks
>
> Michael
>
>>
>> g.
>>
>>
>> RP wrote:
>>
>>> I've got 500k URLs indexed on an old 700MHz P3 clunker with only 
>>> 384MB of RAM and my searches take sub-seconds....  Something is funny 
>>> here.  I've got my JVM at 64MB for this as well, so be careful, as it 
>>> sounds like you just caused the box to thrash a bit with swapping.  
>>> Set the JVM down to 128MB and see what happens....
>>>
>>> rp
>>>
>>> Sean Dean wrote:
>>>
>>>> It looks like you don't have enough RAM to maintain the quick 
>>>> speeds you were seeing when the index was only around 3000 pages.
>>>>  
>>>> Nutch scales very well, but the hardware behind it must scale too. Using 
>>>> quick calculations and common sense: if your total system RAM is 
>>>> only 512MB and all of it is given to Tomcat alone, you're looking at 
>>>> a situation where other system applications and/or parts of Tomcat 
>>>> are being executed out of swap memory. This will kill search speed.
>>>>  
>>>> My recommendation would be to get more RAM; another 512MB should 
>>>> support a 1.5 million page index running at the speeds you 
>>>> experienced during your 3000-page trials. If you can get even more, 
>>>> you're only helping system (search) performance.
>>>>
>>>> Here are a few other tips, just in case you can't get any more RAM 
>>>> at this time:
>>>>  
>>>> 1. Make sure you're passing "-server" via JAVA_OPTS (see the sketch after this list).
>>>> 2. Disable all non-required system and user applications.
>>>> 3. Download or install the newest stable kernel and recompile 
>>>> without all the junk.
>>>> 4. Reduce the size of your index.
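
For point 1, a minimal sketch of what that looks like (the heap sizes and 
the plain environment-variable approach are assumptions; adjust them to 
your memory budget and to however you start Tomcat):

  # set before starting Tomcat, e.g. before running catalina.sh
  export JAVA_OPTS="-server -Xms256m -Xmx512m"
  # -server      selects the server JVM
  # -Xms / -Xmx  set the initial and maximum heap so Tomcat's footprint stays predictable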
>>>>
>>>>  
>>>> ----- Original Message ----
>>>> From: shrinivas patwardhan <[EMAIL PROTECTED]>
>>>> To: [email protected]
>>>> Sent: Friday, December 29, 2006 4:45:41 AM
>>>> Subject: Re: search performance
>>>>
>>>>
>>>> Thank you, Sean Dean, for your quick reply.
>>>> I am running Nutch on Ubuntu 5.01 and JDK 1.5.
>>>> There are some apps running in the background, but they don't take 
>>>> up that much memory.
>>>> Secondly, I can understand about the first search, but the searches 
>>>> following it also take time; even getting the next 10 pages takes a 
>>>> while.
>>>> So, looking at all the issues, does this relate to my system as a 
>>>> whole, or have I got something wrong somewhere in the indexing 
>>>> process?
>>>> I just followed the tutorial for Nutch 0.7.2 under the whole-web 
>>>> crawling section.
>>>> When I indexed just about 3000 pages (a subset of that DMOZ index) 
>>>> the search results were quick, but now, after loading the index for 
>>>> almost 1.5 million pages, it really dies.
>>>> I used to get a Java heap space error in Tomcat, so I fixed it by 
>>>> setting JAVA_OPTS to -Xmx512m.
>>>> I guess I have made myself clear now, so what do you guys think must 
>>>> be wrong?
>>>>
>>>> Thanks
>>>> Shrinivas
>>>>   
>>>
>>>
>>>
>>
>
>
