Make sure you don't have any empty or bad segments. We had serious
speed issues for a long time until we realized that some empty segments
had been generated while we were testing. Nutch would sit and spin on
these bad segments for a few seconds on every search. Simply deleting
the bad segments cut search times from more than 10 seconds to
fractions of a second.
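
A minimal Python sketch for spotting candidate segments follows. It
assumes that an "empty" segment is a subdirectory of the segments
directory containing no files with any data in them; that definition,
the function name find_empty_segments, and the example path are
illustrative assumptions, not anything Nutch itself defines.

```python
import os


def find_empty_segments(segments_dir):
    """Return names of subdirectories of segments_dir that hold no data.

    Assumption: a segment counts as "empty" when no file anywhere beneath
    it has a size greater than zero. Adjust this to match your own setup.
    """
    empty = []
    for name in sorted(os.listdir(segments_dir)):
        path = os.path.join(segments_dir, name)
        if not os.path.isdir(path):
            continue  # skip stray files sitting next to the segments
        has_data = any(
            os.path.getsize(os.path.join(root, filename)) > 0
            for root, _dirs, files in os.walk(path)
            for filename in files
        )
        if not has_data:
            empty.append(name)
    return empty


if __name__ == "__main__":
    # Hypothetical location; point this at your own segments directory.
    segments = "segments"
    if os.path.isdir(segments):
        for segment in find_empty_segments(segments):
            print("possibly empty segment:", segment)
```

The script only reports candidates and deletes nothing, so each
directory can be checked by hand before it is removed.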

g.


RP wrote:
> I've got 500k URLs indexed on an old 700MHz P3 clunker with only 384MB 
> of RAM and my searches take under a second....  Something is funny here.  
> I've got my JVM at 64MB for this as well, so be careful, as it sounds 
> like you just caused the box to thrash a bit with swapping.  Set the 
> JVM down to 128MB and see what happens....
>
> rp
>
> Sean Dean wrote:
>> It looks like you don't have enough RAM to maintain the quick speeds 
>> you were seeing when the index was only around 3000 pages.
>>  
>> Nutch scales very well, but the hardware behind it must also. Using 
>> quick calculations and common sense, if your total system RAM is only 
>> 512MB and all of that is given to Tomcat alone, you're looking at a 
>> situation where other system applications and/or parts of Tomcat are 
>> being executed out of swap memory. This will kill search speed.
>>  
>> My recommendation would be to get more RAM; another 512MB should 
>> support a 1.5 million page index running at the speeds you 
>> experienced during your 3000 page trials. If you can get even more, 
>> then you're only helping system (search) performance.
>>
>> Here are a few other tips, just in case you can't get any more RAM at 
>> this time:
>>  
>> 1. Make sure you're passing "-server" via JAVA_OPTS.
>> 2. Disable all non-required system and user applications.
>> 3. Download or install the newest stable kernel and recompile without 
>> all the junk.
>> 4. Reduce the size of your index.
>>
>>  
>> ----- Original Message ----
>> From: shrinivas patwardhan <[EMAIL PROTECTED]>
>> To: [email protected]
>> Sent: Friday, December 29, 2006 4:45:41 AM
>> Subject: Re: search performance
>>
>>
>> Thank you, Sean Dean, for your quick reply.
>> I am running Nutch on Ubuntu 5.01 with JDK 1.5. There are some apps
>> running in the background, but they don't take up much memory.
>> I can understand the first search being slow, but the searches that
>> follow it are slow too; even getting the next 10 pages takes some
>> time.
>> So, looking at all of this, is the problem my system as a whole, or
>> did I go wrong somewhere in the indexing process?
>> I just followed the "whole web crawling" section of the tutorial for
>> Nutch 0.7.2.
>> When I indexed only about 3000 pages (a subset of the dmoz index),
>> search results were quick, but now that I have loaded the index for
>> almost 1.5 million pages, search really bogs down.
>> I used to get a Java heap space error in Tomcat, so I fixed it by
>> setting JAVA_OPTS to -Xmx512m.
>> I hope I have made myself clear now. So what do you guys think is
>> wrong?
>>
>> Thanks
>> Shrinivas
>>   
>
>
