Insurance Squared Inc. wrote:

> Yeah, I think it happened when we restarted either Tomcat or Apache 
> whilst in the middle of crawling or indexing (crawling, if I had to 
> guess). Now we're careful to let our crawls and indexing finish before 
> we restart anything, and we haven't had any problems since.


good to hear :-)

Thanks

Michael

>
>
> Michael Wechner wrote:
>
>> Insurance Squared Inc. wrote:
>>
>>> If I recall correctly, we just checked the sizes of the segment 
>>> directories.  The bad ones had files of only 32K or something like that.
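A rough way to spot these from the shell (a sketch; it assumes your 
segments live under crawl/segments, and <segment-dir> below is just a 
placeholder for whichever directory looks suspect):

    # Print each segment's total on-disk size in KB, smallest first.
    du -sk crawl/segments/* | sort -n

    # Segments reporting only ~32K are likely the empty/bad ones; move
    # them out of the searcher's reach and re-test before deleting.
    mkdir -p bad-segments
    mv crawl/segments/<segment-dir> bad-segments/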
>>
>> thanks. Any idea why these were being created in the first place, and 
>> why they are no longer being created?
>>
>> Thanks
>>
>> Michael
>>
>>>
>>> g.
>>>
>>>
>>> Michael Wechner wrote:
>>>
>>>> Insurance Squared Inc. wrote:
>>>>
>>>>> Make sure you don't have any empty or bad segments.   We had some 
>>>>> serious speed issues for a long time until we realized we had some 
>>>>> empty segments that had been generated while we were testing.  Nutch 
>>>>> would then sit and spin on these bad segments for a few seconds on 
>>>>> every search.  Simply deleting the bad segments took search times 
>>>>> from over 10 seconds to fractions of a second.
>>>>
>>>> how does one recognize bad (or empty) segments?
>>>>
>>>> Thanks
>>>>
>>>> Michael
>>>>
>>>>>
>>>>> g.
>>>>>
>>>>>
>>>>> RP wrote:
>>>>>
>>>>>> I've got 500k URLs indexed on an old 700MHz P3 clunker with only 
>>>>>> 384MB of RAM, and my searches take sub-seconds....  Something is 
>>>>>> funny here.  I've got my JVM at 64MB for this as well, so be 
>>>>>> careful: it sounds like you just caused the box to thrash a bit 
>>>>>> with swapping.  Set the JVM down to 128MB and see what happens....
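To confirm whether the box really is swapping during searches, a rough 
sanity check with standard Linux tools (nothing Nutch-specific) is to 
watch swap activity while a query runs:

    # Report memory/swap once per second; sustained non-zero values in
    # the "si" and "so" columns mean the box is actively swapping.
    vmstat 1

    # One-off snapshot of total vs. used memory and swap, in megabytes:
    free -m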
>>>>>>
>>>>>> rp
>>>>>>
>>>>>> Sean Dean wrote:
>>>>>>
>>>>>>> It looks like you don't have enough RAM to maintain the quick 
>>>>>>> speeds you were seeing when the index was only around 3000 pages.
>>>>>>>  
>>>>>>> Nutch scales very well, but the hardware behind it must also. 
>>>>>>> Using quick calculations and common sense: if your total system 
>>>>>>> RAM is only 512MB and all of it is given to Tomcat alone, you're 
>>>>>>> looking at a situation where other system applications and/or 
>>>>>>> parts of Tomcat are being executed out of swap memory. This will 
>>>>>>> kill search speed.
>>>>>>>  
>>>>>>> My recommendation would be to get more RAM; another 512MB should 
>>>>>>> support a 1.5 million page index running at the speeds you 
>>>>>>> experienced during your 3000 page trials. If you can get even 
>>>>>>> more, you're only helping system (search) performance.
>>>>>>>
>>>>>>> Here are a few other tips, just in case you can't get any more 
>>>>>>> RAM at this time:
>>>>>>>  
>>>>>>> 1. Make sure you're passing "-server" via JAVA_OPTS (see the 
>>>>>>> sketch after this list).
>>>>>>> 2. Disable all non-required system and user applications.
>>>>>>> 3. Download or install the newest stable kernel and recompile it 
>>>>>>> without all the junk.
>>>>>>> 4. Reduce the size of your index.
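For item 1, a minimal sketch of what that could look like. It assumes 
Tomcat is started via catalina.sh, which reads JAVA_OPTS from the 
environment; the 512m heap value is just the figure discussed above, not 
a recommendation for every box:

    # Set before starting Tomcat (in the shell or a startup script):
    export JAVA_OPTS="-server -Xmx512m"
    $CATALINA_HOME/bin/catalina.sh start

The -server flag selects the server HotSpot VM, which trades slower 
startup for better steady-state throughput on long-running services.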
>>>>>>>
>>>>>>>  
>>>>>>> ----- Original Message ----
>>>>>>> From: shrinivas patwardhan <[EMAIL PROTECTED]>
>>>>>>> To: [email protected]
>>>>>>> Sent: Friday, December 29, 2006 4:45:41 AM
>>>>>>> Subject: Re: search performance
>>>>>>>
>>>>>>>
>>>>>>> Thank you, Sean Dean, for your quick reply.
>>>>>>> I am running Nutch on Ubuntu 5.01 and JDK 1.5. There are some 
>>>>>>> apps running in the background, but they don't take up that much 
>>>>>>> memory.
>>>>>>> Secondly, I can understand the first search being slow, but the 
>>>>>>> searches following it also take time; even getting the next 10 
>>>>>>> pages takes a while.
>>>>>>> So, looking at all the issues, is this down to my system as a 
>>>>>>> whole, or have I got something wrong somewhere in the indexing 
>>>>>>> process? I just followed the tutorial for Nutch 0.7.2, under the 
>>>>>>> whole-web crawling section.
>>>>>>> When I indexed just about 3000 pages (a subset of that DMOZ 
>>>>>>> index), the search results were quick, but now, after loading the 
>>>>>>> index for almost 1.5 million pages, it really slows down.
>>>>>>> I used to get a Java heap space error in Tomcat, so I fixed it by 
>>>>>>> setting JAVA_OPTS to -Xmx512m.
>>>>>>> I guess I have made myself clear now. So what do you guys think 
>>>>>>> must be wrong?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Shrinivas


-- 
Michael Wechner
Wyona      -   Open Source Content Management   -    Apache Lenya
http://www.wyona.com                      http://lenya.apache.org
[EMAIL PROTECTED]                        [EMAIL PROTECTED]
+41 44 272 91 61

