Gregory Kozlovsky wrote:

> Alexander,
>
> Maybe you can share with us the memory allocation secrets of ASPSeek. I run
> indexing with -N 100 (for 560 sites) and see that index processes in "top"
> reach a size of 50MB each. This size probably includes the shared area? What
> is the real memory needed for each index thread? How to select values for
> DeltaBufferSize, UrlBufferSize kilobytes, WordCacheSize, HrefCacheSize,
> NextDocLimit?

Those 50 MB are actually fully shared among all threads. Each thread can
additionally consume up to "MaxDocSize" bytes for the URL buffer and about of
this value for processing of the URL. This additional memory is partially freed
after the URL is saved and partially reused for subsequent URLs.

Use WordCacheSize=50000 if you index content in one language, or more if you
index content in multiple languages. See how the ratio "W hit" / "W miss" in
logs.txt changes when WordCacheSize is changed.

The optimal value for HrefCacheSize is about the total number of sites * 10.
See the ratio "HQ" - "Hr hits" / "HQ" in logs.txt.

NextDocLimit is ignored if you use ordering by hops (-o).

Use the default values for DeltaBufferSize and UrlBufferSize.

> And what are the memory settings for mySql? What keys does mySql load during
> the run? From the urlword and wordurl tables? Any recommendations for setting
> the "key_buffer" value?

The optimal value for key_buffer is 1/4 of total memory if index runs.
If index doesn't run, restart mysql, see how much memory it has consumed after
a few hours of work, and set "key_buffer" to that value.

Since "index" updates both the urlword and wordurl tables, it caches all their
indexes in the key buffer. urlword has a large index on the url field, that's
why a large "key_buffer" value is preferred during indexing.
"searchd" uses only the small indexes urlword(url_id),
urlword(origin,site_id,crc), urlwordsNN(url_id), wordurl(word).

> We can install as much memory as needed, but we need to know how much
> is needed and how to allocate it.
>
> Gregory
>
> -----Original Message-----
> From: Alexander F Avdonkin [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, 20 June 2002 12:04
> To: [EMAIL PROTECTED]
> Subject: Re: [aseek-users] ASPSeek performance
>
> Gregory Kozlovsky wrote:
>
> > Dear ASPSeekers,
> >
> > We have now several ASPSeek databases, the largest by far has about 2
> > million documents. In order to avoid problems with the obsolete documents
> > that are not deleted, I decided to make a new index into a separate
> > database, then switch the databases, erase the old one, and so on. The
> > problem is that the search became somewhat slow. We have 1 GB of memory
> > and 2 processors. After the system is on for many days, mySql server
> > processes grow very large. When I reboot the system they are small and,
> > it seems, everything works faster.
> >
> > What is the best way to configure a system of this size? How much memory
> > do I need? Why do the mysqld's grow to something like 250 MB? Is it
> > because of caching or are there memory leaks?
>
> It seems that you set a large "key_buffer" value for mysql. On start mysql
> doesn't use all the specified memory for keys, but it will eat the specified
> amount of memory as it processes queries.
> So, try to reduce the "key_buffer" value.
>
> > Gregory Kozlovsky
> >
> > Project Manager for Information Systems             Tel: +41 (0)1 632 63 70
> > International Relations and Security Network (ISN)  Fax: +41 (0)1 632 14 13
> > Center for Security Studies and Conflict Research   Email: [EMAIL PROTECTED]
> > Swiss Federal Institute of Technology (ETH)         http://www.isn.ch
> > Leonhardshalde 21, ETH-Zentrum / LEH
> > CH-8092 Zürich, Switzerland
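
The rules of thumb from the reply above (HrefCacheSize of about ten times the number of sites, WordCacheSize of 50000 as a starting point, key_buffer of a quarter of total memory while index runs) can be put into a small calculator. This is only a sketch: the function names, the dictionary keys and the megabyte unit are hypothetical and are not part of ASPSeek or MySQL; only the ratios come from the thread.

```python
def suggest_settings(total_sites, total_memory_mb, index_running=True):
    """Return starting values based on the rules of thumb in this thread.

    Hypothetical helper, not an ASPSeek tool. Dictionary keys reuse the
    setting names from the thread; "key_buffer_mb" is an invented key
    that expresses the MySQL key_buffer in megabytes.
    """
    settings = {
        # "about Total number of sites * 10"
        "HrefCacheSize": total_sites * 10,
        # 50000 for single-language content; the thread says "more" for
        # multiple languages without giving a number, so none is guessed.
        "WordCacheSize": 50000,
    }
    if index_running:
        # "1/4 of total memory if index runs"
        settings["key_buffer_mb"] = total_memory_mb // 4
    # When index does not run, the advice is to measure mysqld after a few
    # hours of work, so no value is computed for that case.
    return settings


def word_cache_ratio(w_hit, w_miss):
    """Ratio "W hit" / "W miss" from logs.txt, to compare WordCacheSize runs."""
    if w_miss == 0:
        raise ValueError("no misses recorded, ratio is undefined")
    return w_hit / w_miss


if __name__ == "__main__":
    # Figures from the thread: 560 sites, 1 GB of memory.
    print(suggest_settings(560, 1024))
```

For the 560 sites and 1 GB of memory mentioned in the thread, this gives a HrefCacheSize of 5600 and a key buffer of 256 MB while index runs. The values are starting points only; the hit/miss ratios in logs.txt are what should drive further tuning.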
