Re: Control segment size
Shalin,

Here is what I've read on maxMergeDocs:

  "While merging segments, Lucene will ensure that no segment with more than maxMergeDocs is created."

Wouldn't that mean that no index file should contain more than maxMergeDocs documents? I guess the index files could also just contain the index information, which is not limited by any property. Is that true?

Is there any workaround to limit the index size, besides limiting the index itself?

Thanks,
-vivek

On Fri, May 8, 2009 at 10:02 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:
> On Fri, May 8, 2009 at 1:30 AM, vivek sar vivex...@gmail.com wrote:
>> I did set maxMergeDocs to 10M, but I still see a couple of index files
>> over 30G, which does not match the maximum number of documents. Here are
>> some numbers:
>>
>> 1) My total index size = 66GB
>> 2) Number of total documents = 200M
>> 3) 1M docs = 300MB
>> 4) 10M docs should be roughly around 3-4GB.
>>
>> As you can see, a couple of files are huge. Are those documents or index
>> files? How can I control the file size so that no single file grows
>> beyond 10GB?
>
> No, there is no way to limit an individual file to a specific size.
>
> --
> Regards,
> Shalin Shekhar Mangar.
Re: Control segment size
On Tue, May 12, 2009 at 2:30 AM, vivek sar vivex...@gmail.com wrote:
> Here is what I've read on maxMergeDocs:
>
>   "While merging segments, Lucene will ensure that no segment with more
>   than maxMergeDocs is created."
>
> Wouldn't that mean that no index file should contain more than
> maxMergeDocs documents? I guess the index files could also just contain
> the index information, which is not limited by any property. Is that true?

Yes, an individual segment will not contain more than maxMergeDocs documents. But the size of the segment may still vary, because some documents may have more unique tokens than others.

What you saw originally must have been a segment merge, which is normal and happens in the course of indexing. I don't think there's a way to avoid that other than to set a ridiculously high mergeFactor (which will affect search performance).

--
Regards,
Shalin Shekhar Mangar.
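The interaction between mergeFactor and maxMergeDocs described in this reply can be sketched with a toy model. This is only an illustration under stated assumptions, not Lucene's actual merge policy: it assumes every flush produces a segment of the same document count, that a merge happens as soon as mergeFactor equally sized segments are adjacent, and that a merge is skipped when the result would exceed maxMergeDocs (the behaviour quoted in the message). The function name and the numbers are hypothetical.

```python
def simulate_merges(total_docs, flush_docs, merge_factor, max_merge_docs):
    """Toy model of level-based merging (NOT Lucene's real algorithm).

    Returns the document count of every segment left after indexing.
    """
    segments = []  # document counts, oldest segment first
    for _ in range(total_docs // flush_docs):
        segments.append(flush_docs)  # each flush writes one small segment
        # Cascade: keep merging while the newest merge_factor segments match.
        while len(segments) >= merge_factor:
            tail = segments[-merge_factor:]
            if len(set(tail)) != 1:
                break  # newest segments are not all the same size
            merged = sum(tail)
            if merged > max_merge_docs:
                break  # assumed cap: never create a segment above the limit
            segments[-merge_factor:] = [merged]
    return segments


capped = simulate_merges(1_000_000, 1_000, 10, 100_000)
uncapped = simulate_merges(1_000_000, 1_000, 10, 10**9)
print(capped)    # ten segments of 100000 documents each
print(uncapped)  # one segment of 1000000 documents
```

Note that the cap in this model applies to document counts only. Nothing in it bounds the byte size of a segment, which is the point made above about documents with more unique tokens.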
Re: Control segment size
On Fri, May 8, 2009 at 1:30 AM, vivek sar vivex...@gmail.com wrote:
> I did set maxMergeDocs to 10M, but I still see a couple of index files
> over 30G, which does not match the maximum number of documents. Here are
> some numbers:
>
> 1) My total index size = 66GB
> 2) Number of total documents = 200M
> 3) 1M docs = 300MB
> 4) 10M docs should be roughly around 3-4GB.
>
> As you can see, a couple of files are huge. Are those documents or index
> files? How can I control the file size so that no single file grows
> beyond 10GB?

No, there is no way to limit an individual file to a specific size.

--
Regards,
Shalin Shekhar Mangar.
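The arithmetic behind the numbers quoted in this message can be worked through as a short sketch. It assumes decimal units (1 GB = 10^9 bytes) and a uniform average document size, which the thread itself notes does not hold exactly. The function names are illustrative only.

```python
GB = 10**9  # assumption: decimal gigabytes

TOTAL_INDEX_BYTES = 66 * GB     # "My total index size = 66GB"
TOTAL_DOCS = 200_000_000        # "Number of total documents = 200M"
MAX_MERGE_DOCS = 10_000_000     # maxMergeDocs set to 10M
LARGE_FILE_BYTES = 30 * GB      # "index files over 30G"


def avg_bytes_per_doc(total_bytes, total_docs):
    """Average on-disk index bytes per document."""
    return total_bytes / total_docs


def expected_segment_bytes(doc_count, bytes_per_doc):
    """Size a segment would have if every document were average-sized."""
    return doc_count * bytes_per_doc


def implied_doc_count(file_bytes, bytes_per_doc):
    """Documents a file of this size would hold at the average size."""
    return file_bytes / bytes_per_doc


per_doc = avg_bytes_per_doc(TOTAL_INDEX_BYTES, TOTAL_DOCS)
print(per_doc)                                               # 330.0 bytes
print(expected_segment_bytes(MAX_MERGE_DOCS, per_doc) / GB)  # about 3.3 GB
print(round(implied_doc_count(LARGE_FILE_BYTES, per_doc)))   # about 91 million
```

At the average size, a 10M-document segment comes to about 3.3GB, consistent with the 3-4GB estimate, while a 30G file would correspond to roughly 91M average-sized documents. That gap is the mismatch the question points at.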
Re: Control segment size
Thanks Otis.

I did set maxMergeDocs to 10M, but I still see a couple of index files over 30G, which does not match the maximum number of documents. Here are some numbers:

1) My total index size = 66GB
2) Number of total documents = 200M
3) 1M docs = 300MB
4) 10M docs should be roughly around 3-4GB.

Under the index directory I see:

-rw-r--r-- 1 dssearch staff 31771545312 May 6 14:15 _2tp.cfs
-rw-r--r-- 1 dssearch staff 31932190573 May 7 08:13 _5ne.cfs
-rw-r--r-- 1 dssearch staff   543118747 May 7 08:32 _5p2.cfs
-rw-r--r-- 1 dssearch staff   543124452 May 7 08:53 _5qr.cfs
-rw-r--r-- 1 dssearch staff   543100201 May 7 09:18 _5sg.cfs
..
..

As you can see, a couple of files are huge. Are those documents or index files? How can I control the file size so that no single file grows beyond 10GB?

Thanks,
-vivek

On Thu, Apr 23, 2009 at 10:26 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:
> Hi,
>
> You are looking for maxMergeDocs, I believe.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message -----
> From: vivek sar vivex...@gmail.com
> To: solr-user@lucene.apache.org
> Sent: Thursday, April 23, 2009 1:08:20 PM
> Subject: Control segment size
>
> Hi,
>
> Is there any configuration to control the segments' file size in Solr?
> Currently, I have an index (70G) with 80 segment files, and one of the
> files is 24G. We noticed that in some cases a commit takes over 2 hours
> to complete (committing 50K records), whereas it usually finishes in
> 20 seconds. After further investigation, it turns out the system was
> doing a lot of paging: the file system buffer was trying to write the
> big segment back to disk. I have 20G of memory on the system, with 6G
> assigned to each Solr instance (running 2 instances).
>
> It seems that if I can limit the segment size to a maximum of 4-5 GB
> I'll be OK. Is there any way to do so?
>
> I have a merge factor of 100. Does that impact the size too? Why do
> different segments have different sizes?
>
> Thanks,
> -vivek
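The directory listing in this message can be checked against the 10GB target with a small sketch. It assumes `ls -l` style lines in which the size is the fifth whitespace-separated field and the file name is the last, with the size column separated from the group name by a space. The helper names are hypothetical.

```python
# Listing copied from the message, with sizes in bytes.
LISTING = """\
-rw-r--r-- 1 dssearch staff 31771545312 May 6 14:15 _2tp.cfs
-rw-r--r-- 1 dssearch staff 31932190573 May 7 08:13 _5ne.cfs
-rw-r--r-- 1 dssearch staff 543118747 May 7 08:32 _5p2.cfs
-rw-r--r-- 1 dssearch staff 543124452 May 7 08:53 _5qr.cfs
-rw-r--r-- 1 dssearch staff 543100201 May 7 09:18 _5sg.cfs
"""


def parse_listing(text):
    """Map file name -> size in bytes for each well-formed listing line."""
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 9:
            continue  # skip blank or truncated lines such as ".."
        sizes[fields[-1]] = int(fields[4])
    return sizes


def files_over(sizes, limit_bytes):
    """Names of files larger than limit_bytes, sorted alphabetically."""
    return sorted(name for name, size in sizes.items() if size > limit_bytes)


sizes = parse_listing(LISTING)
print(files_over(sizes, 10 * 10**9))  # ['_2tp.cfs', '_5ne.cfs']
```

Only the two compound files of roughly 32GB exceed the limit. The remaining files shown are each about 0.54GB.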
Re: Control segment size
Hi,

You are looking for maxMergeDocs, I believe.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
> From: vivek sar vivex...@gmail.com
> To: solr-user@lucene.apache.org
> Sent: Thursday, April 23, 2009 1:08:20 PM
> Subject: Control segment size
>
> Hi,
>
> Is there any configuration to control the segments' file size in Solr?
> Currently, I have an index (70G) with 80 segment files, and one of the
> files is 24G. We noticed that in some cases a commit takes over 2 hours
> to complete (committing 50K records), whereas it usually finishes in
> 20 seconds. After further investigation, it turns out the system was
> doing a lot of paging: the file system buffer was trying to write the
> big segment back to disk. I have 20G of memory on the system, with 6G
> assigned to each Solr instance (running 2 instances).
>
> It seems that if I can limit the segment size to a maximum of 4-5 GB
> I'll be OK. Is there any way to do so?
>
> I have a merge factor of 100. Does that impact the size too? Why do
> different segments have different sizes?
>
> Thanks,
> -vivek