Re: Control segment size

2009-05-11 Thread vivek sar
Shalin,

 Here is what I've read about maxMergeDocs:

 "While merging segments, Lucene will ensure that no segment with more
than maxMergeDocs is created."

 Wouldn't that mean that no index file should contain more than
maxMergeDocs documents? I guess the index files could also contain just
the index information, which is not limited by any property. Is that true?

Is there any workaround to limit the size of an index file, besides
limiting the size of the index itself?

Thanks,
-vivek




Re: Control segment size

2009-05-11 Thread Shalin Shekhar Mangar
On Tue, May 12, 2009 at 2:30 AM, vivek sar vivex...@gmail.com wrote:

 Here is what I've read about maxMergeDocs:

  "While merging segments, Lucene will ensure that no segment with more
 than maxMergeDocs is created."

  Wouldn't that mean that no index file should contain more than
 maxMergeDocs documents? I guess the index files could also contain just
 the index information, which is not limited by any property. Is that true?


Yes, an individual segment will not contain more than maxMergeDocs
documents. But the size of the segment on disk may still vary, because some
documents have more unique tokens than others.

What you saw originally must have been a segment merge, which is normal and
happens in the course of indexing. I don't think there's a way to avoid that
other than setting a ridiculously high mergeFactor (which will hurt search
performance, since more segments then have to be searched).
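
To illustrate the point above, here is a toy model of mergeFactor-style merging with a document-count cap. It is only a sketch under simplifying assumptions, not Lucene's actual merge policy: every flush is assumed to write a segment with the same number of documents, and a merge is skipped whenever the result would exceed the cap. The names simulate, flush_docs and max_merge_docs are made up for this example.

```python
def simulate(num_flushes, flush_docs, merge_factor, max_merge_docs):
    """Toy model: return the doc count of each segment after num_flushes flushes."""
    segments = []  # doc count per segment, oldest first
    for _ in range(num_flushes):
        segments.append(flush_docs)
        # Cascade: keep merging while the newest merge_factor segments are
        # the same size and the merged result stays within the cap.
        while len(segments) >= merge_factor:
            tail = segments[-merge_factor:]
            merged = sum(tail)
            if len(set(tail)) != 1 or merged > max_merge_docs:
                break
            segments[-merge_factor:] = [merged]
    return segments


if __name__ == "__main__":
    # Effectively no cap: segments of different sizes appear as merges cascade.
    print(simulate(25, 1000, merge_factor=10, max_merge_docs=10**9))
    # With a cap: no segment grows past max_merge_docs documents.
    print(simulate(100, 1000, merge_factor=10, max_merge_docs=10_000))
```

Note that the cap in this model is expressed in documents only. Nothing in it bounds the number of bytes a segment occupies, which is why the on-disk size can still vary from one segment to the next.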

-- 
Regards,
Shalin Shekhar Mangar.


Re: Control segment size

2009-05-08 Thread Shalin Shekhar Mangar
On Fri, May 8, 2009 at 1:30 AM, vivek sar vivex...@gmail.com wrote:


 I did set maxMergeDocs to 10M, but I still see a couple of index
 files over 30G, which does not match the maximum number of documents.
 Here are some numbers:

 1) My total index size = 66GB
 2) Number of total documents = 200M
 3) 1M docs = 300MB
 4) 10M docs should be roughly 3-4GB.

 As you can see, a couple of files are huge. Are those documents or index
 files? How can I control the file size so that no single file grows
 beyond 10GB?


No, there is no way to limit an individual file to a specific size.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Control segment size

2009-05-07 Thread vivek sar
Thanks Otis.

I did set maxMergeDocs to 10M, but I still see a couple of index
files over 30G, which does not match the maximum number of documents.
Here are some numbers:

1) My total index size = 66GB
2) Number of total documents = 200M
3) 1M docs = 300MB
4) 10M docs should be roughly 3-4GB.

Under the index I see,

-rw-r--r--   1 dssearch  staff  31771545312 May  6 14:15 _2tp.cfs
-rw-r--r--   1 dssearch  staff  31932190573 May  7 08:13 _5ne.cfs
-rw-r--r--   1 dssearch  staff    543118747 May  7 08:32 _5p2.cfs
-rw-r--r--   1 dssearch  staff    543124452 May  7 08:53 _5qr.cfs
-rw-r--r--   1 dssearch  staff    543100201 May  7 09:18 _5sg.cfs
..
..

As you can see, a couple of files are huge. Are those documents or index
files? How can I control the file size so that no single file grows
beyond 10GB?
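
As a rough sanity check on these numbers, the sketch below converts the listed .cfs sizes into estimated document counts. It assumes the average of about 300MB per 1M documents holds for every segment, which is only an approximation since per-document size varies. The helper name estimated_docs is made up for illustration.

```python
# Rough estimate only: assumes ~300 MB per 1M documents for every segment.
BYTES_PER_DOC = 300 * 1000 * 1000 / 1_000_000  # about 300 bytes per document

# File sizes in bytes, copied from the directory listing above.
cfs_sizes = {
    "_2tp.cfs": 31771545312,
    "_5ne.cfs": 31932190573,
    "_5p2.cfs": 543118747,
    "_5qr.cfs": 543124452,
    "_5sg.cfs": 543100201,
}

MAX_MERGE_DOCS = 10_000_000  # the configured maxMergeDocs


def estimated_docs(size_bytes, bytes_per_doc=BYTES_PER_DOC):
    """Hypothetical helper: approximate document count for a file of this size."""
    return int(size_bytes / bytes_per_doc)


for name, size in cfs_sizes.items():
    docs = estimated_docs(size)
    flag = "over maxMergeDocs" if docs > MAX_MERGE_DOCS else "within maxMergeDocs"
    print(f"{name}: {size / 1e9:6.2f} GB, ~{docs / 1e6:6.1f}M docs ({flag})")
```

On this estimate, each of the two files of about 32GB would correspond to roughly 100M documents, well above the 10M setting, so the mismatch is not just a rounding effect.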

Thanks,
-vivek







Re: Control segment size

2009-04-23 Thread Otis Gospodnetic

Hi,

You are looking for maxMergeDocs, I believe.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message -----
 From: vivek sar vivex...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Thursday, April 23, 2009 1:08:20 PM
 Subject: Control segment size
 
 Hi,
 
   Is there any configuration to control the size of segment files in
 Solr? Currently, I have an index (70G) with 80 segment files, and one of
 the files is 24G. We noticed that in some cases a commit takes over 2
 hours to complete (committing 50K records), whereas usually it
 finishes in 20 seconds. After further investigation, it turns out the
 system was doing a lot of paging: the file system buffer was trying to
 write the big segment back to disk. I have 20G of memory on the system,
 with 6G assigned to the Solr instance (running 2 instances).
 
 It seems that if I can limit the segment size to a maximum of 4-5 GB,
 I'll be ok. Is there any way to do so?
 
 I have a merge factor of 100. Does that impact the size too? Why do
 different segments have different sizes?
 
 Thanks,
 -vivek