Hi Lance,

File handles can be a problem, but the instantaneous opening of a great many files at exactly the same time gives a big I/O hit during a query. This is compounded by the many indexes on the server that can get hit at the same time. Limiting the number of files per index directory makes a difference.
Clive

________________________________
From: Lance Norskog <goks...@gmail.com>
To: java-user@lucene.apache.org; kiwi clive <kiwi_cl...@yahoo.com>
Sent: Sunday, October 28, 2012 11:09 PM
Subject: Re: A large number of files in an index (3.6)

An option: instead of merging continuously as you run, you can optimize with 'maxSegments=10'. This means 'optimize, but only until there are 10 segments'. If there are fewer than 10 segments, nothing happens. This lets you schedule the merging I/O.

Is the number of files a problem due to file space breakage?

----- Original Message -----
| From: "kiwi clive" <kiwi_cl...@yahoo.com>
| To: java-user@lucene.apache.org
| Sent: Saturday, October 27, 2012 12:44:34 PM
| Subject: A large number of files in an index (3.6)
|
| Hi guys,
|
| I've recently moved from Lucene 2.3 to 3.6. The application uses CFS format. With Lucene 2.3, I understood the interaction of merge factor etc. with respect to how many files were created in the index directory. With a merge factor of 10, the number of files in the index directory could sometimes get up to 30, but you could see the merging happen and the number of files would roll up after a while and settle around 10-15.
|
| With Lucene 3.6, this is not the case. Firstly, even with the MergePolicy set to useCFS, the index appears to be a hybrid of CFS and raw index format. I can understand that this may have been done for performance reasons, but it does increase the file count considerably. Also, the rollup of the merged segments is not occurring as it did on the previous version. Originally I set the cfsRatio to 1.0 and found the behaviour similar to Lucene 2.3 (file-count wise), but this came at an I/O cost and the machines ran with a higher load average. The higher I/O starts to affect query performance. Reducing the cfsRatio to 0.1 (the default) helped reduce the I/O load, but I am running several thousand concurrent indexes across many disks on the servers, and the larger number of files per index means a large number of files are being opened when a query hits the index, in addition to the indexing load.
|
| I'm sure this is probably down to merge policies and schedules, but there are quite a few knobs to tweak here, so some guidance as to the most beneficial parameters to tweak would be very helpful.
|
| I'm using the LogByteSizeMergePolicy with 3 background merge threads. I'm considering using TieredMergePolicy and even reducing the number of merge threads, but there is not much point if it does not roll up the segments as expected. I can tweak the cfsRatio, but this strikes me as a large hammer and there may be more subtle ways to do this!
|
| So tell me I'm being stupid, just say 'derr - why don't you do this....' and I'll be a happy man!!
|
| Thanks,
| Clive
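For concreteness, here is a minimal sketch of the knobs discussed above, assuming the Lucene 3.6 APIs; the index path, analyzer, and the concrete values are placeholders rather than settings taken from this thread:

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.ConcurrentMergeScheduler;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LogByteSizeMergePolicy;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class MergeTuningSketch {
    public static void main(String[] args) throws Exception {
        FSDirectory dir = FSDirectory.open(new File("/path/to/index")); // hypothetical path

        IndexWriterConfig cfg = new IndexWriterConfig(
                Version.LUCENE_36, new StandardAnalyzer(Version.LUCENE_36));

        // The CFS / rollup knobs from the thread, on the policy Clive is already using.
        LogByteSizeMergePolicy mp = new LogByteSizeMergePolicy();
        mp.setMergeFactor(10);
        mp.setUseCompoundFile(true);
        mp.setNoCFSRatio(1.0);     // 1.0 = every merged segment becomes a .cfs; 0.1 is the default
        cfg.setMergePolicy(mp);

        // Fewer background merge threads means less concurrent merge I/O.
        ConcurrentMergeScheduler ms = new ConcurrentMergeScheduler();
        ms.setMaxThreadCount(1);
        cfg.setMergeScheduler(ms);

        IndexWriter writer = new IndexWriter(dir, cfg);
        try {
            // ... add or update documents here ...

            // Lance's suggestion: merge down to at most 10 segments on your own schedule.
            // forceMerge(10) (optimize(10) in older releases) is a no-op when the index
            // already has 10 or fewer segments.
            writer.forceMerge(10);
        } finally {
            writer.close();
        }
    }
}

As far as I recall, TieredMergePolicy in 3.6 exposes the same compound-file controls plus setSegmentsPerTier/setMaxMergeAtOnce, so switching policies would not lose the cfsRatio handle.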