[ https://issues.apache.org/jira/browse/LUCENE-3425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767770#comment-13767770 ]
Michael McCandless commented on LUCENE-3425:
--------------------------------------------

OK, thanks for the explanation; now I understand AverageMergePolicy's
purpose, and it makes sense. It's ironic that a fully "optimized" index
is the worst thing you could do when searching segments concurrently ...

But, I still don't understand why AverageMergePolicy is not merging the
little segments from NRTCachingDir. Do you tell it to target a maximum
number of segments in the index? If so, once the index is large enough,
it seems like that'd force the small segments to be merged. Maybe, you
could also tell it a minimum size of the segments, so that it would merge
away any segments still held in NRTCachingDir?

> NRT Caching Dir to allow for exact memory usage, better buffer allocation and
> "global" cross indices control
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-3425
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3425
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>    Affects Versions: 3.4, 4.0-ALPHA
>            Reporter: Shay Banon
>             Fix For: 5.0, 4.5
>
>
> A discussion on IRC raised several improvements that can be made to NRT
> caching dir. Some of the problems it currently has are:
> 1. Not explicitly controlling the memory usage, which can result in overusing
> memory (for example, large new segments being committed because refreshing is
> too far behind).
> 2. Heap fragmentation because of constant allocation of (probably promoted to
> old gen) byte buffers.
> 3. Not being able to control the memory usage across indices for multi index
> usage within a single JVM.
> A suggested solution (which still needs to be ironed out) is to have a
> BufferAllocator that controls allocation of byte[], and allow to return
> unused byte[] to it. It will have a cap on the size of memory it allows to be
> allocated.
> The NRT caching dir will use the allocator, which can either be provided (for
> usage across several indices) or created internally. The caching dir will
> also create a wrapped IndexOutput, that will flush to the main dir if the
> allocator can no longer provide byte[] (exhausted).
> When a file is "flushed" from the cache to the main directory, it will return
> all the currently allocated byte[] to the BufferAllocator to be reused by
> other "files".
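
For illustration only, the capped allocator proposed in the description could look roughly like the sketch below. This is not Lucene code and not necessarily what the issue's author has in mind: the class name CappedBufferAllocator, its methods, the fixed buffer size, and the choice to signal exhaustion by returning null are all assumptions made for the example.

```java
import java.util.ArrayDeque;

/**
 * Hypothetical sketch of a capped byte[] allocator (not a Lucene class).
 * Hands out fixed-size buffers and refuses once the cap would be exceeded.
 */
class CappedBufferAllocator {
    private final int bufferSize;   // size of every buffer handed out
    private final long maxBytes;    // hard cap on bytes handed out at once
    private long allocatedBytes;    // bytes currently held by callers
    private final ArrayDeque<byte[]> free = new ArrayDeque<>(); // returned buffers

    CappedBufferAllocator(int bufferSize, long maxBytes) {
        if (bufferSize <= 0 || maxBytes < bufferSize) {
            throw new IllegalArgumentException("cap must fit at least one buffer");
        }
        this.bufferSize = bufferSize;
        this.maxBytes = maxBytes;
    }

    /**
     * Returns a buffer, or null if the allocator is exhausted. A caller that
     * gets null is expected to flush its file to the main directory.
     * Reused buffers are not zeroed.
     */
    synchronized byte[] allocate() {
        if (allocatedBytes + bufferSize > maxBytes) {
            return null; // exhausted
        }
        allocatedBytes += bufferSize;
        byte[] reused = free.poll();
        return reused != null ? reused : new byte[bufferSize];
    }

    /** Gives a buffer back so that other "files" can reuse it. */
    synchronized void release(byte[] buffer) {
        if (buffer.length != bufferSize) {
            throw new IllegalArgumentException("buffer did not come from this allocator");
        }
        allocatedBytes -= bufferSize;
        free.push(buffer);
    }

    /** Bytes currently handed out to callers. */
    synchronized long allocatedBytes() {
        return allocatedBytes;
    }
}
```

Under these assumptions, one instance could be shared by the caching directories of several indices in the same JVM, which is the "global" control the issue title asks for. A wrapped IndexOutput would call allocate() as it grows, switch to writing through to the main directory when it receives null, and call release() on each buffer it holds once its file has been flushed.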