Date: 2004-07-20T14:39:16
Editor: JulienNioche <[EMAIL PROTECTED]>
Wiki: Jakarta Lucene Wiki
Page: PainlessIndexing
URL: http://wiki.apache.org/jakarta-lucene/PainlessIndexing
Hint for indexing with Lucene

New Page:

IndexWriter has a useful method called (at least temporarily) '''setMinMergeDocs''' that should be used to avoid file handle problems and reduce indexing time.

File handle problems often arise because people use large '''mergeFactor''' values to speed up indexing. The maximum number of open files while merging is around mergeFactor * (5 + number of indexed fields), which can be too many for the FSDirectory.

With a higher '''minMergeDocs''' value, more documents are indexed and merged in the RAMDirectory that the IndexWriter uses internally. When the limit set by '''minMergeDocs''' is reached (e.g. 1000), a segment is written to the file system. '''mergeFactor''' controls the number of segments to be merged, so with a '''mergeFactor''' of, say, 10, once there are 10 segments on the file system (which is already 10 x 1000 docs) the IndexWriter merges them all into a single segment. I think this is equivalent to an optimize. The process continues like that until indexing is finished.

Combining these parameters should be enough to achieve good performance. The advantage of '''minMergeDocs''' is that you make heavy use of the RAMDirectory inside your IndexWriter (== fast) without having to manage RAM usage carefully yourself, as you would if you indexed into your own RAMDirectory. At the same time, keeping '''mergeFactor''' low limits the risk of running out of file handles.

<hint given by JulienNioche>
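The arithmetic in the hint can be checked with a small sketch. It does not call Lucene at all: the class and method names below are hypothetical, and the formula is only the rough estimate quoted above, not an exact figure from the library. The value of 5 indexed fields is an arbitrary example.

```java
// Illustrative sketch only; MergeSketch and its methods are made-up names,
// and nothing here touches the Lucene API.
public class MergeSketch {

    // Rough upper bound on files open during a merge, using the estimate
    // from the hint: mergeFactor * (5 + number of indexed fields).
    public static int estimateOpenFiles(int mergeFactor, int indexedFields) {
        return mergeFactor * (5 + indexedFields);
    }

    // Each on-disk segment holds minMergeDocs documents, and a merge is
    // triggered once mergeFactor such segments exist, so the first on-disk
    // merge covers minMergeDocs * mergeFactor documents.
    public static int docsInFirstDiskMerge(int minMergeDocs, int mergeFactor) {
        return minMergeDocs * mergeFactor;
    }

    public static void main(String[] args) {
        int indexedFields = 5; // assumed example value

        // A large mergeFactor inflates the file handle estimate...
        System.out.println("mergeFactor=100: ~"
                + estimateOpenFiles(100, indexedFields) + " open files");

        // ...while a low mergeFactor keeps it modest.
        System.out.println("mergeFactor=10:  ~"
                + estimateOpenFiles(10, indexedFields) + " open files");

        // With minMergeDocs=1000, documents are still batched in RAM first.
        System.out.println("first on-disk merge covers "
                + docsInFirstDiskMerge(1000, 10) + " docs");
    }
}
```

With these example numbers, raising mergeFactor from 10 to 100 multiplies the estimated open files by ten, whereas raising minMergeDocs changes only how many documents are buffered in RAM before each segment is written.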