Hi,
I am debugging a bulk indexing performance issue while upgrading from
4.5.0 to 6.6. I have commits disabled while indexing a total of 85G of data
over 7 hours. At the end of it, I want 30 or so big segments, but I
am getting 3000 segments.
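To put numbers on the gap, here is a rough back-of-the-envelope calculation. It assumes the final index is about as large as the 85G of data being indexed (I have not verified this) and that the data is spread evenly across segments; the names are only for illustration.

```python
# Rough average segment sizes, using the figures from this message.
# Assumption: the on-disk index is roughly the same size as the input data.
TOTAL_INDEX_MB = 85 * 1024   # about 85G
OBSERVED_SEGMENTS = 3000     # what I am getting
TARGET_SEGMENTS = 30         # what I would like to end up with


def average_segment_mb(total_mb: float, segment_count: int) -> float:
    """Average segment size in MB if the data were spread evenly."""
    return total_mb / segment_count


observed_avg = average_segment_mb(TOTAL_INDEX_MB, OBSERVED_SEGMENTS)
target_avg = average_segment_mb(TOTAL_INDEX_MB, TARGET_SEGMENTS)

print(f"observed average: {observed_avg:.1f} MB per segment")
print(f"target average:   {target_avg:.1f} MB per segment")
```

So the segments I get are on the order of tens of MB each, while the ones I want would be a few GB each.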
I deleted the index and enabled infoStream logging; I have attached the
log from when the first segment is flushed. Here are a few questions:
1. When a segment is flushed, is it permanent, or can more documents
be written to it (besides the merge scenario)?
2. It seems that 330+ threads are writing in parallel. Will each one of
them become one segment when written to disk? If so, should I
decrease concurrency?
3. One possibility is to delay flushing. The flush is getting triggered at
10000MB, probably coming from <ramBufferSizeMB>10000</ramBufferSizeMB>;
however, the segment which is flushed is only 115MB. Is this limit for the
combined size of all in-memory segments? If so, is it OK to
increase it further to use more of my heap (48GB)?
4. How can I decrease the concurrency? Maybe the solution is to use fewer
in-memory segments?
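To illustrate the reasoning behind questions 2 and 3, here is a small sketch. It assumes (this is exactly what I am asking, so it is unconfirmed) that the 10000MB limit is shared by all in-memory segments, one per writing thread, and that the buffer is split evenly between them.

```python
# Hypothetical even split of the RAM buffer across writing threads.
# Assumption (unconfirmed): ramBufferSizeMB is a combined limit for all
# in-memory segments, and each thread holds one in-memory segment.
RAM_BUFFER_MB = 10000      # <ramBufferSizeMB> from my indexConfig
WRITER_THREADS = 330       # approximate thread count seen in the log
FLUSHED_SEGMENT_MB = 115   # size of the first flushed segment


def even_share_mb(buffer_mb: float, threads: int) -> float:
    """RAM available to each in-memory segment under an even split."""
    return buffer_mb / threads


share = even_share_mb(RAM_BUFFER_MB, WRITER_THREADS)
ratio = FLUSHED_SEGMENT_MB / share

print(f"even share per thread: {share:.1f} MB")
print(f"flushed segment is {ratio:.1f}x the even share")
```

If that assumption holds, each thread gets only a small slice of the buffer, which would explain why the flushed segment is so much smaller than 10000MB.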
In the previous run, there were 110k files in the index folder after I
stopped indexing. Before committing, I noticed that the file count continued
to decrease every few minutes, until it dropped to 27k or so. (I committed
after it stabilized.)
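For anyone who wants to watch the same numbers, here is a minimal sketch of how the file and segment counts could be tracked. It assumes per-segment files are named "_<segment>.<ext>" or "_<segment>_<suffix>.<ext>" (for example "_a.fdt"), and that files such as "segments_N" and "write.lock" do not belong to any one segment. The function name and the directory argument are made up for illustration.

```python
import os
import re
import sys
from collections import Counter

# Assumption: a per-segment file starts with "_", then the segment name,
# then either "." or "_" (e.g. "_a.fdt" or "_a_Lucene50_0.doc").
SEGMENT_FILE = re.compile(r"^_([0-9a-z]+)[._]")


def summarize_index_dir(file_names):
    """Return (total file count, distinct segment count) for a file listing."""
    files_per_segment = Counter()
    for name in file_names:
        match = SEGMENT_FILE.match(name)
        if match:
            files_per_segment[match.group(1)] += 1
    return len(file_names), len(files_per_segment)


if __name__ == "__main__":
    # Hypothetical usage: pass the index folder as the first argument.
    index_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    total, segments = summarize_index_dir(os.listdir(index_dir))
    print(f"{total} files, {segments} segments in {index_dir}")
```

Running it every few minutes against the index folder should show both counts dropping as background merges finish.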
My indexConfig is this:
<indexConfig>
  <writeLockTimeout>1000</writeLockTimeout>
  <commitLockTimeout>10000</commitLockTimeout>
  <maxIndexingThreads>10</maxIndexingThreads>
  <useCompoundFile>false</useCompoundFile>
  <ramBufferSizeMB>10000</ramBufferSizeMB>
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <int name="maxMergeAtOnce">5</int>
    <int name="segmentsPerTier">3000</int>
    <int name="maxMergeAtOnceExplicit">10</int>
    <int name="floorSegmentMB">16</int>
    <!-- 200 gb since we want few big segments during full indexing -->
    <double name="maxMergedSegmentMB">200000</double>
    <double name="forceMergeDeletesPctAllowed">1</double>
  </mergePolicyFactory>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxThreadCount">10</int>
    <int name="maxMergeCount">10</int>
  </mergeScheduler>
  <lockType>${solr.lock.type:native}</lockType>
  <reopenReaders>true</reopenReaders>
  <deletionPolicy class="solr.SolrDeletionPolicy">
    <str name="maxCommitsToKeep">1</str>
    <str name="maxOptimizedCommitsToKeep">0</str>
  </deletionPolicy>
  <infoStream>true</infoStream>
  <applyAllDeletesOnFlush>false</applyAllDeletesOnFlush>
</indexConfig>
Thanks
Nawab