Hello all,
In the example solrconfig.xml file for Solr 4.10.2 there is a comment
(appended below) saying that setting checkIntegrityAtMerge to true
reduces the risk of propagating index corruption from older segments into
new ones, at the expense of slower merging.
Can someone please point me to any benchmarks or details about the
trade-offs? How much of a slowdown occurs, and what factors determine
its magnitude?
I have huge indexes with huge merges, so I would really love to enable
integrity checking. On the other hand, we have very rarely had a
problem with a corrupt index, and we always run CheckIndex at the end of
the indexing process when we re-index the entire corpus.
I'd like to understand how much this will cost us in merge speed,
since re-indexing our corpus takes about 10 days and much of that time
is spent on merging.
We index 13 million books (nearly 4 billion pages) averaging about 100,000
tokens per book. We now have about 1 million books per shard. Merging 30,000
volumes takes about 30 minutes, and larger merges take longer.
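A rough back-of-envelope sketch of how the cost might scale, under two assumptions that are not confirmed here: that the integrity check amounts to about one extra sequential read of the source segments to verify their checksums, and that merge time scales linearly with volume count from the 30,000-volume / 30-minute data point above. The per-volume index size and read throughput are hypothetical placeholders, not measurements.

```python
# Back-of-envelope estimate of extra merge time from checkIntegrityAtMerge.
# ASSUMPTIONS (hypothetical, not measured):
#   * merge time scales linearly with the number of volumes merged;
#   * the integrity check costs roughly one extra sequential read of the
#     source segments' bytes (ignoring page-cache hits and CPU cost).

BASELINE_VOLUMES = 30_000  # observed: merging 30,000 volumes ...
BASELINE_MINUTES = 30.0    # ... takes about 30 minutes


def baseline_merge_minutes(volumes: int) -> float:
    """Linear extrapolation from the single observed data point."""
    return volumes * BASELINE_MINUTES / BASELINE_VOLUMES


def checksum_read_minutes(volumes: int, mb_per_volume: float,
                          read_mb_per_s: float) -> float:
    """Time to re-read the source segments once to verify checksums."""
    total_mb = volumes * mb_per_volume
    return total_mb / read_mb_per_s / 60.0


def slowdown_fraction(volumes: int, mb_per_volume: float,
                      read_mb_per_s: float) -> float:
    """Extra checksum time as a fraction of the baseline merge time."""
    extra = checksum_read_minutes(volumes, mb_per_volume, read_mb_per_s)
    return extra / baseline_merge_minutes(volumes)


if __name__ == "__main__":
    # Hypothetical inputs: 1 MB of index per volume, 200 MB/s sequential reads.
    frac = slowdown_fraction(30_000, mb_per_volume=1.0, read_mb_per_s=200.0)
    print(f"estimated slowdown: {frac:.1%}")  # prints "estimated slowdown: 8.3%"
```

Under this simple model the relative slowdown depends on the ratio of disk read throughput to the existing merge throughput rather than on merge size; if the check were CPU-bound or the segments were already in the page cache, the real numbers could differ substantially.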
<!--
Use true to enable this safety check, which can help
reduce the risk of propagating index corruption from older segments
into new ones, at the expense of slower merging.
-->
<checkIntegrityAtMerge>false</checkIntegrityAtMerge>
Tom Burton-West
http://www.hathitrust.org/blogs/Large-scale-Search