There are two costs: CPU and I/O. The CPU cost is small to begin with and can be made basically trivial by using Java 8. The I/O cost arises because the check is not done with any I/O locality to the data being merged, so it could be a performance hit for an extremely large merge.
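To make the two costs concrete, here is a minimal sketch of a streaming checksum verification. It is not Lucene's code: it assumes the integrity check amounts to a CRC32-style checksum over a file's bytes, and the class and method names (ChecksumSketch, crc32Of, verify) are hypothetical. The CPU cost is the checksum arithmetic; the I/O cost is that every byte has to be read once just for the check.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

// Hypothetical illustration only; Lucene's real check lives in its codec code.
final class ChecksumSketch {

    // Reads the whole stream once and returns the CRC32 of its bytes.
    static long crc32Of(InputStream in) throws IOException {
        try (CheckedInputStream checked = new CheckedInputStream(in, new CRC32())) {
            byte[] buffer = new byte[8192];
            while (checked.read(buffer) != -1) {
                // Reading through the wrapper is what updates the checksum.
            }
            return checked.getChecksum().getValue();
        }
    }

    // True if the data still matches the checksum recorded when it was written.
    static boolean verify(byte[] data, long recorded) throws IOException {
        return crc32Of(new ByteArrayInputStream(data)) == recorded;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a segment file; a real check would stream from disk.
        byte[] segment = "pretend segment file contents".getBytes(StandardCharsets.UTF_8);
        long recorded = crc32Of(new ByteArrayInputStream(segment));

        System.out.println("intact:    " + verify(segment, recorded));

        segment[3] ^= 0x01; // simulate a single flipped bit on disk
        System.out.println("corrupted: " + verify(segment, recorded));
    }
}
```

If the verification pass runs separately from the merge itself, the same bytes are read twice, which is where the locality concern comes from.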
In 5.0 the option is removed: we reworked this computation in merging so that it always has locality, and the checking now always happens.

On Wed, Dec 10, 2014 at 2:51 PM, Tom Burton-West <tburt...@umich.edu> wrote:
> Hello all,
>
> In the example solrconfig.xml file for Solr 4.10.2 there is the comment
> (appended below) that says that setting checkIntegrityAtMerge to true
> reduces risk of index corruption at the expense of slower merging.
>
> Can someone please point me to any benchmarks or details about the
> trade-offs? What kind of a slowdown occurs and what are the factors
> affecting the magnitude of the slowdown?
>
> I have huge indexes with huge merges, so I would really love to enable
> integrity checking. On the other hand, we have very rarely ever had a
> problem with a corrupt index and we always do checkIndexes at the end of
> the indexing process when we are re-indexing the entire corpus.
>
> I'd like to get some kind of understanding of how much this will cost us in
> merge speeds since re-indexing our corpus takes about 10 days and much of
> that time is spent on merging.
>
> We index 13 million books (nearly 4 billion pages) averaging about 100,000
> tokens/book. We now have about 1 million books per shard. Merging 30,000
> volumes takes about 30 minutes, with larger merges taking longer.
>
>
> <!--
>   Use true to enable this safety check, which can help
>   reduce the risk of propagating index corruption from older segments
>   into new ones, at the expense of slower merging.
> -->
> <checkIntegrityAtMerge>false</checkIntegrityAtMerge>
>
> Tom Burton-West
> http://www.hathitrust.org/blogs/Large-scale-Search