There are two costs: CPU and I/O.

The CPU cost is small to begin with, and running on Java 8 makes it
basically trivial.
The I/O cost arises because the check is not done with any I/O locality
to the data being merged, so it could be a performance hit for an
extremely large merge.

In 5.0 the option is removed: we reworked this computation in merging
so that it always has locality, and the checking always happens.
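
For a rough feel of the CPU side on your own hardware, here is a minimal
sketch that times a checksum over an in-memory buffer. It assumes a
CRC32-style checksum (java.util.zip.CRC32); the class name ChecksumCost,
the 8 KB chunk size and the 64 MB buffer are made up for illustration and
are not taken from Lucene. It says nothing about the I/O cost, since the
data is already in memory.

```java
import java.util.Random;
import java.util.zip.CRC32;

// Hypothetical micro-measurement, not Lucene code.
class ChecksumCost {

    // Computes a CRC32 over the buffer in fixed-size chunks,
    // the way a streaming reader would feed it.
    static long checksumOf(byte[] data) {
        CRC32 crc = new CRC32();
        int chunk = 8192; // assumed chunk size, chosen arbitrarily
        for (int off = 0; off < data.length; off += chunk) {
            crc.update(data, off, Math.min(chunk, data.length - off));
        }
        return crc.getValue();
    }

    public static void main(String[] args) {
        // 64 MB of pseudo-random bytes standing in for segment data.
        byte[] data = new byte[64 * 1024 * 1024];
        new Random(42).nextBytes(data);

        long start = System.nanoTime();
        long value = checksumOf(data);
        long elapsedNanos = System.nanoTime() - start;

        double seconds = elapsedNanos / 1e9;
        double megabytes = data.length / (1024.0 * 1024.0);
        System.out.printf("crc=%08x, %.1f MB in %.3f s (%.0f MB/s)%n",
                value, megabytes, seconds, megabytes / seconds);
    }
}
```

A single cold run like this is only indicative; running it a few times on
different JVM versions gives a fairer comparison.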

On Wed, Dec 10, 2014 at 2:51 PM, Tom Burton-West <tburt...@umich.edu> wrote:
> Hello all,
>
> In the example solrconfig.xml file for Solr 4.10.2 there is the comment
> (appended below) that says that setting checkIntegrityAtMerge to true
> reduces risk of index corruption at the expense of slower merging.
>
> Can someone please point me to any benchmarks or details about the
> trade-offs? What kind of a slowdown occurs and what are the factors
> affecting the magnitude of the slowdown?
>
> I have huge indexes with huge merges, so I would really love to enable
> integrity checking. On the other hand, we have very rarely ever had a
> problem with a corrupt index and we always run CheckIndex at the end of
> the indexing process when we are re-indexing the entire corpus.
>
> I'd like to get some kind of understanding of how much this will cost us in
> merge speeds since re-indexing our corpus takes about 10 days and much of
> that time is spent on merging.
>
> We index 13 million books (nearly 4 billion pages) averaging about 100,000
> tokens/book. We now have about 1 million books per shard. Merging 30,000
> volumes takes about 30 minutes, with larger merges taking longer.
>
>
>   <!--
>         Use true to enable this safety check, which can help
>         reduce the risk of propagating index corruption from older segments
>         into new ones, at the expense of slower merging.
>     -->
>      <checkIntegrityAtMerge>false</checkIntegrityAtMerge>
>
> Tom Burton-West
> http://www.hathitrust.org/blogs/Large-scale-Search
