Thanks Robert,

With indexes close to 1 TB in size, I/O is usually our big bottleneck.

Can you point me to where in the 4.x and/or 5.x codebase I should look to
get a feel for what you mean by I/O locality? Or should I be looking at a
JIRA issue? Or is there a short explanation you might be able to supply?

Tom



On Wed, Dec 10, 2014 at 3:31 PM, Robert Muir <[email protected]> wrote:

> There are two costs: cpu and i/o.
>
> The CPU cost is not much anyway, but can be made basically trivial by
> using Java 8.
> The I/O cost arises because the check is not done with any I/O locality
> to the data being merged, so it could be a perf hit for an extremely
> large merge.
>
> In 5.0 the option is removed: we reworked this computation in merging
> to always have locality, so the checking always happens.
>
> On Wed, Dec 10, 2014 at 2:51 PM, Tom Burton-West <[email protected]> wrote:
> > Hello all,
> >
> > In the example solrconfig.xml file for Solr 4.10.2 there is the comment
> > (appended below) that says that setting checkIntegrityAtMerge to true
> > reduces the risk of index corruption at the expense of slower merging.
> >
> > Can someone please point me to any benchmarks or details about the
> > trade-offs? What kind of slowdown occurs, and what are the factors
> > affecting the magnitude of the slowdown?
> >
> > I have huge indexes with huge merges, so I would really love to enable
> > integrity checking. On the other hand, we have very rarely had a problem
> > with a corrupt index, and we always run CheckIndex at the end of the
> > indexing process when we are re-indexing the entire corpus.
> >
> > I'd like to get some kind of understanding of how much this will cost us
> > in merge speed, since re-indexing our corpus takes about 10 days and much
> > of that time is spent on merging.
> >
> > We index 13 million books (nearly 4 billion pages) averaging about 100,000
> > tokens/book. We now have about 1 million books per shard. Merging 30,000
> > volumes takes about 30 minutes, with larger merges taking longer.
> >
> >
> >   <!--
> >         Use true to enable this safety check, which can help
> >         reduce the risk of propagating index corruption from older
> >         segments into new ones, at the expense of slower merging.
> >     -->
> >      <checkIntegrityAtMerge>false</checkIntegrityAtMerge>
> >
> > Tom Burton-West
> > http://www.hathitrust.org/blogs/Large-scale-Search
>
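For a rough feel for the CPU side of the cost discussed in this thread, one
option is to time a sequential checksum pass over a large file. The sketch
below is only an illustration and is not Lucene's code: the class name
ChecksumSketch and the method checksum are hypothetical, and it assumes the
per-file checksum is CRC32-based. It reads the file front to back, so it
says nothing about the I/O locality question, only about how fast the
checksum itself runs on a given JVM.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.CRC32;

// Hypothetical helper, not part of Lucene or Solr.
class ChecksumSketch {

    // Streams the input sequentially and returns its CRC32 value.
    static long checksum(InputStream in) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buf = new byte[1 << 16]; // 64 KB read buffer
        int n;
        while ((n = in.read(buf)) != -1) {
            crc.update(buf, 0, n);
        }
        return crc.getValue();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ChecksumSketch <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        long start = System.nanoTime();
        long value;
        try (InputStream in = Files.newInputStream(file)) {
            value = checksum(in);
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("crc32=%08x bytes=%d seconds=%.3f%n",
                value, Files.size(file), seconds);
    }
}
```

Running it twice on the same file gives a warm-cache number that is mostly
CPU, while a first run on a file that is not in the OS cache is dominated by
disk reads.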
