Thanks, Vinay, for the link to Erick's talk! I hadn't seen it, and I must
admit it did help put a few things into perspective.

I was able to track down the JIRAs (thank you, 'git blame') surrounding and
leading up to this architectural decision, along with the linked patches:
https://issues.apache.org/jira/browse/LUCENE-7703  (Record the version that
was used at index creation time)
https://issues.apache.org/jira/browse/LUCENE-7730  (Better encode length
normalization in similarities)
https://issues.apache.org/jira/browse/LUCENE-7837  (Use
indexCreatedVersionMajor to fail opening too old indices)
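
For what it's worth, here is how I read the check I removed (the LATEST-1
comparison against indexCreatedVersionMajor that LUCENE-7837 introduced). It
is only a rough model. It is NOT Lucene code: the class and method names are
made up. It also rests on my understanding of LUCENE-7703, which I have not
verified line by line: an index created before 7.0 is treated as created by
6.x, and that creation version is carried forward when the index is later
rewritten.

```java
// Hypothetical model of the version gate in Lucene 8.x SegmentInfos; not actual Lucene code.
public class IndexVersionGate {

    // Assumption: indices created before 7.0 record no creation version and are treated as 6.x.
    static final int PRE_7_CREATED_MAJOR = 6;

    /**
     * Returns true if an index whose creation major version is indexCreatedMajor
     * may be opened by a Lucene release with major version currentMajor.
     */
    static boolean canOpen(int indexCreatedMajor, int currentMajor) {
        // Mirrors the "LATEST - 1" rule: only the current and previous major versions are accepted.
        return indexCreatedMajor >= currentMajor - 1;
    }

    public static void main(String[] args) {
        // An index first created on 4.8 and upgraded through 5.x, 6.x and 7.x keeps
        // its creation major of 6 (assumed carried forward, not recomputed on upgrade).
        int createdMajor = PRE_7_CREATED_MAJOR;
        System.out.println("7.x can open: " + canOpen(createdMajor, 7));
        System.out.println("8.x can open: " + canOpen(createdMajor, 8));
    }
}
```

Running it prints `true` for 7.x and `false` for 8.x. That would explain why
rewriting every segment with 7.x still isn't enough for 8.x to accept the
index.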

From these JIRAs, what I was able to piece together is that, if the index is
not reindexed, relevance scoring might behave unpredictably. For my use
case, I can live with that, since we provide an explicit sort on one or more
fields.

In LUCENE-7703, Adrien says "we will reject broken offsets in term vectors
as of 7.0". So my questions to the community are:
i) What are these offsets, and which features might break with respect to
these offsets if the index is not reindexed?
ii) Do the length normalization changes in LUCENE-7730 affect only
relevance scores?

I understand I could be playing with fire here, but reindexing is not a
practical solution for my situation, at least not in the near future, until
I figure out a more seamless way of reindexing with minimal downtime, given
that there are multiple 1TB+ indexes. I would appreciate input from the dev
community on this.

Thanks,
Rahul

On Sun, Jan 9, 2022 at 2:41 PM Vinay Rajput <vinayrajput4...@gmail.com>
wrote:

> Hi Rahul,
>
> I am not an expert, so someone else might provide a better answer. However,
> I remember @Erick briefly talked about this restriction in one of his talks
> here: https://www.youtube.com/watch?v=eaQBH_H3d3g&t=621s (not sure if you
> have seen it already).
>
> As he explains, earlier it looked like the IndexUpgrader tool was doing the
> job perfectly, but that wasn't always the case. There is no guarantee that,
> after running the IndexUpgrader tool, your upgraded 8.x index will have all
> of the characteristics of an index created with Lucene 8. There can be some
> situations (e.g. incorrect offsets) where you might get incorrect relevance
> scores that are difficult to trace and debug. So the Lucene developers have
> now made it explicit that what people were doing earlier was not ideal, and
> that they should plan to reindex all documents during a major upgrade.
>
> Having said that, what you have done may well work without any issues as
> long as you don't encounter any odd sorting behavior. This may or may not be
> critical depending on the business use case, and that is where you might
> need to make a decision.
>
> Thanks,
> Vinay
>
> On Sat, Jan 8, 2022 at 10:27 PM Rahul Goswami <rahul196...@gmail.com>
> wrote:
>
> > Hello,
> > Would appreciate any insights on the issue. Are there any backward-
> > incompatible changes in the 8.x index because of which the Lucene
> > IndexUpgrader is unable to upgrade any index EVER touched by <= 6.x? Or is
> > the restriction more of a safety net at this point for possible future
> > incompatibilities?
> >
> > Thanks,
> > Rahul
> >
> > On Thu, Jan 6, 2022 at 11:46 PM Rahul Goswami <rahul196...@gmail.com>
> > wrote:
> >
> > > Hello,
> > > I am using Apache Solr 7.7.2 with indexes that were originally created
> > > on 4.8 and have been upgraded ever since. I recently tried upgrading to
> > > 8.x using the Lucene IndexUpgrader tool, and the upgrade fails. I know
> > > that Lucene 8.x prevents opening any segment that was touched by <= 6.x
> > > at any point in the past. I also know the general recommendation is to
> > > reindex upon migration to another major release; however, that is not
> > > always feasible.
> > >
> > > So I tried removing the check for LATEST-1 in SegmentInfos.java
> > > (https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.1/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java#L321)
> > > and also checked for other references to IndexFormatTooOldException.
> > > It turns out that removing this check and rebuilding lucene-core lets the
> > > upgrade go through fine. I ran a full sequence of index upgrades,
> > > 5.x -> 6.x -> 7.x -> 8.x, which went through fine. Search and update
> > > operations also work without any issues in 8.x.
> > >
> > > I could not find any JIRAs that explain the technical reason behind
> > > imposing this restriction, and would like to know the nitty-gritty. I
> > > would also like to know about any potential pitfalls that I might be
> > > overlooking with the above hack.
> > >
> > > Thanks,
> > > Rahul
> > >
> > >
> >
>
