Thanks, Vinay, for the link to Erick's talk! I hadn't seen it, and I must admit it did help put a few things into perspective.
I was able to track down the JIRAs (thank you, 'git blame') surrounding and leading up to this architectural decision, along with the linked patches:

https://issues.apache.org/jira/browse/LUCENE-7703 (Record the version that was used at index creation time)
https://issues.apache.org/jira/browse/LUCENE-7730 (Better encode length normalization in similarities)
https://issues.apache.org/jira/browse/LUCENE-7837 (Use indexCreatedVersionMajor to fail opening too old indices)

From these JIRAs, what I was able to piece together is that if the index is not reindexed, relevance scoring might behave unpredictably. For my use case I can live with that, since we provide an explicit sort on one or more fields.

In LUCENE-7703, Adrien says "we will reject broken offsets in term vectors as of 7.0". So my questions to the community are:
i) What are these offsets, and which features might break with respect to them if the index is not reindexed?
ii) Do the length normalization changes in LUCENE-7730 affect only relevance scores?

I understand I could be playing with fire here, but reindexing is not a practical solution for my situation, at least not in the near future, until I figure out a more seamless way of reindexing with minimal downtime, given that there are multiple 1TB+ indexes. I would appreciate input from the dev community on this.

Thanks,
Rahul

On Sun, Jan 9, 2022 at 2:41 PM Vinay Rajput <vinayrajput4...@gmail.com> wrote:

> Hi Rahul,
>
> I am not an expert, so someone else might provide a better answer. However, I remember @Erick briefly talked about this restriction in one of his talks here:
> https://www.youtube.com/watch?v=eaQBH_H3d3g&t=621s (not sure if you have seen it already).
>
> As he explains, earlier it looked like the IndexUpgrader tool was doing the job perfectly, but that wasn't always the case. There is no guarantee that after using the IndexUpgrader tool, your 8.x index will keep all of the characteristics of Lucene 8. There can be some situations (e.g. incorrect offsets) where you might get an incorrect relevance score, which might be difficult to trace and debug. So the Lucene developers have now made it explicit that what people were doing earlier was not ideal, and that they should plan to reindex all documents during a major upgrade.
>
> Having said that, what you have done can just work without any issue as long as you don't encounter any odd sorting behavior. This may or may not be critical depending on the business use case, and that is where you might need to make a decision.
>
> Thanks,
> Vinay
>
> On Sat, Jan 8, 2022 at 10:27 PM Rahul Goswami <rahul196...@gmail.com> wrote:
>
> > Hello,
> > I would appreciate any insights on this issue. Are there any backward-incompatible changes in the 8.x index because of which the Lucene IndexUpgrader is unable to upgrade any index EVER touched by <= 6.x? Or is the restriction more of a safety net at this point for possible future incompatibilities?
> >
> > Thanks,
> > Rahul
> >
> > On Thu, Jan 6, 2022 at 11:46 PM Rahul Goswami <rahul196...@gmail.com> wrote:
> >
> > > Hello,
> > > I am using Apache Solr 7.7.2 with indexes which were originally created on 4.8 and upgraded ever since. I recently tried upgrading to 8.x using the Lucene IndexUpgrader tool, and the upgrade fails. I know that Lucene 8.x prevents opening any segment which was touched by <= 6.x at any point in the past. I also know the general recommendation is to reindex upon migration to another major release; however, that is not always feasible.
> > >
> > > So I tried removing the check for LATEST-1 in SegmentInfos.java
> > > (https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.1/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java#L321)
> > > and also checked for other references to IndexFormatTooOldException. It turns out that removing this check and rebuilding lucene-core lets the upgrade go through fine. I ran a full sequence of index upgrades, 5.x -> 6.x -> 7.x -> 8.x, which went through fine. Search and update operations also work without any issues in 8.x.
> > >
> > > I could not find any JIRAs which talk about the technical reason behind imposing this restriction, and would like to know the nitty-gritty details. I would also like to know about any potential pitfalls that I might be overlooking with the above hack.
> > >
> > > Thanks,
> > > Rahul
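To make the rule discussed in this thread concrete, here is a simplified, hypothetical Java model. The class and method names (IndexVersionModel, ModelIndex, canOpen, upgrade) are invented for illustration and are not Lucene APIs. It assumes, as the titles of LUCENE-7703 and LUCENE-7837 suggest, that an index records the major version it was created with, that rewriting segments never changes that recorded value, and that a reader accepts only the current and previous major versions (the "LATEST-1" check in SegmentInfos.java mentioned in the thread). It deliberately ignores how indexes from before 7.0 were tracked and simply starts from an index whose recorded creation major is 6.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the version rule discussed in this thread.
// None of these names are Lucene APIs; they only illustrate the idea.
public class IndexVersionModel {

    static final class ModelIndex {
        // Major version recorded when the index was first created (cf. LUCENE-7703).
        // Nothing in this model ever rewrites it.
        final int createdVersionMajor;
        // Major version of the format each segment is currently written in.
        final List<Integer> segmentVersionMajors = new ArrayList<>();

        ModelIndex(int createdVersionMajor, int segmentCount) {
            this.createdVersionMajor = createdVersionMajor;
            for (int i = 0; i < segmentCount; i++) {
                segmentVersionMajors.add(createdVersionMajor);
            }
        }
    }

    // Reader-side check in the spirit of "LATEST - 1" (cf. LUCENE-7837):
    // only the current and the previous creation major versions are accepted.
    static boolean canOpen(ModelIndex index, int currentMajor) {
        return index.createdVersionMajor >= currentMajor - 1;
    }

    // Models an upgrader run: it must be able to open the index first, then
    // rewrites every segment in the current format. createdVersionMajor is untouched.
    static void upgrade(ModelIndex index, int currentMajor) {
        if (!canOpen(index, currentMajor)) {
            throw new IllegalStateException("index created with " + index.createdVersionMajor
                    + ".x cannot be opened by " + currentMajor + ".x");
        }
        index.segmentVersionMajors.replaceAll(v -> currentMajor);
    }

    public static void main(String[] args) {
        ModelIndex index = new ModelIndex(6, 3);
        upgrade(index, 7);                              // allowed: 6 >= 7 - 1
        System.out.println(index.segmentVersionMajors); // prints [7, 7, 7]
        System.out.println(canOpen(index, 8));          // prints false: 6 < 8 - 1
        try {
            upgrade(index, 8);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());         // index created with 6.x cannot be opened by 8.x
        }
    }
}
```

In this model, rewriting every segment in the 7.x format does not help, because the value the 8.x check inspects lives in the index-level metadata rather than in the segments themselves. Removing the check, as in the patched lucene-core described above, amounts to making canOpen always return true; the model says nothing about whether the data written by older versions (offsets, norms) is then interpreted correctly.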