Hi, there is one thing that always works to "forcefully" upgrade without 
reindexing: you merge the old index into a completely new index, not by copying 
files, but by passing its SegmentReaders to addIndexes(), stripping all 
metadata from them with a trick: 
https://lucene.apache.org/core/8_11_0/core/org/apache/lucene/index/SlowCodecReaderWrapper.html
 in combination with 
<https://lucene.apache.org/core/8_11_0/core/org/apache/lucene/index/IndexWriter.html#addIndexes-org.apache.lucene.index.CodecReader...->
 

One way to do this is the following:
- Open old index using DirectoryReader.open(): reader = 
DirectoryReader.open(...old directory...)
- Create a new index with an IndexWriter: writer = new IndexWriter(...new 
directory...)
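- Wrap each leaf reader and pass the result to writer.addIndexes() (a plain 
loop, because SlowCodecReaderWrapper.wrap() throws IOException):
    List<CodecReader> wrapped = new ArrayList<>();
    for (LeafReaderContext ctx : reader.leaves())
      wrapped.add(SlowCodecReaderWrapper.wrap(ctx.reader()));
    writer.addIndexes(wrapped.toArray(new CodecReader[0]));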
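
The steps above could be sketched as a complete class along the following 
lines. This is only an illustration written against the Lucene 8.x API linked 
above; the class name MergeIntoNewIndex, the method name rewrite and the two 
path parameters are made up for the example, and it assumes the old index can 
be opened by the Lucene version on the classpath.

```java
import java.io.IOException;
import java.nio.file.Path;

import org.apache.lucene.index.CodecReader;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.SlowCodecReaderWrapper;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Hypothetical helper: names are illustrative, not part of Lucene.
final class MergeIntoNewIndex {

  // Logically copies every segment of the index at oldPath into a new index at newPath.
  static void rewrite(Path oldPath, Path newPath) throws IOException {
    try (Directory oldDir = FSDirectory.open(oldPath);
         Directory newDir = FSDirectory.open(newPath);
         DirectoryReader reader = DirectoryReader.open(oldDir);
         IndexWriter writer = new IndexWriter(newDir, new IndexWriterConfig())) {

      // One CodecReader per leaf (segment) of the old index.
      CodecReader[] wrapped = new CodecReader[reader.leaves().size()];
      int i = 0;
      for (LeafReaderContext ctx : reader.leaves()) {
        wrapped[i++] = SlowCodecReaderWrapper.wrap(ctx.reader());
      }

      // Single call: all leaves are handed to the writer together.
      writer.addIndexes(wrapped);
      writer.commit();
    }
  }
}
```

A call would look like MergeIntoNewIndex.rewrite(Paths.get("old"), 
Paths.get("new")), where both directory names are placeholders.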

This reads all segments of the old index logically (not reading plain files 
but using the logical layers on top) and adds them to the new index as one 
large segment. If you want to keep the segment structure, iterate over the 
leaves and call addIndexes() for each one separately.
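
For the per-segment variant, a minimal sketch could look like this. Again the 
class and method names are hypothetical and the code assumes the Lucene 8.x 
API. The writer's merge policy may still combine the resulting segments later; 
controlling that is outside this sketch.

```java
import java.io.IOException;
import java.nio.file.Path;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.SlowCodecReaderWrapper;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Hypothetical helper: names are illustrative, not part of Lucene.
final class PerSegmentRewrite {

  // Calls addIndexes() once per leaf instead of once for the whole reader.
  static void rewrite(Path oldPath, Path newPath) throws IOException {
    try (Directory oldDir = FSDirectory.open(oldPath);
         Directory newDir = FSDirectory.open(newPath);
         DirectoryReader reader = DirectoryReader.open(oldDir);
         IndexWriter writer = new IndexWriter(newDir, new IndexWriterConfig())) {

      for (LeafReaderContext ctx : reader.leaves()) {
        // Each call hands exactly one old segment to the writer.
        writer.addIndexes(SlowCodecReaderWrapper.wrap(ctx.reader()));
      }
      writer.commit();
    }
  }
}
```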

This may be a bit slow, as the whole index needs to be processed, but it is 
still faster than reindexing. If you have incorrect offsets, the process will 
fail rather than write a broken index, so there's no risk.

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Rahul Goswami <rahul196...@gmail.com>
> Sent: Wednesday, January 12, 2022 6:36 AM
> To: java-user@lucene.apache.org
> Subject: Re: Moving from lucene 6.x to 8.x
> 
> Thanks Vinay for the link to Erick's talk! I hadn't seen it and I must
> admit it did help put a few things into perspective.
> 
> I was able to track down the JIRAs (thank you 'git blame')
> surrounding/leading up to this architectural decision and the linked
> patches:
> https://issues.apache.org/jira/browse/LUCENE-7703  (Record the version that
> was used at index creation time)
> https://issues.apache.org/jira/browse/LUCENE-7730  (Better encode length
> normalization in similarities)
> https://issues.apache.org/jira/browse/LUCENE-7837  (Use
> indexCreatedVersionMajor to fail opening too old indices)
> 
> From these JIRAs what I was able to piece together is that if not
> reindexed, relevance scoring might act in unpredictable ways. For my use
> case, I can live with that since we provide an explicit sort on one or more
> fields.
> 
> In LUCENE-7703, Adrien says "we will reject broken offsets in term vectors
> as of 7.0". So my questions to the community are
> i) What are these offsets, and what feature/s might break with respect to
> these offsets if not reindexed?
> ii) Do the length normalization changes in  LUCENE-7730 affect only
> relevance scores?
> 
> I understand I could be playing with fire here, but reindexing is not a
> practical solution for my situation. At least not in the near future until
> I figure out a more seamless way of reindexing with minimal downtime given
> that there are multiple 1TB+ indexes. Would appreciate inputs from the dev
> community on this.
> 
> Thanks,
> Rahul
> 
> On Sun, Jan 9, 2022 at 2:41 PM Vinay Rajput <vinayrajput4...@gmail.com>
> wrote:
> 
> > Hi Rahul,
> >
> > I am not an expert so someone else might provide a better answer. However,
> > I remember
> > @Erick briefly talked about this restriction in one of his talks here:
> > https://www.youtube.com/watch?v=eaQBH_H3d3g&t=621s (not sure if you have
> > seen it already).
> >
> > As he explains, earlier it looked like IndexUpgrader tool was doing the job
> > perfectly but it wasn't always the case. There is no guarantee that after
> > using the IndexUpgrader tool, your 8.x index will keep all of the
> > characteristics of lucene 8. There can be some situations (e.g. incorrect
> > offset) where you might get an incorrect relevance score which might be
> > difficult to trace and debug. So, Lucene developers now made it explicit
> > that what people were doing earlier was not ideal, and they should now plan
> > to reindex all the documents during the major upgrade.
> >
> > Having said that, what you have done can just work without any issue as
> > long as you don't encounter any odd sorting behavior. This may/may not be
> > super critical depending on the business use case and that is where you
> > might need to make a decision.
> >
> > Thanks,
> > Vinay
> >
> > On Sat, Jan 8, 2022 at 10:27 PM Rahul Goswami <rahul196...@gmail.com>
> > wrote:
> >
> > > Hello,
> > > Would appreciate any insights on the issue. Are there any backward
> > > incompatible changes in 8.x index because of which the lucene upgrader is
> > > unable to upgrade any index EVER touched by <= 6.x? Or is the restriction
> > > more of a safety net at this point for possible future incompatibilities?
> > >
> > > Thanks,
> > > Rahul
> > >
> > > On Thu, Jan 6, 2022 at 11:46 PM Rahul Goswami <rahul196...@gmail.com>
> > > wrote:
> > >
> > > > Hello,
> > > > I am using Apache Solr 7.7.2 with indexes which were originally created on
> > > > 4.8 and upgraded ever since. I recently tried upgrading to 8.x using the
> > > > lucene IndexUpgrader tool and the upgrade fails. I know that lucene 8.x
> > > > prevents opening any segment which was touched by <= 6.x at any point in
> > > > the past. I also know the general recommendation is to reindex upon
> > > > migration to another major release, however it is not always feasible.
> > > >
> > > > So I tried to remove the check for LATEST-1 in SegmentInfos.java (
> > > > https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.1/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java#L321
> > > > )
> > > > and also checked for other references to IndexFormatTooOldException. Turns
> > > > out that removing this check and rebuilding lucene-core lets the upgrade go
> > > > through fine. I ran a full sequence of index upgrades from 5.x -> 6.x ->
> > > > 7.x -> 8.x, which went through fine. Also search/update operations work
> > > > without any issues in 8.x.
> > > >
> > > > I could not find any JIRAs which talk about the technical reason behind
> > > > imposing this restriction, and would like to know the nitty-gritties. Also
> > > > would like to know about any potential pitfalls that I might be overlooking
> > > > with the above hack.
> > > >
> > > > Thanks,
> > > > Rahul
> > > >
> > > >
> > >
> >

