Hi all,

Thank you all for your responses.

So when upgrading to a newer major Lucene version that changes its codecs,
there is no way to make sure everything keeps working properly short of
re-indexing, right?
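
Gus's point in the quoted reply below, that the indexing process is lossy, is the core reason. Here is a toy sketch of it. It is not Lucene's analysis API. The hypothetical `ToyAnalyzer` class lowercases the text, drops a small made-up stopword list, and strips a trailing "s" as a crude stand-in for stemming. Two different source texts end up as the same token stream, so anything that only sees the index can neither tell which text was indexed nor replay it through a newer analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical toy analyzer (not Lucene's API): lowercase, split on non-letters,
// drop stopwords, strip a trailing "s". Only its output would reach the index.
final class ToyAnalyzer {
    // Illustrative stopword list, not Lucene's default one.
    private static final Set<String> STOPWORDS = Set.of("the", "a", "an", "of", "and");

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (word.isEmpty() || STOPWORDS.contains(word)) {
                continue; // stopwords are discarded and never stored anywhere
            }
            // Crude stand-in for stemming: "documents" -> "document".
            if (word.length() > 3 && word.endsWith("s")) {
                word = word.substring(0, word.length() - 1);
            }
            tokens.add(word);
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> first = analyze("The Documents of the Customers");
        List<String> second = analyze("a document, customer");
        System.out.println(first);                // [document, customer]
        System.out.println(first.equals(second)); // true: the original text is gone
    }
}
```

A real index adds positions, norms and more, but the same principle applies: whatever the analyzer threw away at index time cannot be recovered from the index.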

Apart from no longer having some of the original sources that were indexed
(which I will try to work around with the *IndexUpgrader* tool), I have
another problem: I was using org.apache.lucene.uninverting.UninvertingReader
to run queries against the index, mainly through the grouping API, but it
was removed in Lucene 7.0. So, again, is there any alternative apart from
re-indexing to use docValues?
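
As a rough illustration of what was lost: UninvertingReader exposed indexed values as if they were doc values by "uninverting" the terms at read time, while docValues store that per-document column at index time. The sketch below is a toy model only. `ToyUninverter`, `uninvert` and `groupCounts` are hypothetical names, and it assumes a single-valued field whose terms are still present in the index.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model (hypothetical names, not Lucene's API) of uninverting one field:
// an inverted index maps term -> doc ids; grouping and sorting need doc id -> term,
// which is the column-oriented layout that docValues write at index time.
final class ToyUninverter {

    // Builds the per-document column by walking every term's postings list.
    // Assumes a single-valued field: if a doc had several terms, the last one
    // visited (in the map's iteration order) would overwrite the others.
    static String[] uninvert(Map<String, List<Integer>> postings, int maxDoc) {
        String[] valueByDoc = new String[maxDoc]; // null = doc has no value
        for (Map.Entry<String, List<Integer>> entry : postings.entrySet()) {
            for (int docId : entry.getValue()) {
                valueByDoc[docId] = entry.getKey();
            }
        }
        return valueByDoc;
    }

    // Counts documents per group value, a simplified stand-in for what
    // grouping computes from the per-document column.
    static Map<String, Integer> groupCounts(String[] valueByDoc) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String value : valueByDoc) {
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical "country" field over 4 docs; doc 3 has no value.
        Map<String, List<Integer>> postings = new TreeMap<>();
        postings.put("es", List.of(0, 2));
        postings.put("us", List.of(1));
        String[] column = uninvert(postings, 4);
        System.out.println(Arrays.toString(column)); // [es, us, es, null]
        System.out.println(groupCounts(column));     // {es=2, us=1}
    }
}
```

Uninverting has to touch every term of the field, which is why it is done once per reader and cached, and why writing the column up front as docValues is the replacement.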

To give you more context, I develop a tool that multiple customers use to
index their data (currently with Lucene 5.5.5). We are planning to upgrade
to Lucene 9 because of some vulnerabilities affecting Lucene 5.5.5, and I
don't think asking them to reindex will go down well :(
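
For future upgrades, the approach Matt describes below (keeping the original object as a stored field) lets the tool rebuild the index itself instead of asking customers for their data again. A toy sketch follows, with hypothetical names (`ToyStoredSourceIndex`, `reindex`) and plain functions standing in for Lucene analyzers: each entry keeps its stored source next to its tokens, so a new analyzer is applied to the source rather than to the old, already-lossy tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Toy model (hypothetical names, not Lucene's API) of "store the source, reindex later".
final class ToyStoredSourceIndex {

    // One indexed record: the stored original plus the tokens derived from it.
    static final class Entry {
        final String storedSource;
        final List<String> tokens;

        Entry(String storedSource, List<String> tokens) {
            this.storedSource = storedSource;
            this.tokens = tokens;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void add(String source, Function<String, List<String>> analyzer) {
        entries.add(new Entry(source, analyzer.apply(source)));
    }

    // Rebuilds every record from its stored source, never from the old tokens.
    ToyStoredSourceIndex reindex(Function<String, List<String>> newAnalyzer) {
        ToyStoredSourceIndex rebuilt = new ToyStoredSourceIndex();
        for (Entry entry : entries) {
            rebuilt.add(entry.storedSource, newAnalyzer);
        }
        return rebuilt;
    }

    List<List<String>> allTokens() {
        List<List<String>> result = new ArrayList<>();
        for (Entry entry : entries) {
            result.add(entry.tokens);
        }
        return result;
    }

    public static void main(String[] args) {
        // Old analyzer lowercases (case is lost); the new one keeps the original case.
        Function<String, List<String>> oldAnalyzer =
                text -> Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s+"));
        Function<String, List<String>> newAnalyzer = text -> Arrays.asList(text.split("\\s+"));

        ToyStoredSourceIndex index = new ToyStoredSourceIndex();
        index.add("Lucene Upgrade", oldAnalyzer);
        System.out.println(index.allTokens());                      // [[lucene, upgrade]]
        System.out.println(index.reindex(newAnalyzer).allTokens()); // [[Lucene, Upgrade]]
    }
}
```

The case-preserving tokens could not have been produced from the old lowercased ones; they exist only because the source was stored.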

Regards,

On Sat, Oct 29, 2022 at 23:31, Matt Davis (<kryptonics...@gmail.com>)
wrote:

> Inside the Zulia search engine, the object being indexed is always a
> JSON/BSON object, and we store the BSON as a stored byte field in the
> index. This allows easy internal reindexing when the searchable fields
> change, and also allows us to update to the latest Lucene version.
> Combined with lucene-backward-codecs, an index created with an older major
> version than the current one can be opened and reindexed. If you have
> stored all the fields (or a JSON/BSON document) in the index, it would be
> easy to reindex in the new format. If you have not, maybe opening the index
> with lucene-backward-codecs will be enough for your use case.
>
> Thanks,
> Matt
>
> On Sat, Oct 29, 2022 at 2:30 PM Baris Kazar <baris.ka...@oracle.com>
> wrote:
>
> > It is always a great practice to retain the non-indexed
> > source data, since whenever Lucene changes version,
> > even a minor version, I always reindex.
> >
> > Best regards
> > ________________________________
> > From: Gus Heck <gus.h...@gmail.com>
> > Sent: Saturday, October 29, 2022 2:17 PM
> > To: java-user@lucene.apache.org <java-user@lucene.apache.org>
> > Subject: Re: Best strategy migrate indexes
> >
> > Hi Pablo,
> >
> > The deafening silence is probably nobody wanting to give you the bad news.
> > You are on a mission that may not be feasible, and even if you can get it
> > to "work", the end result likely won't be equivalent to indexing the
> > original data with Lucene 9.x. The indexing process is fundamentally lossy,
> > and the information originally used to produce non-stored fields will have
> > been thrown out. A simple example is stopwords, or anything analyzed with
> > subclasses of FilteringTokenFilter. If the stopword list changed, or the
> > details of one of these filters changed (bugfix?), you will end up with a
> > different result than indexing with 9.x. This is just one example; another
> > would be stemming, where the index likely contains only the stem, not the
> > whole word. Other folks who are more interested in the details of our
> > codecs than I am can probably provide further examples on a more
> > fundamental level. Lucene is not a database, and the source documents
> > should always be retained in a form that can be reindexed. If you have
> > inherited a system where source material has not been retained, you have a
> > difficult project and may have some potentially painful expectation
> > setting to perform.
> >
> > Best,
> > Gus
> >
> >
> >
> > On Fri, Oct 28, 2022 at 8:01 AM Pablo Vázquez Blázquez <pabl...@gmail.com>
> > wrote:
> >
> > > Hi all,
> > >
> > > I have some indices created with Lucene 5.5.0. I have updated my
> > > dependencies and code to Lucene 7 (but my final goal is to use Lucene 9),
> > > and when trying to work with those indices I get this exception:
> > > org.apache.lucene.index.IndexFormatTooOldException: Format version is not
> > > supported (resource
> > > BufferedChecksumIndexInput(MMapIndexInput(path=".......\tests\segments_b"))):
> > > this index is too old (version: 5.5.0). This version of Lucene only
> > > supports indexes created with release 6.0 and later.
> > >
> > > I want to migrate from Lucene 5.x to Lucene 9.x. What is the best
> > > strategy? Is there any tool to migrate the indices? Is it mandatory to
> > > reindex? If so, how can I deal with this when I do not have the source
> > > documents that generated my current indices (I mean, I only have the
> > > indices themselves)?
> > >
> > > Thanks,
> > >
> > > --
> > > Pablo Vázquez
> > > (pabl...@gmail.com)
> > >
> >
> >
> > --
> > http://www.needhamsoftware.com (work)
> > http://www.the111shift.com (play)
> >
>


-- 
Pablo Vázquez
(pabl...@gmail.com)
