Agreed with Michael that setting expectations is going to be
important. What I would like to make sure of is that we never
refrain from moving Lucene forward because of this feature. In
particular, lucene-core should be free to make assumptions that are
valid for N and N-1 indices without worrying about the fact that we
have this super-expert feature that allows opening older indices. Here
are some assumptions that I have in mind which have not always been
true:
 - norms are encoded in one specific way (the encoding changed in 7)
 - all index files have a checksum (only true since Lucene 5)
 - offsets are always going forward (only enforced since Lucene 7)

This means that carrying indices over by just merging them with the
new version to move them to a new codec won't work all the time. For
instance, if your index has backward offsets and new codecs assume
that offsets are going forward, then merging might fail or corrupt
offsets. I'd like to make sure that we would not consider this a bug.
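
To make the offsets case concrete, here is a minimal sketch in plain
Java (no Lucene dependency) of the kind of "offsets go forward" check
I have in mind. OffsetCheck and firstBackwardOffset are hypothetical
names made up for this illustration, not Lucene APIs, and the rule is
my reading of the assumption rather than the exact code in
lucene-core:

```java
// Hypothetical helper for illustration only; not part of Lucene.
final class OffsetCheck {
    private OffsetCheck() {}

    /**
     * Returns the index of the first token whose offsets break the
     * "offsets go forward" assumption, or -1 if every token satisfies it.
     * A token is rejected if its start offset is negative, its end offset
     * is smaller than its start offset, or its start offset is smaller
     * than the start offset of the previous token.
     */
    static int firstBackwardOffset(int[] startOffsets, int[] endOffsets) {
        if (startOffsets.length != endOffsets.length) {
            throw new IllegalArgumentException("offset arrays must have the same length");
        }
        int lastStart = 0;
        for (int i = 0; i < startOffsets.length; i++) {
            int start = startOffsets[i];
            int end = endOffsets[i];
            if (start < 0 || end < start || start < lastStart) {
                return i; // this token goes backwards or is malformed
            }
            lastStart = start;
        }
        return -1;
    }
}
```

An old segment whose tokens make this return something other than -1
is exactly the kind of input that a newer codec is allowed to mishandle
on merge.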

Erick, I don't think this feature would be suitable for "robust index
upgrades". To me it is really best-effort and shouldn't be trusted
too much.

I think some users will be tempted to wrap old readers to make them
look valid and then add them back to an index using addIndexes.
Something like https://issues.apache.org/jira/browse/LUCENE-8277 would
help make it harder to introduce corrupt data into an index.

On Wed, Jan 23, 2019 at 3:11 PM Simon Willnauer
<[email protected]> wrote:
>
> Hey folks,
>
> tl;dr; I want to be able to open an indexreader on an old index if the
> SegmentInfo version is supported and all segment codecs are available.
> Today that's not possible even if I port old formats to current
> versions.
>
> Our BWC policy for quite a while has been N-1 major versions. That's
> good and I think we should keep it that way. Only recently, because
> of changes to how we encode/decode norms, we also hard-enforce the
> index-version-created in several places, as well as the version a
> segment was written with. These are great enforcements and I
> understand why. My request here is whether we can find consensus on
> somehow allowing (via a special DirectoryReader, for instance) such
> an index to be opened for reading only, without the guarantee that
> our high-level APIs decode norms correctly, for instance. This would
> be enough to, for instance, consume stored fields etc. for
> reindexing, or, if users are aware, they can do their norms decoding
> in the codec. I am happy to work on a
> proposal how this would work. It would still enforce no writing or
> anything like this. I am also all for putting such a reader into misc
> and being experimental.
>
> simon
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>


-- 
Adrien
