Agreed with Michael that setting expectations is going to be important.

What I would like to make sure of is that we never refrain from moving Lucene forward because of this feature. In particular, lucene-core should be free to make assumptions that are valid for N and N-1 indices without worrying about the fact that we have this super-expert feature that allows opening older indices. Here are some assumptions I have in mind which have not always been true:
 - norms might be encoded in a different way (this changed in 7)
 - all index files have a checksum (only true since Lucene 5)
 - offsets are always going forward (only enforced since Lucene 7)

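To make the last assumption concrete, here is a minimal sketch in plain Java. It is not Lucene code: the class and method names are hypothetical, and the exact checks Lucene performs may differ. It only shows the kind of invariant that newer code may take for granted and that an old index can violate.

```java
// Hypothetical illustration, not part of Lucene.
class OffsetInvariantSketch {

    /**
     * Returns true if the offsets of a field's tokens "go forward":
     * no start offset is negative or smaller than the previous start
     * offset, and every end offset is at least its start offset.
     */
    public static boolean offsetsGoForward(int[] starts, int[] ends) {
        if (starts.length != ends.length) {
            throw new IllegalArgumentException("starts and ends must have the same length");
        }
        int lastStart = 0;
        for (int i = 0; i < starts.length; i++) {
            if (starts[i] < lastStart || ends[i] < starts[i]) {
                return false; // backward or inverted offset
            }
            lastStart = starts[i];
        }
        return true;
    }

    public static void main(String[] args) {
        // Tokens "foo" [0,3) and "bar" [4,7): offsets go forward.
        System.out.println(offsetsGoForward(new int[] {0, 4}, new int[] {3, 7}));
        // Second token starts before the first one: backward offset.
        System.out.println(offsetsGoForward(new int[] {5, 2}, new int[] {8, 4}));
    }
}
```

An index written before the invariant was enforced may contain data for which such a check returns false, and code that assumes it always holds has no obligation to handle that case.
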
This means that carrying indices over by just merging them with the new version, in order to move them to a new codec, won't work all the time. For instance, if your index has backward offsets and new codecs assume that offsets are going forward, then merging might fail or corrupt offsets. I'd like to make sure that we would not consider this a bug.

Erick, I don't think this feature would be suitable for "robust index upgrades". To me it is really best effort and shouldn't be trusted too much.

I think some users will be tempted to wrap old readers to make them look good and then add them back to an index using addIndexes. Something like https://issues.apache.org/jira/browse/LUCENE-8277 would help make it harder to introduce corrupt data in an index.

On Wed, Jan 23, 2019 at 3:11 PM Simon Willnauer <[email protected]> wrote:
>
> Hey folks,
>
> tl;dr; I want to be able to open an indexreader on an old index if the
> SegmentInfo version is supported and all segment codecs are available.
> Today that's not possible even if I port old formats to current
> versions.
>
> Our BWC policy for quite a while has been N-1 major versions. That's
> good and I think we should keep it that way. Only recently, caused by
> changes in how we encode/decode norms, we also hard-enforce the
> index-version-created in several places and the version a segment was
> written with. These are great enforcements and I understand why. My
> request here is whether we can find consensus on allowing somehow (a
> special DirectoryReader for instance) to open such an index for
> reading only, without the guarantee that our high level
> APIs decode norms correctly, for instance. This would be enough to, for
> instance, consume stored fields etc. for reindexing or, if users are
> aware, do their norms decoding in the codec. I am happy to work on a
> proposal for how this would work. It would still enforce no writing or
> anything like this. I am also all for putting such a reader into misc
> and marking it experimental.
>
> simon
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>

--
Adrien
