This looks reasonable to me.

On Tue, Jan 29, 2019 at 4:23 PM Simon Willnauer <[email protected]> wrote:
>
> thanks folks,
>
> these are all good points. I created a first cut of what I had in mind
> [1]. It's relatively simple and from a java visibility perspective
> the only change that a user can take advantage of is this [2] and this
> [3] respectively. This would allow opening indices back to Lucene 7.0
> given that the codecs and postings formats are available. From a
> documentation perspective I added [4]. This is a pure read-only change
> and doesn't allow opening these indices for writing. You can't merge
> them, nor would you be able to open an index writer on top of them. I
> still need to add support to CheckIndex but that's what it is
> basically.
>
> lemme know what you think,
>
> simon
> [1]
> https://github.com/apache/lucene-solr/commit/0c4c885214ef30627a01e320f9c861dc2521b752
> [2]
> https://github.com/apache/lucene-solr/commit/0c4c885214ef30627a01e320f9c861dc2521b752#diff-e0352098b027d6f41a17c068ad8d7ef0R689
> [3]
> https://github.com/apache/lucene-solr/commit/0c4c885214ef30627a01e320f9c861dc2521b752#diff-e3ccf9ee90355b10f2dd22ce2da6c73cR306
> [4]
> https://github.com/apache/lucene-solr/commit/0c4c885214ef30627a01e320f9c861dc2521b752#diff-1bedf4d0d52ff88ef8a16a6788ad7684R86
>
> On Fri, Jan 25, 2019 at 3:14 PM Michael McCandless
> <[email protected]> wrote:
> >
> > Another example is long ago Lucene allowed pos=-1 to be indexed and it
> > caused all sorts of problems. We also stopped allowing positions close to
> > Integer.MAX_VALUE (https://issues.apache.org/jira/browse/LUCENE-6382). Yet
> > another is allowing negative vInts which are possible but horribly
> > inefficient (https://issues.apache.org/jira/browse/LUCENE-3738).
> >
> > We do need to be free to fix these problems and then know after N+2
> > releases that no index can have the issue.
> >
> > I like the idea of providing an "expert" / best effort / limited way of
> > carrying forward such ancient indices, but I think the huge challenge for
> > someone using that tool on an important index will be enumerating the list
> > of issues that might "matter" (the 3 Adrien listed + the 3 I listed above
> > is a start for this list) and taking appropriate steps to "correct" the
> > index if so. E.g. on a norms encoding change, somehow these expert tools
> > must decode norms the old way, encode them the new way, and then rewrite
> > the norms files. Or if the index has pos=-1, changing that to pos=0. Or
> > if it has negative vInts, ... etc.
> >
> > Or maybe the "special" DirectoryReader only reads stored fields? And so
> > you would enumerate your _source and reindex into the latest format ...
> >
> > > Something like https://issues.apache.org/jira/browse/LUCENE-8277 would
> > > help make it harder to introduce corrupt data in an index.
> >
> > +1
> >
> > Every time we catch something like "don't allow pos = -1 into the index" we
> > need to somehow remember to go and add the check also in addIndexes.
> >
> > Mike McCandless
> >
> > http://blog.mikemccandless.com
> >
> >
> > On Fri, Jan 25, 2019 at 3:52 AM Adrien Grand <[email protected]> wrote:
> >>
> >> Agreed with Michael that setting expectations is going to be
> >> important. The thing that I would like to make sure is that we would
> >> never refrain from moving Lucene forward because of this feature. In
> >> particular, lucene-core should be free to make assumptions that are
> >> valid for N and N-1 indices without worrying about the fact that we
> >> have this super-expert feature that allows opening older indices.
> >> Here are some assumptions that I have in mind which have not always been
> >> true:
> >> - norms might be encoded in a different way (this changed in 7)
> >> - all index files have a checksum (only true since Lucene 5)
> >> - offsets are always going forward (only enforced since Lucene 7)
> >>
> >> This means that carrying indices over by just merging them with the
> >> new version to move them to a new codec won't work all the time. For
> >> instance if your index has backward offsets and new codecs assume that
> >> offsets are going forward, then merging might fail or corrupt offsets
> >> - I'd like to make sure that we would not consider this a bug.
> >>
> >> Erick, I don't think this feature would be suitable for "robust index
> >> upgrades". To me it is really a best effort and shouldn't be trusted
> >> too much.
> >>
> >> I think some users will be tempted to wrap old readers to make them
> >> look good and then add them back to an index using addIndexes?
> >> Something like https://issues.apache.org/jira/browse/LUCENE-8277 would
> >> help make it harder to introduce corrupt data in an index.
> >>
> >> On Wed, Jan 23, 2019 at 3:11 PM Simon Willnauer
> >> <[email protected]> wrote:
> >> >
> >> > Hey folks,
> >> >
> >> > tl;dr; I want to be able to open an indexreader on an old index if the
> >> > SegmentInfo version is supported and all segment codecs are available.
> >> > Today that's not possible even if I port old formats to current
> >> > versions.
> >> >
> >> > Our BWC policy for quite a while has been N-1 major versions. That's
> >> > good and I think we should keep it that way. Only recently, caused by
> >> > changes in how we encode/decode norms, we also hard-enforce the
> >> > index-version-created in several places and the version a segment was
> >> > written with. These are great enforcements and I understand why.
> >> > My request here is if we can find consensus on allowing somehow (a
> >> > special DirectoryReader for instance) to open such an index for
> >> > reading only that doesn't provide the guarantees that our high level
> >> > APIs decode norms correctly for instance. This would be enough to, for
> >> > instance, consume stored fields etc. for reindexing or, if users are
> >> > aware, do the norms decoding in the codec. I am happy to work on a
> >> > proposal how this would work. It would still enforce no writing or
> >> > anything like this. I am also all for putting such a reader into misc
> >> > and being experimental.
> >> >
> >> > simon
> >> >
> >> > ---------------------------------------------------------------------
> >> > To unsubscribe, e-mail: [email protected]
> >> > For additional commands, e-mail: [email protected]
> >>
> >> --
> >> Adrien
--
Adrien
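
As a rough illustration of the kind of validation discussed in this thread (rejecting pos=-1 and backward offsets before old data is carried forward), here is a minimal Java sketch. It does not use Lucene's API: `LegacyPostingsCheck`, `Occurrence`, and `validate` are hypothetical names invented for this example, and the rules cover only the two issues named above, not the full list a real tool would need.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, NOT part of Lucene: checks the occurrences of one
// term in one document against two invariants mentioned in the thread.
final class LegacyPostingsCheck {

    /** One occurrence of a term (hypothetical type, not a Lucene class). */
    static final class Occurrence {
        final int position;
        final int startOffset;
        final int endOffset;

        Occurrence(int position, int startOffset, int endOffset) {
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Returns a human-readable description of every violated invariant. */
    static List<String> validate(List<Occurrence> occurrences) {
        List<String> problems = new ArrayList<>();
        int lastStartOffset = 0;
        for (int i = 0; i < occurrences.size(); i++) {
            Occurrence o = occurrences.get(i);
            // Invariant 1: positions must not be negative (the pos=-1 case).
            if (o.position < 0) {
                problems.add("occurrence " + i + ": negative position " + o.position);
            }
            // Invariant 2a: each offset pair must be well formed on its own.
            if (o.startOffset < 0 || o.endOffset < o.startOffset) {
                problems.add("occurrence " + i + ": invalid offsets "
                        + o.startOffset + ".." + o.endOffset);
            }
            // Invariant 2b: start offsets must never go backward.
            if (o.startOffset < lastStartOffset) {
                problems.add("occurrence " + i + ": start offset " + o.startOffset
                        + " is before previous start offset " + lastStartOffset);
            }
            lastStartOffset = Math.max(lastStartOffset, o.startOffset);
        }
        return problems;
    }

    public static void main(String[] args) {
        List<Occurrence> suspicious = Arrays.asList(
                new Occurrence(-1, 0, 3),   // negative position
                new Occurrence(0, 10, 14),
                new Occurrence(1, 5, 8));   // start offset goes backward
        for (String problem : validate(suspicious)) {
            System.out.println(problem);
        }
    }
}
```

A tool that reads an old index for reindexing could run a check of this shape per document and either skip, repair, or report the offending data, which is the "enumerate the issues that matter" step described earlier in the thread.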
