+1, it makes sense to me; real-world problems sometimes require messy solutions. I guess the alternative is that everybody develops their own suite of tools and it is hard to share.
Some caution is warranted though, I think; even with misc/experimental
caveats, these tools will only be useful if people can understand what to
expect from them, so it should be explicit what guarantees can be offered.
I don't know exactly what they will be, but perhaps something like: stored
fields/doc values fields can be retrieved/iterated over, but search results
might differ due to ranking differences or early termination relying on new
index structures. Maybe naming/defining these as having a limited scope
like disaster recovery or migration or similar would give a hint that they
should not be used as some kind of adapter for old indexes in a production
system. I expect explaining what these tools are for to a wider audience
will deserve some care.

-Mike

On Wed, Jan 23, 2019 at 3:30 PM Erick Erickson <[email protected]> wrote:
>
> +1. A lot of this was discussed on SOLR-12259; we should probably link
> any Lucene JIRAs for this back to that one to make an easy trail to
> follow.
>
> One thing I'd thought of is whether we should merge segments during
> this operation. If we're going to rewrite the entire index anyway,
> does it make sense to combine segments into max-sized segments a la
> TieredMergePolicy?
>
> I'm not thinking of anything fancy at all here, there's no "cost" to
> calculate for instance. Just:
> 1> go through the list of segments, adding to a OneMerge until it's as
> big as it can be.
> 2> repeat until you have a list of OneMerges that contain all the
> original segments.
>
> How big "as big as it can be" is TBD; TMP uses 5G. Could be a param I
> suppose.
>
> Erick
>
> On Wed, Jan 23, 2019 at 9:24 AM Andrzej Białecki <[email protected]> wrote:
> >
> > +1. I think that even with these caveats (read-only, some data may
> > require re-interpretation) it would still be a great help for
> > accessing legacy data, for which the original source may no longer
> > exist.
> >
> > > On 23 Jan 2019, at 15:11, Simon Willnauer <[email protected]> wrote:
> > >
> > > Hey folks,
> > >
> > > tl;dr: I want to be able to open an IndexReader on an old index if
> > > the SegmentInfo version is supported and all segment codecs are
> > > available. Today that's not possible even if I port old formats to
> > > current versions.
> > >
> > > Our BWC policy for quite a while has been N-1 major versions. That's
> > > good and I think we should keep it that way. Only recently, due to
> > > changes in how we encode/decode norms, we also hard-enforce the
> > > index-version-created in several places as well as the version a
> > > segment was written with. These are great enforcements and I
> > > understand why. My request here is whether we can find consensus on
> > > allowing somehow (a special DirectoryReader, for instance) to open
> > > such an index for reading only, without the guarantee that our
> > > high-level APIs decode norms correctly, for instance. This would be
> > > enough to, for instance, consume stored fields etc. for reindexing,
> > > or, if users are aware, to do the norms decoding in the codec
> > > themselves. I am happy to work on a proposal for how this would
> > > work. It would still enforce no writing or anything like this. I am
> > > also all for putting such a reader into misc and marking it
> > > experimental.
> > >
> > > simon
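To make Erick's grouping idea above concrete, here is a rough, untested
sketch (the class/method names and the cap value are placeholders, not from
any patch): greedily pack segments, in index order, into OneMerge groups of
at most a configurable byte size, along the lines of TMP's 5G default.

  import java.io.IOException;
  import java.util.ArrayList;
  import java.util.List;
  import org.apache.lucene.index.MergePolicy.OneMerge;
  import org.apache.lucene.index.SegmentCommitInfo;
  import org.apache.lucene.index.SegmentInfos;

  public class PackSegments {
    /** Packs segments, in index order, into OneMerge groups of at most maxMergeBytes. */
    static List<OneMerge> pack(SegmentInfos infos, long maxMergeBytes) throws IOException {
      List<OneMerge> merges = new ArrayList<>();
      List<SegmentCommitInfo> current = new ArrayList<>();
      long currentBytes = 0;
      for (SegmentCommitInfo sci : infos) {
        long size = sci.sizeInBytes();
        // start a new group once adding this segment would exceed the cap
        if (current.isEmpty() == false && currentBytes + size > maxMergeBytes) {
          merges.add(new OneMerge(current));
          current = new ArrayList<>();
          currentBytes = 0;
        }
        current.add(sci);
        currentBytes += size;
      }
      if (current.isEmpty() == false) {
        merges.add(new OneMerge(current));
      }
      return merges;
    }
  }

Every original segment ends up in exactly one OneMerge, and the cap is just
a parameter, matching the "could be a param" suggestion.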
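And a rough sketch of what consuming stored fields for reindexing through
such a read-only reader might look like. DirectoryReader.open is only a
stand-in for whatever the proposed misc/experimental reader ends up being,
and a real reindex would rebuild fields with the correct types/analyzers
rather than re-adding the stored-only documents as-is.

  import java.io.IOException;
  import java.nio.file.Paths;
  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.index.DirectoryReader;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.index.IndexWriterConfig;
  import org.apache.lucene.index.LeafReader;
  import org.apache.lucene.index.LeafReaderContext;
  import org.apache.lucene.store.Directory;
  import org.apache.lucene.store.FSDirectory;
  import org.apache.lucene.util.Bits;

  public class ReindexStoredFields {
    public static void main(String[] args) throws IOException {
      try (Directory oldDir = FSDirectory.open(Paths.get(args[0]));
           Directory newDir = FSDirectory.open(Paths.get(args[1]));
           // stand-in: the proposal would provide a special reader that can
           // open the old index read-only even when e.g. norms can't be decoded
           DirectoryReader reader = DirectoryReader.open(oldDir);
           IndexWriter writer =
               new IndexWriter(newDir, new IndexWriterConfig(new StandardAnalyzer()))) {
        for (LeafReaderContext ctx : reader.leaves()) {
          LeafReader leaf = ctx.reader();
          Bits liveDocs = leaf.getLiveDocs();
          for (int docID = 0; docID < leaf.maxDoc(); docID++) {
            if (liveDocs == null || liveDocs.get(docID)) {
              Document stored = leaf.document(docID);
              // only stored fields survive this round trip; shown here just to
              // illustrate iterating the old index for reindexing
              writer.addDocument(stored);
            }
          }
        }
        writer.commit();
      }
    }
  }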
