+1, it makes sense to me; real-world problems sometimes require messy
solutions. I guess the alternative is that everybody develops their own
suite of tools, which are then hard to share.

Some caution is warranted though, I think. Even with misc/experimental
caveats, these tools will only be useful if people can understand what to
expect from them, so it should be explicit what guarantees can be offered.
I don't know what they will be exactly, but suppose stored fields and doc
values fields can be retrieved/iterated over, while search results might
differ due to ranking differences or early termination that relies on new
index structures? Maybe naming/defining these as having a limited scope,
like disaster recovery or migration, would give a hint that they should
not be used as some kind of adapter in a production system for old indexes.
I expect explaining what these tools are for to a wider audience will
deserve some care.

-Mike

On Wed, Jan 23, 2019 at 3:30 PM Erick Erickson <[email protected]>
wrote:

> +1. A lot of this was discussed on SOLR-12259; we should probably link
> any Lucene JIRAs for this back to that one to make an easy trail to
> follow.
>
> One thing I'd thought of is whether we should merge segments during
> this operation. If we're going to rewrite the entire index anyway,
> does it make sense to combine segments into max-sized segments a-la
> TieredMergePolicy?
>
> I'm not thinking of anything fancy at all here; there's no "cost" to
> calculate, for instance. Just:
> 1> go through the list of segments, adding to a OneMerge until it's as
> big as it can be.
> 2> repeat until you have a list of OneMerges that contain all the
> original segments.
>
> How big "as big as it can be" should be is TBD; TMP uses 5G. Could be a
> param, I suppose.
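
The two steps above can be sketched as a simple greedy grouping. This is
only an illustration under stated assumptions, not Lucene code: segments
are modelled as plain byte sizes, the class and method names
(GreedySegmentGrouping, group) are hypothetical, and each resulting group
stands in for what would become one OneMerge.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: segments are represented only by size in bytes.
final class GreedySegmentGrouping {

    // Walks the segments in order and keeps adding to the current group until the
    // next segment would push it past maxBytes; then starts a new group.
    // A single segment larger than maxBytes ends up alone in its own group.
    static List<List<Long>> group(List<Long> segmentSizes, long maxBytes) {
        List<List<Long>> groups = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : segmentSizes) {
            if (!current.isEmpty() && currentBytes + size > maxBytes) {
                groups.add(current);          // step 1: this group is as big as it can be
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            groups.add(current);              // step 2: every segment ends up in some group
        }
        return groups;
    }
}
```

With maxBytes set to 5L * 1024 * 1024 * 1024 this would mirror the 5G
figure mentioned for TMP; no sorting or cost calculation is attempted, in
keeping with the "nothing fancy" intent.
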
>
> Erick
>
>
> On Wed, Jan 23, 2019 at 9:24 AM Andrzej Białecki <[email protected]> wrote:
> >
> > +1. I think that even with these caveats (read-only, some data may
> > require re-interpretation) it would still be a great help for accessing
> > legacy data, for which the original source may no longer exist.
> >
> > > On 23 Jan 2019, at 15:11, Simon Willnauer <[email protected]> wrote:
> > >
> > > Hey folks,
> > >
> > > tl;dr: I want to be able to open an IndexReader on an old index if the
> > > SegmentInfo version is supported and all segment codecs are available.
> > > Today that's not possible even if I port old formats to current
> > > versions.
> > >
> > > Our BWC policy for quite a while has been N-1 major versions. That's
> > > good and I think we should keep it that way. Only recently, because of
> > > changes to how we encode/decode norms, we also hard-enforce the
> > > index-version-created in several places, as well as the version a
> > > segment was written with. These are great enforcements and I understand
> > > why. My request here is whether we can find consensus on somehow
> > > allowing (via a special DirectoryReader, for instance) such an index to
> > > be opened for reading only, without the guarantees our high-level APIs
> > > normally provide, for instance that norms are decoded correctly. This
> > > would be enough to, for instance, consume stored fields etc. for
> > > reindexing, or, if users are aware of the issue, to let them do their
> > > norms decoding in the codec. I am happy to work on a proposal for how
> > > this would work. It would still enforce no writing or anything like
> > > that. I am also all for putting such a reader into misc and marking it
> > > experimental.
> > >
> > > simon
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [email protected]
> > > For additional commands, e-mail: [email protected]
> > >
