Thanks for bringing this up again. I tend to say: let us just allow IndexUpgrader to go beyond 2 versions as well! If somebody complains about incorrect offsets, oh man, that's their problem.
Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: [email protected]

> -----Original Message-----
> From: Erick Erickson <[email protected]>
> Sent: Friday, November 20, 2020 4:03 PM
> To: [email protected]
> Subject: Thinking about upgrading indexes to X+2
>
> So yet another iteration on the users list of going from X to X+2 got me to
> thinking (dangerous I know). I wanted to run this by folks to see if it’s
> worth a JIRA.
>
> It _seems_ reasonable from a user’s perspective to create an index with, say,
> 6x, then upgrade to 7x and reindex all documents (without deleting the index
> first), then be able to upgrade to 8x and reindex all documents. Rinse,
> repeat.
>
> The problem of course is that the 6x segments get merged by TMP and the 6x
> stamp is preserved. (BTW, I’m going from hearsay here rather than code
> knowledge, correct me if I’m wrong, I’ve assumed all along that these are on
> each _segment_, not global to the entire index).
>
> I can think of a couple of options for, say, TMP that might work out to
> support the above (I’m not proposing both, and these are bad names…):
> 1 - onlyMergeSegmentsCreatedWithTheSameVersion
> 2 - neverMergeSegmentsCreatedWithAPriorVersion
>
> Either of these would, if and only if _all_ docs were indeed indexed again,
> result in all the X-1 segments consisting entirely of deleted documents and
> being dropped. Now no segment has the X-1 marker and we could upgrade to
> X+1.
>
> There are some edge cases of course:
>
> - if even one X-1 doc wasn't reindexed, it wouldn’t work. I can think of ways
> around this, e.g. a command deleteAllSegmentsCreatedWithPriorVersions, but
> since that’s indeterminate in terms of _which_ docs get deleted, I don’t like
> it at all. Handling this case sounds like a best practice recommendation for
> people concerned with this to index a field in each doc themselves (we could
> automate this) and do a delete-by-query.
>
> - Disk space issues. If we used <1> above, this wouldn’t be much differently
> from what we have now in terms of wasted space. There’d be some extra
> wasted space, but not much. <2> would cause greater disk space waste. <2>
> would probably be easier, but I don’t think <1> is much work either.
>
> - Is it worth the effort? People have to reindex every doc anyway.
>
> - How to test?
>
> - ???
>
> I think the question of whether to pursue this or not comes down to two
> questions:
>
> 1> Does it really help end users enough to be worth the effort? How many
> users can _guarantee_ that they reindex every document?
>
> 2> Would something along these lines work at all? Like I said, I’m going from
> hearsay rather than deep knowledge of the X-2 mechanism.
>
> All I’m looking for here is whether it’s interesting enough for me to create a
> JIRA and discuss details there...
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
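
To make option 1 from the quoted mail (onlyMergeSegmentsCreatedWithTheSameVersion) a bit more concrete, here is a minimal Java sketch of the bookkeeping only. It deliberately does not use Lucene's MergePolicy API: `Segment`, `mergeWithinVersions` and `oldestCreatedMajor` are hypothetical names made up for illustration, and the sketch assumes (as the mail does) that the created-version stamp lives on each segment and that a segment with no live documents is simply dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SameVersionMergeSketch {

    /** Hypothetical stand-in for a segment; this is not a Lucene class. */
    static final class Segment {
        final String name;
        final int createdMajor; // major version that first wrote the segment
        final int liveDocs;     // documents not yet deleted or replaced

        Segment(String name, int createdMajor, int liveDocs) {
            this.name = name;
            this.createdMajor = createdMajor;
            this.liveDocs = liveDocs;
        }
    }

    /**
     * Merges segments only with others created by the same major version.
     * A version group with no live documents left produces no segment at all.
     */
    static List<Segment> mergeWithinVersions(List<Segment> segments) {
        Map<Integer, Integer> liveByVersion = new TreeMap<>();
        for (Segment s : segments) {
            liveByVersion.merge(s.createdMajor, s.liveDocs, Integer::sum);
        }
        List<Segment> merged = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : liveByVersion.entrySet()) {
            if (e.getValue() > 0) {
                merged.add(new Segment("merged_v" + e.getKey(), e.getKey(), e.getValue()));
            }
        }
        return merged;
    }

    /** Oldest created-version stamp still present, or -1 for an empty index. */
    static int oldestCreatedMajor(List<Segment> segments) {
        int oldest = -1;
        for (Segment s : segments) {
            if (oldest == -1 || s.createdMajor < oldest) {
                oldest = s.createdMajor;
            }
        }
        return oldest;
    }

    public static void main(String[] args) {
        // Every 6x document was reindexed under 7x, so the 6x segments hold only deletes.
        List<Segment> fullyReindexed = List.of(
            new Segment("_0", 6, 0), new Segment("_1", 6, 0),
            new Segment("_2", 7, 500), new Segment("_3", 7, 500));
        System.out.println(oldestCreatedMajor(mergeWithinVersions(fullyReindexed))); // prints 7

        // Edge case from the mail: a single 6x document was missed.
        List<Segment> oneMissed = List.of(
            new Segment("_0", 6, 1), new Segment("_2", 7, 999));
        System.out.println(oldestCreatedMajor(mergeWithinVersions(oneMissed))); // prints 6
    }
}
```

In this toy model the 6x stamp disappears only when every 6x document has been replaced; one leftover document keeps a 6x segment alive, which is exactly the edge case raised in the quoted mail. Whether the real merge code could be restricted this way is the open question of the thread, not something the sketch answers.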
