Thanks for bringing this up again.

I tend to say: let's just also allow IndexUpgrader to go beyond 2 versions! If
somebody complains about incorrect offsets, oh man, it's their problem.

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: [email protected]

> -----Original Message-----
> From: Erick Erickson <[email protected]>
> Sent: Friday, November 20, 2020 4:03 PM
> To: [email protected]
> Subject: Thinking about upgrading indexes to X+2
> 
> So yet another iteration on the users list of going from X to X+2 got me
> thinking (dangerous, I know). I wanted to run this by folks to see if it’s
> worth a JIRA.
> 
> It _seems_ reasonable from a user’s perspective to create an index with, say,
> 6x, then upgrade to 7x and reindex all documents (without deleting the index
> first), then be able to upgrade to 8x and reindex all documents.
> Rinse, repeat.
> 
> The problem, of course, is that the 6x segments get merged by TMP
> (TieredMergePolicy) and the 6x stamp is preserved. (BTW, I’m going from
> hearsay here rather than code knowledge, so correct me if I’m wrong: I’ve
> assumed all along that these stamps are on each _segment_, not global to the
> entire index.)
> 
> I can think of a couple of options for, say, TMP that might support the
> above (I’m not proposing both, and these are bad names…):
> 1 - onlyMergeSegmentsCreatedWithTheSameVersion
> 2 - neverMergeSegmentsCreatedWithAPriorVersion
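To make the difference between the two options concrete, here is a toy Python model (an illustration only, not Lucene code; a real merge policy would be Java). It assumes, as above, that each segment carries the major version that created it. `Segment`, `only_merge_same_version` and `never_merge_prior_version` are hypothetical names, and everything TMP actually weighs (segment sizes, merge budgets, etc.) is ignored: each function only decides which segments may be merged together.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Segment:
    name: str
    created_major: int  # major version that created the segment (assumed per-segment stamp)
    live_docs: int
    deleted_docs: int


def only_merge_same_version(segments):
    """Option 1: merge candidates are grouped so no group mixes creation versions."""
    by_version = sorted(segments, key=lambda s: s.created_major)
    return [list(group) for _, group in groupby(by_version, key=lambda s: s.created_major)]


def never_merge_prior_version(segments, current_major):
    """Option 2: only segments created by the current major version may be merged."""
    current = [s for s in segments if s.created_major == current_major]
    return [current] if current else []


segments = [
    Segment("_0", 6, live_docs=10, deleted_docs=90),
    Segment("_1", 6, live_docs=50, deleted_docs=50),
    Segment("_2", 7, live_docs=100, deleted_docs=0),
    Segment("_3", 7, live_docs=40, deleted_docs=0),
]

# Option 1: the 6x segments may still merge with each other, never with 7x ones.
print([[s.name for s in g] for g in only_merge_same_version(segments)])
# [['_0', '_1'], ['_2', '_3']]

# Option 2: the 6x segments are left alone until all their docs are deleted.
print([[s.name for s in g] for g in never_merge_prior_version(segments, current_major=7)])
# [['_2', '_3']]
```

The model also shows where the disk-space difference comes from: under option 2 the partially deleted 6x segments are never rewritten, so their deleted docs keep occupying space until every doc in them is gone.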
> 
> Either of these would, if and only if _all_ docs were indeed indexed again,
> result in all the X-1 segments consisting entirely of deleted documents and
> being dropped. At that point no segment would carry the X-1 marker, and we
> could upgrade to X+1.
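The end condition described here can be sketched as a toy Python check (an illustration, not Lucene's actual version check). It assumes the per-segment stamp discussed above and a simplified rule that a major version can read segments created at most one major back. `Segment`, `drop_fully_deleted` and `can_upgrade_to` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    created_major: int  # assumed per-segment version stamp
    live_docs: int


def drop_fully_deleted(segments):
    """Segments whose docs are all deleted are simply dropped."""
    return [s for s in segments if s.live_docs > 0]


def can_upgrade_to(segments, target_major):
    """Simplified rule: target_major reads segments created at most one major back."""
    return all(s.created_major >= target_major - 1 for s in drop_fully_deleted(segments))


# Index created with 6x, upgraded to 7x, every doc reindexed: the 6x segments hold only deletes.
fully_reindexed = [Segment("_0", 6, 0), Segment("_1", 6, 0), Segment("_2", 7, 150)]
# Same, but a single 6x doc was never reindexed.
one_doc_missed = [Segment("_0", 6, 0), Segment("_1", 6, 1), Segment("_2", 7, 149)]

print(can_upgrade_to(fully_reindexed, 8))  # True
print(can_upgrade_to(one_doc_missed, 8))   # False
```

The second case is the first edge case below: one surviving X-1 doc keeps its whole segment, and with it the X-1 marker, alive.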
> 
> There are some edge cases of course:
> 
> - If even one X-1 doc wasn’t reindexed, it wouldn’t work. I can think of ways
> around this, e.g. a command deleteAllSegmentsCreatedWithPriorVersions, but
> since that’s indeterminate in terms of _which_ docs get deleted, I don’t like
> it at all. For people concerned about this case, the best-practice
> recommendation would be to index a field in each doc themselves (we could
> automate this) and do a delete-by-query.
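A toy Python model of that delete-by-query recommendation: every doc gets a version stamp at index time, and before the next upgrade a delete-by-query removes exactly the docs whose stamp is older than the current major version. Unlike deleting whole segments, you know in advance _which_ docs go. The field name `indexed_with_major` and the helper functions are hypothetical, and docs are plain dicts rather than Lucene documents.

```python
STAMP_FIELD = "indexed_with_major"  # hypothetical field name


def stamp(doc, current_major):
    """Add the version stamp at index time (the step that could be automated)."""
    return {**doc, STAMP_FIELD: current_major}


def is_stale(doc, current_major):
    # Docs indexed before stamping was introduced have no field and count as stale here.
    return doc.get(STAMP_FIELD, 0) < current_major


def delete_by_query(docs, current_major):
    """Return (survivors, deleted ids) for the query 'STAMP_FIELD < current_major'."""
    survivors = [d for d in docs if not is_stale(d, current_major)]
    deleted = sorted(d["id"] for d in docs if is_stale(d, current_major))
    return survivors, deleted


docs = [
    stamp({"id": "a"}, 7),  # reindexed under 7x
    stamp({"id": "b"}, 6),  # indexed under 6x, never reindexed
    {"id": "c"},            # indexed before stamping was introduced
]
survivors, deleted = delete_by_query(docs, current_major=7)
print(deleted)                       # ['b', 'c']
print([d["id"] for d in survivors])  # ['a']
```

Whether unstamped docs should count as stale is a design choice. In a real engine, a plain range query on the stamp field would not match docs that lack the field, so those docs would need a separate clause.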
> 
> - Disk space issues. If we used <1> above, this wouldn’t be much different
> from what we have now in terms of wasted space. There’d be some extra
> wasted space, but not much. <2> would waste more disk space. <2> would
> probably be easier, but I don’t think <1> is much work either.
> 
> - Is it worth the effort? People have to reindex every doc anyway.
> 
> - How to test?
> 
> - ???
> 
> I think the question of whether to pursue this or not comes down to two
> questions:
> 
> 1> Does it really help end users enough to be worth the effort? How many
> users can _guarantee_ that they reindex every document?
> 
> 2> Would something along these lines work at all? Like I said, I’m going from
> hearsay rather than deep knowledge of the X-2 mechanism.
> 
> All I’m looking for here is whether it’s interesting enough for me to create a
> JIRA and discuss details there...
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]

