So yet another thread on the users list about upgrading from X to X+2 got me 
thinking (dangerous, I know). I wanted to run this by folks to see if it’s worth 
a JIRA.

It _seems_ reasonable from a user’s perspective to create an index with, say, 
6x, then upgrade to 7x and reindex all documents (without deleting the index 
first), then be able to upgrade to 8x and reindex all documents. Rinse, repeat.

The problem, of course, is that the 6x segments get merged by TieredMergePolicy 
(TMP) and the 6x stamp is preserved. (BTW, I’m going from hearsay here rather 
than code knowledge, so correct me if I’m wrong. I’ve assumed all along that 
these stamps are on each _segment_, not global to the entire index.)

I can think of a couple of options for, say, TMP that might work out to support 
the above (I’m not proposing both, and these are bad names…): 
1 - onlyMergeSegmentsCreatedWithTheSameVersion
2 - neverMergeSegmentsCreatedWithAPriorVersion

Either of these would, if and only if _all_ docs were indeed indexed again, 
result in all the X-1 segments consisting entirely of deleted documents and 
being dropped. Now no segment has the X-1 marker and we could upgrade to X+1.

There are some edge cases of course:

- If even one X-1 doc wasn’t reindexed, it wouldn’t work. I can think of ways 
around this, e.g. a command deleteAllSegmentsCreatedWithPriorVersions, but 
since that’s indeterminate in terms of _which_ docs get deleted, I don’t like 
it at all. Handling this case sounds more like a best-practice recommendation: 
people concerned with this index a field in each doc themselves (we could 
automate this) and do a delete-by-query.

- Disk space issues. If we used <1> above, this wouldn’t be much different 
from what we have now in terms of wasted space. There’d be some extra wasted 
space, but not much. <2> would cause greater disk space waste. <2> would 
probably be easier, but I don’t think <1> is much work either.

- Is it worth the effort? People have to reindex every doc anyway.

- How to test?

- ???
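
For the first edge case, the "index a field yourself, then delete-by-query" 
idea could look roughly like the toy Python below. This is a sketch only: the 
field name index_version and the helpers stamp and delete_older_than are made 
up for illustration, and docs are plain dicts rather than anything from Lucene 
or Solr.

```python
def stamp(doc, version):
    """Return a copy of doc carrying the (hypothetical) index_version field."""
    return {**doc, "index_version": version}


def delete_older_than(docs, version):
    """Toy delete-by-query: keep only docs stamped with at least `version`.

    Docs with no stamp at all are treated as old and are removed too.
    """
    return [d for d in docs if d.get("index_version", 0) >= version]
```

Unlike deleting whole segments, this makes it explicit which docs go away: 
exactly the ones that were never reindexed under the new version.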

I think the question of whether to pursue this or not comes down to two 
questions:

1> Does it really help end users enough to be worth the effort? How many users 
can _guarantee_ that they reindex every document?

2> Would something along these lines work at all? Like I said, I’m going from 
hearsay rather than deep knowledge of the X-2 mechanism.

All I’m looking for here is whether it’s interesting enough for me to create a 
JIRA and discuss details there...