[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140703#comment-14140703
 ] 

Tim Smith commented on LUCENE-5940:
-----------------------------------

bq. Reindexing is part and parcel of search

i think the general goal should be that this is not the case, especially as 
search is adopted more and more as a replacement for systems that do not have 
these limitations/requirements (databases). obviously this is an ambitious goal 
that can likely never be fully realized. 

also, "reindexing" comes in 2 distinct flavors:
* cold reindexing - rm -rf the index dir, re-feed
** requires 2x hardware or downtime
* live reindexing - change config, restart system, re-feed all docs, change is 
"live" once all docs have been reindexed
** obviously a good idea to snapshot any previous index and config so you can 
restore later on error
** minimal downtime (just restart)
** minimal search interruption (some queries related to the change may not 
match old documents until reindex is complete)
** old content can be replaced slowly over time to receive full functionality
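
to make the live reindexing flow above concrete, here is a minimal sketch in 
python. it is a toy in-memory model, not lucene code: the LiveIndex class, the 
config_version tag on each doc, and all method names are hypothetical and exist 
only for illustration. the assumption is that each doc records which config it 
was indexed under, so the change counts as "live" once no doc carries an old 
version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedDoc:
    doc_id: str
    body: str
    config_version: int  # hypothetical tag: config the doc was indexed under


class LiveIndex:
    """Toy in-memory stand-in for an index; not a lucene API."""

    def __init__(self, config_version=1):
        self.config_version = config_version
        self.docs = {}

    def feed(self, doc_id, body):
        # add or replace a doc, stamping it with the current config version
        self.docs[doc_id] = IndexedDoc(doc_id, body, self.config_version)

    def snapshot(self):
        # copy of config + docs so a failed reindex can be rolled back
        return self.config_version, dict(self.docs)

    def restore(self, snap):
        version, docs = snap
        self.config_version = version
        self.docs = dict(docs)

    def change_config(self, new_version):
        # existing docs keep their old stamp until they are re-fed
        self.config_version = new_version

    def pending(self):
        # ids of docs still indexed under an older config
        return sorted(
            d.doc_id
            for d in self.docs.values()
            if d.config_version != self.config_version
        )

    def is_live(self):
        # the change is "live" once every doc has been reindexed
        return not self.pending()


index = LiveIndex()
for n in range(3):
    index.feed(f"doc{n}", "some text")

backup = index.snapshot()        # snapshot before touching anything
index.change_config(2)           # change config (and restart, in real life)
before = index.pending()         # every doc is still on the old config

index.feed("doc0", "some text")  # old content is replaced over time
partial = index.is_live()        # not live yet, two docs remain

for doc_id in index.pending():   # re-feed the rest
    index.feed(doc_id, "some text")
live = index.is_live()
```

the snapshot is taken before the config change so that restore(backup) brings 
back both the old config and the old docs if the reindex goes wrong. queries 
that depend on the new config would only match docs that are no longer in 
pending(), which is the "minimal search interruption" caveat above.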


live reindexing does have lots of pitfalls and may not always be viable. for 
instance, right now it is not possible to add offsets to an index using this 
approach. as soon as a new segment is merged with an old one, the offsets 
are blown away. i had filed a ticket for this. i'm not looking to reopen old 
wounds here, just pointing out an issue i had with this and had to work around.

live reindexing is the goal i strive to achieve when reindexing is required 
(always comes with a caveat to back up your index first for safety). some smart 
choices when designing the internal schema can reduce or eliminate many 
prospective issues here even without any core changes to lucene.

bq. it's strongly recommended that it be gathered into an intermediate store

these recommendations are always valid to make (and i will make them), however 
this adds an entirely new system to the mix, along with new hardware, services, 
maintenance, security, etc. also, given the scale and perhaps complexity of the 
documents, this may not even be enough: it will still require a large amount of 
processing hardware to process these documents as fast as the index can index 
them, so that a reindex finishes in a reasonable amount of time (days vs 
months). in general, this is just extra complexity that will be dropped due to 
the higher price tag and maintenance cost. then, when it finally is time to 
upgrade, the end-user expectation is "oh, we already have the data indexed, why 
can't we just use that with the new software". this expectation is set by the 
fact that many customers/users are used to working with databases. i do not 
have this expectation myself, however i have people downstream that do, and i 
need to do my best to accommodate them whether i like it or not.


note, i'm not trying to force any requirements on lucene devs or solicit 
advice on specific functionality, just pointing out some real world use cases i 
encounter related to the discussion here.


> change index backwards compatibility policy.
> --------------------------------------------
>
>                 Key: LUCENE-5940
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5940
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>
> Currently, our index backwards compatibility is unmanageable. The length of 
> time in which we must support old indexes is simply too long.
> The index back compat works like this: everyone wants it, but there are 
> frequently bugs, and when push comes to shove, its not a very sexy thing to 
> work on/fix, so its hard to get any help.
> Currently our back compat "promise" is just a broken promise, because we 
> cannot actually guarantee it for these reasons.
> I propose we scale back the length of time for which we must support old 
> indexes.


