[ https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140703#comment-14140703 ]
Tim Smith commented on LUCENE-5940:
-----------------------------------

bq. Reindexing is part and parcel of search

i think the general goal should be that this is not the case, especially as search is adopted more and more as a replacement for systems that do not have these limitations/requirements (databases). obviously this is an ambitious goal that can likely never be fully realized.

also, "reindexing" comes in 2 distinct flavors:
* cold reindexing - rm -rf the index dir, re-feed
** requires 2x hardware or downtime
* live reindexing - change config, restart system, re-feed all docs, change is "live" once all docs have been reindexed
** obviously a good idea to snapshot any previous index and config so you can restore later on error
** minimal downtime (just restart)
** minimal search interruption (some queries related to the change may not match old documents until reindex is complete)
** old content can be replaced slowly over time to receive full functionality

live reindexing does have lots of pitfalls and may not always be viable. for instance, right now it is not possible to add offsets to an index using this approach. as soon as a new segment is merged with an old one, the offsets are blown away. i had filed a ticket for this. i'm not looking to reopen old wounds here, just pointing out an issue i had with this and had to work around.

live reindexing is the goal i strive to achieve when reindexing is required (always comes with a caveat to back up your index first for safety). some smart choices when designing the internal schema can reduce or eliminate many prospective issues here even without any core changes to lucene.

bq. it's strongly recommended that it be gathered into an intermediate store

these recommendations are always valid to make (and i will make them), however this adds an entire new system to the mix, as well as new hardware, services, maintenance, security, etc.
also, given the scale and perhaps complexity of the documents, this may not even be enough and will still require a large amount of processing hardware to process these documents as fast as the index can index them in a reasonable amount of time (days vs months). in general, this is just extra complexity that will be dropped due to the higher price tag and maintenance cost.

then, when it finally is time to upgrade the end-user expectation is that "oh, we already have the data indexed, why can't we just use that with the new software". this expectation is set due to the fact that many customers/users are used to working with databases. i do not have this expectation myself, however i have people downstream that do have these expectations and i need to do my best to accommodate them whether i like it or not.

note, i'm not trying to force any requirements on lucene devs, or soliciting advice on specific functionality, just pointing out some real world use cases i encounter related to discussion here.

> change index backwards compatibility policy.
> --------------------------------------------
>
>                 Key: LUCENE-5940
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5940
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>
> Currently, our index backwards compatibility is unmanageable. The length of
> time in which we must support old indexes is simply too long.
> The index back compat works like this: everyone wants it, but there are
> frequently bugs, and when push comes to shove, its not a very sexy thing to
> work on/fix, so its hard to get any help.
> Currently our back compat "promise" is just a broken promise, because we
> cannot actually guarantee it for these reasons.
> I propose we scale back the length of time for which we must support old
> indexes.