[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130262#comment-14130262
 ] 

Robert Muir commented on LUCENE-5940:
-------------------------------------

{quote}
are there technical issues here i'm unaware of beyond creating and maintaining 
the backwards compat tests?
something outside of the codec mechanism that causes problems?
{quote}

There are plenty. First of all, maintaining back compat codecs has a real cost 
to improving Lucene in the future, because if e.g. I want to make a change to 
the codec API, I have to deal with tons of medieval index formats. The same 
goes for structural changes like making docvalues updatable (Shai had to fight 
a lot here). Even simple code refactoring is expensive because it's 
just a ton of code.

Also, the old codecs lag behind on features. They might not support various 
features like offsets in the postings, payloads in the term vectors, missing 
bitsets for docvalues, or whole datastructure types 
(SORTED_SET/SORTED_NUMERIC), or even whole parts of the index (3.x has no 
docvalues at all). They are missing various useful statistics, etc. These are 
just the ones I've worked on myself recently. There are more, and more are 
coming (like Mike's range prefix feature). This makes things like testing 
difficult.

Backwards compat drags around a lot of stuff for a long time (see the packed 
ints API), which makes the code more complex and harder to work with and change. 
It prevents and discourages real improvements to Lucene. 

There are plenty of bugs in the back compat. The last few indexes have been 
riddled with them, some of them bad. It's undertested, overcomplex, and 
undermaintained. Again, it's not sexy stuff to work on, and nobody wants to 
improve it.

Finally, users want to have more options, but until we can minimize this 
backwards compat, I'm personally going to push back very hard on any "options", 
because we simply cannot take on more back compat. So the codec API goes mostly 
wasted. Maybe we should rename it the "backcompat" API, because that's all it's 
currently good for. Backcompat hurts users in this case. If we didn't 
have so many ancient formats, we could instead provide (and actually support) 
"breadth", such as various options for how to encode data, so users 
really can take advantage of it.


> change index backwards compatibility policy.
> --------------------------------------------
>
>                 Key: LUCENE-5940
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5940
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>
> Currently, our index backwards compatibility is unmanageable. The length of 
> time in which we must support old indexes is simply too long.
> The index back compat works like this: everyone wants it, but there are 
> frequently bugs, and when push comes to shove, it's not a very sexy thing to 
> work on/fix, so it's hard to get any help.
> Currently our back compat "promise" is just a broken promise, because we 
> cannot actually guarantee it for these reasons.
> I propose we scale back the length of time for which we must support old 
> indexes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
