[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-19 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140703#comment-14140703
 ] 

Tim Smith commented on LUCENE-5940:
---

bq. Reindexing is part and parcel of search

i think the general goal should be that this is not the case, especially as 
search is adopted more and more as a replacement for systems that do not have 
these limitations/requirements (databases). obviously this is an ambitious goal 
that can likely never be fully realized. 

also, reindexing comes in 2 distinct flavors:
* cold reindexing - rm -rf the index dir, re feed
** requires 2x hardware or downtime
* live reindexing - change config, restart system, re feed all docs, change is 
live once all docs have been reindexed
** obviously a good idea to snapshot any previous index and config so you can 
restore later on error
** minimal downtime (just restart)
** minimal search interruption (some queries related to the change may not 
match old documents until reindex is complete)
** old content can be replaced slowly over time to receive full functionality


live reindexing does have lots of pitfalls and may not always be viable. for 
instance, right now it is not possible to add offsets to an index using this 
approach. as soon as a new segment is merged with an old one, the offsets 
are blown away. i had filed a ticket for this. i'm not looking to reopen old 
wounds here, just pointing out an issue i had with this and had to work around.

live reindexing is the goal i strive to achieve when reindexing is required 
(always comes with a caveat to backup your index first for safety). some smart 
choices when designing the internal schema can reduce or eliminate many 
prospective issues here even without any core changes to lucene.

bq. it's strongly recommended that it be gathered into an intermediate store

these recommendations are always valid to make (and i will make them), however 
this adds an entire new system to the mix. as well as new hardware, services, 
maintenance, security, etc. also, given the scale and perhaps complexity of the 
documents, this may not even be enough and will still require a large amount of 
processing hardware to process these documents as fast as the index can index 
them in a reasonable amount of time (days vs months). in general, this is just 
extra complexity that will be dropped due to the higher price tag and 
maintenance cost. then, when it finally is time to upgrade, the end-user 
expectation is "oh, we already have the data indexed, why can't we just 
use that with the new software?" this expectation is set due to the fact that 
many customers/users are used to working with databases. i do not have this 
expectation myself, however i have people downstream that do have these 
expectations and i need to do my best to accommodate them whether i like it or 
not.


note, i'm not trying to force any requirements on lucene devs, or soliciting 
advice on specific functionality, just pointing out some real world use cases i 
encounter related to discussion here.


 change index backwards compatibility policy.
 

 Key: LUCENE-5940
 URL: https://issues.apache.org/jira/browse/LUCENE-5940
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir

 Currently, our index backwards compatibility is unmanageable. The length of 
 time in which we must support old indexes is simply too long.
 The index back compat works like this: everyone wants it, but there are 
 frequently bugs, and when push comes to shove, it's not a very sexy thing to 
 work on/fix, so it's hard to get any help.
 Currently our back compat promise is just a broken promise, because we 
 cannot actually guarantee it for these reasons.
 I propose we scale back the length of time for which we must support old 
 indexes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-19 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140723#comment-14140723
 ] 

Shawn Heisey commented on LUCENE-5940:
--

{quote}
also, reindexing comes in 2 distinct flavors:
* cold reindexing - rm -rf the index dir, re feed
** requires 2x hardware or downtime
* live reindexing - change config, restart system, re feed all docs, change is 
live once all docs have been reindexed
** obviously a good idea to snapshot any previous index and config so you can 
restore later on error
** minimal downtime (just restart)
** minimal search interruption (some queries related to the change may not 
match old documents until reindex is complete)
** old content can be replaced slowly over time to receive full functionality
{quote}

I use Solr.  My reindexing method is actually a combination of the two you've 
mentioned.  For every shard, I have a live core and a build core.  When a 
reindex is required, I start importing from my database into the build cores.  
In the meantime, the live cores are still being updated once a minute with new 
data and deletes.  When the full import is done, I apply all relevant changes 
to the build cores, then swap them with the live cores.  Once that copy of my 
index is rebuilt, I re-enable it so that the load balancer can use it again.
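
The swap step described above maps onto Solr's CoreAdmin SWAP action. The following is only a minimal sketch of how such a request could be assembled; the base URL and the core names are hypothetical placeholders, not details from the setup described in this comment, and the request is built but not sent.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class CoreSwap {
    // Builds a CoreAdmin SWAP request URI that exchanges the live core with the build core.
    // solrBaseUrl, liveCore and buildCore are caller-supplied; the names used in examples are hypothetical.
    static URI swapRequest(String solrBaseUrl, String liveCore, String buildCore) {
        String query = "action=SWAP"
                + "&core=" + URLEncoder.encode(liveCore, StandardCharsets.UTF_8)
                + "&other=" + URLEncoder.encode(buildCore, StandardCharsets.UTF_8);
        return URI.create(solrBaseUrl + "/admin/cores?" + query);
    }
}
```

For a shard with hypothetical cores named s0live and s0build, `CoreSwap.swapRequest("http://localhost:8983/solr", "s0live", "s0build")` yields a URI that could be issued once the full import and the catch-up updates have been applied to the build core.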





[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-18 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140022#comment-14140022
 ] 

Shawn Heisey commented on LUCENE-5940:
--

bq. i also get the feeling a lot of the lucene devs in general don't think 
full reindexing is an issue and can just be done at any point with minimal 
cost (just a vibe i've picked up).

You're definitely not wrong here.  When first getting an index into production 
(often daily or even more frequently), and later when the application needs 
change, the user must make changes to the code (Lucene) or schema (Solr, 
elasticsearch, or other product) that are incompatible with the existing index. 
 When a user obtains help from a mailing list or other support resource, such a 
change is VERY likely.

Reindexing is part and parcel of search.  Users who are unable to efficiently 
perform a reindex will usually find themselves without the search capabilities 
that they really need, because they made incorrect early assumptions that can't 
be fixed without reindexing.  This can be the case even if they go years 
without upgrading their Lucene libraries.  If the actual source data is 
difficult to obtain, it's strongly recommended that it be gathered into an 
intermediate store with excellent retrieval characteristics, such as a database.





[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-12 Thread Ryan Ernst (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131905#comment-14131905
 ] 

Ryan Ernst commented on LUCENE-5940:



bq. of course i'm not a committer, so i have no final say

Tim, please don't think that we are trying to ignore your concerns. While I 
understand your frustration (more work), I don't think the pain you could feel 
is really any different than today?  There is no specific measurement that goes 
into what constitutes enough work for a release, just community sway.  
Technically, if someone is willing to do the work (LUCENE-5944), and there are 
3 +1's, and more +1's than -1's, a release can happen.  I don't mean this as a 
threat, I only mean it to demonstrate how arbitrary the process can be, not 
guaranteeing you any kind of time between major releases.  Because of this, you 
could be in the same situation you described with the shorter BWC policy.

The suggested policy would greatly simplify the work needed on the development 
side, and give us a clean slate for each major release.  And at the same time, 
I think this could theoretically extend the ability to upgrade old indexes over 
a longer span.  The meta tool I have proposed could be the link between all 
major versions.  All it needs to do is be able to read what version an index 
was written with, so it knows the major version (and this ability can be 
segregated to that tool, as this should be relatively simple to copy if how to 
do that changes).  I think this is much more powerful than today's policy, 
while at the same time allowing the API to be improved in significant ways 
across major releases, compared to now, where it cannot really change without 
enormous effort because of the need to continue reading the entire previous 
major version.

So from a user perspective, we want to make this work; it is not just for 
developers.  Your main concerns seem to be about the tool being offline, 
writing special segment metadata, and the network connectivity to grab the old 
upgraders.

First, I don't see a way around it being offline; the apis between major 
versions could differ in significant ways. But it is no different than if you 
had a 3x index today, and we released 5.0 tomorrow: you would first have to 
upgrade to a 4x index, why wouldn't you upgrade to 4.99? And that process would 
have to be offline, so adding an additional step of first going to 3.99 doesn't 
seem unreasonable.
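
The chain of hops described above (3x to 3.99, then 4.99, then 5.0) can be sketched as a small planner. This is only an illustration under the proposed policy, assuming each major release reads the current major plus the last minor of the previous major; the class, the method, and the "99" minor numbers are hypothetical placeholders taken from this discussion, not an existing Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class UpgradePath {
    // Lists the releases an index must be upgraded through, in order, to reach targetMajor.0.
    // lastMinorByMajor must contain an entry for every major from indexMajor up to targetMajor - 1.
    static List<String> plan(int indexMajor, int indexMinor, int targetMajor,
                             Map<Integer, Integer> lastMinorByMajor) {
        List<String> hops = new ArrayList<>();
        if (indexMajor >= targetMajor) {
            return hops; // already on the target major, nothing to do
        }
        for (int major = indexMajor; major < targetMajor; major++) {
            int lastMinor = lastMinorByMajor.get(major);
            // Skip the hop if the index already sits at the last minor of its own major.
            if (major != indexMajor || indexMinor != lastMinor) {
                hops.add(major + "." + lastMinor);
            }
        }
        hops.add(targetMajor + ".0");
        return hops;
    }
}
```

With `Map.of(3, 99, 4, 99)`, a 3.5 index headed for 5.0 plans as 3.99, 4.99, 5.0, while an index already at 4.99 needs only the single hop to 5.0.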

Regarding special metadata, I think most users are just using the default codec 
as written. When you use a non-default setup, it will (most likely always) 
require additional work.  I understand this pain, but it is pain you have put 
upon yourself.  But if you already have code for 4x, then upgrading to 4.99 
before changing your code to work with 5.0 should not be difficult, since 
within a major release the APIs should be stable.

As for network connectivity, it seems like this could just be a packaging 
issue?  Would it help if each release had the metatool containing the necessary 
subjars for each previous release, so that it would not have to download (it 
would just make it a bit bigger)?

As developers we need this to happen, to maintain any kind of sanity in our 
ability to guarantee compatibility. As users you want backward compatibility to 
work as long as possible.  I think this would actually serve both purposes, in 
a way that is advantageous for both sides.




[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-12 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131951#comment-14131951
 ] 

Tim Smith commented on LUCENE-5940:
---

i fully understand the reasons for wanting to change the policy here. i 
absolutely hate maintaining backwards compat myself. it's just a nightmare and 
leaves lots of rotting code lying around waiting to wreak havoc and makes it 
dicey to add new functionality. i'm fully on board with that sentiment

but, i have to support it, and do so in a seamless online manner that is not 
prone to user error.

i also get the feeling a lot of the lucene devs in general don't think full 
reindexing is an issue and can just be done at any point with minimal cost 
(just a vibe i've picked up). my experience is that this can be a many months 
long process (slow sources). this seems to influence support for backwards 
compatibility, as well as support for changing configuration/schema options, 
for existing fields, etc

by all means, create a good upgrade tool people can use. however, it won't be 
useful for me and i will need to find a different solution (which will likely 
result in slowing my adoption of 5.0 when it is released)

i am in no way advocating that 5.0 should support reading 3.x indexes.

again, i'm just adding my perspective here so informed people can make a 
decision based on all points of view

if the policy changes, i will just have to adapt as necessary





[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-12 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14132382#comment-14132382
 ] 

Robert Muir commented on LUCENE-5940:
-

Actually 5.0 doesn't even need to read 4.x indexes. I had forgotten when I 
opened this JIRA issue that 
we already voted on this in 2010. (this vote passed).

{noformat}
[VOTE] Take 2: Open up a separate line for unstable Solr/Lucene development

This is a vote for the proposal discussed on the 'Proposal about
Version API relaxation' thread.  This thread replaces the first
VOTE thread!

The vote is to open up a separate parallel line of development, called
unstable (on trunk), where non-back-compatible changes, slated for the
next major release, may be safely developed.

But it's not a free for all: the back compat break must still be
carefully tracked in detail (maybe in CHANGES, maybe in a separate
more detailed guide -- tbd), including migration instructions, so
that this becomes the migration guide on how users can move to the
new major release.  If there are changes that break the index, we will
try very hard to create an index migration tool.
{noformat}





[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-12 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14132444#comment-14132444
 ] 

Robert Muir commented on LUCENE-5940:
-

{quote}
 The meta tool I have proposed could be the link between all major versions.
{quote}

I agree. In fact it's the current policy: we voted it in 4 years ago, Uwe even 
wrote the tool, but everyone forgot :)

A few notes:
* First of all, people don't realize that you can't take your 3.x index to 4.0, 
issue some commits, and bring it to 5.0 and use it. It's always been this way, 
you have to actually ensure every single segment is in the supported format. So 
some upgrade process is always necessary (a forceMerge(), or use of 
IndexUpgrader).
* There are bugs in this today, because we don't test the partially supported 
situation. For example if someone takes their 3.x index today to 4.0, kisses it 
with some commits, but it still has some 3x segments, then tries to read it 
with trunk, AFAIK they won't get IndexFormatTooOldException. Instead they will 
get a confusing SPI failure for Lucene3x. So really, we should make a 
TooOldCodec that throws IndexFormatTooOldException and register it in SPI with 
every single codec that is unsupported.
* Finally, IMO we have an upgrade tool. Pretend we cut 5.0 today. The 
instructions are simple, just run java -cp lucene-4.10.0-core.jar 
org.apache.lucene.index.IndexUpgrader index and you are ready.
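
The TooOldCodec idea from the second note can be illustrated without Lucene itself. The sketch below is a simplified stand-in, not the real SPI mechanism: the CodecLookup class and its FormatTooOldException are hypothetical, and the codec names are used only as examples. It shows the intended behavior, where a retired codec name fails with a clear "too old" error rather than a generic lookup failure.

```java
import java.util.Set;

final class CodecLookup {
    // Hypothetical stand-in for Lucene's IndexFormatTooOldException.
    static final class FormatTooOldException extends RuntimeException {
        FormatTooOldException(String message) {
            super(message);
        }
    }

    private final Set<String> supported; // codec names this release can read
    private final Set<String> retired;   // codec names registered only to report "too old"

    CodecLookup(Set<String> supported, Set<String> retired) {
        this.supported = supported;
        this.retired = retired;
    }

    // Returns the codec name if readable; retired names fail with a clear upgrade hint.
    String resolve(String codecName) {
        if (supported.contains(codecName)) {
            return codecName;
        }
        if (retired.contains(codecName)) {
            throw new FormatTooOldException("segment uses retired codec " + codecName
                    + "; upgrade the index with the previous major release first");
        }
        throw new IllegalArgumentException("unknown codec: " + codecName);
    }
}
```

Registering every dropped codec name in the retired set is what keeps a leftover 3x segment from surfacing as an unrelated lookup error.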




[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-11 Thread Ryan Ernst (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130103#comment-14130103
 ] 

Ryan Ernst commented on LUCENE-5940:


Big +1.  Our current policy has us supporting indexes 4+ years old, and given 
how long 4x is lasting, that will just keep stretching. Obviously there needs 
to be an upgrade path, but I don't think it needs to be so easy for someone 
that hasn't upgraded in 4 years.

My concrete proposal is supporting the current major release, plus the last 
minor release of the previous major release.  That should provide an upgrade 
path by first updating to the last minor release of the major release you are 
using, followed by the latest of the next major release.  Given the 4.x 
architecture with codecs, this should be much easier than it has been to 
maintain 3x index formats. 
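
As a rough illustration of the proposal above, the rule "current major, plus the last minor of the previous major" can be written as a predicate. The class and parameter names here are hypothetical, and the sketch ignores the case of an index written by a newer minor than the reader.

```java
final class BackCompatPolicy {
    // True if an index written by indexMajor.indexMinor may be opened by a release of currentMajor,
    // given the final minor number of the previous major line.
    static boolean isReadable(int indexMajor, int indexMinor,
                              int currentMajor, int lastMinorOfPreviousMajor) {
        if (indexMajor == currentMajor) {
            return true; // any index from the current major line
        }
        // Otherwise only the last minor release of the immediately preceding major qualifies.
        return indexMajor == currentMajor - 1 && indexMinor == lastMinorOfPreviousMajor;
    }
}
```

Taking 4.99 as a stand-in for the final 4.x release, a 5.x reader would accept a 4.99 index but reject a 4.88 or 3.99 one.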




[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-11 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130104#comment-14130104
 ] 

Adrien Grand commented on LUCENE-5940:
--

bq. My concrete proposal is supporting the current major release, plus the last 
minor release of the previous major release.

That is what I was thinking about as well when reading the issue description.




[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-11 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130121#comment-14130121
 ] 

Tim Smith commented on LUCENE-5940:
---

i understand the desire for changing the policy here. i wish i didn't have to 
care about backwards compat support, but it's just the nature of things. people 
have large indexes that can take a significant amount of time to reindex (due 
to a slow source, or complex processing)

the current proposal here would be problematic for any lucene users who do not 
release versions in lock step with lucene versions. Solr obviously would have 
limited issues here since a user could just upgrade to solr 4.99 (assuming 4.99 
is the final 4.x version) and then solr 5.0 and no problems.

however, if product X released with lucene 4.88 and the last minor version in 
4.x line was 4.99, then the upgrade process to get to a lucene 5.0 index is now 
convoluted and will require creation of custom offline tools to provide an 
upgrade path.  This backwards compatibility requirement is now just shifted 
from the lucene devs to the lucene users and can no longer be a seamless 
transition.

the current policy does not have these issues since all that i would need to do 
is fire up the next version, do a forceMerge, and everything is up to date on 
latest codecs. (no offline processes required, search can continue to work 
during upgrade)







[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130127#comment-14130127
 ] 

Robert Muir commented on LUCENE-5940:
-

Tim, but the policy is really a joke. It just locks things in with releases.

Currently if you are on lucene 2, then in order to get to lucene 4, you have to 
move to 3.x first.

If we released 5.0 right now, we would not have to deal with 3.x indexes 
anymore. We could release 6.0 e.g. within a year of that, and we'd contain the 
problem.

I think if i actually proposed 5.0 and took it seriously, no one would really 
complain. But it's bogus to do this and issue releases with not so many features 
just because it makes everyone feel better, when its really the policy that is 
broken. That is what we should fix.

This is from someone who has spent the last 2 days doing nothing but fight 
back compat in lucene.




[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130128#comment-14130128
 ] 

Robert Muir commented on LUCENE-5940:
-

{quote}
Having to keep bw compat for all 4.x codecs once 5.0 is released would be a 
nightmare.
{quote}

Right, the fallback plan is to release 5.0, then rapidly release 6.0 (maybe 
just a few days after) so we can drop all the shit. That doesn't require a 
change to the backwards compatibility policy. But i hope everyone understands 
how ridiculous that is when we can just be reasonable instead.




[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130131#comment-14130131
 ] 

Uwe Schindler commented on LUCENE-5940:
---

bq. however, if product X released with lucene 4.88 and the last minor version 
in 4.x line was 4.99, then the upgrade process to get to a lucene 5.0 index is 
now convoluted and will require creation of custom offline tools to provide an 
upgrade path. the current policy does not have these issues since all that i 
would need to do is fire up the next version, do a forceMerge, and everything 
is up to date on latest codecs. (no offline processes required, search can 
continue to work during upgrade)

We have a tool that does this without forceMerge. It just upgrades those 
segments that need upgrade and writes a new commit point. It is called 
IndexUpgrader and has a main method.

My idea would be to provide that tool, including all stuff as a self-executing 
JAR file, so you just need: {{java -jar lucene-indexupgrader-4.10.0.jar 
indexdir}} (basically it is already like that, but you need to build classpath 
and command line manually).
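
Until such a self-executing JAR exists, a wrapper has to assemble the classpath form of the command itself. The helper below is a minimal sketch of that step; the UpgraderCommand class is hypothetical, and only the IndexUpgrader main class and the classpath invocation come from this thread.

```java
import java.util.List;

final class UpgraderCommand {
    // Assembles the command line that runs IndexUpgrader from a given lucene-core jar
    // against a given index directory. Nothing is executed here.
    static List<String> build(String coreJar, String indexDir) {
        return List.of("java", "-cp", coreJar,
                "org.apache.lucene.index.IndexUpgrader", indexDir);
    }
}
```

The resulting list could be handed to `new ProcessBuilder(...)` by a packaging script; the jar path and the index directory are supplied by the caller.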




[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-11 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130138#comment-14130138
 ] 

Tim Smith commented on LUCENE-5940:
---

5.0 should not be saddled with supporting 3.x index. 100% agree there

however, 5.0 should ideally continue to support 4.0-4.99 indexes (at least from 
the codec/index reading perspective)

the best place to handle backwards compat is in the core of lucene.
otherwise, you are just going to have users all over the place doing their own 
interpretation of backwards compat, getting it wrong, broken, etc., which will 
subsequently result in lots of irate users filing tickets.

if you only support the last minor version from the previous release, it makes 
it difficult for everyone who was not at that exact minor release. 


also, to uwe's point, the indexupgrade tool is an offline process. in my 
situation, i would need custom packaging of that tool in order to provide ease 
of use/proper codec usage, etc., vs. just firing up the index on 5.0 and doing 
a forceMerge. the custom packaging would also require including an old version 
of lucene in my project, packaged separately, which would just be a nightmare 
to maintain.

alternatively, i would just grab the source for all removed 4.x codecs i need 
and pull them into my project (this is not ideal since they are no longer 
maintained by lucene devs and may have dependency issues that would require 
porting)




[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130141#comment-14130141
 ] 

Robert Muir commented on LUCENE-5940:
-

{quote}
however, 5.0 should ideally continue to support 4.0-4.99 indexes (at least from 
the codec/index reading perspective)
{quote}

Who will do the work? Who will maintain this? 

Won't be me.




[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-11 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130144#comment-14130144
 ] 

Adrien Grand commented on LUCENE-5940:
--

Maybe another option would be to have a policy that is purely time-based? Eg. 
codecs would be removed, even in minor releases, when they have not been the 
default codec for more than one year?
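
As a rough illustration of the time-based rule suggested above, here is a minimal Java sketch. The class name, method name, and the dates in {{main}} are all hypothetical; the only thing taken from the comment is the rule itself (a codec becomes removable once it has not been the default for more than one year).

```java
import java.time.LocalDate;

public class CodecRetention {
    /**
     * Time-based rule from the comment above: a codec may be removed once it
     * has not been the default codec for more than one year.
     *
     * supersededOn is the (hypothetical) date on which a newer default codec
     * was released; today is the date the check is made.
     */
    public static boolean removable(LocalDate supersededOn, LocalDate today) {
        // "More than one year" means strictly after the one-year mark.
        return supersededOn.plusYears(1).isBefore(today);
    }

    public static void main(String[] args) {
        // Made-up dates, for illustration only.
        LocalDate today = LocalDate.of(2014, 9, 11);
        System.out.println(removable(LocalDate.of(2013, 6, 1), today)); // true
        System.out.println(removable(LocalDate.of(2014, 3, 1), today)); // false
    }
}
```

The rule is trivial to evaluate for one codec; the harder part is publishing which releases can still read which index versions.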




[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130148#comment-14130148
 ] 

Robert Muir commented on LUCENE-5940:
-

firefox release policy is the other option. We can just release new major 
versions every few months and keep things contained.

We can do this now, without any change to the policy :)




[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-11 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130149#comment-14130149
 ] 

Tim Smith commented on LUCENE-5940:
---

time based would be much more reasonable

as long as people are on a 4.x release that is less than 1-2 years old, they 
should be able to move directly to 5.0

supporting indexes 4+ years old is asking a bit much, but assuming an external 
release cycle of 1 year, a 1-2 year cutoff is manageable





[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-11 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130151#comment-14130151
 ] 

Tim Smith commented on LUCENE-5940:
---

firefox does not need to worry about an upgrade path for terabytes worth of 
data

they only need to worry about upgrading bookmarks and that's about it




[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130156#comment-14130156
 ] 

Robert Muir commented on LUCENE-5940:
-

The way i see it, back compat is just like any other feature. If people don't 
step up to contribute to make it happen, then we drop it. 

I'm done wasting days and days on it when i don't care about it.




[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-11 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130169#comment-14130169
 ] 

Tim Smith commented on LUCENE-5940:
---

i fully understand the pain associated with maintaining back compat

i guess it would be good if you (and others) could enumerate all the issues 
related here for full perspective (description does not list them)

also, it should be on the developer who removes write support (or removes a 
codec) to add the backwards compat support/testing.

creating a new codec that supplants an old codec should not inherently require 
removal of write support for old codec.




[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-11 Thread Ryan Ernst (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130180#comment-14130180
 ] 

Ryan Ernst commented on LUCENE-5940:


bq. Maybe another option would be to have a policy that is purely time-based?

I had thought about this before making my suggestion, but I think this has the 
problem of being very arbitrary, and hard to know what upgrade path is needed.  
For example, if the policy is 1 year, and I am at 4.3, and the latest is 5.6, 
how do I know what I need to upgrade to in order to get to 5.6? Is it 5.3.1 or 
5.2.4?  I think maintaining this table as old versions are dropped would be 
difficult in itself.

bq. My idea would be to provide that tool, including all stuff as a 
self-executing JAR file

This is a great idea! In fact, I think we can make one better.  We could 
provide this tool, as well as a meta tool, which knows how to download those 
tools for each release.  It could then output something like:
{noformat}
Found index version 4.3.2
Latest version is 6.7.0
Upgrading index to 4.99.0...done
Upgrading index to 5.99.0...done
Upgrading index to 6.7.0...done
{noformat}
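
The planning step of such a meta tool could be sketched roughly as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, {{x.99.0}} stands in for "the last minor release of major line x" exactly as in the sample output above (these are not real releases), and nothing here downloads or runs an upgrader; it only computes the order of hops.

```java
import java.util.ArrayList;
import java.util.List;

public class UpgradePath {
    // Hypothetical lookup: the last minor release of a major line. A real
    // tool would need a maintained table; "N.99.0" is just the placeholder
    // used in the sample output.
    public static String lastMinorOf(int major) {
        return major + ".99.0";
    }

    public static int majorOf(String version) {
        return Integer.parseInt(version.substring(0, version.indexOf('.')));
    }

    /** Versions to upgrade through, in order, ending at the latest version. */
    public static List<String> plan(String found, String latest) {
        int from = majorOf(found);
        int to = majorOf(latest);
        if (from > to) {
            throw new IllegalArgumentException("index is newer than " + latest);
        }
        List<String> steps = new ArrayList<>();
        if (found.equals(latest)) {
            return steps; // already up to date, nothing to do
        }
        // One hop to the end of each older major line, then the final hop.
        for (int major = from; major < to; major++) {
            String step = lastMinorOf(major);
            if (!step.equals(found)) {
                steps.add(step);
            }
        }
        steps.add(latest);
        return steps;
    }

    public static void main(String[] args) {
        // Only prints the plan; no index is touched.
        for (String step : plan("4.3.2", "6.7.0")) {
            System.out.println("Would upgrade index to " + step);
        }
    }
}
```

With the versions from the sample output, the plan is 4.99.0, then 5.99.0, then 6.7.0.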




[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-11 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130218#comment-14130218
 ] 

Tim Smith commented on LUCENE-5940:
---

the problem with the upgrade tool approach is that it doesn't scale to clusters 
with large numbers of indexes.

for instance, take a cluster that has 50 indexes spread across a bunch of 
machines. this is now an involved manual task put in the hands of system 
administrators who don't really know what's going on under the hood. 

that's just asking for trouble

it seems like the whole power of codecs is that you can avoid all this and 
allow for seamless transitions by having read only codecs for previous index 
formats.

are there technical issues here i'm unaware of beyond creating and maintaining 
the backwards compat tests?
something outside of the codec mechanism that causes problems?

if not, just dump the read only codecs for old versions in a contrib module 
and let people upgrade at their leisure (and let the community find/fix bugs as 
they are encountered)




[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130247#comment-14130247
 ] 

Uwe Schindler commented on LUCENE-5940:
---

bq. if not, just dump the read only codecs for old versions in a contrib 
module and let people upgrade at their leisure (and let the community find/fix 
bugs as they are encountered)

Already done in Lucene trunk: there is a new backwards module. In trunk you can 
read previous indexes only with this jar on the classpath (loaded via SPI).




[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130262#comment-14130262
 ] 

Robert Muir commented on LUCENE-5940:
-

{quote}
are there technical issues here i'm unaware of beyond creating and maintaining 
the backwards compat tests?
something outside of the codec mechanism that causes problems?
{quote}

There are plenty. First of all, maintaining back compat codecs has a real cost 
to improving lucene in the future, because if e.g. I want to make a change to 
the codec API, i have to deal with tons of medieval index formats. Same goes 
for structural changes like making docvalues updatable (shai had to fight a lot 
here). Even stuff like simple code refactoring is expensive because it's just a 
ton of code.

Also the old codecs hang behind on features. They might not support various 
features like offsets in the postings, payloads in the term vectors, missing 
bitsets for docvalues, or whole datastructure types 
(SORTED_SET/SORTED_NUMERIC), or even whole parts of the index (3.x has no 
docvalues at all). They are missing various useful statistics, etc. These are 
just ones i've worked on myself recently, there are more, and there are more 
coming (like Mike's range prefix feature). This makes things like testing 
difficult.

Backwards compat drags around a lot of stuff for a long time (see the packed 
ints api) that makes it more complex and hard to work with and make changes to. 
It prevents and discourages real improvements to lucene. 

There are plenty of bugs in the back compat, the last few indexes have been 
riddled with them, some of them bad. It's undertested, overcomplex, and 
undermaintained. Again, not sexy stuff to work on, nobody wants to improve it.

Finally, users want to have more options, but until we can minimize this 
backwards compat, i'm personally going to push back very hard on any options, 
because we simply cannot take on more back compat. So the codec API goes mostly 
wasted. Maybe we should rename it backcompat api, because that's all it's 
currently good for. Backcompat hurts the users here in this case. If we didn't 
have so many ancient formats, we could instead provide (and actually support) 
breadth instead, such as various options for the way to encode data so users 
really can take advantage of it.





[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130272#comment-14130272
 ] 

Robert Muir commented on LUCENE-5940:
-

I agree, that's the worst part of all. trunk should not be burdened with this 
stuff, but it's already overwhelmed completely with back compat.




[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-11 Thread Ryan Ernst (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130270#comment-14130270
 ] 

Ryan Ernst commented on LUCENE-5940:


bq. the problem with the upgrade tool approach is that it doesn't scale to 
clusters with large numbers of indexes.

Can you elaborate more?  Your example of 50 indexes spread across many machines 
doesn't make me understand how it would be difficult to run this tool. I see 
the steps as:
# Install the newest lucene (you would already have to do this)
# Run the meta tool. This will download the necessary indexupgrader self 
contained jar for previous releases, and follow the upgrade path to get to the 
current release.

bq. are there technical issues here i'm unaware of beyond creating and 
maintaining the backwards compat tests?

I'd just like to reiterate what Robert said.  Have you looked at how much code 
is involved in maintaining backcompat?  Just for the current 3x and 4x, it is 
enormous.  And you can't assume the codec API will stay the same. Changing the 
codec api means updating old codecs in some way that they still work as 
expected (Robert's example with updateable DV). Minimizing that effort for a 
developer allows more rapid experimentation and iteration.  

The advantage to the indexupgrader tool Uwe described is it is completely self 
contained.  All the old codecs are there, and when that jar was created, it was 
tested thoroughly with the upgrade paths it supports. But those old codecs and 
upgrade paths don't have to be in the current codebase, which makes changing 
the current code easier.




[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-11 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130307#comment-14130307
 ] 

Tim Smith commented on LUCENE-5940:
---

i would not consider old indexes not containing support for new features an 
issue.
if you want to use new options/features/structures, you need to reindex, no 
problem here. 

you don't have to convince me that supporting back compat sucks. i agree, but 
lucene is used by a lot of people for a lot of disparate use cases. removing 
support for back compat will drive people away since it removes seamless 
upgrade paths. 

think what would have happened if microsoft had released 64-bit windows with no 
support for running old 32-bit programs.
people still want to run old dos programs on windows (go figure, but they 
want/need it)

it hurts adoption of new versions if you don't provide the back compat. this 
just leaves a bunch of people running ancient versions of lucene because they 
don't have any good upgrade path other than complete reindexing.

if there is a bug in feature x, a possible solution is to just remove 
feature x, but this is gonna piss off everyone who relies on it, regardless 
of how much you may personally hate feature x

the main thing i see as a challenge that you mention here is that you want (or 
new features may require) refactoring the codec api.  

this is an engineering challenge and would just require some thought out design 
to decide what final api refactors should be needed to support flexibility, 
addition of new features, and growth without requiring mucking with old codecs 
in the future. 

right now, the IndexWriter and codecs are pretty muddled together in some 
cases. cleaning up these interfaces and making the codecs self contained should 
be a goal for any refactors to allow future innovation/addition of features.

as a lucene user, if back compat is yanked and not provided in 5.0 for all 4.x 
indexes, i will be extremely resistant to upgrade. I would be more inclined to 
fork the latest 4.x and ditch 5.0. 5.0 would have to offer something REALLY 
compelling to get me to adopt it.








[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130308#comment-14130308
 ] 

Robert Muir commented on LUCENE-5940:
-

{quote}
i would not consider old indexes not containing support for new features an 
issue.
if you want to use new options/features/structures, you need to reindex, no 
problem here. 
{quote}

Because you are not even considering the developer pain. The tests, man: 
maintaining the tests.




[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-11 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130314#comment-14130314
 ] 

Tim Smith commented on LUCENE-5940:
---

bq. Can you elaborate more? Your example of 50 indexes spread across many 
machines doesn't make me understand how it would be difficult to run this tool. 
I see the steps as:

here are the issues i would have with an upgrade tool approach:

1. external network connectivity is not guaranteed 
2. i have special metadata written in the segment metadata that is important
3. i use a custom codec configuration that the upgrade tool would need to use
4. replicated indexes need a lot of care
5. this tool would need to be run once for each directory containing an index, 
for every node that contains indexes
- this is an ops nightmare since i won't personally be running the tool. this 
leaves lots of room for user error that is avoided completely if the index 
upgrade is seamless (via read only codecs for old versions)
6. custom directory implementations may muck up the works

in general, i don't see any way this upgrade tool would be useful to me 
without repackaging and adding a ton of extra code to do all the things i need 
to ensure a consistent index is emitted




[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130320#comment-14130320
 ] 

Robert Muir commented on LUCENE-5940:
-

No offense Tim, but your comments exactly fit my description of this issue.

{quote}
The index back compat works like this: everyone wants it, but there are 
frequently bugs, and when push comes to shove, it's not a very sexy thing to 
work on/fix, so it's hard to get any help.
{quote}

I don't care what happens on this issue, personally, I'm done working on back 
compat completely until the policy changes. That includes the current 
in-progress 4.10.1 release. I've done more than my fair share of fighting it, 
and it just causes me endless frustration.

If people care about back compat, then they can go do things like regenerate 
indexes from previous Lucene versions to ensure they aren't buggy like 
LUCENE-5939 and that it's actually working. They can try to refactor out old 
cruft in some way and work on improving the APIs of dead index formats.

But that's not for me.




[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-11 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130324#comment-14130324
 ] 

Tim Smith commented on LUCENE-5940:
---

bq. Because you are not even considering the developer pain. The tests man, 
maintaining the tests.

the pain will continue to exist, you are just shifting who feels it. again, i 
get how painful it is, but it's best to have that pain felt at the source (and 
handled properly and consistently by people who fully understand it) as opposed 
to pushing it all downstream and polluting the waters.




[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130331#comment-14130331
 ] 

Robert Muir commented on LUCENE-5940:
-

No, that's not correct. What you are saying there is fuck you man, you do the 
work.

I will not.




[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-11 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130333#comment-14130333
 ] 

Tim Smith commented on LUCENE-5940:
---

bq. I don't care what happens on this issue, personally, I'm done working on 
back compat completely until the policy changes. That includes the current 
in-progress 4.10.1 release. I've done more than my fair share of fighting it, 
and it just causes me endless frustration.

fully your prerogative, this is a volunteer community.

i'm just putting in my 2 cents here since a change here will really be painful 
to me personally

of course i'm not a committer, so i have no final say




[jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.

2014-09-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130493#comment-14130493
 ] 

Michael McCandless commented on LUCENE-5940:


+1 to relax the policy

+1 for the .99 approach: I think it's easier to grok than the time-based 
approach.

But if we do relax the policy I think we should also improve IndexUpgrader (or 
make a new top-level tool, which is what we expose to users, hiding the current 
IndexUpgrader, i.e. [~rjernst]'s idea) to do this upgrade across any 4.x to any 
5.x (or across more than 1 major release).
