Re: 5.0 release status?

Jack Krupansky Sun, 05 Oct 2014 06:31:40 -0700

To be clear, I myself am not trying to offer advice on whether or when people 
should upgrade – I’m trying solely to determine if there is significant value 
in doing so, and what that value might be. I did indeed read through Robert’s list 
and have watched the Jira flow over the years, but I am unable to pinpoint 
“significant” improvements that will have more than just a “minor” impact for 
users. I’m not trying to say that significant improvements aren’t actually in 
there, just that I don’t know of any. If I am wrong, please provide the 
details. Like... are there use cases where the 5.0 index will be at least 10% 
faster or at least 10% smaller, and if so, which specific features and use 
cases? Or if there is a cumulative improvement in performance or capacity.


Or... if there are specific feature transitions to recommend that would result 
in dramatic improvements.

I mean, as things stand, there has been a lot of “shuffling around”, but no 
clear, quantified insight on the benefits of that shuffling/refactoring. I’m 
all for cleaner code (which can manifest as greater reliability and fewer bugs), 
but is that the gist of most of the index changes?

In short, I’m more interested in the impact of the 5.0 index changes (and their 
use cases), not the details of the implementation of those changes.

Put another way, will a typical app be at least 10% faster or 10% smaller (or 
both!) when its index is converted from 4.x to 5.0? Or 5% or 20% or... whatever 
it actually is?

And if there are specific new features that rely on conversion to 5.0 index 
format, let’s get that list collected as some bullet points. Call this 
preparation for the 5.0 release! Maybe it could be a summary section in the 5.0 
migration guide.

Clearly there is plenty of goodness in the 5.0 work, but I’m just trying to get 
a handle on the overall impact.

-- Jack Krupansky

From: Ryan Ernst 
Sent: Sunday, October 5, 2014 12:48 AM
To: dev@lucene.apache.org 
Subject: Re: 5.0 release status?


On Oct 4, 2014 9:35 PM, "Jack Krupansky" <j...@basetechnology.com> wrote:
>
> Maybe I just can’t fully make sense of LUCENE-5934 – does it corrupt all 4.x 
> indexes, or some, or under some conditions? I mean, I had the impression that 
> it was only non-GA 4.0 indexes. And was it only 4.10 that was doing this, or 
> 4.0 GA through 4.9 as well?

The bug only affected people using the 4.10.0 release to read 4.0 beta/final 
segments (it thought they were 3x indexes).

>  
> In any case, I’m still not clear on the direct benefits to users of, say, 4.9 
> upgrading to 5.0 indexes. Any performance improvement? Any disk space 
> reduction? Any RAM reduction?

Again, read through all the stuff Robert has mentioned, read through 
lucene/CHANGES.txt, read the issues that are currently open. Your previous 
comments have suggested users upgrading to 5.0 would do so only so that they can 
eventually upgrade to 6.0, implying they wouldn't upgrade their indexes for 
minor releases. This simply is not the best advice. Look back at 4.9 and 4.10 
for recent improvements in heap usage for doc values and norms for example. 
Going back farther, someone still on 4.0 doesn't benefit from the postings 
format improvements in 4.1. Users should upgrade their format whenever possible 
because improvements are always happening.

>  
> -- Jack Krupansky
>  
> From: Ryan Ernst
> Sent: Sunday, October 5, 2014 12:24 AM
> To: dev@lucene.apache.org
> Subject: Re: 5.0 release status?
>  
>
>
> On Oct 4, 2014 9:13 PM, "Jack Krupansky" <j...@basetechnology.com> wrote:
> >
> > Thanks for the further clarification. In short, the legacy of 3.x support 
> > was destabilizing 4.x itself (including testing), not just interfering with 
> > 6.x moving forward beyond 3.x index compatibility. So, 5.x will have less 
> > baggage holding it down than 4.x has today.
> >
> > I still need answers to:
> >
> > 1. Will users of 5.0 get any immediate benefit by reindexing or otherwise 
> > "upgrading" their 4.x indexes to 5.0?
>
> Yes, for all the reasons Robert already mentioned.
>
> >
> > 2. What is the easiest, most efficient way for users of 5.0 to upgrade 
> > their 4.x indexes to 5.0 so that they will not have to worry or do anything 
> > when 6.0 comes out?
>
> Again, users should always upgrade if possible. There are improvements for 
> memory and speed all the time. Currently they can use the IndexUpgrader 
> (offline) or wrap their merge policy with UpgradeIndexMergePolicy (although 
> both currently act like an optimize on the old segments; I'm hoping to change 
> that soon).
>
> Ryan
>
> >
> > -- Jack Krupansky
> >
> > -----Original Message----- From: Robert Muir
> > Sent: Saturday, October 4, 2014 10:43 PM
> >
> > To: dev@lucene.apache.org
> > Subject: Re: 5.0 release status?
> >
> > On Sat, Oct 4, 2014 at 12:35 PM, Jack Krupansky <j...@basetechnology.com> 
> > wrote:
> >>
> >> I tried to follow all of the trunk 6/branch 5x discussion, but... AFAICT
> >> there was no explicit decision or even implication that a release 5.0 would
> >> be imminent or that there would not be a 4.11 release. AFAICT, the whole
> >> trunk 6/branch 5x decision was more related to wanting to have a trunk that
> >> eliminated the 4x deprecations and was no longer constrained by
> >> compatibility with the 4x index – let me know if I am wrong about that in
> >> any way! But I did see a comment on one Jira referring to “preparation for a
> >> 5.0 release”, so I wanted to inquire about intentions. So, is a 5.0 release
> >> “coming soon”, or are 4.11, 4.12, 4.13... equally likely?
> >
> >
> > I created a branch_5x because 3.x index support was responsible for
> > multiple recent corruption bugs, some of which started impacting 4.x
> > indexes.
> >
> > Especially bad were:
> > LUCENE-5907: 3.x back compat code corrupts (not just can't read) your index.
> > LUCENE-5934: 3.x back compat code corrupts (not just can't read) your 4.0 
> > index.
> > LUCENE-5975: 3.x back compat code reports a false corruption (was
> > indeed a bug in those versions of lucene) for 3.0-3.3 indexes.
> >
> > Whenever I see patterns in corruptions then I see it as a systemic
> > problem and aggressively work to do something about it. I've seen
> > several lately, but these are the relevant ones:
> >
> > 3.x back compat: 3.x didn't have a codec API, so it's wedged in, and
> > pretty hard. It's not that we were lazy, it's that it's radically
> > different: doesn't separate data by fields, sorts terms differently,
> > uses shared docstores, writes field numbers implicitly, ... We try to
> > emulate it the best we can for testing, but the emulation can't really
> > be perfect, so in such places: surprise, bugs. The only way to stop
> > these corruptions is to stop supporting it.
> >
> > test infrastructure: IMO lucene 4 wasn't really ready to support
> > multiple index formats from a test perspective, so we cheated and try
> > to emulate old formats and rotate them across all tests. This works
> > ok, but it's horrible to debug (since these are essentially
> > integration tests), the false failure rate is extremely high, and
> > the complexity of the implementation is high. It's not just that it
> > fails to find some bugs, it was actually directly
> > responsible for corruption bugs like LUCENE-5377. But throughout 4.x,
> > we have fixed the situation and added BaseXYZFormat tests for each
> > part of an index format. Now we have reliable unit tests for each part
> > of the abstract codec API: adding new tests here finds old bugs and
> > prevents new ones in the future. For example I fixed several minor
> > bugs in 4.x's CFS code just the last few days with this approach.
> >
> > there are also other patterns like deleting files, commit fallback
> > logic, exception handling, addIndexes, etc that we have put
> > substantial work into recently for 5.0. Whatever was safe to backport
> > to bugfix releases, we tried, but some of these kinds of "fixes" are
> > just too heavy for a bugfix branch, and many just cannot even be done
> > as long as 3.x support exists. There is also some hardening in the 5.0
> > index format itself that really could not happen correctly as long as
> > we must support 3.x.
> >
> > So it's not just that 3.x causes corruption bugs, it prevents us from
> > moving forward and actually tackling these other issues. This is
> > important to do or we will just continue to "tread water" and not
> > actually get ahead of them. So I did something about it and created a
> > 5.x branch. Worst case, nobody would follow along, but I guess I just
> > assumed the situation was widely understood.
> >
> >>
> >> Open questions: What is Heliosearch up to, and what are Elasticsearch’s
> >> intentions?
> >>
> >
> > I don't see how this is relevant. The straw that broke the camel's back
> > for me was LUCENE-5934, and it doesn't impact elasticsearch.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: dev-h...@lucene.apache.org 
> >
