Another proposal that I made on LUCENE-5859 is to get rid of Version (for Analyzers) and follow the solution we have with Codecs. If an Analyzer changes its runtime behavior and is not, e.g., marked @experimental, we can create a Foo49Analyzer with the new behavior. That way, apps are still safe when they upgrade, since their Foo45Analyzer still exists (though deprecated). And they can always copy Foo45Analyzer into their own code when they upgrade to Lucene 6.0, where it no longer exists. With this approach, there's no single version across the app; it just uses the specific Analyzer impls.
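To make the Codec-style naming idea concrete, here is a minimal sketch. It deliberately does not use Lucene's real Analyzer API: SketchAnalyzer is a hypothetical stand-in interface, and the two behaviors (whitespace splitting in 4.5, added lowercasing in 4.9) are assumptions made up purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for an analyzer; this is NOT Lucene's Analyzer API.
interface SketchAnalyzer {
    List<String> analyze(String text);
}

// Behavior as released in 4.5 (assumed here: whitespace split, case kept).
// Frozen once released; deprecated when a successor appears.
@Deprecated
final class Foo45Analyzer implements SketchAnalyzer {
    @Override
    public List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}

// New behavior introduced in 4.9 (assumed here: tokens are also lowercased).
final class Foo49Analyzer implements SketchAnalyzer {
    @Override
    public List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }
}
```

An app that indexed with Foo45Analyzer keeps instantiating exactly that class after an upgrade, so its tokens do not change; new apps pick Foo49Analyzer. Neither class needs a Version argument.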
Anyway, +1 to make bugfix releases first-class citizens. That's also the only way to make sure we support back compat if a bug is fixed in an Analyzer in e.g. 4.8.2.

Shai

On Sat, Aug 2, 2014 at 11:54 AM, Michael McCandless <[email protected]> wrote:

> +1, this sounds like a great solution. It simplifies the APIs (no
> more required Version to Analyzer), it consolidates the version logic
> to a "single source", dot releases are first class.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Aug 1, 2014 at 7:47 PM, Ryan Ernst <[email protected]> wrote:
> > There has been a lot of heated discussion recently about version
> > tracking in Lucene [1] [2]. I wanted to have a fresh discussion
> > outside of jira to give a full description of the current state of
> > things, the problems I have heard, and a proposed solution.
> >
> > CURRENT
> >
> > We have 2 pieces of code that handle “versioning.” The first is
> > Constants.LUCENE_MAIN_VERSION, which is written to the SegmentsInfo
> > for each segment. This is a string version which is used to detect
> > when the current version of lucene is newer than the version that
> > wrote the segment (and how/if an upgrade to a newer codec should be
> > done). There is some complication with the “display” version and
> > non-display version, which are distinguished by whether the version of
> > lucene was an official release, or an alpha/beta version (which was
> > added specifically for the 4.0.0 release ramp up). This string
> > version also has its own parsing and comparison methods.
> >
> > The second piece of versioning code is in Version.java, which is an
> > enum used by analyzers to maintain backwards compatible behavior given
> > a specific version of lucene. The enum only contains values for dot
> > releases of lucene, not bug fixes (which was what spurred the recent
> > discussions over version).
> > Analyzers’ constructors take a required
> > Version parameter, which is only actually used by the few analyzers
> > that have changed behavior recently. Version.java contains its own
> > version parsing and comparison methods.
> >
> >
> > CONCERNS
> >
> > * Having 2 different pieces of code that do very similar things is
> > confusing for development. Very few developers appear to really
> > understand the current system (especially when trying to understand
> > the alpha/beta setup).
> >
> > * Users are generally confused by the Version passed to analyzers: I
> > know I was when I first started working with Lucene, and
> > Version.CURRENT_VERSION was deprecated because users used that without
> > understanding the implications.
> >
> > * Most analyzers currently have dead code constructors, since they
> > never make use of Version. There are also a lot of classes used by
> > analyzers which contain similar dead code.
> >
> > * Backwards compatibility needs to be handled in some fashion, to
> > ensure users have a path to upgrade from one version of lucene to
> > another, without requiring immediate re-indexing.
> >
> >
> > PROPOSAL
> >
> > I propose the following:
> >
> > * Consolidate all version related enumeration, including reading and
> > writing string versions, into Version.java. Have a static method that
> > returns the current lucene version (replacing
> > Constants.LUCENE_MAIN_VERSION).
> >
> > * Make bug fix releases first class in the enumeration, so that they
> > can be distinguished for any compatibility issues that come up.
> >
> > * Remove all snapshot/alpha/beta versioning logic. Alpha/beta was
> > really only necessary for 4.0 because of the extreme changes that were
> > being made. The system is much more stable now, and 5.0 should not
> > require preview releases, IMO. I don’t think snapshots should be a
> > concern because any user building an index from an unreleased build
> > (which they built themselves) is just asking for trouble.
> > They do so at their own risk (of figuring out how to upgrade their
> > indexes if they are not trash-able). Backwards compatibility can be
> > handled by adding the alpha/beta/final versions of 4.0 to the enum
> > (and special parsing logic for this). If lucene changes so much that
> > we need alpha/beta type discrimination in the future, we can revisit
> > the system if simply having extra versions in the enum won't work.
> >
> > * Analyzers' constructors should have Version removed, and a setter
> > should be added which allows production users to set the version used.
> > This way any analyzer can still use the version if it is set to
> > something other than current (which would be the default), but users
> > simply prototyping do not need to worry about it.
> >
> > * Classes that analyzers use, which take Version, should have Version
> > removed, and the analyzers should choose which settings/variants of
> > those classes to use based on the version they have set. In other
> > words, all version variant logic should be contained within the
> > analyzers. For example, Lucene47WordDelimiterFilter, or
> > StandardAnalyzer can take the unicode version.
> > Factories could still take Version (e.g. TokenizerFactory,
> > TokenFilterFactory, etc) to produce the correct component (so nothing
> > will change for solr in this regard).
> >
> > I’m sure not everyone will be happy with what I have proposed, but I’m
> > hoping we can work out a solution together, and then implement in a
> > team-like fashion, the way I have seen the community work in the past,
> > and I hope to see again in the future.
> >
> > Thanks
> > Ryan
> >
> > [1] https://issues.apache.org/jira/browse/LUCENE-5850
> > [2] https://issues.apache.org/jira/browse/LUCENE-5859
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
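For readers who want to see roughly what the quoted proposal could look like, here is a minimal sketch: one version type that treats bugfix releases as first class and owns parsing and comparison, plus an analyzer whose version is set via a setter and defaults to current. All names here (SketchVersion, SettableVersionAnalyzer, the V_* constants, normalize) are hypothetical and not taken from Lucene, and the "fix in 4.8.2" is an invented example, not a real change.

```java
import java.util.Locale;
import java.util.Objects;

// Hypothetical consolidated version type; NOT org.apache.lucene.util.Version.
final class SketchVersion implements Comparable<SketchVersion> {
    static final SketchVersion V_4_8_0 = new SketchVersion(4, 8, 0);
    static final SketchVersion V_4_8_2 = new SketchVersion(4, 8, 2);
    static final SketchVersion V_4_9_0 = new SketchVersion(4, 9, 0);

    final int major;
    final int minor;
    final int bugfix;

    private SketchVersion(int major, int minor, int bugfix) {
        this.major = major;
        this.minor = minor;
        this.bugfix = bugfix;
    }

    // Single source for "which version is this build" (4.9.0 is assumed here).
    static SketchVersion current() {
        return V_4_9_0;
    }

    // Parses "4.8" or "4.8.2"; a missing bugfix component is treated as 0.
    static SketchVersion parse(String s) {
        String[] parts = s.trim().split("\\.");
        if (parts.length < 2 || parts.length > 3) {
            throw new IllegalArgumentException("Expected major.minor[.bugfix]: " + s);
        }
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        int bugfix = parts.length == 3 ? Integer.parseInt(parts[2]) : 0;
        return new SketchVersion(major, minor, bugfix);
    }

    boolean onOrAfter(SketchVersion other) {
        return compareTo(other) >= 0;
    }

    @Override
    public int compareTo(SketchVersion o) {
        if (major != o.major) return Integer.compare(major, o.major);
        if (minor != o.minor) return Integer.compare(minor, o.minor);
        return Integer.compare(bugfix, o.bugfix);
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof SketchVersion && compareTo((SketchVersion) obj) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(major, minor, bugfix);
    }

    @Override
    public String toString() {
        return major + "." + minor + "." + bugfix;
    }
}

// Hypothetical analyzer: no Version in the constructor, optional setter instead.
final class SettableVersionAnalyzer {
    private SketchVersion version = SketchVersion.current();

    void setVersion(SketchVersion version) {
        this.version = Objects.requireNonNull(version);
    }

    // Invented example: pretend lowercasing was added as a fix in 4.8.2.
    String normalize(String token) {
        return version.onOrAfter(SketchVersion.V_4_8_2)
                ? token.toLowerCase(Locale.ROOT)
                : token;
    }
}
```

Someone prototyping just calls new SettableVersionAnalyzer() and gets current behavior. A production user with an index built on 4.8.0 calls setVersion(SketchVersion.parse("4.8.0")) and keeps the pre-fix behavior, which is only distinguishable because the bugfix component is part of the version.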
