Re: Lucene versioning logic

Michael McCandless Sat, 02 Aug 2014 01:55:29 -0700

+1, this sounds like a great solution.  It simplifies the APIs (no
more required Version to Analyzer), consolidates the version logic
into a "single source", and makes dot releases first class.


Mike McCandless

http://blog.mikemccandless.com


On Fri, Aug 1, 2014 at 7:47 PM, Ryan Ernst <r...@iernst.net> wrote:
> There has been a lot of heated discussion recently about version
> tracking in Lucene [1] [2].  I wanted to have a fresh discussion
> outside of jira to give a full description of the current state of
> things, the problems I have heard, and a proposed solution.
>
> CURRENT
>
> We have 2 pieces of code that handle “versioning.”  The first is
> Constants.LUCENE_MAIN_VERSION, which is written to the SegmentInfo
> for each segment.  This is a string version which is used to detect
> when the current version of lucene is newer than the version that
> wrote the segment (and how/if an upgrade to a newer codec should be
> done). There is some complication with the “display” version and
> non-display version, which are distinguished by whether the version of
> lucene was an official release, or an alpha/beta version (which was
> added specifically for the 4.0.0 release ramp up).  This string
> version also has its own parsing and comparison methods.
>
> The second piece of versioning code is in Version.java, which is an
> enum used by analyzers to maintain backwards compatible behavior given
> a specific version of lucene.  The enum only contains values for dot
> releases of lucene, not bug fixes (which was what spurred the recent
> discussions over version). Analyzers’ constructors take a required
> Version parameter, which is only actually used by the few analyzers
> that have changed behavior recently.  Version.java contains its own
> separate version parsing and comparison methods.
>
>
> CONCERNS
>
> * Having 2 different pieces of code that do very similar things is
> confusing for development.  Very few developers appear to really
> understand the current system (especially when trying to understand
> the alpha/beta setup).
>
> * Users are generally confused by the Version passed to analyzers: I
> know I was when I first started working with Lucene, and
> Version.LUCENE_CURRENT was deprecated because users used that without
> understanding the implications.
>
> * Most analyzers currently have dead code constructors, since they
> never make use of Version.  There are also a lot of classes used by
> analyzers which contain similar dead code.
>
> * Backwards compatibility needs to be handled in some fashion, to
> ensure users have a path to upgrade from one version of lucene to
> another, without requiring immediate re-indexing.
>
>
> PROPOSAL
>
> I propose the following:
>
> * Consolidate all version related enumeration, including reading and
> writing string versions, into Version.java.  Have a static method that
> returns the current lucene version (replacing
> Constants.LUCENE_MAIN_VERSION).
>
> * Make bug fix releases first class in the enumeration, so that they
> can be distinguished for any compatibility issues that come up.
>
> * Remove all snapshot/alpha/beta versioning logic.  Alpha/beta was
> really only necessary for 4.0 because of the extreme changes that were
> being made.  The system is much more stable now, and 5.0 should not
> require preview releases, IMO.  I don’t think snapshots should be a
> concern because any user building an index from an unreleased build
> (which they built themselves) is just asking for trouble.  They do so
> at their own risk (of figuring out how to upgrade their indexes if
> they are not trash-able).  Backwards compatibility can be handled by
> adding the alpha/beta/final versions of 4.0 to the enum (and special
> parsing logic for this).  If lucene changes so much that we need
> alpha/beta type discrimination in the future, we can revisit the
> system if simply having extra versions in the enum won't work.
>
> * Analyzer constructors should have Version removed, and a setter
> should be added which allows production users to set the version used.
> This way any analyzer can still use the version if it is set to something
> other than current (which would be the default), but users simply
> prototyping do not need to worry about it.
>
> * Classes that analyzers use, which take Version, should have Version
> removed, and the analyzers should choose which settings/variants of
> those classes to use based on the version they have set. In other
> words, all version variant logic should be contained within the
> analyzers.  For example, an analyzer could select
> Lucene47WordDelimiterFilter, or StandardAnalyzer could take the
> Unicode version.
> Factories could still take Version (e.g. TokenizerFactory,
> TokenFilterFactory, etc) to produce the correct component (so nothing
> will change for solr in this regard).
>
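
To make the first two bullets concrete, here is a rough sketch of what a
"single source" for version logic might look like.  Everything in it is an
assumption for illustration: ReleaseVersion, current(), parse() and
onOrAfter() are hypothetical names, the CURRENT value is made up, and the
sketch uses a plain class rather than the enum discussed above.  It also
leaves out the special parsing for the 4.0 alpha/beta strings.

```java
import java.util.Objects;

/**
 * Hypothetical sketch, not Lucene's actual Version class: one type that
 * owns parsing, comparison, and the notion of the "current" version, with
 * the bug fix number as a first-class component.
 */
final class ReleaseVersion implements Comparable<ReleaseVersion> {
    // Assumed value, for illustration only.
    private static final ReleaseVersion CURRENT = new ReleaseVersion(4, 10, 0);

    final int major;
    final int minor;
    final int bugfix;

    ReleaseVersion(int major, int minor, int bugfix) {
        if (major < 0 || minor < 0 || bugfix < 0) {
            throw new IllegalArgumentException("version components must be >= 0");
        }
        this.major = major;
        this.minor = minor;
        this.bugfix = bugfix;
    }

    /** Single accessor for the running version (stands in for a string constant). */
    static ReleaseVersion current() {
        return CURRENT;
    }

    /** Parses "major.minor" or "major.minor.bugfix"; a missing bugfix means 0. */
    static ReleaseVersion parse(String text) {
        String[] parts = Objects.requireNonNull(text).trim().split("\\.");
        if (parts.length < 2 || parts.length > 3) {
            throw new IllegalArgumentException("unparseable version: " + text);
        }
        try {
            int major = Integer.parseInt(parts[0]);
            int minor = Integer.parseInt(parts[1]);
            int bugfix = parts.length == 3 ? Integer.parseInt(parts[2]) : 0;
            return new ReleaseVersion(major, minor, bugfix);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("unparseable version: " + text, e);
        }
    }

    /** True if this version is the same as or newer than the other one. */
    boolean onOrAfter(ReleaseVersion other) {
        return compareTo(other) >= 0;
    }

    @Override
    public int compareTo(ReleaseVersion other) {
        int c = Integer.compare(major, other.major);
        if (c == 0) c = Integer.compare(minor, other.minor);
        if (c == 0) c = Integer.compare(bugfix, other.bugfix);
        return c;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ReleaseVersion && compareTo((ReleaseVersion) o) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(major, minor, bugfix);
    }

    @Override
    public String toString() {
        return major + "." + minor + "." + bugfix;
    }
}
```

With something along these lines, the check for whether a segment was
written by an older release reduces to one comparison, e.g.
ReleaseVersion.current().onOrAfter(ReleaseVersion.parse(segmentVersion)),
where segmentVersion is whatever string was stored with the segment.
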
> I’m sure not everyone will be happy with what I have proposed, but I’m
> hoping we can work out a solution together, and then implement in a
> team-like fashion, the way I have seen the community work in the past,
> and I hope to see again in the future.
>
> Thanks
> Ryan
>
> [1] https://issues.apache.org/jira/browse/LUCENE-5850
> [2] https://issues.apache.org/jira/browse/LUCENE-5859
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
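
For the analyzer bullets in the proposal, a minimal sketch of the "no
Version in the constructor, optional setter" idea might look like the
following.  None of this is Lucene API: CompatVersion, SketchAnalyzer,
setVersion() and analyze() are hypothetical names, and the behavior change
at 4.8 (lowercasing tokens) is invented purely to show where the version
check would live.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Objects;

/** Hypothetical stand-in for version constants; not Lucene's Version enum. */
enum CompatVersion { V4_7, V4_8, V4_9 }

/** Hypothetical analyzer-like class; it does not extend any Lucene type. */
final class SketchAnalyzer {
    static final CompatVersion LATEST = CompatVersion.V4_9;

    // Defaults to the latest behavior, so prototyping code never sees a version.
    private CompatVersion matchVersion = LATEST;

    SketchAnalyzer() {
    }

    /** Production users pin the behavior of the release that built their index. */
    SketchAnalyzer setVersion(CompatVersion version) {
        this.matchVersion = Objects.requireNonNull(version);
        return this;
    }

    /** Splits on whitespace; the version-dependent variant is chosen here only. */
    List<String> analyze(String text) {
        // Invented rule for illustration: lowercasing was "added" in 4.8.
        boolean lowercase = matchVersion.compareTo(CompatVersion.V4_8) >= 0;
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            tokens.add(lowercase ? token.toLowerCase(Locale.ROOT) : token);
        }
        return tokens;
    }
}
```

A user who is just prototyping writes new SketchAnalyzer() and gets current
behavior, while someone with an existing index calls
new SketchAnalyzer().setVersion(CompatVersion.V4_7) to keep the old one.
The helper code the analyzer calls never needs to see the version itself.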

